You're handed a new, high-visibility project that simply requires deploying a dataset on SQL Server from data that resides on an iSeries. What could be easier?
You have your choice of middleware tools: IBM iSeries Access, the Microsoft DB2 .NET provider, or StarQuest StarSQL, to name a few. All you have to do is create a linked server and issue an “INSERT SELECT” statement to populate the destination table.
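To make that concrete, here is a minimal sketch of the setup; the linked server name, ODBC data source, credentials, library, and table names are all placeholders, and the generic MSDASQL provider simply stands in for whichever of the drivers above you chose.

    -- Define a linked server over a pre-configured ODBC data source
    -- (hypothetical names throughout).
    EXEC master.dbo.sp_addlinkedserver
         @server     = N'ISERIES',
         @srvproduct = N'',
         @provider   = N'MSDASQL',       -- generic OLE DB provider for ODBC
         @datasrc    = N'ISERIES_DSN';

    EXEC master.dbo.sp_addlinkedsrvlogin
         @rmtsrvname  = N'ISERIES',
         @useself     = N'FALSE',
         @rmtuser     = N'MYUSER',       -- placeholder credentials
         @rmtpassword = N'********';

    -- The entire "project": copy the remote table into the local destination.
    INSERT INTO dbo.CustomerStaging (CUSTNO, NAME, BALANCE)
    SELECT CUSTNO, NAME, BALANCE
    FROM ISERIES.PRODDB.APPLIB.CUSTOMER;  -- four-part linked-server name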
You’ve got this! Everything looks right, so you add a SQL Server Agent job to process the SQL, schedule a nightly “refresh,” and you’re done.
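Purely as an illustration, the nightly job might be scripted with the SQL Server Agent procedures in msdb; the job, schedule, database, and table names below are hypothetical.

    -- Create the job and its single T-SQL step.
    EXEC msdb.dbo.sp_add_job
         @job_name = N'Nightly iSeries refresh';

    EXEC msdb.dbo.sp_add_jobstep
         @job_name      = N'Nightly iSeries refresh',
         @step_name     = N'Reload staging table',
         @subsystem     = N'TSQL',
         @database_name = N'StagingDB',
         @command       = N'TRUNCATE TABLE dbo.CustomerStaging;
    INSERT INTO dbo.CustomerStaging (CUSTNO, NAME, BALANCE)
    SELECT CUSTNO, NAME, BALANCE
    FROM ISERIES.PRODDB.APPLIB.CUSTOMER;';

    -- Run it every night at 01:00 on the local server.
    EXEC msdb.dbo.sp_add_schedule
         @schedule_name     = N'Nightly 1AM',
         @freq_type         = 4,        -- daily
         @freq_interval     = 1,        -- every day
         @active_start_time = 010000;   -- 01:00:00

    EXEC msdb.dbo.sp_attach_schedule
         @job_name      = N'Nightly iSeries refresh',
         @schedule_name = N'Nightly 1AM';

    EXEC msdb.dbo.sp_add_jobserver
         @job_name = N'Nightly iSeries refresh';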
All week long the agent job runs well, and you are ready to share the demonstration, scheduled for the following Monday at 10AM. Alas, at 9AM on Monday you arrive, check the application, and find that the table is only partially populated! Looking into the log, you discover that the connection to the iSeries was interrupted by a weekend iSeries IPL (an initial program load, the iSeries equivalent of a reboot) that ran right in the middle of your job.
No problem: you’ll just run the job now, clearing the table and restarting the extraction and load. But suddenly rows are not being returned as quickly as before! After some quick consultation you find that the day’s application processing on the iSeries is updating rows your extraction logic has already inserted. The destination dataset is invalidated before it is even complete! What can you say? At 10AM you make an excuse instead of a presentation.
The preceding scenario is repeated time and again, and it is often worse: the dataset to be extracted is huge and requires days, not hours, to complete, and the connections between the source and destination systems are unreliable.
It turns out that making a copy of dynamic table data is a different and more involved problem than you realized. Extend this scenario to tens, hundreds, or thousands of related tables and the issue becomes a career, not a simple exercise in using Linked Servers.
When it comes to data replication, what is the most reliable business solution? To maintain a replica of data in real time, it is first necessary to create the replica. Many extraction applications overcome unreliable connections with a “checkpoint” strategy that permits a restart without repeating the entire data load (a sketch follows below). Knowing where to restart an extraction process requires additional “bookkeeping” by the application. Sometimes, however, even this is not enough, especially if the previously extracted data has subsequently changed.
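For illustration only, a checkpointed extract might look like this. It assumes the source table has an ascending key (CUSTNO) and that dbo.ExtractCheckpoint is a one-row bookkeeping table the application maintains itself, seeded with 0 before the first run; none of these names come from a real product.

    -- Resume the extract from the last key recorded in the checkpoint table.
    DECLARE @last  BIGINT,
            @batch INT = 10000,
            @rows  INT = 1;

    SELECT @last = LastKey FROM dbo.ExtractCheckpoint;   -- where we left off

    WHILE @rows > 0
    BEGIN
        -- Pull the next batch of source rows beyond the checkpoint.
        INSERT INTO dbo.CustomerStaging (CUSTNO, NAME, BALANCE)
        SELECT TOP (@batch) CUSTNO, NAME, BALANCE
        FROM ISERIES.PRODDB.APPLIB.CUSTOMER
        WHERE CUSTNO > @last
        ORDER BY CUSTNO;

        SET @rows = @@ROWCOUNT;

        IF @rows > 0
        BEGIN
            -- Record progress so a restart does not repeat completed work.
            SELECT @last = MAX(CUSTNO) FROM dbo.CustomerStaging;
            UPDATE dbo.ExtractCheckpoint SET LastKey = @last;
        END
    END;

If the connection drops, rerunning the script picks up at the last recorded key instead of starting over; as noted above, though, it does nothing about rows that change after they have been copied.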
The solution lies in change data capture technology: the ability to go back and “touch up” the destination copy with any changes that have occurred since the extraction process began. Knowing for certain which data has changed is a complex problem with many possible solutions, but suffice it to say that touching up the destination copy as described is exactly what a replication tool based on change data capture provides. Trying to accomplish this without change data capture is a serious and risky time sink.
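Conceptually, the “touch up” amounts to applying a captured change set to the copy. The sketch below assumes a hypothetical dbo.CustomerChanges table holding one net insert, update, or delete per key, recorded since the extract began; a change data capture replication tool manages this capture-and-apply cycle for you.

    -- Apply captured changes (ChangeOp: 'I' insert, 'U' update, 'D' delete)
    -- to the destination copy, assuming one net change row per key.
    MERGE dbo.CustomerStaging AS tgt
    USING dbo.CustomerChanges AS chg
          ON tgt.CUSTNO = chg.CUSTNO
    WHEN MATCHED AND chg.ChangeOp = 'D'
        THEN DELETE
    WHEN MATCHED
        THEN UPDATE SET tgt.NAME    = chg.NAME,
                        tgt.BALANCE = chg.BALANCE
    WHEN NOT MATCHED BY TARGET AND chg.ChangeOp <> 'D'
        THEN INSERT (CUSTNO, NAME, BALANCE)
             VALUES (chg.CUSTNO, chg.NAME, chg.BALANCE);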
Taken together, checkpointed extracts and change-data-based touch-ups are the foundational technology for creating an accurate point-in-time replica on demand. But rather than take these separate tasks on yourself, consider that a real-time replication solution could be your answer. Even if you don’t think you need real-time replication to accomplish this task, a purpose-built real-time solution includes this valuable change data capture capability, one that would otherwise be missing from an ad hoc design. Plus, a real-time solution gives you the added benefit of perfect replicas every time, any time.
If you are willing to combine various techniques and tools to create accurate replicas, the job can certainly be done. But if you are curious about a single solution that reduces the gotchas and takes the guesswork out of creating perfect replicas, ask us for an SQDR demo. Or learn more here.