Stelo Blog

High-Performance Support for Non-SQL Destinations | Stelo

Written by Jessica Sheridan | Apr 11, 2023 12:53:00 PM

Stelo recently released Stelo Data Replicator V6.1. Like V5, V6.1 offers robust, real-time data replication but with added features to support evolving data infrastructure. Over our 30-year history, we’ve developed best practices for moving data that still guide us today.

Over a three-part blog series, we’ll break down important new features in V6.1. Here, we’ll cover how we’re providing high-performance support for non-SQL destinations.

In short, we can support non-SQL destinations like Azure Data Lake Storage Gen2 (ADLS Gen2), AWS, and Confluent Connectors, and achieve an order-of-magnitude improvement in message delivery time, by:

  • Switching from a language-oriented interface (e.g., ODBC) to a message-oriented mechanism (e.g., Kafka) or native interface (e.g., Databricks or DataFrames), as sketched after this list
  • Implementing extensible Stelo Data Sink Connectors
  • Changing the underlying transport methodology
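To make the first of those changes concrete, here is a minimal sketch, in Scala, of publishing a self-defining change record to a Kafka topic instead of writing it through ODBC. The broker address, topic name, and record shape are illustrative assumptions, not Stelo's actual wire format:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ChangePublisher {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // placeholder broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // A self-defining JSON change record; this shape is hypothetical.
    val change = """{"table":"orders","op":"UPDATE","key":42,"after":{"status":"shipped"}}"""
    // The consumer only needs the topic, not any knowledge of the source database.
    producer.send(new ProducerRecord[String, String]("change-data", "orders:42", change))
    producer.flush()
    producer.close()
  }
}
```

Because producer and consumer share only the topic and the record format, either side can change independently, which is the loose coupling discussed below.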

Each of these changes represents a response to shifts in the way data is used and managed today. So, how did we get here?

The way we think about language- and message-oriented interfaces has evolved.

In V5, the language-oriented interface was strictly based on open database connectivity (ODBC). ODBC is a high-level application programming interface (API) through which applications issue SQL queries for data retrieval. Developed in the early 1990s, ODBC is a well-established programming interface that accommodates different drivers for different databases while preserving a common programming interface. Previously, it was customary for each database vendor to have its own interface.

The emergence of web browsers and servers, while not databases, represented a new kind of client and server interface. Growing familiarity with HTML and TCP/IP gave web browser and web server protocols a reputation for simplicity. JavaScript Object Notation (JSON) continued the trend with its human-readable format for web servers that communicate with databases. While expressive, this format was inefficient: what should take one hundred bytes to exchange from a database might take one thousand bytes with JSON. The benefit of moving to these new protocols was the relaxation of client and server interface requirements; the new paradigm was loosely coupled producers and consumers.
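To put rough numbers on that overhead, consider a row that fits in about 20 bytes of raw binary. Encoded as self-defining JSON, every message repeats every field name. A small sketch, with hypothetical field names and values:

```scala
object EncodingFootprint {
  def main(args: Array[String]): Unit = {
    // Raw binary: a 4-byte int, an 8-byte double, and an 8-byte timestamp ≈ 20 bytes.
    // The same row as self-defining JSON carries every field name in every message:
    val json = """{"order_id":1001,"amount":59.95,"updated_at":"2023-04-11T12:53:00Z"}"""
    println(s"JSON encoding: ${json.getBytes("UTF-8").length} bytes") // 68 bytes, over 3x larger
  }
}
```

The ratio only grows with wider rows and longer column names, which is how a hundred-byte exchange balloons toward a thousand.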

At the time, ODBC was a great solution and an ideal mechanism for change data capture applications to place data in a database, where another application could retrieve and consume it. This allowed a variety of applications to be built without specialized knowledge of the source of change data. Now, we are using the new protocols to eliminate the need for an intermediary database to host the data. The goal is to achieve greater independence between the two processes.

Moving forward, we started to see more and more programmers designing interfaces for modern databases with formats optimized for web servers, like JSON. While this created unnecessary work at times, something good came out of these efforts: a low coupling coefficient between data sources and destinations. In other words, companies wanted to make sure their information could be moved from place to place without concern for source or destination. Loosely coupled, self-defining data became the new ideal.

Today, we recognize that language-oriented interfaces are challenging to program against, and the information they deliver can be hard to consume. As of V6.1, Stelo Data Replicator no longer relies on an ODBC interface; however, we recognize that, when its capabilities are contextualized, ODBC can still offer efficiencies in modern data management.

There’s a growing need for data sink connectors.

Over time, it’s become more popular to move data into delta lakes, where messages are self-defining and can be encoded with JSON. To accommodate this shift, Stelo stopped using ODBC in V6.1. Instead, we’re using Stelo Data Sink Connectors. These custom connectors extend Stelo Data Replicator by allowing it to send data through a simple application, much like a web server, that reads the data and moves it to a repository. By design, data undergoes very little transformation between the source and the delta lake.

In essence, Stelo Data Sink Connectors are lightweight software components that understand how to obtain change data and original data, move it across a communication link, and transform it for the destination’s native interface. Each custom connector features a code fragment, written in Scala, that requires no modification on the customer side, so it’s still destination-agnostic. Expanding on the sink metaphor, Stelo Data Sink Connectors act as a hose between Stelo Data Replicator and the data sink, which drains into a delta lake.
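Stelo's actual connector code isn't shown here, but a minimal sketch in Scala suggests what such a lightweight component's contract could look like; the trait and method names are hypothetical, not Stelo's API:

```scala
/** A self-defining change record: the table, the operation, and column name/value pairs. */
final case class ChangeRecord(table: String, op: String, columns: Map[String, Any])

// A minimal sketch of a data sink connector contract; all names are illustrative.
trait DataSinkConnector {
  /** Called once with destination settings (paths, credentials, and so on). */
  def open(config: Map[String, String]): Unit

  /** Receives a batch of self-defining change records and writes them to the
    * destination in its native format, with minimal transformation. */
  def write(batch: Seq[ChangeRecord]): Unit

  /** Flushes buffered data and releases destination resources. */
  def close(): Unit
}
```

Because the replicator only ever talks to a narrow contract like this, adding a new destination means implementing one small component rather than changing the replication engine, which is what keeps the design destination-agnostic.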

There’s a better understanding of how certain underlying transport methodologies can cause inefficiencies.

Through trial and error, it became clear that while JSON was ideal for how the data is encoded, it was an inefficient format for transport. Through our custom connectors, we can leverage Java Database Connectivity (JDBC) and deliver highly efficient data movement by contextualizing its capabilities. That efficiency translates to more communication bandwidth for data messaging.
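One common way a JDBC-based path earns that efficiency is batching: queuing many rows client-side and shipping them in a single round trip rather than one statement at a time. A generic sketch follows; the connection URL, credentials, and table are placeholders, and this is not Stelo's implementation:

```scala
import java.sql.DriverManager

object JdbcBatchWriter {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details for illustration only.
    val conn = DriverManager.getConnection("jdbc:postgresql://localhost/dest", "user", "pass")
    conn.setAutoCommit(false)
    val stmt = conn.prepareStatement("INSERT INTO orders (id, status) VALUES (?, ?)")
    try {
      val rows = Seq((1, "new"), (2, "shipped"), (3, "closed"))
      for ((id, status) <- rows) {
        stmt.setInt(1, id)
        stmt.setString(2, status)
        stmt.addBatch() // queue the row client-side instead of sending it immediately
      }
      stmt.executeBatch() // one round trip for the whole batch
      conn.commit()
    } finally {
      stmt.close()
      conn.close()
    }
  }
}
```

Fewer round trips per row means the link spends its time moving data rather than waiting on acknowledgments.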

With decades of expertise in data movement, Stelo uses best practices to help customers move change data to and from a wide variety of sources and destinations. In V6.1, we re-engineered how data is communicated and presented to achieve a level of performance that’s not available off the shelf.

Contact us to request a V6.1 demo. For more information on V6.1 features, check out Part 2 of our Unboxing Stelo V6.1 series.