The past decade has made two things clear: Data is the backbone of modern business, and data is only getting bigger.
The numbers are staggering. In 2010, IDC estimated that 1.2 zettabytes (1.2 trillion gigabytes) of data were created worldwide, a notable increase over the year before. IDC now estimates that about 175 zettabytes (175 trillion gigabytes) of new data will be created around the world in 2025.
This is game-changing.
It’s an enormous wealth of data, and it increasingly powers everything – from long-term decision-making within national governments and Fortune 100 companies to the details of the day-to-day workflows at mid-sized firms. But too often data can be overwhelming or simply poorly managed. It can be unusable or disconnected, even in single organizations, where it can flow in from a range of disparate sources and lack consistency.
In order for data to be useful, it needs to be properly compiled and configured. That’s where data ingestion comes in.
At StarQuest, we help organizations with data ingestion and replication via our SQDR software and StarSQL (used for DB2 access). In this article, we'll unpack the meaning of data ingestion so you can better understand how the process might help in your business context.
Let’s start with an answer to our core question: What is data ingestion? Here’s our quick take:
Data ingestion is the process of absorbing and configuring multiple data sources into a single place of access for business use.
But that’s only the beginning. To give you a full sense of data ingestion, let’s break it down further.
We’ll look at what it means, why it matters, and how to do it correctly.
The term “data ingestion” is more obvious when it’s broken into its component parts. Data is “information in digital form that can be transmitted or processed.” Ingestion is “the process of absorbing information.” So, data ingestion is the process of absorbing digital information that can be transmitted or processed.
That’s the essence of the term, but, in a tactical business sense, it involves a bit more: data ingestion isn’t just taking data in, it’s taking data in and preparing it for storage or use. In our opinion (as stated above), that fuller picture can be synthesized into the following definition:
Data ingestion is the process of absorbing and configuring multiple data sources into a single place of access for business use.
With data ingestion defined, let’s take a look at why it matters. At a foundational level, this is pretty simple: data ingestion matters because it makes data actionable.
Ingestion is used to synthesize multiple data sources into a single place of access; it’s meant to eradicate data silos. And as the business definitions of the term make clear, data ingestion in practice is more than absorption – it’s absorption plus configuration for storage or immediate use.
For instance, a business might need to report on customer data. But different customer data might be collected or stored in different systems. Maybe purchase data is stored in one database, while service request data is stored in another. In order to get a full picture of customer activity, a centralized database is needed, with data from all sources factored in. To set this up would require data ingestion.
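To make that concrete, here is a minimal sketch of what such a consolidation might look like in Python. SQLite stands in for whatever systems actually hold the data, and the database files, tables, and columns (purchases, service_requests, customer_activity) are hypothetical and assumed to already exist; a real ingestion pipeline would connect to the actual source and target DBMSs.

import sqlite3

# Hypothetical source systems: one database holding purchase records,
# another holding service-request records.
purchases_db = sqlite3.connect("purchases.db")
service_db = sqlite3.connect("service_requests.db")

# Central reporting database that the ingested data lands in.
warehouse = sqlite3.connect("warehouse.db")
warehouse.execute("""
    CREATE TABLE IF NOT EXISTS customer_activity (
        customer_id INTEGER,
        activity_type TEXT,
        activity_date TEXT,
        detail TEXT
    )
""")

# Pull from each source and reshape the rows into one common schema.
rows = []
for cust_id, order_date, amount in purchases_db.execute(
        "SELECT customer_id, order_date, amount FROM purchases"):
    rows.append((cust_id, "purchase", order_date, str(amount)))

for cust_id, opened_date, issue in service_db.execute(
        "SELECT customer_id, opened_date, issue FROM service_requests"):
    rows.append((cust_id, "service_request", opened_date, issue))

# Load the combined view of customer activity into a single place of access.
warehouse.executemany(
    "INSERT INTO customer_activity VALUES (?, ?, ?, ?)", rows)
warehouse.commit()

The point of the sketch is the shape of the work: read from each disparate source, map everything onto a shared schema, and write it where the business can actually query it.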
That’s just one use case. Data ingestion is helpful in any scenario where multiple streams of data need to be synthesized into a single source of truth.
With the value of data ingestion defined, let’s review a few mistakes to avoid so that you can get the most from the process.
Don’t ingest what isn’t necessary.
One costly mistake is to ingest unneeded data (often by simply including all of the objects in a database). The reality is that not every table, index, or constraint defined in an operational data store (ODS) is necessary in a data warehouse, and you’ll avoid unnecessary expenditures by focusing on what matters – not on what doesn’t.
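As a rough illustration of ingesting selectively (Python again, with SQLite standing in for the ODS and warehouse, and purely hypothetical table names), the copy can be driven from an explicit allow-list rather than from everything the source catalog contains:

import sqlite3

source = sqlite3.connect("operational_store.db")   # hypothetical ODS
warehouse = sqlite3.connect("warehouse.db")

# Explicit allow-list: only the tables and columns the warehouse
# actually needs for reporting, not every object defined in the ODS.
TABLES_TO_INGEST = {
    "customers": ["customer_id", "name", "region"],
    "orders": ["order_id", "customer_id", "order_date", "amount"],
    # Staging tables, audit logs, and work tables are deliberately left out.
}

for table, columns in TABLES_TO_INGEST.items():
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    warehouse.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
    rows = source.execute(f"SELECT {col_list} FROM {table}").fetchall()
    warehouse.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})", rows)

warehouse.commit()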
Don’t forget to account for latency.
There are physical constraints involved in moving data. Each individual message exchange exposes the application to inherent latency from signal propagation – often as much as 30 ms per exchange. At 30 ms per round trip, a strictly serial, row-at-a-time process tops out at roughly 33 transactions per second, or about 120,000 transactions per hour. Common ways around this ceiling are block (batched) data transfers and parallel processing.
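Here is a minimal sketch of the block-transfer idea in Python, with SQLite standing in for the actual source and target and a hypothetical events table: many rows share one exchange instead of each row paying the round-trip cost on its own.

import sqlite3

source = sqlite3.connect("source.db")   # hypothetical source
target = sqlite3.connect("target.db")   # hypothetical target
target.execute(
    "CREATE TABLE IF NOT EXISTS events (event_id INTEGER, payload TEXT)")

BATCH_SIZE = 1000  # rows per block transfer

cursor = source.execute("SELECT event_id, payload FROM events")
while True:
    batch = cursor.fetchmany(BATCH_SIZE)
    if not batch:
        break
    # One executemany call per block: the per-exchange latency is paid
    # once per 1,000 rows instead of once per row.
    target.executemany("INSERT INTO events VALUES (?, ?)", batch)
    target.commit()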
Don’t forget to account for network disruptions.
It’s dangerous to assume that your solution will maintain a continuous connection; network disruptions tend to be the rule, not the exception, and if a disconnection happens mid-transfer, the ingestion can be compromised. It’s critical to include checkpoint-and-restart capability in your data ingestion solution so that, if the network is disrupted, the process can resume where it left off rather than starting over.
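A minimal sketch of the checkpoint-and-restart idea in Python (again with SQLite as a stand-in, and assuming a hypothetical checkpoint table and an ordered source key): record the last row successfully applied, and on restart resume from that point instead of from the beginning.

import sqlite3

source = sqlite3.connect("source.db")   # hypothetical source
target = sqlite3.connect("target.db")   # hypothetical target

target.execute(
    "CREATE TABLE IF NOT EXISTS events (event_id INTEGER, payload TEXT)")
target.execute(
    "CREATE TABLE IF NOT EXISTS checkpoint (last_id INTEGER)")

# Resume point: the highest event_id already applied, or 0 on a fresh run.
row = target.execute("SELECT MAX(last_id) FROM checkpoint").fetchone()
last_id = row[0] or 0

cursor = source.execute(
    "SELECT event_id, payload FROM events WHERE event_id > ? "
    "ORDER BY event_id", (last_id,))

while True:
    batch = cursor.fetchmany(500)
    if not batch:
        break
    target.executemany("INSERT INTO events VALUES (?, ?)", batch)
    # Advance the checkpoint in the same transaction as the data,
    # so a disruption never leaves the two out of step.
    target.execute("DELETE FROM checkpoint")
    target.execute("INSERT INTO checkpoint VALUES (?)", (batch[-1][0],))
    target.commit()

Because the checkpoint is committed together with the rows it describes, a dropped connection at any point leaves the target in a state the process can safely restart from.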
Hopefully, the information above has helped you to clarify the definition of data ingestion, understand why it matters, and avoid common pitfalls. If you’re looking for data ingestion services to put data to use in your business context, let’s talk.
At StarQuest, we’re experts at data ingestion. Our powerful SQDR software can be used for replication and ingestion from an extensive range of data sources. And, importantly, our customer service team is regarded as among the best in the business, with one client calling ours “The best vendor support I have ever encountered.”
If you’re looking for data ingestion for migration, data warehousing, application development, auditing, disaster recovery, or another use case – we can help.
Get in touch with us to discuss your data ingestion needs. We can set you up with a no-charge trial of our software using the DBMS of your choice, and help you take the first step toward a solution that will benefit your business.