Data Intelligence-Ops: Re-thinking Data Engineering
- Arindom Banerjee
- Jul 3, 2024
- 2 min read

The modern enterprise is driven by intelligent processes & apps, built on intelligent data products.
Question: Can today’s data curation & engineering adequately facilitate such intelligent products?
Step 1: What are intelligent enterprise applications?
Example-1: An actual case study - collect multi-modal data from 500 suppliers, evaluate supplier risks, estimate/categorize them, and finally use optimization constraints to reduce those risks.
Step 2: Complex/Compound-AI: As enterprise apps/processes become more intelligent, they also become more complex and often use compound, multi-step inference chains (see The Shift from Models to Compound AI Systems – The Berkeley Artificial Intelligence Research Blog).
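A compound, multi-step inference chain can be pictured as a sequence of independent steps where each step's output feeds the next. Here is a minimal sketch; the step names and functions are illustrative stand-ins, not a real framework API:

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CompoundPipeline:
    """Chains independent inference steps; each step's output feeds the next."""
    steps: list = field(default_factory=list)

    def add(self, name: str, fn: Callable[[Any], Any]) -> "CompoundPipeline":
        self.steps.append((name, fn))
        return self

    def run(self, payload: Any) -> Any:
        # In a real compound-AI system each step might be a retriever,
        # a ranker, an LLM call, or a verifier rather than a lambda.
        for name, fn in self.steps:
            payload = fn(payload)
        return payload


# Toy 3-step chain over supplier name records
pipeline = (CompoundPipeline()
            .add("normalize", lambda recs: [r.lower() for r in recs])
            .add("dedupe", lambda recs: sorted(set(recs)))
            .add("score", lambda recs: {r: len(r) for r in recs}))

result = pipeline.run(["Acme", "acme", "Bolt Co"])
# result -> {"acme": 4, "bolt co": 7}
```

The point of the sketch is the shape, not the steps: complexity comes from the composition, because each link in the chain can fail or drift independently.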
Step 3: Intelligent Data Products: This app-complexity is often realized through the use of different kinds of data products working individually or collaboratively. Such data products can range from simple trusted reporting & tabular/vision-ML to complex LLM apps & RL.
But why is data engineering for such apps difficult to do? Let’s use 2 examples to illustrate:
Example-2: Amazon is using an RL-based data product to optimize inventory, thus reducing inventory holding by 12% without revenue loss. But building RL data products is not for the faint of heart.
Example 1 (cont’d): Supplier risk optimization needs multiple steps, including KPI expression calculations, synthetic data generation, multi-modal data manipulations, sophisticated agents to generate optimization paths, etc. This multi-step composition is remarkably hard to build, expensive to maintain, often inaccurate, and of course fragile.
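To make one of these steps concrete, here is a hedged sketch of KPI expression calculation over a supplier record. The field names and expressions are invented for illustration; a production version would use a proper expression parser rather than restricted `eval`:

```python
def eval_kpi(record: dict, expression: str) -> float:
    """Evaluate a simple KPI expression against one supplier record.

    Only the record's fields are exposed to the expression; builtins are
    disabled. (A real system should use a safe expression parser instead.)
    """
    return float(eval(expression, {"__builtins__": {}}, dict(record)))


# Hypothetical supplier record
supplier = {"on_time": 38, "total": 40, "defects": 2, "shipped": 500}

otd = eval_kpi(supplier, "on_time / total")          # on-time delivery rate
dpm = eval_kpi(supplier, "defects / shipped * 1e6")  # defects per million
```

Even this simplest step carries maintenance burden: every KPI expression is a small contract between the data product and the upstream schema, and it silently breaks when a field is renamed.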
Step 4: Problem: Today’s AI applications expect data engineering to enable knowledge & reasoning, as opposed to simple reporting or calculating KPIs. This gap is essentially left unaddressed by today’s data engineering.
Existing data engineering, for the most part, consolidates, cleans & manages data quite well, but does not change the inherent semantic intelligence of the data. Today, adding multiple steps of data enhancement - from statistical transformations and feature matching to labeling/augmentation, building RAG retrievers, and validating embedding models - can be done, but usually leads to fragmented artifacts.
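A typical instance of the fragmentation problem: a statistical transformation like the z-score standardization below usually lives in its own standalone script, with no shared lineage, contract, or observability linking it to the labeling, retrieval, or validation steps around it.

```python
import statistics


def zscore(values: list[float]) -> list[float]:
    """Standardize values to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values identical: return zeros rather than divide by zero
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


scores = zscore([10.0, 20.0, 30.0])
# scores -> roughly [-1.2247, 0.0, 1.2247]
```

Multiply this by dozens of enhancement steps, each in its own repo or notebook, and the "fragmented artifacts" problem becomes clear.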
Insight: Dumb, semantically inefficient data don't allow firms to leverage the benefits of AI/gen-AI.
Step 5: What is DataIntelligenceOps?
DataIntelligenceOps is an abstract set of operations meant to increase (a) semantic intelligence (b) operational intelligence & (c) governance abilities of data. It builds on top of existing investments in data-lakes, cloud-EDW, dbt automation, ELT, feature-stores etc.
DataIntelligenceOps defines 3 architectural artifacts:
· Intelligence Enhancements: a broad set of components for complex data products, which can be aggregated & configured through a low-code IDE.
· Connected DataOps: a fully connected DataOps architecture that “causally” ties together observability, automation, lineage, storage/gov/sec-Ops, programmable pipelines, data contracts – to create an embedding layer for the above intelligence enhancements.
· Data Product Co-Pilot: agent flows that orchestrate interactions across multiple data products, chaining them together.
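One small piece of the Connected DataOps artifact above is the data contract. A minimal sketch of a contract check follows; the schema and field names are invented for illustration:

```python
# Hypothetical contract: field name -> required Python type
CONTRACT = {"supplier_id": str, "risk_score": float}


def validate(record: dict, contract: dict = CONTRACT) -> list[str]:
    """Return a list of contract violations for one record (empty = valid)."""
    errors = []
    for fname, ftype in contract.items():
        if fname not in record:
            errors.append(f"missing field: {fname}")
        elif not isinstance(record[fname], ftype):
            errors.append(f"{fname}: expected {ftype.__name__}")
    return errors


ok = validate({"supplier_id": "S-001", "risk_score": 0.42})   # -> []
bad = validate({"supplier_id": 7})                            # -> 2 violations
```

In the "causally connected" architecture described above, a failed contract check would not just raise an alert - it would tie back to lineage and observability so the offending upstream change can be located automatically.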
Why should customers care? DataIntelligenceOps enables a variety of complex data products to facilitate the intelligent enterprise. It bridges the gap between support for reporting or tabular-ML and the needs of the reasoning and knowledge enhancements that today’s AI promises. Structurally, it enables firms to escape the data technology shackles that hold their business strategies hostage.
Others in this space: Multiple vendors now provide some kind of Data Intelligence support as piece parts (Snorkel AI) or as data platform extensions (Databricks). But a programmable, abstracted layer that integrates seamlessly with most existing platform choices is hard to find.