March 13, 2026 · 5 min read · mlai.qa Team

Data Pipeline Architecture for Real-Time ML

Architecture patterns for building real-time ML data pipelines — streaming vs batch, feature store design, and the tools that work at production scale.

Real-time ML is harder than batch ML for one fundamental reason: the data pipeline is in the critical path. In batch ML, a pipeline failure delays model updates until the next run. In real-time ML, a pipeline failure means your model is making predictions with stale or missing features — or not making predictions at all.

The data pipeline architecture for real-time ML requires explicit design decisions that batch pipelines don’t face: latency budgets, streaming vs. batch tradeoffs, online feature store design, and the training-serving consistency problem. Getting these decisions right at the start is significantly cheaper than retrofitting them into a production system.

Streaming vs. Batch: The Real Decision

The streaming vs. batch decision is frequently treated as a philosophical choice (“we want to be a real-time company”) rather than an engineering decision grounded in requirements. The right question is: what is the maximum acceptable staleness of your features at serving time?

Batch pipelines compute features on a schedule — hourly, daily, or weekly. They’re simpler to build, cheaper to operate, and easier to debug. If your features can tolerate an hour of staleness without degrading model quality, a batch pipeline is almost certainly the right choice.

Streaming pipelines compute features continuously from an event stream. They’re more complex, more expensive, and harder to debug. The architecture requires: an event streaming platform (Kafka, Kinesis, Pub/Sub), a stream processing engine (Flink, Spark Streaming, or a simpler option like Kafka Streams), and an online feature store that serves the computed features at low latency.

The use cases that genuinely require streaming are narrower than most teams assume: fraud detection (where an hour-old feature means missing a transaction), recommendation systems where recency is a primary signal, and systems where the event being processed contains the most important information (the transaction itself, the search query itself).

For everything else, batch pipelines are the right default. The operational simplicity difference is significant.

Feature Store Design for Real-Time Serving

The feature store is the component that bridges data pipeline computation and model serving. It has two components with fundamentally different requirements:

Offline feature store: Stores historical feature values for model training. Optimised for large-scale reads, time-travel queries (retrieving the feature value that was current at a specific point in time), and batch export. Typically built on columnar storage (Delta Lake, Iceberg, Hudi, BigQuery, Redshift).

Online feature store: Stores current feature values for low-latency serving. Optimised for single-entity lookups with sub-10ms response time. Typically built on key-value stores (Redis, DynamoDB, Bigtable, Cassandra).
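The online-store access pattern can be sketched with an in-memory stand-in. This is a minimal illustration, assuming a plain dict in place of Redis; the class and method names are hypothetical, not a real feature store SDK.

```python
import time

class OnlineFeatureStore:
    """Toy in-memory stand-in for an online feature store.

    Keys are entity IDs; values are the latest feature vector. A real
    deployment would back this with Redis, DynamoDB, or Bigtable and
    add TTLs, but the access pattern is the same: overwrite on write,
    single-entity lookup on read.
    """

    def __init__(self):
        self._store = {}

    def put(self, entity_id: str, features: dict) -> None:
        # The pipeline overwrites current values; the online store keeps
        # no history -- point-in-time history is the offline store's job.
        self._store[entity_id] = {**features, "_updated_at": time.time()}

    def get(self, entity_id: str, feature_names: list[str]) -> dict:
        # Single-entity lookup: this is the sub-10ms serving path.
        row = self._store.get(entity_id, {})
        return {name: row.get(name) for name in feature_names}

store = OnlineFeatureStore()
store.put("user:42", {"txn_count_1h": 7, "avg_amount_7d": 31.5})
features = store.get("user:42", ["txn_count_1h", "avg_amount_7d"])
```

The asymmetry with the offline store is deliberate: the online store answers "what are this entity's features right now?", nothing else.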

The key design principle for both: point-in-time correctness. Training data should use the feature values that were available at the time of the training example — not features computed with information that wasn’t available yet. Violating this creates look-ahead bias: models that appear to perform well in training but underperform in production because they were trained with information that wouldn’t exist at serving time.
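The point-in-time lookup itself reduces to a simple rule that can be sketched in a few lines: for a training example at time `as_of`, take the latest feature value with a timestamp at or before `as_of`, never a later one. The function below is an illustrative sketch, not a feature store API.

```python
from bisect import bisect_right

def point_in_time_value(history, as_of):
    """Return the feature value that was current at `as_of`.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    We take the latest entry with timestamp <= as_of -- never a later
    one, which would leak future information (look-ahead bias) into
    the training set.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    if idx == 0:
        return None  # the feature did not exist yet at as_of
    return history[idx - 1][1]

# Feature history for one entity: avg_amount recomputed at t=10, 20, 30.
history = [(10, 5.0), (20, 7.5), (30, 9.0)]

point_in_time_value(history, 25)  # -> 7.5 (the value current at t=25)
point_in_time_value(history, 5)   # -> None (no look-ahead to t=10)
```

Offline stores implement this as a point-in-time join across millions of rows, but the correctness rule is exactly this one.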

Most off-the-shelf feature stores handle point-in-time correctness for the offline store. The online store is simpler — it always serves current values — but must be populated from the same pipeline that generates training data to ensure training-serving consistency.

Choosing the Streaming Stack

Apache Kafka is the event streaming backbone. It’s the right choice for durable, high-throughput event streaming — decoupling event producers from event consumers, providing replay capability for reprocessing historical events, and serving as the integration point between your operational systems and your ML pipeline. For most real-time ML use cases, Kafka is the event transport layer.
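The two Kafka properties the pipeline relies on — producer/consumer decoupling and replay via offsets — can be illustrated with a toy append-only log. This is a conceptual sketch only; Kafka is a distributed, partitioned, durable version of this idea, and the class below is not its API.

```python
class EventLog:
    """Toy append-only log with per-consumer-group offsets.

    Illustrates two properties the text relies on: producers and
    consumers are decoupled (neither knows about the other), and a
    consumer can replay history by resetting its offset -- e.g. to
    backfill a new feature after a pipeline change.
    """

    def __init__(self):
        self._events = []   # the log: append-only, never mutated
        self._offsets = {}  # consumer group -> next offset to read

    def produce(self, event: dict) -> None:
        self._events.append(event)

    def consume(self, group: str, max_events: int = 100) -> list[dict]:
        start = self._offsets.get(group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch

    def seek_to_beginning(self, group: str) -> None:
        self._offsets[group] = 0

log = EventLog()
log.produce({"user": "42", "amount": 12.0})
log.produce({"user": "42", "amount": 99.0})

first = log.consume("feature-pipeline")     # both events
again = log.consume("feature-pipeline")     # empty: offset advanced
log.seek_to_beginning("feature-pipeline")
replayed = log.consume("feature-pipeline")  # both events, reprocessed
```

Replay is what makes Kafka more than a message queue for ML pipelines: the same historical events that built your training set can be reprocessed when feature logic changes.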

Apache Flink is the right stream processing engine for complex, stateful streaming computations — windowed aggregations across event streams, joining multiple event streams, and streaming ML inference at scale. Flink’s state management and exactly-once processing semantics make it the right choice when streaming computation correctness matters. The operational complexity is high; managed Flink (Confluent, AWS Kinesis Data Analytics for Apache Flink) reduces this significantly.
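The core of a windowed aggregation — the kind of stateful computation Flink runs continuously — can be shown with a tumbling-window count in plain Python. This is a batch sketch of the streaming concept, with hypothetical names; Flink adds what the toy omits: checkpointed state, watermarks for late events, and exactly-once output.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size):
    """Count events per key per tumbling window.

    `events` are (timestamp, key) pairs; window n covers the interval
    [n * window_size, (n + 1) * window_size). In a real stream engine
    this state lives in a checkpointed store and windows are emitted
    as watermarks pass, rather than computed over a finished list.
    """
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(1, "user:42"), (4, "user:42"), (7, "user:7"), (12, "user:42")]
windowed = tumbling_window_counts(events, window_size=10)
# {(0, 'user:42'): 2, (0, 'user:7'): 1, (10, 'user:42'): 1}
```

A feature like “transactions per user in the last 10 seconds” is exactly this aggregation, streamed into the online store as each window closes.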

Apache Spark Streaming (or Spark Structured Streaming) is appropriate when your team has existing Spark expertise, your streaming requirements don’t need Flink’s sub-second latency, and you want unified batch and streaming processing in the same framework. Structured Streaming’s micro-batch execution model gives it higher latency than Flink’s per-record processing, but for use cases that tolerate second-level latency, the team-productivity advantage of a unified Spark stack is significant.

For most real-time ML pipelines at Series A/B scale, Kafka + a simpler stream processor (Kafka Streams, or Spark Streaming) + Redis for online feature serving is the right architecture. Flink is justified when you have complex stateful streaming requirements or need genuine sub-second latency on high-volume streams.

The Training-Serving Skew Problem

Training-serving skew is the condition where features computed at training time are different from features computed at serving time. It’s the most common cause of models that perform well in evaluation and underperform in production — and in real-time ML systems, it’s particularly common because the training pipeline and serving pipeline are often implemented separately.

The failure mode: a data scientist writes feature computation logic in a Jupyter notebook for training. A software engineer reimplements that logic in the serving path for production. The reimplementation has subtle differences — different handling of nulls, different rounding behaviour, different timezone assumptions — that create systematic divergence between training and production feature distributions.

The solution is single-source-of-truth feature computation: a feature computation library that is used identically in training and serving, with tests that validate feature parity between the two environments. Feature stores that provide a unified SDK for offline (training) and online (serving) feature retrieval solve this architecturally — the same code path generates features in both contexts.
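A minimal sketch of the single-source-of-truth pattern: one feature function, imported by both the training pipeline and the serving path, with a parity check. The function and feature names are illustrative assumptions, not a specific feature store SDK.

```python
def compute_txn_features(transactions: list[dict]) -> dict:
    """Single source of truth for feature logic.

    Imported by BOTH the training pipeline and the serving path, so
    edge-case handling -- empty history, nulls, rounding -- is defined
    exactly once. Reimplementing this in a second codebase is where
    training-serving skew creeps in.
    """
    amounts = [t["amount"] for t in transactions if t.get("amount") is not None]
    if not amounts:
        return {"txn_count": 0, "avg_amount": 0.0}
    return {
        "txn_count": len(amounts),
        "avg_amount": round(sum(amounts) / len(amounts), 2),
    }

history = [{"amount": 10.0}, {"amount": 20.0}, {"amount": None}]

# Training: computed over historical data in a batch job.
training_features = compute_txn_features(history)
# Serving: computed by the same function on the live request path.
serving_features = compute_txn_features(history)

# Parity test: identical inputs must yield identical features.
assert training_features == serving_features
```

The parity assertion belongs in CI: feed the same fixture through both code paths and fail the build on any divergence, catching null-handling and rounding drift before it reaches production.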

Building a real-time ML data pipeline correctly from the start saves months of debugging, monitoring, and architectural refactoring as your system scales.

Talk to us about your data pipeline architecture before production scale reveals the design gaps.

Build ML that scales.

Book a free 30-minute ML architecture scope call with our experts. We review your stack and tell you exactly what to fix before it breaks at scale.

Talk to an Expert