Real-time vs batch: choosing latency with intention

The first time I learned the real cost of chasing low latency, it wasn't on a recommendation engine. It was on an inventory system for a flash sale. We built a real-time pipeline to decrement stock counts the instant an item was added to a cart. The demo was flawless. Production was a nightmare of race conditions and distributed locks. We spent a month chasing bugs, when the simpler hourly batch system we had replaced would have been 99% as effective with 1% of the operational pain.

That lesson has become more critical than ever. The rise of LLM agents creates a powerful new demand for low-latency data to ground their responses. But this pressure can tempt us to infect our entire architecture with the complexity of real-time processing. The old choice between now and later is no longer just a data problem—it's the central tension in building durable AI systems.

[Figure: The Latency Choice Framework]

The Enduring Power of Batch

Batch processing is the architecture of stillness. It is simple, robust, and profoundly cost-effective. A script that wakes up at 2am, processes a day's worth of files, and writes the result to a database is one of the most resilient patterns we have. Its components aren't running most of the time, so there is less that can break. Failure is usually isolated, and as long as the job is idempotent, rerunning it is a straightforward fix.
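To make the rerun property concrete, here is a minimal sketch of such a nightly job, not a production implementation. The directory layout, the CSV columns (sku, quantity), the daily_sales table, and the function name run_daily_batch are all assumptions for illustration. The one idea it demonstrates is that replacing a whole day's output inside a single transaction makes a rerun safe.

```python
import csv
import sqlite3
from collections import Counter
from pathlib import Path


def run_daily_batch(input_dir: Path, db_path: Path, day: str) -> int:
    """Aggregate one day's CSV files into a summary table; safe to rerun."""
    totals = Counter()
    # Hypothetical layout: <input_dir>/<day>/*.csv with columns sku,quantity.
    for csv_file in sorted((input_dir / day).glob("*.csv")):
        with csv_file.open(newline="") as handle:
            for row in csv.DictReader(handle):
                totals[row["sku"]] += int(row["quantity"])

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS daily_sales ("
                "day TEXT, sku TEXT, quantity INTEGER, PRIMARY KEY (day, sku))"
            )
            # Delete-then-insert for the day: a rerun overwrites instead of double counting.
            conn.execute("DELETE FROM daily_sales WHERE day = ?", (day,))
            conn.executemany(
                "INSERT INTO daily_sales (day, sku, quantity) VALUES (?, ?, ?)",
                [(day, sku, qty) for sku, qty in totals.items()],
            )
    finally:
        conn.close()
    return len(totals)  # number of distinct SKUs written
```

If the job dies halfway through, the transaction rolls back and the table still holds the previous state for that day. Running it again with the same arguments produces the same rows, which is what makes "just rerun it" a real recovery plan.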

This simplicity has a direct impact on the bottom line. In my experience, the monthly cloud bill for a continuously running stream processor can easily be ten times that of a simple batch job that runs for thirty minutes a day on the same data volume. With batch, you pay for the compute you actually use, not for idle infrastructure waiting for the next event.
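The arithmetic behind that kind of gap is simple enough to sketch. The hourly rates below are hypothetical, chosen only to make the calculation visible, and the model counts compute alone (no storage, network, or on-call time). The batch worker is assumed to be a larger, pricier machine, since it has to chew through a day of data in thirty minutes.

```python
def monthly_compute_cost(hourly_rate: float, hours_per_day: float, days: int = 30) -> float:
    """Compute-only cost: rate x hours per day x days."""
    return hourly_rate * hours_per_day * days


# Hypothetical prices, not quotes from any cloud provider.
STREAM_RATE = 0.50  # small always-on worker, USD per hour (assumed)
BATCH_RATE = 2.40   # larger short-lived worker, USD per hour (assumed)

stream_cost = monthly_compute_cost(STREAM_RATE, hours_per_day=24)
batch_cost = monthly_compute_cost(BATCH_RATE, hours_per_day=0.5)
ratio = stream_cost / batch_cost

print(f"stream: ${stream_cost:.2f}/month, batch: ${batch_cost:.2f}/month, ratio: {ratio:.1f}x")
```

The raw difference in running hours is 24 to 0.5, or 48 to 1. Under these assumed rates, the more expensive batch machine narrows the cost gap to roughly tenfold. Your own numbers will differ; the point is that the always-on term dominates.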

For financial reports, analytics rollups, or training an ML model, the value of the data does not decay in seconds. Stillness is the correct default.

The Allure of the Infinite Stream

Of course, some problems demand immediacy. The counter-argument is best articulated in the stream-first philosophy that gained traction over the last decade. Thinkers like Jay Kreps, in his foundational 2013 essay "The Log: What every software engineer should know about real-time data's unifying abstraction," made a powerful case for treating the event stream as the central source of truth.

This philosophy gave rise to patterns like the Kappa Architecture, which aims to handle all processing, both real-time and batch, within a single streaming paradigm. It’s an intellectually elegant idea. In practice, I've seen it impose a heavy operational tax. Building a system that must correctly handle event ordering, stateful aggregations, and exactly-once processing across a distributed environment is a massive undertaking. You are choosing to solve the hardest problems in distributed systems, whether or not your business case requires it.
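A toy version of that tax, to show where the complexity comes from. This is a single-process, in-memory sketch under stated assumptions: the Event fields and the WindowedCounter class are hypothetical, and deduplicating by event ID gives "effectively once" results rather than true exactly-once delivery. It handles the two failure modes a batch job never sees, redelivered events and events arriving out of order.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str    # unique per event; used to drop redelivered duplicates
    sku: str
    quantity: int
    event_time: int  # seconds since epoch, stamped by the producer


@dataclass
class WindowedCounter:
    """Per-SKU totals in fixed event-time windows, tolerant of duplicates and reordering."""
    window_seconds: int = 60
    seen_ids: set = field(default_factory=set)
    totals: dict = field(default_factory=lambda: defaultdict(int))

    def process(self, event: Event) -> None:
        if event.event_id in self.seen_ids:
            return  # redelivery: applying it again would double count
        self.seen_ids.add(event.event_id)
        # Bucket by event time, not arrival time, so a late event lands in its own window.
        window_start = event.event_time - event.event_time % self.window_seconds
        self.totals[(window_start, event.sku)] += event.quantity
```

Even this version leaves the hard parts unsolved: seen_ids grows without bound, the state vanishes on restart, nothing decides when a window is final, and none of it is shared across machines. Each of those gaps is a project in a real streaming system.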

A Litmus Test for Latency

Instead of defaulting to the more complex solution, I force the justification. The burden of proof is always on real-time. I ask three questions:

What is the decision horizon? If a human needs to see a dashboard to make a decision next week, you have a batch use case. If an automated system must block a fraudulent transaction in milliseconds, you have a real-time one.
What is the value decay curve? Try to quantify it. Does this data lose half its value in five seconds, or five percent of its value in a day? If you can't articulate a steep, immediate drop, you're looking at batch.
What is the blast radius of failure? If the batch job fails, can you rerun it overnight with minimal impact? If the real-time pipeline stops, does the business grind to a halt? The higher the uptime requirement, the greater the investment in operational complexity.
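The three questions can be written down as a small decision function. This is a sketch, not a formula I'd apply mechanically: the exponential decay model, the hourly batch interval, the 90% value threshold, and every name below are assumptions for illustration. It does encode the burden of proof, since the answer is batch unless both the horizon and the decay curve argue otherwise.

```python
from dataclasses import dataclass


def value_retained(elapsed_seconds: float, half_life_seconds: float) -> float:
    """Fraction of value left after a delay, assuming exponential decay."""
    return 0.5 ** (elapsed_seconds / half_life_seconds)


@dataclass
class UseCase:
    name: str
    decision_horizon_seconds: float  # how soon someone or something acts on the data
    half_life_seconds: float         # time for the data to lose half its value
    outage_halts_business: bool      # blast radius if the pipeline stops


def recommend(use_case: UseCase, batch_interval_seconds: float = 3600.0,
              min_value_retained: float = 0.9) -> str:
    """Return "batch" unless the answers clearly justify real-time."""
    acts_before_next_batch = use_case.decision_horizon_seconds < batch_interval_seconds
    retained = value_retained(batch_interval_seconds, use_case.half_life_seconds)
    decays_too_fast = retained < min_value_retained
    if acts_before_next_batch and decays_too_fast:
        if use_case.outage_halts_business:
            return "real-time, with budget for high availability"
        return "real-time"
    return "batch"


fraud = UseCase("fraud block", 0.1, 1.0, True)
dashboard = UseCase("weekly dashboard", 7 * 86400, 14 * 86400, False)
print(recommend(fraud))
print(recommend(dashboard))
```

The fraud check acts within a fraction of a second on data that is worthless an hour later, so it earns real-time. The dashboard fails both tests and stays on batch. The blast-radius answer doesn't pick a side here; it only sizes the operational investment once real-time is justified.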

These questions help anchor a technical choice in business reality, steering away from complexity for its own sake.

Composing the Hybrid AI Stack

The most durable architectures I see today are not purely real-time or purely batch. They are intentional hybrids, especially in systems involving AI agents. The key is to draw a hard line between the agent's interactive needs and the system's data-processing backbone.

Consider an AI agent for a customer support team. Retrieving a customer's recent order history to ground an LLM's response must happen in real time. That is a fast, targeted API call to a low-latency database. However, the pipeline that analyzes thousands of support tickets to find emerging product issues is a classic deterministic batch job. It can run overnight, generating a clean report that can be used to update the agent's knowledge base.
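Here is a rough sketch of that split for the support agent. Everything in it is a stand-in: a dict plays the low-latency database, the prompt format is invented, and simple term matching stands in for whatever analysis the ticket pipeline really does. No LLM is called. What matters is that the two paths share no machinery.

```python
from collections import Counter

# Stand-in for a low-latency key-value store; a dict keeps the sketch runnable.
ORDER_STORE = {
    "cust-1": [{"order_id": "o-100", "item": "headphones", "status": "shipped"}],
}


def recent_orders(customer_id: str, limit: int = 3) -> list:
    """Interactive path: one keyed lookup per request, made while the user waits."""
    return ORDER_STORE.get(customer_id, [])[-limit:]


def build_grounding_context(customer_id: str) -> str:
    """Format the lookup result as text for an LLM prompt (format is assumed)."""
    orders = recent_orders(customer_id)
    if not orders:
        return "No recent orders found."
    return "\n".join(f"{o['order_id']}: {o['item']} ({o['status']})" for o in orders)


def nightly_issue_report(tickets: list, watch_terms: list) -> list:
    """Batch path: scan every ticket once and rank watch terms by ticket count."""
    counts = Counter()
    for text in tickets:
        lowered = text.lower()
        for term in watch_terms:
            if term in lowered:
                counts[term] += 1
    return counts.most_common()
```

The interactive functions touch one customer's rows and return immediately. The nightly function touches everything and can take as long as it needs. Neither one has to know the other exists, which is the separation worth protecting.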

[Figure: Hybrid Architecture for an AI System]

Your Default Should Be Stillness

The modern stack is a composition. We use robust, simple batch jobs for the heavy lifting: data preparation, analytics, and model training. We reserve the complexity of real-time streaming for the thin, interactive layer where it directly serves a user or an agentic workflow.

The discipline is in protecting that separation. Don't let the low-latency demands of the agent's brain stem infect the entire data backbone. Respect the power of a simple script, a schedule, and the quiet reliability of work done while everyone else is asleep. Start with batch. Start with stillness. Introduce motion only when its value is undeniable and worth the cost.