The pipeline I rebuilt three times before it was right

Data

A personal story of rebuilding a data pipeline three times: first as a brittle monolith, then as an overly complex microservices architecture, and finally as a pragmatic hybrid.

The alert went off at 2:17 AM. A sea of red on a dashboard confirmed it: the entire event processing pipeline was stalled. A minor schema change in a third-party API—something that should have been a non-event—had cascaded into total system failure. It wasn't the first time, but as I sat there in the dark, I knew it had to be the last. The architecture I had so carefully designed was fundamentally wrong.

[Figure: The Three-Service Hybrid Flow. Ingest and Validate -> Reliably Enrich -> Transform and Load]

Attempt One: The Majestic Monolith Cracks

My first instinct was to build a single, cohesive application. It was an approach I valued, a philosophy that David Heinemeier Hansson champions as the "Majestic Monolith". In a single process, the pipeline would ingest, validate, enrich against three external APIs, transform, and load into our warehouse. Simple to deploy, simple to reason about—at first.

The problem was that the steps weren't equally reliable. The enrichment step was a source of constant chaos. Any one of the external APIs could time out, return bad data, or change its contract without warning. In the monolith, a failure in that single volatile step halted everything. The blast radius was the entire system, and debugging meant rerunning the whole expensive batch just to isolate one faulty API call.

Attempt Two: Paying the Microservice Premium

My reaction was a classic over-correction. I broke every logical step into its own microservice, each communicating over a message queue. An ingestor, a validator, three separate enrichment services, a transformer, a loader. On the whiteboard, it was a model of decoupling. In reality, we had traded a monolith for a distributed spaghetti monster.

We ran headfirst into what Martin Fowler calls the "Microservice Premium." We were paying the immense operational cost—managing a dozen deployments, containers, and queues—without the organizational scale or maturity to justify it. The true nightmare was observability. When a single event vanished, tracing it meant stitching together logs across half a dozen Kubernetes pods and Kafka topics. We spent more time wrestling with our service mesh than improving the business logic. The cure was worse than the disease.

Attempt Three: Finding the Fault Lines

The third rebuild was born not of architectural dogma, but of scar tissue. I stopped thinking about logical code units and started looking for the system's real-world fault lines—the boundaries defined by volatility, scaling needs, and transactional integrity.

  • Ingest & Validate: These were fast, stateless, and tightly coupled. A failure here meant bad input. They became one service.
  • Enrichment: This was the slow, unreliable, I/O-bound part. I grouped the three chaotic API calls into a single, hardened service designed for failure, with its own retries, circuit breakers, and dead-letter queue. It could be scaled independently to handle API latency spikes.
  • Transform & Load: This was CPU-bound, deterministic, and required transactional integrity. It became the final service.
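
The enrichment boundary described above can be sketched roughly as follows. This is a minimal illustration, not the code from the original pipeline: the names CircuitBreaker, enrich_event, and dead_letters are hypothetical, the dead-letter queue is a plain in-memory list, and the thresholds are arbitrary placeholders.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are being rejected."""


@dataclass
class CircuitBreaker:
    failure_threshold: int = 3        # consecutive failures before opening (assumed value)
    reset_after_s: float = 30.0       # cooldown before one trial call is allowed
    clock: Callable[[], float] = time.monotonic
    failures: int = 0
    opened_at: Optional[float] = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open")
            # Cooldown elapsed: half-open, let a single trial call through.
            self.opened_at = None
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0             # any success closes the circuit again
        return result


def enrich_event(event, api_call, breaker, dead_letters,
                 max_attempts=3, backoff_s=0.5, sleep=time.sleep):
    """Return the enriched event, or None after parking it in dead_letters."""
    last_error = "unknown"
    for attempt in range(1, max_attempts + 1):
        try:
            extra = breaker.call(api_call, event)
            return {**event, **extra}
        except CircuitOpenError as exc:
            last_error = str(exc)
            break                     # do not hammer an API that is known to be down
        except Exception as exc:
            last_error = repr(exc)
            if attempt < max_attempts:
                sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff
    # Retries exhausted or circuit open: keep the event for later replay.
    dead_letters.append({"event": event, "error": last_error})
    return None
```

The point of the sketch is the shape of the boundary: the caller either gets an enriched event or gets None and finds the original event in the dead-letter list, so a flaky API never raises into the stages on either side.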

We ended up with three services connected by a queue. Not one, not twelve. Three. It was simple enough to reason about, yet decoupled enough to be resilient. It was maintainable. It worked.

The Real Lesson: Isolate the Chaos

That pipeline taught me a principle that has become central to how I design systems today, especially as AI enters the stack. The most durable architecture isn't about choosing monoliths or microservices; it's about drawing a hard, defensive boundary between your deterministic, reliable code and the chaotic, unpredictable components.

[Figure: Pragmatic Hybrid Architecture. Sources (partner event streams, internal databases) -> Stateless ingestion (ingestion service, message queue) -> Volatile processing (enrichment service, circuit breakers, dead-letter queue) -> Deterministic processing (warehouse loader, transactional state) -> Serving (data warehouse, analytics APIs, dashboards)]

In this pipeline, the "chaos agent" was the set of external APIs. In modern systems, it’s often an LLM—a non-deterministic component that can fail in subtle ways, hallucinate, or produce outputs that break downstream logic. The pattern that finally brought stability to my old pipeline is the same one we need now: build a fortress around your core, deterministic logic, and treat any agentic or external system as an untrusted, volatile dependency that must be managed in a separate, resilient container.
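
One way to picture that boundary is a strict parser between the volatile component and the core. The sketch below is an assumption-laden illustration rather than anything from the original system: Classification, parse_untrusted, RejectedOutput, and the ALLOWED_CATEGORIES vocabulary are all hypothetical, and the raw string stands in for whatever an LLM or external API returned.

```python
import json
from dataclasses import dataclass

# Hypothetical closed vocabulary that the deterministic core understands.
ALLOWED_CATEGORIES = {"billing", "support", "sales"}


class RejectedOutput(ValueError):
    """The untrusted component returned something the core must not see."""


@dataclass(frozen=True)
class Classification:
    category: str
    confidence: float


def parse_untrusted(raw: str) -> Classification:
    """Turn raw text from a volatile dependency into a typed value, or reject it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RejectedOutput(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise RejectedOutput("expected a JSON object")

    category = data.get("category")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise RejectedOutput(f"unknown category: {category!r}")

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise RejectedOutput("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise RejectedOutput(f"confidence out of range: {confidence!r}")

    return Classification(category, float(confidence))
```

Everything downstream of parse_untrusted only ever sees a Classification. Malformed, out-of-vocabulary, or out-of-range output fails at the boundary, where it can be retried or dead-lettered instead of breaking the transform and load logic.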

Here’s what I actually carry with me from that 2 AM failure:

  • Your first build is for discovery. Its main purpose is to reveal the true fault lines of your problem. The cost of throwing it away is the tuition you pay for understanding the system's actual behavior.
  • Boundaries are operational, not just logical. Group code that fails together, scales together, and transacts together. Don't split services based on a diagram; split them based on what breaks at 3 AM.
  • Isolate non-determinism. Whether it's a flaky third-party API or a powerful LLM agent, the architecture must treat it as a source of chaos. Contain it, manage its failures gracefully, and never let its unpredictability poison your stable, deterministic core.
Juan Cardena
Enterprise Architect, Data & AI

Enterprise architect with 25 years across web, software, data, and AI. MIT CDAO ’25. Writing on agentic AI in production.