When I learned that the boring foundation is the whole game

Software

Building reliable AI systems isn't about the latest agentic model; it's about the boringly reliable foundation that gives it leverage. A look at why data contracts and idempotency matter more than eve

The first system I ever lost a weekend to didn't fail because of a sophisticated algorithm. It failed because a message queue handler, written in a hurry, would occasionally process the same event twice. A simple oversight. Yet under load, this tiny flaw cascaded into a data corruption nightmare that took 48 hours of frantic work to unravel.

That was the weekend I started to understand. The demo is a lie. Not a malicious one, but a seductive one. It shows you the peak of the mountain without mentioning the years of tectonic pressure that formed it.

The Seduction of the Next Big Thing

Early in my career, the focus was always on the visible parts of the architecture. The novel algorithm, the component that made everyone in the review meeting nod. We celebrated the feature that shipped, not the system that supported it. The constant pushback against foundational work was, and still is, logical on the surface: we don't have time. Time to market is everything, and this work is a "tax" on feature velocity.

The problem is that it isn't a tax. It's a high-interest loan. Each shortcut (each loosely defined interface, each stateful component that should have been stateless) becomes a landmine. You can walk through the field a dozen times without an issue, but eventually the conditions are right, and the whole thing blows up.

[Figure: The Two Paths of System Development. Labels recoverable from the diagram: New Feature Request, Shiny Demo Path, Accumulate Technical Debt, System Failure at 3am.]

This debt compounds, and it is paid back with interest during late-night outages, customer data corrections, and eventually the full-system rewrite that no one wanted.

What ‘Boring’ Actually Means

When I talk about the "boring" foundation, I'm not advocating for old technology. It’s a philosophy of engineering focused on durability over novelty, an obsession with the things that, when done right, become completely invisible. It's about building systems that acknowledge the messy reality described in papers like Pat Helland's classic, Life Beyond Distributed Transactions, where partial failures are a given.

In today's converged software, data, and AI stacks, this boring work looks like:

  • Aggressively Enforced Data Contracts. Not just a schema, but a binding agreement on semantics, ownership, and lifecycle. This is the central idea behind the modern data quality movement, championed by practitioners like Chad Sanderson in pieces such as What is a Data Contract?. It is the only way to prevent the "garbage in, garbage out" cycle that cripples both analytics and AI.
  • Idempotency By Default. Every operation that changes state should be designed so that running it once has the same effect as running it ten times. This isn't gold-plating; it's fundamental to building resilient, self-healing systems. It's why API providers like Stripe support idempotent requests, where a client-supplied idempotency key lets the same request be retried safely. It turns a transient network error from a five-alarm fire into a non-event.
  • Pragmatic Observability. Defining the critical signals of system health upfront. What are the three metrics that tell you this pipeline is working? What context does a log line need to be useful during an outage? Building this in from the start is craftsmanship; adding it later is archaeology.
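
To make the idempotency point concrete, here is a minimal sketch of a handler that tolerates the kind of duplicate delivery that cost me that weekend. Everything in it is hypothetical: the `Ledger` class, the `handle_deposit` method, and the event fields are illustrative names, and the in-memory set stands in for what would need to be a durable store updated in the same transaction as the state change.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """In-memory stand-in for a durable store (hypothetical, for illustration)."""

    balances: dict[str, int] = field(default_factory=dict)
    processed_event_ids: set[str] = field(default_factory=set)
    applied: int = 0      # events that changed state
    duplicates: int = 0   # redeliveries that were recognized and ignored

    def handle_deposit(self, event_id: str, account: str, amount_cents: int) -> bool:
        """Apply a deposit at most once per event_id; return True if state changed."""
        if event_id in self.processed_event_ids:
            # Same event delivered again: count it and do nothing else.
            self.duplicates += 1
            return False
        # Assumption: in a real system these writes would share one transaction,
        # so a crash cannot leave the balance updated but the event unrecorded.
        self.balances[account] = self.balances.get(account, 0) + amount_cents
        self.processed_event_ids.add(event_id)
        self.applied += 1
        return True


ledger = Ledger()
for _ in range(10):  # simulate a queue redelivering the same event ten times
    ledger.handle_deposit("evt-001", "acct-42", 500)
```

The two counters double as a small nod to the observability bullet: a rising `duplicates` count is a cheap signal that something upstream is retrying more than expected.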

The Foundation as an Amplifier for AI

There's a myth that the magic of AI absolves us of this rigorous engineering. My experience shows the exact opposite is true. Agentic systems are powerful amplifiers, and they amplify the quality of their foundation, for better or worse.

A small data error in a deterministic pipeline might corrupt a dashboard metric. That's bad. The same error fed to an autonomous agent could cause it to take a catastrophic, unauditable, and very real action in the physical or financial world. The agent needs a set of reliable, predictable tools and a high-fidelity view of the world to act upon. That world is built by our deterministic systems.

When an agent needs to check inventory, it must call an idempotent API that returns data conforming to a strict contract. The boring foundation doesn't constrain the agent; it gives it leverage. It is the solid ground from which the rocket can launch.
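
As a rough sketch of that inventory check, the tool below validates a raw response against a small contract before the agent ever sees it. The names (`InventoryLevel`, `parse_inventory`, `check_inventory_tool`), the fields, and the allowed units are all assumptions made up for illustration, not a real API, and a production contract would also cover semantics and ownership, not just types.

```python
from dataclasses import dataclass


class ContractViolation(ValueError):
    """Raised when a payload does not satisfy the agreed contract."""


@dataclass(frozen=True)
class InventoryLevel:
    sku: str
    quantity: int
    unit: str


ALLOWED_UNITS = {"each", "case"}  # hypothetical vocabulary for this sketch


def parse_inventory(payload: dict) -> InventoryLevel:
    """Return a validated InventoryLevel or raise ContractViolation."""
    sku = payload.get("sku")
    quantity = payload.get("quantity")
    unit = payload.get("unit")
    if not isinstance(sku, str) or not sku:
        raise ContractViolation("sku must be a non-empty string")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity < 0:
        raise ContractViolation("quantity must be a non-negative integer")
    if not isinstance(unit, str) or unit not in ALLOWED_UNITS:
        raise ContractViolation(f"unit must be one of {sorted(ALLOWED_UNITS)}")
    return InventoryLevel(sku=sku, quantity=quantity, unit=unit)


def check_inventory_tool(raw_response: dict) -> dict:
    """What the agent sees: a validated value or an explicit, loggable error."""
    try:
        level = parse_inventory(raw_response)
    except ContractViolation as exc:
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "sku": level.sku, "quantity": level.quantity, "unit": level.unit}


good = check_inventory_tool({"sku": "A-100", "quantity": 12, "unit": "each"})
bad = check_inventory_tool({"sku": "A-100", "quantity": "12", "unit": "each"})
```

The design choice worth noting is that a malformed payload becomes an explicit error the agent can reason about, rather than a string like "12" quietly flowing into a downstream decision.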

[Figure: Architecture for Grounded AI Systems. Labels recoverable from the diagram, by layer:
  • Sources & Ingestion: Streaming Events, Application DBs, Third-Party APIs, Data Lake
  • Foundation, the Deterministic Core: Data Contract Enforcement, Idempotent Processing…, Reliable State Storage, Observability Hooks
  • Cooperating Layers: Deterministic APIs & Tools, LLM Agent Orchestrator, Reasoning & Planning Loop
  • Outputs & Actions: User-Facing UIs, Automated Actions, Analytical Dashboards, Alerting]

Takeaways: Mandate the Invisible Work

The most impactful engineering is often the least visible. It’s building the platform that makes future innovation cheaper and safer. It's sweating the details on the parts no one sees, so the parts everyone sees can just work. To make this real, you have to move past philosophy and into policy.

  • Institute a Foundation Tax. Every project must dedicate a non-negotiable portion of its engineering time—say, 20%—to the underlying platform. This includes refactoring, improving tests, and hardening dependencies. It’s not overhead; it’s the cost of doing business sustainably.
  • Reward the Plumbers. In your teams, celebrate the engineer who reinforces a messy data pipeline or establishes a clear data contract with the same energy as the one who ships a flashy UI. Their work prevents future outages and has a far longer-lasting impact.
  • Ask the Failure Questions. When evaluating any new technology, look past the demo. Ask how it fails. Ask how you observe it. Ask what happens during a network partition. The answers to these boring questions are more important than any feature list.

The goal is to build systems that are resilient, comprehensible, and ultimately, boringly reliable. Because the most exciting thing a system can do in the middle of the night is nothing at all.

Juan Cardena
Enterprise Architect, Data & AI

Enterprise architect with 25 years across web, software, data, and AI. MIT CDAO ’25. Writing on agentic AI in production.