When a single wrong number cost a meeting its trust
A real-world story of a silent data pipeline failure and how it underscores the need for verifiable data contracts and assertions in the age of LLM agents.
The number on the screen showed explosive, double-digit growth. It was the result of weeks of analysis, the core justification for the project we were there to greenlight. Then the VP in the corner paused. “That feels high,” she said. In that instant, the trust my team had spent months building vanished. The conversation was no longer about strategy; it was an impromptu audit of a single, dubious number.
Her intuition was right. We later found the real figure was a respectable but modest single-digit gain. The number wasn't just for a slide deck; it was from a pipeline intended to feed a new generation of automated, agentic systems. If a human with domain expertise could spot the error with a gut check, what would an LLM do with it? It would have ingested that lie with perfect confidence and built a tower of bad decisions upon it.

The Anatomy of a Silent Failure
On the surface, the system was a standard deterministic data pipeline. It ingested events, applied a series of transformations, and wrote to an aggregate table. It was designed for reliability, the kind of boring, predictable architecture that has been the bedrock of data engineering for years. When this kind of system breaks, you expect it to fail loudly: a crashed job, a monitoring alert. The problem was that it didn't break. It just started lying.
After a frantic deep dive, we found the culprit: a subtle caching error. A previous engineer, chasing performance, had implemented an intermediate caching layer. A recent configuration change caused the cache invalidation for one specific data partition to fail silently. The job reported success, but it was joining fresh events against week-old user attributes, inflating the final number. The system had no awareness of its own flawed state.

A Lying System Is Worse Than a Broken One
That meeting was a wash, but the real cost was the erosion of credibility. For weeks, every number we produced was met with skepticism. This experience cemented a core belief: a system that silently produces plausible but wrong data is far more dangerous than a system that is visibly down, because nobody knows to stop relying on it.
A crashed pipeline is an engineering problem. A lying pipeline is a trust crisis. This is the central challenge as we compose deterministic automation with agentic systems. An LLM has no domain-trained "gut feeling" that a number is wrong. Unless it is told otherwise, it treats the data it is given as ground truth. If the foundation is faulty, the entire sophisticated structure of agents, RAG systems, and AI-driven features will fail in ways that are subtle, confident, and catastrophic.

Building a Verifiable Foundation
Fixing the bug was easy; rebuilding trust required a shift in architecture. We moved from a model of "assume it's correct unless it fails" to "prove it's correct at every step." This meant finally adopting well-known patterns for data quality that the industry has been developing for years.
First, we embraced the discipline of data contracts. As practitioners like Chad Sanderson have championed, this involves treating data handoffs like API endpoints, with explicit schemas, freshness guarantees, and semantic meaning. A downstream job now programmatically refuses to run if its source data hasn't updated on time or if the schema has drifted.
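A minimal sketch of such a contract check, in plain Python, might look like the following. It is an illustration under stated assumptions, not the code from the pipeline in this story: `DataContract`, `ContractViolation`, and `enforce_contract` are hypothetical names, rows are modeled as dictionaries, and the 24-hour freshness window is an arbitrary example value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


class ContractViolation(Exception):
    """Raised so the downstream job refuses to run."""


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract for one upstream dataset."""
    columns: dict             # expected column name -> Python type
    max_staleness: timedelta  # how old the latest update may be


def enforce_contract(contract, rows, last_updated, now):
    """Raise ContractViolation if a freshness or schema promise is broken."""
    # Freshness: refuse to run on data older than the contract allows.
    age = now - last_updated
    if age > contract.max_staleness:
        raise ContractViolation(f"source is stale: last updated {age} ago")
    # Schema: every row must carry exactly the agreed columns and types.
    expected_names = set(contract.columns)
    for i, row in enumerate(rows):
        if set(row) != expected_names:
            raise ContractViolation(f"row {i}: schema drift, got columns {sorted(row)}")
        for name, expected_type in contract.columns.items():
            if not isinstance(row[name], expected_type):
                raise ContractViolation(
                    f"row {i}: column {name!r} is not {expected_type.__name__}"
                )


# Usage with made-up values: week-old attributes break a 24-hour promise.
contract = DataContract(columns={"user_id": int, "plan": str},
                        max_staleness=timedelta(hours=24))
now = datetime(2024, 1, 8, tzinfo=timezone.utc)
rows = [{"user_id": 1, "plan": "pro"}]

enforce_contract(contract, rows, last_updated=now - timedelta(hours=2), now=now)
try:
    enforce_contract(contract, rows, last_updated=now - timedelta(days=7), now=now)
except ContractViolation as err:
    print(f"refusing to run: {err}")
```

The point of the design is that the consumer checks the producer's promises before doing any work, so week-old inputs stop the job instead of flowing into a join.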
Second, we embedded automated data assertions directly into our pipelines using tools conceptually similar to the open-source library Great Expectations. These are simple, automated checks: do row counts fall within an expected range? Does the sum of the parts match the source total? If an assertion fails, the pipeline halts and sends a high-priority alert. It fails loudly, at the source, before a lie can ever reach a user or an agent.
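The two checks named above can be sketched in a few lines of plain Python. This deliberately does not use the Great Expectations API. `AssertionFailure`, `check_row_count`, `check_parts_match_total`, and `run_stage` are hypothetical names, and the `publish` and `alert` callbacks stand in for whatever sink and paging system a real pipeline would use.

```python
import math


class AssertionFailure(Exception):
    """Raised to halt the pipeline before unverified data is published."""


def check_row_count(rows, low, high):
    """Do row counts fall within the expected range?"""
    n = len(rows)
    if not low <= n <= high:
        raise AssertionFailure(f"row count {n} outside expected range [{low}, {high}]")


def check_parts_match_total(parts, source_total, rel_tol=1e-9):
    """Does the sum of the parts match the source total?"""
    total = math.fsum(parts)
    if not math.isclose(total, source_total, rel_tol=rel_tol):
        raise AssertionFailure(f"parts sum to {total}, source total is {source_total}")


def run_stage(rows, checks, publish, alert):
    """Publish rows only if every check passes; otherwise alert and halt."""
    try:
        for check in checks:
            check(rows)
    except AssertionFailure as err:
        alert(f"HIGH PRIORITY: {err}")  # hypothetical alerting hook
        raise                           # fail loudly: stop the job here
    publish(rows)


# Usage with in-memory stand-ins and made-up numbers.
rows = [{"region": "emea", "revenue": 40.0}, {"region": "amer", "revenue": 60.0}]
published, alerts = [], []
checks = [
    lambda r: check_row_count(r, low=1, high=1000),
    lambda r: check_parts_match_total([x["revenue"] for x in r], source_total=100.0),
]
run_stage(rows, checks, publish=published.extend, alert=alerts.append)
```

Because `publish` is only reached after every check has passed, a failed assertion leaves the downstream table untouched rather than partially updated.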

An Architecture for Trust
This approach isn't free; it adds engineering overhead. But the cost of implementing these guardrails is trivial compared to the cost of a single bad decision made at scale by an automated agent. You are not just building a pipeline; you are building the sensory input for a new class of decision-makers.
This architecture treats data integrity as a first-class feature, not an afterthought. It contrasts sharply with relying on ML-based anomaly detection, which can drown you in false positives, or manual QA, which doesn't scale. Instead, it relies on simple, deterministic rules that enforce the system's own promises about itself.
The goal is to create a verifiable data supply chain where both humans and AI agents can consume information with a high degree of confidence. The output is no longer just a table of numbers, but a table of numbers with an auditable record of the quality checks it passed attached.
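One way to picture "numbers with their checks attached" is a small wrapper type that carries the check outcomes alongside the rows. This is a sketch under assumptions, not a description of any particular tool: `CheckResult`, `VerifiedTable`, `verify`, and `consume` are hypothetical names, and the sample row and range are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool


@dataclass(frozen=True)
class VerifiedTable:
    rows: tuple
    checks: tuple  # CheckResult records, in the order they ran

    @property
    def trusted(self):
        # An empty record proves nothing, so it does not count as trusted.
        return bool(self.checks) and all(c.passed for c in self.checks)


def verify(rows, named_checks):
    """Run each (name, predicate) pair and attach the outcomes to the rows."""
    results = tuple(CheckResult(name, bool(pred(rows))) for name, pred in named_checks)
    return VerifiedTable(rows=tuple(rows), checks=results)


def consume(table):
    """What a report or an agent tool would call before reading the data."""
    if not table.trusted:
        failed = [c.name for c in table.checks if not c.passed]
        raise ValueError(f"refusing unverified data; failed checks: {failed}")
    return table.rows


# Usage with an invented row and an invented plausibility range.
table = verify(
    rows=[{"week": "2024-01-01", "growth_pct": 4.2}],
    named_checks=[
        ("non_empty", lambda rows: len(rows) > 0),
        ("growth_in_plausible_range",
         lambda rows: all(-50 <= r["growth_pct"] <= 50 for r in rows)),
    ],
)
print(table.trusted, [c.name for c in table.checks])
```

A consumer, human or agent, then has something concrete to inspect: which checks ran, and whether each one passed.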

Takeaways for Building Hybrid Systems
That painful meeting is now a permanent reminder of what's at stake when software, data, and AI converge. These are the principles I now build by:
- Trust is the API contract for your agents. The most important output of your data system isn't a number; it is a verifiable guarantee that the number is trustworthy.
- Architect for loud failures. A dashboard showing an error is a success. A dashboard showing a plausible but wrong number is a time bomb, especially when an LLM is the user.
- Automate verification at the source. Data quality checks aren't a separate process; they are non-negotiable stages in the pipeline itself. Quarantine bad data before it can propagate.
- Data lineage is your debugging tool for reality. When a number is questioned, the ability to trace its journey from source to serving layer via a standard like OpenLineage is how you rebuild trust in minutes, not weeks.
- Your users' intuition is a critical signal. When an experienced person says something feels wrong, listen. They are your most valuable, and often uninstrumented, anomaly detector.