Learning to say 'I don't know yet' to a number I couldn't defend

The question landed like a stone. A senior leader cut through the presentation, pointed at me, and said, "Just give me the number. How many daily active users are on the new platform?"

I could feel the pressure to be helpful, to run a query in under a minute and produce a large, impressive-looking integer. It would satisfy the room and move the meeting along. It would also be a lie of omission, built on a mountain of unstated caveats. In that moment, the choice wasn't between knowing and not knowing; it was between being cooperative and being honest.

The Illusion of the On-the-Spot Number

In many company cultures, especially at fast-moving startups, there's immense pressure for a "good enough" number to maintain velocity. An estimate feels harmless, a way to establish a baseline and avoid slowing down. But these numbers are never harmless. A figure given in a meeting, no matter how many caveats I whisper, gets stripped of its context. It lands in a slide deck, becomes gospel, and morphs into the immutable fact upon which next quarter's goals are built.

From Ad-Hoc Answer to Durable Asset

I could have run a quick SELECT COUNT(DISTINCT user_id) FROM event_stream and called it a day. But that simple count ignores crucial details. As written, it has no date filter, so it isn't even a daily figure. It also doesn't filter bot traffic by inspecting user_agent strings, exclude internal test accounts, or correctly de-duplicate sessions that move from anonymous to authenticated states. The plausible-looking number it returns is a dangerous fiction. The real engineering work isn't running the query; it's building the system that can answer the question repeatably and truthfully.
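To make those gaps concrete, here is a minimal Python sketch of what the "real" count has to do. Everything in it is an assumption for illustration: the Event fields, the bot markers, the list of internal accounts, and the decision to count anonymous-only visitors are hypothetical, not the definition my team shipped.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical row from event_stream; field names are assumptions."""
    ts: datetime
    anonymous_id: str
    user_id: Optional[str]  # None until the visitor authenticates
    user_agent: str


# Naive heuristics, assumed for illustration only.
BOT_MARKERS = ("bot", "crawler", "spider", "headless")
INTERNAL_USER_IDS = {"qa-1", "qa-2"}


def is_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def daily_active_users(events: list[Event], day: date) -> int:
    # 1. Restrict to the day in question and drop obvious bot traffic.
    todays = [e for e in events
              if e.ts.date() == day and not is_bot(e.user_agent)]

    # 2. Learn which anonymous ids authenticated later the same day.
    #    If one anonymous id maps to several users, the last one wins.
    identity = {e.anonymous_id: e.user_id
                for e in todays if e.user_id is not None}

    # 3. Count each person once, under their resolved identity.
    active = set()
    for e in todays:
        resolved = e.user_id or identity.get(e.anonymous_id)
        if resolved is None:
            # Definitional choice: anonymous-only visitors still count.
            active.add(("anon", e.anonymous_id))
        elif resolved not in INTERNAL_USER_IDS:
            active.add(("user", resolved))
    return len(active)
```

Every branch in that function is a decision someone has to own. That is the part a one-minute query skips.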

A Defensible Metric Is a Product

A reliable metric isn't found; it's manufactured. It is the end product of a data pipeline, a piece of software in its own right. Getting a number you can stand behind means treating it with the same rigor we apply to production services. This idea has been formalized in modern data thinking, most notably in Zhamak Dehghani's "Data as a Product" principle from her work on Data Mesh, introduced in her 2019 article "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh" on Martin Fowler's site.

When you treat a metric as a product, it has an owner, a version, and a clear, explicit "Data Contract" that defines its schema, semantics, and quality guarantees. It stops being a vague concept like "active user" and becomes a concrete asset produced by a deterministic, observable system. This system is responsible for cleaning, filtering, sessionizing, and applying the specific business logic that constitutes the definition. Without this, you don't have a metric; you have a series of one-off, contradictory snapshots.
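As a rough sketch of what such a contract might look like in code, the Python below bundles an owner, a version, a plain-language definition, a schema, and one quality guarantee. The MetricContract class and its fields are hypothetical names of my own, not part of any Data Mesh tooling or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricContract:
    """Hypothetical contract for a published metric table."""
    name: str
    version: str
    owner: str
    definition: str                 # semantics, in plain language
    schema: dict[str, type]         # column name -> expected Python type
    max_null_fraction: float = 0.0  # quality guarantee per column

    def violations(self, rows: list[dict]) -> list[str]:
        """Return human-readable problems; an empty list means compliant."""
        problems = []
        for column, expected in self.schema.items():
            values = [row.get(column) for row in rows]
            nulls = sum(v is None for v in values)
            if rows and nulls / len(rows) > self.max_null_fraction:
                problems.append(
                    f"{column}: {nulls}/{len(rows)} null values exceed limit")
            if any(v is not None and not isinstance(v, expected)
                   for v in values):
                problems.append(f"{column}: expected {expected.__name__}")
        return problems


dau_contract = MetricContract(
    name="daily_active_users",
    version="1.0.0",
    owner="analytics-platform",  # assumed team name
    definition="Distinct non-bot, non-internal people with at least one "
               "event on the given UTC day.",
    schema={"day": str, "dau": int},
)
```

A pipeline run could call dau_contract.violations(rows) before publishing and refuse to ship a number that breaks its own contract.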

The Hard 'No' That Builds Trust

Back in that meeting, I took a breath. "I can't give you a number I can defend right now," I said. "But I can tell you exactly what it will take to get one."

The air in the room changed, but I didn't leave a vacuum. "A raw query will give us a big, misleading number. To get a real daily active user count, we need to build a small, automated pipeline to clean and aggregate the data. It will take a few days. When it's done, we'll have an auditable, repeatable number we can all trust."

This reframes the conversation from a personal failing ("Juan doesn't know") to an engineering requirement ("We need the right tool"). Over time, this builds a different kind of reputation. Stakeholders learn that when your team provides a number, it's real. They start asking better questions—not just "what's the number?" but "what's the definition behind it?" You slowly create a culture of intellectual honesty.

The Bedrock for Agentic Systems

This discipline is more critical now than ever. The industry is moving from deterministic automation to agentic work, where LLM-powered systems take on complex, multi-step tasks. The new, harder questions aren't about user counts; they're about agent performance. "What's the success rate of our new customer support agent?" or "What's the cost-per-resolved-task?"

If our data culture can't even produce a defensible count of daily active users, we have zero chance of reliably measuring the fuzzy, non-deterministic outputs of an AI agent. The same rigor applies, but the stakes are higher. Defining "task success" requires an even more robust deterministic pipeline to interpret agent logs, user feedback, and downstream impacts. The boring, meticulous work of building defensible metrics is the absolute bedrock for the exciting future of agentic AI. You cannot build the second floor of a house on a foundation of sand.
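To show how much definition hides inside those two questions, here is a small Python sketch. The AgentTask fields and the rule that a reopened task is not a success are assumptions for illustration; a real definition would come out of the same contract process as any other metric.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentTask:
    """Hypothetical record joined from agent logs and user feedback."""
    task_id: str
    resolved: bool   # the agent reported the task as done
    reopened: bool   # the user came back with the same issue
    cost_usd: float  # total model and tool spend for this task


def is_success(task: AgentTask) -> bool:
    # Assumed definition: resolved and not reopened downstream.
    return task.resolved and not task.reopened


def success_rate(tasks: list[AgentTask]) -> Optional[float]:
    if not tasks:
        return None  # no tasks means no defensible rate
    return sum(is_success(t) for t in tasks) / len(tasks)


def cost_per_resolved_task(tasks: list[AgentTask]) -> Optional[float]:
    successes = sum(is_success(t) for t in tasks)
    if successes == 0:
        return None
    # Failed attempts still cost money, so total spend is the numerator.
    return sum(t.cost_usd for t in tasks) / successes
```

Change the is_success rule and both numbers move, which is why the definition needs an owner and a version before anyone quotes the result.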

Architecture for Defensible Metrics and Agent Evaluation

My takeaways from years of navigating these conversations are simple engineering principles:

Turn Asks into Requirements. A request for a number is a feature request for a metric. Document it, define it, and build it properly.
Treat Metrics as Code. The logic that generates a number must live in version control, be tested, and have a clear owner.
Build the Faucet, Not the Bucket. Instead of one-off queries to fetch a bucket of data, build the automated pipeline—the faucet—that delivers it reliably.

Learning to say "I don't know yet" was the moment I shifted from seeing my job as providing answers to seeing it as building systems that produce truth. The former is easy and fleeting; the latter is hard, but it's the only thing that lasts.