Discovering that naming things is genuinely the hard part

I once lost an afternoon to a bug caused by two variables named status. One was an integer enum from our database; the other was a free-text string from a partner API. They shared a generic name but meant entirely different things. The system didn't crash; it just produced subtly wrong results, a quiet corruption that's the most dangerous kind of failure. That day, an old cliché became a core principle for me.
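A minimal sketch of how that kind of collision can stay silent, and how a more specific name and type make it fail loudly instead. The enum members, the partner string, and both function names are hypothetical; the original systems are not shown here.

```python
from enum import IntEnum


class OrderStatusCode(IntEnum):
    """Hypothetical integer enum, as it might be stored in the database."""
    PENDING = 0
    SHIPPED = 1
    CANCELLED = 2


def is_shipped_ambiguous(status) -> bool:
    # Accepts anything called "status". A partner string such as "shipped"
    # never equals the enum member, so this quietly returns False.
    return status == OrderStatusCode.SHIPPED


def is_shipped(order_status_code: OrderStatusCode) -> bool:
    """Accept only the database enum; anything else fails loudly."""
    if not isinstance(order_status_code, OrderStatusCode):
        raise TypeError(
            f"expected OrderStatusCode, got {type(order_status_code).__name__}"
        )
    return order_status_code is OrderStatusCode.SHIPPED


# Free-text value of the kind a partner API might return (assumed example).
partner_status_text = "shipped"
```

Passing partner_status_text to is_shipped_ambiguous returns a wrong answer without any error, while is_shipped rejects it with a TypeError at the call site.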

The saying, widely attributed to engineer Phil Karlton, is that there are only two hard things in computer science: cache invalidation and naming things. As I’ve spent more time at the intersection of software, data, and AI, I've found the second part is less a clever quip and more a fundamental law of building systems that last.

The Cascade of Ambiguity

Ambiguity Is the Most Expensive Debt

A name like data, item, or record is a blank check for future confusion. It forces every developer, including my future self, to re-derive context by reading surrounding code. This cognitive overhead is the interest payment on technical debt, and it is charged again every time someone reads the code.

This problem is magnified in the systems I build today, where deterministic pipelines feed non-deterministic AI agents. Imagine an LLM agent tasked with acting on a user's behalf. If the agent's toolset includes a function called get_info(user_id), what is it getting? The user's profile? Their recent orders? A JSON blob of notification settings? The agent has to guess, and its guesses create a vast state space of potential, hard-to-reproduce failures.

A function named fetch_user_shipping_address(user_id), however, is a clear contract. It removes ambiguity, making the agent's behavior more reliable and the overall system more robust. The cost of a vague noun is an explosion in uncertainty.
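As a rough illustration of what that contract could look like in code, the sketch below pairs the specific name with a typed return value and a docstring. The ShippingAddress fields, the in-memory lookup table, and the sample user ID are all assumptions made for the example, not part of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShippingAddress:
    """Hypothetical shape of a shipping address record."""
    recipient_name: str
    street_line: str
    city: str
    postal_code: str
    country_code: str  # assumed to be a two-letter country code, e.g. "GB"


# Hypothetical in-memory stand-in for a real address store.
_SHIPPING_ADDRESS_BY_USER_ID = {
    "user-123": ShippingAddress(
        recipient_name="Ada Lovelace",
        street_line="12 Example Street",
        city="London",
        postal_code="N1 9GU",
        country_code="GB",
    ),
}


def fetch_user_shipping_address(user_id: str) -> ShippingAddress:
    """Return the shipping address on file for user_id.

    Raises KeyError if the user has no shipping address on file.
    """
    return _SHIPPING_ADDRESS_BY_USER_ID[user_id]
```

An agent reading this tool's name, signature, and docstring has one plausible interpretation of what it returns, which is the point of the specific name.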

Naming Is an Architectural Act

Some argue for brevity, pointing to the elegant terseness of shell commands or mathematical notation. That ideal is misplaced here. Brevity works when every reader already shares the context; it becomes a liability in collaborative, long-lived systems where clarity for the next person is paramount. Choosing a good name is not decoration; it is an act of design that draws boundaries.

This idea is central to what Eric Evans calls the "Ubiquitous Language" in his book Domain-Driven Design. A precise, shared vocabulary for concepts is the foundation of a sound architecture. A well-named component defines its contract with the rest of the system. Consider two possible names for a variable holding a new invoice:

invoice_file
pending_validation_customer_invoice_pdf

The first is a guess. The second is a specification. It tells you the state (pending_validation), the entity (customer), the content (invoice), and the format (pdf). You know what it is—and what it is not—without reading another line of code. This is the bedrock of a maintainable system.

The Convergence of Clarity

This discipline is most critical where software, data, and AI engineering converge. In a data warehouse, a table named user_activity is a time bomb. What time zone are the timestamps in? Is the activity de-duplicated? I've seen an analytics project get derailed for weeks because one team's definition of that table (raw clickstream events) was incompatible with another's (deduplicated user sessions). The failure wasn't in the SQL; it was in the name that allowed two valid but opposing interpretations.

This is why battle-tested conventions like prefixing data tables with their purpose—stg_ for staging, fct_ for facts—are so valuable. They encode a component's lineage and purpose into its identity, fighting ambiguity at the source. This isn't just a data engineering pattern; it's a principle for the whole stack. Clear naming prevents the subtle semantic drift that corrupts systems from the inside out.
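One way to keep such a convention from eroding is to check it mechanically. The sketch below is a hypothetical helper, not part of any particular tool: it maps the two prefixes mentioned above to their layers and rejects names that carry neither. The lowercase snake_case rule and the sample table names are assumptions.

```python
import re

# The two prefixes named in the text; a real project would likely define more.
TABLE_LAYER_BY_PREFIX = {
    "stg_": "staging",
    "fct_": "fact",
}

# Assumed rule: a known prefix followed by a lowercase snake_case name.
_TABLE_NAME_PATTERN = re.compile(r"^(stg_|fct_)[a-z][a-z0-9_]*$")


def table_layer(table_name: str) -> str:
    """Return the layer encoded in table_name's prefix.

    Raises ValueError for names that do not follow the convention.
    """
    match = _TABLE_NAME_PATTERN.match(table_name)
    if match is None:
        raise ValueError(
            f"table name {table_name!r} needs one of the prefixes "
            f"{sorted(TABLE_LAYER_BY_PREFIX)}"
        )
    return TABLE_LAYER_BY_PREFIX[match.group(1)]
```

Under this check, stg_clickstream_events and fct_user_sessions pass, while a bare user_activity is rejected before anyone can build on it.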

The 3 AM Litmus Test

The tactical advice for good naming has been established for decades, and is well articulated in books like Steve McConnell's Code Complete. Name functions with a strong verb and a specific object, so the name says what the call actually does. Give variables names that explain their context and state. The most reliable pattern I've found is to make a name answer basic questions: What is it? What is its state? What is its context?
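A small before-and-after sketch of that advice. Both functions do the same work; only the names differ. The function names and the sample event IDs are hypothetical, chosen to answer the three questions above.

```python
def process(data):
    # Vague: what is "data", and what does "process" do to it?
    return list(dict.fromkeys(data))


def deduplicate_event_ids(raw_event_ids: list[str]) -> list[str]:
    """Drop repeated event IDs, keeping the first occurrence of each in order."""
    # dict.fromkeys keeps insertion order, so first-seen order is preserved.
    return list(dict.fromkeys(raw_event_ids))


# What is it? Event IDs. What state? Raw, then unique. What context? Clickstream.
raw_clickstream_event_ids = ["evt-1", "evt-2", "evt-1"]
unique_clickstream_event_ids = deduplicate_event_ids(raw_clickstream_event_ids)
```

A reader of the second call site can tell what went in and what came out without opening the function body.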

Yes, this leads to longer names. That is a trade-off I will make every time: storage is cheap and IDEs have autocomplete, while the cognitive cost of ambiguity during a crisis is astronomically high. The ultimate heuristic follows from that: the 3 AM test. The value of a good name is revealed under pressure. Can a colleague, woken by an alert, look at your code and grasp its intent immediately?

If a name isn't clear enough for a stressed, sleep-deprived engineer, it isn't clear enough. Before you commit your code, look at the names you chose and ask: would this make sense to a stranger in a crisis? If the answer is anything but a confident "yes," you're not done yet.

Architecture of Clarity: Naming as System Design

This isn't about pedantic rules. It is about craftsmanship and respect for the people you work with, including your future self. Ambiguity is the most expensive technical debt, paid for in long debugging sessions and production incidents. We must optimize for the reader, because code is read far more often than it is written. Take the extra thirty seconds to find the right words. It’s one of the highest-leverage acts in engineering.