Learning to read someone else's code without judgment
To build reliable AI agents, we must first learn to read legacy code without judgment. This is how to become a technical archaeologist for modern systems.
Our new LLM agent kept giving nonsensical answers about customer accounts. It wasn't a flaw in the model, the prompt, or the retrieval pipeline. The agent was crashing against a ghost in our data warehouse—a single integer column named cust_status that held thirty years of unwritten business rules, encoded as a bitmask.
The agent, trained on a world of clean schemas and descriptive strings, had no way to know that a value of 13 (binary 1101) meant three things at once: "Active," "On-Hold for Billing," and "Subscribed to Newsletter." It saw a number and failed. This is the new front line of architecture: building intelligent systems that must coexist with the ghosts of a thousand past decisions.
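To make the gap concrete, here is a minimal Python sketch of the decoding step the agent had no way to perform. Only "Active" is tied to a specific bit in this article (the lowest one, value 1); the positions assigned to the other two flags, and the names FLAG_MEANINGS and decode_cust_status, are assumptions made for illustration.

```python
# Hypothetical flag table. Only the "Active" bit (value 1) is stated in the
# article; the positions of the other two flags are assumed for illustration.
FLAG_MEANINGS = {
    1: "Active",
    4: "On-Hold for Billing",       # assumed position
    8: "Subscribed to Newsletter",  # assumed position
}


def decode_cust_status(raw: int) -> list[str]:
    """Return the meaning of every known flag that is set in raw."""
    return [meaning for bit, meaning in FLAG_MEANINGS.items() if raw & bit]


print(decode_cust_status(13))
# ['Active', 'On-Hold for Billing', 'Subscribed to Newsletter']
```

Nothing in the column itself says this table exists, which is exactly the problem.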
The Siren Call of a Rewrite
The impulse to judge old code is a powerful one. When we see something like a bitmask in a modern database, our first reaction is often indignation. The desire for a total rewrite, a clean slate, is intoxicating. It’s a fantasy of control, where we can impose our superior patterns on a messy world.
But this instinct is a trap, and it's one of the most expensive mistakes a team can make. As Joel Spolsky argued years ago in his seminal essay, “Things You Should Never Do, Part I,” a rewrite often discards years of accumulated, hard-won knowledge. That "ugly" code is ugly for a reason; it has been shaped by the sharp edges of reality—bugs, deadlines, and platform limitations you can no longer see.
Ignoring that history doesn't erase it. It just ensures you'll have to painfully rediscover it all over again, one production outage at a time.
From Critic to Archaeologist
The most important shift in my career was moving from critic to archaeologist. The lesson began with a Perl script that orchestrated a nightly data load. It used global variables and file-based locks, and I thought it was garbage. I was wrong.
Years later, I learned the script was a clever solution to a brutal set of constraints. It ran in an environment with no configuration management and on a shared file system where proper distributed locks were impossible. This wasn't reckless technical debt; it was what Martin Fowler calls prudent and deliberate debt. The authors knew it wasn't ideal, but it solved the problem and, crucially, it worked. My judgment was a failure of curiosity.
The job of an architect in a modern, hybrid system isn't just to design new things. It's to excavate the unwritten rules of the old things. You must ask "why is it this way?" before you can safely build on top of it.
Excavating the Unwritten Rules
This archaeology requires a systematic approach, not just a vague change in mindset. My toolkit is simple but effective.
- git blame is your map. It points you to a person and a point in time. The real treasure is the commit message or the linked ticket. A message reading "Hotfix for #719" is a breadcrumb that leads you to the original context: the emergency, the trade-offs, the reason.
- Pull requests are the lost dialogues. The comment threads on a pull request often contain the entire debate. You'll find the argument for the "clean" solution and the pragmatic reason it was rejected in favor of the one that shipped.
- Reconstruct the environment. What version of the libraries were they using? What were the cloud provider's API limits back then? Bizarre workarounds often snap into focus when you realize they were written to sidestep a long-fixed bug in a dependency.
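The first step in that list, following a commit message to its ticket, can be partly automated. Below is a minimal Python sketch, assuming commit subjects reference tickets in the "#719" style and that each input line is a short hash followed by the subject, as a command like `git log --format='%h %s' -- <path>` would print. The function name ticket_refs and the sample hashes are invented for illustration.

```python
import re

# Assumes tickets are written as "#" followed by digits, e.g. "#719".
TICKET_RE = re.compile(r"#(\d+)")


def ticket_refs(log_lines):
    """Map each commit hash to the ticket numbers named in its subject."""
    refs = {}
    for line in log_lines:
        commit, _, subject = line.partition(" ")
        tickets = [int(t) for t in TICKET_RE.findall(subject)]
        if tickets:
            refs[commit] = tickets
    return refs


# Made-up sample input; real lines would come from the repository history.
sample = [
    "a1b2c3d Hotfix for #719",
    "d4e5f6a Refactor nightly loader",
]
print(ticket_refs(sample))
# {'a1b2c3d': [719]}
```

The output is only a reading list. The context still lives in the tickets and the people behind them.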
Approaching a system this way transforms your role. You stop being a critic of the past and start being a collaborator with it.
Why Agents Fail on Human-Made Systems
This skill is no longer optional. A human developer looking at that cust_status bitmask might be confused, but they can ask a senior teammate for context. An LLM agent cannot. It takes the schema at face value, and when the schema lies by omission, the agent's logic collapses.
Agentic systems are powerful, but they are also profoundly literal. They lack the context and intuition that humans use to navigate decades of architectural scar tissue. When we task an agent to "determine which active customers have overdue invoices," it will fail unless it knows that "active" is represented by the lowest bit of the cust_status integer. That knowledge exists in commit messages from 2003 and the memory of three retired engineers, not in the database schema.
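Here is roughly what a correct answer to that request looks like once the hidden rule is known. This is a sketch using Python's built-in sqlite3 module against an in-memory database. The table layout, the overdue column, and the sample rows are all hypothetical; the only detail taken from the article is that "active" is the lowest bit of cust_status.

```python
import sqlite3

ACTIVE_BIT = 1  # lowest bit of cust_status, per the article

# Hypothetical schema and rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, cust_status INTEGER);
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        overdue INTEGER
    );
    INSERT INTO customers VALUES (1, 13), (2, 12), (3, 1);
    INSERT INTO invoices VALUES (10, 1, 1), (11, 2, 1), (12, 3, 0);
    """
)


def active_customers_with_overdue_invoices(conn):
    """Return ids of customers whose active bit is set and who owe money."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.id
        FROM customers AS c
        JOIN invoices AS i ON i.customer_id = c.id
        WHERE (c.cust_status & ?) != 0  -- the rule the schema never states
          AND i.overdue = 1
        ORDER BY c.id
        """,
        (ACTIVE_BIT,),
    ).fetchall()
    return [row[0] for row in rows]


print(active_customers_with_overdue_invoices(conn))
# [1]
```

Customer 2 has an overdue invoice but a status of 12, so the active bit is clear. An agent filtering on something like cust_status = 1 would also drop customer 1, whose status is 13.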
An agent's brittleness on legacy systems is largely a function of the historical context it can't access. Our job is to bridge that gap. We are building systems where new agentic components must safely query and operate on deterministic systems that carry a generation of these hidden constraints.
Documenting for a Hybrid Future
The pragmatic takeaway is that we must change how we think about documentation. The code you write today will be someone else's legacy tomorrow. That "someone" might be a new hire, your future self, or an autonomous AI agent.
Be the ancestor you wish you had. Write commit messages that explain the *why*, not just the *what*. Use architecture decision records (ADRs) to capture the constraints and trade-offs that led to a design. For example: "We chose to use a bitmask here to maintain compatibility with the legacy mainframe billing system (Project Phoenix) until it is decommissioned in 2028."
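One way to keep that kind of rationale next to the data it explains is a small, machine-readable note that can be rendered as plain text for a human reader or an agent's context. The sketch below is one possible shape, not an established format: the ColumnNote class and its fields are invented for illustration, and the only flag listed is the one this article actually pins down.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnNote:
    """Hypothetical record tying a column to the reasoning behind it."""

    column: str
    rationale: str
    flags: dict[int, str] = field(default_factory=dict)

    def render(self) -> str:
        """Render the note as plain text for a reader or an agent prompt."""
        lines = [
            f"Column {self.column}: integer bitmask.",
            f"Why: {self.rationale}",
        ]
        for bit, meaning in sorted(self.flags.items()):
            lines.append(f"  value & {bit} != 0 means {meaning}")
        return "\n".join(lines)


note = ColumnNote(
    column="cust_status",
    rationale=(
        "Kept as a bitmask for compatibility with the legacy mainframe "
        "billing system (Project Phoenix) until it is decommissioned in 2028."
    ),
    flags={1: "Active"},  # only the bit the article states; others omitted
)
print(note.render())
```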
That one sentence is worthless to a code-completion bot, but it's a Rosetta Stone for the next human architect or the person tasked with building a knowledge base to fine-tune an agent. This slow, empathetic work of excavation and documentation is what allows us to build durable systems that don't just work, but evolve.