Replicating a legacy system at scale: the integration nobody blogs about

We want to build systems where AI agents can safely take action. But many of our most critical business functions are trapped inside legacy monoliths—brittle, undocumented black boxes. You can't let an LLM agent operate on a system whose side effects nobody fully understands. The first, unglamorous step toward an agentic future is often a journey into the deterministic past: replicating that legacy core.

This is an act of architectural archeology. The original blueprints are lost, the builders are gone, and the only source of truth is the system's behavior in production. The goal is to forge a new, reliable, and observable deterministic core that you can trust before you ever let an agent near it.

Establishing Ground Truth

The first casualty in these projects is the belief in a clean specification. The real spec is not a document; it's buried in the source code, the database triggers, and the half-forgotten cron jobs. My first step is to treat the running system as the artifact to be studied. This means observing its behavior under load, not just reading its code.

The challenge isn't just deciphering business logic. It's finding the implicit contracts between components. I once found a critical status change that wasn't in any application code. It was a database trigger, written by a long-gone engineer, that fired only when a separate process updated a seemingly unrelated table. A clean, modern rewrite would have missed this entirely. The only way to find these ghosts is to log every input and its resulting state change, reconstructing the state machine one transition at a time.
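Here is a minimal Python sketch of reconstructing transitions from such a log. It assumes a hypothetical log schema, `Observation`, that records the state read just before and just after each input. The state and event names are invented for illustration. The most useful signal is a conflict: the same state and input producing different outcomes. That suggests hidden behavior, like the trigger described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One logged input and the state change it produced (hypothetical schema)."""
    entity_id: str
    before: str  # state read just before the input was applied
    event: str   # the input or operation that was observed
    after: str   # state read just after the input was applied


def build_transition_table(observations):
    """Reconstruct a state machine from observed transitions.

    Returns (table, conflicts). The table maps (before, event) to the set of
    observed 'after' states. A key with more than one outcome is a conflict:
    the input alone does not determine the result, which hints at hidden
    state such as a trigger or a background job.
    """
    table = defaultdict(set)
    for obs in observations:
        table[(obs.before, obs.event)].add(obs.after)
    conflicts = {key: outs for key, outs in table.items() if len(outs) > 1}
    return dict(table), conflicts


if __name__ == "__main__":
    log = [
        Observation("order-1", "PENDING", "pay", "PAID"),
        Observation("order-2", "PENDING", "pay", "PAID"),
        Observation("order-3", "PENDING", "pay", "ON_HOLD"),  # same input, different outcome
        Observation("order-1", "PAID", "ship", "SHIPPED"),
    ]
    table, conflicts = build_transition_table(log)
    for (before, event), outcomes in sorted(conflicts.items()):
        # prints: PENDING --pay--> ['ON_HOLD', 'PAID']
        print(f"{before} --{event}--> {sorted(outcomes)}")
```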

[Figure: The Comparator Validation Flow]

The Comparator Pattern in Practice

You cannot validate a replica with a test suite alone. The only test that matters is production traffic. This leads to the most critical piece of temporary infrastructure you will build: the validation harness, or comparator.

The principle is simple. For every write operation sent to the legacy system, you fork the request and send it to the new system running in "shadow mode." The new system processes the request and saves its state, but its outputs are isolated: its responses are discarded, and none of its side effects reach users or downstream systems. Then a dedicated comparator service fetches the resulting state from both the legacy and new systems and performs a deep, field-by-field comparison.
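The comparison step can be sketched as a recursive diff over JSON-like records. This is a minimal Python sketch, not a production comparator. Fetching state from each system is left out, and the `(path, legacy_value, replica_value)` output format is an assumption. The diff deliberately treats values of different types as different, so `1` and `1.0` are flagged rather than silently equated.

```python
from typing import Any


class _Missing:
    """Sentinel for a key or list element present on only one side."""
    def __repr__(self) -> str:
        return "<missing>"


MISSING = _Missing()


def deep_diff(legacy: Any, replica: Any, path: str = "$") -> list[tuple[str, Any, Any]]:
    """Compare two JSON-like values field by field.

    Returns one (path, legacy_value, replica_value) tuple per mismatch.
    """
    diffs: list[tuple[str, Any, Any]] = []
    if isinstance(legacy, dict) and isinstance(replica, dict):
        # Walk the union of keys so fields missing on either side are reported.
        for key in sorted(set(legacy) | set(replica), key=str):
            diffs.extend(deep_diff(legacy.get(key, MISSING),
                                   replica.get(key, MISSING),
                                   f"{path}.{key}"))
    elif isinstance(legacy, list) and isinstance(replica, list):
        # Compare position by position; extra elements show up as MISSING.
        for i in range(max(len(legacy), len(replica))):
            left = legacy[i] if i < len(legacy) else MISSING
            right = replica[i] if i < len(replica) else MISSING
            diffs.extend(deep_diff(left, right, f"{path}[{i}]"))
    elif type(legacy) is not type(replica) or legacy != replica:
        # Type mismatches count: "10.00" vs 10.0, or 1 vs True, are discrepancies.
        diffs.append((path, legacy, replica))
    return diffs
```

Run against a legacy record `{"total": "10.00"}` and a replica record `{"total": 10.0}`, this reports `$.total` as a discrepancy, which is exactly the kind of implicit type contract the comparator is meant to surface.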

This isn't a novel trick; it's a battle-tested pattern for high-stakes migrations. The team at GitHub famously formalized this approach in Scientist, their open-source Ruby library, which runs the old and new code paths side by side, always returns the old path's result, and records any mismatch. They used it to rewrite critical parts of their platform. The comparator provides brutal, empirical honesty about whether your new system behaves identically to the old one.
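Scientist itself is a Ruby library, and what follows is not its API. This is a hypothetical Python analogue of its core idea. It runs the legacy code path (the control) and the new one (the candidate), always hands back the control's result, and publishes whether the two matched. The `run_experiment` and `publish` names are assumptions for illustration.

```python
import random
from typing import Any, Callable


def run_experiment(
    name: str,
    control: Callable[[], Any],
    candidate: Callable[[], Any],
    publish: Callable[[dict], None],
) -> Any:
    """Run control and candidate, return the control result, publish the comparison."""
    results: dict[str, Any] = {}

    def run_control() -> None:
        results["control"] = control()

    def run_candidate() -> None:
        try:
            results["candidate"] = ("ok", candidate())
        except Exception as exc:  # a failing candidate must never break the caller
            results["candidate"] = ("error", repr(exc))

    # Randomise the order so that order-dependent behaviour is more likely to surface.
    steps = [run_control, run_candidate]
    random.shuffle(steps)
    for step in steps:
        step()  # an exception from the control propagates, exactly as before

    status, value = results["candidate"]
    publish({
        "experiment": name,
        "matched": status == "ok" and value == results["control"],
        "control": results["control"],
        "candidate": value,
    })
    return results["control"]
```

Because both paths run inside the same request, this in-process style suits reads and side-effect-free logic. For writes, the shadow-mode setup described earlier keeps the new system's effects isolated.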

When Not to Replicate: The Strangler Fig Alternative

This level of parallel validation is a heavy investment. It's justified for monolithic, stateful cores where functional equivalence is non-negotiable—think billing engines or trading ledgers. But it isn't always the right tool.

The primary alternative, described by Martin Fowler as the Strangler Fig Application, offers a different path. Instead of replicating the entire system in parallel, you incrementally carve off pieces of functionality. You place a proxy in front of the legacy system and route calls for a specific feature—say, user profiles—to a new, standalone service. Over time, the new services "strangle" the old monolith, which shrinks until it can finally be retired.
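The routing decision at the heart of the proxy can be sketched in a few lines. This Python sketch assumes path-prefix routing and uses hypothetical backend addresses. A real deployment would typically express the same table in a reverse proxy or API gateway configuration rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str   # URL path prefix that has been carved off the monolith
    backend: str  # base URL of the service that now owns it


LEGACY_BACKEND = "http://legacy-monolith.internal"  # hypothetical address

ROUTES = [
    Route("/users/", "http://user-profile-service.internal"),  # hypothetical service
]


def resolve_backend(path: str, routes: list[Route] = ROUTES,
                    default: str = LEGACY_BACKEND) -> str:
    """Pick the backend for a request path; the longest matching prefix wins.

    Anything not yet migrated falls through to the legacy system, so the
    monolith keeps serving every feature until a route explicitly claims it.
    """
    matches = [r for r in routes if path.startswith(r.prefix)]
    if not matches:
        return default
    return max(matches, key=lambda r: len(r.prefix)).backend
```

Each migrated feature becomes one more entry in the route table, and the monolith "shrinks" as the share of traffic reaching the default backend falls.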

The choice is a key architectural trade-off. Choose the comparator pattern for indivisible cores. Choose the Strangler Fig when you can safely decompose the system and replace it piece by piece.

The Long Grind to Zero Discrepancies

The comparator's output is a dashboard of discrepancies, and it becomes the project's true status report. Early on, it's a firehose. You find subtle data type mismatches, character encoding differences, and inconsistent handling of nulls. These aren't just bugs; they are implicit definitions of correctness you must now make explicit.
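One way to make those implicit definitions explicit is a normalization step applied to both sides before diffing. The Python sketch below encodes three example rules. Each is an assumption that needs business sign-off, not a recommendation. The first is Unicode NFC normalization, which covers composed versus decomposed accents, one common flavour of encoding difference but not byte-level mojibake. The second treats empty strings as null. The third compares decimal values, but only for fields explicitly listed as numeric, so a leading-zero code such as `"007"` is not mangled.

```python
import unicodedata
from decimal import Decimal


def normalize_scalar(value, numeric=False):
    """Canonicalise one value under an agreed (hypothetical) equivalence policy."""
    if isinstance(value, str):
        # Composed "é" and decomposed "e" + combining accent become identical.
        value = unicodedata.normalize("NFC", value)
        # Assumed rule: legacy '' and replica NULL both mean "no value".
        if value == "":
            return None
    if numeric and value is not None and not isinstance(value, bool):
        # Compare by value, so "10.00", 10.0 and Decimal("10") are all equal.
        # A non-numeric string raises decimal.InvalidOperation: a discrepancy in itself.
        return Decimal(str(value))
    return value


def normalize_record(record: dict, numeric_fields=frozenset()) -> dict:
    """Apply normalize_scalar to every field of a flat record."""
    return {key: normalize_scalar(val, key in numeric_fields)
            for key, val in record.items()}
```

Keeping each rule in one small function means every agreed definition of "the same" lives in code you can review with the business, rather than in scattered comparator special cases.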

Each discrepancy requires a decision. Is this a bug in the new system? Or is it an intentional improvement—like adding idempotency—that the comparator needs to be taught to ignore? The legacy system also continues to change, with other teams shipping fixes and features. A new field added to the legacy database will immediately appear on the diff dashboard, alerting you that your target has moved.
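Those decisions can be captured as data, so the comparator sorts each difference instead of merely reporting it. In this hypothetical Python sketch, `ignored` lists fields where a difference is an accepted improvement. `known` lists every field the replica models, including the ignored ones. Any other field that appears in the legacy record is reported as new, which is how a moving target shows up.

```python
from dataclasses import dataclass, field

_MISSING = object()  # distinguishes "field absent" from "field is None"


@dataclass
class TriagePolicy:
    # Fields where a difference is a signed-off improvement (e.g. idempotency keys).
    ignored: set[str] = field(default_factory=set)
    # Every field the replica models; anything else in legacy is treated as new.
    known: set[str] = field(default_factory=set)


def triage(legacy: dict, replica: dict, policy: TriagePolicy) -> dict[str, list[str]]:
    """Sort field-level differences into bug candidates, ignored, and new legacy fields."""
    report: dict[str, list[str]] = {
        "bug_candidates": [], "ignored": [], "new_legacy_fields": [],
    }
    for key in sorted(set(legacy) | set(replica)):
        if key in legacy and key not in policy.known:
            # The legacy system grew a field the replica has never heard of.
            report["new_legacy_fields"].append(key)
        elif legacy.get(key, _MISSING) != replica.get(key, _MISSING):
            bucket = "ignored" if key in policy.ignored else "bug_candidates"
            report[bucket].append(key)
    return report
```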

The work is a long, slow grind toward zero. You ship a fix, and a class of diffs vanishes. When the comparator has been silent for weeks, you have earned the trust needed to begin the cutover. That final cutover is often the quietest moment of the project, because equivalence has already been proven.
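It helps to turn "silent for weeks" into an explicit, checkable rule. Here is a minimal Python sketch. It assumes the comparator records a daily count of unexplained discrepancies, meaning those left after ignore rules. The 21-day threshold is chosen purely for illustration.

```python
from datetime import date, timedelta


def silent_streak(daily_counts: dict[date, int], today: date) -> int:
    """Count consecutive days, ending at `today`, with zero unexplained discrepancies.

    A day with no recorded count breaks the streak: missing data is not
    evidence of silence.
    """
    streak = 0
    day = today
    while daily_counts.get(day) == 0:
        streak += 1
        day -= timedelta(days=1)
    return streak


def ready_for_cutover(daily_counts: dict[date, int], today: date,
                      required_days: int = 21) -> bool:
    """Hypothetical gate: begin the cutover only after `required_days` of silence."""
    return silent_streak(daily_counts, today) >= required_days
```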

[Figure: Architecture for AI on a Replicated Core]

Principles for a Successful Replication

Replicating a legacy core is less about writing new code and more about managing risk. The entire process is designed to build confidence that the new system is a true functional replacement. From that stable foundation, you can then build the more interesting agentic systems of the future.

The running system is the spec. Trust observation and instrumentation over historical documents.
Build the comparator first. This validation harness is the most important component; it guides and proves the entire effort.
Define "equivalence" with the business. Seemingly minor data differences can have major impact. Get explicit sign-off on what it means for the systems to be the same.
Use production traffic from day one. Shadowing real requests is the only way to uncover the long tail of edge cases and undocumented behaviors.
A cutover is a dial, not a switch. The confidence you build allows for a gradual, controlled, and reversible transition, starting with reads and moving slowly to writes.
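The dial can be sketched as deterministic percentage routing per operation type. In this Python sketch, the `dial` configuration and its `"read"`/`"write"` keys are hypothetical. Hashing a stable key keeps each account consistently on one side. Because the check is `bucket < percentage`, raising the dial only adds accounts to the replica and never moves existing ones back.

```python
import hashlib


def bucket(key: str) -> int:
    """Map a stable key (e.g. an account ID) to a bucket in [0, 100)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_replica(key: str, operation: str, dial: dict[str, int]) -> bool:
    """Decide whether this request is served by the replica.

    `dial` holds a rollout percentage per operation type, for example
    {"read": 25, "write": 0}. Unknown operations default to 0 (legacy).
    """
    return bucket(key) < dial.get(operation, 0)
```

A rollout following the principle above would start with something like `{"read": 5, "write": 0}`, raise reads first, and only then begin moving writes. Turning a value back down is the rollback lever for routing.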