The year I stopped writing apps and started moving data

It was 3 a.m. and the bug made no sense. A user’s shopping cart showed items they’d removed hours ago. The front-end state was correct, the API calls looked fine, but the database told a different story. For weeks, we treated it as a caching issue, a race condition, a client-side glitch. We were wrong. The problem wasn’t in the application code; it was in a silent, failing background job meant to sync inventory from a legacy system.

That was the moment the scales fell from my eyes. We weren't writing an application; we were orchestrating a fragile, poorly understood data ballet. That was the year I stopped thinking of myself as an app developer.

The Application is a Lens, Not the Center

In my early career, it felt like we spent the vast majority of our time on the top slice of the stack—the part the user touched. The world was defined by UI frameworks and API endpoints. The goal was always the feature, the user story, the visible change on the screen. The database was just a place to put things, a passive byproduct of application logic.

This is a dangerous illusion. The application is just a thin, transient lens for viewing a much more powerful and permanent force: the flow of data through the business. The real complexity wasn't in the button clicks. It was in the replication jobs, the impedance mismatch between data models, and the eventual consistency that was more “eventual” than “consistent.”

[Figure: Shift from App-Centric to Data-Centric View]

This isn't to say every project needs a complex data pipeline from day one. For a simple MVP proving out an idea, a classic monolithic app and database is often the right, pragmatic choice. But the moment a system needs to serve more than one master—an analytics team, a search index, another service—that app-centric model begins to break down.

Data Has Gravity and Truth Has Latency

The first major data migration I led cemented this new perspective. We had to move a monolithic customer database to a new set of services. The easy part was writing the new service code. The brutally hard part was moving five years of transaction history for millions of users, with zero downtime, while the old system was still taking live writes.

This is where you stop thinking about code and start thinking about physics. As Pat Helland articulated years ago in his foundational paper "Data on the Outside versus Data on the Inside," data shared between systems follows different rules than the ephemeral data inside a service. This "outside" data has gravity. It pulls services and processing toward it. We couldn’t just “move” it; we had to build dual-write systems, reconciliation jobs, and verification pipelines to prove that two different systems agreed on the definition of reality.
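To make the reconciliation idea concrete, here is a minimal sketch in Python. It is not the job we ran; the function names, the in-memory dictionaries standing in for the two databases, and the sample rows are all assumptions for illustration. It fingerprints each row in both stores and reports every key where they disagree.

```python
import hashlib
import json


def row_fingerprint(row: dict) -> str:
    """Stable hash of a row: keys are sorted so field order does not matter."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(old_rows: dict, new_rows: dict) -> dict:
    """Compare two stores keyed by primary key and report disagreements."""
    old_keys, new_keys = set(old_rows), set(new_rows)
    mismatched = sorted(
        key for key in old_keys & new_keys
        if row_fingerprint(old_rows[key]) != row_fingerprint(new_rows[key])
    )
    return {
        "missing_in_new": sorted(old_keys - new_keys),
        "unexpected_in_new": sorted(new_keys - old_keys),
        "mismatched": mismatched,
    }


# Hypothetical snapshots of the legacy database and the new service.
legacy = {
    1: {"email": "a@example.com", "balance": 100},
    2: {"email": "b@example.com", "balance": 250},
    3: {"email": "c@example.com", "balance": 75},
}
migrated = {
    1: {"balance": 100, "email": "a@example.com"},  # same data, different field order
    2: {"email": "b@example.com", "balance": 200},  # a dual write was lost
}

report = reconcile(legacy, migrated)
print(report)
```

With live writes on both sides, a real job would compare consistent snapshots or re-check mismatches after a short delay, since a row can legitimately differ for a moment while a write is in flight.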

These aren't application problems; they are systems architecture problems where data is the primary citizen. Success is when nothing appears to happen. The work is in building deterministic automation that is boring, reliable, and correct.

From State to a Log of Facts

The next evolution was to stop thinking about data as mutable state and start seeing it as an immutable stream of events. A user’s address isn't just a string in a database; it’s the result of a series of "AddressUpdated" events. This shift turns the database from a snapshot of the present into a replayable, durable log of everything that has ever happened.
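A small Python sketch shows what "state as the result of events" means in practice. The `AddressUpdated` class mirrors the event named above, but its fields, the `sequence` number, and the `current_address` helper are hypothetical choices for this example. The current address is never stored; it is derived by replaying the log, and replaying only part of the log recovers what was true earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AddressUpdated:
    user_id: str
    address: str
    sequence: int  # position in the log; assumed to be assigned by the log itself


def current_address(events: list, user_id: str, up_to: Optional[int] = None) -> Optional[str]:
    """Replay the log in order and return the user's address as of `up_to`."""
    address = None
    for event in sorted(events, key=lambda e: e.sequence):
        if up_to is not None and event.sequence > up_to:
            break
        if event.user_id == user_id:
            address = event.address
    return address


log = [
    AddressUpdated("u1", "12 Elm St", sequence=1),
    AddressUpdated("u2", "9 Oak Ave", sequence=2),
    AddressUpdated("u1", "48 Pine Rd", sequence=3),
]

print(current_address(log, "u1"))           # latest state
print(current_address(log, "u1", up_to=2))  # state as it was at sequence 2
```

Replaying from the beginning on every read does not scale, so systems built this way usually keep a materialized view or periodic snapshot and apply only newer events on top of it.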

This is the core idea behind what Jay Kreps, a creator of Apache Kafka, calls "The Log." The event stream becomes the single source of truth, decoupling producers from consumers. Of course, the demos never show the failure modes. I once spent a week debugging a production system where a downstream analytics consumer kept failing because our team had added a new, supposedly optional field to an event schema. The consumer’s brittle parser choked on it, halting a critical business report. It was a painful lesson: data contracts are APIs, and they must be treated with the same rigor.
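One defence against that kind of breakage is a tolerant reader: the consumer keeps the fields it knows, ignores the ones it does not, and fails loudly only when a required field is missing. The sketch below is a minimal Python illustration, not the parser from that incident; the `OrderPlaced` event, its fields, and `parse_order_placed` are hypothetical.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    amount_cents: int
    currency: str = "USD"  # optional field with a default


REQUIRED = {"order_id", "amount_cents"}


def parse_order_placed(raw: str) -> OrderPlaced:
    """Keep known fields, ignore unknown ones, reject missing required ones."""
    payload = json.loads(raw)
    known = {f.name for f in fields(OrderPlaced)}
    kept = {key: value for key, value in payload.items() if key in known}
    missing = REQUIRED - set(kept)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return OrderPlaced(**kept)


# The producer adds a new optional field; this consumer keeps working.
new_style = '{"order_id": "o-1", "amount_cents": 1999, "coupon_code": "SPRING"}'
order = parse_order_placed(new_style)
print(order)
```

Tolerance on the consumer side is only half of the contract. The producer still needs a compatibility check before a schema change ships, for example a schema registry or a contract test in CI.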

The Deterministic Bedrock for AI

This focus on data movement felt important a decade ago. Today, it has become the foundational challenge for building reliable AI. You cannot build a meaningful agentic system on top of a messy, untrustworthy data foundation.

An LLM agent is a powerful, non-deterministic engine. To ground it in reality for an enterprise context, you must feed it clean data from deterministic pipelines. The quality of a retrieval-augmented generation (RAG) system depends far less on your choice of vector database than on the quality and provenance of the data chunks you put into it. The "boring" work of data contracts, quality checks, and monitoring is the single greatest enabler of the "exciting" work in AI. The deterministic platform is what makes the probabilistic agent useful.
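As a rough illustration of what a quality check on chunks can look like, here is a small deterministic gate in Python that runs before anything reaches the index. The `Chunk` fields, the thresholds, and the `wiki://` URI are assumptions made up for this sketch, not a standard; the point is only that every chunk carries its provenance and is rejected for a stated reason.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Chunk:
    text: str
    source_uri: str       # where the text came from
    fetched_at: datetime  # when it was extracted


def validate_chunk(chunk: Chunk, now: datetime, max_age_days: int = 90) -> list:
    """Return a list of problems; an empty list means the chunk may be indexed."""
    problems = []
    if len(chunk.text.strip()) < 20:
        problems.append("text too short to be useful")
    if not chunk.source_uri:
        problems.append("missing provenance (source_uri)")
    if (now - chunk.fetched_at).days > max_age_days:
        problems.append("stale: older than max_age_days")
    return problems


# A fixed clock keeps the check reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
good = Chunk(
    "Refunds are processed within five business days.",
    "wiki://policies/refunds",
    datetime(2024, 5, 20, tzinfo=timezone.utc),
)
bad = Chunk("TODO", "", datetime(2023, 1, 1, tzinfo=timezone.utc))

indexable = [c for c in (good, bad) if not validate_chunk(c, now)]
print(validate_chunk(bad, now))
print(len(indexable))
```

Passing the clock in as an argument, rather than reading it inside the function, is what keeps the gate deterministic: the same chunk and the same `now` always produce the same verdict.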

[Figure: Architecture for Deterministic and Agentic Systems]

The Real Work is in the Flow

That late-night bug taught me that the most valuable work is often invisible. It’s not about adding more features to the surface, but strengthening the foundation underneath. I still write code that results in user interfaces, but I no longer believe that’s where the most durable, high-leverage work lies.

The core task is to model the flow of data first. Before sketching a wireframe, map how information moves, who transforms it, and who consumes it. The UI is just one of many consumers. Invest in the observability to trace a single fact as it moves through a dozen asynchronous systems. This isn't a luxury; it's a prerequisite for running anything serious in production. The architecture of the plumbing, it turns out, is what determines whether the house stands or falls.