When the database became the most important part of the app

Data

Application code is transient, but data is permanent. Learn why shifting to a data-centric architecture is the key to building durable, reliable systems for AI.

I still remember the bug. It was a Tuesday, around 1 AM, and the on-call pager had been screaming for an hour. A batch job was corrupting customer accounts, but not all of them, and not in any way that made sense. The application logs were a firehose of noise. We traced the logic, we added more logging, we stared at the code until our eyes burned. The problem wasn't in the code we were looking at. It was in the code we had forgotten.

The root cause was a state machine, implemented entirely in the application layer, that had drifted out of sync with a second service that also modified the same data. The application thought an account was in state `A` while the database clearly held state `B`. Everything we built on that flawed assumption was an exercise in futility. The fix wasn't another line of code; it was a `CHECK` constraint in the database. That was the night I stopped seeing the database as a bit bucket and started seeing it as the heart of the system.

The Dumb Bucket Fallacy

Early in my career, the application was king. We spent our time debating object-oriented patterns and service layers. The database was an afterthought, an implementation detail hidden behind an Object-Relational Mapper (ORM). We’d talk about making it "swappable," as if the Oracle database holding petabytes of critical transaction data could be hot-swapped for PostgreSQL on a whim.

This wasn't an irrational choice. We had good reasons. We wanted testable business logic, not inscrutable stored procedures. We needed to scale the stateless application tier horizontally, a much easier problem than scaling a single, stateful database. We were trying to avoid vendor lock-in, believing an ORM would give us portability.

But this abstraction came at a steep cost. By treating the database as a dumb persistence layer, we invited chaos. I’ve seen systems grind to a halt under the weight of thousands of N+1 queries, each one firing off a separate database call because a developer was just accessing a property on an object. More insidiously, this approach encourages data models that mirror the transient state of the user interface, not the durable, canonical state of the business. The application is a guest; the data is the home. The code you write today will be replaced in five years. The data, if you’re successful, will live for decades.
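
To make the N+1 pattern concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The schema, the `QueryCounter` helper, and both function names are invented for illustration; a real ORM hides the same per-row queries behind innocent-looking property access.

```python
import sqlite3


def make_db(n_customers):
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total_cents INTEGER NOT NULL
        );
    """)
    ids = range(1, n_customers + 1)
    conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                     [(i, f"customer-{i}") for i in ids])
    # Two orders per customer.
    conn.executemany("INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
                     [(i, 100 * i) for i in ids for _ in range(2)])
    conn.commit()
    return conn


class QueryCounter:
    """Wraps a connection and counts how many statements we send to it."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def totals_n_plus_one(db):
    # One query for the customers, then one more query per customer.
    result = {}
    for customer_id, name in db.query("SELECT id, name FROM customers"):
        rows = db.query(
            "SELECT COALESCE(SUM(total_cents), 0) FROM orders WHERE customer_id = ?",
            (customer_id,),
        )
        result[name] = rows[0][0]
    return result


def totals_single_query(db):
    # The same answer, computed by the database in a single statement.
    rows = db.query("""
        SELECT c.name, COALESCE(SUM(o.total_cents), 0)
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
    """)
    return dict(rows)


slow = QueryCounter(make_db(100))
fast = QueryCounter(make_db(100))
assert totals_n_plus_one(slow) == totals_single_query(fast)
print(slow.count, fast.count)  # 101 statements versus 1
```

Against an in-memory database the difference is invisible. Put a network round trip behind each of those 101 statements and it becomes the outage.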

[Figure: The 'Dumb Bucket' Architecture Flow. User Request → Application Logic → ORM Abstraction → Inefficient Queries → Database]

Finding the System's True Center

Every system has a center of gravity. For any application that does more than calculate pi, that center is its state. It's the data. All your microservices, your serverless functions, your front-end frameworks—they are just satellites pulled into orbit around the gravitational mass of the database.

This isn't a new or radical idea, though it's often forgotten in the rush for the next framework. It’s a core theme in foundational texts on system design, like Martin Kleppmann’s Designing Data-Intensive Applications. The principle is simple: application code is transient, but the data and its structure endure.

When you get the data model right, the application logic often becomes radically simpler. I once worked on a complex scheduling system where we spent weeks writing validation code to prevent double-bookings. We had race conditions and locking issues. The eventual solution was a single, elegant exclusion constraint (`EXCLUDE USING gist`) at the database level. It made a dozen classes of application-level bugs impossible by definition. The database guaranteed the integrity of the state, and the application code could just trust it.
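
Here is a rough sketch of the idea, not the scheduling system itself. SQLite (used only so the example runs without a server) has no exclusion constraints, so a `BEFORE INSERT` trigger stands in for `EXCLUDE USING gist`; the PostgreSQL form appears in a comment and is not executed here. The `bookings` table, its columns, and the `book` helper are hypothetical.

```python
import sqlite3

# In PostgreSQL the rule can be declared on the table itself, roughly:
#
#   CREATE EXTENSION btree_gist;
#   CREATE TABLE bookings (
#       room_id int,
#       during  tstzrange,
#       EXCLUDE USING gist (room_id WITH =, during WITH &&)
#   );
#
# The SQLite trigger below is a stand-in for that declaration. It guards
# INSERT only; a fuller version would guard UPDATE as well.

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookings (
        id        INTEGER PRIMARY KEY,
        room_id   INTEGER NOT NULL,
        starts_at TEXT NOT NULL,   -- ISO-8601 strings sort chronologically
        ends_at   TEXT NOT NULL,
        CHECK (starts_at < ends_at)
    );

    CREATE TRIGGER bookings_no_overlap
    BEFORE INSERT ON bookings
    FOR EACH ROW
    WHEN EXISTS (
        SELECT 1 FROM bookings
        WHERE room_id = NEW.room_id
          AND starts_at < NEW.ends_at
          AND NEW.starts_at < ends_at
    )
    BEGIN
        SELECT RAISE(ABORT, 'booking overlaps an existing booking');
    END;
""")


def book(room_id, starts_at, ends_at):
    """Try to insert a booking; return True if the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO bookings (room_id, starts_at, ends_at) VALUES (?, ?, ?)",
                (room_id, starts_at, ends_at),
            )
        return True
    except sqlite3.DatabaseError:
        # Raised by both the trigger and the CHECK constraint.
        return False


first = book(1, "2024-05-01T09:00", "2024-05-01T10:00")       # accepted
overlapping = book(1, "2024-05-01T09:30", "2024-05-01T10:30")  # rejected
adjacent = book(1, "2024-05-01T10:00", "2024-05-01T11:00")     # accepted
other_room = book(2, "2024-05-01T09:30", "2024-05-01T10:30")   # accepted
```

The application code never checks for overlaps. It attempts the write and handles a rejection, which is the whole point: the rule lives in one place, and every writer is subject to it.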

When Logic Belongs to the Data

The pendulum in our industry swings hard. We fled from massive, thousand-line stored procedures—the "fat database" pattern—for very good reasons. They were often untestable, impossible to version control, and written in arcane dialects of SQL. The move to put most business logic in the application layer was, on the whole, a good one.

But we went too far. In our zeal, we decided that *no* logic belongs in the database. This is throwing the baby out with the bathwater. Certain classes of logic are safer, more performant, and more correct when they live right next to the data.

  • Integrity Constraints: A `FOREIGN KEY` or `CHECK` constraint is business logic. It’s a rule that says, "an order must belong to a valid customer." Enforcing it once, in the database, is far more reliable than re-implementing it in five different microservices that might touch that data.
  • Data-Intensive Operations: If you need to aggregate millions of rows to produce a single number, it is profoundly inefficient to pull all that data over the network into an application server to process. This is what databases are built for. A view or database function does the work next to the data and sends back only the result, which is often dramatically faster.
  • Stable Data APIs: Database views are a fantastic way to create a stable, backward-compatible API for other systems, particularly for analytics. You can refactor the underlying tables, but as long as the view remains the same, you don’t break downstream consumers.
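
The three cases above can be sketched in a few lines. This is an illustration under assumptions, not a reference schema: it uses Python's built-in `sqlite3` so it runs anywhere, and the `customers` and `orders` tables, the `customer_revenue` view, and the `try_write` helper are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (
        id     INTEGER PRIMARY KEY,
        status TEXT NOT NULL
               CHECK (status IN ('active', 'suspended', 'closed'))
    );

    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
    );

    -- A stable read API: consumers query the view, not the tables.
    CREATE VIEW customer_revenue AS
        SELECT c.id AS customer_id,
               COUNT(o.id) AS order_count,
               COALESCE(SUM(o.total_cents), 0) AS revenue_cents
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id;
""")


def try_write(sql, params):
    """Attempt a write and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_write("INSERT INTO customers (id, status) VALUES (?, ?)", (1, "active")),
    # Typo in the status: stopped by the CHECK constraint.
    try_write("INSERT INTO customers (id, status) VALUES (?, ?)", (2, "actve")),
    try_write("INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)", (1, 2500)),
    # No such customer: stopped by the foreign key.
    try_write("INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)", (99, 100)),
    # Negative total: stopped by the CHECK constraint.
    try_write("INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)", (1, -5)),
]

revenue = conn.execute("SELECT * FROM customer_revenue").fetchall()
```

Any service, script, or ad hoc SQL session that touches these tables gets the same answer to a bad write. And if `orders` is later split or renamed, the view can be redefined to keep the same three columns for downstream readers.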

The Foundation for Modern AI Work

This database-centric philosophy has become more critical, not less, in the age of AI. An LLM agent’s ability to perform useful work is completely dependent on the quality and structure of its world model. That world model, in a business context, is the database. An agent trying to operate on a poorly defined, inconsistent data model is like a person trying to navigate a city with the wrong map.

This has become tangible as databases evolve. The logic isn't just about constraints anymore; it's about capability. The rise of extensions like `pgvector` for PostgreSQL shows this trend clearly. Instead of pulling data out to a separate service for vector search, the work happens *inside* the database, right next to the source of truth. This can reduce latency, simplifies the architecture, and keeps embeddings in the same transactional store as the rows they describe, which matters for agentic systems that rely on semantic search.

The convergence of software, data, and AI is happening right here. The integrity of that truth store is paramount. When an automation pipeline reads that a customer's status is "active," it must be able to trust it absolutely. That trust isn't built in the Python script running the pipeline; it's built in the schema and transactionality of the database.

What This Means in Practice

Shifting to a data-centric view isn't about writing stored procedures for everything. It's a change in priority. It’s about respecting the database as a powerful, active partner in your architecture, not a passive storage bucket. In my work now, this translates to a few concrete principles:

[Figure: Where Should Logic Live? A Decision Guide. For data integrity: constraints, foreign keys, checks, exclusion. For data-intensive work: views, materialized views, functions. For application flow: API endpoints, UI logic, state machines.]

  • Model the data first. Before writing a single API endpoint, we whiteboard the core entities, their relationships, and the invariants. This is the most important design document.
  • Treat the schema as code. Schema migrations are reviewed with the same rigor as application code. They are part of the same pull request. They are the most critical code we write.
  • Learn to read a query plan. An `EXPLAIN ANALYZE` is the most honest form of architectural feedback you will ever receive. It tells you the truth about how your object model translates to disk I/O.
  • Push logic down to the data. When a rule is about data integrity, enforce it where the data lives. When an operation is data-intensive, perform it where the data is stored. Don't pull data over the network just to satisfy an architectural dogma.
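
As a small taste of the query-plan habit, the sketch below asks the database how it intends to run the same query before and after an index exists. It uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in for PostgreSQL's `EXPLAIN ANALYZE`: SQLite reports the strategy only, without executing the query or measuring timings. The `orders` table and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        total_cents INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
    [(i % 500, i) for i in range(10_000)],
)
conn.commit()

QUERY = "SELECT id, total_cents FROM orders WHERE customer_id = ?"


def plan(sql, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last of the four columns in each row.
    return " | ".join(row[3] for row in rows)


before = plan(QUERY, (42,))  # no usable index: a scan of the whole table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(QUERY, (42,))   # the planner now searches via the new index

print("before:", before)
print("after: ", after)
```

The exact wording of the plan varies between SQLite versions, but the shift from a full scan to an index search is the part worth learning to spot. The same reading skill carries over to PostgreSQL, where `EXPLAIN ANALYZE` also runs the query and reports what actually happened.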

The application code will change. The frameworks will be replaced. But the data will endure. Give it the respect it deserves.

Juan Cardena
Enterprise Architect, Data & AI

Enterprise architect with 25 years across web, software, data, and AI. MIT CDAO ’25. Writing on agentic AI in production.