Master data management, explained by my own mistakes
A personal story of building a Master Data Management (MDM) hub after a disastrous meeting. Learn the real-world architecture and trade-offs of this essential system.
I remember the meeting perfectly. We were in a long, narrow conference room, and the CFO asked a simple question: "How many customers do we have?"
The VP of Sales went first, pulling up a dashboard from their CRM. Then the Head of Product showed a chart from the application database. Finally, the controller presented a number from the billing system. Three different senior leaders, three different systems, and three wildly different numbers. The silence that followed was heavy. Nobody was wrong, but we were collectively useless. That was my first, sharp lesson in the necessity of master data.
The Anatomy of a Simple Question
The problem wasn't a technical bug; it was a failure of definition. For Sales, a "customer" included active leads. For Product, it was a unique user account. For Finance, it was a legal entity that had paid an invoice. Each definition was correct within its own operational context, but their collision created system-wide chaos. We couldn't trust our own reports because we couldn't agree on the most fundamental nouns of the business.
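To make the collision concrete, here is a small Python sketch. The records and field names (is_lead, account_id, legal_entity, paid) are invented for illustration and are not taken from our real systems. The only point is that three defensible definitions give three different answers to the same question.

```python
# Hypothetical, hand-made records; every field name here is an assumption.
crm_contacts = [
    {"email": "ann@acme.example", "is_lead": False},
    {"email": "bob@acme.example", "is_lead": False},
    {"email": "cara@globex.example", "is_lead": True},  # active lead, never bought
    {"email": "dev@initech.example", "is_lead": True},
]
app_accounts = [
    {"account_id": 1, "email": "ann@acme.example"},
    {"account_id": 2, "email": "bob@acme.example"},
    {"account_id": 3, "email": "ann.personal@mail.example"},  # second account, same person
]
billing_invoices = [
    {"legal_entity": "Acme Corp", "paid": True},
    {"legal_entity": "Acme Corp", "paid": True},
    {"legal_entity": "Globex LLC", "paid": False},
]


def sales_customer_count(contacts):
    # Sales: every CRM contact counts, active leads included.
    return len(contacts)


def product_customer_count(accounts):
    # Product: one customer per unique user account.
    return len({a["account_id"] for a in accounts})


def finance_customer_count(invoices):
    # Finance: distinct legal entities with at least one paid invoice.
    return len({i["legal_entity"] for i in invoices if i["paid"]})


counts = {
    "sales": sales_customer_count(crm_contacts),
    "product": product_customer_count(app_accounts),
    "finance": finance_customer_count(billing_invoices),
}
print(counts)  # {'sales': 4, 'product': 3, 'finance': 1}
```

None of the three functions has a bug. They simply answer three different questions that happen to share a noun.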
This is the classic entry point to MDM. It’s not about bad data, but about fragmented, context-specific data masquerading as universal truth. The instinct is to just pick one system as the "source of truth," but that rarely holds up.
Choosing an Architecture, Not Just a Source
The naive fix is to declare one system—usually the CRM—as the definitive source. I’ve seen this fail. No single operational system is built to serve every other system's needs. The real architectural shift is to stop looking for a passive "source of truth" and start building an intentional "system of record." This isn't just semantics. A system of record is an active, curated component purpose-built for this job.
We considered buying a large enterprise MDM suite, but the cost and complexity were too high for solving just one entity. We also knew that simply anointing the data warehouse wouldn't work, as its role is analytical, not operational. So we chose to build a small, focused service.
Our Pragmatic MDM Hub
The service we architected had a clear, deterministic job. It ingested candidate records for "Customer" from the source systems into a staging area. A matching engine identified which candidates referred to the same customer, then merged them by applying a set of "survivorship rules" we had painstakingly defined in workshops. For example: the legal name was authoritative from the billing system, while the primary contact email was authoritative from the CRM. The result was a single "golden record" in a dedicated database, which held a unique, immutable master ID and linked back to the original source records for lineage.
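The merge step can be sketched in a few lines of Python. This is a minimal illustration, not our production code: it assumes matching has already grouped the records, and the class names, field names, and ID formats are all hypothetical. Only the two survivorship rules (legal name from billing, primary email from the CRM) come from the description above.

```python
import uuid
from dataclasses import dataclass, field

# Which source system is authoritative for each attribute.
# These two rules are the ones described in the text.
SURVIVORSHIP_RULES = {
    "legal_name": "billing",
    "primary_email": "crm",
}


@dataclass
class SourceRecord:
    system: str       # e.g. "crm" or "billing"
    source_id: str    # the record's key in that system
    attributes: dict


@dataclass
class GoldenRecord:
    master_id: str
    attributes: dict
    lineage: list = field(default_factory=list)  # (system, source_id) pairs


def build_golden_record(matched, master_id=None):
    """Merge records that matching has already judged to be one customer."""
    by_system = {r.system: r for r in matched}
    merged = {}
    for attr, authoritative_system in SURVIVORSHIP_RULES.items():
        record = by_system.get(authoritative_system)
        # Take the value only from the authoritative system, and only if present.
        if record is not None and record.attributes.get(attr):
            merged[attr] = record.attributes[attr]
    return GoldenRecord(
        # Assumption: a real hub would look up and reuse an existing master ID;
        # a fresh one is generated here only when none is supplied.
        master_id=master_id or str(uuid.uuid4()),
        attributes=merged,
        lineage=[(r.system, r.source_id) for r in matched],
    )


matched = [
    SourceRecord("crm", "C-17",
                 {"legal_name": "acme", "primary_email": "ann@acme.example"}),
    SourceRecord("billing", "B-204",
                 {"legal_name": "Acme Corp", "primary_email": "ap@acme.example"}),
]
golden = build_golden_record(matched, master_id="M-0001")
print(golden.attributes)
# {'legal_name': 'Acme Corp', 'primary_email': 'ann@acme.example'}
print(golden.lineage)
# [('crm', 'C-17'), ('billing', 'B-204')]
```

Keeping the rules in a plain table rather than in branching logic made them something stakeholders could read and argue about, which is where most of the real work happened.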
This hub-and-spoke model broke the point-to-point integration mess, where every system talked directly to every other system. All roads led to and from the master data hub. When analytics needed a customer list or a new microservice needed customer data, they consumed it from the hub's simple, stable APIs.
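From a consumer's point of view, the hub boils down to two lookups: translate a source-system key into a master ID, and fetch the golden record for that ID. The Python class below is an in-memory stand-in written for illustration. Its name and methods are hypothetical and do not describe the actual API we shipped.

```python
class CustomerHub:
    """In-memory stand-in for the hub's read interface (hypothetical)."""

    def __init__(self):
        self._golden = {}     # master_id -> golden record attributes
        self._crosswalk = {}  # (system, source_id) -> master_id

    def register(self, master_id, attributes, source_keys):
        """Store a golden record and link every source key to its master ID."""
        self._golden[master_id] = dict(attributes)
        for key in source_keys:
            self._crosswalk[key] = master_id

    def resolve(self, system, source_id):
        """Translate a source-system key into a master ID, or None if unknown."""
        return self._crosswalk.get((system, source_id))

    def get_customer(self, master_id):
        """Return a copy so callers cannot mutate the hub's state."""
        record = self._golden.get(master_id)
        return dict(record) if record is not None else None


hub = CustomerHub()
hub.register(
    "M-0001",
    {"legal_name": "Acme Corp", "primary_email": "ann@acme.example"},
    [("crm", "C-17"), ("billing", "B-204")],
)

# Two systems with different local keys converge on the same master ID.
crm_view = hub.resolve("crm", "C-17")
billing_view = hub.resolve("billing", "B-204")
print(crm_view == billing_view)  # True
print(hub.get_customer(crm_view)["legal_name"])  # Acme Corp
```

Each spoke only needs to know its own keys and the hub's interface, so adding a new consumer means one new integration rather than one per existing system.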
The Scars and Honest Trade-Offs
This solution wasn't free. The biggest cost was political: getting stakeholders to agree on those survivorship rules. But the biggest scar came from teams bypassing the hub. A team on a deadline once connected directly to the CRM for an "urgent" campaign, polluting a key report for a week until we caught it. It taught us that an MDM architecture isn't just code; it's a social contract you have to constantly defend.
Our centralized hub was the right call for us then. Today, many advocate for a more decentralized "Data Mesh" approach, as Zhamak Dehghani detailed in her foundational work on the topic. While powerful, that pattern requires a level of domain-team maturity we didn't have. A single, deterministic hub was a more pragmatic first step to establish control.
What Actually Worked in Practice
Mastering data is a foundational piece of architecture, not a background task. The pain of that project taught me a few durable lessons. First, start with one entity—the one causing the most pain—and solve it well to demonstrate value. Second, recognize that MDM is an ongoing process of governance, not a one-time product install. The hard part is the human agreement, and that work is never done. Finally, the single most valuable output is that stable, permanent master identifier. It's the key that unlocks a unified view of your business, and it needs an executive sponsor who can enforce its adoption when the political battles begin.
Getting a straight answer to a simple question shouldn't be a luxury. It's the price of entry for building reliable systems, from simple dashboards to complex AI agents. And sometimes, the architecture required to deliver it is forged in the fire of a really embarrassing meeting.