When 'it works on my machine' stopped being an excuse
The journey from 'it works on my machine' to reproducible, containerized systems. Why codifying the entire environment is non-negotiable for modern AI and data systems.
The bug report was baffling. A critical batch ETL for customer segmentation was failing in production, but the error logs were maddeningly vague. I pulled the latest code, ran the exact same process against a data sample on my local machine, and watched it complete flawlessly. For a moment, I felt that familiar, toxic relief: "It works on my machine."
That feeling used to be a shield. It implied the code was correct and the world was wrong. But staring at the green "SUCCESS" message on my screen, it felt like an admission of failure. My machine wasn't the world. It wasn't even a meaningful representation of it. My pristine, perfectly configured laptop was a fantasy island, and the system was failing for real users on the mainland.

The Local Environment as a Comfortable Lie
For years, a developer's local setup was an artisanal craft. We'd install databases, tweak configuration files, and set environment variables by hand. We created delicate, unique artifacts, like a Python environment where every project mutated the global site-packages. The whole setup was a sandcastle built from memory and shell history.
The problem is that a workshop is not a factory. The code doesn't run in a vacuum; it runs on an operating system with specific patch levels, shared libraries, network endpoints, and permissions, and on a developer laptop those permissions are almost always more generous than in production. "It works on my machine" is a symptom of this drift. It's a statement that your N-of-1 experiment succeeded, while the scaled-up, locked-down reality is failing.
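One way to make that drift visible is to write the environment down and diff it. The following is a minimal sketch, assuming it is run once on the laptop and once on the production host and the two results are compared; the function names `environment_fingerprint` and `diff_fingerprints` are hypothetical, and the fingerprint covers only the Python version, the OS, and installed Python packages, not shared libraries or permissions.

```python
import platform
from importlib import metadata


def environment_fingerprint() -> dict:
    """Collect a small, comparable description of the current environment."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            packages[name.lower()] = dist.version
    return {
        "python": platform.python_version(),
        "os": f"{platform.system()} {platform.release()}",
        "machine": platform.machine(),
        "packages": packages,
    }


def diff_fingerprints(local: dict, remote: dict) -> list[str]:
    """Return human-readable differences between two fingerprints."""
    diffs = []
    for key in ("python", "os", "machine"):
        if local.get(key) != remote.get(key):
            diffs.append(f"{key}: {local.get(key)!r} != {remote.get(key)!r}")
    local_pkgs = local.get("packages", {})
    remote_pkgs = remote.get("packages", {})
    # Walk the union so packages present on only one side are reported too.
    for name in sorted(set(local_pkgs) | set(remote_pkgs)):
        if local_pkgs.get(name) != remote_pkgs.get(name):
            diffs.append(
                f"package {name}: {local_pkgs.get(name)!r} != {remote_pkgs.get(name)!r}"
            )
    return diffs
```

An empty diff does not prove the two machines are equivalent, but a non-empty one is a concrete place to start looking before anyone reaches for the excuse.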

From Shipping Code to Shipping Environments
The first meaningful assault on this problem came from virtualization, but it was containerization, specifically Docker, that truly ended the excuse. The paradigm shifted completely. The atomic unit of deployment was no longer a script; it became a container image: a self-contained package including the application, its dependencies, and a slice of the filesystem. The Dockerfile became a version-controlled recipe for the environment, and a repeatable one as long as base images and dependency versions are pinned.
This wasn't a trend; it was a durable architectural pattern that enforced honesty through principles of immutability and isolated dependencies. It's the modern expression of ideas like the "Phoenix Server" described by Martin Fowler and the immutable infrastructure championed by Chad Fowler: the notion that we should be able to burn down and rebuild our servers from a clean, automated state. Suddenly, the developer's machine and the production cluster could run the exact same artifact. The conversation changed from "it works on my machine" to "does the container run everywhere?"
This rigor isn't free. It demands an upfront investment in tooling and a steeper learning curve. Yet, my experience shows the long-term returns in stability and reduced debugging time far outweigh this initial cost.
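Part of that upfront investment is keeping the recipe honest. An image tag can be re-pointed at different contents over time, while a digest reference (`name@sha256:...`) identifies one exact image. As a rough sketch of the kind of check a team might add to CI, the hypothetical helper below scans Dockerfile text and lists base images that are not pinned by digest. It is deliberately simple: it ignores line continuations and will flag images supplied through build arguments.

```python
import re

# Matches a digest-pinned image reference such as "name@sha256:<64 hex chars>".
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(dockerfile_text: str) -> list[str]:
    """Return the FROM image references that are not pinned by digest."""
    stages = set()  # names introduced earlier with "FROM ... AS <name>"
    unpinned = []
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split()
        if not parts or parts[0].upper() != "FROM":
            continue
        # Drop optional flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        refers_to_stage = image in stages
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])
        # "scratch" and earlier build stages are not external images.
        if image == "scratch" or refers_to_stage:
            continue
        if not DIGEST_RE.search(image):
            unpinned.append(image)
    return unpinned


# Usage with a made-up Dockerfile; the digest is a placeholder, not a real image.
example = "\n".join([
    "FROM python:3.12-slim AS build",
    "FROM build AS test",
    "FROM python@sha256:" + "a" * 64,
])
print(unpinned_base_images(example))  # ['python:3.12-slim']
```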
AI Magnified the Problem by an Order of Magnitude
Just as we solved the environment problem for stateless software, a new beast emerged. The "machine" in a modern AI/data system is far more than an OS and some libraries. As the classic 2015 paper "Hidden Technical Debt in Machine Learning Systems" highlighted, the scope of dependencies explodes. The environment now includes:
- Data Schemas and Distributions: A model trained on clean data behaves very differently when it sees the messy, late-arriving data of a production stream.
- Model Artifacts: A multi-gigabyte model file is a critical dependency. A slight mismatch in the tokenization library between training and inference can cause silent catastrophe, producing subtly different token IDs that make model predictions drift fatally.
- Hardware and Drivers: An agentic system that runs on a local NVIDIA GPU can fail on a cloud instance because of differences in the driver version, the CUDA toolkit version, or the GPU architecture.
The excuse is back with a new vocabulary: "The model performed well in the notebook." It's the same comfortable lie, applied to a more complex stack. If the entire pipeline isn't reproducible, local performance is irrelevant.
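One hedge against the notebook version of the excuse is to record, at training time, what the model was built with, and to refuse to serve when the runtime disagrees. The sketch below assumes a manifest of my own invention (a dict with `packages` and `schema` keys) rather than any standard format; `check_runtime_contract` and the package name in the usage example are hypothetical.

```python
from importlib import metadata


def check_runtime_contract(manifest: dict, row: dict) -> list[str]:
    """Compare the live environment and one input row against a training-time manifest.

    Assumed manifest layout (hypothetical):
        {"packages": {name: version}, "schema": {column: python_type_name}}
    """
    problems = []

    # 1. Library versions, e.g. the tokenizer used at training time.
    for name, wanted in manifest.get("packages", {}).items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            problems.append(f"{name}: trained with {wanted}, running {found}")

    # 2. Input schema: every expected column present, with the expected type.
    schema = manifest.get("schema", {})
    for column, type_name in schema.items():
        if column not in row:
            problems.append(f"missing column {column!r}")
        elif type(row[column]).__name__ != type_name:
            actual = type(row[column]).__name__
            problems.append(f"column {column!r}: expected {type_name}, got {actual}")
    for column in row:
        if column not in schema:
            problems.append(f"unexpected column {column!r}")

    return problems


# Usage with made-up values; "hypothetical-tokenizer-lib" is not a real package.
manifest = {
    "packages": {"hypothetical-tokenizer-lib": "1.2.3"},
    "schema": {"user_id": "int", "spend": "float"},
}
for problem in check_runtime_contract(manifest, {"user_id": 7, "spend": "12.5"}):
    print(problem)
```

This checks types on a single row only. Distribution shift, the other half of the first bullet above, needs statistical monitoring that a contract check like this does not attempt.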
Building Systems That Cannot Lie
The solution is to apply the same deterministic principles to the entire stack. We must codify everything. As Martin Fowler describes in his work on Infrastructure as Code, this means defining the network, compute, and permissions in version-controlled files. Tools for data and model versioning, like those explained in DVC's documentation, treat these artifacts with the same rigor as source code. Declarative pipelines define the exact steps to build, test, and deploy every component.
The goal is to create a system where a single command can stand up the entire stack, identically every time. The most durable patterns are the ones that force this honesty upon us, like hermetic builds or immutable infrastructure deployments. They make it impossible to hide on our private, perfectly working islands.
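The idea underneath data and model versioning is content addressing: identify an artifact by a hash of its bytes, commit the hash, and verify it before use. The sketch below illustrates that idea with the standard library only. It is not DVC's API or file format, and the function names and manifest layout are assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte artifacts never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(paths: list[Path], manifest_path: Path) -> dict:
    """Record the current hash of each artifact; the manifest is what gets committed."""
    manifest = {str(p): sha256_of(p) for p in paths}
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def verify_manifest(manifest_path: Path) -> list[str]:
    """Return the artifacts that are missing or whose contents no longer match."""
    manifest = json.loads(manifest_path.read_text())
    changed = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.exists() or sha256_of(path) != expected:
            changed.append(name)
    return changed
```

A pipeline step can call `verify_manifest` first and stop if the list is non-empty, which turns "I think this is the same model file" into something the system checks on every run.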
Key Takeaways
Abandoning "it works on my machine" is a change in professional mindset. It’s the transition from being a writer of code to being an engineer of a running system. It is about intellectual honesty and a respect for the complexity of production.
- Treat your local environment as a simulation, not the truth. Its purpose is to provide a high-fidelity preview of reality, so strive to develop inside production-like containers from the first line of code.
- Codify everything. If you can't recreate it from a script in version control, it doesn't really exist. This includes infrastructure, dependencies, data schemas, and pipeline definitions.
- Shrink the feedback loop. The CI/CD pipeline is the true arbiter of "does it work?" Make it fast and easy to run, so it becomes a natural part of your workflow, not a final gate.
- Own the entire system. A practitioner's responsibility is a working, reliable system in production, not just a clever algorithm in a notebook.