jcardena.com Blog

Learning version control the hard way (after losing a week of work)

Web

A hard-won lesson on version control after losing a week of work. Basic Git is not enough for today's AI and data stacks. An architect's take on DVC, Git LFS, and why process is architecture.

The cold dread of a machine that won't boot is a specific kind of horror. For me, it happened after a week of intense, uncommitted work on a complex new system. The disk was dead, and with it went every line of code, every database migration, every small breakthrough. This wasn't a minor setback; it was a professional gut punch that taught me a permanent lesson: version control is not a feature but the absolute foundation of durable architecture.

A Week Erased

Early in my career, source control felt like overhead for big teams. My workflow was pure naivety: write code, save file. I trusted my local disk implicitly. The week I lost was a blur of progress, weaving together business logic, data transformations, and API integrations. I was in the zone, solving one hard problem after another, saving my files each night with a dangerous sense of security. There was no concept of a "commit," just the blind faith that bits on a spinning platter were permanent.

Figure: My Early, Fragile Workflow (Write Code Locally → Save File to Disk → Hardware Failure → Total Data Loss)

That single point of failure—the local disk—was a direct threat to the resilience of the system I was trying to build. When it failed, the recovery attempts were useless. I had to recreate an entire week of high-intensity work from memory, fighting the constant, nagging feeling that the lost version was better. The experience burned a principle into my philosophy: anything not committed to a remote repository is ephemeral and must be treated as such.

From Habit to Architecture

The immediate lesson was to make frequent, granular commits a non-negotiable habit. Every logical unit of work, every passing test, every small refactor gets its own commit with a message explaining the why. A local commit protects against my own mistakes; a remote push protects against fire, theft, and hardware failure. This discipline became the bedrock of my personal workflow. But as systems evolved, I realized the habit alone wasn't enough. The tools and process of version control are an architectural decision in themselves, especially now.
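The difference between "committed" and "pushed" can be checked mechanically. The following is a minimal Python sketch, assuming `git` is on the PATH and the current branch has an upstream configured. The names `Exposure`, `classify_exposure`, and `exposure_report`, along with the status labels, are hypothetical. It sorts work into three states: uncommitted (exists only on this disk), committed but unpushed (protected from my own mistakes, not from a dead disk), and pushed.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Exposure:
    uncommitted_files: int  # changed or untracked paths that exist only on local disk
    unpushed_commits: int   # commits that exist only in the local repository

    @property
    def level(self) -> str:
        # Hypothetical labels mirroring "local commit" vs "remote push".
        if self.uncommitted_files:
            return "at risk: uncommitted work lives only on this disk"
        if self.unpushed_commits:
            return "partial: committed locally but not pushed"
        return "protected: everything is pushed to the remote"


def classify_exposure(porcelain: str, ahead: str) -> Exposure:
    """Pure parsing step, testable without a repository.

    porcelain: output of `git status --porcelain` (one line per changed path)
    ahead: output of `git rev-list --count @{upstream}..HEAD`
    """
    changed = [line for line in porcelain.splitlines() if line.strip()]
    return Exposure(uncommitted_files=len(changed), unpushed_commits=int(ahead.strip() or 0))


def exposure_report(repo: str = ".") -> Exposure:
    """Run the two git commands; assumes an upstream branch is configured."""
    def git(*args: str) -> str:
        return subprocess.run(
            ["git", "-C", repo, *args], capture_output=True, text=True, check=True
        ).stdout

    return classify_exposure(
        git("status", "--porcelain"),
        git("rev-list", "--count", "@{upstream}..HEAD"),
    )
```

Calling `exposure_report().level` from a shell prompt or an end-of-day script is one way to make "if it isn't pushed, it doesn't exist" visible before shutting the laptop.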

The Challenge of the Modern Stack

Today's systems are a convergence of software, data, and AI. A single feature might involve application code, a sequence of versioned LLM prompts, a Jupyter notebook for experimentation, a multi-gigabyte model file, and a terabyte-scale training dataset. Simply dumping all of this into a standard Git repository is a recipe for disaster. Git is optimized for line-oriented text. Large binary files diff and compress poorly, and because every clone carries the full history by default, each new version of a multi-gigabyte model makes the repository heavier for everyone, permanently.
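A practical first step is finding the files that fall into the category Git handles poorly before they are committed. The following is a minimal Python sketch. Both size thresholds are arbitrary assumptions. The binary check is a simple heuristic (a NUL byte in the first few kilobytes), similar in spirit to how Git guesses at binary content, but it is not Git's exact logic.

```python
import os
from pathlib import Path

TEXT_LIMIT = 50 * 1024 * 1024   # assumption: flag any file of 50 MiB or more
BINARY_LIMIT = 1 * 1024 * 1024  # assumption: binaries of 1 MiB or more are suspect
SNIFF_BYTES = 8000              # how much of each file to inspect for NUL bytes


def looks_binary(path: Path) -> bool:
    """Heuristic: a NUL byte near the start of a file usually means binary content."""
    with path.open("rb") as fh:
        return b"\x00" in fh.read(SNIFF_BYTES)


def find_heavy_files(root: str) -> list[tuple[str, int, bool]]:
    """Return (relative path, size in bytes, is_binary) for candidates, largest first."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip Git's own object store
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_file():  # skip broken symlinks and special files
                continue
            size = path.stat().st_size
            binary = looks_binary(path)
            if size >= (BINARY_LIMIT if binary else TEXT_LIMIT):
                hits.append((os.path.relpath(path, root), size, binary))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Anything this reports is a candidate for Git LFS or DVC rather than a plain commit.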

This is where a deeper architectural perspective is required. We have to acknowledge Git's limitations and adopt patterns that address them. For large binary files like model weights, Git Large File Storage (LFS) is a common first step. It replaces each large file in the repository with a small text pointer and stores the actual content on a separate LFS server. This keeps the primary Git history lean while the day-to-day workflow stays largely transparent to the developer. For the much harder problem of versioning massive datasets and tying them to specific model experiments, a more specialized tool is needed. I've found Data Version Control (DVC) to be an incredibly durable pattern. DVC works alongside Git, using it to version small metadata files that point to the actual data, which lives in cheaper object storage. This gives you reproducibility without breaking Git. You can check out a specific commit and get the exact code, model, and data that produced a result.
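The pointer-file idea shared by both tools fits in a few lines of code. The following is a toy Python illustration, not how DVC or Git LFS is actually implemented. It copies a large file into a content-addressed local directory, which stands in for an LFS server or object storage. It then writes a small text pointer in the Git LFS pointer layout (`version`, `oid sha256:<hash>`, `size`). The function names and the `.pointer` suffix are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Stream the file so multi-gigabyte artifacts never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def store_and_point(artifact: Path, store: Path) -> Path:
    """Copy the artifact into a content-addressed store and write a small pointer.

    The pointer (small, text, diffable) is what would be committed to Git;
    the store directory stands in for remote object storage.
    """
    oid = sha256_of(artifact)
    size = artifact.stat().st_size
    blob = store / oid[:2] / oid[2:]          # shard by hash prefix
    blob.parent.mkdir(parents=True, exist_ok=True)
    if not blob.exists():                     # identical content is stored only once
        shutil.copyfile(artifact, blob)
    pointer = artifact.with_name(artifact.name + ".pointer")
    pointer.write_text(f"version {LFS_SPEC}\noid sha256:{oid}\nsize {size}\n")
    return pointer


def resolve(pointer: Path, store: Path) -> Path:
    """Follow a pointer back to the stored blob, as a checkout would."""
    fields = dict(line.split(" ", 1) for line in pointer.read_text().splitlines())
    oid = fields["oid"].removeprefix("sha256:")
    return store / oid[:2] / oid[2:]
```

The pointer changes whenever the content changes, so an older commit holds the pointer to the exact bytes used at that time. That is where the reproducibility comes from.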

Process is an Architectural Choice

Beyond the tools, the process your team follows is also a critical architectural choice that affects velocity and stability. The simple "commit and push" workflow of a solo developer doesn't scale. You're immediately faced with decisions about branching strategy. As Martin Fowler and other practitioners have written for years, the trade-offs are significant. Do you use long-lived feature branches that keep work isolated but can lead to painful merge conflicts? Or do you adopt trunk-based development, which encourages smaller, frequent merges to the main branch and is a prerequisite for true continuous integration? The answer depends on your team, your release cadence, and your risk tolerance. It's not a Git command; it's an architectural pattern for how work flows through your system.
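One way to make this trade-off visible is to flag branches that have stayed open too long or drifted too far from main. Those are the branches most likely to produce painful merges. The following is a minimal Python sketch. The thresholds (two days, 20 commits behind) are illustrative assumptions, not a rule from the trunk-based development literature, and the `BranchInfo` fields are hypothetical. In practice they could be filled from the commit date of `git merge-base main <branch>` and from `git rev-list --count <branch>..main`, which counts commits on main that the branch lacks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BranchInfo:
    name: str
    diverged_at: datetime  # e.g. commit date of `git merge-base main <branch>`
    behind_main: int       # e.g. `git rev-list --count <branch>..main`


def risky_branches(
    branches: list[BranchInfo],
    now: datetime,
    max_age: timedelta = timedelta(days=2),  # assumption: "short-lived" means days
    max_behind: int = 20,                    # assumption: tolerable drift from main
) -> list[str]:
    """Describe branches whose age or drift suggests a painful merge later."""
    flagged = []
    for b in branches:
        age = now - b.diverged_at
        reasons = []
        if age > max_age:
            reasons.append(f"open {age.days}d")
        if b.behind_main > max_behind:
            reasons.append(f"{b.behind_main} commits behind main")
        if reasons:
            flagged.append(f"{b.name}: {', '.join(reasons)}")
    return flagged
```

A team on long-lived feature branches would loosen both thresholds. A team practicing trunk-based development would tighten them and treat any hit as a prompt to merge.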

Figure: A Durable Versioning Architecture for AI Systems. Development artifacts (application code, LLM prompts, notebooks, ML models, large datasets) flow through a versioning and control plane (Git for code, Git LFS for models, DVC for datasets, CI/CD pipeline) into storage and registry (code repository, object storage such as S3/GCS, model registry), then on to a serving layer (APIs and endpoints, agentic runtimes, dashboards).
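The artifact-to-tool mapping in the architecture diagram can be written down as an explicit routing rule. That way, the choice of tool for each file becomes a reviewed decision rather than an accident. The following is a minimal Python sketch. The suffix list and directory names are illustrative assumptions about a repository layout. It also produces the `.gitattributes` lines that `git lfs track "*.ext"` writes for LFS-bound patterns.

```python
from pathlib import PurePosixPath

# Assumed conventions; adjust to the actual repository layout.
LFS_SUFFIXES = {".pt", ".safetensors", ".onnx", ".pkl"}  # model weights
DVC_DIRS = {"data", "datasets"}                            # large datasets


def route(path: str) -> str:
    """Decide which versioning layer should own a given repository path."""
    p = PurePosixPath(path)
    if p.parts and p.parts[0] in DVC_DIRS:
        return "dvc"      # Git tracks only a small .dvc metadata file
    if p.suffix in LFS_SUFFIXES:
        return "git-lfs"  # Git stores a pointer; content lives on the LFS server
    return "git"          # code, prompts, configs, notebooks


def lfs_gitattributes() -> str:
    """The .gitattributes lines `git lfs track` would add for each suffix."""
    return "".join(
        f"*{suffix} filter=lfs diff=lfs merge=lfs -text\n"
        for suffix in sorted(LFS_SUFFIXES)
    )
```

Keeping the rule in one reviewed place means a new artifact type prompts a deliberate decision instead of silently landing in plain Git.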

The Durable Takeaways

That painful data loss years ago instilled a discipline that has scaled from simple scripts to complex, distributed AI platforms. The core lesson wasn't just to "use version control," but to think of versioning as a foundational architectural layer that must be designed with intent.

  • Treat local work as temporary. If it isn't committed and pushed to a replicated, remote location, it doesn't exist.
  • Choose the right tool for the artifact. Use Git for code, but use Git LFS, DVC, or similar specialized tools for the models and datasets that break vanilla Git.
  • Your branching strategy is architecture. The way your team merges code directly impacts your ability to ship reliably and quickly. Treat it as a first-class design decision.
  • Version everything. In the modern stack, the code, prompts, model configurations, and data pipeline definitions are all interdependent parts of the system. If it can change, it must be versioned.
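"Version everything" becomes checkable when each run records a fingerprint of every input it depended on. The following is a minimal Python sketch. It assumes the inputs are files on disk and that the current Git commit hash is passed in. The manifest layout and the `build_manifest` name are hypothetical and do not follow the format of DVC or any model registry. Two runs with the same `manifest_digest` used identical code, prompts, configuration, and data.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, streamed in 1 MiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(git_commit: str, artifacts: dict[str, Path]) -> dict:
    """Record the exact version of each interdependent input to a run.

    artifacts maps a role (e.g. "prompt", "config", "dataset") to a file path.
    """
    entries = {role: file_digest(path) for role, path in sorted(artifacts.items())}
    body = {"git_commit": git_commit, "artifacts": entries}
    # One digest over canonical JSON identifies the whole combination.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    body["manifest_digest"] = hashlib.sha256(canonical.encode()).hexdigest()
    return body
```

Storing this manifest next to a model in the registry ties the model back to the commit, prompts, and data that produced it. If any single prompt or config changes, the digest changes too.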

The least glamorous parts of our work are often the most important. A well-designed version control strategy isn't exciting, but it’s the quiet bedrock that ensures the exciting parts can be built, maintained, and trusted to work at 3am.

Juan Cardena
Enterprise Architect, Data & AI

Enterprise architect with 25 years across web, software, data, and AI. MIT CDAO ’25. Writing on agentic AI in production.