What I wish I'd learned about AI five years sooner

I remember the first project where we plugged a generative model into a critical business process. The demo felt like magic. But two weeks into integration testing, the effort collapsed. The model started hallucinating valid-looking but nonexistent object IDs, and the JSON it returned failed schema validation in roughly one out of every 20 calls. We had built a system around a miracle, and when the miracle sputtered, we had nothing.

That failure taught me the most important lesson of my recent career: the specs for the AI model are not the specs for your system. The real work is building the architecture around the model's fundamental nature.


From Magic Box to Probabilistic Cog

My initial mental model was all wrong. I saw the LLM as an endpoint, a function call that would just return the right answer. This is the "magic box" fallacy, and it leads to brittle, untrustworthy systems.

A better model is to view the LLM as a single probabilistic cog in a much larger, mostly deterministic machine. The cog introduces uncertainty, and the rest of the machine has to manage it. Your job as an architect is to design the machine to function reliably even when that cog slips. Others working at scale describe something similar. In Andrej Karpathy's "LLM OS" framing, for example, the model plays the role of a kernel, and most of the system is the memory, tools, and I/O arranged around it. The real architectural work is building that operating system, not just calling the kernel.

[Figure: Mental Model Shift]

Nondeterminism is a Constraint, Not a Bug

Software engineers spend their careers trying to eliminate nondeterminism. With generative AI, that instinct is a liability. The model's ability to produce varied outputs from the same prompt is its core strength, but for a system, that variation is a risk you must explicitly manage.

This means you never trust a single generation for a critical task. Instead, you build resilience patterns:

Validation Layers: Does the output conform to the required schema? Does it meet business rules? This is your first line of defense.
Self-Correction Loops: If validation fails, re-prompt the model with the error: "You provided invalid JSON. Please correct it." This works, but it has trade-offs. An aggressive loop can obscure the root cause of failures or lead to spiraling costs, so cap the number of attempts and log every rejection.
Deterministic Fallbacks: If multiple attempts fail, the system must have a safe path. That could mean routing to a human, using a simple template, or returning a known-good default state.

The key is to assume the model's output will vary and to define, in code, what "acceptable variation" means for your system.
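
To make that concrete, here is a minimal Python sketch of the three patterns working together: a validation layer, a bounded self-correction loop, and a deterministic fallback. Everything named here is hypothetical and chosen for illustration: the required fields, the KNOWN_OBJECT_IDS set, the fallback payload, and the call_model function, which stands in for whatever client you use to reach the model.

```python
import json
from typing import Callable, List, Optional, Tuple

# Hypothetical contract: the model must return a JSON object with these fields.
REQUIRED_FIELDS = {"object_id": str, "action": str}
# Assumed business rule: only IDs that exist in our own system are acceptable.
KNOWN_OBJECT_IDS = {"obj-001", "obj-002"}
# Assumed safe default when every attempt fails.
FALLBACK = {"object_id": None, "action": "route_to_human"}


def validate(raw: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (parsed, None) if the output is acceptable, else (None, error)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"Invalid JSON: {exc.msg}."
    if not isinstance(data, dict):
        return None, "Expected a JSON object."
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            return None, f"Field '{name}' is missing or not a {expected_type.__name__}."
    if data["object_id"] not in KNOWN_OBJECT_IDS:
        return None, f"Unknown object_id '{data['object_id']}'."
    return data, None


def generate_with_guardrails(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Validate each generation, re-prompt with the error, then fall back."""
    errors: List[str] = []
    current_prompt = prompt
    for _ in range(max_attempts):  # bounded, so retry cost cannot spiral
        data, error = validate(call_model(current_prompt))
        if data is not None:
            return data
        errors.append(error)
        current_prompt = (
            f"{prompt}\n\nYour previous reply was rejected: {error} Please correct it."
        )
    # Deterministic fallback; keep the errors so the root cause stays visible.
    return {**FALLBACK, "errors": errors}
```

The validate function is where "acceptable variation" gets defined. Note that the ID check runs against data the system owns, so a plausible-looking but nonexistent ID is rejected the same way malformed JSON is.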

The Architecture of Truth

Five years ago, I thought we could "fine-tune" bias and hallucinations away. That was naive. Bias is a fossil record of the training data. Hallucinations are a fundamental byproduct of a system designed to generate plausible sequences, not to state verified facts.

You don't solve these problems inside the model; you mitigate them in the surrounding architecture. The most durable pattern for this is Retrieval-Augmented Generation (RAG), first formally described in the 2020 paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" by Lewis et al. The principle is simple: never let the model be its own source of truth. The system provides the facts from a trusted source, and the model's job is to synthesize and summarize. The reliability comes from your data pipeline, not the LLM's raw knowledge.
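
A toy Python sketch of that principle follows. It is not the method from the Lewis et al. paper, which uses a learned dense retriever. Here a simple word-overlap ranking stands in for retrieval, and the documents, the Document type, and the call_model function are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str


# Hypothetical trusted source; in practice a database or search index.
TRUSTED_DOCS = [
    Document("policy-12", "Refunds are available within 30 days of purchase."),
    Document("policy-19", "Shipping to Canada takes 5 to 7 business days."),
]

NO_SOURCE_ANSWER = "No trusted source covers this question."


def retrieve(question: str, docs: List[Document], top_k: int = 1) -> List[Document]:
    """Toy retriever: rank documents by how many lowercase words they share."""
    terms = set(question.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d) for d in docs]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:top_k] if score > 0]


def build_prompt(question: str, sources: List[Document]) -> str:
    """Give the model the facts and restrict it to synthesizing them."""
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in sources)
    return (
        "Answer using only the sources below and cite their IDs. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


def answer(question: str, call_model: Callable[[str], str]) -> str:
    sources = retrieve(question, TRUSTED_DOCS)
    if not sources:
        # Nothing trusted to ground on, so the model is never asked.
        return NO_SOURCE_ANSWER
    return call_model(build_prompt(question, sources))
```

The design choice worth noticing is the early return in answer: when retrieval finds nothing, the system responds deterministically instead of letting the model improvise.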

Human-in-the-Loop is a Permanent Pattern

My biggest mistake was seeing human-in-the-loop (HITL) as a temporary crutch. It felt like an admission of failure. This was completely backward. For any high-stakes workflow, HITL is an essential and permanent architectural pattern.

The goal isn't full autonomy; it's to build a powerful "centaur." This isn't a new idea. Garry Kasparov explored it decades ago with "Advanced Chess," where a human paired with a computer could outplay both a grandmaster and a chess engine playing alone. The human provides judgment, context, and ethical oversight. The AI provides speed and scale. The real architectural challenge becomes designing the interface between them so a human can make a better, faster decision.
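
One way to picture that interface is a routing step that decides which model proposals a person must see, plus a review card that puts the evidence in front of them. The Python sketch below is a rough illustration under assumed names: Proposal, ReviewQueue, the amount field as a measure of stakes, and the AUTO_APPROVE_LIMIT threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed threshold: anything above this amount always goes to a person.
AUTO_APPROVE_LIMIT = 100.0


@dataclass
class Proposal:
    summary: str          # what the model suggests doing
    evidence: List[str]   # facts the system retrieved to support it
    amount: float         # hypothetical measure of what is at stake


@dataclass
class ReviewQueue:
    pending: List[Proposal] = field(default_factory=list)

    def submit(self, proposal: Proposal) -> None:
        self.pending.append(proposal)


def route(proposal: Proposal, queue: ReviewQueue) -> str:
    """Auto-approve only low-stakes proposals that come with evidence."""
    if proposal.amount <= AUTO_APPROVE_LIMIT and proposal.evidence:
        return "auto_approved"
    queue.submit(proposal)
    return "needs_human_review"


def review_card(proposal: Proposal) -> str:
    """Render what the reviewer sees: the suggestion plus its evidence."""
    lines = [f"Proposed: {proposal.summary}", "Evidence:"]
    lines += [f"  - {item}" for item in proposal.evidence] or ["  (none)"]
    return "\n".join(lines)
```

The routing rule is deliberately plain code rather than another model call, so the decision about when a human is involved stays deterministic and auditable.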

[Figure: Architecture for a Hybrid System]

What I Would Tell Myself

Looking back, the details of specific models mattered far less than this shift in architectural perspective. If I were starting over, I would ground every design in three principles:

Isolate the Probabilistic Core. Treat the LLM as an untrusted component, not an oracle. Wrap it in a clear, deterministic contract that expects and handles failure.
Architect for Variation. Your system's reliability is a function of the deterministic code around the model. Treat validation, retry logic, and safe fallbacks as first-class concerns.
Design the Human Interface. For anything that truly matters, human oversight is a permanent feature. The system for presenting evidence and enabling human judgment is as important as the model itself.

Building with AI today feels less like software engineering and more like systems engineering for a hybrid workforce of deterministic code and probabilistic agents. Accepting that reality five years sooner would have been the best trade I ever made.