The first agent I built that actually did something useful

My first half-dozen AI agents were impressive failures. They excelled at demos, stringing together API calls in a way that felt like magic. But when faced with the messy reality of production (a slightly malformed input, a transient network error), they collapsed. They were glorified prompt chains, houses of cards that looked intelligent until a breeze came along. I only built something that worked day in and day out once I stopped trying to build an artificial brain and focused on building a better robot.

[Figure: Reliable Agent Workflow]

From Brittle Chains to Predictable Failure

The initial allure of agentic patterns is powerful. You read foundational papers like ReAct: Synergizing Reasoning and Acting in Language Models, and you imagine an LLM reasoning its way through any problem. My first attempts were exactly this: give a model a high-level goal, a list of tools, and let it generate a "plan" to execute.

It worked, but only on the perfect happy path. The moment reality intruded, the sequence fell apart. The agent would see an error message about a missing customer_id and, instead of looking it up, would try to pass the customer's name as the ID to the billing API, causing it to fail repeatedly. It had no state, no memory of what had succeeded, and no robust way to recover. It was a black box of hope, and hope is not a strategy for reliable systems.

The Breakthrough: A Deterministic Skeleton

The pivot felt counterintuitive. To make the agent more capable, I had to make it less autonomous. Instead of asking the LLM to invent a workflow, I defined the workflow myself as a rigid, deterministic state machine. This is a classic software pattern, really—Inversion of Control. The framework, not the component, dictates the flow.

Think of it as a factory assembly line. The line is fixed; it dictates the order of operations. The LLM agent is a sophisticated robotic arm on that line. It doesn't decide what happens next, but at each station, it performs complex, nuanced tasks—like picking the right bolts based on a fuzzy natural language order. The overall process is predictable and auditable, but the individual steps are powered by intelligent adaptation.
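As a rough illustration of the assembly-line idea (a sketch, not the code that ran in production), the line can be a fixed list of stations that the framework walks in order. The names `run_line`, `make_parse_station`, `lookup_station`, and `fake_llm_extract` are all hypothetical, and the "LLM" here is a stub so the example runs offline.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Station = Callable[[Context], Context]


def make_parse_station(llm_extract: Callable[[str], dict]) -> Station:
    """Wrap an LLM-backed extractor as one station on the line."""
    def station(ctx: Context) -> Context:
        return {**ctx, "entities": llm_extract(ctx["request"])}
    return station


def lookup_station(ctx: Context) -> Context:
    # Plain deterministic code: no model is involved at this station.
    project = ctx["entities"]["project"]
    return {**ctx, "customer_id": f"cust-for-{project}"}


def run_line(stations: List[Station], ctx: Context) -> Context:
    """The framework owns the order; a station never picks the next step."""
    for station in stations:
        ctx = station(ctx)
    return ctx


def fake_llm_extract(text: str) -> dict:
    # Stand-in for a real model call (assumption for this sketch).
    return {"project": "project-alpha"}


result = run_line(
    [make_parse_station(fake_llm_extract), lookup_station],
    {"request": "total spend for project-alpha last quarter"},
)
```

The model is injected into one station and called by the line. It never gets to reorder, skip, or invent stations, which is the inversion of control described above.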

The Job Too Annoying for a Human

The perfect test case was a reporting task everyone hated. The business team needed to pull data from three different internal systems to answer customer questions. A typical request arriving by email might be: "What was the total spend for the customer on `project-alpha` last quarter, but exclude the `widget-beta` service fees?"

A human would have to:

1. Parse the natural language to identify the key entities: customer, timeframe, exclusions.
2. Query the project system API to find the customer ID for `project-alpha`.
3. Write a SQL query for the data warehouse to get spend data.
4. Write another query to identify and subtract the specific service fees.
5. Combine and format the results into a clean summary.

This was too complex for a simple script due to the fuzzy front-end, yet too repetitive and error-prone for a person to do all day. It was the ideal candidate for an agent built on a deterministic skeleton.
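To make the split between the fuzzy front-end and the mechanical back-end concrete, here is a small sketch of what the parsing step might produce and how the final arithmetic reduces to a subtraction. The `SpendRequest` fields, the function names, and the amounts are assumptions made up for illustration; the real systems and schemas are not described in this post.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class SpendRequest:
    """Entities extracted from the email. Field names are assumptions."""
    project: str
    period: str
    excluded_services: List[str] = field(default_factory=list)


def net_spend(total: Decimal, excluded_fees: Decimal) -> Decimal:
    """Total spend minus the fees the requester asked to exclude."""
    if excluded_fees > total:
        raise ValueError("excluded fees exceed total spend; check the queries")
    return total - excluded_fees


def format_summary(req: SpendRequest, total: Decimal, excluded_fees: Decimal) -> str:
    net = net_spend(total, excluded_fees)
    excluded = ", ".join(req.excluded_services) or "nothing"
    return f"{req.project} ({req.period}): {net:.2f} net spend, excluding {excluded}"


request = SpendRequest("project-alpha", "last quarter", ["widget-beta"])
# The amounts below are invented illustration values, not real data.
summary = format_summary(request, Decimal("1200.00"), Decimal("150.00"))
```

Only the construction of `SpendRequest` needs language understanding. Everything after it is ordinary, testable code.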

How It Worked: Tools, State, and Guardrails

The architecture was straightforward and, frankly, a bit boring. Which is why it worked.

The State Machine: The core was a simple state machine defined in code: PARSING_REQUEST, FETCHING_CUSTOMER_ID, QUERYING_SPEND, and so on. The agent always knew exactly where it was. A failure in one state didn't corrupt the entire run; it just meant that state needed to be retried or escalated.
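A minimal sketch of such a state machine, using the three state names mentioned above. The states the post leaves out stay omitted; `DONE`, `ESCALATED`, `MAX_ATTEMPTS`, and the handler signature are assumptions added so the example is complete and runnable.

```python
from enum import Enum, auto
from typing import Callable, Dict


class State(Enum):
    PARSING_REQUEST = auto()
    FETCHING_CUSTOMER_ID = auto()
    QUERYING_SPEND = auto()
    DONE = auto()       # assumed terminal state
    ESCALATED = auto()  # assumed hand-off to a human

# Fixed transition table: the model never chooses the next state.
NEXT_STATE = {
    State.PARSING_REQUEST: State.FETCHING_CUSTOMER_ID,
    State.FETCHING_CUSTOMER_ID: State.QUERYING_SPEND,
    State.QUERYING_SPEND: State.DONE,
}
MAX_ATTEMPTS = 3  # assumed retry budget per state

Handler = Callable[[dict], None]


def run(handlers: Dict[State, Handler], ctx: dict) -> State:
    state = State.PARSING_REQUEST
    while state in NEXT_STATE:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                handlers[state](ctx)
                break  # this state succeeded; stop retrying
            except Exception as exc:
                ctx.setdefault("errors", []).append((state.name, attempt, str(exc)))
        else:
            return State.ESCALATED  # every attempt failed
        state = NEXT_STATE[state]
    return state


calls = {"n": 0}


def flaky_fetch(ctx: dict) -> None:
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient network error")
    ctx["customer_id"] = "c-42"


handlers = {
    State.PARSING_REQUEST: lambda ctx: ctx.update(project="project-alpha"),
    State.FETCHING_CUSTOMER_ID: flaky_fetch,
    State.QUERYING_SPEND: lambda ctx: ctx.update(spend=100),
}
ctx: dict = {}
final_state = run(handlers, ctx)
```

In the demo run, the fetch step fails once and succeeds on the second attempt. The failure is recorded against that one state, and the results of the parsing step are left untouched.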

A Curated Tool Belt: The agent had a small number of hardened functions, like getCustomerId(projectName) or executeSpendQuery(...). The LLM's only job was to generate the JSON arguments for these pre-defined functions based on the current state and the user's request. This bounded its creativity and prevented a world of security and reliability risks.
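One way to enforce that boundary is a small registry that validates the model's JSON before anything runs. The sketch below is an assumption about how this could look, not the original implementation: the `Tool` dataclass, `call_tool`, and the in-memory `PROJECTS` lookup are hypothetical, and only the tool name `getCustomerId` and its `projectName` argument come from the text above.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# Hypothetical stand-in for the project system API.
PROJECTS = {"project-alpha": "cust-001"}


def get_customer_id(project_name: str) -> Optional[str]:
    return PROJECTS.get(project_name)


@dataclass(frozen=True)
class Tool:
    params: Dict[str, type]              # argument name -> required type
    run: Callable[[Dict[str, Any]], Any]


TOOLS: Dict[str, Tool] = {
    "getCustomerId": Tool(
        params={"projectName": str},
        run=lambda args: get_customer_id(args["projectName"]),
    ),
}


def call_tool(name: str, raw_json_args: str) -> Any:
    """Run a registered tool with model-generated JSON, or refuse."""
    tool = TOOLS.get(name)
    if tool is None:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(raw_json_args)  # malformed JSON raises a ValueError subclass
    if not isinstance(args, dict) or set(args) != set(tool.params):
        raise ValueError(f"{name} expects exactly {sorted(tool.params)}")
    for key, expected_type in tool.params.items():
        if not isinstance(args[key], expected_type):
            raise ValueError(f"{key} must be {expected_type.__name__}")
    return tool.run(args)


customer_id = call_tool("getCustomerId", '{"projectName": "project-alpha"}')
```

Anything the model produces that is not a known tool name with exactly the expected arguments is rejected before it touches a real system.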

The Self-Correction Loop: This was the critical piece. If a tool failed—say, getCustomerId returned null because the project name was misspelled—the system didn't crash. The state machine would catch the error and pass the error message back to the LLM with a prompt like: "Executing getCustomerId with projectName='project-alfa' failed with 'not found'. Based on the original request, can you correct the parameter and retry?" The LLM, great at fuzzy matching, would often correct "alfa" to "alpha," and the process would continue seamlessly.
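The loop can be sketched in a few lines. This is an illustration under stated assumptions: `ToolError`, `call_with_correction`, the retry limit, and the in-memory `PROJECTS` table are hypothetical, and `fake_llm` is a hard-coded stand-in for a real model call. The retry prompt follows the wording quoted above.

```python
from typing import Callable, List

# Hypothetical stand-in for the project system.
PROJECTS = {"project-alpha": "cust-001"}


class ToolError(Exception):
    """Raised when a tool cannot complete its job."""


def get_customer_id(project_name: str) -> str:
    try:
        return PROJECTS[project_name]
    except KeyError:
        raise ToolError("not found") from None


def build_retry_prompt(tool: str, param: str, value: str, error: str) -> str:
    return (
        f"Executing {tool} with {param}='{value}' failed with '{error}'. "
        "Based on the original request, can you correct the parameter and retry?"
    )


def call_with_correction(
    project_name: str,
    ask_llm: Callable[[str], str],
    max_retries: int = 2,  # assumed limit before giving up
) -> str:
    for attempt in range(max_retries + 1):
        try:
            return get_customer_id(project_name)
        except ToolError as exc:
            if attempt == max_retries:
                raise  # out of retries: let the state machine escalate
            prompt = build_retry_prompt(
                "getCustomerId", "projectName", project_name, str(exc)
            )
            project_name = ask_llm(prompt)
    raise AssertionError("unreachable")


prompts_seen: List[str] = []


def fake_llm(prompt: str) -> str:
    # A real model would read the prompt; this stub just fixes the typo.
    prompts_seen.append(prompt)
    return "project-alpha"


customer_id = call_with_correction("project-alfa", fake_llm)
```

The important design choice is the bounded retry count. The model gets a limited number of chances to repair its own argument, after which the error propagates and the surrounding state machine decides what to do.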

[Figure: Deterministic Agent Architecture]

The Durable Lesson: Automations, Not Brains

The final agent wasn't a thinking machine. It was a durable, observable, and reliable piece of automation. The business team began to trust it, and the weekly flood of ad-hoc reporting requests slowed to a trickle. That was the only metric I needed. Every action, every tool call, and every retry was logged, so when something did go wrong we had a clear, step-by-step trace to debug.
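A trace like that does not need much machinery. The sketch below shows one possible shape, using Python's standard `logging` and `json` modules. The `RunTrace` class, the event kinds, and the field names are assumptions for illustration, not the original logging schema.

```python
import json
import logging
import time
from typing import Any, Dict, List

logger = logging.getLogger("agent.trace")


class RunTrace:
    """Collects an ordered, structured record of one agent run."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.events: List[Dict[str, Any]] = []

    def record(self, kind: str, **details: Any) -> None:
        event = {
            "run_id": self.run_id,
            "seq": len(self.events) + 1,  # position within this run
            "ts": time.time(),
            "kind": kind,
            **details,
        }
        self.events.append(event)
        # One JSON object per log line keeps the trace easy to search.
        logger.info(json.dumps(event, sort_keys=True))


trace = RunTrace("run-001")
trace.record("state_entered", state="FETCHING_CUSTOMER_ID")
trace.record("tool_call", tool="getCustomerId", args={"projectName": "project-alfa"})
trace.record("tool_error", tool="getCustomerId", error="not found")
trace.record("retry", tool="getCustomerId", args={"projectName": "project-alpha"})
```

Reading the events in `seq` order reconstructs exactly what the agent did and why a retry happened.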

This is the lesson that has stuck with me. The most potent applications of LLMs in enterprise systems are not about replacing human thought with a magical black box. They are about augmenting deterministic, well-architected systems. We use their linguistic and reasoning capabilities to handle the fuzzy parts (unpredictable inputs and complex error conditions) while keeping the core process as simple and auditable as possible. This pattern of a structured graph executing LLM-powered nodes is now being formalized in frameworks like LangChain's LangGraph, which suggests that others building agents have landed on the same priority: durability.

For now, the goal is not to create a digital colleague. The goal is to build a tool that reliably executes a valuable business process. Start there. The results are far less magical, but infinitely more useful.