
Agentic AI in production: the failure modes and cost curves

AI

Agentic AI looks powerful in demos, but its recursive loops create silent financial risks in production. Learn the key failure modes and architectural patterns to prevent them.

The demo looked incredible. An autonomous agent, given a high-level goal like "find the best-value flight and hotel for a conference in Tokyo," broke the problem down, searched APIs, compared options, and even handled a booking failure by trying an alternative. It felt like the future. But when you run these systems live, a different reality emerges, often in the form of a surprising cloud bill.

The very thing that makes an agent powerful—its ability to loop, reflect, and retry—is also its most dangerous failure mode. In deterministic software, an infinite loop freezes a thread. In agentic AI, an infinite loop silently liquidates your bank account.

The Unbounded Reasoning Loop

The idea of a self-correcting agent is compelling. It promises to handle complex decision trees that would be a nightmare to code with explicit if-then logic. This is often implemented with a loop, like the one described in the foundational paper "ReAct: Synergizing Reasoning and Acting in Language Models". The agent thinks about what to do, takes an action with a tool, observes the outcome, and then reasons again. When it works, it's magic.
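As a rough illustration of that think, act, observe cycle, here is a minimal sketch. It is not the ReAct paper's implementation: scripted_model, search_tool, and TOOLS are hypothetical stand-ins for a real LLM call and real tools. Note that the loop has no cap of its own, so it stops only when the model says it is finished.

```python
from typing import Callable

# Hypothetical stand-in for an LLM call: a real system would send the goal
# and the history to a model and parse its reply.
def scripted_model(goal: str, history: list[str]) -> dict:
    """Pretend model: searches twice, then declares the task finished."""
    if len(history) < 2:
        return {"thought": "need more data", "action": "search", "input": goal}
    return {"thought": "enough data", "action": "finish", "input": "answer"}

# Hypothetical tool: a real one would call a paid API.
def search_tool(query: str) -> str:
    return f"results for {query!r}"

TOOLS: dict[str, Callable[[str], str]] = {"search": search_tool}

def react_loop(goal: str, model=scripted_model):
    history: list[str] = []
    while True:  # no cycle cap: termination depends entirely on the model
        step = model(goal, history)          # reason
        if step["action"] == "finish":
            return step["input"], history
        observation = TOOLS[step["action"]](step["input"])  # act
        history.append(f"{step['thought']} -> {observation}")  # observe

answer, history = react_loop("flights to Tokyo")
```

With the scripted model the loop ends after two tool calls. Swap in a model that never returns "finish" and nothing in this code stops it.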

The problem is that the "end" of the loop is often subjective. When is a research task "complete"? When is the deal "good enough"? For a deterministic process, the termination condition is crisp. For an agent, it's a fuzzy judgment call delegated to an LLM. I've seen systems where an agent, tasked with summarizing a document, got stuck in a refinement loop, continuously trying to make its own summary "just a little bit better" with each pass, burning thousands of tokens per cycle.

This isn't a bug in the LLM. It's a failure in our system architecture. We've built a powerful engine and attached it directly to our wallet, but we've forgotten to install brakes.

[Figure: The Unbounded Agentic Cost Spiral. A repeating cycle: Reason and Plan -> Execute Tool -> Observe Result -> Refine Plan.]

Two Silent Killers: Runaway Cycles and Cascading Costs

The financial risk comes from two coupled factors: execution cycles and per-cycle cost. They multiply each other with terrifying efficiency.

First, the cycles. An agent can get trapped for perfectly logical reasons. It might encounter a circular dependency in its toolset or get conflicting information that sends it into a back-and-forth debate with itself. In one case I analyzed, an agent designed to manage cloud infrastructure got stuck trying to provision a resource that another part of its own logic was simultaneously decommissioning. It was a race condition played out by an LLM, and every cycle cost money.

Second, the cost per cycle is not just the LLM call. It's the sum of everything the agent does. The LLM inference cost is the most obvious part, but there is also the tool and API cost for any paid service it calls, plus the compute cost for the orchestration layer, vector lookups, and all the "glue" code that holds it together. When an agent is running normally, these costs are manageable. But in a 50-step runaway loop, you pay for everything 50 times over. And the bill often grows faster than linearly: when each cycle appends its reasoning and observations to the context, every later LLM call is larger, and more expensive, than the one before it.
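A back-of-the-envelope sketch of that arithmetic follows. Every number in it is a made-up assumption for illustration (token prices, tool fee, compute fee, token counts), not a real price list. It also assumes the common setup where each cycle appends to the context, so input tokens grow by a fixed amount per cycle.

```python
def cycle_cost(input_tokens, output_tokens, *,
               price_in_per_1k=0.003,    # hypothetical $ per 1k input tokens
               price_out_per_1k=0.015,   # hypothetical $ per 1k output tokens
               tool_cost=0.002,          # hypothetical paid API call per cycle
               compute_cost=0.0005):     # hypothetical orchestration overhead
    """Cost of one agent cycle: LLM inference + tool call + glue compute."""
    llm = (input_tokens / 1000) * price_in_per_1k \
        + (output_tokens / 1000) * price_out_per_1k
    return llm + tool_cost + compute_cost

def run_cost(cycles, base_input_tokens=2000, output_tokens=500,
             growth_per_cycle=0):
    """Total cost of a run; growth_per_cycle models the context getting longer."""
    total = 0.0
    for i in range(cycles):
        total += cycle_cost(base_input_tokens + i * growth_per_cycle,
                            output_tokens)
    return total

normal = run_cost(5)                                  # a healthy 5-step task
runaway_flat = run_cost(50)                           # 50 steps, fixed context
runaway_growing = run_cost(50, growth_per_cycle=800)  # 50 steps, growing context
```

Under these assumptions the 5-step task costs about $0.08, the 50-step loop with a fixed context about $0.80, and the same loop with a growing context about $3.74. The cycle count multiplies the per-cycle cost, and the growing context makes each later cycle more expensive.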

Architecture for Survival: Budgets and Circuit Breakers

Hope is not a strategy. While much research focuses on maximizing agent autonomy, I've found that for production systems, the priority must be on defining strict operational boundaries. We have to build guardrails into the agent's execution environment. Relying on the LLM to be "sensible" is not a production-ready solution. The answer lies in borrowing a classic, durable pattern from distributed systems described well by Martin Fowler: the Circuit Breaker.

Here’s what that looks like in practice for an agentic system:

  1. Hard Cycle and Time Limits: Every agent task must have a non-negotiable max_cycles (e.g., 15 steps) and a max_wall_time (e.g., 120 seconds). If it hits either limit, execution halts immediately. This is the simplest backstop against a true infinite loop.
  2. Per-Task Budgeting: This is the most critical control. When a task is initiated, it's allocated a specific budget (e.g., $0.50). The agent's orchestrator must be cost-aware. Before every single step, it must estimate what that step will cost. If the estimate exceeds the remaining budget, the step is rejected and the agent is forced into a failure state.
  3. Deterministic Fallbacks: A robust system gracefully fails over to a simpler, deterministic process when a circuit breaker trips. If the smart agent fails to find the "best" hotel deal within its budget, it falls back to a script that just returns the top 3 results from a single, cheap API. This composition is key to reliability.
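The three controls above can be sketched together in one small runner. This is a per-task limit check in the spirit of the circuit breaker idea, not a full closed/open/half-open implementation. The names run_bounded, Limits, and CircuitOpen are hypothetical, as are the agent_step, estimate_cost, and fallback callables the caller is assumed to supply.

```python
import time
from dataclasses import dataclass

class CircuitOpen(Exception):
    """Raised when a limit trips and the agent must stop."""

@dataclass
class Limits:
    max_cycles: int = 15           # hard cap on agent steps
    max_wall_time: float = 120.0   # seconds
    budget_usd: float = 0.50       # per-task budget

def run_bounded(agent_step, estimate_cost, fallback, limits=None,
                clock=time.monotonic):
    """Run agent_step until it returns a result or a limit trips.

    agent_step(cycle)    -> (result or None, actual cost of the step)
    estimate_cost(cycle) -> estimated cost of the next step
    fallback()           -> result from a cheap deterministic process
    """
    limits = limits or Limits()
    start = clock()
    spent = 0.0
    try:
        for cycle in range(limits.max_cycles):
            if clock() - start > limits.max_wall_time:
                raise CircuitOpen("wall time exceeded")
            # Reject the step before running it if it would not fit the budget.
            if estimate_cost(cycle) > limits.budget_usd - spent:
                raise CircuitOpen("budget exhausted")
            result, cost = agent_step(cycle)
            spent += cost
            if result is not None:
                return {"result": result, "source": "agent", "spent": spent}
        raise CircuitOpen("cycle limit reached")
    except CircuitOpen as reason:
        # Deterministic fallback: degrade to the simple path instead of failing.
        return {"result": fallback(), "source": "fallback",
                "reason": str(reason), "spent": spent}

# Usage: an agent that never finishes and costs $0.12 per step.
out = run_bounded(
    agent_step=lambda cycle: (None, 0.12),
    estimate_cost=lambda cycle: 0.12,
    fallback=lambda: ["hotel A", "hotel B", "hotel C"],
)
```

In the usage example the agent gets four steps, the fifth is rejected because its estimate no longer fits the remaining budget, and the caller receives the fallback result along with the reason the breaker tripped.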

Observability is Non-Negotiable

You can't control what you can't see. When an agent task completes or fails, you need a detailed audit trail. For every step in the execution trace, you must log the agent's reasoning, the tool it chose, the observation it received back, and the cumulative cost and cycle count at that point in time. This detailed tracing is the only way to debug agent behavior.
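One possible shape for such an audit trail is sketched below, using only the standard library. The field names and the AuditTrail class are assumptions for illustration. A production system would more likely ship these records to its existing logging or tracing backend.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class StepRecord:
    task_id: str
    cycle: int                    # cycle count at this point in the run
    reasoning: str                # what the agent said it was doing
    tool: str                     # the tool it chose
    observation: str              # what came back
    step_cost_usd: float
    cumulative_cost_usd: float    # running total at this point in time

class AuditTrail:
    """Collects one StepRecord per agent step for a single task."""

    def __init__(self, task_id: str):
        self.task_id = task_id
        self.records: list[StepRecord] = []

    def log_step(self, reasoning, tool, observation, step_cost_usd):
        previous = self.records[-1].cumulative_cost_usd if self.records else 0.0
        record = StepRecord(
            task_id=self.task_id,
            cycle=len(self.records) + 1,
            reasoning=reasoning,
            tool=tool,
            observation=observation,
            step_cost_usd=step_cost_usd,
            cumulative_cost_usd=previous + step_cost_usd,
        )
        self.records.append(record)
        return record

    def to_jsonl(self) -> str:
        """One JSON object per line, ready to append to a log file."""
        return "\n".join(json.dumps(asdict(r)) for r in self.records)

trail = AuditTrail("task-001")
trail.log_step("look up hotels", "hotel_search", "12 results", 0.03)
last = trail.log_step("compare prices", "price_compare", "3 candidates", 0.05)
```

Because every record carries the cumulative cost and the cycle number, a single line from the log is enough to see how far into its budget the task was when something went wrong.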

This isn't theoretical; it's a practical necessity that's becoming a standard feature. For instance, the documentation for popular frameworks like LangChain includes specific guides on tracking token usage for just this reason. When you see a task consistently failing on step 8 because it exhausted its budget, you can analyze the trace to see why it spent its money so poorly on the first seven steps. This closes the loop, allowing you to refine the prompts, tools, or budgets to build a more efficient system.
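As a sketch of that kind of trace analysis, the snippet below totals spend per tool for one failed task. The trace data is invented for illustration, and the field names (tool, step_cost_usd) are assumptions about how the steps were logged.

```python
from collections import defaultdict

def cost_by_tool(trace):
    """Total spend per tool for one task trace, most expensive first."""
    totals = defaultdict(float)
    for step in trace:
        totals[step["tool"]] += step["step_cost_usd"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

# Invented example: the first seven steps of a task that ran out of budget.
trace = [
    {"cycle": 1, "tool": "web_search", "step_cost_usd": 0.08},
    {"cycle": 2, "tool": "web_search", "step_cost_usd": 0.08},
    {"cycle": 3, "tool": "summarize", "step_cost_usd": 0.03},
    {"cycle": 4, "tool": "web_search", "step_cost_usd": 0.08},
    {"cycle": 5, "tool": "summarize", "step_cost_usd": 0.03},
    {"cycle": 6, "tool": "web_search", "step_cost_usd": 0.08},
    {"cycle": 7, "tool": "web_search", "step_cost_usd": 0.08},
]

breakdown = cost_by_tool(trace)
worst_tool = next(iter(breakdown))
```

In this invented trace most of the budget went to repeated searches, which points at the prompt or the search tool as the first thing to tune, rather than the budget itself.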

[Figure: Bounded Agentic Architecture with Cost Controls. Inputs: Task Request, API Call, User Prompt. Orchestration: Budget Allocator, Circuit Breaker, Cycle Limiter, Cost Estimator. Execution: LLM Agent, Deterministic Fallback, Tool Library, Audit Log. Outputs: Final Answer, Failure State, Logs.]

Durable Patterns for Bounded Agents

Agentic systems introduce new capabilities, but also new operational risks. Building them to last means treating them not as magical black boxes, but as complex, cost-bearing components that require disciplined engineering. The key is to design for bounded autonomy, where creativity operates within a safe, predictable envelope.

  • Instrument Everything: Add cost estimation and tracking to your agent orchestration layer before you write another line of agent logic.
  • Enforce Hard Limits: Implement non-negotiable cycle counts and time limits for every agent-driven task. No exceptions.
  • Design for Failure: Build deterministic, cheaper fallbacks for when your agent inevitably hits a limit. The happy path is easy; the recovery path makes a system robust.
  • Treat Budgets as a Feature: Think of the task budget as a primary input, just as important as the user's prompt. It defines the operational envelope.

The most durable systems will be hybrids, skillfully blending the creative, exploratory power of LLM agents with the predictable, efficient, and safe execution of deterministic code. The architecture challenge is building the harness that lets the agent run, but never run away.

Juan Cardena
Enterprise Architect, Data & AI

Enterprise architect with 25 years across web, software, data, and AI. MIT CDAO ’25. Writing on agentic AI in production.