MCP and tool-use: first impressions from a systems person

The first time I saw a large language model parse a complex request into a perfect JSON payload, I felt that familiar mix of excitement and dread. The excitement was for the obvious power: a truly natural language interface. The dread was for the 3 AM page when that probabilistic, creative system would inevitably clash with a deterministic API that demands perfection.

For two decades, my work has been about building reliable systems on top of contracts. APIs, schemas, and type systems are the load-bearing walls of enterprise software because they are ruthlessly explicit. An LLM is the opposite. It operates on suggestion and semantic likelihood. The architectural gap between those two worlds is where production systems fail.

The Non-Negotiable API Contract

The core friction is simple: an API contract is a promise, while an LLM's output is a high-quality suggestion. Your backend service has a function signature like update_order(order_id: int, status: str) where status must be one of "PENDING", "SHIPPED", or "CANCELLED". That's not a guideline; it's a law the service enforces on every call.

The happy path shown in many examples, like the official OpenAI function-calling guide, makes this look simple. And it is, for a demo. But when prompted to "cancel order 12345", a model might one day generate {"status": "CANCELED"} with one 'L', or {"status": "voided"}. To a human, the intent is clear. To the API endpoint, it’s a 400 Bad Request. Every time. This isn't a bug in the model; it's a feature of its probabilistic nature.
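To make the failure concrete, here is a minimal sketch of that contract in Python. The OrderStatus enum, the body of update_order, and the try_call helper are my own stand-ins for illustration, not any real service's code; the only things taken from above are the signature and the three allowed values.

```python
from enum import Enum


class OrderStatus(str, Enum):
    """The three values the contract allows (taken from the post)."""
    PENDING = "PENDING"
    SHIPPED = "SHIPPED"
    CANCELLED = "CANCELLED"


def update_order(order_id: int, status: str) -> dict:
    """Hypothetical stand-in for the backend: rejects anything off-contract."""
    if isinstance(order_id, bool) or not isinstance(order_id, int):
        raise ValueError(f"order_id must be an int, got {order_id!r}")
    # Looking an enum up by value raises ValueError for unknown strings.
    return {"order_id": order_id, "status": OrderStatus(status).value}


def try_call(payload: dict) -> str:
    """Hypothetical helper that maps the outcome to an HTTP-style result."""
    try:
        update_order(**payload)
        return "200 OK"
    except (ValueError, TypeError) as exc:
        # TypeError covers missing or unexpected keyword arguments.
        return f"400 Bad Request: {exc}"


if __name__ == "__main__":
    for status in ("CANCELLED", "CANCELED", "voided"):
        print(status, "->", try_call({"order_id": 12345, "status": status}))
```

Only the first payload gets through. The other two are perfectly readable to a person and still fail, which is the whole point: the endpoint compares strings, not intent.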

[Figure: The LLM-to-API Impedance Mismatch]

The Hardened Adapter: Your System's Shield

In traditional software, input validation protects you from malicious users. In an agentic system, it serves a new role: protecting your deterministic core from your probabilistic brain. You cannot let the LLM call your business logic directly. There must be a hardened "adapter" layer in between.

This layer’s only job is to catch the LLM's output, parse it, and validate it against a strict schema. For this, I've found it's essential to use a library like Pydantic or, even better, a purpose-built tool like Jason Liu's instructor, which is designed for exactly this problem: getting structured, validated data out of models. The adapter then either coerces the data into the shape the schema demands or rejects it outright.
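Here is a rough sketch of what such an adapter might look like. To keep it dependency-free I've used only the standard library rather than Pydantic or instructor, so treat it as a stand-in for what those tools do more thoroughly. The names adapt_update_order, UpdateOrderArgs, and AdapterRejection are hypothetical, and the normalization rules (case, whitespace, digit-only strings) are just one assumption about what "conforming" could mean.

```python
import json
import re
from dataclasses import dataclass

ALLOWED_STATUSES = {"PENDING", "SHIPPED", "CANCELLED"}


class AdapterRejection(Exception):
    """Raised when model output cannot be made to fit the contract."""


@dataclass(frozen=True)
class UpdateOrderArgs:
    order_id: int
    status: str


def adapt_update_order(raw: str) -> UpdateOrderArgs:
    """Parse raw model output and return validated arguments, or reject."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise AdapterRejection(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise AdapterRejection("expected a JSON object")

    # Conform: accept an integer, or a string made only of ASCII digits.
    order_id = data.get("order_id")
    if isinstance(order_id, str) and re.fullmatch(r"[0-9]+", order_id.strip()):
        order_id = int(order_id.strip())
    if isinstance(order_id, bool) or not isinstance(order_id, int):
        raise AdapterRejection(f"order_id must be an integer, got {order_id!r}")

    # Conform: normalize case and whitespace only. Never guess at synonyms.
    status = data.get("status")
    if not isinstance(status, str):
        raise AdapterRejection("status must be a string")
    status = status.strip().upper()
    if status not in ALLOWED_STATUSES:
        raise AdapterRejection(
            f"status {status!r} is not one of {sorted(ALLOWED_STATUSES)}"
        )

    return UpdateOrderArgs(order_id=order_id, status=status)
```

The design choice worth noticing is where the line sits between conforming and rejecting. Fixing " cancelled " is mechanical and safe. Mapping "voided" to "CANCELLED" would be the adapter guessing at intent, so it gets rejected, and the rejection message can be fed back to the model for another attempt.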

This pattern also solves the "hallucinated parameter" problem. If the model helpfully adds a "reason": "user request" field that your function doesn't accept, a well-designed adapter simply strips it away, logs it, and passes the valid fields on. The system absorbs the probabilistic weirdness so the deterministic core never sees it.
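The strip-and-log step is small enough to show on its own. This is a sketch under my own assumptions: strip_unknown_fields, UPDATE_ORDER_FIELDS, and the "llm_adapter" logger name are all made up for illustration.

```python
import logging

logger = logging.getLogger("llm_adapter")  # hypothetical logger name

# The only fields update_order accepts, per the signature discussed earlier.
UPDATE_ORDER_FIELDS = frozenset({"order_id", "status"})


def strip_unknown_fields(
    payload: dict, allowed: frozenset = UPDATE_ORDER_FIELDS
) -> dict:
    """Return only the fields the tool accepts, logging anything dropped."""
    extras = sorted(key for key in payload if key not in allowed)
    if extras:
        # Log field names only; the values may contain user data.
        logger.warning("dropping fields not in the tool schema: %s", extras)
    return {key: value for key, value in payload.items() if key in allowed}


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    cleaned = strip_unknown_fields(
        {"order_id": 12345, "status": "CANCELLED", "reason": "user request"}
    )
    print(cleaned)
```

As far as I know, Pydantic models ignore unknown fields by default, but they do so silently. Doing it explicitly keeps the log line, which is how you find out which parameters the model keeps inventing.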

Planning vs. Execution: A Durable Pattern

After building a few of these systems, a pattern emerges that feels durable because it embraces the strengths of both worlds. The LLM is used for planning, and deterministic code is used for execution.

This separation of concerns isn't a new idea; it builds on established concepts for agentic reasoning, like the ReAct framework described in the paper "ReAct: Synergizing Reasoning and Acting in Language Models" from researchers at Princeton University and Google Research. The model's job is to understand ambiguous language and produce a structured plan: "first, call get_user_by_email, then call cancel_order."

That plan is then handed to a deterministic orchestrator: a simple state machine or workflow function. This code is responsible for actually executing the steps. It calls the adapter, makes the real API call, handles the return value, and manages state and retries. The LLM is a "sense and plan" engine; the "act" cycle is handled by boring, reliable code that you can actually unit test.
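A minimal sketch of such an orchestrator follows. Everything here is an assumption made for illustration: the Step shape, the run_plan and PlanError names, the tool registry, and the choice of ConnectionError as the one retryable failure. A real version would run each step's arguments through the adapter first and would likely add backoff between attempts.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    """One entry in the model's plan: a tool name plus its arguments."""
    tool: str
    args: dict = field(default_factory=dict)


class PlanError(Exception):
    """Raised when a plan cannot be run to completion."""


def run_plan(
    plan: list[Step],
    tools: dict[str, Callable[..., Any]],
    max_attempts: int = 3,
) -> list[Any]:
    """Execute plan steps in order, retrying transient failures."""
    # Check the whole plan up front so an unknown tool never leaves
    # a half-executed plan behind.
    for index, step in enumerate(plan):
        if step.tool not in tools:
            raise PlanError(f"step {index}: unknown tool {step.tool!r}")

    results: list[Any] = []
    for index, step in enumerate(plan):
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(tools[step.tool](**step.args))
                break  # step succeeded, move on to the next one
            except ConnectionError as exc:
                # Assumption: only ConnectionError counts as transient.
                if attempt == max_attempts:
                    raise PlanError(
                        f"step {index}: {step.tool} failed "
                        f"after {attempt} attempts"
                    ) from exc
    return results
```

Because the tools are plain callables in a dictionary, the whole thing can be unit tested with fakes: pass in a function that raises ConnectionError once and then succeeds, and assert that run_plan called it twice.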

[Figure: Durable Architecture for an LLM Agent]

What This Means in Production

The initial hype around tool-use suggests a world where an LLM can safely manipulate all our software. The reality is that we need to build new architectural joints to connect these two domains safely. For anyone building these systems, I've found a few principles hold up.

Treat the LLM as an Untrusted User. The single most important mental model is to treat any output from an LLM with the same skepticism you'd apply to unsanitized input from a web form.
Build a Hardened Adapter. Don't let the model's output touch your core systems directly. A dedicated validation and adaptation layer is not optional; it is the central piece of a reliable system.
Separate Planning from Execution. Use the LLM to create a plan of action. Use deterministic code to execute that plan, manage state, and handle failures. This gives you flexible understanding and reliable execution.
Embrace the Boring. The solutions that hold up at 3 AM use new tech for a narrow purpose, surrounded by well-understood patterns for reliability and observability. That hasn't changed.