The Loop That Started It All: a close read of ReAct (2022)

Paper review. ReAct: Synergizing Reasoning and Acting in Language Models — Yao, Zhao, Yu, Du, Shafran, Narasimhan, Cao, 2022 (arXiv:2210.03629). Companion to my short film, In the Loop.

People keep telling me agentic AI started in 2024, the year the word "agent" was on every slide. It didn't. The idea that matters here was written down quietly in October 2022, in a paper most people cite and few have read closely. Its contribution is not a model and not a benchmark. It is a shape: a loop. And once you see the loop clearly, most of what gets sold as "agency" turns out to be marketing wrapped around one small, durable structural truth.

I build with these systems every day, as an architect, not a researcher. So I want to read ReAct the way I read a system in production: not "is it clever," but "what does it actually buy you, and where does it quietly fail." Let me start before the loop existed.

Before the loop: two half-minds

By mid-2022 large language models could do two impressive things, but usually as separate skills that were rarely fused. They could reason out loud: chain-of-thought prompting had shown that a model could narrate its way through a problem. And they could act: early tool-using systems could call a search engine or drive a browser. The trouble was that reasoning happened with its eyes closed, and acting happened with its mind off.

Two halves of a mind that had not yet been joined.

This is the honest history, and it matters for the brand of this blog: the loop was not invented from nothing. Robotics and classical AI had sense-reason-act loops for decades (the BDI and SOAR agent architectures are the textbook examples). In 2021, WebGPT had a model navigate a browser across multiple steps. In mid-2022, work like Inner Monologue and SayCan was already feeding environment feedback back into an LLM's reasoning, in robotics. So when I say ReAct "started it," I mean something narrower and more defensible: ReAct gave the LLM era one of the cleanest, promptable forms of the loop, the one a builder could pick up and use in an afternoon. That is a real contribution. It just isn't a virgin birth, and pretending otherwise is the kind of silent overstatement this blog exists to avoid.

[Image: thousands of cold, scattered points of light drifting unaligned in deep navy; reasoning without grounding.] Reasoning without contact: confident, luminous, and pointed at nothing in particular.

The fusion: think, act, observe, repeat

ReAct's move is almost embarrassingly simple. Make the model produce, in one interleaved stream, both a Thought ("I should look up X") and an Action (actually look up X), then feed back an Observation (what came back), and let it think again. The paper's own words:

"...we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between reasoning and acting."

One subtlety that almost every explainer gets wrong, including my own first draft: the loop is not all "inside the model." The model writes the Thought and the Action. The Observation comes from outside: a search engine, an API, a file system, a test runner, a compiler. That asymmetry is the entire point, and it is the hinge the rest of this piece turns on.

Read as feedback control: the Observation returns the gap between what the model expected and what the environment reports.
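To make that asymmetry concrete, here is a minimal sketch of the loop under my own assumptions. It is not the paper's prompt format or code: `scripted_model` is a hard-coded stand-in for an LLM call, `lookup` is a hypothetical tool, and the `Step` record is my own naming. The one thing it is meant to show is that the model supplies the thought, the action, and the argument, while the observation is filled in only by the tool.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    thought: str       # written by the model
    action: str        # written by the model: a tool name
    argument: str      # written by the model: the tool input
    observation: str   # written by the environment, never by the model


def lookup(query: str) -> str:
    """Hypothetical tool: a stand-in for a search engine or API."""
    facts = {"capital of France": "Paris"}
    return facts.get(query, "no result")


def scripted_model(question: str, history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for an LLM call; returns (thought, action, argument)."""
    if not history:
        return ("I should look this up.", "lookup", question)
    return ("The observation answers it.", "finish", history[-1].observation)


def react(question, model, tools, max_steps=5):
    """Think, act, observe, repeat, until the model finishes or steps run out."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = model(question, history)
        if action == "finish":
            return argument, history
        tool = tools.get(action)
        # The observation is produced outside the model.
        observation = tool(argument) if tool else f"unknown tool: {action}"
        history.append(Step(thought, action, argument, observation))
    return None, history  # step budget exhausted without an answer


answer, trace = react("capital of France", scripted_model, {"lookup": lookup})
print(answer)  # Paris
```

Swapping `scripted_model` for a real model call leaves `react` unchanged, which is roughly why the pattern was so easy to pick up.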

The dark geometry: the loop is error-correction

Here is the part worth slowing down for, because it is where the loop stops being a diagram and becomes a principle. We call ReAct "agentic," but a loop is not agency. A thermostat has a loop. What ReAct actually is, structurally, is error correction: feedback control. The Thought proposes, the Action intervenes, and the Observation returns the gap between what the model expected and what the world says. The next step is then constrained by what happened, not merely by what sounded plausible.

That reframing changes the hero of the story. Most writing about agents centers the model: its reasoning, its cleverness. But the real protagonist is the Observation, and behind it, the environment. An ungrounded language model is a closed system. Left to talk to itself, it drifts toward fluent, confident nonsense: coherence with no contact. Every trustworthy Observation is a small injection of outside order: the compiler that refuses to lie, the test that goes red, the search result that contradicts the plan. The loop's deepest function is not motion. It is to make error visible so that no step lies quietly. That is the whole brand of this blog in one sentence, and ReAct is where I can finally point at the mechanism.
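One way to picture "making error visible" is to record, for every step, what the model expected next to what the environment reported, and to surface every mismatch. The sketch below is illustrative only; `CheckedStep`, `gap`, and `visible_errors` are names I made up for it, and the trace is hand-written.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CheckedStep:
    action: str
    expected: str   # what the model predicted before acting
    observed: str   # what the environment reported afterwards

    @property
    def gap(self) -> bool:
        """True when the world disagreed with the model's expectation."""
        return self.expected != self.observed


def visible_errors(trace: List[CheckedStep]) -> List[CheckedStep]:
    """Return every step where expectation and observation diverged."""
    return [step for step in trace if step.gap]


trace = [
    CheckedStep("run tests", expected="pass", observed="fail: test_sort"),
    CheckedStep("patch sort, run tests", expected="pass", observed="pass"),
]
for step in visible_errors(trace):
    print(f"{step.action}: expected {step.expected!r}, got {step.observed!r}")
```

The red step stays in the record instead of being overwritten by the green one that follows it.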

[Image: a luminous golden ring, a test failing red then resolving to clean green as the loop runs; error correction.] The loop is not motion, it is correction: the red is a failure caught, not hidden.

The loop learns, in memory, not in weights

Five months later, in March 2023, Reflexion (lead author Noah Shinn, with ReAct's Shunyu Yao among the coauthors, arXiv:2303.11366) gave the screw another turn. After a failure, the agent writes a short verbal reflection ("I assumed the file was sorted; it wasn't") and keeps it in an episodic memory, so the next attempt can start from that lesson. This is improvement from trial to trial, and it is the moment the loop appears to learn.

I want to be precise, because the hype is not: Reflexion does not train the model. No weights change. It learns in context, by carrying a written lesson forward. That is powerful and cheap, and it is also fragile, because a memory can preserve a wrong lesson just as faithfully as a right one. Which brings us to the chapter almost nobody writes.
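A rough sketch of that mechanism, in the spirit of Reflexion rather than a reproduction of the paper's implementation: the memory is a plain list of strings that is handed to the next attempt. `attempt` and `reflect` are hypothetical stand-ins for the agent run and the model-written reflection, and nothing resembling a weight appears anywhere.

```python
from typing import Callable, List, Optional, Tuple


def run_trials(
    attempt: Callable[[List[str]], Tuple[bool, str]],
    reflect: Callable[[str], str],
    max_trials: int = 3,
) -> Tuple[Optional[int], List[str]]:
    """Retry a task, carrying written lessons forward between trials."""
    memory: List[str] = []  # episodic memory: text only, no training
    for trial in range(1, max_trials + 1):
        succeeded, feedback = attempt(list(memory))
        if succeeded:
            return trial, memory
        # Store the lesson verbatim. A wrong lesson is stored just as faithfully.
        memory.append(reflect(feedback))
    return None, memory


def attempt(lessons: List[str]) -> Tuple[bool, str]:
    """Hypothetical task: succeeds only once a lesson mentions sorting."""
    if any("sort" in lesson for lesson in lessons):
        return True, "tests passed"
    return False, "binary search returned the wrong index"


def reflect(feedback: str) -> str:
    """Stand-in for a model-written reflection on the failure."""
    return "I assumed the file was sorted; it wasn't. Sort before searching."


trials_needed, lessons = run_trials(attempt, reflect)
print(trials_needed, lessons)
```

If `reflect` returned a tidy but wrong lesson, `run_trials` would carry it forward in exactly the same way, which is the fragility in question.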

When the loop lies quietly

The seductive version of this story ends at the previous section: the agent reasons, acts, observes, reflects, and converges on truth. The honest version does not, and an architect who has shipped these systems knows exactly where the floor gives way:

The Observation is not truth. Tools return mediated signals: stale search results, a poisoned web page, a flaky test, a misleading benchmark. "Grounding" is only as honest as the instrument doing the grounding.
The loop can launder error. A wrong first premise leads to a wrong tool call, which retrieves confirming evidence, which makes the model more confident. Iteration does not guarantee correction; sometimes it is just recursive rationalization with better footnotes.
Reflection can rationalize instead of repair. The model must diagnose its own failure using the same flawed reasoning that caused it. Often it writes a tidy, wrong lesson and marches on.
Loops are expensive, and they drift. Every turn costs tokens, latency, API quota, and real money. In practice quality often degrades after a handful of turns. "Every answer is earned" needs an unglamorous companion: not every answer is worth earning this way.

None of this means the loop is a lie. It means the loop is a tool, with a blade and a handle, and most demos only ever show you the handle.
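The cost and drift problems, at least, can be bounded mechanically. Below is a small sketch of a guard that stops a loop on a spend limit, a turn limit, or an exactly repeated action. The class name, the thresholds, and the cents-based accounting are all assumptions of mine, and a guard like this catches only the crudest failures; it does nothing about a poisoned observation or a laundered premise.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class LoopGuard:
    max_turns: int = 8
    budget_cents: int = 100
    turns: int = 0
    spent_cents: int = 0
    seen_actions: Set[str] = field(default_factory=set)

    def check(self, action: str, cost_cents: int) -> Optional[str]:
        """Record one step; return a stop reason, or None to continue."""
        self.turns += 1
        self.spent_cents += cost_cents
        if self.spent_cents > self.budget_cents:
            return "budget exceeded"
        if action in self.seen_actions:
            return "repeated action"  # the loop is circling, not converging
        self.seen_actions.add(action)
        if self.turns >= self.max_turns:
            return "turn limit reached"
        return None


guard = LoopGuard(max_turns=5, budget_cents=100)
reasons = [guard.check(a, 10) for a in ["search: foo", "search: bar", "search: foo"]]
print(reasons)  # [None, None, 'repeated action']
```

Returning a reason string rather than raising keeps the decision about what to do next with the caller, which is where a human can be placed.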

From loop to agentic development: "earned" is a contract

Point the loop at a codebase and it becomes a coding agent: read the file, run the test, read the failure, fix, run again. This is why ReAct's little loop is one ancestor of many agentic dev tools you use now. But shipping one taught me that "the agent earns its answer" cannot be a vibe. It has to be a contract, with every link present:

Miss any link and the answer is not earned, only fluent.

And the last link, the stop, is where the human belongs. Not as a sentimental "human in the loop" gesture, but with specific decision rights: approve the irreversible actions, inspect the evidence, set the risk tolerance, own the final judgment, and pull the cord on a loop that has started spending money to convince itself. The honest reading of the human at the center is uncomfortable: we are still there because the loop is not closed. We have not solved autonomous grounding. The person is not decoration; the person is the part that isn't finished.
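As a sketch of what one of those decision rights can look like in code, here is an approval gate for irreversible actions. Everything in it is hypothetical: the `ProposedAction` fields, the `decide` function, and especially `cautious_human`, which stands in for a real person reading real evidence.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ProposedAction:
    name: str
    irreversible: bool
    evidence: str  # what the loop observed that justifies the action


def decide(action: ProposedAction, approve: Callable[[ProposedAction], bool]) -> str:
    """Reversible actions run; irreversible ones need explicit approval."""
    if not action.irreversible:
        return "run"
    return "run" if approve(action) else "blocked"


asked: List[str] = []  # log of what the human was actually asked to approve


def cautious_human(action: ProposedAction) -> bool:
    """Stand-in for a person inspecting the evidence before approving."""
    asked.append(action.name)
    return "all tests passed" in action.evidence


edit = ProposedAction("edit local file", irreversible=False, evidence="diff attached")
deploy = ProposedAction("deploy to production", irreversible=True, evidence="2 tests failed")

print(decide(edit, cautious_human))    # run
print(decide(deploy, cautious_human))  # blocked
```

The human is consulted only where the action cannot be undone, and sees the evidence rather than the model's summary of it.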

[Image: a single still point of warm gold light at the center of a fast-spinning ring blurred into one calm circle on deep navy.] Earned: violent iteration at the edges, a human holding the still center.

What an architect keeps

Strip away the format and the hype, and ReAct leaves one durable idea: a model becomes useful the moment it is forced to leave language, touch the world, receive a correction, and carry that correction forward. Much of the recent LLM-agent stack (planning, memory, multi-agent teams) can be read as scaffolding around that primitive. The loop is leverage on three things: a capable model, a trustworthy observation, and a human who owns the stop. Take any of those three away and iteration becomes only a more elaborate way to be wrong.

That is why the tool was always the fashion and the principle is the work. The ReAct prompt format will date; some of us already write agents that look nothing like it. But the shape underneath (propose, intervene, get corrected, decide whether you are done) is what makes an answer something other than a confident guess. On this blog the AI layer has one slogan, and ReAct is finally where I can show you the machinery behind it: every answer is earned.

The short film In the Loop is the wordless version of this idea: fast at the edges, still at the center. A deeper field guide (the failure modes, security, evaluation, when not to loop) is on the way.