A field guide to ReAct, Reflexion, and why agentic development is error-correction, not magic.
Agentic AI is not intelligence in a box. It is a loop: reason, act, observe, repeat. This guide reads two papers that shaped the modern LLM-agent loop (ReAct, 2022; Reflexion, 2023), then covers the part the demos leave out: how the loop quietly lies, and how to build one that earns its answers.
People keep telling me agentic AI started in 2024, the year "agent" was on every slide. It didn't. The idea that matters was written down quietly in October 2022, and its lasting contribution is not a model or a benchmark. It is a shape: a loop.
Once you see the loop clearly, most of what gets sold as "agency" turns out to be marketing wrapped around one small, durable structural truth. This guide is for the builder who wants that truth, not the slide. I read these systems the way I read a system in production: not "is it clever," but "what does it buy you, and where does it quietly fail."
And I want to be honest about history, because the brand of everything I write is trust. The loop was not invented from nothing. Robotics and cognitive-architecture research had sense-reason-act loops for decades (the classical BDI and SOAR architectures, for example). In 2021, WebGPT had a model navigate a browser across multiple steps. In mid-2022, Inner Monologue was already feeding environment feedback into an LLM's reasoning, and SayCan was grounding an LLM's action selection in robotic affordances. So when I say ReAct "started it," I mean something narrower and defensible: ReAct gave the LLM era the cleanest, promptable form of the loop, the one a builder could pick up and use in an afternoon. That is a real contribution. It just isn't a virgin birth.
ReAct's move is almost embarrassingly simple. Make the model produce, in one interleaved stream, both a Thought ("I should look up X") and an Action (actually look up X), then feed back an Observation (what came back), and let it think again. In the authors' words: it generates "both reasoning traces and task-specific actions in an interleaved manner."
Here is the subtlety almost every explainer gets wrong, including my own first draft. The loop is not all "inside the model." The model writes the Thought and the Action. The Observation comes from outside: a search engine, an API, a file system, a test runner, a compiler. That asymmetry is the entire point.
We call this "agentic," but a loop is not agency; a thermostat has a loop. Structurally, ReAct is error-correction. The Thought proposes, the Action intervenes, and the Observation returns the gap between what the model expected and what the environment reports. The next step is constrained by what happened, not by what merely sounded plausible.
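To make the asymmetry concrete, here is a minimal sketch of the loop in Python. It is not the paper's implementation: `scripted_model` is a hypothetical stand-in for an LLM call, `lookup` is a hypothetical stand-in for an external tool, and the `Action: name[arg]` transcript format is only an illustration. The point to notice is that the model function writes the Thought and the Action, while the Observation line is produced by code the model does not control.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One model step: (thought, action_name, action_argument). Illustrative shape only.
Step = Tuple[str, str, str]


def scripted_model(transcript: List[str]) -> Step:
    """Hypothetical stand-in for an LLM call: proposes a Thought and an Action."""
    observations = [line for line in transcript if line.startswith("Observation: ")]
    if not observations:
        return ("I should look up the capital of France.", "lookup", "capital of France")
    # The answer is taken from what the environment reported, not from a guess.
    reported = observations[-1][len("Observation: "):]
    return ("The lookup returned a result, so I can answer.", "finish", reported)


def lookup(query: str) -> str:
    """Hypothetical stand-in for an outside tool (search engine, API, test runner)."""
    facts = {"capital of France": "Paris"}
    return facts.get(query, "no result")


def react_loop(
    model: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> Tuple[Optional[str], List[str]]:
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        # Thought and Action come from the model.
        thought, action, arg = model(transcript)
        transcript.append(f"Thought: {thought}")
        transcript.append(f"Action: {action}[{arg}]")
        if action == "finish":
            return arg, transcript
        # The Observation comes from outside the model.
        tool = tools.get(action)
        observation = tool(arg) if tool else f"unknown action {action!r}"
        transcript.append(f"Observation: {observation}")
    return None, transcript  # Step budget exhausted without an answer.


answer, transcript = react_loop(
    scripted_model, {"lookup": lookup}, "What is the capital of France?"
)
print(answer)  # Paris
```

The `max_steps` budget is a design choice of this sketch: without it, a model that never emits `finish` would loop forever.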
That reframing changes the hero of the story. Most writing centers the model and its cleverness. The real protagonist is the Observation, and behind it the environment. An ungrounded model is a closed system: left to talk to itself, it drifts toward fluent, confident nonsense. Every genuine Observation is a small injection of outside order: the compiler that refuses to lie, the test that goes red.
In 2023, Reflexion (lead author Noah Shinn, with ReAct's Shunyu Yao among the coauthors) gave the screw another turn. After a failure, the agent writes a short verbal reflection ("I assumed the file was sorted; it wasn't") and keeps it in an episodic memory, so the next attempt can do better. The improvement happens from one trial to the next, not within a single attempt.
Be precise, because the hype is not: Reflexion does not train the model. No weights change. It learns in context, by carrying a written lesson forward. That is powerful and cheap. It is also fragile, because a memory can preserve a wrong lesson just as faithfully as a right one, and the model must diagnose its own failure using the same reasoning that caused it.
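A minimal sketch of that trial-to-trial mechanism, under loud assumptions: `attempt`, `evaluate`, and `reflect` are hypothetical stand-ins (a real system would use LLM calls for the attempt and the reflection, and a real test or environment signal for the evaluation), and the toy task of finding the largest number is mine, not the paper's. The memory is just a list of strings; nothing resembling a weight is ever updated.

```python
from typing import List, Optional, Tuple


def attempt(task: List[int], lessons: List[str]) -> int:
    """Hypothetical agent: returns the largest value, assuming sorted input
    unless a stored lesson says otherwise."""
    if any("not sorted" in lesson for lesson in lessons):
        return max(task)
    return task[-1]  # Wrong whenever the input is unsorted.


def evaluate(task: List[int], answer: int) -> bool:
    """Stand-in for an external check such as a test suite."""
    return answer == max(task)


def reflect(task: List[int], answer: int) -> str:
    """Stand-in for an LLM-written reflection on the failed trial."""
    return "I assumed the input was sorted; it was not sorted. Scan every element."


def reflexion(task: List[int], max_trials: int = 3) -> Tuple[Optional[int], int, List[str]]:
    lessons: List[str] = []  # Episodic memory: plain text carried between trials.
    for trial in range(1, max_trials + 1):
        answer = attempt(task, lessons)
        if evaluate(task, answer):
            return answer, trial, lessons
        # On failure, store a written lesson for the next trial.
        lessons.append(reflect(task, answer))
    return None, max_trials, lessons


answer, trials, lessons = reflexion([3, 9, 4])
print(answer, trials)  # 9 2
```

The fragility is visible here too. The sketch works only because `reflect` happens to write the correct lesson; if it returned a wrong diagnosis, the list would store and replay it just as faithfully.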
The seductive version of this story ends at the previous chapter: the agent reasons, acts, observes, reflects, and converges on truth. The honest version does not, and anyone who has shipped these systems knows exactly where the floor gives way.
None of this means the loop is a lie. It means the loop is a tool, with a blade and a handle, and most demos only ever show you the handle.
Point the loop at a codebase and it becomes a coding agent: read the file, run the test, read the failure, fix, run again. This is why ReAct's little loop is one ancestor of many agentic dev tools you use. But "the agent earns its answer" cannot be a vibe. It has to be a contract, with every link present.
Miss any link and the answer is not earned, only fluent. In practice that means five disciplines:
The uncomfortable truth in the human gate: the person is not decoration. The person is the part that isn't finished. We are still there because the loop is not closed; we have not solved autonomous grounding.
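One way to keep the person in the loop is to route every world-changing action through an approval callback. The sketch below is an assumption-heavy illustration, not a prescribed design: `GATED_ACTIONS`, `gated_execute`, `fake_run`, and the action names are all hypothetical, and in a real system `approve` would prompt a human rather than call a lambda.

```python
from typing import Callable, Set

# Hypothetical set of action names that change the world and need sign-off.
GATED_ACTIONS: Set[str] = {"write_file", "deploy", "send_email"}


def gated_execute(
    action: str,
    arg: str,
    run: Callable[[str, str], str],
    approve: Callable[[str, str], bool],
) -> str:
    """Run read-only actions directly; ask a human before gated ones."""
    if action in GATED_ACTIONS and not approve(action, arg):
        # The refusal is returned as text so it can be fed back as an Observation.
        return f"blocked: human declined {action}"
    return run(action, arg)


def fake_run(action: str, arg: str) -> str:
    """Stand-in for real tool execution."""
    return f"ran {action}({arg})"


def deny_all(action: str, arg: str) -> bool:
    """Stand-in for a human reviewer who rejects every request."""
    return False


print(gated_execute("read_file", "app.py", fake_run, deny_all))  # ran read_file(app.py)
print(gated_execute("deploy", "prod", fake_run, deny_all))       # blocked: human declined deploy
```

Returning the refusal as a string, rather than raising, lets the agent see the human's decision as one more correction from outside.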
The moment your loop can act, it can be made to act for someone else. Treat these as first-class, not afterthoughts:
If you cannot measure it, you are not running an agent; you are running a slot machine with good manners. The minimum kit:
Strip away the format and the hype, and ReAct leaves one durable idea: a model becomes useful the moment it is forced to leave language, touch the world, receive a correction, and carry that correction forward. Much of the recent LLM-agent stack (planning, memory, multi-agent teams) can be read as scaffolding around that primitive.
The loop is leverage on a capable model, plus a trustworthy observation, plus a human who owns the stop. Take any of the three away and iteration becomes only a more elaborate way to be wrong. That is why the tool is the fashion and the principle is the work. The ReAct prompt format will date; some of us already write agents that look nothing like it. But the shape underneath (propose, intervene, get corrected, decide whether you are done) is what makes an answer something other than a confident guess.