
Council of Mirrors: how a human and a council of models actually decide

Why a human convening a council of reasoners that debate and converge can beat a smarter single model, and the dark geometry of how consensus can look like proof while being quietly wrong.

Field note. On human + multi-agent consensus, grounded in Du et al. 2023 (Multi-Agent Debate), Wang et al. 2022 (Self-Consistency), and Mixture-of-Agents 2024. Companion to my short film, The Fifth Mind. This piece was itself defined by a council of models — which is either cheating or the point.

Every day I ask machines hard questions, and the most dangerous answer I get is a confident one. A single model, however strong, answers from one angle. It has a blind spot it cannot see, not because it is stupid, but because it is one viewpoint, and one viewpoint always casts a shadow it stands inside. The temptation is to fix this with a bigger model. The more durable fix is structural, and older than AI: when a decision matters and you are not sure, you don't ask one expert louder. You convene a council.

This is the idea I want to read carefully, the way I read a system in production rather than a pitch: a human working not with one model but with several reasoners that argue toward consensus. What it actually buys you, how one reasoner sharpens another, and, the part almost nobody writes about, where consensus quietly lies.

One mind, one shadow

Start with the failure the council exists to fix. A capable model, asked a hard question, produces something fluent, structured, convincing. You can circle it, probe it, ask it to critique itself, and it will, brilliantly, from the same angle that produced the blind spot in the first place. Self-critique inside one model is a flashlight trying to illuminate its own shadow: the harder it strains, the sharper the shadow it casts. The problem isn't intelligence. It's that a closed system has no outside.

A single white beam on a symmetrical object with one quadrant lost in shadow.
One light, however pure, leaves a quadrant it can never reach from where it stands.

From answers to decisions

Here is the reframing that makes the rest click, and it is the thesis of the whole piece. A council does not exist to produce a better answer. It changes the unit of work from answer generation to decision formation. A lone model hands you a conclusion. A council hands you something more useful and more honest: the assumptions in play, the objections that survived, the minority report that wouldn't die, the boundary of what's actually known. It does not vote truth into existence. It makes disagreement legible, so a human can finally see the shape of their own blind spot in the gaps between the reasoners.

Consensus is not the absence of conflict. It is conflict made useful.

Note what does not move: responsibility. The council widens cognition; it never inherits accountability. The human stays the one who decides and the one who answers for it. Keep that fixed point in mind, because the dark geometry below is mostly the story of systems that blur it.

The room you build

The shape that works in practice is not a flat panel of equals shouting. It is one orchestrator that works directly with the human, plus a handful of reasoners that answer independently and then critique each other. The orchestrator's job is not to be the smartest voice; it is to set the question, protect independence, decide what evidence counts, and convert the resulting tension into something the human can own.

[Diagram: the human decides; the orchestrator convenes; four reasoners (blue, amber, violet, white) answer independently, then critique each other.]
Not a flat panel: a human + orchestrator, then independent reasoners that debate before any synthesis.

Why independence first? Because the value is in the parallax. Two eyes see depth only because they see from different points; collapse them together and you lose the third dimension. The same question put to genuinely different reasoners returns the same problem seen as a key, a blade, a seed, a mirror, and the differences are the data. Let one reasoner hear another's polished answer too early and you don't get debate; you get an echo with extra steps.
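
The ordering described above, independent answers first and critique only afterwards, can be sketched in a few lines. This is a minimal illustration, not a production orchestrator: the `Reasoner` type, `convene`, and the stub reasoners are hypothetical names invented here, and a real reasoner would wrap a model call instead of returning a canned string.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a reasoner is any function from prompt text to reply text.
Reasoner = Callable[[str], str]


def convene(question: str, reasoners: Dict[str, Reasoner]) -> Dict[str, Dict[str, str]]:
    # Phase 1: every reasoner sees only the question, never a peer's answer.
    answers = {name: ask(question) for name, ask in reasoners.items()}

    # Phase 2: only now does each reasoner read the others and critique them.
    critiques = {}
    for name, ask in reasoners.items():
        peers = "\n".join(
            f"[{other}] {text}" for other, text in answers.items() if other != name
        )
        critiques[name] = ask(
            f"Question: {question}\nPeer answers:\n{peers}\nCritique them."
        )
    return {"answers": answers, "critiques": critiques}


def make_stub(label: str, log: List[Tuple[str, str]]) -> Reasoner:
    # Stand-in for a model call: records every prompt it is shown.
    def ask(prompt: str) -> str:
        log.append((label, prompt))
        return f"{label}-view"
    return ask


log: List[Tuple[str, str]] = []
council = {name: make_stub(name, log) for name in ("blue", "amber", "violet")}
result = convene("Should we ship?", council)
```

The log makes the independence property checkable: the first prompt each stub receives is the bare question, and peer answers only appear in the second round.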

The modules, and when each earns its keep

"Multi-agent" is not one technique; it's a small toolbox, and using the wrong tool is most of how these systems fail. The ones that matter:

Module: when it actually helps

  • Debate. Open problems with competing valid framings; surface the rebuttals before you synthesize.
  • Self-consistency / voting. Discrete, checkable answers (math, code, logic) where single samples are noisy but the truth is verifiable.
  • LLM-as-judge. Ranking many candidates cheaply, but rotate the judge and watch it for position and verbosity bias.
  • Mixture-of-agents. Layered work: one layer proposes, another critiques, a final layer synthesizes, stitching different strengths.
  • Devil's advocate / red team. High-stakes, high-ego calls where the real risk is everyone agreeing too smoothly.
  • Orchestrator-synthesis. The one that ultimately matters: turns the tension into a decision the human can own, and names what dissent survives.
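
The LLM-as-judge entry warns about position bias. One simple way to watch for it is to show the judge the same pair in both orders and trust the verdict only if it survives the swap. The sketch below assumes a hypothetical judge signature, (question, first, second) returning "first" or "second"; the two stub judges are made up for illustration.

```python
from typing import Callable, Optional

# Hypothetical judge signature: (question, first_candidate, second_candidate)
# returns the string "first" or "second".
Judge = Callable[[str, str, str], str]


def order_robust_verdict(judge: Judge, question: str, a: str, b: str) -> Optional[str]:
    forward = judge(question, a, b)   # a is shown first
    backward = judge(question, b, a)  # b is shown first
    winner_forward = a if forward == "first" else b
    winner_backward = b if backward == "first" else a
    if winner_forward == winner_backward:
        return winner_forward
    # The verdict flipped with presentation order: treat it as position bias.
    return None


def always_first(question: str, first: str, second: str) -> str:
    return "first"  # a maximally position-biased stub


def prefers_longer(question: str, first: str, second: str) -> str:
    # Verbosity-biased stub: picks the longer text regardless of order.
    return "first" if len(first) >= len(second) else "second"


biased = order_robust_verdict(always_first, "Which is better?", "x", "y")
verbose = order_robust_verdict(
    prefers_longer, "Which is better?", "short", "a much longer answer"
)
```

Note what the swap does not catch: the verbosity-biased stub passes, because its preference follows the text rather than the slot. Order swapping tests for position bias only; verbosity bias needs a separate check.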

What the research actually shows

This is not just an intuition, but the results are not all the same mechanism, so it's worth stating what each one actually is. In 2022, Self-Consistency (Wang et al., arXiv:2203.11171) sampled many reasoning paths from a single model and took the most common final answer, beating the single greedy path. That is not yet a council of different minds; it's one mind allowed to think more than once. But it's the cleanest evidence that diversity-then-aggregation helps. In 2023, Multi-Agent Debate (Du et al., arXiv:2305.14325) had several model instances propose, read each other, and revise across rounds, improving factuality and reasoning in their tested settings (it did not abolish hallucination). In 2024, Mixture-of-Agents (Wang et al., arXiv:2406.04692) layered proposers and synthesizers, closer to structured aggregation than free debate, so that a set of open models could together rival a larger one on benchmarks.
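
The aggregation step in self-consistency is small enough to show directly: sample several reasoning paths, keep only each path's final answer, and take the most common one. The sketch below covers only that voting step; the sampled answers are made-up placeholders, and `self_consistent_answer` is a name chosen here, not one from the paper.

```python
from collections import Counter
from typing import List, Tuple


def self_consistent_answer(final_answers: List[str]) -> Tuple[str, float]:
    # Majority vote over final answers; the reasoning paths themselves are discarded.
    counts = Counter(final_answers)
    # most_common breaks ties by first appearance, so a tie is not a real verdict.
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(final_answers)


# Made-up final answers from five sampled reasoning paths.
samples = ["42", "42", "41", "42", "24"]
answer, share = self_consistent_answer(samples)
```

Returning the vote share alongside the answer keeps the disagreement visible: three of five is a different kind of result from five of five, even when the winning answer is the same.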

I want to be precise about what these do and don't prove, because the hype rounds the corners off. They show that, on the right kind of task and with enough genuine diversity, structure beats scale. They do not show that more agents are always better, that consensus equals correctness, or that any of this removes the need for a human to decide. The fine print is the whole story, and the fine print is where the geometry turns dark.

The dark geometry of consensus

This is the section to slow down for. Everything above is the bright side, the part that ships in a slide deck. Here is what I have learned to watch for, the failure modes that look exactly like success.

Five identical masks of light in a row, the same face wearing different colored glows.
The trap that looks most like rigor: five masks, one face.

1. Correlated priors, or consensus among copies. Frontier models are trained on overlapping data and tuned for similar helpfulness. Ask five of them and you may not get five minds; you may get one prior wearing five masks. Their agreement then certifies nothing. What looks like a debate can be phase-shifted autocorrelation, and a council of near-copies is just overfitting with better production values. Consensus among copies is not validation. It is duplication.

2. The confidence illusion. A three-to-one majority of confident, articulate, jointly wrong agents is more dangerous than open disagreement, because humans read unanimity as proof. Smooth agreement should raise your suspicion, not lower it.

3. Sycophantic anchoring. Whoever speaks first, eloquently, tends to write the room's constitution. Later reasoners drift toward that frame even while appearing to critique it, orbiting a boundary nobody voted on. First-mover advantage in token-space is real and quietly decisive.

[Chart: decision quality against agents added (by similarity). Annotations: a weak-but-different voice helps; a strong-but-correlated one hurts.]
"More agents" is not monotonic: independence, not headcount, is what buys accuracy.

4. The diversity–accuracy tradeoff is real. Adding a weaker but genuinely different reasoner can improve a council; adding a stronger but highly correlated one can degrade it by reinforcing a shared error. The instinct to "add the best model" is often exactly wrong. A vote is only useful when the voters are allowed to be wrong in different ways.
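
A toy calculation makes the tradeoff concrete. Everything in it is an assumption chosen for illustration, not a result from the cited papers: a "family" of correlated voters who are all wrong together whenever a shared blind spot fires (probability 0.15) and otherwise right 95% of the time, plus independent voters with fixed accuracies. The function enumerates every outcome, so the numbers are exact for this model.

```python
from itertools import product
from typing import Sequence


def majority_accuracy(family: Sequence[float],
                      independent: Sequence[float],
                      blind_spot: float) -> float:
    """Exact P(strict majority is correct) under the toy model.

    family: per-voter accuracy when the shared blind spot does NOT fire;
            when it fires, every family voter is wrong.
    independent: accuracies of voters unaffected by the blind spot.
    """
    def p_majority(accuracies: Sequence[float]) -> float:
        n = len(accuracies)
        total = 0.0
        for outcome in product([True, False], repeat=n):
            prob = 1.0
            for correct, acc in zip(outcome, accuracies):
                prob *= acc if correct else 1.0 - acc
            if sum(outcome) > n / 2:
                total += prob
        return total

    fired = p_majority([0.0] * len(family) + list(independent))
    clear = p_majority(list(family) + list(independent))
    return blind_spot * fired + (1.0 - blind_spot) * clear


# Base council: one family voter (0.85 * 0.95 = 0.8075 overall) and two independents at 0.75.
base = majority_accuracy([0.95], [0.75, 0.75], 0.15)
# Add two more family voters: individually the strongest, but they share the blind spot.
plus_correlated = majority_accuracy([0.95, 0.95, 0.95], [0.75, 0.75], 0.15)
# Add two independent voters at 0.70 instead: the weakest voices in the room.
plus_different = majority_accuracy([0.95], [0.75, 0.75, 0.70, 0.70], 0.15)
```

Under these assumed numbers the base council is right about 86.5% of the time, adding the two stronger correlated voters drops it to about 84.0%, and adding the two weaker independent voters lifts it to about 88.8%. The direction depends on the parameters; with a rarer blind spot the correlated voters would help, which is the sense in which the instinct is "often", not always, wrong.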

5. The orchestrator's shadow. The human's own bias leaks in through three doors: how the question is framed, which agents are chosen, and how the final answer is synthesized. A council can become an expensive machine for laundering your original opinion and handing it back with citations. The most dangerous council is not the one that fights you; it is the one that flatters you with rigorous argument. And synthesis is where the best dissent quietly dies, the polite sentence "some concerns remain," followed by a decision that behaves as if they don't.

6. The evaluation gap, the blind spot inside this very essay. Everything above quietly assumes you can tell when the council got it right. Often you cannot. On questions with no ground truth, or where every reasoner shares the same hidden blind spot, a council can converge beautifully and be confidently wrong, and the convergence makes the answer sound more reliable, not less. The lone violet dissent in my film might have been the voice that was right. Several fluent agents agreeing can manufacture confidence the way a committee manufactures consensus, by exhausting objection rather than finding truth. Architecture does not dissolve this. It moves the judgment problem up one level, to whoever decides the council is finished, which is the same hand that has to own being wrong. And that hand is not neutral: faced with a confident, unanimous-looking council, a tired human tends to ratify, not re-examine. Authority bias doesn't vanish because the authority became plural; it gets louder. Making dissent visible is not the same as making a human heed it.

Consensus systems do not remove judgment. They move judgment into architecture.

Designing an honest council

If the geometry is that treacherous, why build one at all? Because the alternative, one confident voice in a closed room, is worse, and because the failure modes are designable-against once you can name them. What I try to do:

  • Force real independence. Reasoners answer before they see each other. Diversity of source and framing beats diversity of temperature.
  • Reward useful wrongness. Pick voices allowed to fail in different ways, not the four strongest near-copies.
  • Protect the dissent. The minority report must survive into the final output as itself, not be averaged into a beige middle.
  • Name the tie-breaker out loud. Majority, judge model, confidence, or human? Each is a form of government. Choose it deliberately, because tie-breaking is hidden governance.
  • Keep the human accountable. The council informs; it never absolves.
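
Three of the rules above (protect the dissent, name the tie-breaker, keep the human accountable) can be enforced by the shape of the output itself. The sketch below is one hypothetical way to do that: `DecisionRecord`, `Position`, and `TieBreaker` are names invented here, and the example positions are made up.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TieBreaker(Enum):
    MAJORITY = "majority"
    JUDGE_MODEL = "judge model"
    CONFIDENCE = "confidence"
    HUMAN = "human"


@dataclass
class Position:
    reasoner: str
    claim: str


@dataclass
class DecisionRecord:
    question: str
    tie_breaker: TieBreaker   # required up front, so the governance is never implicit
    accountable_human: str    # the council informs; this person answers for it
    positions: List[Position] = field(default_factory=list)

    def report(self) -> str:
        counts = Counter(p.claim for p in self.positions)
        leading, _ = counts.most_common(1)[0]
        lines = [
            f"Question: {self.question}",
            f"Tie-breaker: {self.tie_breaker.value}",
            f"Leading view: {leading}",
        ]
        # Every minority claim is printed verbatim, never averaged away.
        for p in self.positions:
            if p.claim != leading:
                lines.append(f"Dissent ({p.reasoner}): {p.claim}")
        lines.append(f"Decision owner: {self.accountable_human}")
        return "\n".join(lines)


record = DecisionRecord(
    question="Ship the migration this week?",
    tie_breaker=TieBreaker.HUMAN,
    accountable_human="J. Doe",  # placeholder name
    positions=[
        Position("blue", "ship"),
        Position("amber", "ship"),
        Position("white", "ship"),
        Position("violet", "wait: rollback is untested"),
    ],
)
summary = record.report()
```

Because the tie-breaker and the decision owner are required fields, a record cannot be constructed without them, and the report names the leading view without calling it the decision. That last step stays with the person named on the final line.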

The fifth mind

When a council works, something appears that none of the individual reasoners had alone: not the loudest view, not the average, but a composition that holds the tension instead of erasing it. I want to be careful here, because this is exactly where the piece could overclaim: that fifth thing is a better composite, not a higher truth. It can still be wrong. What it gains is not objectivity but the preservation of tension a single answer would have hidden, and that is only useful if the structure is honest enough not to flatten it. In the film I made to sit beside this piece, four lights converge into a solid that emits a fifth color, but it flickers unstably before it settles, and it holds its shape only because one small dissenting light stays lit outside it, load-bearing. That is the honest picture of consensus: composition, not compromise, provisional, with the dissent preserved rather than extinguished.

An impossible solid emitting a new gold-white color, with one violet light staying lit at its edge.
The fifth color emerges from the structure, not from any one voice, and the dissent stays lit.

But the last move belongs to a person. The council can argue itself into a brilliant, well-cited, internally consistent answer and still be wrong in a way only a human standing outside the room can feel. So the structure does not end the act of judgment; it relocates it, into who you convene, what you let them disagree about, which dissent you refuse to bury, and the moment you stop the argument and decide. The mirror argues so you no longer have to. Then you, the fifth mind, step into the flow and choose. That part was never going to be automated, and it shouldn't be.

A note on how this was made — and how I work

I should tell you something, because it is the most honest argument in this whole piece: I did not write it with one model. The essay, the film, and the field guide were each built the way I build everything now — by convening a council. I put the work in front of GPT‑5.4, Grok, DeepSeek and others, let them disagree, and one of them caught me committing the exact overclaim this piece warns against. I kept what survived the argument. The judgment — and the mistakes — are mine.

And I do not stop at agreement. I question their answers even when they look right, and push them to defend those answers from a different angle, because a confident consensus is exactly where a blind spot hides. That refusal to take the agreement at face value is the human staying in the loop. The council does not get the last word. I do.

This is how I actually work in 2026: I do not hand my hardest questions to a single model and trust the confident answer. I leverage the best of what the market offers, set them against each other, interrogate the result, and keep the human seat. Not one model. The best of many — and the human who decides. A council talking about councils. If that feels recursive, good — it is the method proving itself.

— Juan Cardena

Watch the companion film, The Fifth Mind, and read the field guide of the same name.

Juan Cardena
Enterprise AI & Agentic Systems Architect

Enterprise architect with 25 years across web, software, data, and AI. MIT CDAO ’25. Writing on agentic AI in production.