Skip to content

How to Set Up a Planner-Coder-Reviewer Multi-Agent AI Workflow

Problem

I kept running into the same issue with AI coding assistants. I’d ask the AI to implement a feature, the code would work, but I’d still need 2-3 manual refactoring passes to clean up poor naming, missing edge cases, and accumulated technical debt.

my-typical-workflow.txt
Me: Implement user authentication
AI: [generates working code]
Me: Refactor for better naming
AI: [renames some things]
Me: Add error handling for edge cases
AI: [adds try-catch blocks]
Me: Fix the duplicated logic in validators
AI: [refactors validators]
Me: [still finds issues, gives up and fixes manually]

This manual intervention defeated the whole point of using AI. The problem wasn’t that the AI couldn’t code - it was that the coding agent couldn’t effectively review its own work.

What I tried first

My initial approach was to use a single agent for everything:

single-agent-approach.py
class SingleAgentWorkflow:
def __init__(self):
self.agent = Agent(model="claude-sonnet-4")
async def implement(self, request: str):
# Plan
plan = await self.agent.plan(request)
# Code
code = await self.agent.implement(plan)
# Review - SAME AGENT, SAME CONTEXT
review = await self.agent.review(code)
if not review.is_approved:
# Agent "fixes" its own mistakes
code = await self.agent.fix(code, review.suggestions)
return code

This didn’t work. The coding agent had already filled its context window with implementation details. When I asked it to review, it would miss the same issues it created - it couldn’t see its own blind spots.

The insight

In a Reddit discussion about AI coding tools, someone mentioned:

“A separate reviewer agent with their own context window can find these problems for you, send it back to the coding agent to fix it. With this you can say goodbye to the ugly stuff you don’t want to live with.”

The key phrase was “separate context window”. A fresh agent, untainted by the implementation process, could see what the coder missed.

Multi-agent architecture

I redesigned my workflow with specialized agents:

agent-roles.txt
+------------------+-------------------+------------------+
| Agent | Role | Best Model |
+------------------+-------------------+------------------+
| Planner | Break down tasks | Opus (reasoning) |
| Coder | Implement code | Sonnet (coding) |
| Reviewer | Find issues | Opus (analysis) |
| Tester | Generate/run tests| Haiku (fast) |
+------------------+-------------------+------------------+

Each agent has:

  1. A specific role
  2. Its own context window
  3. An appropriate model for the task

The workflow

Here’s the architecture I implemented:

multi-agent-workflow.txt
+----------------+
| Developer |
| Request |
+-------+--------+
|
v
+-------+--------+
| Planner Agent | (Opus - deep reasoning)
| - Analyze |
| - Break down |
| - Create plan |
+-------+--------+
|
v
+-------+--------+
| Implementation |
| Plan |
+-------+--------+
|
v
+---------------+---------------+
| |
v v
+-------+--------+ +-------+--------+
| Coder Agent | <---------- | Reviewer Agent | (Opus - fresh context)
| (Sonnet) | feedback | - Find bugs |
| - Implement | | - Check style |
| - Fix issues | | - Edge cases |
+-------+--------+ +-------+--------+
| |
| +--------+ |
+->| Issues |<------------------+
| Found? |
+---+----+
|
+-------+-------+
| |
v v
[Yes] [No]
| |
| v
| +--------+--------+
| | Tester Agent | (Haiku - fast)
| | - Run tests |
| | - Check coverage|
| +--------+--------+
| |
| v
| +--------+--------+
| | Tests Pass? |
| +--------+--------+
| |
| +-------+-------+
| | |
| v v
| [Yes] [No]
| | |
| | +---> Back to Coder
| v
| +----+----+
| | Merged |
| | Code |
| +---------+
|
+---> Back to Coder with feedback

Implementation with LangGraph

I implemented this using LangGraph:

multi_agent_workflow.py
from langgraph.graph import StateGraph, END
from typing import TypedDict, Optional
class WorkflowState(TypedDict):
request: str
plan: Optional[str]
code: Optional[str]
review: Optional[dict]
test_results: Optional[dict]
iterations: int
MAX_ITERATIONS = 3
async def planner_node(state: WorkflowState) -> WorkflowState:
"""Analyze request and create implementation plan."""
planner = Agent(model="opus", role="planner")
plan = await planner.create_plan(state["request"])
return {**state, "plan": plan}
async def coder_node(state: WorkflowState) -> WorkflowState:
"""Implement code based on plan."""
coder = Agent(model="sonnet", role="coder")
code = await coder.implement(
plan=state["plan"],
previous_code=state.get("code"),
feedback=state.get("review", {}).get("suggestions")
)
return {**state, "code": code}
async def reviewer_node(state: WorkflowState) -> WorkflowState:
"""Review code with fresh context window."""
reviewer = Agent(model="opus", role="reviewer")
review = await reviewer.review(
code=state["code"],
plan=state["plan"]
)
return {
**state,
"review": review,
"iterations": state["iterations"] + 1
}
async def tester_node(state: WorkflowState) -> WorkflowState:
"""Run tests on the code."""
tester = Agent(model="haiku", role="tester")
results = await tester.run_tests(state["code"])
return {**state, "test_results": results}
def should_continue_coding(state: WorkflowState) -> str:
"""Decide if more coding iterations needed."""
if state["iterations"] >= MAX_ITERATIONS:
return "approve"
if state["review"]["is_approved"]:
return "approve"
return "fix"
def test_decision(state: WorkflowState) -> str:
"""Decide based on test results."""
if state["test_results"]["passed"]:
return "done"
return "fix"
def create_workflow():
workflow = StateGraph(WorkflowState)
# Add nodes
workflow.add_node("plan", planner_node)
workflow.add_node("code", coder_node)
workflow.add_node("review", reviewer_node)
workflow.add_node("test", tester_node)
# Define flow
workflow.set_entry_point("plan")
workflow.add_edge("plan", "code")
workflow.add_edge("code", "review")
# Conditional: review -> code or test
workflow.add_conditional_edges(
"review",
should_continue_coding,
{"fix": "code", "approve": "test"}
)
# Conditional: test -> done or code
workflow.add_conditional_edges(
"test",
test_decision,
{"fix": "code", "done": END}
)
return workflow.compile()

Running the workflow

run_workflow.py
async def main():
workflow = create_workflow()
result = await workflow.ainvoke({
"request": "Implement user authentication with JWT tokens",
"iterations": 0
})
print(f"Completed in {result['iterations']} iterations")
print(f"Final code: {result['code']}")
# Output
# Completed in 2 iterations
# Review issues found: Missing token expiration, No refresh token logic
# Fixed in iteration 2
# Tests passed: 12/12

Why this works

1. Context isolation

The reviewer starts with a clean slate. It doesn’t know the implementation decisions the coder made, so it can objectively evaluate the code.

context-window-comparison.txt
Single Agent:
+------------------------------------------+
| Request | Plan | Code | Review Context |
| | | | (tainted by |
| | | | implementation) |
+------------------------------------------+
=> Reviewer sees what coder expects it to see
Multi-Agent:
+-------------+ +-------------+
| Planner | | Reviewer |
| [Fresh] | | [Fresh] |
+-------------+ +-------------+
| ^
v |
+-------------+ |
| Coder |----------->+
| [Fresh] | passes code only
+-------------+ (no implementation bias)
=> Reviewer sees code objectively

2. Specialized models

Different tasks need different capabilities:

model-selection.py
MODEL_CONFIG = {
"planner": "opus", # Deep reasoning, task breakdown
"coder": "sonnet", # Best coding model
"reviewer": "opus", # Strong reasoning for edge cases
"tester": "haiku", # Fast, cheap for test generation
}

Using Opus for planning and review ensures deep reasoning where it matters. Using Haiku for testing reduces cost on routine tasks.

3. Automated feedback loop

The workflow doesn’t stop when issues are found. It routes feedback back to the coder automatically:

feedback-loop.py
# Reviewer finds issues
review = {
"is_approved": False,
"suggestions": [
"Add input validation for email field",
"Handle null case in user lookup",
"Extract duplicate validation logic"
]
}
# Coder receives focused feedback
coder.implement(
plan=plan,
previous_code=code,
feedback=review["suggestions"] # Targeted fixes
)

Common mistakes

When I first implemented this, I made several mistakes:

Mistake 1: Same model for all agents

mistake-1.py
# WRONG: Using expensive Opus for everything
agents = {
"planner": Agent(model="opus"),
"coder": Agent(model="opus"), # Overkill
"reviewer": Agent(model="opus"),
"tester": Agent(model="opus"), # Wasteful
}

This worked but cost 3x more than necessary.

Mistake 2: Skipping the planning phase

mistake-2.py
# WRONG: Coder without plan
code = await coder.implement(request) # Inconsistent results

Without a plan, the coder made inconsistent design decisions across iterations.

Mistake 3: Shared context between coder and reviewer

mistake-3.py
# WRONG: Passing full context to reviewer
review = await reviewer.review(
code=code,
plan=plan,
coder_thoughts=coder.reasoning, # Taints review!
decisions=coder.decisions
)

This defeated the whole purpose - the reviewer became biased by the coder’s rationale.

Mistake 4: No termination criteria

mistake-4.py
# WRONG: Could run forever
while not review.is_approved:
code = await coder.fix(code, review.suggestions)
review = await reviewer.review(code)

Without a maximum iteration count, the loop could run indefinitely on contentious issues.

The results

After implementing the multi-agent workflow:

results-comparison.txt
Before (single agent):
- Average iterations: 3+ manual refactoring passes
- Code quality: Often needs cleanup
- Time spent: 30-60 min per feature
- Technical debt: Accumulated
After (multi-agent):
- Average iterations: 2-3 automated passes
- Code quality: Reviewer catches issues
- Time spent: 5-10 min review only
- Technical debt: Caught before merge

Summary

In this post, I showed how to set up a planner-coder-reviewer multi-agent AI workflow. The key insight is that separate context windows for each agent enable effective self-correction - the reviewer sees what the coder missed.

Start with three agents (planner, coder, reviewer) using appropriate model tiers. Add a tester agent once the core loop works. Define clear quality gates to prevent infinite loops. The result: cleaner code, less manual intervention, and a sustainable AI coding practice.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments