How to Set Up a Planner-Coder-Reviewer Multi-Agent AI Workflow
Problem
I kept running into the same issue with AI coding assistants. I’d ask the AI to implement a feature, the code would work, but I’d still need 2-3 manual refactoring passes to clean up poor naming, missing edge cases, and accumulated technical debt.
Me: Implement user authenticationAI: [generates working code]Me: Refactor for better namingAI: [renames some things]Me: Add error handling for edge casesAI: [adds try-catch blocks]Me: Fix the duplicated logic in validatorsAI: [refactors validators]Me: [still finds issues, gives up and fixes manually]This manual intervention defeated the whole point of using AI. The problem wasn’t that the AI couldn’t code - it was that the coding agent couldn’t effectively review its own work.
What I tried first
My initial approach was to use a single agent for everything:
class SingleAgentWorkflow: def __init__(self): self.agent = Agent(model="claude-sonnet-4")
async def implement(self, request: str): # Plan plan = await self.agent.plan(request)
# Code code = await self.agent.implement(plan)
# Review - SAME AGENT, SAME CONTEXT review = await self.agent.review(code)
if not review.is_approved: # Agent "fixes" its own mistakes code = await self.agent.fix(code, review.suggestions)
return codeThis didn’t work. The coding agent had already filled its context window with implementation details. When I asked it to review, it would miss the same issues it created - it couldn’t see its own blind spots.
The insight
In a Reddit discussion about AI coding tools, someone mentioned:
“A separate reviewer agent with their own context window can find these problems for you, send it back to the coding agent to fix it. With this you can say goodbye to the ugly stuff you don’t want to live with.”
The key phrase was “separate context window”. A fresh agent, untainted by the implementation process, could see what the coder missed.
Multi-agent architecture
I redesigned my workflow with specialized agents:
+------------------+-------------------+------------------+| Agent | Role | Best Model |+------------------+-------------------+------------------+| Planner | Break down tasks | Opus (reasoning) || Coder | Implement code | Sonnet (coding) || Reviewer | Find issues | Opus (analysis) || Tester | Generate/run tests| Haiku (fast) |+------------------+-------------------+------------------+Each agent has:
- A specific role
- Its own context window
- An appropriate model for the task
The workflow
Here’s the architecture I implemented:
+----------------+ | Developer | | Request | +-------+--------+ | v +-------+--------+ | Planner Agent | (Opus - deep reasoning) | - Analyze | | - Break down | | - Create plan | +-------+--------+ | v +-------+--------+ | Implementation | | Plan | +-------+--------+ | v +---------------+---------------+ | | v v +-------+--------+ +-------+--------+ | Coder Agent | <---------- | Reviewer Agent | (Opus - fresh context) | (Sonnet) | feedback | - Find bugs | | - Implement | | - Check style | | - Fix issues | | - Edge cases | +-------+--------+ +-------+--------+ | | | +--------+ | +->| Issues |<------------------+ | Found? | +---+----+ | +-------+-------+ | | v v [Yes] [No] | | | v | +--------+--------+ | | Tester Agent | (Haiku - fast) | | - Run tests | | | - Check coverage| | +--------+--------+ | | | v | +--------+--------+ | | Tests Pass? | | +--------+--------+ | | | +-------+-------+ | | | | v v | [Yes] [No] | | | | | +---> Back to Coder | v | +----+----+ | | Merged | | | Code | | +---------+ | +---> Back to Coder with feedbackImplementation with LangGraph
I implemented this using LangGraph:
from langgraph.graph import StateGraph, ENDfrom typing import TypedDict, Optional
class WorkflowState(TypedDict): request: str plan: Optional[str] code: Optional[str] review: Optional[dict] test_results: Optional[dict] iterations: int
MAX_ITERATIONS = 3
async def planner_node(state: WorkflowState) -> WorkflowState: """Analyze request and create implementation plan.""" planner = Agent(model="opus", role="planner") plan = await planner.create_plan(state["request"]) return {**state, "plan": plan}
async def coder_node(state: WorkflowState) -> WorkflowState: """Implement code based on plan.""" coder = Agent(model="sonnet", role="coder") code = await coder.implement( plan=state["plan"], previous_code=state.get("code"), feedback=state.get("review", {}).get("suggestions") ) return {**state, "code": code}
async def reviewer_node(state: WorkflowState) -> WorkflowState: """Review code with fresh context window.""" reviewer = Agent(model="opus", role="reviewer") review = await reviewer.review( code=state["code"], plan=state["plan"] ) return { **state, "review": review, "iterations": state["iterations"] + 1 }
async def tester_node(state: WorkflowState) -> WorkflowState: """Run tests on the code.""" tester = Agent(model="haiku", role="tester") results = await tester.run_tests(state["code"]) return {**state, "test_results": results}
def should_continue_coding(state: WorkflowState) -> str: """Decide if more coding iterations needed.""" if state["iterations"] >= MAX_ITERATIONS: return "approve"
if state["review"]["is_approved"]: return "approve"
return "fix"
def test_decision(state: WorkflowState) -> str: """Decide based on test results.""" if state["test_results"]["passed"]: return "done" return "fix"
def create_workflow(): workflow = StateGraph(WorkflowState)
# Add nodes workflow.add_node("plan", planner_node) workflow.add_node("code", coder_node) workflow.add_node("review", reviewer_node) workflow.add_node("test", tester_node)
# Define flow workflow.set_entry_point("plan") workflow.add_edge("plan", "code") workflow.add_edge("code", "review")
# Conditional: review -> code or test workflow.add_conditional_edges( "review", should_continue_coding, {"fix": "code", "approve": "test"} )
# Conditional: test -> done or code workflow.add_conditional_edges( "test", test_decision, {"fix": "code", "done": END} )
return workflow.compile()Running the workflow
async def main(): workflow = create_workflow()
result = await workflow.ainvoke({ "request": "Implement user authentication with JWT tokens", "iterations": 0 })
print(f"Completed in {result['iterations']} iterations") print(f"Final code: {result['code']}")
# Output# Completed in 2 iterations# Review issues found: Missing token expiration, No refresh token logic# Fixed in iteration 2# Tests passed: 12/12Why this works
1. Context isolation
The reviewer starts with a clean slate. It doesn’t know the implementation decisions the coder made, so it can objectively evaluate the code.
Single Agent:+------------------------------------------+| Request | Plan | Code | Review Context || | | | (tainted by || | | | implementation) |+------------------------------------------+=> Reviewer sees what coder expects it to see
Multi-Agent:+-------------+ +-------------+| Planner | | Reviewer || [Fresh] | | [Fresh] |+-------------+ +-------------+ | ^ v |+-------------+ || Coder |----------->+| [Fresh] | passes code only+-------------+ (no implementation bias)=> Reviewer sees code objectively2. Specialized models
Different tasks need different capabilities:
MODEL_CONFIG = { "planner": "opus", # Deep reasoning, task breakdown "coder": "sonnet", # Best coding model "reviewer": "opus", # Strong reasoning for edge cases "tester": "haiku", # Fast, cheap for test generation}Using Opus for planning and review ensures deep reasoning where it matters. Using Haiku for testing reduces cost on routine tasks.
3. Automated feedback loop
The workflow doesn’t stop when issues are found. It routes feedback back to the coder automatically:
# Reviewer finds issuesreview = { "is_approved": False, "suggestions": [ "Add input validation for email field", "Handle null case in user lookup", "Extract duplicate validation logic" ]}
# Coder receives focused feedbackcoder.implement( plan=plan, previous_code=code, feedback=review["suggestions"] # Targeted fixes)Common mistakes
When I first implemented this, I made several mistakes:
Mistake 1: Same model for all agents
# WRONG: Using expensive Opus for everythingagents = { "planner": Agent(model="opus"), "coder": Agent(model="opus"), # Overkill "reviewer": Agent(model="opus"), "tester": Agent(model="opus"), # Wasteful}This worked but cost 3x more than necessary.
Mistake 2: Skipping the planning phase
# WRONG: Coder without plancode = await coder.implement(request) # Inconsistent resultsWithout a plan, the coder made inconsistent design decisions across iterations.
Mistake 3: Shared context between coder and reviewer
# WRONG: Passing full context to reviewerreview = await reviewer.review( code=code, plan=plan, coder_thoughts=coder.reasoning, # Taints review! decisions=coder.decisions)This defeated the whole purpose - the reviewer became biased by the coder’s rationale.
Mistake 4: No termination criteria
# WRONG: Could run foreverwhile not review.is_approved: code = await coder.fix(code, review.suggestions) review = await reviewer.review(code)Without a maximum iteration count, the loop could run indefinitely on contentious issues.
The results
After implementing the multi-agent workflow:
Before (single agent):- Average iterations: 3+ manual refactoring passes- Code quality: Often needs cleanup- Time spent: 30-60 min per feature- Technical debt: Accumulated
After (multi-agent):- Average iterations: 2-3 automated passes- Code quality: Reviewer catches issues- Time spent: 5-10 min review only- Technical debt: Caught before mergeSummary
In this post, I showed how to set up a planner-coder-reviewer multi-agent AI workflow. The key insight is that separate context windows for each agent enable effective self-correction - the reviewer sees what the coder missed.
Start with three agents (planner, coder, reviewer) using appropriate model tiers. Add a tester agent once the core loop works. Define clear quality gates to prevent infinite loops. The result: cleaner code, less manual intervention, and a sustainable AI coding practice.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 Reddit: OpenCode vs ClaudeCode Discussion
- 👨💻 LangGraph Documentation
- 👨💻 CrewAI Multi-Agent Framework
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments