How Do Planner, Builder, and QA Subagents Transform AI Coding Workflows?
Single-Agent AI Coding Falls Apart at Scale
I tried building a full application with Codex. I gave it one big prompt: “Build me a task management app with user auth, real-time updates, and a dashboard.”
It started well. The agent wrote code, created files, made progress. But then things went wrong.
The code had bugs. Features were half-implemented. Tests were missing. And worst of all, I couldn’t tell where things broke because everything happened in one long session.
This is the single-agent problem. When you throw a complex task at one agent, three things happen:
Context Overload: The agent tries to hold everything in memory. Requirements, code decisions, error states—all crammed together. Quality drops as context fills up.
Quality Drift: Without checkpoints, small mistakes compound. A wrong assumption at step 5 corrupts everything built in steps 6 through 15.
Task Ambiguity: Large tasks like “build authentication” are too vague. The agent guesses wrong, builds the wrong thing, and you discover it too late.
The Solution: Three-Agent Orchestration
I learned this pattern from developers who run Codex for hours successfully. They don’t use one agent. They use three specialized agents in sequence:
+----------+     +----------+     +----------+
| Planner  | --> | Builder  | --> |    QA    |
|  Agent   |     |  Agent   |     |  Agent   |
+----------+     +----------+     +----------+
      |                |                |
      v                v                v
  Decompose        Implement        Validate
  into atomic      one task at      against
  tasks            a time with      criteria
                   clear scope
Each agent does one job well. The Planner breaks down the work. The Builder implements. The QA validates. This separation keeps context clean and quality high.
Phase 1: Planner Agent
The Planner’s job is decomposition. It takes a large goal and breaks it into 20-50 atomic tasks with clear acceptance criteria.
Here’s what a good Planner prompt looks like:
You are the Planner Agent. Your job is to decompose a project into atomic tasks.
Project: Build a user authentication system with JWT tokens
Requirements:
- Email/password registration
- Login with JWT access token (15 min expiry)
- Refresh token rotation
- Password reset via email
- Rate limiting (5 attempts per minute per IP)
Create a tasks.csv file with these columns:
- id: unique task identifier
- task: what to do (atomic, one action)
- acceptance_criteria: how to know it's done
- dependencies: which tasks must complete first
- status: planned
Rules:
- Each task should take 5-15 minutes to complete
- Tasks must have measurable acceptance criteria
- Dependencies must be explicit
- No task should require clarification
Output ONLY the tasks.csv file. No explanations.
The Planner produces something like this:
id,task,acceptance_criteria,dependencies,status
1,Create users table migration,Migration file exists with users table schema,none,planned
2,Run users table migration,"Migration applied, schema updated",1,planned
3,Create User model,Model file exists with required fields,2,planned
4,Add password hashing to User model,Passwords are hashed with bcrypt before save,3,planned
5,Create register endpoint,POST /auth/register returns 201 on success,4,planned
6,Add email validation to register,Invalid emails return 400 with error message,5,planned
7,Add duplicate email check,Register returns 409 if email exists,5,planned
8,Create login endpoint,POST /auth/login returns JWT on success,4,planned
9,Add rate limiting to login,More than 5 attempts returns 429,8,planned
10,Create refresh token endpoint,POST /auth/refresh returns new access token,8,planned
...
Notice that each task is atomic. “Create register endpoint” is one task. “Add email validation” is a separate task. This granularity is what lets QA check every task against a single, concrete criterion.
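Every later step depends on the Planner’s output, so it can pay to check tasks.csv mechanically before any Builder runs. Below is a minimal Python sketch of one way to do that. The validate_plan name and the “;” separator for multiple dependencies are assumptions; neither is part of the Planner prompt. The check uses the standard csv module, which is also why a field containing a comma, such as task 2’s criteria, has to be quoted.

```python
import csv
import io
from graphlib import CycleError, TopologicalSorter

REQUIRED_COLUMNS = ["id", "task", "acceptance_criteria", "dependencies", "status"]


def parse_dependencies(value):
    """Turn 'none', '' or '1;4' into a list of task IDs (';' separator assumed)."""
    value = (value or "").strip()
    if value.lower() in ("", "none"):
        return []
    return [dep.strip() for dep in value.split(";")]


def validate_plan(csv_text):
    """Return a list of problems found in a Planner-generated tasks.csv."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if not rows:
        return ["plan is empty"]
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    ids = [row["id"] for row in rows]
    if len(set(ids)) != len(ids):
        problems.append("duplicate task IDs")
    graph = {}
    for row in rows:
        if None in row:  # DictReader stores extra fields under None: usually an unquoted comma
            problems.append(f"task {row['id']}: too many fields (unquoted comma?)")
        if not (row["acceptance_criteria"] or "").strip():
            problems.append(f"task {row['id']}: no acceptance criteria")
        deps = parse_dependencies(row["dependencies"])
        for dep in deps:
            if dep not in ids:
                problems.append(f"task {row['id']}: unknown dependency {dep}")
        graph[row["id"]] = deps
    try:
        TopologicalSorter(graph).prepare()  # raises CycleError if tasks depend on each other in a loop
    except CycleError:
        problems.append("dependency cycle detected")
    return problems
```

An empty result means the plan is structurally sound. It says nothing about whether the tasks are the right ones.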
Phase 2: Builder Agent
The Builder implements ONE task at a time. Not two, not five—exactly one.
Here’s the Builder prompt:
You are the Builder Agent. Your job is to implement ONE task.
Current task: {task_from_csv}
Acceptance criteria: {criteria_from_csv}
Context:
- Project root: /app
- Tech stack: Node.js, Express, PostgreSQL
- Related files: {list_files_from_dependencies}
Rules:
- Implement ONLY this task
- Do not refactor code outside this task's scope
- Write tests if acceptance criteria require them
- Commit your changes with message referencing task ID
- Output "TASK_COMPLETE" when done
If you encounter a blocker, output "BLOCKED: {reason}" and stop.
The Builder receives exactly one task:
// Task ID: 5
// Task: Create register endpoint
// Acceptance criteria: POST /auth/register returns 201 on success

// Builder creates:
// routes/auth.js
const express = require('express');
const User = require('../models/user'); // User model from task 3 (path assumed)

const router = express.Router();

router.post('/register', async (req, res) => {
  const { email, password } = req.body;

  if (!email || !password) {
    return res.status(400).json({ error: 'Email and password required' });
  }

  // Task 4 already hashes passwords in the User model before save, so the
  // plain password is passed through here; hashing it again would break login.
  const user = await User.create({ email, password });

  res.status(201).json({ id: user.id, email: user.email });
});

module.exports = router;

// Builder output:
// TASK_COMPLETE: Created POST /auth/register endpoint
The Builder does one thing and stops. It doesn’t add email validation (that’s task 6). It doesn’t add duplicate checking (that’s task 7). Just the register endpoint.
Phase 3: QA Agent
The QA Agent validates the Builder’s work against acceptance criteria.
You are the QA Agent. Your job is to validate completed tasks.
Task completed: {task_from_csv}
Acceptance criteria: {criteria_from_csv}
Rules:
- Test the implementation against each acceptance criterion
- For API endpoints: make actual HTTP requests
- For database changes: verify schema
- For tests: run them
- Output "APPROVED" if all criteria pass
- Output "REJECTED: {reason}" if any criterion fails
Do not fix code. Only validate and report.
The QA runs actual tests:
# Task: Create register endpoint
# Acceptance criteria: POST /auth/register returns 201 on success

# QA runs:
curl -X POST http://localhost:3000/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com", "password": "secret123"}'

# Response: {"id": 1, "email": "user@example.com"}
# Status: 201

# QA output:
# APPROVED: POST /auth/register returns 201 on success
If validation fails:
# Task: Add email validation to register
# Acceptance criteria: Invalid emails return 400 with error message

# QA runs:
curl -X POST http://localhost:3000/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "not-an-email", "password": "secret123"}'

# Response: {"id": 2, "email": "not-an-email"}
# Status: 201

# QA output:
# REJECTED: Invalid email "not-an-email" was accepted (expected 400, got 201)
The Full Orchestration Loop
Here’s how these three agents work together:
┌─────────────────────────────────────────────────────────────┐
│                     ORCHESTRATION LOOP                      │
└─────────────────────────────────────────────────────────────┘
                               │
                               v
                      ┌─────────────────┐
                      │  PLANNER AGENT  │
                      │                 │
                      │  Input: Goal    │
                      │  Output: CSV    │
                      └────────┬────────┘
                               │
                               v
               ┌───────────────────────────────┐
               │        Read next task         │
               │        from tasks.csv         │
               └───────────────┬───────────────┘
                               │
               ┌───────────────┴───────────────┐
               │                               │
               v                               v
      ┌─────────────────┐             ┌─────────────────┐
      │   More tasks?   │             │    All done     │
      │                 │             │                 │
      │      YES        │             │      STOP       │
      └────────┬────────┘             └─────────────────┘
               │
               v
      ┌─────────────────┐
      │  BUILDER AGENT  │
      │                 │
      │  Input: Task    │
      │  Output: Code   │
      └────────┬────────┘
               │
               v
      ┌─────────────────┐
      │    QA AGENT     │
      │                 │
      │  Input: Code    │
      │  Output: Pass   │
      │         /Fail   │
      └────────┬────────┘
               │
      ┌────────┴────────┐
      │                 │
      v                 v
 ┌─────────┐       ┌─────────┐
 │ APPROVED│       │ REJECTED│
 │         │       │         │
 │ Mark    │       │ Return  │
 │ task    │       │ to      │
 │ DONE    │       │ PLANNED │
 └────┬────┘       └────┬────┘
      │                 │
      │                 v
      │          ┌─────────────┐
      │          │ Log issue   │
      │          │ and retry   │
      │          └─────────────┘
      │
      v
┌───────────────────┐
│ Loop to next task │
└───────────────────┘
The orchestrator script looks like this:
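import csv
import io
import subprocess

MAX_ATTEMPTS = 3  # how many times a rejected task is rebuilt before giving up


def run_codex(role, prompt):
    """Run one Codex invocation for an agent role and return its stdout.

    The "--agent" form mirrors this sketch's original command; adapt it to
    however your Codex setup selects a role.
    """
    result = subprocess.run(
        ["codex", "--agent", role, prompt],
        capture_output=True, text=True
    )
    return result.stdout


def last_line(output):
    """Return the last non-empty line, where the agent's verdict is expected."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def run_planner(goal):
    """Phase 1: Decompose into tasks (the Planner prints tasks.csv)"""
    output = run_codex("planner", goal)
    return list(csv.DictReader(io.StringIO(output)))


def run_builder(task):
    """Phase 2: Implement one task"""
    prompt = (f"Task ID: {task['id']}\nTask: {task['task']}\n"
              f"Acceptance criteria: {task['acceptance_criteria']}")
    return last_line(run_codex("builder", prompt)).startswith("TASK_COMPLETE")


def run_qa(task):
    """Phase 3: Validate implementation"""
    prompt = (f"Task completed: {task['task']}\n"
              f"Acceptance criteria: {task['acceptance_criteria']}")
    # Check the verdict prefix, so "REJECTED: ... not APPROVED" is not a pass
    return last_line(run_codex("qa", prompt)).startswith("APPROVED")


def orchestrate(goal):
    # Phase 1: Planning
    tasks = run_planner(goal)

    # Phases 2-3: Build and validate each task
    for task in tasks:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            # Phase 2: Build
            if not run_builder(task):
                task["status"] = "blocked"
                break

            # Phase 3: Validate
            if run_qa(task):
                task["status"] = "done"
                break

            # Rejected: log the issue, return the task to planned, and retry
            print(f"Task {task['id']} rejected (attempt {attempt})")
            task["status"] = "planned"

    return all(task["status"] == "done" for task in tasks)
Real-World Prompt Sequence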
Here’s an actual sequence of prompts you’d use with Codex:
Step 1: Invoke Planner
codex "You are the Planner Agent.
Project: Build a REST API for task management with the following features:
- CRUD operations for tasks
- User authentication with JWT
- Task assignment to users
- Filtering and pagination
Create tasks.csv with 30-40 atomic tasks. Each task must have:
- Unique ID
- Clear acceptance criteria
- Explicit dependencies
Output only the CSV. No explanations."
Step 2: Read tasks and invoke Builder for first task
codex "You are the Builder Agent.
Task ID: 1
Task: Create project structure with Express.js
Acceptance criteria:
- package.json exists
- Express installed
- Server starts on port 3000
- GET /health returns 200
Rules:
- Implement only this task
- Output TASK_COMPLETE when done
- Output BLOCKED: {reason} if stuck"
Step 3: Invoke QA
codex "You are the QA Agent.
Task: Create project structure with Express.js
Acceptance criteria:
- package.json exists
- Express installed
- Server starts on port 3000
- GET /health returns 200
Validate each criterion. Run actual tests.
Output APPROVED or REJECTED with reason."
Step 4: Loop to next task
Repeat steps 2-3 for each task in the CSV.
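Step 4 walks the CSV in order. Because of the dependencies column, though, a task should only start once everything it depends on is done. Here is a minimal Python sketch for picking the next runnable task. It assumes statuses of planned, done and blocked, and “;”-separated dependency IDs. Both go beyond the CSV shown earlier, and next_ready_task is a hypothetical helper.

```python
def next_ready_task(tasks):
    """Return the first planned task whose dependencies are all done, else None.

    Each task is a dict with 'id', 'dependencies' and 'status' keys, as read
    from tasks.csv with csv.DictReader.
    """
    done = {task["id"] for task in tasks if task["status"] == "done"}
    for task in tasks:
        if task["status"] != "planned":
            continue
        deps = task["dependencies"].strip()
        # 'none' means no prerequisites; several IDs are assumed to be ';'-separated
        needed = [] if deps.lower() in ("", "none") else [d.strip() for d in deps.split(";")]
        if all(dep in done for dep in needed):
            return task
    return None


# Example: task 1 is done, so task 2 (which depends on it) is next.
tasks = [
    {"id": "1", "dependencies": "none", "status": "done"},
    {"id": "2", "dependencies": "1", "status": "planned"},
    {"id": "3", "dependencies": "2", "status": "planned"},
]
print(next_ready_task(tasks)["id"])  # prints 2
```

A blocked task never reaches done, so its dependents are never selected. The loop stops with them still planned instead of building them on a missing foundation.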
Why This Works Better Than Single-Agent
Quality Assurance Built-In: Every task passes through QA. No broken code accumulates. The QA agent catches problems before moving forward.
Predictable Progress: You can see exactly where you are. Task 5 of 30 done. Task 6 blocked. Task 7 in progress. This visibility helps you plan your time.
Scalable Complexity: A 50-task project takes longer than a 10-task project, but it isn’t harder, because the orchestrator handles each task the same way. Context stays clean because each agent does one job.
Clear Debugging: When something breaks, you know exactly which task failed. The acceptance criteria tell you what should have happened. No more searching through 3 hours of output to find where things went wrong.
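The progress visibility described above can come straight from the status column. The short sketch below turns the contents of tasks.csv into a one-line summary. It assumes the orchestrator writes done and blocked back into that column, and progress_report is a hypothetical name.

```python
import csv
import io


def progress_report(csv_text):
    """Summarise tasks.csv content as e.g. '5 of 30 done; blocked: 6'."""
    tasks = list(csv.DictReader(io.StringIO(csv_text)))
    done = sum(task["status"] == "done" for task in tasks)
    summary = f"{done} of {len(tasks)} done"
    blocked = [task["id"] for task in tasks if task["status"] == "blocked"]
    if blocked:
        summary += f"; blocked: {', '.join(blocked)}"
    return summary
```

Call it on the file contents, for example progress_report(open("tasks.csv").read()), after each loop iteration or whenever you check in on a long run.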
Common Mistakes
I made these mistakes when learning this pattern:
1. Skipping Planning
I tried to start building immediately. “Just add auth while you’re at it,” I said.
The Builder got confused. It started adding features that conflicted with later tasks. The QA couldn’t validate clearly.
Fix: Always run Planner first. Even for “small” projects. A 10-task plan takes 5 minutes to create and saves hours of rework.
2. Vague Acceptance Criteria
# BAD
Task: Add authentication
Acceptance criteria: Works correctly

# GOOD
Task: Add JWT authentication middleware
Acceptance criteria:
- Requests without token return 401
- Requests with expired token return 401
- Requests with valid token pass through
- Token payload available as req.user
Vague criteria make QA impossible. The QA agent can’t verify “works correctly.”
3. Rushing QA
I wanted to move fast, so I told QA to “just check if it compiles.”
Bad idea. Code that compiles can still fail acceptance criteria. QA is where quality lives. Don’t skip it.
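Checking acceptance criteria means exercising the behaviour, as in the curl examples earlier, not confirming that the code compiles. Below is a self-contained Python sketch of the email-validation check. It starts a local stand-in server that, like the build in the rejected example, accepts any email. It then POSTs an invalid address and compares status codes. The stub handler, the check_status helper and the random local port are all illustrative. Against a real app you would point the URL at your running server.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_status(url, payload, expected_status):
    """POST a JSON payload and return an APPROVED/REJECTED verdict on the status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            status = response.status
    except urllib.error.HTTPError as err:  # urllib raises on 4xx/5xx responses
        status = err.code
    if status == expected_status:
        return "APPROVED"
    return f"REJECTED: expected {expected_status}, got {status}"


class BuggyRegisterHandler(BaseHTTPRequestHandler):
    """Stand-in register endpoint that, like the rejected build, skips email validation."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        reply = json.dumps({"id": 2, "email": body["email"]}).encode()
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # silence per-request logging
        pass


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), BuggyRegisterHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/auth/register"
    print(check_status(url, {"email": "not-an-email", "password": "secret123"}, 400))
    # prints: REJECTED: expected 400, got 201
    server.shutdown()
    server.server_close()
```

A compile-only check would pass this build. The status-code check rejects it for the same reason the QA agent did.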
4. Giant Tasks
“Build the entire API” is not an atomic task. It’s a project.
Atomic tasks take 5-15 minutes. If a task takes longer, the Planner should break it down further.
5. Monolithic Instructions
I once gave the Builder this: “Implement tasks 5-10.”
The Builder did all six, badly. No QA between them. Problems from task 5 corrupted tasks 6-10.
Fix: One task per Builder invocation. QA after each. No exceptions.
Summary
In this post, I showed how Planner, Builder, and QA subagents transform AI coding workflows. The key point is sequential specialization with quality gates.
The three-phase pattern:
- Planner: Decomposes large goals into atomic tasks with clear acceptance criteria
- Builder: Implements exactly one task with defined scope
- QA: Validates implementation against acceptance criteria before moving forward
Key takeaways:
- Single-agent coding fails at scale due to context overload, quality drift, and task ambiguity
- Specialized agents with clear responsibilities produce better results
- Atomic tasks (5-15 minutes each) keep context clean
- QA after every task catches problems early
- The orchestrator loop handles complexity through repetition
Next steps:
- Create a Planner prompt template for your project type
- Start with a 10-task plan to learn the workflow
- Add QA validation scripts for your tech stack
- Gradually increase to 30-50 task projects
This pattern transforms AI coding from unpredictable sprints into reliable, trackable workflows. You can run it for hours with confidence because every task has a quality gate.
Final Words + More Resources
My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email: Email me
Here are the most important links from this article, along with some further resources that will help you in this area:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!