How Do Planner, Builder, and QA Subagents Transform AI Coding Workflows?

Mar 22, 2026

Single-Agent AI Coding Falls Apart at Scale

I tried building a full application with Codex. I gave it one big prompt: “Build me a task management app with user auth, real-time updates, and a dashboard.”

It started well. The agent wrote code, created files, made progress. But then things went wrong.

The code had bugs. Features were half-implemented. Tests were missing. And worst of all, I couldn’t tell where things broke because everything happened in one long session.

This is the single-agent problem. When you throw a complex task at one agent, three things happen:

Context Overload: The agent tries to hold everything in its context window at once: requirements, code decisions, and error states, all crammed together. Quality drops as the context fills up.

Quality Drift: Without checkpoints, small mistakes compound. A wrong assumption at step 5 has corrupted everything built on top of it by step 15.

Task Ambiguity: Large tasks like “build authentication” are too vague. The agent guesses wrong, builds the wrong thing, and you discover it too late.

The Solution: Three-Agent Orchestration

I learned this pattern from developers who run Codex for hours successfully. They don’t use one agent. They use three specialized agents in sequence:

+----------+     +----------+     +----------+
| Planner  | --> | Builder  | --> |   QA     |
|  Agent   |     |  Agent   |     |  Agent   |
+----------+     +----------+     +----------+
     |                |                |
     v                v                v
  Decompose      Implement One      Validate
  into atomic    task at a time     against
  tasks          with clear scope   criteria

Each agent does one job well. The Planner breaks down the work. The Builder implements. The QA validates. This separation keeps context clean and quality high.

Phase 1: Planner Agent

The Planner’s job is decomposition. It takes a large goal and breaks it into 20-50 atomic tasks with clear acceptance criteria.

Here’s what a good Planner prompt looks like:

You are the Planner Agent. Your job is to decompose a project into atomic tasks.

Project: Build a user authentication system with JWT tokens

Requirements:
- Email/password registration
- Login with JWT access token (15 min expiry)
- Refresh token rotation
- Password reset via email
- Rate limiting (5 attempts per minute per IP)

Create a tasks.csv file with these columns:
- id: unique task identifier
- task: what to do (atomic, one action)
- acceptance_criteria: how to know it's done
- dependencies: which tasks must complete first
- status: planned

Rules:
- Each task should take 5-15 minutes to complete
- Tasks must have measurable acceptance criteria
- Dependencies must be explicit
- No task should require clarification

Output ONLY the tasks.csv file. No explanations.

The Planner produces something like this:

id,task,acceptance_criteria,dependencies,status
1,Create users table migration,Migration file exists with users table schema,none,planned
2,Run users table migration,"Migration applied, schema updated",1,planned
3,Create User model,Model file exists with required fields,2,planned
4,Add password hashing to User model,Passwords are hashed with bcrypt before save,3,planned
5,Create register endpoint,POST /auth/register returns 201 on success,4,planned
6,Add email validation to register,Invalid emails return 400 with error message,5,planned
7,Add duplicate email check,Register returns 409 if email exists,5,planned
8,Create login endpoint,POST /auth/login returns JWT on success,4,planned
9,Add rate limiting to login,More than 5 attempts per minute from one IP returns 429,8,planned
10,Create refresh token endpoint,POST /auth/refresh returns new access token,8,planned
...

Notice each task is atomic. “Create register endpoint” is one task. “Add email validation” is a separate task. This granularity matters.
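Because everything downstream depends on this file, it can help to check the Planner's output before any building starts. The following is a minimal sketch, not part of the original workflow: `load_plan` is a hypothetical helper, and it assumes that a task with several dependencies lists them separated by semicolons (the sample above only shows single dependencies and `none`).

```python
import csv
import io
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

REQUIRED_COLUMNS = ["id", "task", "acceptance_criteria", "dependencies", "status"]


def parse_dependencies(cell):
    """Split a dependencies cell; 'none' or empty means no dependencies."""
    cell = cell.strip()
    # Assumption: multiple dependencies are separated by semicolons.
    return [] if cell in ("", "none") else [d.strip() for d in cell.split(";")]


def load_plan(csv_text):
    """Parse Planner output and return (tasks, problems)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != REQUIRED_COLUMNS:
        return [], [f"unexpected header: {reader.fieldnames}"]

    tasks, problems = [], []
    for row in reader:
        # DictReader puts surplus cells under the None key and fills missing
        # cells with None. Either case usually means an unquoted comma or a
        # truncated row.
        if None in row or None in row.values():
            problems.append(f"line {reader.line_num}: wrong number of columns")
            continue
        tasks.append(row)

    ids = [task["id"] for task in tasks]
    if len(ids) != len(set(ids)):
        problems.append("duplicate task ids")

    # Map each task id to the ids it depends on, reporting unknown references.
    graph = {}
    for task in tasks:
        deps = parse_dependencies(task["dependencies"])
        for dep in deps:
            if dep not in ids:
                problems.append(f"task {task['id']}: unknown dependency {dep}")
        graph[task["id"]] = {dep for dep in deps if dep in ids}

    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError:
        problems.append("dependency cycle detected")

    return tasks, problems
```

If `problems` is non-empty, one option is to send the list back to the Planner and ask for a corrected file rather than patching the CSV by hand.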

Phase 2: Builder Agent

The Builder implements ONE task at a time. Not two, not five—exactly one.

Here’s the Builder prompt:

You are the Builder Agent. Your job is to implement ONE task.

Current task: {task_from_csv}
Acceptance criteria: {criteria_from_csv}

Context:
- Project root: /app
- Tech stack: Node.js, Express, PostgreSQL
- Related files: {list_files_from_dependencies}

Rules:
- Implement ONLY this task
- Do not refactor code outside this task's scope
- Write tests if acceptance criteria require them
- Commit your changes with a message referencing the task ID
- Output "TASK_COMPLETE" when done

If you encounter a blocker, output "BLOCKED: {reason}" and stop.

The Builder receives exactly one task:

// Task ID: 5
// Task: Create register endpoint
// Acceptance criteria: POST /auth/register returns 201 on success

// Builder creates:
// routes/auth.js
const express = require('express');
const User = require('../models/User'); // Assumed path to the model from task 3

const router = express.Router();

router.post('/register', async (req, res) => {
  const { email, password } = req.body;

  if (!email || !password) {
    return res.status(400).json({ error: 'Email and password required' });
  }

  // The User model already hashes the password with bcrypt before save
  // (task 4), so hashing again here would store a hash of a hash.
  const user = await User.create({ email, password });

  res.status(201).json({ id: user.id, email: user.email });
});

module.exports = router;

// Builder output:
// TASK_COMPLETE: Created POST /auth/register endpoint

The Builder does one thing and stops. It doesn’t add email validation (that’s task 6). It doesn’t add duplicate checking (that’s task 7). Just the register endpoint.
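An orchestrator needs to tell the two Builder markers apart, and a plain substring check can misfire if the agent mentions a marker in passing. Here is a small sketch of a stricter parser. `BuilderResult` and `parse_builder_output` are hypothetical names, and the sketch assumes the marker appears at the start of its own line, as in the output above.

```python
from dataclasses import dataclass


@dataclass
class BuilderResult:
    status: str  # "complete", "blocked", or "unknown"
    detail: str  # summary after TASK_COMPLETE, or the reason after BLOCKED


def parse_builder_output(output):
    """Classify Builder output by the last marker line it contains."""
    # Scan from the end so the agent's final word wins over earlier chatter.
    for line in reversed(output.splitlines()):
        text = line.strip()
        if text.startswith("BLOCKED:"):
            return BuilderResult("blocked", text[len("BLOCKED:"):].strip())
        if text.startswith("TASK_COMPLETE"):
            detail = text[len("TASK_COMPLETE"):].lstrip(": ").strip()
            return BuilderResult("complete", detail)
    # No marker at all: treat as unknown rather than assuming success.
    return BuilderResult("unknown", "")
```

Treating a missing marker as "unknown" rather than as success keeps a crashed or truncated session from being passed on to QA as finished work.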

Phase 3: QA Agent

The QA Agent validates the Builder’s work against acceptance criteria.

You are the QA Agent. Your job is to validate completed tasks.

Task completed: {task_from_csv}
Acceptance criteria: {criteria_from_csv}

Rules:
- Test the implementation against each acceptance criterion
- For API endpoints: make actual HTTP requests
- For database changes: verify schema
- For tests: run them
- Output "APPROVED" if all criteria pass
- Output "REJECTED: {reason}" if any criterion fails

Do not fix code. Only validate and report.

The QA runs actual tests:

# Task: Create register endpoint
# Acceptance criteria: POST /auth/register returns 201 on success

# QA runs:
curl -X POST http://localhost:3000/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com", "password": "secret123"}'

# Response: {"id": 1, "email": "user@example.com"}
# Status: 201

# QA output:
# APPROVED: POST /auth/register returns 201 on success

If validation fails:

# Task: Add email validation to register
# Acceptance criteria: Invalid emails return 400 with error message

# QA runs:
curl -X POST http://localhost:3000/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "not-an-email", "password": "secret123"}'

# Response: {"id": 2, "email": "not-an-email"}
# Status: 201

# QA output:
# REJECTED: Invalid email "not-an-email" was accepted (expected 400, got 201)
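The same kind of check can also be scripted, so that the verdict does not depend on the agent reading curl output correctly. The sketch below uses only the Python standard library. `check_status` is a hypothetical helper, and the URL and payload in the usage note are the illustrative values from the examples above.

```python
import json
import urllib.error
import urllib.request


def check_status(url, payload, expected_status):
    """POST payload as JSON and compare the HTTP status with expected_status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            actual = response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is still
        # available on the exception.
        actual = error.code

    if actual == expected_status:
        return f"APPROVED: got {actual}"
    return f"REJECTED: expected {expected_status}, got {actual}"
```

For the email validation task, a call such as `check_status("http://localhost:3000/auth/register", {"email": "not-an-email", "password": "secret123"}, 400)` would return a `REJECTED` string as long as the endpoint still answers 201.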

The Full Orchestration Loop

Here’s how these three agents work together:

┌─────────────────────────────────────────────────────────────┐
│                      ORCHESTRATION LOOP                      │
└─────────────────────────────────────────────────────────────┘
                              │
                              v
                    ┌─────────────────┐
                    │  PLANNER AGENT  │
                    │                 │
                    │ Input: Goal     │
                    │ Output: CSV     │
                    └────────┬────────┘
                              │
                              v
              ┌───────────────────────────────┐
              │        Read next task         │
              │        from tasks.csv         │
              └───────────────┬───────────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
              v                               v
    ┌─────────────────┐             ┌─────────────────┐
    │ More tasks?     │             │ All done        │
    │                 │             │                 │
    │ YES             │             │ STOP            │
    └────────┬────────┘             └─────────────────┘
             │
             v
    ┌─────────────────┐
    │  BUILDER AGENT  │
    │                 │
    │ Input: Task     │
    │ Output: Code    │
    └────────┬────────┘
             │
             v
    ┌─────────────────┐
    │    QA AGENT     │
    │                 │
    │ Input: Code     │
    │ Output: Pass    │
    │         /Fail   │
    └────────┬────────┘
             │
    ┌────────┴────────┐
    │                 │
    v                 v
┌─────────┐      ┌─────────┐
│ APPROVED│      │ REJECTED│
│         │      │         │
│ Mark    │      │ Return  │
│ task    │      │ to      │
│ DONE    │      │ PLANNED │
└────┬────┘      └────┬────┘
     │                │
     │                v
     │         ┌─────────────┐
     │         │ Log issue   │
     │         │ and retry   │
     │         └─────────────┘
     │
     v
┌───────────────────┐
│ Loop to next task │
└───────────────────┘

The orchestrator script looks like this:

import csv
import io
import subprocess

MAX_ATTEMPTS = 3  # Retry budget per task (assumed value; tune as needed)

def run_agent(role, prompt):
    """Run one agent and return its stdout."""
    # The --agent flag is illustrative; adapt the command to however
    # your Codex setup selects a role.
    result = subprocess.run(
        ["codex", "--agent", role, prompt],
        capture_output=True, text=True
    )
    return result.stdout

def format_task(task):
    """Render one CSV row as prompt text for the Builder or QA agent."""
    return (
        f"Task ID: {task['id']}\n"
        f"Task: {task['task']}\n"
        f"Acceptance criteria: {task['acceptance_criteria']}"
    )

def run_planner(goal):
    """Phase 1: Decompose into tasks"""
    # The Planner prints tasks.csv to stdout; parse it into a list of dicts
    return list(csv.DictReader(io.StringIO(run_agent("planner", goal))))

def run_builder(task):
    """Phase 2: Implement one task"""
    return "TASK_COMPLETE" in run_agent("builder", format_task(task))

def run_qa(task):
    """Phase 3: Validate implementation"""
    output = run_agent("qa", format_task(task))
    return "APPROVED" in output and "REJECTED" not in output

def orchestrate(goal):
    # Phase 1: Planning
    tasks = run_planner(goal)

    # Phases 2-3: Build and validate each task
    for task in tasks:
        for _ in range(MAX_ATTEMPTS):
            # Phase 2: Build
            if not run_builder(task):
                task["status"] = "blocked"
                break

            # Phase 3: Validate
            if run_qa(task):
                task["status"] = "done"
                break
            task["status"] = "planned"  # Rejected: retry on the next attempt

    return all(task["status"] == "done" for task in tasks)
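The loop above walks the CSV from top to bottom and keeps task state in memory. Two refinements are worth considering for long runs: skip tasks whose dependencies are not done yet, and write the statuses back to disk so a crashed run can resume. A minimal sketch under stated assumptions follows. `next_ready_task` and `save_tasks` are hypothetical helpers, and multiple dependencies are assumed to be separated by semicolons.

```python
import csv

FIELDS = ["id", "task", "acceptance_criteria", "dependencies", "status"]


def parse_dependencies(cell):
    """Split a dependencies cell; 'none' or empty means no dependencies."""
    cell = cell.strip()
    # Assumption: multiple dependencies are separated by semicolons.
    return [] if cell in ("", "none") else [d.strip() for d in cell.split(";")]


def next_ready_task(tasks):
    """Return the first planned task whose dependencies are all done, else None."""
    done = {task["id"] for task in tasks if task["status"] == "done"}
    for task in tasks:
        if task["status"] != "planned":
            continue
        if all(dep in done for dep in parse_dependencies(task["dependencies"])):
            return task
    return None


def save_tasks(tasks, path):
    """Write the current task list, including statuses, back to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(tasks)
```

With these two helpers, the main loop could become "pick `next_ready_task`, build, validate, `save_tasks`, repeat until it returns `None`". Tasks that depend on a blocked task are then left untouched instead of being built on a broken foundation.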

Real-World Prompt Sequence

Here’s an actual sequence of prompts you’d use with Codex:

Step 1: Invoke Planner

codex "You are the Planner Agent.

Project: Build a REST API for task management with the following features:
- CRUD operations for tasks
- User authentication with JWT
- Task assignment to users
- Filtering and pagination

Create tasks.csv with 30-40 atomic tasks. Each task must have:
- Unique ID
- Clear acceptance criteria
- Explicit dependencies

Output only the CSV. No explanations."

Step 2: Read tasks and invoke Builder for first task

codex "You are the Builder Agent.

Task ID: 1
Task: Create project structure with Express.js
Acceptance criteria:
- package.json exists
- Express installed
- Server starts on port 3000
- GET /health returns 200

Rules:
- Implement only this task
- Output TASK_COMPLETE when done
- Output BLOCKED: {reason} if stuck"

Step 3: Invoke QA

codex "You are the QA Agent.

Task: Create project structure with Express.js
Acceptance criteria:
- package.json exists
- Express installed
- Server starts on port 3000
- GET /health returns 200

Validate each criterion. Run actual tests.
Output APPROVED or REJECTED with reason."

Step 4: Loop to next task

Repeat steps 2-3 for each task in the CSV.

Why This Works Better Than Single-Agent

Quality Assurance Built-In: Every task passes through QA, so broken code is caught before the next task builds on it instead of accumulating silently.

Predictable Progress: You can see exactly where you are. Task 5 of 30 done. Task 6 blocked. Task 7 in progress. This visibility helps you plan your time.

Scalable Complexity: A 50-task project takes longer than a 10-task project, but it isn’t harder to manage. The orchestrator handles each task the same way. Context stays clean because each agent does one job.

Clear Debugging: When something breaks, you know exactly which task failed. The acceptance criteria tell you what should have happened. No more searching through 3 hours of output to find where things went wrong.
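The progress visibility described above falls out of the `status` column almost for free. As a rough sketch, assuming the task list is a list of dicts with the `id` and `status` fields from tasks.csv and the statuses `done`, `blocked`, and `planned` (`progress_report` is a hypothetical helper):

```python
from collections import Counter


def progress_report(tasks):
    """Summarize task statuses as a short, human-readable report."""
    counts = Counter(task["status"] for task in tasks)
    lines = [f"{counts['done']} of {len(tasks)} tasks done"]
    # List the ids that still need attention, grouped by status.
    for status in ("blocked", "planned"):
        ids = [task["id"] for task in tasks if task["status"] == status]
        if ids:
            lines.append(f"{status}: {', '.join(ids)}")
    return "\n".join(lines)
```

Printing this after every QA verdict gives a running log that is much easier to scan than the raw agent output.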

Common Mistakes

I made these mistakes when learning this pattern:

1. Skipping Planning

I tried to start building immediately. “Just add auth while you’re at it,” I said.

The Builder got confused. It started adding features that conflicted with later tasks. The QA couldn’t validate clearly.

Fix: Always run Planner first. Even for “small” projects. A 10-task plan takes 5 minutes to create and saves hours of rework.

2. Vague Acceptance Criteria

# BAD
Task: Add authentication
Acceptance criteria: Works correctly

# GOOD
Task: Add JWT authentication middleware
Acceptance criteria:
- Requests without token return 401
- Requests with expired token return 401
- Requests with valid token pass through
- Token payload available as req.user

Vague criteria make QA impossible. The QA agent can’t verify “works correctly.”
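A crude automated check can catch the worst offenders before the Builder ever runs. The sketch below is only a heuristic and will miss plenty of vague wording. `vague_criteria` is a hypothetical helper, and both the phrase list and the three-word minimum are assumptions to adapt.

```python
# Assumed list of phrases that signal an unverifiable criterion.
VAGUE_PHRASES = (
    "works correctly",
    "works properly",
    "works as expected",
    "works fine",
)


def vague_criteria(tasks):
    """Return the ids of tasks whose acceptance criteria look unverifiable."""
    flagged = []
    for task in tasks:
        text = task["acceptance_criteria"].strip().lower()
        # Very short criteria rarely name an observable result.
        too_short = len(text.split()) < 3
        if too_short or any(phrase in text for phrase in VAGUE_PHRASES):
            flagged.append(task["id"])
    return flagged
```

Any flagged ids can be handed back to the Planner with a request to restate the criteria as observable results, such as a status code, a file that exists, or a test that passes.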

3. Rushing QA

I wanted to move fast, so I told QA to “just check if it compiles.”

Bad idea. Code that compiles can still fail acceptance criteria. QA is where quality lives. Don’t skip it.

4. Giant Tasks

“Build the entire API” is not an atomic task. It’s a project.

Atomic tasks take 5-15 minutes. If a task takes longer, the Planner should break it down further.

5. Monolithic Instructions

I once gave the Builder this: “Implement tasks 5-10.”

The Builder did all six, badly. No QA between them. Problems from task 5 corrupted tasks 6-10.

Fix: One task per Builder invocation. QA after each. No exceptions.

Summary

In this post, I showed how Planner, Builder, and QA subagents transform AI coding workflows. The key point is sequential specialization with quality gates.

The three-phase pattern:

Planner: Decomposes large goals into atomic tasks with clear acceptance criteria
Builder: Implements exactly one task with defined scope
QA: Validates implementation against acceptance criteria before moving forward

Key takeaways:

Single-agent coding fails at scale due to context overload, quality drift, and task ambiguity
Specialized agents with clear responsibilities produce better results
Atomic tasks (5-15 minutes each) keep context clean
QA after every task catches problems early
The orchestrator loop handles complexity through repetition

Next steps:

Create a Planner prompt template for your project type
Start with a 10-task plan to learn the workflow
Add QA validation scripts for your tech stack
Gradually increase to 30-50 task projects

This pattern transforms AI coding from unpredictable sprints into reliable, trackable workflows. You can run it for hours with confidence because every task has a quality gate.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Reddit Discussion: Running Codex for several hours

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!