How Superpowers Enables Multi-Agent AI Development with TDD

Mar 27, 2026

I kept getting mediocre code reviews from AI assistants. The same agent that wrote the code would review it, and somehow everything always looked “good” - even when the bugs were obvious. The problem? “I wrote this” bias - the AI couldn’t objectively review its own work because the reasoning behind everything it had just written was still sitting in its context.

The Problem with Single-Agent AI Coding

When you use a single AI agent to write and review code, several issues emerge:

Context pollution: Errors compound over long conversations
Self-review bias: The agent tends to defend its own implementation instead of questioning it
No separation of concerns: Planning, implementing, and reviewing blur together

I tried having Claude “switch modes” between implementing and reviewing. Didn’t work. The context was still shared - it knew exactly why it made each decision, so it couldn’t objectively evaluate whether those decisions were correct.

Then I discovered Superpowers - a multi-agent AI development framework that solves this elegantly.

Superpowers: Assigning Different Roles to Different Agents

Superpowers, developed by Jesse Vincent, takes a fundamentally different approach. Instead of one agent doing everything, it dispatches specialized subagents for each phase:

┌─────────────────────────────────────────────────────────────┐
│                    Controller (Orchestrator)                 │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │ Implementer  │  │ Spec Reviewer│  │ Code Quality      │  │
│  │              │  │              │  │ Reviewer          │  │
│  │ - Writes code│  │ - Did we     │  │ - Is code clean?  │  │
│  │ - Writes     │  │   build the  │  │ - Security check  │  │
│  │   tests      │  │   right      │  │ - Architecture    │  │
│  │ - Self-check │  │   thing?     │  │   sound?          │  │
│  └──────────────┘  └──────────────┘  └──────────────────┘  │
│         │                  │                   │            │
│    Isolated          Isolated            Isolated           │
│    Context           Context             Context            │
└─────────────────────────────────────────────────────────────┘

The key insight: each subagent has independent context. The Controller precisely curates what information passes to each subagent, avoiding context pollution. This isolation is the main reason a multi-agent setup can produce higher-quality output than a single agent reviewing its own work.

The Workflow: From Brainstorming to PR

Superpowers defines a clear workflow with distinct phases:

Phase 1: brainstorming
    │
    ├─→ Socratic dialogue
    ├─→ Outputs design document
    │
    v
Phase 2: using-git-worktrees
    │
    ├─→ Creates isolated workspace
    │
    v
Phase 3: writing-plans
    │
    ├─→ Breaks into 2-5 minute tasks
    │
    v
Phase 4: subagent-driven-dev
    │
    ├─→ Each task dispatched to subagents
    │
    v
Phase 5: finishing-branch
    │
    └─→ Merge, PR, or discard

Each phase hands off to the next with clean boundaries. No agent carries baggage from previous phases.
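One way to picture the handoffs above is as an ordered pipeline where each phase produces exactly one artifact for the next. This is a minimal sketch, not how Superpowers is implemented: the phase names come from the workflow above, while `Phase`, `HANDOFF`, `next_phase`, and the "reviewed, committed tasks" artifact are my own illustrative assumptions.

```python
from enum import Enum


class Phase(Enum):
    # Names taken from the workflow diagram; the enum itself is hypothetical.
    BRAINSTORMING = "brainstorming"
    USING_GIT_WORKTREES = "using-git-worktrees"
    WRITING_PLANS = "writing-plans"
    SUBAGENT_DRIVEN_DEV = "subagent-driven-dev"
    FINISHING_BRANCH = "finishing-branch"


# What each phase hands to the next one (the only thing that crosses the boundary).
HANDOFF = {
    Phase.BRAINSTORMING: "design document",
    Phase.USING_GIT_WORKTREES: "isolated workspace",
    Phase.WRITING_PLANS: "plan of 2-5 minute tasks",
    Phase.SUBAGENT_DRIVEN_DEV: "reviewed, committed tasks",  # assumption
    Phase.FINISHING_BRANCH: "merge, PR, or discard",
}


def next_phase(current):
    """Return the phase that follows `current`, or None after the last one."""
    order = list(Phase)  # Enum members iterate in definition order
    index = order.index(current)
    return order[index + 1] if index + 1 < len(order) else None


phase = Phase.BRAINSTORMING
while phase is not None:
    print(f"{phase.value}: hands off {HANDOFF[phase]}")
    phase = next_phase(phase)
```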

TDD Enforcement: Tests Before Implementation

The framework enforces test-driven development by baking it into every plan step:

[ ] Step 1: Write failing test
[ ] Step 2: Run test, confirm failure
[ ] Step 3: Write minimal implementation
[ ] Step 4: Run test, confirm pass
[ ] Step 5: Commit

If the Implementer starts coding without tests first, the Spec Reviewer rejects immediately. This isn’t optional - it’s structural.
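To make steps 2 to 4 of the checklist concrete, here is a small sketch of a red-green check. It is not Superpowers code: `red_green`, `check_slugify`, and the `slugify` example are hypothetical names I made up for illustration. The idea is simply that a test which passes before the implementation exists proves nothing, so the check refuses to continue in that case.

```python
def red_green(test, stub, impl):
    """Run `test` against a stub (must fail), then against `impl` (must pass)."""
    log = []

    # Step 2: the test has to fail before the implementation exists.
    try:
        test(stub)
    except AssertionError:
        log.append("red: test fails before implementation")
    else:
        raise RuntimeError("test passed before implementation; it proves nothing")

    # Step 4: the same test has to pass against the real implementation.
    test(impl)  # an AssertionError here means the implementation is wrong
    log.append("green: test passes after implementation")
    return log


# Step 1: write the failing test first (hypothetical example feature).
def check_slugify(slugify_fn):
    assert slugify_fn("Hello World") == "hello-world"


def slugify_stub(title):
    return None  # placeholder so the test can run and fail


# Step 3: minimal implementation that satisfies the test.
def slugify(title):
    return "-".join(title.lower().split())


print(red_green(check_slugify, slugify_stub, slugify))
```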

Two-Phase Review Process

Superpowers separates review into two distinct phases:

┌─────────────────────────────────────────────────────────────┐
│ Phase 1: Spec Reviewer                                       │
│                                                              │
│ Focus: "Did we build the right thing?"                       │
│ Checks:                                                      │
│   - Requirement coverage                                     │
│   - Scenario completeness                                    │
│   - Scope adherence                                          │
│                                                              │
│ Output: PASS / FAIL with specific issues                     │
└─────────────────────────────────────────────────────────────┘
                          │
                          v (only if Phase 1 passes)
┌─────────────────────────────────────────────────────────────┐
│ Phase 2: Code Quality Reviewer                               │
│                                                              │
│ Focus: "Did we build it well?"                               │
│ Checks:                                                      │
│   - Code quality                                             │
│   - Architecture soundness                                   │
│   - Security vulnerabilities                                 │
│   - Test quality                                             │
│                                                              │
│ Output: Critical / Important / Minor severity ratings       │
└─────────────────────────────────────────────────────────────┘

Why two phases? If the code does the wrong thing, reviewing code quality is meaningless. You need to validate correctness before investing time in quality assessment.
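The gating described above can be sketched in a few lines. This is an assumption-laden illustration rather than the framework's actual logic: `two_phase_review` and both stand-in reviewers are hypothetical, and in Superpowers the reviewers are subagents, not plain functions. What the sketch shows is that the quality review never runs while the spec review is failing.

```python
SEVERITY_ORDER = ("Critical", "Important", "Minor")


def two_phase_review(change, spec_review, quality_review):
    """Run the spec review first; only run the quality review if it passes."""
    spec_issues = spec_review(change)
    if spec_issues:
        # Wrong thing built: skip the quality review entirely.
        return {"spec": "FAIL", "issues": spec_issues, "quality": None}

    findings = quality_review(change)  # list of (severity, message) pairs
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER.index(f[0]))
    return {"spec": "PASS", "issues": [], "quality": ordered}


# Hypothetical stand-ins for the two reviewer subagents.
def spec_review(change):
    missing = [r for r in change["required"] if r not in change["implemented"]]
    return [f"missing requirement: {r}" for r in missing]


quality_calls = []


def quality_review(change):
    quality_calls.append(change["name"])
    return [("Minor", "rename tmp variable"), ("Critical", "unsanitized SQL input")]


incomplete = {"name": "login", "required": ["login", "logout"], "implemented": ["login"]}
complete = {"name": "auth", "required": ["login", "logout"], "implemented": ["login", "logout"]}

print(two_phase_review(incomplete, spec_review, quality_review))
print(two_phase_review(complete, spec_review, quality_review))
```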

How a Single Task Flows Through Subagents

Here’s what happens when the Controller dispatches a single task:

Controller (Orchestrator)
     │
     ├─→ Dispatch Implementer Subagent
     │      ├── Ask questions (if unclear)
     │      ├── Write failing tests first (TDD required)
     │      ├── Implement code to make them pass
     │      ├── Self-review
     │      └── Report: DONE / DONE_WITH_CONCERNS / BLOCKED / NEEDS_CONTEXT
     │
     ├─→ Dispatch Spec Reviewer Subagent
     │      └── Check against plan: Did we build the right thing?
     │          ├── PASS → Continue
     │          └── FAIL → Implementer fixes → Re-review
     │
     └─→ Dispatch Code Quality Reviewer Subagent
            └── Review code quality: Did we build it well?
                ├── PASS → Mark complete
                └── ISSUE → Implementer fixes → Re-review

Notice the feedback loops. If the Spec Reviewer finds issues, the task returns to the Implementer. Same for Code Quality. The Controller orchestrates these handoffs automatically.
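The feedback loop in that flow can be sketched as a small controller function. Treat it as a rough model under my own assumptions: the four status values come from the diagram, but `run_task`, the `max_rounds` cap, and the toy implementer and reviewers are hypothetical and not part of Superpowers.

```python
from enum import Enum


class Status(Enum):
    DONE = "DONE"
    DONE_WITH_CONCERNS = "DONE_WITH_CONCERNS"
    BLOCKED = "BLOCKED"
    NEEDS_CONTEXT = "NEEDS_CONTEXT"


def run_task(implement, reviewers, max_rounds=3):
    """Loop implement -> review until every reviewer passes or we give up.

    implement(feedback) returns a Status; each reviewer returns a list of issues.
    max_rounds is an assumed safety cap, not something the framework documents.
    """
    feedback = []
    for round_no in range(1, max_rounds + 1):
        status = implement(feedback)
        if status in (Status.BLOCKED, Status.NEEDS_CONTEXT):
            return f"escalate: {status.value}"

        feedback = []
        for name, review in reviewers:  # spec reviewer first, then quality
            issues = review()
            if issues:
                feedback = [f"{name}: {issue}" for issue in issues]
                break  # later reviewers wait until this one passes
        if not feedback:
            return f"complete after {round_no} round(s)"
    return "escalate: review rounds exhausted"


# Toy stand-ins: the implementer fixes the gap once it receives feedback.
state = {"covers_empty_input": False}


def implement(feedback):
    if feedback:
        state["covers_empty_input"] = True
    return Status.DONE


def spec_review():
    return [] if state["covers_empty_input"] else ["missing empty-input scenario"]


def quality_review():
    return []


result = run_task(implement, [("spec", spec_review), ("quality", quality_review)])
print(result)
```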

Why This Matters for Real Projects

I’ve been testing Superpowers on greenfield projects with strict testing requirements. The difference is noticeable:

Objective reviews: The Spec Reviewer catches issues the Implementer “knew” about but didn’t document
Bugs caught early: TDD enforcement means tests exist before code
Better architecture: Code Quality Reviewer focuses purely on design, not on defending implementation choices
Clear accountability: Each subagent has a specific job, making it easy to identify where process failed

The framework currently supports Claude Code, Cursor, Codex, and Gemini CLI platforms. Skills are composable Markdown instruction files, making it easy to customize behavior.

Technical Implementation

Superpowers uses a Controller-Subagent pattern where:

Controller: Maintains overall state, curates context, dispatches tasks
Subagents: Receive isolated context, perform specific tasks, report back

Each subagent runs with fresh context. The Controller doesn’t pass “by the way, I wrote this code” information to reviewers. It passes only what’s needed: the requirements, the code to review, and the evaluation criteria.

                    Controller Context
                           │
           ┌───────────────┼───────────────┐
           │               │               │
           v               v               v
    Implementer       Spec Reviewer    Quality Reviewer
    Context           Context          Context
    (fresh)           (fresh)          (fresh)
           │               │               │
           │               │               │
           v               v               v
      "Write this     "Check against    "Review code
       feature"        requirements"     quality"
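The context curation shown above boils down to the Controller holding everything and handing each subagent only a narrow slice. Here is a minimal sketch of that idea, assuming a data model I invented for illustration: `ControllerState`, `implementer_context`, and `reviewer_context` are hypothetical names, not part of the Superpowers API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControllerState:
    requirements: str
    plan_task: str
    implementer_notes: str  # why decisions were made; never leaves the Controller
    diff: str


def implementer_context(state):
    """Fresh context for the Implementer: what to build, nothing else."""
    return {"requirements": state.requirements, "task": state.plan_task}


def reviewer_context(state, criteria):
    """Fresh context for a reviewer: requirements, code, and criteria only."""
    return {
        "requirements": state.requirements,
        "code": state.diff,
        "criteria": criteria,
    }


state = ControllerState(
    requirements="Users can reset their password by email",
    plan_task="Add the reset-token endpoint",
    implementer_notes="Skipped rate limiting because it felt out of scope",
    diff="+ def create_reset_token(user): ...",
)

spec_ctx = reviewer_context(state, "Does the code cover every requirement?")
quality_ctx = reviewer_context(state, "Is the code clean, secure, and well tested?")

# The reviewers never see the implementer's justifications.
print(sorted(spec_ctx))
```

Because the implementer's notes are never copied into a reviewer's context, the reviewer has to judge the code against the requirements alone.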

When to Use Superpowers

Superpowers works best for:

Greenfield projects where you want quality from day one
Teams with strict testing requirements who need TDD enforcement
Complex features that benefit from independent review phases
AI-assisted development where you want end-to-end automation with quality gates

It’s probably overkill for quick scripts or one-off experiments. But for production code, the multi-agent approach has given me noticeably better results in my own testing.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can reach me by email.

Oh, and if you found this article useful, don’t forget to support me by starring the repo on GitHub!