How to Write Implementation Plans That AI Agents Can Execute Reliably

Mar 18, 2026

Problem

I handed an AI agent an implementation plan I’d written. The plan said things like:

1. Add input validation to the form
2. Create the API endpoint
3. Write tests

The agent ran with it. Ten minutes later, I had validation on the wrong fields, an API endpoint with incorrect authentication, and tests that passed but tested nothing useful.

The problem wasn’t the agent’s intelligence - it was my plan. I’d written for a human reader who could fill in gaps with context and judgment. An AI agent can’t fill those gaps reliably: it needs every detail spelled out.

The Core Insight

Here’s what I learned from the Superpowers writing-plans skill:

Write comprehensive implementation plans assuming the engineer
has ZERO CONTEXT for your codebase and QUESTIONABLE TASTE.

This changes everything about how you write a plan. Every step must be explicit. Every file path must be exact. Every command must be verifiable.

What Makes a Plan Agent-Executable?

The skill defines a strict structure:

Each step is ONE ACTION (2-5 minutes):
- "Write the failing test"
- "Run it to make sure it fails"
- "Implement minimal code"

NOT: "Add validation to the form" (too vague, multiple actions)
NOT: "Implement the feature" (way too large)

This granularity forces you to think through the actual implementation sequence. You can’t hand-wave.

The Plan Document Structure

Every plan must start with this header:

# [Feature Name] Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development

**Goal:** [One sentence describing what this builds]

**Architecture:** [2-3 sentences about approach]

**Tech Stack:** [Key technologies/libraries]

This header does more than document - it sets context for agents that may know nothing about your project.
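If you generate plans from a script, the header is easy to template. Here is a minimal sketch: `render_plan_header` is a hypothetical helper of my own, not part of Superpowers, and it only fills in the four header fields exactly as laid out in the template above.

```python
def render_plan_header(
    feature: str, goal: str, architecture: str, tech_stack: list[str]
) -> str:
    """Render the plan header (hypothetical helper, not part of Superpowers)."""
    if not (feature and goal and architecture and tech_stack):
        raise ValueError("every header field is required")
    lines = [
        f"# {feature} Implementation Plan",
        "",
        "> **For agentic workers:** REQUIRED SUB-SKILL: "
        "Use superpowers:subagent-driven-development",
        "",
        f"**Goal:** {goal}",
        "",
        f"**Architecture:** {architecture}",
        "",
        f"**Tech Stack:** {', '.join(tech_stack)}",
    ]
    return "\n".join(lines) + "\n"


header = render_plan_header(
    "User Authentication",
    "Let users register and log in with an email and password.",
    "A User model hashes passwords with passlib. Login and logout routes manage sessions.",
    ["Python", "passlib", "pytest"],
)
print(header)
```

Refusing empty fields is a deliberate choice: a header with a blank goal gives an agent with zero context nothing to anchor on.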

Task Structure: The Exact Format

Each task follows this template:

### Task N: [Component Name]

**Files:**
- Create: `exact/path/to/file.py`
- Modify: `exact/path/to/existing.py:123-145`
- Test: `tests/exact/path/to/test.py`

- [ ] **Step 1: Write the failing test**

```python title="Example Test"
def test_specific_behavior():
    result = function(input)   # placeholder: the exact call under test
    assert result == expected  # placeholder: the exact expected value
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/path/test.py::test_name -v`
Expected: FAIL with "function not defined"

Notice what’s included:

- Exact file paths - the full path from the project root, never “in the appropriate file”
- Complete code - not “add validation” but the actual validation code
- Exact commands - the exact shell command, with its expected output
- Line ranges for modifications - exactly where to make the change (e.g. `existing.py:123-145`)

The Architecture Decision: Before Tasks

Before writing any tasks, map out the file structure:

1. Design units with clear boundaries
2. Define interfaces between units
3. Prefer smaller, focused files over large ones
4. Files that change together should live together

This prevents the “where does this go?” confusion that leads agents to make bad architectural decisions.
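To make “define interfaces between units” concrete, here is a hypothetical sketch for the authentication example that follows: the User unit depends only on a `PasswordHasher` protocol, so hashing can live in its own file and be swapped or faked in tests. The names and the standard-library PBKDF2 hasher are my own assumptions for illustration; the plan in the example uses passlib with bcrypt instead.

```python
import hashlib
import hmac
import os
from typing import Protocol


class PasswordHasher(Protocol):
    """Interface between the User unit and the hashing unit."""

    def hash(self, password: str) -> str: ...

    def verify(self, password: str, hashed: str) -> bool: ...


class Pbkdf2Hasher:
    """Standard-library PBKDF2-HMAC-SHA256 implementation of PasswordHasher."""

    def __init__(self, iterations: int = 600_000) -> None:
        self.iterations = iterations

    def _derive(self, password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, self.iterations)

    def hash(self, password: str) -> str:
        salt = os.urandom(16)  # fresh random salt per password
        return f"{salt.hex()}${self._derive(password, salt).hex()}"

    def verify(self, password: str, hashed: str) -> bool:
        salt_hex, digest_hex = hashed.split("$")
        candidate = self._derive(password, bytes.fromhex(salt_hex))
        # constant-time comparison so timing does not leak how many bytes matched
        return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


class User:
    """Depends only on the PasswordHasher interface, not on a concrete library."""

    def __init__(self, email: str, password: str, hasher: PasswordHasher) -> None:
        self.email = email
        self._hasher = hasher
        self.password = hasher.hash(password)  # store the hash, never the plaintext

    def verify_password(self, plain_password: str) -> bool:
        return self._hasher.verify(plain_password, self.password)
```

Because each unit has one job and a typed boundary, a plan can give each its own task and its own test file.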

A Concrete Example

Let me show you the difference. Here’s how I used to write a plan:

## Add User Authentication

1. Create the user model with password hashing
2. Add login/logout routes
3. Create session management
4. Write tests for the auth flow

Here’s how the Superpowers approach would structure it:

### Task 1: User Model

**Files:**
- Create: `src/models/user.py`
- Create: `tests/models/test_user.py`

- [ ] **Step 1: Write the failing test for password hashing**

```python title="tests/models/test_user.py"
# tests/models/test_user.py
from src.models.user import User

def test_password_is_hashed_on_creation():
    user = User(email="user@example.com", password="plaintext")
    assert user.password != "plaintext"
    assert user.verify_password("plaintext") is True
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/models/test_user.py -v`
Expected: FAIL with `ModuleNotFoundError: No module named 'src.models.user'`

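- [ ] **Step 3: Create the User model**

Create `src/models/user.py`:

```python title="src/models/user.py"
# Requires passlib with a bcrypt backend, e.g. pip install "passlib[bcrypt]"
from passlib.context import CryptContext

# bcrypt for new hashes; deprecated="auto" marks any non-default scheme as deprecated
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")


class User:
    def __init__(self, email: str, password: str):
        self.email = email
        # Store only the bcrypt hash, never the plaintext password
        self.password = pwd_context.hash(password)

    def verify_password(self, plain_password: str) -> bool:
        # Check the candidate password against the stored hash
        return pwd_context.verify(plain_password, self.password)
```

- [ ] **Step 4: Run test to verify it passes**

Run: `pytest tests/models/test_user.py -v`
Expected: PASS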

The second version leaves nothing to interpretation. An agent with zero context can execute it.

Key Principles Throughout

The skill requires these principles to appear throughout the plan:

- DRY: Don't Repeat Yourself
- YAGNI: You Ain't Gonna Need It
- TDD: Test-Driven Development
- Frequent commits after each task
- Reference relevant skills with @ syntax

These aren’t afterthoughts - they’re baked into every step.

The Review Loop: Quality Assurance

Plans go through a review loop before execution:

PLAN REVIEW LOOP

Write Plan
    │
    ▼
Dispatch plan-document-reviewer ◄────────────┐
    │                                        │
    ▼                                        │
Review: Issues? ──── Yes ────► Fix & Retry ──┘
    │                          (max 3 iterations, then Surface to Human)
    No
    │
    ▼
Proceed

This catches vague steps, missing file paths, and unclear commands before an agent wastes time.
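The control flow of that loop fits in a few lines. This is a hypothetical model, not Superpowers code: `review` stands in for dispatching the plan-document-reviewer and returns a list of issues, `fix` stands in for revising the plan, and I assume each reviewer dispatch counts as one of the three iterations.

```python
from typing import Callable

Review = Callable[[str], list[str]]    # plan text -> issues (empty list means approved)
Fix = Callable[[str, list[str]], str]  # plan text + issues -> revised plan text


def review_loop(
    plan: str, review: Review, fix: Fix, max_iterations: int = 3
) -> tuple[str, str, list[str]]:
    """Return (status, plan, open_issues); status is "proceed" or "surface_to_human"."""
    issues: list[str] = []
    for iteration in range(1, max_iterations + 1):
        issues = review(plan)
        if not issues:
            return "proceed", plan, []
        if iteration < max_iterations:
            plan = fix(plan, issues)  # fix and retry
    # Still not approved after the last review: hand the open issues to a person
    return "surface_to_human", plan, issues
```

The cap matters: without it, a reviewer and a fixer that disagree could loop forever instead of escalating.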

Scope Check: Breaking Down Large Plans

One critical check happens before writing tasks:

If the spec covers multiple independent subsystems,
it should have been broken into sub-project specs during brainstorming.

If it wasn't:
  Suggest breaking into separate plans
  Each plan produces working, testable software

This prevents the “one giant plan” anti-pattern that fails because it’s too complex to execute reliably.
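When a spec’s components and their dependencies are known, spotting independent subsystems is a connected-components problem. The skill leaves this split to judgment during brainstorming, so treat the following as an illustration only; `independent_subsystems` and the component names are hypothetical.

```python
from collections import defaultdict


def independent_subsystems(
    components: list[str], depends_on: list[tuple[str, str]]
) -> list[set[str]]:
    """Group components linked by dependencies (connected components)."""
    neighbors: dict[str, set[str]] = defaultdict(set)
    for a, b in depends_on:
        # Direction doesn't matter for grouping, so store both directions
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen: set[str] = set()
    groups: list[set[str]] = []
    for start in components:
        if start in seen:
            continue
        group: set[str] = set()
        stack = [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbors[node] - seen)
        groups.append(group)
    return groups


groups = independent_subsystems(
    ["user model", "login routes", "sessions", "billing", "invoices"],
    [("login routes", "user model"), ("sessions", "login routes"), ("invoices", "billing")],
)
print(len(groups))  # 2 groups, so suggest two separate plans
```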

The Execution Handoff

After saving the plan, you offer a choice:

Plan complete. Two execution options:

1. Subagent-Driven (recommended)
   - Fresh subagent per task
   - Review between tasks
   - Better isolation, easier debugging

2. Inline Execution
   - Batch execution with checkpoints
   - Faster for simple plans
   - Less visibility into intermediate states

Which approach?

This acknowledges that different situations call for different execution strategies.
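The difference between the two options is mostly about where context lives. Here is a minimal sketch with hypothetical names throughout (`Agent`, `subagent_driven`, `inline_execution`, `EchoAgent`): subagent-driven execution builds a fresh agent for every task and reviews each result, while inline execution reuses one agent and pauses at a checkpoint after each batch.

```python
from typing import Callable, Protocol


class Agent(Protocol):
    def run(self, task: str) -> bool: ...  # True if all of the task's steps succeeded


def subagent_driven(
    tasks: list[str], make_agent: Callable[[], Agent], review: Callable[[str], bool]
) -> list[str]:
    """Fresh agent per task, review after each task; stop at the first problem."""
    completed: list[str] = []
    for task in tasks:
        agent = make_agent()  # new agent: no context carried over from earlier tasks
        if not agent.run(task) or not review(task):
            break
        completed.append(task)
    return completed


def inline_execution(
    tasks: list[str],
    agent: Agent,
    checkpoint: Callable[[list[str]], bool],
    batch_size: int = 3,
) -> list[str]:
    """One agent runs every task; pause for a checkpoint after each batch."""
    completed: list[str] = []
    for i, task in enumerate(tasks, start=1):
        if not agent.run(task):
            break
        completed.append(task)
        if i % batch_size == 0 and not checkpoint(completed):
            break  # the checkpoint rejected this batch
    return completed


class EchoAgent:
    """Toy agent for illustration: reports success for every task."""

    instances = 0

    def __init__(self) -> None:
        EchoAgent.instances += 1

    def run(self, task: str) -> bool:
        return True


done = subagent_driven(["Task 1", "Task 2", "Task 3"], EchoAgent, review=lambda t: True)
```

Passing a factory (`make_agent`) rather than an agent instance is what enforces isolation in the subagent-driven version: no task can inherit a previous task’s confusion.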

Common Mistakes I Made

Vague steps: “Add error handling” became 15 specific steps when I was forced to be explicit.

Missing file paths: I assumed the agent would know where to put things. It didn’t.

Incomplete code: I wrote “validate the input” instead of writing the validation code. The agent wrote validation for the wrong inputs.

Skipping expected output: I didn’t specify what commands should output. The agent ran commands, saw output, and assumed success even when the output was an error.

No TDD enforcement: My plans said “write tests” after implementation. The Superpowers approach enforces test-first with verification that tests fail.
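The expected-output mistake, and the fail-first check TDD depends on, are both easy to automate. This sketch uses a hypothetical helper, `run_and_check`, that runs a step’s command with `subprocess.run` and requires both the right exit status and the expected text, so a command that merely prints something no longer counts as success.

```python
import subprocess
import sys


def run_and_check(cmd: list[str], expected: str, expect_success: bool = True) -> bool:
    """Run a plan step's command and verify both exit status and expected text."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr
    succeeded = result.returncode == 0
    # A step that should FAIL must exit non-zero; a step that should PASS must exit 0
    return succeeded == expect_success and expected in output


# Output alone is not success: this prints "error" and exits with status 1
ok = run_and_check([sys.executable, "-c", "import sys; print('error'); sys.exit(1)"], "error")
print(ok)  # False
```

For Step 2 of the User model task you would call something like `run_and_check(["pytest", "tests/models/test_user.py", "-v"], "ModuleNotFoundError", expect_success=False)`, and for Step 4 the same command with `"passed"`, since pytest’s summary line reports passed tests.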

Why This Approach Works

1. Eliminates ambiguity - Agents can't read between the lines
2. Enables verification - Every step has expected output
3. Supports recovery - When a step fails, you know exactly where
4. Allows parallelization - Different agents can take independent tasks
5. Creates documentation - The plan IS documentation
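The “supports recovery” point is concrete: if a runner walks the plan in order and stops at the first failed step, the failure report is just a task number and a step number. A hypothetical sketch, where each step is a title paired with a callable that returns whether it succeeded:

```python
from dataclasses import dataclass
from typing import Callable, Optional

StepAction = Callable[[], bool]  # returns True if the step succeeded


@dataclass
class Failure:
    task: int  # 1-based task number
    step: int  # 1-based step number within the task
    title: str


def execute(plan: list[list[tuple[str, StepAction]]]) -> Optional[Failure]:
    """Run tasks and steps in order; report exactly where the first failure happened."""
    for task_no, steps in enumerate(plan, start=1):
        for step_no, (title, action) in enumerate(steps, start=1):
            if not action():
                return Failure(task_no, step_no, title)
    return None


plan = [
    [("Write the failing test", lambda: True), ("Run it to make sure it fails", lambda: True)],
    [("Implement minimal code", lambda: True), ("Run test to verify it passes", lambda: False)],
]
print(execute(plan))
# Failure(task=2, step=2, title='Run test to verify it passes')
```

Because each step is one small action, resuming means rerunning a single step, not rediscovering what a vague “implement the feature” step was supposed to do.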

Summary

In this post, I explained how to write implementation plans that AI agents can execute reliably. The key insight from the Superpowers writing-plans skill is to assume zero context and questionable taste - every step must be one action (2-5 minutes), with exact file paths, complete code, and specific commands with expected output.

The structured approach - header with goal and architecture, tasks with files and steps, embedded TDD principles, and a review loop - transforms vague plans into executable specifications. This dramatically improves agent execution reliability.

For teams working with AI agents on implementation, adopting this format is essential. The upfront investment in detailed planning pays off in reliable execution and reduced debugging time.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Superpowers Skills Repository
👨‍💻 Subagent-Driven Development

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!