
What Is Spec-Driven Development with AI Coding Assistants?

The Problem: AI Coding Outputs Don’t Match What I Expected

When I first started using AI coding assistants, I kept running into the same frustration. I’d type a prompt like “build me a user authentication system,” and the AI would generate code that worked but didn’t match what I had in mind. Different architecture decisions, different patterns, different library choices.

I’d spend more time correcting the output than I would have spent writing the code myself.

The issue wasn’t the AI’s capability. The issue was my input. Vague prompts produce vague results. And during long coding sessions, I noticed something worse: the AI would “drift” away from my original requirements. By the time we reached the tenth iteration, the codebase looked nothing like what I started building.

Then I discovered spec-driven development.

What Is Spec-Driven Development?

Spec-driven development is a methodology where you provide detailed specifications before asking the AI to generate code. Instead of hoping the AI understands your vague request, you create comprehensive documentation first:

  1. User stories
  2. Architecture decisions
  3. Tech plans
  4. Execution roadmaps

Then the AI executes against those specs. The result? Predictable outputs that match your mental model.

I think of it like this: traditional AI prompting is like hiring a contractor and saying “build me a house.” Spec-driven development is like handing them blueprints.

Why Drift Happens and How Specs Prevent It

Drift occurs when AI models lose track of project requirements during extended coding sessions. Without clear boundaries, models like Opus tend to “innovate”: they make creative decisions that deviate from your intent.

A Reddit discussion on r/LocalLLaMA captured this perfectly. One developer noted: “Spec-driven development is pretty much the way to go right now to avoid drifting.”

The same discussion explained why commenters found Codex a better fit than Opus for certain tasks. In their experience, Codex behaves like an executor rather than a creative partner: when you give it a spec, it follows the spec. Opus, by contrast, might decide your spec is “suboptimal” and rewrite your architecture.

Prompt Flow Comparison
Traditional Prompt Flow:
User: "Build auth system"
AI: Generates code based on assumptions
User: "That's not what I wanted"
AI: Iterates, makes new assumptions
User: Corrects again
Result: Drift, frustration, wasted time

Spec-Driven Flow:
User: Provides user story, architecture, tech plan
AI: Generates execution plan
User: Reviews and approves plan
AI: Executes against spec
Result: Predictable, matches intent

The Two-Phase Approach That Works

After experimenting with different workflows, I settled on a two-phase approach that consistently produces good results.

Phase 1: Specification and Roadmap

First, I describe what I want at a high level, then ask the AI to create a detailed roadmap broken into phases.

For example, when building a feature, my initial prompt might be:

initial-prompt.txt
I need to add multi-factor authentication to an existing login system.
Context:
- Existing system uses email/password with JWT tokens
- Backend is Node.js with Express
- Frontend is React
- Redis is available for caching
Please create a detailed roadmap broken into phases.
Include dependencies between phases and acceptance criteria for each.

The AI returns a roadmap like:

roadmap.txt
Phase 1: Core MFA Service
- Create MFAService module
- Implement code generation (6-digit, 5-min expiry)
- Add Redis integration for code storage
Acceptance: Unit tests pass, codes expire correctly
Phase 2: API Integration
- Add MFA endpoints to existing auth flow
- Integrate with AuthService
- Add rate limiting
Acceptance: Integration tests pass, rate limiting works
Phase 3: Frontend Updates
- Add MFA input screen
- Update login flow
- Handle error states
Acceptance: E2E tests pass, UX matches design
Phase 4: Monitoring & Rollout
- Add metrics for MFA success/failure rates
- Feature flag for gradual rollout
- Documentation
Acceptance: Dashboards work, team trained

I review this roadmap, adjust phase boundaries if needed, and only then move to Phase 2.

Phase 2: Exhaustive Execution Planning

For each phase, I ask the AI to create an exhaustive execution plan. This is where the detail matters.

execution-plan-phase1.txt
Phase 1: Core MFA Service - Detailed Execution Plan
Task 1.1: Create MFAService module structure
- Create /src/services/MFAService.ts
- Define interface: generateCode(userId), validateCode(userId, code)
- Add dependency injection setup
Task 1.2: Implement code generation
- Use crypto.randomInt() for secure random
- Format: 6-digit numeric string
- Max attempts: 10 per 5-minute window
Task 1.3: Add Redis integration
- Key format: mfa:{userId} (the code itself never appears in the key)
- TTL: 300 seconds (5 minutes)
- Value: hash of the code, never the plaintext code
Task 1.4: Write unit tests
- Test: code is 6 digits
- Test: code expires after 5 minutes
- Test: max attempts enforcement
- Test: concurrent code generation

Now when Codex generates code, it has explicit instructions. No guesswork, no architectural improvisation.
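
To make this concrete, here is a minimal sketch of what executing against Tasks 1.1 to 1.3 could produce. It is an illustration under stated assumptions, not the output of any particular model: the `CodeStore` interface and `InMemoryStore` class are hypothetical stand-ins for Redis so the example runs without a server, the key is `mfa:{userId}` so that only a hash of the code is stored, and the attempt limit from Task 1.2 is left out for brevity.

```typescript
import { createHash, randomInt, timingSafeEqual } from "node:crypto";

// Hypothetical key-value interface with TTL, standing in for a Redis client.
interface CodeStore {
  set(key: string, value: string, ttlSeconds: number): void;
  get(key: string): string | undefined;
  delete(key: string): void;
}

// In-memory implementation so the sketch runs without Redis.
class InMemoryStore implements CodeStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  set(key: string, value: string, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired entries behave as if they were never set
      return undefined;
    }
    return entry.value;
  }

  delete(key: string): void {
    this.entries.delete(key);
  }
}

const CODE_TTL_SECONDS = 300; // 5 minutes, per Task 1.3

function hashCode(code: string): string {
  return createHash("sha256").update(code).digest("hex");
}

class MFAService {
  private store: CodeStore;

  constructor(store: CodeStore) {
    this.store = store;
  }

  // Task 1.2: 6-digit numeric string from a cryptographically secure source.
  generateCode(userId: string): string {
    const code = randomInt(0, 1_000_000).toString().padStart(6, "0");
    this.store.set(`mfa:${userId}`, hashCode(code), CODE_TTL_SECONDS);
    return code;
  }

  // Compares hashes in constant time; a code can be used only once.
  validateCode(userId: string, code: string): boolean {
    const key = `mfa:${userId}`;
    const expected = this.store.get(key);
    if (expected === undefined) return false;
    const matches = timingSafeEqual(
      Buffer.from(expected, "hex"),
      Buffer.from(hashCode(code), "hex"),
    );
    if (matches) this.store.delete(key);
    return matches;
  }
}
```

Passing the store into the constructor mirrors the dependency injection item in Task 1.1: a unit test can hand in a store with a fake clock to check expiry without waiting five minutes.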

Codex vs Opus: Choosing the Right Model

The Reddit discussion made a key point about model selection: “I want control over creative. Codex gives me that.”

Here’s how I think about it:

Model Selection Guide
Codex (Execution-focused):
- Spec-driven workflows
- Following detailed plans
- Consistent outputs across sessions
- When you want control over decisions
Opus (Creativity-focused):
- Exploring options
- Generating ideas
- Complex reasoning tasks
- When you need architectural input
Rule of thumb:
- Have a spec? Use Codex
- Need a spec? Use Opus to help create one

I use Opus during the specification phase when I need help thinking through architecture. Then I switch to Codex for execution. This combination works better than using either model alone.

Common Mistakes I Made (So You Don’t Have To)

Mistake 1: Skipping the Spec

Early on, I’d think “this feature is simple enough, I’ll just describe it as I go.” That approach led right back to drift and rework. Even simple features benefit from a brief spec.

Mistake 2: Over-Specifying

The opposite mistake: writing specs that are too detailed at the line level. Specs should describe what needs to happen, not how to write every line. Leave implementation details to the AI.

Spec Detail Level
WRONG (over-specified):
"Create a function called validateEmail that takes email as string param,
uses regex pattern /^[^\s@]+@[^\s@]+\.[^\s@]+$/ to validate,
returns boolean, and logs validation attempts to console."
RIGHT (appropriately specified):
"Create email validation with regex pattern, return boolean,
log validation attempts for debugging."
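
For illustration, here is one implementation the appropriately specified version could lead to. The function name, the injectable `log` parameter, and the choice to log only the input length are assumptions of this sketch; the spec deliberately leaves those decisions open, so another implementation could differ and still satisfy it. The regex is the one quoted in the over-specified example.

```typescript
// Pattern borrowed from the over-specified example; the looser spec would
// accept any reasonable email pattern here.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

type Logger = (message: string) => void;

// Returns true when the input looks like an email address.
// The logger is injectable so tests can capture or silence the debug output.
function validateEmail(email: string, log: Logger = console.debug): boolean {
  const isValid = EMAIL_PATTERN.test(email);
  // Log the outcome and input length rather than the address itself.
  log(`validateEmail: ${isValid ? "accepted" : "rejected"} input of length ${email.length}`);
  return isValid;
}
```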

Mistake 3: Mixing Models at Wrong Phases

In my experience, using Opus for execution when you already have a detailed spec wastes its reasoning capability and invites it to second-guess the plan. Using Codex for architecture decisions tends to produce weaker designs. Match model strengths to task types.

Mistake 4: Not Reviewing AI Plans

I used to accept AI-generated roadmaps without validation. Then I’d realize halfway through that Phase 2 depends on something missing from Phase 1. Now I review every plan before execution.
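
Part of that review can be mechanical. The sketch below assumes the roadmap has been transcribed into a small data structure (the `Phase` shape and the `findRoadmapProblems` name are hypothetical) and flags phases that depend on something not defined earlier, or that have no acceptance criteria. It only checks structure, so it complements reading the plan rather than replacing it.

```typescript
// Hypothetical shape for one roadmap phase.
interface Phase {
  id: string;
  title: string;
  dependsOn: string[]; // ids of phases that must be finished first
  acceptance: string[]; // acceptance criteria for this phase
}

// Walks the phases in order and reports structural problems as messages.
function findRoadmapProblems(phases: Phase[]): string[] {
  const problems: string[] = [];
  const defined = new Set<string>();

  for (const phase of phases) {
    for (const dep of phase.dependsOn) {
      if (!defined.has(dep)) {
        problems.push(`${phase.id} depends on "${dep}", which no earlier phase defines`);
      }
    }
    if (phase.acceptance.length === 0) {
      problems.push(`${phase.id} has no acceptance criteria`);
    }
    defined.add(phase.id);
  }
  return problems;
}

// The first two phases of the MFA roadmap, transcribed by hand.
const mfaRoadmap: Phase[] = [
  {
    id: "phase-1",
    title: "Core MFA Service",
    dependsOn: [],
    acceptance: ["Unit tests pass", "Codes expire correctly"],
  },
  {
    id: "phase-2",
    title: "API Integration",
    dependsOn: ["phase-1"],
    acceptance: ["Integration tests pass", "Rate limiting works"],
  },
];

console.log(findRoadmapProblems(mfaRoadmap)); // prints an empty array for this roadmap
```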

Mistake 5: Stale Specs

Requirements change. When they do, update your specs. I’ve seen specs that contradicted the actual codebase because someone updated code but not documentation. The AI then generates against outdated specs, creating technical debt.
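
One way to catch this early is a small check on the list of files changed in a commit or pull request. The sketch below is an assumption-heavy illustration: it supposes source code lives under `src/` and specs are markdown files under `specs/` (both hypothetical conventions), and it takes the changed paths as plain strings so it does not depend on any particular Git or CI tooling.

```typescript
// Hypothetical repository layout: adjust the prefixes to your own conventions.
function isSourceFile(path: string): boolean {
  return path.startsWith("src/");
}

function isSpecFile(path: string): boolean {
  return path.startsWith("specs/") && path.endsWith(".md");
}

interface SpecCheckResult {
  ok: boolean;
  reason: string;
}

// Flags change sets that touch source code without touching any spec.
function checkSpecsTouched(changedPaths: string[]): SpecCheckResult {
  const sourceChanged = changedPaths.some((p) => isSourceFile(p));
  const specChanged = changedPaths.some((p) => isSpecFile(p));

  if (sourceChanged && !specChanged) {
    return { ok: false, reason: "Source files changed but no spec was updated" };
  }
  return { ok: true, reason: "No source changes without a matching spec change" };
}
```

Not every code change needs a spec change (a pure refactor, for example), so a check like this probably works better as a reminder than as a hard gate.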

When to Use Spec-Driven vs Traditional Prompts

Spec-driven development isn’t always the right choice. Here’s my decision matrix:

When to Use Each Approach
Use Spec-Driven When:
- Building features with multiple components
- Working on code that will be maintained over time
- Collaborating with team members (specs become documentation)
- You need consistent results across sessions
- The cost of rework is high
Use Traditional Prompts When:
- Quick scripts and one-off utilities
- Exploring and prototyping ideas
- Simple, single-file solutions
- Speed matters more than precision
- You're learning something new and don't know enough to spec

Tool Integration: Making Specs Less Tedious

Managing specs manually becomes tedious. Tools like traycer (mentioned in the Reddit discussion) provide spec-driven workspaces out of the box.

I haven’t used traycer specifically, but the principle holds: orchestration tools help you maintain and version your specs alongside your code.

The workflow I’m moving toward:

  1. Specs live in the repo as markdown files
  2. AI tools read specs before generating code
  3. PRs include both spec changes and code changes
  4. Specs serve as documentation for new team members
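
Step 2 of this workflow can be sketched as a function that places the spec text ahead of the task in the prompt. Everything here is hypothetical scaffolding rather than the API of any tool: the `SpecDocument` shape, the `buildExecutionPrompt` name, and the wording of the instruction line are assumptions. Reading the markdown files from disk is left out so the example stays self-contained.

```typescript
// Hypothetical representation of a spec file already loaded from the repo.
interface SpecDocument {
  path: string; // e.g. "specs/mfa.md"
  content: string; // raw markdown
}

// Builds a prompt in which every spec appears before the task itself.
function buildExecutionPrompt(specs: SpecDocument[], task: string): string {
  const specSections = specs.map(
    (spec) => `## Spec: ${spec.path}\n${spec.content.trim()}`,
  );

  return [
    "Follow the specifications below. If the task conflicts with a spec, stop and ask instead of improvising.",
    ...specSections,
    `## Task\n${task.trim()}`,
  ].join("\n\n");
}

const prompt = buildExecutionPrompt(
  [{ path: "specs/mfa.md", content: "Codes are 6 digits and expire after 5 minutes.\n" }],
  "Implement Task 1.2 from the Phase 1 execution plan.",
);
console.log(prompt);
```

Keeping the task last means the spec text is always present in the context the model sees, whichever session or tool sends the prompt.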

The Upfront Investment Pays Off

Spec-driven development requires more time upfront. I spend 20-30% of my time on specification before writing any code. But my overall development time has decreased because:

  1. Less rework from mismatched expectations
  2. Faster debugging (specs tell me where to look)
  3. Better collaboration (specs are shared understanding)
  4. Reproducible results across sessions

The Reddit poster who sparked this thinking said: “What really made Codex click for me was combining it with spec-driven development.” I’ve found the same. The model matters less than the methodology.

Summary

Spec-driven development with AI coding assistants works because it front-loads the thinking. Instead of hoping the AI understands your intent, you make that intent explicit through specifications. Codex excels at executing against specs. Opus excels at helping create them. Used together, they form a powerful workflow that reduces drift and increases control.

Start with a user story. Define your architecture. Create a roadmap. Generate execution plans. Then let the AI execute. The upfront investment pays off in predictable, maintainable code.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can contact me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
