
Spec-Driven Development: How to Make AI Coding Reliable and Production-Ready

I asked an AI agent to implement social login for my app. It created three different auth flows in three sessions—none of them matched. Each time I said “fix it,” it made different assumptions. After hours of back-and-forth, I had a mess.

The problem wasn’t the AI. It was my instructions.

Vibe Coding Fails Because Vague Begets Vague

When you tell an AI coding agent “add login” or “fix the bug,” you’re gambling. The agent fills gaps with guesses. Different sessions produce different results. Debugging becomes archaeology.

I tried being more specific. “Add Google OAuth using NextAuth.” Better, but still—I’d get different implementations across sessions. One used JWT, another used database sessions. One handled error states, another didn’t.

By 2025, much of the industry had reached the same conclusion: AI coding agents fail less because the models are weak and more because the instructions are vague.

The Shift: Say “What” Before “How”

Spec-Driven Development (SDD) forces you to define what success looks like before any code is written. It’s a four-phase workflow:

SDD Workflow
┌───────────┐     ┌───────────┐     ┌───────────┐     ┌───────────┐
│  SPECIFY  │ ──▶ │   PLAN    │ ──▶ │   TASKS   │ ──▶ │ IMPLEMENT │
│  "What"   │     │  "How"    │     │  "Order"  │     │  "Do It"  │
│  (func)   │     │  (tech)   │     │ (chunks)  │     │  (agent)  │
└───────────┘     └───────────┘     └───────────┘     └───────────┘

Let me walk through each phase with my social login example.

Phase 1: Specify — The Functional Contract

I started by writing a spec that describes user-facing behavior, not implementation:

SPEC.md
## Feature: Social Login
Users can sign in using Google and GitHub accounts.
### Acceptance Criteria
**Scenario: New user signs in with Google**
- Given: A new user on the login page
- When: They click "Sign in with Google" and authorize
- Then: Redirect to Dashboard with a valid session
**Scenario: Same email, different provider**
- Given: A user already linked to Google
- When: They sign in with GitHub using the same email
- Then: Accounts are merged into one profile
**Scenario: User cancels authorization**
- Given: User clicks "Sign in with Google"
- When: They cancel in the Google consent screen
- Then: Show friendly error message (not blank screen)

Key insight: This spec is technology-agnostic. It doesn’t mention NextAuth, JWT, or PostgreSQL. It describes what the feature does from the user’s perspective.

Also notice: the acceptance criteria are testable. I can verify each scenario.
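To show what "testable" means in practice, here is a minimal sketch that turns Scenarios 1 and 3 into executable checks. The `handleOAuthCallback` function, the `CallbackQuery`/`CallbackResult` shapes, and the `/dashboard` path are hypothetical stand-ins, not NextAuth APIs. The one external fact it relies on is that OAuth 2.0 providers report a denied consent screen by redirecting back with `error=access_denied` (RFC 6749, section 4.1.2.1).

```typescript
// Hypothetical callback query and result shapes; these are not NextAuth types.
type CallbackQuery = { code?: string; error?: string };
type CallbackResult =
  | { kind: "redirect"; to: string; sessionUserId: string }
  | { kind: "error"; message: string };

const GENERIC_ERROR = "Something went wrong while signing in. Please try again.";

// Hypothetical handler for the provider's redirect back to the app.
function handleOAuthCallback(query: CallbackQuery): CallbackResult {
  if (query.error === "access_denied") {
    // Scenario 3: the user cancelled on the consent screen.
    return { kind: "error", message: "Sign-in was cancelled. You can try again whenever you like." };
  }
  if (query.error !== undefined || !query.code) {
    return { kind: "error", message: GENERIC_ERROR };
  }
  // A real handler would exchange the code for tokens and create a session;
  // this sketch derives a placeholder user id so Scenario 1 can be checked.
  return { kind: "redirect", to: "/dashboard", sessionUserId: `user-${query.code}` };
}

// One entry per Given/When/Then scenario: "when" is the callback input,
// "then" is the observable outcome the spec promises.
const scenarios: { name: string; when: CallbackQuery; then: (r: CallbackResult) => boolean }[] = [
  {
    name: "New user signs in with Google",
    when: { code: "abc123" },
    then: (r) => r.kind === "redirect" && r.to === "/dashboard" && r.sessionUserId !== "",
  },
  {
    name: "User cancels authorization",
    when: { error: "access_denied" },
    then: (r) => r.kind === "error" && r.message.length > 0,
  },
];

for (const s of scenarios) {
  const passed = s.then(handleOAuthCallback(s.when));
  console.log(`${passed ? "PASS" : "FAIL"}: ${s.name}`);
}
```

The point is the mapping, not the handler: each scenario title becomes a test name, and each "Then" becomes an assertion you can rerun after every agent session.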

Phase 2: Plan — Inject Developer Expertise

Now I add the technical decisions. This is where my expertise matters:

PLAN.md
## Technical Approach
**Stack Decisions**
- Auth library: NextAuth.js v5 with App Router
- Session strategy: JWT stored in HttpOnly cookie
- Database: Add a `UserAccount` entity with `provider` and `providerAccountId` fields, linked to the existing User entity (one user, many provider accounts)
**Constraints**
- Login endpoint response time: < 100ms (P99)
- No external dependencies beyond existing SendGrid integration
**Patterns to Follow**
- Match existing auth flow structure in `/app/auth/`
- Use existing error boundary component for OAuth failures
- Follow existing rate limiting pattern (Redis, sliding window)
**Test Strategy**
- Unit test coverage: ≥ 90% for auth flow
- Integration tests: Full OAuth flow with mocked providers
- E2E tests: Login → Dashboard redirect path

This tells the agent exactly how I want it implemented. No guessing needed.
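"Follow existing rate limiting pattern (Redis, sliding window)" is a good example of a line that removes guessing. To illustrate what a sliding window means, here is an in-memory sliding-window-log limiter. The class name, the limits, and the key format are hypothetical; a production version would typically keep the timestamps in a shared store such as a Redis sorted set so that every app instance sees the same counts.

```typescript
// In-memory sliding-window-log rate limiter. A hypothetical stand-in for the
// project's Redis-backed pattern: state lives in this process only.
class SlidingWindowLimiter {
  private readonly limit: number; // max requests per key inside one window
  private readonly windowMs: number; // window length in milliseconds
  private readonly hits = new Map<string, number[]>(); // key -> accepted request timestamps

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if it fits in the window;
  // returns false (without recording it) otherwise.
  allow(key: string, nowMs: number): boolean {
    const windowStart = nowMs - this.windowMs;
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(key, recent);
    return allowed;
  }
}

// Hypothetical limits: 3 sign-in attempts per IP per 60 seconds.
const limiter = new SlidingWindowLimiter(3, 60_000);
const key = "auth:203.0.113.7";
console.log([0, 1_000, 2_000, 3_000].map((t) => limiter.allow(key, t)).join(", ")); // true, true, true, false
console.log(limiter.allow(key, 61_500)); // true: the hits at t=0 and t=1000 have expired
```

Passing `nowMs` explicitly instead of calling `Date.now()` inside the class keeps the limiter deterministic and easy to test. Rejected requests are not recorded, so a client that keeps hammering the endpoint is unblocked as soon as its earlier accepted requests age out.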

Phase 3: Tasks — Break It Down

I split the work into small, ordered tasks:

TASKS.md
## Implementation Tasks
1. [ ] Add Google provider to NextAuth config
2. [ ] Add GitHub provider to NextAuth config
3. [ ] Create UserAccount entity with provider fields
4. [ ] Implement account linking logic in callbacks
5. [ ] Create OAuth error handling component
6. [ ] Add rate limiting to auth endpoints
7. [ ] Write unit tests for account linking
8. [ ] Write integration tests for full OAuth flow

Each task is scoped to a single agent session. Each produces verifiable changes.
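"Ordered" only helps if the order respects dependencies: account linking (task 4) can't start before the providers and the entity exist. If you record dependencies explicitly, a short check can confirm that the numbered list is a valid topological order before you hand tasks to the agent one session at a time. The `dependsOn` edges below are my illustrative guess for this feature, not part of the original task list.

```typescript
interface Task {
  id: number;
  title: string;
  dependsOn: number[]; // ids of tasks that must be finished first (assumed edges)
}

// Hypothetical dependency edges for the social-login task list.
const tasks: Task[] = [
  { id: 1, title: "Add Google provider to NextAuth config", dependsOn: [] },
  { id: 2, title: "Add GitHub provider to NextAuth config", dependsOn: [] },
  { id: 3, title: "Create UserAccount entity with provider fields", dependsOn: [] },
  { id: 4, title: "Implement account linking logic in callbacks", dependsOn: [1, 2, 3] },
  { id: 5, title: "Create OAuth error handling component", dependsOn: [1] },
  { id: 6, title: "Add rate limiting to auth endpoints", dependsOn: [1, 2] },
  { id: 7, title: "Write unit tests for account linking", dependsOn: [4] },
  { id: 8, title: "Write integration tests for full OAuth flow", dependsOn: [4, 5, 6] },
];

// Returns one message per dependency that is missing or scheduled at or after
// the task that needs it. An empty result means the order is a valid topological order.
function orderingProblems(list: Task[]): string[] {
  const position = new Map<number, number>();
  list.forEach((t, i) => position.set(t.id, i));
  const problems: string[] = [];
  list.forEach((task, i) => {
    for (const dep of task.dependsOn) {
      const depPos = position.get(dep);
      if (depPos === undefined) {
        problems.push(`Task ${task.id} depends on unknown task ${dep}`);
      } else if (depPos >= i) {
        problems.push(`Task ${task.id} runs before its dependency ${dep}`);
      }
    }
  });
  return problems;
}

console.log(orderingProblems(tasks)); // []
```

Moving task 4 to the top of the list would produce three problems, one for each of its dependencies.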

Phase 4: Implement — Let the Agent Work

Now the agent executes. Here’s what changed compared to my “vibe coding” attempts:

Before vs After Comparison
┌────────────────────────────┬────────────────────────────┐
│ VIBE CODING                │ SPEC-DRIVEN                │
├────────────────────────────┼────────────────────────────┤
│ "Add Google login"         │ Spec: "New user signs in,  │
│                            │       gets redirected to   │
│                            │       Dashboard with valid │
│                            │       session"             │
├────────────────────────────┼────────────────────────────┤
│ Agent guesses:             │ Agent knows:               │
│ - JWT or database session? │ - JWT (from plan)          │
│ - How to handle errors?    │ - Use error boundary       │
│ - Rate limiting?           │ - Follow existing pattern  │
├────────────────────────────┼────────────────────────────┤
│ Result:                    │ Result:                    │
│ - Different each time      │ - Reproducible             │
│ - Missing edge cases       │ - All criteria met         │
│ - Tech debt accumulates    │ - Clean, reviewable code   │
└────────────────────────────┴────────────────────────────┘
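The difference is easiest to see in task 4, "Implement account linking logic in callbacks," which has to encode Scenario 2. Here is a framework-free sketch of that decision; the `User`/`UserAccount`/`OAuthProfile` shapes and the `linkOrCreate` function are hypothetical and deliberately independent of NextAuth's callback API. The rule: reuse the user if this provider identity is already linked, otherwise attach it to an existing user with the same email, otherwise create a new user.

```typescript
interface User { id: number; email: string }
interface UserAccount { userId: number; provider: "google" | "github"; providerAccountId: string }
interface OAuthProfile { provider: "google" | "github"; providerAccountId: string; email: string }

// In-memory stand-ins for the database tables (hypothetical).
const users: User[] = [];
const accounts: UserAccount[] = [];

// Returns the id of the user the new session should belong to.
function linkOrCreate(profile: OAuthProfile): number {
  // 1. Known provider identity: sign in to the already-linked user.
  const existing = accounts.find(
    (a) => a.provider === profile.provider && a.providerAccountId === profile.providerAccountId,
  );
  if (existing) return existing.userId;

  // 2. Same email, different provider: attach this account to that user (Scenario 2).
  //    Assumes the provider has verified the email; linking on an unverified
  //    email would let someone take over another person's profile.
  const email = profile.email.toLowerCase(); // case-insensitive match (assumption)
  let user = users.find((u) => u.email === email);

  // 3. Otherwise this is a brand-new user.
  if (!user) {
    user = { id: users.length + 1, email };
    users.push(user);
  }
  accounts.push({ userId: user.id, provider: profile.provider, providerAccountId: profile.providerAccountId });
  return user.id;
}

const viaGoogle = linkOrCreate({ provider: "google", providerAccountId: "g-1", email: "Ana@example.com" });
const viaGitHub = linkOrCreate({ provider: "github", providerAccountId: "gh-9", email: "ana@example.com" });
console.log(viaGoogle === viaGitHub, users.length, accounts.length); // true 1 2
```

With the spec and plan in hand, these three branches are no longer something the agent has to invent; they fall directly out of the scenarios.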

Maturity Levels: Where to Start

Not every team needs full SDD adoption. Here’s the progression:

SDD Maturity Levels
L1: Spec-First
 ├── Write spec before coding
 ├── Archive after completion
 └── Benefit: 50% fewer revision cycles
L2: Spec-Anchored
 ├── Spec lives in repo
 ├── Evolves with code
 └── Benefit: New team members onboard faster
L3: Spec-as-Source
 ├── Spec is the master artifact
 ├── Changing spec changes system
 └── Benefit: Automated verification, living documentation

Start at L1. Write a spec, then code. You’ll see immediate benefits.
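Further along, at L3, "automated verification" can start as small as a check that fails the build when a spec scenario has no matching test. This sketch assumes two conventions: scenarios are written as `**Scenario: ...**` lines (as in SPEC.md above), and test names repeat the scenario title verbatim. Both are assumptions; in a real repo you would read SPEC.md from disk and collect test names from your test runner's report.

```typescript
// Pulls scenario titles out of spec text that uses "**Scenario: <title>**" lines.
function scenarioTitles(spec: string): string[] {
  const prefix = "**Scenario: ";
  const suffix = "**";
  return spec
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith(prefix) && line.endsWith(suffix) && line.length > prefix.length + suffix.length)
    .map((line) => line.slice(prefix.length, line.length - suffix.length).trim());
}

// Returns every scenario title that no test name mentions.
function uncoveredScenarios(spec: string, testNames: string[]): string[] {
  return scenarioTitles(spec).filter((title) => !testNames.some((name) => name.includes(title)));
}

const spec = [
  "## Feature: Social Login",
  "**Scenario: New user signs in with Google**",
  "- Given: A new user on the login page",
  "**Scenario: Same email, different provider**",
  "**Scenario: User cancels authorization**",
].join("\n");

// Hypothetical test names, e.g. as reported by a test runner.
const testNames = [
  "Scenario: New user signs in with Google > redirects to Dashboard",
  "Scenario: User cancels authorization > shows a friendly error",
];

const missing = uncoveredScenarios(spec, testNames);
console.log(`Uncovered scenarios: ${missing.join("; ")}`); // Uncovered scenarios: Same email, different provider
// In CI you would fail the build whenever `missing` is non-empty.
```

Plain substring matching is crude, but it is enough to keep the spec and the test suite from silently drifting apart.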

The Mistake I Made Early On

I wrote specs that were too technical:

Bad Spec (Too Technical)
## OAuth Implementation
Use NextAuth.js with Google provider. Store JWT in HttpOnly
cookie with 7-day expiry. Use Prisma adapter for user storage.

This belongs in the plan, not the spec. The spec should be readable by non-technical stakeholders:

Good Spec (Functional)
## Feature: Social Login
Users can sign in using their existing Google or GitHub
accounts instead of creating new passwords.
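One cheap guard against this mistake is a lint step that flags implementation vocabulary in SPEC.md. The term list below is hypothetical and project-specific (it simply reuses the words from the bad example); you would tune it to your own stack.

```typescript
// Terms that belong in PLAN.md, not SPEC.md (hypothetical, project-specific list).
const implementationTerms = ["NextAuth", "JWT", "HttpOnly", "cookie", "Prisma", "PostgreSQL", "Redis"];

// Returns "line N: term" for every implementation term found in the spec text.
function lintSpec(spec: string, terms: string[] = implementationTerms): string[] {
  const findings: string[] = [];
  spec.split("\n").forEach((line, i) => {
    for (const term of terms) {
      // Case-insensitive substring match; crude, but fine for a warning.
      if (line.toLowerCase().includes(term.toLowerCase())) {
        findings.push(`line ${i + 1}: ${term}`);
      }
    }
  });
  return findings;
}

const badSpec = [
  "## OAuth Implementation",
  "Use NextAuth.js with Google provider. Store JWT in HttpOnly",
  "cookie with 7-day expiry. Use Prisma adapter for user storage.",
].join("\n");

const goodSpec = [
  "## Feature: Social Login",
  "Users can sign in using their existing Google or GitHub",
  "accounts instead of creating new passwords.",
].join("\n");

console.log(lintSpec(badSpec).join(", ")); // line 2: NextAuth, line 2: JWT, line 2: HttpOnly, line 3: cookie, line 3: Prisma
console.log(lintSpec(goodSpec).length); // 0
```

Treat the findings as warnings rather than hard failures: substring matching will occasionally flag a word that legitimately belongs in user-facing language.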

Why Industry Leaders Adopted This

By 2025, major players converged on this approach:

  • GitHub open-sourced Spec Kit, built around a spec → plan → tasks → implement workflow
  • OpenAI built Symphony requiring SPEC.md as a contract for each issue
  • Anthropic added Plan Mode to Claude Code, a lightweight spec system
  • AWS launched Kiro, an agentic IDE built around spec-driven development

They all realized the same thing: the bottleneck isn’t AI capability, it’s instruction clarity.

How to Start

Pick your next feature. Before asking an AI agent to implement:

  1. Write a spec (functional, testable, tech-agnostic)
  2. Write a plan (technical decisions, constraints, patterns)
  3. Break into tasks (small, ordered, verifiable)
  4. Let the agent implement

The upfront investment pays off immediately: fewer revision cycles, reproducible results, and code that actually matches what you wanted.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can reach me by email.

Here are the most important links from this article, along with some further resources on the topic:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
