
What is Spec-Driven Development for AI Coding? The Six Primitives Explained

The Problem with Ad-Hoc AI Coding

I kept hitting the same wall. I’d start a coding session with an AI assistant, throw in a prompt, get some code, iterate a bit, and eventually ship something. But three weeks later, I couldn’t reproduce the same quality. The prompts were scattered across chat histories. The reasoning behind decisions was lost. I had no system—just a collection of one-off interactions.

Then I discovered spec-driven development (SDD), and everything clicked into place.

The Core Idea: Markdown is the New Source Code

Software engineering has moved up a level. The actual code—the thing a developer writes, reviews, versions, argues about in PRs—is increasingly the markdown: the plans, the specs, the rubrics, the references, the retrospectives.

Markdown is the new source code, and code is the new assembly.

This isn’t just a cute metaphor. It’s a fundamental shift in how we work with AI. When I started treating my prompts like code—with modularity, naming, review, testability, and versioning—my AI coding workflows became reproducible, maintainable, and actually scalable.

When I looked at existing SDD frameworks like OpenSpec, GitHub Spec Kit, and Get Shit Done (GSD), I realized they all implement the same underlying primitives. Understanding these primitives helped me choose the right framework for my needs, and even customize it.

The Six Primitives of Spec-Driven Development

Every SDD system is built from the same six building blocks. Here’s what they are and why they matter.

Primitive 1: Context is a Budget

Nothing loads by default. This was the hardest mental shift for me.

In traditional coding, I import what I need and the environment figures it out. But with AI, context is a finite resource. Every file I load and every piece of documentation I include eats into the context window. And as that window fills up, the model's output tends to lose coherence.

So SDD treats context like a budget. You pull in only what’s needed for the specific task at hand.

Example: Scoped Context Loading
context:
  task: "implement-auth-flow"
  files:
    - path: "src/auth/*.ts"
      reason: "auth implementation files"
    - path: "docs/auth-spec.md"
      reason: "authentication specification"
  exclude:
    - "node_modules/**"
    - "*.test.ts"

The pattern is simple: be intentional about what enters the context. A smaller, focused context tends to produce better AI output than a window stuffed with everything.
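To make the budget idea concrete, here is a minimal Python sketch that selects files by include and exclude globs and skips any file that would push the total past a token budget. The names `select_context` and `estimate_tokens`, the rough 4-characters-per-token estimate, and the in-memory file mapping are all assumptions I made for illustration; they don't come from any particular SDD framework.

```python
from fnmatch import fnmatch


def estimate_tokens(text: str) -> int:
    """Crude estimate: assume roughly 4 characters per token."""
    return max(1, len(text) // 4)


def select_context(files, include, exclude, budget):
    """Pick files matching `include` but not `exclude` within `budget` tokens.

    `files` maps a path to its text. Returns (selected_paths, tokens_used).
    """
    selected, used = [], 0
    for path in sorted(files):
        if not any(fnmatch(path, pat) for pat in include):
            continue
        if any(fnmatch(path, pat) for pat in exclude):
            continue
        cost = estimate_tokens(files[path])
        if used + cost > budget:
            continue  # skip files that would overflow the budget
        selected.append(path)
        used += cost
    return selected, used


# Hypothetical project files, stood in for by placeholder strings.
files = {
    "src/auth/login.ts": "x" * 400,
    "src/auth/login.test.ts": "x" * 400,
    "docs/auth-spec.md": "x" * 200,
    "node_modules/lib/index.js": "x" * 4000,
}
selected, used = select_context(
    files,
    include=["src/auth/*.ts", "docs/auth-spec.md"],
    exclude=["node_modules/**", "*.test.ts"],
    budget=1000,
)
print(selected, used)
```

Note that `fnmatch` lets `*` match across `/`, which is why `*.test.ts` excludes test files in any directory here. A real loader would likely use a proper tokenizer instead of a character count.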

Primitive 2: Prompts are a State Machine

I used to write prompts as one-off requests. Now I see them as a state machine with distinct phases.

Every SDD system has:

  • Routers: Determine which workflow to trigger
  • Phases: Break down complex tasks into stages
  • Skills: Reusable capabilities the agent can invoke
  • Templates: Pre-defined structures for common outputs
  • References: Documentation and examples the agent can consult

Example: State Machine Prompt Structure
# Router
If request is about implementation:
  -> Phase: Plan
  -> Phase: Implement
  -> Phase: Review
If request is about debugging:
  -> Phase: Investigate
  -> Phase: Fix
  -> Phase: Verify

# Skills Available
- file-reader: Read and understand code
- test-runner: Execute test suites
- linter: Check code quality

When I structure prompts this way, the AI agent has a clear navigation path. It knows what phase it’s in, what skills it can use, and what the next step should be.
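The router and phases described above can be sketched as a tiny state machine in Python. Everything here is hypothetical: the `WORKFLOWS` table mirrors the example, the keyword-based `route` function is a placeholder for what would more likely be a prompt of its own, and the `Workflow` class is just one way to track the current phase.

```python
# Hypothetical workflows: each maps to an ordered list of phases.
WORKFLOWS = {
    "implementation": ["Plan", "Implement", "Review"],
    "debugging": ["Investigate", "Fix", "Verify"],
}


def route(request: str) -> str:
    """Naive keyword router; a real router might be a prompt of its own."""
    text = request.lower()
    if any(word in text for word in ("bug", "error", "crash", "debug")):
        return "debugging"
    return "implementation"


class Workflow:
    """Tracks which phase the agent is in and what comes next."""

    def __init__(self, name: str):
        self.name = name
        self.phases = WORKFLOWS[name]
        self.index = 0

    @property
    def phase(self) -> str:
        return self.phases[self.index]

    @property
    def done(self) -> bool:
        return self.index == len(self.phases) - 1

    def advance(self) -> str:
        """Move to the next phase; refuse to step past the last one."""
        if self.done:
            raise RuntimeError(f"{self.name} is already in its final phase")
        self.index += 1
        return self.phase


wf = Workflow(route("Fix the crash on login"))
print(wf.name, wf.phase)
```

The point of making the phase explicit is that each prompt can be told exactly where it sits in the flow, instead of inferring it from chat history.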

Primitive 3: Correctness is Adversarial

This one changed my results dramatically.

Instead of trusting the AI’s output, SDD uses a generator vs. reviewer pattern. One agent generates code. Another agent—often with different context or a different prompt template—reviews it.

Example: Adversarial Review Setup
generator:
  model: "claude-sonnet"
  context: ["spec.md", "existing-code/"]
  task: "implement feature X"
reviewer:
  model: "claude-sonnet"  # can be same or different
  context: ["spec.md", "test-requirements.md"]
  task: "verify implementation matches spec"

The reviewer isn’t just looking for bugs. It’s checking against the spec, the test requirements, and the broader system constraints. This adversarial approach catches issues that a single-pass generation would miss.
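Here is a rough Python sketch of the generator vs. reviewer loop. The `generate_with_review` function and the `max_rounds` cap are my own assumptions, and `fake_generate` and `fake_review` are stand-ins for real model calls so the example runs without any API.

```python
from typing import Callable, List, Tuple

Generator = Callable[[str, List[str]], str]  # (spec, feedback) -> code
Reviewer = Callable[[str, str], List[str]]   # (spec, code) -> issues


def generate_with_review(
    spec: str, generate: Generator, review: Reviewer, max_rounds: int = 3
) -> Tuple[str, List[str]]:
    """Alternate generation and review until the reviewer has no issues.

    Returns the last draft and any issues still open after `max_rounds`.
    """
    feedback: List[str] = []
    draft = ""
    for _ in range(max_rounds):
        draft = generate(spec, feedback)
        feedback = review(spec, draft)
        if not feedback:
            break
    return draft, feedback


# Stand-ins for model calls so the sketch runs without any API.
def fake_generate(spec: str, feedback: List[str]) -> str:
    code = "def login(user, password): ..."
    if "missing rate limit" in feedback:
        code += "  # rate limited"
    return code


def fake_review(spec: str, code: str) -> List[str]:
    return [] if "rate limited" in code else ["missing rate limit"]


draft, open_issues = generate_with_review("auth spec", fake_generate, fake_review)
print(draft, open_issues)
```

Capping the rounds matters: if the reviewer never signs off, the loop returns the open issues instead of spinning forever, and a human can take it from there.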

Primitive 4: Guidance Inside, Gates Outside

Here’s where SDD diverges from traditional automation.

Inside the workflow, the agent has freedom. It picks the approach, makes micro-decisions, and adapts to the specific codebase. But outside the workflow, there are deterministic walls—gates that the output must pass through.

Example: Gates vs. Guidance
# Guidance (Inside - Agent Decides)
- Which files to read
- How to structure the implementation
- Variable naming conventions
- Which patterns to apply
# Gates (Outside - Must Pass)
- All tests must pass
- No lint errors
- Code coverage must be >80%
- No security vulnerabilities
- PR description must follow template

The gates are non-negotiable. They’re checked by deterministic tools—test runners, linters, security scanners. But inside those walls, the agent has the flexibility to find the best solution for the specific context.
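A minimal Python sketch of the gate side, assuming the deterministic tools have already produced a report. The report keys (`tests_failed`, `lint_errors`, `coverage`, `vulnerabilities`) are hypothetical names I picked for the example, and the PR-description gate is left out to keep it short.

```python
from typing import Callable, Dict, List

# Each gate inspects a report from deterministic tools and returns True on pass.
# The report keys are hypothetical; real values would come from a test runner,
# linter, coverage tool, and security scanner.
GATES: Dict[str, Callable[[dict], bool]] = {
    "tests pass": lambda r: r["tests_failed"] == 0,
    "no lint errors": lambda r: r["lint_errors"] == 0,
    "coverage > 80%": lambda r: r["coverage"] > 80,
    "no vulnerabilities": lambda r: r["vulnerabilities"] == 0,
}


def failed_gates(report: dict) -> List[str]:
    """Return the names of every gate the report fails, in declaration order."""
    return [name for name, check in GATES.items() if not check(report)]


report = {"tests_failed": 0, "lint_errors": 2, "coverage": 76.5, "vulnerabilities": 0}
failures = failed_gates(report)
print("blocked by:" if failures else "all gates passed", failures)
```

Nothing in the gate code asks the agent for its opinion. The checks are plain comparisons on tool output, which is what makes them walls and not suggestions.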

Primitive 5: The System Rewrites Itself

This is where SDD becomes a meta-system.

After each significant interaction, the system runs retrospectives. These aren’t just post-mortems—they’re structured analyses that modify the framework files themselves.

Example: Retrospective Template
# Retrospective: [Task Name]
## What Worked
- [Patterns, approaches, context choices that succeeded]
## What Didn't Work
- [Failed approaches, missed edge cases, context overflows]
## Framework Updates
- [Specific changes to templates, prompts, or context rules]
## Metric Changes
- [Updates to thresholds, budgets, or quality gates]

When I implement a feature and the reviewer catches a common mistake, the retrospective updates the generator’s prompt template to avoid that mistake in the future. The system learns from its own execution.
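One simple way to picture this feedback loop is a function that folds the "Framework Updates" items from a retrospective into the generator's prompt template. This is a sketch under my own assumptions: the `apply_retrospective` name and the bullet-list template format are hypothetical, and a real system would probably review these changes before committing them.

```python
def apply_retrospective(template: str, framework_updates: list) -> str:
    """Append each new rule to the template, skipping rules already present."""
    lines = template.rstrip("\n").split("\n")
    existing = {line.strip() for line in lines}
    for update in framework_updates:
        rule = f"- {update.strip()}"
        if rule not in existing:
            lines.append(rule)
            existing.add(rule)
    return "\n".join(lines) + "\n"


template = "# Generator Rules\n- Follow the spec exactly\n"
updated = apply_retrospective(
    template,
    ["Validate inputs at API boundaries", "Follow the spec exactly"],
)
print(updated)
```

Because duplicates are skipped, running the same retrospective twice leaves the template unchanged, which keeps the versioned file from growing with repeated rules.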

Primitive 6: The System Can See Itself

The final primitive is introspection.

Every SDD system should be able to render a live architecture diagram of itself. This isn’t just documentation—it’s a diagnostic tool.

Example: Introspector Output
# Current System Architecture
## Active Workflows
- feature-implementation: 3 active
- bug-fix: 1 active
- refactoring: 0 active
## Context Budgets
- feature-implementation: 45k/200k tokens used
- bug-fix: 12k/200k tokens used
## Recent Framework Updates
- Added retry logic to router (2 hours ago)
- Updated reviewer template for edge cases (1 day ago)
- Increased context budget for refactoring workflow (3 days ago)

When something goes wrong—or goes surprisingly well—I can inspect the system’s state. I can see which workflows are active, how context budgets are being used, and what recent changes have been made to the framework.
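As a small illustration, the report above could be rendered from in-memory state with a few lines of Python. The `WorkflowState` dataclass and `render_report` function are hypothetical names for this sketch, and it covers only the workflow and budget sections, not the update history.

```python
from dataclasses import dataclass


@dataclass
class WorkflowState:
    name: str
    active: int
    tokens_used: int
    token_budget: int


def render_report(workflows) -> str:
    """Render workflow state as a markdown report."""
    lines = ["# Current System Architecture", "## Active Workflows"]
    lines += [f"- {w.name}: {w.active} active" for w in workflows]
    lines.append("## Context Budgets")
    for w in workflows:
        if w.active:  # only show budgets for workflows that are running
            lines.append(
                f"- {w.name}: {w.tokens_used // 1000}k/{w.token_budget // 1000}k tokens used"
            )
    return "\n".join(lines)


state = [
    WorkflowState("feature-implementation", 3, 45_000, 200_000),
    WorkflowState("bug-fix", 1, 12_000, 200_000),
    WorkflowState("refactoring", 0, 0, 200_000),
]
print(render_report(state))
```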

Common Misconceptions

As I explored SDD, I ran into several misconceptions that are worth addressing.

“SDD is just detailed prompts” - No. SDD is a system design. Detailed prompts are one component, but the real power comes from the orchestration—how prompts work together, how context is managed, how correctness is verified.

“Frameworks solve everything” - GitHub Spec Kit, OpenSpec, and Get Shit Done are good pieces of work. But if SDD is system design, the normal rule of system design applies: the best solution is the one shaped to your specific constraints. A framework is a starting point, not a complete solution.

“This is waterfall” - SDD isn’t about big upfront design. It’s about having a systematic approach that you can iterate on. The retrospectives ensure that the system evolves based on actual usage, not theoretical planning.

“Just use plan mode” - Plan mode in AI assistants is useful, but it’s a feature, not a system. SDD treats planning as a first-class artifact that’s versioned, reviewed, and refined over time.

Choosing or Building Your SDD System

When I evaluated existing frameworks, I looked at three factors:

  1. Integration with my stack - Does it work with my existing tools (git, CI/CD, code review)?
  2. Customization depth - Can I modify the primitives without fighting the framework?
  3. Introspection support - Can I see what the system is doing and why?

For simple projects, a lightweight framework like GSD might be enough. For complex, evolving codebases, I needed something more configurable—so I built a custom system that implements all six primitives but adapts them to my specific workflow.

The key insight: don’t just adopt a framework because it’s popular. Understand the primitives, evaluate your constraints, and choose (or build) accordingly.

The Payoff: Reproducible Quality

After implementing my SDD system, the difference was stark.

Before: Each AI coding session was a roll of the dice. Sometimes I got great results, sometimes I spent hours debugging hallucinated code.

After: I have a reproducible process. The same task, with the same context, produces consistent quality. And when quality drops, I can inspect the system, find the issue, and fix the framework itself.

Spec-driven development isn’t a silver bullet. But for anyone serious about AI-assisted coding, it’s the difference between treating AI as a magic box and treating it as a tool you can actually engineer with.

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email.

