What is Spec-Driven Development for AI Coding? The Six Primitives Explained
The Problem with Ad-Hoc AI Coding
I kept hitting the same wall. I’d start a coding session with an AI assistant, throw in a prompt, get some code, iterate a bit, and eventually ship something. But three weeks later, I couldn’t reproduce the same quality. The prompts were scattered across chat histories. The reasoning behind decisions was lost. I had no system—just a collection of one-off interactions.
Then I discovered spec-driven development (SDD), and everything clicked into place.
The Core Idea: Markdown is the New Source Code
Software engineering has moved up a level. The actual code—the thing a developer writes, reviews, versions, argues about in PRs—is increasingly the markdown: the plans, the specs, the rubrics, the references, the retrospectives.
Markdown is the new source code, and code is the new assembly.
This isn’t just a cute metaphor. It’s a fundamental shift in how we work with AI. When I started treating my prompts like code—with modularity, naming, review, testability, and versioning—my AI coding workflows became reproducible, maintainable, and actually scalable.
When I looked at existing SDD frameworks like OpenSpec, GitHub Spec Kit, and Get Shit Done (GSD), I realized they all implement the same underlying primitives. Understanding these primitives helped me choose the right framework for my needs, and even customize it.
The Six Primitives of Spec-Driven Development
Every SDD system is built from the same six building blocks. Here’s what they are and why they matter.
Primitive 1: Context is a Budget
Nothing loads by default. This was the hardest mental shift for me.
In traditional coding, I import what I need and the environment figures it out. But with AI, context is a finite resource. Every file I load, every piece of documentation I include—it all eats into the context window. And when that window fills up, the AI loses coherence.
So SDD treats context like a budget. You pull in only what’s needed for the specific task at hand.
    context:
      task: "implement-auth-flow"
      files:
        - path: "src/auth/*.ts"
          reason: "auth implementation files"
        - path: "docs/auth-spec.md"
          reason: "authentication specification"
      exclude:
        - "node_modules/**"
        - "*.test.ts"

The pattern is simple: be intentional about what enters the context. Smaller, focused context windows lead to better AI outputs.
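To make the budgeting idea concrete, here is a minimal Python sketch of how a context config like the one above could be enforced. It is not taken from any of the frameworks mentioned: `ContextRule`, `estimate_tokens`, and `select_context` are hypothetical names, and the four-characters-per-token estimate is a rough assumption rather than a real tokenizer.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple


@dataclass
class ContextRule:
    pattern: str  # glob pattern, e.g. "src/auth/*.ts"
    reason: str   # why files matching this pattern belong in the context


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def select_context(
    files: Dict[str, str],
    include: List[ContextRule],
    exclude: List[str],
    budget_tokens: int,
) -> Tuple[List[str], int]:
    """Pick files that match an include rule and no exclude pattern,
    skipping any file that would push usage past the budget.

    `files` maps path -> contents. Returns (selected_paths, tokens_used).
    """
    selected: List[str] = []
    used = 0
    for path, text in files.items():
        if any(fnmatchcase(path, pattern) for pattern in exclude):
            continue  # explicitly excluded
        if not any(fnmatchcase(path, rule.pattern) for rule in include):
            continue  # nothing loads by default
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            continue  # over budget: leave this file out
        selected.append(path)
        used += cost
    return selected, used
```

The important design choice is the second `continue`: a file that no rule asks for never enters the context, which mirrors the "nothing loads by default" rule.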
Primitive 2: Prompts are a State Machine
I used to write prompts as one-off requests. Now I see them as a state machine with distinct phases.
Every SDD system has:
- Routers: Determine which workflow to trigger
- Phases: Break down complex tasks into stages
- Skills: Reusable capabilities the agent can invoke
- Templates: Pre-defined structures for common outputs
- References: Documentation and examples the agent can consult
    # Router
    If request is about implementation:
      -> Phase: Plan
      -> Phase: Implement
      -> Phase: Review

    If request is about debugging:
      -> Phase: Investigate
      -> Phase: Fix
      -> Phase: Verify

    # Skills Available
    - file-reader: Read and understand code
    - test-runner: Execute test suites
    - linter: Check code quality

When I structure prompts this way, the AI agent has a clear navigation path. It knows what phase it’s in, what skills it can use, and what the next step should be.
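The router and phases above can be modeled as a very small state machine. The sketch below is illustrative only: the `WORKFLOWS` table reuses the phase names from the example, while `route` uses naive keyword matching as a stand-in for whatever classification a real system would use, and the `Workflow` class is a hypothetical name.

```python
from typing import List, Optional

# Workflow table: request kind -> ordered phases (phase names from the example above).
WORKFLOWS = {
    "implementation": ["Plan", "Implement", "Review"],
    "debugging": ["Investigate", "Fix", "Verify"],
}


def route(request: str) -> str:
    """Toy router (assumption): keyword matching decides the workflow."""
    text = request.lower()
    if any(word in text for word in ("bug", "error", "crash", "debug")):
        return "debugging"
    return "implementation"


class Workflow:
    """Tracks which phase a task is in and what comes next."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self.phases: List[str] = WORKFLOWS[kind]
        self.index = 0

    @property
    def phase(self) -> Optional[str]:
        # None means every phase has been completed.
        if self.index < len(self.phases):
            return self.phases[self.index]
        return None

    def advance(self) -> Optional[str]:
        # Move to the next phase and return it (or None when finished).
        if self.index < len(self.phases):
            self.index += 1
        return self.phase
```

Because the current phase is explicit state rather than something implied by chat history, each prompt can be told exactly where it sits in the workflow.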
Primitive 3: Correctness is Adversarial
This one changed my results dramatically.
Instead of trusting the AI’s output, SDD uses a generator vs. reviewer pattern. One agent generates code. Another agent—often with different context or a different prompt template—reviews it.
    generator:
      model: "claude-sonnet"
      context: ["spec.md", "existing-code/"]
      task: "implement feature X"

    reviewer:
      model: "claude-sonnet"  # can be same or different
      context: ["spec.md", "test-requirements.md"]
      task: "verify implementation matches spec"

The reviewer isn’t just looking for bugs. It’s checking against the spec, the test requirements, and the broader system constraints. This adversarial approach catches issues that a single-pass generation would miss.
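One way to wire the generator and reviewer together is a bounded feedback loop. The following is a sketch under assumptions: `generate` and `review` are placeholder callables standing in for the two model calls, and `Review`, `generate_with_review`, and `max_rounds` are hypothetical names, not part of any framework named in this article.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Review:
    approved: bool
    issues: List[str] = field(default_factory=list)


def generate_with_review(
    generate: Callable[[str, List[str]], str],
    review: Callable[[str, str], Review],
    task: str,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Alternate generator and reviewer until the reviewer approves.

    `generate(task, feedback)` returns a draft; `review(task, draft)` returns
    a Review. Returns (approved_draft, rounds_used) or raises if the round
    limit is reached without approval.
    """
    feedback: List[str] = []
    for round_number in range(1, max_rounds + 1):
        draft = generate(task, feedback)
        verdict = review(task, draft)
        if verdict.approved:
            return draft, round_number
        feedback = verdict.issues  # reviewer's issues go back to the generator
    raise RuntimeError(f"no approved draft after {max_rounds} rounds")
```

The round limit matters: without it, a generator and reviewer that disagree would loop forever, so the failure is surfaced to a human instead.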
Primitive 4: Guidance Inside, Gates Outside
Here’s where SDD diverges from traditional automation.
Inside the workflow, the agent has freedom. It picks the approach, makes micro-decisions, and adapts to the specific codebase. But outside the workflow, there are deterministic walls—gates that the output must pass through.
    # Guidance (Inside - Agent Decides)
    - Which files to read
    - How to structure the implementation
    - Variable naming conventions
    - Which patterns to apply

    # Gates (Outside - Must Pass)
    - All tests must pass
    - No lint errors
    - Code coverage must be >80%
    - No security vulnerabilities
    - PR description must follow template

The gates are non-negotiable. They’re checked by deterministic tools: test runners, linters, security scanners. But inside those walls, the agent has the flexibility to find the best solution for the specific context.
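The gates can be expressed as plain, deterministic checks over tool output. In this sketch the `report` dictionary and its keys (`tests_failed`, `lint_errors`, `coverage_percent`, `vulnerabilities`) are assumptions about how results from a test runner, linter, coverage tool, and security scanner might be collected; the PR-description gate is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GateResult:
    name: str
    passed: bool


def check_gates(report: Dict[str, float]) -> List[GateResult]:
    """Evaluate each gate against a summary of tool output.

    No model is involved here: the same report always gives the same verdict.
    """
    return [
        GateResult("tests", report["tests_failed"] == 0),
        GateResult("lint", report["lint_errors"] == 0),
        # The gate list above says coverage must be strictly greater than 80%.
        GateResult("coverage", report["coverage_percent"] > 80.0),
        GateResult("security", report["vulnerabilities"] == 0),
    ]


def failing_gates(report: Dict[str, float]) -> List[str]:
    # Names of gates that did not pass; an empty list means the output may ship.
    return [result.name for result in check_gates(report) if not result.passed]
```

Keeping the gates outside the agent means the agent cannot talk its way past them: either the list of failing gates is empty or the work goes back.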
Primitive 5: The System Rewrites Itself
This is where SDD becomes a meta-system.
After each significant interaction, the system runs retrospectives. These aren’t just post-mortems—they’re structured analyses that modify the framework files themselves.
    # Retrospective: [Task Name]

    ## What Worked
    - [Patterns, approaches, context choices that succeeded]

    ## What Didn't Work
    - [Failed approaches, missed edge cases, context overflows]

    ## Framework Updates
    - [Specific changes to templates, prompts, or context rules]

    ## Metric Changes
    - [Updates to thresholds, budgets, or quality gates]

When I implement a feature and the reviewer catches a common mistake, the retrospective updates the generator’s prompt template to avoid that mistake in the future. The system learns from its own execution.
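As a rough illustration of a retrospective feeding back into the framework, the sketch below appends each "Framework Updates" entry to a prompt template as a new rule. `Retrospective` and `apply_updates` are hypothetical names, and treating the template as a list of `- ` rules is a simplifying assumption; a real system might edit several files and have the change reviewed like any other commit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Retrospective:
    task: str
    worked: List[str] = field(default_factory=list)
    did_not_work: List[str] = field(default_factory=list)
    framework_updates: List[str] = field(default_factory=list)


def apply_updates(template: str, retro: Retrospective) -> str:
    """Return the template with each framework update appended as a rule.

    Rules already present are skipped, so applying the same retrospective
    twice leaves the template unchanged.
    """
    lines = template.rstrip("\n").split("\n")
    for update in retro.framework_updates:
        rule = f"- {update}"
        if rule not in lines:
            lines.append(rule)
    return "\n".join(lines) + "\n"
```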
Primitive 6: The System Can See Itself
The final primitive is introspection.
Every SDD system should be able to render a live architecture diagram of itself. This isn’t just documentation—it’s a diagnostic tool.
    # Current System Architecture

    ## Active Workflows
    - feature-implementation: 3 active
    - bug-fix: 1 active
    - refactoring: 0 active

    ## Context Budgets
    - feature-implementation: 45k/200k tokens used
    - bug-fix: 12k/200k tokens used

    ## Recent Framework Updates
    - Added retry logic to router (2 hours ago)
    - Updated reviewer template for edge cases (1 day ago)
    - Increased context budget for refactoring workflow (3 days ago)

When something goes wrong, or goes surprisingly well, I can inspect the system’s state. I can see which workflows are active, how context budgets are being used, and what recent changes have been made to the framework.
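A report like the one above can be rendered directly from the system's state. This sketch assumes the state is available as three simple structures (`workflows`, `budgets`, `updates`); those shapes and the `render_status` name are illustrative, not something a particular framework provides.

```python
from typing import Dict, List, Tuple


def render_status(
    workflows: Dict[str, int],
    budgets: Dict[str, Tuple[int, int]],
    updates: List[str],
) -> str:
    """Render the system's current state as a markdown report.

    `workflows` maps workflow name -> active count, `budgets` maps workflow
    name -> (tokens_used, token_limit), `updates` lists recent changes.
    """
    lines = ["# Current System Architecture", "", "## Active Workflows"]
    for name, count in workflows.items():
        lines.append(f"- {name}: {count} active")
    lines += ["", "## Context Budgets"]
    for name, (used, limit) in budgets.items():
        # Token counts are shown in whole thousands, rounded down.
        lines.append(f"- {name}: {used // 1000}k/{limit // 1000}k tokens used")
    lines += ["", "## Recent Framework Updates"]
    for text in updates:
        lines.append(f"- {text}")
    return "\n".join(lines)
```

Because the report is generated from live state rather than written by hand, it stays accurate as workflows and budgets change.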
Common Misconceptions
As I explored SDD, I ran into several misconceptions that are worth addressing.
“SDD is just detailed prompts” - No. SDD is a system design. Detailed prompts are one component, but the real power comes from the orchestration—how prompts work together, how context is managed, how correctness is verified.
“Frameworks solve everything” - GitHub Spec Kit, OpenSpec, and Get Shit Done are good pieces of work. But if SDD is system design, the normal rule of system design applies: the best solution is the one shaped to your specific constraints. A framework is a starting point, not a complete solution.
“This is waterfall” - SDD isn’t about big upfront design. It’s about having a systematic approach that you can iterate on. The retrospectives ensure that the system evolves based on actual usage, not theoretical planning.
“Just use plan mode” - Plan mode in AI assistants is useful, but it’s a feature, not a system. SDD treats planning as a first-class artifact that’s versioned, reviewed, and refined over time.
Choosing or Building Your SDD System
When I evaluated existing frameworks, I looked at three factors:
- Integration with my stack - Does it work with my existing tools (git, CI/CD, code review)?
- Customization depth - Can I modify the primitives without fighting the framework?
- Introspection support - Can I see what the system is doing and why?
For simple projects, a lightweight framework like GSD might be enough. For complex, evolving codebases, I needed something more configurable—so I built a custom system that implements all six primitives but adapts them to my specific workflow.
The key insight: don’t just adopt a framework because it’s popular. Understand the primitives, evaluate your constraints, and choose (or build) accordingly.
The Payoff: Reproducible Quality
After implementing my SDD system, the difference was stark.
Before: Each AI coding session was a roll of the dice. Sometimes I got great results, sometimes I spent hours debugging hallucinated code.
After: I have a reproducible process. The same task, with the same context, produces consistent quality. And when quality drops, I can inspect the system, find the issue, and fix the framework itself.
Spec-driven development isn’t a silver bullet. But for anyone serious about AI-assisted coding, it’s the difference between treating AI as a magic box and treating it as a tool you can actually engineer with.
Final Words + More Resources
My intention with this article was to share my knowledge and experience in a way that helps others. If you want to get in touch, you can contact me by email.