
How to Use Spec-Driven Development with Open Source AI Models

I’ve been testing various AI coding assistants, and one question keeps coming up: can cheaper, open-source models match Claude Code’s performance? After extensive experimentation with DeepSeek and other alternatives, I found the answer is yes, but only if you change how you work with them.

[Figure: DeepSeek V4 Benchmark]

The Problem: Claude Code Works Differently

Claude Code treats you like a user. You describe what you want, and it figures out the implementation details autonomously. It plans, executes, and iterates with minimal guidance.

Open-source models work differently. They treat you like a developer. If you give them the same vague prompts you use with Claude Code, you’ll get subpar results.

I learned this the hard way. My first attempts with DeepSeek produced verbose, sometimes incorrect code because I expected it to “just understand” what I needed. The token usage was high, the answers were often wrong, and I wasted time correcting outputs.

The Solution: Spec-Driven Development

Spec-driven development is your friend when working with open-source AI models. The core principle is simple: be as specific as possible and break down modules into smaller tasks.

Bad vs Good Prompts

Here’s how my prompts evolved:

Bad Prompt
Create a REST API for user management with authentication

This prompt works fine with Claude Code. With DeepSeek? It produced a monolithic file with poor error handling and no separation of concerns.

Good Spec-Driven Prompt
Create a FastAPI endpoint for user registration with the following specifications:
FILE: app/api/users.py
1. Endpoint: POST /api/v1/users/register
2. Request body (Pydantic model):
   - email: str (validated email format)
   - password: str (min 8 chars, must contain uppercase, lowercase, digit)
   - username: str (alphanumeric, 3-20 chars)
3. Validation logic:
   - Check if email already exists in database
   - Check if username already exists in database
   - Return 400 with specific error message for each case
4. Password handling:
   - Hash password using bcrypt (work factor 12)
   - Store only the hash
5. Database operation:
   - Use SQLAlchemy with async session
   - Insert new user record
   - Return user_id and created_at timestamp
6. Response:
   - Success: 201 with {"user_id": str, "created_at": str}
   - Error: 400 with {"error": str, "field": str}
7. Include logging for: registration attempt, success, failure
8. Add type hints for all functions
9. Handle database connection errors gracefully

The second prompt produced clean, well-structured code that matched my requirements exactly.
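
To show how directly a spec item maps to code, here is a minimal sketch of the field rules from item 2 and the error shape from item 6. It is not the code DeepSeek produced. It uses only the standard library rather than Pydantic, and `validate_registration`, the regular expressions, and the error messages are my own illustrative assumptions.

```python
from __future__ import annotations

import re

# Assumption: a deliberately loose email check; a real service would use a stricter validator.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# Spec item 2: alphanumeric username, 3-20 characters.
USERNAME_RE = re.compile(r"[A-Za-z0-9]{3,20}")


def validate_registration(email: str, password: str, username: str) -> list[dict[str, str]]:
    """Return a list of {"error", "field"} dicts; an empty list means the input passed."""
    errors: list[dict[str, str]] = []

    if not EMAIL_RE.fullmatch(email):
        errors.append({"error": "invalid email format", "field": "email"})

    # Spec item 2: min 8 chars, must contain uppercase, lowercase, and a digit.
    password_ok = (
        len(password) >= 8
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
    )
    if not password_ok:
        errors.append({"error": "password does not meet complexity rules", "field": "password"})

    if not USERNAME_RE.fullmatch(username):
        errors.append({"error": "username must be 3-20 alphanumeric characters", "field": "username"})

    return errors
```

Because every rule in the spec is concrete, each one becomes a single check that is easy to test on its own.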

Breaking Down Complex Logic

When I needed to build a complete microservice, I didn’t ask for everything at once. I created separate specs for each component:

Spec for Each Component
1. Database models (models/user.py)
2. Pydantic schemas (schemas/user.py)
3. Repository layer (repositories/user_repository.py)
4. Service layer (services/user_service.py)
5. API routes (api/users.py)
6. Unit tests (tests/test_user_service.py)

Each spec was explicit about:

  • Input/output types
  • Error cases
  • Dependencies
  • Logging requirements
  • Edge cases to handle
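
One way to keep yourself honest about that checklist is to hold each spec in a small data structure and flag the sections you left empty before sending anything to the model. This is a rough sketch; `ModuleSpec` and `missing_sections` are hypothetical names, not part of any tool mentioned here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModuleSpec:
    """One spec per component, mirroring the checklist above."""

    target_file: str
    task: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    error_cases: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)
    logging: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of checklist sections that are still empty."""
        sections = ["inputs", "outputs", "error_cases", "dependencies", "logging", "edge_cases"]
        return [name for name in sections if not getattr(self, name)]


spec = ModuleSpec(
    target_file="services/user_service.py",
    task="Register a new user",
    inputs=["email: str", "password: str", "username: str"],
    outputs=["user_id: str", "created_at: str"],
)
print(spec.missing_sections())
```

Running this prints the four sections that still need detail, which is a useful prompt to yourself before the spec goes to the model.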

Iterative Refinement Workflow

My workflow now looks like this:

Spec-Driven Workflow
1. Write spec for one small module
2. Generate code
3. Test immediately
4. Fix issues with targeted prompts
5. Move to next module
6. Integrate modules
7. Add integration tests

This contrasts with my old Claude Code workflow where I’d ask for a complete feature and then iterate on the whole thing.
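
Steps 1 through 5 of the workflow can be sketched as a small driver loop. This is an illustration under assumptions, not a tool I am describing: `generate` stands in for whatever call sends a prompt to your model, and `run_tests` for whatever runs that module's tests and returns failure messages. Both are hypothetical callables you would supply.

```python
from __future__ import annotations

from typing import Callable


def build_modules(
    specs: dict[str, str],
    generate: Callable[[str], str],
    run_tests: Callable[[str, str], list[str]],
    max_fix_rounds: int = 2,
) -> dict[str, str]:
    """Generate one module at a time, testing and fixing each before moving on."""
    accepted: dict[str, str] = {}
    for name, spec in specs.items():
        code = generate(spec)              # step 2: generate code from one small spec
        failures = run_tests(name, code)   # step 3: test immediately
        rounds = 0
        while failures and rounds < max_fix_rounds:
            # Step 4: a targeted prompt that names only the failing behavior.
            fix_prompt = f"{spec}\n\nFix only these failures:\n" + "\n".join(failures)
            code = generate(fix_prompt)
            failures = run_tests(name, code)
            rounds += 1
        if failures:
            raise RuntimeError(f"{name} still failing after {max_fix_rounds} fix rounds")
        accepted[name] = code              # step 5: move to the next module
    return accepted
```

Integration and integration tests (steps 6 and 7) stay outside the loop on purpose, since they only make sense once every module has passed on its own.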

Prompt Template for Repeated Tasks

I created a reusable template for common patterns:

prompt-template.yaml
task: "{task_description}"
file: "{target_file_path}"
requirements:
  input:
    - "{input_type_and_format}"
  output:
    - "{output_type_and_format}"
  validation:
    - "{validation_rule_1}"
    - "{validation_rule_2}"
  error_handling:
    - "{error_case_1}": "{expected_behavior_1}"
    - "{error_case_2}": "{expected_behavior_2}"
dependencies:
  - "{dependency_1}"
  - "{dependency_2}"
style:
  - Use type hints
  - Add docstrings
  - "Follow {style_guide}"

When I need to create a new endpoint or service, I fill in this template and get consistent results.
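
Filling the template by hand invites forgotten placeholders, so a small helper can refuse to produce a prompt until every field has a value. The sketch below is an assumption about how you might do this with Python's standard `string.Formatter`. `TEMPLATE` is a shortened stand-in for the full file, and `fill_template` is a hypothetical helper name.

```python
from __future__ import annotations

from string import Formatter

# Shortened stand-in for prompt-template.yaml; the placeholder names match the template above.
TEMPLATE = """task: "{task_description}"
file: "{target_file_path}"
style:
  - Use type hints
  - Add docstrings
  - "Follow {style_guide}"
"""


def fill_template(template: str, **values: str) -> str:
    """Substitute every {placeholder}; raise if any placeholder has no value."""
    needed = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = sorted(needed - set(values))
    if missing:
        raise KeyError(f"unfilled placeholders: {missing}")
    return template.format(**values)


prompt = fill_template(
    TEMPLATE,
    task_description="Create user registration endpoint",
    target_file_path="app/api/users.py",
    style_guide="PEP 8",
)
print(prompt)
```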

Why This Matters

Cost Savings

Claude Code is convenient but expensive. DeepSeek costs a fraction of the price. For high-volume coding tasks, the savings add up quickly.

Better Control

With spec-driven development, I know exactly what code will be generated. I’m not surprised by architectural decisions or hidden dependencies. The AI follows my plan rather than inventing its own.

Reproducibility

When I use the same spec twice, I get similar results. This makes it easier to maintain consistency across a codebase and onboard new team members.

Trade-offs

This approach requires more upfront effort. Writing detailed specs takes time. I need to think through edge cases, error handling, and integration points before generating any code.

The skill requirement is also higher. I need to know what I want and how to describe it precisely. Claude Code bridges that gap; open-source models don’t.

Common Mistakes to Avoid

Mistake 1: Using Claude-Style Prompts

Wrong Approach
Refactor this code to be better

This works with Claude Code because it infers what “better” means. With open-source models, you need to specify:

Correct Approach
Refactor this code by:
1. Extracting the validation logic into a separate function
2. Adding type hints to all parameters
3. Replacing the manual loop with a list comprehension
4. Adding error handling for the database call
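
For a sense of what those four instructions ask for, here is a small sketch of the kind of result they describe. The function names, the row shape, and the `fetch_rows` callable standing in for the database call are all hypothetical; the original code being refactored is not shown in this article.

```python
from __future__ import annotations

from typing import Callable, Iterable


def is_valid_row(row: dict) -> bool:
    """Item 1: validation logic extracted into its own function."""
    age = row.get("age")
    return bool(row.get("email")) and isinstance(age, int) and age >= 18


def load_adult_emails(fetch_rows: Callable[[], Iterable[dict]]) -> list[str]:
    """Item 2: type hints on all parameters and the return value."""
    try:
        rows = list(fetch_rows())  # item 4: the database call is wrapped in error handling
    except ConnectionError as exc:
        raise RuntimeError("could not load users from the database") from exc
    # Item 3: a list comprehension replaces the manual loop.
    return [row["email"] for row in rows if is_valid_row(row)]
```

Each numbered instruction leaves a visible trace in the result, which makes the output easy to review against the prompt.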

Mistake 2: Not Breaking Down Complex Logic

A 500-line module should never be generated in one prompt. Break it into logical units under 100 lines each. Generate, test, then integrate.

Mistake 3: Ignoring Model-Specific Behaviors

Chinese models like DeepSeek sometimes “think too much”: they over-explain, provide unnecessary context, or spiral into verbose responses. I counter this by adding explicit constraints to my specs:

Managing Verbosity
Constraints:
- Maximum 50 lines per function
- No tutorial-style comments
- Assume reader knows Python syntax
- Focus on implementation, not explanation
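
A constraint like the 50-line limit is easy to verify mechanically instead of by eye. As a rough sketch, assuming the generated code is Python, the standard `ast` module can report any function that exceeds it. `functions_over_limit` is a hypothetical helper name.

```python
from __future__ import annotations

import ast


def functions_over_limit(source: str, max_lines: int = 50) -> list[tuple[str, int]]:
    """Return (function_name, line_count) for each function longer than max_lines."""
    too_long: list[tuple[str, int]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on AST nodes in Python 3.8 and later.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                too_long.append((node.name, length))
    return too_long
```

If the list comes back non-empty, the function names can go straight into a targeted follow-up prompt asking the model to split them.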

Final Thoughts

Open-source AI models can match Claude Code’s output quality, but they require a different workflow. The spec-driven approach shifts effort from debugging generated code to writing precise specifications. If you’re willing to invest that upfront time, you can significantly reduce your AI coding costs while maintaining code quality.

The key insight: Claude Code optimizes for convenience. Open-source models optimize for explicit instruction. Choose your tool based on what you’re optimizing for.

The real difference isn’t model capability; it’s how you communicate your intent. Spec-driven development isn’t just about saving money. It’s about writing better specs that lead to better code, regardless of which AI you use.
