Skip to content

How to Cut Claude Code Costs by 35x: Use Opus for Planning, Cheaper Models for Execution

Problem

I was burning through my Claude Code budget faster than expected. A typical coding session with Claude Opus would consume $5-10 worth of tokens, and I wasn’t even doing particularly complex work. Just routine code generation, some debugging, and occasional refactoring.

Session 1: Bug fix + refactoring
Tokens used: ~50,000 input + ~20,000 output
Cost: $2.50 (input) + $12.50 (output) = $15.00
Session 2: Feature implementation
Tokens used: ~80,000 input + ~35,000 output
Cost: $4.00 (input) + $21.88 (output) = $25.88
Weekly total: $40-50 for routine coding tasks

The pattern was clear: most of my token usage wasn’t from complex reasoning—it was from routine implementation work. I was using a Ferrari (Opus) to do grocery shopping.

The Cost Reality Check

I sat down and analyzed where my tokens were going:

Task breakdown by token consumption:
- Planning/Architecture decisions: ~15% of tokens
- Code implementation: ~60% of tokens
- Bug fixing and iteration: ~20% of tokens
- Final review and polish: ~5% of tokens

The insight hit me: I was using Opus’s $25/million output tokens for code that a much cheaper model could write. The planning decisions—the actual strategic thinking—only represented 15% of my usage.

That’s when I discovered the plan-execute pattern on Reddit. One user mentioned:

“I’m using Opus for planning and then GPT OSS 20B for implementation and given it’s $0.7 per million output tokens vs the $25 for Opus, there is huge headroom for error/validation cycles.”

Let me do the math:

Cost comparison (per million output tokens):
- Claude Opus: ~$25
- GPT OSS 20B: ~$0.70
- Claude Haiku: ~$1.25
Potential savings: 35x (Opus to GPT OSS) or 20x (Opus to Haiku)

The question was: would this actually work in practice?

What I Tried First

Attempt 1: Manual Model Switching

I started by manually switching between models:

Step 1: Ask Opus for a plan
User: "I need to implement user authentication with JWT. Plan the implementation."
Opus output:
- Create auth middleware
- Define user schema
- Implement login endpoint
- Implement registration
- Add password hashing
- Add input validation
- Implement rate limiting
Step 2: Copy plan to cheaper model
User: "Implement this plan: [paste Opus plan]"
Model: GPT OSS 20B or Haiku

This worked, but it was tedious. I had to manually copy plans between conversations, and the context handoff was clumsy.

Attempt 2: Building a Workflow

I realized I needed to automate this. Here’s the workflow I designed:

┌─────────────────────────────────────────────────────────┐
│ PLAN-EXECUTE WORKFLOW │
├─────────────────────────────────────────────────────────┤
│ │
│ 1. PLANNING PHASE (Opus) │
│ ┌─────────┐ ┌──────────────┐ ┌────────────┐ │
│ │ Task │───►│ Opus Planner │───►│ YAML Plan │ │
│ │ Input │ │ (Strategic) │ │ Contract │ │
│ └─────────┘ └──────────────┘ └────────────┘ │
│ │ │
│ ▼ │
│ 2. EXECUTION PHASE (Cheaper Model) │
│ ┌────────────┐ ┌──────────────┐ ┌────────┐ │
│ │ YAML Plan │───►│ GPT/Haiku │───►│ Code │ │
│ │ Contract │ │ Executor │ │ Output │ │
│ └────────────┘ └──────────────┘ └────────┘ │
│ │ │
│ ▼ │
│ 3. VALIDATION PHASE (Cheaper Model) │
│ ┌────────┐ ┌──────────────┐ ┌─────────────┐ │
│ │ Code │───►│ Validator │───►│ Test Results│ │
│ │ Output │ │ │ │ │ │
│ └────────┘ └──────────────┘ └─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ Pass? ──────┼────► NO ──► Return to Execution │
│ │ │ │
│ │ YES ▼ │
│ └─────────────┘ │
│ │ │
│ ▼ │
│ 4. REVIEW PHASE (Opus - Optional) │
│ ┌─────────────┐ ┌──────────────┐ │
│ │ Final Code │───►│ Opus Reviewer│───► Done! │
│ └─────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────┘

The YAML Contract

The key innovation was creating a structured YAML contract between phases:

example-plan-contract.yaml
task: "Implement user authentication API"
planner: "claude-opus-4"
executor: "gpt-oss-20b"
plan:
phases:
- name: "Setup"
steps:
- "Create auth middleware"
- "Define user schema"
estimated_tokens: 500
- name: "Core Logic"
steps:
- "Implement login endpoint"
- "Implement registration"
- "Add password hashing"
estimated_tokens: 2000
- name: "Validation"
steps:
- "Add input validation"
- "Implement rate limiting"
estimated_tokens: 1000
validation:
- "All endpoints return correct HTTP codes"
- "Passwords are properly hashed"
- "Rate limiting prevents brute force"
success_criteria:
- "Tests pass at 80% coverage"
- "No security vulnerabilities"
- "Response time < 200ms"

This contract serves as a handoff document. Opus creates it, the cheaper model follows it.

Implementation: The Workflow Code

Here’s the Python implementation I built:

plan_execute_workflow.py
from typing import Dict, Any
import yaml
class PlanExecuteWorkflow:
def __init__(self, planner_model: str, executor_model: str):
self.planner = planner_model # "claude-opus-4"
self.executor = executor_model # "gpt-oss-20b"
def plan(self, task_description: str) -> Dict[str, Any]:
"""Use Opus for strategic planning"""
response = self.planner.generate(
f"Create an implementation plan for: {task_description}"
)
return yaml.safe_load(response)
def execute(self, plan: Dict[str, Any]) -> str:
"""Use cheaper model for execution"""
results = []
for phase in plan["phases"]:
for step in phase["steps"]:
code = self.executor.generate(
f"Implement: {step}\nContext: {phase['name']}"
)
results.append(code)
return "\n".join(results)
def validate(self, output: str, criteria: list) -> bool:
"""Validate output against success criteria"""
# Use cheaper model for validation
for criterion in criteria:
result = self.executor.validate(output, criterion)
if not result.passed:
return False
return True
# Usage
workflow = PlanExecuteWorkflow(
planner_model="claude-opus-4",
executor_model="gpt-oss-20b"
)
plan = workflow.plan("Build a REST API for user management")
code = workflow.execute(plan)
success = workflow.validate(code, plan["success_criteria"])

The Results

After implementing this pattern, my costs dropped dramatically:

BEFORE (Opus only):
- Planning: $3.75 (15% of $25)
- Implementation: $15.00 (60% of $25)
- Bug fixing: $5.00 (20% of $25)
- Review: $1.25 (5% of $25)
- Total: $25.00 per session
AFTER (Plan-Execute Pattern):
- Planning (Opus): $3.75 (unchanged)
- Implementation (GPT OSS): $0.42 ($15 * 0.7/25)
- Bug fixing (GPT OSS): $0.14 ($5 * 0.7/25)
- Review (Opus): $1.25 (unchanged)
- Total: $5.56 per session
Savings: 78% cost reduction

Wait, that’s not 35x. Let me recalculate focusing on the execution phase only:

Execution phase cost comparison:
- Opus: $15 + $5 = $20
- GPT OSS: $0.42 + $0.14 = $0.56
Ratio: $20 / $0.56 = 35.7x savings on execution

The 35x savings is specifically on the execution phase where the cheaper model does the heavy lifting.

Common Mistakes I Made

Mistake 1: Using Opus for Everything Initially

I used Opus for simple code generation like:

# Overkill: Using Opus for this
def calculate_total(items):
return sum(item.price for item in items)

This was like hiring a Nobel laureate to do basic arithmetic. I learned to recognize tasks that don’t require strategic thinking.

Mistake 2: Skipping the YAML Contract

My first attempt at switching models failed because I didn’t have a structured handoff. The cheaper model had no context about what to build.

[WITHOUT CONTRACT]
Opus: "Build authentication system"
Haiku: "What kind? OAuth? Basic auth? Session-based?"
Result: Confusion, back-and-forth, wasted tokens

The YAML contract solves this:

[WITH CONTRACT]
Opus: Creates detailed plan with specific requirements
Haiku: Reads plan, executes step-by-step
Result: Clean handoff, minimal tokens wasted

Mistake 3: No Validation Layer

I initially trusted the cheaper model’s output completely. Bad idea. I needed validation cycles:

Iteration 1: Haiku generates code
Validation: Check tests pass
Result: 2 tests failing
Iteration 2: Haiku fixes failing tests
Validation: Check tests pass
Result: All tests pass, but security vulnerability found
Iteration 3: Haiku fixes vulnerability
Validation: Security check passes
Result: Done!

The key insight: cheaper models allow more iteration cycles without breaking the budget.

Mistake 4: Combining Token-Heavy Extensions

I also learned that some Claude Code extensions are “token huggers”:

Warning from community:
- "Superpowers" extension: consumes extra tokens
- "get-shit-done" extension: consumes extra tokens
- Combined: "will burn through your tokens"
My advice: Audit your extensions and disable unnecessary ones

Mistake 5: Poor Task Decomposition

Finding the right granularity matters:

TOO GRANULAR:
Phase 1: Create file auth.py
Phase 2: Import flask
Phase 3: Define function login
Result: Excessive model switches, overhead dominates
TOO COARSE:
Phase 1: Build entire authentication system
Result: Cheaper model struggles, quality drops
JUST RIGHT:
Phase 1: Setup - Create auth middleware and user schema
Phase 2: Core Logic - Implement login, registration, password hashing
Phase 3: Validation - Add input validation and rate limiting

When This Pattern Works Best

Based on my experience, the plan-execute pattern is ideal for:

IDEAL USE CASES:
- Large, well-defined projects with clear requirements
- Teams with predictable coding patterns
- Budget-conscious development
- Projects with clear validation criteria (tests exist)
- Feature additions to existing codebases
NOT IDEAL FOR:
- Exploratory coding where requirements change rapidly
- One-off scripts that take 5 minutes to write
- Projects without clear validation criteria
- Situations where you need Opus's deep reasoning throughout

Trade-offs to Consider

This pattern isn’t free:

BENEFITS:
- 35x cost reduction on execution phase
- Quality preserved for strategic decisions (Opus still plans)
- More iteration cycles possible (cheap to retry)
- Better budget efficiency
TRADE-OFFS:
- Increased workflow complexity (need orchestration)
- Context switching overhead
- Potential quality gaps (cheaper models miss nuances)
- Setup time (initial implementation takes effort)

Summary

In this post, I explained how to reduce Claude Code costs by up to 35x using a plan-execute pattern. The key insight is simple: use expensive models (Opus) for strategic decisions and cheap models (GPT OSS 20B, Haiku) for implementation.

The workflow is:

  1. Planning Phase (Opus): Create detailed YAML plan with phases, steps, and success criteria
  2. Execution Phase (Cheaper Model): Follow the plan step-by-step
  3. Validation Phase (Cheaper Model): Run tests and check success criteria
  4. Review Phase (Opus, optional): Final architectural review if needed

The YAML contract between phases ensures clean handoffs. Validation cycles catch issues early. The result is dramatic cost savings while maintaining output quality.

For my workflow, this meant going from $25 per session to under $6—a 78% overall reduction, with 35x savings specifically on the execution phase.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments