How to Cut Claude Code Costs by 35x: Use Opus for Planning, Cheaper Models for Execution
Problem
I was burning through my Claude Code budget faster than expected. A typical coding session with Claude Opus would consume $5-10 worth of tokens, and I wasn’t even doing particularly complex work. Just routine code generation, some debugging, and occasional refactoring.
Session 1: Bug fix + refactoringTokens used: ~50,000 input + ~20,000 outputCost: $2.50 (input) + $12.50 (output) = $15.00
Session 2: Feature implementationTokens used: ~80,000 input + ~35,000 outputCost: $4.00 (input) + $21.88 (output) = $25.88
Weekly total: $40-50 for routine coding tasksThe pattern was clear: most of my token usage wasn’t from complex reasoning—it was from routine implementation work. I was using a Ferrari (Opus) to do grocery shopping.
The Cost Reality Check
I sat down and analyzed where my tokens were going:
Task breakdown by token consumption:- Planning/Architecture decisions: ~15% of tokens- Code implementation: ~60% of tokens- Bug fixing and iteration: ~20% of tokens- Final review and polish: ~5% of tokensThe insight hit me: I was using Opus’s $25/million output tokens for code that a much cheaper model could write. The planning decisions—the actual strategic thinking—only represented 15% of my usage.
That’s when I discovered the plan-execute pattern on Reddit. One user mentioned:
“I’m using Opus for planning and then GPT OSS 20B for implementation and given it’s $0.7 per million output tokens vs the $25 for Opus, there is huge headroom for error/validation cycles.”
Let me do the math:
Cost comparison (per million output tokens):- Claude Opus: ~$25- GPT OSS 20B: ~$0.70- Claude Haiku: ~$1.25
Potential savings: 35x (Opus to GPT OSS) or 20x (Opus to Haiku)The question was: would this actually work in practice?
What I Tried First
Attempt 1: Manual Model Switching
I started by manually switching between models:
Step 1: Ask Opus for a planUser: "I need to implement user authentication with JWT. Plan the implementation."
Opus output:- Create auth middleware- Define user schema- Implement login endpoint- Implement registration- Add password hashing- Add input validation- Implement rate limiting
Step 2: Copy plan to cheaper modelUser: "Implement this plan: [paste Opus plan]"Model: GPT OSS 20B or HaikuThis worked, but it was tedious. I had to manually copy plans between conversations, and the context handoff was clumsy.
Attempt 2: Building a Workflow
I realized I needed to automate this. Here’s the workflow I designed:
┌─────────────────────────────────────────────────────────┐│ PLAN-EXECUTE WORKFLOW │├─────────────────────────────────────────────────────────┤│ ││ 1. PLANNING PHASE (Opus) ││ ┌─────────┐ ┌──────────────┐ ┌────────────┐ ││ │ Task │───►│ Opus Planner │───►│ YAML Plan │ ││ │ Input │ │ (Strategic) │ │ Contract │ ││ └─────────┘ └──────────────┘ └────────────┘ ││ │ ││ ▼ ││ 2. EXECUTION PHASE (Cheaper Model) ││ ┌────────────┐ ┌──────────────┐ ┌────────┐ ││ │ YAML Plan │───►│ GPT/Haiku │───►│ Code │ ││ │ Contract │ │ Executor │ │ Output │ ││ └────────────┘ └──────────────┘ └────────┘ ││ │ ││ ▼ ││ 3. VALIDATION PHASE (Cheaper Model) ││ ┌────────┐ ┌──────────────┐ ┌─────────────┐ ││ │ Code │───►│ Validator │───►│ Test Results│ ││ │ Output │ │ │ │ │ ││ └────────┘ └──────────────┘ └─────────────┘ ││ │ ││ ▼ ││ ┌─────────────┐ ││ │ Pass? ──────┼────► NO ──► Return to Execution ││ │ │ ││ │ YES ▼ ││ └─────────────┘ ││ │ ││ ▼ ││ 4. REVIEW PHASE (Opus - Optional) ││ ┌─────────────┐ ┌──────────────┐ ││ │ Final Code │───►│ Opus Reviewer│───► Done! ││ └─────────────┘ └──────────────┘ ││ │└─────────────────────────────────────────────────────────┘The YAML Contract
The key innovation was creating a structured YAML contract between phases:
task: "Implement user authentication API"planner: "claude-opus-4"executor: "gpt-oss-20b"
plan: phases: - name: "Setup" steps: - "Create auth middleware" - "Define user schema" estimated_tokens: 500
- name: "Core Logic" steps: - "Implement login endpoint" - "Implement registration" - "Add password hashing" estimated_tokens: 2000
- name: "Validation" steps: - "Add input validation" - "Implement rate limiting" estimated_tokens: 1000
validation: - "All endpoints return correct HTTP codes" - "Passwords are properly hashed" - "Rate limiting prevents brute force"
success_criteria: - "Tests pass at 80% coverage" - "No security vulnerabilities" - "Response time < 200ms"This contract serves as a handoff document. Opus creates it, the cheaper model follows it.
Implementation: The Workflow Code
Here’s the Python implementation I built:
from typing import Dict, Anyimport yaml
class PlanExecuteWorkflow: def __init__(self, planner_model: str, executor_model: str): self.planner = planner_model # "claude-opus-4" self.executor = executor_model # "gpt-oss-20b"
def plan(self, task_description: str) -> Dict[str, Any]: """Use Opus for strategic planning""" response = self.planner.generate( f"Create an implementation plan for: {task_description}" ) return yaml.safe_load(response)
def execute(self, plan: Dict[str, Any]) -> str: """Use cheaper model for execution""" results = [] for phase in plan["phases"]: for step in phase["steps"]: code = self.executor.generate( f"Implement: {step}\nContext: {phase['name']}" ) results.append(code) return "\n".join(results)
def validate(self, output: str, criteria: list) -> bool: """Validate output against success criteria""" # Use cheaper model for validation for criterion in criteria: result = self.executor.validate(output, criterion) if not result.passed: return False return True
# Usageworkflow = PlanExecuteWorkflow( planner_model="claude-opus-4", executor_model="gpt-oss-20b")
plan = workflow.plan("Build a REST API for user management")code = workflow.execute(plan)success = workflow.validate(code, plan["success_criteria"])The Results
After implementing this pattern, my costs dropped dramatically:
BEFORE (Opus only):- Planning: $3.75 (15% of $25)- Implementation: $15.00 (60% of $25)- Bug fixing: $5.00 (20% of $25)- Review: $1.25 (5% of $25)- Total: $25.00 per session
AFTER (Plan-Execute Pattern):- Planning (Opus): $3.75 (unchanged)- Implementation (GPT OSS): $0.42 ($15 * 0.7/25)- Bug fixing (GPT OSS): $0.14 ($5 * 0.7/25)- Review (Opus): $1.25 (unchanged)- Total: $5.56 per session
Savings: 78% cost reductionWait, that’s not 35x. Let me recalculate focusing on the execution phase only:
Execution phase cost comparison:- Opus: $15 + $5 = $20- GPT OSS: $0.42 + $0.14 = $0.56
Ratio: $20 / $0.56 = 35.7x savings on executionThe 35x savings is specifically on the execution phase where the cheaper model does the heavy lifting.
Common Mistakes I Made
Mistake 1: Using Opus for Everything Initially
I used Opus for simple code generation like:
# Overkill: Using Opus for thisdef calculate_total(items): return sum(item.price for item in items)This was like hiring a Nobel laureate to do basic arithmetic. I learned to recognize tasks that don’t require strategic thinking.
Mistake 2: Skipping the YAML Contract
My first attempt at switching models failed because I didn’t have a structured handoff. The cheaper model had no context about what to build.
[WITHOUT CONTRACT]Opus: "Build authentication system"Haiku: "What kind? OAuth? Basic auth? Session-based?"
Result: Confusion, back-and-forth, wasted tokensThe YAML contract solves this:
[WITH CONTRACT]Opus: Creates detailed plan with specific requirementsHaiku: Reads plan, executes step-by-stepResult: Clean handoff, minimal tokens wastedMistake 3: No Validation Layer
I initially trusted the cheaper model’s output completely. Bad idea. I needed validation cycles:
Iteration 1: Haiku generates codeValidation: Check tests passResult: 2 tests failing
Iteration 2: Haiku fixes failing testsValidation: Check tests passResult: All tests pass, but security vulnerability found
Iteration 3: Haiku fixes vulnerabilityValidation: Security check passesResult: Done!The key insight: cheaper models allow more iteration cycles without breaking the budget.
Mistake 4: Combining Token-Heavy Extensions
I also learned that some Claude Code extensions are “token huggers”:
Warning from community:- "Superpowers" extension: consumes extra tokens- "get-shit-done" extension: consumes extra tokens- Combined: "will burn through your tokens"
My advice: Audit your extensions and disable unnecessary onesMistake 5: Poor Task Decomposition
Finding the right granularity matters:
TOO GRANULAR:Phase 1: Create file auth.pyPhase 2: Import flaskPhase 3: Define function loginResult: Excessive model switches, overhead dominates
TOO COARSE:Phase 1: Build entire authentication systemResult: Cheaper model struggles, quality drops
JUST RIGHT:Phase 1: Setup - Create auth middleware and user schemaPhase 2: Core Logic - Implement login, registration, password hashingPhase 3: Validation - Add input validation and rate limitingWhen This Pattern Works Best
Based on my experience, the plan-execute pattern is ideal for:
IDEAL USE CASES:- Large, well-defined projects with clear requirements- Teams with predictable coding patterns- Budget-conscious development- Projects with clear validation criteria (tests exist)- Feature additions to existing codebases
NOT IDEAL FOR:- Exploratory coding where requirements change rapidly- One-off scripts that take 5 minutes to write- Projects without clear validation criteria- Situations where you need Opus's deep reasoning throughoutTrade-offs to Consider
This pattern isn’t free:
BENEFITS:- 35x cost reduction on execution phase- Quality preserved for strategic decisions (Opus still plans)- More iteration cycles possible (cheap to retry)- Better budget efficiency
TRADE-OFFS:- Increased workflow complexity (need orchestration)- Context switching overhead- Potential quality gaps (cheaper models miss nuances)- Setup time (initial implementation takes effort)Summary
In this post, I explained how to reduce Claude Code costs by up to 35x using a plan-execute pattern. The key insight is simple: use expensive models (Opus) for strategic decisions and cheap models (GPT OSS 20B, Haiku) for implementation.
The workflow is:
- Planning Phase (Opus): Create detailed YAML plan with phases, steps, and success criteria
- Execution Phase (Cheaper Model): Follow the plan step-by-step
- Validation Phase (Cheaper Model): Run tests and check success criteria
- Review Phase (Opus, optional): Final architectural review if needed
The YAML contract between phases ensures clean handoffs. Validation cycles catch issues early. The result is dramatic cost savings while maintaining output quality.
For my workflow, this meant going from $25 per session to under $6—a 78% overall reduction, with 35x savings specifically on the execution phase.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 Reddit Discussion: Claude Code Cost Optimization
- 👨💻 Claude Pricing Documentation
- 👨💻 Context Window Management
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments