How to Optimize AI Coding Agent Costs: Multi-Provider Strategy That Saves Money Without Sacrificing Quality
I burned through $200 in my first month using AI coding agents. Not because I was doing anything fancy - just regular full-stack development, 8-10 hours a day. The culprit? I was routing every single task through Claude Opus, treating a $75-per-million-output-tokens model like it was a Swiss Army knife.
That’s when I realized: I was hiring a senior architect to write boilerplate code. Here’s how I fixed it.
The Problem: Single-Provider Tunnel Vision
AI coding agents operate differently than traditional coding assistants. They don’t just respond to occasional queries - they continuously process context, generate code, review outputs, iterate on solutions, and manage multi-file edits. This relentless operation mode burns through tokens at an alarming rate.
I used Claude Opus for everything:
- Generating boilerplate code
- Writing unit tests
- Simple refactoring
- Documentation updates
- Complex architectural decisions
The result? My $20 monthly subscription lasted three days. Switching to pay-as-you-go didn’t help much - I still hit $50+ per week during intense development sprints.
The core issue wasn’t the tools - it was my routing strategy. I was using a premium model for tasks that could be handled by models costing a small fraction of the price: Deepseek output tokens cost $1.10 per million versus $75 for Opus, roughly 1/70th.
The Solution: Three-Tier Multi-Provider Architecture
I restructured my setup into three distinct layers, each with specific models for specific tasks:
| Tier | Purpose | Models | Cost (per 1M output tokens) | Best For |
|---|---|---|---|---|
| Subagent Layer | Cheap, fast processing | Deepseek v3.2, Step3.5 flash, Mimo-v2-flash | $0.10 - $1.10 | Boilerplate, tests, docs, simple refactoring |
| Orchestration Layer | Planning and validation | Claude Sonnet 4, GPT-4o | $15 - $30 | Task decomposition, code review, quality checks |
| Premium Layer | Critical decisions | Claude Opus 4, GPT-4-turbo | $75 - $150 | Architecture, security, production-critical code |
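As a rough illustration, the table can be read as a lookup from task type to tier. The task labels and the `tier_for` helper below are hypothetical names chosen for this sketch, not part of any tool's API. Sending unknown task types to the orchestration tier is just one possible default.

```python
# Hypothetical task labels mapped to the three tiers from the table above.
TIER_TASKS = {
    "subagent": {"boilerplate", "tests", "docs", "simple-refactor"},
    "orchestration": {"task-decomposition", "code-review", "quality-check"},
    "premium": {"architecture", "security", "production-critical"},
}


def tier_for(task_type: str, default: str = "orchestration") -> str:
    """Return the tier whose task set contains task_type, else the default."""
    for tier, tasks in TIER_TASKS.items():
        if task_type in tasks:
            return tier
    return default


print(tier_for("tests"))
print(tier_for("architecture"))
print(tier_for("something-new"))
```

Falling back to the middle tier means an unclassified task is never handled by the cheapest model without review, and never reaches the premium model by accident.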
Why This Works
The key insight is that 80% of coding tasks don’t require premium intelligence. Writing a unit test for a simple utility function? Deepseek v3.2 handles it comfortably at $0.27 per million input tokens and $1.10 per million output tokens. Designing a microservices architecture? That’s when I bring in Opus.
Here’s the math on a typical 8-hour coding session:
| Task Type | Token Usage (output) | Single Provider (Opus) | Multi-Provider | Savings |
|---|---|---|---|---|
| Boilerplate code | 50,000 tokens | $3.75 | $0.14 | 96% |
| Unit tests | 30,000 tokens | $2.25 | $0.08 | 96% |
| Simple refactoring | 20,000 tokens | $1.50 | $0.06 | 96% |
| Code review | 15,000 tokens | $1.13 | $0.45 | 60% |
| Architecture decisions | 5,000 tokens | $0.38 | $0.38 | 0% |
| Total | 120,000 tokens | $9.01 | $1.11 | 88% |
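The Opus column and the savings percentages can be reproduced with a few lines of arithmetic. This sketch copies the multi-provider figures from the table rather than deriving them, since they depend on which model handled each task. The function names are illustrative.

```python
OPUS_OUTPUT_PRICE = 75.00  # USD per 1M output tokens

# (task, output tokens, multi-provider cost in USD as listed in the table)
SESSION = [
    ("Boilerplate code", 50_000, 0.14),
    ("Unit tests", 30_000, 0.08),
    ("Simple refactoring", 20_000, 0.06),
    ("Code review", 15_000, 0.45),
    ("Architecture decisions", 5_000, 0.375),  # stays on Opus; shown as $0.38
]


def opus_cost(output_tokens: int) -> float:
    """Cost of generating output_tokens with Opus, output tokens only."""
    return output_tokens / 1_000_000 * OPUS_OUTPUT_PRICE


def savings_pct(before: float, after: float) -> int:
    """Percentage saved when moving from `before` to `after`."""
    return round((1 - after / before) * 100)


for name, tokens, multi in SESSION:
    single = opus_cost(tokens)
    print(f"{name}: Opus ${single:.3f}, saved {savings_pct(single, multi)}%")

total_single = sum(opus_cost(tokens) for _, tokens, _ in SESSION)
total_multi = sum(multi for _, _, multi in SESSION)
print(f"Total saved: {savings_pct(total_single, total_multi)}%")
```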
Implementation: OpenCode CLI Configuration
I use OpenCode CLI to manage my multi-provider setup. Here’s my actual configuration:
```yaml
providers:
  # Tier 1: Subagent layer - cheap models
  deepseek:
    model: "deepseek/deepseek-chat-v3-0324"
    api_base: "https://api.deepseek.com/v1"
    api_key: "${DEEPSEEK_API_KEY}"
    max_tokens: 4096
    temperature: 0.7
    use_for: ["boilerplate", "tests", "docs", "simple-refactor"]
    cost_per_1m_input: 0.27
    cost_per_1m_output: 1.10

  step:
    model: "step/step-3.5-flash"
    api_key: "${STEP_API_KEY}"
    max_tokens: 4096
    temperature: 0.3
    use_for: ["format", "cleanup", "naming"]
    cost_per_1m_input: 0.05
    cost_per_1m_output: 0.10

  # Tier 2: Orchestration layer
  sonnet:
    model: "anthropic/claude-sonnet-4-20250514"
    api_key: "${ANTHROPIC_API_KEY}"
    max_tokens: 8192
    temperature: 0.5
    use_for: ["orchestration", "review", "planning", "debug-complex"]
    cost_per_1m_input: 3.00
    cost_per_1m_output: 15.00

  # Tier 3: Premium layer
  opus:
    model: "anthropic/claude-opus-4-20250514"
    api_key: "${ANTHROPIC_API_KEY}"
    max_tokens: 16384
    temperature: 0.2
    use_for: ["architecture", "security", "critical-review"]
    require_explicit_approval: true
    cost_per_1m_input: 15.00
    cost_per_1m_output: 75.00

# Intelligent routing rules
routing:
  default_provider: "deepseek"

  # Task complexity mapping
  complexity_rules:
    low:  # < 100 lines, single file, no dependencies
      provider: "deepseek"
      max_cost_estimate: 0.10

    medium:  # 100-500 lines, multiple files, some complexity
      provider: "sonnet"
      max_cost_estimate: 0.50

    high:  # > 500 lines, architectural changes, security-sensitive
      provider: "opus"
      max_cost_estimate: 2.00

  # Explicit task routing
  task_mapping:
    generate_unit_tests:
      provider: "deepseek"
      temperature: 0.7
      max_tokens: 2048

    refactor_function:
      provider: "step"
      temperature: 0.3

    plan_feature:
      provider: "sonnet"
      temperature: 0.5

    design_architecture:
      provider: "opus"
      temperature: 0.2
      require_approval: true

# Fallback configuration
fallback:
  enabled: true
  provider: "openrouter"
  models: ["deepseek/deepseek-chat", "anthropic/claude-sonnet-4"]
  triggers:
    - rate_limit_exceeded
    - service_unavailable
    - timeout
```

Subagent Orchestration Pattern
The real power comes from how I route tasks between these tiers. I built a simple orchestrator that:
- Analyzes task complexity based on file count, line estimates, and dependencies
- Routes to the appropriate provider automatically
- Reviews output through the orchestration layer for non-trivial tasks
- Tracks costs in real-time
Here’s the core orchestration logic:
```python
import os
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

import yaml


class Complexity(Enum):
    LOW = "low"        # Deepseek, Step
    MEDIUM = "medium"  # Sonnet
    HIGH = "high"      # Opus


@dataclass
class Task:
    name: str
    description: str
    files_affected: int
    estimated_lines: int
    has_dependencies: bool
    security_sensitive: bool = False


class TaskRouter:
    def __init__(self, config_path: str = "~/.config/opencode/config.yaml"):
        # open() does not expand "~", so resolve the home directory first
        with open(os.path.expanduser(config_path)) as f:
            self.config = yaml.safe_load(f)
        self.cost_tracker = CostTracker()

    def analyze_complexity(self, task: Task) -> Complexity:
        """Determine task complexity based on multiple factors."""
        # Security-sensitive tasks always require premium models
        if task.security_sensitive:
            return Complexity.HIGH

        # Architectural changes (many files)
        if task.files_affected > 5:
            return Complexity.HIGH

        # Multi-file with dependencies
        if task.has_dependencies and task.files_affected > 2:
            return Complexity.MEDIUM

        # Large single-file changes
        if task.estimated_lines > 300:
            return Complexity.MEDIUM

        return Complexity.LOW

    def select_provider(self, complexity: Complexity) -> str:
        """Map complexity to provider."""
        mapping = {
            Complexity.LOW: "deepseek",
            Complexity.MEDIUM: "sonnet",
            Complexity.HIGH: "opus",
        }
        return mapping[complexity]

    def estimate_cost(self, task: Task, provider: str) -> float:
        """Estimate cost before execution."""
        provider_config = self.config["providers"][provider]

        # Rough token estimation (1 line ≈ 10 tokens for code)
        estimated_input_tokens = task.estimated_lines * 10 * 2  # Context + instruction
        estimated_output_tokens = task.estimated_lines * 10

        input_cost = (estimated_input_tokens / 1_000_000) * provider_config["cost_per_1m_input"]
        output_cost = (estimated_output_tokens / 1_000_000) * provider_config["cost_per_1m_output"]

        return input_cost + output_cost

    def route(self, task: Task) -> dict:
        """Route task to appropriate provider with cost estimation."""
        complexity = self.analyze_complexity(task)
        provider = self.select_provider(complexity)
        estimated_cost = self.estimate_cost(task, provider)

        # Check budget constraints
        max_cost = self.config["routing"]["complexity_rules"][complexity.value]["max_cost_estimate"]
        if estimated_cost > max_cost:
            print(f"Warning: Estimated cost ${estimated_cost:.2f} exceeds budget ${max_cost:.2f}")

        return {
            "provider": provider,
            "complexity": complexity.value,
            "estimated_cost": estimated_cost,
            "config": self.config["providers"][provider],
        }


class CostTracker:
    """Track spending across providers."""

    def __init__(self, daily_budget: float = 5.00):
        self.daily_budget = daily_budget
        self.today_spent = 0.0
        self.tasks = []

    def log_task(self, task_name: str, provider: str, cost: float):
        self.tasks.append({
            "task": task_name,
            "provider": provider,
            "cost": cost,
            "timestamp": datetime.now().isoformat(),
        })
        self.today_spent += cost

        # Alert at 80% budget
        if self.today_spent > self.daily_budget * 0.8:
            self._send_alert(f"Budget warning: ${self.today_spent:.2f} / ${self.daily_budget:.2f}")

    def _send_alert(self, message: str):
        print(f"ALERT: {message}")
        # Could integrate with Slack, email, etc.


# Usage example
if __name__ == "__main__":
    router = TaskRouter()

    # Simple test generation
    test_task = Task(
        name="generate_unit_tests",
        description="Generate unit tests for auth.py",
        files_affected=1,
        estimated_lines=50,
        has_dependencies=False,
    )
    result = router.route(test_task)
    print(f"Task: {test_task.name}")
    print(f"Provider: {result['provider']}")
    print(f"Estimated cost: ${result['estimated_cost']:.4f}")

    # Complex architecture task
    arch_task = Task(
        name="design_auth_system",
        description="Design authentication microservice architecture",
        files_affected=8,
        estimated_lines=600,
        has_dependencies=True,
        security_sensitive=True,
    )
    result = router.route(arch_task)
    print(f"\nTask: {arch_task.name}")
    print(f"Provider: {result['provider']}")
    print(f"Estimated cost: ${result['estimated_cost']:.4f}")
```

Output, using the prices from the configuration above:

```
Task: generate_unit_tests
Provider: deepseek
Estimated cost: $0.0008

Task: design_auth_system
Provider: opus
Estimated cost: $0.6300
```

Notice the cost difference: the simple test generation is estimated at under a tenth of a cent, while the architecture task comes to about $0.63. If I had routed the test task to Opus, the same estimate would have been about $0.05, roughly 64x more expensive.
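The router covers complexity analysis, routing, and cost estimation, but not the review step from the list above. Here is a minimal sketch of how that step could look. It assumes `generate` and `review` are placeholder callables wrapping a cheap model and an orchestration model. The convention that `review` returns `None` to approve a draft is my assumption for the sketch, not something a particular tool prescribes.

```python
from typing import Callable, Optional


def run_with_review(
    prompt: str,
    complexity: str,
    generate: Callable[[str], str],
    review: Callable[[str], Optional[str]],
    max_rounds: int = 2,
) -> str:
    """Generate a draft and, for non-trivial tasks, loop it through a reviewer.

    `review` returns None to approve, or a feedback string to request changes.
    """
    draft = generate(prompt)
    if complexity == "low":
        return draft  # trivial tasks skip the review to save tokens

    for _ in range(max_rounds):
        feedback = review(draft)
        if feedback is None:
            return draft
        draft = generate(f"{prompt}\n\nReviewer feedback to address:\n{feedback}")

    # Rounds exhausted: return the latest draft, which has not been re-reviewed
    return draft


# Stub models for a dry run without any API calls
result = run_with_review(
    "Write a slugify() helper",
    "medium",
    generate=lambda p: "def slugify(s): ...",
    review=lambda d: None,
)
print(result)
```

Capping the number of rounds keeps a disagreeing reviewer from turning a cheap task into an expensive one.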
Token Optimization Techniques
Beyond provider routing, I also optimize how I use tokens within each tier:
1. Leverage Input Token Economics
Input tokens cost a fifth to a quarter as much as output tokens:
| Model | Input Cost (per 1M) | Output Cost (per 1M) | Ratio |
|---|---|---|---|
| Deepseek v3.2 | $0.27 | $1.10 | 1:4 |
| Claude Sonnet 4 | $3.00 | $15.00 | 1:5 |
| Claude Opus 4 | $15.00 | $75.00 | 1:5 |
This means I can afford to be verbose in my prompts but should request concise outputs.
```python
# Good: Verbose input, concise output request
prompt = """Context: I'm building a REST API for user authentication.
The API uses Express.js with TypeScript.
The database is PostgreSQL with Prisma ORM.
The user model has: id, email, password_hash, created_at, updated_at.

Task: Generate a middleware function that:
1. Validates JWT tokens from Authorization header
2. Extracts user ID from token payload
3. Attaches user object to request
4. Returns 401 for invalid/expired tokens

Constraints:
- Use TypeScript with proper types
- Handle edge cases (missing header, malformed token)
- Keep it under 30 lines
- Return only the code, no explanations"""
# Input: ~150 tokens, Output: ~200 tokens
# Cost with Deepseek: $0.00004 (input) + $0.00022 (output) = $0.00026

# Bad: Short input, verbose output
prompt = "Write JWT auth middleware for Express"
# Input: ~10 tokens, Output: ~500 tokens (with explanations)
# Cost with Deepseek: $0.000003 (input) + $0.00055 (output) = $0.00055
```

The first approach costs half as much despite having 15x more input tokens, because output tokens dominate the cost equation.
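To check this kind of comparison for your own prompts, a small helper is enough. This is a sketch using the Deepseek prices from the table above; `request_cost` is a name I made up for illustration, and the token counts are the rough estimates from the two prompts.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost of one request in USD; prices are per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


DEEPSEEK = {"input_price": 0.27, "output_price": 1.10}

verbose_prompt = request_cost(150, 200, **DEEPSEEK)  # long prompt, short answer
terse_prompt = request_cost(10, 500, **DEEPSEEK)     # short prompt, long answer

print(f"verbose prompt: ${verbose_prompt:.5f}")
print(f"terse prompt:   ${terse_prompt:.5f}")
print(f"ratio: {verbose_prompt / terse_prompt:.2f}")
```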
2. Batch Related Queries
Instead of multiple separate API calls:
```python
import json

# `client`, `files`, and `read_file` stand in for your own API client,
# list of file paths, and file-reading helper.

# Bad: 5 separate calls
for file in files:
    response = client.generate(f"Add docstrings to {file}")
# Total: 5 calls, 5 separate contexts, 5x overhead

# Good: Single batched call
batched_prompt = f"""Add docstrings to each of these files. Return the result as a JSON object
with filenames as keys and the docstring-enhanced content as values.

Files:
{json.dumps({f: read_file(f) for f in files}, indent=2)}"""
response = client.generate(batched_prompt)
# Total: 1 call, shared context, single overhead
```

3. Cache Frequently Used Context
For multi-file projects, I cache the project structure and reuse it:
```python
import hashlib
import json
from pathlib import Path


class ContextCache:
    def __init__(self, project_root: str):
        self.cache_file = Path(project_root) / ".opencode" / "context_cache.json"
        # parents=True so a missing project directory does not raise
        self.cache_file.parent.mkdir(parents=True, exist_ok=True)

    def get_project_context(self) -> str:
        """Cached project structure for context."""
        if self.cache_file.exists():
            cached = json.loads(self.cache_file.read_text())
            current_hash = self._hash_project_structure()
            if cached["hash"] == current_hash:
                return cached["context"]

        # Regenerate if structure changed
        context = self._generate_project_context()
        self.cache_file.write_text(json.dumps({
            "hash": self._hash_project_structure(),
            "context": context,
        }))
        return context

    def _generate_project_context(self) -> str:
        """Generate concise project structure."""
        # Only include file tree, not file contents
        # This saves ~80% of tokens vs reading all files
        ...

    def _hash_project_structure(self) -> str:
        """Detect when structure changes."""
        ...
```

This reduced my average context size from 8,000 tokens to 2,000 tokens - a 75% reduction in input token costs.
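The two helper methods are left elided in the class. One way they might be filled in is sketched below as standalone functions. The ignore list and the decision to hash only relative file paths, not file contents, are my assumptions: the cache then invalidates when files are added, removed, or renamed, but not on ordinary edits.

```python
import hashlib
from pathlib import Path

# Assumed ignore list; adjust for your project
IGNORED_DIRS = {".git", "node_modules", "__pycache__", ".opencode"}


def list_project_files(root: Path) -> list[str]:
    """Sorted relative paths of all files outside ignored directories."""
    files = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if any(part in IGNORED_DIRS for part in rel.parts):
            continue
        if path.is_file():
            files.append(rel.as_posix())
    return files


def generate_project_context(root: Path) -> str:
    """File tree only, one path per line; no file contents."""
    return "\n".join(list_project_files(root))


def hash_project_structure(root: Path) -> str:
    """SHA-256 over the file list, so it changes only when the tree changes."""
    joined = "\n".join(list_project_files(root))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Excluding `.opencode` matters here: otherwise writing the cache file itself would change the hash on the next run.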
Pay-As-You-Go vs Subscriptions
I initially tried subscriptions (Claude Pro, ChatGPT Plus), but they don’t work well for agent-based development because:
- Unused capacity: Some days I code for 12 hours, others for 2. Subscriptions don’t adjust.
- No transparency: I couldn’t see which tasks were expensive.
- Rate limits: Subscriptions often have hidden usage caps.
My Current Provider Setup
| Provider | Use Case | Pricing Model | Monthly Spend |
|---|---|---|---|
| Deepseek (direct) | 80% of tasks | PAYG | $8-15 |
| Anthropic (direct) | Orchestration & complex tasks | PAYG | $20-35 |
| Deep Infra | Kimi/Minimax models | PAYG | $5-10 |
| OpenRouter | Fallback when limits hit | PAYG | $2-5 |
| Total | | | $35-65 |
Compare this to my initial single-provider approach: $150-200/month. That’s a 70-80% savings.
Free Credits Strategy
I also leverage free credits strategically:
- Fireworks AI: Offers free credits for experimentation. I use this for testing new models before committing.
- Modal: GLM 5 is free through April 30th. I route specific tasks here to reduce Anthropic costs.
- Provider trials: Most providers offer $5-10 in free credits. I test routing strategies on these before using paid capacity.
Common Mistakes I Made (So You Don’t Have To)
Mistake 1: Defaulting to Premium Models
What I did: Used Claude Opus for every task because “it’s the best.”
Why it failed: Premium models are overkill for 80% of tasks. For boilerplate code I was paying roughly 70x more than necessary ($75 per million output tokens on Opus versus $1.10 on Deepseek).
The fix: Always ask “Does this task require Opus, or will Deepseek suffice?” before routing.
Mistake 2: Ignoring Token Costs in Prompts
What I did: Wrote long, conversational prompts without considering token counts.
Why it failed: Rambling prompts inflated input tokens and, worse, invited equally rambling responses, which inflated the more expensive output tokens.
The fix:
- Use structured formats (JSON, YAML) instead of prose
- Be concise in output requests
- Cache shared context
Mistake 3: No Cost Tracking
What I did: Didn’t monitor spending until I hit budget limits.
Why it failed: No visibility into which tasks were expensive.
The fix: Built real-time cost tracking with alerts at 50%, 80%, and 100% of daily budget.
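The `CostTracker` shown earlier only alerts at 80%. Here is a minimal sketch of the three-threshold version, assuming each threshold should fire at most once per day. The `BudgetAlerts` class name and the message format are made up for illustration.

```python
class BudgetAlerts:
    """Fire each budget threshold at most once as spending accumulates."""

    def __init__(self, daily_budget: float, thresholds=(0.5, 0.8, 1.0)):
        self.daily_budget = daily_budget
        self.thresholds = sorted(thresholds)
        self.spent = 0.0
        self.fired = set()

    def record(self, cost: float) -> list[str]:
        """Add a cost and return any newly triggered alert messages."""
        self.spent += cost
        alerts = []
        for threshold in self.thresholds:
            crossed = self.spent >= self.daily_budget * threshold
            if crossed and threshold not in self.fired:
                self.fired.add(threshold)
                alerts.append(
                    f"{round(threshold * 100)}% of daily budget reached: "
                    f"${self.spent:.2f} / ${self.daily_budget:.2f}"
                )
        return alerts


tracker = BudgetAlerts(daily_budget=5.00)
for cost in (2.60, 1.50, 0.10, 1.00):
    for message in tracker.record(cost):
        print(f"ALERT: {message}")
```

Remembering which thresholds have fired avoids repeating the same warning on every task once a level has been passed.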
Mistake 4: Skipping Orchestration Layer
What I did: Routed everything to cheap models to save money.
Why it failed: Quality suffered. Inconsistent code, security issues, poor practices.
The fix: Always have Sonnet review Deepseek outputs for non-trivial tasks. The review costs are minimal compared to fixing issues later.
Mistake 5: Single Provider Dependency
What I did: Relied entirely on one provider for everything.
Why it failed: When they had an outage, I was dead in the water.
The fix: OpenRouter as a backup, plus direct accounts with multiple providers.
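In code, the fallback idea amounts to trying providers in order and moving on when one fails. The sketch below assumes each provider is wrapped in a plain callable. `ProviderUnavailable` is a hypothetical exception your own wrappers would raise for rate limits or outages; it is not part of any SDK.

```python
from typing import Callable


class ProviderUnavailable(Exception):
    """Hypothetical error raised by a provider wrapper on rate limit or outage."""


def call_with_fallback(
    prompt: str,
    providers: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except (ProviderUnavailable, TimeoutError) as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Dry run with stubs: the primary is rate-limited, the backup answers
def primary(prompt: str) -> str:
    raise ProviderUnavailable("rate limit exceeded")


def backup(prompt: str) -> str:
    return f"response to: {prompt}"


print(call_with_fallback("Add docstrings", [("deepseek", primary), ("openrouter", backup)]))
```

Only availability errors trigger the fallback. Any other exception propagates, so a bug in the prompt or wrapper is not masked by silently retrying elsewhere.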
Real-World Results
After implementing this multi-provider architecture for a month:
| Metric | Before (Opus Only) | After (Multi-Provider) | Change |
|---|---|---|---|
| Monthly spend | $180 | $52 | -71% |
| Tasks completed | 847 | 832 | -2% |
| Average task cost | $0.21 | $0.06 | -71% |
| Quality (pass rate) | 94% | 96% | +2% |
| Daily budget hits | 12/month | 1/month | -92% |
The slight decrease in tasks completed is because I’m more intentional about what I ask agents to do. The quality improvement comes from the orchestration layer catching issues early.
Setting Up Your Own Multi-Provider System
Here’s a minimal setup to get started:
1. Install OpenCode CLI
```bash
# macOS/Linux
curl -fsSL https://opencode.dev/install.sh | sh

# Create config directory
mkdir -p ~/.config/opencode
```

2. Create Basic Configuration

```bash
cat > ~/.config/opencode/config.yaml << 'EOF'
providers:
  deepseek:
    model: "deepseek/deepseek-chat-v3-0324"
    api_key: "${DEEPSEEK_API_KEY}"
    use_for: ["default", "tests", "docs"]

  sonnet:
    model: "anthropic/claude-sonnet-4-20250514"
    api_key: "${ANTHROPIC_API_KEY}"
    use_for: ["planning", "review"]

routing:
  default_provider: "deepseek"
  task_mapping:
    plan_feature: "sonnet"
    generate_tests: "deepseek"

fallback:
  enabled: true
  provider: "openrouter"
EOF
```

3. Set Environment Variables

```bash
# Add to ~/.zshrc or ~/.bashrc
export DEEPSEEK_API_KEY="your-key-here"
export ANTHROPIC_API_KEY="your-key-here"
export OPENROUTER_API_KEY="your-key-here"
```

4. Test the Setup

```bash
# Simple task (should route to Deepseek)
opencode "Generate a Python function to validate email addresses"

# Complex task (should route to Sonnet)
opencode "Plan the architecture for a microservices-based e-commerce system"
```

5. Monitor Costs

```bash
# Check spending
opencode costs --today

# View task breakdown
opencode costs --breakdown --by-provider
```

When to Break the Rules
There are times when I override my routing logic:
- Critical production bugs: I use Opus directly for debugging production issues where speed and accuracy matter more than cost.
- New domain exploration: When learning a completely new framework or language, I use Sonnet for better explanations.
- Team collaboration: For code that others will read or modify, I use Sonnet for generation and Opus for review to ensure clarity.
- Time pressure: When deadlines are tight, I optimize for speed over cost.
The key is being intentional about these overrides, not defaulting to them.
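One way to keep overrides intentional is to require a reason from a fixed list and record it. This is a sketch under my own assumptions: the reason labels mirror the four cases above, and `OverrideLog` is a hypothetical helper, not part of any tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Labels for the four override cases listed above
OVERRIDE_REASONS = {"production_bug", "new_domain", "team_code", "deadline"}


@dataclass
class OverrideLog:
    entries: list = field(default_factory=list)

    def choose(self, routed: str, override: Optional[str] = None,
               reason: Optional[str] = None) -> str:
        """Return the routed provider unless an override with a valid reason is given."""
        if override is None:
            return routed
        if reason not in OVERRIDE_REASONS:
            raise ValueError(f"override needs a reason from {sorted(OVERRIDE_REASONS)}")
        self.entries.append({
            "routed": routed,
            "override": override,
            "reason": reason,
            "timestamp": datetime.now().isoformat(),
        })
        return override


log = OverrideLog()
print(log.choose("deepseek"))
print(log.choose("deepseek", override="opus", reason="production_bug"))
print(len(log.entries))
```

Reviewing the log at the end of the month shows whether overrides are still exceptions or have quietly become the default.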
Key Takeaways
- Match model to task complexity: Don’t use Opus for boilerplate code.
- Input tokens are cheap, output tokens are expensive: Optimize for concise outputs.
- PAYG > Subscriptions for variable workloads: Pay for what you use, not for capacity you don’t need.
- Always have orchestration review: Cheap models need quality checks.
- Track costs obsessively: You can’t optimize what you don’t measure.
- Use free credits strategically: Test new approaches on free tiers before committing paid capacity.
The multi-provider approach isn’t about being cheap - it’s about being strategic. I still spend $35-65 a month on AI coding, but I get more done with better quality than when I was spending $150-200 a month on a single provider.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.