
How to Optimize AI Coding Agent Costs: Multi-Provider Strategy That Saves Money Without Sacrificing Quality

I burned through $200 in my first month using AI coding agents. Not because I was doing anything fancy - just regular full-stack development, 8-10 hours a day. The culprit? I was routing every single task through Claude Opus, treating a $75-per-million-output-tokens model like it was a Swiss Army knife.

That’s when I realized: I was hiring a senior architect to write boilerplate code. Here’s how I fixed it.

The Problem: Single-Provider Tunnel Vision

AI coding agents operate differently than traditional coding assistants. They don’t just respond to occasional queries - they continuously process context, generate code, review outputs, iterate on solutions, and manage multi-file edits. This relentless operation mode burns through tokens at an alarming rate.

I used Claude Opus for everything:

  • Generating boilerplate code
  • Writing unit tests
  • Simple refactoring
  • Documentation updates
  • Complex architectural decisions

The result? My $20 monthly subscription lasted three days. Switching to pay-as-you-go didn’t help much - I still hit $50+ per week during intense development sprints.

The core issue wasn’t the tools - it was my routing strategy. I was using a premium model for tasks that could be handled by models costing 1/100th the price.

The Solution: Three-Tier Multi-Provider Architecture

I restructured my setup into three distinct layers, each with specific models for specific tasks:

| Tier | Purpose | Models | Cost (per 1M output tokens) | Best For |
|---|---|---|---|---|
| Subagent Layer | Cheap, fast processing | Deepseek v3.2, Step3.5 flash, Mimo-v2-flash | $0.20 - $0.50 | Boilerplate, tests, docs, simple refactoring |
| Orchestration Layer | Planning and validation | Claude Sonnet 4, GPT-4o | $15 - $30 | Task decomposition, code review, quality checks |
| Premium Layer | Critical decisions | Claude Opus 4, GPT-4-turbo | $75 - $150 | Architecture, security, production-critical code |

Why This Works

The key insight is that 80% of coding tasks don’t require premium intelligence. Writing a unit test for a simple utility function? Deepseek v3.2 handles it at $0.27 per million input tokens and $1.10 per million output tokens. Designing a microservices architecture? That’s when I bring in Opus.

Here’s the math on a typical 8-hour coding session:

| Task Type | Token Usage (output) | Single Provider (Opus) | Multi-Provider | Savings |
|---|---|---|---|---|
| Boilerplate code | 50,000 tokens | $3.75 | $0.14 | 96% |
| Unit tests | 30,000 tokens | $2.25 | $0.08 | 96% |
| Simple refactoring | 20,000 tokens | $1.50 | $0.06 | 96% |
| Code review | 15,000 tokens | $1.13 | $0.45 | 60% |
| Architecture decisions | 5,000 tokens | $0.38 | $0.38 | 0% |
| Total | 120,000 tokens | $9.01 | $1.11 | 88% |
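
The Opus column is simply output tokens multiplied by the per-million price. A minimal sketch of that arithmetic, assuming Opus output at $75 per million tokens (from the tier table) and taking the multi-provider figures straight from the table rather than re-deriving them:

```python
# Output-token cost for a single-provider (Opus) session, per the table above.
OPUS_OUTPUT_PRICE = 75.00  # USD per 1M output tokens

# Output tokens per task type, copied from the table.
SESSION = {
    "boilerplate": 50_000,
    "unit_tests": 30_000,
    "simple_refactoring": 20_000,
    "code_review": 15_000,
    "architecture": 5_000,
}

# Multi-provider costs as reported in the table (not recomputed here).
MULTI_PROVIDER_COST = {
    "boilerplate": 0.14,
    "unit_tests": 0.08,
    "simple_refactoring": 0.06,
    "code_review": 0.45,
    "architecture": 0.38,
}


def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD of generating `tokens` output tokens."""
    return tokens / 1_000_000 * price_per_million


opus_total = sum(output_cost(t, OPUS_OUTPUT_PRICE) for t in SESSION.values())
multi_total = sum(MULTI_PROVIDER_COST.values())
savings = 1 - multi_total / opus_total

print(f"Opus only:      ${opus_total:.2f}")
print(f"Multi-provider: ${multi_total:.2f}")
print(f"Savings:        {savings:.0%}")
```

Summing the unrounded rows gives $9.00 for Opus rather than $9.01; the one-cent gap comes from rounding each row to cents before adding.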

Implementation: OpenCode CLI Configuration

I use OpenCode CLI to manage my multi-provider setup. Here’s my actual configuration:

~/.config/opencode/config.yaml
providers:
  # Tier 1: Subagent layer - cheap models
  deepseek:
    model: "deepseek/deepseek-chat-v3-0324"
    api_base: "https://api.deepseek.com/v1"
    api_key: "${DEEPSEEK_API_KEY}"
    max_tokens: 4096
    temperature: 0.7
    use_for: ["boilerplate", "tests", "docs", "simple-refactor"]
    cost_per_1m_input: 0.27
    cost_per_1m_output: 1.10
  step:
    model: "step/step-3.5-flash"
    api_key: "${STEP_API_KEY}"
    max_tokens: 4096
    temperature: 0.3
    use_for: ["format", "cleanup", "naming"]
    cost_per_1m_input: 0.05
    cost_per_1m_output: 0.10
  # Tier 2: Orchestration layer
  sonnet:
    model: "anthropic/claude-sonnet-4-20250514"
    api_key: "${ANTHROPIC_API_KEY}"
    max_tokens: 8192
    temperature: 0.5
    use_for: ["orchestration", "review", "planning", "debug-complex"]
    cost_per_1m_input: 3.00
    cost_per_1m_output: 15.00
  # Tier 3: Premium layer
  opus:
    model: "anthropic/claude-opus-4-20250514"
    api_key: "${ANTHROPIC_API_KEY}"
    max_tokens: 16384
    temperature: 0.2
    use_for: ["architecture", "security", "critical-review"]
    require_explicit_approval: true
    cost_per_1m_input: 15.00
    cost_per_1m_output: 75.00

# Intelligent routing rules
routing:
  default_provider: "deepseek"
  # Task complexity mapping
  complexity_rules:
    low:  # < 100 lines, single file, no dependencies
      provider: "deepseek"
      max_cost_estimate: 0.10
    medium:  # 100-500 lines, multiple files, some complexity
      provider: "sonnet"
      max_cost_estimate: 0.50
    high:  # > 500 lines, architectural changes, security-sensitive
      provider: "opus"
      max_cost_estimate: 2.00
  # Explicit task routing
  task_mapping:
    generate_unit_tests:
      provider: "deepseek"
      temperature: 0.7
      max_tokens: 2048
    refactor_function:
      provider: "step"
      temperature: 0.3
    plan_feature:
      provider: "sonnet"
      temperature: 0.5
    design_architecture:
      provider: "opus"
      temperature: 0.2
      require_approval: true

# Fallback configuration
fallback:
  enabled: true
  provider: "openrouter"
  models: ["deepseek/deepseek-chat", "anthropic/claude-sonnet-4"]
  triggers:
    - rate_limit_exceeded
    - service_unavailable
    - timeout
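
A config like this is easy to break with a typo in a provider name. One possible sanity check is sketched below against a plain dictionary with the same shape as the YAML above. `find_unknown_providers` is a hypothetical helper for illustration, not part of OpenCode, and the small `config` dictionary is a trimmed-down example rather than the full file:

```python
def find_unknown_providers(config: dict) -> list[str]:
    """Return routing entries that point at providers missing from `providers`."""
    known = set(config.get("providers", {}))
    routing = config.get("routing", {})
    problems = []

    default = routing.get("default_provider")
    if default is not None and default not in known:
        problems.append(f"default_provider -> {default}")

    # Both sections map a name to a rule that carries a "provider" key.
    for section in ("complexity_rules", "task_mapping"):
        for name, rule in routing.get(section, {}).items():
            provider = rule.get("provider")
            if provider not in known:
                problems.append(f"{section}.{name} -> {provider}")
    return problems


config = {
    "providers": {"deepseek": {}, "sonnet": {}, "opus": {}},
    "routing": {
        "default_provider": "deepseek",
        "complexity_rules": {
            "low": {"provider": "deepseek"},
            "high": {"provider": "opus"},
        },
        "task_mapping": {
            # "step" is deliberately missing from this example's providers
            "refactor_function": {"provider": "step"},
        },
    },
}

print(find_unknown_providers(config))
```

Running a check like this at startup turns a silent misroute into an immediate, readable error.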

Subagent Orchestration Pattern

The real power comes from how I route tasks between these tiers. I built a simple orchestrator that:

  1. Analyzes task complexity based on file count, line estimates, and dependencies
  2. Routes to appropriate provider automatically
  3. Reviews output through the orchestration layer for non-trivial tasks
  4. Tracks costs in real-time

Here’s the core orchestration logic:

orchestrator/task_router.py
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from pathlib import Path

import yaml


class Complexity(Enum):
    LOW = "low"        # Deepseek, Step
    MEDIUM = "medium"  # Sonnet
    HIGH = "high"      # Opus


@dataclass
class Task:
    name: str
    description: str
    files_affected: int
    estimated_lines: int
    has_dependencies: bool
    security_sensitive: bool = False


class TaskRouter:
    def __init__(self, config_path: str = "~/.config/opencode/config.yaml"):
        # open() does not expand "~", so resolve the home directory explicitly
        with open(Path(config_path).expanduser()) as f:
            self.config = yaml.safe_load(f)
        self.cost_tracker = CostTracker()

    def analyze_complexity(self, task: Task) -> Complexity:
        """Determine task complexity based on multiple factors."""
        # Security-sensitive tasks always require premium models
        if task.security_sensitive:
            return Complexity.HIGH
        # Architectural changes (many files)
        if task.files_affected > 5:
            return Complexity.HIGH
        # Multi-file with dependencies
        if task.has_dependencies and task.files_affected > 2:
            return Complexity.MEDIUM
        # Large single-file changes
        if task.estimated_lines > 300:
            return Complexity.MEDIUM
        return Complexity.LOW

    def select_provider(self, complexity: Complexity) -> str:
        """Map complexity to provider."""
        mapping = {
            Complexity.LOW: "deepseek",
            Complexity.MEDIUM: "sonnet",
            Complexity.HIGH: "opus",
        }
        return mapping[complexity]

    def estimate_cost(self, task: Task, provider: str) -> float:
        """Estimate cost before execution."""
        provider_config = self.config["providers"][provider]
        # Rough token estimation (1 line ≈ 10 tokens for code)
        estimated_input_tokens = task.estimated_lines * 10 * 2  # Context + instruction
        estimated_output_tokens = task.estimated_lines * 10
        input_cost = (estimated_input_tokens / 1_000_000) * provider_config["cost_per_1m_input"]
        output_cost = (estimated_output_tokens / 1_000_000) * provider_config["cost_per_1m_output"]
        return input_cost + output_cost

    def route(self, task: Task) -> dict:
        """Route task to appropriate provider with cost estimation."""
        complexity = self.analyze_complexity(task)
        provider = self.select_provider(complexity)
        estimated_cost = self.estimate_cost(task, provider)
        # Check budget constraints
        max_cost = self.config["routing"]["complexity_rules"][complexity.value]["max_cost_estimate"]
        if estimated_cost > max_cost:
            print(f"Warning: Estimated cost ${estimated_cost:.2f} exceeds budget ${max_cost:.2f}")
        return {
            "provider": provider,
            "complexity": complexity.value,
            "estimated_cost": estimated_cost,
            "config": self.config["providers"][provider],
        }


class CostTracker:
    """Track spending across providers."""

    def __init__(self, daily_budget: float = 5.00):
        self.daily_budget = daily_budget
        self.today_spent = 0.0
        self.tasks = []

    def log_task(self, task_name: str, provider: str, cost: float):
        self.tasks.append({
            "task": task_name,
            "provider": provider,
            "cost": cost,
            "timestamp": datetime.now().isoformat(),
        })
        self.today_spent += cost
        # Alert at 80% budget
        if self.today_spent > self.daily_budget * 0.8:
            self._send_alert(f"Budget warning: ${self.today_spent:.2f} / ${self.daily_budget:.2f}")

    def _send_alert(self, message: str):
        print(f"ALERT: {message}")
        # Could integrate with Slack, email, etc.


# Usage example
if __name__ == "__main__":
    router = TaskRouter()

    # Simple test generation
    test_task = Task(
        name="generate_unit_tests",
        description="Generate unit tests for auth.py",
        files_affected=1,
        estimated_lines=50,
        has_dependencies=False,
    )
    result = router.route(test_task)
    print(f"Task: {test_task.name}")
    print(f"Provider: {result['provider']}")
    print(f"Estimated cost: ${result['estimated_cost']:.4f}")

    # Complex architecture task
    arch_task = Task(
        name="design_auth_system",
        description="Design authentication microservice architecture",
        files_affected=8,
        estimated_lines=600,
        has_dependencies=True,
        security_sensitive=True,
    )
    result = router.route(arch_task)
    print(f"\nTask: {arch_task.name}")
    print(f"Provider: {result['provider']}")
    print(f"Estimated cost: ${result['estimated_cost']:.4f}")

Output:

Task: generate_unit_tests
Provider: deepseek
Estimated cost: $0.0008

Task: design_auth_system
Provider: opus
Estimated cost: $0.6300

Notice the cost difference: the simple test generation is estimated at $0.0008, while the architectural task comes to $0.63. If I had used Opus for the test task, the same estimate would have been about $0.05 - roughly 64x more expensive.
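
The router above does not show the review pass from step 3 of the list. One way it could look is sketched below. `generate` and `review` are hypothetical callables standing in for real provider calls, and the rule "review everything except low-complexity work" is an assumption, not necessarily the exact policy used in this setup:

```python
from typing import Callable


def run_with_review(
    prompt: str,
    complexity: str,
    generate: Callable[[str], str],
    review: Callable[[str], tuple[bool, str]],
    max_rounds: int = 2,
) -> str:
    """Generate with the cheap model, then let the reviewer approve or request fixes."""
    draft = generate(prompt)
    if complexity == "low":
        return draft  # trivial work skips review

    for _ in range(max_rounds):
        approved, feedback = review(draft)
        if approved:
            return draft
        # Feed the reviewer's notes back to the cheap model and retry.
        draft = generate(f"{prompt}\n\nReviewer feedback:\n{feedback}")
    return draft  # best effort after max_rounds; the caller may escalate


# Stand-in callables so the sketch runs without any API access.
def fake_generate(prompt: str) -> str:
    return "v2" if "Reviewer feedback" in prompt else "v1"


def fake_review(code: str) -> tuple[bool, str]:
    return (code == "v2", "handle the empty-input case")


print(run_with_review("write parse()", "medium", fake_generate, fake_review))
print(run_with_review("write parse()", "low", fake_generate, fake_review))
```

Capping the number of rounds keeps a stubborn disagreement between the two models from quietly burning through the budget.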

Token Optimization Techniques

Beyond provider routing, I also optimize how I use tokens within each tier:

1. Leverage Input Token Economics

Input tokens are dramatically cheaper than output tokens:

| Model | Input Cost (per 1M) | Output Cost (per 1M) | Ratio |
|---|---|---|---|
| Deepseek v3.2 | $0.27 | $1.10 | 1:4 |
| Claude Sonnet 4 | $3.00 | $15.00 | 1:5 |
| Claude Opus 4 | $15.00 | $75.00 | 1:5 |

This means I can afford to be verbose in my prompts but should request concise outputs.

# Good: Verbose input, concise output request
prompt = """
Context: I'm building a REST API for user authentication.
The API uses Express.js with TypeScript.
The database is PostgreSQL with Prisma ORM.
The user model has: id, email, password_hash, created_at, updated_at.
Task: Generate a middleware function that:
1. Validates JWT tokens from Authorization header
2. Extracts user ID from token payload
3. Attaches user object to request
4. Returns 401 for invalid/expired tokens
Constraints:
- Use TypeScript with proper types
- Handle edge cases (missing header, malformed token)
- Keep it under 30 lines
- Return only the code, no explanations
"""
# Input: ~150 tokens, Output: ~200 tokens
# Cost with Deepseek: $0.00004 (input) + $0.00022 (output) = $0.00026
# Bad: Short input, verbose output
prompt = "Write JWT auth middleware for Express"
# Input: ~10 tokens, Output: ~500 tokens (with explanations)
# Cost with Deepseek: $0.000003 (input) + $0.00055 (output) = $0.00055

The first approach costs half as much despite having 15x more input tokens, because output tokens dominate the cost equation.
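
The comparison can be reproduced with a few lines of arithmetic. This sketch uses the Deepseek prices from the table above and the rough token counts from the code comments; the token counts are estimates, not measurements:

```python
# Deepseek prices from the table above, in USD per 1M tokens.
INPUT_PRICE = 0.27
OUTPUT_PRICE = 1.10


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Total cost of one request at the prices above."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


detailed = request_cost(input_tokens=150, output_tokens=200)  # long prompt, code only
terse = request_cost(input_tokens=10, output_tokens=500)      # short prompt, long answer

print(f"detailed prompt: ${detailed:.5f}")
print(f"terse prompt:    ${terse:.5f}")
print(f"ratio:           {detailed / terse:.2f}")
```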

2. Batch Similar Requests

Instead of making multiple separate API calls, I combine related work into a single request:

import json

# `client`, `files`, and `read_file` are placeholders for your own API client,
# list of file paths, and file-reading helper.

# Bad: 5 separate calls
for file in files:
    response = client.generate(f"Add docstrings to {file}")
# Total: 5 calls, 5 separate contexts, 5x overhead

# Good: Single batched call
batched_prompt = f"""
Add docstrings to each of these files. Return the result as a JSON object
with filenames as keys and the docstring-enhanced content as values.
Files:
{json.dumps({f: read_file(f) for f in files}, indent=2)}
"""
response = client.generate(batched_prompt)
# Total: 1 call, shared context, single overhead
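
Batching only pays off if the response can be split back into files reliably. Here is a sketch of both halves, assuming the model returns a single JSON object as requested. `build_batched_prompt` and `parse_batched_response` are illustrative names, and the model call itself is replaced by a canned reply:

```python
import json


def build_batched_prompt(files: dict[str, str]) -> str:
    """Build one prompt covering every file; `files` maps filename -> source."""
    return (
        "Add docstrings to each of these files. Return the result as a JSON object\n"
        "with filenames as keys and the docstring-enhanced content as values.\n"
        "Files:\n" + json.dumps(files, indent=2)
    )


def parse_batched_response(raw: str, expected: list[str]) -> dict[str, str]:
    """Parse the JSON reply and fail loudly if any requested file is missing."""
    result = json.loads(raw)
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object keyed by filename")
    missing = [name for name in expected if name not in result]
    if missing:
        raise ValueError(f"response is missing files: {missing}")
    return {name: result[name] for name in expected}


files = {"a.py": "def f(): pass", "b.py": "def g(): pass"}
prompt = build_batched_prompt(files)

# A canned reply stands in for the real model response.
reply = json.dumps({
    "a.py": 'def f():\n    """Do f."""',
    "b.py": 'def g():\n    """Do g."""',
})
parsed = parse_batched_response(reply, list(files))
print(sorted(parsed))
```

When a file is missing from the reply, retrying just that file is usually cheaper than re-sending the whole batch.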

3. Cache Frequently Used Context

For multi-file projects, I cache the project structure and reuse it:

utils/context_cache.py
import hashlib
import json
from pathlib import Path


class ContextCache:
    def __init__(self, project_root: str):
        self.cache_file = Path(project_root) / ".opencode" / "context_cache.json"
        # parents=True so a missing project directory does not raise
        self.cache_file.parent.mkdir(parents=True, exist_ok=True)

    def get_project_context(self) -> str:
        """Cached project structure for context."""
        if self.cache_file.exists():
            cached = json.loads(self.cache_file.read_text())
            current_hash = self._hash_project_structure()
            if cached["hash"] == current_hash:
                return cached["context"]
        # Regenerate if structure changed
        context = self._generate_project_context()
        self.cache_file.write_text(json.dumps({
            "hash": self._hash_project_structure(),
            "context": context,
        }))
        return context

    def _generate_project_context(self) -> str:
        """Generate concise project structure."""
        # Only include file tree, not file contents
        # This saves ~80% of tokens vs reading all files
        ...

    def _hash_project_structure(self) -> str:
        """Detect when structure changes."""
        ...

This reduced my average context size from 8,000 tokens to 2,000 tokens - a 75% reduction in input token costs.
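
The two elided helpers can be filled in many ways. One possible sketch, written as standalone functions: it fingerprints only the sorted list of relative file paths, so edits inside a file leave the cache valid while adding, removing, or renaming a file invalidates it. The ignored directory names are an assumption.

```python
import hashlib
import tempfile
from pathlib import Path

IGNORED_DIRS = {".git", ".opencode", "node_modules", "__pycache__"}  # assumed


def list_project_files(root: Path) -> list[str]:
    """Sorted relative paths of all files, skipping ignored directories."""
    paths = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if path.is_file() and not IGNORED_DIRS.intersection(rel.parts):
            paths.append(rel.as_posix())
    return sorted(paths)


def hash_project_structure(root: Path) -> str:
    """SHA-256 over the file list; changes when files are added, removed, or renamed."""
    listing = "\n".join(list_project_files(root))
    return hashlib.sha256(listing.encode("utf-8")).hexdigest()


def generate_project_context(root: Path) -> str:
    """File tree only, one path per line, no file contents."""
    return "\n".join(list_project_files(root))


# Demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "app.py").write_text("print('hi')")
    before = hash_project_structure(root)

    (root / "src" / "app.py").write_text("print('changed')")  # content edit only
    same = hash_project_structure(root) == before

    (root / "src" / "util.py").write_text("")  # new file
    changed = hash_project_structure(root) != before
    context = generate_project_context(root)

print(same, changed)
print(context)
```

Hashing paths rather than contents is a deliberate trade-off: the cached context is a file tree, so it only needs refreshing when the tree itself changes.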

Pay-As-You-Go vs Subscriptions

I initially tried subscriptions (Claude Pro, ChatGPT Plus), but they don’t work well for agent-based development because:

  1. Unused capacity: Some days I code for 12 hours, others for 2. Subscriptions don’t adjust.
  2. No transparency: I couldn’t see which tasks were expensive.
  3. Rate limits: Subscriptions often have hidden usage caps.

My Current Provider Setup

| Provider | Use Case | Pricing Model | Monthly Spend |
|---|---|---|---|
| Deepseek (direct) | 80% of tasks | PAYG | $8-15 |
| Anthropic (direct) | Orchestration & complex tasks | PAYG | $20-35 |
| Deep Infra | Kimi/Minimax models | PAYG | $5-10 |
| OpenRouter | Fallback when limits hit | PAYG | $2-5 |
| Total | | | $35-65 |

Compare this to my initial single-provider approach: $150-200/month. That’s a 70-80% savings.

Free Credits Strategy

I also leverage free credits strategically:

  • Fireworks AI: Offers free credits for experimentation. I use this for testing new models before committing.
  • Modal: GLM 5 is free through April 30th. I route specific tasks here to reduce Anthropic costs.
  • Provider trials: Most providers offer $5-10 in free credits. I test routing strategies on these before using paid capacity.

Common Mistakes I Made (So You Don’t Have To)

Mistake 1: Defaulting to Premium Models

What I did: Used Claude Opus for every task because “it’s the best.”

Why it failed: Premium models are overkill for 80% of tasks. I was paying 150x more than necessary for boilerplate code.

The fix: Always ask “Does this task require Opus, or will Deepseek suffice?” before routing.

Mistake 2: Ignoring Token Costs in Prompts

What I did: Wrote long, conversational prompts without considering token counts.

Why it failed: Verbose prompts increased both input and output token usage.

The fix:

  • Use structured formats (JSON, YAML) instead of prose
  • Be concise in output requests
  • Cache shared context

Mistake 3: No Cost Tracking

What I did: Didn’t monitor spending until I hit budget limits.

Why it failed: No visibility into which tasks were expensive.

The fix: Built real-time cost tracking with alerts at 50%, 80%, and 100% of daily budget.
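
The CostTracker class shown earlier only alerts at 80%. Supporting several thresholds mostly means remembering which ones have already fired. A sketch of that idea, with `BudgetAlerts` as a hypothetical stand-in rather than the tracker actually used here:

```python
class BudgetAlerts:
    """Fire each budget threshold at most once per day."""

    def __init__(self, daily_budget: float, thresholds=(0.5, 0.8, 1.0)):
        self.daily_budget = daily_budget
        self.thresholds = sorted(thresholds)
        self.spent = 0.0
        self.fired: set[float] = set()

    def add(self, cost: float) -> list[str]:
        """Record a cost and return any newly triggered alert messages."""
        self.spent += cost
        alerts = []
        for level in self.thresholds:
            if level not in self.fired and self.spent >= self.daily_budget * level:
                self.fired.add(level)
                alerts.append(
                    f"{level:.0%} of daily budget reached: "
                    f"${self.spent:.2f} / ${self.daily_budget:.2f}"
                )
        return alerts


tracker = BudgetAlerts(daily_budget=5.00)
for cost in (1.00, 2.00, 1.50, 1.00):
    for message in tracker.add(cost):
        print(message)
```

Returning the messages instead of printing them inside the class makes it straightforward to forward them to Slack or email later.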

Mistake 4: Skipping Orchestration Layer

What I did: Routed everything to cheap models to save money.

Why it failed: Quality suffered. Inconsistent code, security issues, poor practices.

The fix: Always have Sonnet review Deepseek outputs for non-trivial tasks. The review costs are minimal compared to fixing issues later.

Mistake 5: Single Provider Dependency

What I did: Relied entirely on one provider for everything.

Why it failed: When they had an outage, I was dead in the water.

The fix: OpenRouter as a backup, plus direct accounts with multiple providers.
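
In code, a backup provider amounts to trying a list of callables in order. A minimal sketch, assuming each provider is wrapped in a function that raises a common error type; `ProviderError` and the two stand-in functions are hypothetical:

```python
from typing import Callable


class ProviderError(Exception):
    """Hypothetical error for rate limits, outages, and timeouts."""


def call_with_fallback(
    prompt: str,
    providers: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each provider in order; return (name, response) from the first that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-ins for real clients.
def primary(prompt: str) -> str:
    raise ProviderError("rate_limit_exceeded")


def backup(prompt: str) -> str:
    return f"ok: {prompt}"


name, response = call_with_fallback(
    "add docstrings",
    [("deepseek", primary), ("openrouter", backup)],
)
print(name, response)
```

Only `ProviderError` is caught, so genuine bugs such as a malformed request still surface instead of silently triggering the fallback.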

Real-World Results

After implementing this multi-provider architecture for a month:

| Metric | Before (Opus Only) | After (Multi-Provider) | Change |
|---|---|---|---|
| Monthly spend | $180 | $52 | -71% |
| Tasks completed | 847 | 832 | -2% |
| Average task cost | $0.21 | $0.06 | -71% |
| Quality (pass rate) | 94% | 96% | +2% |
| Daily budget hits | 12/month | 1/month | -92% |

The slight decrease in tasks completed is because I’m more intentional about what I ask agents to do. The quality improvement comes from the orchestration layer catching issues early.

Setting Up Your Own Multi-Provider System

Here’s a minimal setup to get started:

1. Install OpenCode CLI

# macOS/Linux
curl -fsSL https://opencode.dev/install.sh | sh
# Create config directory
mkdir -p ~/.config/opencode

2. Create Basic Configuration

cat > ~/.config/opencode/config.yaml << 'EOF'
providers:
  deepseek:
    model: "deepseek/deepseek-chat-v3-0324"
    api_key: "${DEEPSEEK_API_KEY}"
    use_for: ["default", "tests", "docs"]
  sonnet:
    model: "anthropic/claude-sonnet-4-20250514"
    api_key: "${ANTHROPIC_API_KEY}"
    use_for: ["planning", "review"]
routing:
  default_provider: "deepseek"
  task_mapping:
    plan_feature: "sonnet"
    generate_tests: "deepseek"
fallback:
  enabled: true
  provider: "openrouter"
EOF

3. Set Environment Variables

# Add to ~/.zshrc or ~/.bashrc
export DEEPSEEK_API_KEY="your-key-here"
export ANTHROPIC_API_KEY="your-key-here"
export OPENROUTER_API_KEY="your-key-here"

4. Test the Setup

# Simple task (should route to Deepseek)
opencode "Generate a Python function to validate email addresses"
# Complex task (should route to Sonnet)
opencode "Plan the architecture for a microservices-based e-commerce system"

5. Monitor Costs

# Check spending
opencode costs --today
# View task breakdown
opencode costs --breakdown --by-provider
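
If you keep your own task log, as the CostTracker class earlier does, a per-provider breakdown is a simple group-by. This sketch works over entries shaped like the ones `log_task` records. The sample values are made up for illustration, and `breakdown_by_provider` is a hypothetical helper, not a description of what the CLI does internally:

```python
from collections import defaultdict


def breakdown_by_provider(tasks: list[dict]) -> dict[str, dict]:
    """Total cost and task count per provider."""
    totals: dict[str, dict] = defaultdict(lambda: {"cost": 0.0, "tasks": 0})
    for entry in tasks:
        bucket = totals[entry["provider"]]
        bucket["cost"] += entry["cost"]
        bucket["tasks"] += 1
    return dict(totals)


# Illustrative entries only, not real spending data.
log = [
    {"task": "generate_unit_tests", "provider": "deepseek", "cost": 0.002},
    {"task": "plan_feature", "provider": "sonnet", "cost": 0.040},
    {"task": "refactor_function", "provider": "deepseek", "cost": 0.001},
]

summary = breakdown_by_provider(log)
for provider, stats in sorted(summary.items()):
    print(f"{provider:10s} {stats['tasks']:3d} tasks  ${stats['cost']:.3f}")
```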

When to Break the Rules

There are times when I override my routing logic:

  1. Critical production bugs: I use Opus directly for debugging production issues where speed and accuracy matter more than cost.

  2. New domain exploration: When learning a completely new framework or language, I use Sonnet for better explanations.

  3. Team collaboration: For code that others will read/modify, I use Sonnet for generation + Opus for review to ensure clarity.

  4. Time pressure: When deadlines are tight, I optimize for speed over cost.

The key is being intentional about these overrides, not defaulting to them.
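
One way to keep overrides intentional is to make the reason mandatory and record it. A small sketch; `RoutingDecision` and `choose_provider` are hypothetical names, not part of the router shown earlier:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoutingDecision:
    provider: str
    overridden: bool
    reason: str


def choose_provider(
    default: str,
    override: Optional[str] = None,
    reason: str = "",
) -> RoutingDecision:
    """Use the routed provider unless an override is given together with a reason."""
    if override is None:
        return RoutingDecision(default, False, "routed by complexity")
    if not reason.strip():
        raise ValueError("an override needs a stated reason")
    return RoutingDecision(override, True, reason)


print(choose_provider("deepseek"))
print(choose_provider("deepseek", override="opus", reason="production incident"))
```

Logging these decisions alongside costs shows, at the end of the month, how much of the spend came from overrides rather than normal routing.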

Key Takeaways

  1. Match model to task complexity: Don’t use Opus for boilerplate code.

  2. Input tokens are cheap, output tokens are expensive: Optimize for concise outputs.

  3. PAYG > Subscriptions for variable workloads: Pay for what you use, not for capacity you don’t need.

  4. Always have orchestration review: Cheap models need quality checks.

  5. Track costs obsessively: You can’t optimize what you don’t measure.

  6. Use free credits strategically: Test new approaches on free tiers before committing paid capacity.

The multi-provider approach isn’t about being cheap - it’s about being strategic. I still spend $50-70/month on AI coding, but I get more done with better quality than when I was spending $200/month on a single provider.

Final Words + More Resources

I wrote this article to share what I’ve learned, in the hope that it helps others. If you want to get in touch, you can reach me by email.
