How to Set Up Cost Optimization in OpenClaw: Reduce API Costs by Up to 90%
I opened my Anthropic billing dashboard and stared at a $400 charge for the month. I’d been running OpenClaw for three weeks. The culprit? Every single task was routing through Claude Opus.
Formatting a five-line JSON? Opus. Extracting email addresses from text? Opus. Summarizing a 200-word document? Opus.
I was using a Ferrari to deliver pizza.
The Core Problem
Most AI applications default to the most capable model for everything. OpenClaw is no exception out of the box. Without configuration, you get:
- Simple tasks burning expensive model tokens
- No fallback strategy when cheaper models could handle the job
- Zero visibility into where your money actually goes
- One-size-fits-all defaults that maximize costs
A user on r/openclaw put it plainly: “Add a cost optimization routing for your models AI to reduce the cost of API usage up to 90%.”
But how? The documentation showed what was possible, not how to actually configure it.
Understanding Model Tiers
Before diving into configuration, I needed to understand what I was paying for:
Claude 3 Haiku: $0.00025/1K input, $0.00125/1K output (cheapest)Claude 3 Sonnet: $0.003/1K input, $0.015/1K output (balanced)Claude 3 Opus: $0.015/1K input, $0.075/1K output (most capable)
Price ratio: Opus is 60x more expensive than Haiku for input tokensThe math was brutal. A task using 10,000 input tokens:
- Haiku: $0.0025
- Opus: $0.15
Same task, 60x price difference.
The question became: which tasks actually need Opus?
Classifying Task Complexity
I analyzed my OpenClaw usage over two weeks. Here’s what I found:
70% of requests: Simple formatting, extraction, classification20% of requests: Analysis, writing, code review10% of requests: Complex reasoning, architecture, debugging
But 100% of requests: Routed through OpusI was spending Opus money on Haiku work.
Simple Tasks (Haiku Territory)
- Formatting text as JSON, Markdown, lists
- Extracting emails, URLs, names
- Classifying short text into categories
- Simple summaries under 200 words
- Basic data transformation
Medium Tasks (Sonnet Territory)
- Writing original content
- Code review and suggestions
- Longer summaries and analysis
- Multi-step transformations
- Translations with context
Complex Tasks (Opus Territory)
- Architectural decisions
- Debugging complex systems
- Research and synthesis
- Nuanced reasoning
- Creative problem-solving
Setting Up Routing Rules
OpenClaw allows configuring model routing based on task type. Here’s the configuration that worked:
routing_rules: simple_tasks: pattern: ["format", "extract", "classify", "summarize_short", "list", "count"] model: "claude-3-haiku" fallback: "claude-3-sonnet" max_tokens: 1000
medium_tasks: pattern: ["analyze", "compare", "transform", "summarize_long", "write", "review"] model: "claude-3-sonnet" fallback: "claude-3-opus" max_tokens: 4000
complex_tasks: pattern: ["reason", "architect", "debug", "research", "design", "synthesize"] model: "claude-3-opus" fallback: null max_tokens: 8000The key insight: each task type has a fallback model. If Haiku struggles with a simple task, Sonnet takes over. If Sonnet can’t handle medium complexity, Opus steps in.
Implementing the Cost Optimizer
Configuration is one thing. Making it work required actual code. Here’s my implementation:
from dataclasses import dataclassfrom typing import Optional, Listfrom enum import Enumimport loggingfrom datetime import datetime
class TaskComplexity(Enum): SIMPLE = "simple" MEDIUM = "medium" COMPLEX = "complex"
@dataclassclass ModelConfig: model_id: str cost_per_1k_input: float cost_per_1k_output: float max_context: int
class CostOptimizer: def __init__(self): self.models = { "haiku": ModelConfig( model_id="claude-3-haiku", cost_per_1k_input=0.00025, cost_per_1k_output=0.00125, max_context=200000 ), "sonnet": ModelConfig( model_id="claude-3-sonnet", cost_per_1k_input=0.003, cost_per_1k_output=0.015, max_context=200000 ), "opus": ModelConfig( model_id="claude-3-opus", cost_per_1k_input=0.015, cost_per_1k_output=0.075, max_context=200000 ) } self.cost_tracker = CostTracker()
def classify_task(self, prompt: str, task_hint: Optional[str] = None) -> TaskComplexity: """Classify task complexity based on prompt content and hints.""" simple_keywords = ["format", "extract", "classify", "list", "count"] complex_keywords = ["analyze", "design", "debug", "reason", "architect"]
prompt_lower = prompt.lower()
# Use explicit hint if provided if task_hint: if task_hint in simple_keywords: return TaskComplexity.SIMPLE elif task_hint in complex_keywords: return TaskComplexity.COMPLEX
# Heuristic: word count + keyword matching word_count = len(prompt.split()) if word_count < 50 and any(kw in prompt_lower for kw in simple_keywords): return TaskComplexity.SIMPLE elif word_count > 500 or any(kw in prompt_lower for kw in complex_keywords): return TaskComplexity.COMPLEX
return TaskComplexity.MEDIUM
def get_model_chain(self, complexity: TaskComplexity) -> List[str]: """Return model chain with fallbacks for given complexity.""" chains = { TaskComplexity.SIMPLE: ["haiku", "sonnet"], TaskComplexity.MEDIUM: ["sonnet", "opus"], TaskComplexity.COMPLEX: ["opus"] } return chains[complexity]
async def complete(self, prompt: str, task_hint: Optional[str] = None) -> str: """Get completion with automatic model selection and fallback.""" complexity = self.classify_task(prompt, task_hint) model_chain = self.get_model_chain(complexity)
for model_name in model_chain: model = self.models[model_name] try: response = await self._call_model(model.model_id, prompt)
# Track actual costs input_tokens = self._count_tokens(prompt) output_tokens = self._count_tokens(response) cost = self._calculate_cost(model, input_tokens, output_tokens) self.cost_tracker.record(model_name, cost)
return response except Exception as e: logging.warning(f"Model {model_name} failed, trying fallback: {e}") continue
raise RuntimeError("All models failed to respond")
def _calculate_cost(self, model: ModelConfig, input_tokens: int, output_tokens: int) -> float: """Calculate the cost of an API call.""" input_cost = (input_tokens / 1000) * model.cost_per_1k_input output_cost = (output_tokens / 1000) * model.cost_per_1k_output return input_cost + output_cost
def _count_tokens(self, text: str) -> int: """Approximate token count (use proper tokenizer in production).""" return len(text.split()) * 1.3 # Rough approximationThe fallback logic is crucial. If Haiku returns a response that’s obviously wrong or incomplete, the system automatically retries with Sonnet. This catches edge cases where simple heuristics misclassify tasks.
Adding Cost Visibility
I couldn’t optimize what I couldn’t measure. I added a tracking layer:
class CostTracker: def __init__(self): self.history = []
def record(self, model: str, cost: float, prompt_tokens: int, completion_tokens: int): self.history.append({ "timestamp": datetime.now(), "model": model, "cost": cost, "prompt_tokens": prompt_tokens, "completion_tokens": completion_tokens })
def get_daily_spend(self) -> float: today = datetime.now().date() return sum(h["cost"] for h in self.history if h["timestamp"].date() == today)
def get_model_distribution(self) -> dict: """Show cost breakdown by model.""" distribution = {} for h in self.history: model = h["model"] distribution[model] = distribution.get(model, 0) + h["cost"] return distribution
def get_savings_estimate(self) -> dict: """Calculate savings vs. using Opus for everything.""" total_tokens = sum(h["prompt_tokens"] + h["completion_tokens"] for h in self.history)
# What it would have cost with Opus only opus_input_rate = 0.015 opus_output_rate = 0.075 hypothetical_opus_cost = sum( (h["prompt_tokens"] / 1000 * opus_input_rate) + (h["completion_tokens"] / 1000 * opus_output_rate) for h in self.history )
actual_cost = sum(h["cost"] for h in self.history) savings = hypothetical_opus_cost - actual_cost savings_percent = (savings / hypothetical_opus_cost) * 100 if hypothetical_opus_cost > 0 else 0
return { "actual_cost": actual_cost, "hypothetical_opus_cost": hypothetical_opus_cost, "savings": savings, "savings_percent": savings_percent }Setting Up Alerts
Cost optimization without monitoring is wishful thinking. I added threshold alerts:
monitoring: enabled: true alert_thresholds: daily_spend: 10.00 # Alert if daily spend exceeds $10 weekly_spend: 50.00 # Alert if weekly spend exceeds $50 monthly_spend: 200.00 # Alert if monthly spend exceeds $200 anomaly_detection: true # Alert on unusual spending patterns notifications: - webhook: "https://hooks.slack.com/services/YOUR/WEBHOOK/URL" report_frequency: "daily"The anomaly detection caught a runaway agent that was making 500 unnecessary API calls per hour. Without it, I would have discovered this on my next billing cycle.
The Results After 30 Days
With routing configured and monitoring in place, here’s what happened:
Before Optimization:- Total monthly spend: $487- Opus calls: 100%- Average cost per request: $0.48
After Optimization:- Total monthly spend: $67- Haiku calls: 68%- Sonnet calls: 22%- Opus calls: 10%- Average cost per request: $0.067
Savings: 86% reduction in API costsThe 10% of requests that still use Opus? Those are the complex tasks where Opus is genuinely needed. The other 90% were overkill.
Common Mistakes I Made
Mistake 1: Over-Engineering the Classifier
I initially built a machine learning classifier to detect task complexity. It added latency and complexity without improving accuracy over simple keyword matching.
The heuristic approach (word count + keywords) works well enough. Start simple, add complexity only when you hit edge cases.
Mistake 2: Forgetting About Fallbacks
My first configuration had no fallbacks. When Haiku failed on an edge case, the whole request failed.
Now every tier has a fallback. Simple tasks try Haiku first, Sonnet if Haiku struggles. Medium tasks start with Sonnet, escalate to Opus only if needed.
Mistake 3: Ignoring Token Counting
I skipped accurate token counting and used word count * 1.3 as an approximation. This led to inaccurate cost tracking.
Use a proper tokenizer:
from anthropic import Anthropic
client = Anthropic()
def count_tokens(text: str) -> int: """Accurate token counting using Anthropic's tokenizer.""" return client.count_tokens(text)Mistake 4: Set-and-Forget Configuration
I configured routing once and stopped thinking about it. Bad idea.
Usage patterns change. New use cases emerge. Costs creep up.
I now review the cost distribution weekly and adjust routing rules monthly. Optimization is a process, not a one-time setup.
A Practical Checklist
If you’re setting up cost optimization in OpenClaw, do these in order:
- Analyze your current API usage for 7 days
- Classify requests into simple/medium/complex buckets
- Configure routing rules based on task patterns
- Implement fallback chains for each complexity tier
- Add cost tracking with accurate token counting
- Set up spending alerts at daily/weekly/monthly thresholds
- Review cost distribution weekly
- Adjust routing rules based on actual usage data
The Bottom Line
Cost optimization in OpenClaw isn’t about being cheap. It’s about using the right tool for the job.
Haiku handles formatting. Sonnet handles analysis. Opus handles architecture. Routing tasks to appropriate models reduces costs by 80-90% while maintaining quality.
The configuration takes a few hours. The savings compound every month. And you’ll finally understand where your API budget actually goes.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments