How to Configure OpenRouter with Hermes Agent Without Burning Credits
€15. Gone in 48 hours.
I was testing my Hermes agent setup with OpenRouter as the backend. Two days later, my credits were depleted. I checked my usage stats: 72 million tokens for just testing the configuration.
This wasn’t a bug. It was my architecture.
Why Credits Disappear So Fast
Hermes agents don’t make single API calls. Each task triggers a cascade:
┌─────────────────────────────────────────────────────────────┐│ ││ Task Request ││ │ ││ ▼ ││ ┌─────────────┐ ││ │ Planning │ 1 call (~10k tokens) ││ └─────────────┘ ││ │ ││ ▼ ││ ┌─────────────────────┐ ││ │ Tool Selection │ 1-3 calls (~5k each) ││ └─────────────────────┘ ││ │ ││ ▼ ││ ┌─────────────────────┐ ││ │ Execution Loop │ 2-5 calls (~8k each) ││ └─────────────────────┘ ││ │ ││ ▼ ││ ┌─────────────────────┐ ││ │ Result Synthesis │ 1-2 calls (~6k each) ││ └─────────────────────┘ ││ │ ││ ▼ ││ ┌─────────────────────┐ ││ │ Error Recovery │ Additional calls if needed ││ └─────────────────────┘ ││ ││ Total per task: 5-15 calls = 50-150k tokens ││ │└─────────────────────────────────────────────────────────────┘At OpenRouter prices, this compounds rapidly:
┌─────────────────────────────────────────────────────────────┐│ ││ 1 task = 5-15 calls = 50-150k tokens ││ ││ 10 tasks/day = 500k-1.5M tokens ││ ││ Using Claude-3.5-Sonnet via OpenRouter: ││ Input: $3/M tokens, Output: $15/M tokens ││ ││ Daily cost: $0.50 - $3.00 ││ Monthly cost: $15 - $90 ││ ││ My reality: €15 in 2 days (testing = worst case) ││ │└─────────────────────────────────────────────────────────────┘I was using premium models for every step. That was mistake #1.
The Fix: Tiered Model Routing
The solution is to match model cost to task complexity:
┌─────────────────────────────────────────────────────────────┐│ ││ Complexity Level │ Model Tier │ Example Models ││ ───────────────────────────────────────────────────────── ││ 1-3 (Simple) │ Free/Budget │ Gemini-Flash, ││ │ │ GPT-3.5-Turbo ││ ───────────────────────────────────────────────────────── ││ 4-6 (Moderate) │ Mid-tier │ Claude-Haiku, ││ │ │ Llama-3 ││ ───────────────────────────────────────────────────────── ││ 7-10 (Complex) │ Premium │ Claude-Sonnet, ││ │ │ GPT-4o ││ │└─────────────────────────────────────────────────────────────┘Here’s my optimized Hermes configuration:
provider: openrouter
# Model routing by task typemodel_routing: # Planning step: Use fast, cheap models planning: model: "google/gemini-flash-1.5" fallback: "openai/gpt-3.5-turbo" max_tokens: 8000
# Tool selection: Mid-tier for reliability tool_selection: model: "anthropic/claude-3-haiku" fallback: "meta-llama/llama-3.1-8b-instruct" max_tokens: 4000
# Execution reasoning: Depends on task execution: simple_tasks: model: "google/gemini-flash-1.5" trigger: complexity_score < 4 moderate_tasks: model: "anthropic/claude-3-haiku" trigger: complexity_score >= 4 and complexity_score < 7 complex_tasks: model: "anthropic/claude-3.5-sonnet" trigger: complexity_score >= 7
# Result synthesis: Keep it cheap synthesis: model: "openai/gpt-3.5-turbo" fallback: "google/gemini-flash-1.5"
# Budget controls (critical!)budget_controls: daily_cap: 2.00 # EUR monthly_cap: 30.00 # EUR alert_threshold: 0.80 # Alert at 80% of cap
# Automatic fallback when capped fallback_provider: "ollama" fallback_model: "llama3.1:8b"
# Rate limit handlingrate_limits: max_retries: 3 retry_delay: 30 # seconds on_limit_hit: "switch_to_fallback"Spending Caps: Your Safety Net
Without caps, you’re flying without a parachute. OpenRouter offers budget alerts, but Hermes needs explicit limits:
# OpenRouter settings (in OpenRouter dashboard)openrouter_budget: monthly_limit: 30 # USD daily_limit: 2 # USD
# Hermes internal trackinghermes_budget: track_tokens: true log_usage: true warn_at_percent: 70 stop_at_percent: 95
# What happens when limit reached on_limit_reached: action: "switch_to_local" local_model: "ollama://llama3.1:8b" notify: trueThe key: automatic fallback to local models. When OpenRouter caps hit, Hermes switches to Ollama without crashing your workflow.
My Trial-and-Error Journey
I tried three approaches before settling on tiered routing:
Attempt 1: Premium Everything
model_routing: default: "anthropic/claude-3.5-sonnet"
# Result: €15 in 2 daysEvery task used Claude Sonnet. Even simple planning steps. Even error retries. The cost multiplied like compound interest.
Attempt 2: Free Tier Only
model_routing: default: "google/gemini-flash" # OpenRouter free tier
# Result: Rate limits within 30 minutesOpenRouter free tier has strict limits. My agent workflow hit them constantly. Tasks failed mid-execution.
Attempt 3: Mixed with No Caps
model_routing: planning: "google/gemini-flash" execution: "anthropic/claude-3.5-sonnet"
# No budget controls
# Result: Still burned €10/weekI used cheap models for planning but premium for execution. Without caps, complex tasks still drained credits. And when free tier hit limits, everything defaulted to premium.
The Working Solution
After those failures, I implemented full tiered routing with caps:
┌─────────────────────────────────────────────────────────────┐│ ││ BEFORE (Premium everything): ││ ───────────────────────────── ││ Daily: €3.00 ││ Weekly: €21.00 ││ Monthly: €90.00 ││ ││ AFTER (Tiered routing + caps): ││ ───────────────────────────── ││ Daily: €0.50 - €1.50 ││ Weekly: €3.50 - €10.50 ││ Monthly: €15 - €45 ││ ││ Savings: 50-80% reduction ││ │└─────────────────────────────────────────────────────────────┘Common Mistakes to Avoid
Mistake 1: Using Premium for Every Step
WRONG: Planning → Claude Sonnet (expensive) Tool Selection → Claude Sonnet (expensive) Execution → Claude Sonnet (expensive)
RIGHT: Planning → Gemini Flash (free/budget) Tool Selection → Claude Haiku (cheap) Execution → Claude Sonnet (only for complex tasks)Planning doesn’t need Claude Sonnet. A cheap model can outline steps just fine.
Mistake 2: No Spending Caps
Without caps, a runaway task drains your credits. I learned this when a debugging loop ran for 30 minutes.
budget_controls: daily_cap: 2 # EUR monthly_cap: 30 # EURMistake 3: Ignoring the OpenRouter Dashboard
OpenRouter shows real-time usage. I ignored it for two days. Then I saw 72M tokens and panicked.
Check daily. Set alerts.
Mistake 4: Testing with Credits
Testing configurations burns tokens. I used €5 just debugging my model routing.
Fix: Use Ollama local models for testing. Switch to OpenRouter only for production work.
development: provider: "ollama" model: "llama3.1:8b"
production: provider: "openrouter" model_routing: [as configured above]Free Tier Priority Strategy
OpenRouter offers free tier models. Use them first:
┌─────────────────────────────────────────────────────────────┐│ ││ Model │ Free Limit │ Use Case ││ ───────────────────────────────────────────────────────── ││ google/gemini-flash │ Rate limited │ Planning, drafts ││ meta-llama/llama-3.1 │ Rate limited │ Simple tasks ││ mistral/mistral-7b │ Rate limited │ Basic reasoning ││ ││ Strategy: ││ 1. Try free tier first ││ 2. On rate limit → fallback to budget model ││ 3. On complexity → upgrade to premium ││ │└─────────────────────────────────────────────────────────────┘Here’s how I implement this:
model_routing: planning: primary: "google/gemini-flash" # Free tier fallback_1: "meta-llama/llama-3.1-8b-instruct" # Free tier backup fallback_2: "openai/gpt-3.5-turbo" # Paid, but cheap
execution: # Try free first, escalate based on complexity layers: - model: "google/gemini-flash" trigger: "always_try_first" cost: "free" - model: "anthropic/claude-3-haiku" trigger: "complexity > 4" cost: "cheap" - model: "anthropic/claude-3.5-sonnet" trigger: "complexity > 7" cost: "premium"Monthly Cost Projection
With proper configuration, realistic costs look like this:
┌─────────────────────────────────────────────────────────────┐│ ││ Usage Pattern │ Config │ Monthly Cost ││ ───────────────────────────────────────────────────────── ││ Light (5 tasks/day) │ Tiered │ €5-10 ││ Moderate (15 tasks/day) │ Tiered │ €15-25 ││ Heavy (50 tasks/day) │ Tiered │ €30-50 ││ Testing/Development │ Ollama local │ €0 (GPU) ││ ││ Compare to unoptimized: ││ Heavy without routing │ Premium only │ €90-150 ││ │└─────────────────────────────────────────────────────────────┘Summary
OpenRouter credits evaporate because agents make multiple calls per task. The fix:
- Tiered routing: Match model cost to task complexity
- Spending caps: Set daily/monthly limits with automatic fallback
- Free tier priority: Try free models first, escalate only when needed
- Local testing: Use Ollama for development, OpenRouter for production
With these changes, I cut my monthly cost from €90+ to €15-30 while maintaining agent capability.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 OpenRouter Documentation
- 👨💻 Hermes 3 Technical Report
- 👨💻 Hermes Agent Setup Guide
- 👨💻 Reddit - Hermes 3 Discussion
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments