Skip to content

How to Configure OpenRouter with Hermes Agent Without Burning Credits

€15. Gone in 48 hours.

I was testing my Hermes agent setup with OpenRouter as the backend. Two days later, my credits were depleted. I checked my usage stats: 72 million tokens for just testing the configuration.

This wasn’t a bug. It was my architecture.

Why Credits Disappear So Fast

Hermes agents don’t make single API calls. Each task triggers a cascade:

Agent Task Token Flow
┌─────────────────────────────────────────────────────────────┐
│ │
│ Task Request │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ Planning │ 1 call (~10k tokens) │
│ └─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ Tool Selection │ 1-3 calls (~5k each) │
│ └─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ Execution Loop │ 2-5 calls (~8k each) │
│ └─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ Result Synthesis │ 1-2 calls (~6k each) │
│ └─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ Error Recovery │ Additional calls if needed │
│ └─────────────────────┘ │
│ │
│ Total per task: 5-15 calls = 50-150k tokens │
│ │
└─────────────────────────────────────────────────────────────┘

At OpenRouter prices, this compounds rapidly:

Cost Projection (Unoptimized)
┌─────────────────────────────────────────────────────────────┐
│ │
│ 1 task = 5-15 calls = 50-150k tokens │
│ │
│ 10 tasks/day = 500k-1.5M tokens │
│ │
│ Using Claude-3.5-Sonnet via OpenRouter: │
│ Input: $3/M tokens, Output: $15/M tokens │
│ │
│ Daily cost: $0.50 - $3.00 │
│ Monthly cost: $15 - $90 │
│ │
│ My reality: €15 in 2 days (testing = worst case) │
│ │
└─────────────────────────────────────────────────────────────┘

I was using premium models for every step. That was mistake #1.

The Fix: Tiered Model Routing

The solution is to match model cost to task complexity:

Task Complexity vs. Model Tier
┌─────────────────────────────────────────────────────────────┐
│ │
│ Complexity Level │ Model Tier │ Example Models │
│ ───────────────────────────────────────────────────────── │
│ 1-3 (Simple) │ Free/Budget │ Gemini-Flash, │
│ │ │ GPT-3.5-Turbo │
│ ───────────────────────────────────────────────────────── │
│ 4-6 (Moderate) │ Mid-tier │ Claude-Haiku, │
│ │ │ Llama-3 │
│ ───────────────────────────────────────────────────────── │
│ 7-10 (Complex) │ Premium │ Claude-Sonnet, │
│ │ │ GPT-4o │
│ │
└─────────────────────────────────────────────────────────────┘

Here’s my optimized Hermes configuration:

hermes-config.yaml
provider: openrouter
# Model routing by task type
model_routing:
# Planning step: Use fast, cheap models
planning:
model: "google/gemini-flash-1.5"
fallback: "openai/gpt-3.5-turbo"
max_tokens: 8000
# Tool selection: Mid-tier for reliability
tool_selection:
model: "anthropic/claude-3-haiku"
fallback: "meta-llama/llama-3.1-8b-instruct"
max_tokens: 4000
# Execution reasoning: Depends on task
execution:
simple_tasks:
model: "google/gemini-flash-1.5"
trigger: complexity_score < 4
moderate_tasks:
model: "anthropic/claude-3-haiku"
trigger: complexity_score >= 4 and complexity_score < 7
complex_tasks:
model: "anthropic/claude-3.5-sonnet"
trigger: complexity_score >= 7
# Result synthesis: Keep it cheap
synthesis:
model: "openai/gpt-3.5-turbo"
fallback: "google/gemini-flash-1.5"
# Budget controls (critical!)
budget_controls:
daily_cap: 2.00 # EUR
monthly_cap: 30.00 # EUR
alert_threshold: 0.80 # Alert at 80% of cap
# Automatic fallback when capped
fallback_provider: "ollama"
fallback_model: "llama3.1:8b"
# Rate limit handling
rate_limits:
max_retries: 3
retry_delay: 30 # seconds
on_limit_hit: "switch_to_fallback"

Spending Caps: Your Safety Net

Without caps, you’re flying without a parachute. OpenRouter offers budget alerts, but Hermes needs explicit limits:

budget-controls.yaml
# OpenRouter settings (in OpenRouter dashboard)
openrouter_budget:
monthly_limit: 30 # USD
daily_limit: 2 # USD
notification_email: "[email protected]"
# Hermes internal tracking
hermes_budget:
track_tokens: true
log_usage: true
warn_at_percent: 70
stop_at_percent: 95
# What happens when limit reached
on_limit_reached:
action: "switch_to_local"
local_model: "ollama://llama3.1:8b"
notify: true

The key: automatic fallback to local models. When OpenRouter caps hit, Hermes switches to Ollama without crashing your workflow.

My Trial-and-Error Journey

I tried three approaches before settling on tiered routing:

Attempt 1: Premium Everything

Failed config - premium-only.yaml
model_routing:
default: "anthropic/claude-3.5-sonnet"
# Result: €15 in 2 days

Every task used Claude Sonnet. Even simple planning steps. Even error retries. The cost multiplied like compound interest.

Attempt 2: Free Tier Only

Failed config - free-only.yaml
model_routing:
default: "google/gemini-flash" # OpenRouter free tier
# Result: Rate limits within 30 minutes

OpenRouter free tier has strict limits. My agent workflow hit them constantly. Tasks failed mid-execution.

Attempt 3: Mixed with No Caps

Failed config - uncapped.yaml
model_routing:
planning: "google/gemini-flash"
execution: "anthropic/claude-3.5-sonnet"
# No budget controls
# Result: Still burned €10/week

I used cheap models for planning but premium for execution. Without caps, complex tasks still drained credits. And when free tier hit limits, everything defaulted to premium.

The Working Solution

After those failures, I implemented full tiered routing with caps:

Cost Comparison: Before vs After
┌─────────────────────────────────────────────────────────────┐
│ │
│ BEFORE (Premium everything): │
│ ───────────────────────────── │
│ Daily: €3.00 │
│ Weekly: €21.00 │
│ Monthly: €90.00 │
│ │
│ AFTER (Tiered routing + caps): │
│ ───────────────────────────── │
│ Daily: €0.50 - €1.50 │
│ Weekly: €3.50 - €10.50 │
│ Monthly: €15 - €45 │
│ │
│ Savings: 50-80% reduction │
│ │
└─────────────────────────────────────────────────────────────┘

Common Mistakes to Avoid

Mistake 1: Using Premium for Every Step

Wrong vs Right
WRONG:
Planning → Claude Sonnet (expensive)
Tool Selection → Claude Sonnet (expensive)
Execution → Claude Sonnet (expensive)
RIGHT:
Planning → Gemini Flash (free/budget)
Tool Selection → Claude Haiku (cheap)
Execution → Claude Sonnet (only for complex tasks)

Planning doesn’t need Claude Sonnet. A cheap model can outline steps just fine.

Mistake 2: No Spending Caps

Without caps, a runaway task drains your credits. I learned this when a debugging loop ran for 30 minutes.

Add this to your config
budget_controls:
daily_cap: 2 # EUR
monthly_cap: 30 # EUR

Mistake 3: Ignoring the OpenRouter Dashboard

OpenRouter shows real-time usage. I ignored it for two days. Then I saw 72M tokens and panicked.

Check daily. Set alerts.

Mistake 4: Testing with Credits

Testing configurations burns tokens. I used €5 just debugging my model routing.

Fix: Use Ollama local models for testing. Switch to OpenRouter only for production work.

Testing config
development:
provider: "ollama"
model: "llama3.1:8b"
production:
provider: "openrouter"
model_routing: [as configured above]

Free Tier Priority Strategy

OpenRouter offers free tier models. Use them first:

OpenRouter Free Tier Models
┌─────────────────────────────────────────────────────────────┐
│ │
│ Model │ Free Limit │ Use Case │
│ ───────────────────────────────────────────────────────── │
│ google/gemini-flash │ Rate limited │ Planning, drafts │
│ meta-llama/llama-3.1 │ Rate limited │ Simple tasks │
│ mistral/mistral-7b │ Rate limited │ Basic reasoning │
│ │
│ Strategy: │
│ 1. Try free tier first │
│ 2. On rate limit → fallback to budget model │
│ 3. On complexity → upgrade to premium │
│ │
└─────────────────────────────────────────────────────────────┘

Here’s how I implement this:

free-tier-priority.yaml
model_routing:
planning:
primary: "google/gemini-flash" # Free tier
fallback_1: "meta-llama/llama-3.1-8b-instruct" # Free tier backup
fallback_2: "openai/gpt-3.5-turbo" # Paid, but cheap
execution:
# Try free first, escalate based on complexity
layers:
- model: "google/gemini-flash"
trigger: "always_try_first"
cost: "free"
- model: "anthropic/claude-3-haiku"
trigger: "complexity > 4"
cost: "cheap"
- model: "anthropic/claude-3.5-sonnet"
trigger: "complexity > 7"
cost: "premium"

Monthly Cost Projection

With proper configuration, realistic costs look like this:

Projected Monthly Costs
┌─────────────────────────────────────────────────────────────┐
│ │
│ Usage Pattern │ Config │ Monthly Cost │
│ ───────────────────────────────────────────────────────── │
│ Light (5 tasks/day) │ Tiered │ €5-10 │
│ Moderate (15 tasks/day) │ Tiered │ €15-25 │
│ Heavy (50 tasks/day) │ Tiered │ €30-50 │
│ Testing/Development │ Ollama local │ €0 (GPU) │
│ │
│ Compare to unoptimized: │
│ Heavy without routing │ Premium only │ €90-150 │
│ │
└─────────────────────────────────────────────────────────────┘

Summary

OpenRouter credits evaporate because agents make multiple calls per task. The fix:

  1. Tiered routing: Match model cost to task complexity
  2. Spending caps: Set daily/monthly limits with automatic fallback
  3. Free tier priority: Try free models first, escalate only when needed
  4. Local testing: Use Ollama for development, OpenRouter for production

With these changes, I cut my monthly cost from €90+ to €15-30 while maintaining agent capability.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments