How Much Does Claude Opus API Cost? (Plus 5 Ways to Slash Your Bill)

Mar 11, 2026

The Problem: My First $47 API Bill

I opened my Anthropic billing dashboard and stared at the number: $47.38 for one week of Claude Opus usage.

That’s nearly $200/month if I kept this pace. For a personal project.

The worst part? I didn’t even realize how fast tokens were stacking up. One complex conversation with code analysis cost me $3.42. A debugging session that ran through 15 iterations burned another $8.

I found a Reddit thread on r/clawdbot where someone asked the same question:

“how are you not burning a fortune with opus?”

Another user replied with practical advice:

“take my advice and host openclaw on a 5$ VPS first for 20 months with a hard limited API or opus”

This was my wake-up call. I needed to understand exactly what I was paying for and how to control it.

Understanding Claude Opus Pricing

Let me start with the raw numbers. Claude Opus 4.6 is Anthropic’s most capable model, and you pay for that capability:

Input tokens:  $15 per million
Output tokens: $75 per million

For comparison with other Claude tiers:

Model           | Input ($/M) | Output ($/M) | Relative Cost
----------------|-------------|--------------|---------------
Claude Haiku    | $0.25       | $1.25        | 1x (cheapest)
Claude Sonnet   | $3          | $15          | 12x
Claude Opus     | $15         | $75          | 60x

The price difference is massive. Opus costs 60x more than Haiku per token. This means every optimization matters.

What Actually Costs Money

Here’s what I didn’t realize at first: the API charges for everything.

Conversation Component       | Token Cost Impact
-----------------------------|-------------------
System prompts               | Every request (unless cached)
Conversation history         | Re-sent every turn, so cost compounds with length
Failed/retried requests      | You pay even when it fails
Structured output (JSON)     | Extra tokens for formatting
Code in responses            | Output tokens add up fast

A single conversation with 10 back-and-forth messages can easily consume 50,000+ tokens. At Opus pricing, that’s $0.75 just for context before you even get a useful response.

Step 1: Model Tier Selection (The 70% Savings)

My first discovery: I was using Opus for everything. Simple “what is” questions. List formatting. Basic code reviews.

This was wasteful. I analyzed my last 100 API calls and found:

Task Type                  | % of Calls | Opus Needed?
---------------------------|------------|-------------
Simple queries             | 45%        | No
Medium complexity tasks    | 35%        | Sometimes
Complex reasoning          | 20%        | Yes

Roughly 70% of my calls didn’t need Opus: all of the simple ones and most of the medium ones. I could route them to cheaper models.
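As a rough sanity check on how much routing could save, the call mix above can be turned into a blended input price. This is only a sketch under a strong assumption that is not backed by the call data: every call uses the same number of tokens, and every medium task goes to Sonnet. Complex calls usually carry far more tokens, so real savings tend to land below this upper bound. The names `PRICES`, `MIX`, and `blended_input_price` are illustrative.

```python
# Input prices per million tokens, taken from the pricing table above.
PRICES = {"haiku": 0.25, "sonnet": 3.0, "opus": 15.0}

# Share of calls per tier, taken from the call analysis above
# (ASSUMED mapping: simple -> Haiku, medium -> Sonnet, complex -> Opus).
MIX = {"haiku": 0.45, "sonnet": 0.35, "opus": 0.20}


def blended_input_price(mix: dict, prices: dict) -> float:
    """Average $/M input tokens, ASSUMING equal token volume per call."""
    return sum(share * prices[tier] for tier, share in mix.items())


blended = blended_input_price(MIX, PRICES)
savings = 1 - blended / PRICES["opus"]
print(f"Blended input price: ${blended:.4f}/M")
print(f"Upper-bound savings vs all-Opus: {savings:.0%}")
```

Under these assumptions the blended price comes out near $4.16/M, about 72% below the all-Opus price. Treat that as a ceiling rather than a forecast.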

Building a Model Router

I created a simple routing function:

def route_query(query: str) -> str:
    """Route to appropriate model based on query complexity."""

    # Simple heuristics for routing
    simple_indicators = [
        len(query) < 100,
        any(word in query.lower() for word in ["what is", "define", "list"]),
        query.count("?") == 1
    ]

    complex_indicators = [
        len(query) > 500,
        any(word in query.lower() for word in ["analyze", "compare", "reasoning"]),
        "step by step" in query.lower()
    ]

    if sum(simple_indicators) >= 2:
        return "claude-haiku-4-5-20250514"  # $0.25/$1.25 per M tokens
    elif sum(complex_indicators) >= 2:
        return "claude-opus-4-6-20250514"   # $15/$75 per M tokens
    else:
        return "claude-sonnet-4-5-20250514"  # $3/$15 per M tokens

This naive routing cut my bill by 40% in the first week. Not perfect, but a start.

Before routing: $47.38/week
After routing:  $28.43/week
Savings:        40%

Breakdown:
- Haiku:  45% of calls @ $0.25/$1.25 = $2.10
- Sonnet: 35% of calls @ $3/$15     = $12.50
- Opus:   20% of calls @ $15/$75     = $13.83
Total: $28.43

Step 2: Prompt Caching (The 90% Savings on Repeated Context)

My next discovery: Anthropic offers prompt caching for repeated instructions.

When you send the same system prompt across multiple requests, you can cache it and get 90% off those tokens. This is huge for:

System prompts you reuse
Few-shot examples
Long context documents

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def cached_completion(system_prompt: str, user_query: str):
    """Use prompt caching for repeated system instructions."""

    response = client.messages.create(
        model="claude-opus-4-6-20250514",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": system_prompt,
                # Mark the system prompt as cacheable
                "cache_control": {"type": "ephemeral"}
            }
        ],
        messages=[{"role": "user", "content": user_query}]
    )

    # Tokens read from the cache cost 90% less than the $15/M input price.
    # The field can be None when nothing was read from the cache.
    cached_tokens = response.usage.cache_read_input_tokens or 0
    cached_savings = cached_tokens * 0.9 * 15 / 1_000_000

    return {
        "response": response.content[0].text,
        "cache_savings_usd": round(cached_savings, 4)
    }

For my code review bot with a 2,000-token system prompt:

Without caching:
- 2,000 tokens × $15/M × 100 requests = $3.00

With caching (90% off on reads):
- First request (cache write, billed at 1.25x): 2,000 tokens × $15/M × 1.25 = $0.0375
- Next 99 requests (cache reads): 99 × 2,000 tokens × $15/M × 0.1 = $0.297
- Total: about $0.33

Savings: about 89%

The cache entry lives for 5 minutes by default, and the timer resets each time the entry is read, so a steady stream of API calls keeps benefiting.
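The caching arithmetic can be wrapped in a small calculator for trying other prompt sizes and request counts. This is a sketch, not billing logic: the function names are made up for illustration, the 1.25x write multiplier reflects the 5-minute cache-write premium as I understand it (check the pricing page), and it assumes every request lands while the cache entry is still alive.

```python
def cost_without_cache(prompt_tokens: int, requests: int, price_per_m: float) -> float:
    """Cost of sending the same prompt at the full input price every time."""
    return requests * prompt_tokens * price_per_m / 1_000_000


def cost_with_cache(
    prompt_tokens: int,
    requests: int,
    price_per_m: float,
    write_multiplier: float = 1.25,  # ASSUMED premium for the first (cache-write) request
    read_multiplier: float = 0.10,   # cache reads billed at 10% of the input price
) -> float:
    """Cost when the first request writes the cache and every later one reads it.

    Assumes all requests land inside the cache lifetime, so there is one write.
    """
    if requests <= 0:
        return 0.0
    base = prompt_tokens * price_per_m / 1_000_000
    return base * write_multiplier + (requests - 1) * base * read_multiplier


full = cost_without_cache(2_000, 100, 15)
cached = cost_with_cache(2_000, 100, 15)
print(f"Without caching: ${full:.2f}")
print(f"With caching:    ${cached:.2f}")
print(f"Savings:         {1 - cached / full:.0%}")
```

If requests are spaced further apart than the cache lifetime, each one becomes a write again and the savings disappear, so the spacing of calls matters as much as the prompt size.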

Step 3: Token Optimization (The Fine-Tuning)

Beyond model selection and caching, I found several token reduction techniques:

Technique                  | Savings  | Trade-off
---------------------------|----------|------------------
Shorter system prompts     | 10-30%   | Less context
Remove conversation history| 20-50%   | No context retention
Summarize vs full context  | 40-60%   | Information loss
Structured output (JSON)   | 5-15%    | Format constraints

The biggest culprit was conversation history. Each message in a conversation gets re-sent with every new request:

Message 1:  1,000 tokens  (user) + 500 tokens  (assistant) = 1,500 total
Message 2:  1,500 (history) + 1,000 (new) + 500 = 3,000 total
Message 3:  3,000 (history) + 1,000 (new) + 500 = 4,500 total
Message 4:  4,500 (history) + 1,000 (new) + 500 = 6,000 total

Each request grows linearly, so the cumulative tokens billed grow QUADRATICALLY with conversation length
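The growth pattern in that example is easy to reproduce. The sketch below mirrors the same accounting (1,000 user tokens plus 500 assistant tokens per exchange, full history re-sent each turn); `per_request_tokens` is an illustrative name. Per-turn size rises in a straight line, while the running total rises with roughly the square of the number of turns.

```python
def per_request_tokens(turns: int, user_tokens: int = 1_000, assistant_tokens: int = 500) -> list:
    """Tokens processed on each turn when the full history is re-sent.

    Mirrors the accounting in the example above: turn n carries the
    n - 1 earlier exchanges plus the new user message and reply.
    """
    exchange = user_tokens + assistant_tokens
    return [exchange * n for n in range(1, turns + 1)]


per_turn = per_request_tokens(4)
print(per_turn)        # tokens on each turn
print(sum(per_turn))   # cumulative tokens billed across the conversation
```

For four turns this gives 1,500, 3,000, 4,500 and 6,000 tokens, or 15,000 in total. Doubling the conversation length roughly quadruples the total.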

I started truncating or summarizing old messages:

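def count_tokens(text: str) -> int:
    """Rough estimate (~4 characters per token); an approximation, not the real tokenizer."""
    return max(1, len(text) // 4)


def truncate_history(messages: list, max_tokens: int = 10000) -> list:
    """Keep only the most recent messages that fit within max_tokens."""
    truncated = []
    total_tokens = 0

    # Work backwards, keeping most recent messages
    for msg in reversed(messages):
        msg_tokens = count_tokens(msg["content"])
        if total_tokens + msg_tokens > max_tokens:
            break
        truncated.append(msg)
        total_tokens += msg_tokens

    truncated.reverse()  # restore chronological order

    # The Messages API expects the conversation to start with a user message
    while truncated and truncated[0]["role"] != "user":
        truncated.pop(0)

    return truncated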
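Summarizing is the other option mentioned above. A minimal sketch, assuming a `summarize` callable is supplied by the caller: it collapses everything except the last few messages into a single summary message. `compact_history` and `naive_summary` are hypothetical names, and the stand-in summarizer only counts messages. A real one might call a cheap model.

```python
from typing import Callable


def compact_history(
    messages: list,
    keep_last: int,
    summarize: Callable[[list], str],
) -> list:
    """Replace everything except the last `keep_last` messages with one summary."""
    if keep_last <= 0 or len(messages) <= keep_last:
        return list(messages)

    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = {
        "role": "user",
        "content": f"Summary of the earlier conversation: {summarize(old)}",
    }
    return [summary] + recent


# Stand-in summarizer for illustration only.
def naive_summary(old: list) -> str:
    return f"{len(old)} earlier messages omitted."


history = [
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "A1"},
    {"role": "user", "content": "Q2"},
    {"role": "assistant", "content": "A2"},
    {"role": "user", "content": "Q3"},
]
compacted = compact_history(history, keep_last=2, summarize=naive_summary)
print(compacted)
```

The trade-off is the one listed in the table: the summary is much shorter than the messages it replaces, but any detail the summarizer drops is gone for the rest of the conversation.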

Step 4: Budget Limits (The Hard Stop)

The Reddit advice was clear: set hard limits before you need them.

I built a budget tracker that stops API calls when limits are hit:

from collections import defaultdict
from datetime import datetime

class BudgetTracker:
    def __init__(self, daily_limit: float = 50.0, alert_threshold: float = 30.0):
        self.daily_limit = daily_limit
        self.alert_threshold = alert_threshold
        self.daily_spend = defaultdict(float)

    def track_request(self, cost: float) -> bool:
        """Returns False if budget exceeded."""
        today = datetime.now().date().isoformat()
        self.daily_spend[today] += cost

        if self.daily_spend[today] >= self.alert_threshold:
            print(f"Alert: ${self.daily_spend[today]:.2f} spent today")

        if self.daily_spend[today] >= self.daily_limit:
            print(f"Budget exceeded: ${self.daily_spend[today]:.2f}")
            return False

        return True

    def get_remaining_budget(self) -> float:
        today = datetime.now().date().isoformat()
        return max(0, self.daily_limit - self.daily_spend[today])

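And integrated it into my API calls (get_completion_with_cost is the logging helper shown under Mistake 3):

tracker = BudgetTracker(daily_limit=50.0, alert_threshold=30.0)

def safe_api_call(prompt: str):
    # Refuse up front once today's budget is spent, before paying for another call
    if tracker.get_remaining_budget() <= 0:
        raise RuntimeError("Daily budget exceeded!")
    result = get_completion_with_cost(prompt)
    tracker.track_request(result["cost_usd"])  # record what this call cost
    return result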

This prevents the “$500 surprise bill” scenario.
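A hard stop works best when it runs before the money is spent. One way to do that, sketched here under assumptions, is to estimate the worst-case cost of a request (all input tokens plus a reply that uses the full `max_tokens`) and compare it with the remaining budget. `worst_case_cost` and `can_afford` are illustrative names, the prices are the ones from the pricing table, and the input token count is assumed to come from an estimate made before the call.

```python
# $/M tokens as (input, output), from the pricing table above.
PRICES = {"haiku": (0.25, 1.25), "sonnet": (3, 15), "opus": (15, 75)}


def worst_case_cost(input_tokens: int, max_tokens: int, tier: str = "opus") -> float:
    """Upper bound on one request's cost: all input plus a full-length reply."""
    input_price, output_price = PRICES[tier]
    return (input_tokens * input_price + max_tokens * output_price) / 1_000_000


def can_afford(remaining_budget: float, input_tokens: int, max_tokens: int, tier: str = "opus") -> bool:
    """True if even the most expensive outcome fits in the remaining budget."""
    return worst_case_cost(input_tokens, max_tokens, tier) <= remaining_budget


print(worst_case_cost(10_000, 1_024))
print(can_afford(0.10, 10_000, 1_024))
```

A check like this could sit in front of the API call, with the remaining budget supplied by the tracker's get_remaining_budget method.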

Step 5: Architectural Patterns (The Systemic Fix)

The final piece was designing systems that minimize Opus usage by default.

The Router Pattern

User Query
    │
    ▼
Classifier (Haiku - cheap)
    │
    ├─── Simple Task ────► Haiku ($0.25/$1.25)
    │
    ├─── Medium Task ────► Sonnet ($3/$15)
    │
    └─── Complex Task ───► Opus ($15/$75)

Instead of sending everything to Opus, route based on complexity. A small cost calculator shows what each tier charges for the same request:

def calculate_cost(input_tokens: int, output_tokens: int, model: str = "opus"):
    """Calculate actual cost in USD."""
    prices = {
        "haiku": (0.25, 1.25),
        "sonnet": (3, 15),
        "opus": (15, 75)
    }

    input_price, output_price = prices.get(model, (15, 75))
    input_cost = (input_tokens / 1_000_000) * input_price
    output_cost = (output_tokens / 1_000_000) * output_price

    return round(input_cost + output_cost, 4)

# Example:
# Haiku:  10K input + 2K output = $0.005
# Opus:   10K input + 2K output = $0.30
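The classifier step in the router diagram could look something like the sketch below: a cheap model is asked for a one-word difficulty label, and the label picks the tier. The prompt wording, the label set, and the names `classify`, `pick_tier`, and `stub_client` are all assumptions for illustration. The stub mimics only the response shape of `client.messages.create` so the example runs offline. With the real SDK you would pass an `anthropic.Anthropic()` client and your Haiku model ID instead.

```python
from types import SimpleNamespace

MODEL_BY_LABEL = {
    "simple": "haiku",
    "medium": "sonnet",
    "complex": "opus",
}

CLASSIFIER_PROMPT = (
    "Classify the difficulty of the following request. "
    "Answer with exactly one word: simple, medium, or complex.\n\n"
)


def classify(client, query: str, classifier_model: str) -> str:
    """Ask a cheap model for a one-word difficulty label."""
    response = client.messages.create(
        model=classifier_model,
        max_tokens=5,
        messages=[{"role": "user", "content": CLASSIFIER_PROMPT + query}],
    )
    return response.content[0].text.strip().lower()


def pick_tier(client, query: str, classifier_model: str) -> str:
    """Map the label to a tier; unknown labels fall back to Sonnet."""
    label = classify(client, query, classifier_model)
    return MODEL_BY_LABEL.get(label, "sonnet")


def stub_client(reply: str):
    """Offline stand-in that mimics the shape of a Messages API response."""
    def create(**kwargs):
        return SimpleNamespace(content=[SimpleNamespace(text=reply)])
    return SimpleNamespace(messages=SimpleNamespace(create=create))


print(pick_tier(stub_client("Simple"), "What is Docker?", "classifier-model-id"))
```

The classifier call itself costs tokens, so this only pays off when the label is short and the classifier model is much cheaper than the model it keeps you away from.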

Batch Processing

For non-urgent tasks, I accumulate requests and process them in batches:

Instead of:
- 100 individual requests throughout the day
- Each request: full context + overhead

Do this:
- Accumulate 100 requests
- Process in one batch call with shared context
- Distribute results

Savings: Reduced context repetition, better cache utilization
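A minimal sketch of the accumulate-then-submit idea, assuming the queued prompts share one system prompt. It only builds the request list, in the shape that Anthropic's Message Batches API accepts as far as I know (a `custom_id` plus the usual message `params`). `build_batch_requests` is an illustrative name and the model ID is a placeholder.

```python
def build_batch_requests(prompts: list, model: str, system_prompt: str, max_tokens: int = 1024) -> list:
    """Turn queued prompts into one list of batch entries that share a system prompt."""
    return [
        {
            "custom_id": f"request-{i}",  # used to match results back to prompts
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "system": system_prompt,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]


queued = ["Summarize file A", "Summarize file B"]
batch = build_batch_requests(queued, "your-model-id", "You are a code reviewer.")
print(len(batch), batch[0]["custom_id"])

# Submitting is left out so the sketch runs offline. With the anthropic SDK it
# would look roughly like: client.messages.batches.create(requests=batch)
```

Batch results arrive asynchronously rather than immediately, which is why this only suits the non-urgent work described above.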

The Results: My Optimized Stack

After implementing all five strategies, my weekly costs dropped significantly:

Strategy              | Weekly Savings | Weekly Cost After
----------------------|----------------|-------------------
Model routing         | $19 (40%)      | $28.43/week
Prompt caching        | $5 (18%)       | $23.43/week
Token optimization    | $3 (13%)       | $20.43/week
Budget limits         | Preventive     | N/A
Architectural changes | $5 (25%)       | $15.43/week

Total reduction: $47.38 → $15.43/week (67% savings)

Common Mistakes I Made

Mistake 1: Using Opus for everything

My first week, every API call went to Opus. About 70% of those calls could have gone to Haiku or Sonnet, which cost 60x and 5x less per token.

Mistake 2: Ignoring prompt caching

I sent the same 3,000-token system prompt 50 times a day. That’s $2.25/day in system prompts alone. With caching: $0.23/day.

Mistake 3: No usage monitoring

I didn’t track costs per request. When the bill arrived, I had no idea which conversations were expensive. Now I log every call:

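import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Per-million-token prices (input, output) from the pricing table above
PRICES = {"haiku": (0.25, 1.25), "sonnet": (3, 15), "opus": (15, 75)}

def get_completion_with_cost(prompt: str, model: str = "claude-opus-4-6-20250514"):
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )

    # Pick prices by the tier named in the model ID; fall back to Opus (the priciest)
    tier = next((name for name in PRICES if name in model), "opus")
    input_price, output_price = PRICES[tier]
    input_cost = (response.usage.input_tokens / 1_000_000) * input_price
    output_cost = (response.usage.output_tokens / 1_000_000) * output_price
    total_cost = input_cost + output_cost

    return {
        "response": response.content[0].text,
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        "cost_usd": round(total_cost, 4)
    }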
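Returning the cost is only half of monitoring. To see which conversations are expensive, the per-call numbers need to be stored and grouped. The sketch below appends each result to a JSON Lines file and sums cost per conversation. The file layout, the `conversation_id` field, and the function names are assumptions for illustration, and the demo uses made-up numbers in a temporary directory.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def log_call(path: Path, conversation_id: str, result: dict) -> None:
    """Append one call's token counts and cost as a JSON line."""
    record = {
        "conversation_id": conversation_id,
        "input_tokens": result["input_tokens"],
        "output_tokens": result["output_tokens"],
        "cost_usd": result["cost_usd"],
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def cost_by_conversation(path: Path) -> dict:
    """Sum logged cost per conversation to find the expensive ones."""
    totals = defaultdict(float)
    with path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            totals[record["conversation_id"]] += record["cost_usd"]
    return dict(totals)


# Demo with made-up numbers, written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "calls.jsonl"
    log_call(log_path, "debug-session", {"input_tokens": 1000, "output_tokens": 200, "cost_usd": 0.03})
    log_call(log_path, "debug-session", {"input_tokens": 2000, "output_tokens": 200, "cost_usd": 0.045})
    log_call(log_path, "quick-question", {"input_tokens": 50, "output_tokens": 20, "cost_usd": 0.001})
    totals = cost_by_conversation(log_path)

print(totals)
```

The dictionary returned by the cost helper above already has the fields `log_call` expects, so the two can be chained directly.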

Mistake 4: Full conversation history

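I kept the entire conversation context for every request. Each request late in a 20-message conversation carries roughly 10x the context of one in a 2-message conversation, and because every earlier turn was billed too, the cumulative cost gap is even larger.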

Mistake 5: No budget limits

The API will happily process unlimited requests. Without hard limits, a runaway script or unexpected usage pattern can drain your budget in hours.

When Opus Is Worth the Cost

After all this optimization, when do I actually use Opus?

Task Type                    | Example                          | Cost Justification
-----------------------------|----------------------------------|--------------------
Complex code architecture    | "Design a microservices system"  | One $3 call saves hours of work
Deep analysis                | "Debug this race condition"      | Quality matters more than cost
Multi-step reasoning         | "Plan a database migration"      | Opus handles complexity better
Creative writing             | Long-form technical content      | Output quality justifies cost

Tasks I route AWAY from Opus:

Task Type                    | Example                          | Cheaper Option
-----------------------------|----------------------------------|--------------------
Simple queries               | "What is Docker?"                | Haiku: $0.001 vs Opus: $0.03
Formatting                   | "Convert this to JSON"           | Sonnet: $0.02 vs Opus: $0.10
Short reviews                | "Review this 10-line function"   | Sonnet handles this well

If you’re optimizing AI costs, you might also want to explore:

OpenAI API Pricing - Similar optimization strategies apply to GPT models
Local LLM Deployment - Running models on your own hardware for zero per-token cost
Prompt Engineering - Better prompts mean fewer tokens needed
Anthropic’s Prompt Caching Docs - Official documentation on caching implementation

References

Anthropic API Pricing - Official pricing page
Prompt Caching Documentation - How to implement caching
Reddit Discussion on r/clawdbot - Community experiences with Opus costs

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can reach me by email.
