Why Does Claude Code's $100 Plan Feel More Restrictive Than Codex's $20?

Mar 28, 2026

I stared at my screen in disbelief. My Claude Code subscription—$100 per month for the Max plan—had just cut me off mid-refactoring session. Meanwhile, my colleague with his $20 Codex subscription was still coding away happily.

What’s going on here?

The Token Mystery: Where Did They All Go?

I started tracking my usage obsessively. Every morning, I’d check my token count, code for a few hours, and watch the numbers plummet faster than I expected.

from datetime import datetime

class TokenTracker:
    def __init__(self, daily_allowance: int = 200000):
        self.daily_allowance = daily_allowance
        self.sessions = []

    def log_session(self, model: str, duration_minutes: int, tokens_used: int):
        """Log a coding session and its token consumption."""
        session = {
            "timestamp": datetime.now().isoformat(),
            "model": model,
            "duration_minutes": duration_minutes,
            "tokens_used": tokens_used,
            "tokens_per_minute": tokens_used / duration_minutes
        }
        self.sessions.append(session)
        return session

    def get_consumption_rate(self, model: str) -> float:
        """Calculate average token consumption rate for a model."""
        model_sessions = [s for s in self.sessions if s["model"] == model]
        if not model_sessions:
            return 0.0

        total_tokens = sum(s["tokens_used"] for s in model_sessions)
        total_minutes = sum(s["duration_minutes"] for s in model_sessions)

        return total_tokens / total_minutes if total_minutes > 0 else 0.0

# My actual usage data from a week of coding
tracker = TokenTracker(daily_allowance=200000)

# Sonnet sessions - reasonable consumption
tracker.log_session("sonnet", 45, 35000)
tracker.log_session("sonnet", 30, 22000)

# Opus session - wait, what?!
tracker.log_session("opus", 60, 180000)  # That's 90% of my daily limit!

print(f"Sonnet avg: {tracker.get_consumption_rate('sonnet'):.0f} tokens/min")
print(f"Opus avg: {tracker.get_consumption_rate('opus'):.0f} tokens/min")

The results shocked me. In these sessions, Opus burned through tokens at roughly 4x the rate of Sonnet (about 3,000 tokens per minute versus 760). A single hour-long debugging session with Opus consumed 90% of my daily allowance.
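To make the comparison explicit, the per-minute rates from the three sessions logged above can be divided directly. This is a minimal sketch that reuses only those three data points; the helper name `rate` is mine and not part of any tool.

```python
# Sessions from the log above: (model, minutes, tokens)
sessions = [
    ("sonnet", 45, 35_000),
    ("sonnet", 30, 22_000),
    ("opus", 60, 180_000),
]

def rate(model: str) -> float:
    """Average tokens per minute across all sessions for one model."""
    rows = [(minutes, tokens) for name, minutes, tokens in sessions if name == model]
    total_minutes = sum(minutes for minutes, _ in rows)
    total_tokens = sum(tokens for _, tokens in rows)
    return total_tokens / total_minutes if total_minutes else 0.0

sonnet_rate = rate("sonnet")   # 57,000 tokens over 75 minutes
opus_rate = rate("opus")       # 180,000 tokens over 60 minutes
ratio = opus_rate / sonnet_rate

print(f"Sonnet: {sonnet_rate:.0f} tokens/min")
print(f"Opus:   {opus_rate:.0f} tokens/min")
print(f"Opus / Sonnet: {ratio:.1f}x")
```

Three sessions is a tiny sample, so treat the ratio as a rough indication rather than a measured constant.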

Peak Hours: The Hidden Throttling

Then I noticed something else. My token counts seemed to vanish even faster during certain times of day.

I started documenting when I hit limits:

from collections import defaultdict
from datetime import datetime, time

class PeakHoursAnalyzer:
    def __init__(self):
        self.limit_hits = defaultdict(list)

    def log_limit_event(self, timestamp: datetime, tokens_remaining: int):
        """Log when we hit a limit event."""
        hour = timestamp.hour
        self.limit_hits[hour].append({
            "time": timestamp,
            "remaining": tokens_remaining
        })

    def analyze_peak_hours(self) -> dict:
        """Analyze when we're most likely to hit limits."""
        hour_counts = {}
        for hour, events in self.limit_hits.items():
            hour_counts[hour] = len(events)

        # US business hours: 9am-6pm EST = 14:00-23:00 UTC
        us_business_hours = range(14, 23)
        peak_limit_count = sum(
            hour_counts.get(h, 0) for h in us_business_hours
        )

        return {
            "hour_distribution": hour_counts,
            "us_business_hours_total": peak_limit_count,
            "total_events": sum(hour_counts.values())
        }

# My data from two weeks
analyzer = PeakHoursAnalyzer()

# Most limit hits happened during these hours
# (timestamps are UTC; US Eastern was on daylight time, UTC-4, on these dates)
events = [
    (datetime(2026, 3, 14, 15, 30), 0),   # 11:30am EDT
    (datetime(2026, 3, 14, 16, 45), 0),   # 12:45pm EDT
    (datetime(2026, 3, 15, 18, 20), 0),   # 2:20pm EDT
    (datetime(2026, 3, 16, 15, 10), 0),   # 11:10am EDT
    (datetime(2026, 3, 18, 21, 30), 0),   # 5:30pm EDT
]

for ts, remaining in events:
    analyzer.log_limit_event(ts, remaining)

results = analyzer.analyze_peak_hours()
print(f"US business hours limit hits: {results['us_business_hours_total']}")
print(f"Total limit events: {results['total_events']}")
print(f"Business hours percentage: {results['us_business_hours_total']/results['total_events']*100:.0f}%")

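The five events listed above all fall inside US business hours, so the script prints 100% for this sample. Across my full two weeks of logs, 80% of my limit hits occurred during US business hours (9am-6pm EST). That’s exactly when I needed Claude Code most.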

Meanwhile, my European colleague—who works evening hours in his timezone—rarely hit limits. The “peak hours” throttling is real, and it’s brutal if you’re a US-based developer.
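Whether a hit counts as "business hours" depends on which UTC offset you apply, and US Eastern switches between UTC-5 (EST) and UTC-4 (EDT) during the year. The sketch below checks the same five timestamps under both fixed offsets. It assumes the logged timestamps are UTC, and the function name `in_business_hours` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def in_business_hours(ts_utc: datetime, utc_offset_hours: int,
                      start: int = 9, end: int = 18) -> bool:
    """True if a UTC timestamp falls in [start, end) local time at a fixed offset."""
    if ts_utc.tzinfo is None:
        ts_utc = ts_utc.replace(tzinfo=timezone.utc)  # treat naive values as UTC
    local = ts_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return start <= local.hour < end

# The five limit hits logged above, assumed to be UTC timestamps
limit_hits_utc = [
    datetime(2026, 3, 14, 15, 30),
    datetime(2026, 3, 14, 16, 45),
    datetime(2026, 3, 15, 18, 20),
    datetime(2026, 3, 16, 15, 10),
    datetime(2026, 3, 18, 21, 30),
]

# Fixed offsets only: -5 is EST, -4 is EDT.
# zoneinfo.ZoneInfo("America/New_York") would pick the right one per date.
counts = {}
for label, offset in [("EST", -5), ("EDT", -4)]:
    counts[label] = sum(in_business_hours(ts, offset) for ts in limit_hits_utc)
    print(f"{label}: {counts[label]} of {len(limit_hits_utc)} hits in 9am-6pm")
```

For this sample the choice of offset does not change the count, but it can for hits near 9am or 6pm.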

The Model Tier Penalty

I experimented with different model combinations to understand the token consumption:

class ModelTokenComparison:
    """Compare token costs across Claude models."""

    # Approximate token multipliers relative to Haiku
    MODEL_MULTIPLIERS = {
        "haiku": 1.0,
        "sonnet": 3.0,
        "opus": 10.0
    }

    def __init__(self, daily_token_budget: int = 200000):
        self.budget = daily_token_budget

    def estimate_sessions(self, model: str, avg_session_tokens: int = 5000) -> dict:
        """Estimate how many coding sessions you can do per model."""
        effective_cost = avg_session_tokens * self.MODEL_MULTIPLIERS[model]
        # effective_cost is a float, so cast to int to get a whole session count
        sessions = int(self.budget // effective_cost)

        return {
            "model": model,
            "multiplier": self.MODEL_MULTIPLIERS[model],
            "effective_cost_per_session": effective_cost,
            "estimated_sessions": sessions,
            "hours_of_coding": sessions * 0.5  # Assuming 30min sessions
        }

    def compare_all(self):
        """Compare all models side-by-side."""
        results = []
        for model in self.MODEL_MULTIPLIERS:
            results.append(self.estimate_sessions(model))
        return results

comparison = ModelTokenComparison(daily_token_budget=200000)

print("Daily Token Budget: 200,000")
print("-" * 60)
for result in comparison.compare_all():
    print(f"{result['model'].upper():8} | "
          f"Multiplier: {result['multiplier']:4.1f}x | "
          f"Sessions: {result['estimated_sessions']:3} | "
          f"Hours: {result['hours_of_coding']:.1f}h")

The output told the story:

Daily Token Budget: 200,000
------------------------------------------------------------
HAIKU    | Multiplier:  1.0x | Sessions:  40 | Hours: 20.0h
SONNET   | Multiplier:  3.0x | Sessions:  13 | Hours:  6.5h
OPUS     | Multiplier: 10.0x | Sessions:   4 | Hours:  2.0h

If I stuck to Haiku, I could code for 20 hours. With Opus? Just 2 hours. The $100 Max plan becomes essentially useless for extended Opus sessions.

The No-Rollover Problem

Here’s another frustration: unused tokens don’t carry over.

On days when I had meetings or focused on non-AI-assisted tasks, my unused allowance simply expired. Come the weekend, when I wanted to do a deep coding marathon, I still only had the standard daily allowance.

from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class DailyUsage:
    date: date
    allowance: int
    used: int
    wasted: int  # allowance - used, but doesn't carry over

    @property
    def utilization(self) -> float:
        return self.used / self.allowance if self.allowance > 0 else 0

class RolloverSimulator:
    """Demonstrate the no-rollover policy impact."""

    def __init__(self, daily_allowance: int = 200000):
        self.daily_allowance = daily_allowance
        self.weekly_usage = []

    def simulate_week(self, daily_usage_hours: list[int]):
        """Simulate a week's worth of usage.

        Args:
            daily_usage_hours: List of 7 integers representing coding hours per day
        """
        start_date = date(2026, 3, 23)  # Monday

        for day_offset, hours in enumerate(daily_usage_hours):
            current_date = start_date + timedelta(days=day_offset)
            # Assume 20000 tokens consumed per hour of active coding
            used = min(hours * 20000, self.daily_allowance)
            wasted = self.daily_allowance - used

            self.weekly_usage.append(DailyUsage(
                date=current_date,
                allowance=self.daily_allowance,
                used=used,
                wasted=wasted
            ))

    def calculate_totals(self) -> dict:
        """Calculate weekly totals."""
        total_used = sum(d.used for d in self.weekly_usage)
        total_wasted = sum(d.wasted for d in self.weekly_usage)
        total_allowance = sum(d.allowance for d in self.weekly_usage)

        return {
            "total_tokens_available": total_allowance,
            "total_tokens_used": total_used,
            "total_tokens_wasted": total_wasted,
            "effective_utilization": total_used / total_allowance
        }

# My actual week - meeting-heavy Mon-Thu, coding marathon Sat-Sun
sim = RolloverSimulator(daily_allowance=200000)
sim.simulate_week([
    1,  # Monday - meetings
    2,  # Tuesday - planning
    1,  # Wednesday - code review
    0,  # Thursday - all-day meeting
    4,  # Friday - light coding
    8,  # Saturday - marathon
    8,  # Sunday - marathon
])

totals = sim.calculate_totals()
print(f"Weekly Allowance: {totals['total_tokens_available']:,}")
print(f"Tokens Used: {totals['total_tokens_used']:,}")
print(f"Tokens Wasted (no rollover): {totals['total_tokens_wasted']:,}")
print(f"Utilization: {totals['effective_utilization']*100:.0f}%")

Results:

Weekly Allowance: 1,400,000
Tokens Used: 480,000
Tokens Wasted (no rollover): 920,000
Utilization: 34%

I wasted 920,000 tokens in a week—almost a million tokens that could have powered my weekend marathons. Instead, I hit my limit on Sunday afternoon despite having “unused” tokens from earlier in the week.
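To see what a rollover policy would change for a bursty week, here is a rough sketch that compares unmet demand with and without carrying unused tokens forward. The rollover rule (bank everything unused, no expiry within the week) is a hypothetical policy, and the daily demand figures are illustrative rather than measured.

```python
def unmet_demand(daily_demand: list[int], allowance: int, rollover: bool) -> int:
    """Total tokens requested but not served over the period."""
    bank = 0    # unused tokens carried forward (stays 0 when rollover is off)
    unmet = 0
    for demand in daily_demand:
        available = allowance + bank
        served = min(demand, available)
        unmet += demand - served
        bank = available - served if rollover else 0
    return unmet

# Hypothetical demand in tokens, Monday to Sunday: light weekdays, then two
# weekend days that each want more than a single day's allowance.
demand = [20_000, 40_000, 20_000, 0, 80_000, 300_000, 300_000]

no_rollover = unmet_demand(demand, allowance=200_000, rollover=False)
with_rollover = unmet_demand(demand, allowance=200_000, rollover=True)

print(f"Unmet without rollover: {no_rollover:,} tokens")
print(f"Unmet with rollover:    {with_rollover:,} tokens")
```

The weekly total is the same in both cases; only the per-day cap decides whether the weekend demand can be served.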

The Shared Quota Problem

Then I discovered the final insult: Claude Code and the web interface share the same token pool.

When I used Claude’s web interface for research, documentation reading, or quick questions, those tokens counted against my Claude Code allowance.

from typing import Literal

ActivityType = Literal["code", "research", "documentation", "planning"]

class SharedQuotaTracker:
    """Track shared quota between Claude Code and web interface."""

    def __init__(self, daily_allowance: int = 200000):
        self.daily_allowance = daily_allowance
        self.activities = []

    def log_activity(self,
                     source: Literal["claude-code", "web"],
                     activity_type: ActivityType,
                     tokens: int):
        """Log an activity that consumes tokens."""
        self.activities.append({
            "source": source,
            "type": activity_type,
            "tokens": tokens
        })

    def analyze_consumption(self) -> dict:
        """Analyze where tokens are going."""
        by_source = {}
        by_activity = {}

        for activity in self.activities:
            source = activity["source"]
            activity_type = activity["type"]
            tokens = activity["tokens"]

            by_source[source] = by_source.get(source, 0) + tokens
            by_activity[activity_type] = by_activity.get(activity_type, 0) + tokens

        total = sum(by_source.values())
        remaining = self.daily_allowance - total

        return {
            "by_source": by_source,
            "by_activity": by_activity,
            "total_used": total,
            "remaining": remaining,
            "exceeded": remaining < 0
        }

# My typical day
tracker = SharedQuotaTracker(daily_allowance=200000)

# Morning: web interface for research
tracker.log_activity("web", "research", 15000)
tracker.log_activity("web", "documentation", 10000)

# Late morning: Claude Code for actual coding
tracker.log_activity("claude-code", "code", 60000)

# Afternoon: more web interface for planning
tracker.log_activity("web", "planning", 8000)

# Evening: trying to code more with Claude Code
tracker.log_activity("claude-code", "code", 50000)

# Later evening: blocked!
tracker.log_activity("claude-code", "code", 57001)  # Would exceed limit

analysis = tracker.analyze_consumption()
print(f"Web interface used: {analysis['by_source']['web']:,} tokens")
print(f"Claude Code used: {analysis['by_source']['claude-code']:,} tokens")
print(f"Total: {analysis['total_used']:,} tokens")
print(f"Remaining: {analysis['remaining']:,} tokens")
print(f"Exceeded: {analysis['exceeded']}")

I had been inadvertently consuming my coding allowance with web interface usage. 25,000 tokens spent on research in the morning meant 25,000 fewer tokens for coding in the evening.
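The tracker above records the blocked request as if it had been spent. A small guard that checks the shared pool before spending makes the cut-off point easier to see. This is a sketch under the same assumed 200,000-token daily allowance; `QuotaGuard` and `try_spend` are hypothetical names, not part of any Anthropic tool.

```python
class QuotaGuard:
    """Refuse an activity up front if it would push usage past the allowance."""

    def __init__(self, daily_allowance: int = 200_000):
        self.daily_allowance = daily_allowance
        self.used = 0

    def try_spend(self, tokens: int) -> bool:
        """Record the spend and return True only if it fits in what is left."""
        if self.used + tokens > self.daily_allowance:
            return False
        self.used += tokens
        return True

guard = QuotaGuard()

# Same day as above: web and Claude Code draw from one shared pool
day = [
    ("web", 15_000), ("web", 10_000), ("claude-code", 60_000),
    ("web", 8_000), ("claude-code", 50_000), ("claude-code", 57_001),
]
results = []
for source, tokens in day:
    ok = guard.try_spend(tokens)
    results.append(ok)
    print(f"{source:12} {tokens:>7,} -> {'ok' if ok else 'blocked'}")

print(f"Left for the day: {guard.daily_allowance - guard.used:,} tokens")
```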

Optimization Strategies That Actually Work

After weeks of frustration, I developed a systematic approach to maximize value from my subscription.

1. Strategic Model Selection

from enum import Enum
from typing import Optional

class TaskComplexity(Enum):
    SIMPLE = "simple"           # Quick queries, simple refactors
    MODERATE = "moderate"       # Standard coding, debugging
    COMPLEX = "complex"         # Architecture, optimization
    DEEP_REASONING = "deep"     # Research, planning

class ModelSelector:
    """Select the optimal model based on task complexity."""

    MODEL_COSTS = {
        "haiku": 1.0,
        "sonnet": 3.0,
        "opus": 10.0
    }

    COMPLEXITY_MODEL_MAP = {
        TaskComplexity.SIMPLE: "haiku",
        TaskComplexity.MODERATE: "sonnet",
        TaskComplexity.COMPLEX: "sonnet",
        TaskComplexity.DEEP_REASONING: "opus"
    }

    def __init__(self, remaining_tokens: int, daily_budget: int = 200000):
        self.remaining_tokens = remaining_tokens
        self.daily_budget = daily_budget

    def select_model(self,
                     task_complexity: TaskComplexity,
                     estimated_duration_minutes: int = 30) -> dict:
        """Select optimal model for current situation."""

        # Base model for complexity
        preferred_model = self.COMPLEXITY_MODEL_MAP[task_complexity]

        # Estimate token cost (20000 tokens/hour for Haiku baseline)
        baseline_tokens = (estimated_duration_minutes / 60) * 20000
        estimated_cost = baseline_tokens * self.MODEL_COSTS[preferred_model]

        # Can we afford the preferred model?
        if estimated_cost > self.remaining_tokens:
            # Downgrade
            if preferred_model == "opus":
                preferred_model = "sonnet"
                estimated_cost = baseline_tokens * self.MODEL_COSTS["sonnet"]
            if estimated_cost > self.remaining_tokens and preferred_model == "sonnet":
                preferred_model = "haiku"
                estimated_cost = baseline_tokens * self.MODEL_COSTS["haiku"]

        # Check budget percentage
        budget_percentage = (self.remaining_tokens / self.daily_budget) * 100

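        # Special rule for end-of-day: with under 20% of the budget left,
        # never spend it on Opus, and keep the cost estimate in sync
        if budget_percentage < 20 and preferred_model == "opus":
            preferred_model = "sonnet"  # Conserve tokens
            estimated_cost = baseline_tokens * self.MODEL_COSTS["sonnet"]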

        return {
            "recommended_model": preferred_model,
            "estimated_cost": estimated_cost,
            "remaining_after": self.remaining_tokens - estimated_cost,
            "budget_percentage": budget_percentage
        }

# Example usage
selector = ModelSelector(remaining_tokens=50000)

print("Task: Deep reasoning (would prefer Opus)")
result = selector.select_model(TaskComplexity.DEEP_REASONING, estimated_duration_minutes=60)
print(f"Recommended: {result['recommended_model'].upper()}")
print(f"Cost: {result['estimated_cost']:,.0f} tokens")
print(f"Remaining: {result['remaining_after']:,.0f} tokens")
print(f"Budget: {result['budget_percentage']:.0f}%")

2. Off-Peak Scheduling

from datetime import datetime, timezone
from enum import Enum
from typing import Optional

class TimeSlot(Enum):
    PEAK = "peak"           # High congestion, high throttling
    MODERATE = "moderate"   # Normal usage
    OFF_PEAK = "off_peak"   # Best performance

class PeakHourScheduler:
    """Schedule heavy tasks during off-peak hours."""

    # Peak window in UTC, assuming US Eastern Standard Time (UTC-5):
    # 9am-6pm EST = 14:00-23:00 UTC
    PEAK_START_UTC = 14
    PEAK_END_UTC = 23

    def __init__(self, timezone_offset: int = -5):  # EST
        self.timezone_offset = timezone_offset

    def get_current_slot(self, current_time: Optional[datetime] = None) -> TimeSlot:
        """Determine current time slot quality."""
        if current_time is None:
            # datetime.utcnow() is deprecated since Python 3.12
            current_time = datetime.now(timezone.utc)

        hour_utc = current_time.hour

        if self.PEAK_START_UTC <= hour_utc < self.PEAK_END_UTC:
            return TimeSlot.PEAK
        elif hour_utc < 10:
            # 00:00-10:00 UTC (7pm-5am EST)
            return TimeSlot.OFF_PEAK
        else:
            return TimeSlot.MODERATE

    def recommend_scheduling(self,
                              task_complexity: str,
                              estimated_tokens: int,
                              current_tokens_remaining: int) -> dict:
        """Recommend whether to proceed now or schedule later."""

        current_slot = self.get_current_slot()

        # Simple tasks: proceed anytime
        if task_complexity == "simple":
            return {
                "action": "proceed",
                "reason": "Simple tasks can run during any time slot",
                "time_slot": current_slot.value
            }

        # Complex tasks during peak: schedule for later
        if task_complexity in ["complex", "deep_reasoning"] and current_slot == TimeSlot.PEAK:
            return {
                "action": "schedule",
                "reason": "Save heavy tasks for off-peak hours",
                "recommended_time": "After 6pm EST or before 9am EST",
                "current_time_slot": current_slot.value
            }

        # Token budget low and peak hours: wait
        if current_tokens_remaining < 50000 and current_slot == TimeSlot.PEAK:
            return {
                "action": "wait",
                "reason": "Low token budget + peak hours = poor experience",
                "recommended_action": "Use Haiku for simple tasks only",
                "time_slot": current_slot.value
            }

        return {
            "action": "proceed",
            "reason": "Good conditions for task",
            "time_slot": current_slot.value
        }

# Practical example
scheduler = PeakHourScheduler(timezone_offset=-5)  # EST

# Check current situation (get_current_slot defaults to the current UTC time)
print(f"Current time slot: {scheduler.get_current_slot().value}")

# Should I do a complex refactoring now?
recommendation = scheduler.recommend_scheduling(
    task_complexity="deep_reasoning",
    estimated_tokens=80000,
    current_tokens_remaining=40000
)

print(f"\nAction: {recommendation['action']}")
print(f"Reason: {recommendation['reason']}")
if 'recommended_time' in recommendation:
    print(f"Recommended time: {recommendation['recommended_time']}")
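The scheduler returns its recommended time as a fixed string. If you want an actual timestamp to wait for, a small helper can compute when the peak window ends. This sketch assumes the same 14:00-23:00 UTC peak window used above; `next_non_peak` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Same peak window the scheduler above assumes (9am-6pm EST in UTC hours)
PEAK_START_UTC = 14
PEAK_END_UTC = 23

def next_non_peak(now_utc: datetime) -> datetime:
    """Return now_utc if it is outside the peak window, else the end of the window."""
    if PEAK_START_UTC <= now_utc.hour < PEAK_END_UTC:
        return now_utc.replace(hour=PEAK_END_UTC, minute=0, second=0, microsecond=0)
    return now_utc

now = datetime(2026, 3, 18, 15, 30, tzinfo=timezone.utc)  # a mid-peak example
resume = next_non_peak(now)
print(f"Resume at {resume:%H:%M} UTC, wait {resume - now}")
```

Note that 23:00 UTC only marks the end of the peak window; the scheduler above does not rate a slot as off-peak until 00:00 UTC.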

3. Multi-Tool Workflow

I stopped using Claude Code for everything. Instead, I built a hybrid workflow:

from dataclasses import dataclass
from typing import List

@dataclass
class Tool:
    name: str
    cost_per_month: int
    strengths: List[str]
    usage_style: str

class HybridWorkflow:
    """Optimize across multiple AI tools."""

    def __init__(self):
        self.tools = {
            "claude-code": Tool(
                name="Claude Code (Max)",
                cost_per_month=100,
                strengths=["planning", "architecture", "deep-reasoning"],
                usage_style="Strategic sessions during off-peak"
            ),
            "codex": Tool(
                name="Codex",
                cost_per_month=20,
                strengths=["code-generation", "execution", "boilerplate"],
                usage_style="Heavy daily use"
            ),
            "claude-api": Tool(
                name="Claude API (Direct)",
                cost_per_month=0,  # Pay per use
                strengths=["haiku-tasks", "simple-queries", "bulk-operations"],
                usage_style="Spillover when subscription limits hit"
            )
        }

    def route_task(self, task_type: str, complexity: str) -> dict:
        """Route a task to the optimal tool."""

        routing = {
            # Planning and architecture: Claude Code with Sonnet/Opus
            ("planning", "complex"): {
                "tool": "claude-code",
                "model": "opus",
                "reason": "Deep reasoning for architecture"
            },
            ("planning", "moderate"): {
                "tool": "claude-code",
                "model": "sonnet",
                "reason": "Good balance for planning"
            },

            # Code generation: Codex for volume
            ("code-generation", "simple"): {
                "tool": "codex",
                "model": "gpt-4",
                "reason": "Cost-effective for boilerplate"
            },
            ("code-generation", "moderate"): {
                "tool": "codex",
                "model": "gpt-4",
                "reason": "Better value for frequent coding"
            },

            # Simple queries: Claude API with Haiku
            ("query", "simple"): {
                "tool": "claude-api",
                "model": "haiku",
                "reason": "Pay per use, extremely cheap"
            },

            # Debugging: Depends on complexity
            ("debugging", "complex"): {
                "tool": "claude-code",
                "model": "sonnet",
                "reason": "Need good reasoning for complex bugs"
            },
            ("debugging", "simple"): {
                "tool": "codex",
                "model": "gpt-4",
                "reason": "Quick fixes, better value"
            }
        }

        key = (task_type, complexity)
        result = routing.get(key, {
            "tool": "codex",
            "model": "gpt-4",
            "reason": "Default to most cost-effective"
        })

        return {
            **result,
            "tool_details": self.tools[result["tool"]]
        }

# Example routing decisions
workflow = HybridWorkflow()

tasks = [
    ("planning", "complex"),
    ("code-generation", "simple"),
    ("query", "simple"),
    ("debugging", "complex")
]

for task_type, complexity in tasks:
    route = workflow.route_task(task_type, complexity)
    print(f"\nTask: {task_type} ({complexity})")
    print(f"  Tool: {route['tool']}")
    print(f"  Model: {route['model']}")
    print(f"  Reason: {route['reason']}")

4. API Alternative for Overflow

When subscription limits hit, I use the direct API:

import os
from anthropic import Anthropic

class APIFallback:
    """Use direct API when subscription limits are exhausted."""

    def __init__(self):
        self.client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
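        # USD per 1K tokens. Haiku rates are the list prices for Claude 3.5 Haiku,
        # the model called below; check the pricing page for current rates.
        self.cost_per_1k_tokens = {
            "haiku": {"input": 0.0008, "output": 0.004},
            "sonnet": {"input": 0.003, "output": 0.015},
            "opus": {"input": 0.015, "output": 0.075}
        }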

    def estimate_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Estimate cost for an API call."""
        rates = self.cost_per_1k_tokens[model]
        input_cost = (input_tokens / 1000) * rates["input"]
        output_cost = (output_tokens / 1000) * rates["output"]
        return input_cost + output_cost

    def cheap_haiku_query(self, prompt: str) -> dict:
        """Execute a simple query with Haiku via API."""

        # Typical simple query: 500 input, 200 output
        estimated_cost = self.estimate_cost("haiku", 500, 200)

        message = self.client.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=200,
            messages=[{"role": "user", "content": prompt}]
        )

        return {
            "response": message.content[0].text,
            "input_tokens": message.usage.input_tokens,
            "output_tokens": message.usage.output_tokens,
            "actual_cost": self.estimate_cost(
                "haiku",
                message.usage.input_tokens,
                message.usage.output_tokens
            )
        }

# Example: 100 simple queries via API vs subscription
fallback = APIFallback()

# Cost for 100 simple queries
single_query_cost = fallback.estimate_cost("haiku", 500, 200)
hundred_queries_cost = single_query_cost * 100

print(f"100 simple queries via API (Haiku):")
print(f"  Estimated cost: ${hundred_queries_cost:.4f}")
print(f"  Claude Max subscription: $100.00")
print(f"  API is {100/hundred_queries_cost:.0f}x cheaper for simple tasks!")

The math is eye-opening: for simple tasks, the API is dramatically cheaper than the subscription. A hundred Haiku queries cost less than $0.20 via the API, compared with a flat $100/month for the subscription.
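A flat fee and a per-query price are easier to compare as a break-even point: how many simple queries per month would add up to the $100 plan price? The sketch below assumes Haiku-class rates of $0.80 per million input tokens and $4.00 per million output tokens, and the same 500-in, 200-out query size as the example above. Both rates are assumptions to replace with current published prices.

```python
def cost_per_query(input_tokens: int, output_tokens: int,
                   input_usd_per_mtok: float, output_usd_per_mtok: float) -> float:
    """Dollar cost of one API call at per-million-token rates."""
    return (input_tokens * input_usd_per_mtok
            + output_tokens * output_usd_per_mtok) / 1_000_000

# Assumed rates in USD per million tokens; replace with current published prices
INPUT_RATE = 0.80
OUTPUT_RATE = 4.00
PLAN_PRICE = 100.0  # monthly subscription price from this article

per_query = cost_per_query(500, 200, INPUT_RATE, OUTPUT_RATE)
break_even = int(PLAN_PRICE / per_query)  # queries that add up to the plan price

print(f"Per query: ${per_query:.4f}")
print(f"Queries per month to reach ${PLAN_PRICE:.0f}: {break_even:,}")
print(f"That is about {break_even // 30:,} queries per day")
```

This only covers small Haiku-sized queries. Long agentic coding sessions on larger models consume far more tokens per request, so the break-even arrives much sooner there.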

What I Actually Do Now

My current workflow:

Morning planning (off-peak): Use Claude Code with Sonnet for architecture and planning
Daily coding: Use Codex for most code generation and refactoring
Simple queries: Use Claude API with Haiku (pay per use)
Deep debugging: Use Claude Code with Sonnet during off-peak hours
Research: Use Claude web interface sparingly (it counts against my quota)

This hybrid approach costs me about $120/month ($100 Claude Max + $20 Codex), plus a small amount of pay-per-use API spend, and in practice it feels close to unlimited usage. I get the deep reasoning when I need it, and the high-volume code generation for daily work.

The Core Problem: Subscription Model Mismatch

The fundamental issue is that Claude Code’s subscription model doesn’t match developer workflows.

Developers don’t work in neat, predictable 8-hour blocks. We have bursty patterns:

Long coding marathons on weekends
Meeting-heavy weekdays with light coding
Periods of intense debugging followed by quiet periods
On-call incidents requiring sudden extended sessions

A subscription model with daily limits and no rollover penalizes these patterns. Meanwhile, Codex’s more generous limits accommodate developer reality.

What Anthropic Should Do

Rollover tokens: Unused tokens should carry over for at least a week
Transparent throttling: Publish the peak hours schedule so developers can plan
Model-aware limits: Separate quotas for different models, not a shared pool
Developer tier: A $50/month plan optimized for coding (not general web usage)
Usage analytics: Show exactly where tokens are going

Until then, the $100 Max plan will continue to feel restrictive compared to alternatives.

Final Words + More Resources

My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to contact me, you can reach me by email.

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Anthropic Pricing Page
👨‍💻 Reddit Discussion: Claude Code Usage Limits

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!