Why Does Claude Code's $100 Plan Feel More Restrictive Than Codex's $20?

I stared at my screen in disbelief. My Claude Code subscription—$100 per month for the Max plan—had just cut me off mid-refactoring session. Meanwhile, my colleague with his $20 Codex subscription was still coding away happily.

What’s going on here?

The Token Mystery: Where Did They All Go?

I started tracking my usage obsessively. Every morning, I’d check my token count, code for a few hours, and watch the numbers plummet faster than I expected.

token_tracker.py
from datetime import datetime
import json


class TokenTracker:
    def __init__(self, daily_allowance: int = 200000):
        self.daily_allowance = daily_allowance
        self.sessions = []

    def log_session(self, model: str, duration_minutes: int, tokens_used: int):
        """Log a coding session and its token consumption."""
        session = {
            "timestamp": datetime.now().isoformat(),
            "model": model,
            "duration_minutes": duration_minutes,
            "tokens_used": tokens_used,
            "tokens_per_minute": tokens_used / duration_minutes
        }
        self.sessions.append(session)
        return session

    def get_consumption_rate(self, model: str) -> float:
        """Calculate average token consumption rate for a model."""
        model_sessions = [s for s in self.sessions if s["model"] == model]
        if not model_sessions:
            return 0.0
        total_tokens = sum(s["tokens_used"] for s in model_sessions)
        total_minutes = sum(s["duration_minutes"] for s in model_sessions)
        return total_tokens / total_minutes if total_minutes > 0 else 0.0


# My actual usage data from a week of coding
tracker = TokenTracker(daily_allowance=200000)
# Sonnet sessions - reasonable consumption
tracker.log_session("sonnet", 45, 35000)
tracker.log_session("sonnet", 30, 22000)
# Opus session - wait, what?!
tracker.log_session("opus", 60, 180000)  # That's 90% of my daily limit!
print(f"Sonnet avg: {tracker.get_consumption_rate('sonnet'):.0f} tokens/min")
print(f"Opus avg: {tracker.get_consumption_rate('opus'):.0f} tokens/min")

The results shocked me. In these sessions, Opus burned through tokens at roughly 4x the rate of Sonnet (about 3,000 tokens per minute versus about 760). A single extended debugging session with Opus consumed 90% of my daily allowance.
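As a sanity check, the ratio can be worked out directly from the three sessions logged above. This is a minimal sketch that reuses only those numbers; the helper name `rate` and the constant names are mine, not part of the tracker.

```python
# Sessions logged above: (model, minutes, tokens)
sessions = [
    ("sonnet", 45, 35_000),
    ("sonnet", 30, 22_000),
    ("opus", 60, 180_000),
]
DAILY_ALLOWANCE = 200_000  # the daily budget assumed throughout this article


def rate(model: str) -> float:
    """Tokens per minute, pooled over all sessions for one model."""
    tokens = sum(t for m, _, t in sessions if m == model)
    minutes = sum(d for m, d, _ in sessions if m == model)
    return tokens / minutes


sonnet_rate = rate("sonnet")  # 57,000 tokens over 75 minutes
opus_rate = rate("opus")      # 180,000 tokens over 60 minutes
ratio = opus_rate / sonnet_rate
opus_share = 180_000 / DAILY_ALLOWANCE

print(f"Sonnet: {sonnet_rate:.0f} tokens/min")
print(f"Opus:   {opus_rate:.0f} tokens/min")
print(f"Opus/Sonnet ratio: {ratio:.1f}x")
print(f"Opus session share of daily allowance: {opus_share:.0%}")
```

With only three sessions this is a rough indication rather than a benchmark, but it shows how much a single Opus hour can take out of the day.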

Peak Hours: The Hidden Throttling

Then I noticed something else. My token counts seemed to vanish even faster during certain times of day.

I started documenting when I hit limits:

limit_analysis.py
from collections import defaultdict
from datetime import datetime, time


class PeakHoursAnalyzer:
    def __init__(self):
        self.limit_hits = defaultdict(list)

    def log_limit_event(self, timestamp: datetime, tokens_remaining: int):
        """Log when we hit a limit event (timestamps are naive UTC)."""
        hour = timestamp.hour
        self.limit_hits[hour].append({
            "time": timestamp,
            "remaining": tokens_remaining
        })

    def analyze_peak_hours(self) -> dict:
        """Analyze when we're most likely to hit limits."""
        hour_counts = {}
        for hour, events in self.limit_hits.items():
            hour_counts[hour] = len(events)
        # US business hours: 9am-6pm EST (UTC-5) = 14:00-23:00 UTC.
        # Under daylight time (EDT, UTC-4) the same window is 13:00-22:00 UTC.
        us_business_hours = range(14, 23)
        peak_limit_count = sum(
            hour_counts.get(h, 0) for h in us_business_hours
        )
        return {
            "hour_distribution": hour_counts,
            "us_business_hours_total": peak_limit_count,
            "total_events": sum(hour_counts.values())
        }


# My data from two weeks
analyzer = PeakHoursAnalyzer()
# Most limit hits happened during these hours (timestamps in UTC;
# mid-March is daylight time in the US, so local times are EDT)
events = [
    (datetime(2026, 3, 14, 15, 30), 0),  # 11:30am EDT
    (datetime(2026, 3, 14, 16, 45), 0),  # 12:45pm EDT
    (datetime(2026, 3, 15, 18, 20), 0),  # 2:20pm EDT
    (datetime(2026, 3, 16, 15, 10), 0),  # 11:10am EDT
    (datetime(2026, 3, 18, 21, 30), 0),  # 5:30pm EDT
]
for ts, remaining in events:
    analyzer.log_limit_event(ts, remaining)
results = analyzer.analyze_peak_hours()
print(f"US business hours limit hits: {results['us_business_hours_total']}")
print(f"Total limit events: {results['total_events']}")
print(f"Business hours percentage: {results['us_business_hours_total']/results['total_events']*100:.0f}%")

Every one of the five limit hits in this sample fell inside US business hours (9am-6pm Eastern). That’s exactly when I needed Claude Code most.

Meanwhile, my European colleague, who works evening hours in his timezone, rarely hit limits. Five data points are not proof, but the pattern suggests the “peak hours” throttling is real, and it’s brutal if you’re a US-based developer.
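Because the logged timestamps are in UTC, it is worth double-checking the local-time reading of each event. The following is a minimal sketch, assuming fixed offsets of UTC-5 for EST and UTC-4 for EDT (US daylight time began on March 8 in 2026, so these mid-March dates fall under EDT); the helper names `to_local` and `in_business_hours` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

# The five limit events from the log above, as UTC-aware datetimes
events_utc = [
    datetime(2026, 3, 14, 15, 30, tzinfo=timezone.utc),
    datetime(2026, 3, 14, 16, 45, tzinfo=timezone.utc),
    datetime(2026, 3, 15, 18, 20, tzinfo=timezone.utc),
    datetime(2026, 3, 16, 15, 10, tzinfo=timezone.utc),
    datetime(2026, 3, 18, 21, 30, tzinfo=timezone.utc),
]


def to_local(ts: datetime, tz: timezone) -> str:
    """Format a UTC timestamp as wall-clock time in the given zone."""
    return ts.astimezone(tz).strftime("%H:%M %Z")


def in_business_hours(ts: datetime, tz: timezone) -> bool:
    """True if the local hour falls in the 9am-6pm window."""
    return 9 <= ts.astimezone(tz).hour < 18


for ts in events_utc:
    print(f"{ts:%Y-%m-%d %H:%M} UTC -> {to_local(ts, EST)} / {to_local(ts, EDT)}")

hits = sum(in_business_hours(ts, EDT) for ts in events_utc)
print(f"{hits} of {len(events_utc)} events fall in 9am-6pm EDT")
```

Under either offset, all five events land inside the 9am-6pm window, so the conclusion does not depend on which Eastern offset is applied.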

The Model Tier Penalty

I experimented with different model combinations to understand the token consumption:

model_comparison.py
class ModelTokenComparison:
    """Compare token costs across Claude models."""

    # Approximate token multipliers relative to Haiku
    MODEL_MULTIPLIERS = {
        "haiku": 1.0,
        "sonnet": 3.0,
        "opus": 10.0
    }

    def __init__(self, daily_token_budget: int = 200000):
        self.budget = daily_token_budget

    def estimate_sessions(self, model: str, avg_session_tokens: int = 5000) -> dict:
        """Estimate how many coding sessions you can do per model."""
        effective_cost = avg_session_tokens * self.MODEL_MULTIPLIERS[model]
        # Cast to int: floor division by a float returns a float (e.g. 40.0)
        sessions = int(self.budget // effective_cost)
        return {
            "model": model,
            "multiplier": self.MODEL_MULTIPLIERS[model],
            "effective_cost_per_session": effective_cost,
            "estimated_sessions": sessions,
            "hours_of_coding": sessions * 0.5  # Assuming 30min sessions
        }

    def compare_all(self):
        """Compare all models side-by-side."""
        results = []
        for model in self.MODEL_MULTIPLIERS:
            results.append(self.estimate_sessions(model))
        return results


comparison = ModelTokenComparison(daily_token_budget=200000)
print("Daily Token Budget: 200,000")
print("-" * 60)
for result in comparison.compare_all():
    print(f"{result['model'].upper():8} | "
          f"Multiplier: {result['multiplier']:4.1f}x | "
          f"Sessions: {result['estimated_sessions']:3} | "
          f"Hours: {result['hours_of_coding']:.1f}h")

The output told the story:

Daily Token Budget: 200,000
------------------------------------------------------------
HAIKU    | Multiplier:  1.0x | Sessions:  40 | Hours: 20.0h
SONNET   | Multiplier:  3.0x | Sessions:  13 | Hours: 6.5h
OPUS     | Multiplier: 10.0x | Sessions:   4 | Hours: 2.0h

If I stuck to Haiku, I could code for 20 hours. With Opus? Just 2 hours. The $100 Max plan becomes essentially useless for extended Opus sessions.

The No-Rollover Problem

Here’s another frustration: unused tokens don’t carry over.

On days when I had meetings, or focused on non-AI-assisted tasks, my token allowance vanished. Come the weekend when I wanted to do a deep coding marathon, I still only had the standard daily allowance.

rollover_simulation.py
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DailyUsage:
    date: date
    allowance: int
    used: int
    wasted: int  # allowance - used, but doesn't carry over

    @property
    def utilization(self) -> float:
        return self.used / self.allowance if self.allowance > 0 else 0


class RolloverSimulator:
    """Demonstrate the no-rollover policy impact."""

    def __init__(self, daily_allowance: int = 200000):
        self.daily_allowance = daily_allowance
        self.weekly_usage = []

    def simulate_week(self, daily_usage_hours: list[int]):
        """Simulate a week's worth of usage.

        Args:
            daily_usage_hours: List of 7 integers representing coding hours per day
        """
        start_date = date(2026, 3, 23)  # Monday
        for day_offset, hours in enumerate(daily_usage_hours):
            current_date = start_date + timedelta(days=day_offset)
            # Assume 20000 tokens consumed per hour of active coding
            used = min(hours * 20000, self.daily_allowance)
            wasted = self.daily_allowance - used
            self.weekly_usage.append(DailyUsage(
                date=current_date,
                allowance=self.daily_allowance,
                used=used,
                wasted=wasted
            ))

    def calculate_totals(self) -> dict:
        """Calculate weekly totals."""
        total_used = sum(d.used for d in self.weekly_usage)
        total_wasted = sum(d.wasted for d in self.weekly_usage)
        total_allowance = sum(d.allowance for d in self.weekly_usage)
        return {
            "total_tokens_available": total_allowance,
            "total_tokens_used": total_used,
            "total_tokens_wasted": total_wasted,
            "effective_utilization": total_used / total_allowance
        }


# My actual week - meeting-heavy Mon-Thu, coding marathon Sat-Sun
sim = RolloverSimulator(daily_allowance=200000)
sim.simulate_week([
    1,  # Monday - meetings
    2,  # Tuesday - planning
    1,  # Wednesday - code review
    0,  # Thursday - all-day meeting
    4,  # Friday - light coding
    8,  # Saturday - marathon
    8,  # Sunday - marathon
])
totals = sim.calculate_totals()
print(f"Weekly Allowance: {totals['total_tokens_available']:,}")
print(f"Tokens Used: {totals['total_tokens_used']:,}")
print(f"Tokens Wasted (no rollover): {totals['total_tokens_wasted']:,}")
print(f"Utilization: {totals['effective_utilization']*100:.0f}%")

Results:

Weekly Allowance: 1,400,000
Tokens Used: 480,000
Tokens Wasted (no rollover): 920,000
Utilization: 34%

I wasted 920,000 tokens in a week—almost a million tokens that could have powered my weekend marathons. Instead, I hit my limit on Sunday afternoon despite having “unused” tokens from earlier in the week.

The Shared Quota Problem

Then I discovered the final insult: Claude Code and the web interface share the same token pool.

When I used Claude’s web interface for research, documentation reading, or quick questions, those tokens counted against my Claude Code allowance.

shared_quota.py
from typing import Literal

ActivityType = Literal["code", "research", "documentation", "planning"]


class SharedQuotaTracker:
    """Track shared quota between Claude Code and web interface."""

    def __init__(self, daily_allowance: int = 200000):
        self.daily_allowance = daily_allowance
        self.activities = []

    def log_activity(self,
                     source: Literal["claude-code", "web"],
                     activity_type: ActivityType,
                     tokens: int):
        """Log an activity that consumes tokens."""
        self.activities.append({
            "source": source,
            "type": activity_type,
            "tokens": tokens
        })

    def analyze_consumption(self) -> dict:
        """Analyze where tokens are going."""
        by_source = {}
        by_activity = {}
        for activity in self.activities:
            source = activity["source"]
            activity_type = activity["type"]
            tokens = activity["tokens"]
            by_source[source] = by_source.get(source, 0) + tokens
            by_activity[activity_type] = by_activity.get(activity_type, 0) + tokens
        total = sum(by_source.values())
        remaining = self.daily_allowance - total
        return {
            "by_source": by_source,
            "by_activity": by_activity,
            "total_used": total,
            "remaining": remaining,
            "exceeded": remaining < 0
        }


# My typical day
tracker = SharedQuotaTracker(daily_allowance=200000)
# Morning: web interface for research
tracker.log_activity("web", "research", 15000)
tracker.log_activity("web", "documentation", 10000)
# Late morning: Claude Code for actual coding
tracker.log_activity("claude-code", "code", 60000)
# Afternoon: more web interface for planning
tracker.log_activity("web", "planning", 8000)
# Evening: trying to code more with Claude Code
tracker.log_activity("claude-code", "code", 50000)
# Later evening: blocked!
tracker.log_activity("claude-code", "code", 57001)  # Would exceed limit
analysis = tracker.analyze_consumption()
print(f"Web interface used: {analysis['by_source']['web']:,} tokens")
print(f"Claude Code used: {analysis['by_source']['claude-code']:,} tokens")
print(f"Total: {analysis['total_used']:,} tokens")
print(f"Remaining: {analysis['remaining']:,} tokens")
print(f"Exceeded: {analysis['exceeded']}")

I had been inadvertently consuming my coding allowance with web interface usage. The 25,000 tokens I spent on research and documentation in the morning meant 25,000 fewer tokens for coding in the evening, and the web interface took 33,000 tokens over the whole day.

Optimization Strategies That Actually Work

After weeks of frustration, I developed a systematic approach to maximize value from my subscription.

1. Strategic Model Selection

model_selector.py
from enum import Enum
from typing import Optional


class TaskComplexity(Enum):
    SIMPLE = "simple"        # Quick queries, simple refactors
    MODERATE = "moderate"    # Standard coding, debugging
    COMPLEX = "complex"      # Architecture, optimization
    DEEP_REASONING = "deep"  # Research, planning


class ModelSelector:
    """Select the optimal model based on task complexity."""

    MODEL_COSTS = {
        "haiku": 1.0,
        "sonnet": 3.0,
        "opus": 10.0
    }
    COMPLEXITY_MODEL_MAP = {
        TaskComplexity.SIMPLE: "haiku",
        TaskComplexity.MODERATE: "sonnet",
        TaskComplexity.COMPLEX: "sonnet",
        TaskComplexity.DEEP_REASONING: "opus"
    }

    def __init__(self, remaining_tokens: int, daily_budget: int = 200000):
        self.remaining_tokens = remaining_tokens
        self.daily_budget = daily_budget

    def select_model(self,
                     task_complexity: TaskComplexity,
                     estimated_duration_minutes: int = 30) -> dict:
        """Select optimal model for current situation."""
        # Base model for complexity
        preferred_model = self.COMPLEXITY_MODEL_MAP[task_complexity]
        # Estimate token cost (20000 tokens/hour for Haiku baseline)
        baseline_tokens = (estimated_duration_minutes / 60) * 20000
        estimated_cost = baseline_tokens * self.MODEL_COSTS[preferred_model]
        # Can we afford the preferred model?
        if estimated_cost > self.remaining_tokens:
            # Downgrade one tier at a time until the estimate fits
            if preferred_model == "opus":
                preferred_model = "sonnet"
                estimated_cost = baseline_tokens * self.MODEL_COSTS["sonnet"]
            if estimated_cost > self.remaining_tokens and preferred_model == "sonnet":
                preferred_model = "haiku"
                estimated_cost = baseline_tokens * self.MODEL_COSTS["haiku"]
        # Check budget percentage
        budget_percentage = (self.remaining_tokens / self.daily_budget) * 100
        # End-of-day rule: with under 20% left, step Opus down to Sonnet.
        # Only ever downgrade here (never undo a Haiku fallback), and keep
        # the cost estimate in sync with the model actually recommended.
        if budget_percentage < 20 and preferred_model == "opus":
            preferred_model = "sonnet"  # Conserve tokens
            estimated_cost = baseline_tokens * self.MODEL_COSTS["sonnet"]
        return {
            "recommended_model": preferred_model,
            "estimated_cost": estimated_cost,
            "remaining_after": self.remaining_tokens - estimated_cost,
            "budget_percentage": budget_percentage
        }


# Example usage
selector = ModelSelector(remaining_tokens=50000)
print("Task: Deep reasoning (would prefer Opus)")
result = selector.select_model(TaskComplexity.DEEP_REASONING, estimated_duration_minutes=60)
print(f"Recommended: {result['recommended_model'].upper()}")
print(f"Cost: {result['estimated_cost']:,.0f} tokens")
print(f"Remaining: {result['remaining_after']:,.0f} tokens")
print(f"Budget: {result['budget_percentage']:.0f}%")

2. Off-Peak Scheduling

peak_scheduler.py
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class TimeSlot(Enum):
    PEAK = "peak"          # High congestion, high throttling
    MODERATE = "moderate"  # Normal usage
    OFF_PEAK = "off_peak"  # Best performance


class PeakHourScheduler:
    """Schedule heavy tasks during off-peak hours."""

    # Peak window expressed in UTC, derived from US Eastern business hours
    # Peak hours: 9am-6pm EST = 14:00-23:00 UTC
    PEAK_START_UTC = 14
    PEAK_END_UTC = 23

    def __init__(self, timezone_offset: int = -5):  # EST
        self.timezone_offset = timezone_offset

    def get_current_slot(self, current_time: Optional[datetime] = None) -> TimeSlot:
        """Determine current time slot quality (current_time is read as UTC)."""
        if current_time is None:
            current_time = datetime.now(timezone.utc)
        hour_utc = current_time.hour
        if self.PEAK_START_UTC <= hour_utc < self.PEAK_END_UTC:
            return TimeSlot.PEAK
        elif 0 <= hour_utc < 10:
            return TimeSlot.OFF_PEAK
        else:
            return TimeSlot.MODERATE

    def recommend_scheduling(self,
                             task_complexity: str,
                             estimated_tokens: int,
                             current_tokens_remaining: int) -> dict:
        """Recommend whether to proceed now or schedule later."""
        current_slot = self.get_current_slot()
        # Simple tasks: proceed anytime
        if task_complexity == "simple":
            return {
                "action": "proceed",
                "reason": "Simple tasks can run during any time slot",
                "time_slot": current_slot.value
            }
        # Complex tasks during peak: schedule for later
        if task_complexity in ["complex", "deep_reasoning"] and current_slot == TimeSlot.PEAK:
            return {
                "action": "schedule",
                "reason": "Save heavy tasks for off-peak hours",
                # 23:00 UTC is 6pm EST, so the peak window ends at 6pm local
                "recommended_time": "After 6pm EST or before 9am EST",
                "current_time_slot": current_slot.value
            }
        # Token budget low and peak hours: wait
        if current_tokens_remaining < 50000 and current_slot == TimeSlot.PEAK:
            return {
                "action": "wait",
                "reason": "Low token budget + peak hours = poor experience",
                "recommended_action": "Use Haiku for simple tasks only",
                "time_slot": current_slot.value
            }
        return {
            "action": "proceed",
            "reason": "Good conditions for task",
            "time_slot": current_slot.value
        }


# Practical example
scheduler = PeakHourScheduler(timezone_offset=-5)  # EST
# Check current situation
now = datetime.now(timezone.utc)
print(f"Current time slot: {scheduler.get_current_slot(now).value}")
# Should I do a complex refactoring now?
recommendation = scheduler.recommend_scheduling(
    task_complexity="deep_reasoning",
    estimated_tokens=80000,
    current_tokens_remaining=40000
)
print(f"\nAction: {recommendation['action']}")
print(f"Reason: {recommendation['reason']}")
if 'recommended_time' in recommendation:
    print(f"Recommended time: {recommendation['recommended_time']}")

3. Multi-Tool Workflow

I stopped using Claude Code for everything. Instead, I built a hybrid workflow:

hybrid_workflow.py
from dataclasses import dataclass
from typing import List


@dataclass
class Tool:
    name: str
    cost_per_month: int
    strengths: List[str]
    usage_style: str


class HybridWorkflow:
    """Optimize across multiple AI tools."""

    def __init__(self):
        self.tools = {
            "claude-code": Tool(
                name="Claude Code (Max)",
                cost_per_month=100,
                strengths=["planning", "architecture", "deep-reasoning"],
                usage_style="Strategic sessions during off-peak"
            ),
            "codex": Tool(
                name="Codex",
                cost_per_month=20,
                strengths=["code-generation", "execution", "boilerplate"],
                usage_style="Heavy daily use"
            ),
            "claude-api": Tool(
                name="Claude API (Direct)",
                cost_per_month=0,  # Pay per use
                strengths=["haiku-tasks", "simple-queries", "bulk-operations"],
                usage_style="Spillover when subscription limits hit"
            )
        }

    def route_task(self, task_type: str, complexity: str) -> dict:
        """Route a task to the optimal tool."""
        routing = {
            # Planning and architecture: Claude Code with Sonnet/Opus
            ("planning", "complex"): {
                "tool": "claude-code",
                "model": "opus",
                "reason": "Deep reasoning for architecture"
            },
            ("planning", "moderate"): {
                "tool": "claude-code",
                "model": "sonnet",
                "reason": "Good balance for planning"
            },
            # Code generation: Codex for volume
            ("code-generation", "simple"): {
                "tool": "codex",
                "model": "gpt-4",
                "reason": "Cost-effective for boilerplate"
            },
            ("code-generation", "moderate"): {
                "tool": "codex",
                "model": "gpt-4",
                "reason": "Better value for frequent coding"
            },
            # Simple queries: Claude API with Haiku
            ("query", "simple"): {
                "tool": "claude-api",
                "model": "haiku",
                "reason": "Pay per use, extremely cheap"
            },
            # Debugging: Depends on complexity
            ("debugging", "complex"): {
                "tool": "claude-code",
                "model": "sonnet",
                "reason": "Need good reasoning for complex bugs"
            },
            ("debugging", "simple"): {
                "tool": "codex",
                "model": "gpt-4",
                "reason": "Quick fixes, better value"
            }
        }
        key = (task_type, complexity)
        result = routing.get(key, {
            "tool": "codex",
            "model": "gpt-4",
            "reason": "Default to most cost-effective"
        })
        return {
            **result,
            "tool_details": self.tools[result["tool"]]
        }


# Example routing decisions
workflow = HybridWorkflow()
tasks = [
    ("planning", "complex"),
    ("code-generation", "simple"),
    ("query", "simple"),
    ("debugging", "complex")
]
for task_type, complexity in tasks:
    route = workflow.route_task(task_type, complexity)
    print(f"\nTask: {task_type} ({complexity})")
    print(f"  Tool: {route['tool']}")
    print(f"  Model: {route['model']}")
    print(f"  Reason: {route['reason']}")

4. API Alternative for Overflow

When subscription limits hit, I use the direct API:

api_fallback.py
import os
from anthropic import Anthropic


class APIFallback:
    """Use direct API when subscription limits are exhausted."""

    def __init__(self):
        self.client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
        # USD per 1K tokens. The Haiku figures match the Claude 3 Haiku list
        # price; the claude-3-5-haiku model called below was listed higher,
        # so check the current pricing page before relying on these numbers.
        self.cost_per_1k_tokens = {
            "haiku": {"input": 0.00025, "output": 0.00125},
            "sonnet": {"input": 0.003, "output": 0.015},
            "opus": {"input": 0.015, "output": 0.075}
        }

    def estimate_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Estimate cost for an API call."""
        rates = self.cost_per_1k_tokens[model]
        input_cost = (input_tokens / 1000) * rates["input"]
        output_cost = (output_tokens / 1000) * rates["output"]
        return input_cost + output_cost

    def cheap_haiku_query(self, prompt: str) -> dict:
        """Execute a simple query with Haiku via API."""
        # Typical simple query: 500 input, 200 output
        estimated_cost = self.estimate_cost("haiku", 500, 200)
        message = self.client.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=200,
            messages=[{"role": "user", "content": prompt}]
        )
        return {
            "response": message.content[0].text,
            "input_tokens": message.usage.input_tokens,
            "output_tokens": message.usage.output_tokens,
            "actual_cost": self.estimate_cost(
                "haiku",
                message.usage.input_tokens,
                message.usage.output_tokens
            )
        }


# Example: 100 simple queries via API vs subscription
fallback = APIFallback()
# Cost for 100 simple queries
single_query_cost = fallback.estimate_cost("haiku", 500, 200)
hundred_queries_cost = single_query_cost * 100
print("100 simple queries via API (Haiku):")
print(f"  Estimated cost: ${hundred_queries_cost:.4f}")
print("  Claude Max subscription: $100.00")
print(f"  API is {100/hundred_queries_cost:.0f}x cheaper for simple tasks!")

The math is eye-opening: for simple tasks, the API is dramatically cheaper than the subscription. At the rates above, a hundred Haiku queries of about 500 input and 200 output tokens each cost roughly $0.04 via the API, compared to a flat $100/month for the subscription.
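To put that comparison another way, here is a rough break-even sketch: how many such queries would it take before pay-per-use spending reaches the subscription price? It assumes the Haiku rates from the block above and the same 500-input, 200-output query shape; the names `query_cost` and `break_even_queries` are hypothetical, and real prices may differ.

```python
# Rates copied from cost_per_1k_tokens above (USD per 1K tokens).
# Treat them as illustrative, since list prices change over time.
HAIKU_INPUT_PER_1K = 0.00025
HAIKU_OUTPUT_PER_1K = 0.00125
SUBSCRIPTION_USD = 100.0


def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one API call at the assumed Haiku rates."""
    return ((input_tokens / 1000) * HAIKU_INPUT_PER_1K
            + (output_tokens / 1000) * HAIKU_OUTPUT_PER_1K)


per_query = query_cost(500, 200)
break_even_queries = int(SUBSCRIPTION_USD / per_query)

print(f"Cost per simple query: ${per_query:.6f}")
print(f"Queries needed to spend ${SUBSCRIPTION_USD:.0f}: {break_even_queries:,}")
```

This only covers small Haiku-sized queries. Long agentic coding sessions on larger models consume far more tokens per request, so the break-even point for that kind of work arrives much sooner.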

What I Actually Do Now

My current workflow:

  1. Morning planning (off-peak): Use Claude Code with Sonnet for architecture and planning
  2. Daily coding: Use Codex for most code generation and refactoring
  3. Simple queries: Use Claude API with Haiku (pay per use)
  4. Deep debugging: Use Claude Code with Sonnet during off-peak hours
  5. Research: Use Claude web interface sparingly (it counts against my quota)

This hybrid approach costs me about $120/month ($100 Claude Max + $20 Codex), plus pay-per-use API charges, and in practice it feels close to unlimited usage. I get the deep reasoning when I need it, and the high-volume code generation for daily work.

The Core Problem: Subscription Model Mismatch

The fundamental issue is that Claude Code’s subscription model doesn’t match developer workflows.

Developers don’t work in neat, predictable 8-hour blocks. We have bursty patterns:

  • Long coding marathons on weekends
  • Meeting-heavy weekdays with light coding
  • Periods of intense debugging followed by quiet periods
  • On-call incidents requiring sudden extended sessions

A subscription model with daily limits and no rollover penalizes these patterns. Meanwhile, Codex’s more generous limits accommodate developer reality.

What Anthropic Should Do

  1. Rollover tokens: Unused tokens should carry over for at least a week
  2. Transparent throttling: Publish the peak hours schedule so developers can plan
  3. Model-aware limits: Separate quotas for different models, not a shared pool
  4. Developer tier: A $50/month plan optimized for coding (not general web usage)
  5. Usage analytics: Show exactly where tokens are going

Until then, the $100 Max plan will continue to feel restrictive compared to alternatives.
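The first suggestion above, rollover, can be sketched as a small simulation. This is a hypothetical policy, not anything Anthropic has announced: unused tokens stay usable for seven days, oldest first. The weekday demand reuses the figures from the rollover simulation earlier (hours times 20,000 tokens), while the 300,000-token weekend demand is an invented marathon scenario chosen to exceed the daily cap.

```python
from collections import deque

DAILY_ALLOWANCE = 200_000
ROLLOVER_DAYS = 7  # hypothetical: unused tokens expire after a week


def simulate(demand: list[int], rollover: bool) -> dict:
    """Return tokens served and demand left unmet over the given days."""
    # Unused tokens from the previous days, oldest first. maxlen makes
    # append() silently drop the entry that has aged out of the window.
    bank = deque(maxlen=ROLLOVER_DAYS - 1)
    served_total = unmet_total = 0
    for wanted in demand:
        served = min(wanted, DAILY_ALLOWANCE)
        leftover = DAILY_ALLOWANCE - served
        shortfall = wanted - served
        if rollover:
            # Cover the shortfall from banked tokens, oldest first
            for i in range(len(bank)):
                take = min(bank[i], shortfall)
                bank[i] -= take
                served += take
                shortfall -= take
            bank.append(leftover)
        served_total += served
        unmet_total += shortfall
    return {"served": served_total, "unmet": unmet_total}


# Mon-Fri from the earlier simulation; Sat-Sun are hypothetical marathons
demand = [20_000, 40_000, 20_000, 0, 80_000, 300_000, 300_000]

without = simulate(demand, rollover=False)
with_rollover = simulate(demand, rollover=True)
print(f"No rollover:   served {without['served']:,}, unmet {without['unmet']:,}")
print(f"7-day rollover: served {with_rollover['served']:,}, unmet {with_rollover['unmet']:,}")
```

In this scenario the weekly total granted is identical under both policies; rollover only changes when the tokens can be spent, which is what matters for bursty schedules.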

Final Words + More Resources

My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to get in touch, you can reach me by email.
