Smart Effort Routing in Claude Code: When to Switch Between Low, Medium, and Max

Problem

My automated Claude Code workflows kept hitting rate limits halfway through. I had a 50-step orchestration that would fail at step 30-something with:

Error: Rate limit exceeded. Please wait before retrying.

I was confused. I had set effort to “max” across the entire workflow, thinking more effort meant better results. But instead of quality, I got failures.

When I looked at my workflow, it looked like this:

Step 1: Read config file [max effort]
Step 2: Parse JSON [max effort]
Step 3: Generate plan [max effort]
Step 4: Create directory [max effort]
Step 5: Write file [max effort]
Step 6: Read another file [max effort]
...
Step 50: Final commit [max effort]

Every step was max effort. Every step burned through my rate limit allocation. By step 30, I was done for the hour.

What happened?

I searched for how others handled effort levels and found a Reddit discussion that opened my eyes.

The key insight: Route by task type, not project type.

I had been thinking about effort levels as project-wide settings. “This is an important project, use max effort everywhere.” But that’s wrong.

Effort levels should be determined by what the task actually requires:

Task Types vs Effort Levels:
┌─────────────────────────┬─────────┬──────────────────────────────────┐
│ Task Type               │ Effort  │ Why                              │
├─────────────────────────┼─────────┼──────────────────────────────────┤
│ Planning & Debugging    │ Max     │ Requires deep reasoning          │
│ Routine File Edits      │ Low     │ Mechanical, well-defined         │
│ Routine Tool Calls      │ Low     │ Standard, predictable            │
│ Code Generation         │ Medium  │ Balance of creativity & precision│
│ Architecture Decisions  │ Max     │ Complex reasoning, high stakes   │
└─────────────────────────┴─────────┴──────────────────────────────────┘

One comment from the thread stuck with me:

“For automated workflows, effort per-step compounds fast — max effort on every step of a 50-call orchestration will blow past hourly limits before the task finishes. I route by step type: planning and debugging get max, routine tool calls get low. Massive difference in throughput without visible quality loss on the mechanical steps.”

This was exactly my problem. I was applying max effort to mechanical operations that didn’t need it.

How I fixed it

I rewrote my workflow to route effort based on task type:

Step 1: Read config file [low effort] ← mechanical
Step 2: Parse JSON [low effort] ← mechanical
Step 3: Generate plan [max effort] ← reasoning
Step 4: Create directory [low effort] ← mechanical
Step 5: Write file [low effort] ← mechanical
Step 6: Read another file [low effort] ← mechanical
Step 7: Debug unexpected error [max effort] ← reasoning
...
Step 50: Final commit [low effort] ← mechanical
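
To make the per-step tagging explicit, the workflow can be kept as data. This is only a minimal sketch: the `Step` class, the `kind` labels, and the `REASONING` set are hypothetical names I use for illustration, and how the effort value is actually handed to Claude Code is deliberately left out.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical labels for illustration; adapt them to your own step types.
REASONING = {"plan", "debug"}


@dataclass
class Step:
    name: str
    kind: str  # e.g. "read", "write", "plan", "debug"

    @property
    def effort(self) -> str:
        # Reasoning steps get max; everything else in this sketch is mechanical.
        return "max" if self.kind in REASONING else "low"


workflow = [
    Step("Read config file", "read"),
    Step("Parse JSON", "parse"),
    Step("Generate plan", "plan"),
    Step("Create directory", "mkdir"),
    Step("Write file", "write"),
    Step("Read another file", "read"),
    Step("Debug unexpected error", "debug"),
]

# Print the tagged workflow, one step per line.
for step in workflow:
    print(f"{step.name:<24} [{step.effort} effort]")

# Count how many steps land on each effort level.
effort_counts = Counter(step.effort for step in workflow)
print(dict(effort_counts))
```

With the seven steps listed here, only two end up on max effort. The sketch mirrors the low/max split of the workflow above; a medium tier would need one more branch.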

The result? My workflow now completes successfully. Same output quality, but I stay within rate limits.

Understanding the three effort tiers

Let me break down what each effort level actually does:

Low Effort

Best for: Routine file edits, simple tool calls, mechanical operations

Characteristics:
- Fewer internal reasoning steps
- Faster response time
- Lower token consumption
- Suitable for well-defined, predictable tasks
Examples:
- Reading a file
- Writing a known string to a file
- Running a standard command
- Simple text replacement

I tested low effort on a simple file read:

# Task: Read /config/settings.json
# Low effort result: Correct, immediate, 200 tokens
# Max effort result: Correct, verbose explanation, 800 tokens

Same correctness, but max used 4x the tokens.

Medium Effort

Best for: Code generation, refactoring, moderate complexity tasks

Characteristics:
- Balanced reasoning and speed
- Good for creative tasks with clear requirements
- Moderate token consumption
Examples:
- Generate a function from spec
- Refactor code to a new pattern
- Write tests for existing code
- Create a new component

I compared medium vs max for code generation:

# Task: Write a function to validate email addresses

# Medium effort:
def validate_email(email: str) -> bool:
    import re
    pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
    return bool(re.match(pattern, email))

# Max effort:
def validate_email(email: str) -> bool:
    """
    Validates an email address using RFC 5322 compliant regex.

    Args:
        email: The email string to validate

    Returns:
        True if valid, False otherwise

    Note:
        This covers most common cases but may not handle
        all edge cases per RFC 5322 specification.
    """
    import re
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

# Both accept common addresses; max adds documentation.
# Neither regex is fully RFC 5322 compliant, despite the docstring's first line.

Both work. Medium is faster and uses fewer tokens.

Max Effort

Best for: Planning, debugging, architecture decisions, complex analysis

Characteristics:
- Maximum reasoning depth
- Explores multiple approaches
- Higher token consumption
- Best for ambiguous or high-stakes tasks
Examples:
- Debug a failing test
- Design system architecture
- Analyze root cause of production issue
- Plan a complex refactoring strategy

This is where max effort shines. When I debugged a failing test:

# Task: Why is my authentication test failing?
# Low effort response:
"The test expects status 200 but got 401. Check your auth configuration."
# Max effort response:
"I analyzed the test failure. Here's what I found:
1. The test expects status 200 but receives 401
2. Checking auth flow...
   - Token generation: working
   - Token validation: the issue
3. Root cause: The test uses an expired token
   - Token was generated at 10:00 AM
   - Test runs at 11:30 AM
   - Token expiry: 1 hour
4. Fix: Generate fresh token before test or extend expiry
Would you like me to implement option 1, 2, or both?"

Max effort found the root cause. Low effort just stated the symptom.

The overthinking problem

Another insight from the Reddit thread:

“The failure mode with max effort on simple tasks is the model overthinking and adding scope you didn’t ask for.”

I experienced this. When I used max effort for a simple file rename:

My request: "Rename utils.js to helpers.js"
Max effort response:
"I'll rename the file and update all imports. I've also:
- Added JSDoc comments to all functions
- Reorganized exports alphabetically
- Created a deprecation notice for the old import path
- Updated the README to reflect the change
Done! I've also created a migration guide."
What I actually wanted:
"Renamed utils.js to helpers.js. Updated 3 import statements."

Overthinking adds work I didn’t request. Low effort stays focused.

My routing strategy

Here’s the decision tree I now use:

Effort Routing Decision Tree:
┌───────────────────────────────────────┐
│ Does the task require deep reasoning? │
│ (debugging, planning, architecture)   │
└───────────────────┬───────────────────┘
            ┌───────┴───────┐
            │               │
           Yes              No
            │               │
            ▼               ▼
      [Max effort]  ┌─────────────────────────┐
                    │ Is it code generation   │
                    │ or moderate complexity? │
                    └───────────┬─────────────┘
                        ┌───────┴───────┐
                        │               │
                       Yes              No
                        │               │
                        ▼               ▼
                    [Medium]          [Low]
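
The tree reduces to two yes/no questions asked in order. A minimal sketch, assuming the caller already knows the answers; the function and parameter names are mine, not part of Claude Code:

```python
def choose_effort(needs_deep_reasoning: bool, is_codegen_or_moderate: bool) -> str:
    """Walk the decision tree from top to bottom and return an effort level."""
    if needs_deep_reasoning:
        return "max"
    if is_codegen_or_moderate:
        return "medium"
    return "low"


print(choose_effort(True, False))   # debugging a failing test -> max
print(choose_effort(False, True))   # writing a function from a spec -> medium
print(choose_effort(False, False))  # reading a file -> low
```

The order of the checks matters: a task that is both code generation and a hard debugging problem takes the first branch and gets max.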

I also implemented budget awareness in my orchestrations:

def route_effort(task_type: str, budget_remaining: float) -> str:
    # Force low if budget is critical
    if budget_remaining < 0.1:
        return "low"

    # Route by task type
    routing = {
        "planning": "max",
        "debugging": "max",
        "architecture": "max",
        "code_generation": "medium",
        "refactoring": "medium",
        "file_read": "low",
        "file_write": "low",
        "directory_create": "low",
        "simple_command": "low",
    }
    return routing.get(task_type, "medium")
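
To see how the budget check interacts with routing, here is a rough simulation. The `COST` table is a made-up assumption (fraction of the hourly budget consumed per step), not a measured figure, and the router is a condensed copy of the one above so the snippet runs on its own.

```python
def route_effort(task_type: str, budget_remaining: float) -> str:
    # Condensed copy of the router above so this snippet is self-contained.
    if budget_remaining < 0.1:
        return "low"
    routing = {"planning": "max", "debugging": "max", "file_read": "low"}
    return routing.get(task_type, "medium")


# Hypothetical per-step cost, as a fraction of the hourly budget.
COST = {"low": 0.01, "medium": 0.03, "max": 0.08}

steps = ["file_read", "planning", "code_generation", "debugging"] * 5
budget = 1.0
chosen = []

for task_type in steps:
    effort = route_effort(task_type, budget)
    chosen.append(effort)
    # Deduct the assumed cost, never dropping below zero.
    budget = max(0.0, budget - COST[effort])

for task_type, effort in zip(steps, chosen):
    print(f"{task_type:<16} -> {effort}")
```

In this run the final debugging step is forced down to low because the remaining budget has dropped under the 10% threshold. That is the trade-off of the budget guard: it keeps the workflow alive, but a late reasoning step may get less effort than it deserves.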

What I learned

Lesson 1: Default to low, escalate when needed

My previous approach was backward. I started at max and wondered why I ran out of budget. Now I start at low and only escalate when the task demands reasoning depth.
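
One possible shape for "start low, escalate when needed" is a small ladder. This is a sketch under assumptions: `run` and `is_acceptable` are hypothetical callables you would supply (the first invokes the model at a given effort, the second checks the result), and `fake_run` exists only so the example executes.

```python
from typing import Callable

EFFORT_LADDER = ["low", "medium", "max"]


def run_with_escalation(
    task: str,
    run: Callable[[str, str], str],
    is_acceptable: Callable[[str], bool],
) -> tuple[str, str]:
    """Try each effort level in order and stop at the first acceptable result.

    Returns (effort, result). If nothing passes the check, the max-effort
    result is returned as-is.
    """
    result = ""
    effort = EFFORT_LADDER[0]
    for effort in EFFORT_LADDER:
        result = run(task, effort)
        if is_acceptable(result):
            break
    return effort, result


# Stand-in runner for demonstration only; a real one would call the model.
def fake_run(task: str, effort: str) -> str:
    return "root cause found" if effort == "max" else "symptom only"


effort_used, output = run_with_escalation(
    "Why is my authentication test failing?",
    fake_run,
    lambda text: "root cause" in text,
)
print(effort_used, "->", output)
```

Escalation is not free: a task that ends at max has also paid for the low and medium attempts. It probably only pays off when most steps pass at low, which is why routing known reasoning tasks straight to max still makes sense.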

Lesson 2: The overthinking trap is real

Max effort on simple tasks creates noise. Extra comments, scope expansion, unnecessary refactoring. Low effort stays focused on what I asked for.

Lesson 3: Task type, not project importance

A “critical project” doesn’t need max effort everywhere. It needs max effort on the decisions that matter. Routine operations in a critical project are still routine.

Lesson 4: Budget compounds fast

50 steps × 800 tokens per step (all max) = 40,000 tokens
50 steps × 300 tokens per step (mixed routing) = 15,000 tokens
Difference: about 2.7x fewer tokens for the same 50 steps
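
The same arithmetic as a quick check, using the per-step averages from this lesson (800 tokens for all-max, 300 for mixed routing). The constant names are mine:

```python
STEPS = 50
TOKENS_PER_STEP_MAX = 800    # per-step average with max effort everywhere
TOKENS_PER_STEP_MIXED = 300  # per-step average with mixed routing

all_max_total = STEPS * TOKENS_PER_STEP_MAX
mixed_total = STEPS * TOKENS_PER_STEP_MIXED
ratio = all_max_total / mixed_total

print(f"all max: {all_max_total} tokens")
print(f"mixed:   {mixed_total} tokens")
print(f"ratio:   {ratio:.2f}x")
```

The ratio works out to 40,000 / 15,000, roughly 2.67. It depends only on the per-step averages, not on the step count, so the saving scales with longer workflows.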

Lesson 5: Quality doesn’t suffer on mechanical tasks

I tested low vs max on file operations 100 times. Zero quality difference. The file was read correctly either way. The directory was created either way. Mechanical tasks don’t need reasoning.

Summary

In this post, I explained how to route effort levels in Claude Code workflows. The key point is routing by task type, not project type:

  • Low effort for mechanical operations (file reads, writes, simple commands)
  • Medium effort for code generation and moderate complexity
  • Max effort for reasoning tasks (planning, debugging, architecture)

The critical insight from production experience: max effort on every step of a 50-call orchestration will exhaust hourly limits before the task finishes. Route by step type, not project importance.

After implementing task-based routing, my workflows complete successfully. Same output quality, but I stay within rate limits. The overthinking problem disappeared. And I stopped hitting that frustrating “rate limit exceeded” error.

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email.
