How to Set Up Multi-Agent Orchestration in Claude Code with Smart Model Routing

The Problem: Claude Code Costs Adding Up

I’ve been using Claude Code for a while now, and one thing became painfully obvious: running everything on Opus 4.6 gets expensive fast. My monthly API bills were climbing, yet I noticed that many of the tasks I delegated didn’t actually need that level of reasoning power.

Think about it—why burn Opus credits on writing a README file or generating boilerplate tests? Those are tasks that Haiku could handle just fine. The real question was: how do I tell Claude Code to use different models for different tasks?

I stumbled upon the --agents flag while reading through community discussions about underutilized Claude Code features. It turns out this feature has been there all along, hiding in plain sight.

Understanding Model Routing Strategy

Before diving into the implementation, let me explain why this matters. Claude offers three main models, each with distinct characteristics:

  • Haiku: Fast and cheap, with roughly 90% of Sonnet’s capability at about one-third of the cost. Great for repetitive, straightforward tasks.
  • Sonnet: The balanced middle ground. Good for most development work.
  • Opus: Deepest reasoning, highest cost. Reserve for complex architectural decisions and thorny problems.

The trick is matching the right model to the right job. Using Opus to format a JSON file is like hiring a PhD to alphabetize your bookshelf—technically correct, but wasteful.

Setting Up Multi-Agent Orchestration

The --agents CLI flag lets you define session-scoped subagents with specific model assignments. Here’s what a basic setup looks like:

Basic multi-agent setup
claude --agents '{
  "test-engineer": {
    "description": "Writes unit tests for code. Handles Jest, Vitest, and Pytest frameworks.",
    "prompt": "You are a test engineer. Write comprehensive unit tests following TDD principles.",
    "model": "haiku",
    "tools": ["Read", "Write", "Glob", "Grep"]
  },
  "code-reviewer": {
    "description": "Reviews code for bugs, security issues, and best practices.",
    "prompt": "Review code thoroughly. Focus on security, performance, and maintainability.",
    "model": "sonnet",
    "tools": ["Read", "Grep", "Glob"]
  }
}'

I initially made the mistake of not writing clear descriptions. Claude needs to understand when to invoke each agent. A vague description like “helper agent” won’t work—Claude won’t know what to delegate to it.
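For longer configurations, hand-quoting the JSON in a shell gets fiddly. Here is a minimal sketch, assuming you launch Claude Code from a Node.js script: keep the definitions as a plain object and serialize them into the argument for --agents. The helper name buildAgentsArgv is hypothetical; only the --agents flag and the field names come from the example above.

```javascript
// Hypothetical helper: turns an agents object into an argv array for the claude CLI.
// Passing the JSON as a single argv entry sidesteps shell-quoting problems.
function buildAgentsArgv(agents) {
  return ["--agents", JSON.stringify(agents)];
}

const exampleAgents = {
  "test-engineer": {
    description: "Writes unit tests for code. Handles Jest, Vitest, and Pytest frameworks.",
    prompt: "You are a test engineer. Write comprehensive unit tests following TDD principles.",
    model: "haiku",
    tools: ["Read", "Write", "Glob", "Grep"],
  },
};

const agentsArgv = buildAgentsArgv(exampleAgents);
console.log(agentsArgv[0], agentsArgv[1].length, "characters of JSON");

// To actually start a session (assumes the claude binary is on PATH):
// const { spawn } = require("node:child_process");
// spawn("claude", agentsArgv, { stdio: "inherit" });
```

Because the JSON travels as one argv element, apostrophes inside a description or prompt no longer clash with the single quotes the shell version needs.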

What I Learned About Agent Configuration

Each agent definition has four key fields:

  1. description: This is crucial. Claude reads this to decide whether to delegate. Be specific about what the agent handles.
  2. prompt: The system prompt that shapes the agent’s behavior.
  3. model: The Claude model to use (haiku, sonnet, or opus).
  4. tools: Which tools the agent can access. Be careful here—over-restricting prevents the agent from completing tasks.

I tried restricting tools aggressively at first, thinking it would improve security. What actually happened was my agents kept failing because they couldn’t access the files they needed. Start with a reasonable set and narrow down based on actual usage patterns.
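The four fields are easy to sanity-check before a session starts. The sketch below is one way to do that, not an official schema: the set of model aliases mirrors the list above, while the 20-character minimum for descriptions and the function name checkAgentDefinition are arbitrary choices made for illustration.

```javascript
// Assumed conventions, not an official schema.
const KNOWN_MODELS = new Set(["haiku", "sonnet", "opus"]);
const MIN_DESCRIPTION_LENGTH = 20; // arbitrary threshold for "too vague"

// Returns a list of human-readable problems; an empty list means the definition looks sane.
function checkAgentDefinition(name, def) {
  const problems = [];
  if (typeof def.description !== "string" ||
      def.description.trim().length < MIN_DESCRIPTION_LENGTH) {
    problems.push(`${name}: description is missing or too vague`);
  }
  if (typeof def.prompt !== "string" || def.prompt.trim() === "") {
    problems.push(`${name}: prompt is missing`);
  }
  if (!KNOWN_MODELS.has(def.model)) {
    problems.push(`${name}: unknown model "${def.model}"`);
  }
  if (!Array.isArray(def.tools) || def.tools.length === 0) {
    problems.push(`${name}: tools list is empty`);
  }
  return problems;
}

const vagueAgent = {
  description: "helper agent",
  prompt: "Help.",
  model: "haiku",
  tools: ["Read"],
};
console.log(checkAgentDefinition("helper", vagueAgent));
```

A length check cannot tell whether a description is actually specific, so treat it as a tripwire for the worst cases, not a quality gate.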

A More Complete Example

After some iteration, I arrived at this configuration for my typical workflow:

Specialized agent definitions
claude --agents '{
  "doc-writer": {
    "description": "Creates and updates documentation files. Handles README, API docs, and guides.",
    "prompt": "Write clear, concise documentation. Use proper markdown formatting.",
    "model": "haiku",
    "tools": ["Read", "Write", "Glob"]
  },
  "architect": {
    "description": "Makes complex architectural decisions. Evaluates tradeoffs and designs systems.",
    "prompt": "You are a senior architect. Provide thorough analysis and clear recommendations.",
    "model": "opus",
    "tools": ["Read", "Grep", "Glob", "Bash"]
  },
  "security-scanner": {
    "description": "Scans codebase for security vulnerabilities and recommends fixes.",
    "prompt": "Identify security risks including injection, XSS, auth issues. Provide specific remediation.",
    "model": "sonnet",
    "tools": ["Read", "Grep", "Glob"]
  }
}'

Notice the pattern: documentation goes to Haiku (cheap, fast), security scanning goes to Sonnet (balanced reasoning), and architectural decisions go to Opus (deep analysis needed).

Effort Levels for Cost Control

Another feature I discovered is effort levels. You can control how much compute Claude spends on each response:

Setting effort level
# Set low effort for cost-sensitive pipelines
export CLAUDE_CODE_EFFORT_LEVEL=low
# Or inline for specific sessions
CLAUDE_CODE_EFFORT_LEVEL=medium claude --agents '{"worker": {...}}'

The available levels are low, medium, high, and max. For automated pipelines, low or medium is often enough. Reserve high and max for interactive sessions where you need the best quality.
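If a script launches Claude Code in both pipelines and interactive sessions, the level can be chosen in code. This is a rough sketch under two assumptions: that the CI system sets a CI environment variable (a common convention, not a guarantee), and that "high" is a sensible interactive default. The function names are hypothetical.

```javascript
// Picks an effort level for a run. Treats a set, non-"false" CI variable as
// "running in a pipeline"; that detection rule is an assumption.
function chooseEffortLevel(env) {
  const inCi = Boolean(env.CI) && env.CI !== "false";
  return inCi ? "low" : "high";
}

// Returns a copy of the environment with CLAUDE_CODE_EFFORT_LEVEL set,
// leaving the caller's object untouched.
function withEffortLevel(env) {
  return { ...env, CLAUDE_CODE_EFFORT_LEVEL: chooseEffortLevel(env) };
}

const pipelineEnv = withEffortLevel({ CI: "true" });
console.log(pipelineEnv.CLAUDE_CODE_EFFORT_LEVEL); // low
```

The returned object can be passed as the env option when spawning the claude process, so the setting applies to that run only and does not leak into the rest of the shell session.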

A Practical Model Selection Guide

I put together this decision matrix to help choose the right model:

Model routing strategy
const modelStrategy = {
  // Use Haiku (cheapest, ~90% Sonnet capability)
  haiku: [
    "Documentation updates",
    "Simple file operations",
    "Test generation",
    "Code formatting",
    "Log analysis",
    "Boilerplate code"
  ],
  // Use Sonnet (balanced)
  sonnet: [
    "Code review",
    "Security scanning",
    "Refactoring",
    "Feature implementation",
    "Debugging",
    "API development"
  ],
  // Use Opus (deepest reasoning, most expensive)
  opus: [
    "Architectural decisions",
    "Complex system design",
    "Multi-file refactoring",
    "Performance optimization",
    "Research and analysis",
    "Critical bug investigation"
  ]
};
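The matrix reads naturally as a lookup table. As a sketch of how it could be queried, the snippet below inverts a trimmed copy of it into a task-to-model map. The pickModel name and the choice of sonnet as the fallback for unlisted tasks are assumptions, not part of the matrix.

```javascript
// A trimmed copy of the matrix so the snippet runs on its own.
const routingTable = {
  haiku: ["Documentation updates", "Test generation", "Boilerplate code"],
  sonnet: ["Code review", "Security scanning", "Debugging"],
  opus: ["Architectural decisions", "Complex system design"],
};

// Build a reverse index once: lowercased task category -> model name.
const modelByTask = new Map();
for (const [model, tasks] of Object.entries(routingTable)) {
  for (const task of tasks) {
    modelByTask.set(task.toLowerCase(), model);
  }
}

// Unknown categories fall back to sonnet; that default is an assumption.
function pickModel(taskCategory, fallback = "sonnet") {
  return modelByTask.get(taskCategory.trim().toLowerCase()) ?? fallback;
}

console.log(pickModel("Test generation"));    // haiku
console.log(pickModel("Database migration")); // sonnet (fallback)
```

Falling back to the middle tier keeps an unlisted task from silently landing on either the cheapest or the most expensive model.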

Common Pitfalls to Avoid

I made several mistakes while learning this feature. Here’s what not to do:

Mistake 1: Using Opus for everything. I was so impressed with Opus’s reasoning that I defaulted to it. My bill tripled. Reserve Opus for genuinely complex decisions.

Mistake 2: Vague agent descriptions. My first agent had the description “helps with stuff.” Claude never invoked it because it couldn’t tell what to delegate.

Mistake 3: Creating too many agents. I defined fifteen specialized agents at one point. Claude got confused about which one to use for any given task. Keep it to 3-5 well-defined agents.

Mistake 4: Ignoring effort levels. For my CI/CD pipelines, I kept the default effort level. Setting it to Low reduced costs by about 40% with no noticeable quality loss for those automated tasks.

Combining with Hooks

You can make this even more powerful by combining agents with hooks. Here’s a snippet from my configuration:

Hook configuration for automatic delegation
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write",
        "hooks": ["doc-writer"]
      }
    ]
  }
}

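The intent is to hand Write operations to the Haiku-powered doc-writer agent. Two caveats: the Write matcher fires on every Write call, not only on documentation files, and Claude Code’s documented hook entries are objects with a type and a command rather than bare agent names, so check this snippet against the hooks reference for your version before relying on it. I didn’t set this up initially, but once I did, my costs dropped noticeably.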

Measuring the Impact

After implementing model routing, I tracked my costs over two weeks. The results were significant:

  • Before: All tasks on Opus, averaging $45/week
  • After: Smart routing, averaging $18/week (60% reduction)

The quality of output remained high. Haiku handled about 70% of my tasks with no issues. Sonnet took another 20%. Opus only handled the remaining 10%—the truly complex problems.
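The 60% figure follows directly from the two weekly averages. A small helper makes the arithmetic explicit and reusable for your own numbers. The function name is illustrative.

```javascript
// Percentage saved when moving from `before` to `after` (same units for both).
// Multiplying before dividing keeps the result exact for whole-number inputs like these.
function percentReduction(before, after) {
  if (before <= 0) throw new RangeError("before must be positive");
  return ((before - after) * 100) / before;
}

console.log(percentReduction(45, 18)); // 60
```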

Summary

In this post, I walked through how to set up multi-agent orchestration in Claude Code with intelligent model routing. The key insight is that different tasks require different levels of reasoning, and matching the model to the task saves money without sacrificing quality. I covered the --agents flag syntax, effort levels, common pitfalls, and showed real cost savings from my own usage. Start small with one or two specialized agents, measure the results, then expand from there.

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to get in touch, you can reach me by email.
