How to Build Opus-Sonnet-Haiku Multi-Model Orchestration

Purpose

I needed to build an AI system that could handle complex workflows without burning my budget. A single-model approach meant either overspending on simple tasks or getting poor results on complex ones. The solution: a hierarchical multi-model architecture.

The Architecture

                +------------------+
                |    OPUS TIER     |
                |   Orchestrator   |
                +--------+---------+
                         |
          +--------------+--------------+
          |                             |
+---------v----------+       +----------v---------+
|    SONNET TIER     |       |    SONNET TIER     |
|    Executor #1     |       |    Executor #2     |
+---------+----------+       +----------+---------+
          |                             |
+---------v----------+       +----------v---------+
|    HAIKU SWARM     |       |    HAIKU SWARM     |
|     Sub-Agents     |       |     Sub-Agents     |
|  [H][H][H][H][H]   |       |  [H][H][H][H][H]   |
+--------------------+       +--------------------+

Opus plans and coordinates. Sonnet executes complex tasks. Haiku handles high-volume narrow operations.

What Each Tier Does

Opus (Orchestrator)

Opus handles strategic decisions:

  • Analyze complex problems and decompose into subtasks
  • Create execution plans with dependencies
  • Route tasks to appropriate executors
  • Synthesize results from multiple agents
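
To make "execution plans with dependencies" concrete, here is a minimal sketch of how a plan could be turned into waves of subtasks that are safe to run in parallel. The `id` and `depends_on` fields and the `execution_waves` helper are assumptions for illustration; they are not part of the plan schema used later in this post.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each subtask lists the ids it depends on.
plan = {
    "subtasks": [
        {"id": "outline", "assignee": "sonnet", "depends_on": []},
        {"id": "gather", "assignee": "haiku", "depends_on": ["outline"]},
        {"id": "sources", "assignee": "haiku", "depends_on": ["outline"]},
        {"id": "draft", "assignee": "sonnet", "depends_on": ["gather", "sources"]},
    ]
}


def execution_waves(plan: dict) -> list[list[str]]:
    """Group subtask ids into waves; tasks within one wave have no mutual dependencies."""
    sorter = TopologicalSorter(
        {t["id"]: set(t["depends_on"]) for t in plan["subtasks"]}
    )
    sorter.prepare()  # raises graphlib.CycleError if the plan is circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to keep output stable
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(plan))
```

Each inner list is one wave: the orchestrator can dispatch everything in a wave at once and wait for it to finish before starting the next.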

Sonnet (Executor)

Sonnet does the heavy lifting:

  • Execute plans created by Opus
  • Implement features from specifications
  • Debug and fix issues requiring reasoning
  • Generate content requiring nuance

Haiku (Sub-Agent)

Haiku swarms handle volume:

  • Execute narrow, well-defined operations
  • Process large batches in parallel
  • Perform validation and classification
  • Extract and transform data
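
One way to keep these three roles explicit in code is a small lookup table. This is only a sketch: the `Tier` dataclass and `tier_for` helper are hypothetical names, the concurrency values of 5 and 50 mirror the limits used in the implementation further down, and the value of 1 for Opus is my assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    role: str
    max_concurrency: int


# 5 and 50 follow the limits in the post; 1 for Opus is an assumption.
TIERS = {
    "opus": Tier("opus", "orchestrator", 1),
    "sonnet": Tier("sonnet", "executor", 5),
    "haiku": Tier("haiku", "sub-agent", 50),
}


def tier_for(assignee: str) -> Tier:
    """Look up a tier by the assignee label used in a plan."""
    try:
        return TIERS[assignee]
    except KeyError:
        raise ValueError(f"unknown tier: {assignee!r}") from None


print(tier_for("haiku"))
```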

How to Implement It

Here’s a LangGraph implementation:

multi_model_orchestrator.py
from langgraph.graph import StateGraph, END
from typing import TypedDict, List, Annotated
import operator
import asyncio

# opus_client, sonnet_client, and haiku_client are assumed to be async model
# wrappers defined elsewhere. opus_client.generate is assumed to return
# already-parsed JSON (a dict) for the planning call.


class OrchestratorState(TypedDict):
    task: str
    plan: dict
    subtasks: List[dict]
    # operator.add merges the result lists written by the parallel branches
    results: Annotated[List[dict], operator.add]
    final_output: dict


async def run_bounded(call, tasks, limit):
    """Run call(task["task"]) for every task, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def one(task):
        async with sem:
            return {"task": task["task"], "result": await call(task["task"])}

    return await asyncio.gather(*[one(t) for t in tasks])


def build_multi_model_graph():
    graph = StateGraph(OrchestratorState)

    # Opus creates the plan
    async def opus_orchestrator(state: OrchestratorState):
        plan = await opus_client.generate(
            f"""Analyze this task and create an execution plan:
            {state['task']}
            Output JSON with:
            - subtasks: list of {{"task": str, "assignee": "sonnet"|"haiku", "priority": int}}
            - coordination_strategy: "parallel" | "sequential" | "hybrid"
            """
        )
        return {"plan": plan, "subtasks": plan["subtasks"]}

    # Sonnet executes complex tasks
    async def sonnet_executor(state: OrchestratorState):
        sonnet_tasks = [t for t in state["subtasks"] if t["assignee"] == "sonnet"]
        # Run every task, max 5 concurrent. Slicing with [:5] would silently
        # drop the remaining tasks instead of limiting concurrency.
        return {"results": await run_bounded(sonnet_client.execute, sonnet_tasks, 5)}

    # Haiku handles high-volume tasks
    async def haiku_swarm(state: OrchestratorState):
        haiku_tasks = [t for t in state["subtasks"] if t["assignee"] == "haiku"]
        # Run every task, max 50 concurrent
        return {"results": await run_bounded(haiku_client.execute, haiku_tasks, 50)}

    # Opus synthesizes final output
    async def opus_synthesizer(state: OrchestratorState):
        final = await opus_client.generate(
            f"""Synthesize these results:
            Original task: {state['task']}
            Results: {state['results']}
            Create a comprehensive response.
            """
        )
        return {"final_output": final}

    # Build the graph
    graph.add_node("orchestrator", opus_orchestrator)
    graph.add_node("sonnet_executor", sonnet_executor)
    graph.add_node("haiku_swarm", haiku_swarm)
    graph.add_node("synthesizer", opus_synthesizer)

    # Define edges: fan out to both executors, then fan in to the synthesizer
    graph.set_entry_point("orchestrator")
    graph.add_edge("orchestrator", "sonnet_executor")
    graph.add_edge("orchestrator", "haiku_swarm")
    graph.add_edge("sonnet_executor", "synthesizer")
    graph.add_edge("haiku_swarm", "synthesizer")
    graph.add_edge("synthesizer", END)

    return graph.compile()
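
The orchestrator node reads `plan["subtasks"]` directly, so a malformed plan would only fail later inside an executor. A minimal sketch of validating the plan first, assuming the model returns its plan as a raw JSON string, could look like this. The `parse_plan` helper is a hypothetical name; the allowed values are taken from the prompt above.

```python
import json

ALLOWED_ASSIGNEES = {"sonnet", "haiku"}
ALLOWED_STRATEGIES = {"parallel", "sequential", "hybrid"}


def parse_plan(raw: str) -> dict:
    """Parse the orchestrator's JSON plan and raise ValueError if it is malformed."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"plan is not valid JSON: {exc}") from exc
    if not isinstance(plan, dict):
        raise ValueError("plan must be a JSON object")

    subtasks = plan.get("subtasks")
    if not isinstance(subtasks, list) or not subtasks:
        raise ValueError("plan needs a non-empty 'subtasks' list")

    for i, sub in enumerate(subtasks):
        if not isinstance(sub, dict) or not isinstance(sub.get("task"), str):
            raise ValueError(f"subtask {i}: 'task' must be a string")
        if sub.get("assignee") not in ALLOWED_ASSIGNEES:
            raise ValueError(f"subtask {i}: unknown assignee {sub.get('assignee')!r}")
        if not isinstance(sub.get("priority"), int):
            raise ValueError(f"subtask {i}: 'priority' must be an int")

    if plan.get("coordination_strategy") not in ALLOWED_STRATEGIES:
        raise ValueError("unknown 'coordination_strategy'")
    return plan


example = (
    '{"subtasks": [{"task": "summarize", "assignee": "haiku", "priority": 1}],'
    ' "coordination_strategy": "parallel"}'
)
print(parse_plan(example)["subtasks"][0]["assignee"])
```

Failing fast here keeps a bad plan from costing a full round of executor calls.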

Task Router

Route tasks based on complexity:

task_router.py
class MultiModelRouter:
    def __init__(self, opus, sonnet, haiku):
        # Each client is assumed to expose an async generate(prompt) method.
        self.opus = opus
        self.sonnet = sonnet
        self.haiku = haiku

    async def classify_task(self, task: str) -> str:
        """Use Haiku to classify task complexity."""
        # Assumes generate() returns the parsed JSON object requested below.
        classification = await self.haiku.generate(
            f"""Classify this task's complexity:
            Task: {task}
            Rules:
            - "haiku": Narrow, well-defined, explicit instructions
            - "sonnet": Moderate complexity, needs context
            - "opus": Complex reasoning, architectural decisions
            Output JSON: {{"tier": str, "reasoning": str}}
            """
        )
        return classification["tier"]

    async def route(self, task: str) -> str:
        tier = await self.classify_task(task)
        if tier == "opus":
            return await self.opus.generate(task)
        elif tier == "sonnet":
            return await self.sonnet.generate(task)
        else:
            # Any other label, including an unexpected one, falls through to Haiku
            return await self.haiku.generate(task)
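
The router trusts whatever label the classifier returns, and anything unrecognized ends up on the cheapest tier. A small guard can make that fallback explicit. This is a sketch under assumptions: `normalize_tier` is a hypothetical helper, and defaulting to "sonnet" as a middle ground is my choice, not something the router above does.

```python
VALID_TIERS = ("haiku", "sonnet", "opus")


def normalize_tier(classification: object, default: str = "sonnet") -> str:
    """Return a valid tier label, falling back to `default` on bad classifier output."""
    if not isinstance(classification, dict):
        return default
    tier = classification.get("tier")
    if not isinstance(tier, str):
        return default
    tier = tier.strip().lower()
    return tier if tier in VALID_TIERS else default


print(normalize_tier({"tier": " Opus ", "reasoning": "architecture work"}))
print(normalize_tier({"tier": "unknown"}))
```

With this in place, `classify_task` could return `normalize_tier(classification)` instead of indexing the dict directly.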

Cost Comparison

The savings compound quickly:

Example: Research Report Generation

Traditional (Single Opus):
  - Cost: $15.00 for 100k input + 20k output tokens
  - Time: 45 minutes sequential

Multi-Model Orchestration:
  - Opus (planning): 10k tokens = $1.50
  - Sonnet (drafting): 30k tokens = $0.90
  - Haiku swarm (data gathering): 200k tokens = $2.00
  - Opus (synthesis): 15k tokens = $2.25
  - Total: $6.65 (56% savings)
  - Time: 15 minutes with parallel execution
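
The total and the savings percentage follow directly from the figures listed above. The short check below only uses those example numbers; it is not a pricing calculator, and the variable names are mine.

```python
# Example figures from the comparison above (USD and minutes)
single_opus_cost = 15.00
stage_costs = {
    "opus_planning": 1.50,
    "sonnet_drafting": 0.90,
    "haiku_data_gathering": 2.00,
    "opus_synthesis": 2.25,
}

total = round(sum(stage_costs.values()), 2)
savings_pct = round(100 * (single_opus_cost - total) / single_opus_cost, 1)
speedup = 45 / 15  # sequential minutes / parallel minutes

print(f"total=${total:.2f} savings={savings_pct}% speedup={speedup:.0f}x")
```

The exact savings come out just under 56%, which the post rounds up.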

Common Mistakes

Mistake 1: Opus Doing Grunt Work

# WRONG: Opus classifying reviews
def analyze_sentiment(reviews):
    return opus.batch_classify(reviews, ["positive", "negative"])

# RIGHT: Opus plans, Haiku executes
async def analyze_sentiment(reviews):
    criteria = await opus.define_criteria("Create sentiment classification rules")
    results = await haiku_swarm.classify(reviews, criteria=criteria)
    return results

Mistake 2: Sequential When Parallel Works

# Both snippets run inside an async function.

# WRONG: Sequential processing (each call waits for the previous one)
results = []
for doc in documents:
    results.append(await haiku.extract(doc))

# RIGHT: Parallel processing
results = await asyncio.gather(*[
    haiku.extract(doc) for doc in documents
])
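
To see the difference without calling a real model, here is a minimal, self-contained sketch that swaps the Haiku call for a stub. The `extract` stub, its 0.05 s sleep, and the helper names are all assumptions for illustration; the semaphore keeps the parallel version within a concurrency cap.

```python
import asyncio
import time


async def extract(doc: str) -> str:
    """Stub for a Haiku extraction call; the sleep stands in for network latency."""
    await asyncio.sleep(0.05)
    return doc.upper()


async def run_sequential(docs):
    # One call at a time: total time grows with the number of documents
    return [await extract(d) for d in docs]


async def run_parallel(docs, limit=50):
    sem = asyncio.Semaphore(limit)  # cap on in-flight calls

    async def bounded(doc):
        async with sem:
            return await extract(doc)

    return await asyncio.gather(*(bounded(d) for d in docs))


async def timed(coro):
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start


async def main():
    docs = [f"doc-{i}" for i in range(10)]
    seq, seq_s = await timed(run_sequential(docs))
    par, par_s = await timed(run_parallel(docs))
    return seq, par, seq_s, par_s


seq, par, seq_s, par_s = asyncio.run(main())
print(f"sequential: {seq_s:.2f}s, parallel: {par_s:.2f}s")
```

Both versions return the same results in the same order; only the wall-clock time differs.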

Summary

In this post, I showed how to build multi-model orchestration with Opus, Sonnet, and Haiku. The key point is: Opus orchestrates, Sonnet executes, Haiku handles volume. Build systems that route intelligently between all three based on task complexity.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!