How to Build Opus-Sonnet-Haiku Multi-Model Orchestration

Mar 19, 2026

Purpose

I needed to build an AI system that could handle complex workflows without burning my budget. A single-model approach meant either overspending on simple tasks or getting poor results on complex ones. The solution: a hierarchical multi-model architecture.

The Architecture

                    +------------------+
                    |   OPUS TIER      |
                    |   Orchestrator   |
                    +--------+---------+
                             |
              +--------------+--------------+
              |                             |
    +---------v----------+       +----------v---------+
    |   SONNET TIER      |       |   SONNET TIER      |
    |   Executor #1      |       |   Executor #2      |
    +---------+----------+       +----------+---------+
              |                             |
    +---------v----------+       +----------v---------+
    |   HAIKU SWARM      |       |   HAIKU SWARM      |
    |   Sub-Agents       |       |   Sub-Agents       |
    |  [H][H][H][H][H]   |       |  [H][H][H][H][H]   |
    +--------------------+       +--------------------+

Opus plans and coordinates. Sonnet executes complex tasks. Haiku handles high-volume narrow operations.
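One way to make this division of labor concrete is a small tier table that the rest of the system can read from. The sketch below is only an illustration: `TierConfig`, `TIERS`, `describe`, and the placeholder model identifiers are hypothetical names. The concurrency caps of 5 and 50 mirror the limits used in the implementation later in this post, and the cap of 1 for Opus is my assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierConfig:
    model: str            # placeholder string, not a real model ID
    role: str             # what this tier is responsible for
    max_concurrency: int  # how many calls may be in flight at once


# Hypothetical tier table; swap in real model identifiers for your account.
TIERS = {
    "opus": TierConfig("<opus-model-id>", "plan and coordinate", 1),
    "sonnet": TierConfig("<sonnet-model-id>", "execute complex tasks", 5),
    "haiku": TierConfig("<haiku-model-id>", "high-volume narrow operations", 50),
}


def describe(tier: str) -> str:
    """Return a one-line summary of a tier's role and concurrency cap."""
    cfg = TIERS[tier]
    return f"{tier}: {cfg.role} (up to {cfg.max_concurrency} concurrent calls)"


print(describe("haiku"))
```

Keeping the roles and limits in one place means the executors do not need to hard-code them.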

What Each Tier Does

Opus (Orchestrator)

Opus handles strategic decisions:

- Analyze complex problems and decompose them into subtasks
- Create execution plans with dependencies
- Route tasks to appropriate executors
- Synthesize results from multiple agents
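To illustrate what "execution plans with dependencies" can look like in practice, here is a minimal sketch that turns a dependency map into waves of subtasks that can run in parallel. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The `execution_waves` helper and the subtask names are hypothetical, chosen only for the example.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into waves; every task in a wave has all its
    prerequisites completed by earlier waves."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical plan: each key maps to the set of subtasks it depends on.
plan = {
    "gather_sources": set(),
    "extract_facts": {"gather_sources"},
    "draft_report": {"extract_facts"},
    "check_citations": {"extract_facts"},
    "synthesize": {"draft_report", "check_citations"},
}

waves = execution_waves(plan)
print(waves)
```

Each inner list is a group of subtasks that could be dispatched together, which is the kind of structure the orchestrator needs before it routes work to executors.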

Sonnet (Executor)

Sonnet does the heavy lifting:

- Execute plans created by Opus
- Implement features from specifications
- Debug and fix issues requiring reasoning
- Generate content requiring nuance

Haiku (Sub-Agent)

Haiku swarms handle volume:

- Execute narrow, well-defined operations
- Process large batches in parallel
- Perform validation and classification
- Extract and transform data
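The "large batches in parallel" idea can be sketched with plain `asyncio`. This is a minimal illustration, not the swarm implementation itself: `run_swarm` is a hypothetical helper, and `classify_length` is a stand-in for a real Haiku call so the example runs without any API access.

```python
import asyncio


async def run_swarm(items, worker, max_concurrency=50):
    """Apply an async worker to every item, with a cap on in-flight calls."""
    limit = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        async with limit:
            return await worker(item)

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(item) for item in items))


async def classify_length(text: str) -> str:
    """Stand-in for a narrow model call (hypothetical, no network)."""
    await asyncio.sleep(0)
    return "long" if len(text) > 20 else "short"


labels = asyncio.run(
    run_swarm(["ok", "a much longer piece of text here"], classify_length)
)
print(labels)
```

The semaphore processes every item while keeping the number of simultaneous requests bounded, which matters once batches grow beyond what a rate limit allows.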

How to Implement It

Here’s a LangGraph implementation:

from langgraph.graph import StateGraph, END
from typing import TypedDict, List, Annotated
import operator
import asyncio

class OrchestratorState(TypedDict):
    task: str
    plan: dict
    subtasks: List[dict]
    results: Annotated[List[dict], operator.add]
    final_output: dict

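def build_multi_model_graph():
    # opus_client, sonnet_client, and haiku_client are assumed to be async
    # model wrappers defined elsewhere; they are not part of LangGraph.
    graph = StateGraph(OrchestratorState)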

    # Opus creates the plan
    async def opus_orchestrator(state: OrchestratorState):
        plan = await opus_client.generate(
            f"""Analyze this task and create an execution plan:
{state['task']}

Output JSON with:
- subtasks: list of {{"task": str, "assignee": "sonnet"|"haiku", "priority": int}}
- coordination_strategy: "parallel" | "sequential" | "hybrid"
"""
        )
        # Assumes opus_client.generate returns the plan already parsed into a dict
        return {"plan": plan, "subtasks": plan["subtasks"]}

    # Sonnet executes complex tasks
    async def sonnet_executor(state: OrchestratorState):
        sonnet_tasks = [t for t in state["subtasks"] if t["assignee"] == "sonnet"]

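        # Run every task, with at most 5 in flight at once.
        # (Slicing the list to [:5] would silently drop the remaining tasks.)
        limit = asyncio.Semaphore(5)

        async def run(task: dict):
            async with limit:
                return await sonnet_client.execute(task["task"])

        results = await asyncio.gather(*[run(task) for task in sonnet_tasks])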

        return {"results": [{"task": t["task"], "result": r}
                           for t, r in zip(sonnet_tasks, results)]}

    # Haiku handles high-volume tasks
    async def haiku_swarm(state: OrchestratorState):
        haiku_tasks = [t for t in state["subtasks"] if t["assignee"] == "haiku"]

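        # Run every task, with at most 50 in flight at once.
        # (Slicing the list to [:50] would silently drop the remaining tasks.)
        limit = asyncio.Semaphore(50)

        async def run(task: dict):
            async with limit:
                return await haiku_client.execute(task["task"])

        results = await asyncio.gather(*[run(task) for task in haiku_tasks])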

        return {"results": [{"task": t["task"], "result": r}
                           for t, r in zip(haiku_tasks, results)]}

    # Opus synthesizes final output
    async def opus_synthesizer(state: OrchestratorState):
        final = await opus_client.generate(
            f"""Synthesize these results:
Original task: {state['task']}
Results: {state['results']}

Create a comprehensive response.
"""
        )
        return {"final_output": final}

    # Build the graph
    graph.add_node("orchestrator", opus_orchestrator)
    graph.add_node("sonnet_executor", sonnet_executor)
    graph.add_node("haiku_swarm", haiku_swarm)
    graph.add_node("synthesizer", opus_synthesizer)

    # Define edges
    graph.set_entry_point("orchestrator")
    graph.add_edge("orchestrator", "sonnet_executor")
    graph.add_edge("orchestrator", "haiku_swarm")
    graph.add_edge("sonnet_executor", "synthesizer")
    graph.add_edge("haiku_swarm", "synthesizer")
    graph.add_edge("synthesizer", END)

    return graph.compile()
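The orchestrator node relies on the model returning a plan in the requested JSON shape. A small validation step before the plan enters the graph can catch malformed output early. The sketch below is one possible approach using only the standard library. `parse_plan` is a hypothetical helper, and treating a lower `priority` number as more urgent is my assumption, since the prompt does not define the ordering.

```python
import json

ALLOWED_ASSIGNEES = {"sonnet", "haiku"}
ALLOWED_STRATEGIES = {"parallel", "sequential", "hybrid"}


def parse_plan(raw: str) -> dict:
    """Parse and validate the JSON plan requested in the orchestrator prompt."""
    plan = json.loads(raw)
    if not isinstance(plan, dict):
        raise ValueError("plan must be a JSON object")
    if plan.get("coordination_strategy") not in ALLOWED_STRATEGIES:
        raise ValueError("unknown coordination_strategy")
    subtasks = plan.get("subtasks")
    if not isinstance(subtasks, list) or not subtasks:
        raise ValueError("plan must contain a non-empty subtasks list")
    for sub in subtasks:
        if not isinstance(sub, dict):
            raise ValueError("each subtask must be a JSON object")
        if not isinstance(sub.get("task"), str):
            raise ValueError("subtask is missing a task description")
        if sub.get("assignee") not in ALLOWED_ASSIGNEES:
            raise ValueError(f"unknown assignee: {sub.get('assignee')!r}")
        if not isinstance(sub.get("priority"), int):
            raise ValueError("subtask priority must be an integer")
    # Assumption: lower priority number means the subtask should run earlier.
    plan["subtasks"] = sorted(subtasks, key=lambda s: s["priority"])
    return plan


raw_plan = """{
  "coordination_strategy": "parallel",
  "subtasks": [
    {"task": "Summarize findings", "assignee": "sonnet", "priority": 2},
    {"task": "Extract dates", "assignee": "haiku", "priority": 1}
  ]
}"""
validated = parse_plan(raw_plan)
print([s["task"] for s in validated["subtasks"]])
```

Failing fast here keeps a bad plan from reaching the executors, where a missing key would surface as a less helpful `KeyError`.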

Task Router

Route tasks based on complexity:

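class MultiModelRouter:
    def __init__(self, opus, sonnet, haiku):
        # Each argument is assumed to be an async client exposing generate(prompt)
        self.opus = opus
        self.sonnet = sonnet
        self.haiku = haiku

    async def classify_task(self, task: str) -> str: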
        """Use Haiku to classify task complexity."""
        classification = await self.haiku.generate(
            f"""Classify this task's complexity:

Task: {task}

Rules:
- "haiku": Narrow, well-defined, explicit instructions
- "sonnet": Moderate complexity, needs context
- "opus": Complex reasoning, architectural decisions

Output JSON: {{"tier": str, "reasoning": str}}
"""
        )
        # Assumes self.haiku.generate returns the JSON already parsed into a dict
        return classification["tier"]

    async def route(self, task: str) -> str:
        tier = await self.classify_task(task)

        if tier == "opus":
            return await self.opus.generate(task)
        elif tier == "sonnet":
            return await self.sonnet.generate(task)
        else:
            return await self.haiku.generate(task)
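Because the classifier's answer is model-generated text, it may occasionally be malformed or name a tier that does not exist. The following sketch shows one defensive way to read it. `parse_tier` is a hypothetical helper, and falling back to the middle tier ("sonnet") is my own choice for the example, not something the router above prescribes.

```python
import json

VALID_TIERS = ("haiku", "sonnet", "opus")


def parse_tier(raw: str, default: str = "sonnet") -> str:
    """Extract the tier from the classifier's JSON reply, falling back
    to a default when the reply is unparseable or names an unknown tier."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return default
    tier = data.get("tier") if isinstance(data, dict) else None
    if isinstance(tier, str) and tier.strip().lower() in VALID_TIERS:
        return tier.strip().lower()
    return default


print(parse_tier('{"tier": "Opus", "reasoning": "architectural decision"}'))
print(parse_tier("not json at all"))
```

With a helper like this, `classify_task` would always return one of the three known tiers, so the `if`/`elif` chain in `route` never sends an unexpected value to the final `else` branch by accident.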

Cost Comparison

The savings compound quickly:

Example: Research Report Generation

Traditional (Single Opus):
- Cost: $15.00 for 100k input + 20k output tokens
- Time: 45 minutes sequential

Multi-Model Orchestration:
- Opus (planning): 10k tokens = $1.50
- Sonnet (drafting): 30k tokens = $0.90
- Haiku swarm (data gathering): 200k tokens = $2.00
- Opus (synthesis): 15k tokens = $2.25
- Total: $6.65 (56% savings)
- Time: 15 minutes with parallel execution
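The totals above can be checked with a few lines of arithmetic. This snippet uses only the dollar and minute figures quoted in the example; the variable names are mine.

```python
# Figures taken directly from the research-report example above.
single_opus_cost = 15.00
stage_costs = {
    "opus_planning": 1.50,
    "sonnet_drafting": 0.90,
    "haiku_swarm": 2.00,
    "opus_synthesis": 2.25,
}

total = sum(stage_costs.values())
savings = (single_opus_cost - total) / single_opus_cost
time_reduction = (45 - 15) / 45  # minutes, sequential vs. parallel

print(f"Total: ${total:.2f}")                    # Total: $6.65
print(f"Cost savings: {savings:.0%}")            # Cost savings: 56%
print(f"Time reduction: {time_reduction:.0%}")   # Time reduction: 67%
```

Note that the two Opus stages still account for more than half of the orchestrated total ($3.75 of $6.65), so trimming the planning and synthesis prompts is where further savings would come from.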

Common Mistakes

Mistake 1: Opus Doing Grunt Work

# WRONG: Opus classifying reviews
def analyze_sentiment(reviews):
    return opus.batch_classify(reviews, ["positive", "negative"])

# RIGHT: Opus plans, Haiku executes
async def analyze_sentiment(reviews):
    criteria = await opus.define_criteria("Create sentiment classification rules")
    results = await haiku_swarm.classify(reviews, criteria=criteria)
    return results

Mistake 2: Sequential When Parallel Works

# WRONG: Sequential processing
results = []
for doc in documents:
    results.append(haiku.extract(doc))

# RIGHT: Parallel processing (run inside an async function)
results = await asyncio.gather(*[
    haiku.extract(doc) for doc in documents
])
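To see the difference without calling any model, the comparison can be simulated with a fake extractor that just sleeps. This is a rough sketch under that assumption: `fake_extract` and the 50 ms delay are made up for the demonstration, and real timings will depend on the API and rate limits.

```python
import asyncio
import time


async def fake_extract(doc: str) -> str:
    """Stand-in for a model call; the sleep simulates network latency."""
    await asyncio.sleep(0.05)
    return doc.upper()


async def sequential(docs):
    # Each call waits for the previous one to finish
    return [await fake_extract(d) for d in docs]


async def parallel(docs):
    # All calls are started together; results keep the input order
    return await asyncio.gather(*(fake_extract(d) for d in docs))


async def compare(docs):
    start = time.perf_counter()
    seq = await sequential(docs)
    seq_time = time.perf_counter() - start

    start = time.perf_counter()
    par = await parallel(docs)
    par_time = time.perf_counter() - start
    return seq, par, seq_time, par_time


seq, par, seq_time, par_time = asyncio.run(compare(["a", "b", "c", "d", "e"]))
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Both versions return the same results in the same order; only the wall-clock time changes, because the parallel version overlaps the waiting.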

Summary

In this post, I showed how to build multi-model orchestration with Opus, Sonnet, and Haiku. The key point: Opus orchestrates, Sonnet executes, and Haiku handles volume. Build systems that route between all three based on task complexity.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
