What Is Sakana AI Fugu? Understanding LLM Orchestration vs Foundation Models
Problem
When Sakana AI announced Fugu on June 22, the AI community’s default frame was “another foundation model benchmark battle.” People compared its scores directly against Claude Opus, GPT-5.5, and Gemini, asking which model was “better.”
The problem? Fugu is not a foundation model. Comparing it like one misses the entire point of what it is — and leads to bad decisions about when to use it.
What Is Fugu?
Sakana AI’s Fugu is an LLM-based orchestrator, not a new foundation model. It exposes a single OpenAI-compatible API endpoint, but internally it routes each request across a pool of public LLMs, decides when to call itself recursively, verifies intermediate results, and synthesizes the final response.
I think the correct mental model is “a learned router/coordinator productized as an API,” not “Sakana’s GPT competitor.”
The Architecture
Fugu’s design builds on two ICLR 2026 papers — TRINITY for routing and Conductor for verification. The flow looks like this:
User Query | v[Fugu Orchestrator LLM] -- plans task | | | v v v[Claude] [GPT] [Gemini] -- execute subtasks | | | +-------+-------+ | v[Fugu Synthesizer] -- verifies & merges | v ResponseLayer 1: The Orchestrator
The orchestrator LLM receives the user query and decides how to decompose it. It does not execute the work itself — it creates a plan, assigns subtasks to specific models in the pool, and can call itself recursively when a subtask requires deeper planning.
Layer 2: The Execution Pool
Fugu routes subtasks to models like Claude Opus, GPT-5.5, Gemini, and others. The orchestrator selects which model is best suited for each subtask based on the instruction, context size, and capability profile.
Layer 3: The Synthesizer
Once all subtask results come back, Conductor verifies consistency, detects conflicts, and merges outputs into a coherent response. This is the verification layer that distinguishes Fugu from simple round-robin or random routing.
Why This Changes the Economics
Orchestration-as-a-product shifts the AI value proposition from “build a better model” to “coordinate existing models better.”
- No single model leads every benchmark — routing lets you pick the best model per subtask.
- Export control circumvention — the “sovereignty via routing” pitch: if a region blocks access to one model, the orchestrator routes around it.
- Pool diversity — Fugu can include models from multiple providers, reducing vendor lock-in.
Common Mistakes
Comparing Benchmark Numbers Directly
Fugu’s SWE-Bench Pro score of 73.7 includes orchestration overhead (planning tokens, verification calls, multi-model synthesis). Comparing that to Claude Opus 4.8’s 69.2 is apples-to-oranges. The orchestrator consumes extra tokens simply to coordinate, so the gap is smaller than the raw numbers suggest.
Assuming the Pool Includes Everyone
Fugu’s pool notably excludes Fable 5 (the actual SWE-Bench Pro leader at press time). So claiming “Fugu is #1 on SWE-Bench Pro” is misleading — it’s #1 among models Sakana chose to include.
Calling It a Foundation Model
This is the biggest misconception. Fugu is a routing and coordination layer. It does not have its own pre-trained weights for general knowledge. It delegates that to the execution models.
Summary
In this post, I explained what Sakana AI Fugu actually is — an LLM orchestrator, not a foundation model. The key point is that orchestration changes the AI conversation from “who has the best single model” to “how well can you coordinate the models you have.” For developers evaluating Fugu, the real question is whether orchestration overhead pays off for their specific task complexity, not where it ranks on benchmark charts.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 Reddit Discussion: Sakana Fugu clarification
- 👨💻 Sakana AI Fugu Announcement
- 👨💻 ICLR 2026 — TRINITY: Task Routing via Instruction-aware Neural Inference
- 👨💻 ICLR 2026 — Conductor: Verification-driven Multi-LLM Coordination
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments