Skip to content

What Is Sakana AI Fugu? Understanding LLM Orchestration vs Foundation Models

Problem

When Sakana AI announced Fugu on June 22, the AI community’s default frame was “another foundation model benchmark battle.” People compared its scores directly against Claude Opus, GPT-5.5, and Gemini, asking which model was “better.”

The problem? Fugu is not a foundation model. Comparing it like one misses the entire point of what it is — and leads to bad decisions about when to use it.

What Is Fugu?

Sakana AI’s Fugu is an LLM-based orchestrator, not a new foundation model. It exposes a single OpenAI-compatible API endpoint, but internally it routes each request across a pool of public LLMs, decides when to call itself recursively, verifies intermediate results, and synthesizes the final response.

I think the correct mental model is “a learned router/coordinator productized as an API,” not “Sakana’s GPT competitor.”

The Architecture

Fugu’s design builds on two ICLR 2026 papers — TRINITY for routing and Conductor for verification. The flow looks like this:

Fugu orchestration flow
User Query
|
v
[Fugu Orchestrator LLM] -- plans task
| | |
v v v
[Claude] [GPT] [Gemini] -- execute subtasks
| | |
+-------+-------+
|
v
[Fugu Synthesizer] -- verifies & merges
|
v
Response

Layer 1: The Orchestrator

The orchestrator LLM receives the user query and decides how to decompose it. It does not execute the work itself — it creates a plan, assigns subtasks to specific models in the pool, and can call itself recursively when a subtask requires deeper planning.

Layer 2: The Execution Pool

Fugu routes subtasks to models like Claude Opus, GPT-5.5, Gemini, and others. The orchestrator selects which model is best suited for each subtask based on the instruction, context size, and capability profile.

Layer 3: The Synthesizer

Once all subtask results come back, Conductor verifies consistency, detects conflicts, and merges outputs into a coherent response. This is the verification layer that distinguishes Fugu from simple round-robin or random routing.

Why This Changes the Economics

Orchestration-as-a-product shifts the AI value proposition from “build a better model” to “coordinate existing models better.”

  • No single model leads every benchmark — routing lets you pick the best model per subtask.
  • Export control circumvention — the “sovereignty via routing” pitch: if a region blocks access to one model, the orchestrator routes around it.
  • Pool diversity — Fugu can include models from multiple providers, reducing vendor lock-in.

Common Mistakes

Comparing Benchmark Numbers Directly

Fugu’s SWE-Bench Pro score of 73.7 includes orchestration overhead (planning tokens, verification calls, multi-model synthesis). Comparing that to Claude Opus 4.8’s 69.2 is apples-to-oranges. The orchestrator consumes extra tokens simply to coordinate, so the gap is smaller than the raw numbers suggest.

Assuming the Pool Includes Everyone

Fugu’s pool notably excludes Fable 5 (the actual SWE-Bench Pro leader at press time). So claiming “Fugu is #1 on SWE-Bench Pro” is misleading — it’s #1 among models Sakana chose to include.

Calling It a Foundation Model

This is the biggest misconception. Fugu is a routing and coordination layer. It does not have its own pre-trained weights for general knowledge. It delegates that to the execution models.

Summary

In this post, I explained what Sakana AI Fugu actually is — an LLM orchestrator, not a foundation model. The key point is that orchestration changes the AI conversation from “who has the best single model” to “how well can you coordinate the models you have.” For developers evaluating Fugu, the real question is whether orchestration overhead pays off for their specific task complexity, not where it ranks on benchmark charts.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments