Claude Code Architecture: How Anthropic's Agent Balances Simplicity and Power
I assumed Claude Code used a sophisticated multi-agent architecture. Complex tool orchestration, distributed task queues, maybe even a graph-based workflow engine. When I finally understood the actual implementation, I was shocked: it’s basically a while loop.
But that simplicity is exactly why it works. Let me explain how Anthropic built an agent that contributed 4% of GitHub’s public commits in February 2026 with an architecture you could explain in five minutes.
The Core Loop: Deceptively Simple
At its heart, Claude Code is a single-threaded agent that follows this pattern:
```text
┌─────────────────────────────────────────────────────────────┐
│ 1. Receive user request                                     │
│ 2. Build message history (system + user + tool results)     │
│ 3. Call Claude API with tools available                     │
│ 4. If no tool calls → return response                       │
│ 5. If tool calls → execute each, append results to history  │
│ 6. Go to step 3                                             │
└─────────────────────────────────────────────────────────────┘
```

I expected something like LangGraph’s state machine or AutoGen’s multi-agent conversation patterns. Instead, here’s the simplified core:
```python
def claude_code_loop(user_request):
    messages = [system_prompt, user_request]

    while True:
        response = claude.chat(messages=messages, tools=tools)

        if not response.tool_calls:
            return response.content

        # Record the assistant turn so the model sees its own tool requests
        messages.append({"role": "assistant", "content": response.content})

        for tool_call in response.tool_calls:
            result = execute_tool(tool_call)

            # Only append - preserve cache
            messages.append({
                "role": "tool_result",
                "content": result
            })
```

That’s it. No complex state management. No multi-agent coordination. Just a flat message history that grows with each tool interaction.
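To watch the loop run end to end without an API key, here is a minimal sketch that replaces the model with a scripted stand-in. `ScriptedModel`, `Response`, `ToolCall`, `run_loop`, and the `read_file` tool are all hypothetical names made up for this illustration, and the message shapes are simplified rather than a mirror of the real Anthropic API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class Response:
    content: str
    tool_calls: List[ToolCall] = field(default_factory=list)


class ScriptedModel:
    """Hypothetical stand-in for the model: replays canned responses in order."""

    def __init__(self, script: List[Response]):
        self._script = iter(script)

    def chat(self, messages: list, tools: dict) -> Response:
        return next(self._script)


def run_loop(model, tools: Dict[str, Callable[..., str]], user_request: str):
    messages = [{"role": "user", "content": user_request}]
    while True:
        response = model.chat(messages=messages, tools=tools)
        if not response.tool_calls:
            return response.content, messages
        # Record the assistant turn, then append each tool result
        messages.append({"role": "assistant", "content": response.content})
        for call in response.tool_calls:
            result = tools[call.name](**call.arguments)
            messages.append({"role": "tool_result", "content": result})


# Toy tool and a two-step script: one tool call, then a final answer
tools = {"read_file": lambda path: f"<contents of {path}>"}
model = ScriptedModel([
    Response("Let me look at the file.", [ToolCall("read_file", {"path": "main.py"})]),
    Response("The file defines the entry point."),
])

answer, history = run_loop(model, tools, "What does main.py do?")
print(answer)
print([m["role"] for m in history])
```

The history only ever grows at the end: one user message, one assistant turn, one tool result. That append-only property is what the next section relies on.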
Why This Works: The Cache Advantage
I initially thought this was a limitation. But the flat message history has a critical advantage: prompt caching.
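When you use Anthropic’s prompt caching, writing a prefix to the cache costs somewhat more than regular input (a 25% premium for the default 5-minute cache), but reading those cached tokens on subsequent calls costs only about 10% of the base input price. Claude Code maximizes this by:
- Keeping the system prompt at the start (always cached)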
- Only appending new content to the end
- Never restructuring the message list
```python
def build_messages(system_prompt, user_request, tool_results):
    # System prompt is always first for caching
    messages = [{"role": "system", "content": system_prompt}]

    # User request follows
    messages.append({"role": "user", "content": user_request})

    # Tool results are appended, never inserted
    for result in tool_results:
        messages.append({"role": "tool_result", "content": result})

    return messages
```

If Claude Code restructured messages (like some multi-agent frameworks do), it would lose cache hits and cost significantly more.
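A rough back-of-the-envelope calculation shows how large the gap can get. The sketch below is a toy model, not a billing calculator: `input_cost` is a hypothetical helper, the 1.25x write and 0.10x read multipliers are assumed defaults, output tokens are ignored, and the "restructured" case assumes the worst outcome where every call misses the cache completely.

```python
def input_cost(turn_sizes, cache_hit=True, base=1.0, write_mult=1.25, read_mult=0.10):
    """Relative input cost of a conversation, in units of one base-price token.

    turn_sizes[i] is the number of new tokens appended before call i.
    With cache hits, the existing prefix is read at read_mult and only the
    new tokens are written at write_mult. Without them, every call
    reprocesses the whole history at the base price.
    """
    total = 0.0
    prefix = 0
    for new_tokens in turn_sizes:
        if cache_hit:
            total += prefix * base * read_mult + new_tokens * base * write_mult
        else:
            total += (prefix + new_tokens) * base
        prefix += new_tokens
    return total


# A 10,000-token system prompt followed by nine 1,000-token tool results
turns = [10_000] + [1_000] * 9
append_only = input_cost(turns, cache_hit=True)
restructured = input_cost(turns, cache_hit=False)
print(f"append-only:  {append_only:,.0f}")
print(f"restructured: {restructured:,.0f}")
print(f"ratio:        {restructured / append_only:.1f}x")
```

Under these assumptions the append-only history comes out roughly four times cheaper on input tokens, and the gap widens as the conversation gets longer.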
The Evolution: From TODO to Tasks
In January 2026 (v2.1.16), Anthropic added the Tasks system. I assumed this was just a renamed TODO list. I was wrong.
The old TODO approach was:
```python
class TodoList:
    def __init__(self):
        self.items = []  # In-memory only

    def add(self, description):
        self.items.append({"description": description, "done": False})

    def complete(self, index):
        self.items[index]["done"] = True
```

This had three fatal flaws:
- Lost on restart - If Claude Code crashed, the TODO list vanished
- No dependencies - You couldn’t say “Task B depends on Task A”
- Single-agent only - Multiple agents couldn’t share the list
The new Tasks system solves all three:
```python
from dataclasses import dataclass, field
from typing import List, Set
from pathlib import Path
import json


@dataclass
class Task:
    id: str
    description: str
    dependencies: List[str] = field(default_factory=list)
    status: str = "pending"  # pending, in_progress, completed, failed

    def can_start(self, completed_task_ids: Set[str]) -> bool:
        """Check if all dependencies are satisfied."""
        return all(dep in completed_task_ids for dep in self.dependencies)


class TaskManager:
    def __init__(self, tasks_dir: str = "~/.claude/tasks"):
        self.tasks_dir = Path(tasks_dir).expanduser()
        self.tasks_dir.mkdir(parents=True, exist_ok=True)

    def create_task(self, task: Task) -> None:
        """Persist task to disk."""
        task_file = self.tasks_dir / f"{task.id}.json"
        task_file.write_text(json.dumps({
            "id": task.id,
            "description": task.description,
            "dependencies": task.dependencies,
            "status": task.status
        }))

    def load_tasks(self) -> List[Task]:
        """Load all tasks from disk."""
        tasks = []
        for task_file in self.tasks_dir.glob("*.json"):
            data = json.loads(task_file.read_text())
            tasks.append(Task(**data))
        return tasks

    def get_ready_tasks(self) -> List[Task]:
        """Get tasks with all dependencies satisfied."""
        tasks = self.load_tasks()
        completed = {t.id for t in tasks if t.status == "completed"}
        return [t for t in tasks if t.status == "pending" and t.can_start(completed)]
```

The key differences:
| Feature | Old TODO | New Tasks |
|---|---|---|
| Persistence | In-memory | File-based at ~/.claude/tasks/ |
| Dependencies | None | DAG with dependency tracking |
| Crash recovery | Lost | Survives restart |
| Multi-agent | No | Shared with file locking |
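Dependency tracking is only useful if the graph really is acyclic: if Task A waits on Task B and Task B waits on Task A, neither ever becomes ready. Here is a minimal sketch of a check for that, using Python's standard-library `graphlib` (Python 3.9+). The `execution_order` helper and the task ids are hypothetical, and whether Claude Code validates its task graph this way is an assumption on my part.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def execution_order(dependencies: Dict[str, List[str]]) -> List[str]:
    """Return task ids in an order that respects dependencies.

    `dependencies` maps each task id to the ids it depends on.
    Raises ValueError if the graph contains a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the cycle
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


order = execution_order({
    "write-tests": ["implement"],
    "implement": ["design"],
    "design": [],
})
print(order)
```

For this linear chain the order is `design`, then `implement`, then `write-tests`. Passing a graph such as `{"a": ["b"], "b": ["a"]}` raises `ValueError` instead of silently deadlocking.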
Agent Teams: When Simple Isn’t Enough
For most tasks, single-agent Claude Code is sufficient. But sometimes you need parallel work. That’s where Agent Teams comes in.
I tried using Agent Teams for a large refactoring project. Here’s what I learned:
```text
┌─────────────────────────────────────────────────────────────┐
│                      Coordinator Agent                      │
│                 (main Claude Code instance)                 │
└─────────────────────────────────────────────────────────────┘
                               │
            ┌──────────────────┼──────────────────┐
            │                  │                  │
            ▼                  ▼                  ▼
     ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
     │ Teammate 1  │    │ Teammate 2  │    │ Teammate 3  │
     │ (independent│    │ (independent│    │ (independent│
     │  context)   │    │  context)   │    │  context)   │
     └─────────────┘    └─────────────┘    └─────────────┘
            │                  │                  │
            └──────────────────┼──────────────────┘
                               │
                               ▼
                    ┌─────────────────────┐
                    │    Shared Tasks     │
                    │    (file-based)     │
                    └─────────────────────┘
```

Each Teammate is an independent Claude Code instance:
- Own context - Loads project CLAUDE.md independently
- Own Skills - Can use specialized skills
- Shared tasks - Coordinates via file-based task list
- Mailbox system - Asynchronous message passing
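When several independent processes coordinate through a shared directory, the tricky part is making sure two of them never pick up the same task. One common way to get that guarantee is an exclusive-create lock file. The sketch below shows the idea under stated assumptions: `try_claim`, `release`, and the `.lock` naming are hypothetical, and this is not a description of how Agent Teams actually locks its task files.

```python
import os
import tempfile
from pathlib import Path


def try_claim(tasks_dir: Path, task_id: str, agent_id: str) -> bool:
    """Atomically claim a task by creating `<task_id>.lock`.

    O_CREAT | O_EXCL makes creation fail if the file already exists,
    so only one agent can win the claim.
    """
    lock_path = tasks_dir / f"{task_id}.lock"
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(agent_id)  # record the owner for debugging
    return True


def release(tasks_dir: Path, task_id: str) -> None:
    """Release a claim so another agent can pick the task up."""
    (tasks_dir / f"{task_id}.lock").unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    tasks_dir = Path(tmp)
    first = try_claim(tasks_dir, "task-1", "teammate-1")
    second = try_claim(tasks_dir, "task-1", "teammate-2")
    release(tasks_dir, "task-1")
    third = try_claim(tasks_dir, "task-1", "teammate-2")
    print(first, second, third)
```

The first claim succeeds, the competing claim is rejected, and the task becomes claimable again once the lock is released. A real system would also need to handle stale locks left behind by a crashed agent, which this sketch ignores.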
But there’s a catch. The experimental status warning in the docs is real:
- Token cost: ~5x compared to single agent
- Context duplication: Each agent maintains full context
- Coordination overhead: Messages between agents add tokens
- Not production-ready: Still experimental in v2.1.71

I used Agent Teams to parallelize a documentation update across 50 files. It worked, but my token bill was 4.2x higher than expected. For most tasks, the single-agent approach is more cost-effective.
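The multiplier follows almost directly from context duplication. As a deliberately crude sketch, assume the work splits evenly, every agent loads the full shared context once, and each agent spends a fixed number of tokens on coordination messages. The `team_token_multiplier` function and all the numbers below are made up for illustration; real usage also depends on how many turns each agent takes and how well caching works.

```python
def team_token_multiplier(n_agents, context_tokens, work_tokens, message_tokens=0):
    """Toy estimate of team token usage relative to a single agent.

    Assumes the work splits evenly, every agent loads the full shared
    context, and coordination adds `message_tokens` per agent.
    """
    single = context_tokens + work_tokens
    team = n_agents * (context_tokens + work_tokens / n_agents + message_tokens)
    return team / single


# Example: 4 agents, 40k tokens of shared context, 20k tokens of actual work,
# and 5k tokens of coordination messages per agent
multiplier = team_token_multiplier(4, 40_000, 20_000, 5_000)
print(f"{multiplier:.2f}x")
```

The pattern to notice is that the shared context is paid once per agent while the useful work is only divided, so context-heavy, work-light jobs are the worst fit for a team.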
The Extreme Test: 16 Agents Building a Compiler
The most impressive demonstration of Agent Teams was when Anthropic engineers used 16 parallel agents to build a 100K-line C compiler, written in Rust, that could compile the Linux 6.9 kernel.
How they made it work:
```python
from dataclasses import dataclass
from typing import List, Dict
from enum import Enum


class AgentRole(Enum):
    LEXER = "lexer"
    PARSER = "parser"
    CODEGEN = "codegen"
    OPTIMIZER = "optimizer"
    TESTER = "tester"


@dataclass
class AgentAssignment:
    agent_id: str
    role: AgentRole
    files_assigned: List[str]
    dependencies: List[str]  # Other agents this one depends on


def coordinate_compiler_build(agents: List[AgentAssignment]) -> None:
    """
    Coordinate 16 agents to build a C compiler.

    Key strategies:
    1. Clear module boundaries (lexer, parser, etc.)
    2. Interface contracts defined upfront
    3. Shared test suite for integration
    4. File locking for concurrent writes
    """
    for agent in agents:
        # Each agent works on its module
        # Results shared via file-based tasks
        pass
```

This demonstrates when multi-agent makes sense: highly parallelizable work with clear boundaries.
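"Clear boundaries" can be made concrete by asking which modules are safe to work on at the same time. The sketch below groups a dependency graph into waves using the standard-library `graphlib`, where everything in one wave has all of its prerequisites finished. The `parallel_waves` helper and the module graph are hypothetical and only loosely modeled on the roles above; nothing here reflects how the actual compiler project was scheduled.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def parallel_waves(depends_on: Dict[str, List[str]]) -> List[List[str]]:
    """Group modules into waves; modules in one wave can be built concurrently."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical module graph for a compiler project
waves = parallel_waves({
    "lexer": [],
    "parser": ["lexer"],
    "codegen": ["parser"],
    "optimizer": ["parser"],
    "tester": ["codegen", "optimizer"],
})
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

With this graph only the third wave (`codegen` and `optimizer`) offers any parallelism. A strictly sequential pipeline gains little from extra agents, which is why interface contracts defined upfront matter: they let downstream modules start against the contract instead of waiting for the finished upstream code.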
Design Philosophy: Start Simple, Add Complexity Only When Needed
What struck me most about Claude Code’s architecture is the restraint. Anthropic could have built:
- A graph-based workflow engine
- A distributed multi-agent framework from day one
- Complex state machines for tool orchestration
Instead, they followed a principle: “Use the simplest architecture that works.”
```text
v1.0    → Core loop (while + tool calls)
   │
   ▼ (months of iteration)
v2.1.16 → Tasks system (persistence + dependencies)
   │
   ▼ (when parallelism needed)
v2.1.71 → Agent Teams (experimental multi-agent)
```

Each evolution added only what was necessary:
| Phase | Problem Solved | Complexity Added |
|---|---|---|
| Core loop | Basic agent functionality | Minimal |
| Tasks | Persistence, dependencies, recovery | File I/O |
| Agent Teams | Parallel work on large tasks | 5x token cost |
Common Misconceptions
I held several wrong beliefs about Claude Code before digging deeper:
Misconception 1: “Claude Code uses complex multi-agent by default”
No. Single-threaded execution is the default. Agent Teams is opt-in and experimental.
Misconception 2: “Tasks are just renamed TODOs”
No. Tasks have DAG dependencies, file persistence, and cross-agent sharing. TODOs were in-memory lists.
Misconception 3: “Agent Teams is production-ready”
No. The docs explicitly mark it experimental with 5x cost overhead.
Misconception 4: “The architecture is hidden/proprietary”
No. Anthropic has been transparent about the design. It’s a simple loop with progressive enhancement.
Why This Matters for Your Own Agents
If you’re building your own AI agents, Claude Code’s architecture offers a blueprint:
```python
class SimpleAgent:
    """Minimal agent that works."""

    def __init__(self, model, tools):
        self.model = model
        self.tools = tools  # dict mapping tool name -> callable
        self.messages = []

    def run(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})

        while True:
            response = self.model.chat(
                messages=self.messages,
                tools=self.tools
            )

            if not response.tool_calls:
                return response.content

            # Keep the assistant turn so the model sees its own tool requests
            self.messages.append({"role": "assistant", "content": response.content})

            for tool_call in response.tool_calls:
                result = self.execute(tool_call)
                self.messages.append({
                    "role": "tool_result",
                    "content": result
                })

    def execute(self, tool_call) -> str:
        tool = self.tools.get(tool_call.name)
        if tool is None:
            # Report the problem to the model instead of crashing the loop
            return f"Unknown tool: {tool_call.name}"
        return tool(**tool_call.arguments)
```

Add complexity only when you hit real limits:
- Need persistence? Add file-based task storage
- Need dependencies? Add DAG tracking
- Need parallelism? Add multi-agent coordination
- Need memory? Add retrieval or summarization
But start with the simple loop. It handles 90% of use cases.
Summary
In this post, I explained Claude Code’s architecture and why its simplicity is a feature, not a limitation. The key point is that Anthropic deliberately chose a simple single-threaded loop, evolving it with persistent Tasks and optional Agent Teams only when needed. The core loop with flat message history maximizes prompt caching. Tasks add persistence and dependencies without breaking simplicity. Agent Teams provide parallelism for extreme cases but at 5x cost. The lesson: start simple, add complexity only when you hit real limits.
Final Words + More Resources
My intention with this article was to share my knowledge and experience in a way that helps others. If you want to get in touch, you can contact me by email.