Skip to content

Claude Code Architecture: How Anthropic's Agent Balances Simplicity and Power

I assumed Claude Code used a sophisticated multi-agent architecture. Complex tool orchestration, distributed task queues, maybe even a graph-based workflow engine. When I finally understood the actual implementation, I was shocked: it’s basically a while loop.

But that simplicity is exactly why it works. Let me explain how Anthropic built an agent that contributed 4% of GitHub’s public commits in February 2026 with an architecture you could explain in five minutes.

The Core Loop: Deceptively Simple

At its heart, Claude Code is a single-threaded agent that follows this pattern:

Claude Code Core Loop
┌─────────────────────────────────────────────────────────────┐
│ 1. Receive user request │
│ 2. Build message history (system + user + tool results) │
│ 3. Call Claude API with tools available │
│ 4. If no tool calls → return response │
│ 5. If tool calls → execute each, append results to history │
│ 6. Go to step 3 │
└─────────────────────────────────────────────────────────────┘

I expected something like LangGraph’s state machine or AutoGen’s multi-agent conversation patterns. Instead, here’s the simplified core:

claude_code_loop.py
def claude_code_loop(user_request):
messages = [system_prompt, user_request]
while True:
response = claude.chat(messages=messages, tools=tools)
if not response.tool_calls:
return response.content
for tool_call in response.tool_calls:
result = execute_tool(tool_call)
# Only append - preserve cache
messages.append({
"role": "tool_result",
"content": result
})

That’s it. No complex state management. No multi-agent coordination. Just a flat message history that grows with each tool interaction.

Why This Works: The Cache Advantage

I initially thought this was a limitation. But the flat message history has a critical advantage: prompt caching.

When you use Anthropic’s prompt caching, you pay full price for the first request but only 10% for cached tokens on subsequent calls. Claude Code maximizes this by:

  1. Keeping system prompt at the start (always cached)
  2. Only appending new content to the end
  3. Never restructuring the message list
cache_optimization.py
def build_messages(system_prompt, user_request, tool_results):
# System prompt is always first for caching
messages = [{"role": "system", "content": system_prompt}]
# User request follows
messages.append({"role": "user", "content": user_request})
# Tool results are appended, never inserted
for result in tool_results:
messages.append({"role": "tool_result", "content": result})
return messages

If Claude Code restructured messages (like some multi-agent frameworks do), it would lose cache hits and cost significantly more.

The Evolution: From TODO to Tasks

In January 2026 (v2.1.16), Anthropic added the Tasks system. I assumed this was just a renamed TODO list. I was wrong.

The old TODO approach was:

old_todo.py
class TodoList:
def __init__(self):
self.items = [] # In-memory only
def add(self, description):
self.items.append({"description": description, "done": False})
def complete(self, index):
self.items[index]["done"] = True

This had three fatal flaws:

  1. Lost on restart - If Claude Code crashed, the TODO list vanished
  2. No dependencies - You couldn’t say “Task B depends on Task A”
  3. Single-agent only - Multiple agents couldn’t share the list

The new Tasks system solves all three:

tasks_system.py
from dataclasses import dataclass, field
from typing import List, Set
from pathlib import Path
import json
@dataclass
class Task:
id: str
description: str
dependencies: List[str] = field(default_factory=list)
status: str = "pending" # pending, in_progress, completed, failed
def can_start(self, completed_task_ids: Set[str]) -> bool:
"""Check if all dependencies are satisfied."""
return all(dep in completed_task_ids for dep in self.dependencies)
class TaskManager:
def __init__(self, tasks_dir: str = "~/.claude/tasks"):
self.tasks_dir = Path(tasks_dir).expanduser()
self.tasks_dir.mkdir(parents=True, exist_ok=True)
def create_task(self, task: Task) -> None:
"""Persist task to disk."""
task_file = self.tasks_dir / f"{task.id}.json"
task_file.write_text(json.dumps({
"id": task.id,
"description": task.description,
"dependencies": task.dependencies,
"status": task.status
}))
def load_tasks(self) -> List[Task]:
"""Load all tasks from disk."""
tasks = []
for task_file in self.tasks_dir.glob("*.json"):
data = json.loads(task_file.read_text())
tasks.append(Task(**data))
return tasks
def get_ready_tasks(self) -> List[Task]:
"""Get tasks with all dependencies satisfied."""
tasks = self.load_tasks()
completed = {t.id for t in tasks if t.status == "completed"}
return [t for t in tasks if t.status == "pending" and t.can_start(completed)]

The key differences:

FeatureOld TODONew Tasks
PersistenceIn-memoryFile-based at ~/.claude/tasks/
DependenciesNoneDAG with dependency tracking
Crash recoveryLostSurvives restart
Multi-agentNoShared with file locking

Agent Teams: When Simple Isn’t Enough

For most tasks, single-agent Claude Code is sufficient. But sometimes you need parallel work. That’s where Agent Teams comes in.

I tried using Agent Teams for a large refactoring project. Here’s what I learned:

Agent Teams Architecture
┌─────────────────────────────────────────────────────────────┐
│ Coordinator Agent │
│ (main Claude Code instance) │
└─────────────────────────────────────────────────────────────┘
┌──────────────────┼──────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Teammate 1 │ │ Teammate 2 │ │ Teammate 3 │
│ (independent│ │ (independent│ │ (independent│
│ context) │ │ context) │ │ context) │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
└──────────────────┼──────────────────┘
┌─────────────────────┐
│ Shared Tasks │
│ (file-based) │
└─────────────────────┘

Each Teammate is an independent Claude Code instance:

  1. Own context - Loads project CLAUDE.md independently
  2. Own Skills - Can use specialized skills
  3. Shared tasks - Coordinates via file-based task list
  4. Mailbox system - Asynchronous message passing

But there’s a catch. The experimental status warning in the docs is real:

Agent Teams Cost Warning
- Token cost: ~5x compared to single agent
- Context duplication: Each agent maintains full context
- Coordination overhead: Messages between agents add tokens
- Not production-ready: Still experimental in v2.1.71

I used Agent Teams to parallelize a documentation update across 50 files. It worked, but my token bill was 4.2x higher than expected. For most tasks, the single-agent approach is more cost-effective.

The Extreme Test: 16 Agents Building a Compiler

The most impressive demonstration of Agent Teams was when Anthropic engineers used 16 parallel agents to build a 100K-line Rust C compiler that could compile the Linux 6.9 kernel.

How they made it work:

multi_agent_coordination.py
from dataclasses import dataclass
from typing import List, Dict
from enum import Enum
class AgentRole(Enum):
LEXER = "lexer"
PARSER = "parser"
CODEGEN = "codegen"
OPTIMIZER = "optimizer"
TESTER = "tester"
@dataclass
class AgentAssignment:
agent_id: str
role: AgentRole
files_assigned: List[str]
dependencies: List[str] # Other agents this one depends on
def coordinate_compiler_build(agents: List[AgentAssignment]) -> None:
"""
Coordinate 16 agents to build a C compiler.
Key strategies:
1. Clear module boundaries (lexer, parser, etc.)
2. Interface contracts defined upfront
3. Shared test suite for integration
4. File locking for concurrent writes
"""
for agent in agents:
# Each agent works on its module
# Results shared via file-based tasks
pass

This demonstrates when multi-agent makes sense: highly parallelizable work with clear boundaries.

Design Philosophy: Start Simple, Add Complexity Only When Needed

What struck me most about Claude Code’s architecture is the restraint. Anthropic could have built:

  • A graph-based workflow engine
  • A distributed multi-agent framework from day one
  • Complex state machines for tool orchestration

Instead, they followed a principle: “Use the simplest architecture that works.”

Claude Code Evolution Path
v1.0 → Core loop (while + tool calls)
▼ (months of iteration)
v2.1.16 → Tasks system (persistence + dependencies)
▼ (when parallelism needed)
v2.1.71 → Agent Teams (experimental multi-agent)

Each evolution added only what was necessary:

PhaseProblem SolvedComplexity Added
Core loopBasic agent functionalityMinimal
TasksPersistence, dependencies, recoveryFile I/O
Agent TeamsParallel work on large tasks5x token cost

Common Misconceptions

I held several wrong beliefs about Claude Code before digging deeper:

Misconception 1: “Claude Code uses complex multi-agent by default”

No. Single-threaded execution is the default. Agent Teams is opt-in and experimental.

Misconception 2: “Tasks are just renamed TODOs”

No. Tasks have DAG dependencies, file persistence, and cross-agent sharing. TODOs were in-memory lists.

Misconception 3: “Agent Teams is production-ready”

No. The docs explicitly mark it experimental with 5x cost overhead.

Misconception 4: “The architecture is hidden/proprietary”

No. Anthropic has been transparent about the design. It’s a simple loop with progressive enhancement.

Why This Matters for Your Own Agents

If you’re building your own AI agents, Claude Code’s architecture offers a blueprint:

your_agent_blueprint.py
class SimpleAgent:
"""Minimal agent that works."""
def __init__(self, model, tools):
self.model = model
self.tools = tools
self.messages = []
def run(self, user_input: str) -> str:
self.messages.append({"role": "user", "content": user_input})
while True:
response = self.model.chat(
messages=self.messages,
tools=self.tools
)
if not response.tool_calls:
return response.content
for tool_call in response.tool_calls:
result = self.execute(tool_call)
self.messages.append({
"role": "tool_result",
"content": result
})
def execute(self, tool_call) -> str:
tool = self.tools.get(tool_call.name)
return tool(**tool_call.arguments)

Add complexity only when you hit real limits:

  • Need persistence? Add file-based task storage
  • Need dependencies? Add DAG tracking
  • Need parallelism? Add multi-agent coordination
  • Need memory? Add retrieval or summarization

But start with the simple loop. It handles 90% of use cases.

Summary

In this post, I explained Claude Code’s architecture and why its simplicity is a feature, not a limitation. The key point is that Anthropic deliberately chose a simple single-threaded loop, evolving it with persistent Tasks and optional Agent Teams only when needed. The core loop with flat message history maximizes prompt caching. Tasks add persistence and dependencies without breaking simplicity. Agent Teams provide parallelism for extreme cases but at 5x cost. The lesson: start simple, add complexity only when you hit real limits.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments