How Deep Agents Uses Planning to Handle Complex Multi-Step Tasks

Mar 20, 2026

I built my first AI agent last year. It could call tools, process responses, and loop back for more. It worked beautifully for simple tasks like “search for X” or “calculate Y.” Then I gave it a real task: “Research LangGraph features, write a summary, and save it to a file.”

It failed. The agent searched, started writing, forgot what it was doing, and returned an incomplete result without saving anything.

The problem? No planning. My agent was stuck in a reactive loop - it couldn’t break down tasks, track progress, or recover when things went sideways.

This is exactly what Deep Agents solves with its built-in planning capabilities.

The Problem with Simple Agents

Most agent tutorials teach you this pattern:

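from langchain.agents import initialize_agent

# Legacy LangChain API; `tools` and `llm` are assumed to be defined elsewhere
agent = initialize_agent(tools, llm, agent="zero-shot-react-description")
result = agent.run("Do something complex")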

This works for single-step tasks. But complex workflows break down:

User: Which employee generated the most revenue and from which countries?

Agent: Let me search for employees...
[Agent searches]
Agent: I found employee data...
[Agent forgets the full question]
Agent: Here's the employee list: Jane, Bob, Alice

Result: Incomplete - never joined with revenue data

The agent jumped straight to execution without planning. It couldn’t:

Break down the multi-step task
Track which steps were done
Adjust when it realized it needed more data
Complete all requirements before responding

What Planning Looks Like

Deep Agents includes a write_todos tool powered by TodoListMiddleware. When you give it a complex task, here’s what happens:

Question: Which employee generated the most revenue and from which countries?

[Planning Step]
Using write_todos:
- [ ] List tables in database
- [ ] Examine Employee and Invoice schemas
- [ ] Plan multi-table JOIN query
- [ ] Execute and aggregate by employee and country
- [ ] Format results

[Execution Steps]
1. Listing tables...
2. Getting schema for: Employee, Invoice, InvoiceLine, Customer
3. Generating SQL query...
4. Executing query...
5. Formatting results...

[Final Answer]
Employee Jane Peacock (ID: 3) generated the most revenue...
Top countries: USA ($1000), Canada ($500)...

The agent plans first, then executes. Each step is tracked. If something fails, the agent knows exactly where it was and what’s left to do.
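
To make the "plan first, then execute" idea concrete, here is a toy sketch in plain Python. It is not Deep Agents code: `run_plan`, `remaining`, and the step names are hypothetical, and a real agent picks its steps through the model rather than a fixed dictionary of callables.

```python
from typing import Callable


def run_plan(steps: list[str], actions: dict[str, Callable[[], None]]) -> list[dict]:
    """Run steps in order, recording a status for each one.

    Stops at the first failing step and leaves it marked in_progress,
    so the returned list shows where execution stopped.
    """
    todos = [{"content": step, "status": "pending"} for step in steps]
    for todo in todos:
        todo["status"] = "in_progress"
        try:
            actions[todo["content"]]()
        except Exception:
            break  # hypothetical policy: stop and report instead of retrying
        todo["status"] = "completed"
    return todos


def remaining(todos: list[dict]) -> list[str]:
    """Return the steps that are not completed yet."""
    return [t["content"] for t in todos if t["status"] != "completed"]


def fail() -> None:
    raise RuntimeError("query failed")


plan = ["list tables", "run query", "format results"]
todos = run_plan(
    plan,
    {"list tables": lambda: None, "run query": fail, "format results": lambda: None},
)
print(remaining(todos))
```

Because the failed step stays `in_progress` and later steps stay `pending`, the list itself records where the run stopped and what is left.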

How write_todos Works

The write_todos tool is implemented through TodoListMiddleware. Looking at the Deep Agents source:

from langchain.agents.middleware import TodoListMiddleware

def create_deep_agent():
    # Planning middleware is applied first, before other tools
    deepagent_middleware: list[AgentMiddleware] = [
        TodoListMiddleware(),  # Planning comes first
        FilesystemMiddleware(backend=backend),
        SubAgentMiddleware(...),
        # ... other middleware
    ]

The middleware injects a write_todos tool into the agent’s available tools. The agent can call this tool to:

Create a plan - Break down a task into ordered steps
Update status - Mark steps as pending, in_progress, or completed
Reorder tasks - Adjust the plan as understanding evolves
Add/remove tasks - Handle new requirements or dead ends

Each todo item has three required fields:

{
    "content": "List tables in database",  # What to do
    "status": "in_progress",               # pending | in_progress | completed
    "activeForm": "Listing tables in database"  # Present tense for progress display
}
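
If you want to sanity-check todo items in your own tooling, a typed sketch like the one below can help. The field names are taken from the listing above. `TodoItem` and `validate_todo` are hypothetical helpers of mine, and the type the library actually uses may differ.

```python
from typing import Literal, TypedDict, get_args

Status = Literal["pending", "in_progress", "completed"]
REQUIRED_FIELDS = ("content", "status", "activeForm")


class TodoItem(TypedDict):
    """Hypothetical shape mirroring the fields listed in this article."""
    content: str
    status: Status
    activeForm: str


def validate_todo(item: dict) -> list[str]:
    """Return a list of problems; an empty list means the item looks valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = item.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    status = item.get("status")
    # Only report an invalid status when one was actually supplied.
    if isinstance(status, str) and status.strip() and status not in get_args(Status):
        problems.append(f"invalid status: {status!r}")
    return problems


good: TodoItem = {
    "content": "List tables in database",
    "status": "in_progress",
    "activeForm": "Listing tables in database",
}
print(validate_todo(good))
print(validate_todo({"content": "x", "status": "done", "activeForm": "Doing x"}))
```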

A Real Example: Text-to-SQL Agent

The text-to-sql-agent example in the Deep Agents repo shows planning in action. I ran it against the Chinook sample database:

python agent.py "Which employee generated the most revenue and from which countries?"

The agent’s internal process looked like this:

[1] write_todos: Create plan
    - List tables in database [pending]
    - Examine Employee and Invoice schemas [pending]
    - Plan multi-table JOIN query [pending]
    - Execute and aggregate by employee and country [pending]
    - Format results [pending]

[2] list_tables: ["Employee", "Customer", "Invoice", "InvoiceLine", ...]

[3] write_todos: Update plan
    - List tables in database [completed]
    - Examine Employee and Invoice schemas [in_progress]
    ...

[4] get_schema: Employee(id, name, title...), Invoice(id, customerId, total...)

[5] write_todos: Update plan
    - Examine Employee and Invoice schemas [completed]
    - Plan multi-table JOIN query [in_progress]
    ...

[6] execute_query: SELECT e.Name, c.Country, SUM(i.Total)...

The agent didn’t just react - it planned, executed, tracked, and adjusted. When it realized it needed the Customer table for country data, it updated its plan.
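
The status transitions in that trace follow a simple rule: finish the current step, then start the next pending one. Here is a small sketch of that rule. The `advance` and `render` helpers are hypothetical and exist only for illustration. In the real agent, the model decides what to send to write_todos.

```python
def advance(todos: list[dict]) -> list[dict]:
    """Return a new plan with the current step completed and the next one started."""
    updated = [dict(t) for t in todos]  # copy so the previous plan is untouched
    for todo in updated:
        if todo["status"] == "in_progress":
            todo["status"] = "completed"
            break
    for todo in updated:
        if todo["status"] == "pending":
            todo["status"] = "in_progress"
            break
    return updated


def render(todos: list[dict]) -> str:
    """Format the plan in the same style as the trace above."""
    return "\n".join(f"- {t['content']} [{t['status']}]" for t in todos)


plan = [
    {"content": "List tables in database", "status": "pending"},
    {"content": "Examine Employee and Invoice schemas", "status": "pending"},
    {"content": "Plan multi-table JOIN query", "status": "pending"},
]
step_one = advance(plan)      # first step becomes in_progress
step_two = advance(step_one)  # first step completed, second in_progress
print(render(step_two))
```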

The Planning Middleware Stack

What I found interesting is how TodoListMiddleware is positioned in the stack:

┌─────────────────────────────────────────────────────────────┐
│                    Main Agent                                │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────────────────────────────────────────┐   │
│  │ TodoListMiddleware (planning)                        │   │
│  │ - write_todos tool                                   │   │
│  │ - State management for todo list                     │   │
│  └─────────────────────────────────────────────────────┘   │
│                           ↓                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ SkillsMiddleware (on-demand expertise)               │   │
│  └─────────────────────────────────────────────────────┘   │
│                           ↓                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ FilesystemMiddleware (file operations)               │   │
│  └─────────────────────────────────────────────────────┘   │
│                           ↓                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ SubAgentMiddleware (delegation)                      │   │
│  └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Planning comes first. This isn’t accidental - the agent needs to plan before it can decide which tools to use or which sub-agents to spawn.
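
One way to picture the stack is as an ordered list in which each layer contributes tools. The sketch below is a toy model under my own assumptions: `ToyMiddleware` and `collect_tools` are hypothetical, the tool names other than write_todos are placeholders, and it says nothing about how the real library wires up its hooks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMiddleware:
    """Hypothetical stand-in for a middleware that contributes tools."""
    name: str
    tools: tuple[str, ...] = ()


def collect_tools(stack: list[ToyMiddleware]) -> list[str]:
    """Gather tool names in stack order, skipping duplicates."""
    seen: list[str] = []
    for middleware in stack:
        for tool in middleware.tools:
            if tool not in seen:
                seen.append(tool)
    return seen


stack = [
    ToyMiddleware("todo_list", ("write_todos",)),
    ToyMiddleware("skills"),  # contributes no tools in this toy model
    ToyMiddleware("filesystem", ("read_file", "write_file")),  # placeholder names
    ToyMiddleware("subagents", ("task",)),  # placeholder name
]
tools = collect_tools(stack)
print(tools)
```

In this toy model, stack position only controls the order in which tools are registered. Which tool gets called first is still up to the model and its prompt.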

Sub-Agents Also Plan

Here’s a detail I initially missed: sub-agents get their own TodoListMiddleware. From the source:

# Build general-purpose subagent with default middleware stack
gp_middleware: list[AgentMiddleware] = [
    TodoListMiddleware(),  # Sub-agents plan too!
    FilesystemMiddleware(backend=backend),
    create_summarization_middleware(model, backend),
    PatchToolCallsMiddleware(),
]

This means when the main agent delegates work to a sub-agent, that sub-agent can also plan and track its own progress. The Deep Research example demonstrates this:

Main Agent receives: "Compare React and Vue performance"

[Main Agent Plan]
1. Save request for reference
2. Plan research with TODOs
3. Delegate to sub-agents
4. Synthesize findings
5. Respond with comparison

[Sub-Agent A: React Research]
1. Search React performance docs
2. Extract benchmarks
3. Compile findings

[Sub-Agent B: Vue Research]
1. Search Vue performance docs
2. Extract benchmarks
3. Compile findings

Each sub-agent maintains its own todo list, isolated from the main agent’s context.
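
The isolation is easy to picture with a toy model. In this sketch, `ToyAgent` and its methods are hypothetical and nothing here calls the real library. The point is only that the parent receives a summary, never the child's todo list.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical agent that owns its own todo list."""
    name: str
    todos: list[dict] = field(default_factory=list)

    def write_todos(self, items: list[str]) -> None:
        self.todos = [{"content": item, "status": "pending"} for item in items]

    def delegate(self, name: str, items: list[str]) -> str:
        """Spawn a sub-agent with a fresh todo list and return only its summary."""
        child = ToyAgent(name)
        child.write_todos(items)
        for todo in child.todos:
            todo["status"] = "completed"  # pretend each step succeeded
        return f"{name}: {len(child.todos)} steps completed"


main = ToyAgent("main")
main.write_todos(["Plan research", "Delegate to sub-agents", "Synthesize findings"])
summary = main.delegate(
    "react-research",
    ["Search React performance docs", "Extract benchmarks", "Compile findings"],
)
print(summary)
print([t["status"] for t in main.todos])
```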

One Constraint: No Parallel write_todos

The middleware enforces an important constraint: the model cannot call write_todos more than once in a single message, that is, as parallel tool calls. I found this test in the source:

def test_todo_middleware_rejects_multiple_write_todos_in_same_message(self):
    """Test that todo middleware rejects multiple write_todos calls."""
    # When agent tries to call write_todos twice in one message
    # Middleware returns error for both calls
    expected_error = (
        "Error: The `write_todos` tool should never be called "
        "multiple times in parallel. Please call it only once per "
        "model invocation to update the todo list."
    )

This forces the agent to maintain a coherent plan rather than fragmenting into multiple competing task lists.
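
To show the shape of such a check, here is a minimal sketch. The error text is copied from the test above. The `check_tool_calls` function and the dictionary layout of a tool call are my own assumptions, not the middleware's actual implementation.

```python
WRITE_TODOS_ERROR = (
    "Error: The `write_todos` tool should never be called "
    "multiple times in parallel. Please call it only once per "
    "model invocation to update the todo list."
)


def check_tool_calls(tool_calls: list[dict]) -> dict[str, str]:
    """Map each offending call id to an error when write_todos appears more than once.

    Assumes each tool call is a dict with "name" and "id" keys (hypothetical layout).
    """
    todo_calls = [call for call in tool_calls if call["name"] == "write_todos"]
    if len(todo_calls) <= 1:
        return {}
    return {call["id"]: WRITE_TODOS_ERROR for call in todo_calls}


single = [{"name": "write_todos", "id": "a"}, {"name": "list_tables", "id": "b"}]
double = [{"name": "write_todos", "id": "a"}, {"name": "write_todos", "id": "b"}]
print(check_tool_calls(single))
print(sorted(check_tool_calls(double)))
```

Both calls get the error, which matches the comment in the test: the check does not try to guess which of the competing lists was the intended one.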

The Research Workflow Pattern

The Deep Research example shows a common pattern for planning-heavy workflows:

RESEARCH_WORKFLOW_INSTRUCTIONS = """
Defines a 5-step workflow:
1. Save request -> Store the original query for reference
2. Plan with TODOs -> Break down research approach
3. Delegate to sub-agents -> Parallel research execution
4. Synthesize -> Combine findings
5. Respond -> Format and deliver
"""

This pattern scales well:

Simple queries: 1 sub-agent, 2-3 searches
Comparisons: 1 sub-agent per element being compared
Multi-faceted research: 1 sub-agent per aspect

The planning tool enables this scaling because the agent can reason about task complexity before diving into execution.
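
The scaling rule above fits in a few lines. This is only an illustration of the rule of thumb: `assign_subagents` and the `kind` labels are hypothetical, and in the actual example the decision is made by the model following its instructions, not by a function like this.

```python
def assign_subagents(kind: str, topics: list[str]) -> list[list[str]]:
    """Return one list of topics per sub-agent, following the rule of thumb above."""
    if not topics:
        raise ValueError("at least one topic is required")
    if kind == "simple":
        return [topics]  # a single sub-agent handles everything
    if kind in ("comparison", "multi_faceted"):
        return [[topic] for topic in topics]  # one sub-agent per element or aspect
    raise ValueError(f"unknown query kind: {kind!r}")


print(assign_subagents("simple", ["LangGraph features"]))
print(assign_subagents("comparison", ["React", "Vue"]))
```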

When Planning Matters

Planning isn’t necessary for every task. “What’s 2+2?” doesn’t need a todo list. But planning becomes critical when:

Multiple steps required - Task needs sequential execution
Dependencies exist - Step B requires output from Step A
Backtracking possible - Agent might hit dead ends
Progress tracking needed - User wants status updates
Delegation involved - Sub-agents will handle pieces

For these scenarios, the difference between a planning agent and a reactive loop is the difference between success and frustration.
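
The dependency case is worth a quick illustration. If you know which step needs which, Python's standard `graphlib` module can produce a valid execution order. The step names below are reused from the SQL example, and the dependency map is my own assumption about how they relate.

```python
from graphlib import CycleError, TopologicalSorter

# Each step maps to the set of steps it depends on (assumed for illustration).
dependencies = {
    "List tables": set(),
    "Examine schemas": {"List tables"},
    "Plan JOIN query": {"Examine schemas"},
    "Execute query": {"Plan JOIN query"},
    "Format results": {"Execute query"},
}


def order_steps(deps: dict[str, set[str]]) -> list[str]:
    """Return steps in an order that respects every dependency."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        raise ValueError("plan has circular dependencies") from exc


order = order_steps(dependencies)
print(order)
```

A plan with a cycle (step A needs B, and B needs A) raises an error instead of looping forever, which is the kind of check a reactive loop never gets to make.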

Using Planning in Your Agent

If you’re building with Deep Agents, planning is included by default:

from deepagents import create_deep_agent

agent = create_deep_agent()

# The agent will automatically use write_todos for complex tasks
result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Analyze the codebase and suggest improvements"
    }]
})

You don’t need to configure anything. The TodoListMiddleware is part of the default stack.

If you want to customize planning behavior, you can add your own middleware:

from deepagents import create_deep_agent

# Pass your own middleware through the `middleware` argument
agent = create_deep_agent(
    middleware=[
        # Your custom middleware here
    ]
)

My Take After Using It

I tried building a documentation agent with and without planning. The difference was stark:

Without planning:

Agent started writing before gathering all info
Forgot to save intermediate results
Lost track of original requirements
Often returned incomplete work

With planning (Deep Agents):

Agent broke down the task upfront
Tracked which files were already processed
Saved progress and could resume
Completed all requirements before responding

The planning tool transforms an agent from reactive to proactive. Instead of just responding to the last thing it saw, the agent maintains a coherent view of what it’s trying to accomplish.

Final Words + More Resources

My intention with this article was to share my knowledge and experience in a way that helps others. If you want to get in touch, you can reach me by email.
