How Deep Agents Uses Planning to Handle Complex Multi-Step Tasks
I built my first AI agent last year. It could call tools, process responses, and loop back for more. It worked beautifully for simple tasks like “search for X” or “calculate Y.” Then I gave it a real task: “Research LangGraph features, write a summary, and save it to a file.”
It failed. The agent searched, started writing, forgot what it was doing, and returned an incomplete result without saving anything.
The problem? No planning. My agent was stuck in a reactive loop - it couldn’t break down tasks, track progress, or recover when things went sideways.
This is exactly what Deep Agents solves with its built-in planning capabilities.
The Problem with Simple Agents
Most agent tutorials teach you this pattern:
from langchain.agents import initialize_agent
agent = initialize_agent(tools, llm, agent="react")
result = agent.run("Do something complex")
This works for single-step tasks. But complex workflows break down:
User: Which employee generated the most revenue and from which countries?
Agent: Let me search for employees...
[Agent searches]
Agent: I found employee data...
[Agent forgets the full question]
Agent: Here's the employee list: Jane, Bob, Alice
Result: Incomplete - never joined with revenue data
The agent jumped straight to execution without planning. It couldn’t:
- Break down the multi-step task
- Track which steps were done
- Adjust when it realized it needed more data
- Complete all requirements before responding
What Planning Looks Like
Deep Agents includes a write_todos tool powered by TodoListMiddleware. When you give it a complex task, here’s what happens:
Question: Which employee generated the most revenue and from which countries?
[Planning Step]
Using write_todos:
- [ ] List tables in database
- [ ] Examine Employee and Invoice schemas
- [ ] Plan multi-table JOIN query
- [ ] Execute and aggregate by employee and country
- [ ] Format results
[Execution Steps]
1. Listing tables...
2. Getting schema for: Employee, Invoice, InvoiceLine, Customer
3. Generating SQL query...
4. Executing query...
5. Formatting results...
[Final Answer]
Employee Jane Peacock (ID: 3) generated the most revenue...
Top countries: USA ($1000), Canada ($500)...
The agent plans first, then executes. Each step is tracked. If something fails, the agent knows exactly where it was and what’s left to do.
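To make the "knows exactly where it was" point concrete, here is a minimal plain-Python sketch of a plan-then-execute loop. It is not the Deep Agents implementation: `run_plan`, the step names, and the simulated failure are all hypothetical, and a real agent would let the model decide how to update the list.

```python
from typing import Callable


def run_plan(steps: list[tuple[str, Callable[[], None]]]) -> list[dict]:
    """Run steps in order, tracking status, and stop at the first failure."""
    todos = [{"content": name, "status": "pending"} for name, _ in steps]
    for todo, (_, action) in zip(todos, steps):
        todo["status"] = "in_progress"
        try:
            action()
        except Exception as exc:
            # Leave the failed step in_progress so it is clear where to resume.
            todo["error"] = str(exc)
            break
        todo["status"] = "completed"
    return todos


def failing_query() -> None:
    """Simulated failure (hypothetical) in the middle of the plan."""
    raise RuntimeError("query timed out")


todos = run_plan([
    ("List tables in database", lambda: None),
    ("Execute query", failing_query),
    ("Format results", lambda: None),
])
for todo in todos:
    print(f"[{todo['status']}] {todo['content']}")
```

After the failure, the list still shows one completed step, one step stuck in progress, and one pending step, which is the information a reactive loop throws away.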
How write_todos Works
The write_todos tool is implemented through TodoListMiddleware. Looking at the Deep Agents source:
from langchain.agents.middleware import TodoListMiddleware
def create_deep_agent():
    # Planning middleware is applied first, before other tools
    deepagent_middleware: list[AgentMiddleware] = [
        TodoListMiddleware(),  # Planning comes first
        FilesystemMiddleware(backend=backend),
        SubAgentMiddleware(...),
        # ... other middleware
    ]
The middleware injects a write_todos tool into the agent’s available tools. The agent can call this tool to:
- Create a plan - Break down a task into ordered steps
- Update status - Mark steps as pending, in_progress, or completed
- Reorder tasks - Adjust the plan as understanding evolves
- Add/remove tasks - Handle new requirements or dead ends
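The operations above can be sketched with a small stand-in function. This is an illustration only: the `write_todos` below is a hypothetical plain-Python function, not the tool the middleware provides, and it assumes each call passes the full, updated list rather than a diff.

```python
VALID_STATUSES = {"pending", "in_progress", "completed"}


def write_todos(state: dict, todos: list[dict]) -> dict:
    """Hypothetical stand-in: validate the todos and replace the whole list."""
    for todo in todos:
        if todo["status"] not in VALID_STATUSES:
            raise ValueError(f"unknown status: {todo['status']!r}")
    # Return a new state instead of mutating the old one.
    return {**state, "todos": [dict(todo) for todo in todos]}


state = {"todos": []}

# Create a plan.
state = write_todos(state, [
    {"content": "List tables in database", "status": "in_progress"},
    {"content": "Examine Employee and Invoice schemas", "status": "pending"},
])

# Update status and add a task discovered along the way.
state = write_todos(state, [
    {"content": "List tables in database", "status": "completed"},
    {"content": "Examine Employee and Invoice schemas", "status": "in_progress"},
    {"content": "Check Customer table for country data", "status": "pending"},
])

for todo in state["todos"]:
    print(f"[{todo['status']}] {todo['content']}")
```

Replacing the whole list on every call keeps creating, updating, reordering, and removing tasks as one operation, so there is only ever one version of the plan.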
Each todo item has three required fields:
{
    "content": "List tables in database",        # What to do
    "status": "in_progress",                     # pending | in_progress | completed
    "activeForm": "Listing tables in database"   # Present tense for progress display
}
A Real Example: Text-to-SQL Agent
The text-to-sql-agent example in the Deep Agents repo shows planning in action. I ran it against the Chinook sample database:
python agent.py "Which employee generated the most revenue and from which countries?"
The agent’s internal process looked like this:
[1] write_todos: Create plan
    - List tables in database [pending]
    - Examine Employee and Invoice schemas [pending]
    - Plan multi-table JOIN query [pending]
    - Execute and aggregate by employee and country [pending]
    - Format results [pending]
[2] list_tables: ["Employee", "Customer", "Invoice", "InvoiceLine", ...]
[3] write_todos: Update plan
    - List tables in database [completed]
    - Examine Employee and Invoice schemas [in_progress]
    ...
[4] get_schema: Employee(id, name, title...), Invoice(id, customerId, total...)
[5] write_todos: Update plan
    - Examine Employee and Invoice schemas [completed]
    - Plan multi-table JOIN query [in_progress]
    ...
[6] execute_query: SELECT e.Name, c.Country, SUM(i.Total)...
The agent didn’t just react - it planned, executed, tracked, and adjusted. When it realized it needed the Customer table for country data, it updated its plan.
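The query in the trace is truncated, so here is a sketch of the kind of multi-table JOIN the plan leads to, run against a tiny in-memory SQLite database. The table and column names follow the Chinook schema as I understand it (Customer.SupportRepId links a customer to an employee), and the rows are made up for illustration. This is not the query the agent actually generated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Country TEXT, SupportRepId INTEGER);
    CREATE TABLE Invoice  (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL);

    -- Made-up sample rows, not real Chinook data.
    INSERT INTO Employee VALUES (3, 'Jane', 'Peacock'), (4, 'Margaret', 'Park');
    INSERT INTO Customer VALUES (1, 'USA', 3), (2, 'Canada', 3), (3, 'Brazil', 4);
    INSERT INTO Invoice  VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5), (4, 3, 4.0);
""")

# Revenue per employee and country: Employee -> Customer -> Invoice.
rows = conn.execute("""
    SELECT e.FirstName || ' ' || e.LastName AS employee,
           c.Country,
           SUM(i.Total) AS revenue
    FROM Employee AS e
    JOIN Customer AS c ON c.SupportRepId = e.EmployeeId
    JOIN Invoice  AS i ON i.CustomerId = c.CustomerId
    GROUP BY e.EmployeeId, c.Country
    ORDER BY revenue DESC
""").fetchall()

for employee, country, revenue in rows:
    print(employee, country, revenue)
conn.close()
```

The country lives on Customer, not on Employee or Invoice, which is why the plan had to grow a step once the agent had seen the schemas.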
The Planning Middleware Stack
What I found interesting is how TodoListMiddleware is positioned in the stack:
┌─────────────────────────────────────────────────────────────┐
│                         Main Agent                          │
├─────────────────────────────────────────────────────────────┤
│   ┌─────────────────────────────────────────────────────┐   │
│   │  TodoListMiddleware (planning)                      │   │
│   │  - write_todos tool                                 │   │
│   │  - State management for todo list                   │   │
│   └─────────────────────────────────────────────────────┘   │
│                              ↓                              │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  SkillsMiddleware (on-demand expertise)             │   │
│   └─────────────────────────────────────────────────────┘   │
│                              ↓                              │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  FilesystemMiddleware (file operations)             │   │
│   └─────────────────────────────────────────────────────┘   │
│                              ↓                              │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  SubAgentMiddleware (delegation)                    │   │
│   └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
Planning comes first. This isn’t accidental - the agent needs to plan before it can decide which tools to use or which sub-agents to spawn.
Sub-Agents Also Plan
Here’s a detail I initially missed: sub-agents get their own TodoListMiddleware. From the source:
# Build general-purpose subagent with default middleware stack
gp_middleware: list[AgentMiddleware] = [
    TodoListMiddleware(),  # Sub-agents plan too!
    FilesystemMiddleware(backend=backend),
    create_summarization_middleware(model, backend),
    PatchToolCallsMiddleware(),
]
This means when the main agent delegates work to a sub-agent, that sub-agent can also plan and track its own progress. The Deep Research example demonstrates this:
Main Agent receives: "Compare React and Vue performance"
[Main Agent Plan]
1. Save request for reference
2. Plan research with TODOs
3. Delegate to sub-agents
4. Synthesize findings
5. Respond with comparison
[Sub-Agent A: React Research]
1. Search React performance docs
2. Extract benchmarks
3. Compile findings
[Sub-Agent B: Vue Research]
1. Search Vue performance docs
2. Extract benchmarks
3. Compile findings
Each sub-agent maintains its own todo list, isolated from the main agent’s context.
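One way to picture that isolation is a separate state object per agent. The sketch below is plain Python with hypothetical names (`AgentState`, `delegate`). It only illustrates the idea of per-agent todo lists and says nothing about how Deep Agents stores state internally.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical per-agent state; each instance owns its own todo list."""
    name: str
    todos: list[dict] = field(default_factory=list)


def delegate(name: str, steps: list[str]) -> AgentState:
    """Create a sub-agent state with a fresh plan, sharing nothing with the caller."""
    return AgentState(name, [{"content": step, "status": "pending"} for step in steps])


main = AgentState("main", [{"content": "Delegate to sub-agents", "status": "in_progress"}])
research_steps = ["Search performance docs", "Extract benchmarks", "Compile findings"]
react_agent = delegate("react-research", research_steps)
vue_agent = delegate("vue-research", research_steps)

# Progress in one sub-agent does not touch the other plans.
react_agent.todos[0]["status"] = "completed"
print(react_agent.todos[0]["status"])  # completed
print(vue_agent.todos[0]["status"])    # pending
print(len(main.todos))                 # 1
```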
One Constraint: No Parallel write_todos
The middleware enforces an important constraint: you can’t call write_todos multiple times in the same message. I found this test in the source:
def test_todo_middleware_rejects_multiple_write_todos_in_same_message(self):
    """Test that todo middleware rejects multiple write_todos calls."""
    # When agent tries to call write_todos twice in one message
    # Middleware returns error for both calls
    expected_error = (
        "Error: The `write_todos` tool should never be called "
        "multiple times in parallel. Please call it only once per "
        "model invocation to update the todo list."
    )
This forces the agent to maintain a coherent plan rather than fragmenting into multiple competing task lists.
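The check itself is easy to picture. Here is a minimal sketch of that kind of guard, assuming tool calls arrive as dicts with `name` and `id` keys. The function name and the dict shape are my assumptions; only the error text comes from the test above.

```python
WRITE_TODOS_ERROR = (
    "Error: The `write_todos` tool should never be called "
    "multiple times in parallel. Please call it only once per "
    "model invocation to update the todo list."
)


def reject_parallel_write_todos(tool_calls: list[dict]) -> dict[str, str]:
    """Return {tool_call_id: error} for every write_todos call if there is more than one."""
    todo_calls = [call for call in tool_calls if call["name"] == "write_todos"]
    if len(todo_calls) <= 1:
        return {}  # Zero or one call is fine.
    # Reject all of them, since there is no safe way to pick a winner.
    return {call["id"]: WRITE_TODOS_ERROR for call in todo_calls}


single = [{"name": "write_todos", "id": "a"}, {"name": "list_tables", "id": "b"}]
double = [{"name": "write_todos", "id": "a"}, {"name": "write_todos", "id": "b"}]
print(reject_parallel_write_todos(single))
print(sorted(reject_parallel_write_todos(double)))
```

Rejecting both calls, rather than keeping the first or the last, matches the comment in the test that the middleware returns an error for both.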
The Research Workflow Pattern
The Deep Research example shows a common pattern for planning-heavy workflows:
RESEARCH_WORKFLOW_INSTRUCTIONS = """Defines a 5-step workflow:
1. Save request -> Store the original query for reference
2. Plan with TODOs -> Break down research approach
3. Delegate to sub-agents -> Parallel research execution
4. Synthesize -> Combine findings
5. Respond -> Format and deliver
"""
This pattern scales well:
- Simple queries: 1 sub-agent, 2-3 searches
- Comparisons: 1 sub-agent per element being compared
- Multi-faceted research: 1 sub-agent per aspect
The planning tool enables this scaling because the agent can reason about task complexity before diving into execution.
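The scaling rules above can be written down as a small heuristic. In the real example the model makes this decision from its instructions, so treat the function below as a hypothetical illustration. `plan_delegation` and the query-type labels are my own names.

```python
def plan_delegation(query_type: str, items: list[str]) -> list[str]:
    """Return one sub-agent task description per unit of research."""
    if query_type == "simple":
        # Simple queries: a single sub-agent handles everything.
        return [f"Research {items[0]}"]
    if query_type in ("comparison", "multi_faceted"):
        # One sub-agent per element being compared, or per aspect.
        return [f"Research {item}" for item in items]
    raise ValueError(f"unknown query type: {query_type!r}")


print(plan_delegation("simple", ["LangGraph features"]))
print(plan_delegation("comparison", ["React performance", "Vue performance"]))
```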
When Planning Matters
Planning isn’t necessary for every task. “What’s 2+2?” doesn’t need a todo list. But planning becomes critical when:
- Multiple steps required - Task needs sequential execution
- Dependencies exist - Step B requires output from Step A
- Backtracking possible - Agent might hit dead ends
- Progress tracking needed - User wants status updates
- Delegation involved - Sub-agents will handle pieces
For these scenarios, the difference between a planning agent and a reactive loop is the difference between success and frustration.
Using Planning in Your Agent
If you’re building with Deep Agents, planning is included by default:
from deepagents import create_deep_agent
agent = create_deep_agent()
# The agent will automatically use write_todos for complex tasks
result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Analyze the codebase and suggest improvements"
    }]
})
You don’t need to configure anything. The TodoListMiddleware is part of the default stack.
If you want to customize planning behavior, you can add your own middleware:
from langchain.agents.middleware import TodoListMiddleware
from deepagents import create_deep_agent
# Planning is customizable through middleware
agent = create_deep_agent(
    middleware=[
        # Your custom middleware here
    ]
)
My Take After Using It
I tried building a documentation agent with and without planning. The difference was stark:
Without planning:
- Agent started writing before gathering all info
- Forgot to save intermediate results
- Lost track of original requirements
- Often returned incomplete work
With planning (Deep Agents):
- Agent broke down the task upfront
- Tracked which files were already processed
- Saved progress and could resume
- Completed all requirements before responding
The planning tool transforms an agent from reactive to proactive. Instead of just responding to the last thing it saw, the agent maintains a coherent view of what it’s trying to accomplish.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.
Here are the most important links from this article, along with some further resources on this topic:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!