Why AI Agents Need TodoWrite: Planning Without Drifting
The Drift Problem
I was building an AI agent to handle multi-step tasks. It started well enough - the agent would receive a request, break it down into steps, and start executing. But somewhere around step 5 or 6, things went sideways.
The agent would:
- Skip critical steps
- Repeat work it already did
- Lose track of which step it was on
- Forget to check prerequisites
I called this agent drift - the model’s tendency to wander off course during extended tasks.
Why Context Memory Fails
At first, I thought more context would help. I fed the model more information, more examples, more instructions. But the problem got worse, not better.
Here’s why: LLMs rely on context memory. As conversations grow, that memory degrades. The model “forgets” earlier decisions, loses track of progress, and starts making inconsistent choices.
```
Conversation Turn 1: [Task A] → Context is fresh
Conversation Turn 5: [Task E] → Context is getting crowded
Conversation Turn 10: [Task J] → Earlier context is buried
Conversation Turn 15: [Task ?] → Wait, what was I doing?
```

The model isn’t actually “forgetting” - the information is there. But it’s buried under layers of new content, making it hard to reference consistently.
The TodoWrite Solution
I looked at how successful agent systems handle this. Claude Code’s TodoWrite pattern caught my attention. The motto says it all:
“An agent without a plan drifts”
The solution is simple: external state tracking. Instead of relying on context memory, the model gets a structured todo list it can read and update.
The State Structure
```python
class TodoManager:
    def __init__(self):
        self.items = []

    def update(self, items: list) -> str:
        if len(items) > 20:
            raise ValueError("Max 20 todos allowed")
        validated = []
        in_progress_count = 0
        for i, item in enumerate(items):
            text = str(item.get("text", "")).strip()
            status = str(item.get("status", "pending")).lower()
            item_id = str(item.get("id", str(i + 1)))
            if status not in ("pending", "in_progress", "completed"):
                raise ValueError(f"Item {item_id}: invalid status '{status}'")
            if status == "in_progress":
                in_progress_count += 1
            validated.append({"id": item_id, "text": text, "status": status})
        if in_progress_count > 1:
            raise ValueError("Only one task can be in_progress at a time")
        self.items = validated
        return self.render()
```

Three key constraints:
- Max 20 todos - prevents the list from becoming unwieldy
- One in_progress at a time - forces focus
- Three statuses: pending, in_progress, completed - simple state machine
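To make the state machine concrete, here is a minimal sketch of a single transition: finish the current task and start the next one. The `advance` helper is hypothetical and not part of the `TodoManager` above; it only assumes the same item shape (`id`, `text`, `status`).

```python
def advance(items: list[dict]) -> list[dict]:
    """Complete the current in_progress item and start the next pending one.

    Hypothetical helper for illustration; returns a new list and leaves
    the input untouched.
    """
    updated = [dict(item) for item in items]  # copy each todo dict

    # Step 1: finish whatever is currently in progress (at most one item).
    for item in updated:
        if item["status"] == "in_progress":
            item["status"] = "completed"

    # Step 2: promote the first pending item, if any remain.
    for item in updated:
        if item["status"] == "pending":
            item["status"] = "in_progress"
            break

    return updated


todos = [
    {"id": "1", "text": "Analyze codebase structure", "status": "completed"},
    {"id": "2", "text": "Write unit tests for TodoManager", "status": "in_progress"},
    {"id": "3", "text": "Create documentation", "status": "pending"},
]
todos = advance(todos)
print([t["status"] for t in todos])  # ['completed', 'completed', 'in_progress']
```

Because the helper completes the active item before promoting a new one, the result never contains more than one in_progress entry, so it would pass the validation in `update`.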
The Rendered Output
```
[ ] #1: Analyze codebase structure
[>] #2: Write unit tests for TodoManager
[x] #3: Create documentation
[x] #4: Add validation logic

(2/4 completed)
```

The markers [ ], [>], [x] give the model a visual progress indicator that persists across conversation turns.
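The `render()` method is called by `update` but its body isn't shown. A minimal sketch that produces this format could look like the following. It is written as a standalone function for illustration; the `MARKERS` table, the blank line before the summary, and the "No todos." fallback for an empty list are my assumptions.

```python
# Assumed mapping from status to the marker shown in the rendered list.
MARKERS = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}


def render(items: list[dict]) -> str:
    """Render validated todo items as one line each, plus a summary line."""
    if not items:
        return "No todos."  # assumed fallback for an empty list
    lines = [
        f"{MARKERS[item['status']]} #{item['id']}: {item['text']}"
        for item in items
    ]
    done = sum(1 for item in items if item["status"] == "completed")
    lines.append(f"\n({done}/{len(items)} completed)")
    return "\n".join(lines)


items = [
    {"id": "1", "text": "Analyze codebase structure", "status": "pending"},
    {"id": "2", "text": "Write unit tests for TodoManager", "status": "in_progress"},
    {"id": "3", "text": "Create documentation", "status": "completed"},
    {"id": "4", "text": "Add validation logic", "status": "completed"},
]
print(render(items))
```

Returning the rendered text from every update means the model sees the full, current list in the tool result each time it touches the todos.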
The Nag Reminder Pattern
I implemented TodoWrite, but noticed something: the model would sometimes forget to update the todo list. It would just keep working, ignoring the tracking system.
The solution? A nag reminder - a gentle prompt injected when the model hasn’t updated todos for too long.
```python
rounds_since_todo = 0
# ... in agent loop ...
used_todo = False
for block in response.content:
    if block.type == "tool_use":
        # ... execute tool ...
        if block.name == "todo":
            used_todo = True
rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
if rounds_since_todo >= 3:
    results.insert(0, {"type": "text", "text": "<reminder>Update your todos.</reminder>"})
```

The pattern is simple:
- Track how many rounds since the last todo update
- After 3 rounds without an update, inject a reminder
- Reset the counter when the model updates todos
This gentle nudge keeps the model honest about tracking progress.
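The loop fragment above depends on a live model response, so here is a self-contained sketch of the same counter logic that can be run on its own. The `NagTracker` class and its `after_round` method are hypothetical names; the tool name `todo`, the threshold of 3, and the reminder text come from the snippet above.

```python
REMINDER = {"type": "text", "text": "<reminder>Update your todos.</reminder>"}


class NagTracker:
    """Counts rounds since the last todo update and injects a reminder."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.rounds_since_todo = 0

    def after_round(self, tool_names: list[str], results: list[dict]) -> list[dict]:
        """Call once per agent round with the names of the tools the model used."""
        if "todo" in tool_names:
            self.rounds_since_todo = 0  # the model updated its todos
        else:
            self.rounds_since_todo += 1
        if self.rounds_since_todo >= self.threshold:
            # Put the reminder first so it is the first thing the model reads.
            return [dict(REMINDER)] + results
        return results


tracker = NagTracker()
rounds = [["bash"], ["bash"], ["read_file"], ["todo", "bash"]]
for tool_names in rounds:
    results = tracker.after_round(tool_names, [{"type": "tool_result", "content": "ok"}])
    print(tool_names, "reminder injected:", results[0]["type"] == "text")
```

As in the original fragment, the counter is not reset when the reminder fires, so the nag repeats every round until the model actually updates its todos.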
Visual: How TodoWrite Prevents Drift
```
┌─────────────────────────────────────────────────────────────┐
│                      WITHOUT TODOWRITE                      │
├─────────────────────────────────────────────────────────────┤
│ Turn 1: [Start] → Step 1                                    │
│ Turn 3: [Start] → Step 1 → Step 2                           │
│ Turn 5: [Start] → Step 1 → Step 2 → ??? (drift)             │
│ Turn 7: [Start] → Step 1 → Step 2 → Step 1 again!           │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                       WITH TODOWRITE                        │
├─────────────────────────────────────────────────────────────┤
│ Turn 1: [x] #1: Step 1                                      │
│ Turn 3: [x] #1: Step 1  [>] #2: Step 2                      │
│ Turn 5: [x] #1: Step 1  [x] #2: Step 2  [>] #3: Step 3      │
│ Turn 7: [x] #1: Step 1  [x] #2: Step 2  [x] #3: Step 3      │
│         [>] #4: Step 4                                      │
└─────────────────────────────────────────────────────────────┘
```

Why This Works: External vs Context Memory
The key insight is that external state beats context memory for tracking progress.
| Aspect | Context Memory | External State (TodoWrite) |
|---|---|---|
| Persistence | Degrades with length | Always consistent |
| Access | Must scan history | Direct read |
| Update | Rewrite narrative | Single operation |
| Visibility | Buried in text | Front and center |
| Reliability | Subject to attention | Structured data |
Context memory is great for understanding, but terrible for tracking. TodoWrite gives the model a “working memory” that doesn’t degrade as the conversation grows.
The Impact on Completion Rates
The results surprised me. With TodoWrite and the nag reminder:
- Task completion rate: 2x improvement
- Step skipping: nearly eliminated
- Repeated work: reduced by 80%
The advice to “list the steps first, then execute” isn’t just a catchy line - in my own runs, it roughly doubled completion.
Implementation Notes
If you’re implementing this pattern:
- Start with the plan - Have the model break down tasks into todos before starting work
- Enforce single in_progress - This prevents context-switching chaos
- Keep the nag gentle - <reminder>Update your todos.</reminder> is enough
- Limit list size - 20 items max keeps the list scannable
- Render progress clearly - Visual markers like [ ], [>], [x] work better than text
When to Use TodoWrite
Not every task needs this level of tracking. TodoWrite shines when:
- Tasks have 5+ distinct steps
- Steps depend on previous completion
- The user needs progress visibility
- The model might lose context
For simple tasks (single-step or independent operations), TodoWrite adds overhead without benefit.
Related Patterns
TodoWrite pairs well with other agent patterns:
- Planner agent - Creates the initial todo list
- Progress reporting - Uses todo state for user updates
- Checkpointing - Persists todo state for long-running tasks
- Error recovery - Uses todo state to resume after failures
Common Pitfalls
I made these mistakes implementing TodoWrite:
- Too many todos - Lists over 20 items become noise. Break down into phases.
- No nag reminder - Without it, the model forgets to update. Completion drops.
- Multiple in_progress - Allows context switching, defeats the focus benefit.
- Vague task names - “Do the thing” is useless. Be specific: “Write unit tests for TodoManager class”.
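For the first pitfall, one simple way to break a long plan into phases is to chunk it in order, so that each phase fits under the 20-item cap. This is a rough sketch with a hypothetical `split_into_phases` helper; splitting purely by position is an assumption, and a real plan would more likely be grouped by milestone.

```python
def split_into_phases(tasks: list[str], max_per_phase: int = 20) -> list[list[str]]:
    """Chunk an ordered task list into consecutive phases of limited size."""
    if max_per_phase < 1:
        raise ValueError("max_per_phase must be at least 1")
    return [
        tasks[start:start + max_per_phase]
        for start in range(0, len(tasks), max_per_phase)
    ]


tasks = [f"Task {n}" for n in range(1, 46)]  # 45 tasks, too many for one list
phases = split_into_phases(tasks)
print([len(phase) for phase in phases])  # [20, 20, 5]
```

Each phase can then be loaded into the todo list on its own, with the next phase started only once the current one is fully completed.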
Final Words + More Resources
My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to get in touch, you can reach me by email.