
Why AI Agents Need TodoWrite: Planning Without Drifting

The Drift Problem

I was building an AI agent to handle multi-step tasks. It started well enough - the agent would receive a request, break it down into steps, and start executing. But somewhere around step 5 or 6, things went sideways.

The agent would:

  • Skip critical steps
  • Repeat work it already did
  • Lose track of which step it was on
  • Forget to check prerequisites

I called this agent drift - the model’s tendency to wander off course during extended tasks.

Why Context Memory Fails

At first, I thought more context would help. I fed the model more information, more examples, more instructions. But the problem got worse, not better.

Here’s why: an LLM’s only record of its own progress is the conversation itself. As the conversation grows, that record gets harder to use. The model “forgets” earlier decisions, loses track of progress, and starts making inconsistent choices.

Conversation Turn 1: [Task A] → Context is fresh
Conversation Turn 5: [Task E] → Context is getting crowded
Conversation Turn 10: [Task J] → Earlier context is buried
Conversation Turn 15: [Task ?] → Wait, what was I doing?

The model isn’t actually “forgetting” - the information is there. But it’s buried under layers of new content, making it hard to reference consistently.

The TodoWrite Solution

I looked at how successful agent systems handle this. Claude Code’s TodoWrite pattern caught my attention. The motto says it all:

“An agent without a plan drifts”

The solution is simple: external state tracking. Instead of relying on context memory, the model gets a structured todo list it can read and update.

The State Structure

todo_structure.py
class TodoManager:
    """Holds the agent's todo list and validates every update."""

    def __init__(self):
        self.items = []

    def update(self, items: list) -> str:
        # Cap the list size so it stays scannable.
        if len(items) > 20:
            raise ValueError("Max 20 todos allowed")
        validated = []
        in_progress_count = 0
        for i, item in enumerate(items):
            text = str(item.get("text", "")).strip()
            status = str(item.get("status", "pending")).lower()
            # Fall back to the 1-based position when no id is given.
            item_id = str(item.get("id", str(i + 1)))
            if status not in ("pending", "in_progress", "completed"):
                raise ValueError(f"Item {item_id}: invalid status '{status}'")
            if status == "in_progress":
                in_progress_count += 1
            validated.append({"id": item_id, "text": text, "status": status})
        # Checked after the loop so the whole list is counted.
        if in_progress_count > 1:
            raise ValueError("Only one task can be in_progress at a time")
        self.items = validated
        # render() is not shown in this excerpt; it returns the text view of the list.
        return self.render()

Three key constraints:

  1. Max 20 todos - prevents the list from becoming unwieldy
  2. One in_progress at a time - forces focus
  3. Three statuses: pending, in_progress, completed - simple state machine
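
To see the three constraints reject bad updates, here is a small self-contained sketch. Both `validate_todos` and `try_update` are hypothetical helpers I'm introducing for illustration; they mirror the checks in `TodoManager.update` but are not part of any library.

```python
VALID_STATUSES = ("pending", "in_progress", "completed")
MAX_TODOS = 20


def validate_todos(items: list[dict]) -> list[dict]:
    """Hypothetical helper mirroring the checks in TodoManager.update."""
    if len(items) > MAX_TODOS:
        raise ValueError(f"Max {MAX_TODOS} todos allowed")
    validated = []
    for i, item in enumerate(items):
        status = str(item.get("status", "pending")).lower()
        item_id = str(item.get("id", i + 1))
        if status not in VALID_STATUSES:
            raise ValueError(f"Item {item_id}: invalid status '{status}'")
        text = str(item.get("text", "")).strip()
        validated.append({"id": item_id, "text": text, "status": status})
    # Count in_progress items across the whole list, not per item.
    if sum(t["status"] == "in_progress" for t in validated) > 1:
        raise ValueError("Only one task can be in_progress at a time")
    return validated


def try_update(items: list[dict]) -> str:
    """Return 'ok' or an error string, roughly how a tool result might report it."""
    try:
        validate_todos(items)
        return "ok"
    except ValueError as exc:
        return f"Error: {exc}"


print(try_update([{"text": "Write tests", "status": "in_progress"}]))
print(try_update([{"text": "A", "status": "in_progress"},
                  {"text": "B", "status": "in_progress"}]))
print(try_update([{"text": "A", "status": "done"}]))
```

Returning the error as text instead of crashing the loop is a design choice on my part: the model sees why its update was rejected and can send a corrected list.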

The Rendered Output

todo_output.txt
[ ] #1: Analyze codebase structure
[>] #2: Write unit tests for TodoManager
[x] #3: Create documentation
[x] #4: Add validation logic
(2/4 completed)

The markers [ ], [>], [x] give the model a visual progress indicator that persists across conversation turns.
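
The rendering step could look roughly like this. It's a minimal sketch that reproduces the output format shown above; `render_todos` and the `"No todos."` fallback for an empty list are my own assumptions, not taken from any particular implementation.

```python
MARKERS = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}


def render_todos(items: list[dict]) -> str:
    """Render one marker line per todo, followed by a completion summary."""
    if not items:
        return "No todos."  # assumed fallback for an empty list
    lines = [
        f"{MARKERS[item['status']]} #{item['id']}: {item['text']}"
        for item in items
    ]
    done = sum(item["status"] == "completed" for item in items)
    lines.append(f"({done}/{len(items)} completed)")
    return "\n".join(lines)


todos = [
    {"id": "1", "text": "Analyze codebase structure", "status": "pending"},
    {"id": "2", "text": "Write unit tests for TodoManager", "status": "in_progress"},
    {"id": "3", "text": "Create documentation", "status": "completed"},
    {"id": "4", "text": "Add validation logic", "status": "completed"},
]
print(render_todos(todos))
```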

The Nag Reminder Pattern

I implemented TodoWrite, but noticed something: the model would sometimes forget to update the todo list. It would just keep working, ignoring the tracking system.

The solution? A nag reminder - a gentle prompt injected when the model hasn’t updated todos for too long.

nag_reminder.py
rounds_since_todo = 0

# ... in agent loop ...
# `response` and `results` come from the surrounding agent loop (not shown).
used_todo = False
for block in response.content:
    if block.type == "tool_use":
        # ... execute tool ...
        if block.name == "todo":
            used_todo = True
# Reset on any todo call this round, otherwise count one more idle round.
rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
if rounds_since_todo >= 3:
    results.insert(0, {"type": "text", "text": "<reminder>Update your todos.</reminder>"})

The pattern is simple:

  1. Track how many rounds since the last todo update
  2. After 3 rounds without an update, inject a reminder
  3. Reset the counter when the model updates todos

This gentle nudge keeps the model honest about tracking progress.
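
The three steps above can be pulled out of the agent loop and tested on their own. This is a sketch under my own assumptions: `NagTracker` and `with_reminder` are hypothetical names, and each round is reduced to the list of tool names the model called.

```python
from dataclasses import dataclass

REMINDER = "<reminder>Update your todos.</reminder>"


@dataclass
class NagTracker:
    """Counts consecutive rounds without a todo update (hypothetical helper)."""
    threshold: int = 3
    rounds_since_todo: int = 0

    def record_round(self, tool_names: list[str]) -> bool:
        """Record one round's tool calls; return True when a reminder is due."""
        if "todo" in tool_names:
            self.rounds_since_todo = 0
        else:
            self.rounds_since_todo += 1
        return self.rounds_since_todo >= self.threshold


def with_reminder(results: list[dict], tracker: NagTracker,
                  tool_names: list[str]) -> list[dict]:
    """Prepend the reminder block to this round's results when it is due."""
    if tracker.record_round(tool_names):
        return [{"type": "text", "text": REMINDER}] + results
    return results


tracker = NagTracker()
for names in (["bash"], ["read_file"], ["bash"], ["todo"]):
    out = with_reminder([], tracker, names)
    print(names, "-> reminder" if out else "-> no reminder")
```

Like the loop above, this keeps nagging on every round past the threshold until the model actually calls the todo tool.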

Visual: How TodoWrite Prevents Drift

┌────────────────────────────────────────────────────────────┐
│                     WITHOUT TODOWRITE                      │
├────────────────────────────────────────────────────────────┤
│ Turn 1: [Start] → Step 1                                   │
│ Turn 3: [Start] → Step 1 → Step 2                          │
│ Turn 5: [Start] → Step 1 → Step 2 → ??? (drift)            │
│ Turn 7: [Start] → Step 1 → Step 2 → Step 1 again!          │
└────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────┐
│                       WITH TODOWRITE                       │
├────────────────────────────────────────────────────────────┤
│ Turn 1: [x] #1: Step 1                                     │
│ Turn 3: [x] #1: Step 1  [>] #2: Step 2                     │
│ Turn 5: [x] #1: Step 1  [x] #2: Step 2  [>] #3: Step 3     │
│ Turn 7: [x] #1: Step 1  [x] #2: Step 2  [x] #3: Step 3     │
│         [>] #4: Step 4                                     │
└────────────────────────────────────────────────────────────┘

Why This Works: External vs Context Memory

The key insight is that external state beats context memory for tracking progress.

Aspect       Context Memory         External State (TodoWrite)
-----------  ---------------------  --------------------------
Persistence  Degrades with length   Always consistent
Access       Must scan history      Direct read
Update       Rewrite narrative      Single operation
Visibility   Buried in text         Front and center
Reliability  Subject to attention   Structured data

Context memory is great for understanding, but terrible for tracking. TodoWrite gives the model a “working memory” that doesn’t degrade as the conversation grows.

The Impact on Completion Rates

The results surprised me. With TodoWrite and the nag reminder:

  • Task completion rate: 2x improvement
  • Step skipping: nearly eliminated
  • Repeated work: reduced by 80%

The “list the steps first, then execute; completion doubles” motto isn’t just clever - it matched what I saw in my own runs.

Implementation Notes

If you’re implementing this pattern:

  1. Start with the plan - Have the model break down tasks into todos before starting work
  2. Enforce single in_progress - This prevents context-switching chaos
  3. Keep the nag gentle - <reminder>Update your todos.</reminder> is enough
  4. Limit list size - 20 items max keeps the list scannable
  5. Render progress clearly - Visual markers like [ ], [>], [x] are easier to scan than status words
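
For the model to plan first, it needs a tool it can call. Here is a sketch of what the tool definition might look like, assuming an Anthropic-style tool format (`name`, `description`, `input_schema` as JSON Schema). The tool name `todo` and the item fields come from the code earlier in this article; the description text and the exact schema are my own guesses.

```python
# Hypothetical tool definition; the wording and schema details are assumptions.
TODO_TOOL = {
    "name": "todo",
    "description": (
        "Replace the full todo list. List the steps before starting work "
        "and keep at most one item in_progress."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "items": {
                "type": "array",
                "maxItems": 20,  # mirrors the 20-item cap
                "items": {
                    "type": "object",
                    "properties": {
                        "id": {"type": "string"},
                        "text": {"type": "string"},
                        "status": {
                            "type": "string",
                            "enum": ["pending", "in_progress", "completed"],
                        },
                    },
                    "required": ["text", "status"],
                },
            }
        },
        "required": ["items"],
    },
}

print(TODO_TOOL["name"], "accepts up to",
      TODO_TOOL["input_schema"]["properties"]["items"]["maxItems"], "items")
```

I would treat the schema as a hint to the model only. The single in_progress rule can't be expressed this way, so the validation in `TodoManager.update` still has to run on every call.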

When to Use TodoWrite

Not every task needs this level of tracking. TodoWrite shines when:

  • Tasks have 5+ distinct steps
  • Steps depend on previous completion
  • The user needs progress visibility
  • The model might lose context

For simple tasks (single-step or independent operations), TodoWrite adds overhead without benefit.

TodoWrite pairs well with other agent patterns:

  • Planner agent - Creates the initial todo list
  • Progress reporting - Uses todo state for user updates
  • Checkpointing - Persists todo state for long-running tasks
  • Error recovery - Uses todo state to resume after failures
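
Checkpointing and error recovery both come down to writing the todo state somewhere durable and reading it back. A minimal sketch, assuming a JSON file on disk; `save_todos`, `load_todos`, `next_todo` and the file name are hypothetical and only illustrate the idea.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def save_todos(items: list[dict], path: Path) -> None:
    """Write the todo list to disk so a later run can pick it up."""
    path.write_text(json.dumps(items, indent=2), encoding="utf-8")


def load_todos(path: Path) -> list[dict]:
    """Load a saved todo list, or return an empty list if none exists."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def next_todo(items: list[dict]) -> Optional[dict]:
    """Pick where to resume: the in_progress item first, else the first pending one."""
    for status in ("in_progress", "pending"):
        for item in items:
            if item["status"] == status:
                return item
    return None


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "todos.json"  # hypothetical checkpoint location
    save_todos(
        [
            {"id": "1", "text": "Analyze codebase structure", "status": "completed"},
            {"id": "2", "text": "Write unit tests", "status": "in_progress"},
            {"id": "3", "text": "Create documentation", "status": "pending"},
        ],
        checkpoint,
    )
    resumed = next_todo(load_todos(checkpoint))
    print("Resume at:", resumed["text"] if resumed else "nothing left")
```

Resuming at the in_progress item assumes that step is safe to redo. If a step has side effects, you'd want to check its actual outcome before running it again.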

Common Pitfalls

I made these mistakes implementing TodoWrite:

  1. Too many todos - Lists over 20 items become noise. Break the work into phases instead.
  2. No nag reminder - Without it, the model forgets to update. Completion drops.
  3. Multiple in_progress - Allows context switching, defeats the focus benefit.
  4. Vague task names - “Do the thing” is useless. Be specific: “Write unit tests for TodoManager class”.

Final Words + More Resources

My intention with this article was to share my knowledge and experience in a way that helps others. If you want to contact me, you can reach me by email.

