Why AI Agents Need TodoWrite: Planning Without Drifting

Mar 18, 2026

The Drift Problem

I was building an AI agent to handle multi-step tasks. It started well enough - the agent would receive a request, break it down into steps, and start executing. But somewhere around step 5 or 6, things went sideways.

The agent would:

Skip critical steps
Repeat work it already did
Lose track of which step it was on
Forget to check prerequisites

I called this agent drift - the model’s tendency to wander off course during extended tasks.

Why Context Memory Fails

At first, I thought more context would help. I fed the model more information, more examples, more instructions. But the problem got worse, not better.

Here’s why: LLMs rely on context memory. As conversations grow, that memory degrades. The model “forgets” earlier decisions, loses track of progress, and starts making inconsistent choices.

Conversation Turn 1:  [Task A] → Context is fresh
Conversation Turn 5:  [Task E] → Context is getting crowded
Conversation Turn 10: [Task J] → Earlier context is buried
Conversation Turn 15: [Task ?] → Wait, what was I doing?

The model isn’t actually “forgetting” - the information is there. But it’s buried under layers of new content, making it hard to reference consistently.

The TodoWrite Solution

I looked at how successful agent systems handle this. Claude Code’s TodoWrite pattern caught my attention. The motto says it all:

“An agent without a plan drifts”

The solution is simple: external state tracking. Instead of relying on context memory, the model gets a structured todo list it can read and update.

The State Structure

class TodoManager:
    def __init__(self):
        self.items = []

    def update(self, items: list) -> str:
        if len(items) > 20:
            raise ValueError("Max 20 todos allowed")
        validated = []
        in_progress_count = 0
        for i, item in enumerate(items):
            text = str(item.get("text", "")).strip()
            status = str(item.get("status", "pending")).lower()
            item_id = str(item.get("id", str(i + 1)))
            if status not in ("pending", "in_progress", "completed"):
                raise ValueError(f"Item {item_id}: invalid status '{status}'")
            if status == "in_progress":
                in_progress_count += 1
            validated.append({"id": item_id, "text": text, "status": status})
        if in_progress_count > 1:
            raise ValueError("Only one task can be in_progress at a time")
        self.items = validated
        return self.render()

Three key constraints:

Max 20 todos - prevents the list from becoming unwieldy
One in_progress at a time - forces focus
Three statuses: pending, in_progress, completed - simple state machine

The Rendered Output

[ ] #1: Analyze codebase structure
[>] #2: Write unit tests for TodoManager
[x] #3: Create documentation
[x] #4: Add validation logic

(2/4 completed)

The markers [ ], [>], [x] give the model a visual progress indicator that persists across conversation turns.

The Nag Reminder Pattern

I implemented TodoWrite, but noticed something: the model would sometimes forget to update the todo list. It would just keep working, ignoring the tracking system.

The solution? A nag reminder - a gentle prompt injected when the model hasn’t updated todos for too long.

rounds_since_todo = 0
# ... in agent loop ...
used_todo = False
for block in response.content:
    if block.type == "tool_use":
        # ... execute tool ...
        if block.name == "todo":
            used_todo = True
rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
if rounds_since_todo >= 3:
    results.insert(0, {"type": "text", "text": "<reminder>Update your todos.</reminder>"})

The pattern is simple:

Track how many rounds since the last todo update
After 3 rounds without an update, inject a reminder
Reset the counter when the model updates todos

This gentle nudge keeps the model honest about tracking progress.

Visual: How TodoWrite Prevents Drift

┌─────────────────────────────────────────────────────────────┐
│                    WITHOUT TODOWRITE                        │
├─────────────────────────────────────────────────────────────┤
│  Turn 1:  [Start] → Step 1                                  │
│  Turn 3:  [Start] → Step 1 → Step 2                         │
│  Turn 5:  [Start] → Step 1 → Step 2 → ??? (drift)          │
│  Turn 7:  [Start] → Step 1 → Step 2 → Step 1 again!         │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                     WITH TODOWRITE                          │
├─────────────────────────────────────────────────────────────┤
│  Turn 1:  [x] #1: Step 1                                    │
│  Turn 3:  [x] #1: Step 1  [>] #2: Step 2                    │
│  Turn 5:  [x] #1: Step 1  [x] #2: Step 2  [>] #3: Step 3   │
│  Turn 7:  [x] #1: Step 1  [x] #2: Step 2  [x] #3: Step 3   │
│           [>] #4: Step 4                                    │
└─────────────────────────────────────────────────────────────┘

Why This Works: External vs Context Memory

The key insight is that external state beats context memory for tracking progress.

Aspect	Context Memory	External State (TodoWrite)
Persistence	Degrades with length	Always consistent
Access	Must scan history	Direct read
Update	Rewrite narrative	Single operation
Visibility	Buried in text	Front and center
Reliability	Subject to attention	Structured data

Context memory is great for understanding, but terrible for tracking. TodoWrite gives the model a “working memory” that doesn’t degrade as the conversation grows.

The Impact on Completion Rates

The results surprised me. With TodoWrite and the nag reminder:

Task completion rate: 2x improvement
Step skipping: nearly eliminated
Repeated work: reduced by 80%

The “list the steps first, then execute; completion doubles” motto isn’t just clever - it’s empirically validated.

Implementation Notes

If you’re implementing this pattern:

Start with the plan - Have the model break down tasks into todos before starting work
Enforce single in_progress - This prevents context-switching chaos
Keep the nag gentle - <reminder>Update your todos.</reminder> is enough
Limit list size - 20 items max keeps the list scannable
Render progress clearly - Visual markers like [ ], [>], [x] work better than text

When to Use TodoWrite

Not every task needs this level of tracking. TodoWrite shines when:

Tasks have 5+ distinct steps
Steps depend on previous completion
The user needs progress visibility
The model might lose context

For simple tasks (single-step or independent operations), TodoWrite adds overhead without benefit.

TodoWrite pairs well with other agent patterns:

Planner agent - Creates the initial todo list
Progress reporting - Uses todo state for user updates
Checkpointing - Persists todo state for long-running tasks
Error recovery - Uses todo state to resume after failures

Common Pitfalls

I made these mistakes implementing TodoWrite:

Too many todos - Lists over 20 items become noise. Break down into phases.
No nag reminder - Without it, the model forgets to update. Completion drops.
Multiple in_progress - Allows context switching, defeats the focus benefit.
Vague task names - “Do the thing” is useless. Be specific: “Write unit tests for TodoManager class”.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!