Stop Wasting Tokens: Use a HANDOFF File Instead of /compact in Claude Code

Jun 29, 2026

Problem

I had a four-hour Claude Code session going, took a lunch break, came back, and hit “resume.” The next prompt ate the rest of my daily limit.

I asked myself: what just happened? I had a perfectly good long thread with all the context in it. Why did resuming it cost so much?

The answer was that the thread’s prompt cache had expired, and resuming it re-paid the cost of the entire history. /compact would have shrunk the in-session context but would not have helped with the resume cost. The fix was a HANDOFF file at the end of long sessions, not /compact.

In this post, I will show why /compact is the wrong tool, what a HANDOFF file looks like, and how to use one to cut the resume cost from a full history to a few hundred tokens.

Environment

Claude Pro plan ($20/month)
Claude Code CLI on macOS
Sonnet 4.6 as the default model

What happened?

I was working on a Flask project. The session was about four hours old. I had /compacted once mid-session to keep the in-context size down, and it seemed to work. Then I went to lunch, came back an hour and a half later, and sent a normal follow-up prompt. The budget vanished.

Here is what I saw when I ran /usage mid-session before lunch:

context: 100k tokens
  - claude.md + system: ~5k
  - session body: ~95k

The session body, not the system prompt, accounted for most of the context. biggest_muzzy (score 5) confirmed this in the r/ClaudeAI thread:

“try using the /usage command in the middle of a session. You might see something like 100k in context (half of the full context where CC starts to compact), with like 5k tokens for claude.md and stuff, and the rest being the body of your session.”

The mechanism is documented in the thread. jomi-se (score 7) said: “conversations with claude code are held on cache for 1h when on a subscription. If you restart an existing long conversation after that you’ll re-pay the usage cost of the full conversation!”

Note: the exact cache TTL varies by source. The mod-bot in the thread estimated 5-10 minutes, while jomi-se said about an hour. The authoritative number is in Anthropic’s prompt caching docs. Whatever the exact number, the fix is the same: do not rely on the cache to make a long resume cheap.
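To make the mechanism concrete, here is a toy Python model of the resume cost. It is a sketch under simplifying assumptions, not Anthropic's billing logic: the function name, its parameters, and the `cached_read_fraction` value are all hypothetical, and the TTL is an argument precisely because the sources disagree on it.

```python
def resume_input_cost(history_tokens: int,
                      idle_minutes: float,
                      cache_ttl_minutes: float,
                      cached_read_fraction: float = 0.1) -> float:
    """Rough input cost, in full-price token equivalents, of the next prompt.

    Toy model: if the idle time is within the cache TTL, the history is read
    from cache at a discounted rate; otherwise it is re-read at full price.
    cached_read_fraction is an illustrative assumption, not a quoted rate.
    """
    if idle_minutes <= cache_ttl_minutes:
        return history_tokens * cached_read_fraction
    return float(history_tokens)


# The 95k-token session body from the /usage snapshot, with an assumed 60-minute TTL.
warm = resume_input_cost(95_000, idle_minutes=10, cache_ttl_minutes=60)
cold = resume_input_cost(95_000, idle_minutes=90, cache_ttl_minutes=60)
print(f"warm cache: ~{warm:,.0f} tokens, expired cache: ~{cold:,.0f} tokens")
```

Whatever TTL you plug in, a 90-minute lunch lands on the expensive branch for any value of an hour or less.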

Here is what the token accumulation looks like across an agent loop:

[Figure: stacked bar chart of cumulative token usage across 10 rounds of agent tool calls, broken down into system prompt, tool descriptions, tool calls, and conversation history]

The session body, not the system prompt, is what gets re-paid on resume. That is the silent cost.
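A rough way to see why the body dominates: in an agent loop, each round sends the whole conversation so far as input. The sketch below is a simplified model with made-up sizes. The 5k fixed prefix matches the /usage snapshot, but the 9.5k tokens added per round is an illustrative assumption I picked so that ten rounds land at about 100k.

```python
def context_per_round(fixed_tokens: int, added_per_round: int, rounds: int) -> list[int]:
    """Context size sent as input on each round of a simple agent loop.

    Assumes every round appends the same number of tokens (tool call, tool
    result, reply) and that the full history is resent each time.
    """
    return [fixed_tokens + added_per_round * r for r in range(1, rounds + 1)]


sizes = context_per_round(fixed_tokens=5_000, added_per_round=9_500, rounds=10)
final_context = sizes[-1]          # what a cold resume has to re-read
body_share = (final_context - 5_000) / final_context

print(f"context after 10 rounds: {final_context:,} tokens")
print(f"session body share: {body_share:.0%}")
```

The fixed part never grows, so the longer the session runs, the larger the share of the resume cost that comes from the body.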

The architecture that motivates HANDOFF

The reason HANDOFF works comes down to how a Claude Code session is actually shaped. I think of it as three slots:

Slot A: system prompt. Small and fixed. The harness rebuilds it on every turn.
Slot B: per-task working memory. Medium. The current plan, recent tool results, the active diff.
Slot C: on-demand retrieval. Large. The repo, prior sessions, external docs. The harness only reads from it when Slot B points at a specific file or doc.

[Figure: three-slot context window architecture. Slot A (small fixed system prompt) on top, Slot B (medium per-task working memory with plan, recent tool results, and current diff) in the middle, Slot C (large on-demand retrieval from files, prior sessions, and external docs) on the bottom, with arrows showing the harness rebuilding A and B on every turn before querying C]

The diagram above is the architecture that justifies HANDOFF. A fresh session fed by a small HANDOFF file is a clean Slot B. Resuming a long thread forces the harness to rebuild Slot B by re-reading the entire session history, which is exactly the cost we are trying to avoid.
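Here is how I would sketch the three slots in Python. This is my mental model turned into a toy class, not how Claude Code is implemented: the `Context` class, its field names, and `build_prompt` are all hypothetical, and the turn strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Illustrative model of the three slots; not Claude Code's real internals."""
    system_prompt: str                                         # Slot A: small, fixed
    working_memory: list[str] = field(default_factory=list)    # Slot B: plan, results, diff
    retrieval: dict[str, str] = field(default_factory=dict)    # Slot C: path -> contents

    def build_prompt(self, wanted_paths: list[str]) -> str:
        """Assemble A and B, then pull only the Slot C entries that B asks for."""
        fetched = [self.retrieval[p] for p in wanted_paths if p in self.retrieval]
        return "\n".join([self.system_prompt, *self.working_memory, *fetched])


repo = {"src/auth/middleware.py": "<file contents>", "src/exports/csv.py": "<file contents>"}

# Fresh session: Slot B holds just the HANDOFF text.
fresh = Context("system", ["<one-screen HANDOFF>"], repo)

# Resumed thread: Slot B holds every prior turn (placeholder strings here).
resumed = Context("system", [f"<turn {i}>" for i in range(200)], repo)

fresh_prompt = fresh.build_prompt(["src/auth/middleware.py"])
resumed_prompt = resumed.build_prompt(["src/auth/middleware.py"])
print(len(fresh_prompt.splitlines()), "lines vs", len(resumed_prompt.splitlines()), "lines")
```

Slots A and C are identical in both cases. The only thing that changes between a fresh session and a resumed one is how much sits in Slot B.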

How to solve it?

I stopped using /compact for cost reasons and started using HANDOFF files instead. The pattern is a three-step loop.

Step 1: End-of-session dump

When the session gets long or before I step away, I ask Claude to write a timestamped HANDOFF-yyyy-mm-dd-hhmm.md file. spdustin (score 12) defined the canonical structure:

“I never, ever /compact. I always ask Claude to create a HANDOFF-yyyy-mm-dd-hhmm.md file that contains: Updated understanding of the project … Work accomplished … Architecture decisions and the rationale behind them … Mistakes made and lessons learned … TODO items remaining … Potential issues on the horizon … Links to relevant docs and skills that were used/updated … Subjective read on User’s mood regarding Claude’s work.”

My end-of-session command:

> Write HANDOFF-2026-06-29-1430.md in the project root.
> Use the template we've used before. Include: project goal,
> current state per feature, architecture decisions and
> rationale, mistakes and lessons, open TODOs, known gotchas,
> links to the 5-10 most relevant files, and a read on
> how the user feels about the last batch of work.

Here is what a real HANDOFF file looks like in my projects:

# Project: report-exporter
## Goal
CSV and PDF export service for the reporting dashboard. Target users
are ops analysts who need scheduled exports to S3.

## Current state
- auth: cookie + refresh, half-migrated (see src/auth/middleware.py)
- exports: CSV streaming works; PDF not started
- tests: 71% coverage, gaps in src/exports/

## Architecture decisions
- Use SQLAlchemy Core (no ORM) for new tables; see ADR-0003
- Background jobs via the existing in-house queue, not Celery

## Mistakes / lessons
- Don't pin pydantic v2.8 in CI; it breaks our migrations
- The "staging" env actually points at prod's read replica

## TODOs
- [ ] Finish auth migration (src/auth/session.py → cookie store)
- [ ] Add CSV export integration test
- [ ] Decide on PDF library (wkhtmltopdf vs weasyprint)

## Gotchas
- The dev DB is reset at 03:00 UTC nightly
- Branch `feature/auth` rebases on main every morning

## Links
- src/auth/middleware.py
- src/exports/csv.py
- docs/adr/0003-sqlalchemy-core.md
- HANDOFF-2026-06-28-1815.md

A HANDOFF that fits on one screen is the goal. A 10k-token HANDOFF defeats the point.
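If you want to automate the naming and keep yourself honest about size, a small helper is enough. This is a minimal sketch: the function names are mine, the 4-characters-per-token estimate is a rule of thumb rather than a real tokenizer, and the 1,000-token budget is a hypothetical stand-in for "one screen."

```python
from datetime import datetime


def handoff_filename(now: datetime) -> str:
    """Build a name in the HANDOFF-yyyy-mm-dd-hhmm.md pattern."""
    return now.strftime("HANDOFF-%Y-%m-%d-%H%M.md")


def rough_token_count(text: str) -> int:
    """Very rough estimate: about 4 characters per token (a rule of thumb,
    not a real tokenizer)."""
    return len(text) // 4


def is_one_screen(text: str, max_tokens: int = 1_000) -> bool:
    """True if the HANDOFF stays under a hypothetical token budget."""
    return rough_token_count(text) <= max_tokens


name = handoff_filename(datetime(2026, 6, 29, 14, 30))
draft = "# Project: report-exporter\n## Goal\nCSV and PDF export service.\n"
print(name, rough_token_count(draft), is_one_screen(draft))
```

Because the timestamp goes from largest unit to smallest, sorting the filenames alphabetically also sorts them chronologically.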

Step 2: Fresh session, paste as first message

Close the current session. Start a new one and paste the HANDOFF file as the first user message:

> Here is the HANDOFF for this project. Treat it as the
> authoritative context. Don't re-read the full repo
> unless you need to confirm a specific detail.
> <paste HANDOFF-2026-06-29-1430.md>
> What's next: <your one-line ask for the day>

travelswithtea (score 1) asked about exactly this step in the thread: “when I start a new session in this same project, do I copy and paste the handoff document into the project directions, or do I just introduce my new Claude Sonnet to it as the first chat?” The answer is the first chat: paste it as the opening message, and the HANDOFF file is the only context the new session needs.
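Building that first message can be scripted. The sketch below assumes the HANDOFF files live in the project root and follow the timestamped naming pattern; `latest_handoff` and `first_message` are hypothetical helper names, and the script only prints text for you to paste. It does not talk to Claude Code.

```python
from pathlib import Path
from typing import Optional


def latest_handoff(project_root: Path) -> Optional[Path]:
    """Newest HANDOFF file, relying on the timestamped names sorting chronologically."""
    files = sorted(project_root.glob("HANDOFF-*.md"))
    return files[-1] if files else None


def first_message(handoff_text: str, ask: str) -> str:
    """Compose the opening message for a fresh session."""
    return (
        "Here is the HANDOFF for this project. Treat it as the "
        "authoritative context. Don't re-read the full repo "
        "unless you need to confirm a specific detail.\n\n"
        f"{handoff_text.strip()}\n\n"
        f"What's next: {ask}"
    )


if __name__ == "__main__":
    path = latest_handoff(Path("."))
    if path is None:
        print("No HANDOFF file found in the current directory.")
    else:
        print(first_message(path.read_text(encoding="utf-8"), "finish the auth migration"))
```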

Step 3: Optional persistent project log

I save HANDOFFs to a project directory (or Google Drive, like travelswithtea does) so I have a per-project timeline and a human-readable record of decisions. The files are greppable, shareable, and a real artifact independent of the model.
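Since the log is just a folder of text files, searching it takes a few lines. This is a minimal sketch assuming all HANDOFFs sit in one directory with the timestamped names; `search_handoffs` is a hypothetical helper, and plain `grep` does the same job.

```python
from pathlib import Path


def search_handoffs(log_dir: Path, keyword: str) -> list[tuple[str, int, str]]:
    """Return (filename, line number, line) for every case-insensitive match,
    oldest file first."""
    hits = []
    for path in sorted(log_dir.glob("HANDOFF-*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if keyword.lower() in line.lower():
                hits.append((path.name, number, line.strip()))
    return hits


if __name__ == "__main__":
    # Example: find every HANDOFF that mentions the Celery decision.
    for name, number, line in search_handoffs(Path("."), "celery"):
        print(f"{name}:{number}: {line}")
```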

Quick check on what the cache is doing

I still run /usage mid-session to see the cost shape:

> /usage
# Look at "context" vs "body" — most of the size is the
# session body, not your CLAUDE.md or system prompt.

If “body” is creeping past half the context window, that is the signal to write a HANDOFF and break for the day.
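That rule of thumb fits in one function. In this sketch the 200k context window is inferred from the thread quote ("100k in context (half of the full context...)"), so treat it as an assumption and pass your own value. The function name and the 0.5 default threshold are mine.

```python
def should_write_handoff(body_tokens: int,
                         context_window: int = 200_000,
                         threshold: float = 0.5) -> bool:
    """True once the session body passes the given share of the context window.

    The default window size is an assumption inferred from the thread quote,
    not a documented constant.
    """
    return body_tokens > context_window * threshold


print(should_write_handoff(95_000))    # the body from my /usage snapshot
print(should_write_handoff(120_000))   # a longer session
```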

The reason

I think the key reasons HANDOFF beats /compact and resuming are:

/compact does not re-prime the prompt cache. It shrinks the in-session context, but the next long reply still has to read the compacted prefix. After a few compactions, you have paid for several full re-reads and lost the details you wanted to keep.
Resuming a long thread re-pays the history. Once the cache window expires, every part of the old conversation has to be re-read. That is the silent cost the biggest_muzzy /usage snapshot shows.
A HANDOFF file is the right shape of context. It is project state, decisions, TODOs, gotchas, and links. The model does not need the previous turn-by-turn reasoning to do useful work; it needs the destination.
HANDOFF is a multiplier on every other optimization. Off-peak hours, model tiering, and prompt caching all assume a workable session boundary. HANDOFF defines that boundary cheaply.

Common mistakes

Treating /compact as the answer. It reduces in-session size but does not re-prime the prompt cache.
Resuming a 4-hour thread after lunch because “it has all the context in it.” The context is there, but re-reading it costs as much as the entire conversation.
Writing HANDOFFs that are too verbose. A 10k-token HANDOFF defeats the point. Keep it to one screen of plain text.
Skipping the “links to relevant files” section. This is the most valuable part of a HANDOFF. File paths let the new session ground itself in the actual code instantly.
Switching models in the middle of a session instead of ending with a HANDOFF and starting a new session on the new model. Mixing model “personalities” mid-task is what loses working memory.

Summary

In this post, I showed why /compact is the wrong tool for long Claude Code sessions and how a HANDOFF file at the end of each session cuts the resume cost from a full history to a few hundred tokens. The key point is that the single highest-leverage habit on a Claude Pro plan is to end long sessions with a HANDOFF file and start the next session on it, instead of relying on /compact or resuming a stale thread.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Reddit Discussion: How do you get the most out of your $20 Pro plan?
👨‍💻 Anthropic prompt caching documentation
👨‍💻 Claude Code /compact and /usage reference

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!