How to Manage Context and Memory in AI Coding Assistants Effectively
The Problem
My AI coding assistant kept forgetting what I told it.
I was working on a React project with Claude Code. I spent the first 30 minutes explaining the architecture, coding standards, and current task. Everything worked great. Then, two hours later, I asked Claude to add a new component—and it suggested patterns that contradicted what I’d explained earlier.
Same thing happened with Cursor. I’d set up conventions at the start of a session, but by the time I was deep in implementation, the AI started generating code that didn’t match my project’s style.
The problem isn’t that AI tools have poor memory. It’s that context windows are finite, and there’s no visible indicator of when you’re running low.
What Happened
I started investigating. Here’s what I found:
- Context fills up silently - No warning when you’re at 70%, 80%, or 90% capacity
- Quality degrades gradually - The AI doesn’t suddenly break; it just gets worse at following earlier instructions
- Information gets pushed out - Earlier context is replaced by newer messages
- No visibility - Most tools don’t show you context usage stats
I found a Reddit thread that nailed the issue:
“We need project level compaction visibility/context stats so we can visibly see when memory/context is becoming a mess and pivot to markdown files for agents to reference.”
This was exactly my problem. I couldn’t see when my context was getting fragmented. I had no idea when to start a fresh session vs. push through.
Why Context Management Matters
AI coding assistants like Claude, Cursor, and GitHub Copilot operate within token limits. Claude, for example, has a 200K-token context window in its standard configuration. That sounds like a lot, but:

    Initial system prompt:      ~5,000 tokens
    Your project structure:    ~10,000 tokens
    Recent code you shared:    ~30,000 tokens
    Conversation history:      ~50,000 tokens
    Tool outputs (grep, etc):  ~20,000 tokens
    ---
    Total used:               ~115,000 tokens
    Remaining:                 ~85,000 tokens (43%)

As the session continues, conversation history grows. Earlier instructions get buried under newer content. The AI starts losing track of:
- Coding standards you defined at the start
- Architecture decisions from an hour ago
- The specific file you were working on
- Constraints you mentioned in passing
This explains why my AI kept suggesting patterns that contradicted my earlier instructions. The instructions were still technically “in the conversation,” but they were buried under tens of thousands of tokens of newer content, where they carried less weight.
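As a quick sanity check, the budget arithmetic above can be reproduced in a few lines of Python. This is only a sketch: the dictionary keys are labels invented for illustration, and the figures are the approximate ones from the breakdown, not measurements.

```python
# Approximate token budget from the breakdown above.
# The key names are illustrative labels, not part of any tool's API.
CONTEXT_WINDOW = 200_000

budget = {
    "system_prompt": 5_000,
    "project_structure": 10_000,
    "shared_code": 30_000,
    "conversation_history": 50_000,
    "tool_outputs": 20_000,
}

used = sum(budget.values())
remaining = CONTEXT_WINDOW - used
remaining_pct = remaining / CONTEXT_WINDOW * 100

print(f"Used: {used:,} tokens")
print(f"Remaining: {remaining:,} tokens ({remaining_pct:.1f}%)")
```

Only the conversation history and tool output entries keep growing during a session, which is why the remaining share shrinks even when no new files are shared.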
The Solution: Treat Context Like RAM
I changed my approach. Instead of treating AI context as unlimited storage, I started treating it like RAM:
- Valuable but limited
- Needs active management
- Requires externalization for stable data
Here’s the workflow I developed:

    +-------------------------------------------------------------+
    |                   Context Management Flow                   |
    +-------------------------------------------------------------+
    |                                                             |
    |  1. SESSION START                                           |
    |     +-> Load .ai/context.md for project context             |
    |                                                             |
    |  2. WORK PHASE                                              |
    |     +-> Monitor context usage                               |
    |     +-> Externalize decisions to markdown                   |
    |     +-> Keep scope focused                                  |
    |                                                             |
    |  3. THRESHOLD CHECK (80% used)                              |
    |     +-> Option A: Compact current context                   |
    |     +-> Option B: Save progress, start new session          |
    |                                                             |
    |  4. SESSION END                                             |
    |     +-> Update .ai/context.md with new decisions            |
    |                                                             |
    +-------------------------------------------------------------+

Strategy 1: Externalize Reference Material
I created a .ai/ directory in my project for reference files:

    project/
    +-- .ai/
    |   +-- context.md      # Project overview, tech stack
    |   +-- decisions.md    # Architecture decisions
    |   +-- patterns.md     # Code patterns to follow
    |   +-- todo.md         # Current task breakdown

Here’s what my context.md looks like:

    ## Project Overview
    - Tech Stack: TypeScript, React, Node.js, PostgreSQL
    - Testing: Vitest, Playwright
    - Architecture: Monorepo with shared packages

    ## Current Feature: User Authentication
    - OAuth2 with Google/GitHub providers
    - Session management via JWT
    - Rate limiting: 100 req/min per user

    ## Coding Standards
    - Use functional components with hooks
    - Prefer composition over inheritance
    - All async functions need error boundaries

    ## Recent Decisions
    - [2026-03-18] Chose Zustand over Redux for simplicity
    - [2026-03-17] Adopted Zod for runtime validation

When I start a new session, I reference this file:

    Read .ai/context.md for project context, then help me implement the password reset flow.

The AI reads the file fresh each session. Because the information lives in a file rather than only in the conversation, it is never lost for good: the AI can re-read it whenever the conversation drifts.
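If typing that opening prompt by hand gets tedious, it could be generated. Here is a minimal sketch, assuming the `.ai/` layout described above; the function name `build_session_prompt` and the read order of the files are hypothetical choices, not part of any assistant's API.

```python
from pathlib import Path


def build_session_prompt(project_root: str, task: str) -> str:
    """Return an opening prompt that points the assistant at the .ai/ files.

    Only files that actually exist are listed, so the prompt never
    references a missing file.
    """
    ai_dir = Path(project_root) / ".ai"
    # Assumed read order: stable overview first, then decisions, patterns, tasks.
    candidates = ["context.md", "decisions.md", "patterns.md", "todo.md"]
    existing = [name for name in candidates if (ai_dir / name).is_file()]

    if not existing:
        # Nothing externalized yet: fall back to the bare task.
        return task

    file_list = ", ".join(f".ai/{name}" for name in existing)
    return f"Read {file_list} for project context, then {task}"


if __name__ == "__main__":
    print(build_session_prompt(".", "help me implement the password reset flow."))
```

The prompt names the files instead of inlining their contents, so the assistant reads them with its own file tools and nothing has to be pasted into the chat.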
Strategy 2: Session Hygiene
I implemented these rules:
- One focused task per session - Don’t mix refactoring with new feature development
- Clear session boundaries - Close and reopen for new tasks
- Summarize before context limits - Write progress to markdown before hitting 80%
This last rule was crucial. I needed to know when I was approaching limits.
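Writing progress to markdown can be as simple as appending a dated entry to a log file. The following is a rough sketch under stated assumptions: the helper name `append_progress`, the entry layout, and the placeholder bullets are all invented for illustration.

```python
from datetime import date
from pathlib import Path


def append_progress(log_path: str, done: list[str], pending: list[str]) -> str:
    """Append a dated progress entry to a markdown log and return the entry."""
    lines = [f"## Progress {date.today().isoformat()}", "", "### Done"]
    # Fall back to a placeholder bullet when a section is empty.
    lines += [f"- {item}" for item in done] or ["- (nothing yet)"]
    lines += ["", "### Pending"]
    lines += [f"- {item}" for item in pending] or ["- (nothing pending)"]
    entry = "\n".join(lines) + "\n\n"

    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append rather than overwrite so earlier sessions stay in the log.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry
```

A call might look like `append_progress(".ai/session-log.md", ["wired up login form"], ["add tests"])`. Appending keeps a running history, so a fresh session can see what earlier sessions finished.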
Strategy 3: Context Monitoring
I built a simple context monitor to visualize my usage:

    class ContextMonitor:
        """Track estimated token usage against a fixed context window."""

        def __init__(self, max_tokens: int = 200000):
            self.max_tokens = max_tokens
            self.current_usage = 0

        def add_message(self, tokens: int):
            # Record the tokens for one message, then check the thresholds.
            self.current_usage += tokens
            self._check_threshold()

        def alert(self, message: str):
            # Minimal alert channel; swap in logging or a notification as needed.
            print(message)

        def _check_threshold(self):
            percentage = (self.current_usage / self.max_tokens) * 100
            # Check the higher threshold first so only one alert fires.
            if percentage > 95:
                self.alert("CRITICAL: Context nearly full. Start new session.")
            elif percentage > 80:
                self.alert("Context approaching limit. Consider compaction.")

        def health_status(self) -> dict:
            return {
                "tokens_used": self.current_usage,
                "tokens_remaining": self.max_tokens - self.current_usage,
                "percentage": round(self.current_usage / self.max_tokens * 100, 1),
                "recommendation": self._get_recommendation(),
            }

        def _get_recommendation(self) -> str:
            pct = (self.current_usage / self.max_tokens) * 100
            if pct < 50:
                return "Context healthy. Continue working."
            elif pct < 70:
                return "Context moderate. Consider externalizing reference material."
            elif pct < 85:
                return "Context high. Start planning session wrap-up."
            else:
                return "Context critical. Summarize progress and start fresh session."

This isn’t integrated into any tool (yet), but I run it manually to track my sessions. It gives me visibility I didn’t have before.
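The monitor needs a token count for every message, and the post does not say where those numbers come from. One rough option is a character-based estimate. The sketch below assumes about 4 characters per token, a common rule of thumb for English text; real tokenizers differ by model, and code often tokenizes less efficiently. Both function names are hypothetical.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# Treat the result as a ballpark figure, not an exact count.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a piece of text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def usage_percentage(messages: list[str], max_tokens: int = 200_000) -> float:
    """Estimate how much of the context window a list of messages occupies."""
    used = sum(estimate_tokens(message) for message in messages)
    return round(used / max_tokens * 100, 1)
```

The value from `estimate_tokens` could be passed to the monitor’s `add_message` for each prompt, reply, and tool output. If the tool or API you use reports exact token counts, prefer those over the estimate.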
Strategy 4: Memory Compaction
When I hit 80% context usage, I use this prompt to compact:

    Before we continue, summarize:

    1. What have we accomplished in this session?
    2. What is the current state of the code?
    3. What is pending or incomplete?
    4. What critical decisions did we make?

    Save this summary to .ai/session-log.md so we can reference it in a fresh session.

This captures the important context in an external file. Then I start a new session with:

    Read .ai/session-log.md for context continuity. Continue from where we left off.

How This Changed My Workflow
Before implementing these strategies, my sessions looked like this:

    Time   Context   AI Quality
    ----   -------   ----------
    0:00   10%       Excellent - follows all instructions
    0:30   40%       Great - occasional drift
    1:00   60%       Good - needs reminders
    1:30   75%       Degraded - repeats questions, forgets patterns
    2:00   85%       Poor - contradicts earlier instructions
    2:30   90%+      Frustrating - essentially working with fresh AI

After implementing context management:

    Time   Context   AI Quality
    ----   -------   ----------
    0:00   10%       Excellent - loaded from .ai/context.md
    0:30   40%       Excellent - externalized decisions
    1:00   60%       Great - continue focused work
    1:15   75%       [Session wrap] Summarize to .ai/session-log.md
    1:20   15%       [New session] Excellent - loaded from session-log.md

The key difference: I never let context quality degrade. I proactively manage it.
Common Mistakes I Made
Mistake 1: Never restarting sessions
I thought long sessions were more efficient because I didn’t have to re-explain things. But context degradation made the AI less effective. Starting fresh with externalized context is actually faster.
Mistake 2: Relying solely on conversation memory
I’d explain my project architecture once and expect the AI to remember. But that information gets pushed out. External files are reliable; conversation history is volatile.
Mistake 3: Ignoring token usage
I had no idea how much context I was using. Without visibility, I’d hit problems unexpectedly. The context monitor gave me control.
Mistake 4: Copying entire files into chat
This wastes tokens on boilerplate. I learned to:
- Reference file paths instead of pasting content
- Ask the AI to read files directly (IDE-integrated tools can do this)
- Show diffs instead of full files when discussing changes
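For the last point, a unified diff is usually far smaller than the full file. Here is a minimal sketch using Python’s standard `difflib` module; the wrapper name `make_diff` and the `a/` and `b/` path prefixes are illustrative choices.

```python
import difflib


def make_diff(old: str, new: str, filename: str = "file") -> str:
    """Return a unified diff between two versions of a file's text."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
        n=2,  # lines of unchanged context around each change
    )
    return "".join(diff)


if __name__ == "__main__":
    before = "const a = 1;\nconst b = 2;\nconst c = 3;\n"
    after = "const a = 1;\nconst b = 20;\nconst c = 3;\n"
    print(make_diff(before, after, "config.ts"))
```

In a git repository, `git diff` gives the same kind of output without any code. Either way, only the changed lines and a little surrounding context reach the chat.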
Mistake 5: Long, unfocused sessions
Mixing refactoring, bug fixes, and feature development in one session creates context pollution. Each task type has different context needs.
Performance Impact
The cost of poor context management isn’t just frustration. It’s real:
| Issue | Impact |
|---|---|
| Degraded context | Lower code quality, more bugs |
| Information loss | AI hallucinates or contradicts earlier decisions |
| Re-explaining | Wasted time repeating context |
| Large contexts | Higher API costs (token usage) |
Efficient context management reduces token usage and improves output quality.
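To see why large contexts raise API costs, it helps to count input tokens across a whole session. The sketch below rests on simplifying assumptions: every request resends the full conversation so far, every turn adds the same number of tokens, and prompt caching is ignored. The function name and all the numbers are hypothetical, chosen only to show the shape of the growth.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int, base_tokens: int = 0) -> int:
    """Total input tokens over a session where every turn resends the history.

    Turn k sends base_tokens plus the k turns accumulated so far.
    """
    return sum(base_tokens + k * tokens_per_turn for k in range(1, turns + 1))


# One 40-turn marathon session with nothing loaded up front.
one_long = cumulative_input_tokens(turns=40, tokens_per_turn=2_000)

# Two 20-turn sessions that each reload 3,000 tokens of externalized context.
two_short = 2 * cumulative_input_tokens(turns=20, tokens_per_turn=2_000, base_tokens=3_000)

print(f"One long session:   {one_long:,} input tokens")
print(f"Two short sessions: {two_short:,} input tokens")
```

Under these assumptions the single long session sends 1,640,000 input tokens and the two short sessions send 960,000 in total, even though each short session pays to reload the context file on every turn. Actual billing depends on the provider and on caching, so treat this as an illustration of the trend and not as a cost estimate.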
Summary
In this post, I showed how to manage context and memory in AI coding assistants. The key points are:
- Context windows are finite - 200K tokens sounds like a lot, but it fills quickly
- No visibility is the real problem - You can’t manage what you can’t see
- Externalize stable information - Use markdown files for context that shouldn’t change
- Monitor your usage - Track when you’re approaching limits
- Session hygiene matters - Focused sessions with clear boundaries work better than marathon sessions
The mental model that helped me most: treat AI context like RAM, not hard drive storage. It’s fast but limited. Keep what you need, externalize what you can, and restart before degradation hits.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.