How to Manage Context and Memory in AI Coding Assistants Effectively

Mar 19, 2026

The Problem

My AI coding assistant kept forgetting what I told it.

I was working on a React project with Claude Code. I spent the first 30 minutes explaining the architecture, coding standards, and current task. Everything worked great. Then, two hours later, I asked Claude to add a new component—and it suggested patterns that contradicted what I’d explained earlier.

Same thing happened with Cursor. I’d set up conventions at the start of a session, but by the time I was deep in implementation, the AI started generating code that didn’t match my project’s style.

The problem isn’t that AI tools have poor memory. It’s that context windows are finite, and there’s no visible indicator of when you’re running low.

What Happened

I started investigating. Here’s what I found:

Context fills up silently - No warning when you’re at 70%, 80%, or 90% capacity
Quality degrades gradually - The AI doesn’t suddenly break; it just gets worse at following earlier instructions
Information gets pushed out - Earlier context is replaced by newer messages
No visibility - Most tools don’t show you context usage stats

I found a Reddit thread that nailed the issue:

“We need project level compaction visibility/context stats so we can visibly see when memory/context is becoming a mess and pivot to markdown files for agents to reference.”

This was exactly my problem. I couldn’t see when my context was getting fragmented. I had no idea when to start a fresh session vs. push through.

Why Context Management Matters

AI coding assistants like Claude, Cursor, and GitHub Copilot operate within token limits. Claude models typically offer a 200K-token context window, though limits vary by model and plan. That sounds like a lot, but:

Initial system prompt:     ~5,000 tokens
Your project structure:    ~10,000 tokens
Recent code you shared:    ~30,000 tokens
Conversation history:      ~50,000 tokens
Tool outputs (grep, etc):  ~20,000 tokens
---
Total used:                ~115,000 tokens
Remaining:                 ~85,000 tokens (43%)
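The arithmetic behind that breakdown is easy to reproduce. Here is a minimal sketch that uses the illustrative figures from the table above; the function name and dictionary keys are my own, and real numbers will differ per tool and session:

```python
# Illustrative figures copied from the breakdown above; not measurements.
CONTEXT_WINDOW = 200_000

usage = {
    "system_prompt": 5_000,
    "project_structure": 10_000,
    "shared_code": 30_000,
    "conversation_history": 50_000,
    "tool_outputs": 20_000,
}


def summarize_budget(usage: dict[str, int], window: int) -> dict[str, float]:
    """Return total tokens used, tokens remaining, and percent remaining."""
    used = sum(usage.values())
    remaining = window - used
    return {
        "used": used,
        "remaining": remaining,
        "percent_remaining": round(remaining / window * 100, 1),
    }


budget = summarize_budget(usage, CONTEXT_WINDOW)
print(budget)
# {'used': 115000, 'remaining': 85000, 'percent_remaining': 42.5}
```

The 42.5% here is what the table rounds to 43%.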

As the session continues, conversation history grows and earlier instructions get buried under newer messages and tool output. The AI starts losing track of:

Coding standards you defined at the start
Architecture decisions from an hour ago
The specific file you were working on
Constraints you mentioned in passing

This explains why my AI kept suggesting patterns that contradicted my earlier instructions. The instructions were still technically “in the conversation,” but they were buried under hours of newer messages and tool output, where they had less influence.

The Solution: Treat Context Like RAM

I changed my approach. Instead of treating AI context as unlimited storage, I started treating it like RAM:

Valuable but limited
Needs active management
Requires externalization for stable data

Here’s the workflow I developed:

+-------------------------------------------------------------+
|                    Context Management Flow                   |
+-------------------------------------------------------------+
|                                                             |
|  1. SESSION START                                           |
|     +-> Load .ai/context.md for project context             |
|                                                             |
|  2. WORK PHASE                                              |
|     +-> Monitor context usage                               |
|     +-> Externalize decisions to markdown                   |
|     +-> Keep scope focused                                  |
|                                                             |
|  3. THRESHOLD CHECK (80% used)                              |
|     +-> Option A: Compact current context                    |
|     +-> Option B: Save progress, start new session          |
|                                                             |
|  4. SESSION END                                             |
|     +-> Update .ai/context.md with new decisions            |
|                                                             |
+-------------------------------------------------------------+

Strategy 1: Externalize Reference Material

I created a .ai/ directory in my project for reference files:

project/
+-- .ai/
|   +-- context.md          # Project overview, tech stack
|   +-- decisions.md        # Architecture decisions
|   +-- patterns.md         # Code patterns to follow
|   +-- todo.md            # Current task breakdown
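If you want to set up this layout in a new project, a small script can do it. This is only a sketch: the file names come from the tree above, but the placeholder headings and the function name `scaffold_ai_dir` are assumptions of mine.

```python
from pathlib import Path
import tempfile

# File names taken from the layout above; the stub headings are assumptions.
AI_FILES = {
    "context.md": "## Project Overview\n",
    "decisions.md": "## Architecture Decisions\n",
    "patterns.md": "## Code Patterns\n",
    "todo.md": "## Current Tasks\n",
}


def scaffold_ai_dir(project_root: Path) -> list[Path]:
    """Create .ai/ and any missing reference files; never overwrite existing ones."""
    ai_dir = project_root / ".ai"
    ai_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name, stub in AI_FILES.items():
        path = ai_dir / name
        if not path.exists():
            path.write_text(stub, encoding="utf-8")
            created.append(path)
    return created


# Demo against a throwaway directory so no real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    first_run = [p.name for p in scaffold_ai_dir(Path(tmp))]
    second_run = [p.name for p in scaffold_ai_dir(Path(tmp))]

print(first_run)   # all four files are created on the first run
print(second_run)  # nothing is recreated on the second run
```

Skipping files that already exist means the script is safe to re-run after you have started filling them in.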

Here’s what my context.md looks like:

## Project Overview
- Tech Stack: TypeScript, React, Node.js, PostgreSQL
- Testing: Vitest, Playwright
- Architecture: Monorepo with shared packages

## Current Feature: User Authentication
- OAuth2 with Google/GitHub providers
- Session management via JWT
- Rate limiting: 100 req/min per user

## Coding Standards
- Use functional components with hooks
- Prefer composition over inheritance
- All async functions need error boundaries

## Recent Decisions
- [2026-03-18] Chose Zustand over Redux for simplicity
- [2026-03-17] Adopted Zod for runtime validation

When I start a new session, I reference this file:

Read .ai/context.md for project context, then help me implement the password reset flow.

The AI reads the file fresh each session. Because this information lives in a file rather than only in the conversation, it can’t be lost for good: if it drifts out of focus, I just ask the AI to re-read the file.
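For chat tools that can’t read files on their own, the same idea works by inlining the file into the first message. A minimal sketch, assuming the .ai/context.md layout described above; `build_session_prompt` and the prompt wording are hypothetical, not part of any tool:

```python
from pathlib import Path
import tempfile


def build_session_prompt(project_root: Path, task: str) -> str:
    """Put the contents of .ai/context.md (if present) ahead of the task."""
    context_file = project_root / ".ai" / "context.md"
    if context_file.exists():
        context = context_file.read_text(encoding="utf-8").strip()
    else:
        context = "(no .ai/context.md found)"
    return f"Project context:\n{context}\n\nTask: {task}"


# Demo with a throwaway project directory and a tiny context file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".ai").mkdir()
    (root / ".ai" / "context.md").write_text(
        "## Coding Standards\n- Use functional components with hooks\n",
        encoding="utf-8",
    )
    demo_prompt = build_session_prompt(root, "Implement the password reset flow.")
    missing_prompt = build_session_prompt(root / "elsewhere", "Implement the password reset flow.")

print(demo_prompt)
```

Keeping the context file short matters here, because everything in it is spent from the context budget at the start of every session.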

Strategy 2: Session Hygiene

I implemented these rules:

One focused task per session - Don’t mix refactoring with new feature development
Clear session boundaries - Close and reopen for new tasks
Summarize before context limits - Write progress to markdown before hitting 80%

This last rule was crucial. I needed to know when I was approaching limits.

Strategy 3: Context Monitoring

I built a simple context monitor to visualize my usage:

class ContextMonitor:
    """Track estimated token usage against a fixed context window."""

    def __init__(self, max_tokens: int = 200000):
        self.max_tokens = max_tokens
        self.current_usage = 0

    def add_message(self, tokens: int):
        # Record the tokens for one message, then warn if a threshold is crossed.
        self.current_usage += tokens
        self._check_threshold()

    def alert(self, message: str):
        # Minimal alert hook; swap in logging or a notification as needed.
        print(f"[ContextMonitor] {message}")

    def _percentage(self) -> float:
        # Share of the context window currently in use, as a percentage.
        return (self.current_usage / self.max_tokens) * 100

    def _check_threshold(self):
        # Check the critical threshold first so only one alert fires per message.
        percentage = self._percentage()
        if percentage > 95:
            self.alert("CRITICAL: Context nearly full. Start new session.")
        elif percentage > 80:
            self.alert("Context approaching limit. Consider compaction.")

    def health_status(self) -> dict:
        return {
            "tokens_used": self.current_usage,
            "tokens_remaining": self.max_tokens - self.current_usage,
            "percentage": round(self._percentage(), 1),
            "recommendation": self._get_recommendation()
        }

    def _get_recommendation(self) -> str:
        pct = self._percentage()
        if pct < 50:
            return "Context healthy. Continue working."
        elif pct < 70:
            return "Context moderate. Consider externalizing reference material."
        elif pct < 85:
            return "Context high. Start planning session wrap-up."
        else:
            return "Context critical. Summarize progress and start fresh session."

This isn’t integrated into any tool (yet), but I run it manually to track my sessions. It gives me visibility I didn’t have before.
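The monitor takes token counts as input, so something has to produce them. When no tokenizer is at hand, a rough estimate is enough for spotting trends. The sketch below assumes about four characters per token, which is a common rule of thumb for English text and not an exact figure; the function names are hypothetical:

```python
import math

# Assumption: roughly 4 characters per token for English text.
# This is a heuristic, not a real tokenizer; code and other languages differ.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a message from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def usage_percent(messages: list[str], max_tokens: int = 200_000) -> float:
    """Estimate what share of the context window the messages occupy."""
    used = sum(estimate_tokens(message) for message in messages)
    return round(used / max_tokens * 100, 1)


# Two placeholder messages standing in for a long session.
messages = ["x" * 400_000, "y" * 240_000]
print(usage_percent(messages))  # 100000 + 60000 tokens of 200000 -> 80.0
```

For real tracking, prefer the token counts your tool or API reports, since the heuristic can be off by a wide margin for code.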

Strategy 4: Memory Compaction

When I hit 80% context usage, I use this prompt to compact:

Before we continue, summarize:

1. What have we accomplished in this session?
2. What is the current state of the code?
3. What is pending or incomplete?
4. What critical decisions did we make?

Save this summary to .ai/session-log.md so we can reference it in a fresh session.

This captures the important context in an external file. Then I start a new session with:

Read .ai/session-log.md for context continuity. Continue from where we left off.
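If you would rather write the log yourself than have the AI do it, the summary can be appended with a few lines of code. This is a sketch under my own assumptions: the section titles mirror the four questions in the compaction prompt, and the function names and demo entries are made up for illustration.

```python
from datetime import date
from pathlib import Path
import tempfile

# Section titles mirror the four compaction questions; the wording is my own.
SECTIONS = ["Accomplished", "Current state", "Pending", "Key decisions"]


def format_session_log(summary: dict[str, list[str]], day: date) -> str:
    """Render one session summary as a markdown entry."""
    lines = [f"## Session {day.isoformat()}", ""]
    for section in SECTIONS:
        lines.append(f"### {section}")
        items = summary.get(section) or ["(none)"]
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines)


def append_session_log(project_root: Path, entry: str) -> Path:
    """Append the entry to .ai/session-log.md, creating the file if needed."""
    log_path = project_root / ".ai" / "session-log.md"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry + "\n")
    return log_path


# Demo in a throwaway directory with made-up summary items.
summary = {
    "Accomplished": ["Added password reset endpoint"],
    "Pending": ["Write Playwright test for reset email"],
}
with tempfile.TemporaryDirectory() as tmp:
    entry = format_session_log(summary, date(2026, 3, 19))
    log_text = append_session_log(Path(tmp), entry).read_text(encoding="utf-8")

print(log_text)
```

Appending instead of overwriting keeps a history of sessions, so a fresh session can read only the latest entry or skim older ones.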

How This Changed My Workflow

Before implementing these strategies, my sessions looked like this:

Time  Context  AI Quality
----  -------  ----------
0:00  10%      Excellent - follows all instructions
0:30  40%      Great - occasional drift
1:00  60%      Good - needs reminders
1:30  75%      Degraded - repeats questions, forgets patterns
2:00  85%      Poor - contradicts earlier instructions
2:30  90%+     Frustrating - essentially working with a fresh AI

After implementing context management:

Time  Context  AI Quality
----  -------  ----------
0:00  10%      Excellent - loaded from .ai/context.md
0:30  40%      Excellent - externalized decisions
1:00  60%      Great - continue focused work
1:15  75%      [Session wrap] Summarize to .ai/session-log.md
1:20  15%      [New session] Excellent - loaded from session-log.md

The key difference: I never let context quality degrade. I proactively manage it.

Common Mistakes I Made

Mistake 1: Never restarting sessions

I thought long sessions were more efficient because I didn’t have to re-explain things. But context degradation made the AI less effective. Starting fresh with externalized context is actually faster.

Mistake 2: Relying solely on conversation memory

I’d explain my project architecture once and expect the AI to remember. But that information gets pushed out. External files are reliable; conversation history is volatile.

Mistake 3: Ignoring token usage

I had no idea how much context I was using. Without visibility, I’d hit problems unexpectedly. The context monitor gave me control.

Mistake 4: Copying entire files into chat

This wastes tokens on boilerplate. I learned to:

Reference file paths instead of pasting content
Ask the AI to read files directly (IDE tools can do this)
Show diffs instead of full files when discussing changes
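For the last point, Python’s standard difflib module can produce a compact unified diff when you don’t have git output handy. A minimal sketch; the file path and snippet contents are hypothetical examples:

```python
import difflib


def make_diff(before: str, after: str, path: str) -> str:
    """Return a unified diff between two versions of a file's text."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Hypothetical before/after contents of a small file.
before = "const limit = 50;\nexport default limit;\n"
after = "const limit = 100;\nexport default limit;\n"

patch = make_diff(before, after, "src/rateLimit.ts")
print(patch)
```

On a two-line file the saving is trivial, but for a file of several hundred lines with one changed line, the diff is a small fraction of the tokens.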

Mistake 5: Long, unfocused sessions

Mixing refactoring, bug fixes, and feature development in one session creates context pollution. Each task type has different context needs.

Performance Impact

The cost of poor context management isn’t just frustration. It’s real:

Issue             Impact
-----             ------
Degraded context  Lower code quality, more bugs
Information loss  AI hallucinates or contradicts earlier decisions
Re-explaining     Wasted time repeating context
Large contexts    Higher API costs (token usage)

Efficient context management reduces token usage and improves output quality.

Summary

In this post, I showed how to manage context and memory in AI coding assistants. The key points are:

Context windows are finite - 200K tokens sounds like a lot, but it fills quickly
No visibility is the real problem - You can’t manage what you can’t see
Externalize stable information - Use markdown files for context that shouldn’t change
Monitor your usage - Track when you’re approaching limits
Session hygiene matters - Focused sessions with clear boundaries work better than marathon sessions

The mental model that helped me most: treat AI context like RAM, not hard drive storage. It’s fast but limited. Keep what you need, externalize what you can, and restart before degradation hits.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Reddit: Context Stats for AI Coding Assistants

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!