SOUL.md Rules Ignored? Fix Context Drift in Long Chats

Mar 30, 2026

“Absolutely! I’d be happy to help you with that…”

Wait. I explicitly told my SOUL.md to never use “absolutely” or “I’d be happy to.” It worked for the first 15 messages. What happened?

The Problem

I set up my SOUL.md file with clear rules:

## Response Rules
- Maximum 3 sentences unless code is requested
- No "absolutely", "certainly", "I'd be happy to"
- Direct answers only - no preamble

Testing it worked great. First 10 messages? Perfect. Short, direct, no filler.

But then I noticed something odd. Around message 20, “absolutely” started creeping back. By message 40, responses were getting longer. My rules were being… ignored?

What I Tried First (And Failed)

Attempt 1: Make SOUL.md More Explicit

## Response Rules
- Maximum 3 sentences unless code is requested
- No "absolutely", "certainly", "I'd be happy to"
- Direct answers only - no preamble
- REALLY MEAN IT - NO FILLER PHRASES

Result: Same drift.

Attempt 2: Add More Rules

I added sections for tone, style, formatting. More detail. More specificity.

Result: Drifted even faster. The file was now bigger, taking up more tokens, but somehow less effective.

The Realization

I found a discussion on Reddit’s r/OpenClaw community that explained exactly what I was experiencing:

“SOUL.md loaded once at session start, gets drowned out by thousands of tokens of recent conversation”

And then the technical explanation:

“Model pays more attention to last 15 messages than system prompt from beginning”

Oh. It’s not that my rules were bad. It’s that the model’s attention shifted.

How LLM Attention Actually Works

LLMs don’t treat all context equally. The attention mechanism gives disproportionate weight to recent tokens.

Here’s what happens as your session grows:

Session Start (5 messages)
┌─────────────────────────────────────────┐
│  SOUL.md ████████ 40%                   │
│  Recent msgs ████ 20%                   │
│  Other context ████████ 40%             │
└─────────────────────────────────────────┘

Session Middle (25 messages)
┌─────────────────────────────────────────┐
│  SOUL.md ████ 10%                       │
│  Recent msgs ████████████████████ 50%   │
│  Other context ████████████ 40%         │
└─────────────────────────────────────────┘

Long Session (50+ messages)
┌─────────────────────────────────────────┐
│  SOUL.md █ 3%                           │
│  Recent msgs ████████████████████████ 70%│
│  Other context ████████████████ 27%     │
└─────────────────────────────────────────┘

This isn’t a bug. It’s how transformer attention works. The model patterns behavior on what it sees most recently.

The Analogy

Think of SOUL.md like a job description handed out on day one. By week three, nobody’s consulting that document - they’re doing what “feels right” based on recent patterns and habits.

The same applies to AI sessions. Your rules aren’t “forgotten” - they’re just… drowned out.

What Actually Works

Solution 1: Keep SOUL.md Small

I cut my SOUL.md from 300 lines to under 100:

# Core Identity
You are a concise, direct assistant.

## Response Rules
- Max 3 sentences
- No filler phrases (absolutely, certainly, happy to)
- Direct answers only

## See Also
- ./rules/style-enforcement.md
- ./rules/technical-detail.md

The modular approach means less token bloat in the main file.

Solution 2: Session Resets

I started resetting every 20-25 messages for critical work:

┌──────────────────────────────────────────────────────────┐
│  Session 1: Messages 1-25                                 │
│  ├── Rules strong (40% attention at start)               │
│  ├── Rules fading (10% by message 20)                   │
│  └── Reset before drift becomes problematic             │
│                                                          │
│  Session 2: Fresh start                                  │
│  ├── SOUL.md re-loaded at full attention                 │
│  └── Carry forward only essential context               │
└──────────────────────────────────────────────────────────┘

Solution 3: Hook-Based Reinforcement (For Developers)

If you’re building on top of LLMs, inject reminders mid-session:

Every 10 messages:
  ┌─────────────────────────────────────┐
  │  Inject system reminder:            │
  │  "Remember: concise, no filler"     │
  │  (Not buried at start, but active)  │
  └─────────────────────────────────────┘

This isn’t something end-users can do with SOUL.md directly, but it’s how platform developers should think about rule persistence.

Common Mistakes I Made

Assumed SOUL.md is “always active” - It’s only as active as the model’s attention to it
Tested only short sessions - 5-message tests won’t reveal drift problems
Added more rules instead of fewer - More tokens = more to ignore
No reinforcement strategy - Expected one-time loading to persist forever

Comparison of Solutions

┌────────────────────────┬──────────────┬───────────────┐
│ Approach               │ Complexity   │ Effectiveness │
├────────────────────────┼──────────────┼───────────────┤
│ Session resets         │ Low          │ Medium        │
│ Hook-based injection   │ Medium       │ High          │
│ Modular file structure │ Low          │ Medium-High   │
│ Shorter SOUL.md        │ Low          │ Medium        │
│ Compaction strategies  │ High         │ High          │
└────────────────────────┴──────────────┴───────────────┘

The Takeaway

SOUL.md drift isn’t a bug - it’s an inherent property of how LLMs allocate attention across long contexts.

The fix isn’t to fight the architecture. It’s to work with it:

Keep SOUL.md under 150 lines
Split into modules for topic-specific rules
Reset sessions every 20-25 messages for critical work
Test in realistic conditions (50+ message sessions)

Next step: Open your SOUL.md right now. Is it one monolithic file? Split it. Are you only testing short conversations? Run a 50-message test and watch the drift happen in real-time.

Context Window Management - The same attention drift affects all long-context scenarios, not just SOUL.md
Prompt Engineering - Understanding attention helps explain why “say X” works better at the end of a prompt than the beginning
RAG Systems - Retrieved context placed at the end of prompts for similar reasons

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!