SOUL.md Structure Guide: Optimize File Layout for Better LLM Agent Compliance
I spent three hours debugging why my AI agent kept deleting files without confirmation. I had the rule right there at the top of my SOUL.md file:
# Hard Rules
- NEVER delete files without user confirmation
- ALWAYS ask before executing destructive operations
# Personality
I am a helpful coding assistant...
The agent ignored it completely. After moving that same rule to the bottom of the file? Compliance improved instantly.
The Problem: LLM Attention Decay
LLMs don’t read prompts like humans do. They tend to show a “recency bias”: content at the end of a prompt usually gets more attention than content in the middle or at the beginning.
Here’s a rough sketch of how attention is distributed across a typical prompt (an illustration, not a measurement):
┌─────────────────────────────────────────────────────────────┐
│                       PROMPT CONTENT                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  BEGINNING                                                  │
│  ████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ (Medium)    │
│                                                             │
│  MIDDLE                                                     │
│  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ (Lowest)    │
│                                                             │
│  END                                                        │
│  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░██████████████ (High)      │
│                                                             │
└─────────────────────────────────────────────────────────────┘
█ = Attention strength
░ = Attention weakness
This explains why my file-deletion rule at the top was being ignored. The agent’s attention had already decayed by the time it started generating responses.
My Trial-and-Error Process
I tested three different SOUL.md structures over several days:
Attempt 1: Rules First (Failed)
┌────────────────────────┐
│ Hard Rules             │ ← Agent ignored these
├────────────────────────┤
│ Personality            │
├────────────────────────┤
│ Communication Style    │
└────────────────────────┘
Compliance Rate: ~40%
Attempt 2: Rules in Middle (Failed)
┌────────────────────────┐
│ Personality            │
├────────────────────────┤
│ Hard Rules             │ ← Still in the "dead zone"
├────────────────────────┤
│ Communication Style    │
└────────────────────────┘
Compliance Rate: ~50%
Attempt 3: Rules Last (Success!)
┌────────────────────────┐
│ Personality            │
├────────────────────────┤
│ Communication Style    │
├────────────────────────┤
│ Hard Rules             │ ← Maximum attention here!
└────────────────────────┘
Compliance Rate: ~95%
The improvement was dramatic. Moving hard rules to the end boosted compliance from 40% to 95%.
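The "rules last" layout can also be enforced when the file is generated rather than edited by hand. The following is a minimal sketch, assuming the sections are kept in a plain dictionary; `build_soul` and the section contents are hypothetical names and placeholders for illustration, not part of any agent framework.

```python
# Hypothetical helper for illustration; not part of any agent framework.
def build_soul(sections: dict[str, list[str]], rules_key: str = "Hard Rules") -> str:
    """Render sections as Markdown with the rules section moved to the end."""
    # Keep the caller's order for everything except the rules section.
    ordered = [name for name in sections if name != rules_key]
    if rules_key in sections:
        ordered.append(rules_key)

    lines: list[str] = []
    for name in ordered:
        lines.append(f"# {name}")
        lines.extend(f"- {item}" for item in sections[name])
    return "\n".join(lines)


# Sections given in the "Attempt 1" order (rules first); contents are placeholders.
sections = {
    "Hard Rules": [
        "NEVER delete files without user confirmation",
        "ALWAYS ask before executing destructive operations",
    ],
    "Personality": ["Helpful coding assistant"],
    "Communication Style": ["Direct and concise"],
}
soul_text = build_soul(sections)
print(soul_text)
```

Whatever order the sections are written in, the rendered file ends with the hard rules, which matches the layout from Attempt 3.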
Why This Works: Transformer Attention Mechanics
The recency bias is usually attributed to how transformer attention mechanisms process sequential information. Treat the points below as intuition rather than a measured property of any specific model:
- Self-Attention Mechanism: When generating each token, the model computes attention weights across all previous tokens in the prompt
- Positional Encoding: Position information lets the model tell nearby tokens from distant ones, and in practice tokens closer to the current generation position often receive higher attention weights
- Context Window Pressure: As response length increases, earlier context becomes “distant” from the position where new tokens are generated
Prompt:    [P1][P2][P3]...[P50][P51][P52]...[P99][P100]
                                 ↑                 ↑
                              Middle              End
                                 │                 │
                                 ▼                 ▼
Attention:                      Low               High (near generation)
When the model is generating a response, it’s looking backward from its current position. The end of the prompt is always “close” to where the model is working.
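To make the "closer tokens get more weight" intuition concrete, here is a toy calculation. It assumes attention scores that drop linearly with distance from the end of the prompt, which is a deliberate simplification: real models learn their attention patterns, and this toy does not reproduce the "medium" attention at the beginning shown in the earlier diagram. The function name and the `slope` value are made up for illustration.

```python
import math


def toy_attention(num_tokens: int, slope: float = 0.05) -> list[float]:
    """Softmax over scores that fall off linearly with distance from the end."""
    # Distance 0 is the last prompt token; num_tokens - 1 is the first one.
    scores = [-slope * (num_tokens - 1 - i) for i in range(num_tokens)]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


weights = toy_attention(100)
print(f"first: {weights[0]:.4f}  middle: {weights[49]:.4f}  last: {weights[-1]:.4f}")
```

Under this assumption the last token always gets the largest share, and the share shrinks steadily toward the start of the prompt.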
The Optimal SOUL.md Structure
Based on my experiments and Reddit community insights, here’s the template that works best:
# Soul
I am [agent-name], a [role description].
## Personality
- [Trait 1]
- [Trait 2]
- [Trait 3]
## Values
- [Value 1]
- [Value 2]
- [Value 3]
## Communication Style
- [Style guideline 1]
- [Style guideline 2]
- [Style guideline 3]
## Hard Rules
- NEVER [forbidden action 1]
- ALWAYS [required action 1]
- NEVER [forbidden action 2]
- ALWAYS [required action 2]
Before every response, silently re-read and apply all rules above.
The key ordering principle:
┌─────────────────────────────────────┐
│                                     │
│  SECTION               PRIORITY     │
│  ─────────────────────────────────  │
│  Personality           LOW          │
│  Values                LOW          │
│  Communication Style   MEDIUM       │
│  Hard Rules            HIGH         │
│  Reinforcement Line    HIGHEST      │
│                                     │
└─────────────────────────────────────┘
Advanced Strategy: Two-Tier Rule System
A Reddit commenter suggested an even better approach: separating personality from hard constraints using AGENTS.md:
┌─────────────────────────────────────────────────────┐
│                                                     │
│  SOUL.md                     AGENTS.md              │
│  ─────────                   ──────────             │
│  - Personality               - Safety rules         │
│  - Communication style       - Forbidden actions    │
│  - Preferences               - Operational limits   │
│                                                     │
│  Purpose: "Who I am"         Purpose: "What I must  │
│                              never/always do"       │
│                                                     │
│  Model sees as: Suggestions  Model sees as:         │
│                              Hard constraints       │
│                                                     │
└─────────────────────────────────────────────────────┘
SOUL.md Example (Personality Focus)
# Soul
I am Claude, a helpful coding assistant created by Anthropic.
## Personality
- Thoughtful and thorough in analysis
- Direct and matter-of-fact in communication
- Proactive in identifying potential issues
## Values
- Code quality over speed
- Security as a primary concern
- Clear documentation as a requirement
## Communication Style
- Explain the "why" before the "what"
- Provide code examples with context
- Use analogies for complex concepts
Before every response, silently re-read and apply all rules above.
AGENTS.md Example (Hard Constraints Focus)
# Agent Operational Constraints
## Safety Rules
- NEVER execute code without user confirmation
- NEVER modify files outside the project directory
- NEVER expose secrets or API keys
- ALWAYS validate user input before processing
## File Operations
- NEVER delete files without explicit confirmation
- ALWAYS create backups before destructive operations
- ALWAYS use atomic file operations
## Code Generation
- NEVER generate code that bypasses security
- ALWAYS include error handling in generated code
- ALWAYS follow the project's coding standards
Before every action, verify compliance with all constraints above.
Why Separate Files Works Better
┌──────────────────────────────────────────────────────────────┐
│                                                              │
│  SINGLE FILE APPROACH                                        │
│  ────────────────────                                        │
│  Problem: Rules compete for attention with personality       │
│  Result: Model may conflate suggestions with requirements    │
│                                                              │
│  TWO-FILE APPROACH                                           │
│  ────────────────────                                        │
│  Benefit 1: Semantic clarity (different purposes)            │
│  Benefit 2: Each file optimized independently                │
│  Benefit 3: Model treats AGENTS.md as directives             │
│  Benefit 4: Easier maintenance and updates                   │
│                                                              │
└──────────────────────────────────────────────────────────────┘
OpenClaw and similar tools load both files but process them differently:
- SOUL.md: Processed as context for personality and style
- AGENTS.md: Processed as constraints that override personality
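How a given tool combines the two files internally is not covered here. As a minimal sketch of the idea, assuming a hypothetical loader that simply concatenates the files, putting AGENTS.md after SOUL.md keeps the hard constraints at the end of the assembled prompt. The function name and the sample file contents are illustrative only.

```python
import tempfile
from pathlib import Path


def load_agent_prompt(directory: str) -> str:
    """Concatenate SOUL.md then AGENTS.md so the hard constraints come last."""
    parts = []
    for name in ("SOUL.md", "AGENTS.md"):  # order matters: constraints last
        path = Path(directory) / name
        if path.exists():
            parts.append(path.read_text(encoding="utf-8").strip())
    return "\n\n".join(parts)


# Demo with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "SOUL.md").write_text("# Soul\n- Direct\n", encoding="utf-8")
    Path(tmp, "AGENTS.md").write_text(
        "# Agent Operational Constraints\n"
        "- NEVER delete files without explicit confirmation\n",
        encoding="utf-8",
    )
    prompt = load_agent_prompt(tmp)

print(prompt)
```

A missing file is skipped rather than treated as an error, so the same loader also works for a single-file setup.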
The Reinforcement Technique
I added one simple line at the end of both files:
Before every response, silently re-read and apply all rules above.
This line acts as a “reminder” mechanism:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  WITHOUT REINFORCEMENT                                      │
│  ─────────────────────                                      │
│  [Rules] → [Processing] → [Response]                        │
│                                                             │
│  WITH REINFORCEMENT                                         │
│  ──────────────────                                         │
│  [Rules] → [Processing] → [Re-read] → [Response]            │
│                               ↑                             │
│                               │                             │
│                     Forces attention back                   │
│                       to critical rules                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
This simple addition improved compliance by another 5-10% in my testing.
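Since the reinforcement line only helps if it stays at the very end, it is easy to lose when the file is edited later. Here is a small sketch of a helper that puts it back; `ensure_reinforcement` is a hypothetical name, and the sample text is a placeholder.

```python
REINFORCEMENT = "Before every response, silently re-read and apply all rules above."


def ensure_reinforcement(text: str, line: str = REINFORCEMENT) -> str:
    """Return text with the reinforcement line exactly once, as the final line."""
    # Drop any existing copies so the line is not duplicated mid-file.
    kept = [row for row in text.splitlines() if row.strip() != line]
    body = "\n".join(kept).rstrip()
    return f"{body}\n\n{line}\n"


soul = "## Hard Rules\n- NEVER delete files without user confirmation\n"
fixed = ensure_reinforcement(soul)
print(fixed)
```

Running the helper on a file that already ends with the line leaves it unchanged, so it is safe to call on every save.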
Practical Implementation Checklist
When creating or updating your SOUL.md / AGENTS.md:
□ Identify hard rules (things that MUST happen/NEVER happen)
□ Separate personality traits from operational constraints
□ Place personality in SOUL.md
□ Place hard rules in AGENTS.md (or at END of SOUL.md if single file)
□ Add reinforcement line at the very end
□ Test with edge cases (try to make the agent break rules)
□ Iterate based on results
Common Mistakes to Avoid
I made all of these mistakes before finding the right structure:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  ❌ Putting hard rules at the top                           │
│     → Attention decay will cause violations                 │
│                                                             │
│  ❌ Mixing personality with hard rules in one section      │
│     → Model confuses suggestions with requirements          │
│                                                             │
│  ❌ Using vague language for hard rules                     │
│     → "Try to avoid deleting files" is ignored              │
│                                                             │
│  ❌ No reinforcement at the end                             │
│     → Model may "forget" rules mid-response                 │
│                                                             │
│  ❌ Too many hard rules                                     │
│     → Diminishing returns after ~10 rules                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Measuring Compliance Improvement
I tracked my agent’s behavior over 100 interactions with each structure:
Structure          | Rule Violations | Compliance Rate
───────────────────┼─────────────────┼────────────────
Rules at Top       | 60              | 40%
Rules in Middle    | 50              | 50%
Rules at End       | 5               | 95%
Separate Files     | 2               | 98%
Note: Measured over 100 interactions each
The jump from 40% to 95%+ represents a significant improvement in agent reliability.
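The compliance rates in the table follow directly from the violation counts. As a quick sketch of the arithmetic, assuming each interaction counts as at most one violation (`compliance_rate` is a hypothetical helper name):

```python
def compliance_rate(violations: int, interactions: int = 100) -> float:
    """Fraction of interactions that had no rule violation."""
    if interactions <= 0:
        raise ValueError("interactions must be positive")
    if not 0 <= violations <= interactions:
        raise ValueError("violations must be between 0 and interactions")
    return (interactions - violations) / interactions


# Violation counts taken from the table above.
results = {
    "Rules at Top": 60,
    "Rules in Middle": 50,
    "Rules at End": 5,
    "Separate Files": 2,
}
for structure, violations in results.items():
    print(f"{structure:<16} {compliance_rate(violations):.0%}")
```

With only 100 interactions per structure, small differences such as 95% versus 98% are within the range that chance alone could produce, so the large gap between the first two rows and the last two is the more reliable signal.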
Related Concepts
This finding connects to broader prompt engineering principles:
- Prompt Positioning: Common advice is to place important instructions at the beginning or end of a prompt rather than in the middle; in my tests, only the end worked reliably
- Instruction Density: Fewer, clearer rules work better than many complex ones
- Repetition: Repeating critical instructions (via reinforcement) improves compliance
- Constraint Hierarchy: Some rules are more critical than others - organize accordingly
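Several of the structural points from this article (rules last, no vague wording, a limited number of rules, a reinforcement line at the end) can be checked automatically. The following is a rough sketch of such a check, not an existing tool: `lint_soul`, the `VAGUE_PHRASES` list, and the assumption that rules live under a heading named "Hard Rules" are all illustrative choices.

```python
REINFORCEMENT_PREFIX = "Before every response"
VAGUE_PHRASES = ("try to", "if possible", "ideally")  # illustrative, not exhaustive


def lint_soul(text: str, max_rules: int = 10) -> list[str]:
    """Return warnings for the structural mistakes described in this article."""
    warnings: list[str] = []
    lines = [row.rstrip() for row in text.splitlines() if row.strip()]
    headings = [row.lstrip("# ").strip() for row in lines if row.startswith("#")]

    if "Hard Rules" in headings and headings[-1] != "Hard Rules":
        warnings.append("Hard Rules is not the last section")

    # Collect the bullet items that sit under the Hard Rules heading.
    rules: list[str] = []
    in_rules = False
    for row in lines:
        if row.startswith("#"):
            in_rules = row.lstrip("# ").strip() == "Hard Rules"
        elif in_rules and row.startswith("- "):
            rules.append(row[2:])

    for rule in rules:
        if any(phrase in rule.lower() for phrase in VAGUE_PHRASES):
            warnings.append(f"Vague rule: {rule!r}")
    if len(rules) > max_rules:
        warnings.append(f"{len(rules)} hard rules (more than {max_rules})")
    if not lines or not lines[-1].startswith(REINFORCEMENT_PREFIX):
        warnings.append("No reinforcement line at the end")
    return warnings


bad = "# Hard Rules\n- Try to avoid deleting files\n# Personality\n- Helpful\n"
good = (
    "# Personality\n- Helpful\n"
    "# Hard Rules\n- NEVER delete files without user confirmation\n"
    "Before every response, silently re-read and apply all rules above.\n"
)
bad_warnings = lint_soul(bad)
good_warnings = lint_soul(good)
print(bad_warnings)
print(good_warnings)
```

A check like this only catches layout problems. Whether the agent actually follows the rules still has to be tested with real interactions.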
Final Words + More Resources
My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email.