How Many Requirements Should Your Claude Code System Prompt Have? The Science-Backed Answer
Problem
When I first started using Claude Code, I stuffed my CLAUDE.md file with every rule I could think of. Coding standards, git workflow, testing requirements, security guidelines, performance rules - I ended up with 19+ requirements thinking more rules meant better compliance.
But something strange happened. During long coding sessions, Claude would forget critical security rules. It would skip tests. It would make mistakes on things I explicitly told it not to do.
I thought I needed even MORE requirements. I was wrong.
What I Discovered
After reading through academic research and community discussions, I found the answer: At 19 requirements, accuracy is measurably lower than at 5 requirements.
A Reddit user on r/ClaudeAI who analyzed 17 papers on agentic AI workflows put it bluntly:
“At 19 requirements in a system prompt, accuracy is lower than at 5. More instructions isn’t better - it’s measurably worse.”
This isn’t opinion. It’s science.
The Lost in the Middle Phenomenon
The core problem is what researchers call the “Lost in the Middle” phenomenon. Liu et al. (2024) published a paper showing that LLMs struggle to recall and apply information positioned in the middle of long contexts.
Position in Context Accuracy Rate─────────────────────────────────────────Beginning (primacy) ~70-80%Middle positions ~30% (dead zone)End (recency) ~70-80%When you have 19 requirements, the ones in the middle get pushed into a 30% accuracy “dead zone.” One Reddit user explained:
“Point 6 regarding the ‘Lost in the Middle’ phenomenon (Liu et al., 2024) is exactly why most ‘vibe coding’ sessions fall apart after the first hour. When an agent is 50 tool calls deep, the initial architectural constraints get pushed into that 30% accuracy ‘dead zone.’”
This is why my long sessions failed. My security requirements were sitting in the middle of my CLAUDE.md, getting ignored.
Why This Happens
LLMs use transformer architecture with attention mechanisms. This creates natural bias:
- Primacy effect - Information at the beginning gets more attention
- Recency effect - Information at the end gets more attention
- Middle degradation - Information in the middle gets less attention
┌─────────────────────────────────────────────────────────────┐│ High Attention ││ ┌─────────┐ ┌─────────┐ ││ │ Start │ │ End │ ││ │ ~75% │ LOW ATTENTION ZONE │ ~75% │ ││ │ recall │ ~30% recall │ recall │ ││ └─────────┘ └─────────┘ ││ ││ ▲ Position 5-15 in long context ▲ ││ │ Your critical rules live here │ ││ └──────── And get ignored ───────┘ │└─────────────────────────────────────────────────────────────┘Each additional requirement dilutes attention given to previous ones. With 19+ requirements, the model cannot determine which matter most.
The Solution: 5-7 Core Requirements
Based on the research, here’s what works:
┌─────────────────────────────────────────────────────────┐│ 1. Project Context (1-2 sentences) │ ← HIGH RECALL│ What is this project, key technologies, primary goal │├─────────────────────────────────────────────────────────┤│ 2. Critical Constraints (2-3 rules) │ ← HIGH RECALL│ Must-never-break rules at THE BEGINNING │├─────────────────────────────────────────────────────────┤│ 3. Output Standards (1-2 guidelines) │ ← MEDIUM│ Format requirements, quality standards │├─────────────────────────────────────────────────────────┤│ 4. Frequent Commands (optional) │ ← HIGH RECALL│ Place at THE END for easy access │└─────────────────────────────────────────────────────────┘ Total: 5-7 requirementsKey positioning rules:
- Critical constraints at the BEGINNING (primacy effect)
- Frequently-needed info at the END (recency effect)
- Avoid putting important rules in the middle
Before and After
Here’s what my overloaded CLAUDE.md looked like:
# Working relationship (7 rules)# Tooling (4 rules)# Coding style (11 rules)# Git workflow (7 rules)# Testing (6 rules)# Security (8 rules)# Performance (5 rules)...Result: Critical security rules positioned in middle get ~30% accuracy during extended sessions.And here’s the optimized version:
# ProjectFlask blog management system with Alpine.js frontend.
# NEVER Violate1. No hardcoded secrets - use environment variables2. Immutable patterns - return new objects, never mutate3. All user inputs validated with zod schemas
# Output StandardsClean code. Functions <50 lines. Files <800 lines. 80%+ test coverage.
# Commands- npm run test: Run tests- npm run build: Production buildResult: All 6 requirements maintain ~70%+ accuracy even during extended sessions.
The Math Behind It
Requirements Count Middle Position Recall Session Stability─────────────────────────────────────────────────────────────────5-7 requirements ~70%+ Stable10 requirements ~50% Degrades15 requirements ~35-40% Unreliable19+ requirements ~30% Falls apartThe Reddit discussion highlighted this pattern:
“When the CLAUDE.md framework first dropped, a lot of vibe coders thought that filling it with rules/requirements would lead to more compliant outputs, even though that’s not how LLMs work.”
Common Mistakes
I made all of these mistakes:
| Mistake | What I Did | Why It Failed |
|---|---|---|
| More = Better | Added 19+ rules thinking more coverage = more compliance | Diluted attention, pushed critical rules into dead zone |
| No priority order | Put security in middle of file | Security got ignored during long sessions |
| Redundancy | Said “no console.log” in 3 different sections | Wasted context window |
| Context bloat | Included full API docs, 50+ commands | Pushed requirements into degraded zones |
| Ignored position | Treated all positions as equal | Beginning/end have 2x better recall than middle |
How to Fix Your CLAUDE.md
- Audit your current file - Count your actual requirements
- Identify critical rules - What must NEVER be violated?
- Cut ruthlessly - Remove redundancy, merge similar rules
- Reposition strategically - Critical constraints at top, commands at bottom
- Test in long sessions - See if compliance improves after 50+ tool calls
┌────────────────────────────────────────────┐│ TOP: What must NEVER happen ││ (Security, critical constraints) ││ ││ MIDDLE: Less critical, general guidance ││ (Style, format preferences) ││ ││ BOTTOM: Frequently referenced ││ (Commands, common patterns) │└────────────────────────────────────────────┘Real-World Impact
This explains why:
- “Vibe coding” sessions fall apart after the first hour
- Complex refactoring fails to maintain original constraints
- Security guidelines added early get forgotten mid-session
- Code style rules apply inconsistently across large changes
A requirement at position 5 initially can end up at position 30+ after extended operations - entering the dead zone.
Summary
In this post, I explained why Claude Code system prompts should have 5-7 requirements, not 19+. The key point is the “Lost in the Middle” phenomenon - LLMs have ~30% recall for information in the middle of long contexts, compared to ~70%+ for beginning and end positions. Place critical constraints at the top and frequently-needed commands at the bottom. Audit your CLAUDE.md today and cut it down to essentials.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 Lost in the Middle: How Language Models Use Long Contexts (Liu et al., 2024)
- 👨💻 Claude Code Best Practices
- 👨💻 Anthropic Prompt Engineering Guide
- 👨💻 Reddit Discussion: I read 17 papers on agentic AI workflows
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments