
How Many Requirements Should Your Claude Code System Prompt Have? The Science-Backed Answer

Problem

When I first started using Claude Code, I stuffed my CLAUDE.md file with every rule I could think of. Coding standards, git workflow, testing requirements, security guidelines, performance rules - I ended up with 19+ requirements thinking more rules meant better compliance.

But something strange happened. During long coding sessions, Claude would forget critical security rules. It would skip tests. It would make mistakes on things I explicitly told it not to do.

I thought I needed even MORE requirements. I was wrong.

What I Discovered

After reading through academic research and community discussions, I found the answer: At 19 requirements, accuracy is measurably lower than at 5 requirements.

A Reddit user on r/ClaudeAI who analyzed 17 papers on agentic AI workflows put it bluntly:

“At 19 requirements in a system prompt, accuracy is lower than at 5. More instructions isn’t better - it’s measurably worse.”

This isn’t opinion. It’s science.

The Lost in the Middle Phenomenon

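The core problem is what researchers call the “Lost in the Middle” phenomenon. In “Lost in the Middle: How Language Models Use Long Contexts,” Liu et al. (2024) showed that LLMs perform best when the relevant information sits at the beginning or end of the input, and noticeably worse when it sits in the middle of a long context.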

LLM Recall Accuracy by Position
Position in Context     Accuracy Rate
─────────────────────────────────────────
Beginning (primacy)     ~70-80%
Middle positions        ~30% (dead zone)
End (recency)           ~70-80%

When you have 19 requirements, the ones in the middle get pushed into a 30% accuracy “dead zone.” One Reddit user explained:

“Point 6 regarding the ‘Lost in the Middle’ phenomenon (Liu et al., 2024) is exactly why most ‘vibe coding’ sessions fall apart after the first hour. When an agent is 50 tool calls deep, the initial architectural constraints get pushed into that 30% accuracy ‘dead zone.’”

This is why my long sessions failed. My security requirements were sitting in the middle of my CLAUDE.md, getting ignored.

Why This Happens

LLMs are built on the transformer architecture, which relies on attention mechanisms. In practice this produces a positional bias:

  1. Primacy effect - Information at the beginning gets more attention
  2. Recency effect - Information at the end gets more attention
  3. Middle degradation - Information in the middle gets less attention

Attention Distribution Across Context
┌─────────────────────────────────────────────────────────────┐
│                       High Attention                        │
│  ┌─────────┐                                   ┌─────────┐  │
│  │  Start  │                                   │   End   │  │
│  │  ~75%   │        LOW ATTENTION ZONE         │  ~75%   │  │
│  │ recall  │            ~30% recall            │ recall  │  │
│  └─────────┘                                   └─────────┘  │
│                                                             │
│             ▲ Position 5-15 in long context  ▲              │
│             │ Your critical rules live here  │              │
│             └──────── And get ignored ───────┘              │
└─────────────────────────────────────────────────────────────┘

Each additional requirement dilutes the attention given to the others. With 19+ requirements, the model has a much harder time telling which ones matter most.
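To make the dilution idea concrete, here is a toy illustration only. It assumes every rule gets the same raw score and that a fixed attention budget is split through a plain softmax, so each rule's share is simply 1/n. This is not how Claude actually weighs instructions; the function names are made up for this sketch.

```python
import math

def softmax(scores):
    """Standard softmax: turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def share_per_rule(n_rules):
    """Weight each rule gets when all rules score equally (toy assumption)."""
    return softmax([0.0] * n_rules)[0]

for n in (5, 7, 19):
    print(f"{n:>2} rules -> {share_per_rule(n):.1%} of the budget each")
```

Under these assumptions a rule's share drops from 1/5 to 1/19 as the list grows, which is the intuition behind cutting the list rather than adding to it.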

The Solution: 5-7 Core Requirements

Based on the research, here’s what works:

Optimal CLAUDE.md Structure
┌─────────────────────────────────────────────────────────┐
│ 1. Project Context (1-2 sentences) │ ← HIGH RECALL
│ What is this project, key technologies, primary goal │
├─────────────────────────────────────────────────────────┤
│ 2. Critical Constraints (2-3 rules) │ ← HIGH RECALL
│ Must-never-break rules at THE BEGINNING │
├─────────────────────────────────────────────────────────┤
│ 3. Output Standards (1-2 guidelines) │ ← MEDIUM
│ Format requirements, quality standards │
├─────────────────────────────────────────────────────────┤
│ 4. Frequent Commands (optional) │ ← HIGH RECALL
│ Place at THE END for easy access │
└─────────────────────────────────────────────────────────┘
Total: 5-7 requirements

Key positioning rules:

  • Critical constraints at the BEGINNING (primacy effect)
  • Frequently-needed info at the END (recency effect)
  • Avoid putting important rules in the middle
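The positioning rules above can be sketched as a small script that assembles a CLAUDE.md in a fixed order. This is a minimal sketch, not a Claude Code feature: the `kind` labels, the `Section` class, and `render_claude_md` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical priority labels for this sketch: lower numbers are written first.
ORDER = {"context": 0, "critical": 1, "standard": 2, "command": 3}

@dataclass
class Section:
    title: str
    kind: str  # one of the ORDER keys
    body: str

def render_claude_md(sections):
    """Return CLAUDE.md text with context and critical rules first, commands last."""
    ordered = sorted(sections, key=lambda s: ORDER[s.kind])  # sorted() is stable
    return "\n".join(f"# {s.title}\n{s.body}" for s in ordered) + "\n"

sections = [
    Section("Commands", "command", "- npm run test: Run tests"),
    Section("Output Standards", "standard", "Functions <50 lines."),
    Section("NEVER Violate", "critical", "1. No hardcoded secrets - use environment variables"),
    Section("Project", "context", "Flask blog management system with Alpine.js frontend."),
]
text = render_claude_md(sections)
print(text)
```

Whatever order the sections are collected in, the must-never-break rules end up near the top and the commands at the bottom.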

Before and After

Here’s what my overloaded CLAUDE.md looked like:

Bad Example: 19+ Requirements (What I Had)
# Working relationship (7 rules)
# Tooling (4 rules)
# Coding style (11 rules)
# Git workflow (7 rules)
# Testing (6 rules)
# Security (8 rules)
# Performance (5 rules)
...
Result: Critical security rules positioned in middle
get ~30% accuracy during extended sessions.

And here’s the optimized version:

Good Example: 6 Requirements (What I Use Now)
# Project
Flask blog management system with Alpine.js frontend.
# NEVER Violate
1. No hardcoded secrets - use environment variables
2. Immutable patterns - return new objects, never mutate
3. All user inputs validated with zod schemas
# Output Standards
Clean code. Functions <50 lines. Files <800 lines. 80%+ test coverage.
# Commands
- npm run test: Run tests
- npm run build: Production build

Result: All 6 requirements maintain ~70%+ accuracy even during extended sessions.

The Math Behind It

Recall Accuracy Comparison
Requirements Count    Middle Position Recall    Session Stability
─────────────────────────────────────────────────────────────────
5-7 requirements      ~70%+                     Stable
10 requirements       ~50%                      Degrades
15 requirements       ~35-40%                   Unreliable
19+ requirements      ~30%                      Falls apart

The Reddit discussion highlighted this pattern:

“When the CLAUDE.md framework first dropped, a lot of vibe coders thought that filling it with rules/requirements would lead to more compliant outputs, even though that’s not how LLMs work.”

Common Mistakes

I made all of these mistakes:

Mistake            What I Did                                                Why It Failed
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
More = Better      Added 19+ rules thinking more coverage = more compliance  Diluted attention, pushed critical rules into dead zone
No priority order  Put security in middle of file                            Security got ignored during long sessions
Redundancy         Said “no console.log” in 3 different sections             Wasted context window
Context bloat      Included full API docs, 50+ commands                      Pushed requirements into degraded zones
Ignored position   Treated all positions as equal                            Beginning/end have 2x better recall than middle
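The redundancy mistake is easy to check for mechanically. Below is a rough sketch that flags rules repeated across sections. It assumes each rule is a bullet or numbered line under a `#` heading and treats two rules as the same when they match after lowercasing and stripping punctuation; `normalize` and `find_duplicates` are hypothetical helper names.

```python
import re
from collections import defaultdict

LIST_ITEM = re.compile(r"^\s*(?:[-*•]|\d+\.)\s+\S")

def normalize(rule):
    """Strip the list marker and punctuation so near-identical rules compare equal."""
    rule = re.sub(r"^\s*(?:[-*•]|\d+\.)\s*", "", rule)
    rule = re.sub(r"[^a-z0-9 ]+", " ", rule.lower())
    return " ".join(rule.split())

def find_duplicates(markdown):
    """Map each normalized rule to the sections it appears in, keeping only repeats."""
    seen = defaultdict(list)
    section = "(top)"
    for line in markdown.splitlines():
        if line.startswith("#"):
            section = line.lstrip("#").strip()
        elif LIST_ITEM.match(line):
            seen[normalize(line)].append(section)
    return {rule: secs for rule, secs in seen.items() if len(secs) > 1}

sample = """# Coding style
- No console.log
# Git workflow
- no console.log!
# Testing
1. No console.log
- Write tests first
"""
dupes = find_duplicates(sample)
print(dupes)
```

Exact-match comparison will miss reworded duplicates, so treat the result as a starting point for a manual read-through.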

How to Fix Your CLAUDE.md

  1. Audit your current file - Count your actual requirements
  2. Identify critical rules - What must NEVER be violated?
  3. Cut ruthlessly - Remove redundancy, merge similar rules
  4. Reposition strategically - Critical constraints at top, commands at bottom
  5. Test in long sessions - See if compliance improves after 50+ tool calls

Optimal Positioning Strategy
┌────────────────────────────────────────────┐
│ TOP: What must NEVER happen │
│ (Security, critical constraints) │
│ │
│ MIDDLE: Less critical, general guidance │
│ (Style, format preferences) │
│ │
│ BOTTOM: Frequently referenced │
│ (Commands, common patterns) │
└────────────────────────────────────────────┘
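For the audit step, a short script can do the counting. This is a sketch under a simplifying assumption: a "requirement" is any bullet or numbered line under a `#` heading, so prose rules such as the Output Standards line are not counted. The `count_requirements` helper and the `BUDGET` constant are illustrative names, with the budget taken from the 5-7 target in this post.

```python
import re

RULE_LINE = re.compile(r"^\s*(?:[-*•]|\d+\.)\s+\S")
BUDGET = 7  # upper end of the 5-7 range suggested in this post

def count_requirements(markdown):
    """Count list-item lines under each '#' heading."""
    counts = {}
    section = "(top)"
    for line in markdown.splitlines():
        if line.startswith("#"):
            section = line.lstrip("#").strip()
            counts.setdefault(section, 0)
        elif RULE_LINE.match(line):
            counts[section] = counts.get(section, 0) + 1
    return counts

claude_md = """# Project
Flask blog management system with Alpine.js frontend.
# NEVER Violate
1. No hardcoded secrets - use environment variables
2. Immutable patterns - return new objects, never mutate
3. All user inputs validated with zod schemas
# Output Standards
Clean code. Functions <50 lines. Files <800 lines. 80%+ test coverage.
# Commands
- npm run test: Run tests
- npm run build: Production build
"""
counts = count_requirements(claude_md)
total = sum(counts.values())
print(counts, "total:", total, "over budget:", total > BUDGET)
```

To audit a real file, read it with `pathlib.Path("CLAUDE.md").read_text()` and pass the result to `count_requirements`.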

Real-World Impact

This explains why:

  • “Vibe coding” sessions fall apart after the first hour
  • Complex refactoring fails to maintain original constraints
  • Security guidelines added early get forgotten mid-session
  • Code style rules apply inconsistently across large changes

As tool calls and their output pile up in the context, a requirement that started at position 5 can effectively end up at position 30+ and enter the dead zone.

Summary

In this post, I explained why Claude Code system prompts should have 5-7 requirements, not 19+. The key point is the “Lost in the Middle” phenomenon - LLMs have ~30% recall for information in the middle of long contexts, compared to ~70%+ for beginning and end positions. Place critical constraints at the top and frequently-needed commands at the bottom. Audit your CLAUDE.md today and cut it down to essentials.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
