Skip to content

What's the Difference Between OpenClaw Bootstrap Context and Semantic Search Memory?

I spent a frustrating afternoon watching my AI agent forget who it was. Every new conversation started from scratch. My carefully crafted instructions? Gone. My accumulated knowledge? Scattered across dozens of disconnected chat sessions.

Then I discovered OpenClaw’s memory architecture. But understanding it wasn’t straightforward—I kept confusing bootstrap context with semantic search, applying the wrong tool to the wrong problem. Here’s what I learned through trial and error.

The Problem: Context Management at Scale

When building AI agents that persist across sessions, you quickly run into a fundamental tension:

┌─────────────────────────────────────────────────────────┐
│ LLM Context Window │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Identity │ │ Rules │ │ Current │ │
│ │ & Rules │ │ & Tone │ │ Task │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ You can't fit EVERYTHING in here. So what stays? │
│ What gets retrieved? And when? │
└─────────────────────────────────────────────────────────┘

The naive approach dumps everything into the context. That works until you hit token limits. The sophisticated approach uses retrieval—but then you face a new problem: what if the retrieval misses something critical?

OpenClaw solves this with a two-level memory system. But I initially misunderstood how they complement each other.

Level 1: Bootstrap Context (What the Agent ALWAYS Sees)

My first attempt at managing memory was to throw everything into MEMORY.md and hope semantic search would find it. This backfired spectacularly when my agent started ignoring its core personality instructions.

Here’s what bootstrap context actually does:

┌─────────────────────────────────────────────────────────┐
│ Bootstrap Context Injection │
│ │
│ Every LLM Request Gets: │
│ ┌─────────────────────────────────────────────────┐ │
│ │ 1. AGENTS.md → Agent behavior rules │ │
│ │ 2. SOUL.md → Core identity & values │ │
│ │ 3. USER.md → User preferences & context │ │
│ │ 4. IDENTITY.md → Role definitions │ │
│ │ 5. Daily Log → Today's specific context │ │
│ └─────────────────────────────────────────────────┘ │
│ │
│ This happens BEFORE your message reaches the LLM │
└─────────────────────────────────────────────────────────┘

I discovered this the hard way. I had written extensive rules about tone and behavior in MEMORY.md, expecting semantic search to retrieve them. But semantic search only pulls what’s relevant to the current query. When I asked a technical question, my agent responded with the wrong personality—because semantic search didn’t retrieve the tone instructions.

Moving those rules to SOUL.md fixed everything:

SOUL.md
# Agent Soul
You are a pragmatic software engineer who values clarity over cleverness.
## Communication Style
- Be direct and concise
- Explain trade-offs, not just solutions
- Admit uncertainty when appropriate
- Never oversell solutions
## Non-Negotiable Rules
- Always check existing tests before suggesting changes
- Prefer standard library over external dependencies
- Document public APIs

Now these instructions appear in every single request. The agent never “forgets” who it is.

Level 2: Semantic Search Memory (What the Agent RETRIEVES When Needed)

Bootstrap context is powerful but expensive. You can’t put everything there without burning through your context window. This is where semantic search shines.

┌─────────────────────────────────────────────────────────┐
│ Semantic Search Retrieval │
│ │
│ User Query: "How did we fix the database lock issue?" │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ MEMORY.md (Vector Indexed) │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Entry 1 │ │ Entry 2 │ │ Entry 3 │ ... │ │
│ │ │ Lock │ │ API │ │ Auth │ │ │
│ │ │ Issue │ │ Design │ │ Bug │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ │ │
│ │ ↓ │ │
│ │ [Semantic Similarity Search] │ │
│ │ ↓ │ │
│ │ Retrieved: "Lock issue - added timeout..." │ │
│ └─────────────────────────────────────────────────┘ │
│ │
│ Only relevant chunks are pulled into context │
└─────────────────────────────────────────────────────────┘

The key insight: semantic search retrieves based on meaning, not keywords. When I documented a solution using different terminology than my query, semantic search still found it.

Here’s how I structure MEMORY.md:

MEMORY.md
# Project Memory
## Architecture Decisions
### 2026-02-15: Database Connection Pooling
We switched from individual connections to a connection pool.
Problem: Database was running out of connections under load.
Solution: Implemented pgx pool with max 20 connections.
Trade-off: Slightly higher latency per query, but better throughput.
## Solved Problems
### 2026-03-01: Redis Connection Drops
Symptoms: Intermittent "connection reset by peer" errors
Root cause: Redis server was closing idle connections faster than client timeout
Fix: Set client idle timeout to 4 minutes (server was at 5 minutes)
## Open Questions
- Should we migrate to PostgreSQL 16 for parallel query improvement?
- Is the caching layer worth the complexity for our read patterns?

Now when I ask about connection issues, semantic search pulls relevant entries—even if I don’t use the exact same words.

The Mistake I Made (So You Don’t Have To)

Initially, I treated bootstrap and semantic search as alternatives rather than complements:

WRONG APPROACH:
┌─────────────────────┐ ┌─────────────────────┐
│ Bootstrap Context │ OR │ Semantic Search │
│ │ │ │
│ Everything here? │ │ Everything here? │
└─────────────────────┘ └─────────────────────┘
↓ Either burns context ↓ Risks missing critical info

The correct approach uses both strategically:

CORRECT APPROACH:
┌─────────────────────────────────────────────────────────┐
│ Memory Strategy │
│ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ Bootstrap Context │ │ Semantic Search │ │
│ │ │ │ │ │
│ │ • Core identity │ │ • Past solutions │ │
│ │ • Essential rules │ │ • Reference docs │ │
│ │ • Non-negotiables │ │ • Historical data │ │
│ │ • Tone & style │ │ • Detailed notes │ │
│ │ │ │ │ │
│ │ ALWAYS injected │ │ RETRIEVED when │ │
│ │ │ │ relevant │ │
│ └─────────────────────┘ └─────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────┘

When to Use Which?

Through experimentation, I developed this decision tree:

┌─────────────────────────────────────────────────────────┐
│ Where Should This Information Live? │
└─────────────────────────────────────────────────────────┘
┌────────────────────────────────┐
│ Is this something the agent │
│ MUST know for EVERY request? │
└────────────────────────────────┘
┌─────────┴─────────┐
│ │
YES NO
│ │
▼ ▼
┌─────────────┐ ┌────────────────────────┐
│ BOOTSTRAP │ │ Can it be retrieved by │
│ │ │ relevance to a query? │
└─────────────┘ └────────────────────────┘
┌─────────┴─────────┐
│ │
YES NO
│ │
▼ ▼
┌─────────────┐ ┌─────────────────┐
│ SEMANTIC │ │ Consider if it │
│ SEARCH │ │ belongs at all │
└─────────────┘ └─────────────────┘

Bootstrap candidates:

  • Agent personality and communication style
  • Hard rules that should never be violated
  • Core identity and role definitions
  • Current project context (via daily logs)

Semantic search candidates:

  • Past solutions to similar problems
  • Reference documentation
  • Historical decisions and their rationale
  • Detailed technical notes

A Practical Example

I maintain a coding agent that helps with a large TypeScript project. Here’s how I split the memory:

AGENTS.md (Bootstrap)
# Agent Configuration
## Mandatory Checks
- Run type checker after ANY TypeScript/TSX edit
- Run tests after modifying test files
- Check for console.log before finalizing
## Code Style
- Use const over let when possible
- Prefer early returns over nested conditionals
- Extract complex conditions to named variables
MEMORY.md (Semantic Search)
# Project Memory
## 2026-02-20: Handling Circular Dependencies
Problem: Module A imported B, which imported A.
Solution: Extracted shared types to types/common.ts
Pattern: When circular deps detected, create a new shared module.
## 2026-02-28: API Rate Limiting Strategy
Implemented token bucket algorithm for external API calls.
Burst: 10 requests, refill: 1/second
Why: Prevented 429 errors during batch operations.
Alternative considered: Sliding window (more complex, same effect).
## 2026-03-05: Database Migration Workflow
1. Create migration file in migrations/
2. Run local test against test database
3. Apply to staging, verify no errors
4. Apply to production during low-traffic window
5. Update MEMORY.md with any manual fixes required

When I ask the agent to help with a dependency issue, semantic search retrieves the circular dependency note. When I ask for general coding help, the bootstrap rules ensure consistent behavior.

The Trade-offs

Bootstrap context guarantees delivery but costs tokens on every request. Semantic search is efficient but probabilistic—it might miss something important.

┌─────────────────────┬──────────────────┬───────────────────┐
│ │ Bootstrap │ Semantic Search │
├─────────────────────┼──────────────────┼───────────────────┤
│ Guaranteed delivery │ ✓ │ ✗ │
│ Token efficiency │ ✗ │ ✓ │
│ Works for all cases │ ✓ │ ✗ │
│ Scales with volume │ ✗ │ ✓ │
│ Retrieval accuracy │ N/A │ Varies │
└─────────────────────┴──────────────────┴───────────────────┘

I learned to accept this trade-off. Critical rules go in bootstrap. Everything useful but not critical goes to semantic search.

Final Thoughts

Understanding this distinction transformed how I work with AI agents. Instead of hoping the agent remembers context, I strategically place information where it will be found—either always (bootstrap) or when relevant (semantic search).

The key insight: bootstrap is for identity, semantic search is for knowledge. Mix them up, and you’ll either burn context on unnecessary repetition or lose critical information to retrieval failures.


Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments