RAG vs Compaction Tree for AI Agent Memory - Which Approach Is Better?

Mar 21, 2026

My AI agent kept asking the same questions over and over. Every session, it would forget what we discussed. So I built a RAG system. Problem solved, right?

Wrong. The agent still didn’t know what it knew.

The RAG Illusion

I indexed all our conversations into a vector database. Semantic search, embeddings, the whole stack. When the agent needed to recall something, it could query and retrieve relevant context.

def recall_conversation(query: str) -> list[Document]:
    """Retrieve relevant past conversations"""
    embeddings = embedding_model.embed(query)
    results = vector_store.similarity_search(embeddings, k=5)
    return results

This worked great for queries like:

“What was the bug in the authentication module?”
“Show me the database schema we designed”
“What API endpoints did we implement last week?”

But then I noticed something strange. The agent would ask: “Do we have any context on database migrations?”

I’d stare at the screen. Yes, we discussed this extensively three days ago. Just search for it!

But the agent didn’t know to search. That’s when I realized the fundamental flaw.

The Awareness Problem

RAG is great if you know what to search for. But “do I already have context on database migrations?” isn’t a search query - it’s an awareness question.

The agent doesn’t know what it knows.

Think about it this way: imagine you have a perfectly indexed library, but no card catalog. If you know exactly what book you want, you can find it. But you can’t browse by topic. You can’t see what’s available.

Traditional RAG Flow:
1. Agent receives query
2. Agent decides to search memory
3. Agent formulates search query
4. RAG retrieves relevant documents
5. Agent uses retrieved context

The Missing Step:
0. Agent should first ask: "Do I know anything about this?"

The missing step is awareness. Without it, agents either:

Never search their memory (wasteful rework)
Always search their memory (expensive, noisy)
Guess when to search (inconsistent)

None of these are acceptable.

Enter the Compaction Tree

I stumbled upon a different approach while researching agent memory architectures. Instead of just storing documents for retrieval, maintain a hierarchical summary tree.

class CompactionLevel:
    """Single level in the compaction hierarchy"""
    def __init__(self, level: int):
        self.level = level
        self.summaries: list[Summary] = []
        self.child_level: CompactionLevel | None = None

class CompactionTree:
    """Hierarchical memory with multiple compression levels"""
    def __init__(self):
        self.level_0 = CompactionLevel(0)  # Raw logs
        self.level_1 = CompactionLevel(1)  # Hour summaries
        self.level_2 = CompactionLevel(2)  # Day summaries
        self.level_3 = CompactionLevel(3)  # Week summaries
        self.level_4 = CompactionLevel(4)  # ROOT.md

Here’s how it works:

Level 0: Raw conversation logs, preserved exactly Level 1: Summaries of each hour/session Level 2: Daily summaries Level 3: Weekly summaries Level 4: ROOT.md - the highest-level overview

At the top sits ROOT.md, a single document answering “what do I know?” without any query needed.

# Agent Memory Overview

## Active Projects
- User authentication system (last discussed: 2026-03-20)
- Database migration strategy (last discussed: 2026-03-19)
- API rate limiting (last discussed: 2026-03-18)

## Recent Decisions
- Chose PostgreSQL over MongoDB for main database
- Implemented JWT-based auth with refresh tokens
- Decided on Redis for session caching

## Key Technical Context
- Schema includes users, sessions, audit_logs tables
- Using SQLAlchemy ORM with async support
- Tests written with pytest, 78% coverage

The agent reads ROOT.md at the start of each session. Now it knows what it knows.

But what if the agent needs more detail? That’s where the tree structure shines.

The compaction tree answers “what did we discuss last week?” with hierarchical drill-down:

ROOT.md (overview)
  |
  +-- Week of 2026-03-15
  |     +-- Day: 2026-03-20
  |     |     +-- Session: Morning (auth implementation)
  |     |     +-- Session: Afternoon (testing)
  |     |
  |     +-- Day: 2026-03-19
  |           +-- Session: Database migration planning
  |
  +-- Week of 2026-03-08
        +-- Summary: Initial project setup

The agent can:

Read ROOT.md for awareness (no query)
Navigate to a week summary for context
Drill down to a specific day
Access raw logs at level 0 if needed

Nothing is ever lost - raw logs are preserved, just compressed through levels.

The Crucial Difference

Here’s the key insight: RAG and compaction trees solve different problems.

RAG:
- Problem: "Find specific information"
- Input: Search query
- Output: Relevant documents
- Use case: Targeted retrieval

Compaction Tree:
- Problem: "What do I know?"
- Input: None (awareness)
- Output: Hierarchical overview
- Use case: Self-knowledge

RAG + Compaction Tree:
- Problem: "Find specific info AND know what to search for"
- Best of both worlds

RAG is like having a search engine. Compaction trees are like having a table of contents.

You need both.

Building a Hybrid System

I ended up implementing both in my agent memory system (inspired by the Hipocampus architecture):

class AgentMemory:
    def __init__(self):
        self.compaction_tree = CompactionTree()
        self.rag_system = RAGSystem(
            hybrid_search=HybridSearch(
                bm25_weight=0.3,
                vector_weight=0.7
            )
        )

    def get_overview(self) -> str:
        """Awareness: What do I know? (No query needed)"""
        return self.compaction_tree.read_root()

    def browse_time(self, date_range: DateRange) -> str:
        """Time navigation: What happened when?"""
        return self.compaction_tree.navigate(date_range)

    def search(self, query: str) -> list[Document]:
        """Targeted retrieval: Find specific information"""
        return self.rag_system.search(query)

The hybrid search combines BM25 (keyword matching) with vector similarity. This handles different query types:

“JWT implementation” -> BM25 excels
“authentication error handling” -> Vector search finds semantically related
“session timeout bug” -> Hybrid combines both

The Architecture That Actually Works

After months of iteration, here’s what I’ve learned:

Start with ROOT.md: The agent needs immediate awareness of its knowledge. This should be the first thing loaded.
Preserve raw logs: Never delete the source of truth. Compression is not deletion.
Use both retrieval methods: Compaction trees for awareness and navigation, RAG for deep semantic search.
Time-based organization matters: Agents think in conversations and sessions. The tree structure mirrors how we actually work.
Hybrid search beats pure vector: BM25 catches exact terms. Vectors catch semantic meaning. Together, they’re more robust.

class RecommendedMemoryArchitecture:
    """
    The architecture that actually works in practice
    """

    def __init__(self):
        # Awareness layer
        self.root_document = "ROOT.md"

        # Navigation layer
        self.compaction_tree = CompactionTree(
            levels=[
                "raw_logs",      # Level 0: Everything
                "hourly",        # Level 1: Session summaries
                "daily",         # Level 2: Day summaries
                "weekly",        # Level 3: Week summaries
                "root"           # Level 4: Overview
            ]
        )

        # Retrieval layer
        self.retrieval = HybridSearch(
            bm25_index=BleveIndex(),
            vector_index=VectorIndex(embedding_model="text-embedding-3-small"),
            fusion_strategy="rrf"  # Reciprocal rank fusion
        )

    def recall(self, context: RecallContext) -> MemoryResult:
        """
        The complete recall process:

        1. Read ROOT.md for awareness
        2. Decide: Do I need more context?
        3. If yes: Navigate tree OR search RAG
        4. Return relevant memory
        """
        overview = self.read_root()

        if context.needs_awareness:
            return MemoryResult(overview, source="root")

        if context.time_range:
            return self.compaction_tree.navigate(context.time_range)

        return self.retrieval.search(context.query)

What About Memory Limits?

A common concern: “Won’t the tree get huge?”

The compaction happens continuously. Each level compresses by a factor of roughly 10:

100 hours of conversation
Compressed to 10 daily summaries
Compressed to 1-2 weekly summaries
Compressed to a single ROOT.md

The storage footprint stays manageable. More importantly, the agent never needs to load everything - it navigates the hierarchy based on what it needs.

The Bottom Line

Neither RAG nor compaction trees alone solve the AI agent memory problem. RAG gives you targeted retrieval but no awareness. Compaction trees give you awareness and navigation but require drill-down for details.

Use both.

The awareness problem is real. Your agent can’t search for what it doesn’t know exists. A ROOT.md at the top of a compaction tree solves this elegantly.

And when the agent needs to dive deep? That’s what RAG is for.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!