How DeerFlow Memory System Persists User Context Across Sessions

Mar 17, 2026

Purpose

I’ve been frustrated with how AI agents forget everything. Every new conversation starts from zero. I have to restate my preferences, re-explain my project context, and remind the agent what I’m working on.

When I explored DeerFlow, I found it has a built-in memory system that actually persists across sessions. Not just conversation history—structured facts about me that get extracted and stored automatically.

This post explains how DeerFlow’s memory system works, how to configure it, and whether it solves the “AI agents forget everything” problem.

The Problem: Agents Have No Memory

Standard AI agents have no persistence:

Day 1:
  Me: "I'm building a React app with TypeScript"
  Agent: "Got it, I'll help with that"

Day 2 (new session):
  Me: "Add authentication to my app"
  Agent: "What tech stack are you using?"
  Me: (frustrated) "React with TypeScript, like I said yesterday"

This happens because:

Session-only context: Each conversation starts fresh
No fact extraction: Agents don’t identify and store important details
No retrieval: Even if stored, previous context isn’t injected

I wanted to understand how DeerFlow solves this.

What DeerFlow Memory Does

DeerFlow’s memory system provides three things:

User Context: Short summaries about who you are (work, personal, current focus)
Facts: Discrete, confidence-scored knowledge (preferences, knowledge, goals)
History: Time-based conversation summaries

The key insight: memory is extracted automatically using the LLM itself, then injected into future conversations.

Exploring the Memory Storage

I started by examining where memory is stored:

cd deer-flow
find . -name "memory.json"

./backend/.deer-flow/memory.json

Let me check the structure:

cat ./backend/.deer-flow/memory.json | python3 -m json.tool | head -80

{
  "userContext": {
    "workContext": "Software engineer working on AI projects with Python and TypeScript",
    "personalContext": "Based in San Francisco, interested in hiking and photography",
    "topOfMind": "Currently planning a product launch for Q2 2026"
  },
  "facts": [
    {
      "id": "fact-2026031510",
      "content": "User prefers Python over JavaScript for backend development",
      "category": "preference",
      "confidence": 0.9,
      "createdAt": "2026-03-15T10:00:00Z",
      "source": "conversation-abc123"
    },
    {
      "id": "fact-2026031511",
      "content": "User is building a task management app called TaskFlow",
      "category": "goal",
      "confidence": 0.85,
      "createdAt": "2026-03-15T11:30:00Z",
      "source": "conversation-def456"
    }
  ],
  "history": {
    "recentMonths": "March 2026: Worked on DeerFlow integration, researched agent memory systems...",
    "earlierContext": "Previous projects include a microservices architecture and a data pipeline...",
    "longTermBackground": "Software engineer with 8+ years experience in distributed systems..."
  }
}

I was impressed. The system had extracted structured information from my conversations without any manual configuration.

How Memory Extraction Works

I wanted to understand the extraction process. Here’s what I found:

Step 1: Message Filtering via MemoryMiddleware

Not all messages should be processed for memory. The MemoryMiddleware filters to relevant messages:

Raw conversation:
  [system prompt]
  [user: "Hi"]              <- Skip: greeting
  [assistant: "Hello"]      <- Skip: greeting
  [user: "I prefer Python"] <- Keep: user preference
  [assistant: "Got it"]     <- Skip: acknowledgment
  [user: "Create a file"]   <- Skip: action request
  [assistant: "Created..."] <- Keep: AI response with action result

Filtered messages -> Memory queue

The middleware keeps:

User inputs (especially those with content about preferences, context, goals)
Final AI responses (not intermediate thinking steps)

Step 2: Debounced Processing

Memory updates don’t happen immediately. The system waits 30 seconds (configurable) before processing:

T+0s:   User sends message
T+10s:  User sends follow-up
T+25s:  User sends another message
T+55s:  (30 seconds after last message) -> Process all queued messages

This approach:

Batches multiple messages efficiently
Avoids processing incomplete conversations
Deduplicates updates per thread

Step 3: LLM-Powered Extraction

A background worker invokes the LLM to extract memory:

# Simplified extraction logic
async def extract_memory(messages: list, current_memory: dict) -> dict:
    prompt = f"""
    Analyze the conversation and extract:
    1. User context updates (work, personal, current focus)
    2. New facts about the user (preferences, knowledge, goals)
    3. History updates (what happened in this conversation)

    Current memory:
    {json.dumps(current_memory, indent=2)}

    Recent messages:
    {format_messages(messages)}

    Output JSON with:
    - context_updates: {{workContext, personalContext, topOfMind}}
    - new_facts: [{{content, category, confidence}}]
    - history_update: string
    """

    response = await llm.invoke(prompt)
    return parse_memory_updates(response)

The LLM identifies what’s worth remembering and assigns confidence scores.

Step 4: Confidence-Based Fact Storage

Facts are only stored if confidence exceeds the threshold:

def should_store_fact(fact: dict, config: dict) -> bool:
    """Only store facts with sufficient confidence"""

    threshold = config.get('fact_confidence_threshold', 0.7)

    if fact['confidence'] < threshold:
        logger.info(f"Skipping low-confidence fact: {fact['content']}")
        return False

    return True

The default threshold is 0.7 (70% confidence). This prevents low-quality extractions from polluting memory.

Step 5: Atomic Storage

Memory is written atomically to prevent corruption:

def save_memory(memory: dict, storage_path: str):
    """Write memory atomically to prevent corruption"""

    temp_path = f"{storage_path}.tmp"

    # Write to temp file first
    with open(temp_path, 'w') as f:
        json.dump(memory, f, indent=2)

    # Atomic rename
    os.rename(temp_path, storage_path)

This ensures no partial writes if the process crashes.

How Memory Gets Injected

When a new conversation starts, memory is injected into the system prompt:

You are a helpful AI assistant with access to persistent memory.

<memory>
## User Context
- Work: Software engineer working on AI projects with Python and TypeScript
- Personal: Based in San Francisco, interested in hiking
- Current Focus: Planning a product launch for Q2 2026

## Recent Facts (Top 15)
1. Prefers Python over JavaScript for backend (confidence: 0.9)
2. Building TaskFlow app (confidence: 0.85)
3. Uses FastAPI and PostgreSQL stack (confidence: 0.8)
4. Interested in agent memory systems (confidence: 0.75)
...

## History
- Recent: Worked on DeerFlow integration, researched memory systems
- Earlier: Built microservices architecture, data pipelines
</memory>

Use this context to provide personalized responses.

The injection includes:

User context summaries
Top 15 facts (sorted by confidence and recency)
History summaries

Configuration Options

I examined the memory configuration:

memory:
  enabled: true                    # Master switch
  injection_enabled: true          # Inject into system prompt
  storage_path: backend/.deer-flow/memory.json

  # Extraction settings
  debounce_seconds: 30             # Wait before processing
  model_name: null                 # Use default model (or specify)

  # Fact settings
  max_facts: 100                   # Maximum stored facts
  fact_confidence_threshold: 0.7   # Minimum confidence to store

  # Injection settings
  max_injection_tokens: 2000       # Max tokens in system prompt

Disabling Memory

If you don’t want memory:

memory:
  enabled: false

Disabling Injection Only

To extract memory but not inject it:

memory:
  enabled: true
  injection_enabled: false

This is useful for audit/logging purposes without affecting responses.

Testing Memory in Practice

I ran a test to see memory in action:

Day 1: Establish Context

from deerflow.client import DeerFlowClient

client = DeerFlowClient()

# Tell it about my project
client.chat(
    "I'm building a CLI tool called 'deployer' that automates "
    "deployment to Kubernetes. The tech stack is Go and I'm using "
    "the Cobra library for CLI parsing.",
    thread_id="deployer-project"
)

# Ask for help
response = client.chat(
    "Suggest a good structure for the command hierarchy",
    thread_id="deployer-project"
)
print(response)

Day 2: Test Memory Retrieval

from deerflow.client import DeerFlowClient
import time

client = DeerFlowClient()

# New session - does it remember?
response = client.chat(
    "Generate a README for the project we discussed yesterday",
    thread_id="deployer-project-readme"  # Different thread ID
)
print(response)

The response included correct references to:

Project name: “deployer”
Purpose: Kubernetes deployment automation
Tech stack: Go with Cobra

I checked the memory file after:

cat ./backend/.deer-flow/memory.json | grep -A 5 "deployer"

"content": "User is building a CLI tool called deployer for Kubernetes deployment automation",
"category": "goal",
"confidence": 0.88

The fact was extracted and stored with high confidence.

Accessing Memory via API

DeerFlow exposes memory through the Gateway API:

Get Current Memory

curl http://localhost:2026/api/memory

{
  "success": true,
  "data": {
    "userContext": {
      "workContext": "Software engineer working on AI projects...",
      "personalContext": "Based in San Francisco...",
      "topOfMind": "Planning a product launch..."
    },
    "facts": [
      {
        "id": "fact-001",
        "content": "Prefers Python over JavaScript",
        "category": "preference",
        "confidence": 0.9
      }
    ],
    "history": {
      "recentMonths": "...",
      "earlierContext": "...",
      "longTermBackground": "..."
    }
  }
}

Force Memory Reload

curl -X POST http://localhost:2026/api/memory/reload

This is useful after manual edits to memory.json.

Using the Python Client

from deerflow.client import DeerFlowClient

client = DeerFlowClient()

# Get current memory
memory = client.get_memory()
print(f"Work context: {memory['userContext']['workContext']}")

# See all facts
for fact in memory['facts']:
    print(f"- {fact['content']} (confidence: {fact['confidence']})")

# Force reload
client.reload_memory()

Fact Categories Explained

Facts are categorized to help with organization and retrieval:

Category	Description	Example
`preference`	User preferences	”Prefers dark mode”
`knowledge`	User’s domain knowledge	”Knows Kubernetes well”
`context`	Situational context	”Currently debugging an auth issue”
`behavior`	Behavioral patterns	”Usually asks for code examples”
`goal`	User’s goals	”Building a CLI tool called deployer”

Categories help the system prioritize what to inject. Goals and preferences get higher priority.

Issues I Encountered

Not everything was perfect:

1. Over-Extraction

The LLM sometimes extracts low-value facts:

{
  "content": "User said 'hello' at the start of conversation",
  "category": "behavior",
  "confidence": 0.3
}

This is why the confidence threshold matters. I raised mine to 0.75.

2. Fact Conflicts

When preferences change, old facts aren’t always updated:

Fact 1: "User prefers React" (confidence 0.8, 2 weeks ago)
Fact 2: "User now prefers Vue" (confidence 0.85, yesterday)

Both facts exist. The system should merge or replace, but currently doesn’t.

3. Privacy Concerns

Everything is stored in plain JSON. If someone accesses your memory.json, they see your complete profile.

4. No Per-Project Memory

Memory is global across all threads. If you work on multiple unrelated projects, facts get mixed together.

Comparison: Memory vs Other Approaches

Approach	Persists Across Sessions	Automatic Extraction	Personalized Responses
Conversation history only	No	N/A	Limited
Manual memory prompts	Yes	No	Yes
External memory service	Yes	Varies	Yes
DeerFlow Memory	Yes	Yes (LLM-powered)	Yes

DeerFlow’s advantage is automatic extraction. You don’t have to manually tell it what to remember.

When Memory Helps

Based on my testing, memory is useful when:

Long-term projects: You work on something over days/weeks
Consistent preferences: You want the agent to remember your coding style, tech choices
Context switching: You return to a project after a break
Team usage: Shared context about project goals (though this has privacy implications)

Memory is less useful for:

One-off questions: Quick answers don’t need persistence
Privacy-sensitive work: Everything gets stored
Multi-project work: Facts get mixed together

My Recommendation

DeerFlow’s memory system is one of its most practical features. It solves a real problem—AI agents forgetting everything—without requiring manual configuration.

Enable memory if:

You work on long-term projects
You want personalized responses
You don’t mind your context being stored locally

Disable memory if:

You only ask one-off questions
You’re working on sensitive projects
You don’t want any persistence

The local-first storage approach keeps data under your control. The extraction is automatic but configurable. For developers building agents that need to remember, DeerFlow’s memory system provides a solid foundation.

Summary

DeerFlow’s memory system persists user context across sessions using LLM-powered extraction. It stores user context summaries, confidence-scored facts, and conversation history. The top 15 facts plus context are injected into system prompts for personalized responses.

I tested it with a multi-day project and found it correctly remembered project details, tech stack choices, and goals. The main limitations are fact conflicts when preferences change and lack of per-project memory isolation.

For developers who’ve struggled with AI agents that forget everything, DeerFlow’s memory system provides a production-ready solution that works out of the box.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!