Skip to content

How DeerFlow Memory System Persists User Context Across Sessions

Purpose

I’ve been frustrated with how AI agents forget everything. Every new conversation starts from zero. I have to restate my preferences, re-explain my project context, and remind the agent what I’m working on.

When I explored DeerFlow, I found it has a built-in memory system that actually persists across sessions. Not just conversation history—structured facts about me that get extracted and stored automatically.

This post explains how DeerFlow’s memory system works, how to configure it, and whether it solves the “AI agents forget everything” problem.

The Problem: Agents Have No Memory

Standard AI agents have no persistence:

Typical Agent Experience
Day 1:
Me: "I'm building a React app with TypeScript"
Agent: "Got it, I'll help with that"
Day 2 (new session):
Me: "Add authentication to my app"
Agent: "What tech stack are you using?"
Me: (frustrated) "React with TypeScript, like I said yesterday"

This happens because:

  1. Session-only context: Each conversation starts fresh
  2. No fact extraction: Agents don’t identify and store important details
  3. No retrieval: Even if stored, previous context isn’t injected

I wanted to understand how DeerFlow solves this.

What DeerFlow Memory Does

DeerFlow’s memory system provides three things:

  1. User Context: Short summaries about who you are (work, personal, current focus)
  2. Facts: Discrete, confidence-scored knowledge (preferences, knowledge, goals)
  3. History: Time-based conversation summaries

The key insight: memory is extracted automatically using the LLM itself, then injected into future conversations.

Exploring the Memory Storage

I started by examining where memory is stored:

Terminal
cd deer-flow
find . -name "memory.json"
Output
./backend/.deer-flow/memory.json

Let me check the structure:

Terminal
cat ./backend/.deer-flow/memory.json | python3 -m json.tool | head -80
memory.json structure
{
"userContext": {
"workContext": "Software engineer working on AI projects with Python and TypeScript",
"personalContext": "Based in San Francisco, interested in hiking and photography",
"topOfMind": "Currently planning a product launch for Q2 2026"
},
"facts": [
{
"id": "fact-2026031510",
"content": "User prefers Python over JavaScript for backend development",
"category": "preference",
"confidence": 0.9,
"createdAt": "2026-03-15T10:00:00Z",
"source": "conversation-abc123"
},
{
"id": "fact-2026031511",
"content": "User is building a task management app called TaskFlow",
"category": "goal",
"confidence": 0.85,
"createdAt": "2026-03-15T11:30:00Z",
"source": "conversation-def456"
}
],
"history": {
"recentMonths": "March 2026: Worked on DeerFlow integration, researched agent memory systems...",
"earlierContext": "Previous projects include a microservices architecture and a data pipeline...",
"longTermBackground": "Software engineer with 8+ years experience in distributed systems..."
}
}

I was impressed. The system had extracted structured information from my conversations without any manual configuration.

How Memory Extraction Works

I wanted to understand the extraction process. Here’s what I found:

Step 1: Message Filtering via MemoryMiddleware

Not all messages should be processed for memory. The MemoryMiddleware filters to relevant messages:

Message Filtering Flow
Raw conversation:
[system prompt]
[user: "Hi"] <- Skip: greeting
[assistant: "Hello"] <- Skip: greeting
[user: "I prefer Python"] <- Keep: user preference
[assistant: "Got it"] <- Skip: acknowledgment
[user: "Create a file"] <- Skip: action request
[assistant: "Created..."] <- Keep: AI response with action result
Filtered messages -> Memory queue

The middleware keeps:

  • User inputs (especially those with content about preferences, context, goals)
  • Final AI responses (not intermediate thinking steps)

Step 2: Debounced Processing

Memory updates don’t happen immediately. The system waits 30 seconds (configurable) before processing:

Debounce Timeline
T+0s: User sends message
T+10s: User sends follow-up
T+25s: User sends another message
T+55s: (30 seconds after last message) -> Process all queued messages

This approach:

  • Batches multiple messages efficiently
  • Avoids processing incomplete conversations
  • Deduplicates updates per thread

Step 3: LLM-Powered Extraction

A background worker invokes the LLM to extract memory:

memory_extraction.py
# Simplified extraction logic
async def extract_memory(messages: list, current_memory: dict) -> dict:
prompt = f"""
Analyze the conversation and extract:
1. User context updates (work, personal, current focus)
2. New facts about the user (preferences, knowledge, goals)
3. History updates (what happened in this conversation)
Current memory:
{json.dumps(current_memory, indent=2)}
Recent messages:
{format_messages(messages)}
Output JSON with:
- context_updates: {{workContext, personalContext, topOfMind}}
- new_facts: [{{content, category, confidence}}]
- history_update: string
"""
response = await llm.invoke(prompt)
return parse_memory_updates(response)

The LLM identifies what’s worth remembering and assigns confidence scores.

Step 4: Confidence-Based Fact Storage

Facts are only stored if confidence exceeds the threshold:

fact_storage.py
def should_store_fact(fact: dict, config: dict) -> bool:
"""Only store facts with sufficient confidence"""
threshold = config.get('fact_confidence_threshold', 0.7)
if fact['confidence'] < threshold:
logger.info(f"Skipping low-confidence fact: {fact['content']}")
return False
return True

The default threshold is 0.7 (70% confidence). This prevents low-quality extractions from polluting memory.

Step 5: Atomic Storage

Memory is written atomically to prevent corruption:

atomic_storage.py
def save_memory(memory: dict, storage_path: str):
"""Write memory atomically to prevent corruption"""
temp_path = f"{storage_path}.tmp"
# Write to temp file first
with open(temp_path, 'w') as f:
json.dump(memory, f, indent=2)
# Atomic rename
os.rename(temp_path, storage_path)

This ensures no partial writes if the process crashes.

How Memory Gets Injected

When a new conversation starts, memory is injected into the system prompt:

System Prompt with Memory
You are a helpful AI assistant with access to persistent memory.
<memory>
## User Context
- Work: Software engineer working on AI projects with Python and TypeScript
- Personal: Based in San Francisco, interested in hiking
- Current Focus: Planning a product launch for Q2 2026
## Recent Facts (Top 15)
1. Prefers Python over JavaScript for backend (confidence: 0.9)
2. Building TaskFlow app (confidence: 0.85)
3. Uses FastAPI and PostgreSQL stack (confidence: 0.8)
4. Interested in agent memory systems (confidence: 0.75)
...
## History
- Recent: Worked on DeerFlow integration, researched memory systems
- Earlier: Built microservices architecture, data pipelines
</memory>
Use this context to provide personalized responses.

The injection includes:

  • User context summaries
  • Top 15 facts (sorted by confidence and recency)
  • History summaries

Configuration Options

I examined the memory configuration:

config.yaml (memory section)
memory:
enabled: true # Master switch
injection_enabled: true # Inject into system prompt
storage_path: backend/.deer-flow/memory.json
# Extraction settings
debounce_seconds: 30 # Wait before processing
model_name: null # Use default model (or specify)
# Fact settings
max_facts: 100 # Maximum stored facts
fact_confidence_threshold: 0.7 # Minimum confidence to store
# Injection settings
max_injection_tokens: 2000 # Max tokens in system prompt

Disabling Memory

If you don’t want memory:

config.yaml
memory:
enabled: false

Disabling Injection Only

To extract memory but not inject it:

config.yaml
memory:
enabled: true
injection_enabled: false

This is useful for audit/logging purposes without affecting responses.

Testing Memory in Practice

I ran a test to see memory in action:

Day 1: Establish Context

day1_test.py
from deerflow.client import DeerFlowClient
client = DeerFlowClient()
# Tell it about my project
client.chat(
"I'm building a CLI tool called 'deployer' that automates "
"deployment to Kubernetes. The tech stack is Go and I'm using "
"the Cobra library for CLI parsing.",
thread_id="deployer-project"
)
# Ask for help
response = client.chat(
"Suggest a good structure for the command hierarchy",
thread_id="deployer-project"
)
print(response)

Day 2: Test Memory Retrieval

day2_test.py
from deerflow.client import DeerFlowClient
import time
client = DeerFlowClient()
# New session - does it remember?
response = client.chat(
"Generate a README for the project we discussed yesterday",
thread_id="deployer-project-readme" # Different thread ID
)
print(response)

The response included correct references to:

  • Project name: “deployer”
  • Purpose: Kubernetes deployment automation
  • Tech stack: Go with Cobra

I checked the memory file after:

Terminal
cat ./backend/.deer-flow/memory.json | grep -A 5 "deployer"
Output
"content": "User is building a CLI tool called deployer for Kubernetes deployment automation",
"category": "goal",
"confidence": 0.88

The fact was extracted and stored with high confidence.

Accessing Memory via API

DeerFlow exposes memory through the Gateway API:

Get Current Memory

Terminal
curl http://localhost:2026/api/memory
Response
{
"success": true,
"data": {
"userContext": {
"workContext": "Software engineer working on AI projects...",
"personalContext": "Based in San Francisco...",
"topOfMind": "Planning a product launch..."
},
"facts": [
{
"id": "fact-001",
"content": "Prefers Python over JavaScript",
"category": "preference",
"confidence": 0.9
}
],
"history": {
"recentMonths": "...",
"earlierContext": "...",
"longTermBackground": "..."
}
}
}

Force Memory Reload

Terminal
curl -X POST http://localhost:2026/api/memory/reload

This is useful after manual edits to memory.json.

Using the Python Client

memory_api.py
from deerflow.client import DeerFlowClient
client = DeerFlowClient()
# Get current memory
memory = client.get_memory()
print(f"Work context: {memory['userContext']['workContext']}")
# See all facts
for fact in memory['facts']:
print(f"- {fact['content']} (confidence: {fact['confidence']})")
# Force reload
client.reload_memory()

Fact Categories Explained

Facts are categorized to help with organization and retrieval:

CategoryDescriptionExample
preferenceUser preferences”Prefers dark mode”
knowledgeUser’s domain knowledge”Knows Kubernetes well”
contextSituational context”Currently debugging an auth issue”
behaviorBehavioral patterns”Usually asks for code examples”
goalUser’s goals”Building a CLI tool called deployer”

Categories help the system prioritize what to inject. Goals and preferences get higher priority.

Issues I Encountered

Not everything was perfect:

1. Over-Extraction

The LLM sometimes extracts low-value facts:

Low-value extraction
{
"content": "User said 'hello' at the start of conversation",
"category": "behavior",
"confidence": 0.3
}

This is why the confidence threshold matters. I raised mine to 0.75.

2. Fact Conflicts

When preferences change, old facts aren’t always updated:

Conflict example
Fact 1: "User prefers React" (confidence 0.8, 2 weeks ago)
Fact 2: "User now prefers Vue" (confidence 0.85, yesterday)

Both facts exist. The system should merge or replace, but currently doesn’t.

3. Privacy Concerns

Everything is stored in plain JSON. If someone accesses your memory.json, they see your complete profile.

4. No Per-Project Memory

Memory is global across all threads. If you work on multiple unrelated projects, facts get mixed together.

Comparison: Memory vs Other Approaches

ApproachPersists Across SessionsAutomatic ExtractionPersonalized Responses
Conversation history onlyNoN/ALimited
Manual memory promptsYesNoYes
External memory serviceYesVariesYes
DeerFlow MemoryYesYes (LLM-powered)Yes

DeerFlow’s advantage is automatic extraction. You don’t have to manually tell it what to remember.

When Memory Helps

Based on my testing, memory is useful when:

  1. Long-term projects: You work on something over days/weeks
  2. Consistent preferences: You want the agent to remember your coding style, tech choices
  3. Context switching: You return to a project after a break
  4. Team usage: Shared context about project goals (though this has privacy implications)

Memory is less useful for:

  1. One-off questions: Quick answers don’t need persistence
  2. Privacy-sensitive work: Everything gets stored
  3. Multi-project work: Facts get mixed together

My Recommendation

DeerFlow’s memory system is one of its most practical features. It solves a real problem—AI agents forgetting everything—without requiring manual configuration.

Enable memory if:

  • You work on long-term projects
  • You want personalized responses
  • You don’t mind your context being stored locally

Disable memory if:

  • You only ask one-off questions
  • You’re working on sensitive projects
  • You don’t want any persistence

The local-first storage approach keeps data under your control. The extraction is automatic but configurable. For developers building agents that need to remember, DeerFlow’s memory system provides a solid foundation.

Summary

DeerFlow’s memory system persists user context across sessions using LLM-powered extraction. It stores user context summaries, confidence-scored facts, and conversation history. The top 15 facts plus context are injected into system prompts for personalized responses.

I tested it with a multi-day project and found it correctly remembered project details, tech stack choices, and goals. The main limitations are fact conflicts when preferences change and lack of per-project memory isolation.

For developers who’ve struggled with AI agents that forget everything, DeerFlow’s memory system provides a production-ready solution that works out of the box.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments