How DeerFlow's Memory System Persists User Context Across Sessions
Purpose
I’ve been frustrated with how AI agents forget everything. Every new conversation starts from zero. I have to restate my preferences, re-explain my project context, and remind the agent what I’m working on.
When I explored DeerFlow, I found it has a built-in memory system that actually persists across sessions. Not just conversation history—structured facts about me that get extracted and stored automatically.
This post explains how DeerFlow’s memory system works, how to configure it, and whether it solves the “AI agents forget everything” problem.
The Problem: Agents Have No Memory
Standard AI agents have no persistence:
    Day 1:
    Me: "I'm building a React app with TypeScript"
    Agent: "Got it, I'll help with that"

    Day 2 (new session):
    Me: "Add authentication to my app"
    Agent: "What tech stack are you using?"
    Me: (frustrated) "React with TypeScript, like I said yesterday"

This happens because:
- Session-only context: Each conversation starts fresh
- No fact extraction: Agents don’t identify and store important details
- No retrieval: Even if stored, previous context isn’t injected
I wanted to understand how DeerFlow solves this.
What DeerFlow Memory Does
DeerFlow’s memory system provides three things:
- User Context: Short summaries about who you are (work, personal, current focus)
- Facts: Discrete, confidence-scored knowledge (preferences, knowledge, goals)
- History: Time-based conversation summaries
The key insight: memory is extracted automatically using the LLM itself, then injected into future conversations.
Exploring the Memory Storage
I started by examining where memory is stored:
    cd deer-flow
    find . -name "memory.json"
    ./backend/.deer-flow/memory.json

Let me check the structure:

    cat ./backend/.deer-flow/memory.json | python3 -m json.tool | head -80
    {
      "userContext": {
        "workContext": "Software engineer working on AI projects with Python and TypeScript",
        "personalContext": "Based in San Francisco, interested in hiking and photography",
        "topOfMind": "Currently planning a product launch for Q2 2026"
      },
      "facts": [
        {
          "id": "fact-2026031510",
          "content": "User prefers Python over JavaScript for backend development",
          "category": "preference",
          "confidence": 0.9,
          "createdAt": "2026-03-15T10:00:00Z",
          "source": "conversation-abc123"
        },
        {
          "id": "fact-2026031511",
          "content": "User is building a task management app called TaskFlow",
          "category": "goal",
          "confidence": 0.85,
          "createdAt": "2026-03-15T11:30:00Z",
          "source": "conversation-def456"
        }
      ],
      "history": {
        "recentMonths": "March 2026: Worked on DeerFlow integration, researched agent memory systems...",
        "earlierContext": "Previous projects include a microservices architecture and a data pipeline...",
        "longTermBackground": "Software engineer with 8+ years experience in distributed systems..."
      }
    }

I was impressed. The system had extracted structured information from my conversations without any manual configuration.
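To get a quick overview of a file like this without scrolling through the JSON, a small script helps. This is a minimal sketch that assumes the three top-level keys shown above (`userContext`, `facts`, `history`); `load_memory` and `summarize_memory` are my own helper names, not part of DeerFlow.

```python
import json
from collections import Counter
from pathlib import Path


def load_memory(path: str) -> dict:
    """Read memory.json; return an empty structure if the file does not exist yet."""
    file = Path(path)
    if not file.exists():
        return {"userContext": {}, "facts": [], "history": {}}
    return json.loads(file.read_text(encoding="utf-8"))


def summarize_memory(memory: dict) -> dict:
    """Return a small overview of a memory document shaped like the one above."""
    facts = memory.get("facts", [])
    return {
        "context_keys": sorted(memory.get("userContext", {})),
        "fact_count": len(facts),
        # Count facts per category; facts without a category are grouped as "unknown".
        "facts_by_category": dict(Counter(f.get("category", "unknown") for f in facts)),
        "history_keys": sorted(memory.get("history", {})),
    }


if __name__ == "__main__":
    memory = load_memory("./backend/.deer-flow/memory.json")
    print(json.dumps(summarize_memory(memory), indent=2))
```

The per-category counts are a quick way to spot whether extraction is skewed toward one kind of fact.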
How Memory Extraction Works
I wanted to understand the extraction process. Here’s what I found:
Step 1: Message Filtering via MemoryMiddleware
Not all messages should be processed for memory. The MemoryMiddleware filters to relevant messages:
    Raw conversation:
      [system prompt]
      [user: "Hi"]                <- Skip: greeting
      [assistant: "Hello"]        <- Skip: greeting
      [user: "I prefer Python"]   <- Keep: user preference
      [assistant: "Got it"]       <- Skip: acknowledgment
      [user: "Create a file"]     <- Skip: action request
      [assistant: "Created..."]   <- Keep: AI response with action result

    Filtered messages -> Memory queue

The middleware keeps:
- User inputs (especially those with content about preferences, context, goals)
- Final AI responses (not intermediate thinking steps)
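The two rules above can be sketched as a plain function. This is my own simplification, not the actual `MemoryMiddleware` code: it assumes messages are dicts with `role` and `content` keys, treats an assistant message as "final" when it is followed by a user message or ends the conversation, and does not try to reproduce the greeting and acknowledgment skipping shown in the diagram.

```python
def filter_for_memory(messages: list[dict]) -> list[dict]:
    """Keep user messages and the final assistant message of each turn."""
    kept = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if not msg.get("content"):
            # Empty messages (e.g. assistant steps that only call a tool) carry nothing to remember.
            continue
        if role == "user":
            kept.append(msg)
        elif role == "assistant":
            nxt = messages[i + 1] if i + 1 < len(messages) else None
            # An assistant message is final if the user speaks next or the conversation ends.
            if nxt is None or nxt.get("role") == "user":
                kept.append(msg)
        # System and tool messages are dropped.
    return kept
```

Dropping system and tool messages keeps the extraction prompt short, which matters because the queue is later handed to an LLM.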
Step 2: Debounced Processing
Memory updates don’t happen immediately. The system waits 30 seconds (configurable) before processing:
    T+0s:  User sends message
    T+10s: User sends follow-up
    T+25s: User sends another message
    T+55s: (30 seconds after last message) -> Process all queued messages

This approach:
- Batches multiple messages efficiently
- Avoids processing incomplete conversations
- Deduplicates updates per thread
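To make the timing concrete, here is a minimal sketch of a per-thread debounce queue. It is not DeerFlow's implementation; `DebouncedQueue` and its methods are hypothetical names, and the clock is injectable so the behavior can be checked without waiting 30 seconds.

```python
import time
from collections import defaultdict


class DebouncedQueue:
    """Collect messages per thread and release them after a quiet period."""

    def __init__(self, debounce_seconds: float = 30.0, clock=time.monotonic):
        self.debounce_seconds = debounce_seconds
        self.clock = clock
        self._pending = defaultdict(list)  # thread_id -> queued messages
        self._last_seen = {}               # thread_id -> time of the latest message

    def add(self, thread_id: str, message: str) -> None:
        # Every new message restarts the quiet period for its thread.
        self._pending[thread_id].append(message)
        self._last_seen[thread_id] = self.clock()

    def pop_ready(self) -> dict:
        """Return and remove the batches whose thread has been quiet long enough."""
        now = self.clock()
        ready = {}
        for thread_id in list(self._pending):
            if now - self._last_seen[thread_id] >= self.debounce_seconds:
                ready[thread_id] = self._pending.pop(thread_id)
                del self._last_seen[thread_id]
        return ready
```

A background worker would call `pop_ready()` periodically and run one extraction per returned thread, which is how several messages end up in a single batch.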
Step 3: LLM-Powered Extraction
A background worker invokes the LLM to extract memory:
    import json

    # Simplified extraction logic. `llm`, `format_messages`, and
    # `parse_memory_updates` stand in for DeerFlow internals.
    async def extract_memory(messages: list, current_memory: dict) -> dict:
        prompt = f"""
        Analyze the conversation and extract:
        1. User context updates (work, personal, current focus)
        2. New facts about the user (preferences, knowledge, goals)
        3. History updates (what happened in this conversation)

        Current memory:
        {json.dumps(current_memory, indent=2)}

        Recent messages:
        {format_messages(messages)}

        Output JSON with:
        - context_updates: {{workContext, personalContext, topOfMind}}
        - new_facts: [{{content, category, confidence}}]
        - history_update: string
        """

        response = await llm.invoke(prompt)
        return parse_memory_updates(response)

The LLM identifies what's worth remembering and assigns confidence scores.
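The parsing step is where LLM output tends to go wrong, so it is worth a closer look. Below is a sketch of what a `parse_memory_updates` helper could look like. It is my own guess at the shape, not DeerFlow's code: it assumes the reply arrives as plain text and uses the three field names from the prompt above.

```python
import json
import re

EMPTY_UPDATE = {"context_updates": {}, "new_facts": [], "history_update": ""}


def parse_memory_updates(response_text: str) -> dict:
    """Parse the model's reply into the three update fields named in the prompt."""
    # Models often surround the JSON with extra text; take the span from the
    # first "{" to the last "}".
    match = re.search(r"\{.*\}", response_text, flags=re.DOTALL)
    if match is None:
        return dict(EMPTY_UPDATE)
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return dict(EMPTY_UPDATE)

    facts = []
    for fact in data.get("new_facts") or []:
        # Skip entries that are malformed or have no content.
        if not isinstance(fact, dict) or "content" not in fact:
            continue
        try:
            confidence = float(fact.get("confidence", 0.0))
        except (TypeError, ValueError):
            continue
        facts.append({
            "content": str(fact["content"]),
            "category": str(fact.get("category", "context")),
            # Clamp to the 0..1 range the threshold check expects.
            "confidence": min(max(confidence, 0.0), 1.0),
        })

    return {
        "context_updates": data.get("context_updates") or {},
        "new_facts": facts,
        "history_update": data.get("history_update") or "",
    }
```

Returning an empty update on unparseable output means a bad model reply costs one missed extraction rather than a corrupted memory file.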
Step 4: Confidence-Based Fact Storage
Facts are only stored if confidence exceeds the threshold:
    import logging

    logger = logging.getLogger(__name__)

    def should_store_fact(fact: dict, config: dict) -> bool:
        """Only store facts with sufficient confidence"""

        threshold = config.get('fact_confidence_threshold', 0.7)

        if fact['confidence'] < threshold:
            logger.info(f"Skipping low-confidence fact: {fact['content']}")
            return False

        return True

The default threshold is 0.7 (70% confidence). This prevents low-quality extractions from polluting memory.
Step 5: Atomic Storage
Memory is written atomically to prevent corruption:
    import json
    import os

    def save_memory(memory: dict, storage_path: str):
        """Write memory atomically to prevent corruption"""

        temp_path = f"{storage_path}.tmp"

        # Write to temp file first
        with open(temp_path, 'w') as f:
            json.dump(memory, f, indent=2)

        # Atomic rename; os.replace also overwrites an existing file on Windows,
        # where os.rename would raise an error
        os.replace(temp_path, storage_path)

This ensures memory.json never contains a partial write if the process crashes.
How Memory Gets Injected
When a new conversation starts, memory is injected into the system prompt:
    You are a helpful AI assistant with access to persistent memory.

    <memory>
    ## User Context
    - Work: Software engineer working on AI projects with Python and TypeScript
    - Personal: Based in San Francisco, interested in hiking
    - Current Focus: Planning a product launch for Q2 2026

    ## Recent Facts (Top 15)
    1. Prefers Python over JavaScript for backend (confidence: 0.9)
    2. Building TaskFlow app (confidence: 0.85)
    3. Uses FastAPI and PostgreSQL stack (confidence: 0.8)
    4. Interested in agent memory systems (confidence: 0.75)
    ...

    ## History
    - Recent: Worked on DeerFlow integration, researched memory systems
    - Earlier: Built microservices architecture, data pipelines
    </memory>

    Use this context to provide personalized responses.

The injection includes:
- User context summaries
- Top 15 facts (sorted by confidence and recency)
- History summaries
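Here is a rough sketch of how such a block could be assembled from memory.json. It is an illustration under assumptions, not DeerFlow's code: I don't know exactly how confidence and recency are combined, so this version sorts by confidence first and uses `createdAt` as the tie-breaker. It also ignores the `max_injection_tokens` limit, and `top_facts` and `render_memory_block` are hypothetical names.

```python
from datetime import datetime


def _created(fact: dict) -> datetime:
    # Assumes ISO 8601 UTC timestamps as in memory.json; "Z" is rewritten
    # because older Python versions do not accept it in fromisoformat.
    raw = fact.get("createdAt", "1970-01-01T00:00:00Z")
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


def top_facts(facts: list[dict], limit: int = 15) -> list[dict]:
    """Sort by confidence, then recency (both descending), and keep the first `limit`."""
    ranked = sorted(facts, key=lambda f: (f.get("confidence", 0.0), _created(f)), reverse=True)
    return ranked[:limit]


def render_memory_block(memory: dict, limit: int = 15) -> str:
    """Build the <memory> section that gets appended to the system prompt."""
    ctx = memory.get("userContext", {})
    hist = memory.get("history", {})
    lines = [
        "<memory>",
        "## User Context",
        f"- Work: {ctx.get('workContext', '')}",
        f"- Personal: {ctx.get('personalContext', '')}",
        f"- Current Focus: {ctx.get('topOfMind', '')}",
        "",
        f"## Recent Facts (Top {limit})",
    ]
    for i, fact in enumerate(top_facts(memory.get("facts", []), limit), start=1):
        lines.append(f"{i}. {fact['content']} (confidence: {fact['confidence']})")
    lines += [
        "",
        "## History",
        f"- Recent: {hist.get('recentMonths', '')}",
        f"- Earlier: {hist.get('earlierContext', '')}",
        "</memory>",
    ]
    return "\n".join(lines)
```

Capping the list at a fixed number of facts keeps the prompt size predictable no matter how large memory.json grows.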
Configuration Options
I examined the memory configuration:
    memory:
      enabled: true                     # Master switch
      injection_enabled: true           # Inject into system prompt
      storage_path: backend/.deer-flow/memory.json

      # Extraction settings
      debounce_seconds: 30              # Wait before processing
      model_name: null                  # Use default model (or specify)

      # Fact settings
      max_facts: 100                    # Maximum stored facts
      fact_confidence_threshold: 0.7    # Minimum confidence to store

      # Injection settings
      max_injection_tokens: 2000        # Max tokens in system prompt

Disabling Memory
If you don’t want memory:
    memory:
      enabled: false

Disabling Injection Only
To extract memory but not inject it:
    memory:
      enabled: true
      injection_enabled: false

This is useful for audit/logging purposes without affecting responses.
Testing Memory in Practice
I ran a test to see memory in action:
Day 1: Establish Context
    from deerflow.client import DeerFlowClient

    client = DeerFlowClient()

    # Tell it about my project
    client.chat(
        "I'm building a CLI tool called 'deployer' that automates "
        "deployment to Kubernetes. The tech stack is Go and I'm using "
        "the Cobra library for CLI parsing.",
        thread_id="deployer-project"
    )

    # Ask for help
    response = client.chat(
        "Suggest a good structure for the command hierarchy",
        thread_id="deployer-project"
    )
    print(response)

Day 2: Test Memory Retrieval
    from deerflow.client import DeerFlowClient

    client = DeerFlowClient()

    # New session - does it remember?
    response = client.chat(
        "Generate a README for the project we discussed yesterday",
        thread_id="deployer-project-readme"  # Different thread ID
    )
    print(response)

The response included correct references to:
- Project name: “deployer”
- Purpose: Kubernetes deployment automation
- Tech stack: Go with Cobra
I checked the memory file after:
    cat ./backend/.deer-flow/memory.json | grep -A 5 "deployer"
    "content": "User is building a CLI tool called deployer for Kubernetes deployment automation",
    "category": "goal",
    "confidence": 0.88

The fact was extracted and stored with high confidence.
Accessing Memory via API
DeerFlow exposes memory through the Gateway API:
Get Current Memory
    curl http://localhost:2026/api/memory
    {
      "success": true,
      "data": {
        "userContext": {
          "workContext": "Software engineer working on AI projects...",
          "personalContext": "Based in San Francisco...",
          "topOfMind": "Planning a product launch..."
        },
        "facts": [
          {
            "id": "fact-001",
            "content": "Prefers Python over JavaScript",
            "category": "preference",
            "confidence": 0.9
          }
        ],
        "history": {
          "recentMonths": "...",
          "earlierContext": "...",
          "longTermBackground": "..."
        }
      }
    }

Force Memory Reload

    curl -X POST http://localhost:2026/api/memory/reload

This is useful after manual edits to memory.json.
Using the Python Client
    from deerflow.client import DeerFlowClient

    client = DeerFlowClient()

    # Get current memory
    memory = client.get_memory()
    print(f"Work context: {memory['userContext']['workContext']}")

    # See all facts
    for fact in memory['facts']:
        print(f"- {fact['content']} (confidence: {fact['confidence']})")

    # Force reload
    client.reload_memory()

Fact Categories Explained
Facts are categorized to help with organization and retrieval:
| Category | Description | Example |
|---|---|---|
| preference | User preferences | "Prefers dark mode" |
| knowledge | User's domain knowledge | "Knows Kubernetes well" |
| context | Situational context | "Currently debugging an auth issue" |
| behavior | Behavioral patterns | "Usually asks for code examples" |
| goal | User's goals | "Building a CLI tool called deployer" |
Categories help the system prioritize what to inject. Goals and preferences get higher priority.
Issues I Encountered
Not everything was perfect:
1. Over-Extraction
The LLM sometimes extracts low-value facts:
    {
      "content": "User said 'hello' at the start of conversation",
      "category": "behavior",
      "confidence": 0.3
    }

This is why the confidence threshold matters. I raised mine to 0.75.
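I'm assuming a raised threshold only affects what gets stored from then on, and that facts already in memory.json stay where they are. If that holds, a one-off cleanup could look like the sketch below. `prune_facts` is my own helper, not a DeerFlow function.

```python
def prune_facts(memory: dict, threshold: float = 0.75) -> dict:
    """Return a copy of the memory document without facts below the threshold."""
    kept = [
        fact for fact in memory.get("facts", [])
        if fact.get("confidence", 0.0) >= threshold
    ]
    # The input is left untouched; only the "facts" list is replaced in the copy.
    return {**memory, "facts": kept}
```

After writing the pruned document back to memory.json, a memory reload makes the running system pick up the change.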
2. Fact Conflicts
When preferences change, old facts aren’t always updated:
    Fact 1: "User prefers React" (confidence 0.8, 2 weeks ago)
    Fact 2: "User now prefers Vue" (confidence 0.85, yesterday)

Both facts exist. The system should merge or replace, but currently doesn't.
3. Privacy Concerns
Everything is stored in plain JSON. If someone accesses your memory.json, they see your complete profile.
4. No Per-Project Memory
Memory is global across all threads. If you work on multiple unrelated projects, facts get mixed together.
Comparison: Memory vs Other Approaches
| Approach | Persists Across Sessions | Automatic Extraction | Personalized Responses |
|---|---|---|---|
| Conversation history only | No | N/A | Limited |
| Manual memory prompts | Yes | No | Yes |
| External memory service | Yes | Varies | Yes |
| DeerFlow Memory | Yes | Yes (LLM-powered) | Yes |
DeerFlow’s advantage is automatic extraction. You don’t have to manually tell it what to remember.
When Memory Helps
Based on my testing, memory is useful when:
- Long-term projects: You work on something over days/weeks
- Consistent preferences: You want the agent to remember your coding style, tech choices
- Context switching: You return to a project after a break
- Team usage: Shared context about project goals (though this has privacy implications)
Memory is less useful for:
- One-off questions: Quick answers don’t need persistence
- Privacy-sensitive work: Everything gets stored
- Multi-project work: Facts get mixed together
My Recommendation
DeerFlow’s memory system is one of its most practical features. It solves a real problem—AI agents forgetting everything—without requiring manual configuration.
Enable memory if:
- You work on long-term projects
- You want personalized responses
- You don’t mind your context being stored locally
Disable memory if:
- You only ask one-off questions
- You’re working on sensitive projects
- You don’t want any persistence
The local-first storage approach keeps data under your control. The extraction is automatic but configurable. For developers building agents that need to remember, DeerFlow’s memory system provides a solid foundation.
Summary
DeerFlow’s memory system persists user context across sessions using LLM-powered extraction. It stores user context summaries, confidence-scored facts, and conversation history. The top 15 facts plus context are injected into system prompts for personalized responses.
I tested it with a multi-day project and found it correctly remembered project details, tech stack choices, and goals. The main limitations are fact conflicts when preferences change and lack of per-project memory isolation.
For developers who’ve struggled with AI agents that forget everything, DeerFlow’s memory system provides a production-ready solution that works out of the box.
Final Words + More Resources
My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email.