How to Structure AI Agent Memory and Context for Persistent Workflows: A Tiered Approach
Problem
My AI agent kept forgetting everything between sessions. I’d spend 20 minutes giving it context about the project, then the next day it would start from scratch. When I tried to fix this by storing state in a database, debugging became a nightmare.
Here’s what my first attempt looked like:

class AgentWithMemory:
    def __init__(self):
        self.db = PostgresClient()
        self.context = []  # In-memory only, lost on restart

    async def process(self, input: str):
        # Load from database - but what format?
        history = self.db.query("SELECT * FROM agent_state WHERE session_id = ?")
        # How do I parse this blob?
        # What went wrong last time?
        response = await self.llm.generate(input, context=self.context)
        return response

When something went wrong, I had no idea what happened:

-- Database query returns binary blob
SELECT state FROM agent_state WHERE id = 'session-123';
-- What does this mean? Why did the agent fail?

What Happened?
I was treating agent memory like application state. But agent memory has different requirements:
- You need to read it - When debugging, you want to open a file and see what the agent was thinking
- You need version control - When the agent breaks, you want to see what changed
- You need diff capability - You want to compare memory states between runs
A database gives you none of this out of the box. When my agent started acting weird mid-task, I couldn’t figure out why because the state was locked in a binary blob.
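To make the diff requirement concrete, here is a minimal sketch of comparing two memory snapshots with Python’s standard `difflib`. The function name `diff_states` and the sample snapshots are illustrative assumptions, not code from my agent.

```python
import difflib
import json


def diff_states(before: dict, after: dict) -> str:
    """Return a unified diff between two JSON-serializable memory snapshots."""
    # sort_keys gives a stable line order, so only real changes show up
    before_lines = json.dumps(before, indent=2, sort_keys=True).splitlines()
    after_lines = json.dumps(after, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(
            before_lines, after_lines,
            fromfile="run-1", tofile="run-2", lineterm="",
        )
    )


# Hypothetical snapshots from two runs
run_1 = {"task": "Refactor authentication module", "current_step": "Reading jwt.py"}
run_2 = {"task": "Refactor authentication module", "current_step": "Extracting JWT logic"}
print(diff_states(run_1, run_2))
```

With state stored as plain text, this kind of comparison needs nothing beyond the standard library.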
A Reddit thread confirmed my suspicion: “Your agent’s memory architecture matters more than the model you pick. Flat files beat databases for agent state in most cases. Easier to debug, easier to version, and when something goes wrong you can actually open the file and see what happened.”
How to Solve It?
I switched to a three-tier memory architecture using flat files:

agent-memory/
├── hot/                 # Current session (in-memory)
│   └── context.json
├── warm/                # Recent activity (7 days)
│   ├── 2026-03-21.md
│   ├── 2026-03-20.md
│   └── ...
└── cold/                # Long-term learnings
    └── MEMORY.md

Each tier has different read/write patterns and serves different purposes.
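The layout can be created up front so that later reads and writes never hit a missing directory. This is a minimal sketch using `pathlib`; the helper name `init_memory_dirs` and the initial `MEMORY.md` header are my assumptions for illustration.

```python
from pathlib import Path


def init_memory_dirs(base_path: str) -> Path:
    """Create the hot/warm/cold layout if it does not exist yet."""
    base = Path(base_path)
    for tier in ("hot", "warm", "cold"):
        (base / tier).mkdir(parents=True, exist_ok=True)

    # Start with a near-empty long-term memory file so later reads never fail
    memory_file = base / "cold" / "MEMORY.md"
    if not memory_file.exists():
        memory_file.write_text("# Agent Memory\n\n", encoding="utf-8")
    return base
```

Calling `init_memory_dirs("./agent-memory")` more than once is safe: existing directories and an existing `MEMORY.md` are left untouched.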
Tier 1: Hot Memory (Current Session)
Hot memory is what the agent needs right now. It lives in the context window and gets updated constantly.

from dataclasses import dataclass, asdict
from pathlib import Path
from typing import List
import json


@dataclass
class HotMemory:
    """Current session context - fast read/write"""
    session_id: str
    task: str
    current_step: str
    files_modified: List[str]
    decisions: List[dict]
    pending_actions: List[str]

    def to_context_window(self) -> str:
        """Format for LLM context window"""
        return f"""## Current Session

Task: {self.task}
Step: {self.current_step}

Files modified: {', '.join(self.files_modified) or 'None'}

Recent decisions:
{self._format_decisions()}

Pending: {', '.join(self.pending_actions) or 'None'}"""

    def _format_decisions(self) -> str:
        if not self.decisions:
            return "No decisions yet"
        return '\n'.join([
            f"- {d['action']}: {d.get('reason', 'N/A')}"
            for d in self.decisions[-5:]  # Last 5 only
        ])

    def save(self, path: str):
        """Persist to disk (for crash recovery)"""
        # Create the parent directory (e.g. hot/) on first save
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        with open(path, 'w') as f:
            json.dump(asdict(self), f, indent=2)

    @classmethod
    def load(cls, path: str) -> 'HotMemory':
        """Recover from disk"""
        with open(path) as f:
            data = json.load(f)
        return cls(**data)

The hot memory file is tiny and readable:

{
  "session_id": "sess-20260321-001",
  "task": "Refactor authentication module",
  "current_step": "Extracting JWT logic",
  "files_modified": ["auth/jwt.py", "auth/utils.py"],
  "decisions": [
    {"action": "Split jwt.py", "reason": "File exceeded 500 lines"},
    {"action": "Keep RS256", "reason": "Existing tokens use this algorithm"}
  ],
  "pending_actions": ["Add tests for jwt_utils.py"]
}

When the agent crashes, I can see exactly what it was doing.
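One caveat for crash recovery: if the process dies in the middle of a write, the JSON file itself can end up truncated. A common way around this is to write to a temporary file and then rename it over the target. The sketch below assumes this pattern fits your setup; `atomic_write_json` is a hypothetical helper, not part of the classes in this post.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial write."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)

    # Write to a temp file in the same directory, then swap it into place
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, target)  # rename within one directory replaces the old file in one step
    except BaseException:
        # Clean up the temp file if anything failed before the rename
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

The temp file lives next to the target so the final rename stays on one filesystem.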
Tier 2: Warm Memory (Recent Activity)
Warm memory stores what the agent did recently. I use daily markdown files because they’re human-readable and naturally organize by time.

from datetime import datetime, timedelta
from pathlib import Path


class WarmMemory:
    """Recent activity log - daily markdown files"""

    def __init__(self, base_path: str, retention_days: int = 7):
        self.base_path = Path(base_path)
        self.retention_days = retention_days
        self.base_path.mkdir(parents=True, exist_ok=True)

    def log_activity(self, activity: dict):
        """Append activity to today's log"""
        # Take one timestamp so the file date and entry time agree around midnight
        now = datetime.now()
        log_file = self.base_path / f"{now.strftime('%Y-%m-%d')}.md"
        entry = self._format_entry(now.strftime("%H:%M:%S"), activity)

        with open(log_file, 'a') as f:
            f.write(entry + '\n')

    def _format_entry(self, timestamp: str, activity: dict) -> str:
        action = activity.get('action', 'unknown')
        details = activity.get('details', {})
        result = activity.get('result', 'in_progress')

        entry = f"### {timestamp} - {action}\n"
        if details:
            entry += "```\n"
            for k, v in details.items():
                entry += f"{k}: {v}\n"
            entry += "```\n"
        entry += f"Result: {result}\n"
        return entry

    def get_recent_context(self, days: int = 3) -> str:
        """Get summary of recent activity for context window"""
        context = "# Recent Activity\n\n"

        for i in range(days):
            date = (datetime.now() - timedelta(days=i)).strftime("%Y-%m-%d")
            log_file = self.base_path / f"{date}.md"

            if log_file.exists():
                context += f"## {date}\n"
                with open(log_file) as f:
                    context += f.read()[:2000]  # Limit size
                context += "\n"

        return context

    def cleanup_old_logs(self):
        """Remove logs older than retention period"""
        cutoff = datetime.now() - timedelta(days=self.retention_days)

        for log_file in self.base_path.glob("*.md"):
            date_str = log_file.stem
            try:
                file_date = datetime.strptime(date_str, "%Y-%m-%d")
                if file_date < cutoff:
                    log_file.unlink()
            except ValueError:
                pass  # Invalid filename, skip

The daily log files are easy to read:

### 10:15:32 - file_read
file: auth/jwt.py
purpose: Analyze current structure
Result: success

### 10:16:45 - decision
action: Split jwt.py
reason: File exceeded 500 lines
risk: Breaking existing imports
Result: committed

### 10:18:22 - refactor
files: auth/jwt.py, auth/jwt_utils.py
lines_moved: 234
Result: success

When something breaks, I can trace back through the log:

# Find what changed today
cat warm/2026-03-21.md | grep -A5 "refactor"

# Compare with yesterday
diff warm/2026-03-21.md warm/2026-03-20.md

Tier 3: Cold Memory (Long-Term Learnings)
Cold memory stores curated learnings that survive across sessions. This is where the agent remembers patterns, decisions, and knowledge.

from pathlib import Path
from datetime import datetime


class ColdMemory:
    """Long-term learnings - MEMORY.md"""

    def __init__(self, path: str):
        self.path = Path(path)

    def load(self) -> str:
        """Load long-term memory"""
        if not self.path.exists():
            return ""
        with open(self.path) as f:
            return f.read()

    def add_learning(self, category: str, learning: str):
        """Add a new learning (curated, not automatic)"""
        header = f"## {category}"
        lines = self.load().split('\n')

        # Find or create category section
        if header not in lines:
            lines += ['', header, '']

        # Add learning with timestamp (no trailing newline; join() adds it)
        timestamp = datetime.now().strftime("%Y-%m-%d")
        entry = f"- [{timestamp}] {learning}"

        # Insert after category header, skipping empty lines
        insert_idx = lines.index(header) + 1
        while insert_idx < len(lines) and lines[insert_idx].strip() == '':
            insert_idx += 1
        lines.insert(insert_idx, entry)

        # Create the cold/ directory on first write
        self.path.parent.mkdir(parents=True, exist_ok=True)
        with open(self.path, 'w') as f:
            f.write('\n'.join(lines))

    def get_relevant(self, topic: str) -> str:
        """Get learnings from sections whose header shares a keyword with the topic"""
        content = self.load()
        # Match on individual words, so a full task description such as
        # "Refactor authentication module" still finds "## Authentication"
        keywords = [w for w in topic.lower().split() if len(w) > 3]
        relevant = []

        in_section = False
        for line in content.split('\n'):
            if line.startswith('## '):
                header = line.lower()
                in_section = any(w in header for w in keywords)
            if in_section:
                relevant.append(line)

        return '\n'.join(relevant)

The MEMORY.md file looks like this:
# Agent Memory
Project-specific learnings and decisions.
## Authentication
- [2026-03-15] Always use RS256 for JWT - ES256 had compatibility issues with iOS 14
- [2026-03-18] Token refresh happens at 80% of expiry - not 50%, to avoid race conditions
- [2026-03-20] Never log JWT payload - may contain PII
## Database
- [2026-03-10] Use connection pooling with max 20 connections - more causes lock contention
- [2026-03-12] Always add index on foreign keys - forgot twice, caused slow queries
## Error Handling
- [2026-03-08] Never catch Exception without logging - masked a database connection issue for 3 days
- [2026-03-19] Retry on connection errors but not on validation errors

Why This Works
The three-tier architecture works because each tier matches a specific access pattern:

+----------------+------------------+-------------------+
| HOT            | WARM             | COLD              |
+----------------+------------------+-------------------+
| In-memory      | Daily files      | Curated file      |
| Read: every    | Read: session    | Read: task start  |
| Write: constant| Write: activity  | Write: manually   |
| Size: KB       | Size: MB         | Size: KB          |
| TTL: session   | TTL: 7 days      | TTL: forever      |
+----------------+------------------+-------------------+

Hot memory is for what’s happening now. The agent reads and writes this constantly. It’s small because it has to fit in the context window.
Warm memory is for what happened recently. The agent reads this at session start to recover context. It’s written automatically on every activity, so it has a TTL; without one it would grow without bound.
Cold memory is for what the agent has learned. It’s curated manually because you don’t want noise - only real learnings that should persist.
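Because cold memory is curated by hand, it helps to pull candidate entries out of the warm logs for review instead of rereading whole files. This is a minimal sketch, assuming the `### HH:MM:SS - action` entry headers that the warm log uses; `extract_entries` is a hypothetical helper name.

```python
import re
from typing import List

# Matches the entry headers written to the daily log, e.g. "### 10:16:45 - decision"
ENTRY_HEADER = re.compile(r"^### (\d{2}:\d{2}:\d{2}) - (\w+)\s*$")


def extract_entries(log_text: str, action: str = "decision") -> List[str]:
    """Return the full text of every log entry whose action matches."""
    entries, current, keep = [], [], False
    for line in log_text.splitlines():
        match = ENTRY_HEADER.match(line)
        if match:
            # A new entry starts: flush the previous one if it matched
            if keep and current:
                entries.append("\n".join(current).strip())
            current, keep = [line], match.group(2) == action
        elif keep:
            current.append(line)
    # Flush the last entry in the file
    if keep and current:
        entries.append("\n".join(current).strip())
    return entries
```

The output is only a shortlist. Deciding which decisions deserve a line in MEMORY.md stays a manual step.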
The Context Window Budget
One mistake I made early: treating the context window as unlimited storage. It’s not. It’s a budget.
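Treating the window as a budget can be made explicit in code. This is a minimal sketch, assuming roughly 4 characters per token and hypothetical section names, with example allocations for a 200K window (the remaining 50K is left free for output).

```python
from typing import Dict

# Per-section allocations in tokens (example values; 50K stays free for output)
BUDGET = {
    "system_prompt": 5_000,
    "hot_memory": 10_000,
    "warm_memory": 20_000,
    "cold_memory": 10_000,
    "task_instructions": 5_000,
    "code": 100_000,
}
CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


def fit_to_budget(sections: Dict[str, str], budget: Dict[str, int] = BUDGET) -> Dict[str, str]:
    """Truncate each section so its estimated token count stays within its allocation.

    Raises KeyError for a section that has no allocation.
    """
    fitted = {}
    for name, text in sections.items():
        max_chars = budget[name] * CHARS_PER_TOKEN
        fitted[name] = text[:max_chars]  # keeps the start of the text
    return fitted
```

Plain truncation is the bluntest option. For logs you may prefer to keep the most recent end of the text instead.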

Context Window Budget (200K tokens example)
+------------------------------------------+
| System prompt          | 5K tokens       |
| Hot memory             | 10K tokens      |
| Warm memory (recent)   | 20K tokens      |
| Cold memory            | 10K tokens      |
| Task instructions      | 5K tokens       |
| Code being worked on   | 100K tokens     |
| Output buffer          | 50K tokens      |
+------------------------------------------+

When the agent started doing weird things mid-task, it was usually because stale context was crowding out what mattered. The fix was explicit summarization at checkpoints:

from datetime import datetime


class ContextManager:
    def __init__(self, llm, max_tokens: int = 200000):
        self.llm = llm  # Any client exposing an async generate(prompt) method
        self.max_tokens = max_tokens
        self.checkpoints = []

    async def checkpoint(self, agent_state: dict):
        """Summarize and compress context at checkpoints"""
        # Estimate current token usage
        current_tokens = self._estimate_tokens(agent_state)

        if current_tokens > self.max_tokens * 0.8:  # 80% threshold
            # Summarize old context
            summary = await self._summarize_old_context(agent_state)

            # Replace old context with summary
            agent_state['context'] = summary + "\n\n" + agent_state['recent_context']

            # Log what was compressed
            self.checkpoints.append({
                'timestamp': datetime.now(),
                'tokens_before': current_tokens,
                'tokens_after': self._estimate_tokens(agent_state)
            })

    def _estimate_tokens(self, state: dict) -> int:
        """Rough estimate (~4 characters per token); use a real tokenizer if you have one"""
        return len(state.get('context', '')) // 4

    async def _summarize_old_context(self, state: dict) -> str:
        """Ask LLM to summarize old context"""
        summary_prompt = f"""Summarize the following context, keeping only information that might be needed later:

{state['old_context']}

Format as a list of key points."""
        response = await self.llm.generate(summary_prompt)
        return f"# Context Summary\n\n{response}"

Comparison: Flat Files vs Database
I tested both approaches and flat files won for these reasons:
| Aspect | Flat Files | Database |
|---|---|---|
| Debugging | Open file, read | Query, parse binary |
| Version control | git diff | Custom migration |
| Size visibility | ls -lh | SELECT pg_size_pretty |
| Human readable | Yes | No |
| Offline access | Yes | No (need connection) |
| Crash recovery | Cat file | Transaction logs |
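The size-visibility row can also be checked from Python when `ls -lh` is not at hand. This is a minimal sketch; `tier_sizes` is a hypothetical helper that assumes the `hot/`, `warm/`, `cold/` directory names used in this post.

```python
from pathlib import Path
from typing import Dict


def tier_sizes(base_path: str) -> Dict[str, int]:
    """Return total bytes stored under each memory tier directory."""
    sizes = {}
    for tier in ("hot", "warm", "cold"):
        tier_dir = Path(base_path) / tier
        if not tier_dir.is_dir():
            sizes[tier] = 0  # a missing tier simply counts as empty
            continue
        sizes[tier] = sum(p.stat().st_size for p in tier_dir.rglob("*") if p.is_file())
    return sizes
```

A warm tier that keeps growing despite the 7-day TTL is a quick hint that cleanup is not running.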
The database approach looked like this:

# This is what I tried first - it was painful
import json


class DatabaseMemory:
    def save_state(self, state: dict):
        # Serialize to JSON, then to blob
        blob = json.dumps(state).encode()
        self.db.execute(
            "INSERT INTO agent_state (id, state) VALUES (?, ?)",
            (state['id'], blob)
        )

    def load_state(self, id: str) -> dict:
        result = self.db.query(
            "SELECT state FROM agent_state WHERE id = ?",
            (id,)
        )
        # This returns a blob - good luck debugging
        return json.loads(result[0]['state'])

When something went wrong:

# I had to do this to debug
psql -d agent_db -c "SELECT encode(state, 'escape') FROM agent_state WHERE id = 'sess-123'"
# Output was still JSON blob, hard to read
# No git diff possible
# No history unless I built it myself

With flat files:

# Just open the file
cat hot/context.json

# See history
git log --oneline hot/context.json

# Compare versions
git diff HEAD~1 hot/context.json

Complete Implementation
Here’s the full memory system:

from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class AgentMemory:
    """Three-tier memory system for AI agents"""

    base_path: str
    retention_days: int = 7

    def __post_init__(self):
        # Make sure hot/ and cold/ exist before anything is written to them
        for tier in ("hot", "cold"):
            Path(self.base_path, tier).mkdir(parents=True, exist_ok=True)

        self.hot = HotMemory(
            session_id="",
            task="",
            current_step="",
            files_modified=[],
            decisions=[],
            pending_actions=[]
        )
        self.warm = WarmMemory(
            base_path=f"{self.base_path}/warm",
            retention_days=self.retention_days
        )
        self.cold = ColdMemory(
            path=f"{self.base_path}/cold/MEMORY.md"
        )

    def start_session(self, session_id: str, task: str):
        """Initialize a new session"""
        self.hot.session_id = session_id
        self.hot.task = task

        # Load relevant cold memory
        relevant_learnings = self.cold.get_relevant(task)

        # Get recent activity
        recent_activity = self.warm.get_recent_context(days=3)

        # Build initial context
        context = f"""# Session Context

## Current Task
{task}

## Relevant Learnings
{relevant_learnings}

## Recent Activity
{recent_activity}"""
        return context

    def update_progress(self, step: str, files: Optional[list[str]] = None):
        """Update current progress"""
        self.hot.current_step = step
        if files:
            self.hot.files_modified.extend(files)

        # Log to warm memory
        self.warm.log_activity({
            'action': 'progress',
            'details': {'step': step, 'files': files},
            'result': 'in_progress'
        })

    def record_decision(self, action: str, reason: str, risk: Optional[str] = None):
        """Record an important decision"""
        decision = {
            'action': action,
            'reason': reason,
            'risk': risk
        }
        self.hot.decisions.append(decision)

        # Log to warm memory
        self.warm.log_activity({
            'action': 'decision',
            'details': decision,
            'result': 'committed'
        })

    def learn(self, category: str, learning: str):
        """Add to long-term memory (curated)"""
        self.cold.add_learning(category, learning)

    def get_context_window_payload(self) -> str:
        """Build payload for context window"""
        # Limit cold memory size (kept outside the f-string so the comment
        # does not leak into the prompt)
        long_term = self.cold.load()[:5000]
        return f"""# Agent State

{self.hot.to_context_window()}

## Long-term Memory
{long_term}"""

    def save_state(self):
        """Persist state for crash recovery"""
        self.hot.save(f"{self.base_path}/hot/context.json")

    def restore_state(self) -> bool:
        """Restore from crash"""
        path = f"{self.base_path}/hot/context.json"
        if Path(path).exists():
            self.hot = HotMemory.load(path)
            return True
        return False

Usage:

# Initialize
memory = AgentMemory(base_path="./agent-memory")

# Start session
context = memory.start_session(
    session_id="sess-20260321-001",
    task="Refactor authentication module"
)

# During work
memory.update_progress("Extracting JWT logic", files=["auth/jwt.py"])
memory.record_decision("Split jwt.py", "File exceeded 500 lines", "Breaking imports")

# Save long-term learning (manual, curated)
memory.learn("Authentication", "Always use RS256 for JWT - ES256 had iOS 14 issues")

# Build context for LLM
context_payload = memory.get_context_window_payload()

Summary
In this post, I showed how to structure AI agent memory using three tiers: hot memory for current session context, warm memory for recent activity logs in daily markdown files, and cold memory for curated long-term learnings in MEMORY.md. The key point is using flat files instead of databases because they’re easier to debug, easier to version control, and let you directly inspect what went wrong when failures occur.
The memory architecture matters more than the model you pick. When your agent starts doing weird things mid-task, it’s usually drowning in stale context. With tiered flat-file memory, you can see exactly what the agent knows, trace its decisions, and debug failures by opening a file.
Final Words + More Resources
My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email.