How to Structure AI Agent Memory and Context for Persistent Workflows: A Tiered Approach

Mar 21, 2026

Problem

My AI agent kept forgetting everything between sessions. I’d spend 20 minutes giving it context about the project, then the next day it would start from scratch. When I tried to fix this by storing state in a database, debugging became a nightmare.

Here’s what my first attempt looked like:

class AgentWithMemory:
    def __init__(self, llm):
        self.db = PostgresClient()  # placeholder DB client
        self.llm = llm
        self.context = []  # In-memory only, lost on restart

    async def process(self, session_id: str, input: str):
        # Load from database - but what format?
        history = self.db.query(
            "SELECT * FROM agent_state WHERE session_id = ?", (session_id,)
        )
        # How do I parse this blob?
        # What went wrong last time?
        response = await self.llm.generate(input, context=self.context)
        return response

When something went wrong, I had no idea what happened:

-- Database query returns binary blob
SELECT state FROM agent_state WHERE id = 'session-123';
-- What does this mean? Why did the agent fail?

What Happened?

I was treating agent memory like application state. But agent memory has different requirements:

You need to read it: when debugging, you want to open a file and see what the agent was thinking.
You need version control: when the agent breaks, you want to see what changed.
You need diff capability: you want to compare memory states between runs.

The way I was using the database, with state serialized into an opaque blob, gave me none of this. When my agent started acting weird mid-task, I couldn’t figure out why, because the state was locked inside that blob.

A Reddit thread confirmed my suspicion: “Your agent’s memory architecture matters more than the model you pick. Flat files beat databases for agent state in most cases. Easier to debug, easier to version, and when something goes wrong you can actually open the file and see what happened.”

How to Solve It?

I switched to a three-tier memory architecture using flat files:

agent-memory/
├── hot/                    # Current session (in memory, snapshot on disk)
│   └── context.json
├── warm/                   # Recent activity (7 days)
│   ├── 2026-03-21.md
│   ├── 2026-03-20.md
│   └── ...
└── cold/                   # Long-term learnings
    └── MEMORY.md

Each tier has different read/write patterns and serves different purposes.
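Setting up this layout is a one-time step. Here is a minimal sketch, assuming the folder names above; init_memory_layout is a hypothetical helper, and the seed text simply mirrors the example MEMORY.md shown later in this post:

```python
from pathlib import Path


def init_memory_layout(base_path: str = "agent-memory") -> Path:
    """Create hot/, warm/ and cold/ plus an empty MEMORY.md if they are missing.

    Hypothetical helper: folder names follow the tree above.
    """
    base = Path(base_path)
    for tier in ("hot", "warm", "cold"):
        # exist_ok makes this safe to call on every startup
        (base / tier).mkdir(parents=True, exist_ok=True)

    memory_md = base / "cold" / "MEMORY.md"
    if not memory_md.exists():
        # Seed the curated file with a title; never overwrite existing learnings
        memory_md.write_text(
            "# Agent Memory\n\nProject-specific learnings and decisions.\n"
        )
    return base
```

Calling it at agent startup is safe because every step is idempotent: existing folders and an existing MEMORY.md are left alone.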

Tier 1: Hot Memory (Current Session)

Hot memory is what the agent needs right now. It lives in the context window and gets updated constantly.

from dataclasses import dataclass, asdict
from pathlib import Path
from typing import List
import json

@dataclass
class HotMemory:
    """Current session context - fast read/write"""
    session_id: str
    task: str
    current_step: str
    files_modified: List[str]
    decisions: List[dict]
    pending_actions: List[str]

    def to_context_window(self) -> str:
        """Format for LLM context window"""
        return f"""
## Current Session

Task: {self.task}
Step: {self.current_step}

Files modified: {', '.join(self.files_modified) or 'None'}

Recent decisions:
{self._format_decisions()}

Pending: {', '.join(self.pending_actions) or 'None'}
"""

    def _format_decisions(self) -> str:
        if not self.decisions:
            return "No decisions yet"
        return '\n'.join([
            f"- {d['action']}: {d.get('reason', 'N/A')}"
            for d in self.decisions[-5:]  # Last 5 only
        ])

    def save(self, path: str):
        """Persist to disk (for crash recovery)"""
        Path(path).parent.mkdir(parents=True, exist_ok=True)  # hot/ may not exist yet
        with open(path, 'w') as f:
            json.dump(asdict(self), f, indent=2)

    @classmethod
    def load(cls, path: str) -> 'HotMemory':
        """Recover from disk"""
        with open(path) as f:
            data = json.load(f)
        return cls(**data)

The hot memory file is tiny and readable:

{
  "session_id": "sess-20260321-001",
  "task": "Refactor authentication module",
  "current_step": "Extracting JWT logic",
  "files_modified": ["auth/jwt.py", "auth/utils.py"],
  "decisions": [
    {"action": "Split jwt.py", "reason": "File exceeded 500 lines"},
    {"action": "Keep RS256", "reason": "Existing tokens use this algorithm"}
  ],
  "pending_actions": ["Add tests for jwt_utils.py"]
}

When the agent crashes, I can see exactly what it was doing.
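That only holds if the snapshot itself survives the crash. save() rewrites context.json in place, so a crash in the middle of json.dump can leave a truncated file at exactly the moment you need it. A minimal sketch of an atomic alternative: write to a temporary file in the same directory, flush it to disk, then swap it in with os.replace, which the Python docs describe as atomic on POSIX systems. atomic_write_json is a hypothetical helper name:

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON so readers see either the old file or the new one, never half."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)

    # Temp file lives next to the target so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, target)  # atomic rename over the old snapshot
    except BaseException:
        # Anything failed before the swap: remove the partial temp file
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this in place, HotMemory.save() could become a one-liner: atomic_write_json(path, asdict(self)).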

Tier 2: Warm Memory (Recent Activity)

Warm memory stores what the agent did recently. I use daily markdown files because they’re human-readable and naturally organize by time.

from datetime import datetime, timedelta
from pathlib import Path

class WarmMemory:
    """Recent activity log - daily markdown files"""

    def __init__(self, base_path: str, retention_days: int = 7):
        self.base_path = Path(base_path)
        self.retention_days = retention_days
        self.base_path.mkdir(parents=True, exist_ok=True)

    def log_activity(self, activity: dict):
        """Append activity to today's log"""
        today = datetime.now().strftime("%Y-%m-%d")
        log_file = self.base_path / f"{today}.md"

        timestamp = datetime.now().strftime("%H:%M:%S")
        entry = self._format_entry(timestamp, activity)

        with open(log_file, 'a') as f:
            f.write(entry + '\n')

    def _format_entry(self, timestamp: str, activity: dict) -> str:
        action = activity.get('action', 'unknown')
        details = activity.get('details', {})
        result = activity.get('result', 'in_progress')

        entry = f"### {timestamp} - {action}\n"
        if details:
            entry += "```\n"
            for k, v in details.items():
                entry += f"{k}: {v}\n"
            entry += "```\n"
        entry += f"Result: {result}\n"
        return entry

    def get_recent_context(self, days: int = 3) -> str:
        """Get summary of recent activity for context window"""
        context = "# Recent Activity\n\n"

        for i in range(days):
            date = (datetime.now() - timedelta(days=i)).strftime("%Y-%m-%d")
            log_file = self.base_path / f"{date}.md"

            if log_file.exists():
                context += f"## {date}\n"
                with open(log_file) as f:
                    context += f.read()[-2000:]  # Limit size: keep the newest 2,000 chars
                context += "\n"

        return context

    def cleanup_old_logs(self):
        """Remove logs older than retention period"""
        cutoff = datetime.now() - timedelta(days=self.retention_days)

        for log_file in self.base_path.glob("*.md"):
            date_str = log_file.stem
            try:
                file_date = datetime.strptime(date_str, "%Y-%m-%d")
                if file_date < cutoff:
                    log_file.unlink()
            except ValueError:
                pass  # Invalid filename, skip

The daily log files are easy to read:

### 10:15:32 - file_read

file: auth/jwt.py
purpose: Analyze current structure

Result: success

### 10:16:45 - decision

action: Split jwt.py
reason: File exceeded 500 lines
risk: Breaking existing imports

Result: committed

### 10:18:22 - refactor

files: auth/jwt.py, auth/jwt_utils.py
lines_moved: 234

Result: success

When something breaks, I can trace back through the log:

# Find what changed today
grep -A5 "refactor" warm/2026-03-21.md

# Compare with yesterday
diff warm/2026-03-21.md warm/2026-03-20.md
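grep and diff work on the raw text. For anything more structured, you can parse a daily log back into records. A minimal sketch, assuming the exact format _format_entry writes: a "### HH:MM:SS - action" header, an optional triple-backtick block of "key: value" lines, and a "Result:" line. parse_daily_log is a hypothetical helper, and all values come back as strings:

```python
import re
from typing import Dict, List, Optional

# Header written by _format_entry: "### HH:MM:SS - action"
HEADER = re.compile(r"^### (\d{2}:\d{2}:\d{2}) - (.+)$")
FENCE = "`" * 3  # the triple-backtick line around each entry's details


def parse_daily_log(text: str) -> List[Dict]:
    """Turn one warm-memory daily log back into a list of entry dicts."""
    entries: List[Dict] = []
    current: Optional[Dict] = None
    in_details = False

    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = {"time": header.group(1), "action": header.group(2),
                       "details": {}, "result": None}
            entries.append(current)
            in_details = False
        elif current is None:
            continue  # ignore anything before the first entry
        elif line.strip() == FENCE:
            in_details = not in_details  # opening or closing fence
        elif in_details and ": " in line:
            key, value = line.split(": ", 1)  # split on the first ": " only
            current["details"][key] = value
        elif line.startswith("Result: "):
            current["result"] = line[len("Result: "):]

    return entries
```

For example, filtering parse_daily_log(Path("warm/2026-03-21.md").read_text()) on action == "decision" lists the day's decisions in order.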

Tier 3: Cold Memory (Long-Term Learnings)

Cold memory stores curated learnings that survive across sessions. This is where the agent remembers patterns, decisions, and knowledge.

from pathlib import Path
from datetime import datetime

class ColdMemory:
    """Long-term learnings - MEMORY.md"""

    def __init__(self, path: str):
        self.path = Path(path)

    def load(self) -> str:
        """Load long-term memory"""
        if not self.path.exists():
            return ""
        with open(self.path) as f:
            return f.read()

    def add_learning(self, category: str, learning: str):
        """Add a new learning (curated, not automatic)"""
        header = f"## {category}"
        lines = self.load().split('\n')

        # Match the header line exactly: a substring test would treat
        # "## Auth" as present when only "## Authentication" exists
        if header not in lines:
            while lines and lines[-1].strip() == '':
                lines.pop()  # drop trailing blank lines before a new section
            lines += ['', header, ''] if lines else [header, '']

        # No trailing newline: lines are joined with '\n' below
        timestamp = datetime.now().strftime("%Y-%m-%d")
        entry = f"- [{timestamp}] {learning}"

        # Find the end of this section (next "## " header or end of file)
        start = lines.index(header) + 1
        end = start
        while end < len(lines) and not lines[end].startswith('## '):
            end += 1
        # Step back over trailing blank lines so the entry joins the list
        while end > start + 1 and lines[end - 1].strip() == '':
            end -= 1
        # Append at the end of the section: entries stay in chronological order
        lines.insert(end, entry)

        self.path.parent.mkdir(parents=True, exist_ok=True)  # cold/ may not exist yet
        with open(self.path, 'w') as f:
            f.write('\n'.join(lines).rstrip('\n') + '\n')

    def get_relevant(self, topic: str) -> str:
        """Get learnings from sections whose name appears in the topic (or vice versa)"""
        topic_l = topic.lower()
        relevant = []

        in_section = False
        for line in self.load().split('\n'):
            if line.startswith('## '):
                # Lets "Authentication" match a task like "Refactor authentication module"
                name = line[3:].strip().lower()
                in_section = bool(name) and (name in topic_l or topic_l in name)
            if in_section:
                relevant.append(line)

        return '\n'.join(relevant)

The MEMORY.md file looks like this:

# Agent Memory

Project-specific learnings and decisions.

## Authentication

- [2026-03-15] Always use RS256 for JWT - ES256 had compatibility issues with iOS 14
- [2026-03-18] Token refresh happens at 80% of expiry - not 50%, to avoid race conditions
- [2026-03-20] Never log JWT payload - may contain PII

## Database

- [2026-03-10] Use connection pooling with max 20 connections - more causes lock contention
- [2026-03-12] Always add index on foreign keys - forgot twice, caused slow queries

## Error Handling

- [2026-03-08] Never catch Exception without logging - masked a database connection issue for 3 days
- [2026-03-19] Retry on connection errors but not on validation errors

Why This Works

The three-tier architecture works because each tier matches a specific access pattern:

+-------------------+---------------------+------------------+
|        HOT        |        WARM         |       COLD       |
+-------------------+---------------------+------------------+
| In-memory         | Daily files         | Curated file     |
| Read: constantly  | Read: session start | Read: task start |
| Write: constantly | Write: per activity | Write: manually  |
| Size: KB          | Size: MB            | Size: KB         |
| TTL: session      | TTL: 7 days         | TTL: forever     |
+-------------------+---------------------+------------------+

Hot memory is for what’s happening now. The agent reads and writes this constantly. It’s small because it has to fit in the context window.

Warm memory is for what happened recently. The agent reads it at session start to recover context. It’s written automatically and has a TTL, because otherwise it would grow without bound.

Cold memory is for what the agent has learned. It’s curated manually because you don’t want noise - only real learnings that should persist.

The Context Window Budget

One mistake I made early: treating the context window as unlimited storage. It’s not. It’s a budget.

Context Window Budget (200K tokens example)
+------------------------------------------+
| System prompt        | 5K tokens         |
| Hot memory           | 10K tokens        |
| Warm memory (recent) | 20K tokens        |
| Cold memory          | 10K tokens        |
| Task instructions    | 5K tokens         |
| Code being worked on | 100K tokens       |
| Output buffer        | 50K tokens        |
+------------------------------------------+
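The budget above can be enforced mechanically before each LLM call. A minimal sketch, assuming the common rule of thumb of roughly 4 characters per token for English text (a real tokenizer would be more accurate); the section names, over_budget and fit_to_budget are hypothetical:

```python
from typing import Dict, List

# Rough heuristic (assumption): about 4 characters per token for English text
CHARS_PER_TOKEN = 4
MAX_TOKENS = 200_000
OUTPUT_BUFFER = 50_000  # reserved for the model's response

# Per-section input budgets in tokens, from the 200K example above
BUDGET: Dict[str, int] = {
    "system_prompt": 5_000,
    "hot_memory": 10_000,
    "warm_memory": 20_000,
    "cold_memory": 10_000,
    "task_instructions": 5_000,
    "code": 100_000,
}

# Inputs plus the output buffer must add up to the window size
assert sum(BUDGET.values()) + OUTPUT_BUFFER == MAX_TOKENS


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def over_budget(sections: Dict[str, str]) -> List[str]:
    """Names of sections whose estimated size exceeds their budget."""
    return [name for name, text in sections.items()
            if estimate_tokens(text) > BUDGET[name]]


def fit_to_budget(sections: Dict[str, str]) -> Dict[str, str]:
    """Truncate each section to its budget, keeping the start of the text."""
    return {name: text[:BUDGET[name] * CHARS_PER_TOKEN]
            for name, text in sections.items()}
```

Keeping the start suits the system prompt and task instructions; for an activity log you might prefer to keep the end instead.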

When the agent started doing weird things mid-task, it was usually because stale context was crowding out what mattered. The fix was explicit summarization at checkpoints:

import json
from datetime import datetime

class ContextManager:
    def __init__(self, llm, max_tokens: int = 200000):
        self.llm = llm  # any client exposing async generate(prompt) -> str
        self.max_tokens = max_tokens
        self.checkpoints = []

    def _estimate_tokens(self, agent_state: dict) -> int:
        """Rough estimate: ~4 characters per token for English text"""
        return len(json.dumps(agent_state, default=str)) // 4

    async def checkpoint(self, agent_state: dict):
        """Summarize and compress context at checkpoints"""

        # Estimate current token usage
        current_tokens = self._estimate_tokens(agent_state)

        if current_tokens > self.max_tokens * 0.8:  # 80% threshold
            # Summarize old context
            summary = await self._summarize_old_context(agent_state)

            # Replace old context with the summary; keep recent context verbatim
            agent_state['context'] = summary + "\n\n" + agent_state['recent_context']
            agent_state['old_context'] = ""  # now covered by the summary

            # Log what was compressed
            self.checkpoints.append({
                'timestamp': datetime.now(),
                'tokens_before': current_tokens,
                'tokens_after': self._estimate_tokens(agent_state)
            })

    async def _summarize_old_context(self, state: dict) -> str:
        """Ask LLM to summarize old context"""
        summary_prompt = f"""
Summarize the following context, keeping only information that might be needed later:

{state['old_context']}

Format as a list of key points.
"""
        response = await self.llm.generate(summary_prompt)
        return f"# Context Summary\n\n{response}"
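The checkpoint reads state['old_context'] and state['recent_context'] but doesn’t show how the history gets divided between them. One way to do it, sketched under the same ~4 characters-per-token assumption: keep the newest messages verbatim up to a token budget and treat everything older as the part to summarize. split_context and its list-of-messages input are hypothetical:

```python
from typing import List, Tuple

CHARS_PER_TOKEN = 4  # rough heuristic for English text (assumption)


def split_context(messages: List[str], recent_budget_tokens: int) -> Tuple[str, str]:
    """Split a message history (oldest first) into (old_context, recent_context).

    Walks backwards from the newest message, keeping messages verbatim until
    the recent budget would be exceeded; everything older is old_context,
    which is what a checkpoint would hand to the summarizer.
    """
    used = 0
    cut = len(messages)  # index of the first message kept verbatim
    for i in range(len(messages) - 1, -1, -1):
        cost = len(messages[i]) // CHARS_PER_TOKEN + 1  # +1 so tiny messages still count
        if used + cost > recent_budget_tokens:
            break
        used += cost
        cut = i
    old = "\n".join(messages[:cut])
    recent = "\n".join(messages[cut:])
    return old, recent
```

Splitting on whole messages rather than characters keeps the recent part readable, at the cost of occasionally leaving some of the budget unused.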

Comparison: Flat Files vs Database

I tested both approaches and flat files won for these reasons:

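+-----------------+-----------------+----------------------------+
| Aspect          | Flat Files      | Database (Postgres)        |
+-----------------+-----------------+----------------------------+
| Debugging       | Open file, read | Query, parse binary        |
| Version control | git diff        | Build your own history     |
| Size visibility | ls -lh          | SELECT pg_size_pretty(...) |
| Human readable  | Yes             | No                         |
| Offline access  | Yes             | No (need connection)       |
| Crash recovery  | cat file        | Transaction logs           |
+-----------------+-----------------+----------------------------+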

The database approach looked like this:

# This is what I tried first - it was painful
class DatabaseMemory:
    def save_state(self, state: dict):
        # Serialize to JSON, then to blob
        blob = json.dumps(state).encode()
        self.db.execute(
            "INSERT INTO agent_state (id, state) VALUES (?, ?)",
            (state['id'], blob)
        )

    def load_state(self, id: str) -> dict:
        result = self.db.query(
            "SELECT state FROM agent_state WHERE id = ?",
            (id,)
        )
        # This returns a blob - good luck debugging
        return json.loads(result[0]['state'])

When something went wrong:

# I had to do this to debug
psql -d agent_db -c "SELECT encode(state, 'escape') FROM agent_state WHERE id = 'sess-123'"
# Output was still JSON blob, hard to read
# No git diff possible
# No history unless I built it myself

With flat files:

# Just open the file
cat hot/context.json

# See history
git log --oneline hot/context.json

# Compare versions
git diff HEAD~1 hot/context.json
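Those git commands only show history if the memory files are actually committed. A minimal sketch that snapshots the memory folder, for example after each checkpoint, assuming git is installed and agent-memory/ is its own repository (it runs git init on first use); commit_memory_snapshot is a hypothetical name, and it calls only standard git subcommands:

```python
import subprocess
from pathlib import Path


def commit_memory_snapshot(repo: str, message: str) -> bool:
    """Stage everything in the memory repo and commit it if anything changed.

    Returns True if a commit was made, False if there was nothing to commit.
    """
    def git(*args: str) -> subprocess.CompletedProcess:
        return subprocess.run(["git", "-C", repo, *args],
                              capture_output=True, text=True)

    if not (Path(repo) / ".git").exists():
        git("init").check_returncode()  # first run: make the folder a repo

    git("add", "-A").check_returncode()
    # `git diff --cached --quiet` exits 1 when changes are staged, 0 when none
    if git("diff", "--cached", "--quiet").returncode == 0:
        return False
    git("commit", "-m", message).check_returncode()
    return True
```

Skipping empty commits keeps git log readable, so each entry corresponds to a real change in what the agent knows.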

Complete Implementation

Here’s the full memory system:

from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# HotMemory, WarmMemory and ColdMemory are the classes defined above

@dataclass
class AgentMemory:
    """Three-tier memory system for AI agents"""

    base_path: str
    retention_days: int = 7

    def __post_init__(self):
        self.hot = HotMemory(
            session_id="",
            task="",
            current_step="",
            files_modified=[],
            decisions=[],
            pending_actions=[]
        )
        self.warm = WarmMemory(
            base_path=f"{self.base_path}/warm",
            retention_days=self.retention_days
        )
        self.cold = ColdMemory(
            path=f"{self.base_path}/cold/MEMORY.md"
        )

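    def start_session(self, session_id: str, task: str):
        """Initialize a new session"""
        # Start from clean hot memory so files and decisions from a
        # previous session don't leak into this one
        self.hot = HotMemory(
            session_id=session_id,
            task=task,
            current_step="",
            files_modified=[],
            decisions=[],
            pending_actions=[]
        )

        # Enforce the warm-memory TTL before reading recent activity
        self.warm.cleanup_old_logs()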

        # Load relevant cold memory
        relevant_learnings = self.cold.get_relevant(task)

        # Get recent activity
        recent_activity = self.warm.get_recent_context(days=3)

        # Build initial context
        context = f"""
# Session Context

## Current Task
{task}

## Relevant Learnings
{relevant_learnings}

## Recent Activity
{recent_activity}
"""
        return context

    def update_progress(self, step: str, files: Optional[list[str]] = None):
        """Update current progress"""
        self.hot.current_step = step
        if files:
            self.hot.files_modified.extend(files)

        # Log to warm memory
        self.warm.log_activity({
            'action': 'progress',
            'details': {'step': step, 'files': files},
            'result': 'in_progress'
        })

    def record_decision(self, action: str, reason: str, risk: Optional[str] = None):
        """Record an important decision"""
        decision = {
            'action': action,
            'reason': reason,
            'risk': risk
        }
        self.hot.decisions.append(decision)

        # Log to warm memory
        self.warm.log_activity({
            'action': 'decision',
            'details': decision,
            'result': 'committed'
        })

    def learn(self, category: str, learning: str):
        """Add to long-term memory (curated)"""
        self.cold.add_learning(category, learning)

    def get_context_window_payload(self) -> str:
        """Build payload for context window"""
        # Slice outside the f-string: a comment inside it would be sent to the LLM
        cold = self.cold.load()[:5000]  # Limit cold memory size (characters)
        return f"""
# Agent State

{self.hot.to_context_window()}

## Long-term Memory
{cold}
"""

    def save_state(self):
        """Persist state for crash recovery"""
        self.hot.save(f"{self.base_path}/hot/context.json")

    def restore_state(self) -> bool:
        """Restore from crash"""
        path = f"{self.base_path}/hot/context.json"
        if Path(path).exists():
            self.hot = HotMemory.load(path)
            return True
        return False

Usage:

# Initialize
memory = AgentMemory(base_path="./agent-memory")

# Start session
context = memory.start_session(
    session_id="sess-20260321-001",
    task="Refactor authentication module"
)

# During work
memory.update_progress("Extracting JWT logic", files=["auth/jwt.py"])
memory.record_decision("Split jwt.py", "File exceeded 500 lines", "Breaking imports")

# Save long-term learning (manual, curated)
memory.learn("Authentication", "Always use RS256 for JWT - ES256 had iOS 14 issues")

# Build context for LLM
context_payload = memory.get_context_window_payload()

Summary

In this post, I showed how to structure AI agent memory using three tiers: hot memory for current session context, warm memory for recent activity logs in daily markdown files, and cold memory for curated long-term learnings in MEMORY.md. The key point is using flat files instead of databases because they’re easier to debug, easier to version control, and let you directly inspect what went wrong when failures occur.

In my experience, the memory architecture matters more than the model you pick. When your agent starts doing weird things mid-task, it’s usually drowning in stale context. With tiered flat-file memory, you can see exactly what the agent knows, trace its decisions, and debug failures by opening a file.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email: Email me

Here are the most important links from this article, along with some further resources that will help you in this area:

👨‍💻 Reddit: Why flat files beat databases for agent state
👨‍💻 Context Window Best Practices

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!