How to Handle Memory and State in AI Agents for Production Reliability

Mar 19, 2026

I built an AI agent that worked perfectly in development. Then I deployed it to production, and it fell apart. Conversations lost context between requests. Failures were impossible to debug because I couldn’t reproduce them. Tool selection decisions were a black box.

The root cause? I had neglected memory and state management.

The Problem

AI agents make decisions based on context. Without proper persistence, that context vanishes between runs. In production, this creates several issues:

Unreproducible failures: When an agent fails, you have no way to trace what happened
Lost progress: Multi-step workflows lose their place when interrupted
Silent failures: Nobody tracks memory consumption or state transitions
Trust deficit: Stakeholders can’t verify that agents behave predictably

A Reddit discussion on AI agent stacks captured this perfectly: “Memory is the part nobody tracks, and agents fail quickly without it.”

Another developer noted: “Most frameworks let the model decide which tool to call… In production it means you cannot reproduce failures, cannot trace decisions, and cannot trust outputs without manually checking them.”

The consensus? Memory and state management matter more than the framework itself.

Environment

Python 3.11
LangGraph for state workflow management
ChromaDB for persistent vector storage
Pydantic for schema validation

What Happened

My original agent ran everything in-memory. Each request started fresh, with no connection to previous interactions. This worked for simple queries but broke down for:

Multi-turn conversations where context mattered
Long-running tasks that could be interrupted
Debugging production failures
Understanding why the model made specific decisions

I needed a system that could persist state across sessions, replay decisions, and provide observability.

How to Solve

Step 1: Define Your State Schema

First, decide what data needs to persist between turns. I use Pydantic models for validation:

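from datetime import datetime
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, ConfigDict, Field

class AgentState(BaseModel):
    # Pydantic v2 style config (the later steps use model_dump/model_copy, which are v2 APIs)
    model_config = ConfigDict(arbitrary_types_allowed=True)

    # default_factory builds a fresh value for every instance
    messages: List[Dict[str, Any]] = Field(default_factory=list)
    current_task: Optional[str] = None
    tool_outputs: Dict[str, Any] = Field(default_factory=dict)
    retry_count: int = 0
    last_decision: Optional[str] = None
    # A plain datetime.now() default would be evaluated once, at class definition
    created_at: datetime = Field(default_factory=datetime.now)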

This schema captures conversation history, the current task, tool outputs, and retry state.
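
To illustrate what validation buys you when state is written to disk and read back, here is a minimal sketch. It assumes Pydantic v2 and uses a trimmed-down copy of the schema; `load_state` is a hypothetical helper name, not part of any library.

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field, ValidationError


class AgentState(BaseModel):
    """Trimmed-down copy of the schema above, for illustration only."""
    messages: List[Dict[str, Any]] = Field(default_factory=list)
    current_task: Optional[str] = None
    retry_count: int = 0


def load_state(raw_json: str) -> Optional[AgentState]:
    """Parse persisted JSON into AgentState; return None if it does not validate."""
    try:
        return AgentState.model_validate_json(raw_json)
    except ValidationError:
        return None


# Round trip: serialize a valid state, then restore it
good = AgentState(current_task="Analyze sales data", retry_count=1)
restored = load_state(good.model_dump_json())

# A payload with the wrong type for retry_count is rejected instead of loaded
corrupted = load_state('{"retry_count": "not-a-number"}')
```

Returning None on bad input is only one option. In a real agent you may prefer to log the validation error and fall back to a fresh state.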

Step 2: Set Up Persistent Memory with ChromaDB

ChromaDB stores long-term memory with semantic retrieval:

import uuid
from typing import Optional

import chromadb

# Initialize ChromaDB with on-disk persistence.
# Chroma 0.4+ replaced the legacy Settings(chroma_db_impl="duckdb+parquet")
# setup with PersistentClient.
chroma = chromadb.PersistentClient(path="./chroma_data")

memory_collection = chroma.get_or_create_collection(
    name="agent_memory",
    metadata={"hnsw:space": "cosine"}
)

def store_memory(session_id: str, content: str, metadata: Optional[dict] = None):
    """Store a memory with semantic indexing."""
    memory_collection.add(
        documents=[content],
        # Keep session_id in the metadata so memories can be filtered per session
        metadatas=[{**(metadata or {}), "session_id": session_id}],
        # A UUID suffix avoids loading the whole collection just to count ids
        ids=[f"{session_id}-{uuid.uuid4()}"]
    )

def retrieve_relevant_memory(query: str, n_results: int = 3):
    """Find semantically similar past interactions."""
    results = memory_collection.query(
        query_texts=[query],
        n_results=n_results
    )
    return results
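
The query result is nested: Chroma returns one inner list per query text. The sketch below shows one way to flatten a single-query result into lines you can put in a prompt. `format_memories` and the `max_distance` cutoff are hypothetical names and values chosen for illustration, and the sample dict is hand-written to mimic the shape of a real result.

```python
from typing import Any, Dict, List


def format_memories(results: Dict[str, Any], max_distance: float = 0.5) -> List[str]:
    """Flatten a Chroma-style query result for a single query into context lines.

    Index 0 holds the hits for the only query text we sent.
    max_distance is an assumed cutoff; tune it for your embedding model.
    """
    documents = results["documents"][0]
    distances = results["distances"][0]
    return [
        f"- {doc} (distance={dist:.2f})"
        for doc, dist in zip(documents, distances)
        if dist <= max_distance
    ]


# Hand-written stand-in with the same nesting as a real query() result
sample = {
    "ids": [["s1-a", "s1-b"]],
    "documents": [["User prefers weekly reports", "User asked about churn"]],
    "distances": [[0.12, 0.81]],
}
lines = format_memories(sample)
```

Filtering by distance keeps weakly related memories out of the prompt, which matters once the collection grows.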

Step 3: Build the State Graph with LangGraph

LangGraph provides explicit state graphs with built-in persistence:

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from state_schema import AgentState

# Build the workflow graph
workflow = StateGraph(AgentState)

def process_input(state: AgentState) -> dict:
    """Process input and retrieve relevant context."""
    from memory_store import retrieve_relevant_memory

    if state.current_task:
        # Check memory for relevant context
        results = retrieve_relevant_memory(state.current_task)
        # query() returns one list of documents per query text; we sent one query
        documents = results["documents"][0]
        context_msg = f"Relevant past context: {documents}"

        return {
            "messages": [*state.messages, {"role": "system", "content": context_msg}]
        }
    return {}

def execute_tools(state: AgentState) -> dict:
    """Execute required tools and capture outputs."""
    # Tool execution logic here
    outputs = {"tool_result": "example output"}
    return {"tool_outputs": {**state.tool_outputs, **outputs}}

def generate_response(state: AgentState) -> dict:
    """Generate final response based on state."""
    # Response generation logic here
    return {"last_decision": "completed"}

# Add nodes to workflow
workflow.add_node("process", process_input)
workflow.add_node("execute", execute_tools)
workflow.add_node("respond", generate_response)

# Define edges
workflow.set_entry_point("process")
workflow.add_edge("process", "execute")
workflow.add_edge("execute", "respond")
workflow.add_edge("respond", END)
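
Each node above returns only the keys it changed, and the graph merges that partial update into the shared state. The following plain-Python stand-in sketches that merge for a linear graph. It is not LangGraph's runtime: it ignores reducers, branching, and checkpoints, and `run_linear` is a hypothetical helper.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], Dict[str, Any]]


def process(state: State) -> Dict[str, Any]:
    return {"messages": [*state["messages"], {"role": "system", "content": "context"}]}


def execute(state: State) -> Dict[str, Any]:
    return {"tool_outputs": {**state["tool_outputs"], "tool_result": "example output"}}


def respond(state: State) -> Dict[str, Any]:
    return {"last_decision": "completed"}


def run_linear(nodes: List[Node], state: State) -> State:
    """Run nodes in order, merging each partial update into a new state dict."""
    for node in nodes:
        # Keys returned by the node overwrite old values; all other keys carry over
        state = {**state, **node(state)}
    return state


initial = {
    "messages": [],
    "tool_outputs": {},
    "last_decision": None,
    "current_task": "Analyze sales data",
}
final = run_linear([process, execute, respond], initial)
```

Because every step produces a new state dict rather than mutating the old one, the state before and after each node can be compared or stored.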

Step 4: Add Checkpointing for Production

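Checkpointing enables resumption from any saved step. Note that MemorySaver keeps checkpoints in process memory, which is fine for development but does not survive a restart. For production, swap in a database-backed checkpointer (LangGraph ships SQLite and Postgres savers as separate packages):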

from langgraph.checkpoint.memory import MemorySaver
from agent_graph import workflow

# Enable checkpointing
checkpointer = MemorySaver()
app = workflow.compile(checkpointer=checkpointer)

# Run with thread_id for session persistence
result = app.invoke(
    {"current_task": "Analyze sales data"},
    config={"configurable": {"thread_id": "user-123-session"}}
)
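
To make the idea of a durable checkpoint concrete, here is a minimal sketch built only on the standard library's sqlite3 module. It is not LangGraph's checkpointer and does not share its schema. `SqliteCheckpointStore`, its table layout, and its method names are all assumptions made for illustration.

```python
import json
import sqlite3
from typing import Any, Dict, Optional


class SqliteCheckpointStore:
    """Hypothetical minimal checkpoint store: one row per (thread_id, step)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT, step INTEGER, node TEXT, state TEXT, "
            "PRIMARY KEY (thread_id, step))"
        )

    def save(self, thread_id: str, node: str, state: Dict[str, Any]) -> int:
        """Append the state recorded after `node` ran and return its step number."""
        row = self.conn.execute(
            "SELECT COALESCE(MAX(step), -1) + 1 FROM checkpoints WHERE thread_id = ?",
            (thread_id,),
        ).fetchone()
        step = row[0]
        self.conn.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?, ?)",
            (thread_id, step, node, json.dumps(state)),
        )
        self.conn.commit()
        return step

    def latest(self, thread_id: str) -> Optional[Dict[str, Any]]:
        """Return the most recent checkpoint for a thread, or None if there is none."""
        row = self.conn.execute(
            "SELECT node, state FROM checkpoints WHERE thread_id = ? "
            "ORDER BY step DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        if row is None:
            return None
        return {"node": row[0], "state": json.loads(row[1])}


store = SqliteCheckpointStore()  # pass a file path instead to survive restarts
store.save("user-123-session", "process", {"current_task": "Analyze sales data"})
store.save("user-123-session", "execute", {"tool_outputs": {"tool_result": "example output"}})
resume_point = store.latest("user-123-session")
```

Keeping every step, rather than overwriting a single row per thread, is what allows rolling back to an earlier step as well as resuming from the latest one.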

Step 5: Implement Tracing for Observability

Log all state transitions for debugging:

from datetime import datetime
import uuid
from state_schema import AgentState

def trace_state_transition(
    from_state: AgentState,
    to_state: AgentState,
    decision: str
) -> dict:
    """Log all state changes for reproducibility."""
    return {
        "trace_id": str(uuid.uuid4()),
        "timestamp": datetime.now().isoformat(),
        "from_state": from_state.model_dump(),
        "to_state": to_state.model_dump(),
        "decision": decision
    }

# Example usage in a node
def traced_node(state: AgentState) -> dict:
    old_state = state.model_copy()
    # ... do work ...
    new_state = AgentState(**{**state.model_dump(), "retry_count": state.retry_count + 1})

    trace = trace_state_transition(old_state, new_state, "retry_attempted")
    # `decision` is not defined in this scope; read it from the trace record
    print(f"[TRACE] {trace['trace_id']}: {trace['decision']}")

    return {"retry_count": new_state.retry_count}
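
Printing traces is enough for a demo, but reproducing a production failure needs traces that outlive the process. One simple option, sketched here with only the standard library, is an append-only JSON Lines file. `append_trace`, `read_traces`, and the file name are hypothetical. `default=str` is there because a dumped state can contain datetime values, which json cannot serialize on its own.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterator


def append_trace(path: str, trace: Dict[str, Any]) -> None:
    """Append one trace record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(trace, default=str) + "\n")


def read_traces(path: str, decision: str) -> Iterator[Dict[str, Any]]:
    """Yield stored traces whose decision field matches."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["decision"] == decision:
                yield record


# Write two sample records to a temporary file, then query them back
trace_path = os.path.join(tempfile.mkdtemp(), "traces.jsonl")
append_trace(trace_path, {"trace_id": "t-1", "decision": "retry_attempted", "to_state": {"retry_count": 1}})
append_trace(trace_path, {"trace_id": "t-2", "decision": "completed", "to_state": {"retry_count": 1}})
retries = list(read_traces(trace_path, "retry_attempted"))
```

In a larger deployment the same records would more likely go to a logging pipeline or a database, but the shape of the record stays the same.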

Why This Works

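LangGraph provides structure: Each node reads from a shared state object and returns an explicit update. With a checkpointer attached, the state after each step is saved, so a run can be inspected and replayed. Because tools run in named nodes connected by explicit edges, you can trace which step ran and what state it saw.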

ChromaDB enables persistence: Conversations, learned facts, and user preferences survive restarts. Semantic retrieval brings relevant context back when needed.

Checkpointing enables recovery: Long workflows can resume from failure points. You can roll back failed branches without starting over.

Schema validation catches errors early: Pydantic models enforce consistency. Bad state data fails fast rather than corrupting downstream.

The trade-off: More state tracking means more complexity and storage overhead. But for production systems, this investment pays off in reliability and debuggability.

Common Mistakes to Avoid

No persistence at all: Running agents in-memory without any state saving
Trusting framework defaults: Assuming the framework handles state without configuration
Storing everything: Hoarding all data without pruning leads to bloat
Ignoring state schema: Not defining what goes into state leads to inconsistent data
No failure recovery: Not planning for how to resume after crashes
Skipping observability: Building state management but not logging transitions
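
For the "storing everything" mistake, a simple pruning rule already goes a long way. The sketch below keeps all system messages plus the most recent turns. `prune_messages` and the `keep_last` default are hypothetical; a real agent might summarize old turns instead of dropping them.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def prune_messages(messages: List[Message], keep_last: int = 4) -> List[Message]:
    """Keep every system message plus the last `keep_last` non-system messages."""
    non_system_idx = [i for i, m in enumerate(messages) if m.get("role") != "system"]
    # Guard against keep_last == 0, because a [-0:] slice would keep everything
    keep = set(non_system_idx[-keep_last:]) if keep_last > 0 else set()
    return [
        m for i, m in enumerate(messages)
        if m.get("role") == "system" or i in keep
    ]


history = [
    {"role": "system", "content": "You are a sales analyst."},
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": "a2"},
]
pruned = prune_messages(history, keep_last=2)
```

Original message order is preserved, so the pruned list can be written straight back into the state's messages field.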

Summary

In this post, I showed how to implement memory and state management for AI agents using LangGraph and ChromaDB. The key insight is that state tracking is more critical than framework choice for production reliability. Start with a minimal state schema, add checkpointing for long workflows, and always implement tracing for debugging. Without proper state management, production agents become unreliable black boxes that fail silently and cannot be debugged.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 LangGraph Documentation
👨‍💻 ChromaDB

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!