How to Build Multi-Agent Systems with Persistent Memory for Business Automation

Mar 30, 2026

Purpose

This post shows how to build multi-agent systems with persistent memory for business automation.

Problem

I built a single AI agent to handle my business operations. It worked for simple tasks. When I asked it to handle complex workflows across multiple days, everything broke.

Day 1: "Set up meeting with John" - Agent creates calendar event
Day 2: "What did I schedule with John?" - Agent: "I don't know who John is"

# No memory across sessions
# No coordination between tasks
# Using expensive model for simple operations

The agent forgot context between sessions. It used Claude Sonnet for every task, even simple ones. My monthly AI cost hit $200 for basic operations.

Environment

Python 3.12
LangGraph 0.2 for orchestration
Mem0 for persistent memory
ChromaDB as vector database backend
DeepSeek V3.2, MiniMax, Claude Haiku, and Claude Sonnet for different agent roles

Solution

A multi-agent system with persistent memory needs three components:

Orchestration Layer (LangGraph) - Routes tasks to correct agents
Memory Layer (Mem0) - Stores facts, decisions, and preferences
Storage Layer (ChromaDB) - Vector embeddings for memory search

Architecture Overview

I designed the system with a router agent that dispatches tasks to specialized agents:

                    ┌─────────────────────────────────────────────┐
                    │           ROUTER AGENT (DeepSeek)           │
                    │   Classifies task, dispatches to specialist │
                    └─────────────────┬───────────────────────────┘
                                      │
           ┌──────────────────────────┼──────────────────────────┐
           │                          │                          │
           ▼                          ▼                          ▼
┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   OPS ENGINE     │    │   OUTREACH QA    │    │   WEBMASTER MGR  │
│   (MiniMax)      │    │   (Claude Haiku) │    │   (Claude Sonnet)│
│                  │    │                  │    │                  │
│ Bulk operations  │    │ Draft emails     │    │ Complex decisions│
│ Batch processing │    │ Simple content   │    │ Negotiations     │
└──────────────────┘    └──────────────────┘    └──────────────────┘
           │                          │                          │
           └──────────────────────────┼──────────────────────────┘
                                      │
                                      ▼
                    ┌─────────────────────────────────────────────┐
                    │              Mem0 + ChromaDB                │
                    │   Persistent Memory Storage & Retrieval     │
                    │                                             │
                    │   Facts: "User prefers email over Slack"    │
                    │   Decisions: "Approved budget of $500"      │
                    │   Preferences: "Meeting times: 9am-11am"    │
                    └─────────────────────────────────────────────┘

Memory Layers

Mem0 provides four memory layers with different lifetimes:

┌─────────────────┬───────────────────┬───────────────────────────────┐
│ Layer           │ Lifetime          │ Best For                      │
├─────────────────┼───────────────────┼───────────────────────────────┤
│ Conversation    │ Single response   │ Tool execution details        │
│ Session         │ Minutes to hours  │ Multi-step workflow context   │
│ User            │ Weeks to forever  │ Personal preferences          │
│ Organizational  │ Global            │ Shared FAQs, policies         │
└─────────────────┴───────────────────┴───────────────────────────────┘

I implemented this hierarchy to match business needs:

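import asyncio
from datetime import datetime
from enum import Enum

from mem0 import Memory

class MemoryLayer(Enum):
    CONVERSATION = "conversation"  # Dies after response
    SESSION = "session"            # Lives during workflow
    USER = "user"                  # Persists per user
    ORGANIZATIONAL = "org"         # Shared across all

class MemoryManager:
    def __init__(self, chroma_client=None):
        # Mem0 builds its own Chroma client from this config, so the
        # chroma_client argument is accepted only for call-site compatibility.
        self.memory = Memory.from_config({
            "vector_store": {
                "provider": "chroma",
                "config": {
                    "collection_name": "agent_memory",
                    "path": "./chroma_db"
                }
            }
        })

    async def store_fact(self, user_id: str, fact: str, layer: MemoryLayer):
        """Store a fact in the appropriate memory layer"""
        # Memory.add is synchronous, so run it in a worker thread.
        # The fact is sent with the user role because Mem0's extraction
        # step may skip system messages.
        await asyncio.to_thread(
            self.memory.add,
            [{"role": "user", "content": fact}],
            user_id=user_id,
            metadata={
                "layer": layer.value,
                "timestamp": datetime.now().isoformat()
            }
        )

    async def get_relevant_context(self, user_id: str, query: str) -> list:
        """Retrieve relevant facts for current task"""
        raw = await asyncio.to_thread(
            self.memory.search, query, user_id=user_id, limit=10
        )
        # Depending on the Mem0 version, hits arrive as a list or as {"results": [...]}
        hits = raw["results"] if isinstance(raw, dict) else raw
        # Mem0 keeps the text under "memory"; expose it as "content" for the router
        return [{"content": h["memory"], "score": h.get("score")} for h in hits]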

When I store a user preference:

# Store preference
await memory.store_fact(
    user_id="user_123",
    fact="User prefers email communication over Slack for important updates",
    layer=MemoryLayer.USER
)

# Later, retrieve it
context = await memory.get_relevant_context(
    user_id="user_123",
    query="How should I notify about the budget approval?"
)

# Output: [{"content": "User prefers email...", "score": 0.92}]

Agent Router with LangGraph

I built a router that classifies tasks and dispatches to specialized agents:

from enum import Enum
from typing import TypedDict

from langgraph.graph import StateGraph, END

from memory_config import MemoryLayer  # used in store_result

class TaskType(Enum):
    SIMPLE_OPERATION = "simple"      # MiniMax handles
    CONTENT_DRAFT = "draft"          # Claude Haiku handles
    COMPLEX_DECISION = "complex"     # Claude Sonnet handles

class AgentState(TypedDict):
    input: str
    task_type: TaskType
    user_id: str
    context: list
    result: str
    cost: float

class AgentRouter:
    def __init__(self, memory_manager, agents):
        self.memory = memory_manager
        self.agents = agents
        self.graph = self._build_graph()

    def _build_graph(self):
        """Build and compile the routing graph"""
        workflow = StateGraph(AgentState)

        # Add nodes
        workflow.add_node("classify", self.classify_task)
        workflow.add_node("retrieve_context", self.retrieve_context)
        workflow.add_node("dispatch", self.dispatch_to_agent)
        workflow.add_node("store_result", self.store_result)

        # Add edges
        workflow.set_entry_point("classify")
        workflow.add_edge("classify", "retrieve_context")
        workflow.add_edge("retrieve_context", "dispatch")
        workflow.add_edge("dispatch", "store_result")
        workflow.add_edge("store_result", END)

        return workflow.compile()

    async def classify_task(self, state: AgentState) -> AgentState:
        """Classify task complexity using DeepSeek (cheap)"""
        classification_prompt = f"""
        Classify this task complexity:
        Task: {state['input']}

        Options:
        - SIMPLE_OPERATION: Bulk tasks, batch processing, data retrieval
        - CONTENT_DRAFT: Writing emails, drafting content, simple summaries
        - COMPLEX_DECISION: Negotiations, strategic decisions, technical architecture

        Return only the classification type.
        """

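        response = await self.agents["classifier"].generate(classification_prompt)
        # The prompt asks for the enum name (e.g. CONTENT_DRAFT), so look it up
        # by name; TaskType(...) would expect the value ("draft") and raise.
        state["task_type"] = TaskType[response.strip().upper()]
        return state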

    async def retrieve_context(self, state: AgentState) -> AgentState:
        """Get relevant memory context"""
        context = await self.memory.get_relevant_context(
            user_id=state["user_id"],
            query=state["input"]
        )
        state["context"] = context
        return state

    async def dispatch_to_agent(self, state: AgentState) -> AgentState:
        """Route to appropriate specialist agent"""
        agent_map = {
            TaskType.SIMPLE_OPERATION: "ops_engine",
            TaskType.CONTENT_DRAFT: "outreach_qa",
            TaskType.COMPLEX_DECISION: "webmaster_mgr"
        }

        agent_name = agent_map[state["task_type"]]
        agent = self.agents[agent_name]

        # Build prompt with context
        context_str = "\n".join([c["content"] for c in state["context"]])
        prompt = f"""
        Context from memory:
        {context_str}

        Task: {state['input']}
        """

        result = await agent.generate(prompt)
        state["result"] = result
        state["cost"] = agent.get_cost()
        return state

    async def store_result(self, state: AgentState) -> AgentState:
        """Store decision in memory"""
        await self.memory.store_fact(
            user_id=state["user_id"],
            fact=f"Decision made: {state['result']}",
            layer=MemoryLayer.SESSION
        )
        return state
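Model replies are not always as clean as the prompt requests, and an unexpected reply in classify_task would stop the whole graph. A tolerant parser with a fallback is one way to handle that. This is a sketch under assumptions: parse_task_type is a hypothetical helper, TaskType is repeated only to keep the snippet self-contained, and defaulting to the most capable agent is a design choice rather than a requirement.

```python
import re
from enum import Enum

class TaskType(Enum):
    SIMPLE_OPERATION = "simple"
    CONTENT_DRAFT = "draft"
    COMPLEX_DECISION = "complex"

def parse_task_type(response: str,
                    default: TaskType = TaskType.COMPLEX_DECISION) -> TaskType:
    """Return the first TaskType whose name appears in the reply (hypothetical helper)."""
    # Normalise "content draft", "Content-Draft." and similar to "CONTENT_DRAFT"
    normalised = re.sub(r"[^A-Z]+", "_", response.upper())
    for task_type in TaskType:
        if task_type.name in normalised:
            return task_type
    # Unrecognised output: fall back to the default agent tier
    return default
```

Falling back to the expensive tier costs more on a misclassification but avoids handing a hard task to a model that cannot handle it.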

When I run the router:

# Initialize router
router = AgentRouter(memory_manager, agents)

# Process task
result = await router.graph.ainvoke({
    "input": "Draft an email to John about the meeting schedule",
    "user_id": "user_123"
})

# Classification flow:
# 1. classify -> CONTENT_DRAFT (using DeepSeek, $0.001)
# 2. retrieve_context -> ["User prefers email...", "Meeting times: 9am"]
# 3. dispatch -> outreach_qa (Claude Haiku, $0.01)
# 4. store_result -> Memory saved
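Before spending money on real API calls, the classify and dispatch steps can be exercised with stand-in agents. The snippet below is a rough sketch of that approach: StubAgent, AGENT_FOR_TASK and dry_run are hypothetical names, and the flow only mirrors steps 1 and 3 above without LangGraph or memory.

```python
import asyncio

class StubAgent:
    """Stand-in for a real model client; returns a canned reply (test helper)."""
    def __init__(self, reply: str):
        self.reply = reply
        self.prompts: list[str] = []

    async def generate(self, prompt: str) -> str:
        self.prompts.append(prompt)  # keep prompts so tests can inspect them
        return self.reply

    def get_cost(self) -> float:
        return 0.0

# Same mapping as the router's agent_map, keyed by classifier label
AGENT_FOR_TASK = {
    "SIMPLE_OPERATION": "ops_engine",
    "CONTENT_DRAFT": "outreach_qa",
    "COMPLEX_DECISION": "webmaster_mgr",
}

async def dry_run(task: str, agents: dict[str, StubAgent]) -> tuple[str, str]:
    """Classify, then dispatch, using whatever agents are passed in."""
    label = (await agents["classifier"].generate(task)).strip().upper()
    agent_name = AGENT_FOR_TASK[label]
    result = await agents[agent_name].generate(f"Task: {task}")
    return agent_name, result

agents = {
    "classifier": StubAgent("content_draft"),
    "ops_engine": StubAgent("ops done"),
    "outreach_qa": StubAgent("Hi John, ..."),
    "webmaster_mgr": StubAgent("decision"),
}
print(asyncio.run(dry_run("Draft an email to John", agents)))
```

Because the stubs share the generate/get_cost interface, the same dictionary shape can later be swapped for the real agent pool.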

Multi-Agent Configuration

I configured each agent with a model matched to its task complexity:

import asyncio
import os

import requests
from anthropic import AsyncAnthropic

class AgentConfig:
    """Agent configuration with cost tracking"""

    # Rates are in USD per 1k tokens and match the cost comparison below.
    # Treat them as working numbers and check each provider's current pricing.
    agents = {
        "classifier": {
            "model": "deepseek-chat",
            "provider": "deepseek",
            "cost_per_1k_tokens": 0.0014,
            "role": "Task classification only"
        },
        "ops_engine": {
            "model": "MiniMax-m2.1",
            "provider": "minimax",
            "cost_per_1k_tokens": 0.002,
            "role": "Bulk operations, batch processing"
        },
        "outreach_qa": {
            "model": "claude-3-5-haiku",
            "provider": "anthropic",
            "cost_per_1k_tokens": 0.008,
            "role": "Draft emails, simple content"
        },
        "webmaster_mgr": {
            "model": "claude-3-5-sonnet",
            "provider": "anthropic",
            "cost_per_1k_tokens": 0.03,
            "role": "Complex decisions, negotiations"
        }
    }

class AgentPool:
    def __init__(self):
        self.agents = {}
        self._initialize_agents()

    def _initialize_agents(self):
        """Initialize all agents with their models"""
        for name, config in AgentConfig.agents.items():
            if config["provider"] == "anthropic":
                self.agents[name] = AnthropicAgent(config)
            elif config["provider"] == "deepseek":
                self.agents[name] = DeepSeekAgent(config)
            elif config["provider"] == "minimax":
                # MiniMaxAgent is not shown here; it needs the same
                # generate()/get_cost() interface as the classes below.
                self.agents[name] = MiniMaxAgent(config)

    def get_agent(self, name: str):
        return self.agents[name]

class AnthropicAgent:
    def __init__(self, config):
        # The async client reads ANTHROPIC_API_KEY from the environment
        self.client = AsyncAnthropic()
        self.model = config["model"]
        self.cost_per_1k = config["cost_per_1k_tokens"]
        self.tokens_used = 0

    async def generate(self, prompt: str) -> str:
        response = await self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        )
        self.tokens_used += response.usage.input_tokens + response.usage.output_tokens
        return response.content[0].text

    def get_cost(self) -> float:
        # Cumulative cost across every call this agent has made
        return (self.tokens_used / 1000) * self.cost_per_1k

class DeepSeekAgent:
    def __init__(self, config):
        self.api_key = os.environ.get("DEEPSEEK_API_KEY")
        self.model = config["model"]
        self.cost_per_1k = config["cost_per_1k_tokens"]
        self.tokens_used = 0

    def _post(self, prompt: str) -> dict:
        response = requests.post(
            "https://api.deepseek.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={
                "model": self.model,
                "messages": [{"role": "user", "content": prompt}]
            },
            timeout=60
        )
        response.raise_for_status()  # surface HTTP errors instead of a KeyError later
        return response.json()

    async def generate(self, prompt: str) -> str:
        # requests is blocking, so run the call in a worker thread
        data = await asyncio.to_thread(self._post, prompt)
        self.tokens_used += data["usage"]["total_tokens"]
        return data["choices"][0]["message"]["content"]

    def get_cost(self) -> float:
        # Cumulative cost across every call this agent has made
        return (self.tokens_used / 1000) * self.cost_per_1k
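The get_cost methods above apply one blended rate to a running token total. Anthropic's usage object already reports input and output tokens separately, and many providers price the two differently, so a per-call log can give a closer estimate. This is only a sketch: UsageTracker is a hypothetical helper and the rates in the usage line are placeholders, not real prices.

```python
from dataclasses import dataclass, field

@dataclass
class UsageTracker:
    """Per-call cost log with separate input and output rates (illustrative)."""
    input_per_1k: float
    output_per_1k: float
    calls: list[float] = field(default_factory=list)

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Record one call and return its cost in USD."""
        cost = ((input_tokens / 1000) * self.input_per_1k
                + (output_tokens / 1000) * self.output_per_1k)
        self.calls.append(cost)
        return cost

    @property
    def total(self) -> float:
        """Sum of all recorded call costs."""
        return sum(self.calls)

# Placeholder rates, for illustration only
tracker = UsageTracker(input_per_1k=0.001, output_per_1k=0.005)
per_call = tracker.record(input_tokens=2000, output_tokens=1000)
```

Returning the per-call cost from record makes it easy to store a task-level figure in the router state instead of a cumulative one.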

When I compare costs for 1000 tasks:

Single Claude Sonnet for all tasks:
- 1000 tasks * 2k tokens average * $0.03/1k = $60/month

Multi-agent with model matching (same 2k tokens average per task):
- 600 simple tasks * 2k tokens * MiniMax $0.002/1k = $2.40
- 300 draft tasks * 2k tokens * Haiku $0.008/1k = $4.80
- 100 complex tasks * 2k tokens * Sonnet $0.03/1k = $6.00
- Classification overhead: $0.50
- Total: $13.70/month

Savings: about 77% cost reduction
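The outcome of this comparison depends heavily on how many tokens each task is assumed to use. The small calculator below is a sketch for rerunning the numbers under your own assumptions: monthly_cost is a hypothetical helper, the rates are the per-1k figures quoted above, and it applies a single average token count to every tier.

```python
def monthly_cost(task_counts: dict[str, int],
                 rate_per_1k: dict[str, float],
                 tokens_per_task: int = 2000,
                 overhead: float = 0.0) -> float:
    """Cost in USD for a task mix, assuming one average token count per task."""
    tokens_k = tokens_per_task / 1000
    per_model = (count * tokens_k * rate_per_1k[model]
                 for model, count in task_counts.items())
    return sum(per_model) + overhead

# Per-1k rates as quoted in this post
RATES = {"minimax": 0.002, "haiku": 0.008, "sonnet": 0.03}

single = monthly_cost({"sonnet": 1000}, RATES)
multi = monthly_cost({"minimax": 600, "haiku": 300, "sonnet": 100},
                     RATES, overhead=0.50)
savings = 1 - multi / single
print(f"single=${single:.2f} multi=${multi:.2f} savings={savings:.0%}")
```

Changing tokens_per_task or the task mix shows quickly how sensitive the savings figure is to those inputs.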

ChromaDB Vector Store Setup

I set up ChromaDB as the vector storage backend:

import chromadb
from chromadb.config import Settings

class VectorStore:
    def __init__(self, path: str = "./chroma_db", collection_name: str = "org_memory"):
        self.client = chromadb.PersistentClient(
            path=path,
            settings=Settings(
                anonymized_telemetry=False,
                allow_reset=True
            )
        )
        # Kept separate from the "agent_memory" collection that Mem0 manages,
        # since the two may embed text with different models and dimensions.
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            metadata={"description": "Multi-agent persistent memory"}
        )

    async def add_memory(self, id: str, content: str, metadata: dict):
        """Add or update a memory entry; Chroma computes the embedding"""
        # upsert keeps repeated start-up loads idempotent for a given id
        self.collection.upsert(
            documents=[content],
            metadatas=[metadata],
            ids=[id]
        )

    async def search_memory(self, query: str, n_results: int = 10) -> dict:
        """Semantic search for relevant memories"""
        # Returns Chroma's dict of parallel lists (ids, documents, metadatas, distances)
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results
        )
        return results

    async def get_by_user(self, user_id: str) -> dict:
        """Get all memories whose metadata carries this user_id"""
        results = self.collection.get(
            where={"user_id": user_id}
        )
        return results

# Initialize
vector_store = VectorStore()

# Add organizational memory (shared rules)
await vector_store.add_memory(
    id="org_rule_1",
    content="Always confirm budget approvals with finance team before execution",
    metadata={"layer": "org", "type": "policy"}
)

# Semantic search
results = await vector_store.search_memory(
    query="How do I handle budget requests?",
    n_results=5
)
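Chroma's query call returns a dict of parallel lists with one inner list per query text, which is awkward to pass straight into a prompt. A small adapter can turn that into one dict per hit. This is a sketch: flatten_query_results is a hypothetical helper, and the "content" key is chosen only to match the shape the router expects.

```python
def flatten_query_results(results: dict, query_index: int = 0) -> list[dict]:
    """Turn Chroma's column-oriented query output into one dict per hit."""
    ids = results["ids"][query_index]

    def column(name: str) -> list:
        # A field is None when it was not requested via `include`
        values = results.get(name)
        return values[query_index] if values else [None] * len(ids)

    return [
        {"id": i, "content": doc, "metadata": meta, "distance": dist}
        for i, doc, meta, dist in zip(
            ids, column("documents"), column("metadatas"), column("distances")
        )
    ]

# Example input in the shape collection.query() returns for one query text
sample = {
    "ids": [["org_rule_1", "org_rule_2"]],
    "documents": [["Confirm budgets with finance", "Log client communications"]],
    "metadatas": [[{"layer": "org"}, {"layer": "org"}]],
    "distances": [[0.21, 0.48]],
}
flat = flatten_query_results(sample)
```

Note that Chroma reports distances, so a lower number means a closer match.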

Complete System Integration

I integrated all components into a complete system:

from agent_router import AgentRouter
from memory_config import MemoryManager, MemoryLayer
from agents_config import AgentPool
from chroma_setup import VectorStore

class MultiAgentSystem:
    def __init__(self):
        # Initialize components
        self.vector_store = VectorStore()
        self.memory = MemoryManager(self.vector_store.client)
        self.agent_pool = AgentPool()
        self.router = AgentRouter(self.memory, self.agent_pool.agents)

        # Org rules are loaded on first use because __init__ cannot await
        self._org_loaded = False

    async def _load_org_memory(self):
        """Load shared organizational memory"""
        org_rules = [
            "Budget approvals require finance team confirmation",
            "All client communications must be logged",
            "Meeting times preferred: 9am-11am, 2pm-4pm",
            "Response SLA: 24 hours for emails, 2 hours for Slack"
        ]

        for i, rule in enumerate(org_rules):
            await self.vector_store.add_memory(
                id=f"org_rule_{i}",
                content=rule,
                metadata={"layer": "org", "type": "policy"}
            )
        self._org_loaded = True

    async def process_task(self, user_id: str, task: str) -> dict:
        """Process a task through the multi-agent system"""
        if not self._org_loaded:
            await self._load_org_memory()

        # The compiled LangGraph graph exposes ainvoke for async runs
        result = await self.router.graph.ainvoke({
            "input": task,
            "user_id": user_id
        })

        return {
            "result": result["result"],
            "task_type": result["task_type"].value,
            "cost": result["cost"],
            "context_used": len(result["context"])
        }

    async def add_user_preference(self, user_id: str, preference: str):
        """Store user preference in memory"""
        await self.memory.store_fact(
            user_id=user_id,
            fact=preference,
            layer=MemoryLayer.USER
        )

# Usage
system = MultiAgentSystem()

# Add user preference
await system.add_user_preference(
    user_id="john_123",
    preference="John prefers morning meetings between 9am and 11am"
)

# Process task
result = await system.process_task(
    user_id="john_123",
    task="Schedule a meeting with the client team about the new project"
)

# Output:
# {
#   "result": "Meeting scheduled for 9:30am tomorrow with client team...",
#   "task_type": "simple",
#   "cost": 0.002,
#   "context_used": 3
# }

Real-World Example: SEO Agency

I found a Reddit post about an SEO agency running 5 agents with this architecture:

Agent Roles:
- Steve (DeepSeek V3.2): Main agent, handles WhatsApp & Slack daily ops
- ops-engine (MiniMax): Bulk tasks, batch SEO operations
- outreach-qa (Claude Haiku): Draft outreach emails, review content
- webmaster-mgr (Claude Sonnet): Negotiate with webmasters, technical decisions

Memory Facts Stored:
- "Client X prefers weekly reports"
- "Budget limit for outreach: $500/month"
- "Webmaster Y negotiated $200 for link placement"
- "Previous successful outreach template: [stored]"

Monthly Cost: ~$20-30 (vs $200+ for single premium model)

The agency reported: “We use Mem0 + ChromaDB - every decision, rule, and preference gets stored as a searchable fact.”

Common Mistakes I Made

1. Using one model for everything
   - Mistake: Claude Sonnet for simple classification
   - Fix: Use DeepSeek for routing, Haiku for drafts

2. Ignoring memory architecture
   - Mistake: All memories in one bucket
   - Fix: Layer by lifetime (conversation, session, user, org)

3. No agent routing logic
   - Mistake: Random agent assignment
   - Fix: Classification step before dispatch

4. Over-engineering first iteration
   - Mistake: 10 agents on day one
   - Fix: Start with 3 agents, add as needed

5. Storing secrets in memory
   - Mistake: API keys stored in vector DB
   - Fix: Only store business facts, use env vars for secrets
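For mistake 5, a guard in front of the memory layer can catch credentials before they are embedded and stored. The check below is a minimal sketch, not a complete secret scanner: the patterns, contains_secret and safe_fact are all assumptions to adapt to the credential formats you actually use.

```python
import re

# Assumed patterns; extend them for your own credential formats.
SECRET_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),                       # "sk-..." style API keys
    re.compile(r"\bBearer\s+[A-Za-z0-9._-]{16,}", re.IGNORECASE),  # bearer tokens
    re.compile(r"(?i)\b(api[_-]?key|password|secret|token)\b\s*[:=]\s*\S+"),
]

def contains_secret(text: str) -> bool:
    """True if any pattern matches somewhere in the text."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)

def safe_fact(fact: str) -> str:
    """Return the fact unchanged, or raise if it looks like a credential."""
    if contains_secret(fact):
        raise ValueError("Refusing to store a fact that looks like a secret")
    return fact
```

Calling safe_fact on the text just before it is handed to store_fact keeps the check in one place; raising rather than silently redacting makes the problem visible during development.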

Summary

In this post, I showed how to build multi-agent systems with persistent memory for business automation. The key point is combining LangGraph for orchestration, Mem0 for memory management, and matching models to task complexity.

The architecture uses three layers: Orchestration (LangGraph router), Memory (Mem0 with ChromaDB backend), and Storage (ChromaDB vector embeddings). I configured four memory layers: Conversation (single response), Session (workflow context), User (preferences), and Organizational (shared rules).

For model matching, I use DeepSeek for classification ($0.0014/1k), MiniMax for operations ($0.002/1k), Claude Haiku for drafts ($0.008/1k), and Claude Sonnet only for complex decisions ($0.03/1k). This reduced my monthly AI costs from $200 to ~$20-30.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Reddit: SEO Agency Multi-Agent System
👨‍💻 LangGraph Documentation
👨‍💻 Mem0 Documentation
👨‍💻 ChromaDB Documentation

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!