How to Set Up a ChromaDB Vector Database for an AI Memory System

Mar 23, 2026

Purpose

This post shows how to set up ChromaDB as an AI memory system with semantic search for persistent context across conversations.

Problem

I wanted my AI assistant to remember things. Every new session started blank. I had to re-explain my preferences, previous decisions, and context. This was frustrating.

Here’s what I experienced:

Day 1:
Me: "Help me configure the Roborock API"
AI: "Sure, I'll help you with that..."
[30 minutes of configuration]

Day 2:
Me: "What did we decide about the Roborock API?"
AI: "I don't have access to our previous conversation..."
[Re-explain everything again]

Key challenges I faced:

No persistence: Conversations die when sessions end
Context limits: LLMs have finite context windows (4K-200K tokens)
Retrieval difficulty: Keyword search fails on semantic similarity
Memory organization: Raw logs vs. curated insights vs. searchable knowledge

I needed a way to store and retrieve memories semantically.
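
To make the retrieval problem concrete, here is a minimal sketch that contrasts keyword overlap with vector similarity. The three-dimensional vectors are hand-picked toy values standing in for real embeddings (an assumption for illustration only, not output from any model), and both helper names are my own.

```python
import math


def keyword_overlap(query: str, document: str) -> int:
    """Count the distinct lowercase words shared by query and document."""
    return len(set(query.lower().split()) & set(document.lower().split()))


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = "why does the vacuum keep disconnecting"
note = "Roborock API timeout fixed with retry logic"

# The two texts are about the same problem but share no words at all
print(keyword_overlap(query, note))  # prints 0

# Toy vectors (assumed values): related texts end up pointing the same way
query_vec = [0.9, 0.1, 0.3]
note_vec = [0.8, 0.2, 0.4]
print(cosine_similarity(query_vec, note_vec) > 0.9)  # prints True
```

A keyword index would return nothing for this query, while a vector index can still rank the note first because the comparison happens on meaning rather than on shared tokens.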

Solution

I implemented a three-tier memory architecture:

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Daily Markdown │────▶│   MEMORY.md     │────▶│    ChromaDB     │
│      Logs       │     │  (Curated)      │     │  (Semantic)     │
│                 │     │                 │     │                 │
│  Raw session    │     │  Long-term      │     │  1,078+ chunks  │
│  notes          │     │  memory         │     │  vectorized     │
└─────────────────┘     └─────────────────┘     └─────────────────┘

Tier 1: Daily Markdown Logs

Raw session notes
Quick capture, no filtering

Tier 2: MEMORY.md

Curated long-term memory
Private sessions only
Human-reviewed

Tier 3: ChromaDB Vector Database

Semantic chunks (1,078+)
multilingual-e5-small embeddings
Natural language queries

Environment

Python 3.10+
ChromaDB 0.4+
sentence-transformers for embeddings
PostgreSQL for metadata (optional)

Installing ChromaDB

First, I listed the required packages in requirements.txt and installed them:

chromadb>=0.4.0
sentence-transformers>=2.2.0

pip install -r requirements.txt
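
After installing, it can be worth confirming which versions actually ended up in the environment. This is a small sketch using only the standard library; the helper name installed_versions is hypothetical and not part of ChromaDB.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(packages: list) -> dict:
    """Map each package name to its installed version, or None if missing."""
    found: dict = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(["chromadb", "sentence-transformers"]).items():
    print(f"{name}: {ver or 'not installed'}")
```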

Creating the Memory System

I created a memory system class:

import chromadb
from chromadb.utils import embedding_functions
from datetime import datetime
from typing import Optional

class AIMemorySystem:
    def __init__(self, persist_directory: str = "./chromadb"):
        # Initialize ChromaDB with persistent storage
        self.client = chromadb.PersistentClient(path=persist_directory)

        # Use multilingual-e5-small for embeddings. Without an explicit
        # embedding function, ChromaDB falls back to its default model
        # (all-MiniLM-L6-v2) whenever raw text is passed in.
        self.embedder = embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name="intfloat/multilingual-e5-small"
        )

        # Create or get the collection, embedding with the model above
        self.collection = self.client.get_or_create_collection(
            name="conversation_memory",
            embedding_function=self.embedder,
            metadata={"description": "AI conversation memory"}
        )

    def add_memory(
        self,
        text: str,
        metadata: Optional[dict] = None
    ) -> str:
        """Add a memory chunk to the database."""

        # Copy so the caller's dict is not mutated
        metadata = dict(metadata) if metadata else {}

        # Generate unique ID
        memory_id = f"mem_{datetime.now().strftime('%Y%m%d_%H%M%S_%f')}"

        # Add timestamp to metadata
        metadata["created_at"] = datetime.now().isoformat()

        # Add to collection (ChromaDB handles embedding internally
        # if we pass text, but we can also pass embeddings directly)
        self.collection.add(
            documents=[text],
            metadatas=[metadata],
            ids=[memory_id]
        )

        return memory_id

    def query_memory(
        self,
        query: str,
        n_results: int = 5
    ) -> list[dict]:
        """Query memory with natural language."""

        results = self.collection.query(
            query_texts=[query],
            n_results=n_results
        )

        # Format results
        memories = []
        for i, doc in enumerate(results["documents"][0]):
            memories.append({
                "text": doc,
                "metadata": results["metadatas"][0][i],
                "id": results["ids"][0][i],
                "distance": results["distances"][0][i]
            })

        return memories

    def get_memory_count(self) -> int:
        """Get total number of stored memories."""
        return self.collection.count()
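
One thing the class does not guard against is metadata that ChromaDB refuses to store. As far as I know, metadata values have to be scalars (str, int, float, or bool); nested dicts are rejected, and depending on the ChromaDB version so are lists and None. The sketch below is a hypothetical helper (sanitize_metadata is my own name) that flattens a dict before it is passed to add_memory.

```python
import json
from typing import Any

ALLOWED_TYPES = (str, int, float, bool)


def sanitize_metadata(metadata: dict) -> dict:
    """Return a copy of metadata containing only scalar values."""
    clean: dict = {}
    for key, value in metadata.items():
        if value is None:
            continue  # drop missing values instead of storing None
        if isinstance(value, ALLOWED_TYPES):
            clean[key] = value
        elif isinstance(value, (list, tuple)):
            # Flatten sequences into one comma-separated string
            clean[key] = ", ".join(str(item) for item in value)
        else:
            # Anything else (for example a nested dict) becomes a JSON string
            clean[key] = json.dumps(value, default=str)
    return clean


print(sanitize_metadata({"topic": "API", "tags": ["roborock", "rest"], "owner": None}))
```

Flattening a list into a string loses the ability to filter on single tags, so this is a trade-off rather than a general answer.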

Semantic Chunking

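Raw transcripts don’t retrieve well: a single embedding for a long conversation blurs many topics together. I learned to split conversations into smaller, topic-sized chunks: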

from dataclasses import dataclass
from datetime import datetime
from typing import List
import re

@dataclass
class SemanticChunk:
    text: str
    topic: str
    timestamp: str
    importance: float  # 0.0 to 1.0

def create_semantic_chunks(
    conversation: str,
    min_chunk_size: int = 100,
    max_chunk_size: int = 500
) -> List[SemanticChunk]:
    """Split conversation into semantic chunks."""

    chunks = []

    # Split by topic changes or time gaps
    sections = re.split(r'\n---+\n|\n{3,}', conversation)

    for section in sections:
        section = section.strip()
        if not section:
            continue

        # Skip if too short
        if len(section) < min_chunk_size:
            continue

        # Split if too long
        if len(section) > max_chunk_size:
            # Split on sentence boundaries, keeping the original punctuation
            sentences = re.split(r'(?<=[.!?])\s+', section)
            current_chunk = ""

            for sentence in sentences:
                if len(current_chunk) + len(sentence) < max_chunk_size:
                    current_chunk += sentence + " "
                else:
                    if current_chunk:
                        chunks.append(create_chunk(current_chunk.strip()))
                    current_chunk = sentence + " "

            if current_chunk:
                chunks.append(create_chunk(current_chunk.strip()))
        else:
            chunks.append(create_chunk(section))

    return chunks

def create_chunk(text: str) -> SemanticChunk:
    """Create a semantic chunk with metadata."""
    return SemanticChunk(
        text=text,
        topic=extract_topic(text),
        timestamp=datetime.now().isoformat(),
        importance=calculate_importance(text)
    )

def extract_topic(text: str) -> str:
    """Extract main topic from text."""
    # Simple keyword extraction
    # In production, use NLP or LLM
    keywords = ["API", "configuration", "bug", "feature", "database"]
    for keyword in keywords:
        if keyword.lower() in text.lower():
            return keyword
    return "general"

def calculate_importance(text: str) -> float:
    """Calculate importance score."""
    # Heuristics for importance
    importance = 0.5

    # Contains decision keywords
    if any(word in text.lower() for word in ["decided", "resolved", "fixed"]):
        importance += 0.2

    # Contains technical details (case-insensitive, so "Error:" also matches)
    if any(word in text.lower() for word in ["error:", "success:", "http://", "https://"]):
        importance += 0.1

    # Contains action items
    if "todo:" in text.lower() or "action:" in text.lower():
        importance += 0.15

    return min(importance, 1.0)
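
The splitting step for oversized sections is easier to follow in isolation. Below is a self-contained sketch of greedy sentence packing; pack_sentences is a hypothetical helper, not a function from the code in this post, and it measures size in characters just like the chunker does.

```python
import re


def pack_sentences(text: str, max_chunk_size: int = 500) -> list:
    """Greedily pack whole sentences into chunks of up to max_chunk_size chars.

    A single sentence longer than the limit is kept whole rather than cut.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


text = (
    "Discussed the Roborock API integration. "
    "Decided to use REST API instead of MQTT. "
    "Fixed the timeout by adding retry logic."
)
for chunk in pack_sentences(text, max_chunk_size=90):
    print(chunk)
```

With a limit of 90 characters the first two sentences share a chunk and the third starts a new one, so no sentence is ever split in the middle.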

Adding Memories from Conversations

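I built a function to process and store conversations. The imports assume the class above is saved as memory_system.py and the chunking helpers as chunking.py: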

from memory_system import AIMemorySystem
from chunking import create_semantic_chunks

def process_conversation(
    memory: AIMemorySystem,
    conversation: str,
    session_type: str = "normal"
):
    """Process and store conversation in memory."""

    # Create semantic chunks
    chunks = create_semantic_chunks(conversation)

    # Add each chunk to memory
    for chunk in chunks:
        memory.add_memory(
            text=chunk.text,
            metadata={
                "topic": chunk.topic,
                "importance": chunk.importance,
                "session_type": session_type,
                "timestamp": chunk.timestamp
            }
        )

    return len(chunks)

# Usage
memory = AIMemorySystem(persist_directory="./ai_memory")

conversation = """
Discussed the Roborock API integration.
Decided to use REST API instead of MQTT.
Error: Connection timeout at port 8080.
Fixed by adding retry logic with exponential backoff.
"""

chunks_added = process_conversation(memory, conversation)
print(f"Added {chunks_added} memory chunks")
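
Running the same conversation through process_conversation twice stores every chunk twice, because each call generates fresh IDs. One possible way around that, sketched here under my own assumptions, is to derive the ID from the chunk text and write in batches with the collection's upsert method. The helper names content_id and upsert_in_batches are hypothetical; the only thing required of the collection argument is an upsert(ids=..., documents=..., metadatas=...) method, which ChromaDB collections provide.

```python
import hashlib
from typing import Any


def content_id(text: str) -> str:
    """Derive a stable ID from the text, so identical chunks share one ID."""
    return "mem_" + hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def upsert_in_batches(
    collection: Any,
    texts: list,
    metadatas: list,
    batch_size: int = 100,
) -> int:
    """Upsert unique chunks in batches and return how many were written."""
    # IDs must be unique within one call, so drop repeated texts first
    unique: dict = {}
    for text, meta in zip(texts, metadatas):
        unique.setdefault(content_id(text), (text, meta))

    items = list(unique.items())
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        collection.upsert(
            ids=[chunk_id for chunk_id, _ in batch],
            documents=[text for _, (text, _meta) in batch],
            metadatas=[meta for _, (_text, meta) in batch],
        )
    return len(items)
```

With the class from this post, the call would look like upsert_in_batches(memory.collection, texts, metadatas). Re-importing an unchanged log then overwrites existing entries instead of growing the collection.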

Querying Memories

The power of semantic search shows when I query:

memory = AIMemorySystem(persist_directory="./ai_memory")

# Example queries
queries = [
    "What did we decide about the Roborock API last week?",
    "Remind me of all the Hyper-V networking lessons we learned",
    "How did we fix the Grafana dashboard bug?",
    "Show me configuration issues we encountered"
]

for query in queries:
    print(f"\nQuery: {query}")
    results = memory.query_memory(query, n_results=3)

    for i, result in enumerate(results):
        print(f"  [{i+1}] {result['text'][:100]}...")
        print(f"      Distance: {result['distance']:.4f}")
        print(f"      Topic: {result['metadata'].get('topic')}")

Output:

Query: What did we decide about the Roborock API last week?
  [1] Decided to use REST API instead of MQTT for Roborock integration...
      Distance: 0.2341
      Topic: API
  [2] Roborock API connection issues resolved with retry logic...
      Distance: 0.2891
      Topic: configuration

Query: How did we fix the Grafana dashboard bug?
  [1] Fixed Grafana dashboard bug by updating the query syntax...
      Distance: 0.1523
      Topic: bug

Metadata-Enhanced Queries

I filter by metadata for better results:

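def query_with_filters(
    memory: AIMemorySystem,
    query: str,
    topic: Optional[str] = None,
    min_importance: Optional[float] = None,
    n_results: int = 5
) -> list[dict]:
    """Query with metadata filters."""

    conditions = []

    if topic is not None:
        conditions.append({"topic": topic})

    # Compare against None so that a threshold of 0.0 is not ignored
    if min_importance is not None:
        conditions.append({"importance": {"$gte": min_importance}})

    # ChromaDB expects multiple conditions to be combined with $and
    if len(conditions) > 1:
        where_filter = {"$and": conditions}
    elif conditions:
        where_filter = conditions[0]
    else:
        where_filter = None

    results = memory.collection.query(
        query_texts=[query],
        n_results=n_results,
        where=where_filter
    )

    # Flatten ChromaDB's nested result lists into one dict per memory
    return [
        {"text": doc, "metadata": meta, "id": id_, "distance": dist}
        for doc, meta, id_, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["ids"][0],
            results["distances"][0],
        )
    ]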

# Usage: Find only high-importance API discussions
results = query_with_filters(
    memory,
    query="API configuration",
    topic="API",
    min_importance=0.7
)
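
Questions like "what did we decide last week?" also call for a time filter. As far as I can tell from the ChromaDB documentation, the comparison operators ($gt, $gte, $lt, $lte) work on numeric metadata, so the ISO string stored in created_at cannot be range-filtered directly. The sketch below assumes each chunk additionally carries a numeric created_ts field (epoch seconds); that field and the helper name recent_where are hypothetical and not part of the code above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def recent_where(
    days: int,
    topic: Optional[str] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Build a `where` filter matching memories from the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).timestamp()

    # created_ts is an assumed numeric metadata field (epoch seconds)
    conditions = [{"created_ts": {"$gte": cutoff}}]
    if topic is not None:
        conditions.append({"topic": topic})

    # A single condition is passed as-is; several are combined with $and
    return conditions[0] if len(conditions) == 1 else {"$and": conditions}


print(recent_where(7, topic="API"))
```

The returned dict can be passed as the where argument of collection.query, in the same way as the filter built in query_with_filters.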

Three-Tier Memory Integration

I integrated all three tiers:

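import os
from datetime import datetime

class ThreeTierMemory:
    def __init__(self, base_path: str):
        self.vector_db = AIMemorySystem(f"{base_path}/chromadb")
        self.memory_md = f"{base_path}/MEMORY.md"
        self.logs_path = f"{base_path}/logs"

        # Make sure the log directory exists before the first write
        os.makedirs(self.logs_path, exist_ok=True)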

    def store_session(self, session: dict, is_private: bool = False):
        """Store session in all three tiers."""

        # Tier 1: Daily log (always)
        self._write_daily_log(session)

        # Tier 2: Curated memory (private only)
        if is_private:
            self._update_memory_md(session)

        # Tier 3: Vector DB (always)
        chunks = create_semantic_chunks(session["content"])
        for chunk in chunks:
            self.vector_db.add_memory(
                text=chunk.text,
                metadata={
                    "topic": chunk.topic,
                    "importance": chunk.importance,
                    "session_type": "private" if is_private else "normal",
                    "date": session["date"]
                }
            )

    def recall(self, query: str) -> dict:
        """Retrieve relevant context from memory."""

        # Get semantic matches
        semantic_results = self.vector_db.query_memory(query, n_results=10)

        # Get recent curated memory
        curated = self._read_memory_md()

        return {
            "semantic_matches": semantic_results,
            "curated_memory": curated,
            "query": query
        }

    def _write_daily_log(self, session: dict):
        """Write raw session to daily log."""
        date_str = datetime.now().strftime("%Y-%m-%d")
        log_file = f"{self.logs_path}/{date_str}.md"

        # Explicit UTF-8 keeps multilingual notes intact on every platform
        with open(log_file, "a", encoding="utf-8") as f:
            f.write(f"\n## {session['time']}\n\n")
            f.write(session["content"])

    def _update_memory_md(self, session: dict):
        """Update curated long-term memory."""
        with open(self.memory_md, "a", encoding="utf-8") as f:
            f.write(f"\n### {session['date']}\n\n")
            f.write(session["summary"])

    def _read_memory_md(self) -> str:
        """Read curated memory."""
        try:
            with open(self.memory_md, "r", encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            return ""
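
What recall returns still has to fit into the model's context window. Here is a rough sketch of turning the result into a prompt preamble under a size budget. It counts characters as a crude stand-in for tokens (an assumption, since real token counts depend on the model), it ignores the curated_memory field for brevity, and build_context is a hypothetical helper name.

```python
def build_context(recalled: dict, max_chars: int = 2000) -> str:
    """Format recalled memories, closest first, within a character budget."""
    matches = sorted(recalled["semantic_matches"], key=lambda m: m["distance"])

    lines = [f"Relevant memories for: {recalled['query']}"]
    used = len(lines[0])

    for match in matches:
        line = f"- {match['text'].strip()}"
        # +1 accounts for the newline that joins this line to the previous one
        if used + len(line) + 1 > max_chars:
            break
        lines.append(line)
        used += len(line) + 1

    return "\n".join(lines)


recalled = {
    "query": "Roborock API",
    "semantic_matches": [
        {"text": "Fixed timeout with retry logic.", "distance": 0.29},
        {"text": "Decided to use REST instead of MQTT.", "distance": 0.23},
    ],
    "curated_memory": "",
}
print(build_context(recalled))
```

Sorting by distance first means that when the budget runs out, it is the least relevant memories that get dropped.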

Common Mistakes

I made these mistakes. Don’t repeat them:

1. Storing raw transcripts instead of semantic chunks

# WRONG: Storing entire conversation
memory.add_memory(entire_conversation_transcript)

# CORRECT: Store semantic chunks
chunks = create_semantic_chunks(conversation)
for chunk in chunks:
    memory.add_memory(chunk.text, {"topic": chunk.topic, "importance": chunk.importance})

2. Ignoring metadata

# WRONG: No metadata
memory.add_memory(text)

# CORRECT: Rich metadata
memory.add_memory(text, {
    "topic": "configuration",
    "importance": 0.8,
    "date": "2026-03-23",
    "session_type": "debugging"
})

3. Using wrong embedding model

# WRONG: English-only model
embedder = SentenceTransformer('all-MiniLM-L6-v2')

# CORRECT: Multilingual support
embedder = SentenceTransformer('intfloat/multilingual-e5-small')
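
Switching the model name is not quite the whole story. The model card for multilingual-e5-small says each input should start with "query: " or "passage: ", and results can degrade without these prefixes. A minimal sketch of the two prefixing helpers follows; the function names as_query and as_passage are my own.

```python
def as_query(text: str) -> str:
    """Prefix a search query the way the E5 model card describes."""
    return f"query: {text.strip()}"


def as_passage(text: str) -> str:
    """Prefix a stored chunk the way the E5 model card describes."""
    return f"passage: {text.strip()}"


print(as_query("What did we decide about the Roborock API?"))
# query: What did we decide about the Roborock API?
print(as_passage("Decided to use REST API instead of MQTT."))
# passage: Decided to use REST API instead of MQTT.
```

One way to apply them is to encode the prefixed text yourself and pass the vectors to ChromaDB through the embeddings argument of add and the query_embeddings argument of query, while still storing the unprefixed text as the document.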

4. Not separating curated memory from raw logs

# WRONG: Everything in one place
memory.add_memory(all_conversations)

# CORRECT: Three-tier system
tiers = ThreeTierMemory("./ai_memory")
# Writes the daily log and ChromaDB chunks; MEMORY.md only for private sessions
tiers.store_session(session, is_private=True)

5. Querying without context limits

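# WRONG: Trusting whatever comes back (the default n_results=5 caps the
# count, but says nothing about relevance)
results = memory.query_memory(query)

# CORRECT: Explicit limit
results = memory.query_memory(query, n_results=10)
# Then filter by a relevance threshold (tune it for your distance metric)
relevant = [r for r in results if r["distance"] < 0.3]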
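
The 0.3 cutoff is the value that worked for my data, not a universal constant. What a distance means depends on the collection's metric: ChromaDB defaults to squared L2, and cosine distance can be selected through the "hnsw:space" collection metadata. The sketch below wraps the threshold in a hypothetical helper (filter_relevant is my own name) that falls back to the closest match so a strict cutoff never returns an empty context.

```python
def filter_relevant(
    results: list,
    max_distance: float,
    keep_at_least: int = 1,
) -> list:
    """Keep results within max_distance, closest first.

    If fewer than keep_at_least results pass, return the closest
    keep_at_least results instead.
    """
    ranked = sorted(results, key=lambda r: r["distance"])
    relevant = [r for r in ranked if r["distance"] <= max_distance]
    if len(relevant) >= keep_at_least:
        return relevant
    return ranked[:keep_at_least]


results = [
    {"text": "Grafana dashboard bug", "distance": 0.52},
    {"text": "Roborock REST decision", "distance": 0.23},
]
print(filter_relevant(results, max_distance=0.3))
```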

Summary

In this post, I showed how to set up ChromaDB as an AI memory system. The key points are:

Three-tier architecture: daily logs for raw capture, MEMORY.md for curated insights, ChromaDB for semantic search
multilingual-e5-small embeddings handle multiple languages effectively
Semantic chunking produces better retrieval than raw transcripts
Metadata filtering enables precise memory queries
1,078+ chunks searchable in milliseconds

After 50 days of running this system, my AI assistant remembers decisions from weeks ago. I can ask “What did we decide about the Roborock API?” and get relevant context immediately.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Reddit: I gave my home a brain. Here's what 50 days of self-hosted AI looks like
👨‍💻 ChromaDB Documentation
👨‍💻 multilingual-e5-small on HuggingFace

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!