How to Set Up a ChromaDB Vector Database for an AI Memory System

Purpose

This post shows how to set up ChromaDB as an AI memory system with semantic search for persistent context across conversations.

Problem

I wanted my AI assistant to remember things. Every new session started blank. I had to re-explain my preferences, previous decisions, and context. This was frustrating.

Here’s what I experienced:

Day 1:
Me: "Help me configure the Roborock API"
AI: "Sure, I'll help you with that..."
[30 minutes of configuration]
Day 2:
Me: "What did we decide about the Roborock API?"
AI: "I don't have access to our previous conversation..."
[Re-explain everything again]

Key challenges I faced:

  • No persistence: Conversations die when sessions end
  • Context limits: LLMs have finite context windows (roughly 4K-200K tokens, depending on the model)
  • Retrieval difficulty: Keyword search misses notes that mean the same thing but use different words
  • Memory organization: Raw logs vs. curated insights vs. searchable knowledge

I needed a way to store and retrieve memories semantically.
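
To make the retrieval problem concrete, here is a minimal sketch of plain keyword matching. The note text, the stopword list, and the `keyword_overlap` helper are made up for illustration; the point is only that a note can answer a question without sharing a single meaningful word with it.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard one)
STOPWORDS = {"the", "a", "we", "did", "what", "about", "to", "from"}

def keyword_overlap(query: str, document: str) -> set[str]:
    """Return the non-stopword tokens shared by the query and the document."""
    def tokenize(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS
    return tokenize(query) & tokenize(document)

note = "Switched the vacuum integration from MQTT to plain HTTP calls."
query = "What did we decide about the Roborock API?"

shared = keyword_overlap(query, note)
print(shared or "no shared keywords")
```

A keyword index would rank this note at the bottom even though it records exactly the decision being asked about. Embedding-based search compares meaning instead of tokens, which is what the rest of the post builds.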

Solution

I implemented a three-tier memory architecture:

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ Daily Markdown  │────▶│ MEMORY.md       │────▶│ ChromaDB        │
│ Logs            │     │ (Curated)       │     │ (Semantic)      │
│                 │     │                 │     │                 │
│ Raw session     │     │ Long-term       │     │ 1,078+ chunks   │
│ notes           │     │ memory          │     │ vectorized      │
└─────────────────┘     └─────────────────┘     └─────────────────┘

Tier 1: Daily Markdown Logs

  • Raw session notes
  • Quick capture, no filtering

Tier 2: MEMORY.md

  • Curated long-term memory
  • Private sessions only
  • Human-reviewed

Tier 3: ChromaDB Vector Database

  • Semantic chunks (1,078+)
  • multilingual-e5-small embeddings
  • Natural language queries
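
The idea behind Tier 3 can be sketched without any database: every chunk becomes a vector, and a query is answered by ranking chunks by how close their vectors are to the query vector. The three-dimensional vectors below are hand-made stand-ins, not real model output (multilingual-e5-small produces 384-dimensional vectors), and `cosine_similarity` is a plain-Python helper written for this illustration.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors standing in for real embeddings
toy_store = {
    "Decided to use REST instead of MQTT": [0.9, 0.1, 0.2],
    "Grafana dashboard query fixed": [0.1, 0.9, 0.3],
    "Hyper-V virtual switch notes": [0.2, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]  # pretend embedding of "What did we decide about the API?"

ranked = sorted(
    toy_store,
    key=lambda text: cosine_similarity(toy_query, toy_store[text]),
    reverse=True,
)
for text in ranked:
    print(f"{cosine_similarity(toy_query, toy_store[text]):.3f}  {text}")
```

A vector database does the same ranking, but with an index so that it stays fast as the number of chunks grows.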

Environment

  • Python 3.10+
  • ChromaDB 0.4+
  • sentence-transformers for embeddings
  • PostgreSQL for metadata (optional)

Installing ChromaDB

First, I installed the required packages:

requirements.txt
chromadb>=0.4.0
sentence-transformers>=2.2.0

pip install -r requirements.txt
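
A quick way to confirm the install landed in the active environment is to ask Python for the installed versions. This is a small sketch using only the standard library; `installed_versions` is a hypothetical helper name, and it reports `None` for anything that is missing instead of raising.

```python
from importlib import metadata
from typing import Optional

def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

versions = installed_versions(["chromadb", "sentence-transformers"])
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```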

Creating the Memory System

I created a memory system class:

memory_system.py
import uuid
from datetime import datetime
from typing import Optional

import chromadb
from chromadb.utils import embedding_functions


class AIMemorySystem:
    def __init__(self, persist_directory: str = "./chromadb"):
        # Initialize ChromaDB with persistent storage
        self.client = chromadb.PersistentClient(path=persist_directory)

        # Use multilingual-e5-small for embeddings. The model has to be
        # attached to the collection; otherwise ChromaDB embeds text with
        # its default model (all-MiniLM-L6-v2).
        # Note: E5 models were trained with "query: " / "passage: "
        # prefixes, which this wrapper does not add.
        self.embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name="intfloat/multilingual-e5-small"
        )

        # Create or get the collection
        self.collection = self.client.get_or_create_collection(
            name="conversation_memory",
            embedding_function=self.embedding_fn,
            metadata={"description": "AI conversation memory"},
        )

    def add_memory(
        self,
        text: str,
        metadata: Optional[dict] = None
    ) -> str:
        """Add a memory chunk to the database."""
        # Copy so the caller's dict is not modified
        metadata = dict(metadata or {})

        # Generate unique ID (timestamp plus a random suffix to avoid collisions)
        now = datetime.now()
        memory_id = f"mem_{now.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"

        # Add timestamp to metadata
        metadata["created_at"] = now.isoformat()

        # Add to collection; the collection's embedding function
        # turns the text into a vector
        self.collection.add(
            documents=[text],
            metadatas=[metadata],
            ids=[memory_id]
        )
        return memory_id

    def query_memory(
        self,
        query: str,
        n_results: int = 5
    ) -> list[dict]:
        """Query memory with natural language."""
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results
        )

        # Format results: ChromaDB returns one inner list per query text,
        # so index 0 holds the hits for our single query
        memories = []
        for i, doc in enumerate(results["documents"][0]):
            memories.append({
                "text": doc,
                "metadata": results["metadatas"][0][i],
                "id": results["ids"][0][i],
                "distance": results["distances"][0][i]
            })
        return memories

    def get_memory_count(self) -> int:
        """Get total number of stored memories."""
        return self.collection.count()
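
The `[0]` indexing in `query_memory` is easier to follow with the result shape in front of you. ChromaDB returns parallel lists, nested one level per query text. The sketch below uses a hand-written dictionary in that shape (the IDs, texts, and distances are invented) and a hypothetical helper, `flatten_query_result`, that does the same flattening with `zip`.

```python
def flatten_query_result(results: dict, query_index: int = 0) -> list[dict]:
    """Turn ChromaDB-style parallel lists into one dict per hit."""
    rows = zip(
        results["ids"][query_index],
        results["documents"][query_index],
        results["metadatas"][query_index],
        results["distances"][query_index],
    )
    return [
        {"id": id_, "text": doc, "metadata": meta, "distance": dist}
        for id_, doc, meta, dist in rows
    ]

# Hand-written sample in the shape returned for a single query text
sample = {
    "ids": [["mem_1", "mem_2"]],
    "documents": [[
        "Decided to use REST API instead of MQTT.",
        "Added retry logic with exponential backoff.",
    ]],
    "metadatas": [[{"topic": "API"}, {"topic": "configuration"}]],
    "distances": [[0.23, 0.29]],
}

hits = flatten_query_result(sample)
for hit in hits:
    print(hit["id"], hit["distance"], hit["text"])
```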

Semantic Chunking

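Raw transcripts retrieve poorly: one embedding for a long conversation blurs many topics into a single vector. I learned to split conversations into smaller, topic-sized chunks instead. The version below is heuristic (separator lines plus sentence packing), not model-based: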

chunking.py
from dataclasses import dataclass
from datetime import datetime
from typing import List
import re


@dataclass
class SemanticChunk:
    text: str
    topic: str
    timestamp: str
    importance: float  # 0.0 to 1.0


def create_semantic_chunks(
    conversation: str,
    min_chunk_size: int = 100,
    max_chunk_size: int = 500
) -> List[SemanticChunk]:
    """Split conversation into semantic chunks."""
    chunks = []

    # Split on separator lines (---) or runs of blank lines,
    # used here as a proxy for topic changes
    sections = re.split(r'\n---+\n|\n{3,}', conversation)

    for section in sections:
        section = section.strip()
        if not section:
            continue

        # Skip if too short (such sections are dropped, not merged)
        if len(section) < min_chunk_size:
            continue

        # Split if too long
        if len(section) > max_chunk_size:
            # Split after sentence-ending punctuation, keeping the punctuation
            sentences = re.split(r'(?<=[.!?])\s+', section)
            current_chunk = ""
            for sentence in sentences:
                if len(current_chunk) + len(sentence) < max_chunk_size:
                    current_chunk += sentence + " "
                else:
                    if current_chunk:
                        chunks.append(create_chunk(current_chunk.strip()))
                    current_chunk = sentence + " "
            if current_chunk:
                chunks.append(create_chunk(current_chunk.strip()))
        else:
            chunks.append(create_chunk(section))

    return chunks


def create_chunk(text: str) -> SemanticChunk:
    """Create a semantic chunk with metadata."""
    return SemanticChunk(
        text=text,
        topic=extract_topic(text),
        timestamp=datetime.now().isoformat(),
        importance=calculate_importance(text)
    )


def extract_topic(text: str) -> str:
    """Extract main topic from text."""
    # Simple keyword extraction: the first keyword found wins
    # In production, use NLP or LLM
    keywords = ["API", "configuration", "bug", "feature", "database"]
    for keyword in keywords:
        if keyword.lower() in text.lower():
            return keyword
    return "general"


def calculate_importance(text: str) -> float:
    """Calculate importance score."""
    # Heuristics for importance; compare in lowercase so that
    # "Error:" and "error:" both match
    lowered = text.lower()
    importance = 0.5

    # Contains decision keywords
    if any(word in lowered for word in ["decided", "resolved", "fixed"]):
        importance += 0.2

    # Contains technical details
    if any(word in lowered for word in ["error:", "success:", "http://"]):
        importance += 0.1

    # Contains action items
    if "todo:" in lowered or "action:" in lowered:
        importance += 0.15

    return min(importance, 1.0)
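
The sentence-packing step is the part of the chunker that is easiest to get subtly wrong, so here it is in isolation. This is a minimal sketch, not the code above: `pack_sentences` is a hypothetical stand-alone helper, and it splits after `.`, `!` or `?` so the punctuation stays attached to its sentence.

```python
import re

def pack_sentences(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars still becomes its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

text = "Decided to use REST. Port 8080 timed out. Added retry logic with backoff."
chunks = pack_sentences(text, max_chars=45)
for chunk in chunks:
    print(len(chunk), chunk)
```

With a 45-character budget the first two sentences fit together and the third starts a new chunk, so no sentence is ever cut in the middle.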

Adding Memories from Conversations

I built a function to process and store conversations:

conversation_processor.py
from memory_system import AIMemorySystem
from chunking import create_semantic_chunks


def process_conversation(
    memory: AIMemorySystem,
    conversation: str,
    session_type: str = "normal"
) -> int:
    """Process and store conversation in memory."""
    # Create semantic chunks
    chunks = create_semantic_chunks(conversation)

    # Add each chunk to memory
    for chunk in chunks:
        memory.add_memory(
            text=chunk.text,
            metadata={
                "topic": chunk.topic,
                "importance": chunk.importance,
                "session_type": session_type,
                "timestamp": chunk.timestamp
            }
        )
    return len(chunks)


# Usage
memory = AIMemorySystem(persist_directory="./ai_memory")
conversation = """
Discussed the Roborock API integration.
Decided to use REST API instead of MQTT.
Error: Connection timeout at port 8080.
Fixed by adding retry logic with exponential backoff.
"""
chunks_added = process_conversation(memory, conversation)
print(f"Added {chunks_added} memory chunks")
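
One thing to watch with this flow: because memory IDs are generated at insert time, processing the same conversation twice stores every chunk twice. A possible way around that, sketched here as an assumption and not something the setup above does, is to derive the ID from the chunk text. `content_id` is a hypothetical helper; such IDs could then be passed to ChromaDB's `collection.upsert` so that a repeated chunk overwrites itself.

```python
import hashlib

def content_id(text: str, prefix: str = "mem") -> str:
    """Derive a stable ID from the text, ignoring case and whitespace runs."""
    normalized = " ".join(text.split()).lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"{prefix}_{digest[:16]}"

first = content_id("Decided to use REST API instead of MQTT.")
again = content_id("  decided to use REST API   instead of MQTT.\n")
other = content_id("Fixed by adding retry logic with exponential backoff.")

print(first)
print(first == again)   # same text modulo whitespace and case
print(first == other)   # different text
```

The trade-off is that the same sentence said in two different sessions collapses into one entry, which may or may not be what you want.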

Querying Memories

The power of semantic search shows when I query:

query_examples.py
from memory_system import AIMemorySystem

memory = AIMemorySystem(persist_directory="./ai_memory")

# Example queries
queries = [
    "What did we decide about the Roborock API last week?",
    "Remind me of all the Hyper-V networking lessons we learned",
    "How did we fix the Grafana dashboard bug?",
    "Show me configuration issues we encountered"
]

for query in queries:
    print(f"\nQuery: {query}")
    results = memory.query_memory(query, n_results=3)
    for i, result in enumerate(results):
        print(f"  [{i+1}] {result['text'][:100]}...")
        print(f"      Distance: {result['distance']:.4f}")
        print(f"      Topic: {result['metadata'].get('topic')}")

Output:

Query: What did we decide about the Roborock API last week?
  [1] Decided to use REST API instead of MQTT for Roborock integration...
      Distance: 0.2341
      Topic: API
  [2] Roborock API connection issues resolved with retry logic...
      Distance: 0.2891
      Topic: configuration

Query: How did we fix the Grafana dashboard bug?
  [1] Fixed Grafana dashboard bug by updating the query syntax...
      Distance: 0.1523
      Topic: bug
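
How to read the distance values depends on the metric. To my knowledge, ChromaDB 0.4 collections use squared L2 distance unless they are created with a different `hnsw:space` setting, so smaller means closer and 0 means identical. Assuming the embeddings are unit length (an assumption worth checking for your model), squared L2 and cosine similarity are tied together by d = 2 - 2·cos. The sketch below checks that relation on two toy vectors with plain-Python helpers.

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def squared_l2(a: list[float], b: list[float]) -> float:
    """Sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy vectors, not real embeddings
a = normalize([0.8, 0.2, 0.1])
b = normalize([0.9, 0.1, 0.2])

distance = squared_l2(a, b)
similarity = cosine_similarity(a, b)
print(f"squared L2: {distance:.4f}")
print(f"2 - 2*cos:  {2 - 2 * similarity:.4f}")
```

Under those assumptions a distance of 0.23 corresponds to a cosine similarity of about 0.885, which gives a feel for what a cutoff like 0.3 means.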

Metadata-Enhanced Queries

I filter by metadata for better results:

filtered_queries.py
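from typing import Optional

from memory_system import AIMemorySystem


def format_results(results: dict) -> list[dict]:
    """Flatten ChromaDB's nested result lists for a single query."""
    return [
        {"text": doc, "metadata": meta, "id": id_, "distance": dist}
        for doc, meta, id_, dist in zip(
            results["documents"][0], results["metadatas"][0],
            results["ids"][0], results["distances"][0],
        )
    ]


def query_with_filters(
    memory: AIMemorySystem,
    query: str,
    topic: Optional[str] = None,
    min_importance: Optional[float] = None,
    n_results: int = 5
) -> list[dict]:
    """Query with metadata filters."""
    conditions = []
    if topic is not None:
        conditions.append({"topic": topic})
    if min_importance is not None:  # 0.0 is a valid threshold
        conditions.append({"importance": {"$gte": min_importance}})

    # ChromaDB takes a single condition as-is; several must be wrapped in $and
    if len(conditions) > 1:
        where_filter = {"$and": conditions}
    else:
        where_filter = conditions[0] if conditions else None

    results = memory.collection.query(
        query_texts=[query],
        n_results=n_results,
        where=where_filter
    )
    return format_results(results)


# Usage: Find only high-importance API discussions
memory = AIMemorySystem(persist_directory="./ai_memory")
results = query_with_filters(
    memory,
    query="API configuration",
    topic="API",
    min_importance=0.7
)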
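
The filter itself is just a dictionary, so its shape can be checked without a database. ChromaDB expects a `where` clause with a single condition, or several conditions combined under an operator such as `$and`; a flat dictionary with two keys is rejected. The sketch below isolates that step in a hypothetical pure function, `build_where`, and prints the three shapes it can produce.

```python
from typing import Optional

def build_where(
    topic: Optional[str] = None,
    min_importance: Optional[float] = None,
) -> Optional[dict]:
    """Build a ChromaDB-style where clause from optional filters."""
    conditions = []
    if topic is not None:
        conditions.append({"topic": topic})
    if min_importance is not None:
        conditions.append({"importance": {"$gte": min_importance}})

    if not conditions:
        return None            # no filtering at all
    if len(conditions) == 1:
        return conditions[0]   # a single condition is passed as-is
    return {"$and": conditions}

print(build_where())
print(build_where(topic="API"))
print(build_where(topic="API", min_importance=0.7))
```

Checking `is not None` instead of truthiness keeps `min_importance=0.0` working as a real threshold.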

Three-Tier Memory Integration

I integrated all three tiers:

three_tier_memory.py
import os
from datetime import datetime

from chunking import create_semantic_chunks
from memory_system import AIMemorySystem


class ThreeTierMemory:
    def __init__(self, base_path: str):
        self.vector_db = AIMemorySystem(f"{base_path}/chromadb")
        self.memory_md = f"{base_path}/MEMORY.md"
        self.logs_path = f"{base_path}/logs"
        # Make sure the log directory exists before the first write
        os.makedirs(self.logs_path, exist_ok=True)

    def store_session(self, session: dict, is_private: bool = False):
        """Store session in all three tiers.

        Expects the keys "content", "date" and "time", plus "summary"
        when is_private is True.
        """
        # Tier 1: Daily log (always)
        self._write_daily_log(session)

        # Tier 2: Curated memory (private only)
        if is_private:
            self._update_memory_md(session)

        # Tier 3: Vector DB (always)
        chunks = create_semantic_chunks(session["content"])
        for chunk in chunks:
            self.vector_db.add_memory(
                text=chunk.text,
                metadata={
                    "topic": chunk.topic,
                    "importance": chunk.importance,
                    "session_type": "private" if is_private else "normal",
                    "date": session["date"]
                }
            )

    def recall(self, query: str) -> dict:
        """Retrieve relevant context from memory."""
        # Get semantic matches
        semantic_results = self.vector_db.query_memory(query, n_results=10)

        # Get the full curated memory file
        curated = self._read_memory_md()

        return {
            "semantic_matches": semantic_results,
            "curated_memory": curated,
            "query": query
        }

    def _write_daily_log(self, session: dict):
        """Write raw session to daily log."""
        date_str = datetime.now().strftime("%Y-%m-%d")
        log_file = f"{self.logs_path}/{date_str}.md"
        with open(log_file, "a", encoding="utf-8") as f:
            f.write(f"\n## {session['time']}\n\n")
            f.write(session["content"])

    def _update_memory_md(self, session: dict):
        """Update curated long-term memory."""
        with open(self.memory_md, "a", encoding="utf-8") as f:
            f.write(f"\n### {session['date']}\n\n")
            f.write(session["summary"])

    def _read_memory_md(self) -> str:
        """Read curated memory."""
        try:
            with open(self.memory_md, "r", encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            return ""
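
What `recall()` returns still has to be squeezed into a prompt, and that is where the context-window limit from the Problem section comes back. One possible way to do it, sketched under assumptions: `build_context` is a hypothetical helper, the 0.3 distance cutoff and the character budget are arbitrary, and the sample data is invented. It keeps the closest matches first and stops when either limit is hit.

```python
def build_context(
    recall_result: dict,
    max_chars: int = 2000,
    max_distance: float = 0.3,
) -> str:
    """Format the closest semantic matches as a bullet list within a budget."""
    lines: list[str] = []
    used = 0
    matches = sorted(recall_result["semantic_matches"], key=lambda m: m["distance"])
    for match in matches:
        if match["distance"] > max_distance:
            break  # sorted by distance, so everything after this is further away
        line = f"- {match['text'].strip()}"
        if used + len(line) > max_chars:
            break  # character budget exhausted
        lines.append(line)
        used += len(line) + 1  # +1 for the newline
    return "\n".join(lines)

# Invented sample in the shape returned by recall()
recall_result = {
    "query": "What did we decide about the Roborock API?",
    "curated_memory": "",
    "semantic_matches": [
        {"text": "Decided to use REST API instead of MQTT.", "distance": 0.23},
        {"text": "Grafana dashboard uses the new query syntax.", "distance": 0.61},
        {"text": "Retry logic with exponential backoff fixed the timeout.", "distance": 0.29},
    ],
}

context = build_context(recall_result)
print(context)
```

Counting characters is a rough proxy for tokens; a tokenizer-based budget would be more exact if the prompt is close to the model's limit.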

Common Mistakes

I made these mistakes. Don’t repeat them:

1. Storing raw transcripts instead of semantic chunks

# WRONG: Storing entire conversation
memory.add_memory(entire_conversation_transcript)
# CORRECT: Store semantic chunks
chunks = create_semantic_chunks(conversation)
for chunk in chunks:
    memory.add_memory(
        chunk.text,
        {"topic": chunk.topic, "importance": chunk.importance}
    )

2. Ignoring metadata

# WRONG: No metadata
memory.add_memory(text)
# CORRECT: Rich metadata
memory.add_memory(text, {
    "topic": "configuration",
    "importance": 0.8,
    "date": "2026-03-23",
    "session_type": "debugging"
})

3. Using the wrong embedding model

# WRONG (for multilingual notes): model trained mainly on English text
embedder = SentenceTransformer('all-MiniLM-L6-v2')
# CORRECT: Multilingual support
embedder = SentenceTransformer('intfloat/multilingual-e5-small')

4. Not separating curated memory from raw logs

# WRONG: Everything in one place
store_to_database(all_conversations)
# CORRECT: Three-tier system
daily_log.store(raw_session)
memory_md.store(curated_summary) # Private only
vector_db.store(semantic_chunks)

5. Querying without context limits

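# WRONG: Using every hit as context, however distant
# (query_memory defaults to n_results=5 and applies no relevance cutoff)
results = memory.query_memory(query)
# CORRECT: Explicit, reasonable limit
results = memory.query_memory(query, n_results=10)
# Then filter by relevance threshold (the right cutoff depends on
# the embedding model and distance metric)
relevant = [r for r in results if r["distance"] < 0.3]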

Summary

In this post, I showed how to set up ChromaDB as an AI memory system. The key points are:

  • Three-tier architecture: daily logs for raw capture, MEMORY.md for curated insights, ChromaDB for semantic search
  • multilingual-e5-small embeddings handle multiple languages effectively
  • Semantic chunking produces better retrieval than raw transcripts
  • Metadata filtering enables precise memory queries
  • 1,078+ chunks searchable in milliseconds

After 50 days of running this system, my AI assistant remembers decisions from weeks ago. I can ask “What did we decide about the Roborock API?” and get relevant context immediately.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can contact me by email.
