How to Set Up ChromaDB Vector Database for AI Memory System
Purpose
This post shows how to set up ChromaDB as an AI memory system with semantic search for persistent context across conversations.
Problem
I wanted my AI assistant to remember things. Every new session started blank. I had to re-explain my preferences, previous decisions, and context. This was frustrating.
Here’s what I experienced:
Day 1:
Me: "Help me configure the Roborock API"
AI: "Sure, I'll help you with that..."
[30 minutes of configuration]

Day 2:
Me: "What did we decide about the Roborock API?"
AI: "I don't have access to our previous conversation..."
[Re-explain everything again]

Key challenges I faced:
- No persistence: Conversations die when sessions end
- Context limits: LLMs have finite context windows (4K-200K tokens)
- Retrieval difficulty: Keyword search misses notes that mean the same thing but use different words
- Memory organization: Raw logs vs. curated insights vs. searchable knowledge
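To make the retrieval problem concrete, here is a naive keyword matcher run against a made-up stored note that answers the question but shares no words with it. `keyword_overlap` and both strings are hypothetical illustrations, not part of my setup:

```python
import re


def keyword_overlap(query: str, document: str) -> set[str]:
    """Naive keyword retrieval: the lowercase words both texts share."""
    def tokenize(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    return tokenize(query) & tokenize(document)


# Hypothetical stored note that answers the question in different words
stored = "Chose REST over MQTT for vacuum integration"
query = "What did we decide about the Roborock API?"

print(keyword_overlap(query, stored))  # set()
```

A keyword index scores this pair as unrelated. An embedding model compares meaning rather than shared tokens, so it can still rank the note close to the question.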
I needed a way to store and retrieve memories semantically.
Solution
I implemented a three-tier memory architecture:
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ Daily Markdown  │────▶│    MEMORY.md    │────▶│    ChromaDB     │
│      Logs       │     │    (Curated)    │     │   (Semantic)    │
│                 │     │                 │     │                 │
│   Raw session   │     │    Long-term    │     │  1,078+ chunks  │
│      notes      │     │     memory      │     │   vectorized    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```

Tier 1: Daily Markdown Logs
- Raw session notes
- Quick capture, no filtering
Tier 2: MEMORY.md
- Curated long-term memory
- Private sessions only
- Human-reviewed
Tier 3: ChromaDB Vector Database
- Semantic chunks (1,078+)
- multilingual-e5-small embeddings
- Natural language queries
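On disk, the three tiers can live side by side under one base directory. The sketch below assumes the layout used by the `ThreeTierMemory` class later in this post (`logs/<date>.md`, `MEMORY.md`, `chromadb/`). `TierPaths`, `tier_paths`, and `tiers_for_session` are hypothetical helpers for illustration:

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class TierPaths:
    daily_log: Path   # Tier 1: raw session notes, one Markdown file per day
    memory_md: Path   # Tier 2: curated long-term memory
    chroma_dir: Path  # Tier 3: ChromaDB persistent storage


def tier_paths(base: str, day: date) -> TierPaths:
    """Resolve the three tier locations under one base directory."""
    root = Path(base)
    return TierPaths(
        daily_log=root / "logs" / f"{day.isoformat()}.md",
        memory_md=root / "MEMORY.md",
        chroma_dir=root / "chromadb",
    )


def tiers_for_session(is_private: bool) -> List[str]:
    """Tiers 1 and 3 receive every session; Tier 2 only private ones."""
    tiers = ["daily_log", "vector_db"]
    if is_private:
        tiers.insert(1, "memory_md")
    return tiers
```

Keeping the routing rule in one small function makes the privacy boundary for MEMORY.md easy to see and test.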
Environment
- Python 3.10+
- ChromaDB 0.4+
- sentence-transformers for embeddings
- PostgreSQL for metadata (optional)
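Before installing anything, a quick standard-library check can confirm the interpreter version and report which of the packages above are already present. This is a convenience sketch, not something the setup requires; the names in `PACKAGES` are the PyPI distribution names:

```python
import sys
from importlib import metadata
from typing import Dict, Optional

MIN_PYTHON = (3, 10)
PACKAGES = ["chromadb", "sentence-transformers"]


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> Dict[str, Optional[str]]:
    """Map 'python' and each required package to its version (None = not installed)."""
    report: Dict[str, Optional[str]] = {
        "python": ".".join(str(part) for part in sys.version_info[:3])
    }
    for package in PACKAGES:
        report[package] = installed_version(package)
    return report


if __name__ == "__main__":
    if sys.version_info < MIN_PYTHON:
        print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required")
    for name, version in check_environment().items():
        print(f"{name}: {version or 'not installed'}")
```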
Installing ChromaDB
First, I installed the required packages:
```text
# requirements.txt
chromadb>=0.4.0
sentence-transformers>=2.2.0
```

```bash
pip install -r requirements.txt
```

Creating the Memory System
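One detail about the embedding model: the multilingual-e5 model card asks for every input to start with "query: " (for searches) or "passage: " (for stored documents). The class below passes text through unchanged. If you want to follow that convention, a small helper like the hypothetical `with_e5_prefix` could be applied to documents before adding them and to queries before searching:

```python
from typing import Literal


def with_e5_prefix(text: str, role: Literal["query", "passage"]) -> str:
    """Prepend the "query: " or "passage: " marker the e5 model card asks for."""
    if role not in ("query", "passage"):
        raise ValueError(f"role must be 'query' or 'passage', got {role!r}")
    prefix = f"{role}: "
    # Leave already-prefixed text alone so the helper is safe to call twice
    return text if text.startswith(prefix) else prefix + text


print(with_e5_prefix("What did we decide about the Roborock API?", "query"))
# query: What did we decide about the Roborock API?
```

If stored documents get the "passage: " prefix, the text that queries return carries it too, so strip it before display.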
I created a memory system class:
```python
import uuid
from datetime import datetime
from typing import Optional

import chromadb
from chromadb.utils import embedding_functions


class AIMemorySystem:
    def __init__(self, persist_directory: str = "./chromadb"):
        # Initialize ChromaDB with persistent storage
        self.client = chromadb.PersistentClient(path=persist_directory)

        # Use multilingual-e5-small for embeddings. Registering it as the
        # collection's embedding function makes ChromaDB use it for both
        # add() and query(); otherwise ChromaDB falls back to its default
        # model (all-MiniLM-L6-v2).
        self.embedder = embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name="intfloat/multilingual-e5-small"
        )

        # Create or get the collection
        self.collection = self.client.get_or_create_collection(
            name="conversation_memory",
            metadata={"description": "AI conversation memory"},
            embedding_function=self.embedder,
        )

    def add_memory(self, text: str, metadata: Optional[dict] = None) -> str:
        """Add a memory chunk to the database."""
        # Copy so the caller's dict is not modified
        metadata = dict(metadata or {})

        # Unique ID (a timestamp alone can collide when adding in a tight loop)
        memory_id = f"mem_{uuid.uuid4().hex}"

        # Add timestamp to metadata
        metadata["created_at"] = datetime.now().isoformat()

        # Pass raw text; the collection's embedding function embeds it.
        # (Precomputed vectors can be passed with embeddings=[...] instead.)
        self.collection.add(
            documents=[text],
            metadatas=[metadata],
            ids=[memory_id],
        )

        return memory_id

    def query_memory(self, query: str, n_results: int = 5) -> list[dict]:
        """Query memory with natural language."""
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results,
        )

        # Results are nested per query text; we sent one, so take index 0
        memories = []
        for i, doc in enumerate(results["documents"][0]):
            memories.append({
                "text": doc,
                "metadata": results["metadatas"][0][i],
                "id": results["ids"][0][i],
                "distance": results["distances"][0][i],
            })

        return memories

    def get_memory_count(self) -> int:
        """Get total number of stored memories."""
        return self.collection.count()
```

Semantic Chunking
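Raw transcripts don’t retrieve well: a long conversation mixes many topics, so its single embedding matches any one question only loosely. I learned to chunk semantically instead: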
```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class SemanticChunk:
    text: str
    topic: str
    timestamp: str
    importance: float  # 0.0 to 1.0


def create_semantic_chunks(
    conversation: str,
    min_chunk_size: int = 100,
    max_chunk_size: int = 500,
) -> List[SemanticChunk]:
    """Split conversation into semantic chunks."""
    chunks = []

    # Split on topic changes ("---" lines) or time gaps (2+ blank lines)
    sections = re.split(r'\n---+\n|\n{3,}', conversation)

    for section in sections:
        section = section.strip()
        if not section:
            continue

        # Skip if too short
        if len(section) < min_chunk_size:
            continue

        # Split if too long
        if len(section) > max_chunk_size:
            # Split by sentences while preserving meaning
            sentences = section.split('. ')
            current_chunk = ""

            for sentence in sentences:
                sentence = sentence.rstrip('.')  # avoid ".." on the last sentence
                if len(current_chunk) + len(sentence) < max_chunk_size:
                    current_chunk += sentence + ". "
                else:
                    if current_chunk:
                        chunks.append(create_chunk(current_chunk.strip()))
                    current_chunk = sentence + ". "

            if current_chunk:
                chunks.append(create_chunk(current_chunk.strip()))
        else:
            chunks.append(create_chunk(section))

    return chunks


def create_chunk(text: str) -> SemanticChunk:
    """Create a semantic chunk with metadata."""
    return SemanticChunk(
        text=text,
        topic=extract_topic(text),
        timestamp=datetime.now().isoformat(),
        importance=calculate_importance(text),
    )


def extract_topic(text: str) -> str:
    """Extract main topic from text."""
    # Simple keyword extraction (first match wins)
    # In production, use NLP or an LLM
    keywords = ["API", "configuration", "bug", "feature", "database"]
    for keyword in keywords:
        if keyword.lower() in text.lower():
            return keyword
    return "general"


def calculate_importance(text: str) -> float:
    """Calculate importance score."""
    # Heuristics for importance (all checks are case-insensitive)
    lowered = text.lower()
    importance = 0.5

    # Contains decision keywords
    if any(word in lowered for word in ["decided", "resolved", "fixed"]):
        importance += 0.2

    # Contains technical details ("Error:" counts as well as "error:")
    if any(word in lowered for word in ["error:", "success:", "http://", "https://"]):
        importance += 0.1

    # Contains action items
    if "todo:" in lowered or "action:" in lowered:
        importance += 0.15

    return min(importance, 1.0)
```

Adding Memories from Conversations
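Metadata passed to ChromaDB has to be flat: in the 0.4-era releases this post targets, each value must be a str, int, float, or bool, so None, lists, nested dicts, or a `datetime` make `add()` fail. A small guard like the hypothetical `sanitize_metadata` below can coerce values before they reach the collection:

```python
from datetime import date, datetime
from typing import Any, Dict, Union

MetadataValue = Union[str, int, float, bool]


def sanitize_metadata(metadata: Dict[str, Any]) -> Dict[str, MetadataValue]:
    """Coerce metadata into the flat scalar types ChromaDB stores."""
    clean: Dict[str, MetadataValue] = {}
    for key, value in metadata.items():
        if value is None:
            continue  # drop missing values instead of failing the whole add()
        if isinstance(value, (str, int, float, bool)):
            clean[key] = value
        elif isinstance(value, (date, datetime)):
            clean[key] = value.isoformat()  # dates become ISO-8601 strings
        else:
            clean[key] = str(value)  # last resort: store a string representation
    return clean
```

For example, `ThreeTierMemory.store_session` later in this post passes `session["date"]` straight through, so a `datetime.date` there would need this kind of conversion.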
I built a function to process and store conversations:
```python
from memory_system import AIMemorySystem
from chunking import create_semantic_chunks


def process_conversation(
    memory: AIMemorySystem,
    conversation: str,
    session_type: str = "normal"
):
    """Process and store conversation in memory."""

    # Create semantic chunks
    chunks = create_semantic_chunks(conversation)

    # Add each chunk to memory
    for chunk in chunks:
        memory.add_memory(
            text=chunk.text,
            metadata={
                "topic": chunk.topic,
                "importance": chunk.importance,
                "session_type": session_type,
                "timestamp": chunk.timestamp
            }
        )

    return len(chunks)


# Usage
memory = AIMemorySystem(persist_directory="./ai_memory")

conversation = """Discussed the Roborock API integration.
Decided to use REST API instead of MQTT.
Error: Connection timeout at port 8080.
Fixed by adding retry logic with exponential backoff."""

chunks_added = process_conversation(memory, conversation)
print(f"Added {chunks_added} memory chunks")
```

Querying Memories
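Each query result carries a distance, where lower means closer. In the 0.4-era API a collection uses squared L2 distance by default, and cosine distance can be chosen at creation time with `metadata={"hnsw:space": "cosine"}`. The pure-Python functions below are illustrations, not ChromaDB code; they show how the two relate: for unit-length vectors, squared L2 is exactly twice the cosine distance.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (ChromaDB's default "l2" space)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity (ChromaDB's "cosine" space)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Two unit vectors whose cosine similarity is 0.6
a = [1.0, 0.0]
b = [0.6, 0.8]
print(squared_l2(a, b))       # about 0.8
print(cosine_distance(a, b))  # about 0.4
```

This is also why a fixed relevance cutoff only makes sense for a particular embedding model and distance metric.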
The power of semantic search shows when I query:
```python
memory = AIMemorySystem(persist_directory="./ai_memory")

# Example queries
queries = [
    "What did we decide about the Roborock API last week?",
    "Remind me of all the Hyper-V networking lessons we learned",
    "How did we fix the Grafana dashboard bug?",
    "Show me configuration issues we encountered"
]

for query in queries:
    print(f"\nQuery: {query}")
    results = memory.query_memory(query, n_results=3)

    for i, result in enumerate(results):
        print(f"  [{i+1}] {result['text'][:100]}...")
        print(f"      Distance: {result['distance']:.4f}")
        print(f"      Topic: {result['metadata'].get('topic')}")
```

Output:
```text
Query: What did we decide about the Roborock API last week?
  [1] Decided to use REST API instead of MQTT for Roborock integration...
      Distance: 0.2341
      Topic: API
  [2] Roborock API connection issues resolved with retry logic...
      Distance: 0.2891
      Topic: configuration

Query: How did we fix the Grafana dashboard bug?
  [1] Fixed Grafana dashboard bug by updating the query syntax...
      Distance: 0.1523
      Topic: bug
```

Metadata-Enhanced Queries
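ChromaDB's `where` filters are plain dicts: a bare `{"topic": "API"}` means equality, operators such as `$gte` compare values, and `$and` / `$or` combine conditions. Recent ChromaDB versions also reject a `where` dict with several top-level keys, which is why conditions get wrapped in `$and`. To make those rules concrete, here is a toy in-memory evaluator; `matches` is a hypothetical helper that mimics the filter semantics and is not ChromaDB's implementation:

```python
import operator
from typing import Any, Dict

# Comparison operators from ChromaDB's metadata filter syntax
OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Toy check: does a metadata dict satisfy a where filter?"""
    if len(where) != 1:
        raise ValueError("use one top-level key; combine conditions with $and")
    key, condition = next(iter(where.items()))

    if key == "$and":
        return all(matches(metadata, sub) for sub in condition)
    if key == "$or":
        return any(matches(metadata, sub) for sub in condition)

    if key not in metadata:
        return False  # in this toy version, a missing field never matches
    if isinstance(condition, dict):
        op, operand = next(iter(condition.items()))
        return OPERATORS[op](metadata[key], operand)
    return metadata[key] == condition  # a bare value means $eq


chunk = {"topic": "API", "importance": 0.8, "session_type": "normal"}
high_importance_api = {"$and": [{"topic": "API"}, {"importance": {"$gte": 0.7}}]}
print(matches(chunk, high_importance_api))  # True
```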
I filter by metadata for better results:
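```python
from typing import Optional

from memory_system import AIMemorySystem


def format_results(results: dict) -> list[dict]:
    """Flatten ChromaDB's nested query output (one list per query text)."""
    return [
        {
            "text": doc,
            "metadata": results["metadatas"][0][i],
            "id": results["ids"][0][i],
            "distance": results["distances"][0][i],
        }
        for i, doc in enumerate(results["documents"][0])
    ]


def query_with_filters(
    memory: AIMemorySystem,
    query: str,
    topic: Optional[str] = None,
    min_importance: Optional[float] = None,
    n_results: int = 5
) -> list[dict]:
    """Query with metadata filters."""
    conditions = []

    if topic is not None:
        conditions.append({"topic": topic})

    # "is not None" so that min_importance=0.0 still applies a filter
    if min_importance is not None:
        conditions.append({"importance": {"$gte": min_importance}})

    # A where filter takes one top-level key; combine several with $and
    if not conditions:
        where_filter = None
    elif len(conditions) == 1:
        where_filter = conditions[0]
    else:
        where_filter = {"$and": conditions}

    results = memory.collection.query(
        query_texts=[query],
        n_results=n_results,
        where=where_filter
    )

    return format_results(results)


# Usage: Find only high-importance API discussions
results = query_with_filters(
    memory,
    query="API configuration",
    topic="API",
    min_importance=0.7
)
```

Three-Tier Memory Integration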
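Whatever the tiers return still has to fit the model's context window (the "context limits" problem from the start of this post). The `recall()` method in the class that follows returns up to ten semantic matches, and those compete for prompt space. As one possible approach, the hypothetical `build_context` helper below keeps matches under a distance cutoff, closest first, until a character budget runs out. Characters stand in for tokens here, which is only a rough approximation:

```python
from typing import Dict, List


def build_context(
    matches: List[Dict],
    max_chars: int = 2000,
    max_distance: float = 0.3,
) -> str:
    """Pack the closest memory matches into one prompt-sized string.

    `matches` has the shape query_memory() returns: dicts with
    "text" and "distance" keys.
    """
    # Drop weak matches, then put the closest ones first
    relevant = sorted(
        (m for m in matches if m["distance"] < max_distance),
        key=lambda m: m["distance"],
    )

    selected: List[str] = []
    used = 0
    for match in relevant:
        # Count the "\n\n" separator for every chunk after the first
        cost = len(match["text"]) + (2 if selected else 0)
        if used + cost > max_chars:
            break  # stop at the first chunk that would overflow the budget
        selected.append(match["text"])
        used += cost
    return "\n\n".join(selected)
```

Stopping at the first chunk that does not fit keeps the ranking strict; skipping it and trying smaller chunks would pack the budget more tightly at the cost of including less relevant text.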
I integrated all three tiers:
```python
import os
from datetime import datetime

from chunking import create_semantic_chunks
from memory_system import AIMemorySystem


class ThreeTierMemory:
    def __init__(self, base_path: str):
        self.vector_db = AIMemorySystem(f"{base_path}/chromadb")
        self.memory_md = f"{base_path}/MEMORY.md"
        self.logs_path = f"{base_path}/logs"
        # open(..., "a") does not create missing directories
        os.makedirs(self.logs_path, exist_ok=True)

    def store_session(self, session: dict, is_private: bool = False):
        """Store session in all three tiers.

        `session` needs "date", "time" and "content" keys,
        plus "summary" when is_private is True.
        """
        # Tier 1: Daily log (always)
        self._write_daily_log(session)

        # Tier 2: Curated memory (private only)
        if is_private:
            self._update_memory_md(session)

        # Tier 3: Vector DB (always)
        chunks = create_semantic_chunks(session["content"])
        for chunk in chunks:
            self.vector_db.add_memory(
                text=chunk.text,
                metadata={
                    "topic": chunk.topic,
                    "importance": chunk.importance,
                    "session_type": "private" if is_private else "normal",
                    "date": session["date"]
                }
            )

    def recall(self, query: str) -> dict:
        """Retrieve relevant context from memory."""
        # Get semantic matches
        semantic_results = self.vector_db.query_memory(query, n_results=10)

        # Read the curated long-term memory file
        curated = self._read_memory_md()

        return {
            "semantic_matches": semantic_results,
            "curated_memory": curated,
            "query": query
        }

    def _write_daily_log(self, session: dict):
        """Write raw session to daily log."""
        date_str = datetime.now().strftime("%Y-%m-%d")
        log_file = f"{self.logs_path}/{date_str}.md"

        with open(log_file, "a", encoding="utf-8") as f:
            f.write(f"\n## {session['time']}\n\n")
            f.write(session["content"])

    def _update_memory_md(self, session: dict):
        """Update curated long-term memory."""
        with open(self.memory_md, "a", encoding="utf-8") as f:
            f.write(f"\n### {session['date']}\n\n")
            f.write(session["summary"])

    def _read_memory_md(self) -> str:
        """Read curated memory."""
        try:
            with open(self.memory_md, "r", encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            return ""
```

Common Mistakes
I made these mistakes. Don’t repeat them:
1. Storing raw transcripts instead of semantic chunks
```python
# WRONG: Storing entire conversation
memory.add_memory(entire_conversation_transcript)
```

```python
# CORRECT: Store semantic chunks with their metadata
chunks = create_semantic_chunks(conversation)
for chunk in chunks:
    memory.add_memory(chunk.text, {
        "topic": chunk.topic,
        "importance": chunk.importance,
        "timestamp": chunk.timestamp
    })
```

2. Ignoring metadata
```python
# WRONG: No metadata
memory.add_memory(text)
```

```python
# CORRECT: Rich metadata
memory.add_memory(text, {
    "topic": "configuration",
    "importance": 0.8,
    "date": "2026-03-23",
    "session_type": "debugging"
})
```

3. Using an English-only embedding model
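```python
from sentence_transformers import SentenceTransformer

# WRONG: English-only model
embedder = SentenceTransformer('all-MiniLM-L6-v2')

# CORRECT: Multilingual support
embedder = SentenceTransformer('intfloat/multilingual-e5-small')
# The model only helps if ChromaDB uses it for the collection (for example via
# SentenceTransformerEmbeddingFunction); otherwise ChromaDB embeds with its default.
```

4. Not separating curated memory from raw logs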
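```python
# WRONG: Everything in one place
store_to_database(all_conversations)  # raw and curated text mixed together
```

```python
# CORRECT: Three-tier system, via the ThreeTierMemory class above
three_tier = ThreeTierMemory(base_path="./ai_memory")
three_tier.store_session(session, is_private=True)
# Tier 1: raw session appended to the daily log
# Tier 2: curated summary appended to MEMORY.md (private sessions only)
# Tier 3: semantic chunks added to ChromaDB
```

5. Querying without context limits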
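```python
# WRONG: Default result count, no relevance check
results = memory.query_memory(query)
```

```python
# CORRECT: Reasonable limit
results = memory.query_memory(query, n_results=10)
# Then filter by relevance threshold (lower distance = closer match).
# 0.3 is empirical: tune it for your embedding model and distance metric.
relevant = [r for r in results if r["distance"] < 0.3]
```

Summary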
In this post, I showed how to set up ChromaDB as an AI memory system. The key points are:
- Three-tier architecture: daily logs for raw capture, MEMORY.md for curated insights, ChromaDB for semantic search
- multilingual-e5-small embeddings handle multiple languages effectively
- Semantic chunking produces better retrieval than raw transcripts
- Metadata filtering enables precise memory queries
- 1,078+ chunks searchable in milliseconds
After 50 days of running this system, my AI assistant remembers decisions from weeks ago. I can ask “What did we decide about the Roborock API?” and get relevant context immediately.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email: Email me
Here are the most important links from this article, along with some further resources on this topic:
- 👨💻 Reddit: I gave my home a brain. Here's what 50 days of self-hosted AI looks like
- 👨💻 ChromaDB Documentation
- 👨💻 multilingual-e5-small on HuggingFace
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!