Building Local AI Memory with SQLite and Ollama for Cross-Channel Search
Problem
When I work with AI agents, I face a frustrating limitation: they cannot remember my past conversations across different platforms. My Slack messages, Discord discussions, and email threads exist in separate silos. My AI assistant has no way to search through all of them together.
Cloud-based memory solutions exist, but they have problems:
| Cloud Memory | Local Memory |
|---|---|
| API costs per query | Free unlimited queries |
| Data leaves my machine | Data stays local |
| Requires internet | Works offline |
| Platform controls access | I control everything |
| Subscription costs | One-time setup |
I wanted to build a local AI memory system that syncs all my messaging channels into one SQLite database and gives my agent fast search access to everything.
What I Built
I built a two-layer search system:
- Keyword search using SQLite FTS5 (fast, exact matches)
- Semantic search using Ollama embeddings (conceptual similarity)
The architecture looks like this:
+------------------------------------------------------------------+| Data Sources || Messages | Files | Calendar | App Usage | Browser |+-----+------+----+----+-----+------+-----+-------+-------+--------+ | | | | | v v v v v+------------------------------------------------------------------+| Ingestion Pipeline || - Platform-specific connectors || - Normalization layer || - Deduplication |+----------------------------------+-------------------------------+ | v+------------------------------------------------------------------+| SQLite Storage || +--------------+ +--------------+ +----------------------+ || | messages | | events | | file_references | || | - id | | - id | | - id | || | - platform | | - type | | - path | || | - channel | | - timestamp | | - content_hash | || | - sender | | - metadata | | - last_accessed | || | - content | | | | | || | - timestamp | | | | | || +--------------+ +--------------+ +----------------------+ || || +----------------------------------------------------------+ || | FTS5 Virtual Tables (keyword search) | || +----------------------------------------------------------+ |+----------------------------------+-------------------------------+ | v+------------------------------------------------------------------+| Ollama Embeddings || +----------------------------------------------------------+ || | Model: nomic-embed-text (fast, 768 dimensions) | || | Stored in: embeddings table (id, vector blob) | || +----------------------------------------------------------+ |+----------------------------------+-------------------------------+ | v+------------------------------------------------------------------+| Search API || - Keyword search (FTS5) || - Semantic search (vector similarity) || - Hybrid search (combined) |+------------------------------------------------------------------+Step 1: Create SQLite Schema
First, I set up the SQLite database with three tables: one for messages, one for full-text search, and one for embeddings.
import sqlite3
conn = sqlite3.connect('memory.db')
# Main messages tableconn.execute(''' CREATE TABLE IF NOT EXISTS messages ( id TEXT PRIMARY KEY, platform TEXT NOT NULL, channel TEXT, sender TEXT, content TEXT NOT NULL, timestamp DATETIME DEFAULT CURRENT_TIMESTAMP, metadata JSON )''')
# FTS5 for keyword searchconn.execute(''' CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5( id, platform, channel, sender, content, content='messages', content_rowid='rowid', tokenize='porter unicode61' )''')
# Embeddings storageconn.execute(''' CREATE TABLE IF NOT EXISTS embeddings ( message_id TEXT PRIMARY KEY, embedding BLOB NOT NULL, model TEXT DEFAULT 'nomic-embed-text', created_at DATETIME DEFAULT CURRENT_TIMESTAMP, FOREIGN KEY (message_id) REFERENCES messages(id) )''')
conn.commit()The FTS5 virtual table provides fast keyword search. The tokenize='porter unicode61' option enables stemming (finding βrunningβ when searching βrunβ).
Step 2: Generate Embeddings with Ollama
I use Ollama to generate embeddings locally. First, I installed the nomic-embed-text model:
ollama pull nomic-embed-textThen I created functions to generate and store embeddings:
import requestsimport struct
def get_embedding(text: str, model: str = "nomic-embed-text") -> bytes: """Get embedding from Ollama and store as binary blob.""" response = requests.post( 'http://localhost:11434/api/embeddings', json={'model': model, 'prompt': text} ) embedding = response.json()['embedding'] # Store as binary blob for efficiency return struct.pack(f'{len(embedding)}f', *embedding)
def store_embedding(conn, message_id: str, content: str): """Store embedding for a message.""" embedding_blob = get_embedding(content) conn.execute(''' INSERT OR REPLACE INTO embeddings(message_id, embedding) VALUES(?, ?) ''', (message_id, embedding_blob)) conn.commit()Storing embeddings as binary blobs instead of JSON saves significant space. A 768-dimensional float32 vector takes 3KB as binary vs 10KB+ as JSON.
Step 3: Implement Hybrid Search
The real power comes from combining keyword and semantic search:
import numpy as npimport structimport requests
def search_memory(conn, query: str, limit: int = 10): """Hybrid search combining FTS5 and vector similarity."""
# 1. FTS5 keyword search fts_results = conn.execute(''' SELECT m.id, m.content, m.platform, m.timestamp FROM messages m JOIN messages_fts fts ON m.id = fts.id WHERE messages_fts MATCH ? ORDER BY rank LIMIT ? ''', (query, limit)).fetchall()
# 2. Vector semantic search query_embedding = np.array( requests.post( 'http://localhost:11434/api/embeddings', json={'model': 'nomic-embed-text', 'prompt': query} ).json()['embedding'] )
# Load all embeddings and compute similarity vector_results = [] for row in conn.execute('SELECT message_id, embedding FROM embeddings'): stored_embedding = np.array( struct.unpack(f'{len(query_embedding)}f', row[1]) ) similarity = np.dot(query_embedding, stored_embedding) if similarity > 0.7: # threshold vector_results.append((row[0], similarity))
# 3. Merge and rank results return merge_results(fts_results, vector_results, limit)
def merge_results(fts_results, vector_results, limit): """Combine keyword and semantic results.""" seen = set() merged = []
# Keyword results first (higher precision) for row in fts_results: if row[0] not in seen: merged.append({ 'id': row[0], 'content': row[1], 'platform': row[2], 'timestamp': row[3], 'source': 'keyword' }) seen.add(row[0])
# Add semantic results for msg_id, similarity in sorted(vector_results, key=lambda x: -x[1]): if msg_id not in seen and len(merged) < limit: merged.append({ 'id': msg_id, 'similarity': similarity, 'source': 'semantic' }) seen.add(msg_id)
return merged[:limit]Performance Issues I Hit
When I first deployed this, I noticed Ollama added latency at query time. Generating embeddings for each search took 100-300ms, which felt slow.
The fix: pre-compute embeddings during sync, not at query time.
def sync_messages(conn, messages: list): """Sync messages with pre-computed embeddings."""
# Batch request to Ollama for efficiency embeddings = [] for msg in messages: emb = requests.post( 'http://localhost:11434/api/embeddings', json={'model': 'nomic-embed-text', 'prompt': msg['content']} ).json()['embedding'] embeddings.append(emb)
# Bulk insert cursor = conn.cursor() for msg, emb in zip(messages, embeddings): cursor.execute(''' INSERT OR REPLACE INTO messages(id, platform, channel, sender, content) VALUES(?, ?, ?, ?, ?) ''', (msg['id'], msg['platform'], msg['channel'], msg['sender'], msg['content']))
emb_blob = struct.pack(f'{len(emb)}f', *emb) cursor.execute(''' INSERT OR REPLACE INTO embeddings(message_id, embedding) VALUES(?, ?) ''', (msg['id'], emb_blob))
conn.commit()Additional optimizations that helped:
- Use smaller models - nomic-embed-text is faster than larger models
- Batch embedding generation - Process messages in batches during sync
- Cache frequent queries - Store common search results
- Use sqlite-vec extension - Native vector operations (optional but faster)
Common Mistakes I Made
Mistake 1: Indexing Everything
When I bulk-indexed my entire communication archive, retrieval started surfacing irrelevant old threads. Too much noise crowded out the signal.
Fix: Be selective about what you index. Tag agent-relevant content. Filter out automated messages and notifications.
Mistake 2: Not Handling Updates
Messages get edited, deleted, and threads fork. My initial sync didnβt track versions, leading to stale data.
Fix: Track message versions with timestamps. Sync deltas instead of full reloads.
def sync_delta(conn, platform: str, last_sync: datetime): """Sync only new and modified messages.""" messages = fetch_messages_since(platform, last_sync)
for msg in messages: # Check if message was updated existing = conn.execute( 'SELECT timestamp FROM messages WHERE id = ?', (msg['id'],) ).fetchone()
if not existing or existing[0] < msg['updated_at']: # Insert or update store_message_and_embedding(conn, msg)Mistake 3: Ignoring Privacy
Local storage means I control my data, but it also means Iβm responsible for securing it.
Fix: Encrypt sensitive content before storing. Consider what fields actually need to be indexed.
Related Knowledge
Why SQLite FTS5?
FTS5 (Full-Text Search 5) is a SQLite extension that provides:
- Fast full-text search over large text corpora
- Built-in tokenization and stemming
- Ranking algorithms (BM25)
- Phrase queries and NEAR operators
-- Search for "api error" within 3 wordsSELECT * FROM messages_fts WHERE messages_fts MATCH 'api NEAR/3 error';
-- Search with wildcardsSELECT * FROM messages_fts WHERE messages_fts MATCH 'error*';Why Ollama for Embeddings?
Ollama provides:
- Local inference (no API calls)
- Multiple embedding models
- Simple REST API
- GPU acceleration support
The nomic-embed-text model I use has:
- 768 dimensions (compact)
- 8192 token context window
- Good performance on semantic similarity benchmarks
Alternative: sqlite-vec Extension
For even better performance, the sqlite-vec extension provides native vector operations:
# If you have sqlite-vec installedconn.enable_load_extension(True)conn.load_extension('sqlite_vec')
# Create vector indexconn.execute(''' CREATE VIRTUAL TABLE vec_messages USING vec0( message_id TEXT PRIMARY KEY, embedding FLOAT[768] )''')
# Vector search with distanceresults = conn.execute(''' SELECT message_id, distance FROM vec_messages WHERE embedding MATCH ? ORDER BY distance LIMIT 10''', (query_embedding.tolist(),))Summary
In this post, I showed how to build a local AI memory system using SQLite and Ollama. The key components are:
- SQLite with FTS5 for fast keyword search
- Ollama embeddings for semantic similarity
- Hybrid search combining both approaches
- Pre-computed embeddings for performance
The result: your AI agent can search across all your messaging channels without cloud dependencies, API costs, or data leaving your machine.
Start with keyword search (simpler), then add semantic search as needed. Pre-compute embeddings during sync to avoid query-time latency.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- π¨βπ» traul - Local Memory System
- π¨βπ» Ollama
- π¨βπ» nomic-embed-text Model
- π¨βπ» sqlite-vec Extension
Oh, and if you found these resources useful, donβt forget to support me by starring the repo on GitHub!
Comments