Skip to content

Building Local AI Memory with SQLite and Ollama for Cross-Channel Search

Problem

When I work with AI agents, I face a frustrating limitation: they cannot remember my past conversations across different platforms. My Slack messages, Discord discussions, and email threads exist in separate silos. My AI assistant has no way to search through all of them together.

Cloud-based memory solutions exist, but they have problems:

Cloud MemoryLocal Memory
API costs per queryFree unlimited queries
Data leaves my machineData stays local
Requires internetWorks offline
Platform controls accessI control everything
Subscription costsOne-time setup

I wanted to build a local AI memory system that syncs all my messaging channels into one SQLite database and gives my agent fast search access to everything.

What I Built

I built a two-layer search system:

  1. Keyword search using SQLite FTS5 (fast, exact matches)
  2. Semantic search using Ollama embeddings (conceptual similarity)

The architecture looks like this:

Architecture Overview
+------------------------------------------------------------------+
| Data Sources |
| Messages | Files | Calendar | App Usage | Browser |
+-----+------+----+----+-----+------+-----+-------+-------+--------+
| | | | |
v v v v v
+------------------------------------------------------------------+
| Ingestion Pipeline |
| - Platform-specific connectors |
| - Normalization layer |
| - Deduplication |
+----------------------------------+-------------------------------+
|
v
+------------------------------------------------------------------+
| SQLite Storage |
| +--------------+ +--------------+ +----------------------+ |
| | messages | | events | | file_references | |
| | - id | | - id | | - id | |
| | - platform | | - type | | - path | |
| | - channel | | - timestamp | | - content_hash | |
| | - sender | | - metadata | | - last_accessed | |
| | - content | | | | | |
| | - timestamp | | | | | |
| +--------------+ +--------------+ +----------------------+ |
| |
| +----------------------------------------------------------+ |
| | FTS5 Virtual Tables (keyword search) | |
| +----------------------------------------------------------+ |
+----------------------------------+-------------------------------+
|
v
+------------------------------------------------------------------+
| Ollama Embeddings |
| +----------------------------------------------------------+ |
| | Model: nomic-embed-text (fast, 768 dimensions) | |
| | Stored in: embeddings table (id, vector blob) | |
| +----------------------------------------------------------+ |
+----------------------------------+-------------------------------+
|
v
+------------------------------------------------------------------+
| Search API |
| - Keyword search (FTS5) |
| - Semantic search (vector similarity) |
| - Hybrid search (combined) |
+------------------------------------------------------------------+

Step 1: Create SQLite Schema

First, I set up the SQLite database with three tables: one for messages, one for full-text search, and one for embeddings.

schema.py
import sqlite3
conn = sqlite3.connect('memory.db')
# Main messages table
conn.execute('''
CREATE TABLE IF NOT EXISTS messages (
id TEXT PRIMARY KEY,
platform TEXT NOT NULL,
channel TEXT,
sender TEXT,
content TEXT NOT NULL,
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
metadata JSON
)
''')
# FTS5 for keyword search
conn.execute('''
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(
id, platform, channel, sender, content,
content='messages',
content_rowid='rowid',
tokenize='porter unicode61'
)
''')
# Embeddings storage
conn.execute('''
CREATE TABLE IF NOT EXISTS embeddings (
message_id TEXT PRIMARY KEY,
embedding BLOB NOT NULL,
model TEXT DEFAULT 'nomic-embed-text',
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (message_id) REFERENCES messages(id)
)
''')
conn.commit()

The FTS5 virtual table provides fast keyword search. The tokenize='porter unicode61' option enables stemming (finding β€œrunning” when searching β€œrun”).

Step 2: Generate Embeddings with Ollama

I use Ollama to generate embeddings locally. First, I installed the nomic-embed-text model:

Terminal
ollama pull nomic-embed-text

Then I created functions to generate and store embeddings:

embeddings.py
import requests
import struct
def get_embedding(text: str, model: str = "nomic-embed-text") -> bytes:
"""Get embedding from Ollama and store as binary blob."""
response = requests.post(
'http://localhost:11434/api/embeddings',
json={'model': model, 'prompt': text}
)
embedding = response.json()['embedding']
# Store as binary blob for efficiency
return struct.pack(f'{len(embedding)}f', *embedding)
def store_embedding(conn, message_id: str, content: str):
"""Store embedding for a message."""
embedding_blob = get_embedding(content)
conn.execute('''
INSERT OR REPLACE INTO embeddings(message_id, embedding)
VALUES(?, ?)
''', (message_id, embedding_blob))
conn.commit()

Storing embeddings as binary blobs instead of JSON saves significant space. A 768-dimensional float32 vector takes 3KB as binary vs 10KB+ as JSON.

The real power comes from combining keyword and semantic search:

search.py
import numpy as np
import struct
import requests
def search_memory(conn, query: str, limit: int = 10):
"""Hybrid search combining FTS5 and vector similarity."""
# 1. FTS5 keyword search
fts_results = conn.execute('''
SELECT m.id, m.content, m.platform, m.timestamp
FROM messages m
JOIN messages_fts fts ON m.id = fts.id
WHERE messages_fts MATCH ?
ORDER BY rank
LIMIT ?
''', (query, limit)).fetchall()
# 2. Vector semantic search
query_embedding = np.array(
requests.post(
'http://localhost:11434/api/embeddings',
json={'model': 'nomic-embed-text', 'prompt': query}
).json()['embedding']
)
# Load all embeddings and compute similarity
vector_results = []
for row in conn.execute('SELECT message_id, embedding FROM embeddings'):
stored_embedding = np.array(
struct.unpack(f'{len(query_embedding)}f', row[1])
)
similarity = np.dot(query_embedding, stored_embedding)
if similarity > 0.7: # threshold
vector_results.append((row[0], similarity))
# 3. Merge and rank results
return merge_results(fts_results, vector_results, limit)
def merge_results(fts_results, vector_results, limit):
"""Combine keyword and semantic results."""
seen = set()
merged = []
# Keyword results first (higher precision)
for row in fts_results:
if row[0] not in seen:
merged.append({
'id': row[0],
'content': row[1],
'platform': row[2],
'timestamp': row[3],
'source': 'keyword'
})
seen.add(row[0])
# Add semantic results
for msg_id, similarity in sorted(vector_results, key=lambda x: -x[1]):
if msg_id not in seen and len(merged) < limit:
merged.append({
'id': msg_id,
'similarity': similarity,
'source': 'semantic'
})
seen.add(msg_id)
return merged[:limit]

Performance Issues I Hit

When I first deployed this, I noticed Ollama added latency at query time. Generating embeddings for each search took 100-300ms, which felt slow.

The fix: pre-compute embeddings during sync, not at query time.

sync.py
def sync_messages(conn, messages: list):
"""Sync messages with pre-computed embeddings."""
# Batch request to Ollama for efficiency
embeddings = []
for msg in messages:
emb = requests.post(
'http://localhost:11434/api/embeddings',
json={'model': 'nomic-embed-text', 'prompt': msg['content']}
).json()['embedding']
embeddings.append(emb)
# Bulk insert
cursor = conn.cursor()
for msg, emb in zip(messages, embeddings):
cursor.execute('''
INSERT OR REPLACE INTO messages(id, platform, channel, sender, content)
VALUES(?, ?, ?, ?, ?)
''', (msg['id'], msg['platform'], msg['channel'], msg['sender'], msg['content']))
emb_blob = struct.pack(f'{len(emb)}f', *emb)
cursor.execute('''
INSERT OR REPLACE INTO embeddings(message_id, embedding)
VALUES(?, ?)
''', (msg['id'], emb_blob))
conn.commit()

Additional optimizations that helped:

  1. Use smaller models - nomic-embed-text is faster than larger models
  2. Batch embedding generation - Process messages in batches during sync
  3. Cache frequent queries - Store common search results
  4. Use sqlite-vec extension - Native vector operations (optional but faster)

Common Mistakes I Made

Mistake 1: Indexing Everything

When I bulk-indexed my entire communication archive, retrieval started surfacing irrelevant old threads. Too much noise crowded out the signal.

Fix: Be selective about what you index. Tag agent-relevant content. Filter out automated messages and notifications.

Mistake 2: Not Handling Updates

Messages get edited, deleted, and threads fork. My initial sync didn’t track versions, leading to stale data.

Fix: Track message versions with timestamps. Sync deltas instead of full reloads.

delta_sync.py
def sync_delta(conn, platform: str, last_sync: datetime):
"""Sync only new and modified messages."""
messages = fetch_messages_since(platform, last_sync)
for msg in messages:
# Check if message was updated
existing = conn.execute(
'SELECT timestamp FROM messages WHERE id = ?',
(msg['id'],)
).fetchone()
if not existing or existing[0] < msg['updated_at']:
# Insert or update
store_message_and_embedding(conn, msg)

Mistake 3: Ignoring Privacy

Local storage means I control my data, but it also means I’m responsible for securing it.

Fix: Encrypt sensitive content before storing. Consider what fields actually need to be indexed.

Why SQLite FTS5?

FTS5 (Full-Text Search 5) is a SQLite extension that provides:

  • Fast full-text search over large text corpora
  • Built-in tokenization and stemming
  • Ranking algorithms (BM25)
  • Phrase queries and NEAR operators
fts5_example.sql
-- Search for "api error" within 3 words
SELECT * FROM messages_fts WHERE messages_fts MATCH 'api NEAR/3 error';
-- Search with wildcards
SELECT * FROM messages_fts WHERE messages_fts MATCH 'error*';

Why Ollama for Embeddings?

Ollama provides:

  • Local inference (no API calls)
  • Multiple embedding models
  • Simple REST API
  • GPU acceleration support

The nomic-embed-text model I use has:

  • 768 dimensions (compact)
  • 8192 token context window
  • Good performance on semantic similarity benchmarks

Alternative: sqlite-vec Extension

For even better performance, the sqlite-vec extension provides native vector operations:

sqlite_vec_example.py
# If you have sqlite-vec installed
conn.enable_load_extension(True)
conn.load_extension('sqlite_vec')
# Create vector index
conn.execute('''
CREATE VIRTUAL TABLE vec_messages USING vec0(
message_id TEXT PRIMARY KEY,
embedding FLOAT[768]
)
''')
# Vector search with distance
results = conn.execute('''
SELECT message_id, distance
FROM vec_messages
WHERE embedding MATCH ?
ORDER BY distance
LIMIT 10
''', (query_embedding.tolist(),))

Summary

In this post, I showed how to build a local AI memory system using SQLite and Ollama. The key components are:

  1. SQLite with FTS5 for fast keyword search
  2. Ollama embeddings for semantic similarity
  3. Hybrid search combining both approaches
  4. Pre-computed embeddings for performance

The result: your AI agent can search across all your messaging channels without cloud dependencies, API costs, or data leaving your machine.

Start with keyword search (simpler), then add semantic search as needed. Pre-compute embeddings during sync to avoid query-time latency.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments