Building Local AI Memory with SQLite and Ollama for Cross-Channel Search

Mar 17, 2026

Problem

When I work with AI agents, I face a frustrating limitation: they cannot remember my past conversations across different platforms. My Slack messages, Discord discussions, and email threads exist in separate silos. My AI assistant has no way to search through all of them together.

Cloud-based memory solutions exist, but they have problems:

Cloud Memory	Local Memory
API costs per query	Free unlimited queries
Data leaves my machine	Data stays local
Requires internet	Works offline
Platform controls access	I control everything
Subscription costs	One-time setup

I wanted to build a local AI memory system that syncs all my messaging channels into one SQLite database and gives my agent fast search access to everything.

What I Built

I built a two-layer search system:

Keyword search using SQLite FTS5 (fast, exact matches)
Semantic search using Ollama embeddings (conceptual similarity)

The architecture looks like this:

+------------------------------------------------------------------+
|                        Data Sources                               |
|  Messages  |  Files  |  Calendar  |  App Usage  |  Browser       |
+-----+------+----+----+-----+------+-----+-------+-------+--------+
      |           |          |             |              |
      v           v          v             v              v
+------------------------------------------------------------------+
|                      Ingestion Pipeline                           |
|  - Platform-specific connectors                                   |
|  - Normalization layer                                            |
|  - Deduplication                                                  |
+----------------------------------+-------------------------------+
                                   |
                                   v
+------------------------------------------------------------------+
|                     SQLite Storage                                |
|  +--------------+  +--------------+  +----------------------+   |
|  | messages     |  | events       |  | file_references      |   |
|  | - id         |  | - id         |  | - id                 |   |
|  | - platform   |  | - type       |  | - path               |   |
|  | - channel    |  | - timestamp  |  | - content_hash       |   |
|  | - sender     |  | - metadata   |  | - last_accessed      |   |
|  | - content    |  |              |  |                      |   |
|  | - timestamp  |  |              |  |                      |   |
|  +--------------+  +--------------+  +----------------------+   |
|                                                                   |
|  +----------------------------------------------------------+    |
|  | FTS5 Virtual Tables (keyword search)                      |    |
|  +----------------------------------------------------------+    |
+----------------------------------+-------------------------------+
                                   |
                                   v
+------------------------------------------------------------------+
|                    Ollama Embeddings                              |
|  +----------------------------------------------------------+    |
|  | Model: nomic-embed-text (fast, 768 dimensions)           |    |
|  | Stored in: embeddings table (id, vector blob)            |    |
|  +----------------------------------------------------------+    |
+----------------------------------+-------------------------------+
                                   |
                                   v
+------------------------------------------------------------------+
|                      Search API                                   |
|  - Keyword search (FTS5)                                          |
|  - Semantic search (vector similarity)                            |
|  - Hybrid search (combined)                                       |
+------------------------------------------------------------------+

Step 1: Create SQLite Schema

First, I set up the SQLite database with three tables: one for messages, one for full-text search, and one for embeddings.

import sqlite3

conn = sqlite3.connect('memory.db')

# Main messages table
conn.execute('''
    CREATE TABLE IF NOT EXISTS messages (
        id TEXT PRIMARY KEY,
        platform TEXT NOT NULL,
        channel TEXT,
        sender TEXT,
        content TEXT NOT NULL,
        timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
        metadata JSON
    )
''')

# FTS5 for keyword search
conn.execute('''
    CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(
        id, platform, channel, sender, content,
        content='messages',
        content_rowid='rowid',
        tokenize='porter unicode61'
    )
''')

# Embeddings storage
conn.execute('''
    CREATE TABLE IF NOT EXISTS embeddings (
        message_id TEXT PRIMARY KEY,
        embedding BLOB NOT NULL,
        model TEXT DEFAULT 'nomic-embed-text',
        created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
        FOREIGN KEY (message_id) REFERENCES messages(id)
    )
''')

conn.commit()

The FTS5 virtual table provides fast keyword search. The tokenize='porter unicode61' option enables stemming (finding “running” when searching “run”).

Step 2: Generate Embeddings with Ollama

I use Ollama to generate embeddings locally. First, I installed the nomic-embed-text model:

ollama pull nomic-embed-text

Then I created functions to generate and store embeddings:

import requests
import struct

def get_embedding(text: str, model: str = "nomic-embed-text") -> bytes:
    """Get embedding from Ollama and store as binary blob."""
    response = requests.post(
        'http://localhost:11434/api/embeddings',
        json={'model': model, 'prompt': text}
    )
    embedding = response.json()['embedding']
    # Store as binary blob for efficiency
    return struct.pack(f'{len(embedding)}f', *embedding)

def store_embedding(conn, message_id: str, content: str):
    """Store embedding for a message."""
    embedding_blob = get_embedding(content)
    conn.execute('''
        INSERT OR REPLACE INTO embeddings(message_id, embedding)
        VALUES(?, ?)
    ''', (message_id, embedding_blob))
    conn.commit()

Storing embeddings as binary blobs instead of JSON saves significant space. A 768-dimensional float32 vector takes 3KB as binary vs 10KB+ as JSON.

Step 3: Implement Hybrid Search

The real power comes from combining keyword and semantic search:

import numpy as np
import struct
import requests

def search_memory(conn, query: str, limit: int = 10):
    """Hybrid search combining FTS5 and vector similarity."""

    # 1. FTS5 keyword search
    fts_results = conn.execute('''
        SELECT m.id, m.content, m.platform, m.timestamp
        FROM messages m
        JOIN messages_fts fts ON m.id = fts.id
        WHERE messages_fts MATCH ?
        ORDER BY rank
        LIMIT ?
    ''', (query, limit)).fetchall()

    # 2. Vector semantic search
    query_embedding = np.array(
        requests.post(
            'http://localhost:11434/api/embeddings',
            json={'model': 'nomic-embed-text', 'prompt': query}
        ).json()['embedding']
    )

    # Load all embeddings and compute similarity
    vector_results = []
    for row in conn.execute('SELECT message_id, embedding FROM embeddings'):
        stored_embedding = np.array(
            struct.unpack(f'{len(query_embedding)}f', row[1])
        )
        similarity = np.dot(query_embedding, stored_embedding)
        if similarity > 0.7:  # threshold
            vector_results.append((row[0], similarity))

    # 3. Merge and rank results
    return merge_results(fts_results, vector_results, limit)

def merge_results(fts_results, vector_results, limit):
    """Combine keyword and semantic results."""
    seen = set()
    merged = []

    # Keyword results first (higher precision)
    for row in fts_results:
        if row[0] not in seen:
            merged.append({
                'id': row[0],
                'content': row[1],
                'platform': row[2],
                'timestamp': row[3],
                'source': 'keyword'
            })
            seen.add(row[0])

    # Add semantic results
    for msg_id, similarity in sorted(vector_results, key=lambda x: -x[1]):
        if msg_id not in seen and len(merged) < limit:
            merged.append({
                'id': msg_id,
                'similarity': similarity,
                'source': 'semantic'
            })
            seen.add(msg_id)

    return merged[:limit]

Performance Issues I Hit

When I first deployed this, I noticed Ollama added latency at query time. Generating embeddings for each search took 100-300ms, which felt slow.

The fix: pre-compute embeddings during sync, not at query time.

def sync_messages(conn, messages: list):
    """Sync messages with pre-computed embeddings."""

    # Batch request to Ollama for efficiency
    embeddings = []
    for msg in messages:
        emb = requests.post(
            'http://localhost:11434/api/embeddings',
            json={'model': 'nomic-embed-text', 'prompt': msg['content']}
        ).json()['embedding']
        embeddings.append(emb)

    # Bulk insert
    cursor = conn.cursor()
    for msg, emb in zip(messages, embeddings):
        cursor.execute('''
            INSERT OR REPLACE INTO messages(id, platform, channel, sender, content)
            VALUES(?, ?, ?, ?, ?)
        ''', (msg['id'], msg['platform'], msg['channel'], msg['sender'], msg['content']))

        emb_blob = struct.pack(f'{len(emb)}f', *emb)
        cursor.execute('''
            INSERT OR REPLACE INTO embeddings(message_id, embedding)
            VALUES(?, ?)
        ''', (msg['id'], emb_blob))

    conn.commit()

Additional optimizations that helped:

Use smaller models - nomic-embed-text is faster than larger models
Batch embedding generation - Process messages in batches during sync
Cache frequent queries - Store common search results
Use sqlite-vec extension - Native vector operations (optional but faster)

Common Mistakes I Made

Mistake 1: Indexing Everything

When I bulk-indexed my entire communication archive, retrieval started surfacing irrelevant old threads. Too much noise crowded out the signal.

Fix: Be selective about what you index. Tag agent-relevant content. Filter out automated messages and notifications.

Mistake 2: Not Handling Updates

Messages get edited, deleted, and threads fork. My initial sync didn’t track versions, leading to stale data.

Fix: Track message versions with timestamps. Sync deltas instead of full reloads.

def sync_delta(conn, platform: str, last_sync: datetime):
    """Sync only new and modified messages."""
    messages = fetch_messages_since(platform, last_sync)

    for msg in messages:
        # Check if message was updated
        existing = conn.execute(
            'SELECT timestamp FROM messages WHERE id = ?',
            (msg['id'],)
        ).fetchone()

        if not existing or existing[0] < msg['updated_at']:
            # Insert or update
            store_message_and_embedding(conn, msg)

Mistake 3: Ignoring Privacy

Local storage means I control my data, but it also means I’m responsible for securing it.

Fix: Encrypt sensitive content before storing. Consider what fields actually need to be indexed.

Why SQLite FTS5?

FTS5 (Full-Text Search 5) is a SQLite extension that provides:

Fast full-text search over large text corpora
Built-in tokenization and stemming
Ranking algorithms (BM25)
Phrase queries and NEAR operators

-- Search for "api error" within 3 words
SELECT * FROM messages_fts WHERE messages_fts MATCH 'api NEAR/3 error';

-- Search with wildcards
SELECT * FROM messages_fts WHERE messages_fts MATCH 'error*';

Why Ollama for Embeddings?

Ollama provides:

Local inference (no API calls)
Multiple embedding models
Simple REST API
GPU acceleration support

The nomic-embed-text model I use has:

768 dimensions (compact)
8192 token context window
Good performance on semantic similarity benchmarks

Alternative: sqlite-vec Extension

For even better performance, the sqlite-vec extension provides native vector operations:

# If you have sqlite-vec installed
conn.enable_load_extension(True)
conn.load_extension('sqlite_vec')

# Create vector index
conn.execute('''
    CREATE VIRTUAL TABLE vec_messages USING vec0(
        message_id TEXT PRIMARY KEY,
        embedding FLOAT[768]
    )
''')

# Vector search with distance
results = conn.execute('''
    SELECT message_id, distance
    FROM vec_messages
    WHERE embedding MATCH ?
    ORDER BY distance
    LIMIT 10
''', (query_embedding.tolist(),))

Summary

In this post, I showed how to build a local AI memory system using SQLite and Ollama. The key components are:

SQLite with FTS5 for fast keyword search
Ollama embeddings for semantic similarity
Hybrid search combining both approaches
Pre-computed embeddings for performance

The result: your AI agent can search across all your messaging channels without cloud dependencies, API costs, or data leaving your machine.

Start with keyword search (simpler), then add semantic search as needed. Pre-compute embeddings during sync to avoid query-time latency.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 traul - Local Memory System
👨‍💻 Ollama
👨‍💻 nomic-embed-text Model
👨‍💻 sqlite-vec Extension

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!