FTS5 vs Vector Search: When to Use Keyword vs Semantic Search Locally

Mar 17, 2026

Problem

When I built a local search system for my Discord messages, I hit a wall. My FTS5 keyword search worked perfectly for exact matches like “API migration” but completely failed when I tried to find “that conversation about pushing code to production” because nobody used the word “deploy” in that thread.

I had two options: stick with keyword search (fast but limited) or add vector search (slow but smart). The question was: which one should I use, and could I use both?

Environment

SQLite with FTS5 extension (built into Python’s sqlite3 module)
Ollama with nomic-embed-text model for embeddings
M1 Mac with 16GB RAM
~100k Discord messages to search

What I tried first: FTS5 keyword search

My initial approach was SQLite FTS5. It’s built into SQLite, requires zero external dependencies, and responds in milliseconds.

import sqlite3

def setup_fts5(db_path: str = 'messages.db'):
    conn = sqlite3.connect(db_path)
    conn.execute('''
        CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(
            content,
            author,
            timestamp
        )
    ''')
    return conn

def fts5_search(query: str, db_path: str = 'messages.db'):
    conn = sqlite3.connect(db_path)
    cursor = conn.execute('''
        SELECT content, author, timestamp
        FROM messages_fts
        WHERE messages_fts MATCH ?
        ORDER BY rank
        LIMIT 10
    ''', (query,))
    return cursor.fetchall()

# Fast search for exact keywords
results = fts5_search("API migration")
# Response time: 1-5ms

FTS5 worked great for:

Exact phrase matches: "error code 500"
Author names: "author:alex"
Boolean queries: "deploy AND production NOT staging"

But it failed hard for fuzzy recall:

Me: Find that discussion about the deployment issue we had last month
FTS5: No results (because nobody said "deployment", they said "push broke")

Me: Search for: API version change
FTS5: Found 50 results (but none about the actual API version discussion)

The problem: I needed to know the exact keywords to find anything.

What I tried next: Vector search with Ollama

I added semantic search using Ollama embeddings. Now “deployment issue” could match “push broke on production” because the concepts are similar.

import requests
import numpy as np
import sqlite3

def get_embedding(text: str, model: str = "nomic-embed-text"):
    response = requests.post(
        'http://localhost:11434/api/embeddings',
        json={'model': model, 'prompt': text}
    )
    return np.array(response.json()['embedding'])

def store_embeddings(db_path: str = 'messages.db'):
    conn = sqlite3.connect(db_path)
    conn.execute('''
        CREATE TABLE IF NOT EXISTS embeddings (
            id INTEGER PRIMARY KEY,
            content TEXT,
            embedding BLOB
        )
    ''')

    # Pre-compute embeddings for all messages
    cursor = conn.execute('SELECT id, content FROM messages')
    for row in cursor:
        embedding = get_embedding(row[1])
        conn.execute(
            'INSERT INTO embeddings (id, content, embedding) VALUES (?, ?, ?)',
            (row[0], row[1], embedding.tobytes())
        )
    conn.commit()

def vector_search(query: str, db_path: str = 'messages.db', top_k: int = 10):
    query_embedding = get_embedding(query)
    conn = sqlite3.connect(db_path)

    cursor = conn.execute('SELECT id, content, embedding FROM embeddings')
    similarities = []

    for row in cursor:
        stored_embedding = np.frombuffer(row[2], dtype=np.float32)
        similarity = np.dot(query_embedding, stored_embedding) / (
            np.linalg.norm(query_embedding) * np.linalg.norm(stored_embedding)
        )
        similarities.append((row[0], row[1], similarity))

    return sorted(similarities, key=lambda x: x[2], reverse=True)[:top_k]

# Semantic search for fuzzy recall
results = vector_search("how we handled the deployment issue")
# Response time: 100-500ms

Vector search found what FTS5 couldn’t:

Me: Find that discussion about the deployment issue
Vector Search: Found 3 results:
  1. "the push broke on production yesterday" (score: 0.89)
  2. "we need to fix the release pipeline" (score: 0.82)
  3. "staging is down after the update" (score: 0.78)

But vector search had its own problems: latency.

The performance gap

I measured the response times:

Query Type              | FTS5      | Vector Search
------------------------|-----------|---------------
Exact keyword match     | 1-5ms     | 100-500ms
Synonym match           | 0 results | 100-500ms
"Something about..."    | 0 results | 150-600ms
100k messages scanned   | 2ms       | 450ms

The 100ms-500ms latency for vector search is noticeable. Users notice anything over 100ms as “laggy.”

Why the latency matters

On my M1 Mac with 16GB RAM:

Embedding generation (nomic-embed-text): 50-100ms per query
Vector similarity calculation: 100-400ms for 100k messages
Total: 150-500ms

vs.

FTS5 index lookup: 1-5ms

The bottleneck is clear: vector search requires computing similarity against every stored embedding. FTS5 just looks up a pre-built index.

The hybrid solution

I realized I didn’t need to choose. I could use both:

import sqlite3
import numpy as np
from typing import List, Tuple

def hybrid_search(query: str, db_path: str = 'messages.db', top_k: int = 10):
    """
    Combine FTS5 precision with vector search recall.
    Uses reciprocal rank fusion to merge results.
    """
    # Fast keyword results from FTS5
    fts5_results = fts5_search(query, db_path, top_k * 2)

    # Semantic results for recall
    vector_results = vector_search(query, db_path, top_k * 2)

    # Reciprocal rank fusion
    def get_rank_scores(results: List, k: int = 60) -> dict:
        return {r[0]: 1 / (k + i) for i, r in enumerate(results)}

    fts5_scores = get_rank_scores(fts5_results)
    vector_scores = get_rank_scores(vector_results)

    # Combine scores
    all_ids = set(fts5_scores.keys()) | set(vector_scores.keys())
    combined = []

    for msg_id in all_ids:
        score = fts5_scores.get(msg_id, 0) + vector_scores.get(msg_id, 0)
        # Get content from either result set
        content = next((r[1] for r in fts5_results + vector_results if r[0] == msg_id), '')
        combined.append((msg_id, content, score))

    return sorted(combined, key=lambda x: x[2], reverse=True)[:top_k]

The hybrid approach gives me both:

Query: "deployment issue"

FTS5 results (precision):
  - "deployment script failed" (exact match)
  - "deployment rollback needed" (exact match)

Vector results (recall):
  - "the push broke on production" (semantic match)
  - "release pipeline is stuck" (semantic match)

Hybrid merged results:
  1. "deployment script failed" (score: 0.033)
  2. "the push broke on production" (score: 0.028)
  3. "deployment rollback needed" (score: 0.025)
  4. "release pipeline is stuck" (score: 0.018)

When to use each approach

Based on my testing, here’s the decision matrix:

| Query Type                        | Use FTS5 | Use Vector | Use Hybrid |
|-----------------------------------|----------|------------|------------|
| Exact phrase "error code 500"     | Yes      | No         | No         |
| Finding specific error code       | Yes      | No         | No         |
| "What did Alex say about..."      | Yes      | No         | No         |
| "Something about the migration"   | No       | Yes        | Yes        |
| "That bug with null pointer"      | No       | Yes        | Yes        |
| General search box                | No       | No         | Yes        |
| API docs lookup                   | Yes      | No         | No         |

Optimization tips I learned

Pre-compute embeddings during sync, not at query time

# BAD: Compute at query time
def search_slow(query):
    embedding = get_embedding(query)  # 50-100ms
    return vector_search_with_embedding(embedding)

# GOOD: Pre-compute during message sync
def sync_messages(messages):
    for msg in messages:
        embedding = get_embedding(msg.content)
        store_embedding(msg.id, embedding)
    # Now search is fast

Use smaller embedding models

Model comparison on M1 Mac:
nomic-embed-text: 50-100ms per embedding
all-MiniLM-L6-v2: 30-60ms per embedding
text-embedding-3-small (OpenAI): API latency varies

FTS5 as first pass, vectors as refinement

def two_pass_search(query: str, db_path: str = 'messages.db'):
    # Pass 1: FTS5 for candidates (fast)
    candidates = fts5_search(query, db_path, top_k=100)

    # If FTS5 finds good results, skip vector search
    if len(candidates) >= 10:
        return candidates[:10]

    # Pass 2: Vector search for recall (only if needed)
    return vector_search(query, db_path, top_k=10)

This approach gives you FTS5 speed 90% of the time, with vector search as a fallback.

Why FTS5 can’t handle synonyms

FTS5 uses inverted indices: it maps words to documents. When you search “deploy”, it looks up that exact word in the index. “Deployment” has a different index entry. “Push to production” has none of those words.

Vector embeddings, on the other hand, represent meaning in a high-dimensional space. “Deploy” and “push to production” have similar vectors because they appear in similar contexts in training data.

Reciprocal rank fusion explained

The fusion formula I use:

RRF_score(d) = sum(1 / (k + rank(d))) for each ranking list

Where k is a constant (typically 60) that smooths the rank differences. This combines multiple rankings without needing to normalize scores across different systems.

When to skip hybrid search

If your use case is:

Documentation lookup (exact terms expected): Use FTS5 only
Chat history search (fuzzy recall needed): Use hybrid
Code search (exact identifiers needed): Use FTS5 only
Research assistant (exploratory search): Use vector search

Summary

In this post, I compared FTS5 keyword search with Ollama vector search for local search systems. The key insight: you don’t need to choose one.

FTS5 gives you millisecond response times and exact precision
Vector search gives you semantic understanding and fuzzy recall
Hybrid search gives you both, with reciprocal rank fusion

Start with FTS5 for exact keyword searches. Add vector search when users need to find things they can only vaguely remember. Use the two-pass approach for the best balance of speed and recall.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 SQLite FTS5 Extension
👨‍💻 Ollama Embeddings API
👨‍💻 traul GitHub - Discord Message Search Tool

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!