FTS5 vs Vector Search: When to Use Keyword vs Semantic Search Locally
Problem
When I built a local search system for my Discord messages, I hit a wall. My FTS5 keyword search worked perfectly for exact matches like “API migration” but completely failed when I tried to find “that conversation about pushing code to production” because nobody used the word “deploy” in that thread.
I had two options: stick with keyword search (fast but limited) or add vector search (slow but smart). The question was: which one should I use, and could I use both?
Environment
- SQLite with FTS5 extension (built into Python’s sqlite3 module)
- Ollama with nomic-embed-text model for embeddings
- M1 Mac with 16GB RAM
- ~100k Discord messages to search
What I tried first: FTS5 keyword search
My initial approach was SQLite FTS5. It’s built into SQLite, requires zero external dependencies, and responds in milliseconds.
import sqlite3
def setup_fts5(db_path: str = 'messages.db'): conn = sqlite3.connect(db_path) conn.execute(''' CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5( content, author, timestamp ) ''') return conn
def fts5_search(query: str, db_path: str = 'messages.db'): conn = sqlite3.connect(db_path) cursor = conn.execute(''' SELECT content, author, timestamp FROM messages_fts WHERE messages_fts MATCH ? ORDER BY rank LIMIT 10 ''', (query,)) return cursor.fetchall()
# Fast search for exact keywordsresults = fts5_search("API migration")# Response time: 1-5msFTS5 worked great for:
- Exact phrase matches:
"error code 500" - Author names:
"author:alex" - Boolean queries:
"deploy AND production NOT staging"
But it failed hard for fuzzy recall:
Me: Find that discussion about the deployment issue we had last monthFTS5: No results (because nobody said "deployment", they said "push broke")
Me: Search for: API version changeFTS5: Found 50 results (but none about the actual API version discussion)The problem: I needed to know the exact keywords to find anything.
What I tried next: Vector search with Ollama
I added semantic search using Ollama embeddings. Now “deployment issue” could match “push broke on production” because the concepts are similar.
import requestsimport numpy as npimport sqlite3
def get_embedding(text: str, model: str = "nomic-embed-text"): response = requests.post( 'http://localhost:11434/api/embeddings', json={'model': model, 'prompt': text} ) return np.array(response.json()['embedding'])
def store_embeddings(db_path: str = 'messages.db'): conn = sqlite3.connect(db_path) conn.execute(''' CREATE TABLE IF NOT EXISTS embeddings ( id INTEGER PRIMARY KEY, content TEXT, embedding BLOB ) ''')
# Pre-compute embeddings for all messages cursor = conn.execute('SELECT id, content FROM messages') for row in cursor: embedding = get_embedding(row[1]) conn.execute( 'INSERT INTO embeddings (id, content, embedding) VALUES (?, ?, ?)', (row[0], row[1], embedding.tobytes()) ) conn.commit()
def vector_search(query: str, db_path: str = 'messages.db', top_k: int = 10): query_embedding = get_embedding(query) conn = sqlite3.connect(db_path)
cursor = conn.execute('SELECT id, content, embedding FROM embeddings') similarities = []
for row in cursor: stored_embedding = np.frombuffer(row[2], dtype=np.float32) similarity = np.dot(query_embedding, stored_embedding) / ( np.linalg.norm(query_embedding) * np.linalg.norm(stored_embedding) ) similarities.append((row[0], row[1], similarity))
return sorted(similarities, key=lambda x: x[2], reverse=True)[:top_k]
# Semantic search for fuzzy recallresults = vector_search("how we handled the deployment issue")# Response time: 100-500msVector search found what FTS5 couldn’t:
Me: Find that discussion about the deployment issueVector Search: Found 3 results: 1. "the push broke on production yesterday" (score: 0.89) 2. "we need to fix the release pipeline" (score: 0.82) 3. "staging is down after the update" (score: 0.78)But vector search had its own problems: latency.
The performance gap
I measured the response times:
Query Type | FTS5 | Vector Search------------------------|-----------|---------------Exact keyword match | 1-5ms | 100-500msSynonym match | 0 results | 100-500ms"Something about..." | 0 results | 150-600ms100k messages scanned | 2ms | 450msThe 100ms-500ms latency for vector search is noticeable. Users notice anything over 100ms as “laggy.”
Why the latency matters
On my M1 Mac with 16GB RAM:
Embedding generation (nomic-embed-text): 50-100ms per queryVector similarity calculation: 100-400ms for 100k messagesTotal: 150-500ms
vs.
FTS5 index lookup: 1-5msThe bottleneck is clear: vector search requires computing similarity against every stored embedding. FTS5 just looks up a pre-built index.
The hybrid solution
I realized I didn’t need to choose. I could use both:
import sqlite3import numpy as npfrom typing import List, Tuple
def hybrid_search(query: str, db_path: str = 'messages.db', top_k: int = 10): """ Combine FTS5 precision with vector search recall. Uses reciprocal rank fusion to merge results. """ # Fast keyword results from FTS5 fts5_results = fts5_search(query, db_path, top_k * 2)
# Semantic results for recall vector_results = vector_search(query, db_path, top_k * 2)
# Reciprocal rank fusion def get_rank_scores(results: List, k: int = 60) -> dict: return {r[0]: 1 / (k + i) for i, r in enumerate(results)}
fts5_scores = get_rank_scores(fts5_results) vector_scores = get_rank_scores(vector_results)
# Combine scores all_ids = set(fts5_scores.keys()) | set(vector_scores.keys()) combined = []
for msg_id in all_ids: score = fts5_scores.get(msg_id, 0) + vector_scores.get(msg_id, 0) # Get content from either result set content = next((r[1] for r in fts5_results + vector_results if r[0] == msg_id), '') combined.append((msg_id, content, score))
return sorted(combined, key=lambda x: x[2], reverse=True)[:top_k]The hybrid approach gives me both:
Query: "deployment issue"
FTS5 results (precision): - "deployment script failed" (exact match) - "deployment rollback needed" (exact match)
Vector results (recall): - "the push broke on production" (semantic match) - "release pipeline is stuck" (semantic match)
Hybrid merged results: 1. "deployment script failed" (score: 0.033) 2. "the push broke on production" (score: 0.028) 3. "deployment rollback needed" (score: 0.025) 4. "release pipeline is stuck" (score: 0.018)When to use each approach
Based on my testing, here’s the decision matrix:
| Query Type | Use FTS5 | Use Vector | Use Hybrid ||-----------------------------------|----------|------------|------------|| Exact phrase "error code 500" | Yes | No | No || Finding specific error code | Yes | No | No || "What did Alex say about..." | Yes | No | No || "Something about the migration" | No | Yes | Yes || "That bug with null pointer" | No | Yes | Yes || General search box | No | No | Yes || API docs lookup | Yes | No | No |Optimization tips I learned
- Pre-compute embeddings during sync, not at query time
# BAD: Compute at query timedef search_slow(query): embedding = get_embedding(query) # 50-100ms return vector_search_with_embedding(embedding)
# GOOD: Pre-compute during message syncdef sync_messages(messages): for msg in messages: embedding = get_embedding(msg.content) store_embedding(msg.id, embedding) # Now search is fast- Use smaller embedding models
Model comparison on M1 Mac:nomic-embed-text: 50-100ms per embeddingall-MiniLM-L6-v2: 30-60ms per embeddingtext-embedding-3-small (OpenAI): API latency varies- FTS5 as first pass, vectors as refinement
def two_pass_search(query: str, db_path: str = 'messages.db'): # Pass 1: FTS5 for candidates (fast) candidates = fts5_search(query, db_path, top_k=100)
# If FTS5 finds good results, skip vector search if len(candidates) >= 10: return candidates[:10]
# Pass 2: Vector search for recall (only if needed) return vector_search(query, db_path, top_k=10)This approach gives you FTS5 speed 90% of the time, with vector search as a fallback.
Related knowledge
Why FTS5 can’t handle synonyms
FTS5 uses inverted indices: it maps words to documents. When you search “deploy”, it looks up that exact word in the index. “Deployment” has a different index entry. “Push to production” has none of those words.
Vector embeddings, on the other hand, represent meaning in a high-dimensional space. “Deploy” and “push to production” have similar vectors because they appear in similar contexts in training data.
Reciprocal rank fusion explained
The fusion formula I use:
RRF_score(d) = sum(1 / (k + rank(d))) for each ranking listWhere k is a constant (typically 60) that smooths the rank differences. This combines multiple rankings without needing to normalize scores across different systems.
When to skip hybrid search
If your use case is:
- Documentation lookup (exact terms expected): Use FTS5 only
- Chat history search (fuzzy recall needed): Use hybrid
- Code search (exact identifiers needed): Use FTS5 only
- Research assistant (exploratory search): Use vector search
Summary
In this post, I compared FTS5 keyword search with Ollama vector search for local search systems. The key insight: you don’t need to choose one.
- FTS5 gives you millisecond response times and exact precision
- Vector search gives you semantic understanding and fuzzy recall
- Hybrid search gives you both, with reciprocal rank fusion
Start with FTS5 for exact keyword searches. Add vector search when users need to find things they can only vaguely remember. Use the two-pass approach for the best balance of speed and recall.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments