Best Vector Search Approach for AI Agent Knowledge Bases

May 3, 2026

I built a research aggregation agent that collected articles from multiple sources. The agent stored everything in LanceDB with vector embeddings. When I queried for “Python async patterns”, it returned results about async programming - but also returned an article about async patterns in JavaScript from 2019, a Reddit thread about async UI issues, and a tutorial that had been superseded by a newer version.

The problem: pure vector search returns semantically similar but contextually irrelevant results. Same topic, wrong entity, outdated info, different programming language.

Why Pure Vector Search Fails for Agents

I asked on Reddit which vector search approach works best for agent knowledge bases. The response surprised me: vector search alone isn’t enough. You need:

Entity resolution - Same concept across sources should map to one canonical record
Source provenance - Every piece of knowledge needs to trace back to its origin
Deduplication - Content hash prevents storing the same article multiple times
Recency decay - Old information should score lower than fresh data
Confidence scoring - Not all retrieved knowledge is equally reliable

The expensive part isn’t the vector database - it’s the knowledge governance layer on top.

What I Tried First (And Why It Failed)

My initial implementation was straightforward:

from lancedb import connect

db = connect("./knowledge_db")
table = db.open_table("agent_memory")

def retrieve(query: str, top_k: int = 10):
    query_embedding = embed(query)  # embed() wraps whichever embedding model you use
    results = table.search(query_embedding).limit(top_k).to_pandas()
    return results

def add_knowledge(content: str, source: str):
    embedding = embed(content)
    table.add([
        {'embedding': embedding, 'content': content, 'source': source}
    ])

This worked for a demo. Then I ran it for a week and discovered:

The same article appeared 3 times (Reddit cross-post, blog mirror, newsletter archive)
A query for “React hooks” returned articles from 2018 - before the API stabilized
I couldn’t trace which source a piece of knowledge came from
Confidence scores were meaningless - everything scored 0.85+ because vectors cluster together

The Production Approach: Layered Retrieval

I rebuilt the retrieval system with multiple processing layers:

from lancedb import connect
import numpy as np
from datetime import datetime, timedelta

db = connect("./knowledge_db")
table = db.open_table("agent_memory")

def retrieve_with_context(query: str, top_k: int = 10):
    # Step 1: Vector search (base similarity)
    query_embedding = embed(query)
    similar = table.search(query_embedding).limit(top_k * 3).to_pandas()

    # Step 2: Entity resolution (canonical records)
    # Assumes resolve_entities() dedupes by content_hash and returns a list of
    # dicts with a higher-is-better 'score' (LanceDB itself returns _distance)
    resolved = resolve_entities(similar)

    # Step 3: Source provenance (traceability)
    for record in resolved:
        record['sources'] = get_provenance(record['id'])

    # Step 4: Recency decay (freshness weighting)
    now = datetime.now()
    for record in resolved:
        age_days = (now - record['timestamp']).days
        record['decay_score'] = record['score'] * np.exp(-age_days / 30)

    # Step 5: Confidence scoring (quality threshold)
    confident = [r for r in resolved if r['decay_score'] > 0.5]

    # Decay changes the ranking, so re-sort before truncating to top_k
    confident.sort(key=lambda r: r['decay_score'], reverse=True)
    return confident[:top_k]

def add_knowledge(content: str, source: str, metadata: dict):
    embedding = embed(content)
    content_hash = hash_content(content)

    # Check for duplicates before inserting
    # Iterate rows explicitly: looping over a DataFrame directly yields column names
    existing = table.search(embedding).limit(5).to_pandas()
    for _, record in existing.iterrows():
        if record['content_hash'] == content_hash:
            return record['id']  # Return existing canonical ID

    # Insert new canonical record
    # Assumption: the content hash doubles as the canonical ID
    table.add([
        {
            'id': content_hash,
            'embedding': embedding,
            'content': content,
            'content_hash': content_hash,
            'source': source,
            'timestamp': datetime.now(),
            'metadata': metadata
        }
    ])
    return content_hash

The key changes:

Over-fetch then filter: I retrieve top_k * 3 results, then filter down. This gives room for deduplication.
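Content hash: hash_content() creates a canonical identifier. The same content from different sources maps to the same hash, provided the text is normalized first (whitespace, casing) so that trivial differences between mirrors don't produce different hashes.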
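Recency decay: The np.exp(-age_days / 30) formula cuts the score to about 37% (1/e) every 30 days, which is a half-life of roughly 21 days. A 60-day-old article scores about 13.5% of a fresh one. Combined with the 0.5 threshold, even a perfect match (score 1.0) drops out after about three weeks.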
Return existing on duplicate: Instead of inserting duplicate content, I return the existing canonical ID. This prevents “the same article 3 times”.
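
The code above calls hash_content() without showing it. Here is a minimal sketch of what such a helper could look like, assuming SHA-256 over a normalized copy of the text. The normalization steps (NFKC, case folding, whitespace collapsing) are my assumptions for illustration. The right set depends on how much your mirrors differ from the original.

```python
import hashlib
import re
import unicodedata


def hash_content(content: str) -> str:
    """Return a SHA-256 hex digest of a normalized copy of the text."""
    # Unicode-normalize so visually identical characters compare equal
    text = unicodedata.normalize("NFKC", content)
    # Case-fold and collapse every whitespace run into a single space
    text = re.sub(r"\s+", " ", text.casefold()).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two copies that differ only in casing and whitespace share one hash
same = hash_content("Python  Async Patterns\n") == hash_content("python async patterns")
```

An exact hash like this only catches copies that are identical after normalization. A mirror that adds a banner or rewrites a paragraph will still hash differently.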

Multi-Tenant Knowledge Base with Postgres

For systems with multiple users or organizations, tenant isolation becomes critical. LanceDB works well for personal use, but Postgres with pgvector handles multi-tenant production:

import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost/agent_db")

def retrieve_tenant_knowledge(tenant_id: str, query_embedding: list):
    with conn.cursor() as cur:
        # Vector search with tenant isolation
        # psycopg2 uses %(name)s placeholders, not SQLAlchemy-style :name
        cur.execute("""
            SELECT id, content, source, timestamp,
                   1 - (embedding <=> %(query_vec)s::vector) AS similarity,
                   created_at
            FROM knowledge_base
            WHERE tenant_id = %(tenant_id)s
            AND is_canonical = true
            ORDER BY embedding <=> %(query_vec)s::vector
            LIMIT 20
        """, {
            'tenant_id': tenant_id,
            # pgvector accepts the text form '[0.1, 0.2, ...]'
            'query_vec': str(query_embedding)
        })

        results = cur.fetchall()

        # Apply recency decay
        scored = apply_decay_scoring(results)

        return scored[:10]

The <=> operator is pgvector’s cosine distance. The is_canonical flag ensures only deduplicated master records are retrieved.
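
apply_decay_scoring() is not shown above, so here is one way it could be sketched. This is an assumption-heavy illustration, not the exact helper: it expects each row as a dict with "similarity" and a timezone-aware "timestamp" (psycopg2's RealDictCursor gives you dict rows), and it uses a half-life form so the score halves exactly every half_life_days. The parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def apply_decay_scoring(rows, half_life_days=30.0, now=None):
    """Weight each row's similarity by its age and return rows best-first."""
    now = now or datetime.now(timezone.utc)
    scored = []
    for row in rows:
        # Clamp at zero so future-dated rows are not boosted above their similarity
        age_days = max((now - row["timestamp"]).total_seconds() / 86400, 0.0)
        # 1.0 when fresh, 0.5 after one half-life, 0.25 after two
        decay = 0.5 ** (age_days / half_life_days)
        scored.append({**row, "decay_score": row["similarity"] * decay})
    return sorted(scored, key=lambda r: r["decay_score"], reverse=True)


# Usage: a slightly weaker but fresh match outranks a strong 60-day-old one
now = datetime(2026, 5, 3, tzinfo=timezone.utc)
rows = [
    {"id": 1, "similarity": 0.9, "timestamp": now - timedelta(days=60)},
    {"id": 2, "similarity": 0.8, "timestamp": now},
]
ranked = apply_decay_scoring(rows, now=now)
```

With a 30-day half-life, the 60-day-old row keeps a quarter of its similarity (0.9 becomes 0.225), so the fresh row ranks first.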

Stack Recommendations by Scale

Scale              | Vector DB        | Why
-------------------|------------------|------------------------------------------
Personal/Small     | LanceDB          | Embedded, zero-config, works on VPS
Medium/Multi-tenant| Postgres+pgvector| Existing infra, tenant isolation, SQL
Large/High-volume  | Qdrant/Milvus    | Dedicated vector infra, advanced filtering
Knowledge graphs   | Kuzu             | Entity relationships, structured queries

I use LanceDB for my personal agent because it’s embedded - no separate server to manage. For a production system with multiple users, Postgres with pgvector is the pragmatic choice because:

Tenant isolation with WHERE tenant_id = X
Existing backup and monitoring infrastructure
SQL queries combine vector search with metadata filters
Row-level security for compliance requirements

The Missing Pieces in Most Frameworks

I tried Hermes Agent and similar frameworks. They handle prompts and workflows well, but skip:

Canonical entity storage - No mechanism to deduplicate knowledge across sources
Source provenance - Can’t trace which URL/article a fact came from
Multi-tenant memory - No tenant isolation by default
Confidence scoring - No quality thresholding on retrieved knowledge
Auditability - No audit trail for what the agent “learned”

These gaps become production failures. An agent that can’t distinguish fresh from stale knowledge, or can’t trace sources, produces unreliable output.

My Current Architecture

I ended up with a multi-layer approach:

Source Content
     |
     v
[Content Hash + Deduplication]
     |
     v
[Embedding Generation] --> LanceDB (vector index)
     |
     v
[Entity Resolution] --> Canonical records
     |
     v
[Provenance Tracking] --> Source mapping table
     |
     v
[Decay Scoring] --> Timestamp-weighted retrieval
     |
     v
Agent Context

The vector database is just the storage layer. The intelligence is in the governance pipeline above it.
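
To make the first stages of that pipeline concrete, here is a small in-memory sketch of deduplication plus provenance tracking. It is a stand-in under stated assumptions, not my production code: CanonicalRecord and KnowledgeRegistry are hypothetical names, the dict replaces a real table, and the hash is a plain SHA-256 of the stripped text.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CanonicalRecord:
    content_hash: str
    content: str
    first_seen: datetime
    sources: set[str] = field(default_factory=set)


class KnowledgeRegistry:
    """In-memory stand-in for the dedup and provenance stages."""

    def __init__(self) -> None:
        self._records: dict[str, CanonicalRecord] = {}

    def ingest(self, content: str, source_url: str) -> tuple[CanonicalRecord, bool]:
        """Return (record, is_new). Only new records need an embedding."""
        digest = hashlib.sha256(content.strip().encode("utf-8")).hexdigest()
        record = self._records.get(digest)
        is_new = record is None
        if record is None:
            record = CanonicalRecord(digest, content, datetime.now(timezone.utc))
            self._records[digest] = record
        # Every sighting is recorded, so one record can list many origins
        record.sources.add(source_url)
        return record, is_new


registry = KnowledgeRegistry()
first, first_is_new = registry.ingest(
    "Async patterns in Python", "https://example.com/post"
)
again, again_is_new = registry.ingest(
    "Async patterns in Python\n", "https://mirror.example.org/post"
)
```

The second ingest returns the same canonical record with both URLs attached, and is_new tells the caller to skip the embedding call for the mirror.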

What I Would Do Differently

Start with deduplication, not embeddings. I spent weeks tuning embedding models before realizing my biggest problem was duplicate content.
Add provenance from day one. Every piece of knowledge needs a source URL. Without this, debugging agent decisions is impossible.
Test with stale content. My test data was all fresh. Production data includes articles from 2019, outdated APIs, deprecated libraries. Decay scoring catches this.
Use content hash, not URL hash. The same article at different URLs should be one canonical record. URL hashing creates duplicates from mirrors.
Measure retrieval quality, not just latency. I optimized for fast queries. Then discovered 40% of retrieved content was irrelevant. Quality metrics matter more than speed metrics.
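
For the last point, a simple starting metric is precision@k: the share of the top k results that a human marked as relevant. The sketch below is a generic version. The function name and the hand-labeled relevant set are assumptions for illustration, not the exact metric I tracked.

```python
def precision_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved items that are labeled relevant."""
    top = list(retrieved_ids)[:k]
    if not top:
        return 0.0
    relevant = set(relevant_ids)
    return sum(1 for item in top if item in relevant) / len(top)


# Usage: 3 of the 5 returned documents were labeled relevant
retrieved = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "e", "z"}
score = precision_at_k(retrieved, relevant, k=5)
```

Averaging this over a fixed set of labeled queries gives one number to watch while changing decay or threshold settings.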

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
