How to Build Agentic RAG with Multiple Search Types

Mar 27, 2026

I spent weeks building a RAG system that only used vector search. It worked fine for semantic queries like “find similar documents” but completely failed when users asked for exact code snippets, specific error messages, or time-sensitive information.

Then I discovered agentic RAG with multiple search types.

The Problem with Single-Search RAG

I had a standard RAG pipeline:

User Query → Embedding → Vector Search → Top K Results → LLM Response

This approach has critical blind spots:

Exact matches: Vector search finds semantically similar content, not exact strings. When a user searches for a specific error like TypeError: 'NoneType' object is not iterable, vector search might return unrelated Python error discussions.
Keyword-heavy queries: Queries like “config.yaml authentication settings” are better served by keyword search than semantic similarity.
Changing information: Vector indices need re-indexing when content changes. For rapidly evolving codebases, this becomes a maintenance nightmare.
Structural queries: Questions like “show me all functions that call process_payment” require code analysis, not semantic search.
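
To make the last point concrete, here is a minimal sketch of a structural query built on Python's standard ast module. The find_callers helper and the SAMPLE source are illustrative assumptions, not part of the original system. It matches calls by name only, so it cannot distinguish two different functions that share a name.

```python
import ast
from typing import List


def find_callers(source: str, target: str) -> List[str]:
    """Return the names of functions whose bodies call `target`."""
    tree = ast.parse(source)
    callers = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for inner in ast.walk(node):
            if not isinstance(inner, ast.Call):
                continue
            func = inner.func
            # Plain call: process_payment(...); attribute call: obj.process_payment(...)
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name == target:
                callers.append(node.name)
                break
    return callers


# Hypothetical source text used only for this demonstration
SAMPLE = '''
def checkout(cart):
    return process_payment(cart.total)

def refund(order):
    return order.gateway.reverse(order.id)

class Billing:
    def charge(self, amount):
        return self.client.process_payment(amount)
'''

print(find_callers(SAMPLE, "process_payment"))
```

An embedding of the question has no way to enumerate call sites like this, which is why a structural tool earns its place next to vector search.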

I needed multiple search types, but how could I intelligently select the right one for each query?

Enter Agentic RAG

Agentic RAG uses a single LLM agent that follows the ReAct (Reasoning + Acting) pattern to select and combine retrieval tools based on the query type.

The key insight from a Reddit discussion on “Vectorless RAG” resonated with me:

“Building multiple types of search and letting the LLM choose to call any combination of these based on the type of query”

Instead of hardcoding search logic, the agent becomes a semantic layer that understands query intent:

“Agents can reason about how to turn prompts into keyword searches and reformulate queries, they become your semantic layer”

Architecture Overview

Here’s how I structured the system:

┌─────────────────────────────────────────────────────────────────┐
│                         User Query                               │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                    ReAct Agent (LLM)                             │
│                                                                  │
│  1. Analyze query intent                                         │
│  2. Select appropriate tool(s)                                   │
│  3. Execute tool calls (parallel when possible)                 │
│  4. Reason about results                                         │
│  5. Iterate or synthesize final answer                          │
└─────────────────────────────────────────────────────────────────┘
                                │
        ┌───────────────────────┼───────────────────────┐
        │                       │                       │
        ▼                       ▼                       ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│ Vector Search │    │ Keyword Search│    │  Git + Grep   │
│ (Semantic)    │    │  (BM25)       │    │  (Code)       │
└───────────────┘    └───────────────┘    └───────────────┘
        │                       │                       │
        └───────────────────────┼───────────────────────┘
                                ▼
                    ┌───────────────────────┐
                    │   Combined Results     │
                    │   (Ranked & Deduped)   │
                    └───────────────────────┘

Defining the Tools

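I implemented each search type as a separate tool using LangChain’s @tool decorator. The bodies are left as stubs here. The docstrings matter most, because the decorator turns them into the tool descriptions the agent reads when deciding what to call: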

from langchain.tools import tool
from typing import List, Dict

@tool
def vector_search(query: str, k: int = 5) -> List[Dict]:
    """Search documents using semantic similarity.

    Best for: conceptual questions, similar content discovery,
    natural language queries where exact wording doesn't matter.

    Args:
        query: Natural language search query
        k: Number of results to return

    Returns:
        List of documents with similarity scores
    """
    # Your vector search implementation
    # e.g., Pinecone, Weaviate, or local embeddings
    pass

@tool
def keyword_search(query: str, k: int = 5) -> List[Dict]:
    """Search documents using BM25 keyword matching.

    Best for: exact term matching, technical identifiers,
    error messages, specific function names.

    Args:
        query: Keywords to search for
        k: Number of results to return

    Returns:
        List of documents with BM25 scores
    """
    # Your BM25 implementation
    # e.g., Whoosh, Elasticsearch, or custom BM25
    pass

@tool
def hybrid_search(query: str, k: int = 5, alpha: float = 0.5) -> List[Dict]:
    """Combine vector and keyword search results.

    Best for: general-purpose queries where you want both
    semantic understanding and exact term matching.

    Args:
        query: Search query
        k: Number of results to return
        alpha: Weight for vector vs keyword (0.5 = equal weight)

    Returns:
        Reranked combined results
    """
    # Your hybrid search implementation
    pass

@tool
def code_search(pattern: str, file_pattern: str = "*.py") -> List[Dict]:
    """Search codebase using grep for exact pattern matches.

    Best for: finding specific code patterns, function calls,
    error messages in code, structural queries.

    Args:
        pattern: Regex pattern to search for
        file_pattern: Glob pattern to filter files

    Returns:
        List of matches with file, line number, and context
    """
    # Your grep/rg implementation
    pass

@tool
def git_search(query: str, max_commits: int = 20) -> List[Dict]:
    """Search git history for changes and commit messages.

    Best for: understanding code evolution, finding when
    features were added/changed, time-sensitive information.

    Args:
        query: Search query for commit messages
        max_commits: Maximum commits to search

    Returns:
        List of relevant commits with metadata
    """
    # Your git log search implementation
    pass
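
The tool bodies above are stubs. As one illustration of what could sit behind keyword_search, here is a small pure-Python BM25 scorer over an in-memory list of strings. The names bm25_search and tokenize, the k1 and b defaults, and the sample documents are assumptions for this sketch. A production system would more likely use Elasticsearch, Whoosh, or a dedicated BM25 library.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep identifier-like tokens (letters, digits, underscore)
    return re.findall(r"[A-Za-z0-9_]+", text.lower())


def bm25_search(query: str, docs: List[str], k: int = 5,
                k1: float = 1.5, b: float = 0.75) -> List[Dict]:
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / max(n_docs, 1)
    # Document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))

    results = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            results.append({"doc_id": idx, "text": docs[idx], "score": score})
    return sorted(results, key=lambda r: r["score"], reverse=True)[:k]


# Hypothetical corpus for the demonstration
DOCS = [
    "TypeError: 'NoneType' object is not iterable when looping over results",
    "General guide to Python error handling",
    "How to configure authentication in config.yaml",
]
hits = bm25_search("NoneType iterable", DOCS)
print(hits[0]["doc_id"], round(hits[0]["score"], 3))
```

Only documents that share at least one token with the query are returned, which is the exact-term behavior the keyword_search docstring promises.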

The Agent Prompt Template

The ReAct pattern requires a clear prompt that instructs the agent on how to reason and act:

REACT_PROMPT = """You are an intelligent retrieval agent with access to multiple search tools.

Your goal is to gather comprehensive information to answer user queries.

## Available Tools

1. **vector_search**: Semantic similarity search. Use for conceptual questions,
   finding similar content, natural language queries.

2. **keyword_search**: BM25 exact keyword matching. Use for technical terms,
   error messages, specific identifiers.

3. **hybrid_search**: Combination of vector + keyword. Use as default for
   general queries.

4. **code_search**: Grep-based pattern search. Use for finding specific code,
   function calls, structural patterns.

5. **git_search**: Search git history. Use for understanding code evolution,
   finding recent changes, time-sensitive information.

## Strategy

1. **Analyze** the query to understand intent
2. **Select** the most appropriate tool(s)
3. **Execute** tool calls (use parallel calls when possible)
4. **Evaluate** results - do you have enough information?
5. **Iterate** if needed, or **synthesize** the final answer

## Decision Heuristics

- Error messages → keyword_search first
- "How do I..." questions → hybrid_search
- "Where is..." code location → code_search
- "When was..." changes → git_search
- "What is similar to..." → vector_search
- Complex queries → combine multiple tools

Remember: You have a latency budget of 200-400ms total. Parallel tool calls
help stay within this budget.

Begin!

Question: {input}
Thought: {agent_scratchpad}
"""

Tool Selection Strategies

Through experimentation, I identified four effective strategies:

1. Default to Hybrid

For most queries (roughly 70-80% by my estimate), hybrid search provides the best balance:

# Default behavior - let agent decide, but hybrid is often best
DEFAULT_STRATEGY = {
    "general_queries": "hybrid_search",
    "reasoning": "Covers both semantic and exact matching"
}

The agent picks up this preference from the prompt, which names hybrid_search as the default for general queries.
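
How the two score lists get combined is left open by the hybrid_search stub. One common option is a weighted sum of min-max normalized scores, with alpha used the way the docstring describes it. The sketch below assumes each backend returns a mapping from document id to raw score. The function names and sample scores are hypothetical.

```python
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores into [0, 1] so the two backends are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(vector_scores: Dict[str, float], keyword_scores: Dict[str, float],
         alpha: float = 0.5, k: int = 5) -> List[Tuple[str, float]]:
    """alpha weights the vector side, (1 - alpha) the keyword side."""
    v, kw = normalize(vector_scores), normalize(keyword_scores)
    combined = {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(v) | set(kw)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical raw scores: cosine similarity vs. BM25
vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
keyword_scores = {"b": 12.0, "c": 3.0, "d": 6.0}
print(fuse(vector_scores, keyword_scores, alpha=0.5))
```

Document b ranks first here because it scores reasonably on both sides, which is the behavior that makes hybrid a sensible default.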

2. Intent-Based Routing

The agent analyzes query intent and routes accordingly:

Query: "TypeError: 'NoneType' object is not iterable"
Decision: keyword_search (exact error message)
Reason: Need exact string match, not semantic similarity

Query: "How do I implement authentication in FastAPI?"
Decision: hybrid_search (general question)
Reason: Want both conceptual explanation and code examples

Query: "Where is the process_payment function called?"
Decision: code_search (structural query)
Reason: Need to find specific function calls in codebase

Query: "When was the payment system redesigned?"
Decision: git_search (historical query)
Reason: Need git history, not current code

Query: "Find documentation similar to OAuth2 flow"
Decision: vector_search (similarity query)
Reason: Semantic similarity is the goal
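
A rule-based router that mirrors the decision heuristics from the prompt can serve as a baseline for checking the agent's choices. This is a rough sketch, not how the agent itself decides. The regular expressions and the route function are assumptions chosen to reproduce the five examples above.

```python
import re

# (pattern, tool) pairs checked in order; the first match wins
ROUTES = [
    (re.compile(r"\b\w*(Error|Exception)\b|Traceback"), "keyword_search"),
    (re.compile(r"^\s*where\b", re.IGNORECASE), "code_search"),
    (re.compile(r"^\s*when\b", re.IGNORECASE), "git_search"),
    (re.compile(r"\bsimilar to\b", re.IGNORECASE), "vector_search"),
]


def route(query: str) -> str:
    for pattern, tool in ROUTES:
        if pattern.search(query):
            return tool
    # Anything without a clear signal falls back to the default
    return "hybrid_search"


EXAMPLES = [
    "TypeError: 'NoneType' object is not iterable",
    "How do I implement authentication in FastAPI?",
    "Where is the process_payment function called?",
    "When was the payment system redesigned?",
    "Find documentation similar to OAuth2 flow",
]
for q in EXAMPLES:
    print(f"{route(q):15} <- {q}")
```

Comparing the agent's tool choice against a baseline like this on a sample of logged queries is one cheap way to spot routing regressions after a prompt change.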

3. Iterative Refinement

The agent can call multiple tools sequentially when initial results are insufficient:

# Example iteration flow
iterations = [
    # Iteration 1: Try hybrid search
    {
        "thought": "User asks about error handling. Start with hybrid search.",
        "action": "hybrid_search",
        "query": "error handling patterns",
        "result": "Found general documentation"
    },
    # Iteration 2: Need more specific code examples
    {
        "thought": "Results are too general. User might want code examples.",
        "action": "code_search",
        "query": "except.*Error",
        "result": "Found specific try-except blocks"
    },
    # Iteration 3: Synthesize answer
    {
        "thought": "Have both conceptual and code results. Can now answer.",
        "action": "synthesize_answer",
        "result": "Comprehensive answer with examples"
    }
]
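
A trace like this can be produced by a very small control loop. In the sketch below, scripted_policy is a hard-coded stand-in for the LLM and the tools are stubs returning canned strings, all hypothetical. Only the loop structure should be taken from it: act, observe, and stop when the policy decides to synthesize or the iteration cap is hit.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], List[str]]
Step = Tuple[str, str, List[str]]  # (action, argument, observation)


def run_agent(question: str,
              policy: Callable[[str, List[Step]], Tuple[str, str]],
              tools: Dict[str, Tool],
              max_iterations: int = 3) -> List[Step]:
    scratchpad: List[Step] = []
    for _ in range(max_iterations):
        action, arg = policy(question, scratchpad)
        if action == "synthesize_answer":
            break  # the policy judged the gathered evidence sufficient
        observation = tools[action](arg)
        scratchpad.append((action, arg, observation))
    return scratchpad


def scripted_policy(question: str, scratchpad: List[Step]) -> Tuple[str, str]:
    # Stand-in for the LLM: replays the three iterations shown above
    if not scratchpad:
        return "hybrid_search", "error handling patterns"
    if len(scratchpad) == 1:
        return "code_search", r"except.*Error"
    return "synthesize_answer", ""


# Stub tools with canned results, for illustration only
stub_tools: Dict[str, Tool] = {
    "hybrid_search": lambda q: ["general error handling documentation"],
    "code_search": lambda p: ["example.py:42: except ValueError as exc:"],
}

trace = run_agent("How should I handle errors?", scripted_policy, stub_tools)
for action, arg, observation in trace:
    print(action, repr(arg), "->", observation)
```

The max_iterations cap matters in practice: without it, a policy that never decides to synthesize would keep calling tools indefinitely.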

4. Parallel Tool Calls

The biggest latency win comes from calling tools in parallel:

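import asyncio

# Parallel execution reduces latency significantly
async def parallel_search(query: str):
    # Execute multiple searches simultaneously; each tool receives its
    # arguments as a dict keyed by parameter name
    results = await asyncio.gather(
        vector_search.ainvoke({"query": query}),
        keyword_search.ainvoke({"query": query}),
        # code_search expects a regex pattern rather than a free-text query
        code_search.ainvoke({"pattern": query}),
    )

    # Merge and deduplicate results (merge_results is an application-specific helper)
    return merge_results(results)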
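
The article does not define merge_results. One possible implementation is reciprocal rank fusion (RRF), which needs only each document's rank within every list and so sidesteps the problem that cosine similarities and BM25 scores live on different scales. The sketch assumes every result is a dict with an "id" key. That key, the constant k = 60, and the sample data are assumptions.

```python
from typing import Dict, List


def merge_results(result_lists: List[List[Dict]], k: int = 60,
                  top_n: int = 10) -> List[Dict]:
    """Fuse ranked lists with reciprocal rank fusion and drop duplicates."""
    fused: Dict[str, float] = {}
    first_seen: Dict[str, Dict] = {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            doc_id = doc["id"]
            # A document found by several tools accumulates score from each
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
            first_seen.setdefault(doc_id, doc)
    ranked_ids = sorted(fused, key=lambda d: fused[d], reverse=True)
    return [{**first_seen[d], "rrf_score": fused[d]} for d in ranked_ids[:top_n]]


# Hypothetical outputs from two tools, already ranked best-first
vector_hits = [{"id": "a", "source": "vector"}, {"id": "b", "source": "vector"}]
keyword_hits = [{"id": "b", "source": "keyword"}, {"id": "c", "source": "keyword"}]

merged = merge_results([vector_hits, keyword_hits])
print([doc["id"] for doc in merged])
```

Document b appears in both lists and therefore rises to the top, while each id is emitted only once.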

Performance Optimization

My target was 200-400ms total latency. Here’s how I achieved it:

import asyncio
from time import perf_counter

async def optimized_retrieval(query: str):
    start = perf_counter()

    # Step 1: Analyze intent (LLM call, ~50ms)
    intent = await analyze_intent(query)

    # Step 2: Parallel tool calls (biggest win)
    # analyze_intent, get_tool, and merge_and_dedup are application-specific helpers
    if intent.needs_multiple_tools:
        # Build one coroutine per tool the intent asks for; tool arguments
        # are passed as a dict keyed by parameter name
        tasks = []
        if intent.use_vector:
            tasks.append(vector_search.ainvoke({"query": query}))
        if intent.use_keyword:
            tasks.append(keyword_search.ainvoke({"query": query}))
        if intent.use_code:
            tasks.append(code_search.ainvoke({"pattern": query}))

        results = await asyncio.gather(*tasks)
    else:
        # Single tool call
        results = [await get_tool(intent.primary_tool).ainvoke(query)]

    # Step 3: Merge results (~20ms)
    merged = merge_and_dedup(results)

    elapsed = perf_counter() - start
    print(f"Total retrieval time: {elapsed*1000:.0f}ms")

    return merged

Typical latency breakdown:

Intent Analysis (LLM):     40-80ms
Tool Selection:            5-10ms
Parallel Tool Calls:       100-200ms (depends on backend)
Result Merging:            15-30ms
────────────────────────────────────
Total:                     160-320ms ✓
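
The reason the parallel step costs roughly as much as its slowest tool, rather than the sum of all tools, can be checked with simulated backends. In this sketch asyncio.sleep stands in for network latency. The delays are made-up values, not measurements from the system described here.

```python
import asyncio
from time import perf_counter

# Hypothetical per-tool latencies in seconds
DELAYS = {"vector_search": 0.05, "keyword_search": 0.03, "code_search": 0.04}


async def fake_tool(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # simulated I/O wait
    return name


async def run_sequential() -> float:
    start = perf_counter()
    for name, delay in DELAYS.items():
        await fake_tool(name, delay)
    return perf_counter() - start


async def run_parallel() -> float:
    start = perf_counter()
    await asyncio.gather(*(fake_tool(n, d) for n, d in DELAYS.items()))
    return perf_counter() - start


sequential = asyncio.run(run_sequential())
parallel = asyncio.run(run_parallel())
print(f"sequential: {sequential * 1000:.0f}ms, parallel: {parallel * 1000:.0f}ms")
```

This only helps when the tools are I/O-bound and truly asynchronous. A tool that blocks the event loop, such as a synchronous subprocess call, would need to be moved to a thread to get the same effect.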

Real-World Example: Changing Information

One powerful use case is handling rapidly changing information. Instead of constantly re-indexing vector databases, use git + grep:

# User asks: "What changed in the authentication module recently?"

async def handle_changing_info(query: str):
    # No vector search needed - use git history
    # ainvoke takes the tool arguments as a single dict, not as keyword arguments
    recent_changes = await git_search.ainvoke(
        {"query": "authentication", "max_commits": 10}
    )

    # Then grep for current state
    current_state = await code_search.ainvoke(
        {"pattern": "class.*Auth", "file_pattern": "*.py"}
    )

    return {
        "recent_changes": recent_changes,
        "current_implementation": current_state
    }

This approach works because:

Git history is always current - No re-indexing needed
Grep is fast - Sub-50ms for most codebases
No stale embeddings - Direct source access
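
For reference, here is one way the body of git_search could be written on top of the git command line. It is a sketch under assumptions: the function names, the field layout, and the sample log line are hypothetical, and git must be installed for search_commits to run. Note that --max-count caps the number of matching commits returned, not the number of commits scanned.

```python
import subprocess
from typing import Dict, List

SEP = "\x1f"  # unit separator, unlikely to occur in commit subjects
# %x1f makes git emit the separator byte between fields
LOG_FORMAT = "%H%x1f%an%x1f%ad%x1f%s"


def parse_git_log(output: str) -> List[Dict]:
    """Turn `git log` output in LOG_FORMAT into a list of dicts."""
    commits = []
    for line in output.split("\n"):
        parts = line.split(SEP)
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        sha, author, date, subject = parts
        commits.append({"sha": sha, "author": author, "date": date, "subject": subject})
    return commits


def search_commits(query: str, max_commits: int = 20, repo: str = ".") -> List[Dict]:
    """Case-insensitive search of commit messages in the repository at `repo`."""
    cmd = [
        "git", "-C", repo, "log",
        f"--max-count={max_commits}",
        "-i", f"--grep={query}",
        "--date=iso-strict",
        f"--pretty=format:{LOG_FORMAT}",
    ]
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_git_log(completed.stdout)


# Parsing demonstrated on a made-up log line, so no repository is needed
sample = SEP.join(["abc123", "Ada", "2026-03-01T10:00:00+00:00",
                   "Refactor authentication module"]) + "\nmalformed line"
print(parse_git_log(sample))
```

Because every call shells out to the live repository, results reflect the latest commit without any indexing step, at the cost of one process spawn per query.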

Lessons Learned

What Worked Well

ReAct pattern for tool selection: The agent makes good decisions about which tools to use when given clear heuristics.
Parallel execution: The latency gains from concurrent tool calls were essential to meeting the 400ms target.
Hybrid as default: Starting with hybrid search covers most cases well.
Specialized tools for specialized queries: Code search and git search handle queries that vector search simply cannot.

What Surprised Me

Simple prompts work well: I expected to need complex routing logic, but the LLM agent handles intent analysis naturally.
Fewer iterations than expected: Most queries are answered in 1-2 tool calls. Complex multi-step reasoning is rare.
User satisfaction increased significantly: Users noticed the improvement in answer quality, especially for technical queries.

What I Would Do Differently

Add tool result caching: For repeated queries, caching results would help with latency.
Implement fallback strategies: When tools fail or return empty results, have automatic fallbacks.
Add more specialized tools: A documentation-specific search tool could help for certain query types.
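
The first two items could look roughly like this. Both helpers are hypothetical sketches, not code from the system described above: cached wraps a search function with a small time-based in-memory cache, and search_with_fallback walks an ordered list of tools until one returns a non-empty result.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

SearchFn = Callable[[str], List[Dict]]


def cached(search: SearchFn, ttl_seconds: float = 60.0) -> SearchFn:
    """Cache results per query string for ttl_seconds."""
    store: Dict[str, Tuple[float, List[Dict]]] = {}

    def wrapper(query: str) -> List[Dict]:
        now = time.monotonic()
        hit = store.get(query)
        if hit is not None and now - hit[0] < ttl_seconds:
            return hit[1]
        results = search(query)
        store[query] = (now, results)
        return results

    return wrapper


def search_with_fallback(query: str,
                         chain: Sequence[Tuple[str, SearchFn]]) -> Tuple[str, List[Dict]]:
    """Try each (name, search) pair in order; return the first non-empty result."""
    for name, search in chain:
        try:
            results = search(query)
        except Exception:
            # Deliberately broad for the sketch: any backend failure moves on
            continue
        if results:
            return name, results
    return "none", []


# Stub backends for the demonstration
def failing(query: str) -> List[Dict]:
    raise RuntimeError("backend down")


def empty(query: str) -> List[Dict]:
    return []


def working(query: str) -> List[Dict]:
    return [{"id": "doc-1", "query": query}]


chain = [("keyword_search", failing), ("vector_search", empty), ("hybrid_search", working)]
print(search_with_fallback("auth settings", chain))
```

In a real deployment the except clause would log the failure and catch narrower exception types, and the cache would need invalidation when the underlying content changes.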

Conclusion

Agentic RAG with multiple search types transforms the retrieval problem from “build the perfect index” to “give the agent the right tools.” The ReAct pattern lets the LLM reason about query intent and select appropriate retrieval strategies.

The key architectural decisions:

Single agent with multiple tools (simpler than router + specialized retrievers)
Clear tool descriptions with use-case heuristics
Parallel execution for latency optimization
Hybrid search as the default strategy

For most use cases, this approach achieves comprehensive information gathering within 200-400ms, making it practical for real-time applications while significantly improving answer quality over single-search RAG systems.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.
