How to Build Agentic RAG with Multiple Search Types
I spent weeks building a RAG system that only used vector search. It worked fine for semantic queries like “find similar documents” but completely failed when users asked for exact code snippets, specific error messages, or time-sensitive information.
Then I discovered agentic RAG with multiple search types.
The Problem with Single-Search RAG
I had a standard RAG pipeline:
User Query → Embedding → Vector Search → Top K Results → LLM Response
This approach has critical blind spots:
- Exact matches: Vector search finds semantically similar content, not exact strings. When a user searches for a specific error like TypeError: 'NoneType' object is not iterable, vector search might return unrelated Python error discussions.
- Keyword-heavy queries: Queries like “config.yaml authentication settings” are better served by keyword search than semantic similarity.
- Changing information: Vector indices need re-indexing when content changes. For rapidly evolving codebases, this becomes a maintenance nightmare.
- Structural queries: Questions like “show me all functions that call process_payment” require code analysis, not semantic search.
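To make the exact-match blind spot concrete, here is a minimal sketch of BM25-style keyword scoring in plain Python. It is not the implementation from my system: the tokenizer, the `k1` and `b` defaults, and the three sample documents are assumptions chosen for illustration. It shows why an error string ranks the one document that actually contains its tokens first.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep identifier-like tokens so "NoneType" stays one term.
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with a basic BM25 formula."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical sample corpus.
docs = [
    "Fix for TypeError: 'NoneType' object is not iterable when config is missing",
    "General guide to Python exceptions and error handling",
    "ValueError raised while parsing dates",
]
scores = bm25_scores("TypeError: 'NoneType' object is not iterable", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, docs[best])
```

Only the first document shares any tokens with the query, so the other two score zero. A purely semantic ranker could easily place the general error-handling guide close to it.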
I needed multiple search types, but how could I intelligently select the right one for each query?
Enter Agentic RAG
Agentic RAG uses a single LLM agent that follows the ReAct (Reasoning + Acting) pattern to select and combine retrieval tools based on the query type.
The key insight from a Reddit discussion on “Vectorless RAG” resonated with me:
“Building multiple types of search and letting the LLM choose to call any combination of these based on the type of query”
Instead of hardcoding search logic, the agent becomes a semantic layer that understands query intent:
“Agents can reason about how to turn prompts into keyword searches and reformulate queries, they become your semantic layer”
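The idea of the agent as a semantic layer can be sketched as a small think-act-observe loop. This is only an illustration: `fake_llm_decide` is a hypothetical stand-in for the real LLM call, and the two tool functions are stubs rather than real backends.

```python
from typing import Callable


# Hypothetical stand-ins for real retrieval backends.
def keyword_search(query: str) -> list[str]:
    return [f"keyword hit for {query!r}"]


def vector_search(query: str) -> list[str]:
    return [f"semantic hit for {query!r}"]


TOOLS: dict[str, Callable[[str], list[str]]] = {
    "keyword_search": keyword_search,
    "vector_search": vector_search,
}


def fake_llm_decide(question: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM's Thought/Action step: returns (action, action_input)."""
    if observations:
        # Enough evidence gathered: finish with a synthesized answer.
        return "finish", "; ".join(observations)
    if "Error" in question:
        # Looks like an error message, so exact matching is the better bet.
        return "keyword_search", question
    return "vector_search", question


def react_loop(question: str, max_steps: int = 3) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        action, action_input = fake_llm_decide(question, observations)
        if action == "finish":
            return action_input
        # Act, then feed the observation back into the next reasoning step.
        observations.extend(TOOLS[action](action_input))
    return "; ".join(observations)


print(react_loop("TypeError: 'NoneType' object is not iterable"))
print(react_loop("How does authentication work?"))
```

In a real agent the decision function is the model itself, which is what lets it reformulate queries instead of following fixed rules.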
Architecture Overview
Here’s how I structured the system:
┌─────────────────────────────────────────────────────────────────┐
│                           User Query                            │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                        ReAct Agent (LLM)                        │
│                                                                 │
│  1. Analyze query intent                                        │
│  2. Select appropriate tool(s)                                  │
│  3. Execute tool calls (parallel when possible)                 │
│  4. Reason about results                                        │
│  5. Iterate or synthesize final answer                          │
└─────────────────────────────────────────────────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         ▼                       ▼                       ▼
 ┌───────────────┐       ┌───────────────┐       ┌───────────────┐
 │ Vector Search │       │ Keyword Search│       │  Git + Grep   │
 │  (Semantic)   │       │    (BM25)     │       │    (Code)     │
 └───────────────┘       └───────────────┘       └───────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 ▼
                     ┌───────────────────────┐
                     │   Combined Results    │
                     │  (Ranked & Deduped)   │
                     └───────────────────────┘

Defining the Tools
I implemented each search type as a separate tool using LangChain’s tool decorator:
from typing import Dict, List

from langchain.tools import tool


@tool
def vector_search(query: str, k: int = 5) -> List[Dict]:
    """Search documents using semantic similarity.

    Best for: conceptual questions, similar content discovery,
    natural language queries where exact wording doesn't matter.

    Args:
        query: Natural language search query
        k: Number of results to return

    Returns:
        List of documents with similarity scores
    """
    # Your vector search implementation
    # e.g., Pinecone, Weaviate, or local embeddings
    pass


@tool
def keyword_search(query: str, k: int = 5) -> List[Dict]:
    """Search documents using BM25 keyword matching.

    Best for: exact term matching, technical identifiers,
    error messages, specific function names.

    Args:
        query: Keywords to search for
        k: Number of results to return

    Returns:
        List of documents with BM25 scores
    """
    # Your BM25 implementation
    # e.g., Whoosh, Elasticsearch, or custom BM25
    pass


@tool
def hybrid_search(query: str, k: int = 5, alpha: float = 0.5) -> List[Dict]:
    """Combine vector and keyword search results.

    Best for: general-purpose queries where you want both
    semantic understanding and exact term matching.

    Args:
        query: Search query
        k: Number of results to return
        alpha: Weight for vector vs keyword (0.5 = equal weight)

    Returns:
        Reranked combined results
    """
    # Your hybrid search implementation
    pass


@tool
def code_search(pattern: str, file_pattern: str = "*.py") -> List[Dict]:
    """Search codebase using grep for exact pattern matches.

    Best for: finding specific code patterns, function calls,
    error messages in code, structural queries.

    Args:
        pattern: Regex pattern to search for
        file_pattern: Glob pattern to filter files

    Returns:
        List of matches with file, line number, and context
    """
    # Your grep/rg implementation
    pass


@tool
def git_search(query: str, max_commits: int = 20) -> List[Dict]:
    """Search git history for changes and commit messages.

    Best for: understanding code evolution, finding when
    features were added/changed, time-sensitive information.

    Args:
        query: Search query for commit messages
        max_commits: Maximum commits to search

    Returns:
        List of relevant commits with metadata
    """
    # Your git log search implementation
    pass

The Agent Prompt Template
The ReAct pattern requires a clear prompt that instructs the agent on how to reason and act:
REACT_PROMPT = """You are an intelligent retrieval agent with access to multiple search tools.

Your goal is to gather comprehensive information to answer user queries.

## Available Tools

1. **vector_search**: Semantic similarity search. Use for conceptual questions, finding similar content, natural language queries.

2. **keyword_search**: BM25 exact keyword matching. Use for technical terms, error messages, specific identifiers.

3. **hybrid_search**: Combination of vector + keyword. Use as default for general queries.

4. **code_search**: Grep-based pattern search. Use for finding specific code, function calls, structural patterns.

5. **git_search**: Search git history. Use for understanding code evolution, finding recent changes, time-sensitive information.

## Strategy

1. **Analyze** the query to understand intent
2. **Select** the most appropriate tool(s)
3. **Execute** tool calls (use parallel calls when possible)
4. **Evaluate** results - do you have enough information?
5. **Iterate** if needed, or **synthesize** the final answer

## Decision Heuristics

- Error messages → keyword_search first
- "How do I..." questions → hybrid_search
- "Where is..." code location → code_search
- "When was..." changes → git_search
- "What is similar to..." → vector_search
- Complex queries → combine multiple tools

Remember: You have a latency budget of 200-400ms total. Parallel tool calls
help stay within this budget.

Begin!

Question: {input}
Thought: {agent_scratchpad}"""

Tool Selection Strategies
Through experimentation, I identified four effective strategies:
1. Default to Hybrid
For the majority of queries (probably 70-80%), hybrid search provides the best balance:
# Default behavior - let agent decide, but hybrid is often best
DEFAULT_STRATEGY = {
    "general_queries": "hybrid_search",
    "reasoning": "Covers both semantic and exact matching",
}

The agent learns this preference from the heuristics in the prompt.
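One way the `alpha` weighting inside a hybrid search could work is weighted score fusion after min-max normalization. This is a sketch under assumptions, not the fusion my system necessarily uses: the `min_max` and `fuse` helpers and the sample scores are hypothetical, and other schemes (such as rank-based fusion) are equally valid.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so cosine and BM25 scores become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    alpha: float = 0.5,
    k: int = 5,
) -> list[tuple[str, float]]:
    """Blend both rankings: alpha weights the vector side, 1 - alpha the keyword side."""
    v, kw = min_max(vector_scores), min_max(keyword_scores)
    combined = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in v.keys() | kw.keys()
    }
    # Highest fused score first; ties broken by document id for stable output.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical raw scores from the two backends.
vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
keyword_scores = {"b": 12.0, "d": 3.0}
print(fuse(vector_scores, keyword_scores))             # equal weighting
print(fuse(vector_scores, keyword_scores, alpha=1.0))  # vector scores only
```

Documents that appear in both result lists tend to rise to the top, which is the behavior that makes hybrid a reasonable default.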
2. Intent-Based Routing
The agent analyzes query intent and routes accordingly:
Query: "TypeError: 'NoneType' object is not iterable"
Decision: keyword_search (exact error message)
Reason: Need exact string match, not semantic similarity

Query: "How do I implement authentication in FastAPI?"
Decision: hybrid_search (general question)
Reason: Want both conceptual explanation and code examples

Query: "Where is the process_payment function called?"
Decision: code_search (structural query)
Reason: Need to find specific function calls in codebase

Query: "When was the payment system redesigned?"
Decision: git_search (historical query)
Reason: Need git history, not current code

Query: "Find documentation similar to OAuth2 flow"
Decision: vector_search (similarity query)
Reason: Semantic similarity is the goal

3. Iterative Refinement
The agent can call multiple tools sequentially when initial results are insufficient:
# Example iteration flow
iterations = [
    # Iteration 1: Try hybrid search
    {
        "thought": "User asks about error handling. Start with hybrid search.",
        "action": "hybrid_search",
        "query": "error handling patterns",
        "result": "Found general documentation",
    },
    # Iteration 2: Need more specific code examples
    {
        "thought": "Results are too general. User might want code examples.",
        "action": "code_search",
        "query": "except.*Error",
        "result": "Found specific try-except blocks",
    },
    # Iteration 3: Synthesize answer
    {
        "thought": "Have both conceptual and code results. Can now answer.",
        "action": "synthesize_answer",
        "result": "Comprehensive answer with examples",
    },
]

4. Parallel Tool Calls
The biggest latency win comes from calling tools in parallel:
import asyncio


# Parallel execution reduces latency significantly
async def parallel_search(query: str):
    # Execute multiple searches simultaneously
    results = await asyncio.gather(
        vector_search.ainvoke(query),
        keyword_search.ainvoke(query),
        code_search.ainvoke(query),
    )

    # Merge and deduplicate results (merge_results is a helper not shown here)
    return merge_results(results)

Performance Optimization
My target was 200-400ms total latency. Here’s how I achieved it:
import asyncio
from time import perf_counter


async def optimized_retrieval(query: str):
    start = perf_counter()

    # Step 1: Analyze intent (LLM call, ~50ms)
    intent = await analyze_intent(query)

    # Step 2: Parallel tool calls (biggest win)
    if intent.needs_multiple_tools:
        # Execute all needed tools in parallel
        tasks = []
        if intent.use_vector:
            tasks.append(vector_search.ainvoke(query))
        if intent.use_keyword:
            tasks.append(keyword_search.ainvoke(query))
        if intent.use_code:
            tasks.append(code_search.ainvoke(query))

        results = await asyncio.gather(*tasks)
    else:
        # Single tool call
        results = [await get_tool(intent.primary_tool).ainvoke(query)]

    # Step 3: Merge results (~20ms)
    merged = merge_and_dedup(results)

    elapsed = perf_counter() - start
    print(f"Total retrieval time: {elapsed*1000:.0f}ms")

    return merged

Typical latency breakdown:
Intent Analysis (LLM):  40-80ms
Tool Selection:         5-10ms
Parallel Tool Calls:    100-200ms (depends on backend)
Result Merging:         15-30ms
────────────────────────────────────
Total:                  160-320ms ✓

Real-World Example: Changing Information
One powerful use case is handling rapidly changing information. Instead of constantly re-indexing vector databases, use git + grep:
# User asks: "What changed in the authentication module recently?"

async def handle_changing_info(query: str):
    # No vector search needed - use git history
    # (tools with several arguments take their input as a dict)
    recent_changes = await git_search.ainvoke(
        {"query": "authentication", "max_commits": 10}
    )

    # Then grep for current state
    current_state = await code_search.ainvoke(
        {"pattern": "class.*Auth", "file_pattern": "*.py"}
    )

    return {
        "recent_changes": recent_changes,
        "current_implementation": current_state,
    }

This approach works because:
- Git history is always current: no re-indexing needed
- Grep is fast: sub-50ms for most codebases
- No stale embeddings: direct source access
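The two tool bodies used in this example could be filled in roughly as follows. This is a sketch, not my production code: `code_search` here is a pure-Python stand-in for grep (so it runs without external binaries), the extra `root` and `repo` parameters are assumptions added for illustration, and `git_search` assumes a `git` executable on the PATH.

```python
import re
import subprocess
from pathlib import Path


def code_search(pattern: str, root: str = ".", file_pattern: str = "*.py") -> list[dict]:
    """Grep-like search: return every matching line with its file and line number."""
    regex = re.compile(pattern)
    matches = []
    for path in sorted(Path(root).rglob(file_pattern)):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        except OSError:
            continue  # unreadable file: skip it
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                matches.append({"file": str(path), "line": lineno, "text": line.strip()})
    return matches


def git_log_command(query: str, max_commits: int = 20) -> list[str]:
    """Build a git log call that filters commit messages case-insensitively."""
    return [
        "git", "log", "-i", f"--grep={query}", f"--max-count={max_commits}",
        "--pretty=format:%h%x09%ad%x09%s", "--date=short",
    ]


def git_search(query: str, repo: str = ".", max_commits: int = 20) -> list[dict]:
    """Run git log in `repo` and parse tab-separated hash, date, and subject."""
    output = subprocess.run(
        git_log_command(query, max_commits),
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    commits = []
    for line in output.splitlines():
        sha, date, subject = line.split("\t", 2)
        commits.append({"sha": sha, "date": date, "subject": subject})
    return commits
```

Because both functions read the working tree and the repository directly, there is no index to rebuild when the code changes.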
Lessons Learned
What Worked Well
- ReAct pattern for tool selection: The agent makes good decisions about which tools to use when given clear heuristics.
- Parallel execution: The latency gains from concurrent tool calls were essential to meeting the 400ms target.
- Hybrid as default: Starting with hybrid search covers most cases well.
- Specialized tools for specialized queries: Code search and git search handle queries that vector search simply cannot.
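The parallel execution point can be sketched with per-tool timeouts, so one slow backend cannot blow the whole latency budget. Everything here is illustrative: `fake_tool` simulates a backend with a delay, the result dictionaries with `id` and `score` keys are an assumed shape, and the merge step naively compares scores across tools, which in practice would need normalization first.

```python
import asyncio


async def fake_tool(delay: float, docs: list[dict]) -> list[dict]:
    """Hypothetical backend that answers after `delay` seconds."""
    await asyncio.sleep(delay)
    return docs


def merge_and_dedup(result_lists: list[list[dict]]) -> list[dict]:
    """Keep the best-scoring copy of each document id, highest score first."""
    best: dict[str, dict] = {}
    for results in result_lists:
        for doc in results:
            current = best.get(doc["id"])
            if current is None or doc["score"] > current["score"]:
                best[doc["id"]] = doc
    return sorted(best.values(), key=lambda d: d["score"], reverse=True)


async def parallel_search(timeout: float = 0.4) -> list[dict]:
    calls = [
        fake_tool(0.0, [{"id": "a", "score": 0.9}]),
        fake_tool(0.0, [{"id": "a", "score": 0.7}, {"id": "b", "score": 0.8}]),
        fake_tool(5.0, [{"id": "c", "score": 1.0}]),  # too slow for the budget
    ]
    # Each call gets its own deadline; failures are returned, not raised.
    outcomes = await asyncio.gather(
        *(asyncio.wait_for(call, timeout) for call in calls),
        return_exceptions=True,
    )
    successful = [o for o in outcomes if not isinstance(o, BaseException)]
    return merge_and_dedup(successful)


if __name__ == "__main__":
    print(asyncio.run(parallel_search()))
```

With `return_exceptions=True`, the timed-out call shows up as an exception object in the results, so the answer is built from whatever came back in time.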
What Surprised Me
- Simple prompts work well: I expected to need complex routing logic, but the LLM agent handles intent analysis naturally.
- Fewer iterations than expected: Most queries are answered in 1-2 tool calls. Complex multi-step reasoning is rare.
- User satisfaction increased significantly: Users noticed the improvement in answer quality, especially for technical queries.
What I Would Do Differently
- Add tool result caching: For repeated queries, caching results would help with latency.
- Implement fallback strategies: When tools fail or return empty results, have automatic fallbacks.
- Add more specialized tools: A documentation-specific search tool could help for certain query types.
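The first two items could look something like the sketch below. I have not built this yet, so treat it as one possible shape: `TTLCache` and `search_with_fallback` are hypothetical names, the 60-second default is arbitrary, and tools are modeled as plain callables that take a query string.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Tiny time-based cache keyed by (tool name, query)."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store: dict[tuple[str, str], tuple[float, list]] = {}

    def get(self, key: tuple[str, str]) -> Optional[list]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired entry
            return None
        return value

    def put(self, key: tuple[str, str], value: list) -> None:
        self._store[key] = (time.monotonic(), value)


def search_with_fallback(
    query: str,
    tools: list[tuple[str, Callable[[str], list]]],
    cache: TTLCache,
) -> tuple[str, list]:
    """Try tools in order; skip failures and empty results; cache the first hit."""
    for name, tool_fn in tools:
        cached = cache.get((name, query))
        if cached is not None:
            return name, cached
        try:
            results = tool_fn(query)
        except Exception:
            continue  # tool failed: fall through to the next one
        if results:
            cache.put((name, query), results)
            return name, results
    return "none", []
```

A chain such as keyword search, then vector search, then hybrid search would go in as the `tools` list, ordered by whichever tool the agent picked first.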
Conclusion
Agentic RAG with multiple search types transforms the retrieval problem from “build the perfect index” to “give the agent the right tools.” The ReAct pattern lets the LLM reason about query intent and select appropriate retrieval strategies.
The key architectural decisions:
- Single agent with multiple tools (simpler than router + specialized retrievers)
- Clear tool descriptions with use-case heuristics
- Parallel execution for latency optimization
- Hybrid search as the default strategy
In my experience, this approach gathers comprehensive information within 200-400ms for most queries, which makes it practical for real-time applications while significantly improving answer quality over single-search RAG systems.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to reach me, you can contact me by email.