How to Build a LangGraph Research Agent That Saves 5-10 Hours Per Week
I was spending 10-15 hours every week on deep-dive research for technical blog posts. Search arXiv, read papers, find blog posts, synthesize everything, fact-check, format citations… rinse and repeat. Then I discovered that a well-designed LangGraph agent with a reflection node could cut that time in half.
Here’s how I built it—and why the reflection pattern is the critical piece most people skip.
The Problem: Manual Research is a Time Sink
Last month I was researching “state management patterns in AI agent workflows.” I spent three hours:
- Searching arXiv for relevant papers
- Finding technical blog posts that weren’t behind paywalls
- Reading and taking notes
- Synthesizing a briefing document
- Fact-checking my own claims (and finding two hallucinations)
The worst part? After all that work, I still missed a key paper that would have changed my conclusions.
I needed automation. But not just “search and summarize”—I needed something that could catch its own mistakes.
First Attempt: Simple Chain (Failed)
I started with a basic LangChain chain:
```python
# This is what NOT to do
def simple_research(query):
    results = search_tool(query)
    summary = llm.invoke(f"Summarize: {results}")
    return summary
```
The results were disappointing:
- Summaries were superficial
- No citation tracking
- Hallucinations slipped through constantly
- No way to iterate or improve
I realized I needed a graph with state and loops, not a linear chain.
The Solution: LangGraph with Reflection
The breakthrough came from a Reddit thread where someone mentioned a “reflection node”—a secondary agent that critiques the first draft before output.
Here’s the architecture that actually works:
```text
┌─────────────────────────────────────────────────────────────┐
│                  LangGraph Research Agent                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│      [Query Input]                                          │
│            │                                                │
│            ▼                                                │
│   ┌─────────────────┐                                       │
│   │ Topic Decomposer│ → Break into sub-questions            │
│   └────────┬────────┘                                       │
│            │                                                │
│            ▼                                                │
│   ┌─────────────────┐                                       │
│   │ Multi-Source    │ → arXiv + blogs + dedupe              │
│   │ Search          │                                       │
│   └────────┬────────┘                                       │
│            │                                                │
│            ▼                                                │
│   ┌─────────────────┐                                       │
│   │ Synthesis       │ → Markdown + citations                │
│   └────────┬────────┘                                       │
│            │                                                │
│            ▼                                                │
│   ┌─────────────────┐      ┌─────────────────┐              │
│   │ Reflection      │ ───► │ Final Output    │              │
│   │ Node            │      └─────────────────┘              │
│   └────────┬────────┘                                       │
│            │ (if issues found)                              │
│            │                                                │
│            └──────────► [back to Synthesis]                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
The reflection node is what separates useful agents from frustrating ones. Let me show you the implementation.
Implementation: The Core Graph
First, define the state that flows through the graph:
```python
from typing import TypedDict, List

from langgraph.graph import StateGraph, END


class ResearchState(TypedDict):
    query: str
    sub_questions: List[str]
    search_results: List[dict]
    draft: str
    critique: str
    final_output: str
    iterations: int
```
Node 1: Topic Decomposition
The first node breaks down a broad query into searchable pieces:
```python
from langchain_core.messages import HumanMessage

# Assumes `llm` (a LangChain chat model) and `parse_json_list` (a helper that
# turns the model's reply into a list of strings) are defined elsewhere.


def decompose_topic(state: ResearchState) -> ResearchState:
    prompt = f"""Break down this research query into 3-5 specific sub-questions.

Query: {state['query']}

Return ONLY a JSON array of strings, like:
["sub-question 1", "sub-question 2", "sub-question 3"]"""

    response = llm.invoke([HumanMessage(content=prompt)])
    state['sub_questions'] = parse_json_list(response.content)
    return state
```
Why decompose? A single query like “state management in AI agents” is too broad. Breaking it into:
- “What are common state management patterns in LLM applications?”
- “How does LangGraph handle state persistence?”
- “What are the trade-offs between different state backends?”
…gives you better search results.
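The decomposition node depends on a parse_json_list helper that isn't shown. Here is a minimal sketch of what it might look like, assuming the model sometimes wraps its JSON array in extra prose or a Markdown fence. The implementation is an assumption for illustration, not the exact helper from my agent:

```python
import json
import re
from typing import List


def parse_json_list(text: str) -> List[str]:
    """Extract a JSON array of strings from an LLM reply (illustrative helper)."""
    # Models often surround the JSON with prose or a code fence, so grab the
    # outermost [...] span instead of parsing the whole reply.
    match = re.search(r"\[.*\]", text, re.DOTALL)
    if match is None:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    # Keep only non-empty strings, trimmed of surrounding whitespace.
    return [item.strip() for item in items if isinstance(item, str) and item.strip()]
```

Returning an empty list on failure keeps the graph from crashing, but it also means the search node has nothing to search. Falling back to the original query in that case is probably a sensible default.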
Node 2: Multi-Source Search
This node searches multiple sources and deduplicates:
```python
import time

# Assumes `arxiv_tool`, `blog_search_tool`, and `deduplicate_results`
# are defined elsewhere.


def search_sources(state: ResearchState) -> ResearchState:
    all_results = []

    for question in state['sub_questions']:
        # Search arXiv for academic papers
        arxiv_hits = arxiv_tool.search(question, max_results=5)
        all_results.extend(arxiv_hits)

        # Search technical blogs
        blog_hits = blog_search_tool.search(question, max_results=5)
        all_results.extend(blog_hits)

        # Rate limiting to avoid API blocks
        time.sleep(1)

    # Deduplicate by URL/title similarity
    state['search_results'] = deduplicate_results(all_results)
    return state
```
Mistake I made: I didn’t add rate limiting initially. arXiv blocked my IP within 10 minutes. Always add time.sleep() between API calls.
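The "deduplicate by URL/title similarity" step can be sketched with the standard library alone. This version assumes each search result is a dict with "url" and "title" keys. The 0.9 similarity threshold is an arbitrary starting point, not a tuned value:

```python
from difflib import SequenceMatcher
from typing import Dict, List


def deduplicate_results(results: List[Dict], title_threshold: float = 0.9) -> List[Dict]:
    """Drop results whose URL or near-identical title has already been seen."""
    seen_urls = set()
    kept: List[Dict] = []

    for result in results:
        # Strip trailing slashes so the same page is not counted twice.
        url = result.get("url", "").strip().rstrip("/")
        title = result.get("title", "").strip().lower()

        if url and url in seen_urls:
            continue

        # Compare against every kept title; O(n^2), fine for a few dozen hits.
        is_near_duplicate = bool(title) and any(
            SequenceMatcher(None, title, other.get("title", "").strip().lower()).ratio()
            >= title_threshold
            for other in kept
        )
        if is_near_duplicate:
            continue

        if url:
            seen_urls.add(url)
        kept.append(result)

    return kept
```

The title comparison matters because the same paper often shows up under both an arXiv URL and a blog or mirror URL, which a URL-only check would miss.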
Node 3: Synthesis
Now combine the results into a coherent briefing:
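```python
def synthesize(state: ResearchState) -> ResearchState:
    context = format_search_results(state['search_results'])

    prompt = f"""Create a research briefing based on these sources.

Original Query: {state['query']}

Sources:
{context}

Requirements:
- Structure with clear headings
- Include inline citations like [1], [2]
- List all sources at the end
- Be factual, avoid speculation"""

    # On a revision pass, include the previous draft and the reviewer's critique;
    # otherwise the loop would regenerate a draft without seeing the feedback.
    if state.get('critique'):
        prompt += f"""

Previous draft:
{state['draft']}

Reviewer feedback to address:
{state['critique']}"""

    state['draft'] = llm.invoke([HumanMessage(content=prompt)]).content
    return state
```
Node 4: Reflection (The Critical Piece)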
This is where the magic happens. A separate LLM call reviews the draft:
```python
REFLECTION_PROMPT = """You are a research quality reviewer. Analyze this draft:

{draft}

Original Query: {query}

Evaluate on these criteria:
1. **Coverage**: Are all aspects of the original query addressed?
2. **Accuracy**: Are claims supported by citations?
3. **Hallucination Check**: Identify any statements not backed by sources.
4. **Gaps**: What important information is missing?

Respond in this EXACT format:
ISSUES:
- [list each problem found, or write "None" if none]

VERDICT: [NEEDS_REVISION or APPROVED]"""


def reflect(state: ResearchState) -> ResearchState:
    critique = llm.invoke([
        HumanMessage(content=REFLECTION_PROMPT.format(
            draft=state['draft'],
            query=state['query']
        ))
    ]).content

    state['critique'] = critique
    # Default to 0 so the first pass works even if `iterations` was not seeded
    state['iterations'] = state.get('iterations', 0) + 1
    return state
```
Routing Logic
The graph needs conditional routing based on the critique:
```python
def should_continue(state: ResearchState) -> str:
    if 'APPROVED' in state['critique']:
        return 'finalize'

    if state['iterations'] >= 3:
        # Max iterations reached, output what we have
        return 'finalize'

    return 'revise'
```
Building the Graph
Now wire it all together:
```python
workflow = StateGraph(ResearchState)

# Add nodes
workflow.add_node('decompose', decompose_topic)
workflow.add_node('search', search_sources)
workflow.add_node('synthesize', synthesize)
workflow.add_node('reflect', reflect)
workflow.add_node('finalize', lambda s: {**s, 'final_output': s['draft']})

# Define flow
workflow.set_entry_point('decompose')
workflow.add_edge('decompose', 'search')
workflow.add_edge('search', 'synthesize')
workflow.add_edge('synthesize', 'reflect')

# Conditional routing after reflection
workflow.add_conditional_edges(
    'reflect',
    should_continue,
    {
        'revise': 'synthesize',   # Go back and improve
        'finalize': 'finalize'    # Good enough, output
    }
)

workflow.add_edge('finalize', END)

# Compile and run
app = workflow.compile()
result = app.invoke({
    'query': 'state management patterns in AI agent workflows',
    'iterations': 0,
})
print(result['final_output'])
```
What the Reflection Node Actually Catches
In practice, the reflection node catches things I would have missed:
| Issue Type | Example Caught |
|---|---|
| Missing context | “The draft doesn’t address error handling in state persistence” |
| Hallucination | “The claim about ‘most applications use Redis’ has no citation” |
| Citation error | “Source [3] is referenced but not in the source list” |
| Logical gap | “You explain the ‘what’ but not the ‘why’ for pattern X” |
Without reflection, my first version produced briefings that looked convincing but had subtle errors. With reflection, the quality improved dramatically.
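Citation errors like the one in the table can also be caught without an LLM call. The sketch below assumes the draft follows the synthesis prompt's format: inline [n] markers in the body and a trailing source list whose entries each start with [n]. The function name and the "Sources" heading text are assumptions for illustration:

```python
import re
from typing import Dict, Set


def check_citations(draft: str, sources_heading: str = "Sources") -> Dict[str, Set[int]]:
    """Compare inline [n] citations against the numbered source list."""
    # Find the last line that consists only of the heading (plus markup like "##" or ":").
    pattern = re.compile(
        rf"^[^\w\n]*{re.escape(sources_heading)}[^\w\n]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    matches = list(pattern.finditer(draft))
    if matches:
        body = draft[: matches[-1].start()]
        source_list = draft[matches[-1].end():]
    else:
        body, source_list = draft, ""

    cited = {int(n) for n in re.findall(r"\[(\d+)\]", body)}
    # A source entry is a line that starts with [n].
    listed = {int(n) for n in re.findall(r"^\s*\[(\d+)\]", source_list, re.MULTILINE)}

    return {
        "missing_from_list": cited - listed,  # cited in the text, no source entry
        "never_cited": listed - cited,        # listed, but unused in the text
    }
```

A check like this could run before the reflection call and have its findings appended to the critique, so the LLM reviewer can focus on coverage and unsupported claims.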
Results: Time Saved
After two months of using this agent:
| Metric | Before | After |
|---|---|---|
| Research time per topic | 3-4 hours | 30-45 minutes |
| Hallucinations caught | 0 (I missed them) | ~3 per briefing |
| Citation accuracy | ~70% | ~95% |
| Weekly time saved | — | 5-10 hours |
The setup took about 4 hours, and I’ve saved that investment many times over.
Common Mistakes to Avoid
- Skipping reflection: This is the biggest mistake. Without it, you’re just automating hallucination production.
- Single source search: arXiv alone isn’t enough. You need blogs, documentation, and sometimes Reddit/HN discussions.
- No iteration limit: Without a cap like the iterations >= 3 check in should_continue, the agent can loop forever when the reflection keeps finding issues.
- Forgetting rate limits: APIs will block you. Add delays between calls.
- Overly broad queries: “Tell me about AI” won’t work well. The decomposition helps, but start with focused queries.
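The iteration cap and the verdict check can live in one routing function. Here is a stricter sketch, assuming the critique follows the VERDICT line format from the reflection prompt. MAX_ITERATIONS and route_after_reflection are hypothetical names. Unlike a plain substring check, this variant treats a missing or malformed verdict as "needs revision":

```python
import re

MAX_ITERATIONS = 3  # hypothetical constant; same cap as the `>= 3` check

# Match only a dedicated verdict line, with or without the square brackets.
VERDICT_RE = re.compile(
    r"^\s*VERDICT:\s*\[?(APPROVED|NEEDS_REVISION)\]?\s*$",
    re.MULTILINE,
)


def route_after_reflection(state: dict) -> str:
    """Return 'finalize' or 'revise' based on the critique and the loop count."""
    # Hard stop first, so a reviewer that never approves cannot loop forever.
    if state.get("iterations", 0) >= MAX_ITERATIONS:
        return "finalize"

    # Only trust an explicit VERDICT line; the word APPROVED elsewhere in the
    # critique (e.g. "cannot be APPROVED yet") should not end the loop.
    match = VERDICT_RE.search(state.get("critique", ""))
    if match and match.group(1) == "APPROVED":
        return "finalize"
    return "revise"
```

Checking the cap before the verdict means the graph terminates even if the reviewer's output is empty or unparseable.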
When This Doesn’t Work
This approach struggles with:
- Very recent topics: arXiv and blogs may not have coverage yet
- Niche domains: If there are only 3 papers on a topic, the synthesis will be thin
- Non-English sources: My current setup only handles English
- Behind paywalls: Can’t access most academic journals without subscriptions
Getting Started
The fastest way to try this:
- Clone LangGraph’s starter template
- Implement the reflection node first (it’s the most important piece)
- Add one search source at a time (start with arXiv)
- Test with queries you’ve already researched manually—compare the outputs
The key insight: a dumb agent with reflection beats a smart agent without it. The second LLM call costs pennies but catches errors that would take hours to find manually.