
How to Build a LangGraph Research Agent That Saves 5-10 Hours Per Week

I was spending 10-15 hours every week on deep-dive research for technical blog posts. Search arXiv, read papers, find blog posts, synthesize everything, fact-check, format citations… rinse and repeat. Then I discovered that a well-designed LangGraph agent with a reflection node could cut that time in half.

Here’s how I built it—and why the reflection pattern is the critical piece most people skip.

The Problem: Manual Research is a Time Sink

Last month I was researching “state management patterns in AI agent workflows.” I spent three hours:

  1. Searching arXiv for relevant papers
  2. Finding technical blog posts that weren’t behind paywalls
  3. Reading and taking notes
  4. Synthesizing a briefing document
  5. Fact-checking my own claims (and finding two hallucinations)

The worst part? After all that work, I still missed a key paper that would have changed my conclusions.

I needed automation. But not just “search and summarize”—I needed something that could catch its own mistakes.

First Attempt: Simple Chain (Failed)

I started with a basic LangChain chain:

# This is what NOT to do
def simple_research(query):
    results = search_tool(query)
    summary = llm.invoke(f"Summarize: {results}")
    return summary

The results were disappointing:

  • Summaries were superficial
  • No citation tracking
  • Hallucinations slipped through constantly
  • No way to iterate or improve

I realized I needed a graph with state and loops, not a linear chain.

The Solution: LangGraph with Reflection

The breakthrough came from a Reddit thread where someone mentioned a “reflection node”—a secondary agent that critiques the first draft before output.

Here’s the architecture that actually works:

┌────────────────────────────────────────────────────────────┐
│                  LangGraph Research Agent                  │
├────────────────────────────────────────────────────────────┤
│                                                            │
│     [Query Input]                                          │
│           │                                                │
│           ▼                                                │
│  ┌─────────────────┐                                       │
│  │ Topic Decomposer│ → Break into sub-questions            │
│  └────────┬────────┘                                       │
│           │                                                │
│           ▼                                                │
│  ┌─────────────────┐                                       │
│  │ Multi-Source    │ → arXiv + blogs + dedupe              │
│  │ Search          │                                       │
│  └────────┬────────┘                                       │
│           │                                                │
│           ▼                                                │
│  ┌─────────────────┐                                       │
│  │ Synthesis       │ → Markdown + citations                │
│  └────────┬────────┘                                       │
│           │                                                │
│           ▼                                                │
│  ┌─────────────────┐      ┌─────────────────┐              │
│  │ Reflection      │ ───► │ Final Output    │              │
│  │ Node            │      └─────────────────┘              │
│  └────────┬────────┘                                       │
│           │ (if issues found)                              │
│           │                                                │
│           └──────────► [back to Synthesis]                 │
│                                                            │
└────────────────────────────────────────────────────────────┘

The reflection node is what separates useful agents from frustrating ones. Let me show you the implementation.

Implementation: The Core Graph

First, define the state that flows through the graph:

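from typing import TypedDict, List
from langgraph.graph import StateGraph, END


class ResearchState(TypedDict):
    query: str                  # original research question
    sub_questions: List[str]    # output of the decomposition node
    search_results: List[dict]  # deduplicated hits from all sources
    draft: str                  # latest briefing from the synthesis node
    critique: str               # latest review from the reflection node
    final_output: str           # approved (or last) draft
    iterations: int             # number of reflection passes so far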
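
The state definition leaves open what the graph receives on its first step. Here is a minimal sketch, assuming a hypothetical make_initial_state helper (not part of the original code), that fills every field up front so counters like iterations start from a known value:

```python
from typing import List, TypedDict


class ResearchState(TypedDict):
    query: str
    sub_questions: List[str]
    search_results: List[dict]
    draft: str
    critique: str
    final_output: str
    iterations: int


def make_initial_state(query: str) -> ResearchState:
    """Hypothetical helper: fill every field so no node hits a missing key."""
    return ResearchState(
        query=query.strip(),
        sub_questions=[],
        search_results=[],
        draft="",
        critique="",
        final_output="",
        iterations=0,  # the reflection node increments this counter
    )


state = make_initial_state("  state management patterns in AI agent workflows ")
print(state["query"], state["iterations"])
```

Starting from a complete state is a design choice: nodes can then read any key without defensive lookups.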

Node 1: Topic Decomposition

The first node breaks down a broad query into searchable pieces:

from langchain_core.messages import HumanMessage

# `llm` is a chat model created elsewhere; `parse_json_list` turns the reply into a Python list
def decompose_topic(state: ResearchState) -> ResearchState:
    prompt = f"""Break down this research query into 3-5 specific sub-questions.
Query: {state['query']}
Return ONLY a JSON array of strings, like:
["sub-question 1", "sub-question 2", "sub-question 3"]"""
    response = llm.invoke([HumanMessage(content=prompt)])
    state['sub_questions'] = parse_json_list(response.content)
    return state
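
The parse_json_list helper is not shown in this post. One possible version, as a sketch only: it assumes the model may wrap the array in extra chatter, and that returning an empty list is an acceptable fallback when nothing parses. Both are my assumptions rather than the original implementation.

```python
import json
import re
from typing import List


def parse_json_list(text: str) -> List[str]:
    """Extract a JSON array of strings from an LLM reply.

    Assumed behavior: tolerate text around the array and return []
    when no valid array is found.
    """
    # Greedy match from the first '[' to the last ']' in the reply
    match = re.search(r"\[.*\]", text, flags=re.DOTALL)
    if match is None:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    # Keep only non-empty strings
    return [item.strip() for item in items if isinstance(item, str) and item.strip()]


reply = (
    "Here are the sub-questions:\n"
    '["How does X work?", "What are the trade-offs of Y?"]\n'
    "Let me know if you need more."
)
print(parse_json_list(reply))
```

An empty result is a useful signal: the decomposition node could fall back to searching the original query instead of failing.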

Why decompose? A single query like “state management in AI agents” is too broad. Breaking it into:

  • “What are common state management patterns in LLM applications?”
  • “How does LangGraph handle state persistence?”
  • “What are the trade-offs between different state backends?”

…gives you better search results.

Node 2: Multi-Source Search

This node searches multiple sources and deduplicates:

import time


def search_sources(state: ResearchState) -> ResearchState:
    all_results = []
    for question in state['sub_questions']:
        # Search arXiv for academic papers
        arxiv_hits = arxiv_tool.search(question, max_results=5)
        all_results.extend(arxiv_hits)
        # Search technical blogs
        blog_hits = blog_search_tool.search(question, max_results=5)
        all_results.extend(blog_hits)
        # Rate limiting to avoid API blocks
        time.sleep(1)
    # Deduplicate by URL/title similarity
    state['search_results'] = deduplicate_results(all_results)
    return state

Mistake I made: I didn’t add rate limiting initially. arXiv blocked my IP within 10 minutes. Always add time.sleep() between API calls.
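
A fixed sleep works, but it always waits the full second even when the previous call was slow. As an alternative sketch, a small throttle can wait only for the part of the interval that has not yet elapsed. The Throttle class and the interval values are hypothetical, so check each API's published limits before picking a number.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum gap between consecutive calls."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Block until the interval has passed; return the seconds slept."""
        delay = 0.0
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last_call = self._clock()
        return delay


throttle = Throttle(min_interval=0.05)  # tiny interval so the demo runs fast
for question in ["q1", "q2", "q3"]:
    throttle.wait()
    # arxiv_tool.search(question, max_results=5) would go here
```

Using one Throttle per API keeps a slow source from delaying calls to the others.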

Node 3: Synthesis

Now combine the results into a coherent briefing:

def synthesize(state: ResearchState) -> ResearchState:
    context = format_search_results(state['search_results'])
    prompt = f"""Create a research briefing based on these sources.
Original Query: {state['query']}
Sources:
{context}
Requirements:
- Structure with clear headings
- Include inline citations like [1], [2]
- List all sources at the end
- Be factual, avoid speculation"""
    # On a revision pass, show the model its previous draft and the reviewer's critique
    if state.get('critique'):
        prompt += f"\n\nPrevious draft:\n{state['draft']}\n\nReviewer feedback to address:\n{state['critique']}"
    state['draft'] = llm.invoke([HumanMessage(content=prompt)]).content
    return state
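
The format_search_results helper is left out above. A minimal sketch follows, assuming each result is a dict with 'title', 'url', and 'summary' keys (hypothetical field names). The point is to number the sources so the inline citations [1], [2] in the draft map back to something concrete.

```python
from typing import Dict, List


def format_search_results(results: List[Dict], max_snippet_chars: int = 500) -> str:
    """Render results as a numbered list the model can cite as [1], [2], ..."""
    blocks = []
    for number, result in enumerate(results, start=1):
        # Collapse whitespace and cap the length to keep the prompt small
        snippet = " ".join(result.get("summary", "").split())[:max_snippet_chars]
        blocks.append(
            f"[{number}] {result.get('title', 'Untitled')}\n"
            f"URL: {result.get('url', 'unknown')}\n"
            f"Summary: {snippet}"
        )
    return "\n\n".join(blocks)


sources = [
    {"title": "Paper A", "url": "https://example.org/a", "summary": "First   summary."},
    {"title": "Post B", "url": "https://example.com/b", "summary": "Second summary."},
]
print(format_search_results(sources))
```

Capping the snippet length is one way to keep a long result list inside the model's context window.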

Node 4: Reflection (The Critical Piece)

This is where the magic happens. A separate LLM call reviews the draft:

REFLECTION_PROMPT = """You are a research quality reviewer. Analyze this draft:
{draft}
Original Query: {query}
Evaluate on these criteria:
1. **Coverage**: Are all aspects of the original query addressed?
2. **Accuracy**: Are claims supported by citations?
3. **Hallucination Check**: Identify any statements not backed by sources.
4. **Gaps**: What important information is missing?
Respond in this EXACT format:
ISSUES:
- [list each problem found, or write "None" if none]
VERDICT: [NEEDS_REVISION or APPROVED]"""


def reflect(state: ResearchState) -> ResearchState:
    critique = llm.invoke([
        HumanMessage(content=REFLECTION_PROMPT.format(
            draft=state['draft'],
            query=state['query']
        ))
    ]).content
    state['critique'] = critique
    # Default to 0 so the first pass works even if the caller did not set the counter
    state['iterations'] = state.get('iterations', 0) + 1
    return state

Routing Logic

The graph needs conditional routing based on the critique:

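def should_continue(state: ResearchState) -> str:
    # Look only at the text after the last "VERDICT:" so the word APPROVED
    # inside an issue description does not count as approval
    verdict = state['critique'].rsplit('VERDICT:', 1)[-1]
    if 'APPROVED' in verdict:
        return 'finalize'
    if state['iterations'] >= 3:
        # Max iterations reached, output what we have
        return 'finalize'
    return 'revise'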
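
A plain substring test for APPROVED can misfire when the word shows up inside an issue description. Here is a sketch of a stricter parser for the ISSUES/VERDICT format requested in the reflection prompt. The parse_critique name and the choice to treat a missing verdict as NEEDS_REVISION are my assumptions.

```python
import re
from typing import List, Tuple


def parse_critique(critique: str) -> Tuple[str, List[str]]:
    """Split a critique into (verdict, issues).

    Falls back to NEEDS_REVISION when no verdict line is found, so a
    malformed reply is never treated as an approval.
    """
    match = re.search(
        r"VERDICT:\s*\[?\s*(APPROVED|NEEDS_REVISION)", critique, flags=re.IGNORECASE
    )
    verdict = match.group(1).upper() if match else "NEEDS_REVISION"

    issues: List[str] = []
    in_issues = False
    for line in critique.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("ISSUES:"):
            in_issues = True
            continue
        if stripped.upper().startswith("VERDICT:"):
            in_issues = False
            continue
        if in_issues and stripped.startswith("-"):
            item = stripped.lstrip("- ").strip()
            if item and item.lower() != "none":
                issues.append(item)
    return verdict, issues


sample = """ISSUES:
- The claim about Redis has no citation and should not be APPROVED as is
- Source [3] is referenced but not in the source list
VERDICT: NEEDS_REVISION"""
print(parse_critique(sample))
```

The routing function could then return 'finalize' only when the parsed verdict is APPROVED, and the issue list could be logged to see which problems recur.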

Building the Graph

Now wire it all together:

workflow = StateGraph(ResearchState)

# Add nodes
workflow.add_node('decompose', decompose_topic)
workflow.add_node('search', search_sources)
workflow.add_node('synthesize', synthesize)
workflow.add_node('reflect', reflect)
workflow.add_node('finalize', lambda s: {**s, 'final_output': s['draft']})

# Define flow
workflow.set_entry_point('decompose')
workflow.add_edge('decompose', 'search')
workflow.add_edge('search', 'synthesize')
workflow.add_edge('synthesize', 'reflect')

# Conditional routing after reflection
workflow.add_conditional_edges(
    'reflect',
    should_continue,
    {
        'revise': 'synthesize',  # Go back and improve
        'finalize': 'finalize'   # Good enough, output
    }
)
workflow.add_edge('finalize', END)

# Compile and run
app = workflow.compile()
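
To make the control flow concrete, here is the synthesize, reflect, route cycle written as a plain Python loop with stub nodes. This is an illustration of what the conditional edges do, not how LangGraph executes the graph internally. The stubs and the run_reflection_loop name are hypothetical, and the stub reviewer never approves so that the iteration cap is what ends the run.

```python
from typing import Callable, Dict

MAX_ITERATIONS = 3  # mirrors the cap checked in should_continue


def run_reflection_loop(
    state: Dict,
    synthesize: Callable[[Dict], Dict],
    reflect: Callable[[Dict], Dict],
    should_continue: Callable[[Dict], str],
) -> Dict:
    """Plain-Python equivalent of the synthesize -> reflect -> route cycle."""
    while True:
        state = reflect(synthesize(state))
        if should_continue(state) == "finalize":
            return {**state, "final_output": state["draft"]}


# Stub nodes for illustration only
def stub_synthesize(state: Dict) -> Dict:
    return {**state, "draft": f"draft v{state['iterations'] + 1}"}


def stub_reflect(state: Dict) -> Dict:
    return {
        **state,
        "critique": "VERDICT: NEEDS_REVISION",
        "iterations": state["iterations"] + 1,
    }


def stub_should_continue(state: Dict) -> str:
    if "VERDICT: APPROVED" in state["critique"]:
        return "finalize"
    if state["iterations"] >= MAX_ITERATIONS:
        return "finalize"
    return "revise"


result = run_reflection_loop(
    {"query": "demo", "draft": "", "critique": "", "iterations": 0},
    stub_synthesize,
    stub_reflect,
    stub_should_continue,
)
print(result["iterations"], result["final_output"])  # prints "3 draft v3"
```

With the compiled graph, the equivalent run starts by calling app.invoke with an initial state that holds the query and an iterations counter set to 0. The briefing then comes back under the final_output key.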

What the Reflection Node Actually Catches

In practice, the reflection node catches things I would have missed:

| Issue Type      | Example Caught                                                  |
|-----------------|-----------------------------------------------------------------|
| Missing context | “The draft doesn’t address error handling in state persistence” |
| Hallucination   | “The claim about ‘most applications use Redis’ has no citation” |
| Citation error  | “Source [3] is referenced but not in the source list”           |
| Logical gap     | “You explain the ‘what’ but not the ‘why’ for pattern X”        |

Without reflection, my first version produced briefings that looked convincing but had subtle errors. With reflection, the quality improved dramatically.
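
Citation errors like the one in the table can also be caught without an LLM call. As a complementary sketch (not part of my original agent), a small regex check can compare the inline [n] markers in the draft body against the number of sources provided. The check_citations name and the report keys are hypothetical.

```python
import re
from typing import Dict, List, Set


def check_citations(draft_body: str, num_sources: int) -> Dict[str, List[int]]:
    """Compare inline [n] markers in the body against sources 1..num_sources."""
    cited: Set[int] = {int(n) for n in re.findall(r"\[(\d+)\]", draft_body)}
    available = set(range(1, num_sources + 1))
    return {
        "missing_from_source_list": sorted(cited - available),
        "never_cited": sorted(available - cited),
    }


body = "Pattern A keeps state in memory [1]. Pattern B writes to an external store [3]."
report = check_citations(body, num_sources=2)
print(report)  # {'missing_from_source_list': [3], 'never_cited': [2]}
```

Pass only the body of the draft, without the trailing source list, or every listed source will count as cited.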

Results: Time Saved

After two months of using this agent:

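| Metric                  | Before             | After           |
|-------------------------|--------------------|-----------------|
| Research time per topic | 3-4 hours          | 30-45 minutes   |
| Hallucinations caught   | 0 (I missed them)  | ~3 per briefing |
| Citation accuracy       | ~70%               | ~95%            |
| Weekly time saved       | n/a                | 5-10 hours      |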

The setup took about 4 hours, and I’ve saved that investment many times over.

Common Mistakes to Avoid

  1. Skipping reflection: This is the biggest mistake. Without it, you’re just automating hallucination production.

  2. Single source search: arXiv alone isn’t enough. You need blogs, documentation, and sometimes Reddit/HN discussions.

  3. No iteration limit: Without a cap (the iterations >= 3 check in should_continue), the agent can loop forever when the reflection keeps finding issues.

  4. Forgetting rate limits: APIs will block you. Add delays between calls.

  5. Overly broad queries: “Tell me about AI” won’t work well. The decomposition helps, but start with focused queries.

When This Doesn’t Work

This approach struggles with:

  • Very recent topics: arXiv and blogs may not have coverage yet
  • Niche domains: If there are only 3 papers on a topic, the synthesis will be thin
  • Non-English sources: My current setup only handles English
  • Behind paywalls: Can’t access most academic journals without subscriptions

Getting Started

The fastest way to try this:

  1. Clone LangGraph’s starter template
  2. Implement the reflection node first (it’s the most important piece)
  3. Add one search source at a time (start with arXiv)
  4. Test with queries you’ve already researched manually—compare the outputs

The key insight: a dumb agent with reflection beats a smart agent without it. The second LLM call costs pennies but catches errors that would take hours to find manually.

Final Words + More Resources

My intention with this article was to share my knowledge and experience in a way that helps others. If you want to get in touch, you can reach me by email.

