Skip to content

What Agent Frameworks Actually Give You: Features Beyond the Hype

Problem

I wanted to build an AI agent. I had two choices: use a framework like LangGraph, or write raw Python from scratch. Everyone online had opinions, but no one gave me a concrete list of what frameworks actually provide.

I tried building from scratch first. Three weeks later, I had implemented a buggy REACT loop, a half-working session manager, and zero tracing. That’s when I realized: frameworks don’t just give you convenience—they give you features that take serious engineering to build correctly.

This post shows you the concrete features agent frameworks provide, so you can decide if the learning curve is worth it.

The Feature Checklist

From my research and painful firsthand experience, here’s what frameworks actually give you:

FeatureWhat It DoesBuild From Scratch?
Human-in-the-loop (HITL)Pause for human approval mid-execution20-40 hours
Dependency injection for toolsSwap tools at runtime, test easily10-20 hours
Tracing and observabilityTrack every decision and tool call15-30 hours
REACT loop implementationStandard Reason-Act-Observe pattern20-40 hours
Chat session managementPersist history, manage context windows10-25 hours
Retry and timeout handlingAuto-retry on failures, circuit breakers10-20 hours

Total: 85-175 hours of engineering work you can skip with a framework.

Feature 1: Human-in-the-Loop (HITL)

This was the feature I didn’t know I needed until my agent deleted the wrong files.

What HITL Does

HITL lets your agent pause execution and wait for human input. Use cases:

  • Approve expensive operations before execution
  • Collect feedback mid-workflow
  • Override agent decisions
  • Debug by inspecting state at breakpoints

Building It From Scratch

Here’s what I had to implement:

hitl_from_scratch.py
import json
import pickle
from pathlib import Path
from typing import Any, Dict, Optional
from enum import Enum
class AgentState(Enum):
RUNNING = "running"
WAITING_FOR_INPUT = "waiting_for_input"
COMPLETED = "completed"
FAILED = "failed"
class HITLAgent:
"""Basic HITL implementation from scratch."""
def __init__(self, checkpoint_dir: str = "./checkpoints"):
self.checkpoint_dir = Path(checkpoint_dir)
self.checkpoint_dir.mkdir(exist_ok=True)
self.state = AgentState.RUNNING
self.context: Dict[str, Any] = {}
def save_checkpoint(self, checkpoint_id: str, state: Dict[str, Any]):
"""Persist state for later resumption."""
checkpoint_path = self.checkpoint_dir / f"{checkpoint_id}.pkl"
with open(checkpoint_path, "wb") as f:
pickle.dump(state, f)
def load_checkpoint(self, checkpoint_id: str) -> Optional[Dict[str, Any]]:
"""Load persisted state."""
checkpoint_path = self.checkpoint_dir / f"{checkpoint_id}.pkl"
if checkpoint_path.exists():
with open(checkpoint_path, "rb") as f:
return pickle.load(f)
return None
def request_approval(self, action: str, details: Dict[str, Any]) -> bool:
"""Pause execution and request human approval."""
self.state = AgentState.WAITING_FOR_INPUT
# Save checkpoint so we can resume
self.save_checkpoint("pending_approval", {
"action": action,
"details": details,
"context": self.context
})
# In a real system, this would notify the human
print(f"Action requires approval: {action}")
print(f"Details: {json.dumps(details, indent=2)}")
# Wait for human input (simplified)
response = input("Approve? (y/n): ")
self.state = AgentState.RUNNING
return response.lower() == "y"
def execute_with_approval(self, action: str, details: Dict[str, Any], execute_fn):
"""Execute action only after approval."""
if self.request_approval(action, details):
result = execute_fn()
return {"approved": True, "result": result}
return {"approved": False, "result": None}

This is a simplified version. A production implementation needs:

  • State persistence across server restarts
  • Notification system for pending approvals
  • Approval workflow management
  • Timeout handling for approvals
  • Multi-user approval routing

Framework Approach

With LangGraph, HITL is built-in:

hitl_langgraph.py
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict
class AgentState(TypedDict):
messages: list
pending_action: dict
def human_approval_node(state: AgentState) -> AgentState:
"""Node that pauses for human input."""
# This automatically creates an interrupt point
return state
# Build graph with checkpointing
graph = StateGraph(AgentState)
graph.add_node("approval", human_approval_node)
# Checkpointer enables interrupt/resume
checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)
# Interrupt at approval node
result = app.invoke(
{"messages": ["Delete file?"], "pending_action": {"file": "data.csv"}},
config={"configurable": {"thread_id": "session-1"}}
)
# Resume after human input
app.update_state(
config={"configurable": {"thread_id": "session-1"}},
values={"approved": True}
)
result = app.invoke(None, config={"configurable": {"thread_id": "session-1"}})

The framework handles state persistence, interrupt/resume, and session management automatically.

Feature 2: Dependency Injection for Tools

I used to hardcode database connections in my tools. Then I tried to write tests and everything broke.

What Dependency Injection Does

  • Inject databases, APIs, configurations at runtime
  • Swap real implementations for mocks during testing
  • Configure different tools for different environments

Building It From Scratch

di_from_scratch.py
from typing import Callable, Dict, Any, TypeVar, Generic
from dataclasses import dataclass
from functools import wraps
T = TypeVar('T')
@dataclass
class Tool:
name: str
func: Callable
dependencies: Dict[str, type]
class DependencyContainer:
"""Simple dependency injection container."""
def __init__(self):
self._services: Dict[type, Any] = {}
self._factories: Dict[type, Callable] = {}
def register_instance(self, interface: type, instance: Any):
"""Register a specific instance."""
self._services[interface] = instance
def register_factory(self, interface: type, factory: Callable):
"""Register a factory function."""
self._factories[interface] = factory
def resolve(self, interface: type) -> Any:
"""Resolve a dependency."""
if interface in self._services:
return self._services[interface]
if interface in self._factories:
instance = self._factories[interface]()
self._services[interface] = instance
return instance
raise ValueError(f"No registration for {interface}")
def tool_with_dependencies(container: DependencyContainer):
"""Decorator for injecting dependencies into tools."""
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
# Inject dependencies based on annotations
annotations = func.__annotations__
for param_name, param_type in annotations.items():
if param_name not in kwargs and param_type in container._services:
kwargs[param_name] = container.resolve(param_type)
return func(*args, **kwargs)
return wrapper
return decorator

Framework Approach

LangGraph handles this through configuration:

di_langgraph.py
from langgraph.tools import tool
from typing import Annotated
# Define tool with injectable dependency
@tool
def query_database(
query: str,
db: Annotated[Database, "injected"]
) -> str:
"""Execute a database query."""
return db.execute(query)
# Configure at runtime
config = {
"configurable": {
"db": ProductionDatabase()
}
}
# Or for testing
test_config = {
"configurable": {
"db": MockDatabase()
}
}
# Run with different configurations
result = agent.invoke({"input": "query"}, config=config)

Feature 3: Tracing and Observability

When my agent made a wrong decision, I had no idea why. Tracing fixes this.

What Tracing Does

  • Track every tool call and LLM request
  • Debug agent reasoning chains
  • Monitor token usage and costs
  • Visualize execution flow

Building It From Scratch

tracing_from_scratch.py
import time
import json
from typing import Any, Callable
from functools import wraps
from dataclasses import dataclass, field
from datetime import datetime
@dataclass
class Span:
name: str
start_time: float
end_time: float = 0.0
attributes: dict = field(default_factory=dict)
events: list = field(default_factory=list)
class Tracer:
"""Simple tracing implementation."""
def __init__(self):
self.spans: list[Span] = []
self.current_span: Span | None = None
def start_span(self, name: str, attributes: dict = None) -> Span:
span = Span(
name=name,
start_time=time.time(),
attributes=attributes or {}
)
self.spans.append(span)
return span
def end_span(self, span: Span):
span.end_time = time.time()
if self.current_span == span:
self.current_span = None
def add_event(self, span: Span, name: str, attributes: dict = None):
span.events.append({
"name": name,
"timestamp": datetime.now().isoformat(),
"attributes": attributes or {}
})
def export_traces(self) -> str:
"""Export traces for visualization."""
return json.dumps([{
"name": s.name,
"duration_ms": (s.end_time - s.start_time) * 1000,
"attributes": s.attributes,
"events": s.events
} for s in self.spans], indent=2)
def traced(tracer: Tracer):
"""Decorator for tracing function calls."""
def decorator(func: Callable):
@wraps(func)
def wrapper(*args, **kwargs):
span = tracer.start_span(func.__name__, {
"args": str(args)[:200],
"kwargs": str(kwargs)[:200]
})
try:
result = func(*args, **kwargs)
tracer.add_event(span, "completed", {"result": str(result)[:100]})
return result
except Exception as e:
tracer.add_event(span, "error", {"error": str(e)})
raise
finally:
tracer.end_span(span)
return wrapper
return decorator

Framework Approach

LangGraph has built-in LangSmith integration:

tracing_langgraph.py
import os
# Enable tracing (one line)
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-key"
# All agent executions are automatically traced
result = agent.invoke({"input": "search for Python tutorials"})
# View traces at smith.langchain.com

Feature 4: REACT Loop Implementation

This is the core of most agents. It’s also surprisingly complex to get right.

What REACT Does

The REACT pattern (Reason → Act → Observe) cycles through:

  1. Reason: LLM thinks about what to do
  2. Act: LLM selects and executes a tool
  3. Observe: Tool result feeds back to reasoning
  4. Repeat until final answer

Building It From Scratch

Here’s my attempt:

react_from_scratch.py
import json
import re
from typing import List, Callable, Dict, Any
class ReactAgent:
"""Basic REACT loop implementation."""
def __init__(self, llm, tools: List[Callable], max_steps: int = 5):
self.llm = llm
self.tools = {t.__name__: t for t in tools}
self.max_steps = max_steps
self.system_prompt = """You are a helpful assistant.
Use the following format:
Thought: Think about what to do
Action: tool_name[args]
Observation: (result will be provided)
... (repeat Thought/Action/Observation as needed)
Final Answer: Your final response
Available tools: {tool_names}"""
def run(self, query: str) -> str:
context = f"Question: {query}\n"
tool_names = list(self.tools.keys())
for step in range(self.max_steps):
# Get LLM response
prompt = self.system_prompt.format(tool_names=tool_names) + "\n" + context
response = self.llm.generate(prompt)
# Check for final answer
if "Final Answer:" in response:
return response.split("Final Answer:")[-1].strip()
# Parse action
action = self._parse_action(response)
if not action:
context += f"Thought: {response}\nInvalid action format. Try again.\n"
continue
# Execute tool
tool_name = action["tool"]
if tool_name not in self.tools:
context += f"Thought: {response}\nUnknown tool: {tool_name}\n"
continue
try:
result = self.tools[tool_name](**action["args"])
context += f"Thought: {response}\nObservation: {result}\n"
except Exception as e:
context += f"Thought: {response}\nObservation: Error: {str(e)}\n"
return "Could not reach final answer within step limit"
def _parse_action(self, response: str) -> Dict[str, Any] | None:
"""Parse action from LLM response. This is the tricky part."""
# Try multiple patterns
patterns = [
r"Action:\s*(\w+)\[(.+)\]",
r"Action:\s*(\w+)\((.+)\)",
r"Action:\s*(\w+)\s*:\s*(.+)",
]
for pattern in patterns:
match = re.search(pattern, response)
if match:
tool_name = match.group(1)
try:
args = json.loads(match.group(2))
except json.JSONDecodeError:
args = {"input": match.group(2)}
return {"tool": tool_name, "args": args}
return None

The challenges I ran into:

  • LLM output is unpredictable—parsing is fragile
  • Error handling for malformed actions
  • Token limit management across steps
  • Tool argument validation

Framework Approach

LangGraph handles REACT as a pre-built pattern:

react_langgraph.py
from langgraph.prebuilt import create_react_agent
from langgraph.tools import tool
@tool
def search(query: str) -> str:
"""Search for information."""
return f"Results for: {query}"
@tool
def calculate(expression: str) -> str:
"""Calculate a math expression."""
return str(eval(expression))
# Create REACT agent (one function call)
agent = create_react_agent(
model="gpt-4",
tools=[search, calculate]
)
# Run with automatic REACT loop
result = agent.invoke({"messages": [("user", "What is 15% of 200?")]})

The framework handles parsing, tool execution, error recovery, and step limits.

Feature 5: Chat Session Management

My first agent had no memory. Every message started fresh. Users hated it.

What Session Management Does

  • Maintain conversation history across turns
  • Manage context window limits
  • Persist sessions for later resumption
  • Handle multi-user conversations

Building It From Scratch

session_from_scratch.py
from typing import List, Dict
from dataclasses import dataclass, field
from datetime import datetime
import json
@dataclass
class Message:
role: str
content: str
timestamp: datetime = field(default_factory=datetime.now)
@dataclass
class Session:
session_id: str
messages: List[Message] = field(default_factory=list)
metadata: Dict = field(default_factory=dict)
created_at: datetime = field(default_factory=datetime.now)
last_accessed: datetime = field(default_factory=datetime.now)
class SessionManager:
"""Manage chat sessions with context window limits."""
def __init__(self, max_tokens: int = 4000, tokenizer=None):
self.sessions: Dict[str, Session] = {}
self.max_tokens = max_tokens
self.tokenizer = tokenizer or self._default_tokenizer
def _default_tokenizer(self, text: str) -> int:
"""Rough token count approximation."""
return len(text.split()) * 1.3 # Approximation
def create_session(self, session_id: str = None) -> Session:
"""Create a new session."""
import uuid
session_id = session_id or str(uuid.uuid4())
session = Session(session_id=session_id)
self.sessions[session_id] = session
return session
def add_message(self, session_id: str, role: str, content: str):
"""Add a message to session history."""
if session_id not in self.sessions:
self.create_session(session_id)
session = self.sessions[session_id]
session.messages.append(Message(role=role, content=content))
session.last_accessed = datetime.now()
# Trim if over token limit
self._trim_context(session)
def get_context(self, session_id: str) -> List[Dict]:
"""Get conversation context for LLM."""
if session_id not in self.sessions:
return []
session = self.sessions[session_id]
return [{"role": m.role, "content": m.content} for m in session.messages]
def _trim_context(self, session: Session):
"""Remove old messages if over token limit."""
while self._count_tokens(session) > self.max_tokens and len(session.messages) > 1:
# Remove oldest non-system message
for i, msg in enumerate(session.messages):
if msg.role != "system":
session.messages.pop(i)
break
def _count_tokens(self, session: Session) -> int:
"""Count total tokens in session."""
total = 0
for msg in session.messages:
total += self.tokenizer(msg.content)
return int(total)

Framework Approach

LangGraph has built-in checkpointer for sessions:

session_langgraph.py
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph
# Add checkpointer for session persistence
checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)
# Each thread_id is a separate session
config = {"configurable": {"thread_id": "user-123"}}
# First turn
app.invoke({"messages": [("user", "Hello!")]}, config=config)
# Second turn (continues same session)
app.invoke({"messages": [("user", "What did I just say?")]}, config=config)
# Agent remembers the conversation

Feature 6: Retry and Timeout Handling

APIs fail. LLMs timeout. Without retries, your agent is fragile.

What Retry Handling Does

  • Automatic retries on transient failures
  • Configurable timeout policies
  • Exponential backoff
  • Circuit breakers for repeated failures

Building It From Scratch

retry_from_scratch.py
import time
import random
from functools import wraps
from typing import Callable, Type, Tuple
def retry(
max_attempts: int = 3,
exceptions: Tuple[Type[Exception], ...] = (Exception,),
base_delay: float = 1.0,
max_delay: float = 60.0,
exponential_base: float = 2.0
):
"""Retry decorator with exponential backoff."""
def decorator(func: Callable):
@wraps(func)
def wrapper(*args, **kwargs):
last_exception = None
for attempt in range(max_attempts):
try:
return func(*args, **kwargs)
except exceptions as e:
last_exception = e
if attempt == max_attempts - 1:
raise
# Calculate delay with exponential backoff
delay = min(
base_delay * (exponential_base ** attempt),
max_delay
)
# Add jitter
delay = delay * (0.5 + random.random())
print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay:.1f}s")
time.sleep(delay)
raise last_exception
return wrapper
return decorator
def timeout(seconds: float):
"""Timeout decorator using threading."""
import threading
def decorator(func: Callable):
@wraps(func)
def wrapper(*args, **kwargs):
result = [None]
exception = [None]
def worker():
try:
result[0] = func(*args, **kwargs)
except Exception as e:
exception[0] = e
thread = threading.Thread(target=worker)
thread.start()
thread.join(timeout=seconds)
if thread.is_alive():
raise TimeoutError(f"Function {func.__name__} timed out after {seconds}s")
if exception[0]:
raise exception[0]
return result[0]
return wrapper
return decorator
# Usage
@retry(max_attempts=3, exceptions=(ConnectionError, TimeoutError))
@timeout(seconds=30)
def call_llm(prompt: str) -> str:
"""Call LLM with retry and timeout."""
# Your LLM call here
pass

Framework Approach

LangGraph has built-in retry configuration:

retry_langgraph.py
from langgraph.graph import StateGraph
# Configure retry at node level
graph.add_node(
"llm_call",
llm_node,
retry={
"max_attempts": 3,
"initial_interval": 1.0,
"backoff_factor": 2.0,
"retry_on": [ConnectionError, TimeoutError]
}
)
# Or globally for the graph
app = graph.compile(
retry={
"max_attempts": 3,
"initial_interval": 1.0
}
)

When Should You Use a Framework?

Based on my experience, use a framework when you need:

  1. 3 or more features from the checklist — The engineering time saved outweighs the learning curve.

  2. Production reliability — Frameworks handle edge cases you’ll miss.

  3. Team collaboration — Frameworks provide common patterns and abstractions.

  4. Observability — Built-in tracing is worth it alone for debugging.

Build from scratch when:

  1. Learning — You’ll understand agents better by building from zero.

  2. Simple use case — A single LLM call doesn’t need a framework.

  3. Maximum control — When framework constraints block your specific needs.

What Reddit Says

I found a helpful discussion on r/LangChain that confirmed my experience:

“Using a framework saves you time, if you need: HITL, Dependency injection to tools, Tracing, REACT Loop, Chat session management.” — Jorgestar29

“The principle behind an Agent is simple, but the engineering details are far more complex than you might imagine.” — MuninnW

“There is so much additional planning you need to do if you want to implement something from scratch. Frameworks remove that extra hassle, just get the dependency right and off you go.” — BarracudaExpensive03

Summary

In this post, I showed you the concrete features agent frameworks provide: human-in-the-loop workflows, dependency injection, tracing, REACT loops, session management, and retry handling.

Building these from scratch takes 85-175 hours of engineering. Frameworks give you all of it with a learning curve of 10-20 hours.

My recommendation: if you need 3 or more features from the checklist, use a framework. The time you save goes into your actual agent logic instead of boilerplate infrastructure.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments