What Agent Frameworks Actually Give You: Features Beyond the Hype
Problem
I wanted to build an AI agent. I had two choices: use a framework like LangGraph, or write raw Python from scratch. Everyone online had opinions, but no one gave me a concrete list of what frameworks actually provide.
I tried building from scratch first. Three weeks later, I had implemented a buggy REACT loop, a half-working session manager, and zero tracing. That’s when I realized: frameworks don’t just give you convenience—they give you features that take serious engineering to build correctly.
This post shows you the concrete features agent frameworks provide, so you can decide if the learning curve is worth it.
The Feature Checklist
From my research and painful firsthand experience, here’s what frameworks actually give you:
| Feature | What It Does | Build From Scratch? |
|---|---|---|
| Human-in-the-loop (HITL) | Pause for human approval mid-execution | 20-40 hours |
| Dependency injection for tools | Swap tools at runtime, test easily | 10-20 hours |
| Tracing and observability | Track every decision and tool call | 15-30 hours |
| REACT loop implementation | Standard Reason-Act-Observe pattern | 20-40 hours |
| Chat session management | Persist history, manage context windows | 10-25 hours |
| Retry and timeout handling | Auto-retry on failures, circuit breakers | 10-20 hours |
Total: 85-175 hours of engineering work you can skip with a framework.
Feature 1: Human-in-the-Loop (HITL)
This was the feature I didn’t know I needed until my agent deleted the wrong files.
What HITL Does
HITL lets your agent pause execution and wait for human input. Use cases:
- Approve expensive operations before execution
- Collect feedback mid-workflow
- Override agent decisions
- Debug by inspecting state at breakpoints
Building It From Scratch
Here’s what I had to implement:
import jsonimport picklefrom pathlib import Pathfrom typing import Any, Dict, Optionalfrom enum import Enum
class AgentState(Enum): RUNNING = "running" WAITING_FOR_INPUT = "waiting_for_input" COMPLETED = "completed" FAILED = "failed"
class HITLAgent: """Basic HITL implementation from scratch."""
def __init__(self, checkpoint_dir: str = "./checkpoints"): self.checkpoint_dir = Path(checkpoint_dir) self.checkpoint_dir.mkdir(exist_ok=True) self.state = AgentState.RUNNING self.context: Dict[str, Any] = {}
def save_checkpoint(self, checkpoint_id: str, state: Dict[str, Any]): """Persist state for later resumption.""" checkpoint_path = self.checkpoint_dir / f"{checkpoint_id}.pkl" with open(checkpoint_path, "wb") as f: pickle.dump(state, f)
def load_checkpoint(self, checkpoint_id: str) -> Optional[Dict[str, Any]]: """Load persisted state.""" checkpoint_path = self.checkpoint_dir / f"{checkpoint_id}.pkl" if checkpoint_path.exists(): with open(checkpoint_path, "rb") as f: return pickle.load(f) return None
def request_approval(self, action: str, details: Dict[str, Any]) -> bool: """Pause execution and request human approval.""" self.state = AgentState.WAITING_FOR_INPUT
# Save checkpoint so we can resume self.save_checkpoint("pending_approval", { "action": action, "details": details, "context": self.context })
# In a real system, this would notify the human print(f"Action requires approval: {action}") print(f"Details: {json.dumps(details, indent=2)}")
# Wait for human input (simplified) response = input("Approve? (y/n): ") self.state = AgentState.RUNNING return response.lower() == "y"
def execute_with_approval(self, action: str, details: Dict[str, Any], execute_fn): """Execute action only after approval.""" if self.request_approval(action, details): result = execute_fn() return {"approved": True, "result": result} return {"approved": False, "result": None}This is a simplified version. A production implementation needs:
- State persistence across server restarts
- Notification system for pending approvals
- Approval workflow management
- Timeout handling for approvals
- Multi-user approval routing
Framework Approach
With LangGraph, HITL is built-in:
from langgraph.graph import StateGraph, ENDfrom langgraph.checkpoint.memory import MemorySaverfrom typing import TypedDict
class AgentState(TypedDict): messages: list pending_action: dict
def human_approval_node(state: AgentState) -> AgentState: """Node that pauses for human input.""" # This automatically creates an interrupt point return state
# Build graph with checkpointinggraph = StateGraph(AgentState)graph.add_node("approval", human_approval_node)
# Checkpointer enables interrupt/resumecheckpointer = MemorySaver()app = graph.compile(checkpointer=checkpointer)
# Interrupt at approval noderesult = app.invoke( {"messages": ["Delete file?"], "pending_action": {"file": "data.csv"}}, config={"configurable": {"thread_id": "session-1"}})
# Resume after human inputapp.update_state( config={"configurable": {"thread_id": "session-1"}}, values={"approved": True})result = app.invoke(None, config={"configurable": {"thread_id": "session-1"}})The framework handles state persistence, interrupt/resume, and session management automatically.
Feature 2: Dependency Injection for Tools
I used to hardcode database connections in my tools. Then I tried to write tests and everything broke.
What Dependency Injection Does
- Inject databases, APIs, configurations at runtime
- Swap real implementations for mocks during testing
- Configure different tools for different environments
Building It From Scratch
from typing import Callable, Dict, Any, TypeVar, Genericfrom dataclasses import dataclassfrom functools import wraps
T = TypeVar('T')
@dataclassclass Tool: name: str func: Callable dependencies: Dict[str, type]
class DependencyContainer: """Simple dependency injection container."""
def __init__(self): self._services: Dict[type, Any] = {} self._factories: Dict[type, Callable] = {}
def register_instance(self, interface: type, instance: Any): """Register a specific instance.""" self._services[interface] = instance
def register_factory(self, interface: type, factory: Callable): """Register a factory function.""" self._factories[interface] = factory
def resolve(self, interface: type) -> Any: """Resolve a dependency.""" if interface in self._services: return self._services[interface] if interface in self._factories: instance = self._factories[interface]() self._services[interface] = instance return instance raise ValueError(f"No registration for {interface}")
def tool_with_dependencies(container: DependencyContainer): """Decorator for injecting dependencies into tools.""" def decorator(func): @wraps(func) def wrapper(*args, **kwargs): # Inject dependencies based on annotations annotations = func.__annotations__ for param_name, param_type in annotations.items(): if param_name not in kwargs and param_type in container._services: kwargs[param_name] = container.resolve(param_type) return func(*args, **kwargs) return wrapper return decoratorFramework Approach
LangGraph handles this through configuration:
from langgraph.tools import toolfrom typing import Annotated
# Define tool with injectable dependency@tooldef query_database( query: str, db: Annotated[Database, "injected"]) -> str: """Execute a database query.""" return db.execute(query)
# Configure at runtimeconfig = { "configurable": { "db": ProductionDatabase() }}
# Or for testingtest_config = { "configurable": { "db": MockDatabase() }}
# Run with different configurationsresult = agent.invoke({"input": "query"}, config=config)Feature 3: Tracing and Observability
When my agent made a wrong decision, I had no idea why. Tracing fixes this.
What Tracing Does
- Track every tool call and LLM request
- Debug agent reasoning chains
- Monitor token usage and costs
- Visualize execution flow
Building It From Scratch
import timeimport jsonfrom typing import Any, Callablefrom functools import wrapsfrom dataclasses import dataclass, fieldfrom datetime import datetime
@dataclassclass Span: name: str start_time: float end_time: float = 0.0 attributes: dict = field(default_factory=dict) events: list = field(default_factory=list)
class Tracer: """Simple tracing implementation."""
def __init__(self): self.spans: list[Span] = [] self.current_span: Span | None = None
def start_span(self, name: str, attributes: dict = None) -> Span: span = Span( name=name, start_time=time.time(), attributes=attributes or {} ) self.spans.append(span) return span
def end_span(self, span: Span): span.end_time = time.time() if self.current_span == span: self.current_span = None
def add_event(self, span: Span, name: str, attributes: dict = None): span.events.append({ "name": name, "timestamp": datetime.now().isoformat(), "attributes": attributes or {} })
def export_traces(self) -> str: """Export traces for visualization.""" return json.dumps([{ "name": s.name, "duration_ms": (s.end_time - s.start_time) * 1000, "attributes": s.attributes, "events": s.events } for s in self.spans], indent=2)
def traced(tracer: Tracer): """Decorator for tracing function calls.""" def decorator(func: Callable): @wraps(func) def wrapper(*args, **kwargs): span = tracer.start_span(func.__name__, { "args": str(args)[:200], "kwargs": str(kwargs)[:200] }) try: result = func(*args, **kwargs) tracer.add_event(span, "completed", {"result": str(result)[:100]}) return result except Exception as e: tracer.add_event(span, "error", {"error": str(e)}) raise finally: tracer.end_span(span) return wrapper return decoratorFramework Approach
LangGraph has built-in LangSmith integration:
import os
# Enable tracing (one line)os.environ["LANGCHAIN_TRACING_V2"] = "true"os.environ["LANGCHAIN_API_KEY"] = "your-key"
# All agent executions are automatically tracedresult = agent.invoke({"input": "search for Python tutorials"})
# View traces at smith.langchain.comFeature 4: REACT Loop Implementation
This is the core of most agents. It’s also surprisingly complex to get right.
What REACT Does
The REACT pattern (Reason → Act → Observe) cycles through:
- Reason: LLM thinks about what to do
- Act: LLM selects and executes a tool
- Observe: Tool result feeds back to reasoning
- Repeat until final answer
Building It From Scratch
Here’s my attempt:
import jsonimport refrom typing import List, Callable, Dict, Any
class ReactAgent: """Basic REACT loop implementation."""
def __init__(self, llm, tools: List[Callable], max_steps: int = 5): self.llm = llm self.tools = {t.__name__: t for t in tools} self.max_steps = max_steps
self.system_prompt = """You are a helpful assistant.Use the following format:
Thought: Think about what to doAction: tool_name[args]Observation: (result will be provided)... (repeat Thought/Action/Observation as needed)Final Answer: Your final response
Available tools: {tool_names}"""
def run(self, query: str) -> str: context = f"Question: {query}\n" tool_names = list(self.tools.keys())
for step in range(self.max_steps): # Get LLM response prompt = self.system_prompt.format(tool_names=tool_names) + "\n" + context response = self.llm.generate(prompt)
# Check for final answer if "Final Answer:" in response: return response.split("Final Answer:")[-1].strip()
# Parse action action = self._parse_action(response) if not action: context += f"Thought: {response}\nInvalid action format. Try again.\n" continue
# Execute tool tool_name = action["tool"] if tool_name not in self.tools: context += f"Thought: {response}\nUnknown tool: {tool_name}\n" continue
try: result = self.tools[tool_name](**action["args"]) context += f"Thought: {response}\nObservation: {result}\n" except Exception as e: context += f"Thought: {response}\nObservation: Error: {str(e)}\n"
return "Could not reach final answer within step limit"
def _parse_action(self, response: str) -> Dict[str, Any] | None: """Parse action from LLM response. This is the tricky part.""" # Try multiple patterns patterns = [ r"Action:\s*(\w+)\[(.+)\]", r"Action:\s*(\w+)\((.+)\)", r"Action:\s*(\w+)\s*:\s*(.+)", ]
for pattern in patterns: match = re.search(pattern, response) if match: tool_name = match.group(1) try: args = json.loads(match.group(2)) except json.JSONDecodeError: args = {"input": match.group(2)} return {"tool": tool_name, "args": args}
return NoneThe challenges I ran into:
- LLM output is unpredictable—parsing is fragile
- Error handling for malformed actions
- Token limit management across steps
- Tool argument validation
Framework Approach
LangGraph handles REACT as a pre-built pattern:
from langgraph.prebuilt import create_react_agentfrom langgraph.tools import tool
@tooldef search(query: str) -> str: """Search for information.""" return f"Results for: {query}"
@tooldef calculate(expression: str) -> str: """Calculate a math expression.""" return str(eval(expression))
# Create REACT agent (one function call)agent = create_react_agent( model="gpt-4", tools=[search, calculate])
# Run with automatic REACT loopresult = agent.invoke({"messages": [("user", "What is 15% of 200?")]})The framework handles parsing, tool execution, error recovery, and step limits.
Feature 5: Chat Session Management
My first agent had no memory. Every message started fresh. Users hated it.
What Session Management Does
- Maintain conversation history across turns
- Manage context window limits
- Persist sessions for later resumption
- Handle multi-user conversations
Building It From Scratch
from typing import List, Dictfrom dataclasses import dataclass, fieldfrom datetime import datetimeimport json
@dataclassclass Message: role: str content: str timestamp: datetime = field(default_factory=datetime.now)
@dataclassclass Session: session_id: str messages: List[Message] = field(default_factory=list) metadata: Dict = field(default_factory=dict) created_at: datetime = field(default_factory=datetime.now) last_accessed: datetime = field(default_factory=datetime.now)
class SessionManager: """Manage chat sessions with context window limits."""
def __init__(self, max_tokens: int = 4000, tokenizer=None): self.sessions: Dict[str, Session] = {} self.max_tokens = max_tokens self.tokenizer = tokenizer or self._default_tokenizer
def _default_tokenizer(self, text: str) -> int: """Rough token count approximation.""" return len(text.split()) * 1.3 # Approximation
def create_session(self, session_id: str = None) -> Session: """Create a new session.""" import uuid session_id = session_id or str(uuid.uuid4()) session = Session(session_id=session_id) self.sessions[session_id] = session return session
def add_message(self, session_id: str, role: str, content: str): """Add a message to session history.""" if session_id not in self.sessions: self.create_session(session_id)
session = self.sessions[session_id] session.messages.append(Message(role=role, content=content)) session.last_accessed = datetime.now()
# Trim if over token limit self._trim_context(session)
def get_context(self, session_id: str) -> List[Dict]: """Get conversation context for LLM.""" if session_id not in self.sessions: return []
session = self.sessions[session_id] return [{"role": m.role, "content": m.content} for m in session.messages]
def _trim_context(self, session: Session): """Remove old messages if over token limit.""" while self._count_tokens(session) > self.max_tokens and len(session.messages) > 1: # Remove oldest non-system message for i, msg in enumerate(session.messages): if msg.role != "system": session.messages.pop(i) break
def _count_tokens(self, session: Session) -> int: """Count total tokens in session.""" total = 0 for msg in session.messages: total += self.tokenizer(msg.content) return int(total)Framework Approach
LangGraph has built-in checkpointer for sessions:
from langgraph.checkpoint.memory import MemorySaverfrom langgraph.graph import StateGraph
# Add checkpointer for session persistencecheckpointer = MemorySaver()app = graph.compile(checkpointer=checkpointer)
# Each thread_id is a separate sessionconfig = {"configurable": {"thread_id": "user-123"}}
# First turnapp.invoke({"messages": [("user", "Hello!")]}, config=config)
# Second turn (continues same session)app.invoke({"messages": [("user", "What did I just say?")]}, config=config)# Agent remembers the conversationFeature 6: Retry and Timeout Handling
APIs fail. LLMs timeout. Without retries, your agent is fragile.
What Retry Handling Does
- Automatic retries on transient failures
- Configurable timeout policies
- Exponential backoff
- Circuit breakers for repeated failures
Building It From Scratch
import timeimport randomfrom functools import wrapsfrom typing import Callable, Type, Tuple
def retry( max_attempts: int = 3, exceptions: Tuple[Type[Exception], ...] = (Exception,), base_delay: float = 1.0, max_delay: float = 60.0, exponential_base: float = 2.0): """Retry decorator with exponential backoff."""
def decorator(func: Callable): @wraps(func) def wrapper(*args, **kwargs): last_exception = None
for attempt in range(max_attempts): try: return func(*args, **kwargs) except exceptions as e: last_exception = e
if attempt == max_attempts - 1: raise
# Calculate delay with exponential backoff delay = min( base_delay * (exponential_base ** attempt), max_delay ) # Add jitter delay = delay * (0.5 + random.random())
print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay:.1f}s") time.sleep(delay)
raise last_exception
return wrapper return decorator
def timeout(seconds: float): """Timeout decorator using threading.""" import threading
def decorator(func: Callable): @wraps(func) def wrapper(*args, **kwargs): result = [None] exception = [None]
def worker(): try: result[0] = func(*args, **kwargs) except Exception as e: exception[0] = e
thread = threading.Thread(target=worker) thread.start() thread.join(timeout=seconds)
if thread.is_alive(): raise TimeoutError(f"Function {func.__name__} timed out after {seconds}s")
if exception[0]: raise exception[0]
return result[0]
return wrapper return decorator
# Usage@retry(max_attempts=3, exceptions=(ConnectionError, TimeoutError))@timeout(seconds=30)def call_llm(prompt: str) -> str: """Call LLM with retry and timeout.""" # Your LLM call here passFramework Approach
LangGraph has built-in retry configuration:
from langgraph.graph import StateGraph
# Configure retry at node levelgraph.add_node( "llm_call", llm_node, retry={ "max_attempts": 3, "initial_interval": 1.0, "backoff_factor": 2.0, "retry_on": [ConnectionError, TimeoutError] })
# Or globally for the graphapp = graph.compile( retry={ "max_attempts": 3, "initial_interval": 1.0 })When Should You Use a Framework?
Based on my experience, use a framework when you need:
-
3 or more features from the checklist — The engineering time saved outweighs the learning curve.
-
Production reliability — Frameworks handle edge cases you’ll miss.
-
Team collaboration — Frameworks provide common patterns and abstractions.
-
Observability — Built-in tracing is worth it alone for debugging.
Build from scratch when:
-
Learning — You’ll understand agents better by building from zero.
-
Simple use case — A single LLM call doesn’t need a framework.
-
Maximum control — When framework constraints block your specific needs.
What Reddit Says
I found a helpful discussion on r/LangChain that confirmed my experience:
“Using a framework saves you time, if you need: HITL, Dependency injection to tools, Tracing, REACT Loop, Chat session management.” — Jorgestar29
“The principle behind an Agent is simple, but the engineering details are far more complex than you might imagine.” — MuninnW
“There is so much additional planning you need to do if you want to implement something from scratch. Frameworks remove that extra hassle, just get the dependency right and off you go.” — BarracudaExpensive03
Summary
In this post, I showed you the concrete features agent frameworks provide: human-in-the-loop workflows, dependency injection, tracing, REACT loops, session management, and retry handling.
Building these from scratch takes 85-175 hours of engineering. Frameworks give you all of it with a learning curve of 10-20 hours.
My recommendation: if you need 3 or more features from the checklist, use a framework. The time you save goes into your actual agent logic instead of boilerplate infrastructure.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 LangGraph Documentation
- 👨💻 LangChain Documentation
- 👨💻 Reddit Discussion: Is it better to code raw Python or use a framework for LLM agents?
- 👨💻 ReAct Paper: Synergizing Reasoning and Acting in Language Models
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments