What Agent Frameworks Actually Give You: Features Beyond the Hype

Mar 18, 2026

Problem

I wanted to build an AI agent. I had two choices: use a framework like LangGraph, or write raw Python from scratch. Everyone online had opinions, but no one gave me a concrete list of what frameworks actually provide.

I tried building from scratch first. Three weeks later, I had implemented a buggy REACT loop, a half-working session manager, and zero tracing. That’s when I realized: frameworks don’t just give you convenience—they give you features that take serious engineering to build correctly.

This post shows you the concrete features agent frameworks provide, so you can decide if the learning curve is worth it.

The Feature Checklist

From my research and painful firsthand experience, here’s what frameworks actually give you:

Feature	What It Does	Build From Scratch?
Human-in-the-loop (HITL)	Pause for human approval mid-execution	20-40 hours
Dependency injection for tools	Swap tools at runtime, test easily	10-20 hours
Tracing and observability	Track every decision and tool call	15-30 hours
REACT loop implementation	Standard Reason-Act-Observe pattern	20-40 hours
Chat session management	Persist history, manage context windows	10-25 hours
Retry and timeout handling	Auto-retry on failures, circuit breakers	10-20 hours

Total: 85-175 hours of engineering work you can skip with a framework.

Feature 1: Human-in-the-Loop (HITL)

This was the feature I didn’t know I needed until my agent deleted the wrong files.

What HITL Does

HITL lets your agent pause execution and wait for human input. Use cases:

Approve expensive operations before execution
Collect feedback mid-workflow
Override agent decisions
Debug by inspecting state at breakpoints

Building It From Scratch

Here’s what I had to implement:

import json
import pickle
from pathlib import Path
from typing import Any, Dict, Optional
from enum import Enum

class AgentState(Enum):
    RUNNING = "running"
    WAITING_FOR_INPUT = "waiting_for_input"
    COMPLETED = "completed"
    FAILED = "failed"

class HITLAgent:
    """Basic HITL implementation from scratch."""

    def __init__(self, checkpoint_dir: str = "./checkpoints"):
        self.checkpoint_dir = Path(checkpoint_dir)
        self.checkpoint_dir.mkdir(exist_ok=True)
        self.state = AgentState.RUNNING
        self.context: Dict[str, Any] = {}

    def save_checkpoint(self, checkpoint_id: str, state: Dict[str, Any]):
        """Persist state for later resumption."""
        checkpoint_path = self.checkpoint_dir / f"{checkpoint_id}.pkl"
        with open(checkpoint_path, "wb") as f:
            pickle.dump(state, f)

    def load_checkpoint(self, checkpoint_id: str) -> Optional[Dict[str, Any]]:
        """Load persisted state."""
        checkpoint_path = self.checkpoint_dir / f"{checkpoint_id}.pkl"
        if checkpoint_path.exists():
            with open(checkpoint_path, "rb") as f:
                return pickle.load(f)
        return None

    def request_approval(self, action: str, details: Dict[str, Any]) -> bool:
        """Pause execution and request human approval."""
        self.state = AgentState.WAITING_FOR_INPUT

        # Save checkpoint so we can resume
        self.save_checkpoint("pending_approval", {
            "action": action,
            "details": details,
            "context": self.context
        })

        # In a real system, this would notify the human
        print(f"Action requires approval: {action}")
        print(f"Details: {json.dumps(details, indent=2)}")

        # Wait for human input (simplified)
        response = input("Approve? (y/n): ")
        self.state = AgentState.RUNNING
        return response.lower() == "y"

    def execute_with_approval(self, action: str, details: Dict[str, Any], execute_fn):
        """Execute action only after approval."""
        if self.request_approval(action, details):
            result = execute_fn()
            return {"approved": True, "result": result}
        return {"approved": False, "result": None}

This is a simplified version. A production implementation needs:

State persistence across server restarts
Notification system for pending approvals
Approval workflow management
Timeout handling for approvals
Multi-user approval routing

Framework Approach

With LangGraph, HITL is built-in:

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict

class AgentState(TypedDict):
    messages: list
    pending_action: dict

def human_approval_node(state: AgentState) -> AgentState:
    """Node that pauses for human input."""
    # This automatically creates an interrupt point
    return state

# Build graph with checkpointing
graph = StateGraph(AgentState)
graph.add_node("approval", human_approval_node)

# Checkpointer enables interrupt/resume
checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)

# Interrupt at approval node
result = app.invoke(
    {"messages": ["Delete file?"], "pending_action": {"file": "data.csv"}},
    config={"configurable": {"thread_id": "session-1"}}
)

# Resume after human input
app.update_state(
    config={"configurable": {"thread_id": "session-1"}},
    values={"approved": True}
)
result = app.invoke(None, config={"configurable": {"thread_id": "session-1"}})

The framework handles state persistence, interrupt/resume, and session management automatically.

Feature 2: Dependency Injection for Tools

I used to hardcode database connections in my tools. Then I tried to write tests and everything broke.

What Dependency Injection Does

Inject databases, APIs, configurations at runtime
Swap real implementations for mocks during testing
Configure different tools for different environments

Building It From Scratch

from typing import Callable, Dict, Any, TypeVar, Generic
from dataclasses import dataclass
from functools import wraps

T = TypeVar('T')

@dataclass
class Tool:
    name: str
    func: Callable
    dependencies: Dict[str, type]

class DependencyContainer:
    """Simple dependency injection container."""

    def __init__(self):
        self._services: Dict[type, Any] = {}
        self._factories: Dict[type, Callable] = {}

    def register_instance(self, interface: type, instance: Any):
        """Register a specific instance."""
        self._services[interface] = instance

    def register_factory(self, interface: type, factory: Callable):
        """Register a factory function."""
        self._factories[interface] = factory

    def resolve(self, interface: type) -> Any:
        """Resolve a dependency."""
        if interface in self._services:
            return self._services[interface]
        if interface in self._factories:
            instance = self._factories[interface]()
            self._services[interface] = instance
            return instance
        raise ValueError(f"No registration for {interface}")

def tool_with_dependencies(container: DependencyContainer):
    """Decorator for injecting dependencies into tools."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Inject dependencies based on annotations
            annotations = func.__annotations__
            for param_name, param_type in annotations.items():
                if param_name not in kwargs and param_type in container._services:
                    kwargs[param_name] = container.resolve(param_type)
            return func(*args, **kwargs)
        return wrapper
    return decorator

Framework Approach

LangGraph handles this through configuration:

from langgraph.tools import tool
from typing import Annotated

# Define tool with injectable dependency
@tool
def query_database(
    query: str,
    db: Annotated[Database, "injected"]
) -> str:
    """Execute a database query."""
    return db.execute(query)

# Configure at runtime
config = {
    "configurable": {
        "db": ProductionDatabase()
    }
}

# Or for testing
test_config = {
    "configurable": {
        "db": MockDatabase()
    }
}

# Run with different configurations
result = agent.invoke({"input": "query"}, config=config)

Feature 3: Tracing and Observability

When my agent made a wrong decision, I had no idea why. Tracing fixes this.

What Tracing Does

Track every tool call and LLM request
Debug agent reasoning chains
Monitor token usage and costs
Visualize execution flow

Building It From Scratch

import time
import json
from typing import Any, Callable
from functools import wraps
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Span:
    name: str
    start_time: float
    end_time: float = 0.0
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

class Tracer:
    """Simple tracing implementation."""

    def __init__(self):
        self.spans: list[Span] = []
        self.current_span: Span | None = None

    def start_span(self, name: str, attributes: dict = None) -> Span:
        span = Span(
            name=name,
            start_time=time.time(),
            attributes=attributes or {}
        )
        self.spans.append(span)
        return span

    def end_span(self, span: Span):
        span.end_time = time.time()
        if self.current_span == span:
            self.current_span = None

    def add_event(self, span: Span, name: str, attributes: dict = None):
        span.events.append({
            "name": name,
            "timestamp": datetime.now().isoformat(),
            "attributes": attributes or {}
        })

    def export_traces(self) -> str:
        """Export traces for visualization."""
        return json.dumps([{
            "name": s.name,
            "duration_ms": (s.end_time - s.start_time) * 1000,
            "attributes": s.attributes,
            "events": s.events
        } for s in self.spans], indent=2)

def traced(tracer: Tracer):
    """Decorator for tracing function calls."""
    def decorator(func: Callable):
        @wraps(func)
        def wrapper(*args, **kwargs):
            span = tracer.start_span(func.__name__, {
                "args": str(args)[:200],
                "kwargs": str(kwargs)[:200]
            })
            try:
                result = func(*args, **kwargs)
                tracer.add_event(span, "completed", {"result": str(result)[:100]})
                return result
            except Exception as e:
                tracer.add_event(span, "error", {"error": str(e)})
                raise
            finally:
                tracer.end_span(span)
        return wrapper
    return decorator

Framework Approach

LangGraph has built-in LangSmith integration:

import os

# Enable tracing (one line)
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-key"

# All agent executions are automatically traced
result = agent.invoke({"input": "search for Python tutorials"})

# View traces at smith.langchain.com

Feature 4: REACT Loop Implementation

This is the core of most agents. It’s also surprisingly complex to get right.

What REACT Does

The REACT pattern (Reason → Act → Observe) cycles through:

Reason: LLM thinks about what to do
Act: LLM selects and executes a tool
Observe: Tool result feeds back to reasoning
Repeat until final answer

Building It From Scratch

Here’s my attempt:

import json
import re
from typing import List, Callable, Dict, Any

class ReactAgent:
    """Basic REACT loop implementation."""

    def __init__(self, llm, tools: List[Callable], max_steps: int = 5):
        self.llm = llm
        self.tools = {t.__name__: t for t in tools}
        self.max_steps = max_steps

        self.system_prompt = """You are a helpful assistant.
Use the following format:

Thought: Think about what to do
Action: tool_name[args]
Observation: (result will be provided)
... (repeat Thought/Action/Observation as needed)
Final Answer: Your final response

Available tools: {tool_names}"""

    def run(self, query: str) -> str:
        context = f"Question: {query}\n"
        tool_names = list(self.tools.keys())

        for step in range(self.max_steps):
            # Get LLM response
            prompt = self.system_prompt.format(tool_names=tool_names) + "\n" + context
            response = self.llm.generate(prompt)

            # Check for final answer
            if "Final Answer:" in response:
                return response.split("Final Answer:")[-1].strip()

            # Parse action
            action = self._parse_action(response)
            if not action:
                context += f"Thought: {response}\nInvalid action format. Try again.\n"
                continue

            # Execute tool
            tool_name = action["tool"]
            if tool_name not in self.tools:
                context += f"Thought: {response}\nUnknown tool: {tool_name}\n"
                continue

            try:
                result = self.tools[tool_name](**action["args"])
                context += f"Thought: {response}\nObservation: {result}\n"
            except Exception as e:
                context += f"Thought: {response}\nObservation: Error: {str(e)}\n"

        return "Could not reach final answer within step limit"

    def _parse_action(self, response: str) -> Dict[str, Any] | None:
        """Parse action from LLM response. This is the tricky part."""
        # Try multiple patterns
        patterns = [
            r"Action:\s*(\w+)\[(.+)\]",
            r"Action:\s*(\w+)\((.+)\)",
            r"Action:\s*(\w+)\s*:\s*(.+)",
        ]

        for pattern in patterns:
            match = re.search(pattern, response)
            if match:
                tool_name = match.group(1)
                try:
                    args = json.loads(match.group(2))
                except json.JSONDecodeError:
                    args = {"input": match.group(2)}
                return {"tool": tool_name, "args": args}

        return None

The challenges I ran into:

LLM output is unpredictable—parsing is fragile
Error handling for malformed actions
Token limit management across steps
Tool argument validation

Framework Approach

LangGraph handles REACT as a pre-built pattern:

from langgraph.prebuilt import create_react_agent
from langgraph.tools import tool

@tool
def search(query: str) -> str:
    """Search for information."""
    return f"Results for: {query}"

@tool
def calculate(expression: str) -> str:
    """Calculate a math expression."""
    return str(eval(expression))

# Create REACT agent (one function call)
agent = create_react_agent(
    model="gpt-4",
    tools=[search, calculate]
)

# Run with automatic REACT loop
result = agent.invoke({"messages": [("user", "What is 15% of 200?")]})

The framework handles parsing, tool execution, error recovery, and step limits.

Feature 5: Chat Session Management

My first agent had no memory. Every message started fresh. Users hated it.

What Session Management Does

Maintain conversation history across turns
Manage context window limits
Persist sessions for later resumption
Handle multi-user conversations

Building It From Scratch

from typing import List, Dict
from dataclasses import dataclass, field
from datetime import datetime
import json

@dataclass
class Message:
    role: str
    content: str
    timestamp: datetime = field(default_factory=datetime.now)

@dataclass
class Session:
    session_id: str
    messages: List[Message] = field(default_factory=list)
    metadata: Dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=datetime.now)
    last_accessed: datetime = field(default_factory=datetime.now)

class SessionManager:
    """Manage chat sessions with context window limits."""

    def __init__(self, max_tokens: int = 4000, tokenizer=None):
        self.sessions: Dict[str, Session] = {}
        self.max_tokens = max_tokens
        self.tokenizer = tokenizer or self._default_tokenizer

    def _default_tokenizer(self, text: str) -> int:
        """Rough token count approximation."""
        return len(text.split()) * 1.3  # Approximation

    def create_session(self, session_id: str = None) -> Session:
        """Create a new session."""
        import uuid
        session_id = session_id or str(uuid.uuid4())
        session = Session(session_id=session_id)
        self.sessions[session_id] = session
        return session

    def add_message(self, session_id: str, role: str, content: str):
        """Add a message to session history."""
        if session_id not in self.sessions:
            self.create_session(session_id)

        session = self.sessions[session_id]
        session.messages.append(Message(role=role, content=content))
        session.last_accessed = datetime.now()

        # Trim if over token limit
        self._trim_context(session)

    def get_context(self, session_id: str) -> List[Dict]:
        """Get conversation context for LLM."""
        if session_id not in self.sessions:
            return []

        session = self.sessions[session_id]
        return [{"role": m.role, "content": m.content} for m in session.messages]

    def _trim_context(self, session: Session):
        """Remove old messages if over token limit."""
        while self._count_tokens(session) > self.max_tokens and len(session.messages) > 1:
            # Remove oldest non-system message
            for i, msg in enumerate(session.messages):
                if msg.role != "system":
                    session.messages.pop(i)
                    break

    def _count_tokens(self, session: Session) -> int:
        """Count total tokens in session."""
        total = 0
        for msg in session.messages:
            total += self.tokenizer(msg.content)
        return int(total)

Framework Approach

LangGraph has built-in checkpointer for sessions:

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph

# Add checkpointer for session persistence
checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)

# Each thread_id is a separate session
config = {"configurable": {"thread_id": "user-123"}}

# First turn
app.invoke({"messages": [("user", "Hello!")]}, config=config)

# Second turn (continues same session)
app.invoke({"messages": [("user", "What did I just say?")]}, config=config)
# Agent remembers the conversation

Feature 6: Retry and Timeout Handling

APIs fail. LLMs timeout. Without retries, your agent is fragile.

What Retry Handling Does

Automatic retries on transient failures
Configurable timeout policies
Exponential backoff
Circuit breakers for repeated failures

Building It From Scratch

import time
import random
from functools import wraps
from typing import Callable, Type, Tuple

def retry(
    max_attempts: int = 3,
    exceptions: Tuple[Type[Exception], ...] = (Exception,),
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    exponential_base: float = 2.0
):
    """Retry decorator with exponential backoff."""

    def decorator(func: Callable):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None

            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    last_exception = e

                    if attempt == max_attempts - 1:
                        raise

                    # Calculate delay with exponential backoff
                    delay = min(
                        base_delay * (exponential_base ** attempt),
                        max_delay
                    )
                    # Add jitter
                    delay = delay * (0.5 + random.random())

                    print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay:.1f}s")
                    time.sleep(delay)

            raise last_exception

        return wrapper
    return decorator

def timeout(seconds: float):
    """Timeout decorator using threading."""
    import threading

    def decorator(func: Callable):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = [None]
            exception = [None]

            def worker():
                try:
                    result[0] = func(*args, **kwargs)
                except Exception as e:
                    exception[0] = e

            thread = threading.Thread(target=worker)
            thread.start()
            thread.join(timeout=seconds)

            if thread.is_alive():
                raise TimeoutError(f"Function {func.__name__} timed out after {seconds}s")

            if exception[0]:
                raise exception[0]

            return result[0]

        return wrapper
    return decorator

# Usage
@retry(max_attempts=3, exceptions=(ConnectionError, TimeoutError))
@timeout(seconds=30)
def call_llm(prompt: str) -> str:
    """Call LLM with retry and timeout."""
    # Your LLM call here
    pass

Framework Approach

LangGraph has built-in retry configuration:

from langgraph.graph import StateGraph

# Configure retry at node level
graph.add_node(
    "llm_call",
    llm_node,
    retry={
        "max_attempts": 3,
        "initial_interval": 1.0,
        "backoff_factor": 2.0,
        "retry_on": [ConnectionError, TimeoutError]
    }
)

# Or globally for the graph
app = graph.compile(
    retry={
        "max_attempts": 3,
        "initial_interval": 1.0
    }
)

When Should You Use a Framework?

Based on my experience, use a framework when you need:

3 or more features from the checklist — The engineering time saved outweighs the learning curve.
Production reliability — Frameworks handle edge cases you’ll miss.
Team collaboration — Frameworks provide common patterns and abstractions.
Observability — Built-in tracing is worth it alone for debugging.

Build from scratch when:

Learning — You’ll understand agents better by building from zero.
Simple use case — A single LLM call doesn’t need a framework.
Maximum control — When framework constraints block your specific needs.

What Reddit Says

I found a helpful discussion on r/LangChain that confirmed my experience:

“Using a framework saves you time, if you need: HITL, Dependency injection to tools, Tracing, REACT Loop, Chat session management.” — Jorgestar29

“The principle behind an Agent is simple, but the engineering details are far more complex than you might imagine.” — MuninnW

“There is so much additional planning you need to do if you want to implement something from scratch. Frameworks remove that extra hassle, just get the dependency right and off you go.” — BarracudaExpensive03

Summary

In this post, I showed you the concrete features agent frameworks provide: human-in-the-loop workflows, dependency injection, tracing, REACT loops, session management, and retry handling.

Building these from scratch takes 85-175 hours of engineering. Frameworks give you all of it with a learning curve of 10-20 hours.

My recommendation: if you need 3 or more features from the checklist, use a framework. The time you save goes into your actual agent logic instead of boilerplate infrastructure.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 LangGraph Documentation
👨‍💻 LangChain Documentation
👨‍💻 Reddit Discussion: Is it better to code raw Python or use a framework for LLM agents?
👨‍💻 ReAct Paper: Synergizing Reasoning and Acting in Language Models

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

What Agent Frameworks Actually Give You: Features Beyond the Hype

Problem

The Feature Checklist

Feature 1: Human-in-the-Loop (HITL)

What HITL Does

Building It From Scratch

Framework Approach

Feature 2: Dependency Injection for Tools

What Dependency Injection Does

Building It From Scratch

Framework Approach

Feature 3: Tracing and Observability

What Tracing Does

Building It From Scratch

Framework Approach

Feature 4: REACT Loop Implementation

What REACT Does

Building It From Scratch

Framework Approach

Feature 5: Chat Session Management

What Session Management Does

Building It From Scratch

Framework Approach

Feature 6: Retry and Timeout Handling

What Retry Handling Does

Building It From Scratch

Framework Approach

When Should You Use a Framework?

What Reddit Says

Summary

Final Words + More Resources

Comments