Function Calling in AI Agents: Complete Implementation Guide with OpenAI and LangChain

I’ve seen too many developers hit the same wall when building AI agents. They expect the AI to actually do things - query databases, call APIs, execute code - only to discover that large language models are just text predictors. They can decide what to do, but they can’t actually do it.

This realization hits hard: “The AI does not actually do the work. It simply chooses which function to run. Your Python function does the work. If your function is buggy, the best AI in the world cannot save you.”

Function calling bridges this gap. It gives AI agents the ability to interact with external systems by letting the model decide which functions to call, while your code handles the actual execution. The implementation pattern is simple in concept but tricky in practice - you need type-safe schemas, robust error handling, and proper execution orchestration.

What Function Calling Actually Does

The Separation of Concerns

The key insight that took me too long to grasp: the AI model is a decision engine, not an execution engine. When you give an AI agent tools, you’re not giving it the ability to run code. You’re giving it a menu of options it can choose from, with structured schemas that tell it what parameters each option expects.

Here’s the actual flow:

  1. You define function schemas with names, descriptions, and parameter types
  2. The AI model analyzes user requests and decides which function to call
  3. Your code executes the function with the parameters the model chose
  4. You return results back to the model, which incorporates them into its response

This separation is powerful. The model handles the reasoning - figuring out what the user wants and which tool addresses it - while your code handles the actual work with proper validation, error handling, and system integration.
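
The four steps can be sketched without calling any API. In this minimal sketch, fake_model_decision is a hypothetical stand-in for the model (a real model would return the tool name and JSON arguments itself), and get_order_status is an illustrative tool, not part of any library:

```python
import json

# Step 1: a tool schema the model would see (JSON Schema for the parameters).
TOOL_SCHEMAS = [{
    "name": "get_order_status",
    "description": "Return the shipping status of an order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}]


def get_order_status(order_id: str) -> dict:
    """The real work happens here, in ordinary Python."""
    return {"order_id": order_id, "status": "shipped"}


TOOLS = {"get_order_status": get_order_status}


def fake_model_decision(user_request: str) -> dict:
    """Hypothetical stand-in for step 2: the model only names a tool and its arguments."""
    return {"name": "get_order_status", "arguments": json.dumps({"order_id": "A-100"})}


def handle(user_request: str) -> str:
    decision = fake_model_decision(user_request)  # step 2: the model chooses
    args = json.loads(decision["arguments"])
    result = TOOLS[decision["name"]](**args)      # step 3: your code executes
    return json.dumps(result)                     # step 4: string sent back to the model


print(handle("Where is my order A-100?"))
```

Nothing in handle depends on the model being able to run code: it only needs a tool name and a JSON string of arguments.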

Core Components

Every function calling implementation needs three pieces:

Function Definitions: Schemas that describe each tool - its name, what it does, and what parameters it accepts. These can be JSON Schema objects or Pydantic models.

Execution Layer: Your code that actually runs the functions. This is where you implement the business logic, make API calls, query databases, or perform any other operations.

Result Integration: The mechanism for feeding tool results back into the conversation so the model can use them in its response.

Defining Tools with Type Safety

OpenAI with Pydantic

I strongly prefer using Pydantic models over raw JSON Schema. Pydantic gives you automatic schema generation, runtime validation, and IDE support - all essential for production code.

# title: "Pydantic Tool Schema Definition"
from enum import Enum
from typing import List

from pydantic import BaseModel, Field


class Table(str, Enum):
    orders = "orders"
    customers = "customers"
    products = "products"


class Condition(BaseModel):
    column: str = Field(description="Column name to filter on")
    operator: str = Field(description="Comparison operator: =, >, <, LIKE")
    value: str = Field(description="Value to compare against")


class Query(BaseModel):
    """Execute a SQL query on the database."""
    table_name: Table = Field(description="Table to query")
    columns: List[str] = Field(description="Columns to select")
    conditions: List[Condition] = Field(
        default_factory=list,
        description="Filter conditions"
    )
    limit: int = Field(default=100, description="Maximum rows to return")

The Pydantic approach gives you several advantages:

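  • Automatic JSON schema generation - OpenAI’s SDK can convert these models into tool definitions for you (for example with its pydantic_function_tool helper)
  • Runtime validation - Invalid arguments are caught before your function runs, as long as you validate the model’s arguments against the Pydantic model first
  • Type hints - Your IDE can autocomplete and type-check everything
  • Strict mode - When enabled, the model’s arguments are constrained to match the schema; whether the values make sense is still your code’s job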
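
To make the first two points concrete, here is a small sketch using Pydantic v2 directly, with no OpenAI call involved. The ProductSearch model and its fields are made up for illustration:

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class ProductSearch(BaseModel):
    """Search the product catalog."""
    query: str = Field(description="Search terms")
    limit: int = Field(default=10, ge=1, le=100, description="Maximum results")
    tags: List[str] = Field(default_factory=list, description="Tags to filter by")


# The JSON Schema that would describe this tool's parameters to the model.
schema = ProductSearch.model_json_schema()
print(sorted(schema["properties"]))
print(schema["required"])

# Model-generated arguments arrive as a JSON string; validate before executing.
good = ProductSearch.model_validate_json('{"query": "laptop", "limit": 5}')
print(good.limit)

try:
    ProductSearch.model_validate_json('{"query": "laptop", "limit": 5000}')
except ValidationError as exc:
    # The error text can be returned to the model so it can correct the call.
    print(exc.error_count(), "validation error")
```

The second call fails because limit is above the declared maximum, so the bad value never reaches your function.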

LangChain Tool Decorators

LangChain provides a simpler decorator-based approach for basic tools:

# title: "LangChain Tool Definition"
from typing import List

from langchain.tools import tool


@tool
def get_weather(city: str) -> str:
    """Get the current weather in a city.

    Args:
        city: Name of the city to get weather for

    Returns:
        Weather description string
    """
    # Your weather API implementation here
    return f"Weather in {city}: 72F, sunny"


@tool
def search_products(query: str, category: str = "all") -> List[dict]:
    """Search for products in the catalog.

    Args:
        query: Search terms
        category: Product category to filter (default: all)
    """
    # Your search implementation here
    return [{"name": "Product 1", "price": 29.99}]

For tools with execution constraints, LangChain lets you specify who can call them:

# title: "LangChain Tool with Caller Restrictions"
@tool(extras={"allowed_callers": ["code_execution"]})
def delete_files(pattern: str) -> str:
    """Delete files matching a pattern. Restricted to code execution context."""
    # This tool can only be called by code execution agents
    return f"Deleted files matching {pattern}"

Converting to OpenAI Tool Format

To pass plain Python functions to OpenAI’s API, you first need to convert them into the tool schema format the API expects:

# title: "Schema Conversion Function"
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON Schema types
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_schema(func) -> dict:
    """Convert a Python function to OpenAI tool schema."""
    sig = inspect.signature(func)
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in sig.parameters.items():
        param_type = hints.get(name, str)
        # Simplified: any annotation not in JSON_TYPES is described as a string
        properties[name] = {"type": JSON_TYPES.get(param_type, "string")}
        # Parameters without a default value are required
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": func.__doc__ or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required
            }
        }
    }

Implementing the Execution Layer

Basic Execution Pattern

The execution layer is where the rubber meets the road. This is your code that actually runs the functions the model requests:

# title: "Basic Tool Execution"
import json
from typing import Callable, Dict, Any


def execute_tool_call(
    tool_call: Any,
    tools_map: Dict[str, Callable]
) -> Any:
    """Execute a single tool call from the model."""
    name = tool_call.function.name
    if name not in tools_map:
        return f"Error: Unknown tool '{name}'"
    try:
        # Parse inside the try block so malformed JSON is reported, not raised
        args = json.loads(tool_call.function.arguments)
        return tools_map[name](**args)
    except Exception as e:
        return f"Error executing {name}: {str(e)}"

Complete Agent Loop

The agent loop ties everything together - it keeps calling the model, executing tools, and feeding results back until the model produces a final response:

# title: "Complete Agent Loop"
from openai import OpenAI

client = OpenAI()


def run_agent(
    messages: list,
    tools: list,
    model: str = "gpt-4o",
    max_turns: int = 10
) -> str:
    """Run an agent with tool calling capabilities."""
    tools_map = {t.__name__: t for t in tools}
    tool_schemas = [function_to_schema(t) for t in tools]
    # Cap the number of model round trips so a confused model cannot loop forever
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tool_schemas,
            tool_choice="auto"
        )
        message = response.choices[0].message
        messages.append(message)
        # No tool calls means we're done
        if not message.tool_calls:
            return message.content
        # Execute each tool call
        for tool_call in message.tool_calls:
            result = execute_tool_call(tool_call, tools_map)
            # Append tool result to conversation
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result)
            })
    raise RuntimeError(f"Agent did not finish within {max_turns} turns")

LangGraph Functional Approach

For more complex workflows, LangGraph provides a functional API with built-in state management and streaming:

# title: "LangGraph Functional Agent"
from langgraph.func import task, entrypoint


@task
async def fetch_user_data(user_id: str) -> dict:
    """Fetch user data from database."""
    # Database query here
    return {"id": user_id, "name": "John"}


@task
async def check_permissions(user_id: str, action: str) -> bool:
    """Check if user can perform action."""
    # Permission check here
    return True


# entrypoint is instantiated with parentheses, and the decorated function
# takes a single input, so multiple values are passed as one dict
@entrypoint()
async def agent_workflow(inputs: dict):
    """Main agent workflow with task orchestration."""
    user_id = inputs["user_id"]
    user = await fetch_user_data(user_id)
    allowed = await check_permissions(user_id, "query")
    if not allowed:
        return "Permission denied"
    # Continue with agent logic...

# Usage: await agent_workflow.ainvoke({"query": "...", "user_id": "..."})

Error Handling and Resilience

Why Error Handling Is Critical

Here’s the uncomfortable truth: models will fail, APIs will time out, and edge cases will break your code. I’ve learned this the hard way - production function calling needs robust error handling at every layer.

The model doesn’t know your code has bugs. It doesn’t know the API is down. It just knows it called a function and got an error message back. Your error handling determines whether the agent recovers gracefully or crashes spectacularly.
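
One way to make that error message useful is to return failures in the same shape as successes, so the model always receives something it can read. The sketch below uses only the standard library; run_tool, the divide tool, and the ok/error field names are illustrative choices, not a fixed convention:

```python
import json


def divide(a: float, b: float) -> float:
    return a / b


TOOLS = {"divide": divide}


def run_tool(name: str, raw_arguments: str) -> str:
    """Run a tool and always return a JSON string the model can read."""
    if name not in TOOLS:
        return json.dumps({"ok": False, "error": f"unknown tool '{name}'",
                           "available_tools": sorted(TOOLS)})
    try:
        args = json.loads(raw_arguments)
        return json.dumps({"ok": True, "result": TOOLS[name](**args)})
    except json.JSONDecodeError as exc:
        return json.dumps({"ok": False,
                           "error": f"arguments are not valid JSON: {exc.msg}"})
    except Exception as exc:  # report the failure instead of crashing the agent loop
        return json.dumps({"ok": False, "error": str(exc),
                           "error_type": type(exc).__name__})


print(run_tool("divide", '{"a": 6, "b": 3}'))
print(run_tool("divide", '{"a": 6, "b": 0}'))
print(run_tool("multiply", "{}"))
```

Because the error type and message travel back as the tool result, the model can retry with different arguments or explain the failure to the user.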

Retry Patterns with Tenacity

For transient failures - network timeouts, rate limits, temporary service issues - you need automatic retries:

# title: "Retry Pattern with Tenacity"
import requests
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type
)


@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    # requests raises its own ConnectionError and Timeout classes, which do not
    # inherit from the built-in ConnectionError, so name them explicitly
    retry=retry_if_exception_type((requests.ConnectionError, requests.Timeout))
)
def call_external_api(endpoint: str, data: dict) -> dict:
    """Call external API with automatic retry on connection errors."""
    response = requests.post(endpoint, json=data, timeout=30)
    response.raise_for_status()
    return response.json()

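This pattern waits between 4 and 10 seconds before each retry and stops after 3 attempts, preventing infinite retry loops. With only 3 attempts there are at most two waits, and with multiplier=1 the exponential term starts small enough that both are raised to the 4-second minimum; the 10-second cap only comes into play if you allow more attempts.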
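
To see how the min and max settings shape the schedule, the arithmetic can be sketched in plain Python. This is not tenacity's code: backoff_waits is a hypothetical helper, and it assumes the raw wait after attempt n is multiplier * 2**(n - 1), clamped to the [min, max] range:

```python
def backoff_waits(attempts: int, multiplier: float = 1, min_wait: float = 4,
                  max_wait: float = 10) -> list:
    """Waits between attempts, assuming wait n = multiplier * 2**(n - 1), clamped."""
    waits = []
    for attempt in range(1, attempts):  # no wait after the final attempt
        raw = multiplier * 2 ** (attempt - 1)
        waits.append(max(min_wait, min(raw, max_wait)))
    return waits


print(backoff_waits(3))  # three attempts -> two waits
print(backoff_waits(6))  # six attempts -> five waits
```

Under that assumption, the floor dominates the early waits and the cap only matters once the raw value passes 10 seconds, so raising the attempt count is what makes the exponential growth visible.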

Comprehensive Validation

Never trust model-generated input. Validate everything before execution:

# title: "Safe Tool Execution with Validation"
import json
from typing import Any, Callable, Dict, Type

from pydantic import BaseModel, ValidationError


def safe_execute(
    tool_call: Any,
    tools_map: Dict[str, Callable],
    schemas: Dict[str, Type[BaseModel]]
) -> Any:
    """Execute tool call with comprehensive validation."""
    name = tool_call.function.name
    # 1. Check tool exists
    if name not in tools_map:
        return {
            "error": f"Unknown tool: {name}",
            "available_tools": list(tools_map.keys())
        }
    # 2. Parse and validate arguments
    try:
        args = json.loads(tool_call.function.arguments)
    except json.JSONDecodeError as e:
        return {"error": f"Invalid JSON in arguments: {e}"}
    # 3. Validate against schema if available
    if name in schemas:
        try:
            validated = schemas[name](**args)
            args = validated.model_dump()
        except ValidationError as e:
            return {"error": f"Validation failed: {e}"}
    # 4. Execute with error capture
    try:
        result = tools_map[name](**args)
        return {"success": True, "result": result}
    except Exception as e:
        return {
            "error": f"Execution failed: {str(e)}",
            "error_type": type(e).__name__
        }

Middleware for Logging and Monitoring

Production systems need visibility into tool execution. A middleware pattern gives you cross-cutting logging:

# title: "Tool Execution Middleware"
import time
import logging
from functools import wraps

logger = logging.getLogger(__name__)


def log_tool_execution(func):
    """Decorator to log all tool executions."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        tool_name = func.__name__
        logger.info(f"Tool call started: {tool_name}")
        logger.debug(f"Arguments: {kwargs}")
        try:
            result = func(*args, **kwargs)
            duration = time.time() - start_time
            logger.info(
                f"Tool call completed: {tool_name} "
                f"(took {duration:.2f}s)"
            )
            return result
        except Exception as e:
            duration = time.time() - start_time
            logger.error(
                f"Tool call failed: {tool_name} "
                f"after {duration:.2f}s - {str(e)}"
            )
            raise
    return wrapper


# Usage
@log_tool_execution
def get_user(user_id: str) -> dict:
    # Your implementation
    pass

Best Practices and Common Pitfalls

Do’s

Use Pydantic for type safety and validation. The automatic schema generation and runtime validation catch errors early. Your future self will thank you.

Implement comprehensive error handling. Every tool call can fail. Plan for it. Return helpful error messages that the model can use to explain what went wrong.

Add retry logic for transient failures. Network timeouts and rate limits are facts of life. Let your code handle them automatically.

Log all tool calls for debugging. When something goes wrong (and it will), you need to know exactly what the model requested and what your code returned.

Write clear function descriptions. The model relies on your descriptions to understand when and how to use each tool. Be specific about what the tool does and what parameters it expects.

Don’ts

Don’t assume the model will always call the right tool. Sometimes it will hallucinate parameters or choose a tool that doesn’t fit. Your validation layer catches these mistakes.

Don’t skip input validation. The model can generate any JSON it wants. Your Pydantic schemas are the last line of defense.

Don’t ignore timeout handling. External APIs and databases can hang. Set reasonable timeouts and fail gracefully.

Don’t hardcode sensitive values in tool definitions. API keys, passwords, and tokens should come from environment variables, not your function schemas.
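
A minimal sketch of that last point: the secret is read inside the tool at call time, so it never appears in the schema or description the model sees. The variable name WEATHER_API_KEY and the get_api_key helper are hypothetical:

```python
import os


def get_api_key(name: str = "WEATHER_API_KEY") -> str:
    """Read a secret from the environment and fail loudly if it is missing."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Environment variable {name} is not set")
    return key


def get_weather(city: str) -> str:
    """Get the current weather in a city."""
    api_key = get_api_key()  # looked up at call time, never shown to the model
    # A real implementation would send api_key to the weather service here.
    return f"Weather in {city}: 72F, sunny"
```

The tool's signature still only exposes city, which is all the model needs to know about.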

Performance Considerations

Parallelize independent tool calls. If the model requests multiple tools that don’t depend on each other, run them concurrently:

# title: "Parallel Tool Execution"
import asyncio


async def execute_tools_parallel(
    tool_calls: list,
    tools_map: dict
) -> list:
    """Execute multiple independent tool calls in parallel."""
    # Each blocking call runs in a worker thread; results keep the input order
    tasks = [
        asyncio.to_thread(
            execute_tool_call,
            call,
            tools_map
        )
        for call in tool_calls
    ]
    return await asyncio.gather(*tasks)
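
Since asyncio.gather returns results in the same order as its inputs, each result can be matched back to the call that produced it. Here is a self-contained sketch of that pairing; slow_lookup and the call dictionaries are illustrative stand-ins for real tools and tool calls:

```python
import asyncio
import time


def slow_lookup(name: str, delay: float) -> str:
    time.sleep(delay)  # stands in for a blocking API or database call
    return f"{name} done"


async def run_calls(calls: list) -> dict:
    """Run blocking calls in worker threads and map results back to their ids."""
    tasks = [asyncio.to_thread(slow_lookup, c["name"], c["delay"]) for c in calls]
    # return_exceptions=True returns a failure as a value instead of raising,
    # so the results of the other calls are not lost
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return {c["id"]: r for c, r in zip(calls, results)}


calls = [
    {"id": "call_1", "name": "orders", "delay": 0.2},
    {"id": "call_2", "name": "inventory", "delay": 0.1},
]
start = time.perf_counter()
results = asyncio.run(run_calls(calls))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

The elapsed time should be close to the slowest call rather than the sum of both, and the id-to-result mapping is what you need when appending tool messages with the matching tool_call_id.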

Cache frequently used tool results. If the same query runs repeatedly, cache the results with a reasonable TTL.
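
A small in-memory TTL cache can be written as a decorator. This is a sketch with the standard library only: ttl_cache is a hypothetical helper, it assumes all arguments are hashable, and it never evicts old entries, so a production version would need a size limit:

```python
import time
from functools import wraps


def ttl_cache(ttl_seconds: float):
    """Cache results per argument tuple for ttl_seconds (hashable arguments only)."""
    def decorator(func):
        store = {}

        @wraps(func)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            now = time.monotonic()
            hit = store.get(key)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # still fresh, skip the real call
            value = func(*args, **kwargs)
            store[key] = (now, value)
            return value
        return wrapper
    return decorator


lookup_count = {"n": 0}


@ttl_cache(ttl_seconds=60)
def get_exchange_rate(currency: str) -> float:
    lookup_count["n"] += 1  # counts how often the real lookup runs
    return 1.08


get_exchange_rate("EUR")
get_exchange_rate("EUR")
print(lookup_count["n"])
```

The second call is served from the cache, so the underlying lookup runs only once within the TTL window.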

Set reasonable timeouts. Each tool should have a maximum execution time. The agent loop should have an overall timeout.
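
One simple way to enforce a per-tool limit for blocking functions is to run them in a worker thread and stop waiting after a deadline. The sketch below uses concurrent.futures; run_with_timeout is a hypothetical helper, and note that the worker thread is not killed, the caller just stops waiting for it:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=4)


def run_with_timeout(func, timeout_seconds: float, *args, **kwargs) -> dict:
    """Run func in a worker thread and give up waiting after timeout_seconds."""
    future = _executor.submit(func, *args, **kwargs)
    try:
        return {"success": True, "result": future.result(timeout=timeout_seconds)}
    except FutureTimeout:
        # The worker thread keeps running; we only stop waiting for it.
        return {"error": f"Timed out after {timeout_seconds}s"}
    except Exception as exc:
        return {"error": str(exc), "error_type": type(exc).__name__}


def slow_tool() -> str:
    time.sleep(0.5)
    return "finished"


print(run_with_timeout(slow_tool, 0.05))
print(run_with_timeout(lambda: "fast", 1.0))
```

For tools that call external services, passing a timeout to the underlying client (as the requests example does with timeout=30) is still the better first line of defense, because it actually ends the work.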

Monitor token usage from tool definitions. Large schemas consume tokens. Keep descriptions concise but informative.

Real-World Example: Customer Service Agent

Let me show you a complete implementation for a customer service agent with order lookup and refund capabilities:

# title: "Customer Service Agent Implementation"
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field


class OrderStatus(str, Enum):
    pending = "pending"
    shipped = "shipped"
    delivered = "delivered"
    cancelled = "cancelled"


class LookupOrder(BaseModel):
    """Look up an order by order ID or customer email."""
    order_id: Optional[str] = Field(
        None,
        description="Order ID to look up"
    )
    customer_email: Optional[str] = Field(
        None,
        description="Customer email for order lookup"
    )


class ProcessRefund(BaseModel):
    """Process a refund for an order."""
    order_id: str = Field(description="Order ID to refund")
    reason: str = Field(description="Reason for refund")
    amount: Optional[float] = Field(
        None,
        description="Partial refund amount (null for full refund)"
    )


# Tool implementations
def lookup_order(
    order_id: Optional[str] = None,
    customer_email: Optional[str] = None
) -> dict:
    """Look up order from database."""
    # Database query here
    if order_id:
        return {
            "order_id": order_id,
            "status": OrderStatus.shipped.value,
            "total": 149.98,
            "items": [{"name": "Widget", "qty": 2, "price": 74.99}],
            "customer_email": "customer@example.com"  # placeholder address
        }
    return {"error": "Order not found"}


def process_refund(
    order_id: str,
    reason: str,
    amount: Optional[float] = None
) -> dict:
    """Process refund through payment system."""
    # Payment API call here
    return {
        "refund_id": f"REF-{order_id}",
        "status": "processed",
        "amount": amount or "full"
    }


# Agent setup
tools = [lookup_order, process_refund]
# Pydantic models for validating arguments (for example with safe_execute);
# run_agent as written above does not use them
schemas = {
    "lookup_order": LookupOrder,
    "process_refund": ProcessRefund
}


def handle_customer_query(query: str) -> str:
    """Handle a customer service query."""
    messages = [
        {
            "role": "system",
            "content": "You are a helpful customer service agent. "
                       "Use tools to look up orders and process refunds."
        },
        {"role": "user", "content": query}
    ]
    return run_agent(messages, tools)

This implementation handles the complete workflow: the model decides whether to look up an order or process a refund based on the customer’s query, your code executes the actual database queries and API calls, and results flow back to the model for a natural language response.

Summary

Function calling transforms AI agents from text generators into action-taking systems. The key insight is that the AI model is a decision-maker, not an executor - your code must handle all the real work with proper validation, error handling, and retry logic.

Success depends on three pillars:

  1. Well-defined, type-safe tool schemas that guide the model’s choices
  2. Robust execution layers that handle failures gracefully
  3. Clear integration patterns that return results to the conversation

The combination of OpenAI’s Pydantic integration and LangChain’s orchestration abstractions provides a solid foundation. Start with simple tools, implement thorough error handling, then expand to more complex multi-tool workflows.

Action Items

  1. Start with Pydantic models for all your tool definitions - the type safety pays dividends immediately
  2. Build your error handling first - write the validation and retry logic before you write the actual tool implementations
  3. Log everything - you’ll need those logs when debugging why the model made unexpected choices
  4. Test edge cases explicitly - invalid inputs, missing tools, timeout scenarios - before deploying to production

Remember: “If your function is buggy, the best AI in the world cannot save you.” Invest in making your tools bulletproof.
