How to Add Contracts and Validation to LLM Tool Calls

Mar 20, 2026

The Problem: Hallucinated Tool Arguments

I built an AI agent that calls database tools. In testing, it worked perfectly. In production, I started seeing weird errors.

A user asked the agent to search for “10 recent orders”. The agent called the database tool with limit="ten" instead of limit=10. The query failed. No error message. Just an empty result returned confidently.

Another time, the agent passed query=123 where a string was expected. The tool crashed somewhere deep in the codebase. I spent hours debugging why the model decided to pass an integer for a search query.

The logs showed a pattern:

Input: "find 5 users named john"
Expected call: search_users(query="john", limit=5)
Actual call: search_users(query="john", limit="five", type="exact")

Input: "get order details for ID 12345"
Expected call: get_order(order_id=12345)
Actual call: get_order(id="12345", include_items=True, format="json")

Same intent. Different parameters. Hallucinated arguments. Silent wrong executions.

My First Mistake: Trusting Model Output Directly

My original code passed model output directly to tools:

from langchain.tools import tool

@tool
def search_database(query: str, limit: int) -> dict:
    """Search the database."""
    # What if limit="ten" or limit=-5?
    # What if query=123 instead of string?
    # What if model hallucinates extra parameters?
    return db.execute(f"SELECT * FROM items WHERE name LIKE '%{query}%' LIMIT {limit}")

# Model output goes directly to execution
result = agent.run("find 5 items matching python")
# No validation. No contract. Just hope.

The model generates tool calls based on its training. But models do not guarantee:

Correct types (string vs integer)
Valid ranges (limit between 1 and 100)
Required fields present
No extra hallucinated fields

I needed contracts. Typed, validated inputs before anything executes.
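
Type checks alone do not fix the query string that the tool above builds with an f-string. A minimal sketch of the same search using bound parameters, assuming a SQLite backend; the in-memory table and the search_items helper are hypothetical stand-ins for the real db object:

```python
import sqlite3

# Hypothetical in-memory table standing in for the article's `db` object.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [("python basics",), ("advanced python",), ("rust handbook",)],
)


def search_items(query: str, limit: int) -> list[str]:
    """Bind query and limit as parameters instead of formatting them into SQL."""
    rows = conn.execute(
        "SELECT name FROM items WHERE name LIKE ? ORDER BY name LIMIT ?",
        (f"%{query}%", limit),
    ).fetchall()
    return [name for (name,) in rows]


matches = search_items("python", 5)

# A hostile value is treated as text to match, not as SQL to run.
hostile = search_items("x'; DROP TABLE items; --", 5)
remaining = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

With bound parameters, the hostile string simply matches nothing and the table is left intact.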

The Solution: Input Contracts with Pydantic

I started by defining exactly what each tool accepts:

from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator
from typing import Literal

class DatabaseSearchInput(BaseModel):
    """Contract for database search tool - strict typing with constraints"""

    # Pydantic ignores unknown fields by default; forbid them so
    # hallucinated extra parameters fail validation
    model_config = ConfigDict(extra="forbid")

    query: str = Field(..., min_length=1, max_length=500)
    limit: int = Field(..., ge=1, le=100)  # Between 1 and 100
    search_type: Literal["exact", "fuzzy", "semantic"] = "fuzzy"

    @field_validator("query")
    @classmethod
    def validate_query(cls, v: str) -> str:
        # Crude denylist for SQL injection patterns; the query itself
        # should still be passed to the database as a bound parameter
        if ";" in v or "--" in v:
            raise ValueError("Invalid query characters detected")
        return v.strip()

# Now I can validate model output before execution
try:
    validated = DatabaseSearchInput(
        query="python",
        limit=5,
        search_type="fuzzy"
    )
    # All validations pass, safe to execute
except ValidationError as e:
    # Contract violation caught before any damage
    print(f"Invalid parameters: {e}")

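If the model passes limit="ten", validation fails immediately with a ValidationError that names the offending field. A numeric string such as limit="5" is still accepted, because Pydantic's default lax mode converts it to the integer 5. No silent wrong execution. No corrupted database queries.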
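
To make the behavior concrete, here is a small sketch that runs the failure cases from the logs through a contract. It assumes Pydantic v2. SearchArgs and failing_fields are hypothetical names, and the model sets extra="forbid" because Pydantic ignores unknown fields unless told otherwise:

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class SearchArgs(BaseModel):
    """Hypothetical stand-in for the search contract, with extras forbidden."""

    model_config = ConfigDict(extra="forbid")

    query: str = Field(min_length=1, max_length=500)
    limit: int = Field(ge=1, le=100)
    search_type: Literal["exact", "fuzzy", "semantic"] = "fuzzy"


def failing_fields(raw_args: dict) -> list[str]:
    """Return the names of the fields that violate the contract (empty if none)."""
    try:
        SearchArgs(**raw_args)
    except ValidationError as exc:
        return [str(err["loc"][0]) for err in exc.errors()]
    return []


word_for_number = failing_fields({"query": "john", "limit": "five"})
int_for_string = failing_fields({"query": 123, "limit": 5})
extra_field = failing_fields({"query": "john", "limit": 5, "type": "exact"})
out_of_range = failing_fields({"query": "john", "limit": 500})

# Lax mode converts a numeric string, so "5" is accepted as the integer 5.
coerced = SearchArgs(query="john", limit="5").limit
```

Each hallucinated call is rejected with the name of the field at fault, which is the information a retry prompt needs.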

Adding Output Contracts

Input validation was not enough. I also needed to validate what tools return:

from langchain.tools import tool
from pydantic import BaseModel
from typing import Literal

class DatabaseSearchOutput(BaseModel):
    """Contract for what the tool returns"""

    results: list[dict]
    total_count: int
    query_time_ms: float
    status: Literal["success", "partial", "empty"]

# Tool returns validated output
@tool(args_schema=DatabaseSearchInput)
def search_database(
    query: str,
    limit: int,
    search_type: str = "fuzzy"
) -> DatabaseSearchOutput:
    """Search the database with validated inputs."""

    result = db.search(query=query, limit=limit, mode=search_type)

    # Output is validated on construction
    return DatabaseSearchOutput(
        results=result.items,
        total_count=result.total,
        query_time_ms=result.duration_ms,
        status="success" if result.items else "empty"
    )

Now both inputs and outputs have contracts. If the tool returns malformed data, the output contract catches it.
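
A short sketch of the output contract catching a malformed result, assuming Pydantic v2. SearchOutput is a hypothetical local copy of the output model so the example runs on its own, and check_output is an illustrative helper:

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class SearchOutput(BaseModel):
    """Hypothetical copy of the output contract, kept local so this runs alone."""

    results: list[dict]
    total_count: int
    query_time_ms: float
    status: Literal["success", "partial", "empty"]


def check_output(raw: object) -> tuple[bool, list[str]]:
    """Validate a raw tool result; return (ok, names of offending fields)."""
    try:
        SearchOutput.model_validate(raw)
    except ValidationError as exc:
        fields = [".".join(map(str, err["loc"])) or "<root>" for err in exc.errors()]
        return False, sorted(fields)
    return True, []


good = check_output(
    {"results": [{"id": 1}], "total_count": 1, "query_time_ms": 3.2, "status": "success"}
)

# A backend that returns a null count and an unknown status is caught here.
bad = check_output(
    {"results": [], "total_count": None, "query_time_ms": 1.0, "status": "ok"}
)

# A result that is not a dict at all also fails validation cleanly.
not_a_dict = check_output(None)
```

Using model_validate rather than unpacking with ** means a tool that returns None produces a validation error instead of a TypeError.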

Wrapping Tools with Validation

I created a decorator to add contracts to any tool:

from functools import wraps
from pydantic import BaseModel, ValidationError
import json

def validated_tool(input_model: type[BaseModel], output_model: type[BaseModel]):
    """Decorator that adds contract validation to any tool."""

    def decorator(func):
        @wraps(func)
        def wrapper(**kwargs):
            # 1. Validate inputs against contract
            try:
                validated_input = input_model(**kwargs)
            except ValidationError as e:
                # Surface as data, not exception
                return {
                    "error": "input_validation_failed",
                    "details": json.loads(e.json()),
                    "rejected_args": kwargs
                }

            # 2. Execute with validated inputs
            try:
                result = func(**validated_input.model_dump())
            except Exception as e:
                return {
                    "error": "execution_failed",
                    "details": str(e),
                    "validated_input": validated_input.model_dump()
                }

            # 3. Validate outputs against contract
            # model_validate also rejects non-dict results with a
            # ValidationError instead of raising a TypeError
            try:
                validated_output = output_model.model_validate(result)
                return validated_output.model_dump()
            except ValidationError as e:
                return {
                    "error": "output_validation_failed",
                    "details": json.loads(e.json()),
                    "raw_output": result
                }

        return wrapper
    return decorator

# Usage
@validated_tool(
    input_model=DatabaseSearchInput,
    output_model=DatabaseSearchOutput
)
def search_database(query: str, limit: int, search_type: str = "fuzzy") -> dict:
    # Function only receives validated data
    result = db.search(query=query, limit=limit, mode=search_type)
    return {
        "results": result.items,
        "total_count": result.total,
        "query_time_ms": result.duration_ms,
        "status": "success" if result.items else "empty"
    }

The wrapper creates a validation boundary. Invalid inputs never reach the tool. Invalid outputs never leave the agent.
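
The three outcomes of that boundary can be exercised without a database. This sketch uses a condensed version of the same decorator pattern and a hypothetical echo tool; EchoInput, EchoOutput and the calls list exist only for illustration:

```python
import json
from functools import wraps

from pydantic import BaseModel, Field, ValidationError


class EchoInput(BaseModel):
    text: str = Field(min_length=1)
    times: int = Field(ge=1, le=3)


class EchoOutput(BaseModel):
    echoed: list[str]


def validated_tool(input_model, output_model):
    """Condensed version of the decorator pattern: validate, run, validate."""

    def decorator(func):
        @wraps(func)
        def wrapper(**kwargs):
            try:
                args = input_model(**kwargs)
            except ValidationError as exc:
                return {"error": "input_validation_failed", "details": json.loads(exc.json())}
            try:
                raw = func(**args.model_dump())
            except Exception as exc:  # surface tool crashes as data
                return {"error": "execution_failed", "details": str(exc)}
            try:
                return output_model.model_validate(raw).model_dump()
            except ValidationError as exc:
                return {"error": "output_validation_failed", "details": json.loads(exc.json())}

        return wrapper

    return decorator


calls = []  # records every time the tool body actually runs


@validated_tool(EchoInput, EchoOutput)
def echo(text: str, times: int) -> dict:
    calls.append((text, times))
    if text == "break":
        return {"echoed": None}  # deliberately malformed output
    return {"echoed": [text] * times}


ok = echo(text="hi", times=2)
bad_input = echo(text="hi", times="many")
bad_output = echo(text="break", times=1)
```

The calls list shows that the rejected input never reached the tool body, while the malformed output was stopped on the way out.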

JSON Schema for OpenAI Function Calling

For OpenAI’s function calling API, I define contracts as JSON Schema:

from openai import OpenAI
import json
from pydantic import ValidationError

client = OpenAI()

# Define contract as JSON Schema
search_tool_schema = {
    "type": "object",
    "properties": {
        "query": {
            "type": "string",
            "minLength": 1,
            "maxLength": 500,
            "description": "Search query string"
        },
        "limit": {
            "type": "integer",
            "minimum": 1,
            "maximum": 100,
            "description": "Maximum number of results"
        },
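        "search_type": {
            "type": "string",
            "enum": ["exact", "fuzzy", "semantic"],
            "description": "Type of search to perform; use 'fuzzy' unless asked otherwise"
        }
    },
    # Strict mode requires every property to be listed as required,
    # so the default moves into the description
    "required": ["query", "limit", "search_type"],
    "additionalProperties": False  # Reject unknown fields
}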

tools = [{
    "type": "function",
    "function": {
        "name": "search_database",
        "description": "Search the database with validated inputs",
        "parameters": search_tool_schema,
        "strict": True  # OpenAI strict mode for exact schema adherence
    }
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Find 5 items matching 'python'"}],
    tools=tools
)

# Validate the tool call matches our contract
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)

    # Re-validate with Pydantic for extra safety
    try:
        validated = DatabaseSearchInput(**args)
        result = search_database(**validated.model_dump())
    except ValidationError as e:
        print(f"Contract violation: {e}")
        # Handle gracefully, possibly ask model to retry

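The strict: True flag tells OpenAI to constrain the generated arguments to the schema. Strict mode accepts only a subset of JSON Schema and requires every property to be listed in required with additionalProperties set to False, so any keyword the API rejects has to come out of the schema and be enforced by the Pydantic model instead. I still re-validate with Pydantic because I do not trust external APIs completely.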
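
Keeping a hand-written JSON Schema next to a Pydantic model means two contracts that can drift apart. One option, sketched here under the assumption of Pydantic v2, is to derive the schema from the model with model_json_schema(). SearchArgs is a hypothetical copy of the search contract, and no API call is made:

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field


class SearchArgs(BaseModel):
    """Hypothetical contract mirroring the hand-written schema."""

    model_config = ConfigDict(extra="forbid")

    query: str = Field(min_length=1, max_length=500, description="Search query string")
    limit: int = Field(ge=1, le=100, description="Maximum number of results")
    search_type: Literal["exact", "fuzzy", "semantic"] = Field(
        default="fuzzy", description="Type of search to perform"
    )


schema = SearchArgs.model_json_schema()

# Tool definition in the same shape the article passes to the API.
# Strict mode is left out here: it accepts only a subset of JSON Schema,
# so a generated schema may need trimming before `strict` is enabled.
tool_definition = {
    "type": "function",
    "function": {
        "name": "search_database",
        "description": "Search the database with validated inputs",
        "parameters": schema,
    },
}
```

The generated schema carries the same bounds and enum values as the model, so a change to the contract only has to be made in one place.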

Observability: Tracking Validation Failures

Validation failures are data, not exceptions. I track them as metrics:

import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

from pydantic import BaseModel, ValidationError

@dataclass
class ValidationMetrics:
    input_validations_passed: int = 0
    input_validations_failed: int = 0
    output_validations_passed: int = 0
    output_validations_failed: int = 0
    failures: list[dict] = field(default_factory=list)

    def record_input_validation(
        self,
        passed: bool,
        tool_name: str,
        args: dict,
        error: str | None = None
    ):
        if passed:
            self.input_validations_passed += 1
        else:
            self.input_validations_failed += 1
            self.failures.append({
                "type": "input_validation",
                "tool": tool_name,
                "args": args,
                "error": error,
                # datetime.utcnow() is deprecated since Python 3.12
                "timestamp": datetime.now(timezone.utc).isoformat()
            })

    def summary(self) -> dict:
        total_input = self.input_validations_passed + self.input_validations_failed
        total_output = self.output_validations_passed + self.output_validations_failed
        return {
            "input_pass_rate": self.input_validations_passed / max(total_input, 1),
            "output_pass_rate": self.output_validations_passed / max(total_output, 1),
            "total_failures": len(self.failures),
            "recent_failures": self.failures[-5:]
        }

# Usage in agent loop
metrics = ValidationMetrics()

def safe_tool_execution(
    tool_name: str,
    tool_func: callable,
    args: dict,
    input_model: type[BaseModel]
) -> dict:
    # Validate input
    try:
        validated_input = input_model(**args)
        metrics.record_input_validation(True, tool_name, args)
    except ValidationError as e:
        metrics.record_input_validation(False, tool_name, args, str(e))
        return {"error": "validation_failed", "details": json.loads(e.json())}

    # Execute
    result = tool_func(**validated_input.model_dump())
    return result

Now I can see validation pass rates in production. When pass rates drop, I know the model is having trouble with certain tools.

What Changed in Production

After adding contracts:

Before:
Input: "find 5 users named john"
Call: search_users(query="john", limit="five") -> crash

After:
Input: "find 5 users named john"
Call: search_users(query="john", limit="five") -> validation error
Retry: model corrects to limit=5 -> success

The validation layer catches hallucinations before they reach my tools. Errors surface as data, not silent wrong executions.

Common Mistakes I Made

Trusting model output blindly: I passed model outputs directly to tools without validation. This let hallucinated parameters through to production.

Vague tool definitions: I used loose typing and optional fields everywhere. Models exploited the vagueness to pass wrong parameters.

Validation inside tools: I put validation logic inside tool implementations instead of at boundaries. This made validation inconsistent across tools.

No output validation: I checked inputs but let malformed outputs propagate to downstream code.

Silent fallbacks: I returned None or empty values on validation failure instead of explicit errors. This hid problems instead of surfacing them.

Summary

Contracts transform LLM tool calls from fragile operations into robust, debuggable components. Define strict Pydantic models or JSON Schemas for every tool’s inputs and outputs. Validate at boundaries. When validation fails, surface errors as structured data.

The key insight: if parameters do not match the schema, the call does not happen. No hallucinated arguments. No silent wrong executions. Every output gets checked structurally and logically before it leaves the agent.
