How to Efficiently Provide API Documentation to Claude and LLMs Without Burning Tokens

Mar 16, 2026

I pasted Stripe’s OpenAPI spec into Claude and watched my token count explode. The spec was 1.2 million tokens. My context window? 200K. The result? A very expensive error message and no useful API integration.

Here’s what I learned about feeding API documentation to LLMs the right way.

The Problem: API Specs Are Built for Humans, Not Agents

I was building an AI agent that needed to interact with Stripe’s API. My first instinct was to dump the entire OpenAPI spec into the context:

import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# This burned through tokens like crazy
with open('stripe-openapi.json') as f:
    spec = json.load(f)

prompt = f"Here is the Stripe API documentation: {json.dumps(spec)}"
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=4096,
    messages=[{"role": "user", "content": prompt}]
)
# Result: token limit exceeded; the spec is far larger than the context window

The problem? OpenAPI specs are designed for human developers browsing documentation. They contain:

Verbose descriptions meant for reading
Multiple examples for each endpoint
Marketing language in summaries
Nested schemas that repeat definitions
Response examples that duplicate information

A Reddit user put it perfectly: “API specs are built for humans, not agents.”

Why Raw Specs Fail

I ran into three specific issues:

Token burn: Stripe’s spec hit 1.2M tokens. Even smaller specs, like GitHub’s, consumed 500K+ tokens. That’s $3+ per request just for context.

Context dilution: With so much irrelevant information, Claude struggled to find the right endpoints. It would mix up parameters from different endpoints.

Hallucinations: Ironically, more context led to more hallucinations. Claude would invent parameters that “looked right” based on patterns in other endpoints.

Error: prompt is too long: 1247392 tokens > 200000 maximum
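Instead of finding this out from the API, a rough pre-flight check can run before the request is sent. This is a sketch under a stated assumption: it estimates about four characters per token, a common rule of thumb rather than Claude's actual tokenizer, so treat results close to the limit as uncertain. `estimate_tokens` and `fits_in_context` are hypothetical helpers; for exact counts, the Anthropic Python SDK provides `client.messages.count_tokens`.

```python
import math

CHARS_PER_TOKEN = 4  # rough rule of thumb, not Claude's real tokenizer

def estimate_tokens(text: str) -> int:
    """Approximate the token count of a prompt from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def fits_in_context(prompt: str, context_window: int = 200_000, max_output: int = 4096) -> bool:
    """True if the estimated prompt size plus the reserved output budget fits."""
    return estimate_tokens(prompt) + max_output <= context_window

prompt = "x" * 1_000_000  # stand-in for a ~1 MB serialized spec
if not fits_in_context(prompt):
    print(f"Refusing to send ~{estimate_tokens(prompt):,} tokens; trim the spec first")
```

Reserving `max_output` is deliberately conservative: it keeps room for the reply instead of filling the window with documentation.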

Solution 1: Trim Aggressively

I started building a spec trimmer that extracts only what an LLM needs:

def extract_endpoints(spec, needed_paths):
    """Extract only relevant endpoints from OpenAPI spec."""
    trimmed = {
        "openapi": spec["openapi"],
        "info": {"title": spec["info"]["title"], "version": spec["info"]["version"]},
        "paths": {},
        "components": {"schemas": {}}
    }

    for path in needed_paths:
        if path in spec["paths"]:
            path_item = spec["paths"][path]
            trimmed["paths"][path] = {}

            for method in ["get", "post", "put", "patch", "delete"]:
                if method in path_item:
                    operation = path_item[method]

                    # Keep only essential fields
                    trimmed_op = {
                        "operationId": operation.get("operationId"),
                        "summary": operation.get("summary"),
                        "parameters": operation.get("parameters", []),
                        "requestBody": operation.get("requestBody"),
                        "responses": {}
                    }

                    # Only keep success responses
                    for code, resp in operation.get("responses", {}).items():
                        if str(code) in ["200", "201", "202", "204"]:
                            trimmed_op["responses"][code] = resp

                    trimmed["paths"][path][method] = trimmed_op

    # Collect schema refs from everything we kept (parameters and request
    # bodies as well as responses), then follow nested refs so the trimmed
    # spec has no dangling $ref pointers
    all_schemas = spec.get("components", {}).get("schemas", {})
    kept_schemas = trimmed["components"]["schemas"]
    pending = extract_schema_refs(trimmed["paths"])
    while pending:
        schema_name = pending.pop()
        if schema_name in all_schemas and schema_name not in kept_schemas:
            kept_schemas[schema_name] = all_schemas[schema_name]
            pending |= extract_schema_refs(all_schemas[schema_name])

    return trimmed

def extract_schema_refs(obj, refs=None):
    """Recursively find all $ref references."""
    if refs is None:
        refs = set()

    if isinstance(obj, dict):
        if "$ref" in obj:
            # Extract schema name from reference
            ref = obj["$ref"]
            if ref.startswith("#/components/schemas/"):
                refs.add(ref.split("/")[-1])
        for value in obj.values():
            extract_schema_refs(value, refs)
    elif isinstance(obj, list):
        for item in obj:
            extract_schema_refs(item, refs)

    return refs

This reduced Stripe’s spec from 1.2M tokens to about 50K tokens for the endpoints I actually needed. That’s a 96% reduction.

But there’s a trade-off: How aggressive should the trimming be? I found that stripping descriptions entirely sometimes hurt accuracy. I now keep a single-line summary for each endpoint but remove the verbose descriptions and examples.
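A minimal sketch of that middle ground follows, assuming the noise lives under the standard OpenAPI keys `description`, `example`, `examples`, and `externalDocs` (the `NOISE_KEYS` set is an assumption; tune it for your spec). One subtlety: inside a schema's `properties` map the keys are field names, and Stripe objects do have a real field called `description`, so those keys must survive while their schemas are cleaned.

```python
from typing import Any

# OpenAPI keys that hold prose or sample payloads rather than structure.
# This set is an assumption; adjust it for your spec.
NOISE_KEYS = {"description", "example", "examples", "externalDocs"}

def strip_noise(node: Any, in_properties: bool = False) -> Any:
    """Return a copy of an OpenAPI fragment without descriptions and examples.

    `summary` is kept. Keys directly inside a `properties` map are field
    names, so they are never dropped; only their schemas are cleaned.
    """
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            if key in NOISE_KEYS and not in_properties:
                continue
            # The value under a schema's "properties" key maps field names to schemas
            child_is_properties = key == "properties" and not in_properties
            cleaned[key] = strip_noise(value, in_properties=child_is_properties)
        return cleaned
    if isinstance(node, list):
        return [strip_noise(item) for item in node]
    return node

fragment = {
    "summary": "Create a charge",
    "description": "A long, multi-paragraph explanation...",
    "properties": {
        "description": {"type": "string", "description": "An arbitrary string."},
        "amount": {"type": "integer", "example": 2000},
    },
}
print(strip_noise(fragment))
```

The function builds a new structure instead of deleting keys in place, so the full spec stays available if a stripped description turns out to matter for accuracy.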

Solution 2: Use MCP Servers for On-Demand Retrieval

Instead of loading the entire spec upfront, I switched to Model Context Protocol (MCP) servers. MCP servers let Claude fetch API documentation dynamically, only retrieving what’s needed for each request.

Here’s how I configured it:

{
  "mcpServers": {
    "stripe-api": {
      "command": "python",
      "args": ["/path/to/stripe_mcp_server.py"],
      "env": {
        "STRIPE_API_KEY": "sk_test_..."
      }
    }
  }
}

The MCP server implementation:

import asyncio
import json

import mcp.types as types
from mcp.server import Server
from mcp.server.stdio import stdio_server

server = Server("stripe-api")

# Load the trimmed spec once at startup
with open("stripe-openapi-trimmed.json") as f:
    STRIPE_SPEC = json.load(f)

def text_result(payload):
    """Wrap a string or JSON-serializable value as MCP text content."""
    text = payload if isinstance(payload, str) else json.dumps(payload, indent=2)
    return [types.TextContent(type="text", text=text)]

@server.list_tools()
async def list_tools() -> list[types.Tool]:
    return [
        types.Tool(
            name="get_endpoint_docs",
            description="Get documentation for a specific Stripe API endpoint",
            inputSchema={
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "API path, e.g., /v1/charges"},
                    "method": {"type": "string", "enum": ["get", "post", "delete"]}
                },
                "required": ["path"]
            },
        ),
        types.Tool(
            name="search_endpoints",
            description="Search for Stripe API endpoints by keyword",
            inputSchema={
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search keyword"}
                },
                "required": ["query"]
            },
        ),
    ]

@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name == "get_endpoint_docs":
        path = arguments["path"]
        method = arguments.get("method", "get")
        endpoint = STRIPE_SPEC["paths"].get(path, {}).get(method)
        if endpoint is None:
            return text_result(f"Endpoint {method.upper()} {path} not found")
        return text_result(endpoint)

    if name == "search_endpoints":
        query = arguments["query"].lower()
        results = []
        for path, methods in STRIPE_SPEC["paths"].items():
            for method, op in methods.items():
                # Summaries can be null in a trimmed spec, so fall back to ""
                summary = op.get("summary") or ""
                if query in summary.lower() or query in path.lower():
                    results.append({"path": path, "method": method, "summary": summary})
        return text_result(results)

    raise ValueError(f"Unknown tool: {name}")

async def main():
    async with stdio_server() as (read_stream, write_stream):
        # run() also needs the server's initialization options
        await server.run(read_stream, write_stream, server.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())

Now Claude can query specific endpoints on demand instead of loading everything at once. This approach:

Reduces initial token load - No spec in context until needed
Improves accuracy - Claude sees only relevant documentation
Enables exploration - Claude can search for endpoints by keyword
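One limitation of the `search_endpoints` tool above: it matches the whole query as a substring, so a natural query like "create charge" finds nothing when the summary reads "Create a charge" and the path is `/v1/charges`. A sketch of word-level scoring that the tool could call instead (`rank_endpoints` and `tokenize` are hypothetical helpers that take the same `paths` mapping):

```python
import re

def tokenize(text: str) -> set:
    """Lowercase alphanumeric words, e.g. '/v1/charges' -> {'v1', 'charges'}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank_endpoints(paths: dict, query: str, limit: int = 5) -> list:
    """Rank operations by how many query words appear in their path or summary."""
    words = tokenize(query)
    scored = []
    for path, methods in paths.items():
        for method, op in methods.items():
            if not isinstance(op, dict):
                continue  # skip path-level entries such as a "parameters" list
            summary = op.get("summary") or ""
            score = len(words & tokenize(f"{path} {summary}"))
            if score:
                scored.append((score, path, method, summary))
    # Highest score first; ties broken alphabetically for stable output
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [{"path": p, "method": m, "summary": s} for _, p, m, s in scored[:limit]]

paths = {
    "/v1/charges": {"get": {"summary": "List all charges"}, "post": {"summary": "Create a charge"}},
    "/v1/customers": {"post": {"summary": "Create a customer"}},
}
print(rank_endpoints(paths, "create charge"))
```

Exact word matching still misses plurals ("charge" vs. "charges"); stemming or embedding search would be the next step if that matters.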

Solution 3: Use Context7 for Pre-Indexed Docs

Context7 provides pre-indexed documentation libraries optimized for LLM consumption. It’s essentially a search engine for API docs that returns structured, token-efficient responses.

from context7 import Context7Client

client = Context7Client()

# Search for Stripe charge creation docs
results = client.search(
    library="stripe",
    query="create charge",
    max_tokens=5000  # Limit response size
)

# Results are already trimmed and structured for LLMs
print(results.text)

Context7 handles the trimming automatically, returning only the most relevant documentation sections.

What I Learned About Restructuring Specs

The biggest insight was that OpenAPI specs are organized for RESTful browsing, not agent efficiency. I restructured my trimmed specs by operation type:

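HTTP_METHODS = {"get", "post", "put", "patch", "delete"}

def extract_response_schema(operation):
    """Return the schema of the first 2xx JSON response, or None."""
    for code, resp in operation.get("responses", {}).items():
        if str(code).startswith("2"):
            media = resp.get("content", {}).get("application/json", {})
            if "schema" in media:
                return media["schema"]
    return None

def restructure_for_agents(spec):
    """Reorganize spec by operation type instead of REST hierarchy."""
    restructured = {
        "create": {},   # POST to a collection
        "read": {},     # GET single resource
        "list": {},     # GET collections
        "update": {},   # PUT/PATCH, or POST to a single resource (Stripe style)
        "delete": {}    # DELETE endpoints
    }

    for path, methods in spec.get("paths", {}).items():
        # Paths ending in a parameter, like /v1/customers/{customer},
        # address one resource; anything else is treated as a collection
        is_single = path.rstrip("/").endswith("}")

        for method, operation in methods.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            op_id = operation.get("operationId") or f"{method} {path}"
            entry = {
                "path": path,
                "method": method,
                "parameters": operation.get("parameters", []),
                "requestBody": operation.get("requestBody"),
                "responseSchema": extract_response_schema(operation)
            }

            # Categorize by operation type. Stripe updates with POST, not
            # PUT/PATCH, so POST to a single resource counts as an update.
            if method == "post":
                restructured["update" if is_single else "create"][op_id] = entry
            elif method == "get":
                restructured["read" if is_single else "list"][op_id] = entry
            elif method in ("put", "patch"):
                restructured["update"][op_id] = entry
            else:  # delete
                restructured["delete"][op_id] = entry

    return restructured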

This structure matches how agents actually think about APIs: “I need to create something” or “I need to list all items.”
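The grouping still ships each operation as nested JSON. If tokens are tight, one further option (an extension, not part of the approach described here) is to render each operation as a single line listing its parameters. A sketch, assuming parameters live in `parameters` and in the first request-body media type (form-encoded for Stripe); `$ref` schemas are not resolved, and the `compact_signature` name and line format are illustrative. The sample operation is hand-written and simplified, not copied from Stripe's spec.

```python
def compact_signature(path: str, method: str, operation: dict) -> str:
    """Render one OpenAPI operation as a single line of text.

    Format: METHOD path(param1*, param2) - summary, where '*' marks a
    required parameter. Only named parameters and the properties of the
    first request-body media type are listed; $ref schemas are not resolved.
    """
    names = [
        p["name"] + ("*" if p.get("required") else "")
        for p in operation.get("parameters", [])
        if "name" in p
    ]
    content = (operation.get("requestBody") or {}).get("content", {})
    for media in content.values():
        schema = media.get("schema", {})
        required = set(schema.get("required", []))
        names += [prop + ("*" if prop in required else "") for prop in schema.get("properties", {})]
        break  # the first media type is enough for a summary line
    line = f"{method.upper()} {path}({', '.join(names)})"
    summary = operation.get("summary")
    return f"{line} - {summary}" if summary else line

charge_create = {  # simplified, hand-written example operation
    "summary": "Create a charge",
    "requestBody": {"content": {"application/x-www-form-urlencoded": {"schema": {
        "type": "object",
        "required": ["amount", "currency"],
        "properties": {"amount": {"type": "integer"}, "currency": {"type": "string"},
                       "customer": {"type": "string"}},
    }}}},
}
print(compact_signature("/v1/charges", "post", charge_create))
# POST /v1/charges(amount*, currency*, customer) - Create a charge
```

The trade-off mirrors the trimming discussion: parameter types and descriptions are gone, so this works best as an index the agent scans before fetching full docs for the operation it picks.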

Common Mistakes I Made

Mistake 1: Pasting entire specs

# DON'T DO THIS
prompt = f"API Docs: {json.dumps(entire_stripe_spec)}"
# This is 1.2M tokens of mostly noise

Mistake 2: Assuming “more context is better”

More context can actually increase hallucinations. Claude tries to use all the context it has, and irrelevant information creates confusion.

Mistake 3: Not testing for hallucinations

After context changes, always test that Claude uses the correct parameters. I added validation:

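def validate_api_call(endpoint_path, parameters, spec, method="get"):
    """Check that every parameter in a proposed call exists in the spec."""
    operation = spec["paths"].get(endpoint_path, {}).get(method)
    if operation is None:
        print(f"WARNING: {method.upper()} {endpoint_path} is not in the spec")
        return False

    # Query, path, and header parameters ($ref'd parameters are skipped)
    valid_params = {p["name"] for p in operation.get("parameters", []) if "name" in p}

    # Request-body fields: Stripe sends POST params as a form-encoded body,
    # so checking only "get" parameters would flag every write call
    content = (operation.get("requestBody") or {}).get("content", {})
    for media in content.values():
        valid_params.update(media.get("schema", {}).get("properties", {}))

    invalid = set(parameters) - valid_params
    if invalid:
        print(f"WARNING: Potentially hallucinated parameters: {invalid}")
        return False
    return True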
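The validator above catches parameters the model invented, but not ones it forgot, and a missing required field fails just as hard at runtime. A companion sketch, assuming required named parameters carry `"required": true` and required body fields are listed in the request schema's `required` array (no `$ref` resolution; `missing_required` is a hypothetical helper and the sample operation is simplified):

```python
def missing_required(operation: dict, parameters: dict) -> set:
    """Return required parameter or body-field names absent from a proposed call."""
    required = {
        p["name"] for p in operation.get("parameters", [])
        if "name" in p and p.get("required")
    }
    content = (operation.get("requestBody") or {}).get("content", {})
    for media in content.values():
        required.update(media.get("schema", {}).get("required", []))
    return required - set(parameters)

create_charge = {  # simplified, hand-written operation for illustration
    "requestBody": {"content": {"application/x-www-form-urlencoded": {
        "schema": {"required": ["amount", "currency"],
                   "properties": {"amount": {}, "currency": {}, "customer": {}}}}}}
}
print(missing_required(create_charge, {"amount": 2000, "customer": "cus_123"}))
```

Running both checks before executing a call turns a silent hallucination into an error the agent can correct.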

The Cost Difference

I measured token usage across four approaches:

Approach              | Input Tokens | Cost (Opus) | Accuracy
----------------------|--------------|-------------|----------
Full spec             | 1,200,000    | $12.00      | 60%
Trimmed spec          | 45,000       | $0.45       | 85%
MCP on-demand         | 5,000*       | $0.05       | 95%
Context7              | 3,000*       | $0.03       | 92%

*Per request, not upfront

The MCP approach costs 240x less than pasting the full spec ($0.05 vs. $12.00 per request) and produces more accurate results.
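The cost column is plain arithmetic: every row works out to $10 per million input tokens ($12.00 / 1.2M tokens). Prices differ by model and change over time, so this sketch takes the rate as a parameter; `RATE` below only reproduces the table and is not a quoted price. With a fixed rate, the savings ratio is simply the ratio of token counts.

```python
def input_cost(tokens: int, usd_per_million: float) -> float:
    """Input-token cost of one request in USD."""
    return tokens * usd_per_million / 1_000_000

RATE = 10.0  # USD per 1M input tokens, the rate implied by the table

rows = {"Full spec": 1_200_000, "Trimmed spec": 45_000, "MCP on-demand": 5_000, "Context7": 3_000}
for name, tokens in rows.items():
    print(f"{name:<15} ${input_cost(tokens, RATE):.2f}")

print(f"MCP savings vs. full spec: {rows['Full spec'] / rows['MCP on-demand']:.0f}x")
```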

Summary

API documentation for LLMs requires a different approach than documentation for humans:

Trim aggressively - Remove examples, verbose descriptions, and unused endpoints
Use MCP servers - Let agents fetch documentation on demand
Restructure for agents - Organize by operation type, not REST hierarchy
Validate against hallucinations - Test that parameters match the spec

The key insight: LLMs don’t need comprehensive documentation. They need focused, relevant context that answers specific questions about the API.

Final Words

My intention with this article was to share my knowledge and experience to help others.