
How to Efficiently Provide API Documentation to Claude and LLMs Without Burning Tokens

I pasted Stripe’s OpenAPI spec into Claude and watched my token count explode. The spec was 1.2 million tokens. My context window? 200K. The result? A very expensive error message and no useful API integration.

Here’s what I learned about feeding API documentation to LLMs the right way.

The Problem: API Specs Are Built for Humans, Not Agents

I was building an AI agent that needed to interact with Stripe’s API. My first instinct was to dump the entire OpenAPI spec into the context:

naive_approach.py
import json
import anthropic

claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# This burned through tokens like crazy
with open('stripe-openapi.json') as f:
    spec = json.load(f)

prompt = f"Here is the Stripe API documentation: {json.dumps(spec)}"
response = claude.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=4096,
    messages=[{"role": "user", "content": prompt}]
)
# Result: Token limit exceeded, huge cost, and still hallucinations

The problem? OpenAPI specs are designed for human developers browsing documentation. They contain:

  • Verbose descriptions meant for reading
  • Multiple examples for each endpoint
  • Marketing language in summaries
  • Nested schemas that repeat definitions
  • Response examples that duplicate information
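Before trimming anything, it can help to see which fields actually take up the space. The sketch below is one rough way to do that: it walks a spec and sums the serialized size of every value, grouped by key name. The function name `size_by_key` and the sample fragment are made up for illustration, and because parent keys include their children, the totals overlap and are only useful for comparing fields with each other.

```python
import json
from collections import Counter


def size_by_key(node, totals=None):
    """Sum the serialized size (in characters) of every value, grouped by key name."""
    if totals is None:
        totals = Counter()
    if isinstance(node, dict):
        for key, value in node.items():
            totals[key] += len(json.dumps(value))
            size_by_key(value, totals)
    elif isinstance(node, list):
        for item in node:
            size_by_key(item, totals)
    return totals


# Tiny made-up spec fragment, only for illustration
sample_spec = {
    "paths": {
        "/v1/widgets": {
            "get": {
                "summary": "List widgets",
                "description": "Returns a long, friendly explanation of how widgets work. " * 5,
                "responses": {
                    "200": {"description": "OK", "example": {"id": "w_1", "name": "demo"}}
                },
            }
        }
    }
}

totals = size_by_key(sample_spec)
for key, chars in totals.most_common(5):
    print(f"{key}: {chars} chars")
```

Running this against a real spec should show quickly whether descriptions, examples, or schemas are the main cost.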

A Reddit user put it perfectly: “API specs are built for humans, not agents.”

Why Raw Specs Fail

I ran into three specific issues:

Token burn: Stripe’s spec hit 1.2M tokens. Even smaller APIs like GitHub’s spec consumed 500K+ tokens. That’s $3+ per request just for context.

Context dilution: With so much irrelevant information, Claude struggled to find the right endpoints. It would mix up parameters from different endpoints.

Hallucinations: Ironically, more context led to more hallucinations. Claude would invent parameters that “looked right” based on patterns in other endpoints.

terminal_output.txt
Error: prompt is too long: 1247392 tokens > 200000 maximum
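A cheap pre-flight check would have caught this before the request went out. The sketch below estimates the token count from the character count and compares it to the 200,000 limit from the error message. The 4-characters-per-token ratio is a rough rule of thumb and an assumption here, not a real tokenizer, and the function names are hypothetical.

```python
import json

CONTEXT_WINDOW = 200_000  # limit taken from the error message above
CHARS_PER_TOKEN = 4       # rough heuristic (assumption), not an exact tokenizer


def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(spec, reserved_for_output=4096, window=CONTEXT_WINDOW):
    """Return (fits, estimated_tokens) for a spec serialized as JSON."""
    estimated = estimate_tokens(json.dumps(spec))
    return estimated + reserved_for_output <= window, estimated


small_ok, small_tokens = fits_in_context({"paths": {"/v1/charges": {}}})
big_ok, big_tokens = fits_in_context({"blob": "a" * 1_000_000})
print(small_ok, small_tokens)
print(big_ok, big_tokens)
```

For billing-grade numbers, an exact token count from the model provider is the better source. The heuristic is only meant to fail fast on obviously oversized prompts.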

Solution 1: Trim Aggressively

I started building a spec trimmer that extracts only what an LLM needs:

spec_trimmer.py
def extract_endpoints(spec, needed_paths):
    """Extract only relevant endpoints from OpenAPI spec."""
    trimmed = {
        "openapi": spec["openapi"],
        "info": {"title": spec["info"]["title"], "version": spec["info"]["version"]},
        "paths": {},
        "components": {"schemas": {}}
    }
    all_schemas = spec.get("components", {}).get("schemas", {})
    # Track which schemas we need
    needed_schemas = set()
    for path in needed_paths:
        if path in spec["paths"]:
            path_item = spec["paths"][path]
            trimmed["paths"][path] = {}
            for method in ["get", "post", "put", "patch", "delete"]:
                if method in path_item:
                    operation = path_item[method]
                    # Keep only essential fields
                    trimmed_op = {
                        "operationId": operation.get("operationId"),
                        "summary": operation.get("summary"),
                        "parameters": operation.get("parameters", []),
                        "requestBody": operation.get("requestBody"),
                        "responses": {}
                    }
                    # Only keep success responses
                    for code, resp in operation.get("responses", {}).items():
                        if code in ["200", "201", "202", "204"]:
                            trimmed_op["responses"][code] = resp
                    # Extract schema references from parameters, request body, and responses
                    needed_schemas.update(extract_schema_refs(trimmed_op))
                    trimmed["paths"][path][method] = trimmed_op
    # Include referenced schemas, following nested $refs until none are left
    pending = list(needed_schemas)
    while pending:
        schema_name = pending.pop()
        if schema_name in all_schemas and schema_name not in trimmed["components"]["schemas"]:
            schema = all_schemas[schema_name]
            trimmed["components"]["schemas"][schema_name] = schema
            pending.extend(extract_schema_refs(schema))
    return trimmed


def extract_schema_refs(obj, refs=None):
    """Recursively find all $ref references."""
    if refs is None:
        refs = set()
    if isinstance(obj, dict):
        if "$ref" in obj:
            # Extract schema name from reference
            ref = obj["$ref"]
            if ref.startswith("#/components/schemas/"):
                refs.add(ref.split("/")[-1])
        for value in obj.values():
            extract_schema_refs(value, refs)
    elif isinstance(obj, list):
        for item in obj:
            extract_schema_refs(item, refs)
    return refs

This reduced Stripe’s spec from 1.2M tokens to about 50K tokens for the endpoints I actually needed. That’s a 96% reduction.

But there’s a trade-off: How aggressive should the trimming be? I found that stripping descriptions entirely sometimes hurt accuracy. I now keep a single-line summary for each endpoint but remove the verbose descriptions and examples.
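Dropping the verbose fields while keeping the one-line summary can be sketched roughly as follows. The set of keys to drop and the function name `strip_verbose_fields` are my assumptions, so adjust them to your spec. One detail worth handling: inside a schema's `properties` map the keys are field names, and a field can itself be called `description`, so those keys are left alone. The result is meant as LLM context only and is no longer guaranteed to be a valid OpenAPI document.

```python
# Keys treated as "verbose" here are an assumption; tune the set for your spec
VERBOSE_KEYS = {"description", "example", "examples", "externalDocs"}


def strip_verbose_fields(node, inside_properties=False):
    """Return a copy of an OpenAPI fragment without verbose documentation fields."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            # Under "properties", keys are field names, not documentation keywords
            if not inside_properties and key in VERBOSE_KEYS:
                continue
            child_is_property_map = key == "properties" and not inside_properties
            cleaned[key] = strip_verbose_fields(value, child_is_property_map)
        return cleaned
    if isinstance(node, list):
        return [strip_verbose_fields(item) for item in node]
    return node


operation = {
    "summary": "Create a charge",
    "description": "A long explanation that a human would read.",
    "requestBody": {
        "content": {
            "application/json": {
                "schema": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string", "description": "Free text"},
                        "amount": {"type": "integer", "example": 100},
                    },
                }
            }
        }
    },
}

slim = strip_verbose_fields(operation)
print(slim)
```

The `summary` survives because it is not in the drop list, which matches the "keep a single-line summary" choice above.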

Solution 2: Use MCP Servers for On-Demand Retrieval

Instead of loading the entire spec upfront, I switched to Model Context Protocol (MCP) servers. MCP servers let Claude fetch API documentation dynamically, only retrieving what’s needed for each request.

Here’s how I configured it:

claude_desktop_config.json
{
  "mcpServers": {
    "stripe-api": {
      "command": "python",
      "args": ["/path/to/stripe_mcp_server.py"],
      "env": {
        "STRIPE_API_KEY": "sk_test_..."
      }
    }
  }
}

The MCP server implementation:

stripe_mcp_server.py
import asyncio
import json

import mcp.types as types
from mcp.server import Server
from mcp.server.stdio import stdio_server

server = Server("stripe-api")

# Load spec once at startup
with open("stripe-openapi-trimmed.json") as f:
    STRIPE_SPEC = json.load(f)


def text_result(text):
    """Wrap a string in the content list that tool handlers return."""
    return [types.TextContent(type="text", text=text)]


@server.list_tools()
async def list_tools():
    return [
        types.Tool(
            name="get_endpoint_docs",
            description="Get documentation for a specific Stripe API endpoint",
            inputSchema={
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "API path, e.g., /v1/charges"},
                    "method": {"type": "string", "enum": ["get", "post", "delete"]}
                },
                "required": ["path"]
            },
        ),
        types.Tool(
            name="search_endpoints",
            description="Search for Stripe API endpoints by keyword",
            inputSchema={
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search keyword"}
                },
                "required": ["query"]
            },
        ),
    ]


@server.call_tool()
async def call_tool(name, arguments):
    if name == "get_endpoint_docs":
        path = arguments["path"]
        method = arguments.get("method", "get")
        # Look up the operation; report a miss for an unknown path or method
        endpoint = STRIPE_SPEC["paths"].get(path, {}).get(method)
        if endpoint is None:
            return text_result(f"Endpoint {method.upper()} {path} not found")
        return text_result(json.dumps(endpoint, indent=2))
    elif name == "search_endpoints":
        query = arguments["query"].lower()
        results = []
        for path, methods in STRIPE_SPEC["paths"].items():
            for method, op in methods.items():
                # The summary may be missing or null in a trimmed spec
                summary = op.get("summary") or ""
                if query in summary.lower() or query in path.lower():
                    results.append({
                        "path": path,
                        "method": method,
                        "summary": summary
                    })
        return text_result(json.dumps(results, indent=2))
    raise ValueError(f"Unknown tool: {name}")


async def main():
    async with stdio_server() as (read_stream, write_stream):
        await server.run(
            read_stream,
            write_stream,
            server.create_initialization_options(),
        )


if __name__ == "__main__":
    asyncio.run(main())

Now Claude can query specific endpoints on demand instead of loading everything at once. This approach:

  1. Reduces initial token load - No spec in context until needed
  2. Improves accuracy - Claude sees only relevant documentation
  3. Enables exploration - Claude can search for endpoints by keyword
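To make the on-demand flow concrete: MCP runs over JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying the tool name and its arguments. The sketch below only builds the two messages a client might send for a search followed by a fetch. The MCP client normally does this for you, the helper name `build_tool_call` is hypothetical, and the query and path values are examples.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Step 1: find candidate endpoints by keyword
search = build_tool_call(1, "search_endpoints", {"query": "refund"})
# Step 2: fetch the docs for the one endpoint that looks right
fetch = build_tool_call(2, "get_endpoint_docs", {"path": "/v1/refunds", "method": "post"})

print(json.dumps(search, indent=2))
print(json.dumps(fetch, indent=2))
```

Only the two small responses to these requests end up in the context, instead of the whole spec.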

Solution 3: Use Context7 for Pre-Indexed Docs

Context7 provides pre-indexed documentation libraries optimized for LLM consumption. It’s essentially a search engine for API docs that returns structured, token-efficient responses.

context7_usage.py
# Sketch only: the client class and search() signature shown here are
# assumptions; check Context7's documentation for the actual interface.
from context7 import Context7Client

client = Context7Client()

# Search for Stripe charge creation docs
results = client.search(
    library="stripe",
    query="create charge",
    max_tokens=5000  # Limit response size
)

# Results are already trimmed and structured for LLMs
print(results.text)

Context7 handles the trimming automatically, returning only the most relevant documentation sections.

What I Learned About Restructuring Specs

The biggest insight was that OpenAPI specs are organized for RESTful browsing, not agent efficiency. I restructured my trimmed specs by operation type:

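restructured_spec.py
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def extract_response_schema(operation):
    """Minimal helper (assumed shape): JSON schema of the first 2xx response, if any."""
    for code, resp in operation.get("responses", {}).items():
        if str(code).startswith("2"):
            return resp.get("content", {}).get("application/json", {}).get("schema")
    return None


def restructure_for_agents(spec):
    """Reorganize spec by operation type instead of REST hierarchy."""
    restructured = {
        "create": {},  # POST endpoints
        "read": {},    # GET single resource
        "list": {},    # GET collections
        "update": {},  # PUT/PATCH endpoints
        "delete": {}   # DELETE endpoints
    }
    for path, methods in spec.get("paths", {}).items():
        for method, operation in methods.items():
            # Path items can also hold non-operation keys such as "parameters"
            if method not in HTTP_METHODS:
                continue
            op_id = operation.get("operationId") or f"{method} {path}"
            entry = {
                "path": path,
                "method": method,
                "parameters": operation.get("parameters", []),
                "requestBody": operation.get("requestBody"),
                "responseSchema": extract_response_schema(operation)
            }
            # Categorize by operation type
            if method == "post":
                restructured["create"][op_id] = entry
            elif method == "get":
                # A path ending in a parameter (e.g. /v1/charges/{charge}) is a single resource
                if path.endswith("}"):
                    restructured["read"][op_id] = entry
                else:
                    restructured["list"][op_id] = entry
            elif method in ["put", "patch"]:
                restructured["update"][op_id] = entry
            elif method == "delete":
                restructured["delete"][op_id] = entry
    return restructured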

This structure matches how agents actually think about APIs: “I need to create something” or “I need to list all items.”
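One way to use this layout is to render it as a compact index, one line per operation, and put only that index in the prompt. The sketch below assumes the dictionary shape produced by `restructure_for_agents`. The function name `render_index` and the sample operation IDs are illustrative.

```python
def render_index(restructured):
    """Render a by-intent index as plain text, one line per operation."""
    lines = []
    for category, operations in restructured.items():
        if not operations:
            continue  # skip empty categories to save tokens
        lines.append(f"{category}:")
        for op_id, entry in sorted(operations.items()):
            lines.append(f"  {op_id}: {entry['method'].upper()} {entry['path']}")
    return "\n".join(lines)


# Illustrative sample in the same shape as the restructured spec
sample = {
    "create": {"PostCharges": {"path": "/v1/charges", "method": "post"}},
    "read": {},
    "list": {"GetCharges": {"path": "/v1/charges", "method": "get"}},
}

index_text = render_index(sample)
print(index_text)
```

The full parameter and schema details can then be fetched on demand for the single operation the agent picks.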

Common Mistakes I Made

Mistake 1: Pasting entire specs

mistake1.py
# DON'T DO THIS
prompt = f"API Docs: {json.dumps(entire_stripe_spec)}"
# This is 1.2M tokens of mostly noise

Mistake 2: Assuming “more context is better”

More context can actually increase hallucinations. Claude tries to use all the context it has, and irrelevant information creates confusion.

Mistake 3: Not testing for hallucinations

After context changes, always test that Claude uses the correct parameters. I added validation:

hallucination_check.py
def validate_api_call(endpoint_path, parameters, spec, method="get"):
    """Check if parameters match the spec for the given HTTP method."""
    operation = spec["paths"].get(endpoint_path, {}).get(method, {})
    valid_params = set()
    for param in operation.get("parameters", []):
        if "name" in param:  # skip $ref entries that carry no inline name
            valid_params.add(param["name"])
    invalid = set(parameters.keys()) - valid_params
    if invalid:
        print(f"WARNING: Potentially hallucinated parameters: {invalid}")
        return False
    return True
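To re-run this kind of check after every context change, a small harness over a batch of model-proposed calls might look like the sketch below. It is self-contained and uses its own minimal parameter lookup. The names `allowed_params` and `hallucination_report`, the sample spec, and the sample calls are all made up for illustration.

```python
def allowed_params(spec, path, method):
    """Names of the parameters the spec declares for one operation."""
    operation = spec.get("paths", {}).get(path, {}).get(method, {})
    return {p["name"] for p in operation.get("parameters", []) if "name" in p}


def hallucination_report(calls, spec):
    """calls: list of (path, method, params) tuples proposed by the model."""
    flagged = []
    for path, method, params in calls:
        unknown = set(params) - allowed_params(spec, path, method)
        if unknown:
            flagged.append((path, method, sorted(unknown)))
    rate = len(flagged) / len(calls) if calls else 0.0
    return rate, flagged


# Made-up spec fragment and model output, only for illustration
sample_spec = {
    "paths": {
        "/v1/charges": {
            "get": {"parameters": [{"name": "limit"}, {"name": "customer"}]}
        }
    }
}
proposed_calls = [
    ("/v1/charges", "get", {"limit": 10}),
    ("/v1/charges", "get", {"limit": 10, "sort_by": "created"}),  # sort_by is not in the spec
]

rate, flagged = hallucination_report(proposed_calls, sample_spec)
print(f"{rate:.0%} of calls used unknown parameters: {flagged}")
```

Tracking this rate before and after a change gives a quick signal on whether a context change helped or hurt.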

The Cost Difference

I measured token usage across four approaches:

cost_comparison.txt
Approach      | Input Tokens | Cost (Opus) | Accuracy
--------------|--------------|-------------|---------
Full spec     | 1,200,000    | $12.00      | 60%
Trimmed spec  | 45,000       | $0.45       | 85%
MCP on-demand | 5,000*       | $0.05       | 95%
Context7      | 3,000*       | $0.03       | 92%
*Per request, not upfront

Per request, the MCP approach costs 240x less than sending the full spec ($0.05 vs. $12.00) and produces more accurate results.
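The arithmetic behind the table can be reproduced in a few lines. The rate of $10 per million input tokens used below is simply what the table's figures imply (1,200,000 tokens for $12.00). It is not taken from a price list, so swap in the current price for your model.

```python
# USD per million input tokens, implied by the table above (assumption, not a quoted price)
PRICE_PER_MILLION = 10.00


def input_cost(tokens, price_per_million=PRICE_PER_MILLION):
    """Cost in USD of sending `tokens` input tokens."""
    return tokens / 1_000_000 * price_per_million


approaches = {
    "Full spec": 1_200_000,
    "Trimmed spec": 45_000,
    "MCP on-demand": 5_000,
    "Context7": 3_000,
}

baseline = input_cost(approaches["Full spec"])
for name, tokens in approaches.items():
    cost = input_cost(tokens)
    print(f"{name}: ${cost:.2f} ({baseline / cost:.0f}x cheaper than full spec)")
```

Keep in mind that the on-demand figures are per request, so an agent that makes several lookups in one task pays that amount several times.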

Summary

API documentation for LLMs requires a different approach than documentation for humans:

  1. Trim aggressively - Remove examples, verbose descriptions, and unused endpoints
  2. Use MCP servers - Let agents fetch documentation on demand
  3. Restructure for agents - Organize by operation type, not REST hierarchy
  4. Validate against hallucinations - Test that parameters match the spec

The key insight: LLMs don’t need comprehensive documentation. They need focused, relevant context that answers specific questions about the API.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
