How to Efficiently Provide API Documentation to Claude and LLMs Without Burning Tokens
I pasted Stripe’s OpenAPI spec into Claude and watched my token count explode. The spec was 1.2 million tokens. My context window? 200K. The result? A very expensive error message and no useful API integration.
Here’s what I learned about feeding API documentation to LLMs the right way.
The Problem: API Specs Are Built for Humans, Not Agents
I was building an AI agent that needed to interact with Stripe’s API. My first instinct was to dump the entire OpenAPI spec into the context:
```python
import json

import anthropic

claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# This burned through tokens like crazy
with open("stripe-openapi.json") as f:
    spec = json.load(f)

prompt = f"Here is the Stripe API documentation: {json.dumps(spec)}"
response = claude.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=4096,
    messages=[{"role": "user", "content": prompt}],
)
# Result: Token limit exceeded, huge cost, and still hallucinations
```

The problem? OpenAPI specs are designed for human developers browsing documentation. They contain:
- Verbose descriptions meant for reading
- Multiple examples for each endpoint
- Marketing language in summaries
- Nested schemas that repeat definitions
- Response examples that duplicate information
A Reddit user put it perfectly: “API specs are built for humans, not agents.”
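To get a feel for how much of a spec consists of the human-oriented fields listed above, one rough approach is to strip them and compare sizes. The sketch below is only an illustration: `strip_human_fields`, the `DROP_KEYS` set, and the tiny inline spec are hypothetical and made up for this example.

```python
import json

# Keys that mostly serve human readers (an assumption; adjust per spec)
DROP_KEYS = {"description", "example", "examples", "externalDocs"}


def strip_human_fields(node, in_properties=False):
    """Return a copy of an OpenAPI fragment without human-oriented keys.

    Inside a schema's "properties" map the keys are field names, so a
    property that happens to be called "description" is kept.
    """
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            if not in_properties and key in DROP_KEYS:
                continue
            child_is_properties = key == "properties" and not in_properties
            cleaned[key] = strip_human_fields(value, child_is_properties)
        return cleaned
    if isinstance(node, list):
        return [strip_human_fields(item) for item in node]
    return node


# Made-up miniature spec, just to exercise the function
sample_spec = {
    "paths": {
        "/v1/items": {
            "get": {
                "summary": "List items",
                "description": "A long, friendly explanation written for people.",
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "object",
                                    "properties": {
                                        "description": {"type": "string", "example": "A red shirt"}
                                    },
                                },
                                "example": {"description": "A red shirt"},
                            }
                        },
                    }
                },
            }
        }
    }
}

before = len(json.dumps(sample_spec))
after = len(json.dumps(strip_human_fields(sample_spec)))
print(f"{before} -> {after} characters")
```

The stripped output is no longer a strictly valid OpenAPI document (response objects require a description), which is acceptable for prompt context but not for tooling that validates the spec.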
Why Raw Specs Fail
I ran into three specific issues:
Token burn: Stripe’s spec hit 1.2M tokens. Even smaller APIs like GitHub’s spec consumed 500K+ tokens. That’s $3+ per request just for context.
Context dilution: With so much irrelevant information, Claude struggled to find the right endpoints. It would mix up parameters from different endpoints.
Hallucinations: Ironically, more context led to more hallucinations. Claude would invent parameters that “looked right” based on patterns in other endpoints.
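A cheap pre-flight check can catch the token-burn problem before a request is sent. The sketch below is hedged accordingly: the four-characters-per-token ratio is a rough rule of thumb and not a real tokenizer, the 200K window and the 4,096 output tokens are the figures from this article, and both function names are hypothetical.

```python
import json


def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; real tokenizers will give different counts."""
    return int(len(text) / chars_per_token)


def fits_in_context(text, context_window=200_000, reserved_for_output=4_096):
    """True if the estimated prompt size leaves room for the response."""
    return estimate_tokens(text) <= context_window - reserved_for_output


small_doc = json.dumps({"paths": {"/v1/charges": {"post": {"summary": "Create a charge"}}}})
huge_doc = "x" * 5_000_000  # stand-in for a multi-megabyte spec

print(fits_in_context(small_doc))
print(fits_in_context(huge_doc))
```

Without such a check, the oversized request is simply rejected by the API: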
```
Error: prompt is too long: 1247392 tokens > 200000 maximum
```

Solution 1: Trim Aggressively
I started building a spec trimmer that extracts only what an LLM needs:
```python
def extract_endpoints(spec, needed_paths):
    """Extract only relevant endpoints from OpenAPI spec."""
    trimmed = {
        "openapi": spec["openapi"],
        "info": {"title": spec["info"]["title"], "version": spec["info"]["version"]},
        "paths": {},
        "components": {"schemas": {}},
    }

    # Track which schemas we need
    needed_schemas = set()

    for path in needed_paths:
        if path in spec["paths"]:
            path_item = spec["paths"][path]
            trimmed["paths"][path] = {}

            for method in ["get", "post", "put", "patch", "delete"]:
                if method in path_item:
                    operation = path_item[method]

                    # Keep only essential fields
                    trimmed_op = {
                        "operationId": operation.get("operationId"),
                        "summary": operation.get("summary"),
                        "parameters": operation.get("parameters", []),
                        "requestBody": operation.get("requestBody"),
                        "responses": {},
                    }

                    # Only keep success responses
                    for code, resp in operation.get("responses", {}).items():
                        if code in ["200", "201", "202", "204"]:
                            trimmed_op["responses"][code] = resp

                    # Extract schema references from everything we kept
                    # (parameters and request bodies can use $ref too)
                    needed_schemas.update(extract_schema_refs(trimmed_op))

                    trimmed["paths"][path][method] = trimmed_op

    # Include only referenced schemas, following references between schemas
    # so the trimmed spec contains no dangling $ref
    all_schemas = spec.get("components", {}).get("schemas", {})
    kept = trimmed["components"]["schemas"]
    pending = list(needed_schemas)
    while pending:
        schema_name = pending.pop()
        if schema_name in all_schemas and schema_name not in kept:
            kept[schema_name] = all_schemas[schema_name]
            pending.extend(extract_schema_refs(all_schemas[schema_name]))

    return trimmed


def extract_schema_refs(obj, refs=None):
    """Recursively find all $ref references."""
    if refs is None:
        refs = set()

    if isinstance(obj, dict):
        if "$ref" in obj:
            # Extract schema name from reference
            ref = obj["$ref"]
            if ref.startswith("#/components/schemas/"):
                refs.add(ref.split("/")[-1])
        for value in obj.values():
            extract_schema_refs(value, refs)
    elif isinstance(obj, list):
        for item in obj:
            extract_schema_refs(item, refs)

    return refs
```

This reduced Stripe’s spec from 1.2M tokens to about 50K tokens for the endpoints I actually needed. That’s a 96% reduction.
But there’s a trade-off: How aggressive should the trimming be? I found that stripping descriptions entirely sometimes hurt accuracy. I now keep a single-line summary for each endpoint but remove the verbose descriptions and examples.
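The "single-line summary" rule can be sketched as a small helper. This is one possible interpretation, not the exact code behind the numbers in this article: `one_line_summary` and its `max_chars` cutoff are hypothetical, and the first-sentence split is deliberately naive.

```python
def one_line_summary(operation, max_chars=120):
    """Pick a short summary for an operation, falling back to its description."""
    text = operation.get("summary") or operation.get("description") or ""
    # Keep only the first line, then the first sentence of that line
    first_line = text.strip().split("\n", 1)[0]
    first_sentence = first_line.split(". ", 1)[0].rstrip(".")
    if len(first_sentence) > max_chars:
        first_sentence = first_sentence[: max_chars - 3].rstrip() + "..."
    return first_sentence


operation = {
    "description": "Creates a new customer object. Use this to store billing details.\nMore text."
}
print(one_line_summary(operation))
```

Falling back to the description matters for specs that leave `summary` empty, since dropping both would leave the model with nothing but a path and a parameter list.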
Solution 2: Use MCP Servers for On-Demand Retrieval
Instead of loading the entire spec upfront, I switched to Model Context Protocol (MCP) servers. MCP servers let Claude fetch API documentation dynamically, only retrieving what’s needed for each request.
Here’s how I configured it:
```json
{
  "mcpServers": {
    "stripe-api": {
      "command": "python",
      "args": ["/path/to/stripe_mcp_server.py"],
      "env": {
        "STRIPE_API_KEY": "sk_test_..."
      }
    }
  }
}
```

The MCP server implementation:
```python
import asyncio
import json

import mcp.types as types
from mcp.server import Server
from mcp.server.stdio import stdio_server

server = Server("stripe-api")

# Load spec once at startup
with open("stripe-openapi-trimmed.json") as f:
    STRIPE_SPEC = json.load(f)


def text_result(text):
    """Wrap a string in the content list the MCP SDK expects from a tool call."""
    return [types.TextContent(type="text", text=text)]


@server.list_tools()
async def list_tools():
    return [
        types.Tool(
            name="get_endpoint_docs",
            description="Get documentation for a specific Stripe API endpoint",
            inputSchema={
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "API path, e.g., /v1/charges"},
                    "method": {"type": "string", "enum": ["get", "post", "delete"]},
                },
                "required": ["path"],
            },
        ),
        types.Tool(
            name="search_endpoints",
            description="Search for Stripe API endpoints by keyword",
            inputSchema={
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search keyword"},
                },
                "required": ["query"],
            },
        ),
    ]


@server.call_tool()
async def call_tool(name, arguments):
    if name == "get_endpoint_docs":
        path = arguments["path"]
        method = arguments.get("method", "get")

        if path in STRIPE_SPEC["paths"]:
            endpoint = STRIPE_SPEC["paths"][path].get(method, {})
            return text_result(json.dumps(endpoint, indent=2))
        return text_result(f"Endpoint {path} not found")

    elif name == "search_endpoints":
        query = arguments["query"].lower()
        results = []
        for path, methods in STRIPE_SPEC["paths"].items():
            for method, op in methods.items():
                if not isinstance(op, dict):
                    continue  # skip path-level keys such as "parameters"
                # summary can be missing or None in a trimmed spec
                summary = op.get("summary") or ""
                if query in summary.lower() or query in path.lower():
                    results.append({"path": path, "method": method, "summary": summary})
        return text_result(json.dumps(results, indent=2))

    raise ValueError(f"Unknown tool: {name}")


async def main():
    async with stdio_server() as (read_stream, write_stream):
        # The low-level Server.run also takes initialization options
        await server.run(read_stream, write_stream, server.create_initialization_options())


if __name__ == "__main__":
    asyncio.run(main())
```

Now Claude can query specific endpoints on demand instead of loading everything at once. This approach:
- Reduces initial token load - No spec in context until needed
- Improves accuracy - Claude sees only relevant documentation
- Enables exploration - Claude can search for endpoints by keyword
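To support the exploration point, one option is a compact index with one line per operation, which a search tool could return or which could be placed in the prompt as a table of contents. This is a sketch under assumptions: `build_endpoint_index`, the line format, and the miniature spec are hypothetical and not part of the server above.

```python
HTTP_METHODS = ("get", "post", "put", "patch", "delete")


def build_endpoint_index(spec):
    """Return one 'METHOD path: summary' line per operation, sorted by path."""
    lines = []
    for path, path_item in sorted(spec.get("paths", {}).items()):
        for method in HTTP_METHODS:
            operation = path_item.get(method)
            if not isinstance(operation, dict):
                continue  # method not defined for this path
            summary = operation.get("summary") or "(no summary)"
            lines.append(f"{method.upper()} {path}: {summary}")
    return lines


# Made-up miniature spec for illustration
sample_spec = {
    "paths": {
        "/v1/customers": {
            "get": {"summary": "List customers"},
            "post": {"summary": "Create a customer"},
        },
        "/v1/charges": {"post": {"summary": None}, "parameters": []},
    }
}

for line in build_endpoint_index(sample_spec):
    print(line)
```

An index like this stays small even for large APIs because it carries no schemas, so the model can pick an endpoint first and fetch its full documentation only afterwards.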
Solution 3: Use Context7 for Pre-Indexed Docs
Context7 provides pre-indexed documentation libraries optimized for LLM consumption. It’s essentially a search engine for API docs that returns structured, token-efficient responses.
```python
# NOTE: illustrative interface. Context7 is usually consumed as an MCP server,
# so verify the client class and the search() parameters against its documentation.
from context7 import Context7Client

client = Context7Client()

# Search for Stripe charge creation docs
results = client.search(
    library="stripe",
    query="create charge",
    max_tokens=5000,  # Limit response size
)

# Results are already trimmed and structured for LLMs
print(results.text)
```

Context7 handles the trimming automatically, returning only the most relevant documentation sections.
What I Learned About Restructuring Specs
The biggest insight was that OpenAPI specs are organized for RESTful browsing, not agent efficiency. I restructured my trimmed specs by operation type:
```python
def extract_response_schema(operation):
    """Return the JSON schema of the first 2xx response, or None.

    Minimal helper so the example runs; adapt it to your spec's media types.
    """
    for code, resp in operation.get("responses", {}).items():
        if str(code).startswith("2"):
            return resp.get("content", {}).get("application/json", {}).get("schema")
    return None


def restructure_for_agents(spec):
    """Reorganize spec by operation type instead of REST hierarchy."""
    restructured = {
        "create": {},  # POST endpoints
        "read": {},    # GET single resource
        "list": {},    # GET collections
        "update": {},  # PUT/PATCH endpoints
        "delete": {},  # DELETE endpoints
    }

    for path, methods in spec.get("paths", {}).items():
        for method, operation in methods.items():
            if not isinstance(operation, dict):
                continue  # skip path-level keys such as "parameters"

            op_id = operation.get("operationId") or path
            entry = {
                "path": path,
                "method": method,
                "parameters": operation.get("parameters", []),
                "requestBody": operation.get("requestBody"),
                "responseSchema": extract_response_schema(operation),
            }

            # Categorize by operation type
            if method == "post":
                restructured["create"][op_id] = entry
            elif method == "get":
                if "{id}" in path or path.endswith("}"):
                    restructured["read"][op_id] = entry
                else:
                    restructured["list"][op_id] = entry
            elif method in ["put", "patch"]:
                restructured["update"][op_id] = entry
            elif method == "delete":
                restructured["delete"][op_id] = entry

    return restructured
```

This structure matches how agents actually think about APIs: “I need to create something” or “I need to list all items.”
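Once the spec is grouped this way, only the category that matches the agent's intent needs to go into the prompt. The helper below is a hypothetical sketch of that step: `render_category` and the sample data are invented for illustration, and the entry shape (`path`, `method`, `parameters`) follows the structure built above.

```python
def render_category(restructured, category):
    """Render one category of a restructured spec as compact text lines."""
    lines = []
    for op_id, entry in sorted(restructured.get(category, {}).items()):
        names = [p["name"] for p in entry.get("parameters", []) if "name" in p]
        params = ", ".join(names) if names else "none"
        lines.append(f"{op_id}: {entry['method'].upper()} {entry['path']} (params: {params})")
    return "\n".join(lines)


# Made-up restructured spec for illustration
sample = {
    "list": {
        "GetCustomers": {
            "path": "/v1/customers",
            "method": "get",
            "parameters": [{"name": "limit"}, {"name": "starting_after"}],
        }
    },
    "delete": {
        "DeleteCustomer": {"path": "/v1/customers/{customer}", "method": "delete", "parameters": []}
    },
}

print(render_category(sample, "list"))
```

For a "list all customers" task, the prompt then contains one line instead of every create, update, and delete operation.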
Common Mistakes I Made
Mistake 1: Pasting entire specs
```python
# DON'T DO THIS
prompt = f"API Docs: {json.dumps(entire_stripe_spec)}"
# This is 1.2M tokens of mostly noise
```

Mistake 2: Assuming “more context is better”
More context can actually increase hallucinations. Claude tries to use all the context it has, and irrelevant information creates confusion.
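One way to act on this is to cap the context with an explicit budget and fill it with the most relevant snippets first. The sketch below assumes relevance scores already exist (how to compute them is out of scope here); `pack_context` is a hypothetical name, and the token cost uses a rough four-characters-per-token estimate instead of a real tokenizer.

```python
def pack_context(snippets, token_budget, chars_per_token=4.0):
    """Greedily pick the highest-scoring snippets that fit the budget.

    snippets: list of (score, text) pairs; a higher score means more relevant.
    Returns the chosen texts and the estimated tokens they use.
    """
    chosen = []
    used = 0
    for score, text in sorted(snippets, key=lambda pair: pair[0], reverse=True):
        cost = int(len(text) / chars_per_token) + 1  # rough estimate, rounded up
        if used + cost <= token_budget:
            chosen.append(text)
            used += cost
    return chosen, used


snippets = [
    (0.9, "a" * 40),   # highly relevant, small
    (0.2, "b" * 400),  # barely relevant, large
    (0.7, "c" * 40),   # relevant, small
]
chosen, used = pack_context(snippets, token_budget=30)
print(len(chosen), used)
```

The large, low-relevance snippet is left out even though it is valid documentation, which is exactly the kind of material that dilutes the context.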
Mistake 3: Not testing for hallucinations
After context changes, always test that Claude uses the correct parameters. I added validation:
```python
def validate_api_call(endpoint_path, parameters, spec, method="get"):
    """Check if parameters match the spec for the given HTTP method."""
    endpoint = spec["paths"].get(endpoint_path, {})
    valid_params = set()

    # Only path/query/header parameters are checked here;
    # request-body fields would need a separate schema check
    for param in endpoint.get(method, {}).get("parameters", []):
        if "name" in param:
            valid_params.add(param["name"])

    invalid = set(parameters.keys()) - valid_params
    if invalid:
        print(f"WARNING: Potentially hallucinated parameters: {invalid}")
        return False
    return True
```

The Cost Difference
I measured token usage across three approaches:
Approach      | Input Tokens | Cost (Opus) | Accuracy
--------------|--------------|-------------|---------
Full spec     | 1,200,000    | $12.00      | 60%
Trimmed spec  | 45,000       | $0.45       | 85%
MCP on-demand | 5,000*       | $0.05       | 95%
Context7      | 3,000*       | $0.03       | 92%

*Per request, not upfront

The MCP approach costs 240x less and produces more accurate results.
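The cost column and the 240x figure follow from simple arithmetic on the token counts. In the sketch below, the rate of $10 per million input tokens is the rate implied by the table ($12.00 for 1.2M tokens) and should be treated as an assumption, not an official price; `input_cost` is a hypothetical helper.

```python
def input_cost(tokens, usd_per_million_tokens):
    """Input cost in USD for a given number of prompt tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


RATE = 10.0  # implied by the table above; not an official price

approaches = {
    "Full spec": 1_200_000,
    "Trimmed spec": 45_000,
    "MCP on-demand": 5_000,
    "Context7": 3_000,
}

for name, tokens in approaches.items():
    print(f"{name}: ${input_cost(tokens, RATE):.2f}")

# Cost is linear in tokens, so the cost ratio equals the token ratio
ratio = approaches["Full spec"] / approaches["MCP on-demand"]
print(f"Full spec vs MCP on-demand: {ratio:.0f}x")
```

Because the last two rows are per request, a task that needs several documentation lookups multiplies that figure accordingly.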
Summary
API documentation for LLMs requires a different approach than documentation for humans:
- Trim aggressively - Remove examples, verbose descriptions, and unused endpoints
- Use MCP servers - Let agents fetch documentation on demand
- Restructure for agents - Organize by operation type, not REST hierarchy
- Validate against hallucinations - Test that parameters match the spec
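The validation point in the last bullet can be turned into a small regression check over recorded calls. The following is a sketch under assumptions: `allowed_names`, `unknown_fields`, the sample spec, and the recorded calls are all hypothetical, and the check only looks at top-level request-body properties without resolving `$ref`.

```python
def allowed_names(operation):
    """Collect parameter names plus top-level request-body property names."""
    names = {p["name"] for p in operation.get("parameters", []) if "name" in p}
    content = (operation.get("requestBody") or {}).get("content", {})
    for media in content.values():
        names.update(media.get("schema", {}).get("properties", {}).keys())
    return names


def unknown_fields(spec, path, method, arguments):
    """Return argument names that the spec does not define for this operation."""
    operation = spec.get("paths", {}).get(path, {}).get(method, {})
    return set(arguments) - allowed_names(operation)


# Made-up spec and calls for illustration
sample_spec = {
    "paths": {
        "/v1/customers": {
            "post": {
                "requestBody": {
                    "content": {
                        "application/x-www-form-urlencoded": {
                            "schema": {"type": "object", "properties": {"email": {}, "name": {}}}
                        }
                    }
                }
            },
            "get": {"parameters": [{"name": "limit", "in": "query"}]},
        }
    }
}

recorded_calls = [
    ("/v1/customers", "post", {"email": "a@example.com"}),
    ("/v1/customers", "post", {"email": "a@example.com", "nickname": "Al"}),
    ("/v1/customers", "get", {"limit": 10}),
]

failures = [
    (path, method, sorted(unknown_fields(sample_spec, path, method, args)))
    for path, method, args in recorded_calls
    if unknown_fields(sample_spec, path, method, args)
]
print(f"{len(failures)} of {len(recorded_calls)} calls used unknown fields: {failures}")
```

Running a check like this after every change to the context makes a rise in invented parameters visible before it reaches production.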
The key insight: LLMs don’t need comprehensive documentation. They need focused, relevant context that answers specific questions about the API.