How to Reduce Token Usage When Using APIs with LLMs: A Practical Guide
I burned through my API budget in a week. The culprit? Pasting raw OpenAPI specifications directly into my LLM prompts.
The Problem: Million-Token Specs
I was building an integration with Stripe’s API. I grabbed their OpenAPI spec and did the naive thing:
    import json

    # Load the full Stripe OpenAPI spec
    with open('stripe-api-openapi.json') as f:
        raw_spec = json.load(f)

    # Pass it directly to Claude
    prompt = f"""Based on this API spec, create an integration to handle payments:

    {json.dumps(raw_spec)}"""

The Stripe OpenAPI spec is approximately 1.2 million tokens. At $3 per million input tokens, that’s $3.60 per API call just for the spec. I was making dozens of calls a day during development. The costs added up fast, and the response times were sluggish.
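The arithmetic behind that figure is easy to sketch. The snippet below is only an illustration: `input_cost_usd` is a hypothetical helper, the $3 rate and the 1.2 million token count are the approximate figures quoted above, and the daily call count of 50 is the one used in the monthly estimate later in this article.

```python
def input_cost_usd(tokens: int, usd_per_million: float = 3.00) -> float:
    """Input-token cost of a single call, in USD."""
    return tokens / 1_000_000 * usd_per_million


SPEC_TOKENS = 1_200_000  # approximate size of the raw spec quoted above
CALLS_PER_DAY = 50       # assumption: figure used in the monthly estimate later on

per_call = input_cost_usd(SPEC_TOKENS)
per_day = per_call * CALLS_PER_DAY
print(f"per call: ${per_call:.2f}, per day: ${per_day:.2f}")
```

Swapping in your own token count and provider rate gives a quick estimate before you send anything.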
Understanding Why Raw Specs Are Bloated
OpenAPI specifications are designed for humans and code generators, not LLMs. They contain:
- Verbose descriptions meant for documentation sites
- Example requests and responses
- Full schema definitions with redundant patterns
- Vendor extensions (x- prefixed fields)
- Deprecated endpoints that are still documented
When I examined a single endpoint in Stripe’s spec:
    Endpoint: GET /v1/charges
    ├── Description: 150+ tokens of human-readable docs
    ├── Parameters: 20+ parameters with examples
    ├── Response schema: Full object definition (200+ tokens)
    ├── Examples: 3 example responses (300+ tokens)
    └── Vendor extensions: Stripe-specific metadata (50+ tokens)

    Total for ONE endpoint: ~700+ tokens

But an LLM only needs to know: the method, path, required parameters, and what the response looks like. Everything else is noise in the context window.
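To get a rough feel for how much of a spec is noise, you can walk the JSON and add up the serialized size of the fields listed above. This is a sketch under two stated assumptions: character count stands in for token count (it is only a proxy), and any key named `description`, `example`, or `examples`, or starting with `x-`, counts as noise (a schema property that happens to use one of those names would be miscounted). The names `noise_chars` and `noise_report` and the sample data, including `x-internal`, are hypothetical.

```python
import json
from collections import Counter
from typing import Any

NOISE_KEYS = {"description", "example", "examples"}


def noise_chars(node: Any, totals: Counter) -> None:
    """Accumulate the serialized size of noise fields, grouped by category."""
    if isinstance(node, dict):
        for key, value in node.items():
            size = len(json.dumps(value))
            if key in NOISE_KEYS:
                totals[key] += size
            elif key.startswith("x-"):
                totals["x-*"] += size
            else:
                noise_chars(value, totals)
    elif isinstance(node, list):
        for item in node:
            noise_chars(item, totals)


def noise_report(spec: dict[str, Any]) -> dict[str, float]:
    """Share of the serialized spec taken up by each noise category."""
    total = len(json.dumps(spec))
    totals: Counter = Counter()
    noise_chars(spec, totals)
    return {key: round(size / total, 3) for key, size in totals.items()}


# Tiny made-up fragment, not taken from the real Stripe spec
sample = {
    "paths": {
        "/v1/charges": {
            "get": {
                "summary": "List all charges",
                "description": "Returns a list of charges you have previously created.",
                "x-internal": {"owner": "payments"},
                "parameters": [{"name": "limit", "in": "query", "example": 10}],
            }
        }
    }
}
report = noise_report(sample)
print(report)
```

Running it against a real spec shows which categories are worth stripping first.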
The Solution: Compile Specs for LLMs
I built a compiler that strips OpenAPI specs down to what an LLM actually needs:
    import json
    from typing import Any

    def compile_openapi_for_llm(raw_spec: dict[str, Any]) -> dict[str, Any]:
        """
        Compile OpenAPI spec for efficient LLM consumption.
        Removes examples, verbose descriptions, and non-essential metadata.
        """
        info = raw_spec.get("info", {})
        servers = raw_spec.get("servers") or [{}]
        compiled = {
            "api_name": info.get("title", "API"),
            "version": info.get("version", ""),
            "base_url": servers[0].get("url", ""),
            "endpoints": []
        }

        for path, methods in raw_spec.get("paths", {}).items():
            for method, spec in methods.items():
                # Path items can also hold non-operation keys such as "parameters"
                if method.lower() not in ["get", "post", "put", "delete", "patch"]:
                    continue

                # Skip $ref parameters, which have no inline "name"
                params = [p for p in spec.get("parameters", []) if "name" in p]

                # Extract only essential info
                endpoint = {
                    "method": method.upper(),
                    "path": path,
                    "summary": spec.get("summary", ""),
                    "required_params": [
                        p["name"] for p in params if p.get("required", False)
                    ],
                    "optional_params": [
                        p["name"] for p in params if not p.get("required", False)
                    ][:5],  # Limit optional params
                    "response_type": _extract_response_type(spec)
                }
                compiled["endpoints"].append(endpoint)

        return compiled

    def _extract_response_type(spec: dict[str, Any]) -> str:
        """Extract simplified response type from spec."""
        responses = spec.get("responses", {})
        # Prefer the 200 response, fall back to 201
        success = responses.get("200", responses.get("201", {}))
        content = success.get("content", {})
        if "application/json" in content:
            schema = content["application/json"].get("schema", {})
            return schema.get("type", "object")
        return "object"

The compiled output looks like this:
    {
      "api_name": "Stripe API",
      "version": "2024-01-01",
      "endpoints": [
        {
          "method": "GET",
          "path": "/v1/charges",
          "summary": "List all charges",
          "required_params": [],
          "optional_params": ["limit", "starting_after", "ending_before"],
          "response_type": "object"
        }
      ]
    }

The Results: Measurable ROI
After implementing this approach, I measured the difference:
                         Raw Spec         Compiled Spec      Improvement
    ────────────────────────────────────────────────────────────────────
    Token count:         1,200,000        120,000            90% reduction
    Cost per call:       $3.60            $0.36              $3.24 savings
    Response time:       12.4 seconds     8.8 seconds        29% faster
    Context remaining:   ~8,000 tokens    ~188,000 tokens    23x more space
    ────────────────────────────────────────────────────────────────────

Over a month of development, the savings were significant:

    API calls per day:         50
    Working days:              22
    Monthly calls:             1,100

    Cost with raw specs:       $3,960 ($3.60 × 1,100)
    Cost with compiled specs:  $396 ($0.36 × 1,100)
    Monthly savings:           $3,564

    Time saved per call:       3.6 seconds
    Total time saved:          66 minutes/month

Common Mistakes I Made
Mistake 1: Including Every Endpoint
Initially, I compiled the entire Stripe spec. But my app only used payment intents and customers:
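    # BAD: Compile everything
    compiled = compile_openapi_for_llm(full_stripe_spec)

    # GOOD: Filter to what you need
    # The strings must match the spec's path keys exactly; Stripe's spec
    # names these path parameters {intent} and {customer}, not {id}.
    needed_paths = [
        "/v1/payment_intents",
        "/v1/payment_intents/{intent}",
        "/v1/customers",
        "/v1/customers/{customer}"
    ]

    # Keep "info" and "servers" so the compiled spec retains its name and base URL
    filtered_spec = {
        **full_stripe_spec,
        "paths": {
            k: v for k, v in full_stripe_spec["paths"].items()
            if k in needed_paths
        }
    }
    compiled = compile_openapi_for_llm(filtered_spec)

Mistake 2: Not Caching Compiled Specs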
I re-compiled the spec on every request. Bad idea:
    import json
    from functools import lru_cache

    @lru_cache(maxsize=10)
    def get_compiled_spec(raw_spec_text: str) -> dict:
        """Cache compiled specs to avoid re-processing."""
        # lru_cache keys on the raw spec text, so identical content
        # is parsed and compiled only once per process
        return compile_openapi_for_llm(json.loads(raw_spec_text))

    # Better: Pre-compile and store
    def precompile_and_save(raw_spec_path: str, output_path: str) -> None:
        with open(raw_spec_path) as f:
            raw_spec = json.load(f)

        compiled = compile_openapi_for_llm(raw_spec)

        with open(output_path, 'w') as f:
            json.dump(compiled, f, indent=2)

Mistake 3: Keeping Example Responses
Examples are helpful for humans but consume tokens without benefit to LLMs:
    from typing import Any

    def strip_examples(spec: Any) -> Any:
        """Recursively remove examples from an OpenAPI spec."""
        if isinstance(spec, dict):
            return {
                k: strip_examples(v)
                for k, v in spec.items()
                if k not in ["example", "examples", "x-code-samples"]
            }
        elif isinstance(spec, list):
            return [strip_examples(item) for item in spec]
        # Scalars (strings, numbers, booleans, None) pass through unchanged
        return spec

How to Verify Your Compilation Works
I test compiled specs with a simple prompt:
    Given this compiled API spec:
    {compiled_spec}

    Write code to list the first 10 charges.

    Expected behavior: The model should correctly identify the endpoint,
    method, and relevant parameters without asking for clarification.

If the model asks questions like “What’s the endpoint URL?” or “What parameters are available?”, the compilation is too aggressive. If it works correctly, the spec is properly optimized.
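Before spending tokens on that prompt test, a cheap structural check can catch the most obvious form of over-trimming: an endpoint your app needs that never made it into the compiled output. The sketch below assumes the compiled format shown earlier (an `endpoints` list of objects with `method` and `path`). `missing_endpoints` is a hypothetical helper, and the `needed` list is made up for illustration.

```python
from typing import Any


def missing_endpoints(
    compiled: dict[str, Any], needed: list[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Return the (method, path) pairs from `needed` that the compiled spec lacks."""
    present = {(e["method"], e["path"]) for e in compiled.get("endpoints", [])}
    return [pair for pair in needed if pair not in present]


compiled = {
    "api_name": "Stripe API",
    "endpoints": [
        {"method": "GET", "path": "/v1/charges", "summary": "List all charges"},
    ],
}

# Illustrative list of endpoints the application depends on
needed = [("GET", "/v1/charges"), ("POST", "/v1/refunds")]

missing = missing_endpoints(compiled, needed)
print(missing)
```

An empty result does not prove the spec is sufficient, so the prompt test above is still the real check. A non-empty one tells you the filter or the compiler dropped something you need.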
Related Knowledge
This approach connects to broader LLM optimization strategies:
- Context window management: Every token you save on API specs is a token available for user queries and conversation history
- Prompt engineering: Structured, concise data formats work better than verbose documentation
- Cost monitoring: Track token usage per feature to identify optimization opportunities
- Schema compression: Similar techniques apply to database schemas, type definitions, and other structured data
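For the cost-monitoring point, per-feature tracking can be as small as a ledger that you update with the input token count your provider reports for each call. This is a minimal in-memory sketch. `TokenLedger` and the feature names are hypothetical, and a real setup would likely persist the numbers somewhere.

```python
from collections import defaultdict


class TokenLedger:
    """Accumulates input-token usage per feature."""

    def __init__(self) -> None:
        self._tokens: defaultdict[str, int] = defaultdict(int)
        self._calls: defaultdict[str, int] = defaultdict(int)

    def record(self, feature: str, input_tokens: int) -> None:
        """Add one call's input-token count to the feature's running total."""
        self._tokens[feature] += input_tokens
        self._calls[feature] += 1

    def report(self) -> list[tuple[str, int, float]]:
        """Rows of (feature, total tokens, mean tokens per call), largest total first."""
        rows = [
            (feature, total, total / self._calls[feature])
            for feature, total in self._tokens.items()
        ]
        return sorted(rows, key=lambda row: row[1], reverse=True)


ledger = TokenLedger()
ledger.record("payments_integration", 120_000)
ledger.record("payments_integration", 118_000)
ledger.record("support_chat", 4_000)

rows = ledger.report()
for feature, total, mean in rows:
    print(f"{feature}: {total} tokens, {mean:.0f} per call")
```

The feature at the top of the report is the first candidate for spec compilation or filtering.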
Reference Implementation
Here’s a complete script to compile any OpenAPI spec:
    import json
    import sys

    def compile_openapi(input_path: str, output_path: str) -> None:
        """Compile OpenAPI spec for LLM consumption."""
        with open(input_path) as f:
            raw_spec = json.load(f)

        compiled = {
            "api_name": raw_spec.get("info", {}).get("title", "API"),
            "version": raw_spec.get("info", {}).get("version", ""),
            "endpoints": []
        }

        for path, methods in raw_spec.get("paths", {}).items():
            for method, spec in methods.items():
                # Path items can also hold non-operation keys such as "parameters"
                if method.lower() not in ["get", "post", "put", "delete", "patch"]:
                    continue

                # Skip $ref parameters, which have no inline "name"
                params = [p for p in spec.get("parameters", []) if "name" in p]

                endpoint = {
                    "method": method.upper(),
                    "path": path,
                    "summary": spec.get("summary", "")[:100],  # Truncate long summaries
                    "params": [
                        {
                            "name": p["name"],
                            "required": p.get("required", False),
                            "type": p.get("schema", {}).get("type", "any")
                        }
                        for p in params[:10]  # Limit params
                    ]
                }
                compiled["endpoints"].append(endpoint)

        with open(output_path, 'w') as f:
            json.dump(compiled, f, indent=2)

        print(f"Compiled {len(compiled['endpoints'])} endpoints")
        print(f"Output saved to: {output_path}")

    if __name__ == "__main__":
        if len(sys.argv) != 3:
            print("Usage: python openapi_compiler.py <input.json> <output.json>")
            sys.exit(1)

        compile_openapi(sys.argv[1], sys.argv[2])

When Not to Compile
Compilation isn’t always the right choice:
- Exploratory development: When learning a new API, the full spec’s descriptions help you understand capabilities
- Complex schemas: If your API has deeply nested objects, over-trimming may remove critical information
- One-time scripts: The compilation effort may exceed the cost savings for infrequent use
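The one-time-script case comes down to a break-even calculation: how many calls it takes for the per-call savings to cover the effort of building and checking the compiler. Here is a rough sketch using the per-call costs measured above. The setup cost (two hours at $75 per hour) is a made-up assumption, and `break_even_calls` is a hypothetical helper.

```python
import math


def break_even_calls(setup_cost_usd: float, raw_cost: float, compiled_cost: float) -> int:
    """Smallest number of calls at which compiling pays for itself."""
    saving_per_call = raw_cost - compiled_cost
    if saving_per_call <= 0:
        raise ValueError("compiled spec must be cheaper per call than the raw spec")
    return math.ceil(setup_cost_usd / saving_per_call)


SETUP_COST = 2 * 75.0  # assumption: two hours of work at $75/hour
calls = break_even_calls(SETUP_COST, raw_cost=3.60, compiled_cost=0.36)
print(f"Compilation pays for itself after {calls} calls")
```

If a script will make fewer calls than that in its lifetime, pasting a filtered slice of the raw spec may be the more economical choice.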
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.