How to Reduce Token Usage When Using APIs with LLMs: A Practical Guide

I burned through my API budget in a week. The culprit? Pasting raw OpenAPI specifications directly into my LLM prompts.

The Problem: Million-Token Specs

I was building an integration with Stripe’s API. I grabbed their OpenAPI spec and did the naive thing:

naive_approach.py
import json
# Load the full Stripe OpenAPI spec
with open('stripe-api-openapi.json') as f:
    raw_spec = json.load(f)
# Pass it directly to Claude
prompt = f"""
Based on this API spec, create an integration to handle payments:
{json.dumps(raw_spec)}
"""

The Stripe OpenAPI spec is approximately 1.2 million tokens. At $3 per million input tokens, that’s $3.60 per API call just for the spec. I was making dozens of calls a day during development. The costs added up fast, and the response times were sluggish.
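The arithmetic behind that figure is simple enough to keep in a helper. This is a minimal sketch; the function name is my own, and the price is the $3 per million input tokens quoted above, which you should replace with your model's current rate.

```python
def cost_per_call(prompt_tokens: int, usd_per_million_tokens: float) -> float:
    """Input cost in USD of sending `prompt_tokens` tokens once."""
    return prompt_tokens / 1_000_000 * usd_per_million_tokens


# Figures from this article: a 1.2M-token spec at $3 per million input tokens.
spec_cost = cost_per_call(1_200_000, 3.00)
print(f"${spec_cost:.2f} per call")  # prints "$3.60 per call"
```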

Understanding Why Raw Specs Are Bloated

OpenAPI specifications are designed for humans and code generators, not LLMs. They contain:

  • Verbose descriptions meant for documentation sites
  • Example requests and responses
  • Full schema definitions with redundant patterns
  • Vendor extensions (x- prefixed fields)
  • Deprecated endpoints that are still documented

When I examined a single endpoint in Stripe’s spec, the breakdown looked like this:

single-endpoint-analysis.txt
Endpoint: GET /v1/charges
├── Description: 150+ tokens of human-readable docs
├── Parameters: 20+ parameters with examples
├── Response schema: Full object definition (200+ tokens)
├── Examples: 3 example responses (300+ tokens)
└── Vendor extensions: Stripe-specific metadata (50+ tokens)
Total for ONE endpoint: ~700+ tokens

But an LLM only needs to know: the method, path, required parameters, and what the response looks like. Everything else is noise in the context window.
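To see where the tokens go in your own spec, a rough per-field breakdown helps. The sketch below estimates size from JSON length using an assumed ratio of about four characters per token. That ratio is a heuristic, not a real tokenizer, and the sample operation object is hypothetical and heavily shortened.

```python
import json
from typing import Any

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(value: Any) -> int:
    """Approximate the token count of a value from its compact JSON length."""
    return len(json.dumps(value, separators=(",", ":"))) // CHARS_PER_TOKEN


def field_breakdown(operation: dict[str, Any]) -> list[tuple[str, int]]:
    """Return (field, estimated tokens) pairs, largest first."""
    sizes = [(key, estimate_tokens(value)) for key, value in operation.items()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


# Hypothetical, heavily shortened operation object for illustration.
operation = {
    "summary": "List all charges",
    "description": "Returns a list of charges you have previously created. " * 3,
    "parameters": [{"name": "limit", "in": "query", "required": False}],
    "x-internal-note": {"owner": "payments"},
}

for field, tokens in field_breakdown(operation):
    print(f"{field:<16} ~{tokens} tokens")
```

For exact counts, swap `estimate_tokens` for the tokenizer that matches your model.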

The Solution: Compile Specs for LLMs

I built a compiler that strips OpenAPI specs down to what an LLM actually needs:

compile_spec.py
import json
from typing import Any
def compile_openapi_for_llm(raw_spec: dict[str, Any]) -> dict[str, Any]:
    """
    Compile OpenAPI spec for efficient LLM consumption.
    Removes examples, verbose descriptions, and non-essential metadata.
    """
    compiled = {
        "api_name": raw_spec.get("info", {}).get("title", "API"),
        "version": raw_spec.get("info", {}).get("version", ""),
        "base_url": raw_spec.get("servers", [{}])[0].get("url", "") if raw_spec.get("servers") else "",
        "endpoints": []
    }
    for path, methods in raw_spec.get("paths", {}).items():
        for method, spec in methods.items():
            # Skip non-operation keys such as path-level "parameters"
            if method.lower() not in ["get", "post", "put", "delete", "patch"]:
                continue
            # Skip $ref parameters, which carry no inline "name"
            params = [p for p in spec.get("parameters", []) if "name" in p]
            # Extract only essential info
            endpoint = {
                "method": method.upper(),
                "path": path,
                "summary": spec.get("summary", ""),
                "required_params": [
                    p["name"]
                    for p in params
                    if p.get("required", False)
                ],
                "optional_params": [
                    p["name"]
                    for p in params
                    if not p.get("required", False)
                ][:5],  # Limit optional params
                "response_type": _extract_response_type(spec)
            }
            compiled["endpoints"].append(endpoint)
    return compiled
def _extract_response_type(spec: dict[str, Any]) -> str:
    """Extract simplified response type from spec."""
    responses = spec.get("responses", {})
    success = responses.get("200", responses.get("201", {}))
    content = success.get("content", {})
    if "application/json" in content:
        schema = content["application/json"].get("schema", {})
        # Schemas given only as a $ref have no "type", so fall back to "object"
        return schema.get("type", "object")
    return "object"

The compiled output looks like this:

compiled_endpoint.json
{
  "api_name": "Stripe API",
  "version": "2024-01-01",
  "base_url": "https://api.stripe.com/",
  "endpoints": [
    {
      "method": "GET",
      "path": "/v1/charges",
      "summary": "List all charges",
      "required_params": [],
      "optional_params": ["limit", "starting_after", "ending_before"],
      "response_type": "object"
    }
  ]
}
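JSON still spends tokens on quotes, braces, and repeated key names. One option, sketched here as an assumption worth measuring with your own tokenizer rather than a guaranteed win, is to render each compiled endpoint as a single line of text before it goes into the prompt. The `render_endpoint` and `render_spec` names and the line format are my own.

```python
from typing import Any


def render_endpoint(endpoint: dict[str, Any]) -> str:
    """Render one compiled endpoint as a single prompt line."""
    required = ", ".join(endpoint.get("required_params", [])) or "-"
    optional = ", ".join(endpoint.get("optional_params", [])) or "-"
    return (
        f"{endpoint['method']} {endpoint['path']}: {endpoint.get('summary', '')} "
        f"[required: {required}] [optional: {optional}] "
        f"-> {endpoint.get('response_type', 'object')}"
    )


def render_spec(compiled: dict[str, Any]) -> str:
    """Render a compiled spec as a header line plus one line per endpoint."""
    header = f"{compiled.get('api_name', 'API')} {compiled.get('version', '')}".strip()
    lines = [render_endpoint(e) for e in compiled.get("endpoints", [])]
    return "\n".join([header, *lines])


compiled = {
    "api_name": "Stripe API",
    "version": "2024-01-01",
    "endpoints": [
        {
            "method": "GET",
            "path": "/v1/charges",
            "summary": "List all charges",
            "required_params": [],
            "optional_params": ["limit", "starting_after", "ending_before"],
            "response_type": "object",
        }
    ],
}
print(render_spec(compiled))
```

Run the verification prompt described later in this article against both formats before committing to one.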

The Results: Measurable ROI

After implementing this approach, I measured the difference:

token_comparison.txt
                    Raw Spec          Compiled Spec      Improvement
────────────────────────────────────────────────────────────────────
Token count:        1,200,000         120,000            90% reduction
Cost per call:      $3.60             $0.36              $3.24 savings
Response time:      12.4 seconds      8.8 seconds        29% faster
Context remaining:  ~8,000 tokens     ~188,000 tokens    23x more space
────────────────────────────────────────────────────────────────────

Over a month of development, the savings were significant:

monthly_savings.txt
API calls per day: 50
Working days: 22
Monthly calls: 1,100
Cost with raw specs: $3,960 ($3.60 × 1,100)
Cost with compiled specs: $396 ($0.36 × 1,100)
Monthly savings: $3,564
Time saved per call: 3.6 seconds
Total time saved: 66 minutes/month
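The monthly figures follow directly from the per-call numbers. The sketch below reproduces them so you can plug in your own call volume. The function and parameter names are illustrative; the default inputs are the ones reported in this article.

```python
def monthly_projection(
    calls_per_day: int = 50,
    working_days: int = 22,
    raw_cost_per_call: float = 3.60,       # USD, raw spec
    compiled_cost_per_call: float = 0.36,  # USD, compiled spec
    seconds_saved_per_call: float = 12.4 - 8.8,
) -> dict[str, float]:
    """Project monthly cost and time savings from per-call figures."""
    monthly_calls = calls_per_day * working_days
    raw_cost = raw_cost_per_call * monthly_calls
    compiled_cost = compiled_cost_per_call * monthly_calls
    return {
        "monthly_calls": monthly_calls,
        "raw_cost": round(raw_cost, 2),
        "compiled_cost": round(compiled_cost, 2),
        "savings": round(raw_cost - compiled_cost, 2),
        "minutes_saved": round(seconds_saved_per_call * monthly_calls / 60, 1),
    }


for name, value in monthly_projection().items():
    print(f"{name}: {value}")
```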

Common Mistakes I Made

Mistake 1: Including Every Endpoint

Initially, I compiled the entire Stripe spec. But my app only used payment intents and customers:

selective_compilation.py
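# BAD: Compile everything
compiled = compile_openapi_for_llm(full_stripe_spec)
# GOOD: Filter to what you need
# Keys must match the spec's path templates exactly
# (Stripe's spec uses {intent} and {customer}, not {id})
needed_paths = {
    "/v1/payment_intents",
    "/v1/payment_intents/{intent}",
    "/v1/customers",
    "/v1/customers/{customer}",
}
filtered_spec = {
    # Keep info and servers so api_name, version, and base_url survive
    "info": full_stripe_spec.get("info", {}),
    "servers": full_stripe_spec.get("servers", []),
    "paths": {
        k: v for k, v in full_stripe_spec["paths"].items()
        if k in needed_paths
    },
}
compiled = compile_openapi_for_llm(filtered_spec)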
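Listing exact path templates gets tedious once a resource has many sub-paths. A prefix-based filter is one alternative. This is a sketch under my own naming (`filter_paths_by_prefix` is not part of any library), and it keeps only the `info`, `servers`, and `paths` sections, which is all a compiler like the one in this article needs.

```python
from typing import Any


def filter_paths_by_prefix(spec: dict[str, Any], prefixes: list[str]) -> dict[str, Any]:
    """Return a reduced spec keeping only paths under the given prefixes."""

    def matches(path: str) -> bool:
        # Match the prefix itself or a sub-path, but not a sibling such as
        # "/v1/customer_sessions" when the prefix is "/v1/customers".
        return any(path == p or path.startswith(p + "/") for p in prefixes)

    filtered: dict[str, Any] = {
        key: spec[key] for key in ("info", "servers") if key in spec
    }
    filtered["paths"] = {
        path: item for path, item in spec.get("paths", {}).items() if matches(path)
    }
    return filtered


# Tiny hypothetical spec for illustration.
demo_spec = {
    "info": {"title": "Demo API", "version": "1"},
    "paths": {
        "/v1/customers": {},
        "/v1/customers/{customer}": {},
        "/v1/customer_sessions": {},
        "/v1/charges": {},
    },
}
print(sorted(filter_paths_by_prefix(demo_spec, ["/v1/customers"])["paths"]))
```

Note that a prefix also pulls in every nested sub-resource under it, so check the resulting endpoint count before assuming the spec got smaller.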

Mistake 2: Not Caching Compiled Specs

I re-compiled the spec on every request, repeating identical work each time. Cache the result or, better, pre-compile once and store it:

caching_approach.py
import hashlib
import json
_compiled_cache: dict[str, dict] = {}
def get_compiled_spec(raw_spec_text: str) -> dict:
    """Cache compiled specs to avoid re-processing."""
    # Key the cache on a hash of the original spec content
    spec_hash = hashlib.sha256(raw_spec_text.encode("utf-8")).hexdigest()
    if spec_hash not in _compiled_cache:
        raw_spec = json.loads(raw_spec_text)
        _compiled_cache[spec_hash] = compile_openapi_for_llm(raw_spec)
    return _compiled_cache[spec_hash]
# Better: Pre-compile and store
def precompile_and_save(raw_spec_path: str, output_path: str) -> None:
    with open(raw_spec_path) as f:
        raw_spec = json.load(f)
    compiled = compile_openapi_for_llm(raw_spec)
    with open(output_path, 'w') as f:
        json.dump(compiled, f, indent=2)
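A stored compiled spec goes stale when the upstream spec is updated. One simple guard, sketched here with a hypothetical `load_or_compile` helper, is to recompile only when the raw file is newer than the saved output. The compile step is passed in as `compile_fn` so the sketch does not depend on any particular compiler.

```python
import json
from pathlib import Path
from typing import Any, Callable

Spec = dict[str, Any]


def load_or_compile(
    raw_path: str,
    compiled_path: str,
    compile_fn: Callable[[Spec], Spec],
) -> Spec:
    """Return the saved compiled spec, recompiling only if it is missing or stale."""
    raw, out = Path(raw_path), Path(compiled_path)
    # Reuse the saved output only if it is at least as new as the raw spec.
    if out.exists() and out.stat().st_mtime >= raw.stat().st_mtime:
        return json.loads(out.read_text())
    compiled = compile_fn(json.loads(raw.read_text()))
    out.write_text(json.dumps(compiled, indent=2))
    return compiled
```

Modification times are a cheap signal but not a perfect one. If your spec is fetched by a tool that rewrites timestamps, comparing content hashes is the safer choice.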

Mistake 3: Keeping Example Responses

Examples are helpful for humans but rarely justify their token cost in an LLM prompt, so strip them:

remove_examples.py
from typing import Any
def strip_examples(spec: Any) -> Any:
    """Recursively remove example fields from an OpenAPI spec."""
    # Note: this also drops any schema property that happens to be named "example"
    if isinstance(spec, dict):
        return {
            k: strip_examples(v)
            for k, v in spec.items()
            if k not in ["example", "examples", "x-code-samples"]
        }
    elif isinstance(spec, list):
        return [strip_examples(item) for item in spec]
    return spec

How to Verify Your Compilation Works

I test compiled specs with a simple prompt:

verification_prompt.txt
Given this compiled API spec:
{compiled_spec}
Write code to list the first 10 charges.
Expected behavior: The model should correctly identify the endpoint,
method, and relevant parameters without asking for clarification.

If the model asks questions like “What’s the endpoint URL?” or “What parameters are available?”, the compilation is too aggressive. If it works correctly, the spec is properly optimized.
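This check can be partly automated. The sketch below builds the verification prompt and applies a crude text heuristic to the model's reply. It deliberately makes no API call: you pass in whatever response text your client returned. The helper names and the list of clarification phrases are my own assumptions, and a passing check is only a hint that the reply is grounded, not proof that the generated code is correct.

```python
import json
from typing import Any

# Assumed phrases that suggest the model lacked information; extend as needed.
CLARIFICATION_MARKERS = (
    "what's the endpoint",
    "what is the endpoint",
    "what parameters",
    "could you provide",
    "please provide",
)


def build_verification_prompt(compiled: dict[str, Any], task: str) -> str:
    """Build the verification prompt from a compiled spec and a task sentence."""
    spec_json = json.dumps(compiled, separators=(",", ":"))
    return f"Given this compiled API spec:\n{spec_json}\n{task}"


def looks_grounded(response: str, method: str, path: str) -> bool:
    """True if the reply names the expected call and asks no clarifying question."""
    text = response.lower()
    names_call = path.lower() in text and method.lower() in text
    asks_for_help = any(marker in text for marker in CLARIFICATION_MARKERS)
    return names_call and not asks_for_help


prompt = build_verification_prompt(
    {"endpoints": [{"method": "GET", "path": "/v1/charges"}]},
    "Write code to list the first 10 charges.",
)
print(prompt)
```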

This approach connects to broader LLM optimization strategies:

  • Context window management: Every token you save on API specs is a token available for user queries and conversation history
  • Prompt engineering: Structured, concise data formats work better than verbose documentation
  • Cost monitoring: Track token usage per feature to identify optimization opportunities
  • Schema compression: Similar techniques apply to database schemas, type definitions, and other structured data
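For the cost monitoring point, even a small in-memory counter is enough to show which feature consumes the most input tokens. This is a minimal sketch: the `TokenUsageTracker` class and the feature names are hypothetical, and the token counts would come from the usage data your LLM client reports.

```python
from collections import defaultdict


class TokenUsageTracker:
    """Accumulate input-token counts per feature."""

    def __init__(self) -> None:
        self._tokens: dict[str, int] = defaultdict(int)
        self._calls: dict[str, int] = defaultdict(int)

    def record(self, feature: str, input_tokens: int) -> None:
        """Record one call made on behalf of `feature`."""
        self._tokens[feature] += input_tokens
        self._calls[feature] += 1

    def report(self) -> list[tuple[str, int, float]]:
        """Return (feature, total tokens, average per call), largest total first."""
        rows = [
            (feature, total, total / self._calls[feature])
            for feature, total in self._tokens.items()
        ]
        return sorted(rows, key=lambda row: row[1], reverse=True)


tracker = TokenUsageTracker()
tracker.record("spec_lookup", 120_000)
tracker.record("spec_lookup", 120_000)
tracker.record("chat", 2_000)

for feature, total, average in tracker.report():
    print(f"{feature:<12} total={total:>8} avg={average:>10.1f}")
```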

Reference Implementation

Here’s a complete script to compile any OpenAPI spec:

openapi_compiler.py
import json
import sys
def compile_openapi(input_path: str, output_path: str) -> None:
    """Compile OpenAPI spec for LLM consumption."""
    with open(input_path) as f:
        raw_spec = json.load(f)
    compiled = {
        "api_name": raw_spec.get("info", {}).get("title", "API"),
        "version": raw_spec.get("info", {}).get("version", ""),
        "endpoints": []
    }
    for path, methods in raw_spec.get("paths", {}).items():
        for method, spec in methods.items():
            # Skip non-operation keys such as path-level "parameters"
            if method.lower() not in ["get", "post", "put", "delete", "patch"]:
                continue
            endpoint = {
                "method": method.upper(),
                "path": path,
                "summary": spec.get("summary", "")[:100],  # Truncate long summaries
                "params": [
                    {
                        "name": p["name"],
                        "required": p.get("required", False),
                        "type": p.get("schema", {}).get("type", "any")
                    }
                    for p in spec.get("parameters", [])[:10]  # Limit params
                    if "name" in p  # Skip $ref parameters without an inline name
                ]
            }
            compiled["endpoints"].append(endpoint)
    with open(output_path, 'w') as f:
        json.dump(compiled, f, indent=2)
    print(f"Compiled {len(compiled['endpoints'])} endpoints")
    print(f"Output saved to: {output_path}")
if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python openapi_compiler.py <input.json> <output.json>")
        sys.exit(1)
    compile_openapi(sys.argv[1], sys.argv[2])

When Not to Compile

Compilation isn’t always the right choice:

  • Exploratory development: When learning a new API, the full spec’s descriptions help you understand capabilities
  • Complex schemas: If your API has deeply nested objects, over-trimming may remove critical information
  • One-time scripts: The compilation effort may exceed the cost savings for infrequent use
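For the one-time script case, a quick break-even estimate makes the decision concrete. The sketch below uses the $3.24 per-call saving measured earlier in this article. The $100 setup cost is a purely hypothetical value for your own time, and the function name is illustrative.

```python
import math


def break_even_calls(setup_cost_usd: float, savings_per_call_usd: float) -> int:
    """Number of calls after which compiling the spec has paid for itself."""
    if savings_per_call_usd <= 0:
        raise ValueError("savings_per_call_usd must be positive")
    return math.ceil(setup_cost_usd / savings_per_call_usd)


# Hypothetical: setup effort valued at $100, saving $3.24 per call.
print(break_even_calls(100.0, 3.24))  # prints 31
```

If you expect fewer calls than the break-even number, pasting the relevant slice of the raw spec is probably fine.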

Final Words + More Resources

My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to get in touch, you can contact me by email.
