How I Used MCP as Middleware to Fix Legacy API Integration for AI
Problem
I had a legacy ERP API at work that returned way too much data. My AI agent needed three fields: customer name, email, and account status. The API returned 50+ fields including audit logs, internal IDs, historical tracking data, and metadata I never asked for.
    # API response size
    curl https://legacy-erp.company.com/api/customers/12345 | wc -c
    # Output: 15,420 bytes of JSON

When I plugged this directly into my AI agent:

    # Token usage for a single customer lookup
    Input tokens: 2,847 (the bloated API response)
    Output tokens: 89
    Total tokens: 2,936

    # At $3/million tokens, this adds up fast
    # 1000 lookups = $8.81 just for wasted context

And here’s the kicker: I couldn’t get the API changed. When I asked the API team, they said legacy users depend on the current response format. Changes require approval from three teams. It’s like an act of parliament at this point.
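The $8.81 figure follows from simple arithmetic on the token counts above. A minimal sketch that reproduces it, using the numbers quoted in this section (the function name `lookup_cost` is my own, purely for illustration):

```python
def lookup_cost(input_tokens: int, output_tokens: int,
                lookups: int, usd_per_million: float) -> float:
    """Return the total cost in USD for a number of identical lookups."""
    total_tokens = (input_tokens + output_tokens) * lookups
    return total_tokens / 1_000_000 * usd_per_million


# Numbers from the single-lookup measurement above
cost = lookup_cost(input_tokens=2_847, output_tokens=89,
                   lookups=1_000, usd_per_million=3.0)
print(f"${cost:.2f}")  # $8.81
```

Swapping in your own token counts and price makes it easy to see whether an endpoint is worth wrapping at all.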
My First Attempt: Client-Side Filtering
I tried filtering the data on the client side after receiving it:
    import httpx

    async def get_customer_info(customer_id: str) -> dict:
        # Call legacy API (httpx.get is synchronous, so use AsyncClient here)
        async with httpx.AsyncClient() as client:
            response = await client.get(
                f"https://legacy-erp.company.com/api/customers/{customer_id}"
            )
        full_data = response.json()

        # Filter client-side
        return {
            "name": full_data["cust_nm"],
            "email": full_data["contact_info"]["primary_email"],
            "status": "active" if full_data["acct_stat_cd"] == "A" else "inactive"
        }

This worked, but the wasted tokens were already consumed. The AI still processed all 2,847 tokens before I could filter them out. The cost and latency remained the same.
The Real Problem
The issue wasn’t just one endpoint. My AI agent needed to:
- Get order details (5 separate API calls)
- Calculate totals server-side
- Handle quirky field names (cust_nm instead of name)
- Deal with inconsistent availability fields across API versions
Direct API integration was a mess:
    +-------------+         +------------------+
    |  AI Agent   |-------->|  Legacy API #1   |  (50+ fields)
    +-------------+         +------------------+
           |                         |
           |                         v
           |                +------------------+
           +--------------->|  Legacy API #2   |  (quirky formats)
                            +------------------+
                                     |
                                     v
                            +------------------+
                            |  Legacy API #3   |  (needs 3 calls for 1 entity)
                            +------------------+

    Problem: AI burns tokens on irrelevant data

Solution: MCP as Middleware
I built an MCP server that sits between my AI agent and the legacy APIs. The MCP server does the filtering and aggregation before the AI ever sees the data.
    +-------------+         +------------------+        +------------------+
    |  AI Agent   |<------->|   MCP Server     |<------>|  Legacy API #1   |
    +-------------+         |  (middleware)    |        +------------------+
                            +------------------+                 |
                                     |                           v
                                     |                  +------------------+
                                     +----------------->|  Legacy API #2   |
                                                        +------------------+

    Key: MCP filters BEFORE tokens are consumed

Here’s the core implementation:
    from mcp import MCPServer
    import httpx

    server = MCPServer("legacy-erp-wrapper")

    @server.tool()
    async def get_customer_info(customer_id: str) -> dict:
        """
        Get essential customer information.
        Filters legacy ERP response from 50+ fields to 3.
        """
        async with httpx.AsyncClient() as client:
            response = await client.get(
                f"https://legacy-erp.company.com/api/customers/{customer_id}",
                headers={"X-Legacy-Auth": "..."},
                timeout=10.0
            )

        full_data = response.json()

        # Middleware transformation: extract only what's needed
        return {
            "name": full_data["cust_nm"],
            "email": full_data["contact_info"]["primary_email"],
            "status": "active" if full_data["acct_stat_cd"] == "A" else "inactive"
        }

Token Savings
After implementing the MCP middleware:
    # Before: Direct API call
    API response: 2,847 tokens
    Agent reasoning: 200 tokens
    Total: 3,047 tokens

    # After: MCP middleware
    MCP response: 80 tokens
    Agent reasoning: 80 tokens
    Total: 160 tokens

    # Savings: 95%

But the real win was aggregating multiple API calls:
    @server.tool()
    async def get_order_summary(order_id: str) -> dict:
        """
        Get consolidated order summary.
        Reduces 5 legacy API calls to 1 response.
        """
        # Relative paths need a base_url; this assumes the same host as above
        async with httpx.AsyncClient(base_url="https://legacy-erp.company.com") as client:
            # Legacy API is fragmented across 5 endpoints.
            # Decode each response right away so later lookups work on dicts.
            order = (await client.get(f"/legacy/orders/{order_id}")).json()
            items = (await client.get(f"/legacy/orders/{order_id}/items")).json()
            customer = (await client.get(f"/legacy/customers/{order['cust_id']}")).json()
            shipping = (await client.get(f"/legacy/shipping/{order['ship_id']}")).json()
            payments = (await client.get(f"/legacy/payments?order={order_id}")).json()

        # Server-side preprocessing (no tokens consumed)
        total = sum(item["price"] * item["qty"] for item in items)
        paid = sum(p["amount"] for p in payments if p["status"] == "completed")

        return {
            "order_number": order["order_num"],
            "customer_name": customer["name"],
            "total_amount": total,
            "amount_paid": paid,
            "balance_due": total - paid,
            "shipping_status": shipping["status"],
            "item_count": len(items)
        }

    # Before: 5 separate calls
    5 calls x 1,500 tokens = 7,500 tokens

    # After: 1 aggregated response
    1 response x 200 tokens = 200 tokens

    # Savings: 97%

Handling Legacy API Quirks
The MCP server also handles inconsistent field names and edge cases:
    @server.tool()
    async def search_inventory(query: str) -> list[dict]:
        """
        Search inventory with normalized results.
        Handles legacy API quirks internally.
        """
        # Relative paths need a base_url; this assumes the same host as above
        async with httpx.AsyncClient(base_url="https://legacy-erp.company.com") as client:
            response = await client.get(
                "/legacy/inventory/search",
                params={"q": query, "format": "json"}  # Force JSON, API defaults to XML
            )

        raw_results = response.json()

        normalized = []
        for item in raw_results.get("results", []):
            # Legacy API sometimes returns None for price
            price = item.get("price") or item.get("list_price") or 0

            # Legacy API has inconsistent availability field
            # (qty_avail may be missing or None, so fall back to 0 before comparing)
            in_stock = (
                (item.get("qty_avail") or 0) > 0
                or item.get("stock_status") == "AVAIL"
                or bool(item.get("backorder_ok"))
            )

            # Legacy API has different product types across versions
            product_type = item.get("prod_type") or item.get("product_category") or "unknown"

            normalized.append({
                "sku": item["sku"],
                "name": item["description"],
                "price": price,
                "in_stock": in_stock,
                "type": product_type.lower()
            })

        return normalized

The AI agent never needs to know about qty_avail vs stock_status vs backorder_ok. The MCP server normalizes everything.
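The normalization rules are easier to verify when they live in a pure function that needs no network access. A minimal sketch, assuming the same field names as the tool above; the helper name `normalize_item` and the sample record are made up for illustration:

```python
from typing import Any


def normalize_item(item: dict[str, Any]) -> dict[str, Any]:
    """Map one raw inventory record to the normalized shape."""
    # `or` chains treat a missing key and an explicit None the same way.
    price = item.get("price") or item.get("list_price") or 0
    in_stock = (
        (item.get("qty_avail") or 0) > 0
        or item.get("stock_status") == "AVAIL"
        or bool(item.get("backorder_ok"))
    )
    product_type = item.get("prod_type") or item.get("product_category") or "unknown"
    return {
        "sku": item["sku"],
        "name": item["description"],
        "price": price,
        "in_stock": in_stock,
        "type": str(product_type).lower(),
    }


# Hypothetical record in an "older" format: no price, no prod_type
old_style = {
    "sku": "A-1", "description": "Widget", "price": None, "list_price": 9.5,
    "qty_avail": 0, "stock_status": "AVAIL", "product_category": "Hardware",
}
print(normalize_item(old_style))
```

With the logic separated like this, the MCP tool only has to fetch the raw results and map `normalize_item` over them.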
Unexpected Benefit: Running with a Smaller Model
After implementing the MCP middleware, I could run my application with a 4b local LLM instead of GPT-4:
    # Before: Needed GPT-4 for complex parsing
    Cost: $0.03 per request
    Latency: 2-3 seconds

    # After: Local 4b model works fine
    Cost: $0 (local)
    Latency: 300-500ms
    Privacy: Data never leaves my machine

The smaller context window of a 4b model wasn’t a problem anymore because the MCP server eliminated the bloat.
Common Mistakes I Made
Mistake 1: Passthrough MCP Server
Initially, I just wrapped the API without any transformation:
    # WRONG: No value added
    @server.tool()
    async def get_customer(customer_id: str):
        return await legacy_api.get(customer_id)  # Just passes through

This added complexity without reducing tokens. The middleware must do meaningful work.
Mistake 2: Over-Abstracting
I tried to create a generic transformer with configuration:
    # WRONG: Too generic, defeats the purpose
    @server.tool()
    async def transform_api(endpoint: str, fields: list[str]):
        # This just moves the complexity to the AI
        ...

The AI ended up specifying the fields anyway. Keep MCP tools specific and purposeful.
Mistake 3: Ignoring Cache Freshness
Pre-processing can lead to stale data:
    # WRONG: No cache invalidation
    @server.tool()
    async def get_cached_data(key: str):
        return cache.get(key)  # Could be hours old

Implement appropriate cache invalidation based on your data freshness requirements.
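One simple way to bound staleness is a time-to-live (TTL) on each entry. The sketch below is not what I ran in production, just a minimal illustration; the `TTLCache` class, the 60-second TTL, and the sample key are assumptions:

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: drop it so the caller refetches
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock(), value)


# Usage with a fake clock to show an entry expiring
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("customer:12345", {"name": "Ada"})
fresh = cache.get("customer:12345")   # still within the TTL
now[0] = 61.0
stale = cache.get("customer:12345")   # past the TTL, treated as a miss
print(fresh, stale)
```

A miss (`None`) is the signal for the MCP tool to call the legacy API again and `set` the new value. How long the TTL should be depends entirely on how fresh the data has to be.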
When MCP Middleware Makes Sense
Not every API needs MCP middleware. Use it when:
- API returns 10x more data than you need
- Multiple API calls could be aggregated
- Field names are cryptic or inconsistent
- You can’t get the API changed
- You want to use smaller, cheaper models
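To check the first criterion for a given endpoint, you can compare the size of the full response with the size of the fields you actually need. A rough sketch, assuming serialized byte size is a good enough proxy for token count; `bloat_ratio` and the sample payloads are hypothetical:

```python
import json
from typing import Any


def bloat_ratio(full: dict[str, Any], needed: dict[str, Any]) -> float:
    """Return how many times larger the full payload is than the needed one."""
    full_size = len(json.dumps(full).encode("utf-8"))
    needed_size = len(json.dumps(needed).encode("utf-8"))
    return full_size / needed_size


# Made-up payloads: one useful field buried under audit data
full_payload = {"cust_nm": "Ada", "audit_log": ["entry " * 10] * 20}
needed_payload = {"name": "Ada"}
print(f"{bloat_ratio(full_payload, needed_payload):.0f}x")
```

A ratio well above 10 suggests the endpoint is a good middleware candidate.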
Summary
MCP middleware transforms legacy APIs into AI-friendly interfaces without requiring changes to the original API. The key benefits:
- Token reduction: Up to 97% fewer tokens consumed
- Cost savings: Smaller models become viable
- Latency improvement: Less data to process
- Organizational independence: No need to change legacy APIs
- Clean agent code: MCP handles the quirks
Start by identifying your top API pain points (bloated responses, multiple calls, quirky formats). These are your MCP middleware candidates.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can contact me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- Model Context Protocol Specification
- Reddit: MCP value discussion
- LangChain Documentation
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!