How I Used MCP as Middleware to Fix Legacy API Integration for AI

Mar 18, 2026

Problem

I had a legacy ERP API at work that returned way too much data. My AI agent needed three fields: customer name, email, and account status. The API returned 50+ fields including audit logs, internal IDs, historical tracking data, and metadata I never asked for.

# API response size
curl https://legacy-erp.company.com/api/customers/12345 | wc -c
# Output: 15,420 bytes of JSON

When I plugged this directly into my AI agent:

# Token usage for a single customer lookup
Input tokens:  2,847  (the bloated API response)
Output tokens:   89
Total tokens: 2,936

# At $3/million tokens, this adds up fast
# 1000 lookups = $8.81 just for wasted context
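
The arithmetic behind that figure is easy to check. The sketch below is illustrative only: the helper name `lookup_cost_usd` is made up, and it assumes the flat $3 per million tokens quoted above applies to input and output tokens alike.

```python
def lookup_cost_usd(tokens_per_lookup: int, lookups: int, usd_per_million: float = 3.0) -> float:
    """Cost of `lookups` calls at a flat per-token price (an assumption, see above)."""
    return tokens_per_lookup * lookups * usd_per_million / 1_000_000


total = lookup_cost_usd(2_936, 1_000)       # input + output tokens per lookup
input_only = lookup_cost_usd(2_847, 1_000)  # the bloated API response alone
print(f"total: ${total:.2f}, API response share: ${input_only:.2f}")
```

Under that pricing assumption, about $8.54 of the $8.81 comes from the API response itself.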

And here’s the kicker: I couldn’t get the API changed. When I asked the API team, they said legacy users depend on the current response format. Changes require approval from three teams. It’s like an act of parliament at this point.

My First Attempt: Client-Side Filtering

I tried filtering the data on the client side after receiving it:

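import httpx

async def get_customer_info(customer_id: str):
    # Call legacy API (httpx.get is synchronous, so await an AsyncClient instead)
    async with httpx.AsyncClient() as client:
        response = await client.get(f"https://legacy-erp.company.com/api/customers/{customer_id}")
    full_data = response.json()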

    # Filter client-side
    return {
        "name": full_data["cust_nm"],
        "email": full_data["contact_info"]["primary_email"],
        "status": "active" if full_data["acct_stat_cd"] == "A" else "inactive"
    }

This worked, but the wasted tokens were already consumed. The AI still processed all 2,847 tokens before I could filter them out. The cost and latency remained the same.

The Real Problem

The issue wasn’t just one endpoint. My AI agent needed to:

Get order details (5 separate API calls)
Calculate totals server-side
Handle quirky field names (cust_nm instead of name)
Deal with inconsistent availability fields across API versions

Direct API integration was a mess:

+-------------+         +------------------+
|  AI Agent   |-------->|  Legacy API #1   |  (50+ fields)
+-------------+         +------------------+
      |                         |
      |                         v
      |                  +------------------+
      +----------------->|  Legacy API #2   |  (quirky formats)
                         +------------------+
                                |
                                v
                         +------------------+
                         |  Legacy API #3   |  (needs 3 calls for 1 entity)
                         +------------------+

Problem: AI burns tokens on irrelevant data

Solution: MCP as Middleware

I built an MCP server that sits between my AI agent and the legacy APIs. The MCP server does the filtering and aggregation before the AI ever sees the data.

+-------------+         +------------------+         +------------------+
|  AI Agent   |<------->|   MCP Server     |<------>|  Legacy API #1   |
+-------------+         |   (middleware)   |         +------------------+
                        +------------------+                |
                              |                            v
                              |                     +------------------+
                              +-------------------->|  Legacy API #2   |
                                                    +------------------+

Key: MCP filters BEFORE tokens are consumed

Here’s the core implementation:

from mcp import MCPServer
import httpx

server = MCPServer("legacy-erp-wrapper")

@server.tool()
async def get_customer_info(customer_id: str) -> dict:
    """
    Get essential customer information.
    Filters legacy ERP response from 50+ fields to 3.
    """
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"https://legacy-erp.company.com/api/customers/{customer_id}",
            headers={"X-Legacy-Auth": "..."},
            timeout=10.0
        )

    # Fail loudly on 4xx/5xx instead of trying to parse an error body
    response.raise_for_status()
    full_data = response.json()

    # Middleware transformation: extract only what's needed
    return {
        "name": full_data["cust_nm"],
        "email": full_data["contact_info"]["primary_email"],
        "status": "active" if full_data["acct_stat_cd"] == "A" else "inactive"
    }
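
One way to keep a tool like this easy to test is to pull the field mapping into a pure function that takes the parsed JSON and returns the slim dict, so it can be exercised without a network call. This is a sketch under assumptions: `slim_customer` and the sample payload are hypothetical, and only the three mapped field names come from the code above.

```python
from typing import Any


def slim_customer(full_data: dict[str, Any]) -> dict[str, str]:
    """Map a raw ERP customer record to the three fields the agent needs."""
    return {
        "name": full_data["cust_nm"],
        "email": full_data["contact_info"]["primary_email"],
        "status": "active" if full_data["acct_stat_cd"] == "A" else "inactive",
    }


# Hypothetical payload: the extra keys stand in for the 50+ fields
# the agent never needs.
sample = {
    "cust_nm": "Ada Example",
    "contact_info": {"primary_email": "ada@example.com", "fax": None},
    "acct_stat_cd": "A",
    "audit_log": [{"ts": "2020-01-01", "op": "CREATE"}],
    "internal_id": 987654,
}

slim = slim_customer(sample)
print(slim)
```

The MCP tool would then shrink to "fetch, then call `slim_customer`", and the mapping can be unit-tested against captured responses.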

Token Savings

After implementing the MCP middleware:

# Before: Direct API call
API response:     2,847 tokens
Agent reasoning:    200 tokens
Total:            3,047 tokens

# After: MCP middleware
MCP response:       80 tokens
Agent reasoning:    80 tokens
Total:             160 tokens

# Savings: 95%
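
For reference, the percentage is just the relative reduction in total tokens. A minimal sketch, with `savings_pct` as a made-up helper name:

```python
def savings_pct(before: int, after: int) -> float:
    """Percentage reduction going from `before` tokens to `after` tokens."""
    return (before - after) / before * 100


print(f"{savings_pct(3_047, 160):.1f}%")
```

With the totals above this comes out at roughly 94.7%, which rounds to the 95% quoted.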

But the real win was aggregating multiple API calls:

@server.tool()
async def get_order_summary(order_id: str) -> dict:
    """
    Get consolidated order summary.
    Reduces 5 legacy API calls to 1 response.
    """
    # Assumption: the /legacy/... paths live on the same host as the customer API;
    # relative paths only work when the client has a base_url
    async with httpx.AsyncClient(base_url="https://legacy-erp.company.com") as client:
        # Legacy API is fragmented across 5 endpoints.
        # Parse each response once; a Response object itself is not subscriptable.
        order = (await client.get(f"/legacy/orders/{order_id}")).json()
        items = (await client.get(f"/legacy/orders/{order_id}/items")).json()
        customer = (await client.get(f"/legacy/customers/{order['cust_id']}")).json()
        shipping = (await client.get(f"/legacy/shipping/{order['ship_id']}")).json()
        payments = (await client.get(f"/legacy/payments?order={order_id}")).json()

    # Server-side preprocessing (no tokens consumed)
    total = sum(item["price"] * item["qty"] for item in items)
    paid = sum(p["amount"] for p in payments if p["status"] == "completed")

    return {
        "order_number": order["order_num"],
        "customer_name": customer["name"],
        "total_amount": total,
        "amount_paid": paid,
        "balance_due": total - paid,
        "shipping_status": shipping["status"],
        "item_count": len(items)
    }

# Before: 5 separate calls
5 calls x 1,500 tokens = 7,500 tokens

# After: 1 aggregated response
1 response x 200 tokens = 200 tokens

# Savings: 97%
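
The five calls in the tool above run one after another. Once the order record is in hand, the remaining four lookups don't depend on each other, so they could be issued concurrently with `asyncio.gather`. The sketch below assumes that independence and uses a hypothetical `fetch_json` stub with canned data in place of real HTTP calls, so it runs standalone.

```python
import asyncio
from typing import Any

# Hypothetical stand-in for the legacy endpoints; real code would use httpx.
FAKE_API: dict[str, Any] = {
    "/legacy/orders/42": {"order_num": "SO-42", "cust_id": "c1", "ship_id": "s1"},
    "/legacy/orders/42/items": [{"price": 10.0, "qty": 2}, {"price": 5.0, "qty": 1}],
    "/legacy/payments?order=42": [{"amount": 20.0, "status": "completed"}],
    "/legacy/customers/c1": {"name": "Ada Example"},
    "/legacy/shipping/s1": {"status": "shipped"},
}


async def fetch_json(path: str) -> Any:
    """Stub fetch: simulates network latency, then returns canned JSON."""
    await asyncio.sleep(0.01)
    return FAKE_API[path]


async def order_summary(order_id: str) -> dict[str, Any]:
    # The order must come first: it carries the customer and shipping IDs.
    order = await fetch_json(f"/legacy/orders/{order_id}")
    # The remaining four lookups are independent, so run them concurrently.
    items, payments, customer, shipping = await asyncio.gather(
        fetch_json(f"/legacy/orders/{order_id}/items"),
        fetch_json(f"/legacy/payments?order={order_id}"),
        fetch_json(f"/legacy/customers/{order['cust_id']}"),
        fetch_json(f"/legacy/shipping/{order['ship_id']}"),
    )
    total = sum(i["price"] * i["qty"] for i in items)
    paid = sum(p["amount"] for p in payments if p["status"] == "completed")
    return {
        "order_number": order["order_num"],
        "customer_name": customer["name"],
        "total_amount": total,
        "amount_paid": paid,
        "balance_due": total - paid,
        "shipping_status": shipping["status"],
        "item_count": len(items),
    }


summary = asyncio.run(order_summary("42"))
print(summary)
```

This doesn't change the token count, only the wall-clock time of the tool call. Whether the legacy system tolerates four parallel requests is something to verify first.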

Handling Legacy API Quirks

The MCP server also handles inconsistent field names and edge cases:

@server.tool()
async def search_inventory(query: str) -> list[dict]:
    """
    Search inventory with normalized results.
    Handles legacy API quirks internally.
    """
    # Assumption: same host as the customer API; relative paths need a base_url
    async with httpx.AsyncClient(base_url="https://legacy-erp.company.com") as client:
        response = await client.get(
            "/legacy/inventory/search",
            params={"q": query, "format": "json"}  # Force JSON, API defaults to XML
        )

    raw_results = response.json()

    normalized = []
    for item in raw_results.get("results", []):
        # Legacy API sometimes returns None for price (and possibly list_price)
        price = item.get("price") or item.get("list_price") or 0

        # Legacy API has inconsistent availability field; treat None as 0
        in_stock = (
            (item.get("qty_avail") or 0) > 0 or
            item.get("stock_status") == "AVAIL" or
            item.get("backorder_ok", False)
        )

        # Legacy API has different product types across versions;
        # fall back to "unknown" so .lower() below never sees None
        product_type = item.get("prod_type") or item.get("product_category") or "unknown"

        normalized.append({
            "sku": item["sku"],
            "name": item["description"],
            "price": price,
            "in_stock": in_stock,
            "type": product_type.lower()
        })

    return normalized

The AI agent never needs to know about qty_avail vs stock_status vs backorder_ok. The MCP server normalizes everything.
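
To make the effect concrete, here is a small standalone sketch that runs a None-tolerant variant of those rules over two records shaped like different API versions. The function name `normalize_item` and both sample records are invented for illustration; only the field names come from the tool above.

```python
from typing import Any


def normalize_item(item: dict[str, Any]) -> dict[str, Any]:
    """Apply the search_inventory normalization rules to one raw record."""
    price = item.get("price") or item.get("list_price") or 0
    in_stock = (
        (item.get("qty_avail") or 0) > 0
        or item.get("stock_status") == "AVAIL"
        or bool(item.get("backorder_ok"))
    )
    product_type = item.get("prod_type") or item.get("product_category") or "unknown"
    return {
        "sku": item["sku"],
        "name": item["description"],
        "price": price,
        "in_stock": in_stock,
        "type": product_type.lower(),
    }


# Two made-up records using different field conventions.
old_style = {"sku": "A1", "description": "Widget", "price": None,
             "list_price": 9.5, "qty_avail": 3, "prod_type": "HARDWARE"}
new_style = {"sku": "B2", "description": "Gadget", "price": 12.0,
             "stock_status": "AVAIL", "product_category": "Hardware"}

for record in (old_style, new_style):
    print(normalize_item(record))
```

Both records come out with the same five keys, which is all the agent ever sees.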

Unexpected Benefit: Running with a Smaller Model

After implementing the MCP middleware, I could run my application with a local 4B-parameter LLM instead of GPT-4:

# Before: Needed GPT-4 for complex parsing
Cost: $0.03 per request
Latency: 2-3 seconds

# After: Local 4B model works fine
Cost: $0 (local)
Latency: 300-500ms
Privacy: Data never leaves my machine

The smaller context window of a 4B model wasn’t a problem anymore because the MCP server eliminated the bloat.

Common Mistakes I Made

Mistake 1: Passthrough MCP Server

Initially, I just wrapped the API without any transformation:

# WRONG: No value added
@server.tool()
async def get_customer(customer_id: str):
    return await legacy_api.get(customer_id)  # Just passes through

This added complexity without reducing tokens. The middleware must do meaningful work.

Mistake 2: Over-Abstracting

I tried to create a generic transformer with configuration:

# WRONG: Too generic, defeats the purpose
@server.tool()
async def transform_api(endpoint: str, fields: list[str]):
    # This just moves the complexity to the AI
    ...

The AI ended up specifying the fields anyway. Keep MCP tools specific and purposeful.

Mistake 3: Ignoring Cache Freshness

Caching pre-processed results without an expiry can serve stale data:

# WRONG: No cache invalidation
@server.tool()
async def get_cached_data(key: str):
    return cache.get(key)  # Could be hours old

Implement appropriate cache invalidation based on your data freshness requirements.
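
One simple option is a time-to-live (TTL) cache, where entries older than a chosen age count as misses and get refetched. The `TTLCache` class below is a hypothetical minimal sketch, not a library API, and the 60-second TTL is an arbitrary example value.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._store: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock(), value)

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None if missing or older than the TTL."""
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._store[key]  # expired: drop it so the caller refetches
            return None
        return value


# Usage with a fake clock to show expiry without sleeping.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("customer:12345", {"name": "Ada Example"})
fresh = cache.get("customer:12345")   # still within the TTL
now[0] = 61.0
stale = cache.get("customer:12345")   # past the TTL, treated as a miss
print(fresh, stale)
```

On a miss, the tool would call the legacy API again and `set` the new value. The right TTL depends on how quickly the underlying data changes.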

When MCP Middleware Makes Sense

Not every API needs MCP middleware. Use it when:

API returns 10x more data than you need
Multiple API calls could be aggregated
Field names are cryptic or inconsistent
You can’t get the API changed
You want to use smaller, cheaper models
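
A quick way to check the first criterion is to compare the serialized size of a raw response with the size of the fields actually used. The sketch below is only a rough proxy, since character count is not token count, and both payloads are hypothetical. Real measurements would use captured API responses.

```python
import json
from typing import Any


def bloat_ratio(full: dict[str, Any], needed: dict[str, Any]) -> float:
    """How many times larger the raw payload is than the data actually used."""
    return len(json.dumps(full)) / len(json.dumps(needed))


# Hypothetical payloads for illustration only.
needed = {"name": "Ada Example", "email": "ada@example.com", "status": "active"}
full = {**needed, "audit_log": [{"ts": "2020-01-01", "op": "UPDATE"}] * 40}

ratio = bloat_ratio(full, needed)
print(f"payload is {ratio:.1f}x larger than needed")
```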

Summary

MCP middleware transforms legacy APIs into AI-friendly interfaces without requiring changes to the original API. The key benefits:

Token reduction: Up to 97% fewer tokens consumed
Cost savings: Smaller models become viable
Latency improvement: Less data to process
Organizational independence: No need to change legacy APIs
Clean agent code: MCP handles the quirks

Start by identifying your top API pain points (bloated responses, multiple calls, quirky formats). These are your MCP middleware candidates.

Final Words + More Resources

My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to get in touch, you can reach me by email.
