
Does MCP Cause Context Bloat? Token Optimization Explained

The Concern

I was browsing a Reddit thread about the top 50 most popular MCP servers in 2026 when I saw this comment:

“The thing I found bloated with playwright is the amount of tokens the agent will use for simple tasks, probably fetching tons and tons of html each time”

That comment echoed something I’d heard before. MCP causes context bloat. It wastes tokens. It’s slow and expensive.

But then I read the reply:

“The context bloat accusations were overblown considering Claude can load MCP tools on demand and MCP gateways offer tool filtering”

That reply made me dig deeper. What I found changed my understanding of MCP’s token usage.

The Direct Answer

No, MCP doesn’t inherently cause context bloat. Modern MCP implementations include two key optimization features:

  1. On-demand tool loading - Claude loads tools only when needed
  2. MCP gateways - Tool filtering to expose only relevant tools

These mechanisms dramatically reduce token consumption compared to loading all available tools into context.

What is Context Bloat?

Context bloat happens when an AI model’s context window fills with unnecessary information:

  • All tool schemas from every connected MCP server
  • Full HTML responses from web scraping tools
  • Redundant data from multiple similar tools
  • Unused tool definitions consuming valuable tokens

Picture this scenario: You connect 10 MCP servers with 20 tools each. That’s 200 tool schemas injected into every request before any actual work begins. That’s thousands of tokens wasted.

But here’s the thing: that’s not how MCP has to work.

How MCP Addresses Context Bloat

On-Demand Tool Loading

Claude’s MCP implementation uses lazy loading. Tools don’t load into context until actually needed.

Traditional approach:
  [All 200 tool schemas loaded] -> High token cost upfront

MCP on-demand approach:
  [Tool names only] -> User requests tool -> [Load specific tool schema]

Result: Dramatically lower token usage

The process works like this:

  • Claude first sees only tool names and brief descriptions
  • Full tool schemas load only when a tool is about to be invoked
  • Unused tools never consume context space
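
To make the three steps above concrete, here is a minimal sketch of the idea in plain Python. It is not Claude's actual implementation: the LazyToolRegistry class, its method names, and the two example tools are all hypothetical, made up only to show summaries being visible up front while full schemas are fetched on first use.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LazyToolRegistry:
    """Hypothetical registry: cheap summaries up front, full schemas on demand."""

    # name -> one-line description (always visible to the model)
    summaries: dict[str, str] = field(default_factory=dict)
    # name -> function that produces the full JSON schema when called
    loaders: dict[str, Callable[[], dict]] = field(default_factory=dict)
    # schemas that have actually been pulled into context
    loaded: dict[str, dict] = field(default_factory=dict)

    def register(self, name: str, summary: str, loader: Callable[[], dict]) -> None:
        self.summaries[name] = summary
        self.loaders[name] = loader

    def initial_context(self) -> list[str]:
        # Only names and brief descriptions go into every request.
        return [f"{name}: {summary}" for name, summary in self.summaries.items()]

    def schema_for(self, name: str) -> dict:
        # The full schema is loaded the first time the tool is about to be invoked.
        if name not in self.loaded:
            self.loaded[name] = self.loaders[name]()
        return self.loaded[name]


registry = LazyToolRegistry()
registry.register(
    "browser_screenshot",
    "Capture a screenshot of the current page",
    lambda: {"type": "object", "properties": {"full_page": {"type": "boolean"}}},
)
registry.register(
    "browser_navigate",
    "Open a URL in the browser",
    lambda: {"type": "object", "properties": {"url": {"type": "string"}}, "required": ["url"]},
)

print(registry.initial_context())   # two short summary lines
schema = registry.schema_for("browser_navigate")
print(sorted(registry.loaded))      # only the tool that was actually needed
```

The screenshot tool's schema is never built here, which is the whole point: a tool that is never used costs only its one-line summary.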

This matches how I work as a developer. I don’t need to know how to use every tool before starting a task. I learn the tools I need, when I need them.

MCP Gateways with Tool Filtering

MCP gateways act as intermediaries that provide additional control:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Claude    │────▶│ MCP Gateway │────▶│ MCP Servers │
│   Client    │     │  (Filter)   │     │ (Playwright,│
│             │     │             │     │  GitHub,    │
│             │     │  - Scope    │     │  etc.)      │
│             │     │  - Filter   │     │             │
│             │     │  - Route    │     │             │
└─────────────┘     └─────────────┘     └─────────────┘

Gateways can:

  • Filter which tools are exposed to specific clients
  • Group tools by category or use case
  • Provide scoped access based on permissions
  • Reduce the tool surface area available to agents

Stateless Tool Invocation

Libraries like LangChain’s langchain-mcp-adapters use stateless MCP clients by default. Each tool invocation:

  • Creates a fresh MCP ClientSession
  • Executes the tool
  • Cleans up the session

With no session kept open between calls, there is no lingering context pollution.
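
The create, execute, clean-up cycle can be sketched with a context manager. This is an illustration of the pattern only, not the langchain-mcp-adapters internals: FakeSession, fresh_session, and invoke_stateless are hypothetical names, and the "tool call" just returns a string.

```python
from contextlib import contextmanager


class FakeSession:
    """Stand-in for an MCP client session; not the real SDK class."""

    open_count = 0  # how many sessions are open right now

    def __init__(self, server: str) -> None:
        self.server = server

    def call_tool(self, name: str, arguments: dict) -> str:
        return f"{self.server}.{name}({arguments})"


@contextmanager
def fresh_session(server: str):
    # Create a session for exactly one tool call.
    session = FakeSession(server)
    FakeSession.open_count += 1
    try:
        yield session
    finally:
        # Clean up even if the tool call raises.
        FakeSession.open_count -= 1


def invoke_stateless(server: str, name: str, arguments: dict) -> str:
    with fresh_session(server) as session:
        return session.call_tool(name, arguments)


result = invoke_stateless("playwright", "screenshot", {"full_page": True})
print(result)
print(FakeSession.open_count)  # no session survives the call
```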

Why This Matters

Metric                 | Without Optimization   | With MCP Optimization
Initial context        | 200+ tool schemas      | Tool names only
Token cost per request | High (baseline)        | Low (pay-per-use)
Flexibility            | Rigid (all or nothing) | Dynamic (load as needed)
Cost efficiency        | Poor                   | Excellent

Code Examples

Example 1: Gateway Configuration for Scoped Access

gateway-config.json
{
  "gateway": {
    "name": "dev-tools-only",
    "filters": {
      "allowedTools": [
        "filesystem.*",
        "github.search",
        "playwright.screenshot"
      ],
      "deniedTools": ["*.delete", "*.remove"]
    }
  }
}

This configuration exposes only filesystem tools, GitHub search, and Playwright screenshots. Delete and remove operations are blocked.
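
How a gateway might evaluate those two lists can be sketched in a few lines. This assumes shell-style wildcard matching and that a deny match always wins over an allow match; real gateways may use different matching rules. The is_exposed function and the sample tool names are hypothetical.

```python
from fnmatch import fnmatchcase

# Same filter lists as in gateway-config.json above
filters = {
    "allowedTools": ["filesystem.*", "github.search", "playwright.screenshot"],
    "deniedTools": ["*.delete", "*.remove"],
}


def is_exposed(tool_name: str, filters: dict) -> bool:
    """A tool is visible only if some allow pattern matches and no deny pattern does."""
    denied = any(fnmatchcase(tool_name, p) for p in filters["deniedTools"])
    allowed = any(fnmatchcase(tool_name, p) for p in filters["allowedTools"])
    return allowed and not denied


# Hypothetical tool names reported by the connected servers
all_tools = [
    "filesystem.read",
    "filesystem.delete",
    "github.search",
    "github.create_issue",
    "playwright.screenshot",
    "playwright.navigate",
]

visible = [name for name in all_tools if is_exposed(name, filters)]
print(visible)
```

Of the six tools, only filesystem.read, github.search, and playwright.screenshot reach the client. filesystem.delete matches the filesystem.* allow pattern but is still blocked by *.delete.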

Example 2: LangChain Stateless MCP Client

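mcp_client.py
import asyncio

from langchain_mcp_adapters.client import MultiServerMCPClient

# Stateless by default: no MCP session is kept open between tool calls
client = MultiServerMCPClient({
    "playwright": {
        "command": "npx",
        "args": ["-y", "@playwright/mcp@latest"],
        "transport": "stdio",
    }
})

async def main():
    # get_tools() is a coroutine; it lists the tools of every configured server
    tools = await client.get_tools()
    # Each later tool call runs in its own short-lived session
    print([tool.name for tool in tools])

asyncio.run(main())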

Example 3: Configuring Playwright for Minimal Output

playwright_config.py
# Avoid context bloat from HTML responses
# (illustrative option names; check your tool's documentation for the real ones)
playwright_config = {
    "return_format": "structured",
    "extract": ["title", "main_text", "links"],
    "max_length": 5000,
    "exclude": ["scripts", "styles", "comments"],
}

The real bloat often comes from tool outputs, not tool definitions. A full HTML page can easily consume 50,000+ tokens. Configuring tools to return structured, minimal data prevents this.
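
If a tool can only hand back raw HTML, the same trimming can be done before the result reaches the model. Below is a rough sketch using only Python's standard library; PageSummarizer and summarize are hypothetical helpers, not part of Playwright or any MCP server, and the sample page is made up.

```python
from html.parser import HTMLParser


class PageSummarizer(HTMLParser):
    """Collects the title, visible text, and links; skips script and style bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.text_parts: list[str] = []
        self.links: list[str] = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title = text
        else:
            self.text_parts.append(text)


def summarize(html: str, max_length: int = 5000) -> dict:
    parser = PageSummarizer()
    parser.feed(html)
    parser.close()
    # HTML comments never reach handle_data, so they are dropped automatically.
    main_text = " ".join(parser.text_parts)[:max_length]
    return {"title": parser.title, "main_text": main_text, "links": parser.links}


page = """<html><head><title>Blue Widget</title><style>p { color: red; }</style></head>
<body><script>trackVisit();</script><!-- promo banner -->
<p>Price: $19</p><a href="/cart">Add to cart</a></body></html>"""

summary = summarize(page)
print(summary)
```

The script, style rule, and comment never make it into the summary, and max_length puts a hard ceiling on how much text a single call can add to context.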

Example 4: Claude Code MCP Scope Configuration

Adding MCP servers with scopes
# User-level scope (shared across projects)
claude mcp add filesystem -s user -- npx -y @modelcontextprotocol/server-filesystem ~/Projects
# Project-specific scope (isolated context)
claude mcp add playwright -s project -- npx @playwright/mcp@latest

User-level scope shares tools across projects. Project scope isolates tools to a single project’s context.

Common Mistakes to Avoid

Mistake 1: Connecting Too Many MCP Servers Without Filtering

Each server adds tools to the available pool. Use gateways to scope what’s visible.

Don't do this
{
  "mcpServers": {
    "playwright": {...},
    "github": {...},
    "filesystem": {...},
    "slack": {...},
    "database": {...},
    // ... 15 more servers
  }
}

This exposes hundreds of tools. Most will never be used. Filter them.

Mistake 2: Not Understanding Lazy Loading

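With lazy loading, a tool’s full schema doesn’t consume tokens until the tool is about to be invoked; only its name and brief description sit in context. Fear of “bloat” often comes from misunderstanding how MCP clients actually load tools.

Adding a server to your config doesn’t by itself put every full schema in context. Claude loads tool schemas lazily.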

Mistake 3: Ignoring Tool Output Size

The real bloat culprit is tool outputs, not tool definitions:

This causes bloat
# Full HTML page = 50,000+ tokens
html = playwright.get_page_content(url)

This prevents bloat
# Structured data = 500 tokens
data = playwright.extract(url, ["title", "price", "availability"])

Mistake 4: Over-fetching with Web Scraping Tools

Playwright and Puppeteer can return massive HTML responses. Configure them to return only needed elements.

Performance Comparison

Here’s what I’ve seen in practice:

Without optimization:

  • 10 MCP servers connected
  • 200 tool schemas loaded upfront
  • 15,000 tokens per request (before any work)
  • High API costs

With on-demand loading + gateway filtering:

  • 10 MCP servers connected
  • Gateway exposes 20 relevant tools
  • 500 tokens for tool definitions
  • Tools load only when invoked
  • More than 90% reduction in baseline token cost
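
The numbers in this comparison are easy to check. Using only the figures quoted above, going from 200 tools to 20 is a 90% cut in tool count, while going from 15,000 tokens to 500 is closer to 97%:

```python
def reduction(before: int, after: int) -> float:
    """Fractional drop from `before` to `after`."""
    return 1 - after / before


# Figures from the comparison above
tokens_before, tokens_after = 15_000, 500
tools_before, tools_after = 200, 20

token_cut = reduction(tokens_before, tokens_after)
tool_cut = reduction(tools_before, tools_after)

print(f"tokens: {token_cut:.1%}, tools: {tool_cut:.0%}")  # tokens: 96.7%, tools: 90%
```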

The MCP ecosystem has evolved beyond simple tool exposure. Understanding these patterns helps:

Tool Discovery Pattern

Clients that load tools on demand show the model tool metadata (names and brief descriptions) first. Full schemas load on demand. This is similar to how modern IDEs show function signatures before loading full documentation.

Gateway Pattern

MCP gateways provide a middle layer between clients and servers. Think of them as API gateways for AI tools. They handle:

  • Authentication and authorization
  • Rate limiting
  • Tool filtering
  • Audit logging

Stateless Pattern

Stateless clients prevent context accumulation. Each invocation starts fresh, executes, and cleans up. This is the default in langchain-mcp-adapters and recommended practice.

When Context Bloat Actually Happens

Context bloat in MCP is real in specific scenarios:

  1. Tool returns massive output - A web scraper returning full HTML
  2. No filtering configured - Connecting 50 servers with no gateway
  3. Tool accumulation across sessions - Using stateful clients improperly
  4. Large file reads - Reading 10MB files into context

In all these cases, the bloat comes from tool outputs or misconfiguration, not from MCP’s design.
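
For the output-side cases, one simple guard is a hard cap on what any tool may return. The sketch below is my own illustration, not a feature of MCP or of any particular server: cap_output is a hypothetical decorator, and read_file is a stand-in that fakes a large file instead of touching the disk.

```python
from functools import wraps


def cap_output(max_chars: int):
    """Decorator that truncates a tool's string result to max_chars characters."""

    def decorator(tool):
        @wraps(tool)
        def wrapper(*args, **kwargs):
            result = tool(*args, **kwargs)
            if len(result) <= max_chars:
                return result
            omitted = len(result) - max_chars
            # Tell the model that something was cut, so it can ask for more.
            return result[:max_chars] + f"\n[truncated {omitted} characters]"

        return wrapper

    return decorator


@cap_output(max_chars=2000)
def read_file(path: str) -> str:
    # Stand-in for a real file-reading tool: pretend the file is 10,000 characters.
    return "x" * 10_000


output = read_file("big.log")
print(output[-27:])
```

The truncation notice matters as much as the cap itself: the model learns that the result is partial and can request a narrower slice instead of silently working from incomplete data.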

Summary

In this post, I explained why MCP’s reputation for causing context bloat is largely undeserved. The protocol includes sophisticated mechanisms to prevent token waste:

  1. On-demand loading ensures only invoked tools consume context
  2. MCP gateways provide filtering for scoped, efficient tool access
  3. Stateless client patterns prevent context accumulation between calls

The real culprit behind “bloat” is usually unfiltered tool outputs (like full HTML pages from web scrapers), not the MCP protocol itself. By using MCP gateways and configuring tools for minimal, structured responses, developers can enjoy MCP’s flexibility without the token costs.

The Reddit thread that sparked my investigation revealed a common misunderstanding. MCP doesn’t load everything upfront. It loads what you need, when you need it.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

