How MCP Tool Definitions Inflate Your AI Agent Token Costs

Apr 24, 2026

I connected four MCP servers to my Claude Code agent and watched my token costs explode. Before I typed a single prompt, 7,000 tokens were already consumed. A colleague with a heavy configuration burned through 66,000 tokens—nearly a third of Claude Sonnet’s 200k context window—before the conversation even started.

The culprit? MCP’s automatic tool schema injection on every conversation turn, whether you use those tools or not.

The Hidden Cost of Tool Discovery

When you connect an MCP server to Claude Code or any MCP-compatible agent, the protocol automatically loads all tool schemas into context. This includes:

Tool name and description
Input parameter definitions (JSON schema)
Output format specifications
Usage examples and constraints

Critical insight: These definitions reload on every conversation turn, not just when you invoke the tool.
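To make the overhead concrete, here is a minimal sketch of one tool definition as it might be serialized into context, with a rough size estimate. The `gmail_create_draft` schema below is a hypothetical illustration (the real server's schema is likely larger), and the 4-characters-per-token ratio is a common rule of thumb rather than the model's actual tokenizer.

```python
import json

# Hypothetical tool definition shaped like an MCP tool entry
# (name, description, inputSchema). Not the real Gmail server's schema.
TOOL = {
    "name": "gmail_create_draft",
    "description": "Create a draft email with recipients, subject, and body.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "to": {"type": "array", "items": {"type": "string"}},
            "subject": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["to", "subject", "body"],
    },
}


def estimate_tokens(obj, chars_per_token=4):
    """Rough estimate: serialized length divided by an assumed chars-per-token ratio."""
    return len(json.dumps(obj)) // chars_per_token


def per_session_overhead(tools, turns):
    """Schemas are resent on every turn, so the cost scales with the turn count."""
    return sum(estimate_tokens(tool) for tool in tools) * turns


print(estimate_tokens(TOOL))
print(per_session_overhead([TOOL] * 7, turns=20))
```

The point of the second function is the multiplication by `turns`: a schema that looks cheap once is paid for again on every turn of the conversation.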

Let me show you the real token costs I measured:

MCP Server	Tools	Token Overhead	Per-Tool Avg
Playwright	22	~3,442	156
Gmail	7	~2,640	377
Jira	varies	~17,000	-
mcp-omnisearch	20	~14,100	705
GitHub MCP	varies	~8,000-12,000	-
Codex	2	610	305
SQLite	6	385	64

Notice the pattern? A single Gmail tool (gmail_create_draft) costs 820 tokens alone—more than the entire Codex server with 2 tools. Complex tools with nested parameters cost 5-10x more than simple ones.

Within the same server, I found a 6x difference: Playwright’s screenshot tool costs 370 tokens, while close browser costs just 59 tokens.
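The per-tool averages and the spread within Playwright can be rechecked with a few lines of Python. The numbers are copied from the measurements above; only servers with a known tool count are included.

```python
# (tool count, measured token overhead) for servers with a known tool count.
SERVERS = {
    "Playwright": (22, 3442),
    "Gmail": (7, 2640),
    "mcp-omnisearch": (20, 14100),
    "Codex": (2, 610),
    "SQLite": (6, 385),
}


def per_tool_average(tools, tokens):
    """Average schema cost per tool, rounded to whole tokens."""
    return round(tokens / tools)


averages = {name: per_tool_average(*row) for name, row in SERVERS.items()}
for name, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15} {avg:>4} tokens/tool")

# Spread within Playwright: screenshot (370 tokens) vs. close browser (59 tokens).
playwright_spread = 370 / 59
print(f"{playwright_spread:.1f}x")
```

Sorting by average makes it easy to spot which servers are worth auditing first.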

The Cumulative Impact Over Time

The real cost compounds over conversation turns:

Scenario	Tools	Turns	Total Schema Tokens
Light usage	30	15	54,450
Moderate	80	20	193,240
Heavy	120	25	362,350
Enterprise API	200 endpoints	25	358,425

These totals come on top of the startup cost: the 66,000-token configuration mentioned earlier spends a third of Claude Sonnet’s 200k context window before the first prompt is typed.
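The compounding in the table above follows a simple product of tools, tokens per tool, and turns. The sketch below assumes a flat 121 tokens per tool, which is the average the table's rows imply rather than a number stated anywhere in the measurements. With that assumption the light scenario is reproduced exactly and the moderate and heavy rows land within about 0.2% of the table.

```python
def cumulative_schema_tokens(tools, turns, tokens_per_tool=121):
    """Schema tokens sent over a session when every schema is resent each turn.

    tokens_per_tool=121 is an assumed average inferred from the table.
    """
    return tools * tokens_per_tool * turns


for label, tools, turns in [("Light", 30, 15), ("Moderate", 80, 20), ("Heavy", 120, 25)]:
    print(f"{label:9} {cumulative_schema_tokens(tools, turns):>8,}")
```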

Why This Matters: Three Hidden Costs

1. Direct Token Billing

At Claude Sonnet 4.6 pricing ($3/MTok input):

100 tools × ~121 tokens per tool × 20 turns × 1,000 daily conversations ≈ 242M tokens/day
Monthly cost: 242M tokens × $3/MTok ≈ $726/day, or about $21,800 over 30 days, just for schema overhead
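The billing arithmetic can be written out as a small function so you can plug in your own numbers. This is a sketch under stated assumptions: 121 tokens per tool is the average implied by the scenario table, a month is 30 days, and every schema token is billed as a full-price input token.

```python
def monthly_schema_cost(tools, turns, conversations_per_day,
                        tokens_per_tool=121, usd_per_mtok=3.0, days=30):
    """Return (daily schema tokens, monthly cost in USD).

    tokens_per_tool and days are assumptions, not measured values.
    """
    daily_tokens = tools * tokens_per_tool * turns * conversations_per_day
    daily_usd = daily_tokens / 1_000_000 * usd_per_mtok
    return daily_tokens, daily_usd * days


daily_tokens, monthly_usd = monthly_schema_cost(100, 20, 1000)
print(f"{daily_tokens:,} tokens/day, ${monthly_usd:,.0f}/month")
```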

2. Context Window Contamination

Excessive tool definitions crowd out actual task context:

Less room for code snippets, documents, conversation history
Model reasoning quality degrades with “noisy” context

3. Latency Impact

Larger prompts take longer to process:

Each turn processes thousands of unused tool definitions
Round-trip latency grows with prompt size

Four Strategies to Reduce Overhead

Strategy 1: Tool Search/Deferral (Built-in Optimization)

Claude Code automatically defers tool schemas when they exceed a threshold (default: 10% of the context window). Deferred tools stay out of context and load on demand, only when the agent needs them.

Measured savings: 13.2k tokens in a typical session.

This is the easiest approach—just ensure you’re running the latest Claude Code version.
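As a rough illustration of the threshold rule, the check below compares schema size against 10% of a 200k context window. It is a sketch of the idea only, not Claude Code's actual implementation, and `should_defer` is a hypothetical name.

```python
def should_defer(schema_tokens, context_window=200_000, threshold=0.10):
    """True when tool schemas would take more than `threshold` of the context window."""
    return schema_tokens > context_window * threshold


print(should_defer(7_000))   # the four-server setup from the introduction
print(should_defer(66_000))  # the heavy configuration from the introduction
```

Under these assumptions the cutoff sits at 20,000 tokens, so a small setup would stay fully loaded while a heavy one would have its schemas deferred.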

Strategy 2: Code Execution Mode (Anthropic’s Official Approach)

Instead of making direct tool calls, the agent writes code that calls tools programmatically. Anthropic reports up to a 98.7% reduction in context overhead.

Here’s how it works:

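# Instead of giving Claude every tool schema upfront,
# the agent writes code that discovers tools on demand.
# NOTE: `mcp-tool` is a placeholder CLI name used for illustration.
import subprocess

# List available tools (~16 tokens per tool)
tools = subprocess.run(['mcp-tool', '--list'], capture_output=True, text=True, check=True)
print(tools.stdout)

# Fetch the full schema for one tool only when needed (~120 tokens).
# Named tool_help to avoid shadowing Python's built-in help().
tool_help = subprocess.run(['mcp-tool', '--help', 'gmail_create_draft'], capture_output=True, text=True, check=True)
print(tool_help.stdout)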

The agent only loads tool schemas when code explicitly requests them.

Strategy 3: CLI-Based On-Demand Discovery (mcp2cli)

A newer approach called mcp2cli replaces preloaded schemas with CLI-based discovery:

--list shows tool names (~16 tokens per tool)
--help fetches full schema only when needed (~120 tokens)

Token savings measured:

Tools	Turns	Native MCP	mcp2cli	Savings
30	15	54,525	2,309	96%
80	20	193,240	3,871	98%
120	25	362,350	5,181	99%
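The savings column is just the relative difference between the two token counts. A short check, using the figures from the table above:

```python
# (tools, turns, native MCP tokens, mcp2cli tokens), copied from the table.
ROWS = [
    (30, 15, 54_525, 2_309),
    (80, 20, 193_240, 3_871),
    (120, 25, 362_350, 5_181),
]


def savings_percent(native, reduced):
    """Percentage of tokens saved relative to the native MCP total."""
    return round((native - reduced) / native * 100)


for tools, turns, native, cli in ROWS:
    print(f"{tools} tools x {turns} turns: {savings_percent(native, cli)}% saved")
```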

Strategy 4: Workflow Layer Orchestration

Move tool routing outside the prompt entirely:

Agent emits structured output (intent, action type)
LangGraph/workflow layer routes to appropriate tool
Agent never sees 40+ tool definitions

This pattern comes from a Reddit discussion:

“Running through Latenode, the agent stays lean. The reliability lives outside the prompt.”

The agent doesn’t need to “know” about 40 tools—it emits structured output, and the graph routes execution.
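A minimal sketch of the routing idea, in plain Python: the agent emits a small JSON object and a lookup table outside the prompt picks the handler. This is not LangGraph's or Latenode's API. The handler functions, the `ROUTES` table, and the `action`/`args` message shape are all hypothetical names chosen for illustration.

```python
import json


def search_issues(query):
    """Hypothetical handler standing in for a real issue-tracker call."""
    return f"searching issues for {query!r}"


def create_draft(to, subject):
    """Hypothetical handler standing in for a real email call."""
    return f"draft to {to}: {subject}"


# The routing table lives in the workflow layer, outside the prompt.
ROUTES = {
    "search_issues": search_issues,
    "create_draft": create_draft,
}


def route(agent_output):
    """Dispatch the agent's structured output to the matching handler."""
    message = json.loads(agent_output)
    handler = ROUTES.get(message["action"])
    if handler is None:
        raise ValueError(f"unknown action: {message['action']}")
    return handler(**message.get("args", {}))


print(route('{"action": "create_draft", "args": {"to": "a@example.com", "subject": "Hi"}}'))
```

The agent's prompt only needs to describe the output format and the list of action names. The full parameter schemas stay in the routing layer.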

When MCP Overhead Is Worth It

Despite the costs, MCP provides value when:

Tool count is low (<20 tools, simple schemas)
Tools are frequently used (most tools invoked on most turns)
Standardization matters (ecosystem compatibility, shared servers)
Discovery is needed (agent must choose from available tools dynamically)

Decision Matrix: Which Approach to Use

Tool Count	Turns/Session	Recommended Approach
<20	Any	Native MCP (costs acceptable)
20-50	<10	Native MCP with deferral
20-50	>10	Code execution or mcp2cli
50-100	>15	Code execution (official approach)
>100	>20	Workflow routing + mcp2cli hybrid
Any	Any (rarely used)	Disconnect, use native CLI
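The matrix can be encoded as a lookup function. This is one possible reading of the table, not a definitive rule set: a count of exactly 50 is assigned to the 20-50 band, and the function returns `None` for combinations the matrix does not cover (for example, 30 tools at exactly 10 turns).

```python
def recommend(tool_count, turns, rarely_used=False):
    """Return the matrix's recommendation, or None if the combination is not covered."""
    if rarely_used:
        return "Disconnect, use native CLI"
    if tool_count < 20:
        return "Native MCP"
    if tool_count <= 50:
        if turns < 10:
            return "Native MCP with deferral"
        if turns > 10:
            return "Code execution or mcp2cli"
    elif tool_count <= 100:
        if turns > 15:
            return "Code execution"
    elif turns > 20:
        return "Workflow routing + mcp2cli hybrid"
    return None  # combination not covered by the matrix


print(recommend(12, 40))
print(recommend(35, 25))
print(recommend(150, 30))
```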

Common Mistakes to Avoid

Connecting every available MCP server: I did this initially. Disconnect unused servers immediately.
Ignoring tool schema complexity: Complex nested parameters cost 5-10x more. Audit your tool schemas.
Not measuring: Use /context to see actual token breakdown. I was shocked by my first measurement.
Over-optimizing for small setups: mcp2cli adds overhead for <20 tools. Don’t over-engineer.

How to Measure Your Current Overhead

In Claude Code, use the /context command to see your actual token breakdown:

/context

This shows:

Total tokens consumed
Breakdown by category (tools, conversation, documents)
Current context window usage percentage

I recommend measuring before and after disconnecting servers to see the real impact.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can reach me by email.
