What is Code Mode? How It Differs From MCP Tool Calling

Mar 13, 2026

Problem

I’ve been building AI agents that use MCP (Model Context Protocol) tool calling, and I noticed something frustrating: every tool call requires a round-trip through the LLM context. For a simple three-step operation, the model processes intermediate results three times, burning tokens and adding latency.

Then I came across Cloudflare’s “Code Mode” approach, which challenges the traditional tool-calling paradigm. The core question: Why make LLMs select and invoke tools when they’re already trained to write code?

Purpose

I want to understand whether Code Mode genuinely solves the problems with traditional tool calling, or if it’s just another approach with different trade-offs. This post documents my analysis of both approaches, the real-world implications, and when to use each.

The Core Problem with Traditional Tool Calling

When I built my first MCP-based agent, I assumed tool calling was the natural way for LLMs to interact with external systems. But I quickly ran into three issues:

1. Training Data Mismatch

LLMs are trained on terabytes of code. They understand function calls, API patterns, and procedural logic deeply. Tool-calling schemas are different: they’re a newer format that models have to learn during post-training or fine-tuning.

Cloudflare argues this creates a fundamental mismatch:

┌─────────────────────────────────────────┐
│                                         │
│   ████████████████████████████  Code    │
│   ████████ Natural Language            │
│   ███ Tool Calling Schemas             │
│                                         │
└─────────────────────────────────────────┘

I think there’s truth to this. When I ask an LLM to write code that fetches data, it rarely struggles. But when I present it with a complex tool schema and expect it to select the right tool from twenty options, accuracy drops noticeably.

2. The Round-Trip Tax

Here’s what a typical multi-step operation looks like with traditional tool calling:

User Request
    │
    ▼
┌─────────────┐
│    LLM      │ ◄── Tool Call #1
└─────────────┘         │
         ▲              ▼
         │      ┌─────────────┐
         └──────│ MCP Server  │
                └─────────────┘
                      │
                      ▼ Result #1 back to LLM
                ┌─────────────┐
                │    LLM      │ ◄── Tool Call #2
                └─────────────┘         │
                         ▲              ▼
                         │      ┌─────────────┐
                         └──────│ MCP Server  │
                                └─────────────┘
                                      │
                                      ▼ Result #2 back to LLM
                                ┌─────────────┐
                                │    LLM      │
                                └─────────────┘

Each step requires:

LLM processes context
LLM decides next action
LLM formats tool call
Tool executes
Result returns to LLM context
Repeat…

For three tool calls, that’s three full passes through the LLM just to issue the calls, plus a final pass to write the answer. Every intermediate result sits in the context window and gets re-read on each later pass.
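To make the loop concrete, here is a minimal simulation. It is a toy, not an MCP client: `ScriptedModel`, `run_tool`, and `run_agent` are hypothetical names, and the "model" simply follows a fixed plan instead of reasoning. It only illustrates how passes and context entries accumulate.

```python
from dataclasses import dataclass


@dataclass
class ScriptedModel:
    """Stand-in for an LLM: requests each tool in `plan`, then answers."""

    plan: list[str]
    passes: int = 0

    def next_action(self, context: list[str]) -> dict:
        self.passes += 1  # each decision is one full pass over the context
        done = sum(1 for entry in context if entry.startswith("result:"))
        if done < len(self.plan):
            return {"type": "tool_call", "tool": self.plan[done]}
        return {"type": "answer", "text": f"synthesized from {done} results"}


def run_tool(name: str) -> str:
    """Hypothetical executor; a real agent would call an MCP server here."""
    return f"result:{name}"


def run_agent(model: ScriptedModel, user_request: str) -> tuple[str, list[str]]:
    context = [f"user:{user_request}"]
    while True:
        action = model.next_action(context)
        if action["type"] == "answer":
            return action["text"], context
        # The result is appended to the context and re-read on the next pass.
        context.append(run_tool(action["tool"]))


model = ScriptedModel(plan=["docs", "papers", "code"])
answer, context = run_agent(model, "research topic")
print(model.passes)  # 4: three passes that issue a call, one that answers
print(len(context))  # 4: the request plus three tool results
```

Nothing ever leaves `context` in this sketch, which is exactly why the later passes get more expensive.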

3. Context Window Bloat

I tested this with a research agent that needed to:

Search documentation (Context7 MCP)
Search academic papers (ArXiv MCP)
Search code examples (GitHub MCP)

With traditional tool calling, the context looked like this:

Step 1: User query + tool schema          ~2,000 tokens
Step 2: Tool result #1 (documentation)    ~8,000 tokens
Step 3: Tool result #2 (papers)           ~6,000 tokens
Step 4: Tool result #3 (code)            ~10,000 tokens
Step 5: Final synthesis                   ~3,000 tokens
───────────────────────────────────────────────────────
Total:                                   ~29,000 tokens

About 83% of the context (roughly 24,000 of 29,000 tokens) was intermediate results that the LLM just needed to “carry” to the next step.
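The table counts each item once, but the model re-reads the growing context on every pass. Here is a rough back-of-the-envelope sketch using the estimates from the table. It assumes no prompt caching and no trimming, and it treats the final synthesis as output rather than input; the function name is made up for illustration.

```python
# Token estimates copied from the table above (inputs only).
steps = [
    ("user query + tool schema", 2_000),
    ("tool result #1 (documentation)", 8_000),
    ("tool result #2 (papers)", 6_000),
    ("tool result #3 (code)", 10_000),
]


def prompt_tokens_per_pass(steps):
    """Context size the model reads on each pass, assuming nothing is cached or trimmed."""
    sizes, running = [], 0
    for _, tokens in steps:
        running += tokens
        sizes.append(running)
    return sizes


per_pass = prompt_tokens_per_pass(steps)
print(per_pass)       # [2000, 10000, 16000, 26000]
print(sum(per_pass))  # 54000 input tokens processed across the four passes
```

Under those assumptions the model processes well over the 29,000 tokens the table suggests, because early results are read again each time.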

What Code Mode Does Differently

Code Mode flips the paradigm. Instead of the LLM selecting tools and making calls, it writes code that directly consumes MCP servers as APIs. Here’s how it works:

The Architecture

// MCP server schema is converted to TypeScript types
interface WeatherAPI {
  fetch(params: { city: string }): Promise<{
    temp: number;
    conditions: string;
  }>;
}

// LLM writes code, not tool calls
async function compareCities() {
  // Direct API-style calls to MCP servers
  const [seattle, portland] = await Promise.all([
    mcp.weather.fetch({ city: "Seattle" }),
    mcp.weather.fetch({ city: "Portland" })
  ]);

  // Process in code
  const difference = Math.abs(seattle.temp - portland.temp);

  // Return only final result to LLM context
  return {
    cities: [seattle, portland],
    difference,
    warmer: seattle.temp > portland.temp ? "Seattle" : "Portland"
  };
}

The key difference: this code executes in a sandbox with direct MCP server access. The LLM doesn’t see intermediate results. It only gets the final output.
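One way to picture the `mcp.weather.fetch(...)` binding is as a thin namespace object that the sandbox hands to the generated code. The sketch below is a Python stand-in for that idea, not Cloudflare's implementation: `McpNamespace`, `ServerProxy`, and `fake_weather_fetch` are hypothetical, and the weather data is made up.

```python
import asyncio
from typing import Any, Awaitable, Callable

ToolHandler = Callable[..., Awaitable[Any]]


class ServerProxy:
    """Exposes one server's tools as async methods, e.g. mcp.weather.fetch(...)."""

    def __init__(self, name: str, tools: dict[str, ToolHandler]):
        self._name = name
        self._tools = tools

    def __getattr__(self, tool: str) -> ToolHandler:
        try:
            return self._tools[tool]
        except KeyError:
            raise AttributeError(f"{self._name} has no tool {tool!r}") from None


class McpNamespace:
    """Top-level `mcp` object: one attribute per registered server."""

    def __init__(self, servers: dict[str, dict[str, ToolHandler]]):
        self._servers = {n: ServerProxy(n, tools) for n, tools in servers.items()}

    def __getattr__(self, server: str) -> ServerProxy:
        try:
            return self._servers[server]
        except KeyError:
            raise AttributeError(f"unknown server {server!r}") from None


async def fake_weather_fetch(city: str) -> dict:
    temps = {"Seattle": 58, "Portland": 63}  # made-up data for the example
    return {"temp": temps[city], "conditions": "cloudy"}


mcp = McpNamespace({"weather": {"fetch": fake_weather_fetch}})


async def compare_cities() -> dict:
    # Both calls run inside the "sandbox"; only the summary is returned.
    seattle, portland = await asyncio.gather(
        mcp.weather.fetch(city="Seattle"),
        mcp.weather.fetch(city="Portland"),
    )
    return {
        "difference": abs(seattle["temp"] - portland["temp"]),
        "warmer": "Seattle" if seattle["temp"] > portland["temp"] else "Portland",
    }


print(asyncio.run(compare_cities()))  # {'difference': 5, 'warmer': 'Portland'}
```

In a real setup the handlers would forward to MCP servers; here they are plain coroutines so the example runs on its own.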

The Flow Comparison

User Request
    │
    ▼
┌─────────────┐
│    LLM      │ ──── Write code block
└─────────────┘
                      │
                      ▼
              ┌─────────────────┐
              │  Code Sandbox   │
              │  ┌───────────┐  │
              │  │ Execute   │  │
              │  │ Code      │  │
              │  └───────────┘  │
              │       │         │
              │       ▼         │
              │  ┌───────────┐  │
              │  │ Call MCP  │  │
              │  │ Server #1 │  │
              │  └───────────┘  │
              │       │         │
              │       ▼         │
              │  ┌───────────┐  │
              │  │ Call MCP  │  │
              │  │ Server #2 │  │
              │  └───────────┘  │
              │       │         │
              │       ▼         │
              │  ┌───────────┐  │
              │  │ Process   │  │
              │  │ Results   │  │
              │  └───────────┘  │
              └─────────────────┘
                      │
                      ▼ Final result only
                ┌─────────────┐
                │    LLM      │
                └─────────────┘

One round-trip. Only the final result enters the LLM context.

Token Efficiency

Same research task with Code Mode:

Step 1: User query + API types      ~2,500 tokens
Step 2: Final result only           ~4,000 tokens
─────────────────────────────────────────────────
Total:                              ~6,500 tokens

That’s a 77% reduction in context usage for the same operation.
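For anyone who wants to check the arithmetic, the comparison reduces to a few lines. The numbers are the estimates from the two tables in this post; `reduction` is just a helper name for this sketch.

```python
# Estimates from the two tables in this post.
traditional = [2_000, 8_000, 6_000, 10_000, 3_000]
code_mode = [2_500, 4_000]


def reduction(before: int, after: int) -> float:
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


total_traditional = sum(traditional)  # 29000
total_code_mode = sum(code_mode)      # 6500
saved = reduction(total_traditional, total_code_mode)
print(f"{saved:.1%}")  # 77.6%
```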

Real-World Example: Research Agent

Let me show you both approaches side by side.

Traditional Tool Calling Approach

async def research_traditional(agent, topic: str):
    # Round trip 1: Search documentation
    docs = await agent.call_tool(
        "context7_search",
        {"query": topic}
    )
    # docs is now in LLM context

    # Round trip 2: Search papers
    papers = await agent.call_tool(
        "arxiv_search",
        {"query": topic}
    )
    # papers is now in LLM context

    # Round trip 3: Search code examples
    code_examples = await agent.call_tool(
        "github_search_code",
        {"query": topic}
    )
    # code_examples is now in LLM context

    # Round trip 4: Synthesize (LLM processes all above)
    result = await agent.generate(
        f"Synthesize research on {topic} from:\n"
        f"Docs: {docs}\n"
        f"Papers: {papers}\n"
        f"Code: {code_examples}"
    )

    return result

Four round-trips. Every intermediate result passes through the LLM.

Code Mode Approach

// LLM writes this code block
async function researchTopic(topic: string) {
  // Parallel calls - executed in sandbox
  const [docs, papers, examples] = await Promise.all([
    mcp.context7.search({ query: topic }),
    mcp.arxiv.search({ query: topic }),
    mcp.github.searchCode({ query: topic })
  ]);

  // Process results in code, not LLM context
  const synthesized = {
    documentation: docs
      .filter(d => d.verified)
      .map(d => ({
        title: d.title,
        url: d.url,
        relevance: d.score
      })),

    papers: papers
      .slice(0, 5)
      .map(p => ({
        title: p.title,
        authors: p.authors,
        abstract: p.abstract.slice(0, 200)
      })),

    codeExamples: examples
      .slice(0, 3)
      .map(e => ({
        repo: e.repository,
        file: e.path,
        snippet: e.code.slice(0, 500)
      }))
  };

  // Only this returns to LLM context
  return synthesized;
}

One round-trip. The LLM only sees the final, cleaned result.

Counterpoints: Why Traditional Tool Calling Still Matters

After experimenting with both approaches, I don’t think Code Mode is a wholesale replacement. Here’s why:

1. Reasoning Between Steps

Sometimes you need the LLM to reason about intermediate results before deciding the next step:

async def diagnose_issue(agent, error: str):
    # Step 1: Search documentation
    docs = await agent.call_tool("search_docs", {"query": error})

    # LLM needs to analyze docs and decide:
    # - Is this a known issue?
    # - Do I need to search StackOverflow?
    # - Should I check the GitHub issues?

    analysis = await agent.analyze(
        f"Based on these docs: {docs}, "
        f"what's the likely cause of {error}?"
    )

    # LLM decides next action based on analysis
    if analysis.needs_community_help:
        community = await agent.call_tool(
            "search_stackoverflow",
            {"query": error}
        )
        return synthesize(docs, community)

    return docs

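A single Code Mode block can’t easily do this because the LLM doesn’t see intermediate results. To branch on them, the code has to return them to the model and wait for a second block, which brings the round-trip back.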

2. Error Handling and Retry Logic

With traditional tool calling, the LLM can see errors and adjust:

async def fetch_with_retry(agent, url: str):
    result = await agent.call_tool("fetch", {"url": url})

    if result.error:
        # LLM sees the error, reasons about it
        if result.error == "rate_limit":
            await agent.wait(60)
            return await agent.call_tool("fetch", {"url": url})
        elif result.error == "not_found":
            return None

    return result

In Code Mode, error handling must be pre-programmed in the code block, not dynamically reasoned about.
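As a rough illustration of what "pre-programmed" means, here is a retry helper that generated code could call inside the sandbox. It is a sketch under assumptions: `ToolError` and its `code` values (`"rate_limit"`, `"not_found"`) are hypothetical and mirror the example above, and the policy is fixed up front rather than reasoned about per error. It assumes `max_attempts` is at least 1.

```python
import asyncio
from typing import Any, Awaitable, Callable


class ToolError(Exception):
    """Hypothetical error type carrying a machine-readable code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


async def call_with_retry(
    call: Callable[[], Awaitable[Any]],
    *,
    retry_on: frozenset[str] = frozenset({"rate_limit"}),
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> Any:
    """Retry `call` on retryable codes with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except ToolError as err:
            if err.code == "not_found":
                return None  # treated as "no result", as in the example above
            if err.code not in retry_on or attempt == max_attempts:
                raise  # unknown error or out of attempts: surface it
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

Anything the policy doesn't anticipate is re-raised, so the error can still reach the LLM as the final result of the block.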

3. Tool Design Matters More Than Protocol

One Reddit commenter made a sharp observation:

“Blaming the protocol for bad prompt engineering is like blaming HTTP because your API has confusing endpoints.”

I think this is key. A well-designed tool schema with clear names and good documentation will work well with traditional calling. A poorly designed schema will fail regardless of Code Mode or traditional approach.

Implementation: Setting Up Code Mode

If you want to experiment with Code Mode, here’s a basic setup:

Define Your MCP Server Schema

import { z } from "zod";

const weatherServer = {
  name: "weather",
  tools: {
    fetch: {
      description: "Fetch current weather for a city",
      parameters: z.object({
        city: z.string().describe("City name"),
        units: z.enum(["celsius", "fahrenheit"]).optional()
      }),
      returns: z.object({
        temp: z.number(),
        conditions: z.string(),
        humidity: z.number(),
        wind: z.number()
      })
    },

    forecast: {
      description: "Get weather forecast for a city",
      parameters: z.object({
        city: z.string(),
        days: z.number().min(1).max(7)
      }),
      returns: z.array(z.object({
        date: z.string(),
        high: z.number(),
        low: z.number(),
        conditions: z.string()
      }))
    }
  }
};

Generate TypeScript API Types

// Auto-generated from MCP schema
interface WeatherAPI {
  fetch(params: {
    city: string;
    units?: "celsius" | "fahrenheit";
  }): Promise<{
    temp: number;
    conditions: string;
    humidity: number;
    wind: number;
  }>;

  forecast(params: {
    city: string;
    days: number;
  }): Promise<Array<{
    date: string;
    high: number;
    low: number;
    conditions: string;
  }>>;
}
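How the "auto-generated" step works depends on the tooling. As a sketch of the idea, the converter below renders a small subset of JSON Schema as TypeScript type text. It assumes each tool's input and output are available as JSON Schema; `ts_type` and `ts_method` are hypothetical helpers, and real schemas (refs, unions, nullable fields) would need much more handling.

```python
def ts_type(schema: dict) -> str:
    """Render a small subset of JSON Schema as a TypeScript type expression."""
    if "enum" in schema:
        return " | ".join(f'"{value}"' for value in schema["enum"])
    kind = schema.get("type")
    if kind == "string":
        return "string"
    if kind in ("number", "integer"):
        return "number"
    if kind == "boolean":
        return "boolean"
    if kind == "array":
        return f"Array<{ts_type(schema['items'])}>"
    if kind == "object":
        required = set(schema.get("required", []))
        fields = [
            f"{name}{'' if name in required else '?'}: {ts_type(prop)}"
            for name, prop in schema.get("properties", {}).items()
        ]
        return "{ " + "; ".join(fields) + " }"
    return "unknown"  # anything this sketch doesn't understand


def ts_method(name: str, input_schema: dict, output_schema: dict) -> str:
    """One interface member: name(params: In): Promise<Out>;"""
    return f"{name}(params: {ts_type(input_schema)}): Promise<{ts_type(output_schema)}>;"


forecast_input = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "days": {"type": "number"}},
    "required": ["city", "days"],
}
forecast_output = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"date": {"type": "string"}, "high": {"type": "number"}},
        "required": ["date", "high"],
    },
}

print(ts_method("forecast", forecast_input, forecast_output))
# forecast(params: { city: string; days: number }): Promise<Array<{ date: string; high: number }>>;
```

Properties missing from `required` come out as optional (`units?: ...`), which matches how the optional `units` field appears in the interface above.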

LLM Writes Code Against This API

// The LLM writes this code block
async function planTrip(city: string, days: number) {
  // Current conditions
  const current = await mcp.weather.fetch({
    city,
    units: "fahrenheit"
  });

  // Forecast
  const forecast = await mcp.weather.forecast({
    city,
    days: Math.min(days, 7)
  });

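  // Process in code (case-insensitive, so "Rain" and "light rain" both count)
  const rainyDays = forecast.filter(
    day => day.conditions.toLowerCase().includes("rain")
  ).length;

  // Guard against an empty forecast to avoid dividing by zero
  const avgTemp = forecast.length === 0
    ? current.temp
    : forecast.reduce(
        (sum, day) => sum + (day.high + day.low) / 2,
        0
      ) / forecast.length;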

  return {
    currentConditions: current.conditions,
    currentTemp: current.temp,
    forecastDays: forecast.length,
    rainyDays,
    averageTemperature: Math.round(avgTemp),
    packingSuggestions: generatePackingList(current, forecast, rainyDays)
  };
}

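// Types derived from the WeatherAPI interface above
type WeatherData = Awaited<ReturnType<WeatherAPI["fetch"]>>;
type ForecastDay = Awaited<ReturnType<WeatherAPI["forecast"]>>[number];

function generatePackingList(
  current: WeatherData,
  forecast: ForecastDay[],
  rainyDays: number
): string[] {
  const items: string[] = [];

  if (rainyDays > 0) {
    items.push("umbrella", "waterproof jacket");
  }

  // Thresholds below are in Fahrenheit, matching the units requested in planTrip.
  // Including current.temp keeps the result sensible if the forecast is empty.
  const maxTemp = Math.max(current.temp, ...forecast.map(d => d.high));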
  if (maxTemp > 80) {
    items.push("sunscreen", "light clothing");
  } else if (maxTemp < 50) {
    items.push("warm layers", "gloves");
  }

  return items;
}

When to Use Each Approach

Based on my experiments, here’s my decision framework:

Use Code Mode When:

Batch Operations: Multiple independent operations that don’t require LLM reasoning between steps
Clear Procedural Logic: When the workflow can be expressed as code
Token Efficiency Matters: Long conversations or large result sets
Latency Sensitive: Need to minimize round-trips

Use Traditional Tool Calling When:

Reasoning Required: Need LLM to analyze intermediate results
Dynamic Decision Making: Next step depends on previous result analysis
Error Recovery: The LLM should inspect errors and decide how to recover
Simple Operations: Single tool call, no complex workflow
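The two lists can be condensed into a small decision function. This is only a sketch of the heuristic described here, with hypothetical names (`Workflow`, `choose_mode`); real workloads will have more factors than three fields.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    tool_calls: int
    needs_reasoning_between_steps: bool
    needs_llm_error_recovery: bool


def choose_mode(workflow: Workflow) -> str:
    """Return "tool_calling" or "code_mode" following the lists above."""
    if workflow.needs_reasoning_between_steps or workflow.needs_llm_error_recovery:
        return "tool_calling"  # the LLM has to see intermediate results
    if workflow.tool_calls <= 1:
        return "tool_calling"  # a single call gains nothing from a sandbox
    return "code_mode"         # batch, procedural work with no LLM in the loop


print(choose_mode(Workflow(3, False, False)))  # code_mode
print(choose_mode(Workflow(3, True, False)))   # tool_calling
```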

Hybrid Approach

I’ve found the best pattern is to use both:

// Complex research with hybrid approach
async function hybridResearch(topic: string) {
  // Use Code Mode for batch data gathering
  const data = await codeModeExecute(async () => {
    const [docs, papers, news] = await Promise.all([
      mcp.context7.search({ query: topic }),
      mcp.arxiv.search({ query: topic }),
      mcp.news.search({ query: topic })
    ]);

    return { docs, papers, news };
  });

  // Use traditional calling for reasoning steps
  const analysis = await agent.call_tool("analyze", {
    data,
    instruction: "Identify contradictions and knowledge gaps"
  });

  // Code Mode for action
  await codeModeExecute(async () => {
    if (analysis.knowledgeGaps.length > 0) {
      await mcp.tasks.create({
        type: "research",
        gaps: analysis.knowledgeGaps
      });
    }
  });
}

Security Considerations

Code execution requires careful sandboxing. Here’s what I implemented:

const sandboxConfig = {
  // Resource limits
  maxExecutionTime: 30000,      // 30 seconds
  maxMemoryMB: 256,
  maxFileSize: 10 * 1024 * 1024, // 10MB

  // Network restrictions
  allowedDomains: [
    "api.context7.com",
    "export.arxiv.org",
    "api.github.com"
  ],

  // MCP server permissions
  allowedTools: [
    "context7.search",
    "arxiv.search",
    "github.searchCode"
  ],

  // No filesystem access
  filesystem: "none",

  // No subprocess execution
  subprocesses: false
};

Without these restrictions, a malicious prompt could steer the model into generating code that:

Exfiltrates data
Makes unauthorized API calls
Consumes excessive resources
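Two of those settings, the tool allowlist and the execution time limit, can be enforced at the point where generated code calls a tool. The sketch below shows that check in Python with hypothetical names (`guard`, `ToolNotAllowed`). It is not a sandbox: memory, network, and filesystem isolation need a real isolation boundary (a separate process, container, or isolate), which this does not provide.

```python
import asyncio
from typing import Any, Awaitable, Callable

ToolHandler = Callable[..., Awaitable[Any]]


class ToolNotAllowed(PermissionError):
    """Raised when generated code asks for a tool outside the allowlist."""


def guard(
    tools: dict[str, ToolHandler],
    allowed: set[str],
    timeout_s: float,
) -> Callable[..., Awaitable[Any]]:
    """Return call(name, **params) that enforces the allowlist and a time limit."""

    async def call(name: str, /, **params: Any) -> Any:
        if name not in allowed or name not in tools:
            raise ToolNotAllowed(name)
        # asyncio.wait_for cancels the tool call once the timeout expires.
        return await asyncio.wait_for(tools[name](**params), timeout=timeout_s)

    return call


async def fake_search(query: str) -> list[str]:
    return [f"hit for {query}"]  # stand-in for a real MCP call


call = guard(
    tools={"arxiv.search": fake_search, "fs.delete": fake_search},
    allowed={"arxiv.search"},
    timeout_s=30.0,
)
print(asyncio.run(call("arxiv.search", query="code mode")))  # ['hit for code mode']
```

Calling `call("fs.delete", ...)` raises `ToolNotAllowed` even though a handler is registered, because the name is not on the allowlist.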

Common Mistakes

I made these mistakes when first implementing Code Mode:

1. Not Validating MCP Server Responses

// WRONG: Trust everything from MCP server
async function badExample() {
  const result = await mcp.external.fetch({ url: userInput });
  // What if result contains malicious data?
  return eval(result.code);  // NEVER do this
}

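// CORRECT: Validate with Zod
// (SafeResponseSchema is a z.object(...) you define for the expected response shape)
async function goodExample() {
  const result = await mcp.external.fetch({ url: userInput });
  const validated = SafeResponseSchema.parse(result); // throws ZodError on mismatch
  return validated;
}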
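The same idea without any library: accept only the fields and types you expect, and drop everything else. This is a standard-library Python sketch, not a replacement for Zod; `WeatherReading`, `parse_weather`, and the 200-character limit are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class WeatherReading:
    temp: float
    conditions: str


def parse_weather(raw: Any) -> WeatherReading:
    """Accept only the expected fields and types; raise ValueError otherwise."""
    if not isinstance(raw, dict):
        raise ValueError("response must be an object")
    temp, conditions = raw.get("temp"), raw.get("conditions")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(temp, bool) or not isinstance(temp, (int, float)):
        raise ValueError("temp must be a number")
    if not isinstance(conditions, str) or len(conditions) > 200:
        raise ValueError("conditions must be a short string")
    # Unknown keys (for example a stray "code" field) are never copied over.
    return WeatherReading(temp=float(temp), conditions=conditions)


reading = parse_weather({"temp": 58, "conditions": "cloudy", "code": "evil()"})
print(reading)  # WeatherReading(temp=58.0, conditions='cloudy')
```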

2. Ignoring Rate Limits

// WRONG: Parallel calls might hit rate limits
const results = await Promise.all([
  mcp.api.call({ query: "a" }),
  mcp.api.call({ query: "b" }),
  mcp.api.call({ query: "c" }),
  mcp.api.call({ query: "d" }),
  mcp.api.call({ query: "e" })
]);

// CORRECT: Batch with rate limit awareness
const results = await batchWithRateLimit(
  queries.map(q => () => mcp.api.call({ query: q })),
  { maxConcurrent: 3, delayMs: 100 }
);
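`batchWithRateLimit` isn't defined in the snippet above, so here is one way such a helper could look, sketched in Python with `asyncio.Semaphore`. The name, the parameters, and the meaning of the delay (each slot is held briefly after a call finishes) are assumptions, not the behavior of any particular library.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


async def batch_with_rate_limit(
    tasks: Sequence[Callable[[], Awaitable[Any]]],
    *,
    max_concurrent: int = 3,
    delay_s: float = 0.1,
) -> list[Any]:
    """Run task factories with at most `max_concurrent` in flight.

    Results are returned in the same order as `tasks`.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run(task: Callable[[], Awaitable[Any]]) -> Any:
        async with semaphore:
            result = await task()
            await asyncio.sleep(delay_s)  # hold the slot briefly to space out calls
            return result

    return await asyncio.gather(*(run(task) for task in tasks))


async def fake_api_call(query: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return query.upper()


queries = ["a", "b", "c", "d", "e"]
results = asyncio.run(
    batch_with_rate_limit([lambda q=q: fake_api_call(q) for q in queries])
)
print(results)  # ['A', 'B', 'C', 'D', 'E']
```

Passing factories (zero-argument callables) rather than already-started coroutines matters: it lets the helper decide when each call actually begins.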

3. Over-Engineering Simple Operations

// WRONG: Code Mode for simple single call
async function overEngineered() {
  // Even with the result returned, the sandbox adds nothing for one call
  return await codeModeExecute(async () => {
    return await mcp.weather.fetch({ city: "Seattle" });
  });
}

// CORRECT: Traditional calling for simple operations
const weather = await agent.call_tool("weather_fetch", {
  city: "Seattle"
});

Summary

In this post, I explored the difference between Cloudflare’s Code Mode and traditional MCP tool calling. The key insight is that Code Mode treats MCP servers as APIs that LLMs program against, rather than tools they must select and invoke.

Code Mode reduces token waste and latency by executing multiple operations in a single code block without intermediate LLM review. It leverages the fact that LLMs are heavily trained on code patterns.

However, traditional tool calling still has value when you need:

LLM reasoning between steps
Dynamic error handling
Simple, single operations

The best approach is likely hybrid: use Code Mode for batch data gathering and processing, use traditional calling for decision points that require LLM judgment.

The debate isn’t MCP vs Code Mode—they’re complementary. MCP provides the protocol and server ecosystem; Code Mode provides an execution pattern that’s more efficient for certain workloads. As with most engineering decisions, the right choice depends on your specific use case.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Cloudflare Blog: Code Mode
👨‍💻 Reddit Discussion on r/LocalLLaMA

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!