MCP vs CLI for AI Agent Tools: When to Use Which?

Mar 17, 2026

I’ve been building AI agents for a while now. And I kept running into the same question: should my tools use MCP or CLI?

Turns out, the answer matters for both cost and capability. Let me break down what I found.

The Core Tension

MCP (Model Context Protocol) gives you tool discovery and structured I/O. CLI gives you Unix composability and zero overhead.

+------------------+     +------------------+
|   MCP Approach   |     |   CLI Approach   |
+------------------+     +------------------+
| ~100 tokens      |     | 0 tokens         |
| to load schemas  |     | to start         |
|                  |     |                  |
| +50 tokens       |     | Just the command |
| per call after   |     | itself           |
|                  |     |                  |
| Rich metadata    |     | Unix pipes work  |
| Typed JSON       |     | Grep/jq/sed      |
+------------------+     +------------------+

When CLI Wins

I use CLI for three specific scenarios:

1. One-off Queries

If I’m running a single lookup, CLI is strictly cheaper. No schema loading means zero overhead.

#!/usr/bin/env bash
# No MCP setup needed - just run it
curl -s "https://api.example.com/status" | jq '.health'
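When an agent exposes a one-off command like this as a tool, a thin wrapper is usually enough. Here is a minimal Python sketch of that idea, not a definitive implementation: `run_cli` and its `timeout` default are names I'm assuming for illustration. It passes the command as an argument list, so no shell quoting is involved.

```python
import subprocess
import sys


def run_cli(argv: list[str], timeout: float = 10.0) -> dict:
    """Run one command without a shell and return its exit code and output."""
    completed = subprocess.run(
        argv,
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode bytes to str
        timeout=timeout,      # raises subprocess.TimeoutExpired if exceeded
    )
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout.strip(),
        "stderr": completed.stderr.strip(),
    }


if __name__ == "__main__":
    # The current Python interpreter stands in for any CLI binary here.
    print(run_cli([sys.executable, "-c", "print('healthy')"]))
```

Returning the exit code next to the output lets the model tell a failed command from an empty result.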

2. Sub-Agent Tasks

Here’s the key insight: sub-agents cannot access MCP tools.

Only the main orchestrator has MCP access. When you spawn a sub-agent, it can only use CLI commands.

from langchain_community.tools import ShellTool

# Sub-agent gets CLI tools only
tools = [
    ShellTool(),  # runs shell commands, including pipelines
]

# MCP tools won't work here!
# The sub-agent can't see vault_remember or other MCP tools

3. Pipeline Transformations

When I need grep, awk, sed, or jq in the middle of a workflow, CLI is natural.

# Find all TODO comments, extract file:line, sort
grep -rn "TODO" ./src | awk -F: '{print $1":"$2}' | sort | head -20
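If the pipeline has to be driven from Python instead of a shell, the same composition can be approximated by feeding each stage's output into the next one. This is a rough sketch under my own assumptions: `run_pipeline` is a hypothetical helper, and unlike a real Unix pipe it runs the stages one after another and buffers everything in memory.

```python
import subprocess


def run_pipeline(stages: list[list[str]], input_text: str = "") -> str:
    """Feed each stage's stdout into the next stage's stdin, like `a | b | c`."""
    data = input_text
    for argv in stages:
        completed = subprocess.run(argv, input=data, capture_output=True, text=True)
        # Exit codes are not checked here: grep, for example, exits with 1
        # when nothing matches, which is not an error for a pipeline.
        data = completed.stdout
    return data


if __name__ == "__main__":
    source = "a.py:1:# TODO fix\nb.py:7:x = 1\nc.py:3:# todo later\n"
    count = run_pipeline([["grep", "-i", "todo"], ["wc", "-l"]], source)
    print(int(count))
```

For large inputs, a real shell pipeline (or chained `subprocess.Popen` objects) streams data instead of holding it all in memory.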

When MCP Wins

MCP shines in long-lived orchestrator flows. Here’s why:

1. Tool Discovery

Claude sees all available tools with typed parameters and rich docstrings. No need to explain how a tool works.

from mcp.server import Server
from mcp.types import Tool, TextContent

server = Server("my-tools")

@server.list_tools()
async def list_tools():
    return [
        Tool(
            name="vault_remember",
            description="Store a key-value pair for later retrieval in the session",
            inputSchema={
                "type": "object",
                "properties": {
                    "key": {
                        "type": "string",
                        "description": "Unique identifier for the memory"
                    },
                    "value": {
                        "type": "string",
                        "description": "Content to remember"
                    }
                },
                "required": ["key", "value"]
            }
        )
    ]

Claude reads this schema once (~100 tokens), then knows exactly how to call vault_remember.
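The schema is also useful on the server side, because it lets you reject a malformed call before running anything. Below is a small sketch of that check in plain Python. `validate_arguments` is a hypothetical helper, not part of the MCP SDK, and it only covers required keys and a few primitive types; a real server would more likely use a full JSON Schema validator.

```python
# Same shape as the vault_remember inputSchema, with descriptions omitted.
VAULT_REMEMBER_SCHEMA = {
    "type": "object",
    "properties": {
        "key": {"type": "string"},
        "value": {"type": "string"},
    },
    "required": ["key", "value"],
}

# Only the JSON Schema types this sketch understands.
_PYTHON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_arguments(schema: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema.get("properties", {}).get(name, {})
        expected = _PYTHON_TYPES.get(spec.get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"argument {name} should be of type {spec['type']}")
    return problems


if __name__ == "__main__":
    print(validate_arguments(VAULT_REMEMBER_SCHEMA, {"key": "topic"}))
```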

2. Structured Output

MCP returns typed JSON. Claude parses it natively. No string munging.

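import json

@server.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "search_docs":
        # search_index is your own async search helper
        results = await search_index(arguments["query"])

        # Structured JSON payload, returned as MCP text content
        # (TextContent is imported from mcp.types above)
        payload = {
            "results": results,
            "count": len(results),
            "query": arguments["query"]
        }
        return [TextContent(type="text", text=json.dumps(payload))]

    raise ValueError(f"Unknown tool: {name}")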
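To make the "no string munging" point concrete, here is a small sketch that contrasts the two parsing paths. Both function names are mine, and the `title` field is an assumption about what each search result contains.

```python
import json


def parse_structured(text: str) -> list[str]:
    """Structured path: decode JSON and read fields by name."""
    payload = json.loads(text)
    return [item["title"] for item in payload["results"]]


def parse_grep_line(line: str) -> tuple[str, int, str]:
    """CLI path: split a `grep -n` style line of the form file:line:text."""
    path, line_no, text = line.split(":", 2)  # maxsplit keeps colons in the text
    return path, int(line_no), text


if __name__ == "__main__":
    print(parse_structured('{"results": [{"title": "Auth guide"}], "count": 1}'))
    print(parse_grep_line("src/app.py:12:# TODO: fix"))
```

The CLI parser works, but it depends on the exact output format: a file path that itself contains a colon would already break it.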

3. Multi-Turn Efficiency

After the initial schema load, each MCP call costs only ~50 tokens overhead.

Session token breakdown:
-------------------------------
Initial schema load:    ~100 tokens
Call #1 (search):       ~50 tokens overhead
Call #2 (remember):     ~50 tokens overhead
Call #3 (retrieve):     ~50 tokens overhead
-------------------------------
Total for 3 calls:      ~250 tokens

For multi-turn sessions, this is efficient.
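The breakdown above is a one-time cost plus a fixed cost per call. As a quick sketch (the function names are mine, and the defaults are the approximate figures from this article, not measurements of any particular server):

```python
def mcp_overhead_tokens(calls: int, schema_load: int = 100, per_call: int = 50) -> int:
    """Overhead = one-time schema load + a fixed cost for each call."""
    return schema_load + per_call * calls


def average_overhead_per_call(calls: int) -> float:
    """Amortized overhead; it approaches the per-call cost as calls grow."""
    return mcp_overhead_tokens(calls) / calls


if __name__ == "__main__":
    print(mcp_overhead_tokens(3))  # 250
    print(average_overhead_per_call(10))  # 60.0
```

The more calls a session makes, the less the initial schema load matters per call.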

4. Semantic Clarity

Tool names like vault_remember communicate intent. The model understands what the tool does without explanation.

The Hybrid Architecture

I settled on a hybrid approach:

+------------------------------------------+
|           Main Orchestrator              |
|         (Uses MCP Tools)                 |
+------------------------------------------+
        |                    |
        v                    v
+---------------+    +---------------+
| MCP Tools     |    | Spawn Sub-    |
| - discovery   |    | Agents        |
| - structured  |    | (CLI only)    |
| - context     |    +---------------+
+---------------+           |
                            v
                    +---------------+
                    | CLI Tools     |
                    | - pipelines   |
                    | - one-shots   |
                    | - grep/jq/sed |
                    +---------------+

Main Orchestrator with MCP

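import json

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumption: the server is started over stdio with this command
server_params = StdioServerParameters(command="my-mcp-server")

async def run_orchestrator():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # MCP for structured, long-lived operations
            search = await session.call_tool("search_docs", {
                "query": "authentication best practices"
            })
            # The tool sends its JSON payload as text content
            docs = json.loads(search.content[0].text)

            await session.call_tool("vault_remember", {
                "key": "research_topic",
                "value": docs["results"][0]["title"]
            })

            # Spawn sub-agent for CLI work
            # (spawn_sub_agent stands in for your own helper)
            result = await spawn_sub_agent(
                task="Analyze these docs for security issues",
                context=docs
            )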

Sub-Agent with CLI

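import shlex
import subprocess

def analyze_with_cli(docs_path: str) -> int:
    # CLI advantage: compose with standard tools.
    # shlex.quote keeps paths with spaces or shell metacharacters safe.
    result = subprocess.run([
        "bash", "-c",
        f"grep -i 'security' {shlex.quote(docs_path)} | wc -l"
    ], capture_output=True, text=True, check=True)

    # Number of lines that mention "security" (case-insensitive)
    return int(result.stdout.strip())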

Decision Matrix

Scenario                 Use MCP        Use CLI
-----------------------  -------------  --------------
Long orchestrator flow   Yes            No
One-shot query           No             Yes
Sub-agent task           N/A (can’t)    Yes
Need grep/jq/sed         No             Yes
Tool discovery needed    Yes            No
Structured output        Yes            Parse yourself
Multi-turn session       Yes            No
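The matrix can be folded into a small routing function. This is only a sketch of how I read the table: `Task` and `choose_transport` are hypothetical names, and the three flags are my simplification of the rows where CLI wins.

```python
from dataclasses import dataclass


@dataclass
class Task:
    is_sub_agent: bool = False  # sub-agents are CLI-only
    needs_pipes: bool = False   # grep/jq/sed in the middle of the flow
    one_shot: bool = False      # single lookup, no follow-up calls


def choose_transport(task: Task) -> str:
    """Return "cli" or "mcp" following the decision matrix."""
    if task.is_sub_agent or task.needs_pipes or task.one_shot:
        return "cli"
    # Everything else: long orchestrator flows, tool discovery,
    # structured output, multi-turn sessions.
    return "mcp"


if __name__ == "__main__":
    print(choose_transport(Task(one_shot=True)))
    print(choose_transport(Task()))
```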

Real Token Costs

I measured actual costs in a production system:

MCP Session (10 tool calls):
- Schema load:     100 tokens
- 10 calls @ 50:   500 tokens
- Total:           600 tokens

Equivalent CLI Session:
- 10 shell commands: ~200 tokens
- Output parsing:    ~100 tokens (Claude reads stdout)
- Total:             ~300 tokens

CLI is cheaper for simple operations. But in complex flows, the MCP overhead pays for itself once I factor in:

- Claude understanding tool semantics
- No string parsing errors
- Type safety
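To put a number on that overhead, here is a rough sketch of the token premium MCP carries over CLI. The per-call CLI figures are my own assumption: I simply divided the 10-call totals above by ten (200/10 per command, 100/10 for reading output) and treated the cost as linear.

```python
def token_premium(calls: int) -> int:
    """Extra tokens an MCP session costs over the equivalent CLI session."""
    mcp = 100 + 50 * calls    # schema load + per-call overhead
    cli = (20 + 10) * calls   # assumed: 200/10 per command, 100/10 per parse
    return mcp - cli


if __name__ == "__main__":
    print(token_premium(10))  # 300
```

With these figures the premium grows with every call, so the case for MCP rests on fewer failed or misparsed calls, not on raw token counts.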

Key Takeaway

A Reddit practitioner put it well:

“MCP for long-lived orchestrator flows, CLI for sub-agents and quick one-shot jobs. Claude handled typed args way better over MCP, while CLI was nicer when I needed pipes, grep, or jq in the middle.”

I follow this rule now. My main agents use MCP. Sub-agents and quick lookups use CLI. Simple, effective, and cost-aware.

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email.
