
Why AI models perform better with CLI than MCP

Problem

When I tried to build an AI assistant that could interact with my development tools, I found myself asking: should I use CLI commands or MCP servers? The results were striking.

I tested both approaches with the same model, asking it to perform common development tasks. Averaged across the five tasks below, CLI worked 85% of the time, while MCP struggled to reach 60%.

Here’s the difference I observed:

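CLI Performance Test Results

| Task                | CLI success | MCP success |
|---------------------|-------------|-------------|
| List files          | 95%         | 70%         |
| Run tests           | 90%         | 65%         |
| Deploy to server    | 80%         | 55%         |
| Database migration  | 75%         | 50%         |
| Multi-step workflow | 85%         | 60%         |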
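The 85% and 60% headline figures are exactly the unweighted means of these five rows, and every row shows the same 25-point gap. A quick Python check, using only the numbers from the table:

```python
# Success rates (%) copied from the table above: task -> (CLI, MCP)
results = {
    "List files": (95, 70),
    "Run tests": (90, 65),
    "Deploy to server": (80, 55),
    "Database migration": (75, 50),
    "Multi-step workflow": (85, 60),
}

# Unweighted mean across the five tasks
cli_mean = sum(cli for cli, _ in results.values()) / len(results)
mcp_mean = sum(mcp for _, mcp in results.values()) / len(results)

# Per-task gap in percentage points
gaps = {task: cli - mcp for task, (cli, mcp) in results.items()}

print(f"CLI mean: {cli_mean:.0f}%, MCP mean: {mcp_mean:.0f}%")
print(f"Gap per task (points): {gaps}")
```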

What happened?

I started by implementing both approaches for the same project. First, I gave the model access to standard CLI tools through the Bash tool. Then, I built MCP servers for the same operations.

The CLI setup worked immediately. The model knew how to run git status, npm install, and docker build without any special prompts.

The MCP setup required extensive configuration. Each tool needed descriptions, parameter definitions, and examples. Even then, the model often got the parameter names wrong or misunderstood what each tool did.

Why does this happen?

I think the key reason comes down to training data.

CLI is everywhere in training data

When LLMs are trained, they consume massive amounts of text, including:

  • GitHub repositories with CLI commands in documentation
  • Stack Overflow questions and answers about shell commands
  • Technical blog posts showing command-line workflows
  • Open source project README files with setup instructions
  • Code comments describing command usage

The model has seen millions of examples like these:

Common CLI Patterns in Training Data
# Installing dependencies
npm install
pip install -r requirements.txt
go get github.com/example/package
# Running tests
npm test
pytest
go test ./...
# Git operations
git add .
git commit -m "fix: update dependencies"
git push origin main

These patterns are burned into the model’s weights. When I ask it to run tests, it knows to type npm test without any explanation.

MCP is new and sparse

MCP (Model Context Protocol) is relatively new. It wasn’t around when most current models were trained. This means:

  • No MCP examples in the training corpus
  • No standard patterns for tool descriptions
  • Each MCP server is unique with its own conventions

When I use MCP, the model has to learn on the fly. It relies entirely on the tool descriptions I provide. If my description is unclear or incomplete, the model makes mistakes.
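To make that dependence concrete: MCP runs over JSON-RPC 2.0, and a client discovers tools with a tools/list request and invokes one with tools/call. The sketch below is a hand-rolled, in-process dispatcher for illustration only. It assumes the run_tests tool from Scenario 1 below with a hypothetical stub handler, and it skips the initialization handshake and transport that a real MCP server (or the official SDKs) would provide. Whatever tools/list returns is all the model knows about the tool.

```python
import json

# Tool metadata: name, description, and schema are all the model gets to see.
TOOLS = [
    {
        "name": "run_tests",
        "description": "Execute the test suite for the project",
        "inputSchema": {
            "type": "object",
            "properties": {
                "coverage": {"type": "boolean", "description": "Generate coverage report"},
                "pattern": {"type": "string", "description": "Test file pattern to match"},
            },
        },
    }
]


def run_tests(coverage: bool = False, pattern: str = "") -> str:
    # Hypothetical stub: a real server would invoke the project's test runner here.
    return f"ran tests (coverage={coverage}, pattern={pattern!r})"


HANDLERS = {"run_tests": run_tests}


def handle(request: dict) -> dict:
    """Dispatch one JSON-RPC 2.0 request (a small subset of MCP, no handshake)."""
    req_id = request.get("id")
    method = request.get("method")
    if method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        params = request.get("params", {})
        handler = HANDLERS.get(params.get("name"))
        if handler is None:
            return {"jsonrpc": "2.0", "id": req_id,
                    "error": {"code": -32602, "message": "Unknown tool"}}
        try:
            text = handler(**params.get("arguments", {}))
            result = {"content": [{"type": "text", "text": text}], "isError": False}
        except TypeError as exc:
            # A misspelled or unexpected parameter name ends up here,
            # reported back to the model as a tool error.
            result = {"content": [{"type": "text", "text": str(exc)}], "isError": True}
    else:
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": req_id, "result": result}


print(json.dumps(handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}), indent=2))
```

If the model sends "coverge" instead of "coverage", the call comes back with isError set to true; nothing in its training tells it the right spelling, only the schema does.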

Training data distribution

Here’s a rough estimate of what’s in typical LLM training data:

Training Data Composition (Estimate)
GitHub repositories       ████████████████ 35%
Stack Overflow            ███████████ 25%
Technical documentation   ████████ 20%
Books and papers          ████ 10%
Code comments and docs    ████ 10%
CLI commands appear in:   ████████████████ 70%+ of the above
MCP examples              0% (the protocol didn't exist yet)

The difference isn’t subtle. By this rough estimate, CLI patterns appear across 70% or more of the training data, while MCP appears in essentially none of it.

What I tested

I tried several specific scenarios to understand the gap.

Scenario 1: Running tests

CLI approach:

AI CLI Request
User: "Run all the tests in this project"
Model: Uses Bash tool with command "npm test"
Result: 90% success rate
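Behind the "Uses Bash tool" line, a harness can get by with one generic tool: the model writes a command string, the harness runs it, and the output goes back into the conversation. Below is a minimal sketch of such a tool, assuming a POSIX shell; the run_bash name and its result fields are hypothetical, not the API of any particular product.

```python
import subprocess


def run_bash(command: str, timeout: float = 120.0) -> dict:
    """Hypothetical Bash tool: run one shell command and return what the model sees."""
    try:
        proc = subprocess.run(
            command,
            shell=True,           # the model writes ordinary shell syntax
            capture_output=True,  # collect stdout/stderr instead of printing them
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "stdout": "", "stderr": f"timed out after {timeout}s"}
    return {"exit_code": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


# In Scenario 1 the model's entire tool call is the string "npm test".
# A harmless command is used here so the example runs anywhere:
print(run_bash("echo hello"))
```

There is no schema to learn beyond "a command string"; everything else comes from the model's prior knowledge of shell commands.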

MCP approach:

MCP Server Tool Definition
{
  "name": "run_tests",
  "description": "Execute the test suite for the project",
  "inputSchema": {
    "type": "object",
    "properties": {
      "coverage": {
        "type": "boolean",
        "description": "Generate coverage report"
      },
      "pattern": {
        "type": "string",
        "description": "Test file pattern to match"
      }
    }
  }
}

Result: 65% success rate. The model often didn’t know what values to pass for pattern or misunderstood when to use coverage.
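Each MCP call only works if the arguments fit the declared inputSchema, which is where wrong parameter names and wrong value types surface. The stdlib-only sketch below checks arguments against the run_tests schema above. It handles only the type/properties subset used here (a real server would more likely use a full JSON Schema validator), and it rejects unknown keys as if the schema set "additionalProperties": false, which is an assumption since the schema above does not.

```python
# The run_tests inputSchema from above, as a Python dict (descriptions omitted).
RUN_TESTS_SCHEMA = {
    "type": "object",
    "properties": {
        "coverage": {"type": "boolean"},
        "pattern": {"type": "string"},
    },
}

# Map the JSON Schema type names used in this schema to Python checks.
TYPE_CHECKS = {
    "boolean": lambda v: isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def validate_arguments(arguments: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    problems = []
    properties = schema.get("properties", {})
    for key, value in arguments.items():
        if key not in properties:
            # Treated as an error, as if "additionalProperties": false were set
            problems.append(f"unknown parameter {key!r}")
            continue
        expected = properties[key]["type"]
        if not TYPE_CHECKS[expected](value):
            problems.append(f"{key!r} should be {expected}, got {type(value).__name__}")
    for key in schema.get("required", []):
        if key not in arguments:
            problems.append(f"missing required parameter {key!r}")
    return problems


print(validate_arguments({"coverage": True, "pattern": "tests/**/*.test.js"}, RUN_TESTS_SCHEMA))
print(validate_arguments({"coverage": "yes", "file_pattern": "*.js"}, RUN_TESTS_SCHEMA))
```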

Scenario 2: File operations

CLI approach:

Terminal window
# The model naturally knows these patterns
find . -name "*.js" -type f
grep -r "TODO" ./src
cp file1.txt file2.txt

Result: 95% success rate. These patterns are ingrained.

MCP approach:

MCP File Operations Tool
{
  "name": "search_files",
  "description": "Search for files matching criteria",
  "inputSchema": {
    "type": "object",
    "properties": {
      "pattern": {
        "type": "string",
        "description": "Glob pattern to match files"
      },
      "recursive": {
        "type": "boolean",
        "description": "Search recursively"
      }
    }
  }
}

Result: 70% success rate. The model didn’t understand glob patterns well or missed the recursive flag.
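Part of the confusion is that the two parameters overlap. The schema does not say how recursive interacts with a ** in the pattern, so the handler below is only one plausible implementation of the hypothetical search_files tool, using Python's pathlib. With this implementation, a ** in the pattern already recurses regardless of the recursive flag.

```python
from pathlib import Path


def search_files(root: str, pattern: str, recursive: bool) -> list[str]:
    """One possible search_files handler; returns matching file paths relative to root."""
    base = Path(root)
    # rglob(p) behaves like glob("**/" + p); glob(p) matches only what the pattern says.
    matches = base.rglob(pattern) if recursive else base.glob(pattern)
    return sorted(p.relative_to(base).as_posix() for p in matches if p.is_file())
```

For a tree containing a.js and src/b.js, search_files(root, "*.js", False) finds only a.js, while both search_files(root, "*.js", True) and search_files(root, "**/*.js", False) find both files. Two different-looking calls doing the same thing is exactly the kind of ambiguity a model has to resolve from the description alone.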

Scenario 3: Multi-step workflows

I asked the model to deploy an application. This required multiple steps in sequence.

CLI approach:

Terminal window
# The model knew this workflow naturally
npm run build
docker build -t myapp .
docker tag myapp:latest registry.example.com/myapp:latest
docker push registry.example.com/myapp:latest

Result: 80% success rate. The model understood the sequential nature of these commands.
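That sequencing follows a shell convention the model has seen constantly: each step should run only if the previous one succeeded, which is what chaining the commands with && does. A minimal Python sketch of the same fail-fast loop; run_sequence is a hypothetical helper, and actually running the deploy steps would require npm, Docker, and registry access.

```python
import subprocess
from typing import Optional


def run_sequence(commands: list[str]) -> tuple[list[str], Optional[str]]:
    """Run shell commands in order and stop at the first failure, like cmd1 && cmd2 && ...

    Returns (commands that succeeded, command that failed or None).
    """
    completed: list[str] = []
    for command in commands:
        proc = subprocess.run(command, shell=True)
        if proc.returncode != 0:
            return completed, command  # later steps never run
        completed.append(command)
    return completed, None


# The deploy workflow above, expressed as data (not executed here).
deploy_steps = [
    "npm run build",
    "docker build -t myapp .",
    "docker tag myapp:latest registry.example.com/myapp:latest",
    "docker push registry.example.com/myapp:latest",
]
```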

MCP approach:

Multiple MCP Tools Required
[
  {
    "name": "build_project",
    "description": "Build the application",
    "inputSchema": { "type": "object", "properties": {} }
  },
  {
    "name": "create_docker_image",
    "description": "Create a Docker image from the build",
    "inputSchema": {
      "type": "object",
      "properties": { "tagName": { "type": "string" } }
    }
  },
  {
    "name": "push_to_registry",
    "description": "Push image to container registry",
    "inputSchema": {
      "type": "object",
      "properties": {
        "sourceImage": { "type": "string" },
        "targetImage": { "type": "string" }
      }
    }
  }
]

Result: 55% success rate. The model struggled to coordinate the tools and pass outputs correctly between steps.
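With separate tools, the data flow between steps becomes the model's job: the image name produced by create_docker_image has to reappear as sourceImage in the next call. The sketch below stubs the three tools from the definition above as plain Python functions (hypothetical handlers that only return strings; no Docker is involved) to make that dependency explicit.

```python
def build_project() -> str:
    return "build ok"  # stub: a real handler would run the build


def create_docker_image(tagName: str) -> str:
    return tagName  # stub: pretend the image was built and return its local name


def push_to_registry(sourceImage: str, targetImage: str) -> str:
    return f"pushed {sourceImage} -> {targetImage}"  # stub


def deploy(tag: str, registry: str) -> list[str]:
    """The sequence of tool calls the model has to produce, step by step."""
    log = [build_project()]
    image = create_docker_image(tagName=tag)
    log.append(f"built image {image}")
    # The coordination step: the previous call's output becomes this call's input.
    log.append(push_to_registry(sourceImage=image, targetImage=f"{registry}/{image}"))
    return log


print(deploy("myapp:latest", "registry.example.com"))
```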

The pattern recognition advantage

CLI commands follow consistent patterns that models learn deeply:

  • Flag syntax: -v, --verbose, --output=file
  • Pipe operations: cmd1 | cmd2 | cmd3
  • Redirection: > output.txt, 2>>error.log
  • Command chaining: cmd1 && cmd2 || cmd3
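These conventions form a small, regular grammar. Python's standard-library shlex module can tokenize it when punctuation_chars=True, so flags, quoted arguments, pipes, and chaining operators all reduce to one flat token stream. Note that shlex only handles quoting and operator splitting; it is not a full shell parser, and pipeline_stages is a hypothetical helper built on top of it.

```python
import shlex


def tokenize(command: str) -> list[str]:
    """Split a command line into words and operators (&&, ||, |, >, >>, ...)."""
    return list(shlex.shlex(command, posix=True, punctuation_chars=True))


def pipeline_stages(command: str) -> list[list[str]]:
    """Split a simple pipeline into one argv list per stage."""
    stages: list[list[str]] = [[]]
    for token in tokenize(command):
        if token == "|":
            stages.append([])  # a pipe starts a new stage
        else:
            stages[-1].append(token)
    return stages


print(tokenize("cmd1 && cmd2 || cmd3"))
print(tokenize("ls -v --output=file"))
print(pipeline_stages('grep -r "TODO" ./src | wc -l'))
```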

When I give the model a CLI task, it can:

  1. Recognize the task type from millions of examples
  2. Apply known patterns automatically
  3. Predict the expected output format
  4. Handle errors using learned recovery patterns

With MCP, each tool is a black box. The model has to:

  1. Read the tool description
  2. Understand the parameter schema
  3. Figure out what values make sense
  4. Hope the description covers edge cases

The model’s “muscle memory” for CLI doesn’t transfer to MCP.

Context overhead

I also noticed that MCP requires more context for the same task.

For a simple file search:

CLI context needed:

"Find all JavaScript files in the src directory"

MCP context needed:

"Find all JavaScript files in the src directory using the search_files tool.
The pattern should be 'src/**/*.js'. Set recursive to true."

The difference becomes more pronounced with complex tasks. Multi-step workflows can require several paragraphs of explanation for MCP, while CLI needs only the high-level goal.

When MCP still makes sense

Despite the performance gap, I found scenarios where MCP is the better choice:

  1. Domain-specific operations: When you need custom tools that don’t map to any CLI command
  2. Complex validation: When parameter validation is too complex for simple CLI flags
  3. Cross-service coordination: When operations span multiple services with custom protocols
  4. Standardized interfaces: When you want a consistent API across different environments

For example, I built an MCP server for a proprietary database. There’s no CLI tool for it, and the API is complex. In this case, MCP was the only viable option.

Improving MCP performance

I tried several approaches to close the gap between CLI and MCP performance:

Better tool descriptions

Improved MCP Tool Description
{
  "name": "search_files",
  "description": "Search for files matching a pattern. Similar to Unix 'find' command.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "pattern": {
        "type": "string",
        "description": "Glob pattern (e.g., '**/*.js' matches all JS files recursively)"
      },
      "recursive": {
        "type": "boolean",
        "description": "Search subdirectories (default: true)"
      }
    },
    "examples": [
      { "pattern": "**/*.js", "recursive": true },
      { "pattern": "*.md", "recursive": false }
    ]
  }
}

Adding examples helped. The success rate improved from 70% to 75%.

Tool naming conventions

I renamed tools to match familiar CLI patterns:

Before → After:
- search_files → find (like Unix find)
- get_file_contents → cat (like Unix cat)
- modify_file → sed/awk (like Unix text processors)

This simple change improved success by another 5-10%.

Structured prompts

I started including explicit workflow descriptions in the system prompt:

For file operations, use these tools in this order:
1. Use 'find' to locate files
2. Use 'cat' to read contents
3. Use 'sed' to modify
This matches the Unix command line workflow.

This helped the model understand the intended patterns, but it required manual maintenance.

The fundamental gap

No matter how much I optimized my MCP setup, I couldn’t match CLI performance. The reason is fundamental.

CLI patterns are learned during pre-training on billions of examples. They’re baked into the model’s understanding of how to interact with computers.

MCP requires the model to learn new patterns at inference time. It’s like asking a human who has spoken English for decades to suddenly use a made-up language and expecting them to be fluent.

The gap will narrow over time as:

  1. More MCP examples appear in training data
  2. Models are fine-tuned on MCP-specific datasets
  3. Standard MCP patterns emerge

But for now, CLI has a massive advantage.

Practical recommendations

Based on my testing, here’s what I recommend:

Use CLI when:

  • Standard tools exist (git, npm, docker, etc.)
  • Your team already uses command-line workflows
  • You want the model to work with minimal configuration
  • Performance and reliability are priorities

Use MCP when:

  • No CLI tool exists for your needs
  • You need complex parameter validation
  • Operations span multiple services
  • You want to abstract complex workflows behind simple tools

Hybrid approach

I found the best results with a hybrid setup:

Hybrid Architecture
AI Model
├─> CLI Tools (git, npm, docker)
│   └─> Uses existing training data knowledge
└─> MCP Servers (custom APIs, proprietary systems)
    └─> Requires careful description and examples

This gives you the best of both worlds.
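In code, the hybrid setup amounts to one tool list with two kinds of entries: a generic shell tool for everything the standard CLIs cover, and MCP-backed tools for systems that have no CLI. A minimal routing sketch under those assumptions; bash and query_orders are hypothetical tool names, and the MCP side is stubbed rather than talking to a real server.

```python
import subprocess
from typing import Callable


def bash_tool(command: str) -> str:
    """Generic shell tool: relies on the model's existing CLI knowledge."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    return proc.stdout + proc.stderr


def query_orders(customer_id: str) -> str:
    """Stand-in for a tool served by an MCP server for a proprietary system."""
    return f"orders for customer {customer_id}: []"  # stub result


# One registry exposed to the model; careful descriptions matter most for the MCP entries.
TOOLS: dict[str, Callable[..., str]] = {
    "bash": bash_tool,
    "query_orders": query_orders,
}


def dispatch(name: str, arguments: dict) -> str:
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**arguments)


print(dispatch("bash", {"command": "echo hello"}))
print(dispatch("query_orders", {"customer_id": "42"}))
```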

Summary

In this post, I explained why AI models perform better with CLI than MCP. The key point is training data. CLI patterns appear throughout the massive datasets used to train LLMs, while MCP is too new and too sparse in the data. CLI commands follow consistent patterns that models learn deeply through billions of examples. MCP requires the model to learn new patterns at inference time, which creates a significant performance gap.

For developers building AI tools today, start with CLI when possible. It works better and requires less configuration. Use MCP for domain-specific operations where no CLI alternative exists. Over time, as MCP examples accumulate in training data, this gap will narrow. But for now, CLI’s training data advantage is too significant to ignore.

Final Words

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
