Why AI models perform better with CLI than MCP
Problem
When I tried to build an AI assistant that could interact with my development tools, I found myself asking: should I use CLI commands or MCP servers? The results were striking.
I tested both approaches with the same model, asking it to perform common development tasks. With CLI, it succeeded about 85% of the time on average. With MCP, it struggled to reach 60%.
Here’s the difference I observed:
| Task | CLI Success | MCP Success |
|---|---|---|
| List files | 95% | 70% |
| Run tests | 90% | 65% |
| Deploy to server | 80% | 55% |
| Database migration | 75% | 50% |
| Multi-step workflow | 85% | 60% |

What happened?
I started by implementing both approaches for the same project. First, I gave the model access to standard CLI tools through the Bash tool. Then, I built MCP servers for the same operations.
The CLI setup worked immediately. The model knew how to run git status, npm install, and docker build without any special prompts.
The MCP setup required extensive configuration. Each tool needed descriptions, parameter definitions, and examples. Even then, the model often got the parameter names wrong or misunderstood what each tool did.
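As a quick sanity check, the headline figures (85% and 60%) are the unweighted means of the five rows in the results table above. The sketch below only redoes that arithmetic; the names `RESULTS`, `average_rates`, and `gaps` are illustrative and are not part of any test harness described in this post.

```python
from statistics import mean

# Success rates in percent, copied from the results table: (CLI, MCP).
RESULTS = {
    "List files": (95, 70),
    "Run tests": (90, 65),
    "Deploy to server": (80, 55),
    "Database migration": (75, 50),
    "Multi-step workflow": (85, 60),
}


def average_rates(results):
    """Return (cli_mean, mcp_mean) across all tasks, unweighted."""
    cli = mean(cli for cli, _ in results.values())
    mcp = mean(mcp for _, mcp in results.values())
    return cli, mcp


def gaps(results):
    """Return the CLI-minus-MCP gap per task, in percentage points."""
    return {task: cli - mcp for task, (cli, mcp) in results.items()}
```

Calling `gaps(RESULTS)` also shows that the gap is the same 25 percentage points for every task in the table.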
Why does this happen?
I think the key reason comes down to training data.
CLI is everywhere in training data
When LLMs are trained, they consume massive amounts of text, including:
- GitHub repositories with CLI commands in documentation
- Stack Overflow questions and answers about shell commands
- Technical blog posts showing command-line workflows
- Open source project README files with setup instructions
- Code comments describing command usage
The model has seen millions of examples like these:
    # Installing dependencies
    npm install
    pip install -r requirements.txt
    go get github.com/example/package

    # Running tests
    npm test
    pytest
    go test ./...

    # Git operations
    git add .
    git commit -m "fix: update dependencies"
    git push origin main

These patterns are burned into the model’s weights. When I ask it to run tests, it knows to type npm test without any explanation.
MCP is new and sparse
MCP (Model Context Protocol) is relatively new. Anthropic announced it in November 2024, so any model whose training data predates that had no chance to see it. This means:
- No MCP examples in the training corpus
- No standard patterns for tool descriptions
- Each MCP server is unique with its own conventions
When I use MCP, the model has to learn on the fly. It relies entirely on the tool descriptions I provide. If my description is unclear or incomplete, the model makes mistakes.
Training data distribution
Here’s a rough estimate of what’s in typical LLM training data:
    GitHub repositories      ████████████████ 35%
    Stack Overflow           ███████████ 25%
    Technical documentation  ████████ 20%
    Books and papers         ████ 10%
    Code comments and docs   ████ 10%

    CLI commands appear in:  ████████████████ 70%+ of above
    MCP protocol examples    ████████ 0% (didn't exist)

The difference isn’t subtle. CLI patterns appear across 70% or more of the training data. MCP appears in none of it.
What I tested
I tried several specific scenarios to understand the gap.
Scenario 1: Running tests
CLI approach:
    User: "Run all the tests in this project"
    Model: Uses Bash tool with command "npm test"
    Result: 90% success rate

MCP approach:

    {
      "name": "run_tests",
      "description": "Execute the test suite for the project",
      "inputSchema": {
        "type": "object",
        "properties": {
          "coverage": {
            "type": "boolean",
            "description": "Generate coverage report"
          },
          "pattern": {
            "type": "string",
            "description": "Test file pattern to match"
          }
        }
      }
    }

Result: 65% success rate. The model often didn’t know what values to pass for pattern or misunderstood when to use coverage.
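To make the failure mode concrete, here is a minimal sketch of checking a model’s tool-call arguments against the run_tests schema. It is a hand-rolled check, not a full JSON Schema validator, and it is deliberately stricter than JSON Schema’s default by rejecting unknown parameter names. `check_arguments` and `JSON_TYPES` are hypothetical names introduced for illustration.

```python
# Schema trimmed from the run_tests tool definition (descriptions omitted).
RUN_TESTS_SCHEMA = {
    "type": "object",
    "properties": {
        "coverage": {"type": "boolean"},
        "pattern": {"type": "string"},
    },
}

# Map JSON Schema type names to Python types (only the two used here).
JSON_TYPES = {"boolean": bool, "string": str}


def check_arguments(schema, arguments):
    """Return a list of human-readable problems with a tool call's arguments."""
    problems = []
    properties = schema.get("properties", {})
    for name, value in arguments.items():
        if name not in properties:
            # Typical mistake: the model invents or misremembers a name.
            problems.append(f"unknown parameter: {name}")
            continue
        type_name = properties[name]["type"]
        if not isinstance(value, JSON_TYPES[type_name]):
            problems.append(f"{name}: expected {type_name}")
    return problems
```

A call such as `check_arguments(RUN_TESTS_SCHEMA, {"with_coverage": True})` reports the misnamed parameter, which is the kind of error that never comes up when the model simply types `npm test`.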
Scenario 2: File operations
CLI approach:
    # The model naturally knows these patterns
    find . -name "*.js" -type f
    grep -r "TODO" ./src
    cp file1.txt file2.txt

Result: 95% success rate. These patterns are ingrained.
MCP approach:

    {
      "name": "search_files",
      "description": "Search for files matching criteria",
      "inputSchema": {
        "type": "object",
        "properties": {
          "pattern": {
            "type": "string",
            "description": "Glob pattern to match files"
          },
          "recursive": {
            "type": "boolean",
            "description": "Search recursively"
          }
        }
      }
    }

Result: 70% success rate. The model didn’t understand glob patterns well or missed the recursive flag.
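The interaction between the glob pattern and the recursive flag is easy to get wrong. Below is a small sketch of how a search_files handler might behave, assuming it is backed by Python’s `pathlib`. The handler body and the `demo` helper are assumptions for illustration; the schema above does not say how the real tool resolves the two parameters.

```python
import tempfile
from pathlib import Path


def search_files(root, pattern, recursive):
    """Hypothetical handler: glob under root, descending only if recursive."""
    root = Path(root)
    # rglob(pattern) behaves like glob("**/" + pattern).
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(p.relative_to(root).as_posix() for p in matches if p.is_file())


def demo():
    """Build a tiny tree and run the same pattern with both flag values."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "index.js").write_text("// top level\n")
        (root / "src" / "app.js").write_text("// nested\n")
        flat = search_files(root, "*.js", recursive=False)
        deep = search_files(root, "*.js", recursive=True)
    return flat, deep
```

With the same pattern, the non-recursive call only sees `index.js`, while the recursive call also finds `src/app.js`. A model that forgets the flag silently gets the smaller result, whereas `find . -name "*.js"` descends by default.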
Scenario 3: Multi-step workflows
I asked the model to deploy an application. This required multiple steps in sequence.
CLI approach:
    # The model knew this workflow naturally
    npm run build
    docker build -t myapp .
    docker tag myapp:latest registry.example.com/myapp:latest
    docker push registry.example.com/myapp:latest

Result: 80% success rate. The model understood the sequential nature of these commands.
MCP approach:

    [
      {
        "name": "build_project",
        "description": "Build the application"
      },
      {
        "name": "create_docker_image",
        "description": "Create a Docker image from the build",
        "inputSchema": { "tagName": "string" }
      },
      {
        "name": "push_to_registry",
        "description": "Push image to container registry",
        "inputSchema": { "sourceImage": "string", "targetImage": "string" }
      }
    ]

Result: 55% success rate. The model struggled to coordinate the tools and pass outputs correctly between steps.
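The coordination problem is that each tool’s output has to become the next tool’s input. The sketch below spells out that data flow with stand-in functions named after the three tools above. The function bodies, return values, and the `deploy` helper are all hypothetical; nothing here calls Docker or a real MCP server.

```python
def build_project(log):
    """Stand-in for the build_project tool; returns a build directory."""
    log.append("build_project()")
    return "dist/"


def create_docker_image(log, tag_name):
    """Stand-in for create_docker_image; returns the local image reference."""
    log.append(f"create_docker_image(tagName={tag_name!r})")
    return tag_name


def push_to_registry(log, source_image, target_image):
    """Stand-in for push_to_registry; returns the pushed image reference."""
    log.append(
        f"push_to_registry(sourceImage={source_image!r}, targetImage={target_image!r})"
    )
    return target_image


def deploy(registry, name, tag="latest"):
    """Run the three steps in order, threading each output into the next call."""
    log = []
    build_project(log)
    local = create_docker_image(log, tag_name=f"{name}:{tag}")
    # The value returned by step 2 must be reused as sourceImage in step 3.
    remote = push_to_registry(
        log, source_image=local, target_image=f"{registry}/{local}"
    )
    return remote, log
```

In the CLI version the same hand-off is visible in the text itself (`myapp:latest` appears in both the tag and push commands), which may be part of why the model handles it more reliably.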
The pattern recognition advantage
CLI commands follow consistent patterns that models learn deeply:
- Flag syntax: -v, --verbose, --output=file
- Pipe operations: cmd1 | cmd2 | cmd3
- Redirection: > output.txt, 2>> error.log
- Command chaining: cmd1 && cmd2 || cmd3
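These conventions are regular enough that even a few lines of code can recover the shape of a command. Here is a rough sketch using Python’s standard `shlex` module; `classify` is an illustrative helper, and it deliberately ignores pipes, redirection, and flags that take a separate value.

```python
import shlex


def classify(command):
    """Split a simple command into program, short flags, long flags, positionals."""
    tokens = shlex.split(command)  # handles shell-style quoting
    shape = {
        "program": tokens[0],
        "short_flags": [],
        "long_flags": [],
        "positionals": [],
    }
    for token in tokens[1:]:
        if token.startswith("--"):
            shape["long_flags"].append(token)
        elif token.startswith("-") and len(token) > 1:
            shape["short_flags"].append(token)
        else:
            shape["positionals"].append(token)
    return shape
```

For example, `classify('grep -r --include="*.js" TODO ./src')` separates `-r`, `--include=*.js`, and the two positional arguments without knowing anything about grep itself. An MCP tool offers no such shared grammar; its structure lives only in its own schema.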
When I give the model a CLI task, it can:
- Recognize the task type from millions of examples
- Apply known patterns automatically
- Predict the expected output format
- Handle errors using learned recovery patterns
With MCP, each tool is a black box. The model has to:
- Read the tool description
- Understand the parameter schema
- Figure out what values make sense
- Hope the description covers edge cases
The model’s “muscle memory” for CLI doesn’t transfer to MCP.
Context overhead
I also noticed that MCP requires more context for the same task.
For a simple file search:
CLI context needed:

    "Find all JavaScript files in the src directory"

MCP context needed:

    "Find all JavaScript files in the src directory using the search_files tool.
    The pattern should be 'src/**/*.js'. Set recursive to true."

The difference becomes more pronounced with complex tasks. Multi-step workflows can require several paragraphs of explanation for MCP, while CLI needs only the high-level goal.
When MCP still makes sense
Despite the performance gap, I found scenarios where MCP is the better choice:
- Domain-specific operations: When you need custom tools that don’t map to any CLI command
- Complex validation: When parameter validation is too complex for simple CLI flags
- Cross-service coordination: When operations span multiple services with custom protocols
- Standardized interfaces: When you want a consistent API across different environments
For example, I built an MCP server for a proprietary database. There’s no CLI tool for it, and the API is complex. In this case, MCP was the only viable option.
Improving MCP performance
I tried several approaches to close the gap between CLI and MCP performance:
Better tool descriptions

    {
      "name": "search_files",
      "description": "Search for files matching a pattern. Similar to Unix 'find' command.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "pattern": {
            "type": "string",
            "description": "Glob pattern (e.g., '**/*.js' matches all JS files recursively)"
          },
          "recursive": {
            "type": "boolean",
            "description": "Search subdirectories (default: true)"
          }
        },
        "examples": [
          { "pattern": "**/*.js", "recursive": true },
          { "pattern": "*.md", "recursive": false }
        ]
      }
    }

Adding examples helped. The success rate improved from 70% to 75%.
Tool naming conventions
I renamed tools to match familiar CLI patterns:
    Before:
    - search_files
    - get_file_contents
    - modify_file

    After:
    - find (like Unix find)
    - cat (like Unix cat)
    - sed/awk (like Unix text processors)

This simple change improved success by another 5-10%.
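One way to do this kind of renaming without touching the underlying implementation is a thin alias table in front of the existing handlers. The sketch below assumes three placeholder handlers named after the original tools; the handler bodies, `ALIASES`, and `dispatch` are hypothetical and only show the mapping.

```python
def search_files(pattern):
    """Placeholder for the original search_files handler."""
    return f"search_files({pattern!r})"


def get_file_contents(path):
    """Placeholder for the original get_file_contents handler."""
    return f"get_file_contents({path!r})"


def modify_file(path, expression):
    """Placeholder for the original modify_file handler."""
    return f"modify_file({path!r}, {expression!r})"


# CLI-style names exposed to the model, mapped to the existing handlers.
ALIASES = {
    "find": search_files,
    "cat": get_file_contents,
    "sed": modify_file,
}


def dispatch(tool_name, *args):
    """Route a CLI-style tool name to its handler, rejecting unknown names."""
    try:
        handler = ALIASES[tool_name]
    except KeyError:
        raise ValueError(f"unknown tool: {tool_name}") from None
    return handler(*args)
```

The model sees `find`, `cat`, and `sed`, while the server code keeps its descriptive internal names.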
Structured prompts
I started including explicit workflow descriptions in the system prompt:
    For file operations, use these tools in this order:
    1. Use 'find' to locate files
    2. Use 'cat' to read contents
    3. Use 'sed' to modify

    This matches the Unix command line workflow.

This helped the model understand the intended patterns, but it required manual maintenance.
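The maintenance burden can be reduced by generating the workflow text from a single ordered list instead of editing the prompt by hand. This is a minimal sketch of that idea; `WORKFLOW` and `render_workflow_prompt` are illustrative names, not something my setup shipped with.

```python
# Ordered (tool, purpose) pairs; edit this list and the prompt follows.
WORKFLOW = [
    ("find", "locate files"),
    ("cat", "read contents"),
    ("sed", "modify"),
]


def render_workflow_prompt(steps, topic="file operations"):
    """Render the numbered workflow section of a system prompt."""
    lines = [f"For {topic}, use these tools in this order:"]
    for number, (tool, purpose) in enumerate(steps, start=1):
        lines.append(f"{number}. Use '{tool}' to {purpose}")
    return "\n".join(lines)
```

Calling `render_workflow_prompt(WORKFLOW)` reproduces the numbered steps shown above, so adding or reordering a tool only means changing the list.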
The fundamental gap
No matter how much I optimized my MCP setup, I couldn’t match CLI performance. The reason is fundamental.
CLI patterns are learned during pre-training on billions of examples. They’re baked into the model’s understanding of how to interact with computers.
MCP requires the model to learn new patterns at inference time. It’s like asking a human who has spoken English for decades to suddenly use a made-up language and expecting them to be fluent.
The gap will narrow over time as:
- More MCP examples appear in training data
- Models are fine-tuned on MCP-specific datasets
- Standard MCP patterns emerge
But for now, CLI has a massive advantage.
Practical recommendations
Based on my testing, here’s what I recommend:
Use CLI when:
- Standard tools exist (git, npm, docker, etc.)
- Your team already uses command-line workflows
- You want the model to work with minimal configuration
- Performance and reliability are priorities
Use MCP when:
- No CLI tool exists for your needs
- You need complex parameter validation
- Operations span multiple services
- You want to abstract complex workflows behind simple tools
Hybrid approach
I found the best results with a hybrid setup:
    AI Model
     ├─> CLI Tools (git, npm, docker)
     │    └─> Uses existing training data knowledge
     │
     └─> MCP Servers (custom APIs, proprietary systems)
          └─> Requires careful description and examples

This gives you the best of both worlds.
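The routing decision in a hybrid setup can be very simple. Here is a sketch that prefers the CLI when a standard tool is installed and falls back to MCP otherwise. It uses `shutil.which` from the standard library to look up executables; `STANDARD_CLI_TOOLS` and `choose_backend` are assumptions for illustration, and the `which` parameter exists only so the lookup can be swapped out in tests.

```python
import shutil

# Tools the model is expected to know from training data (from the list above).
STANDARD_CLI_TOOLS = {"git", "npm", "docker"}


def choose_backend(tool, which=shutil.which):
    """Return "cli" if tool is a known, installed command; otherwise "mcp"."""
    if tool in STANDARD_CLI_TOOLS and which(tool) is not None:
        return "cli"
    # Custom APIs and proprietary systems go through an MCP server.
    return "mcp"
```

Anything outside the known set, such as a proprietary database client, is routed to MCP, where it still needs a careful description and examples.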
Summary
In this post, I explained why AI models perform better with CLI than MCP. The key point is training data. CLI patterns appear throughout the massive datasets used to train LLMs, while MCP is too new and too sparse in the data. CLI commands follow consistent patterns that models learn deeply through billions of examples. MCP requires the model to learn new patterns at inference time, which creates a significant performance gap.
For developers building AI tools today, start with CLI when possible. It works better and requires less configuration. Use MCP for domain-specific operations where no CLI alternative exists. Over time, as MCP examples accumulate in training data, this gap will narrow. But for now, CLI’s training data advantage is too significant to ignore.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨‍💻 Model Context Protocol Specification
- 👨‍💻 Reddit Discussion - CLI is all you need?
- 👨‍💻 Anthropic Tool Use Documentation
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!