
CLI vs MCP Authentication: Why Multiple Agents Hit GitHub Rate Limits

Problem

When I tried running multiple AI agents to process GitHub repositories, I hit rate limits immediately.

HTTP 403: API rate limit exceeded for user ID
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1740933452

Each agent was using gh auth login separately, but they were all authenticated with my personal account. I could not understand why I was hitting limits so quickly when I was only making a few requests per minute.

Environment

  • GitHub CLI v2.50.0
  • Python 3.11 with multiple AI agent processes
  • macOS Darwin 24.6.0
  • Three concurrent agents processing different repos

What happened?

I was building a system where multiple AI agents needed to query GitHub repos simultaneously. I wanted to understand which approach was better: CLI tools or MCP (Model Context Protocol).

Here’s how I had each agent set up:

agent_setup.sh
# Each agent ran this independently
gh auth login --web --scopes repo,read:org
gh repo view owner/repo --json name,description
gh api /repos/owner/repo/issues

The agents were working fine when I ran them one at a time. But when I ran three agents in parallel, they started failing with rate limit errors after just a few hundred total requests.

I knew GitHub allows 5,000 requests per hour for an authenticated user, so three agents making a few requests each should not have been a problem.

But then I checked the rate limit status more carefully:

check_limits.sh
gh api rate_limit

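The output showed that all three agents were drawing from the same quota. The rate limit applies to the authenticated user account, not to each agent or process, so every other tool or script signed in as me was spending from that same pool too.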
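
To check how much of that shared pool is left, I could summarize the JSON that gh api rate_limit prints. This is a minimal sketch, assuming the documented response shape in which resources.core holds limit, used, remaining, and reset (a Unix timestamp). The helper name summarize_core_quota is my own and is not part of gh.

```python
import time
from typing import Optional


def summarize_core_quota(payload: dict, now: Optional[float] = None) -> dict:
    """Summarize the REST ("core") quota from a GET /rate_limit response body."""
    core = payload["resources"]["core"]
    now = time.time() if now is None else now
    return {
        "limit": core["limit"],
        "used": core["used"],
        "remaining": core["remaining"],
        # reset is a Unix epoch timestamp; clamp at 0 once it has passed
        "seconds_until_reset": max(0, int(core["reset"] - now)),
    }
```

Each agent could feed it json.loads(subprocess.run(["gh", "api", "rate_limit"], capture_output=True, text=True, check=True).stdout). Because they all share one pool, every agent sees the same remaining count. GitHub documents that calls to the rate_limit endpoint do not count against the REST quota.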

How I approached this

I tried to understand the fundamental difference between CLI and MCP authentication.

Understanding CLI Authentication

CLI authentication is straightforward. You authenticate once, and the CLI stores your credentials.

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Agent 1   │    │   Agent 2   │    │   Agent 3   │
└──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       │                  │                  │
       └──────────────────┼──────────────────┘
                          │
                   ┌──────┴──────┐
                   │ Your Token  │  (5,000/hr limit)
                   └─────────────┘

Each agent independently accesses your stored credentials. From GitHub’s perspective, all requests come from the same authenticated identity. The rate limit applies to that identity.

The problem is that CLI authentication does not have any awareness of how many agents are using it. It simply provides credentials to whoever asks.
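
A back-of-the-envelope calculation shows why this matters. This minimal sketch assumes every agent sends requests at a steady rate and that nothing else is using the account. minutes_until_exhausted is a hypothetical helper.

```python
def minutes_until_exhausted(hourly_limit: int, agents: int,
                            requests_per_minute: float) -> float:
    """Minutes until a shared per-account quota reaches zero.

    All agents draw from the same pool, so their request rates add up.
    """
    combined_rate = agents * requests_per_minute  # requests/minute across all agents
    if combined_rate <= 0:
        return float("inf")
    return hourly_limit / combined_rate


# One agent at 30 requests/minute needs about 167 minutes to drain 5,000 requests,
# so the hourly window resets first. Three such agents drain it in under an hour.
print(round(minutes_until_exhausted(5000, 3, 30), 1))  # 55.6
```

A result above 60 means the hourly window resets before the pool runs dry. A result below 60 means the agents will start getting 403 errors before the hour is up. The helper does not model the reset itself.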

Understanding MCP Authentication

MCP works differently. It uses a server-client model where authentication happens at a central point.

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Agent 1   │    │   Agent 2   │    │   Agent 3   │
└──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       │                  │                  │
       └──────────────────┼──────────────────┘
                          │
                   ┌──────┴──────┐
                   │ MCP Server  │
                   └──────┬──────┘
                          │
                   ┌──────┴──────┐
                   │   Central   │
                   │    Auth     │
                   └─────────────┘

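The MCP server handles all API calls. Because every request passes through it, the server can manage rate limiting centrally, queue requests, and distribute capacity fairly among agents. How well it does this depends on the server implementation.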
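
One way a central server could distribute requests fairly is round-robin scheduling across agents, so a single busy agent cannot starve the others. This is a minimal in-process sketch under that assumption. Round-robin is my choice here, not something MCP requires, and FairDispatcher is a hypothetical name.

```python
from collections import OrderedDict, deque
from typing import Any, Optional, Tuple


class FairDispatcher:
    """Round-robin over per-agent queues: each agent gets one turn per cycle."""

    def __init__(self) -> None:
        # agent_id -> pending requests; insertion order defines the rotation
        self._queues: "OrderedDict[str, deque]" = OrderedDict()

    def submit(self, agent_id: str, request: Any) -> None:
        self._queues.setdefault(agent_id, deque()).append(request)

    def next_request(self) -> Optional[Tuple[str, Any]]:
        """Return (agent_id, request) for the next agent in line, or None if idle."""
        while self._queues:
            agent_id, pending = next(iter(self._queues.items()))
            # Move this agent to the back of the rotation before serving it
            self._queues.move_to_end(agent_id)
            if pending:
                return agent_id, pending.popleft()
            # Drop agents that have nothing left to send
            del self._queues[agent_id]
        return None
```

A single worker that holds the token would call next_request() in a loop and send each request to GitHub, so the shared quota is spent in turn rather than first come, first served.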

Comparing the approaches

| Aspect | CLI Authentication | MCP Authentication |
|---|---|---|
| Setup complexity | Simple (one command) | Requires server setup |
| Multi-agent support | Every agent draws from the shared quota | Central quota management |
| Rate limiting | Per user account, unaware of agents | Server can control and queue |
| Credential management | Stored per environment | Centralized |
| Learning curve | None (standard CLI) | Requires MCP knowledge |

The solution

I had a few options:

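  1. Use separate tokens per agent - Multiple Personal Access Tokens for the same account still share that account's hourly limit, so this only helps if each token belongs to a different identity, such as a GitHub App installation. It also means managing many secrets, which is tedious and insecure if not done carefully.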

  2. Implement request queuing - Build a queue system that serializes requests from all agents through a single authenticated client.

  3. Switch to MCP - Use MCP’s server model to handle authentication and rate limiting centrally.

I started by trying option 2, creating a simple queue system:

github_queue.py
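import time
from concurrent.futures import Future
from queue import Queue
from threading import Thread
class GitHubRequestQueue:
    """Serializes GitHub API calls from many agents through one token.
    Works for agents running as threads in the same process."""
    def __init__(self, token):
        self.queue = Queue()
        self.token = token
        self.running = False
    def submit(self, request_func):
        # request_func(token) performs one API call; the returned Future
        # lets the submitting agent block on future.result() until it is done
        future = Future()
        self.queue.put((request_func, future))
        return future
    def worker(self):
        while self.running or not self.queue.empty():
            request_func, future = self.queue.get()
            try:
                # Rate limiting: 1 request per second (3,600/hr, under 5,000/hr)
                time.sleep(1)
                future.set_result(request_func(self.token))
            except Exception as e:
                future.set_exception(e)
            finally:
                self.queue.task_done()
    def start(self):
        self.running = True
        Thread(target=self.worker, daemon=True).start()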

This worked better. All agents submitted requests to the queue, and a single worker thread executed them serially with built-in delays to stay within rate limits.
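
The fixed time.sleep(1) caps throughput at 3,600 requests per hour and never allows short bursts. A token bucket is a common alternative. It refills at the average rate I want (5,000 per hour is about 1.39 per second) and lets requests through immediately while tokens remain. This is a swapped-in technique, not what the queue above does. TokenBucket and its parameters are hypothetical, and the clock is injectable so the logic can be checked without real sleeps.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate          # must be > 0
        self.capacity = capacity
        self.tokens = capacity    # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if one is available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next token is available (0 if one is ready now)."""
        self._refill()
        return max(0.0, (1 - self.tokens) / self.rate)

    def acquire(self) -> None:
        """Block (with the real clock) until a token is available, then take it."""
        while not self.try_acquire():
            time.sleep(self.wait_time())


# Average of 5,000 requests per hour, with bursts of up to 10
bucket = TokenBucket(rate=5000 / 3600, capacity=10)
```

In the queue's worker, bucket.acquire() could replace time.sleep(1). With a single worker thread no locking is needed. Sharing one bucket across several threads would need a threading.Lock around the refill and token update.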

But then I considered MCP. MCP's server model is a natural home for this pattern: a single server that holds the credentials can take on authentication, rate limiting, and request management for every agent connected to it.

The reason

The key reason CLI hits rate limits with multiple agents is that CLI authentication was designed for human use, not multi-agent systems.

When I run gh auth login, I am establishing my identity with GitHub. The CLI stores my credentials and uses them for all my requests. GitHub sees all my requests as coming from one authenticated user and applies rate limits accordingly.

Multiple agents using CLI authentication all appear as the same user from GitHub’s perspective. The rate limit is shared, but there is no coordination between agents about quota usage.

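MCP addresses this by introducing an intermediary layer. The MCP server holds the one set of credentials and is the only component that talks to GitHub, so it can decide how multiple agents share that identity's quota. It does not raise the limit; it coordinates how the limit is spent.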

This is similar to how database connection pools work. Rather than each request opening its own connection, they all borrow from a pool that is managed centrally.

Security considerations

CLI security risks

When using CLI authentication with multiple agents, I found several security concerns:

  • Token proliferation: If each agent needs its own token, I have to manage many secrets
  • Credential storage: gh keeps tokens in the system credential store when one is available, but falls back to a plain-text config file otherwise
  • Log exposure: If agents log request details, tokens can leak into logs
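
For the log exposure risk, one mitigation is a logging filter that scrubs anything shaped like a GitHub token before it is written. GitHub tokens start with documented prefixes (ghp_, gho_, ghu_, ghs_, ghr_, and github_pat_), but the length bounds in this regex are my approximation. Treat it as a sketch rather than a complete secret scanner. RedactTokensFilter is a hypothetical name.

```python
import logging
import re

# Documented GitHub token prefixes; the minimum lengths are approximations.
TOKEN_PATTERN = re.compile(r"gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,}")


def redact(text: str) -> str:
    """Replace anything that looks like a GitHub token with a placeholder."""
    return TOKEN_PATTERN.sub("[REDACTED]", text)


class RedactTokensFilter(logging.Filter):
    """Scrub tokens from a record's final message before a handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Merge msg % args first so tokens passed as arguments are caught too
        record.msg = redact(record.getMessage())
        record.args = None
        return True  # never drop the record, only rewrite it
```

Attaching it to handlers, for example handler.addFilter(RedactTokensFilter()), also covers records that propagate from child loggers, which a filter on a single logger would miss. It does not scrub exception tracebacks.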

MCP security advantages

MCP offers better security for multi-agent scenarios:

  • Centralized credential management: Only the MCP server needs credentials
  • Least privilege: The MCP server can enforce what each agent can access
  • Better audit trails: All requests go through one point, making logging easier
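
Least privilege and audit trails both come down to one check at the server boundary. This is a minimal sketch, assuming a hypothetical policy table that maps each agent to the HTTP methods and repositories it may use. Every decision is written to a standard logging logger as the audit record. None of these names come from MCP or from any existing MCP server.

```python
import logging
import re
from typing import Dict, Set, Tuple

audit_log = logging.getLogger("github_proxy.audit")

# Hypothetical policy: agent id -> (allowed HTTP methods, allowed "owner/repo" names)
POLICY: Dict[str, Tuple[Set[str], Set[str]]] = {
    "triage-agent": ({"GET"}, {"owner/repo"}),
    "release-agent": ({"GET", "POST"}, {"owner/repo", "owner/other"}),
}

# Extracts "owner/repo" from paths like /repos/owner/repo/issues
REPO_PATH = re.compile(r"^/repos/([^/]+/[^/]+)(?:/|$)")


def is_allowed(agent_id: str, method: str, path: str) -> bool:
    """Return True only if the agent may call `method path`; log every decision."""
    methods, repos = POLICY.get(agent_id, (set(), set()))
    match = REPO_PATH.match(path)
    allowed = (
        method.upper() in methods
        and match is not None
        and match.group(1) in repos
    )
    audit_log.info("agent=%s method=%s path=%s allowed=%s",
                   agent_id, method.upper(), path, allowed)
    return allowed
```

Anything that does not match a /repos/owner/repo path, or any agent missing from the table, is denied by default.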

Here is a comparison of the security posture:

CLI Approach:
Agent 1 ──┐
Agent 2 ──┼──> Direct to GitHub API (each with credentials)
Agent 3 ──┘

MCP Approach:
Agent 1 ──┐
Agent 2 ──┼──> MCP Server ──> GitHub API (single auth)
Agent 3 ──┘

When to use which approach

Based on my experience, here is when each approach makes sense:

Use CLI when:

  • Running a single agent or process
  • Doing quick prototyping or development
  • Working in isolated environments
  • You need direct, low-level control
  • The workflow is human-driven

Use MCP when:

  • Running multiple agents concurrently
  • You need centralized rate limiting
  • Security and audit trails are important
  • You are building production systems
  • You need to manage complex authentication scenarios

Implementation examples

CLI authentication setup

cli_auth.sh
# Authenticate with GitHub
gh auth login --web --scopes repo,read:org
# Verify authentication
gh auth status
# Use token in environment variable
export GH_TOKEN=$(gh auth token)
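
Agents can then read the token from the environment instead of having it hardcoded. gh itself checks GH_TOKEN and then GITHUB_TOKEN, so this sketch uses the same order and builds the headers that GitHub's REST API documentation recommends. The function names are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def github_token_from_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GH_TOKEN, falling back to GITHUB_TOKEN; fail loudly if neither is set."""
    env = os.environ if env is None else env
    for name in ("GH_TOKEN", "GITHUB_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("Set GH_TOKEN, for example: export GH_TOKEN=$(gh auth token)")


def github_headers(token: str) -> Dict[str, str]:
    """Headers for GitHub REST API calls; the token never appears in source code."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
```

An agent would call github_headers(github_token_from_env()) once at startup and reuse the result for every request.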

MCP-style queue approach

If you cannot use MCP but need similar behavior, you can implement a simple request queue:

mcp_style_queue.py
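import asyncio
import time
from typing import Any, Optional
import aiohttp
class MCPStyleRequestQueue:
    """One shared client that all agent coroutines route GitHub GET requests through."""
    def __init__(self, token: str, max_concurrent: int = 5):
        self.token = token
        # Caps in-flight requests across every agent sharing this instance
        self.semaphore = asyncio.Semaphore(max_concurrent)
        # Created lazily: aiohttp sessions should be opened inside a running event loop
        self.session: Optional[aiohttp.ClientSession] = None
    async def request(self, url: str, **kwargs) -> Any:
        if self.session is None:
            self.session = aiohttp.ClientSession()
        headers = dict(kwargs.pop('headers', {}))
        headers['Authorization'] = f'token {self.token}'
        while True:
            async with self.semaphore:
                async with self.session.get(url, headers=headers, **kwargs) as resp:
                    retry_after = resp.headers.get('Retry-After')
                    reset = resp.headers.get('X-RateLimit-Reset')
                    rate_limited = resp.status == 429 or (
                        resp.status == 403
                        and (retry_after is not None
                             or resp.headers.get('X-RateLimit-Remaining') == '0'))
                    if not rate_limited:
                        resp.raise_for_status()
                        return await resp.json()
            # Back off outside the semaphore so other agents keep their slots.
            # Retry-After is in seconds; X-RateLimit-Reset is a Unix timestamp.
            if retry_after is not None:
                wait_time = int(retry_after)
            elif reset is not None:
                wait_time = max(0, int(reset) - int(time.time()))
            else:
                wait_time = 60
            await asyncio.sleep(wait_time)
    async def close(self):
        if self.session is not None:
            await self.session.close()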

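This mimics MCP’s centralized authentication while still using the CLI token (for example, the value printed by gh auth token). It only coordinates requests that go through the same instance, so all agents need to share one event loop and one queue object.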

Best practices

Regardless of which approach you choose, these are the practices I found most important:

  1. Never hardcode tokens in code
  2. Use environment variables for all credentials
  3. Implement proper error handling for rate limits
  4. Monitor authentication events
  5. Follow the principle of least privilege
  6. Rotate tokens regularly

Summary

In this post, I explored the authentication trade-offs between CLI and MCP for AI agents. The key point is that CLI authentication is designed for human use and does not handle multi-agent scenarios well, leading to rate limit issues. MCP’s server model provides centralized authentication and rate limiting that is better suited for multi-agent systems.

The choice between CLI and MCP depends on your use case: CLI is simple and direct for single-agent workflows, while MCP offers better coordination and security for multi-agent environments. Understanding these trade-offs helps you choose the right approach for your AI agent infrastructure.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

