CLI vs MCP Authentication: Why Multiple Agents Hit GitHub Rate Limits
Problem
When I tried running multiple AI agents to process GitHub repositories, I hit rate limits immediately.

    HTTP 403: API rate limit exceeded for user ID
    X-RateLimit-Limit: 5000
    X-RateLimit-Remaining: 0
    X-RateLimit-Reset: 1740933452

Each agent was using gh auth login separately, but they were all authenticated with my personal account. I could not understand why I was hitting limits so quickly when I was only making a few requests per minute.
Environment
- GitHub CLI v2.50.0
- Python 3.11 with multiple AI agent processes
- macOS Darwin 24.6.0
- Three concurrent agents processing different repos
What happened?
I was building a system where multiple AI agents needed to query GitHub repos simultaneously. I wanted to understand which approach was better: CLI tools or MCP (Model Context Protocol).
Here’s how I had each agent set up:

    # Each agent ran this independently
    gh auth login --web --scopes repo,read:org
    gh repo view owner/repo --json name,description
    gh api /repos/owner/repo/issues

The agents worked fine when I ran them one at a time. But when I ran three agents in parallel, they started failing with rate limit errors after just a few hundred total requests.
I thought GitHub’s rate limit was 5,000 requests per hour, so three agents making a few requests each should not be a problem.
But then I checked the rate limit headers more carefully:

    gh api rate_limit

The output showed that every agent was drawing down the same quota. The rate limit applied to my authenticated account, not to each agent.
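To make those headers easier to read, here is a minimal sketch that turns them into a short summary. The function name `summarize_rate_limit` and the `now` parameter are illustrative assumptions; the header names and values are the ones from the error shown earlier.

```python
def summarize_rate_limit(headers: dict, now: float) -> dict:
    """Summarize GitHub's X-RateLimit-* response headers.

    `now` is the current Unix time, passed in so the function is easy to test.
    """
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    reset_epoch = int(headers["X-RateLimit-Reset"])  # Unix time when the quota refills
    return {
        "used": limit - remaining,
        "remaining": remaining,
        "seconds_until_reset": max(0, reset_epoch - int(now)),
    }


# Headers from the 403 response above; "now" is set 10 minutes before the reset
headers = {
    "X-RateLimit-Limit": "5000",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1740933452",
}
summary = summarize_rate_limit(headers, now=1740933452 - 600)
print(summary)
```

Every agent that authenticates as the same account would see the same numbers here, which is what makes the shared quota visible.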
How I approached this
I tried to understand the fundamental difference between CLI and MCP authentication.
Understanding CLI Authentication
CLI authentication is straightforward. You authenticate once, and the CLI stores your credentials.

    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │   Agent 1   │    │   Agent 2   │    │   Agent 3   │
    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
           │                  │                  │
           └──────────────────┴──────────────────┘
                              │
                              ▼
                       ┌─────────────┐
                       │ Your Token  │  (5,000/hr limit)
                       └─────────────┘

Each agent independently accesses your stored credentials. From GitHub’s perspective, all requests come from the same authenticated identity. The rate limit applies to that identity.
The problem is that CLI authentication does not have any awareness of how many agents are using it. It simply provides credentials to whoever asks.
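The effect can be illustrated with a toy simulation. This is only a sketch: `SharedQuota` and `run_agents` are hypothetical names, the quota is modeled as a simple counter, and the per-agent request counts are made up for illustration.

```python
class SharedQuota:
    """Stands in for the single hourly budget GitHub tracks for one identity."""

    def __init__(self, limit: int):
        self.remaining = limit

    def consume(self) -> bool:
        """Return True if the request fits in the budget, False if rate limited."""
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return True


def run_agents(quota: SharedQuota, requests_per_agent: dict) -> dict:
    """Interleave requests from agents that know nothing about each other."""
    pending = dict(requests_per_agent)
    rejected = {name: 0 for name in requests_per_agent}
    while any(pending.values()):
        for name in pending:
            if pending[name] == 0:
                continue
            pending[name] -= 1
            if not quota.consume():
                rejected[name] += 1
    return rejected


quota = SharedQuota(limit=5000)
# Each agent stays well under 5,000 on its own, but together they ask for 6,000
rejected = run_agents(quota, {"agent-1": 2000, "agent-2": 2000, "agent-3": 2000})
print(rejected)
```

No single agent exceeds the limit, yet 1,000 requests are rejected in total because nothing coordinates the shared budget.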
Understanding MCP Authentication
MCP works differently. It uses a server-client model where authentication happens at a central point.

    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │   Agent 1   │    │   Agent 2   │    │   Agent 3   │
    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
           │                  │                  │
           └──────────────────┴──────────────────┘
                              │
                              ▼
                       ┌─────────────┐
                       │ MCP Server  │
                       └──────┬──────┘
                              │
                              ▼
                       ┌─────────────┐
                       │   Central   │
                       │    Auth     │
                       └─────────────┘

The MCP server handles all API calls. It can manage rate limiting centrally, queue requests, and ensure fair distribution among all agents.
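As a rough illustration of what "fair distribution" could look like inside such a server, here is a minimal sketch of a round-robin dispatcher. `FairBroker` is a hypothetical name, and this is one possible policy a server author might implement, not something the MCP specification itself prescribes.

```python
from collections import deque


class FairBroker:
    """Hypothetical central dispatcher: one queue per agent, served round-robin."""

    def __init__(self, budget: int):
        self.budget = budget  # requests the server is still willing to send
        self.queues = {}      # agent name -> deque of pending requests

    def submit(self, agent: str, request: str) -> None:
        self.queues.setdefault(agent, deque()).append(request)

    def drain(self) -> list:
        """Return (agent, request) pairs in dispatch order until budget or work runs out."""
        dispatched = []
        while self.budget > 0 and any(self.queues.values()):
            for agent, pending in self.queues.items():
                if pending and self.budget > 0:
                    dispatched.append((agent, pending.popleft()))
                    self.budget -= 1
        return dispatched


broker = FairBroker(budget=4)
for i in range(4):
    broker.submit("agent-1", f"issues-page-{i}")  # a chatty agent
broker.submit("agent-2", "repo-view")
broker.submit("agent-3", "repo-view")
order = broker.drain()
print(order)
```

Even though agent-1 submitted first and most, the other two agents each get a turn before its second request is sent.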
Comparing the approaches
| Aspect | CLI Authentication | MCP Authentication |
|---|---|---|
| Setup complexity | Simple (one command) | Requires server setup |
| Multi-agent support | Agents drain one shared quota without coordination | Central quota management |
| Rate limiting | Per authenticated user, unaware of agents | Server can control and queue |
| Credential management | Stored per environment | Centralized |
| Learning curve | None (standard CLI) | Requires MCP knowledge |
The solution
I had a few options:
1. Use separate tokens per agent - This only helps if each token belongs to a different identity (for example, a separate machine account or a GitHub App installation), because Personal Access Tokens created under one account all share that account's limit. Managing those extra credentials is tedious and insecure if not done carefully.
2. Implement request queuing - Build a queue system that serializes requests from all agents through a single authenticated client.
3. Switch to MCP - Use MCP’s server model to handle authentication and rate limiting centrally.
I started by trying option 2, creating a simple queue system:

    import time
    from queue import Queue
    from threading import Thread


    class GitHubRequestQueue:
        def __init__(self, token):
            self.queue = Queue()
            self.token = token
            self.running = False

        def submit(self, request_func):
            # The returned dict is filled with 'result' or 'error' once the worker has run the request
            future = {}
            self.queue.put((request_func, future))
            return future

        def worker(self):
            while self.running or not self.queue.empty():
                request_func, future = self.queue.get()
                try:
                    # Rate limiting: 1 request per second
                    time.sleep(1)
                    result = request_func(self.token)
                    future['result'] = result
                except Exception as e:
                    future['error'] = e
                finally:
                    self.queue.task_done()

        def start(self):
            self.running = True
            Thread(target=self.worker, daemon=True).start()

This worked better. All agents submitted requests to the queue, and a single worker thread executed them serially with built-in delays to stay within rate limits.
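A quick back-of-the-envelope check on the one-second delay, assuming the 5,000 requests per hour figure from the headers earlier. The helper names are illustrative.

```python
SECONDS_PER_HOUR = 3600


def min_interval_seconds(limit_per_hour: int) -> float:
    """Smallest even spacing between requests that stays within the hourly limit."""
    return SECONDS_PER_HOUR / limit_per_hour


def requests_per_hour(interval_seconds: float) -> float:
    """Upper bound on throughput for one worker that waits this long before each request."""
    return SECONDS_PER_HOUR / interval_seconds


print(min_interval_seconds(5000))  # 0.72
print(requests_per_hour(1.0))      # 3600.0
```

So a single worker sleeping one second per request sends at most 3,600 requests per hour, which leaves headroom under the 5,000 limit. The delay could drop to about 0.72 seconds before the budget would be at risk.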
But then I considered MCP. An MCP server is a natural home for this pattern: because every agent reaches GitHub through the server, the server can take on the complexity of authentication, rate limiting, and request management.
The reason
The key reason CLI hits rate limits with multiple agents is that CLI authentication was designed for human use, not multi-agent systems.
When I run gh auth login, I am establishing my identity with GitHub. The CLI stores my credentials and uses them for all my requests. GitHub sees all my requests as coming from one authenticated user and applies rate limits accordingly.
Multiple agents using CLI authentication all appear as the same user from GitHub’s perspective. The rate limit is shared, but there is no coordination between agents about quota usage.
MCP solves this by introducing an intermediary layer. The MCP server becomes the single authenticated identity, and it can intelligently manage how multiple agents use that identity.
This is similar to how database connection pools work. Rather than each application opening its own connection, they all share a pool managed centrally.
Security considerations
CLI security risks
When using CLI authentication with multiple agents, I found several security concerns:
- Token proliferation: If each agent needs its own token, I have to manage many secrets
- Credential storage: The CLI prefers the system credential store, but it can fall back to storing the token in a plain-text file when no secure store is available
- Log exposure: If agents log request details, tokens can leak into logs
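One way to reduce the log exposure risk is to redact anything that looks like a token before it is written. The following is a minimal sketch using Python's standard `logging` module; `RedactTokens` and `TOKEN_PATTERN` are hypothetical names, and the pattern assumes the common GitHub token prefixes (`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_`, `github_pat_`), so it may not catch every credential format.

```python
import logging
import re

# Assumption: tokens start with one of GitHub's known prefixes
TOKEN_PATTERN = re.compile(
    r"\b(?:gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})"
)


class RedactTokens(logging.Filter):
    """Replace token-like strings in the final log message with a placeholder."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so tokens passed as arguments are caught too
        record.msg = TOKEN_PATTERN.sub("[REDACTED]", record.getMessage())
        record.args = ()
        return True  # keep the record, just with the secret removed


logger = logging.getLogger("agent")
handler = logging.StreamHandler()
handler.addFilter(RedactTokens())
logger.addHandler(handler)

# A fake token for demonstration only
logger.warning("request failed, token=%s", "ghp_" + "x" * 36)
```

Attaching the filter to the handler means every record that handler emits is scrubbed, regardless of which agent produced it.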
MCP security advantages
MCP offers better security for multi-agent scenarios:
- Centralized credential management: Only the MCP server needs credentials
- Least privilege: The MCP server can enforce what each agent can access
- Better audit trails: All requests go through one point, making logging easier
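To make the least-privilege point concrete, here is a small sketch of a per-agent allowlist that a central server could check before forwarding a request. The agent names, the `acme` organization, and the `POLICY` structure are all hypothetical; a real server would likely need a stricter matching scheme.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: which "METHOD path" patterns each agent may call
POLICY = {
    "triage-agent": ["GET /repos/acme/*/issues"],
    "release-agent": ["GET /repos/acme/web/releases", "POST /repos/acme/web/releases"],
}


def is_allowed(agent: str, method: str, path: str) -> bool:
    """Return True only if the agent has a pattern matching this request."""
    request = f"{method.upper()} {path}"
    # Note: fnmatch's "*" also matches "/", so keep patterns narrow
    return any(fnmatchcase(request, pattern) for pattern in POLICY.get(agent, []))


print(is_allowed("triage-agent", "GET", "/repos/acme/web/issues"))   # True
print(is_allowed("triage-agent", "POST", "/repos/acme/web/issues"))  # False
print(is_allowed("unknown-agent", "GET", "/repos/acme/web/issues"))  # False
```

Because the check runs in one place, each decision can also be logged there, which is where the audit trail benefit comes from.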
Here is a comparison of the security posture:

    CLI Approach:
    Agent 1 ──┐
    Agent 2 ──┼──> Direct to GitHub API (each with credentials)
    Agent 3 ──┘

    MCP Approach:
    Agent 1 ──┐
    Agent 2 ──┼──> MCP Server ──> GitHub API (single auth)
    Agent 3 ──┘

When to use which approach
Based on my experience, here is when each approach makes sense:
Use CLI when:
- Running a single agent or process
- Doing quick prototyping or development
- Working in isolated environments
- You need direct, low-level control
- The workflow is human-driven
Use MCP when:
- Running multiple agents concurrently
- You need centralized rate limiting
- Security and audit trails are important
- You are building production systems
- You need to manage complex authentication scenarios
Implementation examples
CLI authentication setup

    # Authenticate with GitHub
    gh auth login --web --scopes repo,read:org

    # Verify authentication
    gh auth status

    # Use token in environment variable
    export GH_TOKEN=$(gh auth token)

MCP-style queue approach
If you cannot use MCP but need similar behavior, you can implement a simple request queue:

    import asyncio
    import time

    import aiohttp


    class MCPStyleRequestQueue:
        def __init__(self, token: str, max_concurrent: int = 5):
            self.token = token
            self.semaphore = asyncio.Semaphore(max_concurrent)
            self.session = None  # created lazily, inside the running event loop

        async def request(self, url: str, max_retries: int = 3, **kwargs) -> dict:
            if self.session is None:
                self.session = aiohttp.ClientSession()
            headers = {**kwargs.pop('headers', {}), 'Authorization': f'token {self.token}'}
            for _ in range(max_retries + 1):
                async with self.semaphore:
                    async with self.session.get(url, headers=headers, **kwargs) as resp:
                        # An exhausted quota shows up as 403 or 429 with X-RateLimit-Remaining: 0
                        limited = (resp.status in (403, 429)
                                   and resp.headers.get('X-RateLimit-Remaining') == '0')
                        if not limited:
                            return await resp.json()
                        reset_time = int(resp.headers.get('X-RateLimit-Reset', time.time() + 60))
                # Rate limited - back off outside the semaphore so other callers are not blocked
                await asyncio.sleep(max(0, reset_time - int(time.time())))
            raise RuntimeError(f'Still rate limited after {max_retries} retries: {url}')

        async def close(self):
            if self.session is not None:
                await self.session.close()

This mimics MCP’s centralized authentication while still using the CLI token.
Best practices
Regardless of which approach you choose, I learned these are important:
- Never hardcode tokens in code
- Use environment variables for all credentials
- Implement proper error handling for rate limits
- Monitor authentication events
- Follow the principle of least privilege
- Rotate tokens regularly
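As a small example of the first two points, here is a sketch that reads the token from the environment and fails loudly when it is missing. `load_github_token` is a hypothetical helper; `GH_TOKEN` and `GITHUB_TOKEN` are the variable names the GitHub CLI itself recognizes.

```python
import os


def load_github_token(environ=os.environ) -> str:
    """Read the token from the environment instead of hardcoding it."""
    token = environ.get("GH_TOKEN") or environ.get("GITHUB_TOKEN")
    if not token:
        # Fail early with a clear message rather than sending unauthenticated requests
        raise RuntimeError("Set GH_TOKEN or GITHUB_TOKEN before starting the agent")
    return token


# Example with an explicit mapping; in real use, call load_github_token() with no arguments
print(load_github_token({"GH_TOKEN": "example-value"}))
```

Passing the mapping as a parameter keeps the function easy to test without touching the real environment.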
Summary
In this post, I explored the authentication trade-offs between CLI and MCP for AI agents. The key point is that CLI authentication is designed for human use and does not handle multi-agent scenarios well, leading to rate limit issues. MCP’s server model provides centralized authentication and rate limiting that is better suited for multi-agent systems.
The choice between CLI and MCP depends on your use case: CLI is simple and direct for single-agent workflows, while MCP offers better coordination and security for multi-agent environments. Understanding these trade-offs helps you choose the right approach for your AI agent infrastructure.