OpenAI API Rate Limits Explained: RPM, TPM, and Usage Quotas

When I first started working with the OpenAI API, I hit a confusing wall of acronyms and limits. My API calls were failing with 429 errors, and the documentation mentioned RPM, TPM, TPD, and something about quotas. I was lost.

Was I hitting a rate limit? A usage quota? Both? And what’s the difference between RPM and TPM anyway?

After digging through the official docs and learning the hard way through trial and error, I finally understood how OpenAI’s rate limiting works. Let me break it down for you.

The Core Concept: Speed vs. Capacity

The most important distinction to understand is this:

Rate limits control SPEED. Usage quotas control CAPACITY.

Rate limits are like speed limits on a highway. They restrict how fast you can make requests or process tokens within a specific time window. Hitting a rate limit means you need to slow down or wait.

Usage quotas are like a fuel tank. They represent your total allowance for a billing period. Hitting your quota means you’ve used up your allocated resources and need to wait for the reset (usually monthly).
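
Both conditions surface as a 429, so code usually has to look inside the error body to tell them apart. The sketch below assumes the payload carries an error code of `rate_limit_exceeded` for a speed limit and `insufficient_quota` for an exhausted quota. The function name `classify_429` and its return labels are hypothetical, chosen only for illustration.

```python
def classify_429(error_body: dict) -> str:
    """Return 'retry' for a rate limit, 'stop' for an exhausted quota.

    Assumes a payload shaped like {"error": {"code": "...", "type": "..."}}.
    """
    error = error_body.get("error") or {}
    code = error.get("code") or error.get("type") or ""
    if code == "insufficient_quota":
        return "stop"    # capacity problem: retrying will not help
    if code == "rate_limit_exceeded":
        return "retry"   # speed problem: back off, then try again
    return "unknown"     # anything else: let the caller decide
```

If the result is "stop", waiting a few seconds will not help; the fix is on the billing side rather than in the retry loop.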

Understanding the Metrics

This article focuses on three of the rate limit metrics OpenAI uses (others, such as requests per day, follow the same pattern):

RPM (Requests Per Minute)

RPM measures the number of API calls you can make in a 60-second window. Each request counts as one, regardless of how many tokens it processes.

For example, if your RPM limit is 500, you can make 500 separate API calls within one minute. This is straightforward for simple applications, but it becomes tricky when you’re processing large batches or making concurrent requests.
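
One way to stay under an RPM limit with concurrent or batched work is to track request timestamps on the client side. The following is a minimal sketch of a sliding 60-second window; the class name `RequestWindow` and its methods are hypothetical, and the server's own accounting may differ from this model.

```python
import time
from collections import deque


class RequestWindow:
    """Client-side sliding window: at most `rpm` requests per 60 seconds."""

    def __init__(self, rpm: int, clock=time.monotonic):
        self.rpm = rpm
        self.clock = clock      # injectable so the class can be tested
        self.sent = deque()     # timestamps of requests still in the window

    def wait_time(self) -> float:
        """Seconds to wait before the next request fits in the window."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()  # drop requests older than 60 seconds
        if len(self.sent) < self.rpm:
            return 0.0
        return 60.0 - (now - self.sent[0])

    def record(self) -> None:
        """Call once for every request actually sent."""
        self.sent.append(self.clock())
```

A caller would sleep for `wait_time()` seconds, send the request, and then call `record()`.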

TPM (Tokens Per Minute)

TPM measures the total number of tokens processed per minute, including both input tokens (your prompt) and output tokens (the model’s response).

This is where things get interesting. A single request with a 4,000-token prompt could consume more of your TPM budget than 50 small requests combined. Each request's token count includes:

  • Your input tokens
  • The generated output tokens
  • Any tokens from the system prompt (these count as input)
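
To make the comparison concrete, here is a small worked example. The request sizes are assumptions picked for illustration, not measurements.

```python
def tokens_used(prompt_tokens: int, completion_tokens: int) -> int:
    """Tokens one request counts against TPM: input plus output."""
    return prompt_tokens + completion_tokens


# One large request: 4,000-token prompt, 500-token answer (assumed sizes).
large = tokens_used(4_000, 500)

# Fifty small requests: 40-token prompt, 30-token answer each (assumed sizes).
small_batch = 50 * tokens_used(40, 30)

print(large, small_batch)  # 4500 3500
```

The single large request uses one unit of RPM but more TPM than the whole batch of fifty.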

TPD (Tokens Per Day)

TPD is exactly what it sounds like: your total token allowance per day. Like the per-minute limits, it is tied to your API usage tier.

Here’s a comparison of these metrics:

Rate Limit Metrics Comparison
| Metric | Stands For | Time Window | What It Measures |
|--------|-----------|-------------|------------------|
| RPM | Requests Per Minute | 60 seconds | Number of API calls |
| TPM | Tokens Per Minute | 60 seconds | Total tokens processed |
| TPD | Tokens Per Day | 24 hours | Daily token allowance |

Subscription vs. API Rate Limits: A Common Confusion

This is where I got tripped up initially. There are actually two different types of rate limits:

  1. API Rate Limits: Technical constraints on your API access (RPM, TPM, TPD)
  2. Subscription Rate Limits: Usage allowances tied to your ChatGPT Plus/Pro subscription

A Reddit comment perfectly captured this confusion:

“Your link are API rate limits, which has nothing to do with what they meant with rate limits for subscriptions.”

Someone in the same Reddit thread asked about “TPD or tokens per day,” which shows how easily the two get mixed up. API rate limits apply to developers calling the API, while subscription limits apply to ChatGPT users on Plus or Pro plans.

Here’s how they differ:

API vs. Subscription Rate Limits
| Aspect | API Rate Limits | Subscription Limits |
|--------|----------------|---------------------|
| Applies to | API developers | ChatGPT web users |
| Measured in | RPM, TPM, TPD | Messages per time period |
| Tiers | Based on usage tier | Based on subscription plan |
| Reset | Rolling windows | Varies by plan |
| Visibility | Dashboard + headers | In-app messaging |

Practical Implications

When you hit a rate limit, the API returns a 429 status code. The API also reports your rate limit status in response headers, so you can read them on successful calls as well as on failed ones:

  • x-ratelimit-limit-requests: Your RPM limit
  • x-ratelimit-limit-tokens: Your TPM limit
  • x-ratelimit-remaining-requests: How many requests you have left
  • x-ratelimit-remaining-tokens: How many tokens you have left
  • x-ratelimit-reset-requests: Time until your request limit resets (a duration such as 1s)
  • x-ratelimit-reset-tokens: Time until your token limit resets (a duration such as 6m0s)
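
The headers listed above can be turned into numbers with a few lines of parsing. This sketch assumes the reset values are duration strings such as `1s`, `6m0s`, or `250ms`; the helper names `parse_reset` and `remaining_budget` are hypothetical.

```python
import re

_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")  # "ms" is tried before "m"


def parse_reset(value: str) -> float:
    """Convert a duration such as '6m0s' or '250ms' to seconds."""
    parts = _PART.findall(value)
    if not parts:
        raise ValueError(f"unrecognized duration: {value!r}")
    return sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def remaining_budget(headers: dict) -> dict:
    """Pull the remaining counts and reset delays out of response headers."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    return {
        "requests_left": int(h["x-ratelimit-remaining-requests"]),
        "tokens_left": int(h["x-ratelimit-remaining-tokens"]),
        "requests_reset_s": parse_reset(h["x-ratelimit-reset-requests"]),
        "tokens_reset_s": parse_reset(h["x-ratelimit-reset-tokens"]),
    }
```

Checking `tokens_left` before sending a large request lets you pause proactively instead of waiting for a 429.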

Handling Rate Limits in Code

Here’s a Python example showing how to implement exponential backoff when hitting rate limits:

Rate Limit Handling with Exponential Backoff
import logging
import time

# Written against the pre-1.0 openai SDK; openai.ChatCompletion and
# openai.error were removed in version 1.0 of the package.
import openai

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def call_openai_with_retry(
    messages: list,
    model: str = "gpt-4",
    max_retries: int = 5,
    initial_delay: float = 1.0
) -> dict:
    """
    Call OpenAI API with automatic retry on rate limit errors.

    Args:
        messages: List of message dicts for the chat completion
        model: Model identifier (e.g., "gpt-4", "gpt-3.5-turbo")
        max_retries: Maximum number of attempts before giving up
        initial_delay: Initial delay in seconds (doubles each retry)

    Returns:
        API response dict

    Raises:
        Exception: If all retries are exhausted
    """
    delay = initial_delay

    for attempt in range(max_retries):
        try:
            response = openai.ChatCompletion.create(
                model=model,
                messages=messages
            )
            return response
        except openai.error.RateLimitError as e:
            if attempt == max_retries - 1:
                logger.error(f"Max retries ({max_retries}) exceeded. Giving up.")
                raise

            # Prefer the server's retry-after hint when one is available.
            # Assumes the exception exposes the response headers as a
            # dict-like `headers` attribute; falls back to backoff otherwise.
            headers = getattr(e, "headers", None) or {}
            retry_after = headers.get("retry-after")
            if retry_after:
                wait_time = float(retry_after)
            else:
                wait_time = delay
                delay *= 2  # Exponential backoff

            logger.warning(
                f"Rate limit hit on attempt {attempt + 1}/{max_retries}. "
                f"Waiting {wait_time:.1f}s before retry..."
            )
            time.sleep(wait_time)
        except Exception as e:
            # Anything other than a rate limit error is not retried
            logger.error(f"Unexpected error: {e}")
            raise

    # Only reached when max_retries is zero or negative
    raise Exception("Failed to complete API call after retries")


# Example usage
if __name__ == "__main__":
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain rate limiting in one sentence."}
    ]
    try:
        result = call_openai_with_retry(messages)
        print(result.choices[0].message.content)
    except Exception as e:
        print(f"Failed: {e}")

This code implements exponential backoff, which is the recommended approach for handling rate limits. It waits progressively longer between retries, giving the rate limit window time to reset.
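
The function above doubles a fixed delay on each retry. A common refinement, which the function above does not use, is to add random jitter and a cap so that many clients do not all retry at the same instant. The sketch below shows one variant ("full jitter"); the name `backoff_delays` and its default values are assumptions for illustration.

```python
import random


def backoff_delays(initial=1.0, factor=2.0, retries=5, cap=60.0, rng=random.random):
    """Yield one wait time per retry: exponential growth, capped, with full jitter."""
    ceiling = initial
    for _ in range(retries):
        # Full jitter: pick a random point between 0 and the current ceiling
        yield rng() * min(ceiling, cap)
        ceiling *= factor


# With the randomness pinned to its upper bound, the ceilings become visible:
print(list(backoff_delays(rng=lambda: 1.0)))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

In a retry loop, each yielded value would be passed to `time.sleep` in place of the fixed doubling delay.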

Checking Your Current Limits

You can find your current rate limits in the OpenAI dashboard:

  1. Navigate to Settings → Limits
  2. View your usage tier and corresponding limits
  3. Check your current usage against limits

Your usage tier depends on your spending history and account age. Higher tiers unlock higher rate limits.

Sample Usage Tiers and Limits
| Tier | RPM | TPM | TPD |
|------|-----|-----|-----|
| Free | 3 | 40,000 | N/A |
| Tier 1 | 500 | 200,000 | 1M |
| Tier 2 | 5,000 | 2,000,000 | 10M |
| Tier 3+ | Higher | Higher | Higher |

Note: These are illustrative examples. Check the official documentation for current values.
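
The per-minute and per-day limits interact: sending at the full TPM rate can use up a daily allowance quickly. A rough calculation with the illustrative numbers from the table above (not real limits) shows how quickly; the helper name is hypothetical.

```python
def minutes_until_daily_cap(tpm: int, tpd: int) -> float:
    """Minutes of sending at the full TPM rate before TPD is used up."""
    return tpd / tpm


# Illustrative values from the sample table, not real limits.
print(minutes_until_daily_cap(200_000, 1_000_000))     # Tier 1 row: 5.0
print(minutes_until_daily_cap(2_000_000, 10_000_000))  # Tier 2 row: 5.0
```

With these sample values, a batch job running flat out would exhaust its daily tokens in about five minutes, so a daily limit can bind long before the day is over.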

Key Takeaways

  1. RPM limits how many requests you can make per minute
  2. TPM limits how many tokens you can process per minute
  3. TPD limits your daily token usage
  4. Rate limits (speed) are different from usage quotas (capacity)
  5. API rate limits differ from ChatGPT subscription limits
  6. Always implement retry logic with exponential backoff
  7. Monitor the response headers to understand your current limit status

When to Care About Each Metric

  • RPM matters when: You’re making many small requests (e.g., processing individual items in a queue)
  • TPM matters when: You’re processing large documents or generating long responses
  • TPD matters when: You’re running batch jobs that consume significant daily tokens

For most applications, TPM is the limiting factor. A single request with a long prompt or a long conversation history can consume thousands of tokens.
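
A quick way to see which limit binds for a given workload is to convert the token budget into requests per minute and compare it with RPM. This is a back-of-the-envelope sketch; the function name and the example numbers are assumptions.

```python
def binding_limit(rpm: int, tpm: int, avg_tokens_per_request: float) -> tuple:
    """Return (sustainable requests per minute, name of the limit that binds)."""
    by_tokens = tpm / avg_tokens_per_request  # requests/min the token budget allows
    if by_tokens < rpm:
        return by_tokens, "TPM"
    return float(rpm), "RPM"


print(binding_limit(500, 200_000, 2_000))  # (100.0, 'TPM')
print(binding_limit(500, 200_000, 100))    # (500.0, 'RPM')
```

With 2,000 tokens per request, the token budget allows only 100 requests per minute, well under the 500 RPM ceiling. With 100 tokens per request, RPM becomes the constraint instead.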

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to get in touch, you can contact me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
