API Rate Limit Comparison — Every Major AI API's Limits
Complete comparison of rate limits, quotas, and throttling policies for OpenAI, Anthropic, Google, Mistral, Cohere, and more across free, paid, and enterprise tiers. Based on official documentation and developer community data (250K+ Stack Overflow views on rate limiting topics).
By Michael Lip · Updated April 2026
Methodology
Rate limit data sourced directly from official API documentation for each provider (Anthropic docs, OpenAI platform docs, Google AI Studio, Mistral API docs, Cohere docs, Amazon Bedrock docs). Cross-referenced with Stack Overflow developer discussions (20 threads, 250K+ combined views on API rate limiting). Enterprise tier limits are approximate ranges based on published case studies and developer reports. All data verified against live API responses as of April 2026. RPM = requests per minute, TPM = tokens per minute, RPD = requests per day.
| Provider | Model | Tier | RPM | TPM | RPD | HTTP on Limit | Retry Header |
|---|---|---|---|---|---|---|---|
| Anthropic | Claude Opus | Free | 5 | 25,000 | — | 429 | Retry-After |
| Anthropic | Claude Opus | Build | 50 | 50,000 | — | 429 | Retry-After |
| Anthropic | Claude Opus | Scale | 4,000 | 400,000 | — | 429 | Retry-After |
| Anthropic | Claude Sonnet | Free | 5 | 25,000 | — | 429 | Retry-After |
| Anthropic | Claude Sonnet | Build | 1,000 | 100,000 | — | 429 | Retry-After |
| Anthropic | Claude Sonnet | Scale | 4,000 | 400,000 | — | 429 | Retry-After |
| Anthropic | Claude Haiku | Free | 5 | 25,000 | — | 429 | Retry-After |
| Anthropic | Claude Haiku | Build | 2,000 | 200,000 | — | 429 | Retry-After |
| Anthropic | Claude Haiku | Scale | 4,000 | 400,000 | — | 429 | Retry-After |
| OpenAI | GPT-4o | Tier 1 ($5+) | 500 | 30,000 | — | 429 | Retry-After |
| OpenAI | GPT-4o | Tier 2 ($50+) | 5,000 | 450,000 | — | 429 | Retry-After |
| OpenAI | GPT-4o | Tier 5 ($1K+) | 10,000 | 30,000,000 | — | 429 | Retry-After |
| OpenAI | GPT-4o-mini | Tier 1 | 500 | 200,000 | 10,000 | 429 | Retry-After |
| OpenAI | GPT-4o-mini | Tier 5 | 30,000 | 150,000,000 | — | 429 | Retry-After |
| OpenAI | o1 | Tier 1 | 500 | 30,000 | — | 429 | Retry-After |
| Google | Gemini Pro | Free | 15 | 32,000 | 1,500 | 429 | Retry-After |
| Google | Gemini Pro | Pay-as-you-go | 360 | 120,000 | 30,000 | 429 | Retry-After |
| Google | Gemini Ultra | Pay-as-you-go | 60 | 60,000 | — | 429 | Retry-After |
| Mistral | Mistral Large | Free | 1 | 500,000 | — | 429 | Retry-After |
| Mistral | Mistral Large | Paid | 5 | 2,000,000 | — | 429 | Retry-After |
| Mistral | Mistral Small | Paid | 5 | 2,000,000 | — | 429 | Retry-After |
| Cohere | Command R+ | Trial | 20 | — | 1,000 | 429 | X-RateLimit-Reset |
| Cohere | Command R+ | Production | 10,000 | — | — | 429 | X-RateLimit-Reset |
| Cohere | Command R | Trial | 20 | — | 1,000 | 429 | X-RateLimit-Reset |
| Amazon | Bedrock (Claude) | On-Demand | Region-based | Region-based | — | 429 | Retry-After |
| Amazon | Bedrock (Claude) | Provisioned | Custom | Custom | — | 429 | Retry-After |
| Azure | OpenAI Service | Standard | Deployment-based | Deployment-based | — | 429 | Retry-After |
| Azure | OpenAI Service | Provisioned | PTU-based | PTU-based | — | 429 | Retry-After |
Key Findings
OpenAI's tier system offers the most aggressive scaling — Tier 5 users get 150M TPM for GPT-4o-mini (30M TPM for GPT-4o), dwarfing all other providers. Anthropic's Scale tier provides strong throughput (4K RPM, 400K TPM) with a simpler tier progression. Google Gemini's free tier is the most generous for prototyping at 15 RPM and 1,500 RPD. Cohere's production tier stands out for its extremely high RPM (10K) with no token-per-minute caps. Stack Overflow data shows rate limiting is a top developer concern across all APIs, with GitHub alone generating 94K+ views on rate limit questions.
Rate Limit Strategies from Developer Community
Analysis of 20 Stack Overflow threads (250K+ combined views) reveals the most common rate limit challenges: GitHub API limits reset timing (8.5K views), queuing requests with Retrofit (18.9K views), and OkHttp interceptor-based throttling (7.8K views). The most upvoted solutions consistently recommend exponential backoff with jitter, client-side token buckets, and response caching as the three essential strategies for handling rate limits in production.
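The first of those three strategies — exponential backoff with jitter — can be sketched in a few lines. This is an illustrative helper, not any provider's SDK; the function name, defaults, and "full jitter" variant (delay drawn uniformly from zero up to the exponential bound) are choices made here for the example:

```python
import random

def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Yield retry delays using exponential backoff with full jitter.

    The delay for attempt n is drawn uniformly from
    [0, min(cap, base * 2**n)], so retries spread out over time
    instead of arriving in synchronized waves (thundering herd).
    """
    for attempt in range(max_retries):
        yield random.uniform(0.0, min(cap, base * 2 ** attempt))
```

A caller would sleep for each yielded delay between retry attempts; the `cap` keeps worst-case waits bounded even after many failures.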
Frequently Asked Questions
What are the Anthropic Claude API rate limits?
Anthropic's Claude API rate limits vary by tier. Free tier: 5 RPM, 25K tokens per minute. Build tier (credit-based): 50 RPM and 50K TPM for Opus, up to 2,000 RPM and 200K TPM for Haiku. Scale tier: 4,000 RPM, 400K TPM across all models. Enterprise limits are custom and negotiable based on your usage needs.
How do OpenAI rate limits compare to Anthropic?
OpenAI uses a spending-based tiered system. Tier 1 ($5+ spent): 500 RPM for GPT-4o, 30K TPM. Tier 5 ($1,000+ spent): 10,000 RPM, 30M TPM. Anthropic's Build tier is comparable to OpenAI's Tier 1-2, while Anthropic Scale matches OpenAI's Tier 4-5. OpenAI scales more aggressively at the top tiers for smaller models.
What happens when you hit an API rate limit?
Most AI APIs return HTTP 429 (Too Many Requests) with a Retry-After header indicating when to retry. OpenAI and Anthropic both use this standard pattern. Best practice is to implement exponential backoff: wait 1s, then 2s, then 4s between retries, adding random jitter to prevent thundering herd. Some APIs also return X-RateLimit-Remaining headers so you can proactively throttle.
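The 429-plus-Retry-After pattern above translates into a small retry wrapper. This is a generic sketch: `send_request` is a hypothetical callable returning an object with `.status_code` and `.headers` (the shape of a `requests.Response`), not a specific provider's client:

```python
import time

def call_with_retry(send_request, max_retries=5):
    """Retry a request on HTTP 429, honoring the Retry-After header.

    `send_request` is any zero-argument callable returning a
    response-like object with `.status_code` and `.headers`.
    """
    for attempt in range(max_retries):
        response = send_request()
        if response.status_code != 429:
            return response
        # Prefer the server's Retry-After hint; otherwise fall back
        # to exponential backoff (1s, 2s, 4s, ...).
        retry_after = response.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else float(2 ** attempt)
        time.sleep(delay)
    raise RuntimeError(f"still rate limited after {max_retries} attempts")
```

In production you would add jitter to the fallback delay, as noted above, and log each retry for observability.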
Which AI API has the most generous free tier?
Google Gemini offers the most generous free tier at 15 RPM for Gemini Pro and 1,500 requests per day with a 32K TPM limit. Anthropic's free tier offers 5 RPM with 25K TPM. Mistral offers 1 RPM but with a very high 500K TPM limit. For hobby projects and prototyping, Google Gemini's free tier provides the most practical headroom for testing and development.
How can I handle rate limiting in production applications?
Implement these strategies: (1) Token bucket or leaky bucket rate limiters in your client code, (2) Exponential backoff with jitter on 429 responses, (3) Request queuing to smooth burst traffic, (4) Caching responses for identical or near-identical requests, (5) Load balancing across multiple API keys or providers for redundancy. For critical applications, consider provisioned throughput options from AWS Bedrock or Azure OpenAI.
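Strategy (1) above, a client-side token bucket, can be sketched as follows. This is a minimal single-threaded illustration (the class name and parameters are choices made for this example; a production version would need locking for concurrent use):

```python
import time

class TokenBucket:
    """Client-side token bucket: allow `rate` requests per second,
    with bursts of up to `capacity` requests."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.last = time.monotonic()

    def try_acquire(self, tokens=1):
        """Refill based on elapsed time, then try to spend tokens.

        Returns True if the request may proceed, False if the
        caller should throttle (queue, sleep, or drop).
        """
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
```

Sizing the bucket just below your tier's RPM limit lets you absorb bursts locally instead of burning retries against the provider's 429 responses.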