Overview
The token bucket algorithm limits requests by continuously adding tokens to a bucket at a fixed rate. Each request consumes tokens, and a request is denied when not enough tokens are available. The rate is the number of tokens added per period, and the capacity is the maximum number of tokens that can accumulate.
How It Works
Continuous Token Addition
Unlike fixed window, tokens are added continuously rather than in bulk:
- Tokens are added at a rate of rate / period tokens per millisecond
- The bucket can hold up to capacity tokens (defaults to rate)
- When a request arrives, the system:
  - Calculates how many tokens have been added since the last update
  - Adds those tokens to the current value (up to capacity)
  - Attempts to consume the requested number of tokens
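The three steps above can be sketched as a small in-memory class. This is a hypothetical illustration, not the code in src/shared.ts: `TokenBucketSketch` and `tryConsume` are invented names, timestamps are passed in explicitly (in milliseconds) so the behavior is deterministic, the bucket is assumed to start full, and state lives in memory rather than in a database.

```typescript
// Hypothetical in-memory token bucket; not the library's implementation.
class TokenBucketSketch {
  private value: number; // tokens currently in the bucket
  private ts: number; // time of the last update, in ms

  constructor(
    private readonly rate: number, // tokens added per period
    private readonly period: number, // period length in ms
    private readonly capacity: number = rate, // max tokens; defaults to rate
    start: number = 0,
  ) {
    this.value = capacity; // assumption: a new bucket starts full
    this.ts = start;
  }

  tryConsume(count: number, now: number): boolean {
    // 1. Tokens added since the last update (rate / period tokens per ms).
    const added = ((now - this.ts) * this.rate) / this.period;
    // 2. Add them to the current value, capped at capacity.
    this.value = Math.min(this.value + added, this.capacity);
    this.ts = now;
    // 3. Consume the requested tokens only if enough are available.
    if (this.value < count) return false;
    this.value -= count;
    return true;
  }
}

// 10 tokens per second; capacity defaults to 10.
const bucket = new TokenBucketSketch(10, 1000);
let allowed = 0;
for (let i = 0; i < 11; i++) if (bucket.tryConsume(1, 0)) allowed++;
console.log(allowed); // 10: the 11th request at t=0 is denied
console.log(bucket.tryConsume(1, 100)); // true: 100 ms adds 1 token
```

Because the refill is computed from elapsed time on each call, no background timer is needed to "drip" tokens into the bucket.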
Capacity and Rollover
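The capacity parameter controls how many tokens can accumulate:
- When capacity equals rate: Unused tokens roll over for at most one period, so a burst can use no more than one period's worth of tokens
- When capacity exceeds rate: Tokens keep accumulating beyond one period, so a user who has been inactive can burst up to capacity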
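To see how capacity changes the burst a user can make, the sketch below computes how many one-token requests fit into an instantaneous burst after a period of inactivity. `burstSize` and `MINUTE_MS` are hypothetical names; the helper assumes the bucket was emptied at the start of the idle time and refills at rate / period tokens per millisecond up to capacity.

```typescript
// Hypothetical helper: how many 1-token requests can be served at once
// after `idleMs` of inactivity, starting from an empty bucket.
function burstSize(
  rate: number, // tokens added per period
  period: number, // period length in ms
  capacity: number, // max tokens the bucket can hold
  idleMs: number, // how long the user has been inactive
): number {
  const tokens = Math.min(capacity, (idleMs * rate) / period);
  return Math.floor(tokens);
}

const MINUTE_MS = 60_000;
// 10 per minute with capacity == rate: idle time beyond one minute is wasted.
console.log(burstSize(10, MINUTE_MS, 10, 5 * MINUTE_MS)); // 10
// Same rate with capacity 30: five idle minutes allow a burst of 30.
console.log(burstSize(10, MINUTE_MS, 30, 5 * MINUTE_MS)); // 30
// After only one idle minute, both configurations allow the same burst.
console.log(burstSize(10, MINUTE_MS, 30, MINUTE_MS)); // 10
```

The sustained rate is identical in both configurations; capacity only decides how much saved-up allowance an inactive user can spend at once.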
Visual Explanation
Tokens are added continuously based on elapsed time, not in discrete intervals, so the number of available tokens grows smoothly instead of jumping at period boundaries.
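The difference from discrete accrual can be made concrete by comparing how many tokens are available at a few points in time. This is a hypothetical comparison: `continuousTokens` follows the token bucket refill described on this page, while `windowTokens` is a simplified stand-in for a fixed window that grants the full rate when the next window starts. Both limiters are assumed to have been fully drained at time 0.

```typescript
// Hypothetical comparison of token availability, 10 tokens per 1000 ms.
// Both limiters are assumed to have been fully drained at t = 0.
const RATE = 10;
const PERIOD_MS = 1000;

// Token bucket: tokens grow linearly with elapsed time, capped at capacity.
function continuousTokens(t: number, capacity = RATE): number {
  return Math.min(capacity, (t * RATE) / PERIOD_MS);
}

// Simplified fixed window: nothing until the next window starts at
// t = PERIOD_MS, then the full rate at once.
function windowTokens(t: number): number {
  return t >= PERIOD_MS ? RATE : 0;
}

for (const t of [0, 250, 500, 750, 1000, 1500]) {
  console.log(`t=${t}ms bucket=${continuousTokens(t)} window=${windowTokens(t)}`);
}
// t=0ms bucket=0 window=0
// t=250ms bucket=2.5 window=0
// t=500ms bucket=5 window=0
// t=750ms bucket=7.5 window=0
// t=1000ms bucket=10 window=10
// t=1500ms bucket=10 window=10
```

Partial tokens accumulate between whole ones, so a one-token request becomes possible 100 ms after draining rather than only at the next window boundary.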
Configuration
Type Definition
From src/shared.ts:
Parameters
rate (required)
The number of tokens added per period.
period (required)
The time period in milliseconds. Use the provided constants:
capacity (optional)
Maximum tokens that can accumulate. Defaults to rate.
maxReserved (optional)
Maximum tokens that can be reserved into the future.
shards (optional)
Number of shards for high-throughput scenarios. See Scaling with Shards.
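The parameters above can be summarized as a TypeScript type with defaults applied. This is a hypothetical sketch, not the type definition in src/shared.ts: `TokenBucketOptions`, `resolveOptions`, the validation rule, and the default of a single shard are assumptions made for illustration. Only "capacity defaults to rate" comes from this page.

```typescript
// Hypothetical option shape mirroring the parameters described above.
interface TokenBucketOptions {
  rate: number; // tokens added per period (required)
  period: number; // period length in milliseconds (required)
  capacity?: number; // max accumulated tokens; defaults to rate
  maxReserved?: number; // max tokens reservable into the future
  shards?: number; // number of shards for high throughput
}

type ResolvedOptions = Required<Omit<TokenBucketOptions, "maxReserved">> & {
  maxReserved?: number;
};

// Applies defaults and rejects obviously invalid values (assumed rules).
function resolveOptions(opts: TokenBucketOptions): ResolvedOptions {
  if (!(opts.rate > 0) || !(opts.period > 0)) {
    throw new Error("rate and period must be positive");
  }
  return {
    rate: opts.rate,
    period: opts.period,
    capacity: opts.capacity ?? opts.rate, // documented default
    maxReserved: opts.maxReserved, // undefined: no reservation limit set
    shards: opts.shards ?? 1, // assumed default: a single shard
  };
}

const resolved = resolveOptions({ rate: 10, period: 60_000 });
console.log(resolved.capacity); // 10
```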
Real Code Examples
Basic Message Rate Limiting
LLM API Rate Limiting
Failed Login Attempts
Implementation Details
The token bucket calculation in src/shared.ts works with these quantities:
- elapsed: Time since the last update, in milliseconds
- rate: Tokens per millisecond (rate / period)
- New value: Previous value + (elapsed × rate), capped at capacity
- retryAfter: The time needed to accumulate the missing tokens, i.e. missing tokens ÷ rate
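Putting those quantities together, one evaluation step can be written as a pure function. This is a hedged sketch rather than the code in src/shared.ts: `evaluate`, its argument order, and the choice to return the previous state unchanged on denial are assumptions. It refills the bucket, subtracts the requested count, and, when the result would be negative, reports retryAfter as the missing tokens divided by the per-millisecond rate.

```typescript
interface BucketState {
  value: number; // tokens available at time ts
  ts: number; // last update time in ms
}

interface Evaluation {
  ok: boolean;
  state: BucketState; // state to persist after this step
  retryAfter?: number; // ms until enough tokens accumulate, when denied
}

// Hypothetical single evaluation step of a token bucket.
function evaluate(
  prev: BucketState,
  now: number,
  count: number, // tokens requested
  rate: number, // tokens per period
  period: number, // period in ms
  capacity: number = rate,
): Evaluation {
  const elapsed = now - prev.ts; // ms since the last update
  // New value: previous + elapsed * (rate / period), capped at capacity.
  const refilled = Math.min(prev.value + (elapsed * rate) / period, capacity);
  const value = refilled - count;
  if (value >= 0) {
    return { ok: true, state: { value, ts: now } };
  }
  // Missing tokens ÷ (rate / period), written as missing * period / rate.
  return { ok: false, state: prev, retryAfter: (-value * period) / rate };
}

// 10 tokens per minute, bucket empty at t = 0: 3 tokens take 18 seconds.
const result = evaluate({ value: 0, ts: 0 }, 0, 3, 10, 60_000);
console.log(result.ok, result.retryAfter); // false 18000
```

Since the refill is linear, retryAfter is exact: retrying at t = 18 000 ms with the same request succeeds.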
Use Cases
Smooth Rate Limiting
Token bucket is ideal when you want the smoothest possible rate limiting without sudden boundaries.
LLM and Streaming APIs
Many LLM APIs (OpenAI, Anthropic) use token-based limits.
User Actions with Burst Allowance
Allow users to perform bursts of actions if they've been inactive.
Advantages
- Smoothest rate limiting: Tokens added continuously, not in chunks
- Flexible burst handling: Capacity can be tuned independently of rate
- Fair to bursty traffic: Inactive users accumulate tokens for later use
- Works well with reservations: Can calculate exact retry times
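Because the refill is linear, a limiter can also accept a request it cannot serve yet and tell the caller exactly when to proceed. The sketch below illustrates that idea with a hypothetical `reserve` function: it lets the token count go negative by at most maxReserved and returns the wait until the deficit is refilled. Treat it as an illustration of how maxReserved could bound reservations, not as the library's reservation API.

```typescript
interface State {
  value: number; // may go negative while tokens are reserved
  ts: number; // last update time in ms
}

// Hypothetical reservation step: consume now, run the work later.
function reserve(
  prev: State,
  now: number,
  count: number, // tokens requested
  rate: number, // tokens per period
  period: number, // period in ms
  capacity: number,
  maxReserved: number, // how far below zero the count may go
): { ok: boolean; state: State; retryAfter: number } {
  const refilled = Math.min(
    prev.value + ((now - prev.ts) * rate) / period,
    capacity,
  );
  const value = refilled - count;
  if (value < -maxReserved) {
    // Too deep into the future: reject, keep the old state, and report
    // when the deficit would fit within maxReserved.
    const excess = -value - maxReserved;
    return { ok: false, state: prev, retryAfter: (excess * period) / rate };
  }
  // Reserved: the caller should wait until the deficit (if any) refills.
  const retryAfter = value < 0 ? (-value * period) / rate : 0;
  return { ok: true, state: { value, ts: now }, retryAfter };
}

// 10 tokens per second, capacity 10, up to 5 tokens reservable.
let s: State = { value: 0, ts: 0 };
const r1 = reserve(s, 0, 3, 10, 1000, 10, 5);
console.log(r1.ok, r1.retryAfter); // true 300
s = r1.state;
const r2 = reserve(s, 0, 3, 10, 1000, 10, 5);
console.log(r2.ok, r2.retryAfter); // false 100
```

The first request is reserved and should run 300 ms later; the second would push the count to -6, past the limit of -5, so it is rejected with the 100 ms needed to refill the one excess token.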
Limitations
- No predictable resets: Tokens are always being added, no fixed “reset time”
- Less intuitive: “10 per minute” doesn’t mean “10 at the start of each minute”
- Requires calculation: Token count must be calculated on every check
Next Steps
Fixed Window
Learn about the alternative fixed window strategy
Basic Usage
Start using token bucket rate limiting
Scaling with Shards
Handle high throughput scenarios
Reservations
Reserve capacity ahead of time