Global rate limiting
LiteLLM allows you to apply requests per minute (RPM) and tokens per minute (TPM) limits globally across all users, teams, and models through the LiteLLM configuration file. These global limits ensure that traffic is controlled across all requests, regardless of individual user or team limits.
Note
You do not need to configure a database URL or use the LLM master key to apply this configuration, making it simpler for deployments where per-user tracking is not required.
Sample configuration
model_list:
- model_name: claude-3
litellm_params:
model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
aws_region_name: $AWS_REGION
rpm: 2
tpm: 200
router_settings:
enable_pre_call_checks: true # 1. Enable pre-call checks
Steps to configure
- Use the configuration file
$BEDROCK_LITELLM_DIR/litellm/config/proxy_config_global_rate_limit.yaml
- To apply the new configuration, follow the steps outlined in Apply configuration changes.
Steps to test
- To test the global rate limit, make three or more API requests within one minute. After the second request, you should start receiving an error indicating that the limit has been reached. The limit will reset after one minute. If the rate limit is exceeded, you should receive an error response similar to the one below: