Public Preview: Gravix Layer is currently in public preview. API endpoints, rate limits, and models are being updated frequently. The service is free to try.

Gravix Layer enforces API rate limits to ensure stability, fairness, and security for all users. This page explains how limits work, what happens if you exceed them, and how to design your integration for reliability.

What Are Rate Limits?

Rate limits control how many API requests or tokens you can use in a set time period. They:
  • Keep the service stable
  • Ensure fair access
  • Prevent abuse

Types of Limits

Abbreviation | What It Means
RPM | Requests per minute
RPD | Requests per day
TPM | Tokens per minute
TPD | Tokens per day

Free Plan Rate Limits

Limit Type | Description | Free Plan
RPM | Requests per Minute | 25
RPD | Requests per Day | 1,000
TPM | Tokens per Minute | 10,000
TPD | Tokens per Day | 100,000

How Limits Work

  • Limits are enforced at the organization level (not per user)
  • All API keys in your org share the same pool
  • Multiple users with different keys count toward the same limits
You hit a limit as soon as either the request count or the token count reaches its cap, whichever comes first. Counters reset at the start of each window (minute or day).
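
To make the shared-pool behaviour concrete, here is a minimal client-side sketch of a per-minute counter. It is not Gravix Layer's actual implementation: the `MinuteWindow` class, the alignment of windows to clock minutes, and the choice to refuse a request that would push tokens past the cap are all assumptions made for illustration. Only the RPM and TPM values come from the Free Plan table.

```python
import time

# Free-plan per-minute caps taken from the table above.
LIMITS = {"rpm": 25, "tpm": 10_000}


class MinuteWindow:
    """Hypothetical tracker for one pool shared by every key in an organization."""

    def __init__(self, limits=LIMITS, clock=time.time):
        self.limits = limits
        self.clock = clock
        self.window = None  # index of the minute currently being counted
        self.requests = 0
        self.tokens = 0

    def try_spend(self, tokens):
        """Record one request of `tokens` tokens if it fits; return True on success."""
        current = int(self.clock() // 60)
        if current != self.window:
            # Assumption: counters reset when a new clock minute begins.
            self.window, self.requests, self.tokens = current, 0, 0
        if self.requests + 1 > self.limits["rpm"]:
            return False  # request cap reached
        if self.tokens + tokens > self.limits["tpm"]:
            return False  # token cap would be exceeded
        self.requests += 1
        self.tokens += tokens
        return True
```

Because limits apply per organization, a tracker like this is only useful if every process using the organization's keys reports to the same instance (or a shared store).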

What Happens If You Exceed a Limit?

  • The API returns HTTP 429 Too Many Requests
  • The response includes a Retry-After header indicating how long to wait before retrying
  • You should implement automatic retry logic with backoff
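
One possible shape for that retry logic is sketched below. It assumes the Retry-After value is a plain number of seconds (the HTTP specification also allows a date, in which case this sketch falls back to backoff), and `send` is a hypothetical callable standing in for your actual HTTP call; neither `wait_seconds` nor `call_with_retry` is part of any Gravix Layer SDK.

```python
import random
import time


def wait_seconds(retry_after, attempt, base=1.0, cap=60.0):
    """Choose how long to sleep before retrying after attempt number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # header was not a plain number; fall back to backoff
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(wait_seconds(headers.get("Retry-After"), attempt))
    return status, body  # still 429 after the final attempt
```

Preferring the server's Retry-After value over a locally computed delay avoids retrying before the window has reset; the jitter spreads out retries when several clients share the same organization pool.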

Best Practices for Staying Within Limits

  • Monitor both requests and tokens
  • Use exponential backoff on 429 errors
  • Batch multiple operations into single requests
  • Cache responses to reduce duplicate calls
  • Write concise prompts to minimize token usage
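
As an illustration of the caching practice, the sketch below memoizes responses by a hash of the request payload, so an identical request is served locally and does not count against RPM or TPM. `ResponseCache` and the `fetch` callable are hypothetical names, and this approach only makes sense when identical requests are expected to produce interchangeable answers.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed by a hash of the JSON-serializable request payload."""

    def __init__(self, fetch):
        self.fetch = fetch  # hypothetical function that performs the real API call
        self.store = {}
        self.hits = 0

    def get(self, payload):
        # sort_keys makes the key independent of dictionary ordering.
        raw = json.dumps(payload, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()
        if key in self.store:
            self.hits += 1
            return self.store[key]
        result = self.fetch(payload)  # only cache misses reach the API
        self.store[key] = result
        return result
```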

Example: Hitting a Limit

Suppose your limits are:
Limit Type | Value
RPM | 25
TPM | 10,000
If you send 25 requests with 200 tokens each in a minute:
Metric | Value | Limit | Status
Requests sent | 25 | 25 | Limit reached
Tokens used | 5,000 | 10,000 | OK
Even though you are below the token limit, you have reached the requests-per-minute limit and will receive a 429 error if you send more requests in that minute.
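
The arithmetic in this example can be checked with a few lines of Python. The helper name `minute_status` is made up for illustration; the limits are the RPM and TPM values used in the example.

```python
RPM_LIMIT, TPM_LIMIT = 25, 10_000


def minute_status(requests_sent, tokens_per_request):
    """Summarize where one minute of uniform traffic stands against both caps."""
    tokens_used = requests_sent * tokens_per_request
    return {
        "requests": "limit reached" if requests_sent >= RPM_LIMIT else "ok",
        "tokens": "limit reached" if tokens_used >= TPM_LIMIT else "ok",
        "tokens_used": tokens_used,
    }


print(minute_status(25, 200))
# {'requests': 'limit reached', 'tokens': 'ok', 'tokens_used': 5000}
```

With these limits the break-even point is 10,000 / 25 = 400 tokens per request: requests averaging fewer than 400 tokens run into the request cap first, while larger requests run into the token cap first.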