API Rate Limits

Public Preview: Gravix Layer is currently in public preview. API endpoints, rate limits, and models are being updated frequently. The service is free to try.

Gravix Layer enforces API rate limits to ensure stability, fairness, and security for all users. This page explains how limits work, what happens if you exceed them, and how to design your integration for reliability.

What Are Rate Limits?

Rate limits control how many API requests or tokens you can use in a set time period. They:

Keep the service stable
Ensure fair access
Prevent abuse

Types of Limits

Abbreviation	What It Means
RPM	Requests per minute
RPD	Requests per day
TPM	Tokens per minute
TPD	Tokens per day

Free Plan Rate Limits

Limit Type	Description	Free Plan
RPM	Requests per Minute	25
RPD	Requests per Day	1,000
TPM	Tokens per Minute	10,000
TPD	Tokens per Day	100,000

How Limits Work

Limits are enforced at the organization level (not per user)
All API keys in your org share the same pool
Multiple users with different keys count toward the same limits

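You hit a limit as soon as either the request count or the token count reaches its cap for the current window, whichever comes first. Counters reset at the start of each window (every minute for RPM and TPM, every day for RPD and TPD).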
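To make the windowing concrete, here is a minimal client-side sketch of a fixed-window counter that tracks requests and tokens together. `WindowBudget` and its methods are hypothetical names, not part of the Gravix Layer API, and the sketch assumes windows are aligned to the clock. Because limits are shared across the whole organization, a local counter like this only sees the traffic of one process, so treat it as an estimate rather than the server's view.

```python
import time


class WindowBudget:
    """Client-side estimate of usage inside one fixed window (hypothetical helper)."""

    def __init__(self, max_requests, max_tokens, window_seconds, clock=time.time):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_id = None
        self.requests = 0
        self.tokens = 0

    def _roll(self):
        # Assumption: windows are aligned to the clock (e.g. the top of each minute).
        current = int(self.clock() // self.window_seconds)
        if current != self.window_id:
            # A new window has started, so both counters reset.
            self.window_id = current
            self.requests = 0
            self.tokens = 0

    def try_spend(self, tokens):
        """Record one request of `tokens` tokens if it fits; return True on success."""
        self._roll()
        if self.requests + 1 > self.max_requests:
            return False  # request cap reached first
        if self.tokens + tokens > self.max_tokens:
            return False  # token cap reached first
        self.requests += 1
        self.tokens += tokens
        return True


# Free plan per-minute values from the table above.
per_minute = WindowBudget(max_requests=25, max_tokens=10_000, window_seconds=60)
```

Calling `per_minute.try_spend(200)` before each request gives a cheap local check; a second instance with the daily values and `window_seconds=86_400` could cover RPD and TPD in the same way.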

What Happens If You Exceed a Limit?

The API returns HTTP 429 Too Many Requests
The response includes a Retry-After header indicating how long to wait before retrying
You should implement automatic retry logic with backoff
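The retry behavior described above could look roughly like the following sketch. It is transport-agnostic: `send` is a hypothetical callable that performs one request and returns `(status_code, headers, body)`, so it is not tied to any particular HTTP library or SDK. The sketch assumes `Retry-After` carries a number of seconds; if the header is missing or not numeric, it falls back to exponential backoff with jitter.

```python
import random
import time


def retry_delay(headers, attempt, base_delay, max_delay):
    """Pick a wait time: prefer Retry-After, else exponential backoff with jitter."""
    # Assumption: headers is dict-like and Retry-After holds a number of seconds.
    raw = headers.get("Retry-After")
    try:
        return max(0.0, min(float(raw), max_delay))
    except (TypeError, ValueError):
        # Missing or non-numeric header: back off 1x, 2x, 4x, ... the base delay,
        # capped at max_delay, with random jitter to avoid synchronized retries.
        return random.uniform(0, min(max_delay, base_delay * 2 ** attempt))


def call_with_retry(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                    sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt < max_attempts - 1:
            sleep(retry_delay(headers, attempt, base_delay, max_delay))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

Capping the number of attempts keeps a persistently throttled client from retrying forever, which matters here because every key in the organization draws from the same pool.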

Best Practices for Staying Within Limits

Monitor both requests and tokens
Use exponential backoff on 429 errors
Batch multiple operations into single requests
Cache responses to reduce duplicate calls
Write concise prompts to minimize token usage
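As one illustration of the caching advice, the sketch below memoizes responses by a hash of the request payload, so repeating an identical request costs no additional requests or tokens. `ResponseCache` and `fetch` are hypothetical names; `fetch` stands in for whatever function actually calls the API. This is only appropriate when identical inputs are expected to produce reusable outputs.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed by request payload (hypothetical helper)."""

    def __init__(self, fetch):
        self.fetch = fetch  # hypothetical callable: payload dict -> response
        self.store = {}
        self.hits = 0

    def get(self, payload):
        # Sort keys so logically equal payloads map to the same cache key.
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(encoded).hexdigest()
        if key in self.store:
            self.hits += 1
            return self.store[key]
        response = self.fetch(payload)  # only cache misses reach the API
        self.store[key] = response
        return response
```

A production cache would usually also need an eviction policy or expiry time, which this sketch leaves out.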

Example: Hitting a Limit

Suppose your limits are:

Limit Type	Value
RPM	25
TPM	10,000

If you send 25 requests of 200 tokens each within one minute (25 × 200 = 5,000 tokens):

Metric	Value	Limit	Status
Requests sent	25	25	Limit reached
Tokens used	5,000	10,000	OK

Even though you are below the token limit, you have reached the requests-per-minute (RPM) limit, so any further request in that minute returns a 429 error.
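The same arithmetic can be checked in a few lines. `usage_status` is a hypothetical helper written for this example; it multiplies out the token usage and flags whichever per-minute limit has been reached.

```python
# Per-minute limits from the example above.
FREE_PLAN_PER_MINUTE = {"requests": 25, "tokens": 10_000}


def usage_status(requests_sent, tokens_per_request, limits=FREE_PLAN_PER_MINUTE):
    """Return used/limit/reached for each per-minute metric."""
    used = {
        "requests": requests_sent,
        "tokens": requests_sent * tokens_per_request,
    }
    return {
        name: {
            "used": used[name],
            "limit": limits[name],
            "reached": used[name] >= limits[name],
        }
        for name in limits
    }


status = usage_status(requests_sent=25, tokens_per_request=200)
print(status["requests"])  # {'used': 25, 'limit': 25, 'reached': True}
print(status["tokens"])    # {'used': 5000, 'limit': 10000, 'reached': False}
```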
