Create Deployment
POST /v1/deployments/create

Example request:
curl --request POST \
  --url https://api.gravixlayer.com/v1/deployments/create \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "deployment_name": "my-llm",
  "model_name": "mistralai/mistral-nemo-instruct-2407",
  "gpu_model": "NVIDIA_T4_16GB",
  "gpu_count": 1,
  "min_replicas": 1,
  "max_replicas": 3,
  "hw_type": "dedicated"
}
'
Example response:
{
  "deployment_id": "bab28606-d34d-48cd-b876-6e30724c5628",
  "deployment_name": "my-llm",
  "request_id": "7bb798b1-5f21-488e-8b27-0eab4127adc2",
  "status": "creating",
  "created_at": "2025-11-24T17:54:42Z"
}
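
The same request can be made from Python with the requests library. This is a minimal sketch, not an official SDK example; it assumes the base URL shown in the curl command above and that the API key is stored in an environment variable named GRAVIXLAYER_API_KEY (that variable name is chosen for illustration).

import os
import requests

# Assumptions: base URL taken from the curl example above; the environment
# variable name GRAVIXLAYER_API_KEY is illustrative, not mandated by the API.
api_key = os.environ["GRAVIXLAYER_API_KEY"]

payload = {
    "deployment_name": "my-llm",
    "model_name": "mistralai/mistral-nemo-instruct-2407",
    "gpu_model": "NVIDIA_T4_16GB",
    "gpu_count": 1,
    "min_replicas": 1,
    "max_replicas": 3,
    "hw_type": "dedicated",
}

response = requests.post(
    "https://api.gravixlayer.com/v1/deployments/create",
    headers={"Authorization": f"Bearer {api_key}"},
    json=payload,  # requests sets the Content-Type: application/json header
    timeout=30,
)
response.raise_for_status()
print(response.json())  # expected to include deployment_id and status "creating"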

Authorizations

Authorization (string, header, required)
  API key authentication. Get your API key from the Gravix Layer Dashboard.
Body (application/json)

deployment_name (string, required)
  Unique deployment name.

model_name (enum<string>, required)
  Model identifier to deploy.
  Available options: llama3.2-1b-instruct, qwen2-5-vl-3b-instruct, qwen3-4b-instruct-2507, qwen3-4b-thinking-2507, deepseek-r1-distill-qwen-1.5b, qwen3-4b, qwen3-1.7b, qwen3-0.6b

gpu_model (enum<string>, required)
  GPU hardware model.
  Available options: NVIDIA_T4_16GB

gpu_count (enum<integer>, required)
  Number of GPUs.
  Available options: 1, 2, 4, 8

min_replicas (integer, default: 1)
  Minimum replicas for autoscaling.

max_replicas (integer, default: 1)
  Maximum replicas for autoscaling.

hw_type (enum<string>, default: dedicated)
  Hardware type: "dedicated" or "shared".
  Available options: dedicated
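
As an illustration of the schema above, the request body can be modeled as a small Python dataclass. This is a sketch based solely on the field list here; the min_replicas <= max_replicas check is an assumption about sensible autoscaling bounds, not a constraint stated by the API.

from dataclasses import dataclass, asdict

@dataclass
class CreateDeploymentBody:
    # Required fields per the schema above.
    deployment_name: str
    model_name: str
    gpu_model: str
    gpu_count: int
    # Optional fields with the documented defaults.
    min_replicas: int = 1
    max_replicas: int = 1
    hw_type: str = "dedicated"

    def validate(self) -> None:
        if self.gpu_count not in (1, 2, 4, 8):
            raise ValueError("gpu_count must be 1, 2, 4, or 8")
        # Assumption: autoscaling bounds should be ordered; not stated by the API docs.
        if self.min_replicas > self.max_replicas:
            raise ValueError("min_replicas cannot exceed max_replicas")

body = CreateDeploymentBody(
    deployment_name="my-llm",
    model_name="qwen3-4b",
    gpu_model="NVIDIA_T4_16GB",
    gpu_count=1,
    max_replicas=3,
)
body.validate()
payload = asdict(body)  # ready to send as the JSON request body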

Response

Deployment created successfully.

deployment_id (string)
deployment_name (string)
request_id (string)
status (string)
created_at (string<date-time>)
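
The created_at field is an ISO 8601 timestamp, so it parses directly with the Python standard library. A small sketch using the example response above:

from datetime import datetime

# created_at carries a trailing "Z"; replacing it with "+00:00" keeps
# datetime.fromisoformat happy on all supported Python versions.
resp = {
    "deployment_id": "bab28606-d34d-48cd-b876-6e30724c5628",
    "status": "creating",
    "created_at": "2025-11-24T17:54:42Z",
}
created_at = datetime.fromisoformat(resp["created_at"].replace("Z", "+00:00"))
print(resp["deployment_id"], resp["status"], created_at.isoformat())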