Create Deployment
POST /v1/deployments/create

Example request:
curl --request POST \
  --url https://api.gravixlayer.com/v1/deployments/create \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "deployment_name": "my-llm",
  "model_name": "mistralai/mistral-nemo-instruct-2407",
  "gpu_model": "NVIDIA_T4_16GB",
  "gpu_count": 1,
  "min_replicas": 1,
  "max_replicas": 3,
  "hw_type": "dedicated"
}
'
Example response:
{
  "deployment_id": "bab28606-d34d-48cd-b876-6e30724c5628",
  "deployment_name": "my-llm",
  "request_id": "7bb798b1-5f21-488e-8b27-0eab4127adc2",
  "status": "creating",
  "created_at": "2025-11-24T17:54:42Z"
}
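
The same request can be made from Python with the requests library. This is a minimal sketch, not an official SDK example; it assumes the base URL shown in the curl command above and that the API key is stored in an environment variable named GRAVIXLAYER_API_KEY (that variable name is chosen for illustration).

import os
import requests

# Assumptions: base URL taken from the curl example above; the environment
# variable name GRAVIXLAYER_API_KEY is illustrative, not mandated by the API.
api_key = os.environ["GRAVIXLAYER_API_KEY"]

payload = {
    "deployment_name": "my-llm",
    "model_name": "mistralai/mistral-nemo-instruct-2407",
    "gpu_model": "NVIDIA_T4_16GB",
    "gpu_count": 1,
    "min_replicas": 1,
    "max_replicas": 3,
    "hw_type": "dedicated",
}

response = requests.post(
    "https://api.gravixlayer.com/v1/deployments/create",
    headers={"Authorization": f"Bearer {api_key}"},
    json=payload,  # requests sets the Content-Type: application/json header
    timeout=30,
)
response.raise_for_status()
print(response.json())  # expected to include deployment_id and status "creating"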

Authorizations

Authorization (string, header, required)
  API key authentication. Get your API key from the Gravix Layer Dashboard.
Body (application/json)

deployment_name (string, required)
  Unique deployment name.

model_name (enum<string>, required)
  Model identifier to deploy.
  Available options: llama3.2-1b-instruct, qwen2-5-vl-3b-instruct, qwen3-4b-instruct-2507, qwen3-4b-thinking-2507, deepseek-r1-distill-qwen-1.5b, qwen3-4b, qwen3-1.7b, qwen3-0.6b

gpu_model (enum<string>, required)
  GPU hardware model.
  Available options: NVIDIA_T4_16GB

gpu_count (enum<integer>, required)
  Number of GPUs.
  Available options: 1, 2, 4, 8

min_replicas (integer, default: 1)
  Minimum replicas for autoscaling.

max_replicas (integer, default: 1)
  Maximum replicas for autoscaling.

hw_type (enum<string>, default: dedicated)
  Hardware type: "dedicated" or "shared".
  Available options: dedicated
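
As an illustration of the schema above, the request body can be modeled as a small Python dataclass. This is a sketch based solely on the field list here; the min_replicas <= max_replicas check is an assumption about sensible autoscaling bounds, not a constraint stated by the API.

from dataclasses import dataclass, asdict

@dataclass
class CreateDeploymentBody:
    # Required fields per the schema above.
    deployment_name: str
    model_name: str
    gpu_model: str
    gpu_count: int
    # Optional fields with the documented defaults.
    min_replicas: int = 1
    max_replicas: int = 1
    hw_type: str = "dedicated"

    def validate(self) -> None:
        if self.gpu_count not in (1, 2, 4, 8):
            raise ValueError("gpu_count must be 1, 2, 4, or 8")
        # Assumption: autoscaling bounds should be ordered; not stated by the API docs.
        if self.min_replicas > self.max_replicas:
            raise ValueError("min_replicas cannot exceed max_replicas")

body = CreateDeploymentBody(
    deployment_name="my-llm",
    model_name="qwen3-4b",
    gpu_model="NVIDIA_T4_16GB",
    gpu_count=1,
    max_replicas=3,
)
body.validate()
payload = asdict(body)  # ready to send as the JSON request body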

Response

Deployment created successfully.

deployment_id (string)
deployment_name (string)
request_id (string)
status (string)
created_at (string<date-time>)
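
The created_at field is an ISO 8601 timestamp, so it parses directly with the Python standard library. A small sketch using the example response above:

from datetime import datetime

# created_at carries a trailing "Z"; replacing it with "+00:00" keeps
# datetime.fromisoformat happy on all supported Python versions.
resp = {
    "deployment_id": "bab28606-d34d-48cd-b876-6e30724c5628",
    "status": "creating",
    "created_at": "2025-11-24T17:54:42Z",
}
created_at = datetime.fromisoformat(resp["created_at"].replace("Z", "+00:00"))
print(resp["deployment_id"], resp["status"], created_at.isoformat())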