Querying Dedicated Deployments
Dedicated deployments provide isolated, scalable model instances with guaranteed capacity and enterprise-grade features.
Public Preview: Gravix Layer is currently in public preview. Features are experimental and may change or break as we continue to update API endpoints and models.
Overview
Dedicated deployments offer:
- Guaranteed Capacity: Reserved compute resources for your workloads
- Consistent Performance: Dedicated GPUs only (shared GPU support coming soon)
- Custom Scaling: Configure replicas based on your needs
Prerequisites
Before deploying models, you need to set up your API key:
API Key Required: You must export your GravixLayer API key in your terminal before creating deployments. All deployment operations are tied to your API key and account.
Set your API key:
- Windows (CMD)
- Windows (PowerShell)
- Linux/macOS
set GRAVIXLAYER_API_KEY=your_api_key_here
$env:GRAVIXLAYER_API_KEY="your_api_key_here"
export GRAVIXLAYER_API_KEY=your_api_key_here
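If you script deployments in Python, it helps to fail fast when the key is missing rather than getting an authentication error mid-run. A minimal sketch (the `require_api_key` helper is our own, not part of the SDK):

```python
import os

def require_api_key(env=os.environ):
    """Return the GravixLayer API key from the environment, or raise a clear error."""
    key = env.get("GRAVIXLAYER_API_KEY")
    if not key:
        raise RuntimeError(
            "GRAVIXLAYER_API_KEY is not set - export it before creating deployments"
        )
    return key
```

Call this once at startup and pass the result to the client constructor.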
Supported Models
Text Models
The following models are available for dedicated deployment:
Model Name | Model ID | Provider | Parameters | Context Length |
---|---|---|---|---|
Qwen: Qwen3 4B Instruct 2507 | qwen3-4b-instruct-2507 | Qwen | 4B | 32768 |
Qwen: Qwen3 4B Thinking 2507 | qwen3-4b-thinking-2507 | Qwen | 4B | 32768 |
Qwen: Qwen3 4B | qwen3-4b | Qwen | 4B | 32768 |
Qwen: Qwen3 1.7B | qwen3-1-7b | Qwen | 1.7B | 32768 |
Qwen: Qwen3 0.6B | qwen3-0-6b | Qwen | 0.6B | 32768 |
Available Hardware
Currently supported GPU configurations:
Accelerator | Hardware String | Memory | Pricing |
---|---|---|---|
NVIDIA T4 | nvidia-t4-16gb-pcie_1 | 16GB | $0.39/hour |
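Billing is per GPU-hour, and each replica runs on its own dedicated GPU, so a rough cost estimate is just rate x replicas x hours. A quick sketch using the T4 rate from the table above (the helper name is ours):

```python
def estimated_cost(hourly_rate, replicas, hours):
    """Rough cost estimate: each replica occupies one dedicated GPU."""
    return hourly_rate * replicas * hours

# Two T4 replicas ($0.39/hour each) running for a 30-day month:
print(f"${estimated_cost(0.39, 2, 24 * 30):.2f}")  # → $561.60
```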
Quick Start Guide
1. Check Available Hardware
First, list available GPU options:
gravixlayer deployments gpu --list
Example Output:
Available GPUs (1 found):
Accelerator Hardware String Memory
------------------------------------------------------------
t4 nvidia-t4-16gb-pcie_1 16GB
2. Create a Deployment
Create a new dedicated deployment:
gravixlayer deployments create \
--deployment_name "test_model" \
--hardware "nvidia-t4-16gb-pcie_1" \
--model_name "qwen3-1-7b" \
--wait
Example Output:
Creating deployment 'test_model' with model 'qwen3-1-7b'...
✅ Deployment created successfully!
Deployment ID: a7283154-5ab2-42a4-b221-03c61664fa22
Deployment Name: test_model
Status: creating
Model: qwen3-1-7b
Hardware: nvidia-t4-16gb-pcie_1
⏳ Waiting for deployment 'test_model' to be ready...
Press Ctrl+C to stop monitoring (deployment will continue in background)
Status: creating
Status: running
🚀 Deployment is now ready!
Deployment ID: a7283154-5ab2-42a4-b221-03c61664fa22
Deployment Name: test_model
Status: running
Model: qwen3-1-7b
Hardware: nvidia-t4-16gb-pcie_1
3. List Your Deployments
View all your active deployments:
gravixlayer deployments list
Example Output:
Found 1 deployment(s):
Deployment ID: a7283154-5ab2-42a4-b221-03c61664fa22
Deployment Name: test_model
Model: qwen3-1-7b
Status: running
Hardware: nvidia-t4-16gb-pcie_1
Replicas: 1
Created: 2025-09-03T15:57:47.021738Z
4. Test Your Deployment
Use your deployment for inference:
gravixlayer --model "test_model" --user "Hello, how are you?"
Deployment Management
Creating Deployments
- CLI
- Python SDK
Basic Creation:
gravixlayer deployments create \
--deployment_name "production_model" \
--hw_type "dedicated" \
--hardware "nvidia-t4-16gb-pcie_1" \
--min_replicas 2 \
--model_name "qwen3-4b-instruct-2507"
With Wait Flag (Recommended):
gravixlayer deployments create \
--deployment_name "production_model" \
--hw_type "dedicated" \
--hardware "nvidia-t4-16gb-pcie_1" \
--min_replicas 2 \
--model_name "qwen3-4b-instruct-2507" \
--wait
import os
from gravixlayer import GravixLayer

# Initialize the client
client = GravixLayer(api_key=os.environ.get("GRAVIXLAYER_API_KEY"))

# Create a deployment
deployment = client.deployments.create(
    deployment_name="production_model",
    hw_type="dedicated",
    hardware="nvidia-t4-16gb-pcie_1",
    min_replicas=2,
    model_name="qwen3-4b-instruct-2507"
)
print(f"Created deployment: {deployment.id}")
Listing Deployments
- CLI
- Python SDK
gravixlayer deployments list
Example Output:
Found 1 deployment(s):
Deployment ID: a7283154-5ab2-42a4-b221-03c61664fa22
Deployment Name: test_model
Model: qwen3-1-7b
Status: running
Hardware: nvidia-t4-16gb-pcie_1
Replicas: 1
Created: 2025-09-03T15:57:47.021738Z
# List all deployments
deployments = client.deployments.list()
for deployment in deployments:
    print(f"Deployment: {deployment.name} - Status: {deployment.status}")
    print(f"ID: {deployment.id}")
    print(f"Model: {deployment.model}")
    print(f"Hardware: {deployment.hardware}")
    print(f"Replicas: {deployment.replicas}")
    print("---")
Hardware Information
- CLI
- Python SDK
List Available Hardware:
gravixlayer deployments gpu --list
Example Output:
Available GPUs (1 found):
Accelerator Hardware String Memory
------------------------------------------------------------
t4 nvidia-t4-16gb-pcie_1 16GB
Get Hardware as JSON:
gravixlayer deployments gpu --list --json
Example JSON Output:
[
  {
    "accelerator_id": "NVIDIA_T4_16GB",
    "pricing": 0.39,
    "hw_model": "T4",
    "hw_link": "pcie",
    "hw_memory": 16,
    "provider": "NVIDIA",
    "status": "available",
    "updated_at": "2025-08-31T12:13:47Z",
    "hardware_string": "nvidia-t4-16gb-pcie_1"
  }
]
# List available hardware
hardware_options = client.deployments.list_hardware()
for hardware in hardware_options:
    print(f"Hardware: {hardware.hardware_string}")
    print(f"Model: {hardware.hw_model}")
    print(f"Memory: {hardware.hw_memory}GB")
    print(f"Pricing: ${hardware.pricing}/hour")
    print(f"Status: {hardware.status}")
    print("---")

# Get hardware as JSON
hardware_json = client.deployments.list_hardware(format="json")
print(hardware_json)
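Only one GPU type is offered today, but once more appear you could select the cheapest available option programmatically. A sketch over JSON-shaped records like the example output above (field names taken from that output; the helper is ours):

```python
def cheapest_available(hardware_records):
    """Pick the lowest-priced hardware record whose status is 'available'."""
    available = [h for h in hardware_records if h["status"] == "available"]
    if not available:
        return None
    return min(available, key=lambda h: h["pricing"])

gpus = [
    {"hardware_string": "nvidia-t4-16gb-pcie_1", "pricing": 0.39, "status": "available"},
]
choice = cheapest_available(gpus)
print(choice["hardware_string"])  # → nvidia-t4-16gb-pcie_1
```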
Deleting Deployments
- CLI
- Python SDK
gravixlayer deployments delete a7283154-5ab2-42a4-b221-03c61664fa22
Example Output:
Deleting deployment a7283154-5ab2-42a4-b221-03c61664fa22...
Deployment deleted successfully!
Response: {'message': 'deployment deleted successfully', 'status': 'success'}
# Delete a deployment
result = client.deployments.delete(deployment_id="a7283154-5ab2-42a4-b221-03c61664fa22")
print(f"Deletion result: {result}")
Using Your Deployment
Chat Completions
Once your deployment is running, you can use it like any other model by referencing the deployment name:
- CLI
- Python SDK
Basic Chat:
gravixlayer --model "test_model" --user "Hello, how are you?"
Streaming Chat:
gravixlayer --model "test_model" --user "Tell me a story" --stream
With System Message:
gravixlayer --model "test_model" --system "You are a helpful assistant" --user "Explain quantum computing"
Text Completion Mode:
gravixlayer --mode completions --model "test_model" --prompt "The future of AI is"
Streaming Completion:
gravixlayer --mode completions --model "test_model" --prompt "Write a poem about" --stream
# Basic chat completion
response = client.chat.completions.create(
    model="test_model",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)
print(response.choices[0].message.content)

# Streaming chat
stream = client.chat.completions.create(
    model="test_model",
    messages=[
        {"role": "user", "content": "Tell me a story"}
    ],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

# With system message
response = client.chat.completions.create(
    model="test_model",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain quantum computing"}
    ]
)
print(response.choices[0].message.content)
CLI Commands Reference
Deployment Management Commands
Command | Description | Example |
---|---|---|
deployments create | Create a new deployment | gravixlayer deployments create --deployment_name "my_model" --hardware "nvidia-t4-16gb-pcie_1" --model_name "qwen3-1-7b" |
deployments list | List all deployments | gravixlayer deployments list |
deployments delete | Delete a deployment | gravixlayer deployments delete <deployment_id> |
deployments gpu --list | List available hardware | gravixlayer deployments gpu --list |
deployments gpu --list --json | Get hardware info as JSON | gravixlayer deployments gpu --list --json |
Inference Commands
Command | Description | Example |
---|---|---|
Basic chat | Send a chat message | gravixlayer --model "deployment_name" --user "Hello!" |
Streaming chat | Get streaming response | gravixlayer --model "deployment_name" --user "Tell me a story" --stream |
Text completion | Use completion mode | gravixlayer --mode completions --model "deployment_name" --prompt "Complete this" |
With system message | Include system prompt | gravixlayer --model "deployment_name" --system "You are a poet" --user "Write a haiku" |
Troubleshooting
Common Issues
Deployment Stuck in "Creating" Status:
- Wait 5-10 minutes for initialization
- Check hardware availability with gravixlayer deployments gpu --list
- Verify the model name is correct
Connection Errors:
- Ensure deployment status is "running" before making requests
- Verify deployment name matches exactly
- Check API key configuration