Querying Dedicated Deployments
Dedicated deployments provide isolated, scalable model instances with guaranteed capacity and enterprise-grade features.
Public Preview: Gravix Layer is currently in public preview. Features are experimental and may change or break as we continue to update API endpoints and models.
Overview
Dedicated deployments offer:
- Guaranteed Capacity: Reserved compute resources for your workloads
- Consistent Performance: Dedicated GPUs only (shared GPU support coming soon)
- Custom Scaling: Configure replicas based on your needs
Prerequisites
Before deploying models, you need to set up your API key:
API Key Required: You must export your GravixLayer API key in your terminal before creating deployments. All deployment operations are tied to your API key and account.
Set your API key:
- Windows (CMD)
- Windows (PowerShell)
- Linux/macOS
set GRAVIXLAYER_API_KEY=your_api_key_here
$env:GRAVIXLAYER_API_KEY="your_api_key_here"
export GRAVIXLAYER_API_KEY=your_api_key_here
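If you script deployments in Python, it helps to fail fast when the key is missing rather than getting an authentication error mid-run. A minimal sketch (the `require_api_key` helper is our own, not part of the SDK):

```python
import os

def require_api_key(env=os.environ):
    """Return the GravixLayer API key from the environment, or raise a clear error."""
    key = env.get("GRAVIXLAYER_API_KEY")
    if not key:
        raise RuntimeError(
            "GRAVIXLAYER_API_KEY is not set - export it before creating deployments"
        )
    return key
```

Call this once at startup and pass the result to the client constructor.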
Supported Models
Text Models
The following models are available for dedicated deployment:
Model Name | Model ID | Provider | Parameters | Context Length |
---|---|---|---|---|
Qwen: Qwen3 4B Instruct 2507 | qwen3-4b-instruct-2507 | Qwen | 4B | 32768 |
Qwen: Qwen3 4B Thinking 2507 | qwen3-4b-thinking-2507 | Qwen | 4B | 32768 |
Qwen: Qwen3 4B | qwen3-4b | Qwen | 4B | 32768 |
Qwen: Qwen3 1.7B | qwen3-1-7b | Qwen | 1.7B | 32768 |
Qwen: Qwen3 0.6B | qwen3-0-6b | Qwen | 0.6B | 32768 |
Available Hardware
Currently supported GPU configurations:
Accelerator | Hardware String | Memory | Pricing |
---|---|---|---|
NVIDIA T4 | nvidia-t4-16gb-pcie_1 | 16GB | $0.39/hour |
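Billing is per GPU-hour, and each replica runs on its own dedicated GPU, so a rough cost estimate is just rate x replicas x hours. A quick sketch using the T4 rate from the table above (the helper name is ours):

```python
def estimated_cost(hourly_rate, replicas, hours):
    """Rough cost estimate: each replica occupies one dedicated GPU."""
    return hourly_rate * replicas * hours

# Two T4 replicas ($0.39/hour each) running for a 30-day month:
print(f"${estimated_cost(0.39, 2, 24 * 30):.2f}")  # → $561.60
```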
Quick Start Guide
1. Check Available Hardware
First, list available GPU options:
gravixlayer deployments gpu --list
Example Output:
Available GPUs (1 found):
Accelerator Hardware String Memory
------------------------------------------------------------
t4 nvidia-t4-16gb-pcie_1 16GB
2. Create a Deployment
Create a new dedicated deployment:
gravixlayer deployments create \
--deployment_name "test_model" \
--hardware "nvidia-t4-16gb-pcie_1" \
--model_name "qwen3-1-7b" \
--wait
Example Output:
Creating deployment 'test_model' with model 'qwen3-1-7b'...
✅ Deployment created successfully!
Deployment ID: a7283154-5ab2-42a4-b221-03c61664fa22
Deployment Name: test_model
Status: creating
Model: qwen3-1-7b
Hardware: nvidia-t4-16gb-pcie_1
⏳ Waiting for deployment 'test_model' to be ready...
Press Ctrl+C to stop monitoring (deployment will continue in background)
Status: creating
Status: running
🚀 Deployment is now ready!
Deployment ID: a7283154-5ab2-42a4-b221-03c61664fa22
Deployment Name: test_model
Status: running
Model: qwen3-1-7b
Hardware: nvidia-t4-16gb-pcie_1
3. List Your Deployments
View all your active deployments:
gravixlayer deployments list
Example Output:
Found 1 deployment(s):
Deployment ID: a7283154-5ab2-42a4-b221-03c61664fa22
Deployment Name: test_model
Model: qwen3-1-7b
Status: running
Hardware: nvidia-t4-16gb-pcie_1
Replicas: 1
Created: 2025-09-03T15:57:47.021738Z
4. Test Your Deployment
Use your deployment for inference:
gravixlayer --model "test_model" --user "Hello, how are you?"
Deployment Management
Creating Deployments
- CLI
- Python SDK
Basic Creation:
gravixlayer deployments create \
--deployment_name "production_model" \
--hw_type "dedicated" \
--hardware "nvidia-t4-16gb-pcie_1" \
--min_replicas 2 \
--model_name "qwen3-4b-instruct-2507"
With Wait Flag (Recommended):
gravixlayer deployments create \
--deployment_name "production_model" \
--hw_type "dedicated" \
--hardware "nvidia-t4-16gb-pcie_1" \
--min_replicas 2 \
--model_name "qwen3-4b-instruct-2507" \
--wait
import os
from gravixlayer import GravixLayer

# Initialize the client
client = GravixLayer(api_key=os.environ.get("GRAVIXLAYER_API_KEY"))

# Create a deployment
deployment = client.deployments.create(
    deployment_name="production_model",
    hw_type="dedicated",
    hardware="nvidia-t4-16gb-pcie_1",
    min_replicas=2,
    model_name="qwen3-4b-instruct-2507"
)
print(f"Created deployment: {deployment.id}")
Listing Deployments
- CLI
- Python SDK
gravixlayer deployments list
Example Output:
Found 1 deployment(s):
Deployment ID: a7283154-5ab2-42a4-b221-03c61664fa22
Deployment Name: test_model
Model: qwen3-1-7b
Status: running
Hardware: nvidia-t4-16gb-pcie_1
Replicas: 1
Created: 2025-09-03T15:57:47.021738Z
# List all deployments
deployments = client.deployments.list()
for deployment in deployments:
    print(f"Deployment: {deployment.name} - Status: {deployment.status}")
    print(f"ID: {deployment.id}")
    print(f"Model: {deployment.model}")
    print(f"Hardware: {deployment.hardware}")
    print(f"Replicas: {deployment.replicas}")
    print("---")
Hardware Information
- CLI
- Python SDK
List Available Hardware:
gravixlayer deployments gpu --list
Example Output:
Available GPUs (1 found):
Accelerator Hardware String Memory
------------------------------------------------------------
t4 nvidia-t4-16gb-pcie_1 16GB
Get Hardware as JSON:
gravixlayer deployments gpu --list --json
Example JSON Output:
[
  {
    "accelerator_id": "NVIDIA_T4_16GB",
    "pricing": 0.39,
    "hw_model": "T4",
    "hw_link": "pcie",
    "hw_memory": 16,
    "provider": "NVIDIA",
    "status": "available",
    "updated_at": "2025-08-31T12:13:47Z",
    "hardware_string": "nvidia-t4-16gb-pcie_1"
  }
]
# List available hardware
hardware_options = client.deployments.list_hardware()
for hardware in hardware_options:
    print(f"Hardware: {hardware.hardware_string}")
    print(f"Model: {hardware.hw_model}")
    print(f"Memory: {hardware.hw_memory}GB")
    print(f"Pricing: ${hardware.pricing}/hour")
    print(f"Status: {hardware.status}")
    print("---")

# Get hardware as JSON
hardware_json = client.deployments.list_hardware(format="json")
print(hardware_json)
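Only one GPU type is offered today, but once more appear you could select the cheapest available option programmatically. A sketch over JSON-shaped records like the example output above (field names taken from that output; the helper is ours):

```python
def cheapest_available(hardware_records):
    """Pick the lowest-priced hardware record whose status is 'available'."""
    available = [h for h in hardware_records if h["status"] == "available"]
    if not available:
        return None
    return min(available, key=lambda h: h["pricing"])

gpus = [
    {"hardware_string": "nvidia-t4-16gb-pcie_1", "pricing": 0.39, "status": "available"},
]
choice = cheapest_available(gpus)
print(choice["hardware_string"])  # → nvidia-t4-16gb-pcie_1
```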
Deleting Deployments
- CLI
- Python SDK
gravixlayer deployments delete a7283154-5ab2-42a4-b221-03c61664fa22
Example Output:
Deleting deployment a7283154-5ab2-42a4-b221-03c61664fa22...
Deployment deleted successfully!
Response: {'message': 'deployment deleted successfully', 'status': 'success'}
# Delete a deployment
result = client.deployments.delete(deployment_id="a7283154-5ab2-42a4-b221-03c61664fa22")
print(f"Deletion result: {result}")
Using Your Deployment
Chat Completions
Once your deployment is running, you can use it like any other model by referencing the deployment name:
- CLI
- Python SDK
Basic Chat:
gravixlayer --model "test_model" --user "Hello, how are you?"
Streaming Chat:
gravixlayer --model "test_model" --user "Tell me a story" --stream
With System Message:
gravixlayer --model "test_model" --system "You are a helpful assistant" --user "Explain quantum computing"
Text Completion Mode:
gravixlayer --mode completions --model "test_model" --prompt "The future of AI is"
Streaming Completion:
gravixlayer --mode completions --model "test_model" --prompt "Write a poem about" --stream
# Basic chat completion
response = client.chat.completions.create(
    model="test_model",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)
print(response.choices[0].message.content)

# Streaming chat
stream = client.chat.completions.create(
    model="test_model",
    messages=[
        {"role": "user", "content": "Tell me a story"}
    ],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

# With system message
response = client.chat.completions.create(
    model="test_model",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain quantum computing"}
    ]
)
print(response.choices[0].message.content)
CLI Commands Reference
Deployment Management Commands
Command | Description | Example |
---|---|---|
deployments create | Create a new deployment | gravixlayer deployments create --deployment_name "my_model" --hardware "nvidia-t4-16gb-pcie_1" --model_name "qwen3-1-7b" |
deployments list | List all deployments | gravixlayer deployments list |
deployments delete | Delete a deployment | gravixlayer deployments delete <deployment_id> |
deployments gpu --list | List available hardware | gravixlayer deployments gpu --list |
deployments gpu --list --json | Get hardware info as JSON | gravixlayer deployments gpu --list --json |
Inference Commands
Command | Description | Example |
---|---|---|
Basic chat | Send a chat message | gravixlayer --model "deployment_name" --user "Hello!" |
Streaming chat | Get streaming response | gravixlayer --model "deployment_name" --user "Tell me a story" --stream |
Text completion | Use completion mode | gravixlayer --mode completions --model "deployment_name" --prompt "Complete this" |
With system message | Include system prompt | gravixlayer --model "deployment_name" --system "You are a poet" --user "Write a haiku" |
Troubleshooting
Common Issues
Deployment Stuck in "Creating" Status:
- Wait 5-10 minutes for initialization
- Check hardware availability with gravixlayer deployments gpu --list
- Verify the model name is correct
Connection Errors:
- Ensure deployment status is "running" before making requests
- Verify deployment name matches exactly
- Check API key configuration