
Querying Dedicated Deployments

Dedicated deployments provide isolated, scalable model instances with guaranteed capacity and enterprise-grade features

info

Public Preview: Gravix Layer is currently in public preview. Features are experimental and may change or break as API endpoints and models continue to be updated.

Overview

Dedicated deployments offer:

  • Guaranteed Capacity: Reserved compute resources for your workloads
  • Consistent Performance: Dedicated GPUs only (shared GPU support coming soon)
  • Custom Scaling: Configure replicas based on your needs

Prerequisites

Before deploying models, you need to set up your API key:

note

API Key Required: You must export your GravixLayer API key in your terminal before creating deployments. All deployment operations are tied to your API key and account.

Set your API key:

Linux/macOS:

export GRAVIXLAYER_API_KEY=your_api_key_here

Windows (Command Prompt):

set GRAVIXLAYER_API_KEY=your_api_key_here
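
Optional: verify the key is visible to your current shell before running deployment commands (the command prints nothing if the variable is missing):

# Linux/macOS shells
echo "$GRAVIXLAYER_API_KEY"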

Supported Models

Text Models

The following models are available for dedicated deployment:

| Model Name | Model ID | Provider | Parameters | Context Length |
|---|---|---|---|---|
| Qwen: Qwen3 4B Instruct 2507 | qwen3-4b-instruct-2507 | Qwen | 4B | 32768 |
| Qwen: Qwen3 4B Thinking 2507 | qwen3-4b-thinking-2507 | Qwen | 4B | 32768 |
| Qwen: Qwen3 4B | qwen3-4b | Qwen | 4B | 32768 |
| Qwen: Qwen3 1.7B | qwen3-1.7b | Qwen | 1.7B | 32768 |
| Qwen: Qwen3 0.6B | qwen3-0.6b | Qwen | 0.6B | 32768 |

Available Hardware

Currently supported GPU configurations:

| Accelerator | Hardware String | Memory | Pricing |
|---|---|---|---|
| NVIDIA T4 | nvidia-t4-16gb-pcie_1 | 16GB | $0.39/hour |

Quick Start Guide

1. Check Available Hardware

First, list available GPU options:

gravixlayer deployments gpu --list

Example Output:

Available GPUs (1 found):
Accelerator     Hardware String           Memory
------------------------------------------------------------
t4              nvidia-t4-16gb-pcie_1     16GB

2. Create a Deployment

Create a new dedicated deployment:

gravixlayer deployments create \
--deployment_name "test_model" \
--hardware "nvidia-t4-16gb-pcie_1" \
--model_name "qwen3-1-7b" \
--wait

Example Output:

Creating deployment 'test_model' with model 'qwen3-1-7b'...
✅ Deployment created successfully!

Deployment ID: a7283154-5ab2-42a4-b221-03c61664fa22
Deployment Name: test_model
Status: creating
Model: qwen3-1-7b
Hardware: nvidia-t4-16gb-pcie_1

⏳ Waiting for deployment 'test_model' to be ready...
Press Ctrl+C to stop monitoring (deployment will continue in background)

Status: creating
Status: running

🚀 Deployment is now ready!
Deployment ID: a7283154-5ab2-42a4-b221-03c61664fa22
Deployment Name: test_model
Status: running
Model: qwen3-1-7b
Hardware: nvidia-t4-16gb-pcie_1

3. List Your Deployments

View all your active deployments:

gravixlayer deployments list

Example Output:

Found 1 deployment(s):

Deployment ID: a7283154-5ab2-42a4-b221-03c61664fa22
Deployment Name: test_model
Model: qwen3-1-7b
Status: running
Hardware: nvidia-t4-16gb-pcie_1
Replicas: 1
Created: 2025-09-03T15:57:47.021738Z

4. Test Your Deployment

Use your deployment for inference:

gravixlayer --model "test_model" --user "Hello, how are you?"

Deployment Management

Creating Deployments

Basic Creation:

gravixlayer deployments create \
--deployment_name "production_model" \
--hw_type "dedicated" \
--hardware "nvidia-t4-16gb-pcie_1" \
--min_replicas 2 \
--model_name "qwen3-4b-instruct-2507"

With Wait Flag (Recommended):

gravixlayer deployments create \
--deployment_name "production_model" \
--hw_type "dedicated" \
--hardware "nvidia-t4-16gb-pcie_1" \
--min_replicas 2 \
--model_name "qwen3-4b-instruct-2507" \
--wait
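
When you script creation across several environments, the same flags can be driven from shell variables. A minimal sketch (bash) using only the options shown above; the values are placeholders:

# Placeholder values -- substitute your own deployment name, model, and hardware
DEPLOYMENT_NAME="production_model"
MODEL_NAME="qwen3-4b-instruct-2507"
HARDWARE="nvidia-t4-16gb-pcie_1"

gravixlayer deployments create \
  --deployment_name "$DEPLOYMENT_NAME" \
  --hw_type "dedicated" \
  --hardware "$HARDWARE" \
  --min_replicas 2 \
  --model_name "$MODEL_NAME" \
  --wait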

Listing Deployments

gravixlayer deployments list

Example Output:

Found 1 deployment(s):

Deployment ID: a7283154-5ab2-42a4-b221-03c61664fa22
Deployment Name: test_model
Model: qwen3-1-7b
Status: running
Hardware: nvidia-t4-16gb-pcie_1
Replicas: 1
Created: 2025-09-03T15:57:47.021738Z

Hardware Information

List Available Hardware:

gravixlayer deployments gpu --list

Example Output:

Available GPUs (1 found):
Accelerator     Hardware String           Memory
------------------------------------------------------------
t4              nvidia-t4-16gb-pcie_1     16GB

Get Hardware as JSON:

gravixlayer deployments gpu --list --json

Example JSON Output:

[
{
"accelerator_id": "NVIDIA_T4_16GB",
"pricing": 0.39,
"hw_model": "T4",
"hw_link": "pcie",
"hw_memory": 16,
"provider": "NVIDIA",
"status": "available",
"updated_at": "2025-08-31T12:13:47Z",
"hardware_string": "nvidia-t4-16gb-pcie_1"
}
]
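
Because the JSON output is machine-readable, hardware selection can be scripted. A minimal sketch (bash) that picks the hardware string of the first available GPU, assuming jq is installed and the field names match the example output above:

# Select the hardware string of the first GPU whose status is "available"
HARDWARE=$(gravixlayer deployments gpu --list --json \
  | jq -r '[.[] | select(.status == "available")][0].hardware_string')
echo "Selected hardware: $HARDWARE"

The resulting value can then be passed to deployments create via --hardware "$HARDWARE".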

Deleting Deployments

gravixlayer deployments delete a7283154-5ab2-42a4-b221-03c61664fa22

Example Output:

Deleting deployment a7283154-5ab2-42a4-b221-03c61664fa22...
Deployment deleted successfully!
Response: {'message': 'deployment deleted successfully', 'status': 'success'}
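
To remove several deployments at once, deletion can be scripted against the list output. A rough sketch (bash), assuming the human-readable format shown above; adjust the parsing if the output format changes:

# Extract every "Deployment ID: <uuid>" line and delete each deployment
gravixlayer deployments list \
  | awk '/^Deployment ID:/ {print $3}' \
  | while read -r id; do
      gravixlayer deployments delete "$id"
    done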

Using Your Deployment

Chat Completions

Once your deployment is running, you can use it like any other model by referencing the deployment name:

Basic Chat:

gravixlayer --model "test_model" --user "Hello, how are you?"

Streaming Chat:

gravixlayer --model "test_model" --user "Tell me a story" --stream

With System Message:

gravixlayer --model "test_model" --system "You are a helpful assistant" --user "Explain quantum computing"

Text Completion Mode:

gravixlayer --mode completions --model "test_model" --prompt "The future of AI is"

Streaming Completion:

gravixlayer --mode completions --model "test_model" --prompt "Write a poem about" --stream
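
Deployments can also be queried programmatically over HTTP. The sketch below (bash + curl) assumes the deployment is exposed through an OpenAI-compatible chat completions endpoint; the base URL and path here are placeholders, so check the API reference for the actual values:

# Hypothetical endpoint -- replace the URL with the one from the API reference
curl -s https://api.gravixlayer.com/v1/chat/completions \
  -H "Authorization: Bearer $GRAVIXLAYER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "test_model",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant"},
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'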

CLI Commands Reference

Deployment Management Commands

| Command | Description | Example |
|---|---|---|
| deployments create | Create a new deployment | gravixlayer deployments create --deployment_name "my_model" --hardware "nvidia-t4-16gb-pcie_1" --model_name "qwen3-1-7b" |
| deployments list | List all deployments | gravixlayer deployments list |
| deployments delete | Delete a deployment | gravixlayer deployments delete <deployment_id> |
| deployments gpu --list | List available hardware | gravixlayer deployments gpu --list |
| deployments gpu --list --json | Get hardware info as JSON | gravixlayer deployments gpu --list --json |

Inference Commands

| Command | Description | Example |
|---|---|---|
| Basic chat | Send a chat message | gravixlayer --model "deployment_name" --user "Hello!" |
| Streaming chat | Get streaming response | gravixlayer --model "deployment_name" --user "Tell me a story" --stream |
| Text completion | Use completion mode | gravixlayer --mode completions --model "deployment_name" --prompt "Complete this" |
| With system message | Include system prompt | gravixlayer --model "deployment_name" --system "You are a poet" --user "Write a haiku" |

Troubleshooting

Common Issues

Deployment Stuck in "Creating" Status:

  • Wait 5-10 minutes for initialization, or poll the status from a script as sketched after this list
  • Check hardware availability with gravixlayer deployments gpu --list
  • Verify model name is correct
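
If the deployment was created without --wait, its status can be polled from a script instead. A rough sketch (bash), assuming the list output format shown earlier in this guide:

# Poll every 30 seconds until the named deployment reports "running"
NAME="test_model"
until gravixlayer deployments list | grep -A 2 "Deployment Name: $NAME" | grep -q "Status: running"; do
  echo "Deployment '$NAME' not ready yet..."
  sleep 30
done
echo "Deployment '$NAME' is running"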

Connection Errors:

  • Ensure deployment status is "running" before making requests
  • Verify deployment name matches exactly
  • Check API key configuration