Dedicated deployments provide isolated, scalable model instances with guaranteed capacity and enterprise-grade features.
Public Preview: Gravix Layer is currently in public preview. Features are experimental and may change or break as API endpoints and models continue to be updated.

Overview

Dedicated deployments provide:
  • Guaranteed Capacity: Reserved compute resources for your workloads
  • Consistent Performance: Dedicated GPUs only (shared GPU support coming soon)
  • Custom Scaling: Configure replicas based on your needs

Prerequisites

Before deploying models, set up your API key.
API Key Required: You must export your GravixLayer API key in your terminal before creating deployments. All deployment operations are tied to your API key and account.
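Set your API key with the command for your shell:
  • Windows (CMD): set GRAVIXLAYER_API_KEY=your_api_key_here
  • Windows (PowerShell): $env:GRAVIXLAYER_API_KEY="your_api_key_here"
  • Linux/macOS: export GRAVIXLAYER_API_KEY=your_api_key_here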
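
Because every deployment operation depends on this variable, a script can check that it is set before doing anything else. The following is a minimal Python sketch of such a check. The helper name require_api_key is hypothetical and is not part of any Gravix Layer SDK or CLI; only the variable name GRAVIXLAYER_API_KEY comes from this page.

```python
import os


def require_api_key(env=None):
    """Return the API key from GRAVIXLAYER_API_KEY, or raise if it is unset or blank.

    `env` defaults to the process environment; passing a dict makes the
    function easy to test. This helper is illustrative only.
    """
    env = os.environ if env is None else env
    key = env.get("GRAVIXLAYER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GRAVIXLAYER_API_KEY is not set; export it before creating deployments."
        )
    return key
```

Calling require_api_key() at the top of a deployment script fails early with a clear message instead of partway through an operation.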

Supported Models

Text Models

The following models are available for dedicated deployment:
Model Name | Model ID | Provider | Parameters | Context Length
Qwen: Qwen2.5-VL-3B-Instruct | qwen2-5-vl-3b-instruct | Qwen | 3B | 32,768
Qwen: Qwen3-4B-Instruct-2507 | qwen3-4b-instruct-2507 | Qwen | 4B | 262,144
Qwen: Qwen3-4B-Thinking-2507 | qwen3-4b-thinking-2507 | Qwen | 4B | 262,144
DeepSeek: DeepSeek-R1-Distill-Qwen-1.5B | deepseek-r1-distill-qwen-1.5b | DeepSeek | 1.5B | 32,768
Qwen: Qwen3-4B | qwen3-4b | Qwen | 4B | 32,768
Qwen: Qwen3-1.7B | qwen3-1.7b | Qwen | 1.7B | 32,768
Qwen: Qwen3-0.6B | qwen3-0.6b | Qwen | 0.6B | 32,768
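
When choosing a model, it can help to check a request against the context length listed above. The sketch below copies the model IDs and context lengths from the table into a Python dictionary. The function fits_context is a hypothetical helper, and it assumes that the context length covers prompt tokens and generated tokens together, which this page does not state.

```python
# Context lengths in tokens, copied from the supported-models table.
CONTEXT_LENGTH = {
    "qwen2-5-vl-3b-instruct": 32_768,
    "qwen3-4b-instruct-2507": 262_144,
    "qwen3-4b-thinking-2507": 262_144,
    "deepseek-r1-distill-qwen-1.5b": 32_768,
    "qwen3-4b": 32_768,
    "qwen3-1.7b": 32_768,
    "qwen3-0.6b": 32_768,
}


def fits_context(model_id, prompt_tokens, max_new_tokens=0):
    """Return True if the prompt plus requested output fits the model's context.

    ASSUMPTION: the listed context length is a combined budget for input
    and output tokens.
    """
    if model_id not in CONTEXT_LENGTH:
        raise ValueError(f"Unsupported model ID: {model_id}")
    return prompt_tokens + max_new_tokens <= CONTEXT_LENGTH[model_id]
```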

Available Hardware

Currently supported GPU configurations:
Accelerator | GPU Model | Memory | Pricing
NVIDIA T4 | NVIDIA_T4_16GB | 16GB | $0.39/hour
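
For rough budgeting, the hourly price can be turned into a cost estimate. The Python sketch below uses the $0.39/hour figure from the table. It assumes that the price applies per GPU and per replica and scales linearly with running time; this page does not confirm the billing model, so treat the result as an estimate only. The function name estimate_cost is hypothetical.

```python
T4_HOURLY_USD = 0.39  # NVIDIA_T4_16GB price from the hardware table


def estimate_cost(hours, gpu_count=1, replicas=1, hourly_rate=T4_HOURLY_USD):
    """Estimate the cost in USD of running a deployment for `hours` hours.

    ASSUMPTION: billing is linear in hours, GPUs per replica, and replicas.
    """
    if hours < 0 or gpu_count < 1 or replicas < 1:
        raise ValueError("hours must be >= 0; gpu_count and replicas must be >= 1")
    return round(hours * gpu_count * replicas * hourly_rate, 2)
```

Under these assumptions, a single T4 replica running for 24 hours comes to 24 × 0.39 = 9.36 USD.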

GPU Count Validation

The --gpu_count parameter accepts only the values 1, 2, 4, and 8. If you provide any other value, you'll receive an error:
❌ Error: GPU count must be one of: 1, 2, 4, 8. You provided: 3
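
A script that builds deployment commands can apply the same rule before calling the service. The following Python sketch mirrors the check and the error text shown above. It is an illustration, not the actual CLI implementation, and the name validate_gpu_count is hypothetical.

```python
ALLOWED_GPU_COUNTS = (1, 2, 4, 8)  # values accepted by --gpu_count


def validate_gpu_count(gpu_count):
    """Return gpu_count unchanged if it is allowed, otherwise raise ValueError."""
    if gpu_count not in ALLOWED_GPU_COUNTS:
        allowed = ", ".join(str(n) for n in ALLOWED_GPU_COUNTS)
        raise ValueError(
            f"GPU count must be one of: {allowed}. You provided: {gpu_count}"
        )
    return gpu_count
```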