Public Preview: Gravix Layer is currently in public preview. Features are experimental and may change or break as API endpoints and models continue to receive updates.
Overview
Dedicated deployments provide:
- Guaranteed Capacity: Reserved compute resources for your workloads
- Consistent Performance: Dedicated GPUs only (shared GPU support coming soon)
- Custom Scaling: Configure replicas based on your needs
Prerequisites
Before deploying models, you need to set up your API key.

API Key Required: You must export your GravixLayer API key in your terminal before creating deployments. All deployment operations are tied to your API key and account. Export commands for each supported platform are listed below:
- Windows (CMD)
- Windows (PowerShell)
- Linux/macOS
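The commands below are a minimal sketch of exporting the key on each platform. The environment variable name GRAVIXLAYER_API_KEY and the placeholder value are assumptions; substitute the variable name and key from your Gravix Layer account.

```cmd
:: Windows (CMD) - GRAVIXLAYER_API_KEY is an assumed variable name
set GRAVIXLAYER_API_KEY=your_api_key_here
```

```powershell
# Windows (PowerShell) - GRAVIXLAYER_API_KEY is an assumed variable name
$env:GRAVIXLAYER_API_KEY = "your_api_key_here"
```

```bash
# Linux/macOS - GRAVIXLAYER_API_KEY is an assumed variable name
export GRAVIXLAYER_API_KEY="your_api_key_here"
```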
Supported Models
Text Models
The following models are available for dedicated deployment:

| Model Name | Model ID | Provider | Parameters | Context Length (tokens) |
|---|---|---|---|---|
| Qwen: Qwen2.5-VL-3B-Instruct | qwen2-5-vl-3b-instruct | Qwen | 3B | 32,768 |
| Qwen: Qwen3-4B-Instruct-2507 | qwen3-4b-instruct-2507 | Qwen | 4B | 262,144 |
| Qwen: Qwen3-4B-Thinking-2507 | qwen3-4b-thinking-2507 | Qwen | 4B | 262,144 |
| DeepSeek: DeepSeek-R1-Distill-Qwen-1.5B | deepseek-r1-distill-qwen-1.5b | DeepSeek | 1.5B | 32,768 |
| Qwen: Qwen3-4B | qwen3-4b | Qwen | 4B | 32,768 |
| Qwen: Qwen3-1.7B | qwen3-1.7b | Qwen | 1.7B | 32,768 |
| Qwen: Qwen3-0.6B | qwen3-0.6b | Qwen | 0.6B | 32,768 |
Available Hardware
Currently supported GPU configurations:

| Accelerator | GPU Model | Memory | Pricing |
|---|---|---|---|
| NVIDIA T4 | NVIDIA_T4_16GB | 16GB | $0.39/hour |
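As a rough estimate based on the table above, a deployment on a single NVIDIA T4 running continuously costs about $0.39 × 24 ≈ $9.36 per day; assuming billing is per GPU, a deployment created with --gpu_count 2 would be roughly double that.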
GPU Count Validation
The `--gpu_count` parameter only accepts the following values: 1, 2, 4, 8
If you provide any other value, the deployment request is rejected with a validation error.
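As a minimal sketch (the deployment command itself and the exact error text are not reproduced here), you can check the value client-side before submitting a deployment; the variable name below is hypothetical:

```bash
# Hypothetical pre-flight check: allow only the documented --gpu_count values.
GPU_COUNT=3   # example value; replace with the count you intend to request

case "$GPU_COUNT" in
  1|2|4|8)
    echo "gpu_count=$GPU_COUNT is valid"
    ;;
  *)
    echo "error: --gpu_count must be one of 1, 2, 4, 8 (got $GPU_COUNT)" >&2
    exit 1
    ;;
esac
```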

