Check Available Hardware
First, list the available GPU options using one of the following interfaces:
- CLI
- Python SDK
- JavaScript SDK
Create a Deployment
Create a new dedicated deployment using one of the following interfaces:
- CLI
- Python SDK
- JavaScript SDK
Basic deployment:
With additional parameters:
Example Output:
Deployment Parameters
Parameter breakdown for readability:

| Parameter | Value | Description |
|---|---|---|
| `--deployment_name` | `"test_model"` | Unique name for your deployment |
| `--model_name` | `"qwen3-1.7b"` | Model to deploy |
| `--gpu_model` | `"NVIDIA_T4_16GB"` | GPU type |
| `--gpu_count` | `1` | Number of GPUs |
| `--min_replicas` | `1` | Minimum number of replicas |
| `--max_replicas` | `3` | Maximum number of replicas |
| `--wait` | (flag) | Wait for deployment to be ready |
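As a rough illustration of how the values in the table map onto a command line, the sketch below assembles them into an argument list. The executable name and subcommand in `CLI_PREFIX` are hypothetical placeholders (the text above does not name them), and passing each value as a separate `--flag value` pair is also an assumption.

```python
from typing import Dict, List, Sequence, Union

# Hypothetical executable and subcommand; replace with the real CLI entry point.
CLI_PREFIX = ["my-cli", "deployment", "create"]


def build_create_args(
    params: Dict[str, Union[str, int]],
    flags: Sequence[str] = (),
) -> List[str]:
    """Flatten a parameter mapping into a CLI argument list."""
    args = list(CLI_PREFIX)
    for name, value in params.items():
        # Assumed convention: "--name value" as two separate arguments.
        args += [f"--{name}", str(value)]
    # Boolean switches such as --wait carry no value.
    args += [f"--{flag}" for flag in flags]
    return args


example_args = build_create_args(
    {
        "deployment_name": "test_model",
        "model_name": "qwen3-1.7b",
        "gpu_model": "NVIDIA_T4_16GB",
        "gpu_count": 1,
        "min_replicas": 1,
        "max_replicas": 3,
    },
    flags=("wait",),
)
print(" ".join(example_args))
```

A list like this can be handed to `subprocess.run` without shell quoting concerns, which is why the sketch builds a list rather than a single string.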
Advanced Creation Options
- CLI
- Python SDK
- JavaScript SDK
With Auto-Retry:
With Wait Flag (Recommended):
Create Parameters Reference
| Parameter | Type | Required | Description |
|---|---|---|---|
| `--deployment_name` | string | Yes | Unique name for the deployment |
| `--model_name` | string | Yes | Model name to deploy |
| `--gpu_model` | string | Yes | GPU model (e.g., `NVIDIA_T4_16GB`) |
| `--gpu_count` | int | No | Number of GPUs (supported: 1, 2, 4, 8) |
| `--min_replicas` | int | No | Minimum replicas (default: 1) |
| `--max_replicas` | int | No | Maximum replicas (default: 1) |
| `--hw_type` | string | No | Hardware type (default: `dedicated`) |
| `--auto-retry` | flag | No | Auto-retry with unique name if name exists |
| `--wait` | flag | No | Wait for deployment to be ready before exiting |
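The constraints in the reference table can be checked client-side before a request is sent. The following is a minimal sketch, not the tool's own validation: the `CreateParams` class is hypothetical, the default of 1 for `gpu_count` is an assumption (the table lists no default), and the rule that `min_replicas` must not exceed `max_replicas` is inferred rather than documented.

```python
from dataclasses import dataclass

# Supported GPU counts as listed in the reference table.
SUPPORTED_GPU_COUNTS = (1, 2, 4, 8)


@dataclass
class CreateParams:
    """Hypothetical container mirroring the documented create parameters."""

    deployment_name: str
    model_name: str
    gpu_model: str
    gpu_count: int = 1  # assumed default; not stated in the table
    min_replicas: int = 1
    max_replicas: int = 1
    hw_type: str = "dedicated"
    auto_retry: bool = False
    wait: bool = False

    def validate(self) -> None:
        """Raise ValueError if a documented (or assumed) constraint is violated."""
        for field_name in ("deployment_name", "model_name", "gpu_model"):
            if not getattr(self, field_name):
                raise ValueError(f"{field_name} is required")
        if self.gpu_count not in SUPPORTED_GPU_COUNTS:
            raise ValueError(
                f"gpu_count must be one of {SUPPORTED_GPU_COUNTS}, got {self.gpu_count}"
            )
        # Assumption: the replica range must be non-empty.
        if self.min_replicas > self.max_replicas:
            raise ValueError("min_replicas must not exceed max_replicas")


params = CreateParams("test_model", "qwen3-1.7b", "NVIDIA_T4_16GB", max_replicas=3)
params.validate()  # passes silently for the values used on this page
```

Failing early on an unsupported `gpu_count` such as 3 avoids waiting on a request that the service would presumably reject.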
Auto-Retry Feature
Use the `--auto-retry` flag to automatically generate a unique deployment name if the specified name already exists. For example, the deployment may be created with a name like `my-model-1234abcd` if `my-model` already exists.
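The renaming behaviour can be pictured with a short sketch. This is not the tool's actual implementation: the function `unique_deployment_name` is hypothetical, and the eight-character hexadecimal suffix is an assumption modelled on the `my-model-1234abcd` example.

```python
import uuid
from typing import Set


def unique_deployment_name(base: str, existing: Set[str]) -> str:
    """Return `base` if it is free, otherwise `base` plus a random hex suffix."""
    if base not in existing:
        return base
    while True:
        # Assumed format: 8 hex characters, as in "my-model-1234abcd".
        candidate = f"{base}-{uuid.uuid4().hex[:8]}"
        if candidate not in existing:
            return candidate


taken = {"my-model"}
print(unique_deployment_name("my-model", taken))   # a suffixed name; suffix varies per run
print(unique_deployment_name("other-model", taken))  # name is free, returned unchanged
```

Because the final name may differ from the requested one, any later command that refers to the deployment should use the name reported at creation time.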
Wait for Deployment
Use the `--wait` flag to monitor the deployment status and block until the deployment is ready.
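Conceptually, waiting amounts to polling the deployment status until it reports readiness or a time limit passes. The sketch below shows that pattern under stated assumptions: `get_status` is a hypothetical callable supplied by the caller, the `"ready"` status string is a guess, and the timeout and polling interval are illustrative values rather than the tool's defaults.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    poll_interval_s: float = 5.0,
    ready_status: str = "ready",  # assumed status value
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `get_status` until it returns `ready_status` or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == ready_status:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)


# Usage with a stand-in status source that becomes ready on the third poll.
fake_statuses = iter(["pending", "starting", "ready"])
ok = wait_until_ready(lambda: next(fake_statuses), sleep=lambda _seconds: None)
print(ok)
```

Using `time.monotonic` for the deadline keeps the timeout unaffected by system clock adjustments, and the injectable `sleep` makes the loop easy to exercise without real delays.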

