Integrate with your deployed AI models through the CLI, the Python and JavaScript SDKs, or direct API endpoints.

Test Your Deployment

Use your deployment for inference:
gravixlayer chat --model "test_model" --user "Hello, how are you?"
Example Output:
Hello! I'm doing well, thank you for asking. How can I assist you today?
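
The same smoke test via the Python SDK might look like the following minimal sketch. It reuses the client.chat.completions.create call shown in the batch example further down this page, with test_model as the deployment name:

from gravixlayer import GravixLayer

client = GravixLayer()

# Send a single test message to the deployment
response = client.chat.completions.create(
    model="test_model",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)

print(response.choices[0].message.content)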

Chat Completions

Once your deployment is running, you can use it like any other model by referencing the deployment name:
Basic Chat:
gravixlayer chat --model "test_model" --user "Hello, how are you?"
Streaming Chat:
gravixlayer chat --model "test_model" --user "Tell me a story" --stream
With System Message:
gravixlayer chat --model "test_model" --system "You are a helpful assistant" --user "Explain quantum computing"
Text Completion Mode:
gravixlayer chat --mode completions --model "test_model" --prompt "The future of AI is"
Streaming Completion:
gravixlayer chat --mode completions --model "test_model" --prompt "Write a poem about" --stream
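
The system-message and streaming patterns via the Python SDK might look like the sketch below. The client.chat.completions.create call matches the batch example later on this page; the stream=True flag and delta-style chunks follow the common OpenAI-compatible convention and are assumptions here, not confirmed API:

from gravixlayer import GravixLayer

client = GravixLayer()

# Chat with a system message
response = client.chat.completions.create(
    model="test_model",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain quantum computing"},
    ],
)
print(response.choices[0].message.content)

# Streaming chat (assumes OpenAI-style stream=True and delta chunks)
stream = client.chat.completions.create(
    model="test_model",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)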

Advanced Usage Examples

Batch Processing

Python SDK:
from gravixlayer import GravixLayer

client = GravixLayer()

# Process multiple prompts with your deployment
prompts = [
    "Explain machine learning",
    "What is artificial intelligence?",
    "How do neural networks work?",
    "Describe deep learning"
]

responses = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="test_model",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=100
    )
    responses.append({
        "prompt": prompt,
        "response": response.choices[0].message.content
    })

for item in responses:
    print(f"Q: {item['prompt']}")
    print(f"A: {item['response']}")
    print("---")

Performance Monitoring

Python SDK:
import time
from gravixlayer import GravixLayer

client = GravixLayer()

def benchmark_deployment(model_name, test_prompts, iterations=3):
    """Benchmark deployment performance"""
    results = []
    
    for prompt in test_prompts:
        prompt_results = []
        
        for i in range(iterations):
            start_time = time.time()
            
            response = client.chat.completions.create(
                model=model_name,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=50
            )
            
            end_time = time.time()
            response_time = (end_time - start_time) * 1000  # Convert to ms
            
            prompt_results.append({
                "iteration": i + 1,
                "response_time_ms": response_time,
                "response_length": len(response.choices[0].message.content)
            })
        
        avg_time = sum(r["response_time_ms"] for r in prompt_results) / len(prompt_results)
        
        results.append({
            "prompt": prompt,
            "avg_response_time_ms": avg_time,
            "iterations": prompt_results
        })
    
    return results

# Benchmark your deployment
test_prompts = [
    "Hello, how are you?",
    "Explain AI in one sentence",
    "What is the weather like?"
]

benchmark_results = benchmark_deployment("test_model", test_prompts)

for result in benchmark_results:
    print(f"Prompt: {result['prompt']}")
    print(f"Average response time: {result['avg_response_time_ms']:.2f}ms")
    print("---")

Troubleshooting

Common Issues

Deployment Stuck in “Creating” Status:
  • Wait 5-10 minutes for initialization
  • Check hardware availability with gravixlayer deployments gpu --list
  • Verify model name is correct
Connection Errors:
  • Ensure deployment status is “running” before making requests (see the retry sketch after this section)
  • Verify deployment name matches exactly
  • Check API key configuration
Performance Issues:
  • Monitor deployment status and resource usage
  • Consider scaling up replicas for higher throughput
  • Check if model size matches your use case
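
For transient connection errors, a simple retry with exponential backoff is often enough. The sketch below catches a broad Exception because the SDK's specific exception types aren't documented here; narrow it to the real error classes in your setup:

import time
from gravixlayer import GravixLayer

client = GravixLayer()

def chat_with_retry(prompt, retries=3, backoff=2.0):
    """Retry a chat completion on transient failures with exponential backoff."""
    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model="test_model",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception:  # Replace with the SDK's specific error types
            if attempt == retries - 1:
                raise
            time.sleep(backoff ** attempt)

print(chat_with_retry("Hello, how are you?"))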