This guide covers the fundamental features of the GravixLayer SDKs, including setup, authentication, and key concepts.

Features Supported

The GravixLayer SDKs currently support:
  • Chat Completions - Text generation and conversation handling
  • Completions - Text completion and generation endpoint (prompt-based)
  • Embeddings API - Text embeddings for semantic search and similarity
  • Streaming Responses - Real-time response streaming for both chat and completions
  • Asynchronous Support - Non-blocking operations with async/await
  • Function Calling - Tool integration and function calling capabilities
  • CLI Interface - Command-line interface for all endpoints
  • Deployment Management - Create, manage, and deploy dedicated model instances
  • Logging - Built-in logging capabilities
  • HTTP Support - Plain-HTTP requests for local development

GravixLayer Client

Working with the GravixLayer API is as simple as instantiating the client and making requests:
Python SDK:
import os
from gravixlayer import GravixLayer

# Make sure to set your API key in the terminal before running this script:
# export GRAVIXLAYER_API_KEY=<API_KEY>

client = GravixLayer()

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello, world!"}]
)

print(response.choices[0].message.content)
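
The client reads GRAVIXLAYER_API_KEY from the environment by default. If you would rather pass the key explicitly, the sketch below assumes the constructor accepts an api_key keyword, mirroring OpenAI-style clients; check the SDK reference to confirm.

import os
from gravixlayer import GravixLayer

# Assumed `api_key` keyword (OpenAI-style); verify against the SDK reference.
client = GravixLayer(api_key=os.environ["GRAVIXLAYER_API_KEY"])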

Completions

For simple prompt-based text generation, use the completions endpoint:
Python SDK:
import os
from gravixlayer import GravixLayer

client = GravixLayer()

# Use the completions endpoint (prompt-based)
completion = client.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    prompt="What are the three most popular programming languages?",
    max_tokens=150,
    temperature=0.7,
)

# Print the generated text
print(completion.choices[0].text.strip())
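
The endpoint is stateless, so one client can serve a whole batch of prompts. A minimal sketch, reusing the client from above and only the call shown here:

prompts = [
    "Summarize the benefits of unit testing.",
    "List three common uses for text embeddings.",
]

for prompt in prompts:
    completion = client.completions.create(
        model="meta-llama/llama-3.1-8b-instruct",
        prompt=prompt,
        max_tokens=100,
    )
    print(completion.choices[0].text.strip())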

Streaming Completions

Python SDK:
# Streaming text completion
stream = client.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    prompt="Write a poem about the ocean",
    max_tokens=200,
    temperature=0.8,
    stream=True
)

for chunk in stream:
    if chunk.choices[0].text is not None:
        print(chunk.choices[0].text, end="", flush=True)
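
If you also need the complete text once streaming finishes (for logging or storage, say), collect the fragments as they arrive. A small sketch using the same chunk shape as above:

pieces = []
stream = client.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    prompt="Write a poem about the ocean",
    max_tokens=200,
    stream=True
)

for chunk in stream:
    if chunk.choices[0].text is not None:
        print(chunk.choices[0].text, end="", flush=True)
        pieces.append(chunk.choices[0].text)  # keep a copy of each fragment

full_text = "".join(pieces)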

Chat Completions

You can customize various parameters to take full advantage of the available models:
Python SDK:
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing"}
    ],
    temperature=0.7,
    max_tokens=200,
    stream=False
)
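
The response object has the same shape as in the first example, so a multi-turn exchange is just a matter of appending the assistant's reply to the message list and calling the endpoint again. A minimal sketch:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing"}
]

first = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=messages
)

# Carry the assistant's answer forward so the follow-up has context
messages.append({"role": "assistant", "content": first.choices[0].message.content})
messages.append({"role": "user", "content": "Now explain it to a five-year-old."})

follow_up = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=messages
)
print(follow_up.choices[0].message.content)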

Streaming Responses

For real-time applications, you can stream responses as they’re generated:
Python SDK:
stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
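
In larger applications it can help to hide the chunk bookkeeping behind a small helper. The generator below is a convenience sketch over the documented streaming call, not part of the SDK:

def stream_chat(client, model, messages):
    """Yield text fragments from a streaming chat completion."""
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta is not None:
            yield delta

for piece in stream_chat(client, "meta-llama/llama-3.1-8b-instruct",
                         [{"role": "user", "content": "Tell me a story"}]):
    print(piece, end="", flush=True)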

Async Support

The SDKs provide full async support for high-performance applications:
Python SDK:
import asyncio
from gravixlayer import AsyncGravixLayer
import os

# Make sure to set your API key in the terminal before running this script:
# export GRAVIXLAYER_API_KEY=<API_KEY>

async def main():
    client = AsyncGravixLayer()
    
    # Chat completions
    response = await client.chat.completions.create(
        model="meta-llama/llama-3.1-8b-instruct",
        messages=[{"role": "user", "content": "Hello, async world!"}]
    )
    
    print(response.choices[0].message.content)
    
    # Text completions
    completion = await client.completions.create(
        model="meta-llama/llama-3.1-8b-instruct",
        prompt="The future of AI is",
        max_tokens=100
    )
    
    print(completion.choices[0].text.strip())

asyncio.run(main())
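
Because the async client is non-blocking, independent requests can run concurrently instead of one after another. A sketch using asyncio.gather and only the calls shown above:

import asyncio
from gravixlayer import AsyncGravixLayer

async def ask(client, question):
    response = await client.chat.completions.create(
        model="meta-llama/llama-3.1-8b-instruct",
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content

async def main():
    client = AsyncGravixLayer()
    # Fire both requests at once and wait for both results
    answers = await asyncio.gather(
        ask(client, "What is an embedding?"),
        ask(client, "What is temperature sampling?"),
    )
    for answer in answers:
        print(answer)

asyncio.run(main())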

CLI Interface

The GravixLayer SDK includes a powerful command-line interface for quick testing and automation:
CLI Structure: The CLI uses subcommands for different operations:
  • gravixlayer chat - For chat completions, text completions, and embeddings
  • gravixlayer deployments - For deployment management
  • gravixlayer files - For file management

Chat Completions via CLI

# Basic chat completion
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --user "Hello, how are you?"

# Streaming chat completion
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --user "Tell me a story" --stream

# Chat with system message
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --system "You are a helpful assistant" --user "Explain AI"

Text Completions via CLI

# Text completion
gravixlayer chat --mode completions --model "meta-llama/llama-3.1-8b-instruct" --prompt "The future of AI is"

# Streaming text completion  
gravixlayer chat --mode completions --model "meta-llama/llama-3.1-8b-instruct" --prompt "Write a poem" --stream

# Completion with custom parameters
gravixlayer chat --mode completions --model "meta-llama/llama-3.1-8b-instruct" --prompt "Explain machine learning" --max-tokens 200 --temperature 0.8

Embeddings via CLI

# Generate embeddings
gravixlayer chat --mode embeddings --model "text-embedding-ada-002" --text "Hello world"

# Multiple texts
gravixlayer chat --mode embeddings --model "text-embedding-ada-002" --text "Text 1" --text "Text 2"

Deployments via CLI

The CLI also supports deployment management for dedicated model instances:
# List available hardware options
gravixlayer deployments gpu --list

# Create a new deployment
gravixlayer deployments create --deployment_name "my-model" --model_name "qwen3-1.7b" --gpu_model "NVIDIA_T4_16GB" --gpu_count 1 --min_replicas 1 --hw_type "dedicated"

# List all deployments
gravixlayer deployments list

# Delete a deployment
gravixlayer deployments delete <deployment_id>

# Get hardware info as JSON
gravixlayer deployments gpu --list --json

CLI Options

Main Commands:
  • chat: Chat and completion operations
  • deployments: Deployment management
  • files: File management
Chat Command Options:
  • --mode: Operation mode (chat, completions, embeddings)
  • --model: Model to use
  • --user: User message for chat mode
  • --prompt: Prompt for completions mode
  • --text: Text for embeddings mode
  • --system: System message for chat mode
  • --stream: Enable streaming output
  • --max-tokens: Maximum tokens to generate
  • --temperature: Sampling temperature (0.0-2.0)
  • --top-p: Nucleus sampling parameter
  • --frequency-penalty: Frequency penalty (-2.0 to 2.0)
  • --presence-penalty: Presence penalty (-2.0 to 2.0)
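
These options can be combined in a single invocation; for example, the sampling flags from the list above (flag names as documented, values illustrative):

# Combine sampling options in one chat call
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --user "Summarize HTTP/2 in two sentences" --temperature 0.3 --top-p 0.9 --max-tokens 120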

Key Concepts

Models

GravixLayer provides access to various language models. Specify the model using the model identifier:
Python SDK:
# Different models for different use cases
models = [
    "meta-llama/llama-3.1-8b-instruct",  # General-purpose 8B instruct model
    "deepseek-r1:1.5b",                  # Compact 1.5B reasoning model
]

for model in models:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(f"{model}: {response.choices[0].message.content}")

Message Format (Chat Completions)

Messages follow the OpenAI chat format:
Python SDK:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather like?"},
    {"role": "assistant", "content": "I don't have access to real-time weather data."},
    {"role": "user", "content": "How can I check the weather?"}
]

Prompt Format (Completions)

For the completions endpoint, use a simple string prompt:
Python SDK:
prompt = "Complete this sentence: The most important skill for a programmer is"

Temperature and Parameters

Control model behavior with various parameters:
Python SDK:
# More deterministic responses (Chat)
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    temperature=0.1,  # Low temperature for factual responses
    max_tokens=50
)

# More creative responses (Completions)
completion = client.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    prompt="Write a creative story about",
    temperature=0.9,  # High temperature for creativity
    max_tokens=200
)