Basics of the GravixLayer SDK
Features Supported
The GravixLayer Python SDK currently supports:
- Chat Completions - Text generation and conversation handling
- Completions - Text completion and generation endpoint (prompt-based)
- Embeddings API - Text embeddings for semantic search and similarity
- Streaming Responses - Real-time response streaming for both chat and completions
- Asynchronous Support - Non-blocking operations with async/await
- Function Calling - Tool integration and function calling capabilities
- CLI Interface - Command-line interface for all endpoints
- Deployment Management - Create, manage, and deploy dedicated model instances
- Logging - Built-in logging capabilities
- HTTP Support - Plain HTTP requests for local development
The GravixLayer Client
Working with the GravixLayer API is as simple as instantiating the client and making requests:
from gravixlayer import GravixLayer

# Make sure to set your API key in the terminal before running this script:
# export GRAVIXLAYER_API_KEY=<API_KEY>
client = GravixLayer()

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello, world!"}]
)
print(response.choices[0].message.content)
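If you prefer not to depend on the client picking up the environment variable implicitly, you can pass the key explicitly. The api_key constructor argument below is an assumption based on the SDK's OpenAI-style interface; verify it against your installed version:

import os
from gravixlayer import GravixLayer

# api_key is an assumed constructor argument (OpenAI-style clients accept it);
# check the GravixLayer constructor signature in your SDK version
client = GravixLayer(api_key=os.environ["GRAVIXLAYER_API_KEY"])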
Completions Endpoint
For simple prompt-based text generation, use the completions endpoint:
from gravixlayer import GravixLayer

client = GravixLayer()

# Use the completions endpoint (prompt-based)
completion = client.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    prompt="What are the three most popular programming languages?",
    max_tokens=150,
    temperature=0.7,
)

# Print the generated text
print(completion.choices[0].text.strip())
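If the response object follows the usual OpenAI-style schema, it also reports token accounting. The usage field below is an assumption, not something this guide confirms:

# usage is an assumed OpenAI-style field; guard in case it is absent
usage = getattr(completion, "usage", None)
if usage is not None:
    print(f"Prompt tokens: {usage.prompt_tokens}")
    print(f"Completion tokens: {usage.completion_tokens}")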
Streaming Completions
The completions endpoint can also stream text as it is generated:
# Streaming text completion
stream = client.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    prompt="Write a poem about the ocean",
    max_tokens=200,
    temperature=0.8,
    stream=True
)

for chunk in stream:
    if chunk.choices[0].text is not None:
        print(chunk.choices[0].text, end="", flush=True)
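A stream can only be consumed once, so if you need the complete text after streaming finishes, collect the fragments as they arrive. This sketch reuses only the calls shown above:

# Create a fresh stream and assemble the fragments into the full text
stream = client.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    prompt="Write a poem about the ocean",
    max_tokens=200,
    stream=True
)

parts = []
for chunk in stream:
    if chunk.choices[0].text is not None:
        parts.append(chunk.choices[0].text)

full_text = "".join(parts)
print(full_text)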
Chat Completions
You can customize various parameters to take full advantage of the available models:
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing"}
    ],
    temperature=0.7,
    max_tokens=200,
    stream=False
)
print(response.choices[0].message.content)
Streaming Responses
For real-time applications, you can stream responses as they're generated:
stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
Async Support
The SDK provides full async support for high-performance applications:
import asyncio
from gravixlayer import AsyncGravixLayer

# Make sure to set your API key in the terminal before running this script:
# export GRAVIXLAYER_API_KEY=<API_KEY>

async def main():
    client = AsyncGravixLayer()

    # Chat completions
    response = await client.chat.completions.create(
        model="meta-llama/llama-3.1-8b-instruct",
        messages=[{"role": "user", "content": "Hello, async world!"}]
    )
    print(response.choices[0].message.content)

    # Text completions
    completion = await client.completions.create(
        model="meta-llama/llama-3.1-8b-instruct",
        prompt="The future of AI is",
        max_tokens=100
    )
    print(completion.choices[0].text.strip())

asyncio.run(main())
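The async client pays off when you issue several requests concurrently. A minimal sketch built from the same chat call shown above plus asyncio.gather from the standard library:

import asyncio
from gravixlayer import AsyncGravixLayer

async def ask(client, question):
    response = await client.chat.completions.create(
        model="meta-llama/llama-3.1-8b-instruct",
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content

async def main():
    client = AsyncGravixLayer()
    questions = ["What is Python?", "What is Rust?", "What is Go?"]
    # Fire all requests at once instead of awaiting them one by one
    answers = await asyncio.gather(*(ask(client, q) for q in questions))
    for question, answer in zip(questions, answers):
        print(f"{question}\n{answer}\n")

asyncio.run(main())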
CLI Interface
The GravixLayer SDK includes a powerful command-line interface for quick testing and automation:
Chat Completions via CLI
# Basic chat completion
gravixlayer --model "meta-llama/llama-3.1-8b-instruct" --user "Hello, how are you?"
# Streaming chat completion
gravixlayer --model "meta-llama/llama-3.1-8b-instruct" --user "Tell me a story" --stream
# Chat with system message
gravixlayer --model "meta-llama/llama-3.1-8b-instruct" --system "You are a helpful assistant" --user "Explain AI"
Text Completions via CLI
# Text completion
gravixlayer --mode completions --model "meta-llama/llama-3.1-8b-instruct" --prompt "The future of AI is"
# Streaming text completion
gravixlayer --mode completions --model "meta-llama/llama-3.1-8b-instruct" --prompt "Write a poem" --stream
# Completion with custom parameters
gravixlayer --mode completions --model "meta-llama/llama-3.1-8b-instruct" --prompt "Explain machine learning" --max-tokens 200 --temperature 0.8
Embeddings via CLI
# Generate embeddings
gravixlayer --mode embeddings --model "text-embedding-ada-002" --text "Hello world"
# Multiple texts
gravixlayer --mode embeddings --model "text-embedding-ada-002" --text "Text 1" --text "Text 2"
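Embeddings are also available from the Python SDK. The client.embeddings.create call and the response layout below are assumptions based on the SDK's OpenAI-style surface; verify them against your installed version:

from gravixlayer import GravixLayer

client = GravixLayer()

# Assumed OpenAI-style embeddings call; the method name, "input" parameter,
# and response.data[0].embedding layout are unverified assumptions
response = client.embeddings.create(
    model="text-embedding-ada-002",
    input="Hello world"
)
vector = response.data[0].embedding
print(len(vector))  # Dimensionality of the embedding vector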
Deployment Management via CLI
The CLI also supports deployment management for dedicated model instances:
# List available hardware options
gravixlayer deployments gpu --list
# Create a new deployment
gravixlayer deployments create --deployment_name "my-model" --hw_type "dedicated" --hardware "nvidia-t4-16gb-pcie_1" --min_replicas 1 --model_name "qwen3-1.7b"
# List all deployments
gravixlayer deployments list
# Delete a deployment
gravixlayer deployments delete <deployment_id>
# Get hardware info as JSON
gravixlayer deployments gpu --list --json
CLI Options
- --mode: Operation mode (chat, completions, embeddings)
- --model: Model to use
- --message: Message for chat mode
- --prompt: Prompt for completions mode
- --text: Text for embeddings mode
- --system: System message for chat mode
- --stream: Enable streaming output
- --max-tokens: Maximum tokens to generate
- --temperature: Sampling temperature (0.0-2.0)
- --top-p: Nucleus sampling parameter
- --frequency-penalty: Frequency penalty (-2.0 to 2.0)
- --presence-penalty: Presence penalty (-2.0 to 2.0)
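These options can be combined in a single invocation. A sketch of a streamed, parameter-tuned completion using only the flags listed above:

# Streamed completion with tuned sampling parameters
gravixlayer --mode completions \
  --model "meta-llama/llama-3.1-8b-instruct" \
  --prompt "Summarize the benefits of unit testing" \
  --max-tokens 150 --temperature 0.7 --top-p 0.9 --stream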
Key Concepts
Models
GravixLayer provides access to various language models. Specify the model using the model identifier:
# Different models for different use cases
models = [
    "meta-llama/llama-3.1-8b-instruct",  # General-purpose instruction-tuned model
    "deepseek-r1:1.5b",                  # Smaller, reasoning-focused model
]

for model in models:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(f"{model}: {response.choices[0].message.content}")
Message Format (Chat Completions)
Messages follow the OpenAI chat format:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather like?"},
    {"role": "assistant", "content": "I don't have access to real-time weather data."},
    {"role": "user", "content": "How can I check the weather?"}
]
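Because each request is stateless, you carry the conversation yourself: append the assistant's reply to the history before sending the next user turn. A sketch using only the chat call shown earlier:

messages = [{"role": "system", "content": "You are a helpful assistant."}]

for user_input in ["What's the weather like?", "How can I check the weather?"]:
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="meta-llama/llama-3.1-8b-instruct",
        messages=messages
    )
    reply = response.choices[0].message.content
    # Keep the assistant's reply in the history so later turns have context
    messages.append({"role": "assistant", "content": reply})
    print(reply)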
Prompt Format (Completions)
For the completions endpoint, use a simple string prompt:
prompt = "Complete this sentence: The most important skill for a programmer is"
Temperature and Parameters
Control model behavior with various parameters:
# More deterministic responses (Chat)
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    temperature=0.1,  # Low temperature for factual responses
    max_tokens=50
)

# More creative responses (Completions)
completion = client.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    prompt="Write a creative story about",
    temperature=0.9,  # High temperature for creativity
    max_tokens=200
)
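The CLI options above suggest the API also supports nucleus sampling and repetition penalties. Passing them as top_p, frequency_penalty, and presence_penalty keyword arguments is an assumption based on the OpenAI-style names; confirm support in your SDK version:

# Assumed OpenAI-style sampling and penalty keyword arguments
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Suggest five blog post titles"}],
    temperature=0.8,
    top_p=0.9,              # Nucleus sampling (assumed keyword)
    frequency_penalty=0.5,  # Discourage verbatim repetition (assumed keyword)
    presence_penalty=0.3    # Encourage new topics (assumed keyword)
)
print(response.choices[0].message.content)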