This tutorial walks you through the GravixLayer SDKs for Python and JavaScript. You'll learn how to perform chat completions, text completions, embeddings, streaming, function calling, and more.
Public Preview: GravixLayer is currently in public preview. Features are experimental and may change or break as API endpoints and models continue to evolve.
Prerequisites
Before starting, make sure you have:
- Python 3.7+ or Node.js 14+ installed
- A GravixLayer API key
- The GravixLayer SDK installed
Setup
Python SDK
First, install the GravixLayer Python SDK:

```bash
pip install gravixlayer
```

Then set your API key as an environment variable:

```bash
export GRAVIXLAYER_API_KEY="your_api_key_here"
```
Your First Request
Python SDK
Create a new file called main.py:

```python
from gravixlayer import GravixLayer

client = GravixLayer()

completion = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello, world!"}]
)

print(completion.choices[0].message.content)
```
Run the script:

```bash
python main.py
```

You should see a response from the model!
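If you get an authentication error instead, make sure the API key is visible to the process. A quick sanity check you can drop at the top of the script (a minimal sketch, not part of the SDK):

```python
import os

# Fail fast with a clear message if the key is missing from the environment.
if not os.environ.get("GRAVIXLAYER_API_KEY"):
    raise RuntimeError("Set GRAVIXLAYER_API_KEY before running this script.")
```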
Chat Completions
Simple conversation with the AI:
Python SDK
```python
from gravixlayer import GravixLayer

client = GravixLayer()

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Python?"}
    ]
)

print(response.choices[0].message.content)
```
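Responses from OpenAI-compatible APIs typically also carry token usage metadata. Assuming GravixLayer follows that convention (the `usage` field names below are an assumption, so verify them against the API reference), you can inspect it like this:

```python
# Assumption: the response exposes an OpenAI-style `usage` object.
usage = getattr(response, "usage", None)
if usage:
    print("Prompt tokens:", usage.prompt_tokens)
    print("Completion tokens:", usage.completion_tokens)
    print("Total tokens:", usage.total_tokens)
```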
Completions
Simple text completion from a prompt:
Python SDK
```python
from gravixlayer import GravixLayer

client = GravixLayer()

completion = client.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    prompt="The future of artificial intelligence is",
    max_tokens=50
)

print(completion.choices[0].text.strip())
```
Streaming Responses
Get responses in real-time:
Python SDK
```python
from gravixlayer import GravixLayer

client = GravixLayer()

# Chat streaming
stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Tell me a short story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # New line

# Completions streaming
stream = client.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    prompt="Write a poem about",
    max_tokens=100,
    stream=True
)

for chunk in stream:
    if chunk.choices[0].text is not None:
        print(chunk.choices[0].text, end="", flush=True)
print()  # New line
```
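If you also need the complete text once the stream finishes (to log it or append it to a conversation history), accumulate the deltas as they arrive; a minimal sketch:

```python
# Print chunks live while collecting them into one final string.
parts = []
stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Explain streaming in one sentence"}],
    stream=True
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        parts.append(delta)
        print(delta, end="", flush=True)
print()

full_text = "".join(parts)
```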
Embeddings
Generate text embeddings:
Python SDK
```python
import os
import json
from gravixlayer import GravixLayer

client = GravixLayer(
    api_key=os.environ.get("GRAVIXLAYER_API_KEY"),
)

embedding = client.embeddings.create(
    model="meta-llama/llama-3.1-8b-instruct",
    input="Why is the sky blue?",
)

print(json.dumps(embedding.model_dump(), indent=2))
```
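Embeddings are most useful for similarity and search. Continuing from the client above, and assuming the response follows the OpenAI-style shape (a `data` list whose items carry an `embedding` vector, which you can confirm from the JSON dump above), you can compare two texts with cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Assumption: list inputs and the data[i].embedding shape follow
# OpenAI-compatible conventions; verify against the dump above.
resp = client.embeddings.create(
    model="meta-llama/llama-3.1-8b-instruct",
    input=["Why is the sky blue?", "What causes the sky's color?"],
)
v1, v2 = resp.data[0].embedding, resp.data[1].embedding
print(f"Similarity: {cosine_similarity(v1, v2):.3f}")
```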
Function Calling
Let AI call your functions:
Python SDK
```python
import json
import requests
from gravixlayer import GravixLayer

# Define a simple function
def get_weather(latitude, longitude):
    """Get the current temperature for coordinates."""
    url = (
        "https://api.open-meteo.com/v1/forecast"
        f"?latitude={latitude}&longitude={longitude}&current=temperature_2m"
    )
    response = requests.get(url)
    data = response.json()
    return data["current"]["temperature_2m"]

client = GravixLayer()

# Describe the function to the model
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "latitude": {"type": "number"},
                "longitude": {"type": "number"}
            },
            "required": ["latitude", "longitude"]
        }
    }
}]

# Ask the model a question it can answer with the tool
messages = [{"role": "user", "content": "What's the weather in Paris?"}]

completion = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

response_message = completion.choices[0].message
messages.append(response_message)

# Check whether the model wants to call the function
if response_message.tool_calls:
    tool_call = response_message.tool_calls[0]

    # Call the function with the arguments the model supplied
    args = json.loads(tool_call.function.arguments)
    result = get_weather(args["latitude"], args["longitude"])

    # Send the result back to the model
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "name": "get_weather",
        "content": str(result)
    })

    # Get the final response
    final_completion = client.chat.completions.create(
        model="meta-llama/llama-3.1-8b-instruct",
        messages=messages,
        tools=tools
    )
    print(final_completion.choices[0].message.content)
else:
    print(response_message.content)
```
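A model may return several tool calls in one response, and real applications usually expose more than one tool. Here is a sketch of a more general handler using a dispatch table keyed by function name (the loop structure is a generalization of the example above, not GravixLayer-specific behavior):

```python
# Map tool names to local Python callables.
available_tools = {"get_weather": get_weather}

if response_message.tool_calls:
    for tool_call in response_message.tool_calls:
        func = available_tools.get(tool_call.function.name)
        if func is None:
            continue  # Unknown tool name; skip it (or raise, per your needs)
        args = json.loads(tool_call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
            "content": str(func(**args)),
        })
```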
Building a Simple Chatbot
Create an interactive chatbot with conversation memory:
Python SDK
```python
from gravixlayer import GravixLayer

class SimpleChatbot:
    def __init__(self):
        self.client = GravixLayer()
        self.conversation = []

    def chat(self, message):
        # Add the user message to the conversation
        self.conversation.append({"role": "user", "content": message})

        # Prepend the system prompt
        messages = [
            {"role": "system", "content": "You are a helpful assistant."}
        ] + self.conversation

        # Get a response from the model
        completion = self.client.chat.completions.create(
            model="meta-llama/llama-3.1-8b-instruct",
            messages=messages,
            temperature=0.7,
            max_tokens=150
        )

        # Add the assistant response to the conversation
        assistant_message = completion.choices[0].message.content
        self.conversation.append({"role": "assistant", "content": assistant_message})
        return assistant_message

# Use the chatbot
bot = SimpleChatbot()
print(bot.chat("What is Python programming?"))
print(bot.chat("Can you give me an example?"))
```
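To make the chatbot truly interactive, wrap it in a simple read-eval-print loop (a minimal sketch; the quit commands are arbitrary choices):

```python
# Interactive session; type "quit" or "exit" to stop.
bot = SimpleChatbot()
while True:
    user_input = input("You: ").strip()
    if user_input.lower() in ("quit", "exit"):
        break
    if not user_input:
        continue
    print("Bot:", bot.chat(user_input))
```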
Async Operations
Use the async client to run requests concurrently:
Python SDK
```python
import asyncio
from gravixlayer import AsyncGravixLayer

class AsyncChatbot:
    def __init__(self):
        self.client = AsyncGravixLayer()

    async def chat(self, message):
        completion = await self.client.chat.completions.create(
            model="meta-llama/llama-3.1-8b-instruct",
            messages=[{"role": "user", "content": message}],
            temperature=0.7
        )
        return completion.choices[0].message.content

    async def chat_multiple(self, messages):
        """Handle multiple messages concurrently."""
        tasks = [self.chat(msg) for msg in messages]
        return await asyncio.gather(*tasks)

async def main():
    bot = AsyncChatbot()

    # Single request
    response = await bot.chat("What is Python?")
    print(f"Single: {response}")

    # Multiple concurrent requests
    messages = [
        "What is machine learning?",
        "Explain neural networks",
        "What is deep learning?"
    ]
    responses = await bot.chat_multiple(messages)
    for i, response in enumerate(responses):
        print(f"Response {i+1}: {response[:100]}...")

asyncio.run(main())
```
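When sending many requests at once, it is often wise to cap concurrency to stay within rate limits; a sketch using asyncio.Semaphore (the limit of 5 is an arbitrary example value):

```python
import asyncio

async def chat_limited(bot, messages, max_concurrent=5):
    """Run many chats concurrently, but at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def one(msg):
        async with semaphore:
            return await bot.chat(msg)

    return await asyncio.gather(*(one(m) for m in messages))
```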
CLI Usage
Use the command line interface:
Python SDK
CLI Structure: All inference operations now use the chat subcommand: gravixlayer chat [options]
```bash
# Install globally
pip install gravixlayer

# Chat completion
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --user "Hello!"

# Text completion
gravixlayer chat --mode completions --model "meta-llama/llama-3.1-8b-instruct" --prompt "The future of AI is"

# Chat with a system message
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --system "You are a helpful assistant" --user "Explain AI"

# Streaming chat
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --user "Tell a story" --stream

# Streaming completion
gravixlayer chat --mode completions --model "meta-llama/llama-3.1-8b-instruct" --prompt "Write a poem" --stream
```
Deployments
Manage dedicated model deployments:
Python SDK
```python
import os
from gravixlayer import GravixLayer

# Initialize the client
client = GravixLayer(api_key=os.environ.get("GRAVIXLAYER_API_KEY"))

# Create a deployment
deployment = client.deployments.create(
    deployment_name="custom_model",
    model_name="qwen3-1.7b",
    gpu_model="NVIDIA_T4_16GB",
    gpu_count=1,
    min_replicas=1,
    hw_type="dedicated"
)
print(f"Created deployment: {deployment.id}")

# List all deployments
for dep in client.deployments.list():
    print(f"Deployment: {dep.name} - Status: {dep.status}")

# Delete a deployment
client.deployments.delete(deployment_id="your_deployment_id")

# List available hardware
hardware_options = client.deployments.list_hardware()
for hardware in hardware_options:
    print(f"Hardware: {hardware.name} - Memory: {hardware.memory}")

# Get hardware as JSON
hardware_json = client.deployments.list_hardware(format="json")
print(hardware_json)
```
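Deployments take time to provision, so you will usually want to wait until one is ready before routing traffic to it. A minimal polling sketch built on the list() call above (the "running" status string is an assumption; check the actual status values in the dashboard or docs):

```python
import time

def wait_for_deployment(client, deployment_id, timeout=600, interval=10):
    """Poll until the deployment reports a ready status or the timeout elapses."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        for dep in client.deployments.list():
            # Assumption: "running" marks a ready deployment.
            if dep.id == deployment_id and dep.status == "running":
                return dep
        time.sleep(interval)
    raise TimeoutError(f"Deployment {deployment_id} not ready after {timeout}s")
```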
Deployment Benefits
Dedicated deployments provide:
- Guaranteed capacity with no cold starts
- Consistent performance and low latency
- Isolated resources for enterprise workloads
- Custom scaling policies and configurations
Conclusion
This tutorial covered the GravixLayer SDK in both Python and JavaScript:
- Chat Completions - Conversational AI
- Text Completions - Prompt-based text generation
- Embeddings - Text similarity and search
- Streaming - Real-time responses
- Function Calling - AI tool integration
- Chatbot - Interactive conversation with memory
- Async Support - High-performance operations
- Deployment Management - Dedicated model instances
- CLI Interface - Command-line usage
- Memory Management - Mem0-compatible memory system (JavaScript only)
Each example is intentionally simple and can serve as a starting point for your own applications. Both SDKs provide the same functionality with language-specific optimizations. For more advanced features, see the Dedicated Deployments documentation.