Chat

The Gravix Layer Chat API is the primary way to interact with our powerful open-source language models. It's designed to be flexible, supporting everything from simple single-turn Q&A to complex, multi-turn conversations that retain context over time.

This guide will walk you through the essential features of the Chat API.

Supported Models

  • llama3.1:8b-instruct-fp16

Quickstart

Get started in minutes with this minimal example. First, install the OpenAI Python library (pip install openai) and set your Gravix Layer API key in the GRAVIXLAYER_API_KEY environment variable.

This example sends a single question to the recommended Meta Llama 3.1 model and prints the response.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gravixlayer.com/v1/inference",
    api_key=os.environ.get("GRAVIXLAYER_API_KEY"),
)

completion = client.chat.completions.create(
    model="llama3.1:8b-instruct-fp16",
    messages=[
        {"role": "system", "content": "You are a helpful and friendly assistant."},
        {"role": "user", "content": "What are the three most popular programming languages?"},
    ],
)

print(completion.choices[0].message.content)

Expected Response

The API will return a completion object, and the content of the model's message will look something like this:

Based on recent developer surveys and industry trends from sources like Stack Overflow and GitHub, the three most popular programming languages are typically:

1. JavaScript: It is the backbone of the web and is essential for front-end development. With runtimes like Node.js, it's also a dominant force in back-end development.

2. Python: Known for its simplicity and readability, Python is incredibly versatile. It's widely used in data science, machine learning, web development (with frameworks like Django and Flask), and automation.

3. SQL: While some consider it a query language rather than a general-purpose programming language, SQL is indispensable for managing and interacting with relational databases, making it a critical skill for nearly any developer.
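
Beyond the message content, the returned completion object carries useful metadata. The sketch below uses standard fields from the OpenAI Python SDK response object, and it assumes your deployment populates the usage field (hence the None check). It reuses the completion object from the quickstart:

# A minimal sketch inspecting standard OpenAI SDK response fields.
choice = completion.choices[0]

print(choice.finish_reason)  # e.g. "stop" when the model finished naturally

# Token accounting, useful for tracking cost and context-window headroom.
# usage may be None depending on the deployment, so guard the access.
if completion.usage is not None:
    print(completion.usage.prompt_tokens)      # tokens in your messages
    print(completion.usage.completion_tokens)  # tokens the model generated
    print(completion.usage.total_tokens)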

The Conversation: Understanding Messages

The core of the Chat API is the messages array. This array creates a history of the conversation, allowing the model to understand context from previous turns. Each object in the array represents a single message and must have a role and content.

There are three primary roles:

The system Role

The system message sets the overall behavior, personality, and constraints for the AI model. It is typically the first message in the array and acts as a high-level instruction that guides the entire conversation.

Best Practice: Use a clear and direct system prompt to define the model's persona, tone, and what it should or should not do.

"messages": [
{"role": "system", "content": "You are a pirate captain from the 17th century. You speak in a thick pirate accent and are obsessed with finding treasure."}
]

The user Role

This is the message from your end-user. It contains their questions, prompts, or instructions.

{"role": "user", "content": "What is the best way to store gold?"}

The assistant Role

This is a previous response from the model. By including the assistant's past messages in the messages array for subsequent requests, you provide the model with a memory of the conversation.

Multi-Turn Example

Here's how the messages array grows during a conversation to maintain context:

// The full messages array for the second turn of a conversation
"messages": [
    {"role": "system", "content": "You are a helpful assistant."},

    // First turn
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},

    // Second turn (current request)
    {"role": "user", "content": "And what is its most famous landmark?"}
]

By providing the full history, the model knows that "its" refers to Paris.
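
In practice, your application appends each user message and each assistant reply to the history before sending the next request. Here is a minimal sketch of that loop, using a hypothetical ask() helper (model and client setup as in the quickstart):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gravixlayer.com/v1/inference",
    api_key=os.environ.get("GRAVIXLAYER_API_KEY"),
)

# The history starts with just the system message
messages = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(question: str) -> str:
    # Add the user's turn, send the full history, then record the reply
    messages.append({"role": "user", "content": question})
    completion = client.chat.completions.create(
        model="llama3.1:8b-instruct-fp16",
        messages=messages,
    )
    reply = completion.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

print(ask("What is the capital of France?"))
print(ask("And what is its most famous landmark?"))  # "its" resolves to Paris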


Streaming Responses

For real-time applications like chatbots, waiting for the full response can feel slow. You can stream the response as it's being generated by the model by setting stream=True in your request. This allows you to display the words to the user token-by-token, creating a much more responsive user experience.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gravixlayer.com/v1/inference",
    api_key=os.environ.get("GRAVIXLAYER_API_KEY"),
)

stream = client.chat.completions.create(
    model="llama3.1:8b-instruct-fp16",
    messages=[{"role": "user", "content": "Write a short poem about the ocean."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

print() # For a final newline

Note on Terminal Display: When testing streaming in different terminal environments, you may notice that text appears to arrive in visual "blocks" rather than character-by-character. This is due to terminal buffering and rendering optimizations, not an issue with the streaming functionality. The API is still delivering content in real-time chunks (typically 100+ per response). To verify streaming is working, you can add debug output to count chunks as they arrive.
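
For example, a debug variant of the streaming loop might count chunks to confirm they arrive incrementally (a minimal sketch; client setup and model as in the quickstart):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gravixlayer.com/v1/inference",
    api_key=os.environ.get("GRAVIXLAYER_API_KEY"),
)

stream = client.chat.completions.create(
    model="llama3.1:8b-instruct-fp16",
    messages=[{"role": "user", "content": "Write a short poem about the ocean."}],
    stream=True,
)

# Count chunks as they arrive to verify incremental delivery
chunk_count = 0
for chunk in stream:
    chunk_count += 1
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

print(f"\nReceived {chunk_count} chunks")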


Choosing a Model

You can select any of our supported chat models by specifying its identifier in the model parameter. For most use cases, we recommend starting with llama3.1:8b-instruct-fp16 as it provides an excellent balance of performance, speed, and cost.

For a complete and up-to-date list of all available chat models and their capabilities, please refer to our main Models Page.


Next Steps

  • Explore Advanced Features: Learn about Structured Outputs to receive model responses in a predictable, machine-readable format.
  • API Reference: Dive into all available parameters in the Chat API Reference.