The Gravix Layer Chat API provides a powerful and flexible interface for interacting with open-source language models. It supports a wide range of applications, from single-turn questions and answers to complex, multi-turn conversations that maintain contextual continuity.

Quickstart

Get started with the Chat API using your preferred language or tool. The minimal example below uses cURL; the API can also be called from Python and JavaScript, using either the OpenAI client libraries or the Gravix SDK. Before running it, make sure your Gravix Layer API key is set as the GRAVIXLAYER_API_KEY environment variable and that any libraries required for your chosen language are installed.
curl https://api.gravixlayer.com/v1/inference/chat/completions \
    -H "Authorization: Bearer $GRAVIXLAYER_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/llama-3.1-8b-instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful and friendly assistant."},
            {"role": "user", "content": "What are the three most popular programming languages?"}
        ]
    }'
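
For readers who prefer Python, the same request can be sketched with only the standard library. This is a minimal illustration rather than an official client: the helper names (build_chat_request, send_chat_request, main) are hypothetical, and the endpoint URL is an assumption that should be checked against a cURL call that works for your account.

```python
import json
import os
import urllib.request

# Assumption: endpoint path. Verify it against a cURL call that succeeds for you.
CHAT_URL = "https://api.gravixlayer.com/v1/inference/chat/completions"


def build_chat_request(api_key, messages,
                       model="meta-llama/llama-3.1-8b-instruct", url=CHAT_URL):
    """Build (but do not send) a POST request that mirrors the cURL example."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def main():
    """Run the quickstart prompt; requires GRAVIXLAYER_API_KEY to be set."""
    request = build_chat_request(
        os.environ["GRAVIXLAYER_API_KEY"],
        [
            {"role": "system", "content": "You are a helpful and friendly assistant."},
            {"role": "user", "content": "What are the three most popular programming languages?"},
        ],
    )
    print(send_chat_request(request))
```

Calling main() after exporting the API key sends the request. Building the request separately from sending it keeps the payload easy to inspect without touching the network.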

Expected Response

The API returns a completion object. The content of the model’s message will be similar to the following:
Based on recent developer surveys and industry trends from sources like Stack Overflow and GitHub, the three most popular programming languages are typically:

**JavaScript**: It is the backbone of the web and is essential for front-end development. With frameworks like Node.js, it's also a dominant force in back-end development.

**Python**: Known for its simplicity and readability, Python is incredibly versatile. It's widely used in data science, machine learning, web development (with frameworks like Django and Flask), and automation.

**SQL**: While some consider it a query language rather than a general-purpose programming language, SQL is indispensable for managing and interacting with relational databases, making it a critical skill for nearly any developer.
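
To pull the message text out of the completion object, something like the following may work. It assumes the response follows the OpenAI-compatible layout (choices[0].message.content), which this page implies but does not spell out; extract_reply is a hypothetical helper, and the sample dictionary is hand-written for illustration, not captured API output.

```python
def extract_reply(completion):
    """Return the assistant text from a completion dict.

    Assumption: OpenAI-compatible layout, i.e. choices[0].message.content.
    """
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("completion contains no choices")
    message = choices[0].get("message") or {}
    content = message.get("content")
    if content is None:
        raise ValueError("first choice has no message content")
    return content


# Hand-written sample for illustration only.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "JavaScript, Python, and SQL."},
        }
    ]
}
print(extract_reply(sample))
```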

Message Structure

The messages array is the core component of the Chat API, providing the conversational history that enables the model to maintain context. Each object in the array represents a single message and must include a role and content. There are three distinct roles:

System Role

The system message establishes the model’s behavior, personality, and operational constraints. As the first message in the array, it serves as a high-level instruction that guides the entire conversation.
Best Practice: Use a clear and direct system prompt to define the model’s persona, tone, and operational boundaries.
{
  "messages": [
    {
      "role": "system", 
      "content": "You are a pirate captain from the 17th century. You speak in a thick pirate accent and are obsessed with finding treasure."
    }
  ]
}

User Role

The user message contains the input from the end-user, such as questions, prompts, or instructions.
{
  "role": "user", 
  "content": "What is the best way to store gold?"
}

Assistant Role

The assistant message represents a previous response from the model. Including the assistant’s past messages in subsequent requests provides the model with a conversational memory.
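
The three roles and the role/content requirement described above can be checked on the client before a request is sent. The sketch below is a hypothetical pre-flight check (validate_messages and ALLOWED_ROLES are illustrative names); the API's own validation may be stricter or differ in detail.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_messages(messages):
    """Raise ValueError if a message lacks role/content or uses an unknown role."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for index, message in enumerate(messages):
        for key in ("role", "content"):
            if key not in message:
                raise ValueError(f"message {index} is missing '{key}'")
        if message["role"] not in ALLOWED_ROLES:
            raise ValueError(f"message {index} has unknown role {message['role']!r}")
    return messages


# One message per role; the assistant entry is a made-up earlier reply.
example = [
    {"role": "system", "content": "You are a pirate captain from the 17th century."},
    {"role": "user", "content": "What is the best way to store gold?"},
    {"role": "assistant", "content": "Arr, bury it deep and mark the spot on yer map!"},
]
validate_messages(example)
```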

Multi-Turn Conversation Example

The following example illustrates how the messages array expands to maintain context across multiple turns:
{
  "messages": [
    {
      "role": "system", 
      "content": "You are a helpful assistant."
    },
    // First turn
    {
      "role": "user", 
      "content": "What is the capital of France?"
    },
    {
      "role": "assistant", 
      "content": "The capital of France is Paris."
    },
    // Second turn (current request)
    {
      "role": "user", 
      "content": "And what is its most famous landmark?"
    }
  ]
}
By providing the full conversational history, the model can correctly infer that “its” refers to Paris.
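
One way to keep this history in application code is a small wrapper that appends each turn in order. The Conversation class below is a hypothetical helper, not part of any SDK; it only manages the list and leaves sending the request to whichever client you use.

```python
class Conversation:
    """Accumulates chat messages so each request carries the full history."""

    def __init__(self, system_prompt=None):
        self.messages = []
        if system_prompt is not None:
            self.messages.append({"role": "system", "content": system_prompt})

    def add_user(self, content):
        """Record the end-user's next input."""
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content):
        """Record the model's reply so later turns can refer back to it."""
        self.messages.append({"role": "assistant", "content": content})


chat = Conversation("You are a helpful assistant.")
chat.add_user("What is the capital of France?")
chat.add_assistant("The capital of France is Paris.")  # reply from the first request
chat.add_user("And what is its most famous landmark?")
# chat.messages now matches the messages array shown above.
```

After each response, pass the returned text to add_assistant before adding the next user message; otherwise the model loses the context for follow-up questions.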

Streaming Responses

For real-time applications such as chatbots, you can reduce perceived latency by streaming the response as it is generated. Set "stream": true in the request body (stream=True in the Python clients) to receive the response incrementally, chunk by chunk, instead of waiting for the full completion. The cURL example below sends a streaming request.
curl https://api.gravixlayer.com/v1/inference/chat/completions \
    -H "Authorization: Bearer $GRAVIXLAYER_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/llama-3.1-8b-instruct",
        "messages": [
            {"role": "user", "content": "Write a short poem about the ocean."}
        ],
        "stream": true
    }'
Note on Terminal Display: When testing streaming in different terminal environments, you may notice that text appears to arrive in visual “blocks” rather than character-by-character. This is due to terminal buffering and rendering optimizations, not an issue with the streaming functionality. The API is still delivering content in real-time chunks (typically 100+ per response). To verify streaming is working, you can add debug output to count chunks as they arrive.
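
The chunk-counting check mentioned in the note can be sketched as follows. It assumes the stream uses OpenAI-style server-sent events: lines of the form "data: {json}" carrying choices[].delta.content, terminated by "data: [DONE]". That format is not confirmed on this page, so treat the parser as an assumption. The function names are hypothetical, and the simulated lines are hand-written rather than captured output.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes).

    Assumption: OpenAI-style events, "data: {json}" ending with "data: [DONE]".
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def count_chunks(lines):
    """Return (number_of_text_chunks, concatenated_text) for a stream."""
    pieces = list(iter_stream_text(lines))
    return len(pieces), "".join(pieces)


# Hand-written stand-in for a network stream.
simulated = [
    b'data: {"choices": [{"delta": {"content": "Waves "}}]}\n',
    b"\n",
    b'data: {"choices": [{"delta": {"content": "roll in."}}]}\n',
    b"\n",
    b"data: [DONE]\n",
]
print(count_chunks(simulated))
```

An HTTP response object that can be iterated line by line (for example, the one returned by urllib.request.urlopen) could be passed in place of the simulated list. A count well above one indicates the content arrived incrementally, even if the terminal rendered it in blocks.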

Choosing a Model

You can select any of our supported chat models by specifying its identifier in the model parameter. For most use cases, we recommend starting with meta-llama/llama-3.1-8b-instruct as it provides an excellent balance of performance, speed, and cost.
For a complete and up-to-date list of all available chat models and their capabilities, please refer to our main Models Page.

Next Steps

  • Explore Advanced Features: Learn about Structured Outputs to get structured responses from the model.