Quickstart
Get started with the Chat API using your favorite language or tool. Below you'll find minimal working examples in cURL, Python (OpenAI or Gravix SDK), and JavaScript (plain or Gravix SDK). Before running these, make sure your Gravix Layer API key is set as an environment variable and that the required libraries are installed for your chosen language.

- cURL
- Python - OpenAI
- Python - Gravix SDK
- JavaScript
- JavaScript - Gravix SDK
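As a representative sketch, here is the Python (OpenAI) variant. The base URL and the environment-variable name are assumptions; substitute the values from your Gravix Layer dashboard or docs.

```python
import os
from openai import OpenAI

# Assumed endpoint and key variable; confirm against the Gravix Layer docs.
client = OpenAI(
    api_key=os.environ["GRAVIXLAYER_API_KEY"],
    base_url="https://api.gravixlayer.com/v1",
)

completion = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message.content)
```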
Expected Response
The API returns a completion object. The content of the model's message will be similar to the following:
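Assuming an OpenAI-compatible chat completion schema (an assumption; field names may differ, so check the API reference), the object looks roughly like this, with the reply text in choices[0].message.content:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "meta-llama/llama-3.1-8b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
```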
Message Structure
The messages array is the core component of the Chat API, providing the conversational history that enables the model to maintain context. Each object in the array represents a single message and must include a role and content.
There are three distinct roles:
System Role
The system message establishes the model's behavior, personality, and operational constraints. As the first message in the array, it serves as a high-level instruction that guides the entire conversation.
Best Practice: Use a clear and direct system prompt to define the model’s persona, tone, and operational boundaries.
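For instance, a system message along these lines (the wording is illustrative):

```python
{"role": "system", "content": "You are a concise billing assistant. Answer only questions about invoices and payments."}
```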
User Role
The user message contains the input from the end-user, such as questions, prompts, or instructions.
Assistant Role
The assistant message represents a previous response from the model. Including the assistant's past messages in subsequent requests provides the model with a conversational memory.
Multi-Turn Conversation Example
The following example illustrates how the messages array expands to maintain context across multiple turns:
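A sketch in Python (the message wording is illustrative):

```python
# Turn 1: system prompt plus the first user message.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# The model replies; append its message so it is remembered.
messages.append(
    {"role": "assistant", "content": "The capital of France is Paris."}
)

# Turn 2: the follow-up question only makes sense because the
# earlier messages are sent along with it.
messages.append({"role": "user", "content": "What is its population?"})
```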
Streaming Responses
For real-time applications such as chatbots, latency can be minimized by streaming responses as they are generated. By setting stream=True in your request, you can receive the response token-by-token, creating a more responsive user experience.
- cURL
- Python - OpenAI
- Python - Gravix SDK
- JavaScript
- JavaScript - Gravix SDK
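A streaming sketch under the same assumptions as the quickstart (OpenAI-compatible client, assumed base URL and key variable):

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GRAVIXLAYER_API_KEY"],  # assumed variable name
    base_url="https://api.gravixlayer.com/v1",  # assumed endpoint
)

stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
    stream=True,  # deliver the response incrementally
)

chunk_count = 0
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    chunk_count += 1  # simple debug count; see the terminal-display note below
print(f"\n[{chunk_count} chunks received]")
```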
Note on Terminal Display: When testing streaming in different terminal environments, you may notice that text appears to arrive in visual “blocks” rather than character-by-character. This is due to terminal buffering and rendering optimizations, not an issue with the streaming functionality. The API is still delivering content in real-time chunks (typically 100+ per response). To verify streaming is working, you can add debug output to count chunks as they arrive.
Choosing a Model
You can select any of our supported chat models by specifying its identifier in the model parameter. For most use cases, we recommend starting with meta-llama/llama-3.1-8b-instruct as it provides an excellent balance of performance, speed, and cost.
For a complete and up-to-date list of all available chat models and their capabilities, please refer to our main Models Page.
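If the endpoint is OpenAI-compatible (an assumption), the list can also be fetched programmatically:

```python
# Reusing the `client` from the quickstart sketch above.
for model in client.models.list():
    print(model.id)
```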
Next Steps
- Explore Advanced Features: Learn about Structured Outputs to constrain the model's responses to a predictable, machine-readable format.

