Vision
Vision-language models (VLMs) process text and images in a single request, so you can ask questions about visual content or request descriptions of images. Common use cases include image captioning, visual question answering, document analysis, chart interpretation, OCR, and content moderation.
This guide shows you how to use Gravix Layer VLMs through our API to analyze images alongside text prompts.
Query Models with an HTTP URL Image
You can provide images to vision models using publicly accessible HTTP URLs.
- cURL
- Python - OpenAI
- Python - Gravix SDK
- JavaScript
curl -X POST https://api.gravixlayer.com/v1/inference/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRAVIXLAYER_API_KEY" \
  -d '{
    "model": "google/gemma-3-12b-it",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Can you describe this image?" },
          { "type": "image_url", "image_url": { "url": "https://images.unsplash.com/photo-1720884413532-59289875c3e1?q=80&w=3024&auto=format&fit=crop&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D" } }
        ]
      }
    ]
  }'
# Python, using the OpenAI SDK against the Gravix Layer endpoint
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gravixlayer.com/v1/inference",
    api_key=os.environ.get("GRAVIXLAYER_API_KEY"),
)

response = client.chat.completions.create(
    model="google/gemma-3-12b-it",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Can you describe this image?"},
            {"type": "image_url", "image_url": {"url": "https://images.unsplash.com/photo-1720884413532-59289875c3e1?q=80&w=3024&auto=format&fit=crop&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D"}},
        ],
    }],
)
print(response.choices[0].message.content)
# Python, using the Gravix SDK
from gravixlayer import GravixLayer

# Make sure to export your API key in the environment
# export GRAVIXLAYER_API_KEY=your_api_key_here
client = GravixLayer()

response = client.chat.completions.create(
    model="google/gemma-3-12b-it",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Can you describe this image?"},
            {"type": "image_url", "image_url": {"url": "https://images.unsplash.com/photo-1720884413532-59289875c3e1?q=80&w=3024&auto=format&fit=crop&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D"}},
        ],
    }],
)
print(response.choices[0].message.content)
// JavaScript, using the OpenAI SDK against the Gravix Layer endpoint
import { OpenAI } from 'openai';

const client = new OpenAI({
  baseURL: "https://api.gravixlayer.com/v1/inference",
  apiKey: process.env.GRAVIXLAYER_API_KEY,
});

const response = await client.chat.completions.create({
  model: "google/gemma-3-12b-it",
  messages: [{
    role: "user",
    content: [{
      type: "text",
      text: "Can you describe this image?",
    }, {
      type: "image_url",
      image_url: {
        url: "https://images.unsplash.com/photo-1720884413532-59289875c3e1?q=80&w=3024&auto=format&fit=crop&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D"
      },
    }],
  }],
});
console.log(response.choices[0].message.content);
Response
The image depicts a natural landscape featuring a serene waterfall and a bright, inviting aquamarine pool. Water cascades down from a rocky cliff, creating a beautiful contrast between the crystal-clear water and the rugged rocks. The pool is surrounded by large, stone-like rocks of various shapes and sizes, adding to the natural beauty of the scene. In the background, there are lush green trees and bushes, with a clear blue sky overhead, suggesting a pleasant, sunny day. The overall ambiance of the image is tranquil and picturesque, offering a stark contrast to the challenging terrain of the waterfall.
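The message shape shared by the examples above can be factored into a small helper. This is only a sketch: `build_image_url_message` is a hypothetical name, not part of the OpenAI or Gravix SDKs, and the scheme check is an assumption based on the requirement that the image be reachable over HTTP.

```python
from urllib.parse import urlparse


def build_image_url_message(prompt: str, image_url: str) -> dict:
    """Return one user message pairing a text prompt with an image URL.

    Hypothetical helper: it only assembles the same dict the examples
    above pass in `messages`.
    """
    scheme = urlparse(image_url).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got scheme {scheme!r}")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


message = build_image_url_message(
    "Can you describe this image?",
    "https://example.com/waterfall.jpg",  # placeholder URL, not a real image
)
```

The returned dict can then be passed as `messages=[message]` to `client.chat.completions.create(...)` in either Python example.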
Query Models with a Local Image
To query a model with a local image, pass the image as a data URL in the url field: the base64-encoded image bytes prefixed with the image's MIME type, for example data:image/jpeg;base64, followed by the encoded data.
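As a minimal sketch of building that string, the helper below guesses the MIME type from the file extension with Python's standard `mimetypes` module instead of hard-coding `image/jpeg`. The name `image_to_data_url` is hypothetical, and which image formats a given model accepts is not stated here, so treat the extension check as an assumption.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(image_path: str) -> str:
    """Encode a local image file as a base64 data URL.

    Hypothetical helper: the MIME type is guessed from the file
    extension, so a mislabeled file will get the wrong prefix.
    """
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"could not infer an image MIME type for {image_path!r}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Usage (the path is a placeholder):
# url = image_to_data_url("/path/to/your/image.jpg")
# content_part = {"type": "image_url", "image_url": {"url": url}}
```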
- cURL
- Python - OpenAI
- Python - Gravix SDK
- JavaScript
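# First, encode your image to base64 on a single line.
# macOS: base64 -i FILE    GNU coreutils (Linux): base64 -w 0 FILE
BASE64_IMAGE=$(base64 -i "/path/to/your/image.jpg")

# The JSON body is single-quoted, so the quotes are closed and reopened
# around "$BASE64_IMAGE" to let the shell expand the variable.
curl -X POST "https://api.gravixlayer.com/v1/inference/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRAVIXLAYER_API_KEY" \
  -d '{
    "model": "google/gemma-3-12b-it",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "what is in the image"},
          {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,'"$BASE64_IMAGE"'"}}
        ]
      }
    ],
    "stream": true
  }'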
# Python, using the OpenAI SDK against the Gravix Layer endpoint
import os
import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gravixlayer.com/v1/inference",
    api_key=os.environ.get("GRAVIXLAYER_API_KEY"),
)


def encode_image(image_path):
    # Read the file as bytes and return its base64 encoding as text
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


image_path = "/path/to/your/image.jpg"
base64_image = encode_image(image_path)

stream = client.chat.completions.create(
    model="google/gemma-3-12b-it",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "what is in the image"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}},
        ],
    }],
    stream=True,
)

# Print each text delta as it arrives; skip chunks without choices or content
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # For a final newline
# Python, using the Gravix SDK
import base64
from gravixlayer import GravixLayer

# Make sure to export your API key in the environment
# export GRAVIXLAYER_API_KEY=your_api_key_here
client = GravixLayer()


def encode_image(image_path):
    # Read the file as bytes and return its base64 encoding as text
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


image_path = "/path/to/your/image.jpg"
base64_image = encode_image(image_path)

stream = client.chat.completions.create(
    model="google/gemma-3-12b-it",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "what is in the image"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}},
        ],
    }],
    stream=True,
)

# Print each text delta as it arrives; skip chunks without choices or content
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # For a final newline
// JavaScript, using the OpenAI SDK against the Gravix Layer endpoint
import { OpenAI } from 'openai';
import fs from 'fs';

const client = new OpenAI({
  baseURL: "https://api.gravixlayer.com/v1/inference",
  apiKey: process.env.GRAVIXLAYER_API_KEY,
});

const getDescriptionPrompt = "what is in the image";
const imagePath = "/path/to/your/image.jpg";

// Read the file and return its contents as a base64 string
function encodeImage(imagePath) {
  const imageBuffer = fs.readFileSync(imagePath);
  return imageBuffer.toString('base64');
}

const base64Image = encodeImage(imagePath);

const stream = await client.chat.completions.create({
  model: "google/gemma-3-12b-it",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: getDescriptionPrompt },
        {
          type: "image_url",
          image_url: {
            url: `data:image/jpeg;base64,${base64Image}`,
          },
        },
      ],
    }
  ],
  stream: true,
});

// Write each text delta to stdout as it arrives
for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
Response
The image shows a beautiful and serene natural landscape. In the foreground, there is a clear, turquoise blue pool of water, into which a waterfall cascades from a height. The waterfall is surrounded by large, moss-covered rocks, and the sound of the water creates a tranquil atmosphere. Lush green trees and plants frame the scene, and the sunlight filters through the leaves, casting a dappled light on the ground. The sky is a bright blue with a few fluffy clouds, completing the picture of a perfect, peaceful day in nature.
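The streaming loops above print each delta as it arrives. If you need the full response as one string, for logging or post-processing for example, the deltas can be accumulated instead. The sketch below assumes chunks shaped like those in the Python examples (`chunk.choices[0].delta.content`); `collect_stream_text` is a hypothetical helper name, not an SDK function.

```python
from typing import Any, Iterable


def collect_stream_text(stream: Iterable[Any]) -> str:
    """Join the text deltas of a streamed chat completion into one string.

    Chunks with no choices or with empty content are skipped, mirroring
    the guard used in the printing loops above.
    """
    parts = []
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


# Usage, with `stream` created as in the examples above:
# full_text = collect_stream_text(stream)
# print(full_text)
```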
Next Steps
- Explore Other Features: Learn about Function Calling to integrate vision models with external tools.