Vision

Vision-language models (VLMs) can process both text and images in a single request, so you can ask questions about visual content or get descriptions of images. Common use cases include image captioning, visual question answering, document analysis, chart interpretation, OCR, and content moderation.

This guide shows you how to use Gravix Layer VLMs through our API to analyze images alongside text prompts.

Supported Models

gemma3:12b (Gemma-3-12B-IT)
qwen2.5vl:7b (Qwen2.5-VL-7B-Instruct)

Query Models with an HTTP URL Image

You can provide images to vision models using publicly accessible HTTP URLs.

Example requests in cURL, Python, and JavaScript follow.

curl -X POST https://api.gravixlayer.com/v1/inference/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRAVIXLAYER_API_KEY" \
  -d '{
    "model": "gemma3:12b",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Can you describe this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://images.unsplash.com/photo-1720884413532-59289875c3e1?q=80&w=3024&auto=format&fit=crop&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D"
            }
          }
        ]
      }
    ]
  }'

import os
from openai import OpenAI

# Initialize the OpenAI client against the Gravix Layer endpoint
client = OpenAI(
    base_url="https://api.gravixlayer.com/v1/inference",
    api_key=os.environ.get("GRAVIXLAYER_API_KEY"),
)

response = client.chat.completions.create(
    model="gemma3:12b",
    messages=[{
        "role": "user",
        "content": [{
            "type": "text",
            "text": "Can you describe this image?",
        }, {
            "type": "image_url",
            "image_url": {
                "url": "https://images.unsplash.com/photo-1720884413532-59289875c3e1?q=80&w=3024&auto=format&fit=crop&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D"
            },
        }],
    }],
)
print(response.choices[0].message.content)

// Run as an ES module (e.g. a .mjs file) so that top-level await is allowed
import { OpenAI } from 'openai';

const client = new OpenAI({
    baseURL: "https://api.gravixlayer.com/v1/inference",
    apiKey: process.env.GRAVIXLAYER_API_KEY,
});

const response = await client.chat.completions.create({
    model: "gemma3:12b",
    messages: [{
        role: "user",
        content: [{
            type: "text",
            text: "Can you describe this image?",
        }, {
            type: "image_url",
            image_url: {
                url: "https://images.unsplash.com/photo-1720884413532-59289875c3e1?q=80&w=3024&auto=format&fit=crop&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D"
            },
        }],
    }],
});

console.log(response.choices[0].message.content);

Response

The image depicts a natural landscape featuring a serene waterfall and a bright, inviting aquamarine pool. Water cascades down from a rocky cliff, creating a beautiful contrast between the crystal-clear water and the rugged rocks. The pool is surrounded by large, stone-like rocks of various shapes and sizes, adding to the natural beauty of the scene. In the background, there are lush green trees and bushes, with a clear blue sky overhead, suggesting a pleasant, sunny day. The overall ambiance of the image is tranquil and picturesque, offering a stark contrast to the challenging terrain of the waterfall.
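If you build requests programmatically, a small helper can assemble the user message used in the examples above. The sketch below is a hypothetical helper (build_image_message is not part of the Gravix Layer API or the OpenAI SDK); it only checks the URL scheme and cannot verify that the image is actually publicly reachable by the server.

```python
from urllib.parse import urlparse


def build_image_message(prompt: str, image_url: str) -> dict:
    """Build one user message with a text part and an image_url part.

    Hypothetical helper: it mirrors the message structure from the
    examples in this guide and is not an official SDK function.
    """
    scheme = urlparse(image_url).scheme
    # Accept http(s) URLs and base64 data URLs; reject anything else early.
    if scheme not in ("http", "https", "data"):
        raise ValueError(f"unsupported image URL scheme: {scheme!r}")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

You would then pass the result as messages=[build_image_message("Can you describe this image?", url)] to client.chat.completions.create.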

Query Models with a Local Image

To query a model with a local image, encode the file as base64 and pass it as a data URL whose prefix names the image's MIME type, for example data:image/jpeg;base64,<encoded data>. The MIME type should match the actual image format.
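A mismatched prefix (for example, labelling a JPEG as image/png) is an easy mistake when the MIME type is hard-coded. The sketch below is a hypothetical helper (image_to_data_url is not an SDK function) that guesses the MIME type from the file extension with Python's standard mimetypes module; it assumes the extension reflects the real file format.

```python
import base64
import mimetypes


def image_to_data_url(path: str) -> str:
    """Return a data URL such as 'data:image/jpeg;base64,...' for a local image.

    Hypothetical helper: the MIME type is guessed from the file extension,
    so a file with a misleading extension will get a wrong prefix.
    """
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    # Read raw bytes and base64-encode them as ASCII text.
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string goes directly into the image_url.url field of the request.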

Example requests in cURL, Python, and JavaScript follow.

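# First, encode your image as a single-line base64 string
# (tr removes the line breaks that GNU base64 inserts by default)
BASE64_IMAGE=$(base64 < "/path/to/your/image.jpg" | tr -d '\n')

# Make the API call and print only the message content (requires jq)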
curl -X POST "https://api.gravixlayer.com/v1/inference/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRAVIXLAYER_API_KEY" \
  -d "{
    \"model\": \"qwen2.5vl:7b\",
    \"messages\": [
      {
        \"role\": \"user\",
        \"content\": [
          {
            \"type\": \"text\",
            \"text\": \"what is in the image\"
          },
          {
            \"type\": \"image_url\",
            \"image_url\": {
              \"url\": \"data:image/jpeg;base64,$BASE64_IMAGE\"
            }
          }
        ]
      }
    ],
    \"stream\": false
  }" \
  --max-time 60 \
  --retry 2 | jq -r '.choices[0].message.content'

import base64
import os
from openai import OpenAI

# Initialize the OpenAI client against the Gravix Layer endpoint
client = OpenAI(
    base_url="https://api.gravixlayer.com/v1/inference",
    api_key=os.environ.get("GRAVIXLAYER_API_KEY"),
)

description_prompt = "what is in the image"
image_path = "/path/to/your/image.jpg"

def encode_image(image_path):
    """Return the file's contents as a base64-encoded string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image(image_path)

stream = client.chat.completions.create(
    model="qwen2.5vl:7b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": description_prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        # The MIME type must match the file format (JPEG here)
                        "url": f"data:image/jpeg;base64,{base64_image}",
                    },
                },
            ],
        }
    ],
    stream=True,
)

# Print the reply as it arrives; some chunks carry no choices or no content
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

// Run as an ES module (e.g. a .mjs file) so that top-level await is allowed
import { OpenAI } from 'openai';
import fs from 'fs';

const client = new OpenAI({
    baseURL: "https://api.gravixlayer.com/v1/inference",
    apiKey: process.env.GRAVIXLAYER_API_KEY,
});

const getDescriptionPrompt = "what is in the image";
const imagePath = "/path/to/your/image.jpg";

function encodeImage(imagePath) {
    const imageBuffer = fs.readFileSync(imagePath);
    return imageBuffer.toString('base64');
}

const base64Image = encodeImage(imagePath);

const stream = await client.chat.completions.create({
    model: "qwen2.5vl:7b",
    messages: [
        {
            role: "user",
            content: [
                { type: "text", text: getDescriptionPrompt },
                {
                    type: "image_url",
                    image_url: {
                        url: `data:image/jpeg;base64,${base64Image}`,
                    },
                },
            ],
        }
    ],
    stream: true,
});

for await (const chunk of stream) {
    if (chunk.choices[0]?.delta?.content) {
        process.stdout.write(chunk.choices[0].delta.content);
    }
}

Response

The image depicts a serene natural landscape featuring a small, cascading waterfall that flows into a picturesque pool of water. The water in the pool is a vibrant turquoise blue, contrasting beautifully with the surrounding environment. The surrounding area is lush with green trees and vegetation, and there are large rocks and boulders scattered around the pool and along the sides of the waterfall.

The sky above is clear and blue, indicating a bright, sunny day. The overall scenery is peaceful and idyllic, suggesting a tranquil setting in a natural, possibly forested, location.
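The streaming Python example prints each chunk as it arrives. If you need the complete reply as one string (for example, to store or post-process it), you can concatenate the deltas instead. This is a minimal sketch assuming the OpenAI SDK chunk shape used in the example (chunk.choices[0].delta.content, where choices may be empty and content may be None); collect_stream is a hypothetical helper name.

```python
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the text deltas from a streamed chat completion.

    Assumes each chunk exposes chunk.choices[0].delta.content, as in the
    OpenAI SDK; chunks with no choices or empty content are skipped.
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)
```

With the client from the example, text = collect_stream(stream) consumes the stream and returns the full description. Note that a stream can only be iterated once, so choose either printing or collecting.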

Next Steps

Explore Other Features: Learn about Function Calling to integrate vision models with external tools.
API Reference: For detailed parameter information, see the Chat Completions API Reference.
