Vision

Vision-language models (VLMs) can process both text and images in a single request, so you can ask questions about visual content or get descriptions of images. Common use cases include image captioning, visual question answering, document analysis, chart interpretation, OCR, and content moderation.

This guide shows you how to use Gravix Layer VLMs through our API to analyze images alongside text prompts.

Supported Models

  • gemma3:12b (Gemma-3-12B-IT)
  • qwen2.5vl:7b (Qwen2.5-VL-7B-Instruct)

Query Models with an HTTP URL Image

You can provide images to vision models using publicly accessible HTTP URLs.

curl -X POST https://api.gravixlayer.com/v1/inference/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRAVIXLAYER_API_KEY" \
  -d '{
    "model": "gemma3:12b",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Can you describe this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://images.unsplash.com/photo-1720884413532-59289875c3e1?q=80&w=3024&auto=format&fit=crop&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D"
            }
          }
        ]
      }
    ]
  }'

Response

The image depicts a natural landscape featuring a serene waterfall and a bright, inviting aquamarine pool. Water cascades down from a rocky cliff, creating a beautiful contrast between the crystal-clear water and the rugged rocks. The pool is surrounded by large, stone-like rocks of various shapes and sizes, adding to the natural beauty of the scene. In the background, there are lush green trees and bushes, with a clear blue sky overhead, suggesting a pleasant, sunny day. The overall ambiance of the image is tranquil and picturesque, offering a stark contrast to the challenging terrain of the waterfall.
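
The request above hard-codes the model, prompt, and image URL in one JSON literal. As a minimal sketch of how the same body could be parameterized, the helper below prints an equivalent payload for any model, prompt, and URL. The names json_escape and vision_payload are hypothetical (they are not part of the Gravix Layer API), and the escaping only covers backslashes and double quotes, so prompts containing newlines or control characters would need a real JSON tool such as jq.

```shell
# Hypothetical helper: escape backslashes and double quotes so a value can
# sit inside a JSON string. Newlines and control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: print a chat-completions payload containing one text
# prompt and one image URL, in the same shape as the request shown above.
# Usage: vision_payload MODEL PROMPT IMAGE_URL
vision_payload() {
  local model prompt url
  model=$(json_escape "$1")
  prompt=$(json_escape "$2")
  url=$(json_escape "$3")
  printf '{"model":"%s","messages":[{"role":"user","content":[{"type":"text","text":"%s"},{"type":"image_url","image_url":{"url":"%s"}}]}]}\n' \
    "$model" "$prompt" "$url"
}

# Usage (needs GRAVIXLAYER_API_KEY and network access; the example.com URL
# is a placeholder). curl reads the request body from stdin via "@-".
# vision_payload "gemma3:12b" "Can you describe this image?" "https://example.com/photo.jpg" |
#   curl -X POST https://api.gravixlayer.com/v1/inference/chat/completions \
#     -H "Content-Type: application/json" \
#     -H "Authorization: Bearer $GRAVIXLAYER_API_KEY" \
#     --data-binary @-
```

Piping the payload into curl keeps quoting concerns in one place: the shell never has to interpret the URL's query string or quotes inside the prompt.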

Query Models with a Local Image

To query a model with a local image, encode the image as base64 and pass it in the url field as a data URL, that is, the base64 string prefixed with the image's MIME type (for example, data:image/jpeg;base64, followed by the encoded bytes).

# First, encode your image to base64.
# Piping through tr removes the line wraps that GNU base64 adds by default.
BASE64_IMAGE=$(base64 < "/path/to/your/image.jpg" | tr -d '\n')

# Make the API call and print only the message content (requires jq)
curl -X POST "https://api.gravixlayer.com/v1/inference/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRAVIXLAYER_API_KEY" \
  -d "{
    \"model\": \"qwen2.5vl:7b\",
    \"messages\": [
      {
        \"role\": \"user\",
        \"content\": [
          {
            \"type\": \"text\",
            \"text\": \"What is in the image?\"
          },
          {
            \"type\": \"image_url\",
            \"image_url\": {
              \"url\": \"data:image/jpeg;base64,$BASE64_IMAGE\"
            }
          }
        ]
      }
    ],
    \"stream\": false
  }" \
  --max-time 60 \
  --retry 2 | jq -r '.choices[0].message.content'

Response

The image depicts a serene natural landscape featuring a small, cascading waterfall that flows into a picturesque pool of water. The water in the pool is a vibrant turquoise blue, contrasting beautifully with the surrounding environment. The surrounding area is lush with green trees and vegetation, and there are large rocks and boulders scattered around the pool and along the sides of the waterfall.

The sky above is clear and blue, indicating a bright, sunny day. The overall scenery is peaceful and idyllic, suggesting a tranquil setting in a natural, possibly forested, location.
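
The example above passes the whole base64 string as a single command-line argument. Many systems cap the size of one argument (on Linux the limit is typically 128 KiB), so larger images can fail with "Argument list too long". One possible workaround, sketched below, is to stream the request body into a file and let curl read it from there. The function names image_data_url and write_image_payload are hypothetical, and the MIME type defaults to image/jpeg as an assumption; pass the real type for other formats.

```shell
# Hypothetical helper: print a data URL for a local image.
# Usage: image_data_url IMAGE_PATH [MIME_TYPE]   (defaults to image/jpeg)
image_data_url() {
  local path mime
  path=$1
  mime=${2:-image/jpeg}
  printf 'data:%s;base64,' "$mime"
  # tr strips the line wraps that GNU base64 inserts by default.
  base64 < "$path" | tr -d '\n'
}

# Hypothetical helper: write a request body for one local image to OUT_FILE.
# The encoded image is streamed into the file, never held in an argument.
# Usage: write_image_payload IMAGE_PATH OUT_FILE [MIME_TYPE]
write_image_payload() {
  {
    printf '{"model":"qwen2.5vl:7b","messages":[{"role":"user","content":['
    printf '{"type":"text","text":"What is in the image?"},'
    printf '{"type":"image_url","image_url":{"url":"'
    image_data_url "$1" "${3:-image/jpeg}"
    printf '"}}]}],"stream":false}\n'
  } > "$2"
}

# Usage (needs GRAVIXLAYER_API_KEY, network access, and jq):
# write_image_payload "/path/to/your/image.jpg" payload.json
# curl -X POST "https://api.gravixlayer.com/v1/inference/chat/completions" \
#   -H "Content-Type: application/json" \
#   -H "Authorization: Bearer $GRAVIXLAYER_API_KEY" \
#   --data-binary @payload.json | jq -r '.choices[0].message.content'
```

Reading the body with --data-binary @payload.json sends the file's contents unmodified, so the request is the same as the inline version apart from whitespace.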

Next Steps