Vision

Vision-language models (VLMs) process text and images in a single request, so you can ask questions about visual content or get descriptions of images. Common use cases include image captioning, visual question answering, document analysis, chart interpretation, OCR, and content moderation.

This guide shows you how to use Gravix Layer VLMs through our API to analyze images alongside text prompts.


Query Models with an HTTP URL Image

You can provide images to vision models using publicly accessible HTTP URLs.

curl -X POST https://api.gravixlayer.com/v1/inference/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRAVIXLAYER_API_KEY" \
  -d '{
    "model": "google/gemma-3-12b-it",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Can you describe this image?" },
          { "type": "image_url", "image_url": { "url": "https://images.unsplash.com/photo-1720884413532-59289875c3e1?q=80&w=3024&auto=format&fit=crop&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D" } }
        ]
      }
    ]
  }'

Response

The image depicts a natural landscape featuring a serene waterfall and a bright, inviting aquamarine pool. Water cascades down from a rocky cliff, creating a beautiful contrast between the crystal-clear water and the rugged rocks. The pool is surrounded by large, stone-like rocks of various shapes and sizes, adding to the natural beauty of the scene. In the background, there are lush green trees and bushes, with a clear blue sky overhead, suggesting a pleasant, sunny day. The overall ambiance of the image is tranquil and picturesque, offering a stark contrast to the challenging terrain of the waterfall.
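If you send many requests with different prompts or image URLs, it can help to generate the request body instead of editing the JSON by hand. The following is a minimal sketch, not part of the Gravix Layer tooling: `json_escape` and `build_vision_payload` are hypothetical helper names, and the escaping assumes the values contain no newlines or other control characters.

```shell
#!/bin/sh
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Assumption: the value has no newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_vision_payload MODEL PROMPT IMAGE_URL   (hypothetical helper)
# Prints a request body with one text part and one image_url part,
# mirroring the structure of the example request above.
build_vision_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":[' "$(json_escape "$1")"
  printf '{"type":"text","text":"%s"},' "$(json_escape "$2")"
  printf '{"type":"image_url","image_url":{"url":"%s"}}]}]}\n' "$(json_escape "$3")"
}

# Usage (curl reads the body from stdin with -d @-):
#   build_vision_payload "google/gemma-3-12b-it" "Can you describe this image?" "$IMAGE_URL" |
#     curl -X POST https://api.gravixlayer.com/v1/inference/chat/completions \
#       -H "Content-Type: application/json" \
#       -H "Authorization: Bearer $GRAVIXLAYER_API_KEY" \
#       -d @-
```

Passing the values as `printf` arguments rather than splicing them into the format string keeps characters such as `%` in image URLs from being interpreted.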

Query Models with a Local Image

To query a model with a local image, encode the file as base64 and pass it as a data URL: the base64 string prefixed with the image's MIME type, for example data:image/jpeg;base64,<encoded data>.

# First, encode your image to base64 on a single line
# (tr strips the line wraps that some base64 implementations add)
BASE64_IMAGE=$(base64 < "/path/to/your/image.jpg" | tr -d '\n')

# Close the single-quoted JSON around the variable so the shell expands it
curl -X POST "https://api.gravixlayer.com/v1/inference/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRAVIXLAYER_API_KEY" \
  -d '{
    "model": "google/gemma-3-12b-it",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in the image?"},
          {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,'"$BASE64_IMAGE"'"}}
        ]
      }
    ],
    "stream": true
  }'

Response

The image shows a beautiful and serene natural landscape. In the foreground, there is a clear, turquoise blue pool of water, into which a waterfall cascades from a height. The waterfall is surrounded by large, moss-covered rocks, and the sound of the water creates a tranquil atmosphere. Lush green trees and plants frame the scene, and the sunlight filters through the leaves, casting a dappled light on the ground. The sky is a bright blue with a few fluffy clouds, completing the picture of a perfect, peaceful day in nature.
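The MIME type in the data URL should match the file you encode. As a rough sketch, the hypothetical helper below (`image_to_data_url` is an illustrative name, not part of any Gravix Layer tooling) picks a MIME type from the file extension and prints the complete data URL. The extension list is an assumption for illustration, not the set of formats the API is documented to accept.

```shell
#!/bin/sh
# image_to_data_url FILE   (hypothetical helper)
# Prints data:<mime>;base64,<payload> for a local image file.
image_to_data_url() {
  file=$1
  # Map the extension to a MIME type. This short list is an assumption,
  # not the set of formats the API is documented to accept.
  case "$file" in
    *.jpg|*.jpeg|*.JPG|*.JPEG) mime="image/jpeg" ;;
    *.png|*.PNG)               mime="image/png" ;;
    *.webp|*.WEBP)             mime="image/webp" ;;
    *.gif|*.GIF)               mime="image/gif" ;;
    *) echo "unsupported extension: $file" >&2; return 1 ;;
  esac
  [ -r "$file" ] || { echo "cannot read: $file" >&2; return 1; }
  # tr removes the line wraps that some base64 implementations insert.
  printf 'data:%s;base64,%s\n' "$mime" "$(base64 < "$file" | tr -d '\n')"
}

# Usage:
#   IMAGE_DATA_URL=$(image_to_data_url "/path/to/your/image.png") || exit 1
```

The resulting string goes in the "url" field of the image_url part. Base64 output for large images can exceed the shell's command-line length limit, so for big files it may be safer to write the request body to a file and pass it to curl with `-d @payload.json`.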

Next Steps

  • Explore Other Features: Learn about Function Calling to integrate vision models with external tools.