Core Capabilities
Gravix AI Inference is built around two deployment models and five core capabilities, giving you complete control over your AI workloads:- Serverless: Pay-as-you-go with automatic scaling. Perfect for development, prototyping, and variable workloads with built-in rate limiting and cost optimization.
- Dedicated: Private GPU instances with guaranteed performance and predictable costs. Ideal for production workloads requiring consistent latency and throughput.
Common Use Cases
AI Inference powers a wide range of intelligent applications:- Conversational AI: Build chatbots, virtual assistants, and multi-turn dialogue systems with advanced language understanding and generation capabilities.
- Content Generation: Create articles, summaries, code, and creative content with fine-tuned control over style, format, and structure.
- Semantic Search & RAG: Generate embeddings for vector databases and implement retrieval-augmented generation for knowledge-based applications.
Unified API Features
All deployment models support the same powerful capabilities:| Feature | Description | Use Cases |
|---|---|---|
| Chat | Multi-turn conversations | Chatbots, assistants, dialogue systems |
| Embeddings | High-dimensional vectors | Search, RAG, clustering, similarity |
| Vision | Image analysis | Multimodal apps, content moderation |
| Structured Outputs | JSON schema enforcement | API integrations, data extraction |
| Function Calling | External tool integration | Agents, workflow automation |
Getting Started in 3 Steps
Building with Gravix AI Inference is straightforward:- Choose Deployment: Select serverless for flexibility or dedicated for guaranteed performance based on your requirements.
- Select Model: Pick from our curated collection of state-of-the-art models optimized for different tasks and use cases.
- Make Requests: Use our OpenAI-compatible API to integrate AI capabilities into your applications seamlessly.
- cURL
- Python - OpenAI
- Python - Gravix SDK
- JavaScript - OpenAI
- JavaScript - Gravix SDK
Quick Access
Serverless
Pay-as-you-go with automatic scaling for development and variable workloads
Dedicated
Private GPU instances with guaranteed performance for production workloads

