Deployment Models
Serverless
Pay-as-you-go with automatic scaling. Ideal for development and low-traffic applications.
Dedicated
Private GPU instances with guaranteed performance. Perfect for production workloads.
Unified Capabilities
All endpoints, whether Serverless or Dedicated, support a unified set of powerful features:- Chat: Build stateful, multi-turn conversational applications.
- Embeddings: Generate high-dimensional vector embeddings for RAG, semantic search, and clustering.
- Vision: Analyze and interpret image inputs in multimodal applications.
- Structured Outputs: Enforce valid JSON or other structured data formats for reliable API integrations.
- Function Calling: Enable models to interact with external tools and APIs to execute complex tasks.

