Gravix Layer’s Inference provides managed endpoints for running and scaling generative AI models. Choose between two deployment models—Serverless and Dedicated—to match your application’s performance, cost, and scalability needs.

Deployment Models

Unified Capabilities

All endpoints, whether Serverless or Dedicated, support a unified set of powerful features:
  • Chat: Build stateful, multi-turn conversational applications (see the sketch after this list).
  • Embeddings: Generate high-dimensional vector embeddings for RAG, semantic search, and clustering.
  • Vision: Analyze and interpret image inputs in multimodal applications.
  • Structured Outputs: Enforce valid JSON or other structured data formats for reliable API integrations.
  • Function Calling: Enable models to interact with external tools and APIs to execute complex tasks.
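
The sketch below shows what a chat request with a structured-output constraint could look like over HTTP. It is a minimal illustration, assuming an OpenAI-style request shape; the endpoint URL, model identifier, and the `response_format` field are placeholders, not confirmed Gravix Layer API details — consult the API reference for the actual paths and parameters.

```python
import os
import requests

# Placeholder endpoint and model name for illustration only; substitute the
# values from your Gravix Layer dashboard and the API reference.
API_URL = "https://api.gravixlayer.com/v1/chat/completions"
API_KEY = os.environ["GRAVIXLAYER_API_KEY"]

payload = {
    "model": "example-chat-model",  # hypothetical model identifier
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the benefits of vector embeddings."},
    ],
    # Assumed OpenAI-style flag for the Structured Outputs capability.
    "response_format": {"type": "json_object"},
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json())
```

Because both deployment models expose the same capabilities, a request like this would target a Serverless or Dedicated endpoint identically; only the endpoint configuration changes.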