Access GravixLayer’s AI capabilities directly from your terminal with the gravixlayer command-line interface and its comprehensive set of options.
CLI Structure: All inference operations now use the chat subcommand: gravixlayer chat [options]

Basic Commands

# Chat completion
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --user "Hello!"

# Text completion
gravixlayer chat --mode completions --model "meta-llama/llama-3.1-8b-instruct" --prompt "The future of AI is"

# Chat with system message
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --system "You are a helpful assistant" --user "Explain AI"

# Streaming chat
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --user "Tell a story" --stream

# Streaming completion
gravixlayer chat --mode completions --model "meta-llama/llama-3.1-8b-instruct" --prompt "Write a poem" --stream

# Generate embeddings
gravixlayer chat --mode embeddings --model "text-embedding-ada-002" --text "Hello world"

CLI Options

Main Commands:
  • chat: Chat and completion operations
  • deployments: Deployment management
  • files: File management
  • vectors: Vector database operations
Chat Command Options:
  • --mode: Operation mode (chat, completions, embeddings)
  • --model: Model to use
  • --user: User message for chat mode
  • --prompt: Prompt for completions mode
  • --text: Text for embeddings mode
  • --system: System message for chat mode
  • --stream: Enable streaming output
  • --max-tokens: Maximum tokens to generate
  • --temperature: Sampling temperature (0.0-2.0)
  • --top-p: Nucleus sampling parameter (0.0-1.0)
  • --frequency-penalty: Frequency penalty (-2.0 to 2.0)
  • --presence-penalty: Presence penalty (-2.0 to 2.0)
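
The options above can be combined in a single call. The sketch below is illustrative: the model name and parameter values are examples, and as with most sampling APIs it is common practice to tune either --temperature or --top-p rather than both. The invocation is guarded so the snippet is safe to paste even on a machine where the CLI is not installed:

```shell
# Illustrative combined invocation (values are examples, not recommendations)
MODEL="meta-llama/llama-3.1-8b-instruct"

if command -v gravixlayer >/dev/null 2>&1; then
  gravixlayer chat \
    --model "$MODEL" \
    --user "Brainstorm product names" \
    --temperature 0.9 \
    --top-p 0.95 \
    --max-tokens 200
else
  echo "gravixlayer CLI not found; install it first"
fi
```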

Advanced Parameters

Temperature Control

# More creative responses (higher temperature)
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --user "Write a creative story" --temperature 1.2

# More focused responses (lower temperature)
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --user "Explain quantum physics" --temperature 0.3

Token Limits

# Limit response length
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --user "Summarize AI" --max-tokens 50

# Longer responses
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --user "Write a detailed essay" --max-tokens 500

Penalty Parameters

# Reduce repetition
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --user "Tell me about space" --frequency-penalty 0.5

# Encourage diverse topics
gravixlayer chat --model "meta-llama/llama-3.1-8b-instruct" --user "Discuss technology" --presence-penalty 0.3