Features

Auto Model Selection

Offload the decision-making process directly to Hyperion. Auto Mode dynamically chooses the most appropriate foundation model for each prompt, balancing cost, latency, and required reasoning capability across all allowed providers.

How It Works

When you pass auto as the model string in your API request, the Hyperion Intelligence Engine intercepts the prompt before routing it to any provider. The engine performs a lightweight, sub-millisecond structural analysis of the text.

Hyperion evaluates several dimensions simultaneously:
Prompt Complexity: Is the user asking a simple factual question, or does the prompt involve complex code generation, mathematical derivations, or JSON structuring?
Context Length: How large is the input payload? Very large contexts require different provider routing to keep costs under control.
Budget Awareness: Tightly integrated with the billing module, Auto Mode will aggressively favor cheaper models if the active tenant is approaching their configured monthly or daily spend limits.
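The dimensions above can be pictured as a simple scoring heuristic. The following Python sketch is purely illustrative: the cue keywords, thresholds, and weights are assumptions for demonstration, not Hyperion's actual scoring logic.

```python
# Hypothetical sketch of Auto Mode's prompt analysis.
# All cues, thresholds, and weights are illustrative assumptions,
# not Hyperion's internal implementation.

def score_prompt(prompt: str, context_tokens: int, budget_used: float) -> float:
    """Return a 0.0-1.0 difficulty score; higher favors a frontier model."""
    score = 0.0

    # Prompt complexity: code, math, and structured-output cues raise the score.
    complexity_cues = ("def ", "```", "derive", "prove", "json", "schema")
    if any(cue in prompt.lower() for cue in complexity_cues):
        score += 0.5

    # Context length: very large payloads shift routing toward
    # providers that handle long contexts economically.
    if context_tokens > 32_000:
        score += 0.3

    # Budget awareness: near the spend limit, bias toward cheaper models.
    if budget_used > 0.9:  # tenant has used >90% of their configured limit
        score -= 0.4

    return max(0.0, min(1.0, score))
```

A simple factual question scores low, while a prompt asking for a derivation or JSON output scores high enough to escalate.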

Based on this analysis, simple tasks like summarization are automatically routed to lightning-fast, highly economical models (like Gemini Flash or Claude Haiku). Conversely, complex analytical queries are escalated to frontier models (like GPT-4o or Claude 3.5 Sonnet), ensuring you never overpay for basic compute.
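Supposing the analysis yields a single difficulty score, the escalation described above could be sketched as a tier lookup. The model identifiers mirror the examples in the text, but the cutoff and the pool ordering are illustrative assumptions:

```python
# Hypothetical tier mapping; the 0.5 cutoff and pool ordering are
# illustrative assumptions, not Hyperion's routing table.

CHEAP_MODELS = ["gemini-flash", "claude-haiku"]
FRONTIER_MODELS = ["gpt-4o", "claude-3-5-sonnet"]

def select_model(difficulty: float) -> str:
    """Map a 0.0-1.0 difficulty score to a model tier."""
    pool = FRONTIER_MODELS if difficulty >= 0.5 else CHEAP_MODELS
    # A real router would also weigh latency, provider health, and tenant policy.
    return pool[0]
```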

Implementation

Using Auto Model Selection requires zero configuration on the client side. Simply pass auto as the model string. The gateway handles the upstream provider translation transparently and returns the response in standard OpenAI-compatible format.

curl -X POST "https://gateway.hyperionhq.co/v1/chat/completions" \
  -H "Authorization: Bearer $HYPERION_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
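The same request can be issued from Python using only the standard library. The gateway URL, auth header, and JSON body mirror the curl example above; the helper function name is ours, introduced for illustration:

```python
import json
import os
import urllib.request

GATEWAY_URL = "https://gateway.hyperionhq.co/v1/chat/completions"

def build_auto_request(content: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat request with the model set to 'auto'."""
    body = json.dumps({
        "model": "auto",
        "messages": [{"role": "user", "content": content}],
    }).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('HYPERION_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    # Requires network access and a valid HYPERION_API_KEY in the environment.
    with urllib.request.urlopen(
        build_auto_request("What is the capital of France?")
    ) as resp:
        print(json.load(resp))
```

Because the gateway speaks the OpenAI wire format, any OpenAI-compatible client should also work by pointing its base URL at the gateway.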
Last updated: Feb 22, 2026