Routing & Orchestration
Hyperion's gateway orchestrates LLM traffic: it evaluates prompt complexity in real time to select an appropriate model and provides fallback mechanisms for high availability.
Smart Routing
By default, Hyperion acts as a strict passthrough proxy: if you request a specific model, the gateway honors that choice. Setting smart_routing in the request body, however, activates the gateway's routing engine.
When enabled, Hyperion analyzes the structural complexity of the prompt—for example, detecting code generation, mathematical derivations, or large context lengths. If the requested model is overkill for a simple task, Hyperion automatically routes the request to a cheaper, faster alternative within the same provider family, saving compute budget without sacrificing quality.
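A toy version of such a complexity check might look like the sketch below. It is purely illustrative of the idea; Hyperion's actual classifier runs server-side, and the markers and threshold here are made-up assumptions:

```python
# Toy complexity heuristic, illustrating the kind of signals described above.
# The markers and the character threshold are arbitrary placeholders, not
# Hyperion's real (server-side) analysis.
def looks_complex(prompt: str) -> bool:
    code_markers = ("```", "def ", "class ", "SELECT ")
    math_markers = ("\\int", "derivative", "prove", "theorem")
    long_context = len(prompt) > 8000  # rough character-count threshold
    return (
        any(m in prompt for m in code_markers)
        or any(m in prompt.lower() for m in math_markers)
        or long_context
    )
```

Under this sketch, a simple factual question would be routed to a cheaper model, while a proof request or a snippet containing code would stay on the requested model.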
from hyperion import Hyperion

client = Hyperion()

# Setting smart_routing=True enables intelligent prompt evaluation.
# The gateway analyzes complexity in real time and routes accordingly.
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Analyze this dataset."}],
    extra_body={
        "smart_routing": True
    }
)

Auto Model
If your system architecture handles highly diverse user queries and you do not want to pin a specific model at all, you can use the auto model shorthand. Passing auto implicitly enables smart routing across all available providers.
Hyperion will evaluate the prompt against organizational budget constraints and model capabilities, dynamically choosing between providers such as OpenAI, Google, or Anthropic to serve the request as efficiently as possible.
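Conceptually, this selection resembles a cheapest-fit lookup over a model catalog. The sketch below is illustrative only: the catalog entries, prices, and tier labels are made-up placeholders, not Hyperion's real pricing or capability data:

```python
# Toy sketch of cost-aware model selection. The catalog, prices, and tiers
# are hypothetical placeholders, not Hyperion's actual data.
CATALOG = {
    "openai/gpt-5.2":         {"cost_per_1k": 0.010, "tier": "frontier"},
    "google/gemini-pro":      {"cost_per_1k": 0.004, "tier": "standard"},
    "anthropic/claude-haiku": {"cost_per_1k": 0.001, "tier": "fast"},
}

def pick_model(required_tier: str, budget_per_1k: float) -> str:
    """Return the cheapest model that meets the tier and fits the budget."""
    candidates = [
        (meta["cost_per_1k"], name)
        for name, meta in CATALOG.items()
        if meta["tier"] == required_tier and meta["cost_per_1k"] <= budget_per_1k
    ]
    if not candidates:
        raise ValueError("no model satisfies the tier and budget constraints")
    return min(candidates)[1]
```

The real gateway performs this evaluation per request, so callers only ever see the auto alias.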
from hyperion import Hyperion

client = Hyperion()

# 'auto' is a shorthand that inherently triggers smart routing
# across all allowed providers without pinning a specific model.
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

Fallback Execution
Separate from model selection, Hyperion provides high availability via fallbacks. An array of fallback models, defined in the native hyperion configuration block, acts as a safety net.
If the primary model experiences a provider outage, or if executing it would exceed your immediate token budget, Hyperion redirects the request down the fallback chain, preventing total failure and maintaining uptime for your end users.
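The redirect logic can be pictured as a simple loop over the chain. This is an illustrative sketch, not Hyperion's internal code; the ProviderOutage exception and the call_provider callable are hypothetical stand-ins:

```python
# Illustrative sketch of fallback-chain execution. ProviderOutage and
# call_provider are hypothetical stand-ins for Hyperion's internals.
class ProviderOutage(Exception):
    pass

def execute_with_fallbacks(primary, fallbacks, call_provider):
    """Try the primary model first, then walk the fallback chain in order."""
    for model in [primary, *fallbacks]:
        try:
            return call_provider(model)
        except ProviderOutage:
            continue  # this model failed; move to the next one in the chain
    raise RuntimeError("all models in the fallback chain failed")
```

Note that the chain is ordered: the request only reaches the second fallback if both the primary and the first fallback fail.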
from hyperion import Hyperion

client = Hyperion()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Generate a report."}],
    # Use the hyperion config block to define fallbacks
    hyperion={
        "fallbacks": ["gpt-4o-mini", "claude-haiku-4-5"]
    }
)

Model Normalization
Hyperion automatically normalizes shorthand model names and generic aliases. For instance, requesting gpt-4o automatically resolves to the latest available deployment mapped on the provider's end. This ensures your prompts reliably reach the correct upstream model across different environments.
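Conceptually, normalization behaves like a lookup from alias to pinned deployment, with unknown names passed through unchanged. The mapping below is illustrative only; the actual alias table lives on the gateway, and the dated deployment names shown are placeholders:

```python
# Illustrative sketch of alias normalization. The alias table and the dated
# deployment names are hypothetical placeholders, not Hyperion's real mapping,
# which is maintained on the gateway side.
MODEL_ALIASES = {
    "gpt-4o": "gpt-4o-2024-11-20",
    "claude-haiku-4-5": "claude-haiku-4-5-20251001",
}

def normalize_model(requested: str) -> str:
    """Resolve a shorthand alias to a concrete deployment, or pass it through."""
    return MODEL_ALIASES.get(requested, requested)
```

Fully qualified deployment names bypass the table entirely, so pinning an exact snapshot still behaves as a strict passthrough.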