Hyperion routes requests universally using configurable pipeline policies designed around your business constraints: cost, latency, quality, or custom ML classifiers.
Strategic Routing Modes
- Priority Routing: Explicitly prefer one provider or model for an endpoint, with immediate fallback configured if the primary becomes unhealthy.
- Cost-Aware Routing: Dynamically pick a cheaper model if the gateway's ML classifier determines the prompt satisfies an "easy" threshold (e.g., standard summarization vs mathematical reasoning).
- Latency-Aware Routing: Route traffic strictly to the provider responding with the lowest TTFT (Time To First Token) historically over the last 5 minutes.
- Quality Routing: Route traffic based on human-in-the-loop or automated unit test benchmarking for specific prompt variations over time.
- Hybrid Weighted Routing: Combine all above modes into weighted A/B/C testing or gradual rollout policies.
Example Configuration (YAML)
routing:
default:
- case: latency < 100ms and cost < 0.005
model: provider-a/gpt-fast
- case: user_tier == enterprise
model: provider-b/gpt-quality
- else:
model: openai/gpt-4o
failover:
max_retries: 2
backoff: exponentialSmart Classification & Resilience
Hyperion's lightweight ML classifier detects "cheapable" requests instantly. You can route simple CRUD-like AI extraction tasks dynamically without littering your core application codebase with complex logic loops.
On top of this, Hyperion guarantees resilience with automatic active-passive failover. Every provider gets measured health checks (monitoring latency averages and error rates in real-time). You can rotate load across multiple keys to bypass arbitrary provider rate-limits, and configure strict circuit breakers to pause entirely anomalous API traffic originating internally.
Routing FAQs
Through routing policies that evaluate latency, cost, and quality metrics and consult a routing classifier.
Ready to bulletproof your AI stack?
Hyperion provides instant, out-of-the-box active-passive failover and circuit breaking for all major model providers without changing your application code.