Hyperion Documentation
The Operating System
for your LLM stack.
Hyperion is a production-grade AI gateway designed to unify, cache, and secure your LLM infrastructure: a lightweight, low-overhead proxy that sits between your application and your model providers.
Multi-Layer Caching
Exact-match and semantic vector caching to cut latency by up to 90% and reduce API spend.
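The two cache layers can be illustrated with a minimal stdlib-only sketch. This is a conceptual model, not Hyperion's implementation: the `TwoLayerCache` class, the bag-of-words `embed` stand-in (a real deployment would use an embedding model), and the 0.9 similarity threshold are all assumptions for illustration.

```python
import hashlib
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" -- stands in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TwoLayerCache:
    """Layer 1: exact hash lookup. Layer 2: nearest-vector lookup."""

    def __init__(self, threshold: float = 0.9):
        self.exact = {}        # prompt hash -> response
        self.semantic = []     # (vector, response) pairs
        self.threshold = threshold

    def get(self, prompt: str):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.exact:
            return self.exact[key]           # exact hit: byte-identical prompt
        vec = embed(prompt)
        for cached_vec, response in self.semantic:
            if cosine(vec, cached_vec) >= self.threshold:
                return response              # semantic hit: similar-enough prompt
        return None                          # miss: forward to the provider

    def put(self, prompt: str, response: str):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        self.exact[key] = response
        self.semantic.append((embed(prompt), response))


cache = TwoLayerCache()
cache.put("what is the capital of France", "Paris")
print(cache.get("what is the capital of France"))    # exact hit -> Paris
print(cache.get("what is the capital of France ?"))  # semantic hit -> Paris
```

A rephrased prompt that misses the exact layer can still hit the semantic layer, which is where the latency and cost savings on near-duplicate traffic come from.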
Smart Routing
Intent-based routing that automatically selects the right model for each request.
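Intent-based routing can be sketched as a classifier in front of a routing table. This is a hypothetical illustration of the idea, not Hyperion's actual classifier or configuration: the keyword rules and model names (`code-model-large`, etc.) are placeholders.

```python
# Placeholder routing table: intent -> model name (hypothetical names).
ROUTES = {
    "code": "code-model-large",
    "summarize": "fast-small-model",
    "default": "general-model",
}


def classify_intent(prompt: str) -> str:
    # Crude keyword classifier standing in for a real intent model.
    p = prompt.lower()
    if any(k in p for k in ("def ", "function", "bug", "compile")):
        return "code"
    if any(k in p for k in ("summarize", "tl;dr", "shorten")):
        return "summarize"
    return "default"


def route(prompt: str) -> str:
    return ROUTES[classify_intent(prompt)]


print(route("Fix this bug in my function"))  # -> code-model-large
```

The point of the pattern: cheap requests go to a small fast model, hard requests go to a large one, and the application never has to pick a model itself.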
Go Architecture
5µs median latency. 20k RPS. Zero GC pauses.
Observability
Deep telemetry and spend tracking across all models via the Admin API.
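Spend tracking reduces to aggregating token usage per model and multiplying by that model's price. A minimal sketch of the bookkeeping, with hypothetical model names and per-1k-token prices (real prices come from each provider's price sheet):

```python
from collections import defaultdict

# Hypothetical per-1k-token prices, for illustration only.
PRICES = {"model-a": 0.002, "model-b": 0.01}


class SpendTracker:
    def __init__(self):
        self.tokens = defaultdict(int)

    def record(self, model: str, tokens: int):
        # Called once per completed request with the usage it consumed.
        self.tokens[model] += tokens

    def spend(self) -> dict:
        # Dollar spend per model: tokens / 1000 * price-per-1k-tokens.
        return {m: round(t / 1000 * PRICES[m], 6) for m, t in self.tokens.items()}


tracker = SpendTracker()
tracker.record("model-a", 1500)
tracker.record("model-b", 500)
print(tracker.spend())  # {'model-a': 0.003, 'model-b': 0.005}
```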
Resilience
Automatic provider fallbacks and retries keep requests flowing when an upstream fails.
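The fallback-and-retry pattern can be sketched in a few lines. The `call_with_fallback` helper and the fake providers below are illustrative assumptions, not Hyperion's API: each provider is retried with exponential backoff before the gateway moves on to the next one.

```python
import time


def call_with_fallback(providers, prompt, retries=2, backoff=0.01):
    """Try each provider in order; retry transient failures before falling back."""
    last_err = None
    for provider in providers:
        for attempt in range(retries):
            try:
                return provider(prompt)
            except Exception as err:
                last_err = err
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise RuntimeError("all providers failed") from last_err


# Fake providers standing in for real upstream APIs.
def flaky(prompt):
    raise TimeoutError("upstream timeout")


def stable(prompt):
    return f"ok: {prompt}"


print(call_with_fallback([flaky, stable], "hello"))  # falls back -> ok: hello
```

Because the fallback happens inside the gateway, the calling application sees a single successful response rather than the upstream failure.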
Native SDKs
High-performance Python and TypeScript wrappers.
Get Started
Quick Start Guide
Deploy locally in under two minutes.
Deep Dive
Performance Benchmarks
See how Hyperion sustains 20k RPS at 5µs median latency.