Hyperion Documentation

The Operating System for your LLM stack.

Hyperion is a production-grade AI gateway designed to unify, cache, and secure your LLM infrastructure: a lightweight, low-overhead proxy that sits between your application and your intelligence providers.

Multi-Layer Caching

Exact-match and semantic (vector) caching to cut latency by up to 90% and reduce API costs.
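The two cache layers can be sketched in a few lines. This is an illustrative model of the technique, not Hyperion's actual implementation: the class name, threshold, and toy embeddings are assumptions, and a real deployment would embed prompts with a model.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Exact-match lookup first, then nearest-neighbor fallback."""

    def __init__(self, threshold=0.9):
        self.exact = {}        # prompt -> response (exact layer)
        self.entries = []      # (embedding, response)  (semantic layer)
        self.threshold = threshold

    def get(self, prompt, embedding):
        if prompt in self.exact:          # exact hit: cheapest path
            return self.exact[prompt]
        best, best_sim = None, 0.0
        for emb, resp in self.entries:    # semantic hit: similar prompt
            sim = cosine(embedding, emb)
            if sim > best_sim:
                best, best_sim = resp, sim
        return best if best_sim >= self.threshold else None

    def put(self, prompt, embedding, response):
        self.exact[prompt] = response
        self.entries.append((embedding, response))
```

A near-duplicate prompt ("hey" vs. "hi") with a close embedding hits the semantic layer even though the exact key differs; an unrelated prompt misses both layers and falls through to the provider.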

Smart Routing

Intent-based routing that picks the right model automatically.
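The idea behind intent-based routing can be shown with a minimal sketch. The keywords, model names, and routing table below are hypothetical stand-ins, not Hyperion's actual rules:

```python
# Hypothetical routing table: (intent keyword, model to use).
ROUTES = [
    ("code", "code-specialist-model"),    # code generation / review
    ("summarize", "fast-cheap-model"),    # light summarization work
]
DEFAULT_MODEL = "general-purpose-model"

def route(prompt: str) -> str:
    """Pick a model by matching intent keywords in the prompt."""
    lowered = prompt.lower()
    for keyword, model in ROUTES:
        if keyword in lowered:
            return model
    return DEFAULT_MODEL
```

In practice a gateway would classify intent with something richer than substring matching (e.g. an embedding classifier), but the contract is the same: prompt in, model choice out.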

Go Architecture

5µs median latency. 20k RPS. Zero GC pauses.

Observability

Deep telemetry and spend tracking across all models via the Admin API.

Resilience

Automatic fallbacks and retries for zero-downtime AI.
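The fallback-and-retry behavior can be sketched as a loop over an ordered provider list. The `call_provider` hook, `ProviderError` type, and provider names are illustrative assumptions, not Hyperion's actual API:

```python
import time

class ProviderError(Exception):
    """Stand-in for a transient provider failure."""

def call_with_fallback(prompt, providers, call_provider, retries=2, backoff=0.0):
    """Try each provider in order, retrying transient failures
    with exponential backoff before falling through to the next."""
    last_err = None
    for provider in providers:
        for attempt in range(retries):
            try:
                return call_provider(provider, prompt)
            except ProviderError as err:
                last_err = err
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise last_err  # every provider exhausted its retries
```

If the primary provider is down, the caller never sees an error: the request is retried, then transparently served by the next provider in the chain.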

Native SDKs

High-performance Python and TypeScript wrappers.

Get Started
Quick Start Guide

Deploy locally in under two minutes.

Deep Dive
Performance Benchmarks

See why Hyperion hits 20k RPS at 5µs.

Last updated: Feb 22, 2026