Hyperion Documentation

The Operating System for your LLM stack.

Hyperion is a production-grade AI gateway designed to unify, cache, and secure your LLM infrastructure: a lightweight, low-overhead proxy that sits between your application and your intelligence providers.

Multi-Layer Caching

Exact-match and semantic (vector) caching to cut latency by up to 90% and reduce API costs.
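The two cache layers can be sketched in a few lines. This is an illustrative model of the technique, not Hyperion's actual implementation: the class name, threshold, and toy embeddings are assumptions, and a real deployment would embed prompts with a model.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Exact-match lookup first, then nearest-neighbor fallback."""

    def __init__(self, threshold=0.9):
        self.exact = {}        # prompt -> response (exact layer)
        self.entries = []      # (embedding, response)  (semantic layer)
        self.threshold = threshold

    def get(self, prompt, embedding):
        if prompt in self.exact:          # exact hit: cheapest path
            return self.exact[prompt]
        best, best_sim = None, 0.0
        for emb, resp in self.entries:    # semantic hit: similar prompt
            sim = cosine(embedding, emb)
            if sim > best_sim:
                best, best_sim = resp, sim
        return best if best_sim >= self.threshold else None

    def put(self, prompt, embedding, response):
        self.exact[prompt] = response
        self.entries.append((embedding, response))
```

A near-duplicate prompt ("hey" vs. "hi") with a close embedding hits the semantic layer even though the exact key differs; an unrelated prompt misses both layers and falls through to the provider.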

Smart Routing

Intent-based routing that picks the right model automatically.
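The idea behind intent-based routing can be shown with a minimal sketch. The keywords, model names, and routing table below are hypothetical stand-ins, not Hyperion's actual rules:

```python
# Hypothetical routing table: (intent keyword, model to use).
ROUTES = [
    ("code", "code-specialist-model"),    # code generation / review
    ("summarize", "fast-cheap-model"),    # light summarization work
]
DEFAULT_MODEL = "general-purpose-model"

def route(prompt: str) -> str:
    """Pick a model by matching intent keywords in the prompt."""
    lowered = prompt.lower()
    for keyword, model in ROUTES:
        if keyword in lowered:
            return model
    return DEFAULT_MODEL
```

In practice a gateway would classify intent with something richer than substring matching (e.g. an embedding classifier), but the contract is the same: prompt in, model choice out.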

Go Architecture

5µs median latency. 20k RPS. Zero GC pauses.

Observability

Deep telemetry and spend tracking across all models via the Admin API.

Resilience

Automatic fallbacks and retries for zero-downtime AI.
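The fallback-and-retry behavior can be sketched as a loop over an ordered provider list. The `call_provider` hook, `ProviderError` type, and provider names are illustrative assumptions, not Hyperion's actual API:

```python
import time

class ProviderError(Exception):
    """Stand-in for a transient provider failure."""

def call_with_fallback(prompt, providers, call_provider, retries=2, backoff=0.0):
    """Try each provider in order, retrying transient failures
    with exponential backoff before falling through to the next."""
    last_err = None
    for provider in providers:
        for attempt in range(retries):
            try:
                return call_provider(provider, prompt)
            except ProviderError as err:
                last_err = err
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise last_err  # every provider exhausted its retries
```

If the primary provider is down, the caller never sees an error: the request is retried, then transparently served by the next provider in the chain.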

Native SDKs

High-performance Python and TypeScript wrappers.

Get Started
Quick Start Guide

Deploy locally in under two minutes.

Deep Dive
Performance Benchmarks

See why Hyperion hits 20k RPS at 5µs.

Last updated: Feb 22, 2026