Architecture

Request Lifecycle

Hyperion is a cache-first gateway. Each request goes through auth, policy checks, cache lookup, routing, and provider execution. This page defines the runtime flow and the purpose of each stage.

The Hyperion Pipeline

Every request entering the Hyperion gateway passes through a fixed sequence of stages, in the order listed below. The pipeline is written in Go and uses non-blocking I/O so that the security, caching, and routing checks add little overhead to each request.

Identity & Context

API keys are validated, and the tenant context (org, team, user) is resolved from the database.

Policy Enforcement

Rate limits and budget quotas are checked in Redis. Requests exceeding limits are rejected immediately.

Multi-Tier Cache Lookup

An L1 exact-match lookup runs first. On a miss, and only if L2 is enabled, a vector similarity search is executed.

Routing Decision

The router selects the optimal provider/model combination based on intent or explicit overrides.

Execution & Stream

The request is dispatched to the upstream provider. Response chunks are streamed back to the client.

Audit & Writeback

Metrics are emitted to the analytics engine, and the response is stored in the cache if eligible.
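The stage ordering above can be sketched as a chain of functions that stops at the first rejection. This is a minimal illustration, not Hyperion's actual code: the `Request` and `Stage` types, the stage functions, and the in-memory map standing in for the cache are all hypothetical, and the policy and routing stages are left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// Request is a hypothetical, simplified view of an in-flight request.
type Request struct {
	APIKey   string
	Prompt   string
	Tenant   string // filled in by the identity stage
	Response string // filled in by a cache hit or by provider execution
	CacheHit bool
}

// Stage is one pipeline step. Returning an error rejects the request.
type Stage func(*Request) error

var errUnauthorized = errors.New("invalid API key")

// runPipeline executes the stages in order and stops at the first error.
func runPipeline(req *Request, stages ...Stage) error {
	for _, stage := range stages {
		if err := stage(req); err != nil {
			return err
		}
	}
	return nil
}

// identity rejects requests without a key and resolves a placeholder tenant.
func identity(req *Request) error {
	if req.APIKey == "" {
		return errUnauthorized
	}
	req.Tenant = "org-demo" // stand-in for the database lookup
	return nil
}

// cacheLookup returns a stage bound to a map that stands in for the L1 tier.
func cacheLookup(cache map[string]string) Stage {
	return func(req *Request) error {
		if resp, ok := cache[req.Prompt]; ok {
			req.Response, req.CacheHit = resp, true
		}
		return nil
	}
}

// execute fakes the upstream provider call and is skipped on a cache hit.
func execute(req *Request) error {
	if !req.CacheHit {
		req.Response = "upstream: " + req.Prompt
	}
	return nil
}

// writeback stores fresh responses so that later identical requests hit.
func writeback(cache map[string]string) Stage {
	return func(req *Request) error {
		if !req.CacheHit {
			cache[req.Prompt] = req.Response
		}
		return nil
	}
}

func main() {
	cache := map[string]string{}
	stages := []Stage{identity, cacheLookup(cache), execute, writeback(cache)}
	for i := 0; i < 2; i++ {
		req := &Request{APIKey: "demo-key", Prompt: "hello"}
		err := runPipeline(req, stages...)
		fmt.Println(req.CacheHit, req.Response, err)
	}
}
```

Running the same request twice shows the intended effect of the writeback stage: the first pass reaches the fake provider, and the second is served from the map.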

L1 Exact Cache

A deterministic, hash-based cache for requests with identical prompts and parameters. It is the lowest-latency path and the default cache tier.
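One way to derive a deterministic key for an exact-match cache is to hash a canonical serialization of the prompt and its parameters. The sketch below assumes SHA-256 over JSON. The `cacheInput` fields and the `exactKey` function are hypothetical, and the source does not say which hash or which parameters Hyperion actually uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// cacheInput lists the request fields assumed to affect the response.
// The field set is illustrative only.
type cacheInput struct {
	Model       string  `json:"model"`
	Prompt      string  `json:"prompt"`
	Temperature float64 `json:"temperature"`
	MaxTokens   int     `json:"max_tokens"`
}

// exactKey returns a hex-encoded SHA-256 digest of the serialized input.
// Struct fields are marshaled in declaration order, so identical inputs
// always produce identical bytes and therefore identical keys.
func exactKey(in cacheInput) (string, error) {
	b, err := json.Marshal(in)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	in := cacheInput{Model: "example-model", Prompt: "hello", Temperature: 0.2, MaxTokens: 64}
	key, err := exactKey(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(key)
}
```

Because every field feeds the hash, changing any parameter (for example the temperature) yields a different key, which is what makes the tier an exact match.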

L2 Semantic Cache

A similarity-based cache for semantically close prompts. It is an optional tier that trades exactness for a higher hit rate, with precision controlled by the match criteria.
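A semantic lookup can be pictured as a nearest-neighbor search over prompt embeddings, gated by a similarity threshold. The sketch below assumes cosine similarity and a linear scan. The `entry` type, the `semanticLookup` function, and the threshold value are hypothetical, and a real deployment would more likely query a vector index.

```go
package main

import (
	"fmt"
	"math"
)

// entry pairs a stored prompt embedding with its cached response.
type entry struct {
	Embedding []float64
	Response  string
}

// cosine returns the cosine similarity of a and b, or 0 when the vectors
// are empty, have different lengths, or have zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// semanticLookup returns the response of the most similar entry whose
// similarity to the query is at least threshold.
func semanticLookup(entries []entry, query []float64, threshold float64) (string, bool) {
	best, bestScore := -1, threshold
	for i, e := range entries {
		if score := cosine(e.Embedding, query); score >= bestScore {
			best, bestScore = i, score
		}
	}
	if best < 0 {
		return "", false
	}
	return entries[best].Response, true
}

func main() {
	entries := []entry{
		{Embedding: []float64{1, 0}, Response: "cached answer A"},
		{Embedding: []float64{0, 1}, Response: "cached answer B"},
	}
	resp, ok := semanticLookup(entries, []float64{0.9, 0.1}, 0.9)
	fmt.Println(resp, ok)
}
```

Raising the threshold favors precision (fewer, closer matches), while lowering it favors hit rate. That is the trade-off this tier exposes.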

Operational Notes

Use explicit provider/model for deterministic behavior in production flows.
Use `model=auto` only when a smart-routing policy is configured for your org.
Use the debug profiling headers to measure per-stage latency and identify slow stages.
Prefer streaming for user-facing chat to improve perceived latency.

Execution Metadata

The gateway returns diagnostic metadata as HTTP headers on the response.

Gateway Headers

Diagnostic header names start with X-Hyperion- or X-Cache-.

route_intent

string

Router intent classification for the request.

cheap_fast, balanced, high_reasoning

route_decision

string

Final route decision mode used by the gateway.

cache_status

string

Cache result for the request path.

HIT, MISS, BYPASS

