Request Lifecycle
Hyperion is a cache-first gateway. Each request passes through authentication, policy checks, cache lookup, routing, and provider execution. This page describes the runtime flow and the purpose of each stage.
The Hyperion Pipeline
Every request entering the Hyperion gateway passes through a fixed, ordered execution pipeline. The pipeline is implemented in Go and uses non-blocking I/O so that security, caching, and routing checks add minimal overhead to each request.
Identity & Context
API keys are validated, and the tenant context (org, team, user) is resolved from the database.
Policy Enforcement
Rate limits and budget quotas are checked in Redis. Requests exceeding limits are rejected immediately.
Multi-Tier Cache Lookup
An L1 exact-match lookup is performed first. On a miss, if L2 is enabled, a vector similarity search is executed.
Routing Decision
The router selects the provider/model combination, either from the request's classified intent or from an explicit override.
Execution & Stream
The request is dispatched to the upstream provider. Response chunks are streamed back to the client.
Audit & Writeback
Metrics are emitted to the analytics engine, and the response is stored in the cache if eligible.
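The six stages can be sketched as a single Go handler. This is a minimal illustration, not Hyperion's implementation: the `Gateway`, `Request`, and `Response` types are hypothetical, plain maps stand in for the database, the Redis quotas, and the L1 cache, and `Provider` is a placeholder function for the upstream call. It shows the stage ordering and the two early exits (a rejected request and a cache hit); streaming is omitted.

```go
package main

import "errors"

// Hypothetical types for this sketch; they are not Hyperion's actual API.
type Request struct {
	APIKey string
	Prompt string
	Model  string // "" or "auto" means the router decides
}

type Response struct {
	Body      string
	Model     string
	FromCache bool
}

var (
	ErrUnauthorized = errors.New("invalid API key")
	ErrOverLimit    = errors.New("rate limit or budget exceeded")
)

// Gateway holds in-memory stand-ins for the real backing services.
type Gateway struct {
	Tenants   map[string]string                 // API key -> tenant (stand-in for the database)
	Remaining map[string]int                    // tenant -> requests left (stand-in for Redis quotas)
	Cache     map[string]string                 // prompt -> cached body (simplified L1; real keys include parameters)
	Provider  func(model, prompt string) string // stand-in for the upstream provider call
	Metrics   []string                          // stand-in for the analytics engine
}

// Handle runs the stages in order; early stages can end the request.
func (g *Gateway) Handle(req Request) (Response, error) {
	// 1. Identity & Context: validate the key and resolve the tenant.
	tenant, ok := g.Tenants[req.APIKey]
	if !ok {
		return Response{}, ErrUnauthorized
	}
	// 2. Policy Enforcement: reject immediately when the quota is exhausted.
	if g.Remaining[tenant] <= 0 {
		return Response{}, ErrOverLimit
	}
	g.Remaining[tenant]--
	// 3. Cache lookup: a hit skips routing and execution.
	if body, hit := g.Cache[req.Prompt]; hit {
		g.Metrics = append(g.Metrics, tenant+":cache_hit")
		return Response{Body: body, FromCache: true}, nil
	}
	// 4. Routing: keep an explicit model, otherwise use a placeholder default.
	model := req.Model
	if model == "" || model == "auto" {
		model = "default-model" // hypothetical stand-in for smart routing
	}
	// 5. Execution: call the upstream provider.
	body := g.Provider(model, req.Prompt)
	// 6. Audit & Writeback: emit a metric and store the response.
	g.Metrics = append(g.Metrics, tenant+":cache_miss")
	g.Cache[req.Prompt] = body
	return Response{Body: body, Model: model}, nil
}
```

Because policy checks run before the cache lookup, a cache hit in this sketch still counts against the tenant's quota; only routing and execution are skipped.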
L1 Exact Cache
Deterministic hash-based cache for identical prompts and parameters. Lowest latency path and default cache tier.
L2 Semantic Cache
Similarity-based cache for semantically close prompts. Optional tier that trades some precision for a higher hit rate.
Operational Notes
- Specify an explicit provider and model for deterministic behavior in production flows.
- Use `model=auto` only when a smart-routing policy is configured for your org.
- Profile latency using debug profiling headers to identify slow stages.
- Prefer streaming for user-facing chat to improve perceived latency.
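The first two notes can be illustrated with a small route-resolution sketch in Go. It assumes, hypothetically, that an explicit choice is written as a `provider/model` string, that `model=auto` is rejected when the org has no smart-routing policy, and that the `classify` callback stands in for the router's intent classification. `Route`, `ResolveRoute`, and `ErrNoSmartRouting` are illustrative names, not Hyperion's API.

```go
package main

import (
	"errors"
	"strings"
)

// Route is the provider/model pair a request is dispatched to.
type Route struct {
	Provider string
	Model    string
	Mode     string // "explicit" or "auto"
}

// ErrNoSmartRouting is a hypothetical error for auto routing without a policy.
var ErrNoSmartRouting = errors.New(`model "auto" requires a smart-routing policy for this org`)

// ResolveRoute pins explicit "provider/model" values and delegates "auto"
// to the classifier only when the org has smart routing configured.
func ResolveRoute(model string, smartRoutingEnabled bool, classify func() Route) (Route, error) {
	if model == "auto" {
		if !smartRoutingEnabled {
			return Route{}, ErrNoSmartRouting
		}
		r := classify() // stand-in for intent classification
		r.Mode = "auto"
		return r, nil
	}
	provider, name, ok := strings.Cut(model, "/")
	if !ok || provider == "" || name == "" {
		return Route{}, errors.New(`expected "provider/model", got ` + model)
	}
	return Route{Provider: provider, Model: name, Mode: "explicit"}, nil
}
```

In this sketch an explicit value such as `provider-a/model-x` always resolves to the same route, while `auto` hands the choice to the classifier, so the selected route may differ from one request to the next.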
Execution Metadata
These diagnostic HTTP headers are returned directly by the gateway.
Gateway Headers
Router intent classification for the request.
Final route decision mode used by the gateway.
Cache result for the request path.
Exact vs semantic cache behavior, TTL, and metadata.