Features

Budgets & Limits

Control your LLM spend with fine-grained limits. Hyperion runs Lua scripts inside Redis so that budget checks and updates are atomic and race-free across your entire organization.

01 — GRANULARITY

Multi-Level Scoping

Budgets are not limited to the tenant level. You can set USD spend limits per Organization, per individual User, or per API Key.

02 — ATOMICITY

Race-Free Execution

Cost estimation and budget reservation happen synchronously before the LLM call. Both run inside a Redis Lua script, so concurrent requests cannot each pass a check against the same remaining balance and overspend the budget.
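
As a rough illustration of why the check and the reservation must be a single step, here is a minimal Python sketch. It is not Hyperion's implementation: an in-process lock stands in for the atomicity that a Redis Lua script provides, and the BudgetStore class, its method names, and the micro-USD integer unit are all assumptions made for this example.

```python
import threading


class BudgetStore:
    """In-memory stand-in for a budget counter kept in Redis (hypothetical)."""

    def __init__(self, limit_micro_usd: int) -> None:
        self.limit = limit_micro_usd
        self.reserved = 0
        # The lock plays the role of Lua-script atomicity in Redis.
        self._lock = threading.Lock()

    def try_reserve(self, amount_micro_usd: int) -> bool:
        """Check and reserve in one indivisible step; never exceeds the limit."""
        with self._lock:
            if self.reserved + amount_micro_usd > self.limit:
                return False
            self.reserved += amount_micro_usd
            return True


def run_concurrent_reservations(store: BudgetStore, amount: int, workers: int) -> int:
    """Fire `workers` simultaneous reservations and count how many succeed."""
    results: list[bool] = []

    def worker() -> None:
        results.append(store.try_reserve(amount))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

With a limit of 1,000,000 micro-USD and ten concurrent requests of 300,000 each, exactly three reservations can succeed, regardless of thread scheduling. If the check and the increment were separate steps, more than three could slip through.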

03 — ANALYTICS

Real-Time Tracking

The gateway holds a live model pricing catalog, allowing for real-time cost estimation and accurate settlement post-inference.
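
The sketch below shows one way a pricing catalog could turn token counts into a USD cost. The catalog layout, the model name, and the prices are placeholders invented for this example; they do not reflect Hyperion's catalog format or any real provider's rates.

```python
from decimal import Decimal

# Hypothetical catalog: USD per 1,000,000 tokens. Placeholder values only.
PRICING = {
    "example-model": {"input": Decimal("1.00"), "output": Decimal("2.00")},
}

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Price a request from token counts using the catalog entry for `model`."""
    prices = PRICING[model]
    total = prices["input"] * input_tokens + prices["output"] * output_tokens
    return total / TOKENS_PER_PRICE_UNIT
```

The same function can serve both purposes: called with an upper bound on output tokens it yields a pre-call estimate, and called with the usage reported by the provider it yields the amount to settle. Decimal is used here to avoid floating-point rounding in monetary values.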

04 — GOVERNANCE

Alert Tiers & Cutoff

Configure custom threshold alerts (e.g., at 80% and 90% of the spend limit). Set an auto-cutoff percentage to halt traffic once spend reaches that share of the limit.
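
To make the threshold logic concrete, here is a small Python sketch of how alert tiers and a cutoff might be evaluated against current spend. The BudgetPolicy fields and the evaluate function are hypothetical names chosen for this example, not Hyperion configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetPolicy:
    """Hypothetical policy: thresholds are fractions of the spend limit."""
    limit_usd: float
    alert_thresholds: tuple[float, ...] = (0.80, 0.90)
    cutoff_at: float = 1.00


def evaluate(policy: BudgetPolicy, spent_usd: float) -> dict:
    """Return the alert tiers already crossed and whether traffic is blocked."""
    ratio = spent_usd / policy.limit_usd
    crossed = [t for t in sorted(policy.alert_thresholds) if ratio >= t]
    return {"alerts": crossed, "blocked": ratio >= policy.cutoff_at}
```

For a 100 USD limit, 85 USD of spend crosses only the 80% tier and leaves traffic flowing, while 100 USD crosses both tiers and triggers the cutoff.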

The Hierarchy of Limits

Organization Budgets

The overarching monthly spend limit for all users and keys within the tenant.

User Member Budgets

Cap the inference spending of individual team members or service accounts.

API Key Limits

Enforce strict limits on individual API keys, ideal when exposing keys to external or untrusted clients.
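
One plausible reading of the hierarchy above is that a request must fit within every budget that applies to it: the organization's, the user's, and the key's. The Python sketch below illustrates that all-or-nothing rule. The scope names, the cents unit, and the rule itself are assumptions for illustration; the actual enforcement order in Hyperion may differ.

```python
def reserve_all_scopes(remaining: dict[str, int], scopes: list[str], amount: int) -> bool:
    """Reserve `amount` from every scope, or from none if any scope lacks headroom.

    `remaining` maps a hypothetical scope name to its remaining budget in cents.
    """
    # Check every scope first so that a rejection leaves all balances untouched.
    if any(remaining[scope] < amount for scope in scopes):
        return False
    for scope in scopes:
        remaining[scope] -= amount
    return True
```

In this sketch the tightest budget decides the outcome: a key with 50 cents left blocks a 60-cent request even when the user and organization budgets have plenty of room. In a real deployment the check and the decrements would need to run as one atomic operation.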

How Settlement Works

Hyperion bills in two phases: reserve, then settle. Before an inference request is dispatched, Hyperion reserves an estimated cost from the budget, based on the model's token limits. Once the provider finishes streaming the response, Hyperion calculates the exact token usage and settles the final amount against the budget.
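
The reserve-then-settle flow can be sketched in a few lines of Python. This is an illustrative model only: the Budget class, its fields, and the cents unit are assumptions, and a production version would perform each step atomically in the budget store.

```python
class Budget:
    """Hypothetical budget that tracks settled spend and open reservations (in cents)."""

    def __init__(self, limit_cents: int) -> None:
        self.limit = limit_cents
        self.spent = 0      # settled, final charges
        self.reserved = 0   # estimates held for in-flight requests

    def available(self) -> int:
        return self.limit - self.spent - self.reserved

    def reserve(self, estimate_cents: int) -> bool:
        """Phase 1: hold the estimated cost before the request is dispatched."""
        if estimate_cents > self.available():
            return False
        self.reserved += estimate_cents
        return True

    def settle(self, estimate_cents: int, actual_cents: int) -> None:
        """Phase 2: release the hold and record the actual cost."""
        self.reserved -= estimate_cents
        # If the actual cost is below the estimate, the difference becomes
        # available again; if it is above, the full actual cost is still charged.
        self.spent += actual_cents
```

For example, with a 1,000-cent budget, reserving 300 cents leaves 700 available while the request is in flight. If the response ends up costing 120 cents, settlement brings the available balance to 880.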