Budgets & Limits
Control your LLM spend with fine-grained limits. Hyperion enforces budgets with Lua scripts that run natively inside Redis. Redis executes each script atomically, so concurrent requests cannot race past a limit anywhere in your organization.
01 — GRANULARITY
Multi-Level Scoping
Budgets are not just per-tenant: you can set USD spend limits at the Organization, individual User, or specific API Key level.
02 — ATOMICITY
Race-Free Execution
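Cost estimation and budget reservation happen synchronously, before the LLM call, using Redis Lua scripts. Because the balance check and the reservation run as one atomic step, two concurrent requests cannot both claim the same remaining budget.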
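To show why the check and the reservation must be a single indivisible step, here is a minimal Python sketch. It does not talk to Redis: a `threading.Lock` stands in for the atomicity Redis gives a Lua script (no other command runs while the script executes). The `Budget` class, `race` helper and integer micro-USD units are hypothetical illustrations, not Hyperion's data model.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Budget:
    """Hypothetical in-memory budget; amounts are integer micro-USD."""
    limit: int
    spent: int = 0     # settled spend
    reserved: int = 0  # held for in-flight requests
    _lock: threading.Lock = field(default_factory=threading.Lock,
                                  repr=False, compare=False)

    def try_reserve(self, amount: int) -> bool:
        """Check headroom and reserve `amount` as one indivisible step.

        The lock stands in for a Redis Lua script: no other request can
        run its own check between this check and this write.
        """
        with self._lock:
            if self.spent + self.reserved + amount > self.limit:
                return False
            self.reserved += amount
            return True


def race(budget: Budget, amount: int, workers: int) -> int:
    """Fire `workers` concurrent reservations; return how many succeeded."""
    results: list[bool] = []
    results_lock = threading.Lock()

    def worker() -> None:
        ok = budget.try_reserve(amount)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


if __name__ == "__main__":
    b = Budget(limit=1_000_000)   # $1.00
    print(race(b, 300_000, 10))   # 3: only three $0.30 holds fit in $1.00
    print(b.reserved)             # 900000
```

Without the lock, two threads could both read the same balance, both see enough headroom, and both reserve; that over-commitment is exactly what running the check-and-reserve inside one Lua script rules out.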
03 — ANALYTICS
Real-Time Tracking
The gateway maintains a live catalog of model pricing, which it uses to estimate cost in real time before a request and to settle the exact cost after inference.
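A pricing catalog turns token counts into money. The sketch below assumes per-million-token input and output prices (a common provider convention); the model names, prices and function names are hypothetical. Before the call it prices the worst case, assuming the full output allowance is used; after the call it prices the actual token counts. Amounts are integer micro-USD, rounded up so a request is never under-charged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical catalog entry: micro-USD per one million tokens."""
    input_per_mtok: int
    output_per_mtok: int


# Hypothetical model names and prices, for illustration only.
CATALOG = {
    "example-small": ModelPrice(input_per_mtok=150_000, output_per_mtok=600_000),
    "example-large": ModelPrice(input_per_mtok=2_500_000, output_per_mtok=10_000_000),
}


def cost_micro_usd(model: str, input_tokens: int, output_tokens: int) -> int:
    """Price a request in micro-USD, rounding up to the next whole micro-USD."""
    p = CATALOG[model]
    raw = input_tokens * p.input_per_mtok + output_tokens * p.output_per_mtok
    return -(-raw // 1_000_000)  # ceiling division by one million tokens


def estimate_before_call(model: str, prompt_tokens: int, max_output_tokens: int) -> int:
    """Worst case: assume the model uses its whole output allowance."""
    return cost_micro_usd(model, prompt_tokens, max_output_tokens)


if __name__ == "__main__":
    print(estimate_before_call("example-large", 1_200, 4_096))  # 43960 ($0.04396)
    print(cost_micro_usd("example-large", 1_200, 350))          # 6500 ($0.0065)
```

Using integers for money avoids floating-point drift when many small charges are summed against the same budget.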
04 — GOVERNANCE
Alert Tiers & Cutoff
Configure alerts at custom spend thresholds (e.g., at 80% and 90% of a budget). Set an auto-cutoff percentage to halt traffic automatically once spend reaches that level.
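One way to evaluate alert tiers and the cutoff after each settlement is sketched below. The function name, its parameters and the "fire each alert once" policy are assumptions for illustration; the text above does not specify how Hyperion deduplicates alerts. The cutoff result here only says whether new requests should be rejected.

```python
def check_thresholds(spent: int, limit: int, alert_pcts: list[int],
                     cutoff_pct: int, fired: set[int]) -> tuple[list[int], bool]:
    """Return (alerts to send now, whether new traffic should be cut off).

    `fired` records thresholds already alerted on, so each alert fires
    only once; it is updated in place. Comparisons use integer math
    (spent * 100 >= pct * limit) to avoid float rounding at boundaries.
    """
    new_alerts = [
        p for p in sorted(alert_pcts)
        if p not in fired and spent * 100 >= p * limit
    ]
    fired.update(new_alerts)
    blocked = spent * 100 >= cutoff_pct * limit
    return new_alerts, blocked


if __name__ == "__main__":
    fired: set[int] = set()
    limit = 100_000_000  # $100 in micro-USD
    print(check_thresholds(79_000_000, limit, [80, 90], 100, fired))   # ([], False)
    print(check_thresholds(85_000_000, limit, [80, 90], 100, fired))   # ([80], False)
    print(check_thresholds(100_000_000, limit, [80, 90], 100, fired))  # ([90], True)
```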
The Hierarchy of Limits
Organization Budgets
The overarching monthly spend limit for all users and keys within the tenant.
User Member Budgets
Cap the inference spending of individual team members or service accounts.
API Key Limits
Enforce strict limits on individual API keys, ideal for keys handed to external or untrusted clients.
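The page does not spell out how the three levels combine. A natural reading, assumed here, is that a request must fit within every applicable budget (its Organization, its User and its API Key), and that a rejected request should not consume headroom at any level. The sketch below reserves all-or-nothing across scopes; `ScopedBudgets` and the `(kind, id)` scope tuples are hypothetical, and the lock again stands in for a single Redis Lua script, which can read and update several keys in one atomic call.

```python
import threading


class ScopedBudgets:
    """Hypothetical multi-scope budget store; amounts in micro-USD."""

    def __init__(self, limits: dict[tuple[str, str], int]):
        self._limits = dict(limits)
        self._used = {scope: 0 for scope in limits}
        self._lock = threading.Lock()  # stands in for one Redis Lua script

    def try_reserve(self, scopes: list[tuple[str, str]], amount: int) -> bool:
        """Reserve `amount` on every limited scope, or on none of them."""
        # Scopes with no configured limit are treated as unlimited.
        limited = [s for s in scopes if s in self._limits]
        with self._lock:
            # First verify that every scope has headroom...
            for s in limited:
                if self._used[s] + amount > self._limits[s]:
                    return False
            # ...and only then charge all of them.
            for s in limited:
                self._used[s] += amount
            return True

    def used(self, scope: tuple[str, str]) -> int:
        return self._used.get(scope, 0)


if __name__ == "__main__":
    b = ScopedBudgets({("org", "acme"): 10_000_000,
                       ("user", "alice"): 5_000_000,
                       ("key", "k-ext"): 3_000_000})
    req = [("org", "acme"), ("user", "alice"), ("key", "k-ext")]
    print(b.try_reserve(req, 2_000_000))  # True
    print(b.try_reserve(req, 2_000_000))  # False: the key limit would be exceeded
    print(b.used(("org", "acme")))        # 2000000: the rejection charged nothing
```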
How Settlement Works
Hyperion bills in two phases, reserve and then settle (similar in spirit to a two-phase commit). Before an inference request is dispatched, an estimated cost, based on the model's token limits, is reserved from the budget. Once the provider finishes streaming the response, Hyperion calculates the exact token usage and settles the final amount against the budget, replacing the reservation with the actual cost.
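The reserve-then-settle flow can be sketched for a single budget as follows. `TwoPhaseLedger` and its methods are hypothetical names, and the sketch is single-threaded for brevity, so it does not model the Redis Lua scripts that make each step atomic. It also assumes, as its own design choice, that settling with a cost of zero is how a hold is released when the provider call fails.

```python
import itertools
from typing import Optional


class TwoPhaseLedger:
    """Hypothetical single-budget ledger; amounts in micro-USD."""

    def __init__(self, limit: int):
        self.limit = limit
        self.spent = 0
        self._holds: dict[int, int] = {}  # reservation id -> held estimate
        self._ids = itertools.count(1)

    @property
    def reserved(self) -> int:
        return sum(self._holds.values())

    def reserve(self, estimate: int) -> Optional[int]:
        """Phase 1: hold the estimate; return a reservation id, or None if it won't fit."""
        if self.spent + self.reserved + estimate > self.limit:
            return None
        rid = next(self._ids)
        self._holds[rid] = estimate
        return rid

    def settle(self, rid: int, actual: int) -> None:
        """Phase 2: drop the hold and charge the measured cost instead.

        Settling with actual=0 simply releases the hold.
        Raises KeyError if `rid` was already settled.
        """
        del self._holds[rid]
        self.spent += actual


if __name__ == "__main__":
    ledger = TwoPhaseLedger(limit=60_000)
    r1 = ledger.reserve(43_960)              # worst-case estimate is held
    print(ledger.reserve(43_960))            # None: 87,920 would exceed 60,000
    ledger.settle(r1, 6_500)                 # actual usage was much lower
    print(ledger.spent, ledger.reserved)     # 6500 0
    print(ledger.reserve(43_960) is not None)  # True: headroom was freed
```

Holding the worst case up front keeps concurrent requests from collectively overspending; settling promptly returns the unused part of each hold so that later requests are not starved.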