Performance
Benchmark Results
These measurements isolate gateway overhead from LLM generation time. Benchmarks run against a mock upstream, so the results reflect request parsing, policy checks, routing, and response handling inside Hyperion.
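Conceptually, isolating gateway overhead means taking the total time a request spends in the gateway and subtracting the time spent waiting on the upstream. The sketch below assumes the gateway records four timestamps per request. `RequestTiming`, its field names, and `gateway_overhead_us` are hypothetical illustrations, not Hyperion's actual instrumentation.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical per-request timestamps, in microseconds, recorded inside the gateway."""
    received_us: float       # request arrived at the gateway
    upstream_sent_us: float  # request forwarded to the (mock) upstream
    upstream_done_us: float  # upstream response fully received
    responded_us: float      # response written back to the client


def gateway_overhead_us(t: RequestTiming) -> float:
    """Time spent inside the gateway, excluding time spent waiting on the upstream."""
    total = t.responded_us - t.received_us
    upstream = t.upstream_done_us - t.upstream_sent_us
    return total - upstream


# Example: 1,000 us spent waiting on the upstream, 6 us spent in the gateway itself.
timing = RequestTiming(received_us=0.0, upstream_sent_us=4.0, upstream_done_us=1004.0, responded_us=1006.0)
print(gateway_overhead_us(timing))  # 6.0
```

With an instant mock upstream the subtracted term is close to zero, so nearly all measured time is gateway work.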
- Best Throughput: 30,063 RPS, observed at concurrency 100.
- Best P99: 0.38 ms, observed at concurrency 1.
- Golden Ratio: 25,667 RPS with a P99 latency of 1.72 ms, observed at concurrency 10.
- Overhead Floor: 5.88 us average gateway overhead, observed at concurrency 1.
Methodology
- `cbenchmark` runs inside Docker to reduce host-side network variance.
- The mock upstream returns instantly, so provider latency does not pollute gateway metrics.
- A 0% cache hit ratio forces every request through the full processing path, giving an honest measurement of overhead.
- All scenarios use a fixed request count at varying concurrency levels to expose queueing behavior.
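The fixed-count, varied-concurrency methodology can be sketched as a small closed-loop load generator. This is a hypothetical illustration, not `cbenchmark` itself: `mock_upstream` models the instant mock upstream as an in-process coroutine, and in a real run `send` would issue an HTTP request to the gateway instead. Caching is not modeled.

```python
import asyncio
import math
import time
from typing import Awaitable, Callable


async def mock_upstream() -> str:
    """Stand-in for the instant mock upstream (hypothetical, in-process)."""
    await asyncio.sleep(0)  # yield once, like a network call that completes immediately
    return "ok"


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


async def run_scenario(
    send: Callable[[], Awaitable[str]], total_requests: int, concurrency: int
) -> dict:
    """Send a fixed number of requests with at most `concurrency` in flight at once."""
    latencies_ms: list[float] = []
    work = iter(range(total_requests))  # shared queue: total count does not depend on concurrency

    async def client() -> None:
        # Closed loop: each client sends its next request only after the previous one returns.
        for _ in work:
            start = time.perf_counter()
            await send()
            latencies_ms.append((time.perf_counter() - start) * 1000)

    wall_start = time.perf_counter()
    await asyncio.gather(*(client() for _ in range(concurrency)))
    elapsed_s = time.perf_counter() - wall_start
    return {
        "concurrency": concurrency,
        "requests": len(latencies_ms),
        "rps": len(latencies_ms) / elapsed_s,
        "p99_ms": percentile(latencies_ms, 99),
    }


async def main() -> None:
    # Same request count for every scenario; only concurrency changes.
    for concurrency in (1, 10, 100):
        print(await run_scenario(mock_upstream, total_requests=10_000, concurrency=concurrency))


if __name__ == "__main__":
    asyncio.run(main())
```

The shared iterator keeps the total request count identical across scenarios, so differences in throughput and tail latency come from concurrency alone. Nearest-rank is used for percentiles for simplicity; some tools interpolate between samples instead.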
Scenario Matrix
| Profile | Concurrency | Throughput (RPS) | P99 Latency (ms) | Overhead Avg (us) | Overhead P95 (us) | Notes |
|---|---|---|---|---|---|---|
| Sequential | 1 | 6,191 | 0.38 | 5.88 | 11 | Lowest scheduler contention; baseline for raw gateway cost. |
| Golden Ratio | 10 | 25,667 | 1.72 | 10.60 | 14 | Best throughput/latency balance for sustained traffic. |
| High Concurrency | 100 | 30,063 | 13.76 | 101.81 | N/A | Throughput ceiling mode; tail latency rises under queue pressure. |
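Because the matrix reports both concurrency and throughput, Little's law (L = λW) offers a quick consistency check: for a closed-loop load generator that keeps every client busy, the mean time a request spends in flight is concurrency divided by throughput. The sketch below assumes that closed-loop model; the row values are copied from the table, and `implied_mean_latency_ms` is a hypothetical helper.

```python
# Rows copied from the scenario matrix: (profile, concurrency, throughput in RPS, P99 in ms).
SCENARIOS = [
    ("Sequential", 1, 6_191, 0.38),
    ("Golden Ratio", 10, 25_667, 1.72),
    ("High Concurrency", 100, 30_063, 13.76),
]


def implied_mean_latency_ms(concurrency: int, rps: float) -> float:
    """Little's law (L = lambda * W) solved for W, converted from seconds to milliseconds."""
    return concurrency / rps * 1000


for name, conc, rps, p99_ms in SCENARIOS:
    mean_ms = implied_mean_latency_ms(conc, rps)
    print(f"{name:<16} mean ~{mean_ms:.2f} ms  p99/mean ~{p99_ms / mean_ms:.1f}x")
```

For these rows the implied mean latencies are roughly 0.16 ms, 0.39 ms, and 3.33 ms. Going from concurrency 1 to 100 multiplies throughput by about 4.9 while the implied mean latency grows roughly 20-fold, which is consistent with requests queueing as the gateway approaches its throughput ceiling.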