Performance

Benchmark Results

These measurements isolate gateway overhead from LLM generation time. Benchmarks run against a mock upstream so results reflect request parsing, policy checks, routing, and response handling inside Hyperion.

Best Throughput

30,063 RPS

Observed at concurrency 100.

Best P99

0.38 ms

Observed at concurrency 1.

Golden Ratio

25,667 RPS

With 1.72 ms p99 at concurrency 10.

Overhead Floor

5.88 us

Average gateway overhead.

Methodology

`cbenchmark` runs inside Docker to reduce host-side network variance.
Mock upstream returns instantly so provider latency does not pollute gateway metrics.
`0% cache hit ratio` forces full request processing path for honest overhead measurement.
All scenarios use fixed request count with varied concurrency to expose queueing behavior.

Scenario Matrix

Profile	Concurrency	Throughput	P99	Overhead Avg	Overhead P95	Notes
Sequential	1	6,191 RPS	0.38 ms	5.88 us	11 us	Lowest scheduler contention; baseline for raw gateway cost.
Golden Ratio	10	25,667 RPS	1.72 ms	10.60 us	14 us	Best throughput/latency balance for sustained traffic.
High Concurrency	100	30,063 RPS	13.76 ms	101.81 us	N/A	Throughput ceiling mode; tail latency rises under queue pressure.

Run Your Own Benchmarks

Reproduce the same benchmark matrix on your hardware.

Explore

Implementation Comparison

Understand runtime tradeoffs and where latency accumulates.