Performance

Benchmark Results

These measurements isolate gateway overhead from LLM generation time. Benchmarks run against a mock upstream so results reflect request parsing, policy checks, routing, and response handling inside Hyperion.

Best Throughput
30,063 RPS

Observed at concurrency 100.

Best P99
0.38 ms

Observed at concurrency 1.

Golden Ratio
25,667 RPS

With 1.72 ms p99 at concurrency 10.

Overhead Floor
5.88 µs

Average gateway overhead, measured at concurrency 1.

Methodology

  • `cbenchmark` runs inside Docker to reduce host-side network variance.
  • The mock upstream returns immediately, so provider latency does not pollute gateway metrics.
  • A `0% cache hit ratio` forces every request through the full processing path, so cache hits cannot flatter the overhead figures.
  • All scenarios use a fixed request count with varied concurrency to expose queueing behavior.
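
The fixed-request-count, varied-concurrency approach described above can be sketched roughly as follows. This is an illustrative harness, not the `cbenchmark` implementation: `mock_handler`, `run_scenario`, and the request count are hypothetical names and values chosen for the example, and the handler is an in-process no-op rather than a real gateway call.

```python
import threading
import time


def mock_handler() -> None:
    """Hypothetical stand-in for one request; returns immediately like a mock upstream."""
    return None


def run_scenario(total_requests: int, concurrency: int) -> dict:
    """Issue a fixed number of requests from `concurrency` closed-loop workers."""
    latencies = []
    lock = threading.Lock()
    remaining = [total_requests]  # shared budget of requests still to send

    def worker() -> None:
        local = []
        while True:
            with lock:
                if remaining[0] == 0:
                    break
                remaining[0] -= 1
            start = time.perf_counter()
            mock_handler()
            local.append(time.perf_counter() - start)
        with lock:
            latencies.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    wall_start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - wall_start

    # Nearest-rank p99: the smallest sample with at least 99% of samples at or below it.
    latencies.sort()
    rank = max(1, (99 * len(latencies) + 99) // 100)
    return {
        "concurrency": concurrency,
        "requests": len(latencies),
        "rps": len(latencies) / elapsed,
        "p99_ms": latencies[rank - 1] * 1e3,
    }


if __name__ == "__main__":
    # Same request count for every scenario; only concurrency changes.
    for level in (1, 10, 100):
        print(run_scenario(total_requests=10_000, concurrency=level))
```

Holding the request count constant means any change in throughput or tail latency between runs comes from the concurrency level, which is what makes queueing effects visible.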

Scenario Matrix

| Profile | Concurrency | Throughput | P99 | Overhead Avg | Overhead P95 | Notes |
|---|---|---|---|---|---|---|
| Sequential | 1 | 6,191 RPS | 0.38 ms | 5.88 µs | 11 µs | Lowest scheduler contention; baseline for raw gateway cost. |
| Golden Ratio | 10 | 25,667 RPS | 1.72 ms | 10.60 µs | 14 µs | Best throughput/latency balance for sustained traffic. |
| High Concurrency | 100 | 30,063 RPS | 13.76 ms | 101.81 µs | N/A | Throughput ceiling mode; tail latency rises under queue pressure. |
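
One way to read the scenario matrix is through Little's Law, which relates in-flight requests, throughput, and mean latency. The sketch below is a rough estimate only: it assumes the load generator keeps exactly `concurrency` requests in flight at all times (a closed loop), which the methodology does not state explicitly. The resulting mean round-trip times are derived values, not measured results, and `implied_mean_latency_us` is a hypothetical helper.

```python
# (concurrency, throughput in RPS, average overhead in microseconds),
# copied from the scenario matrix.
SCENARIOS = {
    "Sequential": (1, 6_191, 5.88),
    "Golden Ratio": (10, 25_667, 10.60),
    "High Concurrency": (100, 30_063, 101.81),
}


def implied_mean_latency_us(concurrency: int, throughput_rps: float) -> float:
    """Little's Law for a closed loop: mean latency = in-flight requests / throughput."""
    return concurrency / throughput_rps * 1e6


for name, (concurrency, rps, overhead_us) in SCENARIOS.items():
    latency_us = implied_mean_latency_us(concurrency, rps)
    # Share of the estimated round trip attributed to gateway overhead.
    share = overhead_us / latency_us * 100
    print(f"{name}: ~{latency_us:.0f} us mean round trip, overhead ~{share:.1f}% of it")
```

Under that assumption, the jump from concurrency 10 to 100 adds little throughput while the implied mean round trip grows several times over, which is consistent with the queue pressure noted for the High Concurrency profile.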

Last updated: Feb 22, 2026