Performance

Run Your Own Benchmarks

Reproduce the official benchmark claims with the same toolchain used in the docs. The flow isolates gateway overhead from provider generation latency.

Execution Model

  • Benchmark client runs in-container to reduce host networking variance.
  • Mock upstream returns fixed responses immediately.
  • Use unique prompts to force a `0%` cache hit rate when measuring the pure gateway tax.
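The unique-prompt requirement is easy to satisfy by appending a random suffix to each request body. A minimal sketch, assuming a hypothetical `unique_prompt` helper (not part of the benchmark tooling) and a gateway cache keyed on prompt text:

```python
import uuid

def unique_prompt(base: str) -> str:
    """Append a UUID so no two requests share a prompt (hypothetical helper;
    forces a 0% cache hit rate for a cache keyed on prompt text)."""
    return f"{base} [{uuid.uuid4().hex}]"

prompts = [unique_prompt("Summarize the release notes.") for _ in range(1000)]
assert len(set(prompts)) == len(prompts)  # all distinct, so no cache hits
```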

1. Start Services

Terminal
# project root
docker compose up -d               # gateway stack
docker compose up -d mock-openai   # fixed-response mock upstream

2. Run Baseline Script

Terminal
cd gateway/tools

# Usage: ./benchmark.sh [TOTAL_REQUESTS] [CONCURRENCY]
./benchmark.sh 10000 10
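The script's two positional arguments map onto a simple driver loop: fan `TOTAL_REQUESTS` calls across `CONCURRENCY` workers and record per-request latency. A sketch of that shape in Python, assuming a caller-supplied `send_request` callable (the real `benchmark.sh` may be implemented differently):

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def run_benchmark(send_request, total_requests: int, concurrency: int):
    """Hypothetical driver mirroring ./benchmark.sh's two arguments:
    issue total_requests calls across concurrency workers and
    collect per-request latency in microseconds."""
    def timed_call(_):
        start = time.perf_counter()
        send_request()
        return (time.perf_counter() - start) * 1e6  # seconds -> microseconds

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(total_requests)))
    return {
        "average_us": statistics.fmean(latencies),
        "median_us": statistics.median(latencies),
    }

# Stand-in for a real HTTP call to the gateway:
stats = run_benchmark(lambda: None, total_requests=100, concurrency=10)
```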

3. Run Scenario Matrix

Terminal
# Baseline (single worker)
./benchmark.sh 10000 1

# Golden ratio
./benchmark.sh 10000 10

# Throughput stress
./benchmark.sh 50000 100

4. Interpret Output

Output
--- Gateway Overhead (dispatch-json_parse-json_marshal) ---
Average: 10.6045us
Median:  5.0000us
p95:     14.0000us
p99:     125.0000us

  • Average / Median: steady-state gateway overhead.
  • p95 / p99: tail behavior under scheduler and queue pressure.
  • Compare with RTT: overhead should remain a small fraction of total request latency.
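The four summary lines above can be reproduced from a list of per-request latencies. A sketch using a simple index-based percentile (real tools may interpolate; the sample data below is shaped like the example output, not taken from a real run):

```python
import statistics

def summarize(latencies_us):
    """Compute the four summary statistics printed by the benchmark:
    average, median, p95, p99 (simple percentile-by-index)."""
    s = sorted(latencies_us)

    def percentile(p):
        return s[min(len(s) - 1, int(p / 100 * len(s)))]

    return {
        "average": statistics.fmean(s),
        "median": statistics.median(s),
        "p95": percentile(95),
        "p99": percentile(99),
    }

# Synthetic sample shaped like the output above: a tight body plus a long tail.
samples = [5.0] * 95 + [14.0] * 4 + [125.0]
result = summarize(samples)
```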

Troubleshooting

  • `401/403`: API key missing, expired, or not scoped to the benchmark tenant.
  • `429`: rate limit path still active for your test key or tenant.
  • `502`: upstream mock unavailable or circuit breaker opened.
  • High variance: run benchmark multiple times and compare median/p95, not only single-run averages.
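One way to apply the high-variance advice is to take the median latency from each run and check the relative spread across runs. A sketch with an illustrative ~10% noise threshold (the threshold is an assumption, not from the docs):

```python
import statistics

def stable_estimate(run_medians_us):
    """Aggregate median latencies from several benchmark runs: return the
    median-of-medians and the relative spread. A spread above ~10% suggests
    the environment is too noisy for run-to-run comparison (illustrative
    threshold, not from the docs)."""
    center = statistics.median(run_medians_us)
    spread = (max(run_medians_us) - min(run_medians_us)) / center
    return center, spread

center, spread = stable_estimate([5.0, 5.2, 4.9, 5.1])
```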

Last updated: Feb 22, 2026