Performance
Run Your Own Benchmarks
Reproduce the official benchmark claims with the same toolchain used in the docs. This flow isolates gateway overhead from provider generation latency.
Execution Model
- Benchmark client runs in-container to reduce host networking variance.
- Mock upstream returns fixed responses immediately.
- Use unique prompts and `0%` cache hits when measuring pure gateway tax.
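To guarantee `0%` cache hits, every request body needs a prompt that has never been seen before. A minimal sketch (the prompt text and helper name are illustrative, not part of `benchmark.sh`):

```shell
# Generate prompts that never repeat, so the gateway's response cache
# can never serve a hit. A monotonic counter plus timestamp and PID
# guarantees uniqueness within and across runs.
n=0
make_prompt() {
  n=$((n + 1))
  printf 'benchmark prompt %s-%s-%s' "$(date +%s%N)" "$$" "$n"
}

make_prompt; echo
make_prompt; echo
```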
1. Start Services
Terminal
# project root
docker compose up -d
docker compose up -d mock-openai
2. Run Baseline Script
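Before invoking the script, it can help to wait until the gateway actually accepts connections rather than relying on `docker compose up -d` returning. A sketch using bash's `/dev/tcp` (the port `8080` is an assumption; substitute whatever your compose file exposes):

```shell
# Poll a TCP port until it accepts a connection or the timeout expires.
# Returns 0 once connected, 1 on timeout. Requires bash (/dev/tcp).
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30}
  local deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # Open-and-drop a connection in a subshell; fd closes on exit.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Assumed port; adjust to your docker-compose port mapping.
wait_for_port localhost 8080 2 || echo "gateway not reachable yet"
```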
Terminal
cd gateway/tools
# Usage: ./benchmark.sh [TOTAL_REQUESTS] [CONCURRENCY]
./benchmark.sh 10000 10
3. Run Scenario Matrix
Terminal
# Baseline (single worker)
./benchmark.sh 10000 1
# Golden ratio
./benchmark.sh 10000 10
# Throughput stress
./benchmark.sh 50000 100
4. Interpret Output
Output
--- Gateway Overhead (dispatch-json_parse-json_marshal) ---
Average: 10.6045us
Median: 5.0000us
p95: 14.0000us
p99: 125.0000us
Average / Median: steady-state gateway overhead.
p95 / p99: tail behavior under scheduler and queue pressure.
Compare with RTT: gateway overhead should remain a small fraction of total request latency.
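If you capture raw per-request latencies, the median and tail figures can be recomputed independently as a sanity check. A sketch using `sort` and `awk` with nearest-rank percentiles (the `latencies.txt` file of one microsecond value per line is hypothetical; `benchmark.sh` may not emit it in this form):

```shell
# Compute a nearest-rank percentile from a file of per-request
# latencies in microseconds, one value per line.
percentile() {  # usage: percentile FILE PCT
  sort -n "$1" | awk -v p="$2" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      r = int((p / 100) * NR + 0.999999)  # ceil(p% of N), nearest rank
      if (r < 1) r = 1
      if (r > NR) r = NR
      printf "%.4f\n", v[r]
    }'
}

# Tiny synthetic sample: mostly ~5us with one 120us outlier.
printf '%s\n' 5 4 7 5 6 5 120 5 14 6 > latencies.txt
percentile latencies.txt 50   # median
percentile latencies.txt 95
percentile latencies.txt 99
```

With this sample the outlier dominates p95/p99 while the median stays near 5us, which is exactly the steady-state-vs-tail split described above.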
Troubleshooting
- `401/403`: API key missing, expired, or not scoped to the benchmark tenant.
- `429`: rate limit path still active for your test key or tenant.
- `502`: upstream mock unavailable or circuit breaker opened.
- High variance: run benchmark multiple times and compare median/p95, not only single-run averages.
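The status-code hints above can be folded into a small triage helper for scripted runs (a sketch; the messages simply mirror this list):

```shell
# Map an HTTP status code from a failed benchmark run to the likely
# cause, following the troubleshooting list above.
triage() {
  case "$1" in
    401|403) echo "auth: key missing, expired, or wrong tenant scope" ;;
    429)     echo "rate limiting still active for this key/tenant" ;;
    502)     echo "upstream mock down or circuit breaker open" ;;
    2??)     echo "ok" ;;
    *)       echo "unclassified status: $1" ;;
  esac
}

triage 429
```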