Overview
Command bench_disagg benchmarks disaggregated vs collocated serving throughput.
In disaggregated mode, prefill and decode run on separate workers behind a gateway that routes requests via least-loaded scheduling. In collocated mode, a single worker handles both prefill and decode sequentially.
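As a rough illustration of least-loaded routing, the sketch below picks the worker with the fewest in-flight requests. The worker type, its fields, and pickLeastLoaded are hypothetical names introduced here, and "load" is assumed to mean the in-flight request count; the gateway used by bench_disagg may define load differently.

```go
package main

// worker is a hypothetical stand-in for a prefill or decode worker.
type worker struct {
	name     string
	inFlight int // requests currently assigned to this worker (assumed load metric)
}

// pickLeastLoaded returns the index of the worker with the fewest in-flight
// requests, or -1 if the slice is empty. Ties go to the lowest index.
func pickLeastLoaded(workers []worker) int {
	best := -1
	for i, w := range workers {
		if best == -1 || w.inFlight < workers[best].inFlight {
			best = i
		}
	}
	return best
}
```

A gateway following this scheme would increment inFlight on the chosen worker when it dispatches a request and decrement it when the request completes, so the counts track current load rather than total traffic.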
The benchmark measures requests/sec, mean time to first token (TTFT), and P99 latency for both modes at configurable concurrency levels.
Usage:
bench_disagg [--concurrent 16] [--requests 100] [--tokens 50]