Documentation
Overview
Command bench_batch benchmarks the throughput of continuous batching against a session pool baseline.
Continuous batching dynamically batches decode steps from multiple concurrent sessions into a single forward pass, amortizing GPU kernel launch and memory transfer overhead. The session pool baseline runs each session independently, serialized on the shared graph mutex.
Usage:
bench_batch --model /path/to/model.gguf [--sessions 8] [--tokens 128] [--backend cuda] [--warmup 2]