bench_batch

command
v1.10.0
Published: Mar 21, 2026 | License: Apache-2.0 | Imports: 11 | Imported by: 0

Documentation

Overview

Command bench_batch benchmarks the throughput of continuous batching against a session pool baseline.

Continuous batching dynamically batches decode steps from multiple concurrent sessions into a single forward pass, amortizing GPU kernel launch and memory transfer overhead. The session pool baseline runs each session independently, serialized on the shared graph mutex.
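The difference between the two modes can be illustrated with a toy model that only counts forward passes. This is a minimal sketch, not the benchmark's actual code: the function names sessionPoolPasses and continuousBatchPasses are hypothetical, and the model assumes each forward pass pays one fixed launch overhead no matter how many sequences it carries.

```go
package main

import (
	"fmt"
	"sync"
)

// sessionPoolPasses models the session pool baseline (hypothetical helper).
// Each session runs in its own goroutine, and every decode step takes the
// shared mutex and performs its own single-sequence forward pass.
func sessionPoolPasses(sessions, tokens int) int {
	var (
		mu     sync.Mutex // stands in for the shared graph mutex
		passes int
		wg     sync.WaitGroup
	)
	for s := 0; s < sessions; s++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := 0; t < tokens; t++ {
				mu.Lock()
				passes++ // one forward pass serving one sequence
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return passes
}

// continuousBatchPasses models continuous batching (hypothetical helper).
// remaining[i] is the number of tokens session i still has to decode; the
// slice is modified in place. Each iteration collects one pending decode
// step from every active session and serves them with one forward pass.
func continuousBatchPasses(remaining []int) int {
	passes := 0
	for {
		batch := 0
		for i := range remaining {
			if remaining[i] > 0 {
				remaining[i]--
				batch++
			}
		}
		if batch == 0 {
			return passes // every session has finished
		}
		passes++ // one forward pass serving the whole batch
	}
}

func main() {
	const sessions, tokens = 8, 128
	remaining := make([]int, sessions)
	for i := range remaining {
		remaining[i] = tokens
	}
	fmt.Println("session pool passes:", sessionPoolPasses(sessions, tokens))
	fmt.Println("continuous batching passes:", continuousBatchPasses(remaining))
}
```

With 8 sessions decoding 128 tokens each, the toy model reports 1024 passes for the session pool and 128 for continuous batching. Real throughput gains depend on how much of each pass is fixed overhead, which is what the benchmark measures.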

Usage:

bench_batch --model /path/to/model.gguf [--sessions 8] [--tokens 128] [--backend cuda] [--warmup 2]
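A command line of this shape could be parsed with the standard flag package roughly as follows. This is a sketch under stated assumptions rather than the command's actual source: the config type and parseArgs function are hypothetical, the values in the usage line are assumed to be defaults, and the help strings are guesses at what each flag means.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// config holds the options from the usage line (hypothetical type).
type config struct {
	model    string
	sessions int
	tokens   int
	backend  string
	warmup   int
}

// parseArgs parses the documented flags (hypothetical helper). Defaults are
// taken from the usage line and are an assumption, as are the help strings.
func parseArgs(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("bench_batch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // let the caller decide how to report errors
	fs.StringVar(&c.model, "model", "", "path to the model file (required)")
	fs.IntVar(&c.sessions, "sessions", 8, "number of concurrent sessions (assumed meaning)")
	fs.IntVar(&c.tokens, "tokens", 128, "tokens to decode per session (assumed meaning)")
	fs.StringVar(&c.backend, "backend", "cuda", "compute backend")
	fs.IntVar(&c.warmup, "warmup", 2, "warmup runs before timing (assumed meaning)")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	if c.model == "" {
		return c, errors.New("--model is required")
	}
	return c, nil
}

func main() {
	c, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", c)
}
```

The flag package accepts both the single-dash and double-dash spellings, so --sessions 4 and -sessions 4 parse identically.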
