Documentation
Overview
Command bench_spec benchmarks the speedup from speculative decoding by comparing standalone decoding with the target model against speculative decoding (target model plus draft model).
Usage:
bench_spec --model-target /path/to/27B.gguf --model-draft /path/to/1B.gguf [--tokens 200] [--prompts 10] [--backend cuda] [--warmup 2] [--draft-len 4]