Recipe 09: Speculative Decoding
Use a small "draft" model to propose several tokens that the large "target" model then verifies in a single parallel forward pass. Because the draft frequently agrees with the target on easy tokens, multiple tokens are accepted per target pass, increasing throughput while producing the same output the target would generate on its own.
Requirements:
- A large target model (e.g. Llama 3 8B)
- A small draft model of the same family (e.g. Llama 3 1B)
Usage:
go run ./docs/cookbook/09-speculative-decoding/ \
  --target path/to/llama-8b.gguf \
  --draft path/to/llama-1b.gguf