Recipe 09: Speculative Decoding
Use a small "draft" model to propose several tokens that the large "target" model then verifies in a single parallel forward pass. Because the draft frequently agrees with the target on easy tokens, multiple tokens are accepted per target pass, increasing throughput while producing the same output the target would generate on its own.
Requirements:
- A large target model (e.g. Llama 3 8B)
- A small draft model of the same family (e.g. Llama 3 1B)
Usage:
go run ./docs/cookbook/09-speculative-decoding/ \
  --target path/to/llama-8b.gguf \
  --draft path/to/llama-1b.gguf