09-speculative-decoding

command
v1.8.0 Latest
Warning: This package is not in the latest version of its module.
Published: Mar 20, 2026 License: Apache-2.0 Imports: 11 Imported by: 0

Documentation

Overview

Recipe 09: Speculative Decoding

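Use a small "draft" model to propose several tokens ahead, then have the large "target" model verify all of them in a single parallel forward pass. Every draft token the target agrees with is accepted, so when the draft guesses correctly (which happens often), the target emits multiple tokens per forward pass instead of one, significantly increasing throughput.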
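The propose-then-verify loop described above can be sketched as follows. This is a minimal illustration of the greedy-verification variant only, not the recipe's actual implementation: the Model type, toyTarget, toyDraft, speculativeStep, and generate are all hypothetical names, and the "models" are toy functions over integer tokens. A real engine would score all proposed positions in one batched target pass; the sketch loops over them for clarity.

```go
package main

import "fmt"

// Model is a hypothetical stand-in for a language model: given a token
// context it returns the greedy (most likely) next token.
type Model func(ctx []int) int

// toyTarget counts upward modulo 10.
func toyTarget(ctx []int) int { return (ctx[len(ctx)-1] + 1) % 10 }

// toyDraft agrees with toyTarget except after tokens 2 and 7,
// where it wrongly repeats the last token.
func toyDraft(ctx []int) int {
	last := ctx[len(ctx)-1]
	if last%5 == 2 {
		return last
	}
	return (last + 1) % 10
}

// speculativeStep lets the draft propose k tokens, then checks them against
// the target. It returns the accepted prefix plus one token chosen by the
// target, so every step yields at least one token.
func speculativeStep(target, draft Model, ctx []int, k int) []int {
	// 1. The draft proposes k tokens autoregressively.
	work := append([]int(nil), ctx...)
	proposal := make([]int, 0, k)
	for i := 0; i < k; i++ {
		t := draft(work)
		proposal = append(proposal, t)
		work = append(work, t)
	}

	// 2. The target verifies each proposed position (one batched pass in a
	// real engine; a plain loop here).
	work = append([]int(nil), ctx...)
	accepted := make([]int, 0, k+1)
	for _, t := range proposal {
		want := target(work)
		if want != t {
			// First disagreement: keep the target's token and stop.
			return append(accepted, want)
		}
		accepted = append(accepted, t)
		work = append(work, t)
	}
	// Every proposal matched: the same pass also yields one bonus token.
	return append(accepted, target(work))
}

// generate produces n tokens after prompt and reports how many
// verification steps (target passes) were needed.
func generate(target, draft Model, prompt []int, k, n int) ([]int, int) {
	ctx := append([]int(nil), prompt...)
	steps := 0
	for len(ctx)-len(prompt) < n {
		ctx = append(ctx, speculativeStep(target, draft, ctx, k)...)
		steps++
	}
	return ctx[len(prompt) : len(prompt)+n], steps
}

func main() {
	out, steps := generate(toyTarget, toyDraft, []int{0}, 4, 12)
	fmt.Println("tokens:", out)
	fmt.Println("target passes:", steps)
}
```

With greedy verification the output is identical to what the target alone would produce; only the number of target passes changes. Sampling-based decoding needs a probabilistic accept/reject rule instead of the equality check used here.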

Requirements:

  • A large target model (e.g. Llama 3 8B)
  • A small draft model of the same family (e.g. Llama 3 1B), so that it shares the target's tokenizer and vocabulary

Usage:

go run ./docs/cookbook/09-speculative-decoding/ \
    --target path/to/llama-8b.gguf \
    --draft path/to/llama-1b.gguf
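For readers adapting the recipe, here is one way the two flags in the command above might be parsed with Go's standard flag package. This is a sketch under assumptions: the options type and parseArgs function are hypothetical, and treating both flags as required is a guess, since the source does not show how the command handles its arguments.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options holds the two model paths from the usage line (hypothetical type).
type options struct {
	target string
	draft  string
}

// parseArgs reads --target and --draft. The flag package treats -name and
// --name as equivalent, so the double-dash form in the usage line works.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("09-speculative-decoding", flag.ContinueOnError)
	fs.StringVar(&o.target, "target", "", "path to the large target model (GGUF)")
	fs.StringVar(&o.draft, "draft", "", "path to the small draft model (GGUF)")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Assumption: both models are mandatory for speculative decoding.
	if o.target == "" || o.draft == "" {
		return o, errors.New("both --target and --draft are required")
	}
	return o, nil
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("target=%s draft=%s\n", o.target, o.draft)
}
```

Failing fast when either path is missing avoids loading a multi-gigabyte target model only to discover that no draft model was supplied.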
