Documentation ¶
Overview ¶
Package gguf implements a pure-Go parser for the GGUF v3 model format used by llama.cpp. It reads metadata key-value pairs and tensor descriptors from a GGUF file without loading tensor data into memory.
Index ¶
- Constants
- func ExtractTokenizer(f *File) (*tokenizer.BPETokenizer, error)
- func LoadTensors(f *File, r io.ReadSeeker) (map[string]*tensor.TensorNumeric[float32], error)
- func MapTensorName(arch string, ggufName string) string
- func QuantizeToFP8E4M3(tensors map[string]*tensor.TensorNumeric[float32]) (map[string]*tensor.TensorNumeric[float32], error)
- type File
- type GGMLType
- type ModelConfig
- type TensorInfo
Constants ¶
const (
	TypeUint8   uint32 = 0
	TypeInt8    uint32 = 1
	TypeUint16  uint32 = 2
	TypeInt16   uint32 = 3
	TypeUint32  uint32 = 4
	TypeInt32   uint32 = 5
	TypeFloat32 uint32 = 6
	TypeBool    uint32 = 7
	TypeString  uint32 = 8
	TypeArray   uint32 = 9
	TypeUint64  uint32 = 10
	TypeInt64   uint32 = 11
	TypeFloat64 uint32 = 12
)
GGUF metadata value types.
const Magic uint32 = 0x46554747 // "GGUF" in little-endian
Magic is the GGUF file magic number ("GGUF" in little-endian).
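As a standalone sketch (independent of this package's API; the helper name is hypothetical), the magic can be checked directly with the standard library. The ASCII bytes 'G','G','U','F' decode to 0x46554747 when read as a little-endian uint32:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// isGGUF reports whether the first four bytes of a file are the
// GGUF magic ("GGUF" read as a little-endian uint32).
func isGGUF(header []byte) bool {
	if len(header) < 4 {
		return false
	}
	return binary.LittleEndian.Uint32(header[:4]) == 0x46554747
}

func main() {
	fmt.Println(isGGUF([]byte("GGUF"))) // true
	fmt.Println(isGGUF([]byte("ggml"))) // false
}
```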
Variables ¶
This section is empty.
Functions ¶
func ExtractTokenizer ¶
func ExtractTokenizer(f *File) (*tokenizer.BPETokenizer, error)
ExtractTokenizer builds a BPETokenizer from GGUF metadata. GGUF files store tokenizer data under the "tokenizer.ggml.*" metadata keys.
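A minimal sketch of what reading those keys looks like, assuming the metadata map shape described under File. The key names ("tokenizer.ggml.tokens", "tokenizer.ggml.merges", "tokenizer.ggml.bos_token_id", "tokenizer.ggml.eos_token_id") are standard GGUF keys; the struct, helper name, and the assumption that string arrays decode to []string are illustrative, not this package's actual implementation:

```go
package main

import "fmt"

// tokenizerMeta is a hypothetical container for the fields a BPE
// tokenizer needs.
type tokenizerMeta struct {
	Tokens []string
	Merges []string
	BOSID  int
	EOSID  int
}

// extractTokenizerMeta pulls tokenizer fields out of a parsed
// metadata map under the standard "tokenizer.ggml.*" keys.
func extractTokenizerMeta(md map[string]any) (*tokenizerMeta, error) {
	tokens, ok := md["tokenizer.ggml.tokens"].([]string)
	if !ok {
		return nil, fmt.Errorf("missing tokenizer.ggml.tokens")
	}
	merges, _ := md["tokenizer.ggml.merges"].([]string) // absent for some models
	bos, _ := md["tokenizer.ggml.bos_token_id"].(uint32)
	eos, _ := md["tokenizer.ggml.eos_token_id"].(uint32)
	return &tokenizerMeta{Tokens: tokens, Merges: merges, BOSID: int(bos), EOSID: int(eos)}, nil
}

func main() {
	md := map[string]any{
		"tokenizer.ggml.tokens":       []string{"<s>", "</s>", "hello"},
		"tokenizer.ggml.bos_token_id": uint32(0),
		"tokenizer.ggml.eos_token_id": uint32(1),
	}
	tm, err := extractTokenizerMeta(md)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(tm.Tokens), tm.BOSID, tm.EOSID) // 3 0 1
}
```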
func LoadTensors ¶
func LoadTensors(f *File, r io.ReadSeeker) (map[string]*tensor.TensorNumeric[float32], error)
LoadTensors reads tensor data from a parsed GGUF file and returns the tensors keyed by name, with float32 as the compute type. Quantized tensors (Q4_0, Q8_0) keep their native quantized storage types in memory for efficiency.
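To illustrate what decoding one of those quantized types involves, here is a self-contained sketch of Q8_0 dequantization following the GGML reference block layout (34 bytes per block: a float16 scale followed by 32 signed 8-bit quants, each element being scale × quant). The helper names are hypothetical, not this package's API:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// f16ToF32 decodes an IEEE 754 half-precision value (used for GGML
// block scales) into a float32.
func f16ToF32(h uint16) float32 {
	sign := float32(1)
	if h&0x8000 != 0 {
		sign = -1
	}
	exp := int(h >> 10 & 0x1F)
	man := float32(h & 0x3FF)
	switch exp {
	case 0: // zero or subnormal
		return sign * man / 1024 * float32(math.Pow(2, -14))
	case 31: // Inf/NaN; not expected in well-formed scales
		return sign * float32(math.Inf(1))
	default:
		return sign * (1 + man/1024) * float32(math.Pow(2, float64(exp-15)))
	}
}

// dequantQ8_0 decodes one Q8_0 block: a little-endian float16 scale
// followed by 32 int8 quants.
func dequantQ8_0(block []byte) []float32 {
	d := f16ToF32(binary.LittleEndian.Uint16(block[:2]))
	out := make([]float32, 32)
	for i := 0; i < 32; i++ {
		out[i] = d * float32(int8(block[2+i]))
	}
	return out
}

func main() {
	blk := make([]byte, 34)
	binary.LittleEndian.PutUint16(blk[:2], 0x3800) // float16 0.5
	for i := 0; i < 32; i++ {
		blk[2+i] = byte(int8(i - 16))
	}
	vals := dequantQ8_0(blk)
	fmt.Println(vals[0], vals[16], vals[31]) // -8 0 7.5
}
```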
func MapTensorName ¶
func MapTensorName(arch string, ggufName string) string
MapTensorName converts a GGUF tensor name to the Zerfoo/HuggingFace canonical name. The arch parameter selects architecture-specific name mappings (e.g., "gemma3" uses different norm names than "llama"). Unknown names pass through unchanged.
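The renaming scheme can be sketched as follows. The GGUF and HuggingFace names shown are the conventional ones for llama-family models (e.g. "blk.N.attn_q.weight" → "model.layers.N.self_attn.q_proj.weight"); the mapping table is a partial illustration, not the package's actual table:

```go
package main

import (
	"fmt"
	"strings"
)

// llamaMap holds per-layer suffix mappings for the "llama"
// architecture (a partial sketch).
var llamaMap = map[string]string{
	"attn_q.weight":      "self_attn.q_proj.weight",
	"attn_k.weight":      "self_attn.k_proj.weight",
	"attn_v.weight":      "self_attn.v_proj.weight",
	"attn_output.weight": "self_attn.o_proj.weight",
	"attn_norm.weight":   "input_layernorm.weight",
	"ffn_norm.weight":    "post_attention_layernorm.weight",
	"ffn_gate.weight":    "mlp.gate_proj.weight",
	"ffn_up.weight":      "mlp.up_proj.weight",
	"ffn_down.weight":    "mlp.down_proj.weight",
}

// mapName resolves whole-tensor names first, then per-layer
// "blk.N.suffix" names; unknown names pass through unchanged,
// matching MapTensorName's documented contract.
func mapName(ggufName string) string {
	switch ggufName {
	case "token_embd.weight":
		return "model.embed_tokens.weight"
	case "output_norm.weight":
		return "model.norm.weight"
	case "output.weight":
		return "lm_head.weight"
	}
	if rest, ok := strings.CutPrefix(ggufName, "blk."); ok {
		layer, suffix, found := strings.Cut(rest, ".")
		if found {
			if hf, ok := llamaMap[suffix]; ok {
				return "model.layers." + layer + "." + hf
			}
		}
	}
	return ggufName
}

func main() {
	fmt.Println(mapName("blk.3.attn_q.weight")) // model.layers.3.self_attn.q_proj.weight
	fmt.Println(mapName("token_embd.weight"))   // model.embed_tokens.weight
	fmt.Println(mapName("mystery.weight"))      // mystery.weight
}
```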
func QuantizeToFP8E4M3 ¶
func QuantizeToFP8E4M3(tensors map[string]*tensor.TensorNumeric[float32]) (map[string]*tensor.TensorNumeric[float32], error)
QuantizeToFP8E4M3 converts all tensors in the map from their current storage to FP8 E4M3 format with per-tensor absmax scaling. This reduces memory to 1 byte per element (1/4 of F32) at the cost of reduced precision. The tensors are modified in place — the returned map is the same object.
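The per-tensor absmax scaling step can be sketched in isolation. The sketch below maps the largest magnitude to 448 (the largest finite E4M3 value) and then rounds each value to 3 mantissa bits as a stand-in for a real E4M3 encoder; a real encoder would also clamp the exponent range and pack to one byte. The function name and the rounding shortcut are illustrative only:

```go
package main

import (
	"fmt"
	"math"
)

// quantizeAbsmaxE4M3Sketch rescales so the largest magnitude maps to
// 448, rounds each rescaled value to 3 mantissa bits (approximating
// E4M3 precision), then scales back to simulate the round trip.
func quantizeAbsmaxE4M3Sketch(xs []float32) []float32 {
	var absmax float32
	for _, x := range xs {
		if a := float32(math.Abs(float64(x))); a > absmax {
			absmax = a
		}
	}
	if absmax == 0 {
		return xs
	}
	scale := absmax / 448
	out := make([]float32, len(xs))
	for i, x := range xs {
		v := x / scale
		// Round to 3 mantissa bits: float32 carries 23, so add half
		// a ULP at bit 20 and clear the low 20 bits.
		bits := math.Float32bits(v)
		bits += 1 << 19
		bits &^= 1<<20 - 1
		out[i] = math.Float32frombits(bits) * scale
	}
	return out
}

func main() {
	xs := []float32{448, 100, -7.3}
	fmt.Println(quantizeAbsmaxE4M3Sketch(xs))
}
```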
Types ¶
type File ¶
type File struct {
Version uint32
Metadata map[string]any
Tensors []TensorInfo
DataOffset int64 // byte offset where tensor data begins
}
File represents a parsed GGUF file.
func Parse ¶
func Parse(r io.ReadSeeker) (*File, error)
Parse reads a GGUF file header, metadata, and tensor info from r. It does not read tensor data. The returned File.DataOffset indicates where tensor data begins in the file.
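The fixed-size portion that Parse reads first can be sketched with only the standard library. The field layout (magic uint32, version uint32, tensor count uint64, metadata KV count uint64, all little-endian) follows the GGUF v3 specification; the helper and struct names are hypothetical:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// header mirrors the fixed 24-byte GGUF v3 header that precedes the
// metadata key-value section.
type header struct {
	Magic       uint32
	Version     uint32
	TensorCount uint64
	KVCount     uint64
}

// readHeader decodes the header; all GGUF integers are little-endian.
func readHeader(b []byte) (*header, error) {
	if len(b) < 24 {
		return nil, fmt.Errorf("short header: %d bytes", len(b))
	}
	h := &header{
		Magic:       binary.LittleEndian.Uint32(b[0:4]),
		Version:     binary.LittleEndian.Uint32(b[4:8]),
		TensorCount: binary.LittleEndian.Uint64(b[8:16]),
		KVCount:     binary.LittleEndian.Uint64(b[16:24]),
	}
	if h.Magic != 0x46554747 {
		return nil, fmt.Errorf("not a GGUF file: magic %#x", h.Magic)
	}
	return h, nil
}

func main() {
	// A synthetic header: magic, version 3, 2 tensors, 5 KV pairs.
	b := []byte{
		0x47, 0x47, 0x55, 0x46,
		3, 0, 0, 0,
		2, 0, 0, 0, 0, 0, 0, 0,
		5, 0, 0, 0, 0, 0, 0, 0,
	}
	h, err := readHeader(b)
	if err != nil {
		panic(err)
	}
	fmt.Println(h.Version, h.TensorCount, h.KVCount) // 3 2 5
}
```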
func (*File) GetFloat32 ¶
GetFloat32 returns a metadata float32 value.
type GGMLType ¶
type GGMLType uint32
GGMLType identifies the quantization type of a tensor.
const (
	GGMLTypeF32  GGMLType = 0
	GGMLTypeF16  GGMLType = 1
	GGMLTypeQ4_0 GGMLType = 2
	GGMLTypeQ4_1 GGMLType = 3
	GGMLTypeQ5_0 GGMLType = 6
	GGMLTypeQ5_1 GGMLType = 7
	GGMLTypeQ8_0 GGMLType = 8
	GGMLTypeQ8_1 GGMLType = 9
	GGMLTypeQ2_K GGMLType = 10
	GGMLTypeQ3_K GGMLType = 11
	GGMLTypeQ4_K GGMLType = 12
	GGMLTypeQ5_K GGMLType = 13
	GGMLTypeQ6_K GGMLType = 14
	GGMLTypeQ8_K GGMLType = 15
	GGMLTypeBF16 GGMLType = 30
)
Common GGML tensor types.
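Each type implies a block geometry that determines a tensor's on-disk size. As a sketch covering a few common types (geometry per the GGML reference layout, e.g. a Q8_0 block packs a float16 scale plus 32 int8 quants into 34 bytes; the helper names are hypothetical):

```go
package main

import "fmt"

// blockInfo gives the block geometry for a few common GGML types:
// how many elements share one block and how many bytes it occupies.
func blockInfo(t uint32) (elems, size int, ok bool) {
	switch t {
	case 0: // F32
		return 1, 4, true
	case 1: // F16
		return 1, 2, true
	case 2: // Q4_0: float16 scale + 32 4-bit quants
		return 32, 18, true
	case 8: // Q8_0: float16 scale + 32 8-bit quants
		return 32, 34, true
	case 30: // BF16
		return 1, 2, true
	}
	return 0, 0, false
}

// tensorBytes computes the storage size of a tensor with n elements;
// n must be a multiple of the block size for quantized types.
func tensorBytes(t uint32, n int) int {
	elems, size, ok := blockInfo(t)
	if !ok || n%elems != 0 {
		return -1
	}
	return n / elems * size
}

func main() {
	fmt.Println(tensorBytes(0, 4096)) // F32: 16384 bytes
	fmt.Println(tensorBytes(8, 4096)) // Q8_0: 4352 bytes
}
```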
type ModelConfig ¶
type ModelConfig struct {
Architecture string
Name string
VocabSize int
HiddenSize int
NumLayers int
NumHeads int
NumKVHeads int
IntermediateSize int
MaxSeqLen int
RopeTheta float64
HeadDim int // explicit head dimension (0 = use HiddenSize/NumHeads)
LogitSoftcap float32 // if > 0, apply logit softcapping: cap * tanh(logit/cap)
LocalRopeTheta float64 // RoPE base for local/sliding-window layers (0 = use RopeTheta)
SlidingWindow int // sliding window size for local attention layers
SlidingWindowPattern int // every Nth layer is global (0 = all global)
RMSNormEps float32 // RMSNorm epsilon (0 = use default 1e-5)
}
ModelConfig holds model configuration extracted from GGUF metadata.
func ExtractModelConfig ¶
func ExtractModelConfig(f *File) (*ModelConfig, error)
ExtractModelConfig reads GGUF metadata and returns a ModelConfig. The architecture field (general.architecture) determines which metadata key prefix to use (e.g., "llama." or "gemma.").
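The prefix scheme can be sketched against a plain metadata map. The key names ("general.architecture", plus per-architecture keys like "llama.block_count", "llama.embedding_length", "llama.attention.head_count") are the standard GGUF ones; the simplified return type and helper name stand in for ModelConfig and are illustrative only:

```go
package main

import "fmt"

// extractConfigSketch pulls a few core fields from a parsed metadata
// map: "general.architecture" names the prefix under which the
// per-architecture keys live.
func extractConfigSketch(md map[string]any) (map[string]int, error) {
	arch, ok := md["general.architecture"].(string)
	if !ok {
		return nil, fmt.Errorf("missing general.architecture")
	}
	get := func(suffix string) int {
		if v, ok := md[arch+"."+suffix].(uint32); ok {
			return int(v)
		}
		return 0 // absent keys fall back to zero, as ModelConfig's comments suggest
	}
	return map[string]int{
		"layers":       get("block_count"),
		"hidden":       get("embedding_length"),
		"heads":        get("attention.head_count"),
		"kv_heads":     get("attention.head_count_kv"),
		"intermediate": get("feed_forward_length"),
		"max_seq_len":  get("context_length"),
	}, nil
}

func main() {
	md := map[string]any{
		"general.architecture":          "llama",
		"llama.block_count":             uint32(32),
		"llama.embedding_length":        uint32(4096),
		"llama.attention.head_count":    uint32(32),
		"llama.attention.head_count_kv": uint32(8),
	}
	cfg, err := extractConfigSketch(md)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg["layers"], cfg["hidden"], cfg["kv_heads"]) // 32 4096 8
}
```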