gguf

package
v1.3.0
Published: Mar 16, 2026 License: Apache-2.0 Imports: 10 Imported by: 0

Documentation

Overview

Package gguf implements a pure-Go parser for the GGUF v3 model format used by llama.cpp. It reads metadata key-value pairs and tensor descriptors from a GGUF file without loading tensor data into memory.

Constants

const (
	TypeUint8   uint32 = 0
	TypeInt8    uint32 = 1
	TypeUint16  uint32 = 2
	TypeInt16   uint32 = 3
	TypeUint32  uint32 = 4
	TypeInt32   uint32 = 5
	TypeFloat32 uint32 = 6
	TypeBool    uint32 = 7
	TypeString  uint32 = 8
	TypeArray   uint32 = 9
	TypeUint64  uint32 = 10
	TypeInt64   uint32 = 11
	TypeFloat64 uint32 = 12
)

GGUF metadata value types.

const Magic uint32 = 0x46554747 // "GGUF" in little-endian

Magic is the GGUF file magic number ("GGUF" in little-endian).

Variables

This section is empty.

Functions

func ExtractTokenizer

func ExtractTokenizer(f *File) (*tokenizer.BPETokenizer, error)

ExtractTokenizer builds a BPETokenizer from GGUF metadata. GGUF files store tokenizer data under the "tokenizer.ggml.*" metadata keys.

func LoadTensors

func LoadTensors(f *File, r io.ReadSeeker) (map[string]*tensor.TensorNumeric[float32], error)

LoadTensors reads tensor data from a parsed GGUF file and returns them as float32 tensors keyed by name. Quantized tensors (Q4_0, Q8_0) are stored using their native quantized storage types for memory efficiency.
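For reference, Q8_0 in llama.cpp groups values into 32-element blocks that share a single scale, so a dequantized value is simply scale times quant. The helper below is an illustrative sketch, not part of this package; on disk the scale is stored as an IEEE 754 half, taken here as a float32 for brevity:

```go
package main

import "fmt"

// dequantizeQ8_0 expands one Q8_0 block: signed 8-bit quants sharing a
// single per-block scale, following llama.cpp's layout.
func dequantizeQ8_0(d float32, qs []int8) []float32 {
	out := make([]float32, len(qs))
	for i, q := range qs {
		out[i] = d * float32(q)
	}
	return out
}

func main() {
	vals := dequantizeQ8_0(0.5, []int8{-2, 0, 3})
	fmt.Println(vals) // [-1 0 1.5]
}
```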

func MapTensorName

func MapTensorName(arch string, ggufName string) string

MapTensorName converts a GGUF tensor name to the Zerfoo/HuggingFace canonical name. The arch parameter selects architecture-specific name mappings (e.g., "gemma3" uses different norm names than "llama"). Unknown names pass through unchanged.
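The package's actual mapping tables are internal; the sketch below only illustrates the kind of translation involved, assuming the usual llama.cpp tensor names ("blk.<n>.…") and HuggingFace names ("model.layers.<n>.…"):

```go
package main

import (
	"fmt"
	"regexp"
)

// mapName is an illustrative sketch of the translation MapTensorName
// performs; the real mapping covers many more tensor kinds.
func mapName(ggufName string) string {
	// GGUF query-projection weights look like "blk.<n>.attn_q.weight";
	// HuggingFace uses "model.layers.<n>.self_attn.q_proj.weight".
	re := regexp.MustCompile(`^blk\.(\d+)\.attn_q\.weight$`)
	if m := re.FindStringSubmatch(ggufName); m != nil {
		return "model.layers." + m[1] + ".self_attn.q_proj.weight"
	}
	return ggufName // unknown names pass through unchanged
}

func main() {
	fmt.Println(mapName("blk.0.attn_q.weight")) // model.layers.0.self_attn.q_proj.weight
	fmt.Println(mapName("custom.weight"))       // custom.weight
}
```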

func QuantizeToFP8E4M3

func QuantizeToFP8E4M3(tensors map[string]*tensor.TensorNumeric[float32]) (map[string]*tensor.TensorNumeric[float32], error)

QuantizeToFP8E4M3 converts all tensors in the map from their current storage to FP8 E4M3 format with per-tensor absmax scaling. This reduces memory to 1 byte per element (1/4 of F32) at the cost of reduced precision. The tensors are modified in place — the returned map is the same object.
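Per-tensor absmax scaling picks a scale so the largest-magnitude element lands at the edge of the FP8 E4M3 range (448 under the common convention that reserves no infinities). A sketch of the scaling step only; the bit-level E4M3 encoding is omitted:

```go
package main

import (
	"fmt"
	"math"
)

// fp8E4M3Max is the largest finite FP8 E4M3 value under the common
// no-infinities convention.
const fp8E4M3Max = 448.0

// absmaxScale returns the per-tensor scale that maps the largest-magnitude
// element onto fp8E4M3Max, as described for QuantizeToFP8E4M3.
func absmaxScale(xs []float32) float32 {
	var absmax float64
	for _, x := range xs {
		absmax = math.Max(absmax, math.Abs(float64(x)))
	}
	if absmax == 0 {
		return 1 // all-zero tensor: any scale works
	}
	return float32(absmax / fp8E4M3Max)
}

func main() {
	xs := []float32{-896, 224, 0.5}
	fmt.Println(absmaxScale(xs)) // 2: the largest element, -896, maps to -448
}
```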

Types

type File

type File struct {
	Version    uint32
	Metadata   map[string]any
	Tensors    []TensorInfo
	DataOffset int64 // byte offset where tensor data begins
}

File represents a parsed GGUF file.

func Parse

func Parse(r io.ReadSeeker) (*File, error)

Parse reads a GGUF file header, metadata, and tensor info from r. It does not read tensor data. The returned File.DataOffset indicates where tensor data begins in the file.

func (*File) GetFloat32

func (f *File) GetFloat32(key string) (float32, bool)

GetFloat32 returns a metadata float32 value.

func (*File) GetString

func (f *File) GetString(key string) (string, bool)

GetString returns a metadata string value.

func (*File) GetUint32

func (f *File) GetUint32(key string) (uint32, bool)

GetUint32 returns a metadata uint32 value.
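Since File.Metadata is a map[string]any, these getters presumably amount to a lookup plus a type assertion; the sketch below shows that likely shape (the package's actual implementation may differ):

```go
package main

import "fmt"

// getUint32 sketches the shape of the File getters: look up the key and
// type-assert, reporting ok=false on a miss or a type mismatch.
func getUint32(md map[string]any, key string) (uint32, bool) {
	v, ok := md[key]
	if !ok {
		return 0, false
	}
	u, ok := v.(uint32)
	return u, ok
}

func main() {
	md := map[string]any{"llama.block_count": uint32(32)}
	n, ok := getUint32(md, "llama.block_count")
	fmt.Println(n, ok) // 32 true
	_, ok = getUint32(md, "missing")
	fmt.Println(ok) // false
}
```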

type GGMLType

type GGMLType uint32

GGMLType identifies the quantization type of a tensor.

const (
	GGMLTypeF32  GGMLType = 0
	GGMLTypeF16  GGMLType = 1
	GGMLTypeQ4_0 GGMLType = 2
	GGMLTypeQ4_1 GGMLType = 3
	GGMLTypeQ5_0 GGMLType = 6
	GGMLTypeQ5_1 GGMLType = 7
	GGMLTypeQ8_0 GGMLType = 8
	GGMLTypeQ8_1 GGMLType = 9
	GGMLTypeQ2_K GGMLType = 10
	GGMLTypeQ3_K GGMLType = 11
	GGMLTypeQ4_K GGMLType = 12
	GGMLTypeQ5_K GGMLType = 13
	GGMLTypeQ6_K GGMLType = 14
	GGMLTypeQ8_K GGMLType = 15
	GGMLTypeBF16 GGMLType = 30
)

Common GGML tensor types.
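The type determines a tensor's on-disk footprint. Following llama.cpp's layouts, Q4_0 packs 32 elements into an fp16 scale plus 16 packed-nibble bytes, and Q8_0 into an fp16 scale plus 32 int8 bytes; a sketch of the resulting byte costs:

```go
package main

import "fmt"

// bytesFor sketches the on-disk size of nElems elements for a few common
// GGML types, following llama.cpp's block layouts. nElems is assumed to be
// a multiple of the 32-element block size for the quantized types.
func bytesFor(nElems uint64, t string) uint64 {
	switch t {
	case "F32":
		return nElems * 4
	case "F16", "BF16":
		return nElems * 2
	case "Q8_0":
		return nElems / 32 * 34 // 2-byte scale + 32 int8 quants per block
	case "Q4_0":
		return nElems / 32 * 18 // 2-byte scale + 16 packed-nibble bytes per block
	}
	return 0
}

func main() {
	fmt.Println(bytesFor(1024, "F32"))  // 4096
	fmt.Println(bytesFor(1024, "Q8_0")) // 1088
	fmt.Println(bytesFor(1024, "Q4_0")) // 576
}
```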

type ModelConfig

type ModelConfig struct {
	Architecture         string
	Name                 string
	VocabSize            int
	HiddenSize           int
	NumLayers            int
	NumHeads             int
	NumKVHeads           int
	IntermediateSize     int
	MaxSeqLen            int
	RopeTheta            float64
	HeadDim              int     // explicit head dimension (0 = use HiddenSize/NumHeads)
	LogitSoftcap         float32 // if > 0, apply logit softcapping: cap * tanh(logit/cap)
	LocalRopeTheta       float64 // RoPE base for local/sliding-window layers (0 = use RopeTheta)
	SlidingWindow        int     // sliding window size for local attention layers
	SlidingWindowPattern int     // every Nth layer is global (0 = all global)
	RMSNormEps           float32 // RMSNorm epsilon (0 = use default 1e-5)
}

ModelConfig holds model configuration extracted from GGUF metadata.
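Two of the zero-means-default rules above can be made concrete. Assuming nothing beyond the field comments, HeadDim falls back to HiddenSize/NumHeads, and LogitSoftcap applies cap * tanh(logit/cap):

```go
package main

import (
	"fmt"
	"math"
)

// effectiveHeadDim applies the fallback documented on ModelConfig.HeadDim:
// zero means "derive from HiddenSize / NumHeads".
func effectiveHeadDim(headDim, hiddenSize, numHeads int) int {
	if headDim > 0 {
		return headDim
	}
	return hiddenSize / numHeads
}

// softcap applies the logit softcapping formula from the LogitSoftcap field:
// c * tanh(logit/c). A cap of zero (or less) disables it.
func softcap(logit, c float32) float32 {
	if c <= 0 {
		return logit
	}
	return c * float32(math.Tanh(float64(logit/c)))
}

func main() {
	fmt.Println(effectiveHeadDim(0, 4096, 32)) // 128
	fmt.Println(softcap(1e9, 30) <= 30)        // true: output is bounded by the cap
}
```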

func ExtractModelConfig

func ExtractModelConfig(f *File) (*ModelConfig, error)

ExtractModelConfig reads GGUF metadata and returns a ModelConfig. The architecture field (general.architecture) determines which metadata key prefix to use (e.g., "llama." or "gemma.").

type TensorInfo

type TensorInfo struct {
	Name       string
	Dimensions []uint64
	Type       GGMLType
	Offset     uint64 // relative to DataOffset
}

TensorInfo describes a single tensor in the GGUF file.
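A TensorInfo is enough to size a tensor before reading it: the element count is the product of Dimensions, and an F32 tensor then occupies 4 bytes per element. A small sketch (the 4096x32000 shape is just an illustrative embedding-matrix example):

```go
package main

import "fmt"

// numElements multiplies a tensor's Dimensions, as recorded in TensorInfo.
func numElements(dims []uint64) uint64 {
	n := uint64(1)
	for _, d := range dims {
		n *= d
	}
	return n
}

func main() {
	dims := []uint64{4096, 32000}
	n := numElements(dims)
	fmt.Println(n, n*4) // 131072000 elements, 524288000 bytes as F32
}
```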
