Documentation

Overview
Package zerfoo provides the core building blocks for creating and training neural networks. It offers a prelude of commonly used types to simplify development and enhance readability of model construction code.
Index
- func NewAdamW[T tensor.Numeric](learningRate, beta1, beta2, epsilon, weightDecay T) *optimizer.AdamW[T]
- func NewCPUEngine[T tensor.Numeric]() compute.Engine[T]
- func NewDefaultTrainer[T tensor.Numeric](g *graph.Graph[T], lossNode graph.Node[T], opt optimizer.Optimizer[T], ...) *training.DefaultTrainer[T]
- func NewFloat32Ops() numeric.Arithmetic[float32]
- func NewGraph[T tensor.Numeric](engine compute.Engine[T]) *graph.Builder[T]
- func NewMSE[T tensor.Numeric](engine compute.Engine[T]) *loss.MSE[T]
- func NewRMSNorm[T tensor.Numeric](name string, engine compute.Engine[T], ops numeric.Arithmetic[T], modelDim int, ...) (*normalization.RMSNorm[T], error)
- func NewTensor[T tensor.Numeric](shape []int, data []T) (*tensor.TensorNumeric[T], error)
- func RegisterLayer[T tensor.Numeric](opType string, builder model.LayerBuilder[T])
- func UnregisterLayer(opType string)
- type Batch
- type Embedding
- type Engine
- type GenerateOption
- func WithGenMaxTokens(n int) GenerateOption
- func WithGenTemperature(t float32) GenerateOption
- func WithGenTopP(p float32) GenerateOption
- func WithSchema(schema grammar.JSONSchema) GenerateOption
- func WithToolChoice(choice serve.ToolChoice) GenerateOption
- func WithTools(tools ...serve.Tool) GenerateOption
- type GenerateResult
- type Graph
- type LayerBuilder
- type Model
- func (m *Model) Chat(prompt string) (string, error)
- func (m *Model) ChatStream(ctx context.Context, prompt string, opts ...GenerateOption) (<-chan StreamToken, error)
- func (m *Model) Close() error
- func (m *Model) Embed(texts []string) ([]Embedding, error)
- func (m *Model) Generate(ctx context.Context, prompt string, opts ...GenerateOption) (*GenerateResult, error)
- type Node
- type Numeric
- type Parameter
- type StreamToken
- type Tensor
- type ToolCall
Constants
This section is empty.
Variables
This section is empty.
Functions
func NewAdamW
func NewAdamW[T tensor.Numeric](learningRate, beta1, beta2, epsilon, weightDecay T) *optimizer.AdamW[T]
NewAdamW creates a new AdamW optimizer with the given hyperparameters.
Stable.
func NewCPUEngine
func NewCPUEngine[T tensor.Numeric]() compute.Engine[T]
NewCPUEngine creates a new CPU computation engine for the given numeric type.
Stable.
func NewDefaultTrainer
func NewDefaultTrainer[T tensor.Numeric](
	g *graph.Graph[T],
	lossNode graph.Node[T],
	opt optimizer.Optimizer[T],
	strategy training.GradientStrategy[T],
) *training.DefaultTrainer[T]
NewDefaultTrainer creates a new default trainer for the given graph, loss, optimizer, and gradient strategy.
Stable.
func NewFloat32Ops
func NewFloat32Ops() numeric.Arithmetic[float32]
NewFloat32Ops returns the float32 arithmetic operations.
Stable.
func NewRMSNorm
func NewRMSNorm[T tensor.Numeric](name string, engine compute.Engine[T], ops numeric.Arithmetic[T], modelDim int, options ...normalization.RMSNormOption[T]) (*normalization.RMSNorm[T], error)
NewRMSNorm creates a new RMSNorm normalization layer with the given configuration.
Stable.
func RegisterLayer
func RegisterLayer[T tensor.Numeric](opType string, builder model.LayerBuilder[T])
RegisterLayer registers a new layer builder for the given operation type.
Stable.
func UnregisterLayer
func UnregisterLayer(opType string)
UnregisterLayer unregisters the layer builder for the given operation type.
Stable.
Types
type Batch
type Batch[T tensor.Numeric] struct {
	Inputs  map[graph.Node[T]]*tensor.TensorNumeric[T]
	Targets *tensor.TensorNumeric[T]
}
Batch represents a training batch of inputs and targets.
Stable.
type Embedding
type Embedding struct {
Vector []float32
}
Embedding holds a text embedding vector.
Stable.
func (Embedding) CosineSimilarity
CosineSimilarity computes the cosine similarity between two embeddings.
Stable.
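Cosine similarity is the dot product of two vectors divided by the product of their L2 norms. A stdlib-only sketch of the computation over raw float32 vectors (illustrative; the actual method signature on Embedding is not shown in this documentation):

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns dot(a, b) / (||a|| * ||b||), or 0 for a zero vector.
func cosineSimilarity(a, b []float32) float32 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

func main() {
	fmt.Println(cosineSimilarity([]float32{1, 0}, []float32{1, 0})) // same direction: 1
	fmt.Println(cosineSimilarity([]float32{1, 0}, []float32{0, 1})) // orthogonal: 0
}
```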
type GenerateOption
type GenerateOption func(*generateOptions)
GenerateOption configures the behavior of Model.Generate.
Stable.
func WithGenMaxTokens
func WithGenMaxTokens(n int) GenerateOption
WithGenMaxTokens sets the maximum number of tokens to generate.
Stable.
func WithGenTemperature
func WithGenTemperature(t float32) GenerateOption
WithGenTemperature sets the sampling temperature.
Stable.
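Sampling temperature divides the logits before the softmax: values below 1 sharpen the distribution toward the most likely token, values above 1 flatten it toward uniform. A stdlib-only sketch of the effect (illustrative, not zerfoo's sampler):

```go
package main

import (
	"fmt"
	"math"
)

// softmaxWithTemperature converts logits to probabilities after dividing by t.
// t < 1 sharpens the distribution; t > 1 flattens it.
func softmaxWithTemperature(logits []float64, t float64) []float64 {
	probs := make([]float64, len(logits))
	maxL := math.Inf(-1)
	for _, l := range logits {
		if l/t > maxL {
			maxL = l / t
		}
	}
	var sum float64
	for i, l := range logits {
		probs[i] = math.Exp(l/t - maxL) // subtract the max for numerical stability
		sum += probs[i]
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs
}

func main() {
	logits := []float64{2, 1, 0}
	fmt.Printf("t=0.5: %.3f\n", softmaxWithTemperature(logits, 0.5))
	fmt.Printf("t=2.0: %.3f\n", softmaxWithTemperature(logits, 2.0))
}
```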
func WithGenTopP
func WithGenTopP(p float32) GenerateOption
WithGenTopP sets the top-p (nucleus) sampling parameter.
Stable.
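Top-p (nucleus) sampling restricts sampling to the smallest set of tokens whose cumulative probability reaches p, discarding the long tail. A stdlib-only sketch of that filtering step (illustrative; `topPFilter` is not a zerfoo API):

```go
package main

import (
	"fmt"
	"sort"
)

// topPFilter returns the indices of the smallest set of tokens whose
// cumulative probability reaches p, ordered by descending probability.
func topPFilter(probs []float64, p float64) []int {
	idx := make([]int, len(probs))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return probs[idx[a]] > probs[idx[b]] })
	var cum float64
	for n, i := range idx {
		cum += probs[i]
		if cum >= p {
			return idx[:n+1] // nucleus found: keep the top n+1 tokens
		}
	}
	return idx // p >= 1: keep everything
}

func main() {
	probs := []float64{0.5, 0.3, 0.15, 0.05}
	fmt.Println(topPFilter(probs, 0.9)) // keeps the top tokens covering 90% of mass
}
```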
func WithSchema
func WithSchema(schema grammar.JSONSchema) GenerateOption
WithSchema enables grammar-guided decoding.
The model's output will be constrained to valid JSON matching the given schema.
Experimental.
func WithToolChoice
func WithToolChoice(choice serve.ToolChoice) GenerateOption
WithToolChoice sets the tool choice mode for tool call detection.
Experimental.
func WithTools
func WithTools(tools ...serve.Tool) GenerateOption
WithTools configures the tools available for tool call detection.
When tools are provided, Model.Generate will attempt to detect tool calls in the model output and populate [GenerateResult.ToolCalls].
Experimental.
type GenerateResult
type GenerateResult struct {
Text string
TokenCount int
Duration time.Duration
ToolCalls []ToolCall
}
GenerateResult holds the result of a text generation call.
Stable.
type LayerBuilder
type LayerBuilder[T tensor.Numeric] func(
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	name string,
	params map[string]*graph.Parameter[T],
	attributes map[string]interface{},
) (graph.Node[T], error)
LayerBuilder is a function that builds a computation graph layer.
Stable.
type Model
type Model struct {
// contains filtered or unexported fields
}
Model is a loaded language model ready for inference.
A Model is created via Load and used for text generation, embedding, and tool-call detection. Model.Close must be called when the model is no longer needed to release GPU and CPU resources.
Stable.
func Load
Load loads a model from a file path or HuggingFace model ID.
Paths starting with "/", "./" or "../" are treated as local GGUF files. All other strings are treated as HuggingFace model IDs (e.g. "google/gemma-3-4b" or "google/gemma-3-4b/Q8_0"). If the model is not cached locally it will be downloaded from HuggingFace.
Stable.
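The documented dispatch rule, local path versus HuggingFace model ID, can be sketched with a small stdlib-only predicate (illustrative; `isLocalPath` is not a zerfoo function, just the rule as stated above):

```go
package main

import (
	"fmt"
	"strings"
)

// isLocalPath reports whether ref should be treated as a local GGUF file
// rather than a HuggingFace model ID, per the documented prefix rule.
func isLocalPath(ref string) bool {
	return strings.HasPrefix(ref, "/") ||
		strings.HasPrefix(ref, "./") ||
		strings.HasPrefix(ref, "../")
}

func main() {
	fmt.Println(isLocalPath("./models/gemma.gguf"))    // true: local file
	fmt.Println(isLocalPath("google/gemma-3-4b"))      // false: HuggingFace ID
	fmt.Println(isLocalPath("google/gemma-3-4b/Q8_0")) // false: ID with quant suffix
}
```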
func (*Model) ChatStream
func (m *Model) ChatStream(ctx context.Context, prompt string, opts ...GenerateOption) (<-chan StreamToken, error)
ChatStream starts streaming generation and returns a receive-only channel that yields StreamToken values as they are generated. The channel is closed when generation completes or ctx is canceled. The error return is non-nil only if startup fails (e.g. the model is not loaded).
Stable.
func (*Model) Embed
func (m *Model) Embed(texts []string) ([]Embedding, error)
Embed returns embeddings for the given texts.
Each input string is tokenized, its token embeddings are looked up from the model's embedding table, mean-pooled, and L2-normalized.
Stable.
func (*Model) Generate
func (m *Model) Generate(ctx context.Context, prompt string, opts ...GenerateOption) (*GenerateResult, error)
Generate runs text generation with the given prompt and options.
Stable.
Directories
| Path | Synopsis |
|---|---|
| cmd | |
| cmd/bench | Command bench runs a standardized benchmark harness for zerfoo models. |
| cmd/bench-compare | Command bench-compare compares two NDJSON benchmark result files and outputs a markdown regression report. |
| cmd/bench_batch | Command bench_batch benchmarks continuous batching vs session pool throughput. |
| cmd/bench_disagg | Command bench_disagg benchmarks disaggregated vs collocated serving throughput. |
| cmd/bench_mamba | Command bench_mamba benchmarks Mamba-3 SSM vs Transformer attention decode throughput using synthetic FLOPs-based timing estimates. |
| cmd/bench_prefix | Command bench_prefix simulates a multi-turn chat workload to measure prefix cache hit rate and TTFT reduction. |
| cmd/bench_spec | Command bench_spec benchmarks speculative decoding speedup by comparing standalone target model decode against speculative decode (target + draft). |
| cmd/bench_tps | bench_tps measures tokens-per-second for a local ZMF model. |
| cmd/cli | Package cli provides the command-line interface framework for Zerfoo. |
| cmd/coverage-gate | Command coverage-gate reads a Go coverage profile and fails if any testable package drops below the configured coverage threshold. |
| cmd/debug-infer | command |
| cmd/finetune | Command finetune runs QLoRA fine-tuning on a GGUF model. |
| cmd/train_distributed | Command train_distributed launches distributed training using FSDP. |
| cmd/ts_train | Command ts_train trains a PatchTST time-series signal model on offline feature data. |
| cmd/zerfoo | command |
| cmd/zerfoo-predict | command |
| cmd/zerfoo-tokenize | command |
| config | Package config provides file-based configuration loading with validation and environment variable overrides. |
| distributed | Package distributed provides multi-node distributed training for the Zerfoo ML framework. |
| distributed/coordinator | Package coordinator provides a distributed training coordinator. |
| distributed/fsdp | Package fsdp implements Fully Sharded Data Parallelism (FSDP) for distributed training. |
| examples | |
| examples/api-server | Command api-server demonstrates starting an OpenAI-compatible inference server. |
| examples/chat | Command chat demonstrates a simple interactive chatbot using the zerfoo one-line API. |
| examples/embedding | Command embedding demonstrates embedding Zerfoo inference inside a Go HTTP handler. |
| examples/inference | Command inference demonstrates loading a GGUF model and generating text. |
| examples/json-output | Command json-output demonstrates grammar-guided decoding with a JSON schema. |
| examples/rag | Command rag demonstrates retrieval-augmented generation using Zerfoo. |
| examples/streaming | Command streaming demonstrates streaming chat generation using the zerfoo API. |
| generate | Package generate implements autoregressive text generation for transformer models loaded by the inference package. |
| generate/grammar | Package grammar converts a subset of JSON Schema into a context-free grammar state machine that can constrain token-by-token generation to produce only valid JSON conforming to the schema. |
| generate/speculative | Package speculative implements speculative decoding strategies for accelerating autoregressive text generation. |
| health | Package health provides HTTP health check endpoints for Kubernetes-style liveness and readiness probes. |
| inference | Package inference provides a high-level API for loading GGUF models and running text generation, chat, embedding, and speculative decoding with minimal boilerplate. |
| inference/multimodal | Package multimodal provides audio preprocessing for audio-language model inference. |
| inference/timeseries | Package timeseries implements time-series model builders. |
| inference/timeseries/features | Package features provides a feature store for the Wolf time-series ML platform. |
| internal | |
| internal/clblast | Package clblast provides Go wrappers for the CLBlast BLAS library. |
| internal/codegen | Package codegen generates CUDA megakernel source code from a compiled ExecutionPlan instruction tape. |
| internal/cublas | Package cublas provides low-level purego bindings for the cuBLAS library. |
| internal/cuda | Package cuda provides low-level bindings for the CUDA runtime API using dlopen/dlsym (no CGo). |
| internal/cuda/kernels | Package kernels provides Go wrappers for custom CUDA kernels. |
| internal/cudnn | Package cudnn provides purego bindings for the NVIDIA cuDNN library. |
| internal/gpuapi | Package gpuapi defines internal interfaces for GPU runtime operations. |
| internal/hip | Package hip provides low-level bindings for the AMD HIP runtime API using purego dlopen. |
| internal/hip/kernels | Package kernels provides Go wrappers for custom HIP kernels via purego dlopen. |
| internal/miopen | Package miopen provides low-level bindings for the AMD MIOpen library using purego dlopen. |
| internal/nccl | Package nccl provides CGo bindings for the NVIDIA Collective Communications Library (NCCL). |
| internal/opencl | Package opencl provides Go wrappers for the OpenCL 2.0 runtime API. |
| internal/opencl/kernels | Package kernels provides OpenCL kernel source and dispatch for elementwise operations. |
| internal/rocblas | Package rocblas provides low-level bindings for the AMD rocBLAS library using purego dlopen. |
| internal/tensorrt | Package tensorrt provides bindings for the NVIDIA TensorRT inference library via purego (dlopen/dlsym, no CGo). |
| internal/workerpool | Package workerpool provides a persistent pool of goroutines that process submitted tasks. |
| layers | Package layers provides neural network layer implementations for the Zerfoo ML framework. |
| layers/activations | Package activations provides activation function layers. |
| layers/attention | Package attention provides attention mechanisms for neural networks. |
| layers/audio | Package audio provides audio-related neural network layers. |
| layers/components | Package components provides reusable components for neural network layers. |
| layers/core | Package core provides core neural network layer implementations. |
| layers/embeddings | Package embeddings provides neural network embedding layers for the Zerfoo ML framework. |
| layers/gather | Package gather provides the Gather layer for the Zerfoo ML framework. |
| layers/hrm | Package hrm implements the Hierarchical Reasoning Model. |
| layers/normalization | Package normalization provides various normalization layers for neural networks. |
| layers/reducesum | Package reducesum provides the ReduceSum layer for the Zerfoo ML framework. |
| layers/registry | Package registry provides a central registration point for all layer builders. |
| layers/regularization | Package regularization provides regularization layers for neural networks. |
| layers/sequence | Package sequence provides sequence modeling layers such as State Space Models. |
| layers/ssm | Package ssm implements state space model layers. |
| layers/transformer | Package transformer provides transformer building blocks such as the Transformer `Block` used in encoder/decoder stacks. |
| layers/transpose | Package transpose provides the Transpose layer for the Zerfoo ML framework. |
| model | Package model provides adapter implementations for bridging existing and new model interfaces. |
| model/gguf | Package gguf implements a pure-Go parser for the GGUF v3 model format used by llama.cpp. |
| model/hrm | Package hrm provides experimental Hierarchical Reasoning Model types. |
| serve | Package serve provides an OpenAI-compatible HTTP API server for model inference. |
| serve/agent | Package agent adapts the generate/agent agentic loop to the OpenAI-compatible chat completions API, translating between OpenAI tool definitions and the internal ToolRegistry/Supervisor types. |
| serve/batcher | Package batcher implements a continuous batching scheduler for inference serving. |
| serve/cloud | Package cloud provides multi-tenant namespace isolation for the serving layer. |
| serve/disaggregated | Package disaggregated implements disaggregated prefill/decode serving. |
| serve/disaggregated/proto | Package disaggpb defines the gRPC service contracts for disaggregated prefill/decode serving. |
| serve/registry | Package registry provides a bbolt-backed model version registry for tracking, activating, and managing model versions used by the serving layer. |
| shutdown | Package shutdown provides orderly shutdown coordination using context cancellation and cleanup callbacks. |
| tests | |
| tests/training | Package training contains end-to-end training loop integration tests. |
| training | Package training provides adapter implementations for bridging existing and new interfaces. |
| training/automl | Package automl provides automated machine learning utilities including Bayesian hyperparameter optimization. |
| training/fp8 | Package fp8 provides FP8 mixed-precision training layers. |
| training/lora | Package lora provides Low-Rank Adaptation layers for parameter-efficient fine-tuning. |
| training/loss | Package loss provides various loss functions for neural networks. |
| training/nas | Package nas implements Neural Architecture Search for the Zerfoo ML framework. |
| training/online | Package online provides online learning components for continuous model adaptation. |
| training/optimizer | Package optimizer provides various optimization algorithms for neural networks. |