Documentation

Overview
Package zerfoo provides the core building blocks for creating and training neural networks. It offers a prelude of commonly used types to simplify development and enhance readability of model construction code.
Index
- func NewAdamW[T tensor.Numeric](learningRate, beta1, beta2, epsilon, weightDecay T) *optimizer.AdamW[T]
- func NewCPUEngine[T tensor.Numeric]() compute.Engine[T]
- func NewDefaultTrainer[T tensor.Numeric](g *graph.Graph[T], lossNode graph.Node[T], opt optimizer.Optimizer[T], ...) *training.DefaultTrainer[T]
- func NewFloat32Ops() numeric.Arithmetic[float32]
- func NewGraph[T tensor.Numeric](engine compute.Engine[T]) *graph.Builder[T]
- func NewMSE[T tensor.Numeric](engine compute.Engine[T]) *loss.MSE[T]
- func NewRMSNorm[T tensor.Numeric](name string, engine compute.Engine[T], ops numeric.Arithmetic[T], modelDim int, ...) (*normalization.RMSNorm[T], error)
- func NewTensor[T tensor.Numeric](shape []int, data []T) (*tensor.TensorNumeric[T], error)
- func RegisterLayer[T tensor.Numeric](opType string, builder model.LayerBuilder[T])
- func UnregisterLayer(opType string)
- type Batch
- type Embedding
- type Engine
- type GenerateOption
- func WithGenMaxTokens(n int) GenerateOption
- func WithGenTemperature(t float32) GenerateOption
- func WithGenTopP(p float32) GenerateOption
- func WithSchema(schema grammar.JSONSchema) GenerateOption
- func WithToolChoice(choice serve.ToolChoice) GenerateOption
- func WithTools(tools ...serve.Tool) GenerateOption
- type GenerateResult
- type Graph
- type LayerBuilder
- type Model
- func (m *Model) Chat(prompt string) (string, error)
- func (m *Model) ChatStream(ctx context.Context, prompt string, opts ...GenerateOption) (<-chan StreamToken, error)
- func (m *Model) Close() error
- func (m *Model) Embed(texts []string) ([]Embedding, error)
- func (m *Model) Generate(ctx context.Context, prompt string, opts ...GenerateOption) (*GenerateResult, error)
- type Node
- type Numeric
- type Parameter
- type StreamToken
- type Tensor
- type ToolCall
Constants
This section is empty.
Variables
This section is empty.
Functions
func NewAdamW
func NewAdamW[T tensor.Numeric](learningRate, beta1, beta2, epsilon, weightDecay T) *optimizer.AdamW[T]
NewAdamW creates a new AdamW optimizer with the given hyperparameters.
Stable.
func NewCPUEngine
func NewCPUEngine[T tensor.Numeric]() compute.Engine[T]
NewCPUEngine creates a new CPU computation engine for the given numeric type.
Stable.
func NewDefaultTrainer
func NewDefaultTrainer[T tensor.Numeric](
	g *graph.Graph[T],
	lossNode graph.Node[T],
	opt optimizer.Optimizer[T],
	strategy training.GradientStrategy[T],
) *training.DefaultTrainer[T]
NewDefaultTrainer creates a new default trainer for the given graph, loss, optimizer, and gradient strategy.
Stable.
func NewFloat32Ops
func NewFloat32Ops() numeric.Arithmetic[float32]
NewFloat32Ops returns the float32 arithmetic operations.
Stable.
func NewRMSNorm
func NewRMSNorm[T tensor.Numeric](name string, engine compute.Engine[T], ops numeric.Arithmetic[T], modelDim int, options ...normalization.RMSNormOption[T]) (*normalization.RMSNorm[T], error)
NewRMSNorm creates a new RMSNorm normalization layer with the given configuration.
Stable.
func RegisterLayer
func RegisterLayer[T tensor.Numeric](opType string, builder model.LayerBuilder[T])
RegisterLayer registers a new layer builder for the given operation type.
Stable.
func UnregisterLayer
func UnregisterLayer(opType string)
UnregisterLayer unregisters the layer builder for the given operation type.
Stable.
Types
type Batch
type Batch[T tensor.Numeric] struct {
	Inputs  map[graph.Node[T]]*tensor.TensorNumeric[T]
	Targets *tensor.TensorNumeric[T]
}
Batch represents a training batch of inputs and targets.
Stable.
type Embedding
type Embedding struct {
Vector []float32
}
Embedding holds a text embedding vector.
Stable.
func (Embedding) CosineSimilarity
CosineSimilarity computes the cosine similarity between two embeddings.
Stable.
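Cosine similarity is the dot product of two vectors divided by the product of their L2 norms. A stdlib-only sketch of the computation over raw float32 vectors (illustrative; the actual method signature on Embedding is not shown in this documentation):

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns dot(a, b) / (||a|| * ||b||), or 0 for a zero vector.
func cosineSimilarity(a, b []float32) float32 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

func main() {
	fmt.Println(cosineSimilarity([]float32{1, 0}, []float32{1, 0})) // same direction: 1
	fmt.Println(cosineSimilarity([]float32{1, 0}, []float32{0, 1})) // orthogonal: 0
}
```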
type GenerateOption
type GenerateOption func(*generateOptions)
GenerateOption configures the behavior of Model.Generate.
Stable.
func WithGenMaxTokens
func WithGenMaxTokens(n int) GenerateOption
WithGenMaxTokens sets the maximum number of tokens to generate.
Stable.
func WithGenTemperature
func WithGenTemperature(t float32) GenerateOption
WithGenTemperature sets the sampling temperature.
Stable.
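Sampling temperature divides the logits before the softmax: values below 1 sharpen the distribution toward the most likely token, values above 1 flatten it toward uniform. A stdlib-only sketch of the effect (illustrative, not zerfoo's sampler):

```go
package main

import (
	"fmt"
	"math"
)

// softmaxWithTemperature converts logits to probabilities after dividing by t.
// t < 1 sharpens the distribution; t > 1 flattens it.
func softmaxWithTemperature(logits []float64, t float64) []float64 {
	probs := make([]float64, len(logits))
	maxL := math.Inf(-1)
	for _, l := range logits {
		if l/t > maxL {
			maxL = l / t
		}
	}
	var sum float64
	for i, l := range logits {
		probs[i] = math.Exp(l/t - maxL) // subtract the max for numerical stability
		sum += probs[i]
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs
}

func main() {
	logits := []float64{2, 1, 0}
	fmt.Printf("t=0.5: %.3f\n", softmaxWithTemperature(logits, 0.5))
	fmt.Printf("t=2.0: %.3f\n", softmaxWithTemperature(logits, 2.0))
}
```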
func WithGenTopP
func WithGenTopP(p float32) GenerateOption
WithGenTopP sets the top-p (nucleus) sampling parameter.
Stable.
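Top-p (nucleus) sampling restricts sampling to the smallest set of tokens whose cumulative probability reaches p, discarding the long tail. A stdlib-only sketch of that filtering step (illustrative; `topPFilter` is not a zerfoo API):

```go
package main

import (
	"fmt"
	"sort"
)

// topPFilter returns the indices of the smallest set of tokens whose
// cumulative probability reaches p, ordered by descending probability.
func topPFilter(probs []float64, p float64) []int {
	idx := make([]int, len(probs))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return probs[idx[a]] > probs[idx[b]] })
	var cum float64
	for n, i := range idx {
		cum += probs[i]
		if cum >= p {
			return idx[:n+1] // nucleus found: keep the top n+1 tokens
		}
	}
	return idx // p >= 1: keep everything
}

func main() {
	probs := []float64{0.5, 0.3, 0.15, 0.05}
	fmt.Println(topPFilter(probs, 0.9)) // keeps the top tokens covering 90% of mass
}
```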
func WithSchema
func WithSchema(schema grammar.JSONSchema) GenerateOption
WithSchema enables grammar-guided decoding.
The model's output will be constrained to valid JSON matching the given schema.
Experimental.
func WithToolChoice
func WithToolChoice(choice serve.ToolChoice) GenerateOption
WithToolChoice sets the tool choice mode for tool call detection.
Experimental.
func WithTools
func WithTools(tools ...serve.Tool) GenerateOption
WithTools configures the tools available for tool call detection.
When tools are provided, Model.Generate will attempt to detect tool calls in the model output and populate [GenerateResult.ToolCalls].
Experimental.
type GenerateResult
type GenerateResult struct {
Text string
TokenCount int
Duration time.Duration
ToolCalls []ToolCall
}
GenerateResult holds the result of a text generation call.
Stable.
type LayerBuilder
type LayerBuilder[T tensor.Numeric] func(
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	name string,
	params map[string]*graph.Parameter[T],
	attributes map[string]interface{},
) (graph.Node[T], error)
LayerBuilder is a function that builds a computation graph layer.
Stable.
type Model
type Model struct {
// contains filtered or unexported fields
}
Model is a loaded language model ready for inference.
A Model is created via Load and used for text generation, embedding, and tool-call detection. Model.Close must be called when the model is no longer needed to release GPU and CPU resources.
Stable.
func Load
Load loads a model from a file path or HuggingFace model ID.
Paths starting with "/", "./" or "../" are treated as local GGUF files. All other strings are treated as HuggingFace model IDs (e.g. "google/gemma-3-4b" or "google/gemma-3-4b/Q8_0"). If the model is not cached locally it will be downloaded from HuggingFace.
Stable.
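The documented dispatch rule, local path versus HuggingFace model ID, can be sketched with a small stdlib-only predicate (illustrative; `isLocalPath` is not a zerfoo function, just the rule as stated above):

```go
package main

import (
	"fmt"
	"strings"
)

// isLocalPath reports whether ref should be treated as a local GGUF file
// rather than a HuggingFace model ID, per the documented prefix rule.
func isLocalPath(ref string) bool {
	return strings.HasPrefix(ref, "/") ||
		strings.HasPrefix(ref, "./") ||
		strings.HasPrefix(ref, "../")
}

func main() {
	fmt.Println(isLocalPath("./models/gemma.gguf"))    // true: local file
	fmt.Println(isLocalPath("google/gemma-3-4b"))      // false: HuggingFace ID
	fmt.Println(isLocalPath("google/gemma-3-4b/Q8_0")) // false: ID with quant suffix
}
```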
func (*Model) ChatStream
func (m *Model) ChatStream(ctx context.Context, prompt string, opts ...GenerateOption) (<-chan StreamToken, error)
ChatStream starts streaming generation and returns a receive-only channel that yields StreamToken values as they are generated. The channel is closed when generation completes or ctx is canceled. The error return is non-nil only if startup fails (e.g. the model is not loaded).
Stable.
func (*Model) Embed
func (m *Model) Embed(texts []string) ([]Embedding, error)
Embed returns embeddings for the given texts.
Each input string is tokenized, its token embeddings are looked up from the model's embedding table, mean-pooled, and L2-normalized.
Stable.
func (*Model) Generate
func (m *Model) Generate(ctx context.Context, prompt string, opts ...GenerateOption) (*GenerateResult, error)
Generate runs text generation with the given prompt and options.
Stable.
Directories
| Path | Synopsis |
|---|---|
| cmd | |
| cmd/bench | Command bench runs a standardized benchmark harness for zerfoo models. |
| cmd/bench-compare | Command bench-compare compares two NDJSON benchmark result files and outputs a markdown regression report. |
| cmd/bench_batch | Command bench_batch benchmarks continuous batching vs session pool throughput. |
| cmd/bench_disagg | Command bench_disagg benchmarks disaggregated vs collocated serving throughput. |
| cmd/bench_mamba | Command bench_mamba benchmarks Mamba-3 SSM vs Transformer attention decode throughput using synthetic FLOPs-based timing estimates. |
| cmd/bench_prefix | Command bench_prefix simulates a multi-turn chat workload to measure prefix cache hit rate and TTFT reduction. |
| cmd/bench_spec | Command bench_spec benchmarks speculative decoding speedup by comparing standalone target model decode against speculative decode (target + draft). |
| cmd/bench_tps | bench_tps measures tokens-per-second for a local ZMF model. |
| cmd/cli | Package cli provides the command-line interface framework for Zerfoo. |
| cmd/coverage-gate | Command coverage-gate reads a Go coverage profile and fails if any testable package drops below the configured coverage threshold. |
| cmd/debug-infer | command |
| cmd/finetune | Command finetune runs QLoRA fine-tuning on a GGUF model. |
| cmd/train_distributed | Command train_distributed launches distributed training using FSDP. |
| cmd/ts_train | Command ts_train trains a PatchTST time-series signal model on offline feature data. |
| cmd/zerfoo | command |
| cmd/zerfoo-predict | command |
| cmd/zerfoo-tokenize | command |
| config | Package config provides file-based configuration loading with validation and environment variable overrides. |
| distributed | Package distributed provides multi-node distributed training for the Zerfoo ML framework. |
| distributed/coordinator | Package coordinator provides a distributed training coordinator. |
| distributed/fsdp | Package fsdp implements Fully Sharded Data Parallelism (FSDP) for distributed training. |
| examples | |
| examples/api-server | Command api-server demonstrates starting an OpenAI-compatible inference server. |
| examples/chat | Command chat demonstrates a simple interactive chatbot using the zerfoo one-line API. |
| examples/embedding | Command embedding demonstrates embedding Zerfoo inference inside a Go HTTP handler. |
| examples/inference | Command inference demonstrates loading a GGUF model and generating text. |
| examples/json-output | Command json-output demonstrates grammar-guided decoding with a JSON schema. |
| examples/rag | Command rag demonstrates retrieval-augmented generation using Zerfoo. |
| examples/streaming | Command streaming demonstrates streaming chat generation using the zerfoo API. |
| generate | Package generate implements autoregressive text generation for transformer models loaded by the inference package. |
| generate/grammar | Package grammar converts a subset of JSON Schema into a context-free grammar state machine that can constrain token-by-token generation to produce only valid JSON conforming to the schema. |
| generate/speculative | Package speculative implements speculative decoding strategies for accelerating autoregressive text generation. |
| health | Package health provides HTTP health check endpoints for Kubernetes-style liveness and readiness probes. |
| inference | Package inference provides a high-level API for loading GGUF models and running text generation, chat, embedding, and speculative decoding with minimal boilerplate. |
| inference/multimodal | Package multimodal provides audio preprocessing for audio-language model inference. |
| inference/timeseries | Package timeseries implements time-series model builders. |
| inference/timeseries/features | Package features provides a feature store for the Wolf time-series ML platform. |
| internal | |
| internal/clblast | Package clblast provides Go wrappers for the CLBlast BLAS library. |
| internal/codegen | Package codegen generates CUDA megakernel source code from a compiled ExecutionPlan instruction tape. |
| internal/cublas | Package cublas provides low-level purego bindings for the cuBLAS library. |
| internal/cuda | Package cuda provides low-level bindings for the CUDA runtime API using dlopen/dlsym (no CGo). |
| internal/cuda/kernels | Package kernels provides Go wrappers for custom CUDA kernels. |
| internal/cudnn | Package cudnn provides purego bindings for the NVIDIA cuDNN library. |
| internal/gpuapi | Package gpuapi defines internal interfaces for GPU runtime operations. |
| internal/hip | Package hip provides low-level bindings for the AMD HIP runtime API using purego dlopen. |
| internal/hip/kernels | Package kernels provides Go wrappers for custom HIP kernels via purego dlopen. |
| internal/miopen | Package miopen provides low-level bindings for the AMD MIOpen library using purego dlopen. |
| internal/nccl | Package nccl provides CGo bindings for the NVIDIA Collective Communications Library (NCCL). |
| internal/opencl | Package opencl provides Go wrappers for the OpenCL 2.0 runtime API. |
| internal/opencl/kernels | Package kernels provides OpenCL kernel source and dispatch for elementwise operations. |
| internal/rocblas | Package rocblas provides low-level bindings for the AMD rocBLAS library using purego dlopen. |
| internal/tensorrt | Package tensorrt provides bindings for the NVIDIA TensorRT inference library via purego (dlopen/dlsym, no CGo). |
| internal/workerpool | Package workerpool provides a persistent pool of goroutines that process submitted tasks. |
| layers | Package layers provides neural network layer implementations for the Zerfoo ML framework. |
| layers/activations | Package activations provides activation function layers. |
| layers/attention | Package attention provides attention mechanisms for neural networks. |
| layers/audio | Package audio provides audio-related neural network layers. |
| layers/components | Package components provides reusable components for neural network layers. |
| layers/core | Package core provides core neural network layer implementations. |
| layers/embeddings | Package embeddings provides neural network embedding layers for the Zerfoo ML framework. |
| layers/gather | Package gather provides the Gather layer for the Zerfoo ML framework. |
| layers/hrm | Package hrm implements the Hierarchical Reasoning Model. |
| layers/normalization | Package normalization provides various normalization layers for neural networks. |
| layers/reducesum | Package reducesum provides the ReduceSum layer for the Zerfoo ML framework. |
| layers/registry | Package registry provides a central registration point for all layer builders. |
| layers/regularization | Package regularization provides regularization layers for neural networks. |
| layers/sequence | Package sequence provides sequence modeling layers such as State Space Models. |
| layers/ssm | Package ssm implements state space model layers. |
| layers/transformer | Package transformer provides transformer building blocks such as the Transformer `Block` used in encoder/decoder stacks. |
| layers/transpose | Package transpose provides the Transpose layer for the Zerfoo ML framework. |
| model | Package model provides adapter implementations for bridging existing and new model interfaces. |
| model/gguf | Package gguf implements a pure-Go parser for the GGUF v3 model format used by llama.cpp. |
| model/hrm | Package hrm provides experimental Hierarchical Reasoning Model types. |
| serve | Package serve provides an OpenAI-compatible HTTP API server for model inference. |
| serve/agent | Package agent adapts the generate/agent agentic loop to the OpenAI-compatible chat completions API, translating between OpenAI tool definitions and the internal ToolRegistry/Supervisor types. |
| serve/batcher | Package batcher implements a continuous batching scheduler for inference serving. |
| serve/cloud | Package cloud provides multi-tenant namespace isolation for the serving layer. |
| serve/disaggregated | Package disaggregated implements disaggregated prefill/decode serving. |
| serve/disaggregated/proto | Package disaggpb defines the gRPC service contracts for disaggregated prefill/decode serving. |
| serve/registry | Package registry provides a bbolt-backed model version registry for tracking, activating, and managing model versions used by the serving layer. |
| shutdown | Package shutdown provides orderly shutdown coordination using context cancellation and cleanup callbacks. |
| tests | |
| tests/training | Package training contains end-to-end training loop integration tests. |
| training | Package training provides adapter implementations for bridging existing and new interfaces. |
| training/automl | Package automl provides automated machine learning utilities including Bayesian hyperparameter optimization. |
| training/fp8 | Package fp8 provides FP8 mixed-precision training layers. |
| training/lora | Package lora provides Low-Rank Adaptation layers for parameter-efficient fine-tuning. |
| training/loss | Package loss provides various loss functions for neural networks. |
| training/nas | Package nas implements Neural Architecture Search for the Zerfoo ML framework. |
| training/online | Package online provides online learning components for continuous model adaptation. |
| training/optimizer | Package optimizer provides various optimization algorithms for neural networks. |