zerfoo

v1.2.0
Published: Mar 15, 2026 License: Apache-2.0 Imports: 10 Imported by: 0

README

Zerfoo

A Go ML framework built for inference performance. Pure Go with no CGo -- GPU acceleration via runtime-loaded CUDA, ROCm, and OpenCL. Ships an OpenAI-compatible API server, quantized model support (GGUF Q4_K), and CUDA graph capture for near-zero kernel launch overhead.

Installation

Prerequisites:

  • Go 1.25 or later
  • CUDA toolkit (for GPU acceleration; optional for CPU-only usage)

git clone https://github.com/zerfoo/zerfoo.git
cd zerfoo
go build ./...

For GPU support, build the CUDA kernels first:

cd internal/cuda/kernels && make shared && cd ../../../

Quickstart

Pull a model and run inference:

go run ./cmd/zerfoo pull gemma3:1b
go run ./cmd/zerfoo run gemma3:1b "The quick brown fox"

Or start the API server:

go run ./cmd/zerfoo serve --model gemma3:1b --port 8080

Supported Models

Model       Format      Status
Gemma 3     GGUF Q4_K   Production (CUDA graph, highest throughput)
Llama 3     ZMF/ONNX    Working
Qwen 2.5    ZMF/ONNX    Working
Mistral 7B  ZMF/ONNX    Working
Phi-3/4     ZMF/ONNX    Working
SigLIP      ZMF         Vision encoder (parity tested)
Kimi-VL     ZMF         Vision-language (parity tested)

See docs/benchmarks.md for current throughput numbers.

API Usage

Start the server, then send requests to the chat completions endpoint:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gemma3:1b","messages":[{"role":"user","content":"Hello"}]}'

The server implements the OpenAI chat completions API, so any OpenAI-compatible client library works out of the box -- just point it at localhost:8080.
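For callers who prefer plain Go over a client library, the curl request above can be sketched with net/http. This is a minimal sketch, not part of Zerfoo: the ChatMessage and ChatRequest types and the NewChatRequest helper are hypothetical names, the base URL assumes the server from the Quickstart is listening on port 8080, and the response is decoded into a generic map because its schema is not spelled out here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// ChatMessage mirrors one entry of the "messages" array in the curl example.
type ChatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ChatRequest mirrors the JSON body of the curl example.
type ChatRequest struct {
	Model    string        `json:"model"`
	Messages []ChatMessage `json:"messages"`
}

// NewChatRequest builds a POST request for the chat completions endpoint.
// baseURL is an assumption, for example "http://localhost:8080".
func NewChatRequest(baseURL, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(ChatRequest{
		Model:    model,
		Messages: []ChatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := NewChatRequest("http://localhost:8080", "gemma3:1b", "Hello")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("send request:", err) // for example, the server is not running
		return
	}
	defer resp.Body.Close()

	// Decode into a generic map rather than assuming a response schema.
	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		fmt.Println("decode response:", err)
		return
	}
	fmt.Println(out)
}
```

Keeping request construction in its own function makes it possible to inspect the method, URL, and headers without a running server.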

Key Design Decisions

  • Engine[T] interface -- unified compute abstraction across CPU and GPU backends. All layers delegate arithmetic to Engine, enabling transparent hardware acceleration.
  • purego GPU bindings -- CUDA, ROCm, and OpenCL loaded via dlopen at runtime. No CGo, no build tags. go build ./... works everywhere.
  • Graph compiler with CUDA graph capture -- builds a static computation DAG, captures it as a CUDA graph for near-zero launch overhead on decode.
  • Arena memory allocator -- pre-allocated bump-pointer arena serves all inference allocations with O(1) reset per token.
  • OpenAI-compatible HTTP server -- chat completions, completions, embeddings, model management, and SSE streaming.

See docs/design.md for the full architecture.
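The arena allocator bullet above describes a general technique that can be illustrated in a few lines. The following is a minimal sketch of a bump-pointer arena over float32 storage, assuming a single pre-allocated slab; the Arena type and its methods are hypothetical and are not Zerfoo's allocator.

```go
package main

import "fmt"

// Arena is a hypothetical bump-pointer allocator over one pre-allocated slab.
type Arena struct {
	buf []float32
	off int // index of the next free element
}

// NewArena allocates the slab once, up front.
func NewArena(capacity int) *Arena {
	return &Arena{buf: make([]float32, capacity)}
}

// Alloc hands out the next n elements, or reports false if the slab is full.
func (a *Arena) Alloc(n int) ([]float32, bool) {
	if n < 0 || a.off+n > len(a.buf) {
		return nil, false
	}
	// The three-index slice caps capacity so append cannot spill into a neighbor.
	s := a.buf[a.off : a.off+n : a.off+n]
	a.off += n
	return s, true
}

// Reset releases every allocation at once by rewinding the offset: O(1).
// Old contents are not zeroed, so callers must overwrite before reading.
func (a *Arena) Reset() { a.off = 0 }

// Used reports how many elements have been handed out since the last Reset.
func (a *Arena) Used() int { return a.off }

func main() {
	arena := NewArena(8)
	for token := 0; token < 2; token++ {
		q, _ := arena.Alloc(4)
		k, _ := arena.Alloc(4)
		_, ok := arena.Alloc(1) // the slab is exhausted at this point
		fmt.Println(len(q), len(k), ok, arena.Used())
		arena.Reset() // one rewind per token instead of per-buffer frees
	}
}
```

Because Reset only rewinds an integer, its cost does not depend on how many buffers were allocated during the token.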

Contributing

Standard Go workflow: fork, branch, test, PR.

The pre-commit hook enforces single-directory commits. Run tests before submitting:

go test ./... -race

License

Apache 2.0

Documentation

Overview

Package zerfoo provides the core building blocks for creating and training neural networks. It offers a prelude of commonly used types to simplify development and enhance readability of model construction code.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func BuildFromZMF added in v0.3.0

func BuildFromZMF[T tensor.Numeric](
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	m *zmf.Model,
	opts ...model.BuildOption,
) (*graph.Graph[T], error)

BuildFromZMF builds a graph from a ZMF model.

func NewAdamW

func NewAdamW[T tensor.Numeric](learningRate, beta1, beta2, epsilon, weightDecay T) *optimizer.AdamW[T]

NewAdamW creates a new AdamW optimizer.
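To show what the five constructor arguments control, here is a minimal sketch of one AdamW update on plain float64 slices, using the standard formulation with bias correction and decoupled weight decay. AdamWState and AdamWStep are hypothetical names for illustration; how optimizer.AdamW is implemented internally is not documented here and may differ.

```go
package main

import (
	"fmt"
	"math"
)

// AdamWState holds the running moment estimates (hypothetical helper type).
type AdamWState struct {
	M, V []float64 // first and second moment estimates
	Step int       // number of updates applied so far
}

// AdamWStep applies one textbook AdamW update to params in place.
func AdamWStep(params, grads []float64, st *AdamWState, lr, beta1, beta2, eps, weightDecay float64) {
	st.Step++
	c1 := 1 - math.Pow(beta1, float64(st.Step)) // bias correction for M
	c2 := 1 - math.Pow(beta2, float64(st.Step)) // bias correction for V
	for i, g := range grads {
		st.M[i] = beta1*st.M[i] + (1-beta1)*g
		st.V[i] = beta2*st.V[i] + (1-beta2)*g*g
		mHat := st.M[i] / c1
		vHat := st.V[i] / c2
		// Weight decay is applied to the parameter directly, not folded into the gradient.
		params[i] -= lr * (mHat/(math.Sqrt(vHat)+eps) + weightDecay*params[i])
	}
}

func main() {
	params := []float64{1.0}
	grads := []float64{0.5}
	st := &AdamWState{M: make([]float64, 1), V: make([]float64, 1)}
	AdamWStep(params, grads, st, 0.1, 0.9, 0.999, 1e-8, 0.01)
	fmt.Printf("%.4f\n", params[0])
}
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate plus the decay term.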

func NewCPUEngine

func NewCPUEngine[T tensor.Numeric]() compute.Engine[T]

NewCPUEngine creates a new CPU engine for the given numeric type.

func NewDefaultTrainer

func NewDefaultTrainer[T tensor.Numeric](
	g *graph.Graph[T],
	lossNode graph.Node[T],
	opt optimizer.Optimizer[T],
	strategy training.GradientStrategy[T],
) *training.DefaultTrainer[T]

NewDefaultTrainer creates a new default trainer.

func NewFloat32Ops

func NewFloat32Ops() numeric.Arithmetic[float32]

NewFloat32Ops returns the float32 arithmetic operations.

func NewGraph

func NewGraph[T tensor.Numeric](engine compute.Engine[T]) *graph.Builder[T]

NewGraph creates a new computation graph.

func NewMSE

func NewMSE[T tensor.Numeric](engine compute.Engine[T]) *loss.MSE[T]

NewMSE creates a new Mean Squared Error loss function.
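For reference, mean squared error and its gradient with respect to the predictions can be sketched on plain slices. This is an illustration of the formula only; the MSE function below is a hypothetical helper and does not use loss.MSE or the engine.

```go
package main

import (
	"errors"
	"fmt"
)

// MSE returns mean((pred - target)^2) and the gradient 2*(pred - target)/n.
func MSE(pred, target []float64) (loss float64, grad []float64, err error) {
	if len(pred) == 0 || len(pred) != len(target) {
		return 0, nil, errors.New("mse: pred and target must be non-empty and equal in length")
	}
	n := float64(len(pred))
	grad = make([]float64, len(pred))
	for i := range pred {
		d := pred[i] - target[i]
		loss += d * d
		grad[i] = 2 * d / n
	}
	return loss / n, grad, nil
}

func main() {
	loss, grad, err := MSE([]float64{2, 4}, []float64{0, 0})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(loss, grad)
}
```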

func NewRMSNorm

func NewRMSNorm[T tensor.Numeric](name string, engine compute.Engine[T], ops numeric.Arithmetic[T], modelDim int, options ...normalization.RMSNormOption[T]) (*normalization.RMSNorm[T], error)

NewRMSNorm is a factory function for creating RMSNorm layers.
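As background, RMS normalization divides each element by the root mean square of the vector and then applies a learned per-element gain. The sketch below shows that computation on plain slices; the RMSNorm function, its eps argument, and the gain slice are illustrative assumptions and do not reflect the options accepted by normalization.RMSNorm.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// RMSNorm computes out[i] = x[i] / sqrt(mean(x^2) + eps) * gain[i].
func RMSNorm(x, gain []float64, eps float64) ([]float64, error) {
	if len(x) == 0 || len(x) != len(gain) {
		return nil, errors.New("rmsnorm: x and gain must be non-empty and equal in length")
	}
	var sumSq float64
	for _, v := range x {
		sumSq += v * v
	}
	inv := 1 / math.Sqrt(sumSq/float64(len(x))+eps)
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v * inv * gain[i]
	}
	return out, nil
}

func main() {
	out, err := RMSNorm([]float64{3, 4}, []float64{1, 1}, 1e-6)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(out)
}
```

Unlike layer normalization, this form does not subtract the mean, which is why only the sum of squares is accumulated.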

func NewTensor

func NewTensor[T tensor.Numeric](shape []int, data []T) (*tensor.TensorNumeric[T], error)

NewTensor creates a new tensor with the given shape and data.

func RegisterLayer

func RegisterLayer[T tensor.Numeric](opType string, builder model.LayerBuilder[T])

RegisterLayer registers a new layer builder.

func UnregisterLayer

func UnregisterLayer(opType string)

UnregisterLayer unregisters a layer builder.
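The register/unregister pair follows a common registry pattern: a map from an op type string to a builder function, guarded by a lock. The sketch below illustrates the pattern under simplifying assumptions. The Registry type is hypothetical, and its LayerBuilder is a deliberately reduced stand-in that takes only a name, unlike the package's generic LayerBuilder[T].

```go
package main

import (
	"fmt"
	"sync"
)

// LayerBuilder is a simplified, hypothetical stand-in for a layer constructor.
type LayerBuilder func(name string) (string, error)

// Registry maps an op type to the builder that constructs it.
type Registry struct {
	mu       sync.RWMutex
	builders map[string]LayerBuilder
}

func NewRegistry() *Registry {
	return &Registry{builders: make(map[string]LayerBuilder)}
}

// Register adds or replaces the builder for opType.
func (r *Registry) Register(opType string, b LayerBuilder) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.builders[opType] = b
}

// Unregister removes the builder for opType; unknown types are ignored.
func (r *Registry) Unregister(opType string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.builders, opType)
}

// Build looks up opType and invokes its builder.
func (r *Registry) Build(opType, name string) (string, error) {
	r.mu.RLock()
	b, ok := r.builders[opType]
	r.mu.RUnlock()
	if !ok {
		return "", fmt.Errorf("no builder registered for op type %q", opType)
	}
	return b(name)
}

func main() {
	reg := NewRegistry()
	reg.Register("Scale", func(name string) (string, error) {
		return "Scale layer " + name, nil
	})
	desc, err := reg.Build("Scale", "s0")
	fmt.Println(desc, err)

	reg.Unregister("Scale")
	_, err = reg.Build("Scale", "s0")
	fmt.Println(err)
}
```

A model loader can then dispatch on the op type stored in a model file without importing every layer package directly.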

Types

type Batch

type Batch[T tensor.Numeric] struct {
	Inputs  map[graph.Node[T]]*tensor.TensorNumeric[T]
	Targets *tensor.TensorNumeric[T]
}

Batch represents a training batch.

type Engine

type Engine[T tensor.Numeric] interface {
	compute.Engine[T]
}

Engine represents a computation engine (e.g., CPU).

type Graph

type Graph[T tensor.Numeric] struct {
	*graph.Graph[T]
}

Graph represents a computation graph.

type LayerBuilder

type LayerBuilder[T tensor.Numeric] func(
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	name string,
	params map[string]*graph.Parameter[T],
	attributes map[string]interface{},
) (graph.Node[T], error)

LayerBuilder is a function that builds a layer.

type Node

type Node[T tensor.Numeric] interface {
	graph.Node[T]
}

Node represents a node in the computation graph.

type Numeric

type Numeric tensor.Numeric

Numeric represents a numeric type constraint.

type Parameter

type Parameter[T tensor.Numeric] struct {
	*graph.Parameter[T]
}

Parameter represents a trainable parameter in the model.

type Tensor

type Tensor[T tensor.Numeric] struct {
	*tensor.TensorNumeric[T]
}

Tensor represents a multi-dimensional array.

type ZMFModel added in v0.3.0

type ZMFModel = zmf.Model

ZMFModel is an alias for zmf.Model, a ZMF model.

Directories

Path Synopsis
cmd
bench-compare command
Command bench-compare compares two NDJSON benchmark result files and outputs a markdown regression report.
bench_tps command
bench_tps measures tokens-per-second for a local ZMF model.
cli
Package cli provides a generic command-line interface framework for Zerfoo.
coverage-gate command
Command coverage-gate reads a Go coverage profile and fails if any testable package drops below the configured coverage threshold.
debug-infer command
zerfoo command
zerfoo-predict command
zerfoo-tokenize command
Package compute implements tensor computation engines and operations.
Package config provides file-based configuration loading with validation and environment variable overrides.
Package device provides device abstraction and memory allocation interfaces.
Package distributed provides distributed training strategies and coordination mechanisms for multi-node machine learning workloads in the Zerfoo framework.
coordinator
Package coordinator provides a distributed training coordinator.
pb
Package graph provides a computational graph abstraction.
Package health provides HTTP health check endpoints for Kubernetes-style liveness and readiness probes.
Package inference provides a high-level API for loading models and generating text with minimal boilerplate.
internal
clblast
Package clblast provides Go wrappers for the CLBlast BLAS library.
codegen
Package codegen generates CUDA megakernel source code from a compiled ExecutionPlan instruction tape.
cublas
Package cublas provides low-level purego bindings for the cuBLAS library.
cuda
Package cuda provides low-level bindings for the CUDA runtime API using dlopen/dlsym (no CGo).
cuda/kernels
Package kernels provides Go wrappers for custom CUDA kernels.
cudnn
Package cudnn provides purego bindings for the NVIDIA cuDNN library.
gpuapi
Package gpuapi defines internal interfaces for GPU runtime operations.
hip
Package hip provides low-level bindings for the AMD HIP runtime API using purego dlopen.
hip/kernels
Package kernels provides Go wrappers for custom HIP kernels via purego dlopen.
miopen
Package miopen provides low-level bindings for the AMD MIOpen library using purego dlopen.
nccl
Package nccl provides CGo bindings for the NVIDIA Collective Communications Library (NCCL).
opencl
Package opencl provides Go wrappers for the OpenCL 2.0 runtime API.
opencl/kernels
Package kernels provides OpenCL kernel source and dispatch for elementwise operations.
rocblas
Package rocblas provides low-level bindings for the AMD rocBLAS library using purego dlopen.
tensorrt
Package tensorrt provides bindings for the NVIDIA TensorRT inference library via purego (dlopen/dlsym, no CGo).
workerpool
Package workerpool provides a persistent pool of goroutines that process submitted tasks.
layers
activations
Package activations provides activation function layers.
attention
Package attention provides attention mechanisms for neural networks.
components
Package components provides reusable components for neural network layers.
core
Package core provides core neural network layer implementations.
embeddings
Package embeddings provides neural network embedding layers for the Zerfoo ML framework.
gather
Package gather provides the Gather layer for the Zerfoo ML framework.
hrm
Package hrm implements the Hierarchical Reasoning Model.
normalization
Package normalization provides various normalization layers for neural networks.
reducesum
Package reducesum provides the ReduceSum layer for the Zerfoo ML framework.
registry
Package registry provides a central registration point for all layer builders.
regularization
Package regularization provides regularization layers for neural networks.
sequence
Package sequence provides sequence modeling layers such as State Space Models.
transformer
Package transformer provides transformer building blocks such as the Transformer `Block` used in encoder/decoder stacks.
transpose
Package transpose provides the Transpose layer for the Zerfoo ML framework.
Package log provides a structured, leveled logging abstraction.
runtime
Package runtime provides a backend-agnostic metrics collection abstraction for runtime observability.
Package model provides adapter implementations for bridging existing and new model interfaces.
gguf
Package gguf implements a pure-Go parser for the GGUF v3 model format used by llama.cpp.
hrm
Package hrm provides experimental Hierarchical Reasoning Model types.
Package numeric provides precision types, arithmetic operations, and generic constraints for the Zerfoo ML framework.
pkg
tokenizer
Package tokenizer provides text tokenization for ML model inference.
Package serve provides an OpenAI-compatible HTTP API server for model inference.
Package shutdown provides orderly shutdown coordination using context cancellation and cleanup callbacks.
Package tensor provides a multi-dimensional array (tensor) implementation.
testing
testutils
Package testutils provides testing utilities and mock implementations for the Zerfoo ML framework.
tests
Package training provides adapter implementations for bridging existing and new interfaces.
loss
Package loss provides various loss functions for neural networks.
optimizer
Package optimizer provides various optimization algorithms for neural networks.
Package types contains shared, fundamental types for the Zerfoo framework.