zerfoo

package module
v1.3.0
Published: Mar 16, 2026 License: Apache-2.0 Imports: 9 Imported by: 0

README

Zerfoo

234.30 tok/s on Gemma 3 1B Q4_K_M -- 18.8% faster than Ollama.

A production-grade ML inference framework written entirely in Go. Pure Go with zero CGo -- GPU acceleration (CUDA, ROCm, OpenCL) is loaded dynamically at runtime via purego/dlopen. Import it as a library and run inference directly from your Go application, or use the CLI and OpenAI-compatible API server.

Install

go install github.com/zerfoo/zerfoo/cmd/zerfoo@latest

Quickstart: CLI

Pull a model and run inference:

zerfoo pull gemma-3-1b-q4
zerfoo run gemma-3-1b-q4 "The quick brown fox"

Or start the API server:

zerfoo serve gemma-3-1b-q4 --port 8080

Quickstart: Library

Import github.com/zerfoo/zerfoo/inference to load models and generate text from your own Go code:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/zerfoo/zerfoo/inference"
)

func main() {
	// Load a GGUF model by alias or HuggingFace repo ID.
	// Pulls from HuggingFace automatically if not cached.
	mdl, err := inference.Load("gemma-3-1b-q4")
	if err != nil {
		log.Fatal(err)
	}
	defer mdl.Close()

	// Generate text from a prompt.
	result, err := mdl.Generate(context.Background(), "Explain quicksort in one paragraph.",
		inference.WithTemperature(0.7),
		inference.WithMaxTokens(256),
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(result)
}

For chat-style multi-turn conversations:

resp, err := mdl.Chat(context.Background(), []inference.Message{
	{Role: "system", Content: "You are a helpful assistant."},
	{Role: "user", Content: "What is the capital of France?"},
},
	inference.WithTemperature(0.7),
	inference.WithMaxTokens(256),
)
if err != nil {
	log.Fatal(err)
}
fmt.Println(resp.Content)

To stream tokens as they are generated:

err = mdl.GenerateStream(context.Background(), "Tell me a joke.",
	inference.TokenStreamFunc(func(token string, done bool) error {
		if !done {
			fmt.Print(token)
		}
		return nil
	}),
	inference.WithMaxTokens(128),
)
if err != nil {
	log.Fatal(err)
}

To load a local GGUF file directly (skip the registry):

mdl, err := inference.LoadFile("/path/to/model.gguf",
	inference.WithDevice("cuda"),
	inference.WithDType("fp16"),
	inference.WithKVDtype("fp16"),
)

Supported Models

| Model | Format | Status |
| --- | --- | --- |
| Gemma 3 | GGUF Q4_K | Production (CUDA graph, highest throughput) |
| Llama 3 | GGUF | Working |
| Qwen 2.5 | GGUF | Working |
| Mistral 7B | GGUF | Working |
| Phi-3/4 | GGUF | Working |
| SigLIP | GGUF | Vision encoder (parity tested) |
| Kimi-VL | GGUF | Vision-language (parity tested) |

Docs

  • Getting Started -- full walkthrough: install, pull a model, run inference via CLI and library
  • GPU Setup -- configure CUDA, ROCm, or OpenCL for hardware-accelerated inference
  • Benchmarks -- throughput numbers across models and hardware
  • Design -- architecture overview and key design decisions
  • Blog -- development updates and deep dives
  • CONTRIBUTING.md -- how to contribute

License

Apache 2.0

Documentation

Overview

Package zerfoo provides the core building blocks for creating and training neural networks. It offers a prelude of commonly used types to simplify development and enhance readability of model construction code.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewAdamW

func NewAdamW[T tensor.Numeric](learningRate, beta1, beta2, epsilon, weightDecay T) *optimizer.AdamW[T]

NewAdamW creates a new AdamW optimizer.

func NewCPUEngine

func NewCPUEngine[T tensor.Numeric]() compute.Engine[T]

NewCPUEngine creates a new CPU engine for the given numeric type.

func NewDefaultTrainer

func NewDefaultTrainer[T tensor.Numeric](
	g *graph.Graph[T],
	lossNode graph.Node[T],
	opt optimizer.Optimizer[T],
	strategy training.GradientStrategy[T],
) *training.DefaultTrainer[T]

NewDefaultTrainer creates a new default trainer.

func NewFloat32Ops

func NewFloat32Ops() numeric.Arithmetic[float32]

NewFloat32Ops returns the float32 arithmetic operations.

func NewGraph

func NewGraph[T tensor.Numeric](engine compute.Engine[T]) *graph.Builder[T]

NewGraph creates a new computation graph.

func NewMSE

func NewMSE[T tensor.Numeric](engine compute.Engine[T]) *loss.MSE[T]

NewMSE creates a new Mean Squared Error loss function.

func NewRMSNorm

func NewRMSNorm[T tensor.Numeric](name string, engine compute.Engine[T], ops numeric.Arithmetic[T], modelDim int, options ...normalization.RMSNormOption[T]) (*normalization.RMSNorm[T], error)

NewRMSNorm is a factory function for creating RMSNorm layers.

func NewTensor

func NewTensor[T tensor.Numeric](shape []int, data []T) (*tensor.TensorNumeric[T], error)

NewTensor creates a new tensor with the given shape and data.

func RegisterLayer

func RegisterLayer[T tensor.Numeric](opType string, builder model.LayerBuilder[T])

RegisterLayer registers a new layer builder.

func UnregisterLayer

func UnregisterLayer(opType string)

UnregisterLayer unregisters a layer builder.

Types

type Batch

type Batch[T tensor.Numeric] struct {
	Inputs  map[graph.Node[T]]*tensor.TensorNumeric[T]
	Targets *tensor.TensorNumeric[T]
}

Batch represents a training batch.

type Engine

type Engine[T tensor.Numeric] interface {
	compute.Engine[T]
}

Engine represents a computation engine (e.g., CPU).

type Graph

type Graph[T tensor.Numeric] struct {
	*graph.Graph[T]
}

Graph represents a computation graph.

type LayerBuilder

type LayerBuilder[T tensor.Numeric] func(
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	name string,
	params map[string]*graph.Parameter[T],
	attributes map[string]interface{},
) (graph.Node[T], error)

LayerBuilder is a function that builds a layer.

type Node

type Node[T tensor.Numeric] interface {
	graph.Node[T]
}

Node represents a node in the computation graph.

type Numeric

type Numeric tensor.Numeric

Numeric represents a numeric type constraint.

type Parameter

type Parameter[T tensor.Numeric] struct {
	*graph.Parameter[T]
}

Parameter represents a trainable parameter in the model.

type Tensor

type Tensor[T tensor.Numeric] struct {
	*tensor.TensorNumeric[T]
}

Tensor represents a multi-dimensional array.

Directories

Path Synopsis
cmd
bench-compare command
Command bench-compare compares two NDJSON benchmark result files and outputs a markdown regression report.
bench_tps command
bench_tps measures tokens-per-second for a local ZMF model.
cli
Package cli provides a generic command-line interface framework for Zerfoo.
coverage-gate command
Command coverage-gate reads a Go coverage profile and fails if any testable package drops below the configured coverage threshold.
debug-infer command
zerfoo command
zerfoo-predict command
zerfoo-tokenize command
config
Package config provides file-based configuration loading with validation and environment variable overrides.
distributed
Package distributed provides distributed training strategies and coordination mechanisms for multi-node machine learning workloads in the Zerfoo framework.
coordinator
Package coordinator provides a distributed training coordinator.
pb
examples
api-server command
Command api-server demonstrates starting an OpenAI-compatible inference server.
embedding command
Command embedding demonstrates embedding Zerfoo inference inside a Go HTTP handler.
inference command
Command inference demonstrates loading a GGUF model and generating text.
health
Package health provides HTTP health check endpoints for Kubernetes-style liveness and readiness probes.
inference
Package inference provides a high-level API for loading models and generating text with minimal boilerplate.
internal
clblast
Package clblast provides Go wrappers for the CLBlast BLAS library.
codegen
Package codegen generates CUDA megakernel source code from a compiled ExecutionPlan instruction tape.
cublas
Package cublas provides low-level purego bindings for the cuBLAS library.
cuda
Package cuda provides low-level bindings for the CUDA runtime API using dlopen/dlsym (no CGo).
cuda/kernels
Package kernels provides Go wrappers for custom CUDA kernels.
cudnn
Package cudnn provides purego bindings for the NVIDIA cuDNN library.
gpuapi
Package gpuapi defines internal interfaces for GPU runtime operations.
hip
Package hip provides low-level bindings for the AMD HIP runtime API using purego dlopen.
hip/kernels
Package kernels provides Go wrappers for custom HIP kernels via purego dlopen.
miopen
Package miopen provides low-level bindings for the AMD MIOpen library using purego dlopen.
nccl
Package nccl provides CGo bindings for the NVIDIA Collective Communications Library (NCCL).
opencl
Package opencl provides Go wrappers for the OpenCL 2.0 runtime API.
opencl/kernels
Package kernels provides OpenCL kernel source and dispatch for elementwise operations.
rocblas
Package rocblas provides low-level bindings for the AMD rocBLAS library using purego dlopen.
tensorrt
Package tensorrt provides bindings for the NVIDIA TensorRT inference library via purego (dlopen/dlsym, no CGo).
workerpool
Package workerpool provides a persistent pool of goroutines that process submitted tasks.
layers
activations
Package activations provides activation function layers.
attention
Package attention provides attention mechanisms for neural networks.
components
Package components provides reusable components for neural network layers.
core
Package core provides core neural network layer implementations.
embeddings
Package embeddings provides neural network embedding layers for the Zerfoo ML framework.
gather
Package gather provides the Gather layer for the Zerfoo ML framework.
hrm
Package hrm implements the Hierarchical Reasoning Model.
normalization
Package normalization provides various normalization layers for neural networks.
reducesum
Package reducesum provides the ReduceSum layer for the Zerfoo ML framework.
registry
Package registry provides a central registration point for all layer builders.
regularization
Package regularization provides regularization layers for neural networks.
sequence
Package sequence provides sequence modeling layers such as State Space Models.
transformer
Package transformer provides transformer building blocks such as the Transformer `Block` used in encoder/decoder stacks.
transpose
Package transpose provides the Transpose layer for the Zerfoo ML framework.
model
Package model provides adapter implementations for bridging existing and new model interfaces.
gguf
Package gguf implements a pure-Go parser for the GGUF v3 model format used by llama.cpp.
hrm
Package hrm provides experimental Hierarchical Reasoning Model types.
serve
Package serve provides an OpenAI-compatible HTTP API server for model inference.
shutdown
Package shutdown provides orderly shutdown coordination using context cancellation and cleanup callbacks.
tests
training
Package training provides adapter implementations for bridging existing and new interfaces.
loss
Package loss provides various loss functions for neural networks.
optimizer
Package optimizer provides various optimization algorithms for neural networks.
