zerfoo

package module
v1.3.0
Published: Mar 16, 2026 License: Apache-2.0 Imports: 9 Imported by: 0

README

Zerfoo

234.30 tok/s on Gemma 3 1B Q4_K_M -- 18.8% faster than Ollama.

A production-grade ML inference framework written entirely in Go. Pure Go with zero CGo -- GPU acceleration (CUDA, ROCm, OpenCL) is loaded dynamically at runtime via purego/dlopen. Import it as a library and run inference directly from your Go application, or use the CLI and OpenAI-compatible API server.

Install

go install github.com/zerfoo/zerfoo/cmd/zerfoo@latest

Quickstart: CLI

Pull a model and run inference:

zerfoo pull gemma-3-1b-q4
zerfoo run gemma-3-1b-q4 "The quick brown fox"

Or start the API server:

zerfoo serve gemma-3-1b-q4 --port 8080

Quickstart: Library

Import github.com/zerfoo/zerfoo/inference to load models and generate text from your own Go code:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/zerfoo/zerfoo/inference"
)

func main() {
	// Load a GGUF model by alias or HuggingFace repo ID.
	// Pulls from HuggingFace automatically if not cached.
	mdl, err := inference.Load("gemma-3-1b-q4")
	if err != nil {
		log.Fatal(err)
	}
	defer mdl.Close()

	// Generate text from a prompt.
	result, err := mdl.Generate(context.Background(), "Explain quicksort in one paragraph.",
		inference.WithTemperature(0.7),
		inference.WithMaxTokens(256),
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(result)
}

For chat-style multi-turn conversations:

resp, err := mdl.Chat(context.Background(), []inference.Message{
	{Role: "system", Content: "You are a helpful assistant."},
	{Role: "user", Content: "What is the capital of France?"},
},
	inference.WithTemperature(0.7),
	inference.WithMaxTokens(256),
)
if err != nil {
	log.Fatal(err)
}
fmt.Println(resp.Content)

To stream tokens as they are generated:

err = mdl.GenerateStream(context.Background(), "Tell me a joke.",
	inference.TokenStreamFunc(func(token string, done bool) error {
		if !done {
			fmt.Print(token)
		}
		return nil
	}),
	inference.WithMaxTokens(128),
)
if err != nil {
	log.Fatal(err)
}

To load a local GGUF file directly (skip the registry):

mdl, err := inference.LoadFile("/path/to/model.gguf",
	inference.WithDevice("cuda"),
	inference.WithDType("fp16"),
	inference.WithKVDtype("fp16"),
)

Supported Models

| Model | Format | Status |
| --- | --- | --- |
| Gemma 3 | GGUF Q4_K | Production (CUDA graph, highest throughput) |
| Llama 3 | GGUF | Working |
| Qwen 2.5 | GGUF | Working |
| Mistral 7B | GGUF | Working |
| Phi-3/4 | GGUF | Working |
| SigLIP | GGUF | Vision encoder (parity tested) |
| Kimi-VL | GGUF | Vision-language (parity tested) |

Docs

  • Getting Started -- full walkthrough: install, pull a model, run inference via CLI and library
  • GPU Setup -- configure CUDA, ROCm, or OpenCL for hardware-accelerated inference
  • Benchmarks -- throughput numbers across models and hardware
  • Design -- architecture overview and key design decisions
  • Blog -- development updates and deep dives
  • CONTRIBUTING.md -- how to contribute

License

Apache 2.0

Documentation

Overview

Package zerfoo provides the core building blocks for creating and training neural networks. It offers a prelude of commonly used types to simplify development and enhance readability of model construction code.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewAdamW

func NewAdamW[T tensor.Numeric](learningRate, beta1, beta2, epsilon, weightDecay T) *optimizer.AdamW[T]

NewAdamW creates a new AdamW optimizer.

func NewCPUEngine

func NewCPUEngine[T tensor.Numeric]() compute.Engine[T]

NewCPUEngine creates a new CPU engine for the given numeric type.

func NewDefaultTrainer

func NewDefaultTrainer[T tensor.Numeric](
	g *graph.Graph[T],
	lossNode graph.Node[T],
	opt optimizer.Optimizer[T],
	strategy training.GradientStrategy[T],
) *training.DefaultTrainer[T]

NewDefaultTrainer creates a new default trainer.

func NewFloat32Ops

func NewFloat32Ops() numeric.Arithmetic[float32]

NewFloat32Ops returns the float32 arithmetic operations.

func NewGraph

func NewGraph[T tensor.Numeric](engine compute.Engine[T]) *graph.Builder[T]

NewGraph creates a new computation graph.

func NewMSE

func NewMSE[T tensor.Numeric](engine compute.Engine[T]) *loss.MSE[T]

NewMSE creates a new Mean Squared Error loss function.

func NewRMSNorm

func NewRMSNorm[T tensor.Numeric](name string, engine compute.Engine[T], ops numeric.Arithmetic[T], modelDim int, options ...normalization.RMSNormOption[T]) (*normalization.RMSNorm[T], error)

NewRMSNorm is a factory function for creating RMSNorm layers.

func NewTensor

func NewTensor[T tensor.Numeric](shape []int, data []T) (*tensor.TensorNumeric[T], error)

NewTensor creates a new tensor with the given shape and data.

func RegisterLayer

func RegisterLayer[T tensor.Numeric](opType string, builder model.LayerBuilder[T])

RegisterLayer registers a new layer builder.

func UnregisterLayer

func UnregisterLayer(opType string)

UnregisterLayer unregisters a layer builder.

Types

type Batch

type Batch[T tensor.Numeric] struct {
	Inputs  map[graph.Node[T]]*tensor.TensorNumeric[T]
	Targets *tensor.TensorNumeric[T]
}

Batch represents a training batch.

type Engine

type Engine[T tensor.Numeric] interface {
	compute.Engine[T]
}

Engine represents a computation engine (e.g., CPU).

type Graph

type Graph[T tensor.Numeric] struct {
	*graph.Graph[T]
}

Graph represents a computation graph.

type LayerBuilder

type LayerBuilder[T tensor.Numeric] func(
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	name string,
	params map[string]*graph.Parameter[T],
	attributes map[string]interface{},
) (graph.Node[T], error)

LayerBuilder is a function that builds a layer.

type Node

type Node[T tensor.Numeric] interface {
	graph.Node[T]
}

Node represents a node in the computation graph.

type Numeric

type Numeric tensor.Numeric

Numeric represents a numeric type constraint.

type Parameter

type Parameter[T tensor.Numeric] struct {
	*graph.Parameter[T]
}

Parameter represents a trainable parameter in the model.

type Tensor

type Tensor[T tensor.Numeric] struct {
	*tensor.TensorNumeric[T]
}

Tensor represents a multi-dimensional array.

Directories

Path Synopsis
cmd
bench-compare command
Command bench-compare compares two NDJSON benchmark result files and outputs a markdown regression report.
bench_tps command
bench_tps measures tokens-per-second for a local ZMF model.
cli
Package cli provides a generic command-line interface framework for Zerfoo.
coverage-gate command
Command coverage-gate reads a Go coverage profile and fails if any testable package drops below the configured coverage threshold.
debug-infer command
zerfoo command
zerfoo-predict command
zerfoo-tokenize command
config
Package config provides file-based configuration loading with validation and environment variable overrides.
distributed
Package distributed provides distributed training strategies and coordination mechanisms for multi-node machine learning workloads in the Zerfoo framework.
coordinator
Package coordinator provides a distributed training coordinator.
pb
examples
api-server command
Command api-server demonstrates starting an OpenAI-compatible inference server.
embedding command
Command embedding demonstrates embedding Zerfoo inference inside a Go HTTP handler.
inference command
Command inference demonstrates loading a GGUF model and generating text.
health
Package health provides HTTP health check endpoints for Kubernetes-style liveness and readiness probes.
inference
Package inference provides a high-level API for loading models and generating text with minimal boilerplate.
internal
clblast
Package clblast provides Go wrappers for the CLBlast BLAS library.
codegen
Package codegen generates CUDA megakernel source code from a compiled ExecutionPlan instruction tape.
cublas
Package cublas provides low-level purego bindings for the cuBLAS library.
cuda
Package cuda provides low-level bindings for the CUDA runtime API using dlopen/dlsym (no CGo).
cuda/kernels
Package kernels provides Go wrappers for custom CUDA kernels.
cudnn
Package cudnn provides purego bindings for the NVIDIA cuDNN library.
gpuapi
Package gpuapi defines internal interfaces for GPU runtime operations.
hip
Package hip provides low-level bindings for the AMD HIP runtime API using purego dlopen.
hip/kernels
Package kernels provides Go wrappers for custom HIP kernels via purego dlopen.
miopen
Package miopen provides low-level bindings for the AMD MIOpen library using purego dlopen.
nccl
Package nccl provides CGo bindings for the NVIDIA Collective Communications Library (NCCL).
opencl
Package opencl provides Go wrappers for the OpenCL 2.0 runtime API.
opencl/kernels
Package kernels provides OpenCL kernel source and dispatch for elementwise operations.
rocblas
Package rocblas provides low-level bindings for the AMD rocBLAS library using purego dlopen.
tensorrt
Package tensorrt provides bindings for the NVIDIA TensorRT inference library via purego (dlopen/dlsym, no CGo).
workerpool
Package workerpool provides a persistent pool of goroutines that process submitted tasks.
layers
activations
Package activations provides activation function layers.
attention
Package attention provides attention mechanisms for neural networks.
components
Package components provides reusable components for neural network layers.
core
Package core provides core neural network layer implementations.
embeddings
Package embeddings provides neural network embedding layers for the Zerfoo ML framework.
gather
Package gather provides the Gather layer for the Zerfoo ML framework.
hrm
Package hrm implements the Hierarchical Reasoning Model.
normalization
Package normalization provides various normalization layers for neural networks.
reducesum
Package reducesum provides the ReduceSum layer for the Zerfoo ML framework.
registry
Package registry provides a central registration point for all layer builders.
regularization
Package regularization provides regularization layers for neural networks.
sequence
Package sequence provides sequence modeling layers such as State Space Models.
transformer
Package transformer provides transformer building blocks such as the Transformer `Block` used in encoder/decoder stacks.
transpose
Package transpose provides the Transpose layer for the Zerfoo ML framework.
model
Package model provides adapter implementations for bridging existing and new model interfaces.
gguf
Package gguf implements a pure-Go parser for the GGUF v3 model format used by llama.cpp.
hrm
Package hrm provides experimental Hierarchical Reasoning Model types.
serve
Package serve provides an OpenAI-compatible HTTP API server for model inference.
shutdown
Package shutdown provides orderly shutdown coordination using context cancellation and cleanup callbacks.
tests
training
Package training provides adapter implementations for bridging existing and new interfaces.
loss
Package loss provides various loss functions for neural networks.
optimizer
Package optimizer provides various optimization algorithms for neural networks.
