examples/

directory
v1.3.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 16, 2026 License: Apache-2.0

README

Zerfoo Examples

These examples demonstrate Zerfoo's core value: embeddable ML inference in pure Go. Each example is a standalone program you can build and run with go build.

Prerequisites

  • Go 1.25 or later -- Download Go
  • A GGUF model file -- download one from HuggingFace. For a quick start, pull Gemma 3 1B Q4:
zerfoo pull google/gemma-3-1b-it-qat-q4_0-gguf

Or download directly:

# The model file will be cached in ~/.cache/zerfoo/
zerfoo pull gemma-3-1b-q4
  • CUDA toolkit (optional) -- only needed for GPU acceleration. All examples work on CPU out of the box.

Available Examples

Example Description
inference/ Load a GGUF model and generate text from a prompt. Demonstrates the core inference.LoadFile and model.Generate API with sampling options (temperature, top-K, top-P) and token streaming.
api-server/ Start an OpenAI-compatible HTTP server backed by a GGUF model. Demonstrates serve.NewServer with graceful shutdown. Drop-in replacement for any OpenAI client.
embedding/ Embed inference inside a custom Go HTTP handler. Demonstrates the pattern of loading a model once at startup and serving many concurrent requests through your own routing and request/response types.

Running an Example

# Build and run the inference example
go build -o inference ./examples/inference/
./inference path/to/model.gguf "What is the capital of France?"

# With GPU acceleration (automatic if CUDA is available)
./inference --device cuda path/to/model.gguf "What is the capital of France?"

Further Reading

See docs/getting-started.md for a full tutorial covering CLI usage, library API, and the OpenAI-compatible server.

Directories

Path Synopsis
Command api-server demonstrates starting an OpenAI-compatible inference server.
Command api-server demonstrates starting an OpenAI-compatible inference server.
Command embedding demonstrates embedding Zerfoo inference inside a Go HTTP handler.
Command embedding demonstrates embedding Zerfoo inference inside a Go HTTP handler.
Command inference demonstrates loading a GGUF model and generating text.
Command inference demonstrates loading a GGUF model and generating text.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL