These examples demonstrate Zerfoo's core value: embeddable ML inference in pure Go. Each example is a standalone program you can build and run with go build.
Load a GGUF model and generate text from a prompt. Demonstrates the core inference.LoadFile and model.Generate API with sampling options (temperature, top-K, top-P) and token streaming.
Embed inference inside a custom Go HTTP handler. Demonstrates the pattern of loading a model once at startup and serving many concurrent requests through your own routing and request/response types.
Start an OpenAI-compatible HTTP server backed by a GGUF model. Demonstrates serve.NewServer with graceful shutdown. Drop-in replacement for any OpenAI client.
Grammar-guided decoding that constrains model output to valid JSON matching a predefined schema. Useful for structured data extraction and tool-calling pipelines.
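Grammar-guided decoding enforces the schema token by token during generation; that machinery is in the example itself. As a rough illustration of the *contract* it guarantees, the sketch below checks a finished output against a simple field-name/type schema (this validation step is illustrative, not Zerfoo's grammar engine):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// matchesSchema reports whether raw is valid JSON containing every
// required field with the expected JSON type. Grammar-guided decoding
// makes this hold by construction, so no retry loop is needed.
func matchesSchema(raw string, required map[string]string) bool {
	var obj map[string]any
	if err := json.Unmarshal([]byte(raw), &obj); err != nil {
		return false // not valid JSON at all
	}
	for field, kind := range required {
		v, ok := obj[field]
		if !ok {
			return false // required field missing
		}
		switch kind {
		case "string":
			if _, ok := v.(string); !ok {
				return false
			}
		case "number":
			// encoding/json decodes all JSON numbers to float64.
			if _, ok := v.(float64); !ok {
				return false
			}
		}
	}
	return true
}

func main() {
	schema := map[string]string{"name": "string", "age": "number"}
	fmt.Println(matchesSchema(`{"name":"Ada","age":36}`, schema)) // true
	fmt.Println(matchesSchema(`{"name":"Ada"}`, schema))          // false
}
```

Without constrained decoding you would loop on validate-and-retry; with it, the model simply cannot emit a token that leads outside the grammar.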
Retrieval-augmented generation pattern: embed a document corpus, retrieve the most relevant documents via cosine similarity, and generate answers grounded in those facts using model.Embed and model.Chat.
Each example expects a GGUF model file; pass its path as the first argument.
Running an Example
```sh
# Build and run the inference example
go build -o inference ./examples/inference/
./inference path/to/model.gguf "What is the capital of France?"

# With GPU acceleration (CUDA is selected automatically when available;
# --device cuda forces it)
./inference --device cuda path/to/model.gguf "What is the capital of France?"
```
Further Reading
See docs/getting-started.md for a full tutorial covering CLI usage, library API, and the OpenAI-compatible server.