# Embedding Example

Embed Zerfoo inference directly inside a Go HTTP handler. This pattern is useful when you want to add ML inference to an existing Go service without running a separate server process.
## Prerequisites

- Go 1.25+
- A GGUF model file (e.g., Gemma 3 1B or Llama 3.2 1B)
## Downloading a test model

```sh
pip install huggingface-hub
huggingface-cli download google/gemma-3-1b-it-qat-q4_0-gguf \
  --local-dir ./models
```
## Build

```sh
go build -o embedding ./examples/embedding/
```
## Run

```sh
./embedding ./models/gemma-3-1b-it-qat-q4_0.gguf
```

With a custom port and GPU:

```sh
./embedding -port 9090 -device cuda ./models/gemma-3-1b-it-qat-q4_0.gguf
```
## Testing with curl

### Generate text

```sh
curl -s http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain what a tensor is in one sentence.",
    "max_tokens": 128,
    "temperature": 0.7
  }' | jq .
```
### Health check

```sh
curl http://localhost:8080/health
```
## How it works

The model is loaded once at startup. Each incoming HTTP request calls `model.Generate()` with the provided prompt and options. This is the simplest way to add inference to any Go application: import `github.com/zerfoo/zerfoo/inference`, then call `LoadFile` once and `Generate` per request.