Published: Mar 26, 2026 License: Apache-2.0 Imports: 8 Imported by: 0

Embedding Example

Embed Zerfoo inference directly inside a Go HTTP handler. This pattern is useful when you want to add ML inference to an existing Go service without running a separate server process.

Prerequisites

  • Go 1.25+
  • A GGUF model file (e.g., Gemma 3 1B or Llama 3.2 1B)

Downloading a test model

pip install huggingface-hub

huggingface-cli download google/gemma-3-1b-it-qat-q4_0-gguf \
  --local-dir ./models

Build

go build -o embedding ./examples/embedding/

Run

./embedding ./models/gemma-3-1b-it-qat-q4_0.gguf

With a custom port and GPU:

./embedding -port 9090 -device cuda ./models/gemma-3-1b-it-qat-q4_0.gguf

Testing with curl

Generate text

curl -s http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain what a tensor is in one sentence.",
    "max_tokens": 128,
    "temperature": 0.7
  }' | jq .

Health check

curl http://localhost:8080/health

How it works

The model is loaded once at startup. Each incoming HTTP request calls model.Generate() with the provided prompt and options. This is the simplest way to add inference to an existing Go application: import github.com/zerfoo/zerfoo/inference, call LoadFile once at startup, then call Generate per request.

Documentation

Overview

Command embedding demonstrates embedding Zerfoo inference inside a Go HTTP handler.

Usage:

go build -o embedding ./examples/embedding/
./embedding path/to/model.gguf
