# Text Embedding Generation

Demonstrates extracting text embedding vectors from a GGUF model using the `inference` package and computing semantic similarity between texts.
## Prerequisites
- Go 1.25+
- A GGUF model file (any supported architecture: Llama, Gemma, Mistral, Qwen, Phi)
## Downloading a test model

```shell
pip install huggingface-hub
huggingface-cli download google/gemma-3-1b-it-qat-q4_0-gguf \
  --local-dir ./models
```
## Build

```shell
go build -o text-embedding ./examples/text-embedding/
```
## Run

```shell
./text-embedding ./models/gemma-3-1b-it-qat-q4_0.gguf
```

With GPU acceleration:

```shell
./text-embedding -device cuda ./models/gemma-3-1b-it-qat-q4_0.gguf
```
## Expected output

```
=== Text Embedding Example ===
Loading model: ./models/gemma-3-1b-it-qat-q4_0.gguf
Device: cpu
Model: gemma (18 layers, hidden=1536, vocab=262144)

--- Generating Embeddings ---
Text 0: "The cat sat on the mat." -> 1536-dim embedding (first 5: [...])
Text 1: "A kitten rested on the rug." -> 1536-dim embedding (first 5: [...])
Text 2: "Stock prices rose sharply today." -> 1536-dim embedding (first 5: [...])
Text 3: "The financial markets surged in early trading." -> 1536-dim embedding (first 5: [...])

--- Cosine Similarity Matrix ---
                               Text 0   Text 1   Text 2   Text 3
Text 0 (The cat sat on th...)  1.0000   0.8xxx   0.3xxx   0.2xxx
Text 1 (A kitten rested o...)  0.8xxx   1.0000   0.3xxx   0.2xxx
Text 2 (Stock prices rose...)  0.3xxx   0.3xxx   1.0000   0.8xxx
Text 3 (The financial mar...)  0.2xxx   0.2xxx   0.8xxx   1.0000

--- Interpretation ---
Texts 0-1 (cats) should have high similarity.
Texts 2-3 (finance) should have high similarity.
Cross-topic pairs should have lower similarity.

=== Done ===
```
## How it works

- **Load** -- `inference.LoadFile` loads the GGUF model file into memory (with optional mmap)
- **Embed** -- `model.Embed(text)` tokenizes the input, looks up token embeddings from the model's embedding table, mean-pools across tokens, and L2-normalizes the result
- **Compare** -- cosine similarity between L2-normalized vectors is just their dot product

The embedding dimension matches the model's hidden size (e.g., 1536 for Gemma 3 1B).
## Key APIs

| Function/Type | Package | Purpose |
|---|---|---|
| `inference.LoadFile` | `inference/` | Load a GGUF model with device/dtype options |
| `model.Embed(text)` | `inference/` | Generate L2-normalized embedding vector |
| `model.EmbeddingWeights()` | `inference/` | Access raw embedding table and hidden size |
| `model.Config()` | `inference/` | Model metadata (architecture, layers, vocab) |
## Use cases
- Semantic search: embed queries and documents, find nearest neighbors by cosine similarity
- Clustering: group similar texts by embedding distance
- RAG: embed chunks of a knowledge base for retrieval-augmented generation
- Deduplication: detect near-duplicate texts by high cosine similarity
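As an illustration of the semantic-search use case, here is a minimal nearest-neighbor lookup over unit-length vectors like those `model.Embed` returns. The toy 3-dim vectors below stand in for real embeddings:

```go
package main

import (
	"fmt"
	"sort"
)

// sim is cosine similarity, assuming both vectors are already unit length:
// just their dot product.
func sim(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// nearest returns document indices ordered most-similar-first
// relative to the query embedding.
func nearest(query []float32, docs [][]float32) []int {
	idx := make([]int, len(docs))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(i, j int) bool {
		return sim(query, docs[idx[i]]) > sim(query, docs[idx[j]])
	})
	return idx
}

func main() {
	// Toy unit vectors standing in for embedded documents.
	docs := [][]float32{
		{1, 0, 0},     // doc 0
		{0.6, 0.8, 0}, // doc 1
		{0, 0, 1},     // doc 2
	}
	query := []float32{0.8, 0.6, 0}
	fmt.Println(nearest(query, docs)) // most similar document index first
}
```

For small corpora a brute-force scan like this is fine; larger collections typically move to an approximate nearest-neighbor index.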