text-embedding

command
v1.31.0
Published: Mar 28, 2026 License: Apache-2.0

README

Text Embedding Generation

Demonstrates extracting text embedding vectors from a GGUF model using the inference package and computing semantic similarity between texts.

Prerequisites

  • Go 1.25+
  • A GGUF model file (any supported architecture: Llama, Gemma, Mistral, Qwen, Phi)

Downloading a test model

pip install huggingface-hub

huggingface-cli download google/gemma-3-1b-it-qat-q4_0-gguf \
  --local-dir ./models

Build

go build -o text-embedding ./examples/text-embedding/

Run

./text-embedding ./models/gemma-3-1b-it-qat-q4_0.gguf

With GPU acceleration:

./text-embedding -device cuda ./models/gemma-3-1b-it-qat-q4_0.gguf

Expected output

=== Text Embedding Example ===

Loading model: ./models/gemma-3-1b-it-qat-q4_0.gguf
Device: cpu

Model: gemma (18 layers, hidden=1536, vocab=262144)

--- Generating Embeddings ---
Text 0: "The cat sat on the mat." -> 1536-dim embedding (first 5: [...])
Text 1: "A kitten rested on the rug." -> 1536-dim embedding (first 5: [...])
Text 2: "Stock prices rose sharply today." -> 1536-dim embedding (first 5: [...])
Text 3: "The financial markets surged in early trading." -> 1536-dim embedding (first 5: [...])

--- Cosine Similarity Matrix ---
                           Text 0  Text 1  Text 2  Text 3
  Text 0 (The cat sat on th...)  1.0000  0.8xxx  0.3xxx  0.2xxx
  Text 1 (A kitten rested o...)  0.8xxx  1.0000  0.3xxx  0.2xxx
  Text 2 (Stock prices rose...)  0.3xxx  0.3xxx  1.0000  0.8xxx
  Text 3 (The financial mar...)  0.2xxx  0.2xxx  0.8xxx  1.0000

--- Interpretation ---
Texts 0-1 (cats) should have high similarity.
Texts 2-3 (finance) should have high similarity.
Cross-topic pairs should have lower similarity.

=== Done ===

How it works

  1. Load -- inference.LoadFile loads the GGUF model file into memory (with optional mmap)
  2. Embed -- model.Embed(text) tokenizes the input, looks up token embeddings from the model's embedding table, mean-pools across tokens, and L2-normalizes the result
  3. Compare -- cosine similarity between L2-normalized vectors is just their dot product

The embedding dimension matches the model's hidden size (e.g., 1536 for Gemma 3 1B).
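The pooling and normalization steps above can be sketched in plain Go. The helper names here (meanPool, l2Normalize, dot) are illustrative stand-ins for what model.Embed does internally, not part of the package's API:

```go
package main

import (
	"fmt"
	"math"
)

// meanPool averages per-token embedding vectors into a single vector.
func meanPool(tokenVecs [][]float32) []float32 {
	out := make([]float32, len(tokenVecs[0]))
	for _, v := range tokenVecs {
		for i, x := range v {
			out[i] += x
		}
	}
	for i := range out {
		out[i] /= float32(len(tokenVecs))
	}
	return out
}

// l2Normalize scales v to unit length.
func l2Normalize(v []float32) []float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	n := float32(math.Sqrt(sum))
	out := make([]float32, len(v))
	for i, x := range v {
		out[i] = x / n
	}
	return out
}

// dot of two unit-length vectors is their cosine similarity.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func main() {
	// Two toy "token embedding" sequences standing in for model output.
	a := l2Normalize(meanPool([][]float32{{1, 0}, {1, 2}}))
	b := l2Normalize(meanPool([][]float32{{2, 2}, {0, 2}}))
	fmt.Printf("cos(a,b) = %.4f\n", dot(a, b)) // prints 0.9487
}
```

Because both vectors are already unit length, no division by magnitudes is needed at comparison time; this is why the example L2-normalizes once at embedding time.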

Key APIs

Function/Type             Package     Purpose
inference.LoadFile        inference/  Load a GGUF model with device/dtype options
model.Embed(text)         inference/  Generate L2-normalized embedding vector
model.EmbeddingWeights()  inference/  Access raw embedding table and hidden size
model.Config()            inference/  Model metadata (architecture, layers, vocab)

Use cases

  • Semantic search: embed queries and documents, find nearest neighbors by cosine similarity
  • Clustering: group similar texts by embedding distance
  • RAG: embed chunks of a knowledge base for retrieval-augmented generation
  • Deduplication: detect near-duplicate texts by high cosine similarity
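For the semantic-search case, once query and document embeddings are all L2-normalized, nearest-neighbor retrieval reduces to sorting by dot product. A minimal sketch (the hit type and rank helper are hypothetical, not part of the package):

```go
package main

import (
	"fmt"
	"sort"
)

// hit pairs a document index with its cosine similarity to the query.
type hit struct {
	idx int
	sim float32
}

// rank returns documents ordered by descending cosine similarity.
// Query and docs are assumed L2-normalized (as model.Embed would return),
// so cosine similarity is just the dot product.
func rank(query []float32, docs [][]float32) []hit {
	hits := make([]hit, len(docs))
	for i, d := range docs {
		var s float32
		for j := range query {
			s += query[j] * d[j]
		}
		hits[i] = hit{idx: i, sim: s}
	}
	sort.Slice(hits, func(a, b int) bool { return hits[a].sim > hits[b].sim })
	return hits
}

func main() {
	// Toy 2-dim unit vectors standing in for real embeddings.
	query := []float32{1, 0}
	docs := [][]float32{{0, 1}, {0.9486, 0.3162}, {0.7071, 0.7071}}
	for _, h := range rank(query, docs) {
		fmt.Printf("doc %d: %.4f\n", h.idx, h.sim)
	}
}
```

A linear scan like this is fine for small corpora; larger collections would typically use an approximate nearest-neighbor index instead.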

Documentation

Overview

Command text-embedding demonstrates extracting text embedding vectors from a loaded GGUF model using the inference package.

This example loads a model, generates embeddings for several texts, and computes cosine similarity between them. Useful for semantic search, clustering, and retrieval-augmented generation (RAG).

Usage:

go build -o text-embedding ./examples/text-embedding/
./text-embedding path/to/model.gguf
