embeddings

Version: v0.403.0. This package is not in the latest version of its module.
Published: May 29, 2026 License: MIT Imports: 11 Imported by: 0

Documentation

Overview

Package embeddings provides a unified interface for generating embeddings from multiple providers (OpenAI, Google, Ollama, Anthropic/Voyage).

Constants

const DefaultChunkOverlap = 100

DefaultChunkOverlap is the default overlap, in characters, between consecutive chunks.

const DefaultChunkSize = 800

DefaultChunkSize is the default maximum size for text chunks in characters.

Variables

var ErrDimensionMismatch = fmt.Errorf("embedding dimension mismatch")

ErrDimensionMismatch is returned when embeddings have unexpected dimensions.

var ErrEmptyAPIKey = fmt.Errorf("API key is required but not provided")

ErrEmptyAPIKey is returned when an API key is required but not provided.

var ErrNoTexts = fmt.Errorf("no texts provided for embedding")

ErrNoTexts is returned when an empty text slice is passed to EmbedDocuments.

var ErrUnsupportedProvider = fmt.Errorf("unsupported embedding provider")

ErrUnsupportedProvider is returned when the provider name is not recognized.
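Because these are package-level sentinel errors, callers can test for them with errors.Is, which also matches when a sentinel has been wrapped with %w. The minimal sketch below assumes, without confirmation from this page, that the package either returns the sentinels directly or wraps them with %w; errNoTexts and embedAll are hypothetical local stand-ins, not package API.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoTexts is a hypothetical local stand-in with the same message as ErrNoTexts.
var errNoTexts = errors.New("no texts provided for embedding")

// embedAll is a hypothetical function that wraps the sentinel with %w,
// adding context while keeping it detectable via errors.Is.
func embedAll(texts []string) error {
	if len(texts) == 0 {
		return fmt.Errorf("embedAll: %w", errNoTexts)
	}
	return nil
}

// isNoTexts reports whether err is, or wraps, the sentinel.
func isNoTexts(err error) bool {
	return errors.Is(err, errNoTexts)
}

func main() {
	err := embedAll(nil)
	fmt.Println(isNoTexts(err)) // true
	fmt.Println(err)            // embedAll: no texts provided for embedding
}
```

Comparing with errors.Is rather than == keeps the check working even if a caller or the package adds context around the sentinel.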

var ProviderDefaultModels = map[string]string{
	"openai":    "text-embedding-3-small",
	"google":    "text-embedding-004",
	"gemini":    "text-embedding-004",
	"ollama":    "nomic-embed-text",
	"anthropic": "voyage-3",
	"voyage":    "voyage-3",
}

ProviderDefaultModels maps provider names to their recommended default models.

var ProviderDimensions = map[string]int{
	// OpenAI
	"text-embedding-3-small": 1536,
	"text-embedding-3-large": 3072,
	"text-embedding-ada-002": 1536,

	// Google / Gemini
	"text-embedding-004": 768,

	// Ollama
	"nomic-embed-text":  768,
	"mxbai-embed-large": 1024,
	"all-minilm":        384,

	// Voyage AI
	"voyage-3":       1024,
	"voyage-3-large": 1536,
	"voyage-code-3":  1024,
}

ProviderDimensions maps common models to their embedding dimensions. This is useful for pre-allocating vector storage or validating embedding sizes.

Functions

func AverageEmbeddings

func AverageEmbeddings(embeddings [][]float32) []float32

AverageEmbeddings computes the element-wise mean of multiple embeddings. This is useful for combining the embeddings of a document's chunks into a single document embedding.

It returns nil if embeddings is empty or if the embeddings have inconsistent dimensions.

func ChunkText

func ChunkText(text string, maxSize, overlap int) []string

ChunkText splits text into chunks of at most maxSize characters, preferring sentence boundaries as split points to preserve semantic coherence.

Parameters:

  • text: The text to split into chunks
  • maxSize: Maximum size of each chunk in characters
  • overlap: Number of characters to overlap between consecutive chunks

Returns a slice of text chunks.

func GetDefaultModel

func GetDefaultModel(provider string) string

GetDefaultModel returns the default model for a given provider, or an empty string if the provider is unknown.

func GetModelDimension

func GetModelDimension(model string) int

GetModelDimension returns the expected embedding dimension for a given model. It returns 0 if the model is not in the known list; in that case, the dimension is auto-detected on the first successful embedding call.

func SplitAndEmbed

func SplitAndEmbed(embedder Embedder, text string, maxSize, overlap int) ([][]float32, error)

SplitAndEmbed chunks text and generates an embedding for each chunk. It is a convenience wrapper that combines ChunkText and Embedder.EmbedDocuments.

Parameters:

  • embedder: The embedder to use
  • text: The text to chunk and embed
  • maxSize: Maximum size of each chunk in characters
  • overlap: Number of characters to overlap between consecutive chunks

Returns embeddings for all chunks.

func SplitAndEmbedAverage

func SplitAndEmbedAverage(embedder Embedder, text string, maxSize, overlap int) ([]float32, error)

SplitAndEmbedAverage chunks text, generates embeddings, and averages them. This produces a single embedding representing the entire text.

Returns a single averaged embedding.

Types

type AnthropicEmbedder

type AnthropicEmbedder struct {
	// contains filtered or unexported fields
}

AnthropicEmbedder generates embeddings using Voyage AI's API. Anthropic does not offer its own embedding model and points users to Voyage AI; supported models include voyage-3, voyage-3-large, and voyage-code-3.

func NewAnthropicEmbedder

func NewAnthropicEmbedder(apiKey, model, baseURL string) (*AnthropicEmbedder, error)

NewAnthropicEmbedder creates a new Anthropic/Voyage embedder. If baseURL is empty, it defaults to "https://api.voyageai.com".

func (*AnthropicEmbedder) Dimension

func (e *AnthropicEmbedder) Dimension() int

Dimension returns the dimensionality of embeddings.

func (*AnthropicEmbedder) EmbedDocuments

func (e *AnthropicEmbedder) EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)

EmbedDocuments generates embeddings for multiple documents. Voyage supports up to 128 inputs per request.

func (*AnthropicEmbedder) EmbedQuery

func (e *AnthropicEmbedder) EmbedQuery(ctx context.Context, text string) ([]float32, error)

EmbedQuery generates an embedding for a single query.

type Embedder

type Embedder interface {
	// EmbedDocuments generates embeddings for multiple documents.
	// Implementations should batch requests according to provider limits.
	// Partial failures are allowed: continue processing remaining documents.
	EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)

	// EmbedQuery generates an embedding for a single query text.
	// This is typically the same as EmbedDocuments but may use different
	// parameters in some providers (e.g. search_query vs search_document).
	EmbedQuery(ctx context.Context, text string) ([]float32, error)

	// Dimension returns the dimensionality of the embeddings.
	// This is auto-detected on the first successful embedding call.
	Dimension() int
}

Embedder is the main interface for generating embeddings from text. All provider implementations must be thread-safe.

func NewEmbedder

func NewEmbedder(provider, model, apiKey, baseURL string) (Embedder, error)

NewEmbedder creates an Embedder based on the provider name and configuration. Supported providers: "openai", "google", "gemini", "ollama", "anthropic", "voyage".

Parameters:

  • provider: The embedding provider name (case-insensitive)
  • model: The model name/ID to use
  • apiKey: API key (not needed for Ollama)
  • baseURL: Base URL for the API (optional, uses defaults if empty)

Returns an Embedder instance or an error if the provider is unsupported.

Example

Example demonstrates basic usage of the embeddings package.

package embeddings

import "context"

func Example() {
	// Create an Ollama embedder (no API key required).
	embedder, err := NewEmbedder("ollama", "nomic-embed-text", "", "")
	if err != nil {
		panic(err)
	}

	// Generate embeddings (requires a running Ollama instance).
	ctx := context.Background()
	texts := []string{"Hello world", "This is a test"}
	embeddings, err := embedder.EmbedDocuments(ctx, texts)
	if err != nil {
		// Handle the error (expected in tests when Ollama is not running).
		return
	}

	// Use the embeddings.
	_ = embeddings
}

type GoogleEmbedder

type GoogleEmbedder struct {
	// contains filtered or unexported fields
}

GoogleEmbedder generates embeddings using Google's Gemini embedding API. Supports text-embedding-004 and other Gemini embedding models.

func NewGoogleEmbedder

func NewGoogleEmbedder(apiKey, model, baseURL string) (*GoogleEmbedder, error)

NewGoogleEmbedder creates a new Google/Gemini embedder. If baseURL is empty, it defaults to "https://generativelanguage.googleapis.com".

func (*GoogleEmbedder) Dimension

func (e *GoogleEmbedder) Dimension() int

Dimension returns the dimensionality of embeddings.

func (*GoogleEmbedder) EmbedDocuments

func (e *GoogleEmbedder) EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)

EmbedDocuments generates embeddings for multiple documents. Google's batch endpoint supports multiple texts in one request.
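For reference, the public Gemini API exposes a batch method, POST /v1beta/models/{model}:batchEmbedContents, whose body holds one sub-request per text and whose response lists embeddings in request order. The sketch below only builds and decodes those JSON payloads. Whether this package calls that exact endpoint and API version is an assumption, and the type and function names are hypothetical. The optional taskType field (for example RETRIEVAL_DOCUMENT or RETRIEVAL_QUERY) is one way documents and queries can be embedded with different parameters.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type part struct {
	Text string `json:"text"`
}

type content struct {
	Parts []part `json:"parts"`
}

// embedRequest is one entry in the batch; taskType is omitted when empty.
type embedRequest struct {
	Model    string  `json:"model"`
	Content  content `json:"content"`
	TaskType string  `json:"taskType,omitempty"`
}

type batchRequest struct {
	Requests []embedRequest `json:"requests"`
}

type batchResponse struct {
	Embeddings []struct {
		Values []float32 `json:"values"`
	} `json:"embeddings"`
}

// buildBatchBody encodes one sub-request per text.
func buildBatchBody(model string, texts []string, taskType string) ([]byte, error) {
	req := batchRequest{Requests: make([]embedRequest, len(texts))}
	for i, t := range texts {
		req.Requests[i] = embedRequest{
			Model:    "models/" + model,
			Content:  content{Parts: []part{{Text: t}}},
			TaskType: taskType,
		}
	}
	return json.Marshal(req)
}

// decodeBatch extracts vectors in the same order as the request.
func decodeBatch(body []byte) ([][]float32, error) {
	var resp batchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	out := make([][]float32, len(resp.Embeddings))
	for i, e := range resp.Embeddings {
		out[i] = e.Values
	}
	return out, nil
}

func main() {
	body, err := buildBatchBody("text-embedding-004", []string{"hi"}, "RETRIEVAL_DOCUMENT")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	vecs, err := decodeBatch([]byte(`{"embeddings":[{"values":[0.1,0.2]}]}`))
	fmt.Println(vecs, err)
}
```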

func (*GoogleEmbedder) EmbedQuery

func (e *GoogleEmbedder) EmbedQuery(ctx context.Context, text string) ([]float32, error)

EmbedQuery generates an embedding for a single query.

type OllamaEmbedder

type OllamaEmbedder struct {
	// contains filtered or unexported fields
}

OllamaEmbedder generates embeddings using a local Ollama instance. It supports models such as nomic-embed-text and mxbai-embed-large.

func NewOllamaEmbedder

func NewOllamaEmbedder(model, baseURL string) (*OllamaEmbedder, error)

NewOllamaEmbedder creates a new Ollama embedder. If baseURL is empty, it defaults to "http://localhost:11434".

func (*OllamaEmbedder) Dimension

func (e *OllamaEmbedder) Dimension() int

Dimension returns the dimensionality of embeddings.

func (*OllamaEmbedder) EmbedDocuments

func (e *OllamaEmbedder) EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)

EmbedDocuments generates embeddings for multiple documents by sending one request per document rather than a single batched request.

func (*OllamaEmbedder) EmbedQuery

func (e *OllamaEmbedder) EmbedQuery(ctx context.Context, text string) ([]float32, error)

EmbedQuery generates an embedding for a single query.

type OpenAIEmbedder

type OpenAIEmbedder struct {
	// contains filtered or unexported fields
}

OpenAIEmbedder generates embeddings using OpenAI's embedding API. It supports text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002.

func NewOpenAIEmbedder

func NewOpenAIEmbedder(apiKey, model, baseURL string) (*OpenAIEmbedder, error)

NewOpenAIEmbedder creates a new OpenAI embedder. If baseURL is empty, it defaults to "https://api.openai.com".

func (*OpenAIEmbedder) Dimension

func (e *OpenAIEmbedder) Dimension() int

Dimension returns the dimensionality of embeddings.

func (*OpenAIEmbedder) EmbedDocuments

func (e *OpenAIEmbedder) EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)

EmbedDocuments generates embeddings for multiple documents. OpenAI accepts up to 2048 inputs per request, so larger inputs are split into multiple batches.
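OpenAI's embeddings response tags each item in data with an index that refers back to its position in the request's input array. The sketch below encodes a request body and decodes a response, placing each vector by that index. decodeByIndex is hypothetical, and whether this package orders results by index or by position is not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// openAIResponse models only the response fields used here.
type openAIResponse struct {
	Data []struct {
		Index     int       `json:"index"`
		Embedding []float32 `json:"embedding"`
	} `json:"data"`
}

// decodeByIndex places each vector at the position given by its "index"
// field, so results line up with the request's "input" order.
func decodeByIndex(body []byte, n int) ([][]float32, error) {
	var resp openAIResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	out := make([][]float32, n)
	for _, d := range resp.Data {
		if d.Index < 0 || d.Index >= n {
			return nil, fmt.Errorf("index %d out of range for %d inputs", d.Index, n)
		}
		out[d.Index] = d.Embedding
	}
	for i, v := range out {
		if v == nil {
			return nil, fmt.Errorf("missing embedding for input %d", i)
		}
	}
	return out, nil
}

func main() {
	req, _ := json.Marshal(map[string]any{"model": "text-embedding-3-small", "input": []string{"a", "b"}})
	fmt.Println(string(req)) // {"input":["a","b"],"model":"text-embedding-3-small"}

	body := []byte(`{"data":[{"index":1,"embedding":[0.2]},{"index":0,"embedding":[0.1]}]}`)
	vecs, err := decodeByIndex(body, 2)
	fmt.Println(vecs, err) // [[0.1] [0.2]] <nil>
}
```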

func (*OpenAIEmbedder) EmbedQuery

func (e *OpenAIEmbedder) EmbedQuery(ctx context.Context, text string) ([]float32, error)

EmbedQuery generates an embedding for a single query.
