Documentation
Overview
Package embeddings provides a unified interface for generating embeddings from multiple providers (OpenAI, Google, Ollama, Anthropic/Voyage).
Index
- Constants
- Variables
- func AverageEmbeddings(embeddings [][]float32) []float32
- func ChunkText(text string, maxSize, overlap int) []string
- func GetDefaultModel(provider string) string
- func GetModelDimension(model string) int
- func SplitAndEmbed(embedder Embedder, text string, maxSize, overlap int) ([][]float32, error)
- func SplitAndEmbedAverage(embedder Embedder, text string, maxSize, overlap int) ([]float32, error)
- type AnthropicEmbedder
- type Embedder
- type GoogleEmbedder
- type OllamaEmbedder
- type OpenAIEmbedder
Examples
Constants
const DefaultChunkOverlap = 100
DefaultChunkOverlap is the default overlap between consecutive chunks, in characters.
const DefaultChunkSize = 800
DefaultChunkSize is the default maximum size for text chunks in characters.
Variables
var ErrDimensionMismatch = fmt.Errorf("embedding dimension mismatch")
ErrDimensionMismatch is returned when embeddings have unexpected dimensions.
var ErrEmptyAPIKey = fmt.Errorf("API key is required but not provided")
ErrEmptyAPIKey is returned when an API key is required but not provided.
var ErrNoTexts = fmt.Errorf("no texts provided for embedding")
ErrNoTexts is returned when an empty text slice is passed to EmbedDocuments.
var ErrUnsupportedProvider = fmt.Errorf("unsupported embedding provider")
ErrUnsupportedProvider is returned when the provider name is not recognized.
var ProviderDefaultModels = map[string]string{
	"openai":    "text-embedding-3-small",
	"google":    "text-embedding-004",
	"gemini":    "text-embedding-004",
	"ollama":    "nomic-embed-text",
	"anthropic": "voyage-3",
	"voyage":    "voyage-3",
}
ProviderDefaultModels maps provider names to their recommended default models.
var ProviderDimensions = map[string]int{
	"text-embedding-3-small": 1536,
	"text-embedding-3-large": 3072,
	"text-embedding-ada-002": 1536,
	"text-embedding-004":     768,
	"nomic-embed-text":       768,
	"mxbai-embed-large":      1024,
	"all-minilm":             384,
	"voyage-3":               1024,
	"voyage-3-large":         1536,
	"voyage-code-3":          1024,
}
ProviderDimensions maps common models to their embedding dimensions. This is useful for pre-allocating storage or validation.
Functions
func AverageEmbeddings
func AverageEmbeddings(embeddings [][]float32) []float32
AverageEmbeddings computes the element-wise average of multiple embeddings. This is useful for combining the embeddings of a document's chunks into a single document embedding.
Returns nil if embeddings is empty or if the embeddings have inconsistent dimensions.
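A minimal sketch of the averaging contract described above, assuming a plain element-wise arithmetic mean computed in float32 (the package may accumulate or normalize differently). `averageEmbeddings` is a hypothetical stand-in, not the package's source.

```go
package main

import "fmt"

// averageEmbeddings is a hypothetical re-implementation of the documented
// AverageEmbeddings contract: element-wise mean, nil on empty input or on
// inconsistent dimensions.
func averageEmbeddings(embeddings [][]float32) []float32 {
	if len(embeddings) == 0 {
		return nil
	}
	dim := len(embeddings[0])
	sum := make([]float32, dim)
	for _, emb := range embeddings {
		if len(emb) != dim {
			return nil // inconsistent dimensions
		}
		for i, v := range emb {
			sum[i] += v
		}
	}
	n := float32(len(embeddings))
	for i := range sum {
		sum[i] /= n
	}
	return sum
}

func main() {
	avg := averageEmbeddings([][]float32{{1, 2, 3}, {3, 4, 5}})
	fmt.Println(avg) // [2 3 4]
}
```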
func ChunkText
func ChunkText(text string, maxSize, overlap int) []string
ChunkText splits text into chunks of at most maxSize characters, preferring sentence boundaries as split points to preserve semantic coherence.
Parameters:
- text: The text to split into chunks
- maxSize: Maximum size of each chunk in characters
- overlap: Number of characters to overlap between consecutive chunks
Returns a slice of text chunks.
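One way sentence-aware chunking with overlap can work is sketched below. It assumes sentences end at '.', '!' or '?' followed by whitespace, counts characters as runes, and joins sentences with a single space; the package's actual boundary rules are not documented here. `chunkText`, `splitSentences`, and `tail` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitSentences breaks text after '.', '!' or '?' when followed by
// whitespace or end of text. Naive: "e.g." also ends a sentence.
func splitSentences(text string) []string {
	var sentences []string
	runes := []rune(text)
	start := 0
	for i, r := range runes {
		end := r == '.' || r == '!' || r == '?'
		if end && (i+1 == len(runes) || unicode.IsSpace(runes[i+1])) {
			if s := strings.TrimSpace(string(runes[start : i+1])); s != "" {
				sentences = append(sentences, s)
			}
			start = i + 1
		}
	}
	if s := strings.TrimSpace(string(runes[start:])); s != "" {
		sentences = append(sentences, s)
	}
	return sentences
}

// tail returns a copy of the last n runes of r (nil when n <= 0).
func tail(r []rune, n int) []rune {
	if n <= 0 {
		return nil
	}
	if n > len(r) {
		n = len(r)
	}
	return append([]rune(nil), r[len(r)-n:]...)
}

// chunkText greedily packs whole sentences into chunks of at most maxSize
// runes, seeding each new chunk with the last `overlap` runes of the previous
// one when that still leaves room for the next sentence.
func chunkText(text string, maxSize, overlap int) []string {
	if maxSize <= 0 {
		return nil
	}
	var chunks []string
	var cur []rune // chunk being built
	for _, s := range splitSentences(text) {
		sr := []rune(s)
		for len(sr) > 0 {
			sep := 0
			if len(cur) > 0 {
				sep = 1 // one space between joined sentences
			}
			switch {
			case len(cur)+sep+len(sr) <= maxSize:
				// The (remaining) sentence fits: append it.
				if sep == 1 {
					cur = append(cur, ' ')
				}
				cur = append(cur, sr...)
				sr = nil
			case len(cur) == 0:
				// One sentence longer than maxSize: hard-split it (no overlap).
				chunks = append(chunks, string(sr[:maxSize]))
				sr = sr[maxSize:]
			default:
				// Chunk is full: emit it and seed the next one with its tail.
				chunks = append(chunks, string(cur))
				cur = tail(cur, overlap)
				if len(cur)+1+len(sr) > maxSize {
					cur = nil // overlap would leave no room for the sentence
				}
			}
		}
	}
	if len(cur) > 0 {
		chunks = append(chunks, string(cur))
	}
	return chunks
}

func main() {
	for _, c := range chunkText("One. Two. Three.", 12, 4) {
		fmt.Printf("%q\n", c)
	}
	// "One. Two."
	// "Two. Three."
}
```

Greedy packing keeps whole sentences together whenever they fit; dropping the overlap when it would crowd out the next sentence keeps every chunk within maxSize.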
func GetDefaultModel
func GetDefaultModel(provider string) string
GetDefaultModel returns the default model for a given provider. Returns an empty string if the provider is unknown.
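func GetModelDimension
func GetModelDimension(model string) int
GetModelDimension returns the expected dimension for a given model. Returns 0 if the model is not in the known list (ProviderDimensions); in that case the dimension is auto-detected on the first successful embedding call.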
func SplitAndEmbed
func SplitAndEmbed(embedder Embedder, text string, maxSize, overlap int) ([][]float32, error)
SplitAndEmbed chunks text and generates embeddings for all chunks. It is a convenience wrapper that combines ChunkText and Embedder.EmbedDocuments.
Parameters:
- embedder: The embedder to use
- text: The text to chunk and embed
- maxSize: Maximum chunk size in characters
- overlap: Number of characters to overlap between consecutive chunks
Returns embeddings for all chunks.
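The composition SplitAndEmbed describes can be sketched with a test double so it runs without any provider. The chunker is passed in as a function standing in for ChunkText, and using `context.Background()` is an assumption, since the documented signature takes no context. `lengthEmbedder`, `fixedChunks`, and `splitAndEmbed` are illustrative names, not package API.

```go
package main

import (
	"context"
	"fmt"
)

// Embedder mirrors the package interface so this sketch compiles on its own.
type Embedder interface {
	EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)
	EmbedQuery(ctx context.Context, text string) ([]float32, error)
	Dimension() int
}

// lengthEmbedder is a hypothetical test double whose one-element "embedding"
// is the text length, so results can be checked without a network call.
type lengthEmbedder struct{}

func (lengthEmbedder) EmbedDocuments(_ context.Context, texts []string) ([][]float32, error) {
	out := make([][]float32, len(texts))
	for i, t := range texts {
		out[i] = []float32{float32(len(t))}
	}
	return out, nil
}

func (e lengthEmbedder) EmbedQuery(ctx context.Context, text string) ([]float32, error) {
	vecs, err := e.EmbedDocuments(ctx, []string{text})
	if err != nil {
		return nil, err
	}
	return vecs[0], nil
}

func (lengthEmbedder) Dimension() int { return 1 }

// fixedChunks is a hypothetical stand-in for ChunkText that cuts plain
// fixed-size byte windows with no overlap.
func fixedChunks(size int) func(string) []string {
	return func(s string) []string {
		if size <= 0 {
			return []string{s}
		}
		var out []string
		for len(s) > size {
			out = append(out, s[:size])
			s = s[size:]
		}
		if s != "" {
			out = append(out, s)
		}
		return out
	}
}

// splitAndEmbed chunks the text, then embeds every chunk in one
// EmbedDocuments call.
func splitAndEmbed(e Embedder, chunk func(string) []string, text string) ([][]float32, error) {
	return e.EmbedDocuments(context.Background(), chunk(text))
}

func main() {
	vecs, err := splitAndEmbed(lengthEmbedder{}, fixedChunks(4), "abcdefghij")
	if err != nil {
		panic(err)
	}
	fmt.Println(vecs) // [[4] [4] [2]]
}
```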
func SplitAndEmbedAverage
func SplitAndEmbedAverage(embedder Embedder, text string, maxSize, overlap int) ([]float32, error)
SplitAndEmbedAverage chunks text, embeds each chunk, and averages the chunk embeddings into a single embedding that represents the entire text.
Types
type AnthropicEmbedder
type AnthropicEmbedder struct {
	// contains filtered or unexported fields
}
AnthropicEmbedder generates embeddings using Voyage AI's API. Anthropic does not offer its own embedding models, so this embedder targets Voyage models such as voyage-3, voyage-3-large, and voyage-code-3.
func NewAnthropicEmbedder
func NewAnthropicEmbedder(apiKey, model, baseURL string) (*AnthropicEmbedder, error)
NewAnthropicEmbedder creates a new Anthropic/Voyage embedder. If baseURL is empty, it defaults to "https://api.voyageai.com".
func (*AnthropicEmbedder) Dimension
func (e *AnthropicEmbedder) Dimension() int
Dimension returns the dimensionality of embeddings.
func (*AnthropicEmbedder) EmbedDocuments
func (e *AnthropicEmbedder) EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)
EmbedDocuments generates embeddings for multiple documents. Voyage supports up to 128 inputs per request.
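A per-request input limit means larger slices have to be split into several requests. A minimal sketch of order-preserving batching, assuming per-batch results are appended in batch order afterwards; `batches` is a hypothetical helper.

```go
package main

import "fmt"

// batches splits texts into consecutive sub-slices of at most size
// elements, preserving input order.
func batches(texts []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(texts); start += size {
		end := start + size
		if end > len(texts) {
			end = len(texts)
		}
		out = append(out, texts[start:end])
	}
	return out
}

func main() {
	for _, b := range batches(make([]string, 300), 128) {
		fmt.Println(len(b))
	}
	// 128
	// 128
	// 44
}
```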
func (*AnthropicEmbedder) EmbedQuery
func (e *AnthropicEmbedder) EmbedQuery(ctx context.Context, text string) ([]float32, error)
EmbedQuery generates an embedding for a single query.
type Embedder
type Embedder interface {
	// EmbedDocuments generates embeddings for multiple documents.
	// Implementations should batch requests according to provider limits.
	// Partial failures are allowed: continue processing remaining documents.
	EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)
	// EmbedQuery generates an embedding for a single query text.
	// This typically behaves like EmbedDocuments with a single input, but
	// some providers use different parameters for queries
	// (e.g. search_query vs search_document).
	EmbedQuery(ctx context.Context, text string) ([]float32, error)
	// Dimension returns the dimensionality of the embeddings.
	// It is auto-detected on the first successful embedding call.
	Dimension() int
}
Embedder is the main interface for generating embeddings from text. All provider implementations must be safe for concurrent use by multiple goroutines.
func NewEmbedder
func NewEmbedder(provider, model, apiKey, baseURL string) (Embedder, error)
NewEmbedder creates an Embedder based on the provider name and configuration. Supported providers: "openai", "google", "gemini", "ollama", "anthropic", "voyage".
Parameters:
- provider: The embedding provider name (case-insensitive)
- model: The model name/ID to use
- apiKey: API key (not needed for Ollama)
- baseURL: Base URL for the API (optional, uses defaults if empty)
Returns an Embedder instance or an error if the provider is unsupported.
Example
Example demonstrates basic usage of the embeddings package.
// Create an Ollama embedder (no API key required).
embedder, err := NewEmbedder("ollama", "nomic-embed-text", "", "")
if err != nil {
	panic(err)
}
// Generating embeddings requires a running Ollama instance.
ctx := context.Background()
texts := []string{"Hello world", "This is a test"}
embeddings, err := embedder.EmbedDocuments(ctx, texts)
if err != nil {
	// Handle the error; this call fails when no Ollama server is reachable.
	return
}
// Use the embeddings: one vector per input text.
_ = embeddings
type GoogleEmbedder
type GoogleEmbedder struct {
	// contains filtered or unexported fields
}
GoogleEmbedder generates embeddings using Google's Gemini embedding API. Supports text-embedding-004 and other Gemini embedding models.
func NewGoogleEmbedder
func NewGoogleEmbedder(apiKey, model, baseURL string) (*GoogleEmbedder, error)
NewGoogleEmbedder creates a new Google/Gemini embedder. If baseURL is empty, it defaults to "https://generativelanguage.googleapis.com".
func (*GoogleEmbedder) Dimension
func (e *GoogleEmbedder) Dimension() int
Dimension returns the dimensionality of embeddings.
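func (*GoogleEmbedder) EmbedDocuments
func (e *GoogleEmbedder) EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)
EmbedDocuments generates embeddings for multiple documents. Google's batch endpoint accepts multiple texts in a single request.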
func (*GoogleEmbedder) EmbedQuery
func (e *GoogleEmbedder) EmbedQuery(ctx context.Context, text string) ([]float32, error)
EmbedQuery generates an embedding for a single query.
type OllamaEmbedder
type OllamaEmbedder struct {
	// contains filtered or unexported fields
}
OllamaEmbedder generates embeddings using a local Ollama instance. Supports models such as nomic-embed-text and mxbai-embed-large.
func NewOllamaEmbedder
func NewOllamaEmbedder(model, baseURL string) (*OllamaEmbedder, error)
NewOllamaEmbedder creates a new Ollama embedder. If baseURL is empty, it defaults to "http://localhost:11434".
func (*OllamaEmbedder) Dimension
func (e *OllamaEmbedder) Dimension() int
Dimension returns the dimensionality of embeddings.
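func (*OllamaEmbedder) EmbedDocuments
func (e *OllamaEmbedder) EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)
EmbedDocuments generates embeddings for multiple documents. It sends one request per text to the Ollama server instead of batching them into a single request.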
func (*OllamaEmbedder) EmbedQuery
func (e *OllamaEmbedder) EmbedQuery(ctx context.Context, text string) ([]float32, error)
EmbedQuery generates an embedding for a single query.
type OpenAIEmbedder
type OpenAIEmbedder struct {
	// contains filtered or unexported fields
}
OpenAIEmbedder generates embeddings using OpenAI's embedding API. Supports text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002.
func NewOpenAIEmbedder
func NewOpenAIEmbedder(apiKey, model, baseURL string) (*OpenAIEmbedder, error)
NewOpenAIEmbedder creates a new OpenAI embedder. If baseURL is empty, it defaults to "https://api.openai.com".
func (*OpenAIEmbedder) Dimension
func (e *OpenAIEmbedder) Dimension() int
Dimension returns the dimensionality of embeddings.
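func (*OpenAIEmbedder) EmbedDocuments
func (e *OpenAIEmbedder) EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)
EmbedDocuments generates embeddings for multiple documents. OpenAI accepts up to 2048 inputs per request, so larger slices are split across several requests.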
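Each item in an OpenAI embeddings response carries an `index` field identifying which input it belongs to. A minimal sketch of decoding one batch response and restoring input order, assuming only the `data[].index` and `data[].embedding` fields matter here; `embeddingsResponse` and `decodeBatch` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// embeddingsResponse covers the subset of the /v1/embeddings response used
// here; other fields (model, usage) are ignored by encoding/json.
type embeddingsResponse struct {
	Data []struct {
		Index     int       `json:"index"`
		Embedding []float32 `json:"embedding"`
	} `json:"data"`
}

// decodeBatch places each embedding at its input position, so the result
// lines up with the n texts sent in this request.
func decodeBatch(body []byte, n int) ([][]float32, error) {
	var resp embeddingsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	out := make([][]float32, n)
	for _, d := range resp.Data {
		if d.Index < 0 || d.Index >= n {
			return nil, fmt.Errorf("index %d out of range for %d inputs", d.Index, n)
		}
		out[d.Index] = d.Embedding
	}
	return out, nil
}

func main() {
	body := []byte(`{"data":[{"index":1,"embedding":[0.5]},{"index":0,"embedding":[0.25]}]}`)
	vecs, err := decodeBatch(body, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(vecs) // [[0.25] [0.5]]
}
```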
func (*OpenAIEmbedder) EmbedQuery
func (e *OpenAIEmbedder) EmbedQuery(ctx context.Context, text string) ([]float32, error)
EmbedQuery generates an embedding for a single query.