embedder

package
v1.3.0
Warning

This package is not in the latest version of its module.

Published: Mar 14, 2025 License: MIT Imports: 9 Imported by: 0

Documentation

Overview

Package embedder contains the Embedder interface and providers for it, including openai, voyageai, cohere, gemini and huggingface.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func EmbedChunk

func EmbedChunk(ctx context.Context, embedder Embedder, embedding *Embedding, usage *components.LLMUsage) error

EmbedChunk processes a text chunk and generates its embedding. It handles the embedding process in sequence, with debug output for monitoring. The function:

1. Allocates space for the results
2. Processes each chunk through the embedder
3. Creates EmbeddedChunk instances with the results
4. Provides progress information via debug output

func EmbedChunks

func EmbedChunks(ctx context.Context, embedder Embedder, chunks []Embedding, usage *components.LLMUsage) error

EmbedChunks processes a slice of text chunks and generates embeddings for each one. It handles the embedding process in sequence, with debug output for monitoring. The function:

1. Allocates space for the results
2. Processes each chunk through the embedder
3. Creates EmbeddedChunk instances with the results
4. Provides progress information via debug output

Returns an error if any chunk fails to embed properly.

Types

type Base64

type Base64 string

Base64 is a base64-encoded embedding string.

func (Base64) Decode

func (s Base64) Decode() (*Embedding, error)

Decode decodes the base64-encoded string into an Embedding holding the vector of floats.
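
The documentation does not state the byte layout of the encoded string. The sketch below assumes a packed sequence of little-endian float32 values, a layout some embedding APIs use for base64 output; the real Decode may differ. decodeFloats and encodeFloats are hypothetical helpers, not part of the package.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// decodeFloats assumes the payload is a packed sequence of little-endian
// float32 values; the real Base64.Decode may use a different layout.
func decodeFloats(s string) ([]float64, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return nil, fmt.Errorf("decode base64: %w", err)
	}
	if len(raw)%4 != 0 {
		return nil, errors.New("payload length is not a multiple of 4 bytes")
	}
	out := make([]float64, len(raw)/4)
	for i := range out {
		bits := binary.LittleEndian.Uint32(raw[i*4:])
		out[i] = float64(math.Float32frombits(bits))
	}
	return out, nil
}

// encodeFloats is the inverse operation, used here only to build sample input.
func encodeFloats(v []float64) string {
	raw := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(raw[i*4:], math.Float32bits(float32(f)))
	}
	return base64.StdEncoding.EncodeToString(raw)
}

func main() {
	s := encodeFloats([]float64{0.5, -1.25, 2})
	v, err := decodeFloats(s)
	fmt.Println(s, v, err)
}
```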

type Chunker

type Chunker interface {
	SplitText(string) []string
	TokenCount(txt string) int
}
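
The interface carries no doc comment, so the following is only one possible way to satisfy it. wordChunker is a hypothetical type that treats whitespace-separated words as tokens, which is a simplification of how real tokenizers count.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunker mirrors the interface shown above.
type Chunker interface {
	SplitText(string) []string
	TokenCount(txt string) int
}

// wordChunker is a hypothetical implementation: a "token" is a
// whitespace-separated word, and each chunk holds at most maxTokens words.
type wordChunker struct{ maxTokens int }

// Compile-time check that wordChunker satisfies Chunker.
var _ Chunker = wordChunker{}

func (c wordChunker) TokenCount(txt string) int { return len(strings.Fields(txt)) }

func (c wordChunker) SplitText(txt string) []string {
	if c.maxTokens <= 0 {
		return nil
	}
	words := strings.Fields(txt)
	var chunks []string
	for start := 0; start < len(words); start += c.maxTokens {
		end := start + c.maxTokens
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
	}
	return chunks
}

func main() {
	c := wordChunker{maxTokens: 2}
	for _, chunk := range c.SplitText("a b c d e") {
		fmt.Printf("%q has %d tokens\n", chunk, c.TokenCount(chunk))
	}
}
```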

type Embedder

type Embedder interface {
	Provider() Provider
	Model() string
	Embed(context.Context, string, *Embedding, *components.LLMUsage) error
	BatchEmbed(ctx context.Context, parts []string, usage *components.LLMUsage) ([]Embedding, error)
	DotProduct(context.Context, *Embedding, *Embedding) (float64, error)
}

type Embedding

type Embedding struct {
	Object    string            `json:"object"`
	Embedding []float64         `json:"embedding"`
	Index     int               `json:"index"`
	Meta      map[string]string `json:"meta,omitempty"`
}

Embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar.
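
Given the JSON tags on the struct, a value can be read from and written to JSON as sketched below. The sample payload is made up for illustration, and parseEmbedding and marshalEmbedding are hypothetical helpers rather than package functions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Embedding copies the struct definition shown above.
type Embedding struct {
	Object    string            `json:"object"`
	Embedding []float64         `json:"embedding"`
	Index     int               `json:"index"`
	Meta      map[string]string `json:"meta,omitempty"`
}

// parseEmbedding decodes a single JSON object into an Embedding.
func parseEmbedding(data []byte) (Embedding, error) {
	var e Embedding
	err := json.Unmarshal(data, &e)
	return e, err
}

// marshalEmbedding encodes an Embedding; a nil or empty Meta map is omitted
// because of the omitempty tag.
func marshalEmbedding(e Embedding) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

func main() {
	// Sample payload invented for this example.
	sample := []byte(`{"object":"embedding","embedding":[0.1,0.2],"index":3}`)
	e, err := parseEmbedding(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	out, err := marshalEmbedding(e)
	fmt.Println(len(e.Embedding), e.Index, out, err)
}
```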

func (*Embedding) DotProduct

func (e *Embedding) DotProduct(other *Embedding) (float64, error)

DotProduct calculates the dot product of the embedding vector with another embedding vector. Both vectors must have the same length; otherwise, an ErrVectorLengthMismatch is returned. The method returns the calculated dot product as a float64 value.
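
The computation described above amounts to a length check followed by a sum of element-wise products. A minimal sketch on plain slices, where errLengthMismatch is a local stand-in for ErrVectorLengthMismatch and dot is a hypothetical helper:

```go
package main

import (
	"errors"
	"fmt"
)

// errLengthMismatch is a local stand-in for the package's ErrVectorLengthMismatch.
var errLengthMismatch = errors.New("vector length mismatch")

// dot returns the sum of element-wise products of a and b.
func dot(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, errLengthMismatch
	}
	var sum float64
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum, nil
}

func main() {
	v, err := dot([]float64{1, 2, 3}, []float64{4, 5, 6})
	fmt.Println(v, err)

	_, err = dot([]float64{1}, []float64{1, 2})
	fmt.Println(errors.Is(err, errLengthMismatch))
}
```

For vectors normalized to unit length, the dot product equals the cosine similarity, which is why it is commonly used to compare embeddings.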

func (Embedding) UUID

func (e Embedding) UUID() string

type Option

type Option func(*Options)

Option is a function type for configuring Options. It follows the functional options pattern for clean and flexible configuration.

func WithModel

func WithModel(model string) Option

func WithProvider

func WithProvider(provider Provider) Option

type Options

type Options struct {
	// contains filtered or unexported fields
}

Options holds the configuration for creating an Embedder instance. It supports multiple embedding providers and their specific options.

func (Options) Model

func (i Options) Model() string

func (Options) Provider

func (i Options) Provider() Provider
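
The functional options pattern used by Option, WithModel and WithProvider can be sketched as below. The field names inside Options are assumptions, since the real fields are unexported, and applyOptions is a hypothetical helper; the documentation above does not show which function consumes the options. The model name is only a sample value.

```go
package main

import "fmt"

type Provider = string

const ProviderOpenAI Provider = "OpenAI"

// Options stores the configuration. The field names are assumptions because
// the real struct only exposes the Model and Provider accessors.
type Options struct {
	provider Provider
	model    string
}

func (o Options) Provider() Provider { return o.provider }
func (o Options) Model() string      { return o.model }

// Option mutates an Options value.
type Option func(*Options)

func WithModel(model string) Option {
	return func(o *Options) { o.model = model }
}

func WithProvider(provider Provider) Option {
	return func(o *Options) { o.provider = provider }
}

// applyOptions is a hypothetical helper that applies each option in order,
// so later options override earlier ones.
func applyOptions(opts ...Option) Options {
	var o Options
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := applyOptions(WithProvider(ProviderOpenAI), WithModel("text-embedding-3-small"))
	fmt.Println(o.Provider(), o.Model())
}
```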

type Provider

type Provider = string

const (
	ProviderOpenAI      Provider = "OpenAI"
	ProviderVoyageAI    Provider = "VoyageAI"
	ProviderCohere      Provider = "Cohere"
	ProviderGemini      Provider = "Gemini"
	ProviderHuggingFace Provider = "HuggingFace"
)
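
Because Provider is declared as an alias of string, any string value can be passed where a Provider is expected, and the comparison is case-sensitive. The sketch below shows one way to check a value against the constants above; isKnownProvider is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

type Provider = string

const (
	ProviderOpenAI      Provider = "OpenAI"
	ProviderVoyageAI    Provider = "VoyageAI"
	ProviderCohere      Provider = "Cohere"
	ProviderGemini      Provider = "Gemini"
	ProviderHuggingFace Provider = "HuggingFace"
)

// isKnownProvider reports whether p exactly matches one of the constants.
func isKnownProvider(p Provider) bool {
	switch p {
	case ProviderOpenAI, ProviderVoyageAI, ProviderCohere, ProviderGemini, ProviderHuggingFace:
		return true
	}
	return false
}

func main() {
	fromConfig := "Cohere" // a plain string, accepted because Provider is an alias
	fmt.Println(isKnownProvider(fromConfig))
	fmt.Println(isKnownProvider("cohere"))
}
```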

Directories

Path Synopsis
Package splitter defines different chunker splitters.
