package embed

v0.53.0
Published: May 9, 2026 License: Apache-2.0 Imports: 20 Imported by: 0

Documentation

Overview

Package embed provides text embedding backends with a unified interface.

Backends:

  • HTTPEmbedder — OpenAI-compatible /v1/embeddings clients (e.g. the self-hosted embed-server sidecar serving multilingual-e5-large + jina-code-v2 on the internal Docker network).
  • OllamaClient — Ollama /api/embed (batch, ≥ 0.3.6).
  • VoyageClient — Voyage AI hosted /v1/embeddings.
  • ONNX (build tag "cgo") — local ONNX Runtime inference; lives in subpackage github.com/anatolykoptev/go-kit/embed/onnx so callers who don't need cgo never link libonnxruntime / libtokenizers.

All backends share the Embedder interface (Embed / EmbedQuery / Dimension / Close), shared retry/backoff (transient errors, 429, 5xx), and shared Prometheus metrics under the embed_* namespace.

Use New for env-driven backend selection (Type ∈ {http, ollama, voyage, onnx}). Use NewHTTPEmbedder / NewOllamaClient / NewVoyageClient directly for explicit construction.

Multi-model wiring uses Registry, which maps model names (e.g. "multilingual-e5-large", "jina-code-v2") to embedders and falls back to a designated default when the lookup name is empty.
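A minimal quick-start sketch for the env-driven path. The environment variable names below are our own, not part of the package; callers choose their own env contract when populating Config.

```go
package main

import (
	"context"
	"fmt"
	"log/slog"
	"os"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	// Populate Config from the environment (variable names illustrative).
	cfg := embed.Config{
		Type:      os.Getenv("EMBED_TYPE"), // "http" | "ollama" | "voyage"
		Model:     os.Getenv("EMBED_MODEL"),
		OllamaURL: os.Getenv("OLLAMA_URL"),
	}

	e, err := embed.New(cfg, slog.Default())
	if err != nil {
		slog.Error("embed init failed", "err", err)
		os.Exit(1)
	}
	defer e.Close()

	vecs, err := e.Embed(context.Background(), []string{"hello", "world"})
	if err != nil {
		slog.Error("embed failed", "err", err)
		return
	}
	fmt.Println(len(vecs), e.Dimension())
}
```

Explicit construction via NewHTTPEmbedder / NewOllamaClient / NewVoyageClient skips the Type dispatch entirely.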

Index

Constants

This section is empty.

Variables

View Source
var ErrCircuitOpen = errors.New("embed: circuit breaker open")

ErrCircuitOpen is returned by callBackendResilient when the circuit breaker is in the Open state and has blocked the call.

View Source
var ErrONNXNotInFactory = errors.New(
	"embed.New: type=\"onnx\" not supported by this factory; " +
		"import github.com/anatolykoptev/go-kit/embed/onnx and call onnx.New",
)

ErrONNXNotInFactory is returned by New when Config.Type == "onnx".

ONNX requires cgo + libonnxruntime + libtokenizers, which is too heavy a dependency for the default factory. Callers that need ONNX should import the subpackage github.com/anatolykoptev/go-kit/embed/onnx and call onnx.New(cfg, logger) directly. memdb-go does this in its server-init wiring; pure-HTTP/Ollama/Voyage callers never link the cgo deps.

View Source
var NoRetry = RetryPolicy{MaxAttempts: 1}

NoRetry is an explicit opt-out from the default retry policy. Pass via WithRetry(embed.NoRetry) to disable all retries.

Functions

func EmbedQueryViaEmbed

func EmbedQueryViaEmbed(ctx context.Context, e Embedder, text string) ([]float32, error)

EmbedQueryViaEmbed is a helper that implements EmbedQuery by delegating to Embed. Use it in embedder implementations that don't need query-specific behaviour.
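A sketch of a custom Embedder whose EmbedQuery delegates through the helper. The constEmbedder type is hypothetical and exists only to show the delegation pattern:

```go
package main

import (
	"context"

	"github.com/anatolykoptev/go-kit/embed"
)

// constEmbedder is an illustrative Embedder returning fixed-size zero
// vectors; the EmbedQuery delegation is the only point of interest.
type constEmbedder struct{ dim int }

func (c *constEmbedder) Embed(ctx context.Context, texts []string) ([][]float32, error) {
	out := make([][]float32, len(texts))
	for i := range out {
		out[i] = make([]float32, c.dim)
	}
	return out, nil
}

// No query-specific behaviour, so delegate to Embed via the package helper.
func (c *constEmbedder) EmbedQuery(ctx context.Context, text string) ([]float32, error) {
	return embed.EmbedQueryViaEmbed(ctx, c, text)
}

func (c *constEmbedder) Dimension() int { return c.dim }
func (c *constEmbedder) Close() error   { return nil }

// Compile-time interface check.
var _ embed.Embedder = (*constEmbedder)(nil)
```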

func L2Normalize

func L2Normalize(vec []float32)

L2Normalize is the exported alias of l2Normalize. The ONNX subpackage imports this from the parent package so mean pooling stays in one place.

Types

type Cache

type Cache interface {
	// Get returns the cached embedding for the given key. ok=false if not cached.
	// Implementations must NOT panic on ctx cancellation; return ok=false instead.
	Get(ctx context.Context, key string) (vector []float32, ok bool)
	// Set stores the embedding for the given key. Idempotent. Implementations
	// may TTL or evict per their policy.
	Set(ctx context.Context, key string, vector []float32)
}

Cache abstracts a (text → vector) lookup table. go-kit/embed ships NO concrete implementation — callers wire LRU/Redis/sync.Map per their runtime. Implementations MUST be safe for concurrent reads and writes.

TTL semantics, eviction policy, and persistence are caller concerns. Cache key invalidation on model/dim/prefix change is automatic (key includes all parameters that affect output vector).

Trade-offs:

  • On partial-miss, ALL N vectors are re-Set after the backend call (not only the missing ones). For Redis-backed caches with non-trivial Set cost, implementations may dedupe internally. In-process LRU/sync.Map caches: noop (Set is O(1) and idempotent).

Future-proofing — these vector-affecting fields are NOT YET in cacheKey because they are static or single-valued today; once they become per-call settable, cacheKey will be extended:

  • Voyage input_type ("document" vs "query") — hardcoded "query" today
  • Ollama normalize_l2 toggle — applied unconditionally today

Callers persisting a cache across Client lifecycles SHOULD include their own config-hash prefix on keys to avoid cross-config pollution.

type CircuitBreaker

type CircuitBreaker struct {
	// contains filtered or unexported fields
}

CircuitBreaker is a thread-safe Closed/Open/HalfOpen state machine. Reads use RLock; writes (transitions) use Lock.

func NewCircuitBreaker

func NewCircuitBreaker(cfg CircuitConfig, model string, onTransition func(CircuitState, CircuitState)) *CircuitBreaker

NewCircuitBreaker constructs a CircuitBreaker with the given config and an optional transition callback. The callback is invoked (via safeCall) on every state change; pass nil to skip.

func (*CircuitBreaker) Allow

func (cb *CircuitBreaker) Allow() bool

Allow reports whether the current request may proceed.

  • CircuitClosed: always true.
  • CircuitOpen: false unless OpenDuration elapsed — then transitions to HalfOpen and returns true for up to HalfOpenProbes concurrent requests.
  • CircuitHalfOpen: true only for HalfOpenProbes slots; false afterwards.
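A manual wiring sketch of the Allow / MarkSuccess / MarkFailure cycle (model name and thresholds illustrative; NewClient with WithCircuit does this for you):

```go
package main

import (
	"context"
	"time"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	cb := embed.NewCircuitBreaker(embed.CircuitConfig{
		FailThreshold:  5,
		OpenDuration:   30 * time.Second,
		HalfOpenProbes: 1,
	}, "multilingual-e5-large", func(from, to embed.CircuitState) {
		// Optional transition hook; invoked via safeCall on every change.
	})

	// callBackend stands in for a real embedding request.
	callBackend := func(ctx context.Context) error { return nil }

	if !cb.Allow() {
		return // short-circuited while Open; map this to ErrCircuitOpen upstream
	}
	if err := callBackend(context.Background()); err != nil {
		cb.MarkFailure() // trips to Open after FailThreshold consecutive failures
	} else {
		cb.MarkSuccess() // a HalfOpen probe success closes the breaker
	}
}
```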

func (*CircuitBreaker) MarkFailure

func (cb *CircuitBreaker) MarkFailure()

MarkFailure notifies the breaker that the call failed. Closed: increments consecutive failure counter; trips to Open at FailThreshold. HalfOpen: immediately returns to Open (probe failed).

func (*CircuitBreaker) MarkSuccess

func (cb *CircuitBreaker) MarkSuccess()

MarkSuccess notifies the breaker that the call succeeded. HalfOpen → Closed; resets consecutive failure counter.

func (*CircuitBreaker) State

func (cb *CircuitBreaker) State() CircuitState

State returns the current CircuitState. Safe for concurrent reads.

type CircuitConfig

type CircuitConfig struct {
	// FailThreshold is the number of consecutive failures that trip the
	// circuit from Closed to Open. Default: 5.
	FailThreshold int
	// OpenDuration is how long the circuit stays Open before transitioning
	// to HalfOpen for probe requests. Default: 30s.
	OpenDuration time.Duration
	// HalfOpenProbes is the number of requests allowed through when in
	// HalfOpen state. Default: 1.
	HalfOpenProbes int
	// FailRateWindow is reserved for future fail-rate counting (currently
	// consecutive-failure counting is used). Default: 10s.
	FailRateWindow time.Duration
}

CircuitConfig configures a CircuitBreaker instance.

type CircuitState

type CircuitState uint8

CircuitState represents the state of a circuit breaker, defined as the foundation for E1, which implements the full FSM (see CircuitBreaker).

const (
	// CircuitClosed is the normal operating state — calls pass through.
	CircuitClosed CircuitState = iota
	// CircuitOpen means the breaker has tripped — calls are short-circuited.
	CircuitOpen
	// CircuitHalfOpen means the breaker is probing for recovery.
	CircuitHalfOpen
)

func (CircuitState) String

func (s CircuitState) String() string

String returns the human-readable label for the circuit state. Used as a Prometheus label value.

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client wraps an Embedder backend with v2 features: Observer hooks, retry, circuit breaker, and multi-model fallback (E1). Built via NewClient(url, opts...).

Client itself implements Embedder, so it is drop-in replaceable for v1 backends. v1 callers that hold the result as Embedder continue to work unchanged; v2 callers cast to *Client to call EmbedWithResult directly.

func NewClient

func NewClient(url string, opts ...Opt) (*Client, error)

NewClient is the v2 entry point — returns a *Client configured via functional options. v1 callers continue to use New(cfg, logger) which calls the per-backend helpers directly.

url is the backend URL when applicable. For Ollama/HTTP backends, pass the base URL. For Voyage, url is ignored (endpoint is hardcoded by the API). For ONNX, use the embed/onnx subpackage directly.

At least one backend-specific Opt must be applied; otherwise NewClient returns an error from the underlying constructor.

The returned *Client implements Embedder, so it is assignable to an Embedder variable for v1-style callers. Cast to *Client to access EmbedWithResult.

func (*Client) Close

func (c *Client) Close() error

Close satisfies Embedder; closes the inner backend.

func (*Client) Dimension

func (c *Client) Dimension() int

Dimension satisfies Embedder.

func (*Client) Embed

func (c *Client) Embed(ctx context.Context, texts []string) ([][]float32, error)

Embed satisfies the Embedder interface. Routes through EmbedWithResult so that ALL configured layers — cache (E3), circuit breaker (E1), fallback, and observer hooks — fire on this path identically to EmbedWithResult.

2026-05-01 fix: prior implementation called callBackendResilient directly, which silently bypassed the cache layer (WithCache was effectively a no-op for callers using the simpler Embed API). Verified empirically — memdb-go wired WithCache via NewHTTPEmbedderWithOpts but its embed_cache_total counter stayed at 0 across a full LoCoMo ingest because every Embed() call took the no-cache path. Routing through EmbedWithResult fixes this without changing the public Embed signature.

func (*Client) EmbedQuery

func (c *Client) EmbedQuery(ctx context.Context, text string) ([]float32, error)

EmbedQuery satisfies Embedder; routes through Embed so single-text query embeddings benefit from the same cache + resilience layers as batch calls. When WithDim was set, the returned vector length is validated against c.expectedDim — a mismatch surfaces as *ErrDimMismatch and bumps embed_dim_mismatch_total{model}.

2026-05-01 fix: was c.inner.EmbedQuery directly, also bypassing cache. Now identical resilience semantics whether you call Embed or EmbedQuery.

func (*Client) EmbedWithResult

func (c *Client) EmbedWithResult(ctx context.Context, texts []string, opts ...EmbedOpt) (*Result, error)

EmbedWithResult is the v2 Embed API — returns a typed Result with Status and fires Observer hooks around the backend call.

Lifecycle:

OnBeforeEmbed → (fallback check) → callBackendResilient → OnAfterEmbed

When chunking is active (len(texts) > chunkSize), `OnBeforeEmbed` / `OnAfterEmbed` fire ONCE PER DISPATCHED CHUNK, not once per user-facing call — observers tracking call count vs token count should reflect this. `embed_chunks_per_call` is recorded once per `EmbedWithResult` call (value=1 for non-chunked, value=N for chunked).

Status semantics:

  • StatusOk — request succeeded, vectors are valid
  • StatusDegraded — request failed, Err is set
  • StatusFallback — primary degraded, secondary succeeded (E1)
  • StatusSkipped — nil inner, empty texts, or DryRun enabled

E1 wires retry/circuit/fallback on top of this call. E2 wires auto-batching, E3 wires cache, E4 wires per-text Status reasoning. E5 wires client-side chunking when len(texts) > c.chunkSize.
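A hedged usage sketch, checking Status per the semantics above (the sidecar URL and model name are illustrative):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	c, err := embed.NewClient("http://embed-server:8080",
		embed.WithBackend("http"),
		embed.WithModel("multilingual-e5-large"),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	res, err := c.EmbedWithResult(context.Background(), []string{"a", "b"})
	if err != nil {
		log.Fatal(err)
	}
	switch res.Status {
	case embed.StatusOk, embed.StatusFallback:
		fmt.Println("vectors:", len(res.Vectors), "model:", res.Model)
	case embed.StatusDegraded:
		log.Println("degraded:", res.Err)
	case embed.StatusSkipped:
		log.Println("skipped (empty input or dry-run)")
	}
}
```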

func (*Client) Model

func (c *Client) Model() string

Model returns the resolved model name. Satisfies the optional modelGetter interface used by modelFromEmbedder's fallback chain.

type Config

type Config struct {
	Type         string // "http" | "ollama" | "voyage" | "onnx"
	ONNXModelDir string
	VoyageAPIKey string
	Model        string // voyage, ollama, or http model name
	OllamaURL    string
	OllamaDim    int    // 0 = auto-detect from first response
	OllamaPrefix string // client-side document prefix (e.g. "passage: ")
	OllamaQuery  string // client-side query prefix (e.g. "query: ")
	HTTPBaseURL  string // for type="http" — URL of embed-server sidecar
	HTTPDim      int    // dimension override (default 1024)
}

Config holds all embedder configuration in one typed struct. Populated from environment variables by callers.

Type selects the backend:

  • "http" — OpenAI-compatible /v1/embeddings endpoint (HTTPBaseURL).
  • "ollama" — Ollama /api/embed (OllamaURL).
  • "voyage" — Voyage AI hosted /v1/embeddings (VoyageAPIKey).
  • "onnx" — local ONNX Runtime; requires the embed/onnx subpackage factory because it depends on cgo.

Fields not relevant to the chosen Type are ignored.

type EmbedOpt

type EmbedOpt func(*embedCallCfg)

EmbedOpt is a per-call option for EmbedWithResult.

func WithDryRun

func WithDryRun() EmbedOpt

WithDryRun skips the backend call entirely and returns Status=Skipped vectors of zero length. For testing pipeline wiring without a live server.

type Embedder

type Embedder interface {
	// Embed returns embeddings for the given texts (document/storage use case).
	Embed(ctx context.Context, texts []string) ([][]float32, error)
	// EmbedQuery embeds a single query string (search/retrieval use case).
	// Implementations may apply query-specific prefixes or instructions.
	// Default: delegates to Embed.
	EmbedQuery(ctx context.Context, text string) ([]float32, error)
	// Dimension returns the embedding vector dimension.
	Dimension() int
	// Close releases resources (model, tokenizer, HTTP clients).
	Close() error
}

Embedder generates text embeddings.

func New

func New(cfg Config, logger *slog.Logger) (Embedder, error)

New constructs the appropriate Embedder from cfg.

Supported Config.Type values: "http", "ollama", and "voyage". Type "onnx" is rejected with ErrONNXNotInFactory; use the embed/onnx subpackage instead.

Returns an error if the type is unknown or required config is missing. logger=nil falls back to slog.Default() inside each backend constructor.

type ErrDimMismatch

type ErrDimMismatch struct {
	// Got is the length of the vector returned by the backend.
	Got int
	// Want is the dimension declared via WithDim.
	Want int
	// Model is the resolved model name (may be empty for opaque backends).
	Model string
	// Index is the position of the first offending vector in the ORIGINAL
	// (pre-chunking) input slice. Zero when chunking is not in use.
	// When client-side chunking is active, Index equals the chunk's start
	// offset so callers can locate the offending record without iterating
	// all vectors.
	Index int
}

ErrDimMismatch is returned by Client.Embed / Client.EmbedQuery / Client.EmbedWithResult when the backend returns a vector whose length does not match the dimension declared via WithDim.

This guards against silent corruption of downstream pgvector / Qdrant schemas when the backend model is swapped (e.g. via env var change) without a coordinated WithDim update on the consumer side. Without this check, a 1024-dim response would be written to a vector(768) column and only fail at INSERT time — far from the configuration error.

Behaviour:

  • Returned only when WithDim was set to a non-zero value (cfg.dim == 0 disables validation, preserving auto-detection).
  • Embed handlers MUST continue serving — do NOT panic; treat as a normal error and propagate.
  • Each mismatch increments embed_dim_mismatch_total{model} so dashboards can alert on production drift.
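A sketch of the recommended handling path: declare the schema dimension via WithDim, then unwrap with errors.As and propagate without panicking (URL illustrative):

```go
package main

import (
	"context"
	"errors"
	"log"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	c, err := embed.NewClient("http://embed-server:8080",
		embed.WithBackend("http"),
		embed.WithDim(1024), // declare the schema's vector(1024) expectation
	)
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	_, err = c.Embed(context.Background(), []string{"text"})
	var dm *embed.ErrDimMismatch
	if errors.As(err, &dm) {
		// Do NOT panic; treat as a normal error and keep serving.
		log.Printf("dim mismatch: got %d want %d (model=%q, index=%d)",
			dm.Got, dm.Want, dm.Model, dm.Index)
	}
}
```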

func (*ErrDimMismatch) Error

func (e *ErrDimMismatch) Error() string

Error implements the error interface.

type HTTPEmbedder

type HTTPEmbedder struct {
	// contains filtered or unexported fields
}

HTTPEmbedder calls a remote OpenAI-compatible /v1/embeddings endpoint. Designed for the Rust embed-server sidecar on the internal Docker network, but compatible with any provider that speaks the OpenAI shape (Voyage, Mixedbread, Together, vLLM-served encoders, etc.).

func NewHTTPEmbedder

func NewHTTPEmbedder(baseURL, model string, dim int, logger *slog.Logger, opts ...HTTPOption) *HTTPEmbedder

NewHTTPEmbedder creates an HTTPEmbedder pointing at baseURL. baseURL should not include /v1/embeddings — it will be appended automatically. logger=nil falls back to slog.Default().

opts is variadic and backwards-compatible: existing 4-arg callers (e.g. MemDB's memdb-go embedder wrapper) continue to compile unchanged and receive the default 30s timeout.

func (*HTTPEmbedder) Close

func (h *HTTPEmbedder) Close() error

Close is a no-op for the HTTP-based embedder.

func (*HTTPEmbedder) Dimension

func (h *HTTPEmbedder) Dimension() int

Dimension returns the configured embedding dimension.

func (*HTTPEmbedder) Embed

func (h *HTTPEmbedder) Embed(ctx context.Context, texts []string) ([][]float32, error)

Embed sends texts to the remote embedding server and returns vectors.

Retries transient failures (timeout, 429, 5xx) with exponential backoff (200ms → 400ms → 800ms, cap 5s, 3 attempts total). Non-retriable errors (4xx validation, unmarshal) fail fast.

func (*HTTPEmbedder) EmbedQuery

func (h *HTTPEmbedder) EmbedQuery(ctx context.Context, text string) ([]float32, error)

EmbedQuery embeds a single query string by delegating to Embed.

type HTTPOption

type HTTPOption func(*HTTPEmbedder)

HTTPOption is a functional option for NewHTTPEmbedder.

Currently used by the factory wiring in [newFromInternal] to forward cfg.timeout (set via WithTimeout on the v2 NewClient). Direct v1 callers can also use it for per-instance customisation without changing the existing 4-arg constructor signature.

func WithHTTPTimeout

func WithHTTPTimeout(d time.Duration) HTTPOption

WithHTTPTimeout overrides the default HTTP client timeout (30s). Pass d=0 to leave the default unchanged.

type Observer

type Observer interface {
	// OnBeforeEmbed fires before the backend call is made.
	// n is the number of texts being embedded.
	//
	// Chunking note (E5): when client-side chunking is active (input length
	// exceeds chunkSize), this fires ONCE PER DISPATCHED CHUNK, not once per
	// user-facing EmbedWithResult call. A 100-text call with chunkSize=32
	// fires 4 OnBeforeEmbed callbacks, each with n equal to that chunk's size
	// (32, 32, 32, 4). Observers tracking call count vs token volume should
	// reflect this. Use `embed_chunks_per_call` histogram to count user-facing
	// calls.
	OnBeforeEmbed(ctx context.Context, model string, n int)
	// OnAfterEmbed fires after the backend call completes (success or error).
	// n is the number of texts in the result.
	//
	// Chunking note (E5): same per-chunk semantics as OnBeforeEmbed.
	OnAfterEmbed(ctx context.Context, status Status, dur time.Duration, n int)
	// OnRetry fires each time a request is retried (E1+).
	OnRetry(ctx context.Context, attempt int, err error)
	// OnCircuitTransition fires when the circuit breaker changes state (E1+).
	OnCircuitTransition(ctx context.Context, from, to CircuitState)
	// OnCacheHit fires when a cache hit short-circuits a backend call (E3+).
	// n is the number of texts whose embeddings were served from cache.
	OnCacheHit(ctx context.Context, n int)
	// OnTruncate fires when a text is truncated before being sent (E4+).
	// textIdx is the index of the truncated text in the input slice.
	OnTruncate(ctx context.Context, textIdx int, beforeTok, afterTok int)
}

Observer receives lifecycle callbacks from the embed client. All methods must be non-blocking. Panics are recovered by safeCall. Implement only the callbacks you care about; embed noopObserver for the rest.
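An illustrative Observer that logs a few lifecycle events and no-ops the rest; all methods stay non-blocking (slog calls only). The logObserver name is ours:

```go
package main

import (
	"context"
	"log/slog"
	"time"

	"github.com/anatolykoptev/go-kit/embed"
)

// logObserver logs retries and circuit transitions; other hooks are no-ops.
type logObserver struct{}

func (logObserver) OnBeforeEmbed(ctx context.Context, model string, n int) {
	slog.Debug("embed start", "model", model, "texts", n)
}
func (logObserver) OnAfterEmbed(ctx context.Context, status embed.Status, dur time.Duration, n int) {
	slog.Debug("embed done", "status", status.String(), "dur", dur, "texts", n)
}
func (logObserver) OnRetry(ctx context.Context, attempt int, err error) {
	slog.Warn("embed retry", "attempt", attempt, "err", err)
}
func (logObserver) OnCircuitTransition(ctx context.Context, from, to embed.CircuitState) {
	slog.Warn("circuit transition", "from", from.String(), "to", to.String())
}
func (logObserver) OnCacheHit(ctx context.Context, n int)                             {}
func (logObserver) OnTruncate(ctx context.Context, textIdx, beforeTok, afterTok int) {}

// Compile-time interface check.
var _ embed.Observer = logObserver{}
```

Register it with embed.WithObserver(logObserver{}) on NewClient.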

type OllamaClient

type OllamaClient struct {
	// contains filtered or unexported fields
}

OllamaClient calls the Ollama /api/embed endpoint. Supports batch embedding (multiple texts in one request). No CGO, no ONNX Runtime — pure HTTP client. Compatible with Ollama ≥ 0.3.6 which introduced the batch /api/embed endpoint.

func NewOllamaClient

func NewOllamaClient(baseURL, model string, logger *slog.Logger, opts ...OllamaOption) *OllamaClient

NewOllamaClient creates a new Ollama embedding client. baseURL: Ollama server URL (e.g. "http://localhost:11434"), empty = default. model: embedding model name (e.g. "nomic-embed-text", "mxbai-embed-large"), empty = default. logger=nil falls back to slog.Default().

func (*OllamaClient) Close

func (c *OllamaClient) Close() error

Close is a no-op for the HTTP-based Ollama client.

func (*OllamaClient) Dimension

func (c *OllamaClient) Dimension() int

Dimension returns the embedding vector dimension. Returns the auto-detected dimension from the first response if available, otherwise the configured default (1024). Override with WithOllamaDimension.

func (*OllamaClient) Embed

func (c *OllamaClient) Embed(ctx context.Context, texts []string) ([][]float32, error)

Embed calls Ollama /api/embed to embed one or more texts (document/storage use case). Applies WithTextPrefix client-side before sending. Returns embeddings in the same order as input texts. Empty input returns nil, nil.

func (*OllamaClient) EmbedQuery

func (c *OllamaClient) EmbedQuery(ctx context.Context, text string) ([]float32, error)

EmbedQuery embeds a single query string (search/retrieval use case). Applies WithQueryPrefix if set, otherwise falls back to WithTextPrefix.

type OllamaOption

type OllamaOption func(*OllamaClient)

OllamaOption is a functional option for OllamaClient.

func WithNormalizeL2

func WithNormalizeL2(enabled bool) OllamaOption

WithNormalizeL2 enables client-side L2 normalization of embeddings. Ollama ≥ 0.3.6 already normalizes server-side, so this is a no-op in most cases. Enable only if using an older Ollama version or a model that does not normalize.

func WithOllamaDimension

func WithOllamaDimension(dim int) OllamaOption

WithOllamaDimension overrides the reported embedding dimension. The default is 1024 to match the existing pgvector/Qdrant schema (vector(1024)). Use this only if deploying a model with a different dimension.

func WithOllamaTimeout

func WithOllamaTimeout(d time.Duration) OllamaOption

WithOllamaTimeout overrides the HTTP client timeout (default 60s). Increase for large batches or slow hardware.

func WithQueryPrefix

func WithQueryPrefix(prefix string) OllamaOption

WithQueryPrefix sets a string prepended client-side to query text in EmbedQuery. Allows different prefixes for storage (Embed) vs retrieval (EmbedQuery).

Example: WithQueryPrefix("query: ") for e5-style retrieval. Default: "" (same as document prefix — no distinction).

func WithTextPrefix

func WithTextPrefix(prefix string) OllamaOption

WithTextPrefix sets a string prepended client-side to every document text before sending to Ollama (used by Embed). Separate from Ollama's server-side Modelfile template.

Example: WithTextPrefix("passage: ") for e5-style document storage. Default: "" (no prefix — raw text, compatible with existing ONNX vectors).
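A sketch combining both prefixes for e5-style asymmetric retrieval (base URL and model name illustrative):

```go
package main

import (
	"context"
	"log/slog"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	// Documents get "passage: ", queries get "query: " (e5 convention).
	c := embed.NewOllamaClient("http://localhost:11434", "multilingual-e5-large",
		slog.Default(),
		embed.WithTextPrefix("passage: "),
		embed.WithQueryPrefix("query: "),
	)
	defer c.Close()

	// Sent to Ollama as "passage: stored document".
	_, _ = c.Embed(context.Background(), []string{"stored document"})
	// Sent to Ollama as "query: user question".
	_, _ = c.EmbedQuery(context.Background(), "user question")
}
```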

type Opt

type Opt func(*cfgInternal)

Opt is a functional option for NewClient.

func WithBackend

func WithBackend(name string) Opt

WithBackend sets the backend type explicitly. Valid: "http" | "ollama" | "voyage". Mutually exclusive with WithEmbedder — if both are set, WithEmbedder wins.

func WithCache

func WithCache(c Cache) Opt

WithCache wires a Cache. When set, every (model, dim, docPrefix, queryPrefix, text, role) tuple is looked up before the backend Embed call. A full-batch hit short-circuits the backend entirely. Partial misses fall through to the backend for the full batch (no cherry-picking; this keeps the API symmetric across all backends). A nil Cache is ignored (caching stays disabled).

func WithChunkSize added in v0.49.0

func WithChunkSize(n int) Opt

WithChunkSize overrides the per-call chunking limit. When len(texts) exceeds this value, Embed/EmbedWithResult splits the input into sequential sub-batches of at most chunkSize. Default: 32 (matching ox-embed-server EMBED_MAX_INPUT_ARRAY). Override via GOKIT_EMBED_CHUNK_SIZE env when constructing without options. Values <= 0 are ignored (constructor falls back to env then default).

func WithCircuit

func WithCircuit(cfg CircuitConfig) Opt

WithCircuit enables the circuit breaker with the given configuration. By default the circuit breaker is OFF (nil). Wiring the observer for OnCircuitTransition happens in newClientFromInternal after all opts are applied. A sentinel *CircuitBreaker is stored here; the final one (with model+obs hook) is built in newClientFromInternal.

func WithDim

func WithDim(dim int) Opt

WithDim sets the expected embedding dimension. Zero = auto-detect from response. When non-zero, every backend response is validated against this value: a mismatch returns *ErrDimMismatch and increments embed_dim_mismatch_total{model}. The error is non-terminal — fallback chains continue to the next embedder.

func WithEmbedder

func WithEmbedder(e Embedder) Opt

WithEmbedder accepts a pre-built Embedder (e.g. *onnx.Embedder from the embed/onnx subpackage, or a custom impl). NewClient skips backend factory dispatch and wires this Embedder as the inner backend of the returned *Client. Required for ONNX usage via NewClient (avoids forcing cgo on pure-HTTP callers).

ONNX usage:

import "github.com/anatolykoptev/go-kit/embed/onnx"

onnxEmb, _ := onnx.New(onnx.Config{...}, logger)
c, _ := embed.NewClient("", embed.WithEmbedder(onnxEmb))

Note: when WithEmbedder is set, WithBackend is silently ignored. To make the override explicit, set only one of the two.

nil is ignored (backend dispatch proceeds normally).

func WithFallback

func WithFallback(secondary *Client) Opt

WithFallback sets a secondary *Client to try when the primary returns StatusDegraded with a non-4xx error. Fallback depth is capped at 1.
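A wiring sketch with an HTTP primary and an Ollama secondary (URLs and model names illustrative). A success on the secondary surfaces as StatusFallback:

```go
package main

import (
	"log"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	// Secondary: local Ollama instance.
	secondary, err := embed.NewClient("http://localhost:11434",
		embed.WithBackend("ollama"),
		embed.WithModel("nomic-embed-text"),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer secondary.Close()

	// Primary: HTTP sidecar; tried first, falls back on non-4xx degradation.
	primary, err := embed.NewClient("http://embed-server:8080",
		embed.WithBackend("http"),
		embed.WithFallback(secondary),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer primary.Close()
}
```

Fallback depth is capped at 1, so giving the secondary its own WithFallback has no effect on calls routed through the primary.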

func WithLogger

func WithLogger(l *slog.Logger) Opt

WithLogger sets the slog.Logger. nil-ignored (backends fall back to slog.Default()).

func WithModel

func WithModel(model string) Opt

WithModel sets the backend model name.

func WithObserver

func WithObserver(obs Observer) Opt

WithObserver registers a lifecycle Observer. nil-ignored (noopObserver stays active).

func WithOllamaDim

func WithOllamaDim(dim int) Opt

WithOllamaDim sets the Ollama-side dimension override.

func WithOllamaDocPrefix

func WithOllamaDocPrefix(prefix string) Opt

WithOllamaDocPrefix sets the document-mode prefix for Ollama (e.g. "passage: "). Mirrors existing WithTextPrefix on OllamaClient — exposed at package level.

func WithOllamaQueryPrefix

func WithOllamaQueryPrefix(prefix string) Opt

WithOllamaQueryPrefix sets the query-mode prefix for Ollama (e.g. "query: ").

func WithRetry

func WithRetry(p RetryPolicy) Opt

WithRetry configures the retry policy for transient errors (5xx HTTP status). Pass embed.NoRetry to disable retries entirely. Default: defaultRetryPolicy() (3 attempts, exp backoff 200ms→5s, jitter 10%).

func WithTimeout

func WithTimeout(d time.Duration) Opt

WithTimeout sets the per-request HTTP timeout.

func WithVoyageAPIKey

func WithVoyageAPIKey(key string) Opt

WithVoyageAPIKey sets the API key for the Voyage backend.

type Registry

type Registry struct {
	// contains filtered or unexported fields
}

Registry holds named embedders for multi-model /v1/embeddings support. Thread-safe: all methods are guarded by a read-write mutex.

func NewRegistry

func NewRegistry(fallback string) *Registry

NewRegistry creates a Registry with the given fallback model name. When Get is called with an empty name, the fallback is used.

func (*Registry) Close

func (r *Registry) Close() error

Close releases all registered embedders.

func (*Registry) Get

func (r *Registry) Get(name string) (Embedder, bool)

Get returns the embedder for the given name, or the fallback if name is empty.

func (*Registry) Register

func (r *Registry) Register(name string, e Embedder)

Register adds or replaces a named embedder in the registry.
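A multi-model wiring sketch using the model names from the overview (dimensions and the shared sidecar URL are illustrative):

```go
package main

import (
	"log"
	"log/slog"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	reg := embed.NewRegistry("multilingual-e5-large") // fallback model name

	e5 := embed.NewHTTPEmbedder("http://embed-server:8080",
		"multilingual-e5-large", 1024, slog.Default())
	code := embed.NewHTTPEmbedder("http://embed-server:8080",
		"jina-code-v2", 768, slog.Default())

	reg.Register("multilingual-e5-large", e5)
	reg.Register("jina-code-v2", code)
	defer reg.Close() // closes all registered embedders

	// Empty name resolves to the fallback model.
	if e, ok := reg.Get(""); ok {
		log.Println("default dim:", e.Dimension())
	}
	if e, ok := reg.Get("jina-code-v2"); ok {
		log.Println("code dim:", e.Dimension())
	}
}
```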

type Result

type Result struct {
	// Vectors holds one entry per input text. On StatusDegraded/StatusSkipped,
	// entries are zero-length placeholders with their own Status set.
	Vectors []*Vector
	// Status indicates whether the embed call succeeded, was skipped, or degraded.
	Status Status
	// Model reports which model produced the embeddings (may be empty).
	Model string
	// TokensUsed is the total token count across all texts (0 when unavailable).
	// Populated by E4 when backend exposes usage.
	TokensUsed int
	// Err is non-nil iff Status == StatusDegraded.
	Err error
}

Result is the typed return value of EmbedWithResult. Callers should inspect Status before using Vectors.

func EmbedWithResult deprecated

func EmbedWithResult(ctx context.Context, e Embedder, texts []string, opts ...EmbedOpt) (*Result, error)

EmbedWithResult is the package-level v2 API shim — kept for backward compatibility with callers using the old free-function signature.

If e is a *Client, its EmbedWithResult method is called directly (observer hooks fire). For any other Embedder, a temporary *Client wrapper is created with no observer wired — hooks are silent. New code should use NewClient(...).EmbedWithResult(...) directly.

Deprecated: use (*Client).EmbedWithResult for new code.

type RetryPolicy

type RetryPolicy struct {
	// MaxAttempts is the total number of attempts (1 = no retry, 0 treated as 1).
	MaxAttempts int
	// BaseBackoff is the initial sleep duration between attempts.
	BaseBackoff time.Duration
	// MaxBackoff caps exponential growth.
	MaxBackoff time.Duration
	// Multiplier is the factor applied to backoff each attempt (e.g. 2.0 = double).
	Multiplier float64
	// Jitter adds randomness: actual sleep = backoff * (1 + Jitter * rand[0,1)).
	// Range 0..1.
	Jitter float64
	// RetryableStatus lists HTTP status codes that trigger a retry.
	// Non-listed status codes (e.g. 4xx) return immediately without retry.
	RetryableStatus []int
}

RetryPolicy controls how many times and how quickly the embed backend is retried on retryable errors (5xx HTTP status by default).

Default policy: MaxAttempts=3, BaseBackoff=200ms, MaxBackoff=5s, Multiplier=2.0, Jitter=0.1, RetryableStatus={429, 502, 503, 504}. v1 callers using New(cfg, logger) inherit this default via the internal HTTPEmbedder/withRetry path — the public RetryPolicy is active for v2 callers. Opt-out: WithRetry(embed.NoRetry).
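A sketch of a tighter custom policy (values illustrative): two attempts, fast backoff, retrying only on 429/503:

```go
package main

import (
	"log"
	"time"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	p := embed.RetryPolicy{
		MaxAttempts:     2,
		BaseBackoff:     100 * time.Millisecond,
		MaxBackoff:      time.Second,
		Multiplier:      2.0,
		Jitter:          0.1,
		RetryableStatus: []int{429, 503},
	}

	c, err := embed.NewClient("http://embed-server:8080",
		embed.WithBackend("http"),
		embed.WithRetry(p), // or embed.WithRetry(embed.NoRetry) to disable
	)
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
}
```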

type Status

type Status uint8

Status describes the outcome of an Embed call.

const (
	// StatusOk means the request succeeded and vectors are valid.
	StatusOk Status = iota
	// StatusDegraded means the request failed; vectors are zero-length placeholders.
	StatusDegraded
	// StatusFallback means the primary backend failed and a secondary succeeded.
	// Populated by E1 fallback path; E0 never produces this status.
	StatusFallback
	// StatusSkipped means the embedder was nil, texts was empty, or DryRun was set.
	StatusSkipped
)

func (Status) String

func (s Status) String() string

String returns the human-readable status label.

type Vector

type Vector struct {
	Embedding  []float32
	Dim        int    // == len(Embedding) at construction time
	TokenCount int    // 0 when backend doesn't expose
	Status     Status // per-text — usually StatusOk; for partial-batch failures
}

Vector is the per-text result from EmbedWithResult. TokenCount is 0 when the backend does not expose usage; populated by E4.

type VoyageClient

type VoyageClient struct {
	// contains filtered or unexported fields
}

VoyageClient calls the VoyageAI embedding API.

func NewVoyageClient

func NewVoyageClient(apiKey, model string, logger *slog.Logger) *VoyageClient

NewVoyageClient creates a new VoyageAI embedding client. logger=nil falls back to slog.Default().

func (*VoyageClient) Close

func (v *VoyageClient) Close() error

Close is a no-op for the HTTP-based VoyageAI client.

func (*VoyageClient) Dimension

func (v *VoyageClient) Dimension() int

Dimension returns the embedding vector dimension (1024 for voyage-4-lite).

func (*VoyageClient) Embed

func (v *VoyageClient) Embed(ctx context.Context, texts []string) ([][]float32, error)

Embed calls VoyageAI to embed one or more texts. Returns embeddings in the same order as input texts. Retries on 429/503 with exponential backoff.

func (*VoyageClient) EmbedQuery

func (v *VoyageClient) EmbedQuery(ctx context.Context, text string) ([]float32, error)

EmbedQuery embeds a single query string (search/retrieval use case). Delegates to Embed — VoyageAI already handles query vs document via input_type.

Directories

Path	Synopsis
onnx	Package onnx provides a local ONNX Runtime embedder backend.
