package embed

v0.53.0
Published: May 9, 2026 License: Apache-2.0 Imports: 20 Imported by: 0

Documentation

Overview

Package embed provides text embedding backends with a unified interface.

Backends:

  • HTTPEmbedder — OpenAI-compatible /v1/embeddings clients (e.g. the self-hosted embed-server sidecar serving multilingual-e5-large + jina-code-v2 on the internal Docker network).
  • OllamaClient — Ollama /api/embed (batch, ≥ 0.3.6).
  • VoyageClient — Voyage AI hosted /v1/embeddings.
  • ONNX (build tag "cgo") — local ONNX Runtime inference; lives in subpackage github.com/anatolykoptev/go-kit/embed/onnx so callers who don't need cgo never link libonnxruntime / libtokenizers.

All backends share the Embedder interface (Embed / EmbedQuery / Dimension / Close), shared retry/backoff (transient errors, 429, 5xx), and shared Prometheus metrics under the embed_* namespace.

Use New for env-driven backend selection (Type ∈ {http, ollama, voyage, onnx}). Use NewHTTPEmbedder / NewOllamaClient / NewVoyageClient directly for explicit construction.

Multi-model wiring uses Registry, which maps model names (e.g. "multilingual-e5-large", "jina-code-v2") to embedders and falls back to a designated default when the lookup name is empty.
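A minimal quick-start sketch for the env-driven path. The environment variable names below are our own, not part of the package; callers choose their own env contract when populating Config.

```go
package main

import (
	"context"
	"fmt"
	"log/slog"
	"os"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	// Populate Config from the environment (variable names illustrative).
	cfg := embed.Config{
		Type:      os.Getenv("EMBED_TYPE"), // "http" | "ollama" | "voyage"
		Model:     os.Getenv("EMBED_MODEL"),
		OllamaURL: os.Getenv("OLLAMA_URL"),
	}

	e, err := embed.New(cfg, slog.Default())
	if err != nil {
		slog.Error("embed init failed", "err", err)
		os.Exit(1)
	}
	defer e.Close()

	vecs, err := e.Embed(context.Background(), []string{"hello", "world"})
	if err != nil {
		slog.Error("embed failed", "err", err)
		return
	}
	fmt.Println(len(vecs), e.Dimension())
}
```

Explicit construction via NewHTTPEmbedder / NewOllamaClient / NewVoyageClient skips the Type dispatch entirely.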

Index

Constants

This section is empty.

Variables

View Source
var ErrCircuitOpen = errors.New("embed: circuit breaker open")

ErrCircuitOpen is returned by callBackendResilient when the circuit breaker is in the Open state and has blocked the call.

View Source
var ErrONNXNotInFactory = errors.New(
	"embed.New: type=\"onnx\" not supported by this factory; " +
		"import github.com/anatolykoptev/go-kit/embed/onnx and call onnx.New",
)

ErrONNXNotInFactory is returned by New when Config.Type == "onnx".

ONNX requires cgo + libonnxruntime + libtokenizers, which is too heavy a dependency for the default factory. Callers that need ONNX should import the subpackage github.com/anatolykoptev/go-kit/embed/onnx and call onnx.New(cfg, logger) directly. memdb-go does this in its server-init wiring; pure-HTTP/Ollama/Voyage callers never link the cgo deps.

View Source
var NoRetry = RetryPolicy{MaxAttempts: 1}

NoRetry is an explicit opt-out from the default retry policy. Pass via WithRetry(embed.NoRetry) to disable all retries.

Functions

func EmbedQueryViaEmbed

func EmbedQueryViaEmbed(ctx context.Context, e Embedder, text string) ([]float32, error)

EmbedQueryViaEmbed is a helper that implements EmbedQuery by delegating to Embed. Use it in embedder implementations that don't need query-specific behaviour.
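A sketch of a custom Embedder whose EmbedQuery delegates through the helper. The constEmbedder type is hypothetical and exists only to show the delegation pattern:

```go
package main

import (
	"context"

	"github.com/anatolykoptev/go-kit/embed"
)

// constEmbedder is an illustrative Embedder returning fixed-size zero
// vectors; the EmbedQuery delegation is the only point of interest.
type constEmbedder struct{ dim int }

func (c *constEmbedder) Embed(ctx context.Context, texts []string) ([][]float32, error) {
	out := make([][]float32, len(texts))
	for i := range out {
		out[i] = make([]float32, c.dim)
	}
	return out, nil
}

// No query-specific behaviour, so delegate to Embed via the package helper.
func (c *constEmbedder) EmbedQuery(ctx context.Context, text string) ([]float32, error) {
	return embed.EmbedQueryViaEmbed(ctx, c, text)
}

func (c *constEmbedder) Dimension() int { return c.dim }
func (c *constEmbedder) Close() error   { return nil }

// Compile-time interface check.
var _ embed.Embedder = (*constEmbedder)(nil)
```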

func L2Normalize

func L2Normalize(vec []float32)

L2Normalize is the exported alias of l2Normalize. The ONNX subpackage imports this from the parent package so mean pooling stays in one place.

Types

type Cache

type Cache interface {
	// Get returns the cached embedding for the given key. ok=false if not cached.
	// Implementations must NOT panic on ctx cancellation; return ok=false instead.
	Get(ctx context.Context, key string) (vector []float32, ok bool)
	// Set stores the embedding for the given key. Idempotent. Implementations
	// may TTL or evict per their policy.
	Set(ctx context.Context, key string, vector []float32)
}

Cache abstracts a (text → vector) lookup table. go-kit/embed ships NO concrete implementation — callers wire LRU/Redis/sync.Map per their runtime. Implementations MUST be safe for concurrent reads and writes.

TTL semantics, eviction policy, and persistence are caller concerns. Cache key invalidation on model/dim/prefix change is automatic (key includes all parameters that affect output vector).

Trade-offs:

  • On partial-miss, ALL N vectors are re-Set after the backend call (not only the missing ones). For Redis-backed caches with non-trivial Set cost, implementations may dedupe internally. In-process LRU/sync.Map caches: noop (Set is O(1) and idempotent).

Future-proofing — these vector-affecting fields are NOT YET in cacheKey because they are static or single-valued today; once they become per-call settable, cacheKey will be extended:

  • Voyage input_type ("document" vs "query") — hardcoded "query" today
  • Ollama normalize_l2 toggle — applied unconditionally today

Callers persisting a cache across Client lifecycles SHOULD include their own config-hash prefix on keys to avoid cross-config pollution.

type CircuitBreaker

type CircuitBreaker struct {
	// contains filtered or unexported fields
}

CircuitBreaker is a thread-safe Closed/Open/HalfOpen state machine. Reads use RLock; writes (transitions) use Lock.

func NewCircuitBreaker

func NewCircuitBreaker(cfg CircuitConfig, model string, onTransition func(CircuitState, CircuitState)) *CircuitBreaker

NewCircuitBreaker constructs a CircuitBreaker with the given config and an optional transition callback. The callback is invoked (via safeCall) on every state change; pass nil to skip.

func (*CircuitBreaker) Allow

func (cb *CircuitBreaker) Allow() bool

Allow reports whether the current request may proceed.

  • CircuitClosed: always true.
  • CircuitOpen: false unless OpenDuration elapsed — then transitions to HalfOpen and returns true for up to HalfOpenProbes concurrent requests.
  • CircuitHalfOpen: true only for HalfOpenProbes slots; false afterwards.
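A manual wiring sketch of the Allow / MarkSuccess / MarkFailure cycle (model name and thresholds illustrative; NewClient with WithCircuit does this for you):

```go
package main

import (
	"context"
	"time"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	cb := embed.NewCircuitBreaker(embed.CircuitConfig{
		FailThreshold:  5,
		OpenDuration:   30 * time.Second,
		HalfOpenProbes: 1,
	}, "multilingual-e5-large", func(from, to embed.CircuitState) {
		// Optional transition hook; invoked via safeCall on every change.
	})

	// callBackend stands in for a real embedding request.
	callBackend := func(ctx context.Context) error { return nil }

	if !cb.Allow() {
		return // short-circuited while Open; map this to ErrCircuitOpen upstream
	}
	if err := callBackend(context.Background()); err != nil {
		cb.MarkFailure() // trips to Open after FailThreshold consecutive failures
	} else {
		cb.MarkSuccess() // a HalfOpen probe success closes the breaker
	}
}
```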

func (*CircuitBreaker) MarkFailure

func (cb *CircuitBreaker) MarkFailure()

MarkFailure notifies the breaker that the call failed. Closed: increments consecutive failure counter; trips to Open at FailThreshold. HalfOpen: immediately returns to Open (probe failed).

func (*CircuitBreaker) MarkSuccess

func (cb *CircuitBreaker) MarkSuccess()

MarkSuccess notifies the breaker that the call succeeded. HalfOpen → Closed; resets consecutive failure counter.

func (*CircuitBreaker) State

func (cb *CircuitBreaker) State() CircuitState

State returns the current CircuitState. Safe for concurrent reads.

type CircuitConfig

type CircuitConfig struct {
	// FailThreshold is the number of consecutive failures that trip the
	// circuit from Closed to Open. Default: 5.
	FailThreshold int
	// OpenDuration is how long the circuit stays Open before transitioning
	// to HalfOpen for probe requests. Default: 30s.
	OpenDuration time.Duration
	// HalfOpenProbes is the number of requests allowed through when in
	// HalfOpen state. Default: 1.
	HalfOpenProbes int
	// FailRateWindow is reserved for future fail-rate counting (currently
	// consecutive-failure counting is used). Default: 10s.
	FailRateWindow time.Duration
}

CircuitConfig configures a CircuitBreaker instance.

type CircuitState

type CircuitState uint8

CircuitState represents the state of a circuit breaker, defined as the foundation for E1, which implements the full FSM (see CircuitBreaker).

const (
	// CircuitClosed is the normal operating state — calls pass through.
	CircuitClosed CircuitState = iota
	// CircuitOpen means the breaker has tripped — calls are short-circuited.
	CircuitOpen
	// CircuitHalfOpen means the breaker is probing for recovery.
	CircuitHalfOpen
)

func (CircuitState) String

func (s CircuitState) String() string

String returns the human-readable label for the circuit state. Used as a Prometheus label value.

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client wraps an Embedder backend with v2 features: Observer hooks, retry, circuit breaker, and multi-model fallback (E1). Built via NewClient(url, opts...).

Client itself implements Embedder, so it is drop-in replaceable for v1 backends. v1 callers that hold the result as Embedder continue to work unchanged; v2 callers cast to *Client to call EmbedWithResult directly.

func NewClient

func NewClient(url string, opts ...Opt) (*Client, error)

NewClient is the v2 entry point — returns a *Client configured via functional options. v1 callers continue to use New(cfg, logger) which calls the per-backend helpers directly.

url is the backend URL when applicable. For Ollama/HTTP backends, pass the base URL. For Voyage, url is ignored (endpoint is hardcoded by the API). For ONNX, use the embed/onnx subpackage directly.

At least one backend-specific Opt must be applied; otherwise NewClient returns an error from the underlying constructor.

The returned *Client implements Embedder, so it is assignable to an Embedder variable for v1-style callers. Cast to *Client to access EmbedWithResult.

func (*Client) Close

func (c *Client) Close() error

Close satisfies Embedder; closes the inner backend.

func (*Client) Dimension

func (c *Client) Dimension() int

Dimension satisfies Embedder.

func (*Client) Embed

func (c *Client) Embed(ctx context.Context, texts []string) ([][]float32, error)

Embed satisfies the Embedder interface. Routes through EmbedWithResult so that ALL configured layers — cache (E3), circuit breaker (E1), fallback, and observer hooks — fire on this path identically to EmbedWithResult.

2026-05-01 fix: prior implementation called callBackendResilient directly, which silently bypassed the cache layer (WithCache was effectively a no-op for callers using the simpler Embed API). Verified empirically — memdb-go wired WithCache via NewHTTPEmbedderWithOpts but its embed_cache_total counter stayed at 0 across a full LoCoMo ingest because every Embed() call took the no-cache path. Routing through EmbedWithResult fixes this without changing the public Embed signature.

func (*Client) EmbedQuery

func (c *Client) EmbedQuery(ctx context.Context, text string) ([]float32, error)

EmbedQuery satisfies Embedder; routes through Embed so single-text query embeddings benefit from the same cache + resilience layers as batch calls. When WithDim was set, the returned vector length is validated against c.expectedDim — a mismatch surfaces as *ErrDimMismatch and bumps embed_dim_mismatch_total{model}.

2026-05-01 fix: was c.inner.EmbedQuery directly, also bypassing cache. Now identical resilience semantics whether you call Embed or EmbedQuery.

func (*Client) EmbedWithResult

func (c *Client) EmbedWithResult(ctx context.Context, texts []string, opts ...EmbedOpt) (*Result, error)

EmbedWithResult is the v2 Embed API — returns a typed Result with Status and fires Observer hooks around the backend call.

Lifecycle:

OnBeforeEmbed → (fallback check) → callBackendResilient → OnAfterEmbed

When chunking is active (len(texts) > chunkSize), `OnBeforeEmbed` / `OnAfterEmbed` fire ONCE PER DISPATCHED CHUNK, not once per user-facing call — observers tracking call count vs token count should reflect this. `embed_chunks_per_call` is recorded once per `EmbedWithResult` call (value=1 for non-chunked, value=N for chunked).

Status semantics:

  • StatusOk — request succeeded, vectors are valid
  • StatusDegraded — request failed, Err is set
  • StatusFallback — primary degraded, secondary succeeded (E1)
  • StatusSkipped — nil inner, empty texts, or DryRun enabled

E1 wires retry/circuit/fallback on top of this call. E2 wires auto-batching, E3 wires cache, E4 wires per-text Status reasoning. E5 wires client-side chunking when len(texts) > c.chunkSize.
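A hedged usage sketch, checking Status per the semantics above (the sidecar URL and model name are illustrative):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	c, err := embed.NewClient("http://embed-server:8080",
		embed.WithBackend("http"),
		embed.WithModel("multilingual-e5-large"),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	res, err := c.EmbedWithResult(context.Background(), []string{"a", "b"})
	if err != nil {
		log.Fatal(err)
	}
	switch res.Status {
	case embed.StatusOk, embed.StatusFallback:
		fmt.Println("vectors:", len(res.Vectors), "model:", res.Model)
	case embed.StatusDegraded:
		log.Println("degraded:", res.Err)
	case embed.StatusSkipped:
		log.Println("skipped (empty input or dry-run)")
	}
}
```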

func (*Client) Model

func (c *Client) Model() string

Model returns the resolved model name. Satisfies the optional modelGetter interface used by modelFromEmbedder's fallback chain.

type Config

type Config struct {
	Type         string // "http" | "ollama" | "voyage" | "onnx"
	ONNXModelDir string
	VoyageAPIKey string
	Model        string // voyage, ollama, or http model name
	OllamaURL    string
	OllamaDim    int    // 0 = auto-detect from first response
	OllamaPrefix string // client-side document prefix (e.g. "passage: ")
	OllamaQuery  string // client-side query prefix (e.g. "query: ")
	HTTPBaseURL  string // for type="http" — URL of embed-server sidecar
	HTTPDim      int    // dimension override (default 1024)
}

Config holds all embedder configuration in one typed struct. Populated from environment variables by callers.

Type selects the backend:

  • "http" — OpenAI-compatible /v1/embeddings endpoint (HTTPBaseURL).
  • "ollama" — Ollama /api/embed (OllamaURL).
  • "voyage" — Voyage AI hosted /v1/embeddings (VoyageAPIKey).
  • "onnx" — local ONNX Runtime; requires the embed/onnx subpackage factory because it depends on cgo.

Fields not relevant to the chosen Type are ignored.

type EmbedOpt

type EmbedOpt func(*embedCallCfg)

EmbedOpt is a per-call option for EmbedWithResult.

func WithDryRun

func WithDryRun() EmbedOpt

WithDryRun skips the backend call entirely and returns Status=Skipped vectors of zero length. For testing pipeline wiring without a live server.

type Embedder

type Embedder interface {
	// Embed returns embeddings for the given texts (document/storage use case).
	Embed(ctx context.Context, texts []string) ([][]float32, error)
	// EmbedQuery embeds a single query string (search/retrieval use case).
	// Implementations may apply query-specific prefixes or instructions.
	// Default: delegates to Embed.
	EmbedQuery(ctx context.Context, text string) ([]float32, error)
	// Dimension returns the embedding vector dimension.
	Dimension() int
	// Close releases resources (model, tokenizer, HTTP clients).
	Close() error
}

Embedder generates text embeddings.

func New

func New(cfg Config, logger *slog.Logger) (Embedder, error)

New constructs the appropriate Embedder from cfg.

Supported Config.Type values: "http", "ollama", and "voyage". Type "onnx" is rejected with ErrONNXNotInFactory; use the embed/onnx subpackage instead.

Returns an error if the type is unknown or required config is missing. logger=nil falls back to slog.Default() inside each backend constructor.

type ErrDimMismatch

type ErrDimMismatch struct {
	// Got is the length of the vector returned by the backend.
	Got int
	// Want is the dimension declared via WithDim.
	Want int
	// Model is the resolved model name (may be empty for opaque backends).
	Model string
	// Index is the position of the first offending vector in the ORIGINAL
	// (pre-chunking) input slice. Zero when chunking is not in use.
	// When client-side chunking is active, Index equals the chunk's start
	// offset so callers can locate the offending record without iterating
	// all vectors.
	Index int
}

ErrDimMismatch is returned by Client.Embed / Client.EmbedQuery / Client.EmbedWithResult when the backend returns a vector whose length does not match the dimension declared via WithDim.

This guards against silent corruption of downstream pgvector / Qdrant schemas when the backend model is swapped (e.g. via env var change) without a coordinated WithDim update on the consumer side. Without this check, a 1024-dim response would be written to a vector(768) column and only fail at INSERT time — far from the configuration error.

Behaviour:

  • Returned only when WithDim was set to a non-zero value (cfg.dim == 0 disables validation, preserving auto-detection).
  • Embed handlers MUST continue serving — do NOT panic; treat as a normal error and propagate.
  • Each mismatch increments embed_dim_mismatch_total{model} so dashboards can alert on production drift.
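A sketch of the recommended handling path: declare the schema dimension via WithDim, then unwrap with errors.As and propagate without panicking (URL illustrative):

```go
package main

import (
	"context"
	"errors"
	"log"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	c, err := embed.NewClient("http://embed-server:8080",
		embed.WithBackend("http"),
		embed.WithDim(1024), // declare the schema's vector(1024) expectation
	)
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	_, err = c.Embed(context.Background(), []string{"text"})
	var dm *embed.ErrDimMismatch
	if errors.As(err, &dm) {
		// Do NOT panic; treat as a normal error and keep serving.
		log.Printf("dim mismatch: got %d want %d (model=%q, index=%d)",
			dm.Got, dm.Want, dm.Model, dm.Index)
	}
}
```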

func (*ErrDimMismatch) Error

func (e *ErrDimMismatch) Error() string

Error implements the error interface.

type HTTPEmbedder

type HTTPEmbedder struct {
	// contains filtered or unexported fields
}

HTTPEmbedder calls a remote OpenAI-compatible /v1/embeddings endpoint. Designed for the Rust embed-server sidecar on the internal Docker network, but compatible with any provider that speaks the OpenAI shape (Voyage, Mixedbread, Together, vLLM-served encoders, etc.).

func NewHTTPEmbedder

func NewHTTPEmbedder(baseURL, model string, dim int, logger *slog.Logger, opts ...HTTPOption) *HTTPEmbedder

NewHTTPEmbedder creates an HTTPEmbedder pointing at baseURL. baseURL should not include /v1/embeddings — it will be appended automatically. logger=nil falls back to slog.Default().

opts is variadic and backwards-compatible: existing 4-arg callers (e.g. MemDB's memdb-go embedder wrapper) continue to compile unchanged and receive the default 30s timeout.

func (*HTTPEmbedder) Close

func (h *HTTPEmbedder) Close() error

Close is a no-op for the HTTP-based embedder.

func (*HTTPEmbedder) Dimension

func (h *HTTPEmbedder) Dimension() int

Dimension returns the configured embedding dimension.

func (*HTTPEmbedder) Embed

func (h *HTTPEmbedder) Embed(ctx context.Context, texts []string) ([][]float32, error)

Embed sends texts to the remote embedding server and returns vectors.

Retries transient failures (timeout, 429, 5xx) with exponential backoff (200ms → 400ms → 800ms, cap 5s, 3 attempts total). Non-retriable errors (4xx validation, unmarshal) fail fast.

func (*HTTPEmbedder) EmbedQuery

func (h *HTTPEmbedder) EmbedQuery(ctx context.Context, text string) ([]float32, error)

EmbedQuery embeds a single query string by delegating to Embed.

type HTTPOption

type HTTPOption func(*HTTPEmbedder)

HTTPOption is a functional option for NewHTTPEmbedder.

Currently used by the factory wiring in [newFromInternal] to forward cfg.timeout (set via WithTimeout on the v2 NewClient). Direct v1 callers can also use it for per-instance customisation without changing the existing 4-arg constructor signature.

func WithHTTPTimeout

func WithHTTPTimeout(d time.Duration) HTTPOption

WithHTTPTimeout overrides the default HTTP client timeout (30s). Pass d=0 to leave the default unchanged.

type Observer

type Observer interface {
	// OnBeforeEmbed fires before the backend call is made.
	// n is the number of texts being embedded.
	//
	// Chunking note (E5): when client-side chunking is active (input length
	// exceeds chunkSize), this fires ONCE PER DISPATCHED CHUNK, not once per
	// user-facing EmbedWithResult call. A 100-text call with chunkSize=32
	// fires 4 OnBeforeEmbed callbacks, each with n equal to that chunk's size
	// (32, 32, 32, 4). Observers tracking call count vs token volume should
	// reflect this. Use `embed_chunks_per_call` histogram to count user-facing
	// calls.
	OnBeforeEmbed(ctx context.Context, model string, n int)
	// OnAfterEmbed fires after the backend call completes (success or error).
	// n is the number of texts in the result.
	//
	// Chunking note (E5): same per-chunk semantics as OnBeforeEmbed.
	OnAfterEmbed(ctx context.Context, status Status, dur time.Duration, n int)
	// OnRetry fires each time a request is retried (E1+).
	OnRetry(ctx context.Context, attempt int, err error)
	// OnCircuitTransition fires when the circuit breaker changes state (E1+).
	OnCircuitTransition(ctx context.Context, from, to CircuitState)
	// OnCacheHit fires when a cache hit short-circuits a backend call (E3+).
	// n is the number of texts whose embeddings were served from cache.
	OnCacheHit(ctx context.Context, n int)
	// OnTruncate fires when a text is truncated before being sent (E4+).
	// textIdx is the index of the truncated text in the input slice.
	OnTruncate(ctx context.Context, textIdx int, beforeTok, afterTok int)
}

Observer receives lifecycle callbacks from the embed client. All methods must be non-blocking. Panics are recovered by safeCall. Implement only the callbacks you care about; embed noopObserver for the rest.
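An illustrative Observer that logs a few lifecycle events and no-ops the rest; all methods stay non-blocking (slog calls only). The logObserver name is ours:

```go
package main

import (
	"context"
	"log/slog"
	"time"

	"github.com/anatolykoptev/go-kit/embed"
)

// logObserver logs retries and circuit transitions; other hooks are no-ops.
type logObserver struct{}

func (logObserver) OnBeforeEmbed(ctx context.Context, model string, n int) {
	slog.Debug("embed start", "model", model, "texts", n)
}
func (logObserver) OnAfterEmbed(ctx context.Context, status embed.Status, dur time.Duration, n int) {
	slog.Debug("embed done", "status", status.String(), "dur", dur, "texts", n)
}
func (logObserver) OnRetry(ctx context.Context, attempt int, err error) {
	slog.Warn("embed retry", "attempt", attempt, "err", err)
}
func (logObserver) OnCircuitTransition(ctx context.Context, from, to embed.CircuitState) {
	slog.Warn("circuit transition", "from", from.String(), "to", to.String())
}
func (logObserver) OnCacheHit(ctx context.Context, n int)                             {}
func (logObserver) OnTruncate(ctx context.Context, textIdx, beforeTok, afterTok int) {}

// Compile-time interface check.
var _ embed.Observer = logObserver{}
```

Register it with embed.WithObserver(logObserver{}) on NewClient.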

type OllamaClient

type OllamaClient struct {
	// contains filtered or unexported fields
}

OllamaClient calls the Ollama /api/embed endpoint. Supports batch embedding (multiple texts in one request). No CGO, no ONNX Runtime — pure HTTP client. Compatible with Ollama ≥ 0.3.6 which introduced the batch /api/embed endpoint.

func NewOllamaClient

func NewOllamaClient(baseURL, model string, logger *slog.Logger, opts ...OllamaOption) *OllamaClient

NewOllamaClient creates a new Ollama embedding client. baseURL: Ollama server URL (e.g. "http://localhost:11434"), empty = default. model: embedding model name (e.g. "nomic-embed-text", "mxbai-embed-large"), empty = default. logger=nil falls back to slog.Default().

func (*OllamaClient) Close

func (c *OllamaClient) Close() error

Close is a no-op for the HTTP-based Ollama client.

func (*OllamaClient) Dimension

func (c *OllamaClient) Dimension() int

Dimension returns the embedding vector dimension. Returns the auto-detected dimension from the first response if available, otherwise the configured default (1024). Override with WithOllamaDimension.

func (*OllamaClient) Embed

func (c *OllamaClient) Embed(ctx context.Context, texts []string) ([][]float32, error)

Embed calls Ollama /api/embed to embed one or more texts (document/storage use case). Applies WithTextPrefix client-side before sending. Returns embeddings in the same order as input texts. Empty input returns nil, nil.

func (*OllamaClient) EmbedQuery

func (c *OllamaClient) EmbedQuery(ctx context.Context, text string) ([]float32, error)

EmbedQuery embeds a single query string (search/retrieval use case). Applies WithQueryPrefix if set, otherwise falls back to WithTextPrefix.

type OllamaOption

type OllamaOption func(*OllamaClient)

OllamaOption is a functional option for OllamaClient.

func WithNormalizeL2

func WithNormalizeL2(enabled bool) OllamaOption

WithNormalizeL2 enables client-side L2 normalization of embeddings. Ollama ≥ 0.3.6 already normalizes server-side, so this is a no-op in most cases. Enable only if using an older Ollama version or a model that does not normalize.

func WithOllamaDimension

func WithOllamaDimension(dim int) OllamaOption

WithOllamaDimension overrides the reported embedding dimension. The default is 1024 to match the existing pgvector/Qdrant schema (vector(1024)). Use this only if deploying a model with a different dimension.

func WithOllamaTimeout

func WithOllamaTimeout(d time.Duration) OllamaOption

WithOllamaTimeout overrides the HTTP client timeout (default 60s). Increase for large batches or slow hardware.

func WithQueryPrefix

func WithQueryPrefix(prefix string) OllamaOption

WithQueryPrefix sets a string prepended client-side to query text in EmbedQuery. Allows different prefixes for storage (Embed) vs retrieval (EmbedQuery).

Example: WithQueryPrefix("query: ") for e5-style retrieval. Default: "" (same as document prefix — no distinction).

func WithTextPrefix

func WithTextPrefix(prefix string) OllamaOption

WithTextPrefix sets a string prepended client-side to every document text before sending to Ollama (used by Embed). Separate from Ollama's server-side Modelfile template.

Example: WithTextPrefix("passage: ") for e5-style document storage. Default: "" (no prefix — raw text, compatible with existing ONNX vectors).
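A sketch combining both prefixes for e5-style asymmetric retrieval (base URL and model name illustrative):

```go
package main

import (
	"context"
	"log/slog"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	// Documents get "passage: ", queries get "query: " (e5 convention).
	c := embed.NewOllamaClient("http://localhost:11434", "multilingual-e5-large",
		slog.Default(),
		embed.WithTextPrefix("passage: "),
		embed.WithQueryPrefix("query: "),
	)
	defer c.Close()

	// Sent to Ollama as "passage: stored document".
	_, _ = c.Embed(context.Background(), []string{"stored document"})
	// Sent to Ollama as "query: user question".
	_, _ = c.EmbedQuery(context.Background(), "user question")
}
```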

type Opt

type Opt func(*cfgInternal)

Opt is a functional option for NewClient.

func WithBackend

func WithBackend(name string) Opt

WithBackend sets the backend type explicitly. Valid: "http" | "ollama" | "voyage". Mutually exclusive with WithEmbedder — if both are set, WithEmbedder wins.

func WithCache

func WithCache(c Cache) Opt

WithCache wires a Cache. When set, every (model, dim, docPrefix, queryPrefix, text, role) tuple is looked up before the backend Embed call. A full-batch hit short-circuits the backend entirely. Partial misses fall through to the backend for the full batch (no cherry-picking; this keeps the API symmetric across all backends). A nil Cache is ignored (caching stays disabled).

func WithChunkSize added in v0.49.0

func WithChunkSize(n int) Opt

WithChunkSize overrides the per-call chunking limit. When len(texts) exceeds this value, Embed/EmbedWithResult splits the input into sequential sub-batches of at most chunkSize. Default: 32 (matching ox-embed-server EMBED_MAX_INPUT_ARRAY). Override via GOKIT_EMBED_CHUNK_SIZE env when constructing without options. Values <= 0 are ignored (constructor falls back to env then default).

func WithCircuit

func WithCircuit(cfg CircuitConfig) Opt

WithCircuit enables the circuit breaker with the given configuration. By default the circuit breaker is OFF (nil). Wiring the observer for OnCircuitTransition happens in newClientFromInternal after all opts are applied. A sentinel *CircuitBreaker is stored here; the final one (with model+obs hook) is built in newClientFromInternal.

func WithDim

func WithDim(dim int) Opt

WithDim sets the expected embedding dimension. Zero = auto-detect from response. When non-zero, every backend response is validated against this value: a mismatch returns *ErrDimMismatch and increments embed_dim_mismatch_total{model}. The error is non-terminal — fallback chains continue to the next embedder.

func WithEmbedder

func WithEmbedder(e Embedder) Opt

WithEmbedder accepts a pre-built Embedder (e.g. *onnx.Embedder from the embed/onnx subpackage, or a custom impl). NewClient skips backend factory dispatch and wires this Embedder as the inner backend of the returned *Client. Required for ONNX usage via NewClient (avoids forcing cgo on pure-HTTP callers).

ONNX usage:

import "github.com/anatolykoptev/go-kit/embed/onnx"

onnxEmb, _ := onnx.New(onnx.Config{...}, logger)
c, _ := embed.NewClient("", embed.WithEmbedder(onnxEmb))

Note: when WithEmbedder is set, WithBackend is silently ignored. To make the override explicit, set only one of the two.

nil is ignored (backend dispatch proceeds normally).

func WithFallback

func WithFallback(secondary *Client) Opt

WithFallback sets a secondary *Client to try when the primary returns StatusDegraded with a non-4xx error. Fallback depth is capped at 1.
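A wiring sketch with an HTTP primary and an Ollama secondary (URLs and model names illustrative). A success on the secondary surfaces as StatusFallback:

```go
package main

import (
	"log"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	// Secondary: local Ollama instance.
	secondary, err := embed.NewClient("http://localhost:11434",
		embed.WithBackend("ollama"),
		embed.WithModel("nomic-embed-text"),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer secondary.Close()

	// Primary: HTTP sidecar; tried first, falls back on non-4xx degradation.
	primary, err := embed.NewClient("http://embed-server:8080",
		embed.WithBackend("http"),
		embed.WithFallback(secondary),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer primary.Close()
}
```

Fallback depth is capped at 1, so giving the secondary its own WithFallback has no effect on calls routed through the primary.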

func WithLogger

func WithLogger(l *slog.Logger) Opt

WithLogger sets the slog.Logger. nil-ignored (backends fall back to slog.Default()).

func WithModel

func WithModel(model string) Opt

WithModel sets the backend model name.

func WithObserver

func WithObserver(obs Observer) Opt

WithObserver registers a lifecycle Observer. nil-ignored (noopObserver stays active).

func WithOllamaDim

func WithOllamaDim(dim int) Opt

WithOllamaDim sets the Ollama-side dimension override.

func WithOllamaDocPrefix

func WithOllamaDocPrefix(prefix string) Opt

WithOllamaDocPrefix sets the document-mode prefix for Ollama (e.g. "passage: "). Mirrors existing WithTextPrefix on OllamaClient — exposed at package level.

func WithOllamaQueryPrefix

func WithOllamaQueryPrefix(prefix string) Opt

WithOllamaQueryPrefix sets the query-mode prefix for Ollama (e.g. "query: ").

func WithRetry

func WithRetry(p RetryPolicy) Opt

WithRetry configures the retry policy for transient errors (5xx HTTP status). Pass embed.NoRetry to disable retries entirely. Default: defaultRetryPolicy() (3 attempts, exp backoff 200ms→5s, jitter 10%).

func WithTimeout

func WithTimeout(d time.Duration) Opt

WithTimeout sets the per-request HTTP timeout.

func WithVoyageAPIKey

func WithVoyageAPIKey(key string) Opt

WithVoyageAPIKey sets the API key for the Voyage backend.

type Registry

type Registry struct {
	// contains filtered or unexported fields
}

Registry holds named embedders for multi-model /v1/embeddings support. Thread-safe: all methods are guarded by a read-write mutex.

func NewRegistry

func NewRegistry(fallback string) *Registry

NewRegistry creates a Registry with the given fallback model name. When Get is called with an empty name, the fallback is used.

func (*Registry) Close

func (r *Registry) Close() error

Close releases all registered embedders.

func (*Registry) Get

func (r *Registry) Get(name string) (Embedder, bool)

Get returns the embedder for the given name, or the fallback if name is empty.

func (*Registry) Register

func (r *Registry) Register(name string, e Embedder)

Register adds or replaces a named embedder in the registry.
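A multi-model wiring sketch using the model names from the overview (dimensions and the shared sidecar URL are illustrative):

```go
package main

import (
	"log"
	"log/slog"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	reg := embed.NewRegistry("multilingual-e5-large") // fallback model name

	e5 := embed.NewHTTPEmbedder("http://embed-server:8080",
		"multilingual-e5-large", 1024, slog.Default())
	code := embed.NewHTTPEmbedder("http://embed-server:8080",
		"jina-code-v2", 768, slog.Default())

	reg.Register("multilingual-e5-large", e5)
	reg.Register("jina-code-v2", code)
	defer reg.Close() // closes all registered embedders

	// Empty name resolves to the fallback model.
	if e, ok := reg.Get(""); ok {
		log.Println("default dim:", e.Dimension())
	}
	if e, ok := reg.Get("jina-code-v2"); ok {
		log.Println("code dim:", e.Dimension())
	}
}
```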

type Result

type Result struct {
	// Vectors holds one entry per input text. On StatusDegraded/StatusSkipped,
	// entries are zero-length placeholders with their own Status set.
	Vectors []*Vector
	// Status indicates whether the embed call succeeded, was skipped, or degraded.
	Status Status
	// Model reports which model produced the embeddings (may be empty).
	Model string
	// TokensUsed is the total token count across all texts (0 when unavailable).
	// Populated by E4 when backend exposes usage.
	TokensUsed int
	// Err is non-nil iff Status == StatusDegraded.
	Err error
}

Result is the typed return value of EmbedWithResult. Callers should inspect Status before using Vectors.

func EmbedWithResult deprecated

func EmbedWithResult(ctx context.Context, e Embedder, texts []string, opts ...EmbedOpt) (*Result, error)

EmbedWithResult is the package-level v2 API shim — kept for backward compatibility with callers using the old free-function signature.

If e is a *Client, its EmbedWithResult method is called directly (observer hooks fire). For any other Embedder, a temporary *Client wrapper is created with no observer wired — hooks are silent. New code should use NewClient(...).EmbedWithResult(...) directly.

Deprecated: use (*Client).EmbedWithResult for new code.

type RetryPolicy

type RetryPolicy struct {
	// MaxAttempts is the total number of attempts (1 = no retry, 0 treated as 1).
	MaxAttempts int
	// BaseBackoff is the initial sleep duration between attempts.
	BaseBackoff time.Duration
	// MaxBackoff caps exponential growth.
	MaxBackoff time.Duration
	// Multiplier is the factor applied to backoff each attempt (e.g. 2.0 = double).
	Multiplier float64
	// Jitter adds randomness: actual sleep = backoff * (1 + Jitter * rand[0,1)).
	// Range 0..1.
	Jitter float64
	// RetryableStatus lists HTTP status codes that trigger a retry.
	// Non-listed status codes (e.g. 4xx) return immediately without retry.
	RetryableStatus []int
}

RetryPolicy controls how many times and how quickly the embed backend is retried on retryable errors (5xx HTTP status by default).

Default policy: MaxAttempts=3, BaseBackoff=200ms, MaxBackoff=5s, Multiplier=2.0, Jitter=0.1, RetryableStatus={429, 502, 503, 504}. v1 callers using New(cfg, logger) inherit this default via the internal HTTPEmbedder/withRetry path — the public RetryPolicy is active for v2 callers. Opt-out: WithRetry(embed.NoRetry).
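A sketch of a tighter custom policy (values illustrative): two attempts, fast backoff, retrying only on 429/503:

```go
package main

import (
	"log"
	"time"

	"github.com/anatolykoptev/go-kit/embed"
)

func main() {
	p := embed.RetryPolicy{
		MaxAttempts:     2,
		BaseBackoff:     100 * time.Millisecond,
		MaxBackoff:      time.Second,
		Multiplier:      2.0,
		Jitter:          0.1,
		RetryableStatus: []int{429, 503},
	}

	c, err := embed.NewClient("http://embed-server:8080",
		embed.WithBackend("http"),
		embed.WithRetry(p), // or embed.WithRetry(embed.NoRetry) to disable
	)
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
}
```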

type Status

type Status uint8

Status describes the outcome of an Embed call.

const (
	// StatusOk means the request succeeded and vectors are valid.
	StatusOk Status = iota
	// StatusDegraded means the request failed; vectors are zero-length placeholders.
	StatusDegraded
	// StatusFallback means the primary backend failed and a secondary succeeded.
	// Populated by E1 fallback path; E0 never produces this status.
	StatusFallback
	// StatusSkipped means the embedder was nil, texts was empty, or DryRun was set.
	StatusSkipped
)

func (Status) String

func (s Status) String() string

String returns the human-readable status label.

type Vector

type Vector struct {
	Embedding  []float32
	Dim        int    // == len(Embedding) at construction time
	TokenCount int    // 0 when backend doesn't expose
	Status     Status // per-text — usually StatusOk; for partial-batch failures
}

Vector is the per-text result from EmbedWithResult. TokenCount is 0 when the backend does not expose usage; populated by E4.

type VoyageClient

type VoyageClient struct {
	// contains filtered or unexported fields
}

VoyageClient calls the VoyageAI embedding API.

func NewVoyageClient

func NewVoyageClient(apiKey, model string, logger *slog.Logger) *VoyageClient

NewVoyageClient creates a new VoyageAI embedding client. logger=nil falls back to slog.Default().

func (*VoyageClient) Close

func (v *VoyageClient) Close() error

Close is a no-op for the HTTP-based VoyageAI client.

func (*VoyageClient) Dimension

func (v *VoyageClient) Dimension() int

Dimension returns the embedding vector dimension (1024 for voyage-4-lite).

func (*VoyageClient) Embed

func (v *VoyageClient) Embed(ctx context.Context, texts []string) ([][]float32, error)

Embed calls VoyageAI to embed one or more texts. Returns embeddings in the same order as input texts. Retries on 429/503 with exponential backoff.

func (*VoyageClient) EmbedQuery

func (v *VoyageClient) EmbedQuery(ctx context.Context, text string) ([]float32, error)

EmbedQuery embeds a single query string (search/retrieval use case). Delegates to Embed — VoyageAI already handles query vs document via input_type.

Directories

Path	Synopsis
onnx	Package onnx provides a local ONNX Runtime embedder backend.
