package sparse

Version: v0.53.0 | Published: May 9, 2026 | License: Apache-2.0 | Imports: 20 | Imported by: 0

Repository

github.com/anatolykoptev/go-kit

Documentation ¶

Overview ¶

Package sparse provides SPLADE sparse-embedding clients with a unified interface.

SPLADE (Sparse Lexical and Expansion model for first-stage Ranking) produces term-weight vectors over a model's BPE/WordPiece vocabulary instead of dense floats. Each output is a pair of parallel arrays (Indices, Values): indices are vocab token ids, values are post-ReLU log-saturated weights. Empty vectors are valid (e.g. a query that tokenises to stopwords).
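For intuition, relevance between two such vectors is commonly scored as a dot product over the token ids they share. The sketch below assumes that scoring rule and uses a local SparseVector stand-in with the documented shape; dot is a hypothetical helper, not part of this package.

```go
package main

import "fmt"

// SparseVector is a local stand-in mirroring the documented shape.
type SparseVector struct {
	Indices []uint32
	Values  []float32
}

// dot is a hypothetical helper: it sums the products of the weights for
// token ids that appear in both vectors.
func dot(q, d SparseVector) float32 {
	weights := make(map[uint32]float32, len(d.Indices))
	for i, idx := range d.Indices {
		if i < len(d.Values) {
			weights[idx] = d.Values[i]
		}
	}
	var score float32
	for i, idx := range q.Indices {
		if i < len(q.Values) {
			score += q.Values[i] * weights[idx] // missing ids contribute 0
		}
	}
	return score
}

func main() {
	q := SparseVector{Indices: []uint32{7, 42}, Values: []float32{1.5, 0.5}}
	d := SparseVector{Indices: []uint32{42, 99}, Values: []float32{2, 3}}
	fmt.Println(dot(q, d))              // only token 42 overlaps
	fmt.Println(dot(SparseVector{}, d)) // an empty query matches nothing
}
```

An empty vector simply scores zero against everything, which is why empty outputs can be treated as valid rather than as errors.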

Use sparse for:

Lexical retrieval over a Postgres pgvector sparsevec column or Qdrant sparse vector — symmetric to dense embedding retrieval but trades semantic generalisation for term-match precision.
Hybrid retrieval pipelines: combine sparse top-k with dense top-k via RRF / weighted fusion (see github.com/anatolykoptev/go-kit/rerank for fusion helpers).
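As an illustration of the fusion step, here is a minimal Reciprocal Rank Fusion sketch. It does not use the rerank package's helpers (their signatures are not shown here); rrf, the document ids, and the constant k=60 are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// rrf fuses ranked id lists with Reciprocal Rank Fusion:
// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
func rrf(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break
	})
	return ids
}

func main() {
	sparseTop := []string{"doc-a", "doc-b", "doc-c"} // hypothetical sparse top-k
	denseTop := []string{"doc-b", "doc-d", "doc-a"}  // hypothetical dense top-k
	fmt.Println(rrf(60, sparseTop, denseTop))
}
```

Documents that rank well in both lists (doc-b, doc-a) rise above those found by only one retriever, which is the property hybrid pipelines rely on.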

Use dense embeddings (github.com/anatolykoptev/go-kit/embed) when paraphrase recall matters more than exact-term match. Use rerank (github.com/anatolykoptev/go-kit/rerank) for the second-stage cross-encoder pass over fused candidates.

Backends:

HTTPSparseEmbedder — POSTs /embed_sparse on the self-hosted Rust embed-server sidecar (the only backend in v1). The endpoint is TEI-style (no /v1/ prefix) and Qdrant-shaped on the response side.

ONNX-local sparse inference is intentionally out of scope for v1 — parallel to the embed/ package, an embed/sparse/onnx subpackage would gate behind cgo + libonnxruntime + libtokenizers and is deferred until a caller needs it. All v1 traffic terminates against embed-server.

All backends share the SparseEmbedder interface (EmbedSparse / EmbedSparseQuery / VocabSize / Close), shared retry/backoff (transient errors, 429, 5xx), and shared Prometheus metrics under the gokit_sparse_* namespace.

Use New to select a backend from a Config, or NewFromEnv to build that Config from environment variables. Use NewHTTPSparseEmbedder directly for explicit construction. NewClient is the v2 entry point that stacks observer hooks, retry, optional cache, optional circuit breaker, and optional fallback on top of the underlying backend.

Index ¶

Variables
type CircuitBreaker
- func NewCircuitBreaker(cfg CircuitConfig, model string, onTransition func(CircuitState, CircuitState)) *CircuitBreaker
- func (cb *CircuitBreaker) Allow() bool
- func (cb *CircuitBreaker) MarkFailure()
- func (cb *CircuitBreaker) MarkSuccess()
- func (cb *CircuitBreaker) State() CircuitState
type CircuitConfig
type CircuitState
- func (s CircuitState) String() string
type Client
- func NewClient(url string, opts ...Opt) (*Client, error)
- func (c *Client) Close() error
- func (c *Client) EmbedSparse(ctx context.Context, texts []string) ([]SparseVector, error)
- func (c *Client) EmbedSparseQuery(ctx context.Context, text string) (SparseVector, error)
- func (c *Client) EmbedSparseWithResult(ctx context.Context, texts []string, opts ...EmbedOpt) (*Result, error)
- func (c *Client) Model() string
- func (c *Client) VocabSize() int
type Config
type EmbedOpt
- func WithDryRun() EmbedOpt
type HTTPSparseEmbedder
- func NewHTTPSparseEmbedder(baseURL, model string, logger *slog.Logger, opts ...HTTPSparseOption) *HTTPSparseEmbedder
- func (h *HTTPSparseEmbedder) Close() error
- func (h *HTTPSparseEmbedder) EmbedSparse(ctx context.Context, texts []string) ([]SparseVector, error)
- func (h *HTTPSparseEmbedder) EmbedSparseQuery(ctx context.Context, text string) (SparseVector, error)
- func (h *HTTPSparseEmbedder) Model() string
- func (h *HTTPSparseEmbedder) VocabSize() int
type HTTPSparseOption
- func WithHTTPObserver(obs Observer) HTTPSparseOption
- func WithHTTPRetry(cfg RetryConfig) HTTPSparseOption
- func WithHTTPTimeout(d time.Duration) HTTPSparseOption
- func WithMinWeight(w float32) HTTPSparseOption
- func WithTopK(k int) HTTPSparseOption
- func WithVocabSize(v int) HTTPSparseOption
type Observer
type Opt
- func WithBackend(name string) Opt
- func WithCache(c SparseCache) Opt
- func WithCircuit(cfg CircuitConfig) Opt
- func WithClientMinWeight(w float32) Opt
- func WithClientTopK(k int) Opt
- func WithClientVocabSize(v int) Opt
- func WithEmbedder(e SparseEmbedder) Opt
- func WithFallback(secondary *Client) Opt
- func WithLogger(l *slog.Logger) Opt
- func WithModel(model string) Opt
- func WithObserver(obs Observer) Opt
- func WithRetry(cfg RetryConfig) Opt
- func WithTimeout(d time.Duration) Opt
type Registry
- func NewRegistry(fallback string) *Registry
- func (r *Registry) Close() error
- func (r *Registry) Get(name string) (SparseEmbedder, bool)
- func (r *Registry) Register(name string, e SparseEmbedder)
type Result
type RetryConfig
type SparseCache
type SparseEmbedder
- func New(cfg Config, logger *slog.Logger) (SparseEmbedder, error)
- func NewFromEnv(logger *slog.Logger) (SparseEmbedder, error)
type SparseVector
- func EmbedSparseQueryViaEmbed(ctx context.Context, e SparseEmbedder, text string) (SparseVector, error)
- func (v SparseVector) IsEmpty() bool
- func (v SparseVector) Len() int
type Status
- func (s Status) String() string
type Vector

Constants ¶

This section is empty.

Variables ¶

var ErrCircuitOpen = errors.New("sparse: circuit breaker open")

ErrCircuitOpen is returned by callBackendResilient when the circuit breaker is in the Open state and has blocked the call.

var ErrModelNotConfigured = errors.New("sparse: splade model not configured on server")

ErrModelNotConfigured is returned when the embed-server rejects a sparse request because the requested model is not loaded, or because no model name was supplied and the server has either no SPLADE model or more than one configured.

Surfaces the Rust handler's resolve_splade_name 400 case as a typed sentinel so callers can branch on configuration drift without parsing the error string.
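Branching on the sentinel works through the standard errors.Is mechanism. The sketch below redeclares the sentinel locally so it runs on its own; wrapped and classify are hypothetical names, and the wrapping message is invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copy of the documented sentinel, so the sketch is self-contained.
var ErrModelNotConfigured = errors.New("sparse: splade model not configured on server")

// wrapped simulates a backend error that wraps the sentinel with %w.
// The surrounding message text is an assumption.
func wrapped() error {
	return fmt.Errorf("embed_sparse: status 400: %w", ErrModelNotConfigured)
}

// classify shows the branch a caller might take without parsing strings.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrModelNotConfigured):
		return "config-drift"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(classify(wrapped()))
	fmt.Println(classify(errors.New("connection refused")))
}
```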

var NoRetry = RetryConfig{MaxAttempts: 1}

NoRetry is an explicit opt-out from retries. MaxAttempts=1 means the initial call runs once and any failure is returned without sleeping.

Functions ¶

This section is empty.

Types ¶

type CircuitBreaker ¶

type CircuitBreaker struct {
	// contains filtered or unexported fields
}

CircuitBreaker is a thread-safe Closed/Open/HalfOpen state machine. Reads use RLock; writes (transitions) use Lock. Mirrors embed.CircuitBreaker — kept package-local to avoid cross-package dependency on embed/.

func NewCircuitBreaker ¶

func NewCircuitBreaker(cfg CircuitConfig, model string, onTransition func(CircuitState, CircuitState)) *CircuitBreaker

NewCircuitBreaker constructs a CircuitBreaker with the given config and an optional transition callback. The callback is invoked (via safeCall) on every state change; pass nil to skip.

func (*CircuitBreaker) Allow ¶

func (cb *CircuitBreaker) Allow() bool

Allow reports whether the current request may proceed.

func (*CircuitBreaker) MarkFailure ¶

func (cb *CircuitBreaker) MarkFailure()

MarkFailure notifies the breaker that the call failed.

func (*CircuitBreaker) MarkSuccess ¶

func (cb *CircuitBreaker) MarkSuccess()

MarkSuccess notifies the breaker that the call succeeded.

func (*CircuitBreaker) State ¶

func (cb *CircuitBreaker) State() CircuitState

State returns the current CircuitState. Safe for concurrent reads.

type CircuitConfig ¶

type CircuitConfig struct {
	// FailThreshold is the number of consecutive failures that trip the
	// circuit from Closed to Open. Default: 5.
	FailThreshold int
	// OpenDuration is how long the circuit stays Open before transitioning
	// to HalfOpen for probe requests. Default: 30s.
	OpenDuration time.Duration
	// HalfOpenProbes is the number of requests allowed through when in
	// HalfOpen state. Default: 1.
	HalfOpenProbes int
	// FailRateWindow is reserved for future fail-rate counting (currently
	// consecutive-failure counting is used). Default: 10s.
	FailRateWindow time.Duration
}

CircuitConfig configures a CircuitBreaker instance.
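To show how these fields interact, the following is a simplified, single-goroutine sketch of a consecutive-failure breaker. It is not the package's implementation: the breaker type, the injectable clock, and the rule that one successful probe closes the circuit are assumptions, and locking is omitted.

```go
package main

import (
	"fmt"
	"time"
)

type state uint8

const (
	closed state = iota
	open
	halfOpen
)

// breaker is an illustrative stand-in; it is not safe for concurrent use.
type breaker struct {
	failThreshold  int
	openDuration   time.Duration
	halfOpenProbes int

	st       state
	failures int
	probes   int
	openedAt time.Time
	now      func() time.Time // injectable clock (assumption, eases testing)
}

func (b *breaker) allow() bool {
	// After openDuration has elapsed, start probing.
	if b.st == open && b.now().Sub(b.openedAt) >= b.openDuration {
		b.st, b.probes = halfOpen, 0
	}
	switch b.st {
	case open:
		return false
	case halfOpen:
		if b.probes >= b.halfOpenProbes {
			return false
		}
		b.probes++
	}
	return true
}

func (b *breaker) markFailure() {
	b.failures++
	// A failed probe, or enough consecutive failures, (re)opens the circuit.
	if b.st == halfOpen || b.failures >= b.failThreshold {
		b.st, b.openedAt = open, b.now()
	}
}

func (b *breaker) markSuccess() { b.st, b.failures = closed, 0 }

// demo walks Closed -> Open -> HalfOpen -> Closed and records allow() results.
func demo() []bool {
	clock := time.Unix(0, 0)
	b := &breaker{failThreshold: 2, openDuration: 30 * time.Second, halfOpenProbes: 1,
		now: func() time.Time { return clock }}
	b.markFailure()
	b.markFailure() // second consecutive failure trips the breaker
	var seen []bool
	seen = append(seen, b.allow()) // open: blocked
	clock = clock.Add(31 * time.Second)
	seen = append(seen, b.allow()) // half-open: one probe allowed
	seen = append(seen, b.allow()) // probe budget exhausted
	b.markSuccess()
	seen = append(seen, b.allow()) // closed again
	return seen
}

func main() { fmt.Println(demo()) }
```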

type CircuitState ¶

type CircuitState uint8

CircuitState represents the state of a circuit breaker.

const (
	// CircuitClosed is the normal operating state — calls pass through.
	CircuitClosed CircuitState = iota
	// CircuitOpen means the breaker has tripped — calls are short-circuited.
	CircuitOpen
	// CircuitHalfOpen means the breaker is probing for recovery.
	CircuitHalfOpen
)

func (CircuitState) String ¶

func (s CircuitState) String() string

String returns the human-readable label for the circuit state.

type Client ¶

type Client struct {
	// contains filtered or unexported fields
}

Client wraps a SparseEmbedder backend with v2 features: Observer hooks, retry, optional cache, optional circuit breaker, and optional fallback. Built via NewClient(url, opts...).

Client itself implements SparseEmbedder, so it is a drop-in replacement for v1 backends. v1 callers that hold the result as SparseEmbedder continue to work unchanged; v2 callers that need EmbedSparseWithResult keep the *Client directly or type-assert the SparseEmbedder to *Client.

func NewClient ¶

func NewClient(url string, opts ...Opt) (*Client, error)

NewClient is the v2 entry point — returns a *Client configured via functional options.

url is the embed-server base URL (no /embed_sparse path).

At least one backend-specific Opt must be applied; otherwise NewClient returns an error from the underlying constructor.

func (*Client) Close ¶

func (c *Client) Close() error

Close satisfies SparseEmbedder; closes the inner backend.

func (*Client) EmbedSparse ¶

func (c *Client) EmbedSparse(ctx context.Context, texts []string) ([]SparseVector, error)

EmbedSparse satisfies the SparseEmbedder interface. Routes through EmbedSparseWithResult so cache, circuit, fallback, and observer hooks fire identically.

func (*Client) EmbedSparseQuery ¶

func (c *Client) EmbedSparseQuery(ctx context.Context, text string) (SparseVector, error)

EmbedSparseQuery satisfies SparseEmbedder; routes through EmbedSparse so single-text query embeddings benefit from the same cache + resilience layers as batch calls.

func (*Client) EmbedSparseWithResult ¶

func (c *Client) EmbedSparseWithResult(ctx context.Context, texts []string, opts ...EmbedOpt) (*Result, error)

EmbedSparseWithResult is the v2 EmbedSparse API — returns a typed Result with Status and fires Observer hooks around the backend call.

Lifecycle:

OnBeforeEmbed → (cache check) → (fallback check) → callBackendResilient → OnAfterEmbed

func (*Client) Model ¶

func (c *Client) Model() string

Model returns the resolved model name.

func (*Client) VocabSize ¶

func (c *Client) VocabSize() int

VocabSize satisfies SparseEmbedder. Returns the configured vocab size, preferring the value set via WithClientVocabSize if non-zero, otherwise the inner backend's report.

type Config ¶

type Config struct {
	Type        string  // "http" (only supported value in v1)
	Model       string  // SPLADE model name (default splade-v3-distilbert)
	HTTPBaseURL string  // embed-server URL for type="http"
	VocabSize   int     // 0 = default 30522 (BERT-base)
	TopK        int     // 0 = use server default (256)
	MinWeight   float32 // 0 = use server default (0.0)
}

Config holds all sparse-embedder configuration in one typed struct. Populated from environment variables by callers.

Type selects the backend:

"http" — embed-server /embed_sparse endpoint (HTTPBaseURL).

Fields not relevant to the chosen Type are ignored. Only "http" is supported in v1; ONNX-local sparse inference is parked behind a future sparse/onnx subpackage.

type EmbedOpt ¶

type EmbedOpt func(*embedCallCfg)

EmbedOpt is a per-call option for EmbedSparseWithResult.

func WithDryRun ¶

func WithDryRun() EmbedOpt

WithDryRun skips the backend call entirely and returns Status=Skipped vectors. For testing pipeline wiring without a live server.

type HTTPSparseEmbedder ¶

type HTTPSparseEmbedder struct {
	// contains filtered or unexported fields
}

HTTPSparseEmbedder calls the embed-server /embed_sparse endpoint.

Endpoint: POST /embed_sparse (TEI-convention path, no /v1/ prefix). The path is appended to baseURL — pass only the host (e.g. "http://embed-server:8082").

Concurrent-safe: no mutable state beyond the http.Client which is itself safe for concurrent use.

func NewHTTPSparseEmbedder ¶

func NewHTTPSparseEmbedder(baseURL, model string, logger *slog.Logger, opts ...HTTPSparseOption) *HTTPSparseEmbedder

NewHTTPSparseEmbedder creates an HTTPSparseEmbedder pointing at baseURL.

baseURL should not include /embed_sparse — it will be appended automatically. model="" defaults to splade-v3-distilbert. logger=nil falls back to slog.Default().

func (*HTTPSparseEmbedder) Close ¶

func (h *HTTPSparseEmbedder) Close() error

Close is a no-op for the HTTP-based embedder.

func (*HTTPSparseEmbedder) EmbedSparse ¶

func (h *HTTPSparseEmbedder) EmbedSparse(ctx context.Context, texts []string) ([]SparseVector, error)

EmbedSparse sends texts to the remote embed-server and returns one SparseVector per input, in input order.

Empty input returns (nil, nil) without a network call — mirrors embed/'s HTTPEmbedder. The Rust handler rejects empty input with 400, so this guard avoids a guaranteed failure round-trip.

Retries transient failures (timeout, 429, 5xx) with exponential backoff + jitter (200ms → 400ms with ±10% jitter, max 3 attempts). With MaxAttempts=3 only two sleeps fire — between attempts 1→2 and 2→3 — so the realised delay budget is roughly 200ms + 400ms before the final failure. Non-retriable errors (4xx validation, unmarshal) fail fast.

4xx responses with bodies matching the Rust handler's resolve_splade_name failure messages are wrapped with ErrModelNotConfigured so callers can errors.Is them.

func (*HTTPSparseEmbedder) EmbedSparseQuery ¶

func (h *HTTPSparseEmbedder) EmbedSparseQuery(ctx context.Context, text string) (SparseVector, error)

EmbedSparseQuery embeds a single query string by delegating to EmbedSparse.

func (*HTTPSparseEmbedder) Model ¶

func (h *HTTPSparseEmbedder) Model() string

Model returns the configured SPLADE model name. Satisfies the optional modelGetter interface used by modelFromEmbedder.

func (*HTTPSparseEmbedder) VocabSize ¶

func (h *HTTPSparseEmbedder) VocabSize() int

VocabSize returns the configured vocabulary size (default 30522).

type HTTPSparseOption ¶

type HTTPSparseOption func(*HTTPSparseEmbedder)

HTTPSparseOption is a functional option for NewHTTPSparseEmbedder.

func WithHTTPObserver ¶

func WithHTTPObserver(obs Observer) HTTPSparseOption

WithHTTPObserver registers a lifecycle Observer on the raw HTTPSparseEmbedder. The observer's OnRetry hook fires for each retried failure inside withRetry. A nil observer is ignored. v2 callers who use NewClient get the observer wired automatically from WithObserver via factory.go; this option is for callers that hold the HTTPSparseEmbedder directly.

func WithHTTPRetry ¶

func WithHTTPRetry(cfg RetryConfig) HTTPSparseOption

WithHTTPRetry overrides the default retry policy (3 attempts, 200ms→400ms with 10% jitter on transient failures: timeouts, 429, 5xx). Pass NoRetry to disable retries.

v2 callers who use NewClient get the policy wired automatically from WithRetry via factory.go — this option is for callers that hold the HTTPSparseEmbedder directly.

func WithHTTPTimeout ¶

func WithHTTPTimeout(d time.Duration) HTTPSparseOption

WithHTTPTimeout overrides the default HTTP client timeout (30s). Pass d=0 to leave the default unchanged.

func WithMinWeight ¶

func WithMinWeight(w float32) HTTPSparseOption

WithMinWeight overrides the per-instance weight cutoff. Entries with weight <= w are dropped server-side. Pass w<=0 to omit the field — the server default (0.0) applies.

func WithTopK ¶

func WithTopK(k int) HTTPSparseOption

WithTopK overrides the per-instance top_k cap on sparse entries per output. Pass k<=0 to omit the field — the server default (256) applies. Per-call overrides are not exposed in v1 to keep the API surface minimal; mirrors embed/'s pattern (per-instance options only).
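The combined effect of min_weight and top_k can be pictured client-side as below. This is only an illustration of the documented cutoffs (drop weight <= w, keep at most k entries); the server's actual order of operations and tie-breaking are not documented here, and prune is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// SparseVector is a local stand-in mirroring the documented shape.
type SparseVector struct {
	Indices []uint32
	Values  []float32
}

// prune drops entries with weight <= minWeight, then keeps the topK
// heaviest entries (topK <= 0 means no cap). The input is not mutated.
func prune(v SparseVector, topK int, minWeight float32) SparseVector {
	type entry struct {
		idx uint32
		w   float32
	}
	n := len(v.Indices)
	if len(v.Values) < n {
		n = len(v.Values)
	}
	kept := make([]entry, 0, n)
	for i := 0; i < n; i++ {
		if v.Values[i] > minWeight {
			kept = append(kept, entry{v.Indices[i], v.Values[i]})
		}
	}
	sort.Slice(kept, func(a, b int) bool {
		if kept[a].w != kept[b].w {
			return kept[a].w > kept[b].w // heaviest first
		}
		return kept[a].idx < kept[b].idx // assumed tie-break
	})
	if topK > 0 && len(kept) > topK {
		kept = kept[:topK]
	}
	out := SparseVector{Indices: make([]uint32, len(kept)), Values: make([]float32, len(kept))}
	for i, e := range kept {
		out.Indices[i], out.Values[i] = e.idx, e.w
	}
	return out
}

func main() {
	v := SparseVector{Indices: []uint32{10, 20, 30, 40}, Values: []float32{0.1, 0.9, 0, 0.5}}
	fmt.Println(prune(v, 2, 0).Indices)
}
```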

func WithVocabSize ¶

func WithVocabSize(v int) HTTPSparseOption

WithVocabSize overrides the default BERT-base vocab size (30522). Useful for SPLADE variants on a different tokenizer (e.g. RoBERTa-based future models). The value is used as configured and is not validated against the model.

type Observer ¶

type Observer interface {
	// OnBeforeEmbed fires before the backend call is made.
	// n is the number of texts being embedded.
	OnBeforeEmbed(ctx context.Context, model string, n int)
	// OnAfterEmbed fires after the backend call completes (success or
	// error). n is the number of texts in the result.
	OnAfterEmbed(ctx context.Context, status Status, dur time.Duration, n int)
	// OnRetry fires each time a request is retried.
	OnRetry(ctx context.Context, attempt int, err error)
	// OnCircuitTransition fires when the circuit breaker changes state.
	OnCircuitTransition(ctx context.Context, from, to CircuitState)
	// OnCacheHit fires when a cache hit short-circuits a backend call.
	// n is the number of texts whose vectors were served from cache.
	OnCacheHit(ctx context.Context, n int)
}

Observer receives lifecycle callbacks from the sparse client. All methods must be non-blocking. Panics are recovered by safeCall. Implement only the callbacks you care about; embed noopObserver for the rest.
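Because noopObserver is unexported, callers outside the package would presumably need their own no-op base to embed. The sketch below assumes that approach; the interface and types are redeclared locally so it compiles on its own, and nopObserver, retryCounter, and countRetries are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

// Local stand-ins mirroring the documented types.
type Status uint8
type CircuitState uint8

type Observer interface {
	OnBeforeEmbed(ctx context.Context, model string, n int)
	OnAfterEmbed(ctx context.Context, status Status, dur time.Duration, n int)
	OnRetry(ctx context.Context, attempt int, err error)
	OnCircuitTransition(ctx context.Context, from, to CircuitState)
	OnCacheHit(ctx context.Context, n int)
}

// nopObserver is a caller-side no-op base with every hook stubbed out.
type nopObserver struct{}

func (nopObserver) OnBeforeEmbed(context.Context, string, int)                 {}
func (nopObserver) OnAfterEmbed(context.Context, Status, time.Duration, int)   {}
func (nopObserver) OnRetry(context.Context, int, error)                        {}
func (nopObserver) OnCircuitTransition(context.Context, CircuitState, CircuitState) {}
func (nopObserver) OnCacheHit(context.Context, int)                            {}

// retryCounter overrides only OnRetry. The atomic counter keeps the hook
// non-blocking and safe to call from concurrent requests.
type retryCounter struct {
	nopObserver
	retries atomic.Int64
}

func (r *retryCounter) OnRetry(context.Context, int, error) { r.retries.Add(1) }

// countRetries drives the hook n times through the interface.
func countRetries(n int) int64 {
	rc := &retryCounter{}
	var obs Observer = rc // the embedding satisfies the full interface
	for attempt := 1; attempt <= n; attempt++ {
		obs.OnRetry(context.Background(), attempt, errors.New("transient"))
	}
	return rc.retries.Load()
}

func main() { fmt.Println(countRetries(3)) }
```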

type Opt ¶

type Opt func(*cfgInternal)

Opt is a functional option for NewClient.

func WithBackend ¶

func WithBackend(name string) Opt

WithBackend sets the backend type explicitly. Valid: "http". Mutually exclusive with WithEmbedder — if both are set, WithEmbedder wins.

func WithCache ¶

func WithCache(c SparseCache) Opt

WithCache wires a SparseCache. When set, every (model, top_k, min_weight, vocab_size, text) tuple is looked up before the backend EmbedSparse call. Full-batch hit short-circuits the backend entirely. Partial misses fall through to the backend for the full batch (no cherry-pick; keeps the API symmetric and simple). A nil SparseCache is ignored (caching stays disabled).
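To make the tuple concrete, here is one way such a key could be derived. The package's own cacheKey is unexported and its layout is not documented, so exampleKey, the field order, the separator, and the choice of SHA-256 are all assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// exampleKey hashes every parameter that changes the output vector, so a
// change to any of them yields a different key. Hypothetical layout.
func exampleKey(model string, topK int, minWeight float32, vocabSize int, text string) string {
	h := sha256.New()
	// NUL separators keep adjacent fields from running together.
	fmt.Fprintf(h, "%s\x00%d\x00%g\x00%d\x00%s", model, topK, minWeight, vocabSize, text)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := exampleKey("splade-v3-distilbert", 64, 0, 30522, "hello world")
	b := exampleKey("splade-v3-distilbert", 256, 0, 30522, "hello world")
	fmt.Println(a == b) // different top_k, different key
}
```

Including top_k, min_weight, and vocab_size in the key is what prevents vectors produced under one configuration from being served under another.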

func WithCircuit ¶

func WithCircuit(cfg CircuitConfig) Opt

WithCircuit enables the circuit breaker with the given configuration. By default the circuit breaker is OFF.

func WithClientMinWeight ¶

func WithClientMinWeight(w float32) Opt

WithClientMinWeight sets the per-instance min_weight cutoff. w<=0 is ignored (server default applies).

func WithClientTopK ¶

func WithClientTopK(k int) Opt

WithClientTopK sets the per-instance top_k passed to the backend. k<=0 is ignored (server default applies).

Named WithClientTopK rather than WithTopK to disambiguate from HTTPSparseOption's WithTopK — they target different config layers (Client vs raw HTTPSparseEmbedder).

func WithClientVocabSize ¶

func WithClientVocabSize(v int) Opt

WithClientVocabSize sets the per-instance vocab size override. Used by VocabSize() and (future) shape validation.

func WithEmbedder ¶

func WithEmbedder(e SparseEmbedder) Opt

WithEmbedder accepts a pre-built SparseEmbedder. NewClient skips backend factory dispatch and wires this Embedder as the inner backend. Useful for custom HTTP variants or in tests.

Cache-key caveat: the Client's cache key is derived from the Client-level (top_k, min_weight, vocab_size) — NOT from the inner embedder's own values. If the inner embedder was constructed with HTTPSparseOption settings (WithTopK, WithMinWeight, WithVocabSize) AND you also enable WithCache, you MUST mirror those into the Client via WithClientTopK / WithClientMinWeight / WithClientVocabSize, otherwise two clients with different inner top_k values will hash to the same cache key and serve stale or wrong vectors. Example:

inner := sparse.NewHTTPSparseEmbedder(url, model, logger, sparse.WithTopK(64))
client, err := sparse.NewClient("",
    sparse.WithEmbedder(inner),
    sparse.WithCache(myCache),
    sparse.WithClientTopK(64), // MUST match inner's TopK
)
if err != nil {
    return err
}

func WithFallback ¶

func WithFallback(secondary *Client) Opt

WithFallback sets a secondary *Client to try when the primary returns StatusDegraded with a non-4xx error. Fallback depth is capped at 1.

func WithLogger ¶

func WithLogger(l *slog.Logger) Opt

WithLogger sets the slog.Logger. nil-ignored.

func WithModel ¶

func WithModel(model string) Opt

WithModel sets the SPLADE model name (e.g. "splade-v3-distilbert").

func WithObserver ¶

func WithObserver(obs Observer) Opt

WithObserver registers a lifecycle Observer. nil-ignored.

func WithRetry ¶

func WithRetry(cfg RetryConfig) Opt

WithRetry overrides the retry policy for transient failures (timeouts, 429, 5xx). The policy flows to the HTTP backend via the v2 client factory (newFromInternal → WithHTTPRetry).

Use NoRetry to disable retries entirely. To customise jitter, attempts, or delays, pass a RetryConfig:

c, _ := sparse.NewClient(
    "http://embed-server:8082",
    sparse.WithRetry(sparse.RetryConfig{
        MaxAttempts: 5,
        BaseDelay:   100 * time.Millisecond,
        MaxDelay:    5 * time.Second,
        Jitter:      0.2,
    }),
)

func WithTimeout ¶

func WithTimeout(d time.Duration) Opt

WithTimeout sets the per-request HTTP timeout. Zero leaves the default (30s) untouched.

type Registry ¶

type Registry struct {
	// contains filtered or unexported fields
}

Registry holds named sparse embedders for multi-model dispatch. Thread-safe: all methods are guarded by a read-write mutex.

func NewRegistry ¶

func NewRegistry(fallback string) *Registry

NewRegistry creates a Registry with the given fallback model name. When Get is called with an empty name, the fallback is used.

func (*Registry) Close ¶

func (r *Registry) Close() error

Close releases all registered embedders.

func (*Registry) Get ¶

func (r *Registry) Get(name string) (SparseEmbedder, bool)

Get returns the embedder for the given name, or the fallback if name is empty.

func (*Registry) Register ¶

func (r *Registry) Register(name string, e SparseEmbedder)

Register adds or replaces a named embedder in the registry.
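The documented behaviour (named lookup, fallback on empty name, read-write mutex) can be sketched as follows. This is not the package's source: registry, fixedVocab, and the trimmed one-method SparseEmbedder interface are stand-ins for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// SparseEmbedder is trimmed to one method to keep the sketch short.
type SparseEmbedder interface{ VocabSize() int }

type registry struct {
	mu       sync.RWMutex
	fallback string
	items    map[string]SparseEmbedder
}

func newRegistry(fallback string) *registry {
	return &registry{fallback: fallback, items: map[string]SparseEmbedder{}}
}

// register adds or replaces a named embedder under the write lock.
func (r *registry) register(name string, e SparseEmbedder) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.items[name] = e
}

// get resolves an empty name to the fallback, then looks up under the
// read lock so concurrent readers do not block each other.
func (r *registry) get(name string) (SparseEmbedder, bool) {
	if name == "" {
		name = r.fallback
	}
	r.mu.RLock()
	defer r.mu.RUnlock()
	e, ok := r.items[name]
	return e, ok
}

// fixedVocab is a fake embedder used only for the demonstration.
type fixedVocab int

func (f fixedVocab) VocabSize() int { return int(f) }

func main() {
	reg := newRegistry("splade-v3-distilbert")
	reg.register("splade-v3-distilbert", fixedVocab(30522))
	e, ok := reg.get("")
	fmt.Println(ok, e.VocabSize())
}
```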

type Result ¶

type Result struct {
	// Vectors holds one entry per input text. On StatusDegraded/StatusSkipped,
	// entries are empty placeholders with their own Status set.
	Vectors []*Vector
	// Status indicates whether the call succeeded, was skipped, or degraded.
	Status Status
	// Model reports which SPLADE model produced the vectors (may be empty).
	Model string
	// Err is non-nil iff Status == StatusDegraded.
	Err error
}

Result is the typed return value of EmbedSparseWithResult. Callers should inspect Status before using Vectors.

type RetryConfig ¶

type RetryConfig struct {
	// MaxAttempts is the total number of attempts (1 = no retry).
	MaxAttempts int
	// BaseDelay is the initial sleep between attempts.
	BaseDelay time.Duration
	// MaxDelay caps exponential growth of the sleep.
	MaxDelay time.Duration
	// Jitter adds randomness: actual sleep = delay * (1 + Jitter * rand[0,1)).
	// Range 0..1; 0 disables jitter (deterministic backoff).
	Jitter float64
}

RetryConfig holds exponential backoff parameters for sparse retries.

Mirrors the v1 internal retry helper from github.com/anatolykoptev/go-kit/embed/retry.go — copied here rather than extracted into a shared internal/transportretry package to keep the two packages independently versionable. Fields are exported so callers can construct a custom policy via WithRetry; pass NoRetry to opt out.
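The sleep schedule implied by these fields can be sketched as below. It assumes the delay doubles on each retry (the documentation says only "exponential") and applies the jitter formula from the Jitter field comment; delays and the local RetryConfig copy are illustrative, not the package's helper.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// RetryConfig is a local copy of the documented struct.
type RetryConfig struct {
	MaxAttempts int
	BaseDelay   time.Duration
	MaxDelay    time.Duration
	Jitter      float64
}

// delays returns the sleep before each retry: MaxAttempts-1 entries.
// rnd supplies values in [0,1); it is injected so tests can be deterministic.
func delays(cfg RetryConfig, rnd func() float64) []time.Duration {
	var out []time.Duration
	d := cfg.BaseDelay
	for attempt := 1; attempt < cfg.MaxAttempts; attempt++ {
		if cfg.MaxDelay > 0 && d > cfg.MaxDelay {
			d = cfg.MaxDelay // cap exponential growth
		}
		// sleep = delay * (1 + Jitter * rand[0,1)), per the field comment.
		out = append(out, time.Duration(float64(d)*(1+cfg.Jitter*rnd())))
		d *= 2 // doubling is an assumption
	}
	return out
}

func main() {
	cfg := RetryConfig{MaxAttempts: 3, BaseDelay: 200 * time.Millisecond,
		MaxDelay: 400 * time.Millisecond, Jitter: 0.1}
	fmt.Println(delays(cfg, rand.Float64))
	fmt.Println(len(delays(RetryConfig{MaxAttempts: 1}, rand.Float64))) // NoRetry: no sleeps
}
```

With MaxAttempts=3 the schedule has two entries, matching the two sleeps described for the default policy.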

type SparseCache ¶

type SparseCache interface {
	// Get returns the cached SparseVector for the given key. ok=false if
	// not cached. Implementations must NOT panic on ctx cancellation;
	// return ok=false instead.
	Get(ctx context.Context, key string) (SparseVector, bool)
	// Set stores the vector for the given key. Idempotent. Implementations
	// may TTL or evict per their policy.
	Set(ctx context.Context, key string, v SparseVector)
}

SparseCache abstracts a (text → SparseVector) lookup table.

go-kit/sparse ships NO concrete implementation — callers wire LRU / Redis / sync.Map per their runtime. Implementations MUST be safe for concurrent reads and writes.

SparseCache is a sibling of embed.Cache rather than a reuse of it: the value type is SparseVector (Indices + Values), not []float32, so the signature differs at the type level. Encoding is the cache implementation's concern — gob, MessagePack, or a compact custom layout (varint-prefixed indices + IEEE 754 values) all work.

Cache key invalidation on (model, top_k, min_weight, vocab_size, text) change is automatic — see cacheKey below.

Trade-off — SPLADE traffic shape: indexing fresh documents sees each text once, so the cache hit ratio is near zero on the document path and caching is wasted RAM there. Caching IS valuable on the *query* path where the same query may repeat across users / sessions. Because the indexing-vs-query split is a caller concern (the SparseEmbedder doesn't know which side of the pipeline it's on), the cache is opt-in via WithCache and disabled by default — embed/'s pattern.
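Since no concrete cache ships with the package, a caller-side implementation might look like the sync.Map sketch below. The interface and SparseVector are redeclared locally so the block compiles alone; mapCache and roundTrip are hypothetical, and the cache is unbounded (no TTL or eviction), which a real query-path cache would likely want.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Local stand-ins mirroring the documented types.
type SparseVector struct {
	Indices []uint32
	Values  []float32
}

type SparseCache interface {
	Get(ctx context.Context, key string) (SparseVector, bool)
	Set(ctx context.Context, key string, v SparseVector)
}

// mapCache is safe for concurrent use because sync.Map is.
type mapCache struct{ m sync.Map }

var _ SparseCache = (*mapCache)(nil)

func (c *mapCache) Get(ctx context.Context, key string) (SparseVector, bool) {
	if ctx.Err() != nil {
		return SparseVector{}, false // cancelled: report a miss, never panic
	}
	v, ok := c.m.Load(key)
	if !ok {
		return SparseVector{}, false
	}
	return v.(SparseVector), true
}

func (c *mapCache) Set(_ context.Context, key string, v SparseVector) {
	// Store copies so later changes to the caller's slices cannot alter
	// the cached entry.
	c.m.Store(key, SparseVector{
		Indices: append([]uint32(nil), v.Indices...),
		Values:  append([]float32(nil), v.Values...),
	})
}

// roundTrip stores v under key and reads it back.
func roundTrip(key string, v SparseVector) (SparseVector, bool) {
	c := &mapCache{}
	ctx := context.Background()
	c.Set(ctx, key, v)
	return c.Get(ctx, key)
}

func main() {
	got, ok := roundTrip("k", SparseVector{Indices: []uint32{42}, Values: []float32{1.25}})
	fmt.Println(ok, got.Indices, got.Values)
}
```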

type SparseEmbedder ¶

type SparseEmbedder interface {
	// EmbedSparse returns one SparseVector per input text, in input order.
	// Document/storage use case. Empty input returns (nil, nil) without
	// hitting the backend.
	EmbedSparse(ctx context.Context, texts []string) ([]SparseVector, error)
	// EmbedSparseQuery embeds a single query string. Search/retrieval use
	// case. Implementations may apply query-specific prefixes or
	// instructions; default delegates to EmbedSparse.
	EmbedSparseQuery(ctx context.Context, text string) (SparseVector, error)
	// VocabSize returns the model's vocabulary size (the dimension of the
	// sparse space — e.g. 30522 for BERT-base SPLADE). Used by callers that
	// need to format pgvector sparsevec literals or allocate dim-sized
	// buffers. The value is configured at construction; the backend does
	// not validate it against the model's actual head.
	VocabSize() int
	// Close releases resources (HTTP clients, model handles).
	Close() error
}

SparseEmbedder generates sparse term-weight vectors for text inputs.

func New ¶

func New(cfg Config, logger *slog.Logger) (SparseEmbedder, error)

New constructs the appropriate SparseEmbedder from cfg.

Supported Config.Type values:

"http" — NewHTTPSparseEmbedder

Any other value (including "") returns an error. logger=nil falls back to slog.Default() inside the backend constructor.

func NewFromEnv ¶

func NewFromEnv(logger *slog.Logger) (SparseEmbedder, error)

NewFromEnv constructs a SparseEmbedder from environment variables.

Recognised variables:

SPARSE_BACKEND       — only "http" supported in v1; default "http"
SPARSE_HTTP_BASE_URL — embed-server URL (required for http)
SPARSE_MODEL         — default "splade-v3-distilbert"
SPARSE_HTTP_TIMEOUT  — Go duration; default 30s
SPARSE_TOP_K         — int; 0/unset uses server default (256)
SPARSE_MIN_WEIGHT    — float; 0/unset uses server default (0.0)
SPARSE_VOCAB_SIZE    — int; 0/unset uses 30522 (BERT base)

Returns an error if SPARSE_BACKEND is "http" and SPARSE_HTTP_BASE_URL is empty. Mirrors embed/'s env-driven factory pattern.

type SparseVector ¶

type SparseVector struct {
	Indices []uint32
	Values  []float32
}

SparseVector is the (Indices, Values) representation of a single SPLADE output. The two slices are aligned by position: Indices[i] is the vocabulary token id whose weight is Values[i]. Length is variable per input — empty vectors are legal (e.g. a query of pure stopwords).

Both slices are populated and owned by the SparseEmbedder; callers must not mutate them. Defensive copies are the caller's responsibility when downstream code needs ordering invariants (e.g. pgvector's sparsevec literal requires sorted ascending indices — see memdb-go's FormatSparseVector helper).
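As an example of the ordering concern, the sketch below renders a vector as a pgvector sparsevec literal without touching the input slices. It follows pgvector's text format ({index:value,...}/dimensions with 1-based indices), so 0-based token ids are shifted by one; formatSparsevec is a hypothetical helper, and whether memdb-go's FormatSparseVector behaves identically is not stated here.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// SparseVector is a local stand-in mirroring the documented shape.
type SparseVector struct {
	Indices []uint32
	Values  []float32
}

// formatSparsevec emits entries in ascending index order. It sorts a
// permutation rather than the slices themselves, so v is left unchanged.
func formatSparsevec(v SparseVector, dim int) string {
	n := len(v.Indices)
	if len(v.Values) < n {
		n = len(v.Values)
	}
	order := make([]int, n)
	for i := range order {
		order[i] = i
	}
	sort.Slice(order, func(a, b int) bool {
		return v.Indices[order[a]] < v.Indices[order[b]]
	})
	var sb strings.Builder
	sb.WriteByte('{')
	for pos, i := range order {
		if pos > 0 {
			sb.WriteByte(',')
		}
		// pgvector indices start at 1; token ids start at 0.
		sb.WriteString(strconv.FormatUint(uint64(v.Indices[i])+1, 10))
		sb.WriteByte(':')
		sb.WriteString(strconv.FormatFloat(float64(v.Values[i]), 'g', -1, 32))
	}
	sb.WriteString("}/")
	sb.WriteString(strconv.Itoa(dim))
	return sb.String()
}

func main() {
	v := SparseVector{Indices: []uint32{5, 0}, Values: []float32{0.5, 2}}
	fmt.Println(formatSparsevec(v, 30522))
}
```

The dim argument would typically come from VocabSize().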

func EmbedSparseQueryViaEmbed ¶

func EmbedSparseQueryViaEmbed(ctx context.Context, e SparseEmbedder, text string) (SparseVector, error)

EmbedSparseQueryViaEmbed is a helper that implements EmbedSparseQuery by delegating to EmbedSparse. Use it in SparseEmbedder implementations that don't need query-specific behaviour.

func (SparseVector) IsEmpty ¶

func (v SparseVector) IsEmpty() bool

IsEmpty reports whether the vector has no terms. It returns true when both slices are empty (or nil). A vector where one slice is nil and the other is non-empty is malformed and is reported as not empty, so the caller surfaces the error rather than silently dropping it.

func (SparseVector) Len ¶

func (v SparseVector) Len() int

Len returns the number of (index, value) pairs in the vector. When the two slices have different lengths the smaller is returned — callers that care about the malformed case should compare len(v.Indices) and len(v.Values) directly.
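The edge cases described for IsEmpty and Len can be pinned down with a small reimplementation. The methods below are written from the documented behaviour, not copied from the package source, on a local SparseVector stand-in.

```go
package main

import "fmt"

// SparseVector is a local stand-in mirroring the documented shape.
type SparseVector struct {
	Indices []uint32
	Values  []float32
}

// IsEmpty is true only when both slices are empty or nil.
func (v SparseVector) IsEmpty() bool {
	return len(v.Indices) == 0 && len(v.Values) == 0
}

// Len returns the smaller of the two slice lengths.
func (v SparseVector) Len() int {
	if len(v.Indices) < len(v.Values) {
		return len(v.Indices)
	}
	return len(v.Values)
}

func main() {
	empty := SparseVector{}
	malformed := SparseVector{Values: []float32{1}} // Indices is nil
	fmt.Println(empty.IsEmpty(), empty.Len())
	fmt.Println(malformed.IsEmpty(), malformed.Len())
	// A malformed vector is "not empty" yet has Len 0, so compare the
	// slice lengths directly to detect it.
	fmt.Println(len(malformed.Indices) != len(malformed.Values))
}
```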

type Status ¶

type Status uint8

Status describes the outcome of an EmbedSparse call. Mirrors embed.Status to keep telemetry consumers symmetric across the three encoder families.

const (
	// StatusOk means the request succeeded and vectors are valid.
	StatusOk Status = iota
	// StatusDegraded means the request failed; vectors are empty placeholders.
	StatusDegraded
	// StatusFallback means the primary backend failed and a secondary succeeded.
	StatusFallback
	// StatusSkipped means the embedder was nil, texts was empty, or DryRun was set.
	StatusSkipped
)

func (Status) String ¶

func (s Status) String() string

String returns the human-readable status label.

type Vector ¶

type Vector struct {
	Sparse SparseVector
	Status Status
}

Vector is the per-text result from EmbedSparseWithResult. Status is per-text — usually StatusOk; for partial-batch failures (not produced by the v1 HTTP backend, which fails the whole batch atomically) entries can carry their own Status.
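One way a caller might consume a Result is sketched below: check the batch Status first, then keep only entries whose per-text Status is usable. The types are local copies of the documented ones; usable and errBackend are hypothetical, and treating StatusFallback entries as usable is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the documented types.
type Status uint8

const (
	StatusOk Status = iota
	StatusDegraded
	StatusFallback
	StatusSkipped
)

type SparseVector struct {
	Indices []uint32
	Values  []float32
}

type Vector struct {
	Sparse SparseVector
	Status Status
}

type Result struct {
	Vectors []*Vector
	Status  Status
	Model   string
	Err     error
}

// errBackend is a placeholder error for the demonstration.
var errBackend = errors.New("backend unavailable")

// usable returns the vectors that are safe to index or search with.
func usable(res *Result) ([]SparseVector, error) {
	switch res.Status {
	case StatusDegraded:
		return nil, res.Err // Err is non-nil iff the batch is degraded
	case StatusSkipped:
		return nil, nil // nothing was embedded (e.g. dry run)
	}
	out := make([]SparseVector, 0, len(res.Vectors))
	for _, v := range res.Vectors {
		if v != nil && (v.Status == StatusOk || v.Status == StatusFallback) {
			out = append(out, v.Sparse)
		}
	}
	return out, nil
}

func main() {
	res := &Result{Status: StatusOk, Vectors: []*Vector{
		{Sparse: SparseVector{Indices: []uint32{42}, Values: []float32{1}}, Status: StatusOk},
		{Status: StatusDegraded}, // hypothetical partial failure
	}}
	vs, err := usable(res)
	fmt.Println(len(vs), err)
}
```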
