embedding

package

v0.1.11 · Published: Jun 7, 2026 · License: MIT · Imports: 19 · Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/peiman/vaultmind

Documentation

Overview

Package embedding provides text embedding infrastructure for VaultMind.

Index

Constants
func Acceleration() string
func BackendName() string
func ColBERTHead(hiddenStates [][]float32, weights [][]float32, bias []float32) [][]float32
func DefaultCacheDir() string
func DefaultModel() string
func DenseHead(hiddenStates [][]float32) []float32
func DownloadBGEM3(cacheDir string) (string, error)
func L2Normalize(vec []float32) []float32
func LoadLinearWeights(path string) (weight [][]float32, bias []float32, err error)
func MaxSimScore(queryTokens, docTokens [][]float32) float64
func SparseDotProduct(a, b map[int32]float32) float64
func SparseHead(hiddenStates [][]float32, tokenIDs, specialMask []uint32, weights []float32, ...) map[int32]float32
func TruncateForEmbedding(text string, maxTokens int) string
type BGEM3Embedder
- func NewBGEM3Embedder(cfg HugotConfig) (*BGEM3Embedder, error)
- func (e *BGEM3Embedder) Close() error
- func (e *BGEM3Embedder) Dims() int
- func (e *BGEM3Embedder) Embed(ctx context.Context, text string) ([]float32, error)
- func (e *BGEM3Embedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
- func (e *BGEM3Embedder) EmbedColBERT(ctx context.Context, text string) ([][]float32, error)
- func (e *BGEM3Embedder) EmbedFull(ctx context.Context, text string) (*BGEM3Output, error)
- func (e *BGEM3Embedder) EmbedFullBatch(_ context.Context, texts []string) ([]*BGEM3Output, error)
- func (e *BGEM3Embedder) EmbedSparse(ctx context.Context, text string) (map[int32]float32, error)
type BGEM3Output
type Embedder
type FullEmbedder
type HugotConfig
- func BGEM3Config() HugotConfig
- func DefaultHugotConfig() HugotConfig
type HugotEmbedder
- func NewHugotEmbedder(cfg HugotConfig) (*HugotEmbedder, error)
- func (e *HugotEmbedder) Close() error
- func (e *HugotEmbedder) Dims() int
- func (e *HugotEmbedder) Embed(ctx context.Context, text string) ([]float32, error)
- func (e *HugotEmbedder) EmbedBatch(_ context.Context, texts []string) ([][]float32, error)
type SidecarBGEM3Config
type SidecarBGEM3Embedder
- func NewSidecarBGEM3(cfg SidecarBGEM3Config) (*SidecarBGEM3Embedder, error)
- func (e *SidecarBGEM3Embedder) Close() error
- func (e *SidecarBGEM3Embedder) Device() string
- func (e *SidecarBGEM3Embedder) Dims() int
- func (e *SidecarBGEM3Embedder) Embed(ctx context.Context, text string) ([]float32, error)
- func (e *SidecarBGEM3Embedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
- func (e *SidecarBGEM3Embedder) EmbedFull(ctx context.Context, text string) (*BGEM3Output, error)
- func (e *SidecarBGEM3Embedder) EmbedFullBatch(_ context.Context, texts []string) ([]*BGEM3Output, error)

Constants

const (
	DefaultModelName    = "sentence-transformers/all-MiniLM-L6-v2"
	DefaultDims         = 384
	DefaultMaxTokens    = 510 // MiniLM max is 512 minus 2 for CLS/SEP tokens
	DefaultOnnxFilePath = "onnx/model.onnx"
)

Default model configuration for the all-MiniLM-L6-v2 embedder.

const (
	BGEM3ModelName    = "BAAI/bge-m3"
	BGEM3Dims         = 1024
	BGEM3MaxTokens    = 8190 // 8192 minus 2 for CLS/SEP
	BGEM3OnnxFilePath = "onnx/model.onnx"
)

BGE-M3 model configuration.

Variables

This section is empty.

Functions

func Acceleration

func Acceleration() string

Acceleration mirrors the Acceleration() function of the ORT build so callers don't need to special-case build tags. The pure-Go backend has no GPU path, so this build reports "go-cpu" to name the slow path explicitly.

func BackendName

func BackendName() string

BackendName identifies which hugot backend the binary was built against. Consumers (e.g. the index command) use it to warn when BGE-M3 indexing is about to run on the slow pure-Go path, so operators don't mistake hours-long indexing for a hang or an OOM. The value is determined by the build tag.

func ColBERTHead

func ColBERTHead(hiddenStates [][]float32, weights [][]float32, bias []float32) [][]float32

ColBERTHead projects each non-CLS token through a linear layer and L2-normalizes. Input: hiddenStates[seq_len][dims], weights[out_dims][in_dims], bias[out_dims]. Output: [seq_len-1][out_dims] (CLS at index 0 is skipped).

func DefaultCacheDir

func DefaultCacheDir() string

DefaultCacheDir returns the default model cache directory (~/.vaultmind/models).

func DefaultModel

func DefaultModel() string

DefaultModel returns the embedding model to use when the operator hasn't picked one explicitly. Adapts to the backend the binary was built against:

ORT-tagged binaries → "bge-m3" (4-way hybrid retrieval: fast on this build path, and the configuration the README's retrieval description is built around).
Pure-Go binaries → "minilm" (BGE-M3 indexing on pure Go takes hours for a medium-sized vault; minilm is the always-fast baseline).

The default is conservative: it never picks a model the binary can't run at reasonable speed. Users who want minilm on an ORT binary (e.g. for fast re-indexing during development) can pass --model minilm explicitly. Users who want bge-m3 on a pure-Go binary can opt in with --model bge-m3 together with --allow-slow-backend.

Dogfooding on 2026-05-05 surfaced this gap: the previous hardcoded "minilm" default contradicted the system's own framing. A user running `vaultmind index --embed` on an ORT-capable build silently got MiniLM-only embeddings and learned about it only afterwards, from doctor's warning. The runtime-aware default closes the gap by matching the model to what the binary can actually run well.
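A minimal sketch of the selection rule, under stated assumptions: `resolveModel` is a hypothetical helper, the `ortBuild` flag stands in for however the binary knows it was built with -tags ORT, and refusing bge-m3 on pure Go without --allow-slow-backend (rather than, say, only warning) is an assumption about the real CLI.

```go
package main

import "errors"

// errSlowBackend is hypothetical; the real CLI may warn instead of refusing.
var errSlowBackend = errors.New("bge-m3 on the pure-Go backend requires --allow-slow-backend")

// resolveModel sketches the rule described for DefaultModel.
//   requested: value of --model ("" when the flag is absent)
//   ortBuild:  stand-in for "this binary was built with -tags ORT"
//   allowSlow: mirrors --allow-slow-backend
func resolveModel(requested string, ortBuild, allowSlow bool) (string, error) {
	if requested == "" {
		if ortBuild {
			return "bge-m3", nil // fast on ORT: enables 4-way hybrid retrieval
		}
		return "minilm", nil // always-fast baseline on pure Go
	}
	if requested == "bge-m3" && !ortBuild && !allowSlow {
		return "", errSlowBackend // explicit opt-in required for the slow path
	}
	return requested, nil // any other explicit choice wins
}
```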

func DenseHead

func DenseHead(hiddenStates [][]float32) []float32

DenseHead extracts the CLS token embedding (index 0) and L2-normalizes it. Input: hiddenStates[seq_len][dims]. Output: [dims] unit vector.

func DownloadBGEM3

func DownloadBGEM3(cacheDir string) (string, error)

DownloadBGEM3 downloads BGE-M3 model files from HuggingFace if not already cached. Returns the path to the model directory.

func L2Normalize

func L2Normalize(vec []float32) []float32

L2Normalize returns vec scaled to unit length, or a zero vector if the magnitude is zero.

func LoadLinearWeights

func LoadLinearWeights(path string) (weight [][]float32, bias []float32, err error)

LoadLinearWeights loads a PyTorch nn.Linear layer's weight and bias from a .pt file. Returns weight as [out_features][in_features] and bias as [out_features]. The .pt file must be a state_dict saved via torch.save(state_dict, path).

func MaxSimScore

func MaxSimScore(queryTokens, docTokens [][]float32) float64

MaxSimScore computes the ColBERT MaxSim score between query and document token matrices: for each query token it finds the maximum similarity across all document tokens, then sums those maxima. Both query and document tokens are assumed to be L2-normalized (as ColBERTHead produces them), so the dot product equals cosine similarity.

func SparseDotProduct

func SparseDotProduct(a, b map[int32]float32) float64

SparseDotProduct computes the dot product between two sparse vectors. Only overlapping keys contribute.
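A sketch of the sparse dot product. `sparseDotSketch` is a hypothetical name; iterating the smaller map and probing the larger one is one reasonable way to honor "only overlapping keys contribute" and is not necessarily what the package does.

```go
package main

// sparseDotSketch computes the dot product of two sparse vectors stored as
// token-ID → weight maps. Only keys present in both maps contribute, so the
// loop walks the smaller map and looks each key up in the larger one.
func sparseDotSketch(a, b map[int32]float32) float64 {
	if len(a) > len(b) {
		a, b = b, a
	}
	var sum float64
	for k, wa := range a {
		if wb, ok := b[k]; ok {
			sum += float64(wa) * float64(wb)
		}
	}
	return sum
}
```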

func SparseHead

func SparseHead(hiddenStates [][]float32, tokenIDs, specialMask []uint32, weights []float32, bias float32) map[int32]float32

SparseHead computes learned lexical weights per token. For each non-special token, weight = ReLU(dot(hidden, w) + bias). The weights are scattered to vocabulary positions via tokenIDs; when a token ID appears more than once, the maximum weight is kept.
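The recipe above, sketched in Go. `sparseHeadSketch` is hypothetical, and two details are assumptions: that a non-zero `specialMask` entry marks a special token, and that zero weights are omitted (consistent with the "non-zero only" note on BGEM3Output.Sparse).

```go
package main

// sparseHeadSketch: for each non-special token, score = ReLU(dot(hidden, w) + bias),
// scattered to the token's vocabulary ID, keeping the max on repeated IDs.
func sparseHeadSketch(hiddenStates [][]float32, tokenIDs, specialMask []uint32, weights []float32, bias float32) map[int32]float32 {
	out := make(map[int32]float32)
	for t, h := range hiddenStates {
		if specialMask[t] != 0 {
			continue // assumption: non-zero mask = special token (CLS, SEP, padding)
		}
		score := bias
		for i, w := range weights {
			score += w * h[i]
		}
		if score <= 0 {
			continue // ReLU clamps to zero, and zero entries are not stored
		}
		id := int32(tokenIDs[t])
		if prev, ok := out[id]; !ok || score > prev {
			out[id] = score // duplicate IDs keep the maximum weight
		}
	}
	return out
}
```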

func TruncateForEmbedding

func TruncateForEmbedding(text string, maxTokens int) string

TruncateForEmbedding truncates text to fit within the model's token limit. Uses a character-based approximation (2 chars/token, empirically derived). Breaks at word boundaries when possible.

Tail loss: content beyond maxTokens × 2 characters is dropped before tokenization. The head is embedded correctly, but the tail is invisible to semantic retrieval (lexical FTS still sees the full body). For long-form notes whose tail carries information not in the head, retrieval coverage suffers. This is tracked as a quality improvement in vaultmind#30 (chunk-and-pool); it is a coverage limit, not a silent failure or robustness bug. The chunking fix should be built once retrieval visibly misses tail content, not preemptively.
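A minimal sketch of the documented truncation heuristic. `truncateSketch` is hypothetical; counting runes rather than bytes, treating `maxTokens <= 0` as "no truncation" (borrowed from HugotConfig.MaxTokens), and breaking only at ASCII spaces are all assumptions.

```go
package main

import "strings"

// truncateSketch keeps at most maxTokens×2 runes (the documented 2 chars/token
// heuristic), backing off to the last space so words are not split.
func truncateSketch(text string, maxTokens int) string {
	const charsPerToken = 2
	if maxTokens <= 0 {
		return text // assumption: non-positive limit disables truncation
	}
	runes := []rune(text)
	limit := maxTokens * charsPerToken
	if len(runes) <= limit {
		return text
	}
	cut := string(runes[:limit])
	if runes[limit] == ' ' {
		return cut // the budget already ends on a word boundary
	}
	if i := strings.LastIndexByte(cut, ' '); i > 0 {
		return cut[:i] // back off to the last word boundary
	}
	return cut // one very long word: hard cut at the budget
}
```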

Types

type BGEM3Embedder

type BGEM3Embedder struct {
	// contains filtered or unexported fields
}

BGEM3Embedder produces dense, sparse, and ColBERT embeddings using BGE-M3.

func NewBGEM3Embedder

func NewBGEM3Embedder(cfg HugotConfig) (*BGEM3Embedder, error)

NewBGEM3Embedder creates a BGE-M3 embedder with all three heads.

func (*BGEM3Embedder) Close

func (e *BGEM3Embedder) Close() error

Close releases the hugot session.

func (*BGEM3Embedder) Dims

func (e *BGEM3Embedder) Dims() int

Dims returns the embedding dimensionality (1024).

func (*BGEM3Embedder) Embed

func (e *BGEM3Embedder) Embed(ctx context.Context, text string) ([]float32, error)

Embed returns the dense embedding (Embedder interface compatibility).

func (*BGEM3Embedder) EmbedBatch

func (e *BGEM3Embedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)

EmbedBatch returns dense embeddings (Embedder interface compatibility).

func (*BGEM3Embedder) EmbedColBERT

func (e *BGEM3Embedder) EmbedColBERT(ctx context.Context, text string) ([][]float32, error)

EmbedColBERT produces only the ColBERT per-token embeddings (used by ColBERTRetriever).

func (*BGEM3Embedder) EmbedFull

func (e *BGEM3Embedder) EmbedFull(ctx context.Context, text string) (*BGEM3Output, error)

EmbedFull produces all three embedding types for a single text.

func (*BGEM3Embedder) EmbedFullBatch

func (e *BGEM3Embedder) EmbedFullBatch(_ context.Context, texts []string) ([]*BGEM3Output, error)

EmbedFullBatch produces all three embedding types for multiple texts. Bypasses hugot's Postprocess to access raw per-token hidden states.

func (*BGEM3Embedder) EmbedSparse

func (e *BGEM3Embedder) EmbedSparse(ctx context.Context, text string) (map[int32]float32, error)

EmbedSparse produces only the sparse embedding (used by SparseRetriever).

type BGEM3Output

type BGEM3Output struct {
	Dense   []float32         // [1024] CLS-pooled, L2-normalized
	Sparse  map[int32]float32 // vocab_id -> weight (non-zero only)
	ColBERT [][]float32       // [seq_len-1][1024] per-token, L2-normalized
}

BGEM3Output contains all three embedding types from a BGE-M3 forward pass.

type Embedder

type Embedder interface {
	// Embed produces a single embedding vector for the given text.
	Embed(ctx context.Context, text string) ([]float32, error)

	// EmbedBatch produces embedding vectors for multiple texts.
	EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)

	// Dims returns the dimensionality of the embedding vectors.
	Dims() int

	// Close releases resources (model session, etc.).
	Close() error
}

Embedder converts text into dense vector representations.

type FullEmbedder

type FullEmbedder interface {
	Embedder
	EmbedFullBatch(ctx context.Context, texts []string) ([]*BGEM3Output, error)
}

FullEmbedder extends Embedder with multi-output capability (BGE-M3).

type HugotConfig

type HugotConfig struct {
	// ModelPath is the local path to the ONNX model directory.
	// If empty, the model will be downloaded from HuggingFace.
	ModelPath string

	// ModelName is the HuggingFace model ID (e.g., "sentence-transformers/all-MiniLM-L6-v2").
	// Used for downloading if ModelPath is not set.
	ModelName string

	// CacheDir is where downloaded models are stored.
	CacheDir string

	// Dims is the embedding dimensionality (e.g., 384 for MiniLM, 1024 for BGE-M3).
	Dims int

	// OnnxFilePath specifies which ONNX file to use when a model has multiple variants.
	// E.g., "onnx/model.onnx" for the default, "onnx/model_O2.onnx" for optimized.
	OnnxFilePath string

	// MaxTokens is the model's context window size. Texts longer than this (in approximate
	// tokens) are truncated before embedding. 0 means no truncation.
	MaxTokens int
}

HugotConfig configures the HugotEmbedder.

func BGEM3Config

func BGEM3Config() HugotConfig

BGEM3Config returns the HugotConfig for BGE-M3.

func DefaultHugotConfig

func DefaultHugotConfig() HugotConfig

DefaultHugotConfig returns the standard HugotConfig for all-MiniLM-L6-v2.

type HugotEmbedder

type HugotEmbedder struct {
	// contains filtered or unexported fields
}

HugotEmbedder wraps the hugot library to produce embeddings using ONNX models.

func NewHugotEmbedder

func NewHugotEmbedder(cfg HugotConfig) (*HugotEmbedder, error)

NewHugotEmbedder creates an embedder using hugot with the pure-Go backend. For the ORT backend (faster, and supports larger models), build with -tags ORT.

func (*HugotEmbedder) Close

func (e *HugotEmbedder) Close() error

Close releases the hugot session.

func (*HugotEmbedder) Dims

func (e *HugotEmbedder) Dims() int

Dims returns the dimensionality of the embedding vectors.

func (*HugotEmbedder) Embed

func (e *HugotEmbedder) Embed(ctx context.Context, text string) ([]float32, error)

Embed produces a single embedding vector.

func (*HugotEmbedder) EmbedBatch

func (e *HugotEmbedder) EmbedBatch(_ context.Context, texts []string) ([][]float32, error)

EmbedBatch produces embedding vectors for multiple texts. Texts exceeding the model's token limit are truncated automatically.

type SidecarBGEM3Config

type SidecarBGEM3Config struct {
	// Python is the interpreter path. Must have torch + transformers
	// installed and be on a platform where torch.backends.mps.is_available()
	// returns true (Apple Silicon). When empty, falls back to "python3" on
	// PATH.
	Python string
	// ScriptPath is the absolute path to embed_server.py. When empty, the
	// embedder looks for the script alongside this Go file (resolved via
	// the project's $CLAUDE_PROJECT_DIR or the executable's directory).
	ScriptPath string
}

SidecarBGEM3Config controls how the sidecar process is launched.

type SidecarBGEM3Embedder

type SidecarBGEM3Embedder struct {
	// contains filtered or unexported fields
}

SidecarBGEM3Embedder runs BGE-M3 inference in an external Python process that uses PyTorch with MPS (the Apple Silicon GPU). The Go side does no tokenization: the Python sidecar tokenizes with the Hugging Face tokenizer loaded from the local cache. The heads also run inside the sidecar, so the per-modality tensors stay on MPS instead of round-tripping to the CPU mid-batch.

Why a sidecar instead of in-process inference: in-process ORT (via hugot) saturates the CPU during indexing on Apple Silicon, because there is no GPU acceleration path (vaultmind#34). The sidecar pattern moves heavy inference behind a JSON contract, isolating the vaultmind core from the choice of inference engine. Today the engine is PyTorch+MPS; later it could be CoreML or MLX without touching the Go side.

Lifecycle: NewSidecarBGEM3 spawns the Python subprocess and Close() tears it down. Each batch is one round trip via Send: write a JSON line to the subprocess's stdin, then read a JSON line from its stdout. A mutex serializes access because the protocol is synchronous request/response over a single stdin/stdout pair.

func NewSidecarBGEM3

func NewSidecarBGEM3(cfg SidecarBGEM3Config) (*SidecarBGEM3Embedder, error)

NewSidecarBGEM3 spawns the Python sidecar and waits for its ready signal. Returns an error if the subprocess fails to start, the Python imports fail, or the model can't be loaded. The caller MUST defer Close() to reap the subprocess.

func (*SidecarBGEM3Embedder) Close

func (e *SidecarBGEM3Embedder) Close() error

Close terminates the sidecar process. Safe to call multiple times.
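One common way to make Close idempotent is sync.Once. This is a hypothetical helper (`onceCloser`), not the package's implementation; `teardown` stands in for whatever closes the sidecar's stdin and waits for the process to exit.

```go
package main

import "sync"

// onceCloser runs teardown exactly once; every Close call returns its result.
type onceCloser struct {
	once     sync.Once
	err      error
	teardown func() error // e.g. close stdin, then wait for the subprocess
}

func (c *onceCloser) Close() error {
	c.once.Do(func() { c.err = c.teardown() })
	return c.err
}
```

Returning the stored error on later calls means a deferred Close after an explicit Close reports the same outcome instead of failing on an already-reaped process.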

func (*SidecarBGEM3Embedder) Device

func (e *SidecarBGEM3Embedder) Device() string

Device reports the device the sidecar selected ("mps" or "cpu"). Useful in the doctor and index summaries so the operator can see whether acceleration is in effect.

func (*SidecarBGEM3Embedder) Dims

func (e *SidecarBGEM3Embedder) Dims() int

Dims reports the dense embedding dimensionality.

func (*SidecarBGEM3Embedder) Embed

func (e *SidecarBGEM3Embedder) Embed(ctx context.Context, text string) ([]float32, error)

Embed produces a single dense embedding via the sidecar.

func (*SidecarBGEM3Embedder) EmbedBatch

func (e *SidecarBGEM3Embedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)

EmbedBatch produces dense embeddings for a batch via the sidecar.

func (*SidecarBGEM3Embedder) EmbedFull

func (e *SidecarBGEM3Embedder) EmbedFull(ctx context.Context, text string) (*BGEM3Output, error)

EmbedFull is the singleton form of EmbedFullBatch.

func (*SidecarBGEM3Embedder) EmbedFullBatch

func (e *SidecarBGEM3Embedder) EmbedFullBatch(_ context.Context, texts []string) ([]*BGEM3Output, error)

EmbedFullBatch sends a batch of texts to the sidecar and parses the response. Sparse-vector keys (token IDs) arrive as strings in the JSON, because JSON object keys are always strings; they are parsed to int32 here so the sidecar protocol stays portable across languages.
