package search
v1.0.44
Published: May 2, 2026 License: MIT Imports: 30 Imported by: 0

Documentation

Overview

Package search provides full-text indexing with BM25 scoring.

Package search - HNSW configuration and quality presets.

Package search provides HNSW vector indexing for fast approximate nearest neighbor search.

HNSW Delete/Update Policy:

Delete:

  • Remove() tombstones a vector via a dense `deleted []bool` flag
  • Neighbor lists are not eagerly rewired (tombstones keep deletes cheap)
  • Entry point is re-selected if the removed node was the entry point

Update:

  • Current policy: Remove() + Add() pattern
  • Call Remove(id) then Add(id, newVector) to update a vector
  • This ensures the graph structure is correctly maintained
  • Future: A dedicated Update() method may be added for efficiency
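The tombstone delete plus Remove-then-Add update policy can be sketched with a toy index. This is a minimal stand-in, not the real HNSWIndex: it shows only the bookkeeping (a deleted flag per id, cleared on re-insert), not graph rewiring.

```go
package main

// toyIndex is a stand-in (not the package's HNSWIndex) illustrating the
// tombstone delete + Remove/Add update policy described above.
type toyIndex struct {
	vecs    map[string][]float32
	deleted map[string]bool
}

func newToyIndex() *toyIndex {
	return &toyIndex{vecs: map[string][]float32{}, deleted: map[string]bool{}}
}

// Add inserts (or re-inserts) a vector and clears any tombstone.
func (t *toyIndex) Add(id string, v []float32) {
	t.vecs[id] = v
	t.deleted[id] = false
}

// Remove tombstones: the vector stays in storage but is skipped at search
// time, keeping deletes cheap (no eager neighbor rewiring).
func (t *toyIndex) Remove(id string) {
	t.deleted[id] = true
}

// Update follows the documented Remove-then-Add pattern.
func (t *toyIndex) Update(id string, v []float32) {
	t.Remove(id)
	t.Add(id, v)
}
```

In the real index, the Add step also rebuilds the node's neighbor links, which is why the pattern keeps the graph structure correct.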

Graph Quality:

  • High-churn workloads (many updates/deletes) can degrade graph quality
  • Periodic rebuilds are recommended (see NORNICDB_VECTOR_ANN_REBUILD_INTERVAL)
  • Rebuilds restore optimal graph structure and improve recall

Package search - Kalman filter integration for stable search ranking.

This file provides the KalmanAdapter that enhances the search service with:

  • Similarity score smoothing: Reduce noise in vector similarity scores
  • Ranking stability: Prevent results from jumping around between queries
  • Trend detection: Identify documents becoming more/less relevant
  • Latency prediction: Smooth query latency for better resource planning

Why Smooth Search Scores?

Vector similarity scores can be noisy due to:

  • Embedding model variability (same query, slightly different embeddings)
  • Approximation in ANN (approximate nearest neighbor) indexes
  • Edge cases where documents are equidistant from query

Kalman filtering smooths these variations to provide:

  • More consistent rankings across similar queries
  • Gradual relevance transitions (not sudden jumps)
  • Confidence estimates for borderline results
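The smoothing idea above can be sketched as a one-dimensional Kalman filter per document. This is an illustrative model, not the package's KalmanAdapter API: each noisy similarity observation is blended with the running estimate, weighted by relative uncertainty, so a one-off score spike moves the smoothed score only partway.

```go
package main

// scoreFilter is a minimal 1-D Kalman filter sketch for per-document
// score smoothing (illustrative; not the package's KalmanAdapter).
type scoreFilter struct {
	est  float64 // current smoothed score
	p    float64 // estimate uncertainty (variance)
	q    float64 // process noise: how fast true relevance can drift
	r    float64 // measurement noise: how noisy raw scores are
	init bool
}

func (f *scoreFilter) update(observed float64) float64 {
	if !f.init {
		f.est, f.p, f.init = observed, f.r, true
		return f.est
	}
	f.p += f.q             // predict: uncertainty grows between observations
	k := f.p / (f.p + f.r) // Kalman gain: how much to trust the new score
	f.est += k * (observed - f.est)
	f.p *= 1 - k // correct: uncertainty shrinks after the update
	return f.est
}
```

With a small q and a larger r, a sudden jump from 0.80 to 0.95 yields a smoothed score between the two; only a sustained trend pulls the estimate all the way up.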

Integration Architecture

┌─────────────────────────────────────────────────────────────┐
│                  Kalman Search Adapter                       │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐    ┌─────────────────────────────────┐ │
│  │  Search Service │───▶│ Kalman Filter (scores)          │ │
│  │  (raw scores)   │    │ - Per-document filters          │ │
│  └────────┬────────┘    │ - Score history smoothing       │ │
│           │             └─────────────────────────────────┘ │
│           │                                                 │
│           ▼                                                 │
│  ┌─────────────────────────────────────────────────────────┐│
│  │     Ranking Stabilizer                                  ││
│  │  • Detect score ties                                    ││
│  │  • Apply stability boost to consistent docs             ││
│  │  • Prevent rank oscillation                             ││
│  └─────────────────────────────────────────────────────────┘│
│           │                                                 │
│           ▼                                                 │
│  ┌─────────────────────────────────────────────────────────┐│
│  │     Latency Predictor                                   ││
│  │  • Smooth query latency measurements                    ││
│  │  • Predict P99 latency                                  ││
│  │  • Resource scaling suggestions                         ││
│  └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘

ELI12 (Explain Like I'm 12)

Imagine you're searching Google for "cute cats":

**Without Kalman:**

  • Search 1: Cat A is #1, Cat B is #2
  • Search 2: Cat B is #1, Cat A is #2 (they swapped!)
  • Search 3: Cat C is #1 (where did that come from?!)

**With Kalman:**

  • Kalman remembers: "Cat A has been top 3 for a while"
  • When Cat B briefly gets a tiny score boost, Kalman says: "That's probably noise. Cat A stays #1 until I see a REAL trend."
  • Result: Rankings stay consistent, you find things where you expect them!

It's like having a wise librarian who remembers "this book was popular before" instead of reshuffling the entire library every time someone asks a question.

Package search - K-means cluster routing candidate generator.

This file implements KMeansCandidateGen, which uses k-means clustering to route queries to the most relevant clusters, then generates candidates from those clusters. This is optimal for very large datasets (N > 100K) where cluster routing provides significant speedup over HNSW.

Trigger Policies:

K-means clustering is triggered automatically:

  • After bulk loads: When BuildIndexes() completes, clustering runs automatically
  • Periodic clustering: Background timer runs clustering at regular intervals (the interval is configurable)
  • Manual trigger: Call TriggerClustering() after bulk data loading

The candidate generator automatically uses k-means routing when:

  • Clustering is enabled (EnableClustering() called)
  • ClusterIndex is clustered (Cluster() has been run)
  • Dataset is large enough to benefit (typically N > 100K)

For smaller datasets, the pipeline automatically falls back to HNSW or brute-force.
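The activation conditions above reduce to a simple conjunction. The helper name and the explicit threshold parameter below are illustrative, not the package's API:

```go
package main

// shouldRouteViaKMeans mirrors the three documented conditions for k-means
// routing: clustering enabled, Cluster() has run, and the dataset is large
// enough to benefit. Name and signature are illustrative.
func shouldRouteViaKMeans(clusteringEnabled, isClustered bool, datasetSize, minSize int) bool {
	return clusteringEnabled && isClustered && datasetSize >= minSize
}
```

When any condition is false, the pipeline falls back to HNSW or brute-force as described above.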

Package search provides cross-encoder reranking for improved search quality.

Cross-encoder reranking is a two-stage retrieval approach:

  1. Stage 1 (Fast): Bi-encoder retrieval
     • Vector similarity + BM25
     • Uses pre-computed embeddings for O(log N) search
     • Returns top-K candidates (e.g., 100)

  2. Stage 2 (Accurate): Cross-encoder reranking
     • Passes (query, document) pairs through a cross-encoder model
     • Model sees both query and document together → more accurate
     • Re-scores and re-ranks the top-K candidates
     • Returns final top-N results (e.g., 10)

Why Cross-Encoder?

Bi-encoders (like embeddings) encode query and document separately:

query_emb  = encode(query)
doc_emb    = encode(document)
score      = cosine(query_emb, doc_emb)

Cross-encoders encode them together:

score = cross_encode(query, document)  // sees interaction!

The cross-encoder can capture fine-grained semantic relationships that bi-encoders miss, but it's slower (O(N) vs O(log N)).
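The two stages compose as a simple funnel. This sketch uses placeholder function parameters (retrieve, crossEncode) for the fast candidate search and the pair model; it is not the package's Rerank API:

```go
package main

import "sort"

// scored pairs a document ID with a score (illustrative type).
type scored struct {
	ID    string
	Score float64
}

// twoStageSearch sketches the bi-encoder → cross-encoder funnel:
// cheap retrieval of topK candidates, expensive re-scoring of pairs,
// then the final topN by cross-encoder score.
func twoStageSearch(
	retrieve func(query string, topK int) []scored, // Stage 1: fast, approximate
	crossEncode func(query, docID string) float64, // Stage 2: slow, accurate
	query string, topK, topN int,
) []scored {
	candidates := retrieve(query, topK) // e.g. topK = 100
	for i := range candidates {
		// Re-score each (query, document) pair; the model sees both together.
		candidates[i].Score = crossEncode(query, candidates[i].ID)
	}
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].Score > candidates[j].Score
	})
	if len(candidates) > topN {
		candidates = candidates[:topN] // e.g. topN = 10
	}
	return candidates
}
```

Because crossEncode runs only on the topK candidates, its O(N) cost is bounded regardless of corpus size.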

ELI12 (Explain Like I'm 12):

Imagine finding a book in a library:

  • Bi-encoder (Stage 1): Like using the card catalog. Fast lookup by category/keywords, but might miss nuances.

  • Cross-encoder (Stage 2): Like actually reading each book's summary. More accurate, but takes longer. So we only do it for the top candidates from Stage 1.

Reference: Nogueira & Cho (2019) "Passage Re-ranking with BERT"

Package search provides unified hybrid search with Reciprocal Rank Fusion (RRF).

This package implements the same hybrid search approach used by production systems like Azure AI Search, Elasticsearch, and Weaviate: combining vector similarity search with BM25 full-text search using Reciprocal Rank Fusion.

Search Capabilities:

  • Vector similarity search (cosine similarity with HNSW index)
  • BM25 full-text search (keyword matching with TF-IDF)
  • RRF hybrid search (fuses vector + BM25 results)
  • Adaptive weighting based on query characteristics
  • Automatic fallback when one method fails

Example Usage:

// Create search service
svc := search.NewService(storageEngine)

// Build indexes from existing nodes
if err := svc.BuildIndexes(ctx); err != nil {
	log.Fatal(err)
}

// Perform hybrid search
query := "machine learning algorithms"
embedding := embedder.Embed(ctx, query) // Get from embed package
opts := search.DefaultSearchOptions()

response, err := svc.Search(ctx, query, embedding, opts)
if err != nil {
	log.Fatal(err)
}

for _, result := range response.Results {
	fmt.Printf("[%.3f] %s\n", result.RRFScore, result.Title)
}

How RRF Works:

RRF (Reciprocal Rank Fusion) combines rankings from multiple search methods. Instead of merging scores directly (which can be incomparable), RRF uses rank positions to create a unified ranking.

Formula: RRF_score = Σ (weight / (k + rank))

Where:

  • k is a constant (typically 60) that dampens the dominance of top-ranked results
  • rank is the position in the result list (1-indexed)
  • weight allows emphasizing one method over another

Example: A document ranked #1 in vector search and #3 in BM25:

RRF = (1.0 / (60 + 1)) + (1.0 / (60 + 3))
    = (1.0 / 61) + (1.0 / 63)
    = 0.0164 + 0.0159
    = 0.0323

Documents that appear in both result sets get boosted scores.
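The formula and worked example above translate directly into code. This is a sketch of the scoring step only (the helper name and map-based signature are illustrative, not the package's API):

```go
package main

// rrfScore computes RRF_score = Σ (weight / (k + rank)) for one document.
// ranks maps each method name ("vector", "bm25", ...) to the document's
// 1-indexed rank in that method's result list; absent means not retrieved.
// weights may be nil, in which case every method gets weight 1.0.
func rrfScore(k float64, ranks map[string]int, weights map[string]float64) float64 {
	score := 0.0
	for method, rank := range ranks {
		w := 1.0
		if weights != nil {
			if mw, ok := weights[method]; ok {
				w = mw
			}
		}
		score += w / (k + float64(rank))
	}
	return score
}
```

With k = 60 and ranks {vector: 1, bm25: 3} this reproduces the worked example (≈0.0323), and a document ranked #2 in both lists (2/62 ≈ 0.0323) beats one ranked #1 in only a single list (1/61 ≈ 0.0164).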

ELI12 (Explain Like I'm 12):

Imagine two friends ranking pizza places:

  • Friend A (vector search) ranks by taste similarity to your favorite
  • Friend B (BM25) ranks by matching your description "spicy pepperoni"

They might disagree! Friend A says place X is #1 (tastes similar), while Friend B says it's #5 (doesn't match keywords well).

RRF solves this by:

  1. If a place appears in BOTH lists, it gets bonus points
  2. Higher ranks (being at the top) give more points
  3. The magic number 60 prevents #1 from completely dominating

This way, a place that's #2 in both lists beats a place that's #1 in one but missing from the other!

Index persistence and WAL alignment:

NornicDB storage uses a write-ahead log (WAL) for durability; graph state is recovered from WAL (and snapshots) on startup. Search indexes (BM25 and vector) are built or loaded after storage recovery, so they always reflect the WAL-consistent state. Index files use semver format versioning (Qdrant-style): only indexes with the same format version are loaded; older or newer versions are rejected so the caller can rebuild. Persisted indexes are saved on a debounced schedule after mutations and on shutdown.

Result caching:

Search() results are cached in-process by query + options (limit, types, rerank, MMR, etc.), with the same semantics as the Cypher query cache: LRU eviction (default 1000 entries), TTL (default 5 minutes), and full invalidation on IndexNode/RemoveNode so results stay correct after index changes. All call paths (HTTP search API, Cypher vector procedures, MCP, etc.) share this cache, so repeated identical searches return immediately.

Package search provides file-backed vector storage for memory-efficient indexing.

VectorFileStore implements append-only vector storage: vectors are written to a .vec file and only id→offset metadata is kept in RAM. This allows BuildIndexes to index large datasets without holding 2–3× vector data in memory. Vectors are stored normalized (one copy per id, cosine-only) per the indexing-memory plan.

Package search provides vector indexing with cosine similarity search.

This file implements a simple but effective vector similarity search index using brute-force cosine similarity calculation. While not as sophisticated as HNSW or other approximate methods, it provides exact results and is suitable for moderate-sized datasets.

Key Features:

  • Exact cosine similarity search
  • Automatic vector normalization for performance
  • Thread-safe concurrent operations
  • Context-aware search with cancellation
  • Configurable similarity thresholds

Example Usage:

// Create vector index for 1024-dimensional embeddings
index := search.NewVectorIndex(1024)

// Add vectors to the index
embedding1 := make([]float32, 1024)
// ... populate embedding1 ...
index.Add("doc-1", embedding1)

embedding2 := make([]float32, 1024)
// ... populate embedding2 ...
index.Add("doc-2", embedding2)

// Search for similar vectors
query := make([]float32, 1024)
// ... populate query vector ...
results, err := index.Search(ctx, query, 10, 0.7)
if err != nil {
	log.Fatal(err)
}

// Process results (sorted by similarity)
for _, result := range results {
	fmt.Printf("ID: %s, Similarity: %.3f\n", result.ID, result.Score)
}

Algorithm Details:

The index uses cosine similarity, which measures the cosine of the angle between two vectors. It's particularly effective for high-dimensional embeddings where magnitude is less important than direction.

Cosine Similarity Formula:

similarity = (A · B) / (||A|| × ||B||)

Where:

  • A · B is the dot product of vectors A and B
  • ||A|| is the magnitude (L2 norm) of vector A
  • ||B|| is the magnitude (L2 norm) of vector B

Optimization:

Vectors are normalized when added to the index, so cosine similarity
becomes just the dot product, making search much faster.
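The optimization can be shown in a few lines. This is a sketch of the idea, not the index's internal implementation: normalize once at insert time, then similarity at query time is a single dot product.

```go
package main

import "math"

// normalize scales v to unit length in place (no-op for the zero vector).
func normalize(v []float32) {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	n := float32(math.Sqrt(sum))
	if n == 0 {
		return
	}
	for i := range v {
		v[i] /= n
	}
}

// dot is the full cosine similarity once both vectors are unit length,
// since ||A|| = ||B|| = 1 makes the denominator vanish.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}
```

Paying the sqrt once per insert instead of twice per comparison is what makes the O(n×d) scan fast in practice.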

Performance Characteristics:

  • Add: O(d) where d is vector dimensions
  • Search: O(n×d) where n is number of vectors
  • Memory: O(n×d) for storing normalized vectors
  • Thread-safe: Uses RWMutex for concurrent access

When to Use:

✅ Exact similarity search required
✅ Moderate dataset sizes (<100K vectors)
✅ High-dimensional embeddings (>100 dimensions)
✅ Simplicity and reliability preferred

When to Avoid:

❌ Very large datasets (>1M vectors)
❌ Approximate search is acceptable
❌ Sub-linear search time required

Future Improvements:

  • HNSW implementation for O(log n) search
  • GPU acceleration for large batches
  • Quantization for memory efficiency
  • Incremental index updates

ELI12 (Explain Like I'm 12):

Think of vector similarity like comparing the "direction" things are pointing:

  1. **Vectors**: Like arrows pointing in different directions in space. Each arrow represents a document, image, or piece of text.

  2. **Cosine similarity**: Measures how much two arrows point in the same direction. If they point exactly the same way, similarity = 1. If they point opposite ways, similarity = -1.

  3. **Normalization**: We make all arrows the same length so we only care about direction, not how "strong" they are.

  4. **Search**: When you give us a query arrow, we check how similar its direction is to all the arrows we've stored, then give you the most similar ones.

It's like finding documents that "point in the same direction" as your search!

Package search - Vector search pipeline with CandidateGen + ExactScore interfaces.

This file implements the unified vector search pipeline as described in the hybrid-ann-plan.md. It provides:

  • CandidateGenerator interface for approximate candidate generation (brute/HNSW)
  • ExactScorer interface for exact scoring of candidates (CPU/GPU)
  • Auto strategy selection (brute vs HNSW based on dataset size)
  • GPU-accelerated exact scoring when available

Index

Constants

View Source
const (
	// EnvSearchBM25Engine selects the BM25 engine implementation.
	// Supported values: "v1", "v2". Default: "v2".
	EnvSearchBM25Engine = "NORNICDB_SEARCH_BM25_ENGINE"
	// EnvSearchLogTimings enables per-query timing logs for search stages.
	// When true, logs vector/BM25/fusion/total timing to help identify bottlenecks.
	EnvSearchLogTimings = "NORNICDB_SEARCH_LOG_TIMINGS"
	BM25EngineV1        = "v1"
	BM25EngineV2        = "v2"
)
View Source
const (
	// NSmallMax is the maximum dataset size for which brute-force search is used.
	// Below this threshold, brute-force is often faster than ANN overhead.
	NSmallMax = 5000

	// CandidateMultiplier determines how many candidates to generate relative to k.
	// Formula: C = max(k * CandidateMultiplier, 200)
	CandidateMultiplier = 20

	// MaxCandidates is the hard cap on candidate set size.
	MaxCandidates = 5000
)

Configuration constants for the vector search pipeline.
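The constants combine per the documented formula C = max(k * CandidateMultiplier, 200), with MaxCandidates as the hard cap. The helper below restates the constants and is an illustrative sketch of that combination, not a function exported by the package:

```go
package main

const (
	// Restated from the documented configuration constants.
	CandidateMultiplier = 20
	MaxCandidates       = 5000
)

// candidateCount applies C = max(k*CandidateMultiplier, 200), capped at
// MaxCandidates (illustrative helper).
func candidateCount(k int) int {
	c := k * CandidateMultiplier
	if c < 200 {
		c = 200 // floor: always gather a meaningful candidate pool
	}
	if c > MaxCandidates {
		c = MaxCandidates // hard cap on candidate set size
	}
	return c
}
```

So k = 5 still generates 200 candidates, k = 50 generates 1000, and very large k saturates at 5000.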

View Source
const DefaultMinEmbeddingsForClustering = 1000

DefaultMinEmbeddingsForClustering is the default minimum number of embeddings needed before k-means clustering provides any benefit. Below this threshold, brute-force search is faster than cluster overhead.

This value can be overridden per-service using SetMinEmbeddingsForClustering().

Performance Scaling (Real-World Benchmarks):

  • <1000 embeddings: Clustering overhead > speedup benefit
  • 2,000 embeddings: ~14% faster with clustering
  • 4,500 embeddings: ~26% faster with clustering
  • 10,000+ embeddings: 10-50x faster with clustering

Tuning Guidelines:

  • 1000 (default): Safe for most workloads, proven performance benefit
  • 500-1000: Use for latency-sensitive apps (14-26% speedup range)
  • 100-500: Testing or small datasets (verify clustering works)
  • 2000+: Very large datasets (maximize speedup, delay until more data)

Environment Variable: NORNICDB_KMEANS_MIN_EMBEDDINGS (overrides default)

View Source
const DefaultVectorDimensions = 1024

DefaultVectorDimensions is the embedding size used by NewService (e.g. bge-m3). Use NewServiceWithDimensions or schema/query-derived dimensions when your model differs.

Variables

View Source
var (
	ErrDimensionMismatch = errors.New("vector dimension mismatch")
)
View Source
var ErrSearchIndexBuilding = errors.New("search index being built, please try again when they are complete")
View Source
var SearchableProperties = []string{
	"content",
	"text",
	"title",
	"name",
	"description",
	"path",
	"workerRole",
	"requirements",
}

SearchableProperties defines PRIORITY properties for full-text search ranking. These properties are indexed first for better BM25 ranking. Note: ALL node properties are indexed, but these get priority weighting. These match Neo4j's fulltext index configuration.

Functions

func BuildIVFPQFromVectorStore

func BuildIVFPQFromVectorStore(ctx context.Context, vfs *VectorFileStore, profile IVFPQProfile, seedDocIDs []string) (*IVFPQIndex, *IVFPQBuildStats, error)

BuildIVFPQFromVectorStore builds a full IVF/PQ index from VectorFileStore.

func ConvertSearchIndexDirFromGobToMsgpack

func ConvertSearchIndexDirFromGobToMsgpack(rootDir string) error

ConvertSearchIndexDirFromGobToMsgpack converts search index .gob files under rootDir to msgpack and removes the .gob extension (writes to bm25, vectors, hnsw, hnsw_ivf/0, 1, ...). rootDir should be the search index root (e.g. data/test/search). Back up the directory before running.

func DefaultBM25Engine

func DefaultBM25Engine() string

DefaultBM25Engine returns the configured BM25 engine from environment. Valid values: "v1", "v2". Invalid/empty values fall back to "v2".

func DeriveIVFCentroidsFromClusters

func DeriveIVFCentroidsFromClusters(hnswPath string, vectorLookup VectorLookup) (centroids [][]float32, idToCluster map[string]int, err error)

DeriveIVFCentroidsFromClusters builds centroids and idToCluster from existing hnsw_ivf/ cluster files (numeric names 0, 1, 2, ...) and vectors from the vector index. No separate centroid file. Returns (nil, nil, nil) if no cluster files exist or derivation fails.

func SaveIVFHNSW

func SaveIVFHNSW(hnswPath string, clusterHNSW map[int]*HNSWIndex) error

SaveIVFHNSW persists per-cluster HNSW indexes to disk under hnsw_ivf/ alongside hnsw. hnswPath is the full path to the single HNSW file (e.g. data/search/dbname/hnsw); per-cluster files are written to hnsw_ivf/0, 1, 2, ... (no extension). Each cluster is saved as graph-only.

func SaveIVFPQBundle

func SaveIVFPQBundle(basePath string, idx *IVFPQIndex) error

SaveIVFPQBundle persists an IVFPQ index as an atomic multipart bundle.

Types

type ANNQuality

type ANNQuality string

ANNQuality selects the high-level ANN strategy mode.

const (
	ANNQualityFast       ANNQuality = "fast"
	ANNQualityBalanced   ANNQuality = "balanced"
	ANNQualityAccurate   ANNQuality = "accurate"
	ANNQualityCompressed ANNQuality = "compressed"
)

func ANNQualityFromEnv

func ANNQualityFromEnv() ANNQuality

ANNQualityFromEnv parses the global ANN quality selector. Unknown values default to fast to preserve historical behavior.

type ANNResult

type ANNResult struct {
	ID    string
	Score float32
}

ANNResult is a minimal search result from the ANN index (HNSW).

This intentionally stays small (ID + float32 score) to keep per-request allocations and copy costs low. Higher-level layers can enrich results as needed (labels, properties, etc.).

type BruteForceCandidateGen

type BruteForceCandidateGen struct {
	// contains filtered or unexported fields
}

BruteForceCandidateGen implements CandidateGenerator using brute-force search.

This is optimal for small datasets (N < NSmallMax) where ANN overhead dominates brute-force computation time.

func NewBruteForceCandidateGen

func NewBruteForceCandidateGen(vectorIndex *VectorIndex) *BruteForceCandidateGen

NewBruteForceCandidateGen creates a new brute-force candidate generator.

func (*BruteForceCandidateGen) SearchCandidates

func (b *BruteForceCandidateGen) SearchCandidates(ctx context.Context, query []float32, k int, minSimilarity float64) ([]Candidate, error)

SearchCandidates generates candidates using brute-force search.

type BuildProgress

type BuildProgress struct {
	Ready           bool
	Building        bool
	Phase           string
	ProcessedNodes  int64
	TotalNodes      int64
	RateNodesPerSec float64
	ETASeconds      int64 // -1 when unknown
}

BuildProgress reports in-progress indexing state for UI/ops visibility.

type CPUExactScorer

type CPUExactScorer struct {
	// contains filtered or unexported fields
}

CPUExactScorer implements ExactScorer using CPU-based exact scoring.

Uses SIMD-optimized dot product for cosine similarity computation.

func NewCPUExactScorer

func NewCPUExactScorer(getter VectorGetter) *CPUExactScorer

NewCPUExactScorer creates a new CPU-based exact scorer. getter can be *VectorIndex or any type that implements GetVector (e.g. file-store lookup adapter).

func (*CPUExactScorer) ScoreCandidates

func (c *CPUExactScorer) ScoreCandidates(ctx context.Context, query []float32, candidates []Candidate) ([]ScoredCandidate, error)

ScoreCandidates computes exact scores for candidates using CPU.

type Candidate

type Candidate struct {
	ID    string
	Score float64 // Approximate score from candidate generation
}

Candidate represents a candidate vector with approximate score.

type CandidateGenerator

type CandidateGenerator interface {
	// SearchCandidates generates candidate vectors for the given query.
	//
	// Parameters:
	//   - ctx: Context for cancellation
	//   - query: Normalized query vector
	//   - k: Desired number of results
	//   - minSimilarity: Minimum similarity threshold (approximate, may be relaxed)
	//
	// Returns:
	//   - candidates: Candidate IDs with approximate scores (may be more than k)
	//   - error: Context cancellation or other errors
	SearchCandidates(ctx context.Context, query []float32, k int, minSimilarity float64) ([]Candidate, error)
}

CandidateGenerator generates candidate vectors for approximate search.

Implementations:

  • BruteForceCandidateGen: Exact search over all vectors (for small N)
  • HNSWCandidateGen: Approximate search using HNSW graph (for large N)

The generator returns candidate IDs and approximate scores. These candidates will be re-scored exactly by ExactScorer before final ranking.

type CompressedANNProfile

type CompressedANNProfile struct {
	Quality             ANNQuality
	Active              bool
	Dimensions          int
	VectorCount         int
	IVFLists            int
	PQSegments          int
	PQBits              int
	NProbe              int
	RerankTopK          int
	TrainingSampleMax   int
	KMeansMaxIterations int
	SeedMaxTerms        int
	SeedDocsPerTerm     int
	RoutingMode         string
	Diagnostics         []CompressedActivationDiagnostic
}

CompressedANNProfile is the resolved runtime contract for compressed ANN mode.

func ResolveCompressedANNProfile

func ResolveCompressedANNProfile(vectorCount, dimensions int, vectorStoreReady bool) CompressedANNProfile

ResolveCompressedANNProfile resolves compressed ANN settings and readiness. Shared knobs are reused directly where semantically compatible.

type CompressedActivationDiagnostic

type CompressedActivationDiagnostic struct {
	Code    string
	Message string
}

CompressedActivationDiagnostic captures deterministic activation decisions.

type CrossEncoder

type CrossEncoder struct {
	// contains filtered or unexported fields
}

CrossEncoder performs cross-encoder reranking.

func NewCrossEncoder

func NewCrossEncoder(config *CrossEncoderConfig) *CrossEncoder

NewCrossEncoder creates a new cross-encoder reranker.

func (*CrossEncoder) Config

func (ce *CrossEncoder) Config() *CrossEncoderConfig

Config returns the current configuration.

func (*CrossEncoder) Enabled

func (ce *CrossEncoder) Enabled() bool

func (*CrossEncoder) IsAvailable

func (ce *CrossEncoder) IsAvailable(ctx context.Context) bool

IsAvailable checks if the reranking service is available.

func (*CrossEncoder) Name

func (ce *CrossEncoder) Name() string

func (*CrossEncoder) Rerank

func (ce *CrossEncoder) Rerank(ctx context.Context, query string, candidates []RerankCandidate) ([]RerankResult, error)

Rerank takes a query and candidates, returns reranked results.

type CrossEncoderConfig

type CrossEncoderConfig struct {
	// Enabled turns on cross-encoder reranking
	Enabled bool

	// APIURL is the reranking service endpoint
	// Supports: Cohere, HuggingFace TEI, local models
	APIURL string

	// APIKey for authentication (if required)
	APIKey string

	// Model name (e.g., "cross-encoder/ms-marco-MiniLM-L-6-v2")
	Model string

	// TopK is how many candidates to rerank (default: 100)
	TopK int

	// Timeout for reranking requests
	Timeout time.Duration

	// MinScore is the minimum rerank score to include (0-1)
	MinScore float64
}

CrossEncoderConfig configures the cross-encoder reranker.

func DefaultCrossEncoderConfig

func DefaultCrossEncoderConfig() *CrossEncoderConfig

DefaultCrossEncoderConfig returns sensible defaults.

type ExactScorer

type ExactScorer interface {
	// ScoreCandidates computes exact similarity scores for the given candidates.
	//
	// Parameters:
	//   - ctx: Context for cancellation
	//   - query: Normalized query vector
	//   - candidates: Candidate IDs to score (may include approximate scores for reference)
	//
	// Returns:
	//   - scored: Candidates with exact scores
	//   - error: Context cancellation or other errors
	ScoreCandidates(ctx context.Context, query []float32, candidates []Candidate) ([]ScoredCandidate, error)
}

ExactScorer computes exact similarity scores for candidate vectors.

Implementations:

  • CPUExactScorer: CPU-based exact scoring (SIMD-optimized)
  • GPUExactScorer: GPU-accelerated exact scoring (when available)

The scorer takes candidate IDs and returns exact scores using the true metric (cosine/dot/euclid), ensuring final ranking accuracy.

type FileStoreBruteForceCandidateGen

type FileStoreBruteForceCandidateGen struct {
	// contains filtered or unexported fields
}

FileStoreBruteForceCandidateGen implements CandidateGenerator using brute-force search directly over the file-backed vector store. This is used for small datasets when vectors are not held in-memory.

func NewFileStoreBruteForceCandidateGen

func NewFileStoreBruteForceCandidateGen(vectorStore *VectorFileStore) *FileStoreBruteForceCandidateGen

NewFileStoreBruteForceCandidateGen creates a brute-force candidate generator over VectorFileStore.

func (*FileStoreBruteForceCandidateGen) SearchCandidates

func (b *FileStoreBruteForceCandidateGen) SearchCandidates(ctx context.Context, query []float32, k int, minSimilarity float64) ([]Candidate, error)

SearchCandidates scans all vectors from the file store and returns top candidates by exact cosine score.

type FulltextBatchEntry

type FulltextBatchEntry struct {
	ID   string
	Text string
}

FulltextBatchEntry is one (id, text) pair for IndexBatch.

type FulltextIndex

type FulltextIndex struct {
	// contains filtered or unexported fields
}

FulltextIndex provides BM25-based full-text search. It indexes documents and supports keyword search with TF-IDF scoring.

func NewFulltextIndex

func NewFulltextIndex() *FulltextIndex

NewFulltextIndex creates a new full-text search index.

func (*FulltextIndex) Clear

func (f *FulltextIndex) Clear()

Clear removes all indexed documents and terms. This is used to free RAM after persisting during bulk builds.

func (*FulltextIndex) Count

func (f *FulltextIndex) Count() int

Count returns the number of indexed documents.

func (*FulltextIndex) GetDocument

func (f *FulltextIndex) GetDocument(id string) (string, bool)

GetDocument retrieves the original text for a document.

func (*FulltextIndex) Index

func (f *FulltextIndex) Index(id string, text string)

Index adds or updates a document in the index.

func (*FulltextIndex) IndexBatch

func (f *FulltextIndex) IndexBatch(entries []FulltextBatchEntry)

IndexBatch adds or updates many documents under one lock and updates avgDocLength once at the end. Use this during bulk build to reduce lock contention and avoid O(N) avg updates.

func (*FulltextIndex) IsDirty

func (f *FulltextIndex) IsDirty() bool

IsDirty reports whether the index has changes not yet persisted to disk.

func (*FulltextIndex) LexicalSeedDocIDs

func (f *FulltextIndex) LexicalSeedDocIDs(maxTerms, docsPerTerm int) []string

LexicalSeedDocIDs returns document IDs selected from high-IDF terms to provide lexical priors for clustering seed selection.

func (*FulltextIndex) Load

func (f *FulltextIndex) Load(path string) error

Load replaces the fulltext index with the one stored at path (msgpack format). If the file does not exist or decode fails, the index is left empty (or unchanged on missing file) and no error is returned, so the caller can proceed with a full rebuild. Returns an error only for unexpected I/O (e.g. permission denied).

func (*FulltextIndex) PhraseSearch

func (f *FulltextIndex) PhraseSearch(phrase string, limit int) []indexResult

PhraseSearch searches for an exact phrase match. Returns documents containing the exact phrase.

func (*FulltextIndex) Remove

func (f *FulltextIndex) Remove(id string)

Remove removes a document from the index.

func (*FulltextIndex) Save

func (f *FulltextIndex) Save(path string) error

Save writes the fulltext index to path (msgpack format). Dir is created if needed. Copies index data under a short read lock so I/O does not block Search/IndexNode.

func (*FulltextIndex) SaveNoCopy

func (f *FulltextIndex) SaveNoCopy(path string) error

SaveNoCopy writes the fulltext index to path without deep-copying the maps. This avoids doubling memory during large builds, but holds the read lock during I/O. Use during BuildIndexes when search is not serving live traffic.

func (*FulltextIndex) Search

func (f *FulltextIndex) Search(query string, limit int) []indexResult

Search performs BM25 keyword search. Returns results sorted by BM25 score (highest first).

type FulltextIndexV2

type FulltextIndexV2 struct {
	// contains filtered or unexported fields
}

FulltextIndexV2 provides a BM25 index optimized for large datasets. It stores compact postings (docNum/tf) and uses bounded prefix expansion + top-k scoring.

func NewFulltextIndexV2

func NewFulltextIndexV2() *FulltextIndexV2

func (*FulltextIndexV2) Clear

func (f *FulltextIndexV2) Clear()

func (*FulltextIndexV2) Count

func (f *FulltextIndexV2) Count() int

func (*FulltextIndexV2) GetDocument

func (f *FulltextIndexV2) GetDocument(id string) (string, bool)

func (*FulltextIndexV2) Index

func (f *FulltextIndexV2) Index(id string, text string)

func (*FulltextIndexV2) IndexBatch

func (f *FulltextIndexV2) IndexBatch(entries []FulltextBatchEntry)

func (*FulltextIndexV2) IsDirty

func (f *FulltextIndexV2) IsDirty() bool

func (*FulltextIndexV2) LexicalSeedDocIDs

func (f *FulltextIndexV2) LexicalSeedDocIDs(maxTerms, docsPerTerm int) []string

func (*FulltextIndexV2) Load

func (f *FulltextIndexV2) Load(path string) error

func (*FulltextIndexV2) PhraseSearch

func (f *FulltextIndexV2) PhraseSearch(phrase string, limit int) []indexResult

func (*FulltextIndexV2) Remove

func (f *FulltextIndexV2) Remove(id string)

func (*FulltextIndexV2) Save

func (f *FulltextIndexV2) Save(path string) error

func (*FulltextIndexV2) SaveNoCopy

func (f *FulltextIndexV2) SaveNoCopy(path string) error

func (*FulltextIndexV2) Search

func (f *FulltextIndexV2) Search(query string, limit int) []indexResult

type GPUBruteForceCandidateGen

type GPUBruteForceCandidateGen struct {
	// contains filtered or unexported fields
}

GPUBruteForceCandidateGen uses gpu.EmbeddingIndex.Search() as an exact candidate generator.

Note: gpu.EmbeddingIndex.Search() does not currently accept a context, so this generator cannot cancel mid-kernel. It checks ctx before invoking the search.

func NewGPUBruteForceCandidateGen

func NewGPUBruteForceCandidateGen(embeddingIndex *gpu.EmbeddingIndex) *GPUBruteForceCandidateGen

func (*GPUBruteForceCandidateGen) SearchCandidates

func (g *GPUBruteForceCandidateGen) SearchCandidates(ctx context.Context, query []float32, k int, minSimilarity float64) ([]Candidate, error)

type GPUExactScorer

type GPUExactScorer struct {
	// contains filtered or unexported fields
}

GPUExactScorer implements ExactScorer using GPU-accelerated exact scoring.

Falls back to CPU if GPU is unavailable or unhealthy.

func NewGPUExactScorer

func NewGPUExactScorer(embeddingIndex *gpu.EmbeddingIndex, cpuFallback *CPUExactScorer) *GPUExactScorer

NewGPUExactScorer creates a new GPU-based exact scorer.

func (*GPUExactScorer) ScoreCandidates

func (g *GPUExactScorer) ScoreCandidates(ctx context.Context, query []float32, candidates []Candidate) ([]ScoredCandidate, error)

ScoreCandidates computes exact scores for candidates using GPU when available.

Note: Since EmbeddingIndex.Search() searches all vectors, we use it for candidate scoring by searching with a large k, then filtering to only the candidates we care about. This is not optimal but works with the current EmbeddingIndex API. A future PR will add ScoreSubset() for true subset scoring.
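The workaround described above (search with a large k, then keep only the requested candidates) can be sketched as follows, with illustrative types rather than the package's API:

```go
package main

import "fmt"

type scored struct {
	ID    string
	Score float64
}

// scoreViaFullSearch filters a full-index search result down to the
// requested candidate subset. This mirrors the large-k approach; a real
// ScoreSubset would score only the subset in the first place.
func scoreViaFullSearch(all []scored, candidateIDs []string) []scored {
	want := make(map[string]bool, len(candidateIDs))
	for _, id := range candidateIDs {
		want[id] = true
	}
	out := make([]scored, 0, len(candidateIDs))
	for _, r := range all {
		if want[r.ID] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	all := []scored{{"a", 0.9}, {"b", 0.8}, {"c", 0.7}}
	fmt.Println(scoreViaFullSearch(all, []string{"c", "a"}))
}
```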

type GPUKMeansCandidateGen

type GPUKMeansCandidateGen struct {
	// contains filtered or unexported fields
}

GPUKMeansCandidateGen routes queries to the nearest k-means clusters and then scores only the vectors in those clusters using the GPU embedding index when available.

This is used as a high-throughput fallback when:

  • GPU brute-force is enabled but the dataset is outside the configured full-scan range, and
  • clustering is available (centroids + cluster membership already built).

It preserves correctness by returning exact cosine scores for the candidate set. When the GPU cannot score (not synced / unhealthy), ScoreSubset falls back to CPU.

func NewGPUKMeansCandidateGen

func NewGPUKMeansCandidateGen(clusterIndex *gpu.ClusterIndex, numClustersToSearch int) *GPUKMeansCandidateGen

func (*GPUKMeansCandidateGen) SearchCandidates

func (g *GPUKMeansCandidateGen) SearchCandidates(ctx context.Context, query []float32, k int, minSimilarity float64) ([]Candidate, error)

func (*GPUKMeansCandidateGen) SetClusterSelector

func (g *GPUKMeansCandidateGen) SetClusterSelector(fn func(ctx context.Context, query []float32, defaultN int) []int) *GPUKMeansCandidateGen

SetClusterSelector sets an optional custom cluster selector used for routing.

type HNSWCandidateGen

type HNSWCandidateGen struct {
	// contains filtered or unexported fields
}

HNSWCandidateGen implements CandidateGenerator using HNSW approximate search.

This is optimal for large datasets (N >= NSmallMax) where ANN provides significant speedup over brute-force.

func NewHNSWCandidateGen

func NewHNSWCandidateGen(hnswIndex *HNSWIndex) *HNSWCandidateGen

NewHNSWCandidateGen creates a new HNSW candidate generator.

func (*HNSWCandidateGen) SearchCandidates

func (h *HNSWCandidateGen) SearchCandidates(ctx context.Context, query []float32, k int, minSimilarity float64) ([]Candidate, error)

SearchCandidates generates candidates using HNSW approximate search.

type HNSWConfig

type HNSWConfig struct {
	M               int     // Max connections per node per layer (default: 16)
	EfConstruction  int     // Candidate list size during construction (default: 200)
	EfSearch        int     // Candidate list size during search (default: 100)
	LevelMultiplier float64 // Level multiplier = 1/ln(M)
}

HNSWConfig contains configuration parameters for the HNSW index.

func DefaultHNSWConfig

func DefaultHNSWConfig() HNSWConfig

DefaultHNSWConfig returns sensible defaults for HNSW index.

func HNSWConfigFromEnv

func HNSWConfigFromEnv() HNSWConfig

HNSWConfigFromEnv loads HNSW configuration from environment variables.

Environment Variables:

  • NORNICDB_VECTOR_ANN_QUALITY: Quality preset (fast|balanced|accurate|compressed, default: fast)
  • NORNICDB_VECTOR_HNSW_M: Max connections per node (default: based on preset)
  • NORNICDB_VECTOR_HNSW_EF_CONSTRUCTION: Candidate list size during construction (default: based on preset)
  • NORNICDB_VECTOR_HNSW_EF_SEARCH: Candidate list size during search (default: based on preset)

Quality Presets:

  • fast: M=16, efConstruction=100, efSearch=50 (faster, lower recall)
  • balanced: M=16, efConstruction=200, efSearch=100 (good balance)
  • accurate: M=32, efConstruction=400, efSearch=200 (slower, higher recall)
  • compressed: accepted at the ANN quality layer; currently maps to the fast HNSW defaults until compressed ANN strategy routing is active

Example:

// Use fast preset
os.Setenv("NORNICDB_VECTOR_ANN_QUALITY", "fast")
config := HNSWConfigFromEnv()

// Override specific parameter
os.Setenv("NORNICDB_VECTOR_ANN_QUALITY", "balanced")
os.Setenv("NORNICDB_VECTOR_HNSW_EF_SEARCH", "150")
config := HNSWConfigFromEnv()

type HNSWIndex

type HNSWIndex struct {
	// contains filtered or unexported fields
}

HNSWIndex provides fast approximate nearest neighbor search using HNSW algorithm.

func LoadHNSWIndex

func LoadHNSWIndex(path string, vectorLookup VectorLookup) (*HNSWIndex, error)

LoadHNSWIndex loads an HNSW index from path (msgpack format) and returns it. For graph-only format (1.1.0), vectorLookup must be non-nil and vectors are resolved by ID at search time (no in-memory vector copy in HNSW). For legacy full format (1.0.0), vectorLookup is ignored. If the file does not exist or decode fails, returns (nil, nil) so the caller can rebuild. Returns an error only for unexpected I/O (e.g. permission denied).

func LoadIVFHNSWCluster

func LoadIVFHNSWCluster(hnswPath string, clusterID int, vectorLookup VectorLookup) (*HNSWIndex, error)

LoadIVFHNSWCluster loads one cluster's HNSW index from hnsw_ivf/cid in lookup mode (no vector copy in HNSW RAM). Returns (nil, nil) if the file is missing or invalid (caller can build). hnswPath is the full path to the single HNSW file (e.g. data/search/dbname/hnsw).

func NewHNSWIndex

func NewHNSWIndex(dimensions int, config HNSWConfig) *HNSWIndex

NewHNSWIndex creates a new HNSW index with the given dimensions and config.

func (*HNSWIndex) Add

func (h *HNSWIndex) Add(id string, vec []float32) error

Add inserts a vector into the index.

func (*HNSWIndex) Clear

func (h *HNSWIndex) Clear()

Clear removes all vectors from the index and resets it to an empty state. This frees memory by clearing all internal arrays and maps. Use this when you need to completely reset the index (e.g., after deleting a collection).

func (*HNSWIndex) Config

func (h *HNSWIndex) Config() HNSWConfig

Config returns a copy of the index configuration.

func (*HNSWIndex) GetDimensions

func (h *HNSWIndex) GetDimensions() int

GetDimensions returns the vector dimension of the index.

func (*HNSWIndex) Remove

func (h *HNSWIndex) Remove(id string)

Remove removes a vector from the index by ID.

func (*HNSWIndex) Save

func (h *HNSWIndex) Save(path string) error

Save writes the HNSW index to path (msgpack format) as graph-only: graph structure and IDs only, no vector data. Vectors are always loaded from the vector index (vectors) on load, so the file stays small. The directory is created if needed. Copies index data under a short read lock so I/O does not block Search/Add/Remove.

func (*HNSWIndex) Search

func (h *HNSWIndex) Search(ctx context.Context, query []float32, k int, minSimilarity float64) ([]ANNResult, error)

Search finds the k nearest neighbors to the query vector.

func (*HNSWIndex) SearchWithEf

func (h *HNSWIndex) SearchWithEf(ctx context.Context, query []float32, k int, minSimilarity float64, ef int) ([]ANNResult, error)

SearchWithEf finds the k nearest neighbors using a caller-provided `ef`.

In Qdrant terms, `ef` is the beam size for HNSW search: larger values improve recall and usually increase latency. If `ef <= 0`, this falls back to the index's configured `EfSearch`.

func (*HNSWIndex) SetVectorLookup

func (h *HNSWIndex) SetVectorLookup(lookup VectorLookup)

SetVectorLookup sets an optional lookup so vectors are resolved by ID at search time instead of from the in-memory slice. When set, Add does not store vectors (vecOff = -1); Load can leave vectors empty and use the lookup. Saves one full vector copy in RAM.

func (*HNSWIndex) ShouldRebuild

func (h *HNSWIndex) ShouldRebuild() bool

ShouldRebuild returns true if the index has accumulated too many tombstones and should be rebuilt to free memory. Threshold is 50% deleted vectors.

func (*HNSWIndex) Size

func (h *HNSWIndex) Size() int

Size returns the number of vectors in the index.

func (*HNSWIndex) TombstoneRatio

func (h *HNSWIndex) TombstoneRatio() float64

TombstoneRatio returns the ratio of deleted vectors to total vectors. Returns 0.0 if there are no vectors. A high ratio (>0.5) indicates the index should be rebuilt to free memory.
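The ratio and rebuild check described above amount to the following (a sketch of the documented semantics; whether the 50% comparison is strict is an assumption):

```go
package main

import "fmt"

// tombstoneRatio mirrors the documented semantics: deleted/total,
// 0.0 when the index is empty.
func tombstoneRatio(deleted, total int) float64 {
	if total == 0 {
		return 0.0
	}
	return float64(deleted) / float64(total)
}

// shouldRebuild applies the documented 50% threshold.
func shouldRebuild(deleted, total int) bool {
	return tombstoneRatio(deleted, total) > 0.5
}

func main() {
	// 600 tombstoned out of 1000 vectors: past the threshold, rebuild.
	fmt.Println(tombstoneRatio(600, 1000), shouldRebuild(600, 1000))
}
```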

func (*HNSWIndex) Update

func (h *HNSWIndex) Update(id string, vec []float32) error

Update updates an existing vector in the index.

Update policy: Remove + Add pattern

  • Removes the old vector and all its connections
  • Adds the new vector with fresh connections
  • This ensures graph structure is correctly maintained

If the vector doesn't exist, this is equivalent to Add().

Performance: O(M * log(N)), where M is the max connections and N is the dataset size. For high-churn workloads, consider periodic rebuilds to restore graph quality.

type HNSWQualityPreset

type HNSWQualityPreset string

HNSWQualityPreset defines quality presets for HNSW tuning.

const (
	// QualityFast prioritizes speed over recall.
	// Lower efSearch, lower candidate_multiplier for faster queries.
	QualityFast HNSWQualityPreset = "fast"

	// QualityBalanced provides a good balance between speed and recall.
	// Default preset with reasonable defaults.
	QualityBalanced HNSWQualityPreset = "balanced"

	// QualityAccurate prioritizes recall over speed.
	// Higher efSearch and/or bigger candidate pool for better accuracy.
	QualityAccurate HNSWQualityPreset = "accurate"
)

type IVFHNSWCandidateGen

type IVFHNSWCandidateGen struct {
	// contains filtered or unexported fields
}

IVFHNSWCandidateGen implements IVF-HNSW: centroid routing (IVF) into per-cluster HNSW indexes.

This is designed for large CPU-only datasets:

  • K-means provides a coarse routing layer (choose nearest clusters)
  • Per-cluster HNSW provides fast ANN within each cluster

For GPU-enabled setups, prefer the existing GPU brute-force and GPU k-means paths.

func NewIVFHNSWCandidateGen

func NewIVFHNSWCandidateGen(clusterIndex *gpu.ClusterIndex, getClusterHNSW clusterHNSWLookup, numClustersToSearch int) *IVFHNSWCandidateGen

func (*IVFHNSWCandidateGen) SearchCandidates

func (g *IVFHNSWCandidateGen) SearchCandidates(ctx context.Context, query []float32, k int, minSimilarity float64) ([]Candidate, error)

func (*IVFHNSWCandidateGen) SetClusterSelector

func (g *IVFHNSWCandidateGen) SetClusterSelector(fn func(ctx context.Context, query []float32, defaultN int) []int) *IVFHNSWCandidateGen

SetClusterSelector sets an optional custom cluster selector used for routing.

type IVFPQBuildStats

type IVFPQBuildStats struct {
	VectorCount         int
	TrainingSampleCount int
	ListCount           int
	AvgListSize         float64
	MaxListSize         int
	BytesPerVector      float64
	BuildDuration       time.Duration
}

IVFPQBuildStats captures build observability for acceptance gates.

type IVFPQCandidateGen

type IVFPQCandidateGen struct {
	// contains filtered or unexported fields
}

IVFPQCandidateGen implements CandidateGenerator using IVF/PQ compressed ANN.

func NewIVFPQCandidateGen

func NewIVFPQCandidateGen(index *IVFPQIndex, nprobe int) *IVFPQCandidateGen

func (*IVFPQCandidateGen) SearchCandidates

func (g *IVFPQCandidateGen) SearchCandidates(ctx context.Context, query []float32, k int, minSimilarity float64) ([]Candidate, error)

type IVFPQIndex

type IVFPQIndex struct {
	// contains filtered or unexported fields
}

IVFPQIndex stores a compressed IVF/PQ ANN structure.

func LoadIVFPQBundle

func LoadIVFPQBundle(basePath string) (*IVFPQIndex, error)

LoadIVFPQBundle loads an IVFPQ multipart snapshot bundle.

func (*IVFPQIndex) Count

func (i *IVFPQIndex) Count() int

func (*IVFPQIndex) Profile

func (i *IVFPQIndex) Profile() IVFPQProfile

func (*IVFPQIndex) SearchApprox

func (i *IVFPQIndex) SearchApprox(ctx context.Context, query []float32, k int, minSimilarity float64, nprobe int) ([]Candidate, error)

SearchApprox searches compressed IVF/PQ lists and returns approximate candidates.

type IVFPQProfile

type IVFPQProfile struct {
	Dimensions          int
	IVFLists            int
	PQSegments          int
	PQBits              int
	NProbe              int
	RerankTopK          int
	TrainingSampleMax   int
	KMeansMaxIterations int
}

IVFPQProfile is the concrete runtime profile used to build/query compressed ANN.

type IdentityExactScorer

type IdentityExactScorer struct{}

IdentityExactScorer is used when the candidate generator already returns exact scores.

func (*IdentityExactScorer) ScoreCandidates

func (i *IdentityExactScorer) ScoreCandidates(ctx context.Context, query []float32, candidates []Candidate) ([]ScoredCandidate, error)

type KMeansCandidateGen

type KMeansCandidateGen struct {
	// contains filtered or unexported fields
}

KMeansCandidateGen implements CandidateGenerator using k-means cluster routing.

This candidate generator:

  1. Finds the top numClustersToSearch clusters nearest to the query
  2. Gets all node IDs from those clusters as candidates
  3. Returns candidates with approximate scores (centroid similarity)

This is optimal for very large datasets (N > 100K) where cluster routing provides significant speedup. For smaller datasets, HNSW or brute-force may be faster.
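Step 1 of this routing, picking the nearest clusters by centroid similarity, can be sketched as follows (a standalone illustration, not the package's implementation):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// nearestClusters returns the indices of the n centroids most similar
// to the query, i.e. the clusters whose members become candidates.
func nearestClusters(query []float32, centroids [][]float32, n int) []int {
	idx := make([]int, len(centroids))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool {
		return cosine(query, centroids[idx[a]]) > cosine(query, centroids[idx[b]])
	})
	if n > len(idx) {
		n = len(idx)
	}
	return idx[:n]
}

func main() {
	centroids := [][]float32{{1, 0}, {0, 1}, {0.9, 0.1}}
	// The query is nearly aligned with centroid 0, then centroid 2.
	fmt.Println(nearestClusters([]float32{1, 0.05}, centroids, 2))
}
```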

func NewKMeansCandidateGen

func NewKMeansCandidateGen(clusterIndex *gpu.ClusterIndex, vectorIndex *VectorIndex, numClustersToSearch int) *KMeansCandidateGen

NewKMeansCandidateGen creates a new k-means candidate generator.

Parameters:

  • clusterIndex: The GPU ClusterIndex (must be clustered)
  • vectorIndex: The VectorIndex for ID mapping and fallback
  • numClustersToSearch: Number of clusters to search (default: 3)

func (*KMeansCandidateGen) SearchCandidates

func (k *KMeansCandidateGen) SearchCandidates(ctx context.Context, query []float32, limit int, minSimilarity float64) ([]Candidate, error)

SearchCandidates generates candidates using k-means cluster routing.

Algorithm:

  1. Find top numClustersToSearch clusters nearest to query (centroid similarity)
  2. Get all node IDs from those clusters
  3. Return candidates with approximate scores (centroid similarity)

If clustering is not available or not clustered, falls back to brute-force.

func (*KMeansCandidateGen) SetClusterSelector

func (k *KMeansCandidateGen) SetClusterSelector(fn func(ctx context.Context, query []float32, defaultN int) []int) *KMeansCandidateGen

SetClusterSelector sets an optional custom cluster selector used for routing. When nil, the generator uses the default nearest-centroid routing.

type KalmanSearchAdapter

type KalmanSearchAdapter struct {
	// contains filtered or unexported fields
}

KalmanSearchAdapter wraps a search Service with Kalman filtering.

func NewKalmanSearchAdapter

func NewKalmanSearchAdapter(service *Service, config KalmanSearchConfig) *KalmanSearchAdapter

NewKalmanSearchAdapter creates a new Kalman-enhanced search adapter.

Example:

service := search.NewService(engine)
adapter := search.NewKalmanSearchAdapter(service, search.DefaultKalmanSearchConfig())

// Search with enhanced ranking
results, _ := adapter.Search(ctx, "semantic search", embedding, &search.SearchOptions{Limit: 10})

func (*KalmanSearchAdapter) GetDocumentRelevanceTrend

func (ka *KalmanSearchAdapter) GetDocumentRelevanceTrend(docID string) float64

GetDocumentRelevanceTrend returns whether a document is becoming more or less relevant.

Returns:

  • positive velocity: document is becoming more relevant (appearing higher)
  • negative velocity: document is becoming less relevant (appearing lower)
  • zero: stable relevance

func (*KalmanSearchAdapter) GetFallingDocuments

func (ka *KalmanSearchAdapter) GetFallingDocuments(maxVelocity float64) []string

GetFallingDocuments returns documents whose relevance is decreasing.

func (*KalmanSearchAdapter) GetLatencyTrend

func (ka *KalmanSearchAdapter) GetLatencyTrend() float64

GetLatencyTrend returns the current latency velocity (positive = getting slower).

func (*KalmanSearchAdapter) GetPredictedLatency

func (ka *KalmanSearchAdapter) GetPredictedLatency(stepsAhead int) float64

GetPredictedLatency returns the predicted latency for the next query.

func (*KalmanSearchAdapter) GetRisingDocuments

func (ka *KalmanSearchAdapter) GetRisingDocuments(minVelocity float64) []string

GetRisingDocuments returns documents whose relevance is increasing.

func (*KalmanSearchAdapter) GetService

func (ka *KalmanSearchAdapter) GetService() *Service

GetService returns the underlying search service.

func (*KalmanSearchAdapter) GetStats

func (ka *KalmanSearchAdapter) GetStats() SearchAdapterStats

GetStats returns adapter statistics.

func (*KalmanSearchAdapter) Reset

func (ka *KalmanSearchAdapter) Reset()

Reset clears all cached data and filters.

func (*KalmanSearchAdapter) Search

func (ka *KalmanSearchAdapter) Search(ctx context.Context, query string, embedding []float32, opts *SearchOptions) (*SearchResponse, error)

Search performs a Kalman-enhanced search.

The enhancement pipeline:

  1. Execute underlying search
  2. Smooth similarity scores with Kalman filter
  3. Apply ranking stability boost
  4. Track latency for prediction
  5. Return enhanced results
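Step 2, score smoothing, can be illustrated with a scalar Kalman filter over a noisy score series (q and r are illustrative noise values, not the adapter's actual filter.Config):

```go
package main

import "fmt"

// kalman1D smooths a noisy scalar series with a constant-position model.
// q is process noise, r is measurement noise.
func kalman1D(obs []float64, q, r float64) []float64 {
	x, p := obs[0], 1.0
	out := make([]float64, len(obs))
	out[0] = x
	for i := 1; i < len(obs); i++ {
		p += q                // predict: uncertainty grows
		k := p / (p + r)      // Kalman gain
		x += k * (obs[i] - x) // update toward the new observation
		p *= 1 - k
		out[i] = x
	}
	return out
}

func main() {
	// Noisy similarity scores for the same document across queries.
	scores := []float64{0.80, 0.84, 0.78, 0.82, 0.81}
	fmt.Printf("%.3f\n", kalman1D(scores, 0.001, 0.05))
}
```

The smoothed series moves toward each observation without fully following it, which is what damps rank flapping between near-identical queries.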

type KalmanSearchConfig

type KalmanSearchConfig struct {
	// EnableScoreSmoothing enables Kalman filtering of similarity scores
	EnableScoreSmoothing bool

	// EnableRankingStability applies stability boost to consistent results
	EnableRankingStability bool

	// EnableLatencyPrediction tracks and predicts query latency
	EnableLatencyPrediction bool

	// SimilarityConfig for score smoothing
	SimilarityConfig filter.Config

	// LatencyConfig for latency prediction
	LatencyConfig filter.Config

	// StabilityBoost is the boost factor for consistently-ranked documents
	StabilityBoost float64

	// StabilityWindow is how many recent queries to consider for stability
	StabilityWindow int

	// ScoreHistoryLimit is max history per document
	ScoreHistoryLimit int
}

KalmanSearchConfig holds configuration for the Kalman-enhanced search adapter.

func DefaultKalmanSearchConfig

func DefaultKalmanSearchConfig() KalmanSearchConfig

DefaultKalmanSearchConfig returns sensible defaults.

type LLMFunc

type LLMFunc func(ctx context.Context, prompt string) (string, error)

LLMFunc is a minimal, dependency-free function signature for calling an LLM. It is intentionally generic so the search package does not depend on Heimdall.

Implementations should be safe for concurrent use.

type LLMReranker

type LLMReranker struct {
	// contains filtered or unexported fields
}

LLMReranker performs Stage-2 reranking using an LLM (e.g., Heimdall).

The model is prompted to output JSON ranking information using candidate indices (not IDs) to keep responses short and parsing robust.

func NewLLMReranker

func NewLLMReranker(cfg *LLMRerankerConfig, llm LLMFunc) *LLMReranker

NewLLMReranker creates a new LLM reranker.

func (*LLMReranker) Enabled

func (r *LLMReranker) Enabled() bool

func (*LLMReranker) IsAvailable

func (r *LLMReranker) IsAvailable(ctx context.Context) bool

func (*LLMReranker) Name

func (r *LLMReranker) Name() string

func (*LLMReranker) Rerank

func (r *LLMReranker) Rerank(ctx context.Context, query string, candidates []RerankCandidate) ([]RerankResult, error)

Rerank takes a query and candidates, returns reranked results.

It is fail-open: if the LLM errors or returns malformed output, it returns the original ranking (pass-through).

type LLMRerankerConfig

type LLMRerankerConfig struct {
	Enabled bool

	// Timeout bounds how long a single LLM rerank call can take.
	Timeout time.Duration

	// MaxCandidates caps how many candidates are included in one prompt.
	// This protects against long prompts on large candidate sets.
	MaxCandidates int

	// MaxDocChars truncates each candidate's text content to this many characters.
	MaxDocChars int

	// MaxQueryChars truncates the query string included in the prompt.
	MaxQueryChars int

	// MinScore filters candidates whose LLM score is below this value.
	// A value of 0 means "no filtering".
	MinScore float64
}

LLMRerankerConfig controls LLM-based reranking behavior.

This reranker is designed to be "fail-open": on errors or malformed output, it returns the original candidate order (pass-through).

func DefaultLLMRerankerConfig

func DefaultLLMRerankerConfig() *LLMRerankerConfig

DefaultLLMRerankerConfig returns conservative defaults intended for small local models.

type LocalReranker

type LocalReranker struct {
	// contains filtered or unexported fields
}

LocalReranker performs Stage-2 reranking using a local GGUF reranker model.

func NewLocalReranker

func NewLocalReranker(scorer RerankScorer, cfg *LocalRerankerConfig) *LocalReranker

NewLocalReranker creates a reranker that uses the given scorer (e.g. *localllm.RerankerModel).

func (*LocalReranker) Enabled

func (r *LocalReranker) Enabled() bool

func (*LocalReranker) IsAvailable

func (r *LocalReranker) IsAvailable(ctx context.Context) bool

func (*LocalReranker) Name

func (r *LocalReranker) Name() string

func (*LocalReranker) Rerank

func (r *LocalReranker) Rerank(ctx context.Context, query string, candidates []RerankCandidate) ([]RerankResult, error)

Rerank scores each candidate with the scorer and returns results sorted by score (desc). Fail-open: on error returns original order.

type LocalRerankerConfig

type LocalRerankerConfig struct {
	Enabled bool

	Timeout time.Duration

	// MaxCandidates caps how many candidates are scored per query.
	MaxCandidates int

	// MaxDocChars truncates each candidate's content before scoring.
	MaxDocChars int

	// MinScore filters out candidates below this score (0 = no filter).
	MinScore float64
}

LocalRerankerConfig configures local GGUF reranker behavior.

func DefaultLocalRerankerConfig

func DefaultLocalRerankerConfig() *LocalRerankerConfig

DefaultLocalRerankerConfig returns sensible defaults for BGE-style reranking.

type NodeIterator

type NodeIterator interface {
	IterateNodes(fn func(*storage.Node) bool) error
}

NodeIterator is an interface for streaming node iteration.

type RerankCandidate

type RerankCandidate struct {
	ID      string
	Content string
	Score   float64 // Original score (from bi-encoder)
}

RerankCandidate represents a document to be reranked.

type RerankResult

type RerankResult struct {
	ID           string
	Content      string
	OriginalRank int
	NewRank      int
	BiScore      float64 // Original bi-encoder score
	CrossScore   float64 // Cross-encoder score
	FinalScore   float64 // Combined or cross-encoder score
}

RerankResult is a reranked document with new score.

type RerankScorer

type RerankScorer interface {
	Score(ctx context.Context, query, document string) (float32, error)
}

RerankScorer scores a (query, document) pair for relevance. Implementations are typically *localllm.RerankerModel (loaded from GGUF).

type Reranker

type Reranker interface {
	// Name identifies the reranker implementation for observability.
	// Examples: "cross_encoder", "heimdall_llm".
	Name() string

	// Enabled indicates whether reranking is configured/enabled.
	Enabled() bool

	// IsAvailable is an optional health check. Implementations should keep this
	// cheap; Search() should not call it on the hot path.
	IsAvailable(ctx context.Context) bool

	// Rerank reorders candidates for a query.
	Rerank(ctx context.Context, query string, candidates []RerankCandidate) ([]RerankResult, error)
}

Reranker is a Stage-2 reranking component.

Implementations MUST be fail-open: if reranking cannot be performed (service unavailable, parse error, timeout), they should return a pass-through ranking rather than failing the overall search request.

type ScoredCandidate

type ScoredCandidate struct {
	ID    string
	Score float64 // Exact score from exact scorer
}

ScoredCandidate represents a candidate with exact score.

type SearchAdapterStats

type SearchAdapterStats struct {
	TotalQueries       int64
	ScoresSmoothed     int64
	StabilityApplied   int64
	LatencyPredictions int64
	AverageLatencyMs   float64
	PredictedLatencyMs float64
}

SearchAdapterStats holds adapter statistics.

type SearchCandidate

type SearchCandidate struct {
	ID    string
	Score float64
}

SearchCandidate is a lightweight vector-search result: just the ID and score.

This is intended for high-throughput call paths that don’t require node enrichment (e.g. Qdrant-compatible gRPC searches that return IDs+scores).

type SearchMetrics

type SearchMetrics struct {
	VectorSearchTimeMs int `json:"vector_search_time_ms"`
	BM25SearchTimeMs   int `json:"bm25_search_time_ms"`
	FusionTimeMs       int `json:"fusion_time_ms"`
	TotalTimeMs        int `json:"total_time_ms"`
	VectorCandidates   int `json:"vector_candidates"`
	BM25Candidates     int `json:"bm25_candidates"`
	FusedCandidates    int `json:"fused_candidates"`
}

SearchMetrics contains timing and statistics.

type SearchOptions

type SearchOptions struct {
	// Limit is the maximum number of results to return
	Limit int

	// MinSimilarity is the minimum similarity threshold for vector search.
	// nil = use service default, otherwise use the provided value.
	MinSimilarity *float64

	// Types filters results by node type (labels)
	Types []string

	// RRF configuration
	RRFK         float64 // RRF constant (default: 60)
	VectorWeight float64 // Weight for vector results (default: 1.0)
	BM25Weight   float64 // Weight for BM25 results (default: 1.0)
	MinRRFScore  float64 // Minimum RRF score threshold (default: 0.01)

	// MMR (Maximal Marginal Relevance) diversification
	// When enabled, results are re-ranked to balance relevance with diversity
	MMREnabled bool    // Enable MMR diversification (default: false)
	MMRLambda  float64 // Balance: 1.0 = pure relevance, 0.0 = pure diversity (default: 0.7)

	// Cross-encoder reranking (Stage 2)
	// When enabled, top candidates are re-scored using a cross-encoder model
	// for higher accuracy at the cost of latency
	RerankEnabled  bool    // Enable cross-encoder reranking (default: false)
	RerankTopK     int     // How many candidates to rerank (default: 100)
	RerankMinScore float64 // Minimum cross-encoder score to include (default: 0)
}

SearchOptions configures the search behavior.
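The MMR re-ranking that MMREnabled/MMRLambda control follows the standard formulation: repeatedly pick the candidate maximizing lambda*relevance - (1-lambda)*(max similarity to already-selected results). A standalone sketch:

```go
package main

import "fmt"

// mmrOrder re-ranks by Maximal Marginal Relevance. rel holds relevance
// scores, sim[i][j] the pairwise similarity between candidates i and j.
func mmrOrder(rel []float64, sim [][]float64, lambda float64) []int {
	n := len(rel)
	selected := make([]int, 0, n)
	used := make([]bool, n)
	for len(selected) < n {
		best, bestScore := -1, 0.0
		for i := 0; i < n; i++ {
			if used[i] {
				continue
			}
			maxSim := 0.0 // similarity to the closest already-selected result
			for _, j := range selected {
				if sim[i][j] > maxSim {
					maxSim = sim[i][j]
				}
			}
			score := lambda*rel[i] - (1-lambda)*maxSim
			if best == -1 || score > bestScore {
				best, bestScore = i, score
			}
		}
		used[best] = true
		selected = append(selected, best)
	}
	return selected
}

func main() {
	rel := []float64{0.95, 0.94, 0.70} // doc 1 nearly duplicates doc 0
	sim := [][]float64{{1, 0.99, 0.1}, {0.99, 1, 0.1}, {0.1, 0.1, 1}}
	// With lambda=0.7 the diverse doc 2 is promoted above the near-duplicate.
	fmt.Println(mmrOrder(rel, sim, 0.7))
}
```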

func DefaultSearchOptions

func DefaultSearchOptions() *SearchOptions

DefaultSearchOptions returns sensible defaults.

func GetAdaptiveRRFConfig

func GetAdaptiveRRFConfig(query string) *SearchOptions

GetAdaptiveRRFConfig returns optimized RRF weights based on query characteristics.

This function analyzes the query and adjusts weights to favor the search method most likely to perform well:

  • Short queries (1-2 words): favor BM25 keyword matching. Example: "python" or "graph database". Weights: Vector=0.5, BM25=1.5

  • Long queries (6+ words): favor vector semantic understanding. Example: "How do I implement a distributed consensus algorithm?" Weights: Vector=1.5, BM25=0.5

  • Medium queries (3-5 words): balanced approach. Example: "machine learning algorithms". Weights: Vector=1.0, BM25=1.0

Why this works:

  • Short queries lack context → keywords more reliable
  • Long queries have semantic meaning → embeddings capture intent better

Example:

// Automatic adaptation
query1 := "database"
opts1 := search.GetAdaptiveRRFConfig(query1)
fmt.Printf("Short query weights: V=%.1f, B=%.1f\n",
	opts1.VectorWeight, opts1.BM25Weight)
// Output: V=0.5, B=1.5 (favors keywords)

query2 := "What are the best practices for scaling graph databases?"
opts2 := search.GetAdaptiveRRFConfig(query2)
fmt.Printf("Long query weights: V=%.1f, B=%.1f\n",
	opts2.VectorWeight, opts2.BM25Weight)
// Output: V=1.5, B=0.5 (favors semantics)

Returns SearchOptions with adapted weights. Other options (Limit, MinSimilarity) are set to defaults.
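The fused score these weights feed into is standard Reciprocal Rank Fusion: each result list contributes weight/(k + rank), and (per SearchResult's documented convention) rank 0 means the document was absent from that list. A sketch:

```go
package main

import "fmt"

// rrfScore fuses one document's ranks from the vector and BM25 lists.
// Ranks are 1-based; rank 0 means "absent" and contributes nothing.
func rrfScore(vectorRank, bm25Rank int, k, wVec, wBM25 float64) float64 {
	s := 0.0
	if vectorRank > 0 {
		s += wVec / (k + float64(vectorRank))
	}
	if bm25Rank > 0 {
		s += wBM25 / (k + float64(bm25Rank))
	}
	return s
}

func main() {
	// Defaults from SearchOptions: RRFK=60, both weights 1.0.
	fmt.Printf("%.5f\n", rrfScore(1, 3, 60, 1.0, 1.0))
}
```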

func (*SearchOptions) GetMinSimilarity

func (o *SearchOptions) GetMinSimilarity(fallback float64) float64

GetMinSimilarity returns the MinSimilarity value, or the fallback if nil.

type SearchResponse

type SearchResponse struct {
	Status            string         `json:"status"`
	Query             string         `json:"query"`
	Results           []SearchResult `json:"results"`
	TotalCandidates   int            `json:"total_candidates"`
	Returned          int            `json:"returned"`
	SearchMethod      string         `json:"search_method"`
	FallbackTriggered bool           `json:"fallback_triggered"`
	Message           string         `json:"message,omitempty"`
	Metrics           *SearchMetrics `json:"metrics,omitempty"`
}

SearchResponse is the response from a search operation.

type SearchResult

type SearchResult struct {
	ID             string         `json:"id"`
	NodeID         storage.NodeID `json:"nodeId"`
	Type           string         `json:"type"`
	Labels         []string       `json:"labels"`
	Title          string         `json:"title,omitempty"`
	Description    string         `json:"description,omitempty"`
	ContentPreview string         `json:"content_preview,omitempty"`
	Properties     map[string]any `json:"properties,omitempty"`

	// Scoring
	Score      float64 `json:"score"`
	Similarity float64 `json:"similarity,omitempty"`

	// RRF metadata (vector_rank/bm25_rank are always emitted so clients see original
	// ranks even when Stage-2 reranking is applied; 0 means not in that result set)
	RRFScore   float64 `json:"rrf_score,omitempty"`
	VectorRank int     `json:"vector_rank"`
	BM25Rank   int     `json:"bm25_rank"`
}

SearchResult represents a unified search result.

type Service

type Service struct {
	// contains filtered or unexported fields
}

Service provides unified hybrid search with automatic index management.

The Service maintains:

  • Vector index (HNSW for fast approximate nearest neighbor search)
  • Full-text index (BM25 with inverted index)
  • Connection to storage engine for node data enrichment

Thread-safe: Multiple goroutines can call Search() concurrently.

Example:

svc := search.NewService(engine)
defer svc.Close()

// Index existing data
if err := svc.BuildIndexes(ctx); err != nil {
	log.Fatal(err)
}

// Index new nodes as they're created
node := &storage.Node{...}
if err := svc.IndexNode(node); err != nil {
	log.Printf("Failed to index: %v", err)
}

func NewService

func NewService(engine storage.Engine) *Service

func NewServiceWithDimensions

func NewServiceWithDimensions(engine storage.Engine, dimensions int) *Service

NewServiceWithDimensions creates a search Service with the specified embedding dimensions. Use this when your embedding model produces vectors of a different size than the default 1024.

Example:

// For Apple Intelligence embeddings (512 dimensions)
svc := search.NewServiceWithDimensions(engine, 512)

// For OpenAI text-embedding-3-small (1536 dimensions)
svc := search.NewServiceWithDimensions(engine, 1536)

func NewServiceWithDimensionsAndBM25Engine

func NewServiceWithDimensionsAndBM25Engine(engine storage.Engine, dimensions int, bm25Engine string) *Service

NewServiceWithDimensionsAndBM25Engine creates a search Service with explicit BM25 engine selection. bm25Engine values: "v1", "v2" (invalid values default to "v1").

func (*Service) BM25Engine

func (s *Service) BM25Engine() string

BM25Engine returns the configured BM25 engine for this service ("v1" or "v2").

func (*Service) BuildInProgress

func (s *Service) BuildInProgress() bool

BuildInProgress reports whether BuildIndexes is currently running.

func (*Service) BuildIndexes

func (s *Service) BuildIndexes(ctx context.Context) error

BuildIndexes builds search indexes from all nodes in the engine. Call this after storage startup (and WAL recovery) so the indexes reflect durable state. When both the fulltext and vector index paths are set, BuildIndexes first tries to load both from disk; if both load with count > 0 (and the semver format version matches), the full iteration is skipped. Otherwise it iterates over storage and, when paths are set, saves both indexes at the end.

func (*Service) ClearVectorIndex

func (s *Service) ClearVectorIndex()

ClearVectorIndex removes all embeddings from the vector index. This is used when regenerating all embeddings to reset the index count. Also frees memory from HNSW tombstones which can accumulate over time.

func (*Service) ClusterStats

func (s *Service) ClusterStats() *gpu.ClusterStats

ClusterStats returns k-means clustering statistics.

func (*Service) ClusteringInProgress

func (s *Service) ClusteringInProgress() bool

ClusteringInProgress returns true if k-means is currently running for this service. Used to debounce timer ticks and avoid starting a second run while one is in progress.

func (*Service) CrossEncoderAvailable

func (s *Service) CrossEncoderAvailable(ctx context.Context) bool

CrossEncoderAvailable returns true if a cross-encoder reranker is configured and available.

func (*Service) CurrentStrategy

func (s *Service) CurrentStrategy() string

CurrentStrategy returns the currently active vector search strategy label. Returns "unknown" until a vector pipeline has been initialized.

func (*Service) EmbeddingCount

func (s *Service) EmbeddingCount() int

EmbeddingCount returns the total number of nodes with embeddings in the vector index.

func (*Service) EnableClustering

func (s *Service) EnableClustering(gpuManager *gpu.Manager, numClusters int)

EnableClustering enables GPU k-means clustering for accelerated vector search.

Performance Improvements (Real-World Benchmarks):

  • 2,000 embeddings: ~6% faster (61ms vs 65ms avg)
  • 4,500 embeddings: ~26% faster (35ms vs 47ms avg)
  • 10,000+ embeddings: 10-50x faster (scales with dataset size)

The speedup increases with dataset size as the cluster-based search avoids comparing against all vectors.

Parameters:

  • gpuManager: GPU manager for acceleration (can be nil for CPU-only)
  • numClusters: Number of k-means clusters (0 for auto based on dataset size)

Call this BEFORE BuildIndexes(), then call TriggerClustering() after indexing.

Example:

gpuManager, _ := gpu.NewManager(nil)
svc.EnableClustering(gpuManager, 100) // 100 clusters
svc.BuildIndexes(ctx)
svc.TriggerClustering(ctx) // Run k-means

func (*Service) GetBuildProgress

func (s *Service) GetBuildProgress() BuildProgress

GetBuildProgress returns current search build progress for status APIs.

func (*Service) GetDefaultMinSimilarity

func (s *Service) GetDefaultMinSimilarity() float64

GetDefaultMinSimilarity returns the configured minimum similarity threshold.

func (*Service) GetMinEmbeddingsForClustering

func (s *Service) GetMinEmbeddingsForClustering() int

GetMinEmbeddingsForClustering returns the current minimum embeddings threshold.

func (*Service) IndexNode

func (s *Service) IndexNode(node *storage.Node) error

IndexNode adds a node to all search indexes. All embeddings are stored in ChunkEmbeddings (a single chunk is stored as an array of one).

func (*Service) IsClusteringEnabled

func (s *Service) IsClusteringEnabled() bool

IsClusteringEnabled returns true if GPU clustering is enabled.

func (*Service) IsReady

func (s *Service) IsReady() bool

IsReady reports whether the search indexes are fully built and ready to serve queries.

func (*Service) PersistIndexesToDisk

func (s *Service) PersistIndexesToDisk()

PersistIndexesToDisk writes the current BM25, vector, HNSW, and IVF-HNSW (per-cluster) indexes to disk immediately. Call this on shutdown so the latest in-memory state is saved before exit, same as every other persisted index. Cancels any pending debounced persist so no write runs after shutdown.

func (*Service) RemoveNode

func (s *Service) RemoveNode(nodeID storage.NodeID) error

RemoveNode removes a node from all search indexes. Also removes all chunk embeddings (for nodes with multiple chunks).

func (*Service) RerankCandidates

func (s *Service) RerankCandidates(ctx context.Context, query string, candidates []RerankCandidate, opts *SearchOptions) ([]RerankResult, error)

RerankCandidates applies the configured Stage-2 reranker to caller-provided candidates. This is a seam-friendly API for adapters that need rerank semantics without running retrieval.

func (*Service) RerankerAvailable

func (s *Service) RerankerAvailable(ctx context.Context) bool

RerankerAvailable returns true if Stage-2 reranking is configured and available.

func (*Service) Search

func (s *Service) Search(ctx context.Context, query string, embedding []float32, opts *SearchOptions) (*SearchResponse, error)

Search performs hybrid search with automatic fallback.

Search strategy:

  1. Try RRF hybrid search (vector + BM25) if embedding provided
  2. Fall back to vector-only if RRF returns no results
  3. Fall back to BM25-only if vector search fails or no embedding

This ensures you always get results even if one index is empty or fails.

Parameters:

  • ctx: Context for cancellation
  • query: Text query for BM25 search
  • embedding: Vector embedding for similarity search (can be nil)
  • opts: Search options (use DefaultSearchOptions() if unsure)

Example:

svc := search.NewService(engine)

// Hybrid search (best results)
query := "graph database memory"
embedding, _ := embedder.Embed(ctx, query)
opts := search.DefaultSearchOptions()
opts.Limit = 10

resp, err := svc.Search(ctx, query, embedding, opts)
if err != nil {
	return err
}

fmt.Printf("Found %d results using %s\n",
	resp.Returned, resp.SearchMethod)

for i, result := range resp.Results {
	fmt.Printf("%d. [RRF: %.4f] %s\n",
		i+1, result.RRFScore, result.Title)
	fmt.Printf("   Vector rank: #%d, BM25 rank: #%d\n",
		result.VectorRank, result.BM25Rank)
}

Returns a SearchResponse with ranked results and metadata about the search method used.
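The RRF fusion behind vector_rank and bm25_rank can be sketched in plain Go. This is a minimal illustration, not the package's implementation; the damping constant 60 comes from the original RRF formulation, and the service's actual constant may differ:

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges two ranked ID lists with Reciprocal Rank Fusion.
// Each list contributes 1/(k + rank) per document; k damps the
// influence of top ranks (60 is the conventional value).
func rrfFuse(vectorRanked, bm25Ranked []string, k float64) []string {
	scores := map[string]float64{}
	for i, id := range vectorRanked {
		scores[id] += 1.0 / (k + float64(i+1))
	}
	for i, id := range bm25Ranked {
		scores[id] += 1.0 / (k + float64(i+1))
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool { return scores[ids[a]] > scores[ids[b]] })
	return ids
}

func main() {
	fused := rrfFuse(
		[]string{"a", "b", "c"}, // vector ranking
		[]string{"b", "c", "d"}, // BM25 ranking
		60,
	)
	fmt.Println(fused[0]) // "b": ranked high in both lists
}
```

A document present in both lists (like "b" above) outranks one that leads only a single list, which is why the response keeps both original ranks for inspection.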

func (*Service) SetCrossEncoder

func (s *Service) SetCrossEncoder(ce *CrossEncoder)

SetCrossEncoder configures the Stage-2 reranker to use the cross-encoder implementation.

Example:

svc := search.NewService(engine)
svc.SetCrossEncoder(search.NewCrossEncoder(&search.CrossEncoderConfig{
	Enabled: true,
	APIURL:  "http://localhost:8081/rerank",
	Model:   "cross-encoder/ms-marco-MiniLM-L-6-v2",
}))

func (*Service) SetDefaultMinSimilarity

func (s *Service) SetDefaultMinSimilarity(threshold float64)

SetDefaultMinSimilarity sets the default minimum cosine similarity threshold for vector search. Apple Intelligence embeddings produce scores in the 0.2-0.8 range; bge-m3/mxbai produce 0.7-0.99. Default: 0.0 (let RRF ranking handle relevance filtering).

func (*Service) SetFulltextIndexPath

func (s *Service) SetFulltextIndexPath(path string)

SetFulltextIndexPath sets the path for persisting the BM25 fulltext index. When both fulltext and vector paths are set, BuildIndexes() will try to load both; if both load with count > 0, the full storage iteration is skipped.

func (*Service) SetGPUManager

func (s *Service) SetGPUManager(manager *gpu.Manager)

SetGPUManager enables GPU acceleration for exact brute-force vector search.

This is independent from k-means clustering. When enabled, the vector pipeline may choose GPU brute-force search (exact) for datasets where it outperforms HNSW.

func (*Service) SetHNSWIndexPath

func (s *Service) SetHNSWIndexPath(path string)

SetHNSWIndexPath sets the path for persisting the HNSW index. When set with persist search indexes, the HNSW index is saved after build/warmup and loaded on startup so the full graph does not need to be rebuilt from vectors.

func (*Service) SetMinEmbeddingsForClustering

func (s *Service) SetMinEmbeddingsForClustering(threshold int)

SetMinEmbeddingsForClustering sets the minimum number of embeddings required before k-means clustering is triggered. Below this threshold, brute-force search is used as it's faster for small datasets.

This should be called BEFORE TriggerClustering() to take effect.

Parameters:

  • threshold: Minimum embeddings (must be > 0, default: 1000)

Tuning Guidelines:

  • 1000 (default): Safe for most workloads
  • 500-1000: Latency-sensitive applications with moderate data
  • 100-500: Testing or small datasets
  • 2000+: Very large datasets, delay clustering until more data arrives

Example:

svc := search.NewService(engine)
svc.SetMinEmbeddingsForClustering(500) // Lower threshold for faster clustering
svc.EnableClustering(gpuManager, 100)
svc.BuildIndexes(ctx)
svc.TriggerClustering(ctx) // Will cluster if >= 500 embeddings

func (*Service) SetPersistenceEnabled

func (s *Service) SetPersistenceEnabled(enabled bool)

SetPersistenceEnabled controls whether index persistence is allowed. When false, all persist paths (debounced writes, build checkpoints, and shutdown flush) are treated as no-ops even if index paths are configured.

func (*Service) SetReranker

func (s *Service) SetReranker(r Reranker)

SetReranker configures the Stage-2 reranker. Uses a quick read-lock check to avoid write-lock contention when the value hasn't changed. This prevents deadlock when multiple goroutines call getOrCreateSearchService concurrently.

func (*Service) SetVectorIndexPath

func (s *Service) SetVectorIndexPath(path string)

SetVectorIndexPath sets the path for persisting the vector index. When both fulltext and vector paths are set, BuildIndexes() will try to load both; if both load with count > 0, the full storage iteration is skipped.

func (*Service) TriggerClustering

func (s *Service) TriggerClustering(ctx context.Context) error

TriggerClustering runs k-means clustering on all indexed embeddings. Stops promptly when ctx is cancelled (e.g. process shutdown). Only one run executes at a time per service; if clustering is already in progress, returns nil immediately (debounced).

Trigger Policies:

  • After bulk loads: Automatically called after BuildIndexes() completes
  • Periodic clustering: Background timer runs clustering at regular intervals
  • Manual trigger: Call this after bulk data loading to enable k-means routing

Once clustering completes, the vector search pipeline automatically uses KMeansCandidateGen for candidate generation, providing significant speedup for very large datasets (N > 100K).

Returns nil (not an error) if there are too few embeddings; clustering is skipped silently because brute-force search is faster for small datasets. Returns an error only if clustering is not enabled or fails unexpectedly.
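The debounce behavior described above (one run at a time; extra triggers return nil immediately) can be sketched with an atomic compare-and-swap. This is an illustrative idiom, not the package's actual code:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// clusterRunner debounces concurrent triggers: only one run may be
// active at a time, and extra triggers return nil immediately.
type clusterRunner struct {
	running atomic.Bool
}

func (c *clusterRunner) Trigger(run func() error) error {
	if !c.running.CompareAndSwap(false, true) {
		return nil // already in progress: debounced
	}
	defer c.running.Store(false)
	return run()
}

func main() {
	var r clusterRunner
	err := r.Trigger(func() error {
		// A nested trigger while a run is active is a no-op.
		return r.Trigger(func() error { panic("never runs") })
	})
	fmt.Println(err) // <nil>
}
```

The CompareAndSwap makes the "is a run active?" check and the "mark a run active" write a single atomic step, so two concurrent callers can never both start a run.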

func (*Service) VectorIndexDimensions

func (s *Service) VectorIndexDimensions() int

VectorIndexDimensions returns the configured dimensions of the vector index.

func (*Service) VectorQueryNodes

func (s *Service) VectorQueryNodes(ctx context.Context, queryEmbedding []float32, spec VectorQuerySpec) ([]VectorQueryHit, error)

VectorQueryNodes executes a Cypher-style vector query.

This method preserves Cypher semantics for per-node embedding selection:

  1. Prefer NamedEmbeddings[Property] (or "default" when Property is empty)
  2. If Property is set, next try node.Properties[Property] as a vector array
  3. Fallback to ChunkEmbeddings[0..N] (best score across chunks)

For performance, cosine-similarity queries are executed against the in-memory vector index (unified pipeline) rather than scanning storage.

func (*Service) VectorSearchCandidates

func (s *Service) VectorSearchCandidates(ctx context.Context, embedding []float32, opts *SearchOptions) ([]SearchCandidate, error)

VectorSearchCandidates performs vector-only search and returns lightweight candidates without enrichment. It is optimized for throughput: it skips BM25, RRF fusion, and storage fetches.

This method uses the unified vector search pipeline (CandidateGen + ExactScore) with automatic strategy selection (brute-force for small N, HNSW for large N).

type VectorFileStore

type VectorFileStore struct {
	// contains filtered or unexported fields
}

VectorFileStore is an append-only vector store backed by a file. Only id→offset is kept in RAM; vector data lives on disk. All vectors are stored normalized (one copy per id).

func NewVectorFileStore

func NewVectorFileStore(vecBasePath string, dimensions int) (*VectorFileStore, error)

NewVectorFileStore creates a new file-backed store and opens the vector file for append. vecBasePath is the path prefix: .vec and .meta will be appended. If the .vec file exists it is opened for append; otherwise it is created with a header.

func (*VectorFileStore) Add

func (v *VectorFileStore) Add(id string, vec []float32) error

Add appends a normalized vector to the store. vec is normalized in place/copied; only one copy is stored.

func (*VectorFileStore) Close

func (v *VectorFileStore) Close() error

Close closes the underlying file. The store must not be used after Close.

func (*VectorFileStore) CompactIfNeeded

func (v *VectorFileStore) CompactIfNeeded() (bool, error)

CompactIfNeeded rewrites .vec with only live records when stale entries accumulate. The rewrite is atomic: write temp file, fsync, rename. Returns true when compaction actually ran.

func (*VectorFileStore) Count

func (v *VectorFileStore) Count() int

Count returns the number of vectors in the store.

func (*VectorFileStore) GetBuildIndexedCount

func (v *VectorFileStore) GetBuildIndexedCount() int64

GetBuildIndexedCount returns the last persisted checkpoint count (0 if none). Used at start of BuildIndexes to skip the first N nodes when resuming.

func (*VectorFileStore) GetDimensions

func (v *VectorFileStore) GetDimensions() int

GetDimensions returns the vector dimension.

func (*VectorFileStore) GetVector

func (v *VectorFileStore) GetVector(id string) ([]float32, bool)

GetVector returns a copy of the stored (normalized) vector for id, or (nil, false) if not found.

func (*VectorFileStore) Has

func (v *VectorFileStore) Has(id string) bool

Has reports whether id is present in the id→offset map.

func (*VectorFileStore) IterateChunked

func (v *VectorFileStore) IterateChunked(chunkSize int, fn func(ids []string, vecs [][]float32) error) error

IterateChunked reads the vector file in chunks and calls fn(ids, vecs) for each chunk. Used to build HNSW without loading all vectors into memory. fn may be called with fewer than chunkSize vectors on the last chunk.
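The chunking contract (fn may receive fewer than chunkSize items on the last chunk) looks like this in a simplified, in-memory form; the real method reads the chunks from the .vec file:

```go
package main

import "fmt"

// iterateChunked calls fn with at most chunkSize items at a time, so
// a caller can build an index without holding everything in memory.
func iterateChunked(ids []string, vecs [][]float32, chunkSize int,
	fn func(ids []string, vecs [][]float32) error) error {
	for start := 0; start < len(ids); start += chunkSize {
		end := start + chunkSize
		if end > len(ids) {
			end = len(ids) // last chunk may be smaller
		}
		if err := fn(ids[start:end], vecs[start:end]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ids := []string{"a", "b", "c", "d", "e"}
	vecs := make([][]float32, len(ids))
	_ = iterateChunked(ids, vecs, 2, func(ids []string, _ [][]float32) error {
		fmt.Println(len(ids))
		return nil
	})
	// Prints 2, 2, 1: the final chunk holds the remainder.
}
```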

func (*VectorFileStore) Load

func (v *VectorFileStore) Load() error

Load populates the store from an existing .vec + .meta. The store must be created with NewVectorFileStore(vecBasePath, dimensions); Load then reads the .meta file to populate idToOff. The .vec file is already open from NewVectorFileStore.

func (*VectorFileStore) Remove

func (v *VectorFileStore) Remove(id string) bool

Remove deletes id from the live id→offset map. The old .vec record is left in-place and reclaimed by compaction.

func (*VectorFileStore) Save

func (v *VectorFileStore) Save() error

Save writes the id→offset map to the .meta file so the store can be loaded later. Copies idToOff under a short lock so the (potentially slow) encode doesn't block Add().
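The snapshot-under-short-lock idiom can be sketched as follows; the `store` type here is hypothetical, not the package's implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// store snapshots its map under a short lock so a slow encode/write
// never blocks concurrent Add() calls.
type store struct {
	mu      sync.Mutex
	idToOff map[string]int64
}

func (s *store) Add(id string, off int64) {
	s.mu.Lock()
	s.idToOff[id] = off
	s.mu.Unlock()
}

func (s *store) Save(encode func(map[string]int64)) {
	s.mu.Lock()
	snap := make(map[string]int64, len(s.idToOff))
	for k, v := range s.idToOff {
		snap[k] = v
	}
	s.mu.Unlock() // lock released before the slow part
	encode(snap)  // may take a long time; Add() is not blocked
}

func main() {
	s := &store{idToOff: map[string]int64{"a": 0}}
	s.Save(func(snap map[string]int64) {
		s.Add("b", 4) // Add proceeds even mid-Save: no deadlock
		fmt.Println(len(snap), len(s.idToOff)) // 1 2
	})
}
```

The trade-off is that Save persists a point-in-time snapshot: writes that land after the copy are picked up by the next Save.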

func (*VectorFileStore) SetBuildIndexedCount

func (v *VectorFileStore) SetBuildIndexedCount(n int64)

SetBuildIndexedCount sets the last checkpoint count from BuildIndexes (for resume). Call before Save() when persisting after a checkpoint so the next run can skip already-indexed nodes.

func (*VectorFileStore) Sync

func (v *VectorFileStore) Sync() error

Sync flushes the .vec file to disk so progress is visible and durable.

type VectorFileStoreMeta

type VectorFileStoreMeta struct {
	Version           int              `msgpack:"v"`
	Dimensions        int              `msgpack:"dim"`
	IDToOffset        map[string]int64 `msgpack:"id2off"`
	BuildIndexedCount int64            `msgpack:"build_count,omitempty"` // last checkpoint count during BuildIndexes; used for resume
}

VectorFileStoreMeta is persisted to the .meta file (msgpack).

type VectorGetter

type VectorGetter interface {
	GetVector(id string) ([]float32, bool)
}

VectorGetter is implemented by *VectorIndex and by adapters for VectorLookup (e.g. file-backed store).

type VectorIndex

type VectorIndex struct {
	// contains filtered or unexported fields
}

VectorIndex provides exact vector similarity search using cosine similarity.

The index stores normalized vectors for efficient similarity computation and supports concurrent read/write operations. It uses brute-force search which provides exact results but has O(n) time complexity.

Key features:

  • Exact cosine similarity computation
  • Automatic vector normalization
  • Thread-safe concurrent operations
  • Context-aware search with cancellation
  • Configurable similarity thresholds

Example:

// Create index for 512-dimensional vectors
index := search.NewVectorIndex(512)

// Add vectors
for id, vector := range vectors {
	index.Add(id, vector)
}

// Search with minimum similarity threshold
results, _ := index.Search(ctx, queryVector, 5, 0.8)
for _, result := range results {
	fmt.Printf("%s: %.3f\n", result.ID, result.Score)
}

Performance:

  • Add: O(d) where d = dimensions
  • Search: O(n×d) where n = number of vectors
  • Memory: O(n×d) for normalized vectors

Thread Safety:

All methods are thread-safe using RWMutex for concurrent access.

func NewVectorIndex

func NewVectorIndex(dimensions int) *VectorIndex

NewVectorIndex creates a new vector similarity index for the given dimensions.

The index is initialized empty and ready to accept vectors of the specified dimensionality. All vectors added to the index must have exactly this number of dimensions.

Parameters:

  • dimensions: Number of dimensions for vectors (must be > 0)

Returns:

  • VectorIndex ready for use

Example:

// Create index for OpenAI embeddings (1536 dimensions)
index := search.NewVectorIndex(1536)

// Create index for sentence transformers (384 dimensions)
index = search.NewVectorIndex(384)

// Index is ready to accept vectors
index.Add("doc1", embedding1)

func (*VectorIndex) Add

func (v *VectorIndex) Add(id string, vec []float32) error

Add adds or updates a vector in the index with automatic normalization.

The vector is normalized to unit length for efficient cosine similarity computation. If a vector with the same ID already exists, it is replaced.

Parameters:

  • id: Unique identifier for the vector
  • vec: Vector to add (must match index dimensions)

Returns:

  • ErrDimensionMismatch if vector dimensions don't match index

Example:

// Add document embeddings
for docID, embedding := range documentEmbeddings {
	err := index.Add(docID, embedding)
	if err != nil {
		log.Printf("Failed to add %s: %v", docID, err)
	}
}

// Update existing vector
newEmbedding := getUpdatedEmbedding("doc1")
index.Add("doc1", newEmbedding) // Replaces previous

Performance:

  • Time: O(d) where d is vector dimensions
  • Space: O(d) additional storage per vector
  • Thread-safe: Uses write lock during update

Normalization:

Vectors are automatically normalized to unit length, so cosine
similarity becomes a simple dot product during search.

func (*VectorIndex) Clear

func (v *VectorIndex) Clear()

Clear removes all vectors from the index. This is useful when regenerating all embeddings to reset the index state.

func (*VectorIndex) Count

func (v *VectorIndex) Count() int

Count returns the number of vectors in the index.

func (*VectorIndex) GetDimensions

func (v *VectorIndex) GetDimensions() int

GetDimensions returns the vector dimensions.

func (*VectorIndex) GetVector

func (v *VectorIndex) GetVector(id string) ([]float32, bool)

GetVector returns a copy of the normalized vector for id, or (nil, false) if not found. Used when loading a graph-only HNSW index to populate vectors from the vector index.

func (*VectorIndex) HasVector

func (v *VectorIndex) HasVector(id string) bool

HasVector checks if a vector exists for the given ID.

func (*VectorIndex) Load

func (v *VectorIndex) Load(path string) error

Load replaces the vector index with the one stored at path (msgpack format). If the file does not exist the index is left unchanged; if decode fails it is left empty. Neither case returns an error, so the caller can proceed with a full rebuild. Returns an error only for unexpected I/O (e.g. permission denied).

func (*VectorIndex) Remove

func (v *VectorIndex) Remove(id string)

Remove removes a vector from the index by its ID.

If the vector doesn't exist, this operation is a no-op.

Parameters:

  • id: Identifier of the vector to remove

Example:

// Remove a document that was deleted
index.Remove("doc-123")

// Remove multiple vectors
for _, docID := range deletedDocs {
	index.Remove(docID)
}

Performance:

  • Time: O(1) hash map deletion
  • Thread-safe: Uses write lock during removal

func (*VectorIndex) Save

func (v *VectorIndex) Save(path string) error

Save writes the vector index to path (msgpack format). Dir is created if needed. Copies index data under a short read lock so I/O does not block Search/IndexNode.

func (*VectorIndex) Search

func (v *VectorIndex) Search(ctx context.Context, query []float32, limit int, minSimilarity float64) ([]indexResult, error)

Search finds vectors similar to the query vector using cosine similarity.

The search computes cosine similarity between the query and all indexed vectors, filters by minimum similarity threshold, and returns the top results sorted by similarity score (highest first).

Parameters:

  • ctx: Context for cancellation and timeouts
  • query: Query vector (must match index dimensions)
  • limit: Maximum number of results to return
  • minSimilarity: Minimum similarity threshold (0.0 to 1.0)

Returns:

  • Slice of indexResult sorted by similarity (descending)
  • ErrDimensionMismatch if query dimensions don't match

Example:

// Search for top 10 most similar vectors
results, err := index.Search(ctx, queryVector, 10, 0.0)
if err != nil {
	log.Fatal(err)
}

// Search with high similarity threshold
results, err = index.Search(ctx, queryVector, 5, 0.8)
for _, result := range results {
	fmt.Printf("ID: %s, Similarity: %.3f\n", result.ID, result.Score)
}

// Search with timeout
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
results, err = index.Search(ctx, queryVector, 10, 0.7)

Performance:

  • Time: O(n×d) where n=vectors, d=dimensions
  • Space: O(k) where k=number of results
  • Cancellation: Respects context cancellation during search

Similarity Range:

  • 1.0: Identical vectors (same direction)
  • 0.0: Orthogonal vectors (perpendicular)
  • -1.0: Opposite vectors (opposite directions)

Thread Safety:

Uses read lock during search, allowing concurrent searches.

type VectorLookup

type VectorLookup func(id string) ([]float32, bool)

VectorLookup returns a vector by ID (e.g. from the vector index). Used when loading a graph-only HNSW file.
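A bare func can satisfy a getter interface by declaring the method on the func type itself. The sketch below redefines both types locally for illustration; the package's adapter may look different:

```go
package main

import "fmt"

// Local stand-ins for the package's VectorGetter interface and
// VectorLookup func type, to show the adapter pattern.
type VectorGetter interface {
	GetVector(id string) ([]float32, bool)
}

type VectorLookup func(id string) ([]float32, bool)

// GetVector makes the bare func satisfy the interface.
func (f VectorLookup) GetVector(id string) ([]float32, bool) { return f(id) }

func main() {
	table := map[string][]float32{"doc1": {0.6, 0.8}}
	var g VectorGetter = VectorLookup(func(id string) ([]float32, bool) {
		v, ok := table[id]
		return v, ok
	})
	_, ok := g.GetVector("doc1")
	fmt.Println(ok) // true
}
```

This is the same trick as net/http's HandlerFunc: any closure over a map, file store, or index becomes a VectorGetter without a wrapper struct.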

type VectorQueryHit

type VectorQueryHit struct {
	ID    string
	Score float64
}

VectorQueryHit is a lightweight result row for Cypher-compatible vector queries. ID is the node ID (not a chunk/named vector ID).

type VectorQuerySpec

type VectorQuerySpec struct {
	// IndexName is informational only (used for debugging/observability).
	IndexName string

	// Label optionally filters candidate nodes to those that have this label.
	Label string

	// Property is the Cypher vector index "property" name.
	//
	// Resolution semantics:
	//   1) Prefer NamedEmbeddings[Property] (or "default" when Property is empty)
	//   2) If Property is set, next try node.Properties[Property] as a vector array
	//   3) Fallback to ChunkEmbeddings[0..N] (best score across chunks)
	Property string

	// Similarity is the similarity function name. Supported values:
	// "cosine" (default), "dot", "euclidean".
	Similarity string

	// Limit is the maximum number of results to return (top-K).
	Limit int
}

VectorQuerySpec describes how to resolve embeddings for a Cypher-style vector query.

This is intentionally a "core" representation of Cypher vector index metadata: the Cypher layer can remain syntax-compatible, while the search layer owns the implementation details and can choose the best execution strategy.

type VectorSearchPipeline

type VectorSearchPipeline struct {
	// contains filtered or unexported fields
}

VectorSearchPipeline implements the unified vector search pipeline.

Pipeline stages:

  1. CandidateGen: Generate candidates (brute-force or HNSW)
  2. ExactScore: Re-score candidates exactly (CPU or GPU)
  3. Filter: Apply minSimilarity threshold
  4. TopK: Return top-k results
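The four stages can be sketched end to end with plain functions standing in for CandidateGen and ExactScore (hypothetical stand-ins, not the package API):

```go
package main

import (
	"fmt"
	"sort"
)

type scored struct {
	ID    string
	Score float64
}

// pipelineSearch mirrors the four stages: generate candidates,
// re-score them exactly, filter by minSimilarity, return top-k.
func pipelineSearch(
	candidates func() []string,
	exactScore func(id string) float64,
	k int, minSimilarity float64,
) []scored {
	var out []scored
	for _, id := range candidates() { // 1. CandidateGen
		s := exactScore(id) // 2. ExactScore
		if s >= minSimilarity { // 3. Filter
			out = append(out, scored{id, s})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	if len(out) > k { // 4. TopK
		out = out[:k]
	}
	return out
}

func main() {
	scores := map[string]float64{"a": 0.9, "b": 0.4, "c": 0.7}
	top := pipelineSearch(
		func() []string { return []string{"a", "b", "c"} },
		func(id string) float64 { return scores[id] },
		2, 0.5,
	)
	fmt.Println(top) // [{a 0.9} {c 0.7}]
}
```

Splitting candidate generation from exact scoring is what lets the service swap strategies (brute-force vs. HNSW, CPU vs. GPU) without changing the filter and top-k stages.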

func NewVectorSearchPipeline

func NewVectorSearchPipeline(candidateGen CandidateGenerator, exactScorer ExactScorer) *VectorSearchPipeline

NewVectorSearchPipeline creates a new vector search pipeline.

func (*VectorSearchPipeline) Search

func (p *VectorSearchPipeline) Search(ctx context.Context, query []float32, k int, minSimilarity float64) ([]ScoredCandidate, error)

Search performs vector search using the pipeline.

Parameters:

  • ctx: Context for cancellation
  • query: Query vector (will be normalized)
  • k: Desired number of results
  • minSimilarity: Minimum similarity threshold

Returns:

  • candidates: Top-k candidates with exact scores
  • error: Context cancellation or other errors
