Documentation ¶
Overview ¶
Package search provides full-text indexing with BM25 scoring.
Package search - HNSW configuration and quality presets.
Package search provides HNSW vector indexing for fast approximate nearest neighbor search.
HNSW Delete/Update Policy:
Delete:
- Remove() tombstones a vector via a dense `deleted []bool` flag
- Neighbor lists are not eagerly rewired (tombstones keep deletes cheap)
- Entry point is re-selected if the removed node was the entry point
Update:
- Current policy: Remove() + Add() pattern
- Call Remove(id) then Add(id, newVector) to update a vector
- This ensures the graph structure is correctly maintained
- Future: A dedicated Update() method may be added for efficiency
Graph Quality:
- High-churn workloads (many updates/deletes) can degrade graph quality
- Periodic rebuilds are recommended (see NORNICDB_VECTOR_ANN_REBUILD_INTERVAL)
- Rebuilds restore optimal graph structure and improve recall
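The tombstone mechanics above can be sketched in a few lines. This is a toy illustration of the policy (dense `deleted []bool` flags, Remove+Add updates, tombstone ratio driving rebuilds), not the package's actual HNSW code; the `toyIndex` type and its fields are invented here:

```go
package main

import "fmt"

// toyIndex illustrates tombstone deletes: Remove flips a flag instead of
// rewiring neighbor lists, and searches would skip flagged entries.
type toyIndex struct {
	ids     []string
	vecs    [][]float32
	deleted []bool // dense tombstone flags, as described above
	pos     map[string]int
}

func newToyIndex() *toyIndex { return &toyIndex{pos: map[string]int{}} }

func (t *toyIndex) Add(id string, vec []float32) {
	t.pos[id] = len(t.ids)
	t.ids = append(t.ids, id)
	t.vecs = append(t.vecs, vec)
	t.deleted = append(t.deleted, false)
}

// Remove tombstones the slot; it stays allocated until a rebuild.
func (t *toyIndex) Remove(id string) {
	if i, ok := t.pos[id]; ok {
		t.deleted[i] = true
		delete(t.pos, id)
	}
}

// Update follows the documented Remove() + Add() pattern.
func (t *toyIndex) Update(id string, vec []float32) {
	t.Remove(id)
	t.Add(id, vec)
}

// TombstoneRatio mirrors the idea behind HNSWIndex.TombstoneRatio:
// deleted slots / total slots, a signal for when to rebuild.
func (t *toyIndex) TombstoneRatio() float64 {
	if len(t.deleted) == 0 {
		return 0
	}
	n := 0
	for _, d := range t.deleted {
		if d {
			n++
		}
	}
	return float64(n) / float64(len(t.deleted))
}

func main() {
	idx := newToyIndex()
	idx.Add("a", []float32{1, 0})
	idx.Add("b", []float32{0, 1})
	idx.Update("a", []float32{0.5, 0.5}) // tombstones the old slot, appends a new one
	fmt.Printf("tombstone ratio: %.2f\n", idx.TombstoneRatio())
}
```

High-churn workloads accumulate tombstoned slots this way, which is why periodic rebuilds are recommended.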
Package search - Kalman filter integration for stable search ranking.
This file provides the KalmanAdapter that enhances the search service with:
- Similarity score smoothing: Reduce noise in vector similarity scores
- Ranking stability: Prevent results from jumping around between queries
- Trend detection: Identify documents becoming more/less relevant
- Latency prediction: Smooth query latency for better resource planning
Why Smooth Search Scores? ¶
Vector similarity scores can be noisy due to:
- Embedding model variability (same query, slightly different embeddings)
- Approximation in ANN (approximate nearest neighbor) indexes
- Edge cases where documents are equidistant from query
Kalman filtering smooths these variations to provide:
- More consistent rankings across similar queries
- Gradual relevance transitions (not sudden jumps)
- Confidence estimates for borderline results
Integration Architecture ¶
┌─────────────────────────────────────────────────────────────┐
│                    Kalman Search Adapter                    │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐   ┌─────────────────────────────────┐   │
│ │ Search Service  │──▶│ Kalman Filter (scores)          │   │
│ │ (raw scores)    │   │ - Per-document filters          │   │
│ └────────┬────────┘   │ - Score history smoothing       │   │
│          │            └─────────────────────────────────┘   │
│          ▼                                                  │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Ranking Stabilizer                                      │ │
│ │ • Detect score ties                                     │ │
│ │ • Apply stability boost to consistent docs              │ │
│ │ • Prevent rank oscillation                              │ │
│ └─────────────────────────────────────────────────────────┘ │
│          │                                                  │
│          ▼                                                  │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Latency Predictor                                       │ │
│ │ • Smooth query latency measurements                     │ │
│ │ • Predict P99 latency                                   │ │
│ │ • Resource scaling suggestions                          │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
ELI12 (Explain Like I'm 12) ¶
Imagine you're searching Google for "cute cats":
**Without Kalman:**
- Search 1: Cat A is #1, Cat B is #2
- Search 2: Cat B is #1, Cat A is #2 (they swapped!)
- Search 3: Cat C is #1 (where did that come from?!)
**With Kalman:**
- Kalman remembers: "Cat A has been top 3 for a while"
- When Cat B briefly gets a tiny score boost, Kalman says: "That's probably noise. Cat A stays #1 until I see a REAL trend."
- Result: Rankings stay consistent, you find things where you expect them!
It's like having a wise librarian who remembers "this book was popular before" instead of reshuffling the entire library every time someone asks a question.
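The smoothing idea can be shown with a toy scalar Kalman filter. This is not the package's KalmanSearchAdapter (which keeps one filter per document and exposes trend queries); the type and parameter names below are invented for illustration:

```go
package main

import "fmt"

// kalman1D is a minimal scalar Kalman filter for a roughly constant
// signal, here a document's "true" similarity score observed with noise.
type kalman1D struct {
	x, p float64 // state estimate and its variance
	q, r float64 // process noise and measurement noise
}

// update folds in one noisy observation z and returns the smoothed score.
func (k *kalman1D) update(z float64) float64 {
	k.p += k.q             // predict: uncertainty grows by process noise
	g := k.p / (k.p + k.r) // Kalman gain: how much to trust z
	k.x += g * (z - k.x)   // correct the estimate toward the observation
	k.p *= 1 - g           // corrected estimate is less uncertain
	return k.x
}

func main() {
	f := &kalman1D{x: 0.80, p: 1, q: 0.001, r: 0.05}
	// Noisy observations of a score that is "really" about 0.80.
	for _, z := range []float64{0.82, 0.78, 0.86, 0.79, 0.81} {
		fmt.Printf("raw %.2f -> smoothed %.3f\n", z, f.update(z))
	}
}
```

With a small process noise q, a brief spike in z moves the smoothed score only slightly, which is exactly why Cat A keeps its rank until a real trend emerges.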
Package search - K-means cluster routing candidate generator.
This file implements KMeansCandidateGen, which uses k-means clustering to route queries to the most relevant clusters, then generates candidates from those clusters. This is optimal for very large datasets (N > 100K) where cluster routing provides significant speedup over HNSW.
Trigger Policies:
K-means clustering is triggered automatically:
- After bulk loads: When BuildIndexes() completes, clustering runs automatically
- Periodic clustering: Background timer runs clustering at regular intervals (configurable via clustering interval)
- Manual trigger: Call TriggerClustering() after bulk data loading
The candidate generator automatically uses k-means routing when:
- Clustering is enabled (EnableClustering() called)
- ClusterIndex is clustered (Cluster() has been run)
- Dataset is large enough to benefit (typically N > 100K)
For smaller datasets, the pipeline automatically falls back to HNSW or brute-force.
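The routing step itself is simple: assign the query to its nearest centroid and search only that cluster's members. A minimal sketch of the idea (assuming unit-normalized vectors so dot product ranks by cosine; not the package's KMeansCandidateGen):

```go
package main

import "fmt"

// dot assumes unit-normalized vectors, so a higher dot product means
// higher cosine similarity.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// routeQuery picks the centroid most similar to the query. A real
// pipeline would then scan only that cluster's members (or the top few
// clusters) instead of the whole dataset.
func routeQuery(query []float32, centroids [][]float32) int {
	best, bestScore := -1, float32(-2)
	for i, c := range centroids {
		if s := dot(query, c); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}

func main() {
	centroids := [][]float32{{1, 0}, {0, 1}}
	q := []float32{0.9, 0.1}
	fmt.Println("route to cluster", routeQuery(q, centroids)) // prints "route to cluster 0"
}
```

The speedup comes from scanning one cluster of roughly N/k vectors instead of all N, which is why the benefit only shows up once N is large.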
Package search provides cross-encoder reranking for improved search quality.
Cross-encoder reranking is a two-stage retrieval approach:
Stage 1 (Fast): Bi-encoder retrieval
- Vector similarity + BM25
- Uses pre-computed embeddings for O(log N) search
- Returns top-K candidates (e.g., 100)
Stage 2 (Accurate): Cross-encoder reranking
- Passes (query, document) pairs through a cross-encoder model
- Model sees both query and document together → more accurate
- Re-scores and re-ranks the top-K candidates
- Returns final top-N results (e.g., 10)
Why Cross-Encoder?
Bi-encoders (like embeddings) encode query and document separately:
query_emb = encode(query)
doc_emb   = encode(document)
score     = cosine(query_emb, doc_emb)
Cross-encoders encode them together:
score = cross_encode(query, document) // sees interaction!
The cross-encoder can capture fine-grained semantic relationships that bi-encoders miss, but it's slower (O(N) vs O(log N)).
ELI12 (Explain Like I'm 12):
Imagine finding a book in a library:
Bi-encoder (Stage 1): Like using the card catalog. Fast lookup by category/keywords, but might miss nuances.
Cross-encoder (Stage 2): Like actually reading each book's summary. More accurate, but takes longer. So we only do it for the top candidates from Stage 1.
Reference: Nogueira & Cho (2019) "Passage Re-ranking with BERT"
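The two-stage shape can be sketched end to end. This is a toy: a plain float64 stands in for the Stage-1 bi-encoder score, and term overlap stands in for the cross-encoder (the real one is a learned model); the `doc`, `crossScore`, and `rerank` names are invented here:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type doc struct {
	ID, Text string
	Score    float64
}

// crossScore is a stand-in for the cross-encoder: unlike a bi-encoder,
// it sees query and document together. Here it just counts shared terms.
func crossScore(query string, d doc) float64 {
	n := 0.0
	text := strings.ToLower(d.Text)
	for _, w := range strings.Fields(strings.ToLower(query)) {
		if strings.Contains(text, w) {
			n++
		}
	}
	return n
}

// rerank keeps the Stage-1 top-K, then spends the expensive scorer on
// only those K candidates before the final sort.
func rerank(query string, stage1 []doc, topK int) []doc {
	sort.SliceStable(stage1, func(i, j int) bool { return stage1[i].Score > stage1[j].Score })
	if len(stage1) > topK {
		stage1 = stage1[:topK]
	}
	for i := range stage1 {
		stage1[i].Score = crossScore(query, stage1[i])
	}
	sort.SliceStable(stage1, func(i, j int) bool { return stage1[i].Score > stage1[j].Score })
	return stage1
}

func main() {
	docs := []doc{
		{"a", "passage re-ranking with BERT", 0.90},
		{"b", "cooking pasta at home", 0.88},
		{"c", "BERT passage retrieval notes", 0.85},
	}
	for _, d := range rerank("BERT passage", docs, 3) {
		fmt.Printf("%s %.0f\n", d.ID, d.Score)
	}
}
```

Note how "b" ranked high in Stage 1 but drops to the bottom once the scorer actually looks at the query and text together — the O(N) cost is paid only for the K survivors.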
Package search provides unified hybrid search with Reciprocal Rank Fusion (RRF).
This package implements the same hybrid search approach used by production systems like Azure AI Search, Elasticsearch, and Weaviate: combining vector similarity search with BM25 full-text search using Reciprocal Rank Fusion.
Search Capabilities:
- Vector similarity search (cosine similarity with HNSW index)
- BM25 full-text search (keyword matching with TF-IDF)
- RRF hybrid search (fuses vector + BM25 results)
- Adaptive weighting based on query characteristics
- Automatic fallback when one method fails
Example Usage:
// Create search service
svc := search.NewService(storageEngine)
// Build indexes from existing nodes
if err := svc.BuildIndexes(ctx); err != nil {
log.Fatal(err)
}
// Perform hybrid search
query := "machine learning algorithms"
embedding := embedder.Embed(ctx, query) // Get from embed package
opts := search.DefaultSearchOptions()
response, err := svc.Search(ctx, query, embedding, opts)
if err != nil {
log.Fatal(err)
}
for _, result := range response.Results {
fmt.Printf("[%.3f] %s\n", result.RRFScore, result.Title)
}
How RRF Works:
RRF (Reciprocal Rank Fusion) combines rankings from multiple search methods. Instead of merging scores directly (which can be incomparable), RRF uses rank positions to create a unified ranking.
Formula: RRF_score = Σ (weight / (k + rank))
Where:
- k is a constant (typically 60) to reduce the impact of high ranks
- rank is the position in the result list (1-indexed)
- weight allows emphasizing one method over another
Example: A document ranked #1 in vector search and #3 in BM25:
RRF = (1.0 / (60 + 1)) + (1.0 / (60 + 3))
= (1.0 / 61) + (1.0 / 63)
= 0.0164 + 0.0159
= 0.0323
Documents that appear in both result sets get boosted scores.
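The formula and the worked example above translate directly into a few lines of Go (a sketch; the function name `rrfScore` is ours, not the package's API):

```go
package main

import "fmt"

// rrfScore computes RRF_score = Σ weight / (k + rank) over the ranks a
// document received from each method. Ranks are 1-indexed; a document
// missing from a method's list simply contributes no term.
func rrfScore(ranks []int, weights []float64, k float64) float64 {
	score := 0.0
	for i, r := range ranks {
		score += weights[i] / (k + float64(r))
	}
	return score
}

func main() {
	// The worked example: #1 in vector search, #3 in BM25, k = 60.
	s := rrfScore([]int{1, 3}, []float64{1.0, 1.0}, 60)
	fmt.Printf("RRF = %.4f\n", s) // 1/61 + 1/63 → prints "RRF = 0.0323"
}
```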
ELI12 (Explain Like I'm 12):
Imagine two friends ranking pizza places:
- Friend A (vector search) ranks by taste similarity to your favorite
- Friend B (BM25) ranks by matching your description "spicy pepperoni"
They might disagree! Friend A says place X is #1 (tastes similar), while Friend B says it's #5 (doesn't match keywords well).
RRF solves this by:
1. If a place appears in BOTH lists, it gets bonus points
2. Higher ranks (being at the top) give more points
3. The magic number 60 prevents #1 from completely dominating
This way, a place that's #2 in both lists beats a place that's #1 in one but missing from the other!
Index persistence and WAL alignment:
NornicDB storage uses a write-ahead log (WAL) for durability; graph state is recovered from WAL (and snapshots) on startup. Search indexes (BM25 and vector) are built or loaded after storage recovery, so they always reflect the WAL-consistent state. Index files use semver format versioning (Qdrant-style): only indexes with the same format version are loaded; older or newer versions are rejected so the caller can rebuild. Persisted indexes are saved on a debounced schedule after mutations and on shutdown.
Result caching:
Search() results are cached in-process by query + options (limit, types, rerank, MMR, etc.), with the same semantics as the Cypher query cache: LRU eviction (default 1000 entries), TTL (default 5 minutes), and full invalidation on IndexNode/RemoveNode so results stay correct after index changes. All call paths (HTTP search API, Cypher vector procedures, MCP, etc.) share this cache, so repeated identical searches return immediately.
Package search provides file-backed vector storage for memory-efficient indexing.
VectorFileStore implements append-only vector storage: vectors are written to a .vec file and only id→offset metadata is kept in RAM. This allows BuildIndexes to index large datasets without holding 2–3× vector data in memory. Vectors are stored normalized (one copy per id, cosine-only) per the indexing-memory plan.
Package search provides vector indexing with cosine similarity search.
This file implements a simple but effective vector similarity search index using brute-force cosine similarity calculation. While not as sophisticated as HNSW or other approximate methods, it provides exact results and is suitable for moderate-sized datasets.
Key Features:
- Exact cosine similarity search
- Automatic vector normalization for performance
- Thread-safe concurrent operations
- Context-aware search with cancellation
- Configurable similarity thresholds
Example Usage:
// Create vector index for 1024-dimensional embeddings
index := search.NewVectorIndex(1024)
// Add vectors to the index
embedding1 := make([]float32, 1024)
// ... populate embedding1 ...
index.Add("doc-1", embedding1)
embedding2 := make([]float32, 1024)
// ... populate embedding2 ...
index.Add("doc-2", embedding2)
// Search for similar vectors
query := make([]float32, 1024)
// ... populate query vector ...
results, err := index.Search(ctx, query, 10, 0.7)
if err != nil {
log.Fatal(err)
}
// Process results (sorted by similarity)
for _, result := range results {
fmt.Printf("ID: %s, Similarity: %.3f\n", result.ID, result.Score)
}
Algorithm Details:
The index uses cosine similarity, which measures the cosine of the angle between two vectors. It's particularly effective for high-dimensional embeddings where magnitude is less important than direction.
Cosine Similarity Formula:
similarity = (A · B) / (||A|| × ||B||)
Where:
- A · B is the dot product of vectors A and B
- ||A|| is the magnitude (L2 norm) of vector A
- ||B|| is the magnitude (L2 norm) of vector B
Optimization:
Vectors are normalized when added to the index, so cosine similarity becomes just the dot product, making search much faster.
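The optimization is easy to verify: after normalizing both vectors to unit length, their dot product equals the full cosine formula. A self-contained sketch (helper names are ours, not the index's API):

```go
package main

import (
	"fmt"
	"math"
)

// normalize scales v to unit length (L2 norm 1) in place, so a later
// dot product against another unit vector is exactly cosine similarity.
func normalize(v []float32) {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	n := float32(math.Sqrt(sum))
	if n == 0 {
		return
	}
	for i := range v {
		v[i] /= n
	}
}

// dot is the only work left per comparison at search time.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func main() {
	a := []float32{3, 4}
	b := []float32{4, 3}
	normalize(a) // (0.6, 0.8)
	normalize(b) // (0.8, 0.6)
	// cosine of (3,4) vs (4,3) is (A·B)/(||A||×||B||) = 24/25 = 0.96
	fmt.Printf("%.2f\n", dot(a, b)) // prints "0.96"
}
```

Paying the normalization cost once per Add removes the two magnitude computations and the division from every comparison during Search.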
Performance Characteristics:
- Add: O(d) where d is vector dimensions
- Search: O(n×d) where n is number of vectors
- Memory: O(n×d) for storing normalized vectors
- Thread-safe: Uses RWMutex for concurrent access
When to Use:
✅ Exact similarity search required
✅ Moderate dataset sizes (<100K vectors)
✅ High-dimensional embeddings (>100 dimensions)
✅ Simplicity and reliability preferred
❌ Very large datasets (>1M vectors)
❌ Approximate search acceptable
❌ Sub-linear search time required
Future Improvements:
- HNSW implementation for O(log n) search
- GPU acceleration for large batches
- Quantization for memory efficiency
- Incremental index updates
ELI12 (Explain Like I'm 12):
Think of vector similarity like comparing the "direction" things are pointing:
**Vectors**: Like arrows pointing in different directions in space. Each arrow represents a document, image, or piece of text.
**Cosine similarity**: Measures how much two arrows point in the same direction. If they point exactly the same way, similarity = 1. If they point opposite ways, similarity = -1.
**Normalization**: We make all arrows the same length so we only care about direction, not how "strong" they are.
**Search**: When you give us a query arrow, we check how similar its direction is to all the arrows we've stored, then give you the most similar ones.
It's like finding documents that "point in the same direction" as your search!
Package search - Vector search pipeline with CandidateGen + ExactScore interfaces.
This file implements the unified vector search pipeline as described in the hybrid-ann-plan.md. It provides:
- CandidateGenerator interface for approximate candidate generation (brute/HNSW)
- ExactScorer interface for exact scoring of candidates (CPU/GPU)
- Auto strategy selection (brute vs HNSW based on dataset size)
- GPU-accelerated exact scoring when available
Index ¶
- Constants
- Variables
- func BuildIVFPQFromVectorStore(ctx context.Context, vfs *VectorFileStore, profile IVFPQProfile, ...) (*IVFPQIndex, *IVFPQBuildStats, error)
- func ConvertSearchIndexDirFromGobToMsgpack(rootDir string) error
- func DefaultBM25Engine() string
- func DeriveIVFCentroidsFromClusters(hnswPath string, vectorLookup VectorLookup) (centroids [][]float32, idToCluster map[string]int, err error)
- func SaveIVFHNSW(hnswPath string, clusterHNSW map[int]*HNSWIndex) error
- func SaveIVFPQBundle(basePath string, idx *IVFPQIndex) error
- type ANNQuality
- type ANNResult
- type BruteForceCandidateGen
- type BuildProgress
- type CPUExactScorer
- type Candidate
- type CandidateGenerator
- type CompressedANNProfile
- type CompressedActivationDiagnostic
- type CrossEncoder
- func (ce *CrossEncoder) Config() *CrossEncoderConfig
- func (ce *CrossEncoder) Enabled() bool
- func (ce *CrossEncoder) IsAvailable(ctx context.Context) bool
- func (ce *CrossEncoder) Name() string
- func (ce *CrossEncoder) Rerank(ctx context.Context, query string, candidates []RerankCandidate) ([]RerankResult, error)
- type CrossEncoderConfig
- type ExactScorer
- type FileStoreBruteForceCandidateGen
- type FulltextBatchEntry
- type FulltextIndex
- func (f *FulltextIndex) Clear()
- func (f *FulltextIndex) Count() int
- func (f *FulltextIndex) GetDocument(id string) (string, bool)
- func (f *FulltextIndex) Index(id string, text string)
- func (f *FulltextIndex) IndexBatch(entries []FulltextBatchEntry)
- func (f *FulltextIndex) IsDirty() bool
- func (f *FulltextIndex) LexicalSeedDocIDs(maxTerms, docsPerTerm int) []string
- func (f *FulltextIndex) Load(path string) error
- func (f *FulltextIndex) PhraseSearch(phrase string, limit int) []indexResult
- func (f *FulltextIndex) Remove(id string)
- func (f *FulltextIndex) Save(path string) error
- func (f *FulltextIndex) SaveNoCopy(path string) error
- func (f *FulltextIndex) Search(query string, limit int) []indexResult
- type FulltextIndexV2
- func (f *FulltextIndexV2) Clear()
- func (f *FulltextIndexV2) Count() int
- func (f *FulltextIndexV2) GetDocument(id string) (string, bool)
- func (f *FulltextIndexV2) Index(id string, text string)
- func (f *FulltextIndexV2) IndexBatch(entries []FulltextBatchEntry)
- func (f *FulltextIndexV2) IsDirty() bool
- func (f *FulltextIndexV2) LexicalSeedDocIDs(maxTerms, docsPerTerm int) []string
- func (f *FulltextIndexV2) Load(path string) error
- func (f *FulltextIndexV2) PhraseSearch(phrase string, limit int) []indexResult
- func (f *FulltextIndexV2) Remove(id string)
- func (f *FulltextIndexV2) Save(path string) error
- func (f *FulltextIndexV2) SaveNoCopy(path string) error
- func (f *FulltextIndexV2) Search(query string, limit int) []indexResult
- type GPUBruteForceCandidateGen
- type GPUExactScorer
- type GPUKMeansCandidateGen
- type HNSWCandidateGen
- type HNSWConfig
- type HNSWIndex
- func (h *HNSWIndex) Add(id string, vec []float32) error
- func (h *HNSWIndex) Clear()
- func (h *HNSWIndex) Config() HNSWConfig
- func (h *HNSWIndex) GetDimensions() int
- func (h *HNSWIndex) Remove(id string)
- func (h *HNSWIndex) Save(path string) error
- func (h *HNSWIndex) Search(ctx context.Context, query []float32, k int, minSimilarity float64) ([]ANNResult, error)
- func (h *HNSWIndex) SearchWithEf(ctx context.Context, query []float32, k int, minSimilarity float64, ef int) ([]ANNResult, error)
- func (h *HNSWIndex) SetVectorLookup(lookup VectorLookup)
- func (h *HNSWIndex) ShouldRebuild() bool
- func (h *HNSWIndex) Size() int
- func (h *HNSWIndex) TombstoneRatio() float64
- func (h *HNSWIndex) Update(id string, vec []float32) error
- type HNSWQualityPreset
- type IVFHNSWCandidateGen
- type IVFPQBuildStats
- type IVFPQCandidateGen
- type IVFPQIndex
- type IVFPQProfile
- type IdentityExactScorer
- type KMeansCandidateGen
- type KalmanSearchAdapter
- func (ka *KalmanSearchAdapter) GetDocumentRelevanceTrend(docID string) float64
- func (ka *KalmanSearchAdapter) GetFallingDocuments(maxVelocity float64) []string
- func (ka *KalmanSearchAdapter) GetLatencyTrend() float64
- func (ka *KalmanSearchAdapter) GetPredictedLatency(stepsAhead int) float64
- func (ka *KalmanSearchAdapter) GetRisingDocuments(minVelocity float64) []string
- func (ka *KalmanSearchAdapter) GetService() *Service
- func (ka *KalmanSearchAdapter) GetStats() SearchAdapterStats
- func (ka *KalmanSearchAdapter) Reset()
- func (ka *KalmanSearchAdapter) Search(ctx context.Context, query string, embedding []float32, opts *SearchOptions) (*SearchResponse, error)
- type KalmanSearchConfig
- type LLMFunc
- type LLMReranker
- type LLMRerankerConfig
- type LocalReranker
- type LocalRerankerConfig
- type NodeIterator
- type RerankCandidate
- type RerankResult
- type RerankScorer
- type Reranker
- type ScoredCandidate
- type SearchAdapterStats
- type SearchCandidate
- type SearchMetrics
- type SearchOptions
- type SearchResponse
- type SearchResult
- type Service
- func (s *Service) BM25Engine() string
- func (s *Service) BuildInProgress() bool
- func (s *Service) BuildIndexes(ctx context.Context) error
- func (s *Service) ClearVectorIndex()
- func (s *Service) ClusterStats() *gpu.ClusterStats
- func (s *Service) ClusteringInProgress() bool
- func (s *Service) CrossEncoderAvailable(ctx context.Context) bool
- func (s *Service) CurrentStrategy() string
- func (s *Service) EmbeddingCount() int
- func (s *Service) EnableClustering(gpuManager *gpu.Manager, numClusters int)
- func (s *Service) GetBuildProgress() BuildProgress
- func (s *Service) GetDefaultMinSimilarity() float64
- func (s *Service) GetMinEmbeddingsForClustering() int
- func (s *Service) IndexNode(node *storage.Node) error
- func (s *Service) IsClusteringEnabled() bool
- func (s *Service) IsReady() bool
- func (s *Service) PersistIndexesToDisk()
- func (s *Service) RemoveNode(nodeID storage.NodeID) error
- func (s *Service) RerankCandidates(ctx context.Context, query string, candidates []RerankCandidate, ...) ([]RerankResult, error)
- func (s *Service) RerankerAvailable(ctx context.Context) bool
- func (s *Service) Search(ctx context.Context, query string, embedding []float32, opts *SearchOptions) (*SearchResponse, error)
- func (s *Service) SetCrossEncoder(ce *CrossEncoder)
- func (s *Service) SetDefaultMinSimilarity(threshold float64)
- func (s *Service) SetFulltextIndexPath(path string)
- func (s *Service) SetGPUManager(manager *gpu.Manager)
- func (s *Service) SetHNSWIndexPath(path string)
- func (s *Service) SetMinEmbeddingsForClustering(threshold int)
- func (s *Service) SetPersistenceEnabled(enabled bool)
- func (s *Service) SetReranker(r Reranker)
- func (s *Service) SetVectorIndexPath(path string)
- func (s *Service) TriggerClustering(ctx context.Context) error
- func (s *Service) VectorIndexDimensions() int
- func (s *Service) VectorQueryNodes(ctx context.Context, queryEmbedding []float32, spec VectorQuerySpec) ([]VectorQueryHit, error)
- func (s *Service) VectorSearchCandidates(ctx context.Context, embedding []float32, opts *SearchOptions) ([]SearchCandidate, error)
- type VectorFileStore
- func (v *VectorFileStore) Add(id string, vec []float32) error
- func (v *VectorFileStore) Close() error
- func (v *VectorFileStore) CompactIfNeeded() (bool, error)
- func (v *VectorFileStore) Count() int
- func (v *VectorFileStore) GetBuildIndexedCount() int64
- func (v *VectorFileStore) GetDimensions() int
- func (v *VectorFileStore) GetVector(id string) ([]float32, bool)
- func (v *VectorFileStore) Has(id string) bool
- func (v *VectorFileStore) IterateChunked(chunkSize int, fn func(ids []string, vecs [][]float32) error) error
- func (v *VectorFileStore) Load() error
- func (v *VectorFileStore) Remove(id string) bool
- func (v *VectorFileStore) Save() error
- func (v *VectorFileStore) SetBuildIndexedCount(n int64)
- func (v *VectorFileStore) Sync() error
- type VectorFileStoreMeta
- type VectorGetter
- type VectorIndex
- func (v *VectorIndex) Add(id string, vec []float32) error
- func (v *VectorIndex) Clear()
- func (v *VectorIndex) Count() int
- func (v *VectorIndex) GetDimensions() int
- func (v *VectorIndex) GetVector(id string) ([]float32, bool)
- func (v *VectorIndex) HasVector(id string) bool
- func (v *VectorIndex) Load(path string) error
- func (v *VectorIndex) Remove(id string)
- func (v *VectorIndex) Save(path string) error
- func (v *VectorIndex) Search(ctx context.Context, query []float32, limit int, minSimilarity float64) ([]indexResult, error)
- type VectorLookup
- type VectorQueryHit
- type VectorQuerySpec
- type VectorSearchPipeline
Constants ¶
const (
	// EnvSearchBM25Engine selects the BM25 engine implementation.
	// Supported values: "v1", "v2". Default: "v2".
	EnvSearchBM25Engine = "NORNICDB_SEARCH_BM25_ENGINE"

	// EnvSearchLogTimings enables per-query timing logs for search stages.
	// When true, logs vector/BM25/fusion/total timing to help identify bottlenecks.
	EnvSearchLogTimings = "NORNICDB_SEARCH_LOG_TIMINGS"

	BM25EngineV1 = "v1"
	BM25EngineV2 = "v2"
)
const (
	// NSmallMax is the maximum dataset size for which brute-force search is used.
	// Below this threshold, brute-force is often faster than ANN overhead.
	NSmallMax = 5000

	// CandidateMultiplier determines how many candidates to generate relative to k.
	// Formula: C = max(k * CandidateMultiplier, 200)
	CandidateMultiplier = 20

	// MaxCandidates is the hard cap on candidate set size.
	MaxCandidates = 5000
)
Configuration constants for the vector search pipeline.
const DefaultMinEmbeddingsForClustering = 1000
DefaultMinEmbeddingsForClustering is the default minimum number of embeddings needed before k-means clustering provides any benefit. Below this threshold, brute-force search is faster than cluster overhead.
This value can be overridden per-service using SetMinEmbeddingsForClustering().
Performance Scaling (Real-World Benchmarks):
- <1000 embeddings: Clustering overhead > speedup benefit
- 2,000 embeddings: ~14% faster with clustering
- 4,500 embeddings: ~26% faster with clustering
- 10,000+ embeddings: 10-50x faster with clustering
Tuning Guidelines:
- 1000 (default): Safe for most workloads, proven performance benefit
- 500-1000: Use for latency-sensitive apps (14-26% speedup range)
- 100-500: Testing or small datasets (verify clustering works)
- 2000+: Very large datasets (maximize speedup, delay until more data)
Environment Variable: NORNICDB_KMEANS_MIN_EMBEDDINGS (overrides default)
const DefaultVectorDimensions = 1024
DefaultVectorDimensions is the embedding size used by NewService (e.g. bge-m3). Use NewServiceWithDimensions or schema/query-derived dimensions when your model differs.
Variables ¶
var (
ErrDimensionMismatch = errors.New("vector dimension mismatch")
)
var ErrSearchIndexBuilding = errors.New("search index being built, please try again when they are complete")
var SearchableProperties = []string{
"content",
"text",
"title",
"name",
"description",
"path",
"workerRole",
"requirements",
}
SearchableProperties defines PRIORITY properties for full-text search ranking. These properties are indexed first for better BM25 ranking. Note: ALL node properties are indexed, but these get priority weighting. These match Neo4j's fulltext index configuration.
Functions ¶
func BuildIVFPQFromVectorStore ¶
func BuildIVFPQFromVectorStore(ctx context.Context, vfs *VectorFileStore, profile IVFPQProfile, seedDocIDs []string) (*IVFPQIndex, *IVFPQBuildStats, error)
BuildIVFPQFromVectorStore builds a full IVF/PQ index from VectorFileStore.
func ConvertSearchIndexDirFromGobToMsgpack ¶
func ConvertSearchIndexDirFromGobToMsgpack(rootDir string) error
ConvertSearchIndexDirFromGobToMsgpack converts search index .gob files under rootDir to msgpack and removes the .gob extension (writes to bm25, vectors, hnsw, hnsw_ivf/0, 1, ...). rootDir should be the search index root (e.g. data/test/search). Back up the directory before running.
func DefaultBM25Engine ¶
func DefaultBM25Engine() string
DefaultBM25Engine returns the configured BM25 engine from environment. Valid values: "v1", "v2". Invalid/empty values fall back to "v2".
func DeriveIVFCentroidsFromClusters ¶
func DeriveIVFCentroidsFromClusters(hnswPath string, vectorLookup VectorLookup) (centroids [][]float32, idToCluster map[string]int, err error)
DeriveIVFCentroidsFromClusters builds centroids and idToCluster from existing hnsw_ivf/ cluster files (numeric names 0, 1, 2, ...) and vectors from the vector index. No separate centroid file. Returns (nil, nil, nil) if no cluster files exist or derivation fails.
func SaveIVFHNSW ¶
func SaveIVFHNSW(hnswPath string, clusterHNSW map[int]*HNSWIndex) error
SaveIVFHNSW persists per-cluster HNSW indexes to disk under hnsw_ivf/ alongside hnsw. hnswPath is the full path to the single HNSW file (e.g. data/search/dbname/hnsw); per-cluster files are written to hnsw_ivf/0, 1, 2, ... (no extension). Each cluster is saved as graph-only.
func SaveIVFPQBundle ¶
func SaveIVFPQBundle(basePath string, idx *IVFPQIndex) error
SaveIVFPQBundle persists an IVFPQ index as an atomic multipart bundle.
Types ¶
type ANNQuality ¶
type ANNQuality string
ANNQuality selects the high-level ANN strategy mode.
const (
	ANNQualityFast       ANNQuality = "fast"
	ANNQualityBalanced   ANNQuality = "balanced"
	ANNQualityAccurate   ANNQuality = "accurate"
	ANNQualityCompressed ANNQuality = "compressed"
)
func ANNQualityFromEnv ¶
func ANNQualityFromEnv() ANNQuality
ANNQualityFromEnv parses the global ANN quality selector. Unknown values default to fast to preserve historical behavior.
type ANNResult ¶
ANNResult is a minimal search result from the ANN index (HNSW).
This intentionally stays small (ID + float32 score) to keep per-request allocations and copy costs low. Higher-level layers can enrich results as needed (labels, properties, etc.).
type BruteForceCandidateGen ¶
type BruteForceCandidateGen struct {
// contains filtered or unexported fields
}
BruteForceCandidateGen implements CandidateGenerator using brute-force search.
This is optimal for small datasets (N < NSmallMax) where ANN overhead dominates brute-force computation time.
func NewBruteForceCandidateGen ¶
func NewBruteForceCandidateGen(vectorIndex *VectorIndex) *BruteForceCandidateGen
NewBruteForceCandidateGen creates a new brute-force candidate generator.
func (*BruteForceCandidateGen) SearchCandidates ¶
func (b *BruteForceCandidateGen) SearchCandidates(ctx context.Context, query []float32, k int, minSimilarity float64) ([]Candidate, error)
SearchCandidates generates candidates using brute-force search.
type BuildProgress ¶
type BuildProgress struct {
Ready bool
Building bool
Phase string
ProcessedNodes int64
TotalNodes int64
RateNodesPerSec float64
ETASeconds int64 // -1 when unknown
}
BuildProgress reports in-progress indexing state for UI/ops visibility.
type CPUExactScorer ¶
type CPUExactScorer struct {
// contains filtered or unexported fields
}
CPUExactScorer implements ExactScorer using CPU-based exact scoring.
Uses SIMD-optimized dot product for cosine similarity computation.
func NewCPUExactScorer ¶
func NewCPUExactScorer(getter VectorGetter) *CPUExactScorer
NewCPUExactScorer creates a new CPU-based exact scorer. getter can be *VectorIndex or any type that implements GetVector (e.g. file-store lookup adapter).
func (*CPUExactScorer) ScoreCandidates ¶
func (c *CPUExactScorer) ScoreCandidates(ctx context.Context, query []float32, candidates []Candidate) ([]ScoredCandidate, error)
ScoreCandidates computes exact scores for candidates using CPU.
type CandidateGenerator ¶
type CandidateGenerator interface {
// SearchCandidates generates candidate vectors for the given query.
//
// Parameters:
// - ctx: Context for cancellation
// - query: Normalized query vector
// - k: Desired number of results
// - minSimilarity: Minimum similarity threshold (approximate, may be relaxed)
//
// Returns:
// - candidates: Candidate IDs with approximate scores (may be more than k)
// - error: Context cancellation or other errors
SearchCandidates(ctx context.Context, query []float32, k int, minSimilarity float64) ([]Candidate, error)
}
CandidateGenerator generates candidate vectors for approximate search.
Implementations:
- BruteForceCandidateGen: Exact search over all vectors (for small N)
- HNSWCandidateGen: Approximate search using HNSW graph (for large N)
The generator returns candidate IDs and approximate scores. These candidates will be re-scored exactly by ExactScorer before final ranking.
type CompressedANNProfile ¶
type CompressedANNProfile struct {
Quality ANNQuality
Active bool
Dimensions int
VectorCount int
IVFLists int
PQSegments int
PQBits int
NProbe int
RerankTopK int
TrainingSampleMax int
KMeansMaxIterations int
SeedMaxTerms int
SeedDocsPerTerm int
RoutingMode string
Diagnostics []CompressedActivationDiagnostic
}
CompressedANNProfile is the resolved runtime contract for compressed ANN mode.
func ResolveCompressedANNProfile ¶
func ResolveCompressedANNProfile(vectorCount, dimensions int, vectorStoreReady bool) CompressedANNProfile
ResolveCompressedANNProfile resolves compressed ANN settings and readiness. Shared knobs are reused directly where semantically compatible.
type CompressedActivationDiagnostic ¶
CompressedActivationDiagnostic captures deterministic activation decisions.
type CrossEncoder ¶
type CrossEncoder struct {
// contains filtered or unexported fields
}
CrossEncoder performs cross-encoder reranking.
func NewCrossEncoder ¶
func NewCrossEncoder(config *CrossEncoderConfig) *CrossEncoder
NewCrossEncoder creates a new cross-encoder reranker.
func (*CrossEncoder) Config ¶
func (ce *CrossEncoder) Config() *CrossEncoderConfig
Config returns the current configuration.
func (*CrossEncoder) Enabled ¶
func (ce *CrossEncoder) Enabled() bool
func (*CrossEncoder) IsAvailable ¶
func (ce *CrossEncoder) IsAvailable(ctx context.Context) bool
IsAvailable checks if the reranking service is available.
func (*CrossEncoder) Name ¶
func (ce *CrossEncoder) Name() string
func (*CrossEncoder) Rerank ¶
func (ce *CrossEncoder) Rerank(ctx context.Context, query string, candidates []RerankCandidate) ([]RerankResult, error)
Rerank takes a query and candidates, returns reranked results.
type CrossEncoderConfig ¶
type CrossEncoderConfig struct {
// Enabled turns on cross-encoder reranking
Enabled bool
// APIURL is the reranking service endpoint
// Supports: Cohere, HuggingFace TEI, local models
APIURL string
// APIKey for authentication (if required)
APIKey string
// Model name (e.g., "cross-encoder/ms-marco-MiniLM-L-6-v2")
Model string
// TopK is how many candidates to rerank (default: 100)
TopK int
// Timeout for reranking requests
Timeout time.Duration
// MinScore is the minimum rerank score to include (0-1)
MinScore float64
}
CrossEncoderConfig configures the cross-encoder reranker.
func DefaultCrossEncoderConfig ¶
func DefaultCrossEncoderConfig() *CrossEncoderConfig
DefaultCrossEncoderConfig returns sensible defaults.
type ExactScorer ¶
type ExactScorer interface {
// ScoreCandidates computes exact similarity scores for the given candidates.
//
// Parameters:
// - ctx: Context for cancellation
// - query: Normalized query vector
// - candidates: Candidate IDs to score (may include approximate scores for reference)
//
// Returns:
// - scored: Candidates with exact scores
// - error: Context cancellation or other errors
ScoreCandidates(ctx context.Context, query []float32, candidates []Candidate) ([]ScoredCandidate, error)
}
ExactScorer computes exact similarity scores for candidate vectors.
Implementations:
- CPUExactScorer: CPU-based exact scoring (SIMD-optimized)
- GPUExactScorer: GPU-accelerated exact scoring (when available)
The scorer takes candidate IDs and returns exact scores using the true metric (cosine/dot/euclid), ensuring final ranking accuracy.
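The re-scoring step an ExactScorer performs can be sketched in plain Go. The structs below are hypothetical stand-ins (the package's actual Candidate/ScoredCandidate field sets are unexported or not shown here); the cosine computation is the standard exact metric a CPU implementation would use.

```go
package main

import (
	"fmt"
	"math"
)

// candidate / scoredCandidate are illustrative stand-ins for the
// package's Candidate and ScoredCandidate types (real fields not shown).
type candidate struct {
	ID    string
	Score float64 // approximate score from the candidate generator
}

type scoredCandidate struct {
	ID    string
	Score float64 // exact score
}

// cosine computes exact cosine similarity between two vectors.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// scoreCandidates re-scores each candidate with the true metric,
// skipping IDs whose vectors have disappeared since generation.
func scoreCandidates(query []float32, lookup map[string][]float32, cands []candidate) []scoredCandidate {
	out := make([]scoredCandidate, 0, len(cands))
	for _, c := range cands {
		vec, ok := lookup[c.ID]
		if !ok {
			continue // vector deleted since candidate generation
		}
		out = append(out, scoredCandidate{ID: c.ID, Score: cosine(query, vec)})
	}
	return out
}

func main() {
	lookup := map[string][]float32{"a": {1, 0}, "b": {0, 1}}
	got := scoreCandidates([]float32{1, 0}, lookup, []candidate{{ID: "a"}, {ID: "b"}})
	fmt.Printf("%s=%.2f %s=%.2f\n", got[0].ID, got[0].Score, got[1].ID, got[1].Score)
}
```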
type FileStoreBruteForceCandidateGen ¶
type FileStoreBruteForceCandidateGen struct {
// contains filtered or unexported fields
}
FileStoreBruteForceCandidateGen implements CandidateGenerator using brute-force search directly over the file-backed vector store. This is used for small datasets when vectors are not held in-memory.
func NewFileStoreBruteForceCandidateGen ¶
func NewFileStoreBruteForceCandidateGen(vectorStore *VectorFileStore) *FileStoreBruteForceCandidateGen
NewFileStoreBruteForceCandidateGen creates a brute-force candidate generator over VectorFileStore.
func (*FileStoreBruteForceCandidateGen) SearchCandidates ¶
func (b *FileStoreBruteForceCandidateGen) SearchCandidates(ctx context.Context, query []float32, k int, minSimilarity float64) ([]Candidate, error)
SearchCandidates scans all vectors from the file store and returns top candidates by exact cosine score.
type FulltextBatchEntry ¶
FulltextBatchEntry is one (id, text) pair for IndexBatch.
type FulltextIndex ¶
type FulltextIndex struct {
// contains filtered or unexported fields
}
FulltextIndex provides BM25-based full-text search. It indexes documents and supports keyword search scored with BM25 (a TF-IDF-family ranking function).
func NewFulltextIndex ¶
func NewFulltextIndex() *FulltextIndex
NewFulltextIndex creates a new full-text search index.
func (*FulltextIndex) Clear ¶
func (f *FulltextIndex) Clear()
Clear removes all indexed documents and terms. This is used to free RAM after persisting during bulk builds.
func (*FulltextIndex) Count ¶
func (f *FulltextIndex) Count() int
Count returns the number of indexed documents.
func (*FulltextIndex) GetDocument ¶
func (f *FulltextIndex) GetDocument(id string) (string, bool)
GetDocument retrieves the original text for a document.
func (*FulltextIndex) Index ¶
func (f *FulltextIndex) Index(id string, text string)
Index adds or updates a document in the index.
func (*FulltextIndex) IndexBatch ¶
func (f *FulltextIndex) IndexBatch(entries []FulltextBatchEntry)
IndexBatch adds or updates many documents under one lock and updates avgDocLength once at the end. Use this during bulk build to reduce lock contention and avoid O(N) avg updates.
func (*FulltextIndex) IsDirty ¶
func (f *FulltextIndex) IsDirty() bool
IsDirty reports whether the index has changes not yet persisted to disk.
func (*FulltextIndex) LexicalSeedDocIDs ¶
func (f *FulltextIndex) LexicalSeedDocIDs(maxTerms, docsPerTerm int) []string
LexicalSeedDocIDs returns document IDs selected from high-IDF terms to provide lexical priors for clustering seed selection.
func (*FulltextIndex) Load ¶
func (f *FulltextIndex) Load(path string) error
Load replaces the fulltext index with the one stored at path (msgpack format). If the file does not exist, the index is left unchanged; if decoding fails, the index is left empty. In both cases no error is returned, so the caller can proceed with a full rebuild. Returns an error only for unexpected I/O (e.g. permission denied).
func (*FulltextIndex) PhraseSearch ¶
func (f *FulltextIndex) PhraseSearch(phrase string, limit int) []indexResult
PhraseSearch searches for an exact phrase match. Returns documents containing the exact phrase.
func (*FulltextIndex) Remove ¶
func (f *FulltextIndex) Remove(id string)
Remove removes a document from the index.
func (*FulltextIndex) Save ¶
func (f *FulltextIndex) Save(path string) error
Save writes the fulltext index to path (msgpack format). Dir is created if needed. Copies index data under a short read lock so I/O does not block Search/IndexNode.
func (*FulltextIndex) SaveNoCopy ¶
func (f *FulltextIndex) SaveNoCopy(path string) error
SaveNoCopy writes the fulltext index to path without deep-copying the maps. This avoids doubling memory during large builds, but holds the read lock during I/O. Use during BuildIndexes when search is not serving live traffic.
func (*FulltextIndex) Search ¶
func (f *FulltextIndex) Search(query string, limit int) []indexResult
Search performs BM25 keyword search. Returns results sorted by BM25 score (highest first).
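The BM25 scoring behind Search can be illustrated with the textbook per-term formula. This is a sketch using the common defaults k1=1.2 and b=0.75; the package's exact parameter choices are not shown in these docs.

```go
package main

import (
	"fmt"
	"math"
)

// bm25Term scores one query term against one document using the standard
// BM25 formula. tf is the term frequency in the document, df the number
// of documents containing the term. (Illustrative only; the index's
// actual k1/b values are not documented here.)
func bm25Term(tf, docLen, avgDocLen float64, df, totalDocs int) float64 {
	const k1, b = 1.2, 0.75
	idf := math.Log(1 + (float64(totalDocs)-float64(df)+0.5)/(float64(df)+0.5))
	norm := tf + k1*(1-b+b*docLen/avgDocLen)
	return idf * tf * (k1 + 1) / norm
}

func main() {
	// A rare term (df=1 of 1000 docs) outscores a common one (df=900).
	rare := bm25Term(2, 100, 100, 1, 1000)
	common := bm25Term(2, 100, 100, 900, 1000)
	fmt.Printf("rare=%.3f common=%.3f\n", rare, common)
}
```

This is why IndexBatch updates avgDocLength once at the end of a bulk load: every per-term score depends on it, so recomputing it per document is wasted work.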
type FulltextIndexV2 ¶
type FulltextIndexV2 struct {
// contains filtered or unexported fields
}
FulltextIndexV2 provides a BM25 index optimized for large datasets. It stores compact postings (docNum/tf) and uses bounded prefix expansion + top-k scoring.
func NewFulltextIndexV2 ¶
func NewFulltextIndexV2() *FulltextIndexV2
func (*FulltextIndexV2) Clear ¶
func (f *FulltextIndexV2) Clear()
func (*FulltextIndexV2) Count ¶
func (f *FulltextIndexV2) Count() int
func (*FulltextIndexV2) GetDocument ¶
func (f *FulltextIndexV2) GetDocument(id string) (string, bool)
func (*FulltextIndexV2) Index ¶
func (f *FulltextIndexV2) Index(id string, text string)
func (*FulltextIndexV2) IndexBatch ¶
func (f *FulltextIndexV2) IndexBatch(entries []FulltextBatchEntry)
func (*FulltextIndexV2) IsDirty ¶
func (f *FulltextIndexV2) IsDirty() bool
func (*FulltextIndexV2) LexicalSeedDocIDs ¶
func (f *FulltextIndexV2) LexicalSeedDocIDs(maxTerms, docsPerTerm int) []string
func (*FulltextIndexV2) Load ¶
func (f *FulltextIndexV2) Load(path string) error
func (*FulltextIndexV2) PhraseSearch ¶
func (f *FulltextIndexV2) PhraseSearch(phrase string, limit int) []indexResult
func (*FulltextIndexV2) Remove ¶
func (f *FulltextIndexV2) Remove(id string)
func (*FulltextIndexV2) Save ¶
func (f *FulltextIndexV2) Save(path string) error
func (*FulltextIndexV2) SaveNoCopy ¶
func (f *FulltextIndexV2) SaveNoCopy(path string) error
func (*FulltextIndexV2) Search ¶
func (f *FulltextIndexV2) Search(query string, limit int) []indexResult
type GPUBruteForceCandidateGen ¶
type GPUBruteForceCandidateGen struct {
// contains filtered or unexported fields
}
GPUBruteForceCandidateGen uses gpu.EmbeddingIndex.Search() as an exact candidate generator.
Note: gpu.EmbeddingIndex.Search() does not currently accept a context, so this generator cannot cancel mid-kernel. It checks ctx before invoking the search.
func NewGPUBruteForceCandidateGen ¶
func NewGPUBruteForceCandidateGen(embeddingIndex *gpu.EmbeddingIndex) *GPUBruteForceCandidateGen
func (*GPUBruteForceCandidateGen) SearchCandidates ¶
type GPUExactScorer ¶
type GPUExactScorer struct {
// contains filtered or unexported fields
}
GPUExactScorer implements ExactScorer using GPU-accelerated exact scoring.
Falls back to CPU if GPU is unavailable or unhealthy.
func NewGPUExactScorer ¶
func NewGPUExactScorer(embeddingIndex *gpu.EmbeddingIndex, cpuFallback *CPUExactScorer) *GPUExactScorer
NewGPUExactScorer creates a new GPU-based exact scorer.
func (*GPUExactScorer) ScoreCandidates ¶
func (g *GPUExactScorer) ScoreCandidates(ctx context.Context, query []float32, candidates []Candidate) ([]ScoredCandidate, error)
ScoreCandidates computes exact scores for candidates using GPU when available.
Note: Since EmbeddingIndex.Search() searches all vectors, we use it for candidate scoring by searching with a large k, then filtering to only the candidates we care about. This is not optimal but works with the current EmbeddingIndex API. A future PR will add ScoreSubset() for true subset scoring.
type GPUKMeansCandidateGen ¶
type GPUKMeansCandidateGen struct {
// contains filtered or unexported fields
}
GPUKMeansCandidateGen routes queries to the nearest k-means clusters and then scores only the vectors in those clusters using the GPU embedding index when available.
This is used as a high-throughput fallback when:
- GPU brute-force is enabled but the dataset is outside the configured full-scan range, and
- clustering is available (centroids + cluster membership already built).
It preserves correctness by returning exact cosine scores for the candidate set. When the GPU cannot score (not synced / unhealthy), ScoreSubset falls back to CPU.
func NewGPUKMeansCandidateGen ¶
func NewGPUKMeansCandidateGen(clusterIndex *gpu.ClusterIndex, numClustersToSearch int) *GPUKMeansCandidateGen
func (*GPUKMeansCandidateGen) SearchCandidates ¶
func (*GPUKMeansCandidateGen) SetClusterSelector ¶
func (g *GPUKMeansCandidateGen) SetClusterSelector(fn func(ctx context.Context, query []float32, defaultN int) []int) *GPUKMeansCandidateGen
SetClusterSelector sets an optional custom cluster selector used for routing.
type HNSWCandidateGen ¶
type HNSWCandidateGen struct {
// contains filtered or unexported fields
}
HNSWCandidateGen implements CandidateGenerator using HNSW approximate search.
This is optimal for large datasets (N >= NSmallMax) where ANN provides significant speedup over brute-force.
func NewHNSWCandidateGen ¶
func NewHNSWCandidateGen(hnswIndex *HNSWIndex) *HNSWCandidateGen
NewHNSWCandidateGen creates a new HNSW candidate generator.
func (*HNSWCandidateGen) SearchCandidates ¶
func (h *HNSWCandidateGen) SearchCandidates(ctx context.Context, query []float32, k int, minSimilarity float64) ([]Candidate, error)
SearchCandidates generates candidates using HNSW approximate search.
type HNSWConfig ¶
type HNSWConfig struct {
M int // Max connections per node per layer (default: 16)
EfConstruction int // Candidate list size during construction (default: 200)
EfSearch int // Candidate list size during search (default: 100)
LevelMultiplier float64 // Level multiplier = 1/ln(M)
}
HNSWConfig contains configuration parameters for the HNSW index.
func DefaultHNSWConfig ¶
func DefaultHNSWConfig() HNSWConfig
DefaultHNSWConfig returns sensible defaults for HNSW index.
func HNSWConfigFromEnv ¶
func HNSWConfigFromEnv() HNSWConfig
HNSWConfigFromEnv loads HNSW configuration from environment variables.
Environment Variables:
- NORNICDB_VECTOR_ANN_QUALITY: Quality preset (fast|balanced|accurate|compressed, default: fast)
- NORNICDB_VECTOR_HNSW_M: Max connections per node (default: based on preset)
- NORNICDB_VECTOR_HNSW_EF_CONSTRUCTION: Candidate list size during construction (default: based on preset)
- NORNICDB_VECTOR_HNSW_EF_SEARCH: Candidate list size during search (default: based on preset)
Quality Presets:
- fast: M=16, efConstruction=100, efSearch=50 (faster, lower recall)
- balanced: M=16, efConstruction=200, efSearch=100 (good balance)
- accurate: M=32, efConstruction=400, efSearch=200 (slower, higher recall)
- compressed: parsed at the ANN quality layer; maps to fast HNSW defaults until compressed ANN strategy routing is active
Example:
// Use fast preset
os.Setenv("NORNICDB_VECTOR_ANN_QUALITY", "fast")
config := HNSWConfigFromEnv()
// Override specific parameter
os.Setenv("NORNICDB_VECTOR_ANN_QUALITY", "balanced")
os.Setenv("NORNICDB_VECTOR_HNSW_EF_SEARCH", "150")
config := HNSWConfigFromEnv()
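The preset table above reduces to a simple lookup. A minimal sketch of that mapping (not the package's actual resolver, which also applies per-parameter env overrides):

```go
package main

import "fmt"

// hnswParams mirrors the documented preset table. "compressed" and
// unknown values fall back to the fast defaults, as documented.
func hnswParams(preset string) (m, efConstruction, efSearch int) {
	switch preset {
	case "accurate":
		return 32, 400, 200
	case "balanced":
		return 16, 200, 100
	default: // "fast", "compressed", unknown
		return 16, 100, 50
	}
}

func main() {
	m, efc, efs := hnswParams("balanced")
	fmt.Printf("M=%d efConstruction=%d efSearch=%d\n", m, efc, efs)
}
```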
type HNSWIndex ¶
type HNSWIndex struct {
// contains filtered or unexported fields
}
HNSWIndex provides fast approximate nearest neighbor search using HNSW algorithm.
func LoadHNSWIndex ¶
func LoadHNSWIndex(path string, vectorLookup VectorLookup) (*HNSWIndex, error)
LoadHNSWIndex loads an HNSW index from path (msgpack format) and returns it. For graph-only format (1.1.0), vectorLookup must be non-nil and vectors are resolved by ID at search time (no in-memory vector copy in HNSW). For legacy full format (1.0.0), vectorLookup is ignored. If the file does not exist or decode fails, returns (nil, nil) so the caller can rebuild. Returns an error only for unexpected I/O (e.g. permission denied).
func LoadIVFHNSWCluster ¶
func LoadIVFHNSWCluster(hnswPath string, clusterID int, vectorLookup VectorLookup) (*HNSWIndex, error)
LoadIVFHNSWCluster loads one cluster's HNSW index from hnsw_ivf/cid in lookup mode (no vector copy in HNSW RAM). Returns (nil, nil) if the file is missing or invalid (caller can build). hnswPath is the full path to the single HNSW file (e.g. data/search/dbname/hnsw).
func NewHNSWIndex ¶
func NewHNSWIndex(dimensions int, config HNSWConfig) *HNSWIndex
NewHNSWIndex creates a new HNSW index with the given dimensions and config.
func (*HNSWIndex) Clear ¶
func (h *HNSWIndex) Clear()
Clear removes all vectors from the index and resets it to an empty state. This frees memory by clearing all internal arrays and maps. Use this when you need to completely reset the index (e.g., after deleting a collection).
func (*HNSWIndex) Config ¶
func (h *HNSWIndex) Config() HNSWConfig
Config returns a copy of the index configuration.
func (*HNSWIndex) GetDimensions ¶
GetDimensions returns the vector dimension of the index.
func (*HNSWIndex) Save ¶
Save writes the HNSW index to path (msgpack format) as graph-only: graph structure and IDs only, no vector data. Vectors are always loaded from the vector index (vectors) on load, so the file stays small. Dir is created if needed. Copies index data under a short read lock so I/O does not block Search/Add/Remove.
func (*HNSWIndex) Search ¶
func (h *HNSWIndex) Search(ctx context.Context, query []float32, k int, minSimilarity float64) ([]ANNResult, error)
Search finds the k nearest neighbors to the query vector.
func (*HNSWIndex) SearchWithEf ¶
func (h *HNSWIndex) SearchWithEf(ctx context.Context, query []float32, k int, minSimilarity float64, ef int) ([]ANNResult, error)
SearchWithEf finds the k nearest neighbors using a caller-provided `ef`.
In Qdrant terms, `ef` is the beam size for HNSW search: larger values improve recall and usually increase latency. If `ef <= 0`, this falls back to the index's configured `EfSearch`.
func (*HNSWIndex) SetVectorLookup ¶
func (h *HNSWIndex) SetVectorLookup(lookup VectorLookup)
SetVectorLookup sets an optional lookup so vectors are resolved by ID at search time instead of from the in-memory slice. When set, Add does not store vectors (vecOff = -1); Load can leave vectors empty and use the lookup. Saves one full vector copy in RAM.
func (*HNSWIndex) ShouldRebuild ¶
ShouldRebuild returns true if the index has accumulated too many tombstones and should be rebuilt to free memory. Threshold is 50% deleted vectors.
func (*HNSWIndex) TombstoneRatio ¶
TombstoneRatio returns the ratio of deleted vectors to total vectors. Returns 0.0 if there are no vectors. A high ratio (>0.5) indicates the index should be rebuilt to free memory.
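The rebuild decision described above is a simple ratio check. A sketch of the documented policy (assuming a strict > 0.5 comparison, matching the "high ratio (>0.5)" wording):

```go
package main

import "fmt"

// tombstoneRatio returns deleted/total, or 0 for an empty index,
// mirroring the documented TombstoneRatio semantics.
func tombstoneRatio(deleted, total int) float64 {
	if total == 0 {
		return 0
	}
	return float64(deleted) / float64(total)
}

// shouldRebuild applies the documented 50% threshold. Whether the real
// comparison is strict or inclusive is an assumption here.
func shouldRebuild(deleted, total int) bool {
	return tombstoneRatio(deleted, total) > 0.5
}

func main() {
	fmt.Println(shouldRebuild(40, 100), shouldRebuild(60, 100))
}
```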
func (*HNSWIndex) Update ¶
Update updates an existing vector in the index.
Update policy: Remove + Add pattern
- Removes the old vector and all its connections
- Adds the new vector with fresh connections
- This ensures graph structure is correctly maintained
If the vector doesn't exist, this is equivalent to Add().
Performance: O(M * log(N)), where M is max connections and N is dataset size. For high-churn workloads, consider periodic rebuilds to restore graph quality.
type HNSWQualityPreset ¶
type HNSWQualityPreset string
HNSWQualityPreset defines quality presets for HNSW tuning.
const (
	// QualityFast prioritizes speed over recall.
	// Lower efSearch, lower candidate_multiplier for faster queries.
	QualityFast HNSWQualityPreset = "fast"

	// QualityBalanced provides a good balance between speed and recall.
	// Default preset with reasonable defaults.
	QualityBalanced HNSWQualityPreset = "balanced"

	// QualityAccurate prioritizes recall over speed.
	// Higher efSearch and/or bigger candidate pool for better accuracy.
	QualityAccurate HNSWQualityPreset = "accurate"
)
type IVFHNSWCandidateGen ¶
type IVFHNSWCandidateGen struct {
// contains filtered or unexported fields
}
IVFHNSWCandidateGen implements IVF-HNSW: centroid routing (IVF) into per-cluster HNSW indexes.
This is designed for large CPU-only datasets:
- K-means provides a coarse routing layer (choose nearest clusters)
- Per-cluster HNSW provides fast ANN within each cluster
For GPU-enabled setups, prefer the existing GPU brute-force and GPU k-means paths.
func NewIVFHNSWCandidateGen ¶
func NewIVFHNSWCandidateGen(clusterIndex *gpu.ClusterIndex, getClusterHNSW clusterHNSWLookup, numClustersToSearch int) *IVFHNSWCandidateGen
func (*IVFHNSWCandidateGen) SearchCandidates ¶
func (*IVFHNSWCandidateGen) SetClusterSelector ¶
func (g *IVFHNSWCandidateGen) SetClusterSelector(fn func(ctx context.Context, query []float32, defaultN int) []int) *IVFHNSWCandidateGen
SetClusterSelector sets an optional custom cluster selector used for routing.
type IVFPQBuildStats ¶
type IVFPQBuildStats struct {
VectorCount int
TrainingSampleCount int
ListCount int
AvgListSize float64
MaxListSize int
BytesPerVector float64
BuildDuration time.Duration
}
IVFPQBuildStats captures build observability for acceptance gates.
type IVFPQCandidateGen ¶
type IVFPQCandidateGen struct {
// contains filtered or unexported fields
}
IVFPQCandidateGen implements CandidateGenerator using IVF/PQ compressed ANN.
func NewIVFPQCandidateGen ¶
func NewIVFPQCandidateGen(index *IVFPQIndex, nprobe int) *IVFPQCandidateGen
func (*IVFPQCandidateGen) SearchCandidates ¶
type IVFPQIndex ¶
type IVFPQIndex struct {
// contains filtered or unexported fields
}
IVFPQIndex stores a compressed IVF/PQ ANN structure.
func LoadIVFPQBundle ¶
func LoadIVFPQBundle(basePath string) (*IVFPQIndex, error)
LoadIVFPQBundle loads an IVFPQ multipart snapshot bundle.
func (*IVFPQIndex) Count ¶
func (i *IVFPQIndex) Count() int
func (*IVFPQIndex) Profile ¶
func (i *IVFPQIndex) Profile() IVFPQProfile
type IVFPQProfile ¶
type IVFPQProfile struct {
Dimensions int
IVFLists int
PQSegments int
PQBits int
NProbe int
RerankTopK int
TrainingSampleMax int
KMeansMaxIterations int
}
IVFPQProfile is the concrete runtime profile used to build/query compressed ANN.
type IdentityExactScorer ¶
type IdentityExactScorer struct{}
IdentityExactScorer is used when the candidate generator already returns exact scores.
func (*IdentityExactScorer) ScoreCandidates ¶
func (i *IdentityExactScorer) ScoreCandidates(ctx context.Context, query []float32, candidates []Candidate) ([]ScoredCandidate, error)
type KMeansCandidateGen ¶
type KMeansCandidateGen struct {
// contains filtered or unexported fields
}
KMeansCandidateGen implements CandidateGenerator using k-means cluster routing.
This candidate generator:
- Finds the top numClustersToSearch clusters nearest to the query
- Gets all node IDs from those clusters as candidates
- Returns candidates with approximate scores (centroid similarity)
This is optimal for very large datasets (N > 100K) where cluster routing provides significant speedup. For smaller datasets, HNSW or brute-force may be faster.
func NewKMeansCandidateGen ¶
func NewKMeansCandidateGen(clusterIndex *gpu.ClusterIndex, vectorIndex *VectorIndex, numClustersToSearch int) *KMeansCandidateGen
NewKMeansCandidateGen creates a new k-means candidate generator.
Parameters:
- clusterIndex: The GPU ClusterIndex (must be clustered)
- vectorIndex: The VectorIndex for ID mapping and fallback
- numClustersToSearch: Number of clusters to search (default: 3)
func (*KMeansCandidateGen) SearchCandidates ¶
func (k *KMeansCandidateGen) SearchCandidates(ctx context.Context, query []float32, limit int, minSimilarity float64) ([]Candidate, error)
SearchCandidates generates candidates using k-means cluster routing.
Algorithm:
- Find top numClustersToSearch clusters nearest to query (centroid similarity)
- Get all node IDs from those clusters
- Return candidates with approximate scores (centroid similarity)
If clustering is not available or not clustered, falls back to brute-force.
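The default nearest-centroid routing step can be sketched as a top-N scan over centroid similarities. This is illustrative only (the real routing lives in gpu.ClusterIndex and can be overridden via SetClusterSelector); dot product on normalized vectors stands in for cosine similarity.

```go
package main

import (
	"fmt"
	"sort"
)

// nearestClusters returns the indices of the n centroids most similar
// to the query (dot product; assumes normalized vectors).
func nearestClusters(query []float32, centroids [][]float32, n int) []int {
	type scored struct {
		idx int
		sim float64
	}
	all := make([]scored, len(centroids))
	for i, c := range centroids {
		var dot float64
		for j := range query {
			dot += float64(query[j]) * float64(c[j])
		}
		all[i] = scored{idx: i, sim: dot}
	}
	// Sort clusters by descending centroid similarity.
	sort.Slice(all, func(a, b int) bool { return all[a].sim > all[b].sim })
	if n > len(all) {
		n = len(all)
	}
	out := make([]int, n)
	for i := 0; i < n; i++ {
		out[i] = all[i].idx
	}
	return out
}

func main() {
	centroids := [][]float32{{1, 0}, {0, 1}, {0.7, 0.7}}
	fmt.Println(nearestClusters([]float32{1, 0}, centroids, 2))
}
```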
func (*KMeansCandidateGen) SetClusterSelector ¶
func (k *KMeansCandidateGen) SetClusterSelector(fn func(ctx context.Context, query []float32, defaultN int) []int) *KMeansCandidateGen
SetClusterSelector sets an optional custom cluster selector used for routing. When nil, the generator uses the default nearest-centroid routing.
type KalmanSearchAdapter ¶
type KalmanSearchAdapter struct {
// contains filtered or unexported fields
}
KalmanSearchAdapter wraps a search Service with Kalman filtering.
func NewKalmanSearchAdapter ¶
func NewKalmanSearchAdapter(service *Service, config KalmanSearchConfig) *KalmanSearchAdapter
NewKalmanSearchAdapter creates a new Kalman-enhanced search adapter.
Example:
service := search.NewService(engine, 1024)
adapter := search.NewKalmanSearchAdapter(service, search.DefaultKalmanSearchConfig())
// Search with enhanced ranking
results, _ := adapter.Search(ctx, "semantic search", embedding, &SearchOptions{Limit: 10})
func (*KalmanSearchAdapter) GetDocumentRelevanceTrend ¶
func (ka *KalmanSearchAdapter) GetDocumentRelevanceTrend(docID string) float64
GetDocumentRelevanceTrend returns whether a document is becoming more or less relevant.
Returns:
- positive velocity: document is becoming more relevant (appearing higher)
- negative velocity: document is becoming less relevant (appearing lower)
- zero: stable relevance
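The sign convention can be illustrated with a much simpler velocity estimate than the adapter's actual Kalman filter: an exponentially smoothed delta over a document's score history. The alpha parameter here is purely illustrative.

```go
package main

import "fmt"

// trendVelocity approximates relevance velocity with an exponentially
// smoothed score delta. The adapter uses a proper Kalman filter; only
// the sign convention is shared: positive = becoming more relevant.
func trendVelocity(scores []float64, alpha float64) float64 {
	var v float64
	for i := 1; i < len(scores); i++ {
		delta := scores[i] - scores[i-1]
		v = alpha*delta + (1-alpha)*v
	}
	return v
}

func main() {
	rising := trendVelocity([]float64{0.50, 0.55, 0.61, 0.66}, 0.5)
	falling := trendVelocity([]float64{0.66, 0.61, 0.55, 0.50}, 0.5)
	fmt.Printf("rising=%.3f falling=%.3f\n", rising, falling)
}
```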
func (*KalmanSearchAdapter) GetFallingDocuments ¶
func (ka *KalmanSearchAdapter) GetFallingDocuments(maxVelocity float64) []string
GetFallingDocuments returns documents whose relevance is decreasing.
func (*KalmanSearchAdapter) GetLatencyTrend ¶
func (ka *KalmanSearchAdapter) GetLatencyTrend() float64
GetLatencyTrend returns the current latency velocity (positive = getting slower).
func (*KalmanSearchAdapter) GetPredictedLatency ¶
func (ka *KalmanSearchAdapter) GetPredictedLatency(stepsAhead int) float64
GetPredictedLatency returns the predicted latency for the next query.
func (*KalmanSearchAdapter) GetRisingDocuments ¶
func (ka *KalmanSearchAdapter) GetRisingDocuments(minVelocity float64) []string
GetRisingDocuments returns documents whose relevance is increasing.
func (*KalmanSearchAdapter) GetService ¶
func (ka *KalmanSearchAdapter) GetService() *Service
GetService returns the underlying search service.
func (*KalmanSearchAdapter) GetStats ¶
func (ka *KalmanSearchAdapter) GetStats() SearchAdapterStats
GetStats returns adapter statistics.
func (*KalmanSearchAdapter) Reset ¶
func (ka *KalmanSearchAdapter) Reset()
Reset clears all cached data and filters.
func (*KalmanSearchAdapter) Search ¶
func (ka *KalmanSearchAdapter) Search(ctx context.Context, query string, embedding []float32, opts *SearchOptions) (*SearchResponse, error)
Search performs a Kalman-enhanced search.
The enhancement pipeline:
- Execute underlying search
- Smooth similarity scores with Kalman filter
- Apply ranking stability boost
- Track latency for prediction
- Return enhanced results
type KalmanSearchConfig ¶
type KalmanSearchConfig struct {
// EnableScoreSmoothing enables Kalman filtering of similarity scores
EnableScoreSmoothing bool
// EnableRankingStability applies stability boost to consistent results
EnableRankingStability bool
// EnableLatencyPrediction tracks and predicts query latency
EnableLatencyPrediction bool
// SimilarityConfig for score smoothing
SimilarityConfig filter.Config
// LatencyConfig for latency prediction
LatencyConfig filter.Config
// StabilityBoost is the boost factor for consistently-ranked documents
StabilityBoost float64
// StabilityWindow is how many recent queries to consider for stability
StabilityWindow int
// ScoreHistoryLimit is max history per document
ScoreHistoryLimit int
}
KalmanSearchConfig holds configuration for the Kalman-enhanced search adapter.
func DefaultKalmanSearchConfig ¶
func DefaultKalmanSearchConfig() KalmanSearchConfig
DefaultKalmanSearchConfig returns sensible defaults.
type LLMFunc ¶
LLMFunc is a minimal, dependency-free function signature for calling an LLM. It is intentionally generic so the search package does not depend on Heimdall.
Implementations should be safe for concurrent use.
type LLMReranker ¶
type LLMReranker struct {
// contains filtered or unexported fields
}
LLMReranker performs Stage-2 reranking using an LLM (e.g., Heimdall).
The model is prompted to output JSON ranking information using candidate indices (not IDs) to keep responses short and parsing robust.
func NewLLMReranker ¶
func NewLLMReranker(cfg *LLMRerankerConfig, llm LLMFunc) *LLMReranker
NewLLMReranker creates a new LLM reranker.
func (*LLMReranker) Enabled ¶
func (r *LLMReranker) Enabled() bool
func (*LLMReranker) IsAvailable ¶
func (r *LLMReranker) IsAvailable(ctx context.Context) bool
func (*LLMReranker) Name ¶
func (r *LLMReranker) Name() string
func (*LLMReranker) Rerank ¶
func (r *LLMReranker) Rerank(ctx context.Context, query string, candidates []RerankCandidate) ([]RerankResult, error)
Rerank takes a query and candidates, returns reranked results.
It is fail-open: if the LLM errors or returns malformed output, it returns the original ranking (pass-through).
type LLMRerankerConfig ¶
type LLMRerankerConfig struct {
Enabled bool
// Timeout bounds how long a single LLM rerank call can take.
Timeout time.Duration
// MaxCandidates caps how many candidates are included in one prompt.
// This protects against long prompts on large candidate sets.
MaxCandidates int
// MaxDocChars truncates each candidate's text content to this many characters.
MaxDocChars int
// MaxQueryChars truncates the query string included in the prompt.
MaxQueryChars int
// MinScore filters candidates whose LLM score is below this value.
// A value of 0 means "no filtering".
MinScore float64
}
LLMRerankerConfig controls LLM-based reranking behavior.
This reranker is designed to be "fail-open": on errors or malformed output, it returns the original candidate order (pass-through).
func DefaultLLMRerankerConfig ¶
func DefaultLLMRerankerConfig() *LLMRerankerConfig
DefaultLLMRerankerConfig returns conservative defaults intended for small local models.
type LocalReranker ¶
type LocalReranker struct {
// contains filtered or unexported fields
}
LocalReranker performs Stage-2 reranking using a local GGUF reranker model.
func NewLocalReranker ¶
func NewLocalReranker(scorer RerankScorer, cfg *LocalRerankerConfig) *LocalReranker
NewLocalReranker creates a reranker that uses the given scorer (e.g. *localllm.RerankerModel).
func (*LocalReranker) Enabled ¶
func (r *LocalReranker) Enabled() bool
func (*LocalReranker) IsAvailable ¶
func (r *LocalReranker) IsAvailable(ctx context.Context) bool
func (*LocalReranker) Name ¶
func (r *LocalReranker) Name() string
func (*LocalReranker) Rerank ¶
func (r *LocalReranker) Rerank(ctx context.Context, query string, candidates []RerankCandidate) ([]RerankResult, error)
Rerank scores each candidate with the scorer and returns results sorted by score (desc). Fail-open: on error returns original order.
type LocalRerankerConfig ¶
type LocalRerankerConfig struct {
Enabled bool
Timeout time.Duration
// MaxCandidates caps how many candidates are scored per query.
MaxCandidates int
// MaxDocChars truncates each candidate's content before scoring.
MaxDocChars int
// MinScore filters out candidates below this score (0 = no filter).
MinScore float64
}
LocalRerankerConfig configures local GGUF reranker behavior.
func DefaultLocalRerankerConfig ¶
func DefaultLocalRerankerConfig() *LocalRerankerConfig
DefaultLocalRerankerConfig returns sensible defaults for BGE-style reranking.
type NodeIterator ¶
NodeIterator is an interface for streaming node iteration.
type RerankCandidate ¶
type RerankCandidate struct {
ID string
Content string
Score float64 // Original score (from bi-encoder)
}
RerankCandidate represents a document to be reranked.
type RerankResult ¶
type RerankResult struct {
ID string
Content string
OriginalRank int
NewRank int
BiScore float64 // Original bi-encoder score
CrossScore float64 // Cross-encoder score
FinalScore float64 // Combined or cross-encoder score
}
RerankResult is a reranked document with new score.
type RerankScorer ¶
RerankScorer scores a (query, document) pair for relevance. Implementations are typically *localllm.RerankerModel (loaded from GGUF).
type Reranker ¶
type Reranker interface {
// Name identifies the reranker implementation for observability.
// Examples: "cross_encoder", "heimdall_llm".
Name() string
// Enabled indicates whether reranking is configured/enabled.
Enabled() bool
// IsAvailable is an optional health check. Implementations should keep this
// cheap; Search() should not call it on the hot path.
IsAvailable(ctx context.Context) bool
// Rerank reorders candidates for a query.
Rerank(ctx context.Context, query string, candidates []RerankCandidate) ([]RerankResult, error)
}
Reranker is a Stage-2 reranking component.
Implementations MUST be fail-open: if reranking cannot be performed (service unavailable, parse error, timeout), they should return a pass-through ranking rather than failing the overall search request.
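The fail-open contract means every implementation needs a pass-through path. A sketch of that path, with RerankCandidate/RerankResult reproduced locally (fields as documented above) so the example is self-contained:

```go
package main

import "fmt"

// Local copies of the package's documented types.
type RerankCandidate struct {
	ID      string
	Content string
	Score   float64 // Original score (from bi-encoder)
}

type RerankResult struct {
	ID           string
	Content      string
	OriginalRank int
	NewRank      int
	BiScore      float64
	CrossScore   float64
	FinalScore   float64
}

// passThrough builds the fail-open result a Reranker should return when
// reranking cannot be performed: original order and bi-encoder scores
// are preserved.
func passThrough(candidates []RerankCandidate) []RerankResult {
	out := make([]RerankResult, len(candidates))
	for i, c := range candidates {
		out[i] = RerankResult{
			ID:           c.ID,
			Content:      c.Content,
			OriginalRank: i,
			NewRank:      i, // unchanged: rerank failed or is disabled
			BiScore:      c.Score,
			FinalScore:   c.Score,
		}
	}
	return out
}

func main() {
	res := passThrough([]RerankCandidate{{ID: "a", Score: 0.9}, {ID: "b", Score: 0.7}})
	fmt.Println(res[0].ID, res[0].NewRank, res[1].ID, res[1].NewRank)
}
```

A Rerank implementation would return this from its error branches instead of propagating the error, so the overall search request still succeeds.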
type ScoredCandidate ¶
ScoredCandidate represents a candidate with exact score.
type SearchAdapterStats ¶
type SearchAdapterStats struct {
TotalQueries int64
ScoresSmoothed int64
StabilityApplied int64
LatencyPredictions int64
AverageLatencyMs float64
PredictedLatencyMs float64
}
SearchAdapterStats holds adapter statistics.
type SearchCandidate ¶
SearchCandidate is a lightweight vector-search result: just the ID and score.
This is intended for high-throughput call paths that don’t require node enrichment (e.g. Qdrant-compatible gRPC searches that return IDs+scores).
type SearchMetrics ¶
type SearchMetrics struct {
VectorSearchTimeMs int `json:"vector_search_time_ms"`
BM25SearchTimeMs int `json:"bm25_search_time_ms"`
FusionTimeMs int `json:"fusion_time_ms"`
TotalTimeMs int `json:"total_time_ms"`
VectorCandidates int `json:"vector_candidates"`
BM25Candidates int `json:"bm25_candidates"`
FusedCandidates int `json:"fused_candidates"`
}
SearchMetrics contains timing and statistics.
type SearchOptions ¶
type SearchOptions struct {
// Limit is the maximum number of results to return
Limit int
// MinSimilarity is the minimum similarity threshold for vector search.
// nil = use service default, otherwise use the provided value.
MinSimilarity *float64
// Types filters results by node type (labels)
Types []string
// RRF configuration
RRFK float64 // RRF constant (default: 60)
VectorWeight float64 // Weight for vector results (default: 1.0)
BM25Weight float64 // Weight for BM25 results (default: 1.0)
MinRRFScore float64 // Minimum RRF score threshold (default: 0.01)
// MMR (Maximal Marginal Relevance) diversification
// When enabled, results are re-ranked to balance relevance with diversity
MMREnabled bool // Enable MMR diversification (default: false)
MMRLambda float64 // Balance: 1.0 = pure relevance, 0.0 = pure diversity (default: 0.7)
// Cross-encoder reranking (Stage 2)
// When enabled, top candidates are re-scored using a cross-encoder model
// for higher accuracy at the cost of latency
RerankEnabled bool // Enable cross-encoder reranking (default: false)
RerankTopK int // How many candidates to rerank (default: 100)
RerankMinScore float64 // Minimum cross-encoder score to include (default: 0)
}
SearchOptions configures the search behavior.
func DefaultSearchOptions ¶
func DefaultSearchOptions() *SearchOptions
DefaultSearchOptions returns sensible defaults.
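The RRF fields above (RRFK, VectorWeight, BM25Weight) follow the standard weighted Reciprocal Rank Fusion formula: score(doc) = Σ weight_i / (K + rank_i). The service's exact fusion internals are not shown in these docs; this is the textbook formula with the documented defaults (K=60, both weights 1.0):

```go
package main

import "fmt"

// weightedRRF fuses two 1-based rank lists with weighted Reciprocal
// Rank Fusion. Documents absent from a list simply contribute nothing
// from it.
func weightedRRF(vectorRanks, bm25Ranks map[string]int, k, wVec, wBM25 float64) map[string]float64 {
	scores := make(map[string]float64)
	for id, r := range vectorRanks {
		scores[id] += wVec / (k + float64(r))
	}
	for id, r := range bm25Ranks {
		scores[id] += wBM25 / (k + float64(r))
	}
	return scores
}

func main() {
	vec := map[string]int{"a": 1, "b": 2}
	bm25 := map[string]int{"b": 1, "c": 2}
	s := weightedRRF(vec, bm25, 60, 1.0, 1.0)
	// "b" appears in both lists, so it fuses highest.
	fmt.Printf("a=%.4f b=%.4f c=%.4f\n", s["a"], s["b"], s["c"])
}
```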
func GetAdaptiveRRFConfig ¶
func GetAdaptiveRRFConfig(query string) *SearchOptions
GetAdaptiveRRFConfig returns optimized RRF weights based on query characteristics.
This function analyzes the query and adjusts weights to favor the search method most likely to perform well:
Short queries (1-2 words): favor BM25 keyword matching.
    Example: "python" or "graph database"
    Weights: Vector=0.5, BM25=1.5

Long queries (6+ words): favor vector semantic understanding.
    Example: "How do I implement a distributed consensus algorithm?"
    Weights: Vector=1.5, BM25=0.5

Medium queries (3-5 words): balanced approach.
    Example: "machine learning algorithms"
    Weights: Vector=1.0, BM25=1.0
Why this works:
- Short queries lack context → keywords more reliable
- Long queries have semantic meaning → embeddings capture intent better
Example:
// Automatic adaptation
query1 := "database"
opts1 := search.GetAdaptiveRRFConfig(query1)
fmt.Printf("Short query weights: V=%.1f, B=%.1f\n",
opts1.VectorWeight, opts1.BM25Weight)
// Output: V=0.5, B=1.5 (favors keywords)
query2 := "What are the best practices for scaling graph databases?"
opts2 := search.GetAdaptiveRRFConfig(query2)
fmt.Printf("Long query weights: V=%.1f, B=%.1f\n",
opts2.VectorWeight, opts2.BM25Weight)
// Output: V=1.5, B=0.5 (favors semantics)
Returns SearchOptions with adapted weights. Other options (Limit, MinSimilarity) are set to defaults.
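The word-count heuristic described above can be sketched as follows. This is an illustration of the documented behavior, not the package's actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// adaptiveWeights picks vector/BM25 weights from the query's word count,
// mirroring the documented tiers: short favors keywords, long favors
// semantics, medium stays balanced.
func adaptiveWeights(query string) (vector, bm25 float64) {
	n := len(strings.Fields(query))
	switch {
	case n <= 2:
		return 0.5, 1.5 // short: favor BM25 keywords
	case n >= 6:
		return 1.5, 0.5 // long: favor vector semantics
	default:
		return 1.0, 1.0 // medium: balanced
	}
}

func main() {
	v, b := adaptiveWeights("database")
	fmt.Println(v, b)
}
```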
func (*SearchOptions) GetMinSimilarity ¶
func (o *SearchOptions) GetMinSimilarity(fallback float64) float64
GetMinSimilarity returns the MinSimilarity value, or the fallback if nil.
type SearchResponse ¶
type SearchResponse struct {
Status string `json:"status"`
Query string `json:"query"`
Results []SearchResult `json:"results"`
TotalCandidates int `json:"total_candidates"`
Returned int `json:"returned"`
SearchMethod string `json:"search_method"`
FallbackTriggered bool `json:"fallback_triggered"`
Message string `json:"message,omitempty"`
Metrics *SearchMetrics `json:"metrics,omitempty"`
}
SearchResponse is the response from a search operation.
type SearchResult ¶
type SearchResult struct {
ID string `json:"id"`
NodeID storage.NodeID `json:"nodeId"`
Type string `json:"type"`
Labels []string `json:"labels"`
Title string `json:"title,omitempty"`
Description string `json:"description,omitempty"`
ContentPreview string `json:"content_preview,omitempty"`
Properties map[string]any `json:"properties,omitempty"`
// Scoring
Score float64 `json:"score"`
Similarity float64 `json:"similarity,omitempty"`
// RRF metadata (vector_rank/bm25_rank are always emitted so clients see original
// ranks even when Stage-2 reranking is applied; 0 means not in that result set)
RRFScore float64 `json:"rrf_score,omitempty"`
VectorRank int `json:"vector_rank"`
BM25Rank int `json:"bm25_rank"`
}
SearchResult represents a unified search result.
type Service ¶
type Service struct {
// contains filtered or unexported fields
}
Service provides unified hybrid search with automatic index management.
The Service maintains:
- Vector index (HNSW for fast approximate nearest neighbor search)
- Full-text index (BM25 with inverted index)
- Connection to storage engine for node data enrichment
Thread-safe: Multiple goroutines can call Search() concurrently.
Example:
svc := search.NewService(engine)
defer svc.Close()
// Index existing data
if err := svc.BuildIndexes(ctx); err != nil {
log.Fatal(err)
}
// Index new nodes as they're created
node := &storage.Node{...}
if err := svc.IndexNode(node); err != nil {
log.Printf("Failed to index: %v", err)
}
func NewService ¶
func NewServiceWithDimensions ¶
NewServiceWithDimensions creates a search Service with the specified embedding dimensions. Use this when your embedding model produces vectors of a different size than the default 1024.
Example:
// For Apple Intelligence embeddings (512 dimensions)
svc := search.NewServiceWithDimensions(engine, 512)
// For OpenAI text-embedding-3-small (1536 dimensions)
svc := search.NewServiceWithDimensions(engine, 1536)
func NewServiceWithDimensionsAndBM25Engine ¶
func NewServiceWithDimensionsAndBM25Engine(engine storage.Engine, dimensions int, bm25Engine string) *Service
NewServiceWithDimensionsAndBM25Engine creates a search Service with explicit BM25 engine selection. bm25Engine values: "v1", "v2" (invalid values default to "v1").
func (*Service) BM25Engine ¶
BM25Engine returns the configured BM25 engine for this service ("v1" or "v2").
func (*Service) BuildInProgress ¶
BuildInProgress reports whether BuildIndexes is currently running.
func (*Service) BuildIndexes ¶
BuildIndexes builds search indexes from all nodes in the engine. Call this after storage (and WAL recovery) so indexes reflect durable state. When both fulltext and vector index paths are set, tries to load both from disk; if both load with count > 0 (and semver format version matches), the full iteration is skipped. Otherwise iterates over storage and saves both indexes at the end when paths are set.
func (*Service) ClearVectorIndex ¶
func (s *Service) ClearVectorIndex()
ClearVectorIndex removes all embeddings from the vector index. This is used when regenerating all embeddings to reset the index count. Also frees memory from HNSW tombstones which can accumulate over time.
func (*Service) ClusterStats ¶
func (s *Service) ClusterStats() *gpu.ClusterStats
ClusterStats returns k-means clustering statistics.
func (*Service) ClusteringInProgress ¶
ClusteringInProgress returns true if k-means is currently running for this service. Used to debounce timer ticks and avoid starting a second run while one is in progress.
func (*Service) CrossEncoderAvailable ¶
CrossEncoderAvailable returns true if a cross-encoder reranker is configured and available.
func (*Service) CurrentStrategy ¶
CurrentStrategy returns the currently active vector search strategy label. Returns "unknown" until a vector pipeline has been initialized.
func (*Service) EmbeddingCount ¶
EmbeddingCount returns the total number of nodes with embeddings in the vector index.
func (*Service) EnableClustering ¶
EnableClustering enables GPU k-means clustering for accelerated vector search.
Performance Improvements (Real-World Benchmarks):
- 2,000 embeddings: ~14% faster (61ms vs 65ms avg)
- 4,500 embeddings: ~26% faster (35ms vs 47ms avg)
- 10,000+ embeddings: 10-50x faster (scales with dataset size)
The speedup increases with dataset size as the cluster-based search avoids comparing against all vectors.
Parameters:
- gpuManager: GPU manager for acceleration (can be nil for CPU-only)
- numClusters: Number of k-means clusters (0 for auto based on dataset size)
Call this BEFORE BuildIndexes(), then call TriggerClustering() after indexing.
Example:
gpuManager, _ := gpu.NewManager(nil)
svc.EnableClustering(gpuManager, 100) // 100 clusters
svc.BuildIndexes(ctx)
svc.TriggerClustering(ctx) // Run k-means
func (*Service) GetBuildProgress ¶
func (s *Service) GetBuildProgress() BuildProgress
GetBuildProgress returns current search build progress for status APIs.
func (*Service) GetDefaultMinSimilarity ¶
GetDefaultMinSimilarity returns the configured minimum similarity threshold.
func (*Service) GetMinEmbeddingsForClustering ¶
GetMinEmbeddingsForClustering returns the current minimum embeddings threshold.
func (*Service) IndexNode ¶
IndexNode adds a node to all search indexes. All embeddings are stored in ChunkEmbeddings (even single chunk = array of 1).
func (*Service) IsClusteringEnabled ¶
IsClusteringEnabled returns true if GPU clustering is enabled.
func (*Service) IsReady ¶
IsReady reports whether the search indexes are fully built and ready to serve queries.
func (*Service) PersistIndexesToDisk ¶
func (s *Service) PersistIndexesToDisk()
PersistIndexesToDisk writes the current BM25, vector, HNSW, and IVF-HNSW (per-cluster) indexes to disk immediately. Call this on shutdown so the latest in-memory state is saved before exit, same as every other persisted index. Cancels any pending debounced persist so no write runs after shutdown.
func (*Service) RemoveNode ¶
RemoveNode removes a node from all search indexes. Also removes all chunk embeddings (for nodes with multiple chunks).
func (*Service) RerankCandidates ¶
func (s *Service) RerankCandidates(ctx context.Context, query string, candidates []RerankCandidate, opts *SearchOptions) ([]RerankResult, error)
RerankCandidates applies the configured Stage-2 reranker to caller-provided candidates. This is a seam-friendly API for adapters that need rerank semantics without running retrieval.
func (*Service) RerankerAvailable ¶
RerankerAvailable returns true if Stage-2 reranking is configured and available.
func (*Service) Search ¶
func (s *Service) Search(ctx context.Context, query string, embedding []float32, opts *SearchOptions) (*SearchResponse, error)
Search performs hybrid search with automatic fallback.
Search strategy:
- Try RRF hybrid search (vector + BM25) if embedding provided
- Fall back to vector-only if RRF returns no results
- Fall back to BM25-only if vector search fails or no embedding
This ensures you always get results even if one index is empty or fails.
Parameters:
- ctx: Context for cancellation
- query: Text query for BM25 search
- embedding: Vector embedding for similarity search (can be nil)
- opts: Search options (use DefaultSearchOptions() if unsure)
Example:
svc := search.NewService(engine)
// Hybrid search (best results)
query := "graph database memory"
embedding, _ := embedder.Embed(ctx, query)
opts := search.DefaultSearchOptions()
opts.Limit = 10
resp, err := svc.Search(ctx, query, embedding, opts)
if err != nil {
return err
}
fmt.Printf("Found %d results using %s\n",
resp.Returned, resp.SearchMethod)
for i, result := range resp.Results {
fmt.Printf("%d. [RRF: %.4f] %s\n",
i+1, result.RRFScore, result.Title)
fmt.Printf(" Vector rank: #%d, BM25 rank: #%d\n",
result.VectorRank, result.BM25Rank)
}
Returns a SearchResponse with ranked results and metadata about the search method used.
func (*Service) SetCrossEncoder ¶
func (s *Service) SetCrossEncoder(ce *CrossEncoder)
SetCrossEncoder configures the Stage-2 reranker to use the cross-encoder implementation.
Example:
svc := search.NewService(engine)
svc.SetCrossEncoder(search.NewCrossEncoder(&search.CrossEncoderConfig{
Enabled: true,
APIURL: "http://localhost:8081/rerank",
Model: "cross-encoder/ms-marco-MiniLM-L-6-v2",
}))
func (*Service) SetDefaultMinSimilarity ¶
SetDefaultMinSimilarity sets the default minimum cosine similarity threshold for vector search. Apple Intelligence embeddings produce scores in the 0.2-0.8 range, while bge-m3/mxbai produce 0.7-0.99. Default: 0.0 (let RRF ranking handle relevance filtering)
func (*Service) SetFulltextIndexPath ¶
SetFulltextIndexPath sets the path for persisting the BM25 fulltext index. When both fulltext and vector paths are set, BuildIndexes() will try to load both; if both load with count > 0, the full storage iteration is skipped.
func (*Service) SetGPUManager ¶
SetGPUManager enables GPU acceleration for exact brute-force vector search.
This is independent from k-means clustering. When enabled, the vector pipeline may choose GPU brute-force search (exact) for datasets where it outperforms HNSW.
func (*Service) SetHNSWIndexPath ¶
SetHNSWIndexPath sets the path for persisting the HNSW index. When set with persist search indexes, the HNSW index is saved after build/warmup and loaded on startup so the full graph does not need to be rebuilt from vectors.
func (*Service) SetMinEmbeddingsForClustering ¶
SetMinEmbeddingsForClustering sets the minimum number of embeddings required before k-means clustering is triggered. Below this threshold, brute-force search is used as it's faster for small datasets.
This should be called BEFORE TriggerClustering() to take effect.
Parameters:
- threshold: Minimum embeddings (must be > 0, default: 1000)
Tuning Guidelines:
- 1000 (default): Safe for most workloads
- 500-1000: Latency-sensitive applications with moderate data
- 100-500: Testing or small datasets
- 2000+: Very large datasets, delay clustering until more data arrives
Example:
svc := search.NewService(engine)
svc.SetMinEmbeddingsForClustering(500) // Lower threshold for faster clustering
svc.EnableClustering(gpuManager, 100)
svc.BuildIndexes(ctx)
svc.TriggerClustering(ctx) // Will cluster if >= 500 embeddings
func (*Service) SetPersistenceEnabled ¶
SetPersistenceEnabled controls whether index persistence is allowed. When false, all persist paths (debounced writes, build checkpoints, and shutdown flush) are treated as no-ops even if index paths are configured.
func (*Service) SetReranker ¶
SetReranker configures the Stage-2 reranker. Uses a quick read-lock check to avoid write-lock contention when the value hasn't changed. This prevents deadlock when multiple goroutines call getOrCreateSearchService concurrently.
func (*Service) SetVectorIndexPath ¶
SetVectorIndexPath sets the path for persisting the vector index. When both fulltext and vector paths are set, BuildIndexes() will try to load both; if both load with count > 0, the full storage iteration is skipped.
func (*Service) TriggerClustering ¶
TriggerClustering runs k-means clustering on all indexed embeddings. Stops promptly when ctx is cancelled (e.g. process shutdown). Only one run executes at a time per service; if clustering is already in progress, returns nil immediately (debounced).
Trigger Policies:
- After bulk loads: Automatically called after BuildIndexes() completes
- Periodic clustering: Background timer runs clustering at regular intervals
- Manual trigger: Call this after bulk data loading to enable k-means routing
Once clustering completes, the vector search pipeline automatically uses KMeansCandidateGen for candidate generation, providing significant speedup for very large datasets (N > 100K).
Returns nil (not an error) if there are too few embeddings; clustering is skipped silently because brute-force search is faster for small datasets. Returns an error only if clustering is not enabled or fails unexpectedly.
func (*Service) VectorIndexDimensions ¶
VectorIndexDimensions returns the configured dimensions of the vector index.
func (*Service) VectorQueryNodes ¶
func (s *Service) VectorQueryNodes(ctx context.Context, queryEmbedding []float32, spec VectorQuerySpec) ([]VectorQueryHit, error)
VectorQueryNodes executes a Cypher-style vector query.
This method preserves Cypher semantics for per-node embedding selection:
- Prefer NamedEmbeddings[Property] (or "default" when Property is empty)
- If Property is set, next try node.Properties[Property] as a vector array
- Fallback to ChunkEmbeddings[0..N] (best score across chunks)
For performance, cosine-similarity queries are executed against the in-memory vector index (unified pipeline) rather than scanning storage.
func (*Service) VectorSearchCandidates ¶
func (s *Service) VectorSearchCandidates(ctx context.Context, embedding []float32, opts *SearchOptions) ([]SearchCandidate, error)
VectorSearchCandidates performs vector-only search and returns lightweight candidates without enrichment. It is optimized for throughput: it skips BM25, RRF fusion, and storage fetches.
This method uses the unified vector search pipeline (CandidateGen + ExactScore) with automatic strategy selection (brute-force for small N, HNSW for large N).
type VectorFileStore ¶
type VectorFileStore struct {
// contains filtered or unexported fields
}
VectorFileStore is an append-only vector store backed by a file. Only id→offset is kept in RAM; vector data lives on disk. All vectors are stored normalized (one copy per id).
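The append-only, id-to-offset design can be illustrated with an in-memory miniature. This sketches the layout idea only; it is not the package's actual .vec/.meta format:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// miniStore keeps vector bytes in a growing buffer (standing in for the
// .vec file) while only the id→offset map lives in memory, mirroring the
// VectorFileStore design above.
type miniStore struct {
	buf     bytes.Buffer
	idToOff map[string]int64
	dim     int
}

func newMiniStore(dim int) *miniStore {
	return &miniStore{idToOff: map[string]int64{}, dim: dim}
}

// add appends the vector's bytes and records where they start.
func (s *miniStore) add(id string, vec []float32) {
	s.idToOff[id] = int64(s.buf.Len())
	binary.Write(&s.buf, binary.LittleEndian, vec)
}

// get reads the vector back from its recorded offset.
func (s *miniStore) get(id string) ([]float32, bool) {
	off, ok := s.idToOff[id]
	if !ok {
		return nil, false
	}
	vec := make([]float32, s.dim)
	r := bytes.NewReader(s.buf.Bytes()[off:])
	binary.Read(r, binary.LittleEndian, vec)
	return vec, true
}

// remove drops only the map entry; the stale bytes stay until compaction,
// matching the Remove/CompactIfNeeded split documented below.
func (s *miniStore) remove(id string) { delete(s.idToOff, id) }

func main() {
	s := newMiniStore(2)
	s.add("a", []float32{1, 0})
	s.add("b", []float32{0, 1})
	s.remove("a")
	v, ok := s.get("b")
	fmt.Println(ok, v)
}
```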
func NewVectorFileStore ¶
func NewVectorFileStore(vecBasePath string, dimensions int) (*VectorFileStore, error)
NewVectorFileStore creates a new file-backed store and opens the vector file for append. vecBasePath is the path prefix: .vec and .meta will be appended. If the .vec file exists it is opened for append; otherwise it is created with a header.
func (*VectorFileStore) Add ¶
func (v *VectorFileStore) Add(id string, vec []float32) error
Add appends a normalized vector to the store. vec is normalized in place (or into a copy); only one copy is stored.
func (*VectorFileStore) Close ¶
func (v *VectorFileStore) Close() error
Close closes the underlying file. The store must not be used after Close.
func (*VectorFileStore) CompactIfNeeded ¶
func (v *VectorFileStore) CompactIfNeeded() (bool, error)
CompactIfNeeded rewrites .vec with only live records when stale entries accumulate. The rewrite is atomic: write temp file, fsync, rename. Returns true when compaction actually ran.
func (*VectorFileStore) Count ¶
func (v *VectorFileStore) Count() int
Count returns the number of vectors in the store.
func (*VectorFileStore) GetBuildIndexedCount ¶
func (v *VectorFileStore) GetBuildIndexedCount() int64
GetBuildIndexedCount returns the last persisted checkpoint count (0 if none). Used at start of BuildIndexes to skip the first N nodes when resuming.
func (*VectorFileStore) GetDimensions ¶
func (v *VectorFileStore) GetDimensions() int
GetDimensions returns the vector dimension.
func (*VectorFileStore) GetVector ¶
func (v *VectorFileStore) GetVector(id string) ([]float32, bool)
GetVector returns a copy of the stored (normalized) vector for id, or (nil, false) if not found.
func (*VectorFileStore) Has ¶
func (v *VectorFileStore) Has(id string) bool
Has reports whether id is present in the id→offset map.
func (*VectorFileStore) IterateChunked ¶
func (v *VectorFileStore) IterateChunked(chunkSize int, fn func(ids []string, vecs [][]float32) error) error
IterateChunked reads the vector file in chunks and calls fn(ids, vecs) for each chunk. Used to build HNSW without loading all vectors into memory. fn may be called with fewer than chunkSize vectors on the last chunk.
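The chunking behavior can be sketched over an in-memory slice. This shows the callback contract only (at most chunkSize items per call, a short final chunk); the real store streams the chunks from the .vec file:

```go
package main

import "fmt"

// iterateChunked calls fn with at most chunkSize items at a time, so the
// consumer never needs the full set in memory at once.
func iterateChunked(ids []string, vecs [][]float32, chunkSize int,
	fn func(ids []string, vecs [][]float32) error) error {
	for start := 0; start < len(ids); start += chunkSize {
		end := start + chunkSize
		if end > len(ids) {
			end = len(ids) // last chunk may be short
		}
		if err := fn(ids[start:end], vecs[start:end]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ids := []string{"a", "b", "c"}
	vecs := [][]float32{{1}, {2}, {3}}
	iterateChunked(ids, vecs, 2, func(ids []string, _ [][]float32) error {
		fmt.Println(len(ids))
		return nil
	})
}
```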
func (*VectorFileStore) Load ¶
func (v *VectorFileStore) Load() error
Load populates the store from an existing .vec + .meta. The store must be created with NewVectorFileStore(vecBasePath, dimensions); Load then reads the .meta file to populate idToOff. The .vec file is already open from NewVectorFileStore.
func (*VectorFileStore) Remove ¶
func (v *VectorFileStore) Remove(id string) bool
Remove deletes id from the live id→offset map. The old .vec record is left in-place and reclaimed by compaction.
func (*VectorFileStore) Save ¶
func (v *VectorFileStore) Save() error
Save writes the id→offset map to the .meta file so the store can be loaded later. Copies idToOff under a short lock so the (potentially slow) encode doesn't block Add().
func (*VectorFileStore) SetBuildIndexedCount ¶
func (v *VectorFileStore) SetBuildIndexedCount(n int64)
SetBuildIndexedCount sets the last checkpoint count from BuildIndexes (for resume). Call before Save() when persisting after a checkpoint so the next run can skip already-indexed nodes.
func (*VectorFileStore) Sync ¶
func (v *VectorFileStore) Sync() error
Sync flushes the .vec file to disk so progress is visible and durable.
type VectorFileStoreMeta ¶
type VectorFileStoreMeta struct {
Version int `msgpack:"v"`
Dimensions int `msgpack:"dim"`
IDToOffset map[string]int64 `msgpack:"id2off"`
BuildIndexedCount int64 `msgpack:"build_count,omitempty"` // last checkpoint count during BuildIndexes; used for resume
}
VectorFileStoreMeta is persisted to the .meta file (msgpack).
type VectorGetter ¶
VectorGetter is implemented by *VectorIndex and by adapters for VectorLookup (e.g. file-backed store).
type VectorIndex ¶
type VectorIndex struct {
// contains filtered or unexported fields
}
VectorIndex provides exact vector similarity search using cosine similarity.
The index stores normalized vectors for efficient similarity computation and supports concurrent read/write operations. It uses brute-force search which provides exact results but has O(n) time complexity.
Key features:
- Exact cosine similarity computation
- Automatic vector normalization
- Thread-safe concurrent operations
- Context-aware search with cancellation
- Configurable similarity thresholds
Example:
// Create index for 512-dimensional vectors
index := search.NewVectorIndex(512)
// Add vectors
for id, vector := range vectors {
index.Add(id, vector)
}
// Search with minimum similarity threshold
results, _ := index.Search(ctx, queryVector, 5, 0.8)
for _, result := range results {
fmt.Printf("%s: %.3f\n", result.ID, result.Score)
}
Performance:
- Add: O(d) where d = dimensions
- Search: O(n×d) where n = number of vectors
- Memory: O(n×d) for normalized vectors
Thread Safety:
All methods are thread-safe using RWMutex for concurrent access.
func NewVectorIndex ¶
func NewVectorIndex(dimensions int) *VectorIndex
NewVectorIndex creates a new vector similarity index for the given dimensions.
The index is initialized empty and ready to accept vectors of the specified dimensionality. All vectors added to the index must have exactly this number of dimensions.
Parameters:
- dimensions: Number of dimensions for vectors (must be > 0)
Returns:
- VectorIndex ready for use
Example:
// Create index for OpenAI embeddings (1536 dimensions)
index := search.NewVectorIndex(1536)
// Create index for sentence transformers (384 dimensions)
index = search.NewVectorIndex(384)
// Index is ready to accept vectors
index.Add("doc1", embedding1)
func (*VectorIndex) Add ¶
func (v *VectorIndex) Add(id string, vec []float32) error
Add adds or updates a vector in the index with automatic normalization.
The vector is normalized to unit length for efficient cosine similarity computation. If a vector with the same ID already exists, it is replaced.
Parameters:
- id: Unique identifier for the vector
- vec: Vector to add (must match index dimensions)
Returns:
- ErrDimensionMismatch if vector dimensions don't match index
Example:
// Add document embeddings
for docID, embedding := range documentEmbeddings {
err := index.Add(docID, embedding)
if err != nil {
log.Printf("Failed to add %s: %v", docID, err)
}
}
// Update existing vector
newEmbedding := getUpdatedEmbedding("doc1")
index.Add("doc1", newEmbedding) // Replaces previous
Performance:
- Time: O(d) where d is vector dimensions
- Space: O(d) additional storage per vector
- Thread-safe: Uses write lock during update
Normalization:
Vectors are automatically normalized to unit length, so cosine similarity becomes a simple dot product during search.
func (*VectorIndex) Clear ¶
func (v *VectorIndex) Clear()
Clear removes all vectors from the index. This is useful when regenerating all embeddings to reset the index state.
func (*VectorIndex) Count ¶
func (v *VectorIndex) Count() int
Count returns the number of vectors in the index.
func (*VectorIndex) GetDimensions ¶
func (v *VectorIndex) GetDimensions() int
GetDimensions returns the vector dimensions.
func (*VectorIndex) GetVector ¶
func (v *VectorIndex) GetVector(id string) ([]float32, bool)
GetVector returns a copy of the normalized vector for id, or (nil, false) if not found. Used when loading a graph-only HNSW index to populate vectors from the vector index.
func (*VectorIndex) HasVector ¶
func (v *VectorIndex) HasVector(id string) bool
HasVector checks if a vector exists for the given ID.
func (*VectorIndex) Load ¶
func (v *VectorIndex) Load(path string) error
Load replaces the vector index with the one stored at path (msgpack format). If the file does not exist or decode fails, the index is left empty (or unchanged on missing file) and no error is returned, so the caller can proceed with a full rebuild. Returns an error only for unexpected I/O (e.g. permission denied).
func (*VectorIndex) Remove ¶
func (v *VectorIndex) Remove(id string)
Remove removes a vector from the index by its ID.
If the vector doesn't exist, this operation is a no-op.
Parameters:
- id: Identifier of the vector to remove
Example:
// Remove a document that was deleted
index.Remove("doc-123")
// Remove multiple vectors
for _, docID := range deletedDocs {
index.Remove(docID)
}
Performance:
- Time: O(1) hash map deletion
- Thread-safe: Uses write lock during removal
func (*VectorIndex) Save ¶
func (v *VectorIndex) Save(path string) error
Save writes the vector index to path (msgpack format). Dir is created if needed. Copies index data under a short read lock so I/O does not block Search/IndexNode.
func (*VectorIndex) Search ¶
func (v *VectorIndex) Search(ctx context.Context, query []float32, limit int, minSimilarity float64) ([]indexResult, error)
Search finds vectors similar to the query vector using cosine similarity.
The search computes cosine similarity between the query and all indexed vectors, filters by minimum similarity threshold, and returns the top results sorted by similarity score (highest first).
Parameters:
- ctx: Context for cancellation and timeouts
- query: Query vector (must match index dimensions)
- limit: Maximum number of results to return
- minSimilarity: Minimum similarity threshold (0.0 to 1.0)
Returns:
- Slice of indexResult sorted by similarity (descending)
- ErrDimensionMismatch if query dimensions don't match
Example:
// Search for top 10 most similar vectors
results, err := index.Search(ctx, queryVector, 10, 0.0)
if err != nil {
log.Fatal(err)
}
// Search with high similarity threshold
results, err = index.Search(ctx, queryVector, 5, 0.8)
for _, result := range results {
fmt.Printf("ID: %s, Similarity: %.3f\n", result.ID, result.Score)
}
// Search with timeout
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
results, err = index.Search(ctx, queryVector, 10, 0.7)
Performance:
- Time: O(n×d) where n=vectors, d=dimensions
- Space: O(k) where k=number of results
- Cancellation: Respects context cancellation during search
Similarity Range:
- 1.0: Identical vectors (same direction)
- 0.0: Orthogonal vectors (perpendicular)
- -1.0: Opposite vectors (opposite directions)
Thread Safety:
Uses read lock during search, allowing concurrent searches.
type VectorLookup ¶
VectorLookup returns a vector by ID (e.g. from the vector index). Used when loading a graph-only HNSW file.
type VectorQueryHit ¶
VectorQueryHit is a lightweight result row for Cypher-compatible vector queries. ID is the node ID (not a chunk/named vector ID).
type VectorQuerySpec ¶
type VectorQuerySpec struct {
// IndexName is informational only (used for debugging/observability).
IndexName string
// Label optionally filters candidate nodes to those that have this label.
Label string
// Property is the Cypher vector index "property" name.
//
// Resolution semantics:
// 1) Prefer NamedEmbeddings[Property] (or "default" when Property is empty)
// 2) If Property is set, next try node.Properties[Property] as a vector array
// 3) Fallback to ChunkEmbeddings[0..N] (best score across chunks)
Property string
// Similarity is the similarity function name. Supported values:
// "cosine" (default), "dot", "euclidean".
Similarity string
// Limit is the maximum number of results to return (top-K).
Limit int
}
VectorQuerySpec describes how to resolve embeddings for a Cypher-style vector query.
This is intentionally a "core" representation of Cypher vector index metadata: the Cypher layer can remain syntax-compatible, while the search layer owns the implementation details and can choose the best execution strategy.
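The three-step resolution order can be sketched as follows. The `node` type here is a stand-in containing only the fields involved, not the storage.Node definition:

```go
package main

import "fmt"

// node mirrors just the fields used in embedding resolution.
type node struct {
	NamedEmbeddings map[string][]float32
	Properties      map[string]any
	ChunkEmbeddings [][]float32
}

// resolveEmbeddings follows the documented order: named embedding first
// (using "default" when property is empty), then a vector stored under
// Properties[property], then all chunk embeddings.
func resolveEmbeddings(n node, property string) [][]float32 {
	name := property
	if name == "" {
		name = "default"
	}
	if v, ok := n.NamedEmbeddings[name]; ok {
		return [][]float32{v}
	}
	if property != "" {
		if raw, ok := n.Properties[property].([]float32); ok {
			return [][]float32{raw}
		}
	}
	// Caller takes the best score across chunks.
	return n.ChunkEmbeddings
}

func main() {
	n := node{
		Properties:      map[string]any{"embedding": []float32{0.1, 0.2}},
		ChunkEmbeddings: [][]float32{{0.3, 0.4}},
	}
	fmt.Println(len(resolveEmbeddings(n, "embedding")))
}
```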
type VectorSearchPipeline ¶
type VectorSearchPipeline struct {
// contains filtered or unexported fields
}
VectorSearchPipeline implements the unified vector search pipeline.
Pipeline stages:
- CandidateGen: Generate candidates (brute-force or HNSW)
- ExactScore: Re-score candidates exactly (CPU or GPU)
- Filter: Apply minSimilarity threshold
- TopK: Return top-k results
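The four stages above can be sketched end to end. The gen and score functions here are hypothetical stand-ins for CandidateGenerator and ExactScorer:

```go
package main

import (
	"fmt"
	"sort"
)

type candidate struct {
	id    string
	score float64
}

// searchPipeline walks the documented stages: generate candidates,
// re-score them exactly, filter by minSimilarity, and keep the top k.
func searchPipeline(gen func() []string, score func(id string) float64,
	k int, minSim float64) []candidate {
	var out []candidate
	for _, id := range gen() { // Stage 1: CandidateGen
		s := score(id) // Stage 2: ExactScore
		if s >= minSim { // Stage 3: Filter
			out = append(out, candidate{id, s})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].score > out[j].score })
	if len(out) > k { // Stage 4: TopK
		out = out[:k]
	}
	return out
}

func main() {
	gen := func() []string { return []string{"a", "b", "c"} }
	scores := map[string]float64{"a": 0.9, "b": 0.4, "c": 0.8}
	res := searchPipeline(gen, func(id string) float64 { return scores[id] }, 2, 0.5)
	fmt.Println(res) // "b" filtered out by minSimilarity, rest sorted
}
```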
func NewVectorSearchPipeline ¶
func NewVectorSearchPipeline(candidateGen CandidateGenerator, exactScorer ExactScorer) *VectorSearchPipeline
NewVectorSearchPipeline creates a new vector search pipeline.
func (*VectorSearchPipeline) Search ¶
func (p *VectorSearchPipeline) Search(ctx context.Context, query []float32, k int, minSimilarity float64) ([]ScoredCandidate, error)
Search performs vector search using the pipeline.
Parameters:
- ctx: Context for cancellation
- query: Query vector (will be normalized)
- k: Desired number of results
- minSimilarity: Minimum similarity threshold
Returns:
- candidates: Top-k candidates with exact scores
- error: Context cancellation or other errors
Source Files
¶
- ann_profile.go
- ann_quality.go
- bm25_seed_provider.go
- build_settings.go
- convert_gob_to_msgpack.go
- fulltext_index.go
- fulltext_index_v2.go
- fulltext_index_v2_persist.go
- gpu_kmeans_candidate_gen.go
- hnsw_config.go
- hnsw_index.go
- hnsw_metal_stub.go
- hybrid_cluster_routing.go
- ivf_hnsw_candidate_gen.go
- ivfpq_build.go
- ivfpq_candidate_gen.go
- ivfpq_index.go
- ivfpq_persist.go
- ivfpq_types.go
- kalman_adapter.go
- kmeans_candidate_gen.go
- llm_rerank.go
- local_rerank.go
- persist_helpers.go
- rerank.go
- search.go
- vector_file_store.go
- vector_index.go
- vector_pipeline.go
- vector_query_spec.go
- version.go