Documentation
¶
Overview ¶
Package hnsw implements a pure Go HNSW (Hierarchical Navigable Small World) graph for approximate nearest neighbor search.
This is a from-scratch implementation that avoids CGo/FAISS dependency, following the algorithm from "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs" by Malkov & Yashunin (2018).
Package hnsw provides a pure-Go HNSW (Hierarchical Navigable Small World) vector index. It is designed as a standalone, dependency-free module that can be used as a drop-in replacement for CGO-based vector libraries.
Index ¶
- Constants
- func CosineDistance(a, b []float32) float32
- func InnerProductDistance(a, b []float32) float32
- func L2DistanceSquared(a, b []float32) float32
- func PruneEmbeddings(g *Graph, csr *CSRGraph, keepFraction float64)
- type BackendBuilder
- type BackendSearcher
- type Builder
- func (b *Builder) AddVectors(ctx context.Context, indexData []byte, embeddings [][]float32, startID int64) ([]byte, error)
- func (b *Builder) Build(ctx context.Context, embeddings [][]float32) ([]byte, error)
- func (b *Builder) RemoveVectors(ctx context.Context, indexData []byte, ids []int64) ([]byte, error)
- type CSRGraph
- type CSRLevel
- type Candidate
- type Config
- type DistanceFunc
- type DistanceMetric
- type EmbeddingRecomputer
- type Factory
- type Graph
- func NewGraph(m, efConstruction, dimensions int) *Graph
- func NewGraphFromTopology(topo GraphTopology, distFunc DistanceFunc) *Graph
- func NewGraphWithDistance(m, efConstruction, dimensions int, distFunc DistanceFunc) *Graph
- func NewGraphWithOptions(m, efConstruction, dimensions int, distFunc DistanceFunc, useHeuristic bool) *Graph
- func (g *Graph) AllNodeIDs() []int64
- func (g *Graph) Dimensions() int
- func (g *Graph) GetEntryPoint() int64
- func (g *Graph) GetNode(id int64) (*Node, bool)
- func (g *Graph) Insert(id int64, vector []float32)
- func (g *Graph) MaxLevel() int
- func (g *Graph) RemoveVector(id int64)
- func (g *Graph) Search(query []float32, topK, ef int) []Candidate
- func (g *Graph) SearchWithRecompute(query []float32, topK, ef int, recompute func(ids []int64) [][]float32) []Candidate
- func (g *Graph) Size() int64
- type GraphTopology
- type HNSWConfig
- type IndexMeta
- type MmapBackendSearcher
- type MmapGraph
- type MmapSearcher
- func (s *MmapSearcher) Close() error
- func (s *MmapSearcher) Load(ctx context.Context, indexData []byte, meta IndexMeta) error
- func (s *MmapSearcher) LoadFromFile(ctx context.Context, path string) error
- func (s *MmapSearcher) Search(ctx context.Context, query []float32, topK int) ([]int64, []float32, error)
- func (s *MmapSearcher) SearchWithRecompute(ctx context.Context, query []float32, topK int, recompute EmbeddingRecomputer) ([]int64, []float32, error)
- type Node
- type Searcher
- func (s *Searcher) Close() error
- func (s *Searcher) Load(ctx context.Context, indexData []byte, meta IndexMeta) error
- func (s *Searcher) Search(ctx context.Context, query []float32, topK int) ([]int64, []float32, error)
- func (s *Searcher) SearchWithRecompute(ctx context.Context, query []float32, topK int, recompute EmbeddingRecomputer) ([]int64, []float32, error)
- type StorageStats
Constants ¶
const CSRMagic uint32 = 0x474C454E // "GLEN"
CSRMagic identifies a gleann CSR file.
const CSRVersion uint32 = 1
CSRVersion is the current format version.
Variables ¶
This section is empty.
Functions ¶
func CosineDistance ¶
CosineDistance computes cosine distance: 1 - cosine_similarity. Values range from 0 (identical) to 2 (opposite).
func InnerProductDistance ¶
InnerProductDistance computes negative inner product distance. We negate because HNSW expects lower = more similar.
func L2DistanceSquared ¶
L2DistanceSquared computes the squared L2 (Euclidean) distance between two vectors.
func PruneEmbeddings ¶
PruneEmbeddings removes embeddings from the graph, keeping only a specified fraction. Always keeps the entry point and high-degree nodes. This is LEANN's core storage optimization.
keepFraction: 0.0 = keep none (except entry point), 1.0 = keep all.
Types ¶
type BackendBuilder ¶
type BackendBuilder interface {
Build(ctx context.Context, embeddings [][]float32) ([]byte, error)
AddVectors(ctx context.Context, indexData []byte, embeddings [][]float32, startID int64) ([]byte, error)
RemoveVectors(ctx context.Context, indexData []byte, ids []int64) ([]byte, error)
}
BackendBuilder constructs and serializes an index.
type BackendSearcher ¶
type BackendSearcher interface {
Load(ctx context.Context, indexData []byte, meta IndexMeta) error
Search(ctx context.Context, query []float32, topK int) ([]int64, []float32, error)
SearchWithRecompute(ctx context.Context, query []float32, topK int, recompute EmbeddingRecomputer) ([]int64, []float32, error)
Close() error
}
BackendSearcher loads and queries an index.
type Builder ¶
type Builder struct {
// contains filtered or unexported fields
}
Builder builds HNSW indexes.
func (*Builder) AddVectors ¶
type CSRGraph ¶
type CSRGraph struct {
// Header fields
NumNodes int64
Dimensions int
MaxLevel int
EntryPoint int64
M int
// Per-level CSR structure
Levels []CSRLevel
// Optional stored embeddings (for high-degree nodes kept during pruning).
// Map from node ID to embedding vector.
StoredEmbeddings map[int64][]float32
// Node-to-internal ID mapping.
IDMap []int64 // internal index -> external ID
// Per-node level in the original HNSW graph.
// NodeLevels[i] is the level of IDMap[i].
NodeLevels []int
}
CSRGraph is a Compressed Sparse Row representation of an HNSW graph. This is the compact format used for storage — embeddings are pruned and recomputed on-demand during search.
Format mirrors Python LEANN's convert_to_csr:
Header: magic, version, numNodes, dimensions, maxLevel, entryPoint, M Per-level: LevelPtr[level] -> offset into NodeOffsets NodeOffsets[node] -> offset into NeighborsData NeighborsData -> flat list of neighbor IDs
Optional: embeddings for a fraction of nodes (entry point + high-degree nodes).
func ConvertToCSR ¶
ConvertToCSR converts an HNSW graph to CSR format.
func (*CSRGraph) Stats ¶
func (csr *CSRGraph) Stats() StorageStats
Stats computes storage statistics for the CSR graph.
func (*CSRGraph) ToGraph ¶
ToGraph reconstructs an HNSW Graph from a CSR representation. Nodes that had their embeddings pruned will have nil Vector fields.
func (*CSRGraph) ToGraphWithDistance ¶
func (csr *CSRGraph) ToGraphWithDistance(distFunc DistanceFunc) *Graph
ToGraphWithDistance reconstructs an HNSW Graph with a custom distance function.
type CSRLevel ¶
type CSRLevel struct {
// NodeOffsets[i] gives the start index in Neighbors for node i.
// NodeOffsets[i+1] - NodeOffsets[i] gives the number of neighbors.
NodeOffsets []int64
// Neighbors is a flat array of neighbor IDs.
Neighbors []int64
// NumNodes at this level.
NumNodes int
}
CSRLevel represents one level of the HNSW graph in CSR format.
type Config ¶
type Config struct {
IndexDir string `json:"index_dir"`
Backend string `json:"backend"`
HNSWConfig HNSWConfig `json:"hnsw_config,omitempty"`
}
Config is a minimal config struct carrying HNSW settings. In the main gleann application this is the gleann.Config struct; here we define a local equivalent to keep gleann-hnsw dependency-free.
func DefaultConfig ¶
func DefaultConfig() Config
DefaultConfig returns a Config with sensible defaults.
type DistanceFunc ¶
DistanceFunc is a function that computes distance between two vectors. Lower values indicate more similar vectors.
func GetDistanceFunc ¶
func GetDistanceFunc(metric string) DistanceFunc
GetDistanceFunc returns the distance function for the given metric name.
type DistanceMetric ¶
type DistanceMetric string
DistanceMetric specifies the distance function for vector comparison.
const ( DistanceL2 DistanceMetric = "l2" DistanceCosine DistanceMetric = "cosine" DistanceIP DistanceMetric = "ip" )
type EmbeddingRecomputer ¶
EmbeddingRecomputer is a function that recomputes embeddings for given passage IDs.
type Factory ¶
type Factory struct{}
Factory creates HNSW backend builders and searchers.
func (*Factory) NewBuilder ¶
func (f *Factory) NewBuilder(config Config) BackendBuilder
func (*Factory) NewSearcher ¶
func (f *Factory) NewSearcher(config Config) BackendSearcher
type Graph ¶
type Graph struct {
// contains filtered or unexported fields
}
Graph is a multi-level HNSW graph for approximate nearest neighbor search.
func NewGraphFromTopology ¶
func NewGraphFromTopology(topo GraphTopology, distFunc DistanceFunc) *Graph
NewGraphFromTopology constructs a Graph from an externally-built topology. The returned graph is ready for ConvertToCSR / PruneEmbeddings / Search.
func NewGraphWithDistance ¶
func NewGraphWithDistance(m, efConstruction, dimensions int, distFunc DistanceFunc) *Graph
NewGraphWithDistance creates a new HNSW graph with a custom distance function. If distFunc is nil, L2 squared distance is used.
func NewGraphWithOptions ¶
func NewGraphWithOptions(m, efConstruction, dimensions int, distFunc DistanceFunc, useHeuristic bool) *Graph
NewGraphWithOptions creates a new HNSW graph with all options.
func (*Graph) AllNodeIDs ¶
AllNodeIDs returns all node IDs in the graph.
func (*Graph) Dimensions ¶
Dimensions returns the vector dimensions.
func (*Graph) GetEntryPoint ¶
GetEntryPoint returns the entry point node ID.
func (*Graph) RemoveVector ¶
RemoveVector sets a node's vector to nil (for embedding pruning).
func (*Graph) SearchWithRecompute ¶
func (g *Graph) SearchWithRecompute(query []float32, topK, ef int, recompute func(ids []int64) [][]float32) []Candidate
SearchWithRecompute performs search using on-demand embedding recomputation. Instead of using stored vectors, it calls the recompute function to get embeddings only for nodes visited during graph traversal.
type GraphTopology ¶
type GraphTopology struct {
NumNodes int
Dimensions int
M int
MaxLevel int
EntryPoint int64
// Per-node data (len = NumNodes).
NodeLevels []int
Embeddings [][]float32
// Per-node-per-level neighbor lists.
// Neighbors[i] has len = NodeLevels[i]+1; Neighbors[i][l] = IDs at level l.
Neighbors [][][]int64
}
GraphTopology holds the raw graph structure for importing an externally-constructed HNSW graph (e.g. from FAISS).
type HNSWConfig ¶
type HNSWConfig struct {
M int `json:"m"`
EfConstruction int `json:"ef_construction"`
EfSearch int `json:"ef_search"`
UseMmap bool `json:"use_mmap,omitempty"`
MaxLevel int `json:"max_level,omitempty"`
DistanceMetric DistanceMetric `json:"distance_metric,omitempty"`
UseHeuristic bool `json:"use_heuristic"`
PruneEmbeddings bool `json:"prune_embeddings"`
PruneKeepFraction float64 `json:"prune_keep_fraction"`
}
HNSWConfig holds HNSW-specific build and search parameters.
type IndexMeta ¶
type IndexMeta struct {
Name string `json:"name"`
Backend string `json:"backend"`
EmbeddingModel string `json:"embedding_model"`
Dimensions int `json:"dimensions"`
NumPassages int `json:"num_passages"`
CreatedAt time.Time `json:"created_at"`
UpdatedAt time.Time `json:"updated_at"`
Version string `json:"version"`
}
IndexMeta stores metadata about a built index.
func (IndexMeta) MarshalJSON ¶
MarshalJSON implements custom JSON marshaling for IndexMeta.
type MmapBackendSearcher ¶
type MmapBackendSearcher interface {
BackendSearcher
LoadFromFile(ctx context.Context, path string) error
}
MmapBackendSearcher optionally supports zero-copy memory-mapped search.
type MmapGraph ¶
type MmapGraph struct {
// Extracted header info
NumNodes int64
Dimensions int
MaxLevel int
EntryPoint int64
M int
NumStored int64
// contains filtered or unexported fields
}
MmapGraph is a zero-copy, memory-mapped representation of the CSRGraph. It bypasses the Go Garbage Collector by keeping all node and edge data off-heap in a flat `[]byte` slice backed directly by a file on disk.
func OpenMmapGraph ¶
OpenMmapGraph opens a CSR binary file and maps it directly into memory.
func (*MmapGraph) GetExternalID ¶
GetExternalID returns the external node ID for a given internal index.
func (*MmapGraph) GetNeighbors ¶
GetNeighbors returns the internal neighbor indices for a node at a specific level. This is the core operation of graph traversal, completely GC-free.
func (*MmapGraph) GetNodeLevel ¶
GetNodeLevel returns the maximum HNSW level for a given internal index.
func (*MmapGraph) GetStoredEmbedding ¶
GetStoredEmbedding tries to retrieve a pruned stored embedding for an internal index.
type MmapSearcher ¶
type MmapSearcher struct {
// contains filtered or unexported fields
}
MmapSearcher implements BackendSearcher directly on a memory-mapped graph.
func (*MmapSearcher) Close ¶
func (s *MmapSearcher) Close() error
func (*MmapSearcher) LoadFromFile ¶
func (s *MmapSearcher) LoadFromFile(ctx context.Context, path string) error
LoadFromFile allows mmapping the index directly bypassing bytes parameter.
func (*MmapSearcher) SearchWithRecompute ¶
func (s *MmapSearcher) SearchWithRecompute(ctx context.Context, query []float32, topK int, recompute EmbeddingRecomputer) ([]int64, []float32, error)
type Node ¶
type Node struct {
ID int64
Vector []float32
Level int
// Neighbors at each level. neighbors[level] = list of neighbor IDs.
Neighbors [][]int64
}
Node represents a single node in the HNSW graph.
type Searcher ¶
type Searcher struct {
// contains filtered or unexported fields
}
Searcher searches an HNSW index.
type StorageStats ¶
type StorageStats struct {
NumNodes int64
NumLevels int
TotalEdges int64
StoredEmbeddings int
PrunedEmbeddings int64
GraphSizeBytes int64
EmbeddingSizeBytes int64
TotalSizeBytes int64
CompressionRatio float64
OriginalSizeBytes int64
}
StorageStats returns statistics about CSR storage.