hnswgo

v1.0.0
Published: Mar 28, 2026 License: Apache-2.0 Imports: 5 Imported by: 0

README

hnswgo

Go bindings for hnswlib — a fast, production-ready library for approximate nearest neighbor (ANN) search using Hierarchical Navigable Small World graphs.

Features

  • Three distance metrics — L2 (Euclidean), Inner Product, and Cosine similarity
  • Thread-safe — all operations protected by sync.RWMutex, safe for concurrent goroutines
  • Batch operations — parallel insert and search with configurable concurrency
  • Persistence — save and load indices to/from disk
  • Dynamic indices — resize capacity, mark/unmark deletions at runtime
  • Zero dependencies — only Go stdlib + vendored hnswlib C++ headers

Prerequisites

  • Go 1.21+
  • CGO enabled (CGO_ENABLED=1)
  • C/C++ compiler with C++11 support (gcc or clang)

Installation

go get github.com/midhunkrishna/hnswgo

Quick Start

package main

import (
	"fmt"
	"math/rand"

	"github.com/midhunkrishna/hnswgo"
)

func main() {
	// Create an index
	index, err := hnswgo.New(
		128,             // vector dimension
		16,              // M — max connections per node
		200,             // efConstruction — build-time accuracy
		42,              // random seed
		10000,           // max elements
		hnswgo.Cosine,   // distance metric
		false,           // allow replace deleted
	)
	if err != nil {
		panic(err)
	}
	defer index.Free()

	// Insert vectors
	vectors := make([][]float32, 1000)
	labels := make([]uint64, 1000)
	for i := range vectors {
		vectors[i] = make([]float32, 128)
		for j := range vectors[i] {
			vectors[i][j] = rand.Float32()
		}
		labels[i] = uint64(i)
	}

	if err := index.AddPoints(vectors, labels, 4, false); err != nil {
		panic(err)
	}

	// Search
	query := [][]float32{vectors[0]} // find nearest neighbors of first vector
	results, err := index.SearchKNN(query, 5, 1)
	if err != nil {
		panic(err)
	}

	for _, r := range results[0] {
		fmt.Printf("label: %d, distance: %f\n", r.Label, r.Distance)
	}

	// Save to disk
	if err := index.Save("my_index.bin"); err != nil {
		panic(err)
	}
}

See example/example.go for a more complete example.

API

Creating and Loading

// Create a new index
index, err := hnswgo.New(dim, M, efConstruction, randSeed, maxElements, spaceType, allowReplaceDeleted)

// Load from disk
index, err := hnswgo.Load(path, spaceType, dim, maxElements, allowReplaceDeleted)

// Release resources (idempotent, safe to call multiple times)
index.Free()

Inserting and Deleting

// Batch insert with concurrent workers
err := index.AddPoints(vectors, labels, concurrency, replaceDeleted)

// Soft-delete / restore
err := index.MarkDeleted(label)
err := index.UnmarkDeleted(label)

Searching

// Batch KNN search — returns one result slice per query vector
results, err := index.SearchKNN(queryVectors, topK, concurrency)

// Retrieve a stored vector by label
vector, err := index.GetDataByLabel(label)

Note: In Cosine space, GetDataByLabel returns the normalized vector, not the original input.
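
To make the note above concrete, here is a minimal sketch of L2 normalization in plain Go. It assumes "normalized" means scaled to unit Euclidean length, which is the usual convention for cosine indices; normalizeL2 is an illustrative helper, not part of hnswgo.

```go
package main

import (
	"fmt"
	"math"
)

// normalizeL2 returns a copy of v scaled to unit Euclidean length.
// A zero vector has no direction, so it is returned as all zeros.
// Hypothetical helper for illustration; not part of the hnswgo API.
func normalizeL2(v []float32) []float32 {
	var sumSq float64
	for _, x := range v {
		sumSq += float64(x) * float64(x)
	}
	out := make([]float32, len(v))
	if sumSq == 0 {
		return out
	}
	inv := 1 / math.Sqrt(sumSq)
	for i, x := range v {
		out[i] = float32(float64(x) * inv)
	}
	return out
}

func main() {
	original := []float32{3, 4}
	// Under the assumption above, a Cosine index would hand back something
	// close to this vector from GetDataByLabel rather than {3, 4}.
	fmt.Println(normalizeL2(original))
}
```

If the original vectors are needed later, keeping a separate copy keyed by label is one way to avoid depending on what the index stores.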

Configuration

// Set query-time accuracy/speed tradeoff (not persisted — set after Load)
err := index.SetEf(ef)

// Resize index capacity
err := index.ResizeIndex(newSize)

// Save index to disk
err := index.Save(path)

Index Info

count, err := index.GetCurrentCount()
capacity, err := index.GetMaxElements()
replaceable, err := index.GetAllowReplaceDeleted()
fileSize, err := index.IndexFileSize()

After Free() has been called, every method that returns an error returns hnswgo.ErrIndexClosed.

Parameters

Index Construction

| Parameter | Type | Description |
|---|---|---|
| dim | int | Vector dimensionality |
| M | int | Max connections per node. Higher = better recall, more memory. See ALGO_PARAMS.md |
| efConstruction | int | Build-time search width. Higher = better index quality, slower builds. See ALGO_PARAMS.md |
| randSeed | int | Random seed for reproducibility |
| maxElements | uint64 | Maximum index capacity (can be resized later) |
| allowReplaceDeleted | bool | Allow new inserts to reuse slots of deleted elements |

Query-Time

| Parameter | Type | Description |
|---|---|---|
| ef | int | Search accuracy/speed tradeoff. Must be >= topK. Higher = better recall, slower queries |
| topK | int | Number of nearest neighbors to return |
| concurrency | int | Number of parallel workers for batch operations |

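
The constraint that ef must be at least topK can be enforced with a small guard before calling SetEf. The sketch below is plain Go; clampEf is a hypothetical helper name, not something the package provides.

```go
package main

import "fmt"

// clampEf returns an ef value that is never below topK, as the
// query-time parameter table requires.
// Hypothetical helper for illustration; not part of the hnswgo API.
func clampEf(ef, topK int) int {
	if ef < topK {
		return topK
	}
	return ef
}

func main() {
	fmt.Println(clampEf(50, 10)) // 50: already large enough
	fmt.Println(clampEf(5, 10))  // 10: raised to topK
}
```

A caller would pass the result to index.SetEf, and would do so again after Load, since ef is not persisted.
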
Distance Metrics

| SpaceType | Metric | Typical Use |
|---|---|---|
| hnswgo.L2 | Euclidean distance | General-purpose, geometric data |
| hnswgo.IP | Inner product | When vectors are pre-normalized |
| hnswgo.Cosine | Cosine similarity | Text/image embeddings, semantic search |
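
For reference, the quantities behind the three metrics can be computed in plain Go as sketched below. This only illustrates the underlying math. hnswlib's own README describes its reported distances as squared L2 and as 1 minus the inner product or cosine similarity; whether this binding returns exactly those values in SearchResult.Distance is an assumption to verify against your own data. The function names are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// squaredL2 returns the squared Euclidean distance between a and b.
// Both slices are assumed to have the same length.
func squaredL2(a, b []float32) float64 {
	var sum float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return sum
}

// dot returns the inner product of a and b (same length assumed).
func dot(a, b []float32) float64 {
	var sum float64
	for i := range a {
		sum += float64(a[i]) * float64(b[i])
	}
	return sum
}

// cosineSimilarity returns dot(a, b) divided by the product of the norms.
// It returns 0 when either vector has zero length.
func cosineSimilarity(a, b []float32) float64 {
	na := math.Sqrt(dot(a, a))
	nb := math.Sqrt(dot(b, b))
	if na == 0 || nb == 0 {
		return 0
	}
	return dot(a, b) / (na * nb)
}

func main() {
	a := []float32{1, 0}
	b := []float32{0, 1}
	fmt.Println(squaredL2(a, b), dot(a, b), cosineSimilarity(a, b))
}
```

For unit-length vectors the inner product and the cosine similarity coincide, which is why the table suggests IP for pre-normalized data.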

Testing

go test -v -race ./...

The test suite covers core operations, error handling, edge cases, concurrency safety (with -race), and round-trip persistence across all space types.

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-change)
  3. Commit your changes
  4. Push to your branch (git push origin feature/my-change)
  5. Open a Pull Request

Please ensure tests pass with the race detector enabled (go test -race ./...) before submitting.

License

Apache 2.0 — see LICENSE for details.

Acknowledgments

Documentation

Constants

This section is empty.

Variables

var ErrIndexClosed = errors.New("index is closed")

ErrIndexClosed is returned when an operation is attempted on a freed index.
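
Because ErrIndexClosed is a sentinel error, callers can detect it with errors.Is. The sketch below avoids CGO by using stand-ins: errIndexClosed mirrors hnswgo.ErrIndexClosed, the counter interface matches the documented GetCurrentCount signature, and closedIndex is a hypothetical fake. None of these names come from the package.

```go
package main

import (
	"errors"
	"fmt"
)

// errIndexClosed is a local stand-in for hnswgo.ErrIndexClosed so that this
// sketch compiles without CGO.
var errIndexClosed = errors.New("index is closed")

// counter is the subset of the index API used here. Going by the documented
// signature, *hnswgo.HnswIndex should satisfy it.
type counter interface {
	GetCurrentCount() (uint64, error)
}

// closedIndex is a hypothetical fake that behaves like a freed index.
type closedIndex struct{}

func (closedIndex) GetCurrentCount() (uint64, error) { return 0, errIndexClosed }

// currentCount separates "the index was freed" from other failures.
func currentCount(idx counter) (n uint64, closed bool, err error) {
	n, err = idx.GetCurrentCount()
	if errors.Is(err, errIndexClosed) {
		return 0, true, nil
	}
	return n, false, err
}

func main() {
	n, closed, err := currentCount(closedIndex{})
	fmt.Println(n, closed, err)
}
```

In real code the comparison would be errors.Is(err, hnswgo.ErrIndexClosed); errors.Is also matches when the error has been wrapped with %w.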

Functions

This section is empty.

Types

type HnswIndex

type HnswIndex struct {
	// contains filtered or unexported fields
}

HnswIndex wraps the C index type and provides a set of useful index manipulation methods. All methods are safe for concurrent use from multiple goroutines.

func Load

func Load(location string, spaceType SpaceType, dim int, maxElements uint64, allowReplaceDeleted bool) (*HnswIndex, error)

Load loads data from an existing HNSW index file.

func New

func New(dim, M, efConstruction, randSeed int, maxElements uint64, spaceType SpaceType, allowReplaceDeleted bool) (*HnswIndex, error)

New creates a new HnswIndex with the specified dimension and other parameters; see the hnswlib documentation for details. When allowReplaceDeleted is set, deleted elements can be replaced by newly added ones.

func (*HnswIndex) AddPoints

func (idx *HnswIndex) AddPoints(vectors [][]float32, labels []uint64, concurrency int, replaceDeleted bool) error

AddPoints adds the given points to the index. If a point is already in the index, it is updated. If replacement of deleted elements is enabled, a new point replaces a previously deleted point, if there is one.
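
Since AddPoints takes parallel slices of vectors and labels, it can help to validate a batch before handing it over. The sketch below is plain Go and assumes nothing about what checks the library performs itself; validateBatch is a hypothetical helper.

```go
package main

import "fmt"

// validateBatch checks that every vector has a label and that every vector
// has the dimension the index was created with.
// Hypothetical helper for illustration; not part of the hnswgo API.
func validateBatch(vectors [][]float32, labels []uint64, dim int) error {
	if len(vectors) != len(labels) {
		return fmt.Errorf("got %d vectors but %d labels", len(vectors), len(labels))
	}
	for i, v := range vectors {
		if len(v) != dim {
			return fmt.Errorf("vector %d has dimension %d, want %d", i, len(v), dim)
		}
	}
	return nil
}

func main() {
	vectors := [][]float32{{1, 2, 3}, {4, 5}}
	labels := []uint64{10, 11}
	// The second vector is too short, so this reports an error.
	fmt.Println(validateBatch(vectors, labels, 3))
}
```

A caller would run this check first and call index.AddPoints only when it returns nil.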

func (*HnswIndex) Free

func (idx *HnswIndex) Free()

Free releases the resources bound to the index. Call it once the index is no longer needed. It is safe to call multiple times.

func (*HnswIndex) GetAllowReplaceDeleted

func (idx *HnswIndex) GetAllowReplaceDeleted() (bool, error)

GetAllowReplaceDeleted returns the setting of allowReplaceDeleted.

func (*HnswIndex) GetCurrentCount

func (idx *HnswIndex) GetCurrentCount() (uint64, error)

GetCurrentCount returns the current number of elements stored in the index.

func (*HnswIndex) GetDataByLabel

func (idx *HnswIndex) GetDataByLabel(label uint64) ([]float32, error)

GetDataByLabel retrieves the stored vector for the given label. For Cosine space, the returned vector is the normalized version that was stored, not the original input vector.

func (*HnswIndex) GetMaxElements

func (idx *HnswIndex) GetMaxElements() (uint64, error)

GetMaxElements returns the current capacity of the index.

func (*HnswIndex) IndexFileSize

func (idx *HnswIndex) IndexFileSize() (uint64, error)

IndexFileSize returns the index file size in bytes.

func (*HnswIndex) MarkDeleted

func (idx *HnswIndex) MarkDeleted(label uint64) error

MarkDeleted marks the element as deleted, so it will be omitted from search results.

func (*HnswIndex) ResizeIndex

func (idx *HnswIndex) ResizeIndex(newSize uint64) error

ResizeIndex changes the maximum capacity of the index.
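
One way to decide when to call ResizeIndex is to compare the current count plus the incoming batch size against the capacity, and grow by doubling when the batch would not fit. The sketch below is plain Go; the doubling policy and the name grownCapacity are illustrative choices, not something the package prescribes.

```go
package main

import "fmt"

// grownCapacity reports the capacity needed to fit incoming more elements.
// It returns the existing capacity and false when no resize is needed;
// otherwise it doubles the capacity, or jumps straight to the required size
// if doubling is not enough. Overflow of uint64 is not handled here.
// Hypothetical helper for illustration; not part of the hnswgo API.
func grownCapacity(capacity, count, incoming uint64) (uint64, bool) {
	need := count + incoming
	if need <= capacity {
		return capacity, false
	}
	newCap := capacity * 2
	if newCap < need {
		newCap = need
	}
	return newCap, true
}

func main() {
	fmt.Println(grownCapacity(100, 90, 5))   // 100 false
	fmt.Println(grownCapacity(100, 90, 20))  // 200 true
	fmt.Println(grownCapacity(100, 90, 500)) // 590 true
}
```

In practice, count and capacity would come from GetCurrentCount and GetMaxElements, and the new value would be passed to ResizeIndex before AddPoints.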

func (*HnswIndex) Save

func (idx *HnswIndex) Save(location string) error

Save writes index data to disk.

func (*HnswIndex) SearchKNN

func (idx *HnswIndex) SearchKNN(vectors [][]float32, topK int, concurrency int) ([][]*SearchResult, error)

SearchKNN runs a batch query against the index, one search per vector in vectors. concurrency sets the number of threads used for searching. If no error occurs, topK SearchResults are returned for each queried vector.
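
The return value has one inner slice per query vector, so post-processing usually means a nested loop. The sketch below filters results by a distance threshold. It uses a local searchResult type that mirrors the documented SearchResult fields so that it compiles without CGO; labelsWithin is a hypothetical helper.

```go
package main

import "fmt"

// searchResult mirrors the fields of hnswgo.SearchResult for this sketch.
type searchResult struct {
	Label    uint64
	Distance float32
}

// labelsWithin returns, for each query, the labels of the results whose
// distance is at most maxDist. Nil entries are skipped defensively.
// Hypothetical helper for illustration; not part of the hnswgo API.
func labelsWithin(results [][]*searchResult, maxDist float32) [][]uint64 {
	out := make([][]uint64, len(results))
	for q, perQuery := range results {
		out[q] = make([]uint64, 0, len(perQuery))
		for _, r := range perQuery {
			if r != nil && r.Distance <= maxDist {
				out[q] = append(out[q], r.Label)
			}
		}
	}
	return out
}

func main() {
	results := [][]*searchResult{
		{{Label: 1, Distance: 0.1}, {Label: 2, Distance: 0.9}},
		{{Label: 3, Distance: 0.2}},
	}
	fmt.Println(labelsWithin(results, 0.5))
}
```

With the real package, the same loop would run over the [][]*hnswgo.SearchResult value returned by SearchKNN.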

func (*HnswIndex) SetEf

func (idx *HnswIndex) SetEf(ef int) error

SetEf sets the query-time accuracy/speed trade-off, defined by the ef parameter (see ALGO_PARAMS.md in hnswlib). The parameter is not currently saved with the index, so set it again after loading.

func (*HnswIndex) UnmarkDeleted

func (idx *HnswIndex) UnmarkDeleted(label uint64) error

UnmarkDeleted removes the deleted mark from the element, so it appears in search results again.

type SearchResult

type SearchResult struct {
	Label    uint64
	Distance float32
}

SearchResult is a single result returned by SearchKNN. Distance is a Euclidean, inner-product, or cosine distance, depending on the chosen space type.

type SpaceType

type SpaceType int
const (
	L2 SpaceType = iota
	IP
	Cosine
)
