Documentation
Overview ¶
Package core provides the core functionality for GoVector, a vector database library. It includes vector indexing, storage, and search capabilities compatible with Qdrant.
Index ¶
- func CalculateDistance(metric Distance, a, b []float32) float32
- func MatchFilter(payload Payload, filter *Filter) bool
- type Collection
- func (c *Collection) Count() int
- func (c *Collection) Delete(points []string, filter *Filter) (int, error)
- func (c *Collection) GetPointsByFilter(filter *Filter) ([]PointStruct, error)
- func (c *Collection) Search(queryVector []float32, filter *Filter, topK int) ([]ScoredPoint, error)
- func (c *Collection) Upsert(points []PointStruct) error
- type CollectionMeta
- type Condition
- type ConditionType
- type Distance
- type Filter
- type FlatIndex
- func (f *FlatIndex) Count() int
- func (f *FlatIndex) Delete(id string) error
- func (f *FlatIndex) DeleteByFilter(filter *Filter) ([]string, error)
- func (f *FlatIndex) GetIDsByFilter(filter *Filter) []string
- func (f *FlatIndex) GetPointsByFilter(filter *Filter) []PointStruct
- func (f *FlatIndex) Search(query []float32, filter *Filter, topK int) ([]ScoredPoint, error)
- func (f *FlatIndex) Upsert(points []PointStruct) error
- type HNSWIndex
- func (h *HNSWIndex) Count() int
- func (h *HNSWIndex) Delete(id string) error
- func (h *HNSWIndex) DeleteByFilter(filter *Filter) ([]string, error)
- func (h *HNSWIndex) GetIDsByFilter(filter *Filter) []string
- func (h *HNSWIndex) GetPointsByFilter(filter *Filter) []PointStruct
- func (h *HNSWIndex) Search(query []float32, filter *Filter, topK int) ([]ScoredPoint, error)
- func (h *HNSWIndex) Upsert(points []PointStruct) error
- type HNSWParams
- type MatchValue
- type Payload
- type PointStruct
- type Quantizer
- type RangeValue
- type SQ8Quantizer
- type ScoredPoint
- type Storage
- func (s *Storage) Close() error
- func (s *Storage) DeletePoints(colName string, ids []string) error
- func (s *Storage) DropCollection(name string) error
- func (s *Storage) EnsureCollection(colName string) error
- func (s *Storage) ListCollectionMetas() ([]CollectionMeta, error)
- func (s *Storage) ListCollections() ([]string, error)
- func (s *Storage) LoadCollection(colName string) (map[string]*PointStruct, error)
- func (s *Storage) LoadCollectionMeta(name string) (*CollectionMeta, error)
- func (s *Storage) SaveCollectionMeta(name string, meta CollectionMeta) error
- func (s *Storage) UpsertPoints(colName string, points []PointStruct) error
- type VectorIndex
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CalculateDistance ¶
CalculateDistance computes the similarity or distance between two vectors based on the specified metric. For Cosine and Dot, higher values indicate greater similarity. For Euclid (Euclidean distance), lower values indicate greater similarity.
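The three metrics can be sketched in a few lines. This is an illustrative stand-in, not the package's implementation: `distanceSketch` is a hypothetical name, the `Distance` type and constants are local copies of the documented ones, and the handling of zero vectors and mismatched lengths is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// Distance is a local copy of the documented string type and its values.
type Distance string

const (
	Cosine Distance = "Cosine"
	Euclid Distance = "Euclid"
	Dot    Distance = "Dot"
)

// distanceSketch is a hypothetical stand-in for CalculateDistance.
// It assumes a and b have the same length.
func distanceSketch(metric Distance, a, b []float32) float32 {
	var dot, normA, normB, sq float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
		sq += (x - y) * (x - y)
	}
	switch metric {
	case Cosine:
		if normA == 0 || normB == 0 {
			return 0 // assumption: a zero vector scores 0
		}
		return float32(dot / (math.Sqrt(normA) * math.Sqrt(normB)))
	case Euclid:
		return float32(math.Sqrt(sq))
	default: // Dot
		return float32(dot)
	}
}

func main() {
	a := []float32{1, 0}
	b := []float32{0, 1}
	fmt.Println("cosine, same direction:", distanceSketch(Cosine, a, a))
	fmt.Println("cosine, orthogonal:", distanceSketch(Cosine, a, b))
	fmt.Println("euclid:", distanceSketch(Euclid, a, b))
	fmt.Println("dot:", distanceSketch(Dot, a, b))
}
```

Because the direction of "better" differs between metrics, callers that sort results need to know which metric produced the score.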
func MatchFilter ¶
MatchFilter evaluates whether a given payload matches the filter criteria. It returns true if:
- The filter is nil (no filtering)
- All Must conditions are satisfied
- No MustNot conditions are satisfied
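The three rules above can be sketched as follows. The types here are simplified local stand-ins (exact match only), and `matchSketch` is a hypothetical name; the real MatchFilter also has to handle the other condition types.

```go
package main

import (
	"fmt"
	"reflect"
)

// Payload is a local copy of the documented map type.
type Payload map[string]any

// condition is a simplified, hypothetical condition: exact match only.
type condition struct {
	Key   string
	Value any
}

// filter mirrors the Must/MustNot shape of the documented Filter.
type filter struct {
	Must    []condition
	MustNot []condition
}

// matches reports whether the payload holds exactly c.Value under c.Key.
func (c condition) matches(p Payload) bool {
	v, ok := p[c.Key]
	return ok && reflect.DeepEqual(v, c.Value)
}

// matchSketch applies the three documented rules in order.
func matchSketch(p Payload, f *filter) bool {
	if f == nil {
		return true // no filtering
	}
	for _, c := range f.Must {
		if !c.matches(p) {
			return false // a Must condition failed
		}
	}
	for _, c := range f.MustNot {
		if c.matches(p) {
			return false // a MustNot condition was satisfied
		}
	}
	return true
}

func main() {
	p := Payload{"city": "Berlin", "lang": "de"}
	f := &filter{
		Must:    []condition{{Key: "city", Value: "Berlin"}},
		MustNot: []condition{{Key: "lang", Value: "en"}},
	}
	fmt.Println(matchSketch(p, f))
	fmt.Println(matchSketch(p, nil))
}
```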
Types ¶
type Collection ¶
type Collection struct {
Name string // Unique name of the collection
VectorLen int // Dimension of vectors in this collection
Metric Distance // Distance metric used for similarity search
// contains filtered or unexported fields
}
Collection represents a single logical group of vectors, similar to a table in SQL databases. It provides thread-safe operations for inserting, searching, and deleting vectors. Each collection has a fixed vector dimension and uses a specific distance metric.
func NewCollection ¶
func NewCollection(name string, vectorLen int, metric Distance, store *Storage, useHNSW bool) (*Collection, error)
NewCollection initializes a new vector collection, optionally loading from storage. If useHNSW is true, it uses the optimized HNSW graph search; otherwise it uses a flat in-memory search. When storage is provided, existing points are automatically loaded into memory. Returns an error if the collection cannot be created or if loaded points have invalid dimensions.
func NewCollectionWithParams ¶ added in v0.1.3
func NewCollectionWithParams(name string, vectorLen int, metric Distance, store *Storage, useHNSW bool, hnswParams HNSWParams) (*Collection, error)
NewCollectionWithParams initializes a new vector collection with custom HNSW parameters. If useHNSW is true, it uses the optimized HNSW graph search with the provided parameters; otherwise it uses a flat in-memory search. When storage is provided, existing points are automatically loaded into memory. Returns an error if the collection cannot be created or if loaded points have invalid dimensions.
func (*Collection) Count ¶
func (c *Collection) Count() int
Count returns the number of points currently stored in the collection.
func (*Collection) Delete ¶
func (c *Collection) Delete(points []string, filter *Filter) (int, error)
Delete removes points either by explicit IDs or by a filter match. If the points slice is provided, those specific points are deleted. If a filter is provided, all points matching the filter are deleted. Returns the number of points deleted and any error encountered. Returns an error if neither points nor filter is provided. Ensures data consistency between storage and memory index.
func (*Collection) GetPointsByFilter ¶ added in v0.1.5
func (c *Collection) GetPointsByFilter(filter *Filter) ([]PointStruct, error)
GetPointsByFilter returns all points that match the given filter, each with its full payload and vector.
func (*Collection) Search ¶
func (c *Collection) Search(queryVector []float32, filter *Filter, topK int) ([]ScoredPoint, error)
Search performs a similarity search using the underlying VectorIndex. It finds the topK nearest neighbors to the query vector, optionally filtered by payload. Returns an error if the query vector dimension doesn't match the collection.
func (*Collection) Upsert ¶
func (c *Collection) Upsert(points []PointStruct) error
Upsert adds or updates points in the collection. Points are first persisted to disk (if storage is configured), then updated in the memory index. Returns an error if any point has an invalid vector length or if persistence fails. Ensures data consistency between storage and memory index.
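The ordering described above (validate, persist, then update memory) can be sketched as below. All names here (`collectionSketch`, `persister`, `memStore`, `failingStore`) are hypothetical, and the locking that the real Collection uses for thread safety is left out.

```go
package main

import (
	"errors"
	"fmt"
)

type point struct {
	ID     string
	Vector []float32
}

// persister is a hypothetical stand-in for the storage layer.
type persister interface {
	UpsertPoints(points []point) error
}

type collectionSketch struct {
	vectorLen int
	store     persister // nil means "no storage configured"
	index     map[string]point
}

// upsert validates every point first, then persists, then updates memory,
// so a failed write never leaves the in-memory index ahead of the disk.
func (c *collectionSketch) upsert(points []point) error {
	for _, p := range points {
		if len(p.Vector) != c.vectorLen {
			return fmt.Errorf("point %q: got %d dimensions, want %d",
				p.ID, len(p.Vector), c.vectorLen)
		}
	}
	if c.store != nil {
		if err := c.store.UpsertPoints(points); err != nil {
			return fmt.Errorf("persist: %w", err)
		}
	}
	for _, p := range points {
		c.index[p.ID] = p
	}
	return nil
}

// memStore counts saved points; failingStore always fails.
type memStore struct{ saved int }

func (m *memStore) UpsertPoints(ps []point) error { m.saved += len(ps); return nil }

type failingStore struct{}

func (failingStore) UpsertPoints([]point) error { return errors.New("disk full") }

func main() {
	c := &collectionSketch{vectorLen: 2, store: failingStore{}, index: map[string]point{}}
	err := c.upsert([]point{{ID: "a", Vector: []float32{1, 2}}})
	fmt.Println("error:", err, "indexed:", len(c.index))
}
```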
type CollectionMeta ¶ added in v0.1.3
type CollectionMeta struct {
Name string `json:"name"`
VectorLen int `json:"vector_size"`
Metric Distance `json:"distance"`
UseHNSW bool `json:"hnsw"`
HNSWParams HNSWParams `json:"parameters"`
}
CollectionMeta stores the configuration and metadata for a vector collection. It is used for persisting collection settings and reloading them on server restart.
type Condition ¶
type Condition struct {
Key string `json:"key"`
Type ConditionType `json:"type"`
Match MatchValue `json:"match,omitempty"`
Range *RangeValue `json:"range,omitempty"`
}
Condition represents a single filter condition on a specific payload key.
type ConditionType ¶ added in v0.1.3
type ConditionType string
ConditionType defines the type of filter condition.
const (
	// MatchTypeExact exact value match
	MatchTypeExact ConditionType = "exact"
	// MatchTypeRange range match (greater than, less than, etc.)
	MatchTypeRange ConditionType = "range"
	// MatchTypePrefix prefix match
	MatchTypePrefix ConditionType = "prefix"
	// MatchTypeContains contains match (for arrays)
	MatchTypeContains ConditionType = "contains"
	// MatchTypeRegex regex match
	MatchTypeRegex ConditionType = "regex"
)
type Distance ¶
type Distance string
Distance represents the metric used for vector comparison and similarity search. Different metrics are suitable for different use cases and vector types.
const (
	// Cosine measures the cosine of the angle between two vectors.
	// It is normalized by vector magnitude, making it suitable for
	// comparing vectors of different scales. Range: [-1, 1] (higher is more similar)
	Cosine Distance = "Cosine"
	// Euclid measures the straight-line distance between two vectors.
	// It is sensitive to vector magnitude. Range: [0, +inf) (lower is more similar)
	Euclid Distance = "Euclid"
	// Dot computes the dot product of two vectors.
	// It measures both magnitude and direction. Range: (-inf, +inf) (higher is more similar)
	Dot Distance = "Dot"
)
type Filter ¶
type Filter struct {
Must []Condition `json:"must,omitempty"`
MustNot []Condition `json:"must_not,omitempty"`
}
Filter defines conditions for querying or deleting points based on their payload. It supports Must (every condition must match) and MustNot (no condition may match) clauses.
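To see the JSON shape implied by the struct tags, the sketch below marshals a filter built from local copies of the documented types. `encodeFilter` is a hypothetical helper, and the output only applies to the real package if its types carry exactly the tags listed on this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented types and struct tags.
type ConditionType string

const (
	MatchTypeExact ConditionType = "exact"
	MatchTypeRange ConditionType = "range"
)

type MatchValue struct {
	Value any `json:"value"`
}

type RangeValue struct {
	GT  any `json:"gt,omitempty"`
	GTE any `json:"gte,omitempty"`
	LT  any `json:"lt,omitempty"`
	LTE any `json:"lte,omitempty"`
}

type Condition struct {
	Key   string        `json:"key"`
	Type  ConditionType `json:"type"`
	Match MatchValue    `json:"match,omitempty"`
	Range *RangeValue   `json:"range,omitempty"`
}

type Filter struct {
	Must    []Condition `json:"must,omitempty"`
	MustNot []Condition `json:"must_not,omitempty"`
}

// encodeFilter is a hypothetical helper that returns the JSON form of f.
func encodeFilter(f Filter) (string, error) {
	b, err := json.Marshal(f)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	f := Filter{
		Must: []Condition{
			{Key: "city", Type: MatchTypeExact, Match: MatchValue{Value: "Berlin"}},
		},
		MustNot: []Condition{
			{Key: "age", Type: MatchTypeRange, Range: &RangeValue{LT: 18}},
		},
	}
	s, err := encodeFilter(f)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(s)
}
```

One detail worth knowing: encoding/json's `omitempty` never omits a struct value, so with these tags a range condition still serializes a `"match":{"value":null}` member.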
type FlatIndex ¶
type FlatIndex struct {
// contains filtered or unexported fields
}
FlatIndex implements VectorIndex using a brute-force search algorithm. It stores all vectors in memory and performs exhaustive distance calculations for each query. This provides exact results but has O(n) search complexity, making it suitable for small to medium datasets.
func NewFlatIndex ¶
NewFlatIndex creates a new flat memory index with the specified distance metric.
func (*FlatIndex) Delete ¶
Delete removes a point from the index by its ID. Returns an error if the point does not exist.
func (*FlatIndex) DeleteByFilter ¶
DeleteByFilter removes all points that match the given filter and returns their IDs.
func (*FlatIndex) GetIDsByFilter ¶ added in v0.1.3
GetIDsByFilter returns all point IDs that match the given filter.
func (*FlatIndex) GetPointsByFilter ¶ added in v0.1.5
func (f *FlatIndex) GetPointsByFilter(filter *Filter) []PointStruct
GetPointsByFilter returns all points that match the given filter, each with its full payload.
func (*FlatIndex) Search ¶
Search performs a brute-force search for the nearest neighbors. It calculates the distance between the query vector and all stored vectors, filters by payload if specified, and returns the topK results. Results are sorted by relevance: descending for Cosine/Dot, ascending for Euclidean.
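A brute-force top-K search can be sketched as: score every vector, sort in the metric's direction, truncate. The names below (`flatSearch`, `scored`, `dotProduct`, `squaredEuclid`) are hypothetical, payload filtering is omitted for brevity, and squared Euclidean distance is used because it ranks points in the same order as Euclidean distance.

```go
package main

import (
	"fmt"
	"sort"
)

type scored struct {
	ID    string
	Score float32
}

func dotProduct(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func squaredEuclid(a, b []float32) float32 {
	var s float32
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// flatSearch scores all vectors against the query and keeps the best topK.
// higherIsBetter selects descending (Cosine/Dot) or ascending (Euclid) order.
func flatSearch(vectors map[string][]float32, query []float32, topK int,
	score func(a, b []float32) float32, higherIsBetter bool) []scored {
	if topK < 0 {
		topK = 0
	}
	results := make([]scored, 0, len(vectors))
	for id, v := range vectors {
		results = append(results, scored{ID: id, Score: score(query, v)})
	}
	sort.Slice(results, func(i, j int) bool {
		if results[i].Score == results[j].Score {
			return results[i].ID < results[j].ID // stable tie-break for the demo
		}
		if higherIsBetter {
			return results[i].Score > results[j].Score
		}
		return results[i].Score < results[j].Score
	})
	if topK < len(results) {
		results = results[:topK]
	}
	return results
}

func main() {
	vecs := map[string][]float32{"a": {1, 0}, "b": {0, 1}, "c": {2, 0}}
	q := []float32{1, 0}
	fmt.Println(flatSearch(vecs, q, 2, dotProduct, true))
	fmt.Println(flatSearch(vecs, q, 2, squaredEuclid, false))
}
```

Every query touches every vector, which is where the O(n) cost mentioned for FlatIndex comes from.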
func (*FlatIndex) Upsert ¶
func (f *FlatIndex) Upsert(points []PointStruct) error
Upsert adds or updates points in the index. If a point with the same ID already exists, it will be overwritten.
type HNSWIndex ¶
type HNSWIndex struct {
// contains filtered or unexported fields
}
HNSWIndex wraps the coder/hnsw graph to provide an approximate nearest neighbor search. It uses Hierarchical Navigable Small World graphs for efficient similarity search with sub-linear complexity, making it suitable for large datasets.
func NewHNSWIndex ¶
NewHNSWIndex creates a new HNSW index engine with the specified distance metric. It configures the underlying HNSW graph with appropriate distance functions for Cosine, Euclidean, or Dot product metrics.
func NewHNSWIndexWithParams ¶ added in v0.1.3
func NewHNSWIndexWithParams(metric Distance, params HNSWParams) *HNSWIndex
NewHNSWIndexWithParams creates a new HNSW index engine with custom parameters. It allows fine-tuning of HNSW parameters for specific use cases.
func (*HNSWIndex) Delete ¶
Delete removes a point from both the HNSW graph and the local points map. Returns an error if the point does not exist in the graph.
func (*HNSWIndex) DeleteByFilter ¶
DeleteByFilter removes all points that match the given filter from both the HNSW graph and the local points map. Returns the IDs of deleted points.
func (*HNSWIndex) GetIDsByFilter ¶ added in v0.1.3
GetIDsByFilter returns all point IDs that match the given filter.
func (*HNSWIndex) GetPointsByFilter ¶ added in v0.1.5
func (h *HNSWIndex) GetPointsByFilter(filter *Filter) []PointStruct
GetPointsByFilter returns all points that match the given filter, each with its full payload.
func (*HNSWIndex) Search ¶
Search performs an approximate nearest neighbor search using the HNSW algorithm. It uses a post-filtering strategy: over-fetches results to account for filtered points, then applies the payload filter and returns the topK matches.
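The post-filtering strategy can be sketched independently of the graph itself. In the code below, `ann` stands in for the approximate search, `fakeANN` and `postFilterSearch` are hypothetical names, and the over-fetch multiplier is an assumed parameter; the factor the package actually uses is not stated here.

```go
package main

import "fmt"

type candidate struct {
	ID    string
	Score float32
	Tag   string
}

// fakeANN returns a search function over an already-ranked slice.
// It stands in for the real approximate nearest neighbor lookup.
func fakeANN(ranked []candidate) func(n int) []candidate {
	return func(n int) []candidate {
		if n > len(ranked) {
			n = len(ranked)
		}
		if n < 0 {
			n = 0
		}
		return ranked[:n]
	}
}

// postFilterSearch fetches topK*overFetch candidates, then keeps the first
// topK that pass the filter. It can return fewer than topK results when the
// filter rejects most of the fetched candidates.
func postFilterSearch(ann func(n int) []candidate, keep func(candidate) bool,
	topK, overFetch int) []candidate {
	if topK <= 0 {
		return nil
	}
	out := make([]candidate, 0, topK)
	for _, c := range ann(topK * overFetch) {
		if len(out) == topK {
			break
		}
		if keep(c) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	ranked := []candidate{
		{"p1", 0.9, "a"}, {"p2", 0.8, "b"}, {"p3", 0.7, "a"},
		{"p4", 0.6, "b"}, {"p5", 0.5, "a"}, {"p6", 0.4, "b"},
	}
	onlyB := func(c candidate) bool { return c.Tag == "b" }
	fmt.Println(postFilterSearch(fakeANN(ranked), onlyB, 2, 2))
	fmt.Println(postFilterSearch(fakeANN(ranked), onlyB, 2, 1))
}
```

The second call shows the trade-off: without over-fetching, a selective filter leaves the result set short.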
func (*HNSWIndex) Upsert ¶
func (h *HNSWIndex) Upsert(points []PointStruct) error
Upsert adds or updates points in the HNSW graph. Points are added to both the HNSW graph for search and a local map for payload lookup.
type HNSWParams ¶ added in v0.1.3
type HNSWParams struct {
// M is the maximum number of connections per node
// Default: 16
M int
// EfConstruction is the size of the dynamic candidate list during construction
// Default: 200
EfConstruction int
// EfSearch is the size of the dynamic candidate list during search
// Default: 64
EfSearch int
// K is the number of nearest neighbors to return
// Default: 10
K int
}
HNSWParams contains configurable parameters for the HNSW index. See https://github.com/coder/hnsw for more details on these parameters.
func DefaultHNSWParams ¶ added in v0.1.3
func DefaultHNSWParams() HNSWParams
DefaultHNSWParams returns the default HNSW parameters.
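A common pattern is to start from the defaults and override one field. The sketch below uses a local copy of the struct with the default values documented in the field comments; `defaultParams` and `withDefaults` are hypothetical names, and treating a zero field as "use the default" is an assumption, not documented behavior.

```go
package main

import "fmt"

// HNSWParams is a local copy of the documented struct.
type HNSWParams struct {
	M              int
	EfConstruction int
	EfSearch       int
	K              int
}

// defaultParams returns the defaults listed in the field comments.
func defaultParams() HNSWParams {
	return HNSWParams{M: 16, EfConstruction: 200, EfSearch: 64, K: 10}
}

// withDefaults fills any zero (or negative) field with its default value.
// This is an assumed convention for the sketch.
func withDefaults(p HNSWParams) HNSWParams {
	d := defaultParams()
	if p.M <= 0 {
		p.M = d.M
	}
	if p.EfConstruction <= 0 {
		p.EfConstruction = d.EfConstruction
	}
	if p.EfSearch <= 0 {
		p.EfSearch = d.EfSearch
	}
	if p.K <= 0 {
		p.K = d.K
	}
	return p
}

func main() {
	// Raise M for a denser graph and keep everything else at its default.
	p := withDefaults(HNSWParams{M: 32})
	fmt.Printf("%+v\n", p)
}
```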
type MatchValue ¶
type MatchValue struct {
Value any `json:"value"`
}
MatchValue holds the value to match against in a filter condition.
type Payload ¶
Payload mimics Qdrant's payload structure, storing metadata as a map of string keys to any values. It is used for filtering points based on their associated metadata.
type PointStruct ¶
type PointStruct struct {
ID string `json:"id"` // UUID or uint64 (using string for now)
Version uint64 `json:"version"` // Incremental or timestamp-based version
Vector []float32 `json:"vector"` // The actual embeddings
Payload Payload `json:"payload,omitempty"` // Metadata for filtering
}
PointStruct represents a single vector data point, compatible with Qdrant's data model. It contains a unique identifier, the vector embedding, and optional metadata.
type Quantizer ¶ added in v0.1.3
type Quantizer interface {
// Quantize compresses a float32 vector to a compressed representation
Quantize(vector []float32) []byte
// Dequantize decompresses a compressed representation back to float32
Dequantize(data []byte) []float32
// GetCompressedSize returns the size in bytes of a quantized vector
GetCompressedSize(dim int) int
}
type RangeValue ¶ added in v0.1.3
type RangeValue struct {
GT any `json:"gt,omitempty"` // Greater than
GTE any `json:"gte,omitempty"` // Greater than or equal
LT any `json:"lt,omitempty"` // Less than
LTE any `json:"lte,omitempty"` // Less than or equal
}
RangeValue holds the bounds for range conditions.
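Because the bounds are typed `any`, a range check has to normalize numbers before comparing. The sketch below shows one way to do that, assuming a nil bound means "not set" and that non-numeric payload values never match; `inRange` and `toFloat` are hypothetical helpers.

```go
package main

import "fmt"

// RangeValue is a local copy of the documented struct (tags omitted).
type RangeValue struct {
	GT, GTE, LT, LTE any
}

// toFloat converts the numeric types this sketch supports to float64.
// A nil interface or any other type reports false.
func toFloat(v any) (float64, bool) {
	switch n := v.(type) {
	case float64:
		return n, true
	case float32:
		return float64(n), true
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	}
	return 0, false
}

// inRange reports whether value satisfies every bound that is set in r.
func inRange(value any, r RangeValue) bool {
	x, ok := toFloat(value)
	if !ok {
		return false // assumption: non-numeric values never match
	}
	if b, ok := toFloat(r.GT); ok && x <= b {
		return false
	}
	if b, ok := toFloat(r.GTE); ok && x < b {
		return false
	}
	if b, ok := toFloat(r.LT); ok && x >= b {
		return false
	}
	if b, ok := toFloat(r.LTE); ok && x > b {
		return false
	}
	return true
}

func main() {
	adult := RangeValue{GTE: 18, LT: 65}
	fmt.Println(inRange(25, adult))
	fmt.Println(inRange(70.0, adult))
	fmt.Println(inRange("n/a", adult))
}
```

Values decoded from JSON arrive as float64, which is why the comparison is done in floating point.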
type SQ8Quantizer ¶ added in v0.1.3
type SQ8Quantizer struct{}
func NewSQ8Quantizer ¶ added in v0.1.3
func NewSQ8Quantizer() *SQ8Quantizer
NewSQ8Quantizer creates a new SQ8 quantizer.
func (*SQ8Quantizer) Dequantize ¶ added in v0.1.3
func (q *SQ8Quantizer) Dequantize(data []byte) []float32
Dequantize decompresses an 8-bit integer vector back to float32.
func (*SQ8Quantizer) GetCompressedSize ¶ added in v0.1.3
func (q *SQ8Quantizer) GetCompressedSize(dim int) int
GetCompressedSize returns the size in bytes of a quantized vector.
func (*SQ8Quantizer) Quantize ¶ added in v0.1.3
func (q *SQ8Quantizer) Quantize(vector []float32) []byte
Quantize compresses a float32 vector to 8-bit integers.
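To illustrate what 8-bit scalar quantization involves, here is a minimal sketch that satisfies the three Quantizer methods. The byte layout (an 8-byte min/max header followed by one byte per dimension) is an assumption made for this example; the layout SQ8Quantizer actually uses is not documented on this page, and `sq8Sketch` and `maxAbsErr` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// sq8Sketch maps each component linearly from [min, max] onto 0..255.
type sq8Sketch struct{}

// GetCompressedSize: 4 bytes min + 4 bytes max + 1 byte per dimension.
func (sq8Sketch) GetCompressedSize(dim int) int { return 8 + dim }

func (q sq8Sketch) Quantize(v []float32) []byte {
	out := make([]byte, q.GetCompressedSize(len(v)))
	if len(v) == 0 {
		return out
	}
	lo, hi := v[0], v[0]
	for _, x := range v {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	binary.LittleEndian.PutUint32(out[0:4], math.Float32bits(lo))
	binary.LittleEndian.PutUint32(out[4:8], math.Float32bits(hi))
	scale := (hi - lo) / 255
	if scale == 0 {
		return out // constant vector: every code stays 0
	}
	for i, x := range v {
		code := math.Round(float64((x - lo) / scale))
		if code > 255 {
			code = 255 // guard against float rounding at the top end
		}
		out[8+i] = byte(code)
	}
	return out
}

func (sq8Sketch) Dequantize(data []byte) []float32 {
	if len(data) < 8 {
		return nil
	}
	lo := math.Float32frombits(binary.LittleEndian.Uint32(data[0:4]))
	hi := math.Float32frombits(binary.LittleEndian.Uint32(data[4:8]))
	scale := (hi - lo) / 255
	out := make([]float32, len(data)-8)
	for i, b := range data[8:] {
		out[i] = lo + float32(b)*scale
	}
	return out
}

// maxAbsErr returns the largest per-component difference between a and b.
func maxAbsErr(a, b []float32) float32 {
	var m float32
	for i := range a {
		d := a[i] - b[i]
		if d < 0 {
			d = -d
		}
		if d > m {
			m = d
		}
	}
	return m
}

func main() {
	q := sq8Sketch{}
	v := []float32{-1, 0, 0.25, 1}
	packed := q.Quantize(v)
	fmt.Println("bytes:", len(packed), "max error:", maxAbsErr(v, q.Dequantize(packed)))
}
```

The round trip is lossy: each component can be off by up to half a quantization step, which is the price of storing one byte instead of four.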
type ScoredPoint ¶
type ScoredPoint struct {
ID string `json:"id"`
Version uint64 `json:"version"`
Score float32 `json:"score"`
Payload Payload `json:"payload,omitempty"`
}
ScoredPoint is returned by search queries, containing the distance score and point data. The Score field represents the computed similarity/distance based on the collection's metric.
type Storage ¶
type Storage struct {
// contains filtered or unexported fields
}
Storage handles local persistence using BoltDB (bbolt). It provides durable storage for vector collections and their points.
func NewStorage ¶
NewStorage initializes a new BoltDB storage engine at the specified path. The database file will be created if it doesn't exist. Returns an error if the database cannot be opened.
func NewStorageWithQuantization ¶ added in v0.1.3
func NewStorageWithQuantization(dbPath string, useQuant bool, quantizer Quantizer) (*Storage, error)
NewStorageWithQuantization initializes a new BoltDB storage engine with optional vector quantization. If useQuant is true, vectors will be compressed using the provided quantizer. Returns an error if the database cannot be opened.
func (*Storage) Close ¶
Close gracefully closes the database connection. It is important to call this when done to ensure all data is flushed to disk. This method is idempotent and can be called multiple times.
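One common way to make a Close method idempotent in Go is to guard it with sync.Once. The sketch below shows that pattern in isolation; `onceCloser` is a hypothetical type, and whether Storage uses this exact mechanism is not stated here.

```go
package main

import (
	"fmt"
	"sync"
)

// onceCloser runs closeFn at most once and remembers its result.
type onceCloser struct {
	once    sync.Once
	err     error
	closeFn func() error
}

// Close is safe to call repeatedly; later calls return the first result.
func (c *onceCloser) Close() error {
	c.once.Do(func() { c.err = c.closeFn() })
	return c.err
}

func main() {
	calls := 0
	c := &onceCloser{closeFn: func() error {
		calls++ // stands in for flushing and closing the database file
		return nil
	}}
	defer c.Close() // the usual call site: right after opening succeeds
	fmt.Println(c.Close(), c.Close(), "underlying close calls:", calls)
}
```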
func (*Storage) DeletePoints ¶
DeletePoints deletes a batch of points from disk by their IDs. If a point ID doesn't exist, it is silently skipped. Returns an error if the collection doesn't exist.
func (*Storage) DropCollection ¶ added in v0.1.4
DropCollection removes a collection and its metadata from the storage.
func (*Storage) EnsureCollection ¶
EnsureCollection creates a bucket for the collection if it doesn't already exist. Each collection is stored as a separate bucket in BoltDB.
func (*Storage) ListCollectionMetas ¶ added in v0.1.3
func (s *Storage) ListCollectionMetas() ([]CollectionMeta, error)
ListCollectionMetas returns all collection metadata stored in the database.
func (*Storage) ListCollections ¶ added in v0.1.3
ListCollections returns all collection names (bucket names) in the storage. It excludes the internal metadata bucket.
func (*Storage) LoadCollection ¶
func (s *Storage) LoadCollection(colName string) (map[string]*PointStruct, error)
LoadCollection loads all points for a collection from disk into memory. Returns a map of point IDs to PointStruct pointers. If quantization is enabled, compressed vectors are decompressed during loading. If the collection doesn't exist, returns an empty map without error.
func (*Storage) LoadCollectionMeta ¶ added in v0.1.3
func (s *Storage) LoadCollectionMeta(name string) (*CollectionMeta, error)
LoadCollectionMeta retrieves collection metadata from the special bucket.
func (*Storage) SaveCollectionMeta ¶ added in v0.1.3
func (s *Storage) SaveCollectionMeta(name string, meta CollectionMeta) error
SaveCollectionMeta persists collection metadata to a special bucket.
func (*Storage) UpsertPoints ¶
func (s *Storage) UpsertPoints(colName string, points []PointStruct) error
UpsertPoints saves or updates a batch of points to disk. Points are serialized as Protocol Buffers and stored with their ID as the key. If quantization is enabled, vectors are compressed before storage. Returns an error if the collection doesn't exist or if serialization fails.
type VectorIndex ¶
type VectorIndex interface {
// Upsert adds or updates points in the index.
// If a point with the same ID already exists, it will be overwritten.
Upsert(points []PointStruct) error
// Search finds the nearest neighbors to the query vector, optionally applying a filter.
// Returns up to topK results sorted by relevance.
Search(query []float32, filter *Filter, topK int) ([]ScoredPoint, error)
// Delete removes a point from the index by its ID.
// Returns an error if the point does not exist.
Delete(id string) error
// Count returns the number of vectors currently stored in the index.
Count() int
// GetIDsByFilter returns all point IDs that match the given filter.
GetIDsByFilter(filter *Filter) []string
// GetPointsByFilter returns all points (with full payload) that match the given filter.
GetPointsByFilter(filter *Filter) []PointStruct
// DeleteByFilter removes all points that match the given filter and returns their IDs.
DeleteByFilter(filter *Filter) ([]string, error)
}
VectorIndex defines the standard interface for vector search engines. This allows transparent switching between Flat (brute-force) and HNSW (approximate) strategies.