Documentation ¶
Overview ¶
Package pgvector provides a PostgreSQL/pgvector implementation of OmniRetrieve's vector.Index interface for vector similarity search.
Features ¶
- Full vector.Index, vector.BatchIndex, and vector.IndexManager support
- HNSW and IVFFlat index types
- Cosine, Euclidean, and Inner Product distance metrics
- Efficient batch upsert using PostgreSQL's ON CONFLICT
- Metadata filtering via JSONB
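As a rough illustration of the batch upsert feature, the sketch below builds a multi-row INSERT ... ON CONFLICT statement. It is not the package's actual SQL: the column names (id, embedding, metadata) and the helper buildUpsertSQL are hypothetical, and the table name is assumed to come from trusted configuration, since it is interpolated into the statement.

```go
package main

import (
	"fmt"
	"strings"
)

// buildUpsertSQL returns one INSERT statement that upserts n rows.
// Column names are hypothetical; each row binds three placeholders.
func buildUpsertSQL(table string, n int) string {
	rows := make([]string, 0, n)
	for i := 0; i < n; i++ {
		base := i * 3
		rows = append(rows, fmt.Sprintf("($%d, $%d::vector, $%d::jsonb)", base+1, base+2, base+3))
	}
	return fmt.Sprintf(
		"INSERT INTO %s (id, embedding, metadata) VALUES %s "+
			"ON CONFLICT (id) DO UPDATE SET embedding = EXCLUDED.embedding, metadata = EXCLUDED.metadata",
		table, strings.Join(rows, ", "))
}

func main() {
	fmt.Println(buildUpsertSQL("embeddings", 2))
}
```

Sending all rows in one statement avoids a round trip per node, and ON CONFLICT makes the operation idempotent for repeated IDs.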
Usage ¶
import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
	"github.com/plexusone/omniretrieve/providers/pgvector"
	// Also import OmniRetrieve's vector package, which provides NewRetriever.
)

// Open a database handle. sql.Open validates its arguments but may not
// connect until the handle is first used.
db, err := sql.Open("postgres", "postgres://user:pass@localhost/mydb?sslmode=disable")
if err != nil {
	log.Fatal(err)
}

// Create index with default configuration (table "embeddings", 1536 dimensions)
idx, err := pgvector.New(db, pgvector.DefaultConfig("embeddings", 1536))
if err != nil {
	log.Fatal(err)
}

// Use with OmniRetrieve; myEmbedder is your own embedder implementation
retriever := vector.NewRetriever(vector.RetrieverConfig{
	Index:    idx,
	Embedder: myEmbedder,
})
Configuration ¶
The Config struct allows customization of:
- Table name and vector dimensions
- Distance metric (cosine, euclidean, inner_product)
- Index type (HNSW, IVFFlat, or none)
- HNSW parameters (M, ef_construction)
- IVFFlat parameters (lists)
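To make the options above concrete, here is a minimal sketch of the kind of CREATE INDEX statement each combination corresponds to in pgvector. The operator classes and WITH options are pgvector's own; the function createIndexSQL, its parameters, and the column name embedding are assumptions for illustration and are not taken from this package.

```go
package main

import "fmt"

// opClass maps a metric name to the pgvector operator class.
func opClass(metric string) (string, error) {
	switch metric {
	case "cosine":
		return "vector_cosine_ops", nil
	case "euclidean":
		return "vector_l2_ops", nil
	case "inner_product":
		return "vector_ip_ops", nil
	}
	return "", fmt.Errorf("unknown metric %q", metric)
}

// createIndexSQL builds the DDL for the chosen index type.
// It returns an empty string for "none" (no index, exact search).
func createIndexSQL(table, metric, indexType string, m, efConstruction, lists int) (string, error) {
	ops, err := opClass(metric)
	if err != nil {
		return "", err
	}
	switch indexType {
	case "hnsw":
		return fmt.Sprintf("CREATE INDEX ON %s USING hnsw (embedding %s) WITH (m = %d, ef_construction = %d)",
			table, ops, m, efConstruction), nil
	case "ivfflat":
		return fmt.Sprintf("CREATE INDEX ON %s USING ivfflat (embedding %s) WITH (lists = %d)",
			table, ops, lists), nil
	case "none":
		return "", nil
	}
	return "", fmt.Errorf("unknown index type %q", indexType)
}

func main() {
	sql, err := createIndexSQL("embeddings", "cosine", "hnsw", 16, 64, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println(sql)
}
```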
Requirements ¶
- PostgreSQL 11+ with the pgvector extension installed (HNSW indexes require pgvector 0.5.0 or later)
- Permission to run CREATE EXTENSION, or the extension already installed in the target database
Index Types ¶
HNSW (recommended):
- Best for high recall and low latency
- Higher memory usage
- Good for datasets up to ~10M vectors
IVFFlat:
- Good balance of speed and accuracy
- Lower memory usage
- Requires training (happens automatically when the index is built, so build it after the table has data)
- Good for larger datasets
Flat (no index):
- Exact search (100% recall)
- Slow for large datasets
- Use only for small datasets or testing
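For intuition about the flat option, exact search simply compares the query against every stored vector and keeps the k closest. The sketch below does this in memory with Euclidean distance; it only illustrates the technique and says nothing about how PostgreSQL executes the scan. The names l2Distance and exactSearch are made up for this example, and all vectors are assumed to have the same length.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// l2Distance returns the Euclidean distance between a and b.
// It assumes len(a) == len(b).
func l2Distance(a, b []float32) float64 {
	var sum float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}

// exactSearch returns the positions of the k vectors nearest to query,
// closest first. Every vector is scanned, so recall is 100% and cost is O(n).
func exactSearch(query []float32, vectors [][]float32, k int) []int {
	dists := make([]float64, len(vectors))
	order := make([]int, len(vectors))
	for i, v := range vectors {
		dists[i] = l2Distance(query, v)
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool {
		return dists[order[a]] < dists[order[b]]
	})
	if k < 0 {
		k = 0
	}
	if k > len(order) {
		k = len(order)
	}
	return order[:k]
}

func main() {
	vectors := [][]float32{{0, 1}, {1, 0}, {1, 1}}
	fmt.Println(exactSearch([]float32{1, 0}, vectors, 2))
}
```

HNSW and IVFFlat trade some of this recall for speed by examining only a subset of the stored vectors.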
Index ¶
- type Config
- type DistanceMetric
- type HNSWConfig
- type IVFFlatConfig
- type Index
- func (idx *Index) Delete(ctx context.Context, id string) error
- func (idx *Index) DeleteBatch(ctx context.Context, ids []string) error
- func (idx *Index) Insert(ctx context.Context, node vector.Node) error
- func (idx *Index) InsertBatch(ctx context.Context, nodes []vector.Node) error
- func (idx *Index) Name() string
- func (idx *Index) Search(ctx context.Context, embedding []float32, k int, filters map[string]string) ([]vector.SearchResult, error)
- func (idx *Index) Upsert(ctx context.Context, node vector.Node) error
- func (idx *Index) UpsertBatch(ctx context.Context, nodes []vector.Node) error
- type IndexType
- type Manager
- func (m *Manager) CreateIndex(ctx context.Context, cfg vector.IndexConfig) error
- func (m *Manager) DropIndex(ctx context.Context, name string) error
- func (m *Manager) IndexExists(ctx context.Context, name string) (bool, error)
- func (m *Manager) IndexStats(ctx context.Context, name string) (*vector.IndexStats, error)
- func (m *Manager) ListIndexes(ctx context.Context) ([]string, error)
Types ¶
type Config ¶
type Config struct {
// TableName is the name of the table to use for vectors.
TableName string
// Dimensions is the vector dimension size.
Dimensions int
// DistanceMetric is the distance function (cosine, euclidean, inner_product).
DistanceMetric DistanceMetric
// CreateTableIfNotExists creates the table on first use if true.
CreateTableIfNotExists bool
// IndexType specifies the index algorithm (hnsw, ivfflat, or none).
IndexType IndexType
// HNSWConfig contains HNSW-specific parameters.
HNSWConfig *HNSWConfig
// IVFFlatConfig contains IVFFlat-specific parameters.
IVFFlatConfig *IVFFlatConfig
}
Config configures the pgvector index.
func DefaultConfig ¶
DefaultConfig returns a default configuration for the given table name and vector dimensions, for example DefaultConfig("embeddings", 1536).
type DistanceMetric ¶
type DistanceMetric string
DistanceMetric defines the distance function for similarity.
const (
	// DistanceCosine uses cosine distance (1 - cosine similarity).
	DistanceCosine DistanceMetric = "cosine"
	// DistanceEuclidean uses L2 (Euclidean) distance.
	DistanceEuclidean DistanceMetric = "euclidean"
	// DistanceInnerProduct uses negative inner product (for max inner product search).
	DistanceInnerProduct DistanceMetric = "inner_product"
)
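The three metrics can be written out directly. The sketch below computes each one in plain Go, following the definitions in the comments above; the SQL operators named in the code comments are pgvector's. The function names are illustrative only, and the inputs are assumed to be non-zero vectors of equal length.

```go
package main

import (
	"fmt"
	"math"
)

// dot returns the inner product of a and b (assumes equal lengths).
func dot(a, b []float32) float64 {
	var s float64
	for i := range a {
		s += float64(a[i]) * float64(b[i])
	}
	return s
}

// cosineDistance is 1 - cosine similarity (pgvector operator <=>).
func cosineDistance(a, b []float32) float64 {
	return 1 - dot(a, b)/(math.Sqrt(dot(a, a))*math.Sqrt(dot(b, b)))
}

// euclideanDistance is the L2 distance (pgvector operator <->).
func euclideanDistance(a, b []float32) float64 {
	var s float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		s += d * d
	}
	return math.Sqrt(s)
}

// negInnerProduct is the negated inner product (pgvector operator <#>).
// Negating lets "smaller is closer" ordering find the maximum inner product.
func negInnerProduct(a, b []float32) float64 {
	return -dot(a, b)
}

func main() {
	a, b := []float32{1, 2}, []float32{3, 4}
	fmt.Println(cosineDistance(a, b), euclideanDistance(a, b), negInnerProduct(a, b))
}
```

All three are ordered so that a smaller value means a closer match, which is why the inner product is negated.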
type HNSWConfig ¶
type HNSWConfig struct {
// M is the number of connections per layer (default 16).
M int
// EfConstruction is the size of the dynamic candidate list during construction (default 64).
EfConstruction int
}
HNSWConfig contains HNSW index parameters.
type IVFFlatConfig ¶
type IVFFlatConfig struct {
// Lists is the number of inverted lists (default sqrt(n) where n is row count).
Lists int
}
IVFFlatConfig contains IVFFlat index parameters.
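The sqrt(n) default for Lists can be sketched as a small helper. Rounding up and clamping to a minimum of 1 are assumptions made here; the package may round differently. The name defaultLists is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// defaultLists returns ceil(sqrt(rowCount)), with a floor of 1 so that
// an empty or tiny table still yields a valid list count.
func defaultLists(rowCount int) int {
	if rowCount <= 1 {
		return 1
	}
	return int(math.Ceil(math.Sqrt(float64(rowCount))))
}

func main() {
	for _, n := range []int{0, 10, 1000000} {
		fmt.Println(n, defaultLists(n))
	}
}
```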
type Index ¶
type Index struct {
// contains filtered or unexported fields
}
Index implements vector.Index using PostgreSQL with pgvector extension.
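func (*Index) DeleteBatch ¶
func (idx *Index) DeleteBatch(ctx context.Context, ids []string) error
DeleteBatch implements vector.BatchIndex.
func (*Index) InsertBatch ¶
func (idx *Index) InsertBatch(ctx context.Context, nodes []vector.Node) error
InsertBatch implements vector.BatchIndex.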
func (*Index) Search ¶
func (idx *Index) Search(ctx context.Context, embedding []float32, k int, filters map[string]string) ([]vector.SearchResult, error)
Search implements vector.Index.
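One plausible shape for the query behind Search is a nearest-neighbour ORDER BY combined with a JSONB containment filter. The sketch below builds such a query and its arguments. It is an assumption about the approach, not the package's actual SQL: the column names (id, embedding, metadata), the helpers vectorLiteral and searchQuery, and the choice of the @> operator for filters are all illustrative. The table name and operator are assumed to come from trusted configuration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// vectorLiteral formats an embedding in pgvector's text form, e.g. "[0.1,0.2]".
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, x := range v {
		parts[i] = strconv.FormatFloat(float64(x), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

// searchQuery returns a k-nearest-neighbour query and its bind arguments.
// op is a pgvector distance operator such as "<=>", "<->" or "<#>".
// Filters are matched with JSONB containment, i.e. every key must equal its value.
func searchQuery(table, op string, embedding []float32, k int, filters map[string]string) (string, []any, error) {
	args := []any{vectorLiteral(embedding)}
	where := ""
	if len(filters) > 0 {
		b, err := json.Marshal(filters)
		if err != nil {
			return "", nil, err
		}
		args = append(args, string(b))
		where = " WHERE metadata @> $2::jsonb"
	}
	args = append(args, k)
	q := fmt.Sprintf(
		"SELECT id, embedding %s $1::vector AS distance FROM %s%s ORDER BY embedding %s $1::vector LIMIT $%d",
		op, table, where, op, len(args))
	return q, args, nil
}

func main() {
	q, args, err := searchQuery("embeddings", "<=>", []float32{0.1, 0.2}, 3, map[string]string{"lang": "en"})
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
	fmt.Println(args...)
}
```

Passing the embedding and filters as bind parameters keeps user-supplied values out of the SQL text.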
type IndexType ¶
type IndexType string
IndexType defines the vector index algorithm.
const (
	// IndexTypeNone uses no index (brute force).
	IndexTypeNone IndexType = "none"
	// IndexTypeHNSW uses HNSW (Hierarchical Navigable Small World) index.
	IndexTypeHNSW IndexType = "hnsw"
	// IndexTypeIVFFlat uses IVFFlat (Inverted File with Flat compression) index.
	IndexTypeIVFFlat IndexType = "ivfflat"
)
type Manager ¶
type Manager struct {
// contains filtered or unexported fields
}
Manager implements vector.IndexManager for PostgreSQL with pgvector.
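func (*Manager) CreateIndex ¶
func (m *Manager) CreateIndex(ctx context.Context, cfg vector.IndexConfig) error
CreateIndex implements vector.IndexManager.
func (*Manager) IndexExists ¶
func (m *Manager) IndexExists(ctx context.Context, name string) (bool, error)
IndexExists implements vector.IndexManager.
func (*Manager) IndexStats ¶
func (m *Manager) IndexStats(ctx context.Context, name string) (*vector.IndexStats, error)
IndexStats implements vector.IndexManager.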