database

package
v0.16.1
Warning

This package is not in the latest version of its module.

Published: May 20, 2026 License: BSD-3-Clause Imports: 30 Imported by: 0

README

Databases

There are currently four supported database implementations:

  • DuckDB - manages vector embeddings using the DuckDB database and the VSS extension. This is the default implementation.
  • SQLite - manages vector embeddings using the SQLite database and the sqlite-vec extension.
  • Bleve - manages vector embeddings using the Bleve database and the faiss library.
  • S3Vectors - manages vector embeddings using the Amazon Web Services S3Vectors service.

Here's the "tl;dr":

The DuckDB implementation is generally faster than the SQLite implementation but requires that all your data be stored in memory. That data is periodically exported to disk so that it can be re-imported without indexing all the data from scratch, but importing it at start-up still takes a noticeable amount of time.

The SQLite implementation has slower query times but stores (and reads) all its data from disk, so it is fast to start.

The Bleve implementation is also fast, has a fast start-up time, doesn't require loading all the data into memory and doesn't use an unmanageable amount of disk space, but it remains a non-trivial chore to set up because of its dependency on libfaiss (see details below). If you can get it to work, a Bleve-backed database is pretty great, but be aware that the build process may be a challenge.

The S3Vectors implementation is fast and demonstrates good query times. It is, however, dependent on a commercial service, Amazon Web Services (AWS), where everything (from storage to queries) is metered. Depending on how your database access is configured, this could lead to very large bills at the end of the month. If you have already made your peace with AWS, then it can be a quick and easy way to get started with vector embeddings.

duckdb://

Manage embeddings using the DuckDB database and the VSS extension.

duckdb://{PATH}?{QUERY_PARAMETERS}

Where {PATH} is an optional value mapped to the location of an existing DuckDB database. If present, this database will be used to instantiate the database. Depending on the size of the database this can take a noticeable amount of time. It is also the location the database will be exported to if the Server.Export method is called.

Valid parameters are:

Key Value Required Notes
dimensions int no The number of dimensions for the embeddings being stored. Default is 512.
max-distance float no Update the default maximum distance when querying for similar embeddings. Default is 1.0.
max-results int no Update the default number of records to return when querying for similar embeddings. Default is 10.

For example:

duckdb:///usr/local/data/embeddings
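A minimal sketch, using only the standard library's `net/url` and `strconv`, of how a `duckdb://` URI and the documented defaults (512 dimensions, a maximum distance of 1.0, 10 results) fit together. The `duckDBConfig` type and `parseDuckDBURI` function are hypothetical illustrations, not the package's own parsing code.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// duckDBConfig mirrors the parameter table above.
// The type and field names are hypothetical, not the package's API.
type duckDBConfig struct {
	Path        string  // optional location of an existing DuckDB database
	Dimensions  int     // default 512
	MaxDistance float64 // default 1.0
	MaxResults  int     // default 10
}

// parseDuckDBURI starts from the documented defaults and overrides them
// with any query parameters present in uri.
func parseDuckDBURI(uri string) (*duckDBConfig, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return nil, err
	}
	if u.Scheme != "duckdb" {
		return nil, fmt.Errorf("unexpected scheme %q", u.Scheme)
	}

	cfg := &duckDBConfig{Path: u.Path, Dimensions: 512, MaxDistance: 1.0, MaxResults: 10}
	q := u.Query()

	if v := q.Get("dimensions"); v != "" {
		if cfg.Dimensions, err = strconv.Atoi(v); err != nil {
			return nil, fmt.Errorf("invalid dimensions: %w", err)
		}
	}
	if v := q.Get("max-distance"); v != "" {
		if cfg.MaxDistance, err = strconv.ParseFloat(v, 64); err != nil {
			return nil, fmt.Errorf("invalid max-distance: %w", err)
		}
	}
	if v := q.Get("max-results"); v != "" {
		if cfg.MaxResults, err = strconv.Atoi(v); err != nil {
			return nil, fmt.Errorf("invalid max-results: %w", err)
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := parseDuckDBURI("duckdb:///usr/local/data/embeddings?dimensions=1024")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *cfg)
}
```

With three slashes the URI has an empty host, so `{PATH}` arrives intact as `u.Path`; a bare `duckdb://` simply leaves `Path` empty in this sketch.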

sqlite://

Manage embeddings using the SQLite database and the sqlite-vec extension.

sqlite://?{QUERY_PARAMETERS}

Valid parameters are:

Key Value Required Notes
dsn string yes A registered database/sql.Driver DSN string.
dimensions int no The number of dimensions for the embeddings being stored. Default is 512.
max-distance float no Update the default maximum distance when querying for similar embeddings. Default is 1.0.
max-results int no Update the default number of records to return when querying for similar embeddings. Default is 10.
compression string no The type of compression to use when storing embeddings. Options are: none, quantized, matroyshka. Default is "none".

For example:

sqlite://?dsn=file:/usr/local/data/embeddings.db

Note: As of this writing only the Go-language CGO bindings are supported. Support for "pure Go" bindings will be added in future releases.

bleve://

Manage embeddings using the Bleve document store.

bleve://{PATH}?{QUERY_PARAMETERS}

If {PATH} is omitted then an in-memory database will be created.

Valid parameters are:

Key Value Required Notes
dimensions int no The number of dimensions for the embeddings being stored. Default is 512.
similarity-metric string no The similarity metric used when comparing embeddings. Consult https://github.com/blevesearch/bleve/blob/master/docs/vectors.md for details. Note: This cannot be changed after a Bleve index is created. Default is "l2_norm".
optimize-for string no The vector index optimization strategy to use. Consult https://github.com/blevesearch/bleve/blob/master/docs/vectors.md for details. Default is "latency".
max-distance float no Update the default maximum distance when querying for similar embeddings. Default is 5.0.
max-results int no Update the default number of records to return when querying for similar embeddings. Default is 10.

For example:

bleve:///usr/local/data/bleve-embeddings
Building (DuckDB)

Under the hood the Bleve implementation stores the static vector embeddings data in a separate DuckDB database. This is because the vector embeddings stored in Bleve itself are not returned as part of normal search queries and storing those data internally (to Bleve, outside of the search index) consumes an obscene amount of disk space. DuckDB simply uses less disk space.

What this means, practically, is that when building a Bleve-backed implementation of the tools in this package you will need to do the go mod tidy && go mod vendor dance, described below, to pull in the DuckDB .a files. Everything else should be handled internally and is not your concern.

Building (libfaiss)

This is a bit of a chore on a Mac. If you have already installed libfaiss from Homebrew (or whatever) you need to remove it and install the Bleve-specific fork:

$> git clone ssh://git@github.com/blevesearch/faiss.git
$> cd faiss

Note: The blevesearch/faiss checkpoint is relevant and specific to the version of blevesearch/bleve being used. For details consult: https://github.com/blevesearch/bleve/blob/master/docs/vectors.md#pre-requisites

Now issue the following commands:

$> export LDFLAGS="-L/opt/homebrew/opt/llvm/lib"
$> export CPPFLAGS="-I/opt/homebrew/opt/llvm/include"
$> export CXX=/opt/homebrew/opt/llvm/bin/clang++
$> export CC=/opt/homebrew/opt/llvm/bin/clang

$> cmake -B build \
  -DFAISS_ENABLE_GPU=OFF \
  -DFAISS_ENABLE_C_API=ON \
  -DBUILD_SHARED_LIBS=ON \
  -DFAISS_ENABLE_PYTHON=OFF .

$> make -C build
$> sudo make -C build install
$> sudo cp build/c_api/libfaiss_c.dylib /usr/local/lib

Note that I had to use a completely different set of instructions to get libfaiss to compile on an Intel Mac; I don't know why. For build instructions for Linux and Windows, please consult the Bleve documentation.

As of the "v2.6.0" release of blevesearch/bleve everything should work. Per the documentation you can sanity check things as follows:

$> cd /usr/local/src/bleve
$> go test -ldflags "-r /usr/local/lib" ./... -tags=vectors

Assuming that all the tests pass you can build the tools in this package. Remember that you also need to include the -tags vectors and -ldflags -r /usr/local/lib when you build things. For example:

$> make cli TAGS=sqlite,bleve,vectors LDFLAGS='-s -w -r /usr/local/lib'
go build -tags=sqlite,bleve,vectors -mod readonly -ldflags="-s -w -r /usr/local/lib" -o bin/embeddingsdb-client cmd/client/main.go
...and so on
Other "known knowns"

I have observed that, under some conditions, data corruption can occur when importing large datasets (using the parquet-import tool, for example). This problem seems to be related to memory-mapping and the go.etcd.io/bbolt package, but I am not certain. These problems seem to have been resolved on Apple Silicon Macs, but I continue to experience them on older Intel-based Macs.

s3vectors://

Manage embeddings using the Amazon Web Services (AWS) S3Vectors service. It also uses the AWS DynamoDB service to store metadata properties, enabling functionality not otherwise available in the S3Vectors service.

This database implementation relies on a commercial service that is metered. Depending on how your database access is configured this could lead to very large bills at the end of the month. If you have already made your peace with AWS then it can be a quick and easy way to get started with vector embeddings.

s3vectors://{BUCKET_NAME}?{QUERY_PARAMETERS}

Where {BUCKET_NAME} is the name of the S3Vectors bucket where embeddings are stored. This will be created dynamically at runtime if it does not already exist.

Valid parameters are:

Key Value Required Notes
index string yes The name of the S3Vectors index where embeddings are stored. This will be created dynamically at runtime if it does not already exist.
region string yes The AWS region where your S3Vectors bucket is stored.
credentials string yes A valid aaronland/go-aws/v3/auth credentials string. Details are discussed below.
dimensions int no The number of dimensions for the embeddings being stored. Default is 512.
max-distance float no Update the default maximum distance when querying for similar embeddings. Default is 1.0.
max-results int no Update the default number of records to return when querying for similar embeddings. Default is 10.
refresh-tags bool no A boolean flag to update denormalized database properties into index-specific "tags". Details are discussed below.
dynamodb-table string no Use a custom DynamoDB table name for storing and querying record data. Default is "s3vectors".

For example:

s3vectors://embeddings-bucket?index=embeddings-1024&region=us-east-1&credentials=iam:&dimensions=1024

AWS credentials

Under the hood this package uses the aaronland/go-aws/v3/auth package for deriving AWS credentials using string labels. Valid labels are:

Label Description
anon: Empty or anonymous credentials.
env: Read credentials from AWS-defined environment variables.
iam: Assume AWS IAM credentials are in effect.
iam:{REGION}:{ARN} Assume AWS IAM credentials are in effect after assuming the IAM Role defined by {ARN} (in {REGION}).
sts:{ARN} Assume the role defined by {ARN} using STS credentials.
{AWS_PROFILE_NAME} Use this profile from the default AWS credentials location.
{AWS_CREDENTIALS_PATH}:{AWS_PROFILE_NAME} Use this profile from a user-defined AWS credentials location.

IAM policies

The following are the minimal IAM policies you will need in order to use an S3Vectors-backed database. These policies are designed to work with a minimalist Lambda function, but they should be adjusted as needed to the specifics of your situation.

S3Vectors
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DiscoverBuckets",
            "Effect": "Allow",
            "Action": [
                "s3vectors:ListVectorBuckets"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ReadAndQueryAllS3VectorIndices",
            "Effect": "Allow",
            "Action": [
                "s3vectors:GetIndex",
                "s3vectors:GetVectors",
                "s3vectors:QueryVectors",
                "s3vectors:ListVectors"
            ],
            "Resource": "arn:aws:s3vectors:{AWS_REGION}:{AWS_ACCOUNT_ID}:bucket/{BUCKET_NAME}/index/*"
        },
        {
            "Sid": "ManageAllS3VectorIndexTags",
            "Effect": "Allow",
            "Action": [
                "s3vectors:ListTagsForResource",
                "s3vectors:TagResource",
                "s3vectors:UntagResource"
            ],
            "Resource": "arn:aws:s3vectors:{AWS_REGION}:{AWS_ACCOUNT_ID}:bucket/{BUCKET_NAME}/index/*"
        },
        {
            "Sid": "ListIndicesInBucket",
            "Effect": "Allow",
            "Action": [
                "s3vectors:ListIndexes",
                "s3vectors:GetVectorBucket"
            ],
            "Resource": "arn:aws:s3vectors:{AWS_REGION}:{AWS_ACCOUNT_ID}:bucket/{BUCKET_NAME}"
        }
    ]
}
DynamoDB

Note the use of the s3vectors and s3vectors_metadata table names in the example below. These are the default values. If you reassign the value of the s3vectors table with the ?dynamodb-table={YOUR_TABLE} parameter, described above, you will need to update this example to replace s3vectors and s3vectors_metadata with {YOUR_TABLE} and {YOUR_TABLE}_metadata respectively.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DynamoDBTableCreateDescribeAndList",
            "Effect": "Allow",
            "Action": [
                "dynamodb:CreateTable",
                "dynamodb:DescribeTable",
                "dynamodb:ListTables"
            ],
            "Resource": [
                "arn:aws:dynamodb:{AWS_REGION}:{AWS_ACCOUNT_ID}:table/s3vectors",
                "arn:aws:dynamodb:{AWS_REGION}:{AWS_ACCOUNT_ID}:table/s3vectors/*",
                "arn:aws:dynamodb:{AWS_REGION}:{AWS_ACCOUNT_ID}:table/s3vectors_metadata",
                "arn:aws:dynamodb:{AWS_REGION}:{AWS_ACCOUNT_ID}:table/s3vectors_metadata/*",
                "arn:aws:dynamodb:{AWS_REGION}:{AWS_ACCOUNT_ID}:table/*"
            ]
        },
        {
            "Sid": "DynamoDBPutDelete",
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem",
                "dynamodb:DeleteItem"
            ],
            "Resource": [
                "arn:aws:dynamodb:{AWS_REGION}:{AWS_ACCOUNT_ID}:table/s3vectors",
                "arn:aws:dynamodb:{AWS_REGION}:{AWS_ACCOUNT_ID}:table/s3vectors_metadata"
            ]
        },
        {
            "Sid": "DynamoDBQueryOnTableAndGSI",
            "Effect": "Allow",
            "Action": [
                "dynamodb:Query"
            ],
            "Resource": [
                "arn:aws:dynamodb:{AWS_REGION}:{AWS_ACCOUNT_ID}:table/s3vectors",
                "arn:aws:dynamodb:{AWS_REGION}:{AWS_ACCOUNT_ID}:table/s3vectors_metadata",
                "arn:aws:dynamodb:{AWS_REGION}:{AWS_ACCOUNT_ID}:table/s3vectors/index/by_provider_model",
                "arn:aws:dynamodb:{AWS_REGION}:{AWS_ACCOUNT_ID}:table/s3vectors/index/by_model_provider",
                "arn:aws:dynamodb:{AWS_REGION}:{AWS_ACCOUNT_ID}:table/s3vectors_metadata/index/GSI1"
            ]
        }
    ]
}
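When the table is renamed with `?dynamodb-table=`, every ARN in the policy above has to follow. A small helper can generate the resources for the query statement from a base table name. It assumes the index names shown in the policy (`by_provider_model`, `by_model_provider`, `GSI1`) stay the same when the table is renamed, which the text above does not state either way. The function name and the account ID `123456789012` are placeholders.

```go
package main

import "fmt"

// dynamoDBResources builds the table and index ARNs used by the
// "DynamoDBQueryOnTableAndGSI" statement for a given base table name.
// Hypothetical helper; index names are assumed unchanged by a rename.
func dynamoDBResources(region, account, table string) []string {
	base := fmt.Sprintf("arn:aws:dynamodb:%s:%s:table/", region, account)
	meta := table + "_metadata"
	return []string{
		base + table,
		base + meta,
		base + table + "/index/by_provider_model",
		base + table + "/index/by_model_provider",
		base + meta + "/index/GSI1",
	}
}

func main() {
	for _, arn := range dynamoDBResources("us-east-1", "123456789012", "my-embeddings") {
		fmt.Println(arn)
	}
}
```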

Documentation


Constants

const CountablePaginationTypeLabel string = "countable"
const CursorPaginationTypeLabel string = "cursor"
const DuckDBDatabaseScheme string = "duckdb"
const NullDatabaseScheme string = "null"
const NullPaginationTypeLabel string = "null"
const S3VectorsDatabaseScheme = "s3vectors"

S3VectorsDatabaseScheme is the URI scheme used to create a database backed by Amazon S3 Vectors.

Variables

var RecordNotFound = errors.New("Record not found")

Functions

func DatabaseSchemes

func DatabaseSchemes() []string

DatabaseSchemes returns the list of schemes that have been registered.

func DeriveModelDimensions added in v0.11.0

func DeriveModelDimensions(ctx context.Context, model string, opts ...options.Option) (int, error)

func InflateDuckDBRecord added in v0.7.1

func InflateDuckDBRecord(ctx context.Context, rows any) (*embeddingsdb.Record, error)

func ListRecords added in v0.13.0

func ListRecords(ctx context.Context, db Database, list_opts *ListRecordsOptions, opts ...options.Option) iter.Seq2[*embeddingsdb.Record, error]

ListRecords returns an [iter.Seq2[*embeddingsdb.Record, error]] iterator for listing all the records in an `embeddingsdb` database. It handles all the pagination requirements derived from 'list_opts'.

func RegisterDatabase

func RegisterDatabase(ctx context.Context, scheme string, init_func DatabaseInitializationFunc) error

RegisterDatabase registers 'scheme' as a key pointing to 'init_func' in an internal lookup table used to create new `Database` instances by the `NewDatabase` method.

func SetupDuckDBDatabase added in v0.7.1

func SetupDuckDBDatabase(ctx context.Context, db *sql.DB, opts *SetupDuckDBDatabaseOptions) error

Types

type Database

type Database interface {
	// Return the URI string used to instantiate the Database instance.
	URI() string
	// AddRecord adds an [embeddingsdb.Record] instance to the underlying database implementation. The boolean return value reports whether the addition was batched.
	AddRecord(context.Context, *embeddingsdb.Record, ...options.Option) (bool, error)
	// The number of batched records currently waiting to be added.
	BatchedRecordsCount(context.Context, ...options.Option) (int, error)
	// Add the pending batched records.
	AddBatchedRecords(context.Context, ...options.Option) error
	// Return the EmbeddingsDB instance record matching 'provider', 'depiction_id' and 'model'.
	GetRecord(context.Context, *embeddingsdb.GetRecordRequest, ...options.Option) (*embeddingsdb.Record, error)
	// Remove a record from an EmbeddingsDB instance.
	RemoveRecord(context.Context, *embeddingsdb.RemoveRecordRequest, ...options.Option) error
	// ListRecords returns a paginated list of records stored in the database.
	ListRecords(context.Context, pagination.Options, ...options.Option) ([]*embeddingsdb.Record, pagination.Results, error)
	// IterateRecords returns an [iter.Seq2[*embeddingsdb.Record, error]] for each record stored in the database.
	IterateRecords(context.Context, ...options.Option) iter.Seq2[*embeddingsdb.Record, error]
	// Find similar records for a given model and record instance.
	SimilarRecords(context.Context, *embeddingsdb.SimilarRecordsRequest, ...options.Option) ([]*embeddingsdb.SimilarRecord, error)
	// Export the contents of the database. Where and how a database is exported are left as details for specific implementations.
	Export(context.Context, string, ...options.Option) error
	// Return the Unix timestamp of the last update to the Database instance.
	LastUpdate(context.Context, ...options.Option) (int64, error)
	// Return the list of dimensions supported by this Database implementation.
	Dimensions(context.Context, ...options.Option) ([]int, error)
	// Return the unique list of models, for zero (all) or more providers, across all the embeddings.
	Models(context.Context, ...options.Option) ([]string, error)
	// Return the unique list of providers across all the embeddings.
	Providers(context.Context, ...options.Option) ([]string, error)
	// Return the pagination type used by the database implementation.
	PaginationType(context.Context, ...options.Option) (PaginationType, error)
	// Close performs any terminating functions required by the database.
	Close(context.Context) error
}

Database defines an interface for adding and querying vector embeddings of embeddingsdb.Record records.

func NewDatabase

func NewDatabase(ctx context.Context, uri string) (Database, error)

NewDatabase returns a new `Database` instance configured by 'uri'. The value of 'uri' is parsed as a `url.URL` and its scheme is used as the key for a corresponding `DatabaseInitializationFunc` function used to instantiate the new `Database`. It is assumed that the scheme (and initialization function) have been registered by the `RegisterDatabase` method.

func NewDuckDBDatabase added in v0.7.1

func NewDuckDBDatabase(ctx context.Context, uri string) (Database, error)

Create a new DuckDBDatabase instance for managing embeddings using the DuckDB database and VSS extension derived from 'uri' which is expected to take the form of:

duckdb://{PATH}?{QUERY_PARAMETERS}

Valid query parameters are:

  • `dimensions` – The number of dimensions for the embeddings being stored. Default is 512.
  • `max-distance` – Update the default maximum distance when querying for similar embeddings. Default is 1.0.
  • `max-results` – Update the default number of records to return when querying for similar embeddings. Default is 10.

func NewNullDatabase

func NewNullDatabase(ctx context.Context, uri string) (Database, error)

func NewS3VectorsDatabase added in v0.11.0

func NewS3VectorsDatabase(ctx context.Context, uri string) (Database, error)

Create a new S3VectorsDatabase instance for managing embeddings using the Amazon Web Services S3Vectors service derived from 'uri' which is expected to take the form of:

s3vectors://{BUCKET_NAME}?{QUERY_PARAMETERS}

Where `{BUCKET_NAME}` is the name of the S3Vectors bucket where embeddings are stored. This will be created dynamically at runtime if it does not already exist. Valid query parameters are:

  • `index` - The name of the S3Vectors index where embeddings are stored. This will be created dynamically at runtime if it does not already exist.
  • `region` - The AWS region where your S3Vectors bucket is stored.
  • `credentials` - A valid `aaronland/go-aws/v3/auth` credentials string.
  • `dimensions` – The number of dimensions for the embeddings being stored. Default is 512.
  • `max-distance` – Update the default maximum distance when querying for similar embeddings. Default is 1.0.
  • `max-results` – Update the default number of records to return when querying for similar embeddings. Default is 10.
  • `refresh-tags` - A boolean flag to update denormalized database properties into index-specific "tags".
  • `dynamodb-table` – Use a custom DynamoDB table name for storing and querying record data. Default is "s3vectors".

type DatabaseInitializationFunc

type DatabaseInitializationFunc func(ctx context.Context, uri string) (Database, error)

DatabaseInitializationFunc is a function defined by individual database packages and used to create an instance of that database.

type DuckDBDatabase added in v0.7.1

type DuckDBDatabase struct {
	Database
	// contains filtered or unexported fields
}

func (*DuckDBDatabase) AddBatchedRecord added in v0.8.0

func (db *DuckDBDatabase) AddBatchedRecord(ctx context.Context, opts ...options.Option) error

Add the pending batched records.

func (*DuckDBDatabase) AddRecord added in v0.7.1

func (db *DuckDBDatabase) AddRecord(ctx context.Context, rec *embeddingsdb.Record, opts ...options.Option) (bool, error)

AddRecord adds an embeddingsdb.Record instance to the underlying database implementation. The boolean return value reports whether the addition was batched.

func (*DuckDBDatabase) BatchedRecordsCount added in v0.8.0

func (db *DuckDBDatabase) BatchedRecordsCount(ctx context.Context, opts ...options.Option) (int, error)

The number of batched records currently waiting to be added.

func (*DuckDBDatabase) Close added in v0.7.1

func (db *DuckDBDatabase) Close(ctx context.Context) error

Close performs any terminating functions required by the database.

func (*DuckDBDatabase) Dimensions added in v0.11.0

func (db *DuckDBDatabase) Dimensions(ctx context.Context, opts ...options.Option) ([]int, error)

Return the list of dimensions supported by this Database implementation.

func (*DuckDBDatabase) Export added in v0.7.1

func (db *DuckDBDatabase) Export(ctx context.Context, uri string, opts ...options.Option) error

Export the contents of the database. This method will export the DuckDB database to 'uri'.

func (*DuckDBDatabase) GetRecord added in v0.7.1

Return the EmbeddingsDB instance record matching 'provider', 'depiction_id' and 'model'.

func (*DuckDBDatabase) IterateRecords added in v0.7.1

func (db *DuckDBDatabase) IterateRecords(ctx context.Context, opts ...options.Option) iter.Seq2[*embeddingsdb.Record, error]

IterateRecords returns an [iter.Seq2[*embeddingsdb.Record, error]] for each record stored in the database.

func (*DuckDBDatabase) LastUpdate added in v0.7.1

func (db *DuckDBDatabase) LastUpdate(ctx context.Context, opts ...options.Option) (int64, error)

Return the Unix timestamp of the last update to the Database instance.

func (*DuckDBDatabase) ListRecords added in v0.7.1

func (db *DuckDBDatabase) ListRecords(ctx context.Context, pg_opts pagination.Options, opts ...options.Option) ([]*embeddingsdb.Record, pagination.Results, error)

ListRecords returns a paginated list of records stored in the database.

func (*DuckDBDatabase) Models added in v0.7.1

func (db *DuckDBDatabase) Models(ctx context.Context, opts ...options.Option) ([]string, error)

Return the unique list of models, for zero (all) or more providers, across all the embeddings.

func (*DuckDBDatabase) PaginationType added in v0.11.0

func (db *DuckDBDatabase) PaginationType(ctx context.Context, opts ...options.Option) (PaginationType, error)

Return the pagination type used by the database.

func (*DuckDBDatabase) Providers added in v0.7.1

func (db *DuckDBDatabase) Providers(ctx context.Context, opts ...options.Option) ([]string, error)

Return the unique list of providers across all the embeddings.

func (*DuckDBDatabase) RemoveRecord added in v0.7.1

func (db *DuckDBDatabase) RemoveRecord(ctx context.Context, req *embeddingsdb.RemoveRecordRequest, opts ...options.Option) error

Remove a record from an EmbeddingsDB instance.

func (*DuckDBDatabase) SimilarRecords added in v0.7.1

Find similar records for a given model and record instance.

func (*DuckDBDatabase) URI added in v0.7.1

func (db *DuckDBDatabase) URI() string

Return the URI string used to instantiate the Database instance.

type ListRecordsOptions added in v0.13.0

type ListRecordsOptions struct {
	// The number of records to return in each set of paginated results.
	PerPage int64
	// The initial page number to return paginated results for.
	StartPage int64
	// The maximum page number to return paginated results for. If -1 then this flag is ignored.
	EndPage int64
}

ListRecordsOptions defines configuration options for calling the `ListRecords` method.

func DefaultListRecordsOptions added in v0.13.0

func DefaultListRecordsOptions() *ListRecordsOptions

DefaultListRecordsOptions returns a ListRecordsOptions with default values for returning all the records in an `embeddings` database in paginated sets of 1000 records.

type NullDatabase

type NullDatabase struct {
	Database
}

func (*NullDatabase) AddBatchedRecord added in v0.8.0

func (db *NullDatabase) AddBatchedRecord(ctx context.Context, opts ...options.Option) error

Add the pending batched records.

func (*NullDatabase) AddRecord

func (db *NullDatabase) AddRecord(ctx context.Context, rec *embeddingsdb.Record, opts ...options.Option) (bool, error)

AddRecord adds an embeddingsdb.Record instance to the underlying database implementation. The boolean return value reports whether the addition was batched.

func (*NullDatabase) BatchedRecordsCount added in v0.8.0

func (db *NullDatabase) BatchedRecordsCount(ctx context.Context, opts ...options.Option) (int, error)

The number of batched records currently waiting to be added.

func (*NullDatabase) Close

func (db *NullDatabase) Close(ctx context.Context) error

Close performs any terminating functions required by the database.

func (*NullDatabase) Dimensions added in v0.11.0

func (db *NullDatabase) Dimensions(ctx context.Context, opts ...options.Option) ([]int, error)

Return the list of dimensions supported by this Database implementation.

func (*NullDatabase) Export

func (db *NullDatabase) Export(ctx context.Context, uri string, opts ...options.Option) error

Export the contents of the database. Where and how a database is exported are left as details for specific implementations.

func (*NullDatabase) GetRecord

Return the EmbeddingsDB instance record matching 'provider', 'depiction_id' and 'model'.

func (*NullDatabase) IterateRecords added in v0.3.0

func (db *NullDatabase) IterateRecords(ctx context.Context, opts ...options.Option) iter.Seq2[*embeddingsdb.Record, error]

IterateRecords returns an [iter.Seq2[*embeddingsdb.Record, error]] for each record stored in the database.

func (*NullDatabase) LastUpdate

func (db *NullDatabase) LastUpdate(ctx context.Context, opts ...options.Option) (int64, error)

Return the Unix timestamp of the last update to the Database instance.

func (*NullDatabase) ListRecords added in v0.5.0

func (db *NullDatabase) ListRecords(ctx context.Context, pg_opts pagination.Options, opts ...options.Option) ([]*embeddingsdb.Record, pagination.Results, error)

ListRecords returns a paginated list of records stored in the database.

func (*NullDatabase) Models added in v0.1.0

func (db *NullDatabase) Models(ctx context.Context, opts ...options.Option) ([]string, error)

Return the unique list of models, for zero (all) or more providers, across all the embeddings.

func (*NullDatabase) PaginationType added in v0.11.0

func (db *NullDatabase) PaginationType(ctx context.Context, opts ...options.Option) (PaginationType, error)

Return the pagination type used by the database.

func (*NullDatabase) Providers added in v0.1.0

func (db *NullDatabase) Providers(ctx context.Context, opts ...options.Option) ([]string, error)

Return the unique list of providers across all the embeddings.

func (*NullDatabase) RemoveRecord added in v0.6.0

func (db *NullDatabase) RemoveRecord(ctx context.Context, req *embeddingsdb.RemoveRecordRequest, opts ...options.Option) error

Remove a record from an EmbeddingsDB instance.

func (*NullDatabase) SimilarRecords

Find similar records for a given model and record instance.

func (*NullDatabase) URI

func (db *NullDatabase) URI() string

Return the URI string used to instantiate the Database instance.

type PaginationType added in v0.11.0

type PaginationType uint8
const (
	NullPaginationType PaginationType = iota
	CountablePaginationType
	CursorPaginationType
)

func NewPaginationType added in v0.11.0

func NewPaginationType(label string) (PaginationType, error)

func (PaginationType) String added in v0.11.0

func (p PaginationType) String() string
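`PaginationType` pairs the `iota` constants with the string labels listed under Constants. A plausible round-trip between the two is sketched below; the signatures match those shown above, but the bodies are assumptions (in particular, returning an error for an unknown label and falling back to "null" in `String`), not the package's code.

```go
package main

import "fmt"

type PaginationType uint8

const (
	NullPaginationType PaginationType = iota
	CountablePaginationType
	CursorPaginationType
)

// Labels as listed in the Constants section.
const (
	NullPaginationTypeLabel      = "null"
	CountablePaginationTypeLabel = "countable"
	CursorPaginationTypeLabel    = "cursor"
)

// NewPaginationType maps a label to its constant (sketch implementation).
func NewPaginationType(label string) (PaginationType, error) {
	switch label {
	case NullPaginationTypeLabel:
		return NullPaginationType, nil
	case CountablePaginationTypeLabel:
		return CountablePaginationType, nil
	case CursorPaginationTypeLabel:
		return CursorPaginationType, nil
	default:
		return NullPaginationType, fmt.Errorf("invalid pagination type %q", label)
	}
}

// String maps a constant back to its label.
func (p PaginationType) String() string {
	switch p {
	case CountablePaginationType:
		return CountablePaginationTypeLabel
	case CursorPaginationType:
		return CursorPaginationTypeLabel
	default:
		return NullPaginationTypeLabel
	}
}

func main() {
	for _, label := range []string{"countable", "cursor", "null"} {
		p, err := NewPaginationType(label)
		if err != nil {
			panic(err)
		}
		fmt.Println(uint8(p), p)
	}
}
```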

type S3VectorsDatabase added in v0.11.0

type S3VectorsDatabase struct {
	Database
	// contains filtered or unexported fields
}

S3VectorsDatabase is a concrete implementation of the embeddingsdb.Database interface that stores embeddings in an S3 Vectors bucket and index. It optionally maintains a DynamoDB table for fast listing by provider or model.

func (*S3VectorsDatabase) AddBatchedRecord added in v0.11.0

func (db *S3VectorsDatabase) AddBatchedRecord(ctx context.Context, opts ...options.Option) error

Add the pending batched records.

func (*S3VectorsDatabase) AddRecord added in v0.11.0

func (db *S3VectorsDatabase) AddRecord(ctx context.Context, rec *embeddingsdb.Record, opts ...options.Option) (bool, error)

AddRecord adds an embeddingsdb.Record instance to the underlying database implementation. The boolean return value reports whether the addition was batched.

func (*S3VectorsDatabase) BatchedRecordsCount added in v0.11.0

func (db *S3VectorsDatabase) BatchedRecordsCount(ctx context.Context, opts ...options.Option) (int, error)

The number of batched records currently waiting to be added.

func (*S3VectorsDatabase) Close added in v0.11.0

func (db *S3VectorsDatabase) Close(ctx context.Context) error

Close performs any terminating functions required by the database.

func (*S3VectorsDatabase) Dimensions added in v0.11.0

func (db *S3VectorsDatabase) Dimensions(ctx context.Context, opts ...options.Option) ([]int, error)

Return the list of dimensions supported by this Database implementation.

func (*S3VectorsDatabase) Export added in v0.11.0

func (db *S3VectorsDatabase) Export(ctx context.Context, uri string, opts ...options.Option) error

Export the contents of the database. Where and how a database is exported are left as details for specific implementations.

func (*S3VectorsDatabase) GetRecord added in v0.11.0

GetRecord retrieves a single Record from S3 Vectors using the key composed from provider, model and depiction_id. If the record is not found, RecordNotFound is returned.

func (*S3VectorsDatabase) IterateRecords added in v0.11.0

func (db *S3VectorsDatabase) IterateRecords(ctx context.Context, opts ...options.Option) iter.Seq2[*embeddingsdb.Record, error]

IterateRecords returns a [iter.Seq2[*embeddingsdb.Record, error]] that yields every record stored in the database. The sequence is lazy and will continue until the context is cancelled or an error occurs.

func (*S3VectorsDatabase) LastUpdate added in v0.11.0

func (db *S3VectorsDatabase) LastUpdate(ctx context.Context, opts ...options.Option) (int64, error)

Return the Unix timestamp of the last update to the Database instance. As of this writing this always returns 0, because the costs of constantly crawling the index, and of denormalizing this data and then keeping it in sync, are too high.

func (*S3VectorsDatabase) ListRecords added in v0.11.0

ListRecords returns a paginated list of all records in the database. When a DynamoDB client is configured the method falls back to using it for filtering by provider or model. The returned Results object contains the pagination cursors.

func (*S3VectorsDatabase) Models added in v0.11.0

func (db *S3VectorsDatabase) Models(ctx context.Context, opts ...options.Option) ([]string, error)

Return the unique list of models, for zero (all) or more providers, across all the embeddings.

func (*S3VectorsDatabase) PaginationType added in v0.11.0

func (db *S3VectorsDatabase) PaginationType(ctx context.Context, opts ...options.Option) (PaginationType, error)

Return the pagination type used by the database.

func (*S3VectorsDatabase) Providers added in v0.11.0

func (db *S3VectorsDatabase) Providers(ctx context.Context, opts ...options.Option) ([]string, error)

Return the unique list of providers across all the embeddings.

func (*S3VectorsDatabase) RemoveRecord added in v0.11.0

RemoveRecord deletes the record identified by req from the S3 Vectors index and, if configured, from the DynamoDB table. Errors from either store are returned.

func (*S3VectorsDatabase) SimilarRecords added in v0.11.0

SimilarRecords searches for embeddings similar to those in req. The result slice contains the matching records together with their similarity distance. The search can be restricted by provider, model, distance and a list of depiction IDs to exclude via the supplied options.
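To make the interplay of `max-distance`, `max-results` and excluded depiction IDs concrete, the sketch below applies them to a list of candidates that already carry a distance. It is only an illustration of the parameters' meaning: in the real implementations the distance computation and ranking happen in the backend (DuckDB, sqlite-vec, Bleve or S3 Vectors), and whether the distance cut-off is inclusive is an assumption here. The `SimilarRecord` struct is a stand-in, not `embeddingsdb.SimilarRecord`.

```go
package main

import (
	"fmt"
	"sort"
)

// SimilarRecord is a stand-in for embeddingsdb.SimilarRecord.
type SimilarRecord struct {
	DepictionID string
	Distance    float64
}

// filterSimilar keeps candidates within maxDistance (inclusive, by assumption)
// that are not excluded, orders them nearest first and caps the result at maxResults.
func filterSimilar(cands []SimilarRecord, maxDistance float64, maxResults int, exclude []string) []SimilarRecord {
	skip := make(map[string]bool, len(exclude))
	for _, id := range exclude {
		skip[id] = true
	}

	var out []SimilarRecord
	for _, c := range cands {
		if c.Distance <= maxDistance && !skip[c.DepictionID] {
			out = append(out, c)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Distance < out[j].Distance })

	if len(out) > maxResults {
		out = out[:maxResults]
	}
	return out
}

func main() {
	cands := []SimilarRecord{{"a", 0.9}, {"b", 0.2}, {"c", 1.4}, {"d", 0.5}, {"self", 0.0}}
	// 1.0 and 10 are the documented defaults for duckdb://, sqlite:// and s3vectors://.
	for _, r := range filterSimilar(cands, 1.0, 10, []string{"self"}) {
		fmt.Println(r.DepictionID, r.Distance)
	}
}
```

Excluding the query record's own depiction ID matters in practice: its distance to itself is 0, so it would otherwise always be the top result.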

type SetupDuckDBDatabaseOptions added in v0.7.1

type SetupDuckDBDatabaseOptions struct {
	Dimensions   int
	DatabasePath string
}

Directories

Path Synopsis
Package s3vectors contains helper code for maintaining a DynamoDB table that mirrors the records stored in an S3 Vectors index.
