vdoc

package module
v0.0.0-...-d5371c7 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 25, 2026 License: MIT Imports: 14 Imported by: 0

README

vdoc

A lightweight vector database for Go, built on BoltDB with OpenAI-compatible embeddings.

This is highly experimental and subject to change. Use at your own risk or, better yet, don't use it at all. It's not ready for production and may never be. But if you're curious about how a simple vector database might work in Go, feel free to explore the code and give feedback.

Overview

vdoc lets you store documents and search them by semantic similarity. It's designed to be simple, embeddable, and dependency-light.

Features

  • Embedded storage with BoltDB (no external service needed)
  • Semantic search via cosine similarity on vector embeddings
  • Flexible configuration via option pattern
  • OpenAI-compatible embedding APIs (including self-hosted endpoints)
  • Chunking for documents that exceed embedding limits

Basic Usage

// Create a database
db, err := vdoc.New("mydb.db",
    vdoc.WithEmbeddings(myEmbeddingFunc),
    vdoc.WithMaxChunkSize(500),
)
if err != nil {
    log.Fatal(err)
}
defer db.Close()

// Create a collection
collection, err := db.Collection("documents")
if err != nil {
    log.Fatal(err)
}

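// Store documents
err = collection.Upsert(vdoc.Document{
    ID:   "doc1",
    Text: "Your content here",
})
if err != nil {
    log.Fatal(err)
}

// Search for documents with a similarity of at least 0.3
results, err := collection.Search("query", 0.3)
if err != nil {
    log.Fatal(err)
}
for _, r := range results {
    fmt.Printf("%s: %.2f\n", r.Document.Text, r.Similarity)
}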

Embeddings

vdoc does not compute embeddings itself; it calls an embedding function that you supply. Use one of the provided helpers:

// OpenAI
db, _ := vdoc.New("db.db",
    vdoc.WithEmbeddings(vdoc.NewOpenAIEmbeddings("api-key", "text-embedding-3-small")),
)

// OpenAI-compatible (self-hosted or another provider)
db, _ := vdoc.New("db.db",
    vdoc.WithNomicEmbedTextV2Model("http://localhost:8080/v1", "key", "model"),
)

License

MIT

Documentation

Index

Constants

const DefaultOpenAIBaseURL = "https://api.openai.com/v1"

DefaultOpenAIBaseURL is the default base URL for the OpenAI API. It can be used as a default value when creating an OpenAI embeddings function, but you can specify a different URL if needed (e.g. for enterprise or self-hosted deployments).

const DefaultOpenAiModel = "text-embedding-3-small"

DefaultOpenAiModel is the default model name for generating embeddings using the OpenAI API. It can be used as a default value when creating an OpenAI embeddings function, but you can specify a different model if needed (e.g. if you want to use a newer or more powerful model for better embeddings).

Variables

var ErrDocumentNotFound = errors.New("document id not found")

ErrDocumentNotFound is returned when a document with the specified ID is not found in the collection. This can happen when trying to retrieve a document that does not exist.

var ErrNilCollection = errors.New("collection is nil or uninitialized")

ErrNilCollection is returned when a method is called on a nil or uninitialized Collection. It indicates a programming mistake: the Collection was not created with the Database.Collection() method, or it was set to nil after creation. Always obtain a Collection from Database.Collection() before calling its methods.

var ErrNilDatabase = errors.New("database is nil or uninitialized")

ErrNilDatabase is returned when a method is called on a nil or uninitialized Database. It indicates a programming mistake: the Database was not created with the New() function, or it was set to nil after creation. Always obtain a Database from New() before calling its methods.

Functions

This section is empty.

Types

type Collection

type Collection struct {
	// contains filtered or unexported fields
}

Collection represents a collection of documents within the database. It provides methods for storing, searching, retrieving, and deleting documents within the collection. Each collection has its own settings for embedding generation and preprocessing, which can be customized using the provided Option functions. If no custom settings are provided when creating a collection, it inherits the settings from the parent Database.

func (*Collection) Delete

func (c *Collection) Delete(id string) error

Delete removes a document from the collection by its ID. It also deletes all associated vectors for the document. If the document is not found, it returns an error.

func (*Collection) Get

func (c *Collection) Get(id string) (*Document, error)

Get retrieves a document from the collection by its ID. If the document is not found, it returns an error.

func (*Collection) Search

func (c *Collection) Search(query string, minSimilarity float32) ([]Result, error)

Search performs a similarity search on the collection using the provided query string. It returns the results whose cosine similarity score is greater than or equal to the specified minimum similarity threshold. The results are sorted in descending order of similarity.
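
The threshold-and-sort behavior described for Search can be sketched as follows. This is a stdlib-only illustration, not the package's implementation: the scored type and the dot and search functions are hypothetical, and the vectors are assumed to be unit length already, so the dot product equals the cosine similarity.

```go
package main

import (
	"fmt"
	"sort"
)

// scored is a hypothetical stand-in for a search hit.
type scored struct {
	ID         string
	Similarity float32
}

// dot returns the dot product of two equal-length vectors. For unit-length
// vectors this equals their cosine similarity.
func dot(a, b []float32) float32 {
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// search keeps every candidate whose similarity to query is at least
// minSimilarity and returns the hits sorted from most to least similar.
func search(query []float32, candidates map[string][]float32, minSimilarity float32) []scored {
	var hits []scored
	for id, vec := range candidates {
		if s := dot(query, vec); s >= minSimilarity {
			hits = append(hits, scored{ID: id, Similarity: s})
		}
	}
	sort.Slice(hits, func(i, j int) bool {
		return hits[i].Similarity > hits[j].Similarity
	})
	return hits
}

func main() {
	candidates := map[string][]float32{
		"a": {1, 0},
		"b": {0.6, 0.8},
		"c": {0, 1},
	}
	// "c" is orthogonal to the query, so it falls below the 0.3 threshold.
	for _, h := range search([]float32{1, 0}, candidates, 0.3) {
		fmt.Printf("%s: %.2f\n", h.ID, h.Similarity)
	}
}
```

Raising the threshold trades recall for precision; a value such as 0.3 in the README example is a starting point that likely needs tuning per embedding model.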

func (*Collection) Upsert

func (c *Collection) Upsert(documents ...Document) error

Upsert inserts or updates one or more documents in the collection. If a document with the same ID already exists, its text and vectors are replaced; otherwise it is inserted as a new document. The operation is all-or-nothing: if upserting any document fails, the entire operation is rolled back, no changes are made to the collection, and an error is returned.
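
The all-or-nothing guarantee can be illustrated with an in-memory sketch. It is an assumption-laden toy, not how vdoc does it: the doc type, the upsertAll function, and the empty-ID validation rule are all hypothetical, and a database-backed implementation would more plausibly rely on a storage transaction than on copying a map.

```go
package main

import (
	"errors"
	"fmt"
)

// doc is a hypothetical stand-in for a stored document.
type doc struct {
	ID   string
	Text string
}

// upsertAll applies every document or none. Changes are staged on a copy of
// the store; the copy is returned only if every document is valid, otherwise
// the original store is returned untouched together with an error.
func upsertAll(store map[string]string, docs ...doc) (map[string]string, error) {
	staged := make(map[string]string, len(store)+len(docs))
	for id, text := range store {
		staged[id] = text
	}
	for _, d := range docs {
		if d.ID == "" { // hypothetical validation rule
			return store, errors.New("document has an empty ID")
		}
		staged[d.ID] = d.Text
	}
	return staged, nil
}

func main() {
	store := map[string]string{"doc1": "old text"}

	// The second document is invalid, so the first one is not applied either.
	store, err := upsertAll(store, doc{ID: "doc2", Text: "new"}, doc{ID: ""})
	fmt.Println(len(store), err)
}
```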

type Database

type Database struct {
	// contains filtered or unexported fields
}

Database is the main entry point for interacting with the document storage system.

func New

func New(path string, options ...Option) (*Database, error)

func (*Database) Close

func (d *Database) Close() error

func (*Database) Collection

func (d *Database) Collection(name string, options ...Option) (*Collection, error)

Collection returns the Collection with the given name, which is the main interface for storing, searching, retrieving, and deleting documents. Any options passed here customize the collection's settings; without them, the collection inherits the settings of the Database.

type Document

type Document struct {
	ID   string `json:"id"`
	Text string `json:"text"`
}

type EmbeddingsFunc

type EmbeddingsFunc func(text string) ([]float32, error)

EmbeddingsFunc is a function type that takes a string input and returns a slice of float32 values representing the embeddings, along with an error if the embedding generation fails.
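
Any function with this signature can serve as an embedding function, which is convenient for tests that should not call a remote API. The sketch below redeclares the EmbeddingsFunc type locally so it runs without vdoc; newHashingEmbeddings is a hypothetical helper that counts hashed words into a fixed number of buckets. The resulting vectors carry no semantic meaning and are only deterministic placeholders.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"strings"
)

// EmbeddingsFunc mirrors the signature documented by the package.
type EmbeddingsFunc func(text string) ([]float32, error)

// newHashingEmbeddings returns a toy EmbeddingsFunc that hashes each word
// into one of dims buckets and counts occurrences. dims must be positive.
func newHashingEmbeddings(dims int) EmbeddingsFunc {
	return func(text string) ([]float32, error) {
		words := strings.Fields(strings.ToLower(text))
		if len(words) == 0 {
			return nil, errors.New("cannot embed empty text")
		}
		vec := make([]float32, dims)
		for _, w := range words {
			h := fnv.New32a()
			h.Write([]byte(w)) // hash.Hash writes never return an error
			vec[h.Sum32()%uint32(dims)]++
		}
		return vec, nil
	}
}

func main() {
	embed := newHashingEmbeddings(8)
	vec, err := embed("Your content here")
	if err != nil {
		fmt.Println("embedding failed:", err)
		return
	}
	fmt.Println(len(vec), vec)
}
```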

func NewOpenAICompatibleEmbeddingsFunc

func NewOpenAICompatibleEmbeddingsFunc(baseUrl string, apiKey string, model string, prefix string) EmbeddingsFunc

NewOpenAICompatibleEmbeddingsFunc creates an EmbeddingsFunc that uses an OpenAI compatible API to generate embeddings for text.
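
For orientation, an OpenAI-compatible embeddings call is a POST to the /embeddings path under the base URL, with a JSON body naming the model and the input text. The sketch below shows that request shape using only the standard library; it is not the package's implementation. The embed function, the request and response structs, and newFakeServer are hypothetical names, and the assumption that prefix is simply prepended to the input (as task prefixes such as "search_query: " are for Nomic-style models) is not confirmed by this documentation. A local test server stands in for the real endpoint so the example runs offline.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type embedRequest struct {
	Model string `json:"model"`
	Input string `json:"input"`
}

type embedResponse struct {
	Data []struct {
		Embedding []float32 `json:"embedding"`
	} `json:"data"`
}

// embed posts prefix+text to baseURL+"/embeddings" and returns the first
// embedding in the response.
func embed(baseURL, apiKey, model, prefix, text string) ([]float32, error) {
	body, err := json.Marshal(embedRequest{Model: model, Input: prefix + text})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/embeddings", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("embeddings request failed: %s", resp.Status)
	}
	var out embedResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if len(out.Data) == 0 {
		return nil, errors.New("embeddings response contained no data")
	}
	return out.Data[0].Embedding, nil
}

// newFakeServer answers every request with a fixed two-dimensional embedding
// so the sketch runs without network access or an API key.
func newFakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"data":[{"embedding":[0.6,0.8]}]}`)
	}))
}

func main() {
	srv := newFakeServer()
	defer srv.Close()

	vec, err := embed(srv.URL, "key", "model", "search_query: ", "hello")
	fmt.Println(vec, err)
}
```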

func NewOpenAIEmbeddings

func NewOpenAIEmbeddings(apiKey string, model string) EmbeddingsFunc

NewOpenAIEmbeddings creates an EmbeddingsFunc that uses the OpenAI API to generate embeddings for text. You must provide your OpenAI API key and the name of the model you want to use for generating embeddings (e.g. "text-embedding-3-small").

type Option

type Option func(*settings)
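
Option follows the functional options pattern: each option is a function that mutates a settings value. The sketch below shows the pattern in isolation, including how a collection could start from its parent's settings and override only what it needs. The settings fields, the lowercase option constructors, the apply helper, and the default of 1000 are all hypothetical; the package's real settings type is unexported and not shown in this documentation.

```go
package main

import "fmt"

// settings is a hypothetical stand-in for the package's unexported settings.
type settings struct {
	maxChunkSize int
	preprocess   func(string) string
}

// Option mirrors the documented type: a function that mutates settings.
type Option func(*settings)

func withMaxChunkSize(size int) Option {
	return func(s *settings) { s.maxChunkSize = size }
}

func withPreprocess(f func(string) string) Option {
	return func(s *settings) { s.preprocess = f }
}

// apply copies the parent settings and applies the options in order, so
// later options win and the parent itself is never modified.
func apply(parent settings, options ...Option) settings {
	s := parent
	for _, opt := range options {
		opt(&s)
	}
	return s
}

func main() {
	defaults := settings{maxChunkSize: 1000} // hypothetical default
	database := apply(defaults, withMaxChunkSize(500))

	inherited := apply(database)                         // no options: inherits 500
	overridden := apply(database, withMaxChunkSize(200)) // overrides the parent

	fmt.Println(database.maxChunkSize, inherited.maxChunkSize, overridden.maxChunkSize)
}
```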

func WithEmbeddings

func WithEmbeddings(f EmbeddingsFunc) Option

WithEmbeddings sets the same embedding function for both search and store operations.

func WithMaxChunkSize

func WithMaxChunkSize(size int) Option

WithMaxChunkSize sets the maximum chunk size for document processing. This can be used to control how documents are split into smaller pieces for embeddings.

func WithNomicEmbedTextV2Model

func WithNomicEmbedTextV2Model(baseUrl string, apiKey string, model string) Option

WithNomicEmbedTextV2Model is a convenience option for using the Nomic Embed Text V2 model for both search and store embeddings. It takes the base URL, API key, and model name as parameters and sets up the embedding functions accordingly.

func WithPreprocess

func WithPreprocess(f func(string) string) Option

WithPreprocess sets the same preprocessing function for both search and store operations.
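
A preprocessing function is any func(string) string. As a minimal sketch, the hypothetical normalizeText below lowercases the text and collapses runs of whitespace, a common normalization step before embedding. Whether this helps depends on the embedding model, so treat it as an illustration of the signature rather than a recommendation.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeText lowercases the input and collapses every run of whitespace
// (spaces, tabs, newlines) into a single space, trimming both ends.
func normalizeText(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), " ")
}

func main() {
	fmt.Printf("%q\n", normalizeText("  Hello\n\tWorld  ")) // prints "hello world"
}
```

A function like this has the shape that WithPreprocess, WithSearchPreprocess, and WithStorePreprocess accept.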

func WithSearchEmbeddings

func WithSearchEmbeddings(f EmbeddingsFunc) Option

WithSearchEmbeddings sets the embedding function for search operations.

func WithSearchPreprocess

func WithSearchPreprocess(f func(string) string) Option

WithSearchPreprocess sets the preprocessing function for search operations.

func WithSplitFunc

func WithSplitFunc(f SplitFunc) Option

WithSplitFunc sets the function used to split documents into chunks for embedding generation. This allows you to customize how documents are divided into smaller pieces, which can be important for handling long documents or optimizing embedding quality. For example, when processing code, it may be better to split on syntax elements rather than only by character count and newlines.

func WithStoreEmbeddings

func WithStoreEmbeddings(f EmbeddingsFunc) Option

WithStoreEmbeddings sets the embedding function for store operations.

func WithStorePreprocess

func WithStorePreprocess(f func(string) string) Option

WithStorePreprocess sets the preprocessing function for store operations.

type Result

type Result struct {
	Document   *Document `json:"document"`
	Similarity float32   `json:"similarity"`
}

type SplitFunc

type SplitFunc func(text string, maxChunkSize int) []string

SplitFunc is a function type that takes a string input and a maximum chunk size, and returns a slice of strings representing the split chunks of the input text. This can be used to customize how documents are divided into smaller pieces for embedding generation.
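
As a sketch of a custom splitter, the hypothetical splitOnParagraphs below packs blank-line-separated paragraphs into chunks. It redeclares SplitFunc locally so it runs without vdoc, and it assumes maxChunkSize is measured in bytes, which this documentation does not specify. A paragraph longer than the limit is emitted as its own oversized chunk; a production splitter would need to break it further.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitFunc mirrors the signature documented by the package.
type SplitFunc func(text string, maxChunkSize int) []string

// splitOnParagraphs packs paragraphs (separated by blank lines) into chunks
// of at most maxChunkSize bytes, never cutting inside a paragraph.
func splitOnParagraphs(text string, maxChunkSize int) []string {
	var chunks []string
	var current string
	for _, p := range strings.Split(text, "\n\n") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		// Start a new chunk if appending this paragraph would exceed the limit.
		if current != "" && len(current)+2+len(p) > maxChunkSize {
			chunks = append(chunks, current)
			current = ""
		}
		if current != "" {
			current += "\n\n"
		}
		current += p
	}
	if current != "" {
		chunks = append(chunks, current)
	}
	return chunks
}

func main() {
	var split SplitFunc = splitOnParagraphs
	for i, chunk := range split("aaaa\n\nbbbb\n\ncccc", 10) {
		fmt.Printf("%d: %q\n", i, chunk)
	}
}
```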

type Vector

type Vector struct {
	// contains filtered or unexported fields
}

FIXME: Make this private.

func NewVector

func NewVector(vector []float32, documentID string) *Vector

NewVector creates a new Vector instance with the given vector and document ID. The vector is normalized to ensure consistent cosine similarity calculations. Please note that the input vector is modified in place for normalization, so if you need to keep the original vector, make a copy before calling this function.
