vdoc

package module

v0.0.0-...-d5371c7 Published: Mar 25, 2026 License: MIT Imports: 14 Imported by: 0

Repository

github.com/abrander/vdoc

README ¶

vdoc

A lightweight vector database for Go, built on BoltDB with OpenAI-compatible embeddings.

This is highly experimental and subject to change. Use it at your own risk or, better yet, don't use it at all. It's not ready for production and may never be. But if you're curious about how a simple vector database might work in Go, feel free to explore the code and give feedback.

Overview

vdoc lets you store documents and search them by semantic similarity. It's designed to be simple, embeddable, and dependency-light.

Features

Embedded storage with BoltDB (no external service needed)
Semantic search via cosine similarity on vector embeddings
Flexible configuration via option pattern
OpenAI-compatible embedding APIs (including self-hosted endpoints)
Chunking for documents that exceed embedding limits

Basic Usage

// Create a database
db, err := vdoc.New("mydb.db",
    vdoc.WithEmbeddings(myEmbeddingFunc),
    vdoc.WithMaxChunkSize(500),
)
if err != nil {
    log.Fatal(err)
}
defer db.Close()

// Create a collection
collection, err := db.Collection("documents")
if err != nil {
    log.Fatal(err)
}

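// Store documents
err = collection.Upsert(vdoc.Document{
    ID:   "doc1",
    Text: "Your content here",
})
if err != nil {
    log.Fatal(err)
}

// Search for documents with a cosine similarity of at least 0.3
results, err := collection.Search("query", 0.3)
if err != nil {
    log.Fatal(err)
}
for _, r := range results {
    fmt.Printf("%s: %.2f\n", r.Document.Text, r.Similarity)
}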

Embeddings

vdoc does not compute embeddings itself; it calls an embedding function that you supply. Use the provided helpers:

// OpenAI
db, _ := vdoc.New("db.db",
    vdoc.WithEmbeddings(vdoc.NewOpenAIEmbeddings("api-key", "text-embedding-3-small")),
)

// OpenAI-compatible endpoint serving Nomic Embed Text V2 (self-hosted or another provider)
db, _ := vdoc.New("db.db",
    vdoc.WithNomicEmbedTextV2Model("http://localhost:8080/v1", "key", "model"),
)

License

MIT

Documentation ¶

Index ¶

Constants
Variables
type Collection
type Database
- func New(path string, options ...Option) (*Database, error)
- func (d *Database) Close() error
- func (d *Database) Collection(name string, options ...Option) (*Collection, error)
type Document
type EmbeddingsFunc
- func NewOpenAICompatibleEmbeddingsFunc(baseUrl string, apiKey string, model string, prefix string) EmbeddingsFunc
- func NewOpenAIEmbeddings(apiKey string, model string) EmbeddingsFunc
type Option
type Result
type SplitFunc
type Vector
- func NewVector(vector []float32, documentID string) *Vector

Constants ¶

const DefaultOpenAIBaseURL = "https://api.openai.com/v1"

DefaultOpenAIBaseURL is the default base URL for the OpenAI API. It can be used as a default value when creating an OpenAI embeddings function, but you can specify a different URL if needed (e.g. for enterprise or self-hosted deployments).

const DefaultOpenAiModel = "text-embedding-3-small"

DefaultOpenAiModel is the default model name for generating embeddings using the OpenAI API. It can be used as a default value when creating an OpenAI embeddings function, but you can specify a different model if needed (e.g. if you want to use a newer or more powerful model for better embeddings).

Variables ¶

var ErrDocumentNotFound = errors.New("document id not found")

ErrDocumentNotFound is returned when a document with the specified ID is not found in the collection. This can happen when trying to retrieve a document that does not exist.
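Because ErrDocumentNotFound is a sentinel value, callers can presumably test for it with errors.Is rather than comparing error strings. The following is a minimal, self-contained sketch of that pattern; errDocumentNotFound, get, and isNotFound are local stand-ins invented for illustration, not vdoc's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// errDocumentNotFound is a local stand-in for vdoc.ErrDocumentNotFound.
var errDocumentNotFound = errors.New("document id not found")

// get is a hypothetical lookup that wraps the sentinel with extra context.
func get(store map[string]string, id string) (string, error) {
	text, ok := store[id]
	if !ok {
		return "", fmt.Errorf("get %q: %w", id, errDocumentNotFound)
	}
	return text, nil
}

// isNotFound reports whether err is, or wraps, the sentinel error.
func isNotFound(err error) bool {
	return errors.Is(err, errDocumentNotFound)
}

func main() {
	store := map[string]string{"doc1": "hello"}
	if _, err := get(store, "missing"); isNotFound(err) {
		fmt.Println("document does not exist yet")
	}
}
```

With the real package, the same check would be written against vdoc.ErrDocumentNotFound, assuming the library returns that value directly or wraps it.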

var ErrNilCollection = errors.New("collection is nil or uninitialized")

ErrNilCollection is returned when a method is called on a nil or uninitialized Collection. It indicates a programming mistake: the Collection was not created with the Database.Collection() method, or it was set to nil after creation.

var ErrNilDatabase = errors.New("database is nil or uninitialized")

ErrNilDatabase is returned when a method is called on a nil or uninitialized Database. It indicates a programming mistake: the Database was not created with the New() function, or it was set to nil after creation.

Functions ¶

This section is empty.

Types ¶

type Collection ¶

type Collection struct {
	// contains filtered or unexported fields
}

Collection represents a collection of documents within the database. It provides methods for storing, searching, retrieving, and deleting documents within the collection. Each collection has its own settings for embedding generation and preprocessing, which can be customized using the provided Option functions. If no custom settings are provided when creating a collection, it inherits the settings from the parent Database.
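The inheritance described above can be pictured as copying the parent's settings and then applying any per-collection options on top. This is a sketch under assumptions: vdoc's settings struct is unexported and its fields are not documented, so settings, withMaxChunkSize, and inherit below are hypothetical local names.

```go
package main

import "fmt"

// settings is a hypothetical stand-in for vdoc's unexported settings struct.
type settings struct {
	maxChunkSize int
}

// Option mirrors the shape of vdoc's Option type.
type Option func(*settings)

// withMaxChunkSize is a local analogue of an option constructor.
func withMaxChunkSize(size int) Option {
	return func(s *settings) { s.maxChunkSize = size }
}

// inherit copies the parent's settings and then applies the overrides,
// so anything that is not overridden keeps the parent's value.
func inherit(parent settings, overrides ...Option) settings {
	child := parent // struct copy; the parent is not modified
	for _, opt := range overrides {
		opt(&child)
	}
	return child
}

func main() {
	db := settings{maxChunkSize: 500}
	fmt.Println(inherit(db).maxChunkSize)                        // inherited value
	fmt.Println(inherit(db, withMaxChunkSize(200)).maxChunkSize) // overridden value
	fmt.Println(db.maxChunkSize)                                 // parent unchanged
}
```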

func (*Collection) Delete ¶

func (c *Collection) Delete(id string) error

Delete removes a document from the collection by its ID. It also deletes all associated vectors for the document. If the document is not found, it returns an error.

func (*Collection) Get ¶

func (c *Collection) Get(id string) (*Document, error)

Get retrieves a document from the collection by its ID. If the document is not found, it returns an error.

func (*Collection) Search ¶

func (c *Collection) Search(query string, minSimilarity float32) ([]Result, error)

Search performs a similarity search on the collection using the provided query string. It returns the results whose cosine similarity score is greater than or equal to minSimilarity, sorted in descending order of similarity.
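To make the threshold and ordering concrete, here is a small brute-force sketch of a cosine similarity search over in-memory vectors. It is not vdoc's implementation: the scored type, the cosine and search functions, and the map-based store are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// scored pairs a document ID with its similarity to the query.
// It is a hypothetical stand-in for vdoc's Result type.
type scored struct {
	ID         string
	Similarity float32
}

// cosine returns the cosine similarity of a and b.
// It returns 0 for mismatched lengths or zero-magnitude vectors.
func cosine(a, b []float32) float32 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

// search keeps every vector whose similarity to query is at least
// minSimilarity and returns the matches sorted best first.
func search(query []float32, vectors map[string][]float32, minSimilarity float32) []scored {
	var out []scored
	for id, v := range vectors {
		if s := cosine(query, v); s >= minSimilarity {
			out = append(out, scored{ID: id, Similarity: s})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Similarity > out[j].Similarity })
	return out
}

func main() {
	vectors := map[string][]float32{
		"same":       {1, 0},
		"close":      {1, 1},
		"orthogonal": {0, 1},
	}
	for _, r := range search([]float32{1, 0}, vectors, 0.3) {
		fmt.Printf("%s: %.2f\n", r.ID, r.Similarity)
	}
}
```

With a threshold of 0.3, the orthogonal vector (similarity 0) is dropped and the remaining two are listed with the exact match first.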

func (*Collection) Upsert ¶

func (c *Collection) Upsert(documents ...Document) error

Upsert inserts or updates one or more documents in the collection. If a document with the same ID already exists, its text and vectors are replaced; otherwise it is inserted as a new document. The operation is all-or-nothing: if upserting any document fails, the entire operation is rolled back, no changes are made to the collection, and an error is returned.
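The all-or-nothing behavior can be illustrated with a stage-then-commit sketch over an in-memory map. This only models the semantics; vdoc presumably relies on a BoltDB transaction instead. The doc type, upsertAll, and the empty-ID failure condition are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// doc is a local stand-in for vdoc.Document.
type doc struct{ ID, Text string }

// upsertAll applies every document or none. It stages the writes in a copy
// of the store and returns the copy only if all documents were accepted;
// on failure it returns the original store untouched.
func upsertAll(store map[string]string, docs ...doc) (map[string]string, error) {
	staged := make(map[string]string, len(store)+len(docs))
	for id, text := range store {
		staged[id] = text
	}
	for _, d := range docs {
		// Hypothetical failure condition, used here only to trigger a rollback.
		if d.ID == "" {
			return store, errors.New("document has an empty ID")
		}
		staged[d.ID] = d.Text
	}
	return staged, nil
}

func main() {
	store := map[string]string{"doc1": "old text"}

	// The second document is invalid, so the first must not be applied either.
	store, err := upsertAll(store, doc{ID: "doc1", Text: "new text"}, doc{ID: ""})
	fmt.Println(store["doc1"], err)
}
```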

type Database ¶

type Database struct {
	// contains filtered or unexported fields
}

Database is the main entry point for interacting with the document storage system.

func New ¶

func New(path string, options ...Option) (*Database, error)

func (*Database) Close ¶

func (d *Database) Close() error

func (*Database) Collection ¶

func (d *Database) Collection(name string, options ...Option) (*Collection, error)

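Collection returns the Collection with the given name, the main interface for interacting with a specific collection of documents. The returned Collection provides methods for storing, searching, retrieving, and deleting documents. Options passed here customize the collection's settings; without them, the collection inherits the settings of the Database.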

type Document ¶

type Document struct {
	ID   string `json:"id"`
	Text string `json:"text"`
}

type EmbeddingsFunc ¶

type EmbeddingsFunc func(text string) ([]float32, error)

EmbeddingsFunc is a function type that takes a string input and returns a slice of float32 values representing the embeddings, along with an error if the embedding generation fails.
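Any function with this signature can serve as an embedding source, which is handy for tests that should not call a remote API. Below is a toy sketch: newHashEmbeddings is a hypothetical helper, not part of vdoc, and the EmbeddingsFunc type is redeclared locally so the example compiles on its own. It captures word overlap only, not meaning.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"strings"
)

// EmbeddingsFunc mirrors the vdoc type so this sketch is self-contained.
type EmbeddingsFunc func(text string) ([]float32, error)

// newHashEmbeddings returns a toy EmbeddingsFunc that counts lowercased
// words into dims buckets chosen by an FNV-1a hash of each word.
func newHashEmbeddings(dims int) EmbeddingsFunc {
	return func(text string) ([]float32, error) {
		words := strings.Fields(strings.ToLower(text))
		if len(words) == 0 {
			return nil, errors.New("cannot embed empty text")
		}
		vec := make([]float32, dims)
		for _, w := range words {
			h := fnv.New32a()
			h.Write([]byte(w))
			vec[h.Sum32()%uint32(dims)]++
		}
		return vec, nil
	}
}

func main() {
	embed := newHashEmbeddings(8)
	vec, err := embed("Hello hello world")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(vec), vec)
}
```

With the real package, such a function would be passed to vdoc.WithEmbeddings in place of an API-backed one.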

func NewOpenAICompatibleEmbeddingsFunc ¶

func NewOpenAICompatibleEmbeddingsFunc(baseUrl string, apiKey string, model string, prefix string) EmbeddingsFunc

NewOpenAICompatibleEmbeddingsFunc creates an EmbeddingsFunc that uses an OpenAI compatible API to generate embeddings for text.
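For orientation, an OpenAI-compatible embeddings endpoint accepts a POST to the base URL plus /embeddings with a JSON body containing model and input, and returns the vector under data[0].embedding. The sketch below shows that exchange using only the standard library. fetchEmbedding and parseEmbedding are hypothetical helpers, not vdoc's code, and the prefix parameter is left out because the source does not describe what it does.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// embeddingResponse covers the part of an OpenAI-style /embeddings
// response that this sketch reads.
type embeddingResponse struct {
	Data []struct {
		Embedding []float32 `json:"embedding"`
	} `json:"data"`
}

// parseEmbedding extracts the first embedding from a response body.
func parseEmbedding(body []byte) ([]float32, error) {
	var resp embeddingResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if len(resp.Data) == 0 {
		return nil, errors.New("response contains no embeddings")
	}
	return resp.Data[0].Embedding, nil
}

// fetchEmbedding posts text to baseURL+"/embeddings" with a bearer token
// and returns the embedding from the response.
func fetchEmbedding(baseURL, apiKey, model, text string) ([]float32, error) {
	payload, err := json.Marshal(map[string]string{"model": model, "input": text})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/embeddings", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("embeddings request failed: %s", res.Status)
	}
	var buf bytes.Buffer
	if _, err := buf.ReadFrom(res.Body); err != nil {
		return nil, err
	}
	return parseEmbedding(buf.Bytes())
}

func main() {
	// Parse a canned response so the example runs without network access.
	vec, err := parseEmbedding([]byte(`{"data":[{"embedding":[0.25,-0.5]}]}`))
	fmt.Println(vec, err)
}
```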

func NewOpenAIEmbeddings ¶

func NewOpenAIEmbeddings(apiKey string, model string) EmbeddingsFunc

NewOpenAIEmbeddings creates an EmbeddingsFunc that uses the OpenAI API to generate embeddings for text. You must provide your OpenAI API key and the name of the model you want to use for generating embeddings (e.g. "text-embedding-3-small").

type Option ¶

type Option func(*settings)

func WithEmbeddings ¶

func WithEmbeddings(f EmbeddingsFunc) Option

WithEmbeddings sets the same embedding function for both search and store operations.

func WithMaxChunkSize ¶

func WithMaxChunkSize(size int) Option

WithMaxChunkSize sets the maximum chunk size for document processing. This can be used to control how documents are split into smaller pieces for embeddings.

func WithNomicEmbedTextV2Model ¶

func WithNomicEmbedTextV2Model(baseUrl string, apiKey string, model string) Option

WithNomicEmbedTextV2Model is a convenience option for using the Nomic Embed Text V2 model for both search and store embeddings. It takes the base URL, API key, and model name as parameters and sets up the embedding functions accordingly.

func WithPreprocess ¶

func WithPreprocess(f func(string) string) Option

WithPreprocess sets the same preprocessing function for both search and store operations.
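A preprocessing function is any func(string) string. As a minimal sketch, the hypothetical normalizeText below lowercases text and collapses whitespace; whether that helps depends on the embedding model, so treat it as an illustration of the shape rather than a recommendation.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeText has the func(string) string shape that WithPreprocess
// accepts. It lowercases the input and collapses runs of whitespace
// (including newlines and tabs) into single spaces.
func normalizeText(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), " ")
}

func main() {
	fmt.Printf("%q\n", normalizeText("  Hello,\n\tWORLD  ")) // prints "hello, world"
}
```

Passing the same function for both store and search keeps queries and documents normalized identically, which is what WithPreprocess is for.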

func WithSearchEmbeddings ¶

func WithSearchEmbeddings(f EmbeddingsFunc) Option

WithSearchEmbeddings sets the embedding function for search operations.

func WithSearchPreprocess ¶

func WithSearchPreprocess(f func(string) string) Option

WithSearchPreprocess sets the preprocessing function for search operations.

func WithSplitFunc ¶

func WithSplitFunc(f SplitFunc) Option

WithSplitFunc sets the function used to split documents into chunks for embedding generation. This allows for customization of how documents are divided into smaller pieces, which can be important for handling long documents or optimizing embedding quality. For example, when processing code, it may be better to split on syntax elements rather than only by character count and newlines.

func WithStoreEmbeddings ¶

func WithStoreEmbeddings(f EmbeddingsFunc) Option

WithStoreEmbeddings sets the embedding function for store operations.

func WithStorePreprocess ¶

func WithStorePreprocess(f func(string) string) Option

WithStorePreprocess sets the preprocessing function for store operations.

type Result ¶

type Result struct {
	Document   *Document `json:"document"`
	Similarity float32   `json:"similarity"`
}

type SplitFunc ¶

type SplitFunc func(text string, maxChunkSize int) []string

SplitFunc is a function type that takes a string input and a maximum chunk size, and returns a slice of strings representing the split chunks of the input text. This can be used to customize how documents are divided into smaller pieces for embedding generation.
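As an illustration, here is a sketch of a custom splitter with the SplitFunc signature that packs whole words into chunks. splitOnWords is a hypothetical name, and the sketch assumes maxChunkSize is measured in bytes; the source does not say which unit vdoc uses.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOnWords has the same signature as vdoc's SplitFunc. It packs
// whitespace-separated words into chunks of at most maxChunkSize bytes.
// A single word longer than maxChunkSize is emitted as its own oversized
// chunk rather than being cut in the middle.
func splitOnWords(text string, maxChunkSize int) []string {
	var chunks []string
	var current strings.Builder
	for _, word := range strings.Fields(text) {
		// Flush the current chunk if adding " "+word would exceed the limit.
		if current.Len() > 0 && current.Len()+1+len(word) > maxChunkSize {
			chunks = append(chunks, current.String())
			current.Reset()
		}
		if current.Len() > 0 {
			current.WriteByte(' ')
		}
		current.WriteString(word)
	}
	if current.Len() > 0 {
		chunks = append(chunks, current.String())
	}
	return chunks
}

func main() {
	for i, c := range splitOnWords("the quick brown fox jumps over the lazy dog", 15) {
		fmt.Printf("%d: %q\n", i, c)
	}
}
```

With a limit of 15, the sentence splits into "the quick brown", "fox jumps over", and "the lazy dog".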

type Vector ¶

type Vector struct {
	// contains filtered or unexported fields
}

FIXME: Make this private.

func NewVector ¶

func NewVector(vector []float32, documentID string) *Vector

NewVector creates a new Vector instance with the given vector and document ID. The vector is normalized to ensure consistent cosine similarity calculations. Please note that the input vector is modified in place for normalization, so if you need to keep the original vector, make a copy before calling this function.
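The in-place normalization and the suggested defensive copy can be sketched as follows. normalizeInPlace is a hypothetical helper that mirrors the described behavior; how vdoc treats a zero vector is not documented, so leaving it unchanged is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// normalizeInPlace scales v to unit length, overwriting its elements.
// A zero vector is left unchanged to avoid dividing by zero.
func normalizeInPlace(v []float32) {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	if sum == 0 {
		return
	}
	norm := float32(math.Sqrt(sum))
	for i := range v {
		v[i] /= norm
	}
}

func main() {
	original := []float32{3, 4}

	// Copy first so the caller's slice survives normalization.
	work := make([]float32, len(original))
	copy(work, original)
	normalizeInPlace(work)

	fmt.Println(original) // still the raw values
	fmt.Println(work)     // unit-length copy
}
```

Once vectors have unit length, cosine similarity reduces to a plain dot product, which is presumably why the library normalizes on construction.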
