Documentation ¶
Index ¶
- Constants
- Variables
- type Collection
- type Database
- type Document
- type EmbeddingsFunc
- type Option
- func WithEmbeddings(f EmbeddingsFunc) Option
- func WithMaxChunkSize(size int) Option
- func WithNomicEmbedTextV2Model(baseUrl string, apiKey string, model string) Option
- func WithPreprocess(f func(string) string) Option
- func WithSearchEmbeddings(f EmbeddingsFunc) Option
- func WithSearchPreprocess(f func(string) string) Option
- func WithSplitFunc(f SplitFunc) Option
- func WithStoreEmbeddings(f EmbeddingsFunc) Option
- func WithStorePreprocess(f func(string) string) Option
- type Result
- type SplitFunc
- type Vector
Constants ¶
const DefaultOpenAIBaseURL = "https://api.openai.com/v1"
DefaultOpenAIBaseURL is the default base URL for the OpenAI API. It can be used as a default value when creating an OpenAI embeddings function, but you can specify a different URL if needed (e.g. for enterprise or self-hosted deployments).
const DefaultOpenAiModel = "text-embedding-3-small"
DefaultOpenAiModel is the default model name for generating embeddings using the OpenAI API. It can be used as a default value when creating an OpenAI embeddings function, but you can specify a different model if needed (e.g. if you want to use a newer or more powerful model for better embeddings).
Variables ¶
var ErrDocumentNotFound = errors.New("document id not found")
ErrDocumentNotFound is returned when a document with the specified ID is not found in the collection. This can happen when trying to retrieve a document that does not exist.
var ErrNilCollection = errors.New("collection is nil or uninitialized")
ErrNilCollection is returned when a method is called on a nil or uninitialized Collection, for example one that was not created with the Database.Collection() method or that was set to nil after creation. It indicates a programming mistake: always obtain a Collection from Database.Collection() and do not set it to nil.
var ErrNilDatabase = errors.New("database is nil or uninitialized")
ErrNilDatabase is returned when a method is called on a nil or uninitialized Database, for example one that was not created with the New() function or that was set to nil after creation. It indicates a programming mistake: always obtain a Database from New() and do not set it to nil.
Functions ¶
This section is empty.
Types ¶
type Collection ¶
type Collection struct {
// contains filtered or unexported fields
}
Collection represents a collection of documents within the database. It provides methods for storing, searching, retrieving, and deleting documents within the collection. Each collection has its own settings for embedding generation and preprocessing, which can be customized using the provided Option functions. If no custom settings are provided when creating a collection, it inherits the settings from the parent Database.
func (*Collection) Delete ¶
func (c *Collection) Delete(id string) error
Delete removes a document from the collection by its ID. It also deletes all associated vectors for the document. If the document is not found, it returns an error.
func (*Collection) Get ¶
func (c *Collection) Get(id string) (*Document, error)
Get retrieves a document from the collection by its ID. If the document is not found, it returns an error.
func (*Collection) Search ¶
func (c *Collection) Search(query string, minSimilarity float32) ([]Result, error)
Search performs a similarity search on the collection using the provided query string. It returns the results whose cosine similarity score is greater than or equal to the specified minimum similarity threshold, sorted in descending order of similarity.
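The threshold-and-sort behaviour described for Search can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the scored type and the search helper are hypothetical, and the vectors are assumed to be unit length already, so the dot product equals the cosine similarity.

```go
package main

import (
	"fmt"
	"sort"
)

// scored is a hypothetical stand-in for a search hit; the real Result type
// may look different.
type scored struct {
	ID         string
	Similarity float32
}

// dot returns the dot product of a and b, which must have the same length.
// For unit-length vectors this equals their cosine similarity.
func dot(a, b []float32) float32 {
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// search keeps every vector whose similarity to query is at least
// minSimilarity and sorts the hits from most to least similar.
func search(query []float32, vectors map[string][]float32, minSimilarity float32) []scored {
	var hits []scored
	for id, v := range vectors {
		if s := dot(query, v); s >= minSimilarity {
			hits = append(hits, scored{ID: id, Similarity: s})
		}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Similarity > hits[j].Similarity })
	return hits
}

func main() {
	vectors := map[string][]float32{
		"x-axis":   {1, 0},
		"y-axis":   {0, 1},
		"diagonal": {0.6, 0.8},
	}
	for _, h := range search([]float32{1, 0}, vectors, 0.5) {
		fmt.Println(h.ID, h.Similarity)
	}
}
```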
func (*Collection) Upsert ¶
func (c *Collection) Upsert(documents ...Document) error
Upsert inserts or updates one or more documents in the collection. If a document with the same ID already exists, it is updated with the new text and vectors; otherwise it is inserted as a new document. The operation is all-or-nothing: if an error occurs while upserting any document, the entire operation is rolled back, no changes are made to the collection, and the error is returned.
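The all-or-nothing guarantee can be pictured with a small in-memory sketch. Everything here is an assumption made for illustration: the doc type, the upsertAll helper, and the staging-on-a-copy approach are hypothetical, and the package may achieve the same effect differently (for example with a database transaction).

```go
package main

import (
	"errors"
	"fmt"
)

// doc is a hypothetical, simplified document; the real Document type may differ.
type doc struct {
	ID   string
	Text string
}

// upsertAll stages every document in a copy of the store and returns the copy
// only if all documents are valid. On error the original store is returned
// untouched, so a failed call changes nothing.
func upsertAll(store map[string]string, docs ...doc) (map[string]string, error) {
	staged := make(map[string]string, len(store)+len(docs))
	for id, text := range store {
		staged[id] = text
	}
	for _, d := range docs {
		if d.ID == "" {
			return store, errors.New("document has an empty ID")
		}
		staged[d.ID] = d.Text // insert or overwrite
	}
	return staged, nil
}

func main() {
	store := map[string]string{"a": "old text"}
	// The second document is invalid, so the update to "a" is discarded too.
	store, err := upsertAll(store, doc{ID: "a", Text: "new text"}, doc{ID: "", Text: "rejected"})
	fmt.Println(store["a"], err)
}
```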
type Database ¶
type Database struct {
// contains filtered or unexported fields
}
Database is the main entry point for interacting with the document storage system.
func (*Database) Collection ¶
func (d *Database) Collection(name string, options ...Option) (*Collection, error)
Collection returns the Collection with the given name, which is the main interface for storing, searching, retrieving, and deleting documents within that collection. Options passed here customize the collection's settings; settings that are not overridden are inherited from the Database.
type EmbeddingsFunc ¶
EmbeddingsFunc is a function type that takes a string input and returns a slice of float32 values representing the embeddings, along with an error if the embedding generation fails.
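A custom embedding function can be handy for offline tests. The sketch below assumes, based on the description above, that EmbeddingsFunc has the shape func(string) ([]float32, error); the type is redeclared locally, and newHashEmbeddings is a hypothetical toy that hashes words into buckets. It is deterministic but captures no real semantics.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"strings"
)

// EmbeddingsFunc is a local stand-in whose shape is assumed from the
// documentation: text in, float32 vector and error out.
type EmbeddingsFunc func(text string) ([]float32, error)

// newHashEmbeddings returns a toy EmbeddingsFunc that counts words into dims
// buckets chosen by an FNV-1a hash. It is only useful for tests and demos.
func newHashEmbeddings(dims int) EmbeddingsFunc {
	return func(text string) ([]float32, error) {
		if dims <= 0 {
			return nil, errors.New("dims must be positive")
		}
		words := strings.Fields(strings.ToLower(text))
		if len(words) == 0 {
			return nil, errors.New("cannot embed empty text")
		}
		vec := make([]float32, dims)
		for _, w := range words {
			h := fnv.New32a()
			h.Write([]byte(w))
			vec[h.Sum32()%uint32(dims)]++
		}
		return vec, nil
	}
}

func main() {
	embed := newHashEmbeddings(8)
	vec, err := embed("the quick brown fox")
	if err != nil {
		fmt.Println("embed failed:", err)
		return
	}
	fmt.Println(len(vec), vec)
}
```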
func NewOpenAICompatibleEmbeddingsFunc ¶
func NewOpenAICompatibleEmbeddingsFunc(baseUrl string, apiKey string, model string, prefix string) EmbeddingsFunc
NewOpenAICompatibleEmbeddingsFunc creates an EmbeddingsFunc that uses an OpenAI compatible API to generate embeddings for text.
func NewOpenAIEmbeddings ¶
func NewOpenAIEmbeddings(apiKey string, model string) EmbeddingsFunc
NewOpenAIEmbeddings creates an EmbeddingsFunc that uses the OpenAI API to generate embeddings for text. You must provide your OpenAI API key and the name of the model you want to use for generating embeddings (e.g. "text-embedding-3-small").
type Option ¶
type Option func(*settings)
func WithEmbeddings ¶
func WithEmbeddings(f EmbeddingsFunc) Option
WithEmbeddings sets the same embedding function for both search and store operations.
func WithMaxChunkSize ¶
func WithMaxChunkSize(size int) Option
WithMaxChunkSize sets the maximum chunk size for document processing. This can be used to control how documents are split into smaller pieces for embeddings.
func WithNomicEmbedTextV2Model ¶
func WithNomicEmbedTextV2Model(baseUrl string, apiKey string, model string) Option
WithNomicEmbedTextV2Model is a convenience option for using the Nomic Embed Text V2 model for both search and store embeddings. It takes the base URL, API key, and model name as parameters and sets up the embedding functions accordingly.
func WithPreprocess ¶
func WithPreprocess(f func(string) string) Option
WithPreprocess sets the same preprocessing function for both search and store operations.
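A preprocessing function has the signature func(string) string. The sketch below shows one hypothetical choice, normalize, that lowercases text and collapses whitespace; whether this helps depends on the embedding model, so treat it as an example of the shape rather than a recommendation.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize lowercases text and collapses every run of whitespace
// (spaces, tabs, newlines) into a single space.
func normalize(text string) string {
	return strings.Join(strings.Fields(strings.ToLower(text)), " ")
}

func main() {
	fmt.Printf("%q\n", normalize("  Hello\n\tWORLD  "))
}
```

A function like this could be passed as WithPreprocess(normalize) so that stored text and search queries are cleaned the same way.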
func WithSearchEmbeddings ¶
func WithSearchEmbeddings(f EmbeddingsFunc) Option
WithSearchEmbeddings sets the embedding function for search operations.
func WithSearchPreprocess ¶
func WithSearchPreprocess(f func(string) string) Option
WithSearchPreprocess sets the preprocessing function for search operations.
func WithSplitFunc ¶
func WithSplitFunc(f SplitFunc) Option
WithSplitFunc sets the function used to split documents into chunks for embedding generation. This allows you to customize how documents are divided into smaller pieces, which can be important for handling long documents or optimizing embedding quality. For example, when processing code, it may be beneficial to split on syntax elements rather than only by character count and newlines.
func WithStoreEmbeddings ¶
func WithStoreEmbeddings(f EmbeddingsFunc) Option
WithStoreEmbeddings sets the embedding function for store operations.
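One possible reason to use different functions for search and store is a model that expects a different text prefix for queries than for documents. The sketch below illustrates that idea only: EmbeddingsFunc is redeclared locally with an assumed shape, withPrefix is a hypothetical helper, and the prefix strings are placeholders, so check your model's documentation for the real values.

```go
package main

import "fmt"

// EmbeddingsFunc is a local stand-in whose shape is assumed from the
// documentation: text in, float32 vector and error out.
type EmbeddingsFunc func(text string) ([]float32, error)

// withPrefix returns an EmbeddingsFunc that prepends prefix to the text
// before delegating to next.
func withPrefix(prefix string, next EmbeddingsFunc) EmbeddingsFunc {
	return func(text string) ([]float32, error) {
		return next(prefix + text)
	}
}

func main() {
	// base is a fake embedder that only reports what it was asked to embed.
	base := func(text string) ([]float32, error) {
		fmt.Printf("embedding %q\n", text)
		return []float32{float32(len(text))}, nil
	}

	// Placeholder prefixes; the correct values depend on the model.
	search := withPrefix("search_query: ", base)
	store := withPrefix("search_document: ", base)

	search("red shoes")
	store("A catalog of red shoes.")
}
```

The two wrapped functions could then be supplied through WithSearchEmbeddings and WithStoreEmbeddings respectively.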
func WithStorePreprocess ¶
func WithStorePreprocess(f func(string) string) Option
WithStorePreprocess sets the preprocessing function for store operations.
type SplitFunc ¶
SplitFunc is a function type that takes a string input and a maximum chunk size, and returns a slice of strings representing the split chunks of the input text. This can be used to customize how documents are divided into smaller pieces for embedding generation.
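A custom split function might look like the sketch below, which assumes from the description above that SplitFunc has the shape func(string, int) []string. The type is redeclared locally, and splitParagraphs is a hypothetical example that packs paragraphs greedily into chunks; it measures size in bytes, which may not match how the package counts.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitFunc is a local stand-in whose shape is assumed from the documentation.
type SplitFunc func(text string, maxChunkSize int) []string

// splitParagraphs packs blank-line-separated paragraphs into chunks of at
// most maxChunkSize bytes. A paragraph longer than the limit is cut at the
// limit, which may land inside a multi-byte character. It returns nil for a
// non-positive limit.
func splitParagraphs(text string, maxChunkSize int) []string {
	if maxChunkSize <= 0 {
		return nil
	}
	var chunks []string
	current := ""
	flush := func() {
		if current != "" {
			chunks = append(chunks, current)
			current = ""
		}
	}
	for _, p := range strings.Split(text, "\n\n") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		// Cut oversized paragraphs into limit-sized pieces.
		for len(p) > maxChunkSize {
			flush()
			chunks = append(chunks, p[:maxChunkSize])
			p = p[maxChunkSize:]
		}
		// Start a new chunk if this paragraph would not fit.
		if current != "" && len(current)+2+len(p) > maxChunkSize {
			flush()
		}
		if current == "" {
			current = p
		} else {
			current += "\n\n" + p
		}
	}
	flush()
	return chunks
}

// Compile-time check that the sketch matches the assumed type.
var _ SplitFunc = splitParagraphs

func main() {
	for i, c := range splitParagraphs("aaa\n\nbbb\n\ncccccccccc", 8) {
		fmt.Printf("%d: %q\n", i, c)
	}
}
```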
type Vector ¶
type Vector struct {
// contains filtered or unexported fields
}
FIXME: Make this private.
func NewVector ¶
NewVector creates a new Vector instance with the given vector and document ID. The vector is normalized to ensure consistent cosine similarity calculations. Please note that the input vector is modified in place for normalization, so if you need to keep the original vector, make a copy before calling this function.