vecdb

command module
v0.0.0-...-194320a Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 8, 2024 License: MIT Imports: 10 Imported by: 0

README

VecDB

a very simple vector embedding database, you can say that it is a hash-table that let you find items similar to the item you're searching for.

Why!

I'm a databases enthusiast, and this is a for fun and learning project that could be used in production ;).

P.S: I like to re-invent the wheel in my free time, because it is my free time!

Data Model

I'm using the {key => value} model,

  • key should be a unique value that represents the item.
  • value should be the vector itself (List of Floats).

Configurations

by default vecdb searches for config.yml in the current working directory. but you can override it using the --config /path/to/config.yml flag by providing your own custom file path.

# http server related configs
server:
  # the address to listen on in the form of '[host]:port'
  listen: "0.0.0.0:3000"

# storage related configs
store:
  # the driver you want to use
  # currently vecdb supports "bolt" which is based on boltdb the in process embedded the database
  driver: "bolt"
  # the arguments required by the driver
  # for bolt, it requires a key called `database` points to the path you want to store the data in.
  args:
    database: "./vec.db"

# embeddings related configs
embedder:
  # whether to enable the embedder and all endpoints using it or not
  enabled: true
  # the driver you want to use, currently vecdb supports gemini
  driver: gemini
  # the arguments required by the driver
  # currently gemini driver requires `api_key` and `text_embedding_model`
  args:
    # by default vecdb will replace anything between ${..} with the actual value from the ENV var
    api_key: "${GEMINI_API_KEY}"
    text_embedding_model: "text-embedding-004"

Components

  • Raw Vectors Layer (low-level)
    • send VectorWriteRequest to POST /v1/vectors/write when you have a vector and want to store it somewhere.
    • send VectorSearchRequest to POST /v1/vectors/search when you have a vector and want to list all similar vectors' keys/ids ordered by cosine similarity in descending order.
  • Embedding Layer (optional)
    • send TextEmbeddingWriteRequest to POST /v1/embeddings/text/write when you have a text and want vecdb to build and store the vector for you using the configured embedder (gemini for now).
    • send TextEmbeddingSearchRequest to POST /v1/embeddings/text/search when you have a text and want vecdb to build a vector and search for similar vectors' keys for you ordered by cosine similarity in descending order.

Requests

VectorWriteRequest
{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "key": "product-id-1", // should be unique and represents a valid value in your main data store (example: the row id in your mysql/postgres ... etc)
  "vector": [1.929292, 0.3848484, -1.9383838383, ... ] // the vector you want to store 
}
VectorSearchRequest
{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "vector": [1.929292, 0.3848484, -1.9383838383, ... ], // you will get a list ordered by cosine-similarity in descending order
  "min_cosine_similarity": 0.0, // the more you increase, the fewer data you will get
  "max_result_count": 10 // max vectors to return (vecdb will first order by cosine similarity then apply the limit)
}
TextEmbeddingWriteRequest

if you set embedder.enabled to true.

{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "key": "product-id-1", // should be unique and represents a valid value in your main data store (example: the row id in your mysql/postgres ... etc)
  "content": "This is some text representing the product" // this will be converted to a vector using the configured embedder 
}
TextEmbeddingSearchRequest

if you set embedder.enabled to true.

{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "content": "A Product Text", // you will get a list ordered by cosine-similarity in descending order
  "min_cosine_similarity": 0.0, // the more you increase, the fewer data you will get
  "max_result_count": 10 // max vectors to return (vecdb will first order by cosine similarity then apply the limit)
}

Download/Install

Documentation

The Go Gopher

There is no documentation for this package.

Directories

Path Synopsis
internals

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL