knowledge

command module
v0.1.8 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 3, 2024 License: Apache-2.0 Imports: 2 Imported by: 0

README

Knowledge API

Standalone Knowledge Tool to be used with GPTScript and GPTStudio

Build

Requires Go 1.22+

make build

Run

The knowledge tool can run in two modes: server and client, where client can be standalone or referring to a remote server.

You can find a full gptscript-generated documentation in the CLI documentation.

Client - Standalone
knowledge create-dataset foobar
knowledge ingest -d foobar README.md
knowledge retrieve -d foobar "Which filetypes are supported?"
knowledge delete-dataset foobar
Server & Client - Server Mode
knowledge server
export KNOW_SERVER_URL=http://localhost:8000/v1
knowledge create-dataset foobar
knowledge ingest -d foobar README.md
knowledge retrieve -d foobar "Which filetypes are supported?"
knowledge delete-dataset foobar

Supported File Types

  • .pdf
  • .html
  • .md
  • .txt
  • .docx
  • .odt
  • .rtf
  • .csv
  • .ipynb
  • .json

OpenAPI / Swagger

The API is documented using OpenAPI 2.0 (Swagger), automatically generated using swaggo/swag (make openapi).

GPTScript Examples

Note: The examples in the examples/ directory expect the knowledge binary to be in your $PATH.

Run
gptscript examples/client.gpt

Architecture & Components

The knowledge tool is composed of the following components, which are all run from the same executable:

  • knowledge client
    • can run in two modes:
      • standalone (client/standalone): manages its own vector and knowledge database locally
      • server/remote (client/default): interacts with a knowledge server over the network
  • knowledge server (server)
    • lets you run a REST API server that interacts with the below databases, so the client is stateless and sends/receives data over the network
  • datastore (datastore)
    • responsible for handling data ingestion and retrieval
      • ingestion includes
        • loading documents (extracting text)
        • splitting text into chunks
        • pre-processing (e.g. metadata extraction, content enrichment)
        • requesting embeddings (part of the vectorstore implementation)
        • storing embeddings and metadata in the vector database
        • registering the document in the knowledge database (index)
      • retrieval includes
        • query embedding
        • querying the vector database for embeddings - similarity search
        • mapping the retrieved embeddings to document contents
        • (optional) post-processing (e.g. filtering, sorting, summarization)
        • returning document contents alongside their similarity scores
    • consists of two databases:
      • vector database (vectorstore)
        • Current choice: chromem-go
        • used for storing and retrieving embeddings alongside the document contents
        • the implementation is responsible for
          • requesting the embeddings from a model (e.g. OpenAI's text-embeddings-ada-002)
          • storing the embeddings together with metadata and document contents
          • doing similarity searches to retrieve embeddings
          • returning document contents alongside their similarity scores
      • knowledge database (index
        • Current choice: sqlite3
        • used for
          • indexing knowledge bases (datasets): dataset <(1:n)> files <(1:n)> documents
            • this is useful for deleting specific documents or files from a dataset and to get quick overviews over datasets without having to query the vector database (which holds this information in the metadata)
          • storing knowledge base metadata and e.g. attached ingestion flows

Documentation

The Go Gopher

There is no documentation for this package.

Directories

Path Synopsis
pkg
cmd
datastore/postprocessors
Package postprocessors is basically the same as package transformers, but used at a different stage of the RAG pipeline
Package postprocessors is basically the same as package transformers, but used at a different stage of the RAG pipeline
docs
Package docs Code generated by swaggo/swag.
Package docs Code generated by swaggo/swag.
env
llm

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL