rag

package
v0.249.1 Latest
Warning

This package is not in the latest version of its module.
Published: Apr 22, 2026 License: MIT Imports: 15 Imported by: 0

Documentation

Overview

Package rag provides the RemembrancesService that bundles KB, Events, and Code indexing.

Package rag provides RAG (Retrieval-Augmented Generation) storage and search on top of the existing Pando SQLite database.

It supports:

  • Chunked document storage with JSON metadata
  • Vector similarity search (exact top-k, cosine similarity) computed in Go over float32 embeddings stored in SQLite
  • Full-text search (BM25) via SQLite FTS5
  • Hybrid search combining both with Reciprocal Rank Fusion

Prerequisites:

  • Open the database with internal/db.Connect(), which also loads the sqlite-vec extension by calling sqlite_vec.Auto(). Store itself computes vector similarity in Go and does not depend on the extension.
  • Call Store.Init after obtaining a *sql.DB to validate and persist the embedding dimension.

Index

Constants

const DefaultEmbeddingDim = 1536

DefaultEmbeddingDim is the default vector dimension. It matches the output size of OpenAI text-embedding-ada-002 and text-embedding-3-small.

Variables

This section is empty.

Functions

This section is empty.

Types

type Chunk

type Chunk struct {
	// ID is the database row ID (set after insert, zero for new chunks).
	ID int64
	// Collection groups related chunks together (e.g. project path, session ID).
	Collection string
	// Source identifies the origin document (file path, URL, …).
	Source string
	// Content is the raw text of this chunk.
	Content string
	// ChunkIndex is the zero-based position of this chunk inside its source document.
	ChunkIndex int
	// Metadata is an arbitrary JSON string stored alongside the chunk.
	Metadata  string
	CreatedAt time.Time
	UpdatedAt time.Time
}

Chunk is a unit of text stored in the RAG system.
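
The Metadata field is a JSON string, so callers normally encode a map or struct before inserting. The following is a minimal sketch of that step. The Chunk type here is a local copy of the caller-supplied fields so the example compiles on its own, and newChunk is a hypothetical helper, not part of this package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Chunk mirrors the caller-supplied fields of rag.Chunk documented above.
// It is redeclared locally so this sketch is self-contained.
type Chunk struct {
	Collection string
	Source     string
	Content    string
	ChunkIndex int
	Metadata   string
}

// newChunk (hypothetical helper) builds a chunk whose Metadata field holds
// meta encoded as a JSON string.
func newChunk(collection, source, content string, index int, meta map[string]interface{}) (Chunk, error) {
	raw, err := json.Marshal(meta)
	if err != nil {
		return Chunk{}, fmt.Errorf("encode metadata: %w", err)
	}
	return Chunk{
		Collection: collection,
		Source:     source,
		Content:    content,
		ChunkIndex: index,
		Metadata:   string(raw),
	}, nil
}

func main() {
	c, err := newChunk("docs", "README.md", "Chunks are stored in SQLite.", 0,
		map[string]interface{}{"lang": "en", "page": 1})
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Metadata)
}
```

ID, CreatedAt, and UpdatedAt are left out because the documentation describes ID as set after insert; the timestamps are assumed to be store-managed as well.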

type ContextEnricher added in v0.236.1

type ContextEnricher struct {
	// contains filtered or unexported fields
}

ContextEnricher performs pre-prompt KB, events and code searches and formats the results as context to be prepended to the user's message.

func NewContextEnricher added in v0.236.1

func NewContextEnricher(svc *RemembrancesService, kbResults, codeResults int, codeProject string, eventsResults int, eventsSubject string, eventsLastDays int) *ContextEnricher

NewContextEnricher creates a ContextEnricher from the given RemembrancesService and config values. Returns nil when the service is nil.

func (*ContextEnricher) EnrichContext added in v0.236.1

func (e *ContextEnricher) EnrichContext(ctx context.Context, query string) string

EnrichContext searches the KB, events, and code index using the user's query and returns a formatted context block ready to be prepended to the prompt. It returns an empty string when no relevant results are found.

type HybridSearchOptions added in v0.210.0

type HybridSearchOptions struct {
	Query           string
	Limit           int
	ProjectIDs      []string
	IncludeKB       bool
	IncludeSessions bool
	IncludeCode     bool
}

type HybridSearchResult added in v0.210.0

type HybridSearchResult struct {
	Source     string                 `json:"source"`
	Title      string                 `json:"title,omitempty"`
	Path       string                 `json:"path,omitempty"`
	Content    string                 `json:"content"`
	Score      float64                `json:"score"`
	Rank       int                    `json:"rank"`
	Metadata   map[string]interface{} `json:"metadata,omitempty"`
	SessionID  string                 `json:"session_id,omitempty"`
	ProjectID  string                 `json:"project_id,omitempty"`
	FilePath   string                 `json:"file_path,omitempty"`
	SymbolName string                 `json:"symbol_name,omitempty"`
}

type RemembrancesService added in v0.7.0

type RemembrancesService struct {
	KB     *kb.KBStore
	Events *events.EventStore
	Code   *code.CodeIndexer
	// contains filtered or unexported fields
}

RemembrancesService groups KB, Events, and Code indexing stores. All components share the same SQLite database connection and use provider-configured embedders.

func NewRemembrancesService added in v0.7.0

func NewRemembrancesService(db *sql.DB, cfg *config.RemembrancesConfig) (*RemembrancesService, error)

NewRemembrancesService creates a RemembrancesService from the app configuration and an existing SQLite connection. It returns (nil, nil) when remembrances is disabled, so callers must check for a nil service before use.

func (*RemembrancesService) HybridSearch added in v0.210.0

type SearchOptions

type SearchOptions struct {
	// Collection restricts results to a single collection. Empty means all collections.
	Collection string
	// TopK is the maximum number of results to return. Defaults to 5.
	TopK int
	// MinScore excludes results with Score below this value (0 means no filter).
	MinScore float64
}

SearchOptions controls search behaviour.

type SearchResult

type SearchResult struct {
	Chunk
	// Distance is the L2 (or cosine) distance from the query vector (lower = more similar).
	// Populated only by SearchVector and SearchHybrid.
	Distance float64
	// Score is a normalised relevance score in [0, 1] where higher is better.
	// For vector search:  1 / (1 + Distance).
	// For FTS search:     normalised BM25 (−bm25 / max(−bm25) across the result set).
	// For hybrid search:  Reciprocal Rank Fusion score.
	Score float64
	// Rank is the 1-based position in the result list.
	Rank int
}

SearchResult is a ranked entry returned from vector, FTS, or hybrid search.
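
The two per-search Score formulas in the field comment can be sketched as below. This is an illustration of the documented arithmetic only, not the package's code. vectorScore and normaliseBM25 are hypothetical names, and the sketch assumes raw BM25 values follow the SQLite FTS5 convention in which better matches are more negative.

```go
package main

import "fmt"

// vectorScore maps a non-negative distance to (0, 1] using 1 / (1 + distance).
func vectorScore(distance float64) float64 {
	return 1 / (1 + distance)
}

// normaliseBM25 divides each negated raw value by the largest negated value
// in the result set, so the best match scores 1.
func normaliseBM25(raw []float64) []float64 {
	out := make([]float64, len(raw))
	best := 0.0
	for _, r := range raw {
		if -r > best {
			best = -r
		}
	}
	if best == 0 {
		return out // nothing to scale by; all scores stay 0
	}
	for i, r := range raw {
		out[i] = -r / best
	}
	return out
}

func main() {
	fmt.Println(vectorScore(0), vectorScore(1))
	fmt.Println(normaliseBM25([]float64{-4, -2, -1}))
}
```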

type Store

type Store struct {
	// contains filtered or unexported fields
}

Store manages RAG chunks, embeddings, and full-text search on a SQLite database.

Vector embeddings are stored as little-endian float32 BLOBs in the rag_chunks table and similarity search is performed in Go – no SQLite extension is required. Full-text search uses SQLite's built-in FTS5 engine.

The Store does NOT own the *sql.DB – callers are responsible for opening and closing the database via internal/db.Connect().

Usage:

conn, err := db.Connect() // conn avoids shadowing the db package
if err != nil { … }
store := rag.New(conn, rag.StoreOptions{EmbeddingDim: 1536})
if err := store.Init(ctx); err != nil { … }
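
The Store documentation says embeddings are kept as little-endian float32 BLOBs. A minimal sketch of that byte layout follows. encodeEmbedding and decodeEmbedding are hypothetical names, and the package's own serialisation code may differ in detail.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeEmbedding packs each float32 as 4 little-endian bytes.
func encodeEmbedding(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

// decodeEmbedding reverses encodeEmbedding and rejects truncated blobs.
func decodeEmbedding(b []byte) ([]float32, error) {
	if len(b)%4 != 0 {
		return nil, fmt.Errorf("blob length %d is not a multiple of 4", len(b))
	}
	v := make([]float32, len(b)/4)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return v, nil
}

func main() {
	blob := encodeEmbedding([]float32{1, -0.5, 0.25})
	vec, err := decodeEmbedding(blob)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(blob), vec)
}
```

With this layout a 1536-dimension vector occupies 6144 bytes per chunk.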

func New

func New(db *sql.DB, opts StoreOptions) *Store

New creates a Store backed by db. Call Init before any other method.

func (*Store) CountChunks

func (s *Store) CountChunks(ctx context.Context, collection string) (int64, error)

CountChunks returns the number of chunks in the given collection, or in all collections when collection is empty.

func (*Store) DeleteChunk

func (s *Store) DeleteChunk(ctx context.Context, id int64) error

DeleteChunk removes a chunk and its FTS entry. It is a no-op when the chunk does not exist.

func (*Store) DeleteCollection

func (s *Store) DeleteCollection(ctx context.Context, collection string) error

DeleteCollection removes all chunks (and their FTS entries) that belong to the given collection.

func (*Store) Dim

func (s *Store) Dim() int

Dim returns the configured embedding dimension.

func (*Store) GetChunk

func (s *Store) GetChunk(ctx context.Context, id int64) (*Chunk, error)

GetChunk retrieves a chunk by ID. Returns (nil, nil) when not found.

func (*Store) GetChunkEmbedding

func (s *Store) GetChunkEmbedding(ctx context.Context, id int64) ([]float32, error)

GetChunkEmbedding retrieves the stored embedding for a chunk. Returns (nil, nil) when the chunk has no embedding.

func (*Store) Init

func (s *Store) Init(ctx context.Context) error

Init validates the configured embedding dimension against the stored one and persists it on first use. It is idempotent.

func (*Store) InsertChunk

func (s *Store) InsertChunk(ctx context.Context, chunk Chunk, embedding []float32) (int64, error)

InsertChunk inserts a new chunk and optionally its embedding vector.

Pass a nil or empty embedding to store a text-only chunk (participates in FTS search but not in vector search).

Returns the auto-assigned chunk ID.

func (*Store) ListChunks

func (s *Store) ListChunks(ctx context.Context, collection string, limit, offset int) ([]Chunk, error)

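ListChunks returns one page of chunks for the given collection, or for all collections when collection is empty. limit is the page size and offset is the number of chunks to skip.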

func (*Store) RebuildFTS

func (s *Store) RebuildFTS(ctx context.Context) error

RebuildFTS rebuilds the FTS5 index from the rag_chunks content table. Use this to recover from index corruption or after bulk inserts that bypassed the normal InsertChunk / UpdateChunk path.

func (*Store) SearchFTS

func (s *Store) SearchFTS(ctx context.Context, query string, opts SearchOptions) ([]SearchResult, error)

SearchFTS performs a full-text search using SQLite FTS5 (BM25 ranking).

query follows FTS5 syntax: plain terms, phrase queries ("…"), prefix queries (term*), column filters (content:term), boolean operators (AND / OR / NOT).

Results are ordered by descending relevance with scores normalised to [0, 1].
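
Because query is interpreted as FTS5 syntax, raw user input containing quotes or operator words can change the meaning of a search or fail to parse. One common precaution, sketched here, is to wrap each user-supplied term in an FTS5 string, where an embedded double quote is escaped by doubling it. quotePhrase and allTerms are hypothetical helpers, not part of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// quotePhrase turns s into an FTS5 string literal: it is wrapped in double
// quotes and every embedded double quote is doubled.
func quotePhrase(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// allTerms builds a query that requires every term, quoting each one so
// words such as AND, OR, or NOT inside a term are treated as text.
func allTerms(terms ...string) string {
	quoted := make([]string, len(terms))
	for i, t := range terms {
		quoted[i] = quotePhrase(t)
	}
	return strings.Join(quoted, " AND ")
}

func main() {
	fmt.Println(quotePhrase(`say "hi"`))
	fmt.Println(allTerms("vector", "full text"))
}
```

The resulting string can be passed as the query argument of SearchFTS or SearchHybrid.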

func (*Store) SearchHybrid

func (s *Store) SearchHybrid(ctx context.Context, query string, embedding []float32, opts SearchOptions) ([]SearchResult, error)

SearchHybrid combines vector and full-text search using Reciprocal Rank Fusion (RRF). Both searches run concurrently.

Pass a nil embedding to skip vector search (equivalent to SearchFTS).

RRF formula: score(d) = Σ 1 / (rrfK + rank(d, list)) where rrfK = 60.
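
The fusion step can be sketched as follows. This is an illustration of the formula above with rrfK = 60, not the package's implementation. fuseRRF and fused are hypothetical names, each input list is assumed to hold chunk IDs ordered best first with no duplicates, and the tie-break by ID is an added assumption for determinism.

```go
package main

import (
	"fmt"
	"sort"
)

const rrfK = 60

// fused pairs a chunk ID with its Reciprocal Rank Fusion score.
type fused struct {
	ID    int64
	Score float64
}

// fuseRRF sums 1 / (rrfK + rank) over every list an ID appears in,
// where rank is the 1-based position within that list.
func fuseRRF(lists ...[]int64) []fused {
	scores := make(map[int64]float64)
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1 / float64(rrfK+i+1)
		}
	}
	out := make([]fused, 0, len(scores))
	for id, s := range scores {
		out = append(out, fused{ID: id, Score: s})
	}
	sort.Slice(out, func(a, b int) bool {
		if out[a].Score != out[b].Score {
			return out[a].Score > out[b].Score
		}
		return out[a].ID < out[b].ID // deterministic order for equal scores
	})
	return out
}

func main() {
	vectorHits := []int64{1, 2, 3}
	ftsHits := []int64{2, 4}
	for _, f := range fuseRRF(vectorHits, ftsHits) {
		fmt.Printf("chunk %d score %.5f\n", f.ID, f.Score)
	}
}
```

Chunk 2 ranks first here because it appears in both lists (1/62 + 1/61), which is the behaviour RRF is designed to reward.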

func (*Store) SearchVector

func (s *Store) SearchVector(ctx context.Context, embedding []float32, opts SearchOptions) ([]SearchResult, error)

SearchVector performs an exact k-NN search by loading all embeddings from the collection and computing cosine similarity in Go.

This approach requires no SQLite extension and is suitable for typical RAG workloads (up to ~100k chunks). Chunks without embeddings are skipped.

Results are ordered by descending cosine similarity (most similar first).
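
A brute-force search of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions, not the package's implementation: cosine, topK, and scored are hypothetical names, embeddings are passed in as an in-memory map rather than loaded from SQLite, and vectors whose length differs from the query are skipped.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// scored pairs a chunk ID with its cosine similarity to the query.
type scored struct {
	ID         int64
	Similarity float64
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 when either vector has zero magnitude.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK scores every embedding against query and keeps the k most similar.
// A non-positive k returns all results.
func topK(query []float32, embeddings map[int64][]float32, k int) []scored {
	out := make([]scored, 0, len(embeddings))
	for id, e := range embeddings {
		if len(e) != len(query) {
			continue // missing or mismatched embedding
		}
		out = append(out, scored{ID: id, Similarity: cosine(query, e)})
	}
	sort.Slice(out, func(a, b int) bool {
		if out[a].Similarity != out[b].Similarity {
			return out[a].Similarity > out[b].Similarity
		}
		return out[a].ID < out[b].ID
	})
	if k > 0 && k < len(out) {
		out = out[:k]
	}
	return out
}

func main() {
	embeddings := map[int64][]float32{
		1: {1, 0},
		2: {0, 1},
		3: {1, 1},
		4: nil, // text-only chunk, skipped
	}
	for _, r := range topK([]float32{1, 0}, embeddings, 2) {
		fmt.Printf("chunk %d similarity %.3f\n", r.ID, r.Similarity)
	}
}
```

The cost is linear in the number of stored embeddings per query, which is the trade-off behind the workload guidance above.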

func (*Store) UpdateChunk

func (s *Store) UpdateChunk(ctx context.Context, id int64, chunk Chunk, embedding []float32) error

UpdateChunk replaces the text, metadata, and optionally the embedding of an existing chunk. Pass a nil embedding to leave the stored vector unchanged.

type StoreOptions

type StoreOptions struct {
	// EmbeddingDim is the dimensionality of the embedding vectors.
	// Must match the dimension persisted by the first call to Init.
	// Defaults to DefaultEmbeddingDim (1536).
	EmbeddingDim int
}

StoreOptions configures a new Store.

Directories

Package code provides tree-sitter based code indexing with semantic search.
Package embeddings provides a unified interface for generating embeddings from multiple providers (OpenAI, Google, Ollama, Anthropic/Voyage).
Package events provides temporal event storage with semantic search capabilities.
Package kb provides a knowledge base system for storing and searching documents.
Package treesitter provides AST walking and symbol extraction utilities.
