Documentation ¶
Overview ¶
Package dory provides a retrieval intelligence library for Go.
Dory is organized around a pipeline of composable, interface-driven stages:
- Chunking — split documents into retrievable units
- Embedding — transform text into vector representations
- Indexing — store chunks in a searchable backend
- Retrieval — find the most relevant units for a query
- Reranking — reorder candidates by cross-encoder relevance
- Authorization — filter results by what the caller is allowed to see
- Evaluation — measure retrieval quality with quantitative metrics
Every stage is expressed as a Go interface. Dory ships with concrete implementations for each, but any implementation of the interface works — the library has no opinion about which vector store, embedding model, or authorization backend you use.
The canonical entry point for most users is Pipeline, which wires the stages together into a single coherent retrieval flow.
Index ¶
- type Action
- type AuthorizationMode
- type Authorizer
- type BytesContent
- type CheckRequest
- type Chunk
- func (c *Chunk) AsText() string
- func (c *Chunk) ID() string
- func (c *Chunk) MarshalJSON() ([]byte, error)
- func (c *Chunk) Metadata() map[string]any
- func (c *Chunk) Score() float64
- func (c *Chunk) Scores() []ScoreEntry
- func (c *Chunk) SourceDocumentID() string
- func (c *Chunk) SourceURI() string
- func (c *Chunk) Text() string
- func (c *Chunk) UnmarshalJSON(data []byte) error
- func (c *Chunk) WithScore(stage string, score float64) RetrievedUnit
- type Content
- type Document
- func (d *Document) Content() Content
- func (d *Document) CreatedAt() time.Time
- func (d *Document) Fingerprint() [32]byte
- func (d *Document) ID() string
- func (d *Document) Language() string
- func (d *Document) Metadata() map[string]any
- func (d *Document) SourceURI() string
- func (d *Document) TenantID() string
- func (d *Document) UpdatedAt() time.Time
- type DocumentOption
- type Embedder
- type EvalMetrics
- type EvalResult
- type Evaluator
- type FilterOp
- type FilterRequest
- type GraphFact
- func (g *GraphFact) AsText() string
- func (g *GraphFact) ID() string
- func (g *GraphFact) MarshalJSON() ([]byte, error)
- func (g *GraphFact) Metadata() map[string]any
- func (g *GraphFact) Score() float64
- func (g *GraphFact) Scores() []ScoreEntry
- func (g *GraphFact) SourceDocumentID() string
- func (g *GraphFact) SourceURI() string
- func (g *GraphFact) UnmarshalJSON(data []byte) error
- func (g *GraphFact) WithScore(stage string, score float64) RetrievedUnit
- type Hook
- type MetadataFilter
- type Pipeline
- type PipelineConfig
- type Position
- type Query
- type ReaderContent
- type Reranker
- type Resource
- type ResourceSet
- type RetrievedUnit
- type Retriever
- type ScoreEntry
- type ScoredChunk
- type SearchRequest
- type Splitter
- type StringContent
- type StructuredRow
- func (s *StructuredRow) AsText() string
- func (s *StructuredRow) ID() string
- func (s *StructuredRow) MarshalJSON() ([]byte, error)
- func (s *StructuredRow) Metadata() map[string]any
- func (s *StructuredRow) Score() float64
- func (s *StructuredRow) Scores() []ScoreEntry
- func (s *StructuredRow) SourceDocumentID() string
- func (s *StructuredRow) SourceURI() string
- func (s *StructuredRow) UnmarshalJSON(data []byte) error
- func (s *StructuredRow) WithScore(stage string, score float64) RetrievedUnit
- type Subject
- type TestCase
- type UnitEnvelope
- type UnitType
- type VectorStore
Types ¶
type Action ¶
type Action string
Action describes what the subject wants to do with the resource.
const (
	// ActionRead is the action checked for retrieval. Most RAG systems
	// only need this single action.
	ActionRead Action = "read"
)
type AuthorizationMode ¶
type AuthorizationMode int
AuthorizationMode controls where in the pipeline authorization is enforced.
const (
	// PostFilter retrieves candidates first, then authorizes each result.
	// This is the safe default: correct regardless of metadata staleness.
	PostFilter AuthorizationMode = iota
	// PreFilter passes authorization constraints to the VectorStore as
	// metadata filters before the similarity search runs.
	// Faster, but requires keeping chunk metadata in sync with the
	// authorization system when permissions change.
	PreFilter
	// Hybrid applies tenant isolation as a pre-filter and fine-grained
	// per-document authorization as a post-filter.
	Hybrid
)
type Authorizer ¶
type Authorizer interface {
// Check answers: can this subject perform this action on this resource?
Check(ctx context.Context, req CheckRequest) (bool, error)
// Filter answers: which of these resources (or all resources if
// Candidates is nil) can this subject access?
Filter(ctx context.Context, req FilterRequest) (ResourceSet, error)
}
Authorizer is the authorization backend interface. OpenFGA, Casbin, simple allowlists, and the NoopAuthorizer all implement this.
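The two methods can be illustrated with a small allowlist backend. This is only a sketch: the types below are local stand-ins rather than the dory declarations, the field layout of CheckRequest and the underlying type of Subject are assumptions (neither is shown on this page), and the allowlist type is hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// Local stand-ins for the dory types. The CheckRequest fields and the
// underlying type of Subject are assumptions made for this sketch.
type (
	Subject  string
	Action   string
	Resource string
)

type CheckRequest struct {
	Subject  Subject
	Action   Action
	Resource Resource
}

type FilterRequest struct {
	Subject    Subject
	Action     Action
	Candidates []Resource
}

type ResourceSet struct {
	Resources []Resource
}

// allowlist maps each subject to the resources it may access (hypothetical).
// Action is ignored: every entry is treated as read access.
type allowlist map[Subject][]Resource

// Check reports whether the resource is on the subject's list.
func (a allowlist) Check(_ context.Context, req CheckRequest) (bool, error) {
	for _, r := range a[req.Subject] {
		if r == req.Resource {
			return true, nil
		}
	}
	return false, nil
}

// Filter returns the allowed subset of Candidates, or every allowed
// resource when Candidates is nil.
func (a allowlist) Filter(ctx context.Context, req FilterRequest) (ResourceSet, error) {
	if req.Candidates == nil {
		return ResourceSet{Resources: append([]Resource(nil), a[req.Subject]...)}, nil
	}
	var out []Resource
	for _, c := range req.Candidates {
		ok, err := a.Check(ctx, CheckRequest{Subject: req.Subject, Action: req.Action, Resource: c})
		if err != nil {
			return ResourceSet{}, err
		}
		if ok {
			out = append(out, c)
		}
	}
	return ResourceSet{Resources: out}, nil
}

func main() {
	a := allowlist{"alice": {"doc-1", "doc-2"}}
	set, _ := a.Filter(context.Background(), FilterRequest{
		Subject:    "alice",
		Action:     "read",
		Candidates: []Resource{"doc-2", "doc-3"},
	})
	fmt.Println(set.Resources) // [doc-2]
}
```

Passing explicit Candidates corresponds to the post-filter path; leaving Candidates nil asks for everything the subject may see, which is what a pre-filter would need.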
type BytesContent ¶
type BytesContent struct {
// contains filtered or unexported fields
}
BytesContent is a Content backed by raw bytes. Use this for binary formats like PDF or images where the content has not yet been extracted to text.
func BinaryContent ¶
func BinaryContent(data []byte, mimeType string) *BytesContent
BinaryContent creates a BytesContent with the given data and mime type.
func (*BytesContent) MimeType ¶
func (b *BytesContent) MimeType() string
func (*BytesContent) Reader ¶
func (b *BytesContent) Reader() (io.ReadCloser, error)
func (*BytesContent) Size ¶
func (b *BytesContent) Size() int64
func (*BytesContent) Text ¶
func (b *BytesContent) Text() (string, error)
type CheckRequest ¶
CheckRequest is the input to a single authorization check.
type Chunk ¶
type Chunk struct {
// Vector is the dense embedding of this chunk's text.
// Nil until the embedder processes this chunk.
Vector []float32
// Position describes where in the source document this chunk came from.
Position *Position
// TokenCount is the number of tokens in this chunk's text,
// computed by the Splitter at creation time. Zero if not computed.
TokenCount int
// ParentID, if non-empty, points to the larger parent chunk
// this chunk was derived from (small-to-big retrieval).
ParentID string
// WindowText, if non-empty, is the surrounding sentence window.
// When set, AsText returns this instead of the raw chunk text.
WindowText string
// ContextPrefix is a short LLM-generated sentence that situates
// this chunk within its source document.
ContextPrefix string
// contains filtered or unexported fields
}
Chunk is the concrete RetrievedUnit for text-based retrieval strategies: vector search, sparse search, hybrid search, and their variants.
func NewChunk ¶
NewChunk constructs a Chunk with the required identity fields.
Example ¶
package main

import (
	"fmt"

	"github.com/i33ym/dory"
)

func main() {
	chunk := dory.NewChunk("chunk-1", "doc-1", "The quick brown fox.", nil)
	fmt.Println(chunk.ID())
	fmt.Println(chunk.AsText())
}

Output:
chunk-1
The quick brown fox.
func NewChunkWithOptions ¶
func NewChunkWithOptions(id, sourceDocID, text string, metadata map[string]any, sourceURI string, pos *Position, tokenCount int) *Chunk
NewChunkWithOptions constructs a Chunk with additional fields.
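func (*Chunk) MarshalJSON ¶
func (c *Chunk) MarshalJSON() ([]byte, error)
func (*Chunk) Scores ¶
func (c *Chunk) Scores() []ScoreEntry
func (*Chunk) SourceDocumentID ¶
func (c *Chunk) SourceDocumentID() string
func (*Chunk) UnmarshalJSON ¶
func (c *Chunk) UnmarshalJSON(data []byte) error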
type Content ¶
type Content interface {
// Reader returns the content as a stream of bytes.
// Callers are responsible for closing the reader.
Reader() (io.ReadCloser, error)
// Text returns the content as a UTF-8 string, if possible.
// Returns an error if the content is binary or not yet extracted.
// Splitters call this — they work on text, not bytes.
Text() (string, error)
// MimeType describes the format of the content.
MimeType() string
// Size returns the content length in bytes, or -1 if unknown.
Size() int64
}
Content is the raw material of a Document. It abstracts over text, binary, and streaming content so that Dory's pipeline can handle each appropriately.
type Document ¶
type Document struct {
// contains filtered or unexported fields
}
Document is the ingestion unit — a raw source of knowledge before it has been chunked or indexed. A document carries its content, its identity, and the metadata the authorizer and retriever will consult later.
Documents are created via NewDocument, which validates required fields and computes a content fingerprint for change detection.
func NewDocument ¶
func NewDocument(id string, content Content, opts ...DocumentOption) (*Document, error)
NewDocument constructs a validated Document. Returns an error if the document cannot be used by Dory's pipeline — for example, if the ID is empty or the content is nil.
Example ¶
package main

import (
	"fmt"

	"github.com/i33ym/dory"
)

func main() {
	content := dory.TextContent("Hello, Dory!", "text/plain")
	doc, err := dory.NewDocument("doc-1", content,
		dory.WithTenantID("acme"),
		dory.WithLanguage("en"),
		dory.WithMetadata("author", "alice"),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(doc.ID())
	fmt.Println(doc.TenantID())
	fmt.Println(doc.Language())
	fmt.Println(doc.Metadata()["author"])
}

Output:
doc-1
acme
en
alice
func (*Document) Fingerprint ¶
func (d *Document) Fingerprint() [32]byte
Fingerprint returns the SHA-256 hash of this document's content. If two Documents have the same ID and the same Fingerprint, re-ingestion can be skipped safely.
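The skip-on-unchanged check can be sketched with the standard library alone. The seen type and needsIngest method below are hypothetical bookkeeping, not part of dory; only the use of a SHA-256 digest as a [32]byte fingerprint comes from the description above.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// seen records the last fingerprint ingested for each document ID.
// It is a hypothetical bookkeeping structure, not part of dory.
type seen map[string][32]byte

// needsIngest reports whether content differs from what was last ingested
// under id, and records the new fingerprint when it does.
func (s seen) needsIngest(id string, content []byte) bool {
	fp := sha256.Sum256(content)
	if prev, ok := s[id]; ok && prev == fp {
		return false // same ID, same fingerprint: safe to skip
	}
	s[id] = fp
	return true
}

func main() {
	s := seen{}
	fmt.Println(s.needsIngest("doc-1", []byte("Hello, Dory!"))) // true
	fmt.Println(s.needsIngest("doc-1", []byte("Hello, Dory!"))) // false
	fmt.Println(s.needsIngest("doc-1", []byte("Hello again!"))) // true
}
```

Because [32]byte is a comparable array type, two fingerprints can be compared with == and used as map values without any encoding step.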
func (*Document) SourceURI ¶
func (d *Document) SourceURI() string
SourceURI returns the canonical location of this document's original source.
type DocumentOption ¶
DocumentOption configures a Document at construction time.
func WithLanguage ¶
func WithLanguage(tag string) DocumentOption
WithLanguage sets the BCP-47 language tag for this document's content. Used by sentence-aware chunking strategies to apply the correct sentence boundary detection rules. Defaults to "en" if not set.
func WithMetadata ¶
func WithMetadata(key string, value any) DocumentOption
WithMetadata attaches a key-value pair to the document's metadata.
func WithSourceURI ¶
func WithSourceURI(uri string) DocumentOption
WithSourceURI sets the canonical source location for this document. Examples: "s3://bucket/path/to/file.pdf", "https://docs.example.com/api".
func WithTenantID ¶
func WithTenantID(id string) DocumentOption
WithTenantID sets the tenant this document belongs to.
func WithTimestamps ¶
func WithTimestamps(createdAt, updatedAt time.Time) DocumentOption
WithTimestamps overrides the default creation and update timestamps.
type Embedder ¶
type Embedder interface {
// Embed returns the vector representation of the given text.
Embed(ctx context.Context, text string) ([]float32, error)
// EmbedBatch embeds multiple texts in a single call.
// Implementations that do not support native batching should
// loop over Embed internally. Callers should prefer EmbedBatch
// during ingestion to reduce API round-trips and cost.
EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
// Dimensions returns the dimensionality of the vectors this embedder
// produces. The vector store needs this at collection creation time.
Dimensions() int
}
Embedder transforms text into a dense vector representation. The library is agnostic about which model or provider is used — any implementation of this interface is interchangeable.
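For tests that should not call a real model, a deterministic toy embedder with the same three methods can be enough. The hashEmbedder type below is hypothetical and is not one of the implementations in the embed package; it buckets tokens by hash, which carries no semantic meaning. Its EmbedBatch follows the advice above and loops over Embed.

```go
package main

import (
	"context"
	"fmt"
	"hash/fnv"
	"strings"
)

// hashEmbedder is a toy, deterministic embedder (hypothetical, for tests only).
// It counts whitespace-separated tokens into dims hash buckets.
// dims must be positive.
type hashEmbedder struct{ dims int }

// Embed lowercases the text and increments one bucket per token.
func (h hashEmbedder) Embed(_ context.Context, text string) ([]float32, error) {
	vec := make([]float32, h.dims)
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		f := fnv.New32a()
		f.Write([]byte(tok))
		vec[f.Sum32()%uint32(h.dims)]++
	}
	return vec, nil
}

// EmbedBatch has no native batching, so it loops over Embed.
func (h hashEmbedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error) {
	out := make([][]float32, 0, len(texts))
	for _, t := range texts {
		v, err := h.Embed(ctx, t)
		if err != nil {
			return nil, err
		}
		out = append(out, v)
	}
	return out, nil
}

// Dimensions returns the fixed vector length.
func (h hashEmbedder) Dimensions() int { return h.dims }

func main() {
	e := hashEmbedder{dims: 8}
	vecs, err := e.EmbedBatch(context.Background(), []string{"quick brown fox", "lazy dog"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(vecs), len(vecs[0]) == e.Dimensions()) // 2 true
}
```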
type EvalMetrics ¶
type EvalMetrics struct {
// ContextPrecision measures what fraction of retrieved chunks
// were actually relevant to the question.
ContextPrecision *float64
// ContextRecall measures what fraction of the information needed
// to answer the question was present in the retrieved chunks.
ContextRecall *float64
// Faithfulness measures whether the generated answer is supported
// by the retrieved context rather than the model's parametric knowledge.
Faithfulness *float64
// AnswerRelevance measures whether the generated answer actually
// addresses what the question asked.
AnswerRelevance *float64
}
EvalMetrics holds the computed scores for a single test case. All scores are in the range [0.0, 1.0]. A nil pointer means the metric was not requested or could not be computed.
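As a rough illustration of the two context metrics, the sketch below scores retrieved document IDs against a known relevant set. It is a document-ID approximation written for this page; the eval package may compute these metrics differently. The precisionRecall helper is hypothetical, and it returns nil pointers in the same spirit as the nil convention described above.

```go
package main

import "fmt"

// precisionRecall scores retrieved document IDs against the relevant set.
// A metric whose denominator is zero is returned as nil, mirroring the
// "nil means not computed" convention of EvalMetrics.
func precisionRecall(retrieved, relevant []string) (precision, recall *float64) {
	want := make(map[string]bool, len(relevant))
	for _, id := range relevant {
		want[id] = true
	}
	hits := 0                  // retrieved units that come from a relevant document
	found := map[string]bool{} // distinct relevant documents that were retrieved
	for _, id := range retrieved {
		if want[id] {
			hits++
			found[id] = true
		}
	}
	if len(retrieved) > 0 {
		p := float64(hits) / float64(len(retrieved))
		precision = &p
	}
	if len(want) > 0 {
		r := float64(len(found)) / float64(len(want))
		recall = &r
	}
	return precision, recall
}

func main() {
	p, r := precisionRecall(
		[]string{"doc-1", "doc-4", "doc-2", "doc-1"},
		[]string{"doc-1", "doc-2", "doc-3"},
	)
	fmt.Printf("precision=%.2f recall=%.2f\n", *p, *r) // precision=0.75 recall=0.67
}
```

Three of the four retrieved units are relevant, giving precision 3/4, while two of the three relevant documents were found, giving recall 2/3.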
type EvalResult ¶
type EvalResult struct {
TestCase TestCase
RetrievedUnits []RetrievedUnit
GeneratedAnswer string
Metrics EvalMetrics
}
EvalResult captures the full output of evaluating one TestCase.
type Evaluator ¶
type Evaluator interface {
Evaluate(ctx context.Context, cases []TestCase) ([]EvalResult, error)
}
Evaluator runs a retrieval pipeline against a set of test cases and produces scored results for each.
type FilterOp ¶
type FilterOp string
FilterOp is the comparison operator in a MetadataFilter.
const (
	// FilterOpEq matches documents where the field equals the value exactly.
	FilterOpEq FilterOp = "eq"
	// FilterOpIn matches documents where the field equals any value in the list.
	FilterOpIn FilterOp = "in"
	// FilterOpAnyOf matches documents where a metadata array field contains
	// any value from the list. Used for multi-value fields like role lists.
	FilterOpAnyOf FilterOp = "any_of"
)
type FilterRequest ¶
type FilterRequest struct {
Subject Subject
Action Action
// Candidates, if non-nil, restricts the check to this set of resources.
// If nil, implementations should return ALL authorized resources for
// the subject — used for the pre-filter path.
Candidates []Resource
}
FilterRequest is the input to a bulk authorization filter.
type GraphFact ¶
type GraphFact struct {
Subject string
Predicate string
Object string
// contains filtered or unexported fields
}
GraphFact is the concrete RetrievedUnit for graph retrieval. It represents a single fact extracted from the knowledge graph: a subject, a predicate (relationship type), and an object.
func NewGraphFact ¶
func NewGraphFact(id, sourceDocID, subject, predicate, object string, metadata map[string]any) *GraphFact
NewGraphFact constructs a GraphFact with the required identity fields.
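func (*GraphFact) MarshalJSON ¶
func (g *GraphFact) MarshalJSON() ([]byte, error)
func (*GraphFact) Scores ¶
func (g *GraphFact) Scores() []ScoreEntry
func (*GraphFact) SourceDocumentID ¶
func (g *GraphFact) SourceDocumentID() string
func (*GraphFact) UnmarshalJSON ¶
func (g *GraphFact) UnmarshalJSON(data []byte) error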
type Hook ¶
type Hook struct {
// BeforeIngest is called before documents are ingested.
// Receives the number of documents about to be processed.
BeforeIngest func(ctx context.Context, docCount int)
// AfterIngest is called after documents are ingested.
// Receives the number of chunks produced and any error.
AfterIngest func(ctx context.Context, chunkCount int, err error)
// BeforeRetrieve is called before a retrieval query is executed.
BeforeRetrieve func(ctx context.Context, query Query)
// AfterRetrieve is called after retrieval completes.
// Receives the number of results and any error.
AfterRetrieve func(ctx context.Context, resultCount int, err error)
// BeforeRerank is called before reranking.
BeforeRerank func(ctx context.Context, query string, candidateCount int)
// AfterRerank is called after reranking completes.
AfterRerank func(ctx context.Context, resultCount int, err error)
}
Hook is called at specific points in the pipeline lifecycle. Hooks observe but do not modify pipeline behavior.
func NewLogHook ¶
func NewLogHook() Hook
NewLogHook creates a Hook that logs pipeline events using the standard log package.
type MetadataFilter ¶
type MetadataFilter struct {
Field string
Op FilterOp
Value any // string for Eq; []string for In and AnyOf
}
MetadataFilter is Dory's portable filter expression. It is intentionally minimal — just enough to express tenant isolation and authorization constraints. Each VectorStore implementation translates this into its native query language.
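One way to read the three operators is as an in-memory predicate over a chunk's metadata map, which is roughly what an in-memory store would need. The matches function below is a sketch under that reading, not the translation code in internal/filter; the type declarations are local copies, and the "tenant_id" and "roles" keys are made-up examples.

```go
package main

import "fmt"

type FilterOp string

const (
	FilterOpEq    FilterOp = "eq"
	FilterOpIn    FilterOp = "in"
	FilterOpAnyOf FilterOp = "any_of"
)

type MetadataFilter struct {
	Field string
	Op    FilterOp
	Value any // string for Eq; []string for In and AnyOf
}

// contains reports whether list holds s.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// matches evaluates f against one metadata map. A missing field or a
// value of an unexpected type never matches.
func matches(f MetadataFilter, meta map[string]any) bool {
	got, ok := meta[f.Field]
	if !ok {
		return false
	}
	switch f.Op {
	case FilterOpEq:
		want, ok := f.Value.(string)
		return ok && got == want
	case FilterOpIn:
		s, isStr := got.(string)
		want, ok := f.Value.([]string)
		return isStr && ok && contains(want, s)
	case FilterOpAnyOf:
		have, isList := got.([]string)
		want, ok := f.Value.([]string)
		if !isList || !ok {
			return false
		}
		for _, h := range have {
			if contains(want, h) {
				return true
			}
		}
	}
	return false
}

func main() {
	meta := map[string]any{"tenant_id": "acme", "roles": []string{"eng", "ops"}}
	fmt.Println(matches(MetadataFilter{Field: "tenant_id", Op: FilterOpEq, Value: "acme"}, meta))                       // true
	fmt.Println(matches(MetadataFilter{Field: "tenant_id", Op: FilterOpIn, Value: []string{"globex", "initech"}}, meta)) // false
	fmt.Println(matches(MetadataFilter{Field: "roles", Op: FilterOpAnyOf, Value: []string{"ops", "sales"}}, meta))       // true
}
```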
type Pipeline ¶
type Pipeline struct {
// contains filtered or unexported fields
}
Pipeline wires Dory's pipeline stages together into a single coherent retrieval flow: ingest documents, retrieve relevant units, and optionally rerank and authorize the results.
func NewPipeline ¶
func NewPipeline(config PipelineConfig) (*Pipeline, error)
NewPipeline constructs a Pipeline from the given configuration. Returns an error if any required component is nil.
func (*Pipeline) Delete ¶
Delete removes chunks associated with the given document IDs from the vector store.
type PipelineConfig ¶
type PipelineConfig struct {
Splitter Splitter
Embedder Embedder
Store VectorStore
Retriever Retriever
// Reranker, if non-nil, reorders retrieval results for higher precision.
Reranker Reranker
// Authorizer, if non-nil, enforces access control on retrieval results.
Authorizer Authorizer
// AuthMode controls where authorization is enforced. Defaults to PostFilter.
// Hybrid mode is not yet implemented and falls back to PostFilter.
AuthMode AuthorizationMode
// Hooks are called at key points in the pipeline lifecycle.
// Multiple hooks are called in the order they appear in the slice.
Hooks []Hook
}
PipelineConfig holds the components needed to construct a Pipeline. Splitter, Embedder, Store, and Retriever are required; Reranker and Authorizer are optional.
type Position ¶
type Position struct {
// StartByte and EndByte are the byte offsets in the original
// document content. Used for precise deduplication and
// for reconstructing the original context.
StartByte int `json:"start_byte"`
EndByte int `json:"end_byte"`
// Page is the page number in a paginated document (PDF, DOCX).
// Nil for documents without pagination.
Page *int `json:"page,omitempty"`
// Section is the heading path to this chunk's location in a
// structured document. For a Markdown file:
// ["Introduction", "Background", "Prior Work"].
// Nil for unstructured documents.
Section []string `json:"section,omitempty"`
}
Position describes where in the source document a chunk came from.
type Query ¶
type Query struct {
// Text is the raw natural language question from the user.
Text string
// TenantID is mandatory for multi-tenant knowledge bases.
// Retrievers must enforce tenant isolation before any other
// filtering. An empty TenantID is valid only for single-tenant systems.
TenantID string
// Subject is the identity of the caller for authorization checks.
// Passed to the Authorizer when authorization is enabled.
Subject string
// TopK is the maximum number of results the caller wants.
// Retrievers may internally over-fetch (e.g., for reranking)
// but should return at most TopK results.
TopK int
// Filters are additional metadata constraints the caller wants
// applied beyond tenant isolation and authorization.
Filters []MetadataFilter
}
Query carries everything a retriever needs to find relevant units.
type ReaderContent ¶
type ReaderContent struct {
// contains filtered or unexported fields
}
ReaderContent is a Content backed by a lazy reader function. Use this for streaming large files without loading them into memory.
func StreamContent ¶
func StreamContent(open func() (io.ReadCloser, error), mimeType string, size int64) *ReaderContent
StreamContent creates a ReaderContent with the given reader factory. The open function is called each time Reader() is invoked, allowing multiple reads of the same content. Pass size=-1 if the size is unknown.
func (*ReaderContent) MimeType ¶
func (r *ReaderContent) MimeType() string
func (*ReaderContent) Reader ¶
func (r *ReaderContent) Reader() (io.ReadCloser, error)
func (*ReaderContent) Size ¶
func (r *ReaderContent) Size() int64
func (*ReaderContent) Text ¶
func (r *ReaderContent) Text() (string, error)
type Reranker ¶
type Reranker interface {
// Rerank takes the original query text and the candidate units
// returned by the retriever, and returns them in a new order
// with updated scores. The returned slice may be shorter than
// the input if the reranker applies a relevance threshold.
Rerank(ctx context.Context, query string, units []RetrievedUnit) ([]RetrievedUnit, error)
}
Reranker reorders a slice of RetrievedUnits by their relevance to the original query. It operates after initial retrieval, trading latency for precision.
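The reorder-and-threshold behaviour can be sketched without a model. In the example below, query term overlap is a toy substitute for a cross-encoder score, the unit struct is a pared-down stand-in for RetrievedUnit, and overlapRerank with its minScore parameter is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// unit is a pared-down stand-in for RetrievedUnit: just text and a score.
type unit struct {
	Text  string
	Score float64
}

// overlapRerank scores each unit by the fraction of query terms it contains,
// drops units scoring below minScore, and sorts the rest by descending score.
// Term overlap is a toy substitute for a cross-encoder.
func overlapRerank(query string, units []unit, minScore float64) []unit {
	terms := strings.Fields(strings.ToLower(query))
	if len(terms) == 0 {
		return nil
	}
	var out []unit
	for _, u := range units {
		words := map[string]bool{}
		for _, w := range strings.Fields(strings.ToLower(u.Text)) {
			words[w] = true
		}
		hit := 0
		for _, t := range terms {
			if words[t] {
				hit++
			}
		}
		score := float64(hit) / float64(len(terms))
		if score >= minScore {
			out = append(out, unit{Text: u.Text, Score: score})
		}
	}
	// Stable sort keeps the retriever's order among equal scores.
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	got := overlapRerank("reset password", []unit{
		{Text: "billing and invoices"},
		{Text: "how to change your password"},
		{Text: "reset a forgotten password"},
	}, 0.5)
	for _, u := range got {
		fmt.Printf("%.1f %s\n", u.Score, u.Text)
	}
	// 1.0 reset a forgotten password
	// 0.5 how to change your password
}
```

The result is shorter than the input because the billing unit falls below the threshold, which is the case the Rerank comment allows for.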
type Resource ¶
type Resource string
Resource identifies a document or chunk for authorization purposes.
type ResourceSet ¶
type ResourceSet struct {
// Resources is the explicit list of authorized resource IDs.
Resources []Resource
// Predicate, if non-nil, can be passed directly to a VectorStore
// to restrict the search space at the database level.
Predicate *MetadataFilter
}
ResourceSet is the result of a FilterRequest.
type RetrievedUnit ¶
type RetrievedUnit interface {
// ID returns a stable unique identifier for this unit.
ID() string
// SourceDocumentID returns the document or resource this unit came from.
SourceDocumentID() string
// SourceURI returns the canonical location of the source document.
// Used for citations and traceability.
SourceURI() string
// AsText returns a natural language representation of this unit
// suitable for injection into an LLM prompt.
AsText() string
// Score returns the most recent relevance score.
Score() float64
// Scores returns the complete scoring history of this unit,
// from initial retrieval through all reranking passes.
Scores() []ScoreEntry
// WithScore returns a copy of this unit with the given score
// appended to the score history. The stage parameter identifies
// which pipeline stage assigned the score.
WithScore(stage string, score float64) RetrievedUnit
// Metadata returns arbitrary key-value pairs attached to this unit.
Metadata() map[string]any
}
RetrievedUnit is the common interface for everything Dory can retrieve, regardless of which retrieval strategy produced it. The pipeline — reranking, authorization, and prompt injection — works exclusively against this interface, remaining agnostic about the concrete type.
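The copy-on-score behaviour described for WithScore is easy to get wrong with slices, because append may write into a backing array shared with the original. The sketch below shows one safe way to do it. The unit type is a local stand-in, its WithScore returns the concrete type rather than the interface to keep the example short, and returning 0 from Score when there is no history is an assumption.

```go
package main

import "fmt"

type ScoreEntry struct {
	Stage string
	Score float64
}

// unit is a local stand-in for a concrete RetrievedUnit.
type unit struct {
	id     string
	scores []ScoreEntry
}

// WithScore returns a copy whose history has one more entry. The receiver
// is left untouched because the history slice is cloned before appending.
func (u unit) WithScore(stage string, score float64) unit {
	history := make([]ScoreEntry, len(u.scores), len(u.scores)+1)
	copy(history, u.scores)
	u.scores = append(history, ScoreEntry{Stage: stage, Score: score})
	return u
}

// Score returns the most recent score, or 0 when the unit has no history.
func (u unit) Score() float64 {
	if len(u.scores) == 0 {
		return 0
	}
	return u.scores[len(u.scores)-1].Score
}

func main() {
	retrieved := unit{id: "chunk-1"}.WithScore("vector", 0.62)
	reranked := retrieved.WithScore("crossencoder", 0.91)
	fmt.Println(retrieved.Score(), len(retrieved.scores)) // 0.62 1
	fmt.Println(reranked.Score(), reranked.scores)        // 0.91 [{vector 0.62} {crossencoder 0.91}]
}
```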
func UnwrapUnit ¶
func UnwrapUnit(e UnitEnvelope) (RetrievedUnit, error)
UnwrapUnit recovers a RetrievedUnit from an envelope.
type Retriever ¶
type Retriever interface {
Retrieve(ctx context.Context, q Query) ([]RetrievedUnit, error)
}
Retriever finds the most relevant RetrievedUnits for a Query. All retrieval strategies — vector, sparse, hybrid, graph, structured, web — implement this interface.
type ScoreEntry ¶
type ScoreEntry struct {
// Stage is the name of the pipeline stage that assigned this score.
// Examples: "vector", "bm25", "rrf_fusion", "crossencoder", "final".
Stage string `json:"stage"`
// Score is the relevance score assigned at this stage.
Score float64 `json:"score"`
}
ScoreEntry records a single scoring event in a unit's retrieval history.
type ScoredChunk ¶
ScoredChunk is a Chunk returned from a vector store search, paired with its similarity score.
type SearchRequest ¶
type SearchRequest struct {
// QueryVector is the embedding of the user's (possibly transformed) query.
QueryVector []float32
// TopK is the maximum number of results to return.
TopK int
// Filter, if non-nil, restricts the search to chunks matching
// these metadata conditions. Tenant isolation and pre-filter
// authorization constraints are passed here.
Filter *MetadataFilter
}
SearchRequest bundles everything a VectorStore needs to execute a search.
type Splitter ¶
type Splitter interface {
// Split takes a Document and returns the chunks produced from it.
// Implementations must propagate doc.ID as each chunk's SourceDocumentID
// and doc.Metadata as the base for each chunk's metadata.
Split(ctx context.Context, doc *Document) ([]*Chunk, error)
}
Splitter transforms a Document into a sequence of Chunks. Each concrete implementation in the chunk/ sub-package represents a different strategy for finding good chunk boundaries.
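A fixed-size strategy is the simplest boundary rule and shows what a splitter has to propagate. The sketch below is not the implementation in the chunk sub-package: Chunk and Position are simplified local stand-ins, splitFixed is a hypothetical function, the "docID#n" chunk ID scheme is invented, and treating EndByte as exclusive is an assumption.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Simplified local stand-ins for the dory types.
type Position struct{ StartByte, EndByte int }

type Chunk struct {
	ID, SourceDocumentID, Text string
	Position                   Position
}

// splitFixed cuts text into pieces of at most size bytes, never splitting a
// UTF-8 rune, and records each piece's byte range (EndByte exclusive).
func splitFixed(docID, text string, size int) []Chunk {
	if size <= 0 {
		return nil
	}
	var chunks []Chunk
	for start := 0; start < len(text); {
		end := start + size
		if end > len(text) {
			end = len(text)
		}
		// Back up so a multi-byte rune is never cut in half.
		for end > start && end < len(text) && !utf8.RuneStart(text[end]) {
			end--
		}
		if end == start { // size is smaller than this rune: take it whole
			_, w := utf8.DecodeRuneInString(text[start:])
			end = start + w
		}
		chunks = append(chunks, Chunk{
			ID:               fmt.Sprintf("%s#%d", docID, len(chunks)),
			SourceDocumentID: docID, // propagate the document ID
			Text:             text[start:end],
			Position:         Position{StartByte: start, EndByte: end},
		})
		start = end
	}
	return chunks
}

func main() {
	for _, c := range splitFixed("doc-1", "héllo wörld", 4) {
		fmt.Printf("%s %q [%d,%d)\n", c.ID, c.Text, c.Position.StartByte, c.Position.EndByte)
	}
}
```

Because the ranges are contiguous, concatenating the chunk texts in order reproduces the original content, which is what makes byte offsets usable for reconstructing context.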
type StringContent ¶
type StringContent struct {
// contains filtered or unexported fields
}
StringContent is a Content backed by a plain UTF-8 string. This is the most common case for pre-extracted text.
func TextContent ¶
func TextContent(text, mimeType string) *StringContent
TextContent creates a StringContent with the given text and mime type. Pass an empty mimeType to default to "text/plain".
Example ¶
package main

import (
	"fmt"

	"github.com/i33ym/dory"
)

func main() {
	c := dory.TextContent("some plain text", "")
	text, _ := c.Text()
	fmt.Println(text)
	fmt.Println(c.MimeType())
	fmt.Println(c.Size())
}

Output:
some plain text
text/plain
15
func (*StringContent) MimeType ¶
func (s *StringContent) MimeType() string
func (*StringContent) Reader ¶
func (s *StringContent) Reader() (io.ReadCloser, error)
func (*StringContent) Size ¶
func (s *StringContent) Size() int64
func (*StringContent) Text ¶
func (s *StringContent) Text() (string, error)
type StructuredRow ¶
type StructuredRow struct {
// Columns preserves the relational structure of the row,
// keyed by column name.
Columns map[string]any
// contains filtered or unexported fields
}
StructuredRow is the concrete RetrievedUnit for structured retrieval — the case where the knowledge base is a database and the retriever executed a generated SQL query.
func NewStructuredRow ¶
func NewStructuredRow(id, sourceDocID string, columns map[string]any, metadata map[string]any) *StructuredRow
NewStructuredRow constructs a StructuredRow with the required identity fields.
func (*StructuredRow) AsText ¶
func (s *StructuredRow) AsText() string
func (*StructuredRow) ID ¶
func (s *StructuredRow) ID() string
func (*StructuredRow) MarshalJSON ¶
func (s *StructuredRow) MarshalJSON() ([]byte, error)
func (*StructuredRow) Metadata ¶
func (s *StructuredRow) Metadata() map[string]any
func (*StructuredRow) Score ¶
func (s *StructuredRow) Score() float64
func (*StructuredRow) Scores ¶
func (s *StructuredRow) Scores() []ScoreEntry
func (*StructuredRow) SourceDocumentID ¶
func (s *StructuredRow) SourceDocumentID() string
func (*StructuredRow) SourceURI ¶
func (s *StructuredRow) SourceURI() string
func (*StructuredRow) UnmarshalJSON ¶
func (s *StructuredRow) UnmarshalJSON(data []byte) error
func (*StructuredRow) WithScore ¶
func (s *StructuredRow) WithScore(stage string, score float64) RetrievedUnit
type TestCase ¶
type TestCase struct {
// ID uniquely identifies this test case for result tracking.
ID string
// Question is the natural language query to evaluate.
Question string
// ReferenceAnswer is a high-quality answer to the question.
// Used to score faithfulness and answer relevance.
ReferenceAnswer string
// RelevantDocumentIDs, if provided, are the document IDs that
// should appear in the retrieved context.
// Used to score context precision and context recall.
RelevantDocumentIDs []string
}
TestCase is a single evaluation example.
type UnitEnvelope ¶
type UnitEnvelope struct {
Type UnitType `json:"type"`
Data json.RawMessage `json:"data"`
}
UnitEnvelope is a serializable wrapper around a RetrievedUnit. It carries a type discriminator so that deserializers know which concrete type to decode into.
func WrapUnit ¶
func WrapUnit(u RetrievedUnit) (UnitEnvelope, error)
WrapUnit packs a RetrievedUnit into a serializable envelope.
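The envelope pattern can be sketched with encoding/json alone. UnitType and UnitEnvelope below copy the declarations shown above; everything else is hypothetical, including the chunk and fact payload types, the wrap and unwrap helpers, and the "chunk" and "graph_fact" discriminator strings, since the actual UnitType values are not listed on this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type UnitType string

type UnitEnvelope struct {
	Type UnitType        `json:"type"`
	Data json.RawMessage `json:"data"`
}

// Hypothetical payload types and discriminator values for this sketch.
type chunk struct {
	Text string `json:"text"`
}

type fact struct{ Subject, Predicate, Object string }

// wrap picks a discriminator from the concrete type and encodes the payload.
func wrap(v any) (UnitEnvelope, error) {
	var t UnitType
	switch v.(type) {
	case chunk:
		t = "chunk"
	case fact:
		t = "graph_fact"
	default:
		return UnitEnvelope{}, fmt.Errorf("unsupported unit %T", v)
	}
	data, err := json.Marshal(v)
	return UnitEnvelope{Type: t, Data: data}, err
}

// unwrap uses the discriminator to choose which concrete type to decode into.
func unwrap(e UnitEnvelope) (any, error) {
	switch e.Type {
	case "chunk":
		var c chunk
		err := json.Unmarshal(e.Data, &c)
		return c, err
	case "graph_fact":
		var f fact
		err := json.Unmarshal(e.Data, &f)
		return f, err
	}
	return nil, fmt.Errorf("unknown unit type %q", e.Type)
}

func main() {
	env, _ := wrap(fact{"Dory", "written_in", "Go"})
	wire, _ := json.Marshal(env)
	fmt.Println(string(wire))

	var decoded UnitEnvelope
	_ = json.Unmarshal(wire, &decoded)
	u, _ := unwrap(decoded)
	fmt.Printf("%T %+v\n", u, u)
}
```

Keeping Data as json.RawMessage defers decoding of the payload until the discriminator has been read, so a mixed slice of units can travel through one JSON array.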
type UnitType ¶
type UnitType string
UnitType identifies the concrete type of a serialized RetrievedUnit.
type VectorStore ¶
type VectorStore interface {
// Store persists a set of chunks. Implementations decide how to
// physically store the vector, text, and metadata fields.
Store(ctx context.Context, chunks []*Chunk) error
// Search finds the top-k chunks whose vectors are nearest to the
// query vector, applying any metadata filter before scoring.
Search(ctx context.Context, req SearchRequest) ([]ScoredChunk, error)
// Delete removes chunks by their IDs. Called on re-ingestion
// or when a document is permanently removed.
Delete(ctx context.Context, ids []string) error
}
VectorStore is the persistence and similarity search abstraction. The library never depends on a concrete implementation — only on this contract.
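The filter-then-score order in the Search comment can be sketched as a brute-force scan. This is an illustration only, not the code in the store package: entry and scored are simplified stand-ins for Chunk and ScoredChunk, a single tenant string stands in for a MetadataFilter, and cosine similarity is an assumed metric.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// entry and scored are simplified stand-ins for Chunk and ScoredChunk.
type entry struct {
	ID     string
	Vector []float32
	Tenant string
}

type scored struct {
	ID    string
	Score float64
}

// cosine returns the cosine similarity of a and b, or 0 when the lengths
// differ or either vector has zero norm.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// search applies the tenant filter before scoring, ranks the survivors by
// descending similarity, and keeps at most topK of them.
func search(entries []entry, query []float32, tenant string, topK int) []scored {
	var out []scored
	for _, e := range entries {
		if tenant != "" && e.Tenant != tenant {
			continue
		}
		out = append(out, scored{ID: e.ID, Score: cosine(query, e.Vector)})
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	if topK >= 0 && len(out) > topK {
		out = out[:topK]
	}
	return out
}

func main() {
	entries := []entry{
		{ID: "a", Vector: []float32{1, 0}, Tenant: "acme"},
		{ID: "b", Vector: []float32{0.6, 0.8}, Tenant: "acme"},
		{ID: "c", Vector: []float32{1, 0}, Tenant: "globex"},
	}
	for _, s := range search(entries, []float32{1, 0}, "acme", 2) {
		fmt.Printf("%s %.2f\n", s.ID, s.Score)
	}
	// a 1.00
	// b 0.60
}
```

Entry "c" matches the query exactly but never appears, because the tenant filter removes it before any similarity is computed.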
Directories ¶
| Path | Synopsis |
|---|---|
| auth | Package auth provides authorization backend implementations for Dory. |
| chunk | Package chunk provides text splitting strategies for Dory. |
| embed | Package embed provides embedder implementations for Dory. |
| eval | Package eval provides the evaluation pipeline for Dory. |
| examples | |
| examples/basic_rag (command) | basic_rag demonstrates the simplest possible Dory pipeline: fixed-size chunking, OpenAI embeddings, in-memory vector store, and vector retrieval. |
| examples/graph_rag (command) | graph_rag demonstrates graph-based retrieval using GraphFact triples. |
| examples/hybrid_rag (command) | hybrid_rag demonstrates hybrid retrieval combining dense vector search with BM25 sparse retrieval, fused via Reciprocal Rank Fusion (RRF). |
| examples/with_auth (command) | with_auth demonstrates Dory's authorization integration using the Allowlist backend in PostFilter mode. |
| internal | |
| internal/filter | Package filter provides MetadataFilter translation utilities used internally by VectorStore implementations. |
| internal/similarity | Package similarity provides vector similarity calculations used internally by Dory. |
| internal/tokenizer | Package tokenizer provides token counting utilities used internally by Dory's chunking strategies. |
| rerank | Package rerank provides reranking implementations for Dory. |
| retrieve | Package retrieve provides retrieval strategy implementations for Dory. |
| store | Package store provides VectorStore implementations for Dory. |