package index
v0.0.0-...-ec03379 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 20, 2026 License: MIT Imports: 16 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func SaveCache

func SaveCache(path string, entries []CacheEntry, modelID, sentexVersion string, dims int) error

SaveCache writes entries to path atomically (temp file + rename). Every save is a full rewrite of the cache.

func Similarity

func Similarity(a, b string) float64

Similarity computes the Jaccard similarity of the trigram sets of two strings.

func Trigrams

func Trigrams(term string) []string

Trigrams returns the set of unique 3-character sliding windows over term. For terms shorter than 3 characters, it returns the term itself as the only element.
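A minimal sketch of the two documented behaviors, assuming that "character" means a Unicode code point (rune) and that no case folding or other normalization is applied. trigrams and similarity are hypothetical lower-case stand-ins for Trigrams and Similarity, not the package's code.

```go
package main

// trigrams returns the unique 3-rune sliding windows of term, in order of
// first occurrence, or the term itself when it has fewer than 3 runes.
func trigrams(term string) []string {
	r := []rune(term)
	if len(r) < 3 {
		return []string{term}
	}
	seen := make(map[string]bool)
	var out []string
	for i := 0; i+3 <= len(r); i++ {
		g := string(r[i : i+3])
		if !seen[g] {
			seen[g] = true
			out = append(out, g)
		}
	}
	return out
}

// similarity is the Jaccard index |A ∩ B| / |A ∪ B| of the trigram sets.
func similarity(a, b string) float64 {
	setA := make(map[string]bool)
	for _, g := range trigrams(a) {
		setA[g] = true
	}
	inter, union := 0, len(setA)
	// trigrams(b) is already deduplicated, so each gram is counted once.
	for _, g := range trigrams(b) {
		if setA[g] {
			inter++
		} else {
			union++
		}
	}
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}
```

For example, "search" and "serch" share only the trigram "rch" out of six distinct trigrams, so their similarity is 1/6.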

Types

type BM25

type BM25 struct {
	// contains filtered or unexported fields
}

BM25 is a weighted-field BM25 inverted index.

func NewBM25

func NewBM25() *BM25

NewBM25 creates an empty BM25 index.

func (*BM25) Add

func (b *BM25) Add(page pages.Page)

Add indexes a page, replacing any existing entry with the same name.

func (*BM25) Remove

func (b *BM25) Remove(name string)

Remove deletes a page from the index by name.

func (*BM25) Search

func (b *BM25) Search(query string, limit int) []SearchResult

Search returns up to limit pages ranked by BM25 score for query.
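The core BM25 formula can be illustrated with a single-field sketch. It assumes whitespace tokenization with lower-casing, the common defaults k1 = 1.2 and b = 0.75, and a non-negative IDF variant; the package's tokenizer, parameters, and per-field weights are not documented, and bm25Search is a hypothetical name.

```go
package main

import (
	"math"
	"sort"
	"strings"
)

// Common Okapi BM25 defaults; assumed, not taken from this package.
const (
	bm25K1 = 1.2
	bm25B  = 0.75
)

type scored struct {
	Name  string
	Score float64
}

// bm25Search ranks docs (name -> text) against query and returns up to
// limit names with a positive score, highest first.
func bm25Search(docs map[string]string, query string, limit int) []scored {
	if len(docs) == 0 {
		return nil
	}
	toks := make(map[string][]string, len(docs))
	df := map[string]int{} // number of documents containing each term
	total := 0
	for name, text := range docs {
		t := strings.Fields(strings.ToLower(text))
		toks[name] = t
		total += len(t)
		seen := map[string]bool{}
		for _, w := range t {
			if !seen[w] {
				seen[w] = true
				df[w]++
			}
		}
	}
	n := float64(len(docs))
	avgdl := float64(total) / n

	var out []scored
	for name, t := range toks {
		tf := map[string]int{}
		for _, w := range t {
			tf[w]++
		}
		dl := float64(len(t))
		score := 0.0
		for _, q := range strings.Fields(strings.ToLower(query)) {
			f := float64(tf[q])
			if f == 0 {
				continue
			}
			// IDF variant that stays positive even for very common terms.
			idf := math.Log(1 + (n-float64(df[q])+0.5)/(float64(df[q])+0.5))
			score += idf * f * (bm25K1 + 1) / (f + bm25K1*(1-bm25B+bm25B*dl/avgdl))
		}
		if score > 0 {
			out = append(out, scored{name, score})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].Name < out[j].Name
	})
	if limit >= 0 && len(out) > limit {
		out = out[:limit]
	}
	return out
}
```

With documents "go is a language go go" and "rust is a language", a query for "language" ranks the second first: both contain the term once, but the b term penalizes the longer document.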

type CacheEntry

type CacheEntry struct {
	PageName    string
	ContentHash string
	Chunks      []CachedChunk
}

CacheEntry holds the cached embeddings for a single page.

func LoadCache

func LoadCache(path string, modelID, sentexVersion string, dims int) ([]CacheEntry, error)

LoadCache reads the cache from path. It returns an empty slice and a nil error when:

  • the file does not exist (first run)
  • any header field mismatches (stale cache: the model, version, or dims changed)

It returns nil and a non-nil error when the file exists but cannot be decoded (corrupt).

type CachedChunk

type CachedChunk struct {
	StartLine int
	EndLine   int
	Vector    []float32
}

CachedChunk holds the embedding vector and line range for a single chunk.

type Chunk

type Chunk struct {
	Text      string // chunk content, prefixed with the page's # Title line
	StartLine int    // 1-indexed start line in the original page content
	EndLine   int    // 1-indexed end line in the original page content (inclusive)
}

Chunk represents a semantically meaningful portion of a page, with line-range tracking for anchoring search results.

func ChunkPage

func ChunkPage(page pages.Page) []Chunk

ChunkPage splits a page into semantically meaningful chunks, using section headings as the primary split strategy and falling back to paragraph breaks when no headings are present. Every chunk is prefixed with the page's "# Title" line. Chunks whose body content is below minChunkTokens tokens are merged with an adjacent chunk.

type Graph

type Graph struct {
	// contains filtered or unexported fields
}

Graph is a bidirectional wikilink graph.

func NewGraph

func NewGraph() *Graph

NewGraph creates an empty Graph.

func (*Graph) Add

func (g *Graph) Add(page pages.Page)

Add adds or replaces a page and its outbound wikilinks in the graph. If the page was previously indexed, old link relationships are cleaned up first.

func (*Graph) LinkedFrom

func (g *Graph) LinkedFrom(name string) []string

LinkedFrom returns the canonical names of pages that link to the given target.

func (*Graph) LinksTo

func (g *Graph) LinksTo(name string) []string

LinksTo returns the canonical names of pages that the given page links to, in the order the links appear in the page source.

func (*Graph) Remove

func (g *Graph) Remove(name string)

Remove removes a page and cleans up all its outbound link relationships.
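The bookkeeping behind a bidirectional link graph can be sketched with one forward map and one reverse map. The [[Target]] / [[Target|label]] wikilink syntax, case-insensitive keys, deduplication of repeated links, and alphabetical ordering of backlinks are all assumptions; graph and its methods are hypothetical.

```go
package main

import (
	"regexp"
	"sort"
	"strings"
)

// Assumed wikilink syntax: [[Target]] or [[Target|label]].
var wikilink = regexp.MustCompile(`\[\[([^\]|]+)(?:\|[^\]]*)?\]\]`)

func linkKey(name string) string { return strings.ToLower(strings.TrimSpace(name)) }

type graph struct {
	canonical map[string]string          // key -> display name
	out       map[string][]string        // key -> outbound keys, source order
	in        map[string]map[string]bool // key -> set of inbound keys
}

func newGraph() *graph {
	return &graph{map[string]string{}, map[string][]string{}, map[string]map[string]bool{}}
}

// remove deletes name's outbound edges and the matching backlinks. The
// canonical entry is kept because other pages may still link to name.
func (g *graph) remove(name string) {
	k := linkKey(name)
	for _, t := range g.out[k] {
		delete(g.in[t], k)
	}
	delete(g.out, k)
}

// add clears old outbound links, then records each distinct target once,
// in first-occurrence order.
func (g *graph) add(name, content string) {
	g.remove(name)
	k := linkKey(name)
	g.canonical[k] = name
	seen := map[string]bool{}
	for _, m := range wikilink.FindAllStringSubmatch(content, -1) {
		t := linkKey(m[1])
		if seen[t] {
			continue
		}
		seen[t] = true
		if _, ok := g.canonical[t]; !ok {
			g.canonical[t] = strings.TrimSpace(m[1])
		}
		g.out[k] = append(g.out[k], t)
		if g.in[t] == nil {
			g.in[t] = map[string]bool{}
		}
		g.in[t][k] = true
	}
}

func (g *graph) linksTo(name string) []string {
	var res []string
	for _, t := range g.out[linkKey(name)] {
		res = append(res, g.canonical[t])
	}
	return res
}

func (g *graph) linkedFrom(name string) []string {
	var res []string
	for k := range g.in[linkKey(name)] {
		res = append(res, g.canonical[k])
	}
	sort.Strings(res) // map iteration order is random
	return res
}
```

Re-adding a page first calls remove, which is what keeps backlinks from going stale when a page's links change.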

type Index

type Index struct {
	// contains filtered or unexported fields
}

Index is the composite search index combining BM25, trigram fuzzy matching, a bidirectional wikilink graph for link-boost, and an optional vector index.

func NewIndex

func NewIndex(model *embed.Model, cachePath string) *Index

NewIndex creates an empty composite Index. When model is non-nil, a VectorIndex is created and wired in; when model is nil, the index behaves as BM25 + trigram + graph only. cachePath is the path to the .memento-vectors sidecar file used for embedding write-through. An empty cachePath disables cache persistence.

func (*Index) Add

func (ix *Index) Add(page pages.Page)

Add indexes a page, replacing any existing entry with the same name. When a cachePath is set and a model is available, the resulting chunk embeddings are written through to the cache file.

func (*Index) AddFromCache

func (ix *Index) AddFromCache(page pages.Page, entry CacheEntry)

AddFromCache indexes a page using pre-computed chunk vectors from the given CacheEntry, bypassing the (expensive) embedding step. The BM25, graph, and trigram sub-indexes are updated exactly as in Add.

func (*Index) LinkedFrom

func (ix *Index) LinkedFrom(name string) []string

LinkedFrom returns the canonical names of pages that link to the given page. It delegates to the underlying graph.

func (*Index) LinksTo

func (ix *Index) LinksTo(name string) []string

LinksTo returns the canonical names of pages that the given page links to. It delegates to the underlying graph.

func (*Index) Remove

func (ix *Index) Remove(name string)

Remove removes a page from all sub-indexes. When a cachePath is set, the cache file is updated (write-through).

func (*Index) Search

func (ix *Index) Search(query string, limit int) []Result

Search executes the full search pipeline and returns up to limit results.

Pipeline (with vector model):

BM25 + vector cosine search → normalize & merge → graph boost → relevance threshold.

Pipeline (nil model, backward-compatible):

BM25 → trigram fallback if fewer than 3 results → graph boost → relevance threshold.
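The "normalize & merge", "graph boost", and "relevance threshold" steps could look roughly like this. The sketch assumes min-max normalization, a weighted sum controlled by a hypothetical alpha, a multiplicative boost for pages in a caller-supplied set (for example, pages linked from a top hit), and a fixed threshold; none of these choices or values are documented, and mergeScores is a hypothetical name. Unlike the real pipeline, it only boosts pages that already matched rather than adding graph-only results.

```go
package main

import "sort"

type hit struct {
	Page  string
	Score float64
}

// minMax rescales scores to [0, 1]; if all values are equal they map to 1.
func minMax(scores map[string]float64) map[string]float64 {
	var lo, hi float64
	first := true
	for _, s := range scores {
		if first {
			lo, hi, first = s, s, false
		}
		if s < lo {
			lo = s
		}
		if s > hi {
			hi = s
		}
	}
	out := make(map[string]float64, len(scores))
	for p, s := range scores {
		if hi == lo {
			out[p] = 1
		} else {
			out[p] = (s - lo) / (hi - lo)
		}
	}
	return out
}

// mergeScores: alpha weights BM25, (1 - alpha) weights the vector score;
// pages in linked are multiplied by (1 + boost); results below threshold
// are dropped and at most limit results are returned, best first.
func mergeScores(bm25, vec map[string]float64, linked map[string]bool,
	alpha, boost, threshold float64, limit int) []hit {
	combined := map[string]float64{}
	for p, s := range minMax(bm25) {
		combined[p] += alpha * s
	}
	for p, s := range minMax(vec) {
		combined[p] += (1 - alpha) * s
	}
	var out []hit
	for p, s := range combined {
		if linked[p] {
			s *= 1 + boost
		}
		if s >= threshold {
			out = append(out, hit{p, s})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].Page < out[j].Page
	})
	if limit >= 0 && len(out) > limit {
		out = out[:limit]
	}
	return out
}
```

Normalizing first matters because BM25 scores are unbounded while cosine similarities are not, so adding them raw would let one signal dominate.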

type Result

type Result struct {
	Page     string
	Score    float64
	Snippet  string
	Line     int
	IsDirect bool // true if the page matched the query directly (BM25), false if graph-boosted only
}

Result is a single result from the composite Index search.

type SearchResult

type SearchResult struct {
	Name  string
	Score float64
}

SearchResult is a single result from a BM25 search.

type Trigram

type Trigram struct {
	// contains filtered or unexported fields
}

Trigram is an in-memory fuzzy-match index based on trigram Jaccard similarity.

func NewTrigram

func NewTrigram() *Trigram

NewTrigram creates an empty trigram index.

func (*Trigram) Add

func (ti *Trigram) Add(term string)

Add adds a term to the trigram index.

func (*Trigram) FuzzyMatch

func (ti *Trigram) FuzzyMatch(query string, threshold float64) []string

FuzzyMatch returns all indexed terms whose Jaccard similarity with query is at or above threshold.
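One plausible way to implement FuzzyMatch without comparing the query against every term is an inverted index from trigrams to terms: the postings lists yield each candidate's intersection size directly, and the union follows from |A| + |B| - |A ∩ B|. Whether the package does this is not documented; trigramIndex and its methods are hypothetical, and trigram extraction here assumes rune-based windows with no case folding.

```go
package main

import "sort"

// grams returns the unique 3-rune windows of term, or the term itself
// when it has fewer than 3 runes.
func grams(term string) map[string]bool {
	r := []rune(term)
	set := map[string]bool{}
	if len(r) < 3 {
		set[term] = true
		return set
	}
	for i := 0; i+3 <= len(r); i++ {
		set[string(r[i:i+3])] = true
	}
	return set
}

type trigramIndex struct {
	postings map[string]map[string]bool // trigram -> terms containing it
	grams    map[string]map[string]bool // term -> its trigram set
}

func newTrigramIndex() *trigramIndex {
	return &trigramIndex{map[string]map[string]bool{}, map[string]map[string]bool{}}
}

func (ti *trigramIndex) add(term string) {
	g := grams(term)
	ti.grams[term] = g
	for t := range g {
		if ti.postings[t] == nil {
			ti.postings[t] = map[string]bool{}
		}
		ti.postings[t][term] = true
	}
}

// fuzzyMatch counts shared trigrams per candidate via the postings, then
// keeps terms whose Jaccard similarity is >= threshold, sorted by name.
func (ti *trigramIndex) fuzzyMatch(query string, threshold float64) []string {
	q := grams(query)
	shared := map[string]int{}
	for t := range q {
		for term := range ti.postings[t] {
			shared[term]++
		}
	}
	var out []string
	for term, inter := range shared {
		union := len(q) + len(ti.grams[term]) - inter
		if float64(inter)/float64(union) >= threshold {
			out = append(out, term)
		}
	}
	sort.Strings(out)
	return out
}
```

Because candidates come from shared trigrams, a term with no trigram in common with the query is never returned. That matches the documented contract for any positive threshold, but not for a threshold of 0 or below.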

type VectorIndex

type VectorIndex struct {
	// contains filtered or unexported fields
}

VectorIndex stores per-chunk embeddings and supports Add/Remove/Search with case-insensitive page-name matching.

func NewVectorIndex

func NewVectorIndex(model *embed.Model) *VectorIndex

NewVectorIndex creates an empty vector index backed by the given embedding model.

func (*VectorIndex) Add

func (vi *VectorIndex) Add(page pages.Page) error

Add chunks the page, embeds each chunk, and stores the resulting vectors. If the page was previously indexed, its old chunks are replaced.

func (*VectorIndex) AddFromCache

func (vi *VectorIndex) AddFromCache(page pages.Page, chunks []CachedChunk)

AddFromCache loads pre-computed chunk vectors for a page, bypassing embedding. If the page was previously indexed, its old chunks are replaced.

func (*VectorIndex) Remove

func (vi *VectorIndex) Remove(name string)

Remove removes all stored chunks for the named page (case-insensitive).

func (*VectorIndex) Search

func (vi *VectorIndex) Search(query string, limit int) []VectorResult

Search embeds the query, scores all stored chunk vectors via cosine similarity, deduplicates to one result per page (best-scoring chunk wins), then returns up to limit results sorted by score descending. Returns nil when the index is empty.
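The scoring and deduplication steps can be sketched over already-embedded vectors, skipping the embed.Model call whose API is not shown here. storedChunk, vectorResult, and searchVectors are hypothetical; the tie-break by page name and the zero score for mismatched or zero-length vectors are assumptions.

```go
package main

import (
	"math"
	"sort"
	"strings"
)

// storedChunk is a hypothetical stand-in for one indexed chunk.
type storedChunk struct {
	Page      string
	StartLine int
	Vector    []float32
}

// vectorResult mirrors the documented VectorResult fields.
type vectorResult struct {
	Page  string
	Score float64
	Line  int
}

// cosine returns the cosine similarity of a and b, or 0 if the lengths
// differ or either vector is all zeros.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// searchVectors keeps the best-scoring chunk per page (names compared
// case-insensitively) and returns up to limit results, best first.
func searchVectors(chunks []storedChunk, query []float32, limit int) []vectorResult {
	if len(chunks) == 0 {
		return nil
	}
	best := map[string]vectorResult{}
	for _, c := range chunks {
		key := strings.ToLower(c.Page)
		s := cosine(query, c.Vector)
		if cur, ok := best[key]; !ok || s > cur.Score {
			best[key] = vectorResult{Page: c.Page, Score: s, Line: c.StartLine}
		}
	}
	out := make([]vectorResult, 0, len(best))
	for _, r := range best {
		out = append(out, r)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].Page < out[j].Page
	})
	if limit >= 0 && len(out) > limit {
		out = out[:limit]
	}
	return out
}
```

Deduplicating before applying limit ensures that one long page with many similar chunks cannot crowd other pages out of the results.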

type VectorResult

type VectorResult struct {
	Page  string
	Score float64 // cosine similarity, range [-1, 1] but typically [0, 1] for text
	Line  int     // 1-indexed start line of the best-matching chunk
}

VectorResult is a single result from a vector similarity search.
