search

package
v1.2.0 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 19, 2026 License: MIT Imports: 8 Imported by: 0

Documentation

Constants

const DefaultConfidenceMultiplier = 2

DefaultConfidenceMultiplier gates hook output: top result must score >= threshold * multiplier.

const DefaultLimit = 5

DefaultLimit is the maximum number of results returned.

const DefaultSingleTokenFloor = 8

DefaultSingleTokenFloor is the minimum score for results with only one matched token. Single-token matches need stronger signal to avoid conversational leakage (e.g., "great work on the refactor" -> only "refactor" survives stop-word filtering).

const DefaultThreshold = 2

DefaultThreshold is the minimum score for an entry to be included in results.

Variables

This section is empty.

Functions

func ApplyGates added in v1.2.0

func ApplyGates(hits []ScoredEntry) (surviving []ScoredEntry, reason GateReason)

ApplyGates applies the hook's three post-scoring gates to a ranked result list. Input must already be sorted by score descending (as ScoreTop guarantees). Returns the surviving entries together with a GateReason naming the gate that suppressed or truncated results; GateReasonNone means all inputs passed through intact.

Gate order (matches scoreTokens inline logic exactly):

  1. Low-confidence: top score < DefaultThreshold * DefaultConfidenceMultiplier → drop all.
  2. Single-token weak: top has <2 matched tokens AND score < DefaultSingleTokenFloor → drop all.
  3. Differential: top < softCeiling AND gap to #2 < minScoreGap → truncate to 1.
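
The gate order above can be sketched as a standalone function. This is an illustrative reimplementation, not the package's code: the scored type is a stand-in for ScoredEntry, softCeiling and minScoreGap are passed as hypothetical parameters because their real values are unexported, and the sketch assumes the differential gate only applies when a second result exists.

```go
package main

import "fmt"

// scored is a stand-in for ScoredEntry (the real type also carries an index.Entry).
type scored struct {
	Name    string
	Score   int
	Matched []string
}

const (
	threshold            = 2 // mirrors DefaultThreshold
	confidenceMultiplier = 2 // mirrors DefaultConfidenceMultiplier
	singleTokenFloor     = 8 // mirrors DefaultSingleTokenFloor
)

// applyGatesSketch follows the documented gate order. hits must be sorted by
// score descending. softCeiling and minScoreGap are hypothetical parameters.
func applyGatesSketch(hits []scored, softCeiling, minScoreGap int) ([]scored, string) {
	if len(hits) == 0 {
		return hits, ""
	}
	top := hits[0]
	// Gate 1: low confidence drops everything.
	if top.Score < threshold*confidenceMultiplier {
		return nil, "low-confidence"
	}
	// Gate 2: a weak single-token match drops everything.
	if len(top.Matched) < 2 && top.Score < singleTokenFloor {
		return nil, "single-token-floor"
	}
	// Gate 3: a modest top score with a narrow lead keeps only the top hit.
	// Assumption: the gate is skipped when there is no second result.
	if len(hits) > 1 && top.Score < softCeiling && top.Score-hits[1].Score < minScoreGap {
		return hits[:1], "differential"
	}
	return hits, ""
}

func main() {
	hits := []scored{
		{Name: "a", Score: 9, Matched: []string{"go", "index"}},
		{Name: "b", Score: 8, Matched: []string{"go"}},
	}
	kept, reason := applyGatesSketch(hits, 12, 3)
	fmt.Println(len(kept), reason) // prints "1 differential"
}
```

With the default constants, the low-confidence cutoff works out to 2 * 2 = 4, so a top score of 3 suppresses the whole list regardless of how many tokens matched.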

func CategoryAlias

func CategoryAlias(token string) (string, bool)

CategoryAlias returns the canonical category for a token alias, if one exists.

func FilterByType

func FilterByType(entries []index.Entry, t index.EntryType) []index.Entry

FilterByType returns only entries matching the given type.

func HasStemMatch

func HasStemMatch(a, b string) bool

HasStemMatch checks whether two words share a stem via prefix overlap. Returns true if the shared prefix is >= 4 bytes and covers >= 75% of the shorter word. Returns false if the words are identical (exact match handled separately).

func Tokenize

func Tokenize(prompt string, stops StopSet) []string

Tokenize lowercases the input, splits on non-alphanumeric boundaries (preserving internal hyphens), deduplicates, and removes stop words and single-char tokens.

Negation pass: tokens immediately following a negation marker (not, no, without, avoid, never, instead) are dropped. Stop words between the marker and the content token are transparent — "not a rust project" drops "rust". Negation markers themselves are always suppressed from output.

Types

type Corpus added in v1.2.0

type Corpus interface {
	Entries() []index.Entry
	IDFMap() map[string]float64
	AvgFieldLength() float64
}

Corpus is the minimal surface search.Run needs from an indexed document set. Exposes IDFMap (whole map) rather than per-token IDF(token) because ScoreExplained consumes the map internally; per-token lookup would force a scorer rewrite with no benefit.

func CorpusFromIndex added in v1.2.0

func CorpusFromIndex(idx *index.Index) Corpus

CorpusFromIndex wraps a loaded *index.Index as a Corpus.

func NewCorpus added in v1.2.0

func NewCorpus(entries []index.Entry, idf map[string]float64, avgFieldLen float64) Corpus

NewCorpus builds a Corpus with caller-supplied IDF and AvgFieldLen. Used when entries are a filtered subset of a larger corpus and callers must preserve the parent corpus's statistics (e.g. search --filter=skill).

type EntryDiagnostics added in v1.0.0

type EntryDiagnostics struct {
	Entry        index.Entry `json:"entry"`
	RawScore     float64     `json:"raw_score"`     // pre-round, post-usage-boost
	FinalScore   int         `json:"final_score"`   // math.Round(RawScore)
	TokenHits    []TokenHit  `json:"token_hits"`    // one per prompt token (includes misses)
	BigramHits   []string    `json:"bigram_hits"`   // prompt bigrams that matched this entry
	BigramDeltas []float64   `json:"bigram_deltas"` // IDF-weighted bonus per matched bigram (parallel to BigramHits)
	MaxIDF       float64     `json:"max_idf"`       // best per-term IDF contribution (for gate)
	UsageBoost   float64     `json:"usage_boost"`   // multiplier applied (1.0 if hit_count=0)
	PreBoostSum  float64     `json:"pre_boost_sum"` // score before applyUsageBoost
	Suppressed   string      `json:"suppressed,omitzero"`
}

EntryDiagnostics captures everything the scorer computed for a single entry. Populated unconditionally; Score discards the details, ScoreExplained keeps them.

Suppressed is the human-readable reason this entry would NOT appear in Score's output. Empty string means the entry passed all gates. Values:

"below threshold"     — RawScore (post-boost, pre-round) < threshold
"idf gate: low-idf"   — only matched via common terms, no bigram

Never set by the scorer itself — filled in by ScoreExplained based on the same conditions Score uses.
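
One consequence of the field comments worth spelling out: FinalScore is the rounded RawScore, but the "below threshold" check is described as using the unrounded value. The sketch below assumes exactly that and nothing more; finalAndSuppressed is a hypothetical helper and the IDF gate is left out because its precise condition is not documented here.

```go
package main

import (
	"fmt"
	"math"
)

// finalAndSuppressed rounds raw the way FinalScore is documented to, and
// reports "below threshold" when the unrounded score is under the threshold.
func finalAndSuppressed(raw float64, threshold int) (int, string) {
	final := int(math.Round(raw))
	if raw < float64(threshold) {
		return final, "below threshold"
	}
	return final, ""
}

func main() {
	final, reason := finalAndSuppressed(1.6, 2)
	fmt.Printf("%d %q\n", final, reason) // prints: 2 "below threshold"
	final, reason = finalAndSuppressed(2.4, 2)
	fmt.Printf("%d %q\n", final, reason) // prints: 2 ""
}
```

Under this reading, an entry can display a FinalScore equal to the threshold and still be suppressed, because 1.6 rounds up to 2 while remaining below the cutoff.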

func ScoreExplained added in v1.0.0

func ScoreExplained(entries []index.Entry, tokens []string, threshold int, idf map[string]float64, avgdl float64) []EntryDiagnostics

ScoreExplained returns diagnostics for every entry, including suppressed ones. Unlike Score, it does not filter by threshold or the IDF gate — callers get the complete picture, with Suppressed populated for entries Score would drop. Sort order: kept entries (score desc, name asc) then suppressed entries (score desc, name asc).

When tokens is empty, returns nil (same contract as Score).
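
The documented sort order (kept entries first, then suppressed, each group by score descending and name ascending) could be expressed with a single comparator. The diag type below is a stand-in for EntryDiagnostics; its Name field is hypothetical, and sorting on FinalScore rather than RawScore is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// diag is a stand-in for EntryDiagnostics with only the fields the sort needs.
type diag struct {
	Name       string // hypothetical: stands in for the entry's name
	FinalScore int
	Suppressed string
}

// sortDiagnostics orders kept entries before suppressed ones, then by score
// descending, then by name ascending.
func sortDiagnostics(ds []diag) {
	sort.SliceStable(ds, func(i, j int) bool {
		keptI, keptJ := ds[i].Suppressed == "", ds[j].Suppressed == ""
		if keptI != keptJ {
			return keptI
		}
		if ds[i].FinalScore != ds[j].FinalScore {
			return ds[i].FinalScore > ds[j].FinalScore
		}
		return ds[i].Name < ds[j].Name
	})
}

func main() {
	ds := []diag{
		{Name: "zeta", FinalScore: 5},
		{Name: "alpha", FinalScore: 1, Suppressed: "below threshold"},
		{Name: "beta", FinalScore: 5},
		{Name: "gamma", FinalScore: 3, Suppressed: "idf gate: low-idf"},
	}
	sortDiagnostics(ds)
	for _, d := range ds {
		fmt.Println(d.Name) // prints beta, zeta, gamma, alpha on separate lines
	}
}
```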

type GateReason added in v1.2.0

type GateReason string

GateReason identifies why ApplyGates suppressed or truncated results.

const (
	GateReasonNone             GateReason = ""
	GateReasonLowConfidence    GateReason = "low-confidence"
	GateReasonSingleTokenFloor GateReason = "single-token-floor"
	GateReasonDifferential     GateReason = "differential"
)

type InMemoryCorpus added in v1.2.0

type InMemoryCorpus struct {
	// contains filtered or unexported fields
}

InMemoryCorpus is a fixture Corpus. NewInMemoryCorpus computes IDF and AvgFieldLen from the supplied entries.

func NewInMemoryCorpus added in v1.2.0

func NewInMemoryCorpus(entries ...index.Entry) *InMemoryCorpus

NewInMemoryCorpus builds a Corpus from raw entries, computing IDF and AvgFieldLen fresh. Use for standalone test fixtures.

func (*InMemoryCorpus) AvgFieldLength added in v1.2.0

func (c *InMemoryCorpus) AvgFieldLength() float64

func (*InMemoryCorpus) Entries added in v1.2.0

func (c *InMemoryCorpus) Entries() []index.Entry

func (*InMemoryCorpus) IDFMap added in v1.2.0

func (c *InMemoryCorpus) IDFMap() map[string]float64

type PipelineInput added in v1.2.0

type PipelineInput struct {
	Tokens     []string
	Corpus     Corpus
	Threshold  int
	Limit      int
	ApplyGates bool
}

PipelineInput is the unified input for Run. Every caller (hook, actions/search, actions/explain) composes this and reads the result.

type PipelineResult added in v1.2.0

type PipelineResult struct {
	Tokens         []string
	Diagnostics    []EntryDiagnostics
	Scored         []ScoredEntry
	AboveThreshold []ScoredEntry
	Surviving      []ScoredEntry
	Suppression    GateReason
}

PipelineResult carries every stage so callers render whichever view they need.

func Run added in v1.2.0

Run executes the full scoring pipeline. Pure function: no I/O, no globals.

type ScoredEntry

type ScoredEntry struct {
	Entry   index.Entry `json:"entry"`
	Score   int         `json:"score"`
	Matched []string    `json:"matched"`
}

ScoredEntry pairs an entry with its relevance score.

func Score

func Score(entries []index.Entry, tokens []string, threshold int, idf map[string]float64, avgdl float64) []ScoredEntry

Score computes relevance scores for all entries against the tokenized prompt. Uses field-weighted keywords, IDF multipliers, BM25 saturation, and bigrams. Returns entries with score >= threshold, sorted by score descending.

func ScoreTop

func ScoreTop(entries []index.Entry, tokens []string, threshold, limit int, idf map[string]float64, avgdl float64) []ScoredEntry

ScoreTop returns at most limit results.

type StopSet

type StopSet map[string]struct{}

StopSet is a set of words to filter during tokenization.

func DefaultStopWords

func DefaultStopWords() StopSet

DefaultStopWords returns the built-in stop word list.

func (StopSet) Contains

func (s StopSet) Contains(word string) bool

Contains reports whether the set includes the given word.
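
Because StopSet is a plain map[string]struct{}, a custom set can be built with a map literal. The sketch below redeclares the type locally so it runs on its own; the body of Contains is an assumed implementation consistent with the documented behavior.

```go
package main

import "fmt"

// StopSet is redeclared here so the example is self-contained.
type StopSet map[string]struct{}

// Contains reports whether the set includes word (assumed implementation).
func (s StopSet) Contains(word string) bool {
	_, ok := s[word]
	return ok
}

func main() {
	stops := StopSet{"the": {}, "a": {}}
	fmt.Println(stops.Contains("the"), stops.Contains("refactor")) // prints "true false"
}
```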

type TokenHit added in v1.0.0

type TokenHit struct {
	Token    string  `json:"token"`
	Kind     string  `json:"kind"`
	Weight   int     `json:"weight,omitzero"`    // field weight from entry.Keywords (0 for alias-only, miss)
	IDF      float64 `json:"idf"`                // IDF multiplier used (1.0 if no IDF map)
	Delta    float64 `json:"delta"`              // this token's contribution to score
	AliasCat string  `json:"alias_cat,omitzero"` // canonical category alias resolved to, if any
}

TokenHit is one prompt token's interaction with one entry. Kind values: "direct" | "plural" | "stem" | "alias" | "alias+direct" | "alias+plural" | "alias+stem" | "miss". Multiple mechanisms may fire for one token (alias + keyword additive); Kind joins them with "+" in that case.
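
To make the Kind vocabulary concrete, here is a small sketch of how the listed values could be derived from two signals: whether an alias fired, and which keyword mechanism (if any) matched. The helper kindOf and its parameters are hypothetical; only the resulting strings come from the documentation above.

```go
package main

import "fmt"

// kindOf builds a Kind string. keyword is "", "direct", "plural", or "stem";
// alias reports whether a category alias also matched.
func kindOf(alias bool, keyword string) string {
	switch {
	case alias && keyword != "":
		return "alias+" + keyword // both mechanisms fired
	case alias:
		return "alias"
	case keyword != "":
		return keyword
	default:
		return "miss"
	}
}

func main() {
	fmt.Println(kindOf(true, "stem"))    // prints "alias+stem"
	fmt.Println(kindOf(false, "direct")) // prints "direct"
	fmt.Println(kindOf(false, ""))       // prints "miss"
}
```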
