Documentation ¶
Index ¶
- Constants
- func ApplyGates(hits []ScoredEntry) (surviving []ScoredEntry, reason GateReason)
- func CategoryAlias(token string) (string, bool)
- func FilterByType(entries []index.Entry, t index.EntryType) []index.Entry
- func HasStemMatch(a, b string) bool
- func Tokenize(prompt string, stops StopSet) []string
- type Corpus
- type EntryDiagnostics
- type GateReason
- type InMemoryCorpus
- type PipelineInput
- type PipelineResult
- type ScoredEntry
- type StopSet
- type TokenHit
Constants ¶
const DefaultConfidenceMultiplier = 2
DefaultConfidenceMultiplier gates hook output: top result must score >= threshold * multiplier.
const DefaultLimit = 5
DefaultLimit is the maximum number of results returned.
const DefaultSingleTokenFloor = 8
DefaultSingleTokenFloor is the minimum score for results with only one matched token. Single-token matches need stronger signal to avoid conversational leakage (e.g., "great work on the refactor" -> only "refactor" survives stop-word filtering).
const DefaultThreshold = 2
DefaultThreshold is the minimum score for an entry to be included in results.
Variables ¶
This section is empty.
Functions ¶
func ApplyGates ¶ added in v1.2.0
func ApplyGates(hits []ScoredEntry) (surviving []ScoredEntry, reason GateReason)
ApplyGates applies the hook's three post-scoring gates to a ranked result list. Input must already be sorted by score descending (as ScoreTop guarantees). Returns the surviving entries plus a GateReason naming the gate that suppressed or truncated results; GateReasonNone means all inputs passed through intact.
Gate order (matches scoreTokens inline logic exactly):
- Low-confidence: top score < DefaultThreshold * DefaultConfidenceMultiplier → drop all.
- Single-token weak: top has <2 matched tokens AND score < DefaultSingleTokenFloor → drop all.
- Differential: top < softCeiling AND gap to #2 < minScoreGap → truncate to 1.
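The gate order above can be sketched as a standalone function. This is a minimal illustration, not the package's source: the lowercase names are hypothetical stand-ins, the values of softCeiling and minScoreGap are placeholders (the real ones are unexported and not documented here), and the handling of empty input and of a single-entry list in the differential gate are assumptions.

```go
package main

import "fmt"

type gateReason string

const (
	reasonNone          gateReason = ""
	reasonLowConfidence gateReason = "low-confidence"
	reasonSingleToken   gateReason = "single-token-floor"
	reasonDifferential  gateReason = "differential"
)

// Documented defaults from the Constants section.
const (
	threshold            = 2
	confidenceMultiplier = 2
	singleTokenFloor     = 8
)

// Placeholder values: assumptions for illustration only.
const (
	softCeiling = 12
	minScoreGap = 2
)

// scoredEntry is a trimmed stand-in for ScoredEntry.
type scoredEntry struct {
	Name    string
	Score   int
	Matched []string
}

// applyGates expects hits sorted by score descending.
func applyGates(hits []scoredEntry) ([]scoredEntry, gateReason) {
	if len(hits) == 0 {
		return hits, reasonNone // assumption: nothing to gate
	}
	top := hits[0]
	// Gate 1: low confidence drops everything.
	if top.Score < threshold*confidenceMultiplier {
		return nil, reasonLowConfidence
	}
	// Gate 2: a weak single-token match drops everything.
	if len(top.Matched) < 2 && top.Score < singleTokenFloor {
		return nil, reasonSingleToken
	}
	// Gate 3: a modest top score with a close runner-up keeps only the top.
	if len(hits) > 1 && top.Score < softCeiling && top.Score-hits[1].Score < minScoreGap {
		return hits[:1], reasonDifferential
	}
	return hits, reasonNone
}

func main() {
	hits := []scoredEntry{
		{Name: "a", Score: 9, Matched: []string{"go", "test"}},
		{Name: "b", Score: 8, Matched: []string{"go"}},
	}
	kept, reason := applyGates(hits)
	fmt.Println(len(kept), reason) // 1 differential (with the placeholder values)
}
```

With the documented defaults, the low-confidence cutoff works out to 2 * 2 = 4, so a top score of 3 is dropped regardless of how many tokens matched.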
func CategoryAlias ¶
func CategoryAlias(token string) (string, bool)
CategoryAlias returns the canonical category for a token alias, if one exists.
func FilterByType ¶
func FilterByType(entries []index.Entry, t index.EntryType) []index.Entry
FilterByType returns only entries matching the given type.
func HasStemMatch ¶
func HasStemMatch(a, b string) bool
HasStemMatch checks whether two words share a stem via prefix overlap. Returns true if the shared prefix is >= 4 bytes and covers >= 75% of the shorter word. Returns false if the words are identical (exact match handled separately).
func Tokenize ¶
func Tokenize(prompt string, stops StopSet) []string
Tokenize lowercases the input, splits on non-alphanumeric boundaries (preserving internal hyphens), deduplicates, and removes stop words and single-char tokens.
Negation pass: tokens immediately following a negation marker (not, no, without, avoid, never, instead) are dropped. Stop words between the marker and the content token are transparent — "not a rust project" drops "rust". Negation markers themselves are always suppressed from output.
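The following sketch shows one way the described tokenization and negation pass could fit together. It is an approximation written from the description above, with hypothetical names; details such as the exact order of the filtering steps and the treatment of leading or trailing hyphens are assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// negations lists the markers named in the documentation.
var negations = map[string]struct{}{
	"not": {}, "no": {}, "without": {}, "avoid": {}, "never": {}, "instead": {},
}

func tokenize(prompt string, stops map[string]struct{}) []string {
	// Split on anything that is not a letter, digit, or hyphen.
	words := strings.FieldsFunc(strings.ToLower(prompt), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '-'
	})
	var out []string
	seen := map[string]struct{}{}
	negating := false
	for _, w := range words {
		w = strings.Trim(w, "-") // assumption: only internal hyphens are kept
		if w == "" {
			continue
		}
		if _, ok := negations[w]; ok {
			negating = true // markers never appear in the output
			continue
		}
		if _, ok := stops[w]; ok {
			continue // stop words are transparent to a pending negation
		}
		if negating {
			negating = false
			continue // drop the negated content token
		}
		if len(w) < 2 {
			continue // single-char tokens are removed
		}
		if _, dup := seen[w]; dup {
			continue
		}
		seen[w] = struct{}{}
		out = append(out, w)
	}
	return out
}

func main() {
	stops := map[string]struct{}{"this": {}, "is": {}, "a": {}, "use": {}}
	fmt.Println(tokenize("This is not a Rust project, use well-known Go tools", stops))
	// [project well-known go tools]
}
```

In the example, "not" arms the negation, the stop word "a" is skipped without disarming it, and "rust" is the content token that gets dropped.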
Types ¶
type Corpus ¶ added in v1.2.0
type Corpus interface {
Entries() []index.Entry
IDFMap() map[string]float64
AvgFieldLength() float64
}
Corpus is the minimal surface search.Run needs from an indexed document set. Exposes IDFMap (whole map) rather than per-token IDF(token) because ScoreExplained consumes the map internally; per-token lookup would force a scorer rewrite with no benefit.
func CorpusFromIndex ¶ added in v1.2.0
CorpusFromIndex wraps a loaded *index.Index as a Corpus.
type EntryDiagnostics ¶ added in v1.0.0
type EntryDiagnostics struct {
Entry index.Entry `json:"entry"`
RawScore float64 `json:"raw_score"` // pre-round, post-usage-boost
FinalScore int `json:"final_score"` // math.Round(RawScore)
TokenHits []TokenHit `json:"token_hits"` // one per prompt token (includes misses)
BigramHits []string `json:"bigram_hits"` // prompt bigrams that matched this entry
BigramDeltas []float64 `json:"bigram_deltas"` // IDF-weighted bonus per matched bigram (parallel to BigramHits)
MaxIDF float64 `json:"max_idf"` // best per-term IDF contribution (for gate)
UsageBoost float64 `json:"usage_boost"` // multiplier applied (1.0 if hit_count=0)
PreBoostSum float64 `json:"pre_boost_sum"` // score before applyUsageBoost
Suppressed string `json:"suppressed,omitzero"`
}
EntryDiagnostics captures everything the scorer computed for a single entry. Populated unconditionally; Score discards the details, ScoreExplained keeps them.
Suppressed is the human-readable reason this entry would NOT appear in Score's output. Empty string means the entry passed all gates. Values:
- "below threshold": RawScore (post-boost, pre-round) < threshold
- "idf gate: low-idf": only matched via common terms, no bigram
Never set by the scorer itself — filled in by ScoreExplained based on the same conditions Score uses.
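One consequence of comparing the unrounded RawScore against the threshold is that an entry can round up to the threshold and still be suppressed. The sketch below illustrates this under two assumptions that the field comments suggest but do not state outright: that RawScore is PreBoostSum multiplied by UsageBoost, and that FinalScore is math.Round of that product. The finalize helper is hypothetical, and it omits the IDF gate.

```go
package main

import (
	"fmt"
	"math"
)

// finalize mirrors the relationship between the score fields of
// EntryDiagnostics as read from their comments (an assumption).
func finalize(preBoostSum, usageBoost float64, threshold int) (raw float64, final int, suppressed string) {
	raw = preBoostSum * usageBoost // assumption: the boost is a plain multiplier
	final = int(math.Round(raw))
	if raw < float64(threshold) {
		suppressed = "below threshold" // compared pre-round, post-boost
	}
	return raw, final, suppressed
}

func main() {
	raw, final, why := finalize(1.5, 1.1, 2)
	fmt.Printf("%.2f %d %q\n", raw, final, why) // 1.65 2 "below threshold"
}
```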
func ScoreExplained ¶ added in v1.0.0
func ScoreExplained(entries []index.Entry, tokens []string, threshold int, idf map[string]float64, avgdl float64) []EntryDiagnostics
ScoreExplained returns diagnostics for every entry, including suppressed ones. Unlike Score, it does not filter by threshold or the IDF gate — callers get the complete picture, with Suppressed populated for entries Score would drop. Sort order: kept entries (score desc, name asc) then suppressed entries (score desc, name asc).
When tokens is empty, returns nil (same contract as Score).
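The documented sort order (kept entries first, then suppressed, each group by score descending and name ascending) could be expressed as a single comparator. This is a sketch with a trimmed stand-in type; using FinalScore as the sort key and a flat Name field are assumptions, since the real name lives inside index.Entry.

```go
package main

import (
	"fmt"
	"sort"
)

// diag is a hypothetical, reduced form of EntryDiagnostics.
type diag struct {
	Name       string
	FinalScore int
	Suppressed string
}

func sortDiagnostics(ds []diag) {
	sort.SliceStable(ds, func(i, j int) bool {
		a, b := ds[i], ds[j]
		aKept, bKept := a.Suppressed == "", b.Suppressed == ""
		if aKept != bKept {
			return aKept // kept entries sort before suppressed ones
		}
		if a.FinalScore != b.FinalScore {
			return a.FinalScore > b.FinalScore // score descending
		}
		return a.Name < b.Name // name ascending as the tiebreak
	})
}

func main() {
	ds := []diag{
		{"zeta", 5, ""},
		{"low", 9, "idf gate: low-idf"},
		{"alpha", 5, ""},
		{"beta", 7, ""},
		{"cut", 1, "below threshold"},
	}
	sortDiagnostics(ds)
	for _, d := range ds {
		fmt.Println(d.Name, d.FinalScore, d.Suppressed)
	}
}
```

Note that a suppressed entry can outscore every kept entry (as "low" does here) and still sort after them.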
type GateReason ¶ added in v1.2.0
type GateReason string
GateReason identifies why ApplyGates suppressed or truncated results.
const (
GateReasonNone GateReason = ""
GateReasonLowConfidence GateReason = "low-confidence"
GateReasonSingleTokenFloor GateReason = "single-token-floor"
GateReasonDifferential GateReason = "differential"
)
type InMemoryCorpus ¶ added in v1.2.0
type InMemoryCorpus struct {
// contains filtered or unexported fields
}
InMemoryCorpus is a fixture Corpus. NewInMemoryCorpus computes the IDF map and the average field length from the supplied entries.
func NewInMemoryCorpus ¶ added in v1.2.0
func NewInMemoryCorpus(entries ...index.Entry) *InMemoryCorpus
NewInMemoryCorpus builds a Corpus from raw entries, computing the IDF map and the average field length fresh. Use for standalone test fixtures.
func (*InMemoryCorpus) AvgFieldLength ¶ added in v1.2.0
func (c *InMemoryCorpus) AvgFieldLength() float64
func (*InMemoryCorpus) Entries ¶ added in v1.2.0
func (c *InMemoryCorpus) Entries() []index.Entry
func (*InMemoryCorpus) IDFMap ¶ added in v1.2.0
func (c *InMemoryCorpus) IDFMap() map[string]float64
type PipelineInput ¶ added in v1.2.0
PipelineInput is the unified input for Run. Every caller (hook, actions/search, actions/explain) composes this and reads the result.
type PipelineResult ¶ added in v1.2.0
type PipelineResult struct {
Tokens []string
Diagnostics []EntryDiagnostics
Scored []ScoredEntry
AboveThreshold []ScoredEntry
Surviving []ScoredEntry
Suppression GateReason
}
PipelineResult carries every stage so callers render whichever view they need.
func Run ¶ added in v1.2.0
func Run(in PipelineInput) PipelineResult
Run executes the full scoring pipeline. Pure function: no I/O, no globals.
type ScoredEntry ¶
type ScoredEntry struct {
Entry index.Entry `json:"entry"`
Score int `json:"score"`
Matched []string `json:"matched"`
}
ScoredEntry pairs an entry with its relevance score.
func Score ¶
func Score(entries []index.Entry, tokens []string, threshold int, idf map[string]float64, avgdl float64) []ScoredEntry
Score computes relevance scores for all entries against the tokenized prompt. Uses field-weighted keywords, IDF multipliers, BM25 saturation, and bigrams. Returns entries with score >= threshold, sorted by score descending.
type StopSet ¶
type StopSet map[string]struct{}
StopSet is a set of words to filter during tokenization.
func DefaultStopWords ¶
func DefaultStopWords() StopSet
DefaultStopWords returns the built-in stop word list.
type TokenHit ¶ added in v1.0.0
type TokenHit struct {
Token string `json:"token"`
Kind string `json:"kind"`
Weight int `json:"weight,omitzero"` // field weight from entry.Keywords (0 for alias-only, miss)
IDF float64 `json:"idf"` // IDF multiplier used (1.0 if no IDF map)
Delta float64 `json:"delta"` // this token's contribution to score
AliasCat string `json:"alias_cat,omitzero"` // canonical category alias resolved to, if any
}
TokenHit is one prompt token's interaction with one entry. Kind values: "direct" | "plural" | "stem" | "alias" | "alias+direct" | "alias+plural" | "alias+stem" | "miss". Multiple mechanisms may fire for one token (alias + keyword additive); Kind joins them with "+" in that case.
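The Kind values listed above follow a simple composition rule, sketched here with a hypothetical helper. The two inputs (whether an alias fired, and which keyword mechanism matched, if any) are assumptions about how the scorer tracks its matches; only the resulting strings come from the documentation.

```go
package main

import "fmt"

// hitKind builds a Kind string. keywordKind is "", "direct", "plural",
// or "stem"; aliasFired reports whether the token resolved to a category alias.
func hitKind(aliasFired bool, keywordKind string) string {
	switch {
	case aliasFired && keywordKind != "":
		return "alias+" + keywordKind // both mechanisms fired
	case aliasFired:
		return "alias"
	case keywordKind != "":
		return keywordKind
	default:
		return "miss"
	}
}

func main() {
	fmt.Println(hitKind(true, "stem"))    // alias+stem
	fmt.Println(hitKind(false, "direct")) // direct
	fmt.Println(hitKind(false, ""))       // miss
}
```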