package search

v1.2.0 (latest) · Published: Apr 19, 2026 · License: MIT · Imports: 8 · Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/dotcommander/claudette

Documentation

Index

Constants
func ApplyGates(hits []ScoredEntry) (surviving []ScoredEntry, reason GateReason)
func CategoryAlias(token string) (string, bool)
func FilterByType(entries []index.Entry, t index.EntryType) []index.Entry
func HasStemMatch(a, b string) bool
func Tokenize(prompt string, stops StopSet) []string
type Corpus
- func CorpusFromIndex(idx *index.Index) Corpus
- func NewCorpus(entries []index.Entry, idf map[string]float64, avgFieldLen float64) Corpus
type EntryDiagnostics
- func ScoreExplained(entries []index.Entry, tokens []string, threshold int, idf map[string]float64, ...) []EntryDiagnostics
type GateReason
type InMemoryCorpus
- func NewInMemoryCorpus(entries ...index.Entry) *InMemoryCorpus
- func (c *InMemoryCorpus) AvgFieldLength() float64
- func (c *InMemoryCorpus) Entries() []index.Entry
- func (c *InMemoryCorpus) IDFMap() map[string]float64
type PipelineInput
type PipelineResult
- func Run(in PipelineInput) PipelineResult
type ScoredEntry
- func Score(entries []index.Entry, tokens []string, threshold int, idf map[string]float64, ...) []ScoredEntry
- func ScoreTop(entries []index.Entry, tokens []string, threshold, limit int, ...) []ScoredEntry
type StopSet
- func DefaultStopWords() StopSet
- func (s StopSet) Contains(word string) bool
type TokenHit

Constants

const DefaultConfidenceMultiplier = 2

DefaultConfidenceMultiplier gates hook output: the top result must score >= threshold * multiplier (DefaultThreshold * DefaultConfidenceMultiplier = 4 with the default values).

const DefaultLimit = 5

DefaultLimit is the maximum number of results returned.

const DefaultSingleTokenFloor = 8

DefaultSingleTokenFloor is the minimum score for results with only one matched token. Single-token matches need a stronger signal to avoid conversational leakage (e.g., in "great work on the refactor", only "refactor" survives stop-word filtering).

const DefaultThreshold = 2

DefaultThreshold is the minimum score for an entry to be included in results.

Variables

This section is empty.

Functions

func ApplyGates (added in v1.2.0)

func ApplyGates(hits []ScoredEntry) (surviving []ScoredEntry, reason GateReason)

ApplyGates applies the hook's three post-scoring gates to a ranked result list. Input must already be sorted by score descending (as ScoreTop guarantees). Returns the surviving entries and the GateReason if any results were suppressed or truncated; GateReasonNone means all inputs passed through intact.

Gate order (matches scoreTokens inline logic exactly):

1. Low-confidence: top score < DefaultThreshold * DefaultConfidenceMultiplier → drop all.
2. Single-token weak: the top result has fewer than 2 matched tokens AND its score < DefaultSingleTokenFloor → drop all.
3. Differential: top score < softCeiling AND the gap to #2 < minScoreGap → truncate to 1 (softCeiling and minScoreGap are unexported).
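The gate sequence above can be sketched as a standalone Go function. This is an illustration, not the package's code: `hit` is a hypothetical stand-in for ScoredEntry, the reason strings mirror the GateReason constants, and because softCeiling and minScoreGap are unexported, the values used here (20 and 3) are placeholders chosen only for the example. The sketch also assumes the differential gate is skipped when there is no #2 result.

```go
package main

import "fmt"

// Mirrors of the documented exported constants.
const (
	DefaultThreshold            = 2
	DefaultConfidenceMultiplier = 2
	DefaultSingleTokenFloor     = 8
)

// Hypothetical placeholders: the real softCeiling and minScoreGap are
// unexported and their values are not documented.
const (
	softCeiling = 20
	minScoreGap = 3
)

// hit is a hypothetical stand-in for ScoredEntry.
type hit struct {
	Name    string
	Score   int
	Matched []string
}

// applyGates applies the three gates in the documented order.
// hits must already be sorted by Score descending.
func applyGates(hits []hit) ([]hit, string) {
	if len(hits) == 0 {
		return hits, ""
	}
	top := hits[0]
	// 1. Low-confidence: the top score must reach threshold * multiplier.
	if top.Score < DefaultThreshold*DefaultConfidenceMultiplier {
		return nil, "low-confidence"
	}
	// 2. Single-token weak: one matched token must clear the higher floor.
	if len(top.Matched) < 2 && top.Score < DefaultSingleTokenFloor {
		return nil, "single-token-floor"
	}
	// 3. Differential: a modest top score that barely beats #2 is kept alone.
	if len(hits) > 1 && top.Score < softCeiling && top.Score-hits[1].Score < minScoreGap {
		return hits[:1], "differential"
	}
	return hits, ""
}

func main() {
	kept, reason := applyGates([]hit{
		{"a", 9, []string{"go", "test"}},
		{"b", 8, []string{"go"}},
	})
	fmt.Println(len(kept), reason) // 1 differential
}
```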

func CategoryAlias

func CategoryAlias(token string) (string, bool)

CategoryAlias returns the canonical category for a token alias, if one exists.

func FilterByType

func FilterByType(entries []index.Entry, t index.EntryType) []index.Entry

FilterByType returns only entries matching the given type.
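A sketch of the filter, with hypothetical `entry` and `entryType` types standing in for index.Entry and index.EntryType (whose fields are not shown in this documentation). Whether the package allocates a fresh slice, as this version does, is an assumption.

```go
package main

import "fmt"

// entryType and entry are hypothetical stand-ins for index.EntryType
// and index.Entry.
type entryType string

type entry struct {
	Name string
	Type entryType
}

// filterByType returns a new slice holding only entries of type t,
// leaving the input slice untouched.
func filterByType(entries []entry, t entryType) []entry {
	var out []entry
	for _, e := range entries {
		if e.Type == t {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	es := []entry{{"a", "skill"}, {"b", "agent"}, {"c", "skill"}}
	fmt.Println(len(filterByType(es, "skill"))) // 2
}
```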

func HasStemMatch

func HasStemMatch(a, b string) bool

HasStemMatch checks whether two words share a stem via prefix overlap. It returns true if the shared prefix is >= 4 bytes and covers >= 75% of the shorter word, and false if the words are identical (exact matches are handled separately).

func Tokenize

func Tokenize(prompt string, stops StopSet) []string

Tokenize lowercases the input, splits on non-alphanumeric boundaries (preserving internal hyphens), deduplicates, and removes stop words and single-char tokens.

Negation pass: the content token immediately following a negation marker (not, no, without, avoid, never, instead) is dropped. Stop words between the marker and that content token are skipped over, so "not a rust project" drops "rust". Negation markers themselves are always suppressed from output.
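The tokenization steps can be approximated as follows. This is a sketch under stated assumptions: the stop list is a hypothetical five-word set rather than DefaultStopWords, internal hyphens are kept by splitting only on runes that are not letters, digits, or '-' and then trimming outer hyphens, and "single-char" is measured in runes. The package's exact ordering of deduplication and negation is not documented.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// stops is a hypothetical, deliberately tiny stop list; the package ships
// its own through DefaultStopWords.
var stops = map[string]struct{}{
	"a": {}, "on": {}, "the": {}, "great": {}, "work": {},
}

// negations lists the markers named in the documentation.
var negations = map[string]struct{}{
	"not": {}, "no": {}, "without": {}, "avoid": {}, "never": {}, "instead": {},
}

// tokenize lowercases the prompt, splits it on runes that are not letters,
// digits, or hyphens, strips outer hyphens, skips stop words and
// single-character tokens, drops the content token that follows a negation
// marker, and deduplicates in first-seen order.
func tokenize(prompt string) []string {
	fields := strings.FieldsFunc(strings.ToLower(prompt), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '-'
	})
	var out []string
	seen := make(map[string]bool)
	negated := false
	for _, f := range fields {
		tok := strings.Trim(f, "-") // keep internal hyphens only
		if _, ok := negations[tok]; ok {
			negated = true // markers themselves are never emitted
			continue
		}
		if _, ok := stops[tok]; ok || utf8.RuneCountInString(tok) < 2 {
			continue // skipped without clearing a pending negation
		}
		if negated {
			negated = false // this content token is negated away
			continue
		}
		if !seen[tok] {
			seen[tok] = true
			out = append(out, tok)
		}
	}
	return out
}

func main() {
	fmt.Println(tokenize("not a rust project"))          // [project]
	fmt.Println(tokenize("Great work on the refactor!")) // [refactor]
}
```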

Types

type Corpus (added in v1.2.0)

type Corpus interface {
	Entries() []index.Entry
	IDFMap() map[string]float64
	AvgFieldLength() float64
}

Corpus is the minimal surface search.Run needs from an indexed document set. It exposes the whole IDF map through IDFMap rather than a per-token IDF(token) lookup, because ScoreExplained consumes the map internally; a per-token method would force a scorer rewrite with no benefit.

func CorpusFromIndex (added in v1.2.0)

func CorpusFromIndex(idx *index.Index) Corpus

CorpusFromIndex wraps a loaded *index.Index as a Corpus.

func NewCorpus (added in v1.2.0)

func NewCorpus(entries []index.Entry, idf map[string]float64, avgFieldLen float64) Corpus

NewCorpus builds a Corpus with caller-supplied IDF and AvgFieldLen. Use it when the entries are a filtered subset of a larger corpus and the parent corpus's statistics must be preserved (e.g. search --filter=skill).

type EntryDiagnostics (added in v1.0.0)

type EntryDiagnostics struct {
	Entry        index.Entry `json:"entry"`
	RawScore     float64     `json:"raw_score"`     // pre-round, post-usage-boost
	FinalScore   int         `json:"final_score"`   // math.Round(RawScore)
	TokenHits    []TokenHit  `json:"token_hits"`    // one per prompt token (includes misses)
	BigramHits   []string    `json:"bigram_hits"`   // prompt bigrams that matched this entry
	BigramDeltas []float64   `json:"bigram_deltas"` // IDF-weighted bonus per matched bigram (parallel to BigramHits)
	MaxIDF       float64     `json:"max_idf"`       // best per-term IDF contribution (for gate)
	UsageBoost   float64     `json:"usage_boost"`   // multiplier applied (1.0 if hit_count=0)
	PreBoostSum  float64     `json:"pre_boost_sum"` // score before applyUsageBoost
	Suppressed   string      `json:"suppressed,omitzero"`
}

EntryDiagnostics captures everything the scorer computed for a single entry. Populated unconditionally; Score discards the details, ScoreExplained keeps them.

Suppressed is the human-readable reason this entry would NOT appear in Score's output. An empty string means the entry passed all gates. Values:

- "below threshold": RawScore (post-boost, pre-round) < threshold
- "idf gate: low-idf": the entry matched only via common terms, with no bigram match

The scorer itself never sets this field; ScoreExplained fills it in based on the same conditions Score uses.

func ScoreExplained (added in v1.0.0)

func ScoreExplained(entries []index.Entry, tokens []string, threshold int, idf map[string]float64, avgdl float64) []EntryDiagnostics

ScoreExplained returns diagnostics for every entry, including suppressed ones. Unlike Score, it does not filter by threshold or the IDF gate — callers get the complete picture, with Suppressed populated for entries Score would drop. Sort order: kept entries (score desc, name asc) then suppressed entries (score desc, name asc).

When tokens is empty, returns nil (same contract as Score).
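The documented ScoreExplained sort order maps onto a single comparator. This sketch uses the Go 1.21+ `slices` and `cmp` packages; `diag` is a hypothetical stand-in for EntryDiagnostics, and sorting on FinalScore (rather than RawScore) is an assumption.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// diag is a hypothetical stand-in for EntryDiagnostics.
type diag struct {
	Name       string
	FinalScore int
	Suppressed string // empty means the entry passed all gates
}

// sortDiagnostics puts kept entries before suppressed ones; within each
// group, higher scores come first and ties break on name ascending.
func sortDiagnostics(ds []diag) {
	slices.SortFunc(ds, func(a, b diag) int {
		aSup, bSup := a.Suppressed != "", b.Suppressed != ""
		if aSup != bSup {
			if aSup {
				return 1 // suppressed sorts after kept
			}
			return -1
		}
		if c := cmp.Compare(b.FinalScore, a.FinalScore); c != 0 {
			return c // descending score
		}
		return cmp.Compare(a.Name, b.Name) // ascending name
	})
}

func main() {
	ds := []diag{
		{"zeta", 5, ""},
		{"alpha", 9, "below threshold"},
		{"beta", 5, ""},
		{"gamma", 7, ""},
	}
	sortDiagnostics(ds)
	for _, d := range ds {
		fmt.Println(d.Name, d.FinalScore, d.Suppressed)
	}
}
```

Note that "alpha" sorts last despite the highest score, because suppressed entries always follow kept ones.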

type GateReason (added in v1.2.0)

type GateReason string

GateReason identifies why ApplyGates suppressed or truncated results.

const (
	GateReasonNone             GateReason = ""
	GateReasonLowConfidence    GateReason = "low-confidence"
	GateReasonSingleTokenFloor GateReason = "single-token-floor"
	GateReasonDifferential     GateReason = "differential"
)

type InMemoryCorpus (added in v1.2.0)

type InMemoryCorpus struct {
	// contains filtered or unexported fields
}

InMemoryCorpus is a Corpus intended for test fixtures. NewInMemoryCorpus computes IDF and AvgFieldLen from the supplied entries.

func NewInMemoryCorpus (added in v1.2.0)

func NewInMemoryCorpus(entries ...index.Entry) *InMemoryCorpus

NewInMemoryCorpus builds a Corpus from raw entries, computing IDF and AvgFieldLen fresh. Use for standalone test fixtures.

func (*InMemoryCorpus) AvgFieldLength (added in v1.2.0)

func (c *InMemoryCorpus) AvgFieldLength() float64

func (*InMemoryCorpus) Entries (added in v1.2.0)

func (c *InMemoryCorpus) Entries() []index.Entry

func (*InMemoryCorpus) IDFMap (added in v1.2.0)

func (c *InMemoryCorpus) IDFMap() map[string]float64

type PipelineInput (added in v1.2.0)

type PipelineInput struct {
	Tokens     []string
	Corpus     Corpus
	Threshold  int
	Limit      int
	ApplyGates bool
}

PipelineInput is the unified input for Run. Every caller (hook, actions/search, actions/explain) composes this and reads the result.

type PipelineResult (added in v1.2.0)

type PipelineResult struct {
	Tokens         []string
	Diagnostics    []EntryDiagnostics
	Scored         []ScoredEntry
	AboveThreshold []ScoredEntry
	Surviving      []ScoredEntry
	Suppression    GateReason
}

PipelineResult carries every stage so callers render whichever view they need.

func Run (added in v1.2.0)

func Run(in PipelineInput) PipelineResult

Run executes the full scoring pipeline. Pure function: no I/O, no globals.

type ScoredEntry

type ScoredEntry struct {
	Entry   index.Entry `json:"entry"`
	Score   int         `json:"score"`
	Matched []string    `json:"matched"`
}

ScoredEntry pairs an entry with its relevance score.

func Score

func Score(entries []index.Entry, tokens []string, threshold int, idf map[string]float64, avgdl float64) []ScoredEntry

Score computes relevance scores for all entries against a tokenized prompt, using field-weighted keywords, IDF multipliers, BM25 saturation, and bigram bonuses. It returns entries with score >= threshold, sorted by score descending.
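BM25 saturation is the part of this recipe that keeps a repeated keyword from dominating a score. The sketch below is the textbook BM25 term-frequency component with the conventional k1 = 1.2 and b = 0.75; those constants, and how the package combines this factor with field weights, IDF, and bigram bonuses, are assumptions not covered by this documentation.

```go
package main

import "fmt"

// Conventional BM25 defaults; the values the package uses are not documented.
const (
	k1 = 1.2
	b  = 0.75
)

// bm25TF returns the saturated term-frequency component of BM25:
//
//	tf * (k1 + 1) / (tf + k1 * (1 - b + b*dl/avgdl))
//
// Extra occurrences add less and less (the value approaches k1 + 1), and
// fields longer than average are penalized through dl/avgdl.
func bm25TF(tf, dl, avgdl float64) float64 {
	if tf <= 0 {
		return 0
	}
	norm := 1.0 // no length normalization without a valid average
	if avgdl > 0 {
		norm = 1 - b + b*dl/avgdl
	}
	return tf * (k1 + 1) / (tf + k1*norm)
}

func main() {
	for _, tf := range []float64{1, 2, 4, 8} {
		fmt.Printf("tf=%.0f -> %.3f\n", tf, bm25TF(tf, 10, 10))
	}
}
```

With dl equal to avgdl, tf values of 1, 2, 4, and 8 yield 1.000, 1.375, 1.692, and 1.913: each extra occurrence adds less, and the value never exceeds k1 + 1 = 2.2.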

func ScoreTop

func ScoreTop(entries []index.Entry, tokens []string, threshold, limit int, idf map[string]float64, avgdl float64) []ScoredEntry

ScoreTop behaves like Score but returns at most limit results, sorted by score descending.
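The threshold, sort, and truncate steps can be sketched as follows. `scored` is a hypothetical stand-in for ScoredEntry; the stable tie handling and treating a non-positive limit as "no cap" are assumptions, since the documentation specifies neither.

```go
package main

import (
	"fmt"
	"sort"
)

// scored is a hypothetical stand-in for ScoredEntry.
type scored struct {
	Name  string
	Score int
}

// scoreTop keeps entries at or above threshold, sorts them by score
// descending (stable, so ties keep input order), and caps the result
// at limit entries when limit is positive.
func scoreTop(all []scored, threshold, limit int) []scored {
	var kept []scored
	for _, s := range all {
		if s.Score >= threshold {
			kept = append(kept, s)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	if limit > 0 && len(kept) > limit {
		kept = kept[:limit]
	}
	return kept
}

func main() {
	top := scoreTop([]scored{{"a", 1}, {"b", 9}, {"c", 4}, {"d", 6}}, 2, 2)
	for _, s := range top {
		fmt.Println(s.Name, s.Score)
	}
}
```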

type StopSet

type StopSet map[string]struct{}

StopSet is a set of words to filter during tokenization.

func DefaultStopWords

func DefaultStopWords() StopSet

DefaultStopWords returns the built-in stop word list.

func (StopSet) Contains

func (s StopSet) Contains(word string) bool

Contains reports whether the set includes the given word.

type TokenHit (added in v1.0.0)

type TokenHit struct {
	Token    string  `json:"token"`
	Kind     string  `json:"kind"`
	Weight   int     `json:"weight,omitzero"`    // field weight from entry.Keywords (0 for alias-only, miss)
	IDF      float64 `json:"idf"`                // IDF multiplier used (1.0 if no IDF map)
	Delta    float64 `json:"delta"`              // this token's contribution to score
	AliasCat string  `json:"alias_cat,omitzero"` // canonical category alias resolved to, if any
}
