Documentation
Index
- Constants
- func InverseDocumentFrequency(docCount, tokenDocFreq uint64) (idf float64)
- func NormalizedTF(tokenFreq, documentLength uint64, ...) (normTf float64)
- func ScoreTermBM25(docCount, tokenDocFreq, tokenFreq, documentLength uint64, ...) (score float64)
- type Clause
- type ClauseEntry
- type ClauseState
- type HandleClauseFunc
- type Keyword
- type QueryContext
- type Range
- type RangeCaptureMode
- type Searcher
- func (s *Searcher) BM25Score(ctx *QueryContext, q *SimpleQuery)
- func (s *Searcher) FieldScore(ctx *QueryContext, fieldHash uint64)
- func (s *Searcher) FilterDocuments(ctx *QueryContext, q *SimpleQuery)
- func (s *Searcher) Iter(c *Clause, handle HandleClauseFunc)
- func (s *Searcher) ResolveScores(ctx *QueryContext) (idxs []uint64)
- func (s *Searcher) UpdateScoresWithBM25(ctx *QueryContext, state *ClauseState)
- type SimpleQuery
Constants
const (
	DefaultSaturation    = 1.2
	DefaultLengthPenalty = 0.75
)
Variables
This section is empty.
Functions
func InverseDocumentFrequency
func InverseDocumentFrequency(docCount, tokenDocFreq uint64) (idf float64)
InverseDocumentFrequency returns the inverse document frequency (IDF) for a single term. It answers the question: "how surprising is it to see this term in a document?" docCount is the total number of documents indexed; tokenDocFreq is the number of documents that contain the supplied token at least once.
func NormalizedTF
func NormalizedTF(tokenFreq, documentLength uint64, avgDocLength, saturation, lengthPenalty float64) (normTf float64)
NormalizedTF returns the saturated, length-normalized term frequency for one term in one document's field:
- tokenFreq: raw count of how many times the term appears in this document's field
- documentLength: number of tokens in this document's field
- avgDocLength: average document length across all documents for this field
- saturation: how quickly additional occurrences stop mattering (typically 1.2; see DefaultSaturation, BM25's k1)
- lengthPenalty: how strongly long documents are penalized (typically 0.75; see DefaultLengthPenalty, BM25's b)
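A minimal sketch of how these parameters could combine, assuming the classic BM25 formula normTf = tf * (k1 + 1) / (tf + k1 * (1 - b + b * documentLength / avgDocLength)) with k1 = saturation and b = lengthPenalty. Some implementations drop the (k1 + 1) factor, so treat this as illustrative; normalizedTF is a hypothetical name, not this package's code.

```go
package main

import "fmt"

// normalizedTF is a hypothetical re-implementation of the classic BM25
// term-frequency component: saturation plays the role of k1 and
// lengthPenalty the role of b.
func normalizedTF(tokenFreq, documentLength uint64, avgDocLength, saturation, lengthPenalty float64) float64 {
	if tokenFreq == 0 {
		return 0 // the term does not occur in this field
	}
	tf := float64(tokenFreq)
	// Length normalization: with lengthPenalty > 0 this is above 1 for
	// longer-than-average fields and below 1 for shorter ones.
	norm := 1.0
	if avgDocLength > 0 {
		norm = 1 - lengthPenalty + lengthPenalty*float64(documentLength)/avgDocLength
	}
	// Saturation: the result grows with tf but approaches saturation+1.
	return tf * (saturation + 1) / (tf + saturation*norm)
}

func main() {
	// Values of DefaultSaturation and DefaultLengthPenalty.
	const k1, b = 1.2, 0.75
	for _, tf := range []uint64{1, 2, 5, 50} {
		fmt.Printf("tf=%d normTf=%.4f\n", tf, normalizedTF(tf, 100, 100, k1, b))
	}
}
```

With the defaults, a term occurring once in an average-length field scores exactly 1, and each extra occurrence adds less than the previous one.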
func ScoreTermBM25
Types
type Clause
type Clause struct {
	Keywords      []*Keyword
	FieldKeywords []*ClauseEntry[Keyword]
	FieldRanges   []*ClauseEntry[Range]
}
func (*Clause) FieldKeyword
func (*Clause) FieldRange
func (c *Clause) FieldRange(field uint64, lo, hi []byte, mode RangeCaptureMode, boost float64)
type ClauseEntry
type ClauseState
type HandleClauseFunc
type HandleClauseFunc func(state *ClauseState)
type QueryContext
QueryContext is intended to be cached by the caller and reused for each search.
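One way a caller might cache and reuse a context is the standard library's sync.Pool. This sketch assumes a hypothetical queryContext type holding a score map plus a reset method; the real QueryContext fields are not documented here.

```go
package main

import (
	"fmt"
	"sync"
)

// queryContext is a hypothetical stand-in for QueryContext: per-search
// scratch state that is worth allocating once and clearing between searches.
type queryContext struct {
	scores map[uint64]float64 // document index -> accumulated score
}

// reset empties the map but keeps its allocated buckets for reuse.
func (c *queryContext) reset() {
	for k := range c.scores {
		delete(c.scores, k)
	}
}

var ctxPool = sync.Pool{
	New: func() interface{} {
		return &queryContext{scores: make(map[uint64]float64)}
	},
}

// search borrows a context, uses it, and returns it to the pool.
func search(docs []uint64) int {
	ctx := ctxPool.Get().(*queryContext)
	defer func() {
		ctx.reset()
		ctxPool.Put(ctx)
	}()
	for _, d := range docs {
		ctx.scores[d]++ // placeholder scoring: count matches per document
	}
	return len(ctx.scores)
}

func main() {
	fmt.Println(search([]uint64{1, 2, 2, 3}))
	fmt.Println(search([]uint64{7}))
}
```

Resetting before Put keeps state from one search out of the next while reusing the map's allocation; a caller that searches from a single goroutine could equally keep one context in a struct field.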
type Range
type Range struct {
	CaptureMode RangeCaptureMode
	Boost       float64
	Low, High   []byte
}
type RangeCaptureMode
type RangeCaptureMode int
const (
	RangeCaptureModeNone RangeCaptureMode = iota
	RangeCaptureModeLeft
	RangeCaptureModeRight
	RangeCaptureModeBoth
)
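The documentation does not say what each capture mode means. The names suggest, and this sketch assumes, that Left and Right make the Low and High bound inclusive respectively, Both makes both bounds inclusive, and None makes both exclusive. The helper inRange is hypothetical; the type and constants are copied from above so the block compiles on its own, and bounds are compared with bytes.Compare.

```go
package main

import (
	"bytes"
	"fmt"
)

type RangeCaptureMode int

const (
	RangeCaptureModeNone RangeCaptureMode = iota
	RangeCaptureModeLeft
	RangeCaptureModeRight
	RangeCaptureModeBoth
)

// inRange is hypothetical: it ASSUMES Left/Right/Both mark the low bound,
// the high bound, or both bounds as inclusive, and None excludes both.
func inRange(v, low, high []byte, mode RangeCaptureMode) bool {
	lo := bytes.Compare(v, low)
	hi := bytes.Compare(v, high)
	includeLow := mode == RangeCaptureModeLeft || mode == RangeCaptureModeBoth
	includeHigh := mode == RangeCaptureModeRight || mode == RangeCaptureModeBoth
	okLow := lo > 0 || (includeLow && lo == 0)
	okHigh := hi < 0 || (includeHigh && hi == 0)
	return okLow && okHigh
}

func main() {
	low, high := []byte("b"), []byte("d")
	modes := []RangeCaptureMode{RangeCaptureModeNone, RangeCaptureModeLeft, RangeCaptureModeRight, RangeCaptureModeBoth}
	for _, m := range modes {
		// Is each bound itself captured under mode m?
		fmt.Println(m, inRange(low, low, high, m), inRange(high, low, high, m))
	}
}
```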
type Searcher
type Searcher struct {
	Storage           *storage.Storage
	BM25Saturation    float64
	BM25LengthPenalty float64
	// Maximum number of entries checked against the Levenshtein fuzzy-matching algorithm
	LevenshteinM      int
	LevenshteinMaxK   int
}
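To illustrate the two Levenshtein fields, here is a sketch of bounded fuzzy matching. It takes the LevenshteinM comment at its word (at most M entries are checked) and assumes that LevenshteinMaxK is the maximum edit distance accepted. fuzzyMatches and levenshtein are hypothetical helpers; the distance uses the standard two-row dynamic-programming algorithm.

```go
package main

import "fmt"

// levenshtein returns the edit distance between a and b (insertions,
// deletions and substitutions all cost 1), keeping only two DP rows.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from "" to the first j runes of b
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// fuzzyMatches is hypothetical: it checks at most m candidates (the role of
// LevenshteinM) and keeps those within maxK edits (ASSUMED role of LevenshteinMaxK).
func fuzzyMatches(query string, candidates []string, m, maxK int) []string {
	var out []string
	for i, c := range candidates {
		if i >= m {
			break
		}
		if levenshtein(query, c) <= maxK {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	terms := []string{"search", "serch", "starch", "research", "sear"}
	fmt.Println(fuzzyMatches("search", terms, 4, 1))
}
```

In this example "research" is checked but rejected (distance 2), and "sear" is never checked because only the first four candidates are considered.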
func (*Searcher) BM25Score
func (s *Searcher) BM25Score(ctx *QueryContext, q *SimpleQuery)
func (*Searcher) FieldScore
func (s *Searcher) FieldScore(ctx *QueryContext, fieldHash uint64)
func (*Searcher) FilterDocuments
func (s *Searcher) FilterDocuments(ctx *QueryContext, q *SimpleQuery)
FilterDocuments filters the document ID index into the destination (dst) bitmap. The idea is to narrow the results by the query's conditions first, before scoring. It is the caller's responsibility to clear the dst bitmap.
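A sketch of why clearing matters, assuming dst is a plain bitset over document indexes and that filtering only sets bits, never clears them. bitmap, filterInto and reset are hypothetical; the real bitmap type is not documented here.

```go
package main

import "fmt"

// bitmap is a minimal hypothetical bitset over document indexes.
type bitmap []uint64

func newBitmap(n int) bitmap { return make(bitmap, (n+63)/64) }

func (b bitmap) set(i uint64) { b[i/64] |= 1 << (i % 64) }

func (b bitmap) has(i uint64) bool { return b[i/64]&(1<<(i%64)) != 0 }

// reset zeroes every word; this is the caller's job between queries.
func (b bitmap) reset() {
	for i := range b {
		b[i] = 0
	}
}

// filterInto is hypothetical: it sets the bit of every document that passes
// pred. It only ORs bits in, so bits left over from a previous query survive
// unless the caller clears dst first.
func filterInto(dst bitmap, docCount uint64, pred func(doc uint64) bool) {
	for doc := uint64(0); doc < docCount; doc++ {
		if pred(doc) {
			dst.set(doc)
		}
	}
}

func main() {
	dst := newBitmap(10)
	filterInto(dst, 10, func(d uint64) bool { return d%2 == 0 }) // even documents
	dst.reset()                                                  // clear before reuse
	filterInto(dst, 10, func(d uint64) bool { return d%3 == 0 }) // multiples of 3
	for d := uint64(0); d < 10; d++ {
		if dst.has(d) {
			fmt.Print(d, " ")
		}
	}
	fmt.Println()
}
```

Without the reset call, document 4 from the first filter would still be marked when the second filter runs.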
func (*Searcher) Iter
func (s *Searcher) Iter(c *Clause, handle HandleClauseFunc)
func (*Searcher) ResolveScores
func (s *Searcher) ResolveScores(ctx *QueryContext) (idxs []uint64)
ResolveScores is the next step of the search once filtering and scoring are done: it resolves ctx into the actual slice of document indexes (idxs).
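As an illustration of the resolve step, this sketch assumes scores are accumulated per document index and that results come back highest score first. Both the ordering and the resolveScores helper are assumptions, since the documentation only says ctx is resolved into an idx slice. Ties are broken by ascending index so the output is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// resolveScores is hypothetical: it turns accumulated per-document scores
// into a slice of document indexes, highest score first, breaking ties by
// ascending index.
func resolveScores(scores map[uint64]float64) []uint64 {
	idxs := make([]uint64, 0, len(scores))
	for idx := range scores {
		idxs = append(idxs, idx)
	}
	sort.Slice(idxs, func(i, j int) bool {
		si, sj := scores[idxs[i]], scores[idxs[j]]
		if si != sj {
			return si > sj
		}
		return idxs[i] < idxs[j]
	})
	return idxs
}

func main() {
	scores := map[uint64]float64{4: 1.7, 9: 3.2, 2: 1.7, 7: 0.4}
	fmt.Println(resolveScores(scores))
}
```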
func (*Searcher) UpdateScoresWithBM25
func (s *Searcher) UpdateScoresWithBM25(ctx *QueryContext, state *ClauseState)