query

package
v1.2.1
Published: Jun 21, 2026 License: AGPL-3.0 Imports: 7 Imported by: 0

Documentation

Constants

const (
	DefaultSaturation    = 1.2
	DefaultLengthPenalty = 0.75
)

Variables

This section is empty.

Functions

func InverseDocumentFrequency

func InverseDocumentFrequency(docCount, tokenDocFreq uint64) (idf float64)

InverseDocumentFrequency returns the inverse document frequency (IDF) for a single term. It answers the question: "how surprising is it to see this term in a document?"

docCount - total number of documents indexed
tokenDocFreq - number of documents that contain the token at least once

func NormalizedTF

func NormalizedTF(tokenFreq, documentLength uint64, avgDocLength, saturation, lengthPenalty float64) (normTf float64)

NormalizedTF returns the saturated, length-normalized term frequency for one term in one document's field.

tokenFreq - raw count: how many times the term appears in this doc's field
documentLength - number of tokens in this doc's field
avgDocLength - average document length across all docs for this field
saturation - how fast extra occurrences stop mattering (typically 1.2)
lengthPenalty - how hard to punish long documents (typically 0.75)
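
To make the roles of saturation and lengthPenalty concrete, here is a minimal sketch that assumes the classic BM25 term tf*(k+1) / (tf + k*(1 - b + b*dl/avgdl)). The package may use a different variant. The name normTFSketch and the fallback for a non-positive average length are assumptions made for the example.

```go
package main

import "fmt"

// normTFSketch is a hypothetical stand-in for NormalizedTF using the classic
// BM25 term: tf*(k+1) / (tf + k*(1 - b + b*dl/avgdl)).
func normTFSketch(tokenFreq, documentLength uint64, avgDocLength, saturation, lengthPenalty float64) float64 {
	tf := float64(tokenFreq)
	if tf == 0 {
		return 0
	}
	// Assumption: with no usable average, treat the document as average length.
	ratio := 1.0
	if avgDocLength > 0 {
		ratio = float64(documentLength) / avgDocLength
	}
	norm := saturation * (1 - lengthPenalty + lengthPenalty*ratio)
	return tf * (saturation + 1) / (tf + norm)
}

func main() {
	// Same values as the package's DefaultSaturation and DefaultLengthPenalty.
	const k, b = 1.2, 0.75
	for _, tf := range []uint64{1, 2, 5, 50} {
		fmt.Printf("tf=%-2d average doc: %.3f  4x longer doc: %.3f\n",
			tf, normTFSketch(tf, 100, 100, k, b), normTFSketch(tf, 400, 100, k, b))
	}
}
```

In this form the value approaches saturation + 1 as the raw count grows, so repeated occurrences add less and less, and a document longer than average scores lower for the same raw count.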

func ScoreTermBM25

func ScoreTermBM25(docCount, tokenDocFreq, tokenFreq, documentLength uint64, avgDocLength, saturation, lengthPenalty float64) (score float64)

Types

type Clause

type Clause struct {
	Keywords      []*Keyword
	FieldKeywords []*ClauseEntry[Keyword]
	FieldRanges   []*ClauseEntry[Range]
}

func (*Clause) Count

func (c *Clause) Count() (count int)

func (*Clause) FieldKeyword

func (c *Clause) FieldKeyword(field uint64, kw []byte, boost float64, fuzzy int)

func (*Clause) FieldRange

func (c *Clause) FieldRange(field uint64, lo, hi []byte, mode RangeCaptureMode, boost float64)

func (*Clause) Keyword

func (c *Clause) Keyword(kw []byte, boost float64, fuzzy int)

type ClauseEntry

type ClauseEntry[T Keyword | Range] struct {
	FieldHash uint64
	Value     T
}

type ClauseState

type ClauseState struct {
	// Found reports whether anything was actually found.
	// Callers should always check it first.
	Found bool
	Boost float64
	// Field references
	Field     *storage.Field
	FieldHash uint64
	// Token references
	Token *storage.Token
}

type HandleClauseFunc

type HandleClauseFunc func(state *ClauseState)

type Keyword

type Keyword struct {
	Boost float64
	Fuzzy int
	Value []byte
}

type QueryContext

type QueryContext struct {
	Bitmap roaring64.Bitmap
	Scores map[uint64]float64
}

QueryContext is intended to be cached and reused by the caller across searches.

type Range

type Range struct {
	CaptureMode RangeCaptureMode
	Boost       float64
	Low, High   []byte
}

type RangeCaptureMode

type RangeCaptureMode int
const (
	RangeCaptureModeNone RangeCaptureMode = iota
	RangeCaptureModeLeft
	RangeCaptureModeRight
	RangeCaptureModeBoth
)

type Searcher

type Searcher struct {
	Storage           *storage.Storage
	BM25Saturation    float64
	BM25LengthPenalty float64
	// Maximum number of entries checked against the Levenshtein fuzzy-matching algorithm
	LevenshteinM    int
	LevenshteinMaxK int
}

func New

func New(s *storage.Storage) (searcher *Searcher)

func (*Searcher) BM25Score

func (s *Searcher) BM25Score(ctx *QueryContext, q *SimpleQuery)

func (*Searcher) FieldScore

func (s *Searcher) FieldScore(ctx *QueryContext, fieldHash uint64)

func (*Searcher) FilterDocuments

func (s *Searcher) FilterDocuments(ctx *QueryContext, q *SimpleQuery)

FilterDocuments filters the document ID index into the destination bitmap. The idea is to narrow the candidate documents by the query conditions before scoring. It is the caller's responsibility to clear the destination bitmap.

func (*Searcher) Iter

func (s *Searcher) Iter(c *Clause, handle HandleClauseFunc)

func (*Searcher) ResolveScores

func (s *Searcher) ResolveScores(ctx *QueryContext) (idxs []uint64)

ResolveScores is the next step of the search algorithm once filtering and scoring are done. It resolves ctx into an actual idx slice.
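
The page does not say how the resulting slice is ordered. As an illustration of what this resolve step might look like, the sketch below takes a score map shaped like QueryContext.Scores and returns document ids with the highest score first. The ordering, the tie-break on id, and the name resolveSketch are all assumptions, not documented behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// resolveSketch is a hypothetical stand-in: it turns a score map shaped like
// QueryContext.Scores into a slice of document ids, best score first.
func resolveSketch(scores map[uint64]float64) []uint64 {
	ids := make([]uint64, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		si, sj := scores[ids[i]], scores[ids[j]]
		if si != sj {
			return si > sj // higher score first
		}
		return ids[i] < ids[j] // assumption: break ties on id for a deterministic result
	})
	return ids
}

func main() {
	scores := map[uint64]float64{7: 0.4, 3: 2.1, 9: 2.1, 1: 1.3}
	fmt.Println(resolveSketch(scores)) // prints [3 9 1 7]
}
```

The tie-break matters because Go map iteration order is randomized, so without it equal-scored documents could come back in a different order on each run.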

func (*Searcher) UpdateScoresWithBM25

func (s *Searcher) UpdateScoresWithBM25(ctx *QueryContext, state *ClauseState)

type SimpleQuery

type SimpleQuery struct {
	Shoulds Clause
	Musts   Clause
	// MustNots does not make use of boost
	MustNots Clause
}
