query

package

v1.2.1 (latest) | Published: Jun 21, 2026 | License: AGPL-3.0 | Imports: 7 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/RogueTeam/textiplex

Documentation ¶

Index ¶

Constants
func InverseDocumentFrequency(docCount, tokenDocFreq uint64) (idf float64)
func NormalizedTF(tokenFreq, documentLength uint64, ...) (normTf float64)
func ScoreTermBM25(docCount, tokenDocFreq, tokenFreq, documentLength uint64, ...) (score float64)
type Clause
type ClauseEntry
type ClauseState
type HandleClauseFunc
type Keyword
type QueryContext
type Range
type RangeCaptureMode
type Searcher
- func New(s *storage.Storage) (searcher *Searcher)
type SimpleQuery

Constants ¶

const (
	DefaultSaturation    = 1.2
	DefaultLengthPenalty = 0.75
)

Variables ¶

This section is empty.

Functions ¶

func InverseDocumentFrequency ¶

func InverseDocumentFrequency(docCount, tokenDocFreq uint64) (idf float64)

InverseDocumentFrequency returns the inverse document frequency (IDF) for a single term. It answers: "how surprising is it to see this term in a document?"

docCount - total number of documents indexed
tokenDocFreq - number of documents that contain the supplied token at least once

func NormalizedTF ¶

func NormalizedTF(tokenFreq, documentLength uint64, avgDocLength, saturation, lengthPenalty float64) (normTf float64)

NormalizedTF returns the saturated, length-normalized term frequency for one term in one document's field.

tokenFreq - raw count: how many times the term appears in this doc's field
documentLength - number of tokens in this doc's field
avgDocLength - average document length across all docs for this field
saturation - how fast extra occurrences stop mattering (typically 1.2)
lengthPenalty - how hard to punish long documents (typically 0.75)
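To make the roles of saturation and lengthPenalty concrete, here is a minimal sketch assuming the textbook BM25 term-frequency component, tf*(k+1) / (tf + k*(1 - b + b*dl/avgdl)), with k = saturation and b = lengthPenalty. The function normalizedTFSketch is hypothetical; the package's own implementation is not shown on this page and may differ, including in how it treats a zero average length.

```go
package main

import "fmt"

// normalizedTFSketch is a hypothetical stand-in for NormalizedTF.
// Assumption: the textbook BM25 term-frequency component.
func normalizedTFSketch(tokenFreq, documentLength uint64, avgDocLength, saturation, lengthPenalty float64) float64 {
	tf := float64(tokenFreq)
	// Guard chosen for this sketch only: avoid dividing by a zero average length.
	if tf == 0 || avgDocLength == 0 {
		return 0
	}
	// Length normalization: 1 for an average-length document, above 1 for longer ones.
	norm := 1 - lengthPenalty + lengthPenalty*float64(documentLength)/avgDocLength
	return tf * (saturation + 1) / (tf + saturation*norm)
}

func main() {
	const k, b = 1.2, 0.75 // same values as DefaultSaturation and DefaultLengthPenalty
	fmt.Println("avg-length doc, tf=1:  ", normalizedTFSketch(1, 100, 100, k, b))
	fmt.Println("double-length doc, tf=1:", normalizedTFSketch(1, 200, 100, k, b))
	fmt.Println("avg-length doc, tf=100:", normalizedTFSketch(100, 100, 100, k, b))
}
```

In this form the value can never exceed saturation + 1, so repeated occurrences of a term add less and less, while a document longer than average is scaled down.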

func ScoreTermBM25 ¶

func ScoreTermBM25(docCount, tokenDocFreq, tokenFreq, documentLength uint64, avgDocLength, saturation, lengthPenalty float64) (score float64)

Types ¶

type Clause ¶

type Clause struct {
	Keywords      []*Keyword
	FieldKeywords []*ClauseEntry[Keyword]
	FieldRanges   []*ClauseEntry[Range]
}

func (*Clause) Count ¶

func (c *Clause) Count() (count int)

func (*Clause) FieldKeyword ¶

func (c *Clause) FieldKeyword(field uint64, kw []byte, boost float64, fuzzy int)

func (*Clause) FieldRange ¶

func (c *Clause) FieldRange(field uint64, lo, hi []byte, mode RangeCaptureMode, boost float64)

func (*Clause) Keyword ¶

func (c *Clause) Keyword(kw []byte, boost float64, fuzzy int)

type ClauseEntry ¶

type ClauseEntry[T Keyword | Range] struct {
	FieldHash uint64
	Value     T
}

type ClauseState ¶

type ClauseState struct {
	// Found reports whether anything was actually found.
	// Callers should always check it first.
	Found bool
	Boost float64
	// Field references
	Field     *storage.Field
	FieldHash uint64
	// Token references
	Token *storage.Token
}

type HandleClauseFunc ¶

type HandleClauseFunc func(state *ClauseState)

type Keyword ¶

type Keyword struct {
	Boost float64
	Fuzzy int
	Value []byte
}

type QueryContext ¶

type QueryContext struct {
	Bitmap roaring64.Bitmap
	Scores map[uint64]float64
}

QueryContext holds the bitmap and scores of a search. It is intended to be cached and reused by the caller on each search.

type Range ¶

type Range struct {
	CaptureMode RangeCaptureMode
	Boost       float64
	Low, High   []byte
}

type RangeCaptureMode ¶

type RangeCaptureMode int

const (
	RangeCaptureModeNone RangeCaptureMode = iota
	RangeCaptureModeLeft
	RangeCaptureModeRight
	RangeCaptureModeBoth
)

type Searcher ¶

type Searcher struct {
	Storage           *storage.Storage
	BM25Saturation    float64
	BM25LengthPenalty float64
	// Maximum number of entries checked against the Levenshtein fuzzy-matching algorithm
	LevenshteinM    int
	LevenshteinMaxK int
}

func New ¶

func New(s *storage.Storage) (searcher *Searcher)

func (*Searcher) BM25Score ¶

func (s *Searcher) BM25Score(ctx *QueryContext, q *SimpleQuery)

func (*Searcher) FieldScore ¶

func (s *Searcher) FieldScore(ctx *QueryContext, fieldHash uint64)

func (*Searcher) FilterDocuments ¶

func (s *Searcher) FilterDocuments(ctx *QueryContext, q *SimpleQuery)

FilterDocuments filters the document ID index into the destination bitmap. The idea is to filter first, so that scoring only works on documents that satisfy the query conditions. It is the caller's responsibility to clear the dst bitmap.

func (*Searcher) Iter ¶

func (s *Searcher) Iter(c *Clause, handle HandleClauseFunc)

func (*Searcher) ResolveScores ¶

func (s *Searcher) ResolveScores(ctx *QueryContext) (idxs []uint64)

ResolveScores is the next step of a search once filtering and scoring are done. It resolves the ctx into an actual idx slice.
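The page does not say how the returned slice is ordered. A minimal sketch of what resolving a score map into an index slice could look like, assuming results are ranked by descending score with ties broken by ascending document ID, is shown below. The function resolveSketch is hypothetical and works on a plain map shaped like QueryContext.Scores.

```go
package main

import (
	"fmt"
	"sort"
)

// resolveSketch is a hypothetical illustration, not the package's ResolveScores.
// Assumption: order by descending score, then by ascending document ID.
func resolveSketch(scores map[uint64]float64) []uint64 {
	idxs := make([]uint64, 0, len(scores))
	for id := range scores {
		idxs = append(idxs, id)
	}
	sort.Slice(idxs, func(i, j int) bool {
		si, sj := scores[idxs[i]], scores[idxs[j]]
		if si != sj {
			return si > sj
		}
		return idxs[i] < idxs[j]
	})
	return idxs
}

func main() {
	scores := map[uint64]float64{7: 0.4, 3: 2.1, 9: 2.1, 1: 1.0}
	fmt.Println(resolveSketch(scores)) // prints [3 9 1 7]
}
```

The explicit tie-break keeps the output deterministic, since Go map iteration order is randomized.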

func (*Searcher) UpdateScoresWithBM25 ¶

func (s *Searcher) UpdateScoresWithBM25(ctx *QueryContext, state *ClauseState)

type SimpleQuery ¶

type SimpleQuery struct {
	Shoulds Clause
	Musts   Clause
	// MustNots does not make use of Boost
	MustNots Clause
}
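The page does not document how the three clauses combine. As a rough sketch, assuming the conventional boolean-query reading (a document must match every Musts condition and no MustNots condition, while Shoulds only influences scoring), filtering could look like the following. The docSet type and filterSketch function are hypothetical, and plain maps stand in for the roaring bitmaps the package uses.

```go
package main

import "fmt"

// docSet is a hypothetical stand-in for a bitmap of document IDs.
type docSet map[uint64]struct{}

// filterSketch keeps the documents that appear in every musts set and in no
// mustNots set. Assumption: conventional boolean-query semantics; the
// package's actual behavior is not documented on this page.
func filterSketch(all docSet, musts, mustNots []docSet) docSet {
	out := docSet{}
	for id := range all {
		if inAll(id, musts) && !inAny(id, mustNots) {
			out[id] = struct{}{}
		}
	}
	return out
}

// inAll reports whether id is a member of every set.
func inAll(id uint64, sets []docSet) bool {
	for _, s := range sets {
		if _, ok := s[id]; !ok {
			return false
		}
	}
	return true
}

// inAny reports whether id is a member of at least one set.
func inAny(id uint64, sets []docSet) bool {
	for _, s := range sets {
		if _, ok := s[id]; ok {
			return true
		}
	}
	return false
}

func main() {
	all := docSet{1: {}, 2: {}, 3: {}, 4: {}}
	musts := []docSet{{1: {}, 2: {}, 3: {}}}
	mustNots := []docSet{{2: {}}}
	fmt.Println(len(filterSketch(all, musts, mustNots)), "documents remain")
}
```

Because exclusion is a yes-or-no decision in this reading, a boost on an excluded condition would have nothing to scale, which is consistent with MustNots ignoring Boost.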
