Documentation ¶
Index ¶
- Variables
- func HitsSearch(tokens []string, indexer Indexer) slices.Slice[SearchResult]
- func LinearSearch(tokens []string, indexable Indexer) slices.Slice[SearchResult]
- func NoopAllSearch(tokens []string, indexable Indexer) slices.Slice[SearchResult]
- func NoopZeroSearch(tokens []string, indexable Indexer) slices.Slice[SearchResult]
- func R1R2RV(w []byte) (r1 []byte, r2 []byte, rv []byte)
- type AliasesResult
- type AliasesResultRow
- type Builder
- type CleanTokenizer
- type Doc
- type DocRequest
- type Engine
- type EngineType
- type Filter
- type HashKey
- type Hashable
- type Index
- type IndexRepo
- func (h *IndexRepo) Alias(alias string, index string) bool
- func (h *IndexRepo) Drop(indexName string) bool
- func (h *IndexRepo) Has(name string) bool
- func (h *IndexRepo) HasAlias(name string) bool
- func (h *IndexRepo) List() []string
- func (h *IndexRepo) ListAliases() AliasesResult
- func (h *IndexRepo) Put(indexName string, doc DocRequest)
- func (h *IndexRepo) Rename(old string, new string) bool
- func (h *IndexRepo) Search(indexName string, terms string, engine Engine) (streams.ReadStream[SearchResult], error)
- func (h *IndexRepo) String() string
- func (h *IndexRepo) UnAlias(alias, index string) bool
- type Indexer
- type LowerCaseFilter
- type MemoryIndex
- func (mi *MemoryIndex) Document(index int) Doc
- func (mi *MemoryIndex) Indexed(key string) []int
- func (mi *MemoryIndex) Len() int
- func (mi *MemoryIndex) Put(payload DocRequest) Index
- func (mi *MemoryIndex) Search(payload string, engine Engine) slices.Slice[SearchResult]
- func (mi *MemoryIndex) String() string
- type MimeType
- type Repo
- type SearchResult
- func (r SearchResult) Doc() Doc
- func (r SearchResult) GetDoc() Doc
- func (r SearchResult) GetHits() int
- func (v SearchResult) MarshalEasyJSON(w *jwriter.Writer)
- func (v SearchResult) MarshalJSON() ([]byte, error)
- func (v *SearchResult) UnmarshalEasyJSON(l *jlexer.Lexer)
- func (v *SearchResult) UnmarshalJSON(data []byte) error
- type SearchResults
- type SpanishStemmerFilter
- type Stemmer
- type StopWords
- type StopWordsFilter
- type TokenizationPipeline
- type Tokenizer
Constants ¶
This section is empty.
Variables ¶
var (
	SpanishStopWords = StopWords{} /* 443 elements not displayed */
)
Functions ¶
func HitsSearch ¶
func HitsSearch(tokens []string, indexer Indexer) slices.Slice[SearchResult]
HitsSearch implements a hit-counting based search algorithm with AND logic.
Algorithm:
 1. For each search token, find all documents containing that token
 2. Count hits per document (number of unique search tokens each document contains)
 3. Filter documents that have ALL tokens (hits >= number of search tokens)
 4. Sort results by hit count in descending order (most relevant first)
 5. Return results in deterministic order by preserving document index order for ties
Behavior:
 - Uses AND logic: only returns documents that contain ALL search tokens
 - Hit count = number of unique search tokens found in the document (not total occurrences)
 - Results are sorted by relevance (hit count), then by document order for determinism
 - For multi-token queries, only documents with all tokens are returned
 - Time complexity: O(T * D + R log R) where T=tokens, D=avg docs per token, R=results
 - Space complexity: O(R) where R=number of matching documents
Differences from LinearSearch:
 - HitsSearch: Uses hit counting with hash map lookup, then sorts by relevance
 - LinearSearch: Uses set intersection with early termination, preserves document order
 - Both implement AND logic but with different performance characteristics
 - HitsSearch is better for relevance ranking, LinearSearch for simple boolean matching
Example:
Query: "java programming" Doc1: "java tutorial" (hits=1, excluded - doesn't have "programming") Doc2: "java programming guide" (hits=2, included) Doc3: "advanced java programming concepts" (hits=2, included) Result: [Doc2, Doc3] (both have hits=2, ordered by document index)
Note: Hit counting is per unique token, not total occurrences:
"java java programming" with query "java programming" = 2 hits (not 3)
func LinearSearch ¶
func LinearSearch(tokens []string, indexable Indexer) slices.Slice[SearchResult]
LinearSearch performs an intersection-based search across all query tokens.
Algorithm:
 1. For each token in the query, get the list of documents that contain it
 2. Find the intersection of all these document lists (documents that contain ALL tokens)
 3. Return only documents that contain every single token in the query
Key characteristics:
 - Uses AND logic: ALL tokens must be present in a document for it to match
 - Same logic as HitsSearch but a different algorithm (intersection vs. hit counting)
 - Results have Hits = len(tokens) since all tokens are guaranteed to be found
 - More efficient for queries with many tokens due to early termination
 - Deterministic order based on document index order
Comparison with HitsSearch:
 - Both implement AND logic (only documents with ALL tokens are returned)
 - LinearSearch: Uses set intersection operations (more efficient for large queries)
 - HitsSearch: Uses hit counting with threshold filtering (more flexible for scoring)
Example:
Query: "programming java" - Only returns documents that contain BOTH "programming" AND "java" - A document with only "programming" will NOT be returned - A document with only "java" will NOT be returned
func NoopAllSearch ¶
func NoopAllSearch(tokens []string, indexable Indexer) slices.Slice[SearchResult]
NoopAllSearch returns all documents as results
func NoopZeroSearch ¶
func NoopZeroSearch(tokens []string, indexable Indexer) slices.Slice[SearchResult]
NoopZeroSearch returns empty results
Types ¶
type AliasesResult ¶
type AliasesResult struct {
	Aliases []AliasesResultRow
}
type AliasesResultRow ¶
type Builder ¶
func NewMemoryIndexBuilder ¶
func NewMemoryIndexBuilder(tokenizer tokenizer) Builder
type CleanTokenizer ¶
type CleanTokenizer struct {
// contains filtered or unexported fields
}
func NewCleanTokenizer ¶
func NewCleanTokenizer(fns ...cleanFunc) CleanTokenizer
func NewKeepAlphanumericTokenizer ¶
func NewKeepAlphanumericTokenizer() *CleanTokenizer
func (*CleanTokenizer) Tokenize ¶
func (c *CleanTokenizer) Tokenize(text string) []string
type DocRequest ¶
type DocRequest struct {
	Name     string
	Content  string
	MimeType MimeType
	// contains filtered or unexported fields
}
func NewDocRequest ¶
func NewDocRequest(name, content string) DocRequest
func NewDocRequestWith ¶
func NewDocRequestWith(name, content, statement string) DocRequest
func NewDocRequestWithMime ¶
func NewDocRequestWithMime(name, content string, mime MimeType) DocRequest
func (DocRequest) ID ¶
func (d DocRequest) ID() string
func (DocRequest) Mime ¶
func (d DocRequest) Mime() MimeType
func (DocRequest) Raw ¶
func (d DocRequest) Raw() string
func (DocRequest) Statement ¶
func (d DocRequest) Statement() string
type Engine ¶
type Engine func(tokens []string, indexable Indexer) slices.Slice[SearchResult]
Engine defines the function signature for search functions
type EngineType ¶
type EngineType byte
const (
	NoopZero EngineType = iota
	NoopAll
	Hits
	SmartsHits
	Linear
)
type Index ¶
type Index interface {
	Put(payload DocRequest) Index
	Search(terms string, engine Engine) slices.Slice[SearchResult]
}
type IndexRepo ¶
type IndexRepo struct {
// contains filtered or unexported fields
}
IndexRepo handles a collection of indexes
func NewIndexRepo ¶
func (*IndexRepo) ListAliases ¶
func (h *IndexRepo) ListAliases() AliasesResult
func (*IndexRepo) Put ¶
func (h *IndexRepo) Put(indexName string, doc DocRequest)
func (*IndexRepo) Search ¶
func (h *IndexRepo) Search(
	indexName string,
	terms string,
	engine Engine,
) (streams.ReadStream[SearchResult], error)
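A minimal usage sketch of IndexRepo, assuming the snippet lives inside the package (so no import path is needed) and that NewIndexRepo takes no arguments; that constructor's signature is not shown on this page, so treat it as an assumption.

func ExampleIndexRepo() {
	repo := NewIndexRepo() // assumed zero-argument constructor
	repo.Put("books", NewDocRequest("moby-dick", "call me ishmael"))
	repo.Alias("library", "books")

	// HitsSearch matches the Engine signature and can be passed directly.
	stream, err := repo.Search("library", "ishmael", HitsSearch)
	if err != nil {
		panic(err) // handle the error; possible causes are not documented here
	}
	_ = stream // a streams.ReadStream[SearchResult]; consumption is package-specific
}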
type LowerCaseFilter ¶
type LowerCaseFilter struct{}
func NewLowerCaseTokenizer ¶
func NewLowerCaseTokenizer() LowerCaseFilter
func (LowerCaseFilter) Filter ¶
func (l LowerCaseFilter) Filter(tokens []string) []string
type MemoryIndex ¶
type MemoryIndex struct {
	Docs          []Doc            `json:"indexed"`
	InvertedIndex map[string][]int `json:"inverted"`
	// contains filtered or unexported fields
}
func NewMemoryIndex ¶
func NewMemoryIndex(name string, tkr tokenizer) *MemoryIndex
func (*MemoryIndex) Document ¶
func (mi *MemoryIndex) Document(index int) Doc
func (*MemoryIndex) Indexed ¶
func (mi *MemoryIndex) Indexed(key string) []int
func (*MemoryIndex) Len ¶
func (mi *MemoryIndex) Len() int
func (*MemoryIndex) Put ¶
func (mi *MemoryIndex) Put(payload DocRequest) Index
func (*MemoryIndex) Search ¶
func (mi *MemoryIndex) Search(payload string, engine Engine) slices.Slice[SearchResult]
func (*MemoryIndex) String ¶
func (mi *MemoryIndex) String() string
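A usage sketch for MemoryIndex, assuming it sits inside the package and that the tokenizer returned by NewKeepAlphanumericTokenizer satisfies the unexported tokenizer parameter of NewMemoryIndex (that interface is not expanded above).

func ExampleMemoryIndex() {
	idx := NewMemoryIndex("books", NewKeepAlphanumericTokenizer())
	idx.Put(NewDocRequest("doc1", "java programming guide"))
	idx.Put(NewDocRequest("doc2", "java tutorial"))

	// With HitsSearch only doc1 should match, since it contains both tokens.
	results := idx.Search("java programming", HitsSearch)
	_ = results // slices.Slice[SearchResult]
}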
type Repo ¶
type Repo interface {
	List() []string
	ListAliases() AliasesResult
	Has(name string) bool
	HasAlias(name string) bool
	Alias(alias string, in string) bool
	UnAlias(alias, index string) bool
	Put(in string, req DocRequest)
	Search(index string, terms string, engine Engine) (streams.ReadStream[SearchResult], error)
	Rename(old string, new string) bool
	Drop(in string) bool
}
type SearchResult ¶
func (SearchResult) GetDoc ¶
func (r SearchResult) GetDoc() Doc
func (SearchResult) GetHits ¶
func (r SearchResult) GetHits() int
func (SearchResult) MarshalEasyJSON ¶ added in v0.3.0
func (v SearchResult) MarshalEasyJSON(w *jwriter.Writer)
MarshalEasyJSON supports easyjson.Marshaler interface
func (SearchResult) MarshalJSON ¶ added in v0.3.0
func (v SearchResult) MarshalJSON() ([]byte, error)
MarshalJSON supports json.Marshaler interface
func (*SearchResult) UnmarshalEasyJSON ¶ added in v0.3.0
func (v *SearchResult) UnmarshalEasyJSON(l *jlexer.Lexer)
UnmarshalEasyJSON supports easyjson.Unmarshaler interface
func (*SearchResult) UnmarshalJSON ¶ added in v0.3.0
func (v *SearchResult) UnmarshalJSON(data []byte) error
UnmarshalJSON supports json.Unmarshaler interface
type SearchResults ¶
type SearchResults []SearchResult
func (SearchResults) Less ¶
func (r SearchResults) Less(i, j int) bool
func (SearchResults) Swap ¶
func (r SearchResults) Swap(i, j int)
type SpanishStemmerFilter ¶
type SpanishStemmerFilter struct {
// contains filtered or unexported fields
}
func NewSpanishStemmer ¶
func NewSpanishStemmer(removeStopWords bool) SpanishStemmerFilter
func (SpanishStemmerFilter) Filter ¶
func (s SpanishStemmerFilter) Filter(tokens []string) []string
type StopWordsFilter ¶
type StopWordsFilter struct {
// contains filtered or unexported fields
}
func NewStopWordsFilter ¶
func NewStopWordsFilter(sw StopWords) StopWordsFilter
func (StopWordsFilter) Filter ¶
func (s StopWordsFilter) Filter(tokens []string) []string
type TokenizationPipeline ¶
type TokenizationPipeline struct {
// contains filtered or unexported fields
}
func NewTokenizationPipeline ¶
func NewTokenizationPipeline(t Tokenizer, f ...Filter) *TokenizationPipeline
func (*TokenizationPipeline) Tokenize ¶
func (p *TokenizationPipeline) Tokenize(text string) []string
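A composition sketch for the pipeline, assuming it sits inside the package and that CleanTokenizer satisfies Tokenizer while the lowercase, stop-word, and stemmer types satisfy Filter (the interface definitions are not expanded on this page).

func ExampleTokenizationPipeline() {
	pipeline := NewTokenizationPipeline(
		NewKeepAlphanumericTokenizer(),       // splits and cleans the raw text
		NewLowerCaseTokenizer(),              // LowerCaseFilter
		NewStopWordsFilter(SpanishStopWords), // drops the 443 Spanish stop words
		NewSpanishStemmer(false),             // removeStopWords=false, per the documented parameter
	)
	tokens := pipeline.Tokenize("Los programadores programan en Java")
	_ = tokens // exact output depends on the unexported cleaning and stemming rules
}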
Source Files ¶
- analyze_clean_tokenizer.go
- analyze_compose.go
- analyze_lowercase_filter.go
- analyze_stem_filter.go
- analyze_stopwords_filter.go
- entities_common.go
- entities_doc.go
- entities_hash.go
- entities_request.go
- index_index.go
- index_memory_index.go
- repos_repo.go
- search_hits_search.go
- search_linear_search.go
- search_noop_search.go
- search_result.go
- search_result_easyjson.go
- search_search.go
- stemmer_spanish_snowball.go
- stemmer_stem.go