analysis

package
v2.4.0
Published: Mar 20, 2024 License: Apache-2.0 Imports: 9 Imported by: 141

Documentation

Constants

This section is empty.

Variables

var ErrInvalidDateTime = fmt.Errorf("unable to parse datetime with any of the layouts")
var ErrInvalidTimestampRange = fmt.Errorf("timestamp out of range")
var ErrInvalidTimestampString = fmt.Errorf("unable to parse timestamp string")

Functions

func BuildTermFromRunes

func BuildTermFromRunes(runes []rune) []byte

func BuildTermFromRunesOptimistic

func BuildTermFromRunesOptimistic(buf []byte, runes []rune) []byte

BuildTermFromRunesOptimistic builds a term from the provided runes and optimistically attempts to encode it into the provided buffer. If at any point the buffer appears too small, a new buffer is allocated and used instead. Use this function when the new term is frequently the same length as, or shorter than, the original term (in number of bytes).
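
The buffer-reuse strategy described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the package's source: buildTermOptimistic is a hypothetical local function, and the growth policy (worst-case size for the remaining runes) is an assumption.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// buildTermOptimistic is a hypothetical stand-in, not the library function.
// It encodes runes into buf while they fit and switches to a newly
// allocated slice the first time the next rune would overflow buf.
func buildTermOptimistic(buf []byte, runes []rune) []byte {
	out := buf
	used := 0
	for i, r := range runes {
		n := utf8.RuneLen(r)
		if n < 0 {
			// Invalid runes are encoded as utf8.RuneError by EncodeRune.
			n = utf8.RuneLen(utf8.RuneError)
		}
		if used+n > len(out) {
			// Assumed policy: size the new buffer for the worst case of
			// the remaining runes so that it never has to grow again.
			grown := make([]byte, used+(len(runes)-i)*utf8.UTFMax)
			copy(grown, out[:used])
			out = grown
		}
		used += utf8.EncodeRune(out[used:], r)
	}
	return out[:used]
}

func main() {
	original := []byte("HELLO")
	term := buildTermOptimistic(original, []rune("hello"))
	// The result shares storage with original because it fit.
	fmt.Printf("%s reused=%v\n", term, &term[0] == &original[0])
}
```

When the term fits, the returned slice aliases the caller's buffer, so the caller should not expect the buffer's previous contents to survive.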

func DeleteRune

func DeleteRune(in []rune, pos int) []rune

func InsertRune

func InsertRune(in []rune, pos int, r rune) []rune

func RunesEndsWith

func RunesEndsWith(input []rune, suffix string) bool

func TokenFrequency

func TokenFrequency(tokens TokenStream, arrayPositions []uint64, options index.FieldIndexingOptions) index.TokenFrequencies

func TruncateRunes

func TruncateRunes(input []byte, num int) []byte

Types

type Analyzer

type Analyzer interface {
	Analyze([]byte) TokenStream
}

type ByteArrayConverter

type ByteArrayConverter interface {
	Convert([]byte) (interface{}, error)
}

type CharFilter

type CharFilter interface {
	Filter([]byte) []byte
}

type DateTimeParser

type DateTimeParser interface {
	ParseDateTime(string) (time.Time, string, error)
}

type DefaultAnalyzer added in v2.3.5

type DefaultAnalyzer struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}

func (*DefaultAnalyzer) Analyze added in v2.3.5

func (a *DefaultAnalyzer) Analyze(input []byte) TokenStream
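
The three fields of DefaultAnalyzer suggest a pipeline. The sketch below shows one plausible shape for it using local stand-in types that mirror the documented interfaces. The ordering (char filters, then the tokenizer, then token filters) is an assumption, and pipeline, dashToSpace, fieldsTokenizer and lower are hypothetical names that do not exist in the package.

```go
package main

import (
	"bytes"
	"fmt"
)

// Local stand-ins mirroring the shapes of the documented types.
type Token struct {
	Term     []byte
	Position int
}
type TokenStream []*Token

type CharFilter interface{ Filter([]byte) []byte }
type Tokenizer interface{ Tokenize([]byte) TokenStream }
type TokenFilter interface{ Filter(TokenStream) TokenStream }

// pipeline has the same fields as DefaultAnalyzer.
type pipeline struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}

// Analyze runs the assumed order: char filters, tokenizer, token filters.
func (p *pipeline) Analyze(input []byte) TokenStream {
	for _, cf := range p.CharFilters {
		input = cf.Filter(input)
	}
	tokens := p.Tokenizer.Tokenize(input)
	for _, tf := range p.TokenFilters {
		tokens = tf.Filter(tokens)
	}
	return tokens
}

// dashToSpace is a hypothetical CharFilter that replaces '-' with ' '.
type dashToSpace struct{}

func (dashToSpace) Filter(in []byte) []byte {
	return bytes.ReplaceAll(in, []byte("-"), []byte(" "))
}

// fieldsTokenizer is a hypothetical Tokenizer splitting on whitespace.
type fieldsTokenizer struct{}

func (fieldsTokenizer) Tokenize(in []byte) TokenStream {
	var ts TokenStream
	for i, f := range bytes.Fields(in) {
		ts = append(ts, &Token{Term: f, Position: i + 1})
	}
	return ts
}

// lower is a hypothetical TokenFilter that lowercases every term.
type lower struct{}

func (lower) Filter(ts TokenStream) TokenStream {
	for _, t := range ts {
		t.Term = bytes.ToLower(t.Term)
	}
	return ts
}

// terms collects the term of each token as a string.
func terms(ts TokenStream) []string {
	out := make([]string, 0, len(ts))
	for _, t := range ts {
		out = append(out, string(t.Term))
	}
	return out
}

func main() {
	a := &pipeline{
		CharFilters:  []CharFilter{dashToSpace{}},
		Tokenizer:    fieldsTokenizer{},
		TokenFilters: []TokenFilter{lower{}},
	}
	fmt.Println(terms(a.Analyze([]byte("Full-Text Search"))))
}
```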

type Token

type Token struct {
	// Start specifies the byte offset of the beginning of the term in the
	// field.
	Start int `json:"start"`

	// End specifies the byte offset of the end of the term in the field.
	End  int    `json:"end"`
	Term []byte `json:"term"`

	// Position specifies the 1-based index of the token in the sequence of
	// occurrences of its term in the field.
	Position int       `json:"position"`
	Type     TokenType `json:"type"`
	KeyWord  bool      `json:"keyword"`
}

Token represents one occurrence of a term at a particular location in a field.

func (*Token) String

func (t *Token) String() string

type TokenFilter

type TokenFilter interface {
	Filter(TokenStream) TokenStream
}

A TokenFilter adds, transforms or removes tokens from a token stream.
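
As a sketch of the "removes" case, the filter below drops every token whose term appears in a map shaped like TokenMap. The types are local stand-ins for the documented ones, and dropListed is a hypothetical name, not a filter shipped with the package.

```go
package main

import "fmt"

// Local stand-ins mirroring the shapes of the documented types.
type Token struct {
	Term     []byte
	Position int
}
type TokenStream []*Token
type TokenFilter interface {
	Filter(TokenStream) TokenStream
}
type TokenMap map[string]bool

// dropListed is a hypothetical TokenFilter that removes listed terms.
type dropListed struct{ words TokenMap }

// Filter returns a new stream holding only tokens not present in words.
// Surviving tokens keep their original Position values.
func (f dropListed) Filter(in TokenStream) TokenStream {
	out := make(TokenStream, 0, len(in))
	for _, tok := range in {
		if !f.words[string(tok.Term)] {
			out = append(out, tok)
		}
	}
	return out
}

func main() {
	var f TokenFilter = dropListed{words: TokenMap{"the": true, "of": true}}
	in := TokenStream{
		{Term: []byte("the"), Position: 1},
		{Term: []byte("art"), Position: 2},
		{Term: []byte("of"), Position: 3},
		{Term: []byte("search"), Position: 4},
	}
	for _, tok := range f.Filter(in) {
		fmt.Printf("%s@%d\n", tok.Term, tok.Position)
	}
}
```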

type TokenMap

type TokenMap map[string]bool

func NewTokenMap

func NewTokenMap() TokenMap

func (TokenMap) AddToken

func (t TokenMap) AddToken(token string)

func (TokenMap) LoadBytes

func (t TokenMap) LoadBytes(data []byte) error

LoadBytes reads in a list of tokens from memory, one per line. Comments are supported using `#` or `|`
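
The file format described above can be illustrated with a small parser. This is a sketch under assumptions, not the package's implementation: loadTokens is a hypothetical local function, and it assumes that a comment runs from the first `#` or `|` to the end of the line and that blank lines are ignored.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// TokenMap is a local stand-in with the same shape as the documented type.
type TokenMap map[string]bool

// loadTokens is a hypothetical parser for the one-token-per-line format.
func loadTokens(data []byte) (TokenMap, error) {
	tm := TokenMap{}
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		line := sc.Text()
		// Assumed comment rule: cut at the first '#' or '|'.
		if i := strings.IndexAny(line, "#|"); i >= 0 {
			line = line[:i]
		}
		// Skip lines that are empty once comments and spaces are gone.
		if tok := strings.TrimSpace(line); tok != "" {
			tm[tok] = true
		}
	}
	return tm, sc.Err()
}

func main() {
	data := []byte("the\n# a full-line comment\nand | trailing comment\n\n  of  \n")
	tm, err := loadTokens(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(tm), tm["the"], tm["and"], tm["of"])
}
```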

func (TokenMap) LoadFile

func (t TokenMap) LoadFile(filename string) error

LoadFile reads in a list of tokens from a text file, one per line. Comments are supported using `#` or `|`

func (TokenMap) LoadLine

func (t TokenMap) LoadLine(line string)

type TokenStream

type TokenStream []*Token

type TokenType

type TokenType int
const (
	AlphaNumeric TokenType = iota
	Ideographic
	Numeric
	DateTime
	Shingle
	Single
	Double
	Boolean
	IP
)

type Tokenizer

type Tokenizer interface {
	Tokenize([]byte) TokenStream
}

A Tokenizer splits an input string into tokens, the usual behaviour being to map words to tokens.
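
A minimal sketch of the word-to-token mapping follows, assuming whitespace-delimited words. The types are local stand-ins for the documented ones, whitespaceTokenizer is a hypothetical name, and Position is assigned here as the 1-based index of the token in the stream, which is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// Local stand-ins mirroring the shapes of the documented types.
type Token struct {
	Start    int // byte offset of the first byte of the term
	End      int // byte offset just past the last byte of the term
	Term     []byte
	Position int
}
type TokenStream []*Token
type Tokenizer interface {
	Tokenize([]byte) TokenStream
}

// whitespaceTokenizer is a hypothetical Tokenizer that emits one token
// per run of non-space runes.
type whitespaceTokenizer struct{}

func (whitespaceTokenizer) Tokenize(input []byte) TokenStream {
	var out TokenStream
	start := -1 // byte offset where the current word began, or -1
	pos := 0
	emit := func(end int) {
		pos++
		out = append(out, &Token{
			Start: start, End: end, Term: input[start:end], Position: pos,
		})
		start = -1
	}
	for i := 0; i < len(input); {
		r, size := utf8.DecodeRune(input[i:])
		switch {
		case unicode.IsSpace(r):
			if start >= 0 {
				emit(i) // a word just ended
			}
		case start < 0:
			start = i // a word just began
		}
		i += size
	}
	if start >= 0 {
		emit(len(input)) // flush the trailing word
	}
	return out
}

func main() {
	var t Tokenizer = whitespaceTokenizer{}
	for _, tok := range t.Tokenize([]byte("héllo  world")) {
		fmt.Printf("%s [%d,%d) pos=%d\n", tok.Term, tok.Start, tok.End, tok.Position)
	}
}
```

Offsets are counted in bytes rather than runes, so a multi-byte character such as "é" widens the range of the token that contains it.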

Directories

Path Synopsis
analyzer
web
char
datetime
iso
lang
ar
bg
ca
cjk
ckb
cs
da
de
el
en
Package en implements an analyzer with reasonable defaults for processing English text.
es
eu
fa
fi
fr
ga
gl
hi
hr
hu
hy
id
in
it
nl
no
pl
pt
ro
ru
sv
tr
token
lowercase
Package lowercase implements a TokenFilter which converts tokens to lower case according to unicode rules.
stop
Package stop implements a TokenFilter removing tokens found in a TokenMap.
tokenizer
exception
Package exception implements a Tokenizer which extracts pieces matched by a regular expression from the input data, delegates the rest to another tokenizer, then inserts the extracted parts back into the token stream.
web
Package token_map implements a generic TokenMap, often used in conjunction with filters to remove or process specific tokens.
