analysis

package
v0.1.2
Published: Nov 24, 2023 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package analysis provides an API for converting text into indexable/searchable tokens.
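
As a hedged sketch of the overall flow (written as if inside package analysis itself, since the module's import path is not shown on this page), a Tokenizer splits text into tokens and a TokenFilter then post-processes the resulting list; the exact tokens produced depend on the concrete implementations:

package analysis

import "fmt"

// Example_pipeline sketches the two-stage flow: a Tokenizer splits text
// into tokens, and a TokenFilter post-processes the token list.
func Example_pipeline() {
	tokenizer := NewNGramTokenizer(3)   // fixed-size n-gram splitter
	filter := NewEnglishStemmerFilter() // English stemming filter

	tokens := tokenizer.Tokenize("searching texts")
	tokens = filter.Filter(tokens)

	fmt.Println(len(tokens) > 0)
}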

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Token

type Token = string

Token is a string with an assigned and thus identified meaning.

type TokenFilter

type TokenFilter interface {
	// Filter applies the filter's behaviour to the given token list and returns the result.
	Filter(list []Token) []Token
}

TokenFilter is responsible for removing, modifying, or otherwise altering the given stream of tokens.
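
Because the interface has a single method, callers can also supply their own filters. Below is a hypothetical stop-word filter that satisfies TokenFilter; the type and its stop-word set are assumptions for illustration, not part of this package:

package analysis

// stopWordFilter is a hypothetical TokenFilter that drops common words.
type stopWordFilter struct {
	stop map[Token]struct{}
}

// Filter returns the given list with all stop words removed.
func (f stopWordFilter) Filter(list []Token) []Token {
	out := make([]Token, 0, len(list))
	for _, t := range list {
		if _, skip := f.stop[t]; !skip {
			out = append(out, t)
		}
	}
	return out
}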

func NewEnglishStemmerFilter

func NewEnglishStemmerFilter() TokenFilter

NewEnglishStemmerFilter creates a new stemming filter for the English language.
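
A hedged usage sketch; the stemmed forms are not asserted because they depend on the underlying stemmer, and NewRussianStemmerFilter below is used the same way:

package analysis

import "fmt"

func ExampleNewEnglishStemmerFilter() {
	stemmer := NewEnglishStemmerFilter()

	// Reduce inflected forms such as "running" and "searches" to their stems.
	tokens := stemmer.Filter([]Token{"running", "searches"})
	fmt.Println(len(tokens))
}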

func NewNormalizerFilter

func NewNormalizerFilter(chars alphabet.Alphabet, pad string) TokenFilter

NewNormalizerFilter returns a normalizing token filter built from the given alphabet and pad string.
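
How an alphabet.Alphabet value is constructed is not shown on this page, so the sketch below only demonstrates the call shape: chars is assumed to be supplied by the caller, "_" is an arbitrary pad string, and the import of the alphabet package is elided because its path is not listed here.

package analysis

// normalizeTokens is a shape-only sketch: chars is assumed to be an
// alphabet.Alphabet obtained elsewhere, and "_" is an arbitrary pad string.
func normalizeTokens(chars alphabet.Alphabet, tokens []Token) []Token {
	normalizer := NewNormalizerFilter(chars, "_")
	return normalizer.Filter(tokens)
}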

func NewRussianStemmerFilter

func NewRussianStemmerFilter() TokenFilter

NewRussianStemmerFilter creates a new stemming filter for the Russian language.

type Tokenizer

type Tokenizer interface {
	// Tokenize splits the given text into a sequence of tokens.
	Tokenize(text string) []Token
}

Tokenizer splits the given text into a sequence of tokens.
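
Any splitter can satisfy the interface. The sketch below is a hypothetical whitespace tokenizer used only to illustrate the contract; it is not shipped by this package:

package analysis

import "strings"

// spaceTokenizer is a hypothetical Tokenizer that splits text on whitespace.
type spaceTokenizer struct{}

// Tokenize splits the given text into whitespace-separated tokens.
func (spaceTokenizer) Tokenize(text string) []Token {
	fields := strings.Fields(text)
	tokens := make([]Token, 0, len(fields))
	for _, f := range fields {
		tokens = append(tokens, Token(f))
	}
	return tokens
}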

func NewFilterTokenizer

func NewFilterTokenizer(tokenizer Tokenizer, filter TokenFilter) Tokenizer

NewFilterTokenizer creates a tokenizer that applies the given filter to the output of the given tokenizer.
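
A hedged composition sketch; the particular pairing of an n-gram tokenizer with the English stemmer is only meant to show the call shape:

package analysis

import "fmt"

func ExampleNewFilterTokenizer() {
	// Every Tokenize call on the composed tokenizer also runs the filter.
	tokenizer := NewFilterTokenizer(NewNGramTokenizer(3), NewEnglishStemmerFilter())

	tokens := tokenizer.Tokenize("indexing")
	fmt.Println(len(tokens) > 0)
}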

func NewNGramTokenizer

func NewNGramTokenizer(nGramSize int) Tokenizer

NewNGramTokenizer creates a Tokenizer that splits text into n-grams of the given size.
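
A hedged usage sketch; the exact n-grams produced are not asserted here:

package analysis

import "fmt"

func ExampleNewNGramTokenizer() {
	tokenizer := NewNGramTokenizer(3)

	// Split the text into n-grams of size 3.
	tokens := tokenizer.Tokenize("text")
	fmt.Println(len(tokens))
}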

func NewWordTokenizer

func NewWordTokenizer(alphabet alphabet.Alphabet) Tokenizer

NewWordTokenizer creates a Tokenizer that splits text into words using the given alphabet.
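
As with NewNormalizerFilter, constructing an alphabet.Alphabet is not documented on this page, so the sketch below assumes one is supplied by the caller and likewise elides the alphabet package import:

package analysis

// wordTokens is a shape-only sketch: chars is assumed to be an
// alphabet.Alphabet obtained elsewhere.
func wordTokens(chars alphabet.Alphabet, text string) []Token {
	tokenizer := NewWordTokenizer(chars)
	return tokenizer.Tokenize(text)
}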

func NewWrapTokenizer

func NewWrapTokenizer(tokenizer Tokenizer, start, end string) Tokenizer

NewWrapTokenizer returns a tokenizer that wraps the provided text with the given start and end strings before tokenization.
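
A hedged usage sketch; the "$" markers are arbitrary and only suggest a typical use, such as letting n-grams capture word boundaries:

package analysis

import "fmt"

func ExampleNewWrapTokenizer() {
	// Surround the input with start/end markers before tokenizing it.
	tokenizer := NewWrapTokenizer(NewNGramTokenizer(3), "$", "$")

	tokens := tokenizer.Tokenize("go")
	fmt.Println(len(tokens) > 0)
}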

