analysis

package
v0.1.2
Published: Nov 24, 2023 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package analysis provides an API for converting text into indexable/searchable tokens.
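
As a hedged sketch of the overall flow (written as if inside package analysis itself, since the module's import path is not shown on this page), a Tokenizer splits text into tokens and a TokenFilter then post-processes the resulting list; the exact tokens produced depend on the concrete implementations:

package analysis

import "fmt"

// Example_pipeline sketches the two-stage flow: a Tokenizer splits text
// into tokens, and a TokenFilter post-processes the token list.
func Example_pipeline() {
	tokenizer := NewNGramTokenizer(3)   // fixed-size n-gram splitter
	filter := NewEnglishStemmerFilter() // English stemming filter

	tokens := tokenizer.Tokenize("searching texts")
	tokens = filter.Filter(tokens)

	fmt.Println(len(tokens) > 0)
}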

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Token

type Token = string

Token is a string with an assigned and thus identified meaning.

type TokenFilter

type TokenFilter interface {
	// Filter applies the filter's behaviour to the given token list and returns the result.
	Filter(list []Token) []Token
}

TokenFilter is responsible for removing, modifying, or otherwise altering the given stream of tokens.
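
Because the interface has a single method, callers can also supply their own filters. Below is a hypothetical stop-word filter that satisfies TokenFilter; the type and its stop-word set are assumptions for illustration, not part of this package:

package analysis

// stopWordFilter is a hypothetical TokenFilter that drops common words.
type stopWordFilter struct {
	stop map[Token]struct{}
}

// Filter returns the given list with all stop words removed.
func (f stopWordFilter) Filter(list []Token) []Token {
	out := make([]Token, 0, len(list))
	for _, t := range list {
		if _, skip := f.stop[t]; !skip {
			out = append(out, t)
		}
	}
	return out
}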

func NewEnglishStemmerFilter

func NewEnglishStemmerFilter() TokenFilter

NewEnglishStemmerFilter creates a new stemming filter for the English language.
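
A hedged usage sketch; the stemmed forms are not asserted because they depend on the underlying stemmer, and NewRussianStemmerFilter below is used the same way:

package analysis

import "fmt"

func ExampleNewEnglishStemmerFilter() {
	stemmer := NewEnglishStemmerFilter()

	// Reduce inflected forms such as "running" and "searches" to their stems.
	tokens := stemmer.Filter([]Token{"running", "searches"})
	fmt.Println(len(tokens))
}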

func NewNormalizerFilter

func NewNormalizerFilter(chars alphabet.Alphabet, pad string) TokenFilter

NewNormalizerFilter returns a normalizing token filter built from the given alphabet and pad string.
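
How an alphabet.Alphabet value is constructed is not shown on this page, so the sketch below only demonstrates the call shape: chars is assumed to be supplied by the caller, "_" is an arbitrary pad string, and the import of the alphabet package is elided because its path is not listed here.

package analysis

// normalizeTokens is a shape-only sketch: chars is assumed to be an
// alphabet.Alphabet obtained elsewhere, and "_" is an arbitrary pad string.
func normalizeTokens(chars alphabet.Alphabet, tokens []Token) []Token {
	normalizer := NewNormalizerFilter(chars, "_")
	return normalizer.Filter(tokens)
}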

func NewRussianStemmerFilter

func NewRussianStemmerFilter() TokenFilter

NewRussianStemmerFilter creates a new stemming filter for the Russian language.

type Tokenizer

type Tokenizer interface {
	// Tokenize splits the given text into a sequence of tokens.
	Tokenize(text string) []Token
}

Tokenizer splits the given text into a sequence of tokens.
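
Any splitter can satisfy the interface. The sketch below is a hypothetical whitespace tokenizer used only to illustrate the contract; it is not shipped by this package:

package analysis

import "strings"

// spaceTokenizer is a hypothetical Tokenizer that splits text on whitespace.
type spaceTokenizer struct{}

// Tokenize splits the given text into whitespace-separated tokens.
func (spaceTokenizer) Tokenize(text string) []Token {
	fields := strings.Fields(text)
	tokens := make([]Token, 0, len(fields))
	for _, f := range fields {
		tokens = append(tokens, Token(f))
	}
	return tokens
}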

func NewFilterTokenizer

func NewFilterTokenizer(tokenizer Tokenizer, filter TokenFilter) Tokenizer

NewFilterTokenizer creates a tokenizer that applies the given filter to the output of the given tokenizer.
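
A hedged composition sketch; the particular pairing of an n-gram tokenizer with the English stemmer is only meant to show the call shape:

package analysis

import "fmt"

func ExampleNewFilterTokenizer() {
	// Every Tokenize call on the composed tokenizer also runs the filter.
	tokenizer := NewFilterTokenizer(NewNGramTokenizer(3), NewEnglishStemmerFilter())

	tokens := tokenizer.Tokenize("indexing")
	fmt.Println(len(tokens) > 0)
}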

func NewNGramTokenizer

func NewNGramTokenizer(nGramSize int) Tokenizer

NewNGramTokenizer creates a Tokenizer that splits text into n-grams of the given size.
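
A hedged usage sketch; the exact n-grams produced are not asserted here:

package analysis

import "fmt"

func ExampleNewNGramTokenizer() {
	tokenizer := NewNGramTokenizer(3)

	// Split the text into n-grams of size 3.
	tokens := tokenizer.Tokenize("text")
	fmt.Println(len(tokens))
}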

func NewWordTokenizer

func NewWordTokenizer(alphabet alphabet.Alphabet) Tokenizer

NewWordTokenizer creates a Tokenizer that splits text into words using the given alphabet.
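
As with NewNormalizerFilter, constructing an alphabet.Alphabet is not documented on this page, so the sketch below assumes one is supplied by the caller and likewise elides the alphabet package import:

package analysis

// wordTokens is a shape-only sketch: chars is assumed to be an
// alphabet.Alphabet obtained elsewhere.
func wordTokens(chars alphabet.Alphabet, text string) []Token {
	tokenizer := NewWordTokenizer(chars)
	return tokenizer.Tokenize(text)
}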

func NewWrapTokenizer

func NewWrapTokenizer(tokenizer Tokenizer, start, end string) Tokenizer

NewWrapTokenizer returns a tokenizer that wraps the provided text with the given start and end strings before tokenization.
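
A hedged usage sketch; the "$" markers are arbitrary and only suggest a typical use, such as letting n-grams capture word boundaries:

package analysis

import "fmt"

func ExampleNewWrapTokenizer() {
	// Surround the input with start/end markers before tokenizing it.
	tokenizer := NewWrapTokenizer(NewNGramTokenizer(3), "$", "$")

	tokens := tokenizer.Tokenize("go")
	fmt.Println(len(tokens) > 0)
}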

