Documentation
Overview
Package lm provides interfaces and a base implementation for word lemmatization.
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
func UnpackPosNyble
func UnpackPosNyble(nyble POSIdNyble) []nlpgo.POSId
Types
type LemmaAccumulator
LemmaAccumulator is a map type used to accumulate lemma candidates.
type LemmaIndex
type LemmaIndex struct {
// contains filtered or unexported fields
}
LemmaIndex is the default implementation of LmChecker.
func NewLemmaIndex
func NewLemmaIndex(data map[string][]nlpgo.POSId) LemmaIndex
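As a rough illustration of how a map-backed checker such as LemmaIndex might work, the sketch below stores each known lemma together with the POS tags it may carry, mirroring the shape of the NewLemmaIndex argument. LmChecker's methods are unexported, so the `exists` method, the local `POSId` type, the tag values, and the rule that a zero tag matches any POS are all hypothetical.

```go
package main

import "fmt"

// POSId stands in for nlpgo.POSId in this sketch (assumed 1 byte wide).
type POSId uint8

// lemmaIndex is a hypothetical map-backed checker modelled on the
// NewLemmaIndex signature: each known lemma maps to its allowed POS tags.
type lemmaIndex map[string][]POSId

// exists reports whether lemma is known and, if pos is non-zero, whether
// it is listed with that POS. The name and the "zero means any POS" rule
// are assumptions; the real LmChecker methods are unexported.
func (idx lemmaIndex) exists(lemma string, pos POSId) bool {
	tags, ok := idx[lemma]
	if !ok {
		return false
	}
	if pos == 0 {
		return true
	}
	for _, t := range tags {
		if t == pos {
			return true
		}
	}
	return false
}

func main() {
	const noun, verb POSId = 1, 2 // hypothetical tag values
	idx := lemmaIndex{"run": {noun, verb}, "cat": {noun}}
	fmt.Println(idx.exists("run", verb)) // true
	fmt.Println(idx.exists("cat", verb)) // false
	fmt.Println(idx.exists("dog", 0))    // false
}
```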
type Lemmatizer
type Lemmatizer struct {
// contains filtered or unexported fields
}
Lemmatizer implements word lemmatization using the LmChecker and the ordered list of LmResolver values passed to NewLemmatizer.
func NewLemmatizer
func NewLemmatizer(lkpr LmChecker, resolvers []LmResolver, opts ...LmOption) *Lemmatizer
func (Lemmatizer) LemmaCandidates
func (l Lemmatizer) LemmaCandidates(word string, max int) (candidates []Lemma)
LemmaCandidates returns up to `max` lemma candidates for the given word. The order of the candidates in the returned slice depends on:
- the order of the resolvers passed in to the Lemmatizer constructor
- the internal policy of each resolver
func (Lemmatizer) Lemmatize
func (l Lemmatizer) Lemmatize(word string) Lemma
Lemmatize returns the first resolved lemma for the given word.
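The ordering rules for LemmaCandidates and Lemmatize can be sketched with hypothetical local types. Here `resolver`, `accumulator`, `candidates`, and `lemmatize` are illustrative stand-ins rather than the package API: the accumulator keeps insertion order so the effect of resolver order is visible, and `lemmatize` is assumed to be equivalent to asking for a single candidate.

```go
package main

import (
	"fmt"
	"strings"
)

// accumulator is a hypothetical ordered, de-duplicating collector. The
// real LemmaAccumulator is documented only as "a map type"; the slice
// here just makes insertion order visible.
type accumulator struct {
	seen  map[string]bool
	order []string
}

func (a *accumulator) add(lemma string) {
	if !a.seen[lemma] {
		a.seen[lemma] = true
		a.order = append(a.order, lemma)
	}
}

// resolver is a hypothetical stand-in for LmResolver.
type resolver func(word string, acc *accumulator, max int)

// candidates runs resolvers in the order given, so earlier resolvers'
// results come first, and stops once max candidates are collected.
func candidates(word string, rs []resolver, max int) []string {
	acc := &accumulator{seen: map[string]bool{}}
	for _, r := range rs {
		if len(acc.order) >= max {
			break
		}
		r(word, acc, max)
	}
	return acc.order
}

// lemmatize returns the first candidate, or "" if none was found.
func lemmatize(word string, rs []resolver) string {
	if c := candidates(word, rs, 1); len(c) > 0 {
		return c[0]
	}
	return ""
}

func main() {
	exceptions := map[string][]string{"went": {"go"}}
	exc := func(word string, acc *accumulator, max int) {
		for _, l := range exceptions[word] {
			if len(acc.order) >= max {
				return
			}
			acc.add(l)
		}
	}
	// Naive suffix resolver: strip a trailing "ed" or "s".
	suf := func(word string, acc *accumulator, max int) {
		for _, s := range []string{"ed", "s"} {
			if len(acc.order) >= max {
				return
			}
			if strings.HasSuffix(word, s) {
				acc.add(strings.TrimSuffix(word, s))
			}
		}
	}
	rs := []resolver{exc, suf}
	fmt.Println(candidates("walked", rs, 3)) // [walk]
	fmt.Println(lemmatize("went", rs))       // go
}
```

Placing the exception resolver first means irregular forms win over rule-based guesses, which is one reason the documented order of resolvers matters.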
type LmChecker
type LmChecker interface {
// contains filtered or unexported methods
}
LmChecker provides an interface to check whether a lemma exists. What counts as a lemma is implementation-specific.
type LmOption
type LmOption func(*Lemmatizer)
LmOption defines a functional option type for configuring a Lemmatizer.
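LmOption follows Go's functional-options pattern: each option is a function that mutates the value being constructed. A minimal sketch, assuming a hypothetical `lowercase` field and `withLowercase` option (neither is part of package lm, whose Lemmatizer fields are unexported):

```go
package main

import "fmt"

// lemmatizer is a hypothetical stand-in with one illustrative field.
type lemmatizer struct {
	lowercase bool
}

// option mirrors the documented LmOption shape, func(*Lemmatizer).
type option func(*lemmatizer)

// withLowercase is a hypothetical option, not part of package lm.
func withLowercase() option {
	return func(l *lemmatizer) { l.lowercase = true }
}

// newLemmatizer applies each option, in order, to a default value,
// as NewLemmatizer's variadic opts parameter suggests.
func newLemmatizer(opts ...option) *lemmatizer {
	l := &lemmatizer{}
	for _, opt := range opts {
		opt(l)
	}
	return l
}

func main() {
	fmt.Println(newLemmatizer().lowercase)                // false
	fmt.Println(newLemmatizer(withLowercase()).lowercase) // true
}
```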
type LmResolver
type LmResolver interface {
	// Resolves lemmata and adds them to acc. It should stop resolving if total
	// acc size >= max
	Resolve(word string, acc LemmaAccumulator, max int)
}
LmResolver applies a single lemmatization strategy, such as exception lookup (NewExceptionResolver) or suffix rules (NewSuffixRuleResolver).
func NewExceptionResolver
func NewExceptionResolver(excidx map[string][]Lemma) LmResolver
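The Resolve contract says a resolver should stop once the total accumulator size reaches max, and that total includes candidates added by earlier resolvers. The sketch below shows this for an exception-list resolver; `lemmaAcc`, `exceptionResolver`, and the use of plain strings in place of Lemma are assumptions, since the real key and value types are not documented here.

```go
package main

import "fmt"

// lemmaAcc is a hypothetical map accumulator; LemmaAccumulator is
// documented as "a map type" but its key/value types are not shown.
type lemmaAcc map[string]struct{}

// exceptionResolver is a hypothetical analogue of NewExceptionResolver,
// using plain strings instead of the package's Lemma type.
type exceptionResolver struct {
	idx map[string][]string
}

// Resolve adds exception lemmas for word to acc and stops as soon as
// the accumulator (including earlier entries) holds max items.
func (r exceptionResolver) Resolve(word string, acc lemmaAcc, max int) {
	for _, l := range r.idx[word] {
		if len(acc) >= max {
			return
		}
		acc[l] = struct{}{}
	}
}

func main() {
	r := exceptionResolver{idx: map[string][]string{
		"mice": {"mouse"},
		"saw":  {"see", "saw"},
	}}
	acc := lemmaAcc{"prior": {}} // found earlier by another resolver
	r.Resolve("saw", acc, 2)
	fmt.Println(len(acc)) // 2: "prior" plus "see"; "saw" is skipped
}
```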
func NewSuffixRuleResolver
func NewSuffixRuleResolver(rules []Rule, lc LmChecker) LmResolver
type POSIdNyble
type POSIdNyble = uint32
POSIdNyble can be used to pack up to 4 POSId values (1 byte each) into a single uint32 (4 bytes). This stores up to 4 POSId items in 4 bytes, compared with 24 + 4 bytes for a []POSId slice (a 24-byte slice header on 64-bit platforms plus a 4-byte backing array).
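The size comparison can be checked with `unsafe.Sizeof`. This sketch assumes POSId is a 1-byte type, as stated above, and uses local aliases in place of nlpgo.POSId; the slice figure counts only the header plus the minimal backing array and ignores any rounding done by the allocator.

```go
package main

import (
	"fmt"
	"unsafe"
)

// POSId stands in for nlpgo.POSId, assumed here to be 1 byte wide.
type POSId = uint8

// POSIdNyble mirrors the documented alias.
type POSIdNyble = uint32

// packedBytes reports the storage used by one packed value.
func packedBytes() uintptr {
	var n POSIdNyble
	return unsafe.Sizeof(n)
}

// sliceBytes reports the slice header size plus the backing array
// needed to hold n POSId values (allocator rounding ignored).
func sliceBytes(n int) uintptr {
	var s []POSId
	var elem POSId
	return unsafe.Sizeof(s) + uintptr(n)*unsafe.Sizeof(elem)
}

func main() {
	fmt.Println("packed:", packedBytes()) // 4
	fmt.Println("slice:", sliceBytes(4))  // 28 on 64-bit platforms
}
```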
func PackPosNyble
func PackPosNyble(ps []nlpgo.POSId) POSIdNyble
PackPosNyble packs a slice of up to 4 POS elements into a POSIdNyble value. If len(ps) > 4, only the first 4 elements are packed.
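One plausible packing scheme is sketched below with hypothetical `packPOS` and `unpackPOS` helpers. The byte order (first element in the least significant byte) and the treatment of a zero byte as an empty slot when unpacking are assumptions; the actual PackPosNyble and UnpackPosNyble may differ.

```go
package main

import "fmt"

// POSId stands in for nlpgo.POSId; this sketch assumes a 1-byte type.
type POSId = uint8

type POSIdNyble = uint32

// packPOS stores up to 4 POSId values in one uint32, element i in
// byte i (least significant byte first). Extra elements are ignored.
func packPOS(ps []POSId) POSIdNyble {
	var n POSIdNyble
	for i, p := range ps {
		if i == 4 {
			break
		}
		n |= POSIdNyble(p) << (8 * uint(i))
	}
	return n
}

// unpackPOS reverses packPOS. It assumes a zero byte marks an empty
// slot, so a POSId value of 0 cannot be round-tripped.
func unpackPOS(n POSIdNyble) []POSId {
	var ps []POSId
	for i := 0; i < 4; i++ {
		p := POSId(n >> (8 * uint(i)))
		if p == 0 {
			break
		}
		ps = append(ps, p)
	}
	return ps
}

func main() {
	n := packPOS([]POSId{1, 2, 3, 4, 5})
	fmt.Printf("0x%08x\n", n) // 0x04030201
	fmt.Println(unpackPOS(n)) // [1 2 3 4]
}
```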
type Rule
type Rule struct {
	Affix string
	// Tag(s) to identify the inflection form
	Pos        []nlpgo.POSId
	Transforms []RuleTransform
}
Rule defines the affix-related attributes used to restore a lemma from a word. Currently only suffix-based rules are supported.
type RuleTransform
type RuleTransform struct {
	// Number of chars to discard
	Cutoff int
	// A compensative affix to augment to a word after Affix detaching
	// [optional]
	Augment string
	// Minimal word length used as a threshold trigger to apply the rule.
	// [optional] (ignored if zero value).
	MinValidLen int
	// A regexp to validate the word before applying the affix detaching.
	// [optional]
	ReBefore *regexp.Regexp
	// A regexp to validate the lemma candidate after Affix detaching
	// [optional]
	ReAfter *regexp.Regexp
}
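To show how the RuleTransform fields might combine, here is a hypothetical `applySuffix` helper operating on a local copy of the struct. It assumes Cutoff and MinValidLen count runes, that Cutoff is applied to the full word, and that the checks run in the order the fields are listed; the package's suffix resolver may apply them differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ruleTransform is a hypothetical standalone copy of the documented
// RuleTransform fields, not the package's own type.
type ruleTransform struct {
	Cutoff      int
	Augment     string
	MinValidLen int
	ReBefore    *regexp.Regexp
	ReAfter     *regexp.Regexp
}

// applySuffix checks the suffix, MinValidLen and ReBefore, cuts Cutoff
// runes, appends Augment, then validates with ReAfter. It returns
// ("", false) if any check fails.
func applySuffix(word, affix string, t ruleTransform) (string, bool) {
	if !strings.HasSuffix(word, affix) {
		return "", false
	}
	runes := []rune(word)
	if t.MinValidLen > 0 && len(runes) < t.MinValidLen {
		return "", false
	}
	if t.ReBefore != nil && !t.ReBefore.MatchString(word) {
		return "", false
	}
	if t.Cutoff > len(runes) {
		return "", false
	}
	cand := string(runes[:len(runes)-t.Cutoff]) + t.Augment
	if t.ReAfter != nil && !t.ReAfter.MatchString(cand) {
		return "", false
	}
	return cand, true
}

func main() {
	// Hypothetical rule: "-ies" -> "-y" for words of 5+ letters.
	t := ruleTransform{Cutoff: 3, Augment: "y", MinValidLen: 5}
	fmt.Println(applySuffix("studies", "ies", t)) // study true
	fmt.Println(applySuffix("ties", "ies", t))    // rejected by MinValidLen

	// Hypothetical rule: strip "-ed" unless the word ends in "-eed".
	t2 := ruleTransform{Cutoff: 2, ReBefore: regexp.MustCompile(`[^e]ed$`)}
	fmt.Println(applySuffix("walked", "ed", t2)) // walk true
	fmt.Println(applySuffix("seed", "ed", t2))   // rejected by ReBefore
}
```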