lm

package
v0.2.6
Warning

This package is not in the latest version of its module.

Published: Dec 15, 2020 License: MIT Imports: 3 Imported by: 0

Documentation

Overview

Package lm provides interfaces and a base implementation of word lemmatization.

Constants

This section is empty.

Variables

This section is empty.

Functions

func UnpackPosNyble

func UnpackPosNyble(nyble POSIdNyble) []nlpgo.POSId

Types

type Lemma

type Lemma struct {
	// Lemma value
	Val string
	// Optional list of possible Parts Of Speech for the word parsed.
	Pos []nlpgo.POSId
}

type LemmaAccumulator

type LemmaAccumulator map[string]map[nlpgo.POSId]struct{}

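LemmaAccumulator is a map type that accumulates lemma candidates: each key is a lemma value, and each value is the set of possible parts of speech for that lemma.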

func (LemmaAccumulator) Set

func (acc LemmaAccumulator) Set(lemma string, pp []nlpgo.POSId)
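
The page does not document what Set does, so the following is only a minimal sketch of one plausible behavior, assuming Set registers the lemma and merges the given POS ids into its set. The local POSId type is a stand-in for nlpgo.POSId, and the method body is an assumption, not the package's code.

```go
package main

import "fmt"

// POSId is a local stand-in for nlpgo.POSId (documented as 1 byte each).
type POSId uint8

// LemmaAccumulator mirrors the documented map type.
type LemmaAccumulator map[string]map[POSId]struct{}

// Set is a hypothetical implementation: it creates the POS set for the
// lemma on first use and merges the given ids into it, ignoring duplicates.
func (acc LemmaAccumulator) Set(lemma string, pp []POSId) {
	set, ok := acc[lemma]
	if !ok {
		set = make(map[POSId]struct{}, len(pp))
		acc[lemma] = set
	}
	for _, p := range pp {
		set[p] = struct{}{}
	}
}

func main() {
	acc := LemmaAccumulator{}
	acc.Set("run", []POSId{1})
	acc.Set("run", []POSId{1, 2}) // the duplicate id 1 is merged, not repeated
	fmt.Println(len(acc), len(acc["run"]))
}
```

Using a set per lemma means that several resolvers can report the same lemma without producing duplicate candidates.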

type LemmaIndex

type LemmaIndex struct {
	// contains filtered or unexported fields
}

LemmaIndex is the default implementation of LmChecker.

func NewLemmaIndex

func NewLemmaIndex(data map[string][]nlpgo.POSId) LemmaIndex

type Lemmatizer

type Lemmatizer struct {
	// contains filtered or unexported fields
}

Lemmatizer implements lemmatization.

func NewLemmatizer

func NewLemmatizer(lkpr LmChecker, resolvers []LmResolver, opts ...LmOption) *Lemmatizer

func (Lemmatizer) LemmaCandidates

func (l Lemmatizer) LemmaCandidates(word string, max int) (candidates []Lemma)

LemmaCandidates returns up to `max` lemma candidates for the given word. The order of the candidates in the returned slice depends on:

  • the order of the resolvers passed in to the Lemmatizer constructor
  • the internal policy of each resolver

func (Lemmatizer) Lemmatize

func (l Lemmatizer) Lemmatize(word string) Lemma

Lemmatize returns the first resolved lemma.

type LmChecker

type LmChecker interface {
	// contains filtered or unexported methods
}

LmChecker provides an interface to check whether a lemma exists. What is considered a lemma is implementation-specific.

type LmOption

type LmOption func(*Lemmatizer)

LmOption defines a functional option type for the Lemmatizer.
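
This page lists no concrete options, so the sketch below only illustrates the functional option pattern that the LmOption signature implies. Everything in it except the shape of LmOption is hypothetical: the lowercase field, the WithLowercase option, and the simplified newLemmatizer constructor (the real NewLemmatizer also takes an LmChecker and a resolver list).

```go
package main

import "fmt"

// Lemmatizer is a local stand-in; its real fields are unexported and unknown.
type Lemmatizer struct {
	lowercase bool // hypothetical setting, for illustration only
}

// LmOption has the same shape as the documented type.
type LmOption func(*Lemmatizer)

// WithLowercase is a hypothetical option that toggles the hypothetical field.
func WithLowercase(on bool) LmOption {
	return func(l *Lemmatizer) { l.lowercase = on }
}

// newLemmatizer is a simplified constructor: it builds a default value and
// then applies each option in the order given.
func newLemmatizer(opts ...LmOption) *Lemmatizer {
	l := &Lemmatizer{}
	for _, opt := range opts {
		opt(l)
	}
	return l
}

func main() {
	l := newLemmatizer(WithLowercase(true))
	fmt.Println(l.lowercase)
}
```

Because options are applied in order, a later option overrides an earlier one that touches the same setting.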

type LmResolver

type LmResolver interface {
	// Resolves lemmata and adds them to acc. It should stop resolving if total
	// acc size >= max
	Resolve(word string, acc LemmaAccumulator, max int)
}

LmResolver is a way to apply a strategy to the lemmatization process.
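
As an illustration of the Resolve contract, here is a minimal sketch of a custom resolver written against local stand-in types. The pluralResolver type, its trailing "s" rule, and the body of Set are all hypothetical; only the interface shape and the "stop when acc size >= max" rule come from the documentation above.

```go
package main

import (
	"fmt"
	"strings"
)

// POSId is a local stand-in for nlpgo.POSId.
type POSId uint8

// LemmaAccumulator mirrors the documented map type.
type LemmaAccumulator map[string]map[POSId]struct{}

// Set is an assumed implementation that merges POS ids into the lemma's set.
func (acc LemmaAccumulator) Set(lemma string, pp []POSId) {
	if acc[lemma] == nil {
		acc[lemma] = map[POSId]struct{}{}
	}
	for _, p := range pp {
		acc[lemma][p] = struct{}{}
	}
}

// LmResolver has the same method set as the documented interface.
type LmResolver interface {
	Resolve(word string, acc LemmaAccumulator, max int)
}

// pluralResolver is a hypothetical strategy: it strips one trailing "s".
type pluralResolver struct{ noun POSId }

func (r pluralResolver) Resolve(word string, acc LemmaAccumulator, max int) {
	// Honor the contract: do nothing once the accumulator is full.
	if len(acc) >= max {
		return
	}
	if len(word) > 1 && strings.HasSuffix(word, "s") {
		acc.Set(strings.TrimSuffix(word, "s"), []POSId{r.noun})
	}
}

func main() {
	var r LmResolver = pluralResolver{noun: 1}
	acc := LemmaAccumulator{}
	r.Resolve("cats", acc, 1)
	r.Resolve("dogs", acc, 1) // skipped: acc already holds max entries
	fmt.Println(len(acc))
}
```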

func NewExceptionResolver

func NewExceptionResolver(excidx map[string][]Lemma) LmResolver

func NewSuffixRuleResolver

func NewSuffixRuleResolver(rules []Rule, lc LmChecker) LmResolver

type POSIdNyble

type POSIdNyble = uint32

POSIdNyble packs up to 4 POSId values (1 byte each) into a single uint32 (4 bytes). This stores up to 4 POSId items in 4 bytes, whereas a []POSId slice of length 4 takes 24 + 4 bytes (slice header plus data) on 64-bit platforms.

func PackPosNyble

func PackPosNyble(ps []nlpgo.POSId) POSIdNyble

PackPosNyble packs a slice of up to 4 POS elements into a POSIdNyble value. If len(ps) > 4, only the first 4 elements are packed.
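
The packing described above can be sketched with plain shifts. This is not the package's implementation: packPOS and unpackPOS are hypothetical local helpers, the byte layout (first element in the lowest byte) is an assumption, and so is the convention that a zero byte marks an unused slot.

```go
package main

import "fmt"

// POSId is a local stand-in for nlpgo.POSId (1 byte each).
type POSId uint8

// POSIdNyble mirrors the documented alias.
type POSIdNyble = uint32

// packPOS stores up to 4 ids, one per byte, first id in the lowest byte
// (assumed layout). Elements beyond the fourth are dropped.
func packPOS(ps []POSId) POSIdNyble {
	var n POSIdNyble
	for i, p := range ps {
		if i == 4 {
			break
		}
		n |= POSIdNyble(p) << (8 * uint(i))
	}
	return n
}

// unpackPOS reads bytes from lowest to highest and stops at the first zero
// byte, which this sketch treats as "empty slot" (an assumption).
func unpackPOS(n POSIdNyble) []POSId {
	var ps []POSId
	for i := 0; i < 4; i++ {
		b := POSId(n >> (8 * uint(i))) // conversion keeps the low 8 bits
		if b == 0 {
			break
		}
		ps = append(ps, b)
	}
	return ps
}

func main() {
	n := packPOS([]POSId{3, 7, 9, 11, 13}) // the fifth element is ignored
	fmt.Println(n, unpackPOS(n))
}
```

Under this layout an id of 0 cannot be represented, so the sketch only round-trips non-zero ids.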

type Rule

type Rule struct {
	Affix string
	// Tag(s) to identify the inflection form
	Pos        []nlpgo.POSId
	Transforms []RuleTransform
}

Rule defines the affix-related attributes used to restore a lemma from a word. Currently only suffix-based rules are supported.

type RuleTransform

type RuleTransform struct {
	// Number of chars to discard
	Cutoff int
	// A compensating affix appended to the word after the Affix is detached
	// [optional]
	Augment string
	// Minimal word length used as a threshold trigger to apply the rule.
	// [optional] (ignored if zero value).
	MinValidLen int
	// A regexp to validate the word before applying the affix detaching.
	// [optional]
	ReBefore *regexp.Regexp
	// A regexp to validate the lemma candidate after Affix detaching
	// [optional]
	ReAfter *regexp.Regexp
}
