lm

package

v0.2.6 Published: Dec 15, 2020 License: MIT Imports: 3 Imported by: 0

Repository

github.com/timurgarif/nlpgo

Documentation ¶

Overview ¶

Package lm provides interfaces and a base implementation for word lemmatization.

Index ¶

func UnpackPosNyble(nyble POSIdNyble) []nlpgo.POSId
type Lemma
type LemmaAccumulator
- func (acc LemmaAccumulator) Set(lemma string, pp []nlpgo.POSId)
type LemmaIndex
- func NewLemmaIndex(data map[string][]nlpgo.POSId) LemmaIndex
type Lemmatizer
- func NewLemmatizer(lkpr LmChecker, resolvers []LmResolver, opts ...LmOption) *Lemmatizer
- func (l Lemmatizer) LemmaCandidates(word string, max int) (candidates []Lemma)
- func (l Lemmatizer) Lemmatize(word string) Lemma
type LmChecker
type LmOption
type LmResolver
- func NewExceptionResolver(excidx map[string][]Lemma) LmResolver
- func NewSuffixRuleResolver(rules []Rule, lc LmChecker) LmResolver
type POSIdNyble
- func PackPosNyble(ps []nlpgo.POSId) POSIdNyble
type Rule
type RuleTransform

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func UnpackPosNyble ¶

func UnpackPosNyble(nyble POSIdNyble) []nlpgo.POSId

Types ¶

type Lemma ¶

type Lemma struct {
	// Lemma value
	Val string
	// Optional list of possible Parts Of Speech for the word parsed.
	Pos []nlpgo.POSId
}

type LemmaAccumulator ¶

type LemmaAccumulator map[string]map[nlpgo.POSId]struct{}

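LemmaAccumulator is a map type that accumulates lemma candidates. Each key is a lemma; its value is the set of POS ids collected for that lemma.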

func (LemmaAccumulator) Set ¶

func (acc LemmaAccumulator) Set(lemma string, pp []nlpgo.POSId)
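
The documentation does not describe what Set does beyond its signature. The following is a minimal sketch of one plausible behaviour, assuming that repeated calls for the same lemma merge their POS ids into one set. The local `POSId` type is a stand-in for `nlpgo.POSId`, and the method body is an illustration, not the package's actual code.

```go
package main

// POSId is a local stand-in for nlpgo.POSId (assumed to be a one-byte id).
type POSId uint8

// LemmaAccumulator mirrors the documented type: lemma -> set of POS ids.
type LemmaAccumulator map[string]map[POSId]struct{}

// Set records lemma and merges pp into its POS set.
// Assumption: calling Set twice for the same lemma unions the tags
// instead of overwriting them.
func (acc LemmaAccumulator) Set(lemma string, pp []POSId) {
	set, ok := acc[lemma]
	if !ok {
		set = make(map[POSId]struct{}, len(pp))
		acc[lemma] = set
	}
	for _, p := range pp {
		set[p] = struct{}{}
	}
}
```

Under this reading, `len(acc)` is the number of distinct lemma candidates collected so far, which is the quantity a resolver would compare against `max`.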

type LemmaIndex ¶

type LemmaIndex struct {
	// contains filtered or unexported fields
}

LemmaIndex is a default implementation of LmChecker

func NewLemmaIndex ¶

func NewLemmaIndex(data map[string][]nlpgo.POSId) LemmaIndex

type Lemmatizer ¶

type Lemmatizer struct {
	// contains filtered or unexported fields
}

Lemmatizer type implements lemmatization

func NewLemmatizer ¶

func NewLemmatizer(lkpr LmChecker, resolvers []LmResolver, opts ...LmOption) *Lemmatizer

func (Lemmatizer) LemmaCandidates ¶

func (l Lemmatizer) LemmaCandidates(word string, max int) (candidates []Lemma)

LemmaCandidates returns up to `max` lemma candidates for the given word. The order of the candidates in the returned slice depends on:

- the order of the resolvers passed to the Lemmatizer constructor
- the internal policy of each resolver

func (Lemmatizer) Lemmatize ¶

func (l Lemmatizer) Lemmatize(word string) Lemma

Lemmatize returns the first resolved lemma for the given word.

type LmChecker ¶

type LmChecker interface {
	// contains filtered or unexported methods
}

LmChecker provides an interface to check whether a lemma exists. What counts as a lemma is implementation-specific.

type LmOption ¶

type LmOption func(*Lemmatizer)

LmOption defines a functional option type for the Lemmatizer
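
No concrete options are listed on this page, so the following only illustrates the functional option pattern that LmOption follows. Everything here is hypothetical: the `maxCandidates` field, the `WithMaxCandidates` option, and the simplified `newLemmatizer` constructor are stand-ins and are not part of the package.

```go
package main

// Lemmatizer is a local stand-in. The real struct has unexported fields
// that are not documented; maxCandidates is a hypothetical setting.
type Lemmatizer struct {
	maxCandidates int
}

// LmOption matches the documented shape: a function that mutates a Lemmatizer.
type LmOption func(*Lemmatizer)

// WithMaxCandidates is a hypothetical option used only for illustration.
func WithMaxCandidates(n int) LmOption {
	return func(l *Lemmatizer) { l.maxCandidates = n }
}

// newLemmatizer is a simplified constructor: it sets defaults first and
// then applies each option in the order given.
func newLemmatizer(opts ...LmOption) *Lemmatizer {
	l := &Lemmatizer{maxCandidates: 1}
	for _, opt := range opts {
		opt(l)
	}
	return l
}
```

The pattern lets a constructor gain new settings without changing its signature, since callers that pass no options keep the defaults.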

type LmResolver ¶

type LmResolver interface {
	// Resolve resolves lemmata for word and adds them to acc. It should stop
	// resolving once the total size of acc is >= max.
	Resolve(word string, acc LemmaAccumulator, max int)
}

LmResolver is a way to apply a strategy to the lemmatization process
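
As an illustration of the Resolve contract, here is a sketch of two custom resolvers run in sequence against a shared accumulator. The types are local mirrors of the documented ones; `tableResolver`, `lowercaseResolver`, and `resolveAll` are hypothetical names, and the Set body is an assumed implementation. The sketch shows only the stop condition (`len(acc) >= max`), not how the real Lemmatizer tracks candidate order.

```go
package main

import "strings"

// Local mirrors of the documented types; POSId stands in for nlpgo.POSId.
type POSId uint8
type LemmaAccumulator map[string]map[POSId]struct{}

// Set is an assumed implementation: it merges pp into the lemma's POS set.
func (acc LemmaAccumulator) Set(lemma string, pp []POSId) {
	if acc[lemma] == nil {
		acc[lemma] = map[POSId]struct{}{}
	}
	for _, p := range pp {
		acc[lemma][p] = struct{}{}
	}
}

type LmResolver interface {
	Resolve(word string, acc LemmaAccumulator, max int)
}

// tableResolver (hypothetical) looks the word up in a fixed table.
type tableResolver map[string][]string

func (t tableResolver) Resolve(word string, acc LemmaAccumulator, max int) {
	for _, lemma := range t[word] {
		if len(acc) >= max {
			return // honor the documented stop condition
		}
		acc.Set(lemma, nil)
	}
}

// lowercaseResolver (hypothetical) proposes the lowercased word itself.
type lowercaseResolver struct{}

func (lowercaseResolver) Resolve(word string, acc LemmaAccumulator, max int) {
	if len(acc) >= max {
		return
	}
	acc.Set(strings.ToLower(word), nil)
}

// resolveAll runs the resolvers in order until max candidates are collected.
func resolveAll(word string, rs []LmResolver, max int) LemmaAccumulator {
	acc := LemmaAccumulator{}
	for _, r := range rs {
		if len(acc) >= max {
			break
		}
		r.Resolve(word, acc, max)
	}
	return acc
}
```

With the table resolver placed first, `resolveAll("Mice", rs, 1)` collects only the table entry, because the second resolver sees a full accumulator and adds nothing.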

func NewExceptionResolver ¶

func NewExceptionResolver(excidx map[string][]Lemma) LmResolver

func NewSuffixRuleResolver ¶

func NewSuffixRuleResolver(rules []Rule, lc LmChecker) LmResolver

type POSIdNyble ¶

type POSIdNyble = uint32

POSIdNyble packs up to 4 POSId values (1 byte each) into a single uint32 (4 bytes). This stores up to 4 POSId items in 4 bytes, compared with a []POSId slice holding 4 items (24-byte slice header on 64-bit platforms + 4 bytes of data).

func PackPosNyble ¶

func PackPosNyble(ps []nlpgo.POSId) POSIdNyble

PackPosNyble packs a slice of up to 4 POS elements into a POSIdNyble value. If len(ps) > 4, only the first 4 elements are packed.
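
The packing described above can be sketched with shifts and masks. This is an illustration under stated assumptions, not the package's code: the byte order (first element in the lowest byte) and the use of a zero byte as the "empty slot" marker are guesses, and `packPosNyble` / `unpackPosNyble` are local lowercase stand-ins for the exported functions.

```go
package main

// POSId is a local stand-in for nlpgo.POSId, assumed to be one byte wide
// as the documentation states.
type POSId uint8

// POSIdNyble mirrors the documented alias.
type POSIdNyble = uint32

// packPosNyble stores up to 4 ids, one per byte. Elements beyond the
// fourth are dropped, as documented.
// Assumption: element i lands in byte i, counting from the low end.
func packPosNyble(ps []POSId) POSIdNyble {
	if len(ps) > 4 {
		ps = ps[:4]
	}
	var n POSIdNyble
	for i, p := range ps {
		n |= POSIdNyble(p) << (8 * uint(i))
	}
	return n
}

// unpackPosNyble reverses packPosNyble.
// Assumption: a zero byte marks an unused slot, so 0 cannot be a valid id
// and unpacking stops at the first zero byte.
func unpackPosNyble(n POSIdNyble) []POSId {
	var ps []POSId
	for i := 0; i < 4; i++ {
		b := POSId(n >> (8 * uint(i))) // conversion keeps only the low byte
		if b == 0 {
			break
		}
		ps = append(ps, b)
	}
	return ps
}
```

Under these assumptions, packing the ids 1, 2, 3 yields 0x00030201, and unpacking that value returns the same three ids in order.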

type Rule ¶

type Rule struct {
	Affix string
	// Tag(s) to identify the inflection form
	Pos        []nlpgo.POSId
	Transforms []RuleTransform
}

Rule defines the affix-related attributes used to restore a lemma from a word. Currently only suffix-based rules are supported.

type RuleTransform ¶

type RuleTransform struct {
	// Number of characters to discard.
	Cutoff int
	// A compensating affix added to the word after the Affix is detached.
	// [optional]
	Augment string
	// Minimum word length required for the rule to apply.
	// [optional] (ignored if zero).
	MinValidLen int
	// A regexp to validate the word before the Affix is detached.
	// [optional]
	ReBefore *regexp.Regexp
	// A regexp to validate the lemma candidate after the Affix is detached.
	// [optional]
	ReAfter *regexp.Regexp
}
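
To make the field semantics concrete, here is a sketch of how a single RuleTransform might be applied to a word. The exact algorithm is not documented, so several points are assumptions: Cutoff is counted in characters from the end of the word, MinValidLen is compared against the word's length in characters, and Augment is appended after the cut. `applyTransform` and `exampleIng` are hypothetical names.

```go
package main

import (
	"regexp"
	"strings"
)

// RuleTransform mirrors the documented struct.
type RuleTransform struct {
	Cutoff      int
	Augment     string
	MinValidLen int
	ReBefore    *regexp.Regexp
	ReAfter     *regexp.Regexp
}

// applyTransform (hypothetical) tries one transform for a suffix rule and
// reports whether it produced a lemma candidate.
func applyTransform(word, affix string, t RuleTransform) (string, bool) {
	if !strings.HasSuffix(word, affix) {
		return "", false
	}
	runes := []rune(word)
	if t.MinValidLen > 0 && len(runes) < t.MinValidLen {
		return "", false // word too short for this rule
	}
	if t.Cutoff < 0 || t.Cutoff > len(runes) {
		return "", false
	}
	if t.ReBefore != nil && !t.ReBefore.MatchString(word) {
		return "", false
	}
	// Drop the last Cutoff characters, then append the compensating affix.
	candidate := string(runes[:len(runes)-t.Cutoff]) + t.Augment
	if t.ReAfter != nil && !t.ReAfter.MatchString(candidate) {
		return "", false
	}
	return candidate, true
}

// exampleIng is a hypothetical transform for "-ing" forms with a doubled n:
// it cuts "ning" and requires the remaining stem to contain a vowel.
var exampleIng = RuleTransform{
	Cutoff:      4,
	MinValidLen: 6,
	ReBefore:    regexp.MustCompile(`nning$`),
	ReAfter:     regexp.MustCompile(`[aeiou]`),
}
```

For example, a transform with Cutoff 3 and Augment "y" turns "studies" into "study" for the affix "ies", while `exampleIng` maps "running" to "run" and rejects "singing" because ReBefore does not match.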
