Documentation ¶
Overview ¶
Package token breaks a text into tokens. It repairs names that were split by line breaks by concatenating their pieces. Each token carries a set of properties, which are used by the heuristic and Bayes' approaches for finding names.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func NewTokenSN ¶
NewTokenSN is a factory and a wrapper. It takes a gner.TokenNER object and wraps it in the TokenSN interface.
func SetIndices ¶
func SetIndices(ts []TokenSN, d *dict.Dictionary)
SetIndices takes a slice of tokens that corresponds to a name candidate. It analyses the tokens and sets Token.Indices according to how feasibly the input tokens could form a scientific name. It checks whether the candidate contains a possible species, ranks, and infraspecies.
func UpperIndex ¶
UpperIndex takes the index of a token and the length of the tokens slice, and returns the upper index of what could be a slice of a name. We expect that most names will fit into 5 words. Other cases would require more thorough algorithms that we can run later as plugins.
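The windowing described above can be sketched as follows. This is a minimal illustration, not the package's code: `upperIndexSketch` and `maxNameWords` are hypothetical names, and the sketch assumes the returned index is an exclusive upper bound capped at the slice length.

```go
package main

import "fmt"

// maxNameWords is an assumed window size, taken from the remark that
// most names fit into 5 words.
const maxNameWords = 5

// upperIndexSketch is a hypothetical stand-in for UpperIndex. Given the
// index i of a token and the length l of the tokens slice, it returns an
// exclusive upper bound for a candidate-name window starting at i.
func upperIndexSketch(i, l int) int {
	upper := i + maxNameWords
	if upper > l {
		upper = l // never run past the end of the slice
	}
	return upper
}

func main() {
	tokens := []string{"the", "owl", "Bubo", "bubo", "was", "seen", "near", "town"}
	i := 2
	// The window holds at most five tokens starting at index 2.
	fmt.Println(tokens[i:upperIndexSketch(i, len(tokens))]) // [Bubo bubo was seen near]
	// Close to the end of the slice the bound is clamped to len(tokens).
	fmt.Println(upperIndexSketch(6, len(tokens))) // 8
}
```

Clamping to the slice length lets the result be used directly in a slice expression without a separate bounds check.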
Types ¶
type Decision ¶
type Decision int
Decision defines the possible kinds of name candidates.
const (
	NotName Decision = iota
	Uninomial
	PossibleUninomial
	Binomial
	PossibleBinomial
	Trinomial
	BayesUninomial
	BayesBinomial
	BayesTrinomial
)
Possible Decisions
func (Decision) Cardinality ¶
Cardinality returns the number of elements in the canonical form of a scientific name: 1 for a uninomial, 2 for a binomial, and 3 for a trinomial.
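A minimal sketch of such a mapping is shown here. The Decision constants mirror the listing in this section; `cardinalitySketch` is a hypothetical function, and two points are assumptions: that the Possible and Bayes variants share the cardinality of their base kind, and that NotName yields 0.

```go
package main

import "fmt"

// Decision and its constants mirror the declaration documented above.
type Decision int

const (
	NotName Decision = iota
	Uninomial
	PossibleUninomial
	Binomial
	PossibleBinomial
	Trinomial
	BayesUninomial
	BayesBinomial
	BayesTrinomial
)

// cardinalitySketch is a hypothetical stand-in for Decision.Cardinality.
// Grouping the Possible* and Bayes* variants with their base kinds, and
// returning 0 for NotName, are assumptions of this sketch.
func cardinalitySketch(d Decision) int {
	switch d {
	case Uninomial, PossibleUninomial, BayesUninomial:
		return 1
	case Binomial, PossibleBinomial, BayesBinomial:
		return 2
	case Trinomial, BayesTrinomial:
		return 3
	default:
		return 0
	}
}

func main() {
	for _, d := range []Decision{NotName, Uninomial, PossibleBinomial, BayesTrinomial} {
		fmt.Println(cardinalitySketch(d))
	}
}
```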
type Features ¶
type Features struct {
	// IsCapitalized is true if the first rune that is letter, is capitalized.
	IsCapitalized bool

	// HasDash is true if token contains dash
	HasDash bool

	// HasStartParens is true if token start with '('
	HasStartParens bool

	// HasEndParens is true if token ends with ')'
	HasEndParens bool

	// Abbr feature: token ends with a period.
	Abbr bool

	// PotentialBinomialGenus feature: the token might be a genus of name.
	PotentialBinomialGenus bool

	// StartsWithLetter feature: the token has necessary qualities to be a start
	// of a binomial species. It assumes to be low-case and be two letters or
	// more.
	StartsWithLetter bool

	// EndsWithLetter feature: the token has necessary quality to be a species
	// part of trinomial.
	EndsWithLetter bool

	// RankLike is true if token is a known infraspecific rank
	RankLike bool

	// UninomialDict defines which Genera or Uninomials dictionary (if any)
	// contained the token.
	UninomialDict dict.DictionaryType

	// SpeciesDict defines which Species dictionary (if any) contained the token.
	SpeciesDict dict.DictionaryType

	// GenSpInAmbigDict shows how many specific/infraspecific epithets of a putative
	// name matched bi-/tri- nomials in a full name dictionary for grey genera.
	// For example "Bubo bubo" name would set it to 1, and "Bubo bubo bubo" would
	// set it to 2.
	GenSpInAmbigDict int
}
Features keeps the properties of a token as a possible candidate for a name part.
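To illustrate how the purely textual fields could be derived from a raw token, here is a hedged sketch. The `surfaceFeatures` type and the `surfaceFeaturesOf` function are hypothetical and cover only five of the documented fields; the dictionary-backed and genus/species fields are left out because they depend on data not shown here.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// surfaceFeatures is a hypothetical subset of Features limited to the
// fields that can be read directly from the raw token text.
type surfaceFeatures struct {
	IsCapitalized  bool
	HasDash        bool
	HasStartParens bool
	HasEndParens   bool
	Abbr           bool
}

// surfaceFeaturesOf follows the field comments literally; the package's
// own logic may differ in details such as which dash characters count.
func surfaceFeaturesOf(raw string) surfaceFeatures {
	var f surfaceFeatures
	// IsCapitalized looks at the first rune that is a letter, skipping
	// leading punctuation such as '('.
	for _, r := range raw {
		if unicode.IsLetter(r) {
			f.IsCapitalized = unicode.IsUpper(r)
			break
		}
	}
	f.HasDash = strings.ContainsRune(raw, '-')
	f.HasStartParens = strings.HasPrefix(raw, "(")
	f.HasEndParens = strings.HasSuffix(raw, ")")
	f.Abbr = strings.HasSuffix(raw, ".")
	return f
}

func main() {
	for _, raw := range []string{"(Bubo)", "B.", "novae-zelandiae"} {
		fmt.Printf("%-16s %+v\n", raw, surfaceFeaturesOf(raw))
	}
}
```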
func (*Features) SetSpeciesDict ¶
func (p *Features) SetSpeciesDict(cleaned string, d *dict.Dictionary)
func (*Features) SetUninomialDict ¶
func (p *Features) SetUninomialDict(cleaned string, d *dict.Dictionary)
type NLP ¶
type NLP struct {
	// Odds are posterior odds.
	Odds float64

	// ClassCases is used to calculate prior odds of names appearing in a
	// document.
	ClassCases map[feature.Class]int

	// OddsDetails are used for calculating final odds for detected names and
	// for displaying results in the output
	OddsDetails
}
NLP collects the data produced by the Bayes' algorithm.
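The relation between prior and posterior odds can be sketched with the standard odds form of naive Bayes: posterior odds equal prior odds multiplied by the likelihood ratio of each observed feature. Everything in the sketch is an assumption made for illustration: the `class` type stands in for feature.Class, and `priorOdds` and `posteriorOdds` are hypothetical helpers, not functions of this package.

```go
package main

import "fmt"

// class is a hypothetical stand-in for feature.Class.
type class string

const (
	isName  class = "name"
	notName class = "not-name"
)

// priorOdds derives prior odds from per-class counts, in the spirit of
// the ClassCases field: cases that are names versus cases that are not.
// A zero notName count would yield +Inf; callers should guard against it.
func priorOdds(cases map[class]int) float64 {
	return float64(cases[isName]) / float64(cases[notName])
}

// posteriorOdds multiplies the prior odds by the likelihood ratio of
// each observed feature, which is the odds form of naive Bayes.
func posteriorOdds(prior float64, likelihoodRatios []float64) float64 {
	odds := prior
	for _, lr := range likelihoodRatios {
		odds *= lr
	}
	return odds
}

func main() {
	cases := map[class]int{isName: 10, notName: 990}
	prior := priorOdds(cases)
	// The likelihood ratios below are made-up illustration values.
	odds := posteriorOdds(prior, []float64{50, 4})
	fmt.Printf("prior=%.4f posterior=%.4f\n", prior, odds)
}
```

Working in odds rather than probabilities keeps the update a simple product, and odds above 1 favour the candidate being a name.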
type OddsDetails ¶
func NewOddsDetails ¶
func NewOddsDetails(odds posterior.Odds) OddsDetails
func (OddsDetails) MarshalJSON ¶ added in v0.17.0
func (od OddsDetails) MarshalJSON() ([]byte, error)