token

package
v0.10.0
Published: Apr 24, 2020 License: MIT Imports: 4 Imported by: 0

Documentation

Overview

Package token deals with breaking a text into tokens. It cleans up names that are broken across new lines by concatenating the pieces together. Tokens are connected to features, which are used by the heuristic and Bayes' approaches for finding names.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func SetIndices

func SetIndices(ts []Token, d *dict.Dictionary)

SetIndices takes a slice of tokens that correspond to a name candidate. It analyses the tokens and sets Token.Indices according to the feasibility of the input tokens forming a scientific name. It checks whether the tokens contain a possible species, rank, and infraspecies.

func UpperIndex added in v0.8.4

func UpperIndex(i int, l int) int

UpperIndex takes the index of a token and the length of the tokens slice and returns an upper index of what could be a slice of a name. We expect that most names will fit into 5 words. Other cases would require more thorough algorithms that we can run later as plugins.
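The package's own implementation is not shown here, but the documented contract can be sketched as capping the candidate slice at 5 words past the starting token, clamped to the slice length. The function name upperIndex and its body below are assumptions based only on that description.

```go
package main

import "fmt"

// upperIndex sketches the documented behavior of token.UpperIndex: given the
// index i of a token and the length l of the token slice, it returns the
// upper bound of a candidate name slice, assuming names fit into 5 words.
// This mirrors the doc comment, not the package's actual code.
func upperIndex(i, l int) int {
	upper := i + 5
	if upper > l {
		upper = l
	}
	return upper
}

func main() {
	fmt.Println(upperIndex(0, 3))  // slice shorter than 5 words → 3
	fmt.Println(upperIndex(2, 20)) // capped 5 words past i → 7
}
```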

Types

type Decision

type Decision int

Decision defines the possible kinds of name candidates.

const (
	NotName Decision = iota
	Uninomial
	Binomial
	PossibleBinomial
	Trinomial
	BayesUninomial
	BayesBinomial
	BayesTrinomial
)

Possible Decisions

func (Decision) Cardinality

func (d Decision) Cardinality() int

Cardinality returns the number of elements in the canonical form of a scientific name: 1 for a uninomial, 2 for a binomial, 3 for a trinomial.
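Based only on the documented mapping, the method can be sketched as a switch over the Decision constants. The values returned for NotName and PossibleBinomial are assumptions (0 here); the real method may treat them differently.

```go
package main

import "fmt"

// Decision mirrors the package's Decision constants.
type Decision int

const (
	NotName Decision = iota
	Uninomial
	Binomial
	PossibleBinomial
	Trinomial
	BayesUninomial
	BayesBinomial
	BayesTrinomial
)

// cardinality sketches the documented mapping: 1 for uninomials, 2 for
// binomials, 3 for trinomials. Returning 0 for NotName and PossibleBinomial
// is an assumption, not confirmed by the docs.
func cardinality(d Decision) int {
	switch d {
	case Uninomial, BayesUninomial:
		return 1
	case Binomial, BayesBinomial:
		return 2
	case Trinomial, BayesTrinomial:
		return 3
	}
	return 0
}

func main() {
	fmt.Println(cardinality(Binomial), cardinality(BayesTrinomial))
}
```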

func (Decision) In

func (d Decision) In(ds ...Decision) bool

In returns true if the Decision is included in the given list of Decisions.
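The documented behavior is a plain membership test over the variadic arguments; a minimal sketch (using a lowercase stand-in method, not the package's code) looks like this:

```go
package main

import "fmt"

// Decision mirrors the package's Decision type.
type Decision int

const (
	NotName Decision = iota
	Uninomial
	Binomial
	Trinomial
)

// in sketches the documented behavior of Decision.In: it reports whether d
// equals any of the Decisions passed in.
func (d Decision) in(ds ...Decision) bool {
	for _, e := range ds {
		if d == e {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(Binomial.in(Uninomial, Binomial)) // true
	fmt.Println(NotName.in(Uninomial, Binomial))  // false
}
```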

func (Decision) String

func (d Decision) String() string

String returns the string representation of a Decision.

type Features

type Features struct {
	// Candidate to be a start of a uninomial or binomial.
	NameStartCandidate bool
	// The name looks like a possible genus name.
	PotentialBinomialGenus bool
	// The token has the necessary qualities to be the start of a binomial.
	StartsWithLetter bool
	// The token has the necessary qualities to be the species part of a trinomial.
	EndsWithLetter bool
	// Capitalized feature of the first alphabetic character.
	Capitalized bool
	// CapitalizedSpecies -- the first alphabetic character of the species is capitalized.
	CapitalizedSpecies bool
	// HasDash is true if the '-' character is part of the word.
	HasDash bool
	// ParensStart feature: token starts with parentheses.
	ParensStart bool
	// ParensEnd feature: token ends with parentheses.
	ParensEnd bool
	// ParensEndSpecies feature: species token ends with parentheses.
	ParensEndSpecies bool
	// Abbr feature: token ends with a period.
	Abbr bool
	// RankLike is true if token is a known infraspecific rank
	RankLike bool
	// UninomialDict defines which Genera or Uninomials dictionary (if any)
	// contained the token.
	UninomialDict dict.DictionaryType
	// SpeciesDict defines which Species dictionary (if any) contained the token.
	SpeciesDict dict.DictionaryType
}

Features keeps the properties of a token that make it a possible candidate for a name part.

type Indices

type Indices struct {
	Species      int
	Rank         int
	Infraspecies int
}

Indices of the elements of a name candidate.

type NLP

type NLP struct {
	// Odds are posterior odds.
	Odds float64
	// OddsDetails are elements from which Odds are calculated.
	OddsDetails
	// LabelFreq is used to calculate prior odds of names appearing in a
	// document
	LabelFreq bayes.LabelFreq
}

NLP collects data received from the Bayes' algorithm.

type OddsDetails

type OddsDetails map[string]map[bayes.FeatureName]map[bayes.FeatureValue]float64

OddsDetails are the elements from which Odds are calculated.

func NewOddsDetails

func NewOddsDetails(l bayes.Likelihoods) OddsDetails

type Token

type Token struct {
	// Raw is a verbatim presentation of a token as it appears in a text.
	Raw []rune
	// Cleaned is a presentation of a token after normalization.
	Cleaned string
	// Start is the index of the first rune of a token. The first rune
	// does not have to be alpha-numeric.
	Start int
	// End is the index of the last rune of a token. The last rune does not
	// have to be alpha-numeric.
	End int
	// Decision tags the first token of a possible name with a classification
	// decision.
	Decision
	// Indices of semantic elements of a possible name.
	Indices
	// NLP data
	NLP
	// Features is a collection of features associated with the token
	Features
}

Token represents a word separated by spaces in a text. Words split by new lines are concatenated.

func NewToken

func NewToken(text []rune, start int, end int) Token

NewToken constructs a new Token object.

func Tokenize

func Tokenize(text []rune) []Token

Tokenize breaks the text into tokens, returning a slice that contains every word in the document.
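A much-simplified sketch of this kind of tokenization splits the rune slice on whitespace and records each word's rune offsets, matching the Start/End fields described on Token. The real Tokenize also cleans words and rejoins names broken across new lines; the half-open [start, end) convention and all names below are assumptions for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// tok is a simplified stand-in for token.Token, keeping only the raw runes
// and the start/end rune indices.
type tok struct {
	raw        []rune
	start, end int
}

// tokenize is a simplified sketch of whitespace tokenization over runes.
// It does none of the cleaning or line-joining the real package performs.
func tokenize(text []rune) []tok {
	var ts []tok
	start := -1
	for i, r := range text {
		if unicode.IsSpace(r) {
			if start >= 0 {
				ts = append(ts, tok{raw: text[start:i], start: start, end: i})
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		ts = append(ts, tok{raw: text[start:], start: start, end: len(text)})
	}
	return ts
}

func main() {
	for _, t := range tokenize([]rune("Homo sapiens Linnaeus")) {
		fmt.Printf("%q [%d:%d]\n", string(t.raw), t.start, t.end)
	}
}
```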

func (*Token) Clean

func (t *Token) Clean()

Clean converts the verbatim (Raw) string of a token into a normalized, cleaned-up version.

func (*Token) InParentheses

func (t *Token) InParentheses() bool

InParentheses is true if the token is surrounded by parentheses.
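From the description alone, the check can be sketched as testing the first and last runes of the raw token. The standalone function below is an assumption for illustration; the real method reads Token.Raw.

```go
package main

import "fmt"

// inParentheses sketches the documented behavior of (*Token).InParentheses:
// true when the raw token both starts with '(' and ends with ')'.
func inParentheses(raw []rune) bool {
	return len(raw) >= 2 && raw[0] == '(' && raw[len(raw)-1] == ')'
}

func main() {
	fmt.Println(inParentheses([]rune("(Felis)"))) // true
	fmt.Println(inParentheses([]rune("Felis")))   // false
}
```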

func (*Token) SetRank

func (t *Token) SetRank(d *dict.Dictionary)

func (*Token) SetSpeciesDict

func (t *Token) SetSpeciesDict(d *dict.Dictionary)

func (*Token) SetUninomialDict

func (t *Token) SetUninomialDict(d *dict.Dictionary)
