token


v1.0.0 | Published: Aug 24, 2022 | License: MIT | Imports: 9 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/gnames/gnfinder


Documentation

Overview

Package token breaks a text into tokens. It repairs names that were broken by new lines by concatenating the pieces together. Each token is connected to a set of properties, and these properties are used by the heuristic and Bayes' approaches for finding names.
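As an illustration of the repair step, the sketch below joins a word that was hyphenated across a line break. It is not gnfinder's code: the helper joinBrokenWords is hypothetical, and it assumes that a hyphen immediately followed by "\n" always marks one word split in two.

```go
package main

import (
	"fmt"
	"strings"
)

// joinBrokenWords is a hypothetical helper: it merges a word hyphenated at
// the end of a line with its continuation on the next line, so
// "Pseudo-\nmonas" becomes "Pseudomonas". Other whitespace is kept as is.
func joinBrokenWords(text string) string {
	var b strings.Builder
	rs := []rune(text)
	for i := 0; i < len(rs); i++ {
		if rs[i] == '-' && i+1 < len(rs) && rs[i+1] == '\n' {
			// Skip the hyphen, the newline, and any indentation after it.
			j := i + 2
			for j < len(rs) && (rs[j] == ' ' || rs[j] == '\t') {
				j++
			}
			i = j - 1 // the loop's i++ moves to the first rune of the continuation
			continue
		}
		b.WriteRune(rs[i])
	}
	return b.String()
}

func main() {
	fmt.Println(joinBrokenWords("Bacteria of the genus Pseudo-\n  monas aeruginosa"))
	// prints "Bacteria of the genus Pseudomonas aeruginosa"
}
```

A real implementation has to be more careful than this: a genuinely hyphenated word that happens to end a line would lose its hyphen here.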

Index

func NewTokenSN(token gner.TokenNER) gner.TokenNER
func SetIndices(ts []TokenSN, d *dict.Dictionary)
func UpperIndex(i int, l int) int
type Decision
type Features
type Indices
type NLP
type OddsDetails
- func NewOddsDetails(odds posterior.Odds) OddsDetails
- func (od OddsDetails) MarshalJSON() ([]byte, error)
type TokenSN
- func Tokenize(text []rune) []TokenSN

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewTokenSN

func NewTokenSN(token gner.TokenNER) gner.TokenNER

NewTokenSN is a factory and a wrapper. It takes a gner.TokenNER object and wraps it into the TokenSN interface; the result is returned as a gner.TokenNER value.
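The wrapping pattern can be sketched with Go embedding. The interfaces below are hypothetical, trimmed-down stand-ins (the real gner.TokenNER has more methods), and newTokenSN only mimics the documented signature: it returns the wrapped value as TokenNER, so a caller type-asserts to TokenSN to reach the extra methods.

```go
package main

import "fmt"

// TokenNER is a hypothetical, trimmed-down stand-in for gner.TokenNER.
type TokenNER interface {
	Raw() []rune
	Cleaned() string
}

// Features is reduced to one field for the example.
type Features struct{ IsCapitalized bool }

// TokenSN extends TokenNER with name-finding data (simplified).
type TokenSN interface {
	TokenNER
	Features() *Features
}

type baseToken struct{ raw []rune }

func (t *baseToken) Raw() []rune { return t.raw }

func (t *baseToken) Cleaned() string { return string(t.raw) }

// tokenSN embeds any TokenNER, so its methods are promoted, and adds the
// extra state that TokenSN requires.
type tokenSN struct {
	TokenNER
	features Features
}

func (t *tokenSN) Features() *Features { return &t.features }

// newTokenSN is a hypothetical analogue of NewTokenSN.
func newTokenSN(token TokenNER) TokenNER {
	return &tokenSN{TokenNER: token}
}

func main() {
	tk := newTokenSN(&baseToken{raw: []rune("Bubo")})
	if sn, ok := tk.(TokenSN); ok {
		sn.Features().IsCapitalized = true
		fmt.Println(sn.Cleaned(), sn.Features().IsCapitalized) // Bubo true
	}
}
```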

func SetIndices

func SetIndices(ts []TokenSN, d *dict.Dictionary)

SetIndices takes a slice of tokens that corresponds to a name candidate. It analyses the tokens and sets Token.Indices according to how feasible it is for them to form a scientific name: it checks whether there is a possible species epithet, rank, and infraspecific epithet.
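A simplified sketch of the idea follows. It is not the package's algorithm: it works on plain strings instead of TokenSN values, replaces the dictionary with a hypothetical knownRanks map, and assumes 0 means "not found", because position 0 of a candidate is the genus.

```go
package main

import (
	"fmt"
	"unicode"
)

// Indices mirrors the documented struct; 0 is assumed to mean "absent".
type Indices struct {
	Species      int
	Rank         int
	Infraspecies int
}

// knownRanks is a hypothetical stand-in for the rank dictionary.
var knownRanks = map[string]bool{"var.": true, "subsp.": true, "f.": true}

// looksLikeEpithet is a simplified check: two or more lowercase letters.
func looksLikeEpithet(s string) bool {
	rs := []rune(s)
	if len(rs) < 2 {
		return false
	}
	for _, r := range rs {
		if !unicode.IsLower(r) {
			return false
		}
	}
	return true
}

// setIndices sketches the idea behind SetIndices on plain words.
func setIndices(words []string) Indices {
	var idx Indices
	if len(words) < 2 || !looksLikeEpithet(words[1]) {
		return idx // uninomial at best
	}
	idx.Species = 1
	next := 2
	if next < len(words) && knownRanks[words[next]] {
		idx.Rank = next
		next++
	}
	if next < len(words) && looksLikeEpithet(words[next]) {
		idx.Infraspecies = next
	} else if idx.Rank != 0 {
		idx.Rank = 0 // a rank without a following epithet is discarded
	}
	return idx
}

func main() {
	fmt.Printf("%+v\n", setIndices([]string{"Bubo", "bubo", "bubo"}))
	// {Species:1 Rank:0 Infraspecies:2}
	fmt.Printf("%+v\n", setIndices([]string{"Poa", "annua", "var.", "supina"}))
	// {Species:1 Rank:2 Infraspecies:3}
}
```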

func UpperIndex

func UpperIndex(i int, l int) int

UpperIndex takes the index of a token and the length of the tokens slice, and returns the upper index of what could be a slice of a name. We expect that most names will fit into 5 words. Other cases would require more thorough algorithms that we can run later as plugins.
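Assuming the returned value is an exclusive upper bound for slicing and the window is exactly five tokens, the logic reduces to clamping i+5 to the slice length. nameWindow and upperIndex are illustrative names, not the package's code.

```go
package main

import "fmt"

// nameWindow is an assumed maximum number of words in a name candidate,
// based on the "5 words" expectation.
const nameWindow = 5

// upperIndex returns an exclusive bound j so that ts[i:j] holds at most
// nameWindow tokens and never runs past a slice of length l.
func upperIndex(i, l int) int {
	if up := i + nameWindow; up < l {
		return up
	}
	return l
}

func main() {
	tokens := []string{"We", "found", "Bubo", "bubo", "bubo", "in", "the", "park"}
	fmt.Println(tokens[2:upperIndex(2, len(tokens))]) // [Bubo bubo bubo in the]
}
```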

Types

type Decision

type Decision int

Decision defines the possible kinds of name candidates.

const (
	NotName Decision = iota
	Uninomial
	PossibleUninomial
	Binomial
	PossibleBinomial
	Trinomial
	BayesUninomial
	BayesBinomial
	BayesTrinomial
)

Possible Decisions

func (Decision) Cardinality

func (d Decision) Cardinality() int

Cardinality returns the number of elements in the canonical form of a scientific name: 1 for a uninomial, 2 for a binomial, and 3 for a trinomial.
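A possible mapping, written as a switch over the Decision constants listed above. Returning 0 for NotName is an assumption; the description only covers uninomials, binomials, and trinomials.

```go
package main

import "fmt"

type Decision int

const (
	NotName Decision = iota
	Uninomial
	PossibleUninomial
	Binomial
	PossibleBinomial
	Trinomial
	BayesUninomial
	BayesBinomial
	BayesTrinomial
)

// Cardinality groups the Decisions by the number of words in the
// canonical form. NotName -> 0 is an assumption.
func (d Decision) Cardinality() int {
	switch d {
	case Uninomial, PossibleUninomial, BayesUninomial:
		return 1
	case Binomial, PossibleBinomial, BayesBinomial:
		return 2
	case Trinomial, BayesTrinomial:
		return 3
	default:
		return 0
	}
}

func main() {
	fmt.Println(BayesBinomial.Cardinality()) // 2
}
```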

func (Decision) In

func (d Decision) In(ds ...Decision) bool

In returns true if the Decision is one of the given constants.
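A variadic In can be written as a linear scan over its arguments. To stay self-contained, the sketch repeats only the first four Decision constants.

```go
package main

import "fmt"

type Decision int

const (
	NotName Decision = iota
	Uninomial
	PossibleUninomial
	Binomial
)

// In reports whether d equals any of the Decisions passed in.
func (d Decision) In(ds ...Decision) bool {
	for _, v := range ds {
		if v == d {
			return true
		}
	}
	return false
}

func main() {
	d := PossibleUninomial
	fmt.Println(d.In(Uninomial, PossibleUninomial)) // true
	fmt.Println(d.In(Binomial))                     // false
}
```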

func (Decision) String

func (d Decision) String() string

String returns the string representation of a Decision.

type Features

type Features struct {
	// IsCapitalized is true if the first rune that is a letter is capitalized.
	IsCapitalized bool

	// HasDash is true if the token contains a dash.
	HasDash bool

	// HasStartParens is true if the token starts with '('.
	HasStartParens bool

	// HasEndParens is true if the token ends with ')'.
	HasEndParens bool

	// Abbr feature: token ends with a period.
	Abbr bool

	// PotentialBinomialGenus feature: the token might be the genus of a name.
	PotentialBinomialGenus bool

	// StartsWithLetter feature: the token has the qualities necessary to be
	// the start of a binomial species. It is assumed to be lowercase and at
	// least two letters long.
	StartsWithLetter bool

	// EndsWithLetter feature: the token has necessary quality to be a species
	// part of trinomial.
	EndsWithLetter bool

	// RankLike is true if token is a known infraspecific rank
	RankLike bool

	// UninomialDict defines which Genera or Uninomials dictionary (if any)
	// contained the token.
	UninomialDict dict.DictionaryType

	// SpeciesDict defines which Species dictionary (if any) contained the token.
	SpeciesDict dict.DictionaryType

	// GenSpInAmbigDict shows how many specific/infraspecific epithets of a putative
	// name matched bi-/tri- nomials in a full name dictionary for grey genera.
	// For example "Bubo bubo" name would set it to 1, and "Bubo bubo bubo" would
	// set it to 2.
	GenSpInAmbigDict int
}

Features keeps the properties of a token as a possible candidate for a name part.
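The string-level features can be derived directly from the raw token, following a literal reading of the field comments above. This sketch covers only those fields; the dictionary-based ones (UninomialDict, SpeciesDict, RankLike) need a dict.Dictionary and are left out. The helper surfaceFeatures is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// features holds the string-level subset of the documented Features.
type features struct {
	IsCapitalized  bool
	HasDash        bool
	HasStartParens bool
	HasEndParens   bool
	Abbr           bool
}

// surfaceFeatures derives the string-level features of a raw token.
func surfaceFeatures(raw string) features {
	var f features
	for _, r := range raw {
		if unicode.IsLetter(r) {
			// Only the first letter decides capitalization.
			f.IsCapitalized = unicode.IsUpper(r)
			break
		}
	}
	f.HasDash = strings.ContainsRune(raw, '-')
	f.HasStartParens = strings.HasPrefix(raw, "(")
	f.HasEndParens = strings.HasSuffix(raw, ")")
	f.Abbr = strings.HasSuffix(raw, ".")
	return f
}

func main() {
	fmt.Printf("%+v\n", surfaceFeatures("(Aus)"))
	fmt.Printf("%+v\n", surfaceFeatures("B."))
}
```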

func (*Features) SetRank

func (p *Features) SetRank(raw string, d *dict.Dictionary)

func (*Features) SetSpeciesDict

func (p *Features) SetSpeciesDict(cleaned string, d *dict.Dictionary)

func (*Features) SetUninomialDict

func (p *Features) SetUninomialDict(cleaned string, d *dict.Dictionary)

type Indices

type Indices struct {
	Species      int
	Rank         int
	Infraspecies int
}

Indices holds the positions of the elements of a name candidate.

type NLP

type NLP struct {
	// Odds are posterior odds.
	Odds float64

	// ClassCases is used to calculate prior odds of names appearing in a
	// document.
	ClassCases map[feature.Class]int

	// OddsDetails are used for calculating final odds for detected names and
	// for displaying results in the output.
	OddsDetails
}

NLP collects the data produced by the Bayes' algorithm.
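The relation between ClassCases and Odds can be illustrated with a generic naive Bayes calculation in odds form: prior odds come from class counts, and each observed feature multiplies them by its likelihood ratio. This is not gnfinder's implementation; the Class type, its labels, and the numbers are made up for the example.

```go
package main

import "fmt"

// Class is a hypothetical stand-in for feature.Class.
type Class string

const (
	IsName  Class = "IsName"
	NotName Class = "NotName"
)

// priorOdds turns class counts into prior odds of "name" versus "not name".
// With a zero NotName count the float division yields +Inf.
func priorOdds(cases map[Class]int) float64 {
	return float64(cases[IsName]) / float64(cases[NotName])
}

// posteriorOdds applies Bayes' rule in odds form under the naive
// independence assumption: multiply the prior odds by the likelihood ratio
// P(feature|name) / P(feature|not name) of every observed feature.
func posteriorOdds(prior float64, likelihoodRatios []float64) float64 {
	odds := prior
	for _, lr := range likelihoodRatios {
		odds *= lr
	}
	return odds
}

func main() {
	cases := map[Class]int{IsName: 20, NotName: 1000}
	prior := priorOdds(cases)                      // 20 / 1000
	post := posteriorOdds(prior, []float64{50, 4}) // prior * 50 * 4
	fmt.Printf("prior %.3f, posterior %.1f\n", prior, post)
}
```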

type OddsDetails

type OddsDetails map[string]float64

func NewOddsDetails

func NewOddsDetails(odds posterior.Odds) OddsDetails

func (OddsDetails) MarshalJSON (added in v0.17.0)

func (od OddsDetails) MarshalJSON() ([]byte, error)

type TokenSN

type TokenSN interface {
	gner.TokenNER
	Features() *Features
	NLP() *NLP
	Indices() *Indices
	Decision() Decision
	SetDecision(d Decision)
}

func Tokenize

func Tokenize(text []rune) []TokenSN

Tokenize splits the text of a document into words and returns every word as a token in a slice of TokenSN.
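A minimal whitespace tokenizer over []rune can keep rune offsets, so every word can be mapped back to its position in the document. The tok type and tokenize function are hypothetical; the real Tokenize returns TokenSN values and presumably does more cleaning than splitting on whitespace.

```go
package main

import (
	"fmt"
	"unicode"
)

// tok is a hypothetical minimal token: the word and its rune offsets
// [Start, End) in the original text.
type tok struct {
	Raw        string
	Start, End int
}

// tokenize splits text on Unicode whitespace and records rune offsets.
func tokenize(text []rune) []tok {
	var res []tok
	start := -1 // -1 means we are currently between words
	for i, r := range text {
		switch {
		case unicode.IsSpace(r) && start >= 0:
			res = append(res, tok{string(text[start:i]), start, i})
			start = -1
		case !unicode.IsSpace(r) && start < 0:
			start = i
		}
	}
	if start >= 0 { // the text ends inside a word
		res = append(res, tok{string(text[start:]), start, len(text)})
	}
	return res
}

func main() {
	for _, t := range tokenize([]rune("Bubo bubo\nwas seen.")) {
		fmt.Printf("%q [%d,%d)\n", t.Raw, t.Start, t.End)
	}
}
```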
