matcher

package
v0.3.0
Warning

This package is not in the latest version of its module.

Published: Aug 28, 2020 License: MIT Imports: 18 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var (
	// GNUUID is a UUID seed made from 'globalnames.org' domain to generate
	// UUIDv5 identifiers.
	GNUUID = uuid.NewV5(uuid.NamespaceDNS, "globalnames.org")
)
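
As a rough illustration of how a namespace seed like GNUUID can yield stable UUID v5 identifiers, the sketch below implements the RFC 4122 version 5 algorithm with the standard library only. The helpers newV5 and format are hypothetical and are not part of this package, and using the seed as the namespace for name-strings is an assumption based on the comment above.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// namespaceDNS is the RFC 4122 DNS namespace UUID
// (6ba7b810-9dad-11d1-80b4-00c04fd430c8).
var namespaceDNS = [16]byte{
	0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// newV5 is a hypothetical helper that derives a version 5 UUID from a
// namespace and a name: SHA-1 over namespace+name, truncated to 16 bytes,
// with the version and variant bits overwritten.
func newV5(ns [16]byte, name string) [16]byte {
	h := sha1.New()
	h.Write(ns[:])
	h.Write([]byte(name))
	sum := h.Sum(nil)

	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // set version 5
	u[8] = (u[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return u
}

// format renders a UUID in the canonical 8-4-4-4-12 hex layout.
func format(u [16]byte) string {
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

func main() {
	// Assumption: the seed plays the role of the namespace for name-strings.
	seed := newV5(namespaceDNS, "globalnames.org")
	id := newV5(seed, "Homo sapiens")
	fmt.Println("seed:", format(seed))
	fmt.Println("id:  ", format(id))
}
```

Because the identifier is a pure function of the namespace and the input string, the same name-string always maps to the same ID, so IDs can be computed independently on both sides of a lookup.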

Functions

func DetectAbbreviated

func DetectAbbreviated(parsed *pb.Parsed) *protob.Result

DetectAbbreviated checks whether a parsed name is abbreviated. If the name is not abbreviated, the function returns nil. If it is abbreviated, the function returns a result with the MatchType 'NONE'.

Types

type MatchResult added in v0.3.0

type MatchResult struct {
	Index  int
	Result *protob.Result
}

type MatchTask added in v0.3.0

type MatchTask struct {
	Index int
	Name  string
}

MatchTask contains a name to be matched and an index where it should be located in an array.

type Matcher

type Matcher struct {
	Config  config.Config
	Filters *bloom.Filters
	Trie    *levenshtein.MinTree
}

Matcher contains data and functions necessary for exact, fuzzy and partial matching of scientific names.

func NewMatcher

func NewMatcher(cnf config.Config) Matcher

NewMatcher creates a new instance of Matcher struct.

func (Matcher) Match

func (m Matcher) Match(ns NameString) *protob.Result

Match tries to match the canonical form of a name-string exactly to a canonical form from the gnames database.

func (Matcher) MatchFuzzy

func (m Matcher) MatchFuzzy(name, stem string,
	ns NameString, kv *badger.DB) *protob.Result

MatchFuzzy tries to fuzzy-match a stemmed name-string to canonical forms from the gnames database.
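
The page does not describe the fuzzy algorithm beyond the Trie field's type. As a minimal sketch of the general idea, assuming fuzzy matching means finding stems within a small Levenshtein distance, the code below compares a stem against a short in-memory list by brute force. It does not use a trie or a key-value store, and editDistance, fuzzyCandidates and the sample stems are all hypothetical.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// editDistance returns the Levenshtein distance between a and b,
// computed over runes with a two-row dynamic-programming table.
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// fuzzyCandidates is a hypothetical helper that returns every known stem
// whose edit distance to stem is at most maxDist.
func fuzzyCandidates(stem string, known []string, maxDist int) []string {
	var res []string
	for _, k := range known {
		if editDistance(stem, k) <= maxDist {
			res = append(res, k)
		}
	}
	return res
}

func main() {
	// Made-up sample data standing in for stems stored in a database.
	known := []string{"pomatomus saltator", "pardosa moesta", "puma concolor"}
	fmt.Println(fuzzyCandidates("pomatomus saltatr", known, 1))
}
```

A brute-force scan like this is linear in the number of known stems; a trie-based index is the usual way to avoid comparing against every entry.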

func (Matcher) MatchPartial

func (m Matcher) MatchPartial(ns NameString, kv *badger.DB) *protob.Result

MatchPartial tries to match all partial variants of a name-string. The process stops as soon as a match is found.
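
The early-stop behavior can be pictured as a loop that returns on the first hit. This is only a sketch under assumptions: firstMatch is a hypothetical helper, the lookup is a plain map rather than a database, and the order in which variants are tried (longer variants before the bare genus) is a guess, not something this page states.

```go
package main

import "fmt"

// firstMatch walks the variants in order and returns the first one that is
// present in known. It stops as soon as a match is found.
func firstMatch(variants []string, known map[string]bool) (string, bool) {
	for _, v := range variants {
		if known[v] {
			return v, true
		}
	}
	return "", false
}

func main() {
	// Placeholder names; the map stands in for a real lookup.
	known := map[string]bool{"Aus bus": true, "Aus": true}
	variants := []string{"Aus cus", "Aus bus", "Aus"}

	match, ok := firstMatch(variants, known)
	fmt.Println(match, ok)
}
```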

func (Matcher) MatchVirus

func (m Matcher) MatchVirus(ns NameString) *protob.Result

MatchVirus tries to match a name-string exactly to a virus name from the gnames database.

func (Matcher) MatchWorker added in v0.3.0

func (m Matcher) MatchWorker(chIn <-chan MatchTask,
	chOut chan<- MatchResult, wg *sync.WaitGroup, kv *badger.DB)

MatchWorker takes name-strings from the chIn channel, matches them, and sends results to the chOut channel.
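
The signature suggests a fan-out/fan-in worker pool in which the Index field lets results be written back in input order. The sketch below shows that pattern with simplified stand-in types; task, result, worker and matchAll are hypothetical, the "matching" is a placeholder that upper-cases the name, and the assumption that each worker calls wg.Done when chIn is closed is not confirmed by this page.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// task and result mirror the shape of MatchTask and MatchResult; the
// result payload is simplified to a string for this sketch.
type task struct {
	Index int
	Name  string
}

type result struct {
	Index  int
	Result string
}

// worker reads tasks until chIn is closed and sends one result per task.
// The "matching" here is a stand-in that upper-cases the name.
func worker(chIn <-chan task, chOut chan<- result, wg *sync.WaitGroup) {
	defer wg.Done()
	for t := range chIn {
		chOut <- result{Index: t.Index, Result: strings.ToUpper(t.Name)}
	}
}

// matchAll fans names out to n workers and collects results by index,
// so the output order matches the input order.
func matchAll(names []string, n int) []string {
	chIn := make(chan task)
	chOut := make(chan result)
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go worker(chIn, chOut, &wg)
	}

	// Feed tasks, then close chIn so the workers can finish.
	go func() {
		for i, name := range names {
			chIn <- task{Index: i, Name: name}
		}
		close(chIn)
	}()

	// Close chOut once every worker has returned.
	go func() {
		wg.Wait()
		close(chOut)
	}()

	res := make([]string, len(names))
	for r := range chOut {
		res[r.Index] = r.Result
	}
	return res
}

func main() {
	fmt.Println(matchAll([]string{"Aus bus", "Cus dus", "Eus fus"}, 2))
}
```

Carrying the index with each task means workers can finish in any order without a sorting step afterwards.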

type Multinomial

type Multinomial struct {
	// Tail is genus + the last epithet.
	Tail string
	// Head is the name without the last epithet.
	Head string
}

Multinomial contains multinomial names that were constructed from an 'infraspecific' name-string.

type NameString

type NameString struct {
	// ID is UUID v5 generated from the verbatim name-string.
	ID string
	// Name is a verbatim name-string.
	Name string
	// Cardinality is the apparent number of elements in a name. Uninomial
	// corresponds to cardinality 1, binomial to 2, trinomial to 3 etc.
	Cardinality int
	// Canonical is the simplest most common version of a canonical form of
	// a name string.
	Canonical string
	// CanonicalID is UUID v5 generated from the Canonical field.
	CanonicalID string
	// CanonicalFull is a canonical form that also contains infraspecific ranks
	// and hybrid signs for named hybrids.
	CanonicalFull string
	// CanonicalFullID is UUID v5 generated from the CanonicalFull field.
	CanonicalFullID string
	// CanonicalStem is a version of the Canonical field with suffixes removed
	// and characters substituted according to rules of Latin grammar.
	CanonicalStem string
	// Partial contains truncated versions of the Canonical form. It is used
	// for matching names that could not be matched with all specific epithets.
	Partial *Partial
}

NameString stores input data for doing exact, fuzzy, exact partial, and fuzzy partial matching. It is created by parsing a name-string and storing its semantic elements.

func NewNameString

func NewNameString(parser gnparser.GNparser,
	name string) (NameString, *pb.Parsed)

NewNameString creates a new instance of NameString.

type Partial

type Partial struct {
	// Genus is a truncated canonical form with all specific epithets removed.
	Genus string
	// Multinomials are truncated canonical forms with one or more specific
	// epithets removed.
	Multinomials []Multinomial
}

Partial stores truncated versions of a 'canonical' name-string.
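
To make the Genus, Head and Tail fields concrete, here is a minimal sketch of how truncated variants could be derived from a canonical form by splitting on spaces. The local types mirror Partial and Multinomial, but newPartial is a hypothetical helper and its exact rules (one multinomial per dropped trailing epithet, nil for uninomials) are assumptions read off the field comments, not the package's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// multinomial and partial mirror the documented Multinomial and Partial types.
type multinomial struct {
	Tail string // genus + the last epithet
	Head string // the name without the last epithet
}

type partial struct {
	Genus        string
	Multinomials []multinomial
}

// newPartial is a hypothetical constructor that builds truncated variants
// of a space-separated canonical form.
func newPartial(canonical string) *partial {
	words := strings.Fields(canonical)
	if len(words) < 2 {
		return nil // a uninomial has nothing to truncate
	}
	p := &partial{Genus: words[0]}
	// Drop one trailing epithet at a time while at least three words remain.
	for last := len(words) - 1; last >= 2; last-- {
		p.Multinomials = append(p.Multinomials, multinomial{
			Tail: words[0] + " " + words[last],
			Head: strings.Join(words[:last], " "),
		})
	}
	return p
}

func main() {
	// "Aus bus cus" is a placeholder trinomial.
	p := newPartial("Aus bus cus")
	fmt.Printf("genus=%q multinomials=%+v\n", p.Genus, p.Multinomials)
}
```

Under these assumptions a binomial yields only a Genus, while each additional epithet adds one Multinomial.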
