Documentation
Overview
package gnmatcher provides the main use case of the project: matching possible name-strings to scientific names registered in a variety of biodiversity databases.
The goal of the project is to return matched canonical forms of scientific names at a rate of tens of thousands per second, making it possible to work with hundreds of millions to billions of name-string matching events.
The package is intended for long-running services, because initializing its lookup data structures takes anywhere from a few seconds to several minutes.
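Because initialization is slow, a long-running service would typically build its lookup data once and reuse it for every request. The sketch below illustrates that pattern with the standard library's sync.Once. The lookup type and the loadLookup function are hypothetical stand-ins for illustration only; they are not part of gnmatcher's API.

```go
package main

import (
	"fmt"
	"sync"
)

// lookup is a hypothetical stand-in for gnmatcher's in-memory lookup data.
type lookup struct {
	names map[string]struct{}
}

var (
	once     sync.Once
	shared   *lookup
	initRuns int // counts how many times the expensive load actually ran
)

// loadLookup is a hypothetical placeholder for the slow step that builds
// lookup structures (seconds to minutes in a real deployment).
func loadLookup() *lookup {
	initRuns++
	return &lookup{names: map[string]struct{}{
		"Pomatomus saltator": {},
		"Pardosa moesta":     {},
	}}
}

// getLookup returns the shared lookup data, building it only on first use.
// sync.Once makes this safe even if many goroutines call it concurrently.
func getLookup() *lookup {
	once.Do(func() { shared = loadLookup() })
	return shared
}

func main() {
	for _, name := range []string{"Pomatomus saltator", "Pardosa moesta", "Bubo bubo"} {
		_, ok := getLookup().names[name]
		fmt.Printf("%s: %v\n", name, ok)
	}
	fmt.Println("initializations:", initRuns)
}
```

However many requests arrive, the expensive load runs a single time, which is the property that makes start-up cost acceptable for a service but not for a short-lived command.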
Example
package main

import (
	"fmt"

	"github.com/gnames/gnmatcher"
	"github.com/gnames/gnmatcher/config"
	"github.com/gnames/gnmatcher/io/bloom"
	"github.com/gnames/gnmatcher/io/trie"
	"github.com/gnames/gnmatcher/io/virusio"
)

func main() {
	// Note that it takes several minutes to initialize lookup data structures.
	// Requirement for initialization: a PostgreSQL database loaded with
	// http://opendata.globalnames.org/dumps/gnames-latest.sql.gz
	//
	// If the data are already imported, it still takes several seconds to
	// load lookup data into memory.
	cfg := config.New()
	em := bloom.New(cfg)
	fm := trie.New(cfg)
	vm := virusio.New(cfg)
	gnm := gnmatcher.New(em, fm, vm, cfg)
	res := gnm.MatchNames([]string{"Pomatomus saltator", "Pardosa moesta"})
	for _, match := range res {
		fmt.Println(match.Name)
		fmt.Println(match.MatchType)
		for _, item := range match.MatchItems {
			fmt.Println(item.MatchStr)
			fmt.Println(item.EditDistance)
		}
	}
}
Constants
This section is empty.
Variables
var (
	// Version of the gnmatcher. When make runs, it automatically
	// sets the variable using git tags and hashes.
	Version = "v0.9.2+"

	// Build timestamp. When make runs, it automatically sets the variable.
	Build = "n/a"
)
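The comments say make sets Version and Build at build time. A common Go mechanism for this, assumed here rather than taken from the project's Makefile, is the linker flag -X importpath.name=value, which overrides a package-level string variable that is uninitialized or initialized to a constant string. A minimal sketch, with illustrative values:

```go
package main

import "fmt"

// Version and Build mirror the package-level variables above. The defaults
// apply unless the linker overrides them, for example (illustrative values):
//
//	go build -ldflags "-X main.Version=v0.9.2 -X main.Build=2021-01-01T12:00:00Z"
var (
	Version = "v0.9.2+"
	Build   = "n/a"
)

func main() {
	fmt.Printf("version: %s\nbuild: %s\n", Version, Build)
}
```

For a variable in a library package, the full import path replaces main, e.g. -X github.com/gnames/gnmatcher.Version=v0.9.2.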
Functions
This section is empty.
Types
type GNmatcher (added in v0.5.4)
type GNmatcher interface {
	// MatchNames takes a slice of scientific name-strings and returns
	// matches to canonical forms of known scientific names. The following
	// matches are attempted:
	//   - Exact string match for viruses
	//   - Exact match of the name-string's canonical form
	//   - Fuzzy match of the canonical form
	//   - Partial match of the canonical form, where the middle parts of the
	//     name or the last elements of the name are removed
	//   - Partial fuzzy match of the canonical form
	//
	// If a name is determined to be a "virus" (a non-cellular entity such as
	// a virus, prion, or plasmid), it is not matched and is instead returned
	// so that it can be found in a database.
	//
	// The output provides canonical forms, but not the data sources where
	// they are registered.
	MatchNames(names []string) []mlib.Match

	// GetConfig provides the configuration object of GNmatcher.
	GetConfig() config.Config

	// GetVersion returns the version number and build timestamp.
	GetVersion() gnvers.Version
}
GNmatcher is the public API to the project's functionality.
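The MatchNames comment describes a cascade of match attempts. The sketch below imitates that order on a tiny in-memory list: exact, then fuzzy, then partial exact or partial fuzzy. It is a hypothetical simplification, not gnmatcher's implementation: the Match type, the brute-force Levenshtein scan (standing in for the trie-based fuzzy matcher), the one-edit limit, and the partial-candidate rules are all assumptions, and virus handling and stemming are omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// Match is a simplified, hypothetical result type (not gnlib's mlib.Match).
type Match struct {
	Name      string // input canonical form
	MatchType string // "Exact", "Fuzzy", "PartialExact", "PartialFuzzy" or "None"
	MatchStr  string // known name that was matched
	EditDist  int    // edit distance between the tried candidate and MatchStr
}

// levenshtein computes the edit distance between a and b over runes.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = prev[j-1] + cost // substitution or match
			if v := prev[j] + 1; v < cur[j] { // deletion
				cur[j] = v
			}
			if v := cur[j-1] + 1; v < cur[j] { // insertion
				cur[j] = v
			}
		}
		prev = cur
	}
	return prev[len(rb)]
}

// lookup returns the closest known name within maxDist edits, using a
// brute-force scan; a distance of 0 is an exact match.
func lookup(known []string, s string, maxDist int) (string, int, bool) {
	best, bestD := "", maxDist+1
	for _, k := range known {
		if d := levenshtein(k, s); d < bestD {
			best, bestD = k, d
		}
	}
	return best, bestD, bestD <= maxDist
}

// partialCandidates (hypothetical rules) first drops the middle words,
// keeping the first and last, then removes words from the end.
func partialCandidates(canonical string) []string {
	w := strings.Fields(canonical)
	var res []string
	if len(w) > 2 {
		res = append(res, w[0]+" "+w[len(w)-1])
	}
	for n := len(w) - 1; n >= 1; n-- {
		res = append(res, strings.Join(w[:n], " "))
	}
	return res
}

// matchName tries the full canonical form first, then partial candidates.
func matchName(known []string, canonical string) Match {
	prefix := ""
	for _, c := range append([]string{canonical}, partialCandidates(canonical)...) {
		if m, d, ok := lookup(known, c, 1); ok {
			typ := prefix + "Exact"
			if d > 0 {
				typ = prefix + "Fuzzy"
			}
			return Match{canonical, typ, m, d}
		}
		prefix = "Partial" // everything after the first attempt is partial
	}
	return Match{Name: canonical, MatchType: "None"}
}

func main() {
	known := []string{"Pomatomus saltator", "Pardosa moesta", "Bubo"}
	names := []string{"Pomatomus saltator", "Pardosa moeste",
		"Pomatomus saltator major", "Pardosa moeste lata", "Bubo bubo", "Homo sapiens"}
	for _, n := range names {
		m := matchName(known, n)
		fmt.Printf("%s: %s %q (edit distance %d)\n", m.Name, m.MatchType, m.MatchStr, m.EditDist)
	}
}
```

With this list, "Pardosa moeste lata" fails as a full name and as "Pardosa lata", then matches "Pardosa moesta" one edit away, so it is reported as a partial fuzzy match.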
func New (added in v0.6.1)
func New(
	em exact.ExactMatcher,
	fm fuzzy.FuzzyMatcher,
	vm virus.VirusMatcher,
	cfg config.Config,
) GNmatcher
New is a constructor for the GNmatcher interface. It takes three interfaces, ExactMatcher, FuzzyMatcher, and VirusMatcher, together with a configuration object.
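Because New receives its matchers as interfaces, callers can swap implementations, for example stubs in unit tests that should not need a database. The sketch below shows that constructor-injection shape. The interfaces, their method names (MatchExact, MatchFuzzy, IsVirus), Config, and Describe are all hypothetical and do not reflect the real exact, fuzzy, virus, or config packages.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for exact.ExactMatcher, fuzzy.FuzzyMatcher and
// virus.VirusMatcher; the real interfaces define different methods.
type ExactMatcher interface{ MatchExact(canonical string) bool }
type FuzzyMatcher interface{ MatchFuzzy(canonical string) []string }
type VirusMatcher interface{ IsVirus(name string) bool }

// Config is a placeholder for config.Config (unused by this sketch's logic).
type Config struct{ MaxEditDistance int }

// Matcher plays the role of the GNmatcher interface returned by New.
type Matcher interface{ Describe(name string) string }

type matcher struct {
	em  ExactMatcher
	fm  FuzzyMatcher
	vm  VirusMatcher
	cfg Config
}

// New injects the three matchers and the configuration, mirroring the
// shape of gnmatcher.New.
func New(em ExactMatcher, fm FuzzyMatcher, vm VirusMatcher, cfg Config) Matcher {
	return &matcher{em: em, fm: fm, vm: vm, cfg: cfg}
}

// Describe reports which injected matcher recognized the name first.
func (m *matcher) Describe(name string) string {
	switch {
	case m.vm.IsVirus(name):
		return "virus"
	case m.em.MatchExact(name):
		return "exact"
	case len(m.fm.MatchFuzzy(name)) > 0:
		return "fuzzy"
	default:
		return "none"
	}
}

// Stubs that need no database, e.g. for unit tests.
type stubExact map[string]bool

func (s stubExact) MatchExact(c string) bool { return s[c] }

type stubFuzzy struct{}

func (stubFuzzy) MatchFuzzy(string) []string { return nil }

type stubVirus struct{}

func (stubVirus) IsVirus(name string) bool {
	return strings.Contains(strings.ToLower(name), "virus")
}

func main() {
	m := New(stubExact{"Pardosa moesta": true}, stubFuzzy{}, stubVirus{}, Config{MaxEditDistance: 1})
	for _, n := range []string{"Pardosa moesta", "Tobacco mosaic virus", "Homo sapiens"} {
		fmt.Println(n, "->", m.Describe(n))
	}
}
```

Returning an interface from the constructor, as New does with GNmatcher, keeps the concrete type private and lets the wiring change without affecting callers.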
Directories
| Path | Synopsis |
|---|---|
| config | package config contains information needed to run the gnmatcher project. |
| ent | |
| ent/exact | package exact contains an interface for exact-matching strings to known scientific names. |
| ent/fuzzy | package fuzzy contains interfaces and code to facilitate fuzzy-matching of name-strings to scientific names collected in the gnames database. |
| ent/matcher | package matcher is the central processing unit for matching name-strings to known scientific names. |
| ent/virus | package virus contains an interface for matching strings to names of viruses, plasmids, prions, and other non-cellular entities. |
| | package main provides a CLI interface to an HTTP service that runs gnmatcher functionality. |
| io | |
| io/bloom | package bloom creates and serves bloom filters for stemmed canonical names and names of viruses. |
| io/dbase | package dbase provides convenience methods for accessing a PostgreSQL database. |
| io/rest | package rest provides an HTTP REST interface to gnmatcher functionality. |
| io/trie | package trie implements the FuzzyMatcher interface, which is responsible for fuzzy-matching strings to canonical forms of scientific names. |
| | The purpose of this script is to find out how fast algorithms can go through a list of 100_000 names. |
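The io/bloom package serves bloom filters, presumably as a fast membership check before more expensive lookups. For readers unfamiliar with the data structure, here is a minimal, self-contained bloom filter that derives its bit positions from one FNV-1a hash by double hashing. It is an illustrative sketch, not the filter library or parameters io/bloom actually uses; the size and number of hash positions are arbitrary.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// BloomFilter is a minimal illustrative filter, not the one io/bloom uses.
type BloomFilter struct {
	bits []uint64 // bit array packed into 64-bit words
	m    uint64   // number of bits
	k    uint64   // number of bit positions per item
}

func NewBloomFilter(m, k uint64) *BloomFilter {
	return &BloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// positions derives k bit positions from one 64-bit FNV-1a hash using
// double hashing: (h1 + i*h2) mod m.
func (b *BloomFilter) positions(s string) []uint64 {
	h := fnv.New64a()
	h.Write([]byte(s)) // writes to a hash.Hash never return an error
	sum := h.Sum64()
	h1, h2 := sum&0xffffffff, (sum>>32)|1
	pos := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		pos[i] = (h1 + i*h2) % b.m
	}
	return pos
}

// Add sets the k bits for s.
func (b *BloomFilter) Add(s string) {
	for _, p := range b.positions(s) {
		b.bits[p/64] |= uint64(1) << (p % 64)
	}
}

// Check returns false if s was definitely never added; true means
// "probably added", since false positives are possible.
func (b *BloomFilter) Check(s string) bool {
	for _, p := range b.positions(s) {
		if b.bits[p/64]&(uint64(1)<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	bf := NewBloomFilter(1<<16, 4)
	for _, n := range []string{"Pomatomus saltator", "Pardosa moesta"} {
		bf.Add(n)
	}
	fmt.Println(bf.Check("Pomatomus saltator")) // true: added items are always found
	fmt.Println(bf.Check("Homo sapiens"))       // false unless this is a false positive
}
```

A bloom filter never gives a false negative for an added string, but it can give false positives, so a positive check still has to be confirmed against the real data.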