Documentation ¶
Overview ¶
Package gnmatcher provides the main use case of the project: matching possible name-strings to scientific names registered in a variety of biodiversity databases.
The goal of the project is to return matched canonical forms of scientific names at a rate of tens of thousands per second, making it possible to work with hundreds of millions or even billions of name-string matching events.
The package is intended for long-running services, because initializing its lookup data structures takes from a few seconds to a few minutes.
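Because initialization is expensive, a long-running service would typically build the matcher once and share it across requests. The following is a minimal sketch of that pattern using `sync.Once`. The names `slowMatcher`, `newSlowMatcher`, `getMatcher`, and `initCount` are hypothetical stand-ins and are not part of gnmatcher; the real constructor is not called here.

```go
package main

import (
	"fmt"
	"sync"
)

// slowMatcher is a hypothetical stand-in for an object that is
// expensive to build, such as a matcher with large lookup structures.
type slowMatcher struct {
	names map[string]bool
}

// initCount records how many times the expensive constructor ran.
var initCount int

// newSlowMatcher simulates loading lookup data into memory.
func newSlowMatcher() *slowMatcher {
	initCount++
	return &slowMatcher{names: map[string]bool{"Pardosa moesta": true}}
}

var (
	once    sync.Once
	matcher *slowMatcher
)

// getMatcher builds the matcher on first use and returns the same
// instance on every later call, even when called concurrently.
func getMatcher() *slowMatcher {
	once.Do(func() { matcher = newSlowMatcher() })
	return matcher
}

func main() {
	for i := 0; i < 3; i++ {
		m := getMatcher()
		fmt.Println(m.names["Pardosa moesta"])
	}
	fmt.Println("initializations:", initCount)
}
```

With this arrangement the startup cost is paid once per process rather than once per request.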
Example ¶
package main

import (
	"fmt"

	gnmatcher "github.com/gnames/gnmatcher/pkg"
	"github.com/gnames/gnmatcher/pkg/config"
	"github.com/gnames/gnmatcher/pkg/io/bloom"
	"github.com/gnames/gnmatcher/pkg/io/trie"
	"github.com/gnames/gnmatcher/pkg/io/virusio"
)

func main() {
	// Note that it takes several minutes to initialize lookup data structures.
	// Requirement for initialization: Postgresql database with loaded
	// http://opendata.globalnames.org/dumps/gnames-latest.sql.gz
	//
	// If data are imported already, it still takes several seconds to
	// load lookup data into memory.
	cfg := config.New()
	em := bloom.New(cfg)
	fm := trie.New(cfg)
	vm := virusio.New(cfg)
	gnm := gnmatcher.New(em, fm, vm, cfg)
	res := gnm.MatchNames([]string{"Pomatomus saltator", "Pardosa moesta"})
	for _, match := range res.Matches {
		fmt.Println(match.Name)
		fmt.Println(match.MatchType)
		for _, item := range match.MatchItems {
			fmt.Println(item.MatchStr)
			fmt.Println(item.EditDistance)
		}
	}
}
Constants ¶
This section is empty.
Variables ¶
var (
	// Version of the gnmatcher. When make runs, it automatically
	// sets the variable using git tags and hashes.
	Version = "v1.1.4"

	// Build timestamp. When make runs, it automatically sets the variable.
	Build = "n/a"
)
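One common way for a Makefile to set string variables like these at build time is the Go linker's `-X` flag. Whether gnmatcher's Makefile does exactly this is an assumption. The sketch below declares its own `Version` and `Build` variables in package `main` and uses a hypothetical `versionString` helper to show the idea.

```go
package main

import "fmt"

// Defaults used when the binary is built without linker overrides.
// A build could override them, for example (illustrative values):
//
//	go build -ldflags "-X main.Version=v1.2.3 -X main.Build=2024-01-01_12:00:00"
//
// The -X flag only works for package-level string variables.
var (
	Version = "v1.1.4"
	Build   = "n/a"
)

// versionString is a hypothetical helper that formats both values.
func versionString() string {
	return fmt.Sprintf("version: %s, build: %s", Version, Build)
}

func main() {
	fmt.Println(versionString())
}
```

For a variable outside package `main`, the `-X` argument would use the full import path of the package followed by the variable name.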
Functions ¶
This section is empty.
Types ¶
type GNmatcher ¶
type GNmatcher interface {
	// MatchNames takes a slice of scientific name-strings with options and
	// returns matches to canonical forms of known scientific names. The
	// following matches are attempted:
	//   - Exact string match for viruses
	//   - Exact match of the name-string's canonical form
	//   - Fuzzy match of the canonical form
	//   - Partial match of the canonical form where the middle parts of the
	//     name or last elements of the name are removed.
	//   - Partial fuzzy match of the canonical form.
	//
	// If a name is determined to be a "virus" (a non-cellular entity such as
	// a virus, prion, or plasmid), it is not matched, and is returned to be
	// found in a database.
	//
	// The resulting output does provide canonical forms, but not the sources
	// where they are registered.
	MatchNames(names []string, opts ...config.Option) mlib.Output

	// GetConfig provides configuration object of GNmatcher.
	GetConfig() config.Config

	// GetVersion returns version number and build timestamp.
	GetVersion() gnvers.Version
}
GNmatcher is a public API to the project functionality.
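The order of attempts listed in the MatchNames comment can be pictured as a cascade that stops at the first success. The sketch below is a simplified toy and not gnmatcher's implementation: it skips virus handling, canonical-form parsing, and partial fuzzy matching, drops only trailing words for partial matches, and uses a linear scan with plain Levenshtein distance instead of bloom filters and tries. The names `known`, `result`, `match`, and `editDistance`, and the labels "exact", "fuzzy", "partial", and "none", are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// known is a toy list of canonical forms (hypothetical data).
var known = []string{"Pomatomus saltator", "Pardosa moesta"}

// result describes the outcome of one lookup.
type result struct {
	kind    string // "exact", "fuzzy", "partial", or "none"
	matched string
	dist    int
}

func isKnown(s string) bool {
	for _, k := range known {
		if k == s {
			return true
		}
	}
	return false
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// editDistance computes the Levenshtein distance between a and b.
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// match tries exact, then fuzzy (distance <= 1), then partial matching
// by dropping trailing words, and stops at the first success.
func match(name string) result {
	if isKnown(name) {
		return result{"exact", name, 0}
	}
	for _, k := range known {
		if d := editDistance(name, k); d <= 1 {
			return result{"fuzzy", k, d}
		}
	}
	words := strings.Fields(name)
	for n := len(words) - 1; n >= 1; n-- {
		part := strings.Join(words[:n], " ")
		if isKnown(part) {
			return result{"partial", part, 0}
		}
	}
	return result{"none", "", 0}
}

func main() {
	for _, n := range []string{"Pardosa moesta", "Pomatomus saltatr", "Pardosa moesta alba", "Aus bus"} {
		fmt.Printf("%q -> %+v\n", n, match(n))
	}
}
```

Ordering the attempts from cheapest and strictest to most permissive means most inputs are resolved by the exact lookup, and the costlier fuzzy comparison runs only for the remainder.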
func New ¶
func New(
	em exact.ExactMatcher,
	fm fuzzy.FuzzyMatcher,
	vm virus.VirusMatcher,
	cfg config.Config,
) GNmatcher
New is a constructor for the GNmatcher interface. It takes three matcher interfaces, ExactMatcher, FuzzyMatcher, and VirusMatcher, together with a Config.
Directories ¶
Path | Synopsis |
---|---|
config | package config contains information needed to run gnmatcher project. |
io | |
bloom | package bloom creates and serves bloom filters for stemmed canonical names, and names of viruses. |
dbase | package dbase provides convenience methods for accessing PostgreSQL database. |
rest | package rest provides http REST interface to gnmatcher functionality. |
trie | package trie implements FuzzyMatcher interface that is responsible for fuzzy-matching strings to canonical forms of scientific names. |