Documentation ¶
Overview ¶
Package gnmatcher provides the main use case of the project: matching possible name-strings to scientific names registered in a variety of biodiversity databases.
The goal of the project is to return matched canonical forms of scientific names at a rate of tens of thousands per second, making it possible to process hundreds of millions or billions of name-string matching events.
The package is intended for long-running services, because initializing its lookup data structures takes from a few seconds to several minutes.
Example ¶
    package main

    import (
        "fmt"

        "github.com/gnames/gnmatcher/internal/io/bloom"
        "github.com/gnames/gnmatcher/internal/io/trie"
        "github.com/gnames/gnmatcher/internal/io/virusio"
        gnmatcher "github.com/gnames/gnmatcher/pkg"
        "github.com/gnames/gnmatcher/pkg/config"
    )

    func main() {
        // Note that it takes several minutes to initialize lookup data structures.
        // Requirement for initialization: Postgresql database with loaded
        // http://opendata.globalnames.org/dumps/gnames-latest.sql.gz
        //
        // If data are imported already, it still takes several seconds to
        // load lookup data into memory.
        cfg := config.New()
        em := bloom.New(cfg)
        fm := trie.New(cfg)
        vm := virusio.New(cfg)
        gnm, err := gnmatcher.New(em, fm, vm, cfg)
        if err != nil {
            fmt.Println(err)
            return
        }
        res := gnm.MatchNames([]string{"Pomatomus saltator", "Pardosa moesta"})
        for _, match := range res.Matches {
            fmt.Println(match.Name)
            fmt.Println(match.MatchType)
            for _, item := range match.MatchItems {
                fmt.Println(item.MatchStr)
                fmt.Println(item.EditDistance)
            }
        }
    }
Variables ¶
    var (
        // Version of the gnmatcher. When make runs, it automatically
        // sets the variable using git tags and hashes.
        Version = "v1.1.12"

        // Build timestamp. When make runs, it automatically sets the variable.
        Build = "n/a"
    )
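The source does not say how make sets these variables. One common mechanism in Go projects is the linker's `-X` flag, which overwrites a package-level string variable at build time. The sketch below assumes that approach; the `main` package path, the `dev` default, and the `versionString` helper are illustrative and not taken from gnmatcher.

```go
package main

import "fmt"

// Version and Build are placeholders that the linker can overwrite, e.g.:
//
//	go build -ldflags "-X main.Version=$(git describe --tags) -X main.Build=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
//
// The package path "main" and the default values are assumptions made for
// this sketch.
var (
	Version = "dev"
	Build   = "n/a"
)

// versionString is a hypothetical helper that formats both values.
func versionString() string {
	return fmt.Sprintf("%s (build: %s)", Version, Build)
}

func main() {
	fmt.Println(versionString())
}
```

When built without the flags, the program reports the placeholder values, which makes an unversioned build easy to spot.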
Types ¶
type GNmatcher ¶
    type GNmatcher interface {
        // MatchNames takes a slice of scientific name-strings with options and
        // returns matches to canonical forms of known scientific names. The
        // following matches are attempted:
        //   - Exact string match for viruses
        //   - Exact match of the name-string's canonical form
        //   - Fuzzy match of the canonical form
        //   - Partial match of the canonical form where the middle parts of the
        //     name or last elements of the name are removed.
        //   - Partial fuzzy match of the canonical form.
        //
        // If a name is determined to be a "virus" (a non-cellular entity such as
        // a virus, prion, or plasmid), it is not matched and is returned to be
        // found in a database.
        //
        // The resulting output provides canonical forms, but not the sources
        // where they are registered.
        MatchNames(names []string, opts ...config.Option) mlib.Output

        // GetConfig provides configuration object of GNmatcher.
        GetConfig() config.Config

        // GetVersion returns version number and build timestamp.
        GetVersion() gnvers.Version
    }
GNmatcher is a public API to the project functionality.
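The order of match attempts described for MatchNames can be pictured as a cascade: try an exact match, then a fuzzy match, then drop trailing words and retry. The following is a minimal, self-contained sketch of that idea only. It is not gnmatcher's implementation: the `known` list, the `matchName` and `lookup` functions, the match-type labels, and the use of plain Levenshtein distance over a slice are all assumptions made for illustration, and it omits virus handling and removal of middle name parts.

```go
package main

import (
	"fmt"
	"strings"
)

// known is a hypothetical stand-in for the lookup data of canonical forms.
var known = []string{"Pomatomus saltatrix", "Pardosa moesta"}

// result is a simplified, hypothetical match record.
type result struct {
	matchType    string
	matchStr     string
	editDistance int
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// editDistance returns the Levenshtein distance between a and b.
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// lookup tries an exact match first, then a fuzzy match within maxDist edits.
func lookup(canonical string, maxDist int) (result, bool) {
	for _, k := range known {
		if k == canonical {
			return result{"Exact", k, 0}, true
		}
	}
	for _, k := range known {
		if d := editDistance(canonical, k); d <= maxDist {
			return result{"Fuzzy", k, d}, true
		}
	}
	return result{}, false
}

// matchName runs the cascade: full name first, then progressively shorter
// names with the last word removed (the "partial" attempts).
func matchName(canonical string, maxDist int) result {
	if r, ok := lookup(canonical, maxDist); ok {
		return r
	}
	words := strings.Fields(canonical)
	for len(words) > 1 {
		words = words[:len(words)-1]
		if r, ok := lookup(strings.Join(words, " "), maxDist); ok {
			r.matchType = "Partial" + r.matchType
			return r
		}
	}
	return result{matchType: "NoMatch"}
}

func main() {
	for _, n := range []string{"Pardosa moesta", "Pardossa moesta", "Pardosa moesta major", "Aus bus"} {
		r := matchName(n, 1)
		fmt.Printf("%q: %s %q (distance %d)\n", n, r.matchType, r.matchStr, r.editDistance)
	}
}
```

A linear scan like this would be far too slow for the throughput the package targets; it is meant only to show the fallback order.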
func New ¶
    func New(
        em exact.ExactMatcher,
        fm fuzzy.FuzzyMatcher,
        vm virus.VirusMatcher,
        cfg config.Config,
    ) (GNmatcher, error)
New is a constructor for the GNmatcher interface. It takes three matcher interfaces (ExactMatcher, FuzzyMatcher, and VirusMatcher) and a configuration object.
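Because the constructor accepts interfaces rather than concrete types, callers can in principle substitute their own implementations, for example in-memory test doubles. The sketch below illustrates that pattern with simplified, hypothetical interfaces. The method names `MatchExact` and `MatchFuzzy`, the `newMatcher` constructor, and the stub types are assumptions for this example; the real ExactMatcher, FuzzyMatcher, and VirusMatcher method sets may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// exactMatcher and fuzzyMatcher are hypothetical, simplified stand-ins for
// the matcher interfaces accepted by New.
type exactMatcher interface {
	MatchExact(canonical string) bool
}

type fuzzyMatcher interface {
	MatchFuzzy(canonical string) []string
}

// matcher depends only on the interfaces, not on concrete implementations.
type matcher struct {
	em exactMatcher
	fm fuzzyMatcher
}

// newMatcher is a hypothetical constructor that rejects missing dependencies.
func newMatcher(em exactMatcher, fm fuzzyMatcher) (*matcher, error) {
	if em == nil || fm == nil {
		return nil, errors.New("matchers must not be nil")
	}
	return &matcher{em: em, fm: fm}, nil
}

// candidates returns the name itself on an exact hit, otherwise whatever
// the fuzzy matcher suggests.
func (m *matcher) candidates(canonical string) []string {
	if m.em.MatchExact(canonical) {
		return []string{canonical}
	}
	return m.fm.MatchFuzzy(canonical)
}

// stubExact and stubFuzzy are in-memory test doubles.
type stubExact map[string]bool

func (s stubExact) MatchExact(c string) bool { return s[c] }

type stubFuzzy map[string][]string

func (s stubFuzzy) MatchFuzzy(c string) []string { return s[c] }

func main() {
	m, err := newMatcher(
		stubExact{"Pardosa moesta": true},
		stubFuzzy{"Pardossa moesta": {"Pardosa moesta"}},
	)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(m.candidates("Pardosa moesta"))
	fmt.Println(m.candidates("Pardossa moesta"))
}
```

The same structure lets a test exercise the matching logic without a database or minutes of initialization.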