Documentation ¶
Overview ¶
Package gnmatcher implements the project's main use case: matching candidate name-strings to scientific names registered in a variety of biodiversity databases.
The goal of the project is to return matched canonical forms of scientific names at a rate of tens of thousands per second, making it practical to process hundreds of millions or billions of name-string matching events.
The package is intended for long-running services, because initializing its lookup data structures takes from a few seconds to several minutes.
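Because initialization is slow, a long-running service would typically build the matcher once and share it across requests. The following is a minimal sketch of that pattern using `sync.Once`. The `matcher` type, `load`, and `getMatcher` are hypothetical stand-ins for illustration and are not part of the gnmatcher API.

```go
package main

import (
	"fmt"
	"sync"
)

// matcher is a hypothetical stand-in for an initialized name matcher.
type matcher struct {
	names map[string]bool
}

var (
	once      sync.Once
	shared    *matcher
	loadCount int // counts how many times the slow load ran
)

// load simulates the slow construction of lookup data.
func load() *matcher {
	loadCount++
	return &matcher{names: map[string]bool{"Pardosa moesta": true}}
}

// getMatcher builds the matcher on first use and reuses it afterwards.
func getMatcher() *matcher {
	once.Do(func() { shared = load() })
	return shared
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = getMatcher().names["Pardosa moesta"]
		}()
	}
	wg.Wait()
	fmt.Println("loads:", loadCount) // prints "loads: 1"
}
```

All four goroutines share one instance, so the expensive load runs a single time regardless of how many requests arrive.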
Example ¶
package main

import (
	"fmt"

	"github.com/gnames/gnmatcher"
	"github.com/gnames/gnmatcher/config"
	"github.com/gnames/gnmatcher/io/bloom"
	"github.com/gnames/gnmatcher/io/trie"
	"github.com/gnames/gnmatcher/io/virusio"
)

func main() {
	// Note that it takes several minutes to initialize lookup data structures.
	// Requirement for initialization: Postgresql database with loaded
	// http://opendata.globalnames.org/dumps/gnames-latest.sql.gz
	//
	// If data are imported already, it still takes several seconds to
	// load lookup data into memory.
	cfg := config.New()
	em := bloom.New(cfg)
	fm := trie.New(cfg)
	vm := virusio.New(cfg)
	gnm := gnmatcher.New(em, fm, vm, cfg)
	res := gnm.MatchNames([]string{"Pomatomus saltator", "Pardosa moesta"})
	for _, match := range res {
		fmt.Println(match.Name)
		fmt.Println(match.MatchType)
		for _, item := range match.MatchItems {
			fmt.Println(item.MatchStr)
			fmt.Println(item.EditDistance)
		}
	}
}
Constants ¶
This section is empty.
Variables ¶
var (
	// Version of the gnmatcher. When make runs, it automatically
	// sets the variable using git tags and hashes.
	Version = "v0.9.8+"
	// Build timestamp. When make runs, it automatically sets the variable.
	Build = "n/a"
)
Functions ¶
This section is empty.
Types ¶
type GNmatcher ¶ added in v0.5.4
type GNmatcher interface {
	// MatchNames takes a slice of scientific name-strings with options and
	// returns matches to canonical forms of known scientific names. The
	// following matches are attempted:
	//   - Exact string match for viruses
	//   - Exact match of the name-string's canonical form
	//   - Fuzzy match of the canonical form
	//   - Partial match of the canonical form where the middle parts of the
	//     name or last elements of the name are removed.
	//   - Partial fuzzy match of the canonical form.
	//
	// If a name is determined to be a "virus" (a non-cellular entity such as
	// a virus, prion, or plasmid), it is not matched and is returned as-is,
	// to be looked up in a database.
	//
	// The resulting output does provide canonical forms, but not the sources
	// where they are registered.
	MatchNames(names []string, opts ...config.Option) []mlib.Output
	// GetConfig provides configuration object of GNmatcher.
	GetConfig() config.Config
	// GetVersion returns version number and build timestamp.
	GetVersion() gnvers.Version
}
GNmatcher is a public API to the project functionality.
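The order of match attempts described in the MatchNames comment can be illustrated with a toy cascade. This is a sketch under stated assumptions, not the package's implementation: `known`, `match`, `fuzzy`, `result`, and the match-type labels are hypothetical, the input is assumed to be already in canonical form, the virus step is omitted, only trailing words are dropped for partial matches, and a plain Levenshtein distance with a threshold of 1 stands in for the real fuzzy matcher.

```go
package main

import (
	"fmt"
	"strings"
)

// known is a toy stand-in for the canonical forms held in lookup data.
var known = map[string]bool{
	"Pomatomus saltator": true,
	"Pardosa moesta":     true,
}

// result is a hypothetical, simplified match record.
type result struct {
	Type    string
	Matched string
	Dist    int
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// editDistance returns the Levenshtein distance between a and b.
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// fuzzy returns the closest known name within maxDist edits of s, if any.
func fuzzy(s string, maxDist int) (string, int, bool) {
	best, bestDist := "", maxDist+1
	for name := range known {
		d := editDistance(s, name)
		if d < bestDist || (d == bestDist && d <= maxDist && name < best) {
			best, bestDist = name, d
		}
	}
	return best, bestDist, bestDist <= maxDist
}

// match tries exact, then fuzzy, then partial (exact or fuzzy) matching.
func match(s string) result {
	if known[s] {
		return result{"Exact", s, 0}
	}
	if name, d, ok := fuzzy(s, 1); ok {
		return result{"Fuzzy", name, d}
	}
	// Partial: drop trailing words, keeping at least two, and retry.
	words := strings.Fields(s)
	for n := len(words) - 1; n >= 2; n-- {
		part := strings.Join(words[:n], " ")
		if known[part] {
			return result{"PartialExact", part, 0}
		}
		if name, d, ok := fuzzy(part, 1); ok {
			return result{"PartialFuzzy", name, d}
		}
	}
	return result{"NoMatch", "", 0}
}

func main() {
	for _, s := range []string{"Pomatomus saltator", "Pomatomus saltatur", "Pardosa moesta alba"} {
		fmt.Printf("%q -> %+v\n", s, match(s))
	}
}
```

Cheaper checks run first, so most inputs never reach the fuzzy or partial stages.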
func New ¶ added in v0.6.1
func New(
	em exact.ExactMatcher,
	fm fuzzy.FuzzyMatcher,
	vm virus.VirusMatcher,
	cfg config.Config,
) GNmatcher
New is a constructor for the GNmatcher interface. It takes three matcher interfaces (ExactMatcher, FuzzyMatcher, and VirusMatcher) and a Config.
Directories ¶
| Path | Synopsis |
|---|---|
| config | package config contains information needed to run gnmatcher project. |
| ent | |
| ent/exact | package exact contains interface for exact-matching strings to known scientific names. |
| ent/fuzzy | package fuzzy contains interfaces and code to facilitate fuzzy-matching of name-strings to scientific names collected in gnames database. |
| ent/matcher | package matcher is the central processing unit for matching name-strings to known scientific names. |
| ent/virus | package virus contains an interface for matching strings to names of viruses, plasmids, prions and other non-cellular entities. |
| | package main provides a CLI interface to http service to run gnmatcher functionality. |
| io | |
| io/bloom | package bloom creates and serves bloom filters for stemmed canonical names, and names of viruses. |
| io/dbase | package dbase provides convenience methods for accessing PostgreSQL database. |
| io/rest | package rest provides http REST interface to gnmatcher functionality. |
| io/trie | package trie implements FuzzyMatcher interface that is responsible for fuzzy-matching strings to canonical forms of scientific names. |
| | The purpose of this script is to find out how fast algorithms can go through a list of 100_000 names. |
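
The bloom package listed above is described as serving Bloom filters for stemmed canonical names. For readers unfamiliar with the data structure, here is a minimal, generic Bloom filter sketch. It is not the package's implementation: the `bloom` type, its methods, the FNV hash, and the sizes used are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a toy Bloom filter: a bit array probed at k positions per key.
type bloom struct {
	bits []bool
	k    int
}

func newBloom(m, k int) *bloom {
	return &bloom{bits: make([]bool, m), k: k}
}

// indexes derives k positions from one 64-bit FNV hash (double hashing).
func (b *bloom) indexes(s string) []int {
	h := fnv.New64a()
	h.Write([]byte(s))
	sum := h.Sum64()
	h1, h2 := uint64(uint32(sum)), uint64(uint32(sum>>32))
	idx := make([]int, b.k)
	for i := range idx {
		idx[i] = int((h1 + uint64(i)*h2) % uint64(len(b.bits)))
	}
	return idx
}

// Add records s in the filter.
func (b *bloom) Add(s string) {
	for _, i := range b.indexes(s) {
		b.bits[i] = true
	}
}

// MayContain reports false only if s was definitely never added;
// true means "possibly present" and may be a false positive.
func (b *bloom) MayContain(s string) bool {
	for _, i := range b.indexes(s) {
		if !b.bits[i] {
			return false
		}
	}
	return true
}

func main() {
	b := newBloom(1024, 3)
	b.Add("Pomatomus saltator")
	b.Add("Pardosa moesta")
	fmt.Println(b.MayContain("Pardosa moesta"))
	fmt.Println(b.MayContain("Aus bus"))
}
```

A filter of this kind never yields false negatives, which makes it a cheap pre-check before consulting a slower, authoritative store.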