Documentation
Overview
Package gnmatcher implements the project's main use case: matching candidate name-strings to scientific names registered in a variety of biodiversity databases.
The goal of the project is to return matched canonical forms of scientific names at a rate of tens of thousands per second, making it feasible to process hundreds of millions or billions of name-string matching events.
The package is intended for long-running services, because it takes a few seconds to initialize its lookup data structures.
Constants
const MaxNamesNumber = 10_000
MaxNamesNumber is the upper limit on the number of name-strings the MatchNames function can process in one call. If more name-strings are supplied, the list is truncated to this limit.
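Because longer lists are truncated, a caller with more names than the limit would need to split the input before calling MatchNames. The following is a minimal sketch of such batching. The helper `chunkNames` is hypothetical and not part of gnmatcher, and the constant is redeclared locally only so the example is self-contained.

```go
package main

import "fmt"

// MaxNamesNumber mirrors the documented constant; it is redeclared here
// only to keep the example self-contained.
const MaxNamesNumber = 10_000

// chunkNames is a hypothetical helper (not part of gnmatcher). It splits
// names into consecutive batches of at most size elements each.
func chunkNames(names []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(names); start += size {
		end := start + size
		if end > len(names) {
			end = len(names)
		}
		batches = append(batches, names[start:end])
	}
	return batches
}

func main() {
	names := make([]string, 25_000)
	for i := range names {
		names[i] = fmt.Sprintf("Name %d", i)
	}
	for i, batch := range chunkNames(names, MaxNamesNumber) {
		// Each batch is small enough to be passed to MatchNames
		// without being truncated.
		fmt.Printf("batch %d: %d names\n", i, len(batch))
	}
}
```

The batches share the backing array of the original slice, so no name-strings are copied.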
Variables
var (
	// Version of the gnmatcher
	Version = "v0.2.0"
	// Build timestamp
	Build = "n/a"
)
Functions
func NewGNMatcher
NewGNMatcher is a constructor for the GNMatcher interface.
Types
type GNMatcher
type GNMatcher interface {
	// MatchNames takes a slice of scientific name-strings and returns
	// matches to canonical forms of known scientific names. The following
	// matches are attempted:
	//   - Exact string match for viruses
	//   - Exact match of the name-string's canonical form
	//   - Fuzzy match of the canonical form
	//   - Partial match of the canonical form where the middle parts of the name
	//     or last elements of the name are removed.
	//   - Partial fuzzy match of the canonical form.
	//
	// The resulting output provides canonical forms, but not the sources
	// where they are registered.
	MatchNames(names []string) []*mlib.Match
}
GNMatcher is the public API to the project's functionality.
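The order of match attempts described for MatchNames can be illustrated with a toy cascade. This is a sketch under stated assumptions, not the package's implementation: the `match` struct stands in for `mlib.Match` (whose fields are not shown here), `toyMatcher` and `withinOneEdit` are hypothetical, inputs are assumed to be canonical forms already, fuzzy matching is reduced to "at most one edit", and only the "drop the last elements" variant of partial matching is covered.

```go
package main

import (
	"fmt"
	"strings"
)

// match is a hypothetical stand-in for mlib.Match.
type match struct {
	Input     string
	Canonical string
	Kind      string // "exact", "fuzzy", "partial" or "none"
}

// toyMatcher holds known canonical forms in a slice so that the
// iteration order, and therefore the result, is deterministic.
type toyMatcher struct {
	known []string
}

// withinOneEdit reports whether a and b differ by at most one
// substitution, insertion or deletion of a character.
func withinOneEdit(a, b string) bool {
	ra, rb := []rune(a), []rune(b)
	if len(ra) > len(rb) {
		ra, rb = rb, ra // make ra the shorter one
	}
	if len(rb)-len(ra) > 1 {
		return false
	}
	i, j, edits := 0, 0, 0
	for i < len(ra) && j < len(rb) {
		if ra[i] == rb[j] {
			i++
			j++
			continue
		}
		edits++
		if edits > 1 {
			return false
		}
		if len(ra) == len(rb) {
			i++ // substitution: advance both
		}
		j++ // otherwise skip one character of the longer string
	}
	// A trailing character left in the longer string counts as one edit.
	return edits+(len(rb)-j) <= 1
}

// find returns the first known canonical form accepted by ok.
func (m toyMatcher) find(ok func(known string) bool) (string, bool) {
	for _, k := range m.known {
		if ok(k) {
			return k, true
		}
	}
	return "", false
}

// matchOne tries exact, then fuzzy, then partial matching.
func (m toyMatcher) matchOne(name string) match {
	if c, ok := m.find(func(k string) bool { return k == name }); ok {
		return match{Input: name, Canonical: c, Kind: "exact"}
	}
	if c, ok := m.find(func(k string) bool { return withinOneEdit(k, name) }); ok {
		return match{Input: name, Canonical: c, Kind: "fuzzy"}
	}
	// Partial match: drop trailing words one at a time.
	words := strings.Fields(name)
	for n := len(words) - 1; n >= 1; n-- {
		part := strings.Join(words[:n], " ")
		if c, ok := m.find(func(k string) bool { return k == part }); ok {
			return match{Input: name, Canonical: c, Kind: "partial"}
		}
	}
	return match{Input: name, Kind: "none"}
}

// MatchNames applies matchOne to every input name.
func (m toyMatcher) MatchNames(names []string) []match {
	res := make([]match, len(names))
	for i, n := range names {
		res[i] = m.matchOne(n)
	}
	return res
}

func main() {
	m := toyMatcher{known: []string{"Pomatomus saltatrix", "Puma concolor"}}
	names := []string{"Puma concolor", "Puma concolr", "Puma concolor couguar", "Abc def"}
	for _, r := range m.MatchNames(names) {
		fmt.Printf("%q -> %q (%s)\n", r.Input, r.Canonical, r.Kind)
	}
}
```

Trying the cheapest and strictest comparison first means that looser strategies run only for names that the stricter ones could not resolve.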
Directories
Path | Synopsis |
---|---|
| | Package bloom creates and serves bloom filters for canonical names and names of viruses. |
| | Package dbase is an interface to the PostgreSQL database that contains Global Names index data. |
| | The purpose of this script is to find out how fast algorithms can go through a list of 100_000 names. |
| | Package stems_db operates on a key-value store that contains stems and the canonical forms that correspond to these stems. |