Documentation ¶
Overview ¶
package gnmatcher provides the main use case of the project: matching possible name-strings to scientific names registered in a variety of biodiversity databases.
The goal of the project is to return matched canonical forms of scientific names at a rate of tens of thousands per second, making it possible to handle hundreds of millions or even billions of name-string matching events.
The package is intended for long-running services, because initializing its lookup data structures takes from a few seconds to several minutes.
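Because initialization is slow, a long-running service would normally build the matcher once and reuse it for every request. The following is a minimal sketch of that pattern using `sync.Once`. The names `nameMatcher`, `newNameMatcher`, and `getMatcher` are hypothetical stand-ins invented for this illustration; they are not part of gnmatcher.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// nameMatcher is a hypothetical stand-in for a matcher that is expensive
// to build. It is not a gnmatcher type.
type nameMatcher struct {
	known map[string]struct{}
}

// newNameMatcher simulates the costly start-up step by building a lookup set.
func newNameMatcher(names []string) *nameMatcher {
	m := &nameMatcher{known: make(map[string]struct{}, len(names))}
	for _, n := range names {
		m.known[strings.ToLower(n)] = struct{}{}
	}
	return m
}

// Has reports whether name is in the lookup set, ignoring case.
func (m *nameMatcher) Has(name string) bool {
	_, ok := m.known[strings.ToLower(name)]
	return ok
}

var (
	once      sync.Once
	shared    *nameMatcher
	initCount int // counts how many times the expensive step actually ran
)

// getMatcher builds the matcher on first use and returns the same
// instance on every later call, from any goroutine.
func getMatcher() *nameMatcher {
	once.Do(func() {
		initCount++
		shared = newNameMatcher([]string{"Pomatomus saltator", "Pardosa moesta"})
	})
	return shared
}

func main() {
	queries := []string{"Pomatomus saltator", "Pardosa moesta", "Unknown name"}
	results := make([]bool, len(queries))

	// Simulate concurrent requests that all share one matcher.
	var wg sync.WaitGroup
	for i, q := range queries {
		wg.Add(1)
		go func(i int, q string) {
			defer wg.Done()
			results[i] = getMatcher().Has(q)
		}(i, q)
	}
	wg.Wait()

	for i, q := range queries {
		fmt.Printf("%s: %v\n", q, results[i])
	}
	fmt.Println("initializations:", initCount)
}
```

The read-only lookup set is safe to share between goroutines after `once.Do` returns, so no further locking is needed in this sketch.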
Example ¶
package main

import (
	"fmt"

	"github.com/gnames/gnmatcher"
	"github.com/gnames/gnmatcher/config"
	"github.com/gnames/gnmatcher/io/bloom"
	"github.com/gnames/gnmatcher/io/trie"
)

func main() {
	// Note that it takes several minutes to initialize lookup data structures.
	// Requirement for initialization: PostgreSQL database with loaded
	// http://opendata.globalnames.org/dumps/gnames-latest.sql.gz
	//
	// If data are imported already, it still takes several seconds to
	// load lookup data into memory.
	cfg := config.NewConfig()
	em := bloom.NewExactMatcher(cfg)
	fm := trie.NewFuzzyMatcher(cfg)
	gnm := gnmatcher.NewGNMatcher(em, fm, 1)
	res := gnm.MatchNames([]string{"Pomatomus saltator", "Pardosa moesta"})
	for _, match := range res {
		fmt.Println(match.Name)
		fmt.Println(match.MatchType)
		for _, item := range match.MatchItems {
			fmt.Println(item.MatchStr)
			fmt.Println(item.EditDistance)
		}
	}
}
Index ¶
Examples ¶
Constants ¶
This section is empty.
Variables ¶
var (
	// Version of the gnmatcher. When make runs, it automatically
	// sets the variable using git tags and hashes.
	Version = "v0.3.6"

	// Build timestamp. When make runs, it automatically sets the variable.
	Build = "n/a"
)
Functions ¶
This section is empty.
Types ¶
type GNMatcher ¶
type GNMatcher interface {
	// MatchNames takes a slice of scientific name-strings and returns back
	// matches to canonical forms of known scientific names. The following
	// matches are attempted:
	// - Exact string match for viruses
	// - Exact match of the name-string's canonical form
	// - Fuzzy match of the canonical form
	// - Partial match of the canonical form where the middle parts of the name
	//   or last elements of the name are removed.
	// - Partial fuzzy match of the canonical form.
	//
	// The resulting output does provide canonical forms, but not the sources
	// where they are registered.
	MatchNames(names []string) []*mlib.Match

	// Interface to Version number and Build timestamp
	gn.Versioner
}
GNMatcher is a public API to the project functionality.
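The order of attempts listed in the MatchNames comment (exact, fuzzy, partial, partial fuzzy) can be illustrated with a toy cascade. This is only a sketch under stated assumptions, not gnmatcher's implementation: it scans a small slice with Levenshtein distance instead of using bloom filters and tries, it skips the virus step and canonical-form parsing, and the helpers `editDistance`, `nearest`, `partialVariants`, and `match` are hypothetical names. The query "Pardosa moesta borealis" is a made-up trinomial used only to exercise the partial step.

```go
package main

import (
	"fmt"
	"strings"
)

// editDistance returns the Levenshtein distance between a and b.
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			best := prev[j-1] // substitution, free when the runes are equal
			if ra[i-1] != rb[j-1] {
				best++
			}
			if prev[j]+1 < best {
				best = prev[j] + 1 // deletion
			}
			if cur[j-1]+1 < best {
				best = cur[j-1] + 1 // insertion
			}
			cur[j] = best
		}
		prev = cur
	}
	return prev[len(rb)]
}

// nearest returns the known name closest to s, its distance, and whether
// that distance is within maxDist. With maxDist 0 it is an exact lookup.
func nearest(s string, known []string, maxDist int) (string, int, bool) {
	best, bestDist := "", maxDist+1
	for _, k := range known {
		if d := editDistance(s, k); d < bestDist {
			best, bestDist = k, d
		}
	}
	return best, bestDist, bestDist <= maxDist
}

// partialVariants returns shortened forms of name: first the form with the
// middle words removed, then forms with trailing words removed.
func partialVariants(name string) []string {
	w := strings.Fields(name)
	var out []string
	if len(w) > 2 {
		out = append(out, w[0]+" "+w[len(w)-1])
	}
	for n := len(w) - 1; n >= 2; n-- {
		out = append(out, strings.Join(w[:n], " "))
	}
	return out
}

// match tries each stage in order and stops at the first success.
func match(name string, known []string, maxDist int) (kind, found string, dist int) {
	stages := []struct {
		kind  string
		cands []string
		dist  int
	}{
		{"exact", []string{name}, 0},
		{"fuzzy", []string{name}, maxDist},
		{"partial exact", partialVariants(name), 0},
		{"partial fuzzy", partialVariants(name), maxDist},
	}
	for _, st := range stages {
		for _, c := range st.cands {
			if m, d, ok := nearest(c, known, st.dist); ok {
				return st.kind, m, d
			}
		}
	}
	return "none", "", 0
}

func main() {
	known := []string{"Pomatomus saltator", "Pardosa moesta"}
	queries := []string{"Pardosa moesta", "Pomatomus saltatror",
		"Pardosa moesta borealis", "Homo sapiens"}
	for _, q := range queries {
		kind, m, d := match(q, known, 1)
		fmt.Printf("%q -> %s %q (distance %d)\n", q, kind, m, d)
	}
}
```

Running cheaper stages first means that most well-formed names never reach the more expensive fuzzy comparisons, which is the usual reason for ordering a cascade this way.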
func NewGNMatcher ¶
func NewGNMatcher(em exact.ExactMatcher, fm fuzzy.FuzzyMatcher, j int) GNMatcher
NewGNMatcher is a constructor for the GNMatcher interface. It takes two interfaces, ExactMatcher and FuzzyMatcher, and an integer j.
Directories ¶
| Path | Synopsis |
|---|---|
| config | package config contains information needed to run the gnmatcher project. |
| entity | |
| exact | package exact contains an interface for exact-matching strings to known scientific names. |
| fuzzy | package fuzzy contains interfaces and code to facilitate fuzzy-matching of name-strings to scientific names collected in the gnames database. |
| matcher | package matcher is the central processing unit for matching name-strings to known scientific names. |
| | package main provides a CLI interface to an HTTP service that runs gnmatcher functionality. |
| io | |
| bloom | package bloom creates and serves bloom filters for canonical names and names of viruses. |
| dbase | package dbase provides convenience methods for accessing a PostgreSQL database. |
| rest | package rest provides an HTTP REST interface to gnmatcher functionality. |
| trie | package trie implements the FuzzyMatcher interface, which is responsible for fuzzy-matching strings to canonical forms of scientific names. |
| | The purpose of this script is to find out how fast algorithms can go through a list of 100_000 names. |
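The bloom entry in the table mentions bloom filters for canonical names. As background, here is a minimal sketch of how a bloom filter answers "is this name possibly known?" without storing the names themselves. It is an illustration only: the type `bloomFilter`, its sizes, and the FNV-based double hashing are assumptions made for this example and do not describe the code in gnmatcher's bloom package.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomFilter is a hypothetical, minimal bloom filter for this sketch.
type bloomFilter struct {
	bits []uint64
	m    uint64 // number of bits in the filter
	k    uint64 // number of bit positions probed per item
}

func newBloomFilter(m, k uint64) *bloomFilter {
	return &bloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// baseHashes derives two 64-bit hashes of s (FNV-1 and FNV-1a). The second
// is forced to be odd so that it is never zero.
func baseHashes(s string) (uint64, uint64) {
	h1 := fnv.New64()
	h1.Write([]byte(s))
	h2 := fnv.New64a()
	h2.Write([]byte(s))
	return h1.Sum64(), h2.Sum64() | 1
}

// Add sets the k bit positions derived from s.
func (b *bloomFilter) Add(s string) {
	h1, h2 := baseHashes(s)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		b.bits[pos/64] |= uint64(1) << (pos % 64)
	}
}

// MayContain returns false if s was definitely never added, and true if it
// was added or if its positions collide with other items (false positive).
func (b *bloomFilter) MayContain(s string) bool {
	h1, h2 := baseHashes(s)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		if b.bits[pos/64]&(uint64(1)<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	bf := newBloomFilter(1024, 3)
	bf.Add("Pomatomus saltator")
	bf.Add("Pardosa moesta")

	for _, q := range []string{"Pomatomus saltator", "Pardosa moesta", "Homo sapiens"} {
		fmt.Printf("%s: may contain = %v\n", q, bf.MayContain(q))
	}
}
```

A filter like this never reports a false negative, so a negative answer can safely skip further exact lookups, while a positive answer may occasionally be a false positive.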