gnmatcher

package module
v0.1.0 Latest
Warning

This package is not in the latest version of its module.

Published: Aug 21, 2020 License: MIT Imports: 16 Imported by: 2

README

gnmatcher

gnmatcher provides fast stemming and fuzzy-matching algorithms for matching scientific names.

Documentation

Index

Constants

const MaxNamesNumber = 10_000
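MaxNamesNumber is undocumented here; its name suggests an upper limit on how many names are handled in one request, but that reading is an assumption. Under that assumption, a caller holding a longer list could split it into chunks of at most MaxNamesNumber names before matching. The chunkNames helper below is hypothetical and is not part of gnmatcher.

```go
package main

import "fmt"

// MaxNamesNumber mirrors the constant documented above.
// Assumption: it caps the number of names sent in a single request.
const MaxNamesNumber = 10_000

// chunkNames splits names into consecutive slices of at most size elements.
// Hypothetical client-side helper; the chunks share the input's backing array.
func chunkNames(names []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var chunks [][]string
	for start := 0; start < len(names); start += size {
		end := start + size
		if end > len(names) {
			end = len(names)
		}
		chunks = append(chunks, names[start:end])
	}
	return chunks
}

func main() {
	names := make([]string, 25_001)
	for i := range names {
		names[i] = fmt.Sprintf("name %d", i)
	}
	for i, c := range chunkNames(names, MaxNamesNumber) {
		fmt.Printf("chunk %d: %d names\n", i, len(c))
	}
}
```

With 25,001 names this yields two full chunks of 10,000 and a final chunk of 5,001.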

Variables

var (
	Version = "v0.0.0"
	Build   = "n/a"
)

Functions

This section is empty.

Types

type Config

type Config struct {
	WorkDir  string
	NatsURI  string
	JobsNum  int
	GNamesDB dbase.Dbase
}

Config collects and stores external configuration data.

func NewConfig

func NewConfig(opts ...Option) Config

NewConfig is a Config constructor that starts from default values and overrides them with the options it receives.
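NewConfig together with the Option type (`func(cnf *Config)`) follows the functional-options pattern. A minimal, self-contained sketch of that pattern is shown below. The trimmed Config struct and the default values (work directory, NATS URI, number of jobs) are placeholders chosen for illustration, not gnmatcher's actual defaults; only the shapes of Option, OptJobsNum, and OptWorkDir mirror the signatures documented on this page.

```go
package main

import "fmt"

// Config is a trimmed-down, hypothetical stand-in for gnmatcher.Config.
type Config struct {
	WorkDir string
	NatsURI string
	JobsNum int
}

// Option mutates a Config, matching the documented `type Option func(cnf *Config)`.
type Option func(cnf *Config)

// OptJobsNum overrides the number of concurrent jobs.
func OptJobsNum(i int) Option {
	return func(cnf *Config) { cnf.JobsNum = i }
}

// OptWorkDir overrides the working directory.
func OptWorkDir(s string) Option {
	return func(cnf *Config) { cnf.WorkDir = s }
}

// NewConfig starts from defaults and lets each option override them in order.
// The defaults below are placeholders, not gnmatcher's real values.
func NewConfig(opts ...Option) Config {
	cnf := Config{
		WorkDir: "/tmp/gnmatcher",
		NatsURI: "nats://localhost:4222",
		JobsNum: 8,
	}
	for _, opt := range opts {
		opt(&cnf)
	}
	return cnf
}

func main() {
	cnf := NewConfig(OptJobsNum(4))
	fmt.Println(cnf.WorkDir, cnf.NatsURI, cnf.JobsNum) // /tmp/gnmatcher nats://localhost:4222 4
}
```

The design keeps the constructor signature stable: adding a new setting means adding one more `Opt...` function rather than changing every call site.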

type GNMatcher

type GNMatcher struct {
	WorkDir  string
	NatsURI  string
	JobsNum  int
	GNUUID   uuid.UUID
	GNamesDB dbase.Dbase
	Filters  *bloom.Filters
	Trie     *levenshtein.MinTree
}

GNMatcher keeps the most general configuration settings and the high-level methods for scientific name matching.

func NewGNMatcher

func NewGNMatcher(cnf Config) (GNMatcher, error)

NewGNMatcher is a constructor for a GNMatcher instance.

func (GNMatcher) CreateWorkDirs

func (gnm GNMatcher) CreateWorkDirs() error

func (GNMatcher) FiltersDir

func (gnm GNMatcher) FiltersDir() string

func (GNMatcher) Match

func (gnm GNMatcher) Match(ns NameString) *protob.Result

func (GNMatcher) MatchFuzzy

func (gnm GNMatcher) MatchFuzzy(ns NameString, kv *badger.DB) *protob.Result
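MatchFuzzy carries no description here. The Directories listing describes package fuzzy as including a Levenshtein automaton as well as a traditional Levenshtein distance implementation, and GNMatcher holds a Trie of type *levenshtein.MinTree. To illustrate the edit distance that this kind of fuzzy matching is based on, here is a minimal sketch of the traditional dynamic-programming computation. It is not gnmatcher's code, and it covers neither the automaton nor the trie.

```go
package main

import "fmt"

// levenshtein returns the minimum number of single-rune insertions,
// deletions, and substitutions needed to turn a into b.
// Only two rows of the dynamic-programming table are kept in memory.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // cost of building b[:j] from an empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i // cost of deleting all of a[:i]
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

func main() {
	// A misspelling with one dropped letter is one edit away.
	fmt.Println(levenshtein("Pomatomus saltatrix", "Pomatomus saltatrx")) // 1
}
```

Working on runes rather than bytes keeps the distance correct for names containing non-ASCII letters.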

func (GNMatcher) MatchNames

func (gnm GNMatcher) MatchNames(names []string) []*protob.Result

func (GNMatcher) MatchVirus

func (gnm GNMatcher) MatchVirus(ns NameString) *protob.Result

func (GNMatcher) NewNameString

func (gnm GNMatcher) NewNameString(parser gnparser.GNparser, name string) (NameString, bool)

func (GNMatcher) StemsDir

func (gnm GNMatcher) StemsDir() string

func (GNMatcher) TrieDir

func (gnm GNMatcher) TrieDir() string

type NameString

type NameString struct {
	ID              string
	Name            string
	Canonical       string
	CanonicalID     string
	CanonicalFull   string
	CanonicalFullID string
	CanonicalStem   string
}

type Option

type Option func(cnf *Config)

Option is the type of all functional options for Config.

func OptJobsNum

func OptJobsNum(i int) Option

OptJobsNum sets the number of concurrent jobs to run for parallel tasks.

func OptNatsURI

func OptNatsURI(s string) Option

OptNatsURI sets the URI used to connect to the NATS messaging server.

func OptPgDB

func OptPgDB(s string) Option

OptPgDB sets the name of the gnames database.

func OptPgHost

func OptPgHost(s string) Option

OptPgHost sets the host of the gnames database.

func OptPgPass

func OptPgPass(s string) Option

OptPgPass sets the password used to access the gnames database.

func OptPgPort

func OptPgPort(i int) Option

OptPgPort sets the port of the gnames database.

func OptPgUser

func OptPgUser(s string) Option

OptPgUser sets the user of the gnames database.

func OptWorkDir

func OptWorkDir(s string) Option

OptWorkDir sets a directory for key-value stores and temporary files.

Directories

Path Synopsis
Package bloom creates and serves Bloom filters for canonical names and for names of viruses.
Package dbase is an interface to the PostgreSQL database that contains Global Names index data.
Package fuzzy includes a Levenshtein automaton as well as a traditional implementation for calculating Levenshtein distance.
cmd
Package stems_db operates on a key-value store that contains stems and the canonical forms that correspond to these stems.
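Package bloom is described as creating and serving Bloom filters for canonical names and virus names. A Bloom filter answers "definitely absent" or "possibly present" using a compact bit array, which makes it a cheap first check before a more expensive lookup. The sketch below is a generic Bloom filter, not the gnmatcher bloom package; it derives k bit positions from two FNV-64 hashes (double hashing), and the bit-array size and k used in the example are arbitrary.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomFilter is a minimal Bloom filter: an m-bit array and k positions per key.
// Generic sketch only; it is not the gnmatcher bloom package.
type bloomFilter struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of hash positions per key
}

func newBloomFilter(m, k uint64) *bloomFilter {
	return &bloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// positions derives k bit indexes from FNV-1a and FNV-1 hashes (double hashing).
func (bf *bloomFilter) positions(key string) []uint64 {
	h1 := fnv.New64a()
	h1.Write([]byte(key))
	a := h1.Sum64()
	h2 := fnv.New64()
	h2.Write([]byte(key))
	b := h2.Sum64() | 1 // force the step to be odd, hence never zero
	pos := make([]uint64, bf.k)
	for i := uint64(0); i < bf.k; i++ {
		pos[i] = (a + i*b) % bf.m
	}
	return pos
}

// Add sets the k bits for key.
func (bf *bloomFilter) Add(key string) {
	for _, p := range bf.positions(key) {
		bf.bits[p/64] |= 1 << (p % 64)
	}
}

// Check returns false only if key was never added; true may be a false positive.
func (bf *bloomFilter) Check(key string) bool {
	for _, p := range bf.positions(key) {
		if bf.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	bf := newBloomFilter(1<<16, 4)
	bf.Add("Pomatomus saltatrix")
	fmt.Println(bf.Check("Pomatomus saltatrix")) // true
	fmt.Println(bf.Check("Bubo bubo"))           // almost certainly false
}
```

Because a filter never yields false negatives, a "false" from Check can safely short-circuit a database lookup, while a "true" still has to be confirmed against the real data.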
