gnmatcher

package
v1.1.4
Warning

This package is not in the latest version of its module.
Published: Jun 20, 2023 License: MIT Imports: 7 Imported by: 1

Documentation

Overview

Package gnmatcher provides the main use case of the project: matching possible name-strings to scientific names registered in a variety of biodiversity databases.

The goal of the project is to return matched canonical forms of scientific names at a rate of tens of thousands per second, making it possible to handle hundreds of millions or even billions of name-string matching events.

The package is intended for long-running services, because initializing its lookup data structures takes from a few seconds to a few minutes.
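
Because initialization is slow, a long-running service would typically build the matcher once and reuse it for every request. The sketch below illustrates that pattern with `sync.Once`. The `nameMatcher` interface, `stubMatcher`, and `slowInit` are hypothetical stand-ins invented for this illustration; they are not part of gnmatcher.

```go
package main

import (
	"fmt"
	"sync"
)

// nameMatcher is a hypothetical stand-in for a matcher that is
// expensive to build. It is not a gnmatcher type.
type nameMatcher interface {
	Match(name string) bool
}

// stubMatcher is an illustrative in-memory implementation.
type stubMatcher struct{ known map[string]struct{} }

func (s stubMatcher) Match(name string) bool {
	_, ok := s.known[name]
	return ok
}

var (
	once      sync.Once
	shared    nameMatcher
	initCalls int // counts how many times the expensive setup ran
)

// slowInit stands in for the expensive loading of lookup data.
func slowInit() nameMatcher {
	initCalls++
	return stubMatcher{known: map[string]struct{}{"Pardosa moesta": {}}}
}

// matcher returns the shared instance, building it on first use only.
func matcher() nameMatcher {
	once.Do(func() { shared = slowInit() })
	return shared
}

func main() {
	for _, n := range []string{"Pardosa moesta", "Pomatomus saltator"} {
		fmt.Println(n, matcher().Match(n))
	}
	fmt.Println("init calls:", initCalls)
}
```

Every call to `matcher()` after the first returns the already-built instance, so the setup cost is paid once per process rather than once per request.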

Example
package main

import (
	"fmt"

	gnmatcher "github.com/gnames/gnmatcher/pkg"
	"github.com/gnames/gnmatcher/pkg/config"
	"github.com/gnames/gnmatcher/pkg/io/bloom"
	"github.com/gnames/gnmatcher/pkg/io/trie"
	"github.com/gnames/gnmatcher/pkg/io/virusio"
)

func main() {
	// Note that it takes several minutes to initialize lookup data structures.
	// Requirement for initialization: a PostgreSQL database loaded with
	// http://opendata.globalnames.org/dumps/gnames-latest.sql.gz
	//
	// If data are imported already, it still takes several seconds to
	// load lookup data into memory.
	cfg := config.New()
	em := bloom.New(cfg)
	fm := trie.New(cfg)
	vm := virusio.New(cfg)
	gnm := gnmatcher.New(em, fm, vm, cfg)
	res := gnm.MatchNames([]string{"Pomatomus saltator", "Pardosa moesta"})
	for _, match := range res.Matches {
		fmt.Println(match.Name)
		fmt.Println(match.MatchType)
		for _, item := range match.MatchItems {
			fmt.Println(item.MatchStr)
			fmt.Println(item.EditDistance)
		}
	}
}

Constants

This section is empty.

Variables

var (
	// Version of the gnmatcher. When make runs, it automatically
	// sets the variable using git tags and hashes.
	Version = "v1.1.4"
	// Build timestamp. When make runs, it automatically sets the variable.
	Build = "n/a"
)
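
Package-level string variables like these are commonly overridden at build time with the Go linker's `-X` flag. The sketch below shows the general mechanism only. The project's actual Makefile is not shown here, so the build command in the comment and the `versionString` helper are assumptions made for illustration.

```go
package main

import "fmt"

// Version and Build play the role of the variables above. The defaults
// are placeholders used when no linker flags are supplied.
var (
	Version = "dev"
	Build   = "n/a"
)

// versionString is a hypothetical helper that formats both values.
func versionString() string {
	return fmt.Sprintf("version: %s, build: %s", Version, Build)
}

func main() {
	// Illustrative build command (an assumption, not the project's Makefile):
	//   go build -ldflags "-X main.Version=$(git describe --tags) -X 'main.Build=$(date -u)'"
	fmt.Println(versionString())
}
```

The `-X` flag only works on string variables, and the variable must be named with its full import path, so a variable in a library package needs that package's path rather than `main`.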

Functions

This section is empty.

Types

type GNmatcher

type GNmatcher interface {
	// MatchNames takes a slice of scientific name-strings with options and
	// returns matches to canonical forms of known scientific names. The
	// following matches are attempted:
	// - Exact string match for viruses
	// - Exact match of the name-string's canonical form
	// - Fuzzy match of the canonical form
	// - Partial match of the canonical form, where the middle parts or the
	//   last elements of the name are removed
	// - Partial fuzzy match of the canonical form
	//
	// If a name is determined to be a "virus" (a non-cellular entity such as
	// a virus, prion, or plasmid), it is not matched and is returned to be
	// looked up in a database.
	//
	// The resulting output provides canonical forms, but not the sources
	// where they are registered.
	MatchNames(names []string, opts ...config.Option) mlib.Output

	// GetConfig provides configuration object of GNmatcher.
	GetConfig() config.Config

	// GetVersion returns version number and build timestamp.
	GetVersion() gnvers.Version
}

GNmatcher is a public API to the project functionality.
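
The order of match attempts described in the MatchNames comment can be pictured as a cascade that stops at the first success. The toy sketch below covers only the canonical-form steps (exact, fuzzy, partial, partial fuzzy) and is not gnmatcher's implementation. The `known` list, the plain Levenshtein distance, the threshold of 2 edits, and the rule of dropping trailing words are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// known is a toy set of canonical forms (illustrative data only).
var known = []string{"Pomatomus saltatrix", "Pardosa moesta"}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// editDistance returns the Levenshtein distance between a and b.
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// lookup tries an exact match first, then a fuzzy match within maxDist edits.
func lookup(name string, maxDist int) (string, string, bool) {
	for _, k := range known {
		if k == name {
			return k, "exact", true
		}
	}
	for _, k := range known {
		if editDistance(k, name) <= maxDist {
			return k, "fuzzy", true
		}
	}
	return "", "", false
}

// match runs the cascade: exact, fuzzy, then the same two steps on
// progressively shorter names with trailing words removed.
func match(name string) (string, string) {
	if m, kind, ok := lookup(name, 2); ok {
		return m, kind
	}
	words := strings.Fields(name)
	for n := len(words) - 1; n >= 2; n-- {
		if m, kind, ok := lookup(strings.Join(words[:n], " "), 2); ok {
			return m, "partial " + kind
		}
	}
	return "", "none"
}

func main() {
	for _, n := range []string{"Pardosa moesta", "Pardosa mesta", "Pardosa moesta var. nova"} {
		m, kind := match(n)
		fmt.Printf("%q -> %q (%s)\n", n, m, kind)
	}
}
```

Trying the cheapest and strictest comparison first means that most well-formed names never reach the slower fuzzy and partial steps.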

func New

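New is a constructor for the GNmatcher interface. It takes implementations of the ExactMatcher and FuzzyMatcher interfaces; the example above additionally passes a virus matcher and a configuration object.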

Directories

Path       Synopsis
config     package config contains information needed to run gnmatcher project.
io/bloom   package bloom creates and serves bloom filters for stemmed canonical names, and names of viruses.
io/dbase   package dbase provides convenience methods for accessing PostgreSQL database.
io/rest    package rest provides http REST interface to gnmatcher functionality.
io/trie    package trie implements FuzzyMatcher interface that is responsible for fuzzy-matching strings to canonical forms of scientific names.