gnames

package module
v0.5.4 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 4, 2021 License: MIT Imports: 17 Imported by: 0

README

gnames

API GoDoc

The goal of the gnames project is to provide an accurate and fast verification of scientific names in unlimited quantities. The verification should be fast (at least 1000 names per second) and include exact and fuzzy matching of input strings to scientific names aggregated from a large number of data-sources.

In case if you do not need exact records of matched names from data-sources, and just want to know if a name-string is known, you can use gnmatcher instead of this project. The gnmatcher is significantly faster and has simpler output.

RESTful API of the project is described using OpenAPI Specification.

Features

  • Fast verification of unlimited number of scientific names.
  • Multiple levels of verification:
    • Exact matching (exact string match for viruses, exact canonical form match for Plantae, Fungi, Bacteria, and Animalia).
    • Fuzzy matching detects human and/or Optical Character Recognition (OCR) errors without producing large number of false positives. To avoid false positives uninomial names only checked for exact match.
    • PartialExact matching happens when a match for the full name-string is not found. In such cases middle or end words are removed and each variant is verified. Matches of names with the last word intact does have a preference.
    • PartialFuzzy matching is provided for partial matches of species and infraspecies. To avoid false positives uninomials only checked for exact match.
  • Providing names information from data-sources that contain a particular name.
    • Returning the "best" result. Best result is calculated by a scoring algorithm.
    • Optionally, returning multiple results for data-sources that are important to a gnames user.
  • Outlink URLs to some data-sources websites to show the original record of a name.
  • Providing meta-information about aggregated data-sources.

Installation

Most of the users do not need to install gnames and can use fast remote service at http://verifier.globalnames.org/api/v1 or use a command line client to gnames: gnverifier. Nevertheless, it is possible to install a local copy of the service.

Installation prerequesites
  • A Linux-based operating system.
  • At least 32GB of memory.
  • At least 50GB of a free disk space.
  • Fast internet connection during installation. After installation gnames can operate without remote connection.
  • PostgreSQL database.
Installation process
  1. PostgreSQL

    We are not covering basics of PostgreSQL administration here. There are many tutorials and resources for many Linux-based operating systems that can help.

    PostgreSQL database has to have C or C.UTF-8 collation. This dependency on collation will be removed in the future.

    Create a database named gnames. Download the gnames database dump. Restore the database with:

    gunzip -c gnames_latest.sql.gz | pg_restore -d gnames
    
  2. gnmatcher

    Refer to the gnmatcher documentation for its installation.

  3. gnames

    Download the latest [release] of gnames, unpack it, place is somewhere in the PATH.

    Run gnames -V. It will show you the version of gnames and also generate $HOME/.config/gnames.yaml configuration file.

    Edit $HOME/.config/gnames.yaml according to your preferences.

    Try it by running

    gnames rest -p 8888
    

    To load service automatically you can create systemctl configuration for the service, if your system supports systemctl.

    Alternatively you can use docker image to run gnames. You will need to create a file with corresponding environment variables that are described in the .env.example file.

    docker pull gnames/gnames:latest
    docker run -env_file path_to_env_file -d -i -t -p 8888:8888 \
      gnames/gnames:latest rest -p 8888
    

Usage as API

Refer to gnames' OpenAPI Specification about interacting with gnames API. We are planning to add web-GUI in the future as well.

Usage with gnverifier

gnverifier is a command line client for gnames backend. Install and use it according to gnverifier documentation.

Known limitations of the verification

  • Exact matches of misspellings that might exist in poorly curated databases prevent to find fuzzy matches from better curated sources.

    To increase performance we stop any further tries if a name matched successfully. This prevents fuzzy-matching if a misspelled name is found somewhere. It is helpful to check 'curation' field of returned result, and see how many data-sources do contain the name.

  • Fuzzy matching of a name where genus string is broken by a space.

    For example we cannot match 'Abro stola triplasia' to 'Abrostola triplasia'. There is only 1 edit distance between the strings, however we stem specific epithets, so in reality we fuzzy-match 'Abro stol triplas' to 'Abrostola triplas'. That means now we have edit distance 2 which is usually beyond our threshold.

Development

  • Install Go language for your Linux operating system.
  • Create PostgreSQL database as described in installation.
  • Clone the gnames code.
  • Clone the gnmatcher and set it up for development.
  • Install docker and docker compose.
  • Go to your local gnames directory
    • Run make dc
    • Run docker-compose up
    • In another terminal window run go test ./...

Authors

License

The gnames code is released under MIT license.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	// Version of the gnames
	Version = "v0.5.2+"
	// Build timestamp
	Build = "n/a"
)

Functions

func FirstUpperCase added in v0.2.0

func FirstUpperCase(name string) string

Types

type GNames

type GNames interface {
	// GetVersion returns the version of GNames and a timestamp of its build.
	GetVersion() gnvers.Version

	// Verify takes a slice of name-strings together with query parameters and
	// returns back results of verification.
	Verify(ctx context.Context, params verifier.Input) (verifier.Output, error)

	// Search takes
	Search(ctx context.Context, srch search.Input) search.Output

	// Datasources take IDs of data-sourses and return back list of corresponding
	// metadata. If no IDs are given, it returns metadata for all data-sources.
	DataSources(ids ...int) ([]*verifier.DataSource, error)
}

GNames is the main use-case interface of the app. Its purpose to provide metadata of registered DataSources and provide functionality for verification (resolution/reconciliation) of name-strings to known to gnames scientific names, as well as providing data where these names occur.

func NewGNames

func NewGNames(
	cnf config.Config,
	vf verifier.Verifier,
	fc facet.Facet,
) GNames

NewGNames is a constructor that returns implmentation of GNames interface.

Directories

Path Synopsis
ent
score
Package score allows to sort matched records according to their relevancy.
Package score allows to sort matched records according to their relevancy.
verifier
Package verifier provides data about biodiversity data-sources and metadata about scientific names aggregated from them.
Package verifier provides data about biodiversity data-sources and metadata about scientific names aggregated from them.
cmd
io
verifierpg
Package verifierpg is a PostgreSQL-based implementaion of the Verifier interface.
Package verifierpg is a PostgreSQL-based implementaion of the Verifier interface.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL