gnfinder

Version: v0.12.0
Published: Apr 25, 2021 License: MIT Imports: 9 Imported by: 3

README

Global Names Finder

Finds scientific names using dictionary and nlp approaches.

Features

  • Multiplatform packages (Linux, Windows, Mac OS X).
  • Self-contained, no external dependencies; only the gnfinder or gnfinder.exe binary (~15 MB) is needed. However, an internet connection is required for name verification.
  • Takes UTF8-encoded text and returns JSON-formatted output that contains detected scientific names.
  • Optionally detects the language of the text automatically and adjusts the Bayes algorithm for that language. English and German are currently supported.
  • Uses complementary heuristic and natural language processing algorithms.
  • Optionally verifies found names against multiple biodiversity databases using the gnindex service.
  • Detection of nomenclatural annotations like sp. nov., comb. nov., ssp. nov. and their variants.
  • Ability to see words that surround detected name-strings.
  • The library can be used concurrently to significantly improve speed. On a server with 40 threads, it can detect names in 50 million pages in approximately 3 hours using both heuristic and Bayes algorithms. See the bhlindex project for an example.
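
The concurrent use mentioned in the last feature can be sketched as a worker pool. This is a minimal illustration, not code from gnfinder or bhlindex: findFunc, countCapitalized and processPages are hypothetical names, and the toy finder only counts capitalized words. In real use the finder would wrap a call such as gnf.Find; whether one GNfinder instance can be shared between goroutines is not stated here, so one instance per worker is the conservative assumption.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// findFunc is a hypothetical stand-in for a name-finding call such as
// gnf.Find. It is not part of gnfinder.
type findFunc func(page string) int

// countCapitalized is a toy finder: it counts words starting with A-Z.
func countCapitalized(page string) int {
	n := 0
	for _, w := range strings.Fields(page) {
		if w[0] >= 'A' && w[0] <= 'Z' {
			n++
		}
	}
	return n
}

// processPages runs find over all pages using the given number of workers
// and returns the results in the same order as the input pages.
func processPages(pages []string, workers int, find findFunc) []int {
	results := make([]int, len(pages))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handled by exactly one worker, so no lock is needed.
				results[i] = find(pages[i])
			}
		}()
	}
	for i := range pages {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	pages := []string{
		"Pomatomus saltator and Parus major",
		"no names here",
		"Mytilus edulis grows fast",
	}
	fmt.Println(processPages(pages, 2, countCapitalized)) // prints [2 0 1]
}
```

Sending page indexes rather than page contents through the channel keeps the output order stable without extra bookkeeping.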

Install as a command line app

Download the binary executable for your operating system from the latest release.

Linux or OS X

Move the gnfinder executable somewhere in your PATH (for example, /usr/local/bin):

sudo mv path_to/gnfinder /usr/local/bin

Windows

One option is to create a folder for executables and place gnfinder.exe there.

Press Windows+R and type "cmd". In the terminal window that appears, type:

mkdir C:\bin
copy path_to\gnfinder.exe C:\bin

Add C:\bin directory to your PATH environment variable.

Go

Install Go >= v1.16

git clone git@github.com:/gnames/gnfinder
cd gnfinder
make tools
make install

Usage

Usage as a command line app

To see flags and usage:

gnfinder --help
# or just
gnfinder

To see the version of the binary:

gnfinder -V

Examples:

Getting data from a pipe forcing English language and verification

echo "Pomatomus saltator and Parus major" | gnfinder -v -l eng
echo "Pomatomus saltator and Parus major" | gnfinder --verify --lang eng

Displaying matches from NCBI and Encyclopedia of Life, if they exist. For the list of data source IDs, go to gnverifier's data sources page.

echo "Pomatomus saltator and Parus major" | gnfinder -v -l eng -s "4,12"
echo "Pomatomus saltator and Parus major" | gnfinder --verify --lang eng --sources "4,12"

Adjusting prior odds using information about found names. The odds are calculated as "found names number / (capitalized words number - found names number)". This adjustment decreases the odds for texts with very few names and increases them for texts with many found names.

gnfinder -a -d -f pretty file_with_names.txt
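
The prior odds formula quoted above can be worked through in a few lines of Go. This is a sketch of the arithmetic only: priorOdds is a hypothetical name, and returning false when the denominator is not positive is an assumption made here, not documented gnfinder behavior.

```go
package main

import "fmt"

// priorOdds applies the quoted formula:
// found names / (capitalized words - found names).
// The function name and the handling of a non-positive denominator are
// assumptions for this sketch, not gnfinder code.
func priorOdds(foundNames, capitalizedWords int) (float64, bool) {
	rest := capitalizedWords - foundNames
	if rest <= 0 {
		return 0, false // odds are undefined when every capitalized word is a name
	}
	return float64(foundNames) / float64(rest), true
}

func main() {
	// A text with few names and a text with many names.
	for _, c := range [][2]int{{2, 102}, {50, 100}} {
		if odds, ok := priorOdds(c[0], c[1]); ok {
			fmt.Printf("names=%d capitalized=%d odds=%.2f\n", c[0], c[1], odds)
		}
	}
}
```

With 2 names among 102 capitalized words the odds are 2/100 = 0.02, while 50 names among 100 capitalized words give 50/50 = 1.00, which matches the described effect of lowering the odds for name-poor texts.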

Returning 5 words before and after each found name candidate.

gnfinder -w 5 file_with_names.txt
gnfinder --words-around 5 file_with_names.txt

Getting data from a file and redirecting result to another file

gnfinder file1.txt > file2.json

Detection of nomenclatural annotations

echo "Parus major sp. n." | gnfinder
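
As a rough illustration of what an annotation such as "sp. n." looks like to a program, the sketch below matches a few variants with a regular expression. The pattern is an assumption made for this example and covers only sp., ssp. and comb. followed by n. or nov.; gnfinder's own detection rules are not shown in this document and may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// annotRe is a simplified, assumed pattern for annotations such as
// "sp. n.", "ssp. nov." and "comb. nov."; it is not taken from gnfinder.
var annotRe = regexp.MustCompile(`\b(sp|ssp|comb)\.\s*(nov|n)\.`)

// findAnnotation returns the first annotation found in text, or "" if none.
func findAnnotation(text string) string {
	return annotRe.FindString(text)
}

func main() {
	fmt.Printf("%q\n", findAnnotation("Parus major sp. n."))     // prints "sp. n."
	fmt.Printf("%q\n", findAnnotation("Parus major comb. nov.")) // prints "comb. nov."
	fmt.Printf("%q\n", findAnnotation("Parus major"))            // prints ""
}
```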

Usage as a library

import (
  "fmt"

  "github.com/gnames/gnfinder"
  "github.com/gnames/gnfinder/config"
  "github.com/gnames/gnfinder/ent/nlp"
  "github.com/gnames/gnfinder/io/dict"
)

func Example() {
  txt := []byte(`Blue Adussel (Mytilus edulis) grows to about two
inches the first year,Pardosa moesta Banks, 1892`)
  // gnfinder.New takes a config.Config, created here with default settings.
  cfg := config.New()
  dictionary := dict.LoadDictionary()
  weights := nlp.BayesWeights()
  gnf := gnfinder.New(cfg, dictionary, weights)
  res := gnf.Find(txt)
  name := res.Names[0]
  fmt.Printf(
    "Name: %s, start: %d, end: %d",
    name.Name,
    name.OffsetStart,
    name.OffsetEnd,
  )
  // Output:
  // Name: Mytilus edulis, start: 13, end: 29
}

Usage as a docker container

docker pull gnames/gnfinder

# run gnfinder server, and map it to port 8888 on the host machine
docker run -d -p 8888:8778 --name gnfinder gnames/gnfinder

Development

To install the latest gnfinder

git clone git@github.com:/gnames/gnfinder
cd gnfinder
make tools
make install

Testing

make tools
# run make install for CLI testing
make install

To run tests, go to the root directory of the project and run:

go test ./...

# or

make test

Documentation

Overview

Example

package main

import (
	"fmt"

	"github.com/gnames/gnfinder"
	"github.com/gnames/gnfinder/config"
	"github.com/gnames/gnfinder/ent/nlp"
	"github.com/gnames/gnfinder/io/dict"
)

func main() {
	txt := []byte(`Blue Adussel (Mytilus edulis) grows to about two
inches the first year,Pardosa moesta Banks, 1892`)
	cfg := config.New()
	dictionary := dict.LoadDictionary()
	weights := nlp.BayesWeights()
	gnf := gnfinder.New(cfg, dictionary, weights)
	res := gnf.Find(txt)
	name := res.Names[0]
	fmt.Printf(
		"Name: %s, start: %d, end: %d",
		name.Name,
		name.OffsetStart,
		name.OffsetEnd,
	)
}

Output:

Name: Mytilus edulis, start: 13, end: 29
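
The reported offsets can be checked by slicing the input text. The sketch below assumes the offsets count characters (runes) from the start of the text; for this ASCII-only input, byte and rune positions coincide. sliceByOffsets is a hypothetical helper, not part of gnfinder.

```go
package main

import "fmt"

// sliceByOffsets returns the text between start (inclusive) and end
// (exclusive), counted in runes. It returns "" for out-of-range offsets.
// Counting in runes is an assumption of this sketch.
func sliceByOffsets(text string, start, end int) string {
	r := []rune(text)
	if start < 0 || end > len(r) || start > end {
		return ""
	}
	return string(r[start:end])
}

func main() {
	txt := "Blue Adussel (Mytilus edulis) grows to about two\n" +
		"inches the first year,Pardosa moesta Banks, 1892"
	fmt.Println(sliceByOffsets(txt, 13, 29)) // prints (Mytilus edulis)
}
```

Positions 13 to 29 cover the name together with its surrounding parentheses, which is why the span is two characters longer than "Mytilus edulis" itself.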

Constants

This section is empty.

Variables

var (
	Version = "v0.12.0+"
	Build   string
)

Functions

This section is empty.

Types

type GNfinder added in v0.8.4

type GNfinder interface {
	Find(data []byte) output.Output

	GetConfig() config.Config

	ChangeConfig(opts ...config.Option) GNfinder

	GetVersion() gnvers.Version
}

func New added in v0.12.0

func New(
	cfg config.Config,
	dictionaries *dict.Dictionary,
	weights map[lang.Language]*bayes.NaiveBayes,
) GNfinder

Directories

Path Synopsis
ent
api
nlp
token
Package token deals with breaking a text into tokens.
cmd
io
dict
package dict contains dictionaries for finding scientific names
tools
