gnfinder

v0.5.0
Published: Jun 23, 2018 License: MIT Imports: 12 Imported by: 3

README

Global Names Finder

Finds scientific names using dictionary and NLP approaches.

Usage as a command line tool

Download the binary executable for your operating system from the latest release. To see flags and usage:

gnfinder --help

Usage as a library

go get github.com/gnames/gnfinder
go get github.com/json-iterator/go
go get github.com/rakyll/statik
# To update dictionaries if they are changed
cd $GOPATH/src/github.com/gnames/gnfinder
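go generate

import (
  "github.com/gnames/gnfinder"
  "github.com/gnames/gnfinder/dict"
  "github.com/gnames/gnfinder/util"
)

// utfText holds the UTF-8 text to search; opts may be left empty.
var opts []util.Opt
// Assumes LoadDictionary returns a Dictionary value, so its address is taken below.
dictionary := dict.LoadDictionary()
bytesText := []byte(utfText)

jsonNames := gnfinder.FindNamesJSON(bytesText, &dictionary, opts...)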
Development

To install the latest gnfinder:

go get github.com/gnames/gnfinder
cd $GOPATH/src/github.com/gnames/gnfinder
make
gnfinder -h
Testing

Install ginkgo, a BDD testing framework for Go.

go get github.com/onsi/ginkgo/ginkgo
go get github.com/onsi/gomega

To run the tests, go to the root directory of the project and run:

ginkgo

# or

go test

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func FindNamesJSON

func FindNamesJSON(data []byte, dict *dict.Dictionary,
	opts ...util.Opt) []byte

FindNamesJSON takes a text and returns the scientific names found in it, as well as tokens, encoded as JSON.

Types

type Meta

type Meta struct {
	// Date represents time when output was generated.
	Date time.Time `json:"date"`
	// Language of the document
	Language string `json:"language"`
	// TotalTokens is a number of 'normalized' words in the text
	TotalTokens int `json:"total_words"`
	// TotalNameCandidates is a number of words that might be a start of
	// a scientific name
	TotalNameCandidates int `json:"total_candidates"`
	// TotalNames is a number of scientific names found
	TotalNames int `json:"total_names"`
	// CurrentName (optional) is the index of the names array that designates a
	// "position of a cursor". It is used by programs like gntagger that allow
	// to work on the list of found names interactively.
	CurrentName int `json:"current_index,omitempty"`
}

Meta contains meta-information of name-finding result.

type Name

type Name struct {
	Type        string               `json:"type"`
	Verbatim    string               `json:"verbatim"`
	Name        string               `json:"name"`
	Odds        float64              `json:"odds,omitempty"`
	OddsDetails token.OddsDetails    `json:"odds_details,omitempty"`
	OffsetStart int                  `json:"start"`
	OffsetEnd   int                  `json:"end"`
	Annotation  string               `json:"annotation"`
	Validation  *resolver.NameOutput `json:"validation"`
}

Name represents one found name.
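
OffsetStart and OffsetEnd locate a name in the source text. The documentation does not say whether they count runes or bytes, or whether the end is exclusive. The sketch below assumes rune offsets with an exclusive end, which would match FindNames taking a []rune; treat that as an assumption to verify against real output. The span type and sliceRunes helper are hypothetical.

```go
package main

import "fmt"

// span is a hypothetical stand-in for the OffsetStart/OffsetEnd pair of Name.
type span struct {
	Start, End int
}

// sliceRunes returns text[s.Start:s.End] as a string, or "" when the span
// does not fit inside the text. It assumes rune offsets with an exclusive end.
func sliceRunes(text []rune, s span) string {
	if s.Start < 0 || s.End > len(text) || s.Start > s.End {
		return ""
	}
	return string(text[s.Start:s.End])
}

func main() {
	// The leading word contains two-byte UTF-8 characters on purpose.
	str := "Größe: Pardosa moesta was seen."
	s := span{Start: 7, End: 21}

	// Slicing the rune form of the text recovers the name.
	fmt.Println(sliceRunes([]rune(str), s)) // prints "Pardosa moesta"

	// Applying the same numbers as byte offsets lands in the wrong place,
	// because "ö" and "ß" each occupy two bytes.
	fmt.Println(str[s.Start:s.End])
}
```

If the offsets turn out to be byte based, slice the original string or []byte directly instead of converting to runes.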

func TokensToName

func TokensToName(ts []token.Token, text []rune) Name

type OddsDatum

type OddsDatum struct {
	Name bool
	Odds float64
}

OddsDatum is a simplified version of a name, that stores boolean decision (Name/NotName), and corresponding odds of the name.
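
Odds can be easier to read as a probability. The sketch below assumes that the Odds field holds plain odds in favor, p / (1 - p), rather than log-odds; the documentation above does not state this. The probability helper and the sample values are invented for illustration.

```go
package main

import "fmt"

// OddsDatum is a local copy of the struct documented above.
type OddsDatum struct {
	Name bool
	Odds float64
}

// probability converts odds in favor, p / (1 - p), back to p.
// It is a hypothetical helper, not part of gnfinder.
func probability(odds float64) float64 {
	return odds / (1 + odds)
}

func main() {
	// Sample data only.
	data := []OddsDatum{
		{Name: true, Odds: 9},
		{Name: false, Odds: 0.25},
	}
	for _, d := range data {
		fmt.Printf("name=%t odds=%.2f p=%.2f\n", d.Name, d.Odds, probability(d.Odds))
	}
}
```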

type Output

type Output struct {
	Meta  `json:"metadata"`
	Names []Name `json:"names"`
}

Output type is the result of name-finding.

func CollectOutput

func CollectOutput(ts []token.Token, text []rune, m *util.Model) Output

CollectOutput takes tagged tokens and assembles gnfinder output out of them.

func FindNames

func FindNames(text []rune, d *dict.Dictionary, opts ...util.Opt) Output

FindNames traverses a text and finds scientific names in it.

func NewOutput

func NewOutput(names []Name, ts []token.Token, m *util.Model) Output

NewOutput is a constructor for Output type.

func (*Output) FromJSON

func (o *Output) FromJSON(data []byte)

FromJSON converts a JSON representation of Output to an Output object.

func (*Output) ToJSON

func (o *Output) ToJSON() []byte

ToJSON converts Output to JSON representation.

Directories

Path     Synopsis
dict     package dict contains dictionaries for finding scientific names
cmd
scripts
token    Package token deals with breaking a text into tokens.
util     Package util contains useful shared functions