Documentation ¶
Index ¶
- Variables
- type GNfinder
- type Option
- func OptBayes(b bool) Option
- func OptBayesOddsDetails(o bool) Option
- func OptBayesThreshold(odds float64) Option
- func OptBayesWeights(bw map[lang.Language]*bayes.NaiveBayes) Option
- func OptDetectLanguage(bool) Option
- func OptDict(d *dict.Dictionary) Option
- func OptLanguage(l lang.Language) Option
- func OptTokensAround(tokensNum int) Option
- func OptVerify(opts ...verifier.Option) Option
Constants ¶
This section is empty.
Variables ¶
var (
	Version = "v0.11.1"
	Build   string
)
Functions ¶
This section is empty.
Types ¶
type GNfinder ¶ added in v0.8.4
type GNfinder struct {
	// Language for name-finding in the text.
	Language lang.Language
	// LanguageDetected is the code of the language that was detected in the
	// text. It is an empty string if language detection is not enabled.
	LanguageDetected string
	// DetectLanguage flag is true if we want to detect language automatically.
	DetectLanguage bool
	// Bayes is true when we run the Bayes algorithm, and false when we don't.
	Bayes bool
	// BayesOddsThreshold sets the limit of posterior odds. Everything bigger
	// than this limit will go to the names output.
	BayesOddsThreshold float64
	// BayesOddsDetails shows odds calculation details in the CLI output.
	BayesOddsDetails bool
	// TextOdds captures the "concentration" of names as it is found for the
	// whole text by heuristic name-finding. It should be close enough to the
	// real number of names in the text. We use it when we do not have a local
	// concentration of names in a region of text.
	TextOdds bayes.LabelFreq
	// TokensAround gives the number of tokens kept before and after each
	// name-candidate.
	TokensAround int
	// Verifier for scientific names.
	Verifier *verifier.Verifier
	// Dict contains black, grey, and white list dictionaries.
	Dict *dict.Dictionary
	// BayesWeights contains training data for all supported Bayes dictionaries.
	BayesWeights map[lang.Language]*bayes.NaiveBayes
}
GNfinder is responsible for name-finding operations.
func NewGNfinder ¶ added in v0.8.4
NewGNfinder creates a GNfinder object with default settings, or with settings supplied by opts.
func (*GNfinder) FindNames ¶ added in v0.8.4
FindNames traverses a text and finds scientific names in it.
func (*GNfinder) FindNamesJSON ¶ added in v0.8.4
FindNamesJSON takes a text as bytes and returns a JSON representation of the scientific names found in the text.
type Option ¶ added in v0.8.4
type Option func(*GNfinder)
Option type for changing GNfinder settings.
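The Option type follows the common Go "functional options" pattern. The sketch below illustrates that pattern with hypothetical stand-in types (`finder`, `option`, `optBayes`, `optTokensAround`, `newFinder`); it is not gnfinder's actual code, and the default values are assumptions made for the example.

```go
package main

import "fmt"

// finder is a hypothetical stand-in for GNfinder with two of its settings.
type finder struct {
	bayes        bool
	tokensAround int
}

// option mirrors the shape of Option: a function that mutates the settings.
type option func(*finder)

// optBayes is an illustrative counterpart of OptBayes.
func optBayes(b bool) option {
	return func(f *finder) { f.bayes = b }
}

// optTokensAround is an illustrative counterpart of OptTokensAround.
func optTokensAround(n int) option {
	return func(f *finder) { f.tokensAround = n }
}

// newFinder starts from (assumed) defaults and applies each option in order,
// so later options override earlier ones.
func newFinder(opts ...option) *finder {
	f := &finder{bayes: false, tokensAround: 0}
	for _, opt := range opts {
		opt(f)
	}
	return f
}

func main() {
	f := newFinder(optBayes(true), optTokensAround(4))
	fmt.Println(f.bayes, f.tokensAround)
}
```

A constructor written this way can gain new settings without changing its signature, which is presumably why the package exposes one Opt* function per setting.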
func OptBayes ¶ added in v0.8.4
OptBayes is an option that forces running Bayes name-finding even when the language is not supported by training sets.
func OptBayesOddsDetails ¶ added in v0.11.0
OptBayesOddsDetails is an option that shows details of odds calculations.
func OptBayesThreshold ¶ added in v0.8.4
OptBayesThreshold is a name-finding option that sets a new threshold for the results of Bayes name-finding. All name candidates whose posterior odds are higher than the threshold will appear in the resulting names output.
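According to the BayesOddsThreshold field description, everything with posterior odds bigger than the limit goes to the names output. A minimal sketch of such a filter follows; the `candidate` type, the `aboveThreshold` helper, and the odds values are invented for illustration and do not come from gnfinder.

```go
package main

import "fmt"

// candidate is a hypothetical name-candidate with its posterior odds.
type candidate struct {
	name string
	odds float64
}

// aboveThreshold keeps only candidates whose odds are strictly greater
// than the threshold, preserving their original order.
func aboveThreshold(cs []candidate, threshold float64) []candidate {
	var kept []candidate
	for _, c := range cs {
		if c.odds > threshold {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	// The odds below are made-up numbers, not real gnfinder output.
	cs := []candidate{
		{name: "Pardosa moesta", odds: 150.0},
		{name: "Great Plains", odds: 0.2},
		{name: "Bubo bubo", odds: 80.0},
	}
	for _, c := range aboveThreshold(cs, 100.0) {
		fmt.Println(c.name, c.odds)
	}
}
```

Raising the threshold trades recall for precision: fewer candidates pass, but those that do have stronger evidence behind them.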
func OptBayesWeights ¶ added in v0.8.10
func OptBayesWeights(bw map[lang.Language]*bayes.NaiveBayes) Option
OptBayesWeights sets previously created Bayes training data and stores it in the GNfinder's BayesWeights field. It saves time when a client app has to create multiple workers.
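The time saving comes from loading the training data once and handing the same map to every worker. The sketch below shows that idea with hypothetical stand-ins (`weights`, `worker`, `loadWeights`, `newWorkers`, and the language keys); it does not call gnfinder, and it assumes workers only read the shared data.

```go
package main

import (
	"fmt"
	"sync"
)

// weights is a hypothetical stand-in for the trained Bayes data.
type weights map[string]float64

// worker is a stand-in for a GNfinder instance that reuses shared weights.
type worker struct {
	id int
	w  weights
}

// loadWeights simulates the expensive, one-time loading step.
func loadWeights() weights {
	return weights{"eng": 1.0, "deu": 1.0}
}

// newWorkers builds n workers that all reference the same weights map.
func newWorkers(n int, w weights) []*worker {
	ws := make([]*worker, n)
	for i := range ws {
		ws[i] = &worker{id: i, w: w}
	}
	return ws
}

func main() {
	shared := loadWeights() // loaded once, not once per worker
	workers := newWorkers(3, shared)

	var wg sync.WaitGroup
	for _, wk := range workers {
		wg.Add(1)
		go func(wk *worker) {
			defer wg.Done()
			// Concurrent reads of a map are safe as long as nobody writes.
			fmt.Println("worker", wk.id, "sees", len(wk.w), "languages")
		}(wk)
	}
	wg.Wait()
}
```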
func OptDetectLanguage ¶ added in v0.9.0
OptDetectLanguage, when true, enables automatic detection of the text's language.
func OptDict ¶ added in v0.8.4
func OptDict(d *dict.Dictionary) Option
OptDict sets a previously created dictionary for GNfinder. It saves time, because the dictionary then does not have to be loaded at construction time.
func OptLanguage ¶ added in v0.8.4
OptLanguage sets the language of the text.
func OptTokensAround ¶ added in v0.10.0
OptTokensAround sets the number of tokens remembered on the left and right side of a name-candidate.
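To make the "tokens around" idea concrete, here is a small sketch that collects up to n tokens on each side of a candidate. The `tokensAround` helper is hypothetical, treats the candidate as a single token, and uses whitespace splitting rather than gnfinder's own tokenizer.

```go
package main

import (
	"fmt"
	"strings"
)

// tokensAround returns up to n tokens before and after the token at index i.
// The window is clamped at the start and end of the slice. It assumes that
// i is a valid index into tokens; a negative n is treated as zero.
func tokensAround(tokens []string, i, n int) (before, after []string) {
	if n < 0 {
		n = 0
	}
	start := i - n
	if start < 0 {
		start = 0
	}
	end := i + 1 + n
	if end > len(tokens) {
		end = len(tokens)
	}
	return tokens[start:i], tokens[i+1 : end]
}

func main() {
	tokens := strings.Fields("The spider Pardosa moesta lives in meadows")
	// Index 2 is "Pardosa"; keep two tokens on each side.
	before, after := tokensAround(tokens, 2, 2)
	fmt.Println(before, after)
}
```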
Directories ¶
Path | Synopsis |
---|---|
dict | Package dict contains dictionaries for finding scientific names. |
scripts | |
token | Package token deals with breaking a text into tokens. |