Version: v0.17.0

This package is not in the latest version of its module.

Published: Jan 7, 2022 License: MIT Imports: 3 Imported by: 1






type Config

type Config struct {
	// BayesOddsThreshold sets the limit of posterior odds. Everything higher
	// than this limit will be classified as a name.
	BayesOddsThreshold float64

	// Format sets the output format for finding results. Possible formats are:
	// csv - CSV output
	// compact - JSON in one line
	// pretty - JSON with new lines and indentations.
	Format gnfmt.Format

	// IncludeInputText can be set to true if the user wants to get back the text
	// used for name-finding. This feature is especially useful if the original
	// file was a PDF, MS Word, HTML etc. and the user wants to use OffsetStart
	// and OffsetEnd indices to find names in the text.
	IncludeInputText bool

	// InputTextOnly can be set to true if the user wants only the UTF8-encoded text
	// of the file without name-finding. If this option is true, most other
	// options are ignored.
	InputTextOnly bool

	// Language that is prevalent in the text. This setting helps to get
	// a better result for NLP name-finding, because languages differ in their
	// training patterns.
	// Currently only the following languages are supported:
	// eng - English
	// deu - German
	Language lang.Language

	// LanguageDetected is the code of the language that was detected in the text.
	// It is an empty string if language detection is not enabled.
	LanguageDetected string

	// PreferredSources is a list of data-source IDs used for the
	// name-verification. These data-sources will always be matched with the
	// verified names. You can find the list of all data-sources at
	PreferredSources []int

	// TikaURL contains the URL of Apache Tika service. This service is used
	// for extraction of UTF8-encoded texts from a variety of file formats.
	TikaURL string

	// TokensAround sets the number of tokens (words) before and after each
	// name-candidate. These words will be returned with the output.
	TokensAround int

	// VerifierURL contains the URL of a name-verification service.
	VerifierURL string

	// WithBayes determines if both heuristic and Naive Bayes algorithms run
	// during name-finding.
	// false - only heuristic algorithms run
	// true - both heuristic and Naive Bayes algorithms run.
	WithBayes bool

	// WithPositionInBytes can be set to true to receive offsets in number of
	// bytes instead of UTF-8 characters.
	WithPositionInBytes bool

	// WithBayesOddsDetails shows in detail how odds are calculated.
	WithBayesOddsDetails bool

	// WithOddsAdjustment can be set to true to adjust calculated odds using the
	// ratio of scientific names found in text to the number of capitalized
	// words.
	WithOddsAdjustment bool

	// WithPlainInput can be set to true if the input is plain UTF8-encoded
	// text. In this case the file is read directly instead of going through
	// file type and encoding checks.
	WithPlainInput bool

	// WithUniqueNames can be set to true to get a unique list of names.
	WithUniqueNames bool

	// WithVerification is true if names should be verified.
	WithVerification bool
}

Config is responsible for name-finding operations.

func New

func New(opts ...Option) Config

New creates a Config with default settings, overridden by any settings coming from opts.

type Option

type Option func(*Config)

Option type for changing GNfinder settings.
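The signatures of New and Option follow the functional options pattern. The sketch below illustrates that pattern with a trimmed-down stand-in for Config (two fields only) and local copies of two option constructors. The function bodies and the zero-value defaults are assumptions made for illustration, not the package's actual implementation.

```go
package main

import "fmt"

// Config is a stand-in holding only two of the documented fields.
type Config struct {
	TokensAround    int
	WithUniqueNames bool
}

// Option matches the documented signature: a function that mutates a Config.
type Option func(*Config)

// OptTokensAround imitates the documented constructor of the same name.
func OptTokensAround(i int) Option {
	return func(c *Config) { c.TokensAround = i }
}

// OptWithUniqueNames imitates the documented constructor of the same name.
func OptWithUniqueNames(b bool) Option {
	return func(c *Config) { c.WithUniqueNames = b }
}

// New starts from defaults (zero values here, an assumption) and applies
// each option in the order given, so later options win over earlier ones.
func New(opts ...Option) Config {
	cfg := Config{}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := New(OptTokensAround(3), OptWithUniqueNames(true))
	fmt.Printf("%+v\n", cfg)
}
```

Because each option is just a closure over a *Config, callers name only the settings they want to change and everything else keeps its default.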

func OptBayesOddsThreshold added in v0.14.0

func OptBayesOddsThreshold(f float64) Option

OptBayesOddsThreshold is an option for name-finding that sets a new threshold for results from the Bayes name-finding. All name candidates with odds higher than the threshold will appear in the resulting names output.
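As a rough illustration of what an odds threshold does, the sketch below filters a list of candidates by their odds. The candidate type, the aboveThreshold helper, and the sample values are hypothetical, and the strict "greater than" comparison is an assumption drawn from the word "higher" in the description.

```go
package main

import "fmt"

// candidate is a hypothetical name candidate with its posterior odds.
type candidate struct {
	Name string
	Odds float64
}

// aboveThreshold keeps candidates whose odds are strictly higher than
// threshold, preserving their original order.
func aboveThreshold(cs []candidate, threshold float64) []candidate {
	var kept []candidate
	for _, c := range cs {
		if c.Odds > threshold {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	cs := []candidate{
		{Name: "Pica pica", Odds: 5000},
		{Name: "New York", Odds: 0.02},
		{Name: "Bubo bubo", Odds: 80},
	}
	for _, c := range aboveThreshold(cs, 100) {
		fmt.Println(c.Name, c.Odds)
	}
}
```

Raising the threshold trades recall for precision: fewer candidates pass, but those that do carry stronger evidence.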

func OptFormat

func OptFormat(f gnfmt.Format) Option

OptFormat sets output format

func OptIncludeInputText added in v0.14.0

func OptIncludeInputText(b bool) Option

OptIncludeInputText indicates whether to return the original UTF8-encoded input text along with the results.

func OptInputTextOnly added in v0.14.1

func OptInputTextOnly(b bool) Option

OptInputTextOnly indicates whether to return only the UTF8-encoded input text, without name-finding.

func OptLanguage

func OptLanguage(l lang.Language) Option

OptLanguage sets the language of the text.

func OptPreferredSources

func OptPreferredSources(is []int) Option

OptPreferredSources sets data sources that will always be checked during the verification process.

func OptTikaURL added in v0.14.0

func OptTikaURL(s string) Option

OptTikaURL sets the URL of the UTF8 text extraction service.

func OptTokensAround

func OptTokensAround(i int) Option

OptTokensAround sets the number of tokens remembered on the left and right side of a name-candidate.
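To make the "tokens around" idea concrete, the sketch below returns up to n words on each side of a name found in a tokenized text. The tokensAround helper and its index convention (start inclusive, end exclusive) are hypothetical and are not taken from the package.

```go
package main

import (
	"fmt"
	"strings"
)

// tokensAround returns up to n tokens before tokens[start] and up to n
// tokens after tokens[end-1]. The name itself occupies tokens[start:end].
// Windows are clipped at the beginning and end of the text.
func tokensAround(tokens []string, start, end, n int) (before, after []string) {
	lo := start - n
	if lo < 0 {
		lo = 0
	}
	hi := end + n
	if hi > len(tokens) {
		hi = len(tokens)
	}
	return tokens[lo:start], tokens[end:hi]
}

func main() {
	tokens := strings.Fields("We observed Pica pica near the river")
	// "Pica pica" occupies tokens[2:4].
	before, after := tokensAround(tokens, 2, 4, 2)
	fmt.Println(before, after)
}
```

Clipping matters for names near the start or end of a document, where fewer than n neighbouring words exist.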

func OptVerifierURL added in v0.14.0

func OptVerifierURL(s string) Option

OptVerifierURL sets URL for verification service.

func OptWithBayes

func OptWithBayes(b bool) Option

OptWithBayes is an option that forces running bayes name-finding even when the language is not supported by training sets.

func OptWithBayesOddsDetails

func OptWithBayesOddsDetails(b bool) Option

OptWithBayesOddsDetails is an option to show details of odds calculations.

func OptWithOddsAdjustment

func OptWithOddsAdjustment(b bool) Option

OptWithOddsAdjustment is an option that triggers recalculation of prior odds using the number of found names divided by the number of all name candidates.
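The exact adjustment formula is not given here. As one plausible reading only, the sketch below treats the ratio found/candidates as a probability p, converts it to odds p/(1-p), and combines it with a likelihood ratio using Bayes' rule in odds form. The helper names, the fallback value, and the sample numbers are all hypothetical.

```go
package main

import "fmt"

// adjustedPriorOdds treats found/candidates as a probability p and returns
// the odds p/(1-p). It returns defaultOdds when the ratio cannot be turned
// into finite, positive odds (no candidates, no names, or all names).
func adjustedPriorOdds(found, candidates int, defaultOdds float64) float64 {
	if candidates <= 0 || found <= 0 || found >= candidates {
		return defaultOdds
	}
	p := float64(found) / float64(candidates)
	return p / (1 - p)
}

// posteriorOdds applies Bayes' rule in odds form:
// posterior odds = prior odds * likelihood ratio.
func posteriorOdds(priorOdds, likelihoodRatio float64) float64 {
	return priorOdds * likelihoodRatio
}

func main() {
	// 20 names among 100 candidates; 0.1 is an arbitrary fallback prior.
	prior := adjustedPriorOdds(20, 100, 0.1)
	fmt.Println(posteriorOdds(prior, 40))
}
```

Under this reading, a text dense with scientific names raises the prior, so borderline candidates in that text are more likely to clear the odds threshold.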

func OptWithPlainInput added in v0.14.0

func OptWithPlainInput(b bool) Option

OptWithPlainInput sets WithPlainInput option indicating there is no need to check file type and encoding, and the file can be read directly.

func OptWithPositonInBytes added in v0.17.0

func OptWithPositonInBytes(b bool) Option

OptWithPositonInBytes is an option that reports offsets in number of bytes instead of number of UTF-8 characters.
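The two kinds of offsets differ whenever the text before a name contains multi-byte characters. The sketch below uses only the Go standard library to show the difference; the offsets helper and the sample text are hypothetical and do not reflect how the package computes positions.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// offsets returns the start of the first occurrence of name in text, both
// as a byte offset and as a count of UTF-8 characters (runes). It returns
// -1, -1 if name does not occur in text.
func offsets(text, name string) (byteStart, runeStart int) {
	byteStart = strings.Index(text, name)
	if byteStart < 0 {
		return -1, -1
	}
	// Count the runes in the prefix that precedes the match.
	runeStart = utf8.RuneCountInString(text[:byteStart])
	return byteStart, runeStart
}

func main() {
	// "ö" and "ß" each take two bytes in UTF-8.
	text := "Größe: Pica pica"
	b, r := offsets(text, "Pica pica")
	fmt.Println(b, r) // prints "9 7"
}
```

Byte offsets suit tools that slice raw files or Go strings directly, while character offsets suit tools that index text by code point.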

func OptWithUniqueNames added in v0.14.0

func OptWithUniqueNames(b bool) Option

OptWithUniqueNames indicates whether to return a unique list of names instead of all occurrences of names in the text.
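A minimal sketch of the difference between all occurrences and a unique list follows. The uniqueNames helper is hypothetical, and keeping names in order of first appearance is an assumption, not documented behaviour.

```go
package main

import "fmt"

// uniqueNames returns each distinct name once, in order of first appearance.
func uniqueNames(names []string) []string {
	seen := make(map[string]struct{}, len(names))
	var out []string
	for _, n := range names {
		if _, ok := seen[n]; ok {
			continue
		}
		seen[n] = struct{}{}
		out = append(out, n)
	}
	return out
}

func main() {
	all := []string{"Pica pica", "Bubo bubo", "Pica pica", "Bubo bubo", "Homo sapiens"}
	fmt.Println(uniqueNames(all))
}
```

A unique list is convenient for building a species checklist, whereas the full list of occurrences is needed when the position of each name in the text matters.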

func OptWithVerification

func OptWithVerification(b bool) Option

OptWithVerification indicates whether to run the verification process after name-finding.
