verifier

package
Version: v0.14.0
Warning

This package is not in the latest version of its module.

Published: May 10, 2022 License: MIT Imports: 3 Imported by: 31

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CurationLevel

type CurationLevel int

CurationLevel indicates whether a matched result was returned by at least one DataSource from one of the following categories.

const (
	// NotCurated means that all DataSources where the name-string was matched
	// are not curated sufficiently.
	NotCurated CurationLevel = iota

	// AutoCurated means that at least one of the returned DataSources invested
	// significantly in curating their data by scripts.
	AutoCurated

	// Curated means that at least one DataSource is marked as sufficiently
	// curated. It does not mean that the particular match was manually checked
	// though.
	Curated
)

func (CurationLevel) MarshalJSON

func (cl CurationLevel) MarshalJSON() ([]byte, error)

MarshalJSON implements the json.Marshaler interface and converts CurationLevel into a string.

func (CurationLevel) String

func (cl CurationLevel) String() string

func (*CurationLevel) UnmarshalJSON

func (cl *CurationLevel) UnmarshalJSON(bs []byte) error

UnmarshalJSON implements the json.Unmarshaler interface and converts a string into CurationLevel.
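The string-based JSON encoding described above can be sketched with a local stand-in type. This is a minimal illustration, not the package's implementation: the type `curationLevel`, the helper `roundTrip`, and the string spellings in `curationNames` are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// curationLevel is a local stand-in for CurationLevel (hypothetical).
type curationLevel int

const (
	notCurated curationLevel = iota
	autoCurated
	curated
)

// curationNames holds assumed string forms; the real package may spell
// them differently.
var curationNames = map[curationLevel]string{
	notCurated:  "NotCurated",
	autoCurated: "AutoCurated",
	curated:     "Curated",
}

// String returns the assumed string form of the level.
func (cl curationLevel) String() string { return curationNames[cl] }

// MarshalJSON writes the level as a JSON string instead of a number.
func (cl curationLevel) MarshalJSON() ([]byte, error) {
	return json.Marshal(cl.String())
}

// UnmarshalJSON reads a JSON string; unknown strings fall back to notCurated.
func (cl *curationLevel) UnmarshalJSON(bs []byte) error {
	var s string
	if err := json.Unmarshal(bs, &s); err != nil {
		return err
	}
	*cl = notCurated
	for level, name := range curationNames {
		if name == s {
			*cl = level
		}
	}
	return nil
}

// roundTrip encodes a level to JSON and decodes it back.
func roundTrip(cl curationLevel) (string, curationLevel, error) {
	bs, err := json.Marshal(cl)
	if err != nil {
		return "", notCurated, err
	}
	var back curationLevel
	err = json.Unmarshal(bs, &back)
	return string(bs), back, err
}

func main() {
	js, back, err := roundTrip(autoCurated)
	fmt.Println(js, back, err)
}
```

Encoding the level as a string keeps JSON payloads readable and stable even if the numeric order of the constants changes.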

type DataSource

type DataSource struct {
	// ID is a DataSource Id.
	ID int `json:"id"`

	// UUID generated by GlobalNames and associated with the DataSource
	UUID string `json:"uuid,omitempty"`

	// Title is a full title of a DataSource
	Title string `json:"title"`

	// TitleShort is a shortened/abbreviated title of a DataSource.
	TitleShort string `json:"titleShort"`

	// Version of the data-set for a DataSource.
	Version string `json:"version,omitempty"`

	// RevisionDate of a data-set from a data-provider.
	// It follows format of 'YYYY-MM-DD' || 'YYYY-MM' || 'YYYY'
	// This data comes from the information given by the data-provider,
	// while UpdatedAt field is the date of harvesting of the
	// resource.
	RevisionDate string `json:"releaseDate,omitempty"`

	// DOI of a DataSource;
	DOI string `json:"doi,omitempty"`

	// Citation representing a DataSource
	Citation string `json:"citation,omitempty"`

	// Authors associated with the DataSource
	Authors string `json:"authors,omitempty"`

	// Description of the DataSource.
	Description string `json:"description,omitempty"`

	// WebsiteURL is a homepage of a DataSource
	WebsiteURL string `json:"homeURL,omitempty"`

	// OutlinkURL is a template for generating outlink URLs. Verification
	// output will substitute '{}' with an OutlinkID
	OutlinkURL string `json:"-"`

	// IsOutlinkReady is true for data-sources that have enough data and
	// metadata to be recommended for outlinking by third-party applications
	// (be included into data-sources). When false, it does not
	// mean that the original resource is not valuable, it means that
	// its representation at gnames is not complete/recent enough.
	IsOutlinkReady bool `json:"isOutlinkReady,omitempty"`

	// Curation determines how much of manual or programmatic work is put
	// into assuring the quality of the data.
	Curation CurationLevel `json:"curation"`

	// RecordCount tells how many entries are in a DataSource.
	RecordCount int `json:"recordCount"`

	// UpdatedAt is the last import date (YYYY-MM-DD). In contrast,
	// RevisionDate field indicates when the resource was
	// updated according to its data-provider.
	UpdatedAt string `json:"updatedAt"`
}

DataSource provides metadata for an externally collected data-set.
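The OutlinkURL comment says that verification output substitutes '{}' with an OutlinkID. A minimal sketch of that substitution follows; the function name `buildOutlink`, the template URL, and the ID are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// buildOutlink substitutes the first "{}" placeholder in tmpl with id.
// It returns an empty string when the template, the ID, or the
// placeholder is missing. (Hypothetical helper, not part of the package.)
func buildOutlink(tmpl, id string) string {
	if tmpl == "" || id == "" || !strings.Contains(tmpl, "{}") {
		return ""
	}
	return strings.Replace(tmpl, "{}", id, 1)
}

func main() {
	// Made-up template and ID.
	fmt.Println(buildOutlink("https://example.org/taxon/{}", "12345"))
}
```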

type Input added in v0.4.3

type Input struct {
	// NameStrings is a list of name-strings to verify.
	NameStrings []string `json:"nameStrings"`

	// DataSources field contains DataSources IDs to limit results to only these
	// sources. The best result is calculated only out of this limited set of
	// data. By default only the BestResult is shown. To see all results use
	// WithAllMatches flag.
	DataSources []int `json:"dataSources"`

	// WithAllMatches provides all results, instead of only the BestResult.
	// The results are sorted by score, not by data-source. The top result is
	// the best result.
	WithAllMatches bool `json:"withAllMatches"`

	// WithVernaculars indicates if corresponding vernacular results will be
	// returned as well.
	WithVernaculars bool `json:"withVernaculars"`

	// WithCapitalization flag; when true, the first rune of lower-case
	// input name-strings will be capitalized if appropriate.
	WithCapitalization bool `json:"withCapitalization"`

	// WithSpeciesGroup flag; when true, species names also get matched by
	// their species group. It means that the request will take into account
	// botanical autonyms and zoological coordinated names.
	WithSpeciesGroup bool `json:"withSpeciesGroup"`

	// WithStats flag; when true, results will return the most prevalent
	// kingdom for the text, as well as the taxon which contains a given
	// percentage of all names in the text (MainTaxon).
	//
	// For example MainTaxon with the MainTaxonThreshold of 0.5 would correspond
	// to a taxon that contains at least half of all names. We use the managerial
	// classification of Catalogue of Life for the MainTaxon calculation.
	WithStats bool `json:"withStats"`

	// MainTaxonThreshold sets the minimal percentage of names in a taxon
	// to be counted as a MainTaxon of a text. This field is ignored if
	// WithStats is false.
	//
	// MainTaxon is a taxon that contains at least MainTaxonThreshold percentage
	// of all names (genus and below) in the text. We use the managerial
	// classification of Catalogue of Life for the MainTaxon calculation.
	MainTaxonThreshold float32 `json:"mainTaxonThreshold"`
}

Input is options/parameters for the Verify method.
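To show how the JSON tags above shape a request body, here is a small sketch that mirrors the Input fields in a local type and serializes it. The local `Input` copy and the helper `requestBody` are assumptions made so the example runs on its own; the name-string and the data-source ID are arbitrary sample values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Input mirrors the documented fields so the sketch is self-contained.
type Input struct {
	NameStrings        []string `json:"nameStrings"`
	DataSources        []int    `json:"dataSources"`
	WithAllMatches     bool     `json:"withAllMatches"`
	WithVernaculars    bool     `json:"withVernaculars"`
	WithCapitalization bool     `json:"withCapitalization"`
	WithSpeciesGroup   bool     `json:"withSpeciesGroup"`
	WithStats          bool     `json:"withStats"`
	MainTaxonThreshold float32  `json:"mainTaxonThreshold"`
}

// requestBody serializes the input into a JSON string.
// (Hypothetical helper, not part of the package.)
func requestBody(in Input) (string, error) {
	bs, err := json.Marshal(in)
	return string(bs), err
}

func main() {
	in := Input{
		NameStrings:        []string{"Bubo bubo"},
		DataSources:        []int{1}, // arbitrary example ID
		WithStats:          true,
		MainTaxonThreshold: 0.5, // ignored unless WithStats is true
	}
	body, err := requestBody(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```

None of the Input tags use `omitempty`, so flags left at their zero value still appear in the body as `false`.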

type Kingdom added in v0.4.2

type Kingdom struct {
	// KingdomName is the name of a kingdom.
	KingdomName string `json:"kingdomName"`

	// NamesNumber is the number of names found in a kingdom.
	NamesNumber int `json:"namesNumber"`

	// Percentage is a percentage of names found in a kingdom.
	Percentage float32 `json:"percentage"`
}

Kingdom provides statistics of matched names found in a particular kingdom.

type MatchTypeValue

type MatchTypeValue int

MatchTypeValue describes how a name-string matched a name in gnames database.

const (
	// NoMatch means that matching failed.
	NoMatch MatchTypeValue = iota

	// PartialFuzzy is the same as PartialExact, but also the match was not
	// exact. We never do fuzzy matches for uninomials, due to high rate of false
	// positives.
	PartialFuzzy

	// PartialExact is used if GNames failed to match the full name-string. In
	// this case the match happened either by removing middle species epithets
	// or by chopping the 'tail' words of the input name-string canonical form.
	PartialExact

	// Fuzzy means that matches were not exact due to similarity of name-strings,
	// OCR or typing errors. Take these results with more suspicion than
	// Exact matches. Fuzzy match is never done on uninomials due to the
	// high rate of false positives.
	Fuzzy

	// Exact means either canonical form, or the whole name-string matched
	// perfectly.
	Exact

	// ExactSpeciesGroup means that match happened not with the name, but
	// with either an autonym (botany)/coordinated name (zoology) of species,
	// or binomial part of a trinomial.
	ExactSpeciesGroup

	// Virus names are matched in the database. `Virus` is a wide
	// term and includes a variety of non-cellular terms (virus, prion, plasmid,
	// vector etc.)
	Virus

	// FacetedSearch is a match made by search procedure. It does not happen
	// during verification.
	FacetedSearch
)

func NewMatchType

func NewMatchType(t string) MatchTypeValue

NewMatchType takes a string and converts it into a MatchTypeValue. If the string is unknown, it returns NoMatch.

func (MatchTypeValue) MarshalJSON

func (mt MatchTypeValue) MarshalJSON() ([]byte, error)

MarshalJSON implements the json.Marshaler interface and converts MatchTypeValue into a string.

func (MatchTypeValue) String

func (mt MatchTypeValue) String() string

String implements the fmt.Stringer interface and returns a string representation of a MatchTypeValue. The returned string can be converted back to MatchTypeValue via the NewMatchType function.

func (*MatchTypeValue) UnmarshalJSON

func (mt *MatchTypeValue) UnmarshalJSON(bs []byte) error

UnmarshalJSON implements the json.Unmarshaler interface and converts a string into MatchTypeValue.
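The round trip between String and NewMatchType, including the fallback to NoMatch for unknown strings, can be sketched as follows. The local type `matchType`, the function `newMatchType`, and the string spellings in `matchNames` are assumptions for this example; the package may use different strings.

```go
package main

import "fmt"

// matchType is a local stand-in for MatchTypeValue (hypothetical).
type matchType int

const (
	noMatch matchType = iota
	partialFuzzy
	partialExact
	fuzzy
	exact
	exactSpeciesGroup
	virus
	facetedSearch
)

// matchNames lists assumed string forms in the order of the constants.
var matchNames = []string{
	"NoMatch", "PartialFuzzy", "PartialExact", "Fuzzy",
	"Exact", "ExactSpeciesGroup", "Virus", "FacetedSearch",
}

// String returns the assumed string form; out-of-range values are
// reported as the noMatch string.
func (mt matchType) String() string {
	if mt < 0 || int(mt) >= len(matchNames) {
		return matchNames[noMatch]
	}
	return matchNames[mt]
}

// newMatchType converts a string back into a matchType and returns
// noMatch when the string is unknown.
func newMatchType(s string) matchType {
	for i, name := range matchNames {
		if name == s {
			return matchType(i)
		}
	}
	return noMatch
}

func main() {
	fmt.Println(newMatchType(fuzzy.String()) == fuzzy)
	fmt.Println(newMatchType("no such type"))
}
```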

type Meta added in v0.4.1

type Meta struct {
	// NamesNumber is the number of name-strings in the request.
	NamesNumber int `json:"namesNumber"`

	// WithAllSources indicates if `Results` will include all matched
	// sources.
	WithAllSources bool `json:"withAllSources,omitempty"`

	// WithAllMatches indicates if response provides more than one result
	// per source, if such results were found.
	WithAllMatches bool `json:"withAllMatches,omitempty"`

	// WithStats indicates that the kingdom and a taxon that contain
	// majority of names (MainTaxon) will be calculated.
	WithStats bool `json:"withStats,omitempty"`

	// WithCapitalization is true, if there was a request to capitalize input
	WithCapitalization bool `json:"withCapitalization,omitempty"`

	// WithSpeciesGroup is true, if Input included `WithSpeciesGroup` option.
	WithSpeciesGroup bool `json:"withSpeciesGroup,omitempty"`

	// DataSources provides IDs of data-sources from the request.
	DataSources []int `json:"dataSources,omitempty"`

	// MainTaxonThreshold provides a minimal percentage of names that a taxon should
	// have to be qualified as a MainTaxon.
	MainTaxonThreshold float32 `json:"mainTaxonThreshold,omitempty"`

	// StatsNamesNum is the number of names qualified for MainTaxon/Kingdoms
	// calculation.
	StatsNamesNum int `json:"statsNamesNum,omitempty"`

	// MainTaxon provides the lowest taxon that contains most of the names from
	// the request.
	//
	// Non-matched names, names that are not in the Catalogue of Life, names
	// higher than genus are not part of the calculation.
	MainTaxon string `json:"mainTaxon,omitempty"`

	// MainTaxonPercentage indicates the percentage of names that are placed
	// in the MainTaxon. This number should be higher than
	// MainTaxonThreshold unless MainTaxon is empty.
	MainTaxonPercentage float32 `json:"mainTaxonPercentage,omitempty"`

	// Kingdom provides what kingdom includes the majority of names from the
	// request according to the managerial classification of Catalogue of Life.
	//
	// Non-matched names, or names that are not in Catalogue of Life are
	// not part of the calculation.
	Kingdom string `json:"kingdom,omitempty"`

	// KingdomPercentage provides the percentage of names in the most
	// prevalent kingdom.
	//
	// Non-matched names, or names that are not in Catalogue of Life are
	// not part of the calculation.
	KingdomPercentage float32 `json:"kingdomPercentage,omitempty"`

	// Kingdoms provides all kingdoms with matched names and names distribution
	// between the kingdoms.
	Kingdoms []Kingdom `json:"kingdoms,omitempty"`
}

Meta is metadata of the request. It provides information about the parameters used for the request and, optionally, about the kingdom that contains most of the names from the request, as well as the lowest taxon that contains the majority of the names.
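The idea of a MainTaxon, the lowest taxon that holds at least a threshold share of the names, can be illustrated with a short sketch. This is not the package's algorithm. It assumes each name comes with a root-to-leaf classification path given as a slice of strings, and the function `mainTaxon` and the sample paths are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// mainTaxon returns the deepest taxon that contains at least the given
// fraction of all paths, together with that fraction. It returns an
// empty name when no taxon qualifies. (Illustrative sketch only.)
func mainTaxon(paths [][]string, threshold float64) (string, float64) {
	if len(paths) == 0 {
		return "", 0
	}
	type node struct {
		name  string
		depth int
		count int
	}
	// Count how many paths pass through every path prefix.
	nodes := make(map[string]*node)
	for _, p := range paths {
		for i := range p {
			key := strings.Join(p[:i+1], "|")
			if nodes[key] == nil {
				nodes[key] = &node{name: p[i], depth: i}
			}
			nodes[key].count++
		}
	}
	total := float64(len(paths))
	var best *node
	for _, n := range nodes {
		if float64(n.count)/total < threshold {
			continue
		}
		// Prefer deeper taxa, then larger counts, then name order, so the
		// result does not depend on map iteration order.
		if best == nil || n.depth > best.depth ||
			(n.depth == best.depth && n.count > best.count) ||
			(n.depth == best.depth && n.count == best.count && n.name < best.name) {
			best = n
		}
	}
	if best == nil {
		return "", 0
	}
	return best.name, float64(best.count) / total
}

func main() {
	paths := [][]string{
		{"Animalia", "Chordata", "Aves"},
		{"Animalia", "Chordata", "Aves"},
		{"Animalia", "Chordata", "Mammalia"},
		{"Plantae", "Tracheophyta"},
	}
	name, share := mainTaxon(paths, 0.6)
	fmt.Println(name, share)
}
```

With a threshold of 0.6 only Animalia and Chordata hold enough names (3 of 4), and Chordata is the deeper of the two, so it is reported.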

type Name added in v0.4.1

type Name struct {
	// ID is a UUIDv5 generated out of the Input string.
	ID string `json:"id"`

	// Name is a verified name-string
	Name string `json:"name"`

	// Cardinality is the cardinality of input name:
	// 0 - No match, virus or hybrid formula,
	// 1 - Uninomial, 2 - Binomial, 3 - Trinomial etc.
	Cardinality int `json:"cardinality"`

	// MatchType is best available match.
	MatchType MatchTypeValue `json:"matchType"`

	// BestResult is the best result according to GNames scoring.
	BestResult *ResultData `json:"bestResult,omitempty"`

	// Results contain all detected matches from preferred data sources
	// provided by user.
	Results []*ResultData `json:"results,omitempty"`

	// DataSourcesNum is a number of data sources that matched an
	// input name-string.
	DataSourcesNum int `json:"dataSourcesNum,omitempty"`

	// DataSourcesIDs is a list of ids of all data-sources with a match.
	DataSourcesIDs []int `json:"dataSourcesIds,omitempty"`

	// Curation estimates reliability of matched data sources. It indicates
	// whether matches are returned by at least one manually curated data
	// source, by an automatically curated data source, or only by sources
	// that are not significantly curated.
	Curation CurationLevel `json:"curation"`

	// OverloadDetected might be triggered if a virus name or a canonical name
	// contain many variations and/or strains. In this case not all data are
	// queried.
	OverloadDetected string `json:"overloadDetected,omitempty"`

	// Error provides an error message, if any. If error is not empty, the match
	// failed because of a bug in the service.
	Error string `json:"error,omitempty"`
}

Name is a result of verification of one name-string from the input.

func (Name) Taxons added in v0.12.0

func (n Name) Taxons() []stats.Taxon

type Output added in v0.5.2

type Output struct {
	// Meta is the metadata of the request results.
	Meta `json:"metadata"`
	// Names are results of name-verification.
	Names []Name `json:"names"`
}

Output is a result returned by Verify method.
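A consumer of the JSON form of Output usually needs to handle three cases per name: an error, no match (BestResult is absent), and a match. The sketch below decodes a made-up payload into trimmed-down local copies of the types. The local types, the helper `summarize`, and the `sample` payload (including the data-source title "Example Source") are assumptions for illustration, and MatchType is read as a plain string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed local copies of the documented types (illustrative only).
type ResultData struct {
	DataSourceTitleShort string `json:"dataSourceTitleShort"`
	MatchedName          string `json:"matchedName"`
}

type Name struct {
	Name       string      `json:"name"`
	MatchType  string      `json:"matchType"`
	BestResult *ResultData `json:"bestResult,omitempty"`
	Error      string      `json:"error,omitempty"`
}

type Meta struct {
	NamesNumber int `json:"namesNumber"`
}

type Output struct {
	Meta  `json:"metadata"`
	Names []Name `json:"names"`
}

// sample is a made-up payload, not real service output.
const sample = `{
  "metadata": {"namesNumber": 2},
  "names": [
    {"name": "Bubo bubo", "matchType": "Exact",
     "bestResult": {"dataSourceTitleShort": "Example Source",
                    "matchedName": "Bubo bubo (Linnaeus, 1758)"}},
    {"name": "Abcd efgh", "matchType": "NoMatch"}
  ]
}`

// summarize decodes the payload and returns one line per name.
func summarize(js []byte) ([]string, error) {
	var out Output
	if err := json.Unmarshal(js, &out); err != nil {
		return nil, err
	}
	lines := make([]string, 0, len(out.Names))
	for _, n := range out.Names {
		switch {
		case n.Error != "":
			lines = append(lines, n.Name+": error: "+n.Error)
		case n.BestResult == nil:
			lines = append(lines, n.Name+": no match")
		default:
			r := n.BestResult
			lines = append(lines, n.Name+": "+r.MatchedName+" ["+r.DataSourceTitleShort+"]")
		}
	}
	return lines, nil
}

func main() {
	lines, err := summarize([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Because BestResult is a pointer tagged with `omitempty`, a name without a match decodes to a nil pointer, so the nil check has to come before any field access.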

type ResultData

type ResultData struct {
	// DataSourceID is the ID of a matched DataSource.
	DataSourceID int `json:"dataSourceId"`

	// Shortened/abbreviated title of the data source.
	DataSourceTitleShort string `json:"dataSourceTitleShort"`

	// Curation of the data source.
	Curation CurationLevel `json:"curation"`

	// RecordID from a data source. We try our best to return ID that corresponds to
	// dwc:taxonID of a DataSource. If such ID is not provided, this ID will be
	// auto-generated.  Auto-generated IDs will have 'gn_' prefix.
	RecordID string `json:"recordId"`

	// GlobalID that is exposed globally by a DataSource. Such IDs are usually
	// self-resolved, like for example LSID, pURL, DOI etc.
	GlobalID string `json:"globalId,omitempty"`

	// LocalID used by a DataSource internally. If an OutLink field is provided,
	// LocalID serves as a 'dynamic' component of the URL.
	LocalID string `json:"localId,omitempty"`

	// Outlink to the record in the DataSource. It consists of a 'stable'
	// URL and an appended 'dynamic' LocalID
	Outlink string `json:"outlink,omitempty"`

	// EntryDate is a timestamp created on entry of the data.
	EntryDate string `json:"entryDate"`

	// SortScore is a numeric representation of the whole score.
	// It can be used to find the BestMatch overall, as well as the
	// best match for every data-source.
	//
	// SortScore takes data from all other scores, using the priority
	// sequence from highest to lowest: InfraSpecificRankScore, FuzzyLessScore,
	// CuratedDataScore, AuthorMatchScore, AcceptedNameScore,
	// ParsingQualityScore. Every highest priority trumps everything below.
	// When the final score value is calculated, it is used to
	// sort verification or search results.
	//
	// Comparing this score between results of different verifications will
	// not necessarily be accurate. The score is used for comparison of names
	// from the same result.
	SortScore float64 `json:"sortScore"`

	// ParsingQuality determines how well gnparser was able to break the
	// name-string to its components. 0 - no parse, 1 - clean parse,
	// 2 - some problems, 3 - significant problems.
	ParsingQuality int `json:"-"`

	// MatchedName is a name-string from the DataSource that was matched
	// by GNames algorithm.
	MatchedName string `json:"matchedName"`

	// MatchedCardinality is the cardinality of returned name:
	// 0 - No match, virus or hybrid formula,
	// 1 - Uninomial, 2 - Binomial, 3 - trinomial etc.
	MatchedCardinality int `json:"matchedCardinality"`

	// MatchedCanonicalSimple is a simplified canonical form without ranks for
	// names lower than species, and with omitted hybrid signs for named hybrids.
	// Quite often simple canonical is the same as full canonical. Hybrid signs
	// are preserved for hybrid formulas.
	MatchedCanonicalSimple string `json:"matchedCanonicalSimple,omitempty"`

	// MatchedCanonicalFull is a canonical form that preserves hybrid signs
	// and infraspecific ranks.
	MatchedCanonicalFull string `json:"matchedCanonicalFull,omitempty"`

	// MatchedAuthors is a list of authors mentioned in the name.
	MatchedAuthors []string `json:"-"`

	// MatchedYear is a year mentioned in the name. Multiple years or
	// approximate years are ignored.
	MatchedYear int `json:"-"`

	// CurrentRecordID is the id of currently accepted name given by
	// the data-source.
	CurrentRecordID string `json:"currentRecordId"`

	// CurrentName is a currently accepted name (it is only provided by
	// DataSources with taxonomic data).
	CurrentName string `json:"currentName"`

	// CurrentCardinality is a cardinality of the accepted name.
	// It might differ from the matched name cardinality.
	CurrentCardinality int `json:"currentCardinality"`

	// CurrentCanonicalSimple is a canonical form for the currently accepted name.
	CurrentCanonicalSimple string `json:"currentCanonicalSimple"`

	// CurrentCanonicalFull is a full version of canonical form for the
	// currently accepted name.
	CurrentCanonicalFull string `json:"currentCanonicalFull"`

	// IsSynonym is true if there is an indication in the DataSource that the
	// name is not a currently accepted name for one or another reason.
	IsSynonym bool `json:"isSynonym"`

	// ClassificationPath to the name (if provided by the DataSource).
	// Classification path consists of a hierarchy of name-strings.
	ClassificationPath string `json:"classificationPath,omitempty"`

	// ClassificationRanks of the classification path. They follow the
	// same order as the classification path.
	ClassificationRanks string `json:"classificationRanks,omitempty"`

	// ClassificationIDs of the names-strings. They always correspond to
	// the "id" field.
	ClassificationIDs string `json:"classificationIds,omitempty"`

	// EditDistance is a Levenshtein edit distance between canonical form of the
	// input name-string and the matched canonical form. If match type is
	// "EXACT", edit-distance will be 0.
	EditDistance int `json:"editDistance"`

	// StemEditDistance is a Levenshtein edit distance after removing suffixes
	// from specific epithets from canonical forms.
	StemEditDistance int `json:"stemEditDistance"`

	// MatchType describes what kind of a match happened to a name-string.
	MatchType MatchTypeValue `json:"matchType"`

	// ScoreDetails provides data about matching of authors, year, rank,
	// parsingQuality...
	ScoreDetails `json:"scoreDetails"`

	// Vernacular names that correspond to the matched name. (Will be implemented
	// later)
	Vernaculars []Vernacular `json:"vernaculars,omitempty"`
}

ResultData is the data returned as the `BestResult` or in the `Results` of name verification.
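EditDistance is described as a Levenshtein edit distance between canonical forms. For readers unfamiliar with the measure, here is a textbook sketch of it. The function `levenshtein` is a generic implementation written for this example; the matching code behind GNames may compute the distance differently.

```go
package main

import "fmt"

// min3 returns the smallest of three integers.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the number of single-rune insertions, deletions,
// and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from an empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func main() {
	fmt.Println(levenshtein("Bubo bubo", "Bubo bubo"))
	fmt.Println(levenshtein("Bubo bubo", "Bubo bobo"))
}
```

Identical canonical forms give a distance of 0, which is consistent with the note that an "EXACT" match has an edit distance of 0.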

type ScoreDetails added in v0.6.5

type ScoreDetails struct {
	// CardinalityScore is 1 when cardinality of input name and match name
	// match and neither cardinality is 0. In all other cases this score
	// equals 0.
	CardinalityScore float32 `json:"cardinalityScore"`

	// InfraSpecificRankScore matches infraspecific rank. For example if a
	// query name is `Aus bus var. cus`, and the match has the same rank,
	// this field is 1.
	InfraSpecificRankScore float32 `json:"infraSpecificRankScore"`

	// FuzzyLessScore scores edit distance for fuzzy matching. If edit distance
	// is 0 the score is maxed to 1.
	FuzzyLessScore float32 `json:"fuzzyLessScore"`

	// CuratedDataScore scores highest if the matched data-source is known for
	// having a significant manual curation effort of the data.
	CuratedDataScore float32 `json:"curatedDataScore"`

	// AuthorMatchScore tries to match authors and years in the name. If
	// a year and all authors match, the score is 1.
	AuthorMatchScore float32 `json:"authorMatchScore"`

	// AcceptedNameScore is a binary field, if matched name is also currently
	// accepted name according to the data-source, the value is 1.
	AcceptedNameScore float32 `json:"acceptedNameScore"`

	// ParsingQualityScore is the highest for matched names that were parsed
	// without any problems.
	ParsingQualityScore float32 `json:"parsingQualityScore"`
}

ScoreDetails explains how results are sorted and why a particular result was selected as the `BestResult`. The score for every item is normalized to a range from 0 to 1, where 0 means there was no match by the factor and 1 means a "perfect" match by the item. Fields located higher on the list have more weight than lower fields: lower fields are taken into account only if higher fields provide equal values. For all scores 1 is the best, 0 is the worst.
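The rule that lower fields matter only when higher fields are equal is a lexicographic comparison. The sketch below shows that comparison over a local copy of the struct, using the field order listed in ScoreDetails. The type `scores` and the helpers `ordered`, `better`, and `sortByScores` are hypothetical. The package itself condenses the scores into the single SortScore number, so this only illustrates the priority idea.

```go
package main

import (
	"fmt"
	"sort"
)

// scores is a local copy of the ScoreDetails fields (illustrative only).
type scores struct {
	CardinalityScore       float32
	InfraSpecificRankScore float32
	FuzzyLessScore         float32
	CuratedDataScore       float32
	AuthorMatchScore       float32
	AcceptedNameScore      float32
	ParsingQualityScore    float32
}

// ordered returns the scores from highest to lowest priority.
func (s scores) ordered() []float32 {
	return []float32{
		s.CardinalityScore, s.InfraSpecificRankScore, s.FuzzyLessScore,
		s.CuratedDataScore, s.AuthorMatchScore, s.AcceptedNameScore,
		s.ParsingQualityScore,
	}
}

// better reports whether a outranks b: the first field that differs
// decides, and lower fields are consulted only on ties.
func better(a, b scores) bool {
	as, bs := a.ordered(), b.ordered()
	for i := range as {
		if as[i] != bs[i] {
			return as[i] > bs[i]
		}
	}
	return false
}

// sortByScores orders results from best to worst, keeping ties stable.
func sortByScores(res []scores) {
	sort.SliceStable(res, func(i, j int) bool { return better(res[i], res[j]) })
}

func main() {
	res := []scores{
		{CardinalityScore: 1, FuzzyLessScore: 0.5, AuthorMatchScore: 1},
		{CardinalityScore: 1, FuzzyLessScore: 1},
	}
	sortByScores(res)
	fmt.Println(res[0].FuzzyLessScore)
}
```

In the example the second result wins despite having no author match, because FuzzyLessScore sits higher in the list than AuthorMatchScore.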

type Vernacular

type Vernacular struct {
	Name string `json:"name"`

	// Language of the name, hopefully in ISO form.
	Language string `json:"language,omitempty"`

	// Locality is geographic places where the name is used.
	Locality string `json:"locality,omitempty"`
}

Vernacular name
