wikidata

package
v1.10.0
Warning

This package is not in the latest version of its module.

Published: Mar 24, 2023 License: Apache-2.0 Imports: 20 Imported by: 0

Documentation

Overview

Package wikidata contains most of the functions needed to build a Wikidata identifier (a compiled signature file) compatible with Siegfried, and most of the functions Siegfried needs to consume that same identifier. It does this by implementing Siegfried's Identifier and Parseable interfaces.

Constants

This section is empty.

Variables

var ErrNoEndpoint = errors.New("Endpoint in custom Wikibase sparql results not set")

ErrNoEndpoint is the error callers can check for when the custom SPARQL endpoint cannot be read from the harvest data.
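As a minimal sketch of how a caller might test for this error, the example below uses errors.Is against a sentinel value. To stay self-contained it declares a local stand-in, errNoEndpoint, instead of importing the package; loadHarvest and isMissingEndpoint are hypothetical helpers, and whether the package wraps the sentinel with extra context is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoEndpoint is a local stand-in for wikidata.ErrNoEndpoint so that this
// sketch compiles without importing Siegfried.
var errNoEndpoint = errors.New("Endpoint in custom Wikibase sparql results not set")

// loadHarvest is a hypothetical helper. It always fails, returning the
// sentinel wrapped with extra context (an assumption about how a caller
// might receive it).
func loadHarvest(path string) error {
	return fmt.Errorf("reading %s: %w", path, errNoEndpoint)
}

// isMissingEndpoint reports whether err is, or wraps, the sentinel.
func isMissingEndpoint(err error) bool {
	return errors.Is(err, errNoEndpoint)
}

func main() {
	err := loadHarvest("harvest.json")
	if isMissingEndpoint(err) {
		fmt.Println("no custom endpoint configured:", err)
	}
}
```

Comparing with errors.Is rather than == keeps the check working if the error is wrapped on its way up the call stack.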

Functions

func GetBOFandEOFFromConfig added in v1.9.2

func GetBOFandEOFFromConfig()

GetBOFandEOFFromConfig will read the current value of the BOF/EOF properties from the configuration, e.g. after being updated using a custom SPARQL query.

func GetPronomURIFromConfig added in v1.9.2

func GetPronomURIFromConfig()

GetPronomURIFromConfig will read the current value of the PRONOM properties from the configuration, e.g. after being updated using a custom SPARQL query.

func Load

func Load(ls *persist.LoadSaver) core.Identifier

Load reads back into memory, from the signature file, the same information that Save wrote to it.

func New

func New(opts ...config.Option) (core.Identifier, error)

New is the entry point for an Identifier when it is compiled by the Roy tool to a brand new signature file.

New will read a Wikidata report, and parse its information into structures suitable for compilation by Roy.

New also updates its identification information with provenance-like information, allows the utility to add signature extensions, and allows configuration to be applied.

Types

type ByteSequence

type ByteSequence = mappings.ByteSequence

ByteSequence provides an alias for the mappings.ByteSequence object.

type Identification

type Identification struct {
	Namespace string   // Namespace of the identifier, e.g. this will be the 'wikidata' namespace.
	ID        string   // QID of the file format according to Wikidata.
	Name      string   // Complete name of the format identification. Often includes version.
	LongName  string   // IRI of the Wikidata record.
	MIME      string   // MIME types associated with the record.
	Basis     []string // Basis for the result returned by Siegfried.
	Source    []string // Provenance information associated with the result.
	Permalink string   // Permalink from the Wikibase record used to build the signature definition.
	Warning   string   // Warnings generated by Siegfried.
	// contains filtered or unexported fields
}

Identification contains the result of a single ID for a file. There may be multiple identifications per file. To the user, an identification looks something like the following:

  ns      : 'wikidata'
  id      : 'Q1343830'
  format  : 'Executable and Linkable Format'
  URI     : 'http://www.wikidata.org/entity/Q1343830'
  mime    :
  basis   : 'byte match at 0, 4 (signature 1/5); byte match at 0, 7 (signature 4/5)'
  source  : 'Gary Kessler''s File Signature Table (source date: 2017-08-08) PRONOM (Official (fmt/689))'
  warning :

func (Identification) Archive

func (id Identification) Archive() config.Archive

Archive reports whether the identification matches a format considered to be an archive, so that the archive can be extracted and its contents identified.

func (Identification) Known

func (id Identification) Known() bool

Known returns true if the ID is recognized and false otherwise.

func (Identification) String

func (id Identification) String() string

String creates a human-readable representation of an identifier for output by fmt-like functions.

func (Identification) Values

func (id Identification) Values() []string

Values returns a string slice containing each of the identifier segments.
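The following is a rough sketch of what flattening an identification into segments could look like. It uses a local mirror of the struct rather than the real type, and the segment order, the "; " separator for Basis, and the space separator for Source are assumptions inferred from the sample output shown for Identification, not the package's confirmed behaviour. Permalink is left out because it does not appear in that sample.

```go
package main

import (
	"fmt"
	"strings"
)

// identification is a local mirror of some exported fields of
// wikidata.Identification, declared here so the sketch compiles on its own.
type identification struct {
	Namespace string
	ID        string
	Name      string
	LongName  string
	MIME      string
	Basis     []string
	Source    []string
	Warning   string
}

// values returns one string per output segment. The order and the
// separators are assumptions taken from the sample output above.
func (id identification) values() []string {
	return []string{
		id.Namespace,
		id.ID,
		id.Name,
		id.LongName,
		id.MIME,
		strings.Join(id.Basis, "; "),
		strings.Join(id.Source, " "),
		id.Warning,
	}
}

func main() {
	id := identification{
		Namespace: "wikidata",
		ID:        "Q1343830",
		Name:      "Executable and Linkable Format",
		LongName:  "http://www.wikidata.org/entity/Q1343830",
		Basis:     []string{"byte match at 0, 4 (signature 1/5)", "byte match at 0, 7 (signature 4/5)"},
	}
	for _, segment := range id.values() {
		fmt.Printf("%q\n", segment)
	}
}
```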

func (Identification) Warn

func (id Identification) Warn() string

Warn returns the warning associated with an identification.

type Identifier

type Identifier struct {
	*identifier.Base
	// contains filtered or unexported fields
}

Identifier contains a set of Wikidata records and an implementation of the identifier interface for consuming them.

func (*Identifier) Fields

func (i *Identifier) Fields() []string

Fields describes a portion of YAML that will be output by Siegfried's identifier for an individual match. E.g.

matches  :
  - ns      : 'wikidata'
    id      : 'Q475488'
    format  : 'EPUB'
    ...     : '...'
    ...     : '...'
    custom  : 'your custom field'
    custom  : '...'

siegfried/pkg/writer/writer.go normalizes the output of this field grouping: when it sees certain fields, it converts them to the form the consumer expects, e.g. namespace becomes ns.
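The normalization described above can be pictured as a simple lookup. This is only an illustrative sketch: shortNames and normalize are hypothetical names, and the only mapping taken from the documentation is namespace to ns. It is not the code in writer.go.

```go
package main

import "fmt"

// shortNames is a hypothetical lookup table. Only the namespace => ns pair
// comes from the documentation; a real writer may know more fields.
var shortNames = map[string]string{
	"namespace": "ns",
}

// normalize returns a copy of fields in which known long names are replaced
// by their short form. Unknown names, such as custom fields, pass through
// unchanged.
func normalize(fields []string) []string {
	out := make([]string, len(fields))
	for i, f := range fields {
		if short, ok := shortNames[f]; ok {
			out[i] = short
			continue
		}
		out[i] = f
	}
	return out
}

func main() {
	fmt.Println(normalize([]string{"namespace", "id", "format", "custom"}))
}
```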

func (*Identifier) Recorder

func (i *Identifier) Recorder() core.Recorder

Recorder provides a recorder for matching.

func (*Identifier) Save

func (i *Identifier) Save(ls *persist.LoadSaver)

Save will write a Wikidata identifier to the Siegfried signature file using the persist package to save primitives in the identifier's data structure.

type Recorder

type Recorder struct {
	*Identifier
	// contains filtered or unexported fields
}

Recorder embeds an Identifier and collects match results for it.

func (*Recorder) Active

func (recorder *Recorder) Active(matcher core.MatcherType)

Active tells the recorder which matchers are active, which helps when providing a detailed response to the caller.

func (*Recorder) Record

func (recorder *Recorder) Record(matcher core.MatcherType, result core.Result) bool

Record will build the possible result sets associated with an identification.

func (*Recorder) Report

func (recorder *Recorder) Report() []core.Identification

Report organizes the identification output so that the highest priority results are output first.
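One way to picture "highest priority first" is a stable descending sort, sketched below. The result type and its Priority field are hypothetical stand-ins; the real method returns core.Identification values, and how the package actually ranks them is not stated here.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a hypothetical stand-in for one identification. Priority is an
// assumed field in which a larger number means a more important result.
type result struct {
	ID       string
	Priority int
}

// report returns a copy of results ordered from highest to lowest priority.
// The sort is stable, so results with equal priority keep their input order.
func report(results []result) []result {
	out := append([]result(nil), results...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Priority > out[j].Priority
	})
	return out
}

func main() {
	ordered := report([]result{
		{ID: "Q1", Priority: 1},
		{ID: "Q2", Priority: 3},
		{ID: "Q3", Priority: 3},
	})
	for _, r := range ordered {
		fmt.Println(r.ID, r.Priority)
	}
}
```

Sorting a copy leaves the caller's slice untouched, which keeps the sketch free of side effects.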

func (*Recorder) Satisfied

func (recorder *Recorder) Satisfied(mt core.MatcherType) (bool, core.Hint)

Satisfied is drawn from the PRONOM identifier and tells us whether or not we should continue with any particular matcher...

type Signature

type Signature = mappings.Signature

Signature provides an alias for mappings.Signature for convenience.

type Summary

type Summary struct {
	AllSparqlResults               int      // All rows of data returned from our SPARQL request.
	CondensedSparqlResults         int      // All unique records once the SPARQL is processed.
	SparqlRowsWithSigs             int      // All SPARQL rows with signatures (SPARQL necessarily returns duplicates).
	RecordsWithPotentialSignatures int      // Records that have signatures that can be processed.
	FormatsWithBadHeuristics       int      // Formats that have bad heuristics that we can't process.
	RecordsWithSignatures          int      // Records remaining that were processed.
	MultipleSequences              int      // Records that have been parsed out into multiple signatures per record.
	AllLintingMessages             []string // All linting messages returned.
	AllLintingMessageCount         int      // Count of all linting messages output.
	RecordCountWithLintingMessages int      // A count of the records that have linting messages to investigate.
}

Summary of the identifier once processed.

func (Summary) String

func (summary Summary) String() string

String will serialize the summary report as JSON to be printed.
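A minimal sketch of a JSON-serializing String method follows. It uses a local summary type that mirrors three of the fields above; the two-space indentation and the empty-string fallback on error are assumptions, not documented behaviour of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// summary mirrors three fields of wikidata.Summary so that the sketch
// compiles without importing Siegfried.
type summary struct {
	AllSparqlResults       int
	CondensedSparqlResults int
	AllLintingMessageCount int
}

// String serializes the summary as indented JSON. The indentation and the
// empty-string fallback on error are assumptions made for this sketch.
func (s summary) String() string {
	b, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// fmt.Println calls String because summary implements fmt.Stringer.
	fmt.Println(summary{AllSparqlResults: 10, CondensedSparqlResults: 4})
}
```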

Directories

Path Synopsis
internal
  converter: Convert file-format signature sequences to something compatible with Siegfried's identifiers.
  mappings: Package mappings provides data structures and helpers that describe Wikidata signature resources that we want to work with.
