Documentation ¶
Overview ¶
Package wikidata contains most of the functions needed to build a Wikidata identifier (a compiled signature file) compatible with Siegfried. It also contains most of the functions Siegfried needs to consume that same identifier. Both are made possible by implementing Siegfried's Identifier and Parseable interfaces.
Constants ¶
This section is empty.
Variables ¶
var ErrNoEndpoint = errors.New("Endpoint in custom Wikibase sparql results not set")
ErrNoEndpoint is the error this package reports when the custom SPARQL endpoint cannot be read from the harvest data. Callers can compare a received error against it to detect that condition.
Functions ¶
func GetBOFandEOFFromConfig ¶ added in v1.9.2
func GetBOFandEOFFromConfig()
GetBOFandEOFFromConfig will read the current value of the BOF/EOF properties from the configuration, e.g. after being updated using a custom SPARQL query.
func GetPronomURIFromConfig ¶ added in v1.9.2
func GetPronomURIFromConfig()
GetPronomURIFromConfig will read the current value of the PRONOM properties from the configuration, e.g. after being updated using a custom SPARQL query.
func Load ¶
func Load(ls *persist.LoadSaver) core.Identifier
Load reads back into memory, from the signature file, the same information that was written to the file using Save().
func New ¶
func New(opts ...config.Option) (core.Identifier, error)
New is the entry point for an Identifier when it is compiled by the Roy tool to a brand new signature file.
New will read a Wikidata report, and parse its information into structures suitable for compilation by Roy.
New will also update its identification information with provenance-like information. It allows the utility to add signature extensions and allows configuration to be applied.
Types ¶
type ByteSequence ¶
type ByteSequence = mappings.ByteSequence
ByteSequence provides an alias for the mappings.ByteSequence object.
type Identification ¶
type Identification struct {
    Namespace string   // Namespace of the identifier, e.g. this will be the 'wikidata' namespace.
    ID        string   // QID of the file format according to Wikidata.
    Name      string   // Complete name of the format identification. Often includes version.
    LongName  string   // IRI of the Wikidata record.
    MIME      string   // MIMEtypes associated with the record.
    Basis     []string // Basis for the result returned by Siegfried.
    Source    []string // Provenance information associated with the result.
    Permalink string   // Permalink from the Wikibase record used to build the signature definition.
    Warning   string   // Warnings generated by Siegfried.
    // contains filtered or unexported fields
}
Identification contains the result of a single ID for a file. There may be multiple identifications per file. The identification presented to the user looks something like the following:
- ns      : 'wikidata'
  id      : 'Q1343830'
  format  : 'Executable and Linkable Format'
  URI     : 'http://www.wikidata.org/entity/Q1343830'
  mime    :
  basis   : 'byte match at 0, 4 (signature 1/5); byte match at 0, 7 (signature 4/5)'
  source  : 'Gary Kessler''s File Signature Table (source date: 2017-08-08) PRONOM (Official (fmt/689))'
  warning :
func (Identification) Archive ¶
func (id Identification) Archive() config.Archive
Archive should tell us if any identifiers match those considered to be an archive format so that they can be extracted and the contents identified.
func (Identification) Known ¶
func (id Identification) Known() bool
Known returns true if the ID is recognized and false otherwise.
func (Identification) String ¶
func (id Identification) String() string
String creates a human readable representation of an identifier for output by fmt-like functions.
func (Identification) Values ¶
func (id Identification) Values() []string
Values returns a string slice containing each of the identifier segments.
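As an illustration of what flattening an identification into segments might look like, the sketch below uses a local stand-in struct that mirrors some of the exported fields listed above. The segment order and the "; " separator used to join Basis and Source are assumptions (the basis separator is inferred from the sample output); the package's actual Values implementation may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// identification is a local stand-in mirroring some exported fields of
// wikidata.Identification. It is not the package's own type.
type identification struct {
	Namespace string
	ID        string
	Name      string
	LongName  string
	MIME      string
	Basis     []string
	Source    []string
	Warning   string
}

// values flattens the struct into one string per output segment. The field
// order and the "; " separator are assumptions made for this sketch.
func (id identification) values() []string {
	return []string{
		id.Namespace,
		id.ID,
		id.Name,
		id.LongName,
		id.MIME,
		strings.Join(id.Basis, "; "),
		strings.Join(id.Source, "; "),
		id.Warning,
	}
}

func main() {
	id := identification{
		Namespace: "wikidata",
		ID:        "Q1343830",
		Name:      "Executable and Linkable Format",
		LongName:  "http://www.wikidata.org/entity/Q1343830",
		Basis: []string{
			"byte match at 0, 4 (signature 1/5)",
			"byte match at 0, 7 (signature 4/5)",
		},
	}
	// Print each segment quoted so that empty segments stay visible.
	for _, v := range id.values() {
		fmt.Printf("%q\n", v)
	}
}
```

Keeping empty segments in the slice, rather than dropping them, means each position always corresponds to the same field.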
func (Identification) Warn ¶
func (id Identification) Warn() string
Warn returns the warning associated with an identification.
type Identifier ¶
type Identifier struct {
    *identifier.Base
    // contains filtered or unexported fields
}
Identifier contains a set of Wikidata records and an implementation of the identifier interface for consuming them.
func (*Identifier) Fields ¶
func (i *Identifier) Fields() []string
Fields describes a portion of YAML that will be output by Siegfried's identifier for an individual match. E.g.
matches :
  - ns     : 'wikidata'
    id     : 'Q475488'
    format : 'EPUB'
    ...    : '...'
    ...    : '...'
    custom : 'your custom field'
    custom : '...'
siegfried/pkg/writer/writer.go normalizes the output of this field grouping: when it sees certain fields, it converts them to the names the consumer expects, e.g. namespace becomes ns.
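The renaming described above can be sketched as a simple lookup. This is an illustration only, not the code in writer.go: the shortNames table and the normalize function are hypothetical, and only the namespace to ns pair comes from the documentation.

```go
package main

import "fmt"

// shortNames maps long field names to the short keys seen in the output.
// Only the namespace => ns pair is documented; any further entry would be
// an assumption.
var shortNames = map[string]string{
	"namespace": "ns",
}

// normalize returns a copy of fields with known long names shortened and
// all other names left untouched.
func normalize(fields []string) []string {
	out := make([]string, len(fields))
	for i, f := range fields {
		if short, ok := shortNames[f]; ok {
			out[i] = short
			continue
		}
		out[i] = f
	}
	return out
}

func main() {
	// prints [ns id format custom]
	fmt.Println(normalize([]string{"namespace", "id", "format", "custom"}))
}
```

Unknown names pass through unchanged, which is how a custom field would survive normalization in this sketch.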
func (*Identifier) Recorder ¶
func (i *Identifier) Recorder() core.Recorder
Recorder provides a recorder for matching.
func (*Identifier) Save ¶
func (i *Identifier) Save(ls *persist.LoadSaver)
Save will write a Wikidata identifier to the Siegfried signature file using the persist package to save primitives in the identifier's data structure.
type Recorder ¶
type Recorder struct {
    *Identifier
    // contains filtered or unexported fields
}
Recorder comment...
func (*Recorder) Active ¶
func (recorder *Recorder) Active(matcher core.MatcherType)
Active tells the recorder which matchers are active, which helps when providing a detailed response to the caller.
func (*Recorder) Record ¶
Record will build the possible result sets associated with an identification.
func (*Recorder) Report ¶
func (recorder *Recorder) Report() []core.Identification
Report organizes the identification output so that the highest priority results are output first.
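One way to put the highest-priority results first is a stable descending sort. The sketch below is not the package's implementation: the result type, its Priority field, and the report function are hypothetical, and how the recorder actually ranks identifications is not stated in this documentation.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a hypothetical stand-in for an identification result.
type result struct {
	ID       string
	Priority int // hypothetical score: larger means a stronger match
}

// report returns a copy of results ordered from highest to lowest priority.
// The sort is stable, so results with equal priority keep their input order.
func report(results []result) []result {
	out := make([]result, len(results))
	copy(out, results)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Priority > out[j].Priority
	})
	return out
}

func main() {
	ordered := report([]result{
		{ID: "Q475488", Priority: 1},
		{ID: "Q1343830", Priority: 3},
		{ID: "Q136218", Priority: 1},
	})
	for _, r := range ordered {
		fmt.Println(r.ID, r.Priority)
	}
}
```

Sorting a copy leaves the caller's slice untouched, which keeps the sketch free of side effects.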
type Summary ¶
type Summary struct {
    AllSparqlResults               int      // All rows of data returned from our SPARQL request.
    CondensedSparqlResults         int      // All unique records once the SPARQL is processed.
    SparqlRowsWithSigs             int      // All SPARQL rows with signatures (SPARQL necessarily returns duplicates).
    RecordsWithPotentialSignatures int      // Records that have signatures that can be processed.
    FormatsWithBadHeuristics       int      // Formats that have bad heuristics that we can't process.
    RecordsWithSignatures          int      // Records remaining that were processed.
    MultipleSequences              int      // Records that have been parsed out into multiple signatures per record.
    AllLintingMessages             []string // All linting messages returned.
    AllLintingMessageCount         int      // Count of all linting messages output.
    RecordCountWithLintingMessages int      // A count of the records that have linting messages to investigate.
}
Summary of the identifier once processed.
Directories ¶
Path | Synopsis |
---|---|
internal | |
converter | Convert file-format signature sequences to something compatible with Siegfried's identifiers. |
mappings | Package mappings provides data structures and helpers that describe Wikidata signature resources that we want to work with. |