siegfried

Module version: v1.11.0-rc6 (this package is not in the latest version of its module)
Published: Dec 15, 2023 License: Apache-2.0 Imports: 23 Imported by: 18

README

Siegfried

Siegfried is a signature-based file format identification tool, implementing:

  • the National Archives UK's PRONOM file format signatures
  • freedesktop.org's MIME-info file format signatures
  • the Library of Congress's FDD file format signatures (beta)
  • Wikidata (beta)
Version

1.11.1


Usage

Command line
sf file.ext
sf *.ext
sf DIR
Options
sf -csv file.ext | *.ext | DIR             // Output CSV rather than YAML
sf -json file.ext | *.ext | DIR            // Output JSON rather than YAML
sf -droid file.ext | *.ext | DIR           // Output DROID CSV rather than YAML
sf -nr DIR                                 // Don't scan subdirectories
sf -z file.zip | *.ext | DIR               // Decompress and scan zip, tar, gzip, warc, arc
sf -zs gzip,tar file.tar.gz | *.ext | DIR  // Selectively decompress and scan 
sf -hash md5 file.ext | *.ext | DIR        // Calculate md5, sha1, sha256, sha512, or crc hash
sf -sig custom.sig *.ext | DIR             // Use a custom signature file
sf -                                       // Scan stream piped to stdin
sf -name file.ext -                        // Provide filename when scanning stream 
sf -f myfiles.txt                          // Scan list of files and directories
sf -v | -version                           // Display version information
sf -home c:\junk -sig custom.sig file.ext  // Use a custom home directory
sf -serve hostname:port                    // Server mode
sf -throttle 10ms DIR                      // Pause for duration (e.g. 1s) between file scans
sf -multi 256 DIR                          // Scan multiple (e.g. 256) files in parallel 
sf -log [comma-sep opts] file.ext          // Log errors etc. to stderr (default) or stdout
sf -log e,w file.ext | *.ext | DIR         // Log errors and warnings to stderr
sf -log u,o file.ext | *.ext | DIR         // Log unknowns to stdout
sf -log d,s file.ext | *.ext | DIR         // Log debugging and slow messages to stderr
sf -log p,t DIR > results.yaml             // Log progress and time while redirecting results
sf -log fmt/1,c DIR > results.yaml         // Log instances of fmt/1 and chart results
sf -replay -log u -csv results.yaml        // Replay results file, convert to csv, log unknowns
sf -setconf -multi 32 -hash sha1           // Save flag defaults in a config file
sf -setconf -serve :5138 -conf srv.conf    // Save/load named config file with '-conf filename' 
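Output from sf -json can be consumed from Go with only the standard library. The sketch below decodes an abbreviated, hypothetical sample and groups match IDs by file for one namespace. The field names (files, filename, matches, ns, id, format) are assumptions based on sf's JSON output. Check them against the output of your own sf version, which includes more fields than this subset.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Report mirrors a small subset of the JSON written by `sf -json`.
// ASSUMPTION: field names follow typical sf output; verify against your version.
type Report struct {
	Siegfried string `json:"siegfried"`
	Files     []File `json:"files"`
}

type File struct {
	Filename string  `json:"filename"`
	Matches  []Match `json:"matches"`
}

type Match struct {
	NS     string `json:"ns"`
	ID     string `json:"id"`
	Format string `json:"format"`
}

// idsByNamespace returns, for each scanned file, the IDs of its matches
// whose namespace equals ns (e.g. "pronom").
func idsByNamespace(data []byte, ns string) (map[string][]string, error) {
	var r Report
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	out := make(map[string][]string)
	for _, f := range r.Files {
		for _, m := range f.Matches {
			if m.NS == ns {
				out[f.Filename] = append(out[f.Filename], m.ID)
			}
		}
	}
	return out, nil
}

func main() {
	// Hypothetical, abbreviated sample of `sf -json` output.
	sample := []byte(`{"siegfried":"1.11.1","files":[{"filename":"a.jpg",
"matches":[{"ns":"pronom","id":"fmt/43","format":"JPEG File Interchange Format"}]}]}`)
	ids, err := idsByNamespace(sample, "pronom")
	if err != nil {
		panic(err)
	}
	fmt.Println(ids["a.jpg"]) // [fmt/43]
}
```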

Signature files

By default, siegfried uses the latest PRONOM signatures without buffer limits (i.e. it may do full file scans). To use MIME-info or LOC signatures, or to add buffer limits or other customisations, use the roy tool to build your own signature file.

Install

With go installed:
go install github.com/richardlehane/siegfried/cmd/sf@latest

sf -update
Or, without go installed:
Win:

Download a pre-built binary from the releases page. Unzip to a location in your system path. Then run:

sf -update
Mac Homebrew (or Linuxbrew):
brew install mistydemeo/digipres/siegfried

Or, for the most recent updates, you can install from this fork:

brew install richardlehane/digipres/siegfried
Ubuntu/Debian (64 bit):
curl -sL "http://keyserver.ubuntu.com/pks/lookup?op=get&search=0x20F802FE798E6857" | gpg --dearmor | sudo tee /usr/share/keyrings/siegfried-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/siegfried-archive-keyring.gpg] https://www.itforarchivists.com/ buster main" | sudo tee -a /etc/apt/sources.list.d/siegfried.list
sudo apt-get update && sudo apt-get install siegfried
FreeBSD:
pkg install siegfried
Arch Linux:
git clone https://aur.archlinux.org/siegfried.git
cd siegfried
makepkg -si

Changes

v1.11.1 (2023-12-16)
Added
Changed
  • default location for siegfried HOME now follows XDG Base Directory Specification; see #216. Implemented by Bernhard Hampel-Waffenthal
  • siegfried prints version before erroring with failed signature load; requested by Ross Spencer
  • update PRONOM to v116
  • update LOC to 2023-12-14
  • update tika-mimetypes to v3.0.0-BETA
  • update freedesktop.org to v2.4
Fixed
  • panic on malformed zip file during container matching; reported by James Mooney
v1.10.1 (2023-04-24)
Fixed
  • glob expansion now only on Windows & when no explicit path match. Implemented by Bernhard Hampel-Waffenthal
  • compression algorithm for debian packages changed back to xz. Implemented by Paul Millar
  • -multi droid setting returned empty results when priority lists contained self-references. See #218
  • CGO disabled for debian package and linux binaries. See #219
v1.10.0 (2023-03-25)
Added
  • format classification included as "class" field in PRONOM results. Requested by Robin François. Implemented by Ross Spencer
  • -noclass flag added to roy build command. Use this flag to build signatures that omit the new "class" field from results.
  • glob paths can be used in place of file or directory paths for identification (e.g. sf *.jpg). Implemented by Ross Spencer
  • -multi droid setting for roy build command. Applies priorities after rather than during identification for more DROID-like results. Reported by David Clipsham
  • /update command for server mode. Requested by Luis Faria
Changed
  • new algorithm for dynamic multi-sequence matching for improved wildcard performance
  • update PRONOM to v111
  • update LOC to 2023-01-27
  • update tika-mimetypes to v2.7.0
  • minimum go version to build siegfried is now 1.18
Fixed
  • archivematica extensions built into wikidata signatures. Reported by Ross Spencer
  • trailing slash for folder paths in URI field in droid output. Reported by Philipp Wittwer
  • crash when using sf -replay with droid output

See the CHANGELOG for the full history.

Rights

Copyright 2024 Richard Lehane, Ross Spencer

Licensed under the Apache License, Version 2.0

Announcements

Join the Google Group for updates, signature releases, and help.

Contributing

Like siegfried and want to get involved in its development? That'd be wonderful! There are some notes on the wiki to get you started, and please get in touch.

Thanks

Thanks TNA for http://www.nationalarchives.gov.uk/pronom/ and http://www.nationalarchives.gov.uk/information-management/projects-and-work/droid.htm

Thanks Ross for https://github.com/exponential-decay/skeleton-test-suite-generator and http://exponentialdecay.co.uk/sd/index.htm, both are very handy!

Thanks Misty for the brew and ubuntu packaging

Thanks Steffen for the FreeBSD and Arch Linux packaging

Documentation

Overview

Package siegfried identifies file formats

Example:

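package main

import (
	"fmt"
	"log"
	"os"

	"github.com/richardlehane/siegfried"
)

func main() {
	// Load a compiled signature file from disk
	s, err := siegfried.Load("pronom.sig")
	if err != nil {
		log.Fatal(err)
	}
	f, err := os.Open("file")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	// Identify takes the stream plus its name and MIME type (pass "" if unknown)
	ids, err := s.Identify(f, "filename.ext", "application/xml")
	if err != nil {
		log.Fatal(err)
	}
	for _, id := range ids {
		fmt.Println(id)
	}
}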

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Siegfried

type Siegfried struct {
	// immutable fields
	C time.Time // signature create time
	// contains filtered or unexported fields
}

Siegfried structs are persistent objects that can be serialised to disk and used to identify file formats. They contain three matchers as well as a slice of identifiers. When identifiers are added to a Siegfried struct, they are registered with each matcher.

func Load

func Load(path string) (*Siegfried, error)

Load creates a Siegfried struct and loads content from path

func LoadReader added in v1.7.12

func LoadReader(r io.Reader) (*Siegfried, error)

LoadReader creates a Siegfried struct and loads content from a reader

func New

func New() *Siegfried

New creates a new Siegfried struct. It initializes the three matchers.

Example:

s := siegfried.New()
p, err := pronom.New() // create a new PRONOM identifier
if err != nil {
	log.Fatal(err)
}
if err := s.Add(p); err != nil { // add the identifier to the Siegfried struct
	log.Fatal(err)
}
if err := s.Save("pronom.sig"); err != nil { // save the Siegfried struct to disk
	log.Fatal(err)
}

func (*Siegfried) Add

func (s *Siegfried) Add(i core.Identifier) error

Add adds an identifier to a Siegfried struct.

func (*Siegfried) Blame

func (s *Siegfried) Blame(idx, ct int, cn string) string

Blame checks with the byte matcher to see what identification results subscribe to a particular result or test tree index. It can be used when identifying in a debug mode to check which identification results trigger which strikes.

func (*Siegfried) Buffer

func (s *Siegfried) Buffer(r io.Reader) (*siegreader.Buffer, error)

Buffer gets a siegreader buffer from the pool

func (*Siegfried) Fields

func (s *Siegfried) Fields() [][]string

Fields returns a slice of the names of the fields in each identifier.

func (*Siegfried) Identifiers added in v1.7.1

func (s *Siegfried) Identifiers() [][2]string

Identifiers returns a slice of the names and details of each identifier.

func (*Siegfried) Identify

func (s *Siegfried) Identify(r io.Reader, name, mime string) ([]core.Identification, error)

Identify identifies a stream or file object. It takes an io.Reader and the name and mimetype of the file/stream (if unknown, give empty strings). It returns a slice of identifications and an error.

func (*Siegfried) IdentifyBuffer added in v1.7.1

func (s *Siegfried) IdentifyBuffer(buffer *siegreader.Buffer, err error, name, mime string) ([]core.Identification, error)

IdentifyBuffer identifies a siegreader buffer. Supply the error returned when the buffer was obtained (for example, from Buffer) as the second argument.

func (*Siegfried) Inspect

func (s *Siegfried) Inspect(t core.MatcherType) string

Inspect returns a string containing detail about the various matchers in the Siegfried struct.

func (*Siegfried) Label added in v1.7.1

func (s *Siegfried) Label(id core.Identification) [][2]string

Label takes the values of a core.Identification and returns a slice that pairs these values with the relevant identifier's field labels.
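Conceptually, Label pairs an identifier's field names (as reported by Fields) with an identification's values, position by position. The stand-alone sketch below shows that pairing with plain string slices. The label function and the sample field names are hypothetical, not siegfried's code.

```go
package main

import "fmt"

// label pairs each field name with the value at the same position.
// If the slices differ in length, only the common prefix is paired.
func label(fields, values []string) [][2]string {
	n := len(fields)
	if len(values) < n {
		n = len(values)
	}
	out := make([][2]string, n)
	for i := 0; i < n; i++ {
		out[i] = [2]string{fields[i], values[i]}
	}
	return out
}

func main() {
	// Hypothetical field names and values, for illustration only.
	fields := []string{"namespace", "id", "format"}
	values := []string{"pronom", "fmt/43", "JPEG File Interchange Format"}
	for _, kv := range label(fields, values) {
		fmt.Printf("%s: %s\n", kv[0], kv[1])
	}
}
```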

func (*Siegfried) Put added in v1.7.1

func (s *Siegfried) Put(buffer *siegreader.Buffer)

Put returns a siegreader buffer to the pool

func (*Siegfried) Save

func (s *Siegfried) Save(path string) error

Save persists a Siegfried struct to disk (path)

func (*Siegfried) SaveWriter added in v1.7.12

func (s *Siegfried) SaveWriter(w io.Writer) error

SaveWriter persists a Siegfried struct to an io.Writer

Directories

cmd
  • roy
  • sf
internal
  • bytematcher: Package bytematcher builds a matching engine from a set of signatures and performs concurrent matching against an input siegreader.Buffer.
  • bytematcher/frames: Package frames describes the Frame interface.
  • bytematcher/frames/tests: Package tests exports shared frames and signatures for use by the other bytematcher packages.
  • bytematcher/patterns: Package patterns describes the Pattern interface.
  • bytematcher/patterns/tests: Package tests exports shared patterns for use by the other bytematcher packages.
  • persist: Package persist marshals and unmarshals siegfried signatures as binary data.
  • priority: Package priority creates a subordinate-superiors map of identifications.
  • siegreader: Package siegreader implements multiple independent Readers (and ReverseReaders) from a single Buffer.
pkg
  • config: Package config sets up defaults used by both the SF and roy tools. Config options can be overridden with build flags e.g.
  • core: Package core defines a set of core interfaces: Identifier, Recorder, Identification, and Matcher.
  • decompress: Package decompress provides zip, tar, gzip and webarchive decompression/unpacking.
  • lib
  • loc
  • pronom: Define custom patterns (implementing the siegfried.Pattern interface) for the different patterns allowed by the PRONOM spec.
  • pronom/internal/mappings: Package mappings contains struct mappings to unmarshal three different PRONOM XML formats: the DROID signature file format, the report format, and the container format.
  • static: Code generated by go generate; DO NOT EDIT.
  • wikidata: Package wikidata contains the majority of the functions needed to build a Wikidata identifier (compiled signature file) compatible with Siegfried.
  • wikidata/internal/converter: Convert file-format signature sequences to something compatible with Siegfried's identifiers.
  • wikidata/internal/mappings: Package mappings provides data structures and helpers that describe Wikidata signature resources that we want to work with.
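The priority package is described as creating a subordinate-superiors map of identifications. The idea can be illustrated without its actual API: if format B is subordinate to format A and both match the same file, the subordinate match is dropped. In the sketch below, Priorities, filter and the format IDs are hypothetical. Self-references in a superiors list are ignored, so a format never suppresses itself.

```go
package main

import "fmt"

// Priorities maps a subordinate format ID to the IDs of the formats
// that take precedence over it (its superiors).
type Priorities map[string][]string

// filter drops every match that has one of its superiors among the matches.
func (p Priorities) filter(matches []string) []string {
	present := make(map[string]bool, len(matches))
	for _, m := range matches {
		present[m] = true
	}
	var kept []string
	for _, m := range matches {
		superseded := false
		for _, sup := range p[m] {
			// Ignore self-references so a format cannot suppress itself.
			if sup != m && present[sup] {
				superseded = true
				break
			}
		}
		if !superseded {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	// Hypothetical IDs: "fmt/B" is subordinate to "fmt/A".
	p := Priorities{"fmt/B": {"fmt/A"}}
	fmt.Println(p.filter([]string{"fmt/B", "fmt/A", "fmt/C"})) // [fmt/A fmt/C]
}
```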
