siegfried

v1.7.13
Warning

This package is not in the latest version of its module.

Published: Aug 18, 2019 License: Apache-2.0 Imports: 22 Imported by: 17

README

Siegfried

Siegfried is a signature-based file format identification tool, implementing:

  • the National Archives UK's PRONOM file format signatures
  • freedesktop.org's MIME-info file format signatures
  • the Library of Congress's FDD file format signatures (beta).
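To make "signature-based" concrete, here is a minimal Go sketch that matches a file's leading bytes against a small table of magic numbers. It is not siegfried's matcher, which works from the full PRONOM, MIME-info, and LOC signature sets. The `signature` type, the `identify` function, and the three-entry table are hypothetical and cover only a few well-known beginning-of-file markers.

```go
package main

import (
	"bytes"
	"fmt"
)

// signature pairs a MIME type with the bytes expected at the start of a file.
// Hypothetical type, for illustration only.
type signature struct {
	mime  string
	magic []byte
}

// signatures is a tiny hand-written table of well-known file markers.
var signatures = []signature{
	{"application/pdf", []byte("%PDF-")},
	{"image/png", []byte("\x89PNG\r\n\x1a\n")},
	{"application/zip", []byte("PK\x03\x04")},
}

// identify returns the MIME type of the first signature whose magic bytes
// prefix data, or "" when nothing matches.
func identify(data []byte) string {
	for _, s := range signatures {
		if bytes.HasPrefix(data, s.magic) {
			return s.mime
		}
	}
	return ""
}

func main() {
	fmt.Println(identify([]byte("%PDF-1.4\n...")))
	fmt.Println(identify([]byte("plain text")) == "")
}
```

A real signature set also describes patterns at other offsets, which is why a lookup by prefix alone is only a starting point.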
Version

1.7.13


Usage

Command line
sf file.ext
sf DIR
Options
sf -csv file.ext | DIR                     // Output CSV rather than YAML
sf -json file.ext | DIR                    // Output JSON rather than YAML
sf -droid file.ext | DIR                   // Output DROID CSV rather than YAML
sf -nr DIR                                 // Don't scan subdirectories
sf -z file.zip | DIR                       // Decompress and scan zip, tar, gzip, warc, arc
sf -hash md5 file.ext | DIR                // Calculate md5, sha1, sha256, sha512, or crc hash
sf -sig custom.sig file.ext                // Use a custom signature file
sf -                                       // Scan stream piped to stdin
sf -name file.ext -                        // Provide filename when scanning stream 
sf -f myfiles.txt                          // Scan list of files and directories
sf -v | -version                           // Display version information
sf -home c:\junk -sig custom.sig file.ext  // Use a custom home directory
sf -serve hostname:port                    // Server mode
sf -throttle 10ms DIR                      // Pause for duration (e.g. 1s) between file scans
sf -multi 256 DIR                          // Scan multiple (e.g. 256) files in parallel 
sf -log [comma-sep opts] file.ext | DIR    // Log errors etc. to stderr (default) or stdout
sf -log e,w file.ext | DIR                 // Log errors and warnings to stderr
sf -log u,o file.ext | DIR                 // Log unknowns to stdout
sf -log d,s file.ext | DIR                 // Log debugging and slow messages to stderr
sf -log p,t DIR > results.yaml             // Log progress and time while redirecting results
sf -log fmt/1,c DIR > results.yaml         // Log instances of fmt/1 and chart results
sf -replay -log u -csv results.yaml        // Replay results file, convert to csv, log unknowns
sf -setconf -multi 32 -hash sha1           // Save flag defaults in a config file
sf -setconf -serve :5138 -conf srv.conf    // Save/load named config file with '-conf filename' 

Signature files

By default, siegfried uses the latest PRONOM signatures without buffer limits (i.e. it may do full file scans). To use MIME-info or LOC signatures, or to add buffer limits or other customisations, use the roy tool to build your own signature file.

Install

With go installed:
go get github.com/richardlehane/siegfried/cmd/sf

sf -update
Or, without go installed:
Win:

Download a pre-built binary from the releases page. Unzip to a location in your system path. Then run:

sf -update
Mac Homebrew (or Linuxbrew):
brew install mistydemeo/digipres/siegfried

Or, for the most recent updates, you can install from this fork:

brew install richardlehane/digipres/siegfried
Ubuntu/Debian (64 bit):
wget -qO - https://bintray.com/user/downloadSubjectPublicKey?username=bintray | sudo apt-key add -
echo "deb http://dl.bintray.com/siegfried/debian wheezy main" | sudo tee -a /etc/apt/sources.list
sudo apt-get update && sudo apt-get install siegfried
FreeBSD:
pkg install siegfried
Arch Linux:
git clone https://aur.archlinux.org/siegfried.git
cd siegfried
makepkg -si

Changes

v1.7.13 (2019-08-18)
Added
  • the -f flag now scans directories, as well as files. Requested by Harry Moss
Changed
  • update LOC signatures to 2019-06-16
  • update tika-mimetypes signatures to v1.22
Fixed
  • filenames with "?" were parsed as URLs, breaking name/extension matching; reported by workflowsguy
v1.7.12 (2019-06-15)
Changed
  • update PRONOM to v95
  • update LOC signatures to 2019-05-20
  • update tika-mimetypes signatures to v1.21
Fixed
  • .docx files with .doc extensions panic due to bug in division of hints in container matcher. Thanks to Jean-Séverin Lair for reporting and sharing samples and to VAIarchief for additional report with example.
  • mime-info signatures panic on some files due to duplicate entries in the freedesktop and tika signature files; spotted during an attempt at pair coding with Ross Spencer... thanks Ross and sorry for hogging the laptop! #125
v1.7.11 (2019-02-16)
Changed
  • update LOC signatures to 2019-01-06
  • update tika-mimetypes signatures to v1.20
Fixed
  • container matching can now match against directory names. Thanks Ross Spencer for reporting and for the sample SIARD signature file. Thanks Dave Clipsham, Martin Hoppenheit and Phillip Tommerholt for contributions on the ticket.
  • fixes to travis.yml for auto-deploy of debian release; #124
v1.7.10 (2018-09-19)
Added
  • print configuration defaults with sf -version
Changed
  • update PRONOM to v94
Fixed
  • LOC identifier fixed after regression in v1.7.9
  • remove skeleton-suite files triggering malware warnings by adding to .gitignore; reported by Dave Rice
  • release built with Go 1.11, which includes a fix for a CIFS error that caused files to be skipped during file walk; reported by Maarten Savels
v1.7.9 (2018-08-30)
Added
  • save defaults in a configuration file: use the -setconf flag to record any other flags used into a config file. These defaults will be loaded each time you run sf. E.g. sf -multi 16 -setconf then sf DIR (loads the new multi default)
  • use -conf filename to save or load from a named config file. E.g. sf -multi 16 -serve :5138 -conf srv.conf -setconf and then sf -conf srv.conf
  • added -yaml flag so, if you set json/csv in default config :(, you can override with YAML instead. Choose the YAML!
Changed
  • the roy compare -join options that join on filepath now work better when comparing results with mixed windows and unix paths
  • exported decompress package to give more functionality for users of the golang API; requested by Byron Ruth
  • update LOC signatures to 2018-06-14
  • update freedesktop.org signatures to v1.10
  • update tika-mimetype signatures to v1.18
Fixed
  • misidentifications of some files e.g. ODF presentation due to sf quitting early on strong matches. Have adjusted this algorithm to make sf wait longer if there is evidence (e.g. from filename) that the file might be something else. Reported by Jean-Séverin Lair
  • read and other file errors caused sf to hang; reports by Greg Lepore and Andy Foster; fix contributed by Ross Spencer
  • bug reading streams where EOF was returned for reads exactly adjacent to the end of file
  • bug in mscfb library (race condition for concurrent access to a global variable)
  • some matches result in extremely verbose basis fields; reported by Nick Krabbenhoeft. Partly fixed: basis field now reports a single basis for a match but work remains to speed up matching for these cases.

See the CHANGELOG for the full history.

Rights

Copyright 2019 Richard Lehane

Licensed under the Apache License, Version 2.0

Announcements

Join the Google Group for updates, signature releases, and help.

Contributing

Like siegfried and want to get involved in its development? That'd be wonderful! There are some notes on the wiki to get you started, and please get in touch.

Thanks

Thanks TNA for http://www.nationalarchives.gov.uk/pronom/ and http://www.nationalarchives.gov.uk/information-management/projects-and-work/droid.htm

Thanks Ross for https://github.com/exponential-decay/skeleton-test-suite-generator and http://exponentialdecay.co.uk/sd/index.htm, both are very handy!

Thanks Misty for the brew and ubuntu packaging

Thanks Steffen for the FreeBSD and Arch Linux packaging

Documentation

Overview

Package siegfried identifies file formats

Example:

s, err := siegfried.Load("pronom.sig")
if err != nil {
	log.Fatal(err)
}
f, err := os.Open("file")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
ids, err := s.Identify(f, "filename.ext", "application/xml")
if err != nil {
	log.Fatal(err)
}
for _, id := range ids {
	fmt.Println(id)
}

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Siegfried

type Siegfried struct {
	// immutable fields
	C time.Time // signature create time
	// contains filtered or unexported fields
}

Siegfried structs are persistent objects that can be serialised to disk and used to identify file formats. They contain three matchers as well as a slice of identifiers. When identifiers are added to a Siegfried struct, they are registered with each matcher.

func Load

func Load(path string) (*Siegfried, error)

Load creates a Siegfried struct and loads content from path

func LoadReader added in v1.7.12

func LoadReader(r io.Reader) (*Siegfried, error)

LoadReader creates a Siegfried struct and loads content from a reader

func New

func New() *Siegfried

New creates a new Siegfried struct. It initializes the three matchers.

Example:

s := New()
p, err := pronom.New() // create a new PRONOM identifier
if err != nil {
	log.Fatal(err)
}
err = s.Add(p) // add the identifier to the Siegfried
if err != nil {
	log.Fatal(err)
}
err = s.Save("pronom.sig") // save the Siegfried
if err != nil {
	log.Fatal(err)
}

func (*Siegfried) Add

func (s *Siegfried) Add(i core.Identifier) error

Add adds an identifier to a Siegfried struct.

func (*Siegfried) Blame

func (s *Siegfried) Blame(idx, ct int, cn string) string

Blame checks with the byte matcher to see what identification results subscribe to a particular result or test tree index. It can be used when identifying in a debug mode to check which identification results trigger which strikes.

func (*Siegfried) Buffer

func (s *Siegfried) Buffer(r io.Reader) (*siegreader.Buffer, error)

Buffer gets a siegreader buffer from the pool

func (*Siegfried) Fields

func (s *Siegfried) Fields() [][]string

Fields returns a slice of the names of the fields in each identifier.

func (*Siegfried) Identifiers added in v1.7.1

func (s *Siegfried) Identifiers() [][2]string

Identifiers returns a slice of the names and details of each identifier.

func (*Siegfried) Identify

func (s *Siegfried) Identify(r io.Reader, name, mime string) ([]core.Identification, error)

Identify identifies a stream or file object. It takes an io.Reader and the name and mimetype of the file/stream (if unknown, give empty strings). It returns a slice of identifications and an error.

func (*Siegfried) IdentifyBuffer added in v1.7.1

func (s *Siegfried) IdentifyBuffer(buffer *siegreader.Buffer, err error, name, mime string) ([]core.Identification, error)

IdentifyBuffer identifies a siegreader buffer. Supply the error returned when getting the buffer (e.g. from Buffer) as the second argument.

func (*Siegfried) Inspect

func (s *Siegfried) Inspect(t core.MatcherType) string

Inspect returns a string containing detail about the various matchers in the Siegfried struct.

func (*Siegfried) Label added in v1.7.1

func (s *Siegfried) Label(id core.Identification) [][2]string

Label takes the values of a core.Identification and returns a slice that pairs these values with the relevant identifier's field labels.
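The shape of Label's return value, a slice of label/value pairs, can be sketched in plain Go. This is not the library's implementation; `label` is a hypothetical helper, and the field names and values in `main` are made up for illustration.

```go
package main

import "fmt"

// label pairs each value with the field label at the same index.
// Entries without a counterpart in the other slice are dropped.
// Hypothetical helper, for illustration only.
func label(fields, values []string) [][2]string {
	n := len(fields)
	if len(values) < n {
		n = len(values)
	}
	pairs := make([][2]string, n)
	for i := 0; i < n; i++ {
		pairs[i] = [2]string{fields[i], values[i]}
	}
	return pairs
}

func main() {
	fields := []string{"namespace", "id", "format"}         // illustrative labels
	values := []string{"example", "ex/1", "Example format"} // made-up values
	for _, p := range label(fields, values) {
		fmt.Printf("%s: %s\n", p[0], p[1])
	}
}
```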

func (*Siegfried) Put added in v1.7.1

func (s *Siegfried) Put(buffer *siegreader.Buffer)

Put returns a siegreader buffer to the pool

func (*Siegfried) Save

func (s *Siegfried) Save(path string) error

Save persists a Siegfried struct to disk (path)

func (*Siegfried) SaveWriter added in v1.7.12

func (s *Siegfried) SaveWriter(w io.Writer) error

SaveWriter persists a Siegfried struct to an io.Writer

Directories

Path Synopsis
cmd
roy
sf
internal
bytematcher
Package bytematcher builds a matching engine from a set of signatures and performs concurrent matching against an input siegreader.Buffer.
bytematcher/frames
Package frames describes the Frame interface.
bytematcher/frames/tests
Package tests exports shared frames and signatures for use by the other bytematcher packages
bytematcher/patterns
Package patterns describes the Pattern interface.
bytematcher/patterns/tests
Package tests exports shared patterns for use by the other bytematcher packages
persist
Package persist marshals and unmarshals siegfried signatures as binary data
priority
Package priority creates a subordinate-superiors map of identifications.
siegreader
Package siegreader implements multiple independent Readers (and ReverseReaders) from a single Buffer.
pkg
config
Package config sets up defaults used by both the SF and roy tools. Config options can be overridden with build flags e.g.
core
Package core defines a set of core interfaces: Identifier, Recorder, Identification, and Matcher
decompress
Package decompress provides zip, tar, gzip and webarchive decompression/unpacking
loc
pronom
Define custom patterns (implementing the siegfried.Pattern interface) for the different patterns allowed by the PRONOM spec.
pronom/internal/mappings
This file contains struct mappings to unmarshal three different PRONOM XML formats: the signature file format, the report format, and the container format
