ocfl

v0.0.25
Published: Apr 16, 2024 License: MIT Imports: 30 Imported by: 7

README

An OCFL implementation for Go

This is an implementation of the Oxford Common File Layout (OCFL) for Go. The API is under heavy development and will have constant breaking changes.

What is OCFL?

This Oxford Common File Layout (OCFL) specification describes an application-independent approach to the storage of digital information in a structured, transparent, and predictable manner. It is designed to promote long-term object management best practices within digital repositories. (https://ocfl.io/)

Functionality

Here is a high-level overview of what's working and what's not:

  • Filesystem and S3 backends
    • S3: support writing/copying large files (>5GiB).
  • Storage root creation and validation
  • Object creation and validation
  • Flexible API for 'staging' object changes between versions.
  • Support for OCFL v1.0 and v1.1
  • Reasonable test coverage
  • Ability to purge objects from a storage root
  • Consistent, informative error/log messages
  • Well-documented API
  • Stable API

Development

Requires Go >= 1.21.

Documentation

Overview

This module is an implementation of the Oxford Common File Layout (OCFL) specification. The top-level package provides version-independent functionality. The ocflv1 package provides the bulk of the implementation.

Constants

const (
	SHA512  = `sha512`
	SHA256  = `sha256`
	SHA1    = `sha1`
	MD5     = `md5`
	BLAKE2B = `blake2b-512`
)
const (
	NamasteTypeObject = "ocfl_object" // type string for OCFL Object declaration
	NamasteTypeStore  = "ocfl"        // type string for OCFL Storage Root declaration
)
const (
	// package version
	Version       = "0.0.25"
	ExtensionsDir = "extensions"
)
const (
	Spec1_0 = Spec("1.0")
	Spec1_1 = Spec("1.1")
)

Variables

var (
	ErrNoNamaste       = fmt.Errorf("missing NAMASTE declaration: %w", fs.ErrNotExist)
	ErrNamasteInvalid  = errors.New("invalid NAMASTE declaration contents")
	ErrNamasteMultiple = errors.New("multiple NAMASTE declarations found")
)
var (
	ErrObjectNotFound = fmt.Errorf("missing object declaration: %w", ErrNoNamaste)
	ErrObjectExists   = fmt.Errorf("existing OCFL object declaration: %w", fs.ErrExist)
)
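
The sentinel errors above wrap one another with %w, so a single errors.Is check against fs.ErrNotExist also matches a missing object. The following is a minimal sketch of that chain. It redeclares two of the variables locally so it runs without the ocfl module, and openObject is a hypothetical helper invented for illustration, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// Local copies of two sentinel errors listed above, so the sketch runs
// without importing the ocfl module.
var (
	ErrNoNamaste      = fmt.Errorf("missing NAMASTE declaration: %w", fs.ErrNotExist)
	ErrObjectNotFound = fmt.Errorf("missing object declaration: %w", ErrNoNamaste)
)

// openObject is a hypothetical helper that always fails the way a lookup of
// a missing object might: it wraps ErrObjectNotFound with extra context.
func openObject(dir string) error {
	return fmt.Errorf("reading %q: %w", dir, ErrObjectNotFound)
}

// isMissing reports whether err is ultimately caused by fs.ErrNotExist.
func isMissing(err error) bool {
	return errors.Is(err, fs.ErrNotExist)
}

func main() {
	err := openObject("objects/a")
	fmt.Println(errors.Is(err, ErrObjectNotFound)) // true
	fmt.Println(errors.Is(err, ErrNoNamaste))      // true
	fmt.Println(isMissing(err))                    // true
}
```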
var (
	ErrSpecInvalid  = errors.New("invalid OCFL spec version")
	ErrSpecNotFound = errors.New("OCFL spec file not found")
)
var (
	ErrVNumInvalid = errors.New(`invalid version`)
	ErrVNumPadding = errors.New(`inconsistent version padding in version sequence`)
	ErrVNumMissing = errors.New(`missing version in version sequence`)
	ErrVerEmpty    = errors.New("no versions found")

	// Some functions in this package use the zero value VNum to indicate the
	// most recent, "head" version.
	Head = VNum{}
)
var ErrMapMakerExists = errors.New("path and digest exist")

ErrMapMakerExists is returned when calling Add with a path and digest that are already present in the MapMaker

var ErrNotFile = errors.New("not a file")
var ErrUnknownAlg = errors.New("unknown digest algorithm")

Functions

func Copy added in v0.0.23

func Copy(ctx context.Context, dstFS WriteFS, dst string, srcFS FS, src string) (err error)

Copy copies src in srcFS to dst in dstFS. If srcFS and dstFS are the same reference and that reference implements CopyFS, Copy uses the FS's own Copy() method.
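
The "use the backend's own Copy when available" behavior is an instance of Go's optional-interface pattern. The sketch below illustrates it with hypothetical, context-free types (store, copier, memStore, copyFile); it is not the package's implementation, and the real interfaces take a context.Context.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// store is a hypothetical, minimal read/write backend for this sketch.
type store interface {
	Open(name string) (io.Reader, error)
	Write(name string, r io.Reader) (int64, error)
}

// copier is the optional capability: a backend that can copy internally.
type copier interface {
	store
	Copy(dst, src string) error
}

// memStore keeps files in a map and counts native Copy calls.
type memStore struct {
	files  map[string][]byte
	copies int
}

func (m *memStore) Open(name string) (io.Reader, error) {
	b, ok := m.files[name]
	if !ok {
		return nil, fmt.Errorf("open %s: not found", name)
	}
	return bytes.NewReader(b), nil
}

func (m *memStore) Write(name string, r io.Reader) (int64, error) {
	b, err := io.ReadAll(r)
	if err != nil {
		return 0, err
	}
	m.files[name] = b
	return int64(len(b)), nil
}

func (m *memStore) Copy(dst, src string) error {
	b, ok := m.files[src]
	if !ok {
		return fmt.Errorf("copy %s: not found", src)
	}
	m.files[dst] = append([]byte(nil), b...)
	m.copies++
	return nil
}

// copyFile uses the backend's own Copy when source and destination are the
// same backend and it implements copier; otherwise it streams the bytes.
func copyFile(dstFS store, dst string, srcFS store, src string) error {
	if c, ok := dstFS.(copier); ok && dstFS == srcFS {
		return c.Copy(dst, src)
	}
	r, err := srcFS.Open(src)
	if err != nil {
		return err
	}
	_, err = dstFS.Write(dst, r)
	return err
}

func main() {
	a := &memStore{files: map[string][]byte{"f.txt": []byte("data")}}
	b := &memStore{files: map[string][]byte{}}
	fmt.Println(copyFile(a, "g.txt", a, "f.txt"), a.copies) // <nil> 1
	fmt.Println(copyFile(b, "f.txt", a, "f.txt"), b.copies) // <nil> 0
}
```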

func DigestConcurrency

func DigestConcurrency() int

DigestConcurrency is a global configuration for the number of files to digest concurrently.

func DigestFS

func DigestFS(ctx context.Context, fsys FS, setupFunc func(addFile func(string, ...string) bool), resultFn func(string, DigestSet, error) error) error

DigestFS concurrently digests files in an FS. The setup function adds files to the work queue using the addFile function passed to it. addFile returns a bool indicating whether the file was added to the queue. Results are passed back using the result function. If resultFn returns an error, no more results will be produced, and new calls to addFile will return false. DigestFS uses the value from DigestConcurrency() to set the number of files that are digested concurrently.
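
To make the idea of a bounded digest queue concrete, here is a standard-library sketch of a worker pool that digests files from an io/fs.FS. It is not how DigestFS is implemented and it does not use the callback-based signature shown above; digestFile, digestAll, and sampleFS are names invented for this example, and only sha256 is computed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"sync"
	"testing/fstest"
)

// digestFile returns the hex-encoded sha256 of one file in fsys.
func digestFile(fsys fs.FS, name string) (string, error) {
	f, err := fsys.Open(name)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// digestAll digests names using at most `workers` goroutines and returns a
// map of file name to digest, plus the first error encountered (if any).
func digestAll(fsys fs.FS, names []string, workers int) (map[string]string, error) {
	if workers < 1 {
		workers = 1
	}
	var (
		queue    = make(chan string)
		mu       sync.Mutex
		wg       sync.WaitGroup
		results  = make(map[string]string, len(names))
		firstErr error
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for name := range queue {
				sum, err := digestFile(fsys, name)
				mu.Lock()
				if err == nil {
					results[name] = sum
				} else if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}()
	}
	for _, name := range names {
		queue <- name
	}
	close(queue)
	wg.Wait()
	return results, firstErr
}

// sampleFS returns a small in-memory file system for the demo.
func sampleFS() fs.FS {
	return fstest.MapFS{
		"a.txt": {Data: []byte("hello")},
		"b.txt": {Data: []byte("hello")},
	}
}

func main() {
	sums, err := digestAll(sampleFS(), []string{"a.txt", "b.txt"}, 2)
	fmt.Println(len(sums), err, sums["a.txt"] == sums["b.txt"]) // 2 <nil> true
}
```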

func Files

func Files(ctx context.Context, fsys FS, pth PathSelector, fn func(name string) error) error

Files walks the directory tree in fsys selected by pth, calling fn for each file name.

func ObjectRoots

func ObjectRoots(ctx context.Context, fsys FS, sel PathSelector, fn func(*ObjectRoot) error) error

ObjectRoots searches root and its subdirectories for OCFL object declarations and calls fn for each object root it finds. The *ObjectRoot passed to fn is confirmed to have an object declaration, but no other validation checks are made.

func ParseVNum

func ParseVNum(v string, vn *VNum) error

ParseVNum parses v as a VNum and sets the value referenced by vn.

func ReadNamaste added in v0.0.24

func ReadNamaste(ctx context.Context, fsys FS, name string) error

ReadNamaste validates a namaste declaration

func RegisterAlg added in v0.0.21

func RegisterAlg(alg string, newDigester func() Digester)

RegisterAlg registers the Digester constructor for alg, so that NewDigester(alg) can be used.

func RegisteredAlgs added in v0.0.22

func RegisteredAlgs() []string

RegisteredAlgs returns a slice of all available digest algorithms

func SetDigestConcurrency

func SetDigestConcurrency(i int)

SetDigestConcurrency sets the max number of files to digest concurrently.

func SetXferConcurrency

func SetXferConcurrency(i int)

SetXferConcurrency sets the maximum number of files transferred concurrently during a commit operation.

func WriteDeclaration

func WriteDeclaration(ctx context.Context, root WriteFS, dir string, d Namaste) error

func WriteSpecFile

func WriteSpecFile(ctx context.Context, fsys WriteFS, dir string, n Spec) (string, error)

func XferConcurrency

func XferConcurrency() int

XferConcurrency is a global configuration for the maximum number of files transferred concurrently during a commit operation. It defaults to runtime.NumCPU().

Types

type ContentSource added in v0.0.23

type ContentSource interface {
	// GetContent returns an FS and path to a file in FS for a file with the given digest.
	// If no content is associated with the digest, fsys is nil and path is an empty string.
	GetContent(digest string) (fsys FS, path string)
}

ContentSource is used to access content with a given digest when creating and updating objects.

type CopyFS

type CopyFS interface {
	WriteFS
	// Copy creates or updates the file at dst with the contents of src. If dst
	// exists, it should be overwritten
	Copy(ctx context.Context, dst string, src string) error
}

CopyFS is a storage backend that supports copying files.

type DigestErr

type DigestErr struct {
	Name     string // Content path
	Alg      string // Digest algorithm
	Got      string // Calculated digest
	Expected string // Expected digest
}

DigestErr is returned when content's digest conflicts with an expected value

func (DigestErr) Error

func (e DigestErr) Error() string

type DigestMap

type DigestMap map[string][]string

DigestMap maps digests to file paths.

func (DigestMap) Clone added in v0.0.23

func (m DigestMap) Clone() DigestMap

func (DigestMap) Digests

func (m DigestMap) Digests() []string

Digests returns a slice of the digest values in the DigestMap. Digest strings are not normalized; they may be uppercase, lowercase, or mixed.

func (DigestMap) EachPath

func (m DigestMap) EachPath(fn func(pth, digest string) bool) bool

EachPath calls fn for each path in the Map. If fn returns false, iteration stops and EachPath returns false.

func (DigestMap) Eq

func (m DigestMap) Eq(other DigestMap) bool

Eq returns true if m and the other have the same content: they have the same (normalized) digests corresponding to the same set of paths. If either map has a digest conflict (same digest appears twice with different case), Eq returns false.

func (DigestMap) GetDigest

func (m DigestMap) GetDigest(p string) string

GetDigest returns the digest for path p or an empty string if the digest is not present.

func (DigestMap) HasDigestCase added in v0.0.23

func (m DigestMap) HasDigestCase() (hasLower bool, hasUpper bool)

HasDigestCase returns two booleans indicating whether m's digests use lowercase and uppercase characters.

func (DigestMap) Merge

func (m1 DigestMap) Merge(m2 DigestMap, replace bool) (DigestMap, error)

Merge returns a new DigestMap constructed by normalizing and merging m1 and m2. If a path has different digests in m1 and m2, an error is returned unless replace is true, in which case the value from m2 is used.

func (DigestMap) Normalize

func (m DigestMap) Normalize() (norm DigestMap, err error)

Normalize returns a normalized copy of m (with lowercase digests). An error is returned if m has a digest conflict.
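
A digest conflict, as described above, means two keys that differ only by case. The following sketch shows one way such normalization could work on a plain map[string][]string. The digestMap type and normalize function are local stand-ins written for this example, not the package's code.

```go
package main

import (
	"fmt"
	"strings"
)

// digestMap mirrors the shape of DigestMap above: digest -> paths.
type digestMap map[string][]string

// normalize returns a copy of m with lowercase digests. It fails if two keys
// differ only by case, since they would collide after lowercasing.
func normalize(m digestMap) (digestMap, error) {
	norm := make(digestMap, len(m))
	for digest, paths := range m {
		lower := strings.ToLower(digest)
		if _, exists := norm[lower]; exists {
			return nil, fmt.Errorf("digest conflict: %s", lower)
		}
		norm[lower] = append([]string(nil), paths...)
	}
	return norm, nil
}

func main() {
	ok, err := normalize(digestMap{"ABC1": {"a.txt"}, "def2": {"b.txt"}})
	fmt.Println(ok["abc1"], err) // [a.txt] <nil>

	_, err = normalize(digestMap{"ABC1": {"a.txt"}, "abc1": {"b.txt"}})
	fmt.Println(err) // digest conflict: abc1
}
```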

func (DigestMap) NumPaths added in v0.0.23

func (m DigestMap) NumPaths() int

NumPaths returns the number of paths in m

func (DigestMap) PathMap

func (m DigestMap) PathMap() PathMap

PathMap returns the DigestMap's contents as a map with path names for keys and digests for values. PathMap doesn't check if the same path appears twice in the DigestMap.

func (DigestMap) PathMapValid

func (m DigestMap) PathMapValid() (PathMap, error)

PathMapValid is like PathMap, except it returns an error if it encounters invalid path names or if the same path appears multiple times.
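
The difference between the unchecked and checked conversions can be sketched with plain maps. In the example below, invert and invertStrict are hypothetical stand-ins for PathMap and PathMapValid, and fs.ValidPath is used as an assumed approximation of the package's path rules, which may be stricter.

```go
package main

import (
	"fmt"
	"io/fs"
)

// invert turns a digest -> paths map into a path -> digest map without any
// checks: if a path is listed under two digests, one of them silently wins.
func invert(m map[string][]string) map[string]string {
	out := make(map[string]string)
	for digest, paths := range m {
		for _, p := range paths {
			out[p] = digest
		}
	}
	return out
}

// invertStrict is the checked variant: it rejects invalid path names
// (approximated here with fs.ValidPath) and paths that appear twice.
func invertStrict(m map[string][]string) (map[string]string, error) {
	out := make(map[string]string)
	for digest, paths := range m {
		for _, p := range paths {
			if !fs.ValidPath(p) {
				return nil, fmt.Errorf("invalid path: %s", p)
			}
			if _, exists := out[p]; exists {
				return nil, fmt.Errorf("path conflict: %s", p)
			}
			out[p] = digest
		}
	}
	return out, nil
}

func main() {
	m := map[string][]string{"abc1": {"a.txt", "dir/b.txt"}, "def2": {"c.txt"}}
	fmt.Println(invert(m)) // map[a.txt:abc1 c.txt:def2 dir/b.txt:abc1]

	_, err := invertStrict(map[string][]string{"abc1": {"a.txt"}, "def2": {"a.txt"}})
	fmt.Println(err) // path conflict: a.txt
}
```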

func (DigestMap) Paths

func (m DigestMap) Paths() []string

Paths returns a sorted slice of all path names in the DigestMap.

func (DigestMap) Remap

func (m DigestMap) Remap(fns ...RemapFunc)

func (DigestMap) Valid

func (m DigestMap) Valid() error

Valid returns a non-nil error if m is invalid.

type DigestSet

type DigestSet map[string]string

DigestSet is a set of digest results, keyed by digest algorithm

func (DigestSet) ConflictWith

func (s DigestSet) ConflictWith(other DigestSet) []string

ConflictWith returns keys in s with values that do not match the corresponding key in other.

func (DigestSet) Validate

func (s DigestSet) Validate(reader io.Reader) (err error)

Validate digests reader and returns an error if the resulting digest for any algorithm in s doesn't match the value in s.

type Digester

type Digester interface {
	io.Writer
	// String() returns the digest value for the bytes written to the digester.
	String() string
}

Digester is an interface used for generating digest values.

func NewDigester added in v0.0.22

func NewDigester(alg string) Digester

NewDigester returns a new Digester for generating digest values. If no Digester constructor was registered for alg, nil is returned.
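
The register-then-construct pattern described for RegisterAlg and NewDigester can be sketched with a small constructor registry. Everything below (digester, hashDigester, register, newDigester, registerDefaults) is a local illustration built on the standard library; it only assumes that a digester is an io.Writer whose String method returns the hex digest.

```go
package main

import (
	"crypto/md5"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
	"io"
	"sync"
)

// digester matches the Digester interface shown above.
type digester interface {
	io.Writer
	String() string
}

// hashDigester adapts any hash.Hash to the digester interface.
type hashDigester struct{ hash.Hash }

func (d hashDigester) String() string { return hex.EncodeToString(d.Sum(nil)) }

var (
	mu       sync.RWMutex
	registry = map[string]func() digester{}
)

// register stores a constructor under the algorithm name.
func register(alg string, newFn func() digester) {
	mu.Lock()
	defer mu.Unlock()
	registry[alg] = newFn
}

// newDigester returns a fresh digester, or nil if alg was never registered.
func newDigester(alg string) digester {
	mu.RLock()
	defer mu.RUnlock()
	if fn, ok := registry[alg]; ok {
		return fn()
	}
	return nil
}

// registerDefaults wires up two standard-library algorithms.
func registerDefaults() {
	register("sha256", func() digester { return hashDigester{sha256.New()} })
	register("md5", func() digester { return hashDigester{md5.New()} })
}

func main() {
	registerDefaults()
	d := newDigester("md5")
	io.WriteString(d, "hello")
	fmt.Println(d.String())                  // 5d41402abc4b2a76b9719d911017c592
	fmt.Println(newDigester("crc32") == nil) // true
}
```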

type FS

type FS interface {
	OpenFile(ctx context.Context, name string) (fs.File, error)
	ReadDir(ctx context.Context, name string) ([]fs.DirEntry, error)
}

FS is a minimal, read-only storage layer abstraction. It is similar to the standard library's io/fs.FS, except it uses contexts and OpenFile is not required to gracefully handle directories.

func DirFS

func DirFS(dir string) FS

DirFS is shorthand for NewFS(os.DirFS(dir))

func NewFS

func NewFS(fsys fs.FS) FS

NewFS wraps an io/fs.FS as an ocfl.FS
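
As a rough illustration of what wrapping an io/fs.FS in a context-aware interface involves, the sketch below defines a local copy of the two-method interface and a thin adapter. The names ctxFS, wrapFS, listNames, and demo are invented here, and checking ctx.Err() before each call is an assumption about reasonable behavior, not a description of NewFS.

```go
package main

import (
	"context"
	"fmt"
	"io/fs"
	"testing/fstest"
)

// ctxFS matches the FS interface shown above.
type ctxFS interface {
	OpenFile(ctx context.Context, name string) (fs.File, error)
	ReadDir(ctx context.Context, name string) ([]fs.DirEntry, error)
}

// wrapFS adapts an io/fs.FS. The context is only checked before each call,
// because io/fs itself has no cancellation support.
type wrapFS struct{ fsys fs.FS }

func (w wrapFS) OpenFile(ctx context.Context, name string) (fs.File, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return w.fsys.Open(name)
}

func (w wrapFS) ReadDir(ctx context.Context, name string) ([]fs.DirEntry, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return fs.ReadDir(w.fsys, name)
}

// listNames returns the entry names of dir, in the order ReadDir gives them.
func listNames(ctx context.Context, fsys ctxFS, dir string) ([]string, error) {
	entries, err := fsys.ReadDir(ctx, dir)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

// demo lists an in-memory FS, optionally with an already-cancelled context.
func demo(cancelFirst bool) ([]string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	mem := fstest.MapFS{
		"a.txt": {Data: []byte("a")},
		"b.txt": {Data: []byte("b")},
	}
	return listNames(ctx, wrapFS{fsys: mem}, ".")
}

func main() {
	fmt.Println(demo(false)) // [a.txt b.txt] <nil>
	fmt.Println(demo(true))  // [] context canceled
}
```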

type FileIterator

type FileIterator interface {
	FS
	// Files calls a function for each filename satisfying the path selector.
	// The function should only be called for "regular" files (never for
	// directories or symlinks).
	Files(context.Context, PathSelector, func(name string) error) error
}

FileIterator is used to iterate over regular files in an FS

type FixitySource added in v0.0.23

type FixitySource interface {
	// GetFixity returns a DigestSet with alternate digests for the content with
	// the digest derived using the stage's primary digest algorithm.
	GetFixity(digest string) DigestSet
}

FixitySource is used to access alternate digests for content with a given digest (sha512 or sha256) when creating or updating objects.

type InvType

type InvType struct {
	Spec
}

InvType represents an inventory type string, for example: https://ocfl.io/1.0/spec/#inventory

func (InvType) MarshalText

func (invT InvType) MarshalText() ([]byte, error)

func (InvType) String

func (inv InvType) String() string

func (*InvType) UnmarshalText

func (invT *InvType) UnmarshalText(t []byte) error

type MapDigestConflictErr

type MapDigestConflictErr struct {
	Digest string
}

MapDigestConflictErr indicates that the same digest was found multiple times in the digest map (i.e., with different cases)

func (*MapDigestConflictErr) Error

func (d *MapDigestConflictErr) Error() string

type MapPathConflictErr

type MapPathConflictErr struct {
	Path string
}

MapPathConflictErr indicates a path appears more than once in the digest map. It's also used in cases where the path is used as a directory in one instance and a file in another.

func (*MapPathConflictErr) Error

func (p *MapPathConflictErr) Error() string

type MapPathInvalidErr

type MapPathInvalidErr struct {
	Path string
}

MapPathInvalidErr indicates an invalid path in a Map.

func (*MapPathInvalidErr) Error

func (p *MapPathInvalidErr) Error() string

type MultiDigester

type MultiDigester struct {
	io.Writer
	// contains filtered or unexported fields
}

MultiDigester is used to generate digests for multiple digest algorithms at the same time.

func NewMultiDigester

func NewMultiDigester(algs ...string) *MultiDigester

func (MultiDigester) Sum added in v0.0.22

func (md MultiDigester) Sum(alg string) string

func (MultiDigester) Sums

func (md MultiDigester) Sums() DigestSet

Sums returns a DigestSet with all digest values for the MultiDigester
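
Computing several digests in a single pass over the data is commonly done by fanning writes out to multiple hashes. The sketch below shows that technique with io.MultiWriter. The multiDigest and digestString functions are hypothetical, only "sha256" and "md5" are wired up, and this is not a description of how MultiDigester is built internally.

```go
package main

import (
	"crypto/md5"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
	"io"
	"strings"
)

// multiDigest reads r once and returns hex digests keyed by algorithm name.
// Only "sha256" and "md5" are supported in this sketch.
func multiDigest(r io.Reader, algs ...string) (map[string]string, error) {
	hashes := make(map[string]hash.Hash, len(algs))
	writers := make([]io.Writer, 0, len(algs))
	for _, alg := range algs {
		var h hash.Hash
		switch alg {
		case "sha256":
			h = sha256.New()
		case "md5":
			h = md5.New()
		default:
			return nil, fmt.Errorf("unknown digest algorithm: %s", alg)
		}
		hashes[alg] = h
		writers = append(writers, h)
	}
	// Every byte read from r is written to all hashes in one pass.
	if _, err := io.Copy(io.MultiWriter(writers...), r); err != nil {
		return nil, err
	}
	sums := make(map[string]string, len(hashes))
	for alg, h := range hashes {
		sums[alg] = hex.EncodeToString(h.Sum(nil))
	}
	return sums, nil
}

// digestString is a convenience wrapper for in-memory content.
func digestString(s string, algs ...string) (map[string]string, error) {
	return multiDigest(strings.NewReader(s), algs...)
}

func main() {
	sums, err := digestString("hello", "sha256", "md5")
	fmt.Println(err)         // <nil>
	fmt.Println(sums["md5"]) // 5d41402abc4b2a76b9719d911017c592
	fmt.Println(sums["sha256"])
}
```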

type Namaste added in v0.0.24

type Namaste struct {
	Type    string
	Version Spec
}

Namaste represents a NAMASTE declaration

func FindNamaste added in v0.0.24

func FindNamaste(items []fs.DirEntry) (Namaste, error)

FindNamaste returns the Namaste declaration from a fs.DirEntry slice. An error is returned if the number of declarations is not one.

func ParseNamaste added in v0.0.24

func ParseNamaste(name string) (n Namaste, err error)

func (Namaste) Body added in v0.0.24

func (n Namaste) Body() string

Body returns the expected file contents of the namaste declaration

func (Namaste) Name added in v0.0.24

func (n Namaste) Name() string

Name returns the filename for n (0=TYPE_VERSION) or an empty string if n is empty
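
For orientation, the 0=TYPE_VERSION convention yields names such as 0=ocfl_object_1.1 for an object and 0=ocfl_1.0 for a storage root. The sketch below uses a local namaste type to build and parse such names. It assumes the file body is the same TYPE_VERSION string followed by a newline, as the OCFL specification describes, and parseNamaste is a simplified stand-in rather than the package's ParseNamaste.

```go
package main

import (
	"fmt"
	"strings"
)

// namaste is a local stand-in for the Namaste struct shown above.
type namaste struct {
	Type    string
	Version string
}

// Name returns "0=TYPE_VERSION", or "" if either field is empty.
func (n namaste) Name() string {
	if n.Type == "" || n.Version == "" {
		return ""
	}
	return "0=" + n.Type + "_" + n.Version
}

// Body returns the assumed file contents: "TYPE_VERSION" plus a newline.
func (n namaste) Body() string {
	return n.Type + "_" + n.Version + "\n"
}

// parseNamaste splits a declaration filename at the last underscore, because
// the type itself may contain underscores (for example "ocfl_object").
func parseNamaste(name string) (namaste, error) {
	if !strings.HasPrefix(name, "0=") {
		return namaste{}, fmt.Errorf("not a NAMASTE filename: %q", name)
	}
	rest := name[2:]
	i := strings.LastIndex(rest, "_")
	if i <= 0 || i == len(rest)-1 {
		return namaste{}, fmt.Errorf("not a NAMASTE filename: %q", name)
	}
	return namaste{Type: rest[:i], Version: rest[i+1:]}, nil
}

func main() {
	obj := namaste{Type: "ocfl_object", Version: "1.1"}
	fmt.Println(obj.Name())       // 0=ocfl_object_1.1
	fmt.Printf("%q\n", obj.Body()) // "ocfl_object_1.1\n"

	root, err := parseNamaste("0=ocfl_1.0")
	fmt.Println(root.Type, root.Version, err) // ocfl 1.0 <nil>
}
```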

type ObjectRoot

type ObjectRoot struct {
	FS          FS       // the FS where the object is stored
	Path        string   // object path in FS
	Spec        Spec     // the OCFL spec from the object's NAMASTE declaration
	VersionDirs VNums    // versions directories found in the object directory
	SidecarAlg  string   // digest algorithm declared by the inventory sidecar
	NonConform  []string // non-conforming entries found in the object root (max=8)
	Flags       ObjectRootFlag
}

ObjectRoot represents an existing OCFL object root directory. Instances are typically created with functions like GetObjectRoot().

func GetObjectRoot

func GetObjectRoot(ctx context.Context, fsys FS, dir string) (*ObjectRoot, error)

GetObjectRoot reads the contents of directory dir in fsys, confirms that an OCFL Object declaration is present, and returns a new ObjectRoot reference based on the directory contents. If the directory cannot be read or a declaration is not found, ErrObjectNotFound is returned. Note, the object declaration is not read or fully validated. The returned ObjectRoot will have the HasNamaste flag set, but other flags expected for a complete object root may not be set (e.g., if the inventory is missing).

func NewObjectRoot

func NewObjectRoot(fsys FS, dir string, entries []fs.DirEntry) *ObjectRoot

NewObjectRoot constructs an ObjectRoot for the directory dir in fsys using the given fs.DirEntry slice as dir's contents. The returned ObjectRoot may be invalid.

func (ObjectRoot) HasExtensions added in v0.0.24

func (obj ObjectRoot) HasExtensions() bool

HasExtensions returns true if the object's HasExtensions flag is set

func (ObjectRoot) HasInventory

func (obj ObjectRoot) HasInventory() bool

HasInventory returns true if the object's HasInventory flag is set

func (ObjectRoot) HasNamaste added in v0.0.24

func (obj ObjectRoot) HasNamaste() bool

HasNamaste returns true if the object's HasNamaste flag is set

func (ObjectRoot) HasSidecar

func (obj ObjectRoot) HasSidecar() bool

HasSidecar returns true if the object's HasSidecar flag is set

func (ObjectRoot) HasVersionDir

func (obj ObjectRoot) HasVersionDir(dir VNum) bool

func (ObjectRoot) ValidateNamaste added in v0.0.24

func (obj ObjectRoot) ValidateNamaste(ctx context.Context) error

ValidateNamaste reads and validates the contents of the OCFL object declaration in the object root.

type ObjectRootFlag

type ObjectRootFlag int
const (
	// HasNamaste indicates that an ObjectRoot has been initialized
	// and an object declaration file is confirmed to exist in the object's root
	// directory
	HasNamaste ObjectRootFlag = 1 << iota
	// HasInventory indicates that an ObjectRoot includes an "inventory.json"
	// file
	HasInventory
	// HasSidecar indicates that an ObjectRoot includes an "inventory.json.*"
	// file (the inventory sidecar).
	HasSidecar
	// HasExtensions indicates that an ObjectRoot includes a directory
	// named "extensions"
	HasExtensions
)
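
Because the constants are declared with 1 << iota, each flag occupies its own bit and several can be combined in one ObjectRootFlag value. The sketch below redeclares the flags locally and derives them from a list of directory entry names. The flagsFor function and its matching rules are assumptions made for illustration, not the logic of NewObjectRoot.

```go
package main

import (
	"fmt"
	"strings"
)

type objectRootFlag int

// Local copies of the flags above: values are 1, 2, 4, and 8.
const (
	hasNamaste objectRootFlag = 1 << iota
	hasInventory
	hasSidecar
	hasExtensions
)

// flagsFor sets one bit per recognized entry name. The matching rules are a
// simplification for this sketch.
func flagsFor(names []string) objectRootFlag {
	var flags objectRootFlag
	for _, name := range names {
		switch {
		case strings.HasPrefix(name, "0=ocfl_object_"):
			flags |= hasNamaste
		case name == "inventory.json":
			flags |= hasInventory
		case strings.HasPrefix(name, "inventory.json."):
			flags |= hasSidecar
		case name == "extensions":
			flags |= hasExtensions
		}
	}
	return flags
}

// has reports whether all bits in want are set in flags.
func has(flags, want objectRootFlag) bool {
	return flags&want == want
}

func main() {
	flags := flagsFor([]string{"0=ocfl_object_1.1", "inventory.json", "inventory.json.sha512", "v1"})
	fmt.Println(int(flags))                // 7
	fmt.Println(has(flags, hasSidecar))    // true
	fmt.Println(has(flags, hasExtensions)) // false
}
```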

type ObjectRootIterator

type ObjectRootIterator interface {
	// ObjectRoots searches root and its subdirectories for OCFL object declarations
	// and calls fn for each object root it finds. The *ObjectRoot passed to fn is
	// confirmed to have an object declaration, but no other validation checks are
	// made.
	ObjectRoots(ctx context.Context, sel PathSelector, fn func(obj *ObjectRoot) error) error
}

ObjectRootIterator is used to iterate over object roots

type PathMap added in v0.0.23

type PathMap map[string]string

PathMap maps filenames to digest strings.

func (PathMap) DigestMap added in v0.0.23

func (pm PathMap) DigestMap() DigestMap

DigestMap returns a new DigestMap using the pathnames and digests in pm. The resulting DigestMap may be invalid if pm includes invalid paths or digests.

func (PathMap) DigestMapValid added in v0.0.23

func (pm PathMap) DigestMapValid() (DigestMap, error)

DigestMapValid returns a new DigestMap using the pathnames and digests in pm. If the resulting DigestMap is invalid, an error is returned.

type PathSelector

type PathSelector struct {
	// Dir is a parent directory for all paths that satisfy the selector. All
	// paths in the selection match Dir or have Dir as a parent (prefix). If Dir
	// is not a well-formed path (see fs.ValidPath), then no path names will
	// satisfy the path selector. There is one exception: The empty string is
	// converted to "." by consumers of the selector using Path().
	Dir string

	// SkipDirFn is used to skip directories during an iteration process. If the
	// function returns true for a given path, the directory's contents will be
	// skipped. The string parameter is always a directory name relative to an
	// FS.
	SkipDirFn func(dir string) bool
}

PathSelector is used to configure iterators that walk an FS. See FileIterator and ObjectRootIterator.

func Dir

func Dir(name string) PathSelector

Dir is a convenient way to construct a PathSelector for a given directory.

func (PathSelector) Path

func (ps PathSelector) Path() string

Path returns the selector's Dir as a valid path, or an empty string if Dir is not a valid path. An empty Dir is returned as ".".

func (PathSelector) SkipDir

func (ps PathSelector) SkipDir(name string) bool

SkipDir returns true if name should be skipped during an iteration process that uses the path selector
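
The selector semantics described above (a starting directory plus an optional skip function) map naturally onto fs.WalkDir. The sketch below is a standard-library approximation using a local selector type. The files and collect functions are invented for the example and are not the package's Files function.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// selector is a local stand-in for PathSelector.
type selector struct {
	Dir       string
	SkipDirFn func(dir string) bool
}

// path converts an empty Dir to "." and returns "" for invalid paths.
func (s selector) path() string {
	if s.Dir == "" {
		return "."
	}
	if !fs.ValidPath(s.Dir) {
		return ""
	}
	return s.Dir
}

// files calls fn for every regular file under the selector's directory,
// skipping directories for which SkipDirFn returns true.
func files(fsys fs.FS, sel selector, fn func(name string) error) error {
	root := sel.path()
	if root == "" {
		return nil // an invalid Dir selects nothing
	}
	return fs.WalkDir(fsys, root, func(name string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if sel.SkipDirFn != nil && sel.SkipDirFn(name) {
				return fs.SkipDir
			}
			return nil
		}
		if !d.Type().IsRegular() {
			return nil
		}
		return fn(name)
	})
}

// collect walks a small in-memory FS and returns the selected file names.
func collect(dir, skip string) []string {
	fsys := fstest.MapFS{
		"a/1.txt":     {Data: []byte("1")},
		"a/tmp/2.txt": {Data: []byte("2")},
		"b/3.txt":     {Data: []byte("3")},
	}
	sel := selector{Dir: dir, SkipDirFn: func(d string) bool { return d == skip }}
	var names []string
	files(fsys, sel, func(name string) error {
		names = append(names, name)
		return nil
	})
	return names
}

func main() {
	fmt.Println(collect("", "a/tmp")) // [a/1.txt b/3.txt]
	fmt.Println(collect("a", ""))     // [a/1.txt a/tmp/2.txt]
	fmt.Println(collect("../x", ""))  // []
}
```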

type RemapFunc

type RemapFunc func(oldPaths []string) (newPaths []string)

RemapFunc is a function used to transform a DigestMap

func Remove

func Remove(name string) RemapFunc

Remove returns a RemapFunc that removes name.

func Rename

func Rename(from, to string) RemapFunc

Rename returns a RemapFunc that renames from to to.
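
Since a RemapFunc receives the old paths for a digest and returns the new ones, Remove and Rename can be pictured as small closures over a slice of paths. The sketch below is a simplified reading of that idea: it matches exact file names only, and dropping digests left with no paths is an assumption of this example rather than documented behavior.

```go
package main

import "fmt"

// remapFunc has the same shape as RemapFunc above.
type remapFunc func(oldPaths []string) (newPaths []string)

// remove returns a remapFunc that drops name from the path list.
func remove(name string) remapFunc {
	return func(old []string) []string {
		out := make([]string, 0, len(old))
		for _, p := range old {
			if p != name {
				out = append(out, p)
			}
		}
		return out
	}
}

// rename returns a remapFunc that replaces from with to.
func rename(from, to string) remapFunc {
	return func(old []string) []string {
		out := make([]string, 0, len(old))
		for _, p := range old {
			if p == from {
				p = to
			}
			out = append(out, p)
		}
		return out
	}
}

// remap applies fns, in order, to the paths of every digest in m. Digests
// left without any paths are deleted (an assumption of this sketch).
func remap(m map[string][]string, fns ...remapFunc) {
	for digest, paths := range m {
		for _, fn := range fns {
			paths = fn(paths)
		}
		if len(paths) == 0 {
			delete(m, digest)
			continue
		}
		m[digest] = paths
	}
}

func main() {
	m := map[string][]string{"abc1": {"a.txt", "old.txt"}, "def2": {"tmp.txt"}}
	remap(m, rename("old.txt", "new.txt"), remove("tmp.txt"))
	fmt.Println(m) // map[abc1:[a.txt new.txt]]
}
```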

type Spec

type Spec string

Spec represents an OCFL specification number

func (Spec) AsInvType

func (n Spec) AsInvType() InvType

AsInvType returns n as an InvType

func (Spec) Cmp

func (v1 Spec) Cmp(v2 Spec) int

Cmp compares Spec v1 to another Spec v2:

  • If v1 is less than v2, returns -1.
  • If v1 is the same as v2, returns 0.
  • If v1 is greater than v2, returns 1.

func (Spec) Empty

func (n Spec) Empty() bool

func (Spec) Valid added in v0.0.24

func (s Spec) Valid() error

type Stage

type Stage struct {
	// State is a DigestMap representing the new object version state.
	State DigestMap
	// DigestAlgorithm is the primary digest algorithm (sha512 or sha256) used by the stage
	// state.
	DigestAlgorithm string
	// ContentSource is used to access new content needed to construct
	// an object. It may be nil
	ContentSource
	// FixitySource is used to access fixity information for new
	// content. It may be nil
	FixitySource
}

Stage is used to create/update objects.

func StageBytes added in v0.0.23

func StageBytes(content map[string][]byte, algs ...string) (*Stage, error)

StageBytes builds a stage from a map of filenames to file contents

func StageDir added in v0.0.23

func StageDir(ctx context.Context, fsys FS, dir string, algs ...string) (*Stage, error)

StageDir builds a stage based on the contents of the directory dir in fsys.

func (Stage) HasContent added in v0.0.23

func (s Stage) HasContent(digest string) bool

HasContent returns true if the stage's content source provides an FS and path for the digest

func (*Stage) Overlay added in v0.0.23

func (s *Stage) Overlay(stages ...*Stage) error

Overlay merges the state and content/fixity sources from all stages into s. All stages must share the same digest algorithm.

type User

type User struct {
	Name    string `json:"name"`
	Address string `json:"address,omitempty"`
}

User is a generic user information struct

type VNum

type VNum struct {
	// contains filtered or unexported fields
}

VNum represents an OCFL object version number (e.g., "v1", "v02"). A VNum has a sequence number (1,2,3...) and a padding number, which defaults to zero. The padding is the maximum number of numeric digits the version number can include (a padding of 0 is no maximum). The padding value constrains the maximum valid sequence number.

func MustParseVNum

func MustParseVNum(str string) VNum

MustParseVNum parses str as a VNum and returns a new VNum. It panics if str cannot be parsed as a VNum.

func V

func V(ns ...int) VNum

V returns a new VNum. The first argument is a sequence number. An optional second argument can be used to set the padding. Additional arguments are ignored. Without any arguments, V() returns a zero value VNum.

func (VNum) AsHead

func (v VNum) AsHead() VNums

AsHead returns a VNums with v as the head.

func (VNum) First

func (v VNum) First() bool

First returns true if v is version 1.

func (VNum) IsZero

func (v VNum) IsZero() bool

IsZero returns true if v is the zero value

func (VNum) MarshalText

func (v VNum) MarshalText() ([]byte, error)

func (VNum) Next

func (v VNum) Next() (VNum, error)

Next returns the next ocfl.VNum after v with the same padding. A non-nil error is returned if padding > 0 and next would overflow the padding
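
How padding limits the sequence number can be shown with a small local type. The sketch below assumes the zero-padding rule from the OCFL specification, where a padded version name must still begin with a zero after the "v", so a padding of p allows at most 10^(p-1) - 1. The vnum type, limit, and next are illustrative only and may differ from the package's exact checks.

```go
package main

import "fmt"

// vnum is a local stand-in for VNum: a sequence number plus a padding width.
type vnum struct {
	num     int
	padding int
}

// String formats v as "v" followed by the zero-padded number.
func (v vnum) String() string {
	return fmt.Sprintf("v%0*d", v.padding, v.num)
}

// limit returns the largest number allowed for the padding, and false if
// there is no limit (padding 0). It assumes a padded name must keep at least
// one leading zero.
func limit(padding int) (int, bool) {
	if padding <= 0 {
		return 0, false
	}
	n := 1
	for i := 1; i < padding; i++ {
		n *= 10
	}
	return n - 1, true
}

// next returns the following version with the same padding, or an error if
// the new number would overflow the padding.
func (v vnum) next() (vnum, error) {
	n := vnum{num: v.num + 1, padding: v.padding}
	if top, limited := limit(v.padding); limited && n.num > top {
		return vnum{}, fmt.Errorf("version %d overflows padding %d", n.num, v.padding)
	}
	return n, nil
}

func main() {
	a, _ := vnum{num: 1, padding: 3}.next()
	fmt.Println(a) // v002

	b, _ := vnum{num: 9, padding: 0}.next()
	fmt.Println(b) // v10

	_, err := vnum{num: 9, padding: 2}.next()
	fmt.Println(err) // version 10 overflows padding 2
}
```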

func (VNum) Num

func (v VNum) Num() int

Num returns v's number as an int

func (VNum) Padding

func (v VNum) Padding() int

Padding returns v's padding number.

func (VNum) Prev

func (v VNum) Prev() (VNum, error)

Prev returns the previous version before v, with the same padding. An error is returned if v.Num() == 1

func (VNum) String

func (v VNum) String() string

String returns string representation of v

func (*VNum) UnmarshalText

func (v *VNum) UnmarshalText(text []byte) error

func (VNum) Valid

func (v VNum) Valid() error

Valid returns an error if v is invalid

type VNums

type VNums []VNum

VNums is a slice of VNum elements

func (VNums) Head

func (vs VNums) Head() VNum

Head returns the last VNum in vs.

func (VNums) Len

func (vs VNums) Len() int

Len implements sort.Interface on VNums

func (VNums) Less

func (vs VNums) Less(i, j int) bool

Less implements sort.Interface on VNums

func (VNums) Padding

func (vs VNums) Padding() int

Padding returns the padding for the VNums in vs

func (VNums) Swap

func (vs VNums) Swap(i, j int)

Swap implements sort.Interface on VNums

func (VNums) Valid

func (vs VNums) Valid() error

Valid returns a non-nil error if VNums is empty, is not a continuous sequence (1,2,3...), or has inconsistent padding or padding overflow.
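
The sequence checks listed above can be sketched as a sort followed by a single pass. The example below uses a local ver type and local copies of three error values from the Variables section. It covers the empty, missing-version, and inconsistent-padding cases, and leaves out the padding overflow check for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Local copies of error values listed in the Variables section.
var (
	errEmpty   = errors.New("no versions found")
	errMissing = errors.New("missing version in version sequence")
	errPadding = errors.New("inconsistent version padding in version sequence")
)

// ver is a local stand-in for VNum.
type ver struct {
	num     int
	padding int
}

// validate checks that vs is non-empty, forms the sequence 1,2,3... once
// sorted, and uses one padding value throughout.
func validate(vs []ver) error {
	if len(vs) == 0 {
		return errEmpty
	}
	sorted := append([]ver(nil), vs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].num < sorted[j].num })
	for i, v := range sorted {
		if v.num != i+1 {
			return errMissing
		}
		if v.padding != sorted[0].padding {
			return errPadding
		}
	}
	return nil
}

func main() {
	fmt.Println(validate([]ver{{3, 0}, {1, 0}, {2, 0}})) // <nil>
	fmt.Println(validate([]ver{{1, 0}, {3, 0}}))         // missing version in version sequence
	fmt.Println(validate([]ver{{1, 2}, {2, 0}}))         // inconsistent version padding in version sequence
	fmt.Println(validate(nil))                           // no versions found
}
```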

type WriteFS

type WriteFS interface {
	FS
	Write(ctx context.Context, name string, buffer io.Reader) (int64, error)
	// Remove the file with path name
	Remove(ctx context.Context, name string) error
	// Remove the directory with path name and all its contents. If the path
	// does not exist, return nil.
	RemoveAll(ctx context.Context, name string) error
}

WriteFS is a storage backend that supports write and remove operations.

Directories

Path Synopsis
backend
s3
examples
internal
pathtree
Package pathtree provides Node[T] and generic functions used for storing arbitrary values in a hierarchical data structure following filesystem naming conventions.
Package [ocflv1] provides an implementation of OCFL v1.0 and v1.1.
