curate-export

command
v0.0.0-...-d4d6539
Warning

This package is not in the latest version of its module.

Published: Nov 24, 2021 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Overview

curate-export harvests curate records from a fedora instance and writes them as JSON structures to a given directory. Unlike the f3cp tool, this tool understands the Curate object layout and will deconstruct a curate object's complicated datastream layout into a flat set of key-value pairs.

All configuration is done using environment variables. The tool is intended to be called as a cron job, or to run inside a docker container.

The environment variable FEDORA_PATH gives the URL of the fedora 3.X instance you want to get data from. For example:

FEDORA_PATH="https://fedoraAdmin:xxxx@fedoraprod.lc.nd.edu:8443/fedora/"
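As an illustration of environment-only configuration, here is a minimal sketch of how a program might read these settings. It is not taken from the curate-export source: the Config type, the loadConfig helper, and the choice to treat FEDORA_PATH as required are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"os"
)

// Config is a hypothetical container for the variables named in this document.
type Config struct {
	FedoraURL   *url.URL // parsed FEDORA_PATH, including any embedded credentials
	ContentPath string   // CONTENT_PATH, may be empty
	Since       string   // SINCE, may be empty
}

// loadConfig reads settings through getenv so it can be exercised
// without modifying the real process environment.
func loadConfig(getenv func(string) string) (Config, error) {
	raw := getenv("FEDORA_PATH")
	if raw == "" {
		// Assumption: a missing FEDORA_PATH is treated as an error here.
		return Config{}, errors.New("FEDORA_PATH is not set")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return Config{}, fmt.Errorf("FEDORA_PATH: %w", err)
	}
	return Config{
		FedoraURL:   u,
		ContentPath: getenv("CONTENT_PATH"),
		Since:       getenv("SINCE"),
	}, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "configuration error:", err)
		return
	}
	fmt.Println("fedora host:", cfg.FedoraURL.Host)
	fmt.Println("content path:", cfg.ContentPath)
}
```

Passing the lookup function as a parameter keeps the sketch easy to test, which matters for a tool that normally runs unattended under cron or in a container.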

The environment variable CONTENT_PATH gives the local directory where output is written. Each metadata file is named after the object's PID ("PID"), and any content file stored in fedora is written as "PID-content". Only the most recent version of a content file is written. Fedora keeps ALL versions of a file, so older versions may exist, but given how Curate uses fedora, content files rarely have more than one version.
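The naming scheme can be sketched as follows. The recordPaths helper and the sample PID "und:abc123" are hypothetical, and the sketch assumes the PID is used verbatim as the file name; the actual tool may escape or transform it.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// recordPaths returns the metadata and content file paths for one PID,
// following the "PID" and "PID-content" naming described above.
// Assumption: the PID is used as-is, with no escaping.
func recordPaths(contentPath, pid string) (metadata, content string) {
	metadata = filepath.Join(contentPath, pid)
	content = filepath.Join(contentPath, pid+"-content")
	return metadata, content
}

func main() {
	metadata, content := recordPaths("./stuff", "und:abc123")
	fmt.Println("metadata file:", metadata)
	fmt.Println("content file: ", content)
}
```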

The harvest can cover either "everything" or only the items changed since a given date. A harvest date can be given in two ways:

  • Use the environment variable SINCE in the form "2022-10-11".
  • If CONTENT_PATH is set, a file named "LAST-HARVEST" is read if it exists, and the date contained in the file is used.
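The date selection described above could look roughly like the sketch below. Several details are assumptions rather than documented behavior: that SINCE takes precedence over the LAST-HARVEST file, that the file lives directly inside CONTENT_PATH, and that both the plain date form and the RFC 3339 form seen in this document are accepted. The helper names parseSince and harvestSince are invented for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// parseSince tries the two date shapes that appear in this documentation:
// an RFC 3339 timestamp and a plain "YYYY-MM-DD" date.
func parseSince(s string) (time.Time, error) {
	for _, layout := range []string{time.RFC3339, "2006-01-02"} {
		if t, err := time.Parse(layout, s); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("unrecognized date %q", s)
}

// harvestSince returns the cutoff time and true, or false when no date
// is available and everything should be harvested.
func harvestSince(since, contentPath string) (time.Time, bool, error) {
	if since == "" && contentPath != "" {
		// Assumption: LAST-HARVEST sits directly inside CONTENT_PATH.
		data, err := os.ReadFile(filepath.Join(contentPath, "LAST-HARVEST"))
		if err != nil && !os.IsNotExist(err) {
			return time.Time{}, false, err
		}
		since = strings.TrimSpace(string(data)) // empty if the file is absent
	}
	if since == "" {
		return time.Time{}, false, nil
	}
	cutoff, err := parseSince(since)
	if err != nil {
		return time.Time{}, false, err
	}
	return cutoff, true, nil
}

func main() {
	cutoff, ok, err := harvestSince(os.Getenv("SINCE"), os.Getenv("CONTENT_PATH"))
	switch {
	case err != nil:
		fmt.Fprintln(os.Stderr, "bad harvest date:", err)
	case ok:
		fmt.Println("harvesting items changed since", cutoff.Format(time.RFC3339))
	default:
		fmt.Println("harvesting everything")
	}
}
```

A missing LAST-HARVEST file is not treated as an error in this sketch, so a first run against an empty directory falls through to a full harvest.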

USAGE

To dump records and content files into the directory "stuff"

env FEDORA_PATH="..." CONTENT_PATH="./stuff" ./curate-export

To only harvest records and files changed since a given date

env FEDORA_PATH="..." CONTENT_PATH="./stuff" SINCE="2021-01-01T00:00:00Z" ./curate-export
