metha

package module
v0.3.3
Published: Apr 8, 2024 License: GPL-3.0 Imports: 31 Imported by: 8

README

metha

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. -- https://www.openarchives.org/pmh/

The metha command line tools can gather information on OAI-PMH endpoints and harvest data incrementally. The goal of metha is to make it simple to get access to data; its focus is not to manage it.


The metha tool has been developed for Project finc at Leipzig University Library.

Why yet another OAI harvester?

  • I wanted to crawl arXiv but found that existing tools would time out.
  • Some harvesters would start to download all records anew if I interrupted a running harvest.
  • There are many OAI endpoints out there. It is a widely used protocol and somewhat worth knowing.
  • I wanted something simple for the command line that is also fast and robust. As implemented now, metha is relatively robust and more efficient than requesting all records one by one (there is one annoyance which will hopefully be fixed soon).

How it works

The functionality is spread across a few different executables:

  • metha-sync for harvesting
  • metha-cat for viewing
  • metha-id for gathering data about endpoints
  • metha-ls for inspecting the local cache
  • metha-files for listing the associated files for a harvest

To harvest an endpoint in the default oai_dc format:

$ metha-sync http://export.arxiv.org/oai2
...

All downloaded files are written to a directory below a base directory. The base directory is ~/.cache/metha by default and can be adjusted with the METHA_DIR environment variable.

When the -dir flag is set, only the directory corresponding to a harvest is printed.

$ metha-sync -dir http://export.arxiv.org/oai2
/home/miku/.metha/I29haV9kYyNodHRwOi8vZXhwb3J0LmFyeGl2Lm9yZy9vYWky
$ METHA_DIR=/tmp/harvest metha-sync -dir http://export.arxiv.org/oai2
/tmp/harvest/I29haV9kYyNodHRwOi8vZXhwb3J0LmFyeGl2Lm9yZy9vYWky

The harvesting can be interrupted at any time and the HTTP client will automatically retry failed requests a few times before giving up.

Currently there is a limitation: data can only be harvested up to the last full day. Example: if the current date were Thu Apr 21 14:28:10 CEST 2016, the harvester would request all data between the repository's earliest date and 2016-04-20 23:59:59.

To stream the harvested XML data to stdout run:

$ metha-cat http://export.arxiv.org/oai2

You can emit records based on datestamp as well:

$ metha-cat -from 2016-01-01 http://export.arxiv.org/oai2

This will only stream records with a datestamp equal to or later than 2016-01-01.
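Because OAI datestamps use a fixed YYYY-MM-DD layout, such a -from filter can be thought of as a plain string comparison; a minimal sketch (the function name is hypothetical):

```go
package main

import "fmt"

// keepSince reports whether a record datestamp is on or after the given
// from date; with the fixed YYYY-MM-DD layout, lexicographic comparison
// agrees with chronological order.
func keepSince(datestamp, from string) bool {
	return datestamp >= from
}

func main() {
	fmt.Println(keepSince("2016-03-15", "2016-01-01")) // true
	fmt.Println(keepSince("2015-12-31", "2016-01-01")) // false
}
```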

To just stream all data really fast, use find and a parallel decompressor over the harvesting directory.

$ find $(metha-sync -dir http://export.arxiv.org/oai2) -name "*gz" | xargs unpigz -c

To display basic repository information:

$ metha-id http://export.arxiv.org/oai2

To list all harvested endpoints:

$ metha-ls

Further examples can be found in the metha man page:

$ man metha

Installation

Use a deb, rpm release, or the go tool:

$ go install -v github.com/miku/metha/cmd/...@latest

Limitations

Currently the set, the format and the endpoint URL are concatenated and base64 encoded to form the target directory, e.g.:

$ echo "U291bmRzI29haV9kYyNodHRwOi8vY29wYWMuamlzYy5hYy51ay9vYWktcG1o" | base64 -d
Sounds#oai_dc#http://copac.jisc.ac.uk/oai-pmh

If you have very long set names or a very long URL and the target directory exceeds e.g. 255 chars (on ext4), the harvest won't work.
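The naming scheme can be sketched in a few lines of Go, based on the set#format#url layout visible in the decoded example (an empty set name leaves a leading "#"). The helper name is made up, and the URL-safe base64 alphabet is an assumption, made here because the result must be a valid file name; for this input both alphabets agree:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// harvestDirName mimics the observed target directory naming: set, format
// and endpoint URL joined with "#" and base64 encoded.
func harvestDirName(set, format, baseURL string) string {
	return base64.URLEncoding.EncodeToString([]byte(set + "#" + format + "#" + baseURL))
}

func main() {
	fmt.Println(harvestDirName("", "oai_dc", "http://export.arxiv.org/oai2"))
	// I29haV9kYyNodHRwOi8vZXhwb3J0LmFyeGl2Lm9yZy9vYWky
}
```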

Harvesting Roulette

$ URL=$(shuf -n 1 <(curl -Lsf https://git.io/vKXFv)); metha-sync $URL && metha-cat $URL

In 0.1.27 a metha-fortune command was added, which fetches a random article description and displays it.

$ metha-fortune
Active Networking is concerned with the rapid definition and deployment of
innovative, but reliable and robust, networking services. Towards this end we
have developed a composite protocol and networking services architecture that
encourages re-use of protocol functions, is well defined, and facilitates
automatic checking of interfaces and protocol component properties. The
architecture has been used to implement common Internet protocols and services.
We will report on this work at the workshop.

    -- http://drops.dagstuhl.de/opus/phpoai/oai2.php

$ metha-fortune
In this paper we show that the Lempert property (i.e., the equality between the
Lempert function and the Carathéodory distance) holds in the tetrablock, a
bounded hyperconvex domain which is not biholomorphic to a convex domain. The
question whether such an equality holds was posed by Abouhajar et al. in J.
Geom. Anal. 17(4), 717–750 (2007).

    -- http://ruj.uj.edu.pl/oai/request

$ metha-fortune
I argue that Gödel's incompleteness theorem is much easier to understand when
thought of in terms of computers, and describe the writing of a computer
program which generates the undecidable Gödel sentence.

    -- http://quantropy.org/cgi/oai2

$ metha-fortune
Nigeria, a country in West Africa, sits on the Atlantic coast with a land area
of approximately 90 million hectares and a population of more than 140 million
people. The southern part of the country falls within the tropical rainforest
which has now been largely depleted and is in dire need of reforestation. About
10 percent of the land area was constituted into forest reserves for purposes
of conservation but this has suffered perturbations over the years to the
extent that what remains of the constituted forest reserves currently is less
than 4 percent of the country land area. As at today about 382,000 ha have been
reforested with indigenous and exotic species representing about 4 percent of
the remaining forest estate. Regrettably, funding of the Forestry sector in
Nigeria has been critically low, rendering reforestation programme near
impossible, especially in the last two decades. To revive the forestry sector
government at all levels must re-strategize and involve the local communities
as co-managers of the forest estates in order to create mutual dependence and
interaction in resource conservation.

    -- http://journal.reforestationchallenges.org/index.php/REFOR/oai

Scrape all metadata in a best-effort way

Use an endless loop with a timeout to get out of any hanging connections (which happen). Example scrape, converted to JSON (326M records, 60+ GB: 2023-11-01-metha-oai.ndjson.zst).

$ while true; do \
    timeout 120 metha-sync -list | \
    shuf | \
    parallel -j 64 -I {} "metha-sync -base-dir ~/.cache/metha {}"; \
done

Alternatively, use a metha.service file to run harvests continuously.

metha stores harvested data in one file per interval; to combine all XML files into a single JSON file you can use xmlstream.go (adjust the harvest directory):

$ fd . '/data/.cache/metha' -e xml.gz | parallel unpigz -c | xmlstream -D

For notes on parallel processing of XML see: Faster XML processing in Go.

Errors this harvester can somewhat handle

  • responses with resumption tokens that lead to empty responses
  • gzipped responses that are not advertised as such
  • funny (illegal) control characters in XML responses
  • repositories that won't respond unless the dates are given with the exact granularity
  • repositories with endless token loops
  • repositories that do not support selective harvesting (use the -no-intervals flag)
  • limited repositories; metha will try a few times with an exponential backoff
  • repositories that throw occasional HTTP errors, although most of the responses look good (use the -ignore-http-errors flag)

Authors

Misc

Show formats of random repository:

$ shuf -n 1 <(curl -Lsf https://git.io/vKXFv) | xargs -I {} metha-id {} | jq .formats

A snippet from a 2010 publication:

The Open Archives Protocol for Metadata Harvesting (OAI-PMH) (Lagoze and van de Sompel, 2002) is currently implemented by more than 1,700 digital library repositories world-wide and enables the exchange of metadata via HTTP. -- Interweaving OAI-PMH Data Sources with the Linked Data Cloud

Metha elsewhere

Asciicast


Documentation

Index

Constants

const (
	// DefaultTimeout on requests.
	DefaultTimeout = 10 * time.Minute
	// DefaultMaxRetries is the default number of retries on a single request.
	DefaultMaxRetries = 8
)
const Day = 24 * time.Hour

Day has 24 hours.

const Version = "0.3.3"

Version of tools.

Variables

var (
	// StdClient is the standard lib http client.
	StdClient = &Client{Doer: http.DefaultClient}
	// DefaultClient is the more resilient client, that will retry and timeout.
	DefaultClient = &Client{Doer: CreateDoer(DefaultTimeout, DefaultMaxRetries)}
	// DefaultUserAgent to identify crawler, some endpoints do not like the Go
	// default (https://golang.org/src/net/http/request.go#L462), e.g.
	// https://calhoun.nps.edu/oai/request.
	DefaultUserAgent = fmt.Sprintf("metha/%s", Version)
	// ControlCharReplacer helps to deal with broken XML: http://eprints.vu.edu.au/perl/oai2. Add more
	// weird things to be cleaned before XML parsing here. Another faulty:
	// http://digitalcommons.gardner-webb.edu/do/oai/?from=2016-02-29&metadataPr
	// efix=oai_dc&until=2016-03-31&verb=ListRecords. Replace control chars
	// outside XML char range.
	ControlCharReplacer = strings.NewReplacer(
		"\u0000", "", "\u0001", "", "\u0002", "", "\u0003", "", "\u0004", "",
		"\u0005", "", "\u0006", "", "\u0007", "", "\u0008", "", "\u0009", "",
		"\u000B", "", "\u000C", "", "\u000E", "", "\u000F", "", "\u0010", "",
		"\u0011", "", "\u0012", "", "\u0013", "", "\u0014", "", "\u0015", "",
		"\u0016", "", "\u0017", "", "\u0018", "", "\u0019", "", "\u001A", "",
		"\u001B", "", "\u001C", "", "\u001D", "", "\u001E", "", "\u001F", "",
		"\uFFFD", "", "\uFFFE", "",
	)
)
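ControlCharReplacer is applied to raw response bytes before XML decoding. A minimal sketch of that cleaning step, using a small local replacer with only a subset of the characters listed above (not the package's own):

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// cleaner drops a few control characters that are illegal in XML 1.0,
// in the spirit of ControlCharReplacer.
var cleaner = strings.NewReplacer("\u0000", "", "\u0001", "", "\u001B", "")

func main() {
	raw := "<title>He\u0001llo</title>"
	var v struct {
		Title string `xml:",chardata"`
	}
	// Decoding the raw bytes directly would fail with an illegal character
	// error; cleaning first makes the document parseable.
	if err := xml.Unmarshal([]byte(cleaner.Replace(raw)), &v); err != nil {
		panic(err)
	}
	fmt.Println(v.Title) // Hello
}
```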
var (
	// BaseDir is where all data is stored.
	BaseDir = filepath.Join(UserHomeDir(), ".cache", "metha")

	// ErrAlreadySynced signals completion.
	ErrAlreadySynced = errors.New("already synced")
	// ErrInvalidEarliestDate for unparsable earliest date.
	ErrInvalidEarliestDate = errors.New("invalid earliest date")
)
var (
	ErrInvalidVerb      = errors.New("invalid OAI verb")
	ErrMissingVerb      = errors.New("missing verb")
	ErrCannotGenerateID = errors.New("cannot generate ID")
	ErrMissingURL       = errors.New("missing URL")
	ErrParameterMissing = errors.New("missing required parameter")
)
var EndpointList string
var Endpoints = splitNonEmpty(EndpointList, "\n")

Endpoints from https://git.io/fxvs0.

Functions

func FindRepositoriesByString added in v0.1.29

func FindRepositoriesByString(s string) (urls []string, err error)

FindRepositoriesByString returns a list of already harvested base URLs given a fragment of the base URL.

func GetBaseDir added in v0.1.43

func GetBaseDir() string

GetBaseDir returns the base directory for the cache.

func MoveCompressFile added in v0.1.25

func MoveCompressFile(src, dst string) (err error)

MoveCompressFile will atomically move and compress a source file to a destination file.

func MustGlob

func MustGlob(pattern string) []string

MustGlob is like filepath.Glob, but panics on bad pattern.

func PrependSchema

func PrependSchema(s string) string

PrependSchema prepends http if it is missing.

func RandomEndpoint added in v0.1.27

func RandomEndpoint() string

RandomEndpoint returns a random endpoint url.

func Render added in v0.2.16

func Render(opts *RenderOpts) error

Render renders a harvest to JSON or XML.

func UserHomeDir

func UserHomeDir() string

UserHomeDir returns the home directory of the user.

Types

type About

type About struct {
	Body []byte `xml:",innerxml" json:"body,omitempty"`
}

About has additional record information.

func (About) GoString

func (ab About) GoString() string

GoString is a formatter for About content.

type Client

type Client struct {
	Doer Doer
}

Client can execute requests.

func CreateClient

func CreateClient(timeout time.Duration, retries int) *Client

CreateClient creates a client with timeout and retry properties.

func (*Client) Do

func (c *Client) Do(r *Request) (*Response, error)

Do executes a single Request. ResumptionToken handling must happen in the caller. Only Identify and GetRecord requests will return a complete response.
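Since Do returns one page at a time, a caller follows resumption tokens itself. The loop looks roughly like this; a stub fetch function and invented types stand in for Client.Do and the package's response types:

```go
package main

import "fmt"

// page is one ListRecords-style response: some records plus a resumption
// token; an empty token signals the last page.
type page struct {
	records []string
	token   string
}

// drain follows resumption tokens until the endpoint stops issuing them,
// mirroring what callers of (*Client).Do have to implement themselves.
func drain(fetch func(token string) page) []string {
	var all []string
	token := ""
	for {
		p := fetch(token)
		all = append(all, p.records...)
		if p.token == "" {
			return all
		}
		token = p.token
	}
}

func main() {
	pages := map[string]page{
		"":   {records: []string{"r1", "r2"}, token: "t1"},
		"t1": {records: []string{"r3"}, token: ""},
	}
	fmt.Println(len(drain(func(tok string) page { return pages[tok] }))) // 3
}
```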

type CopyHook added in v0.1.38

type CopyHook struct {
	io.Writer
	// contains filtered or unexported fields
}

CopyHook is a Logrus hook that copies messages to a writer.

func NewCopyHook added in v0.1.38

func NewCopyHook(w io.Writer, levels ...log.Level) CopyHook

NewCopyHook initializes a copy hook. By default, it copies Warn, Error, Fatal and Panic level messages. Override these by passing in other logrus.Level values.

func (CopyHook) Fire added in v0.1.38

func (hook CopyHook) Fire(entry *log.Entry) error

Fire writes a logrus message.

func (CopyHook) Levels added in v0.1.38

func (hook CopyHook) Levels() []log.Level

Levels returns the levels the CopyHook logs.

type Description

type Description struct {
	Body []byte `xml:",innerxml"`
}

Description holds information about a set.

func (Description) GoString

func (desc Description) GoString() string

GoString is a formatter for Description content.

type DirLaster

type DirLaster struct {
	Dir           string
	DefaultValue  string
	ExtractorFunc func(os.FileInfo) string
}

DirLaster extracts the maximum value from the files of a directory. The values are extracted per file via ExtractorFunc, which gets a file info and returns a token. The tokens are sorted and the lexicographically largest element is returned.

func (DirLaster) Last

func (l DirLaster) Last() (string, error)

Last extracts the maximum value from a directory, given an extractor function.

type Doer

type Doer interface {
	Do(*http.Request) (*http.Response, error)
}

Doer is a minimal HTTP interface.

func CreateDoer

func CreateDoer(timeout time.Duration, retries int) Doer

CreateDoer returns an HTTP client with specific timeout and retry properties.

type GetRecord

type GetRecord struct {
	Record Record `xml:"record,omitempty" json:"record,omitempty"`
}

GetRecord returns a single record.

type HTTPError added in v0.1.8

type HTTPError struct {
	URL          *url.URL
	StatusCode   int
	RequestError error
}

HTTPError saves details of an HTTP error.

func (HTTPError) Error added in v0.1.8

func (e HTTPError) Error() string

Error prints the error message.

type Harvest

type Harvest struct {
	BaseURL string
	Format  string
	Set     string
	From    string
	Until   string
	Client  *Client

	// XXX: Factor these out into options.
	MaxRequests                int
	DisableSelectiveHarvesting bool
	CleanBeforeDecode          bool
	IgnoreHTTPErrors           bool
	MaxEmptyResponses          int
	SuppressFormatParameter    bool
	HourlyInterval             bool
	DailyInterval              bool
	ExtraHeaders               http.Header
	KeepTemporaryFiles         bool

	Delay int

	// XXX: Lazy via sync.Once?
	Identify *Identify
	Started  time.Time

	// Protects the rare case, where we are in the process of renaming
	// harvested files and get a termination signal at the same time.
	sync.Mutex
}

Harvest contains parameters for a mass download. MaxRequests and CleanBeforeDecode are switches to handle broken token implementations and funny chars in responses. Some repos do not support selective harvesting (e.g. zvdd.org/oai2); set DisableSelectiveHarvesting to try to grab metadata from these repositories. From and Until must always be given in the 2006-01-02 layout. TODO(miku): make the zero type work (lazily run identify).

func NewHarvest

func NewHarvest(baseURL string) (*Harvest, error)

NewHarvest creates a new harvest. A network connection will be used for an initial Identify request.

func (*Harvest) DateLayout

func (h *Harvest) DateLayout() string

DateLayout converts the repository endpoint's advertised granularity to a Go date format string.
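OAI-PMH defines exactly two granularities, so the mapping presumably comes down to something like the following sketch (the function name is made up; metha's actual handling may differ):

```go
package main

import "fmt"

// layoutFor maps the two granularities defined by OAI-PMH to Go reference
// layouts.
func layoutFor(granularity string) string {
	switch granularity {
	case "YYYY-MM-DD":
		return "2006-01-02"
	case "YYYY-MM-DDThh:mm:ssZ":
		return "2006-01-02T15:04:05Z"
	}
	return ""
}

func main() {
	fmt.Println(layoutFor("YYYY-MM-DD")) // 2006-01-02
}
```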

func (*Harvest) Dir

func (h *Harvest) Dir() string

Dir returns the absolute path to the harvesting directory.

func (*Harvest) Files

func (h *Harvest) Files() []string

Files returns all files for a given harvest, without the temporary files.

func (*Harvest) MkdirAll

func (h *Harvest) MkdirAll() error

MkdirAll creates necessary directories.

func (*Harvest) Run

func (h *Harvest) Run() error

Run starts the harvest.

type Header

type Header struct {
	Status     string   `xml:"status,attr" json:"status,omitempty"`
	Identifier string   `xml:"identifier,omitempty" json:"identifier,omitempty"`
	DateStamp  string   `xml:"datestamp,omitempty" json:"datestamp,omitempty"`
	SetSpec    []string `xml:"setSpec,omitempty" json:"setSpec,omitempty"`
}

A Header is part of other requests.

type Identify

type Identify struct {
	RepositoryName    string        `xml:"repositoryName,omitempty" json:"repositoryName,omitempty"`
	BaseURL           string        `xml:"baseURL,omitempty" json:"baseURL,omitempty"`
	ProtocolVersion   string        `xml:"protocolVersion,omitempty" json:"protocolVersion,omitempty"`
	AdminEmail        []string      `xml:"adminEmail,omitempty" json:"adminEmail,omitempty"`
	EarliestDatestamp string        `xml:"earliestDatestamp,omitempty" json:"earliestDatestamp,omitempty"`
	DeletedRecord     string        `xml:"deletedRecord,omitempty" json:"deletedRecord,omitempty"`
	Granularity       string        `xml:"granularity,omitempty" json:"granularity,omitempty"`
	Description       []Description `xml:"description,omitempty" json:"description,omitempty"`
}

Identify reports information about a repository.

type Interval

type Interval struct {
	Begin time.Time
	End   time.Time
}

Interval represents a span of time.

func (Interval) DailyIntervals added in v0.1.14

func (iv Interval) DailyIntervals() []Interval

DailyIntervals segments a given interval into daily intervals.

func (Interval) HourlyIntervals added in v0.2.5

func (iv Interval) HourlyIntervals() []Interval

HourlyIntervals segments a given interval into hourly intervals.

func (Interval) MonthlyIntervals

func (iv Interval) MonthlyIntervals() []Interval

MonthlyIntervals segments a given interval into monthly intervals.

func (Interval) String added in v0.1.14

func (iv Interval) String() string

String formats the interval.

type Laster

type Laster interface {
	Last() (string, error)
}

Laster extracts some maximum value as string.

type ListIdentifiers

type ListIdentifiers struct {
	Headers         []Header        `xml:"header,omitempty" json:"header,omitempty"`
	ResumptionToken ResumptionToken `xml:"resumptionToken,omitempty" json:"resumptionToken,omitempty"`
}

ListIdentifiers lists headers only.

type ListMetadataFormats

type ListMetadataFormats struct {
	MetadataFormat []MetadataFormat `xml:"metadataFormat,omitempty" json:"metadataFormat,omitempty"`
}

ListMetadataFormats lists supported metadata formats.

type ListRecords

type ListRecords struct {
	Records         []Record        `xml:"record" json:"record"`
	ResumptionToken ResumptionToken `xml:"resumptionToken,omitempty" json:"resumptionToken,omitempty"`
}

ListRecords lists records.

type ListSets

type ListSets struct {
	Set             []Set           `xml:"set,omitempty"  json:"set,omitempty"`
	ResumptionToken ResumptionToken `xml:"resumptionToken,omitempty" json:"resumptionToken,omitempty"`
}

ListSets lists available sets.

type Metadata

type Metadata struct {
	Body []byte `xml:",innerxml"`
}

Metadata contains the actual metadata, conforming to varying schemas.

func (Metadata) GoString

func (md Metadata) GoString() string

GoString is a formatter for Metadata content.

func (Metadata) MarshalJSON

func (md Metadata) MarshalJSON() ([]byte, error)

MarshalJSON marshals the metadata body.

type MetadataFormat

type MetadataFormat struct {
	MetadataPrefix    string `xml:"metadataPrefix,omitempty" json:"metadataPrefix,omitempty"`
	Schema            string `xml:"schema,omitempty" json:"schema,omitempty"`
	MetadataNamespace string `xml:"metadataNamespace,omitempty" json:"metadataNamespace,omitempty"`
}

MetadataFormat holds information about a format.

type MultiError

type MultiError struct {
	Errors []error
}

MultiError collects a number of errors.

func (*MultiError) Error

func (e *MultiError) Error() string

Error formats all error strings into a single string.

type OAIError

type OAIError struct {
	Code    string `xml:"code,attr" json:"code,omitempty"`
	Message string `xml:",chardata" json:"message,omitempty"`
}

OAIError is an OAI protocol error.

func (OAIError) Error

func (e OAIError) Error() string

Error formats code and message.

type Record

type Record struct {
	XMLName  xml.Name
	Header   Header   `xml:"header,omitempty" json:"header,omitempty"`
	Metadata Metadata `xml:"metadata,omitempty" json:"metadata,omitempty"`
	About    About    `xml:"about,omitempty" json:"about,omitempty"`
}

Record represents a single record.

type RenderOpts added in v0.2.16

type RenderOpts struct {
	Writer  io.Writer
	Harvest Harvest
	Root    string
	From    string
	Until   string
	UseJson bool
}

RenderOpts controls output by the metha-cat command.

type Repository

type Repository struct {
	BaseURL string
}

Repository represents an OAI endpoint.

func (Repository) Formats

func (r Repository) Formats() ([]MetadataFormat, error)

Formats returns a list of metadata formats.

func (Repository) Sets

func (r Repository) Sets() ([]Set, error)

Sets returns a list of sets.

type Request

type Request struct {
	BaseURL                 string
	Verb                    string
	Identifier              string
	MetadataPrefix          string
	From                    string
	Until                   string
	Set                     string
	ResumptionToken         string
	CleanBeforeDecode       bool
	SuppressFormatParameter bool
	ExtraHeaders            http.Header
}

A Request can express any OAI request. Not all combinations of values will yield valid requests.

func (*Request) URL

func (r *Request) URL() (*url.URL, error)

URL returns the URL for a given request. Invalid verbs and missing parameters are reported here.
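The assembled URL is an ordinary query string over the base URL; a sketch of that assembly with net/url, minus the verb and parameter validation that (*Request).URL performs (the helper is hypothetical):

```go
package main

import (
	"fmt"
	"net/url"
)

// buildURL assembles an OAI-PMH request URL from a verb and parameters.
func buildURL(baseURL, verb string, params map[string]string) string {
	v := url.Values{}
	v.Set("verb", verb)
	for k, val := range params {
		v.Set(k, val)
	}
	return baseURL + "?" + v.Encode()
}

func main() {
	fmt.Println(buildURL("http://export.arxiv.org/oai2", "ListRecords",
		map[string]string{"metadataPrefix": "oai_dc", "from": "2016-01-01"}))
	// http://export.arxiv.org/oai2?from=2016-01-01&metadataPrefix=oai_dc&verb=ListRecords
}
```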

type RequestNode

type RequestNode struct {
	Verb           string `xml:"verb,attr" json:"verb,omitempty"`
	Set            string `xml:"set,attr" json:"set,omitempty"`
	MetadataPrefix string `xml:"metadataPrefix,attr" json:"metadataPrefix,omitempty"`
}

RequestNode carries the request information into the response.

type Response

type Response struct {
	ResponseDate string      `xml:"responseDate,omitempty" json:"responseDate,omitempty"`
	Request      RequestNode `xml:"request,omitempty" json:"request,omitempty"`
	Error        OAIError    `xml:"error,omitempty" json:"error,omitempty"`

	GetRecord           GetRecord           `xml:"GetRecord,omitempty" json:"GetRecord,omitempty"`
	Identify            Identify            `xml:"Identify,omitempty" json:"Identify,omitempty"`
	ListIdentifiers     ListIdentifiers     `xml:"ListIdentifiers,omitempty" json:"ListIdentifiers,omitempty"`
	ListMetadataFormats ListMetadataFormats `xml:"ListMetadataFormats,omitempty" json:"ListMetadataFormats,omitempty"`
	ListRecords         ListRecords         `xml:"ListRecords,omitempty" json:"ListRecords,omitempty"`
	ListSets            ListSets            `xml:"ListSets,omitempty" json:"ListSets,omitempty"`
}

Response is the envelope. It can hold any OAI response kind.

func Do

func Do(r *Request) (*Response, error)

Do is a shortcut for DefaultClient.Do.

func (*Response) CompleteListSize added in v0.1.38

func (response *Response) CompleteListSize() string

CompleteListSize returns the value of completeListSize, if it exists.

func (*Response) Cursor added in v0.1.38

func (response *Response) Cursor() string

Cursor returns the value of cursor, if it exists.

func (*Response) GetResumptionToken

func (response *Response) GetResumptionToken() string

GetResumptionToken returns the resumption token, or an empty string if the response does not have a token. In addition, it returns an empty string if cursor and complete list size are defined and equal (doaj, refs #14865).
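That termination rule can be sketched as a small helper (an illustration with string-typed attributes as in ResumptionToken; not the method's actual code):

```go
package main

import "fmt"

// nextToken returns the token to use for the next request, or "" when the
// harvest is done. The doaj-style case, where a token is still present
// although cursor equals completeListSize, is treated as done.
func nextToken(token, cursor, completeListSize string) string {
	if cursor != "" && completeListSize != "" && cursor == completeListSize {
		return ""
	}
	return token
}

func main() {
	fmt.Println(nextToken("abc", "100", "100") == "") // true
	fmt.Println(nextToken("abc", "50", "100"))        // abc
}
```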

func (*Response) HasResumptionToken

func (response *Response) HasResumptionToken() bool

HasResumptionToken determines if the request has a ResumptionToken.

type ResumptionToken added in v0.1.38

type ResumptionToken struct {
	Text             string `xml:",chardata"` // eyJhIjogWyIyMDE5LTAyLTIxV...
	CompleteListSize string `xml:"completeListSize,attr"`
	Cursor           string `xml:"cursor,attr"`
	ExpirationDate   string `xml:"expirationDate,attr"`
}

ResumptionToken with optional extra information.

type Set

type Set struct {
	SetSpec        string      `xml:"setSpec,omitempty" json:"setSpec,omitempty"`
	SetName        string      `xml:"setName,omitempty" json:"setName,omitempty"`
	SetDescription Description `xml:"setDescription,omitempty" json:"setDescription,omitempty"`
}

A Set has a spec, name and description.

type Values

type Values struct {
	url.Values
}

Values enhances the builtin url.Values.

func NewValues

func NewValues() Values

NewValues creates a new Values container.

func (Values) EncodeVerbatim

func (v Values) EncodeVerbatim() string

EncodeVerbatim is like Encode(), but does not escape the keys and values.

Directories

Path Synopsis
cmd
metha-snapshot
Download metadata from all known endpoints (or some supplied list), generate a single JSON file.
extra
_largecrawl
genjson extracts info from a stream of OAI DC XML records, e.g.
pkpindex
Small util to get journal info from https://index.pkp.sfu.ca currently including 1264043 records indexed from 4960 publications.
xflag
Package xflag adds an additional flag type Array for repeated string flags.
