islint

v0.1.8
Published: Dec 11, 2015 License: GPL-3.0 Imports: 9 Imported by: 2

README

islint

Intermediate Schema linter. What is linting?

Documentation on godoc.org.

Install current version:

$ go get github.com/miku/islint/cmd/...

Outdated precompiled Linux 64-bit toy: islint

Usage

$ islint -h
Usage of islint:
  -ls
        list tests
  -sample float
        ratio of records to test (default 1)
  -v    show version and exit
  -verbose
        show every error

$ islint -ls
CurrencyInTitle
EndPageBeforeStartPage
EtAlAuthorName
ExcessivePunctuation
InvalidCollection
InvalidEndPage
InvalidStartPage
InvalidURL
KeyTooLong
NAInAuthorName
NoPublisher
PublicationDateTooEarly
PublicationDateTooLate
RepeatedSlash
RepeatedSubtitle
ShortAuthorName
SuspiciousPageCount
WhitespaceAuthor

$ islint < file.is | jq
{
  "damaged": 53262,
  "dist": {
    "CurrencyInTitle": 2177,
    "EndPageBeforeStartPage": 352,
    "EtAlAuthorName": 29,
    "ExcessivePunctuation": 8,
    "InvalidCollection": 6006,
    "InvalidStartPage": 220,
    "PublicationDateTooEarly": 3680,
    "RepeatedSlash": 13,
    "RepeatedSubtitle": 37501,
    "ShortAuthorName": 4352
  },
  "elapsed": 47.49654878,
  "errcount": {
    "0": 946738,
    "1": 52188,
    "2": 1072,
    "3": 2
  },
  "ratio": "5.326",
  "start": "2015-12-07T18:41:06.50489407+01:00",
  "total": 1000000
}
...
{
  "damaged": 1994583,
  "dist": {
    "CurrencyInTitle": 33179,
    "EndPageBeforeStartPage": 8391,
    "EtAlAuthorName": 1363,
    "ExcessivePunctuation": 337,
    "InvalidCollection": 206737,
    "InvalidEndPage": 387,
    "InvalidStartPage": 9087,
    "InvalidURL": 1,
    "NoPublisher": 1393457,
    "PublicationDateTooEarly": 58379,
    "RepeatedSlash": 6717,
    "RepeatedSubtitle": 242985,
    "ShortAuthorName": 97244,
    "SuspiciousPageCount": 5
  },
  "elapsed": 509.939547991,
  "errcount": {
    "0": 6913680,
    "1": 1931478,
    "2": 62524,
    "3": 581
  },
  "ratio": "22.390",
  "start": "2015-12-07T18:41:06.50489407+01:00",
  "total": 8908263
}
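
The summary objects above can be consumed from Go. The following is a minimal sketch, assuming each object is decoded on its own; the `Report` type and the helper names are hypothetical and not part of islint. It also recomputes two figures that can be checked against the samples: the damaged ratio in percent, and the number of damaged records as the sum of all `errcount` buckets other than "0" (which holds for both samples shown).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Report mirrors the fields visible in the sample output. The type is a
// hypothetical helper for this sketch, not something islint exports.
type Report struct {
	Damaged  int            `json:"damaged"`
	Dist     map[string]int `json:"dist"`
	Elapsed  float64        `json:"elapsed"`
	ErrCount map[string]int `json:"errcount"`
	Ratio    string         `json:"ratio"`
	Start    string         `json:"start"`
	Total    int            `json:"total"`
}

// parseReport decodes a single summary object from a string.
func parseReport(s string) (Report, error) {
	var r Report
	err := json.NewDecoder(strings.NewReader(s)).Decode(&r)
	return r, err
}

// damagedPercent recomputes the share of damaged records in percent.
func damagedPercent(r Report) float64 {
	if r.Total == 0 {
		return 0
	}
	return 100 * float64(r.Damaged) / float64(r.Total)
}

// sumDamaged adds up all errcount buckets except "0", i.e. the number of
// records with at least one issue.
func sumDamaged(r Report) int {
	n := 0
	for issues, records := range r.ErrCount {
		if issues != "0" {
			n += records
		}
	}
	return n
}

func main() {
	r, err := parseReport(`{"damaged": 53262, "errcount": {"0": 946738, "1": 52188, "2": 1072, "3": 2}, "ratio": "5.326", "total": 1000000}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("reported ratio %s, recomputed %.3f, damaged by errcount %d\n",
		r.Ratio, damagedPercent(r), sumDamaged(r))
}
```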

Documentation

Constants

This section is empty.

Variables

var (
	// EarliestDate is the earliest publication date we accept.
	EarliestDate = time.Date(1458, 1, 1, 0, 0, 0, 0, time.UTC)
	// LatestDate represents the latest publication date we accept.
	LatestDate = time.Now().AddDate(5, 0, 0)

	// AllowedCollections is the set of allowed collection names, loaded from the bundled assets.
	AllowedCollections = assetutil.MustLoadStringSet("assets/collections/collections.tsv",
		"assets/collections/crossref.tsv")
)

Functions

func AllowedCollectionNames

func AllowedCollectionNames(is finc.IntermediateSchema) error

AllowedCollectionNames checks the collection name against a fixed list of allowed names, stored under assets, refs. #6496.

func FeasibleAuthor added in v0.1.2

func FeasibleAuthor(is finc.IntermediateSchema) error

FeasibleAuthor checks for a few suspicious author patterns, refs. #4892, #4940.

func HasPublisher added in v0.1.1

func HasPublisher(is finc.IntermediateSchema) error

HasPublisher tests whether a publisher is given.

func HasURL added in v0.1.8

func HasURL(is finc.IntermediateSchema) error

func KeyLength

func KeyLength(is finc.IntermediateSchema) error

KeyLength checks the length of the record id. memcachedb limits this to 250 bytes.
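
A length check of this kind could look like the sketch below. It takes the record id as a plain string, which is an assumption made to keep the example self-contained; the real function receives a finc.IntermediateSchema, and `checkKeyLength` and `maxKeyLength` are hypothetical names.

```go
package main

import "fmt"

// maxKeyLength is the 250-byte limit mentioned in the KeyLength comment.
const maxKeyLength = 250

// checkKeyLength is a hypothetical stand-in for KeyLength that works on the
// record id directly.
func checkKeyLength(id string) error {
	// len on a string counts bytes, not runes, which matches a byte limit.
	if n := len(id); n > maxKeyLength {
		return fmt.Errorf("key too long: %d bytes, limit %d", n, maxKeyLength)
	}
	return nil
}

func main() {
	fmt.Println(checkKeyLength("ai-49-example")) // short id, no error
}
```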

func NoCurrencyInTitle

func NoCurrencyInTitle(is finc.IntermediateSchema) error

NoCurrencyInTitle flags titles that contain a price, e.g. http://goo.gl/HACBcW: "Cartier, Marie. Baby, You Are My Religion: Women, Gay Bars, and Theology Before Stonewall. Gender, Theology and Spirituality. Durham, UK: Acumen, 2013. xii+256 pp. $90.00 (cloth); $29.95 (paper)."

func NoExcessivePunctuation added in v0.1.1

func NoExcessivePunctuation(is finc.IntermediateSchema) error

NoExcessivePunctuation should detect things like this title: CrossRef????????????? https://goo.gl/AD0V1o
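
One way to catch a title like the one above is a regular expression over runs of punctuation. This is only a sketch: the pattern, the threshold of three characters and the function name are assumptions, and the actual rule used by NoExcessivePunctuation is not documented here.

```go
package main

import (
	"fmt"
	"regexp"
)

// excessive matches three or more consecutive question or exclamation marks.
// Both the character class and the threshold are assumptions for this sketch.
var excessive = regexp.MustCompile(`[?!]{3,}`)

// hasExcessivePunctuation reports whether the title contains such a run.
func hasExcessivePunctuation(title string) bool {
	return excessive.MatchString(title)
}

func main() {
	fmt.Println(hasExcessivePunctuation("CrossRef?????????????")) // true
	fmt.Println(hasExcessivePunctuation("What is linting?"))      // false
}
```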

func NoRepeatedSlash added in v0.1.4

func NoRepeatedSlash(is finc.IntermediateSchema) error

NoRepeatedSlash checks a DOI for repeated slashes, refs. #6312.
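
A repeated-slash check can be as small as a substring test. The sketch below assumes a bare DOI without a resolver prefix (a URL such as one starting with `http://` would trivially contain two slashes); `hasRepeatedSlash` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// hasRepeatedSlash reports whether a bare DOI contains two slashes in a row.
// Assumption: the DOI carries no scheme or resolver prefix.
func hasRepeatedSlash(doi string) bool {
	return strings.Contains(doi, "//")
}

func main() {
	fmt.Println(hasRepeatedSlash("10.1000//example")) // true
	fmt.Println(hasRepeatedSlash("10.1000/example"))  // false
}
```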

func PlausibleDate

func PlausibleDate(is finc.IntermediateSchema) error

PlausibleDate checks for suspicious dates, refs. #5686.
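
Given the EarliestDate and LatestDate variables above, a date window check could be sketched as follows. The lowercase names, the error values and the `checkYear` helper are hypothetical; only the two bounds are taken from the package variables.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	// Bounds copied from the package variables EarliestDate and LatestDate.
	earliest = time.Date(1458, 1, 1, 0, 0, 0, 0, time.UTC)
	latest   = time.Now().AddDate(5, 0, 0)

	// Hypothetical error values, named after the corresponding Kind constants.
	errTooEarly = errors.New("publication date too early")
	errTooLate  = errors.New("publication date too late")
)

// checkDate returns an error if t lies outside the accepted window.
func checkDate(t time.Time) error {
	switch {
	case t.Before(earliest):
		return errTooEarly
	case t.After(latest):
		return errTooLate
	}
	return nil
}

// checkYear is a convenience wrapper that tests January 1st of the given year.
func checkYear(year int) error {
	return checkDate(time.Date(year, 1, 1, 0, 0, 0, 0, time.UTC))
}

func main() {
	fmt.Println(checkYear(1400)) // before 1458
	fmt.Println(checkYear(2015)) // inside the window
}
```

Because the upper bound is computed from time.Now() when the package is initialized, the window moves forward with every run.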

func PlausiblePageCount

func PlausiblePageCount(is finc.IntermediateSchema) error

PlausiblePageCount checks whether the start and end page look plausible.
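
The page-related kinds listed by `islint -ls` (InvalidStartPage, InvalidEndPage, EndPageBeforeStartPage, SuspiciousPageCount) suggest a check along these lines. This is a sketch under several assumptions: pages arrive as strings, empty values are skipped, and the `maxPageCount` threshold is an arbitrary number chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Hypothetical error values, named after the corresponding Kind constants.
var (
	errInvalidStartPage    = errors.New("invalid start page")
	errInvalidEndPage      = errors.New("invalid end page")
	errEndBeforeStart      = errors.New("end page before start page")
	errSuspiciousPageCount = errors.New("suspicious page count")
)

// maxPageCount is an arbitrary threshold for this sketch.
const maxPageCount = 20000

// checkPages validates a start and end page given as strings.
func checkPages(start, end string) error {
	if start == "" || end == "" {
		return nil // assumption: nothing to compare, nothing to report
	}
	s, err := strconv.Atoi(start)
	if err != nil || s < 0 {
		return errInvalidStartPage
	}
	e, err := strconv.Atoi(end)
	if err != nil || e < 0 {
		return errInvalidEndPage
	}
	if e < s {
		return errEndBeforeStart
	}
	if e-s > maxPageCount {
		return errSuspiciousPageCount
	}
	return nil
}

func main() {
	fmt.Println(checkPages("10", "25"))
	fmt.Println(checkPages("25", "10"))
}
```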

func SubtitleRepetition

func SubtitleRepetition(is finc.IntermediateSchema) error

SubtitleRepetition checks for a repeated subtitle, refs. #6553.

func ValidURL

func ValidURL(is finc.IntermediateSchema) error

ValidURL checks whether a URL string is parseable.
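
Parseability can be tested with the standard library's net/url package. The sketch below assumes the URL is available as a plain string; `checkURL` is a hypothetical name. Note that url.Parse is lenient and accepts many strings that are not useful links, so this only catches malformed input.

```go
package main

import (
	"fmt"
	"net/url"
)

// checkURL returns the parse error for s, or nil if s can be parsed.
func checkURL(s string) error {
	_, err := url.Parse(s)
	return err
}

func main() {
	fmt.Println(checkURL("https://example.com/article/1")) // parseable
	fmt.Println(checkURL("http://example.com/%zz"))        // invalid escape sequence
}
```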

Types

type Issue added in v0.1.1

type Issue struct {
	Kind    Kind
	Record  finc.IntermediateSchema
	Message string
}

Issue contains information about a quality issue in an intermediate schema record.

func (Issue) Error added in v0.1.1

func (e Issue) Error() string

Error formats the error.

func (Issue) TSV added in v0.1.1

func (e Issue) TSV() string

TSV returns a tab-separated representation.

type Kind

type Kind uint16
const (
	KeyTooLong Kind = iota
	InvalidStartPage
	InvalidEndPage
	EndPageBeforeStartPage
	InvalidURL
	SuspiciousPageCount
	PublicationDateTooEarly
	PublicationDateTooLate
	InvalidCollection
	RepeatedSubtitle
	CurrencyInTitle
	ExcessivePunctuation
	NoPublisher
	ShortAuthorName
	EtAlAuthorName
	NAInAuthorName
	WhitespaceAuthor
	RepeatedSlash
	NoURL
)
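
To illustrate how an iota-based Kind and an Issue can work together, here is a reduced sketch. It declares only three of the kinds, keeps a record id instead of a full finc.IntermediateSchema, and the String method, the name table and the output formats of Error and TSV are all assumptions; the package's actual formatting may differ.

```go
package main

import "fmt"

// Kind is a reduced copy of the enumeration, limited to three values.
type Kind uint16

const (
	KeyTooLong Kind = iota
	InvalidStartPage
	InvalidEndPage
)

// kindNames maps kinds to printable names (hypothetical helper).
var kindNames = map[Kind]string{
	KeyTooLong:       "KeyTooLong",
	InvalidStartPage: "InvalidStartPage",
	InvalidEndPage:   "InvalidEndPage",
}

// String returns the name of a known kind or a numeric fallback.
func (k Kind) String() string {
	if name, ok := kindNames[k]; ok {
		return name
	}
	return fmt.Sprintf("Kind(%d)", uint16(k))
}

// Issue here carries a record id rather than the whole record.
type Issue struct {
	Kind     Kind
	RecordID string
	Message  string
}

// Error formats the issue for humans; the layout is an assumption.
func (e Issue) Error() string {
	return fmt.Sprintf("%s: %s: %s", e.RecordID, e.Kind, e.Message)
}

// TSV joins the same fields with tabs; the column order is an assumption.
func (e Issue) TSV() string {
	return fmt.Sprintf("%s\t%s\t%s", e.RecordID, e.Kind, e.Message)
}

func main() {
	var err error = Issue{Kind: InvalidEndPage, RecordID: "id-1", Message: "not a number"}
	fmt.Println(err)
}
```

Because Issue has an Error method, a check can return it wherever an error is expected, and callers can recover the Kind with a type assertion.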

type TestSuite

type TestSuite []Tester

TestSuite is a group of tests.

type Tester added in v0.1.1

type Tester interface {
	TestRecord(finc.IntermediateSchema) error
}

Tester is an intermediate schema record checker.

type TesterFunc added in v0.1.1

type TesterFunc func(finc.IntermediateSchema) error

TesterFunc makes a plain function satisfy the Tester interface.

func (TesterFunc) TestRecord added in v0.1.1

func (f TesterFunc) TestRecord(is finc.IntermediateSchema) error

TestRecord delegates test to the given func.
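
Tester, TesterFunc and TestSuite follow the same adapter pattern as http.Handler and http.HandlerFunc in the standard library. The sketch below reproduces the three types with a hypothetical `Record` struct in place of finc.IntermediateSchema; `hasPublisher` and `runSuite` are illustrative helpers, not functions of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// Record is a hypothetical stand-in for finc.IntermediateSchema.
type Record struct {
	ID        string
	Publisher string
}

// Tester, TesterFunc and TestSuite mirror the documented types.
type Tester interface {
	TestRecord(Record) error
}

type TesterFunc func(Record) error

// TestRecord calls the underlying function, so f satisfies Tester.
func (f TesterFunc) TestRecord(r Record) error { return f(r) }

type TestSuite []Tester

var errNoPublisher = errors.New("no publisher")

// hasPublisher is an example check written as a plain function.
func hasPublisher(r Record) error {
	if r.Publisher == "" {
		return errNoPublisher
	}
	return nil
}

// runSuite applies every tester to the record and collects the errors.
func runSuite(suite TestSuite, r Record) []error {
	var errs []error
	for _, t := range suite {
		if err := t.TestRecord(r); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

func main() {
	suite := TestSuite{TesterFunc(hasPublisher)}
	fmt.Println(runSuite(suite, Record{ID: "id-1"}))
}
```

The conversion TesterFunc(hasPublisher) is what lets ordinary functions be collected in a TestSuite without declaring a new type per check.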

Directories

Path Synopsis
cmd
