bibcheck

v0.3.0
Published: Jun 5, 2026 License: BSD-3-Clause Imports: 1 Imported by: 0

Catch errors in paper bibliographies.

[!WARNING] This tool is not a substitute for your professional judgement!

[!WARNING] This tool may make mistakes!

Quick Start (macOS)

Download the appropriate binary from the latest release

  1. Remove the quarantine attribute (the binaries are not signed)
  2. Make executable
  3. Run
xattr -d com.apple.quarantine bibcheck-darwin-arm64
chmod +x bibcheck-darwin-arm64
./bibcheck-darwin-arm64

Quick Start (Linux)

Download the appropriate binary from the latest release

  1. Make executable
  2. Run
chmod +x bibcheck-linux-amd64
./bibcheck-linux-amd64

Quick Start (Build from Source)

  1. Install Go >= 1.24.0
  2. Compile and run:
go run main.go

Examples

Analyze a whole document

export SHIRTY_API_KEY=sk-...
go run main.go test/20231113_siefert_pmbs.pdf
go run main.go --shirty-api-key sk-... test/20231113_siefert_pmbs.pdf 
=== Entry 1 ===
[1] 2017. NVIDIA Tesla V100 GPU Architecture. https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
Detected URL: https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
Kind: website
direct URL access...
recieved application/pdf from https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
URL: ✓ LOOKS OKAY
     Both entries have the same title, with minor transcription-style differences (capitalization and spacing). The contributing organization in ENTRY 1 matches the implied author/organization in ENTRY 2.
=== Entry 2 ===
[2] 2018. HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers. https://www.netlib.org/benchmark/hpl
Detected URL: https://www.netlib.org/benchmark/hpl
Kind: software_package
direct URL access...
recieved text/html; charset=UTF-8 from https://www.netlib.org/benchmark/hpl
URL: ✓ LOOKS OKAY
     The title field matches exactly between the two entries, but there is no author information in the second entry to compare with the first entry. However, based on the available data, it appears that both entries reference the same thing, which is the HPL benchmark.
=== Entry 3 ===
[3] 2020. NVIDIA A100 Tensor Core GPU Architecture. https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf
Detected URL: https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf
Kind: website
direct URL access...
recieved application/pdf from https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf
URL: ✓ LOOKS OKAY
     Both entries have the same title, which suggests they reference the same thing.
...

Analyze a single entry

export SHIRTY_API_KEY=sk-...
go run main.go test/20231113_siefert_pmbs.pdf --entry 14
go run main.go --shirty-api-key sk-... test/20231113_siefert_pmbs.pdf --entry 14
=== Entry 14 ===
[14] 2023. OSU Micro-benchmarks. http://mvapich.cse.ohio-state.edu/benchmarks/
Detected URL: http://mvapich.cse.ohio-state.edu/benchmarks/
Kind: website
direct URL access...
recieved text/html; charset=utf-8 from http://mvapich.cse.ohio-state.edu/benchmarks/
Website:
  URL:     http://mvapich.cse.ohio-state.edu/benchmarks/
  Title:   OSU Micro-benchmarks
  Authors: 
URL: NO MATCH
     The titles do not match, but the URL in ENTRY 2 suggests a connection to MVAPICH, which is present in the title of ENTRY 1.

Interactive GUI (shirty only)

export SHIRTY_API_KEY=sk-...
go run main.go serve

or

go run main.go serve sk-...

Then navigate to http://localhost:8080 in your browser.

OPENROUTER_API_KEY and SHIRTY_API_KEY are read from the environment automatically when set. Command-line flags override the environment values. OPENROUTER_BASE_URL and SHIRTY_BASE_URL are also supported.
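
The precedence described above (a flag wins over the environment) can be sketched in a few lines of Go. This is an illustrative sketch, not bibcheck's actual code: the helper name `resolveSetting` is hypothetical, and the flag value is hard-coded to stand in for whatever `--shirty-api-key` would supply.

```go
package main

import (
	"fmt"
	"os"
)

// resolveSetting returns the flag value when one was given and falls back to
// the environment value otherwise. The name is hypothetical.
func resolveSetting(flagValue, envValue string) string {
	if flagValue != "" {
		return flagValue
	}
	return envValue
}

func main() {
	// Stand-in for the value of --shirty-api-key; empty means "flag not given".
	flagValue := ""

	key := resolveSetting(flagValue, os.Getenv("SHIRTY_API_KEY"))
	if key == "" {
		fmt.Println("no Shirty API key configured")
		return
	}
	fmt.Println("Shirty API key configured")
}
```

The same helper would apply unchanged to the base URL settings, assuming they follow the same precedence.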

Features

  • Extracts bibliography entries from PDF documents and analyzes them one-by-one
  • Supports both CLI analysis and a lightweight web UI for uploaded PDFs
  • Uses configured LLM backends for bibliography counting, entry extraction, metadata parsing, and optional result summarization
    • SHIRTY_API_KEY enables the Shirty-based pipeline
    • OPENROUTER_API_KEY enables the OpenRouter-based CLI pipeline for bibliography counting, entry extraction, and metadata parsing
  • Verifies entries with direct lookups against
    • doi.org
    • arXiv
    • OSTI
    • Crossref
    • Elsevier Scopus search (when ELSEVIER_API_KEY is configured)
  • Fetches and analyzes linked online resources when an entry points to a URL
    • HTML pages
    • PDF documents
  • Includes bibliography-oriented CLI helpers
    • bib extracts the bibliography
    • entry extracts a single bibliography entry
    • list-entries lists numeric bibliography entry IDs
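
The example output earlier shows responses such as `application/pdf` and `text/html; charset=UTF-8`. One plausible way to route a fetched resource to an HTML or PDF handler is to parse the Content-Type header with the standard library. The sketch below assumes that approach; the type and function names are hypothetical and not taken from bibcheck.

```go
package main

import (
	"fmt"
	"mime"
)

// resourceKind is a hypothetical label for how a fetched resource is analyzed.
type resourceKind string

const (
	kindHTML    resourceKind = "html"
	kindPDF     resourceKind = "pdf"
	kindUnknown resourceKind = "unknown"
)

// classifyContentType maps a Content-Type header value to a resourceKind,
// ignoring parameters such as charset.
func classifyContentType(header string) resourceKind {
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		return kindUnknown
	}
	switch mediaType {
	case "text/html", "application/xhtml+xml":
		return kindHTML
	case "application/pdf":
		return kindPDF
	default:
		return kindUnknown
	}
}

func main() {
	for _, h := range []string{"application/pdf", "text/html; charset=UTF-8", "image/png"} {
		fmt.Printf("%q -> %s\n", h, classifyContentType(h))
	}
}
```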

"Search" strategy

For each extracted bibliography entry, bibcheck currently works in this order:

  • DOI check
    • If a DOI is present, resolve it through doi.org to confirm that it exists
    • This does not stop the search, because DOI resolution alone does not provide enough metadata for comparison
  • OSTI lookup
    • If an OSTI identifier is present, fetch the OSTI record directly
    • A successful OSTI match is treated as sufficient
  • arXiv lookup
    • If an arXiv identifier is present, fetch the arXiv metadata directly
    • A successful arXiv match is treated as sufficient
  • Elsevier search
    • If ELSEVIER_API_KEY is configured, parse authors, title, and publication venue, then query Elsevier
  • Crossref bibliographic search
    • Query Crossref with the full bibliography entry text
    • Only accept a result when the top score is strong enough and not effectively tied with the next match
  • Online resource lookup
    • If no database/source match was found, parse the entry as an online resource
    • Fetch the URL directly and extract metadata from HTML or PDF content for comparison
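
The ordering above can be read as a chain of lookups in which the first sufficient match ends the search. The following Go sketch models that under stated assumptions: the `step` type, `firstMatch`, and `acceptTop` are hypothetical names, the lookups are stubs rather than real network calls, and the Crossref thresholds (`minScore`, `minGap`) are made-up tuning values. It also assumes that every step other than the DOI check ends the search on success, which the list only states explicitly for OSTI and arXiv.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoMatch is a hypothetical sentinel for "this source has nothing for the entry".
var errNoMatch = errors.New("no match")

// step is one lookup in the chain. Names and fields are illustrative only.
type step struct {
	name       string
	sufficient bool // when true, a successful match ends the search
	run        func(entry string) (record string, err error)
}

// firstMatch tries each step in order and returns the first sufficient match.
// A step that succeeds without being sufficient (the DOI check) does not stop
// the search.
func firstMatch(entry string, steps []step) (source, record string, ok bool) {
	for _, s := range steps {
		rec, err := s.run(entry)
		if err != nil {
			continue // no match or lookup failure: try the next source
		}
		if s.sufficient {
			return s.name, rec, true
		}
	}
	return "", "", false
}

// acceptTop sketches the Crossref rule: the top score must be strong enough
// and clearly ahead of the runner-up. scores must be sorted in descending
// order. minScore and minGap are assumed values, not bibcheck's.
func acceptTop(scores []float64, minScore, minGap float64) bool {
	if len(scores) == 0 || scores[0] < minScore {
		return false
	}
	if len(scores) > 1 && scores[0]-scores[1] < minGap {
		return false
	}
	return true
}

func main() {
	steps := []step{
		{name: "doi", sufficient: false, run: func(string) (string, error) { return "DOI resolves", nil }},
		{name: "osti", sufficient: true, run: func(string) (string, error) { return "", errNoMatch }},
		{name: "arxiv", sufficient: true, run: func(string) (string, error) { return "arXiv metadata", nil }},
	}
	source, record, ok := firstMatch("[1] Example entry", steps)
	fmt.Println(source, record, ok)

	// A near tie between the top two results is rejected.
	fmt.Println(acceptTop([]float64{80, 79}, 50, 5))
}
```

Keeping each source behind the same function signature means a new source could be added as one more element in the slice, with the online resource lookup placed last as the fallback.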

Contributing

See CONTRIBUTING.md

Acknowledgements

  • Thank you to arXiv for use of its open access interoperability.
  • Thank you to OSTI for providing a free API.
  • Thank you to Crossref for providing a free API.
  • Thank you to doi.org for providing a free API.

Roadmap

  • If a URL is available, try that first, e.g. for
[3] C. Bormann, M. Ersue, and A. Keranen, "Terminology for Constrained-Node Networks," RFC 7228, Internet Engineering Task Force, May 2014. [Online]. Available: https://tools.ietf.org/html/rfc7228
  • Offer a Google Scholar link when an entry can't be found, e.g.
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C32&q=Quantum+information&btnG=
  • Offer a DOI URL when the DOI is found
  • Show selected file in upload GUI
  • extract alphanumeric bibliography entry
    • structured response asking for all bibliography IDs
  • docker/podman CLI
  • Allow user to provide email address (for crossref.org API)
  • DBLP search
  • OpenAlex search
  • Elsevier API key in Web UI
  • Version / SHA in WebUI
    • if versioned, only show that
    • version links to release, sha links to commit
  • Version / SHA in CLI
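
For the Google Scholar roadmap item, building the search link is mostly a matter of query escaping. The sketch below shows one way to do it with `net/url`. The function name `scholarURL` is hypothetical, and only the `hl` and `q` parameters from the example URL above are set; whether the remaining parameters are needed is not something this sketch verifies.

```go
package main

import (
	"fmt"
	"net/url"
)

// scholarURL builds a Google Scholar search link for a title.
// The name is hypothetical; this is not part of bibcheck today.
func scholarURL(title string) string {
	q := url.Values{}
	q.Set("hl", "en")
	q.Set("q", title)
	u := url.URL{
		Scheme:   "https",
		Host:     "scholar.google.com",
		Path:     "/scholar",
		RawQuery: q.Encode(), // keys are emitted in sorted order
	}
	return u.String()
}

func main() {
	fmt.Println(scholarURL("Quantum information"))
}
```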
