cmd/

directory
v0.1.4 Latest
Published: Aug 22, 2023 License: GPL-3.0

Directories

Path Synopsis
analysestats analyses a set of 'best', 'conf', and 'hocr' files in a directory, outputting results to a .csv file for further investigation.
avg-lines prints a report of the average confidence for each line, sorted from worst to best
boxtotxt converts a Tesseract .box file to plain text
bucket-lines copies image-text line pairs into different directories according to the average character probability for the line
dehyphenate does basic dehyphenation on a hocr file
dlgbook is a wrapper around getgbook which gets metadata and uses it to save to a specially formatted directory
eeboxmltohocr converts the XML from an EEBO download to hOCR, which can be easily incorporated into a searchable PDF
extracthocrlines copies the text and corresponding image section for each line of a HOCR file into separate files, which is useful for OCR training
fonttobytes outputs a font file as a series of bytes in go format, allowing a font to be easily embedded into a go binary
hocrtotxt prints the text from a hocr file
iiifdownloader attempts to download every page of a IIIF book in the best available quality, given a manifest url
pare-gt moves some ground truth, ensuring that the same proportions of each ground truth source are represented in the moved section
pgconf prints the total confidence for a page of hOCR
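
Several of the commands above (avg-lines, bucket-lines, pgconf) report confidence figures taken from hOCR. As a rough illustration of the idea only, and not the code these commands actually use, the sketch below averages the x_wconf values that hOCR word elements carry in their title attribute. The function name averageWordConfidence and the sample markup are hypothetical, and a regular expression stands in for a proper hOCR parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// wconfRe matches the x_wconf property found in the title attribute of
// hOCR word elements, e.g. title='bbox 10 10 50 30; x_wconf 93'.
var wconfRe = regexp.MustCompile(`x_wconf\s+(\d+(?:\.\d+)?)`)

// averageWordConfidence is a hypothetical helper: it returns the mean of all
// x_wconf values found in hocr, plus the number of values that were averaged.
// It returns (0, 0) when no confidence values are present.
func averageWordConfidence(hocr string) (float64, int) {
	total := 0.0
	count := 0
	for _, m := range wconfRe.FindAllStringSubmatch(hocr, -1) {
		v, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			continue // skip anything that does not parse as a number
		}
		total += v
		count++
	}
	if count == 0 {
		return 0, 0
	}
	return total / float64(count), count
}

func main() {
	// Sample markup invented for this example.
	sample := `<span class='ocrx_word' title='bbox 10 10 50 30; x_wconf 90'>Hello</span>
<span class='ocrx_word' title='bbox 60 10 110 30; x_wconf 70'>world</span>`

	avg, n := averageWordConfidence(sample)
	fmt.Printf("average confidence %.1f over %d words\n", avg, n)
}
```

A real tool would more likely walk the hOCR document structure so that confidences can be grouped per line or per page, which is what the synopses above describe.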