utils

module
v0.1.4 Latest
Published: Aug 22, 2023 License: GPL-3.0

README

# rescribe.xyz/utils

This repository contains miscellaneous commands and small packages
useful for the OCR of books.

This is a collection of Go packages and can be installed in the
standard Go way by running `go get rescribe.xyz/utils/...`.
Documentation can be read with the `go doc` command or online at
<https://pkg.go.dev/rescribe.xyz/utils>.

If you just want to install and use the commands, you can get the
package with `git clone https://git.rescribe.xyz/utils`, and then
install them with `go install ./...` from within the `utils`
directory.

## Contributions

Comments, bug reports, patches, and pull requests are all very
welcome. Please email them to <nick@rescribe.xyz>.

## License

This package is licensed under the GPLv3. See the LICENSE file for
more details.

Directories

| Path | Synopsis |
| --- | --- |
| cmd/analysestats | analysestats analyses a set of 'best', 'conf', and 'hocr' files in a directory, outputting results to a .csv file for further investigation. |
| cmd/avg-lines | avg-lines prints a report of the average confidence for each line, sorted from worst to best |
| cmd/boxtotxt | boxtotxt converts a Tesseract .box file to plain text |
| cmd/bucket-lines | bucket-lines copies image-text line pairs into different directories according to the average character probability for the line |
| cmd/dehyphenate | dehyphenate does basic dehyphenation on a hocr file |
| cmd/dlgbook | dlgbook is a wrapper around getgbook which gets metadata and uses it to save to a specially formatted directory |
| cmd/eeboxmltohocr | eeboxmltohocr converts the XML from an EEBO download to hOCR, which can be easily incorporated into a searchable PDF |
| cmd/extracthocrlines | extracthocrlines copies the text and corresponding image section for each line of a HOCR file into separate files, which is useful for OCR training |
| cmd/fonttobytes | fonttobytes outputs a font file as a series of bytes in go format, allowing a font to be easily embedded into a go binary |
| cmd/hocrtotxt | hocrtotxt prints the text from a hocr file |
| cmd/iiifdownloader | iiifdownloader attempts to download every page of a IIIF book in the best available quality, given a manifest url |
| cmd/pare-gt | pare-gt moves some ground truth, ensuring that the same proportions of each ground truth source are represented in the moved section |
| cmd/pgconf | pgconf prints the total confidence for a page of hOCR |
| pkg/hocr | hocr contains structures and functions for parsing and analysing hocr files |
| pkg/line | line contains various functions to manipulate ocr lines |
| pkg/prob | prob processes .prob files generated by ocropus |
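
Several of the commands above (for example `avg-lines` and `pgconf`) report OCR confidence values. As a rough illustration of the kind of data involved, the sketch below averages the `x_wconf` word confidences found in a fragment of hOCR. It uses only the Go standard library and does not use or reflect the API of the `hocr` package or the actual calculation done by `pgconf`. The function names and the sample markup are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// wconfRe matches the x_wconf property that hOCR stores in a word's
// title attribute, e.g. title='bbox 10 10 50 30; x_wconf 90'.
var wconfRe = regexp.MustCompile(`x_wconf\s+(\d+(?:\.\d+)?)`)

// wordConfidences returns every x_wconf value found in the hOCR text.
// (Hypothetical helper; a regexp scan is a shortcut, not a real hOCR parser.)
func wordConfidences(hocr string) []float64 {
	var confs []float64
	for _, m := range wconfRe.FindAllStringSubmatch(hocr, -1) {
		v, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			continue // skip values that do not parse as numbers
		}
		confs = append(confs, v)
	}
	return confs
}

// meanConfidence returns the arithmetic mean of confs, or 0 if it is empty.
func meanConfidence(confs []float64) float64 {
	if len(confs) == 0 {
		return 0
	}
	var sum float64
	for _, c := range confs {
		sum += c
	}
	return sum / float64(len(confs))
}

func main() {
	// Made-up two-word hOCR fragment for illustration only.
	sample := `<span class='ocrx_word' title='bbox 10 10 50 30; x_wconf 90'>Hello</span>
<span class='ocrx_word' title='bbox 60 10 110 30; x_wconf 80'>world</span>`
	fmt.Printf("%.1f\n", meanConfidence(wordConfidences(sample)))
}
```

For real hOCR files, the `hocr` package listed above is the better starting point. Its documentation can be read with `go doc`.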
