utils

module
Version: v0.1.3
Published: Apr 14, 2020 License: GPL-3.0

README

# rescribe.xyz/utils

This repository contains miscellaneous commands and small packages
useful for the OCR of books.

It is a collection of Go packages, and can be installed in the
standard Go way by running `go get rescribe.xyz/utils/...`.

## Contributions

Any and all comments, bug reports, patches or pull requests are
very welcome. Please email them to <nick@rescribe.xyz>.

## License

This package is licensed under the GPLv3. See the LICENSE file for
more details.

Directories

Path Synopsis
cmd
avg-lines
avg-lines prints a report of the average confidence for each line, sorted from worst to best
boxtotxt
boxtotxt converts a Tesseract .box file to plain text
bucket-lines
bucket-lines copies image-text line pairs into different directories according to the average character probability for the line
dehyphenate
dehyphenate does basic dehyphenation on an hOCR file
eeboxmltohocr
eeboxmltohocr converts the XML from an EEBO download to hOCR, which can be easily incorporated into a searchable PDF
fonttobytes
fonttobytes outputs a font file as a series of bytes in Go format, allowing a font to be easily embedded into a Go binary
hocrtotxt
hocrtotxt prints the text from an hOCR file
pare-gt
pare-gt moves some ground truth, ensuring that the same proportions of each ground truth source are represented in the moved section
pgconf
pgconf prints the total confidence for a page of hOCR
pkg
hocr
hocr contains structures and functions for parsing and analysing hOCR files
line
line contains various functions to manipulate OCR lines
prob
prob processes .prob files generated by OCRopus
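avg-lines ranks lines by their mean confidence. The sketch below shows only that ranking step, assuming the per-line confidence values have already been extracted from the OCR output. `LineConf`, `LineAvg`, `average` and `sortWorstFirst` are hypothetical names, not this module's API.

```go
package main

import (
	"fmt"
	"sort"
)

// LineConf is a hypothetical type pairing a line identifier with the
// confidence values the OCR engine reported for that line.
type LineConf struct {
	Name  string
	Confs []float64
}

// LineAvg holds a line identifier and its average confidence.
type LineAvg struct {
	Name string
	Avg  float64
}

// average returns the arithmetic mean of confs, or 0 for an empty slice.
func average(confs []float64) float64 {
	if len(confs) == 0 {
		return 0
	}
	var sum float64
	for _, c := range confs {
		sum += c
	}
	return sum / float64(len(confs))
}

// sortWorstFirst averages each line's confidences and sorts the results in
// ascending order, so the least confident lines are reported first.
func sortWorstFirst(lines []LineConf) []LineAvg {
	out := make([]LineAvg, 0, len(lines))
	for _, l := range lines {
		out = append(out, LineAvg{Name: l.Name, Avg: average(l.Confs)})
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Avg < out[j].Avg })
	return out
}

func main() {
	lines := []LineConf{
		{Name: "line1", Confs: []float64{90, 95, 100}},
		{Name: "line2", Confs: []float64{40, 60}},
		{Name: "line3", Confs: []float64{80}},
	}
	for _, l := range sortWorstFirst(lines) {
		fmt.Printf("%s\t%.2f\n", l.Name, l.Avg)
	}
}
```

A stable sort keeps lines with equal averages in their input order, which makes the report reproducible between runs.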
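boxtotxt turns a Tesseract .box file back into plain text. The sketch below assumes each box line has the form `<symbol> <left> <bottom> <right> <top> <page>`, and that, as in LSTM-style training box files, a box whose symbol is a tab marks the end of a text line. Both are assumptions about the input rather than a description of boxtotxt itself, and `boxSymbol` and `boxToText` are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// boxSymbol extracts the symbol from one line of a box file, assumed to have
// the form "<symbol> <left> <bottom> <right> <top> <page>". The five numeric
// fields are stripped from the right so that a space or tab symbol survives.
func boxSymbol(line string) (string, bool) {
	s := line
	for k := 0; k < 5; k++ {
		i := strings.LastIndexByte(s, ' ')
		if i < 0 {
			return "", false
		}
		s = s[:i]
	}
	return s, true
}

// boxToText concatenates the symbols of a box file. A box whose symbol is a
// tab is treated as an end-of-line marker (an assumption based on the
// LSTM-style box files used for Tesseract training).
func boxToText(box string) string {
	var b strings.Builder
	for _, line := range strings.Split(box, "\n") {
		sym, ok := boxSymbol(strings.TrimRight(line, "\r"))
		if !ok {
			continue // blank or malformed line
		}
		if sym == "\t" {
			b.WriteByte('\n')
		} else {
			b.WriteString(sym)
		}
	}
	return b.String()
}

func main() {
	// Read the box file from standard input and print the recovered text.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(boxToText(string(data)))
}
```

Parsing from the right rather than splitting on whitespace matters because a space character is itself a valid box symbol. Older per-character box files without tab boxes would come out as a single line under this sketch.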
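bucket-lines sorts line pairs by their average character probability. The core decision is choosing a destination directory from that average. The sketch below shows only that choice; the thresholds and directory names are hypothetical, and the actual copying of the image and text files is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// bucket is a hypothetical (minimum average probability, directory) pair.
type bucket struct {
	Min  float64
	Name string
}

// chooseBucket returns the directory of the bucket with the highest minimum
// that avg still reaches, or fallback if avg is below every minimum.
func chooseBucket(avg float64, buckets []bucket, fallback string) string {
	sorted := append([]bucket(nil), buckets...) // avoid reordering the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Min > sorted[j].Min })
	for _, b := range sorted {
		if avg >= b.Min {
			return b.Name
		}
	}
	return fallback
}

func main() {
	// Hypothetical thresholds; the real tool's buckets may differ.
	buckets := []bucket{
		{Min: 0.95, Name: "good"},
		{Min: 0.85, Name: "medium"},
		{Min: 0.5, Name: "bad"},
	}
	for _, avg := range []float64{0.99, 0.9, 0.3} {
		fmt.Printf("%.2f -> %s\n", avg, chooseBucket(avg, buckets, "verybad"))
	}
}
```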
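dehyphenate performs basic dehyphenation. The sketch below illustrates one simple rule on plain text lines rather than on hOCR markup: when a line ends in a hyphen, the hyphen is dropped and the first word of the next line is moved up to complete the word. This rule is an illustrative assumption, not necessarily what the tool does, and `dehyphenate` here is a hypothetical function.

```go
package main

import (
	"fmt"
	"strings"
)

// dehyphenate joins words split across line breaks: when a line ends in
// "-", the hyphen is dropped and the first word of the following line is
// moved up to complete the word. It works on plain text lines, not hOCR,
// and rewrites a shortened next line with single spaces between words.
func dehyphenate(lines []string) []string {
	out := make([]string, len(lines))
	copy(out, lines)
	for i := 0; i < len(out)-1; i++ {
		cur := strings.TrimRight(out[i], " \t")
		if !strings.HasSuffix(cur, "-") {
			continue
		}
		next := strings.Fields(out[i+1])
		if len(next) == 0 {
			continue // nothing to join with
		}
		out[i] = strings.TrimSuffix(cur, "-") + next[0]
		out[i+1] = strings.Join(next[1:], " ")
	}
	return out
}

func main() {
	for _, l := range dehyphenate([]string{"the quick bro-", "wn fox jumps", "over"}) {
		fmt.Println(l)
	}
}
```

A rule this simple also joins genuine compounds that happen to break at their hyphen (such as "well-known"), which is one reason the tool describes its dehyphenation as basic.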
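fonttobytes emits a font as Go source so it can be compiled into a binary. A minimal sketch of that idea reads raw bytes from standard input and prints a `[]byte` literal. The variable name `fontBytes` and the twelve-bytes-per-line layout are arbitrary choices for illustration, not the tool's actual output format.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// goBytes renders data as Go source declaring a []byte variable named
// varName, twelve hexadecimal bytes per line.
func goBytes(varName string, data []byte) string {
	var b strings.Builder
	fmt.Fprintf(&b, "var %s = []byte{\n", varName)
	for i := 0; i < len(data); i += 12 {
		end := i + 12
		if end > len(data) {
			end = len(data)
		}
		b.WriteString("\t")
		for j, c := range data[i:end] {
			if j > 0 {
				b.WriteString(" ")
			}
			fmt.Fprintf(&b, "0x%02x,", c)
		}
		b.WriteString("\n")
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	// Read the font from standard input, e.g. go run . < font.ttf > font.go
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(goBytes("fontBytes", data))
}
```

The generated declaration still needs a `package` clause above it before it compiles. Since Go 1.16, the standard `embed` package and its `//go:embed` directive offer a built-in alternative to generating byte literals.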
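pare-gt keeps the mix of ground truth sources constant when moving part of a set. The proportional step can be sketched by taking the same percentage of files from each source. The grouping of files by source, the round-down rule, and picking the first files in name order (the real tool may choose differently, for example at random) are all assumptions; `pareCounts` and `selectFiles` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// pareCounts returns, for each ground truth source, how many files to move
// so that every source contributes the same percentage of its files.
// Integer division rounds down, so very small sources may contribute none.
func pareCounts(counts map[string]int, percent int) map[string]int {
	out := make(map[string]int, len(counts))
	for src, n := range counts {
		out[src] = n * percent / 100
	}
	return out
}

// selectFiles picks, from each source, the first files in name order up to
// that source's share. Sources are visited in sorted order for determinism.
func selectFiles(bySource map[string][]string, percent int) []string {
	counts := make(map[string]int, len(bySource))
	srcs := make([]string, 0, len(bySource))
	for src, fs := range bySource {
		counts[src] = len(fs)
		srcs = append(srcs, src)
	}
	sort.Strings(srcs)
	take := pareCounts(counts, percent)
	var moved []string
	for _, src := range srcs {
		fs := append([]string(nil), bySource[src]...)
		sort.Strings(fs)
		moved = append(moved, fs[:take[src]]...)
	}
	return moved
}

func main() {
	bySource := map[string][]string{
		"a": {"a3", "a1", "a5", "a2", "a4"},
		"b": {"b2", "b1", "b3"},
	}
	fmt.Println(selectFiles(bySource, 40))
}
```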
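pgconf summarises the confidence of a page of hOCR. Tesseract records each word's confidence as an `x_wconf` value in the element's `title` attribute. The sketch below collects those values with a regular expression and reports their mean. Treating the page's "total" confidence as a mean is an assumption, and `pageConf` is a hypothetical name, not the `hocr` package's API.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"regexp"
	"strconv"
)

// wconfRe matches the word confidence Tesseract writes into hOCR title
// attributes, e.g. title="bbox 10 20 30 40; x_wconf 93".
var wconfRe = regexp.MustCompile(`x_wconf\s+(\d+(?:\.\d+)?)`)

// pageConf returns the mean of all x_wconf values in hocr and how many were
// found. Using the mean as the page's overall confidence is an assumption.
func pageConf(hocr string) (float64, int) {
	matches := wconfRe.FindAllStringSubmatch(hocr, -1)
	if len(matches) == 0 {
		return 0, 0
	}
	var sum float64
	for _, m := range matches {
		v, _ := strconv.ParseFloat(m[1], 64) // the regexp guarantees a number
		sum += v
	}
	return sum / float64(len(matches)), len(matches)
}

func main() {
	// Read one hOCR page from standard input.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	avg, n := pageConf(string(data))
	fmt.Printf("%.2f (%d words)\n", avg, n)
}
```

A regular expression keeps the sketch short; a real parser such as the `hocr` package listed above is the more robust way to read hOCR structure.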
