preproc

v0.1.0
Published: Feb 27, 2020 License: GPL-3.0 Imports: 11 Imported by: 0

README

# rescribe.xyz/preproc package

This package contains various image processing methods which are
useful for preprocessing page images for OCR. It also contains
several commands in the cmd/ directory which can be used to
preprocess images directly.

This is a Go package and can be installed in the standard Go way,
by running `go get rescribe.xyz/preproc/...`

## Commands

There are several commands in the cmd/ directory which are useful
in their own right as well as serving as examples of using the
package.

  - binarize     : binarises an image using the Sauvola algorithm
  - preproc      : binarises and wipes an image
  - preprocmulti : binarises and wipes an image with multiple
                   binarisation ksize values
  - wipe         : wipes sections of an image that are outside an
                   area detected as content

## Bugs

The integral image operations don't produce exactly the same result
as their non-integral image counterparts. The difference is small
enough that it has little effect on the output images, but the two
results ought to be identical.

## Contributions

Any and all comments, bug reports, patches or pull requests would
be very welcome. Please email them to <nick@rescribe.xyz>.

## License

This package is licensed under the GPLv3. See the LICENSE file for
more details.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func BinToZeroInv

func BinToZeroInv(bin *image.Gray, orig *image.RGBA) (*image.RGBA, error)

func IntegralSauvola

func IntegralSauvola(img *image.Gray, ksize float64, windowsize int) *image.Gray

IntegralSauvola implements Sauvola's algorithm using integral images. See the paper "Efficient Implementation of Local Adaptive Thresholding Techniques Using Integral Images" and https://stackoverflow.com/questions/13110733/computing-image-integral
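As a rough illustration of the integral image idea that IntegralSauvola relies on, the sketch below builds a summed-area table and reads a window sum from it with four lookups. The names `integral`, `windowSum` and `demoSums` are hypothetical and are not part of this package; the package's own integral image type may be laid out differently.

```go
package main

import (
	"fmt"
	"image"
)

// integral builds a summed-area table with one extra row and column of
// zeros, so sum[y][x] holds the sum of all pixels above and left of (x, y).
// Hypothetical helper, not the package's implementation.
func integral(img *image.Gray) [][]uint64 {
	b := img.Bounds()
	w, h := b.Dx(), b.Dy()
	sum := make([][]uint64, h+1)
	for y := range sum {
		sum[y] = make([]uint64, w+1)
	}
	for y := 0; y < h; y++ {
		var row uint64 // running sum of the current row
		for x := 0; x < w; x++ {
			row += uint64(img.GrayAt(b.Min.X+x, b.Min.Y+y).Y)
			sum[y+1][x+1] = sum[y][x+1] + row
		}
	}
	return sum
}

// windowSum returns the pixel sum over the half-open rectangle
// [x0, x1) x [y0, y1) using four table lookups.
func windowSum(sum [][]uint64, x0, y0, x1, y1 int) uint64 {
	return sum[y1][x1] + sum[y0][x0] - sum[y0][x1] - sum[y1][x0]
}

// demoSums builds a 4x3 image with pixel values 1..12 and returns the sum
// of the whole image and of the 2x2 window starting at (1, 1).
func demoSums() (whole, window uint64) {
	img := image.NewGray(image.Rect(0, 0, 4, 3))
	for i := range img.Pix {
		img.Pix[i] = uint8(i + 1)
	}
	sum := integral(img)
	return windowSum(sum, 0, 0, 4, 3), windowSum(sum, 1, 1, 3, 3)
}

func main() {
	whole, window := demoSums()
	fmt.Println(whole, window) // prints "78 34"
}
```

Once the table is built, the cost of a window sum no longer depends on the window size, which is what makes this approach attractive for a per-pixel local threshold.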

func PreCalcedSauvola

func PreCalcedSauvola(integrals integralimg.WithSq, img *image.Gray, ksize float64, windowsize int) *image.Gray

PreCalcedSauvola implements Sauvola's algorithm using precalculated integral images.

TODO: have this be the root function that the other two reference

func PreProcMulti

func PreProcMulti(inPath string, ksizes []float64, binType string, binWsize int, wipe bool, wipeWsize int, wipeMinWidthPerc int) ([]string, error)

PreProcMulti binarizes and preprocesses an image with multiple binarisation levels.

  - inPath: Path of input image.
  - ksizes: Slice of k values to pass to the Sauvola algorithm.
  - binType: Type of binarization threshold. binary or zeroinv are currently implemented.
  - binWsize: Window size for the Sauvola binarization algorithm. Set automatically based on resolution if 0.
  - wipe: Whether to wipe (clear sides) the image.
  - wipeWsize: Window size for the wiping algorithm.
  - wipeMinWidthPerc: Minimum percentage of the image width for the content width calculation to be considered valid.

Note: copied from cmd/preprocmulti/main.go, should think about the best way to organise this code later.

TODO: return errors that encapsulate the err describing where it was encountered

TODO: do the post-integral image stuff in separate goroutines for speed

func Sauvola

func Sauvola(img *image.Gray, ksize float64, windowsize int) *image.Gray

Sauvola implements Sauvola's algorithm for text binarization. See the paper "Adaptive document image binarization" (2000).
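For orientation, here is a minimal sketch of the textbook Sauvola threshold, T = m * (1 + k * (s/R - 1)), where m and s are the mean and standard deviation of the window around a pixel. It assumes R = 128, a common choice for 8-bit images, and assumes that the ksize parameter above plays the role of k. The helper names are hypothetical, and the package's own implementation may differ in details such as window clipping.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"math"
)

// sauvolaThreshold returns the textbook Sauvola threshold for a window with
// the given mean and standard deviation. r is the assumed dynamic range of
// the standard deviation (128 is a common value for 8-bit images).
func sauvolaThreshold(mean, stddev, k, r float64) float64 {
	return mean * (1 + k*(stddev/r-1))
}

// meanStdDev computes the mean and standard deviation of the wsize x wsize
// window centred on (x, y), clipped to the image bounds.
func meanStdDev(img *image.Gray, x, y, wsize int) (mean, stddev float64) {
	half := wsize / 2
	win := image.Rect(x-half, y-half, x+half+1, y+half+1).Intersect(img.Bounds())
	var sum, sumSq float64
	for j := win.Min.Y; j < win.Max.Y; j++ {
		for i := win.Min.X; i < win.Max.X; i++ {
			v := float64(img.GrayAt(i, j).Y)
			sum += v
			sumSq += v * v
		}
	}
	n := float64(win.Dx() * win.Dy())
	mean = sum / n
	variance := math.Max(sumSq/n-mean*mean, 0) // guard against rounding below zero
	return mean, math.Sqrt(variance)
}

// binarize sets a pixel white if it is brighter than its local Sauvola
// threshold and leaves it black otherwise.
func binarize(img *image.Gray, k float64, wsize int) *image.Gray {
	b := img.Bounds()
	out := image.NewGray(b) // starts out all black
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			m, s := meanStdDev(img, x, y, wsize)
			if float64(img.GrayAt(x, y).Y) > sauvolaThreshold(m, s, k, 128) {
				out.SetGray(x, y, color.Gray{Y: 255})
			}
		}
	}
	return out
}

// demoBinarize binarizes a light 5x5 image with one dark centre pixel and
// returns the output values at the centre and at the top-left corner.
func demoBinarize() (centre, corner uint8) {
	img := image.NewGray(image.Rect(0, 0, 5, 5))
	for i := range img.Pix {
		img.Pix[i] = 200
	}
	img.SetGray(2, 2, color.Gray{Y: 20})
	out := binarize(img, 0.3, 3)
	return out.GrayAt(2, 2).Y, out.GrayAt(0, 0).Y
}

func main() {
	centre, corner := demoBinarize()
	fmt.Println(centre, corner) // prints "0 255"
}
```

This naive version recomputes the window statistics for every pixel, so its cost grows with the window area.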

func Wipe

func Wipe(img *image.Gray, wsize int, thresh float64, min int) *image.Gray

Wipe fills the sections of the image which fall outside the content area with white, provided the content area is wider than min % of the image width.
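The fill step described above can be sketched as follows. This is only an illustration under stated assumptions: the content edges `lo` and `hi` are taken as given inputs, because how the package detects them from wsize and thresh is not described here, and `wipeOutside` and `demoWipe` are hypothetical names rather than package functions.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// wipeOutside returns a copy of img in which every column outside
// [lo, hi) is filled with white, provided the content width hi-lo is at
// least minPerc % of the image width. Otherwise the copy is unchanged.
// lo and hi are assumed to be offsets from the left edge of the image.
func wipeOutside(img *image.Gray, lo, hi, minPerc int) *image.Gray {
	b := img.Bounds()
	out := image.NewGray(b)
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			out.SetGray(x, y, img.GrayAt(x, y))
		}
	}
	if b.Dx() == 0 || (hi-lo)*100/b.Dx() < minPerc {
		return out // content area too narrow to trust, leave image as is
	}
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			if off := x - b.Min.X; off < lo || off >= hi {
				out.SetGray(x, y, color.Gray{Y: 255})
			}
		}
	}
	return out
}

// demoWipe wipes an all-black 10x2 image with content columns [2, 8).
// It returns an edge pixel and a centre pixel after a wipe with min 50,
// and the same edge pixel after a wipe with min 80, which is skipped.
func demoWipe() (edge, centre, skipped uint8) {
	img := image.NewGray(image.Rect(0, 0, 10, 2))
	wiped := wipeOutside(img, 2, 8, 50)
	kept := wipeOutside(img, 2, 8, 80)
	return wiped.GrayAt(0, 0).Y, wiped.GrayAt(5, 0).Y, kept.GrayAt(0, 0).Y
}

func main() {
	edge, centre, skipped := demoWipe()
	fmt.Println(edge, centre, skipped) // prints "255 0 0"
}
```

In the demo the content area spans 60% of the width, so the wipe is applied with a minimum of 50 and skipped with a minimum of 80.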

func WipeFile

func WipeFile(inPath string, outPath string, wsize int, thresh float64, min int) error

WipeFile wipes an image file, filling the sections of the image which fall outside the content area with white, providing the content area is above min %.

  - inPath: path of the input image.
  - outPath: path to save the output image.
  - wsize: window size for wipe algorithm.
  - thresh: threshold for wipe algorithm.
  - min: minimum % of content area width to consider valid.

Types

type UsefulImg

type UsefulImg interface {
	MeanWindow()
	MeanStdDevWindow()
}

TODO: name better; maybe verb, x-er

TODO: implement these for regular image, and use them to make image functions generic for integral and non-integral images

Directories

Path Synopsis
cmd
