# rescribe.xyz/preproc package

This package contains various image processing methods which are
useful for preprocessing page images for OCR. It also contains
several commands in the cmd/ directory which can be used to
preprocess images directly.

This is a Go package and can be installed in the standard Go way
by running `go get rescribe.xyz/preproc/...`. Documentation can be
read with the `go doc` command or online at
<https://pkg.go.dev/rescribe.xyz/preproc>.

If you just want to install and use the commands, you can get the
package with `git clone https://git.rescribe.xyz/preproc`, and then
install them with `go install ./...` from within the `preproc`
directory.

## Commands

There are several commands in the cmd/ directory which are useful
in their own right as well as serving as examples of using the
package.

  - binarize     : binarises an image using the Sauvola algorithm
  - pggraph      : creates a graph showing the proportion of black
                   pixels for slices through an image
  - preproc      : binarises and wipes an image
  - preprocmulti : binarises and wipes an image with multiple
                   binarisation ksize values
  - wipe         : wipes sections of an image that are outside an
                   area detected as content

## Bugs

The integral image operations don't produce exactly the same result
as their non-integral image counterparts. The difference is small
enough that it has little effect on the output images, but the
results ought to be identical.

## Contributions

Any and all comments, bug reports, patches or pull requests would
be very welcome. Please email them to <nick@rescribe.xyz>.

## License

This package is licensed under the GPLv3. See the LICENSE file for
more details.

## Documentation

### Overview

preproc contains various image processing methods which are useful
for preprocessing page images for OCR. It contains both library
functions to incorporate into your own projects and standalone tools
which can be used directly.

### Functions

#### func BinToZeroInv

    func BinToZeroInv(bin *image.Gray, orig *image.RGBA) (*image.RGBA, error)

BinToZeroInv converts a binary thresholded image to a zero inverse
binary thresholded image.
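
The following is a minimal sketch of one common reading of "zero inverse" thresholding, using only the standard library. It assumes that pixels which are white in the binarised image become white in the output, while all other pixels keep their original colour. The function `zeroInvSketch` and the helper `demoZeroInv` are hypothetical names for illustration; the package's own behaviour may differ.

```go
package main

import (
	"errors"
	"fmt"
	"image"
	"image/color"
)

// zeroInvSketch is a hypothetical stand-in, not the package's BinToZeroInv.
// Pixels that are white (255) in bin become white in the result; all other
// pixels are copied from orig. This interpretation is an assumption.
func zeroInvSketch(bin *image.Gray, orig *image.RGBA) (*image.RGBA, error) {
	b := bin.Bounds()
	if !b.Eq(orig.Bounds()) {
		return nil, errors.New("bin and orig must have the same bounds")
	}
	out := image.NewRGBA(b)
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			if bin.GrayAt(x, y).Y == 255 {
				out.SetRGBA(x, y, color.RGBA{R: 255, G: 255, B: 255, A: 255})
			} else {
				out.SetRGBA(x, y, orig.RGBAAt(x, y))
			}
		}
	}
	return out, nil
}

// demoZeroInv builds a 2x1 example (one background pixel, one foreground
// pixel) and returns the red channel of each output pixel.
func demoZeroInv() (uint8, uint8) {
	bin := image.NewGray(image.Rect(0, 0, 2, 1))
	bin.SetGray(0, 0, color.Gray{Y: 255}) // background
	bin.SetGray(1, 0, color.Gray{Y: 0})   // foreground
	orig := image.NewRGBA(image.Rect(0, 0, 2, 1))
	orig.SetRGBA(0, 0, color.RGBA{R: 200, G: 190, B: 180, A: 255})
	orig.SetRGBA(1, 0, color.RGBA{R: 40, G: 30, B: 20, A: 255})
	out, err := zeroInvSketch(bin, orig)
	if err != nil {
		panic(err)
	}
	return out.RGBAAt(0, 0).R, out.RGBAAt(1, 0).R
}

func main() {
	bg, fg := demoZeroInv()
	fmt.Println(bg, fg) // prints "255 40"
}
```

The background pixel is forced to white, and the foreground pixel keeps its original value, which preserves the ink's tonal detail for the OCR engine.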

#### func IntegralSauvola

    func IntegralSauvola(img image.Image, ksize float64, windowsize int) *image.Gray

IntegralSauvola implements Sauvola's algorithm using integral images.
See the paper "Efficient Implementation of Local Adaptive Thresholding
Techniques Using Integral Images" and
<https://stackoverflow.com/questions/13110733/computing-image-integral>.
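
To illustrate why integral images help here, the sketch below builds a summed-area table over a small grid of pixel values and reads the sum of any rectangle from four table lookups. It is a generic illustration of the technique, not the package's `integral` implementation; `integralImage` and `rectSum` are hypothetical names.

```go
package main

import "fmt"

// integralImage returns a summed-area table with one extra leading row and
// column of zeros, so that sat[y][x] holds the sum of all pixels above and
// to the left of (x, y), exclusive.
func integralImage(px [][]uint8) [][]uint64 {
	h := len(px)
	w := 0
	if h > 0 {
		w = len(px[0])
	}
	sat := make([][]uint64, h+1)
	for y := range sat {
		sat[y] = make([]uint64, w+1)
	}
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			sat[y+1][x+1] = uint64(px[y][x]) + sat[y][x+1] + sat[y+1][x] - sat[y][x]
		}
	}
	return sat
}

// rectSum returns the sum of the pixels in the half-open rectangle
// [x0, x1) x [y0, y1) using four lookups, whatever the rectangle's size.
func rectSum(sat [][]uint64, x0, y0, x1, y1 int) uint64 {
	return sat[y1][x1] + sat[y0][x0] - sat[y0][x1] - sat[y1][x0]
}

func main() {
	px := [][]uint8{
		{1, 2, 3},
		{4, 5, 6},
		{7, 8, 9},
	}
	sat := integralImage(px)
	fmt.Println(rectSum(sat, 1, 1, 3, 3)) // prints 28 (5+6+8+9)
	fmt.Println(rectSum(sat, 0, 0, 3, 3)) // prints 45
}
```

Because each window sum costs a constant number of lookups, the local mean needed for every pixel's threshold no longer depends on the window size. A second table over squared pixel values gives the local variance in the same way.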

#### func PreCalcedSauvola

    func PreCalcedSauvola(intImg integral.Image, intSqImg integral.SqImage, img image.Image, ksize float64, windowsize int) *image.Gray

PreCalcedSauvola implements Sauvola's algorithm using precalculated
integral images.

#### func PreProcMulti

    func PreProcMulti(inPath string, ksizes []float64, binType string, binWsize int, wipe bool, wipeWsize int, wipeMinWidthPerc int, vWipeWsize int, wipeMinHeightPerc int) ([]string, error)

PreProcMulti binarises and preprocesses an image with multiple
binarisation levels.

  - inPath            : path of the input image
  - ksizes            : slice of k values to pass to the Sauvola
                        algorithm
  - binType           : type of binarisation threshold; binary or
                        zeroinv are currently implemented
  - binWsize          : window size for the Sauvola binarisation
                        algorithm; set automatically based on
                        resolution if 0
  - wipe              : whether to wipe (clear the sides of) the image
  - wipeWsize         : window size for the wiping algorithm
  - wipeMinWidthPerc  : minimum percentage of the image width for the
                        content width calculation to be considered
                        valid
  - vWipeWsize        : window size for the vertical wiping algorithm
  - wipeMinHeightPerc : minimum percentage of the image height for the
                        content height calculation to be considered
                        valid

#### func ProportionSlice

    func ProportionSlice(i SummableImage, x int, width int) float64

ProportionSlice returns the proportion of black pixels in a vertical
slice of an image starting at x, width pixels wide.
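
As a rough illustration of the quantity described above, the sketch below counts black pixels directly in a vertical slice of an `*image.Gray`. It does not use `SummableImage` and is not the package's implementation; `proportionBlack` and `newDemoPage` are hypothetical names, and treating only pixel value 0 as black is an assumption.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// proportionBlack returns the fraction of pixels with value 0 in the
// vertical slice of img that starts at column x and is width pixels wide.
// The slice is clipped to the image bounds; an empty slice yields 0.
func proportionBlack(img *image.Gray, x, width int) float64 {
	b := img.Bounds()
	r := image.Rect(x, b.Min.Y, x+width, b.Max.Y).Intersect(b)
	if r.Empty() {
		return 0
	}
	black := 0
	for yy := r.Min.Y; yy < r.Max.Y; yy++ {
		for xx := r.Min.X; xx < r.Max.X; xx++ {
			if img.GrayAt(xx, yy).Y == 0 {
				black++
			}
		}
	}
	return float64(black) / float64(r.Dx()*r.Dy())
}

// newDemoPage returns a 4x2 white image whose second column is black.
func newDemoPage() *image.Gray {
	img := image.NewGray(image.Rect(0, 0, 4, 2))
	for i := range img.Pix {
		img.Pix[i] = 255
	}
	img.SetGray(1, 0, color.Gray{Y: 0})
	img.SetGray(1, 1, color.Gray{Y: 0})
	return img
}

func main() {
	img := newDemoPage()
	fmt.Println(proportionBlack(img, 0, 2)) // prints 0.5
	fmt.Println(proportionBlack(img, 2, 2)) // prints 0
}
```

Sliding such a slice across the page gives a profile of ink density per column, which is the kind of data the pggraph command plots.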

#### func Sauvola

    func Sauvola(img image.Image, ksize float64, windowsize int) *image.Gray

Sauvola implements Sauvola's algorithm for text binarisation. See the
paper "Adaptive document image binarization" (2000).
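
For orientation, Sauvola's method computes a per-pixel threshold from the mean m and standard deviation s of the surrounding window: T = m * (1 + k * (s/R - 1)). The sketch below evaluates that formula for a single window. It assumes that the `ksize` parameter corresponds to k and that R is 128 (the usual choice for 8-bit images); neither is confirmed here for this package, and `sauvolaThreshold` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// sauvolaThreshold returns T = mean * (1 + k*(stddev/r - 1)) for the pixel
// values in window, which must be non-empty. r is the dynamic range of the
// standard deviation, commonly 128 for 8-bit greyscale images.
func sauvolaThreshold(window []uint8, k, r float64) float64 {
	var sum, sumSq float64
	for _, v := range window {
		f := float64(v)
		sum += f
		sumSq += f * f
	}
	n := float64(len(window))
	mean := sum / n
	variance := sumSq/n - mean*mean
	if variance < 0 {
		variance = 0 // guard against tiny negative values from rounding
	}
	return mean * (1 + k*(math.Sqrt(variance)/r-1))
}

func main() {
	flat := []uint8{100, 100, 100, 100}
	edge := []uint8{0, 0, 255, 255}
	// A flat window has no contrast, so the threshold drops well below
	// the mean and the pixel is classified as background.
	fmt.Println(sauvolaThreshold(flat, 0.5, 128))
	// A high-contrast window keeps the threshold close to the mean.
	fmt.Println(sauvolaThreshold(edge, 0.5, 128))
}
```

A pixel is then set to black if its value is at or below T and white otherwise. Larger k values lower the threshold in low-contrast regions, which is why preprocmulti tries several of them.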

#### func VWipe

    func VWipe(img *image.Gray, wsize int, thresh float64, min int) *image.Gray

VWipe fills the sections of the image which fall outside the vertical
content area with white, providing the content area is above min %.

#### func Wipe

    func Wipe(img *image.Gray, wsize int, thresh float64, min int) *image.Gray

Wipe fills the sections of the image which fall outside the content
area with white, providing the content area is above min %.
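
The sketch below illustrates only the final fill step described above, under stated assumptions. It takes the content edges as inputs rather than detecting them, and treats "above min %" as strictly greater than. How the package actually finds the content area from `wsize` and `thresh` is not shown here; `wipeOutsideSketch`, `countWhite` and `blackPage` are hypothetical names.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// wipeOutsideSketch is a hypothetical helper, not the package's Wipe.
// It copies img and paints every column left of lowx or right of highx
// white, but only if the content width (highx-lowx) is more than minPerc
// percent of the image width. Otherwise the copy is returned unchanged.
func wipeOutsideSketch(img *image.Gray, lowx, highx, minPerc int) *image.Gray {
	b := img.Bounds()
	out := image.NewGray(b)
	if b.Empty() {
		return out
	}
	wipe := (highx-lowx)*100 > minPerc*b.Dx()
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			if wipe && (x < lowx || x > highx) {
				out.SetGray(x, y, color.Gray{Y: 255})
			} else {
				out.SetGray(x, y, img.GrayAt(x, y))
			}
		}
	}
	return out
}

// countWhite returns the number of pure white pixels in img.
func countWhite(img *image.Gray) int {
	b := img.Bounds()
	n := 0
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			if img.GrayAt(x, y).Y == 255 {
				n++
			}
		}
	}
	return n
}

// blackPage returns a w x h image in which every pixel is black.
func blackPage(w, h int) *image.Gray {
	return image.NewGray(image.Rect(0, 0, w, h))
}

func main() {
	page := blackPage(10, 1)
	fmt.Println(countWhite(wipeOutsideSketch(page, 2, 7, 30))) // prints 4
	fmt.Println(countWhite(wipeOutsideSketch(page, 2, 7, 80))) // prints 0
}
```

The minimum-percentage check is a safety net: if the detected content area is implausibly narrow, the image is left alone rather than having real content wiped.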

#### func WipeFile

    func WipeFile(inPath string, outPath string, hwsize int, hthresh float64, hmin int, vwsize int, vthresh float64, vmin int) error

WipeFile wipes an image file, filling the sections of the image which
fall outside the content area with white, providing the content area
is above min %.

  - inPath  : path of the input image
  - outPath : path to save the output image
  - hwsize  : window size (width) for the horizontal wipe algorithm
  - hthresh : threshold for the horizontal wipe algorithm
  - hmin    : minimum % of content area width to consider valid
  - vwsize  : window size (height) for the vertical wipe algorithm
  - vthresh : threshold for the vertical wipe algorithm
  - vmin    : minimum % of content area height to consider valid

### Types

#### type SummableImage

    type SummableImage interface {
            image.Image
            Sum(r image.Rectangle) uint64
    }
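
As an illustration of what satisfying this interface involves, the sketch below wraps an `*image.Gray` and implements `Sum` by looping over the rectangle. The interface is redeclared locally so the example is self-contained. That `Sum` returns the total of the pixel values in the rectangle is an assumption, and `naiveSummable` and `demoSum` are hypothetical names. A real implementation would more likely answer from a precomputed integral image.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// SummableImage is copied from the package documentation so that this
// example compiles on its own.
type SummableImage interface {
	image.Image
	Sum(r image.Rectangle) uint64
}

// naiveSummable is a hypothetical implementation that embeds *image.Gray
// (which provides the image.Image methods) and adds a brute-force Sum.
type naiveSummable struct {
	*image.Gray
}

// Sum adds up the grey values of every pixel in r, clipped to the image.
func (n naiveSummable) Sum(r image.Rectangle) uint64 {
	r = r.Intersect(n.Bounds())
	var total uint64
	for y := r.Min.Y; y < r.Max.Y; y++ {
		for x := r.Min.X; x < r.Max.X; x++ {
			total += uint64(n.GrayAt(x, y).Y)
		}
	}
	return total
}

// Compile-time check that naiveSummable satisfies the interface.
var _ SummableImage = naiveSummable{}

// demoSum builds a 3x3 image of value 10 and sums its top-left 2x2 block.
func demoSum() uint64 {
	g := image.NewGray(image.Rect(0, 0, 3, 3))
	for y := 0; y < 3; y++ {
		for x := 0; x < 3; x++ {
			g.SetGray(x, y, color.Gray{Y: 10})
		}
	}
	var s SummableImage = naiveSummable{g}
	return s.Sum(image.Rect(0, 0, 2, 2))
}

func main() {
	fmt.Println(demoSum()) // prints 40
}
```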

### Directories

  - cmd/binarize     : does fast integral image Sauvola binarisation
                       on an image
  - cmd/pggraph      : creates a graph showing the proportion of black
                       pixels for slices through a binarised image
  - cmd/preproc      : runs binarisation and wipe preprocessing on an
                       image
  - cmd/preprocmulti : runs binarisation with a variety of different
                       binarisation levels, preprocessing and saving
                       each version
  - cmd/wipe         : wipes sections of an image which are outside of
                       an automatically determined content area