tesseract

package
v0.0.0-...-65af2ff Latest
Published: Apr 9, 2014 | License: AGPL-3.0, BSD-2-Clause | Imports: 10 | Imported by: 0

README

go.tesseract

go.tesseract is a wrapper for the tesseract-ocr library.

go.tesseract is under heavy development and should not be used in a production environment.

Installation

go.tesseract requires tesseract 3.02.02 or later; it cannot and will not compile against earlier versions. At the time of writing, this version of tesseract is not yet in the Ubuntu repository, so it has to be built from source.

Before you continue, make sure you have installed go.leptonica. Please follow the directions in its README.

Download, configure, make and install

svn checkout http://tesseract-ocr.googlecode.com/svn/tags/release-3.02.02 tesseract-ocr-read-only
cd tesseract-ocr-read-only
./autogen.sh
./configure
make
sudo make install
sudo ldconfig

Copy language files (do this for any language you require)

cp tessdata/eng.* /usr/local/share/tessdata/

For more information, view the tesseract compilation guide.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Version

func Version() string

Version returns both go.tesseract's version and the version of the tesseract library (3.02.02 or later).
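The minimum-version requirement can be checked with a small comparison helper. The sketch below is illustrative only: parseVersion and atLeast are hypothetical names, not part of go.tesseract, and because the exact layout of the string returned by Version is not documented here, the helper assumes it is handed a bare dotted tesseract version such as "3.02.02".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a dotted version such as "3.02.02" into numeric parts.
// Hypothetical helper; not part of go.tesseract.
func parseVersion(v string) ([]int, error) {
	fields := strings.Split(v, ".")
	parts := make([]int, len(fields))
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return nil, fmt.Errorf("invalid version component %q in %q", f, v)
		}
		parts[i] = n
	}
	return parts, nil
}

// atLeast reports whether version have is equal to or newer than want.
// Missing trailing components count as zero, so "3.03" satisfies "3.02.02".
func atLeast(have, want string) (bool, error) {
	h, err := parseVersion(have)
	if err != nil {
		return false, err
	}
	w, err := parseVersion(want)
	if err != nil {
		return false, err
	}
	for i := 0; i < len(h) || i < len(w); i++ {
		var a, b int
		if i < len(h) {
			a = h[i]
		}
		if i < len(w) {
			b = w[i]
		}
		if a != b {
			return a > b, nil
		}
	}
	return true, nil
}

func main() {
	// Assumed input: a bare tesseract version string.
	for _, v := range []string{"3.02.01", "3.02.02", "3.03"} {
		ok, err := atLeast(v, "3.02.02")
		fmt.Println(v, ok, err)
	}
}
```

Comparing numeric components rather than raw strings avoids surprises with zero-padded parts such as "02".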

Types

type BoxCharacter

type BoxCharacter struct {
	Character  rune
	StartX     uint32
	StartY     uint32
	EndX       uint32
	EndY       uint32
	Pagenumber uint32
}

type BoxText

type BoxText struct {
	Characters []BoxCharacter
}

TODO: make this: `type BoxText []BoxCharacter` ?

type Tess

type Tess struct {
	// contains filtered or unexported fields
}

Tess represents a tesseract instance

func NewTess

func NewTess(datapath string, language string) (*Tess, error)

NewTess creates and returns a new tesseract instance.

func (*Tess) AvailableLanguages

func (t *Tess) AvailableLanguages() []string

AvailableLanguages returns the languages available to the given tesseract instance. To find the languages actually loaded use (*Tess).LoadedLanguages().
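Before initializing an instance with a combined language string, it can be useful to check the request against the available languages. The following is a minimal sketch: missingLanguages is a hypothetical helper, the slice stands in for the result of AvailableLanguages, and the "+"-separated form follows the "deu+hin" example used elsewhere in this documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// missingLanguages returns the entries of a "+"-separated language string
// (for example "deu+hin") that do not appear in available.
// Hypothetical helper; not part of go.tesseract.
func missingLanguages(available []string, requested string) []string {
	known := make(map[string]bool, len(available))
	for _, lang := range available {
		known[lang] = true
	}
	var missing []string
	for _, lang := range strings.Split(requested, "+") {
		if lang != "" && !known[lang] {
			missing = append(missing, lang)
		}
	}
	return missing
}

func main() {
	available := []string{"eng", "deu"} // stand-in for t.AvailableLanguages()
	fmt.Println(missingLanguages(available, "deu+hin")) // prints [hin]
}
```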

func (*Tess) BoxText

func (tess *Tess) BoxText(pagenumber int) (*BoxText, error)

BoxText returns the output given by BoxTextRaw as a BoxText object.

func (*Tess) BoxTextRaw

func (t *Tess) BoxTextRaw(pagenumber int) string

BoxTextRaw returns the raw box text for the given page number.
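To illustrate how raw box text might map onto BoxCharacter values, here is a rough sketch. It is not the package's actual implementation. It assumes the raw text follows the usual Tesseract box-file layout, one "symbol left bottom right top page" record per line, and the mapping of those four coordinates onto StartX, StartY, EndX and EndY is a guess. BoxCharacter is redeclared locally so the example runs without tesseract installed; parseBoxLine and parseBoxText are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// BoxCharacter mirrors the struct documented above.
type BoxCharacter struct {
	Character  rune
	StartX     uint32
	StartY     uint32
	EndX       uint32
	EndY       uint32
	Pagenumber uint32
}

// parseBoxLine parses one "symbol left bottom right top page" record.
// Only the first rune of the symbol field is kept.
func parseBoxLine(line string) (BoxCharacter, error) {
	fields := strings.Fields(line)
	if len(fields) != 6 {
		return BoxCharacter{}, fmt.Errorf("expected 6 fields, got %d in %q", len(fields), line)
	}
	r, _ := utf8.DecodeRuneInString(fields[0])
	var nums [5]uint32
	for i, f := range fields[1:] {
		n, err := strconv.ParseUint(f, 10, 32)
		if err != nil {
			return BoxCharacter{}, fmt.Errorf("bad number %q in %q: %v", f, line, err)
		}
		nums[i] = uint32(n)
	}
	// Assumed mapping: left/bottom/right/top -> StartX/StartY/EndX/EndY.
	return BoxCharacter{
		Character: r,
		StartX:    nums[0], StartY: nums[1],
		EndX: nums[2], EndY: nums[3],
		Pagenumber: nums[4],
	}, nil
}

// parseBoxText parses every non-empty line of raw box text.
func parseBoxText(raw string) ([]BoxCharacter, error) {
	var chars []BoxCharacter
	for _, line := range strings.Split(raw, "\n") {
		if strings.TrimSpace(line) == "" {
			continue
		}
		c, err := parseBoxLine(line)
		if err != nil {
			return nil, err
		}
		chars = append(chars, c)
	}
	return chars, nil
}

func main() {
	raw := "H 10 20 30 60 0\ni 34 20 40 58 0\n" // hand-written sample, not real output
	chars, err := parseBoxText(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range chars {
		fmt.Printf("%c (%d,%d)-(%d,%d) page %d\n",
			c.Character, c.StartX, c.StartY, c.EndX, c.EndY, c.Pagenumber)
	}
}
```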

func (*Tess) Clear

func (t *Tess) Clear()

Clear frees up recognition results and any stored image data, without actually freeing any recognition data that would be time-consuming to reload. Afterwards, you must call SetImagePix before doing any Recognize or Get* operation.

func (*Tess) Close

func (t *Tess) Close()

Close clears the tesseract instance from memory

func (*Tess) DumpVariables

func (t *Tess) DumpVariables()

DumpVariables dumps the variables set on a Tess to stdout

func (*Tess) HOCRText

func (t *Tess) HOCRText(pagenumber int) string

HOCRText returns the hOCR text for the given page number.
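hOCR is HTML in which layout information is carried in title attributes, typically as a "bbox x0 y0 x1 y1" property. As a rough sketch of post-processing such output, the helper below pulls out every bounding box with a regular expression. boundingBoxes is a hypothetical name, the sample markup is hand-written rather than real HOCRText output, and a proper HTML parser would be more robust for anything beyond a quick inspection.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// bboxPattern matches the "bbox x0 y0 x1 y1" property found in hOCR titles.
var bboxPattern = regexp.MustCompile(`bbox (\d+) (\d+) (\d+) (\d+)`)

// boundingBoxes returns every bbox found in an hOCR string, in document order.
// Hypothetical helper; not part of go.tesseract.
func boundingBoxes(hocr string) [][4]int {
	var boxes [][4]int
	for _, m := range bboxPattern.FindAllStringSubmatch(hocr, -1) {
		var box [4]int
		for i := range box {
			// The pattern admits digits only, so Atoi can fail only on overflow.
			box[i], _ = strconv.Atoi(m[i+1])
		}
		boxes = append(boxes, box)
	}
	return boxes
}

func main() {
	// Hand-written fragment standing in for the result of t.HOCRText(0).
	sample := `<span class='ocr_line' title="bbox 36 92 580 122">Hello</span>`
	fmt.Println(boundingBoxes(sample)) // prints [[36 92 580 122]]
}
```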

func (*Tess) InitializedLanguages

func (t *Tess) InitializedLanguages() string

InitializedLanguages returns the languages string used in the last valid initialization. If the last initialization specified "deu+hin" then that will be returned. If hin loaded eng automatically as well, then that will not be included in this list. To find the languages actually loaded use (*Tess).LoadedLanguages().

func (*Tess) LoadedLanguages

func (t *Tess) LoadedLanguages() []string

LoadedLanguages returns the loaded languages as a slice of strings. It includes all languages loaded for the given tesseract instance, including those loaded as dependencies of other loaded languages.
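The difference between InitializedLanguages and LoadedLanguages is the set of languages that were pulled in as dependencies. A minimal sketch of computing that set follows; dependencyLanguages is a hypothetical helper, and the literal values stand in for the results of the two methods.

```go
package main

import (
	"fmt"
	"strings"
)

// dependencyLanguages returns the languages in loaded that were not named in
// the "+"-separated initialization string, preserving the order of loaded.
// Hypothetical helper; not part of go.tesseract.
func dependencyLanguages(initialized string, loaded []string) []string {
	requested := make(map[string]bool)
	for _, lang := range strings.Split(initialized, "+") {
		requested[lang] = true
	}
	var deps []string
	for _, lang := range loaded {
		if !requested[lang] {
			deps = append(deps, lang)
		}
	}
	return deps
}

func main() {
	// Stand-ins for t.InitializedLanguages() and t.LoadedLanguages().
	initialized := "deu+hin"
	loaded := []string{"deu", "hin", "eng"}
	fmt.Println(dependencyLanguages(initialized, loaded)) // prints [eng]
}
```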

func (*Tess) SetImagePix

func (t *Tess) SetImagePix(pix *leptonica.Pix)

SetImagePix sets the input image using a leptonica Pix

func (*Tess) SetInputName

func (t *Tess) SetInputName(filename string)

SetInputName sets the name of the input file. It is needed only for training and for loading a UNLV zone file. TODO: drop this?

func (*Tess) Text

func (t *Tess) Text() string

Text returns text after analysing the image(s)

func (*Tess) UNLVText

func (t *Tess) UNLVText() string

UNLVText returns the UNLV text
