internal

package
v0.9.0 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 15, 2022 License: MIT Imports: 15 Imported by: 0

Documentation

Constants

const (
	Cautious   string = "cautious"
	Courageous string = "courageous"
	Redundant  string = "redundant"
)

Cautious, Courageous and Redundant name the filters that can be set in DMConfig.Filter for the dm training.

const Epsilon = "ε"

Epsilon marks empty strings and empty slices in the input and output of stoks.

const PStep = "recognition/post-correction"

PStep defines the OCR-D processing step.

const StokComment = "#"

StokComment defines the start of comments.

const StokNamePref = "#name="

StokNamePref defines the line prefix for file names.

const Version = "v0.9.0"

Version defines the version of apoco.

Variables

This section is empty.

Functions

func ConnectProfile added in v0.0.37

func ConnectProfile(c *Config, suffix string) apoco.StreamFunc

ConnectProfile either generates the profile by running the profiler or reads it from the cache, and then connects the profile with the tokens.

func E added in v0.0.21

func E(str string) string

func EachStok added in v0.0.42

func EachStok(r io.Reader, f func(string, Stok) error) error

EachStok calls the callback function f for each token read from r, together with the token's associated name. Stoks are read line by line from the reader; lines starting with # are skipped. If a line starting with '#name=x' is encountered, the name passed to the callback is updated accordingly.

func FilterLex added in v0.0.60

func FilterLex(c *Config) apoco.StreamFunc

func IDFromFilePath

func IDFromFilePath(path, fg string) string

IDFromFilePath generates an id based on the file group and the file path.

func UpdateInConfig added in v0.0.46

func UpdateInConfig(dest, val interface{})

UpdateInConfig sets the value that dest points to to val, unless val is the zero value of its underlying type. dest must be a pointer to a string, int, float64 or bool; otherwise the function panics.
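A minimal sketch of the documented semantics, useful for overriding configuration values only where a non-zero value was given. updateInConfig is a hypothetical re-implementation, not the package's code, and it assumes that val has the element type of dest (a mismatch panics on the type assertion).

```go
package main

import "fmt"

// updateInConfig is a hypothetical sketch of UpdateInConfig: *dest is
// overwritten with val unless val is the zero value of its type.
func updateInConfig(dest, val interface{}) {
	switch d := dest.(type) {
	case *string:
		if v := val.(string); v != "" {
			*d = v
		}
	case *int:
		if v := val.(int); v != 0 {
			*d = v
		}
	case *float64:
		if v := val.(float64); v != 0 {
			*d = v
		}
	case *bool:
		if v := val.(bool); v {
			*d = v
		}
	default:
		panic(fmt.Sprintf("updateInConfig: unsupported type %T", dest))
	}
}

func main() {
	model := "default.bin"
	updateInConfig(&model, "")        // zero value: model is kept
	updateInConfig(&model, "new.bin") // non-zero value: model is replaced
	fmt.Println(model)                // prints "new.bin"
}
```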

Types

type Config added in v0.0.29

type Config struct {
	Model    string                    `json:"model,omitempty"`
	LM       map[string]apoco.LMConfig `json:"lm"`
	Profiler ProfilerConfig            `json:"profiler"`
	RR       TrainingConfig            `json:"rr"`
	DM       DMConfig                  `json:"dm"`
	MS       TrainingConfig            `json:"ms"`
	FF       TrainingConfig            `json:"ff"`
	Nocr     int                       `json:"nocr"`
	Cache    bool                      `json:"cache"`
	GT       bool                      `json:"gt"`
	AlignLev bool                      `json:"alignLev"`
	Lex      bool                      `json:"lex"`
}

Config defines the command's configuration.

func ReadConfig added in v0.0.29

func ReadConfig(name string) (*Config, error)

ReadConfig reads the configuration from a JSON or TOML file. If name is empty, an empty configuration is returned. If name starts with '{' and ends with '}', it is interpreted as a JSON string and parsed accordingly (for OCR-D compatibility).

type DMConfig added in v0.0.52

type DMConfig struct {
	TrainingConfig
	Filter string `json:"filter"` // cautious, courageous or redundant
}

DMConfig holds the settings for the dm training.

type Model added in v0.0.52

type Model = apoco.Model

Model is an alias for apoco.Model. It holds the models of the different training runs for different numbers of OCRs and is used to save and load the models for the automatic post-correction.

func ReadModel added in v0.0.52

func ReadModel(name string, lms map[string]apoco.LMConfig, create bool) (*Model, error)

ReadModel reads a model from a gob-encoded, compressed input file. If the file does not exist, the corresponding language models are loaded and a new model is returned. If create is false, no new model is created and the model must be read from an existing file.

type ModelData added in v0.0.52

type ModelData = apoco.ModelData

ModelData is an alias for apoco.ModelData.

type Piper added in v0.0.17

type Piper struct {
	IFGS, Exts, Dirs []string
	METS             string
	AlignLev         bool
}

func (Piper) Pipe added in v0.0.17

func (p Piper) Pipe(ctx context.Context, fns ...apoco.StreamFunc) error

type ProfilerConfig added in v0.0.52

type ProfilerConfig struct {
	Exe    string `json:"exe"`
	Config string `json:"config"`
}

ProfilerConfig holds the profiler's configuration values.

type Stok added in v0.0.21

type Stok struct {
	OCR, Sug, GT, ID         string
	OCRConfs                 []float64
	Conf                     float64
	Rank                     int
	Skipped, Short, Lex, Cor bool
}

Stok represents a stats token. Stats tokens explain the correction decisions of apoco and form the basis of the correction protocols.

func MakeStokFromLine added in v0.0.62

func MakeStokFromLine(line string) (Stok, error)

MakeStokFromLine creates a new stats token from a correspondingly formatted line.

func MakeStokFromT added in v0.0.42

func MakeStokFromT(t apoco.T, gt bool) Stok

func (Stok) Cause added in v0.0.27

func (s Stok) Cause(limit int) StokCause

Cause returns the cause of a correction error. There are three possibilities: the correct correction candidate was missing (MissingCandidate), the correct candidate was not selected by the reranker (BadRank), or the correct candidate would have been available but could not be selected because of the imposed limit on the number of correction candidates (BadLimit). If limit is smaller than or equal to 0, no limit is imposed.
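The three-way decision can be sketched as follows. This is a hypothetical illustration, not the package's implementation: it assumes that rank is the 1-based rank of the correct candidate among all candidates (0 if the correct candidate is missing) and that the function is only called for erroneous tokens. The constants mirror the documented StokCause values.

```go
package main

import "fmt"

type stokCause int

// Mirrors the order of the documented StokCause constants.
const (
	badRank stokCause = iota
	badLimit
	missingCandidate
)

// cause is a hypothetical sketch of Stok.Cause under the rank assumption above.
func cause(rank, limit int) stokCause {
	switch {
	case rank <= 0:
		return missingCandidate // no correct candidate at all
	case limit > 0 && rank > limit:
		return badLimit // correct candidate exists but was cut off by the limit
	default:
		return badRank // correct candidate was selectable but not chosen
	}
}

func main() {
	fmt.Println(cause(0, 10), cause(12, 10), cause(3, 0)) // prints "2 1 0"
}
```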

func (Stok) ErrAfter added in v0.0.38

func (s Stok) ErrAfter() bool

func (Stok) ErrBefore added in v0.0.38

func (s Stok) ErrBefore() bool

func (Stok) Merge added in v0.0.29

func (s Stok) Merge() bool

Merge returns true if the token contains merged OCR-tokens.

func (Stok) Split added in v0.0.29

func (s Stok) Split(before Stok) bool

func (Stok) String added in v0.0.21

func (s Stok) String() string

func (Stok) Type added in v0.0.27

func (s Stok) Type() StokType

Type returns the correction type of the stok.

type StokCause added in v0.0.27

type StokCause int

StokCause gives the cause of errors.

const (
	BadRank          StokCause = iota // Bad correction because of a bad rank.
	BadLimit                          // Bad correction because of a bad limit for the correction candidates.
	MissingCandidate                  // Bad correction because of a missing correct correction candidate.
)

func (StokCause) String added in v0.0.27

func (i StokCause) String() string

type StokType added in v0.0.27

type StokType int

StokType gives the type of stoks.

const (
	SkippedShort                       StokType = iota // Skipped short token.
	SkippedShortErr                                    // Error in short token.
	SkippedNoCand                                      // Skipped no candidate token.
	SkippedNoCandErr                                   // Error in skipped no candidate token.
	SkippedLex                                         // Skipped lexical token.
	FalseFriend                                        // Error in skipped lexical token (false friend).
	RedundantCorrection                                // Redundant correction.
	InfelicitousCorrection                             // Infelicitous correction.
	SuccessfulCorrection                               // Successful correction.
	DoNotCareCorrection                                // Do not care correction.
	SuspiciousNotReplacedCorrect                       // Accept OCR.
	DodgedBullet                                       // Dodged bullet.
	MissedOpportunity                                  // Missed opportunity.
	SuspiciousNotReplacedNotCorrectErr                 // Skipped do not care.
)

func (StokType) Err added in v0.0.27

func (s StokType) Err() bool

Err returns true if the stok type marks an error.

func (StokType) Skipped added in v0.0.27

func (s StokType) Skipped() bool

Skipped returns true if the stok type marks a skipped token.

func (StokType) String added in v0.0.27

func (i StokType) String() string

type TrainingConfig added in v0.0.52

type TrainingConfig struct {
	Features     []string `json:"features"`
	LearningRate float64  `json:"learningRate"`
	Ntrain       int      `json:"ntrain"`
}

TrainingConfig holds the settings for a training run.
