Documentation ¶
Functions ¶
func Annotation ¶
Annotation returns the index where the unparsed part of the string starts. If the full string can be parsed, it returns the index of the end of the input.
func CleanupStream ¶ added in v0.7.0
CleanupStream takes an input and an output string channel. For each name read from the input channel, it sends a pipe-delimited string to the output channel, with the original name to the left of the pipe and the cleaned-up name to the right.
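The channel pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the package's implementation: `cleanupStream` and `cleanup` are hypothetical names, the cleaning step is a placeholder that only drops `<i>` tags, and the choice to close the output channel once the input is drained is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// tagReplacer backs the placeholder cleaning step below. The real cleaning
// rules are not spelled out on this page; this only drops <i> and </i>.
var tagReplacer = strings.NewReplacer("<i>", "", "</i>", "")

// cleanup is a hypothetical stand-in for the real clean-up logic.
func cleanup(name string) string {
	return tagReplacer.Replace(name)
}

// cleanupStream is a hypothetical stand-in for CleanupStream. It reads names
// from in and writes "original|cleaned" strings to out. Closing out after in
// is drained is an assumption of this sketch.
func cleanupStream(in <-chan string, out chan<- string) {
	for name := range in {
		out <- name + "|" + cleanup(name)
	}
	close(out)
}

func main() {
	in := make(chan string)
	out := make(chan string)

	go cleanupStream(in, out)
	go func() {
		in <- "<i>Homo sapiens</i> L."
		close(in)
	}()

	for line := range out {
		fmt.Println(line) // <i>Homo sapiens</i> L.|Homo sapiens L.
	}
}
```

Running the producer and the stream in separate goroutines lets the caller consume results as they arrive instead of buffering the whole input.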
func NormalizeHybridChar ¶
NormalizeHybridChar replaces the hybrid characters 'X' or 'x' with the multiplication sign '×'.
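To illustrate the substitution, here is a small sketch under stated assumptions. The page does not say how a hybrid character is told apart from an ordinary letter, nor what the function's signature is, so the hypothetical `normalizeHybridChar` below takes a byte slice and treats only a standalone 'X' or 'x' (at the start of the input or after whitespace, and followed by whitespace) as a hybrid sign.

```go
package main

import (
	"fmt"
	"regexp"
)

// hybridChar matches a standalone 'X' or 'x' token. This detection rule is
// an assumption of the sketch, not the package's actual rule.
var hybridChar = regexp.MustCompile(`(^|\s)[Xx](\s)`)

// normalizeHybridChar is a hypothetical stand-in for NormalizeHybridChar.
// It returns a copy of bs with standalone hybrid characters replaced by the
// multiplication sign.
func normalizeHybridChar(bs []byte) []byte {
	return hybridChar.ReplaceAll(bs, []byte("${1}×${2}"))
}

func main() {
	fmt.Println(string(normalizeHybridChar([]byte("Salix x capreola"))))
	fmt.Println(string(normalizeHybridChar([]byte("X Agropogon littoralis"))))
	fmt.Println(string(normalizeHybridChar([]byte("Xanthium strumarium"))))
}
```

In this sketch a leading "X" that is part of a genus name, as in "Xanthium", is left alone because it is not followed by whitespace.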
func StripTags ¶ added in v0.7.0
StripTags takes a slice of bytes and returns a string with common tags removed and HTML entities escaped. Uncommon tags are kept intact so that the parser can deal with them.
func UnderscoreToSpace ¶ added in v0.7.0
UnderscoreToSpace takes a slice of bytes. If the slice contains underscores but no spaces, it replaces the underscores with spaces in the slice. If any spaces are present, the slice is returned unmodified.
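The rule above is simple enough to sketch directly. The hypothetical `underscoreToSpace` below follows the description (modify in place only when underscores are present and spaces are absent); its return type is an assumption, since the page does not show the signature.

```go
package main

import (
	"bytes"
	"fmt"
)

// underscoreToSpace is a hypothetical stand-in for UnderscoreToSpace.
// It rewrites bs in place only when it contains underscores and no spaces;
// otherwise bs is returned unchanged.
func underscoreToSpace(bs []byte) []byte {
	if !bytes.ContainsRune(bs, '_') || bytes.ContainsRune(bs, ' ') {
		return bs
	}
	for i, b := range bs {
		if b == '_' {
			bs[i] = ' '
		}
	}
	return bs
}

func main() {
	// Underscores only: every underscore becomes a space.
	fmt.Println(string(underscoreToSpace([]byte("Homo_sapiens_L."))))
	// A space is already present: the input is left as it is.
	fmt.Println(string(underscoreToSpace([]byte("Homo sapiens_L."))))
}
```

Leaving mixed input untouched avoids rewriting underscores that may be a deliberate part of a name which already uses spaces as separators.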
Types ¶
type Preprocessor ¶
type Preprocessor struct {
	Virus       bool
	Underscore  bool
	NoParse     bool
	Approximate bool
	Annotation  bool
	Body        []byte
	Tail        []byte
}
func Preprocess ¶
func Preprocess(bs []byte) *Preprocessor
Preprocess runs a series of regular expressions over the input to determine features of the input before parsing.