Documentation ¶
Index ¶
- func Annotation(bs []byte) int
- func CleanupStream(in <-chan string, out chan<- *CleanupResult, wn int)
- func IsVirus(data []byte) bool
- func NoParse(data []byte) bool
- func NormalizeHybridChar(bs []byte) []byte
- func StripTags(s string) string
- func UnderscoreToSpace(bs []byte) (bool, error)
- func VirusLikeName(name string) bool
- type CleanupResult
- type Preprocessor
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Annotation ¶
Annotation returns the index where the unparsed part starts. If the full string can be parsed, it returns the index of the end of the input.
func CleanupStream ¶ added in v0.7.0
func CleanupStream(in <-chan string, out chan<- *CleanupResult, wn int)
CleanupStream takes an input channel of strings and an output channel, and feeds the output with pipe-delimited strings: the original name to the left of the pipe and the cleaned-up name to the right.
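The channel-based signature suggests a worker-pool pattern. The following is a minimal sketch of that pattern, not the package's implementation. It assumes that wn is the number of workers and that the function closes the output channel when the input is drained; neither is stated above. The cleanupResult type and its Input and Output fields are hypothetical, since the fields of CleanupResult are not listed here, and strings.TrimSpace stands in for the real cleanup.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// cleanupResult is a hypothetical stand-in for CleanupResult; the real
// type's fields are not shown in this documentation.
type cleanupResult struct {
	Input  string // original name
	Output string // cleaned-up name
}

// cleanupStream fans the names from in out to wn workers and sends one
// result per name to out. Closing out at the end is an assumption.
func cleanupStream(in <-chan string, out chan<- *cleanupResult, wn int) {
	var wg sync.WaitGroup
	wg.Add(wn)
	for i := 0; i < wn; i++ {
		go func() {
			defer wg.Done()
			for name := range in {
				// Placeholder cleanup: the real function does more than trimming.
				out <- &cleanupResult{Input: name, Output: strings.TrimSpace(name)}
			}
		}()
	}
	wg.Wait()
	close(out)
}

func main() {
	in := make(chan string)
	out := make(chan *cleanupResult)

	go cleanupStream(in, out, 4)

	// Producer: send the names, then close the input to signal the end.
	go func() {
		for _, n := range []string{" Homo sapiens ", "Pinus alba"} {
			in <- n
		}
		close(in)
	}()

	// Consumer: results may arrive in any order.
	for r := range out {
		fmt.Printf("%s|%s\n", r.Input, r.Output)
	}
}
```

With several workers, results are not guaranteed to arrive in input order, so a caller that needs ordering has to restore it itself.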
func NormalizeHybridChar ¶
NormalizeHybridChar substitutes hybrid chars 'X' or 'x' with the multiplication sign char.
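As an illustration of the substitution, here is a minimal sketch. It assumes that a hybrid char is an 'x' or 'X' standing alone between spaces; the package's actual matching rules are not described above and may be broader. The function normalizeHybridChar is a hypothetical stand-in.

```go
package main

import (
	"bytes"
	"fmt"
)

// normalizeHybridChar is a hypothetical stand-in, not the package's code.
// It replaces every space-separated token that is exactly "x" or "X"
// with the multiplication sign and leaves all other tokens untouched.
func normalizeHybridChar(bs []byte) []byte {
	fields := bytes.Split(bs, []byte(" "))
	for i, f := range fields {
		if len(f) == 1 && (f[0] == 'x' || f[0] == 'X') {
			fields[i] = []byte("×")
		}
	}
	return bytes.Join(fields, []byte(" "))
}

func main() {
	name := []byte("Salix alba x Salix fragilis")
	// The 'x' inside "Salix" is part of a longer token and is kept.
	fmt.Println(string(normalizeHybridChar(name))) // Salix alba × Salix fragilis
}
```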
func StripTags ¶ added in v0.7.0
StripTags takes a string and returns a string with common tags removed and HTML entities escaped. It keeps all uncommon tags intact so that the parser can deal with them.
func UnderscoreToSpace ¶ added in v0.7.0
UnderscoreToSpace takes a slice of bytes. If the slice contains underscores but no spaces, it replaces the underscores with spaces in place. If any spaces are present, the slice is left unmodified.
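The rule can be sketched as follows. This is a simplified illustration, not the package's code: the hypothetical underscoreToSpace returns a single bool that reports whether the slice was changed, while the real function returns (bool, error) and the meaning of those values is not documented here.

```go
package main

import (
	"bytes"
	"fmt"
)

// underscoreToSpace is a hypothetical stand-in. It rewrites bs in place
// and reports whether any byte was changed.
func underscoreToSpace(bs []byte) bool {
	// Leave the input alone if it already has spaces or has no underscores.
	if bytes.IndexByte(bs, ' ') >= 0 || bytes.IndexByte(bs, '_') < 0 {
		return false
	}
	for i, b := range bs {
		if b == '_' {
			bs[i] = ' '
		}
	}
	return true
}

func main() {
	a := []byte("Homo_sapiens_L.")
	fmt.Println(underscoreToSpace(a), string(a)) // true Homo sapiens L.

	b := []byte("Homo sapiens_var")
	fmt.Println(underscoreToSpace(b), string(b)) // false Homo sapiens_var
}
```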
func VirusLikeName ¶ added in v0.14.0
Types ¶
type CleanupResult ¶ added in v0.9.0
type Preprocessor ¶
type Preprocessor struct {
	Virus       bool
	Underscore  bool
	NoParse     bool
	Approximate bool
	Annotation  bool
	Body        []byte
	Tail        []byte
}
func Preprocess ¶
func Preprocess(bs []byte) *Preprocessor
Preprocess runs a series of regular expressions over the input to determine features of the input before parsing.
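A caller would typically branch on the detected features before handing the name to a parser. The sketch below shows one way to do that. The preprocessor type mirrors the fields listed above so that the example compiles on its own, and the plan function is hypothetical. The interpretation of the fields (Virus and NoParse mean "do not parse", Body is the part to parse, Tail is the unparsed remainder) is inferred from their names and is an assumption.

```go
package main

import "fmt"

// preprocessor mirrors the fields of Preprocessor so that this sketch
// is self-contained; it is not the package's type.
type preprocessor struct {
	Virus, Underscore, NoParse, Approximate, Annotation bool
	Body, Tail                                          []byte
}

// plan is a hypothetical helper that decides what to do with a
// preprocessed name. The meaning given to each field is an assumption.
func plan(p *preprocessor) string {
	switch {
	case p.Virus:
		return "skip: virus-like name"
	case p.NoParse:
		return "skip: marked as unparseable"
	case len(p.Tail) > 0:
		return fmt.Sprintf("parse %q, keep %q as unparsed tail", p.Body, p.Tail)
	default:
		return fmt.Sprintf("parse %q", p.Body)
	}
}

func main() {
	p := &preprocessor{
		Annotation: true,
		Body:       []byte("Homo sapiens"),
		Tail:       []byte(" sensu lato"),
	}
	fmt.Println(plan(p))
}
```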