preprocess

package
Version: v0.14.4
Warning

This package is not in the latest version of its module.

Published: Dec 15, 2020 License: MIT Imports: 7 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Annotation

func Annotation(bs []byte) int

Annotation returns the index where the unparsed part starts. If the full string can be parsed, it returns the index of the end of the input.
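The index contract described above can be illustrated with a minimal sketch. The function annotationIndex and the pattern annotRe are hypothetical stand-ins written for this example; the package's own pattern is not shown in this documentation.

```go
package main

import (
	"fmt"
	"regexp"
)

// annotRe is an illustrative assumption, not the pattern used by the package.
var annotRe = regexp.MustCompile(`\s+(sensu|auct\.|non)\s+`)

// annotationIndex is a hypothetical stand-in for Annotation. It returns the
// index where an annotation-like tail starts, or len(bs) if none is found.
func annotationIndex(bs []byte) int {
	if loc := annotRe.FindIndex(bs); loc != nil {
		return loc[0]
	}
	return len(bs)
}

func main() {
	name := []byte("Aus bus sensu Smith 1900")
	i := annotationIndex(name)
	// The returned index splits the input into a parseable body and a tail.
	fmt.Printf("body=%q tail=%q\n", name[:i], name[i:])
}
```

Returning len(bs) when nothing is found lets a caller slice with bs[:i] and bs[i:] without a special case.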

func CleanupStream added in v0.7.0

func CleanupStream(in <-chan string, out chan<- *CleanupResult, wn int)

CleanupStream takes an input channel of strings and an output channel of *CleanupResult values. For each name received, it sends a result that pairs the original name (Input) with the cleaned-up name (Output).
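A minimal sketch of how a function with this signature might be driven. The function cleanupStream below is a hypothetical stand-in, not the package's implementation. It assumes that wn is the number of worker goroutines and that out is closed once in is drained; neither is stated in this documentation. The cleanup step is a placeholder that only trims whitespace.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// CleanupResult mirrors the struct documented in this package.
type CleanupResult struct {
	Input  string
	Output string
}

// cleanupStream is a hypothetical stand-in with the same signature as
// CleanupStream. Assumptions: wn is the worker count, and out is closed
// after all workers finish.
func cleanupStream(in <-chan string, out chan<- *CleanupResult, wn int) {
	var wg sync.WaitGroup
	wg.Add(wn)
	for i := 0; i < wn; i++ {
		go func() {
			defer wg.Done()
			for s := range in {
				// Placeholder cleanup; the real package removes tags instead.
				out <- &CleanupResult{Input: s, Output: strings.TrimSpace(s)}
			}
		}()
	}
	wg.Wait()
	close(out)
}

// collect feeds names through cleanupStream and gathers outputs keyed by input.
func collect(names []string, wn int) map[string]string {
	in := make(chan string)
	out := make(chan *CleanupResult)
	go cleanupStream(in, out, wn)
	go func() {
		for _, n := range names {
			in <- n
		}
		close(in) // signals the workers that no more input is coming
	}()
	res := make(map[string]string)
	for r := range out {
		res[r.Input] = r.Output
	}
	return res
}

func main() {
	for in, out := range collect([]string{" Aus bus ", "Cus dus"}, 2) {
		fmt.Printf("%q -> %q\n", in, out)
	}
}
```

With several workers the order of results is not guaranteed, which is one reason to carry the original name in each result.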

func IsVirus

func IsVirus(data []byte) bool

func NoParse

func NoParse(data []byte) bool

func NormalizeHybridChar

func NormalizeHybridChar(bs []byte) []byte

NormalizeHybridChar substitutes hybrid chars 'X' or 'x' with the multiplication sign char.
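A simplified sketch of this kind of substitution. The function normalizeHybridChar is hypothetical and assumes that a hybrid char is a standalone 'x' or 'X' separated from its neighbours by single spaces; the package's actual matching rules may differ.

```go
package main

import (
	"bytes"
	"fmt"
)

// normalizeHybridChar is a hypothetical, simplified stand-in. It replaces a
// standalone "x" or "X" word with the multiplication sign and leaves any
// x inside a longer word (such as "Salix") untouched.
func normalizeHybridChar(bs []byte) []byte {
	words := bytes.Split(bs, []byte(" "))
	for i, w := range words {
		if len(w) == 1 && (w[0] == 'x' || w[0] == 'X') {
			words[i] = []byte("×")
		}
	}
	return bytes.Join(words, []byte(" "))
}

func main() {
	fmt.Println(string(normalizeHybridChar([]byte("Salix alba x Salix fragilis"))))
}
```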

func StripTags added in v0.7.0

func StripTags(s string) string

StripTags takes a string and returns a string with common tags removed and HTML entities escaped. It keeps all uncommon tags intact and leaves them for the parser to handle.

func UnderscoreToSpace added in v0.7.0

func UnderscoreToSpace(bs []byte) (bool, error)

UnderscoreToSpace takes a slice of bytes. If the slice contains underscores but no spaces, it replaces the underscores with spaces in place. If any spaces are present, the slice is left unmodified.
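The rule above can be sketched as follows. The function underscoreToSpace is a hypothetical stand-in; it assumes the boolean reports whether a substitution took place, and it omits the error value of the real signature.

```go
package main

import (
	"bytes"
	"fmt"
)

// underscoreToSpace is a hypothetical stand-in. It rewrites bs in place only
// when bs has at least one underscore and no spaces, and reports whether it
// changed anything (an assumption about the meaning of the bool).
func underscoreToSpace(bs []byte) bool {
	if bytes.IndexByte(bs, ' ') >= 0 || bytes.IndexByte(bs, '_') < 0 {
		return false
	}
	for i, b := range bs {
		if b == '_' {
			bs[i] = ' '
		}
	}
	return true
}

func main() {
	name := []byte("Aus_bus_Smith")
	changed := underscoreToSpace(name)
	fmt.Println(changed, string(name))
}
```

Because the slice is modified in place, the caller sees the change without a returned slice.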

func VirusLikeName added in v0.14.0

func VirusLikeName(name string) bool

Types

type CleanupResult added in v0.9.0

type CleanupResult struct {
	Input  string
	Output string
}

type Preprocessor

type Preprocessor struct {
	Virus       bool
	Underscore  bool
	NoParse     bool
	Approximate bool
	Annotation  bool
	Body        []byte
	Tail        []byte
}

func Preprocess

func Preprocess(bs []byte) *Preprocessor

Preprocess runs a series of regular expressions over the input to determine features of the input before parsing.
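A minimal sketch of regex-based feature detection in this style. The type features, the function detect, and both patterns are hypothetical illustrations; they cover only two of the Preprocessor flags and are not the package's own expressions.

```go
package main

import (
	"fmt"
	"regexp"
)

// features is a reduced, hypothetical stand-in for Preprocessor.
type features struct {
	Virus       bool
	Approximate bool
}

// Both patterns are illustrative assumptions.
var (
	virusRe  = regexp.MustCompile(`(?i)\b(virus|phage)\b`)
	approxRe = regexp.MustCompile(`\s(sp|spp|cf|aff)\.`)
)

// detect runs each pattern over the input once and records which matched.
func detect(bs []byte) *features {
	return &features{
		Virus:       virusRe.Match(bs),
		Approximate: approxRe.Match(bs),
	}
}

func main() {
	for _, s := range []string{"Tobacco mosaic virus", "Aus sp. 1", "Aus bus"} {
		f := detect([]byte(s))
		fmt.Printf("%q virus=%v approximate=%v\n", s, f.Virus, f.Approximate)
	}
}
```

Compiling the patterns once at package level avoids recompiling them for every input.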
