boilertext

package
v0.0.0-...-75c0cbd
Published: Nov 15, 2017
License: MIT
Imports: 7
Imported by: 2

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Extractor

type Extractor interface {
	Process(blocks []*TextBlock) (string, error)
}

Extractor is an interface that processes text blocks derived from incoming HTML and outputs the text within that HTML, minus the boilerplate.
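
As an illustration of how the interface might be satisfied, here is a minimal sketch. The thresholdExtractor type, its maxLinkDensity and minWords fields, and the filtering rule are all hypothetical and are not taken from this package; TextBlock and Extractor are redeclared locally so the example compiles on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// TextBlock is a local stand-in with the fields documented on this page.
type TextBlock struct {
	NumOfWords       int
	NumOfAnchorWords int
	Content          string
}

// Extractor repeats the documented interface.
type Extractor interface {
	Process(blocks []*TextBlock) (string, error)
}

// thresholdExtractor is a hypothetical implementation: it keeps blocks that
// have enough words and a low share of link text. It is not the package's
// own algorithm.
type thresholdExtractor struct {
	maxLinkDensity float64
	minWords       int
}

// Process joins the content of every block that passes both checks.
func (e thresholdExtractor) Process(blocks []*TextBlock) (string, error) {
	var kept []string
	for _, b := range blocks {
		// Skip nil, empty, and too-short blocks (also avoids dividing by zero).
		if b == nil || b.NumOfWords == 0 || b.NumOfWords < e.minWords {
			continue
		}
		density := float64(b.NumOfAnchorWords) / float64(b.NumOfWords)
		if density <= e.maxLinkDensity {
			kept = append(kept, b.Content)
		}
	}
	return strings.Join(kept, "\n"), nil
}

func main() {
	blocks := []*TextBlock{
		{NumOfWords: 3, NumOfAnchorWords: 3, Content: "Home About Contact"},
		{NumOfWords: 11, NumOfAnchorWords: 1,
			Content: "Boilerplate removal keeps the article text and drops menus and footers."},
	}
	var ex Extractor = thresholdExtractor{maxLinkDensity: 0.3, minWords: 5}
	text, err := ex.Process(blocks)
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```

With these made-up thresholds, the navigation-style block is dropped and only the sentence block is printed.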

type TextBlock

type TextBlock struct {
	NumOfWords       int
	NumOfAnchorWords int
	Content          string
}

TextBlock represents a text block, which may comprise inline elements.

func GenerateTextBlocks

func GenerateTextBlocks(reader io.Reader, splitStrategy bufio.SplitFunc) ([]*TextBlock, error)

GenerateTextBlocks takes a reader containing HTML and returns a slice of TextBlock pointers generated from it.

func (*TextBlock) LinkDensity

func (t *TextBlock) LinkDensity() float64

LinkDensity is the number of link text words divided by the total number of words in the block.
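
The ratio described above can be sketched as follows. This is not the package's source: TextBlock is redeclared locally, linkDensity is a hypothetical free function, mapping "link text words" to NumOfAnchorWords is an assumption based on the field names, and returning 0 for a block with no words is also an assumption.

```go
package main

import "fmt"

// TextBlock is a local stand-in with the fields documented on this page.
type TextBlock struct {
	NumOfWords       int
	NumOfAnchorWords int
	Content          string
}

// linkDensity divides anchor (link text) words by total words.
// Returning 0 for an empty block is an assumption of this sketch.
func linkDensity(t *TextBlock) float64 {
	if t.NumOfWords == 0 {
		return 0
	}
	return float64(t.NumOfAnchorWords) / float64(t.NumOfWords)
}

func main() {
	nav := &TextBlock{NumOfWords: 4, NumOfAnchorWords: 4, Content: "Home About Contact Login"}
	body := &TextBlock{NumOfWords: 20, NumOfAnchorWords: 1}
	fmt.Println(linkDensity(nav))  // 1
	fmt.Println(linkDensity(body)) // 0.05
}
```

A block made up entirely of links scores 1, while ordinary prose with an occasional link scores close to 0.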
