model

package
v0.0.0-...-0906917
Warning

This package is not in the latest version of its module.

Published: Aug 6, 2021 License: MIT Imports: 4 Imported by: 0

Documentation


Constants

This section is empty.

Variables

var (
	ErrNoChunks    = errors.New("document contains no chunks")
	ErrEmptyResult = errors.New("nothing found")
)
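
Callers can tell the two sentinel errors apart with errors.Is, which also matches errors wrapped with %w. The sketch below is self-contained and only illustrative: it redeclares ErrNoChunks and ErrEmptyResult locally and uses a hypothetical extract function in place of Extractor.Extract. When the real Extract returns each error is an assumption here (no chunks in the input versus no chunk judged relevant), not something this page documents.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the package's sentinel errors.
var (
	ErrNoChunks    = errors.New("document contains no chunks")
	ErrEmptyResult = errors.New("nothing found")
)

// extract is a hypothetical stand-in for Extractor.Extract. It assumes
// ErrNoChunks means the input had no chunks and ErrEmptyResult means
// no chunk was classified as relevant.
func extract(chunks []string, relevant func(string) bool) ([]string, error) {
	if len(chunks) == 0 {
		return nil, ErrNoChunks
	}
	var out []string
	for _, c := range chunks {
		if relevant(c) {
			out = append(out, c)
		}
	}
	if len(out) == 0 {
		// Wrapping with %w keeps the sentinel detectable via errors.Is.
		return nil, fmt.Errorf("extract: %w", ErrEmptyResult)
	}
	return out, nil
}

// describe maps an extraction error to a short message using errors.Is.
func describe(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrNoChunks):
		return "no chunks"
	case errors.Is(err, ErrEmptyResult):
		return "empty result"
	default:
		return "other error"
	}
}

func main() {
	long := func(s string) bool { return len(s) > 10 }
	_, err := extract(nil, long)
	fmt.Println(describe(err)) // prints "no chunks"
	_, err = extract([]string{"short"}, long)
	fmt.Println(describe(err)) // prints "empty result"
}
```

Comparing with errors.Is rather than == keeps the check working even if the error is wrapped with extra context on its way to the caller.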

Functions

This section is empty.

Types

type Extractor

type Extractor struct {
	Labels []bool
}

Extractor uses the trained model to extract relevant html.Chunks from an html.Document.

func NewExtractor

func NewExtractor() *Extractor

NewExtractor creates and initializes a new Extractor.

func (*Extractor) Extract

func (ext *Extractor) Extract(doc *html.Document) (*util.Article, error)

Extract returns the relevant text chunks found in doc as a *util.Article.

How it works

This function creates a feature vector for each chunk found in doc. A feature vector is a numerical representation of the chunk's properties, such as its HTML element type, its parent element type, its number of words and its number of sentences.
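
The per-chunk feature vector can be sketched as follows. This is a minimal illustration, not the package's actual encoding: the Chunk type is a hypothetical, simplified stand-in for html.Chunk, the tag vocabulary is assumed, element types are one-hot encoded, and sentences are counted with a rough punctuation heuristic.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a hypothetical, simplified stand-in for html.Chunk.
type Chunk struct {
	Element string // element type, e.g. "p"
	Parent  string // parent element type, e.g. "article"
	Text    string // the chunk's text content
}

// tags is an assumed, fixed vocabulary for one-hot encoding element types.
var tags = []string{"p", "div", "h1", "h2", "li", "article", "span"}

// oneHot returns a vector with a 1 at tag's position in tags,
// or all zeros if tag is not in the vocabulary.
func oneHot(tag string) []float64 {
	v := make([]float64, len(tags))
	for i, t := range tags {
		if t == tag {
			v[i] = 1
		}
	}
	return v
}

// countSentences is a rough heuristic: it splits on '.', '!' and '?'
// and counts the non-blank pieces.
func countSentences(s string) int {
	parts := strings.FieldsFunc(s, func(r rune) bool {
		return r == '.' || r == '!' || r == '?'
	})
	n := 0
	for _, p := range parts {
		if strings.TrimSpace(p) != "" {
			n++
		}
	}
	return n
}

// features concatenates the one-hot element type, the one-hot parent type,
// the word count and the sentence count into one numeric vector.
func features(c Chunk) []float64 {
	v := append(oneHot(c.Element), oneHot(c.Parent)...)
	words := len(strings.Fields(c.Text))
	return append(v, float64(words), float64(countSentences(c.Text)))
}

func main() {
	c := Chunk{Element: "p", Parent: "article", Text: "Hello world. How are you?"}
	// 7 + 7 one-hot slots, then 5 words and 2 sentences.
	fmt.Println(features(c))
}
```

One-hot encoding keeps categorical properties like element type from being read as ordered numbers by the downstream model.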

A logistic regression model is used to calculate a score for each chunk from these feature vectors. Then, in a meta/ensemble learning approach, a second type of feature vector is created from these scores. This second feature vector is fed to our random forest, and the random forest's predictions are used to generate the result.
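
The two stages can be sketched as below. Everything specific here is an assumption for illustration: the weights and bias are made up, the second-level feature vector is taken to be each chunk's own score plus its immediate neighbours' scores, and the random forest is replaced by a small hand-fixed ensemble of decision stumps combined by majority vote, since training a real forest is out of scope for a sketch.

```go
package main

import (
	"fmt"
	"math"
)

// logisticScore computes sigmoid(bias + w·x), the logistic regression
// probability that a chunk is relevant.
func logisticScore(w []float64, bias float64, x []float64) float64 {
	z := bias
	for i := range w {
		z += w[i] * x[i]
	}
	return 1 / (1 + math.Exp(-z))
}

// metaFeatures builds the second-level feature vector for chunk i.
// Assumption: the previous, own and next scores (0 at document boundaries).
func metaFeatures(scores []float64, i int) []float64 {
	prev, next := 0.0, 0.0
	if i > 0 {
		prev = scores[i-1]
	}
	if i < len(scores)-1 {
		next = scores[i+1]
	}
	return []float64{prev, scores[i], next}
}

// stump is a one-split decision tree that votes true if x[Feature] > Threshold.
type stump struct {
	Feature   int
	Threshold float64
}

// forestPredict takes a majority vote over the stumps. A real random forest
// would use deeper trees trained on bootstrap samples; these are fixed by hand.
func forestPredict(trees []stump, x []float64) bool {
	votes := 0
	for _, t := range trees {
		if x[t.Feature] > t.Threshold {
			votes++
		}
	}
	return 2*votes > len(trees)
}

// label scores every chunk, then classifies each one from its meta features.
func label(w []float64, bias float64, trees []stump, xs [][]float64) []bool {
	scores := make([]float64, len(xs))
	for i, x := range xs {
		scores[i] = logisticScore(w, bias, x)
	}
	labels := make([]bool, len(xs))
	for i := range xs {
		labels[i] = forestPredict(trees, metaFeatures(scores, i))
	}
	return labels
}

func main() {
	// Hypothetical weights over two features: word count and sentence count.
	w := []float64{0.2, 0.5}
	bias := -3.0
	trees := []stump{
		{Feature: 1, Threshold: 0.5}, // own score
		{Feature: 0, Threshold: 0.4}, // previous chunk's score
		{Feature: 2, Threshold: 0.4}, // next chunk's score
	}
	xs := [][]float64{
		{2, 0},  // short chunk, e.g. a navigation link
		{40, 3}, // long paragraph
		{35, 2}, // long paragraph
		{1, 0},  // footer
	}
	fmt.Println(label(w, bias, trees, xs)) // prints [false true true false]
}
```

Feeding neighbouring scores into the second stage lets a chunk's label depend on its context, so a chunk surrounded by confidently relevant text can be kept even if its own score is borderline.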

By now you might have noticed that I'm exceptionally bad at naming and describing things properly.
