extractor

package
v0.0.0-...-8aef35c

This package is not in the latest version of its module.
Published: Apr 27, 2026 | License: MIT | Imports: 8 | Imported by: 0

Documentation

Constants

const (
	StrategyReadability = "readability"
	StrategySelector    = "selector"
)

Strategy constants for extraction methods.

Variables

This section is empty.

Functions

This section is empty.

Types

type Extractor

type Extractor interface {
	Extract(html io.Reader, pageURL string, logger *slog.Logger) (string, error)
}

Extractor pulls the main content from an HTML document.

func New

func New(strategy, config string) (Extractor, error)

New returns an Extractor for the given strategy and config. An empty strategy defaults to StrategyReadability.
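The selection rule described for New can be sketched as a small switch. This is an illustration under stated assumptions, not the package's source: only the "empty strategy defaults to StrategyReadability" rule comes from the documentation. Treating config as the CSS selector for the selector strategy, and returning an error for an unrecognized strategy, are assumptions. The resolve function and choice type are hypothetical names used so the sketch runs without the package.

```go
package main

import "fmt"

// Strategy constants as documented above.
const (
	StrategyReadability = "readability"
	StrategySelector    = "selector"
)

// choice is a hypothetical stand-in for the Extractor that New would return.
type choice struct {
	Strategy string
	Selector string // only meaningful for StrategySelector
}

// resolve mirrors the documented defaulting rule of New.
// Assumptions: config carries the CSS selector, and unknown strategies fail.
func resolve(strategy, config string) (choice, error) {
	switch strategy {
	case "", StrategyReadability:
		// Documented: an empty strategy defaults to StrategyReadability.
		return choice{Strategy: StrategyReadability}, nil
	case StrategySelector:
		// Assumption: config is used as the selector.
		return choice{Strategy: StrategySelector, Selector: config}, nil
	default:
		// Assumption: an unrecognized strategy is reported as an error.
		return choice{}, fmt.Errorf("unknown strategy %q", strategy)
	}
}

func main() {
	for _, strategy := range []string{"", StrategySelector, "regex"} {
		c, err := resolve(strategy, "article.main")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("strategy=%s selector=%q\n", c.Strategy, c.Selector)
	}
}
```

With the real package, the equivalent call would be New(StrategySelector, "article.main") followed by Extract on the returned value, assuming config is interpreted as shown.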

type ReadabilityExtractor

type ReadabilityExtractor struct{}

ReadabilityExtractor uses the go-readability library to extract article content.

func (*ReadabilityExtractor) Extract

func (e *ReadabilityExtractor) Extract(html io.Reader, pageURL string, logger *slog.Logger) (string, error)

type SelectorExtractor

type SelectorExtractor struct {
	Selector string
}

SelectorExtractor uses a CSS selector to extract a specific element from the page.

func (*SelectorExtractor) Extract

func (e *SelectorExtractor) Extract(html io.Reader, pageURL string, logger *slog.Logger) (string, error)
