trafilatura

package
v0.8.1
Warning: This package is not in the latest version of its module.
Published: Apr 5, 2024 License: GPL-3.0 Imports: 10 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Factory

func Factory(client fetch.Client) func() (fetch.URLFetcher, error)

Factory returns a constructor function that creates a new fetcher, as a fetch.URLFetcher, using the given client.
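The shape of Factory (a function that captures a client and returns a zero-argument constructor) can be illustrated with a small, self-contained sketch. Everything here is a stand-in: `Client`, `URLFetcher`, `stubClient`, `stubFetcher`, `newStubFetcher`, and `factory` are hypothetical names that only mirror the documented signatures, and whether the real Factory delegates to New is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Client is a hypothetical stand-in for fetch.Client.
type Client interface {
	Name() string
}

// URLFetcher is a hypothetical stand-in for fetch.URLFetcher.
type URLFetcher interface {
	Describe() string
}

// stubClient is a trivial Client used only for this illustration.
type stubClient struct{ name string }

func (c stubClient) Name() string { return c.name }

// stubFetcher is a trivial fetcher that remembers its client.
type stubFetcher struct{ client Client }

func (f *stubFetcher) Describe() string { return "fetcher using " + f.client.Name() }

// newStubFetcher mirrors the shape of New: client in, concrete fetcher or error out.
func newStubFetcher(client Client) (*stubFetcher, error) {
	if client == nil {
		return nil, errors.New("nil client")
	}
	return &stubFetcher{client: client}, nil
}

// factory mirrors the shape of Factory: it captures the client and returns
// a constructor that can be called later, possibly more than once.
func factory(client Client) func() (URLFetcher, error) {
	return func() (URLFetcher, error) {
		f, err := newStubFetcher(client)
		if err != nil {
			// Return an untyped nil so callers do not receive a
			// non-nil interface wrapping a nil pointer.
			return nil, err
		}
		return f, nil
	}
}

func main() {
	makeFetcher := factory(stubClient{name: "default-client"})
	f, err := makeFetcher()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(f.Describe())
}
```

Returning a constructor rather than a fetcher lets the caller defer creation, and any creation error, until the fetcher is actually needed.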

Types

type TrafilaturaFetcher

type TrafilaturaFetcher struct {
	// contains filtered or unexported fields
}

func New (added in v0.8.0)

func New(client fetch.Client) (*TrafilaturaFetcher, error)

func (*TrafilaturaFetcher) Close

func (f *TrafilaturaFetcher) Close() error

func (*TrafilaturaFetcher) Fetch

func (f *TrafilaturaFetcher) Fetch(url *nurl.URL) (*resource.WebPage, error)

Fetch retrieves the given URL and returns a WebPage resource. The page is fetched and parsed with the Trafilatura library, and the returned resource contains the page's metadata and content text. The request's StatusCode is set to the HTTP status code returned. If fetching the page fails, Fetch returns an error and, alongside it, a *resource.WebPage containing partial data about the request.
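Because Fetch may return both an error and a partially populated page, callers may want to inspect the page even on failure. The following is a minimal sketch of that calling pattern under stated assumptions: `webPage`, `pageFetcher`, `fakeFetcher`, and `summarize` are hypothetical stand-ins written for this example, the field names on `webPage` are guesses rather than the real resource.WebPage layout, and no network access takes place.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// webPage is a hypothetical stand-in for resource.WebPage;
// the real type's field names may differ.
type webPage struct {
	RequestedURL *url.URL
	StatusCode   int
	Title        string
	ContentText  string
}

// pageFetcher captures only the documented Fetch method shape.
type pageFetcher interface {
	Fetch(u *url.URL) (*webPage, error)
}

// fakeFetcher simulates the documented contract without touching the network.
type fakeFetcher struct{}

func (fakeFetcher) Fetch(u *url.URL) (*webPage, error) {
	page := &webPage{RequestedURL: u}
	if u.Path == "/missing" {
		// On failure, return the error together with the partial page.
		page.StatusCode = 404
		return page, errors.New("unexpected status 404")
	}
	page.StatusCode = 200
	page.Title = "Example"
	page.ContentText = "Extracted body text."
	return page, nil
}

// summarize fetches rawURL and reports the outcome, using the partial
// page (when present) to include the status code in failure messages.
func summarize(f pageFetcher, rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "bad url: " + err.Error()
	}
	page, err := f.Fetch(u)
	if err != nil {
		if page != nil {
			return fmt.Sprintf("failed (status %d): %v", page.StatusCode, err)
		}
		return fmt.Sprintf("failed: %v", err)
	}
	return fmt.Sprintf("ok (status %d): %s", page.StatusCode, page.Title)
}

func main() {
	f := fakeFetcher{}
	fmt.Println(summarize(f, "https://example.com/article"))
	fmt.Println(summarize(f, "https://example.com/missing"))
}
```

The nil check on the page before reading it is a defensive choice in this sketch; the documentation above only states that partial data is returned when fetching fails.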

func (*TrafilaturaFetcher) Open

func (f *TrafilaturaFetcher) Open(ctx context.Context) error
