goquery

package
v0.0.0-...-e7b49d2
Published: May 3, 2024 License: GPL-3.0 Imports: 17 Imported by: 0

Documentation

Index

Constants

View Source
const (
	DefaultMaxDepth  = 1
	DefaultParallels = 2
	DefaultDelay     = 3
	DefaultAsync     = true
)

Variables

View Source
var ErrScrapingFailed = errors.New("scraper could not read URL, or scraping is not allowed for provided URL")

Functions

func ExtractTextFromHTML

func ExtractTextFromHTML(htmlContent string) (string, error)
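ExtractTextFromHTML presumably uses goquery's parsed DOM to pull visible text out of a document. The stdlib-only sketch below is only an illustration of the idea (strip tags, unescape entities, trim whitespace); the function name is lowercased to mark it as hypothetical, and the real implementation may handle scripts, styles, and whitespace differently.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// extractTextFromHTML is a naive, stdlib-only illustration: it drops
// everything between '<' and '>' and unescapes HTML entities. The real
// function presumably walks the parsed document via goquery instead.
func extractTextFromHTML(htmlContent string) (string, error) {
	var b strings.Builder
	inTag := false
	for _, r := range htmlContent {
		switch {
		case r == '<':
			inTag = true
		case r == '>':
			inTag = false
		case !inTag:
			b.WriteRune(r)
		}
	}
	return strings.TrimSpace(html.UnescapeString(b.String())), nil
}

func main() {
	text, _ := extractTextFromHTML("<p>Hello &amp; welcome</p>")
	fmt.Println(text) // Hello & welcome
}
```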

func ExtractURL

func ExtractURL(str string) (string, error)
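Given the signature, ExtractURL plausibly pulls the first URL out of an arbitrary string and returns an error when none is found. A minimal sketch under that assumption (the regular expression and lowercased name are hypothetical, not the package's actual pattern):

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// urlPattern matches the first http(s) URL in a string. This pattern is
// an assumption for illustration; the package may match differently.
var urlPattern = regexp.MustCompile(`https?://[^\s"'<>]+`)

// extractURL returns the first URL found in str, or an error if none.
func extractURL(str string) (string, error) {
	if m := urlPattern.FindString(str); m != "" {
		return m, nil
	}
	return "", errors.New("no URL found in input")
}

func main() {
	u, err := extractURL("scrape https://example.com/docs please")
	fmt.Println(u, err) // https://example.com/docs <nil>
}
```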

func RemoveBlankLines

func RemoveBlankLines(input string) string
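RemoveBlankLines presumably drops empty or whitespace-only lines from its input. A sketch of that behavior, with the caveat that edge cases such as trailing-newline handling may differ in the real implementation (the lowercased name marks this as an illustration):

```go
package main

import (
	"fmt"
	"strings"
)

// removeBlankLines keeps only lines that contain non-whitespace
// characters, joining the survivors back with newlines.
func removeBlankLines(input string) string {
	var kept []string
	for _, line := range strings.Split(input, "\n") {
		if strings.TrimSpace(line) != "" {
			kept = append(kept, line)
		}
	}
	return strings.Join(kept, "\n")
}

func main() {
	fmt.Println(removeBlankLines("first\n\n   \nsecond"))
}
```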

Types

type Options

type Options func(*Scraper)

func WithAsync

func WithAsync(async bool) Options

WithAsync creates an Options function that sets whether the Scraper runs asynchronously.

async: boolean indicating if the scraper should run asynchronously. Returns: an Options function.

func WithBlacklist

func WithBlacklist(blacklist []string) Options

WithBlacklist creates an Options function that appends the given URL endpoints to the current list of endpoints excluded from scraping.

Default value:

[]string{
	"login",
	"signup",
	"signin",
	"register",
	"logout",
	"download",
	"redirect",
},

blacklist: slice of strings with URL endpoints to be excluded from scraping. Returns: an Options function.

func WithDelay

func WithDelay(delay int64) Options

WithDelay creates an Options function that sets the delay of a Scraper.

The delay parameter specifies the amount of time in milliseconds that the Scraper should wait between requests.

Default value: 3

delay: the delay to set. Returns: an Options function.

func WithMaxDepth

func WithMaxDepth(maxDepth int) Options

WithMaxDepth sets the maximum depth for the Scraper.

Default value: 1

maxDepth: the maximum depth to set. Returns: an Options function.

func WithNewBlacklist

func WithNewBlacklist(blacklist []string) Options

WithNewBlacklist creates an Options function that replaces the current list of URL endpoints excluded from scraping with a new list.

Default value:

[]string{
	"login",
	"signup",
	"signin",
	"register",
	"logout",
	"download",
	"redirect",
},

blacklist: slice of strings with URL endpoints to be excluded from scraping. Returns: an Options function.
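The difference between WithBlacklist (append to the current list) and WithNewBlacklist (replace the list) can be shown with a minimal, self-contained sketch. The lowercased type and function names below are stand-ins for illustration, not the package's actual implementation:

```go
package main

import "fmt"

// scraper mirrors only the Blacklist field for this illustration.
type scraper struct{ Blacklist []string }

type options func(*scraper)

// withBlacklist appends to the existing list, like WithBlacklist.
func withBlacklist(extra []string) options {
	return func(s *scraper) { s.Blacklist = append(s.Blacklist, extra...) }
}

// withNewBlacklist replaces the list entirely, like WithNewBlacklist.
func withNewBlacklist(list []string) options {
	return func(s *scraper) { s.Blacklist = list }
}

func main() {
	defaults := []string{"login", "signup", "signin", "register", "logout", "download", "redirect"}

	a := &scraper{Blacklist: append([]string(nil), defaults...)}
	withBlacklist([]string{"admin"})(a)
	fmt.Println(len(a.Blacklist)) // 8: the seven defaults plus "admin"

	b := &scraper{Blacklist: append([]string(nil), defaults...)}
	withNewBlacklist([]string{"admin"})(b)
	fmt.Println(len(b.Blacklist)) // 1: only "admin"
}
```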

func WithParallelsNum

func WithParallelsNum(parallels int) Options

WithParallelsNum sets the maximum number of concurrent requests allowed for the matching domains.

Default value: 2

parallels: the number of parallels to set. Returns: an Options function.

type Scraper

type Scraper struct {
	MaxDepth  int
	Parallels int
	Delay     int64
	Blacklist []string
	Async     bool
	// contains filtered or unexported fields
}

func New

func New(options ...Options) (*Scraper, error)
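New takes variadic Options, the functional-options pattern implied by `type Options func(*Scraper)`. A self-contained sketch of how it presumably works, assuming New starts from the package defaults and applies each option in order (the lowercased names are illustrative stand-ins, and the unexported fields are omitted):

```go
package main

import "fmt"

// Defaults mirroring the package constants.
const (
	defaultMaxDepth  = 1
	defaultParallels = 2
	defaultDelay     = 3
	defaultAsync     = true
)

// scraper mirrors the exported fields of Scraper.
type scraper struct {
	MaxDepth  int
	Parallels int
	Delay     int64
	Async     bool
}

type options func(*scraper)

func withMaxDepth(d int) options { return func(s *scraper) { s.MaxDepth = d } }
func withDelay(ms int64) options { return func(s *scraper) { s.Delay = ms } }

// newScraper fills in the defaults, then applies each option in order,
// mirroring how New(options ...Options) presumably behaves.
func newScraper(opts ...options) (*scraper, error) {
	s := &scraper{
		MaxDepth:  defaultMaxDepth,
		Parallels: defaultParallels,
		Delay:     defaultDelay,
		Async:     defaultAsync,
	}
	for _, opt := range opts {
		opt(s)
	}
	return s, nil
}

func main() {
	s, _ := newScraper(withMaxDepth(3), withDelay(500))
	fmt.Println(s.MaxDepth, s.Delay) // 3 500
}
```

Options passed later override earlier ones, so callers can layer a base configuration with per-call overrides.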

func (Scraper) Call

func (s Scraper) Call(ctx context.Context, input string) (string, error)

func (Scraper) Description

func (s Scraper) Description() string

func (Scraper) Name

func (s Scraper) Name() string
