package fetcher
Version: v0.0.0-...-d30cde0
Warning

This package is not in the latest version of its module.

Published: Nov 23, 2023 | License: MIT | Imports: 14 | Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type FetchedCallback

type FetchedCallback func(result *types.FetchResult)

FetchedCallback is the function type for callbacks invoked when a fetch task completes. (It is a defined type, not a Go type alias, since its declaration has no `=`.)

type Fetcher

type Fetcher struct {
	*FetcherConfig
	// contains filtered or unexported fields
}

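Fetcher manages scraper instances for fetching web pages. It embeds *FetcherConfig, so the configuration fields Async and Mirror are promoted and can be accessed directly on a Fetcher (for example, f.Async).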

func NewFetcher

func NewFetcher(options ...FetcherOption) *Fetcher

NewFetcher creates a Fetcher configured by the given functional options (see FetcherOption).

func (*Fetcher) Fetch

func (f *Fetcher) Fetch(url string) error

Fetch starts scraping by sending an HTTP request to the specified URL. The result is delivered to any callbacks registered with OnFetched.

func (*Fetcher) OnFetched

func (f *Fetcher) OnFetched(cb FetchedCallback)

OnFetched registers a callback to be invoked after each task finishes. Note: OnFetched is not safe for concurrent use.
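A minimal sketch of the callback pattern described above, assuming callbacks are stored in a slice and invoked in registration order. The struct FetchResult (with hypothetical URL and StatusCode fields) stands in for types.FetchResult, whose fields are not documented here, and callbackRegistry and notify are hypothetical names. This is not the package's implementation. As with the documented OnFetched, the registry takes no lock, so all callbacks are registered before any result is delivered.

```go
package main

import "fmt"

// FetchResult is a hypothetical stand-in for types.FetchResult.
type FetchResult struct {
	URL        string
	StatusCode int
}

// FetchedCallback mirrors the documented function type.
type FetchedCallback func(result *FetchResult)

// callbackRegistry is a hypothetical sketch of callback storage.
// It does no locking, so it is not safe for concurrent use.
type callbackRegistry struct {
	callbacks []FetchedCallback
}

// OnFetched appends cb to the list of callbacks.
func (r *callbackRegistry) OnFetched(cb FetchedCallback) {
	r.callbacks = append(r.callbacks, cb)
}

// notify invokes every registered callback in registration order.
func (r *callbackRegistry) notify(result *FetchResult) {
	for _, cb := range r.callbacks {
		cb(result)
	}
}

func main() {
	var r callbackRegistry
	var seen []string
	// Register all callbacks up front, before any results arrive.
	r.OnFetched(func(res *FetchResult) {
		seen = append(seen, res.URL)
	})
	r.OnFetched(func(res *FetchResult) {
		fmt.Println("fetched", res.URL, res.StatusCode)
	})
	r.notify(&FetchResult{URL: "https://example.com", StatusCode: 200})
	fmt.Println(len(seen))
}
```

Any func literal with the signature `func(*FetchResult)` is assignable to FetchedCallback, so callers never need an explicit conversion.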

func (*Fetcher) Wait

func (f *Fetcher) Wait()

Wait blocks until all scraping jobs are done.

type FetcherConfig

type FetcherConfig struct {
	// Async turns on asynchronous HTTP requesting.
	Async bool
	// Mirror downloads asset resources (such as images, CSS, and JavaScript)
	// within the HTML page to a local folder.
	Mirror bool
}

FetcherConfig holds the options that control Fetcher behavior.

type FetcherOption

type FetcherOption func(*Fetcher)

FetcherOption is a functional option that configures a Fetcher. Options are passed to NewFetcher.

func Async

func Async(a ...bool) FetcherOption

Async returns an option that turns on asynchronous HTTP requesting.

func Mirror

func Mirror(a ...bool) FetcherOption

Mirror returns an option that turns on mirror downloading of a page's asset resources.
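The options above follow Go's functional-options pattern. The sketch below rebuilds that pattern with local stand-in types. It assumes that calling Async() or Mirror() with no argument enables the setting and that otherwise the first argument is used. The documentation does not specify how the variadic bool is interpreted, and the helper optBool is hypothetical. The sketch also shows why NewFetcher has to allocate the embedded *FetcherConfig before applying options: writing to the promoted field f.Async through a nil embedded pointer would panic.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented types.
type FetcherConfig struct {
	Async  bool
	Mirror bool
}

type Fetcher struct {
	*FetcherConfig
}

type FetcherOption func(*Fetcher)

// optBool is a hypothetical helper: no arguments means true,
// otherwise the first argument is used.
func optBool(a []bool) bool {
	if len(a) == 0 {
		return true
	}
	return a[0]
}

// Async returns an option that sets the promoted Async field.
func Async(a ...bool) FetcherOption {
	v := optBool(a)
	return func(f *Fetcher) { f.Async = v }
}

// Mirror returns an option that sets the promoted Mirror field.
func Mirror(a ...bool) FetcherOption {
	v := optBool(a)
	return func(f *Fetcher) { f.Mirror = v }
}

// NewFetcher allocates the embedded config, then applies each option in order.
func NewFetcher(options ...FetcherOption) *Fetcher {
	f := &Fetcher{FetcherConfig: &FetcherConfig{}} // avoid a nil embedded pointer
	for _, opt := range options {
		opt(f)
	}
	return f
}

func main() {
	f := NewFetcher(Async(), Mirror(false))
	fmt.Println(f.Async, f.Mirror) // true false
}
```

Because options are applied in order, a later option overrides an earlier one. For example, NewFetcher(Async(), Async(false)) leaves Async off in this sketch.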

type ThrottleClient

type ThrottleClient struct {
	// Parallelism is the maximum number of concurrent requests allowed.
	// The default, 0, means unlimited concurrency.
	Parallelism int
	// contains filtered or unexported fields
}

ThrottleClient is a throttled HTTP client that limits the number of concurrent requests to avoid overloading resources and triggering rate limits.

func NewThrottleClient

func NewThrottleClient(parallelism int) *ThrottleClient

func (*ThrottleClient) Do
