Documentation
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type FetchedCallback
type FetchedCallback func(result *types.FetchResult)
FetchedCallback is the function type of the callback executed when a fetch task completes. It is a defined type, not a type alias.
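Any function or closure with the matching signature can serve as a FetchedCallback. The sketch below uses a hypothetical stand-in for types.FetchResult (its URL and StatusCode fields are assumptions, not the package's actual definition) and builds a callback that counts successful fetches.

```go
package main

import "fmt"

// FetchResult is a hypothetical stand-in for types.FetchResult;
// the real type's fields are not documented here.
type FetchResult struct {
	URL        string
	StatusCode int
}

// FetchedCallback mirrors the documented declaration: a defined
// function type over *FetchResult.
type FetchedCallback func(result *FetchResult)

// countOK returns a callback that increments *n for every 2xx result.
func countOK(n *int) FetchedCallback {
	return func(r *FetchResult) {
		if r.StatusCode >= 200 && r.StatusCode < 300 {
			*n++
		}
	}
}

func main() {
	ok := 0
	cb := countOK(&ok)
	cb(&FetchResult{URL: "https://example.com/", StatusCode: 200})
	cb(&FetchResult{URL: "https://example.com/missing", StatusCode: 404})
	fmt.Println("successful fetches:", ok) // successful fetches: 1
}
```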
type Fetcher
type Fetcher struct {
	*FetcherConfig
	// contains filtered or unexported fields
}
Fetcher provides scraper instances for fetching web pages.
func NewFetcher
func NewFetcher(options ...FetcherOption) *Fetcher
NewFetcher creates a Fetcher configured by the given options.
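FetcherOption itself is not documented on this page. Assuming it follows Go's common functional-options pattern, NewFetcher could be structured like the sketch below. The FetcherOption definition and the WithAsync and WithMirror helpers are hypothetical; only the Async and Mirror fields and the embedded *FetcherConfig come from this documentation.

```go
package main

import "fmt"

// FetcherConfig mirrors the two documented fields.
type FetcherConfig struct {
	Async  bool
	Mirror bool
}

// Fetcher embeds *FetcherConfig as documented; other fields are omitted.
type Fetcher struct {
	*FetcherConfig
}

// FetcherOption is assumed to be a function that mutates the config.
type FetcherOption func(*FetcherConfig)

// WithAsync and WithMirror are hypothetical option constructors.
func WithAsync() FetcherOption { return func(c *FetcherConfig) { c.Async = true } }

func WithMirror() FetcherOption { return func(c *FetcherConfig) { c.Mirror = true } }

// NewFetcher starts from a zero-value config and applies each option in order.
func NewFetcher(options ...FetcherOption) *Fetcher {
	cfg := &FetcherConfig{}
	for _, opt := range options {
		opt(cfg)
	}
	return &Fetcher{FetcherConfig: cfg}
}

func main() {
	f := NewFetcher(WithAsync())
	// Embedded fields are promoted, so f.Async reads f.FetcherConfig.Async.
	fmt.Println(f.Async, f.Mirror) // true false
}
```

The appeal of this pattern is that new options can be added later without changing NewFetcher's signature.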
func (*Fetcher) Fetch
Fetch starts scraping by sending an HTTP request to the specified URL. The fetch result is passed to any callbacks registered with OnFetched.
func (*Fetcher) OnFetched
func (f *Fetcher) OnFetched(cb FetchedCallback)
OnFetched registers a callback to be invoked after each task finishes. Note: OnFetched is not safe for concurrent use.
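The simplest way to respect this restriction is to register every callback before the first fetch starts. If callbacks must be registered from several goroutines, they need a lock around them. The callbackRegistry type below is a hypothetical wrapper illustrating that, not part of this package.

```go
package main

import (
	"fmt"
	"sync"
)

type FetchResult struct{ URL string }

type FetchedCallback func(result *FetchResult)

// callbackRegistry is a hypothetical concurrency-safe callback list.
type callbackRegistry struct {
	mu  sync.RWMutex
	cbs []FetchedCallback
}

// Add may be called from any goroutine.
func (r *callbackRegistry) Add(cb FetchedCallback) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.cbs = append(r.cbs, cb)
}

// Notify snapshots the slice under a read lock, then runs the callbacks
// without holding the lock, so a callback may safely call Add.
func (r *callbackRegistry) Notify(res *FetchResult) {
	r.mu.RLock()
	cbs := r.cbs
	r.mu.RUnlock()
	for _, cb := range cbs {
		cb(res)
	}
}

func main() {
	var reg callbackRegistry
	var wg sync.WaitGroup
	calls := 0
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Registration happens concurrently; the callbacks run later on main.
			reg.Add(func(*FetchResult) { calls++ })
		}()
	}
	wg.Wait()
	reg.Notify(&FetchResult{URL: "https://example.com/"})
	fmt.Println("callbacks invoked:", calls) // callbacks invoked: 10
}
```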
type FetcherConfig
type FetcherConfig struct {
	// Async enables asynchronous HTTP requests.
	Async bool

	// Mirror downloads the asset resources referenced by the HTML page
	// (such as images, CSS, and JavaScript) to a local folder.
	Mirror bool
}
FetcherConfig controls Fetcher behavior.
type ThrottleClient
type ThrottleClient struct {
	// Parallelism is the maximum number of concurrent requests allowed.
	// The default, 0, means concurrency is unlimited.
	Parallelism int
	// contains filtered or unexported fields
}
ThrottleClient is an HTTP client that caps the number of concurrent requests to avoid overloading resources and triggering rate limits.
func NewThrottleClient
func NewThrottleClient(parallelism int) *ThrottleClient
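This page does not say how ThrottleClient enforces its limit. One idiomatic Go technique is a buffered channel used as a counting semaphore, and the sketch below assumes that approach. Its client and sem fields, the Do method, and the peakConcurrency helper are hypothetical. A Parallelism of 0 leaves the semaphore nil, which matches the documented unlimited default.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"time"
)

// ThrottleClient is a hypothetical re-implementation for illustration.
type ThrottleClient struct {
	Parallelism int
	client      *http.Client
	sem         chan struct{} // nil means unlimited (Parallelism <= 0)
}

// NewThrottleClient sizes a buffered channel to act as a counting semaphore.
func NewThrottleClient(parallelism int) *ThrottleClient {
	tc := &ThrottleClient{
		Parallelism: parallelism,
		client:      &http.Client{Timeout: 10 * time.Second},
	}
	if parallelism > 0 {
		tc.sem = make(chan struct{}, parallelism)
	}
	return tc
}

// Do blocks until a slot is free, sends req, and frees the slot on return.
func (tc *ThrottleClient) Do(req *http.Request) (*http.Response, error) {
	if tc.sem != nil {
		// Acquire a slot; the deferred receive releases it.
		tc.sem <- struct{}{}
		defer func() { <-tc.sem }()
	}
	return tc.client.Do(req)
}

// peakConcurrency sends n requests through a client limited to parallelism
// and reports the most requests a local test server saw in flight at once.
func peakConcurrency(parallelism, n int) int32 {
	var inFlight, peak int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		cur := atomic.AddInt32(&inFlight, 1)
		for { // record the maximum with a compare-and-swap loop
			p := atomic.LoadInt32(&peak)
			if cur <= p || atomic.CompareAndSwapInt32(&peak, p, cur) {
				break
			}
		}
		time.Sleep(20 * time.Millisecond)
		atomic.AddInt32(&inFlight, -1)
	}))
	defer srv.Close()

	tc := NewThrottleClient(parallelism)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
			if err != nil {
				return
			}
			if resp, err := tc.Do(req); err == nil {
				resp.Body.Close()
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println("peak in-flight requests:", peakConcurrency(2, 8))
}
```

Because this sketch releases the slot as soon as Do returns, response bodies are read outside the limit. A stricter design would hold the slot until the body is closed.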