fetcher

package
v0.0.0-...-9a3108e
Warning

This package is not in the latest version of its module.

Published: Dec 24, 2020 License: MIT Imports: 6 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var DefaultOptions = &Options{
	client:          http.DefaultClient,
	limitDuration:   5 * time.Second,
	timeoutDuration: 1 * time.Minute,
	burst:           1,
}

`DefaultOptions` holds the default options used with a `Fetcher` instance.

Functions

This section is empty.

Types

type Fetchable

type Fetchable interface {
	// Unique identifier for this fetchable item. This is useful in logging.
	Id() string

	// Build a request.
	Request() (*http.Request, error)

	// Validate the request before doing the actual fetch. This is useful,
	// for example, to check whether the store has already fetched the data
	// recently.
	Validate() error

	// Callback to handle the HTTP response corresponding to the request.
	// This can be used, for example, to store data into the store or to
	// parse the results in some way.
	HandleResponse(*http.Response) error
}

Fetchable is the interface that defines what can be fetched. The request to send is returned by the `Request()` method. Before the actual fetch is performed, the `Validate()` method is called, and fetching proceeds only if it returns a `nil` error. Finally, `HandleResponse()` is the callback invoked with the HTTP response once the fetch succeeds.

type Fetcher

type Fetcher struct {
	// contains filtered or unexported fields
}

Fetcher is the struct used to download `Fetchable` items.

func NewFetcher

func NewFetcher() *Fetcher

Returns a new `Fetcher` instance.

func NewFetcherWithOptions

func NewFetcherWithOptions(options *Options) *Fetcher

Returns a `Fetcher` with the specified options. Any field of `options` that equals its zero value is replaced by the corresponding value from `DefaultOptions`, so a caller only needs to specify the options that differ from the defaults.

func (*Fetcher) Fetch

func (f *Fetcher) Fetch(furl Fetchable) error

Performs the actual fetch of a given `Fetchable`. The steps it follows are:

  1. Build the request by calling `Request()`
  2. Validate the request by calling `Validate()`
  3. Wait until the rate limit allows the domain to be crawled, or until `options.timeoutDuration` is exceeded
  4. Make the HTTP request with the supplied client and pass the response to `HandleResponse()`

func (*Fetcher) FetchConcurrentlyWait

func (f *Fetcher) FetchConcurrentlyWait(urlChannel <-chan Fetchable, concurrency int)

Starts `concurrency` goroutines to fetch content from `urlChannel` in parallel. The goroutines end when the `urlChannel` is closed. This method waits until all the launched goroutines are complete.

Note: Be sure to call `close()` on `urlChannel`; otherwise this method never returns.

type Options

type Options struct {
	// contains filtered or unexported fields
}
