Documentation
Overview
Package getgo is a concurrent web scraping framework.
Index
- Variables
- func Run(runner Runner, tx Tx, tasks ...interface{}) error
- type Atomized
- type ConcurrentRunner
- type Doer
- type ErrorHandler
- type ErrorHandlerFunc
- type HTMLTask
- type HTTPLogger
- type Requester
- type RetryDoer
- type Runner
- type SequentialRunner
- type Storable
- type StorableTask
- type Storer
- type Task
- type TaskGroup
- type Text
- type TextTask
- type Tx
Constants
This section is empty.
Variables
var RetryNum = 3
RetryNum is the number of times a page fetch is retried after it fails.
Functions
func Run
func Run(runner Runner, tx Tx, tasks ...interface{}) error
Types
type Atomized
type Atomized struct {
StorableTask
Tx
}
Atomized is an adapter that converts a StorableTask into an atomized Task that supports transactions.
type ConcurrentRunner
type ConcurrentRunner struct {
// contains filtered or unexported fields
}
ConcurrentRunner runs tasks concurrently.
func NewConcurrentRunner
func NewConcurrentRunner(workerNum int, client Doer, errHandler ErrorHandler) ConcurrentRunner
NewConcurrentRunner creates a concurrent runner.
func (ConcurrentRunner) Close
func (r ConcurrentRunner) Close()
Close implements the Close method of the Runner interface.
func (ConcurrentRunner) Run
func (r ConcurrentRunner) Run(task Task) error
Run implements the Run method of the Runner interface.
type ErrorHandler
ErrorHandler is called back when a task fails, so that an external handler can deal with the error.
type ErrorHandlerFunc
ErrorHandlerFunc converts an ordinary function into an ErrorHandler.
func (ErrorHandlerFunc) HandleError
func (f ErrorHandlerFunc) HandleError(request *http.Request, err error) error
HandleError implements the ErrorHandler interface.
type HTMLTask
HTMLTask is an HTML task that should be able to Parse an HTML node tree into a slice of objects.
type HTTPLogger
type HTTPLogger struct {
// contains filtered or unexported fields
}
HTTPLogger wraps an HTTP client and logs the request and network speed.
func NewHTTPLogger
func NewHTTPLogger(client *http.Client) *HTTPLogger
NewHTTPLogger creates an HTTPLogger that wraps client and measures its traffic by inspecting the Read method of the client's connections.
type Requester
Requester is the interface that wraps the Request method, which returns an HTTP request. Request may be called more than once for the same task, so every call must return a usable request.
type Runner
type Runner interface {
Run(task Task) error // Run runs a task
Close() // Close closes the runner
}
Runner runs Tasks. A Runner gets an HTTP request from a Task, fetches the HTTP response, and passes the response to the Task's Handle method. If a Runner fails to get a response, it must still call Handle with a nil response, so that the Task can roll back its transaction, if it has one.
type SequentialRunner
type SequentialRunner struct {
Client Doer
ErrorHandler
}
SequentialRunner is a simple single-threaded task runner.
func (SequentialRunner) Close
func (r SequentialRunner) Close()
Close implements the Close method of the Runner interface.
func (SequentialRunner) Run
func (r SequentialRunner) Run(task Task) error
Run implements the Run method of the Runner interface.
type Storable
type Storable struct {
TextTask
}
Storable is an adapter that converts a TextTask to a StorableTask.
type StorableTask
StorableTask is a task that should be able to store data with a Storer passed to the Handle method.
type Storer
type Storer interface {
Store(v interface{}) error
}
Storer provides the Store method to store an object parsed from an HTTP response.
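Any type with a matching Store method satisfies Storer. The following minimal in-memory sketch uses a hypothetical type, `memStorer`, that appends each stored object to a slice. A mutex guards the slice because a concurrent runner may call Store from several goroutines. Rejecting nil values is a choice made only for this example.

```go
package main

import (
	"errors"
	"sync"
)

// Storer is copied from the interface documented above.
type Storer interface {
	Store(v interface{}) error
}

// memStorer keeps stored objects in memory. Hypothetical example type.
type memStorer struct {
	mu    sync.Mutex
	items []interface{}
}

// Store appends v; it rejects nil so that empty parse results surface early.
func (s *memStorer) Store(v interface{}) error {
	if v == nil {
		return errors.New("memStorer: nil value")
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items = append(s.items, v)
	return nil
}

// Items returns a copy of everything stored so far.
func (s *memStorer) Items() []interface{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]interface{}(nil), s.items...)
}

var _ Storer = (*memStorer)(nil)
```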
type Task
Task is an HTTP crawler task. It must provide an HTTP request and a method to handle an HTTP response.
type TaskGroup
type TaskGroup struct {
Tx
// contains filtered or unexported fields
}
TaskGroup groups multiple StorableTasks into a single transaction.
func NewTaskGroup
NewTaskGroup creates a TaskGroup from a transaction object.
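This page does not document the Tx type. The following sketch therefore uses a made-up minimal transaction, `bufferedTx`, with hypothetical `Store`, `Commit`, and `Rollback` methods to show what grouping tasks into a single transaction means. The results of every task in the group are buffered and become visible together. If any task fails, all of the group's results are discarded. `runGroup` is likewise hypothetical and is not getgo's implementation.

```go
package main

import (
	"errors"
	"sync"
)

// bufferedTx is a hypothetical transaction: Store buffers values, Commit
// publishes them all at once, and Rollback discards them.
type bufferedTx struct {
	mu        sync.Mutex
	pending   []interface{}
	committed []interface{}
}

// Store buffers v until Commit; nil is rejected to simulate a failing task.
func (tx *bufferedTx) Store(v interface{}) error {
	if v == nil {
		return errors.New("bufferedTx: nil value")
	}
	tx.mu.Lock()
	defer tx.mu.Unlock()
	tx.pending = append(tx.pending, v)
	return nil
}

// Commit moves every pending value into committed.
func (tx *bufferedTx) Commit() error {
	tx.mu.Lock()
	defer tx.mu.Unlock()
	tx.committed = append(tx.committed, tx.pending...)
	tx.pending = nil
	return nil
}

// Rollback drops every pending value.
func (tx *bufferedTx) Rollback() error {
	tx.mu.Lock()
	defer tx.mu.Unlock()
	tx.pending = nil
	return nil
}

// runGroup runs every task against tx and commits only if all succeed.
func runGroup(tx *bufferedTx, tasks ...func(*bufferedTx) error) error {
	for _, t := range tasks {
		if err := t(tx); err != nil {
			tx.Rollback()
			return err
		}
	}
	return tx.Commit()
}
```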
type Text
type Text struct {
HTMLTask
}
Text is an adapter that converts an HTMLTask to a TextTask.