Documentation ¶
Overview ¶
Package request provides encapsulation and deduplication of crawl requests.
Index ¶
- Constants
- func UnSerialize(s string) result.Result[*Request]
- type Request
- func (r *Request) AddHeader(key, value string) *Request
- func (r *Request) Copy() result.Result[*Request]
- func (r *Request) GetConnTimeout() time.Duration
- func (r *Request) GetCookies() string
- func (r *Request) GetDialTimeout() time.Duration
- func (r *Request) GetDownloaderID() int
- func (r *Request) GetEnableCookie() bool
- func (r *Request) GetHeader() http.Header
- func (r *Request) GetMethod() string
- func (r *Request) GetPostData() string
- func (r *Request) GetPriority() int
- func (r *Request) GetProxy() string
- func (r *Request) GetRedirectTimes() int
- func (r *Request) GetReferer() string
- func (r *Request) GetRetryPause() time.Duration
- func (r *Request) GetRuleName() string
- func (r *Request) GetSpiderName() string
- func (r *Request) GetTemp(key string, defaultValue interface{}) interface{}
- func (r *Request) GetTempOpt(key string) option.Option[interface{}]
- func (r *Request) GetTemps() Temp
- func (r *Request) GetTryTimes() int
- func (r *Request) GetURL() string
- func (r *Request) IsReloadable() bool
- func (r *Request) MarshalJSON() ([]byte, error)
- func (r *Request) Prepare() result.VoidResult
- func (r *Request) Serialize() result.Result[string]
- func (r *Request) SetCookies(cookie string) *Request
- func (r *Request) SetDownloaderID(id int) *Request
- func (r *Request) SetEnableCookie(enableCookie bool) *Request
- func (r *Request) SetHeader(key, value string) *Request
- func (r *Request) SetMethod(method string) *Request
- func (r *Request) SetPriority(priority int) *Request
- func (r *Request) SetProxy(proxy string) *Request
- func (r *Request) SetReferer(referer string) *Request
- func (r *Request) SetReloadable(can bool) *Request
- func (r *Request) SetRuleName(ruleName string) *Request
- func (r *Request) SetSpiderName(spiderName string) *Request
- func (r *Request) SetTemp(key string, value interface{}) *Request
- func (r *Request) SetTemps(temp map[string]interface{}) *Request
- func (r *Request) SetURL(url string) *Request
- func (r *Request) Unique() string
- type Temp
Constants ¶
const (
	DefaultDialTimeout = 2 * time.Minute // default server request timeout
	DefaultConnTimeout = 2 * time.Minute // default download timeout
	DefaultTryTimes = 3 // default max download attempts
	DefaultRetryPause = 2 * time.Second // default pause before retry
)
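How DefaultTryTimes and DefaultRetryPause might interact can be illustrated with a small retry loop. This is only a sketch: the `retry` helper, the `errTemporary` value, and the simulated download are hypothetical and not part of this package, and the package's own retry logic is not shown on this page.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Values copied from the constants documented above.
const (
	DefaultTryTimes   = 3
	DefaultRetryPause = 2 * time.Second
)

// errTemporary is a hypothetical error used by the simulated download.
var errTemporary = errors.New("temporary failure")

// retry is a hypothetical helper, not part of the package. It calls fetch up
// to tryTimes times (at least once), sleeps for pause between failed
// attempts, and returns the number of attempts made plus the last error.
func retry(tryTimes int, pause time.Duration, fetch func() error) (int, error) {
	if tryTimes < 1 {
		tryTimes = 1
	}
	var err error
	for attempt := 1; attempt <= tryTimes; attempt++ {
		if err = fetch(); err == nil {
			return attempt, nil
		}
		if attempt < tryTimes {
			time.Sleep(pause)
		}
	}
	return tryTimes, err
}

func main() {
	calls := 0
	// Simulated download that fails twice and then succeeds.
	fetch := func() error {
		calls++
		if calls < 3 {
			return errTemporary
		}
		return nil
	}
	// A short pause keeps the demo fast; DefaultRetryPause would wait 2s.
	attempts, err := retry(DefaultTryTimes, 10*time.Millisecond, fetch)
	fmt.Println(attempts, err) // prints "3 <nil>"
}
```

The pause is skipped after the final attempt so that a request that has exhausted its attempts fails without an extra delay.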
const (
	SurfID = 0 // Surf downloader (native Go), do not change
	PhantomID = 1 // PhantomJS downloader (fallback, rarely used)
	ChromeID = 2 // Chromium headless browser downloader
)
Variables ¶
This section is empty.
Functions ¶
Types ¶
type Request ¶
type Request struct {
Spider string // spider name, auto-set, do not set manually
URL string // target URL, required
Rule string // rule node name for parsing response, required
Method string // GET POST POST-M HEAD
Header http.Header // request headers
EnableCookie bool // whether to use cookies, set in Spider.EnableCookie
PostData string // POST values
DialTimeout time.Duration // dial timeout (dial tcp: i/o timeout)
ConnTimeout time.Duration // connection timeout (WSARecv tcp: i/o timeout)
TryTimes int // max download retry attempts
RetryPause time.Duration // wait time before retry after download failure
RedirectTimes int // max redirects; 0=unlimited, <0=no redirects
Temp Temp // temporary data
TempIsJSON map[string]bool // marks Temp fields stored as JSON; auto-set, do not set manually
Priority int // scheduling priority, default 0 (min priority)
Reloadable bool // whether the link can be re-downloaded
// DownloaderID: 0=Surf (high concurrency, full features), 1=PhantomJS (strong anti-block, slow, low concurrency), 2=Chromium (headless browser)
DownloaderID int
// contains filtered or unexported fields
}
Request represents an object waiting to be crawled.
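The RedirectTimes convention (0 = unlimited, negative = no redirects, positive = maximum count) can be mapped onto the CheckRedirect hook of the standard net/http client. The sketch below shows one way to do that; `redirectPolicy` and `allowed` are hypothetical helpers, and whether the package's downloaders implement the field this way is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

// redirectPolicy is a hypothetical helper, not part of the package. It turns
// a RedirectTimes-style value into an http.Client.CheckRedirect function:
//
//	 0 -> follow redirects without a limit
//	<0 -> do not follow redirects; the 3xx response is returned as-is
//	>0 -> follow at most that many redirects
func redirectPolicy(redirectTimes int) func(req *http.Request, via []*http.Request) error {
	return func(req *http.Request, via []*http.Request) error {
		switch {
		case redirectTimes == 0:
			return nil
		case redirectTimes < 0:
			return http.ErrUseLastResponse
		case len(via) > redirectTimes:
			// via holds the requests made so far, so len(via) is the
			// ordinal of the redirect that is about to be followed.
			return fmt.Errorf("stopped after %d redirects", redirectTimes)
		default:
			return nil
		}
	}
}

// allowed reports whether the policy would follow the next redirect after
// priorRequests requests have already been made.
func allowed(redirectTimes, priorRequests int) bool {
	via := make([]*http.Request, priorRequests)
	return redirectPolicy(redirectTimes)(nil, via) == nil
}

func main() {
	// Attach the policy to a client; no request is sent in this demo.
	client := &http.Client{CheckRedirect: redirectPolicy(3)}
	_ = client

	fmt.Println(allowed(0, 50)) // unlimited
	fmt.Println(allowed(-1, 1)) // redirects disabled
	fmt.Println(allowed(3, 3))  // third redirect is within the limit
	fmt.Println(allowed(3, 4))  // fourth redirect exceeds the limit
}
```

Returning http.ErrUseLastResponse makes the client hand back the redirect response itself instead of an error, which suits a crawler that wants to inspect the Location header.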
func (*Request) GetConnTimeout ¶
func (*Request) GetCookies ¶
func (*Request) GetDialTimeout ¶
func (*Request) GetDownloaderID ¶
func (*Request) GetEnableCookie ¶
func (*Request) GetPostData ¶
func (*Request) GetPriority ¶
func (*Request) GetRedirectTimes ¶
func (*Request) GetReferer ¶
func (*Request) GetRetryPause ¶
func (*Request) GetRuleName ¶
func (*Request) GetSpiderName ¶
func (*Request) GetTempOpt ¶ added in v1.4.0
GetTempOpt returns temporary cached data as Option. None when key is missing.
func (*Request) GetTryTimes ¶
func (*Request) IsReloadable ¶
func (*Request) MarshalJSON ¶
func (*Request) Prepare ¶
func (r *Request) Prepare() result.VoidResult
Prepare fills in default values before a request is sent. Request.URL and Request.Rule must be set by the caller. Request.Spider is set automatically by the system. Request.EnableCookie is configured on the Spider; per-request values are ignored. The following optional fields receive defaults: Method (GET), DialTimeout, ConnTimeout, TryTimes, RedirectTimes, RetryPause, and DownloaderID (0=Surf, 1=PhantomJS, 2=Chromium).
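The contract described for Prepare (validate the required fields, then fill in defaults) can be sketched on a trimmed stand-in type. The `request` struct and `prepare` function are hypothetical; treating a zero value as "unset" and returning a plain error instead of result.VoidResult are assumptions, and RedirectTimes is left out because its default is not stated on this page.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Values copied from the documented constants.
const (
	DefaultDialTimeout = 2 * time.Minute
	DefaultConnTimeout = 2 * time.Minute
	DefaultTryTimes    = 3
	DefaultRetryPause  = 2 * time.Second
)

// request is a trimmed stand-in for Request with only the fields used here.
type request struct {
	URL          string
	Rule         string
	Method       string
	DialTimeout  time.Duration
	ConnTimeout  time.Duration
	TryTimes     int
	RetryPause   time.Duration
	DownloaderID int // zero value already selects Surf (0)
}

// prepare is a hypothetical re-creation of the documented behavior: it
// rejects requests without URL or Rule and replaces zero-valued optional
// fields with their defaults.
func prepare(r *request) error {
	if r.URL == "" {
		return errors.New("request: URL is required")
	}
	if r.Rule == "" {
		return errors.New("request: Rule is required")
	}
	if r.Method == "" {
		r.Method = "GET"
	}
	if r.DialTimeout == 0 {
		r.DialTimeout = DefaultDialTimeout
	}
	if r.ConnTimeout == 0 {
		r.ConnTimeout = DefaultConnTimeout
	}
	if r.TryTimes == 0 {
		r.TryTimes = DefaultTryTimes
	}
	if r.RetryPause == 0 {
		r.RetryPause = DefaultRetryPause
	}
	return nil
}

func main() {
	r := &request{URL: "https://example.com/list", Rule: "list"}
	if err := prepare(r); err != nil {
		fmt.Println("prepare failed:", err)
		return
	}
	fmt.Println(r.Method, r.TryTimes, r.DialTimeout, r.RetryPause)

	// A request without a URL is rejected before any defaults are applied.
	fmt.Println(prepare(&request{Rule: "list"}))
}
```

Values set explicitly by the caller, such as a custom TryTimes, pass through unchanged in this sketch.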
func (*Request) SetCookies ¶
func (*Request) SetDownloaderID ¶
func (*Request) SetEnableCookie ¶
func (*Request) SetPriority ¶
func (*Request) SetReferer ¶
func (*Request) SetReloadable ¶
func (*Request) SetRuleName ¶
func (*Request) SetSpiderName ¶