request

package
Version: v1.4.0 (Latest)
Warning

This package is not in the latest version of its module.

Published: Mar 3, 2026 | License: Apache-2.0 | Imports: 13 | Imported by: 31

Documentation

Overview

Package request provides encapsulation and deduplication of crawl requests.

Index

Constants

const (
	DefaultDialTimeout = 2 * time.Minute // default server request timeout
	DefaultConnTimeout = 2 * time.Minute // default download timeout
	DefaultTryTimes    = 3               // default max download attempts
	DefaultRetryPause  = 2 * time.Second // default pause before retry
)
const (
	SurfID    = 0 // Surf downloader (native Go), do not change
	PhantomID = 1 // PhantomJS downloader (fallback, rarely used)
	ChromeID  = 2 // Chromium headless browser downloader
)
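These IDs are plain integers stored in Request.DownloaderID. The sketch below shows one way calling code might branch on them. The `downloaderName` helper and its returned names are hypothetical; the package does not export such a function.

```go
package main

import "fmt"

// Local mirrors of the package's downloader ID constants.
const (
	SurfID    = 0
	PhantomID = 1
	ChromeID  = 2
)

// downloaderName (hypothetical) maps a DownloaderID to a readable label.
func downloaderName(id int) string {
	switch id {
	case SurfID:
		return "surf"
	case PhantomID:
		return "phantomjs"
	case ChromeID:
		return "chrome"
	default:
		return "unknown"
	}
}

func main() {
	for _, id := range []int{SurfID, PhantomID, ChromeID, 7} {
		fmt.Printf("%d -> %s\n", id, downloaderName(id))
	}
}
```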

Variables

This section is empty.

Functions

func UnSerialize

func UnSerialize(s string) result.Result[*Request]

UnSerialize deserializes a Request from a JSON string.

Types

type Request

type Request struct {
	Spider        string          // spider name, auto-set, do not set manually
	URL           string          // target URL, required
	Rule          string          // rule node name for parsing response, required
	Method        string          // HTTP method: GET, POST, POST-M, or HEAD
	Header        http.Header     // request headers
	EnableCookie  bool            // whether to use cookies, set in Spider.EnableCookie
	PostData      string          // POST values
	DialTimeout   time.Duration   // dial timeout (dial tcp: i/o timeout)
	ConnTimeout   time.Duration   // connection timeout (WSARecv tcp: i/o timeout)
	TryTimes      int             // max download retry attempts
	RetryPause    time.Duration   // wait time before retry after download failure
	RedirectTimes int             // max redirects; 0=unlimited, <0=no redirects
	Temp          Temp            // temporary data
	TempIsJSON    map[string]bool // marks Temp fields stored as JSON; auto-set, do not set manually
	Priority      int             // scheduling priority, default 0 (min priority)
	Reloadable    bool            // whether the link can be re-downloaded
	// DownloaderID: 0=Surf (high concurrency, full features), 1=PhantomJS (strong anti-block, slow, low concurrency), 2=Chrome (headless Chromium, see ChromeID)
	DownloaderID int
	// contains filtered or unexported fields
}

Request represents an object waiting to be crawled.

func (*Request) AddHeader

func (r *Request) AddHeader(key, value string) *Request

func (*Request) Copy

func (r *Request) Copy() result.Result[*Request]

Copy returns a deep copy of the request.

func (*Request) GetConnTimeout

func (r *Request) GetConnTimeout() time.Duration

func (*Request) GetCookies

func (r *Request) GetCookies() string

func (*Request) GetDialTimeout

func (r *Request) GetDialTimeout() time.Duration

func (*Request) GetDownloaderID

func (r *Request) GetDownloaderID() int

func (*Request) GetEnableCookie

func (r *Request) GetEnableCookie() bool

func (*Request) GetHeader

func (r *Request) GetHeader() http.Header

func (*Request) GetMethod

func (r *Request) GetMethod() string

GetMethod returns the HTTP method name (e.g. GET, POST).

func (*Request) GetPostData

func (r *Request) GetPostData() string

func (*Request) GetPriority

func (r *Request) GetPriority() int

func (*Request) GetProxy

func (r *Request) GetProxy() string

func (*Request) GetRedirectTimes

func (r *Request) GetRedirectTimes() int

func (*Request) GetReferer

func (r *Request) GetReferer() string

func (*Request) GetRetryPause

func (r *Request) GetRetryPause() time.Duration

func (*Request) GetRuleName

func (r *Request) GetRuleName() string

func (*Request) GetSpiderName

func (r *Request) GetSpiderName() string

func (*Request) GetTemp

func (r *Request) GetTemp(key string, defaultValue interface{}) interface{}

GetTemp returns the temporary cached value stored under key. defaultValue must not be nil.

func (*Request) GetTempOpt added in v1.4.0

func (r *Request) GetTempOpt(key string) option.Option[interface{}]

GetTempOpt returns the temporary cached value for key as an Option; the Option is None when the key is missing.
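GetTemp and GetTempOpt offer two ways to handle a missing key: fall back to a default, or report absence explicitly. The sketch below models both over a plain Temp map, assuming GetTemp returns defaultValue when the key is absent. The helpers `getTemp` and `getTempOpt` are hypothetical; `getTempOpt` uses Go's comma-ok idiom in place of the package's Option type, and the sketch ignores any JSON decoding implied by TempIsJSON.

```go
package main

import "fmt"

// Temp mirrors the package's Temp type.
type Temp map[string]interface{}

// getTemp (hypothetical) returns the value for key, or defaultValue when
// the key is absent.
func getTemp(t Temp, key string, defaultValue interface{}) interface{} {
	if v, ok := t[key]; ok {
		return v
	}
	return defaultValue
}

// getTempOpt (hypothetical) reports presence explicitly: ok == false plays
// the role of an Option that is None.
func getTempOpt(t Temp, key string) (interface{}, bool) {
	v, ok := t[key]
	return v, ok
}

func main() {
	tmp := Temp{"page": 2}
	page := getTemp(tmp, "page", 1).(int) // type-assert the interface{} result
	fmt.Println(page)                     // 2
	fmt.Println(getTemp(tmp, "depth", 0)) // 0
	if _, ok := getTempOpt(tmp, "depth"); !ok {
		fmt.Println("depth: none")
	}
}
```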

func (*Request) GetTemps

func (r *Request) GetTemps() Temp

func (*Request) GetTryTimes

func (r *Request) GetTryTimes() int

func (*Request) GetURL added in v1.4.0

func (r *Request) GetURL() string

GetURL returns the request URL.

func (*Request) IsReloadable

func (r *Request) IsReloadable() bool

func (*Request) MarshalJSON

func (r *Request) MarshalJSON() ([]byte, error)

func (*Request) Prepare

func (r *Request) Prepare() result.VoidResult

Prepare sets default values before a request is sent. Request.URL and Request.Rule must be set. Request.Spider is set automatically by the system. Request.EnableCookie is configured on the Spider; per-request values are ignored. Optional fields that receive defaults: Method (GET), DialTimeout, ConnTimeout, TryTimes, RedirectTimes, RetryPause, and DownloaderID (0=Surf, 1=PhantomJS).
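The contract above is "validate the required fields, then fill in zero-valued optional ones". The sketch below illustrates that pattern on a hypothetical `draft` struct using the documented default constants. Treating a zero value as "unset" is an assumption of this sketch; it means an explicit zero cannot be distinguished from an omitted field.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Local mirrors of the package's default constants.
const (
	DefaultDialTimeout = 2 * time.Minute
	DefaultConnTimeout = 2 * time.Minute
	DefaultTryTimes    = 3
	DefaultRetryPause  = 2 * time.Second
)

// draft is a hypothetical subset of Request's fields.
type draft struct {
	URL, Rule, Method        string
	DialTimeout, ConnTimeout time.Duration
	TryTimes                 int
	RetryPause               time.Duration
}

// prepare checks required fields and fills zero-valued optional ones.
func (d *draft) prepare() error {
	if d.URL == "" || d.Rule == "" {
		return errors.New("URL and Rule are required")
	}
	if d.Method == "" {
		d.Method = "GET"
	}
	if d.DialTimeout == 0 {
		d.DialTimeout = DefaultDialTimeout
	}
	if d.ConnTimeout == 0 {
		d.ConnTimeout = DefaultConnTimeout
	}
	if d.TryTimes == 0 {
		d.TryTimes = DefaultTryTimes
	}
	if d.RetryPause == 0 {
		d.RetryPause = DefaultRetryPause
	}
	return nil
}

func main() {
	d := &draft{URL: "https://example.com", Rule: "index"}
	if err := d.prepare(); err != nil {
		panic(err)
	}
	fmt.Println(d.Method, d.DialTimeout, d.TryTimes, d.RetryPause) // GET 2m0s 3 2s
}
```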

func (*Request) Serialize

func (r *Request) Serialize() result.Result[string]

Serialize serializes the Request to a JSON string.

func (*Request) SetCookies

func (r *Request) SetCookies(cookie string) *Request

func (*Request) SetDownloaderID

func (r *Request) SetDownloaderID(id int) *Request

func (*Request) SetEnableCookie

func (r *Request) SetEnableCookie(enableCookie bool) *Request

func (*Request) SetHeader

func (r *Request) SetHeader(key, value string) *Request

func (*Request) SetMethod

func (r *Request) SetMethod(method string) *Request

SetMethod sets the HTTP method.

func (*Request) SetPriority

func (r *Request) SetPriority(priority int) *Request

func (*Request) SetProxy

func (r *Request) SetProxy(proxy string) *Request

func (*Request) SetReferer

func (r *Request) SetReferer(referer string) *Request

func (*Request) SetReloadable

func (r *Request) SetReloadable(can bool) *Request

func (*Request) SetRuleName

func (r *Request) SetRuleName(ruleName string) *Request

func (*Request) SetSpiderName

func (r *Request) SetSpiderName(spiderName string) *Request

func (*Request) SetTemp

func (r *Request) SetTemp(key string, value interface{}) *Request

func (*Request) SetTemps

func (r *Request) SetTemps(temp map[string]interface{}) *Request

func (*Request) SetURL added in v1.4.0

func (r *Request) SetURL(url string) *Request
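Every setter above returns *Request, which allows calls to be chained. The sketch below reproduces that fluent style on a hypothetical `req` type. It assumes that SetHeader replaces existing values and AddHeader appends, mirroring net/http's `Header.Set` and `Header.Add`; the package's exact header semantics are not documented here.

```go
package main

import (
	"fmt"
	"net/http"
)

// req is a hypothetical mini type mimicking the chaining setters.
type req struct {
	url, rule, method string
	header            http.Header
	priority          int
}

func (r *req) SetURL(u string) *req      { r.url = u; return r }
func (r *req) SetRuleName(n string) *req { r.rule = n; return r }
func (r *req) SetMethod(m string) *req   { r.method = m; return r }
func (r *req) SetPriority(p int) *req    { r.priority = p; return r }

// SetHeader replaces any existing values for key (assumed semantics).
func (r *req) SetHeader(key, value string) *req {
	if r.header == nil {
		r.header = http.Header{}
	}
	r.header.Set(key, value)
	return r
}

// AddHeader appends a value to key (assumed semantics).
func (r *req) AddHeader(key, value string) *req {
	if r.header == nil {
		r.header = http.Header{}
	}
	r.header.Add(key, value)
	return r
}

func main() {
	r := (&req{}).
		SetURL("https://example.com/list").
		SetRuleName("list").
		SetMethod("GET").
		SetPriority(1).
		SetHeader("Accept", "text/html").
		AddHeader("Accept", "application/xhtml+xml")
	fmt.Println(r.url, r.rule, r.header.Values("Accept"))
}
```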

func (*Request) Unique

func (r *Request) Unique() string

Unique returns the unique identifier for the request.
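Unique supports the deduplication mentioned in the package overview: requests with the same identifier are scheduled only once. The sketch below is a minimal, concurrency-safe version of that idea. Which fields feed the identifier is an assumption here (Spider, Rule, URL, Method), as is the rule that Reloadable requests bypass the filter, inferred from the field comment "whether the link can be re-downloaded". `uniqueID` and `seenSet` are hypothetical names.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"sync"
)

// uniqueID (hypothetical) fingerprints the fields assumed to identify a
// request. NUL separators keep "ab"+"c" distinct from "a"+"bc". MD5 serves
// as a compact fingerprint here, not as a security measure.
func uniqueID(spider, rule, url, method string) string {
	sum := md5.Sum([]byte(spider + "\x00" + rule + "\x00" + url + "\x00" + method))
	return hex.EncodeToString(sum[:])
}

// seenSet (hypothetical) is a concurrency-safe deduplication filter.
type seenSet struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newSeenSet() *seenSet { return &seenSet{seen: make(map[string]struct{})} }

// Admit reports whether a request should be scheduled. Duplicates are
// rejected unless the request is reloadable.
func (s *seenSet) Admit(id string, reloadable bool) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, dup := s.seen[id]; dup && !reloadable {
		return false
	}
	s.seen[id] = struct{}{}
	return true
}

func main() {
	s := newSeenSet()
	id := uniqueID("demo", "list", "https://example.com/", "GET")
	fmt.Println(s.Admit(id, false)) // true: first time seen
	fmt.Println(s.Admit(id, false)) // false: duplicate
	fmt.Println(s.Admit(id, true))  // true: reloadable
	fmt.Println(len(id))            // 32 hex characters
}
```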

type Temp

type Temp map[string]interface{}
