Documentation ¶
Overview ¶
Package surfer is a high-concurrency web downloader written in Go. It supports the GET, POST, and HEAD methods over the http and https protocols. It offers two modes: a fixed UserAgent with cookies saved automatically, or a large pool of random UserAgents with cookies disabled. It closely simulates browser behavior, which enables simulated login and similar features.
Index ¶
- Constants
- func AutoToUTF8(resp *http.Response) error
- func BodyBytes(resp *http.Response) ([]byte, error)
- func DestroyJsFiles()
- func DestroyLuaScriptFiles()
- func Download(req Request) (resp *http.Response, err error)
- func GetWDPath() string
- func IsDirExists(path string) bool
- func IsFileExists(path string) bool
- func URLEncode(urlStr string) (*url.URL, error)
- func WalkDir(targpath string, suffixes ...string) (dirlist []string)
- type Body
- type DefaultRequest
- func (defaultRequest *DefaultRequest) GetConnTimeout() time.Duration
- func (defaultRequest *DefaultRequest) GetDialTimeout() time.Duration
- func (defaultRequest *DefaultRequest) GetDownloaderID() int
- func (defaultRequest *DefaultRequest) GetEnableCookie() bool
- func (defaultRequest *DefaultRequest) GetHeader() http.Header
- func (defaultRequest *DefaultRequest) GetMethod() string
- func (defaultRequest *DefaultRequest) GetPostData() string
- func (defaultRequest *DefaultRequest) GetProxy() string
- func (defaultRequest *DefaultRequest) GetRedirectTimes() int
- func (defaultRequest *DefaultRequest) GetRetryPause() time.Duration
- func (defaultRequest *DefaultRequest) GetTryTimes() int
- func (defaultRequest *DefaultRequest) GetURL() string
- type Param
- type Phantom
- type Request
- type Response
- type Splash
- type Surf
- type Surfer
Constants ¶
const (
	SurfID             = 0               // Surf downloader identifier
	PhomtomJsID        = 1               // PhantomJS downloader identifier
	SplashID           = 2               // Splash downloader identifier
	DefaultMethod      = "GET"           // Default request method
	DefaultDialTimeout = 2 * time.Minute // Default timeout for connecting to the server
	DefaultConnTimeout = 2 * time.Minute // Default download timeout
	DefaultTryTimes    = 3               // Default maximum number of download attempts
	DefaultRetryPause  = 2 * time.Second // Default pause before a retry
)
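To illustrate how a try count and a retry pause usually work together, here is a minimal sketch of a retry loop. It is not the package's own code: the retry and failFirst helpers and the errTemporary error are hypothetical names made up for this example, and only the two constant values are copied from the block above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Values copied from the Constants section of this page.
const (
	DefaultTryTimes   = 3
	DefaultRetryPause = 2 * time.Second
)

// errTemporary is a hypothetical error used only by this sketch.
var errTemporary = errors.New("temporary failure")

// retry calls fn up to tryTimes times and sleeps for pause between failed
// attempts. It returns the number of attempts made and the last error.
func retry(tryTimes int, pause time.Duration, fn func() error) (int, error) {
	if tryTimes < 1 {
		tryTimes = 1 // always try at least once
	}
	var err error
	for attempt := 1; attempt <= tryTimes; attempt++ {
		if err = fn(); err == nil {
			return attempt, nil
		}
		if attempt < tryTimes {
			time.Sleep(pause) // no pause after the final attempt
		}
	}
	return tryTimes, err
}

// failFirst returns a function that fails its first n calls, then succeeds.
func failFirst(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errTemporary
		}
		return nil
	}
}

func main() {
	// A shortened pause keeps the demonstration quick.
	attempts, err := retry(DefaultTryTimes, DefaultRetryPause/200, failFirst(2))
	fmt.Println(attempts, err)
}
```

With the default of three tries, a download that fails twice still succeeds on the last attempt; one that fails three times returns its final error.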
Variables ¶
This section is empty.
Functions ¶
func AutoToUTF8 ¶
When downloading with the Surf kernel, AutoToUTF8 attempts to transcode the response to UTF-8 automatically. With the PhantomJS kernel no transcoding is performed, because the output is already UTF-8.
func DestroyJsFiles ¶
func DestroyJsFiles()
DestroyJsFiles is a function that destroys the temporary PhantomJS js files.
func DestroyLuaScriptFiles ¶
func DestroyLuaScriptFiles()
DestroyLuaScriptFiles is a function that destroys the temporary Lua script files.
Types ¶
type DefaultRequest ¶
type DefaultRequest struct {
	// url (required)
	URL string
	// GET POST POST-M HEAD (the default is GET)
	Method string
	// http header
	Header http.Header
	// Whether to use cookies, set in Spider's EnableCookie
	EnableCookie bool
	// POST values
	PostData string
	// dial tcp: i/o timeout
	DialTimeout time.Duration
	// WSARecv tcp: i/o timeout
	ConnTimeout time.Duration
	// the maximum number of download attempts
	TryTimes int
	// how long to pause before a retry
	RetryPause time.Duration
	// maximum number of redirects
	// when RedirectTimes equals 0, the number of redirects is unlimited
	// when RedirectTimes is less than 0, no redirects are followed
	RedirectTimes int
	// the proxy host used for the download
	Proxy string
	// Specify the downloader ID
	// 0: Surf downloader, high concurrency, full set of control features
	// 1: PhantomJS downloader, strong at getting past anti-crawler defenses, slow, low concurrency
	DownloaderID int
	// contains filtered or unexported fields
}
DefaultRequest is the default implementation of Request.
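The struct comments say that Method defaults to GET, and the Constants section lists further defaults. The following is a minimal sketch of how zero-valued fields could be filled from those constants. The request type and the withDefaults function are hypothetical stand-ins written for this example; this page does not show how the package itself applies its defaults.

```go
package main

import (
	"fmt"
	"time"
)

// Values copied from the Constants section of this page.
const (
	DefaultMethod      = "GET"
	DefaultDialTimeout = 2 * time.Minute
	DefaultConnTimeout = 2 * time.Minute
	DefaultTryTimes    = 3
	DefaultRetryPause  = 2 * time.Second
)

// request is a local stand-in holding a subset of the DefaultRequest fields.
type request struct {
	URL         string
	Method      string
	DialTimeout time.Duration
	ConnTimeout time.Duration
	TryTimes    int
	RetryPause  time.Duration
}

// withDefaults returns a copy of r in which every zero-valued field is
// replaced by the matching default. Explicitly set fields are kept.
func withDefaults(r request) request {
	if r.Method == "" {
		r.Method = DefaultMethod
	}
	if r.DialTimeout == 0 {
		r.DialTimeout = DefaultDialTimeout
	}
	if r.ConnTimeout == 0 {
		r.ConnTimeout = DefaultConnTimeout
	}
	if r.TryTimes == 0 {
		r.TryTimes = DefaultTryTimes
	}
	if r.RetryPause == 0 {
		r.RetryPause = DefaultRetryPause
	}
	return r
}

func main() {
	r := withDefaults(request{URL: "http://example.com", TryTimes: 5})
	fmt.Println(r.Method, r.TryTimes, r.DialTimeout, r.RetryPause)
}
```

Working on a copy keeps the caller's value unchanged, so the same request literal can be reused with different overrides.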
func (*DefaultRequest) GetConnTimeout ¶
func (defaultRequest *DefaultRequest) GetConnTimeout() time.Duration
GetConnTimeout returns the connection timeout (the limit behind "WSARecv tcp: i/o timeout" errors).
func (*DefaultRequest) GetDialTimeout ¶
func (defaultRequest *DefaultRequest) GetDialTimeout() time.Duration
GetDialTimeout returns the dial timeout (the limit behind "dial tcp: i/o timeout" errors).
func (*DefaultRequest) GetDownloaderID ¶
func (defaultRequest *DefaultRequest) GetDownloaderID() int
GetDownloaderID selects the downloader: Surf or PhantomJS.
func (*DefaultRequest) GetEnableCookie ¶
func (defaultRequest *DefaultRequest) GetEnableCookie() bool
GetEnableCookie reports whether HTTP cookies are enabled.
func (*DefaultRequest) GetHeader ¶
func (defaultRequest *DefaultRequest) GetHeader() http.Header
GetHeader returns the HTTP header.
func (*DefaultRequest) GetMethod ¶
func (defaultRequest *DefaultRequest) GetMethod() string
GetMethod returns the request method: GET, POST, POST-M, or HEAD.
func (*DefaultRequest) GetPostData ¶
func (defaultRequest *DefaultRequest) GetPostData() string
GetPostData returns the POST values.
func (*DefaultRequest) GetProxy ¶
func (defaultRequest *DefaultRequest) GetProxy() string
GetProxy returns the proxy host used for the download.
func (*DefaultRequest) GetRedirectTimes ¶
func (defaultRequest *DefaultRequest) GetRedirectTimes() int
GetRedirectTimes returns the maximum number of redirects.
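The DefaultRequest field comments give RedirectTimes three meanings: 0 means unlimited redirects, a negative value means none, and a positive value is the maximum. As a minimal sketch, these rules could be mapped onto the CheckRedirect hook of the standard net/http client as shown here. The redirectPolicy and followsHop functions and the errRedirectLimit error are hypothetical names for this example, and the package's actual implementation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errRedirectLimit is a hypothetical error used only by this sketch.
var errRedirectLimit = errors.New("redirect limit reached")

// redirectPolicy turns a RedirectTimes value into a CheckRedirect function:
//
//	 0 -> every redirect is followed (no limit)
//	<0 -> no redirect is followed
//	>0 -> at most redirectTimes redirects are followed
func redirectPolicy(redirectTimes int) func(req *http.Request, via []*http.Request) error {
	return func(req *http.Request, via []*http.Request) error {
		switch {
		case redirectTimes == 0:
			return nil
		case redirectTimes < 0:
			return errRedirectLimit
		case len(via) > redirectTimes:
			// via holds the requests made so far, so on the Nth redirect
			// len(via) == N.
			return errRedirectLimit
		}
		return nil
	}
}

// followsHop reports whether the policy would follow redirect number hop
// (counting from 1).
func followsHop(redirectTimes, hop int) bool {
	via := make([]*http.Request, hop)
	return redirectPolicy(redirectTimes)(nil, via) == nil
}

func main() {
	// The policy plugs into a standard client like this.
	client := &http.Client{CheckRedirect: redirectPolicy(2)}
	_ = client

	for _, n := range []int{0, -1, 2} {
		fmt.Println(n, followsHop(n, 1), followsHop(n, 3))
	}
}
```

Returning an error from CheckRedirect makes the client's request fail with that error. Returning http.ErrUseLastResponse instead would hand back the most recent response without an error, which may suit a crawler that wants to inspect the redirect itself.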
func (*DefaultRequest) GetRetryPause ¶
func (defaultRequest *DefaultRequest) GetRetryPause() time.Duration
GetRetryPause returns the pause before a retry.
func (*DefaultRequest) GetTryTimes ¶
func (defaultRequest *DefaultRequest) GetTryTimes() int
GetTryTimes returns the maximum number of download attempts.
func (*DefaultRequest) GetURL ¶
func (defaultRequest *DefaultRequest) GetURL() string
GetURL is a func ...
type Phantom ¶
type Phantom struct {
	PhantomjsFile string // Full file name of the PhantomJS executable
	TempJsDir     string // Temporary js storage directory
	// contains filtered or unexported fields
}
Phantom is a downloader implementation based on PhantomJS, provided as a supplement to Surf. It is much slower than Surf, but because it simulates a browser it is better at getting past anti-crawler defenses. It supports UserAgent, TryTimes, RetryPause, and custom js.
func (*Phantom) DestroyJsFiles ¶
func (phantom *Phantom) DestroyJsFiles()
DestroyJsFiles is a function that destroys the temporary js files.
type Request ¶
type Request interface {
	// url
	GetURL() string
	// GET POST POST-M HEAD
	GetMethod() string
	// POST values
	GetPostData() string
	// http header
	GetHeader() http.Header
	// enable http cookies
	GetEnableCookie() bool
	// dial tcp: i/o timeout
	GetDialTimeout() time.Duration
	// WSARecv tcp: i/o timeout
	GetConnTimeout() time.Duration
	// the maximum number of download attempts
	GetTryTimes() int
	// the pause before a retry
	GetRetryPause() time.Duration
	// the proxy host used for the download
	GetProxy() string
	// maximum number of redirects
	GetRedirectTimes() int
	// select Surf or PhantomJS
	GetDownloaderID() int
}
type Response ¶
type Splash ¶
type Splash struct {
	SplashServer     string // Splash server host and port
	TempLuaScriptDir string // Temporary Lua script storage directory
	// contains filtered or unexported fields
}
Splash is a struct that represents the Splash API.
func (*Splash) DestroyLuaScriptFiles ¶
func (splash *Splash) DestroyLuaScriptFiles()
DestroyLuaScriptFiles is a function that destroys the temporary Lua script files.
type Surf ¶
type Surf struct {
// contains filtered or unexported fields
}
Surf is the default Download implementation.
type Surfer ¶
type Surfer interface {
	// GET     @param url string, header http.Header, cookies []*http.Cookie
	// HEAD    @param url string, header http.Header, cookies []*http.Cookie
	// POST    PostForm @param url, referer string, values url.Values, header http.Header, cookies []*http.Cookie
	// POST-M  PostMultipart @param url, referer string, values url.Values, header http.Header, cookies []*http.Cookie
	Download(Request) (resp *http.Response, err error)
}
Surfer is a downloader interface that represents the core of an HTTP web browser for crawlers.
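Since every downloader exposes the same Download method and each request carries a downloader ID, a caller can route a request by that ID. The sketch below shows one possible dispatch over the three identifiers from the Constants section. It does not import the package: the request and downloader interfaces are cut-down local mirrors, and stub, simpleRequest, and download are hypothetical names. The stubs perform no network I/O and only label the response.

```go
package main

import (
	"fmt"
	"net/http"
)

// Identifiers copied from the Constants section of this page.
const (
	SurfID      = 0
	PhomtomJsID = 1
	SplashID    = 2
)

// request mirrors two methods of the Request interface.
type request interface {
	GetURL() string
	GetDownloaderID() int
}

// downloader mirrors the shape of the Surfer interface.
type downloader interface {
	Download(request) (*http.Response, error)
}

// stub is a hypothetical downloader that does no network I/O. It returns an
// empty 200 response tagged with its own name.
type stub struct{ name string }

func (s stub) Download(req request) (*http.Response, error) {
	h := http.Header{}
	h.Set("X-Downloader", s.name)
	return &http.Response{StatusCode: http.StatusOK, Header: h, Body: http.NoBody}, nil
}

// simpleRequest is a minimal request implementation for the example.
type simpleRequest struct {
	url string
	id  int
}

func (r simpleRequest) GetURL() string       { return r.url }
func (r simpleRequest) GetDownloaderID() int { return r.id }

// download routes req to the downloader selected by its ID.
func download(req request) (*http.Response, error) {
	var d downloader
	switch req.GetDownloaderID() {
	case SurfID:
		d = stub{"surf"}
	case PhomtomJsID:
		d = stub{"phantomjs"}
	case SplashID:
		d = stub{"splash"}
	default:
		return nil, fmt.Errorf("unknown downloader ID %d", req.GetDownloaderID())
	}
	return d.Download(req)
}

func main() {
	resp, err := download(simpleRequest{url: "http://example.com", id: PhomtomJsID})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(resp.StatusCode, resp.Header.Get("X-Downloader"))
}
```

Rejecting unknown IDs with an error, rather than silently falling back to one downloader, is a choice made for this sketch; the package's own behavior for unknown IDs is not documented on this page.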
func NewPhantom ¶
NewPhantom creates a PhantomJS downloader.