Documentation ¶
Overview ¶
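surfer is a high-concurrency web downloader written in Go. It supports the GET/POST/HEAD methods and the http/https protocols, and offers two modes: a fixed UserAgent with cookies saved automatically, or a large pool of random UserAgents with cookies disabled. It closely simulates browser behavior and can be used for tasks such as simulated login.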
Index ¶
- Constants
- func AutoToUTF8(resp *http.Response) error
- func BodyBytes(resp *http.Response) ([]byte, error)
- func DestroyJsFiles()
- func Download(req Request) (resp *http.Response, err error)
- func GetWDPath() string
- func IsDirExists(path string) bool
- func IsFileExists(path string) bool
- func UrlEncode(urlStr string) (*url.URL, error)
- func WalkDir(targpath string, suffixes ...string) (dirlist []string)
- type Body
- type Cookie
- type DefaultRequest
- func (self *DefaultRequest) GetConnTimeout() time.Duration
- func (self *DefaultRequest) GetDialTimeout() time.Duration
- func (self *DefaultRequest) GetDownloaderID() int
- func (self *DefaultRequest) GetEnableCookie() bool
- func (self *DefaultRequest) GetHeader() http.Header
- func (self *DefaultRequest) GetMethod() string
- func (self *DefaultRequest) GetPostData() string
- func (self *DefaultRequest) GetProxy() string
- func (self *DefaultRequest) GetRedirectTimes() int
- func (self *DefaultRequest) GetRetryPause() time.Duration
- func (self *DefaultRequest) GetTryTimes() int
- func (self *DefaultRequest) GetUrl() string
- type DnsCache
- type Param
- type Phantom
- type Request
- type Response
- type Surf
- type Surfer
Constants ¶
Variables ¶
This section is empty.
Functions ¶
func AutoToUTF8 ¶
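When downloading with the surf kernel, AutoToUTF8 can attempt to transcode the response to UTF-8 automatically. With the phantomjs kernel, no transcoding is needed (the body is already UTF-8).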
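A transcoder first has to learn which charset the server declared. The following is a minimal sketch of that first step only, using the standard library; `declaredCharset` is a hypothetical helper, not part of surfer, and how AutoToUTF8 actually detects the charset is not stated here. The conversion itself would need a decoder from outside the standard library (for example the `golang.org/x/text/encoding` packages).

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// declaredCharset returns the lower-cased charset parameter of a
// Content-Type header value, or "" if none is declared or the value
// cannot be parsed. Hypothetical helper for illustration only.
func declaredCharset(contentType string) string {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return ""
	}
	return strings.ToLower(params["charset"])
}

func main() {
	fmt.Println(declaredCharset("text/html; charset=GBK")) // gbk
	fmt.Println(declaredCharset("text/html") == "")        // true
}
```

A response whose declared charset is empty or already `utf-8` could then be left untouched.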
func IsDirExists ¶
IsDirExists reports whether path is a directory.
func IsFileExists ¶
IsFileExists reports whether path is a file.
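For readers who want to see what such checks usually look like, here is a minimal sketch built on `os.Stat`. The lower-case names `isDirExists` and `isFileExists` are hypothetical stand-ins; the package's own implementation may differ, for example in how it treats symlinks or permission errors.

```go
package main

import (
	"fmt"
	"os"
)

// isDirExists reports whether path exists and is a directory.
// Hypothetical stand-in, not the package's source.
func isDirExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.IsDir()
}

// isFileExists reports whether path exists and is not a directory.
// Hypothetical stand-in, not the package's source.
func isFileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

func main() {
	fmt.Println(isDirExists("."))  // the working directory is a directory
	fmt.Println(isFileExists(".")) // and therefore not a regular file
}
```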
Types ¶
type Cookie ¶
type Cookie struct {
Name string `json:"name"`
Value string `json:"value"`
Domain string `json:"domain"`
Path string `json:"path"`
}
Cookie is used to pass cookies to phantomjs.
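The `json` tags on Cookie suggest that cookies are serialized as JSON before being handed to phantomjs. The sketch below illustrates that idea under this assumption; `fromHTTP` and `encodeCookies` are hypothetical helpers, and the exact payload surfer sends is not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Cookie mirrors the struct documented above.
type Cookie struct {
	Name   string `json:"name"`
	Value  string `json:"value"`
	Domain string `json:"domain"`
	Path   string `json:"path"`
}

// fromHTTP copies the four documented fields out of a net/http cookie.
func fromHTTP(c *http.Cookie) Cookie {
	return Cookie{Name: c.Name, Value: c.Value, Domain: c.Domain, Path: c.Path}
}

// encodeCookies serializes cookies as a JSON array.
func encodeCookies(cookies []Cookie) (string, error) {
	b, err := json.Marshal(cookies)
	return string(b), err
}

func main() {
	c := fromHTTP(&http.Cookie{Name: "sid", Value: "abc", Domain: "example.com", Path: "/"})
	s, err := encodeCookies([]Cookie{c})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // [{"name":"sid","value":"abc","domain":"example.com","path":"/"}]
}
```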
type DefaultRequest ¶
type DefaultRequest struct {
// url (required)
Url string
// GET POST POST-M HEAD (defaults to GET)
Method string
// http header
Header http.Header
// whether to use cookies; set through the Spider's EnableCookie
EnableCookie bool
// POST values
PostData string
// dial tcp: i/o timeout
DialTimeout time.Duration
// WSARecv tcp: i/o timeout
ConnTimeout time.Duration
// the max times of download
TryTimes int
// how long to pause before a retry
RetryPause time.Duration
// max redirect times
// when RedirectTimes equals 0, redirects are unlimited
// when RedirectTimes is less than 0, redirects are not followed
RedirectTimes int
// the download ProxyHost
Proxy string
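// selects the downloader by ID
// 0 is the Surf high-concurrency downloader, with the full set of control features
// 1 is the PhantomJS downloader: better at getting past anti-crawler defenses, but slow and low-concurrency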
DownloaderID int
// contains filtered or unexported fields
}
DefaultRequest is the default implementation of Request.
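The RedirectTimes rules in the struct comments (positive: a cap, zero: unlimited, negative: no redirects) can be mapped onto the `CheckRedirect` hook of `net/http`. This is a sketch of one possible mapping, not the package's actual code; `allowRedirect` and `redirectPolicy` are hypothetical names, and returning `http.ErrUseLastResponse` when the limit is hit is a choice made here for simplicity.

```go
package main

import (
	"fmt"
	"net/http"
)

// allowRedirect reports whether the n-th redirect (1-based) may be
// followed under the documented RedirectTimes rules.
func allowRedirect(redirectTimes, n int) bool {
	switch {
	case redirectTimes == 0:
		return true // unlimited
	case redirectTimes < 0:
		return false // never follow
	default:
		return n <= redirectTimes
	}
}

// redirectPolicy adapts allowRedirect to http.Client.CheckRedirect.
// len(via) is the number of requests already made, which equals the
// 1-based index of the redirect being considered.
func redirectPolicy(redirectTimes int) func(*http.Request, []*http.Request) error {
	return func(_ *http.Request, via []*http.Request) error {
		if allowRedirect(redirectTimes, len(via)) {
			return nil
		}
		// Stop and hand back the most recent response instead of an error.
		return http.ErrUseLastResponse
	}
}

func main() {
	client := &http.Client{CheckRedirect: redirectPolicy(3)}
	_ = client // ready for client.Do(req)
	fmt.Println(allowRedirect(3, 3), allowRedirect(3, 4), allowRedirect(0, 100), allowRedirect(-1, 1))
	// true false true false
}
```

Note that a nil `CheckRedirect` would fall back to the standard library's default limit, so the unlimited case needs an explicit function that always returns nil.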
func (*DefaultRequest) GetConnTimeout ¶
func (self *DefaultRequest) GetConnTimeout() time.Duration
WSARecv tcp: i/o timeout
func (*DefaultRequest) GetDialTimeout ¶
func (self *DefaultRequest) GetDialTimeout() time.Duration
dial tcp: i/o timeout
func (*DefaultRequest) GetDownloaderID ¶
func (self *DefaultRequest) GetDownloaderID() int
select Surf or PhantomJS
func (*DefaultRequest) GetEnableCookie ¶
func (self *DefaultRequest) GetEnableCookie() bool
enable http cookies
func (*DefaultRequest) GetMethod ¶
func (self *DefaultRequest) GetMethod() string
GET POST POST-M HEAD
func (*DefaultRequest) GetProxy ¶
func (self *DefaultRequest) GetProxy() string
the download ProxyHost
func (*DefaultRequest) GetRedirectTimes ¶
func (self *DefaultRequest) GetRedirectTimes() int
max redirect times
func (*DefaultRequest) GetRetryPause ¶
func (self *DefaultRequest) GetRetryPause() time.Duration
the pause time of retry
func (*DefaultRequest) GetTryTimes ¶
func (self *DefaultRequest) GetTryTimes() int
the max times of download
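TryTimes and RetryPause together describe a bounded retry loop. A minimal sketch of such a loop follows, assuming that TryTimes counts total attempts and that the pause happens only between attempts; `retry` and `errTemp` are hypothetical names, and surfer's own loop may behave differently.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTemp is a placeholder error used by the demonstration below.
var errTemp = errors.New("temporary failure")

// retry calls fn up to tryTimes times and sleeps retryPause between
// failed attempts. It returns the number of attempts made and the
// last error (nil on success).
func retry(tryTimes int, retryPause time.Duration, fn func() error) (attempts int, err error) {
	if tryTimes < 1 {
		tryTimes = 1 // assumption: always attempt at least once
	}
	for attempts = 1; attempts <= tryTimes; attempts++ {
		if err = fn(); err == nil {
			return attempts, nil
		}
		if attempts < tryTimes {
			time.Sleep(retryPause) // no pause after the final attempt
		}
	}
	return tryTimes, err
}

func main() {
	calls := 0
	n, err := retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTemp // fail the first two attempts
		}
		return nil
	})
	fmt.Println(n, err) // 3 <nil>
}
```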
type DnsCache ¶
type DnsCache struct {
// contains filtered or unexported fields
}
DnsCache is a DNS cache.
type Phantom ¶
type Phantom struct {
PhantomjsFile string //full file name of the Phantomjs executable
TempJsDir string //directory for temporary js files
CookieJar *cookiejar.Jar
// contains filtered or unexported fields
}
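Phantom is a downloader implementation based on Phantomjs that complements surfer. It is much slower than surfer, but because it simulates a browser it is better at getting past anti-crawler defenses. It supports UserAgent, TryTimes, RetryPause, and custom js.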
type Request ¶
type Request interface {
// url
GetUrl() string
// GET POST POST-M HEAD
GetMethod() string
// POST values
GetPostData() string
// http header
GetHeader() http.Header
// enable http cookies
GetEnableCookie() bool
// dial tcp: i/o timeout
GetDialTimeout() time.Duration
// WSARecv tcp: i/o timeout
GetConnTimeout() time.Duration
// the max times of download
GetTryTimes() int
// the pause time of retry
GetRetryPause() time.Duration
// the download ProxyHost
GetProxy() string
// max redirect times
GetRedirectTimes() int
// select Surf or PhantomJS
GetDownloaderID() int
}
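Because Request is an interface, any type with these twelve methods can be passed to a downloader, not only DefaultRequest. The sketch below shows a minimal custom implementation. The interface is re-declared locally so the example compiles on its own; `headRequest` and all the values it returns are illustrative assumptions, not recommended defaults.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Request is copied from the documentation above.
type Request interface {
	GetUrl() string
	GetMethod() string
	GetPostData() string
	GetHeader() http.Header
	GetEnableCookie() bool
	GetDialTimeout() time.Duration
	GetConnTimeout() time.Duration
	GetTryTimes() int
	GetRetryPause() time.Duration
	GetProxy() string
	GetRedirectTimes() int
	GetDownloaderID() int
}

// headRequest is a hypothetical Request that always describes a
// cookie-less HEAD request for the Surf downloader (ID 0).
type headRequest struct{ url string }

func (r headRequest) GetUrl() string                { return r.url }
func (r headRequest) GetMethod() string             { return "HEAD" }
func (r headRequest) GetPostData() string           { return "" }
func (r headRequest) GetHeader() http.Header        { return http.Header{} }
func (r headRequest) GetEnableCookie() bool         { return false }
func (r headRequest) GetDialTimeout() time.Duration { return 10 * time.Second }
func (r headRequest) GetConnTimeout() time.Duration { return 30 * time.Second }
func (r headRequest) GetTryTimes() int              { return 3 }
func (r headRequest) GetRetryPause() time.Duration  { return time.Second }
func (r headRequest) GetProxy() string              { return "" }
func (r headRequest) GetRedirectTimes() int         { return 0 }
func (r headRequest) GetDownloaderID() int          { return 0 }

// Compile-time check that headRequest satisfies Request.
var _ Request = headRequest{}

func main() {
	var req Request = headRequest{url: "https://example.com/"}
	fmt.Println(req.GetMethod(), req.GetUrl()) // HEAD https://example.com/
}
```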
type Response ¶
type Response struct {
Cookies []string
Body string
Error string
Header []struct {
Name string
Value string
}
}
Response is used to parse the response content returned by Phantomjs.
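As an illustration of how such a struct can be filled, the sketch below decodes a JSON payload into Response. It assumes the Phantomjs script emits JSON whose keys match the field names; the wire format is not stated in this documentation, and both `parseResponse` and the sample payload are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Response mirrors the struct documented above.
type Response struct {
	Cookies []string
	Body    string
	Error   string
	Header  []struct {
		Name  string
		Value string
	}
}

// parseResponse decodes raw JSON into a Response.
// Assumption: the payload is JSON keyed by the field names.
func parseResponse(raw string) (Response, error) {
	var r Response
	err := json.Unmarshal([]byte(raw), &r)
	return r, err
}

func main() {
	// Sample payload invented for this example.
	raw := `{"Cookies":["sid=abc"],"Body":"<html></html>","Error":"",` +
		`"Header":[{"Name":"Content-Type","Value":"text/html"}]}`
	r, err := parseResponse(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Body, r.Header[0].Name, r.Header[0].Value)
	// <html></html> Content-Type text/html
}
```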
type Surfer ¶
type Surfer interface {
// GET @param url string, header http.Header, cookies []*http.Cookie
// HEAD @param url string, header http.Header, cookies []*http.Cookie
// POST PostForm @param url, referer string, values url.Values, header http.Header, cookies []*http.Cookie
// POST-M PostMultipart @param url, referer string, values url.Values, header http.Header, cookies []*http.Cookie
Download(Request) (resp *http.Response, err error)
}
A Surfer is a downloader that represents the core of an HTTP web browser for a crawler.