spider

Version: v0.0.0-...-6e90328
Warning

This package is not in the latest version of its module.

Published: Feb 15, 2023 License: MIT Imports: 8 Imported by: 0

Documentation

Constants

const (
	// NameOfXici identifies Xici Proxy (西刺代理): `https://www.xicidaili.com/nn/`
	NameOfXici = "xici"
	// NameOfKuai identifies Kuaidaili (快代理): `https://www.kuaidaili.com/ops/`, `https://www.kuaidaili.com/free/`
	NameOfKuai = "kuai"
	// NameOfYun identifies Yun Proxy (云代理); relatively high quality: `http://www.ip3366.net/free/`
	NameOfYun = "yun"
	// NameOfIphai identifies IP Hai Proxy (ip海代理): `http://www.iphai.com/free/ng`
	NameOfIphai = "iphai"
	// NameOfXila identifies Xila Free Proxy (西拉免费代理): `http://www.xiladaili.com/`
	NameOfXila = "xila"
	// NameOfNima identifies Nima Proxy (泥马代理); large volume: `http://www.nimadaili.com/`
	NameOfNima = "nima"
	// NameOfEightnine identifies 89 Free Proxy (89免费代理): `http://www.89ip.cn/`
	NameOfEightnine = "eightnine"
	// NameOfHappy identifies Kaixin ("Happy") Proxy (开心代理): `http://ip.kxdaili.com/`
	NameOfHappy = "kaixin"
)
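const (
	Idle     = iota // the spider is waiting for its next crawl
	Crawling        // a crawl is in progress
	CoolDown        // the spider is sleeping after a crawl (see CoolDownTime)
)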
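Because the state constants are declared with `iota`, `Idle`, `Crawling`, and `CoolDown` take the values 0, 1, and 2. The sketch below assumes they describe a lifecycle of idle, crawling, then cooling down after a crawl; the `stateName` helper is hypothetical and not part of the package, added only to show how the untyped values might be turned into log labels.

```go
package main

import "fmt"

// Mirrors the package's constants: iota assigns 0, 1, 2 in declaration order.
const (
	Idle     = iota // 0
	Crawling        // 1
	CoolDown        // 2
)

// stateName is a hypothetical helper (not in the package) that maps a
// state value to a human-readable label for logging.
func stateName(s int) string {
	switch s {
	case Idle:
		return "idle"
	case Crawling:
		return "crawling"
	case CoolDown:
		return "cool-down"
	default:
		return fmt.Sprintf("unknown(%d)", s)
	}
}

func main() {
	// Assumed cycle: Idle -> Crawling -> CoolDown -> Idle.
	for _, s := range []int{Idle, Crawling, CoolDown, Idle} {
		fmt.Println(s, stateName(s))
	}
}
```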

Variables

This section is empty.

Functions

func CoolDownTime

func CoolDownTime(d time.Duration) func(*Spider)

CoolDownTime sets how long the spider sleeps after each crawlOnce call, which reduces the risk of the website banning the crawler's IP address.

func Limit

func Limit(rule *colly.LimitRule) func(*Spider)

Limit sets the colly.LimitRule used by the spider's Collector.

func Period

func Period(d time.Duration) func(*Spider)

Period sets the interval duration, i.e. the time the spider sleeps after each URL is crawled.

Types

type Spider

type Spider struct {
	// contains filtered or unexported fields
}

Spider is a crawler instance that runs crawling jobs.

func BuildAndInitAll

func BuildAndInitAll() (spiders []*Spider)

BuildAndInitAll builds, initializes, and returns all enabled spiders.

func NewSpider

func NewSpider(name string, lr *colly.LimitRule) *Spider

NewSpider creates a new Spider with the given name and limit rule lr, using default values for the rest of its configuration.

func (*Spider) Start

func (s *Spider) Start(ch proxy.CachedChan)

Start repeatedly calls crawlOnce, either after sleeping for the period duration or when a re-crawl request arrives on a channel.

func (*Spider) TryCrawl

func (s *Spider) TryCrawl()

TryCrawl sends a signal on the needCrawl channel when the spider is Idle.
