spider

package
v1.0.0
Warning

This package is not in the latest version of its module.

Published: Jul 6, 2021 License: MIT Imports: 11 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type HandleRequestFunc

type HandleRequestFunc func(*model.Request)

type RunFunc

type RunFunc func(*Spider) error

type Spider

type Spider struct {
	PageCount int64 // number of pages crawled

	OnSuccess     HandleRequestFunc // called when a request succeeds
	OnFailed      HandleRequestFunc // called when a request fails
	RunBeforeFunc RunFunc
	RunAfterFunc  RunFunc
	// contains filtered or unexported fields
}

Spider is the crawler. It integrates the downloader, processor, scheduler, and persistence modules.
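The exported hook fields can be illustrated with a small, self-contained sketch. Everything here is a stand-in: `Request` replaces `model.Request` (whose fields are not shown on this page), the `run` driver and `crawlDemo` are hypothetical, and the order in which the real package invokes the hooks is an assumption inferred from the field names.

```go
package main

import (
	"errors"
	"fmt"
)

// Request is a hypothetical stand-in for model.Request.
type Request struct {
	URL string
}

// These mirror the documented signatures, using the stand-in Request.
type HandleRequestFunc func(*Request)
type RunFunc func(*Spider) error

// Spider is a stand-in that exposes only the exported fields listed above.
type Spider struct {
	PageCount     int64
	OnSuccess     HandleRequestFunc
	OnFailed      HandleRequestFunc
	RunBeforeFunc RunFunc
	RunAfterFunc  RunFunc
}

// run is a hypothetical driver (not the package's Start): it calls
// RunBeforeFunc, fetches each request, dispatches to OnSuccess or OnFailed,
// and finally calls RunAfterFunc.
func run(s *Spider, reqs []Request, fetch func(Request) error) error {
	if s.RunBeforeFunc != nil {
		if err := s.RunBeforeFunc(s); err != nil {
			return err
		}
	}
	for i := range reqs {
		r := &reqs[i]
		if err := fetch(*r); err != nil {
			if s.OnFailed != nil {
				s.OnFailed(r)
			}
			continue
		}
		s.PageCount++
		if s.OnSuccess != nil {
			s.OnSuccess(r)
		}
	}
	if s.RunAfterFunc != nil {
		return s.RunAfterFunc(s)
	}
	return nil
}

// crawlDemo wires up all four hooks and runs three simulated requests,
// one of which fails.
func crawlDemo() (succeeded, failed []string, pages int64, err error) {
	s := &Spider{
		OnSuccess:     func(r *Request) { succeeded = append(succeeded, r.URL) },
		OnFailed:      func(r *Request) { failed = append(failed, r.URL) },
		RunBeforeFunc: func(*Spider) error { return nil },
		RunAfterFunc:  func(sp *Spider) error { pages = sp.PageCount; return nil },
	}
	// fetch simulates a download; no network access is performed.
	fetch := func(r Request) error {
		if r.URL == "https://example.com/bad" {
			return errors.New("simulated failure")
		}
		return nil
	}
	reqs := []Request{
		{URL: "https://example.com/a"},
		{URL: "https://example.com/bad"},
		{URL: "https://example.com/b"},
	}
	err = run(s, reqs, fetch)
	return
}

func main() {
	succeeded, failed, pages, err := crawlDemo()
	fmt.Println(succeeded, failed, pages, err)
}
```

With the real package, the same closures would presumably be assigned to the fields of the `*Spider` returned by `Create` before calling `Start`.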

func Create

func Create(processor processor.PageProcessor) *Spider

Create creates a spider.

func (*Spider) Close

func (s *Spider) Close()

Close shuts down the spider.

func (*Spider) GetCloseAfterNotHandleRequest

func (s *Spider) GetCloseAfterNotHandleRequest() []model.Request

GetCloseAfterNotHandleRequest returns all requests that have not been handled.

func (*Spider) SetDownloader

func (s *Spider) SetDownloader(downloader downloader.Downloader) *Spider

SetDownloader sets the downloader.

func (*Spider) SetExitWhenComplete

func (s *Spider) SetExitWhenComplete(exitWhenComplete bool) *Spider

SetExitWhenComplete sets whether the program exits once the download tasks are complete.

func (*Spider) SetIdleTimeout

func (s *Spider) SetIdleTimeout(duration time.Duration) *Spider

SetIdleTimeout sets the idle time after which the spider exits.

func (*Spider) SetScheduler

func (s *Spider) SetScheduler(scheduler scheduler.Scheduler) *Spider

SetScheduler sets the scheduler.

func (*Spider) SetStorage

func (s *Spider) SetStorage(stg persist.Storage) *Spider

SetStorage sets the storage used for data persistence.

func (*Spider) Start

func (s *Spider) Start()

Start starts the spider.

func (*Spider) StartRequest

func (s *Spider) StartRequest(startRequest ...model.Request) *Spider

StartRequest sets the seed requests.

func (*Spider) StartUrls

func (s *Spider) StartUrls(startUrls ...string) *Spider

StartUrls sets the seed URLs.

func (*Spider) ThreadNum

func (s *Spider) ThreadNum(num int) *Spider

ThreadNum sets the number of concurrent workers.
