Documentation

Overview

    crawl master module

    Index

    Constants

    This section is empty.

    Variables

    This section is empty.

    Functions

    This section is empty.

    Types

    type Spider

    type Spider struct {
    	// contains filtered or unexported fields
    }

    func NewSpider

    func NewSpider(pageinst page_processer.PageProcesser, taskname string) *Spider

      NewSpider creates a Spider, the scheduling module that coordinates all the other modules, such as the downloader, pipeline and scheduler. The taskname may be an empty string; otherwise it can be used in a Pipeline to record which task crawled a given result.
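      The chained style of this API, where each setter returns *Spider, and the role of taskname can be illustrated without the library. The sketch below is self-contained and hypothetical: miniSpider, Pipeline and collectPipeline are stand-ins invented for illustration, not go_spider's real types, and the Process signature is an assumption.

```go
package main

import "fmt"

// Pipeline is a hypothetical stand-in for pipeline.Pipeline: it receives
// each crawled result together with the name of the task that produced it.
type Pipeline interface {
	Process(result string, taskname string)
}

// collectPipeline records "taskname: result" strings (hypothetical).
type collectPipeline struct {
	records []string
}

func (c *collectPipeline) Process(result string, taskname string) {
	c.records = append(c.records, taskname+": "+result)
}

// miniSpider is a hypothetical, heavily simplified Spider.
type miniSpider struct {
	taskname  string
	threadnum uint
	pipelines []Pipeline
}

func newMiniSpider(taskname string) *miniSpider {
	return &miniSpider{taskname: taskname, threadnum: 1}
}

// AddPipeline returns the receiver so calls can be chained.
func (s *miniSpider) AddPipeline(p Pipeline) *miniSpider {
	s.pipelines = append(s.pipelines, p)
	return s
}

// SetThreadnum also returns the receiver for chaining.
func (s *miniSpider) SetThreadnum(n uint) *miniSpider {
	s.threadnum = n
	return s
}

// emit hands one result to every registered pipeline, tagged with the task name.
func (s *miniSpider) emit(result string) {
	for _, p := range s.pipelines {
		p.Process(result, s.taskname)
	}
}

func main() {
	out := &collectPipeline{}
	s := newMiniSpider("news").AddPipeline(out).SetThreadnum(4)
	s.emit("title=Hello")
	fmt.Println(out.records[0], s.threadnum) // news: title=Hello 4
}
```

      Because every setter returns the same pointer, configuration reads as one expression, and the taskname travels with each result so a shared Pipeline can tell tasks apart.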

    func (*Spider) AddPipeline

    func (this *Spider) AddPipeline(p pipeline.Pipeline) *Spider

    func (*Spider) AddRequest

    func (this *Spider) AddRequest(req *request.Request) *Spider

      AddRequest adds a Request to the scheduler.
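      A minimal sketch of the queue that AddRequest could feed, assuming a FIFO scheduler. The Request, queueScheduler, Push, Poll and Count names are hypothetical stand-ins, not the real request or scheduler packages; the mutex lets callers add requests while crawler goroutines poll.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// Request is a hypothetical, reduced stand-in for request.Request.
type Request struct {
	Url      string
	RespType string
}

// queueScheduler is a hypothetical FIFO scheduler guarded by a mutex so that
// AddRequest callers and crawler goroutines can use it concurrently.
type queueScheduler struct {
	mu    sync.Mutex
	queue *list.List
}

func newQueueScheduler() *queueScheduler {
	return &queueScheduler{queue: list.New()}
}

// Push appends a request to the tail of the queue.
func (q *queueScheduler) Push(req *Request) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.queue.PushBack(req)
}

// Poll removes and returns the oldest request, or nil if the queue is empty.
func (q *queueScheduler) Poll() *Request {
	q.mu.Lock()
	defer q.mu.Unlock()
	front := q.queue.Front()
	if front == nil {
		return nil
	}
	return q.queue.Remove(front).(*Request)
}

// Count reports how many requests are waiting.
func (q *queueScheduler) Count() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.queue.Len()
}

func main() {
	s := newQueueScheduler()
	s.Push(&Request{Url: "http://example.com/a", RespType: "html"})
	s.Push(&Request{Url: "http://example.com/b", RespType: "json"})
	fmt.Println(s.Count(), s.Poll().Url) // 2 http://example.com/a
}
```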

    func (*Spider) AddRequests

    func (this *Spider) AddRequests(reqs []*request.Request) *Spider

    func (*Spider) AddUrl

    func (this *Spider) AddUrl(url string, respType string) *Spider

    func (*Spider) AddUrlEx

    func (this *Spider) AddUrlEx(url string, respType string, headerFile string, proxyHost string) *Spider

    func (*Spider) AddUrlWithHeaderFile

    func (this *Spider) AddUrlWithHeaderFile(url string, respType string, headerFile string) *Spider

    func (*Spider) AddUrls

    func (this *Spider) AddUrls(urls []string, respType string) *Spider

    func (*Spider) AddUrlsEx

    func (this *Spider) AddUrlsEx(urls []string, respType string, headerFile string, proxyHost string) *Spider

    func (*Spider) AddUrlsWithHeaderFile

    func (this *Spider) AddUrlsWithHeaderFile(urls []string, respType string, headerFile string) *Spider
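      The AddUrl* signatures suggest layered convenience wrappers. The sketch below assumes the shorter forms simply delegate to the most general one with an empty headerFile and proxyHost; that delegation is an assumption, not something the documentation confirms, and urlSpider and the Request fields are hypothetical.

```go
package main

import "fmt"

// Request is a hypothetical stand-in carrying the per-URL settings that the
// AddUrl* variants accept.
type Request struct {
	Url        string
	RespType   string
	HeaderFile string
	ProxyHost  string
}

// urlSpider is a hypothetical spider that only records queued requests.
type urlSpider struct {
	queued []*Request
}

func (s *urlSpider) AddRequest(req *Request) *urlSpider {
	s.queued = append(s.queued, req)
	return s
}

// AddUrlEx is the most general single-URL form; the others delegate to it (assumed).
func (s *urlSpider) AddUrlEx(url, respType, headerFile, proxyHost string) *urlSpider {
	return s.AddRequest(&Request{Url: url, RespType: respType, HeaderFile: headerFile, ProxyHost: proxyHost})
}

func (s *urlSpider) AddUrlWithHeaderFile(url, respType, headerFile string) *urlSpider {
	return s.AddUrlEx(url, respType, headerFile, "")
}

func (s *urlSpider) AddUrl(url, respType string) *urlSpider {
	return s.AddUrlEx(url, respType, "", "")
}

// AddUrlsEx applies the same settings to every URL in the slice.
func (s *urlSpider) AddUrlsEx(urls []string, respType, headerFile, proxyHost string) *urlSpider {
	for _, u := range urls {
		s.AddUrlEx(u, respType, headerFile, proxyHost)
	}
	return s
}

func (s *urlSpider) AddUrls(urls []string, respType string) *urlSpider {
	return s.AddUrlsEx(urls, respType, "", "")
}

func main() {
	s := &urlSpider{}
	s.AddUrl("http://example.com/", "html").
		AddUrls([]string{"http://example.com/1", "http://example.com/2"}, "json").
		AddUrlEx("http://example.com/p", "html", "/abs/path/header.json", "127.0.0.1:8080")
	fmt.Println(len(s.queued)) // 4
}
```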

    func (*Spider) CloseFileLog

    func (this *Spider) CloseFileLog() *Spider

      CloseFileLog closes the file log.

    func (*Spider) CloseStrace

    func (this *Spider) CloseStrace() *Spider

      CloseStrace closes strace, the on-screen progress output.

    func (*Spider) Get

    func (this *Spider) Get(url string, respType string) *page_items.PageItems

      Get processes one URL and returns its PageItems.

    func (*Spider) GetAll

    func (this *Spider) GetAll(urls []string, respType string) []*page_items.PageItems

      GetAll processes several URLs and returns a slice of their PageItems.
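      One possible way to build GetAll on top of Get, assuming URLs may be processed concurrently: each goroutine writes only to its own slot, so the returned slice stays in input order. PageItems here is a simplified hypothetical type and fetchAndProcess is a placeholder for downloading plus page processing; no network access happens.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// PageItems is a hypothetical, simplified stand-in for page_items.PageItems.
type PageItems struct {
	Url   string
	Items map[string]string
}

// fetchAndProcess is a hypothetical placeholder for "download the URL and run
// the PageProcesser on it"; here it just derives a field from the URL.
func fetchAndProcess(url, respType string) *PageItems {
	name := url[strings.LastIndex(url, "/")+1:]
	return &PageItems{Url: url, Items: map[string]string{"name": name, "type": respType}}
}

// Get handles a single URL synchronously.
func Get(url, respType string) *PageItems {
	return fetchAndProcess(url, respType)
}

// GetAll handles the URLs concurrently; each goroutine writes only to its own
// slot, so results[i] always corresponds to urls[i] and no lock is needed.
func GetAll(urls []string, respType string) []*PageItems {
	results := make([]*PageItems, len(urls))
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			results[i] = Get(u, respType)
		}(i, u)
	}
	wg.Wait()
	return results
}

func main() {
	items := GetAll([]string{"http://example.com/a", "http://example.com/b"}, "html")
	fmt.Println(items[0].Items["name"], items[1].Items["name"]) // a b
}
```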

    func (*Spider) GetAllByRequest

    func (this *Spider) GetAllByRequest(reqs []*request.Request) []*page_items.PageItems

      GetAllByRequest processes several Requests and returns a slice of their PageItems.

    func (*Spider) GetByRequest

    func (this *Spider) GetByRequest(req *request.Request) *page_items.PageItems

      GetByRequest processes one Request, which can carry additional settings, and returns its PageItems.

    func (*Spider) GetDownloader

    func (this *Spider) GetDownloader() downloader.Downloader

    func (*Spider) GetExitWhenComplete

    func (this *Spider) GetExitWhenComplete() bool

    func (*Spider) GetScheduler

    func (this *Spider) GetScheduler() scheduler.Scheduler

    func (*Spider) GetThreadnum

    func (this *Spider) GetThreadnum() uint

    func (*Spider) OpenFileLog

    func (this *Spider) OpenFileLog(filePath string) *Spider

      OpenFileLog initializes the log path and opens the file log. While the log is open, errors and other useful information from the spider are written to the file at filePath, which is an absolute path. To write a message, call mlog.LogInst().LogError("info") or mlog.LogInst().LogInfo("info"). The file log is closed by default.

    func (*Spider) OpenFileLogDefault

    func (this *Spider) OpenFileLogDefault() *Spider

      OpenFileLogDefault opens the file log at a default path such as "WD/log/log.2014-9-1".

    func (*Spider) OpenStrace

    func (this *Spider) OpenStrace() *Spider

      OpenStrace enables strace, which prints progress information to the screen. Strace is enabled by default.

    func (*Spider) Run

    func (this *Spider) Run()

    func (*Spider) SetDownloader

    func (this *Spider) SetDownloader(d downloader.Downloader) *Spider

    func (*Spider) SetExitWhenComplete

    func (this *Spider) SetExitWhenComplete(e bool) *Spider

      SetExitWhenComplete sets whether the spider exits once every crawl task is done. To keep the spider in memory and keep adding URLs from outside, set it to false.
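      A single-worker sketch of how an exit-when-complete flag can steer a crawl loop, assuming the loop polls a queue and idles when it is empty. runLoop and its callbacks are hypothetical; a multi-threaded version would also have to wait for in-flight tasks before exiting, since they may still add new URLs.

```go
package main

import (
	"fmt"
	"time"
)

// runLoop is a hypothetical, single-worker sketch of a crawl loop. poll returns
// the next URL, or false when the queue is currently empty.
func runLoop(poll func() (string, bool), handle func(string), exitWhenComplete bool) {
	for {
		url, ok := poll()
		if !ok {
			if exitWhenComplete {
				return // queue drained: stop
			}
			// Stay resident and wait for URLs added from outside.
			time.Sleep(100 * time.Millisecond)
			continue
		}
		handle(url)
	}
}

func main() {
	queue := []string{"http://example.com/1", "http://example.com/2"}
	poll := func() (string, bool) {
		if len(queue) == 0 {
			return "", false
		}
		u := queue[0]
		queue = queue[1:]
		return u, true
	}
	handled := 0
	runLoop(poll, func(string) { handled++ }, true)
	fmt.Println(handled) // 2
}
```

      With the flag set to false the loop never returns, so URLs would have to arrive from another goroutine; the plain slice used by poll here is not safe for that and would need a synchronized queue.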

    func (*Spider) SetScheduler

    func (this *Spider) SetScheduler(s scheduler.Scheduler) *Spider

    func (*Spider) SetSleepTime

    func (this *Spider) SetSleepTime(sleeptype string, s uint, e uint) *Spider

      SetSleepTime sets the pause after each crawl task, in milliseconds. If sleeptype is "fixed", s is the sleep time and e is ignored. If sleeptype is "rand", the sleep time is a random value between s and e.

    func (*Spider) SetThreadnum

    func (this *Spider) SetThreadnum(i uint) *Spider

    func (*Spider) Taskname

    func (this *Spider) Taskname() string

    Source Files