Documentation ¶
Overview ¶
Package crawler provides the functionality for crawling web pages.
Index ¶
- type CrawlResult
- type Crawler
- func (c *Crawler) Crawl() error
- func (c *Crawler) CrawlPage(url string) ([]string, CrawlResult, error)
- func (c *Crawler) FormatRelative(urls map[string]int) (formatedUrls []string)
- func (c *Crawler) GetLinks(doc *goquery.Document) []string
- func (c *Crawler) GetRequest(url string) (*goquery.Document, error)
- func (c *Crawler) GetResult(doc *goquery.Document, url string) CrawlResult
- func (c *Crawler) ParseBase() error
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CrawlResult ¶
CrawlResult holds the result of crawling a single page.
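The struct body is not reproduced on this page; judging by the CrawlPage description below, it likely carries the page's URL and title. The following is a sketch only, and the actual field names and tags are assumptions:

type CrawlResult struct {
	URL   string `json:"URL"`   // assumed field name
	Title string `json:"Title"` // assumed field name
}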
type Crawler ¶
type Crawler struct {
	ID         string        `json:"ID"`
	BaseURL    string        `json:"BaseURL"`
	StartURL   string        `json:"StartURL"`
	PagesLimit int           `json:"PagesLimit"`
	Results    []CrawlResult `json:"Results"`
}
Crawler defines a default crawler.
func (*Crawler) Crawl ¶
func (c *Crawler) Crawl() error
Crawl crawls the entire host of the given StartURL and saves the data (URLs and titles) to the Crawler struct.
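A minimal usage sketch: the import path and start URL are placeholders, and PagesLimit is presumably a cap on how many pages get crawled.

package main

import (
	"fmt"
	"log"

	"example.com/crawler" // placeholder import path; use the package's real module path
)

func main() {
	c := &crawler.Crawler{
		StartURL:   "https://example.com", // placeholder start URL
		PagesLimit: 10,                    // assumed to cap the number of crawled pages
	}
	if err := c.Crawl(); err != nil {
		log.Fatal(err)
	}
	// Crawl saves its findings on the struct itself.
	for _, r := range c.Results {
		fmt.Println(r)
	}
}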
func (*Crawler) CrawlPage ¶
func (c *Crawler) CrawlPage(url string) ([]string, CrawlResult, error)
CrawlPage crawls a single page and returns the discovered links as []string, a CrawlResult (page URL and title), and an error.
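CrawlPage can also be called directly for one page; a sketch, assuming a crawler c configured as in the Crawl example above (the URL is a placeholder):

links, result, err := c.CrawlPage("https://example.com/about") // placeholder URL
if err != nil {
	log.Fatal(err)
}
fmt.Println(result)     // the page's URL and title
fmt.Println(len(links)) // number of links found on the page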
func (*Crawler) FormatRelative ¶
func (c *Crawler) FormatRelative(urls map[string]int) (formatedUrls []string)
FormatRelative converts relative links encountered during crawling into absolute links.
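The role of the int values in the input map is not documented here; a sketch under the assumption that the keys are relative paths and that results are resolved against BaseURL:

rel := map[string]int{"/about": 1, "/contact": 1} // hypothetical relative links; int values assumed to be counts
abs := c.FormatRelative(rel)
fmt.Println(abs) // presumably e.g. [https://example.com/about https://example.com/contact]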
func (*Crawler) GetRequest ¶
func (c *Crawler) GetRequest(url string) (*goquery.Document, error)
GetRequest is a helper for CrawlPage. It requests a page and returns a *goquery.Document and an error.
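GetRequest can be used on its own as well; a sketch assuming a reachable placeholder URL, relying on the standard goquery selection API for the returned document:

doc, err := c.GetRequest("https://example.com") // placeholder URL
if err != nil {
	log.Fatal(err)
}
// *goquery.Document supports jQuery-style selection.
fmt.Println(doc.Find("title").Text())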