Documentation
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type Channels
Channels is a map of Page channels keyed by HTTP response code, so that each response code can be given its own handling.
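The declaration of Channels is not reproduced on this page, so the following is only a minimal sketch of the idea described above. It assumes the underlying type is a map from status code to a channel of *Page. The Page fields and the dispatch helper are hypothetical stand-ins and are not part of the package.

```go
package main

// Page is a hypothetical stand-in for the package's Page type;
// its real fields are not shown in this documentation.
type Page struct {
	URL        string
	StatusCode int
}

// Channels is assumed here to map an HTTP response code to a channel of pages.
type Channels map[int]chan *Page

// dispatch (hypothetical helper) sends p to the channel registered for its
// status code. It returns false when no channel is registered for that code
// or when the channel cannot accept the page right now, so it never blocks.
func dispatch(chs Channels, p *Page) bool {
	ch, ok := chs[p.StatusCode]
	if !ok {
		return false
	}
	select {
	case ch <- p:
		return true
	default:
		return false
	}
}
```

With a map such as Channels{200: okPages, 404: missingPages}, a consumer can read successful pages from one channel and log missing ones from another, while pages with unregistered codes are simply reported as undelivered.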
type Config (added in v0.1.6)
type Config struct {
	StartURL            string
	AllowedDomains      []string      // Domains to stay within
	UserAgents          []string
	CrawlDelay          time.Duration // Delay between requests to the same domain
	MaxDepth            int           // Maximum crawl depth
	MaxRetries          int           // Max retries for a failed request
	RequestTimeout      time.Duration
	QueueIdleTimeout    time.Duration
	ProxyURL            string // e.g., "http://user:pass@host:port"
	RobotsUserAgent     string // User agent to use for robots.txt checks
	ConcurrentRequests  int    // Number of concurrent fetch workers
	Channels            Channels
	Headers             map[string]string
	LanguageCode        string
	Filters             []func(*Page, *Config) bool
	MaxIdleConnsPerHost int
	MaxIdleConns        int
	Proxies             []string
	RequireHeadless     bool
}
Config holds the crawler configuration.
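The Filters field takes functions of the form func(*Page, *Config) bool, but this page does not say what the return value means or how filters are combined. The sketch below assumes that true means "keep the page" and that every filter must agree. Page and the trimmed Config are local stand-ins, and inAllowedDomain and keep are hypothetical helpers rather than package functions.

```go
package main

import (
	"net/url"
	"strings"
)

// Page is a hypothetical stand-in; the real Page type is not shown here.
type Page struct {
	URL string
}

// Config mirrors only the two documented fields this sketch needs.
type Config struct {
	AllowedDomains []string
	Filters        []func(*Page, *Config) bool
}

// inAllowedDomain reports whether the page's host equals, or is a subdomain
// of, one of cfg.AllowedDomains. Unparseable URLs are rejected.
func inAllowedDomain(p *Page, cfg *Config) bool {
	u, err := url.Parse(p.URL)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	for _, d := range cfg.AllowedDomains {
		d = strings.ToLower(d)
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

// keep runs every filter in order. Assumption: a page is kept only when all
// filters return true; the real crawler may combine them differently.
func keep(p *Page, cfg *Config) bool {
	for _, f := range cfg.Filters {
		if !f(p, cfg) {
			return false
		}
	}
	return true
}
```

Passing the Config to each filter lets a filter read settings such as AllowedDomains directly instead of capturing them in a closure.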
type Crawler
type Crawler struct {
// contains filtered or unexported fields
}
Crawler represents the web crawler
func NewCrawler
func NewCrawler(config Config, queue queue.QueueInterface) (*Crawler, error)
NewCrawler initializes a new Crawler
type Headless (added in v0.7.0)
type Headless struct{}
func NewHeadless (added in v0.7.0)
func NewHeadless() *Headless