wbot

package module
v0.2.2
Published: Feb 18, 2024 | License: MIT | Imports: 21 | Imported by: 1

This package is not in the latest version of its module.

README

WBot

A configurable, thread-safe web crawler that provides a minimal interface for crawling and downloading web pages.

Features

  • Clean minimal API.
  • Configurable: MaxDepth, MaxBodySize, Rate Limit, Parallelism, User Agent & Proxy rotation.
  • Memory-efficient, thread-safe.
  • Provides built-in interfaces: Fetcher, Store, Queue, and a Logger.

API

WBot provides a minimal API for crawling web pages.

 Run(links ...string) error
 OnReponse(fn func(*api.Response))
 Metrics() map[string]int64
 Shutdown()

Usage

package main

import (
	"fmt"
	"log"

	"github.com/rs/zerolog"

	"github.com/twiny/wbot"
	"github.com/twiny/wbot/pkg/api"
)

func main() {
	// Configure the crawler with functional options.
	bot := wbot.New(
		wbot.WithParallel(50),
		wbot.WithMaxDepth(5),
		wbot.WithRateLimit(&api.RateLimit{
			Hostname: "*",
			Rate:     "10/1s",
		}),
		wbot.WithLogLevel(zerolog.DebugLevel),
	)
	defer bot.Shutdown()

	// Register a callback that receives each crawled response.
	bot.OnReponse(func(resp *api.Response) {
		fmt.Printf("crawled: %s\n", resp.URL.String())
	})

	// Start crawling from the seed links.
	if err := bot.Run(
		"https://go.dev/",
	); err != nil {
		log.Fatal(err)
	}

	log.Println("finished crawling")
}

Wiki

More documentation can be found in the wiki.

Bugs

Bugs or suggestions? Please visit the issue tracker.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler (added in v0.2.2)

type Crawler struct {
	// contains filtered or unexported fields
}

func New (added in v0.2.2)

func New(opts ...Option) *Crawler

func (*Crawler) Metrics (added in v0.2.2)

func (c *Crawler) Metrics() map[string]int64
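
Because Metrics returns a plain map[string]int64, and Go does not guarantee map iteration order, a caller that wants stable output has to sort the keys first. The following is a minimal, self-contained sketch of that step. The metric names are hypothetical; the real keys are not listed in this documentation.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of a metrics map in alphabetical order,
// because Go map iteration order is not deterministic.
func sortedKeys(m map[string]int64) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Hypothetical metric names standing in for whatever Metrics() returns.
	metrics := map[string]int64{"pages_crawled": 42, "errors": 3}
	for _, k := range sortedKeys(metrics) {
		fmt.Printf("%s=%d\n", k, metrics[k])
	}
}
```

In a real program the map literal would be replaced by the value returned from the crawler's Metrics method.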

func (*Crawler) OnReponse (added in v0.2.2)

func (c *Crawler) OnReponse(fn func(*api.Response))

func (*Crawler) Run (added in v0.2.2)

func (c *Crawler) Run(links ...string) error

func (*Crawler) Shutdown (added in v0.2.2)

func (c *Crawler) Shutdown()

type Option

type Option func(c *Crawler)
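
The Option type follows what is commonly called the functional options pattern: each With... function returns a closure that mutates the crawler, and the constructor applies the closures in order. The sketch below reproduces the pattern with local stand-in types so that it runs without the wbot module. The names crawlerConfig, option, newConfig and the default values are assumptions for illustration, not wbot's internals.

```go
package main

import "fmt"

// crawlerConfig is a hypothetical stand-in for the unexported fields of Crawler.
type crawlerConfig struct {
	parallel int
	maxDepth int32
}

// option mirrors the shape of wbot's Option: a function that mutates its target.
type option func(c *crawlerConfig)

func withParallel(n int) option {
	return func(c *crawlerConfig) { c.parallel = n }
}

func withMaxDepth(d int32) option {
	return func(c *crawlerConfig) { c.maxDepth = d }
}

// newConfig sets defaults first, then applies each option in order,
// so a later option overrides an earlier one.
func newConfig(opts ...option) *crawlerConfig {
	c := &crawlerConfig{parallel: 1, maxDepth: 10} // assumed defaults
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	c := newConfig(withParallel(50), withMaxDepth(5))
	fmt.Println(c.parallel, c.maxDepth)
}
```

One benefit of this design is that new settings can be added as new With... functions without changing the constructor's signature.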

func WithFetcher (added in v0.2.2)

func WithFetcher(fetcher api.Fetcher) Option

func WithFilter (added in v0.2.2)

func WithFilter(rules ...*api.FilterRule) Option

func WithLogLevel (added in v0.2.2)

func WithLogLevel(level zerolog.Level) Option

func WithMaxDepth (added in v0.2.2)

func WithMaxDepth(maxDepth int32) Option
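
To illustrate what a depth limit means for a crawl, here is a minimal breadth-first traversal over an in-memory link graph that stops expanding links once a page sits at maxDepth. It is a sketch only: the graph, the crawl function, and the choice to count the seed as depth 0 are assumptions, and wbot's own depth accounting may differ.

```go
package main

import "fmt"

// item pairs a URL with its distance from the seed.
type item struct {
	url   string
	depth int32
}

// crawl visits pages breadth-first starting at seed (depth 0) and does not
// follow links from pages that are already at maxDepth or deeper.
func crawl(graph map[string][]string, seed string, maxDepth int32) []string {
	visited := map[string]bool{seed: true}
	queue := []item{{url: seed, depth: 0}}
	var order []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		order = append(order, cur.url)
		if cur.depth >= maxDepth {
			continue // depth limit reached: do not enqueue this page's links
		}
		for _, next := range graph[cur.url] {
			if !visited[next] {
				visited[next] = true
				queue = append(queue, item{url: next, depth: cur.depth + 1})
			}
		}
	}
	return order
}

func main() {
	// Hypothetical link graph: a links to b and c, b links to d, d links to e.
	graph := map[string][]string{
		"a": {"b", "c"},
		"b": {"d"},
		"d": {"e"},
	}
	fmt.Println(crawl(graph, "a", 2))
}
```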

func WithParallel (added in v0.2.2)

func WithParallel(parallel int) Option

func WithProxies (added in v0.2.2)

func WithProxies(proxies []string) Option

func WithQueue (added in v0.2.2)

func WithQueue(queue api.Queue) Option

func WithRateLimit (added in v0.2.2)

func WithRateLimit(rates ...*api.RateLimit) Option

func WithStore (added in v0.2.2)

func WithStore(store api.Store) Option

func WithUserAgents (added in v0.2.2)

func WithUserAgents(userAgents []string) Option
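
The feature list mentions User Agent and Proxy rotation, but this page does not say how the lists passed to WithUserAgents and WithProxies are rotated. As an illustration only, the sketch below shows a simple round-robin rotator that is safe to call from many goroutines. The rotator type and its methods are hypothetical and are not part of wbot.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// rotator hands out values from a fixed list in round-robin order.
// The cursor is advanced atomically, so next is safe for concurrent use.
type rotator struct {
	cursor uint64 // kept first so it stays 64-bit aligned on 32-bit platforms
	items  []string
}

func newRotator(items []string) *rotator {
	return &rotator{items: items}
}

// next returns the next item in the cycle, or "" when the list is empty.
func (r *rotator) next() string {
	if len(r.items) == 0 {
		return ""
	}
	i := atomic.AddUint64(&r.cursor, 1) - 1
	return r.items[i%uint64(len(r.items))]
}

func main() {
	// Placeholder user agent strings for illustration.
	ua := newRotator([]string{"agent-a", "agent-b", "agent-c"})
	for i := 0; i < 4; i++ {
		fmt.Println(ua.next())
	}
}
```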

Directories

Path      Synopsis
pkg/api
