bolt

Version: v0.0.0-...-579422c
Published: Apr 6, 2020 License: MIT Imports: 6 Imported by: 0

README

BoltDB storage for Colly

A BoltDB storage back-end for the Colly web crawling/scraping framework.

It implements both the storage and queue interfaces.

Example Usage:

package main

import (
	"github.com/gocolly/colly/v2"
	bolt "src.userspace.com.au/colly-bolt-storage"
)

func main() {
	c := colly.NewCollector(
		colly.AllowedDomains("www.example.com"),
	)

	// Open (or create) the BoltDB file that holds the crawl state.
	storage, err := bolt.New("./state.bdb")
	if err != nil {
		panic(err)
	}
	defer storage.Close()

	// Register the storage with the collector.
	if err := c.SetStorage(storage); err != nil {
		panic(err)
	}

	// ...
}

Documentation

Constants

This section is empty.

Variables

var (

	// ErrEmptyQueue is returned when a URL is requested from an empty queue.
	ErrEmptyQueue = fmt.Errorf("queue is empty")
)
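
Since ErrEmptyQueue marks the normal end of a queue rather than a failure, callers usually want to treat it separately from other errors. The following is a minimal sketch of that pattern, not code from this package: errEmptyQueue, requestQueue, memQueue and drain are hypothetical stand-ins so the example runs with the standard library alone, and it assumes GetRequest is the call that reports the empty queue.

```go
package main

import (
	"errors"
	"fmt"
)

// errEmptyQueue is a local stand-in for bolt.ErrEmptyQueue (assumption).
var errEmptyQueue = fmt.Errorf("queue is empty")

// requestQueue is the part of the Storage method set used in this sketch.
type requestQueue interface {
	GetRequest() ([]byte, error)
}

// memQueue is a hypothetical in-memory queue used only for illustration.
type memQueue struct{ items [][]byte }

func (q *memQueue) GetRequest() ([]byte, error) {
	if len(q.items) == 0 {
		return nil, errEmptyQueue
	}
	item := q.items[0]
	q.items = q.items[1:]
	return item, nil
}

// drain pops requests until the queue reports that it is empty.
func drain(q requestQueue) ([][]byte, error) {
	var out [][]byte
	for {
		req, err := q.GetRequest()
		if errors.Is(err, errEmptyQueue) {
			return out, nil // an empty queue is the expected end condition
		}
		if err != nil {
			return out, err // any other error is a real failure
		}
		out = append(out, req)
	}
}

func main() {
	q := &memQueue{items: [][]byte{[]byte("a"), []byte("b")}}
	reqs, err := drain(q)
	fmt.Println(len(reqs), err)
}
```

Using errors.Is rather than == keeps the check working even if the sentinel is ever returned wrapped.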

Functions

This section is empty.

Types

type Logger

type Logger func(...interface{})

Logger is the function type used for debug logging.
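
Because Logger has the signature func(...interface{}), any function of that shape can serve as one, including the Println method of a standard library *log.Logger. The sketch below redeclares the type locally so it runs on its own; capture and the "bolt: " prefix are illustrative choices, not part of this package.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
)

// Logger mirrors the type documented above; it is redeclared here
// only so the sketch compiles without the package.
type Logger func(...interface{})

// capture builds a Logger from a *log.Logger and returns what it wrote.
func capture() string {
	var buf bytes.Buffer

	// The method value Println has type func(...interface{}),
	// so it can be assigned to a Logger directly.
	var logf Logger = log.New(&buf, "bolt: ", 0).Println

	logf("opened", "./state.bdb")
	return buf.String()
}

func main() {
	fmt.Print(capture())
}
```

With the real package, a value built this way would be passed to the Debug option.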

type Option

type Option func(*Storage) error

Option enables configuration of the storage.

func Debug

func Debug(l Logger) Option

Debug sets a Logger for the storage.

func Mode

func Mode(m os.FileMode) Option

Mode determines the file creation mode. It defaults to 0666.

func NoHistory

func NoHistory(def bool) Option

NoHistory configures the storage not to record visit history. The bool parameter is the default answer returned when Colly asks whether a URL has been visited (IsVisited). Use this when you track visited state externally.

func Timeout

func Timeout(t time.Duration) Option

Timeout sets the underlying BoltDB timeout.

func Unique

func Unique() Option

Unique ensures unique entries.
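
The options above follow the functional options pattern: each one returns a function that mutates the storage and may report an error. The sketch below shows how a constructor typically applies such options. It is not this package's implementation: store, option, newStore and the lower-case option constructors are hypothetical stand-ins, and the negative-timeout check exists only to show an option returning an error. The 0666 default is the one documented for Mode.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// store is a hypothetical stand-in for Storage.
type store struct {
	mode    os.FileMode
	timeout time.Duration
	unique  bool
}

// option has the same shape as Option: func(*Storage) error.
type option func(*store) error

func mode(m os.FileMode) option {
	return func(s *store) error {
		s.mode = m
		return nil
	}
}

func timeout(t time.Duration) option {
	return func(s *store) error {
		if t < 0 {
			// Illustrative validation only; the real option may differ.
			return errors.New("timeout must not be negative")
		}
		s.timeout = t
		return nil
	}
}

func unique() option {
	return func(s *store) error {
		s.unique = true
		return nil
	}
}

// newStore starts from defaults and applies each option in order,
// stopping at the first one that fails.
func newStore(opts ...option) (*store, error) {
	s := &store{mode: 0666}
	for _, opt := range opts {
		if err := opt(s); err != nil {
			return nil, err
		}
	}
	return s, nil
}

// example configures a store with all three options.
func example() (*store, error) {
	return newStore(mode(0600), timeout(5*time.Second), unique())
}

func main() {
	s, err := example()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%v %v %v\n", s.mode, s.timeout, s.unique)
}
```

Going by the signatures documented on this page, the equivalent call against the real package would be bolt.New("./state.bdb", bolt.Mode(0600), bolt.Timeout(5*time.Second), bolt.Unique()).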

type Storage

type Storage struct {
	// contains filtered or unexported fields
}

Storage is an implementation of the colly/queue and colly/storage interfaces.
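
To make "implements both interfaces" concrete, the sketch below declares two local interfaces whose method sets match the methods listed on this page, then uses compile-time assertions to check that one type satisfies both. The names visitStorage, queueStorage and memStorage are hypothetical; memStorage is an in-memory stand-in, not the BoltDB-backed Storage.

```go
package main

import (
	"fmt"
	"net/url"
)

// visitStorage covers the visit-history and cookie methods.
type visitStorage interface {
	Init() error
	Visited(id uint64) error
	IsVisited(id uint64) (bool, error)
	Cookies(u *url.URL) string
	SetCookies(u *url.URL, cookies string)
}

// queueStorage covers the request-queue methods.
type queueStorage interface {
	Init() error
	AddRequest(request []byte) error
	GetRequest() ([]byte, error)
	QueueSize() (int, error)
}

// memStorage is a hypothetical in-memory type with the same method set.
type memStorage struct {
	visited map[uint64]bool
	cookies map[string]string
	queue   [][]byte
}

func (s *memStorage) Init() error {
	s.visited = map[uint64]bool{}
	s.cookies = map[string]string{}
	return nil
}

func (s *memStorage) Visited(id uint64) error {
	s.visited[id] = true
	return nil
}

func (s *memStorage) IsVisited(id uint64) (bool, error) {
	return s.visited[id], nil
}

func (s *memStorage) Cookies(u *url.URL) string {
	return s.cookies[u.Host]
}

func (s *memStorage) SetCookies(u *url.URL, cookies string) {
	s.cookies[u.Host] = cookies
}

func (s *memStorage) AddRequest(request []byte) error {
	s.queue = append(s.queue, request)
	return nil
}

func (s *memStorage) GetRequest() ([]byte, error) {
	if len(s.queue) == 0 {
		return nil, fmt.Errorf("queue is empty")
	}
	r := s.queue[0]
	s.queue = s.queue[1:]
	return r, nil
}

func (s *memStorage) QueueSize() (int, error) {
	return len(s.queue), nil
}

// Compile-time checks: the build fails if a method is missing.
var (
	_ visitStorage = (*memStorage)(nil)
	_ queueStorage = (*memStorage)(nil)
)

func main() {
	s := &memStorage{}
	if err := s.Init(); err != nil {
		panic(err)
	}
	_ = s.AddRequest([]byte("request"))
	n, _ := s.QueueSize()
	fmt.Println("queued:", n)
}
```

The same assertion idiom can be pointed at the real colly interfaces and *bolt.Storage to catch interface drift at build time.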

func New

func New(path string, opts ...Option) (*Storage, error)

New creates a new storage implementation for Colly. A database will be created at the provided path if it does not already exist.

func (*Storage) AddRequest

func (s *Storage) AddRequest(request []byte) error

AddRequest implements colly's queue.Storage interface.

func (*Storage) Close

func (s *Storage) Close() error

Close ensures the database is left in a valid state.

func (*Storage) Cookies

func (s *Storage) Cookies(u *url.URL) string

Cookies implements the colly.Storage interface.

func (*Storage) GetRequest

func (s *Storage) GetRequest() ([]byte, error)

GetRequest implements colly's queue.Storage interface.

func (*Storage) Init

func (s *Storage) Init() error

Init implements the colly.Storage interface.

func (*Storage) IsVisited

func (s *Storage) IsVisited(id uint64) (bool, error)

IsVisited implements the colly.Storage interface.

func (*Storage) QueueSize

func (s *Storage) QueueSize() (int, error)

QueueSize implements colly's queue.Storage interface.

func (*Storage) SetCookies

func (s *Storage) SetCookies(u *url.URL, cookies string)

SetCookies implements the colly.Storage interface.

func (*Storage) Visited

func (s *Storage) Visited(id uint64) error

Visited implements the colly.Storage interface.
