crawler

package
v0.0.0-...-02f51c9 Latest
This package is not in the latest version of its module.

Published: May 31, 2023 | License: GPL-3.0 | Imports: 16 | Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	PrivateNetworkDetector PrivateNetworkDetector
	URLGetter              URLGetter
	Graph                  MiniGraph
	Indexer                MiniIndexer
	NumOfFetchWorkers      int
}

Config serves as a configuration object for the crawler.

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

Crawler executes a web crawler pipeline.

func New

func New(config Config) *Crawler

New returns a pointer to a Crawler that is fully configured from the given Config.

func (*Crawler) Crawl

func (c *Crawler) Crawl(
	ctx context.Context, linkIt graph.LinkIterator,
) (int, error)

Crawl executes the pipeline. Calls to Crawl block until the pipeline execution is complete.

type MiniGraph

type MiniGraph interface {
	// UpsertLink creates a new or updates an existing link.
	UpsertLink(link *graph.Link) error

	// UpsertEdge creates a new or updates an existing edge.
	UpsertEdge(edge *graph.Edge) error

	// RemoveStaleEdges removes any edge that originates from a specific link ID
	// and was updated before the specified [updatedBefore] time.
	RemoveStaleEdges(fromID uuid.UUID, updatedBefore time.Time) error
}

MiniGraph should be implemented by objects that can upsert links and edges into a link graph instance, i.e. graph updater objects.

type MiniIndexer

type MiniIndexer interface {
	// Index adds a new document or updates an existing index entry
	// in case of an existing document.
	Index(doc *index.Document) error
}

MiniIndexer should be implemented by objects that can index documents discovered by the crawler component, i.e. text indexer objects.

type PrivateNetworkDetector

type PrivateNetworkDetector interface {
	IsNetworkPrivate(address string) (bool, error)
}

PrivateNetworkDetector should be implemented by objects that can detect whether a host resolves to a private network address.
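One possible way to satisfy this interface is sketched below. It resolves the host through an injectable lookup function and checks each resulting IP with the standard library's `net.IP` helpers. The names `lookupDetector` and `staticLookup` are hypothetical, and treating loopback and link-local addresses as private is an assumption of this sketch, not documented behavior of the package.

```go
package main

import (
	"fmt"
	"net"
)

// PrivateNetworkDetector is redeclared here only so the sketch compiles
// on its own; it mirrors the interface documented above.
type PrivateNetworkDetector interface {
	IsNetworkPrivate(address string) (bool, error)
}

// lookupDetector is a hypothetical implementation. The lookup field has
// the same signature as net.LookupIP, which could be used for real DNS.
type lookupDetector struct {
	lookup func(host string) ([]net.IP, error)
}

// IsNetworkPrivate reports whether any IP the host resolves to is a
// private, loopback, or link-local address.
func (d lookupDetector) IsNetworkPrivate(address string) (bool, error) {
	ips, err := d.lookup(address)
	if err != nil {
		return false, err
	}
	for _, ip := range ips {
		if ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast() {
			return true, nil
		}
	}
	return false, nil
}

// staticLookup is a hypothetical resolver backed by a fixed table, so
// the example runs without network access.
func staticLookup(table map[string]string) func(string) ([]net.IP, error) {
	return func(host string) ([]net.IP, error) {
		raw, ok := table[host]
		if !ok {
			return nil, fmt.Errorf("no such host: %s", host)
		}
		ip := net.ParseIP(raw)
		if ip == nil {
			return nil, fmt.Errorf("invalid IP %q for host %s", raw, host)
		}
		return []net.IP{ip}, nil
	}
}

func main() {
	var d PrivateNetworkDetector = lookupDetector{
		lookup: staticLookup(map[string]string{
			"intranet.example": "10.0.0.5",
			"public.example":   "93.184.216.34",
		}),
	}
	for _, host := range []string{"intranet.example", "public.example"} {
		private, err := d.IsNetworkPrivate(host)
		fmt.Println(host, private, err)
	}
}
```

Passing `net.LookupIP` as the lookup function would switch the sketch to real DNS resolution. Note that `net.IP.IsPrivate` requires Go 1.17 or newer.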

type URLGetter

type URLGetter interface {
	Get(url string) (*http.Response, error)
}

URLGetter should be implemented by objects that perform HTTP GET requests to fetch link data.
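Because the method set matches, the standard library's `*http.Client` already satisfies this interface, and a small stub can stand in for it in tests. The sketch below shows both. The interface is redeclared locally so the example compiles on its own; `stubGetter` and `fetchBody` are hypothetical names that do not come from the package.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// URLGetter mirrors the interface documented above.
type URLGetter interface {
	Get(url string) (*http.Response, error)
}

// Compile-time check: *http.Client has a matching Get method.
var _ URLGetter = (*http.Client)(nil)

// stubGetter is a hypothetical test double that returns a canned body
// for every URL without touching the network.
type stubGetter struct{ body string }

func (s stubGetter) Get(url string) (*http.Response, error) {
	return &http.Response{
		StatusCode: http.StatusOK,
		Body:       io.NopCloser(strings.NewReader(s.body)),
	}, nil
}

// fetchBody is a hypothetical helper that reads the whole response body
// through any URLGetter and rejects non-200 responses.
func fetchBody(g URLGetter, url string) (string, error) {
	resp, err := g.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: unexpected status %d", url, resp.StatusCode)
	}
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := fetchBody(stubGetter{body: "<html>hi</html>"}, "https://example.com")
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println(body)
}
```

In production code, an `&http.Client{}` with a Timeout set could presumably be passed wherever a URLGetter is expected, such as the URLGetter field of Config.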

Directories

Path Synopsis
Package mock_crawler is a generated GoMock package.
