crawler

package
v0.0.0-...-b28cec7
Published: Jul 14, 2020 License: MIT Imports: 10 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Crawl

func Crawl(
	endpoint string,
	approximateMaxNodes int32,
	parallelism int,
	msDelay int,
	isValidCrawlLink IsValidCrawlLinkFunction,
	addEdgesIfDoNotExist AddEdgeFunction,
	filterPage FilterPageFunction,
)

crawls a domain and saves relative links to a DB

func Run

func Run(
	endpoint string,
	isValidCrawlLink IsValidCrawlLinkFunction,
	connectToDB ConnectToDBFunction,
	addEdgesIfDoNotExist AddEdgeFunction,
	getNewNode GetNewNodeFunction,
	filterPage FilterPageFunction,
)

crawls until approximateMaxNodes is reached
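A minimal sketch of how a caller might wire up the callback types that Run and Crawl accept. The type definitions are copied from this page; every implementation below is a hypothetical stub (the colly-based FilterPageFunction is omitted because it needs the colly dependency, and the call to Run itself is elided since the module's import path is not shown here).

```go
package main

import (
	"fmt"
	"strings"
)

// Signatures copied from the type definitions documented below;
// FilterPageFunction is omitted (it depends on the colly package).
type (
	IsValidCrawlLinkFunction func(string) bool
	ConnectToDBFunction      func() error
	AddEdgeFunction          func(string, []string) ([]string, error)
	GetNewNodeFunction       func() (string, error)
)

// Hypothetical stub callbacks a caller could pass to Run.
var (
	isValidCrawlLink IsValidCrawlLinkFunction = func(u string) bool {
		return strings.HasPrefix(u, "/wiki/") // follow only relative article links
	}
	connectToDB ConnectToDBFunction = func() error {
		return nil // pretend the DB connection always succeeds
	}
	addEdgesIfDoNotExist AddEdgeFunction = func(node string, neighbors []string) ([]string, error) {
		return neighbors, nil // pretend every edge is new
	}
	getNewNode GetNewNodeFunction = func() (string, error) {
		return "/wiki/Graph_theory", nil // always restart from the same node
	}
)

func main() {
	// crawler.Run(endpoint, isValidCrawlLink, connectToDB,
	//             addEdgesIfDoNotExist, getNewNode, filterPage)
	// is elided: the module's import path is not shown on this page.
	fmt.Println(isValidCrawlLink("/wiki/Go_(programming_language)"))
}
```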

func ServeMetrics

func ServeMetrics()

registers and serves metrics over HTTP

func UpdateMetrics

func UpdateMetrics(numberOfNodesAdded int, currDepth int)

updates Prometheus and internal metrics

Types

type AddEdgeFunction

type AddEdgeFunction func(string, []string) ([]string, error)

adds edges to the graph in the DB if they do not already exist
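A sketch of an AddEdgeFunction backed by an in-memory map rather than a real DB (an assumption for illustration): it stores each (node, neighbor) edge and, matching the `([]string, error)` return signature, returns only the neighbors that were newly added.

```go
package main

import "fmt"

// edges maps a node to the set of neighbors already stored.
// In-memory stand-in for the DB this package actually targets.
var edges = map[string]map[string]bool{}

// addEdgesIfDoNotExist satisfies AddEdgeFunction: it records each
// (node, neighbor) edge and returns the neighbors that were new.
func addEdgesIfDoNotExist(node string, neighbors []string) ([]string, error) {
	if edges[node] == nil {
		edges[node] = map[string]bool{}
	}
	var added []string
	for _, n := range neighbors {
		if !edges[node][n] {
			edges[node][n] = true
			added = append(added, n)
		}
	}
	return added, nil
}

func main() {
	added, _ := addEdgesIfDoNotExist("a", []string{"b", "c"})
	fmt.Println(added) // [b c]
	added, _ = addEdgesIfDoNotExist("a", []string{"b", "d"})
	fmt.Println(added) // [d] — "b" already existed
}
```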

type ConnectToDBFunction

type ConnectToDBFunction func() error

establishes the initial connection to the DB

type FilterPageFunction

type FilterPageFunction func(e *colly.HTMLElement) (*colly.HTMLElement, error)

filters the page down to a more specific element

type GetNewNodeFunction

type GetNewNodeFunction func() (string, error)

retrieves a new node when the current one expires
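A sketch of a GetNewNodeFunction that pops the next node from a hypothetical in-memory frontier, returning an error once the frontier is empty (the queue and its contents are assumptions for illustration).

```go
package main

import (
	"errors"
	"fmt"
)

// pending is a hypothetical frontier of nodes waiting to be crawled.
var pending = []string{"/wiki/Graph_theory", "/wiki/Web_crawler"}

// getNewNode satisfies GetNewNodeFunction: it pops the next node from
// the frontier, or reports an error when none are left.
func getNewNode() (string, error) {
	if len(pending) == 0 {
		return "", errors.New("no nodes left to crawl")
	}
	node := pending[0]
	pending = pending[1:]
	return node, nil
}

func main() {
	for {
		node, err := getNewNode()
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println(node)
	}
}
```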

type IsValidCrawlLinkFunction

type IsValidCrawlLinkFunction func(string) bool

checks whether a URL string is valid for crawling
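A sketch of an IsValidCrawlLinkFunction. The policy here is an assumption, modeled on a Wikipedia-style crawl: follow only relative article links and skip special namespaces such as "File:" or "Category:".

```go
package main

import (
	"fmt"
	"strings"
)

// isValidCrawlLink satisfies IsValidCrawlLinkFunction. The rules below
// are a hypothetical policy: keep relative /wiki/ links, drop links into
// special namespaces (which contain a colon, e.g. "File:", "Category:").
func isValidCrawlLink(link string) bool {
	return strings.HasPrefix(link, "/wiki/") && !strings.Contains(link, ":")
}

func main() {
	fmt.Println(isValidCrawlLink("/wiki/Graph_theory"))  // true
	fmt.Println(isValidCrawlLink("/wiki/File:Logo.png")) // false
	fmt.Println(isValidCrawlLink("https://example.com")) // false
}
```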
