package httpsyet

v0.1.6 Published: Jun 23, 2018 License: MIT

Documentation

Overview

Package httpsyet provides the configuration and execution for crawling a list of sites for links that can be updated to HTTPS.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	Sites    []string                             // At least one URL.
	Out      io.Writer                            // Required. Writes one detected site per line.
	Log      *log.Logger                          // Required. Errors are reported here.
	Depth    int                                  // Optional. Limit depth. Set to >= 1.
	Parallel int                                  // Optional. Set how many sites to crawl in parallel.
	Delay    time.Duration                        // Optional. Set delay between crawls.
	Get      func(string) (*http.Response, error) // Optional. Defaults to http.Get.
	Verbose  bool                                 // Optional. If set, status updates are written to logger.
}

Crawler is the configuration for Run. It is validated when Run is called.

func (Crawler) Run

func (c Crawler) Run() error

Run runs the crawler and can return validation errors. All crawling errors are reported via the logger, and output is written to the writer. Sites are crawled recursively, and all external links that can be changed to HTTPS are reported. Broken links are also reported via the error logger.
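For orientation, a minimal sketch of configuring and running the crawler; the import path, the example URL, and the chosen option values are assumptions, not part of this documentation.

package main

import (
	"log"
	"os"
	"time"

	httpsyet "path/to/httpsyet" // hypothetical import path
)

func main() {
	c := httpsyet.Crawler{
		Sites:    []string{"http://example.com"}, // at least one URL
		Out:      os.Stdout,                      // one detected site per line
		Log:      log.New(os.Stderr, "httpsyet: ", log.LstdFlags),
		Depth:    2,           // optional: follow links two levels deep
		Parallel: 4,           // optional: crawl four sites in parallel
		Delay:    time.Second, // optional: pause between crawls
	}
	if err := c.Run(); err != nil {
		log.Fatal(err) // validation errors surface here
	}
}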

type Site

type Site struct {
	URL    *url.URL
	Parent *url.URL
	Depth  int
}

Site represents what travels: a URL, which may have a Parent URL, and a Depth.

func (Site) Attr

func (s Site) Attr() interface{}

Attr implements the attribute relevant for ForkSiteSeenAttr, the "I've seen this site before" discriminator.
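As a sketch of what that discriminator enables, the in-package-style helper below filters out sites whose Attr() has been seen before. The map-based filter is ours, not the package's; it only illustrates the role Attr plays for ForkSiteSeenAttr.

// unseen returns a predicate reporting whether a site has not been
// visited before, keyed by Attr(). A sketch; the helper is hypothetical.
func unseen() func(Site) bool {
	seen := make(map[interface{}]bool)
	return func(s Site) bool {
		if seen[s.Attr()] {
			return false // seen this site before: filter it out
		}
		seen[s.Attr()] = true
		return true
	}
}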

func (Site) Print

func (s Site) Print() Site

Print may be used for tracing, e.g. via PipeSiteFunc(sites, Site.Print).

type Traffic

type Traffic struct {
	Travel          chan Site // to be processed
	*sync.WaitGroup           // monitors SiteEnter & SiteLeave
}

Traffic is what goes around inside a circular site pipe network, e.g. a crawling Crawler. It is composed of Travel, a channel for the sites that travel in the traffic, and an embedded *sync.WaitGroup to keep track of congestion.

func (*Traffic) Feed

func (t *Traffic) Feed(urls []*url.URL, parent *url.URL, depth int)

Feed registers new entries and launches their dispatcher (which we intentionally left untouched).

func (*Traffic) Processor

func (t *Traffic) Processor(crawl func(s Site), parallel int)

Processor builds the site traffic processing network; it is circular if crawl uses Feed to provide feedback.
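Putting Feed and Processor together, a hedged sketch of wiring the circular network in in-package style. No constructor is exported, so Traffic is built by hand here; fetchLinks is a hypothetical helper returning the []*url.URL found on a page, and decrementing the depth inside crawl is a guess at the intended bookkeeping.

func runNetwork(seeds []*url.URL, depth, parallel int) {
	t := &Traffic{
		Travel:    make(chan Site),
		WaitGroup: new(sync.WaitGroup),
	}

	crawl := func(s Site) {
		if s.Depth < 1 {
			return // depth exhausted: this branch of the circle ends here
		}
		t.Feed(fetchLinks(s.URL), s.URL, s.Depth-1) // feedback closes the circle
	}

	t.Processor(crawl, parallel) // build the network; crawl runs parallel-fold
	t.Feed(seeds, nil, depth)    // inject the initial sites
	t.Wait()                     // embedded WaitGroup: returns once traffic drains
}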
