sites

package
v0.2.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 30, 2018 License: MIT Imports: 5 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func SiteChan

func SiteChan(inp ...Site) (out <-chan Site)

SiteChan returns a channel to receive all inputs before close.

func SiteChanFuncErr

func SiteChanFuncErr(gen func() (Site, error)) (out <-chan Site)

SiteChanFuncErr returns a channel to receive all results of generator `gen` until `err != nil` before close.

func SiteChanFuncNok

func SiteChanFuncNok(gen func() (Site, bool)) (out <-chan Site)

SiteChanFuncNok returns a channel to receive all results of generator `gen` until `!ok` before close.

func SiteChanSlice

func SiteChanSlice(inp ...[]Site) (out <-chan Site)

SiteChanSlice returns a channel to receive all inputs before close.

func SiteDone

func SiteDone(inp <-chan Site) (done <-chan struct{})

SiteDone returns a channel to receive one signal upon close and after `inp` has been drained.

func SiteDoneFunc

func SiteDoneFunc(inp <-chan Site, act func(a Site)) (done <-chan struct{})

SiteDoneFunc will apply `act` to every `inp` and returns a channel to receive one signal upon close.

func SiteDoneLeave added in v0.1.3

func SiteDoneLeave(inp <-chan Site, wg SiteWaiter) (done <-chan struct{})

SiteDoneLeave returns a channel to receive one signal after all throughput on `inp` has been registered as departure on the given `sync.WaitGroup` before close.

func SiteDoneSlice

func SiteDoneSlice(inp <-chan Site) (done <-chan []Site)

SiteDoneSlice returns a channel to receive a slice with every Site received on `inp` upon close.

Note: Unlike SiteDone, SiteDoneSlice sends the fully accumulated slice, not just an event, once upon close of inp.

func SiteDoneWait

func SiteDoneWait(out chan<- Site, wg SiteWaiter) (done <-chan struct{})

SiteDoneWait returns a channel to receive one signal after wg.Wait() has returned and out has been closed before close.

Note: Use only *after* You've started flooding the facilities.

func SiteFanIn2

func SiteFanIn2(inp1, inp2 <-chan Site) (out <-chan Site)

SiteFanIn2 returns a channel to receive all from both `inp1` and `inp2` before close.

func SiteFini

func SiteFini() func(inp <-chan Site) (done <-chan struct{})

SiteFini returns a closure around `SiteDone(_)`.

func SiteFiniFunc

func SiteFiniFunc(act func(a Site)) func(inp <-chan Site) (done <-chan struct{})

SiteFiniFunc returns a closure around `SiteDoneFunc(_, act)`.

func SiteFiniLeave added in v0.1.3

func SiteFiniLeave(wg SiteWaiter) func(inp <-chan Site) (done <-chan struct{})

SiteFiniLeave returns a closure around `SiteDoneLeave(_, wg)` registering throughput as departure on the given `sync.WaitGroup`.

func SiteFiniSlice

func SiteFiniSlice() func(inp <-chan Site) (done <-chan []Site)

SiteFiniSlice returns a closure around `SiteDoneSlice(_)`.

func SiteFiniWait

func SiteFiniWait(wg SiteWaiter) func(out chan<- Site) (done <-chan struct{})

SiteFiniWait returns a closure around `SiteDoneWait(_, wg)`.

func SiteFork

func SiteFork(inp <-chan Site) (out1, out2 <-chan Site)

SiteFork returns two channels either of which is to receive every result of inp before close.

func SiteForkSeen

func SiteForkSeen(inp <-chan Site) (new, old <-chan Site)

SiteForkSeen returns two channels, `new` and `old`, where `new` is to receive all `inp` not been seen before and `old` all `inp` seen before (internally growing a `sync.Map` to discriminate) until close.

func SiteForkSeenAttr

func SiteForkSeenAttr(inp <-chan Site, attr func(a Site) interface{}) (new, old <-chan Site)

SiteForkSeenAttr returns two channels, `new` and `old`, where `new` is to receive all `inp` whose attribute `attr` has not been seen before and `old` all `inp` seen before (internally growing a `sync.Map` to discriminate) until close.

func SiteMakeChan

func SiteMakeChan() (out chan Site)

SiteMakeChan returns a new open channel (simply a 'chan Site' that is). Note: No 'Site-producer' is launched here yet! (as is in all the other functions).

This is useful to easily create corresponding variables such as:

var mySitePipelineStartsHere := SiteMakeChan() // ... lot's of code to design and build Your favourite "mySiteWorkflowPipeline"

// ...
// ... *before* You start pouring data into it, e.g. simply via:
for drop := range water {

mySitePipelineStartsHere <- drop

}

close(mySitePipelineStartsHere)

Hint: especially helpful, if Your piping library operates on some hidden (non-exported) type
(or on a type imported from elsewhere - and You don't want/need or should(!) have to care.)

Note: as always (except for SitePipeBuffer) the channel is unbuffered.

func SitePair

func SitePair(inp <-chan Site) (out1, out2 <-chan Site)

SitePair returns a pair of channels to receive every result of inp before close.

Note: Yes, it is a VERY simple fanout - but sometimes all You need.

func SitePipeAdjust

func SitePipeAdjust(inp <-chan Site, sizes ...int) (out <-chan Site)

SitePipeAdjust returns a channel to receive all `inp` buffered by a SiteSendProxy process before close.

func SitePipeEnter

func SitePipeEnter(inp <-chan Site, wg SiteWaiter) (out <-chan Site)

SitePipeEnter returns a channel to receive all `inp` and registers throughput as arrival on the given `sync.WaitGroup` until close.

func SitePipeFunc

func SitePipeFunc(inp <-chan Site, act func(a Site) Site) (out <-chan Site)

SitePipeFunc returns a channel to receive every result of action `act` applied to `inp` before close. Note: it 'could' be SitePipeMap for functional people, but 'map' has a very different meaning in go lang.

func SitePipeLeave

func SitePipeLeave(inp <-chan Site, wg SiteWaiter) (out <-chan Site)

SitePipeLeave returns a channel to receive all `inp` and registers throughput as departure on the given `sync.WaitGroup` until close.

func SitePipeSeen

func SitePipeSeen(inp <-chan Site) (out <-chan Site)

SitePipeSeen returns a channel to receive all `inp` not been seen before while silently dropping everything seen before (internally growing a `sync.Map` to discriminate) until close. Note: SitePipeFilterNotSeenYet might be a better name, but is fairly long.

func SitePipeSeenAttr

func SitePipeSeenAttr(inp <-chan Site, attr func(a Site) interface{}) (out <-chan Site)

SitePipeSeenAttr returns a channel to receive all `inp` whose attribute `attr` has not been seen before while silently dropping everything seen before (internally growing a `sync.Map` to discriminate) until close. Note: SitePipeFilterAttrNotSeenYet might be a better name, but is fairly long.

func SiteSendProxy

func SiteSendProxy(out chan<- Site, sizes ...int) chan<- Site

SiteSendProxy returns a channel to serve as a sending proxy to 'out'. Uses a goroutine to receive values from 'out' and store them in an expanding buffer, so that sending to 'out' never blocks.

Note: the expanding buffer is implemented via "container/ring"

Note: SiteSendProxy is kept for the Sieve example and other dynamic use to be discovered even so it does not fit the pipe tube pattern as SitePipeAdjust does.

func SiteStrew

func SiteStrew(inp <-chan Site, size int) (outS [](<-chan Site))

SiteStrew returns a slice (of size = size) of channels one of which shall receive each inp before close.

func SiteTubeAdjust

func SiteTubeAdjust(sizes ...int) (tube func(inp <-chan Site) (out <-chan Site))

SiteTubeAdjust returns a closure around SitePipeAdjust (_, sizes ...int).

func SiteTubeEnter

func SiteTubeEnter(wg SiteWaiter) (tube func(inp <-chan Site) (out <-chan Site))

SiteTubeEnter returns a closure around SitePipeEnter (_, wg) registering throughput as arrival on the given `sync.WaitGroup`.

func SiteTubeFunc

func SiteTubeFunc(act func(a Site) Site) (tube func(inp <-chan Site) (out <-chan Site))

SiteTubeFunc returns a closure around PipeSiteFunc (_, act).

func SiteTubeLeave

func SiteTubeLeave(wg SiteWaiter) (tube func(inp <-chan Site) (out <-chan Site))

SiteTubeLeave returns a closure around SitePipeLeave (_, wg) registering throughput as departure on the given `sync.WaitGroup`.

func SiteTubeSeen

func SiteTubeSeen() (tube func(inp <-chan Site) (out <-chan Site))

SiteTubeSeen returns a closure around SitePipeSeen() (silently dropping every Site seen before).

func SiteTubeSeenAttr

func SiteTubeSeenAttr(attr func(a Site) interface{}) (tube func(inp <-chan Site) (out <-chan Site))

SiteTubeSeenAttr returns a closure around SitePipeSeenAttr() (silently dropping every Site whose attribute `attr` was seen before).

Types

type Site

type Site struct {
	URL    *url.URL
	Parent *url.URL
	Depth  int
}

Site represents what travels: an URL which may have a Parent URL, and a Depth.

func (Site) Attr

func (s Site) Attr() interface{}

Attr implements the attribute relevant for ForkSiteSeenAttr, the "I've seen this site before" discriminator.

func (Site) Print

func (s Site) Print() Site

print may be used via e.g. PipeSiteFunc(sites, site.print) for tracing

type SiteWaiter

type SiteWaiter interface {
	Add(delta int)
	Done()
	Wait()
}

SiteWaiter - as implemented by `*sync.WaitGroup` - attends Flapdoors and keeps counting who enters and who leaves.

Use SiteDoneWait to learn about when the facilities are closed.

Note: You may also use Your provided `*sync.WaitGroup.Wait()` to know when to close the facilities. Just: SiteDoneWait is more convenient as it also closes the primary channel for You.

Just make sure to have _all_ entrances and exits attended, and `Wait()` only *after* You've started flooding the facilities.

type Traffic

type Traffic struct {
	Travel          chan Site // to be processed
	*sync.WaitGroup           // monitor SiteEnter & SiteLeave
}

Traffic as it goes around inside a circular site pipe network, e. g. a crawling Crawler. Composed of Travel, a channel for those who travel in the traffic, and an embedded *sync.WaitGroup to keep track of congestion.

func (*Traffic) Feed

func (t *Traffic) Feed(urls []*url.URL, parent *url.URL, depth int)

Feed registers new entries and launches their dispatcher (which we intentionally left untouched).

func (*Traffic) Processor

func (t *Traffic) Processor(crawl func(s Site), parallel int)

Processor builds the site traffic processing network; it is cirular if crawl uses Feed to provide feedback.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL