Documentation ¶
Index ¶
- type LinkProcessor
- func (lp *LinkProcessor) CheckURLExists(u *url.URL) (bool, error)
- func (lp *LinkProcessor) Close()
- func (lp *LinkProcessor) GracefulShutdown() <-chan bool
- func (lp *LinkProcessor) MarkURLVisited(u *url.URL)
- func (lp *LinkProcessor) ProcessURL(u *url.URL) error
- func (lp *LinkProcessor) ScrapeLinksFromURL(u *url.URL) ([]*linkstorage.Link, error)
- func (lp *LinkProcessor) SpawnWorkers(n int) chan *url.URL
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type LinkProcessor ¶
type LinkProcessor struct {
// contains filtered or unexported fields
}
LinkProcessor holds all connections needed to access the cache and the database, plus the channel for sending URLs back to RabbitMQ.
func NewLinkProcessor ¶
func NewLinkProcessor(storage *linkstorage.Storage, batchSize int, queue *linkqueue.LinkQueue, numWorkers int) (*LinkProcessor, error)
NewLinkProcessor is a helper function for creating the LinkProcessor.
func (*LinkProcessor) CheckURLExists ¶
func (lp *LinkProcessor) CheckURLExists(u *url.URL) (bool, error)
CheckURLExists first checks the in-memory cache for the URL and returns true on a hit. If the URL is not in the cache, it checks the database; on a database hit it updates the cache and returns true. If the URL is in neither the cache nor the database, it returns false.
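The two-tier lookup described above can be sketched as follows. This is a hypothetical simplification, not the package's implementation: the cache is modeled as a plain map and the database as a lookup function.

```go
package main

import "fmt"

// checkURLExists sketches the two-tier lookup: consult the in-memory
// cache first, fall back to the database, and warm the cache on a
// database hit so the next lookup avoids the round trip.
func checkURLExists(cache map[string]bool, dbLookup func(string) bool, rawURL string) bool {
	if cache[rawURL] {
		return true // cache hit: no database round trip
	}
	if dbLookup(rawURL) {
		cache[rawURL] = true // warm the cache for next time
		return true
	}
	return false
}

func main() {
	cache := map[string]bool{}
	db := func(u string) bool { return u == "https://example.com" }

	fmt.Println(checkURLExists(cache, db, "https://example.com")) // true: db hit, cache warmed
	fmt.Println(cache["https://example.com"])                     // true: cached now
	fmt.Println(checkURLExists(cache, db, "https://example.org")) // false: miss everywhere
}
```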
func (*LinkProcessor) Close ¶
func (lp *LinkProcessor) Close()
Close immediately stops the batching workers without flushing pending work.
func (*LinkProcessor) GracefulShutdown ¶
func (lp *LinkProcessor) GracefulShutdown() <-chan bool
GracefulShutdown returns a channel that receives true once the database batch buffer has been flushed and all pending writes to the queue have completed.
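The completion-signal pattern behind GracefulShutdown can be sketched like this. The `gracefulShutdown` helper and its `flush` parameter are illustrative stand-ins, not the package's actual internals: the flush runs in a goroutine and a buffered channel reports when it is done, so callers can block or select on completion.

```go
package main

import (
	"fmt"
	"time"
)

// gracefulShutdown runs flush in the background and returns a channel
// that receives true once the flush has completed.
func gracefulShutdown(flush func()) <-chan bool {
	done := make(chan bool, 1) // buffered so the sender never blocks
	go func() {
		flush()      // e.g. flush the db batch buffer, drain queue writes
		done <- true // signal completion to the caller
	}()
	return done
}

func main() {
	done := gracefulShutdown(func() { time.Sleep(10 * time.Millisecond) })

	// Callers can bound the wait with a select.
	select {
	case ok := <-done:
		fmt.Println("shutdown complete:", ok)
	case <-time.After(time.Second):
		fmt.Println("shutdown timed out")
	}
}
```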
func (*LinkProcessor) MarkURLVisited ¶
func (lp *LinkProcessor) MarkURLVisited(u *url.URL)
MarkURLVisited marks the link as visited in the cache.
func (*LinkProcessor) ProcessURL ¶
func (lp *LinkProcessor) ProcessURL(u *url.URL) error
ProcessURL takes a single URL and processes it.
func (*LinkProcessor) ScrapeLinksFromURL ¶
func (lp *LinkProcessor) ScrapeLinksFromURL(u *url.URL) ([]*linkstorage.Link, error)
ScrapeLinksFromURL fetches the page at the given URL and returns all links found on it.
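The extraction half of that work can be sketched as below. This is a deliberately naive stand-in, not the package's scraper: it pulls `href` attributes out of a page body with a regexp to stay self-contained, where a real implementation would fetch the page over HTTP and use a proper HTML tokenizer.

```go
package main

import (
	"fmt"
	"regexp"
)

// hrefPattern naively matches double-quoted href attributes.
var hrefPattern = regexp.MustCompile(`href="([^"]+)"`)

// scrapeLinks returns every href value found in the page body.
func scrapeLinks(body string) []string {
	var links []string
	for _, m := range hrefPattern.FindAllStringSubmatch(body, -1) {
		links = append(links, m[1]) // m[1] is the captured URL
	}
	return links
}

func main() {
	page := `<a href="https://example.com/a">a</a> <a href="https://example.com/b">b</a>`
	fmt.Println(scrapeLinks(page))
}
```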
func (*LinkProcessor) SpawnWorkers ¶
func (lp *LinkProcessor) SpawnWorkers(n int) chan *url.URL
SpawnWorkers starts n workers and returns a channel; URLs pushed onto the channel are picked up and processed by the workers.