pulse

command module
v0.0.0-...-3db1f35
Published: Mar 10, 2019 License: MIT Imports: 13 Imported by: 0


Pulse

Pulse is a crawler built on top of gocolly/colly.

Features:

  • Expose all gocolly/colly options through a YAML configuration file
  • Define rules that export crawled data to MongoDB

Installation

Go modules must be enabled. Then build the binary:

$ go build

Usage

$ pulse [-q] [--no-logging] [-c configFile] [url entrypoint]

$ pulse -c conf.yml https://www.example.com
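The usage line above maps naturally onto a standard Go flag set. The sketch below is a hypothetical reconstruction with the standard library `flag` package, not Pulse's actual source: the `options` type, the `parseArgs` function, and the flag help strings are illustrative assumptions (the README does not say what `-q` does). Because `flag` treats one or two leading dashes the same way, `--no-logging` parses like `-no-logging`; parsing also stops at the first non-flag argument, which is why the flags come before the URL.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the settings from the usage line. Hypothetical names:
// they are not taken from Pulse's source.
type options struct {
	quiet      bool
	noLogging  bool
	configFile string
	entrypoint string
}

// parseArgs parses args (without the program name) into options.
// Go's flag package accepts both -name and --name for the same flag.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("pulse", flag.ContinueOnError)
	fs.BoolVar(&o.quiet, "q", false, "quiet mode (assumed meaning)")
	fs.BoolVar(&o.noLogging, "no-logging", false, "disable logging")
	fs.StringVar(&o.configFile, "c", "", "path to a YAML configuration file")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	// The first positional argument, if present, is the crawl entrypoint URL.
	if fs.NArg() > 0 {
		o.entrypoint = fs.Arg(0)
	}
	return o, nil
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```

With this sketch, `go run . -c conf.yml https://www.example.com` would print the parsed options, with `configFile` set to `conf.yml` and `entrypoint` set to the URL.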

Configuration example

See default.yml.

Grab HTML data

The rule below adds the value of the src attribute of every img tag to the MongoDB collection "images". The value of the attribute named by context-attr (here alt) is also stored as metadata for each image.

collection: "images"
tag: "img"
attr: "src"
context-attr: "alt"
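To make the rule concrete, here is a minimal, dependency-free sketch of what it describes: for each `img` element that carries a `src` attribute, keep the `src` value and, as context, the `alt` value. The `Rule`, `Record`, and `Extract` names are hypothetical. The sketch also swaps in the standard library's non-strict `encoding/xml` decoder (with `xml.HTMLAutoClose` and `xml.HTMLEntity`) in place of colly's HTML handling, so it only copes with reasonably tidy HTML, and it does not write anything to MongoDB.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Rule mirrors the four YAML keys of the rule above (hypothetical type).
type Rule struct {
	Collection  string
	Tag         string
	Attr        string
	ContextAttr string
}

// Record is a hypothetical shape for one stored document:
// the extracted attribute value plus the context attribute as metadata.
type Record struct {
	Value   string
	Context string
}

// attrValue returns the value of the attribute called name, if present.
func attrValue(attrs []xml.Attr, name string) (string, bool) {
	for _, a := range attrs {
		if a.Name.Local == name {
			return a.Value, true
		}
	}
	return "", false
}

// Extract returns one Record per element whose tag is r.Tag and that has r.Attr.
func Extract(r Rule, html string) ([]Record, error) {
	d := xml.NewDecoder(strings.NewReader(html))
	d.Strict = false              // tolerate HTML-style markup
	d.AutoClose = xml.HTMLAutoClose // e.g. <img> has no closing tag
	d.Entity = xml.HTMLEntity     // accept entities such as &nbsp;

	var out []Record
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		se, ok := tok.(xml.StartElement)
		if !ok || !strings.EqualFold(se.Name.Local, r.Tag) {
			continue
		}
		v, ok := attrValue(se.Attr, r.Attr)
		if !ok {
			continue // element lacks the target attribute
		}
		ctx, _ := attrValue(se.Attr, r.ContextAttr) // empty if missing
		out = append(out, Record{Value: v, Context: ctx})
	}
}

func main() {
	rule := Rule{Collection: "images", Tag: "img", Attr: "src", ContextAttr: "alt"}
	page := `<html><body><img src="a.png" alt="Logo"><p>text</p><img src="b.png"></body></html>`
	recs, err := Extract(rule, page)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, rec := range recs {
		fmt.Printf("%s <- %+v\n", rule.Collection, rec)
	}
}
```

An image without an alt attribute is still kept here, with an empty context; whether Pulse behaves the same way is not stated in this README.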

You can also grab HTML attributes with a CSS selector instead of a tag.

collection: "images-test"
selector: "img[data-src]"
attr: "data-src"
context-attr: "alt"
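The selector `img[data-src]` matches `img` elements that have a `data-src` attribute at all, whatever its value; this pattern is often used for lazily loaded images. The hypothetical sketch below implements only the `tag`, `[attr]`, and `tag[attr]` forms to show that matching logic. It is not how Pulse or goquery work internally: goquery accepts full CSS selectors, and the `simpleSelector` type and its functions are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// simpleSelector is a tiny, hypothetical subset of CSS:
// "tag", "[attr]" or "tag[attr]".
type simpleSelector struct {
	tag  string // empty means any tag
	attr string // empty means no attribute requirement
}

// specialChars are characters this toy parser refuses to interpret
// (combinators, classes, ids, pseudo-classes, attribute values).
const specialChars = "[]=.#:> ~+,"

func unsupported(s string) error { return fmt.Errorf("unsupported selector %q", s) }

// parseSimpleSelector accepts only the three forms listed above.
func parseSimpleSelector(s string) (simpleSelector, error) {
	s = strings.TrimSpace(s)
	open := strings.IndexByte(s, '[')
	if open < 0 {
		if s == "" || strings.ContainsAny(s, specialChars) {
			return simpleSelector{}, unsupported(s)
		}
		return simpleSelector{tag: s}, nil
	}
	tag, rest := s[:open], s[open+1:]
	if !strings.HasSuffix(rest, "]") {
		return simpleSelector{}, unsupported(s)
	}
	attr := strings.TrimSuffix(rest, "]")
	if attr == "" || strings.ContainsAny(tag, specialChars) || strings.ContainsAny(attr, specialChars) {
		return simpleSelector{}, unsupported(s)
	}
	return simpleSelector{tag: tag, attr: attr}, nil
}

// matches reports whether an element with this tag and attributes is selected.
func (sel simpleSelector) matches(tag string, attrs map[string]string) bool {
	if sel.tag != "" && !strings.EqualFold(sel.tag, tag) {
		return false
	}
	if sel.attr != "" {
		if _, ok := attrs[sel.attr]; !ok {
			return false
		}
	}
	return true
}

func main() {
	sel, err := parseSimpleSelector("img[data-src]")
	if err != nil {
		fmt.Println(err)
		return
	}
	// Lazily loaded image: has data-src, so it is selected.
	fmt.Println(sel.matches("img", map[string]string{"data-src": "photo.jpg", "alt": "Photo"}))
	// Plain image with only src: not selected.
	fmt.Println(sel.matches("img", map[string]string{"src": "logo.png"}))
}
```

Only the presence of `data-src` is tested; the rule's `attr: "data-src"` line then decides which value is stored.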

More information about selectors: PuerkitoBio/goquery

