fetch.d

command

v0.0.0-...-d33463d Published: Jun 12, 2020 License: BSD-3-Clause Imports: 10 Imported by: 0

Repository

github.com/slotix/dataflowkit

Documentation

Overview

The Fetcher service of Dataflow kit downloads HTML content from web pages to feed Dataflow kit scrapers.

Two fetcher types are currently available: Headless Chrome Fetcher and Base Fetcher.

Base Fetcher downloads HTML web pages using Go's standard HTTP library.

Chrome Fetcher connects to Headless Chrome, which processes JavaScript-driven pages and returns the rendered content.
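
To illustrate the kind of download Base Fetcher performs, here is a minimal sketch that retrieves a page with the standard net/http package. It is not Dataflow kit's actual implementation: fetchHTML and startDemoServer are hypothetical names, the 10 second timeout is an arbitrary choice, and the local test server exists only so the example runs without network access.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchHTML downloads pageURL with the standard net/http client and
// returns the response body as a string. (Hypothetical helper.)
func fetchHTML(pageURL string) (string, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(pageURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// startDemoServer serves a fixed page on a local port so that the
// sketch can run offline. It returns the server URL and a stop function.
func startDemoServer() (string, func()) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "<html><body>hello</body></html>")
	}))
	return srv.URL, srv.Close
}

func main() {
	pageURL, stop := startDemoServer()
	defer stop()

	html, err := fetchHTML(pageURL)
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println(html)
}
```

A plain HTTP download like this returns the page source as served, without executing any JavaScript, which is why a separate Chrome Fetcher exists for rendered content.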

Accessing Fetcher endpoints

Examples

Fetch a web page using Chrome Fetcher:
curl -XPOST  localhost:8000/fetch -d '{"type":"chrome", "url":"http://example.com","formData":"auth_key=880ea6a14ea49e853634fbdc5015a024&referer=http%3A%2F%2Fexample.com%2F&ips_username=user&ips_password=userpassword&rememberMe=1"}'

Set type to either "chrome" or "base". formData is a string of URL-encoded form parameters. It may be used, for example, to fetch pages that require authentication: "auth_key=880ea6a14ea49e853634fbdc5015a024&referer=http%3A%2F%2Fexample.com%2F&ips_username=user&ips_password=userpassword&rememberMe=1"
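
A formData string of this shape can be produced with the standard net/url package rather than assembled by hand. The sketch below assumes the service only needs a URL-encoded key=value string; encodeFormData is a hypothetical helper, not part of Dataflow kit.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeFormData URL-encodes params into a single formData string.
// url.Values.Encode sorts by key, so the parameter order is deterministic
// but may differ from the order used in the curl example.
func encodeFormData(params map[string]string) string {
	values := url.Values{}
	for key, value := range params {
		values.Set(key, value)
	}
	return values.Encode()
}

func main() {
	formData := encodeFormData(map[string]string{
		"auth_key":     "880ea6a14ea49e853634fbdc5015a024",
		"referer":      "http://example.com/",
		"ips_username": "user",
		"ips_password": "userpassword",
		"rememberMe":   "1",
	})
	fmt.Println(formData)
}
```

Letting the library do the escaping avoids mistakes with reserved characters such as ":" and "/" in the referer value.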

Fetch a web page with Base Fetcher. For Base Fetcher, the type parameter may be omitted:
curl -XPOST localhost:8000/fetch -d '{"url":"http://example.com"}'
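
The same requests can be issued from Go. The following is a sketch only: the fetchRequest struct mirrors the JSON fields shown in the curl examples, while the names fetchRequest, buildPayload and postFetch, the application/json content type, and the 30 second timeout are assumptions. The response format is not described here, so the body is returned unparsed.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// fetchRequest mirrors the JSON body used in the curl examples.
// Type and FormData are optional and left out of the JSON when empty.
type fetchRequest struct {
	Type     string `json:"type,omitempty"`
	URL      string `json:"url"`
	FormData string `json:"formData,omitempty"`
}

// buildPayload serializes the request to JSON.
func buildPayload(req fetchRequest) ([]byte, error) {
	return json.Marshal(req)
}

// postFetch sends the request to a Fetch service endpoint and returns
// the raw response body.
func postFetch(endpoint string, req fetchRequest) (string, error) {
	payload, err := buildPayload(req)
	if err != nil {
		return "", err
	}
	client := &http.Client{Timeout: 30 * time.Second}
	// The content type is an assumption; the curl examples do not set one.
	resp, err := client.Post(endpoint, "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("fetch service returned %s: %s", resp.Status, body)
	}
	return string(body), nil
}

func main() {
	// Requires a Fetch service listening on localhost:8000.
	result, err := postFetch("http://localhost:8000/fetch", fetchRequest{
		Type: "chrome",
		URL:  "http://example.com",
	})
	if err != nil {
		fmt.Println("fetch request failed:", err)
		return
	}
	fmt.Println(result)
}
```

Leaving Type empty omits the field from the JSON body, which corresponds to the Base Fetcher example.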

Flags and configuration settings

General settings

DFK_FETCH: HTTP listen address of the Fetch service (defaults to "127.0.0.1:8000").
CHROME: Headless Chrome address, used for fetching JavaScript-driven web pages (defaults to "http://127.0.0.1:9222").
PROXY: Proxy address in the form http://username:password@proxy-host:port (defaults to "").

Storage settings

STORAGE_TYPE: Storage type, either Diskv or Cassandra (defaults to "Diskv").
The storage holds auxiliary information generated by the fetcher.
DISKV_BASE_DIR: Base directory for the Diskv storage type (defaults to "diskv").
More information about Diskv storage is available at https://github.com/peterbourgon/diskv
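
As an illustration of how these settings and their defaults fit together, the sketch below resolves each one from a lookup function and falls back to the documented default. It assumes the names above are read as environment variables, which this page does not state explicitly; the config struct, valueOr and loadConfig are hypothetical and do not reflect how the service actually parses its configuration.

```go
package main

import (
	"fmt"
	"os"
)

// config holds the settings listed above. (Hypothetical type.)
type config struct {
	FetchAddr    string // DFK_FETCH
	ChromeAddr   string // CHROME
	Proxy        string // PROXY
	StorageType  string // STORAGE_TYPE
	DiskvBaseDir string // DISKV_BASE_DIR
}

// valueOr returns the value found by lookup, or fallback if key is unset.
func valueOr(lookup func(string) (string, bool), key, fallback string) string {
	if value, ok := lookup(key); ok {
		return value
	}
	return fallback
}

// loadConfig resolves every setting, applying the documented defaults.
// Passing the lookup function in keeps the logic independent of the
// process environment.
func loadConfig(lookup func(string) (string, bool)) config {
	return config{
		FetchAddr:    valueOr(lookup, "DFK_FETCH", "127.0.0.1:8000"),
		ChromeAddr:   valueOr(lookup, "CHROME", "http://127.0.0.1:9222"),
		Proxy:        valueOr(lookup, "PROXY", ""),
		StorageType:  valueOr(lookup, "STORAGE_TYPE", "Diskv"),
		DiskvBaseDir: valueOr(lookup, "DISKV_BASE_DIR", "diskv"),
	}
}

func main() {
	// Assumption: settings come from environment variables.
	cfg := loadConfig(os.LookupEnv)
	fmt.Printf("%+v\n", cfg)
}
```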
