
# scraper
scraper is a fast, concurrent Go CLI for downloading images from web pages. It uses a priority queue scheduler and a configurable worker pool to fetch assets efficiently, with automatic retry on transient errors.
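At a high level the fetch pipeline is a classic worker pool. Below is a minimal, self-contained sketch of that shape, assuming a fixed pool size and a plain channel in place of the real priority queue; the function names, attempt count, and backoff schedule are illustrative, not scraper's actual code.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

// fetchWithRetry fetches url, retrying transient failures with simple
// exponential backoff. The attempt count and backoff base here are
// placeholders, not scraper's actual configuration.
func fetchWithRetry(url string, attempts int) ([]byte, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := http.Get(url)
		if err == nil && resp.StatusCode == http.StatusOK {
			defer resp.Body.Close()
			return io.ReadAll(resp.Body)
		}
		if err != nil {
			lastErr = err
		} else {
			resp.Body.Close()
			lastErr = fmt.Errorf("unexpected status %d", resp.StatusCode)
		}
		time.Sleep(time.Duration(100<<i) * time.Millisecond) // 100ms, 200ms, 400ms, ...
	}
	return nil, lastErr
}

func main() {
	urls := make(chan string)
	var wg sync.WaitGroup

	// A fixed pool of workers drains the shared queue concurrently.
	const workers = 20
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range urls {
				if _, err := fetchWithRetry(u, 3); err != nil {
					fmt.Printf("giving up on %s: %v\n", u, err)
				}
			}
		}()
	}

	urls <- "https://example.com/image.png"
	close(urls)
	wg.Wait()
}
```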
## Features

- Concurrent Downloads: Spawns 20 parallel workers backed by a heap-based priority queue that deprioritizes recently failed links (see the sketch after this list).
- Recursive Crawling: Optionally follows anchor links within the same domain to discover and download images across an entire site.
- Retry Logic: Failed fetches are re-queued with backoff up to 3 attempts before giving up.
- Dry Run Mode: Preview all discovered URLs without writing any files to disk.
- Unique Filenames: Appends a UUID to each downloaded file's name to prevent collisions.
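The deprioritize-on-failure queue can be sketched with the standard library's `container/heap`. The following is a minimal illustration of the idea only; the `link`/`linkQueue` types, the `fetch` stub, and the `maxAttempts` constant are assumptions for the example, not scraper's real implementation.

```go
package main

import (
	"container/heap"
	"fmt"
)

// link is one queued URL; fails counts failed fetch attempts so far.
type link struct {
	url   string
	fails int
}

// linkQueue implements heap.Interface ordered by failure count, so
// recently failed links sort behind links that have never failed.
type linkQueue []*link

func (q linkQueue) Len() int           { return len(q) }
func (q linkQueue) Less(i, j int) bool { return q[i].fails < q[j].fails }
func (q linkQueue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }

func (q *linkQueue) Push(x any) { *q = append(*q, x.(*link)) }
func (q *linkQueue) Pop() any {
	old := *q
	n := len(old)
	item := old[n-1]
	*q = old[:n-1]
	return item
}

// fetch stands in for the real download; it always succeeds here.
func fetch(url string) error { return nil }

func main() {
	q := &linkQueue{
		{url: "https://example.com/a.png"},
		{url: "https://example.com/b.png", fails: 2}, // already failed twice
	}
	heap.Init(q)

	const maxAttempts = 3
	for q.Len() > 0 {
		l := heap.Pop(q).(*link)
		if err := fetch(l.url); err != nil {
			l.fails++
			if l.fails < maxAttempts {
				heap.Push(q, l) // re-queue behind fresher links
			} else {
				fmt.Println("giving up on", l.url)
			}
		}
	}
}
```

Ordering by failure count means a link that has just failed sorts behind every link that has never failed, so transient errors don't stall fresh work; after the third failed attempt the link is dropped, matching the retry limit above.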
## Quick Start

```sh
# Download all images from a page
scraper --output ./images https://example.com

# Recursively crawl the entire site
scraper --recurse --output ./images https://example.com

# Dry run: print discovered links without downloading
scraper --dryrun https://example.com
```
## Options

| Flag | Description | Default |
| --- | --- | --- |
| `--output` | Output directory for downloaded files (required) | `""` |
| `--recurse` | Follow anchor links within the same domain | `false` |
| `--dryrun` | Print links without downloading | `false` |
## Installation

```sh
go install github.com/uwedeportivo/scraper@latest
```
## Build

### With Bazel (from source)

```sh
bazel build //:scraper
bazel run //:scraper -- --dryrun https://example.com
```

### With Go

```sh
go build -o scraper .
./scraper --output ./images https://example.com
```

Update Bazel deps after changing `go.mod`:

```sh
bazel run //:gazelle
```