scraper

command module
v0.0.0-...-f703472
Warning

This package is not in the latest version of its module.

Published: Mar 24, 2026 | License: MIT | Imports: 20 | Imported by: 0

README

scraper

scraper is a fast, concurrent Go CLI for downloading images from web pages. It uses a priority queue scheduler and a configurable worker pool to fetch assets efficiently, with automatic retry on transient errors.

Features

  • Concurrent Downloads: Spawns 20 parallel workers backed by a heap-based priority queue that deprioritizes recently failed links.
  • Recursive Crawling: Optionally follows anchor links within the same domain to discover and download images across an entire site.
  • Retry Logic: Failed fetches are re-queued with backoff, for up to 3 attempts per URL, before the scraper gives up.
  • Dry Run Mode: Preview all discovered URLs without writing any files to disk.
  • Unique Filenames: Appends a UUID to each downloaded file to prevent collisions.

Quick Start

# Download all images from a page
scraper --output ./images https://example.com

# Recursively crawl the entire site
scraper --recurse --output ./images https://example.com

# Dry run — print discovered links without downloading
scraper --dryrun https://example.com

Options

| Flag | Description | Default |
|------|-------------|---------|
| --output | Output directory for downloaded files (required) | "" |
| --recurse | Follow anchor links within the same domain | false |
| --dryrun | Print links without downloading | false |
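The sketch below shows one way to declare these options with Go's standard flag package, which treats -flag and --flag the same. The names parseArgs and config are hypothetical. Two rules in it are assumptions. The first is that --output may be omitted with --dryrun, inferred from the dry-run Quick Start example, which passes no output directory. The second is that at least one URL argument is required.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// config holds the three documented options plus the positional URLs.
type config struct {
	output  string
	recurse bool
	dryRun  bool
	urls    []string
}

// parseArgs is a hypothetical sketch; the real scraper may parse flags differently.
func parseArgs(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("scraper", flag.ContinueOnError)
	fs.StringVar(&c.output, "output", "", "output directory for downloaded files (required)")
	fs.BoolVar(&c.recurse, "recurse", false, "follow anchor links within the same domain")
	fs.BoolVar(&c.dryRun, "dryrun", false, "print links without downloading")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	c.urls = fs.Args()
	if len(c.urls) == 0 {
		return c, errors.New("at least one URL is required") // assumed rule
	}
	// Assumption: --output may be omitted in dry-run mode.
	if c.output == "" && !c.dryRun {
		return c, errors.New("--output is required unless --dryrun is set")
	}
	return c, nil
}

func main() {
	c, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", c)
}
```

The flag package stops parsing at the first non-flag argument, so options must come before the URL, as they do in the Quick Start commands.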

Installation

go install github.com/uwedeportivo/scraper@latest

Build

With Bazel (from source)
bazel build //:scraper
bazel run //:scraper -- --dryrun https://example.com
With Go
go build -o scraper .
./scraper --output ./images https://example.com
Update Bazel dependencies after changing go.mod
bazel run //:gazelle

Documentation

There is no documentation for this package.
