💾 wayback-dl
A fast, self-contained command-line tool for downloading archived websites from
the Wayback Machine.
A Go adaptation of wayback-machine-downloader.
Install
Download a release binary or build from source:
git clone ...
cd wayback-dl
make build
Requires Go 1.24+.
Usage
wayback-dl [url] [options]
Arguments:
url Domain or URL to archive (same as -url)
Options:
-url string Domain or URL to archive
-from string Start timestamp YYYYMMDDhhmmss (default: none)
-to string End timestamp YYYYMMDDhhmmss (default: none)
-threads int Concurrent download threads (default: 3)
-directory string Output directory (default: websites/<host>/)
-rewrite-links Rewrite page links to relative paths
-pretty-path Map extension-less URLs to dir/index.html (default: preserve original path)
-canonical string Canonical tag handling: keep|remove (default: keep)
-exact-url Download only the exact URL, no wildcard /*
-external-assets Also download off-site (external) assets
-stop-on-error Stop immediately on first download error (default: continue)
-cdx-rate int CDX API requests per minute (default: 60)
-cdx-retries int Max retries on CDX throttle or 5xx (default: 5)
-debug Enable verbose debug logging
-version Print version and exit
-h / -help Show this help and exit
Examples
# Download all snapshots of a site
wayback-dl example.com
# Limit to a date range with 8 threads
wayback-dl example.com -from 20200101000000 -to 20201231235959 -threads 8
# Rewrite links for offline browsing, remove canonical tags
wayback-dl example.com -rewrite-links -canonical remove -directory ./out
# Exact URL only (no wildcard crawl)
wayback-dl https://example.com/blog/ -exact-url
# Debug output
wayback-dl example.com -debug
How it works
- Queries the CDX API for all snapshots of the target URL (wildcarded by default).
- Deduplicates snapshots by URL path, keeping the most recent timestamp for each.
- Downloads each snapshot concurrently using Wayback's raw-content (id_) endpoint.
- Optionally rewrites HTML/CSS links to relative paths for offline browsing.
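The deduplication and raw-content steps above can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the snapshot type and the latestByPath and rawURL helpers are hypothetical, and whether wayback-dl includes the query string in its deduplication key is not stated here. The id_ suffix on the timestamp is the Wayback Machine's URL form for serving the archived bytes without its own rewriting.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// snapshot is a hypothetical record for one CDX result row.
type snapshot struct {
	Timestamp string // YYYYMMDDhhmmss
	Original  string // URL as it was archived
}

// latestByPath keeps, for each URL path, the snapshot with the newest
// timestamp. Plain string comparison is enough because the timestamps
// are fixed-width digit strings.
func latestByPath(snaps []snapshot) []snapshot {
	latest := make(map[string]snapshot)
	for _, s := range snaps {
		u, err := url.Parse(s.Original)
		if err != nil {
			continue // skip rows whose URL cannot be parsed
		}
		key := u.Path
		if key == "" {
			key = "/"
		}
		if cur, ok := latest[key]; !ok || s.Timestamp > cur.Timestamp {
			latest[key] = s
		}
	}
	out := make([]snapshot, 0, len(latest))
	for _, s := range latest {
		out = append(out, s)
	}
	// Sort for a stable order, since map iteration order is random.
	sort.Slice(out, func(i, j int) bool { return out[i].Original < out[j].Original })
	return out
}

// rawURL builds the raw-content (id_) URL for a snapshot.
func rawURL(s snapshot) string {
	return "https://web.archive.org/web/" + s.Timestamp + "id_/" + s.Original
}

func main() {
	snaps := []snapshot{
		{Timestamp: "20200101000000", Original: "http://example.com/about"},
		{Timestamp: "20210101000000", Original: "http://example.com/about"},
		{Timestamp: "20190101000000", Original: "http://example.com/"},
	}
	for _, s := range latestByPath(snaps) {
		fmt.Println(rawURL(s))
	}
}
```

Each resulting URL could then be handed to a pool of -threads workers for download.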
Output structure
Files are saved under websites/<host>/ mirroring the original URL path:
websites/
└── example.com/
    ├── index.html
    ├── about/
    │   └── index.html
    └── assets/
        └── style.css
Dependencies
| Package | Purpose |
|---|---|
| golang.org/x/net/html | HTML parsing for link rewriting |
Everything else uses the Go standard library.
Testing
# Build + smoke test
make build
./wayback-dl example.com -from 20200101 -to 20200201 -threads 2
Development
# Install tooling (one-time)
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
go install github.com/goreleaser/goreleaser/v2@latest
# Build with version info
make build
# Run tests
make test
# Run linter
make lint
# Activate pre-commit hook (per clone)
git config core.hooksPath .githooks
Release
Releases are automated via goreleaser and GitHub Actions.
Push a semver tag to trigger a release:
git tag v0.2.0
git push origin v0.2.0
The CI workflow (ci.yml) runs on every push to main/master and on pull
requests. The release workflow (release.yml) triggers on v* tags and publishes
cross-compiled binaries for Linux, macOS, and Windows.