kage

module
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 14, 2026 License: MIT

README

kage

kage (影, "shadow") clones a website into a self-contained folder you can browse offline, with all the JavaScript stripped out. It renders every page in headless Chrome, snapshots the final rendered DOM, removes every script and event handler, and downloads the CSS, images, and fonts and rewrites them to local paths. The result looks like the live site but runs no code: a plain folder of .html files you can open straight from disk.

kage clone example.com
kage serve kage-out/example.com

Why

Saving a page with "Save As" gives you a copy that still phones home, still runs analytics, and often renders blank because the markup is built by JavaScript at runtime. kage takes the opposite approach:

  • Render first, save second. Each page goes through real headless Chrome, so a page whose content is assembled by JavaScript is captured the way a human would have seen it, not as an empty shell.
  • Strip every script. Once the DOM is captured, kage removes all <script> tags, every on* event handler, and any javascript: URL. The saved page is inert: no tracking, no network calls, no surprises.
  • Keep the layout. Stylesheets, images, fonts, and media are downloaded and rewritten to relative local paths, so the offline copy looks like the original.
  • Stay browsable. In-scope links are rewritten to point at the other saved pages, so you can click around the mirror exactly as you would the live site.

Install

# Go
go install github.com/tamnd/kage/cmd/kage@latest

# Homebrew (once the tap is published)
brew install tamnd/tap/kage

# Container (Chromium bundled)
docker run -v "$PWD/out:/out" ghcr.io/tamnd/kage clone example.com

Prebuilt archives, .deb/.rpm/.apk packages, and a multi-arch image are attached to each release.

kage drives a real browser, so it needs Chrome or Chromium available. It finds a system install automatically; point it at a specific binary with --chrome or the KAGE_CHROME environment variable. The container image bundles Chromium.

Usage

kage clone <url> [flags]
kage serve [dir] [flags]
Clone
# Clone a whole site into kage-out/<host>/
kage clone https://example.com

# Limit the crawl
kage clone example.com --max-pages 200 --max-depth 3

# Only a section of the site
kage clone example.com --scope-prefix /docs

# Include subdomains, and trigger lazy-loaded images by scrolling
kage clone example.com --subdomains --scroll

# Resume an interrupted run (on by default; Ctrl-C saves state)
kage clone example.com

# Re-render every page in place to pull in changed content
kage clone example.com --refresh

A clone is idempotent: each page is keyed by the file it writes, so the same URL reached over http and https, with or without a trailing slash, is fetched once. Re-running resumes where it left off; --refresh re-renders in place, --force wipes and starts clean.

Common flags:

Flag Default Meaning
-o, --out $HOME/data/kage Output root; the mirror lands in <out>/<host>/
-p, --max-pages 0 Stop after N pages (0 = unlimited)
-d, --max-depth 0 Link-follow depth cap (0 = unlimited)
--scope-prefix Only crawl pages whose path starts with this prefix
--subdomains false Treat subdomains of the seed host as in scope
--exclude Path prefixes to skip (repeatable)
--scroll false Auto-scroll each page to trigger lazy loading
--workers 4 Concurrent page render workers
--no-robots false Ignore robots.txt (be polite)
-f, --force false Delete any existing mirror for the host first
--chrome Path to the Chrome/Chromium binary

Run kage clone --help for the full list.

Serve

kage serve runs a local static file server over a cloned folder so links and assets resolve the way they would on a real host:

kage serve kage-out/example.com
# open http://127.0.0.1:8800

How it works

seed URL ─▶ headless Chrome ─▶ final DOM ─▶ strip JS ─▶ localise assets ─▶ disk
              (render)          (snapshot)   (sanitize)   (rewrite links)

A clone is a polite breadth-first crawl. Pages are rendered by a pool of Chrome tabs; assets are fetched over plain HTTP by a separate worker pool. Every URL maps deterministically to a local path, so links can be rewritten before the asset they point at has even finished downloading. The crawl honours robots.txt and seeds itself from sitemap.xml by default. Output layout:

kage-out/example.com/
├── index.html                 # the home page, scripts stripped
├── about/index.html           # /about
├── _kage/                      # reserved: assets and crawl state
│   ├── example.com/site.css    # localised stylesheet (url() rewritten)
│   ├── example.com/logo.png
│   └── state.json              # visited set, for --resume
└── ...

Building from source

git clone https://github.com/tamnd/kage
cd kage
make build          # -> bin/kage
make test           # full suite, including Chrome-driven end-to-end tests
make test-short     # skip the tests that launch a browser

License

MIT. See LICENSE.

Directories

Path Synopsis
Package browser drives a real headless Chrome through the DevTools Protocol so JavaScript-built pages are captured as they actually render.
Package browser drives a real headless Chrome through the DevTools Protocol so JavaScript-built pages are captured as they actually render.
Package cli wires kage's command surface: the cobra tree, the global flags, and the fang-rendered help and errors.
Package cli wires kage's command surface: the cobra tree, the global flags, and the fang-rendered help and errors.
Package clone is kage's engine: it ties the Chrome pool, the JavaScript stripper, the asset localiser, and the URL↔path mapper into one resumable, polite crawl that turns a live site into a browsable offline folder.
Package clone is kage's engine: it ties the Chrome pool, the JavaScript stripper, the asset localiser, and the URL↔path mapper into one resumable, polite crawl that turns a live site into a browsable offline folder.
cmd
kage command
Command kage clones a website into a self-contained offline folder: it renders every page in headless Chrome, strips all JavaScript, and localises the CSS, images, and fonts so the saved copy looks like the live site but runs no code.
Command kage clones a website into a self-contained offline folder: it renders every page in headless Chrome, strips all JavaScript, and localises the CSS, images, and fonts so the saved copy looks like the live site but runs no code.
Package robots is a small, pure robots.txt parser and matcher.
Package robots is a small, pure robots.txt parser and matcher.
Package sanitize removes every trace of JavaScript from an HTML document so the saved page is inert: a photograph, not a program.
Package sanitize removes every trace of JavaScript from an HTML document so the saved page is inert: a photograph, not a program.
Package urlx is the URL ⇄ filesystem contract at the heart of kage.
Package urlx is the URL ⇄ filesystem contract at the heart of kage.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL