Documentation
Variables ¶
var ErrTooLarge = errors.New("asset over size cap")
ErrTooLarge reports that an asset exceeds the size cap and was skipped without being saved. It is deliberately a skip, not a download failure: the caller leaves the asset out of the mirror rather than writing a truncated fragment of it, so a 500 MB installer or video never bloats the archive with a corrupt quarter of itself.
Functions ¶
func RewriteCSS ¶
RewriteCSS rewrites every url(...) and @import in a stylesheet so its references point at local files. base is the stylesheet's own URL (so relative references resolve correctly); sink maps each absolute URL to its local path. data: URLs and unparseable references are left untouched.
func RewriteHTML ¶
RewriteHTML walks the parsed document and rewrites every resource and link reference through sink, resolving relative URLs against base. It mutates the tree in place; the caller renders it afterwards. References kage cannot handle (data:, mailto:, fragment-only, …) are left untouched.
Types ¶
type Downloader ¶
type Downloader struct {
Client *http.Client
UserAgent string
MaxBytes int64 // per-asset cap; 0 = unlimited
Retries int // extra attempts for a transient failure (0 = try once)
}
Downloader fetches asset bytes over plain HTTP. It is separate from the Chrome pool: assets are public bytes that rarely need a real browser, so a fast HTTP client keeps the crawl cheap. Failures are returned to the caller, which logs them and moves on: a missing asset degrades a page but never aborts a clone.
func NewDownloader ¶
func NewDownloader(userAgent string, timeout time.Duration, maxBytes int64) *Downloader
NewDownloader builds a Downloader with a sane client and the given timeout.
type RefSink ¶
RefSink registers a resolved asset/page URL with the cloner and returns the string to write back into the markup or CSS — a relative local path for things kage saves, or the absolute URL for anything it leaves on the live web.
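As a rough illustration of that contract, the sketch below models a sink as a function from absolute URL to replacement string. The declaration of RefSink is not shown on this page, so sinkFunc, newSink, the same-host rule, and the assets/ layout are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// sinkFunc is a hypothetical shape for a sink: absolute URL in, string to
// write back into the markup or CSS out.
type sinkFunc func(abs string) string

// newSink returns a sink that records same-host URLs in seen and maps them
// to a local path; everything else keeps its absolute URL.
func newSink(host string, seen map[string]string) sinkFunc {
	return func(abs string) string {
		u, err := url.Parse(abs)
		if err != nil || u.Host != host {
			return abs // left on the live web
		}
		local := path.Join("assets", strings.TrimPrefix(u.Path, "/"))
		seen[abs] = local // registered for later download
		return local
	}
}

func main() {
	seen := map[string]string{}
	sink := newSink("example.com", seen)
	fmt.Println(sink("https://example.com/img/a.png"))
	fmt.Println(sink("https://cdn.other.net/lib.js"))
	fmt.Println(len(seen))
}
```

A real sink would likely return a path relative to the referencing file rather than to the mirror root; the flat mapping here only keeps the example short.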
type StatusError ¶ added in v0.1.2
type StatusError struct {
Code int
}
StatusError reports a non-2xx HTTP response. It carries the code so callers can render a clear message ("HTTP 403 Forbidden") and decide whether a retry is worthwhile, without the URL baked in (the caller already has it).
func (*StatusError) Error ¶ added in v0.1.2
func (e *StatusError) Error() string