Documentation
Overview ¶
Package sanitize removes every trace of JavaScript from an HTML document so the saved page is inert: a photograph, not a program.
It parses with golang.org/x/net/html, walks the tree, deletes scripts and event handlers, neutralizes javascript: URLs, and removes downlevel IE conditional comments (which can smuggle a <script> past an element-only walk) along with the preconnect/preload hints that serve no purpose offline. Styles, images, fonts, forms, and all semantic markup are left untouched so the layout survives intact.
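The checks described above can be illustrated with a few standard-library predicates. This is a minimal sketch, not the package's implementation: the helper names isEventHandlerAttr, isJavaScriptURL, and isConditionalComment are hypothetical, and the matching rules are assumptions about what such a sanitizer would need to catch.

```go
package main

import (
	"fmt"
	"strings"
)

// isEventHandlerAttr (hypothetical) reports whether an attribute name looks
// like an inline event handler such as onclick or onload. Treating every
// "on*" name as a handler is a heuristic assumed for this sketch.
func isEventHandlerAttr(name string) bool {
	name = strings.ToLower(name)
	return len(name) > 2 && strings.HasPrefix(name, "on")
}

// isJavaScriptURL (hypothetical) reports whether an attribute value is a
// javascript: URL. Tabs and newlines are dropped and leading control
// characters and spaces are trimmed before the scheme is compared, because
// browsers ignore them when parsing URLs.
func isJavaScriptURL(value string) bool {
	var b strings.Builder
	for _, r := range value {
		if r == '\t' || r == '\n' || r == '\r' {
			continue
		}
		b.WriteRune(r)
	}
	s := strings.TrimLeftFunc(b.String(), func(r rune) bool { return r <= 0x20 })
	return strings.HasPrefix(strings.ToLower(s), "javascript:")
}

// isConditionalComment (hypothetical) reports whether the text between
// <!-- and --> is a downlevel-hidden IE conditional comment, for example
// "[if lt IE 9]>...<![endif]".
func isConditionalComment(data string) bool {
	s := strings.ToLower(strings.TrimSpace(data))
	return strings.HasPrefix(s, "[if ") && strings.HasSuffix(s, "<![endif]")
}

func main() {
	fmt.Println(isEventHandlerAttr("onclick"))
	fmt.Println(isJavaScriptURL(" JaVa\tScript:void(0)"))
	fmt.Println(isConditionalComment(`[if lt IE 9]><script src="x.js"></script><![endif]`))
}
```

The conditional-comment case shows why an element-only walk is not enough: the <script> lives inside comment text, so it never appears as an element node.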
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Options ¶
type Options struct {
	// KeepNoscript unwraps <noscript> content into the document instead of
	// deleting it, for sites whose real content hides behind a JS check.
	KeepNoscript bool
	// KeepMetaRefresh preserves a plain timed <meta http-equiv="refresh">
	// (a JS-target refresh is always removed).
	KeepMetaRefresh bool
	// Banner, when non-empty, is inserted as an HTML comment at the top of the
	// document.
	Banner string
}
Options tune a few edge behaviours; the zero value is the safe default (scripts and noscript removed, meta-refresh removed).
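To make the zero-value behaviour concrete, here is a small sketch that uses a local copy of the Options struct so it compiles on its own. The describe helper is hypothetical and exists only for illustration; it restates the documented field semantics and calls nothing in the real package.

```go
package main

import "fmt"

// Options is a local copy of the documented struct, declared here only so
// the sketch is self-contained.
type Options struct {
	KeepNoscript    bool
	KeepMetaRefresh bool
	Banner          string
}

// describe (hypothetical) spells out what a given Options value asks for,
// following the field comments in the documentation.
func describe(o Options) string {
	noscript := "removed"
	if o.KeepNoscript {
		noscript = "unwrapped"
	}
	refresh := "removed"
	if o.KeepMetaRefresh {
		refresh = "timed refresh kept"
	}
	banner := "none"
	if o.Banner != "" {
		banner = "set"
	}
	return fmt.Sprintf("noscript: %s; meta refresh: %s; banner: %s", noscript, refresh, banner)
}

func main() {
	// The zero value is the safe default.
	fmt.Println(describe(Options{}))
	// Opting in to the edge behaviours.
	fmt.Println(describe(Options{KeepNoscript: true, KeepMetaRefresh: true, Banner: "saved copy"}))
}
```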
type Report ¶
type Report struct {
	ScriptsRemoved      int
	HandlersRemoved     int
	NoscriptRemoved     int
	NoscriptUnwrapped   int
	JSURLsNeutralized   int
	MetaRefreshRemoved  int
	DeadLinksRemoved    int
	CondCommentsRemoved int
	CharsetAdded        bool
}
Report records what was removed or changed, for the run summary and for tests.
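As one way a test or run summary might consume these counters, the sketch below declares a local copy of Report and adds two hypothetical helpers, total and summary. Neither is part of the documented API, and the output format is an assumption made for illustration.

```go
package main

import "fmt"

// Report is a local copy of the documented struct, declared here only so
// the sketch is self-contained.
type Report struct {
	ScriptsRemoved      int
	HandlersRemoved     int
	NoscriptRemoved     int
	NoscriptUnwrapped   int
	JSURLsNeutralized   int
	MetaRefreshRemoved  int
	DeadLinksRemoved    int
	CondCommentsRemoved int
	CharsetAdded        bool
}

// total (hypothetical) sums every integer counter, giving the number of
// individual changes made to the document.
func (r Report) total() int {
	return r.ScriptsRemoved + r.HandlersRemoved + r.NoscriptRemoved +
		r.NoscriptUnwrapped + r.JSURLsNeutralized + r.MetaRefreshRemoved +
		r.DeadLinksRemoved + r.CondCommentsRemoved
}

// summary (hypothetical) formats a one-line run summary.
func (r Report) summary() string {
	return fmt.Sprintf("%d changes (%d scripts, %d handlers, %d javascript: URLs), charset added: %t",
		r.total(), r.ScriptsRemoved, r.HandlersRemoved, r.JSURLsNeutralized, r.CharsetAdded)
}

func main() {
	r := Report{ScriptsRemoved: 3, HandlersRemoved: 2, NoscriptUnwrapped: 1, CharsetAdded: true}
	fmt.Println(r.summary())
}
```

A test can then assert on individual fields for precise checks, or on total() being zero to confirm that an already clean document passes through unchanged.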