sanitize

package
v0.3.2 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 15, 2026 License: MIT Imports: 4 Imported by: 0

Documentation

Overview

Package sanitize removes every trace of JavaScript from an HTML document so the saved page is inert: a photograph, not a program.

It parses the document with golang.org/x/net/html, walks the tree, and deletes scripts, event handlers, javascript: URLs, and downlevel IE conditional comments (which can smuggle a <script> past an element-only walk). It also removes preconnect/preload hints, which do nothing offline. Styles, images, fonts, forms, and all semantic markup are left untouched so the layout survives intact.
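The three less obvious checks in that list (event handlers, javascript: URLs, and conditional comments) can be illustrated with plain string predicates. This is a minimal sketch using only the standard library, not the package's actual implementation. The helper names isEventHandlerAttr, isJavaScriptURL, and isConditionalComment are hypothetical, and treating every attribute whose name starts with "on" as a handler is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// isEventHandlerAttr reports whether an attribute name looks like an inline
// event handler such as onclick or onload. Assumption: any name that starts
// with "on" (case-insensitively) is treated as a handler.
func isEventHandlerAttr(name string) bool {
	n := strings.ToLower(name)
	return len(n) > 2 && strings.HasPrefix(n, "on")
}

// isJavaScriptURL reports whether a URL attribute value uses the javascript:
// scheme. Browsers ignore ASCII tabs and newlines inside a URL as well as
// leading spaces and control characters, so those are dropped before the
// case-insensitive prefix comparison.
func isJavaScriptURL(v string) bool {
	v = strings.Map(func(r rune) rune {
		if r == '\t' || r == '\n' || r == '\r' {
			return -1 // drop the character
		}
		return r
	}, v)
	v = strings.TrimLeftFunc(v, func(r rune) bool { return r <= ' ' })
	return strings.HasPrefix(strings.ToLower(v), "javascript:")
}

// isConditionalComment reports whether the text of an HTML comment node
// (without the <!-- and --> delimiters) is a downlevel-hidden IE conditional
// comment such as "[if lt IE 9]><script ...></script><![endif]". The markup
// inside is comment text, which is why a walk that only visits element nodes
// never sees the <script>.
func isConditionalComment(data string) bool {
	d := strings.ToLower(strings.TrimSpace(data))
	return strings.HasPrefix(d, "[if ") && strings.HasSuffix(d, "<![endif]")
}

func main() {
	fmt.Println(isEventHandlerAttr("onClick"))
	fmt.Println(isJavaScriptURL(" Java\tScript:alert(1)"))
	fmt.Println(isJavaScriptURL("https://example.com/javascript:"))
	fmt.Println(isConditionalComment(`[if lt IE 9]><script src="x.js"></script><![endif]`))
}
```

In a real tree walk these predicates would presumably be applied to attribute keys, URL-valued attribute values, and comment node data respectively. Which attributes count as URL-valued is left open here.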

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Options

type Options struct {
	// KeepNoscript unwraps <noscript> content into the document instead of
	// deleting it, for sites whose real content hides behind a JS check.
	KeepNoscript bool
	// KeepMetaRefresh preserves a plain timed <meta http-equiv="refresh">
	// (a JS-target refresh is always removed).
	KeepMetaRefresh bool
	// Banner, when non-empty, is inserted as an HTML comment at the top of the
	// document.
	Banner string
}

Options tunes a few edge behaviours. The zero value is the safe default: scripts and <noscript> elements are removed, meta refresh tags are removed, and no banner is inserted.
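The KeepMetaRefresh rule can be read as a two-step decision: a refresh whose target is a javascript: URL is always dropped, and any other refresh survives only when the option is set. The sketch below shows one way to express that reading with the standard library. It is not the package's code. The helpers refreshTarget and keepRefresh are hypothetical, the local Options type mirrors only the one documented field the sketch needs, and the parsing of the content attribute is deliberately simplified.

```go
package main

import (
	"fmt"
	"strings"
)

// Options mirrors only the documented field this sketch needs.
type Options struct {
	KeepMetaRefresh bool
}

// refreshTarget extracts the URL part of a meta refresh content value such as
// "5; url=/next". It returns "" for a plain timed refresh like "30".
// Simplification: the first ';' or ',' is taken as the separator.
func refreshTarget(content string) string {
	i := strings.IndexAny(content, ";,")
	if i < 0 {
		return ""
	}
	rest := strings.TrimSpace(content[i+1:])
	// Strip an optional, case-insensitive "url =" prefix.
	if len(rest) >= 3 && strings.EqualFold(rest[:3], "url") {
		after := strings.TrimSpace(rest[3:])
		if strings.HasPrefix(after, "=") {
			rest = strings.TrimSpace(after[1:])
		}
	}
	return strings.Trim(rest, `"'`)
}

// keepRefresh applies the documented rule as interpreted here: a refresh that
// targets a javascript: URL is never kept, and any other refresh is kept only
// when opts.KeepMetaRefresh is true.
func keepRefresh(content string, opts Options) bool {
	target := strings.ToLower(strings.TrimSpace(refreshTarget(content)))
	if strings.HasPrefix(target, "javascript:") {
		return false
	}
	return opts.KeepMetaRefresh
}

func main() {
	keep := Options{KeepMetaRefresh: true}
	fmt.Println(keepRefresh("30", Options{}))                 // zero value: removed
	fmt.Println(keepRefresh("30", keep))                      // plain timed refresh: kept
	fmt.Println(keepRefresh("0; url=javascript:alert(1)", keep)) // JS target: removed
}
```

Whether a refresh that redirects to an ordinary URL counts as "plain" is not stated in the documentation. The sketch keeps it when the option is set, which is an assumption.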

type Report

type Report struct {
	ScriptsRemoved      int
	HandlersRemoved     int
	NoscriptRemoved     int
	NoscriptUnwrapped   int
	JSURLsNeutralized   int
	MetaRefreshRemoved  int
	DeadLinksRemoved    int
	CondCommentsRemoved int
	CharsetAdded        bool
}

Report records what was removed or otherwise changed, for the run summary and for tests.
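As an illustration of the run-summary use, the sketch below renders a Report as a single line that lists only the non-zero counters. The Report type is copied from the declaration above so the example compiles on its own. The summary function, its labels, and its output format are hypothetical and not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// Report is copied from the documented struct so the sketch is self-contained.
type Report struct {
	ScriptsRemoved      int
	HandlersRemoved     int
	NoscriptRemoved     int
	NoscriptUnwrapped   int
	JSURLsNeutralized   int
	MetaRefreshRemoved  int
	DeadLinksRemoved    int
	CondCommentsRemoved int
	CharsetAdded        bool
}

// summary renders the non-zero counters of r as "label=count" pairs separated
// by spaces, in declaration order. The labels are invented for this sketch.
func summary(r Report) string {
	counters := []struct {
		label string
		n     int
	}{
		{"scripts", r.ScriptsRemoved},
		{"handlers", r.HandlersRemoved},
		{"noscript-removed", r.NoscriptRemoved},
		{"noscript-unwrapped", r.NoscriptUnwrapped},
		{"js-urls", r.JSURLsNeutralized},
		{"meta-refresh", r.MetaRefreshRemoved},
		{"dead-links", r.DeadLinksRemoved},
		{"cond-comments", r.CondCommentsRemoved},
	}
	var parts []string
	for _, c := range counters {
		if c.n > 0 {
			parts = append(parts, fmt.Sprintf("%s=%d", c.label, c.n))
		}
	}
	if r.CharsetAdded {
		parts = append(parts, "charset=added")
	}
	if len(parts) == 0 {
		return "nothing changed"
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(summary(Report{ScriptsRemoved: 3, HandlersRemoved: 1, CharsetAdded: true}))
	fmt.Println(summary(Report{}))
}
```

Because Report contains only ints and a bool, two values can also be compared directly with == in a test, which may be simpler than checking fields one by one.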

func CleanTree

func CleanTree(root *html.Node, opts Options) Report

CleanTree removes all JavaScript from an already-parsed document in place and returns the Report. The cloner uses this so the HTML is parsed only once and shared with the asset rewriter.

func Strip

func Strip(doc []byte, opts Options) ([]byte, Report, error)

Strip parses doc, removes all JavaScript, and returns the rewritten HTML plus a Report. A parse error is returned unchanged to the caller.
