htmlcleaner

Version: v3.1.1-0...-c422d05
Warning

This package is not in the latest version of its module.

Published: Jan 20, 2018 License: MIT Imports: 7 Imported by: 0

README

HTML Cleaner

Documentation

Constants

const DefaultMaxDepth = 100

DefaultMaxDepth is the default maximum depth of the node trees returned by Parse.

Variables

var DefaultConfig = (&Config{
	ValidateURL: SafeURLScheme,
}).GlobalAttrAtom(atom.Title).
	ElemAttrAtom(atom.A, atom.Href).
	ElemAttrAtom(atom.Img, atom.Src, atom.Alt).
	ElemAttrAtom(atom.Video, atom.Src, atom.Poster, atom.Controls).
	ElemAttrAtom(atom.Audio, atom.Src, atom.Controls).
	ElemAtom(atom.B, atom.I, atom.U, atom.S).
	ElemAtom(atom.Em, atom.Strong, atom.Strike).
	ElemAtom(atom.Big, atom.Small, atom.Sup, atom.Sub).
	ElemAtom(atom.Ins, atom.Del).
	ElemAtom(atom.Abbr, atom.Address, atom.Cite, atom.Q).
	ElemAtom(atom.P, atom.Blockquote, atom.Pre).
	ElemAtom(atom.Code, atom.Kbd, atom.Tt).
	ElemAttrAtom(atom.Details, atom.Open).
	ElemAtom(atom.Summary)

DefaultConfig holds the default settings for htmlcleaner. It is used when a nil Config is passed to Clean.

Functions

func Clean

func Clean(c *Config, fragment string) string

Clean cleans a fragment of HTML using the specified Config, or DefaultConfig if c is nil.

Example
package main

import (
	"fmt"
	"net/url"
	"regexp"

	"golang.org/x/net/html/atom"

	"github.com/BenLubar/htmlcleaner"
)

func main() {
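	// Only https URLs pass validation; other URL attributes are removed.
	config := (&htmlcleaner.Config{
		ValidateURL: func(u *url.URL) bool {
			return u.Scheme == "https"
		},
	}).
		// Allow class on <span> only when the value is exactly "fa-spin".
		ElemAttrAtomMatch(atom.Span, atom.Class, regexp.MustCompile(`\Afa-spin\z`)).
		// Allow href on <a>.
		ElemAttrAtom(atom.A, atom.Href)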

	fmt.Println(htmlcleaner.Clean(config, htmlcleaner.Preprocess(config, `<span class="fa-spin">[whee]</span>
<span class="hello">[aww]</span>
<a href="https://www.google.com">Google</a>
<a href="http://www.google.com">Google</a>
<some tag that doesn't exist>`)))

}
Output:

<span class="fa-spin">[whee]</span>
<span>[aww]</span>
<a href="https://www.google.com">Google</a>
<a>Google</a>
&lt;some tag that doesn&#39;t exist&gt;

func CleanNode

func CleanNode(c *Config, n *html.Node) *html.Node

CleanNode cleans an HTML node using the specified config. Text nodes are returned as-is. Element nodes are recursively checked for legality and have their attributes checked for legality as well. Elements with illegal attributes are copied and the problematic attributes are removed. Elements that are not in the set of legal elements are replaced with a textual version of their source code.

Example
package main

import (
	"fmt"

	"github.com/BenLubar/htmlcleaner"
)

func main() {
	// A nil config; the output below is consistent with DefaultConfig being used.
	var config *htmlcleaner.Config

	nodes := htmlcleaner.Parse(`<a href="http://golang.org/" onclick="malicious()" title="Go">hello</a>
<script>malicious()</script>`)

	for i, n := range nodes {
		nodes[i] = htmlcleaner.CleanNode(config, n)
	}

	fmt.Println(htmlcleaner.Render(nodes...))

}
Output:

<a href="http://golang.org/" title="Go">hello</a>
&lt;script&gt;malicious()&lt;/script&gt;
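
The last line of the output above is the ordinary HTML-escaped form of the script element's source. As a minimal sketch of what "a textual version of their source code" looks like, the standard library's html.EscapeString produces the same text for this input. The helper name escapeAsText is hypothetical and is not part of htmlcleaner; the library's own escaping path may differ.

```go
package main

import (
	"fmt"
	"html"
)

// escapeAsText is a hypothetical helper, not part of htmlcleaner. It shows
// how markup becomes inert text once its special characters are escaped.
func escapeAsText(source string) string {
	return html.EscapeString(source)
}

func main() {
	fmt.Println(escapeAsText(`<script>malicious()</script>`))
	fmt.Println(escapeAsText(`<some tag that doesn't exist>`))
}
```

A browser displays the escaped text literally instead of interpreting it as an element, which is why the script in the example never runs.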

func CleanNodes

func CleanNodes(c *Config, nodes []*html.Node) []*html.Node

CleanNodes calls CleanNode on each node, and additionally wraps inline elements in <p> tags and wraps dangling <li> tags in <ul> tags.

func Parse

func Parse(fragment string) []*html.Node

Parse is a convenience wrapper that calls ParseDepth with DefaultMaxDepth.

func ParseDepth

func ParseDepth(fragment string, maxDepth int) []*html.Node

ParseDepth is a convenience function that wraps html.ParseFragment but takes a string instead of an io.Reader and omits deep trees.
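
The description does not say exactly how deep trees are handled. As a rough sketch of one possible reading, the code below measures the depth of each tree and keeps only those within a limit. The node type and the depth, withinDepth, and chain functions are all hypothetical; htmlcleaner works on *html.Node, and it may truncate deep trees instead of dropping them.

```go
package main

import "fmt"

// node is a hypothetical minimal tree node used only for this sketch.
type node struct {
	name     string
	children []*node
}

// depth returns the number of levels in the tree rooted at n.
// A nil tree has depth 0 and a leaf has depth 1.
func depth(n *node) int {
	if n == nil {
		return 0
	}
	deepest := 0
	for _, child := range n.children {
		if d := depth(child); d > deepest {
			deepest = d
		}
	}
	return deepest + 1
}

// withinDepth keeps only the trees whose depth does not exceed maxDepth.
func withinDepth(trees []*node, maxDepth int) []*node {
	var kept []*node
	for _, tree := range trees {
		if depth(tree) <= maxDepth {
			kept = append(kept, tree)
		}
	}
	return kept
}

// chain builds a single-branch tree with the given number of levels.
func chain(levels int) *node {
	var n *node
	for i := 0; i < levels; i++ {
		child := n
		n = &node{name: "div"}
		if child != nil {
			n.children = []*node{child}
		}
	}
	return n
}

func main() {
	trees := []*node{chain(2), chain(5), chain(3)}
	fmt.Println(len(withinDepth(trees, 3))) // prints 2
}
```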

func Preprocess added in v1.1.0

func Preprocess(config *Config, fragment string) string

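Preprocess escapes disallowed tags in a cleaner way, but does not fix nesting problems. Pass its result to Clean with the same Config, as the Clean example does: Clean(config, Preprocess(config, fragment)).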

func Render

func Render(nodes ...*html.Node) string

Render is a convenience function that wraps html.Render and renders to a string instead of an io.Writer.

func SafeURLScheme

func SafeURLScheme(u *url.URL) bool

SafeURLScheme returns true if u.Scheme is http, https, mailto, data, or an empty string.
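
To make the rule concrete, here is a minimal sketch written from the sentence above, not from the library's source. The names schemeAllowed and checkRaw are hypothetical, and rejecting URLs that fail to parse is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"net/url"
)

// schemeAllowed is a hypothetical stand-in written from the documented
// behavior of SafeURLScheme; it is not the library's implementation.
func schemeAllowed(u *url.URL) bool {
	switch u.Scheme {
	case "http", "https", "mailto", "data", "":
		return true
	}
	return false
}

// checkRaw parses raw and reports whether its scheme is allowed.
// A URL that fails to parse is treated as not allowed.
func checkRaw(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	return schemeAllowed(u)
}

func main() {
	for _, raw := range []string{
		"https://example.com/",
		"/relative/path",
		"mailto:someone@example.com",
		"javascript:alert(1)",
		"ftp://example.com/file",
	} {
		fmt.Println(raw, checkRaw(raw))
	}
}
```

The empty scheme covers relative URLs such as /relative/path. Because data is on the list, callers who do not want inline data URLs need to supply their own ValidateURL.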

Types

type Config

type Config struct {

	// A custom URL validation function. If it is set and returns false,
	// the attribute will be removed. Called for attributes such as src
	// and href.
	ValidateURL func(*url.URL) bool

	// If true, HTML comments are turned into text.
	EscapeComments bool

	// Wrap text nodes in at least one tag.
	WrapText bool
	// contains filtered or unexported fields
}

Config holds the settings for htmlcleaner.
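
As an illustration of the ValidateURL hook, the sketch below builds a validator with the func(*url.URL) bool signature the field expects. It accepts only https URLs on a fixed set of hosts. The names httpsOnlyHosts and accepts, and the host allowlist idea itself, are assumptions for this example and are not part of htmlcleaner.

```go
package main

import (
	"fmt"
	"net/url"
)

// httpsOnlyHosts returns a validator with the func(*url.URL) bool signature
// that the ValidateURL field expects. It is a hypothetical helper.
func httpsOnlyHosts(hosts ...string) func(*url.URL) bool {
	allowed := make(map[string]bool, len(hosts))
	for _, h := range hosts {
		allowed[h] = true
	}
	return func(u *url.URL) bool {
		// Hostname strips any port, so example.com:8443 matches example.com.
		return u.Scheme == "https" && allowed[u.Hostname()]
	}
}

// accepts parses raw and runs the validator on it; parse failures are rejected.
func accepts(validate func(*url.URL) bool, raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	return validate(u)
}

func main() {
	validate := httpsOnlyHosts("example.com", "cdn.example.com")
	fmt.Println(accepts(validate, "https://example.com/a.png"))  // true
	fmt.Println(accepts(validate, "http://example.com/a.png"))   // false
	fmt.Println(accepts(validate, "https://other.test/a.png"))   // false
	fmt.Println(accepts(validate, "/a.png"))                     // false
}
```

A function built this way could be assigned to the ValidateURL field. Note that it rejects relative URLs, which the default SafeURLScheme accepts.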

func (*Config) Elem

func (c *Config) Elem(names ...string) *Config

Elem ensures that each named element is allowed. The receiver is returned to allow call chaining.

func (*Config) ElemAtom

func (c *Config) ElemAtom(elem ...atom.Atom) *Config

ElemAtom ensures that each given element is allowed. The receiver is returned to allow call chaining.

func (*Config) ElemAttr

func (c *Config) ElemAttr(elem string, attr ...string) *Config

ElemAttr allows an attribute name on the specified element. The receiver is returned to allow call chaining.
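
The chaining style can be pictured with a much-reduced model. The allowList type below is hypothetical and only tracks which attributes are allowed on which elements; Config has unexported fields that are not shown on this page. The sketch assumes that allowing an attribute on an element also allows the element. That is an inference from DefaultConfig, which registers a only through ElemAttrAtom, combined with the CleanNode example, where the a element is kept.

```go
package main

import (
	"fmt"
	"sort"
)

// allowList is a hypothetical, much-reduced stand-in for Config.
type allowList struct {
	elems map[string]map[string]bool // element name -> allowed attribute names
}

// Elem ensures each named element is allowed and returns the receiver.
func (a *allowList) Elem(names ...string) *allowList {
	if a.elems == nil {
		a.elems = make(map[string]map[string]bool)
	}
	for _, name := range names {
		if a.elems[name] == nil {
			a.elems[name] = make(map[string]bool)
		}
	}
	return a
}

// ElemAttr allows attributes on one element, allowing the element itself
// if needed (an assumption of this sketch), and returns the receiver.
func (a *allowList) ElemAttr(elem string, attrs ...string) *allowList {
	a.Elem(elem)
	for _, attr := range attrs {
		a.elems[elem][attr] = true
	}
	return a
}

// Allowed reports whether attr may appear on elem.
func (a *allowList) Allowed(elem, attr string) bool {
	return a.elems[elem][attr]
}

// Elements lists the allowed element names in sorted order.
func (a *allowList) Elements() []string {
	names := make([]string, 0, len(a.elems))
	for name := range a.elems {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	list := (&allowList{}).
		Elem("b", "i").
		ElemAttr("a", "href").
		ElemAttr("img", "src", "alt")

	fmt.Println(list.Elements())                                         // [a b i img]
	fmt.Println(list.Allowed("a", "href"), list.Allowed("a", "onclick")) // true false
}
```

Returning the receiver from every method is what lets a whole policy be written as one expression, as DefaultConfig does.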

func (*Config) ElemAttrAtom

func (c *Config) ElemAttrAtom(elem atom.Atom, attr ...atom.Atom) *Config

ElemAttrAtom allows an attribute name on the specified element. The receiver is returned to allow call chaining.

func (*Config) ElemAttrAtomMatch

func (c *Config) ElemAttrAtomMatch(elem, attr atom.Atom, match *regexp.Regexp) *Config

ElemAttrAtomMatch allows an attribute name on the specified element, but only if the value matches a regular expression. The receiver is returned to allow call chaining.
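
The Clean example anchors its pattern with \A and \z. The sketch below shows why that matters under Go's regexp semantics, where MatchString reports a match anywhere in the input. Whether htmlcleaner applies the expression with MatchString is an assumption here; the page does not say. The variable names are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// anchored matches only when the whole value is exactly "fa-spin".
	anchored = regexp.MustCompile(`\Afa-spin\z`)
	// unanchored matches any value that contains "fa-spin" somewhere.
	unanchored = regexp.MustCompile(`fa-spin`)
)

func main() {
	for _, value := range []string{"fa-spin", "fa-spin hidden", "not-fa-spin"} {
		fmt.Println(value, anchored.MatchString(value), unanchored.MatchString(value))
	}
}
```

Only the first value satisfies the anchored pattern, while all three satisfy the unanchored one. Anchoring both ends is the safer default when an attribute value must be one of a few exact strings.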

func (*Config) ElemAttrMatch

func (c *Config) ElemAttrMatch(elem, attr string, match *regexp.Regexp) *Config

ElemAttrMatch allows an attribute name on the specified element, but only if the value matches a regular expression. The receiver is returned to allow call chaining.

func (*Config) GlobalAttr

func (c *Config) GlobalAttr(names ...string) *Config

GlobalAttr allows an attribute name on all allowed elements. The receiver is returned to allow call chaining.

func (*Config) GlobalAttrAtom

func (c *Config) GlobalAttrAtom(a atom.Atom) *Config

GlobalAttrAtom allows an attribute name on all allowed elements. The receiver is returned to allow call chaining.

func (*Config) WrapTextInside

func (c *Config) WrapTextInside(names ...string) *Config

WrapTextInside makes an element's children behave as if they are root nodes in the context of WrapText. The receiver is returned to allow call chaining.

func (*Config) WrapTextInsideAtom

func (c *Config) WrapTextInsideAtom(elem ...atom.Atom) *Config

WrapTextInsideAtom makes an element's children behave as if they are root nodes in the context of WrapText. The receiver is returned to allow call chaining.
