bluemonday

package

v1.0.0 Latest Latest Go to latest Published: Feb 13, 2022 License: GPL-2.0, BSD-3-Clause Imports: 7 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/pschlump/lexie

README ¶

bluemonday

bluemonday is a HTML sanitizer implemented in Go. It is fast and highly configurable.

bluemonday takes untrusted user generated content as an input, and will return HTML that has been sanitised against a whitelist of approved HTML elements and attributes so that you can safely include the content in your web page.

If you accept user generated content, and your server uses Go, you need bluemonday.

The default policy for user generated content (bluemonday.UGCPolicy().Sanitize()) turns this:

Hello <STYLE>.XSS{background-image:url("javascript:alert('XSS')");}</STYLE><A CLASS=XSS></A>World

Into a harmless:

Hello World

And it turns this:

<a href="javascript:alert('XSS1')" onmouseover="alert('XSS2')">XSS<a>

Into this:

XSS

Whilst still allowing this:

<a href="http://www.google.com/">
  <img src="https://ssl.gstatic.com/accounts/ui/logo_2x.png"/>
</a>

To pass through mostly unaltered (it gained a rel="nofollow" which is a good thing for user generated content):

<a href="http://www.google.com/" rel="nofollow">
  <img src="https://ssl.gstatic.com/accounts/ui/logo_2x.png"/>
</a>

It protects sites from XSS attacks. There are many vectors for an XSS attack and the best way to mitigate the risk is to sanitize user input against a known safe list of HTML elements and attributes.

You should always run bluemonday after any other processing.

If you use blackfriday or Pandoc then bluemonday should be run after these steps. This ensures that no insecure HTML is introduced later in your process.

bluemonday is heavily inspired by both the OWASP Java HTML Sanitizer and the HTML Purifier.

Technical Summary

Whitelist based, you need to either build a policy describing the HTML elements and attributes to permit (and the regexp patterns of attributes), or use one of the supplied policies representing good defaults.

The policy containing the whitelist is applied using a fast non-validating, forward only, token-based parser implemented in the Go net/html library by the core Go team.

We expect to be supplied with well-formatted HTML (closing elements for every applicable open element, nested correctly) and so we do not focus on repairing badly nested or incomplete HTML. We focus on simply ensuring that whatever elements do exist are described in the policy whitelist and that attributes and links are safe for use on your web page. GIGO does apply and if you feed it bad HTML bluemonday is not tasked with figuring out how to make it good again.

Supported Go Versions

bluemonday is regularly tested against Go 1.1, 1.2, 1.3 and tip.

We do not support Go 1.0 as we depend on golang.org/x/net/html which includes a reference to io.ErrNoProgress which did not exist in Go 1.0.

Is it production ready?

Yes

We are using bluemonday in production having migrated from the widely used and heavily field tested OWASP Java HTML Sanitizer.

We are passing our extensive test suite (including AntiSamy tests as well as tests for any issues raised). Check for any unresolved issues to see whether anything may be a blocker for you.

We invite pull requests and issues to help us ensure we are offering comprehensive protection against various attacks via user generated content.

Usage

Install in your ${GOPATH} using go get -u github.com/microcosm-cc/bluemonday

Then call it:

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	p := bluemonday.UGCPolicy()
	html := p.Sanitize(
		`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`,
	)

	// Output:
	// <a href="http://www.google.com" rel="nofollow">Google</a>
	fmt.Println(html)
}

We offer three ways to call Sanitize:

p.Sanitize(string) string
p.SanitizeBytes([]byte) []byte
p.SanitizeReader(io.Reader) bytes.Buffer

If you are obsessed about performance, p.SanitizeReader(r).Bytes() will return a []byte without performing any unnecessary casting of the inputs or outputs. Though the difference is so negligible you should never need to care.

You can build your own policies:

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	p := bluemonday.NewPolicy()

	// Require URLs to be parseable by net/url.Parse and either:
	//   mailto: http:// or https://
	p.AllowStandardURLs()

	// We only allow <p> and <a href="">
	p.AllowAttrs("href").OnElements("a")
	p.AllowElements("p")

	html := p.Sanitize(
		`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`,
	)

	// Output:
	// <a href="http://www.google.com">Google</a>
	fmt.Println(html)
}

We ship two default policies:

bluemonday.StrictPolicy() which can be thought of as equivalent to stripping all HTML elements and their attributes as it has nothing on it's whitelist. An example usage scenario would be blog post titles where HTML tags are not expected at all and if they are then the elements and the content of the elements should be stripped. This is a very strict policy.
bluemonday.UGCPolicy() which allows a broad selection of HTML elements and attributes that are safe for user generated content. Note that this policy does not whitelist iframes, object, embed, styles, script, etc. An example usage scenario would be blog post bodies where a variety of formatting is expected along with the potential for TABLEs and IMGs.

Policy Building

The essence of building a policy is to determine which HTML elements and attributes are considered safe for your scenario. OWASP provide an XSS prevention cheat sheet to help explain the risks, but essentially:

Avoid anything other than the standard HTML elements
Avoid script, style, iframe, object, embed, base elements that allow code to be executed by the client or third party content to be included that can execute code
Avoid anything other than plain HTML attributes with values matched to a regexp

Basically, you should be able to describe what HTML is fine for your scenario. If you do not have confidence that you can describe your policy please consider using one of the shipped policies such as bluemonday.UGCPolicy().

To create a new policy:

p := bluemonday.NewPolicy()

To add elements to a policy either add just the elements:

p.AllowElements("b", "strong")

Or add elements as a virtue of adding an attribute:

// Not the recommended pattern, see the recommendation on using .Matching() below
p.AllowAttrs("nowrap").OnElements("td", "th")

Attributes can either be added to all elements:

p.AllowAttrs("dir").Matching(regexp.MustCompile("(?i)rtl|ltr")).Globally()

Or attributes can be added to specific elements:

// Not the recommended pattern, see the recommendation on using .Matching() below
p.AllowAttrs("value").OnElements("li")

It is always recommended that an attribute be made to match a pattern. XSS in HTML attributes is very easy otherwise:

// \p{L} matches unicode letters, \p{N} matches unicode numbers
p.AllowAttrs("title").Matching(regexp.MustCompile(`[\p{L}\p{N}\s\-_',:\[\]!\./\\\(\)&]*`)).Globally()

You can stop at any time and call .Sanitize():

// string htmlIn passed in from a HTTP POST
htmlOut := p.Sanitize(htmlIn)

And you can take any existing policy and extend it:

p := bluemonday.UGCPolicy()
p.AllowElements("fieldset", "select", "option")

Links

Links are difficult beasts to sanitise safely and also one of the biggest attack vectors for malicious content.

It is possible to do this:

p.AllowAttrs("href").Matching(regexp.MustCompile(`(?i)mailto|https?`)).OnElements("a")

But that will not protect you as the regular expression is insufficient in this case to have prevented a malformed value doing something unexpected.

We provide some additional global options for safely working with links.

RequireParseableURLs will ensure that URLs are parseable by Go's net/url package:

p.RequireParseableURLs(true)

If you have enabled parseable URLs then the following option will AllowRelativeURLs. By default this is disabled (bluemonday is a whitelist tool... you need to explicitly tell us to permit things) and when disabled it will prevent all local and scheme relative URLs (i.e. href="localpage.html", href="../home.html" and even href="//www.google.com" are relative):

p.AllowRelativeURLs(true)

If you have enabled parseable URLs then you can whitelist the schemes (commonly called protocol when thinking of http and https) that are permitted. Bear in mind that allowing relative URLs in the above option will allow for a blank scheme:

p.AllowURLSchemes("mailto", "http", "https")

Regardless of whether you have enabled parseable URLs, you can force all URLs to have a rel="nofollow" attribute. This will be added if it does not exist, but only when the href is valid:

// This applies to "a" "area" "link" elements that have a "href" attribute
p.RequireNoFollowOnLinks(true)

We provide a convenience method that applies all of the above, but you will still need to whitelist the linkable elements for the URL rules to be applied to:

p.AllowStandardURLs()
p.AllowAttrs("cite").OnElements("blockquote", "q")
p.AllowAttrs("href").OnElements("a", "area")
p.AllowAttrs("src").OnElements("img")

An additional complexity regarding links is the data URI as defined in RFC2397. The data URI allows for images to be served inline using this format:

<img src="data:image/webp;base64,UklGRh4AAABXRUJQVlA4TBEAAAAvAAAAAAfQ//73v/+BiOh/AAA=">

We have provided a helper to verify the mimetype followed by base64 content of data URIs links:

p.AllowDataURIImages()

That helper will enable GIF, JPEG, PNG and WEBP images.

It should be noted that there is a potential security risk with the use of data URI links. You should only enable data URI links if you already trust the content.

Policy Building Helpers

We also bundle some helpers to simplify policy building:


// Permits the "dir", "id", "lang", "title" attributes globally
p.AllowStandardAttributes()

// Permits the "img" element and it's standard attributes
p.AllowImages()

// Permits ordered and unordered lists, and also definition lists
p.AllowLists()

// Permits HTML tables and all applicable elements and non-styling attributes
p.AllowTables()

Invalid Instructions

The following are invalid:

// This does not say where the attributes are allowed, you need to add
// .Globally() or .OnElements(...)
// This will be ignored without error.
p.AllowAttrs("value")

// This does not say where the attributes are allowed, you need to add
// .Globally() or .OnElements(...)
// This will be ignored without error.
p.AllowAttrs(
	"type",
).Matching(
	regexp.MustCompile("(?i)^(circle|disc|square|a|A|i|I|1)$"),
)

Both examples exhibit the same issue, they declare attributes but do not then specify whether they are whitelisted globally or only on specific elements (and which elements). Attributes belong to one or more elements, and the policy needs to declare this.

Limitations

We are not yet including any tools to help whitelist and sanitize CSS. Which means that unless you wish to do the heavy lifting in a single regular expression (inadvisable), you should not allow the "style" attribute anywhere.

It is not the job of bluemonday to fix your bad HTML, it is merely the job of bluemonday to prevent malicious HTML getting through. If you have mismatched HTML elements, or non-conforming nesting of elements, those will remain. But if you have well-structured HTML bluemonday will not break it.

TODO

Add support for allowing the list of HTML elements that are permitted without attributes to be configured
Add support for CSS sanitisation to allow some CSS properties based on a whitelist, possibly using the Gorilla CSS3 scanner
Investigate whether devs want to blacklist elements and attributes. This would allow devs to take an existing policy (such as the bluemonday.UGCPolicy() ) that encapsulates 90% of what they're looking for but does more than they need, and to remove the extra things they do not want to make it 100% what they want
Investigate whether devs want a validating HTML mode, in which the HTML elements are not just transformed into a balanced tree (every start tag has a closing tag at the correct depth) but also that elements and character data appear only in their allowed context (i.e. that a table element isn't a descendent of a caption, that colgroup, thead, tbody, tfoot and tr are permitted, and that character data is not permitted)

Development

If you have cloned this repo you will probably need the dependency:

go get golang.org/x/net/html

Gophers can use their familiar tools:

go build

go test

I personally use a Makefile as it spares typing the same args over and over whilst providing consistency for those of us who jump from language to language and enjoy just typing make in a project directory and watch magic happen.

make will build, vet, test and install the library.

make clean will remove the library from a single ${GOPATH}/pkg directory tree

make test will run the tests

make cover will run the tests and open a browser window with the coverage report

make lint will run golint (install via go get github.com/golang/lint/golint)

Long term goals

Open the code to adversarial peer review similar to the Attack Review Ground Rules
Raise funds and pay for an external security review

Documentation ¶

Overview ¶

Package bluemonday provides a way of describing a whitelist of HTML elements and attributes as a policy, and for that policy to be applied to untrusted strings from users that may contain markup. All elements and attributes not on the whitelist will be stripped.

The default bluemonday.UGCPolicy().Sanitize() turns this:

Hello <STYLE>.XSS{background-image:url("javascript:alert('XSS')");}</STYLE><A CLASS=XSS></A>World

Into the more harmless:

Hello World

And it turns this:

<a href="javascript:alert('XSS1')" onmouseover="alert('XSS2')">XSS<a>

Into this:

XSS

Whilst still allowing this:

<a href="http://www.google.com/">
  <img src="https://ssl.gstatic.com/accounts/ui/logo_2x.png"/>
</a>

To pass through mostly unaltered (it gained a rel="nofollow"):

<a href="http://www.google.com/" rel="nofollow">
  <img src="https://ssl.gstatic.com/accounts/ui/logo_2x.png"/>
</a>

The primary purpose of bluemonday is to take potentially unsafe user generated content (from things like Markdown, HTML WYSIWYG tools, etc) and make it safe for you to put on your website.

It protects sites against XSS (http://en.wikipedia.org/wiki/Cross-site_scripting) and other malicious content that a user interface may deliver. There are many vectors for an XSS attack (https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet) and the safest thing to do is to sanitize user input against a known safe list of HTML elements and attributes.

Note: You should always run bluemonday after any other processing.

If you use blackfriday (https://github.com/russross/blackfriday) or Pandoc (http://johnmacfarlane.net/pandoc/) then bluemonday should be run after these steps. This ensures that no insecure HTML is introduced later in your process.

bluemonday is heavily inspired by both the OWASP Java HTML Sanitizer (https://code.google.com/p/owasp-java-html-sanitizer/) and the HTML Purifier (http://htmlpurifier.org/).

We ship two default policies, one is bluemonday.StrictPolicy() and can be thought of as equivalent to stripping all HTML elements and their attributes as it has nothing on it's whitelist.

The other is bluemonday.UGCPolicy() and allows a broad selection of HTML elements and attributes that are safe for user generated content. Note that this policy does not whitelist iframes, object, embed, styles, script, etc.

The essence of building a policy is to determine which HTML elements and attributes are considered safe for your scenario. OWASP provide an XSS prevention cheat sheet ( https://www.google.com/search?q=xss+prevention+cheat+sheet ) to help explain the risks, but essentially:

Avoid whitelisting anything other than plain HTML elements
Avoid whitelisting `script`, `style`, `iframe`, `object`, `embed`, `base` elements
Avoid whitelisting anything other than plain HTML elements with simple values that you can match to a regexp

Example ¶

package main

import (
	"fmt"
	"regexp"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// Create a new policy
	p := bluemonday.NewPolicy()

	// Add elements to a policy without attributes
	p.AllowElements("b", "strong")

	// Add elements as a virtue of adding an attribute
	p.AllowAttrs("nowrap").OnElements("td", "th")

	// Attributes can either be added to all elements
	p.AllowAttrs("dir").Globally()

	//Or attributes can be added to specific elements
	p.AllowAttrs("value").OnElements("li")

	// It is ALWAYS recommended that an attribute be made to match a pattern
	// XSS in HTML attributes is a very easy attack vector

	// \p{L} matches unicode letters, \p{N} matches unicode numbers
	p.AllowAttrs("title").Matching(regexp.MustCompile(`[\p{L}\p{N}\s\-_',:\[\]!\./\\\(\)&]*`)).Globally()

	// You can stop at any time and call .Sanitize()

	// Assumes that string htmlIn was passed in from a HTTP POST and contains
	// untrusted user generated content
	htmlIn := `untrusted user generated content <body onload="alert('XSS')">`
	fmt.Println(p.Sanitize(htmlIn))

	// And you can take any existing policy and extend it
	p = bluemonday.UGCPolicy()
	p.AllowElements("fieldset", "select", "option")

	// Links are complex beasts and one of the biggest attack vectors for
	// malicious content so we have included features specifically to help here.

	// This is not recommended:
	p = bluemonday.NewPolicy()
	p.AllowAttrs("href").Matching(regexp.MustCompile(`(?i)mailto|https?`)).OnElements("a")

	// The regexp is insufficient in this case to have prevented a malformed
	// value doing something unexpected.

	// This will ensure that URLs are not considered invalid by Go's net/url
	// package.
	p.RequireParseableURLs(true)

	// If you have enabled parseable URLs then the following option will allow
	// relative URLs. By default this is disabled and will prevent all local and
	// schema relative URLs (i.e. `href="//www.google.com"` is schema relative).
	p.AllowRelativeURLs(true)

	// If you have enabled parseable URLs then you can whitelist the schemas
	// that are permitted. Bear in mind that allowing relative URLs in the above
	// option allows for blank schemas.
	p.AllowURLSchemes("mailto", "http", "https")

	// Regardless of whether you have enabled parseable URLs, you can force all
	// URLs to have a rel="nofollow" attribute. This will be added if it does
	// not exist.

	// This applies to "a" "area" "link" elements that have a "href" attribute
	p.RequireNoFollowOnLinks(true)

	// We provide a convenience function that applies all of the above, but you
	// will still need to whitelist the linkable elements:
	p = bluemonday.NewPolicy()
	p.AllowStandardURLs()
	p.AllowAttrs("cite").OnElements("blockquote")
	p.AllowAttrs("href").OnElements("a", "area")
	p.AllowAttrs("src").OnElements("img")

	// Policy Building Helpers

	// If you've got this far and you're bored already, we also bundle some
	// other convenience functions
	p = bluemonday.NewPolicy()
	p.AllowStandardAttributes()
	p.AllowImages()
	p.AllowLists()
	p.AllowTables()
}

Output:

Examples ¶

Package
NewPolicy
Policy.AllowAttrs
Policy.AllowElements
Policy.Sanitize
Policy.SanitizeBytes
Policy.SanitizeReader
StrictPolicy
UGCPolicy

Constants ¶

This section is empty.

Variables ¶

View Source

var (
	// CellAlign handles the `align` attribute
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td#attr-align
	CellAlign = regexp.MustCompile(`(?i)^(center|justify|left|right|char)$`)

	// CellVerticalAlign handles the `valign` attribute
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td#attr-valign
	CellVerticalAlign = regexp.MustCompile(`(?i)^(baseline|bottom|middle|top)$`)

	// Direction handles the `dir` attribute
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Element/bdo#attr-dir
	Direction = regexp.MustCompile(`(?i)^(rtl|ltr)$`)

	// ImageAlign handles the `align` attribute on the `image` tag
	// http://www.w3.org/MarkUp/Test/Img/imgtest.html
	ImageAlign = regexp.MustCompile(
		`(?i)^(left|right|top|texttop|middle|absmiddle|baseline|bottom|absbottom)$`,
	)

	// Integer describes whole positive integers (including 0) used in places
	// like td.colspan
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td#attr-colspan
	Integer = regexp.MustCompile(`^[0-9]+$`)

	// ISO8601 according to the W3 group is only a subset of the ISO8601
	// standard: http://www.w3.org/TR/NOTE-datetime
	//
	// Used in places like time.datetime
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Element/time#attr-datetime
	//
	// Matches patterns:
	//  Year:
	//     YYYY (eg 1997)
	//  Year and month:
	//     YYYY-MM (eg 1997-07)
	//  Complete date:
	//     YYYY-MM-DD (eg 1997-07-16)
	//  Complete date plus hours and minutes:
	//     YYYY-MM-DDThh:mmTZD (eg 1997-07-16T19:20+01:00)
	//  Complete date plus hours, minutes and seconds:
	//     YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)
	//  Complete date plus hours, minutes, seconds and a decimal fraction of a
	//  second
	//      YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)
	ISO8601 = regexp.MustCompile(
		`^[0-9]{4}(-[0-9]{2}(-[0-9]{2}([ T][0-9]{2}(:[0-9]{2}){1,2}(.[0-9]{1,6})` +
			`?Z?([\+-][0-9]{2}:[0-9]{2})?)?)?)?$`,
	)

	// ListType encapsulates the common value as well as the latest spec
	// values for lists
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Element/ol#attr-type
	ListType = regexp.MustCompile(`(?i)^(circle|disc|square|a|A|i|I|1)$`)

	// SpaceSeparatedTokens is used in places like `a.rel` and the common attribute
	// `class` which both contain space delimited lists of data tokens
	// http://www.w3.org/TR/html-markup/datatypes.html#common.data.tokens-def
	// Regexp: \p{L} matches unicode letters, \p{N} matches unicode numbers
	SpaceSeparatedTokens = regexp.MustCompile(`^([\s\p{L}\p{N}_-]+)$`)

	// Number is a double value used on HTML5 meter and progress elements
	// http://www.whatwg.org/specs/web-apps/current-work/multipage/the-button-element.html#the-meter-element
	Number = regexp.MustCompile(`^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$`)

	// NumberOrPercent is used predominantly as units of measurement in width
	// and height attributes
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#attr-height
	NumberOrPercent = regexp.MustCompile(`^[0-9]+[%]?$`)

	// Paragraph of text in an attribute such as *.'title', img.alt, etc
	// https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes#attr-title
	// Note that we are not allowing chars that could close tags like '>'
	Paragraph = regexp.MustCompile(`^[\p{L}\p{N}\s\-_',\[\]!\./\\\(\)]*$`)
)

A selection of regular expressions that can be used as .Matching() rules on HTML attributes.

Functions ¶

This section is empty.

Types ¶

type Policy ¶

type Policy struct {
	// contains filtered or unexported fields
}

Policy encapsulates the whitelist of HTML elements and attributes that will be applied to the sanitised HTML.

You should use bluemonday.NewPolicy() to create a blank policy as the unexported fields contain maps that need to be initialized.

func NewPolicy ¶

func NewPolicy() *Policy

NewPolicy returns a blank policy with nothing whitelisted or permitted. This is the recommended way to start building a policy and you should now use AllowAttrs() and/or AllowElements() to construct the whitelist of HTML elements and attributes.

Example ¶

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// NewPolicy is a blank policy and we need to explicitly whitelist anything
	// that we wish to allow through
	p := bluemonday.NewPolicy()

	// We ensure any URLs are parseable and have rel="nofollow" where applicable
	p.AllowStandardURLs()

	// AllowStandardURLs already ensures that the href will be valid, and so we
	// can skip the .Matching()
	p.AllowAttrs("href").OnElements("a")

	// We allow paragraphs too
	p.AllowElements("p")

	html := p.Sanitize(
		`<p><a onblur="alert(secret)" href="http://www.google.com">Google</a></p>`,
	)

	fmt.Println(html)

}

Output:

<p><a href="http://www.google.com" rel="nofollow">Google</a></p>

func StrictPolicy ¶

func StrictPolicy() *Policy

StrictPolicy returns an empty policy, which will effectively strip all HTML elements and their attributes from a document.

Example ¶

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// StrictPolicy is equivalent to NewPolicy and as nothing else is declared
	// we are stripping all elements (and their attributes)
	p := bluemonday.StrictPolicy()

	html := p.Sanitize(
		`Goodbye <a onblur="alert(secret)" href="http://en.wikipedia.org/wiki/Goodbye_Cruel_World_(Pink_Floyd_song)">Cruel</a> World`,
	)

	fmt.Println(html)

}

Output:

Goodbye Cruel World

func StripTagsPolicy ¶

func StripTagsPolicy() *Policy

StripTagsPolicy is DEPRECATED. Use StrictPolicy instead.

func UGCPolicy ¶

func UGCPolicy() *Policy

UGCPolicy returns a policy aimed at user generated content that is a result of HTML WYSIWYG tools and Markdown conversions.

This is expected to be a fairly rich document where as much markup as possible should be retained. Markdown permits raw HTML so we are basically providing a policy to sanitise HTML5 documents safely but with the least intrusion on the formatting expectations of the user.

Example ¶

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// UGCPolicy is a convenience policy for user generated content.
	p := bluemonday.UGCPolicy()

	html := p.Sanitize(
		`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`,
	)

	fmt.Println(html)

}

Output:

<a href="http://www.google.com" rel="nofollow">Google</a>

func (*Policy) AddTargetBlankToFullyQualifiedLinks ¶

func (p *Policy) AddTargetBlankToFullyQualifiedLinks(require bool) *Policy

AddTargetBlankToFullyQualifiedLinks will result in all <a> tags that point to a non-local destination (i.e. starts with a protocol and has a host) having a target="_blank" added to them if one does not already exist

Note: This requires p.RequireParseableURLs(true) and will enable it.

func (*Policy) AllowAttrs ¶

func (p *Policy) AllowAttrs(attrNames ...string) *attrPolicyBuilder

AllowAttrs takes a range of HTML attribute names and returns an attribute policy builder that allows you to specify the pattern and scope of the whitelisted attribute.

The attribute policy is only added to the core policy when either Globally() or OnElements(...) are called.

Example ¶

package main

import (
	"github.com/microcosm-cc/bluemonday"
)

func main() {
	p := bluemonday.NewPolicy()

	// Allow the 'title' attribute on every HTML element that has been
	// whitelisted
	p.AllowAttrs("title").Matching(bluemonday.Paragraph).Globally()

	// Allow the 'abbr' attribute on only the 'td' and 'th' elements.
	p.AllowAttrs("abbr").Matching(bluemonday.Paragraph).OnElements("td", "th")

	// Allow the 'colspan' and 'rowspan' attributes, matching a positive integer
	// pattern, on only the 'td' and 'th' elements.
	p.AllowAttrs("colspan", "rowspan").Matching(
		bluemonday.Integer,
	).OnElements("td", "th")
}

Output:

func (*Policy) AllowDataURIImages ¶

func (p *Policy) AllowDataURIImages()

AllowDataURIImages permits the use of inline images defined in RFC2397 http://tools.ietf.org/html/rfc2397 http://en.wikipedia.org/wiki/Data_URI_scheme

Images must have a mimetype matching:

image/gif
image/jpeg
image/png
image/webp

NOTE: There is a potential security risk to allowing data URIs and you should only permit them on content you already trust. http://palizine.plynt.com/issues/2010Oct/bypass-xss-filters/ https://capec.mitre.org/data/definitions/244.html

func (*Policy) AllowDocType ¶

func (p *Policy) AllowDocType(allow bool) *Policy

AllowDocType states whether the HTML sanitised by the sanitizer is allowed to contain the HTML DocType tag: <!DOCTYPE HTML> or one of it's variants.

The HTML spec only permits one doctype per document, and as you know how you are using the output of this, you know best as to whether we should ignore it (default) or not.

If you are sanitizing a HTML fragment the default (false) is fine.

func (*Policy) AllowElements ¶

func (p *Policy) AllowElements(names ...string) *Policy

AllowElements will append HTML elements to the whitelist without applying an attribute policy to those elements (the elements are permitted sans-attributes)

Example ¶

package main

import (
	"github.com/microcosm-cc/bluemonday"
)

func main() {
	p := bluemonday.NewPolicy()

	// Allow styling elements without attributes
	p.AllowElements("br", "div", "hr", "p", "span")
}

Output:

func (*Policy) AllowImages ¶

func (p *Policy) AllowImages()

AllowImages enables the img element and some popular attributes. It will also ensure that URL values are parseable. This helper does not enable data URI images, for that you should also use the AllowDataURIImages() helper.

func (*Policy) AllowLists ¶

func (p *Policy) AllowLists()

AllowLists will enabled ordered and unordered lists, as well as definition lists

func (*Policy) AllowRelativeURLs ¶

func (p *Policy) AllowRelativeURLs(require bool) *Policy

AllowRelativeURLs enables RequireParseableURLs and then permits URLs that are parseable, have no schema information and url.IsAbs() returns false This permits local URLs

func (*Policy) AllowStandardAttributes ¶

func (p *Policy) AllowStandardAttributes()

AllowStandardAttributes will enable "id", "title" and the language specific attributes "dir" and "lang" on all elements that are whitelisted

func (*Policy) AllowStandardURLs ¶

func (p *Policy) AllowStandardURLs()

AllowStandardURLs is a convenience function that will enable rel="nofollow" on "a", "area" and "link" (if you have allowed those elements) and will ensure that the URL values are parseable and either relative or belong to the "mailto", "http", or "https" schemes

func (*Policy) AllowStyling ¶

func (p *Policy) AllowStyling()

AllowStyling presently enables the class attribute globally.

Note: When bluemonday ships a CSS parser and we can safely sanitise that, this will also allow sanitized styling of elements via the style attribute.

func (*Policy) AllowTables ¶

func (p *Policy) AllowTables()

AllowTables will enable a rich set of elements and attributes to describe HTML tables

func (*Policy) AllowURLSchemeWithCustomPolicy ¶

func (p *Policy) AllowURLSchemeWithCustomPolicy(
	scheme string,
	urlPolicy func(url *url.URL) (allowUrl bool),
) *Policy

AllowURLSchemeWithCustomPolicy will append URL schemes with a custom URL policy to the whitelist. Only the URLs with matching schema and urlPolicy(url) returning true will be allowed.

func (*Policy) AllowURLSchemes ¶

func (p *Policy) AllowURLSchemes(schemes ...string) *Policy

AllowURLSchemes will append URL schemes to the whitelist Example: p.AllowURLSchemes("mailto", "http", "https")

func (*Policy) RequireNoFollowOnFullyQualifiedLinks ¶

func (p *Policy) RequireNoFollowOnFullyQualifiedLinks(require bool) *Policy

RequireNoFollowOnFullyQualifiedLinks will result in all <a> tags that point to a non-local destination (i.e. starts with a protocol and has a host) having a rel="nofollow" added to them if one does not already exist

Note: This requires p.RequireParseableURLs(true) and will enable it.

func (*Policy) RequireNoFollowOnLinks ¶

func (p *Policy) RequireNoFollowOnLinks(require bool) *Policy

RequireNoFollowOnLinks will result in all <a> tags having a rel="nofollow" added to them if one does not already exist

Note: This requires p.RequireParseableURLs(true) and will enable it.

func (*Policy) RequireParseableURLs ¶

func (p *Policy) RequireParseableURLs(require bool) *Policy

RequireParseableURLs will result in all URLs requiring that they be parseable by "net/url" url.Parse() This applies to: - a.href - area.href - blockquote.cite - img.src - link.href - script.src

func (*Policy) Sanitize ¶

func (p *Policy) Sanitize(s string) string

Sanitize takes a string that contains a HTML fragment or document and applies the given policy whitelist.

It returns a HTML string that has been sanitized by the policy or an empty string if an error has occurred (most likely as a consequence of extremely malformed input)

Example ¶

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// UGCPolicy is a convenience policy for user generated content.
	p := bluemonday.UGCPolicy()

	// string in, string out
	html := p.Sanitize(`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`)

	fmt.Println(html)

}

Output:

<a href="http://www.google.com" rel="nofollow">Google</a>

func (*Policy) SanitizeBytes ¶

func (p *Policy) SanitizeBytes(b []byte) []byte

SanitizeBytes takes a []byte that contains a HTML fragment or document and applies the given policy whitelist.

It returns a []byte containing the HTML that has been sanitized by the policy or an empty []byte if an error has occurred (most likely as a consequence of extremely malformed input)

Example ¶

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// UGCPolicy is a convenience policy for user generated content.
	p := bluemonday.UGCPolicy()

	// []byte in, []byte out
	b := []byte(`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`)
	b = p.SanitizeBytes(b)

	fmt.Println(string(b))

}

Output:

<a href="http://www.google.com" rel="nofollow">Google</a>

func (*Policy) SanitizeReader ¶

func (p *Policy) SanitizeReader(r io.Reader) *bytes.Buffer

SanitizeReader takes an io.Reader that contains a HTML fragment or document and applies the given policy whitelist.

It returns a bytes.Buffer containing the HTML that has been sanitized by the policy. Errors during sanitization will merely return an empty result.

Example ¶

package main

import (
	"fmt"
	"strings"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// UGCPolicy is a convenience policy for user generated content.
	p := bluemonday.UGCPolicy()

	// io.Reader in, bytes.Buffer out
	r := strings.NewReader(`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`)
	buf := p.SanitizeReader(r)

	fmt.Println(buf.String())

}

Output:

<a href="http://www.google.com" rel="nofollow">Google</a>

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL