Documentation ¶
Overview ¶
Package cleanhtml provides a toolset for reading source HTML documents and attempting to render them into more human-readable output.
Although this package is meant to be consumed by the cleanpg utility (http://github.com/scu/cleanpg), it may be useful in other applications.
	url := "http://example.com"
	sourceData, err := cleanhtml.ReadHTML(url)
	if err != nil {
		errStr := fmt.Sprintf("Could not read document at %q: %s", url, err)
		panic(errStr)
	}
	cleanData, err := cleanhtml.CleanHTML(sourceData)
	if err != nil {
		errStr := fmt.Sprintf("Could not transform data: %s", err)
		panic(errStr)
	}
Disclaimer: this library outputs a document whose layout and content differ from what the original page designer intended. These re-rendered documents are not intended for re-publishing, circumventing content protection mechanisms, or violating the copyright of the original content authors.
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CleanHTML ¶
CleanHTML returns a rendered HTML document. It accepts document data (normally obtained from cleanhtml.ReadHTML), parses it, and passes it through a set of filters to produce readable HTML output, which is returned as a string.
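The idea of passing a document through a set of filters can be sketched with the standard library alone. This is only an illustration under stated assumptions: the filter type, the three filter functions, and applyFilters are hypothetical names, and the regular expressions are crude stand-ins. The filters cleanhtml actually applies are not described here, and a real implementation would more likely work on a parsed HTML tree.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// filter is a hypothetical pipeline stage: it takes an HTML string
// and returns a rewritten one.
type filter func(string) string

var (
	scriptRe  = regexp.MustCompile(`(?is)<script\b.*?</script>`)
	commentRe = regexp.MustCompile(`(?s)<!--.*?-->`)
	blankRe   = regexp.MustCompile(`\n{3,}`)
)

// stripScripts removes <script> elements together with their contents.
func stripScripts(s string) string { return scriptRe.ReplaceAllString(s, "") }

// stripComments removes HTML comments.
func stripComments(s string) string { return commentRe.ReplaceAllString(s, "") }

// collapseBlankLines squeezes runs of three or more newlines into two
// and trims surrounding whitespace.
func collapseBlankLines(s string) string {
	return strings.TrimSpace(blankRe.ReplaceAllString(s, "\n\n"))
}

// applyFilters runs the document through each filter in order.
func applyFilters(doc string, filters ...filter) string {
	for _, f := range filters {
		doc = f(doc)
	}
	return doc
}

func main() {
	src := "<p>Hello</p><script>alert(1)</script><!-- ad --><p>World</p>"
	fmt.Println(applyFilters(src, stripScripts, stripComments, collapseBlankLines))
}
```

Keeping each filter as a plain function makes the order explicit and lets individual stages be switched on or off, which is roughly the role the Set...Render flags play in this package.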
func ReadHTML ¶
ReadHTML reads a web page and returns a string containing the unfiltered document, which can then be passed to cleanhtml.CleanHTML to render the result.
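For readers curious what "reading a web page into a string" involves, the following is a minimal sketch using only net/http and io. It is not the package's ReadHTML: fetchHTML and fetchFromLocalServer are hypothetical names, and the status-code check is an assumption about reasonable behavior rather than something this documentation states.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchHTML is a hypothetical stand-in (not cleanhtml.ReadHTML) that
// downloads a page and returns its body as a string.
func fetchHTML(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", fmt.Errorf("fetching %q: %w", url, err)
	}
	defer resp.Body.Close()

	// Assumption: any non-200 response is treated as an error.
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("fetching %q: unexpected status %s", url, resp.Status)
	}

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("reading %q: %w", url, err)
	}
	return string(body), nil
}

// fetchFromLocalServer serves a fixed page from a local test server so
// the sketch can be exercised without external network access.
func fetchFromLocalServer(page string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, page)
	}))
	defer srv.Close()
	return fetchHTML(srv.URL)
}

func main() {
	doc, err := fetchFromLocalServer("<html><body><h1>Hi</h1></body></html>")
	if err != nil {
		panic(err)
	}
	fmt.Println(doc)
}
```

Returning the error instead of panicking inside the helper leaves the decision to the caller, as in the Overview example.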
func SetLinksRender ¶
func SetLinksRender(flag bool)
SetLinksRender sets the flag indicating whether links (<a ... href ...>) will be rendered [default = true].
func SetPostH1Render ¶
func SetPostH1Render(flag bool)
SetPostH1Render sets the flag indicating whether the renderer will process BODY elements until the first H1 tag is reached.
func SetStyleRender ¶
func SetStyleRender(flag bool)
SetStyleRender sets the flag indicating whether the renderer automatically embeds tag-level styles [default = true].
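The three setters follow a common Go pattern: a package-level flag with a default, changed through a small function before rendering. The sketch below illustrates that pattern for the links flag only. All names in it (linksRender, setLinksRender, renderLinks) are hypothetical, and it assumes that "not rendered" means an anchor is replaced by its inner text; the package's actual behavior may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// linksRender is a hypothetical package-level flag; it mirrors the
// documented default of true.
var linksRender = true

// setLinksRender changes the flag for all later renderLinks calls.
// The variable is unsynchronized, so set it before rendering starts.
func setLinksRender(flag bool) { linksRender = flag }

// anchorRe matches an <a ...>...</a> element and captures its inner text.
var anchorRe = regexp.MustCompile(`(?is)<a\b[^>]*>(.*?)</a>`)

// renderLinks leaves the document untouched when links are enabled and
// otherwise replaces each anchor with its inner text (an assumption).
func renderLinks(doc string) string {
	if linksRender {
		return doc
	}
	return anchorRe.ReplaceAllString(doc, "$1")
}

func main() {
	doc := `<p>See <a href="http://example.com">this page</a>.</p>`

	fmt.Println(renderLinks(doc)) // default: links kept

	setLinksRender(false)
	fmt.Println(renderLinks(doc)) // links unwrapped to plain text
}
```

With the real package, the corresponding step would be to call cleanhtml.SetLinksRender(false) before cleanhtml.CleanHTML. Because the flags are set through package-level functions, a setting stays in effect for every later call until it is changed again.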
Types ¶
This section is empty.