Documentation
Index ¶
- Variables
- func CleanRedditHTML(content string) string
- func DiscoverFeedURL(sourceURL *url.URL, content []byte) (string, error)
- func ExtractImageFromHTML(content string) (string, string, error)
- func FindAllHTMLNodes(n *html.Node, tag string) []*html.Node
- func FindHTMLNode(n *html.Node, tag string) *html.Node
- func FindMainImage(page []byte, rawURL string) (string, error)
- func IsHTML(s string) bool
- func IsHTMLElement(str, tag string) bool
- func SanitizeHTMLString(rawStr string) (string, error)
- func ToPlainText(s string) string
- type Favicon
Constants ¶
This section is empty.
Variables ¶
Functions ¶
func CleanRedditHTML ¶ added in v0.160.0
CleanRedditHTML removes the awkward table markup that some Reddit posts are wrapped in.
func DiscoverFeedURL ¶ added in v0.159.0
DiscoverFeedURL attempts to find a feed URL within an HTML page.
The feed URL is usually found in one of two "canonical" places. First, following the RSS autodiscovery convention, look for a link element with rel="alternate" and type="application/rss+xml". Second, check for a link element whose URL ends with feed, rss or atom, which would indicate a feed URL.
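The two checks above can be sketched roughly as follows. This is only an illustration, not the package's implementation: `discoverFeed`, `looksLikeFeed` and `mustParse` are hypothetical names, the sketch uses only the standard library (`encoding/xml` in its lenient, HTML-tolerant mode) rather than a real HTML parser, and it assumes the second check applies to the `href` of `<link>` elements.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"net/url"
	"strings"
)

// looksLikeFeed reports whether href ends with feed, rss or atom,
// ignoring case and a single trailing slash.
func looksLikeFeed(href string) bool {
	h := strings.ToLower(strings.TrimSuffix(href, "/"))
	for _, suffix := range []string{"feed", "rss", "atom"} {
		if strings.HasSuffix(h, suffix) {
			return true
		}
	}
	return false
}

// discoverFeed is a hypothetical stand-in for DiscoverFeedURL. It prefers a
// <link rel="alternate" type="application/rss+xml"> element and falls back
// to the first <link> whose href looks like a feed URL. The result is
// resolved against pageURL.
func discoverFeed(pageURL *url.URL, content []byte) (string, error) {
	dec := xml.NewDecoder(bytes.NewReader(content))
	// Lenient settings that encoding/xml documents for HTML-like input.
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	fallback := ""
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || !strings.EqualFold(start.Name.Local, "link") {
			continue
		}
		attrs := map[string]string{}
		for _, a := range start.Attr {
			attrs[strings.ToLower(a.Name.Local)] = a.Value
		}
		href := attrs["href"]
		if href == "" {
			continue
		}
		if strings.EqualFold(attrs["rel"], "alternate") &&
			strings.EqualFold(attrs["type"], "application/rss+xml") {
			ref, err := url.Parse(href)
			if err != nil {
				return "", err
			}
			return pageURL.ResolveReference(ref).String(), nil
		}
		if fallback == "" && looksLikeFeed(href) {
			fallback = href
		}
	}
	if fallback == "" {
		return "", errors.New("no feed URL found")
	}
	ref, err := url.Parse(fallback)
	if err != nil {
		return "", err
	}
	return pageURL.ResolveReference(ref).String(), nil
}

// mustParse is a small helper for the example; it panics on a bad URL.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

func main() {
	page := []byte(`<html><head><title>Blog</title>` +
		`<link rel="alternate" type="application/rss+xml" href="/index.xml">` +
		`</head><body></body></html>`)
	fmt.Println(discoverFeed(mustParse("https://example.com/blog/"), page))
}
```

The non-strict XML decoder copes with simple, reasonably tidy markup like the sample above, but real-world pages are often messier, so a dedicated HTML parser is the safer choice in practice.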
func ExtractImageFromHTML ¶ added in v0.160.0
func FindAllHTMLNodes ¶
FindAllHTMLNodes returns all nodes matching the tag within n.
func FindHTMLNode ¶
FindHTMLNode does a depth-first search for the first node matching the tag.
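As a rough illustration of the two traversals described above, the sketch below runs a depth-first search over a simplified tree. The `node` type is a hypothetical stand-in for `*html.Node` (it keeps only a tag name and children), and `findFirst`, `findAll` and `sampleTree` are illustrative names, not part of this package.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for *html.Node:
// it keeps only a tag name and the child nodes.
type node struct {
	Tag      string
	Children []*node
}

// findFirst returns the first node with the given tag in pre-order
// (depth-first) traversal, or nil if no node matches.
func findFirst(n *node, tag string) *node {
	if n == nil {
		return nil
	}
	if n.Tag == tag {
		return n
	}
	for _, child := range n.Children {
		if found := findFirst(child, tag); found != nil {
			return found
		}
	}
	return nil
}

// findAll collects every node with the given tag, in the same
// pre-order sequence that findFirst uses.
func findAll(n *node, tag string) []*node {
	if n == nil {
		return nil
	}
	var out []*node
	if n.Tag == tag {
		out = append(out, n)
	}
	for _, child := range n.Children {
		out = append(out, findAll(child, tag)...)
	}
	return out
}

// sampleTree builds html > (head, body > (div > img, img)).
func sampleTree() *node {
	return &node{Tag: "html", Children: []*node{
		{Tag: "head"},
		{Tag: "body", Children: []*node{
			{Tag: "div", Children: []*node{{Tag: "img"}}},
			{Tag: "img"},
		}},
	}}
}

func main() {
	tree := sampleTree()
	fmt.Println(findFirst(tree, "img") != nil)
	fmt.Println(len(findAll(tree, "img")))
}
```

Because the search is depth-first, the first match in the sample tree is the `img` nested inside `div`, not its shallower sibling that appears later in document order.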
func FindMainImage ¶
FindMainImage tries to find a "main" image for the page, using the readability parser.
func IsHTMLElement ¶
IsHTMLElement reports whether the given string is the given HTML element.
func SanitizeHTMLString ¶ added in v0.83.0
SanitizeHTMLString parses and re-renders the given string containing HTML. The intent is that the result is sanitized and reformatted as well-formed HTML, although this is not guaranteed.
func ToPlainText ¶ added in v0.155.0
ToPlainText converts an HTML-encoded string to plain text.
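One plausible reading of this conversion is "drop the tags, then decode the entities". The sketch below shows that idea using only the standard library. `toPlainText` is a hypothetical helper, and the exact behaviour of ToPlainText (whitespace handling, which elements it drops) is not documented here, so treat every detail as an assumption.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// toPlainText is a hypothetical stand-in for ToPlainText. It removes
// anything between '<' and '>' and then decodes HTML entities.
// Tags are stripped before decoding so that escaped angle brackets
// such as "&lt;" survive as literal text.
func toPlainText(s string) string {
	var b strings.Builder
	inTag := false
	for _, r := range s {
		switch {
		case r == '<':
			inTag = true
		case r == '>' && inTag:
			inTag = false
		case !inTag:
			b.WriteRune(r)
		}
	}
	return strings.TrimSpace(html.UnescapeString(b.String()))
}

func main() {
	fmt.Println(toPlainText("<p>Fish &amp; chips &lt;3</p>"))
}
```

This naive scanner is easily confused by a `>` inside an attribute value or by script and style content, so it is only suitable for simple fragments.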