package html
v0.167.1
Warning

This package is not in the latest version of its module.
Published: Jun 21, 2026
License: AGPL-3.0
Imports: 13
Imported by: 0

Documentation

Constants

This section is empty.

Variables

var (
	ErrNotFound  = errors.New("not found")
	ErrParseURL  = errors.New("could not parse URL")
	ErrParseHTML = errors.New("could not parse HTML")
)

Functions

func CleanRedditHTML added in v0.160.0

func CleanRedditHTML(content string) string

CleanRedditHTML removes the awkward table layout that wraps the content of some Reddit posts.

func DiscoverFeedURL added in v0.159.0

func DiscoverFeedURL(sourceURL *url.URL, content []byte) (string, error)

DiscoverFeedURL attempts to find a feed URL within an HTML page.

There are a couple of "canonical" places where a feed URL is usually found. First, following the RSS autodiscovery convention, it looks for a link element with rel="alternate" and type="application/rss+xml". Second, it checks for a link element whose URL ends with feed, rss, or atom, which suggests a feed URL.

func ExtractImageFromHTML added in v0.160.0

func ExtractImageFromHTML(content string) (string, string, error)

func FindAllHTMLNodes

func FindAllHTMLNodes(n *html.Node, tag string) []*html.Node

FindAllHTMLNodes returns all nodes matching the tag within n.

func FindHTMLNode

func FindHTMLNode(n *html.Node, tag string) *html.Node

FindHTMLNode does a depth-first search for the first node matching the tag.
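A minimal sketch of both searches follows. So that it runs with the standard library alone, it uses a hypothetical `node` type that mirrors the `FirstChild`, `NextSibling`, and `Data` fields of `*html.Node` from `golang.org/x/net/html`, plus an `Element` flag standing in for a node-type check. `findFirst` examines a node before its children (pre-order), so a match nested early in the tree is returned before a later sibling; `findAll` collects matches in the same document order. Whether the package's functions also test `n` itself is an assumption here.

```go
package main

import "fmt"

// node is a simplified, hypothetical stand-in for *html.Node.
type node struct {
	Element     bool   // true for element nodes
	Data        string // tag name for element nodes
	FirstChild  *node
	NextSibling *node
}

// el builds an element node and links its children as siblings.
func el(tag string, children ...*node) *node {
	n := &node{Element: true, Data: tag}
	for i := len(children) - 1; i >= 0; i-- {
		children[i].NextSibling = n.FirstChild
		n.FirstChild = children[i]
	}
	return n
}

// findFirst does a depth-first (pre-order) search for the first match.
func findFirst(n *node, tag string) *node {
	if n == nil {
		return nil
	}
	if n.Element && n.Data == tag {
		return n
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		if found := findFirst(c, tag); found != nil {
			return found
		}
	}
	return nil
}

// findAll collects every matching element under n, in document order.
func findAll(n *node, tag string) []*node {
	var out []*node
	var walk func(*node)
	walk = func(m *node) {
		if m.Element && m.Data == tag {
			out = append(out, m)
		}
		for c := m.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	if n != nil {
		walk(n)
	}
	return out
}

func main() {
	// <body><div><p></p></div><p></p></body>
	p1, p2 := el("p"), el("p")
	body := el("body", el("div", p1), p2)
	fmt.Println(findFirst(body, "p") == p1) // true
	fmt.Println(len(findAll(body, "p")))    // 2
}
```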

func FindMainImage

func FindMainImage(page []byte, rawURL string) (string, error)

FindMainImage tries to find a "main" image for the page, using the readability parser.

func IsHTML

func IsHTML(s string) bool

func IsHTMLElement

func IsHTMLElement(str, tag string) bool

IsHTMLElement reports whether str is an HTML element with the given tag.

func SanitizeHTMLString added in v0.83.0

func SanitizeHTMLString(rawStr string) (string, error)

SanitizeHTMLString parses and re-renders the given HTML string. Round-tripping the markup through the parser is intended to sanitize it and reformat it into well-formed HTML.

func ToPlainText added in v0.155.0

func ToPlainText(s string) string

ToPlainText converts an HTML-encoded string to plain text.
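A naive sketch of the idea, not the package's implementation: drop everything between `<` and `>`, then decode entities with `html.UnescapeString` from the Go standard library's `html` package (not this package). Tags are stripped before unescaping so that escaped text such as `&lt;b&gt;` survives as a literal `<b>` instead of being mistaken for a tag. It ignores `<script>`/`<style>` contents, comments, and whitespace normalization, which a parser-based version would handle.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// toPlainText strips tags with a tiny state machine, then unescapes
// entities and trims surrounding whitespace.
func toPlainText(s string) string {
	var b strings.Builder
	inTag := false
	for _, r := range s {
		switch {
		case r == '<':
			inTag = true
		case r == '>' && inTag:
			inTag = false
		case !inTag:
			b.WriteRune(r) // keep text outside tags, including stray '>'
		}
	}
	return strings.TrimSpace(html.UnescapeString(b.String()))
}

func main() {
	fmt.Println(toPlainText("<p>Fish &amp; <em>chips</em></p>")) // Fish & chips
}
```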

Types

type Favicon

type Favicon struct {
	// contains filtered or unexported fields
}

Favicon is a favicon link found in <head>.

func FindFavicon

func FindFavicon(
	page []byte,
	pageURL string,
) ([]byte, string, Favicon, error)

FindFavicon tries each candidate in order and returns the first one that responds with a 2xx status and a non-empty body.
