htmldoc

package
v1.0.0
Published: Sep 9, 2022 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Overview

Package htmldoc provides an interface for handling HTML documents. It is built on top of the golang.org/x/net/html package.

Variables

var (
	ErrSkip = errors.New("htmldoc: skip")
	ErrStop = errors.New("htmldoc: stop")
)

ErrSkip and ErrStop are sentinel errors that the callback passed to Traverse can return to control the traversal.

Functions

func FindAttr

func FindAttr(n *html.Node, key string) *html.Attribute

FindAttr returns an Attribute in the given Node with the given key, or nil if there is no such attribute. FindAttr inspects only attributes with empty Namespace and ignores "foreign attributes."

func FindNode

func FindNode(n *html.Node, tag atom.Atom) *html.Node

FindNode locates a descendant of the given Node (including the Node itself) which has the given tag. It returns the first descendant when there is more than one, and nil when there is none.
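The lookup described above can be sketched as a depth-first search. This is only an illustration under stated assumptions: the node type is a hypothetical stand-in for html.Node (a string Tag replaces atom.Atom, and ID exists only to tell nodes apart), and it assumes "first" means first in document order, which the description does not spell out.

```go
package main

import "fmt"

// node is a hypothetical stand-in for html.Node; it keeps only the
// child and sibling links that the search needs.
type node struct {
	Tag, ID                 string
	FirstChild, NextSibling *node
}

// findNode returns n itself if it has the given tag, otherwise the first
// matching descendant in document order, or nil if nothing matches.
func findNode(n *node, tag string) *node {
	if n == nil {
		return nil
	}
	if n.Tag == tag {
		return n
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		if found := findNode(c, tag); found != nil {
			return found
		}
	}
	return nil
}

// sampleTree builds <body><div><p id=first></p></div><p id=second></p></body>.
func sampleTree() *node {
	first := &node{Tag: "p", ID: "first"}
	second := &node{Tag: "p", ID: "second"}
	div := &node{Tag: "div", FirstChild: first, NextSibling: second}
	return &node{Tag: "body", FirstChild: div}
}

func main() {
	body := sampleTree()
	fmt.Println(findNode(body, "p").ID)         // first
	fmt.Println(findNode(body, "table") == nil) // true
	fmt.Println(findNode(body, "body") == body) // true
}
```

Because a nil result is possible, callers would need to check the returned node before dereferencing it.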

func GetAttr

func GetAttr(n *html.Node, key string) string

GetAttr returns the value of an attribute in the given Node with the given key, or an empty string if there is no such attribute. GetAttr considers only attributes with an empty Namespace and ignores "foreign attributes."
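The namespace rule shared by FindAttr and GetAttr can be sketched as follows. The attribute and node types are hypothetical stand-ins that mirror the Namespace, Key, and Val fields of html.Attribute, and findAttr and getAttr are illustrative reimplementations, not the package's own code. Whether the real FindAttr returns a pointer into the node's attribute slice or a copy is an assumption here.

```go
package main

import "fmt"

// attribute is a hypothetical stand-in for html.Attribute.
type attribute struct {
	Namespace, Key, Val string
}

// node is a hypothetical stand-in for html.Node, reduced to its attributes.
type node struct {
	Attr []attribute
}

// findAttr returns the first attribute with an empty namespace and the
// given key, or nil. Namespaced ("foreign") attributes are ignored.
func findAttr(n *node, key string) *attribute {
	for i := range n.Attr {
		a := &n.Attr[i]
		if a.Namespace == "" && a.Key == key {
			return a
		}
	}
	return nil
}

// getAttr returns the attribute value, or "" when the attribute is absent.
func getAttr(n *node, key string) string {
	if a := findAttr(n, key); a != nil {
		return a.Val
	}
	return ""
}

func main() {
	n := &node{Attr: []attribute{
		{Namespace: "xlink", Key: "href", Val: "#icon"}, // foreign, skipped
		{Key: "href", Val: "/docs"},
	}}
	fmt.Println(getAttr(n, "href"))          // /docs
	fmt.Println(findAttr(n, "title") == nil) // true
}
```

One consequence of the string-returning form is that a missing attribute and an attribute with an empty value look the same; the pointer-returning form distinguishes them.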

func Traverse

func Traverse(n *html.Node, f func(*html.Node) error) error

Traverse performs a pre-order traversal of the parse tree rooted at n, calling f on each node. f can return ErrSkip to skip the subtree of the current node, or ErrStop to end the traversal early without error. When f returns any other non-nil error, Traverse abandons the traversal immediately and returns that error.
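The control flow described above can be sketched with a small reimplementation. Everything here is an assumption made for illustration: node is a hypothetical stand-in for html.Node, errSkip and errStop stand in for ErrSkip and ErrStop, and traverse, walk, sampleTree, and visited are illustrative helpers rather than the package's source.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Stand-ins for htmldoc.ErrSkip and htmldoc.ErrStop.
var (
	errSkip = errors.New("skip")
	errStop = errors.New("stop")
)

// node is a hypothetical stand-in for html.Node.
type node struct {
	Data                    string
	FirstChild, NextSibling *node
}

// traverse visits n and its descendants in pre-order. errStop ends the
// walk and is reported as success; other errors are returned as-is.
func traverse(n *node, f func(*node) error) error {
	if err := walk(n, f); err != nil && !errors.Is(err, errStop) {
		return err
	}
	return nil
}

// walk calls f on n, then recurses into children unless f asked to skip.
func walk(n *node, f func(*node) error) error {
	if err := f(n); err != nil {
		if errors.Is(err, errSkip) {
			return nil // skip this subtree, continue with siblings
		}
		return err // errStop or a real error: unwind immediately
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		if err := walk(c, f); err != nil {
			return err
		}
	}
	return nil
}

// sampleTree builds html -> (head -> title, body -> p).
func sampleTree() *node {
	body := &node{Data: "body", FirstChild: &node{Data: "p"}}
	head := &node{Data: "head", FirstChild: &node{Data: "title"}, NextSibling: body}
	return &node{Data: "html", FirstChild: head}
}

// visited records the visit order, skipping at skipAt and stopping at stopAt.
func visited(root *node, skipAt, stopAt string) string {
	var names []string
	_ = traverse(root, func(n *node) error {
		names = append(names, n.Data)
		switch n.Data {
		case skipAt:
			return errSkip
		case stopAt:
			return errStop
		}
		return nil
	})
	return strings.Join(names, " ")
}

func main() {
	fmt.Println(visited(sampleTree(), "", ""))     // html head title body p
	fmt.Println(visited(sampleTree(), "head", "")) // html head body p
	fmt.Println(visited(sampleTree(), "", "body")) // html head title body
}
```

Using sentinel errors lets a single callback signature express "continue", "skip", "stop", and "fail", at the cost of requiring the caller to compare against the sentinels.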

Types

type Document

type Document struct {
	// Root is the root of the HTML parse tree. Note it is a DocumentNode, not
	// <html> element.
	Root *html.Node

	// Head is the node corresponding to <head> element in the parse tree.
	Head *html.Node

	// Body is the node corresponding to <body> element in the parse tree.
	Body *html.Node

	// URL represents where the document is located.
	URL *url.URL

	// BaseURL represents the base URL of the document. It is usually the same
	// as URL above, but can be altered by <base> element.
	BaseURL *url.URL
}

Document represents an HTML document, holding the parse tree and the related information.

func NewDocument

func NewDocument(payload []byte, url *url.URL) (*Document, error)

NewDocument creates and initializes a new Document from payload and url.

func (*Document) ResolveReference

func (doc *Document) ResolveReference(ref *url.URL) *url.URL

ResolveReference resolves a URI reference in Document to an absolute URI. The URI reference may be relative or absolute. ResolveReference always returns a new URL instance, even if the returned URL is identical to the reference.
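The resolution step can be illustrated with the standard library alone. This sketch assumes the method resolves references against the document's BaseURL in the same way as (*url.URL).ResolveReference from net/url, which the description suggests but does not confirm. The resolve helper and the example.com URLs are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve parses base and ref and returns the absolute form of ref
// relative to base. It returns "" if either string fails to parse.
func resolve(base, ref string) string {
	b, err := url.Parse(base)
	if err != nil {
		return ""
	}
	r, err := url.Parse(ref)
	if err != nil {
		return ""
	}
	// ResolveReference returns a new *url.URL; b and r are not modified.
	return b.ResolveReference(r).String()
}

func main() {
	base := "https://example.com/docs/guide/index.html"
	fmt.Println(resolve(base, "../img/logo.png"))         // https://example.com/docs/img/logo.png
	fmt.Println(resolve(base, "/about"))                  // https://example.com/about
	fmt.Println(resolve(base, "https://other.example/x")) // https://other.example/x
}
```

An absolute reference passes through unchanged in value, yet the result is still a distinct URL instance, so callers can modify it without affecting the original reference.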

type HTMLResponse

type HTMLResponse struct {
	*exchange.Response
	Doc *Document
}

HTMLResponse is an extension of exchange.Response for HTML responses.

func NewHTMLResponse

func NewHTMLResponse(resp *exchange.Response) (*HTMLResponse, error)

NewHTMLResponse creates and initializes a new HTMLResponse.
