htmlquery

Version: v1.0.0 (this is not the latest version of the module)
Published: Dec 5, 2020 License: MIT Imports: 8 Imported by: 7

README

htmlquery

Overview

htmlquery is an XPath query package for HTML. It lets you extract data from HTML documents, or evaluate expressions against them, using XPath expressions.

Installation

$ go get github.com/antchfx/htmlquery

Getting Started

Load HTML document from URL.

doc, err := htmlquery.LoadURL("http://example.com/")

Load HTML document from URL with request headers set.

header := map[string]string{
	"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
}
doc, err := htmlquery.LoadURLWithHeader("http://example.com/", header)

Load HTML document from URL through a proxy.

doc, err := htmlquery.LoadURLWithProxy("http://example.com/", "http://proxyip:proxyport")

Load HTML document from string.

s := `<html>....</html>`
doc, err := htmlquery.Parse(strings.NewReader(s))

Find all A elements.

list, err := htmlquery.Find(doc, "//a")

Select the href attribute of every A element.

list, err := htmlquery.Find(doc, "//a/@href")

Find the first A element that is the third A child of its parent.

a, err := htmlquery.FindOne(doc, "//a[3]")

Count all IMG elements.

expr, _ := xpath.Compile("count(//img)")
v := expr.Evaluate(htmlquery.CreateXPathNavigator(doc)).(float64)
fmt.Printf("total count is %.0f\n", v)

Quick Tutorial

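func main() {
	doc, err := htmlquery.LoadURL("https://www.bing.com/search?q=golang")
	if err != nil {
		panic(err)
	}
	// Find all result items; Find returns the matches and an error.
	items, err := htmlquery.Find(doc, "//ol/li")
	if err != nil {
		panic(err)
	}
	for i, n := range items {
		// Skip items without a link.
		a, err := htmlquery.FindOne(n, "//a")
		if err != nil || a == nil {
			continue
		}
		fmt.Printf("%d %s(%s)\n", i, htmlquery.InnerText(a), htmlquery.SelectAttr(a, "href"))
	}
}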

List of supported XPath query packages

Name       Description
htmlquery  XPath query package for the HTML document
xmlquery   XPath query package for the XML document
jsonquery  XPath query package for the JSON document

Questions

If you have any questions, please create an issue. Contributions are welcome.

Documentation

Overview

Package htmlquery extracts data from HTML documents using XPath expressions.

Functions

func Find

func Find(top *html.Node, expr string) ([]*html.Node, error)

Find searches top and returns every html.Node that matches the XPath expression expr.

func FindEach

func FindEach(top *html.Node, expr string, cb func(int, *html.Node)) error

FindEach searches top for nodes that match the XPath expression expr and calls cb for each match, passing the match index and the node.

func FindOne

func FindOne(top *html.Node, expr string) (*html.Node, error)

FindOne searches top and returns the first html.Node that matches the XPath expression expr.

func InnerText

func InnerText(n *html.Node) string

InnerText returns the text between the start and end tags of the node n.

func LoadURL

func LoadURL(url string) (*html.Node, error)

LoadURL loads the HTML document from the specified URL.

func LoadURLWithHeader

func LoadURLWithHeader(link string, headers map[string]string) (*html.Node, error)

LoadURLWithHeader loads the HTML document from the specified URL, sending the given HTTP headers with the request.

func LoadURLWithProxy

func LoadURLWithProxy(link string, proxy string) (*html.Node, error)

LoadURLWithProxy loads the HTML document from the specified URL through the given proxy.

func OutputHTML

func OutputHTML(n *html.Node, self bool) string

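OutputHTML returns the HTML markup of n, including tag names. If self is true, the output includes n's own tags; otherwise only the children of n are rendered.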

func Parse

func Parse(r io.Reader) (*html.Node, error)

Parse returns the parse tree for the HTML from the given Reader.

func SelectAttr

func SelectAttr(n *html.Node, name string) (val string)

SelectAttr returns the attribute value with the specified name.

Types

type NodeNavigator

type NodeNavigator struct {
	// contains filtered or unexported fields
}

func CreateXPathNavigator

func CreateXPathNavigator(top *html.Node) *NodeNavigator

CreateXPathNavigator creates a new xpath.NodeNavigator for the specified html.Node.

func (*NodeNavigator) Copy

func (h *NodeNavigator) Copy() xpath.NodeNavigator

func (*NodeNavigator) Current

func (h *NodeNavigator) Current() *html.Node

func (*NodeNavigator) LocalName

func (h *NodeNavigator) LocalName() string

func (*NodeNavigator) MoveTo

func (h *NodeNavigator) MoveTo(other xpath.NodeNavigator) bool

func (*NodeNavigator) MoveToChild

func (h *NodeNavigator) MoveToChild() bool

func (*NodeNavigator) MoveToFirst

func (h *NodeNavigator) MoveToFirst() bool

func (*NodeNavigator) MoveToNext

func (h *NodeNavigator) MoveToNext() bool

func (*NodeNavigator) MoveToNextAttribute

func (h *NodeNavigator) MoveToNextAttribute() bool

func (*NodeNavigator) MoveToParent

func (h *NodeNavigator) MoveToParent() bool

func (*NodeNavigator) MoveToPrevious

func (h *NodeNavigator) MoveToPrevious() bool

func (*NodeNavigator) MoveToRoot

func (h *NodeNavigator) MoveToRoot()

func (*NodeNavigator) NodeType

func (h *NodeNavigator) NodeType() xpath.NodeType

func (*NodeNavigator) Prefix

func (*NodeNavigator) Prefix() string

func (*NodeNavigator) String

func (h *NodeNavigator) String() string

func (*NodeNavigator) Value

func (h *NodeNavigator) Value() string
