xmlquery

v1.2.3

This package is not in the latest version of its module.

Published: Jan 13, 2020 | License: MIT | Imports: 11 | Imported by: 0

README

xmlquery

Overview

xmlquery is an XPath query package for XML documents. It lets you extract data from XML documents, or evaluate values over them, using XPath expressions.

xmlquery has a built-in query object cache that keeps recently used XPath query strings in compiled form. With caching enabled, an XPath expression is not recompiled on every query.

Change Logs

2019-11-11

  • Add XPath query caching.

2019-10-05

  • Add new methods that return an error for an invalid XPath expression instead of panicking: QueryAll and Query.
  • Add QuerySelector and QuerySelectorAll methods, which let you reuse a compiled query object.
  • PR #12 (Thanks @FrancescoIlario)
  • PR #11 (Thanks @gjvnq)

2018-12-23

  • XML output now includes comment nodes. #9

2018-12-03

  • Added support for attribute names with a namespace prefix, including in XML output. #6

Installation

 $ go get github.com/antchfx/xmlquery

Getting Started

Find the nodes that match an XPath query.

list, err := xmlquery.QueryAll(doc, "a")
if err != nil {
	panic(err)
}

Parse an XML document from a URL.

doc, err := xmlquery.LoadURL("http://www.example.com/sitemap.xml")

Parse an XML document from a string.

s := `<?xml version="1.0" encoding="utf-8"?><rss version="2.0"></rss>`
doc, err := xmlquery.Parse(strings.NewReader(s))

Parse an XML document from an io.Reader.

f, err := os.Open("../books.xml")
doc, err := xmlquery.Parse(f)

Find the authors of all books in the bookstore.

list := xmlquery.Find(doc, "//book//author")
// or
list := xmlquery.Find(doc, "//author")

Find the second book.

book := xmlquery.FindOne(doc, "//book[2]")

Find the id attribute of every book element. (New feature)

list := xmlquery.Find(doc, "//book/@id")

Find all books whose id is bk104.

list := xmlquery.Find(doc, "//book[@id='bk104']")

Find all books with a price less than 5.

list := xmlquery.Find(doc, "//book[price<5]")

Evaluate the total price of all books.

expr, err := xpath.Compile("sum(//book/price)")
price := expr.Evaluate(xmlquery.CreateXPathNavigator(doc)).(float64)
fmt.Printf("total price: %f\n", price)

Evaluate the number of book elements.

expr, err := xpath.Compile("count(//book)")
count := expr.Evaluate(xmlquery.CreateXPathNavigator(doc)).(float64)

FAQ

Find() vs QueryAll(), which is better?

Find and QueryAll do the same thing: both search for all matching XML nodes. Find panics if you give it an invalid XPath query, whereas QueryAll returns an error.

Can I save my query expression object for the next query?

Yes, you can. The QuerySelector and QuerySelectorAll methods accept your query expression object.

Caching (or reusing) a query expression object avoids recompiling the XPath query expression and improves query performance.

Create an XML document.
doc := &xmlquery.Node{
	Type: xmlquery.DeclarationNode,
	Data: "xml",
	Attr: []xml.Attr{
		xml.Attr{Name: xml.Name{Local: "version"}, Value: "1.0"},
	},
}
root := &xmlquery.Node{
	Data: "rss",
	Type: xmlquery.ElementNode,
}
doc.FirstChild = root
channel := &xmlquery.Node{
	Data: "channel",
	Type: xmlquery.ElementNode,
}
root.FirstChild = channel
title := &xmlquery.Node{
	Data: "title",
	Type: xmlquery.ElementNode,
}
titleText := &xmlquery.Node{
	Data: "W3Schools Home Page",
	Type: xmlquery.TextNode,
}
title.FirstChild = titleText
channel.FirstChild = title
fmt.Println(doc.OutputXML(true))
// <?xml version="1.0"?><rss><channel><title>W3Schools Home Page</title></channel></rss>

Quick Tutorial

package main

import (
	"fmt"
	"strings"

	"github.com/antchfx/xmlquery"
)

func main() {
	s := `<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
  <title>W3Schools Home Page</title>
  <link>https://www.w3schools.com</link>
  <description>Free web building tutorials</description>
  <item>
    <title>RSS Tutorial</title>
    <link>https://www.w3schools.com/xml/xml_rss.asp</link>
    <description>New RSS tutorial on W3Schools</description>
  </item>
  <item>
    <title>XML Tutorial</title>
    <link>https://www.w3schools.com/xml</link>
    <description>New XML tutorial on W3Schools</description>
  </item>
</channel>
</rss>`

	doc, err := xmlquery.Parse(strings.NewReader(s))
	if err != nil {
		panic(err)
	}
	channel := xmlquery.FindOne(doc, "//channel")
	if n := channel.SelectElement("title"); n != nil {
		fmt.Printf("title: %s\n", n.InnerText())
	}
	if n := channel.SelectElement("link"); n != nil {
		fmt.Printf("link: %s\n", n.InnerText())
	}
	for i, n := range xmlquery.Find(doc, "//item/title") {
		fmt.Printf("#%d %s\n", i, n.InnerText())
	}
}

List of supported XPath query packages

Name       Description
htmlquery  XPath query package for HTML documents
xmlquery   XPath query package for XML documents
jsonquery  XPath query package for JSON documents

Questions

Please let me know if you have any questions.

Documentation

Overview

Package xmlquery extracts data from XML documents using XPath expressions.

Constants

This section is empty.

Variables

var DisableSelectorCache = false

DisableSelectorCache disables caching for the query selector if set to true.

var SelectorCacheMaxEntries = 50

SelectorCacheMaxEntries sets how many selector objects can be cached. The default is 50. Caching is disabled if SelectorCacheMaxEntries <= 0.

Functions

func FindEach

func FindEach(top *Node, expr string, cb func(int, *Node))

FindEach searches the Node and calls the function cb for each match. Important: this function is deprecated; use for .. = range Find(){} instead.

func FindEachWithBreak

func FindEachWithBreak(top *Node, expr string, cb func(int, *Node) bool)

FindEachWithBreak works like FindEach but lets you break out of the loop by returning false from your callback function, cb. Important: this function is deprecated; use for .. = range Find(){} instead.

Types

type Node

type Node struct {
	Parent, FirstChild, LastChild, PrevSibling, NextSibling *Node

	Type         NodeType
	Data         string
	Prefix       string
	NamespaceURI string
	Attr         []xml.Attr
	// contains filtered or unexported fields
}

A Node consists of a NodeType and some Data (tag name for element nodes, content for text) and is part of a tree of Nodes.

func Find

func Find(top *Node, expr string) []*Node

Find is like QueryAll but panics if `expr` is not a valid XPath expression. See the `QueryAll()` function.

func FindOne

func FindOne(top *Node, expr string) *Node

FindOne is like Query but panics if `expr` is not a valid XPath expression. See the `Query()` function.

func LoadURL

func LoadURL(url string) (*Node, error)

LoadURL loads the XML document from the specified URL.

func Parse

func Parse(r io.Reader) (*Node, error)

Parse returns the parse tree for the XML from the given Reader.

func Query added in v1.1.0

func Query(top *Node, expr string) (*Node, error)

Query searches for the XML Nodes that match the specified XPath expr and returns the first match.

func QueryAll added in v1.1.0

func QueryAll(top *Node, expr string) ([]*Node, error)

QueryAll searches for the XML Nodes that match the specified XPath expr. It returns an error if the expression `expr` cannot be parsed.

func QuerySelector added in v1.1.0

func QuerySelector(top *Node, selector *xpath.Expr) *Node

QuerySelector returns the first matched XML Node by the specified XPath selector.

func QuerySelectorAll added in v1.1.0

func QuerySelectorAll(top *Node, selector *xpath.Expr) []*Node

QuerySelectorAll searches for all XML Nodes that match the specified XPath selector.

func (*Node) InnerText

func (n *Node) InnerText() string

InnerText returns the text between the start and end tags of the object.

func (*Node) OutputXML

func (n *Node) OutputXML(self bool) string

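OutputXML returns the node's content as XML text, including tag names. If self is true, the output includes the node's own tags; otherwise only its children are written.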

func (*Node) SelectAttr

func (n *Node) SelectAttr(name string) string

SelectAttr returns the attribute value with the specified name.

func (*Node) SelectElement

func (n *Node) SelectElement(name string) *Node

SelectElement finds the first child element with the specified name.

func (*Node) SelectElements

func (n *Node) SelectElements(name string) []*Node

SelectElements finds all child elements with the specified name.

type NodeNavigator

type NodeNavigator struct {
	// contains filtered or unexported fields
}

func CreateXPathNavigator

func CreateXPathNavigator(top *Node) *NodeNavigator

CreateXPathNavigator creates a new xpath.NodeNavigator for the specified Node.

func (*NodeNavigator) Copy

func (x *NodeNavigator) Copy() xpath.NodeNavigator

func (*NodeNavigator) Current

func (x *NodeNavigator) Current() *Node

func (*NodeNavigator) LocalName

func (x *NodeNavigator) LocalName() string

func (*NodeNavigator) MoveTo

func (x *NodeNavigator) MoveTo(other xpath.NodeNavigator) bool

func (*NodeNavigator) MoveToChild

func (x *NodeNavigator) MoveToChild() bool

func (*NodeNavigator) MoveToFirst

func (x *NodeNavigator) MoveToFirst() bool

func (*NodeNavigator) MoveToNext

func (x *NodeNavigator) MoveToNext() bool

func (*NodeNavigator) MoveToNextAttribute

func (x *NodeNavigator) MoveToNextAttribute() bool

func (*NodeNavigator) MoveToParent

func (x *NodeNavigator) MoveToParent() bool

func (*NodeNavigator) MoveToPrevious

func (x *NodeNavigator) MoveToPrevious() bool

func (*NodeNavigator) MoveToRoot

func (x *NodeNavigator) MoveToRoot()

func (*NodeNavigator) NamespaceURL added in v1.2.1

func (x *NodeNavigator) NamespaceURL() string

func (*NodeNavigator) NodeType

func (x *NodeNavigator) NodeType() xpath.NodeType

func (*NodeNavigator) Prefix

func (x *NodeNavigator) Prefix() string

func (*NodeNavigator) String

func (x *NodeNavigator) String() string

func (*NodeNavigator) Value

func (x *NodeNavigator) Value() string

type NodeType

type NodeType uint

A NodeType is the type of a Node.

const (
	// DocumentNode is a document object that, as the root of the document tree,
	// provides access to the entire XML document.
	DocumentNode NodeType = iota
	// DeclarationNode is the XML declaration at the start of a document
	// (for example, <?xml version="1.0"?> ).
	DeclarationNode
	// ElementNode is an element (for example, <item> ).
	ElementNode
	// TextNode is the text content of a node.
	TextNode
	// CommentNode is a comment (for example, <!-- my comment --> ).
	CommentNode
	// AttributeNode is an attribute of an element.
	AttributeNode
)
