README

xmlquery


Overview

xmlquery is an XPath query package for XML documents. It lets you extract data from, or evaluate expressions against, XML documents using XPath expressions.

xmlquery has a built-in query object cache that stores recently used XPath query strings. With caching enabled, an XPath expression does not have to be recompiled for each query.

Change Logs

2020-08-??

  • Add XML stream loading and parsing support.

2019-11-11

  • Add XPath query caching.

2019-10-05

  • Add QueryAll and Query, new methods that return an error for an invalid XPath expression.
  • Add QuerySelector and QuerySelectorAll methods, which support reusing query objects.
  • PR #12 (Thanks @FrancescoIlario)
  • PR #11 (Thanks @gjvnq)

2018-12-23

  • Added XML output including comment nodes. #9

2018-12-03

  • Added support for attribute names with a namespace prefix and for XML output. #6

Installation

 $ go get github.com/antchfx/xmlquery

Getting Started

Find nodes that match an XPath query.

list, err := xmlquery.QueryAll(doc, "a")
if err != nil {
	panic(err)
}

Parse an XML document from a URL.

doc, err := xmlquery.LoadURL("http://www.example.com/sitemap.xml")

Parse an XML document from a string.

s := `<?xml version="1.0" encoding="utf-8"?><rss version="2.0"></rss>`
doc, err := xmlquery.Parse(strings.NewReader(s))

Parse an XML document from an io.Reader.

f, err := os.Open("../books.xml")
doc, err := xmlquery.Parse(f)

Parse an XML document in a streaming fashion (simple case without element filtering).

f, err := os.Open("../books.xml")
p, err := xmlquery.CreateStreamParser(f, "/bookstore/book")
for {
	n, err := p.Read()
	if err == io.EOF {
		break
	}
	if err != nil {
		...
	}
}

Parse an XML document in a streaming fashion (advanced case with element filtering).

f, err := os.Open("../books.xml")
p, err := xmlquery.CreateStreamParser(f, "/bookstore/book", "/bookstore/book[price>=10]")
for {
	n, err := p.Read()
	if err == io.EOF {
		break
	}
	if err != nil {
		...
	}
}

Find authors of all books in the bookstore.

list := xmlquery.Find(doc, "//book//author")
// or
list := xmlquery.Find(doc, "//author")

Find the second book.

book := xmlquery.FindOne(doc, "//book[2]")

Find all book elements and only get the id attribute. (New Feature)

list := xmlquery.Find(doc, "//book/@id")

Find all books with id bk104.

list := xmlquery.Find(doc, "//book[@id='bk104']")

Find all books with price less than 5.

list := xmlquery.Find(doc, "//book[price<5]")

Evaluate the total price of all books.

expr, err := xpath.Compile("sum(//book/price)")
price := expr.Evaluate(xmlquery.CreateXPathNavigator(doc)).(float64)
fmt.Printf("total price: %f\n", price)

Evaluate the number of book elements.

expr, err := xpath.Compile("count(//book)")
count := expr.Evaluate(xmlquery.CreateXPathNavigator(doc)).(float64)

FAQ

Find() vs QueryAll(), which is better?

Find and QueryAll do the same thing: both search for all matching XML nodes. Find panics if given an invalid XPath query, while QueryAll returns an error.

Can I save my query expression object for the next query?

Yes, you can. We provide QuerySelector and QuerySelectorAll methods; they accept your query expression object.

Caching a query expression object avoids recompiling the XPath query expression, improving query performance.

Create XML document.
doc := &xmlquery.Node{
	Type: xmlquery.DeclarationNode,
	Data: "xml",
	// Node.Attr is a slice of xmlquery.Attr (see the Attr type in the documentation below).
	Attr: []xmlquery.Attr{
		{Name: xml.Name{Local: "version"}, Value: "1.0"},
	},
}
root := &xmlquery.Node{
	Data: "rss",
	Type: xmlquery.ElementNode,
}
doc.FirstChild = root
channel := &xmlquery.Node{
	Data: "channel",
	Type: xmlquery.ElementNode,
}
root.FirstChild = channel
title := &xmlquery.Node{
	Data: "title",
	Type: xmlquery.ElementNode,
}
titleText := &xmlquery.Node{
	Data: "W3Schools Home Page",
	Type: xmlquery.TextNode,
}
title.FirstChild = titleText
channel.FirstChild = title
fmt.Println(doc.OutputXML(true))
// <?xml version="1.0"?><rss><channel><title>W3Schools Home Page</title></channel></rss>

Quick Tutorial

package main

import (
	"fmt"
	"strings"

	"github.com/antchfx/xmlquery"
)

func main() {
	s := `<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
  <title>W3Schools Home Page</title>
  <link>https://www.w3schools.com</link>
  <description>Free web building tutorials</description>
  <item>
    <title>RSS Tutorial</title>
    <link>https://www.w3schools.com/xml/xml_rss.asp</link>
    <description>New RSS tutorial on W3Schools</description>
  </item>
  <item>
    <title>XML Tutorial</title>
    <link>https://www.w3schools.com/xml</link>
    <description>New XML tutorial on W3Schools</description>
  </item>
</channel>
</rss>`

	doc, err := xmlquery.Parse(strings.NewReader(s))
	if err != nil {
		panic(err)
	}
	channel := xmlquery.FindOne(doc, "//channel")
	if n := channel.SelectElement("title"); n != nil {
		fmt.Printf("title: %s\n", n.InnerText())
	}
	if n := channel.SelectElement("link"); n != nil {
		fmt.Printf("link: %s\n", n.InnerText())
	}
	for i, n := range xmlquery.Find(doc, "//item/title") {
		fmt.Printf("#%d %s\n", i, n.InnerText())
	}
}

List of supported XPath query packages

Name       Description
htmlquery  XPath query package for HTML documents
xmlquery   XPath query package for XML documents
jsonquery  XPath query package for JSON documents

Questions

Please let me know if you have any questions.


Documentation

Overview

    Package xmlquery provides functions to extract data from XML documents using XPath expressions.

    Variables

    var DisableSelectorCache = false

      DisableSelectorCache disables caching for the query selector when set to true.

      var SelectorCacheMaxEntries = 50

        SelectorCacheMaxEntries sets how many selector objects can be cached. The default is 50. Caching is disabled if SelectorCacheMaxEntries <= 0.
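
To make the two cache settings above concrete, here is a minimal sketch of a bounded expression cache. It is not xmlquery's implementation: exprCache, compile, and the FIFO eviction are hypothetical stand-ins chosen for brevity, and the real cache may evict entries differently. The sketch only shows why a cache hit skips compilation and why a limit of 0 or less turns caching off.

```go
package main

import (
	"fmt"
	"strings"
)

// exprCache is a hypothetical bounded cache for compiled expressions.
// Eviction is simple FIFO; this is an assumption made for illustration.
type exprCache struct {
	maxEntries int               // a value <= 0 disables caching
	entries    map[string]string // expression -> stand-in "compiled" form
	order      []string          // insertion order, oldest first
	compiles   int               // number of times compile actually ran
}

func newExprCache(maxEntries int) *exprCache {
	return &exprCache{maxEntries: maxEntries, entries: make(map[string]string)}
}

// compile stands in for an expensive compilation step.
func (c *exprCache) compile(expr string) (string, error) {
	c.compiles++
	if strings.TrimSpace(expr) == "" {
		return "", fmt.Errorf("empty expression")
	}
	return "compiled(" + expr + ")", nil
}

// get returns the cached form of expr, compiling it only on a miss.
func (c *exprCache) get(expr string) (string, error) {
	if c.maxEntries <= 0 {
		return c.compile(expr) // caching disabled: always recompile
	}
	if v, ok := c.entries[expr]; ok {
		return v, nil
	}
	v, err := c.compile(expr)
	if err != nil {
		return "", err
	}
	if len(c.order) >= c.maxEntries {
		oldest := c.order[0]
		c.order = c.order[1:]
		delete(c.entries, oldest)
	}
	c.entries[expr] = v
	c.order = append(c.order, expr)
	return v, nil
}

func main() {
	c := newExprCache(50)
	c.get("//book")
	c.get("//book") // served from the cache
	fmt.Println("compilations:", c.compiles)
}
```

With a limit of 50, the second lookup of the same expression does not run compile again; with a limit of 0, every lookup does.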

        Functions

        func AddAttr

        func AddAttr(n *Node, key, val string)

          AddAttr adds a new attribute specified by 'key' and 'val' to a node 'n'.

          func AddChild

          func AddChild(parent, n *Node)

            AddChild adds a new node 'n' to a node 'parent' as its last child.

            func AddSibling

            func AddSibling(sibling, n *Node)

              AddSibling adds a new node 'n' as a sibling of a given node 'sibling'. Note it is not necessarily true that the new node 'n' would be added immediately after 'sibling'. If 'sibling' isn't the last child of its parent, then the new node 'n' will be added at the end of the sibling chain of their parent.
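
The placement rule for AddSibling can be illustrated with a small stand-in tree. The node type below is hypothetical and only mirrors the link fields of Node; addChild and addSibling are sketches of the behavior described above, not the package's source.

```go
package main

import "fmt"

// node is a hypothetical stand-in that mirrors the link fields of xmlquery.Node.
type node struct {
	Parent, FirstChild, LastChild, PrevSibling, NextSibling *node
	Data                                                    string
}

// addChild appends n as the last child of parent.
func addChild(parent, n *node) {
	n.Parent = parent
	n.NextSibling = nil
	n.PrevSibling = parent.LastChild
	if parent.LastChild == nil {
		parent.FirstChild = n
	} else {
		parent.LastChild.NextSibling = n
	}
	parent.LastChild = n
}

// addSibling appends n at the end of sibling's chain, which is not
// necessarily the position immediately after sibling.
func addSibling(sibling, n *node) {
	last := sibling
	for last.NextSibling != nil {
		last = last.NextSibling
	}
	n.Parent = sibling.Parent
	n.PrevSibling = last
	n.NextSibling = nil
	last.NextSibling = n
	if sibling.Parent != nil {
		sibling.Parent.LastChild = n
	}
}

// childNames formats the children of parent in document order.
func childNames(parent *node) string {
	var names []string
	for c := parent.FirstChild; c != nil; c = c.NextSibling {
		names = append(names, c.Data)
	}
	return fmt.Sprint(names)
}

func main() {
	root := &node{Data: "root"}
	a, b, c := &node{Data: "a"}, &node{Data: "b"}, &node{Data: "c"}
	addChild(root, a)
	addChild(root, b)
	addSibling(a, c) // c lands after b, not directly after a
	fmt.Println(childNames(root))
}
```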

              func FindEach

              func FindEach(top *Node, expr string, cb func(int, *Node))

                FindEach searches for the XML nodes that match expr and calls the function cb for each one. Important: this function is deprecated; use for .. = range Find(){} instead.

                func FindEachWithBreak

                func FindEachWithBreak(top *Node, expr string, cb func(int, *Node) bool)

                  FindEachWithBreak works like FindEach but allows breaking out of the loop by returning false from the callback function `cb`. Important: this function is deprecated; use for .. = range Find(){} instead.

                  func RemoveFromTree

                  func RemoveFromTree(n *Node)

                    RemoveFromTree removes a node and its subtree from the document tree it is in. If the node is the root of the tree, this is a no-op.
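
Detaching a node from a doubly linked tree comes down to repairing four links. The sketch below assumes a hypothetical node type with the same link fields as Node; removeFromTree illustrates the described behavior, including the root no-op, and is not copied from the package.

```go
package main

import "fmt"

// node is a hypothetical stand-in that mirrors the link fields of xmlquery.Node.
type node struct {
	Parent, FirstChild, LastChild, PrevSibling, NextSibling *node
	Data                                                    string
}

// appendChild links n as the last child of parent (helper for the demo).
func appendChild(parent, n *node) {
	n.Parent = parent
	n.PrevSibling = parent.LastChild
	if parent.LastChild == nil {
		parent.FirstChild = n
	} else {
		parent.LastChild.NextSibling = n
	}
	parent.LastChild = n
}

// removeFromTree unlinks n from its parent and siblings. The subtree
// below n stays attached to n. A root node (no parent) is left untouched.
func removeFromTree(n *node) {
	if n.Parent == nil {
		return
	}
	if n.Parent.FirstChild == n {
		n.Parent.FirstChild = n.NextSibling
	}
	if n.Parent.LastChild == n {
		n.Parent.LastChild = n.PrevSibling
	}
	if n.PrevSibling != nil {
		n.PrevSibling.NextSibling = n.NextSibling
	}
	if n.NextSibling != nil {
		n.NextSibling.PrevSibling = n.PrevSibling
	}
	n.Parent, n.PrevSibling, n.NextSibling = nil, nil, nil
}

// childNames formats the children of parent in document order.
func childNames(parent *node) string {
	var names []string
	for c := parent.FirstChild; c != nil; c = c.NextSibling {
		names = append(names, c.Data)
	}
	return fmt.Sprint(names)
}

func main() {
	root := &node{Data: "root"}
	a, b, c := &node{Data: "a"}, &node{Data: "b"}, &node{Data: "c"}
	appendChild(root, a)
	appendChild(root, b)
	appendChild(root, c)
	removeFromTree(b)
	removeFromTree(root) // no-op: root has no parent
	fmt.Println(childNames(root))
}
```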

                    Types

                    type Attr

                    type Attr struct {
                    	Name         xml.Name
                    	Value        string
                    	NamespaceURI string
                    }

                    type DecoderOptions

                    type DecoderOptions struct {
                    	Strict    bool
                    	AutoClose []string
                    	Entity    map[string]string
                    }

                      DecoderOptions provides the same options as the Decoder in the standard encoding/xml package. Please refer to its documentation: https://golang.org/pkg/encoding/xml/#Decoder
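
Because DecoderOptions mirrors fields of the standard decoder, the effect of Strict and Entity can be tried with encoding/xml alone. The sketch below uses only the standard library; textOf is a hypothetical helper, and it is an assumption that xmlquery passes these fields through to the decoder unchanged.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// textOf decodes input with the given Strict and Entity settings and
// returns the concatenated character data of the document.
func textOf(input string, strict bool, entity map[string]string) (string, error) {
	d := xml.NewDecoder(strings.NewReader(input))
	d.Strict = strict
	d.Entity = entity
	var sb strings.Builder
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", fmt.Errorf("decode: %w", err)
		}
		if cd, ok := tok.(xml.CharData); ok {
			sb.Write(cd)
		}
	}
}

func main() {
	doc := `<p>caf&eacute;</p>`

	// In strict mode an entity the decoder does not know is an error.
	if _, err := textOf(doc, true, nil); err != nil {
		fmt.Println("strict, no entity map:", err)
	}

	// Supplying the entity through the Entity map lets the same input parse.
	s, err := textOf(doc, true, map[string]string{"eacute": "é"})
	fmt.Println(s, err)
}
```

AutoClose works the same way on the standard decoder: it is a plain field that is set before the first call to Token.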

                      type Node

                      type Node struct {
                      	Parent, FirstChild, LastChild, PrevSibling, NextSibling *Node
                      
                      	Type         NodeType
                      	Data         string
                      	Prefix       string
                      	NamespaceURI string
                      	Attr         []Attr
                      	// contains filtered or unexported fields
                      }

                        A Node consists of a NodeType and some Data (tag name for element nodes, content for text) and is part of a tree of Nodes.

                        func Find

                        func Find(top *Node, expr string) []*Node

                          Find is like QueryAll but panics if `expr` is not a valid XPath expression. See `QueryAll()` function.

                          func FindOne

                          func FindOne(top *Node, expr string) *Node

                            FindOne is like Query but panics if `expr` is not a valid XPath expression. See `Query()` function.

                            func LoadURL

                            func LoadURL(url string) (*Node, error)

                              LoadURL loads the XML document from the specified URL.

                              func Parse

                              func Parse(r io.Reader) (*Node, error)

                                Parse returns the parse tree for the XML from the given Reader.

                                func ParseWithOptions

                                func ParseWithOptions(r io.Reader, options ParserOptions) (*Node, error)

                                  ParseWithOptions is like Parse, but with custom options.

                                  func Query

                                  func Query(top *Node, expr string) (*Node, error)

                                    Query searches for XML nodes that match the specified XPath expr and returns the first match.

                                    func QueryAll

                                    func QueryAll(top *Node, expr string) ([]*Node, error)

                                      QueryAll searches for all XML nodes that match the specified XPath expr. It returns an error if the expression `expr` cannot be parsed.

                                      func QuerySelector

                                      func QuerySelector(top *Node, selector *xpath.Expr) *Node

                                        QuerySelector returns the first matched XML Node by the specified XPath selector.

                                        func QuerySelectorAll

                                        func QuerySelectorAll(top *Node, selector *xpath.Expr) []*Node

                                          QuerySelectorAll returns all XML nodes that match the specified XPath selector.

                                          func (*Node) InnerText

                                          func (n *Node) InnerText() string

                                            InnerText returns the text between the start and end tags of the object.

                                            func (*Node) OutputXML

                                            func (n *Node) OutputXML(self bool) string

                                              OutputXML returns the node's content as XML text, including tag names. If self is true, the output includes the node's own tag.

                                              func (*Node) SelectAttr

                                              func (n *Node) SelectAttr(name string) string

                                                SelectAttr returns the attribute value with the specified name.

                                                func (*Node) SelectElement

                                                func (n *Node) SelectElement(name string) *Node

                                                  SelectElement finds the first child element with the specified name.

                                                  func (*Node) SelectElements

                                                  func (n *Node) SelectElements(name string) []*Node

                                                    SelectElements finds child elements with the specified name.

                                                    type NodeNavigator

                                                    type NodeNavigator struct {
                                                    	// contains filtered or unexported fields
                                                    }

                                                    func CreateXPathNavigator

                                                    func CreateXPathNavigator(top *Node) *NodeNavigator

                                                      CreateXPathNavigator creates a new xpath.NodeNavigator for the specified XML Node.

                                                      func (*NodeNavigator) Copy

                                                      func (x *NodeNavigator) Copy() xpath.NodeNavigator

                                                      func (*NodeNavigator) Current

                                                      func (x *NodeNavigator) Current() *Node

                                                      func (*NodeNavigator) LocalName

                                                      func (x *NodeNavigator) LocalName() string

                                                      func (*NodeNavigator) MoveTo

                                                      func (x *NodeNavigator) MoveTo(other xpath.NodeNavigator) bool

                                                      func (*NodeNavigator) MoveToChild

                                                      func (x *NodeNavigator) MoveToChild() bool

                                                      func (*NodeNavigator) MoveToFirst

                                                      func (x *NodeNavigator) MoveToFirst() bool

                                                      func (*NodeNavigator) MoveToNext

                                                      func (x *NodeNavigator) MoveToNext() bool

                                                      func (*NodeNavigator) MoveToNextAttribute

                                                      func (x *NodeNavigator) MoveToNextAttribute() bool

                                                      func (*NodeNavigator) MoveToParent

                                                      func (x *NodeNavigator) MoveToParent() bool

                                                      func (*NodeNavigator) MoveToPrevious

                                                      func (x *NodeNavigator) MoveToPrevious() bool

                                                      func (*NodeNavigator) MoveToRoot

                                                      func (x *NodeNavigator) MoveToRoot()

                                                      func (*NodeNavigator) NamespaceURL

                                                      func (x *NodeNavigator) NamespaceURL() string

                                                      func (*NodeNavigator) NodeType

                                                      func (x *NodeNavigator) NodeType() xpath.NodeType

                                                      func (*NodeNavigator) Prefix

                                                      func (x *NodeNavigator) Prefix() string

                                                      func (*NodeNavigator) String

                                                      func (x *NodeNavigator) String() string

                                                      func (*NodeNavigator) Value

                                                      func (x *NodeNavigator) Value() string

                                                      type NodeType

                                                      type NodeType uint

                                                        A NodeType is the type of a Node.

                                                        const (
                                                        	// DocumentNode is a document object that, as the root of the document tree,
                                                        	// provides access to the entire XML document.
                                                        	DocumentNode NodeType = iota
                                                        	// DeclarationNode is the XML declaration (for example, <?xml version="1.0"?> ).
                                                        	DeclarationNode
                                                        	// ElementNode is an element (for example, <item> ).
                                                        	ElementNode
                                                        	// TextNode is the text content of a node.
                                                        	TextNode
                                                        	// CharDataNode is a CDATA section (for example, <![CDATA[content]]> ).
                                                        	CharDataNode
                                                        	// CommentNode is a comment (for example, <!-- my comment --> ).
                                                        	CommentNode
                                                        	// AttributeNode is an attribute of element.
                                                        	AttributeNode
                                                        )

                                                        type ParserOptions

                                                        type ParserOptions struct {
                                                        	Decoder *DecoderOptions
                                                        }

                                                        type StreamParser

                                                        type StreamParser struct {
                                                        	// contains filtered or unexported fields
                                                        }

                                                          StreamParser enables loading and parsing an XML document in a streaming fashion.

                                                          func CreateStreamParser

                                                          func CreateStreamParser(r io.Reader, streamElementXPath string, streamElementFilter ...string) (*StreamParser, error)

                                                            CreateStreamParser creates a StreamParser. Argument streamElementXPath is required. Argument streamElementFilter is optional and should only be used in advanced scenarios.

                                                            Scenario 1: simple case:

                                                            xml := `<AAA><BBB>b1</BBB><BBB>b2</BBB></AAA>`
                                                            sp, err := CreateStreamParser(strings.NewReader(xml), "/AAA/BBB")
                                                            if err != nil {
                                                                panic(err)
                                                            }
                                                            for {
                                                                n, err := sp.Read()
                                                                if err != nil {
                                                                    break
                                                                }
                                                                fmt.Println(n.OutputXML(true))
                                                            }
                                                            

                                                            Output will be:

                                                            <BBB>b1</BBB>
                                                            <BBB>b2</BBB>
                                                            

                                                            Scenario 2: advanced case:

                                                            xml := `<AAA><BBB>b1</BBB><BBB>b2</BBB></AAA>`
                                                            sp, err := CreateStreamParser(strings.NewReader(xml), "/AAA/BBB", "/AAA/BBB[. != 'b1']")
                                                            if err != nil {
                                                                panic(err)
                                                            }
                                                            for {
                                                                n, err := sp.Read()
                                                                if err != nil {
                                                                    break
                                                                }
                                                                fmt.Println(n.OutputXML(true))
                                                            }
                                                            

                                                            Output will be:

                                                            <BBB>b2</BBB>
                                                            

                                                            As the argument names indicate, streamElementXPath should provide an XPath query that points to the target element node only, with no extra filtering on the element itself or its children. streamElementFilter, if needed, can provide additional filtering on the target element and its children.

                                                            CreateStreamParser returns an error if either streamElementXPath or streamElementFilter, if provided, cannot be successfully parsed and compiled into a valid xpath query.
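
The split between locating the target element and filtering it can be sketched with the standard library alone. This is not how StreamParser is implemented: streamElements and its keep callback are hypothetical, a plain path comparison stands in for streamElementXPath, and a Go predicate stands in for streamElementFilter.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// streamElements reads src token by token. Each element whose path equals
// targetPath is decoded to its text and then passed to keep, which plays
// the role of the optional filter. A nil keep accepts every target element.
func streamElements(src, targetPath string, keep func(text string) bool) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(src))
	var path, out []string
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, fmt.Errorf("stream: %w", err)
		}
		switch t := tok.(type) {
		case xml.StartElement:
			path = append(path, t.Name.Local)
			if "/"+strings.Join(path, "/") != targetPath {
				continue
			}
			var text string
			if err := d.DecodeElement(&text, &t); err != nil {
				return out, fmt.Errorf("stream: %w", err)
			}
			path = path[:len(path)-1] // DecodeElement consumed the end tag
			if keep == nil || keep(text) {
				out = append(out, text)
			}
		case xml.EndElement:
			path = path[:len(path)-1]
		}
	}
}

func main() {
	src := `<AAA><BBB>b1</BBB><BBB>b2</BBB></AAA>`

	all, err := streamElements(src, "/AAA/BBB", nil)
	fmt.Println(all, err)

	notB1 := func(text string) bool { return text != "b1" }
	filtered, err := streamElements(src, "/AAA/BBB", notB1)
	fmt.Println(filtered, err)
}
```

Keeping the locator free of predicates means the reader can decide cheaply, from the element path alone, whether a subtree is a candidate; the filter then runs only on candidates.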

                                                            func CreateStreamParserWithOptions

                                                            func CreateStreamParserWithOptions(
                                                            	r io.Reader,
                                                            	options ParserOptions,
                                                            	streamElementXPath string,
                                                            	streamElementFilter ...string,
                                                            ) (*StreamParser, error)

                                                              CreateStreamParserWithOptions is like CreateStreamParser, but with custom options.

                                                              func (*StreamParser) Read

                                                              func (sp *StreamParser) Read() (*Node, error)

                                                                Read returns a target node that satisfies the XPath specified by the caller at StreamParser creation time. If there are no more matching target nodes after reading the rest of the XML document, io.EOF is returned. Any XML parsing error encountered is returned immediately, and stream parsing stops. Calling Read() after an error has been returned (including io.EOF) results in undefined behavior. Also note that, due to the streaming nature, calling Read() automatically removes any previous target node(s) from the document tree.
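
A loop that follows these rules separates io.EOF from real errors and never calls Read again after either one. The sketch below uses a hypothetical reader interface and sliceReader type in place of StreamParser (whose Read returns *Node rather than string), so it runs without any XML input.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// reader is a hypothetical stand-in for a type with a Read method
// shaped like StreamParser.Read.
type reader interface {
	Read() (string, error)
}

// sliceReader yields its items one by one, then returns final.
type sliceReader struct {
	items []string
	final error
}

func (s *sliceReader) Read() (string, error) {
	if len(s.items) == 0 {
		return "", s.final
	}
	item := s.items[0]
	s.items = s.items[1:]
	return item, nil
}

var errBroken = errors.New("malformed input")

// cleanReader ends with io.EOF; brokenReader ends with a parsing error.
func cleanReader(items ...string) *sliceReader  { return &sliceReader{items, io.EOF} }
func brokenReader(items ...string) *sliceReader { return &sliceReader{items, errBroken} }

// drain collects items until the first error. io.EOF marks a clean end
// and is not reported; any other error is returned to the caller.
// In both cases the loop stops, so Read is never called after an error.
func drain(r reader) ([]string, error) {
	var out []string
	for {
		item, err := r.Read()
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return out, fmt.Errorf("read failed after %d items: %w", len(out), err)
		}
		out = append(out, item)
	}
}

func main() {
	fmt.Println(drain(cleanReader("b1", "b2")))
	fmt.Println(drain(brokenReader("b1")))
}
```

Returning the items collected so far together with the error lets the caller decide whether partial results are usable.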