xmlquery

package module

v1.2.3 Published: Jan 13, 2020 License: MIT Imports: 11 Imported by: 0

Repository

github.com/CyrodiilSavior/xmlquery

README ¶

xmlquery

Overview

xmlquery is an XPath query package for XML documents. It lets you extract data from XML documents, or evaluate expressions against them, using XPath expressions.

xmlquery has a built-in query object cache that stores recently used XPath query strings. With caching enabled, an XPath expression is not recompiled on every query.

Change Logs

2019-11-11

Add XPath query caching.

2019-10-05

Add new methods that return an error for an invalid XPath expression: QueryAll and Query.
Add QuerySelector and QuerySelectorAll methods, which let you reuse a query object.
PR #12 (Thanks @FrancescoIlario)
PR #11 (Thanks @gjvnq)

2018-12-23

XML output now includes comment nodes. #9

2018-12-03

Added support for attribute names with a namespace prefix and for XML output. #6

Installation

 $ go get github.com/antchfx/xmlquery

Getting Started

Find nodes that match an XPath query.

list, err := xmlquery.QueryAll(doc, "a")
if err != nil {
	panic(err)
}

Parse XML from a URL.

doc, err := xmlquery.LoadURL("http://www.example.com/sitemap.xml")

Parse XML from a string.

s := `<?xml version="1.0" encoding="utf-8"?><rss version="2.0"></rss>`
doc, err := xmlquery.Parse(strings.NewReader(s))

Parse XML from an io.Reader.

f, err := os.Open("../books.xml")
doc, err := xmlquery.Parse(f)

Find authors of all books in the bookstore.

list := xmlquery.Find(doc, "//book//author")
// or
list := xmlquery.Find(doc, "//author")

Find the second book.

book := xmlquery.FindOne(doc, "//book[2]")

Find all book elements and select only their `id` attributes. (New Feature)

list := xmlquery.Find(doc, "//book/@id")

Find all books whose id is bk104.

list := xmlquery.Find(doc, "//book[@id='bk104']")

Find all books with a price less than 5.

list := xmlquery.Find(doc, "//book[price<5]")

Evaluate the total price of all books.

expr, err := xpath.Compile("sum(//book/price)")
if err != nil {
	panic(err)
}
price := expr.Evaluate(xmlquery.CreateXPathNavigator(doc)).(float64)
fmt.Printf("total price: %f\n", price)

Evaluate the number of book elements.

expr, err := xpath.Compile("count(//book)")
count := expr.Evaluate(xmlquery.CreateXPathNavigator(doc)).(float64)
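The two snippets above assert the result of Evaluate directly to float64. A bare type assertion panics when the value has a different dynamic type, so code that evaluates expressions it does not control may prefer the comma-ok form. The following is a minimal sketch using only the standard library; asNumber is a hypothetical helper, and the values in main are stand-ins for evaluation results rather than real output from the xpath package.

```go
package main

import "fmt"

// asNumber is a hypothetical helper: it extracts a float64 from an
// evaluation result and reports an error instead of panicking when the
// value has another type.
func asNumber(v interface{}) (float64, error) {
	n, ok := v.(float64)
	if !ok {
		return 0, fmt.Errorf("expected a number, got %T", v)
	}
	return n, nil
}

func main() {
	// Stand-in values; in real code v would come from expr.Evaluate(...).
	for _, v := range []interface{}{float64(3), "three", true} {
		n, err := asNumber(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("number: %f\n", n)
	}
}
```

With a helper like this, a numeric expression such as sum(...) or count(...) yields a value, while any other result type surfaces as an ordinary error.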

FAQ

`Find()` vs `QueryAll()`, which is better?

Find and QueryAll do the same thing: both search for all matching XML nodes. Find panics if you give it an invalid XPath query, whereas QueryAll returns an error.
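The relationship between a panicking function and its error-returning counterpart can be sketched without the library. In the example below, queryAll and find are hypothetical stand-ins, and the bracket check is only a placeholder for real XPath validation; it is not how xmlquery parses expressions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// queryAll is a hypothetical stand-in for an error-returning query function.
// It only checks that the expression is non-empty and that square brackets
// are balanced; a real XPath compiler checks far more.
func queryAll(expr string) ([]string, error) {
	if expr == "" || strings.Count(expr, "[") != strings.Count(expr, "]") {
		return nil, errors.New("invalid expression: " + expr)
	}
	return []string{"match for " + expr}, nil
}

// find is the panicking counterpart: it wraps queryAll and panics when the
// expression is rejected.
func find(expr string) []string {
	list, err := queryAll(expr)
	if err != nil {
		panic(err)
	}
	return list
}

func main() {
	// Error-returning style: the caller decides how to handle bad input.
	if _, err := queryAll("//book[2"); err != nil {
		fmt.Println("queryAll:", err)
	}

	// Panicking style: convenient for constant, known-good expressions.
	fmt.Println(find("//book[2]"))
}
```

As a rule of thumb, the panicking form suits expressions written as constants in the program, and the error-returning form suits expressions that come from user input or configuration.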

Can I save my query expression object for the next query?

Yes, you can. The QuerySelector and QuerySelectorAll methods accept your query expression object.

Caching (or reusing) a query expression object avoids recompiling the XPath expression and improves query performance.

Create an XML document.

doc := &xmlquery.Node{
	Type: xmlquery.DeclarationNode,
	Data: "xml",
	Attr: []xml.Attr{
		xml.Attr{Name: xml.Name{Local: "version"}, Value: "1.0"},
	},
}
root := &xmlquery.Node{
	Data: "rss",
	Type: xmlquery.ElementNode,
}
doc.FirstChild = root
channel := &xmlquery.Node{
	Data: "channel",
	Type: xmlquery.ElementNode,
}
root.FirstChild = channel
title := &xmlquery.Node{
	Data: "title",
	Type: xmlquery.ElementNode,
}
titleText := &xmlquery.Node{
	Data: "W3Schools Home Page",
	Type: xmlquery.TextNode,
}
title.FirstChild = titleText
channel.FirstChild = title
fmt.Println(doc.OutputXML(true))
// <?xml version="1.0"?><rss><channel><title>W3Schools Home Page</title></channel></rss>

Quick Tutorial

package main

import (
	"fmt"
	"strings"

	"github.com/antchfx/xmlquery"
)

func main() {
	s := `<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
  <title>W3Schools Home Page</title>
  <link>https://www.w3schools.com</link>
  <description>Free web building tutorials</description>
  <item>
    <title>RSS Tutorial</title>
    <link>https://www.w3schools.com/xml/xml_rss.asp</link>
    <description>New RSS tutorial on W3Schools</description>
  </item>
  <item>
    <title>XML Tutorial</title>
    <link>https://www.w3schools.com/xml</link>
    <description>New XML tutorial on W3Schools</description>
  </item>
</channel>
</rss>`

	doc, err := xmlquery.Parse(strings.NewReader(s))
	if err != nil {
		panic(err)
	}
	channel := xmlquery.FindOne(doc, "//channel")
	if n := channel.SelectElement("title"); n != nil {
		fmt.Printf("title: %s\n", n.InnerText())
	}
	if n := channel.SelectElement("link"); n != nil {
		fmt.Printf("link: %s\n", n.InnerText())
	}
	for i, n := range xmlquery.Find(doc, "//item/title") {
		fmt.Printf("#%d %s\n", i, n.InnerText())
	}
}

List of supported XPath query packages

Name	Description
htmlquery	XPath query package for the HTML document
xmlquery	XPath query package for the XML document
jsonquery	XPath query package for the JSON document

Questions

Please let me know if you have any questions.

Documentation ¶

Overview ¶

Package xmlquery extracts data from XML documents using XPath expressions.

Index ¶

Variables
func FindEach(top *Node, expr string, cb func(int, *Node))
func FindEachWithBreak(top *Node, expr string, cb func(int, *Node) bool)
type Node
type NodeNavigator
- func CreateXPathNavigator(top *Node) *NodeNavigator
type NodeType

Constants ¶

This section is empty.

Variables ¶

var DisableSelectorCache = false

DisableSelectorCache disables caching for the query selector if its value is true.

var SelectorCacheMaxEntries = 50

SelectorCacheMaxEntries sets how many selector objects can be cached. The default is 50. Caching is disabled if SelectorCacheMaxEntries <= 0.

Functions ¶

func FindEach ¶

func FindEach(top *Node, expr string, cb func(int, *Node))

FindEach searches the XML Node and calls the function cb for each match. Important: this function is deprecated; use a for ... range loop over the result of Find instead.

func FindEachWithBreak ¶

func FindEachWithBreak(top *Node, expr string, cb func(int, *Node) bool)

FindEachWithBreak works the same as FindEach but lets you break out of the loop by returning false from your callback function, cb. Important: this function is deprecated; use a for ... range loop over the result of Find instead.
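Moving from the callback style to a range loop is mostly mechanical: returning false from the callback becomes a break statement. The sketch below shows the correspondence on a plain slice of strings; findEachWithBreak here is a hypothetical local stand-in with a simplified signature, not the package function.

```go
package main

import "fmt"

// findEachWithBreak is a hypothetical stand-in for the callback style:
// it stops as soon as cb returns false.
func findEachWithBreak(items []string, cb func(int, string) bool) {
	for i, it := range items {
		if !cb(i, it) {
			return
		}
	}
}

// firstTwoCallback collects at most two items using the callback style.
func firstTwoCallback(items []string) []string {
	var out []string
	findEachWithBreak(items, func(i int, s string) bool {
		out = append(out, s)
		return len(out) < 2 // returning false stops the iteration
	})
	return out
}

// firstTwoRange does the same with a plain range loop and break.
func firstTwoRange(items []string) []string {
	var out []string
	for _, s := range items {
		out = append(out, s)
		if len(out) == 2 {
			break
		}
	}
	return out
}

func main() {
	titles := []string{"RSS Tutorial", "XML Tutorial", "XPath Tutorial"}
	fmt.Println(firstTwoCallback(titles))
	fmt.Println(firstTwoRange(titles))
}
```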

Types ¶

type Node ¶

type Node struct {
	Parent, FirstChild, LastChild, PrevSibling, NextSibling *Node

	Type         NodeType
	Data         string
	Prefix       string
	NamespaceURI string
	Attr         []xml.Attr
	// contains filtered or unexported fields
}

A Node consists of a NodeType and some Data (tag name for element nodes, content for text) and is part of a tree of Nodes.
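When a tree is built by hand, the five link fields have to stay consistent with one another. The sketch below shows one way to append a child while maintaining all of them. To stay self-contained it uses a local node type that mirrors only the link fields and Data; appendChild and childNames are hypothetical helpers, not part of the package.

```go
package main

import "fmt"

// node is a local stand-in that mirrors the link fields of the Node struct.
type node struct {
	Parent, FirstChild, LastChild, PrevSibling, NextSibling *node

	Data string
}

// appendChild adds child as the last child of parent and updates every
// link that is affected.
func appendChild(parent, child *node) {
	child.Parent = parent
	child.PrevSibling = parent.LastChild
	child.NextSibling = nil
	if parent.LastChild == nil {
		// First child: the parent had no children yet.
		parent.FirstChild = child
	} else {
		parent.LastChild.NextSibling = child
	}
	parent.LastChild = child
}

// childNames walks the sibling chain from FirstChild and collects Data.
func childNames(parent *node) []string {
	var names []string
	for c := parent.FirstChild; c != nil; c = c.NextSibling {
		names = append(names, c.Data)
	}
	return names
}

func main() {
	channel := &node{Data: "channel"}
	for _, name := range []string{"title", "link", "description"} {
		appendChild(channel, &node{Data: name})
	}
	fmt.Println(childNames(channel))
}
```

Keeping Parent and the sibling pointers in sync matters for any traversal that moves upward or sideways through the tree, not only downward from the root.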

func Find ¶

func Find(top *Node, expr string) []*Node

Find is like QueryAll but panics if `expr` is not a valid XPath expression. See the `QueryAll()` function.

func FindOne ¶

func FindOne(top *Node, expr string) *Node

FindOne is like Query but panics if `expr` is not a valid XPath expression. See the `Query()` function.

func LoadURL ¶

func LoadURL(url string) (*Node, error)

LoadURL loads the XML document from the specified URL.

func Parse ¶

func Parse(r io.Reader) (*Node, error)

Parse returns the parse tree for the XML from the given Reader.

func Query ¶ added in v1.1.0

func Query(top *Node, expr string) (*Node, error)

Query searches the XML Node for matches of the specified XPath expr and returns the first matching element.

func QueryAll ¶ added in v1.1.0

func QueryAll(top *Node, expr string) ([]*Node, error)

QueryAll searches the XML Node for all matches of the specified XPath expr. It returns an error if the expression `expr` cannot be parsed.

func QuerySelector ¶ added in v1.1.0

func QuerySelector(top *Node, selector *xpath.Expr) *Node

QuerySelector returns the first matched XML Node by the specified XPath selector.

func QuerySelectorAll ¶ added in v1.1.0

func QuerySelectorAll(top *Node, selector *xpath.Expr) []*Node

QuerySelectorAll returns all XML Nodes that match the specified XPath selector.

func (*Node) InnerText ¶

func (n *Node) InnerText() string

InnerText returns the text between the start and end tags of the object.
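For an element with nested children, "the text between the start and end tags" can be read as the concatenation of all character data inside it. The sketch below illustrates that reading with the standard encoding/xml decoder. It is an illustration of the idea only: innerText is a hypothetical helper that works on a string, and that the package treats nested elements the same way is an assumption.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// innerText is a hypothetical helper: it concatenates all character data
// in the XML fragment s and ignores tags, comments, and attributes.
func innerText(s string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(s))
	var b strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return b.String(), nil
		}
		if err != nil {
			return "", err
		}
		// Only character data contributes to the result.
		if cd, ok := tok.(xml.CharData); ok {
			b.Write(cd)
		}
	}
}

func main() {
	text, err := innerText(`<item><title>RSS Tutorial</title><link>https://www.w3schools.com/xml/xml_rss.asp</link></item>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```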

func (*Node) OutputXML ¶

func (n *Node) OutputXML(self bool) string

OutputXML returns the node's text, including tag names.

func (*Node) SelectAttr ¶

func (n *Node) SelectAttr(name string) string

SelectAttr returns the attribute value with the specified name.
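Because Node.Attr is a plain []xml.Attr slice, the same kind of lookup can also be written by hand. The sketch below is one way to do it with encoding/xml alone; attrValue and firstStartElement are hypothetical helpers, matching on the local name only is a simplification, and returning an empty string for a missing attribute is an assumption rather than documented behavior.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// attrValue is a hypothetical helper: it returns the value of the first
// attribute whose local name matches, or "" when there is none.
func attrValue(attrs []xml.Attr, name string) string {
	for _, a := range attrs {
		if a.Name.Local == name {
			return a.Value
		}
	}
	return ""
}

// firstStartElement decodes s until it reaches the first start tag.
func firstStartElement(s string) (xml.StartElement, error) {
	dec := xml.NewDecoder(strings.NewReader(s))
	for {
		tok, err := dec.Token()
		if err != nil {
			return xml.StartElement{}, err
		}
		if se, ok := tok.(xml.StartElement); ok {
			return se, nil
		}
	}
}

func main() {
	se, err := firstStartElement(`<book id="bk104" lang="en"></book>`)
	if err != nil {
		panic(err)
	}
	fmt.Println("id:", attrValue(se.Attr, "id"))
	fmt.Println("missing:", attrValue(se.Attr, "isbn") == "")
}
```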

func (*Node) SelectElement ¶

func (n *Node) SelectElement(name string) *Node

SelectElement finds the first child element with the specified name.

func (*Node) SelectElements ¶

func (n *Node) SelectElements(name string) []*Node

SelectElements finds child elements with the specified name.

type NodeNavigator ¶

type NodeNavigator struct {
	// contains filtered or unexported fields
}

func CreateXPathNavigator ¶

func CreateXPathNavigator(top *Node) *NodeNavigator

CreateXPathNavigator creates a new xpath.NodeNavigator for the specified Node.

func (*NodeNavigator) Copy ¶

func (x *NodeNavigator) Copy() xpath.NodeNavigator

func (*NodeNavigator) Current ¶

func (x *NodeNavigator) Current() *Node

func (*NodeNavigator) LocalName ¶

func (x *NodeNavigator) LocalName() string

func (*NodeNavigator) MoveTo ¶

func (x *NodeNavigator) MoveTo(other xpath.NodeNavigator) bool

func (*NodeNavigator) MoveToChild ¶

func (x *NodeNavigator) MoveToChild() bool

func (*NodeNavigator) MoveToFirst ¶

func (x *NodeNavigator) MoveToFirst() bool

func (*NodeNavigator) MoveToNext ¶

func (x *NodeNavigator) MoveToNext() bool

func (*NodeNavigator) MoveToNextAttribute ¶

func (x *NodeNavigator) MoveToNextAttribute() bool

func (*NodeNavigator) MoveToParent ¶

func (x *NodeNavigator) MoveToParent() bool

func (*NodeNavigator) MoveToPrevious ¶

func (x *NodeNavigator) MoveToPrevious() bool

func (*NodeNavigator) MoveToRoot ¶

func (x *NodeNavigator) MoveToRoot()

func (*NodeNavigator) NamespaceURL ¶ added in v1.2.1

func (x *NodeNavigator) NamespaceURL() string

func (*NodeNavigator) NodeType ¶

func (x *NodeNavigator) NodeType() xpath.NodeType

func (*NodeNavigator) Prefix ¶

func (x *NodeNavigator) Prefix() string

func (*NodeNavigator) String ¶

func (x *NodeNavigator) String() string

func (*NodeNavigator) Value ¶

func (x *NodeNavigator) Value() string

type NodeType ¶

type NodeType uint

A NodeType is the type of a Node.

const (
	// DocumentNode is a document object that, as the root of the document tree,
	// provides access to the entire XML document.
	DocumentNode NodeType = iota
	// DeclarationNode is the document type declaration, indicated by the following
	// tag (for example, <!DOCTYPE...> ).
	DeclarationNode
	// ElementNode is an element (for example, <item> ).
	ElementNode
	// TextNode is the text content of a node.
	TextNode
	// CommentNode a comment (for example, <!-- my comment --> ).
	CommentNode
	// AttributeNode is an attribute of element.
	AttributeNode
)
