hdq

Version: v0.6.0
Note: this package is not in the latest version of its module.
Published: Oct 5, 2021 | License: Apache-2.0 | Imports: 12 | Imported by: 0

README

hdq - HTML DOM Query Language for Go+

Summary about hdq

hdq is a Go+ package for processing HTML documents.

Tutorials

How do you collect all the links on an HTML page? With hdq, it is easy.

import "github.com/qiniu/hdq"

func links(url interface{}) []string {
	doc := hdq.Source(url)
	return [link for a <- doc.any.a, link := a.href?:""; link != ""]
}

First, we call hdq.Source(url) to create a node set named doc. It contains a single node: the root node.

Next, we select all a elements with doc.any.a. Here doc.any means all nodes in the HTML document.

We then visit each a element, read its href attribute value, and assign it to the variable link. If link is not empty, we collect it.

Finally, we return all collected links. See tutorial/01-Links for the full source code.
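For readers more familiar with plain Go than Go+, the list comprehension in the tutorial can be read as an ordinary loop. The sketch below is only an illustration of that control flow: the anchor type and its href method are hypothetical stand-ins, not hdq types, and the program does not parse any HTML.

```go
package main

import (
	"errors"
	"fmt"
)

// anchor is a hypothetical stand-in for one `a` element; it is not an hdq type.
type anchor struct {
	attrs map[string]string
}

// href mimics an attribute lookup that fails when the attribute is missing.
func (a anchor) href() (string, error) {
	v, ok := a.attrs["href"]
	if !ok {
		return "", errors.New("href not found")
	}
	return v, nil
}

// collectLinks spells out the comprehension as a loop.
func collectLinks(anchors []anchor) []string {
	var links []string
	for _, a := range anchors { // `for a <- doc.any.a`
		link, err := a.href()
		if err != nil {
			link = "" // the `?:""` part: fall back to "" on error
		}
		if link != "" { // the `link != ""` condition
			links = append(links, link)
		}
	}
	return links
}

func main() {
	anchors := []anchor{
		{attrs: map[string]string{"href": "https://example.com/"}},
		{attrs: map[string]string{"name": "top"}}, // no href attribute
		{attrs: map[string]string{"href": ""}},    // empty href
		{attrs: map[string]string{"href": "/about"}},
	}
	fmt.Println(collectLinks(anchors)) // prints [https://example.com/ /about]
}
```

Anchors without an href, or with an empty one, are skipped, which matches the "if link is not empty, collect it" step described above.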

Documentation

Constants

const (
	GopPackage = true // to indicate this is a Go+ package
)

Variables

var (
	ErrNotFound = errors.New("entity not found")
	ErrBreak    = errors.New("break")

	ErrTooManyNodes = errors.New("too many nodes")
	ErrInvalidNode  = errors.New("invalid node")

	// ErrEmptyText represents an `empty text` error.
	ErrEmptyText = errors.New("empty text")

	// ErrInvalidScanFormat represents an `invalid fmt.Scan format` error.
	ErrInvalidScanFormat = errors.New("invalid fmt.Scan format")
)
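These variables are sentinel errors, so callers would normally compare against them rather than inspect error strings. The sketch below shows that pattern with a local errNotFound that mirrors the message of ErrNotFound; the lookup and isNotFound helpers are hypothetical, and whether hdq wraps its errors with extra context is not stated in this page.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a local stand-in that mirrors hdq.ErrNotFound.
var errNotFound = errors.New("entity not found")

// lookup is a hypothetical helper that wraps the sentinel with context.
func lookup(attrs map[string]string, key string) (string, error) {
	v, ok := attrs[key]
	if !ok {
		return "", fmt.Errorf("attribute %q: %w", key, errNotFound)
	}
	return v, nil
}

// isNotFound reports whether err is, or wraps, errNotFound.
func isNotFound(err error) bool {
	return errors.Is(err, errNotFound)
}

func main() {
	_, err := lookup(map[string]string{"id": "main"}, "href")
	if isNotFound(err) {
		fmt.Println("missing:", err)
	}
}
```

Using errors.Is instead of == keeps the check working even if the error has been wrapped along the way.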

Functions

This section is empty.

Types

type NodeEnum

type NodeEnum interface {
	ForEach(filter func(node *html.Node) error)
}
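NodeEnum is a callback-style iterator: the enumerator calls filter once per node. The sketch below shows one way such an interface can be implemented and consumed. It uses a hypothetical node type in place of *html.Node so that it runs with the standard library only, and it assumes that returning a break sentinel from the callback stops the iteration; this page does not document how hdq treats the callback's error.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stand-in for *html.Node.
type node struct{ tag string }

// errBreak mirrors hdq.ErrBreak; using it to stop iteration is an assumption.
var errBreak = errors.New("break")

// nodeEnum has the same shape as hdq.NodeEnum, with node instead of html.Node.
type nodeEnum interface {
	ForEach(filter func(n *node) error)
}

// sliceEnum enumerates a fixed slice of nodes.
type sliceEnum []*node

// ForEach calls filter for each node and stops early on errBreak.
func (s sliceEnum) ForEach(filter func(n *node) error) {
	for _, n := range s {
		if err := filter(n); err == errBreak {
			return
		}
	}
}

// firstTag returns the first node with the given tag, or nil if none matches.
func firstTag(e nodeEnum, tag string) *node {
	var found *node
	e.ForEach(func(n *node) error {
		if n.tag == tag {
			found = n
			return errBreak // stop after the first match
		}
		return nil
	})
	return found
}

func main() {
	nodes := sliceEnum{&node{tag: "div"}, &node{tag: "a"}, &node{tag: "a"}}
	if n := firstTag(nodes, "a"); n != nil {
		fmt.Println("found:", n.tag)
	}
}
```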

type NodeSet

type NodeSet struct {
	Data NodeEnum
	Err  error
}

func New added in v0.3.0

func New(r io.Reader) NodeSet

New creates a NodeSet object.

func Nodes

func Nodes(nodes ...*html.Node) (ret NodeSet)

func Source

func Source(r interface{}) (ret NodeSet)

Source opens a stream (if necessary) to create a NodeSet object.

func (NodeSet) A

func (p NodeSet) A() (ret NodeSet)

A returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `a`.

func (NodeSet) All added in v0.3.0

func (p NodeSet) All() NodeSet

func (NodeSet) Any

func (p NodeSet) Any() (ret NodeSet)

func (NodeSet) AttrVal

func (p NodeSet) AttrVal(k string, exactlyOne ...bool) (text string, err error)

AttrVal returns the value of attribute `k` for the NodeSet. If exactlyOne is false and the NodeSet contains more than one node, it returns the first node's attribute value.
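Several methods take an optional exactlyOne ...bool argument. The sketch below illustrates how such a variadic flag can behave, using a plain string slice instead of a NodeSet. The pickOne function is hypothetical, and the choice of which error is returned in each case is an assumption based on the error names above, not something this page confirms.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins that mirror the messages of hdq's sentinel errors.
var (
	errNotFound     = errors.New("entity not found")
	errTooManyNodes = errors.New("too many nodes")
)

// pickOne is a hypothetical helper. With exactlyOne omitted or false it
// returns the first value; with exactlyOne true it rejects multiple values.
func pickOne(values []string, exactlyOne ...bool) (string, error) {
	if len(values) == 0 {
		return "", errNotFound
	}
	if len(exactlyOne) > 0 && exactlyOne[0] && len(values) > 1 {
		return "", errTooManyNodes
	}
	return values[0], nil
}

func main() {
	hrefs := []string{"/a", "/b"}

	v, err := pickOne(hrefs) // flag omitted: first value wins
	fmt.Println(v, err)

	_, err = pickOne(hrefs, true) // strict mode: more than one value is an error
	fmt.Println(err)
}
```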

func (NodeSet) Attr__0

func (p NodeSet) Attr__0(k string, exactlyOne ...bool) (text string, err error)

func (NodeSet) Attr__1

func (p NodeSet) Attr__1(k, v string) (ret NodeSet)

func (NodeSet) Attribute

func (p NodeSet) Attribute(k, v string) (ret NodeSet)

Attribute returns a NodeSet of the nodes whose attribute `k` has the value `v`.

func (NodeSet) Child

func (p NodeSet) Child() (ret NodeSet)

func (NodeSet) ChildEqualText

func (p NodeSet) ChildEqualText(text string) (ret NodeSet)

ChildEqualText returns a NodeSet of the nodes that have a child node whose text equals `text`.

func (NodeSet) ChildN

func (p NodeSet) ChildN(level int) (ret NodeSet)

func (NodeSet) ChildrenAsText

func (p NodeSet) ChildrenAsText(doReplace bool) (ret NodeSet)

func (NodeSet) Class

func (p NodeSet) Class(v string) (ret NodeSet)

Class returns a NodeSet of the nodes whose `class` attribute is `v`.

func (NodeSet) Collect

func (p NodeSet) Collect() (items []*html.Node, err error)

func (NodeSet) CollectOne

func (p NodeSet) CollectOne(exactly ...bool) (item *html.Node, err error)

func (NodeSet) ContainsClass

func (p NodeSet) ContainsClass(v string) (ret NodeSet)

ContainsClass returns a NodeSet of the nodes whose `class` attribute contains `v`.

func (NodeSet) Div

func (p NodeSet) Div() (ret NodeSet)

Div returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `div`.

func (NodeSet) Dump

func (p NodeSet) Dump() NodeSet

Dump prints the NodeSet context, then calls `print("\n\n")`.

func (NodeSet) Element

func (p NodeSet) Element(v interface{}) (ret NodeSet)

Element returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `v`.

func (NodeSet) ExactText

func (p NodeSet) ExactText(exactlyOne ...bool) (text string, err error)

ExactText returns the text of the NodeSet. If exactlyOne is false and the NodeSet contains more than one node, it returns the first node's text. It returns an error if the node type is not TextNode.

func (NodeSet) FirstChild

func (p NodeSet) FirstChild(nodeType html.NodeType) (ret NodeSet)

func (NodeSet) FirstElementChild

func (p NodeSet) FirstElementChild() (ret NodeSet)

func (NodeSet) FirstTextChild

func (p NodeSet) FirstTextChild() (ret NodeSet)

func (NodeSet) ForEach

func (p NodeSet) ForEach(callback func(node NodeSet))

func (NodeSet) Gop_Enum added in v0.3.0

func (p NodeSet) Gop_Enum(callback func(node NodeSet))

func (NodeSet) H1

func (p NodeSet) H1() (ret NodeSet)

H1 returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `h1`.

func (NodeSet) H2

func (p NodeSet) H2() (ret NodeSet)

H2 returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `h2`.

func (NodeSet) H3

func (p NodeSet) H3() (ret NodeSet)

H3 returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `h3`.

func (NodeSet) H4

func (p NodeSet) H4() (ret NodeSet)

H4 returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `h4`.

func (NodeSet) HrefVal

func (p NodeSet) HrefVal(exactlyOne ...bool) (text string, err error)

HrefVal returns the value of the `href` attribute for the NodeSet. If exactlyOne is false and the NodeSet contains more than one node, it returns the first node's attribute value.

func (NodeSet) Href__0 added in v0.3.1

func (p NodeSet) Href__0(exactlyOne ...bool) (text string, err error)

Href returns the value of the `href` attribute for the NodeSet.

func (NodeSet) Href__1 added in v0.3.1

func (p NodeSet) Href__1(v string) (ret NodeSet)

Href returns a NodeSet of the nodes whose `href` attribute is `v`.

func (NodeSet) Id added in v0.3.0

func (p NodeSet) Id(v string) (ret NodeSet)

Id returns a NodeSet of the nodes whose `id` attribute is `v`.

func (NodeSet) Img

func (p NodeSet) Img() (ret NodeSet)

Img returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `img`.

func (NodeSet) Int

func (p NodeSet) Int(exactlyOne ...bool) (v int, err error)

Int returns the int value of p.Text(). If exactlyOne is false and the NodeSet contains more than one node, it returns the first node's value.

func (NodeSet) LastChild

func (p NodeSet) LastChild(nodeType html.NodeType) (ret NodeSet)

func (NodeSet) LastElementChild

func (p NodeSet) LastElementChild() (ret NodeSet)

func (NodeSet) LastTextChild

func (p NodeSet) LastTextChild() (ret NodeSet)

func (NodeSet) Li

func (p NodeSet) Li() (ret NodeSet)

Li returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `li`.

func (NodeSet) Match

func (p NodeSet) Match(filter func(node *html.Node) bool) (ret NodeSet)

func (NodeSet) Nav

func (p NodeSet) Nav() (ret NodeSet)

Nav returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `nav`.

func (NodeSet) NextSibling

func (p NodeSet) NextSibling(delta int) (ret NodeSet)

func (NodeSet) NextSiblings

func (p NodeSet) NextSiblings() (ret NodeSet)

func (NodeSet) Ok

func (p NodeSet) Ok() bool

func (NodeSet) Ol

func (p NodeSet) Ol() (ret NodeSet)

Ol returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `ol`.

func (NodeSet) One

func (p NodeSet) One() (ret NodeSet)

func (NodeSet) P added in v0.3.0

func (p NodeSet) P() (ret NodeSet)

P returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `p`.

func (NodeSet) Parent

func (p NodeSet) Parent() (ret NodeSet)

func (NodeSet) ParentN

func (p NodeSet) ParentN(level int) (ret NodeSet)

func (NodeSet) PrevSibling

func (p NodeSet) PrevSibling(delta int) (ret NodeSet)

func (NodeSet) PrevSiblings

func (p NodeSet) PrevSiblings() (ret NodeSet)

func (NodeSet) Printf

func (p NodeSet) Printf(w io.Writer, format string, params ...interface{}) NodeSet

Printf prints the NodeSet context, then calls `print(format, params...)`.

func (NodeSet) Render added in v0.2.1

func (p NodeSet) Render(w io.Writer, suffix ...string) (err error)

Render renders the node set to the given writer.

func (NodeSet) ScanInt

func (p NodeSet) ScanInt(format string, exactlyOne ...bool) (v int, err error)

ScanInt returns the int value scanned from p.Text() according to `format`. If exactlyOne is false and the NodeSet contains more than one node, it returns the first node's value.
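The ErrInvalidScanFormat message mentions fmt.Scan, which suggests that the format argument is a fmt scanning format. The sketch below shows how an integer can be pulled out of surrounding text with fmt.Sscanf. The scanInt helper is hypothetical and works on a plain string; whether hdq uses fmt.Sscanf internally is an assumption.

```go
package main

import (
	"fmt"
)

// scanInt is a hypothetical helper: it scans one int out of text using a
// fmt scanning format such as "%d comments".
func scanInt(text, format string) (int, error) {
	var v int
	if _, err := fmt.Sscanf(text, format, &v); err != nil {
		return 0, err
	}
	return v, nil
}

func main() {
	v, err := scanInt("42 comments", "%d comments")
	fmt.Println(v, err) // prints 42 <nil>

	_, err = scanInt("n/a", "%d comments")
	fmt.Println(err != nil) // prints true
}
```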

func (NodeSet) Span

func (p NodeSet) Span() (ret NodeSet)

Span returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `span`.

func (NodeSet) Td

func (p NodeSet) Td() (ret NodeSet)

Td returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `td`.

func (NodeSet) Text

func (p NodeSet) Text(exactlyOne ...bool) (text string, err error)

Text returns the text of the NodeSet. If exactlyOne is false and the NodeSet contains more than one node, it returns the first node's text.

func (NodeSet) TextContains added in v0.6.0

func (p NodeSet) TextContains(text string) (ret NodeSet)

TextContains returns a NodeSet of the nodes whose node type is TextNode and whose text contains `text`.

func (NodeSet) TextEqual added in v0.6.0

func (p NodeSet) TextEqual(text string) (ret NodeSet)

TextEqual returns a NodeSet of the nodes whose node type is TextNode and whose text equals `text`.

func (NodeSet) TextHasPrefix added in v0.6.0

func (p NodeSet) TextHasPrefix(text string) (ret NodeSet)

TextHasPrefix returns a NodeSet of the nodes whose node type is TextNode and whose text has the prefix `text`.
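The three text filters differ only in the predicate applied to each text node. The sketch below compares the three predicates on plain strings. The helper names are hypothetical, and the mapping to strings.Contains, ==, and strings.HasPrefix is inferred from the method names rather than taken from the hdq source.

```go
package main

import (
	"fmt"
	"strings"
)

// filterText keeps the strings for which pred reports true.
func filterText(texts []string, pred func(string) bool) []string {
	var out []string
	for _, t := range texts {
		if pred(t) {
			out = append(out, t)
		}
	}
	return out
}

// textContains builds a predicate in the spirit of TextContains.
func textContains(sub string) func(string) bool {
	return func(t string) bool { return strings.Contains(t, sub) }
}

// textEqual builds a predicate in the spirit of TextEqual.
func textEqual(want string) func(string) bool {
	return func(t string) bool { return t == want }
}

// textHasPrefix builds a predicate in the spirit of TextHasPrefix.
func textHasPrefix(prefix string) func(string) bool {
	return func(t string) bool { return strings.HasPrefix(t, prefix) }
}

func main() {
	texts := []string{"Price: 10", "Price", "Total Price"}
	fmt.Println(len(filterText(texts, textContains("Price"))))   // prints 3
	fmt.Println(len(filterText(texts, textEqual("Price"))))      // prints 1
	fmt.Println(len(filterText(texts, textHasPrefix("Price:")))) // prints 1
}
```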

func (NodeSet) Ul

func (p NodeSet) Ul() (ret NodeSet)

Ul returns a NodeSet of the nodes whose node type is ElementNode and whose element type is `ul`.

func (NodeSet) UnitedFloat

func (p NodeSet) UnitedFloat(exactlyOne ...bool) (v float64, err error)

UnitedFloat returns the UnitedFloat value of p.Text(). If exactlyOne is false and the NodeSet contains more than one node, it returns the first node's value.

Directories

Path Synopsis
zip
