hdq

package module
v0.8.5 Latest
Warning

This package is not in the latest version of its module.

Published: May 7, 2025 | License: Apache-2.0 | Imports: 11 | Imported by: 0

README

hdq - HTML DOM Query Language for Go+


Summary about hdq

hdq is a Go+ package for processing HTML documents.

Tutorials

How do you collect all the links on an HTML page? With hdq, it is very easy.

import "github.com/goplus/hdq"

func links(url any) []string {
	doc := hdq.Source(url)
	return [link for a <- doc.any.a if link := a.href?:""; link != ""]
}

First, we call hdq.Source(url) to create a node set named doc. It contains only one node, the root node.

Then we select all a elements with doc.any.a. Here doc.any means all nodes in the HTML document.

Next, we visit each of these a elements, read its href attribute value, and assign it to the variable link. If link is not empty, we collect it.

Finally, we return all collected links. See tutorial/01-Links for the full source code.
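To make the steps of the tutorial concrete, here is a minimal sketch of the same traversal in plain Go. It does not use hdq: the node type, walk, and sampleDoc are hypothetical stand-ins invented for illustration (hdq itself works on *html.Node), so treat it as a picture of what the query does rather than how the package implements it.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for an HTML node.
type node struct {
	tag      string            // element name, e.g. "a"
	attrs    map[string]string // attribute name -> value
	children []*node
}

// walk visits n and every descendant in document order.
func walk(n *node, visit func(*node)) {
	visit(n)
	for _, c := range n.children {
		walk(c, visit)
	}
}

// links collects the non-empty href values of all a elements under root.
func links(root *node) []string {
	var out []string
	walk(root, func(n *node) {
		if n.tag != "a" {
			return
		}
		// Reading a missing key (or a nil map) yields "", which is skipped.
		if href := n.attrs["href"]; href != "" {
			out = append(out, href)
		}
	})
	return out
}

// sampleDoc builds a tiny tree with three a elements, one of them without href.
func sampleDoc() *node {
	return &node{tag: "html", children: []*node{
		{tag: "body", children: []*node{
			{tag: "a", attrs: map[string]string{"href": "https://example.com/a"}},
			{tag: "a"},
			{tag: "p", children: []*node{
				{tag: "a", attrs: map[string]string{"href": "https://example.com/b"}},
			}},
		}},
	}}
}

func main() {
	fmt.Println(links(sampleDoc())) // prints [https://example.com/a https://example.com/b]
}
```

The Go+ one-liner in the tutorial expresses the same filter-and-collect loop as a list comprehension.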

Documentation

Index

Constants

const (
	GopPackage = true // to indicate this is a Go+ package
)

Variables

var (
	ErrNotFound = errors.New("entity not found")
	ErrBreak    = errors.New("break")

	ErrTooManyNodes = errors.New("too many nodes")
	ErrInvalidNode  = errors.New("invalid node")

	// ErrEmptyText represents an `empty text` error.
	ErrEmptyText = errors.New("empty text")

	// ErrInvalidScanFormat represents an `invalid fmt.Scan format` error.
	ErrInvalidScanFormat = errors.New("invalid fmt.Scan format")
)

Functions

This section is empty.

Types

type NodeEnum

type NodeEnum interface {
	ForEach(filter func(node *html.Node) error)
}
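As a rough illustration of this callback-style interface, the sketch below implements an enumerator over a fixed slice. The node type is a hypothetical stand-in for *html.Node, and the early-stop convention (a callback returning a break sentinel) is an assumption suggested by the ErrBreak variable, not documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stand-in for *html.Node so the sketch needs no
// external packages.
type node struct{ name string }

// errBreak plays the role ErrBreak appears to play (assumed): a callback
// returns it to end the enumeration early.
var errBreak = errors.New("break")

// nodeEnum has the same shape as NodeEnum, with node in place of html.Node.
type nodeEnum interface {
	ForEach(filter func(n *node) error)
}

// sliceEnum enumerates a fixed list of nodes.
type sliceEnum []*node

// ForEach calls filter for each node and stops when filter returns errBreak.
// Other errors are ignored in this sketch.
func (s sliceEnum) ForEach(filter func(n *node) error) {
	for _, n := range s {
		if err := filter(n); errors.Is(err, errBreak) {
			return
		}
	}
}

// firstN returns the names of at most limit nodes from e.
func firstN(e nodeEnum, limit int) []string {
	var names []string
	e.ForEach(func(n *node) error {
		if len(names) == limit {
			return errBreak
		}
		names = append(names, n.name)
		return nil
	})
	return names
}

func main() {
	e := sliceEnum{{name: "html"}, {name: "body"}, {name: "a"}}
	fmt.Println(firstN(e, 2)) // prints [html body]
}
```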

type NodeSet

type NodeSet struct {
	Data NodeEnum
	Err  error
}

NodeSet represents a set of nodes.

func New

func New(r io.Reader) NodeSet

New creates a NodeSet object.

func Nodes

func Nodes(nodes ...*html.Node) (ret NodeSet)

Nodes creates a node set from the given nodes.

func Source

func Source(r any) (ret NodeSet)

Source opens a stream (if necessary) to create a NodeSet object.

func (NodeSet) A

func (p NodeSet) A() (ret NodeSet)

A returns a NodeSet of the nodes whose type is ElementNode and whose element type is `a`.

func (NodeSet) All

func (p NodeSet) All() NodeSet

func (NodeSet) Any

func (p NodeSet) Any() (ret NodeSet)

Any returns all nodes as a node set.

func (NodeSet) AttrVal

func (p NodeSet) AttrVal(k string, exactlyOne ...bool) (text string, err error)

AttrVal returns the value of attribute `k`. exactlyOne=false: if the NodeSet contains more than one node, it returns the first node's attribute value.

func (NodeSet) Attr__0

func (p NodeSet) Attr__0(k string, exactlyOne ...bool) (text string, err error)

Attr returns the value of attribute `k` of the NodeSet.

func (NodeSet) Attr__1

func (p NodeSet) Attr__1(k, v string) (ret NodeSet)

func (NodeSet) Attribute__0 added in v0.8.2

func (p NodeSet) Attribute__0(k, v string) (ret NodeSet)

Attribute returns a NodeSet of the nodes whose attribute `k` has the value `v`.

func (NodeSet) Attribute__1 added in v0.8.2

func (p NodeSet) Attribute__1(k string, filter func(v string) bool) (ret NodeSet)

func (NodeSet) Child

func (p NodeSet) Child() (ret NodeSet)

Child returns the child node set. It is equivalent to ChildN(1).

func (NodeSet) ChildEqualText

func (p NodeSet) ChildEqualText(text string) (ret NodeSet)

ChildEqualText returns a NodeSet of the nodes that have a child node whose text equals `text`.

func (NodeSet) ChildN

func (p NodeSet) ChildN(level int) (ret NodeSet)

ChildN returns the child node set at the given level.

func (NodeSet) ChildrenAsText

func (p NodeSet) ChildrenAsText(doReplace bool) (ret NodeSet)

func (NodeSet) Class

func (p NodeSet) Class(v string) (ret NodeSet)

Class returns a NodeSet of the nodes whose `class` attribute is `v`.

func (NodeSet) Collect

func (p NodeSet) Collect() (items []*html.Node, err error)

Collect returns all nodes.

func (NodeSet) CollectOne__0 added in v0.7.1

func (p NodeSet) CollectOne__0() (item *html.Node, err error)

CollectOne returns the first node.

func (NodeSet) CollectOne__1 added in v0.7.1

func (p NodeSet) CollectOne__1(exactly bool) (item *html.Node, err error)

CollectOne returns the first node. If `exactly` is true, it returns an error when the NodeSet contains more than one node.

func (NodeSet) ContainsClass

func (p NodeSet) ContainsClass(v string) (ret NodeSet)

ContainsClass returns a NodeSet of the nodes whose class contains `v`.

func (NodeSet) Div

func (p NodeSet) Div() (ret NodeSet)

Div returns a NodeSet of the nodes whose type is ElementNode and whose element type is `div`.

func (NodeSet) Dl added in v0.8.0

func (p NodeSet) Dl() (ret NodeSet)

Dl returns a NodeSet of the nodes whose type is ElementNode and whose element type is `dl`.

func (NodeSet) Dt added in v0.8.0

func (p NodeSet) Dt() (ret NodeSet)

Dt returns a NodeSet of the nodes whose type is ElementNode and whose element type is `dt`.

func (NodeSet) Dump

func (p NodeSet) Dump() NodeSet

Dump prints the NodeSet context and `print("\n\n")`.

func (NodeSet) Element__0 added in v0.8.2

func (p NodeSet) Element__0(elemType atom.Atom) (ret NodeSet)

Element returns a NodeSet of the nodes whose type is ElementNode and whose element type is `elemType`.

func (NodeSet) Element__1 added in v0.8.2

func (p NodeSet) Element__1(elemType string) (ret NodeSet)

Element returns a NodeSet of the nodes whose type is ElementNode and whose element type is `elemType`.

func (NodeSet) ExactText__0 added in v0.7.1

func (p NodeSet) ExactText__0() (text string, err error)

func (NodeSet) ExactText__1 added in v0.7.1

func (p NodeSet) ExactText__1(exactlyOne bool) (text string, err error)

ExactText returns the text of the NodeSet. exactlyOne=false: if the NodeSet contains more than one node, it returns the first node's text (if that node's type is not TextNode, it returns an error).

func (NodeSet) FirstChild

func (p NodeSet) FirstChild(nodeType html.NodeType) (ret NodeSet)

func (NodeSet) FirstElementChild

func (p NodeSet) FirstElementChild() (ret NodeSet)

func (NodeSet) FirstTextChild

func (p NodeSet) FirstTextChild() (ret NodeSet)

func (NodeSet) ForEach

func (p NodeSet) ForEach(callback func(node NodeSet))

func (NodeSet) Gop_Enum

func (p NodeSet) Gop_Enum(callback func(node NodeSet))

func (NodeSet) H1

func (p NodeSet) H1() (ret NodeSet)

H1 returns a NodeSet of the nodes whose type is ElementNode and whose element type is `h1`.

func (NodeSet) H2

func (p NodeSet) H2() (ret NodeSet)

H2 returns a NodeSet of the nodes whose type is ElementNode and whose element type is `h2`.

func (NodeSet) H3

func (p NodeSet) H3() (ret NodeSet)

H3 returns a NodeSet of the nodes whose type is ElementNode and whose element type is `h3`.

func (NodeSet) H4

func (p NodeSet) H4() (ret NodeSet)

H4 returns a NodeSet of the nodes whose type is ElementNode and whose element type is `h4`.

func (NodeSet) HasAttr added in v0.8.3

func (p NodeSet) HasAttr(k string, exactlyOne ...bool) bool

HasAttr reports whether the NodeSet has attribute `k`.

func (NodeSet) HrefVal__0 added in v0.7.1

func (p NodeSet) HrefVal__0() (text string, err error)

func (NodeSet) HrefVal__1 added in v0.7.1

func (p NodeSet) HrefVal__1(exactlyOne bool) (text string, err error)

HrefVal returns the value of the `href` attribute of the NodeSet. exactlyOne=false: if the NodeSet contains more than one node, it returns the first node's attribute value.

func (NodeSet) Href__0

func (p NodeSet) Href__0() (text string, err error)

Href returns the value of the `href` attribute of the NodeSet.

func (NodeSet) Href__1

func (p NodeSet) Href__1(v string) (ret NodeSet)

Href returns a NodeSet of the nodes whose `href` attribute is `v`.

func (NodeSet) Href__2 added in v0.7.1

func (p NodeSet) Href__2(exactlyOne bool) (text string, err error)

Href returns the value of the `href` attribute of the NodeSet.

func (NodeSet) Id

func (p NodeSet) Id(v string) (ret NodeSet)

Id returns a NodeSet of the nodes whose `id` attribute is `v`.

func (NodeSet) Img

func (p NodeSet) Img() (ret NodeSet)

Img returns a NodeSet of the nodes whose type is ElementNode and whose element type is `img`.

func (NodeSet) Int__0 added in v0.7.1

func (p NodeSet) Int__0() (v int, err error)

func (NodeSet) Int__1 added in v0.7.1

func (p NodeSet) Int__1(exactlyOne bool) (v int, err error)

Int returns the int value of p.Text(). exactlyOne=false: if the NodeSet contains more than one node, it returns the first node's value.

func (NodeSet) LastChild

func (p NodeSet) LastChild(nodeType html.NodeType) (ret NodeSet)

func (NodeSet) LastElementChild

func (p NodeSet) LastElementChild() (ret NodeSet)

func (NodeSet) LastTextChild

func (p NodeSet) LastTextChild() (ret NodeSet)

func (NodeSet) Li

func (p NodeSet) Li() (ret NodeSet)

Li returns a NodeSet of the nodes whose type is ElementNode and whose element type is `li`.

func (NodeSet) Match

func (p NodeSet) Match(filter func(node *html.Node) bool) (ret NodeSet)

Match returns the matched node set.

func (NodeSet) Nav

func (p NodeSet) Nav() (ret NodeSet)

Nav returns a NodeSet of the nodes whose type is ElementNode and whose element type is `nav`.

func (NodeSet) NextSibling

func (p NodeSet) NextSibling(delta int) (ret NodeSet)

func (NodeSet) NextSiblings

func (p NodeSet) NextSiblings() (ret NodeSet)

func (NodeSet) Ok

func (p NodeSet) Ok() bool

func (NodeSet) Ol

func (p NodeSet) Ol() (ret NodeSet)

Ol returns a NodeSet of the nodes whose type is ElementNode and whose element type is `ol`.

func (NodeSet) One

func (p NodeSet) One() (ret NodeSet)

One returns the first node as a node set.

func (NodeSet) P

func (p NodeSet) P() (ret NodeSet)

P returns a NodeSet of the nodes whose type is ElementNode and whose element type is `p`.

func (NodeSet) Parent

func (p NodeSet) Parent() (ret NodeSet)

Parent returns the parent node set. It is equivalent to ParentN(1).

func (NodeSet) ParentN

func (p NodeSet) ParentN(level int) (ret NodeSet)

ParentN returns the parent node set at the given level.
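The relationship between Parent and ParentN can be sketched on a single node as follows. The node type and both helpers are hypothetical stand-ins written for illustration. hdq applies the operation to every node in a set, and how it handles a level that runs past the root is not stated on this page, so returning nil here is an assumption.

```go
package main

import "fmt"

// node is a hypothetical tree node with a link to its parent.
type node struct {
	name   string
	parent *node
}

// parentN climbs level steps up from n. It returns nil if the tree runs
// out before level steps are taken (an assumption of this sketch).
func parentN(n *node, level int) *node {
	for i := 0; i < level && n != nil; i++ {
		n = n.parent
	}
	return n
}

// parent is the one-step case, mirroring "Parent is equivalent to ParentN(1)".
func parent(n *node) *node { return parentN(n, 1) }

func main() {
	html := &node{name: "html"}
	body := &node{name: "body", parent: html}
	link := &node{name: "a", parent: body}

	fmt.Println(parent(link).name, parentN(link, 2).name) // prints body html
}
```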

func (NodeSet) PrevSibling

func (p NodeSet) PrevSibling(delta int) (ret NodeSet)

func (NodeSet) PrevSiblings

func (p NodeSet) PrevSiblings() (ret NodeSet)

func (NodeSet) Printf

func (p NodeSet) Printf(w io.Writer, format string, params ...any) NodeSet

Printf prints the NodeSet context and `print(format, params...)`.

func (NodeSet) Render

func (p NodeSet) Render(w io.Writer, suffix ...string) (err error)

Render renders the node set to the given writer.

func (NodeSet) ScanInt

func (p NodeSet) ScanInt(format string, exactlyOne ...bool) (v int, err error)

ScanInt returns the int value scanned from p.Text() using `format`. exactlyOne=false: if the NodeSet contains more than one node, it returns the first node's value.
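The format parameter and the ErrInvalidScanFormat message ("invalid fmt.Scan format") suggest that the text is parsed with a fmt scanning verb. A minimal sketch of that idea using fmt.Sscanf from the standard library follows. The scanInt helper is hypothetical, and the exact format rules that hdq accepts are not documented on this page.

```go
package main

import "fmt"

// scanInt extracts one integer from text using a fmt.Sscanf format such as
// "%d results". It returns an error when the text does not fit the format.
func scanInt(text, format string) (int, error) {
	var v int
	if _, err := fmt.Sscanf(text, format, &v); err != nil {
		return 0, err
	}
	return v, nil
}

func main() {
	v, err := scanInt("42 results", "%d results")
	fmt.Println(v, err) // prints 42 <nil>

	_, err = scanInt("no number", "%d results")
	fmt.Println(err != nil) // prints true
}
```

This is handy for node text that embeds a number in a fixed phrase, where a plain integer conversion of the whole text would fail.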

func (NodeSet) Span

func (p NodeSet) Span() (ret NodeSet)

Span returns a NodeSet of the nodes whose type is ElementNode and whose element type is `span`.

func (NodeSet) Td

func (p NodeSet) Td() (ret NodeSet)

Td returns a NodeSet of the nodes whose type is ElementNode and whose element type is `td`.

func (NodeSet) TextContains

func (p NodeSet) TextContains(text string) (ret NodeSet)

TextContains returns a NodeSet of the nodes whose type is TextNode and whose text contains `text`.

func (NodeSet) TextEqual

func (p NodeSet) TextEqual(text string) (ret NodeSet)

TextEqual returns a NodeSet of the nodes whose type is TextNode and whose text equals `text`.

func (NodeSet) TextHasPrefix

func (p NodeSet) TextHasPrefix(text string) (ret NodeSet)

TextHasPrefix returns a NodeSet of the nodes whose type is TextNode and whose text has the prefix `text`.

func (NodeSet) Text__0 added in v0.7.1

func (p NodeSet) Text__0() (text string, err error)

func (NodeSet) Text__1 added in v0.7.1

func (p NodeSet) Text__1(exactlyOne bool) (text string, err error)

Text returns the text of the NodeSet. exactlyOne=false: if the NodeSet contains more than one node, it returns the first node's text.

func (NodeSet) Ul

func (p NodeSet) Ul() (ret NodeSet)

Ul returns a NodeSet of the nodes whose type is ElementNode and whose element type is `ul`.

func (NodeSet) UnitedFloat__0 added in v0.7.1

func (p NodeSet) UnitedFloat__0() (v float64, err error)

func (NodeSet) UnitedFloat__1 added in v0.7.1

func (p NodeSet) UnitedFloat__1(exactlyOne bool) (v float64, err error)

UnitedFloat returns the UnitedFloat value of p.Text(). exactlyOne=false: if the NodeSet contains more than one node, it returns the first node's value.

Directories

Path Synopsis
chore
gopkgimps command
gostdpkgs command
hreflinks command
pysigfetch command
stdpkgprogress command
cmd
hdq command
hdq/internal/base
Package base defines shared basic pieces of the hdq command, in particular logging and the Command structure.
hdq/internal/fetch
Package fetch implements the "hdq fetch" command.
hdq/internal/help
Package help implements the "hdq help" command.
zip
tutorial
01-Links command
