unhtml

Version: v0.0.0-...-c4becdc
Warning

This package is not in the latest version of its module.

Published: Apr 20, 2014 License: BSD-3-Clause Imports: 7 Imported by: 1

README

unhtml

See the documentation for the API reference.

See the examples for a few simple usage examples.

Documentation

Overview

unhtml is a package for parsing HTML in the style of unmarshalling: it takes an approach similar to encoding/xml and encoding/json.
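For comparison, the sketch below shows the encoding/xml style the overview refers to, using only the standard library: struct tags declare where each field lives and Unmarshal fills them in. The Page type, the parsePage helper, and the sample markup are made up for illustration. Note that encoding/xml uses its own `a>b` path syntax rather than xpath, and it rejects markup that is not well-formed XML (an unclosed `<br>`, for instance), which ordinary HTML often is.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Page declares where each field lives in the document via struct tags.
// encoding/xml uses its own "a>b" path syntax; unhtml uses xpath instead.
type Page struct {
	Title string   `xml:"head>title"`
	Links []string `xml:"body>a"` // one entry per <a> under <body>
}

// parsePage unmarshals a well-formed (X)HTML snippet into a Page.
func parsePage(doc string) (Page, error) {
	var p Page
	err := xml.Unmarshal([]byte(doc), &p)
	return p, err
}

func main() {
	const doc = `<html><head><title>Hello</title></head>` +
		`<body><a href="/one">one</a><a href="/two">two</a></body></html>`
	p, err := parsePage(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.Title, p.Links) // Hello [one two]
}
```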

The unmarshaller is directed with xpath. unhtml currently uses http://godoc.org/gopkg.in/xmlpath.v1 for its xpath support; see the xmlpath documentation for the supported xpath features.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Unmarshal

func Unmarshal(r io.Reader, result interface{}, rootpath string) error

Unmarshal parses the HTML read from r and extracts data from it, storing the results in the value pointed to by result.

rootpath is an xpath used to move the root node before unmarshalling; pass an empty string to keep the original root node.

Unmarshal can store values into the following types:

string, []byte, []rune
Signed and unsigned integers of any size
float32, float64
An Unmarshaler
An encoding.TextUnmarshaler
Structs (only fields with an `unhtml` tag are filled)
Slices and arrays of any of the above types
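A minimal sketch of the struct rule, assuming the value of each `unhtml` tag is the xpath that selects that field's node (the overview says xpath directs the unmarshaller, but the exact tag syntax is not spelled out here). The Commit type, its paths, and the taggedFields helper are hypothetical; the helper uses reflect only to list which fields the rule says are considered, and is not unhtml's implementation.

```go
package main

import (
	"fmt"
	"reflect"
)

// Commit is a hypothetical result type. The tag values are assumed to be
// xpath expressions; these particular paths are illustrative only.
type Commit struct {
	Title  string   `unhtml:"//p[@class='commit-title']"`
	Author string   `unhtml:"//a[@class='author']"`
	Tags   []string `unhtml:"//span[@class='tag']"`
	Note   string   // no `unhtml` tag, so the rule skips this field
}

// fieldPath pairs a struct field name with its `unhtml` tag value.
type fieldPath struct {
	Name, XPath string
}

// taggedFields lists, in declaration order, the fields of a struct (or a
// pointer to a struct) that carry an `unhtml` tag. Other kinds yield nil.
func taggedFields(v interface{}) []fieldPath {
	t := reflect.TypeOf(v)
	if t != nil && t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t == nil || t.Kind() != reflect.Struct {
		return nil
	}
	var out []fieldPath
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if tag, ok := f.Tag.Lookup("unhtml"); ok {
			out = append(out, fieldPath{f.Name, tag})
		}
	}
	return out
}

func main() {
	for _, fp := range taggedFields(&Commit{}) {
		fmt.Printf("%-6s <- %s\n", fp.Name, fp.XPath)
	}
}
```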

Types

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

func NewDecoder

func NewDecoder(r io.Reader) (*Decoder, error)

NewDecoder returns a new Decoder that uses the contents of r as HTML input. The reader is consumed in full and its contents are parsed before NewDecoder returns.

A non-nil error means the HTML could not be parsed.

func (*Decoder) Unmarshal

func (d *Decoder) Unmarshal(result interface{}) error

Unmarshal fills result with data extracted from the HTML previously given to the Decoder.

Unmarshal only accepts a struct as the result type; use UnmarshalRelative for other types.

func (*Decoder) UnmarshalRelative

func (d *Decoder) UnmarshalRelative(path string, res interface{}) error

UnmarshalRelative unmarshals starting from the node selected by path. This allows you to move the root node before unmarshalling.

UnmarshalRelative can return the following errors:

- any unhtml error
- errors from compiling the xmlpath path
- errors returned by an encoding.TextUnmarshaler
- errors returned by an unhtml.Unmarshaler

type InvalidUnmarshalError

type InvalidUnmarshalError struct {
	Type reflect.Type
}

InvalidUnmarshalError is returned when invalid input is given.

func (*InvalidUnmarshalError) Error

func (e *InvalidUnmarshalError) Error() string

type NoNodesAvailable

type NoNodesAvailable string

NoNodesAvailable is returned when the xpath given to a *Relative function does not match any nodes.

func (NoNodesAvailable) Error

func (e NoNodesAvailable) Error() string

type UnmarshalTypeError

type UnmarshalTypeError struct {
	Value string
	Type  reflect.Type
}

UnmarshalTypeError is returned when a value is incompatible with the type it is being unmarshalled into.

func (*UnmarshalTypeError) Error

func (e *UnmarshalTypeError) Error() string

type Unmarshaler

type Unmarshaler interface {
	UnmarshalHTML([]byte) error
}

Unmarshaler is an interface that a type can implement to receive the raw matched node and unmarshal it itself.

UnmarshalHTML receives a []byte containing all text nodes in the current node, concatenated together.
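A sketch of a custom Unmarshaler, assuming a hypothetical Price type that converts text such as "$1,299.00" into cents. Because UnmarshalHTML receives the concatenated text nodes, markup like `<span> $1,<b>299</b>.00 </span>` would arrive as " $1,299.00 ". The interface is redeclared locally from the documented signature so the example compiles without importing unhtml.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// Unmarshaler is redeclared here from the documented unhtml signature.
type Unmarshaler interface {
	UnmarshalHTML([]byte) error
}

// Price is a hypothetical amount in cents.
type Price int64

// Compile-time check that *Price satisfies the interface.
var _ Unmarshaler = (*Price)(nil)

// UnmarshalHTML parses text such as " $1,299.00 " (the concatenated text
// nodes of the matched element) into cents.
func (p *Price) UnmarshalHTML(b []byte) error {
	s := strings.TrimSpace(string(b))
	s = strings.TrimPrefix(s, "$")
	s = strings.ReplaceAll(s, ",", "")
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return fmt.Errorf("price %q: %v", string(b), err)
	}
	*p = Price(math.Round(f * 100)) // round to avoid float truncation
	return nil
}

func main() {
	// <span> $1,<b>299</b>.00 </span> concatenates to " $1,299.00 ".
	var p Price
	if err := p.UnmarshalHTML([]byte(" $1,299.00 ")); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(int64(p)) // 129900
}
```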

Directories

A small example of how to use unhtml to parse the GitHub commits page. This is for example purposes only; use the GitHub API for actual programmatic access to GitHub.
