xml

package
v2.3.4+incompatible
Warning

This package is not in the latest version of its module.

Published: Nov 7, 2018 License: MIT Imports: 4 Imported by: 70

README


This package is an XML lexer written in Go. It follows the specification at Extensible Markup Language (XML) 1.0 (Fifth Edition). The lexer takes an io.Reader and converts it into tokens until EOF.

Installation

Run the following command:

go get github.com/tdewolff/parse/xml

Or add the following import and run your project with go get:

import "github.com/tdewolff/parse/xml"

Lexer

Usage

The following initializes a new Lexer with io.Reader r:

l := xml.NewLexer(r)

To tokenize until EOF or an error, use:

for {
	tt, data := l.Next()
	switch tt {
	case xml.ErrorToken:
		// error or EOF set in l.Err()
		return
	case xml.StartTagToken:
		// ...
		for {
			ttAttr, dataAttr := l.Next()
			if ttAttr != xml.AttributeToken {
				// handle StartTagCloseToken/StartTagCloseVoidToken/StartTagClosePIToken
				break
			}
			// ...
		}
	case xml.EndTagToken:
		// ...
	}
}

All tokens:

ErrorToken TokenType = iota // extra token when errors occur
CommentToken
DOCTYPEToken
CDATAToken
StartTagToken
StartTagPIToken
StartTagCloseToken
StartTagCloseVoidToken
StartTagClosePIToken
EndTagToken
AttributeToken
TextToken

Examples

package main

import (
	"fmt"
	"io"
	"os"

	"github.com/tdewolff/parse/xml"
)

// Tokenize XML from stdin.
func main() {
	l := xml.NewLexer(os.Stdin)
	for {
		tt, data := l.Next()
		switch tt {
		case xml.ErrorToken:
			// ErrorToken is also returned at the end of input; report only real errors.
			if l.Err() != io.EOF {
				fmt.Println("Error:", l.Err())
			}
			return
		case xml.StartTagToken:
			fmt.Println("Tag", string(data))
			for {
				ttAttr, dataAttr := l.Next()
				if ttAttr != xml.AttributeToken {
					// StartTagCloseToken or StartTagCloseVoidToken ends the attribute list.
					break
				}

				key := dataAttr
				val := l.AttrVal()
				fmt.Println("Attribute", string(key), "=", string(val))
			}
		// ...
		}
	}
}

License

Released under the MIT license.

Documentation

Overview

Package xml is an XML 1.0 lexer following the specification at http://www.w3.org/TR/xml/.


Functions

func EscapeAttrVal

func EscapeAttrVal(buf *[]byte, b []byte) []byte

EscapeAttrVal returns the escaped attribute value bytes, without quotes.
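Per XML 1.0, a quoted attribute value (the AttValue production) may not contain a raw <, a raw &, or the quote character that delimits it. The following is a minimal sketch of that rule for double-quoted values, using a hypothetical helper escapeAttr built on strings.NewReplacer. It is not EscapeAttrVal itself, whose exact output (for example its choice of quote character or entities) may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// attrEscaper replaces the three characters XML 1.0 forbids inside a
// double-quoted attribute value. NewReplacer scans the input once, so an
// "&" produced by one replacement is never escaped a second time.
var attrEscaper = strings.NewReplacer(
	"&", "&amp;",
	"<", "&lt;",
	`"`, "&quot;",
)

// escapeAttr is hypothetical: it returns the escaped value without
// surrounding quotes, leaving the caller to wrap it in "...".
func escapeAttr(val string) string {
	return attrEscaper.Replace(val)
}

func main() {
	v := escapeAttr(`Tom & "Jerry" <3`)
	fmt.Println(`<user name="` + v + `"/>`)
	// <user name="Tom &amp; &quot;Jerry&quot; &lt;3"/>
}
```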

func EscapeCDATAVal

func EscapeCDATAVal(buf *[]byte, b []byte) ([]byte, bool)

EscapeCDATAVal returns the escaped text bytes.

Types

type Lexer

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer is the state for the lexer.

func NewLexer

func NewLexer(r io.Reader) *Lexer

NewLexer returns a new Lexer for a given io.Reader.

Example
l := NewLexer(bytes.NewBufferString("<span class='user'>John Doe</span>"))
out := ""
for {
	tt, data := l.Next()
	if tt == ErrorToken {
		break
	}
	out += string(data)
}
fmt.Println(out)
Output:

<span class='user'>John Doe</span>

func (*Lexer) AttrVal

func (l *Lexer) AttrVal() []byte

AttrVal returns the attribute value when an AttributeToken was returned from Next.

func (*Lexer) Err

func (l *Lexer) Err() error

Err returns the error encountered during lexing. This is often io.EOF, but other errors can also be returned.

func (*Lexer) Next

func (l *Lexer) Next() (TokenType, []byte)

Next returns the next token type and its data. It returns ErrorToken when an error was encountered; use Err() to retrieve the error.

func (*Lexer) Restore

func (l *Lexer) Restore()

Restore restores the NULL byte at the end of the buffer.
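A NULL byte at the end of a lexer's buffer is commonly used as a sentinel: the scanner can always read one more byte without a bounds check and treat 0 as end of input. The sketch below illustrates that general technique with a hypothetical scanName function. It is not this package's code, and it assumes the input itself contains no NULL bytes.

```go
package main

import "fmt"

// scanName reads a name starting at pos in buf, which must end with a 0
// sentinel byte. The loop stops on any non-name byte, and 0 is never a name
// byte, so it cannot run past the end of buf without an explicit length check.
func scanName(buf []byte, pos int) string {
	start := pos
	for isNameByte(buf[pos]) {
		pos++
	}
	return string(buf[start:pos])
}

// isNameByte is a simplified, ASCII-only approximation of XML name characters.
func isNameByte(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') ||
		c == '_' || c == '-' || c == '.' || c == ':'
}

func main() {
	buf := append([]byte("<span"), 0) // append the sentinel once, up front
	fmt.Println(scanName(buf, 1))     // span
}
```

If anything overwrites that final byte, the scanner loses its stop condition, which is presumably why a method to put it back is useful.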

func (*Lexer) Text

func (l *Lexer) Text() []byte

Text returns the textual representation of a token. This excludes delimiters and additional leading/trailing characters.

type TokenType

type TokenType uint32

TokenType determines the type of token, e.g. a start tag or an attribute.

const (
	ErrorToken TokenType = iota // extra token when errors occur
	CommentToken
	DOCTYPEToken
	CDATAToken
	StartTagToken
	StartTagPIToken
	StartTagCloseToken
	StartTagCloseVoidToken
	StartTagClosePIToken
	EndTagToken
	AttributeToken
	TextToken
)

TokenType values.

func (TokenType) String

func (tt TokenType) String() string

String returns the string representation of a TokenType.
