parse

package module v2.3.4+incompatible
Published: Nov 7, 2018 License: MIT Imports: 8 Imported by: 242

README

Parse

This package contains several lexers and parsers written in Go. All subpackages are built to be streaming and high-performance, and to follow the official (latest) specifications.

The lexers are implemented using buffer.Lexer in https://github.com/tdewolff/parse/buffer and the parsers work on top of the lexers. Some subpackages have hashes defined (using Hasher) that speed up common byte-slice comparisons.

Buffer

Reader

Reader is a wrapper around a []byte that implements the io.Reader interface. It is comparable to bytes.Reader but has slightly different semantics (and a slightly smaller memory footprint).

Writer

Writer is a buffer that implements the io.Writer interface and expands the buffer as needed. The reset functionality allows for better memory reuse. After calling Reset, it will overwrite the current buffer and thus reduce allocations.
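A minimal sketch of this behavior (illustrative only, not the package's code):

```go
package main

import "fmt"

// Writer is a growable byte buffer implementing io.Writer; Reset
// rewinds the write position so the backing array is reused on the
// next Write, reducing allocations (sketch of the described behavior).
type Writer struct {
	buf []byte
}

func (w *Writer) Write(p []byte) (int, error) {
	w.buf = append(w.buf, p...) // expands the buffer as needed
	return len(p), nil
}

// Reset keeps the allocated capacity but discards the content.
func (w *Writer) Reset() { w.buf = w.buf[:0] }

func (w *Writer) Bytes() []byte { return w.buf }

func main() {
	var w Writer
	fmt.Fprint(&w, "first")
	w.Reset()
	fmt.Fprint(&w, "second")
	fmt.Println(string(w.Bytes())) // second
}
```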

Lexer

Lexer is a read buffer specifically designed for building lexers. It keeps track of two positions: a start and end position. The start position is the beginning of the current token being parsed, the end position is being moved forward until a valid token is found. Calling Shift will collapse the positions to the end and return the parsed []byte.

The end position is moved with Move(int), which also accepts negative integers. Alternatively, save a position with Pos() int, try to parse a token, and on failure rewind with Rewind(int), passing the previously saved position.

Peek(int) byte peeks forward relative to the end position and returns the byte at that location. PeekRune(int) (rune, int) returns the UTF-8 rune at the given byte position and its length in bytes. Upon an error, Peek returns 0. The user must peek at every character and not skip any; otherwise a 0 may be skipped, leading to a panic on out-of-bounds indexing.

Lexeme() []byte will return the currently selected bytes, Skip() will collapse the selection. Shift() []byte is a combination of Lexeme() []byte and Skip().

When the passed io.Reader returns an error, Err() error will return that error, even when the end of the buffer has not been reached.
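The start/end mechanics above can be sketched as follows. This is a simplified illustration of the semantics, not the real buffer.Lexer:

```go
package main

import "fmt"

// Lexer keeps a start and an end position into its buffer; the token
// under construction is buf[start:end] (sketch of the described API).
type Lexer struct {
	buf        []byte
	start, end int
}

// Peek returns the byte i positions ahead of the end position, or 0
// when that would run past the buffer.
func (l *Lexer) Peek(i int) byte {
	if l.end+i >= len(l.buf) {
		return 0
	}
	return l.buf[l.end+i]
}

// Move advances (or, with a negative n, rewinds) the end position.
func (l *Lexer) Move(n int) { l.end += n }

// Shift returns the current token and collapses start onto end.
func (l *Lexer) Shift() []byte {
	b := l.buf[l.start:l.end]
	l.start = l.end
	return b
}

func main() {
	l := &Lexer{buf: []byte("let x")}
	for l.Peek(0) != ' ' && l.Peek(0) != 0 {
		l.Move(1)
	}
	fmt.Println(string(l.Shift())) // let
}
```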

StreamLexer

StreamLexer behaves like Lexer, but reads in chunks from an io.Reader using a buffer pool: old buffers that are still in use are retained, and others are reused. It holds an array of buffers so that everything stays in memory. Calling Free(n int) frees up n bytes from the internal buffer(s). ShiftLen() int returns the number of bytes that have been shifted since the previous call to ShiftLen, which can be used to specify how many bytes need to be freed. If you don't need to keep returned byte slices around, call Free(ShiftLen()) after every Shift call.

Strconv

This package contains string conversion functions much like the standard library's strconv package, but tailored to the performance needs of the minify package.

For example, the floating-point to string conversion function is approximately twice as fast as the standard library, but it is not as precise.

CSS

This package is a CSS3 lexer and parser. Both follow the specification at CSS Syntax Module Level 3. The lexer takes an io.Reader and converts it into tokens until the EOF. The parser returns a parse tree of the full io.Reader input stream, but the low-level Next function can be used for stream parsing, returning grammar units until the EOF.

See README here.

HTML

This package is an HTML5 lexer. It follows the specification at The HTML syntax. The lexer takes an io.Reader and converts it into tokens until the EOF.

See README here.

JS

This package is a JS lexer (ECMA-262, edition 6.0). It follows the specification at ECMAScript Language Specification. The lexer takes an io.Reader and converts it into tokens until the EOF.

See README here.

JSON

This package is a JSON parser (ECMA-404). It follows the specification at JSON. The parser takes an io.Reader and converts it into tokens until the EOF.

See README here.

SVG

This package contains common hashes for SVG1.1 tags and attributes.

XML

This package is an XML1.0 lexer. It follows the specification at Extensible Markup Language (XML) 1.0 (Fifth Edition). The lexer takes an io.Reader and converts it into tokens until the EOF.

See README here.

License

Released under the MIT license.

Documentation

Overview

Package parse contains a collection of parsers for various formats in its subpackages.

Index

Constants

This section is empty.

Variables

var ErrBadDataURI = errors.New("not a data URI")

ErrBadDataURI is returned by DataURI when the byte slice does not start with 'data:' or is too short.

Functions

func Copy

func Copy(src []byte) (dst []byte)

Copy returns a copy of the given byte slice.

func DataURI

func DataURI(dataURI []byte) ([]byte, []byte, error)

DataURI parses the given data URI and returns the mediatype, the data, and an error.
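A simplified sketch of the kind of parsing DataURI performs; the helper name is hypothetical and the real function handles more edge cases:

```go
package main

import (
	"bytes"
	"encoding/base64"
	"errors"
	"fmt"
)

var errBadDataURI = errors.New("not a data URI")

// dataURI splits "data:<mediatype>[;base64],<data>" into mediatype and
// (decoded) data. Illustrative sketch only.
func dataURI(b []byte) (mediatype, data []byte, err error) {
	if !bytes.HasPrefix(b, []byte("data:")) {
		return nil, nil, errBadDataURI
	}
	rest := b[len("data:"):]
	i := bytes.IndexByte(rest, ',')
	if i < 0 {
		return nil, nil, errBadDataURI
	}
	mediatype, data = rest[:i], rest[i+1:]
	if suffix := []byte(";base64"); bytes.HasSuffix(mediatype, suffix) {
		mediatype = mediatype[:len(mediatype)-len(suffix)]
		if data, err = base64.StdEncoding.DecodeString(string(data)); err != nil {
			return nil, nil, err
		}
	}
	return mediatype, data, nil
}

func main() {
	mt, data, err := dataURI([]byte("data:text/plain;base64,SGVsbG8="))
	fmt.Println(string(mt), string(data), err) // text/plain Hello <nil>
}
```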

func Dimension

func Dimension(b []byte) (int, int)

Dimension parses a byte-slice and returns the length of the number and its unit.
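The contract can be illustrated with a hypothetical, much-simplified reimplementation (the real function parses full numbers, including signs, fractions, and exponents, and more unit characters):

```go
package main

import "fmt"

// dimension counts leading digits as the number and the following
// ASCII letters as the unit, returning both lengths (sketch only).
func dimension(b []byte) (int, int) {
	n := 0
	for n < len(b) && b[n] >= '0' && b[n] <= '9' {
		n++
	}
	m := n
	for m < len(b) && (b[m]|0x20) >= 'a' && (b[m]|0x20) <= 'z' {
		m++
	}
	return n, m - n
}

func main() {
	num, unit := dimension([]byte("100px"))
	fmt.Println(num, unit) // 3 2
}
```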

func EqualFold

func EqualFold(s, targetLower []byte) bool

EqualFold returns true when s matches targetLower case-insensitively; targetLower must be entirely lowercase.
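A sketch of these semantics, folding only ASCII A-Z (illustrative, not the package's code):

```go
package main

import "fmt"

// equalFold compares s against an all-lowercase target by lowering
// each ASCII uppercase byte of s before comparing.
func equalFold(s, targetLower []byte) bool {
	if len(s) != len(targetLower) {
		return false
	}
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c >= 'A' && c <= 'Z' {
			c += 'a' - 'A'
		}
		if c != targetLower[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(equalFold([]byte("HtMl"), []byte("html"))) // true
}
```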

func IsAllWhitespace

func IsAllWhitespace(b []byte) bool

IsAllWhitespace returns true when the entire byte slice consists of space, \n, \r, \t, \f.

func IsNewline

func IsNewline(c byte) bool

IsNewline returns true for \n, \r.

func IsWhitespace

func IsWhitespace(c byte) bool

IsWhitespace returns true for space, \n, \r, \t, \f.

func Mediatype

func Mediatype(b []byte) ([]byte, map[string]string)

Mediatype parses a given mediatype and splits the mimetype from the parameters. It works similarly to mime.ParseMediaType but is faster.

func Number

func Number(b []byte) int

Number returns the number of bytes that parse as a number of the regex format (+|-)?([0-9]+(\.[0-9]+)?|\.[0-9]+)((e|E)(+|-)?[0-9]+)?.
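The documented format can be demonstrated with Go's regexp package; the length of an anchored match is what Number returns. This is illustrative only (the package presumably avoids regular expressions for performance):

```go
package main

import (
	"fmt"
	"regexp"
)

// numberRE anchors the documented number format at the start of the
// input: optional sign, integer and/or fraction, optional exponent.
var numberRE = regexp.MustCompile(`^[+-]?([0-9]+(\.[0-9]+)?|\.[0-9]+)([eE][+-]?[0-9]+)?`)

// number returns how many leading bytes of b parse as a number.
func number(b []byte) int {
	return len(numberRE.Find(b)) // Find returns nil (length 0) on no match
}

func main() {
	fmt.Println(number([]byte("5.2e-1px"))) // 6
}
```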

func Position

func Position(r io.Reader, offset int) (line, col int, context string, err error)

Position returns the line and column number for a certain position in a file. It is useful for recovering the position in a file that caused an error. It only treats \n, \r, and \r\n as newlines; this may differ from languages that also recognize \f, \u2028, and \u2029 as newlines.

func QuoteEntity

func QuoteEntity(b []byte) (quote byte, n int)

QuoteEntity parses the given byte slice and returns the quote that got matched (' or ") and its entity length.

func ReplaceMultipleWhitespace added in v1.1.0

func ReplaceMultipleWhitespace(b []byte) []byte

ReplaceMultipleWhitespace replaces any series of the characters space, \n, \t, \f, \r with a single space, or with a single newline when the series contained a \n or \r.
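An allocating sketch of the described behavior (the real function is more careful about allocation):

```go
package main

import "fmt"

func isWS(c byte) bool {
	return c == ' ' || c == '\n' || c == '\r' || c == '\t' || c == '\f'
}

// replaceMultipleWhitespace collapses each whitespace run to a single
// space, or a single newline when the run contained \n or \r.
func replaceMultipleWhitespace(b []byte) []byte {
	out := make([]byte, 0, len(b))
	for i := 0; i < len(b); {
		if !isWS(b[i]) {
			out = append(out, b[i])
			i++
			continue
		}
		sep := byte(' ')
		for i < len(b) && isWS(b[i]) {
			if b[i] == '\n' || b[i] == '\r' {
				sep = '\n'
			}
			i++
		}
		out = append(out, sep)
	}
	return out
}

func main() {
	fmt.Printf("%q\n", string(replaceMultipleWhitespace([]byte("a \t\n b")))) // "a\nb"
}
```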

func ToLower

func ToLower(src []byte) []byte

ToLower converts all characters in the byte slice from A-Z to a-z.

func TrimWhitespace

func TrimWhitespace(b []byte) []byte

TrimWhitespace removes any leading and trailing whitespace characters.

Types

type Error

type Error struct {
	Message string

	Offset int
	// contains filtered or unexported fields
}

Error is a parsing error returned by the parser. It contains a message and the offset at which the error occurred.

func NewError

func NewError(msg string, r io.Reader, offset int) *Error

NewError creates a new error.

func NewErrorLexer

func NewErrorLexer(msg string, l *buffer.Lexer) *Error

NewErrorLexer creates a new error from a *buffer.Lexer.

func (*Error) Error

func (e *Error) Error() string

Error returns the error string, containing the context and line + column number.

func (*Error) Position

func (e *Error) Position() (int, int, string)

Position re-parses the file to determine the line, column, and context of the error. Context is the entire line at which the error occurred.

Directories

Path	Synopsis
buffer	Package buffer contains buffer and wrapper types for byte slices.
css	Package css is a CSS3 lexer and parser following the specifications at http://www.w3.org/TR/css-syntax-3/.
html	Package html is an HTML5 lexer following the specifications at http://www.w3.org/TR/html5/syntax.html.
js	Package js is an ECMAScript5.1 lexer following the specifications at http://www.ecma-international.org/ecma-262/5.1/.
json	Package json is a JSON parser following the specifications at http://json.org/.
xml	Package xml is an XML1.0 lexer following the specifications at http://www.w3.org/TR/xml/.
