parse

package
v0.1.1
Warning

This package is not in the latest version of its module.
Published: Dec 20, 2022 License: MIT Imports: 7 Imported by: 0

Documentation

Constants

const (
	ErrCodeNone = ErrCode(iota)
	ErrCodeUnexpected
	ErrCodeExpected
	ErrCodeUnterminated
	ErrCodeIncomplete
	ErrCodeUnpaired
	ErrCodeInvalid
	ErrCodeOverflow

	ErrCodeUnmatched = ErrCode(-1)
)
const Unmatched = rune(0x7fffffff)

Variables

This section is empty.

Functions

func ExtractHex32n

func ExtractHex32n(src Source, max_chars int) (v uint32, overflow bool, n_chars int)

func ExtractHex64n

func ExtractHex64n(src Source, max_chars int) (v uint64, overflow bool, n_chars int)

func ExtractOct32n

func ExtractOct32n(src Source, max_chars int) (v uint32, overflow bool, n_chars int)

func Static

func Static(buf []byte, lc *LineCol) *static_impl

Static returns a Source implementation that reads its content from an in-memory buffer.

func Tokenize

func Tokenize[T Key](buf []byte, bindings []*Binding[T], on_token func(k T, c *Context, lc LineCol)) error

Types

type Binding

type Binding[K Key] struct {
	// contains filtered or unexported fields
}

func Bind

func Bind[K Key](key K, descr string, sequence ...any) *Binding[K]

type Context

type Context struct {
	strings.Builder
	Values []any
}

func (*Context) Reset

func (c *Context) Reset()

type ErrAtLineCol

type ErrAtLineCol struct {
	Err error
	Loc LineCol
}

func (*ErrAtLineCol) Error

func (e *ErrAtLineCol) Error() string

type ErrCode

type ErrCode int

func EOF

func EOF(src Source, ctx *Context) ErrCode

func EOL

func EOL(src Source, ctx *Context) ErrCode

func HexCodepoint_XXXX

func HexCodepoint_XXXX(src Source, ctx *Context) ErrCode

HexCodepoint_XXXX captures four hexadecimal digits and interprets them as a UTF-16 codeunit. This codeunit is then converted to a UTF-8 sequence and inserted into the captured string.

Returned values are:

  • `ErrCodeUnmatched` if src does not start with a hex digit
  • `ErrCodeIncomplete` if src contains fewer than 4 hex digits
  • `ErrCodeInvalid` if the 4 hex digits encode a surrogate
  • `ErrCodeNone` if src contains 4 hex digits that represent a valid codepoint

If src contains more than 4 hex digits, this function consumes only the first 4 of them.
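The return-value rules above can be illustrated with a standalone sketch. It does not use the package's Source or Context types; `decodeXXXX`, `hexVal`, and the string-based `status` type are hypothetical names introduced only for this example, and the package's actual implementation may differ.

```go
package main

import "fmt"

// status is a local stand-in for the ErrCode values listed above.
type status string

const (
	none       status = "ErrCodeNone"
	unmatched  status = "ErrCodeUnmatched"
	incomplete status = "ErrCodeIncomplete"
	invalid    status = "ErrCodeInvalid"
)

// hexVal returns the numeric value of a hex digit, or -1 if c is not one.
func hexVal(c byte) int {
	switch {
	case c >= '0' && c <= '9':
		return int(c - '0')
	case c >= 'a' && c <= 'f':
		return int(c-'a') + 10
	case c >= 'A' && c <= 'F':
		return int(c-'A') + 10
	}
	return -1
}

// decodeXXXX reads exactly four hex digits from the start of s and returns
// the UTF-8 encoding of the resulting value. Extra digits are left alone.
func decodeXXXX(s string) (string, status) {
	if len(s) == 0 || hexVal(s[0]) < 0 {
		return "", unmatched
	}
	var v rune
	for i := 0; i < 4; i++ {
		if i >= len(s) || hexVal(s[i]) < 0 {
			return "", incomplete
		}
		v = v<<4 | rune(hexVal(s[i]))
	}
	if v >= 0xD800 && v <= 0xDFFF {
		return "", invalid // surrogates are not valid codepoints on their own
	}
	return string(v), none // converting a rune to string yields its UTF-8 form
}

func main() {
	for _, in := range []string{"00e9", "20ACxyz", "d83d", "12", "zz"} {
		text, st := decodeXXXX(in)
		fmt.Printf("%q -> %q %s\n", in, text, st)
	}
}
```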

func HexCodepoint_XXXXXXXX

func HexCodepoint_XXXXXXXX(src Source, ctx *Context) ErrCode

HexCodepoint_XXXXXXXX captures 8 hexadecimal digits and interprets them as a UTF-32 codeunit. This codeunit is then converted to a UTF-8 sequence and inserted into the captured string. This function does not perform any validation, nor does it check for surrogates.

Returned values are:

  • `ErrCodeUnmatched` if src does not start with a hex digit
  • `ErrCodeIncomplete` if src contains fewer than 8 hex digits
  • `ErrCodeNone` if src contains 8 hex digits

If src contains more than 8 hex digits, this function consumes only the first 8 of them.

func HexCodeunit_XX

func HexCodeunit_XX(src Source, ctx *Context) ErrCode

HexCodeunit_XX reads two hexadecimal digits from src and inserts the corresponding numeric value into the captured string as a UTF-8 codeunit. The codeunit is inserted as-is, without any validation.

Returned values are:

  • `ErrCodeUnmatched` if src does not start with a hex digit
  • `ErrCodeIncomplete` if src contains only one hex digit
  • `ErrCodeNone` if src contains two hex digits

If src contains more than two hex digits, this function consumes only the first two of them.
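To show what "inserted as-is, without any validation" implies, here is a small standalone sketch. The helper names `decodeXX` and `hexVal` are hypothetical and operate on plain strings and byte slices rather than the package's Source and Context. The point is that two consecutive escapes can build a valid UTF-8 sequence, while a lone byte such as 0xFF leaves the captured string invalid.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// hexVal returns the numeric value of a hex digit, or -1 if c is not one.
func hexVal(c byte) int {
	switch {
	case c >= '0' && c <= '9':
		return int(c - '0')
	case c >= 'a' && c <= 'f':
		return int(c-'a') + 10
	case c >= 'A' && c <= 'F':
		return int(c-'A') + 10
	}
	return -1
}

// decodeXX reads two hex digits from the start of s and appends the raw byte
// to out. It reports false when fewer than two hex digits are available.
func decodeXX(s string, out []byte) ([]byte, bool) {
	if len(s) < 2 || hexVal(s[0]) < 0 || hexVal(s[1]) < 0 {
		return out, false
	}
	return append(out, byte(hexVal(s[0])<<4|hexVal(s[1]))), true
}

func main() {
	// Two escapes, C3 then A9, together form the UTF-8 encoding of U+00E9.
	good, _ := decodeXX("C3", nil)
	good, _ = decodeXX("A9", good)
	fmt.Println(string(good), utf8.Valid(good))

	// A lone FF is stored unchanged, so the buffer is no longer valid UTF-8.
	bad, _ := decodeXX("FF", nil)
	fmt.Println(utf8.Valid(bad))
}
```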

func HexCodeunit_Xn

func HexCodeunit_Xn(src Source, ctx *Context) ErrCode

HexCodeunit_Xn reads hexadecimal digits from src and inserts the corresponding numeric value into the captured string as a UTF-8 codeunit. The codeunit is inserted as-is, without any validation.

Returned values are:

  • `ErrCodeUnmatched` if src does not start with a hex digit
  • `ErrCodeInvalid` if the obtained value exceeds 255
  • `ErrCodeNone` if src contains a value in the [0..255] range

This function consumes all the hex digits, regardless of overflow.
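The "consume everything, then judge the value" behavior can be sketched as follows. This is an illustration under assumptions, not the package's code: `decodeXn`, `hexVal`, and the string-based `status` type are hypothetical, and the count of consumed bytes is returned only to make the behavior visible.

```go
package main

import "fmt"

// status is a local stand-in for the ErrCode values listed above.
type status string

const (
	none      status = "ErrCodeNone"
	unmatched status = "ErrCodeUnmatched"
	invalid   status = "ErrCodeInvalid"
)

// hexVal returns the numeric value of a hex digit, or -1 if c is not one.
func hexVal(c byte) int {
	switch {
	case c >= '0' && c <= '9':
		return int(c - '0')
	case c >= 'a' && c <= 'f':
		return int(c-'a') + 10
	case c >= 'A' && c <= 'F':
		return int(c-'A') + 10
	}
	return -1
}

// decodeXn consumes every leading hex digit of s. The value stops growing
// once it exceeds 0xFF so that long digit runs cannot wrap around.
func decodeXn(s string) (b byte, n int, st status) {
	v := 0
	for n < len(s) && hexVal(s[n]) >= 0 {
		if v <= 0xFF {
			v = v<<4 | hexVal(s[n])
		}
		n++
	}
	switch {
	case n == 0:
		return 0, 0, unmatched
	case v > 0xFF:
		return 0, n, invalid
	}
	return byte(v), n, none
}

func main() {
	for _, in := range []string{"7e rest", "00041", "1FFabc", "xyz"} {
		b, n, st := decodeXn(in)
		fmt.Printf("%q -> value=%#x consumed=%d %s\n", in, b, n, st)
	}
}
```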

func OctCodeunit_X3n

func OctCodeunit_X3n(src Source, ctx *Context) ErrCode

OctCodeunit_X3n reads 1 to 3 octal digits from src and inserts the corresponding numeric value into the captured string as a UTF-8 codeunit. The codeunit is inserted as-is, without any validation.

Returned values are:

  • `ErrCodeUnmatched` if src does not start with an octal digit
  • `ErrCodeInvalid` if the obtained value exceeds 255
  • `ErrCodeNone` if src contains a value in the [0..255] range
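Three octal digits can encode values up to 511, which is why an out-of-range result is possible. A minimal standalone sketch of the rules above, with the hypothetical helper `decodeOct` and a string-based `status` type standing in for ErrCode:

```go
package main

import "fmt"

// status is a local stand-in for the ErrCode values listed above.
type status string

const (
	none      status = "ErrCodeNone"
	unmatched status = "ErrCodeUnmatched"
	invalid   status = "ErrCodeInvalid"
)

// decodeOct consumes at most three leading octal digits of s.
func decodeOct(s string) (b byte, n int, st status) {
	v := 0
	for n < 3 && n < len(s) && s[n] >= '0' && s[n] <= '7' {
		v = v<<3 | int(s[n]-'0')
		n++
	}
	switch {
	case n == 0:
		return 0, 0, unmatched
	case v > 0xFF:
		return 0, n, invalid
	}
	return byte(v), n, none
}

func main() {
	for _, in := range []string{"101", "377", "400", "7x", "1234", "9"} {
		b, n, st := decodeOct(in)
		fmt.Printf("%q -> value=%d consumed=%d %s\n", in, b, n, st)
	}
}
```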

func (ErrCode) String

func (ec ErrCode) String() string

type ErrContent

type ErrContent struct {
	Code ErrCode
	What string
}

func Expected

func Expected(v string) *ErrContent

func Invalid

func Invalid(v string) *ErrContent

func Unexpected

func Unexpected(v string) *ErrContent

func Unpaired

func Unpaired(v string) *ErrContent

func Unterminated

func Unterminated(v string) *ErrContent

func (*ErrContent) Error

func (e *ErrContent) Error() string

type Key

type Key = any

type LineCol

type LineCol struct {
	LineIndex   int // 0-based
	ColumnIndex int // 0-based
}

func (*LineCol) String

func (lc *LineCol) String() string

type Location

type Location struct {
	Offset     int
	LineNumber int
	LineOffset int
}

func (*Location) ColumnNumber

func (l *Location) ColumnNumber() int

type Source

type Source interface {
	// Done indicates that there is no more content available in the input.
	Done() bool

	// Peek previews the codepoint without consuming it. Returns the Unmatched
	// sentinel if the source is at the end of input.
	Peek() rune

	// Hop consumes one codepoint if it matches c.
	Hop(c rune) bool

	// Leap consumes len(seq) bytes only if all the bytes match.
	Leap(seq string) bool

	// Fetch consumes and returns one codepoint if its value is matched by f.
	// Otherwise, it returns the Unmatched sentinel.
	Fetch(f func(rune) bool) rune

	// Skip consumes len(seq) bytes only if all the bytes match and the codepoint
	// that follows matches the term. This is similar to Leap followed by Fetch.
	Skip(seq string, term func(rune) bool) rune
}
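To make the contract of each method concrete, here is a hedged sketch of an in-memory type with the same method set. `memSource` is a hypothetical name and is not the package's Static implementation. One assumption is labeled in the code: this sketch's Skip also consumes the terminating codepoint and returns it, which is one reading of "similar to Leap followed by Fetch".

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// unmatched mirrors the documented Unmatched sentinel.
const unmatched = rune(0x7fffffff)

// memSource is a hypothetical string-backed reader for illustration only.
type memSource struct {
	buf string
	pos int
}

// next decodes the codepoint at the cursor without consuming it.
// It returns a width of 0 at the end of input.
func (s *memSource) next() (rune, int) {
	if s.pos >= len(s.buf) {
		return unmatched, 0
	}
	return utf8.DecodeRuneInString(s.buf[s.pos:])
}

func (s *memSource) Done() bool { return s.pos >= len(s.buf) }

func (s *memSource) Peek() rune { r, _ := s.next(); return r }

func (s *memSource) Hop(c rune) bool {
	r, n := s.next()
	if n == 0 || r != c {
		return false
	}
	s.pos += n
	return true
}

func (s *memSource) Leap(seq string) bool {
	if !strings.HasPrefix(s.buf[s.pos:], seq) {
		return false
	}
	s.pos += len(seq)
	return true
}

func (s *memSource) Fetch(f func(rune) bool) rune {
	r, n := s.next()
	if n == 0 || !f(r) {
		return unmatched
	}
	s.pos += n
	return r
}

// Skip is all-or-nothing. Assumption: the terminator is consumed as well.
func (s *memSource) Skip(seq string, term func(rune) bool) rune {
	if !strings.HasPrefix(s.buf[s.pos:], seq) {
		return unmatched
	}
	r, n := utf8.DecodeRuneInString(s.buf[s.pos+len(seq):])
	if n == 0 || !term(r) {
		return unmatched
	}
	s.pos += len(seq) + n
	return r
}

func main() {
	src := &memSource{buf: "let x"}
	isSpace := func(r rune) bool { return r == ' ' }
	fmt.Println(src.Skip("let", isSpace) == ' ') // keyword followed by a space
	fmt.Println(string(src.Peek()), src.Hop('x'), src.Done())
}
```

Requiring a terminator in Skip is what lets a tokenizer tell the keyword `let` apart from an identifier such as `letter`.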

type Term

type Term interface {
	TermFunc | rune | string | func(rune) bool
}

type TermFunc

type TermFunc = func(Source, *Context) ErrCode

func AnyOf

func AnyOf(args ...string) TermFunc

AnyOf matches and captures any of the provided literal sequences.

func Between

func Between(prefix, terminator any, content ...any) TermFunc

func Codepoint

func Codepoint(r rune) TermFunc

func CodepointFunc

func CodepointFunc(m func(rune) bool) TermFunc

func Escaped

func Escaped(prefix rune, escapers map[rune]any) TermFunc

Escaped creates a matcher for escape sequences (typically found inside string literals).

Supported escaper types (assuming prefix is `\` and the map key is `z`):

  • `struct{}`: self-mapping, `\z` -> `z`
  • `byte`: maps to a byte, `\z` -> byte codeunit
  • `rune`: maps to a rune, `\z` -> UTF-8 encoded codepoint
  • `string`: maps to a string, `\z` -> literal string sequence
  • `TermFunc`: `\z...` -> invokes the TermFunc to decode `...`

A key value in the supplied map may be specified as Unmatched.
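The dispatch on the escaper's dynamic type can be sketched with a plain type switch. This is an illustration under assumptions rather than the package's implementation: `unescape`, `decoder`, and `hexByte` are hypothetical names, `decoder` is a simplified stand-in for TermFunc that works on strings, and the handling of an Unmatched key is not modeled.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// decoder is a simplified stand-in for TermFunc: it decodes the text after
// the escape key and reports how many bytes it used.
type decoder func(rest string) (decoded string, used int)

// hexByte decodes two hex digits into one raw byte, or nothing on failure.
func hexByte(rest string) (string, int) {
	if len(rest) < 2 {
		return "", 0
	}
	v, err := strconv.ParseUint(rest[:2], 16, 8)
	if err != nil {
		return "", 0
	}
	return string([]byte{byte(v)}), 2
}

// unescape rewrites s, resolving each prefix+key pair through escapers.
func unescape(s string, prefix rune, escapers map[rune]any) (string, error) {
	var out strings.Builder
	for i := 0; i < len(s); {
		r, n := utf8.DecodeRuneInString(s[i:])
		i += n
		if r != prefix {
			out.WriteRune(r)
			continue
		}
		key, kn := utf8.DecodeRuneInString(s[i:])
		if kn == 0 {
			return "", errors.New("unterminated escape sequence")
		}
		i += kn
		switch e := escapers[key].(type) {
		case struct{}: // self-mapping
			out.WriteRune(key)
		case byte: // raw byte codeunit
			out.WriteByte(e)
		case rune: // UTF-8 encoded codepoint
			out.WriteRune(e)
		case string: // literal replacement
			out.WriteString(e)
		case decoder: // delegate decoding of what follows the key
			text, used := e(s[i:])
			out.WriteString(text)
			i += used
		default:
			return "", fmt.Errorf("invalid escape %c%c", prefix, key)
		}
	}
	return out.String(), nil
}

func main() {
	escapers := map[rune]any{
		'\\': struct{}{},
		'0':  byte(0),
		'n':  '\n',
		't':  "    ",
		'x':  decoder(hexByte),
	}
	got, err := unescape(`a\tb\x41\\`, '\\', escapers)
	fmt.Printf("%q %v\n", got, err)
}
```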

func FirstOf

func FirstOf(args ...any) TermFunc

func HexCodeunit_XXXX

func HexCodeunit_XXXX(first_prefix, second_prefix string) TermFunc

HexCodeunit_XXXX is specialized for escape sequences that may decode into a pair of UTF-16 surrogates, which in turn must be reassembled into a single codepoint. JSON is a good example.
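As an illustration of the reassembly step, the sketch below decodes a JSON-style sequence such as `\uD83D\uDE00` once the first prefix has been consumed. The helpers `parse4` and `decodePair` are hypothetical, operate on plain strings, and rely on the standard `unicode/utf16` package; how the package itself reports a missing or malformed second half is not documented here, so the boolean result is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf16"
)

// parse4 parses exactly four hex digits from the start of s.
func parse4(s string) (rune, bool) {
	if len(s) < 4 {
		return 0, false
	}
	v, err := strconv.ParseUint(s[:4], 16, 16)
	if err != nil {
		return 0, false
	}
	return rune(v), true
}

// decodePair decodes "XXXX" and, when that value is a surrogate, requires
// secondPrefix plus another "XXXX" and joins both halves into one codepoint.
// It returns the codepoint and the number of bytes used.
func decodePair(s, secondPrefix string) (rune, int, bool) {
	first, ok := parse4(s)
	if !ok {
		return 0, 0, false
	}
	if !utf16.IsSurrogate(first) {
		return first, 4, true
	}
	rest := s[4:]
	if !strings.HasPrefix(rest, secondPrefix) {
		return 0, 0, false
	}
	second, ok := parse4(rest[len(secondPrefix):])
	if !ok {
		return 0, 0, false
	}
	r := utf16.DecodeRune(first, second)
	if r == '\uFFFD' { // DecodeRune signals an invalid pair with U+FFFD
		return 0, 0, false
	}
	return r, 4 + len(secondPrefix) + 4, true
}

func main() {
	r, used, ok := decodePair(`D83D\uDE00 tail`, `\u`)
	fmt.Printf("%U %q used=%d ok=%v\n", r, string(r), used, ok)
}
```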

func HexN

func HexN[T unsigned](prefix string) TermFunc

func Literal

func Literal(s string) TermFunc

func OneOrMore

func OneOrMore[T Term](a T) TermFunc

func Optional

func Optional[T Term](a T) TermFunc

Optional matches zero or one: (a)?
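The "zero or one" semantics can be sketched with a simplified matcher type. Everything here is hypothetical: `matcher` works on strings and returns a byte count, whereas the package's TermFunc takes a Source and a Context. The key property is that an optional matcher never fails; it succeeds with zero bytes when its operand does not match.

```go
package main

import (
	"fmt"
	"strings"
)

// matcher reports how many leading bytes of s it matched and whether it matched.
type matcher func(s string) (int, bool)

// literal matches the exact string lit.
func literal(lit string) matcher {
	return func(s string) (int, bool) {
		if strings.HasPrefix(s, lit) {
			return len(lit), true
		}
		return 0, false
	}
}

// optional turns a into (a)?: it always succeeds, consuming nothing on a miss.
func optional(a matcher) matcher {
	return func(s string) (int, bool) {
		if n, ok := a(s); ok {
			return n, true
		}
		return 0, true
	}
}

// sequence matches every matcher in order and fails as a whole if any fails.
func sequence(ms ...matcher) matcher {
	return func(s string) (int, bool) {
		total := 0
		for _, m := range ms {
			n, ok := m(s[total:])
			if !ok {
				return 0, false
			}
			total += n
		}
		return total, true
	}
}

func main() {
	signed := sequence(optional(literal("-")), literal("42"))
	for _, in := range []string{"-42!", "42", "-x"} {
		n, ok := signed(in)
		fmt.Printf("%q -> matched=%d ok=%v\n", in, n, ok)
	}
}
```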

func Sequence

func Sequence(args ...any) TermFunc

func Skip

func Skip(content ...any) TermFunc

func Uint

func Uint[T unsigned | signed](prefix string, base uint, maxval T) TermFunc

Uint captures a numeric value from a sequence of one or more digits.
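Going by the parameter names, a bounded digit parser might look like the sketch below. The roles assumed for `prefix` (a literal that must precede the digits), `base`, and `maxval` (an inclusive upper bound) are guesses from the signature, not documented behavior, and `parseUint`, `digitVal`, and the error values are hypothetical. The overflow check is done before multiplying so the accumulator never wraps.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	errUnmatched = errors.New("prefix not found")
	errExpected  = errors.New("expected at least one digit")
	errOverflow  = errors.New("value exceeds maxval")
)

// digitVal maps 0-9, a-z, and A-Z to 0..35 and anything else to 255.
func digitVal(c byte) uint64 {
	switch {
	case c >= '0' && c <= '9':
		return uint64(c - '0')
	case c >= 'a' && c <= 'z':
		return uint64(c-'a') + 10
	case c >= 'A' && c <= 'Z':
		return uint64(c-'A') + 10
	}
	return 255
}

// parseUint reads prefix followed by one or more digits in the given base
// (2..36) and returns the value and the number of bytes consumed.
func parseUint(s, prefix string, base, maxval uint64) (uint64, int, error) {
	if !strings.HasPrefix(s, prefix) {
		return 0, 0, errUnmatched
	}
	n := len(prefix)
	start := n
	var v uint64
	for n < len(s) {
		d := digitVal(s[n])
		if d >= base {
			break
		}
		// v*base+d > maxval, rearranged so that nothing can overflow.
		if d > maxval || v > (maxval-d)/base {
			return 0, n, errOverflow
		}
		v = v*base + d
		n++
	}
	if n == start {
		return 0, n, errExpected
	}
	return v, n, nil
}

func main() {
	fmt.Println(parseUint("0x1Fz", "0x", 16, 255))
	fmt.Println(parseUint("255", "", 10, 255))
	fmt.Println(parseUint("256", "", 10, 255))
}
```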

func ZeroOrMore

func ZeroOrMore[T Term](a T) TermFunc
