parser

Version: v0.3.0
Published: Jun 2, 2021 License: Apache-2.0 Imports: 7 Imported by: 9

README

Parser Building Blocks

This project contains a lexerless parser: it performs tokenization and parsing in a single step. This lets you use one language grammar that expresses both the lexical (word-level) and the phrase-level structure of the language.
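To make the single-step idea concrete, here is a minimal standard-library sketch. It is not this package's implementation; the function name `sum` and the toy grammar `number ('+' number)*` are assumptions chosen for illustration. Digits (the lexical level) and the `+` structure (the phrase level) are recognised by the same loop, and no intermediate token list is built.

```go
package main

import "fmt"

// sum parses input of the form  number ('+' number)*  directly from runes.
// It is a hypothetical helper, not part of the parser package.
func sum(input string) (int, error) {
	runes := []rune(input)
	pos, total := 0, 0
	for {
		// Lexical level: read the digits of one number.
		start, n := pos, 0
		for pos < len(runes) && '0' <= runes[pos] && runes[pos] <= '9' {
			n = n*10 + int(runes[pos]-'0')
			pos++
		}
		if pos == start {
			return 0, fmt.Errorf("expected digit at offset %d", pos)
		}
		total += n

		// Phrase level: either the input ends or a '+' must follow.
		if pos == len(runes) {
			return total, nil
		}
		if runes[pos] != '+' {
			return 0, fmt.Errorf("expected '+' at offset %d", pos)
		}
		pos++
	}
}

func main() {
	fmt.Println(sum("12+30")) // 42 <nil>
	fmt.Println(sum("12+"))   // reports a missing digit
}
```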

Advantages
  • Non-regular lexical structures are easier to handle.
  • Context insensitive.
  • No token classification. This removes the need for language reserved words such as 'for' in Go.
  • Grammars are compositional (can be merged automatically).
  • Only one metalanguage is needed. *

* I chose to use PEGN as the metalanguage in the examples. This is not required; you can also use other metalanguages such as PEG or (A)BNF.

Disadvantages
  • More complicated.
  • Less efficient than a lexer-parser combination with regard to both time and memory.

Usage

The project has two parts.

Basic Parser
  1. The main parser package, which provides a basic parser that gives you more control.

Errors are ignored for simplicity!

p, _ := parser.New([]byte("raw data to parse"))
mark, err := p.Expect("two words")

The example above tries to parse the string "two words". If the parser succeeds in parsing the string, it returns a mark to the last parsed rune. Otherwise, it returns an error.

The implementer is therefore responsible for capturing any value if needed. The parser only tells you whether it succeeded in parsing the given value.

Supported Values
  • rune (int will get converted to runes for convenience).
  • string.
  • AnonymousClass (equal to func(p *Parser) (*Cursor, bool)).
  • All operators defined in the op sub-package.

Customizing

The parser expects UTF-8 encoded strings by default. Other decoders can be used by implementing the DecodeRune callback, as is done in the ELF example.

It is also possible to provide additional supported operators or converters.

AST Parser
  2. The ast package, which provides an interface to immediately construct a syntax tree.

Errors are ignored for simplicity!

p, _ := parser.New([]byte("raw data to parse"))
node, err := p.Expect(ast.Capture{
    TypeStrings: []string{"Digit"},
    Value: parser.CheckRuneFunc(func(r rune) bool {
        return '0' <= r && r <= '9'
    }),
})

This example tries to capture a single digit. If the parser succeeds in parsing a digit, it returns a node with the parsed digit as its value. Otherwise, it returns an error.

The AST parser is built on top of the basic parser. It is extended with a few more values.

Supported Values
  • All values supported in the basic parser.
  • ParseNode (equal to func(p *Parser) (*Node, error))
  • Capture (captures the value in a node)
  • LoopUp

For more information, check out the documentation. It contains examples and descriptions for all functionality.

Documentation

You can find the documentation here. Additional examples can be found here.

Contributing

Contributions are welcome. Feel free to create a PR/issue for new features or bug fixes.

License

Apache License found here.

Documentation

Overview

Example (Line_returns)
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	// Unix, Unix, Windows, Mac, Windows, Mac
	p, _ := parser.New([]byte("\n\n\r\n\r\r\n\r"))
	fmt.Println(p.Mark().Position())
	fmt.Println(p.Next().Mark().Position())
	fmt.Println(p.Next().Mark().Position())
	fmt.Println(p.Next().Mark().Position())
	fmt.Println(p.Next().Mark().Position())
	fmt.Println(p.Next().Mark().Position())
	fmt.Println(p.Next().Mark().Position())
	fmt.Println(p.Next().Mark().Position())
}
Output:

0 0
1 0
2 0
2 1
3 0
4 0
4 1
5 0
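The output above is consistent with a simple rule: moving past "\n", or past a "\r" that is not followed by "\n", starts a new row, while moving past anything else (including the "\r" of a "\r\n" pair) advances the column. The following standard-library sketch models that rule. It is inferred from the example output only, not taken from the package source, and the names `position` and `positions` are hypothetical.

```go
package main

import "fmt"

// position is a hypothetical stand-in for the (row, column) pair that
// Cursor.Position reports.
type position struct{ row, col int }

// positions returns the position of every rune in s under the rule
// described above. It models the documented output, nothing more.
func positions(s string) []position {
	runes := []rune(s)
	out := make([]position, 0, len(runes))
	var pos position
	for i, r := range runes {
		out = append(out, pos)
		next := rune(-1) // sentinel: no rune follows
		if i+1 < len(runes) {
			next = runes[i+1]
		}
		switch {
		case r == '\n', r == '\r' && next != '\n':
			pos = position{row: pos.row + 1} // new row, column 0
		default:
			pos.col++
		}
	}
	return out
}

func main() {
	// Unix, Unix, Windows, Mac, Windows, Mac
	for _, p := range positions("\n\n\r\n\r\r\n\r") {
		fmt.Println(p.row, p.col)
	}
}
```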

Constants

const EOD = 1<<31 - 1

EOD indicates the End Of (the) Data.

Functions

func ConvertAliases

func ConvertAliases(i interface{}) interface{}

ConvertAliases converts various default primitive types to aliases for type matching.

  • (int, rune)
  • ([]interface{}, op.And)

func Stringer added in v0.2.0

func Stringer(i interface{}) string

Types

type AnonymousClass

type AnonymousClass func(p *Parser) (*Cursor, bool)

AnonymousClass represents an anonymous Class.Check function.

The cursor should never be nil unless the check fails at the first rune. For example, "121".Check("123") should return a mark to the 2nd value.
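The partial-match contract can be sketched without the library. In the standard-library sketch below, the hypothetical helper `checkPrefix` returns the index of the last rune that matched, with -1 playing the role of a nil cursor, plus a flag that says whether the whole value matched.

```go
package main

import "fmt"

// checkPrefix compares input against want rune by rune. It returns the index
// of the last matching rune (-1 stands in for a nil cursor) and whether all
// of want was matched. It is an illustration, not the package's code.
func checkPrefix(input, want string) (last int, ok bool) {
	in, w := []rune(input), []rune(want)
	last = -1
	for i, r := range w {
		if i >= len(in) || in[i] != r {
			return last, false
		}
		last = i
	}
	return last, len(w) > 0
}

func main() {
	fmt.Println(checkPrefix("121", "123")) // 1 false: failed after the 2nd value
	fmt.Println(checkPrefix("121", "12"))  // 1 true
	fmt.Println(checkPrefix("121", "3"))   // -1 false: failed at the first rune
}
```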

Example (Error)
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	p, _ := parser.New([]byte("0"))
	lower := func(p *parser.Parser) (*parser.Cursor, bool) {
		r := p.Current()
		return p.Mark(), 'a' <= r && r <= 'z'
	}

	fmt.Println(lower(p))
}
Output:

U+0030: 0 false
Example (Rune)
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	p, _ := parser.New([]byte("data"))
	alpha := func(p *parser.Parser) (*parser.Cursor, bool) {
		r := p.Current()
		return p.Mark(),
			'A' <= r && r <= 'Z' ||
				'a' <= r && r <= 'z'
	}

	fmt.Println(alpha(p))
}
Output:

U+0064: d true
Example (String)
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	p, _ := parser.New([]byte(":="))
	walrus := func(p *parser.Parser) (*parser.Cursor, bool) {
		var last *parser.Cursor
		for _, r := range []rune(":=") {
			if p.Current() != r {
				return nil, false
			}
			last = p.Mark()
			p.Next()
		}
		return last, true
	}

	fmt.Println(walrus(p))
}
Output:

U+003D: = true

func CheckInteger added in v0.2.1

func CheckInteger(i int, leadingZeros bool) AnonymousClass

CheckInteger returns an AnonymousClass that checks whether the following runes are equal to the given integer. It also consumes leading zeros when indicated to do so.

Example
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	p, _ := parser.New([]byte("-0001 something else"))
	fmt.Println(p.Check(parser.CheckInteger(-1, false)))
	fmt.Println(p.Check(parser.CheckInteger(-1, true)))
}
Output:

<nil> false
U+0031: 1 true

func CheckIntegerRange added in v0.2.1

func CheckIntegerRange(min, max uint, leadingZeros bool) AnonymousClass

CheckIntegerRange returns an AnonymousClass that checks whether the following runes are inside the given range (inclusive). It also consumes leading zeros when indicated to do so.

Note that this check consumes all the sequential digits it possibly can. For example, "12543" is not in the range (0, 12345), even though the prefix "1254" is.
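The greedy behaviour can be illustrated with a small standard-library sketch. The helper `inRange` is hypothetical and only models the description above: it first consumes every leading digit and only then compares the number against the bounds. Rejecting a zero-padded number when leadingZeros is false is an assumption based on the example output below.

```go
package main

import (
	"fmt"
	"strconv"
)

// inRange consumes every leading ASCII digit of input, then checks the
// resulting number against [lo, hi]. It returns the number of consumed
// bytes and whether the number is in range.
func inRange(input string, lo, hi uint64, leadingZeros bool) (int, bool) {
	n := 0
	for n < len(input) && '0' <= input[n] && input[n] <= '9' {
		n++
	}
	if n == 0 {
		return 0, false
	}
	digits := input[:n]
	if !leadingZeros && n > 1 && digits[0] == '0' {
		return 0, false // assumption: zero-padded numbers are rejected
	}
	v, err := strconv.ParseUint(digits, 10, 64)
	if err != nil || v < lo || v > hi {
		return 0, false
	}
	return n, true
}

func main() {
	fmt.Println(inRange("12543", 0, 12345, false)) // 0 false
	fmt.Println(inRange("1254", 0, 12345, false))  // 4 true
	fmt.Println(inRange("00012", 12, 12345, true)) // 5 true
}
```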

Example
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	p, _ := parser.New([]byte("12445"))
	fmt.Println(p.Check(parser.CheckIntegerRange(12, 12345, false)))
	fmt.Println(p.Check(parser.CheckIntegerRange(10000, 54321, false)))

	p0, _ := parser.New([]byte("00012"))
	fmt.Println(p0.Check(parser.CheckIntegerRange(12, 12345, false)))
	fmt.Println(p0.Check(parser.CheckIntegerRange(12, 12345, true)))
}
Output:

<nil> false
U+0035: 5 true
<nil> false
U+0032: 2 true

func CheckRune

func CheckRune(expected rune) AnonymousClass

CheckRune returns an AnonymousClass that checks whether the current rune of the parser matches the given rune. The same result can be achieved by using p.Expect(r), where 'p' is a reference to the parser and 'r' is a rune value.

func CheckRuneCI added in v0.2.2

func CheckRuneCI(expected rune) AnonymousClass

CheckRuneCI returns an AnonymousClass that checks whether the current (lower cased) rune of the parser matches the given (lower cased) rune. The given rune does not need to be lower case.

Example
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	p, _ := parser.New([]byte("Ee"))
	fmt.Println(p.Expect(parser.CheckRune('E')))
	fmt.Println(p.Expect(parser.CheckRuneCI('e')))
}
Output:

U+0045: E <nil>
U+0065: e <nil>

func CheckRuneFunc

func CheckRuneFunc(f func(r rune) bool) AnonymousClass

CheckRuneFunc returns an AnonymousClass that checks whether the current rune of the parser matches the given validator.

func CheckRuneRange added in v0.1.1

func CheckRuneRange(min, max rune) AnonymousClass

CheckRuneRange returns an AnonymousClass that checks whether the current rune of the parser is inside the given range (inclusive).

func CheckString

func CheckString(s string) AnonymousClass

CheckString returns an AnonymousClass that checks whether the current sequence of runes of the parser matches the given string. The same result can be achieved by using p.Expect(s), where 'p' is a reference to the parser and 's' is a string value.

func CheckStringCI added in v0.2.2

func CheckStringCI(s string) AnonymousClass

CheckStringCI returns an AnonymousClass that checks whether the current (lower cased) sequence of runes of the parser matches the given (lower cased) string. The given string does not need to be lower case.

Example
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	p, _ := parser.New([]byte("Ee"))
	fmt.Println(p.Expect(parser.CheckStringCI("ee")))
}
Output:

U+0065: e <nil>

type Class

type Class interface {
	// Check should return the last p.Mark() that matches the class. It should
	// also return whether it was able to check the whole class.
	//
	// e.g. if the class is defined as follows: '<=' / '=>'. Then a parser that
	// only contains '=' will not match this class and return 'nil, false'.
	Check(p *Parser) (*Cursor, bool)
}

Class provides an interface for checking classes.

type Cursor

type Cursor struct {
	// Rune is the value that the cursor points at.
	Rune rune
	// contains filtered or unexported fields
}

Cursor allows you to record your current position so you can return to it later. It keeps track of its own position in the buffer of the parser.

func (*Cursor) Position added in v0.2.0

func (c *Cursor) Position() (int, int)

Position returns the row and column of the cursor's location.

func (*Cursor) String

func (c *Cursor) String() string

type ExpectError

type ExpectError struct {
	Message string
}

ExpectError is an error that occurs when an invalid/unsupported value is passed to the Parser.Expect function.

func (*ExpectError) Error

func (e *ExpectError) Error() string

type ExpectedParseError

type ExpectedParseError struct {
	// The value that was expected.
	Expected interface{}
	// The value it actually got.
	String string
	// The position of the conflicting value.
	Conflict Cursor
}

ExpectedParseError indicates that the parser Expected a different value than the Actual value present in the buffer.

func (*ExpectedParseError) Error

func (e *ExpectedParseError) Error() string

type InitError

type InitError struct {
	// The error message. This should provide an intuitive message or advice on
	// how to solve this error.
	Message string
}

InitError is an error that occurs when instantiating new structures.

func (*InitError) Error

func (e *InitError) Error() string

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser represents a general purpose parser.

Example (Look_back_and_peek)
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	p, _ := parser.New([]byte("①23"))
	p.Next() // Point at '2'.

	fmt.Println(p.LookBack())
	fmt.Println(p.Peek())
}
Output:

U+2460: ①
U+0033: 3

func New

func New(input []byte) (*Parser, error)

New creates a new Parser.

func (*Parser) Check added in v0.1.1

func (p *Parser) Check(i interface{}) (*Cursor, bool)

Check works the same as Parser.Expect, but it returns a bool instead of an error.

Example (Class)
package main

import (
	"fmt"
	"github.com/di-wu/parser"
	"github.com/di-wu/parser/op"
)

func main() {
	p, _ := parser.New([]byte("Aa1_"))
	alphaNum := op.Or{
		parser.CheckRuneRange('a', 'z'),
		parser.CheckRuneRange('A', 'Z'),
		parser.CheckRuneRange('0', '9'),
	}
	fmt.Println(p.Check(alphaNum))
	fmt.Println(p.Check(alphaNum))
	fmt.Println(p.Check(alphaNum))

	fmt.Println(p.Check(alphaNum))
	fmt.Println(p.Check('_'))
}
Output:

U+0041: A true
U+0061: a true
U+0031: 1 true
<nil> false
U+005F: _ true

func (*Parser) Current

func (p *Parser) Current() rune

Current returns the value at which the cursor is pointing.

Example
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	p, _ := parser.New([]byte("some data"))

	current := p.Current()
	fmt.Printf("%U: %c", current, current)
}
Output:

U+0073: s

func (*Parser) DecodeRune added in v0.1.4

func (p *Parser) DecodeRune(d func(p []byte) (rune, int))

DecodeRune allows you to redefine the way runes are decoded from the byte stream. By default utf8.DecodeRune is used.

func (*Parser) Done

func (p *Parser) Done() bool

Done checks whether the parser is done parsing.

Example
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	p, _ := parser.New([]byte("_"))

	fmt.Println(p.Next().Done())
}
Output:

true

func (*Parser) Expect

func (p *Parser) Expect(i interface{}) (*Cursor, error)

Expect checks whether the buffer contains the given value. It consumes the corresponding runes and returns a mark to the last rune of the consumed value. It returns an error if it cannot find a match with the given value.

It currently supports:

  • rune & string
  • func(p *Parser) (*Cursor, bool) (== AnonymousClass)
  • []interface{} (== op.And)
  • operators: op.Not, op.And, op.Or & op.XOr
Example (Class)
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	p, _ := parser.New([]byte("1 <= 2"))
	digit := func(p *parser.Parser) (*parser.Cursor, bool) {
		r := p.Current()
		return p.Mark(), '0' <= r && r <= '9'
	}
	lt := func(p *parser.Parser) (*parser.Cursor, bool) {
		var last *parser.Cursor
		for _, r := range []rune("<=") {
			if p.Current() != r {
				return nil, false
			}
			last = p.Mark()
			p.Next()
		}
		return last, true
	}

	mark, _ := p.Expect(digit)
	fmt.Printf("%U: %c\n", mark.Rune, mark.Rune)

	p.Next() // Skip space.

	mark, _ = p.Expect(lt)
	fmt.Printf("%U: %c\n", mark.Rune, mark.Rune)

	p.Next() // Skip space.

	mark, _ = p.Expect(digit)
	fmt.Printf("%U: %c\n", mark.Rune, mark.Rune)
}
Output:

U+0031: 1
U+003D: =
U+0032: 2
Example (Rune)
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	p, _ := parser.New([]byte("data"))

	mark, _ := p.Expect('d')
	fmt.Printf("%U: %c\n", mark.Rune, mark.Rune)

	_, err := p.Expect('d')
	fmt.Println(err)

	mark, _ = p.Expect('a')
	fmt.Printf("%U: %c\n", mark.Rune, mark.Rune)
	mark, _ = p.Expect('t')
	fmt.Printf("%U: %c\n", mark.Rune, mark.Rune)
	current := p.Current()
	fmt.Printf("%U: %c\n", current, current)

	fmt.Println(p.Next().Done())
}
Output:

U+0064: d
parse conflict [00:001]: expected int32 'd' but got 'a'
U+0061: a
U+0074: t
U+0061: a
true
Example (String)
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	p, _ := parser.New([]byte("some data"))

	mark, _ := p.Expect("some")
	fmt.Printf("%U: %c\n", mark.Rune, mark.Rune)

	p.Next() // Skip space.

	mark, _ = p.Expect("data")
	fmt.Printf("%U: %c\n", mark.Rune, mark.Rune)
}
Output:

U+0065: e
U+0061: a

func (*Parser) ExpectedParseError added in v0.2.0

func (p *Parser) ExpectedParseError(expected interface{}, start, end *Cursor) *ExpectedParseError

ExpectedParseError creates an ExpectedParseError error based on the given start and end cursor. It resets the parser to the start cursor.

func (*Parser) Jump

func (p *Parser) Jump(mark *Cursor) *Parser

Jump goes to the position of the given mark.

func (*Parser) LookBack

func (p *Parser) LookBack() *Cursor

LookBack returns the previous cursor without moving the parser backwards.

func (*Parser) Mark

func (p *Parser) Mark() *Cursor

Mark returns a copy of the current cursor.
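Mark and Jump together give you backtracking: record a position, try to match, and return to the recorded position on failure. The standard-library sketch below models that pattern with a hypothetical `scanner` type whose mark is a plain integer offset. It does not use the package's Cursor type.

```go
package main

import "fmt"

// scanner is a hypothetical, minimal stand-in for a parser with a position.
type scanner struct {
	runes []rune
	pos   int
}

// mark returns a copy of the current position, so later movement of the
// scanner does not change a mark that was taken earlier.
func (s *scanner) mark() int { return s.pos }

// jump moves the scanner back (or forward) to a recorded position.
func (s *scanner) jump(mark int) { s.pos = mark }

// expect consumes want, or leaves the scanner where it started.
func (s *scanner) expect(want string) bool {
	start := s.mark()
	for _, r := range want {
		if s.pos >= len(s.runes) || s.runes[s.pos] != r {
			s.jump(start) // backtrack: undo the partial match
			return false
		}
		s.pos++
	}
	return true
}

func main() {
	s := &scanner{runes: []rune("=>")}

	ok := s.expect("==") // matches '=' and then fails, so it jumps back
	fmt.Println(ok, s.pos)

	ok = s.expect("=>")
	fmt.Println(ok, s.pos)
}
```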

func (*Parser) Next

func (p *Parser) Next() *Parser

Next advances the parser by one rune.

Example
package main

import (
	"fmt"
	"github.com/di-wu/parser"
)

func main() {
	p, _ := parser.New([]byte("some data"))

	current := p.Next().Current()
	fmt.Printf("%U: %c", current, current)
}
Output:

U+006F: o

func (*Parser) Peek

func (p *Parser) Peek() *Cursor

Peek returns the next cursor without advancing the parser.

func (*Parser) SetConverter

func (p *Parser) SetConverter(c func(i interface{}) interface{})

SetConverter allows you to add additional (prioritized) converters to the parser. e.g. convert aliases to other types or overwrite defaults.
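A converter is simply a function from one value to another. The standard-library sketch below shows one that rewrites a hypothetical user-defined `Keyword` type to a plain string and passes everything else through unchanged. It has the `func(i interface{}) interface{}` shape shown above, so it could presumably be registered with `p.SetConverter(convert)`; how the parser chains it with its defaults is not covered here.

```go
package main

import "fmt"

// Keyword is a hypothetical user-defined string type.
type Keyword string

// convert rewrites Keyword values to plain strings and returns every other
// value unchanged, so the remaining converters can still handle it.
func convert(i interface{}) interface{} {
	if k, ok := i.(Keyword); ok {
		return string(k)
	}
	return i
}

func main() {
	fmt.Printf("%T\n", convert(Keyword("for"))) // string
	fmt.Printf("%T\n", convert('x'))            // int32 (rune is untouched)
}
```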

func (*Parser) SetOperator

func (p *Parser) SetOperator(o func(i interface{}) (*Cursor, error))

SetOperator allows you to support additional (prioritized) operators. The operator should return an UnsupportedType error if the given value is not supported.

func (*Parser) Slice

func (p *Parser) Slice(start *Cursor, end *Cursor) string

Slice returns the value in between the two given cursors [start:end]. The end value is inclusive!
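The inclusive end pairs naturally with marks that point at the last rune of a match: slicing from the first mark to the last mark yields the whole match. The standard-library sketch below shows the same convention on a rune slice. The helper `sliceInclusive` is hypothetical and uses integer indexes in place of cursors.

```go
package main

import "fmt"

// sliceInclusive returns the runes from start up to and including end.
// Go's own slice expression excludes the upper bound, hence the end+1.
func sliceInclusive(runes []rune, start, end int) string {
	return string(runes[start : end+1])
}

func main() {
	data := []rune("some data")
	fmt.Println(sliceInclusive(data, 0, 3)) // some
	fmt.Println(sliceInclusive(data, 5, 8)) // data
	fmt.Println(sliceInclusive(data, 4, 4)) // a single space
}
```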

type UnsupportedType

type UnsupportedType struct {
	Value interface{}
}

UnsupportedType indicates the type of the value is unsupported.

func (*UnsupportedType) Error

func (e *UnsupportedType) Error() string

Directories

Path Synopsis
examples
elf
