rd

package module
v0.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 19, 2018 License: MIT Imports: 3 Imported by: 5

README

rd godoc

rd is a small library to build hand-written recursive descent parsers. Besides exposing convenient methods to parse tokens it features automatic parse tree generation and flow tracing for debugging.

Recursive descent parsers can imitate their grammar quite well. For instance for the following grammar:

A → aB
B → b

(where A and B are non-terminals, and a and b are terminals), a recursive descent parser might look like:

func A() bool {
    if nextToken() == "a" {
        return B()
    }
    pushback("a")
    return false
}

func B() bool {
    if nextToken() == "b" {
        return true
    }
    pushback("b")
    return false
}

Parser for the same grammar written using rd:

import "github.com/shivamMg/rd"

func A(b *rd.Builder) (ok bool) {
    b.Enter("A")
    defer b.Exit(&ok)

    return b.Match("a") && B(b)
}

func B(b *rd.Builder) (ok bool) {
    defer b.Enter("B").Exit(&ok)

    return b.Match("b")
}

A builder object keeps track of the current token and exposes convenient methods to write a parser. Match, for instance, resets the original state in case of an unsuccessful match so there's no need for a manual pushback. As non-terminal functions are called, terminal matches, and calls to other non-terminal functions are done. A parse tree is maintained in the builder using these matches and calls:

In case of a successful match the terminal is added to parse tree under the current non-terminal (the one in which Match was called). Same goes in case of a non-terminal function call: the non-terminal, if it exits successfully, is added to parse tree under the current non-terminal. You can imagine this process being repeated recursively.

Argument to Enter is what is added to parse tree as a symbol for the non-terminal. Argument to Exit determines if function call was successful or not.

ParseTree method returns the parse tree which is, well, a tree data structure. It can be pretty-printed.

tokens := []rd.Token{"a", "b"}
b := rd.NewBuilder(tokens)
if ok := A(b); ok {
    fmt.Print(b.ParseTree())
}

The above snippet will print:

A
├─ a
└─ B
   └─ b

A debug tree is also maintained which, unlike the parse tree, contains all matches and calls (not just the successful ones). It's helpful if you want to debug a parsing failure.

fmt.Print(b.DebugTree())

The above snippet will print:

A(true)
├─ a = a
└─ B(true)
   └─ b = b

Parsing errors can be retrieved using Err method. For tokens := []rd.Token{"a", "c"} the following statements:

fmt.Println(b.Err())
fmt.Print(b.DebugTree())

will print:

parsing error
A(false)
├─ a = a
└─ B(false)
   └─ c ≠ b

Examples

Arithmetic expression parser
go get github.com/shivamMg/rd/examples/arithmetic   # requires go modules support (go1.11+)
arithmetic -expr='3.14*4*(6/3)'  # hopefully $GOPATH/bin is in $PATH
arithmetic -expr='3.14*4*(6/3)' -backtrackingparser

Parser and grammar for it can be found inside examples/arithmetic/parser. There's another parser written for a different grammar that also parses arithmetic expressions. This parser can be found inside examples/arithmetic/backtrackingparser. It uses backtracking - notice the use of b.Backtrack(). This example uses chroma for lexing.

PL/0 programming language parser
go get github.com/shivamMg/rd/examples/pl0
cd examples/pl0/
pl0 square.pl0
pl0 multiply.pl0
pl0 prime.pl0

Parser and grammar can be found inside examples/pl0/parser. Grammar has been taken from en.wikipedia.org/wiki/PL/0#Grammar. It also uses chroma for lexing.

Domain name parser
go get github.com/shivamMg/rd/examples/domainname
domainname www.google.co.uk

Grammar has been taken from www.ietf.org/rfc/rfc1035.txt. Its lexer is hand-written.

Licence

MIT

Contribute

Contribute through bug fixes, improvements, new examples, etc. Lucky PR submitters get to walk home with a brand new CRT and an Audi 1987.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Builder

type Builder struct {
	// contains filtered or unexported fields
}

Builder stores details about tokens, index to current token, etc. and provides methods to build recursive descent parsers conveniently. It keeps a track of entry/exit from non-terminal functions, and terminal matches done inside them. Results from non-terminal function calls help create the parse tree. A debug tree is also created to help trace flow across non-terminal functions.

func NewBuilder

func NewBuilder(tokens []Token) *Builder

NewBuilder returns a new Builder for the tokens.

func (*Builder) Add

func (b *Builder) Add(token Token)

Add adds token as a symbol in the parse tree. It's added under the current non-terminal subtree.

func (*Builder) Backtrack

func (b *Builder) Backtrack()

Backtrack resets the current index for the non-terminal function it's called inside, and sets it to the value it was before entering this function. It also discards any matches done inside the function.

func (*Builder) Check

func (b *Builder) Check(token Token, i int) bool

Check is a convenience function over Peek. It calls Peek to check if returned token is same as token, and returned ok is true.

func (*Builder) CheckOrNotOK

func (b *Builder) CheckOrNotOK(token Token, i int) bool

CheckOrNotOK is a convenience function over Peek. It calls Peek to check if returned token is same as token, or returned ok is false.

func (*Builder) DebugTree

func (b *Builder) DebugTree() *DebugTree

DebugTree returns the debug tree which includes all matches and non-matches, and non-terminal results (displayed in parentheses) captured throughout parsing. It helps in tracing the parsing flow. It's set after the root non-terminal exits. Returns nil otherwise.

func (*Builder) Enter

func (b *Builder) Enter(nonTerm interface{}) *Builder

Enter adds non-terminal to the parse tree making it the current non-terminal. Subsequent terminal matches and calls to non-terminal functions add symbols under this non-terminal.

Enter should be called right after entering the non-terminal function.

func (*Builder) Err

func (b *Builder) Err() *ParsingError

Err returns the parsing error. It's set after the root non-terminal exits with a false result. Returns nil otherwise.

func (*Builder) Exit

func (b *Builder) Exit(result *bool)

Exit registers exit from a non-terminal function. result indicates if it had a successful production or not. result must not be nil. In case of a false result or a call to Skip, the current index is reset to where it was before entering the non-terminal. In case of a true result:

  1. Parse tree is set (see ParseTree) if the current non-terminal was root.
  2. Else it's added as a subtree to its parent non-terminal.

The convenient way to call Exit is by using a named boolean return for the non-terminal function, and passing it's address to a deferred Exit.

func (*Builder) Match

func (b *Builder) Match(token Token) (ok bool)

Match matches the next token to token. In case of a non-match the current index is decremented to its original value. ok is false in case of non-match or if no tokens are left, else true for a match.

Internally Match calls Next to grab the next token. In case of a match it adds it by calling Add. Debug info is also added to the debug tree.

func (*Builder) Next

func (b *Builder) Next() (token Token, ok bool)

Next increments the current index to return the next token. ok is false if no tokens are left, else true.

func (*Builder) ParseTree

func (b *Builder) ParseTree() *Tree

ParseTree returns the parse tree. It's set after the root non-terminal exits with true result. Returns nil otherwise.

func (*Builder) Peek

func (b *Builder) Peek(i int) (token Token, ok bool)

Peek returns the ith token without updating the current index. i must be relative to the current index.

ex. if current index points to tkn3:

tokens:           tkn1 tkn2 tkn3 tkn4 tkn5
original indexes:  0    1    2    3    4
relative indexes: -2   -1    0    1    2

you can use:

Peek(-2) to get tkn1,
Peek(-1) to get tkn2,
Peek(1) to get tkn4,
Peek(2) to get tkn5.

ok is false if i lies outside original index range, else true.

func (*Builder) Skip

func (b *Builder) Skip()

Skip removes the current non-terminal from the parse tree regardless of the exit result. It's helpful in case of null productions - where non-terminals don't contribute to the parse tree.

type DebugTree

type DebugTree struct {
	// contains filtered or unexported fields
}

DebugTree is a debug tree node. Can be printed to help tracing the parsing flow.

func (*DebugTree) Children

func (dt *DebugTree) Children() (c []tree.Node)

func (*DebugTree) Data

func (dt *DebugTree) Data() interface{}

func (*DebugTree) String

func (dt *DebugTree) String() string

type ParsingError

type ParsingError struct {
	// contains filtered or unexported fields
}

ParsingError is error returned by Builder's Err method in case an error occurs during parsing.

func (*ParsingError) Error

func (e *ParsingError) Error() string

type Token

type Token interface{}

Token represents a token received after tokenization.

type Tree

type Tree struct {
	Symbol   interface{}
	Subtrees []*Tree
}

Tree is a parse tree node. Symbol can either be a terminal (Token) or a non-terminal (see Builder's Enter method). Tokens matched using Builder's Match method or added using Builder's Add method, can be retrieved by type asserting Symbol. Subtrees are child nodes of the current node.

func NewTree

func NewTree(symbol interface{}, subtrees ...*Tree) *Tree

func (*Tree) Add

func (t *Tree) Add(subtree *Tree)

Add adds a subtree as a child to t.

func (*Tree) Children

func (t *Tree) Children() (c []tree.Node)

func (*Tree) Data

func (t *Tree) Data() interface{}

func (*Tree) Detach

func (t *Tree) Detach(subtree *Tree)

Detach removes a subtree as a child of t.

func (*Tree) String

func (t *Tree) String() string

Directories

Path Synopsis
examples
arithmetic Module
pl0 Module

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL