parser

package module
v0.3.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 13, 2019 License: BSD-3-Clause Imports: 5 Imported by: 0

README

Coverage Status GoReportCard GoDoc

A Universal Dynamic LL(*) Parser
written in and for the Go programming language.

romshark/llparser is a dynamic recursive-descent top-down parser which parses any given input stream by trying to recursively match the root rule. It's universal in that it supports any LL(*) grammar.

This library allows building parsers in Go with relatively good error messages and flexible, even dynamic LL(*) grammars which may mutate at runtime. It parses the input stream into a typed parse-tree and allows action hooks to be executed when a particular rule is matched.

Getting Started

Rules

A grammar always begins with a root rule. A rule is a non-terminal symbol. Non-terminals are nodes of the parse-tree that consist of other non-terminals or terminals while terminals are leaf-nodes. A rule consists of a Designation, a Pattern, a Kind and an Action:

mainRule := &llparser.Rule{
	Designation: "name of the rule",
	Kind: 100,
	Pattern: llparser.Exact{
		Kind:        101,
		Expectation: []rune("string"),
	},
	Action: func(f llparser..Fragment) error {
		log.Print("the rule was successfuly matched!")
		return nil
	},
}
  • Designation defines the optional logical name of the rule and is used for debugging and error reporting purposes.
  • Kind defines the type identifier of the rule. If this field isn't set then zero (untyped) is used by default.
  • Pattern defines the expected pattern of the rule. This field is required.
  • Action defines the optional callback which is executed when this rule is matched. The action callback may return an error which will make the parser stop and fail immediately.

Rules can be nested:

ruleTwo := &llparser.Rule{
	Pattern: llparser.Exact{
		Kind:        101,
		Expectation: []rune("string"),
	},
}

ruleOne := &llparser.Rule{
	Pattern: ruleTwo,
}

Rules can also recurse:

rule := &llparser.Rule{Kind: 1}
rule.Pattern = llparser.Sequence{
	llparser.Exact{Expectation: "="},
	llparser.Optional{Pattern: rule}, // potential recursion
}

Terminals

Pattern: Exact

Exact expects a particular sequence of characters to be lexed:

Pattern: llparser.Exact{
	Kind:        SomeKindConstant,
	Expectation: []rune("some string"),
},
Pattern: Lexed

Lexed tries to lex an arbitrary sequence of characters according to Fn:

Pattern: llparser.Lexed{
	Designation: "some lexed terminal",
	Kind:        SomeKindConstant,
	Fn: func(crs llparser.Cursor) uint {
		if crs.File.Src[crs.Index] == '|' {
			return 0
		}
		return 1
	},
},

Fn returns either 0 for ending the sequence, 1 for advancing for 1 rune or any positive integer n to advance for n runes.

Combinators

Pattern: Optional

Optional tries to match a specific pattern but doesn't expect it to be matched:

Pattern: llparser.Optional{
	Pattern: somePattern,
},
Pattern: Sequence

Sequence expects a specific sequence of patterns:

Pattern: llparser.Sequence{
	somePattern,
	llparser.Term(SomeKindConstant),
	llparser.Optional{
		Pattern: llparser.Exact{
			Kind:        SomeKindConstant,
			Expectation: "foo",
		},
	},
},
Pattern: ZeroOrMore

ZeroOrMore tries to match one or many instances of a specific pattern but doesn't expect it to be matched:

Pattern: llparser.ZeroOrMore{
	Pattern: somePattern,
},
Pattern: OneOrMore

OneOrMore expects at least one or many instances of a specific pattern:

Pattern: llparser.OneOrMore{
	Pattern: somePattern,
},
Pattern: Either

Either expects either of the given patterns selecting the first match:

Pattern: llparser.Either{
	somePattern,
	anotherPattern,
},

The Parse-Tree

A parse-tree defines the serialized representation of the parsed input stream and consists of Fragment interfaces represented by the main fragment returned by llparser.Parse. A fragment is a typed chunk of the source code pointing to a start and end position in the source file, defining the kind of the chunk and referring to its child-fragments.

Error-Handling

Normally, when the parser fails to match the provided grammar it returns an ErrUnexpectedToken error which is rather generic and doesn't reflect the actual mistake with a comprehensive error message. To improve the quality of the returned error messages an error-rule can be provided which the parser tries to match at the position of an unexpected token. If the error-rule is matched successfully the error returned by the matched Action callback is returned. If the error-rule doesn't match then the default ErrUnexpectedToken error is returned as usual.

grammar := &parser.Rule{
	Pattern: parser.Sequence{
		parser.Exact{Expectation: []rune("foo")},
		parser.Exact{Expectation: []rune("...")},
	},
}

errRule := &parser.Rule{
	Pattern: parser.Either{
		parser.OneOrMore{
			Pattern: parser.Exact{Expectation: []rune(";")},
		},
		parser.OneOrMore{
			Pattern: parser.Exact{Expectation: []rune(".")},
		},
	},
	Action: func(fr parser.Fragment) error {
		// Return a convenient error message instead of a generic one
		return fmt.Errorf("expected 3 dots, got %d", len(fr.Src()))
	},
}

mainFrag, err := pr.Parse(src, grammar, errRule)
if err != nil {
	log.Fatal("Parser error: ", err)
}

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func PrintFragment

func PrintFragment(
	fragment Fragment,
	out io.Writer,
	indent string,
) (bytesWritten int, err error)

PrintFragment prints the fragment structure recursively to the given output stream

Types

type Action

type Action func(Fragment) error

Action represents a callback function that's called when a certain fragment is matched

type Construct

type Construct struct {
	*Token
	VElements []Fragment
}

Construct represents a composite fragment

func (*Construct) Elements

func (ct *Construct) Elements() []Fragment

Elements returns the element fragments of the construct fragment

type Cursor

type Cursor struct {
	Index  uint
	Column uint
	Line   uint
	File   *SourceFile
}

Cursor represents a source-code location

func NewCursor

func NewCursor(file *SourceFile) Cursor

NewCursor creates a new cursor location based on the given source file

func (Cursor) String

func (c Cursor) String() string

String stringifies the cursor

type Either

type Either []Pattern

Either represents either of the arbitrary patterns

func (Either) Container

func (Either) Container() bool

Container implements the Pattern interface

func (Either) Desig

func (eth Either) Desig() string

Desig implements the Pattern interface

func (Either) TerminalPattern

func (Either) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type Err

type Err struct {
	Err error
	At  Cursor
}

Err represents a generic parser error

func (*Err) Error

func (err *Err) Error() string

type ErrUnexpectedToken

type ErrUnexpectedToken struct {
	At       Cursor
	Expected Pattern
}

ErrUnexpectedToken represents a parser error

func (*ErrUnexpectedToken) Error

func (err *ErrUnexpectedToken) Error() string

type Exact added in v0.3.0

type Exact struct {
	Kind        FragmentKind
	Expectation []rune
}

Exact represents an exact terminal token pattern

func (Exact) Container added in v0.3.0

func (Exact) Container() bool

Container implements the Pattern interface

func (Exact) Desig added in v0.3.0

func (tm Exact) Desig() string

Desig implements the Pattern interface

func (Exact) TerminalPattern added in v0.3.0

func (Exact) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type Fragment

type Fragment interface {
	Kind() FragmentKind
	Begin() Cursor
	End() Cursor
	Src() []rune
	Elements() []Fragment
}

Fragment represents an abstract fragment

type FragmentKind

type FragmentKind int

FragmentKind represents a fragment kind identifier

type Lexed added in v0.2.0

type Lexed struct {
	Kind        FragmentKind
	Designation string
	Fn          func(Cursor) uint
}

Lexed represents a lexed pattern

func (Lexed) Container added in v0.2.0

func (Lexed) Container() bool

Container implements the Pattern interface

func (Lexed) Desig added in v0.2.0

func (ck Lexed) Desig() string

Desig implements the Pattern interface

func (Lexed) TerminalPattern added in v0.2.0

func (Lexed) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type OneOrMore

type OneOrMore struct{ Pattern }

OneOrMore represents at least one arbitrary patterns

func (OneOrMore) Container

func (OneOrMore) Container() bool

Container implements the Pattern interface

func (OneOrMore) Desig

func (oom OneOrMore) Desig() string

Desig implements the Pattern interface

func (OneOrMore) TerminalPattern

func (oom OneOrMore) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type Optional

type Optional struct{ Pattern }

Optional represents an arbitrary optional pattern

func (Optional) Container

func (Optional) Container() bool

Container implements the Pattern interface

func (Optional) Desig

func (opt Optional) Desig() string

Desig implements the Pattern interface

func (Optional) TerminalPattern

func (opt Optional) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type Parser

type Parser struct{}

Parser represents a parser

func NewParser

func NewParser() Parser

NewParser creates a new parser instance

func (Parser) Parse

func (pr Parser) Parse(
	source *SourceFile,
	rule *Rule,
	errRule *Rule,
) (Fragment, error)

Parse parses the given rule

type Pattern

type Pattern interface {
	// Container returns true for container patterns
	Container() bool

	// TerminalPattern returns the terminal pattern for composite patterns
	// and nil for non-terminal patterns
	TerminalPattern() Pattern

	// Desig returns the textual designation of the pattern
	Desig() string
}

Pattern represents an abstract pattern

type Rule

type Rule struct {
	Designation string
	Pattern     Pattern
	Kind        FragmentKind
	Action      Action
}

Rule represents a grammatic rule

func (*Rule) Container

func (*Rule) Container() bool

Container implements the Pattern interface

func (*Rule) Desig

func (rl *Rule) Desig() string

Desig implements the Pattern interface

func (*Rule) TerminalPattern

func (rl *Rule) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type Sequence

type Sequence []Pattern

Sequence represents an exact sequence of arbitrary patterns

func (Sequence) Container

func (Sequence) Container() bool

Container implements the Pattern interface

func (Sequence) Desig

func (seq Sequence) Desig() string

Desig implements the Pattern interface

func (Sequence) TerminalPattern

func (Sequence) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type SourceFile

type SourceFile struct {
	Name string
	Src  []rune
}

SourceFile represents a source file

type Token

type Token struct {
	VBegin Cursor
	VEnd   Cursor
	VKind  FragmentKind
}

Token represents a terminal fragment

func (*Token) Begin

func (tk *Token) Begin() Cursor

Begin returns the beginning cursor of the token fragment

func (*Token) Elements

func (tk *Token) Elements() []Fragment

Elements always returns nil for terminal fragments

func (*Token) End

func (tk *Token) End() Cursor

End returns the ending cursor of the token fragment

func (*Token) Kind

func (tk *Token) Kind() FragmentKind

Kind returns the kind of the token fragment

func (*Token) Src

func (tk *Token) Src() []rune

Src returns the source code of the token fragment

func (*Token) String

func (tk *Token) String() string

String stringifies the token

type ZeroOrMore

type ZeroOrMore struct{ Pattern }

ZeroOrMore represents zero or more arbitrary patterns

func (ZeroOrMore) Container

func (ZeroOrMore) Container() bool

Container implements the Pattern interface

func (ZeroOrMore) Desig

func (zom ZeroOrMore) Desig() string

Desig implements the Pattern interface

func (ZeroOrMore) TerminalPattern

func (zom ZeroOrMore) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

Directories

Path Synopsis
examples
dicklang command

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL