parser

package module
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 12, 2019 License: BSD-3-Clause Imports: 5 Imported by: 0

README

Coverage Status GoReportCard GoDoc

A Universal Dynamic LL(*) Parser
written in and for the Go programming language.

romshark/llparser is a dynamic recursive-descent top-down parser which parses any given input stream by trying to recursively match the root rule. It's universal in that it supports any LL(*) grammar.

This library allows building parsers in Go with relatively good error messages and flexible, even dynamic LL(*) grammars which may mutate at runtime. It parses the input stream into a typed parse-tree and allows action hooks to be executed when a particular rule is matched.

Getting Started

Rules

A grammar always begins with a root rule. A rule is a non-terminal symbol. Non-terminals are nodes of the parse-tree that consist of other non-terminals or terminals while terminals are leaf-nodes. A rule consists of a Designation, a Pattern, a Kind and an Action:

mainRule := &llparser.Rule{
	Designation: "name of the rule",
	Kind: 100,
	Pattern: llparser.TermExact{
		Kind:        101,
		Expectation: "string",
	},
	Action: func(f llparser..Fragment) error {
		log.Print("the rule was successfuly matched!")
		return nil
	},
}
  • Designation defines the optional logical name of the rule and is used for debugging and error reporting purposes.
  • Kind defines the type identifier of the rule. If this field isn't set then zero (untyped) is used by default.
  • Pattern defines the expected pattern of the rule. This field is required.
  • Action defines the optional callback which is executed when this rule is matched. The action callback may return an error which will make the parser stop and fail immediately.

Rules can be nested:

ruleTwo := &llparser.Rule{
	Pattern: llparser.TermExact{
		Kind:        101,
		Expectation: "string",
	},
}

ruleOne := &llparser.Rule{
	Pattern: ruleTwo,
}

Rules can also recurse:

rule := &llparser.Rule{Kind: 1}
rule.Pattern = llparser.Sequence{
	llparser.TermExact{Expectation: "="},
	llparser.Optional{Pattern: rule}, // potential recursion
}

Terminals

Pattern: Term

Term expects a particular fragment kind to be lexed:

Pattern: llparser.Term(SomeKindConstant),
Pattern: TermExact

TermExact expects a particular sequence of characters to be lexed:

Pattern: llparser.TermExact{
	Kind:        SomeKindConstant,
	Expectation: "some string",
},
Pattern: Checked

Checked expects the lexed fragment to pass an arbitrary user-defined validation function:

Pattern: llparser.Checked{
	Designation: "some checked terminal",
	Fn: func(str string) bool { return len(str) > 5 },
},

Combinators

Pattern: Optional

Optional tries to match a specific pattern but doesn't expect it to be matched:

Pattern: llparser.Optional{
	Pattern: somePattern,
},
Pattern: Sequence

Sequence expects a specific sequence of patterns:

Pattern: llparser.Sequence{
	somePattern,
	llparser.Term(SomeKindConstant),
	llparser.Optional{
		Pattern: llparser.TermExact{
			Kind:        SomeKindConstant,
			Expectation: "foo",
		},
	},
},
Pattern: ZeroOrMore

ZeroOrMore tries to match one or many instances of a specific pattern but doesn't expect it to be matched:

Pattern: llparser.ZeroOrMore{
	Pattern: somePattern,
},
Pattern: OneOrMore

OneOrMore expects at least one or many instances of a specific pattern:

Pattern: llparser.OneOrMore{
	Pattern: somePattern,
},
Pattern: Either

Either expects either of the given patterns selecting the first match:

Pattern: llparser.Either{
	somePattern,
	anotherPattern,
},

Lexer

The lexer is an abstract part of the parser which tokenizes the input stream:

type Lexer interface {
	Read() (*Token, error)
	ReadExact(
		expectation string,
		kind FragmentKind,
	) (
		token *Token,
		matched bool,
		err error,
	)
	Position() Cursor
	Set(Cursor)
}

A default lexer implementation is available out-of-the-box but sometimes implementing your own parser makes more sense. Some examples for when a custom lexer might be useful:

  • Sometimes you don't care how many white-spaces, tabs and line-breaks there are between the patterns you care about and thus it doesn't make any sense to make each individual space character a terminal leaf-node, instead the lexer would read a sequence of whitespaces, tabs and line-breaks as a single typed terminal node (in fact, this is the behavior of the default lexer implementation linked aboved, it will treat these kinds of sequences as misc.FrSpace) reducing the complexity of the resulting parse-tree.
  • If you want to disallow certain kinds of runes in the source code you can make the custom lexer implementation return an ErrUnexpectedToken error when approaching one.

The Parse-Tree

A parse-tree defines the serialized representation of the parsed input stream and consists of Fragment interfaces represented by the main fragment returned by llparser.Parse. A fragment is a typed chunk of the source code pointing to a start and end position in the source file, defining the kind of the chunk and referring to its child-fragments.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func PrintFragment

func PrintFragment(
	fragment Fragment,
	out io.Writer,
	indent string,
) (bytesWritten int, err error)

PrintFragment prints the fragment structure recursively to the given output stream

Types

type Action

type Action func(Fragment) error

Action represents a callback function that's called when a certain fragment is matched

type Checked

type Checked struct {
	Designation string
	Fn          func(string) bool
}

Checked represents an arbitrary terminal token pattern matched by a function

func (Checked) Container

func (Checked) Container() bool

Container implements the Pattern interface

func (Checked) Desig

func (ck Checked) Desig() string

Desig implements the Pattern interface

func (Checked) TerminalPattern

func (Checked) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type Construct

type Construct struct {
	*Token
	VElements []Fragment
}

Construct represents a composite fragment

func (*Construct) Elements

func (ct *Construct) Elements() []Fragment

Elements returns the element fragments of the construct fragment

type Cursor

type Cursor struct {
	Index  uint
	Column uint
	Line   uint
	File   *SourceFile
}

Cursor represents a source-code location

func NewCursor

func NewCursor(file *SourceFile) Cursor

NewCursor creates a new cursor location based on the given source file

func (Cursor) String

func (c Cursor) String() string

String stringifies the cursor

type Either

type Either []Pattern

Either represents either of the arbitrary patterns

func (Either) Container

func (Either) Container() bool

Container implements the Pattern interface

func (Either) Desig

func (eth Either) Desig() string

Desig implements the Pattern interface

func (Either) TerminalPattern

func (Either) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type Err

type Err struct {
	Err error
	At  Cursor
}

Err represents a generic parser error

func (*Err) Error

func (err *Err) Error() string

type ErrUnexpectedToken

type ErrUnexpectedToken struct {
	At       Cursor
	Expected Pattern
	Actual   *Token
}

ErrUnexpectedToken represents a parser error

func (*ErrUnexpectedToken) Error

func (err *ErrUnexpectedToken) Error() string

type Fragment

type Fragment interface {
	Kind() FragmentKind
	Begin() Cursor
	End() Cursor
	Src() string
	Elements() []Fragment
}

Fragment represents an abstract fragment

type FragmentKind

type FragmentKind int

FragmentKind represents a fragment kind identifier

type Lexer

type Lexer interface {
	Read() (*Token, error)
	ReadExact(
		expectation string,
		kind FragmentKind,
	) (
		token *Token,
		matched bool,
		err error,
	)
	Position() Cursor
	Set(Cursor)
}

Lexer defines the interface of an abstract lexer implementation

type OneOrMore

type OneOrMore struct{ Pattern }

OneOrMore represents at least one arbitrary patterns

func (OneOrMore) Container

func (OneOrMore) Container() bool

Container implements the Pattern interface

func (OneOrMore) Desig

func (oom OneOrMore) Desig() string

Desig implements the Pattern interface

func (OneOrMore) TerminalPattern

func (oom OneOrMore) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type Optional

type Optional struct{ Pattern }

Optional represents an arbitrary optional pattern

func (Optional) Container

func (Optional) Container() bool

Container implements the Pattern interface

func (Optional) Desig

func (opt Optional) Desig() string

Desig implements the Pattern interface

func (Optional) TerminalPattern

func (opt Optional) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type Parser

type Parser struct{}

Parser represents a parser

func NewParser

func NewParser() Parser

NewParser creates a new parser instance

func (Parser) Parse

func (pr Parser) Parse(lexer Lexer, rule *Rule) (Fragment, error)

Parse parses the given rule

type Pattern

type Pattern interface {
	// Container returns true for container patterns
	Container() bool

	// TerminalPattern returns the terminal pattern for composite patterns
	// and nil for non-terminal patterns
	TerminalPattern() Pattern

	// Desig returns the textual designation of the pattern
	Desig() string
}

Pattern represents an abstract pattern

type Rule

type Rule struct {
	Designation string
	Pattern     Pattern
	Kind        FragmentKind
	Action      Action
}

Rule represents a grammatic rule

func (*Rule) Container

func (*Rule) Container() bool

Container implements the Pattern interface

func (*Rule) Desig

func (rl *Rule) Desig() string

Desig implements the Pattern interface

func (*Rule) TerminalPattern

func (rl *Rule) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type Scanner

type Scanner struct {
	Lexer   Lexer
	Records []Fragment
}

Scanner represents a sequence-recording lexing source-code scanner

func NewScanner

func NewScanner(lexer Lexer) *Scanner

NewScanner creates a new scanner instance

func (*Scanner) Append

func (sc *Scanner) Append(
	pattern Pattern,
	fragment Fragment,
)

Append appends a fragment to the records

func (*Scanner) Fragment

func (sc *Scanner) Fragment(kind FragmentKind) Fragment

Fragment returns a typed composite fragment

func (*Scanner) New

func (sc *Scanner) New() *Scanner

New creates a new scanner succeeding the original one dropping its record history

func (*Scanner) Read

func (sc *Scanner) Read() (*Token, error)

Read advances the scanner by 1 token returning either the read fragment or an error if the lexer failed

func (*Scanner) ReadExact

func (sc *Scanner) ReadExact(
	expectation string,
	kind FragmentKind,
) (*Token, bool, error)

ReadExact advances the scanner by 1 exact token returning either the read fragment or nil if the expectation didn't match

func (*Scanner) Set

func (sc *Scanner) Set(cursor Cursor)

Set sets the scanner's lexer position and tidies up the record history

func (*Scanner) TidyUp

func (sc *Scanner) TidyUp() int

TidyUp removes all records after the current position

type Sequence

type Sequence []Pattern

Sequence represents an exact sequence of arbitrary patterns

func (Sequence) Container

func (Sequence) Container() bool

Container implements the Pattern interface

func (Sequence) Desig

func (seq Sequence) Desig() string

Desig implements the Pattern interface

func (Sequence) TerminalPattern

func (Sequence) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type SourceFile

type SourceFile struct {
	Name string
	Src  string
}

SourceFile represents a source file

type Term

type Term FragmentKind

Term represents a concrete terminal token pattern

func (Term) Container

func (Term) Container() bool

Container implements the Pattern interface

func (Term) Desig

func (tm Term) Desig() string

Desig implements the Pattern interface

func (Term) TerminalPattern

func (Term) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type TermExact

type TermExact struct {
	Kind        FragmentKind
	Expectation string
}

TermExact represents an exact terminal token pattern

func (TermExact) Container

func (TermExact) Container() bool

Container implements the Pattern interface

func (TermExact) Desig

func (tm TermExact) Desig() string

Desig implements the Pattern interface

func (TermExact) TerminalPattern

func (TermExact) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

type Token

type Token struct {
	VBegin Cursor
	VEnd   Cursor
	VKind  FragmentKind
}

Token represents a terminal fragment

func (*Token) Begin

func (tk *Token) Begin() Cursor

Begin returns the beginning cursor of the token fragment

func (*Token) Elements

func (tk *Token) Elements() []Fragment

Elements always returns nil for terminal fragments

func (*Token) End

func (tk *Token) End() Cursor

End returns the ending cursor of the token fragment

func (*Token) Kind

func (tk *Token) Kind() FragmentKind

Kind returns the kind of the token fragment

func (*Token) Src

func (tk *Token) Src() string

Src returns the source code of the token fragment

func (*Token) String

func (tk *Token) String() string

String stringifies the token

type ZeroOrMore

type ZeroOrMore struct{ Pattern }

ZeroOrMore represents zero or more arbitrary patterns

func (ZeroOrMore) Container

func (ZeroOrMore) Container() bool

Container implements the Pattern interface

func (ZeroOrMore) Desig

func (zom ZeroOrMore) Desig() string

Desig implements the Pattern interface

func (ZeroOrMore) TerminalPattern

func (zom ZeroOrMore) TerminalPattern() Pattern

TerminalPattern implements the Pattern interface

Directories

Path Synopsis
examples
dicklang command

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL