kowhai

package module
v0.0.0-...-5b6ea31

Published: May 15, 2015 License: MIT Imports: 2 Imported by: 5

README

Kowhai

Kowhai is an Earley-style parser loosely based on the published MARPA algorithm. Currently it is in an experimental state but should handle all LR grammars.

A kowhai.Grammar is used to build up a ruleset using an API that maps well to typical BNF constructs. Once a top-level start rule is identified, it can build a split-epsilon finite state machine.

A kowhai.Parser, given a stream of tokens (perhaps from a lexer) and a state machine, will produce a parse tree that can then be used to build an abstract syntax tree.
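
A minimal end-to-end sketch of that flow is shown below. The import path, the grammar, and the tokens are illustrative assumptions; only CreateGrammar, CreateRule, SetStart, BuildStateMachine, CreateParser, ScanToken, BuildParseTree and DumpTreeNode come from the documented API, and the assumption that a LiteralToken's value matches a Symbol term is hedged in the comments.

package main

import (
	"fmt"

	"github.com/example/kowhai" // placeholder import path; substitute the real module path
)

func main() {
	// Build a grammar for the two-word phrase "hello world".
	g := kowhai.CreateGrammar()
	g.CreateRule("greeting", g.Symbol("hello"), g.Symbol("world"))
	g.SetStart("greeting")

	// Build the state machine once; it can be shared by many parses.
	machine := g.BuildStateMachine()
	parser := kowhai.CreateParser(machine)

	// Deliver tokens as a lexer would; assumed: a LiteralToken's value matches Symbol terms.
	for _, tok := range []kowhai.LiteralToken{"hello", "world"} {
		if err := parser.ScanToken(tok); err != nil {
			fmt.Println("parse error:", err)
			return
		}
	}

	// Build the parse tree and dump it for inspection.
	tree := parser.BuildParseTree()
	kowhai.DumpTreeNode(tree, 0)
}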

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DumpTreeNode

func DumpTreeNode(parseNode *ParseTreeNode, depth int)

Types

type AhfaCompletion

type AhfaCompletion struct {
	// contains filtered or unexported fields
}

this will hopefully become an SPPF node

type AhfaMachine

type AhfaMachine struct {
	// contains filtered or unexported fields
}

the state machine

func (*AhfaMachine) AcceptedState

func (ah *AhfaMachine) AcceptedState(state int) bool

func (*AhfaMachine) Completed

func (ah *AhfaMachine) Completed(state int) (rules []Term)

func (*AhfaMachine) Goto

func (machine *AhfaMachine) Goto(state int, symbol Term) int

transition to the next state

func (*AhfaMachine) IndexOf

func (machine *AhfaMachine) IndexOf(state AhfaState) int

used during building of machine to deduplicate states

func (*AhfaMachine) String

func (machine *AhfaMachine) String() string

gives us a way to dump and inspect the machine

type AhfaRule

type AhfaRule struct {
	// contains filtered or unexported fields
}

a rule in the form used by the state machine

func (*AhfaRule) AsTerm

func (r *AhfaRule) AsTerm() Term

convert to a term for convenience

func (*AhfaRule) IsCompleted

func (rule *AhfaRule) IsCompleted() bool

func (*AhfaRule) String

func (rule *AhfaRule) String() string

type AhfaState

type AhfaState []*AhfaRule

group of dotted rules

func (AhfaState) Contains

func (state AhfaState) Contains(rule *AhfaRule) bool

check to see if a rule is present in a state

type EarleyItem

type EarleyItem struct {
	// contains filtered or unexported fields
}

Tracks the AHFA state and parent state. Should parent be a pointer to the parent EarleyItem? The symbol is used when memoizing Leo items.

func (EarleyItem) String

func (item EarleyItem) String() string

type EarleyItemSet

type EarleyItemSet struct {
	// contains filtered or unexported fields
}

func (*EarleyItemSet) AddItem

func (set *EarleyItemSet) AddItem(state int, parent int, parseNode *SppfNode)

add an item to the EIS

func (*EarleyItemSet) String

func (set *EarleyItemSet) String() string

type Grammar

type Grammar struct {
	// contains filtered or unexported fields
}

A grammar is a set of rules that the parser should recognize. The usage is: create a grammar, add some rules, set a start rule (the top-level definition), and generate a state machine for the parser.

func CreateGrammar

func CreateGrammar() *Grammar

func (*Grammar) BuildStateMachine

func (g *Grammar) BuildStateMachine() *AhfaMachine

the machine can be reused across many parses

func (*Grammar) CreateRule

func (g *Grammar) CreateRule(name string, terms ...Term)

this creates a rule OR adds a production to an existing rule; this is a convenience behaviour, since it makes it much easier to transcribe EBNF
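
For example, extending the sketch in the README above (the rule and symbol names are illustrative), two calls with the same name yield a single rule with two productions:

// expr := term
g.CreateRule("expr", g.Lookup("term"))
// expr := expr "+" term
g.CreateRule("expr", g.Lookup("expr"), g.Symbol("+"), g.Lookup("term"))
// term := "x"   (a placeholder production so "term" is defined)
g.CreateRule("term", g.Symbol("x"))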

func (*Grammar) DumpRules

func (g *Grammar) DumpRules()

func (*Grammar) GetStartRule

func (g *Grammar) GetStartRule() *Rule

func (*Grammar) Lookup

func (g *Grammar) Lookup(name string) (out *Rule)

Lookup a rule. This creates the rule if it doesn't exist (which makes forward declarations possible). TODO: validate that all rules have productions at build time

func (*Grammar) Optional

func (g *Grammar) Optional(terms ...Term) (out Term)

Expand a ? operator (zero-or-one)

func (*Grammar) Or

func (g *Grammar) Or(terms ...Term) (out Term)

Create an OR rule where each term is a different branch

func (*Grammar) Plus

func (g *Grammar) Plus(terms ...Term) (out Term)

Expand a + operator (one-or-more)

func (*Grammar) SetStart

func (g *Grammar) SetStart(name string)

func (*Grammar) Star

func (g *Grammar) Star(terms ...Term) (out Term)

Expand a * operator (zero-or-more)

func (*Grammar) Symbol

func (g *Grammar) Symbol(name string) Term

Create a symbol (which matches a token value)

func (*Grammar) Type

func (g *Grammar) Type(tokenType int) (out Term)

Create a TypedTerm (which matches on token type)

func (*Grammar) UnionSingleTerms

func (g *Grammar) UnionSingleTerms(name string, terms ...Term)

handle the common right-hand side form "t1 | t2 | t3"; the same as a top-level Or
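
A sketch combining these helpers (Symbol, Type, Optional, Or, Plus, Star, UnionSingleTerms), again extending the earlier sketch; the rule names and the token-type constant are illustrative, not part of the package:

func defineListGrammar(g *kowhai.Grammar) {
	const tokNumber = 1 // hypothetical lexer token type for numeric literals

	// item := NUMBER | "nil"
	g.UnionSingleTerms("item", g.Type(tokNumber), g.Symbol("nil"))

	// signed := ("+" | "-")? NUMBER
	g.CreateRule("signed", g.Optional(g.Or(g.Symbol("+"), g.Symbol("-"))), g.Type(tokNumber))

	// digits := NUMBER+
	g.CreateRule("digits", g.Plus(g.Type(tokNumber)))

	// list := "[" item ("," item)* "]"
	g.CreateRule("list",
		g.Symbol("["),
		g.Lookup("item"),
		g.Star(g.Symbol(","), g.Lookup("item")),
		g.Symbol("]"))
}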

type LiteralToken

type LiteralToken string

Simple token type that can be delivered to the parser

func (LiteralToken) AsValue

func (l LiteralToken) AsValue() string

func (LiteralToken) TokenType

func (l LiteralToken) TokenType() int

type MarpaParser

type MarpaParser struct {
	// contains filtered or unexported fields
}

func CreateParser

func CreateParser(machine *AhfaMachine) MarpaParser

create a new parser that uses the machine

func (*MarpaParser) AddOptimization

func (parser *MarpaParser) AddOptimization(o ParseTreeOptimizer)

func (*MarpaParser) BuildParseTree

func (parser *MarpaParser) BuildParseTree() *ParseTreeNode

func (*MarpaParser) DumpMachine

func (parser *MarpaParser) DumpMachine()

dump the state machine for inspection

func (*MarpaParser) DumpTable

func (parser *MarpaParser) DumpTable()

dump the Earley sets for inspection

func (*MarpaParser) MakeParseNode

func (parser *MarpaParser) MakeParseNode(rule Term, origin int, location int, w *SppfNode, v *SppfNode, nodes map[string]*SppfNode) (y *SppfNode)

func (*MarpaParser) OptimizeParseTree

func (parser *MarpaParser) OptimizeParseTree(tree *ParseTreeNode) *ParseTreeNode

run any optimizations over the tree

func (*MarpaParser) PrintAcceptedTree

func (parser *MarpaParser) PrintAcceptedTree() bool

placeholder function where we can look at the parse tree once we are building one!

func (*MarpaParser) PrintCNodes

func (parser *MarpaParser) PrintCNodes()

func (*MarpaParser) ScanToken

func (parser *MarpaParser) ScanToken(token Token) (err error)

this handles the next token delivered by the lexer
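
A sketch of the typical driving loop; the lexer callback here is hypothetical, and only ScanToken comes from the package:

// parseAll feeds tokens to the parser until the (hypothetical) lexer is exhausted.
func parseAll(parser *kowhai.MarpaParser, next func() (kowhai.Token, bool)) error {
	for {
		tok, ok := next()
		if !ok {
			return nil // end of input
		}
		if err := parser.ScanToken(tok); err != nil {
			return err // the token was rejected at this position
		}
	}
}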

type OverlappingChildren

type OverlappingChildren struct {
}

This optimization handles child nodes that overlap (due to partial parse trees). Generally, applying this optimization makes the resulting parse tree easier to understand.

func (OverlappingChildren) Postprocess

func (o OverlappingChildren) Postprocess(node *ParseTreeNode) *ParseTreeNode

func (OverlappingChildren) Preprocess

func (o OverlappingChildren) Preprocess(node *ParseTreeNode) *ParseTreeNode

type ParseNodes

type ParseNodes []AhfaCompletion

used to implement sort.Interface

func (ParseNodes) Len

func (x ParseNodes) Len() int

func (ParseNodes) Less

func (x ParseNodes) Less(i, j int) bool

func (ParseNodes) Swap

func (x ParseNodes) Swap(i, j int)

type ParseTreeNode

type ParseTreeNode struct {
	Term     Term
	Parent   *ParseTreeNode
	Children []*ParseTreeNode
	// contains filtered or unexported fields
}

func (*ParseTreeNode) Overlaps

func (node *ParseTreeNode) Overlaps(other *ParseTreeNode) bool

type ParseTreeOptimizer

type ParseTreeOptimizer interface {
	Preprocess(node *ParseTreeNode) *ParseTreeNode
	Postprocess(node *ParseTreeNode) *ParseTreeNode
}
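
A minimal sketch of a custom optimizer satisfying this interface; it leaves the tree untouched and merely counts the nodes it is handed (when each hook is invoked is an implementation detail of the parser):

// NodeCounter is a do-nothing optimizer used to illustrate the interface shape.
type NodeCounter struct {
	Seen int
}

func (c *NodeCounter) Preprocess(node *kowhai.ParseTreeNode) *kowhai.ParseTreeNode {
	c.Seen++
	return node // leave the node unchanged
}

func (c *NodeCounter) Postprocess(node *kowhai.ParseTreeNode) *kowhai.ParseTreeNode {
	return node
}

// registration:   parser.AddOptimization(&NodeCounter{})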

type ParseTreeToken

type ParseTreeToken struct {
	Token Token
}

func (*ParseTreeToken) IsRule

func (t *ParseTreeToken) IsRule() bool

func (*ParseTreeToken) MatchesToken

func (t *ParseTreeToken) MatchesToken(token Token) bool

func (*ParseTreeToken) String

func (t *ParseTreeToken) String() string

type Production

type Production []Term

Production is the set of terms that make up the right side of a rule

type RemoveSyntheticNodes

type RemoveSyntheticNodes struct {
}

This optimization removes parse nodes that derive from rules created by the grammar operators (+, ?, *, |)

func (RemoveSyntheticNodes) GetNaturalChildren

func (o RemoveSyntheticNodes) GetNaturalChildren(node *ParseTreeNode) (children []*ParseTreeNode)

func (RemoveSyntheticNodes) Postprocess

func (o RemoveSyntheticNodes) Postprocess(node *ParseTreeNode) *ParseTreeNode

func (RemoveSyntheticNodes) Preprocess

func (o RemoveSyntheticNodes) Preprocess(node *ParseTreeNode) *ParseTreeNode

type Rule

type Rule struct {
	Productions []Production
	// contains filtered or unexported fields
}

Name is the left-hand side; Productions are the various right-hand sides

func (*Rule) Add

func (r *Rule) Add(s Production)

func (*Rule) IsAllowedChild

func (r *Rule) IsAllowedChild(term Term) bool

this is used when building parse trees to prune out alternative shorter parse possibilities

func (*Rule) IsRightRecursive

func (r *Rule) IsRightRecursive() bool

func (*Rule) IsRule

func (r *Rule) IsRule() bool

func (*Rule) MatchesToken

func (r *Rule) MatchesToken(token Token) bool

func (*Rule) String

func (r *Rule) String() string

type SppfNode

type SppfNode struct {
	// contains filtered or unexported fields
}

func (*SppfNode) Label

func (s *SppfNode) Label() string

func (*SppfNode) String

func (s *SppfNode) String() string

type SppfTerm

type SppfTerm string

Used only when building the parse tree; will need to be refactored away later

func (SppfTerm) IsRule

func (l SppfTerm) IsRule() bool

func (SppfTerm) MatchesToken

func (l SppfTerm) MatchesToken(token Token) bool

func (SppfTerm) String

func (l SppfTerm) String() string

type Symbol

type Symbol string

Symbol looks at a token and matches on its value. Generally used for keywords and punctuation referenced directly in rule definitions.

func (Symbol) IsRule

func (s Symbol) IsRule() bool

func (Symbol) MatchesToken

func (s Symbol) MatchesToken(token Token) bool

type Term

type Term interface {
	IsRule() bool
	MatchesToken(token Token) bool
}

A term is anything that can appear on the RHS of a rule. Here we define Symbol (for literal terminals that appear in a rule definition), Rule (for non-terminals), and TypedTerm (for matching the type of token produced by a lexer)

type Token

type Token interface {
	AsValue() string //exposes a string value for symbol matches
	TokenType() int  // exposes a token type for a type match
}

A trivial interface for the tokens consumed by the parser
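
A sketch of a lexer-side token type satisfying this interface; the field names and semantics are illustrative:

type LexToken struct {
	Kind  int    // lexer-assigned token type, matched by TypedTerm
	Value string // matched text, matched by Symbol
}

func (t LexToken) AsValue() string { return t.Value }
func (t LexToken) TokenType() int  { return t.Kind }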

type TrimWhenSingle

type TrimWhenSingle struct {
	AppliesTo []string
}

This optimization can be used to remove intermediate nodes that have only a single child.
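
For example, continuing the parser sketch above (the "expr" rule name is illustrative, and whether BuildParseTree already applies registered optimizations is not documented here, so the explicit OptimizeParseTree call is an assumption), the built-in optimizers might be registered like this:

parser.AddOptimization(kowhai.OverlappingChildren{})
parser.AddOptimization(kowhai.RemoveSyntheticNodes{})
parser.AddOptimization(kowhai.TrimWhenSingle{AppliesTo: []string{"expr"}})

tree := parser.OptimizeParseTree(parser.BuildParseTree())
kowhai.DumpTreeNode(tree, 0)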

func (TrimWhenSingle) Postprocess

func (o TrimWhenSingle) Postprocess(node *ParseTreeNode) *ParseTreeNode

func (TrimWhenSingle) Preprocess

func (o TrimWhenSingle) Preprocess(node *ParseTreeNode) *ParseTreeNode

type TypedTerm

type TypedTerm int

A TypedTerm looks at the token type for a match. Generally used for recognizing things produced by a lexer, such as an int literal, string literal, etc.

func (TypedTerm) IsRule

func (t TypedTerm) IsRule() bool

func (TypedTerm) MatchesToken

func (t TypedTerm) MatchesToken(token Token) bool
