Documentation
¶
Overview ¶
Package combinator provides data types and primitive constructs for building parser combinators. A parser combinator is a higher-order function that takes one or more parsers as input and produces a new composite parser as output.
A parser itself is a function that processes an input stream of characters and returns an output structure, such as an abstract syntax tree (AST), a finite automaton, or another representation of the parsed data.
Parser combinators enable a modular, top-down recursive descent parsing strategy. They allow complex parsers to be constructed from smaller, reusable components, making the parsing process easier to build, maintain, and test.
Top-down parsing involves constructing a parse tree for the input string, starting from the root node (representing the start symbol) and expanding the nodes in preorder. Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string.
A recursive descent parser is a top-down parser constructed from mutually recursive procedures (or their non-recursive equivalents), where each procedure corresponds to a nonterminal in the grammar. This structure closely mirrors the grammar, making it intuitive and directly aligned with the rules it recognizes.
For more details on parsing theory, refer to "Compilers: Principles, Techniques, and Tools (2nd Edition)".
Example ¶
package main import ( "fmt" "github.com/moorara/algo/parser/combinator" ) // stringInput implements the combinator.Input interface for strings. type stringInput struct { pos int runes []rune } func newStringInput(s string) combinator.Input { return &stringInput{ pos: 0, runes: []rune(s), } } func (s *stringInput) Current() (rune, int) { return s.runes[0], s.pos } func (s *stringInput) Remaining() combinator.Input { if len(s.runes) == 1 { return nil } return &stringInput{ pos: s.pos + 1, runes: s.runes[1:], } } func main() { add := combinator.ExpectRune('+').Map(toAdd) // add → "+" digit := combinator.ExpectRuneInRange('0', '9').Map(toDigit) // digit → "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" num := digit.REP1().Map(toNum) // num → digit+ expr := num.CONCAT(add, num).Map(evalExpr) // expr → num "+" num in := newStringInput("27+69") if out, ok := expr(in); ok { n := out.Result.Val.(int) fmt.Println(n) } } func toAdd(r combinator.Result) (combinator.Result, bool) { return combinator.Result{ Val: nil, Pos: r.Pos, }, true } func toDigit(r combinator.Result) (combinator.Result, bool) { v := r.Val.(rune) digit := int(v - '0') return combinator.Result{ Val: digit, Pos: r.Pos, }, true } func toNum(r combinator.Result) (combinator.Result, bool) { l := r.Val.(combinator.List) var num int for _, r := range l { num = num*10 + r.Val.(int) } return combinator.Result{ Val: num, Pos: l[0].Pos, }, true } func evalExpr(r combinator.Result) (combinator.Result, bool) { r0, _ := r.Get(0) r2, _ := r.Get(2) n0 := r0.Val.(int) n2 := r2.Val.(int) return combinator.Result{ Val: int(n0 + n2), Pos: r0.Pos, }, true }
Index ¶
- type Bag
- type BagKey
- type BagVal
- type Binder
- type Empty
- type Input
- type List
- type Mapper
- type Output
- type Parser
- func (p Parser) ALT(q ...Parser) Parser
- func (p Parser) Bind(f Binder) Parser
- func (p Parser) CONCAT(q ...Parser) Parser
- func (p Parser) Flatten() Parser
- func (p Parser) Get(i int) Parser
- func (p Parser) Map(f Mapper) Parser
- func (p Parser) OPT() Parser
- func (p Parser) REP() Parser
- func (p Parser) REP1() Parser
- func (p Parser) Select(i ...int) Parser
- type Result
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Binder ¶
Binder is a function that receives a parsing result and returns a new parser. It is often used for modifying the behavior of previous parsers in the chain.
func ExcludeRunes ¶
ExcludeRunes can be bound on a rune parser to exclude certain runes.
type Input ¶
type Input interface { // Current returns the current rune from input along with its position in the input. Current() (rune, int) // Remaining returns the remaining of input. If no input left, it returns nil. Remaining() Input }
Input is the input to a parser function.
type Mapper ¶
Mapper is a function that receives a parsing result and returns a new result. The second return value determines whether or not the mapping was successful and the first value is valid.
type Parser ¶
Parser is the type for a function that receives a parsing input and returns a parsing output. The second return value determines whether or not the parsing was successful and the output is valid.
func ExpectRune ¶
ExpectRune creates a parser that returns a successful result only if the input starts with the given rune.
func ExpectRuneIn ¶
ExpectRuneIn creates a parser that returns a successful result only if the input starts with one of the given runes.
func ExpectRuneInRange ¶
ExpectRuneInRange creates a parser that returns a successful result only if the input starts with a rune in the given range.
func ExpectRunes ¶
ExpectRunes creates a parser that returns a successful result only if the input starts with the given runes in the given order.
func ExpectString ¶
ExpectString creates a parser that returns a successful result only if the input starts with the given string.
func (Parser) ALT ¶
ALT composes a parser that alternate parser p with a sequence of parsers. It applies parser p to the input and if it does not succeed, it applies the next parser to the same input, and continues parsing to the last parser. It stops at the first successful parsing and returns its result.
- EBNF Operator: Alternation
- EBNF Notation: p | q
func (Parser) Bind ¶
Bind composes a parser that uses parser p to parse the input and builds a second parser from the result of parsing. It then applies the new parser to the remaining input from the first parser. You can use this to implement syntax annotations.
func (Parser) CONCAT ¶
CONCAT composes a parser that concats parser p to a sequence of parsers. It applies parser p to the input, then applies the next parser to the remaining of the input, and continues parsing to the last parser.
- EBNF Operator: Concatenation
- EBNF Notation: p q
func (Parser) Flatten ¶
Flatten composes a parser that applies parser p to the input and flattens all results into a single list. This can be used for accessing the values of symbols in the right-side of a production rule more intuitively.
func (Parser) Get ¶
Get composes a parser that applies parser p to the input and returns the value of a symbol from the right-side of the production rule. This can be used after CONCAT, REP, REP1, Flatten, and/or Select. It will not have any effect if used after other operators and the result of parsing is not a list. If index is invalid, you will get the empty string ε.
func (Parser) Map ¶
Map composes a parser that uses parser p to parse the input and applies a mapper function to the result of parsing. If the parser does not succeed, the mapper function will not be applied.
func (Parser) OPT ¶
OPT composes a parser that applies parser p zero or one time to the input. If the parser does not succeed, it will return an empty result.
- EBNF Operator: Optional
- EBNF Notation: [ p ] or p?
func (Parser) REP ¶
REP composes a parser that applies parser p zero or more times to the input and accumulates the results. If the parser does not succeed, it will return an empty result.
- EBNF Operator: Repetition (Kleene Star)
- EBNF Notation: { p } or p*
func (Parser) REP1 ¶
REP1 composes a parser that applies parser p one or more times to the input and accumulates the results. This does not allow parsing zero times (empty result).
- EBNF Operator: Kleene Plus
- EBNF Notation: p+
type Result ¶
type Result struct { // Val is the actual result of a parser function. // It can be an abstract syntax tree, a finite automata, or any other data structure. Val any // Pos is the first position in the source corresponding to the parsing result. Pos int // Bag is an optional collection of key-value pairs holding extra information and metadata about the parsing result. // You should always check this field to be not nil before using it. Bag Bag }
Result is the result of parsing a production rule. It represents a production rule result.