Documentation
¶
Overview ¶
Package combinator provides data types and primitive constructs for building parser combinators. A parser combinator is a higher-order function that takes one or more parsers as input and produces a new composite parser as output.
A parser itself is a function that processes an input stream of characters and returns an output structure, such as an abstract syntax tree (AST), a finite automaton, or another representation of the parsed data.
Parser combinators enable a modular, top-down recursive descent parsing strategy. They allow complex parsers to be constructed from smaller, reusable components, making the parsing process easier to build, maintain, and test.
Top-down parsing involves constructing a parse tree for the input string, starting from the root node (representing the start symbol) and expanding the nodes in preorder. Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string.
A recursive descent parser is a top-down parser constructed from mutually recursive procedures (or their non-recursive equivalents), where each procedure corresponds to a nonterminal in the grammar. This structure closely mirrors the grammar, making it intuitive and directly aligned with the rules it recognizes.
For more details on parsing theory, refer to "Compilers: Principles, Techniques, and Tools (2nd Edition)".
Example ¶
package main
import (
"fmt"
"github.com/moorara/algo/parser/combinator"
)
func main() {
digit := combinator.ExpectRuneInRange('0', '9').Map(toDigit) // digit → "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
num := digit.REP1().Map(toNum) // num → digit+
expr := num.CONCAT(combinator.ExpectRune('+'), num).Map(evalExpr) // expr → num "+" num
in := newStringInput("27 + 69")
out, err := expr(in)
if err != nil {
fmt.Println("Error:", err)
return
}
n := out.Result.Val.(int)
fmt.Println(n)
}
func toDigit(r combinator.Result) (combinator.Result, error) {
v := r.Val.(rune)
digit := int(v - '0')
return combinator.Result{
Val: digit,
Pos: r.Pos,
}, nil
}
func toNum(r combinator.Result) (combinator.Result, error) {
l := r.Val.(combinator.List)
var num int
for _, r := range l {
num = num*10 + r.Val.(int)
}
return combinator.Result{
Val: num,
Pos: l[0].Pos,
}, nil
}
func evalExpr(r combinator.Result) (combinator.Result, error) {
r0, _ := r.Get(0)
r2, _ := r.Get(2)
n0 := r0.Val.(int)
n2 := r2.Val.(int)
return combinator.Result{
Val: int(n0 + n2),
Pos: r0.Pos,
}, nil
}
// stringInput implements the combinator.Input interface for strings.
type stringInput struct {
pos int
runes []rune
}
func newStringInput(s string) combinator.Input {
return &stringInput{
pos: 0,
runes: []rune(s),
}
}
func (s *stringInput) Current() (rune, int) {
if s.runes[0] == ' ' {
s.pos, s.runes = s.pos+1, s.runes[1:]
return s.Current()
}
return s.runes[0], s.pos
}
func (s *stringInput) Remaining() combinator.Input {
if len(s.runes) == 1 {
return nil
}
return &stringInput{
pos: s.pos + 1,
runes: s.runes[1:],
}
}
Output:
Index ¶
- type Bag
- type BagKey
- type BagVal
- type BindFunc
- type Empty
- type Input
- type List
- type MapFunc
- type Output
- type Parser
- func ALT(p ...Parser) Parser
- func CONCAT(p ...Parser) Parser
- func ExpectRune(r rune) Parser
- func ExpectRuneIn(runes ...rune) Parser
- func ExpectRuneInRange(lo, hi rune) Parser
- func ExpectRunes(runes ...rune) Parser
- func ExpectString(s string) Parser
- func NotExpectRune(r rune) Parser
- func NotExpectRuneIn(runes ...rune) Parser
- func NotExpectRuneInRange(lo, hi rune) Parser
- func NotExpectRunes(runes ...rune) Parser
- func NotExpectString(s string) Parser
- func OPT(p Parser) Parser
- func REP(p Parser) Parser
- func REP1(p Parser) Parser
- func (p Parser) ALT(q ...Parser) Parser
- func (p Parser) Bind(f BindFunc) Parser
- func (p Parser) CONCAT(q ...Parser) Parser
- func (p Parser) Flatten() Parser
- func (p Parser) Get(i int) Parser
- func (p Parser) Map(f MapFunc) Parser
- func (p Parser) OPT() Parser
- func (p Parser) REP() Parser
- func (p Parser) REP1() Parser
- func (p Parser) Select(i ...int) Parser
- type Result
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type BindFunc ¶ added in v0.12.2
BindFunc is a function that receives a parsing result and returns a new parser. It is more powerful than MapFunc as it can produce a new parser based on the value of the parsing result.
type Input ¶
type Input interface {
// Current returns the current rune from input along with its position in the input.
Current() (rune, int)
// Remaining returns the remaining of input. If no input left, it returns nil.
Remaining() Input
}
Input is the input to a parser function.
type MapFunc ¶ added in v0.12.2
MapFunc is a function that receives a parsing result and returns a new result.
A mapping error is considerd a non-syntactic error, causing combinators like ALT, OPT, REP, and REP1 to stop parsing and return the error immediately.
type Parser ¶
Parser is the type for a function that receives an input and returns an output.
var E Parser = func(in Input) (*Output, error) { _, pos := in.Current() return &Output{ Result: Result{ Val: Empty{}, Pos: pos, }, Remaining: in, }, nil }
E is the empty parser for consuming the empty string ε. It always succeeds without consuming any input.
func ALT ¶ added in v0.12.2
ALT composes a parser that alternates a sequence of parsers. It applies the first parser to the input and if it does not succeed, it applies the next parser to the same input, and continues parsing to the last parser. It stops at the first successful parsing and returns its result.
- EBNF Operator: Alternation
- EBNF Notation: p | q
func CONCAT ¶ added in v0.12.2
CONCAT composes a parser that concats a sequence of parsers. It applies the first parser to the input, then applies the next parser to the remaining of the input, and continues parsing to the last parser.
- EBNF Operator: Concatenation
- EBNF Notation: p q
func ExpectRune ¶
ExpectRune creates a parser that returns a successful result only if the input starts with the given rune.
func ExpectRuneIn ¶
ExpectRuneIn creates a parser that returns a successful result only if the input starts with any of the given runes.
func ExpectRuneInRange ¶
ExpectRuneInRange creates a parser that returns a successful result only if the input starts with a rune in the given range.
func ExpectRunes ¶
ExpectRunes creates a parser that returns a successful result only if the input starts with the given runes in the given order.
func ExpectString ¶
ExpectString creates a parser that returns a successful result only if the input starts with the given string.
func NotExpectRune ¶ added in v0.12.2
NotExpectRune creates a parser that returns a successful result only if the input does not start with the given rune.
func NotExpectRuneIn ¶ added in v0.12.2
NotExpectRuneIn creates a parser that returns a successful result only if the input does not start with any of the given runes.
func NotExpectRuneInRange ¶ added in v0.12.2
NotExpectRuneInRange creates a parser that returns a successful result only if the input does not start with a rune in the given range.
func NotExpectRunes ¶ added in v0.12.2
NotExpectRunes creates a parser that returns a successful result only if the input does not start with the given runes in the given order.
func NotExpectString ¶ added in v0.12.2
NotExpectString creates a parser that returns a successful result only if the input does not start with the given string.
func OPT ¶ added in v0.12.2
OPT composes a parser that applies parser p zero or one time to the input. If the parser does not succeed, it will return an empty result.
- EBNF Operator: Optional
- EBNF Notation: [ p ] or p?
OPT returns an error only if a SemanticError is encountered.
func REP ¶ added in v0.12.2
REP composes a parser that applies parser p zero or more times to the input and accumulates the results. If the parser does not succeed, it will return an empty result.
- EBNF Operator: Repetition (Kleene Star)
- EBNF Notation: { p } or p*
REP returns an error only if a SemanticError is encountered.
func REP1 ¶ added in v0.12.2
REP1 composes a parser that applies parser p one or more times to the input and accumulates the results. This does not allow parsing zero times (empty result).
- EBNF Operator: Kleene Plus
- EBNF Notation: p+
func (Parser) ALT ¶
ALT composes a parser that alternates parser p with a sequence of parsers. It applies parser p to the input and if it does not succeed, it applies the next parser to the same input, and continues parsing to the last parser. It stops at the first successful parsing and returns its result.
- EBNF Operator: Alternation
- EBNF Notation: p | q
func (Parser) Bind ¶
Bind composes a parser that uses parser p to parse the input and produces a new parser from the result. It then applies the new parser to the remaining input from the first parser.
Bind lets later parsing decisions depend on values parsed earlier. This is useful for context-sensitive checks that cannot be expressed by purely context-free grammar rules.
Quick refresher:
- Context-free grammars define productions where each rule's right-hand side does not depend on previously parsed values. They are sufficient for many language constructs.
- Context-sensitive constructs require the parser to adapt based on previously parsed data (for example, requiring a sequence to repeat N times where N was parsed earlier).
Bind is a convenient way to add lightweight, context-sensitive constraints (sometimes called syntax annotations) on top of a context-free parser. The bound function inspects the parsing result and returns a new parser that enforces the constraint.
Consider the following context-free production, where a number is followed by zero or more identifiers:
stmt → num id*
Context-sensitive constraint: the number indicates how many identifiers must follow.
Using Bind, you parse the number first, then return a parser that consumes exactly that many id occurrences. This keeps the grammar mostly context-free while enforcing a small context-sensitive constraint using a Bind-produced parser.
Example ¶
package main
import (
"fmt"
"github.com/moorara/algo/parser/combinator"
)
func main() {
// Production rule: copy → letter* "+" letter*
// Context-sensitive constraint: the two letter* must be identical.
letters := combinator.ExpectRuneInRange('a', 'z').REP().Map(toString)
copy := combinator.CONCAT(
letters,
combinator.ExpectRune('+'),
letters,
).Bind(func(r combinator.Result) combinator.Parser {
return func(in combinator.Input) (*combinator.Output, error) {
r0, _ := r.Get(0)
r2, _ := r.Get(2)
s0 := r0.Val.(string)
s1 := r2.Val.(string)
if s0 != s1 {
return nil, fmt.Errorf("mismatched strings: %q != %q", s0, s1)
}
return &combinator.Output{
Result: combinator.Result{
Val: s0 + s1,
Pos: r0.Pos,
},
Remaining: in,
}, nil
}
})
for _, s := range []string{"foo + bar", "baz + baz"} {
in := newStringInput(s)
out, err := copy(in)
if err == nil {
fmt.Println("Parsed:", out.Result.Val)
} else {
fmt.Println("Failed to parse:", s)
}
}
}
func toString(r combinator.Result) (combinator.Result, error) {
l := r.Val.(combinator.List)
runes := make([]rune, len(l))
for i, r := range l {
runes[i] = r.Val.(rune)
}
return combinator.Result{
Val: string(runes),
Pos: l[0].Pos,
}, nil
}
// stringInput implements the combinator.Input interface for strings.
type stringInput struct {
pos int
runes []rune
}
func newStringInput(s string) combinator.Input {
return &stringInput{
pos: 0,
runes: []rune(s),
}
}
func (s *stringInput) Current() (rune, int) {
if s.runes[0] == ' ' {
s.pos, s.runes = s.pos+1, s.runes[1:]
return s.Current()
}
return s.runes[0], s.pos
}
func (s *stringInput) Remaining() combinator.Input {
if len(s.runes) == 1 {
return nil
}
return &stringInput{
pos: s.pos + 1,
runes: s.runes[1:],
}
}
Output:
func (Parser) CONCAT ¶
CONCAT composes a parser that concats parser p to a sequence of parsers. It applies parser p to the input, then applies the next parser to the remaining of the input, and continues parsing to the last parser.
- EBNF Operator: Concatenation
- EBNF Notation: p q
func (Parser) Flatten ¶
Flatten composes a parser that applies parser p to the input and flattens all results into a single list. This can be used for accessing the values of symbols in the right-side of a production rule more intuitively.
func (Parser) Get ¶
Get composes a parser that applies parser p to the input and returns the value of a symbol from the right-side of the production rule. This can be used after CONCAT, REP, REP1, Flatten, and/or Select.
It will not have any effect if used after other operators and the result of parsing is not a list. If index is invalid, you will get the empty string ε.
func (Parser) Map ¶
Map composes a parser that uses parser p to parse the input and applies a map function to the result. If the parser does not succeed, the map function will not be applied.
Use Map to transform, annotate or convert parsing results into another form (for example: convert a matched rune into an int, build AST nodes, attach metadata, etc.).
The remaining input returned by p is preserved when the mapping succeeds.
Any error returned by the map function is considerd a non-syntactic error, causing combinators like ALT, OPT, REP, and REP1 to stop parsing and return the error immediately.
Example ¶
package main
import (
"fmt"
"github.com/moorara/algo/parser/combinator"
)
func main() {
// Production rule: num → [0-9]+
num := combinator.ExpectRuneInRange('0', '9').REP1().Map(func(r combinator.Result) (combinator.Result, error) {
l := r.Val.(combinator.List)
num := 0
for _, r := range l {
num = num*10 + int(r.Val.(rune)-'0')
}
return combinator.Result{Val: num, Pos: l[0].Pos}, nil
})
in := newStringInput("1024")
out, err := num(in)
if err != nil {
fmt.Println("Error:", err)
return
}
d := out.Result.Val.(int)
fmt.Println(d)
}
// stringInput implements the combinator.Input interface for strings.
type stringInput struct {
pos int
runes []rune
}
func newStringInput(s string) combinator.Input {
return &stringInput{
pos: 0,
runes: []rune(s),
}
}
func (s *stringInput) Current() (rune, int) {
if s.runes[0] == ' ' {
s.pos, s.runes = s.pos+1, s.runes[1:]
return s.Current()
}
return s.runes[0], s.pos
}
func (s *stringInput) Remaining() combinator.Input {
if len(s.runes) == 1 {
return nil
}
return &stringInput{
pos: s.pos + 1,
runes: s.runes[1:],
}
}
Output:
func (Parser) OPT ¶
OPT composes a parser that applies parser p zero or one time to the input. If the parser does not succeed, it will return an empty result.
- EBNF Operator: Optional
- EBNF Notation: [ p ] or p?
func (Parser) REP ¶
REP composes a parser that applies parser p zero or more times to the input and accumulates the results. If the parser does not succeed, it will return an empty result.
- EBNF Operator: Repetition (Kleene Star)
- EBNF Notation: { p } or p*
func (Parser) REP1 ¶
REP1 composes a parser that applies parser p one or more times to the input and accumulates the results. This does not allow parsing zero times (empty result).
- EBNF Operator: Kleene Plus
- EBNF Notation: p+
type Result ¶
type Result struct {
// Val is the actual result of a parser function.
// It can be an abstract syntax tree, a finite automata, or any other data structure.
Val any
// Pos is the first position in the source corresponding to the parsing result.
Pos int
// Bag is an optional collection of key-value pairs holding extra information and metadata about the parsing result.
// You should always check this field to be not nil before using it.
Bag Bag
}
Result is the result of parsing a production rule. It represents a production rule result.