combinator

package
v0.14.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 19, 2025 License: ISC Imports: 3 Imported by: 0

Documentation

Overview

Package combinator provides data types and primitive constructs for building parser combinators. A parser combinator is a higher-order function that takes one or more parsers as input and produces a new composite parser as output.

A parser itself is a function that processes an input stream of characters and returns an output structure, such as an abstract syntax tree (AST), a finite automaton, or another representation of the parsed data.

Parser combinators enable a modular, top-down recursive descent parsing strategy. They allow complex parsers to be constructed from smaller, reusable components, making the parsing process easier to build, maintain, and test.

Top-down parsing involves constructing a parse tree for the input string, starting from the root node (representing the start symbol) and expanding the nodes in preorder. Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string.

A recursive descent parser is a top-down parser constructed from mutually recursive procedures (or their non-recursive equivalents), where each procedure corresponds to a nonterminal in the grammar. This structure closely mirrors the grammar, making it intuitive and directly aligned with the rules it recognizes.

For more details on parsing theory, refer to "Compilers: Principles, Techniques, and Tools (2nd Edition)".

Example
package main

import (
	"fmt"

	"github.com/moorara/algo/parser/combinator"
)

func main() {
	digit := combinator.ExpectRuneInRange('0', '9').Map(toDigit)      // digit → "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
	num := digit.REP1().Map(toNum)                                    // num → digit+
	expr := num.CONCAT(combinator.ExpectRune('+'), num).Map(evalExpr) // expr → num "+" num

	in := newStringInput("27 + 69")

	out, err := expr(in)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	n := out.Result.Val.(int)
	fmt.Println(n)
}

func toDigit(r combinator.Result) (combinator.Result, error) {
	v := r.Val.(rune)
	digit := int(v - '0')

	return combinator.Result{
		Val: digit,
		Pos: r.Pos,
	}, nil
}

func toNum(r combinator.Result) (combinator.Result, error) {
	l := r.Val.(combinator.List)

	var num int
	for _, r := range l {
		num = num*10 + r.Val.(int)
	}

	return combinator.Result{
		Val: num,
		Pos: l[0].Pos,
	}, nil
}

func evalExpr(r combinator.Result) (combinator.Result, error) {
	r0, _ := r.Get(0)
	r2, _ := r.Get(2)

	n0 := r0.Val.(int)
	n2 := r2.Val.(int)

	return combinator.Result{
		Val: int(n0 + n2),
		Pos: r0.Pos,
	}, nil
}

// stringInput implements the combinator.Input interface for strings.
type stringInput struct {
	pos   int
	runes []rune
}

func newStringInput(s string) combinator.Input {
	return &stringInput{
		pos:   0,
		runes: []rune(s),
	}
}

func (s *stringInput) Current() (rune, int) {

	if s.runes[0] == ' ' {
		s.pos, s.runes = s.pos+1, s.runes[1:]
		return s.Current()
	}

	return s.runes[0], s.pos
}

func (s *stringInput) Remaining() combinator.Input {
	if len(s.runes) == 1 {
		return nil
	}

	return &stringInput{
		pos:   s.pos + 1,
		runes: s.runes[1:],
	}
}

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Bag

type Bag map[BagKey]BagVal

Bag is the type for a collection of key-value pairs.

type BagKey

type BagKey string

BagKey is the type for the keys in Bag type.

type BagVal

type BagVal any

BagVal is the type for the values in Bag type.

type BindFunc added in v0.12.2

type BindFunc func(Result) Parser

BindFunc is a function that receives a parsing result and returns a new parser. It is more powerful than MapFunc as it can produce a new parser based on the value of the parsing result.

type Empty

type Empty struct{}

Empty represents the empty string ε.

type Input

type Input interface {
	// Current returns the current rune from input along with its position in the input.
	Current() (rune, int)
	// Remaining returns the remaining of input. If no input left, it returns nil.
	Remaining() Input
}

Input is the input to a parser function.

type List

type List []Result

List is the type for the result of concatenation or repetition.

type MapFunc added in v0.12.2

type MapFunc func(Result) (Result, error)

MapFunc is a function that receives a parsing result and returns a new result.

A mapping error is considerd a non-syntactic error, causing combinators like ALT, OPT, REP, and REP1 to stop parsing and return the error immediately.

type Output

type Output struct {
	Result    Result
	Remaining Input
}

Output is the output of a parser function.

type Parser

type Parser func(Input) (*Output, error)

Parser is the type for a function that receives an input and returns an output.

var E Parser = func(in Input) (*Output, error) {
	_, pos := in.Current()

	return &Output{
		Result: Result{
			Val: Empty{},
			Pos: pos,
		},
		Remaining: in,
	}, nil
}

E is the empty parser for consuming the empty string ε. It always succeeds without consuming any input.

func ALT added in v0.12.2

func ALT(p ...Parser) Parser

ALT composes a parser that alternates a sequence of parsers. It applies the first parser to the input and if it does not succeed, it applies the next parser to the same input, and continues parsing to the last parser. It stops at the first successful parsing and returns its result.

  • EBNF Operator: Alternation
  • EBNF Notation: p | q

func CONCAT added in v0.12.2

func CONCAT(p ...Parser) Parser

CONCAT composes a parser that concats a sequence of parsers. It applies the first parser to the input, then applies the next parser to the remaining of the input, and continues parsing to the last parser.

  • EBNF Operator: Concatenation
  • EBNF Notation: p q

func ExpectRune

func ExpectRune(r rune) Parser

ExpectRune creates a parser that returns a successful result only if the input starts with the given rune.

func ExpectRuneIn

func ExpectRuneIn(runes ...rune) Parser

ExpectRuneIn creates a parser that returns a successful result only if the input starts with any of the given runes.

func ExpectRuneInRange

func ExpectRuneInRange(lo, hi rune) Parser

ExpectRuneInRange creates a parser that returns a successful result only if the input starts with a rune in the given range.

func ExpectRunes

func ExpectRunes(runes ...rune) Parser

ExpectRunes creates a parser that returns a successful result only if the input starts with the given runes in the given order.

func ExpectString

func ExpectString(s string) Parser

ExpectString creates a parser that returns a successful result only if the input starts with the given string.

func NotExpectRune added in v0.12.2

func NotExpectRune(r rune) Parser

NotExpectRune creates a parser that returns a successful result only if the input does not start with the given rune.

func NotExpectRuneIn added in v0.12.2

func NotExpectRuneIn(runes ...rune) Parser

NotExpectRuneIn creates a parser that returns a successful result only if the input does not start with any of the given runes.

func NotExpectRuneInRange added in v0.12.2

func NotExpectRuneInRange(lo, hi rune) Parser

NotExpectRuneInRange creates a parser that returns a successful result only if the input does not start with a rune in the given range.

func NotExpectRunes added in v0.12.2

func NotExpectRunes(runes ...rune) Parser

NotExpectRunes creates a parser that returns a successful result only if the input does not start with the given runes in the given order.

func NotExpectString added in v0.12.2

func NotExpectString(s string) Parser

NotExpectString creates a parser that returns a successful result only if the input does not start with the given string.

func OPT added in v0.12.2

func OPT(p Parser) Parser

OPT composes a parser that applies parser p zero or one time to the input. If the parser does not succeed, it will return an empty result.

  • EBNF Operator: Optional
  • EBNF Notation: [ p ] or p?

OPT returns an error only if a SemanticError is encountered.

func REP added in v0.12.2

func REP(p Parser) Parser

REP composes a parser that applies parser p zero or more times to the input and accumulates the results. If the parser does not succeed, it will return an empty result.

  • EBNF Operator: Repetition (Kleene Star)
  • EBNF Notation: { p } or p*

REP returns an error only if a SemanticError is encountered.

func REP1 added in v0.12.2

func REP1(p Parser) Parser

REP1 composes a parser that applies parser p one or more times to the input and accumulates the results. This does not allow parsing zero times (empty result).

  • EBNF Operator: Kleene Plus
  • EBNF Notation: p+

func (Parser) ALT

func (p Parser) ALT(q ...Parser) Parser

ALT composes a parser that alternates parser p with a sequence of parsers. It applies parser p to the input and if it does not succeed, it applies the next parser to the same input, and continues parsing to the last parser. It stops at the first successful parsing and returns its result.

  • EBNF Operator: Alternation
  • EBNF Notation: p | q

func (Parser) Bind

func (p Parser) Bind(f BindFunc) Parser

Bind composes a parser that uses parser p to parse the input and produces a new parser from the result. It then applies the new parser to the remaining input from the first parser.

Bind lets later parsing decisions depend on values parsed earlier. This is useful for context-sensitive checks that cannot be expressed by purely context-free grammar rules.

Quick refresher:

  • Context-free grammars define productions where each rule's right-hand side does not depend on previously parsed values. They are sufficient for many language constructs.
  • Context-sensitive constructs require the parser to adapt based on previously parsed data (for example, requiring a sequence to repeat N times where N was parsed earlier).

Bind is a convenient way to add lightweight, context-sensitive constraints (sometimes called syntax annotations) on top of a context-free parser. The bound function inspects the parsing result and returns a new parser that enforces the constraint.

Consider the following context-free production, where a number is followed by zero or more identifiers:

stmt → num id*

Context-sensitive constraint: the number indicates how many identifiers must follow.

Using Bind, you parse the number first, then return a parser that consumes exactly that many id occurrences. This keeps the grammar mostly context-free while enforcing a small context-sensitive constraint using a Bind-produced parser.

Example
package main

import (
	"fmt"

	"github.com/moorara/algo/parser/combinator"
)

func main() {
	// Production rule: copy → letter* "+" letter*
	// Context-sensitive constraint: the two letter* must be identical.

	letters := combinator.ExpectRuneInRange('a', 'z').REP().Map(toString)
	copy := combinator.CONCAT(
		letters,
		combinator.ExpectRune('+'),
		letters,
	).Bind(func(r combinator.Result) combinator.Parser {
		return func(in combinator.Input) (*combinator.Output, error) {
			r0, _ := r.Get(0)
			r2, _ := r.Get(2)

			s0 := r0.Val.(string)
			s1 := r2.Val.(string)

			if s0 != s1 {
				return nil, fmt.Errorf("mismatched strings: %q != %q", s0, s1)
			}

			return &combinator.Output{
				Result: combinator.Result{
					Val: s0 + s1,
					Pos: r0.Pos,
				},
				Remaining: in,
			}, nil
		}
	})

	for _, s := range []string{"foo + bar", "baz + baz"} {
		in := newStringInput(s)
		out, err := copy(in)

		if err == nil {
			fmt.Println("Parsed:", out.Result.Val)
		} else {
			fmt.Println("Failed to parse:", s)
		}
	}
}

func toString(r combinator.Result) (combinator.Result, error) {
	l := r.Val.(combinator.List)

	runes := make([]rune, len(l))
	for i, r := range l {
		runes[i] = r.Val.(rune)
	}

	return combinator.Result{
		Val: string(runes),
		Pos: l[0].Pos,
	}, nil
}

// stringInput implements the combinator.Input interface for strings.
type stringInput struct {
	pos   int
	runes []rune
}

func newStringInput(s string) combinator.Input {
	return &stringInput{
		pos:   0,
		runes: []rune(s),
	}
}

func (s *stringInput) Current() (rune, int) {

	if s.runes[0] == ' ' {
		s.pos, s.runes = s.pos+1, s.runes[1:]
		return s.Current()
	}

	return s.runes[0], s.pos
}

func (s *stringInput) Remaining() combinator.Input {
	if len(s.runes) == 1 {
		return nil
	}

	return &stringInput{
		pos:   s.pos + 1,
		runes: s.runes[1:],
	}
}

func (Parser) CONCAT

func (p Parser) CONCAT(q ...Parser) Parser

CONCAT composes a parser that concats parser p to a sequence of parsers. It applies parser p to the input, then applies the next parser to the remaining of the input, and continues parsing to the last parser.

  • EBNF Operator: Concatenation
  • EBNF Notation: p q

func (Parser) Flatten

func (p Parser) Flatten() Parser

Flatten composes a parser that applies parser p to the input and flattens all results into a single list. This can be used for accessing the values of symbols in the right-side of a production rule more intuitively.

func (Parser) Get

func (p Parser) Get(i int) Parser

Get composes a parser that applies parser p to the input and returns the value of a symbol from the right-side of the production rule. This can be used after CONCAT, REP, REP1, Flatten, and/or Select.

It will not have any effect if used after other operators and the result of parsing is not a list. If index is invalid, you will get the empty string ε.

func (Parser) Map

func (p Parser) Map(f MapFunc) Parser

Map composes a parser that uses parser p to parse the input and applies a map function to the result. If the parser does not succeed, the map function will not be applied.

Use Map to transform, annotate or convert parsing results into another form (for example: convert a matched rune into an int, build AST nodes, attach metadata, etc.).

The remaining input returned by p is preserved when the mapping succeeds.

Any error returned by the map function is considerd a non-syntactic error, causing combinators like ALT, OPT, REP, and REP1 to stop parsing and return the error immediately.

Example
package main

import (
	"fmt"

	"github.com/moorara/algo/parser/combinator"
)

func main() {
	// Production rule: num → [0-9]+
	num := combinator.ExpectRuneInRange('0', '9').REP1().Map(func(r combinator.Result) (combinator.Result, error) {
		l := r.Val.(combinator.List)

		num := 0
		for _, r := range l {
			num = num*10 + int(r.Val.(rune)-'0')
		}

		return combinator.Result{Val: num, Pos: l[0].Pos}, nil
	})

	in := newStringInput("1024")

	out, err := num(in)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	d := out.Result.Val.(int)
	fmt.Println(d)
}

// stringInput implements the combinator.Input interface for strings.
type stringInput struct {
	pos   int
	runes []rune
}

func newStringInput(s string) combinator.Input {
	return &stringInput{
		pos:   0,
		runes: []rune(s),
	}
}

func (s *stringInput) Current() (rune, int) {

	if s.runes[0] == ' ' {
		s.pos, s.runes = s.pos+1, s.runes[1:]
		return s.Current()
	}

	return s.runes[0], s.pos
}

func (s *stringInput) Remaining() combinator.Input {
	if len(s.runes) == 1 {
		return nil
	}

	return &stringInput{
		pos:   s.pos + 1,
		runes: s.runes[1:],
	}
}

func (Parser) OPT

func (p Parser) OPT() Parser

OPT composes a parser that applies parser p zero or one time to the input. If the parser does not succeed, it will return an empty result.

  • EBNF Operator: Optional
  • EBNF Notation: [ p ] or p?

func (Parser) REP

func (p Parser) REP() Parser

REP composes a parser that applies parser p zero or more times to the input and accumulates the results. If the parser does not succeed, it will return an empty result.

  • EBNF Operator: Repetition (Kleene Star)
  • EBNF Notation: { p } or p*

func (Parser) REP1

func (p Parser) REP1() Parser

REP1 composes a parser that applies parser p one or more times to the input and accumulates the results. This does not allow parsing zero times (empty result).

  • EBNF Operator: Kleene Plus
  • EBNF Notation: p+

func (Parser) Select

func (p Parser) Select(i ...int) Parser

Select composes a parser that applies parser p to the input and returns a list of symbols from the right-side of the production rule.

This will not have any effect if the result of parsing is not a list. If indices are invalid, you will get the empty string ε.

type Result

type Result struct {
	// Val is the actual result of a parser function.
	// It can be an abstract syntax tree, a finite automata, or any other data structure.
	Val any
	// Pos is the first position in the source corresponding to the parsing result.
	Pos int
	// Bag is an optional collection of key-value pairs holding extra information and metadata about the parsing result.
	// You should always check this field to be not nil before using it.
	Bag Bag
}

Result is the result of parsing a production rule. It represents a production rule result.

func (Result) Get

func (r Result) Get(i int) (Result, bool)

Get returns the parsing result of a symbol from the right-side of a production rule. If the position i is out of bounds, the second return value is false.

Example:

// Production Rule: range → "{" num ( "," num? )? "}"
r = {2,4}
r.Get(1) = 2
r.Get(3) = 4

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL