egg

command module
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 30, 2026 License: BSD-3-Clause Imports: 6 Imported by: 0

README

Go Reference

logo

egg

Expression Grammar Generator Command

Documentation

Overview

Command egg is an Expression Grammar Generator. At the moment it handles LL(1) grammars.

Step by Step Example - JSON parser

We start in a directory containing the grammar file.

~/tmp/json$ ls
json.ebnf
~/tmp/json$

The JSON grammar is

~/tmp/json$ cat json.ebnf
Array  = '[' [ Value { ',' Value } ] ']' .
JSON = Value .
Member = str ':' Value .
Object = '{' [ Member { ',' Member } ] '}' .
Value  = str | number | Object | Array | "true" | "false" | "null" .
digit = `[0-9]` .
digits = `[0-9]+` .
exponent = ( 'e' | 'E' ) [ '+' | '-' ] digits .
fraction = '.' digits .
hex = `[0-9a-fA-F]` .
integer = digit | non_zero_digit digits | '-' digit | '-' non_zero_digit digits .
non_zero_digit = `[1-9]` .
number = ( integer | integer fraction | integer exponent | integer fraction exponent ) .
str = '"' { `[^"\\\x00-\x1F]` | `\\u` hex hex hex hex | `\\["\\\/bfnrt]` } '"'.
white_space = `( |\t|\r|\n)+` .
~/tmp/json$

Generate the parser from the above EBNF.

~/tmp/json$ egg -o parser.go -test parser_test.go -start JSON json.ebnf
~/tmp/json$ ls
json.ebnf  parser.go  parser_test.go
~/tmp/json$

Format the resulting Go files.

~/tmp/json$ gofmt -l -w .
parser.go
~/tmp/json$

Check the package docs.

~/tmp/json$ go doc Parser
type Parser struct {
	// Has unexported fields.
}

func (p *Parser) Array() (r []int32)
func (p *Parser) EOF() bool
func (p *Parser) JSON() (r []int32)
func (p *Parser) Member() (r []int32)
func (p *Parser) Object() (r []int32)
func (p *Parser) Parse(name string, src []byte) (ast []int32, err error)
func (p *Parser) Scan() (r scanner.Token)
func (p *Parser) Token(n int32) (r scanner.Token)
func (p *Parser) Value() (r []int32)
~/tmp/json$

Initialize the module and pull in any dependencies.

~/tmp/json$ go mod init example.com/json && go mod tidy
go: creating new go.mod: module example.com/json
go: to add module requirements and sums:
	go mod tidy
go: finding module for package modernc.org/scanner
go: found modernc.org/scanner in modernc.org/scanner v1.2.0
~/tmp/json$

Run some tests.

~/tmp/json$ go test -v -yes '^y_' -no '^n_' -may '^i_' -- ~/src/github.com/nst/JSONTestSuite/test_parsing/*.json
=== RUN   Test
    parser_test.go:61: tested files: 318
--- PASS: Test (0.34s)
PASS
ok  	example.com/json	0.357s
~/tmp/json$

Run some benchmarks.

~/tmp/json$ go test -bench . -- ~/src/modernc.org/egg/lib/testdata/json/bench/*.json
goos: linux
goarch: amd64
pkg: example.com/json
cpu: AMD Ryzen 9 3900X 12-Core Processor
Benchmark//home/jnml/src/modernc.org/egg/lib/testdata/json/bench/y_canada_geometry.json-24         	     387	   3058389 ns/op	  88.41 MB/s	 4169885 B/op	      54 allocs/op
Benchmark//home/jnml/src/modernc.org/egg/lib/testdata/json/bench/y_citm_catalog.json-24            	      74	  16133539 ns/op	 107.06 MB/s	17511805 B/op	      90 allocs/op
Benchmark//home/jnml/src/modernc.org/egg/lib/testdata/json/bench/y_golang_source.json-24           	      30	  34006219 ns/op	  57.06 MB/s	44393455 B/op	      72 allocs/op
Benchmark//home/jnml/src/modernc.org/egg/lib/testdata/json/bench/y_string_unicode.json-24          	   18906	     60256 ns/op	 300.78 MB/s	   11512 B/op	      30 allocs/op
Benchmark//home/jnml/src/modernc.org/egg/lib/testdata/json/bench/y_synthea_fhir.json-24            	      64	  18625119 ns/op	 107.84 MB/s	22372796 B/op	      92 allocs/op
Benchmark//home/jnml/src/modernc.org/egg/lib/testdata/json/bench/y_twitter_status.json-24          	     211	   5556977 ns/op	 113.64 MB/s	 6411640 B/op	      77 allocs/op
PASS
ok  	example.com/json	10.128s
~/tmp/json$

EBNF Grammars

Expression grammars are a slightly modified version of the EBNF used by the Go language specification:

Syntax      = { Production } .
Production  = production_name '=' [ Expression ] '.' .
Expression  = Term { '|' Term } .
Term        = Factor { Factor } .
Factor      = production_name | token | Group | Option | Repetition .
Group       = '(' Expression ')' .
Option      = '[' Expression ']' .
Repetition  = '{' Expression '}' .

Productions are expressions constructed from terms and the following operators, in increasing precedence:

|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)

Lowercase production names are used to identify lexical (terminal) tokens. Non-terminals are in CamelCase.

Interpreted strings literals, like "foo", are tokens and will match literally, in this example the rune sequence "foo".

Raw string literals, like `[0-9]`, are tokens and are interpreted as regexps, in this example matching a character class '0'-'9'. Repetitions, like in `re{min,max}` are not supported.

Rune literals, like 'a', are tokens and will match literally, in this example the rune 0x61.

Comments in the expression grammar start with the '#' character and stop at the end of the line.

Lexical production names starting with '_' are reserved.

LL(1) Parsers

Generated LL(1) parsers return an []int32. The slice encodes the AST as follows

  • A non-terminal symbol NT has a header encoded as two int32 values (-NT, n), where 'n' is the number of following slice items that represent the AST of the non-terminal symbol.
  • A terminal symbol T is represented by a positive index 'n'. The token for T can be obtained using the parser's Token method using 'n' as its argument.

This is testdata/parser/expr.ebnf

# A simple list of mathematical expressions.
# 'Start' is the entry point.

Start      = Expr { ";" Expr } .
Expr       = Term { operator Term } .
Term       = operand | "(" Expr ")" .

# Lexical tokens (lowercased)
operator    = "+" | "-" | "*" | "/" .
operand     = `[0-9]+` .
white_space = `[ \t\n\r]+` .

The parser generated from the above grammar in expr_test.go defines these symbols

// Symbols
const (
	expr_TOK_EOF     = expr_Symbol(0) // EOF
	expr_TOK_0028    = expr_Symbol(1) // '('
	expr_TOK_0029    = expr_Symbol(2) // ')'
	expr_TOK_003b    = expr_Symbol(3) // ';'
	expr_operand     = expr_Symbol(4) // operand
	expr_operator    = expr_Symbol(5) // operator
	expr_white_space = expr_Symbol(6) // white_space
	expr_Start       = expr_Symbol(7) // Start
	expr_Expr        = expr_Symbol(8) // Expr
	expr_Term        = expr_Symbol(9) // Term
)

Having this example walk function

func walk[S ~int32](p interface{ Token(int32) scanner.Token }, ast []int32, lvl int) {
	for len(ast) != 0 {
		next := int32(1)
		switch n := ast[0]; {
		case n < 0:
			fmt.Printf("%s%v\n", strings.Repeat("· ", lvl), S(-n))
			next = 2 + ast[1]
			walk[S](p, ast[2:next], lvl+1)
		default:
			tok := p.Token(n)
			fmt.Printf("%s %s [%v]\n", strings.Repeat("· ", lvl), tok, S(tok.Ch))
		}
		ast = ast[next:]
	}
}

And this AST from parsing testdata/expr/number="42\n" is

[-7 6 -8 3 -9 1 0 1]

Calling

walk[expr_Symbol](p, ast, 0)

Prints (the part starting with # is an explanatory comment, it is not actually printed)

Start                                      # -7, 6         Start=[-8 3 -9 1 0 1]
· Expr                                     # · -8, 3       · Expr=[-9 1 0]
· · Term                                   # · · -9, 1     · · Term=[0]
· · · number:1:1: "" "42" U+0004 [operand] # · · · 0       · · · Token[0]
· number:1:4: "\n" "" U+0000 [EOF]         # · 1           · Token[1]

The parser's Token method signature is

func(*receiver) Token(n int32) scanner.Token

Usage

$ egg [flags] <file>

The <file> argument names an Expression Grammar (EBNF) file.

Commnand Line Flags

  • -constprefix <string>

    Strings to prepend to names of generated constants. The default is an empty string.

  • -dbg

    Generate a parser producing debug traces during parsing. The parser calls function `trc` with the same signature as fmt.Printf. You need to provide the trc function somewhere in your package.

  • -methodprefix <string>

    Strings to prepend to names of generated parser methods. The default is an empty string.

  • -o <filename>

    Name the output parser file. Example

    $ egg -o parser.go ebnf.go

    This flag is optional, output defaults to os.Stdout.

  • -package <string>

    The string argument is the package name for the produced parser. The argument defaults to "main".

  • -start <string>

    The string argument names the top level non terminal in the input EBNF file. The argument defaults to "Start".

  • -test <filename>

    Write a rudimentary test/benchmark facility to <filename>. Example

    $ egg -o parser.go -test parser_test.go

  • -type <string>

    The string argument names the parser type. The argument defaults to "Parser".

  • -typeprefix <string>

    Strings to prepend to names of other generated types. The default is an empty string.

  • -varprefix <string>

    Strings to prepend to names of generated variables. The default is an empty string.

  • -white <string>

    The string argument names the lexical/terminal production in the input EBNF file which defines white space separating tokens. The argument defaults to "white_space".

Test Facitlity Command Line Flags

When using the produced test facility, all non flag argument on the command line are considered test inputs. Example

$ go test -- ~/foo/*.bar

The produced test command line flags are

  • -yes <string>

    A regular expression for input base names that must parse without errors. Defaults to "" meaning all test inputs are expected to parse without errors.

    Example: `-yes '^y_'` means all files with prefix `y_` must parse without errors.

  • -no <string>

    A regular expression for input base names that must parse with errors. Defaults to "" meaning no test inputs are expected to parse with errors.

    Example: `-no '^n_'` means all files with prefix `n_` must parse with errors.

  • -may <string>

    A regular expression for input base names that may parse with errors but that is not considered a failure. Defaults to "" meaning no test inputs are expected to parse with errors.

    Example: `-may '^i_'` means all files with prefix `i_` may parse with errors without failing the test.

Test Facitlity Benchmarking

The benchmark considers all non flag argument on the command line as test inputs. Example

$ go test -bench . -- ~/foo/*.bar

Directories

Path Synopsis
lib
Package egg supports Expression Grammar Generators.
Package egg supports Expression Grammar Generators.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL