ebnf

package module
v0.0.0-...-9658a1a Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 14, 2026 License: MIT Imports: 4 Imported by: 0

README

ebnf

A production-ready EBNF parser and transformation framework for Go. Parse text with EBNF grammars and transform parse trees into any structure—from simple calculators to full compilers and document processors.

Features

  • Powerful Tree Transformation: Multi-pass, top-down, context-aware transforms with rich error handling
  • Standard EBNF: Terminals, non-terminals, sequences, choices, repetition (*, +), optionals (?)
  • Modern Extensions:
    • Regex patterns: Instaparse-style #"regex" for complex patterns
    • Character classes: [a-z], [0-9], [^abc] with negation support
    • Case-insensitive literals: 'SELECT'i matches "select", "SELECT", "SeLeCt"
    • PEG features: Ordered choice (/), lookahead (!, &)
    • Comments: C-style (* comment *)
    • Multiple assignment operators: =, :=, ::=, <-
  • Hidden Tokens: <token> syntax and <rule> definitions to omit structural noise from AST
  • Production Ready: Comprehensive error messages with source positions and visual highlighting
  • Pure Go: No external dependencies or code generation required

Why This Library?

Most parser libraries give you a parse tree and leave you to figure out what to do with it. This library provides a complete transformation framework that makes it easy to go from text to your desired output:

// Define grammar -> Parse input -> Transform to result
// All with automatic type conversion, error handling, and position tracking

Perfect for:

  • Compilers and Interpreters: Multi-pass transforms, scoping, symbol tables
  • Document Processors: Markdown, reStructuredText, custom markup languages
  • Configuration Languages: TOML, YAML-like formats, DSLs
  • Data Formats: Custom serialization, query languages
  • Template Engines: With variable substitution and logic

Installation

go get github.com/wbrown/ebnf

Quick Start: Calculator in ~20 Lines

package main

import (
    "fmt"
    "strconv"
    "github.com/wbrown/ebnf"
    "github.com/wbrown/ebnf/parse"
)

func main() {
    // Load grammar
    grammar, _ := ebnf.LoadGrammar("examples/arithmetic.ebnf")
    parser := parse.New(grammar)
    
    // Define transformations - one function per rule
    transforms := parse.TransformMap{
        "number": func(s string) float64 {
            f, _ := strconv.ParseFloat(s, 64)
            return f
        },
        "add": func(a, b float64) float64 { return a + b },
        "sub": func(a, b float64) float64 { return a - b },
        "mul": func(a, b float64) float64 { return a * b },
        "div": func(a, b float64) float64 { return a / b },
        "neg": func(a float64) float64 { return -a },
    }
    
    // Parse and evaluate
    tree, _ := parser.Parse("(2 + 3) * 4 - -5", "expr")
    result, _ := parse.Transform(tree, transforms)
    fmt.Printf("Result: %.2f\n", result) // Result: 25.00
}

That's it! The transform system handles:

  • ✅ Type conversions (string → float64)
  • ✅ Bottom-up evaluation (leaves to root)
  • ✅ Tree traversal
  • ✅ Error propagation

Understanding Transforms: A Compilation Pipeline

Here's the key insight: This library transforms source text into structures your code can use. Parse once, transform once, then use the result efficiently.

Source Text → Parse Tree → Transform → Usable Output (AST, structs, bytecode, etc.)

The transforms run at parse time (once), not runtime. You're building the compiler frontend, not the runtime:

What You're Building Parse Transform To Then Use For
JSON Parser JSON text Go maps/structs Fast lookups at runtime
Template Engine Template syntax Compiled template Fast rendering (no re-parsing)
SQL Query SQL text Query AST Optimized execution
Config Processor Config file Validated structs Application startup
Compiler Source code AST/Bytecode/IR Optimization & codegen
Markdown Converter Markdown HTML/PDF Static site generation

Example: The calculator isn't evaluating expressions in a hot loop. It parses "2 + 3 * 4" once, transforms to the result 14.0, and you're done. For a real application, you'd transform to an AST or bytecode that your fast runtime interpreter/VM executes.

Real-World Pattern
// Parse + transform once (at load/compile time)
configTree, _ := parser.Parse(configFile, "config")
config, _ := parse.Transform(configTree, configTransforms)
// Result: validated *Config struct

// Use many times (at runtime, fast)
for request := range requests {
    if config.EnableFeature {  // No parsing overhead
        handleRequest(request)
    }
}

This is a compiler frontend tool, not a runtime interpreter. Your production code uses the transformed output, not the parser.

What You Can Build
  • Expression evaluators: Calculator, spreadsheet formulas, query languages
  • Compilers: Parse source → Transform to AST → Transform to bytecode → Transform to machine code (each pass is a transform)
  • Template engines: Parse template → Transform with variable substitution
  • Configuration processors: Parse config → Transform to validated structs
  • Document converters: Parse Markdown → Transform to HTML/PDF/LaTeX
  • DSLs: Parse domain language → Transform to actions/API calls

The calculator demonstrates this pattern. Use the same approach to build any language processor.

Transform System

The transform system provides multiple modes for different use cases.

Basic Usage
transforms := parse.TransformMap{
    "number": func(s string) float64 {
        f, _ := strconv.ParseFloat(s, 64)
        return f
    },
    "add": func(a, b float64) float64 { return a + b },
}

result, err := parse.Transform(tree, transforms)
Advanced: Access Source Positions
transforms := parse.TransformMap{
    "number": func(node *parse.Node, s string) (*Number, error) {
        val, err := strconv.Atoi(s)
        if err != nil {
            return nil, fmt.Errorf("invalid number at line %d", node.Line)
        }
        return &Number{Value: val, Line: node.Line}, nil
    },
}
Advanced: Multi-Pass Transformations

Perfect for document processors and multi-stage compilation:

// Pass 1: Transform inline elements
pass1 := parse.TransformMap{
    "bold": func(text string) string {
        return fmt.Sprintf("<strong>%s</strong>", text)
    },
    "italic": func(text string) string {
        return fmt.Sprintf("<em>%s</em>", text)
    },
}

// Pass 2: Transform block elements
pass2 := parse.TransformMap{
    "paragraph": func(inlines ...string) string {
        return fmt.Sprintf("<p>%s</p>", strings.Join(inlines, ""))
    },
    "document": func(blocks ...string) string {
        return strings.Join(blocks, "\n")
    },
}

// Apply both passes sequentially
html, err := parse.TransformMultiPass(tree, []parse.TransformMap{pass1, pass2})

See examples/markdown/ for a complete Markdown-to-HTML transformer.

Advanced: Context-Aware Transforms

Access parent nodes, siblings, and tree context:

transforms := parse.TransformMap{
    "list_item": func(ctx *parse.TransformContext, text string) string {
        // Add comma unless it's the last item
        if !ctx.IsLast() {
            return text + ", "
        }
        return text
    },
}
Advanced: Top-Down Transforms

Process parent before children—useful for scoping and symbol tables:

// Build symbol table top-down
var symbolTable = make(map[string]*Function)

transforms := parse.TransformMap{
    "function": func(ctx *parse.TransformContext, name string, body interface{}) {
        symbolTable[name] = &Function{Name: name}
        // Children will see this in symbol table
    },
}

result, err := parse.TransformTopDown(tree, transforms)
Rich Error Messages

Errors automatically include source position and visual highlighting:

transform error in rule "number" at line 3, column 5
  source: "abc + 123"
          ^^^
  error: strconv.Atoi: parsing "abc": invalid syntax
Complete Documentation

Real-World Examples

Markdown to HTML

Complete working example in examples/markdown/:

  • Multi-pass transformation (inline → block elements)
  • Context-aware list handling
  • Type preservation between passes
  • Rich error reporting
cd examples/markdown
go run markdown_to_html.go example.md
Calculator

Complete calculator with precedence and parentheses in examples/calculator/:

cd examples/calculator
go run main.go

Parsing Input

// Load grammar
grammar, err := ebnf.LoadGrammar("examples/json.ebnf")
if err != nil {
    log.Fatal(err)
}

// Create parser
parser := parse.New(grammar)

// Parse input
tree, err := parser.Parse(`{"name": "Alice", "age": 30}`, "json")
if err != nil {
    log.Fatal(err)
}

// The parse tree includes:
// - Rule names
// - Source positions (line, column, start, end)
// - Terminal values
// - Child nodes

EBNF Grammar Syntax

Basic Elements
(* Terminals *)
literal = "text" ;
char = 'a' ;

(* Character classes *)
letter = [a-zA-Z] ;
digit = [0-9] ;
not_quote = [^"] ;

(* Regex patterns *)
identifier = #"[a-zA-Z_][a-zA-Z0-9_]*" ;

(* Case-insensitive literals *)
keyword = 'SELECT'i ;             (* Matches select, SELECT, SeLeCt *)
sql = "FROM"i table ;

(* Repetition *)
zero_or_more = element* ;
one_or_more = element+ ;
optional = element? ;

(* Choices *)
unordered = a | b | c ;    (* Any match wins *)
ordered = a / b / c ;      (* First match wins - PEG style *)

(* Grouping *)
grouped = (a | b) c ;

(* Lookahead *)
not_keyword = !keyword identifier ;
starts_with_digit = &digit number ;
Hidden Tokens

Remove structural noise from your parse tree:

(* Hide specific references *)
expr = <"("> value <")"> ;        (* Parens hidden *)
rule = <whitespace>* term ;       (* Whitespace hidden *)

(* Hide entire rule wrapper *)
<value> = object | array | string | number ;
<whitespace> = " " | "\t" | "\n" ;

Hidden rule definitions (with <> around the name) remove the wrapper node entirely, flattening the tree.

Example Grammars

Arithmetic with Clean Parse Trees

examples/arithmetic.ebnf uses right recursion to create S-expression-like trees:

add = mul_div <"+"> add_sub ;
mul = unary <"*"> mul_div ;

This creates trees like add(2, mul(3, 4)) instead of flat lists, making transformations trivial.

JSON Parser

examples/json.ebnf demonstrates:

  • Recursive structures (objects, arrays)
  • String escape sequences
  • Hidden structural tokens for clean trees
Regex Patterns

examples/regex_demo.ebnf:

identifier = #"[a-zA-Z][a-zA-Z0-9_]*" ;
email = #"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" ;

Use Cases

This library is suitable for:

✅ Production Compilers and Interpreters
  • Multi-pass compilation (parse → resolve → typecheck → codegen)
  • Symbol table construction with scoping
  • Source position tracking for error messages
  • Type-safe transformations
✅ Document Processors
  • Markdown, reStructuredText, AsciiDoc
  • Custom markup languages
  • Multi-pass processing (inline → block → layout)
  • Context-aware formatting
✅ Configuration Languages
  • Custom DSLs for your application
  • TOML/YAML-like formats
  • Validation with rich error messages
  • Type-safe configuration objects
✅ Data Formats and Query Languages
  • Custom serialization formats
  • SQL-like query languages
  • Template languages
  • Expression evaluators
✅ Development Tools
  • Syntax highlighters
  • Code formatters
  • Linters and analyzers
  • Educational parsing tools

API Reference

Loading Grammars
// From string
grammar, err := ebnf.ParseString(grammarText)

// From file
grammar, err := ebnf.LoadGrammar("path/to/grammar.ebnf")

// Access rules
rule := grammar.GetRule("expr")
Parsing
// Create parser
parser := parse.New(grammar)

// Parse input
tree, err := parser.Parse(input, "start_rule")

// Print parse tree
parse.PrintAST(os.Stdout, tree)          // Full tree
parse.CompactPrintAST(os.Stdout, tree)   // Compact format
Parser Options
// Terminals are case-sensitive by default.
// Use WithCaseInsensitive(true) for global case-insensitive matching,
// or use the 'i' suffix on individual terminals: 'SELECT'i
parser := parse.New(grammar, parse.WithCaseInsensitive(true))

Per-terminal case-insensitivity uses the i suffix in the grammar:

(* Only keywords are case-insensitive, identifiers are case-sensitive *)
query = 'SELECT'i column 'FROM'i table ;
column = identifier ;
table = identifier ;
identifier = #"[a-zA-Z_]+" ;
Transforming
// Bottom-up (default)
result, err := parse.Transform(tree, transforms)

// Multi-pass
result, err := parse.TransformMultiPass(tree, []parse.TransformMap{pass1, pass2})

// Top-down
result, err := parse.TransformTopDown(tree, transforms)

Parse Tree Structure

The parser produces *parse.Node with:

type Node struct {
    Rule             string      // Grammar rule name
    Value            string      // Terminal text value
    TransformedValue interface{} // Typed value from transforms
    Children         []*Node     // Child nodes
    Line, Column     int         // Source position
    Start, End       int         // Character offsets
}

Comparison to Other Tools

Feature This Library ANTLR Yacc/Bison Hand-Written
Pure Go ❌ (Java tool) ❌ (C tool)
No code gen
Transform framework ⚠️ (listeners)
Multi-pass ⚠️ (manual)
Rich errors ⚠️ ⚠️ (manual)
Learning curve Low High High Medium
Runtime deps None Runtime JAR C runtime None

Performance

The transform system uses reflection for function signature detection but only during initialization. Transform execution is direct function calls with no reflection overhead.

For large inputs, consider:

  • Streaming parsers for multi-GB files
  • This library excels at medium-sized inputs (source files, documents, configs)

Contributing

Contributions welcome! Please submit issues or pull requests.

License

MIT License - See LICENSE file for details

  • Instaparse - Clojure parser library that inspired the transform system
  • ANTLR - Full-featured parser generator
  • PEG parsers - For background on ordered choice

Documentation

Overview

Package ebnf provides a parser for Extended Backus-Naur Form (EBNF) grammars.

This package supports standard EBNF syntax with modern extensions including regex patterns, PEG-style ordered choice, and hidden tokens.

Basic Usage

Parse a grammar from a string:

grammar, err := ebnf.ParseString(`
    expr = term ( "+" term | "-" term )* ;
    term = number ;
    number = #"[0-9]+" ;
`)

Load a grammar from a file:

grammar, err := ebnf.LoadGrammar("path/to/grammar.ebnf")

Access rules:

rule := grammar.GetRule("expr")
if rule != nil {
    fmt.Printf("Rule type: %s\n", ebnf.ExpressionType(rule.Expression))
}

EBNF Syntax Supported

Terminals:

  • "text" or 'text' - literal strings
  • #"regex" - regular expression patterns

Non-terminals:

  • identifier - reference to another rule

Character classes:

  • [a-z] - character ranges
  • [abc] - character set
  • [^abc] - negated character class

Repetition:

  • expr* - zero or more
  • expr+ - one or more
  • expr? - optional
  • {expr} - zero or more (alternative syntax)

Choices:

  • a | b - unordered choice
  • a / b - ordered choice (PEG-style, first match wins)

Grouping:

  • (expr) - group expressions

Lookahead:

  • !expr - negative lookahead
  • &expr - positive lookahead

Hidden tokens (omitted from AST):

  • <"token"> - hidden terminal
  • <rule> - hidden non-terminal
  • <expr> - hidden expression

Comments:

  • (* comment *) - C-style comments

Assignment operators:

  • =, :=, ::=, <- - all supported

AST Structure

The parser produces a typed AST with the following node types:

  • Grammar: Container for all rules
  • Rule: A named production rule
  • Terminal: Literal string or character
  • NonTerminal: Reference to another rule
  • Sequence: Concatenation of expressions
  • Choice: Unordered alternatives (|)
  • OrderedChoice: Ordered alternatives (/)
  • Optional: Optional expression (?)
  • Repetition: Zero or more (*)
  • OneOrMore: One or more (+)
  • CharClass: Character class [...]
  • Regex: Regular expression #"..."
  • Predicate: Negative lookahead (!)
  • PositiveLookahead: Positive lookahead (&)
  • Hidden: Marks hidden expressions

All AST node types implement the Expression interface.

Examples

See the examples/ directory for complete grammar examples including:

  • arithmetic.ebnf - Simple expression grammar
  • json.ebnf - Complete JSON grammar
  • regex_demo.ebnf - Regex pattern examples
  • instaparse_demo.ebnf - PEG-style features

Package ebnf implements a parser for EBNF grammars

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

func ExpressionType

func ExpressionType(expr Expression) string

ExpressionType returns a string describing the type of expression

func PrintGrammarSummary

func PrintGrammarSummary(g *Grammar)

PrintGrammarSummary prints a summary of the grammar

Types

type CharClass

type CharClass struct {
	Chars   []rune
	Ranges  []CharRange
	Negated bool
}

CharClass represents character classes like [a-zA-Z]

func (*CharClass) String

func (c *CharClass) String() string

String representation for debugging

type CharRange

type CharRange struct {
	From rune
	To   rune
}

type Choice

type Choice struct {
	Alternatives []Expression
}

Choice represents alternatives (|)

type Empty

type Empty struct{}

Empty represents an empty expression

type Expression

type Expression interface {
	// contains filtered or unexported methods
}

Expression is the interface for all EBNF expressions

type Grammar

type Grammar struct {
	Rules []*Rule
}

func LoadGrammar

func LoadGrammar(filename string) (*Grammar, error)

LoadGrammar loads and parses an EBNF grammar from a file

Example
package main

import (
	"github.com/wbrown/ebnf"
)

func main() {
	// Load a grammar from a file
	grammar, err := ebnf.LoadGrammar("examples/arithmetic.ebnf")
	if err != nil {
		panic(err)
	}

	// Access a specific rule
	exprRule := grammar.GetRule("expr")
	if exprRule != nil {
		// Process the rule...
		_ = exprRule
	}
}

func ParseFile

func ParseFile(filename string) (*Grammar, error)

ParseFile parses an EBNF grammar from a file

func ParseString

func ParseString(input string) (*Grammar, error)

ParseString parses an EBNF grammar from a string

Example
package main

import (
	"github.com/wbrown/ebnf"
)

func main() {
	// Define a simple grammar inline
	grammarText := `
		greeting = "Hello" <ws> name "!" ;
		name = #"[A-Z][a-z]+" ;
		ws = #"\s+" ;
	`

	grammar, err := ebnf.ParseString(grammarText)
	if err != nil {
		panic(err)
	}

	// Use the grammar...
	_ = grammar
}

func (*Grammar) GetRule

func (g *Grammar) GetRule(name string) *Rule

GetRule finds a rule by name

type Group

type Group struct {
	Expr Expression
}

Group represents a grouped expression in parentheses

type Hidden

type Hidden struct {
	Expr Expression
}

Hidden represents a hidden expression <expr>

type Lexer

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer for EBNF

func NewLexer

func NewLexer(input string) *Lexer

func (*Lexer) NextToken

func (l *Lexer) NextToken() (Token, error)

type NonTerminal

type NonTerminal struct {
	Name   string
	Hidden bool // true if wrapped in < >
}

NonTerminal represents a reference to another rule

type OneOrMore

type OneOrMore struct {
	Expr Expression
}

OneOrMore represents one or more (+)

type Optional

type Optional struct {
	Expr Expression
}

Optional represents an optional expression (?)

type OrderedChoice

type OrderedChoice struct {
	Alternatives []Expression
}

OrderedChoice represents ordered alternatives (/)

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser for EBNF grammars

func NewParser

func NewParser(input string) *Parser

func (*Parser) ParseGrammar

func (p *Parser) ParseGrammar() (*Grammar, error)

ParseGrammar parses the entire EBNF grammar

type PositiveLookahead

type PositiveLookahead struct {
	Expr Expression
}

PositiveLookahead represents a positive lookahead (&expr)

type Predicate

type Predicate struct {
	Expr Expression
}

Predicate represents a negative lookahead (!expr)

type Regex

type Regex struct {
	Pattern string
	Hidden  bool // true if wrapped in < >
}

Regex represents a regular expression pattern #"..."

type Repetition

type Repetition struct {
	Expr Expression
}

Repetition represents zero or more (*)

type Rule

type Rule struct {
	Name       string
	Expression Expression
	Hidden     bool // true if rule name is wrapped in < >
}

type Sequence

type Sequence struct {
	Elements []Expression
}

Sequence represents concatenation of expressions

type Terminal

type Terminal struct {
	Value           string
	Hidden          bool // true if wrapped in < >
	CaseInsensitive bool // true for case-insensitive matching ('x'i or "x"i)
}

Terminal represents a literal string or character

type Token

type Token struct {
	Type  TokenType
	Value string
	Line  int
	Col   int
}

type TokenType

type TokenType int

Token types

const (
	TokenEOF TokenType = iota
	TokenIdent
	TokenString
	TokenChar
	TokenStringCI // Case-insensitive string "..."i
	TokenCharCI   // Case-insensitive char '...'i
	TokenComment
	TokenEquals
	TokenPipe
	TokenSemi
	TokenLParen
	TokenRParen
	TokenLBracket
	TokenRBracket
	TokenLBrace
	TokenRBrace
	TokenLAngle
	TokenRAngle
	TokenPlus
	TokenStar
	TokenQuestion
	TokenExclam
	TokenComma
	TokenCaret
	TokenMinus
	TokenRegex     // New token type for regex literals #"..."
	TokenSlash     // New token type for ordered choice /
	TokenAmpersand // New token type for positive lookahead &
)

Directories

Path Synopsis
cmd
ebnf-inspect command
Command ebnf-inspect loads and displays information about EBNF grammar files.
Command ebnf-inspect loads and displays information about EBNF grammar files.
ebnf-parse command
examples
calculator command
edn command
Package main demonstrates parsing EDN (Extensible Data Notation) using the EBNF parser framework, with transforms that produce typed Go values.
Package main demonstrates parsing EDN (Extensible Data Notation) using the EBNF parser framework, with transforms that produce typed Go values.
markdown command

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL