participle


README

A dead simple parser package for Go


V2

This is version 2 of Participle.

It can be installed with:

$ go get github.com/alecthomas/participle/v2@latest

The latest version from v0 can be installed via:

$ go get github.com/alecthomas/participle@latest

Introduction

The goal of this package is to provide a simple, idiomatic and elegant way of defining parsers in Go.

Participle's method of defining grammars should be familiar to any Go programmer who has used the encoding/json package: struct field tags define what and how input is mapped to those same fields. This is not unusual for Go encoders, but is unusual for a parser.

Tutorial

A tutorial is available, walking through the creation of an .ini parser.

Tag syntax

Participle supports two forms of struct tag grammar syntax.

The easiest to read is when the grammar uses the entire struct tag content, eg.

Field string `@Ident @("," Ident)*`

However, this does not coexist well with other struct tags such as json and may cause issues with linters. If this is an issue you can use the parser:"" tag format instead. In this form single quotes can be used to quote literals, making the tags somewhat easier to write, eg.

Field string `parser:"@Ident @(',' Ident)*" json:"field"`

Overview

A grammar is an annotated Go structure used to both define the parser grammar, and be the AST output by the parser. As an example, following is the final INI parser from the tutorial.

type INI struct {
  Properties []*Property `@@*`
  Sections   []*Section  `@@*`
}

type Section struct {
  Identifier string      `"[" @Ident "]"`
  Properties []*Property `@@*`
}

type Property struct {
  Key   string `@Ident "="`
  Value *Value `@@`
}

type Value struct {
  String *string  `  @String`
  Float  *float64 `| @Float`
  Int    *int     `| @Int`
}

Note: Participle also supports named struct tags (eg. Hello string `parser:"@Ident"`).

A parser is constructed from a grammar and a lexer:

parser, err := participle.Build[INI]()

Once constructed, the parser is applied to input to produce an AST:

ast, err := parser.ParseString("", "size = 10")
// ast == &INI{
//   Properties: []*Property{
//     {Key: "size", Value: &Value{Int: &10}},
//   },
// }

Grammar syntax

Participle grammars are defined as tagged Go structures. Participle will first look for tags in the form parser:"...". It will then fall back to using the entire tag body.

The grammar format is:

  • @<expr> Capture expression into the field.
  • @@ Recursively capture using the field's own type.
  • <identifier> Match named lexer token.
  • ( ... ) Group.
  • "..." or '...' Match the literal (note that the lexer must emit tokens matching this literal exactly).
  • "...":<identifier> Match the literal, specifying the exact lexer token type to match.
  • <expr> <expr> ... Match expressions.
  • <expr> | <expr> | ... Match one of the alternatives. Each alternative is tried in order, with backtracking.
  • ~<expr> Match any token that is not the start of the expression (eg: @~";" matches anything but the ; character into the field).
  • (?= ... ) Positive lookahead group - requires the contents to match further input, without consuming it.
  • (?! ... ) Negative lookahead group - requires the contents not to match further input, without consuming it.

The following modifiers can be used after any expression:

  • * Expression can match zero or more times.
  • + Expression must match one or more times.
  • ? Expression can match zero or once.
  • ! Require a non-empty match (this is useful with a sequence of optional matches eg. ("a"? "b"? "c"?)!).

Notes:

  • Each struct is a single production, with each field applied in sequence.
  • @<expr> is the mechanism for capturing matches into the field.
  • If a struct field is not keyed with "parser", the entire struct tag will be used as the grammar fragment. This allows the grammar syntax to remain clear and simple to maintain.
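
For example, here is a short sketch combining capture, negation and repetition from the syntax above (the grammar and field names are illustrative, and the default lexer is assumed):

type Let struct {
  Name  string   `"let" @Ident "="`
  Value []string `( @~";" )+ ";"`
}

// Parsing `let x = 1 + 2;` would capture Name == "x" and
// Value == []string{"1", "+", "2"}.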

Capturing

Prefixing any expression in the grammar with @ will capture matching values for that expression into the corresponding field.

For example:

// The grammar definition.
type Grammar struct {
  Hello string `@Ident`
}

// The source text to parse.
source := "world"

// After parsing, the resulting AST.
result == &Grammar{
  Hello: "world",
}

For slice and string fields, each instance of @ will accumulate into the field (including repeated patterns). Accumulation into other types is not supported.

For integer and floating point types, a successful capture will be parsed with strconv.ParseInt() and strconv.ParseFloat() respectively.

A successful capture match into a bool field will set the field to true.

Tokens can also be captured directly into fields of type lexer.Token and []lexer.Token.

Custom control of how values are captured into fields can be achieved by a field type implementing the Capture interface (Capture(values []string) error).

Additionally, any field implementing the encoding.TextUnmarshaler interface will be capturable too. One caveat is that UnmarshalText() will be called once for each captured token, so eg. @(Ident Ident Ident) will be called three times.
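
For instance, here is a minimal sketch using time.Time (which implements encoding.TextUnmarshaler) to capture an RFC 3339 timestamp, assuming the default lexer plus the Unquote option documented under Options below:

type Event struct {
  When time.Time `@String` // UnmarshalText is called with the unquoted token text
}

var eventParser = participle.MustBuild[Event](participle.Unquote("String"))

// eventParser.ParseString("", `"2023-11-30T09:00:00Z"`) yields
// &Event{When: time.Date(2023, 11, 30, 9, 0, 0, 0, time.UTC)}.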

Capturing boolean value

By default, a boolean field is used to indicate that a match occurred, which turns out to be much more useful and common in Participle than parsing true or false literals. For example, parsing a variable declaration with a trailing optional syntax:

type Var struct {
  Name     string `"var" @Ident`
  Type     string `":" @Ident`
  Optional bool   `@"?"?`
}

In practice this gives more useful ASTs. If bool were parsed literally, you'd need some alternate type for Optional, such as string or a custom type.

To capture literal boolean values such as true or false, implement the Capture interface like so:

type Boolean bool

func (b *Boolean) Capture(values []string) error {
	*b = values[0] == "true"
	return nil
}

type Value struct {
	Float  *float64 `  @Float`
	Int    *int     `| @Int`
	String *string  `| @String`
	Bool   *Boolean `| @("true" | "false")`
}

"Union" types

A very common pattern in parsers is "union" types, an example of which is shown above in the Value type. A common way of expressing this in Go is via a sealed interface, with each member of the union implementing this interface.

eg. this is how the Value type above could be expressed as a sealed interface:

type Value interface { value() }

type Float struct { Value float64 `@Float` }
func (f Float) value() {}

type Int struct { Value int `@Int` }
func (f Int) value() {}

type String struct { Value string `@String` }
func (f String) value() {}

type Bool struct { Value Boolean `@("true" | "false")` }
func (f Bool) value() {}

Thanks to the efforts of Jacob Ryan McCollum, Participle now supports this pattern. Simply construct your parser with the Union[T](members ...T) option, eg.

parser := participle.MustBuild[AST](participle.Union[Value](Float{}, Int{}, String{}, Bool{}))

Custom parsers may also be defined for union types with the ParseTypeWith option.

Custom parsing

There are three ways of defining custom parsers for nodes in the grammar:

  1. Implement the Capture interface.
  2. Implement the Parseable interface.
  3. Use the ParseTypeWith option to specify a custom parser for union interface types (see the sketch below).
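
As a sketch of the third approach (the Node interface, Word type and parse function are illustrative, not part of Participle, and this assumes NextMatch is honoured by custom parse functions as it is for Parseable):

type Node interface{ node() }

type Word struct{ Text string }

func (Word) node() {}

type Doc struct {
  Nodes []Node `@@*`
}

var docParser = participle.MustBuild[Doc](
  participle.ParseTypeWith[Node](func(lex *lexer.PeekingLexer) (Node, error) {
    tok := lex.Peek()
    if tok.EOF() {
      // Signal that this production did not match.
      return nil, participle.NextMatch
    }
    lex.Next()
    return Word{Text: tok.Value}, nil
  }),
)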

Lexing

Participle relies on distinct lexing and parsing phases. The lexer takes raw bytes and produces tokens which the parser consumes. The parser transforms these tokens into Go values.

The default lexer, if one is not explicitly configured, is based on the Go text/scanner package and thus produces tokens for C/Go-like source code. This is surprisingly useful, but if you require more control over lexing, the stateful lexer included in participle/lexer should cover most other cases. If that in turn is not flexible enough, you can implement your own lexer.

Configure your parser with a lexer using the participle.Lexer() option.

To use your own Lexer you will need to implement two interfaces: Definition (and optionally StringsDefinition and BytesDefinition) and Lexer.
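
Here is a minimal sketch of both interfaces: a lexer that emits one "Word" token per whitespace-separated word. It is illustrative rather than production-ready, and token positions are elided for brevity.

package wordlexer

import (
	"io"
	"strings"

	"github.com/alecthomas/participle/v2/lexer"
)

// wordType is an arbitrary token type distinct from the reserved lexer.EOF.
const wordType lexer.TokenType = lexer.EOF - 1

type definition struct{}

// Symbols maps the token names usable in grammars to their token types.
func (definition) Symbols() map[string]lexer.TokenType {
	return map[string]lexer.TokenType{"EOF": lexer.EOF, "Word": wordType}
}

// Lex reads the entire input and returns a Lexer over its words.
func (definition) Lex(filename string, r io.Reader) (lexer.Lexer, error) {
	data, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	return &wordLexer{words: strings.Fields(string(data))}, nil
}

type wordLexer struct{ words []string }

// Next returns the next token, or an EOF token once input is exhausted.
func (w *wordLexer) Next() (lexer.Token, error) {
	if len(w.words) == 0 {
		return lexer.EOFToken(lexer.Position{}), nil
	}
	tok := lexer.Token{Type: wordType, Value: w.words[0]}
	w.words = w.words[1:]
	return tok, nil
}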

Stateful lexer

In addition to the default lexer, Participle includes an optional stateful/modal lexer which provides powerful yet convenient construction of most lexers. (Notably, indentation-based lexers cannot be expressed using the stateful lexer -- for discussion of how these lexers can be implemented, see #20.)

It is sometimes the case that a simple lexer cannot fully express the tokens required by a parser. The canonical example of this is interpolated strings within a larger language. eg.

let a = "hello ${name + ", ${last + "!"}"}"

This is impossible to tokenise with a normal lexer due to the arbitrarily deep nesting of expressions. To support this case Participle's lexer is now stateful by default.

The lexer is a state machine defined by a map of rules keyed by the state name. Each rule within the state includes the name of the produced token, the regex to match, and an optional operation to apply when the rule matches.

As a convenience, any Rule starting with a lowercase letter will be elided from output, though it is recommended to use participle.Elide() instead, as it better integrates with the parser.

Lexing starts in the Root group. Each rule is matched in order, with the first successful match producing a lexeme. If the matching rule has an associated Action it will be executed.

A state change can be introduced with the Action Push(state). Pop() will return to the previous state.

To reuse rules from another state, use Include(state).

A special named rule Return() can also be used as the final rule in a state to always return to the previous state.

As a special case, regexes containing backrefs in the form \N (where N is a digit) will match the corresponding capture group from the immediate parent group. This can be used to parse, among other things, heredocs. See the tests for an example of this, among others.
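
For example, here is a sketch of a heredoc lexer along these lines (the rule names are illustrative):

var heredocDef = lexer.MustStateful(lexer.Rules{
	"Root": {
		{"Heredoc", `<<(\w+)`, lexer.Push("Heredoc")},
	},
	"Heredoc": {
		{"End", `\b\1\b`, lexer.Pop()}, // \1 matches the delimiter captured in the parent group
		{"Whitespace", `\s+`, nil},
		{"Ident", `\w+`, nil},
	},
})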

Example stateful lexer

Here's a cut down example of the string interpolation described above. Refer to the stateful example for the corresponding parser.

var def = lexer.MustStateful(lexer.Rules{
	"Root": {
		{`String`, `"`, lexer.Push("String")},
	},
	"String": {
		{"Escaped", `\\.`, nil},
		{"StringEnd", `"`, lexer.Pop()},
		{"Expr", `\${`, lexer.Push("Expr")},
		{"Char", `[^$"\\]+`, nil},
	},
	"Expr": {
		lexer.Include("Root"),
		{`whitespace`, `\s+`, nil},
		{`Oper`, `[-+/*%]`, nil},
		{"Ident", `\w+`, nil},
		{"ExprEnd", `}`, lexer.Pop()},
	},
})
Example simple/non-stateful lexer

Other than the default and stateful lexers, it's easy to define your own stateless lexer using the lexer.MustSimple() and lexer.NewSimple() functions. These functions accept a slice of lexer.SimpleRule{} objects consisting of a key and a regex-style pattern.

Note: The stateful lexer replaces the old regex lexer.

For example, the lexer for a form of BASIC:

var basicLexer = lexer.MustSimple([]lexer.SimpleRule{
    {"Comment", `(?i)rem[^\n]*`},
    {"String", `"(\\"|[^"])*"`},
    {"Number", `[-+]?(\d*\.)?\d+`},
    {"Ident", `[a-zA-Z_]\w*`},
    {"Punct", `[-[!@#$%^&*()+_={}\|:;"'<,>.?/]|]`},
    {"EOL", `[\n\r]+`},
    {"whitespace", `[ \t]+`},
})
Experimental - code generation

Participle v2 now has experimental support for generating code to perform lexing.

This will generally provide around a 10x improvement in lexing performance while producing O(1) garbage.

To use:

  1. Serialize the stateful lexer definition to a JSON file (e.g. by passing it to json.Marshal).
  2. Run the participle command (see scripts/participle) to generate Go code from the lexer JSON definition. For example:
participle gen lexer <package name> [--name SomeCustomName] < mylexer.json | gofmt > mypackage/mylexer.go

(see genLexer in conformance_test.go for a more detailed example)

  3. When constructing your parser, use the generated lexer for your lexer definition, such as:
var ParserDef = participle.MustBuild[someGrammar](participle.Lexer(mylexer.SomeCustomnameLexer))

Consider contributing to the tests in conformance_test.go if they do not appear to cover the types of expressions you are using with the generated lexer.

Known limitations of the code generated lexer:

  • The lexer is always greedy. For example, the regex "[A-Z][A-Z][A-Z]?T" will not match "EST" in the generated lexer, because the ? operator matches greedily and does not "give back" to try other possibilities. You can overcome this by using alternation: e.g., "[A-Z][A-Z](?:[A-Z]T|T)" will produce correct results in both lexers (see #276 for more detail). This limitation is what allows the generated lexer to be very fast and memory efficient.
  • Backreferences in regular expressions are not currently supported

Options

The Parser's behaviour can be configured via Options.

Examples

There are several examples included, some of which are linked directly here. These examples should be run from the _examples subdirectory within a cloned copy of this repository.

Example    Description
BASIC      A lexer, parser and interpreter for a rudimentary dialect of BASIC.
EBNF       Parser for the form of EBNF used by Go.
Expr       A basic mathematical expression parser and evaluator.
GraphQL    Lexer and parser for GraphQL schemas.
HCL        A parser for the HashiCorp Configuration Language.
INI        An INI file parser.
Protobuf   A full Protobuf version 2 and 3 parser.
SQL        A very rudimentary SQL SELECT parser.
Stateful   A basic example of a stateful lexer and corresponding parser.
Thrift     A full Thrift parser.
TOML       A TOML parser.

Included below is a full GraphQL lexer and parser:

package main

import (
	"fmt"
	"os"

	"github.com/alecthomas/kong"
	"github.com/alecthomas/repr"

	"github.com/alecthomas/participle/v2"
	"github.com/alecthomas/participle/v2/lexer"
)

type File struct {
	Entries []*Entry `@@*`
}

type Entry struct {
	Type   *Type   `  @@`
	Schema *Schema `| @@`
	Enum   *Enum   `| @@`
	Scalar string  `| "scalar" @Ident`
}

type Enum struct {
	Name  string   `"enum" @Ident`
	Cases []string `"{" @Ident* "}"`
}

type Schema struct {
	Fields []*Field `"schema" "{" @@* "}"`
}

type Type struct {
	Name       string   `"type" @Ident`
	Implements string   `( "implements" @Ident )?`
	Fields     []*Field `"{" @@* "}"`
}

type Field struct {
	Name       string      `@Ident`
	Arguments  []*Argument `( "(" ( @@ ( "," @@ )* )? ")" )?`
	Type       *TypeRef    `":" @@`
	Annotation string      `( "@" @Ident )?`
}

type Argument struct {
	Name    string   `@Ident`
	Type    *TypeRef `":" @@`
	Default *Value   `( "=" @@ )`
}

type TypeRef struct {
	Array       *TypeRef `(   "[" @@ "]"`
	Type        string   `  | @Ident )`
	NonNullable bool     `( @"!" )?`
}

type Value struct {
	Symbol string `@Ident`
}

var (
	graphQLLexer = lexer.MustSimple([]lexer.SimpleRule{
		{"Comment", `(?:#|//)[^\n]*\n?`},
		{"Ident", `[a-zA-Z]\w*`},
		{"Number", `(?:\d*\.)?\d+`},
		{"Punct", `[-[!@#$%^&*()+_={}\|:;"'<,>.?/]|]`},
		{"Whitespace", `[ \t\n\r]+`},
	})
	parser = participle.MustBuild[File](
		participle.Lexer(graphQLLexer),
		participle.Elide("Comment", "Whitespace"),
		participle.UseLookahead(2),
	)
)

var cli struct {
	EBNF  bool     `help:"Dump EBNF."`
	Files []string `arg:"" optional:"" type:"existingfile" help:"GraphQL schema files to parse."`
}

func main() {
	ctx := kong.Parse(&cli)
	if cli.EBNF {
		fmt.Println(parser.String())
		ctx.Exit(0)
	}
	for _, file := range cli.Files {
		r, err := os.Open(file)
		ctx.FatalIfErrorf(err)
		ast, err := parser.Parse(file, r)
		r.Close()
		repr.Println(ast)
		ctx.FatalIfErrorf(err)
	}
}

Performance

One of the included examples is a complete Thrift parser (shell-style comments are not supported). This gives a convenient baseline for comparing against the PEG-based pigeon, which is the parser used by go-thrift. Additionally, the pigeon parser is generated code, while the Participle parser is built at run time.

You can run the benchmarks yourself, but here's the output on my machine:

BenchmarkParticipleThrift-12    	   5941	   201242 ns/op	 178088 B/op	   2390 allocs/op
BenchmarkGoThriftParser-12      	   3196	   379226 ns/op	 157560 B/op	   2644 allocs/op

On a real-life codebase of 47K lines of Thrift, Participle takes 200ms and go-thrift takes 630ms, which aligns quite closely with the benchmarks.

Concurrency

A compiled Parser instance can be used concurrently. A LexerDefinition can be used concurrently. A Lexer instance cannot be used concurrently.
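
For example, here is a sketch reusing the INI parser from the overview across goroutines (requires "sync"; error handling elided for brevity):

var iniParser = participle.MustBuild[INI]()

func parseAll(inputs []string) []*INI {
	results := make([]*INI, len(inputs))
	var wg sync.WaitGroup
	for i, input := range inputs {
		wg.Add(1)
		go func(i int, input string) {
			defer wg.Done()
			// The compiled parser is safe to share; each call produces an
			// independent AST.
			ast, _ := iniParser.ParseString("", input)
			results[i] = ast
		}(i, input)
	}
	wg.Wait()
	return results
}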

Error reporting

There are a few areas where Participle can provide useful feedback to users of your parser.

  1. Errors returned by Parser.Parse*() will be:
    1. of type Error, which contains positional information where available, and
    2. either a ParseError or a lexer.Error.
  2. Participle will make a best effort to return as much of the AST up to the error location as possible.
  3. Any node in the AST containing a field Pos lexer.Position [^1] will be automatically populated from the nearest matching token.
  4. Any node in the AST containing a field EndPos lexer.Position [^1] will be automatically populated from the token at the end of the node.
  5. Any node in the AST containing a field Tokens []lexer.Token will be automatically populated with all tokens captured by the node, including elided tokens.

[^1]: Either the concrete type or a type convertible to it, allowing user defined types to be used.

These related pieces of information can be combined to provide fairly comprehensive error reporting.
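
Putting points 3-5 together, a node carrying all three automatically populated fields might look like this (a sketch; only the field names and types matter):

type Node struct {
	Pos    lexer.Position // populated from the nearest matching token
	EndPos lexer.Position // populated from the token at the end of the node

	Name string `@Ident`

	Tokens []lexer.Token // all tokens captured by this node, including elided ones
}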

Comments

Comments can be difficult to capture as in most languages they may appear almost anywhere. There are three ways of capturing comments, with decreasing fidelity.

The first is to elide tokens in the parser, then add Tokens []lexer.Token as a field to each AST node. Comments will be included. This has the downside that there's no straightforward way to know where the comments are relative to non-comment tokens in that node.

The second way is to not elide comment tokens, and explicitly capture them at every location in the AST where they might occur. This has the downside that unless you place these captures in every possible valid location, users might insert valid comments that then fail to parse.

The third way is to elide comment tokens and capture them where they're semantically meaningful, such as for documentation comments. Participle supports explicitly matching elided tokens for this purpose.
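
For example, here is a sketch of the third approach, assuming a lexer definition (here called myLexer, hypothetical) that emits Comment tokens which are then elided; the grammar can still reference the elided token type explicitly where a documentation comment is meaningful:

type FuncDecl struct {
	Doc  string `@Comment?` // explicitly matches an otherwise elided Comment token
	Name string `"func" @Ident`
}

var funcParser = participle.MustBuild[FuncDecl](
	participle.Lexer(myLexer), // hypothetical lexer emitting Comment tokens
	participle.Elide("Comment"),
)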

Limitations

Internally, Participle is a recursive descent parser with backtracking (see UseLookahead(K)).

Among other things, this means that Participle grammars do not support left recursion. Left recursion must be eliminated by restructuring your grammar.
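
For example, a left-recursive rule such as Expr = Expr "+" Term | Term can be restructured as repetition (Term here stands for some hypothetical production):

type Expr struct {
	Head *Term   `@@`
	Tail []*Term `( "+" @@ )*`
}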

EBNF

The old EBNF lexer was removed in a major refactoring at 362b26 -- if you have an EBNF grammar you need to implement, you can either translate it into regex-style lexer.Rule{} syntax or implement your own EBNF lexer; the old EBNF lexer might serve as a starting point.

Participle supports outputting an EBNF grammar from a Participle parser. Once the parser is constructed simply call String().

Participle also includes a parser for this form of EBNF (naturally).

eg. the GraphQL example produces the following EBNF:

File = Entry* .
Entry = Type | Schema | Enum | "scalar" ident .
Type = "type" ident ("implements" ident)? "{" Field* "}" .
Field = ident ("(" (Argument ("," Argument)*)? ")")? ":" TypeRef ("@" ident)? .
Argument = ident ":" TypeRef ("=" Value)? .
TypeRef = "[" TypeRef "]" | ident "!"? .
Value = ident .
Schema = "schema" "{" Field* "}" .
Enum = "enum" ident "{" ident* "}" .

Syntax/Railroad Diagrams

Participle includes a command-line utility to take an EBNF representation of a Participle grammar (as returned by Parser.String()) and produce a Railroad Diagram using tabatkins/railroad-diagrams.

Here's what the GraphQL grammar looks like:

[EBNF railroad diagram]

Documentation

Overview

Package participle constructs parsers from definitions in struct tags and parses directly into those structs. The approach is philosophically similar to how other marshallers work in Go, "unmarshalling" an instance of a grammar into a struct.

The supported annotation syntax is:

  • `@<expr>` Capture expression into the field.
  • `@@` Recursively capture using the field's own type.
  • `<identifier>` Match named lexer token.
  • `( ... )` Group.
  • `"..."` Match the literal (note that the lexer must emit tokens matching this literal exactly).
  • `"...":<identifier>` Match the literal, specifying the exact lexer token type to match.
  • `<expr> <expr> ...` Match expressions.
  • `<expr> | <expr>` Match one of the alternatives.

The following modifiers can be used after any expression:

  • `*` Expression can match zero or more times.
  • `+` Expression must match one or more times.
  • `?` Expression can match zero or once.
  • `!` Require a non-empty match (this is useful with a sequence of optional matches eg. `("a"? "b"? "c"?)!`).

Here's an example of an EBNF grammar.

type Group struct {
    Expression *Expression `"(" @@ ")"`
}

type Option struct {
    Expression *Expression `"[" @@ "]"`
}

type Repetition struct {
    Expression *Expression `"{" @@ "}"`
}

type Literal struct {
    Start string `@String` // lexer.Lexer token "String"
    End   string `("…" @String)?`
}

type Term struct {
    Name       string      `  @Ident`
    Literal    *Literal    `| @@`
    Group      *Group      `| @@`
    Option     *Option     `| @@`
    Repetition *Repetition `| @@`
}

type Sequence struct {
    Terms []*Term `@@+`
}

type Expression struct {
    Alternatives []*Sequence `@@ ("|" @@)*`
}

type Expressions []*Expression

type Production struct {
    Name        string      `@Ident "="`
    Expressions Expressions `@@+ "."`
}

type EBNF struct {
    Productions []*Production `@@*`
}

Constants

const MaxLookahead = 99999

MaxLookahead can be used with UseLookahead to get pseudo-infinite lookahead without the risk of pathological cases causing a stack overflow.
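
A one-line usage sketch (Grammar is a placeholder type):

var p = participle.MustBuild[Grammar](participle.UseLookahead(participle.MaxLookahead))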

Variables

var (
	// MaxIterations limits the number of elements capturable by {}.
	MaxIterations = 1000000

	// NextMatch should be returned by Parseable.Parse() method implementations to indicate
	// that the node did not match and that other matches should be attempted, if appropriate.
	NextMatch = errors.New("no match") // nolint: golint
)

Functions

func FormatError

func FormatError(err Error) string

FormatError formats an error in the form "[<filename>:][<line>:<pos>:] <message>"

Types

type Capture

type Capture interface {
	Capture(values []string) error
}

Capture can be implemented by fields in order to transform captured tokens into field values.

type Error

type Error interface {
	error
	// Unadorned message.
	Message() string
	// Closest position to error location.
	Position() lexer.Position
}

Error represents an error while parsing.

The format of an Error is in the form "[<filename>:][<line>:<pos>:] <message>".

The error will contain positional information if available.

func Errorf

func Errorf(pos lexer.Position, format string, args ...interface{}) Error

Errorf creates a new Error at the given position.

func Wrapf

func Wrapf(pos lexer.Position, err error, format string, args ...interface{}) Error

Wrapf attempts to wrap an existing error in a new message.

If "err" is a participle.Error, its positional information will be used and "pos" will be ignored.

The returned error implements the Unwrap() method supported by the errors package.

type Mapper

type Mapper func(token lexer.Token) (lexer.Token, error)

Mapper function for mutating tokens before being applied to the AST.

type Option

type Option func(p *parserOptions) error

An Option to modify the behaviour of the Parser.

func CaseInsensitive

func CaseInsensitive(tokens ...string) Option

CaseInsensitive allows the specified token types to be matched case-insensitively.

Note that the lexer itself will also have to be case-insensitive; this option just controls whether literals in the grammar are matched case insensitively.

func Elide

func Elide(types ...string) Option

Elide drops tokens of the specified types.

func Lexer

func Lexer(def lexer.Definition) Option

Lexer is an Option that sets the lexer to use with the given grammar.

func Map

func Map(mapper Mapper, symbols ...string) Option

Map is an Option that configures the Parser to apply a mapping function to each Token from the lexer.

This can be useful to eg. upper-case all tokens of a certain type, or dequote strings.

"symbols" specifies the token symbols that the Mapper will be applied to. If empty, all tokens will be mapped.

func ParseTypeWith

func ParseTypeWith[T any](parseFn func(*lexer.PeekingLexer) (T, error)) Option

ParseTypeWith associates a custom parsing function with some interface type T. When the parser encounters a value of type T, it will use the given parse function to parse a value from the input.

The parse function may return anything it wishes as long as that value satisfies the interface T. However, only a single function can be defined for any type T. If you want to have multiple parse functions returning types that satisfy the same interface, you'll need to define new wrapper types for each one.

This can be useful if you want to parse a DSL within the larger grammar, or if you want to implement an optimized parsing scheme for some portion of the grammar.

func Union

func Union[T any](members ...T) Option

Union associates several member productions with some interface type T. Given members X, Y, Z and W for a union type U, the EBNF rule is:

U = X | Y | Z | W .

When the parser encounters a field of type T, it will attempt to parse each member in sequence and take the first match. Because of this, the order in which the members are defined is important. You must be careful to order your members appropriately.

An example of a bad parse that can happen if members are out of order:

If the first member matches A, and the second member matches A B, and the source string is "AB", then the parser will only match A, and will not try to parse the second member at all.

func Unquote

func Unquote(types ...string) Option

Unquote applies strconv.Unquote() to tokens of the given types.

Tokens of type "String" will be unquoted if no other types are provided.

func Upper

func Upper(types ...string) Option

Upper is an Option that upper-cases all tokens of the given type. Useful for case normalisation.

func UseLookahead

func UseLookahead(n int) Option

UseLookahead allows branch lookahead up to "n" tokens.

If parsing cannot be disambiguated before "n" tokens of lookahead, parsing will fail.

Note that increasing lookahead has a minor performance impact, but also reduces the accuracy of error reporting.

If "n" is negative, it will be treated as "infinite" lookahead. This can have a large impact on performance, and does not provide any protection against stack overflow during parsing. In most cases, using MaxLookahead will achieve the same results in practice, but with a concrete upper bound to prevent pathological behavior in the parser. Using infinite lookahead can be useful for testing, or for parsing especially ambiguous grammars. Use at your own risk!

type ParseError

type ParseError struct {
	Msg string
	Pos lexer.Position
}

ParseError is returned when a parse error occurs.

It is useful for differentiating between parse errors and other errors such as lexing and IO errors.

func (*ParseError) Error

func (p *ParseError) Error() string

func (*ParseError) Message

func (p *ParseError) Message() string

func (*ParseError) Position

func (p *ParseError) Position() lexer.Position

type ParseOption

type ParseOption func(p *parseContext)

ParseOption modifies how an individual parse is applied.

func AllowTrailing

func AllowTrailing(ok bool) ParseOption

AllowTrailing tokens without erroring.

That is, do not error if a full parse completes but additional tokens remain.

func Trace

func Trace(w io.Writer) ParseOption

Trace the parse to "w".

type Parseable

type Parseable interface {
	// Parse into the receiver.
	//
	// Should return NextMatch if no tokens matched and parsing should continue.
	// Nil should be returned if parsing was successful.
	Parse(lex *lexer.PeekingLexer) error
}

The Parseable interface can be implemented by any element in the grammar to provide custom parsing.
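
A sketch of an implementation: a hypothetical bracketed list that consumes raw tokens until a closing "]" (requires "fmt"):

type RawList []string

func (r *RawList) Parse(lex *lexer.PeekingLexer) error {
	if lex.Peek().Value != "[" {
		return participle.NextMatch // not our production; let other matches proceed
	}
	lex.Next() // consume "["
	for {
		tok := lex.Next()
		switch {
		case tok.EOF():
			return fmt.Errorf("unterminated list")
		case tok.Value == "]":
			return nil
		default:
			*r = append(*r, tok.Value)
		}
	}
}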

type Parser

type Parser[G any] struct {
	// contains filtered or unexported fields
}

A Parser for a particular grammar and lexer.

func Build

func Build[G any](options ...Option) (parser *Parser[G], err error)

Build constructs a parser for the given grammar.

If "Lexer()" is not provided as an option, a default lexer based on text/scanner will be used. This scans typical Go- like tokens.

See documentation for details.

func MustBuild

func MustBuild[G any](options ...Option) *Parser[G]

MustBuild calls Build[G](options...) and panics if an error occurs.

func ParserForProduction

func ParserForProduction[P, G any](parser *Parser[G]) (*Parser[P], error)

ParserForProduction returns a new parser for the given production in grammar G.

func (*Parser[G]) Lex

func (p *Parser[G]) Lex(filename string, r io.Reader) ([]lexer.Token, error)

Lex uses the parser's lexer to tokenise input. Parameter filename is used as an opaque prefix in error messages.

func (*Parser[G]) Lexer

func (p *Parser[G]) Lexer() lexer.Definition

Lexer returns the parser's builtin lexer.

func (*Parser[G]) Parse

func (p *Parser[G]) Parse(filename string, r io.Reader, options ...ParseOption) (v *G, err error)

Parse from r into grammar v which must be of the same type as the grammar passed to Build(). Parameter filename is used as an opaque prefix in error messages.

This may return an Error.

func (*Parser[G]) ParseBytes

func (p *Parser[G]) ParseBytes(filename string, b []byte, options ...ParseOption) (v *G, err error)

ParseBytes from b into grammar v which must be of the same type as the grammar passed to Build(). Parameter filename is used as an opaque prefix in error messages.

This may return an Error.

func (*Parser[G]) ParseFromLexer

func (p *Parser[G]) ParseFromLexer(lex *lexer.PeekingLexer, options ...ParseOption) (*G, error)

ParseFromLexer into grammar v which must be of the same type as the grammar passed to Build().

This may return an Error.

func (*Parser[G]) ParseString

func (p *Parser[G]) ParseString(filename string, s string, options ...ParseOption) (v *G, err error)

ParseString from s into grammar v which must be of the same type as the grammar passed to Build(). Parameter filename is used as an opaque prefix in error messages.

This may return an Error.

func (*Parser[G]) String

func (p *Parser[G]) String() string

String returns the EBNF for the grammar.

Productions are always upper cased. Lexer tokens are always lower case.

type UnexpectedTokenError

type UnexpectedTokenError struct {
	Unexpected lexer.Token
	Expect     string
	// contains filtered or unexported fields
}

UnexpectedTokenError is returned by Parse when an unexpected token is encountered.

This is useful for composing parsers in order to detect when a sub-parser has terminated.
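
For example, a sketch using the standard errors package:

var unexpected *participle.UnexpectedTokenError
if errors.As(err, &unexpected) {
	// The sub-parser stopped here; decide whether to hand off to another parser.
	fmt.Println("stopped at", unexpected.Position())
}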

func (*UnexpectedTokenError) Error

func (u *UnexpectedTokenError) Error() string

func (*UnexpectedTokenError) Message

func (u *UnexpectedTokenError) Message() string

func (*UnexpectedTokenError) Position

func (u *UnexpectedTokenError) Position() lexer.Position

Directories

Path          Synopsis
cmd
  railroad    Package main generates Railroad Diagrams from Participle grammar EBNF.
ebnf          Package ebnf contains the AST and parser for parsing the form of EBNF produced by Participle.
lexer         Package lexer defines interfaces and implementations used by Participle to perform lexing.
  internal    Code generated by Participle.
