participle

A dead simple parser package for Go

  1. Introduction
  2. Limitations
  3. Tutorial
  4. Overview
  5. Annotation syntax
  6. Capturing
  7. Lexing
  8. Options
  9. Example
  10. Performance

Introduction

The goal of this package is to provide a simple, idiomatic and elegant way of defining parsers in Go.

Participle's method of defining grammars should be familiar to any Go programmer who has used the encoding/json package: struct field tags define what and how input is mapped to those same fields. This is not unusual for Go encoders, but is unusual for a parser.

Limitations

Participle parsers are recursive descent, which means they do not support left recursion. Left recursion must be eliminated by restructuring your grammar.

There is experimental support for precomputed lookahead tables for disambiguation; enable it with the parser option participle.UseLookahead().
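For example, a left-recursive rule such as Expr = Expr "+" Term | Term can instead be expressed with repetition. A minimal sketch (the Expr and Term types below are illustrative, not part of the package, and assume the default lexer's Int token):

type Term struct {
  Value int `@Int`
}

type Expr struct {
  First *Term   `@@`
  Rest  []*Term `{ "+" @@ }`
}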

Tutorial

A tutorial is available, walking through the creation of an .ini parser.

Overview

A grammar is an annotated Go structure that both defines the parser's grammar and serves as the AST output by the parser. As an example, the following is the final INI parser from the tutorial.

type INI struct {
  Properties []*Property `{ @@ }`
  Sections   []*Section  `{ @@ }`
}

type Section struct {
  Identifier string      `"[" @Ident "]"`
  Properties []*Property `{ @@ }`
}

type Property struct {
  Key   string `@Ident "="`
  Value *Value `@@`
}

type Value struct {
  String *string  `  @String`
  Number *float64 `| @Float`
}

Note: Participle also supports named struct tags (e.g. Hello string `parser:"@Ident"`).

A parser is constructed from a grammar and a lexer:

parser, err := participle.Build(&INI{})

Once constructed, the parser is applied to input to produce an AST:

ast := &INI{}
err := parser.ParseString("size = 10", ast)
// ast == &INI{
//   Properties: []*Property{
//     {Key: "size", Value: &Value{Number: &10}},
//   },
// }

Annotation syntax

  • @<expr> Capture expression into the field.
  • @@ Recursively capture using the field's own type.
  • <identifier> Match named lexer token.
  • { ... } Match 0 or more times.
  • ( ... ) Group.
  • [ ... ] Optional.
  • "..."[:<identifier>] Match the literal, optionally specifying the exact lexer token type to match.
  • <expr> <expr> ... Match expressions.
  • <expr> | <expr> Match one of the alternatives.

Notes:

  • Each struct is a single production, with each field applied in sequence.
  • @<expr> is the mechanism for capturing matches into the field.
  • If a struct field is not keyed with "parser", the entire struct tag will be used as the grammar fragment. This allows the grammar syntax to remain clear and simple to maintain.
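As a sketch tying these notes together, the following hypothetical production matches input such as name = "a", "b" ! using sequencing, capturing, repetition and an optional element (the Entry type is illustrative only):

type Entry struct {
  Key    string   `@Ident "="`
  Values []string `@String { "," @String }`
  Bang   bool     `[ @"!" ]`
}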

Capturing

Prefixing any expression in the grammar with @ will capture matching values for that expression into the corresponding field.

For example:

// The grammar definition.
type Grammar struct {
  Hello string `@Ident`
}

// The source text to parse.
source := "world"

// After parsing, the resulting AST.
result == &Grammar{
  Hello: "world",
}

For slice and string fields, each instance of @ will accumulate into the field (including repeated patterns). Accumulation into other types is not supported.

A successful capture match into a boolean field will set the field to true.

For integer and floating point types, a successful capture will be parsed with strconv.ParseInt() and strconv.ParseFloat() respectively.
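The following sketch illustrates these capture behaviours, assuming the default text/scanner based lexer (which provides the Ident, Int and Float token types):

type Field struct {
  Names  []string `@Ident { "," @Ident }` // repeated captures accumulate into the slice
  Static bool     `[ @"static" ]`         // set to true if "static" matches
  Size   int      `@Int`                  // parsed with strconv.ParseInt()
  Scale  float64  `@Float`                // parsed with strconv.ParseFloat()
}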

Custom control of how values are captured into fields can be achieved by a field type implementing the Capture interface (Capture(values []string) error).
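For example, a minimal sketch of a custom boolean type (the Boolean and Config names are illustrative only):

type Boolean bool

func (b *Boolean) Capture(values []string) error {
  *b = values[0] == "TRUE"
  return nil
}

type Config struct {
  Enabled Boolean `@("TRUE" | "FALSE")`
}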

Lexing

Participle operates on tokens and thus relies on a lexer to convert character streams to tokens.

Three lexers are provided, varying in speed and flexibility. The fastest lexer is based on the text/scanner package but only allows tokens provided by that package. Next fastest is the regexp lexer (lexer.Regexp()). The slowest is currently the EBNF based lexer, but it has a large potential for optimisation through code generation.
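For example, a regexp lexer is built from a pattern whose named capture groups become token types, while unnamed groups match input but emit no tokens. A sketch adapted from the INI tutorial:

iniLexer := lexer.Must(lexer.Regexp(
  `(\s+)` +
    `|(?P<Ident>[a-zA-Z][a-zA-Z_\d]*)` +
    `|(?P<String>"(?:\\.|[^"])*")` +
    `|(?P<Float>\d+(?:\.\d+)?)` +
    `|(?P<Punct>[][=])`))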

To use your own lexer you will need to implement two interfaces: lexer.Definition and lexer.Lexer.

Options

The Parser's behaviour can be configured via Options.
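For example, a sketch combining several options, reusing the INI grammar from the Overview and the iniLexer definition from the Lexing section:

parser, err := participle.Build(&INI{},
  participle.Lexer(iniLexer),             // tokenise with the custom regexp lexer
  participle.Unquote(iniLexer, "String"), // strip quotes from String tokens
  participle.UseLookahead(),              // experimental lookahead tables
)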

Example

There are several examples included in the source. Of particular note is the ini parser, which also includes a custom lexer.

Included below is a parser for the form of EBNF used by exp/ebnf:

package main

import (
  "encoding/json"
  "os"

  "github.com/alecthomas/participle"
)

type Group struct {
  Expression *Expression `'(' @@ ')'`
}

type Option struct {
  Expression *Expression `'[' @@ ']'`
}

type Repetition struct {
  Expression *Expression `'{' @@ '}'`
}

type Literal struct {
  Start string `@String`
  End   string `[ '…' @String ]`
}

type Term struct {
  Name       string      `@Ident |`
  Literal    *Literal    `@@ |`
  Group      *Group      `@@ |`
  Option     *Option     `@@ |`
  Repetition *Repetition `@@`
}

type Sequence struct {
  Terms []*Term `@@ { @@ }`
}

type Expression struct {
  Alternatives []*Sequence `@@ { '|' @@ }`
}

type Expressions []*Expression

type Production struct {
  Name        string      `@Ident '='`
  Expressions Expressions `@@ { @@ } '.'`
}

type EBNF struct {
  Productions []*Production `{ @@ }`
}

func main() {
  parser, err := participle.Build(&EBNF{})
  if err != nil {
    panic(err)
  }

  ebnf := &EBNF{}
  err = parser.Parse(os.Stdin, ebnf)
  if err != nil {
    panic(err)
  }

  json.NewEncoder(os.Stdout).Encode(ebnf)
}

Performance

One of the included examples is a complete Thrift parser (shell-style comments are not supported). This gives a convenient baseline for comparison with the PEG-based pigeon, which generates the parser used by go-thrift. Note also that the pigeon parser is generated ahead of time, while the participle parser is constructed at run time.

You can run the benchmarks yourself, but here's the output on my machine:

BenchmarkParticipleThrift-4        10000      221818 ns/op     48880 B/op     1240 allocs/op
BenchmarkGoThriftParser-4           2000      804709 ns/op    170301 B/op     3086 allocs/op

On a real-life codebase of 47K lines of Thrift, Participle takes 200ms and go-thrift takes 630ms, which aligns quite closely with the benchmarks.

Documentation

Overview

Package participle constructs parsers from definitions in struct tags and parses directly into those structs. The approach is philosophically similar to how other marshallers work in Go, "unmarshalling" an instance of a grammar into a struct.

The supported annotation syntax is:

  • `@<expr>` Capture expression into the field.
  • `@@` Recursively capture using the field's own type.
  • `<identifier>` Match named lexer token.
  • `{ ... }` Match 0 or more times.
  • `( ... )` Group.
  • `[ ... ]` Optional.
  • `"..."[:<identifier>]` Match the literal, optionally specifying the exact lexer token type to match.
  • `<expr> <expr> ...` Match expressions.
  • `<expr> | <expr>` Match one of the alternatives.

Here's an example of an EBNF grammar.

type Group struct {
    Expression *Expression `"(" @@ ")"`
}

type Option struct {
    Expression *Expression `"[" @@ "]"`
}

type Repetition struct {
    Expression *Expression `"{" @@ "}"`
}

type Literal struct {
    Start string `@String` // lexer.Lexer token "String"
    End   string `[ "…" @String ]`
}

type Term struct {
    Name       string      `@Ident |`
    Literal    *Literal    `@@ |`
    Group      *Group      `@@ |`
    Option     *Option     `@@ |`
    Repetition *Repetition `@@`
}

type Sequence struct {
    Terms []*Term `@@ { @@ }`
}

type Expression struct {
    Alternatives []*Sequence `@@ { "|" @@ }`
}

type Expressions []*Expression

type Production struct {
    Name        string      `@Ident "="`
    Expressions Expressions `@@ { @@ } "."`
}

type EBNF struct {
    Productions []*Production `{ @@ }`
}

Index

Constants

This section is empty.

Variables

var (

	// NextMatch should be returned by Parseable.Parse() method implementations to indicate
	// that the node did not match and that other matches should be attempted, if appropriate.
	NextMatch = errors.New("no match") // nolint: golint
)

Functions

This section is empty.

Types

type Capture

type Capture interface {
	Capture(values []string) error
}

Capture can be implemented by fields in order to transform captured tokens into field values.

type Error

type Error string

func (Error) Error

func (e Error) Error() string

type Mapper

type Mapper func(token lexer.Token) lexer.Token

Mapper function for mutating tokens before being applied to the AST.

type Option

type Option func(p *Parser) error

An Option to modify the behaviour of the Parser.

func ClearMappers

func ClearMappers() Option

ClearMappers is an Option that resets all existing (including default) mappers.

func Lexer

func Lexer(def lexer.Definition) Option

Lexer is an Option that sets the lexer to use with the given grammar.

func Map

func Map(mappers ...Mapper) Option

Map is an Option that configures the Parser to apply a mapping function to each Token from the lexer.

This can be useful, for example, to upper-case all tokens of a certain type or to dequote strings.
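For example, a sketch of a mapper that upper-cases identifier tokens, assuming the default text/scanner based lexer (whose token types are text/scanner rune constants such as scanner.Ident):

parser, err := participle.Build(&Grammar{},
	participle.Map(func(t lexer.Token) lexer.Token {
		if t.Type == scanner.Ident {
			t.Value = strings.ToUpper(t.Value)
		}
		return t
	}),
)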

func Unquote

func Unquote(def lexer.Definition, types ...string) Option

Unquote applies strconv.Unquote() to tokens of the given types.

Tokens of type "String" will be unquoted if no other types are provided.

func Upper

func Upper(def lexer.Definition, types ...string) Option

Upper is an Option that upper-cases all tokens of the given type. Useful for case normalisation.

func UseLookahead

func UseLookahead() Option

UseLookahead builds lookahead tables for disambiguating branches.

NOTE: This is an experimental feature.

type Parseable

type Parseable interface {
	// Parse into the receiver.
	//
	// Should return NextMatch if no tokens matched and parsing should continue.
	// Nil should be returned if parsing was successful.
	Parse(lex lexer.PeekingLexer) error
}

The Parseable interface can be implemented by any element in the grammar to provide custom parsing.

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

A Parser for a particular grammar and lexer.

func Build

func Build(grammar interface{}, options ...Option) (parser *Parser, err error)

Build constructs a parser for the given grammar.

If "Lexer()" is not provided as an option, a default lexer based on text/scanner will be used. This scans typical Go- like tokens.

See the documentation for details.

func MustBuild

func MustBuild(grammar interface{}, options ...Option) *Parser

MustBuild calls Build(grammar, options...) and panics if an error occurs.
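This makes it convenient to construct a parser at package initialisation, for example:

var parser = participle.MustBuild(&EBNF{})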

func (*Parser) Lex

func (p *Parser) Lex(r io.Reader) ([]lexer.Token, error)

Lex uses the parser's lexer to tokenise input.

func (*Parser) Parse

func (p *Parser) Parse(r io.Reader, v interface{}) (err error)

Parse from r into grammar v, which must be of the same type as the grammar passed to participle.Build().

func (*Parser) ParseBytes

func (p *Parser) ParseBytes(b []byte, v interface{}) error

ParseBytes is a convenience around Parse().

func (*Parser) ParseString

func (p *Parser) ParseString(s string, v interface{}) error

ParseString is a convenience around Parse().

func (*Parser) String

func (p *Parser) String() string

String representation of the grammar.

Directories

Path          Synopsis
_examples
  expr        nolint: govet
  hcl         Package main implements a parser for HashiCorp's HCL configuration syntax.
  ini
  protobuf    nolint: govet
  sql         nolint: govet
  thrift      Package main implements a parser for Thrift files (https://thrift.apache.org/). It parses namespaces, exceptions, services, structs, consts, typedefs and enums, but is easily extensible to more.
lexer         Package lexer defines interfaces and implementations used by Participle to perform lexing.
