mexpr

package module
v1.9.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 30, 2024 License: MIT Imports: 9 Imported by: 4

README

MicroExpr

Go Reference Go Report Card GitHub tag (latest by date)

A small & fast dependency-free library for parsing micro expressions.

This library was originally built for use in templating languages (e.g. for-loop variable selection, if-statement evaluation) so is minimal in what it supports by design. If you need a more full-featured expression parser, check out antonmedv/expr instead.

Features:

  • Fast, low-allocation parser and runtime
    • Many simple expressions are zero-allocation
  • Type checking during parsing
  • Simple
    • Easy to learn
    • Easy to read
    • No hiding complex branching logic in expressions
  • Intuitive, e.g. "id" + 1 => "id1"
  • Useful error messages, example:
    missing right operand
      not (1- <= 5)
      ......^
    
  • Fuzz tested to prevent crashes

Usage

Try it out on the Go Playground! You can find many example expressions in the tests.

import "github.com/danielgtaylor/mexpr"

// Convenience for lexing/parsing/running in one step:
result, err := mexpr.Eval("a > b", map[string]interface{}{
	"a": 2,
	"b": 1,
})

// Manual method with type checking and fast AST re-use. Error handling is
// omitted for brevity.
l := mexpr.NewLexer("a > b")
p := mexpr.NewParser(l)
ast, err := mexpr.Parse()
typeExamples = map[string]interface{}{
	"a": 2,
	"b": 1,
}
err := mexpr.TypeCheck(ast, typeExamples)
interpreter := mexpr.NewInterpreter(ast)
result1, err := interpreter.Run(map[string]interface{}{
	"a": 1,
	"b": 2,
})
result2, err := interpreter.Run(map[string]interfae{}{
	"a": 150,
	"b": 30,
})

Pretty errors use the passed-in input along with the error's offset to display an arrow of where within the expression the error occurs.

inputStr := "2 * foo"
_, err := mexpr.Eval(inputStr, nil)
if err != nil {
	fmt.Println(err.Pretty(inputStr))
}
Options

When running the interpreter a set of options can be passed in to change behavior. Available options:

Option Default Description
StrictMode false Be more strict, for example return an error when an identifier is not found rather than nil
UnquotedStrings false Enable the use of unquoted strings, i.e. return a string instead of nil for undefined parameters
// Using the top-level eval
mexpr.Eval(expression, inputObj, StrictMode)

// Using an interpreter instance
interpreter.Run(inputObj, StrictMode)

Syntax

Literals
  • strings double quoted e.g. "hello"
  • numbers e.g. 123, 2.5, 1_000_000

Internally all numbers are treated as float64, which means fewer conversions/casts when taking arbitrary JSON/YAML inputs.

Accessing properties
  • Use . between property names
  • Use [ and ] for indexes, which can be negative
foo.bar[0].value
Arithmetic operators
  • + (addition)
  • - (subtration)
  • * (multiplication)
  • / (division)
  • % (modulus)
  • ^ (power)
(1 + 2) * 3^2

Math operations between constants are precomputed when possible, so it is efficient to write meaningful operations like size <= 4 * 1024 * 1024. The interpreter will see this as size <= 4194304.

Comparison operators
  • == (equal)
  • != (not equal)
  • < (less than)
  • > (greater than)
  • <= (less than or equal to)
  • >= (greater than or equal to)
100 >= 42
Logical operators
  • not (negation)
  • and
  • or
1 < 2 and 3 < 4

Non-boolean values are converted to booleans. The following result in true:

  • numbers greater than zero
  • non-empty string
  • array with at least one item
  • map with at least one key/value pair
String operators
  • Indexing, e.g. foo[0]
  • Slicing, e.g. foo[1:2] or foo[2:]
  • .length pseudo-property, e.g. foo.length
  • .lower pseudo-property for lowercase, e.g. foo.lower
  • .upper pseudo-property for uppercase, e.g. foo.upper
  • + (concatenation)
  • in e.g. "f" in "foo"
  • contains e.g. "foo" contains "f"
  • startsWith e.g. "foo" startsWith "f"
  • endsWith e.g. "foo" endsWith "o"

Indexes are zero-based. Slice indexes are optional and are inclusive. foo[1:2] returns el if the foo is hello. Indexes can be negative, e.g. foo[-1] selects the last item in the array.

Any value concatenated with a string will result in a string. For example "id" + 1 will result in "id1".

There is no distinction between strings, bytes, or runes. Everything is treated as a string.

Date Comparisons

String dates & times can be compared if they follow RFC 3339 / ISO 8601 with or without timezones.

  • before, e.g. start before "2020-01-01"
  • after, e.g. created after "2020-01-01T12:00:00Z"
Array/slice operators
  • Indexing, e.g. foo[1]
  • Slicing, e.g. foo[1:2] or foo[2:]
  • .length pseudo-property, e.g. foo.length
  • + (concatenation)
  • in (has item), e.g. 1 in foo
  • contains e.g. foo contains 1

Indexes are zero-based. Slice indexes are optional and are inclusive. foo[1:2] returns [2, 3] if the foo is [1, 2, 3, 4]. Indexes can be negative, e.g. foo[-1] selects the last item in the array.

Array/slice filtering

A where clause can be used to filter the items in an array. The left side of the clause is the array to be filtered, while the right side is an expression to run on each item of the array. If the right side expression evaluates to true then the item is added to the result slice. For example:

// Get a list of items where the item.id is bigger than 3
items where id > 3

// More complex example
items where (id > 3 and labels contains "best")

This also makes it possible to implement one/any/all/none logic:

// One
(items where id > 3).length == 1

// Any
items where id > 3
(items where id > 3).length > 0

// All
(items where id > 3).length == items.length

// None
not (items where id > 3)
(items where id > 3).length == 0
Map operators
  • Accessing values, e.g. foo.bar.baz
  • in (has key), e.g. "key" in foo
  • contains e.g. foo contains "key"
Map wildcard filtering

A where clause can be used as a wildcard key to filter values for all keys in a map. The left side of the clause is the map to be filtered, while the right side is an expression to run on each value of the map. If the right side expression evaluates to true then the value is added to the result slice. For example, given:

{
  "operations": {
    "id1": { "method": "GET", "path": "/op1" },
    "id2": { "method": "PUT", "path": "/op2" },
    "id3": { "method": "DELETE", "path": "/op3" }
  }
}

You can run:

// Get all operations where the HTTP method is GET
operations where method == "GET"

And the result would be a slice of matched values:

[{ "method": "GET", "path": "/op1" }]

Performance

Performance compares favorably to antonmedv/expr for both Eval(...) and cached program performance, which is expected given the more limited feature set. The slow benchmarks include lexing/parsing/interpreting while the cached ones are just the interpreting step. The complex example expression used is non-trivial: foo.bar / (1 * 1024 * 1024) >= 1.0 and "v" in baz and baz.length > 3 and arr[2:].length == 1.

goos: darwin
goarch: amd64
pkg: github.com/danielgtaylor/mexpr
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
Benchmark/mexpr-field-slow-12           3673572       286.5 ns/op    144 B/op      6 allocs/op
Benchmark/_expr-field-slow-12            956689      1276 ns/op     1096 B/op     23 allocs/op

Benchmark/mexpr-comparison-slow-12      1000000      1020 ns/op      656 B/op     16 allocs/op
Benchmark/_expr-comparison-slow-12       383491      3069 ns/op     2224 B/op     38 allocs/op

Benchmark/mexpr-logical-slow-12         1000000      1063 ns/op      464 B/op     17 allocs/op
Benchmark/_expr-logical-slow-12          292824      4148 ns/op     2336 B/op     38 allocs/op

Benchmark/mexpr-math-slow-12            1000000      1035 ns/op      656 B/op     16 allocs/op
Benchmark/_expr-math-slow-12             399708      3004 ns/op     2184 B/op     38 allocs/op

Benchmark/mexpr-string-slow-12          1822945       655.6 ns/op    258 B/op     10 allocs/op
Benchmark/_expr-string-slow-12           428604      2508 ns/op     1640 B/op     35 allocs/op

Benchmark/mexpr-index-slow-12           2015856       592.0 ns/op    280 B/op     10 allocs/op
Benchmark/_expr-index-slow-12            517360      2301 ns/op     1872 B/op     30 allocs/op

Benchmark/mexpr-complex-slow-12          244039      5078 ns/op     2232 B/op     64 allocs/op
Benchmark/_expr-complex-slow-12           69387     16825 ns/op    14378 B/op    107 allocs/op

Benchmark/mexpr-field-cached-12       100000000        11.37 ns/op     0 B/op      0 allocs/op
Benchmark/_expr-field-cached-12         7761153       146.5 ns/op     48 B/op      2 allocs/op

Benchmark/mexpr-comparison-cached-12   38098502        30.93 ns/op     0 B/op      0 allocs/op
Benchmark/_expr-comparison-cached-12    4563463       251.0 ns/op     64 B/op      3 allocs/op

Benchmark/mexpr-logical-cached-12      37563720        31.35 ns/op     0 B/op      0 allocs/op
Benchmark/_expr-logical-cached-12      11000991       105.9 ns/op     32 B/op      1 allocs/op

Benchmark/mexpr-math-cached-12         24463279        47.41 ns/op     8 B/op      1 allocs/op
Benchmark/_expr-math-cached-12          4531693       268.0 ns/op     72 B/op      4 allocs/op

Benchmark/mexpr-string-cached-12       43399368        26.83 ns/op     0 B/op      0 allocs/op
Benchmark/_expr-string-cached-12        7302940       162.0 ns/op     48 B/op      2 allocs/op

Benchmark/mexpr-index-cached-12        45289230        25.67 ns/op     0 B/op      0 allocs/op
Benchmark/_expr-index-cached-12         6057562       180.0 ns/op     48 B/op      2 allocs/op

Benchmark/mexpr-complex-cached-12       4271955       278.7 ns/op     40 B/op      3 allocs/op
Benchmark/_expr-complex-cached-12       1456266       818.7 ns/op    208 B/op      9 allocs/op

On average mexpr is around 3-10x faster for both full parsing and cached performance.

References

These were a big help in understanding how Pratt parsers work:

Documentation

Overview

Package mexpr provides a simple expression parser.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Parse

func Parse(expression string, types any, options ...InterpreterOption) (*Node, Error)

Parse an expression and return the abstract syntax tree. If `types` is passed, it should be a set of representative example values for the input which will be used to type check the expression against.

Types

type Error

type Error interface {
	Error() string

	// Offset returns the character offset of the error within the experssion.
	Offset() uint16

	// Length returns the length in bytes after the offset where the error ends.
	Length() uint8

	// Pretty prints out a message with a pointer to the source location of the
	// error.
	Pretty(source string) string
}

Error represents an error at a specific location.

func Eval

func Eval(expression string, input any, options ...InterpreterOption) (any, Error)

Eval is a convenience function which lexes, parses, and executes an expression with the given input. If you plan to execute the expression multiple times consider caching the output of `Parse(...)` instead for a big speed improvement.

func NewError

func NewError(offset uint16, length uint8, format string, a ...interface{}) Error

NewError creates a new error at a specific location.

func Run

func Run(ast *Node, input any, options ...InterpreterOption) (any, Error)

Run executes an AST with the given input and returns the output.

func TypeCheck

func TypeCheck(ast *Node, types any, options ...InterpreterOption) Error

TypeCheck will take a parsed AST and type check against the given input structure with representative example values.

type Interpreter

type Interpreter interface {
	Run(value any) (any, Error)
}

Interpreter executes expression AST programs.

func NewInterpreter

func NewInterpreter(ast *Node, options ...InterpreterOption) Interpreter

NewInterpreter returns an interpreter for the given AST.

type InterpreterOption

type InterpreterOption int

InterpreterOption passes configuration settings when creating a new interpreter instance.

const (
	// StrictMode does extra checks like making sure identifiers exist.
	StrictMode InterpreterOption = iota

	// UnqoutedStrings enables the use of unquoted string values rather than
	// returning nil or a missing identifier error. Identifiers get priority
	// over unquoted strings.
	UnquotedStrings
)

type Lexer

type Lexer interface {
	// Next returns the next token from the expression. The returned token may
	// be changed in-place on subsequent calls and should not be stored.
	Next() (*Token, Error)
}

Lexer returns tokens from an input expression.

func NewLexer

func NewLexer(expression string) Lexer

NewLexer creates a new lexer for the given expression.

type Node

type Node struct {
	Type   NodeType
	Length uint8
	Offset uint16
	Left   *Node
	Right  *Node
	Value  interface{}
}

Node is a unit of the binary tree that makes up the abstract syntax tree.

func (Node) Dot added in v1.5.0

func (n Node) Dot(prefix string) string

Dot returns a graphviz-compatible dot output, which can be used to render the parse tree at e.g. https://dreampuf.github.io/GraphvizOnline/ or locally. You must wrap the output with `graph G {` and `}`.

func (Node) String added in v1.5.0

func (n Node) String() string

String converts the node to a string representation (basically the node name or the node's value for identifiers/literals).

type NodeType

type NodeType uint8

NodeType defines the type of the abstract syntax tree node.

const (
	NodeUnknown NodeType = iota
	NodeIdentifier
	NodeLiteral
	NodeAdd
	NodeSubtract
	NodeMultiply
	NodeDivide
	NodeModulus
	NodePower
	NodeEqual
	NodeNotEqual
	NodeLessThan
	NodeLessThanEqual
	NodeGreaterThan
	NodeGreaterThanEqual
	NodeAnd
	NodeOr
	NodeNot
	NodeFieldSelect
	NodeArrayIndex
	NodeSlice
	NodeSign
	NodeIn
	NodeContains
	NodeStartsWith
	NodeEndsWith
	NodeBefore
	NodeAfter
	NodeWhere
)

Possible node types

type Parser

type Parser interface {
	// Parse the expression and return the root node.
	Parse() (*Node, Error)
}

Parser takes a lexer and parses its tokens into an abstract syntax tree.

func NewParser

func NewParser(lexer Lexer) Parser

NewParser creates a new parser that uses the given lexer to get and process tokens into an abstract syntax tree.

type Token

type Token struct {
	Type   TokenType
	Length uint8
	Offset uint16
	Value  string
}

Token describes a single token produced by the lexer.

func (*Token) String

func (t *Token) String() string

type TokenType

type TokenType uint8

TokenType defines the type of token produced by the lexer.

const (
	TokenUnknown TokenType = iota
	TokenIdentifier
	TokenDot
	TokenNumber
	TokenString
	TokenLeftParen
	TokenRightParen
	TokenLeftBracket
	TokenRightBracket
	TokenSlice
	TokenAddSub
	TokenMulDiv
	TokenPower
	TokenComparison
	TokenAnd
	TokenOr
	TokenNot
	TokenStringCompare
	TokenWhere
	TokenEOF
)

Token

func (TokenType) String added in v1.5.0

func (t TokenType) String() string

type TypeChecker added in v1.2.0

type TypeChecker interface {
	Run(value any) Error
}

TypeChecker checks to ensure types used for operations will work.

func NewTypeChecker added in v1.2.0

func NewTypeChecker(ast *Node, options ...InterpreterOption) TypeChecker

NewTypeChecker returns a type checker for the given AST.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL