README

Rematch

Rematch is a basic, stripped-down query language that performs order-independent matching against strings.

A Rematch expression is composed of alphanumeric, case-sensitive words and patterns that are matched against an arbitrary string. This matching occurs in linear time.

A "word" is a token delimited by whitespace. It is matched as a whole token, as though it were wrapped in Regex word boundaries (\b).

  • Word order is disregarded, unlike in most Regex flavors.
  • Word matching compares only alphanumeric tokens. Before matching, any non-alphanumeric characters in the string are replaced with whitespace, and the string is then split on whitespace.
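
The word-matching rules above can be sketched with the standard library alone. This is a minimal illustration, not Rematch's actual code: the names wordSet and hasAll are hypothetical, and treating "alphanumeric" as Unicode letters and digits is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// wordSet splits s into alphanumeric tokens. Every other rune acts as a
// delimiter, which has the same effect as replacing it with whitespace and
// then splitting on whitespace.
// Assumption: "alphanumeric" means Unicode letters and digits.
func wordSet(s string) map[string]bool {
	isDelim := func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	}
	set := make(map[string]bool)
	for _, tok := range strings.FieldsFunc(s, isDelim) {
		set[tok] = true
	}
	return set
}

// hasAll reports whether every word is present in the set, in any order.
func hasAll(set map[string]bool, words ...string) bool {
	for _, w := range words {
		if !set[w] {
			return false
		}
	}
	return true
}

func main() {
	set := wordSet("The cow jumped over the moon.")
	fmt.Println(hasAll(set, "moon", "cow")) // true: order does not matter
	fmt.Println(hasAll(set, "Moon"))        // false: matching is case sensitive
	fmt.Println(hasAll(set, "jump"))        // false: only whole tokens match
}
```

Storing the tokens in a set is what makes the lookup independent of word order.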

A "pattern" is a string that contains wildcard operators.

  • Unlike a word, a pattern is matched against the entire string rather than against word tokens.
  • This allows matching more complex text, such as URLs or words that contain punctuation or other non-alphanumeric characters.

Rematch supports the following grammar:

  • | OR operator, used between words. (This word OR this word must be present in any order)
  • + AND operator, used between words. (This word AND this word must be present in any order)
  • * wildcard (0 to n). When evaluating, * gets converted into a lazy match wildcard in regex: [\s\S]*?.
  • ? wildcard (0 to 1). When evaluating, ? gets converted into a regex [\s\S]?.
  • _ whitespace wildcard (0 to n). When evaluating, _ gets converted into a lazy whitespace match in regex: [\s]*?. Works like the asterisk * wildcard, but matches only whitespace instead of all characters.
  • () grouping to override standard operator precedence, which is left to right.
  • ! NOT operator, used before words. Use this with caution, as you may end up with broad query matches.
  • Apart from wildcards, words must be alphanumeric and must not contain whitespace (use _ to match whitespace).
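
The three wildcard substitutions in the list above can be illustrated with a small translator. This is a sketch under stated assumptions rather than Rematch's implementation: toRegex is a hypothetical helper, all other characters are escaped literally, and the resulting expression is searched for anywhere in the text (unanchored).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// toRegex translates a Rematch-style pattern into a Go regular expression
// using the wildcard substitutions from the grammar list. Every other
// character is escaped so that it matches literally.
// Assumption: the result is used as an unanchored search.
func toRegex(pattern string) (*regexp.Regexp, error) {
	var b strings.Builder
	for _, r := range pattern {
		switch r {
		case '*':
			b.WriteString(`[\s\S]*?`) // lazy match of any characters
		case '?':
			b.WriteString(`[\s\S]?`) // zero or one of any character
		case '_':
			b.WriteString(`[\s]*?`) // lazy match of whitespace only
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	return regexp.Compile(b.String())
}

func main() {
	cases := []struct{ pattern, text string }{
		{"colo?r", "a colourful day"},
		{"moon_cow", "moon   cow"},
		{"moon_cow", "moon-cow"},
		{"http*example", "see http://www.example.com"},
	}
	for _, c := range cases {
		re, err := toRegex(c.pattern)
		if err != nil {
			fmt.Println("bad pattern:", err)
			continue
		}
		fmt.Printf("%q against %q: %v\n", c.pattern, c.text, re.MatchString(c.text))
	}
}
```

Only the "moon-cow" case fails to match here, because _ accepts whitespace but not punctuation.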

Implementation

Rematch uses the Shunting-yard algorithm to parse a Rematch expression into tokens arranged in Reverse Polish notation (RPN). These tokens are then evaluated against an arbitrary string to produce a boolean result.
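
The parsing step can be sketched as a small Shunting-yard converter. This is an illustration, not the package's internals: tokenize, toRPN, and rpnString are hypothetical names, and the precedence table assumes that + and | share one level (so they group left to right) while the prefix ! binds tighter than both.

```go
package main

import (
	"fmt"
	"strings"
)

// prec gives operator precedence. Assumption: '+' and '|' share one level
// and the unary '!' binds tighter than both.
var prec = map[string]int{"|": 1, "+": 1, "!": 2}

// tokenize splits an expression into operators, parentheses, and words.
func tokenize(expr string) []string {
	var toks []string
	var word strings.Builder
	flush := func() {
		if word.Len() > 0 {
			toks = append(toks, word.String())
			word.Reset()
		}
	}
	for _, r := range expr {
		if strings.ContainsRune("|+!()", r) {
			flush()
			toks = append(toks, string(r))
		} else {
			word.WriteRune(r)
		}
	}
	flush()
	return toks
}

// toRPN converts infix tokens to Reverse Polish notation.
func toRPN(toks []string) ([]string, error) {
	var out, ops []string
	for _, t := range toks {
		switch t {
		case "(", "!":
			// '(' and the prefix '!' wait on the stack for their operand.
			ops = append(ops, t)
		case ")":
			for len(ops) > 0 && ops[len(ops)-1] != "(" {
				out, ops = append(out, ops[len(ops)-1]), ops[:len(ops)-1]
			}
			if len(ops) == 0 {
				return nil, fmt.Errorf("unbalanced ')'")
			}
			ops = ops[:len(ops)-1] // discard the matching "("
		case "|", "+":
			// Pop operators of equal or higher precedence (left associative).
			for len(ops) > 0 && ops[len(ops)-1] != "(" && prec[ops[len(ops)-1]] >= prec[t] {
				out, ops = append(out, ops[len(ops)-1]), ops[:len(ops)-1]
			}
			ops = append(ops, t)
		default:
			out = append(out, t) // a word or pattern operand
		}
	}
	for len(ops) > 0 {
		top := ops[len(ops)-1]
		if top == "(" {
			return nil, fmt.Errorf("unbalanced '('")
		}
		out, ops = append(out, top), ops[:len(ops)-1]
	}
	return out, nil
}

// rpnString is a convenience wrapper that returns the RPN as one string.
func rpnString(expr string) (string, error) {
	rpn, err := toRPN(tokenize(expr))
	return strings.Join(rpn, " "), err
}

func main() {
	for _, expr := range []string{"moon+cow", "a|b+c", "a|(b+c)", "!a+b"} {
		s, err := rpnString(expr)
		fmt.Printf("%-8s => %q (err: %v)\n", expr, s, err)
	}
}
```

Under these assumptions, "a|b+c" becomes "a b | c +", while the parenthesized "a|(b+c)" becomes "a b c + |".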

Rematch depends on Go's regexp package only for matching word tokens that contain wildcards. It does not transpile a Rematch expression into Regex, because Go's Regex flavor supports neither lookaheads nor order-independent word matching.

Getting Started

Installing

You can get the latest release of Rematch by using:

go get github.com/pixeltopic/rematch

import "github.com/pixeltopic/rematch"

Usage

// The words "moon" and "cow" must both be present in the string.
res, _ := rematch.EvalRawExpr("moon+cow", "The cow jumped over the moon.")
fmt.Println(res)

See /examples for more.

License

This project is licensed under the BSD 3-Clause License.

Acknowledgments

These resources were invaluable for the implementation of Rematch.

Documentation

Functions

func Eval

func Eval(expr *Expr, text *Text) (bool, error)

Eval matches an expression against text.

func EvalExpr

func EvalExpr(expr *Expr, s string) (bool, error)

EvalExpr matches an expression against a string.

func EvalRawExpr

func EvalRawExpr(expr, s string) (bool, error)

EvalRawExpr matches a raw expression against a string.

Types

type EvalError

type EvalError string

EvalError occurs when an expression fails to evaluate because its Reverse Polish notation (RPN) form is malformed.
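
As an illustration of what "improper RPN" can mean, the sketch below evaluates RPN tokens against a set of words and fails when an operator lacks operands or when more than one value remains. It is not the package's evaluator: evalError is a local stand-in for EvalError, evalRPN is a hypothetical name, and operands are looked up as plain words with no wildcard handling.

```go
package main

import "fmt"

// evalError is a local stand-in for the package's EvalError type.
type evalError string

func (e evalError) Error() string { return string(e) }

// evalRPN evaluates RPN tokens against a set of words. Operands are pushed
// as booleans and operators pop their arguments. Running out of operands,
// or ending with anything other than one value, means the RPN is malformed.
func evalRPN(rpn []string, words map[string]bool) (bool, error) {
	var stack []bool
	pop := func() (bool, error) {
		if len(stack) == 0 {
			return false, evalError("operator is missing an operand")
		}
		v := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		return v, nil
	}
	for _, tok := range rpn {
		switch tok {
		case "!":
			v, err := pop()
			if err != nil {
				return false, err
			}
			stack = append(stack, !v)
		case "+", "|":
			right, err := pop()
			if err != nil {
				return false, err
			}
			left, err := pop()
			if err != nil {
				return false, err
			}
			if tok == "+" {
				stack = append(stack, left && right)
			} else {
				stack = append(stack, left || right)
			}
		default:
			// Assumption: an operand is a plain word looked up in the set.
			stack = append(stack, words[tok])
		}
	}
	if len(stack) != 1 {
		return false, evalError("expression did not reduce to a single value")
	}
	return stack[0], nil
}

func main() {
	words := map[string]bool{"cow": true, "moon": true}

	ok, err := evalRPN([]string{"moon", "cow", "+"}, words)
	fmt.Println(ok, err)

	// "moon +" has only one operand for a binary operator.
	ok, err = evalRPN([]string{"moon", "+"}, words)
	fmt.Println(ok, err)
}
```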

func (EvalError) Error

func (e EvalError) Error() string

type Expr

type Expr struct {
	// contains filtered or unexported fields
}

Expr represents a Rematch expression.

func NewExpr

func NewExpr(rawExpr string) *Expr

NewExpr returns a new Expression for evaluation.

func (*Expr) Compile

func (e *Expr) Compile() error

Compile compiles an expression. A compiled expression will not be recompiled. This is useful when reusing an expression multiple times against different texts.

func (*Expr) Compiled

func (e *Expr) Compiled() bool

Compiled reports whether the expression has been compiled into Reverse Polish notation.

func (*Expr) MarshalJSON

func (e *Expr) MarshalJSON() ([]byte, error)

MarshalJSON implements JSON marshalling.

func (*Expr) RPN

func (e *Expr) RPN() []string

RPN returns the expression in Reverse Polish notation.

func (*Expr) Raw

func (e *Expr) Raw() string

Raw returns the raw expression string before conversion into Reverse Polish notation. Validation of a raw expression is not confirmed until it is compiled.

func (*Expr) UnmarshalJSON

func (e *Expr) UnmarshalJSON(data []byte) error

UnmarshalJSON implements JSON unmarshalling.

type Result

type Result struct {
	Match   bool
	Strings []string
}

Result is the output after evaluating a query.

Strings contains a non-unique/non-ordered collection of token matches from the given expression.
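
Because Strings may contain duplicates in no particular order, callers that need stable output may want to normalize it themselves. A minimal sketch: the Result type below is a local mirror of the struct shown above so the example compiles on its own, uniqueSorted is a hypothetical helper, and treating a non-match as having no strings is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Result mirrors the exported struct from the documentation so that this
// sketch compiles without importing the package.
type Result struct {
	Match   bool
	Strings []string
}

// uniqueSorted returns the distinct matched strings in ascending order,
// leaving the original slice untouched.
// Assumption: a nil or non-matching result is treated as having no strings.
func uniqueSorted(res *Result) []string {
	if res == nil || !res.Match {
		return nil
	}
	seen := make(map[string]bool, len(res.Strings))
	out := make([]string, 0, len(res.Strings))
	for _, s := range res.Strings {
		if !seen[s] {
			seen[s] = true
			out = append(out, s)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hand-built value standing in for what a FindAll-style call might return.
	res := &Result{Match: true, Strings: []string{"moon", "cow", "moon"}}
	fmt.Println(uniqueSorted(res)) // [cow moon]
}
```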

func ExprFindAll

func ExprFindAll(expr *Expr, s string) (*Result, error)

ExprFindAll matches an expression against a string, returning all matched tokens if the expression matches.

func FindAll

func FindAll(expr *Expr, text *Text) (*Result, error)

FindAll matches an expression against text, returning all matched tokens if the expression matches.

func RawExprFindAll

func RawExprFindAll(expr, s string) (*Result, error)

RawExprFindAll matches a raw expression against a string, returning all matched tokens if the expression matches.

type SyntaxError

type SyntaxError string

SyntaxError occurs when an expression is malformed.

func (SyntaxError) Error

func (e SyntaxError) Error() string
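
Since both error types are string-based and implement error, a caller could tell them apart with errors.As. The sketch below uses local mirrors of SyntaxError and EvalError so it compiles on its own; classify is a hypothetical helper, and whether the package returns these errors directly or wrapped is an assumption that errors.As handles either way.

```go
package main

import (
	"errors"
	"fmt"
)

// Local mirrors of the two string-based error types from the documentation.
type SyntaxError string

func (e SyntaxError) Error() string { return string(e) }

type EvalError string

func (e EvalError) Error() string { return string(e) }

// classify reports which kind of failure err represents.
func classify(err error) string {
	var se SyntaxError
	var ee EvalError
	switch {
	case err == nil:
		return "ok"
	case errors.As(err, &se):
		return "syntax"
	case errors.As(err, &ee):
		return "eval"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(classify(SyntaxError("unbalanced parenthesis")))
	fmt.Println(classify(fmt.Errorf("evaluating: %w", EvalError("improper RPN"))))
	fmt.Println(classify(errors.New("unrelated")))
}
```

A syntax failure points at the expression text itself, so it is usually worth reporting back to whoever wrote the query.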

type Text

type Text struct {
	// contains filtered or unexported fields
}

Text contains text to match against an Expression. This may be helpful if you want to match many different expressions against the same block of text without reprocessing it.

func NewText

func NewText(s string) *Text

NewText returns a text instance to match against an Expression.

Directories

Path Synopsis
examples
internal/set
internal/stack