sregx

package module
Version: v0.3.0
Published: Jan 31, 2021 License: MIT Imports: 3 Imported by: 0

README

Structural Regular Expressions

sregx is a package and tool for using structural regular expressions as described by Rob Pike. It provides a small Go package for building structural regular expression commands, as well as a library for parsing and compiling sregx commands from the text format used in Pike's description. A CLI tool is also provided in ./cmd/sregx, allowing you to perform advanced text manipulation from the command line.

In a structural regular expression, regular expressions are composed using commands to perform tasks like advanced search and replace. A command has an input string and produces an output string. The following commands are supported:

  • p: prints the input string, and then returns the input string.
  • d: returns the empty string.
  • c/<s>/: returns the string <s>.
  • s/<p>/<s>/: returns a string where substrings matching the regular expression <p> have been replaced with <s>.
  • g/<p>/<cmd>: if <p> matches the input, returns the result of <cmd> evaluated on the input. Otherwise returns the input with no modification.
  • v/<p>/<cmd>: if <p> does not match the input, returns the result of <cmd> evaluated on the input. Otherwise returns the input with no modification.
  • x/<p>/<cmd>: returns a string where all substrings matching the regular expression <p> have been replaced with the return value of <cmd> applied to the particular substring.
  • y/<p>/<cmd>: returns a string where each part of the string that is not matched by <p> is replaced by applying <cmd> to the particular unmatched string.
  • n[N:M]<cmd>: returns the input with the byte range [N:M) replaced by the result of <cmd> applied to that range. Accepts negative numbers to refer to offsets from the end of the input. Offsets are zero-indexed.
  • l[N:M]<cmd>: returns the input with the lines from line N to line M (exclusive) replaced by the result of <cmd> applied to those lines. Assumes newlines are represented with the \n character. Accepts negative numbers to refer to offsets from the last line of the input. Lines are zero-indexed.
  • u/<sh>/: executes the shell command <sh> with the input as stdin and returns the resulting stdout of the command. Shell commands use a simple syntax where single or double quotes can be used to group arguments, and environment variables are accessible with $. This command is only directly available as part of the sregx CLI tool.

The commands n[...], l[...], and u are additions to the original description of structural regular expressions.
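
To make the command semantics above concrete, here is a minimal sketch of c, g, and x written against the standard library only. The lowercase types are hypothetical stand-ins that follow the descriptions in the list; they are not the package's own code, which may handle edge cases differently.

```go
package main

import "regexp"

// Command mirrors the documented interface: bytes in, bytes out.
type Command interface {
	Evaluate(b []byte) []byte
}

// c sketches the c command: it ignores its input and returns Change.
type c struct{ Change []byte }

func (cmd c) Evaluate(b []byte) []byte { return cmd.Change }

// g sketches the g command: run Cmd on the whole input only if Patt matches.
type g struct {
	Patt *regexp.Regexp
	Cmd  Command
}

func (cmd g) Evaluate(b []byte) []byte {
	if cmd.Patt.Match(b) {
		return cmd.Cmd.Evaluate(b)
	}
	return b
}

// x sketches the x command: replace every match of Patt with Cmd's output.
type x struct {
	Patt *regexp.Regexp
	Cmd  Command
}

func (cmd x) Evaluate(b []byte) []byte {
	return cmd.Patt.ReplaceAllFunc(b, cmd.Cmd.Evaluate)
}

// capitalizeI builds the command tree for x/[A-Za-z]+/ g/^i$/ c/I/.
func capitalizeI() Command {
	return x{
		Patt: regexp.MustCompile(`[A-Za-z]+`),
		Cmd: g{
			Patt: regexp.MustCompile(`^i$`),
			Cmd:  c{Change: []byte("I")},
		},
	}
}
```

Each command only sees the bytes handed to it by its parent, which is why the inner pattern ^i$ can anchor against a single word rather than the whole input.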

The sregx tool also provides another augmentation to the original sregx description from Pike: command pipelines. A command may be given as <cmd> | <cmd> | ... where the input of each command is the output of the previous one.
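
A pipeline can be pictured as a loop that feeds each command's output into the next. The sketch below assumes nothing beyond the standard library; pipeline and funcCommand are hypothetical names used for illustration only.

```go
package main

import "bytes"

// Command mirrors the documented interface: bytes in, bytes out.
type Command interface {
	Evaluate(b []byte) []byte
}

// funcCommand adapts a plain function to Command (a hypothetical helper).
type funcCommand func(b []byte) []byte

func (f funcCommand) Evaluate(b []byte) []byte { return f(b) }

// pipeline evaluates its commands left to right, like <cmd> | <cmd> | ...
type pipeline []Command

func (p pipeline) Evaluate(b []byte) []byte {
	for _, cmd := range p {
		b = cmd.Evaluate(b)
	}
	return b
}

// trimThenUpper chains two transformations: trim whitespace, then upper-case.
func trimThenUpper() Command {
	return pipeline{
		funcCommand(bytes.TrimSpace),
		funcCommand(bytes.ToUpper),
	}
}
```

Because a pipeline is itself a Command in this sketch, it can be nested inside other commands like any other node of the tree.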

Examples

Most of these examples are from Pike's description, so you can look there for more detailed explanation. Since p is the only command that prints, technically you must append | p to commands that search and replace, because otherwise nothing will be printed. However, since you will probably forget to do this, the sregx tool will print the result of the final command before terminating if there were no uses of p anywhere within the command. Thus when using the CLI tool you can omit the | p in the following commands and still see the result.

Print all lines that contain "string":

x/.*\n/ g/string/p

Delete all occurrences of "string" and print the result:

x/string/d | p

Replace all occurrences of "foo" with "bar" in lines 5 through 9 (zero-indexed; the end of the range is exclusive):

l[5:10]s/foo/bar/ | p

Print all lines containing "rob" but not "robot":

x/.*\n/ g/rob/ v/robot/p

Capitalize all occurrences of the word "i":

x/[A-Za-z]+/ g/i/ v/../ c/I/ | p

or (more simply)

x/[A-Za-z]+/ g/^i$/ c/I/ | p

Print the last line of every paragraph that begins with "foo", where a paragraph is defined as text with no empty lines:

x/(.+\n)+/ g/^foo/ l[-2:-1]p

Change all occurrences of the complete word "foo" to "bar" except those occurring in double or single quoted strings:

y/".*"/ y/'.*'/ x/[a-zA-Z]+/ g/^foo$/ c/bar/ | p

Replace the complete word "TODAY" with the current date:

x/[A-Z]+/ g/^TODAY$/ u/date/ | p

Capitalize all words:

x/[a-zA-Z]+/ x/^./ u/tr a-z A-Z/ | p

Note: it is highly recommended when using the CLI tool that you enclose expressions in single or double quotes to prevent your shell from interpreting special characters.

Installation

There are three ways to install sregx.

  1. Download the prebuilt binary from the releases page (comes with man file).

  2. Install from source:

git clone https://github.com/zyedidia/sregx
cd sregx
make build # or make install to install to $GOBIN

  3. Install with go get (version info will be missing):

go get github.com/zyedidia/sregx/cmd/sregx

Usage

To use the CLI tool, first pass the expression and then the input file. If no file is given, stdin will be used. Here is an example to capitalize all occurrences of the word 'i' in file.txt:

sregx 'x/[A-Za-z]+/ g/^i$/ c/I/' file.txt

The tool tries to provide high quality error messages when you make a mistake in the expression syntax.

Base library

The base library is very simple and small (roughly 100 lines of code). In fact, it is surprisingly simple and elegant for something that can provide such powerful text manipulation, and I recommend reading the code if you are interested. Each type of command may be manually created directly in tree form. See the Go documentation for details.

Syntax library

The syntax library supports parsing and compiling a string into a structural regular expression command. Commands use "/" as the delimiter. The backslash (\) may be used to escape / or \, or to create special characters such as \n, \r, or \t. The syntax also supports specifying arbitrary bytes using octal, for example \14. Regular expressions use the syntax of Go's regexp package.
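
As an illustration of the escape rules just described, the sketch below decodes them in a hypothetical unescape helper. It assumes that an octal escape consumes up to three digits and that any other backslash sequence is left untouched for the regular expression engine; the syntax library's actual behavior may differ on both points.

```go
package main

// unescape is a hypothetical helper that decodes \/, \\, \n, \r, \t and
// octal escapes such as \14. It is a sketch, not the library's parser.
func unescape(s string) []byte {
	out := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		// Ordinary characters and a trailing backslash are copied as is.
		if s[i] != '\\' || i+1 == len(s) {
			out = append(out, s[i])
			continue
		}
		i++ // skip the backslash and look at the escaped character
		switch ch := s[i]; {
		case ch == '/' || ch == '\\':
			out = append(out, ch)
		case ch == 'n':
			out = append(out, '\n')
		case ch == 'r':
			out = append(out, '\r')
		case ch == 't':
			out = append(out, '\t')
		case ch >= '0' && ch <= '7':
			// Assumption: an octal escape consumes up to three digits.
			v, j := 0, i
			for ; j < len(s) && j < i+3 && s[j] >= '0' && s[j] <= '7'; j++ {
				v = v*8 + int(s[j]-'0')
			}
			out = append(out, byte(v)) // values above 0377 wrap around
			i = j - 1
		default:
			// Assumption: unknown escapes (for example \d) are kept verbatim.
			out = append(out, '\\', ch)
		}
	}
	return out
}
```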

Future Work

Here are some ideas for some features that could be implemented in the future.

  • Internal manipulation language. Currently the u command runs shell commands. This is very flexible but can be costly because a new process is run to perform each transformation. For better performance we could provide a small language that has some string manipulation functions like toupper. A good candidate for this language would be Lua. This would also improve Windows support since most Windows environments lack utilities like tr.
  • Different regex engine. The Go regex engine is pretty good, but isn't especially performant. We could switch to Oniguruma (see the oniguruma branch), although this would mean using cgo.
  • Structural PEGs. Use PEGs instead of regular expressions.

Documentation

Functions

func IndexN

func IndexN(b, sep []byte, n int) (index int)

IndexN finds the index of the n-th occurrence of sep in b.
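
The documentation does not say whether n counts from zero or one, or what is returned when there are fewer than n occurrences. The sketch below uses a hypothetical nthIndex helper that assumes n counts from 1 and that -1 signals "not found"; IndexN itself may use different conventions.

```go
package main

import "bytes"

// nthIndex returns the index of the n-th occurrence of sep in b.
// Assumptions: n counts from 1, and -1 is returned when b contains
// fewer than n occurrences or when sep is empty.
func nthIndex(b, sep []byte, n int) int {
	if n < 1 || len(sep) == 0 {
		return -1
	}
	offset := 0
	for {
		i := bytes.Index(b[offset:], sep)
		if i < 0 {
			return -1
		}
		n--
		if n == 0 {
			return offset + i
		}
		// Continue searching just past the occurrence that was found.
		offset += i + len(sep)
	}
}
```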

func ReplaceAllComplementFunc

func ReplaceAllComplementFunc(re *regexp.Regexp, b []byte, repl func([]byte) []byte) []byte

ReplaceAllComplementFunc returns a copy of b in which all parts that are not matched by re have been replaced by the return value of the function repl applied to the unmatched byte slice. In other words, b is split according to re, and all components of the split are replaced according to repl.
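
The "split according to re" behavior can be sketched with regexp.FindAllIndex: walk the matches, transform each gap between them, and copy the matches through unchanged. replaceComplement is a hypothetical re-implementation for illustration; it assumes empty gaps are also passed to repl, which the package may or may not do.

```go
package main

import (
	"bytes"
	"regexp"
)

// replaceComplement applies repl to every part of b that re does not match.
func replaceComplement(re *regexp.Regexp, b []byte, repl func([]byte) []byte) []byte {
	var out []byte
	last := 0
	for _, m := range re.FindAllIndex(b, -1) {
		out = append(out, repl(b[last:m[0]])...) // unmatched gap before the match
		out = append(out, b[m[0]:m[1]]...)       // matched text, kept as is
		last = m[1]
	}
	// Whatever follows the final match is also unmatched.
	return append(out, repl(b[last:])...)
}

// upperOutsideQuotes upper-cases everything except double-quoted spans.
func upperOutsideQuotes(s string) string {
	re := regexp.MustCompile(`"[^"]*"`)
	return string(replaceComplement(re, []byte(s), bytes.ToUpper))
}
```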

func ReplaceSlice

func ReplaceSlice(b []byte, start, end int, repl []byte) []byte

ReplaceSlice returns a copy of b where the range start:end has been replaced with repl.
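
A copy-based replacement of this kind can be sketched in a few lines. replaceRange is a hypothetical stand-in that assumes 0 <= start <= end <= len(b) and does no bounds checking of its own.

```go
package main

// replaceRange returns a new slice in which b[start:end] is replaced by repl.
// The input slice b is left unmodified.
func replaceRange(b []byte, start, end int, repl []byte) []byte {
	out := make([]byte, 0, len(b)-(end-start)+len(repl))
	out = append(out, b[:start]...)
	out = append(out, repl...)
	out = append(out, b[end:]...)
	return out
}
```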

Types

type C

type C struct {
	Change []byte
}

C performs changes. No matter the input, it always returns the Change slice.

func (C) Evaluate

func (c C) Evaluate(b []byte) []byte

Evaluate returns Change.

type Command

type Command interface {
	Evaluate(b []byte) []byte
}

A Command modifies an input byte slice in some way and returns the new one.

type CommandPipeline

type CommandPipeline []Command

A CommandPipeline represents a list of commands chained together in a pipeline.

func (CommandPipeline) Evaluate

func (cp CommandPipeline) Evaluate(b []byte) []byte

Evaluate runs each command in the pipeline, passing the previous command's output as the next command's input.

type D

type D struct{}

D performs deletion. No matter the input, evaluation returns an empty slice.

func (D) Evaluate

func (d D) Evaluate(b []byte) []byte

Evaluate deletes the input by returning nothing.

type Evaluator

type Evaluator func(b []byte) []byte

Evaluator is a function that performs a transformation.

type G

type G struct {
	Patt *regexp.Regexp
	Cmd  Command
}

G performs conditional evaluation. If Patt matches the input, the entire input text is evaluated using Cmd (not just the part that matched).

func (G) Evaluate

func (g G) Evaluate(b []byte) []byte

Evaluate applies Cmd if Patt matches b.

type L

type L struct {
	Start int
	End   int
	Cmd   Command
}

L extracts a slice of lines from the input and replaces that slice with the return value of Cmd evaluated on it.

func (L) Evaluate

func (l L) Evaluate(b []byte) []byte

Evaluate calculates the offsets for the line range Start:End and replaces that part of the input with the application of Cmd to it.
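
The offset calculation can be sketched by counting newlines. The helpers below are hypothetical and handle only non-negative line numbers, because the exact convention for negative offsets is not spelled out here; they also assume that a line number past the end of the input resolves to the end of the input.

```go
package main

import "bytes"

// lineOffset returns the byte offset where zero-indexed line n starts,
// or len(b) if the input has fewer lines than that.
func lineOffset(b []byte, n int) int {
	off := 0
	for ; n > 0; n-- {
		i := bytes.IndexByte(b[off:], '\n')
		if i < 0 {
			return len(b)
		}
		off += i + 1
	}
	return off
}

// applyToLines replaces lines [start, end) of b with f applied to them.
func applyToLines(b []byte, start, end int, f func([]byte) []byte) []byte {
	s, e := lineOffset(b, start), lineOffset(b, end)
	if s > e {
		return b
	}
	out := append([]byte{}, b[:s]...)
	out = append(out, f(b[s:e])...)
	return append(out, b[e:]...)
}

// upperLines upper-cases lines [start, end) and leaves the rest alone.
func upperLines(s string, start, end int) string {
	return string(applyToLines([]byte(s), start, end, bytes.ToUpper))
}
```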

type N

type N struct {
	Start int
	End   int
	Cmd   Command
}

N extracts a slice of the input and replaces that slice with the return value of Cmd evaluated on it.

func (N) Evaluate

func (n N) Evaluate(b []byte) []byte

Evaluate slices the input with [Start:End] and replaces that part of the input with the application of Cmd to it.

type P

type P struct {
	W io.Writer
}

P writes the input to W.

func (P) Evaluate

func (p P) Evaluate(b []byte) []byte

Evaluate returns b without modification and prints it.

type S

type S struct {
	Patt    *regexp.Regexp
	Replace []byte
}

S performs substitution. All occurrences of Patt in the input are replaced with Replace. Inside Replace, $ signs are expanded so for instance $1 represents the text of the first submatch.
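
The $ expansion described here matches what the standard library's (*regexp.Regexp).ReplaceAll does with its replacement template. The sketch below assumes S behaves like that call; substitute and swapPair are hypothetical names.

```go
package main

import "regexp"

// substitute replaces every match of patt in b with repl.
// ReplaceAll expands $1, $2, ... in repl to the corresponding submatches.
func substitute(patt *regexp.Regexp, repl, b []byte) []byte {
	return patt.ReplaceAll(b, repl)
}

// swapPair rewrites every key=value pair as value=key.
func swapPair(s string) string {
	re := regexp.MustCompile(`(\w+)=(\w+)`)
	return string(substitute(re, []byte("$2=$1"), []byte(s)))
}
```

Note that in the standard library a reference such as $1x is read as the group named "1x", so ${1}x is the safer spelling when a letter, digit, or underscore follows.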

func (S) Evaluate

func (s S) Evaluate(b []byte) []byte

Evaluate performs substitution on b.

type U

type U struct {
	Evaluator Evaluator
}

U is a user-defined command. The user provides the evaluator function that is used to perform the transformation.
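
A user-defined command makes it possible to express the "capitalize all words" example without spawning tr. The sketch below uses hypothetical lowercase types modeled on the documented ones and plugs bytes.ToUpper in as the evaluator.

```go
package main

import (
	"bytes"
	"regexp"
)

// Command mirrors the documented interface: bytes in, bytes out.
type Command interface {
	Evaluate(b []byte) []byte
}

// Evaluator is a function that performs a transformation.
type Evaluator func(b []byte) []byte

// u sketches the user-defined command: it just calls the evaluator.
type u struct{ Evaluator Evaluator }

func (cmd u) Evaluate(b []byte) []byte { return cmd.Evaluator(b) }

// x sketches extraction: replace every match of Patt with Cmd's output.
type x struct {
	Patt *regexp.Regexp
	Cmd  Command
}

func (cmd x) Evaluate(b []byte) []byte {
	return cmd.Patt.ReplaceAllFunc(b, cmd.Cmd.Evaluate)
}

// capitalizeWords builds x/[a-zA-Z]+/ x/^./ <upper>, with a Go function
// in place of the shell command used by the CLI example.
func capitalizeWords() Command {
	upper := u{Evaluator: bytes.ToUpper}
	firstLetter := x{Patt: regexp.MustCompile(`^.`), Cmd: upper}
	return x{Patt: regexp.MustCompile(`[a-zA-Z]+`), Cmd: firstLetter}
}
```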

func (U) Evaluate

func (u U) Evaluate(b []byte) []byte

Evaluate applies the evaluator function.

type V

type V struct {
	Patt *regexp.Regexp
	Cmd  Command
}

V performs complement conditional evaluation. If Patt does not match the input text the entire input is evaluated using Cmd.

func (V) Evaluate

func (v V) Evaluate(b []byte) []byte

Evaluate applies Cmd if Patt does not match b.
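
Combining G and V gives a "matches this but not that" filter. The sketch below models x/.*\n/ g/rob/ v/robot/p with hypothetical lowercase types and collects what p writes into a buffer; it follows the documented behavior and is not the package's own code.

```go
package main

import (
	"bytes"
	"io"
	"regexp"
)

// Command mirrors the documented interface: bytes in, bytes out.
type Command interface {
	Evaluate(b []byte) []byte
}

// p writes its input to W and returns it unchanged.
type p struct{ W io.Writer }

func (cmd p) Evaluate(b []byte) []byte {
	cmd.W.Write(b) // write errors are ignored in this sketch
	return b
}

// g runs Cmd on the whole input only if Patt matches.
type g struct {
	Patt *regexp.Regexp
	Cmd  Command
}

func (cmd g) Evaluate(b []byte) []byte {
	if cmd.Patt.Match(b) {
		return cmd.Cmd.Evaluate(b)
	}
	return b
}

// v runs Cmd on the whole input only if Patt does not match.
type v struct {
	Patt *regexp.Regexp
	Cmd  Command
}

func (cmd v) Evaluate(b []byte) []byte {
	if !cmd.Patt.Match(b) {
		return cmd.Cmd.Evaluate(b)
	}
	return b
}

// x replaces every match of Patt with Cmd's output.
type x struct {
	Patt *regexp.Regexp
	Cmd  Command
}

func (cmd x) Evaluate(b []byte) []byte {
	return cmd.Patt.ReplaceAllFunc(b, cmd.Cmd.Evaluate)
}

// robNotRobot returns the lines that contain "rob" but not "robot".
func robNotRobot(input string) string {
	var buf bytes.Buffer
	cmd := x{
		Patt: regexp.MustCompile(`.*\n`),
		Cmd: g{
			Patt: regexp.MustCompile(`rob`),
			Cmd:  v{Patt: regexp.MustCompile(`robot`), Cmd: p{W: &buf}},
		},
	}
	cmd.Evaluate([]byte(input))
	return buf.String()
}
```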

type X

type X struct {
	Patt *regexp.Regexp
	Cmd  Command
}

X performs extraction. On every match of Patt in the input it replaces the match with the output of evaluating Cmd on the match.

func (X) Evaluate

func (x X) Evaluate(b []byte) []byte

Evaluate replaces all parts of b that are matched by Patt with the application of Cmd to those substrings.

type Y

type Y struct {
	Patt *regexp.Regexp
	Cmd  Command
}

Y performs complement extraction. It is the same as X, but it extracts the pieces of the input that lie between matches of Patt and applies Cmd to those.

func (Y) Evaluate

func (y Y) Evaluate(b []byte) []byte

Evaluate replaces all parts of b that aren't matched by Patt with the application of Cmd to those substrings.

Directories

Path Synopsis
cmd
