revregex

package module

v0.3.1 Latest Latest Go to latest Published: May 9, 2022 License: MIT Imports: 8 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/xavier268/revregex

Links

Open Source Insights

README ¶

revregex

A package you can use to generate all possible strings matching a given regular expression.

How to use


import (
    "fmt"
    "github.com/xavier268/revregex"
    )

    // start from a POSIX regular expression pattern
    pattern := "a(b|c)a?"
    // create a generator for this pattern
	generator, err := NewGen(pattern)
    if err != nil { 
        ... 
        }
    // create a source of entropy
	chooser := NewRandChooser() 

	
	// Now, you can ask the generator to create a string using the chooser to make its decisions.
	result := generator.Next(chooser) // for instance, result will get "aca" or "ab".
		
    // if you don't trust the package and want to verify that the string actually matches ...
    err = generator.Verify(result)
    if err != nil {
        ...
        }

See full example file here

Regex syntax

Any POSIX format regexes valid in Golang is accepted.

No flags are available. Grouping or named captures have no effect. Word, line or text boundaries are meaningless here and should be avoided.

Unbounded metacharacteres such as * or + are accepted. They generate shorter strings first, giving an exponentially decreasing priority to the longer strings.

Beware of negative classes, such as [^a] or the dot "." operator, because they will likely generate a lot of strange unicode caracters ( up to 4 bytes characters ! ). Prefer to use positive classes such as [[:digit:]] or [a-zA-Z] to limit unprintable characters.

See reference here

Deterministic or random ?

To choose which string to generate among the many - possibly unlimited - strings matching the provided pattern, one needs to make choices. These choices are driven by something that fullfils the Chooser interface.


type Chooser interface {
	// Intn provides a number between 0 and (n-1) included.
	Intn(n int) int
}

Two ways to construct a Chooser are provided :

NewRandChooser to make decision randomly. If you computer has a good random generator, you most likely will endup generating all the shorter possible strings.
NewBytesChooser, which takes a []byte as input, will use the information contained in this array to make its choices. There is a one-to-one relation between a given byte array and the sequence of strings generated. However, the information from the provided byte array is consumed as we generate strings and make choices, and, at some point, once there is no more information (underlying array is nil or contains only zeros), the defaults choices will always be made, generating from that point always the same default answer.

Use for Fuzzing

This can be useful for fuzzing if you want to fuzz a function whose argument matches a specific regex pattern. You could, of course, check if the pattern is matched when starting the test, and skip if not matching. However, even with medium complexity pattern, the likelyhood to find relevant pattern is very low, and the likelyhood to explore the pattern space is even lower.

With this library, instead of fuzzing based on the function argument itself, you can fuzz an int64 that will be used as the random seed to generate a few strings matching the required pattern. The benefit is that you would explore the valid entry set more thoroughly, and faster. You could still, however, fuzz the direct way as well to check how your function behaves hen the input pattern is invalid.

Deduplicate generated strings

The Deduper interface can be used to detect duplicated string. Querying for uniqueness will both register the string and return a uniqueness response. Querying twice will always return false ...

It is currently available in two flavors :

NewDedupMap is based on a map[string]bool. Is is always exact, but the memory footprint is not bounded.
NewDedupBloom is a bloom-filter implementation. It has a bounded memory footprint and query time, but when a certain volume is reached, false positive (wrongly flaging original strings as duplicates) will start to occur. However, if a string is flaged as unique, it is really unique. The default size for the bloom filter is a reasonable value up to a few hundred thousands different strings.

Documentation ¶

Index ¶

Constants
Variables
type Chooser
type Deduper
- func NewDedupBloom(bloomsize int) Deduper
- func NewDedupMap() Deduper
type Gen

Examples ¶

NewGen

Constants ¶

View Source

const DefaultBloomSize = 256 * 256 * 256 * 16 // total filter size in bits

Reasonable default value up to a few hundred thousands different strings.

View Source

const MaxUnicode = '\U0010ffff'

MaxUnicode is the maximum Unicode character that can be generated.

View Source

const VERSION = "0.3.1"

Variables ¶

View Source

var ErrVerificationFailed = fmt.Errorf("verification failed")

Functions ¶

This section is empty.

Types ¶

type Chooser ¶

type Chooser interface {
	// Intn provides a number between 0 and (n-1) included.
	Intn(n int) int
}

Chooser defines an interface that allows to make choices.

func NewBytesChooser ¶

func NewBytesChooser(buf []byte) Chooser

NewBytesChooser uses buf as a source of information for decision. This makes the exploration of all possible strings perfectily deterministic. Using a chooser to make a decision "consumes" the available information. When all information is "consumed", defaults or first choices will always be the one chosen.

func NewRandChooser ¶

func NewRandChooser() Chooser

NewRandChooser uses random as the source for decision. It is guaranteed that no string has a zero probability, but longer strings have a much lower chance of appearing.

func NewRandChooserSeed ¶

func NewRandChooserSeed(seed int64) Chooser

NewRandChooserSeed uses random as the source for decision. It is guaranteed that no string has a zero probability, but longer strings have a much lower chance of appearing. Setting the seed allows for reproductibility in tests.

type Deduper ¶ added in v0.3.0

type Deduper interface {
	// Unique is true if string is seen for the first time.
	Unique(st string) bool
}

Deduper is an object that can be used to deduplicate strings.

func NewDedupBloom ¶ added in v0.3.0

func NewDedupBloom(bloomsize int) Deduper

Creates a bloom-filter based Deduper. Memory footprint is fixed, driven by bloomsize setting in bits. True result are always right, but after a while, False result are only most likely right. The larger the bloomsize, the less false positive.

func NewDedupMap ¶ added in v0.3.0

func NewDedupMap() Deduper

Creates a map-based Deduper. Memory footprint will grow indefinitely, but response is always exact.

type Gen ¶

type Gen struct {
	// contains filtered or unexported fields
}

Gen can generate deterministic or random strings that match a given regexp. Gen is thread safe.

func Must ¶ added in v0.2.0

func Must(g *Gen, e error) *Gen

Must is a utilty to panic on error, when creating a Gen. Typicla use is : g := Must(NewGen(pattern))

func NewGen ¶

func NewGen(source string) (*Gen, error)

NewGen creates a new generator. It returns an error if the regexp provided is not syntaxicaly correct. Use POSIX syntax. No implicit parse tree simplification.

Example ¶

pattern := "a(b|c)a?"
generator, err := NewGen(pattern)
if err != nil {
	fmt.Println(err)
}
entropy := NewRandChooserSeed(42) // or use NewRandInter() for real randomness ...

// Generate 5 strings that match "a(b|c)a?"
for i := 0; i < 5; i++ {
	result := generator.Next(entropy)
	fmt.Println(result)

	// Verify each generated string actually matches ?
	if err = generator.Verify(result); err != nil {
		fmt.Println("Verification failed with error : ", err)
	}
}

Output:

aca
ab
aca
ac
aba

func NewGenSimpl ¶

func NewGenSimpl(source string) (*Gen, error)

Same as NewGen, but in addition, the tree is simplified.

func (*Gen) Next ¶

func (g *Gen) Next(it Chooser) string

Next generate a string that match the provided regexp, using the provided Chooser to make its choices.

func (*Gen) String ¶

func (g *Gen) String() string

func (*Gen) Verify ¶

func (g *Gen) Verify(s string) error

Verify if a string match the regexp used to create g.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL