revregex

package module
v0.3.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 9, 2022 License: MIT Imports: 8 Imported by: 0

README

revregex

Go Reference Go Report Card

A package you can use to generate all possible strings matching a given regular expression.

How to use


import (
    "fmt"
    "github.com/xavier268/revregex"
    )

    // start from a POSIX regular expression pattern
    pattern := "a(b|c)a?"
    // create a generator for this pattern
	generator, err := NewGen(pattern)
    if err != nil { 
        ... 
        }
    // create a source of entropy
	chooser := NewRandChooser() 

	
	// Now, you can ask the generator to create a string using the chooser to make its decisions.
	result := generator.Next(chooser) // for instance, result will get "aca" or "ab".
		
    // if you don't trust the package and want to verify that the string actually matches ...
    err = generator.Verify(result)
    if err != nil {
        ...
        }
    

See full example file here

Regex syntax

Any POSIX format regexes valid in Golang is accepted.

No flags are available. Grouping or named captures have no effect. Word, line or text boundaries are meaningless here and should be avoided.

Unbounded metacharacteres such as * or + are accepted. They generate shorter strings first, giving an exponentially decreasing priority to the longer strings.

Beware of negative classes, such as [^a] or the dot "." operator, because they will likely generate a lot of strange unicode caracters ( up to 4 bytes characters ! ). Prefer to use positive classes such as [[:digit:]] or [a-zA-Z] to limit unprintable characters.

See reference here

Deterministic or random ?

To choose which string to generate among the many - possibly unlimited - strings matching the provided pattern, one needs to make choices. These choices are driven by something that fullfils the Chooser interface.


type Chooser interface {
	// Intn provides a number between 0 and (n-1) included.
	Intn(n int) int
}

Two ways to construct a Chooser are provided :

  • NewRandChooser to make decision randomly. If you computer has a good random generator, you most likely will endup generating all the shorter possible strings.

  • NewBytesChooser, which takes a []byte as input, will use the information contained in this array to make its choices. There is a one-to-one relation between a given byte array and the sequence of strings generated. However, the information from the provided byte array is consumed as we generate strings and make choices, and, at some point, once there is no more information (underlying array is nil or contains only zeros), the defaults choices will always be made, generating from that point always the same default answer.

Use for Fuzzing

This can be useful for fuzzing if you want to fuzz a function whose argument matches a specific regex pattern. You could, of course, check if the pattern is matched when starting the test, and skip if not matching. However, even with medium complexity pattern, the likelyhood to find relevant pattern is very low, and the likelyhood to explore the pattern space is even lower.

With this library, instead of fuzzing based on the function argument itself, you can fuzz an int64 that will be used as the random seed to generate a few strings matching the required pattern. The benefit is that you would explore the valid entry set more thoroughly, and faster. You could still, however, fuzz the direct way as well to check how your function behaves hen the input pattern is invalid.

Deduplicate generated strings

The Deduper interface can be used to detect duplicated string. Querying for uniqueness will both register the string and return a uniqueness response. Querying twice will always return false ...

It is currently available in two flavors :

  • NewDedupMap is based on a map[string]bool. Is is always exact, but the memory footprint is not bounded.
  • NewDedupBloom is a bloom-filter implementation. It has a bounded memory footprint and query time, but when a certain volume is reached, false positive (wrongly flaging original strings as duplicates) will start to occur. However, if a string is flaged as unique, it is really unique. The default size for the bloom filter is a reasonable value up to a few hundred thousands different strings.

Documentation

Index

Examples

Constants

View Source
const DefaultBloomSize = 256 * 256 * 256 * 16 // total filter size in bits

Reasonable default value up to a few hundred thousands different strings.

View Source
const MaxUnicode = '\U0010ffff'

MaxUnicode is the maximum Unicode character that can be generated.

View Source
const VERSION = "0.3.1"

Variables

View Source
var ErrVerificationFailed = fmt.Errorf("verification failed")

Functions

This section is empty.

Types

type Chooser

type Chooser interface {
	// Intn provides a number between 0 and (n-1) included.
	Intn(n int) int
}

Chooser defines an interface that allows to make choices.

func NewBytesChooser

func NewBytesChooser(buf []byte) Chooser

NewBytesChooser uses buf as a source of information for decision. This makes the exploration of all possible strings perfectily deterministic. Using a chooser to make a decision "consumes" the available information. When all information is "consumed", defaults or first choices will always be the one chosen.

func NewRandChooser

func NewRandChooser() Chooser

NewRandChooser uses random as the source for decision. It is guaranteed that no string has a zero probability, but longer strings have a much lower chance of appearing.

func NewRandChooserSeed

func NewRandChooserSeed(seed int64) Chooser

NewRandChooserSeed uses random as the source for decision. It is guaranteed that no string has a zero probability, but longer strings have a much lower chance of appearing. Setting the seed allows for reproductibility in tests.

type Deduper added in v0.3.0

type Deduper interface {
	// Unique is true if string is seen for the first time.
	Unique(st string) bool
}

Deduper is an object that can be used to deduplicate strings.

func NewDedupBloom added in v0.3.0

func NewDedupBloom(bloomsize int) Deduper

Creates a bloom-filter based Deduper. Memory footprint is fixed, driven by bloomsize setting in bits. True result are always right, but after a while, False result are only most likely right. The larger the bloomsize, the less false positive.

func NewDedupMap added in v0.3.0

func NewDedupMap() Deduper

Creates a map-based Deduper. Memory footprint will grow indefinitely, but response is always exact.

type Gen

type Gen struct {
	// contains filtered or unexported fields
}

Gen can generate deterministic or random strings that match a given regexp. Gen is thread safe.

func Must added in v0.2.0

func Must(g *Gen, e error) *Gen

Must is a utilty to panic on error, when creating a Gen. Typicla use is : g := Must(NewGen(pattern))

func NewGen

func NewGen(source string) (*Gen, error)

NewGen creates a new generator. It returns an error if the regexp provided is not syntaxicaly correct. Use POSIX syntax. No implicit parse tree simplification.

Example
pattern := "a(b|c)a?"
generator, err := NewGen(pattern)
if err != nil {
	fmt.Println(err)
}
entropy := NewRandChooserSeed(42) // or use NewRandInter() for real randomness ...

// Generate 5 strings that match "a(b|c)a?"
for i := 0; i < 5; i++ {
	result := generator.Next(entropy)
	fmt.Println(result)

	// Verify each generated string actually matches ?
	if err = generator.Verify(result); err != nil {
		fmt.Println("Verification failed with error : ", err)
	}
}
Output:

aca
ab
aca
ac
aba

func NewGenSimpl

func NewGenSimpl(source string) (*Gen, error)

Same as NewGen, but in addition, the tree is simplified.

func (*Gen) Next

func (g *Gen) Next(it Chooser) string

Next generate a string that match the provided regexp, using the provided Chooser to make its choices.

func (*Gen) String

func (g *Gen) String() string

func (*Gen) Verify

func (g *Gen) Verify(s string) error

Verify if a string match the regexp used to create g.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL