matchgo

v1.0.1
Published: Nov 4, 2024 License: MIT Imports: 3 Imported by: 0

README

matchgo

A simple and experimental regex engine written in Go. This library is in development, so use it with caution!

Overview

matchgo is a minimalistic regex engine that lets you compile regex patterns, test strings for matches, and extract matched groups. It does not implement every traditional regex construction technique; its design draws on the resources listed below.

Installation

To add matchgo as a dependency:

go get github.com/Ravikisha/matchgo

Usage

Here’s how to use matchgo:

import "github.com/Ravikisha/matchgo"

pattern, err := matchgo.Compile("your-regex-pattern")
if err != nil {
    // handle error
}

result := pattern.Test("your-string")
if result.Matches {
    // Access matched groups by name
    groupMatchString := result.Groups["group-name"]
}

To find all matches in a string, use FindMatches:

matches := pattern.FindMatches("your-string")
for _, match := range matches {
    // Process each match
    if match.Matches {
        fmt.Println("Match found:", match.Groups)
    }
}
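
FindMatches returns a slice of Result values, so collecting the matched text is a short loop. The sketch below runs without the library: Result is a local copy of the documented struct, fullMatches is a hypothetical helper, and it assumes that key "0" holds the whole match, as the example output in this README suggests.

```go
package main

import "fmt"

// Result mirrors the documented matchgo.Result struct so the sketch
// compiles on its own.
type Result struct {
	Matches bool
	Groups  map[string]string
}

// fullMatches (hypothetical helper) returns the text of every successful
// match. Assumption: Groups["0"] holds the whole match.
func fullMatches(results []Result) []string {
	var out []string
	for _, r := range results {
		if !r.Matches {
			continue
		}
		if whole, ok := r.Groups["0"]; ok {
			out = append(out, whole)
		}
	}
	return out
}

func main() {
	// Hand-written sample data standing in for pattern.FindMatches(...).
	results := []Result{
		{Matches: true, Groups: map[string]string{"0": "hello 123", "1": "hello", "2": "123"}},
		{Matches: false},
		{Matches: true, Groups: map[string]string{"0": "bye 7", "1": "bye", "2": "7"}},
	}
	fmt.Println(fullMatches(results)) // [hello 123 bye 7]
}
```

With the real library, the slice returned by pattern.FindMatches would take the place of the sample data.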

Example

Here’s an example of how to use matchgo:

package main

import (
  "fmt"
  "github.com/Ravikisha/matchgo"
)

func main() {
  pattern, err := matchgo.Compile("([a-z]+) ([0-9]+)")
  if err != nil {
    fmt.Println("Error compiling pattern:", err)
    return
  }

  result := pattern.Test("hello 123")
  if result.Matches {
    fmt.Println("Match found:", result.Groups)
  }
}

This code will output:

Match found: map[0:hello 123 1:hello 2:123]
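
The groups print in order because fmt sorts map keys when printing; ranging over the map yourself gives no such guarantee, and string keys sort lexically ("10" before "2"). A minimal sketch of reading groups in numeric order follows. orderedGroups is a hypothetical helper, not part of matchgo, and it assumes numbered groups use decimal keys as in the output above.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// orderedGroups (hypothetical helper) returns the numbered groups sorted
// by group index. Keys that are not decimal numbers, such as named
// groups, are skipped.
func orderedGroups(groups map[string]string) []string {
	type entry struct {
		index int
		text  string
	}
	var entries []entry
	for key, text := range groups {
		if n, err := strconv.Atoi(key); err == nil {
			entries = append(entries, entry{n, text})
		}
	}
	sort.Slice(entries, func(i, j int) bool {
		return entries[i].index < entries[j].index
	})
	out := make([]string, 0, len(entries))
	for _, e := range entries {
		out = append(out, e.text)
	}
	return out
}

func main() {
	groups := map[string]string{"0": "hello 123", "1": "hello", "2": "123"}
	for i, text := range orderedGroups(groups) {
		fmt.Printf("group %d: %q\n", i, text)
	}
}
```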

Features

  • ^ beginning of the string
  • $ end of the string
  • . any single character/wildcard
  • bracket notation
      • [ ] bracket notation/ranges
      • [^ ] bracket negation notation
      • better handling of bracket expressions, e.g., [ab-exy12]
      • support for special characters in brackets - escape character support
  • quantifiers
      • * zero or more times
      • + one or more times
      • ? zero or one time (optional)
      • {m,n} between m and n times
  • capturing groups
      • ( ) capturing groups
      • \n backreference, where n is a group number (e.g., (dog)\1)
      • \k<name> named backreference
      • string extraction for matches
  • \ escape character
  • special character support (context-dependent)
  • improved error handling
  • multiline support (tested with Alice in Wonderland corpus)
      • . does not match newlines (\n)
      • $ matches newlines (\n)
  • multiple full matches in one text

Notes

  • The escape character (\) turns the next character into a literal. Extended combinations such as \d or \b are not supported.
  • Numbered backreferences (\n, where n is a digit) are limited to a single digit, so \10 is interpreted as a reference to group 1 followed by a literal 0.
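
The two notes above can be illustrated with a small scanner. This is a sketch of the rules as described, not matchgo's actual parser: token and tokenize are hypothetical names, metacharacters are not interpreted, and treating \0 or an unterminated \k< as a plain literal is an assumption.

```go
package main

import "fmt"

// token is a hypothetical representation of one scanned unit.
type token struct {
	Kind string // "char", "literal", "backref" or "named-backref"
	Text string
}

// tokenize applies the escape rules from the notes: a backslash followed
// by a digit 1-9 is a single-digit backreference, \k<name> is a named
// backreference, and any other escaped character is a literal.
func tokenize(pattern string) []token {
	rs := []rune(pattern)
	var out []token
	for i := 0; i < len(rs); i++ {
		if rs[i] != '\\' || i+1 >= len(rs) {
			out = append(out, token{"char", string(rs[i])})
			continue
		}
		next := rs[i+1]
		if next >= '1' && next <= '9' {
			// Only one digit is consumed, so \10 is group 1 then "0".
			out = append(out, token{"backref", string(next)})
			i++
			continue
		}
		if next == 'k' && i+2 < len(rs) && rs[i+2] == '<' {
			end := i + 3
			for end < len(rs) && rs[end] != '>' {
				end++
			}
			if end < len(rs) {
				out = append(out, token{"named-backref", string(rs[i+3 : end])})
				i = end
				continue
			}
		}
		// Assumption: everything else after a backslash is a literal.
		out = append(out, token{"literal", string(next)})
		i++
	}
	return out
}

func main() {
	for _, t := range tokenize(`(dog)\10\d`) {
		fmt.Printf("%s:%q ", t.Kind, t.Text)
	}
	fmt.Println()
}
```

In this sketch `\d` comes out as the literal letter d, which matches the first note: there are no shorthand classes.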

Resources

These resources were consulted while developing matchgo:

This project was inspired by rgx.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Check

func Check(regexString string, inputString string) (Result, *RegexError)

Check compiles the regexString and tests the inputString against it

func Compile

func Compile(regexString string) (*State, *RegexError)

Compile compiles the given regex string

Types

type ParseErrorCode

type ParseErrorCode string
const (
	SyntaxError      ParseErrorCode = "SyntaxError"
	CompilationError                = "CompilationError"
)
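
A detail of the declaration above: only SyntaxError carries the explicit ParseErrorCode type. Because CompilationError has its own initializer and no type, Go treats it as an untyped string constant. It can still be assigned to, or compared with, a ParseErrorCode value, but it becomes a plain string when passed as an interface value or used with :=. The sketch below copies the declaration locally to show this; typeName is a hypothetical helper.

```go
package main

import "fmt"

// Local copy of the documented declaration.
type ParseErrorCode string

const (
	SyntaxError      ParseErrorCode = "SyntaxError"
	CompilationError                = "CompilationError"
)

// typeName (hypothetical helper) reports the dynamic type of v.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	fmt.Println(typeName(SyntaxError))      // main.ParseErrorCode
	fmt.Println(typeName(CompilationError)) // string

	// The untyped constant still converts implicitly where a
	// ParseErrorCode is expected.
	var code ParseErrorCode = CompilationError
	fmt.Println(code == CompilationError) // true
}
```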

type RegexError

type RegexError struct {
	Code    ParseErrorCode
	Message string
	Pos     int
}

func (*RegexError) Error

func (p *RegexError) Error() string
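
Compile and Check return the concrete *RegexError rather than the error interface. Comparing that pointer with nil works as in the usage snippets, but returning it directly from a function whose result type is error yields a non-nil error even on success, because the interface then wraps a typed nil pointer. The sketch below reproduces this with local stand-ins: compileStub is hypothetical and the Error format is an assumption.

```go
package main

import "fmt"

type ParseErrorCode string

// RegexError mirrors the documented struct.
type RegexError struct {
	Code    ParseErrorCode
	Message string
	Pos     int
}

// Error uses an assumed format; the library's real message may differ.
func (p *RegexError) Error() string {
	return fmt.Sprintf("%s at position %d: %s", p.Code, p.Pos, p.Message)
}

// compileStub is a hypothetical stand-in for a function that, like
// matchgo.Compile, returns *RegexError instead of error.
func compileStub(ok bool) *RegexError {
	if ok {
		return nil
	}
	return &RegexError{Code: "SyntaxError", Message: "unexpected ')'", Pos: 3}
}

// validateWrong wraps a possibly nil *RegexError in an error interface.
func validateWrong(ok bool) error {
	return compileStub(ok)
}

// validateRight checks the pointer first and returns an untyped nil.
func validateRight(ok bool) error {
	if err := compileStub(ok); err != nil {
		return err
	}
	return nil
}

func main() {
	fmt.Println(validateWrong(true) != nil) // true: typed nil inside the interface
	fmt.Println(validateRight(true) != nil) // false
	fmt.Println(validateRight(false))
}
```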

type Result

type Result struct {
	Matches bool
	Groups  map[string]string
}

type State

type State struct {
	// contains filtered or unexported fields
}

func (*State) FindMatches

func (s *State) FindMatches(inputString string) []Result

func (*State) Test

func (s *State) Test(inputString string) Result

Test checks if the given input string conforms to this NFA
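
The comment above says State represents an NFA. As background, here is a minimal sketch of how an NFA can be simulated by tracking the set of states reachable after each input character. It is not matchgo's implementation: the state type, closure, run and buildABStarC are hypothetical, the automaton for ab*c is built by hand, and run requires the whole input to match, which may differ from how Test treats partial matches.

```go
package main

import "fmt"

// state is one NFA node. Labelled transitions consume a rune;
// epsilon transitions consume nothing.
type state struct {
	accept bool
	on     map[rune][]*state
	eps    []*state
}

// closure adds s and every state reachable from it through epsilon
// transitions to set.
func closure(s *state, set map[*state]bool) {
	if set[s] {
		return
	}
	set[s] = true
	for _, t := range s.eps {
		closure(t, set)
	}
}

// run reports whether the NFA accepts the entire input.
func run(start *state, input string) bool {
	current := map[*state]bool{}
	closure(start, current)
	for _, r := range input {
		next := map[*state]bool{}
		for s := range current {
			for _, t := range s.on[r] {
				closure(t, next)
			}
		}
		current = next
	}
	for s := range current {
		if s.accept {
			return true
		}
	}
	return false
}

// buildABStarC hand-builds an NFA for the pattern ab*c.
func buildABStarC() *state {
	end := &state{accept: true}
	loop := &state{on: map[rune][]*state{}}
	loop.on['b'] = []*state{loop}
	loop.on['c'] = []*state{end}
	afterA := &state{eps: []*state{loop}}
	return &state{on: map[rune][]*state{'a': {afterA}}}
}

func main() {
	nfa := buildABStarC()
	for _, in := range []string{"ac", "abbbc", "abx", ""} {
		fmt.Printf("%q -> %v\n", in, run(nfa, in))
	}
}
```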
