gofind

v0.1.0
Published: Sep 10, 2025 License: Apache-2.0 Imports: 2 Imported by: 0

README

gofind is a lightweight, optimized Go package for prefix matching against wildcard patterns. It accepts both string and []byte inputs through Go generics and is designed for performance-critical applications that need fast pattern matching on ASCII, Unicode, or binary data.

Features

  • Fast: Optimized algorithms
  • Flexible Wildcards: Supports *, ?, and . wildcards with character classes
  • Type-Safe Generics: Single API supporting string and []byte types
  • Unicode Support: Full Unicode support with proper UTF-8 character handling
  • Case-Insensitive Matching: Built-in case-folding with Unicode support
  • Zero Allocations: Direct []byte support to minimize memory overhead
  • Efficient Complexity: Avoids the exponential runtime of naive solutions. The time complexity is O(m*n) for pattern length m and string length n.

Supported Wildcards

  • *: Matches zero or more characters
  • ?: Matches zero or one character (any character)
  • .: Matches any single character except newline
  • [abc]: Character class matching any character in the set
  • [!abc] or [^abc]: Negated character class
  • [a-z]: Character range matching
  • \*, \?, \., \[: Escape sequences for literal characters
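
To make the O(m*n) bound concrete, here is a minimal sketch of a prefix matcher that tracks, for each pattern element, which input positions are reachable. It is not gofind's implementation: it works on bytes only, omits character classes and escapes, allocates a slice per pattern element, and reports the longest matching prefix (an assumption of this sketch). The names prefixMatch and errNoMatch are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoMatch is a stand-in for this sketch only; it is not gofind's error value.
var errNoMatch = errors.New("no match")

// prefixMatch returns the end of the longest prefix of s matched by pattern.
// Supported elements: '*', '?' (zero or one byte), '.' (any byte except '\n'),
// and literal bytes. Each pattern element does O(n) work, so the total is O(m*n).
func prefixMatch(pattern, s string) (int, error) {
	n := len(s)
	cur := make([]bool, n+1) // cur[k]: the pattern so far can end at s[:k]
	cur[0] = true
	for i := 0; i < len(pattern); i++ {
		next := make([]bool, n+1)
		switch c := pattern[i]; c {
		case '*':
			// Any position at or after a reachable one stays reachable.
			seen := false
			for k := 0; k <= n; k++ {
				seen = seen || cur[k]
				next[k] = seen
			}
		case '?':
			// Consume nothing, or exactly one byte.
			for k := 0; k <= n; k++ {
				if cur[k] {
					next[k] = true
					if k < n {
						next[k+1] = true
					}
				}
			}
		default:
			// '.' consumes any byte except newline; anything else is a literal.
			for k := 0; k < n; k++ {
				if cur[k] && ((c == '.' && s[k] != '\n') || (c != '.' && s[k] == c)) {
					next[k+1] = true
				}
			}
		}
		cur = next
	}
	for k := n; k >= 0; k-- {
		if cur[k] {
			return k, nil
		}
	}
	return 0, errNoMatch
}

func main() {
	fmt.Println(prefixMatch("a*c", "axcde"))        // 3 <nil>
	fmt.Println(prefixMatch("file?.txt", "file.txt")) // 8 <nil>
}
```

Because every pattern element is processed once against every input position, no input can trigger the repeated re-scanning that makes naive recursive matchers exponential.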

Installation

go get github.com/twinfer/gofind

Usage

package main

import (
    "fmt"

    "github.com/twinfer/gofind"
)

func main() {
    // Basic prefix matching: gofind.Match returns a position and an error.

    // String matching with the ? wildcard (matches zero or one character)
    pos, err := gofind.Match("h?llo*world", "hello beautiful world") // Output: pos=21, err=nil (? matches 'e')

    // ? can also match zero characters
    pos, err = gofind.Match("h?llo*world", "hllo beautiful world") // Output: pos=20, err=nil (? matches zero characters)

    // Does not match
    pos, err = gofind.Match("h?llo*world", "goodbye world") // Output: pos=0, err=ErrNoMatch

    // Unicode-aware matching with strings: . matches any character except newline
    pos, err = gofind.Match("café.", "café1") // Output: pos=6, err=nil (. matches '1')

    // . does not match newlines
    pos, err = gofind.Match("café.", "café\n") // Output: pos=0, err=ErrNoMatch (. does not match newline)

    // ? wildcard with Unicode characters (matches zero or one)
    pos, err = gofind.Match("caf?", "café") // Output: pos=4, err=nil (? matches 'é')

    pos, err = gofind.Match("caf?", "caf") // Output: pos=3, err=nil (? matches zero characters)

    pos, err = gofind.Match("café*", "café au lait") // Output: pos=13, err=nil (* matches ' au lait')

    // Byte slice matching: zero-allocation matching with []byte
    pattern := []byte("*.txt")
    filename := []byte("document.txt")
    pos, err = gofind.Match(pattern, filename) // Output: pos=12, err=nil

    // Case-insensitive matching: use MatchFold

    // ASCII case-insensitive
    pos, err = gofind.MatchFold("HELLO*", "hello world") // Output: pos=11, err=nil

    // Unicode case-insensitive with strings
    pos, err = gofind.MatchFold("CAFÉ*", "café au lait") // Output: pos=13, err=nil

    // ? wildcard with case-insensitive matching (zero or one character)
    pos, err = gofind.MatchFold("caf?", "CAFÉ") // Output: pos=4, err=nil (? matches 'É')

    pos, err = gofind.MatchFold("caf?", "CAF") // Output: pos=3, err=nil (? matches zero characters)

    // Dot wildcard: . matches any single character except newline.
    // A literal dot in the input must itself be matched, here by a second dot.
    pos, err = gofind.Match("file..txt", "file1.txt") // Output: pos=9, err=nil (first . matches '1', second matches '.')

    pos, err = gofind.Match("file..txt", "file .txt") // Output: pos=9, err=nil (first . matches space, second matches '.')

    // Useful for identifiers and filenames
    pos, err = gofind.Match("var.", "var_") // Output: pos=4, err=nil (. matches '_')

    pos, err = gofind.Match("user.name", "user_name") // Output: pos=9, err=nil (. matches '_')

    pos, err = gofind.Match("user.name", "user name") // Output: pos=9, err=nil (. matches space)

    // Character classes

    // Match any vowel
    pos, err = gofind.Match("h[aeiou]llo", "hello") // Output: pos=5, err=nil

    // Match anything except digits
    pos, err = gofind.Match("file[!0-9].txt", "fileA.txt") // Output: pos=9, err=nil

    // Range matching
    pos, err = gofind.Match("[a-z][0-9]", "a5") // Output: pos=2, err=nil

    // The results above are only assigned; print the last one so pos and err count as used.
    fmt.Println(pos, err)
}

API Overview

The simplified generic API provides two main functions:

| Function | Signature | Description |
|---|---|---|
| Match[T] | (pattern, s T) (int, error) | Case-sensitive prefix matching for string or []byte |
| MatchFold[T] | (pattern, s T) (int, error) | Case-insensitive prefix matching for string or []byte |

Return Values

  • (position, nil): Pattern consumed successfully, returns position where pattern matching completed
  • (0, ErrBadPattern): Pattern is malformed (syntax error)
  • (0, ErrNoMatch): Pattern does not match the input

Zero-allocation matching for binary & string data with full Unicode support

Contributing

Contributions are welcome! Please feel free to submit a pull request.

To run the tests for the project:

go test ./...                    # Run all tests
go test -v ./...                 # Run tests with verbose output  
go test -fuzz=FuzzMatch          # Run fuzz testing for Match function

License

This project is licensed under the Apache License 2.0.

Documentation

Overview

Package gofind provides highly optimized functions for prefix matching strings against patterns containing wildcards. It features a dual-stream architecture that delivers maximum performance for both ASCII-only and Unicode scenarios.

Performance Architecture:

The package uses two specialized implementations:

  • ASCII-only stream: Zero UTF-8 overhead, direct byte operations (2-5x faster than the Unicode stream)
  • Unicode stream: Full UTF-8 support with case-insensitive matching

Core Functions:

  • Match: ASCII-optimized case-sensitive prefix matching
  • MatchFold: Unicode-aware case-insensitive prefix matching

The functions automatically route to the appropriate implementation for optimal performance.
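
As a rough illustration of what such routing could look like, the sketch below checks whether an input is pure ASCII and, based on that, decides how many bytes a single-character wildcard should consume. The helpers isASCII and charWidth are hypothetical; the package's actual routing logic is not shown in this documentation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isASCII reports whether s contains only bytes below utf8.RuneSelf (0x80).
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

// charWidth returns how many bytes one character starting at s[i] occupies:
// always 1 on the ASCII path, the UTF-8 encoded width on the Unicode path.
func charWidth(s string, i int, ascii bool) int {
	if ascii {
		return 1
	}
	_, w := utf8.DecodeRuneInString(s[i:])
	return w
}

func main() {
	for _, s := range []string{"cafe", "café"} {
		ascii := isASCII(s)
		fmt.Println(s, ascii, charWidth(s, 3, ascii))
	}
	// Prints:
	// cafe true 1
	// café false 2
}
```

The ASCII path skips rune decoding entirely, which is the kind of saving the "zero UTF-8 overhead" description refers to.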

Supported Wildcards:

  • `*`: Matches any sequence of characters (including zero characters)
  • `?`: Matches zero or one character (any character)
  • `.`: Matches any single character except newline
  • `[abc]`: Matches any character in the set (a, b, or c)
  • `[!abc]` or `[^abc]`: Matches any character not in the set
  • `[a-z]`: Matches any character in the range a to z
  • `\*`, `\?`, `\.`, `\[`: Matches the literal character

Type Support:

The package supports two input types through Go generics:

  • `string`: Automatic routing to ASCII or Unicode implementation
  • `[]byte`: Zero-allocation matching for binary data and performance-critical code
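
The sketch below shows how a single generic function can serve both input types using the same `~string | ~[]byte` constraint that appears in the documented signatures. The constraint name byteSeq and the function commonPrefixLen are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// byteSeq mirrors the constraint used in the documented signatures.
type byteSeq interface {
	~string | ~[]byte
}

// commonPrefixLen returns the number of leading bytes that a and b share.
// len and indexing work for both strings and byte slices, so one body
// covers both types without converting (and therefore without copying).
func commonPrefixLen[T byteSeq](a, b T) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	return n
}

func main() {
	fmt.Println(commonPrefixLen("GET /api", "GET /app"))             // 7
	fmt.Println(commonPrefixLen([]byte("hello"), []byte("help"))) // 3
}
```

The `~` in the constraint also admits named types whose underlying type is string or []byte.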

Character Classes:

Character classes ([abc], [a-z], [!xyz]) are always case-sensitive, even when using MatchFold. This maintains compatibility with standard glob behavior.
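
A minimal sketch of case-sensitive class matching follows. It is byte-oriented, ignores escapes inside the class, and is not taken from the package; inClass is a hypothetical helper that receives the text between the brackets.

```go
package main

import "fmt"

// inClass reports whether c matches the body of a bracket expression,
// i.e. the text between '[' and ']' such as "abc", "a-z", "!0-9" or "^0-9".
// Comparison is exact, so "a-z" does not match 'E'.
func inClass(body string, c byte) bool {
	negate := false
	if len(body) > 0 && (body[0] == '!' || body[0] == '^') {
		negate = true
		body = body[1:]
	}
	matched := false
	for i := 0; i < len(body); i++ {
		lo, hi := body[i], body[i]
		// "x-y" forms a range; a trailing '-' is treated as a literal.
		if i+2 < len(body) && body[i+1] == '-' {
			hi = body[i+2]
			i += 2
		}
		if lo <= c && c <= hi {
			matched = true
		}
	}
	return matched != negate
}

func main() {
	fmt.Println(inClass("aeiou", 'e')) // true
	fmt.Println(inClass("a-z", 'E'))   // false: classes stay case-sensitive
	fmt.Println(inClass("!0-9", 'A'))  // true
}
```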

Performance Guidance:

  • Use Match() for ASCII-only patterns when maximum speed is needed
  • Use MatchFold() for Unicode patterns or when case-insensitive matching is required
  • ASCII-only matching provides 2-5x performance improvement over Unicode-aware matching

Functions

func Match

func Match[T ~string | ~[]byte](pattern, s T) (int, error)

Match returns the position where the pattern was fully consumed from the input data using case-sensitive comparison with prefix matching behavior. It supports two types:

  • string: Optimized ASCII-only string matching for maximum performance
  • []byte: Zero-allocation matching for byte slices

Return values:

  • (position, nil): Position where the pattern was consumed (prefix match successful)
  • (0, ErrBadPattern): Pattern is malformed (syntax error)
  • (0, ErrNoMatch): Pattern does not match the input

This function performs prefix matching, meaning it succeeds if the beginning of the input matches the pattern, even if there are extra characters at the end.

This function uses an optimized ASCII-only algorithm for maximum performance. For Unicode support, use MatchFold instead.

Examples:

Match("hello", "hello world")           // Returns (5, nil) - prefix match
Match([]byte("GET "), []byte("GET /api")) // Returns (4, nil) - prefix match
Match("file?.txt", "file.txt")           // Returns (8, nil) - ? matches zero characters
Match("file?.txt", "fileX.txt")          // Returns (9, nil) - ? matches one character
Match("a*c", "axcde")                    // Returns (3, nil) - prefix match successful

func MatchFold

func MatchFold[T ~string | ~[]byte](pattern, s T) (int, error)

MatchFold returns the position where the pattern was fully consumed from the input data using case-insensitive comparison with Unicode folding and prefix matching behavior. It supports two types:

  • string: UTF-8 aware case-insensitive matching with Unicode folding
  • []byte: Zero-allocation case-insensitive matching for byte slices

Return values:

  • (position, nil): Position where the pattern was consumed (prefix match successful)
  • (0, ErrBadPattern): Pattern is malformed (syntax error)
  • (0, ErrNoMatch): Pattern does not match the input

This function performs prefix matching, meaning it succeeds if the beginning of the input matches the pattern, even if there are extra characters at the end.

The function uses the Unicode-aware matching algorithm with case folding. Character classes remain case-sensitive to maintain standard glob behavior.

Examples:

MatchFold("HELLO", "hello world")           // Returns (5, nil) - case-insensitive prefix match
MatchFold("CAFÉ", "café au lait")           // Returns (4, nil) - Unicode case-insensitive prefix
MatchFold("FILE?.TXT", "file.txt")           // Returns (8, nil) - ? matches zero characters
MatchFold("FILE?.TXT", "fileX.txt")          // Returns (9, nil) - ? matches one character
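
For readers unfamiliar with Unicode case folding, the sketch below compares a literal prefix rune by rune using unicode.SimpleFold from the standard library. It handles no wildcards and is not the package's algorithm; hasPrefixFold and equalFoldRune are hypothetical helpers.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// equalFoldRune reports whether a and b are equal under simple case folding.
func equalFoldRune(a, b rune) bool {
	if a == b {
		return true
	}
	// SimpleFold cycles through every rune equivalent to a and returns to a.
	for r := unicode.SimpleFold(a); r != a; r = unicode.SimpleFold(r) {
		if r == b {
			return true
		}
	}
	return false
}

// hasPrefixFold reports whether s starts with literal, ignoring case.
func hasPrefixFold(literal, s string) bool {
	pos := 0
	for _, want := range literal {
		if pos >= len(s) {
			return false
		}
		got, w := utf8.DecodeRuneInString(s[pos:])
		if !equalFoldRune(want, got) {
			return false
		}
		pos += w
	}
	return true
}

func main() {
	fmt.Println(hasPrefixFold("HELLO", "hello world")) // true
	fmt.Println(hasPrefixFold("CAFÉ", "café au lait")) // true
	fmt.Println(hasPrefixFold("HELLO", "help"))        // false
}
```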

Types

type MatchResult

type MatchResult struct {
	Position int
	Err      error
}

MatchResult holds the result of a pattern match operation.

func MatchFoldMultiple

func MatchFoldMultiple[S ~string | ~[]byte](patterns []S, s S) []MatchResult

MatchFoldMultiple concurrently matches a single input against multiple patterns (case-insensitive). It returns a slice of MatchResult structs where each element corresponds to the pattern at the same index.

The order of results corresponds to the order of input patterns.

Example:

patterns := []string{"Foo*", "foo*", "baz[0-9]"}
matches := MatchFoldMultiple(patterns, "foobar")
// matches[0] = {Position: 6, Err: nil} (first pattern matches fully)
// matches[1] = {Position: 6, Err: nil} (second pattern matches fully)
// matches[2] = {Position: 0, Err: ErrNoMatch} (third pattern doesn't match)
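
One common way to get order-preserving concurrent results is to give each goroutine its own slot in a pre-sized slice. The sketch below shows that pattern under stated assumptions: matchAll, result, and literalFold are hypothetical, and literalFold is an ASCII-oriented stand-in matcher without wildcards, not the package's matcher.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// result mirrors the shape of MatchResult for this sketch.
type result struct {
	Position int
	Err      error
}

// errNoMatch is a local stand-in error.
var errNoMatch = errors.New("no match")

// matchAll runs match once per pattern, each in its own goroutine, and
// stores every outcome at the index of its pattern. Goroutines write to
// distinct slice elements, so no lock is needed.
func matchAll(patterns []string, s string, match func(pattern, s string) (int, error)) []result {
	results := make([]result, len(patterns))
	var wg sync.WaitGroup
	for i, p := range patterns {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			pos, err := match(p, s)
			results[i] = result{Position: pos, Err: err}
		}(i, p)
	}
	wg.Wait()
	return results
}

// literalFold is a stand-in matcher: case-insensitive literal prefix only.
func literalFold(pattern, s string) (int, error) {
	if len(s) >= len(pattern) && strings.EqualFold(s[:len(pattern)], pattern) {
		return len(pattern), nil
	}
	return 0, errNoMatch
}

func main() {
	for i, r := range matchAll([]string{"FOO", "foo", "baz"}, "foobar", literalFold) {
		fmt.Println(i, r.Position, r.Err)
	}
}
```

Writing by index keeps the result order tied to the pattern order regardless of which goroutine finishes first.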

func MatchMultiple

func MatchMultiple[S ~string | ~[]byte](patterns []S, s S) []MatchResult

Directories

Path            Synopsis
internal/find   Package find contains optimized wildcard matching implementations.
