Documentation
¶
Overview ¶
Package simd provides SIMD-accelerated string operations for high-performance byte searching. The package automatically selects the best implementation based on available CPU features (AVX2/SSE4.2 on x86-64) and falls back to optimized pure Go implementations on other platforms.
The primary use case is accelerating regex engine prefilters by quickly finding literal bytes/substrings in large text buffers.
Index ¶
- Variables
- func ByteRank(b byte) byte
- func CountNonASCII(data []byte) int
- func FirstNonASCII(data []byte) int
- func IsASCII(data []byte) bool
- func Memchr(haystack []byte, needle byte) int
- func Memchr2(haystack []byte, needle1, needle2 byte) int
- func Memchr3(haystack []byte, needle1, needle2, needle3 byte) int
- func MemchrDigit(haystack []byte) int
- func MemchrDigitAt(haystack []byte, at int) int
- func MemchrInTable(haystack []byte, table *[256]bool) int
- func MemchrNotInTable(haystack []byte, table *[256]bool) int
- func MemchrNotWord(haystack []byte) int
- func MemchrPair(haystack []byte, byte1, byte2 byte, offset int) int
- func MemchrWord(haystack []byte) int
- func Memmem(haystack, needle []byte) int
- type RareByteInfo
Examples ¶
Constants ¶
This section is empty.
Variables ¶
var ByteFrequencies = [256]byte{}/* 256 elements not displayed */
ByteFrequencies contains empirical byte frequency ranks based on analysis of English text, source code, and binary data.
Lower rank = rarer byte (better candidate for SIMD search). Higher rank = more common byte (worse candidate).
The table is derived from:
- English text corpus analysis
- Source code repositories (Go, Rust, C, Python)
- Binary file sampling
This matches the approach used by Rust's memchr crate for optimal rare byte selection in substring search.
Reference: https://github.com/BurntSushi/memchr
Functions ¶
func ByteRank ¶ added in v0.9.0
ByteRank returns the frequency rank of a byte. Lower values indicate rarer bytes (better for search optimization).
func CountNonASCII ¶ added in v0.11.0
CountNonASCII returns the number of non-ASCII bytes in the slice. This is useful for statistics and deciding whether to use UTF-8 or ASCII paths.
Performance: Same as IsASCII (processes 8 bytes at a time using SWAR).
func FirstNonASCII ¶ added in v0.11.0
FirstNonASCII returns the index of the first non-ASCII byte, or -1 if all bytes are ASCII. This is useful for finding where UTF-8 sequences begin.
Performance: Same as IsASCII but returns early on first non-ASCII byte.
func IsASCII ¶ added in v0.11.0
IsASCII checks if all bytes in the slice are ASCII (< 0x80). Returns true if all bytes have the high bit clear (values 0x00-0x7F).
This function is critical for UTF-8 optimization in the regex engine. When input is ASCII-only, the engine can skip UTF-8 decoding overhead and use simplified automata (1 NFA state for '.' instead of 39).
Performance characteristics (on x86-64 with AVX2):
- Small inputs (< 32 bytes): uses optimized SWAR (8 bytes at a time)
- Medium inputs (32B - 4KB): 10-15 GB/s throughput
- Large inputs (> 4KB): 20-30 GB/s throughput (memory bandwidth limited)
Algorithm (AVX2):
- Load 32 bytes into YMM register
- Use VPMOVMSKB to extract high bit of each byte
- If mask != 0, non-ASCII byte found
- Handle tail with scalar loop
Example:
data := []byte("hello world")
if simd.IsASCII(data) {
// Use ASCII-optimized path
}
Example with non-ASCII:
data := []byte("héllo")
if !simd.IsASCII(data) {
// Use UTF-8 path
}
func Memchr ¶
Memchr returns the index of the first instance of needle in haystack, or -1 if needle is not present in haystack.
This function is equivalent to bytes.IndexByte but uses SIMD instructions (AVX2/SSE4.2) when available on x86-64 platforms. It automatically falls back to pure Go implementation on other architectures.
Performance characteristics (on x86-64 with AVX2):
- Small inputs (< 64 bytes): approximately same as bytes.IndexByte
- Medium inputs (64B - 4KB): 2-5x faster than bytes.IndexByte
- Large inputs (> 4KB): 8-15x faster than bytes.IndexByte
The function uses aligned vector loads and processes 32 bytes per iteration when AVX2 is available, or 16 bytes with SSE4.2.
Example:
haystack := []byte("hello world")
pos := simd.Memchr(haystack, 'o')
if pos != -1 {
fmt.Printf("Found 'o' at position %d\n", pos) // Output: Found 'o' at position 4
}
Example with not found:
haystack := []byte("hello world")
pos := simd.Memchr(haystack, 'x')
if pos == -1 {
fmt.Println("Character 'x' not found")
}
func Memchr2 ¶
Memchr2 returns the index of the first instance of either needle1 or needle2 in haystack, or -1 if neither is present.
This function is significantly faster than calling Memchr twice or using a loop, as it checks both bytes simultaneously using SIMD instructions. The order of needles does not affect performance.
Performance characteristics (on x86-64 with AVX2):
- Processes both needle comparisons in parallel using vector operations
- Same throughput as Memchr (no overhead for checking two bytes)
- 5-10x faster than sequential search on large inputs
The function returns the position of whichever needle appears first in haystack. If both needles appear at the same position (impossible unless needle1 == needle2), the result is deterministic but unspecified.
Example:
haystack := []byte("hello world")
pos := simd.Memchr2(haystack, 'o', 'w')
if pos != -1 {
fmt.Printf("Found 'o' or 'w' at position %d\n", pos) // Output: Found 'o' or 'w' at position 4
}
Example searching for punctuation:
text := []byte("Hello, world!")
pos := simd.Memchr2(text, ',', '!')
if pos != -1 {
fmt.Printf("Found punctuation at position %d\n", pos) // Output: Found punctuation at position 5
}
func Memchr3 ¶
Memchr3 returns the index of the first instance of needle1, needle2, or needle3 in haystack, or -1 if none are present.
This function checks three bytes simultaneously using SIMD instructions, making it much faster than multiple sequential searches or a loop with three comparisons per iteration.
Performance characteristics (on x86-64 with AVX2):
- Processes all three needle comparisons in parallel
- Same throughput as Memchr and Memchr2
- Ideal for searching character classes (e.g., whitespace, delimiters)
The function returns the position of whichever needle appears first in haystack. The order of needle parameters does not affect performance.
Example searching for whitespace:
text := []byte("hello\tworld\nfoo")
pos := simd.Memchr3(text, ' ', '\t', '\n')
if pos != -1 {
fmt.Printf("Found whitespace at position %d\n", pos) // Output: Found whitespace at position 5
}
Example searching for multiple delimiters:
csv := []byte("name,age;city")
pos := simd.Memchr3(csv, ',', ';', '|')
if pos != -1 {
fmt.Printf("Found delimiter at position %d\n", pos) // Output: Found delimiter at position 4
}
func MemchrDigit ¶ added in v0.9.0
MemchrDigit returns the index of the first ASCII digit [0-9] in haystack, or -1 if no digit is found.
On AMD64 with AVX2 support and haystack >= 32 bytes, uses SIMD acceleration. Otherwise falls back to optimized scalar loop.
func MemchrDigitAt ¶ added in v0.9.0
MemchrDigitAt returns the index of the first ASCII digit [0-9] at or after position 'at' in haystack, or -1 if no digit is found.
func MemchrInTable ¶ added in v0.9.4
MemchrInTable finds the first byte where table[byte] is true. This is a general-purpose character class search using a 256-byte lookup table.
For specific classes like \w, use MemchrWord instead for better performance.
Returns position or -1 if not found.
func MemchrNotInTable ¶ added in v0.9.4
MemchrNotInTable finds the first byte where table[byte] is false. This is the complement of MemchrInTable.
Returns position or -1 if all bytes have table[byte] == true.
func MemchrNotWord ¶ added in v0.9.4
MemchrNotWord finds the first non-word character in haystack. Returns position or -1 if all bytes are word characters.
This is the complement of MemchrWord: it finds the first byte that is NOT in [A-Za-z0-9_].
Example:
haystack := []byte("hello world")
pos := simd.MemchrNotWord(haystack)
// pos == 5 (position of space)
func MemchrPair ¶ added in v0.9.0
MemchrPair finds the first position where byte1 appears at offset 0 and byte2 appears at the specified offset from byte1. This enables highly selective substring searching by verifying two bytes at their correct relative positions.
Parameters:
- haystack: the byte slice to search
- byte1: the first byte to find (anchor byte)
- byte2: the second byte to find
- offset: the distance from byte1 to byte2 (byte2 position = byte1 position + offset)
Returns the position of byte1 where both conditions are met, or -1 if not found.
This function is significantly more selective than single-byte search because false positives require both bytes to appear at exactly the right distance apart.
Example:
// Searching for pattern "ex" where 'e' is at offset 0 and 'x' is at offset 1
haystack := []byte("hello example world")
pos := simd.MemchrPair(haystack, 'e', 'x', 1)
// pos == 6 (position of 'e' in "example")
Example with larger offset:
// Searching for "@...com" where '@' is at offset 0 and 'c' is at offset 4
haystack := []byte("contact@test.com for info")
pos := simd.MemchrPair(haystack, '@', 'c', 9)
// pos == 7 (position of '@')
func MemchrWord ¶ added in v0.9.4
MemchrWord finds the first word character [A-Za-z0-9_] in haystack. Returns position or -1 if not found.
This function uses AVX2 SIMD instructions to process 32 bytes per iteration when available, achieving ~10x speedup over scalar search on large inputs.
Performance characteristics:
- Small inputs (< 32 bytes): scalar fallback
- Large inputs (>= 32 bytes): AVX2 processes 32 bytes/iteration
Example:
haystack := []byte(" hello123")
pos := simd.MemchrWord(haystack)
// pos == 3 (position of 'h')
func Memmem ¶
Memmem returns the index of the first instance of needle in haystack, or -1 if needle is not present in haystack.
This is equivalent to bytes.Index but uses SIMD acceleration via paired-byte search, followed by fast verification. The implementation combines a rare byte frequency heuristic with SIMD-accelerated scanning to achieve significant speedup over stdlib.
Performance characteristics (vs bytes.Index):
- Short needles (2-8 bytes): 5-20x faster
- Medium needles (8-32 bytes): 10-100x faster
- Long needles (> 32 bytes): 5-50x faster
Algorithm:
The function uses paired-byte SIMD search with frequency-based rare byte selection:
- Identify the two rarest bytes in needle using empirical frequency table
- Use MemchrPair to find candidates where both bytes appear at correct distance
- For each candidate, verify the full needle match
- Return position of first match or -1 if not found
The paired-byte approach dramatically reduces false positives compared to single-byte search, since matches require two specific bytes at exactly the right distance apart. For example, in "@example.com", both '@' (rank 25) and 'x' (rank 45) are used, requiring them to appear exactly 2 positions apart.
For longer needles (> 32 bytes), a simplified Two-Way string matching approach is used to maintain O(n+m) complexity and avoid pathological cases.
Example:
haystack := []byte("hello world")
needle := []byte("world")
pos := simd.Memmem(haystack, needle)
// pos == 6
Example with not found:
haystack := []byte("hello world")
needle := []byte("xyz")
pos := simd.Memmem(haystack, needle)
// pos == -1
Example with repeated patterns:
haystack := []byte("aaaaaabaaaa")
needle := []byte("aab")
pos := simd.Memmem(haystack, needle)
// pos == 5
Example ¶
Example demonstrates basic substring search
package main
import (
"fmt"
"github.com/coregx/coregex/simd"
)
func main() {
haystack := []byte("hello world")
needle := []byte("world")
pos := simd.Memmem(haystack, needle)
if pos != -1 {
fmt.Printf("Found at position %d\n", pos)
} else {
fmt.Println("Not found")
}
}
Output: Found at position 6
Example (EmptyNeedle) ¶
Example_emptyNeedle demonstrates that empty needle matches at start
package main
import (
"fmt"
"github.com/coregx/coregex/simd"
)
func main() {
haystack := []byte("hello")
needle := []byte("")
pos := simd.Memmem(haystack, needle)
fmt.Printf("Empty needle found at position %d\n", pos)
}
Output: Empty needle found at position 0
Example (HttpHeader) ¶
Example_httpHeader demonstrates searching in HTTP headers
package main
import (
"fmt"
"github.com/coregx/coregex/simd"
)
func main() {
header := []byte("Content-Type: application/json\r\nContent-Length: 1234\r\n")
needle := []byte("Content-Length:")
pos := simd.Memmem(header, needle)
if pos != -1 {
fmt.Printf("Found header at position %d\n", pos)
}
}
Output: Found header at position 32
Example (JsonKey) ¶
Example_jsonKey demonstrates searching for JSON keys
package main
import (
"fmt"
"github.com/coregx/coregex/simd"
)
func main() {
json := []byte(`{"name":"John","age":30,"city":"New York"}`)
needle := []byte(`"age"`)
pos := simd.Memmem(json, needle)
if pos != -1 {
fmt.Printf("Found 'age' key at position %d\n", pos)
}
}
Output: Found 'age' key at position 15
Example (NotFound) ¶
Example_notFound demonstrates the case when needle is not present
package main
import (
"fmt"
"github.com/coregx/coregex/simd"
)
func main() {
haystack := []byte("hello world")
needle := []byte("xyz")
pos := simd.Memmem(haystack, needle)
if pos == -1 {
fmt.Println("Not found")
}
}
Output: Not found
Example (RepeatedPattern) ¶
Example_repeatedPattern demonstrates finding patterns in repeated data
package main
import (
"fmt"
"github.com/coregx/coregex/simd"
)
func main() {
dna := []byte("ATATATGCGCGC")
pattern := []byte("GCGC")
pos := simd.Memmem(dna, pattern)
if pos != -1 {
fmt.Printf("Pattern found at position %d\n", pos)
fmt.Printf("Context: %s\n", dna[pos:pos+len(pattern)])
}
}
Output: Pattern found at position 6 Context: GCGC
Types ¶
type RareByteInfo ¶ added in v0.9.0
type RareByteInfo struct {
// Byte1 is the rarest byte found in the needle.
Byte1 byte
// Index1 is the position of Byte1 in the needle.
Index1 int
// Byte2 is the second rarest byte (different from Byte1).
Byte2 byte
// Index2 is the position of Byte2 in the needle.
Index2 int
}
RareByteInfo holds information about selected rare bytes for search.
func SelectRareBytes ¶ added in v0.9.0
func SelectRareBytes(needle []byte) RareByteInfo
SelectRareBytes finds the two rarest bytes in needle using the frequency table. This enables paired-byte SIMD search which is more selective than single-byte.
The algorithm:
- Start with first two bytes as candidates
- Iterate through needle, tracking the two rarest bytes seen
- Ensure Byte1 is always the rarest (lowest rank)
- Ensure Byte2 is different from Byte1
Returns RareByteInfo with the two rarest bytes and their positions. For needles shorter than 2 bytes, Byte2/Index2 may equal Byte1/Index1.