simd

package

v0.11.4 Latest Latest Go to latest Published: Jan 16, 2026 License: MIT Imports: 4 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/coregx/coregex

Links

Open Source Insights

Documentation ¶

Rendered for

Overview ¶

Package simd provides SIMD-accelerated string operations for high-performance byte searching. The package automatically selects the best implementation based on available CPU features (AVX2/SSE4.2 on x86-64) and falls back to optimized pure Go implementations on other platforms.

The primary use case is accelerating regex engine prefilters by quickly finding literal bytes/substrings in large text buffers.

Index ¶

Variables
func ByteRank(b byte) byte
func CountNonASCII(data []byte) int
func FirstNonASCII(data []byte) int
func IsASCII(data []byte) bool
func Memchr(haystack []byte, needle byte) int
func Memchr2(haystack []byte, needle1, needle2 byte) int
func Memchr3(haystack []byte, needle1, needle2, needle3 byte) int
func MemchrDigit(haystack []byte) int
func MemchrDigitAt(haystack []byte, at int) int
func MemchrInTable(haystack []byte, table *[256]bool) int
func MemchrNotInTable(haystack []byte, table *[256]bool) int
func MemchrNotWord(haystack []byte) int
func MemchrPair(haystack []byte, byte1, byte2 byte, offset int) int
func MemchrWord(haystack []byte) int
func Memmem(haystack, needle []byte) int
type RareByteInfo
- func SelectRareBytes(needle []byte) RareByteInfo

Constants ¶

This section is empty.

Variables ¶

View Source

var ByteFrequencies = [256]byte{}/* 256 elements not displayed */

ByteFrequencies contains empirical byte frequency ranks based on analysis of English text, source code, and binary data.

Lower rank = rarer byte (better candidate for SIMD search). Higher rank = more common byte (worse candidate).

The table is derived from:

English text corpus analysis
Source code repositories (Go, Rust, C, Python)
Binary file sampling

This matches the approach used by Rust's memchr crate for optimal rare byte selection in substring search.

Reference: https://github.com/BurntSushi/memchr

Functions ¶

func ByteRank ¶ added in v0.9.0

func ByteRank(b byte) byte

ByteRank returns the frequency rank of a byte. Lower values indicate rarer bytes (better for search optimization).

func CountNonASCII ¶ added in v0.11.0

func CountNonASCII(data []byte) int

CountNonASCII returns the number of non-ASCII bytes in the slice. This is useful for statistics and deciding whether to use UTF-8 or ASCII paths.

Performance: Same as IsASCII (processes 8 bytes at a time using SWAR).

func FirstNonASCII ¶ added in v0.11.0

func FirstNonASCII(data []byte) int

FirstNonASCII returns the index of the first non-ASCII byte, or -1 if all bytes are ASCII. This is useful for finding where UTF-8 sequences begin.

Performance: Same as IsASCII but returns early on first non-ASCII byte.

func IsASCII ¶ added in v0.11.0

func IsASCII(data []byte) bool

IsASCII checks if all bytes in the slice are ASCII (< 0x80). Returns true if all bytes have the high bit clear (values 0x00-0x7F).

This function is critical for UTF-8 optimization in the regex engine. When input is ASCII-only, the engine can skip UTF-8 decoding overhead and use simplified automata (1 NFA state for '.' instead of 39).

Performance characteristics (on x86-64 with AVX2):

Small inputs (< 32 bytes): uses optimized SWAR (8 bytes at a time)
Medium inputs (32B - 4KB): 10-15 GB/s throughput
Large inputs (> 4KB): 20-30 GB/s throughput (memory bandwidth limited)

Algorithm (AVX2):

Load 32 bytes into YMM register
Use VPMOVMSKB to extract high bit of each byte
If mask != 0, non-ASCII byte found
Handle tail with scalar loop

Example:

data := []byte("hello world")
if simd.IsASCII(data) {
    // Use ASCII-optimized path
}

Example with non-ASCII:

data := []byte("héllo")
if !simd.IsASCII(data) {
    // Use UTF-8 path
}

func Memchr ¶

func Memchr(haystack []byte, needle byte) int

Memchr returns the index of the first instance of needle in haystack, or -1 if needle is not present in haystack.

This function is equivalent to bytes.IndexByte but uses SIMD instructions (AVX2/SSE4.2) when available on x86-64 platforms. It automatically falls back to pure Go implementation on other architectures.

Performance characteristics (on x86-64 with AVX2):

Small inputs (< 64 bytes): approximately same as bytes.IndexByte
Medium inputs (64B - 4KB): 2-5x faster than bytes.IndexByte
Large inputs (> 4KB): 8-15x faster than bytes.IndexByte

The function uses aligned vector loads and processes 32 bytes per iteration when AVX2 is available, or 16 bytes with SSE4.2.

Example:

haystack := []byte("hello world")
pos := simd.Memchr(haystack, 'o')
if pos != -1 {
    fmt.Printf("Found 'o' at position %d\n", pos) // Output: Found 'o' at position 4
}

Example with not found:

haystack := []byte("hello world")
pos := simd.Memchr(haystack, 'x')
if pos == -1 {
    fmt.Println("Character 'x' not found")
}

func Memchr2 ¶

func Memchr2(haystack []byte, needle1, needle2 byte) int

Memchr2 returns the index of the first instance of either needle1 or needle2 in haystack, or -1 if neither is present.

This function is significantly faster than calling Memchr twice or using a loop, as it checks both bytes simultaneously using SIMD instructions. The order of needles does not affect performance.

Performance characteristics (on x86-64 with AVX2):

Processes both needle comparisons in parallel using vector operations
Same throughput as Memchr (no overhead for checking two bytes)
5-10x faster than sequential search on large inputs

The function returns the position of whichever needle appears first in haystack. If both needles appear at the same position (impossible unless needle1 == needle2), the result is deterministic but unspecified.

Example:

haystack := []byte("hello world")
pos := simd.Memchr2(haystack, 'o', 'w')
if pos != -1 {
    fmt.Printf("Found 'o' or 'w' at position %d\n", pos) // Output: Found 'o' or 'w' at position 4
}

Example searching for punctuation:

text := []byte("Hello, world!")
pos := simd.Memchr2(text, ',', '!')
if pos != -1 {
    fmt.Printf("Found punctuation at position %d\n", pos) // Output: Found punctuation at position 5
}

func Memchr3 ¶

func Memchr3(haystack []byte, needle1, needle2, needle3 byte) int

Memchr3 returns the index of the first instance of needle1, needle2, or needle3 in haystack, or -1 if none are present.

This function checks three bytes simultaneously using SIMD instructions, making it much faster than multiple sequential searches or a loop with three comparisons per iteration.

Performance characteristics (on x86-64 with AVX2):

Processes all three needle comparisons in parallel
Same throughput as Memchr and Memchr2
Ideal for searching character classes (e.g., whitespace, delimiters)

The function returns the position of whichever needle appears first in haystack. The order of needle parameters does not affect performance.

Example searching for whitespace:

text := []byte("hello\tworld\nfoo")
pos := simd.Memchr3(text, ' ', '\t', '\n')
if pos != -1 {
    fmt.Printf("Found whitespace at position %d\n", pos) // Output: Found whitespace at position 5
}

Example searching for multiple delimiters:

csv := []byte("name,age;city")
pos := simd.Memchr3(csv, ',', ';', '|')
if pos != -1 {
    fmt.Printf("Found delimiter at position %d\n", pos) // Output: Found delimiter at position 4
}

func MemchrDigit ¶ added in v0.9.0

func MemchrDigit(haystack []byte) int

MemchrDigit returns the index of the first ASCII digit [0-9] in haystack, or -1 if no digit is found.

On AMD64 with AVX2 support and haystack >= 32 bytes, uses SIMD acceleration. Otherwise falls back to optimized scalar loop.

func MemchrDigitAt ¶ added in v0.9.0

func MemchrDigitAt(haystack []byte, at int) int

MemchrDigitAt returns the index of the first ASCII digit [0-9] at or after position 'at' in haystack, or -1 if no digit is found.

func MemchrInTable ¶ added in v0.9.4

func MemchrInTable(haystack []byte, table *[256]bool) int

MemchrInTable finds the first byte where table[byte] is true. This is a general-purpose character class search using a 256-byte lookup table.

For specific classes like \w, use MemchrWord instead for better performance.

Returns position or -1 if not found.

func MemchrNotInTable ¶ added in v0.9.4

func MemchrNotInTable(haystack []byte, table *[256]bool) int

MemchrNotInTable finds the first byte where table[byte] is false. This is the complement of MemchrInTable.

Returns position or -1 if all bytes have table[byte] == true.

func MemchrNotWord ¶ added in v0.9.4

func MemchrNotWord(haystack []byte) int

MemchrNotWord finds the first non-word character in haystack. Returns position or -1 if all bytes are word characters.

This is the complement of MemchrWord: it finds the first byte that is NOT in [A-Za-z0-9_].

Example:

haystack := []byte("hello world")
pos := simd.MemchrNotWord(haystack)
// pos == 5 (position of space)

func MemchrPair ¶ added in v0.9.0

func MemchrPair(haystack []byte, byte1, byte2 byte, offset int) int

MemchrPair finds the first position where byte1 appears at offset 0 and byte2 appears at the specified offset from byte1. This enables highly selective substring searching by verifying two bytes at their correct relative positions.

Parameters:

haystack: the byte slice to search
byte1: the first byte to find (anchor byte)
byte2: the second byte to find
offset: the distance from byte1 to byte2 (byte2 position = byte1 position + offset)

Returns the position of byte1 where both conditions are met, or -1 if not found.

This function is significantly more selective than single-byte search because false positives require both bytes to appear at exactly the right distance apart.

Example:

// Searching for pattern "ex" where 'e' is at offset 0 and 'x' is at offset 1
haystack := []byte("hello example world")
pos := simd.MemchrPair(haystack, 'e', 'x', 1)
// pos == 6 (position of 'e' in "example")

Example with larger offset:

// Searching for "@...com" where '@' is at offset 0 and 'c' is at offset 4
haystack := []byte("contact@test.com for info")
pos := simd.MemchrPair(haystack, '@', 'c', 9)
// pos == 7 (position of '@')

func MemchrWord ¶ added in v0.9.4

func MemchrWord(haystack []byte) int

MemchrWord finds the first word character [A-Za-z0-9_] in haystack. Returns position or -1 if not found.

This function uses AVX2 SIMD instructions to process 32 bytes per iteration when available, achieving ~10x speedup over scalar search on large inputs.

Performance characteristics:

Small inputs (< 32 bytes): scalar fallback
Large inputs (>= 32 bytes): AVX2 processes 32 bytes/iteration

Example:

haystack := []byte("   hello123")
pos := simd.MemchrWord(haystack)
// pos == 3 (position of 'h')

func Memmem ¶

func Memmem(haystack, needle []byte) int

Memmem returns the index of the first instance of needle in haystack, or -1 if needle is not present in haystack.

This is equivalent to bytes.Index but uses SIMD acceleration via paired-byte search, followed by fast verification. The implementation combines a rare byte frequency heuristic with SIMD-accelerated scanning to achieve significant speedup over stdlib.

Performance characteristics (vs bytes.Index):

Short needles (2-8 bytes): 5-20x faster
Medium needles (8-32 bytes): 10-100x faster
Long needles (> 32 bytes): 5-50x faster

Algorithm:

The function uses paired-byte SIMD search with frequency-based rare byte selection:

Identify the two rarest bytes in needle using empirical frequency table
Use MemchrPair to find candidates where both bytes appear at correct distance
For each candidate, verify the full needle match
Return position of first match or -1 if not found

The paired-byte approach dramatically reduces false positives compared to single-byte search, since matches require two specific bytes at exactly the right distance apart. For example, in "@example.com", both '@' (rank 25) and 'x' (rank 45) are used, requiring them to appear exactly 2 positions apart.

For longer needles (> 32 bytes), a simplified Two-Way string matching approach is used to maintain O(n+m) complexity and avoid pathological cases.

Example:

haystack := []byte("hello world")
needle := []byte("world")
pos := simd.Memmem(haystack, needle)
// pos == 6

Example with not found:

haystack := []byte("hello world")
needle := []byte("xyz")
pos := simd.Memmem(haystack, needle)
// pos == -1

Example with repeated patterns:

haystack := []byte("aaaaaabaaaa")
needle := []byte("aab")
pos := simd.Memmem(haystack, needle)
// pos == 5

Example ¶

Example demonstrates basic substring search

package main

import (
	"fmt"

	"github.com/coregx/coregex/simd"
)

func main() {
	haystack := []byte("hello world")
	needle := []byte("world")

	pos := simd.Memmem(haystack, needle)
	if pos != -1 {
		fmt.Printf("Found at position %d\n", pos)
	} else {
		fmt.Println("Not found")
	}
}

Output:

Found at position 6

Example (EmptyNeedle) ¶

Example_emptyNeedle demonstrates that empty needle matches at start

package main

import (
	"fmt"

	"github.com/coregx/coregex/simd"
)

func main() {
	haystack := []byte("hello")
	needle := []byte("")

	pos := simd.Memmem(haystack, needle)
	fmt.Printf("Empty needle found at position %d\n", pos)
}

Output:

Empty needle found at position 0

Example (HttpHeader) ¶

Example_httpHeader demonstrates searching in HTTP headers

package main

import (
	"fmt"

	"github.com/coregx/coregex/simd"
)

func main() {
	header := []byte("Content-Type: application/json\r\nContent-Length: 1234\r\n")
	needle := []byte("Content-Length:")

	pos := simd.Memmem(header, needle)
	if pos != -1 {
		fmt.Printf("Found header at position %d\n", pos)
	}
}

Output:

Found header at position 32

Example (JsonKey) ¶

Example_jsonKey demonstrates searching for JSON keys

package main

import (
	"fmt"

	"github.com/coregx/coregex/simd"
)

func main() {
	json := []byte(`{"name":"John","age":30,"city":"New York"}`)
	needle := []byte(`"age"`)

	pos := simd.Memmem(json, needle)
	if pos != -1 {
		fmt.Printf("Found 'age' key at position %d\n", pos)
	}
}

Output:

Found 'age' key at position 15

Example (NotFound) ¶

Example_notFound demonstrates the case when needle is not present

package main

import (
	"fmt"

	"github.com/coregx/coregex/simd"
)

func main() {
	haystack := []byte("hello world")
	needle := []byte("xyz")

	pos := simd.Memmem(haystack, needle)
	if pos == -1 {
		fmt.Println("Not found")
	}
}

Output:

Not found

Example (RepeatedPattern) ¶

Example_repeatedPattern demonstrates finding patterns in repeated data

package main

import (
	"fmt"

	"github.com/coregx/coregex/simd"
)

func main() {
	dna := []byte("ATATATGCGCGC")
	pattern := []byte("GCGC")

	pos := simd.Memmem(dna, pattern)
	if pos != -1 {
		fmt.Printf("Pattern found at position %d\n", pos)
		fmt.Printf("Context: %s\n", dna[pos:pos+len(pattern)])
	}
}

Output:

Pattern found at position 6
Context: GCGC

Types ¶

type RareByteInfo ¶ added in v0.9.0

type RareByteInfo struct {
	// Byte1 is the rarest byte found in the needle.
	Byte1 byte
	// Index1 is the position of Byte1 in the needle.
	Index1 int
	// Byte2 is the second rarest byte (different from Byte1).
	Byte2 byte
	// Index2 is the position of Byte2 in the needle.
	Index2 int
}

RareByteInfo holds information about selected rare bytes for search.

func SelectRareBytes ¶ added in v0.9.0

func SelectRareBytes(needle []byte) RareByteInfo

SelectRareBytes finds the two rarest bytes in needle using the frequency table. This enables paired-byte SIMD search which is more selective than single-byte.

The algorithm:

Start with first two bytes as candidates
Iterate through needle, tracking the two rarest bytes seen
Ensure Byte1 is always the rarest (lowest rank)
Ensure Byte2 is different from Byte1

Returns RareByteInfo with the two rarest bytes and their positions. For needles shorter than 2 bytes, Byte2/Index2 may equal Byte1/Index1.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL