swar

v0.1.0
Warning: This package is not in the latest version of its module.

Published: May 21, 2025 License: MIT Imports: 1 Imported by: 0

README

swar: Faster byte processing

Process 8 bytes at a time using an old technique called SIMD Within A Register (SWAR).

  • 🚀 Up to 6x faster than optimized byte-by-byte code
  • 🔌 Zero dependencies - no CGO or assembly required
  • 🧩 Dead simple API - works with your existing code
  • Fully portable - runs anywhere Go runs
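
spaces := swar.Dupe(' ') // ' ' copied into all 8 bytes
chunks, remainder := swar.BytesToLanes(text)
for _, chunk := range chunks {
    // We can work with 8 bytes at a time!
    matches := swar.HighBitWhereEqual(chunk, spaces)
    _ = matches // 0x80 in every byte of chunk that is a space
}
_ = text[remainder:] // leftover bytes (fewer than 8) still need byte-by-byte handling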

Installation

go get github.com/dans-stuff/swar

Core Operations

Category    Operations                               Use Cases
----------  ---------------------------------------  ----------------------------
Comparison  Equal, Less, Greater                     Pattern matching, thresholds
Math        Add, Subtract, Min/Max, Average          Signal processing, stats
Bit Ops     Swap nibbles, Reverse bits, Count ones   Encoding, hashing
Selection   Branchless conditional select            Transformations

Real Performance

Operation                     Standard Go   SWAR       Speedup
----------------------------  ------------  ---------  -------
Count character occurrences   19.30 ns      7.58 ns    2.55x
Find uppercase letters        31.41 ns      20.32 ns   1.55x
Convert case                  61.96 ns      31.53 ns   1.96x
Detect anomalies              6.99 ns       4.17 ns    1.68x

Full Example: Character Counter

This example counts spaces in a string. For short strings it even outperforms stdlib bytes.Count, which is written in assembly! Find more examples in the examples_test.go file.

package main

import (
    "fmt"
    "math/bits" // needed for bits.OnesCount64 below

    "github.com/dans-stuff/swar"
)

func main() {
    text := []byte("Hello, World!")
    
    // Process in 8-byte chunks
    lanes, remainder := swar.BytesToLanes(text)
    
    // Find spaces in parallel
    spaces := swar.Dupe(' ')
    count := 0
    
    for _, lane := range lanes {
        // Sets high bit in bytes equal to space
        matches := swar.HighBitWhereEqual(lane, spaces)
        // Count matches
        count += bits.OnesCount64(matches >> 7)
    }
    
    // Process any leftover bytes
    for _, c := range text[remainder:] {
        if c == ' ' {
            count++
        }
    }
    
    fmt.Printf("Found %d spaces\n", count)
}

Perfect For

  • Text Processing: UTF-8 validation, parser tokenization
  • Network Protocols: Header parsing, packet filtering
  • Image Processing: Thresholding, pixel transformations
  • Data Analysis: Time series anomaly detection

How It Works

SWAR treats a 64-bit integer as 8 parallel byte lanes and uses bit manipulation (masks, shifts, and carry control) to apply the same operation to all 8 bytes at once, without branching.
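As a minimal illustration, the sketch below packs eight ASCII letters into one uint64 and flips the case of all eight with a single XOR against 0x20 in every byte. It uses encoding/binary with little-endian byte order, which is an assumption for the example and not necessarily how swar loads lanes; toggleCase is a hypothetical helper, not part of the swar API. XOR never carries, so the lanes stay independent for free; operations that do carry, such as addition, need masking so a byte's carry cannot reach its neighbor.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// caseBit has 0x20, the ASCII upper/lower case bit, in every byte lane.
const caseBit uint64 = 0x2020_2020_2020_2020

// toggleCase (hypothetical helper) flips the ASCII case of 8 letters with one XOR.
// It assumes all 8 bytes are ASCII letters; any other byte would also get bit 5 flipped.
func toggleCase(word [8]byte) [8]byte {
	v := binary.LittleEndian.Uint64(word[:]) // 8 bytes -> 8 lanes of one uint64
	v ^= caseBit                              // same operation on every lane, no branches
	var out [8]byte
	binary.LittleEndian.PutUint64(out[:], v)
	return out
}

func main() {
	in := [8]byte{'S', 'W', 'A', 'R', 'b', 'y', 't', 'e'}
	out := toggleCase(in)
	fmt.Println(string(out[:])) // swarBYTE
}
```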

License & Contributing

MIT Licensed. Contributions welcome!

Documentation


Constants

const (
	// HighBits is a mask with the high bit set in all 8 bytes of a uint64
	HighBits uint64 = 0x8080_8080_8080_8080
)
const (
	// LowBits has the lowest bit set in each byte for value duplication
	LowBits uint64 = 0x0101_0101_0101_0101
)

Variables

var Lookup = struct {
	OnesPositions [256][]int
}{
	func() (res [256][]int) {
		for b := range res {
			for i := 0; i < 8; i++ {
				if b>>i&1 == 1 {
					res[b] = append(res[b], i)
				}
			}
		}
		return
	}()}

Lookup provides precomputed data for optimized operations. OnesPositions maps each byte value to the positions of its set bits (bit indexes 0 to 7, in ascending order).
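One plausible use of this table: once a comparison result has been compacted to one bit per lane (for example with ExtractLowBits), index the table with that byte to loop over only the lanes that matched. The sketch rebuilds the same table locally so it runs on its own; onesPositions and the mask value are illustrative, not taken from the library.

```go
package main

import "fmt"

// onesPositions (hypothetical local copy) mirrors Lookup.OnesPositions:
// entry b lists, in ascending order, the bit indexes that are set in b.
var onesPositions = func() (res [256][]int) {
	for b := range res {
		for i := 0; i < 8; i++ {
			if b>>i&1 == 1 {
				res[b] = append(res[b], i)
			}
		}
	}
	return
}()

func main() {
	// Example mask: bit i set means byte lane i matched.
	var mask byte = 0b1010_0001
	for _, lane := range onesPositions[mask] {
		fmt.Println("match in lane", lane) // lanes 0, 5, 7
	}
}
```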

Functions

func AbsoluteDifferenceBetweenBytes

func AbsoluteDifferenceBetweenBytes(a, b uint64) uint64

AbsoluteDifferenceBetweenBytes calculates |a-b| for each byte. It computes unsigned distances for metrics and signal processing.
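A minimal sketch of one standard way to get |a-b| per byte: take the saturating difference in both directions and OR them, since for every byte at least one of the two is zero. This is not necessarily how AbsoluteDifferenceBetweenBytes is implemented; subWrap, subSat, and absDiff are hypothetical helpers.

```go
package main

import "fmt"

const (
	highBits uint64 = 0x8080_8080_8080_8080
	low7Bits uint64 = 0x7F7F_7F7F_7F7F_7F7F
)

// subWrap subtracts byte-wise modulo 256. Setting each minuend byte's high bit and
// clearing each subtrahend byte's high bit means no byte can borrow from its
// neighbor; the final XOR restores the correct bit 7 of every byte.
func subWrap(a, b uint64) uint64 {
	return ((a | highBits) - (b & low7Bits)) ^ ((a ^ ^b) & highBits)
}

// subSat subtracts byte-wise, clamping at 0.
func subSat(a, b uint64) uint64 {
	d := subWrap(a, b)
	borrow := ((^a & b) | (^(a ^ b) & d)) & highBits // 0x80 where a < b
	return d &^ ((borrow >> 7) * 0xFF)                // zero the bytes that borrowed
}

// absDiff computes |a-b| per byte.
func absDiff(a, b uint64) uint64 {
	return subSat(a, b) | subSat(b, a)
}

func main() {
	a := uint64(0x0A_FF_00_64) // bytes (low to high): 0x64, 0x00, 0xFF, 0x0A
	b := uint64(0x14_00_FF_32) // bytes (low to high): 0x32, 0xFF, 0x00, 0x14
	fmt.Printf("%#x\n", absDiff(a, b)) // 0xaffff32
}
```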

func AddBytesWithMaximum

func AddBytesWithMaximum(a, b uint64) uint64

AddBytesWithMaximum performs byte-wise addition clamped at 255. This saturating addition prevents overflow in all 8 bytes.
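Saturating addition can be sketched as a wrapping byte-wise add followed by detecting which bytes carried out of bit 7 and forcing those bytes to 0xFF. This is a standard SWAR construction offered under that assumption, not the library's confirmed code; addWrap and addSat are hypothetical names.

```go
package main

import "fmt"

const (
	highBits uint64 = 0x8080_8080_8080_8080
	low7Bits uint64 = 0x7F7F_7F7F_7F7F_7F7F
)

// addWrap adds byte-wise modulo 256: the low 7 bits are added with the high bits
// cleared so no carry can leave a byte, then bit 7 is fixed up with XOR.
func addWrap(a, b uint64) uint64 {
	return ((a & low7Bits) + (b & low7Bits)) ^ ((a ^ b) & highBits)
}

// addSat adds byte-wise and clamps each byte at 255.
func addSat(a, b uint64) uint64 {
	s := addWrap(a, b)
	// A byte carried out of bit 7 if both inputs had bit 7 set, or if exactly
	// one did and the sum's bit 7 came out clear.
	carry := ((a & b) | ((a | b) &^ s)) & highBits
	return s | ((carry >> 7) * 0xFF) // widen 0x80 flags to 0xFF and force those bytes
}

func main() {
	a := uint64(0x10_C8) // bytes (low to high): 200, 16
	b := uint64(0x20_64) // bytes (low to high): 100, 32
	fmt.Printf("%#x\n", addSat(a, b)) // 0x30ff
}
```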

func AddBytesWithWrapping

func AddBytesWithWrapping(a, b uint64) uint64

AddBytesWithWrapping performs byte-wise addition with wrap-around. It adds all 8 bytes in parallel; a byte that overflows wraps modulo 256 (for example, 255 + 1 = 0) without carrying into its neighbor.
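The key difficulty is that a plain 64-bit add lets a carry escape from one byte into the next. A common fix, shown here as a sketch rather than the library's confirmed implementation (addWrap is a hypothetical name), is to add only the low 7 bits of each byte and then patch bit 7 with XOR.

```go
package main

import "fmt"

const (
	highBits uint64 = 0x8080_8080_8080_8080
	low7Bits uint64 = 0x7F7F_7F7F_7F7F_7F7F
)

// addWrap adds each pair of bytes modulo 256.
// Adding only the low 7 bits of every byte means the largest per-byte sum is
// 0x7F+0x7F = 0xFE, so no carry crosses into the next lane. Bit 7 of each byte
// must be a7 XOR b7 XOR (carry into bit 7), which the final XOR supplies.
func addWrap(a, b uint64) uint64 {
	return ((a & low7Bits) + (b & low7Bits)) ^ ((a ^ b) & highBits)
}

func main() {
	a := uint64(0x01_FF) // bytes (low to high): 255, 1
	b := uint64(0x01_01) // bytes (low to high): 1, 1
	// Plain a+b carries out of the low byte into its neighbor; addWrap does not.
	fmt.Printf("%#x %#x\n", a+b, addWrap(a, b)) // 0x300 0x200
}
```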

func AverageBytes

func AverageBytes(a, b uint64) uint64

AverageBytes calculates (a+b)/2 for each byte without overflow. Perfect for signal processing, image manipulation, and smoothing.
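The overflow-free average follows from the identity a+b = 2(a&b) + (a^b). The sketch below rounds down; whether AverageBytes rounds down or up is not stated, so the rounding here is an assumption, and avgFloor is a hypothetical name.

```go
package main

import "fmt"

const low7Bits uint64 = 0x7F7F_7F7F_7F7F_7F7F

// avgFloor computes floor((a+b)/2) for every byte without widening.
// (a&b) holds the shared bits, (a^b)>>1 is half of the differing bits; the mask
// drops the bit that the shift moves in from the neighboring byte.
func avgFloor(a, b uint64) uint64 {
	return (a & b) + (((a ^ b) >> 1) & low7Bits)
}

func main() {
	a := uint64(0xFF_01_0A) // bytes (low to high): 10, 1, 255
	b := uint64(0xFE_02_14) // bytes (low to high): 20, 2, 254
	fmt.Printf("%#x\n", avgFloor(a, b)) // 0xfe010f -> 15, 1, 254
}
```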

func BytesToLanes

func BytesToLanes(b []byte) ([]uint64, int)

BytesToLanes converts a []byte to []uint64 for SWAR processing. It returns the uint64 lanes and the index where the unused (leftover) bytes begin.
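To make the return values concrete, here is a copying sketch of the same contract: every full group of 8 bytes becomes one uint64, and the index marks where the fewer-than-8 leftover bytes start. It assumes little-endian packing via encoding/binary; the library describes its lane conversions as zero-copy, so it likely reinterprets memory instead of copying. bytesToLanes is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bytesToLanes (hypothetical, copying) packs each full 8-byte group into one
// little-endian uint64 and returns the index where the leftover bytes begin.
func bytesToLanes(b []byte) ([]uint64, int) {
	n := len(b) / 8
	lanes := make([]uint64, n)
	for i := range lanes {
		lanes[i] = binary.LittleEndian.Uint64(b[i*8:])
	}
	return lanes, n * 8
}

func main() {
	text := []byte("Hello, World!") // 13 bytes: 1 full lane ("Hello, W") + 5 leftover
	lanes, rest := bytesToLanes(text)
	fmt.Println(len(lanes), string(text[rest:])) // 1 orld!
}
```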

func CountOnesPerByte

func CountOnesPerByte(v uint64) uint64

CountOnesPerByte counts the set bits in each byte. This parallel population count is useful for Hamming distance and feature extraction.
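A per-byte population count can be sketched with the classic divide-and-conquer popcount, stopping once each byte holds its own count (at most 8, so it fits). This is a well-known technique, not confirmed as the library's; onesPerByte is a hypothetical name. Applying it to a^b would give per-byte Hamming distances.

```go
package main

import "fmt"

// onesPerByte returns, in each byte, the number of set bits of that byte (0..8).
func onesPerByte(v uint64) uint64 {
	v -= (v >> 1) & 0x5555_5555_5555_5555                                // counts per 2-bit field
	v = (v & 0x3333_3333_3333_3333) + ((v >> 2) & 0x3333_3333_3333_3333) // counts per 4-bit field
	return (v + (v >> 4)) & 0x0F0F_0F0F_0F0F_0F0F                        // counts per byte
}

func main() {
	v := uint64(0x01_0F_FF) // bytes (low to high): 0xFF, 0x0F, 0x01
	counts := onesPerByte(v)
	// Multiplying by 0x0101...01 sums all byte counts into the top byte.
	total := (counts * 0x0101_0101_0101_0101) >> 56
	fmt.Printf("%#x %d\n", counts, total) // 0x10408 13
}
```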

func Dupe

func Dupe(c byte) uint64

Dupe duplicates a byte across all 8 bytes of a uint64. It creates comparison values for parallel operations.
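The LowBits constant is described as being for value duplication, which suggests the usual multiply trick: c * 0x0101...01 adds c<<0, c<<8, ..., c<<56, and because c is below 256 the shifted copies never overlap, so nothing carries. A sketch under that assumption (dupe is a hypothetical name):

```go
package main

import "fmt"

// lowBits matches the documented LowBits constant.
const lowBits uint64 = 0x0101_0101_0101_0101

// dupe copies c into all 8 bytes of a uint64.
func dupe(c byte) uint64 {
	return uint64(c) * lowBits
}

func main() {
	fmt.Printf("%#x\n", dupe(' ')) // 0x2020202020202020
}
```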

func ExtractLowBits

func ExtractLowBits(v uint64) byte

ExtractLowBits packs the low bit from each byte into a single byte, compacting 8 comparison results into one byte.
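One branchless way to gather eight low bits is a mask followed by a single multiply, sketched below. The bit ordering (bit i of the result comes from byte i, byte 0 being least significant) is an assumption; the library's ordering is not documented here. extractLowBits is a hypothetical name.

```go
package main

import "fmt"

const lowBits uint64 = 0x0101_0101_0101_0101

// extractLowBits gathers bit 0 of each byte into one byte: bit i of the result is
// bit 0 of byte i. After masking, the multiply places byte i's bit at position 56+i;
// every partial product lands on a distinct bit, so no carries disturb the top byte.
func extractLowBits(v uint64) byte {
	return byte(((v & lowBits) * 0x0102_0408_1020_4080) >> 56)
}

func main() {
	// Comparison flags are 0x80 per matching byte; shift them down to bit 0 first.
	flags := uint64(0x8000_0000_0000_0080) // lanes 0 and 7 matched
	fmt.Printf("%08b\n", extractLowBits(flags>>7)) // 10000001
}
```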

func HighBitWhereEqual

func HighBitWhereEqual(v, cm uint64) uint64

HighBitWhereEqual sets the high bit (0x80) in each byte where v == cm. Ideal for pattern matching and finding specific values in data.

func HighBitWhereGreater

func HighBitWhereGreater(v, cm uint64) uint64

HighBitWhereGreater sets the high bit (0x80) in each byte where v > cm. Perfect for threshold detection across multiple values.

func HighBitWhereLess

func HighBitWhereLess(v, cm uint64) uint64

HighBitWhereLess sets the high bit (0x80) in each byte where v < cm, comparing all 8 bytes in parallel.
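An unsigned per-byte v < cm test can be sketched as a lane-safe subtraction v - cm that reports each byte's borrow out of bit 7, which is set exactly when v < cm; greater-than is the same test with the operands swapped. This assumes unsigned byte comparison and is not confirmed as the library's method; the lowercase names are hypothetical.

```go
package main

import "fmt"

const (
	highBits uint64 = 0x8080_8080_8080_8080
	low7Bits uint64 = 0x7F7F_7F7F_7F7F_7F7F
)

// highBitWhereLess sets 0x80 in each byte where v < cm (unsigned).
func highBitWhereLess(v, cm uint64) uint64 {
	d := ((v | highBits) - (cm & low7Bits)) ^ ((v ^ ^cm) & highBits) // byte-wise v-cm mod 256
	return ((^v & cm) | (^(v ^ cm) & d)) & highBits                  // borrow out of bit 7
}

// highBitWhereGreater is the same test with the operands swapped.
func highBitWhereGreater(v, cm uint64) uint64 {
	return highBitWhereLess(cm, v)
}

func main() {
	v := uint64(0xFF_05_04_03) // bytes (low to high): 3, 4, 5, 255, then four 0s
	cm := uint64(4) * 0x0101_0101_0101_0101
	fmt.Printf("%#x\n", highBitWhereLess(v, cm))    // 0x8080808000000080
	fmt.Printf("%#x\n", highBitWhereGreater(v, cm)) // 0x80800000
}
```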

func IntToLanes

func IntToLanes(i uint64) [8]byte

IntToLanes converts a uint64 to an 8-byte array, giving access to individual bytes for mixed SWAR/byte-level operations.

func LanesToBytes

func LanesToBytes(lanes []uint64) []byte

LanesToBytes converts []uint64 back to []byte. The conversion is zero-copy, for performance.

func LanesToInt

func LanesToInt(lanes [8]byte) uint64

LanesToInt converts an 8-byte array to a uint64. It is a zero-copy conversion from byte-level to SWAR format.

func ReverseEachByte

func ReverseEachByte(v uint64) uint64

ReverseEachByte reverses the bit order within each byte. Useful for bit-order conversion (LSB-first vs. MSB-first) and bit-level manipulation.
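Reversing bits inside every byte at once can be sketched with the standard three swap rounds (adjacent bits, then bit pairs, then nibbles). Every mask keeps bits inside their own byte, so lanes never mix. This is a common technique, not necessarily the library's; reverseEachByte is a hypothetical name.

```go
package main

import "fmt"

// reverseEachByte mirrors the bit order inside every byte.
func reverseEachByte(v uint64) uint64 {
	v = (v>>1)&0x5555_5555_5555_5555 | (v&0x5555_5555_5555_5555)<<1 // swap adjacent bits
	v = (v>>2)&0x3333_3333_3333_3333 | (v&0x3333_3333_3333_3333)<<2 // swap bit pairs
	v = (v>>4)&0x0F0F_0F0F_0F0F_0F0F | (v&0x0F0F_0F0F_0F0F_0F0F)<<4 // swap nibbles
	return v
}

func main() {
	// Byte 0: 0x01 -> 0x80; byte 1: 0xC0 (11000000) -> 0x03 (00000011).
	fmt.Printf("%#x\n", reverseEachByte(0xC001)) // 0x380
}
```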

func SelectByLowBit

func SelectByLowBit(a, b, mask uint64) uint64

SelectByLowBit selects values from a or b based on mask bits, providing branchless selection between values based on conditions.
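A branchless per-byte select can be sketched by widening each byte's low mask bit to a full 0x00/0xFF byte and blending. The docs do not say which operand a set bit chooses; this sketch assumes a set low bit picks a. selectByLowBit is a hypothetical name.

```go
package main

import "fmt"

const lowBits uint64 = 0x0101_0101_0101_0101

// selectByLowBit picks, per byte, a where that mask byte has bit 0 set and b elsewhere
// (assumed convention). (mask & lowBits) * 0xFF turns each 0/1 flag into 0x00/0xFF
// without carries, because the 0xFF copies never overlap.
func selectByLowBit(a, b, mask uint64) uint64 {
	m := (mask & lowBits) * 0xFF
	return a&m | b&^m
}

func main() {
	a := uint64(0x1111_1111_1111_1111)
	b := uint64(0x2222_2222_2222_2222)
	mask := uint64(0x0000_00FE_0000_0101) // bytes 0, 1 select a; 0xFE has bit 0 clear
	fmt.Printf("%#x\n", selectByLowBit(a, b, mask)) // 0x2222222222221111
}
```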

func SelectLargerBytes

func SelectLargerBytes(a, b uint64) uint64

SelectLargerBytes returns max(a,b) for each byte. Ideal for peak detection, ceiling operations, and filtering.

func SelectSmallerBytes

func SelectSmallerBytes(a, b uint64) uint64

SelectSmallerBytes returns min(a,b) for each byte. Efficient for clipping, filtering, and data preprocessing.
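Per-byte max and min can both be built from one unsigned less-than mask plus a branchless blend, as sketched below. This shows the general composition, not the library's confirmed code; lessMask, maxBytes, and minBytes are hypothetical names.

```go
package main

import "fmt"

const (
	highBits uint64 = 0x8080_8080_8080_8080
	low7Bits uint64 = 0x7F7F_7F7F_7F7F_7F7F
)

// lessMask returns 0xFF in each byte where a < b (unsigned) and 0x00 elsewhere.
func lessMask(a, b uint64) uint64 {
	d := ((a | highBits) - (b & low7Bits)) ^ ((a ^ ^b) & highBits) // byte-wise a-b mod 256
	borrow := ((^a & b) | (^(a ^ b) & d)) & highBits                  // 0x80 where a < b
	return (borrow >> 7) * 0xFF
}

// maxBytes takes b where a < b, otherwise a.
func maxBytes(a, b uint64) uint64 {
	m := lessMask(a, b)
	return b&m | a&^m
}

// minBytes takes a where a < b, otherwise b.
func minBytes(a, b uint64) uint64 {
	m := lessMask(a, b)
	return a&m | b&^m
}

func main() {
	a := uint64(0x00_FF_10_05) // bytes (low to high): 0x05, 0x10, 0xFF, 0x00
	b := uint64(0x01_00_20_03) // bytes (low to high): 0x03, 0x20, 0x00, 0x01
	fmt.Printf("%#x %#x\n", maxBytes(a, b), minBytes(a, b)) // 0x1ff2005 0x1003
}
```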

func SubtractBytesWithMinimum

func SubtractBytesWithMinimum(a, b uint64) uint64

SubtractBytesWithMinimum performs byte-wise subtraction clamped at zero. This saturating subtraction prevents underflow in all 8 bytes.
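Saturating subtraction can be sketched as a lane-safe wrapping subtraction plus a per-byte borrow flag that zeroes every byte that went below 0. This is a standard construction offered as an assumption about the approach; subWrap and subSat are hypothetical names.

```go
package main

import "fmt"

const (
	highBits uint64 = 0x8080_8080_8080_8080
	low7Bits uint64 = 0x7F7F_7F7F_7F7F_7F7F
)

// subWrap subtracts byte-wise modulo 256 without borrows crossing lanes.
func subWrap(a, b uint64) uint64 {
	return ((a | highBits) - (b & low7Bits)) ^ ((a ^ ^b) & highBits)
}

// subSat subtracts byte-wise, clamping each byte at 0.
func subSat(a, b uint64) uint64 {
	d := subWrap(a, b)
	// Borrow out of bit 7: b had the bit and a did not, or the bits matched and
	// a borrow came in (visible as bit 7 of the result).
	borrow := ((^a & b) | (^(a ^ b) & d)) & highBits
	return d &^ ((borrow >> 7) * 0xFF)
}

func main() {
	a := uint64(0x05_0A) // bytes (low to high): 10, 5
	b := uint64(0x09_03) // bytes (low to high): 3, 9
	fmt.Printf("%#x\n", subSat(a, b)) // 0x7 -> 7, then 5-9 clamped to 0
}
```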

func SubtractBytesWithWrapping

func SubtractBytesWithWrapping(a, b uint64) uint64

SubtractBytesWithWrapping performs byte-wise subtraction with wrap-around (modulo 256) across all 8 bytes in parallel.
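Wrapping subtraction needs the mirror image of the carry trick used for addition: force each minuend byte's high bit on and each subtrahend byte's high bit off so no byte can borrow from its neighbor, then correct bit 7. A sketch under that assumption (subWrap is a hypothetical name):

```go
package main

import "fmt"

const (
	highBits uint64 = 0x8080_8080_8080_8080
	low7Bits uint64 = 0x7F7F_7F7F_7F7F_7F7F
)

// subWrap subtracts each pair of bytes modulo 256.
// Every minuend byte is >= 0x80 and every subtrahend byte is <= 0x7F, so the 64-bit
// subtraction never borrows across lanes. Bit 7 of the raw result is 1^borrow,
// and XOR with (a ^ ^b) turns it into the correct a7^b7^borrow.
func subWrap(a, b uint64) uint64 {
	return ((a | highBits) - (b & low7Bits)) ^ ((a ^ ^b) & highBits)
}

func main() {
	a := uint64(0x00_05) // bytes (low to high): 5, 0
	b := uint64(0x01_09) // bytes (low to high): 9, 1
	fmt.Printf("%#x\n", subWrap(a, b)) // 0xfffc -> 5-9 = 252, 0-1 = 255
}
```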

func SwapByteHalves

func SwapByteHalves(v uint64) uint64

SwapByteHalves swaps the high and low nibbles in each byte. Useful for BCD encoding/decoding and nibble-level transforms.
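Swapping nibbles in every byte is a single mask-and-shift round: the 0x0F mask keeps each shifted nibble inside its own byte. A minimal sketch (swapNibbles is a hypothetical name, not the library's code):

```go
package main

import "fmt"

// swapNibbles exchanges the high and low 4 bits of every byte.
func swapNibbles(v uint64) uint64 {
	return (v>>4)&0x0F0F_0F0F_0F0F_0F0F | (v&0x0F0F_0F0F_0F0F_0F0F)<<4
}

func main() {
	fmt.Printf("%#x\n", swapNibbles(0x12AB)) // 0x21ba
}
```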

Types

This section is empty.
