locksoupasm

package

v0.1.2 (latest) · Published: May 22, 2026 · License: MIT · Imports: 1 · Imported by: 0

Repository

github.com/everanium/itb

Documentation ¶

Overview ¶

Package locksoupasm holds the BMI2 PEXT/PDEP assembly implementation of the per-chunk Lock Soup keyed bit-permutation kernels, together with the AVX-512 VBMI bit-permutation kernel Permute24Avx512. It lives in an internal subpackage, following the same convention as `internal/areionasm/`, because the parent `itb` package uses CGO and the Go toolchain rejects Go assembly files in packages that use cgo.

Only the assembly entry points and the runtime capability flags (HasBMI2, HasAVX512Permute) are exported. The parent package `itb` dispatches between the BMI2 path and the pure-Go softPEXT24 / softPDEP24 fallback based on HasBMI2.
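The dispatch described above can be sketched as a one-time selection of a function value. This is a minimal illustration, not the parent package's actual code: `selectChunk24`, `softChunk24`, and the `softPEXT24` body shown here are hypothetical stand-ins, and the `hasBMI2` argument stands in for the exported HasBMI2 flag.

```go
package main

import "fmt"

// chunkFn mirrors the documented Chunk24Lock signature.
type chunkFn func(x, m0, m1, m2 uint32) (l0, l1, l2 uint32)

// softPEXT24 is a hypothetical portable fallback: it packs the bits of x
// selected by mask into the low bits of the result.
func softPEXT24(x, mask uint32) uint32 {
	var out uint32
	for k := uint(0); mask != 0; k++ {
		low := mask & -mask // lowest set bit of mask
		if x&low != 0 {
			out |= uint32(1) << k
		}
		mask &= mask - 1 // clear that bit
	}
	return out
}

// softChunk24 is the hypothetical pure-Go path: one extraction per lane.
func softChunk24(x, m0, m1, m2 uint32) (uint32, uint32, uint32) {
	return softPEXT24(x, m0), softPEXT24(x, m1), softPEXT24(x, m2)
}

// selectChunk24 returns the assembly entry point only when the capability
// flag is set and an implementation is supplied; otherwise the fallback.
func selectChunk24(hasBMI2 bool, asm chunkFn) chunkFn {
	if hasBMI2 && asm != nil {
		return asm
	}
	return softChunk24
}

func main() {
	// In a real build, the arguments would be the package's flag and its
	// assembly entry point; here the fallback is forced for illustration.
	chunk := selectChunk24(false, nil)
	l0, l1, l2 := chunk(0xEFCDAB, 0x0000FF, 0x00FF00, 0xFF0000)
	fmt.Printf("%#x %#x %#x\n", l0, l1, l2)
}
```

Selecting once and calling through a function value keeps the capability check out of the per-chunk hot path; whether the parent package does this or branches on the flag at each call is not stated in the documentation.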

Index ¶

Variables
func Chunk24Lock(x, m0, m1, m2 uint32) (l0, l1, l2 uint32)
func Permute24Avx512(x uint32, perm *[32]byte) (y uint32)
func Unchunk24Lock(l0, l1, l2, m0, m1, m2 uint32) (x uint32)

Constants ¶

This section is empty.

Variables ¶

var HasAVX512Permute = cpu.X86.HasAVX512F &&
	cpu.X86.HasAVX512BW &&
	cpu.X86.HasAVX512VL &&
	cpu.X86.HasAVX512VBMI

HasAVX512Permute caches whether the runtime CPU supports the AVX-512 feature set required by Permute24Avx512:

AVX-512 F — base 512-bit infrastructure
AVX-512 BW — byte-wise mask ops (VPMOVM2B, VPABSB, VPTESTMB)
AVX-512 VL — 256-bit (YMM) variants of those instructions
AVX-512 VBMI — VPERMB byte-shuffle

Resolved once at init time from golang.org/x/sys/cpu's CPUID-driven detection. Available on Intel Ice Lake / Tiger Lake / Rocket Lake / Sapphire Rapids+, AMD Zen 4 / Zen 5. Same gate as the Tier A per-pixel kernel in process_pixels.c.

var HasBMI2 = cpu.X86.HasBMI2

HasBMI2 caches whether the runtime CPU supports BMI2 (PEXT, PDEP). Resolved once at init time from the upstream cpu package's CPUID-driven detection. Available on Intel Haswell and later (2013) and on AMD Excavator and later (2015), which includes every Zen generation; essentially every x86 SKU shipped in the last decade.

Functions ¶

func Chunk24Lock ¶

func Chunk24Lock(x, m0, m1, m2 uint32) (l0, l1, l2 uint32)

Chunk24Lock applies the Lock Soup keyed bit-permutation to a 24-bit input x under three balanced 8-of-24 masks (m0, m1, m2) and returns the three 8-bit lane outputs in the low byte of (l0, l1, l2). The caller is responsible for packing the input bytes — typically `x = uint32(a) | uint32(b)<<8 | uint32(c)<<16` — and for unpacking the lane bytes via `byte(...)`.

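The assembly body issues three BMI2 PEXTL instructions, one per lane, ~3 cycles each on cores with a native PEXT unit (Intel Haswell and later, AMD Zen 3 and later); AMD Zen 1 and Zen 2 implement PEXT/PDEP in microcode and are considerably slower. Total ~10 cycles including argument loads on the fast path, versus ~450 cycles for three pure-Go softPEXT24 calls.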

Caller must supply popcount(m_i) == 8 with m0|m1|m2 == 0xFFFFFF and pairwise-disjoint masks (the Lock Soup balanced-triple invariant). Behaviour is undefined for malformed mask triples, matching the existing chunk24lock contract.
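The contract above can be expressed as a small portable reference model. This is a sketch of the documented semantics only, not the package's assembly or its softPEXT24 fallback: `softPEXT`, `validTriple`, and `refChunk24` are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// softPEXT models the BMI2 PEXT instruction: it gathers the bits of x
// selected by mask and packs them, in order, into the low bits of the result.
func softPEXT(x, mask uint32) uint32 {
	var out uint32
	for k := uint(0); mask != 0; k++ {
		low := mask & -mask // lowest set bit of mask
		if x&low != 0 {
			out |= uint32(1) << k
		}
		mask &= mask - 1 // clear that bit
	}
	return out
}

// validTriple checks the balanced-triple invariant. Three masks of popcount 8
// whose union is exactly the 24 low bits are necessarily pairwise disjoint,
// so no separate disjointness test is needed.
func validTriple(m0, m1, m2 uint32) bool {
	return bits.OnesCount32(m0) == 8 &&
		bits.OnesCount32(m1) == 8 &&
		bits.OnesCount32(m2) == 8 &&
		m0|m1|m2 == 0xFFFFFF
}

// refChunk24 is the reference model: one extraction per lane.
func refChunk24(x, m0, m1, m2 uint32) (l0, l1, l2 uint32) {
	return softPEXT(x, m0), softPEXT(x, m1), softPEXT(x, m2)
}

func main() {
	a, b, c := byte(0xAB), byte(0xCD), byte(0xEF)
	x := uint32(a) | uint32(b)<<8 | uint32(c)<<16 // pack as the doc suggests

	// An interleaved triple: every third bit goes to the same lane.
	m0, m1, m2 := uint32(0x249249), uint32(0x492492), uint32(0x924924)
	if !validTriple(m0, m1, m2) {
		panic("malformed mask triple")
	}
	l0, l1, l2 := refChunk24(x, m0, m1, m2)
	fmt.Printf("%#02x %#02x %#02x\n", byte(l0), byte(l1), byte(l2))
}
```

Since behaviour is undefined for malformed triples, a check like `validTriple` belongs where the masks are derived rather than on the per-chunk path.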

func Permute24Avx512 ¶

func Permute24Avx512(x uint32, perm *[32]byte) (y uint32)

Permute24Avx512 applies an arbitrary bit permutation π: {0..23} → {0..23} to the low 24 bits of x and returns the permuted result. perm must be a stack-resident 32-byte buffer. perm[0..23] are the source bit positions for output bits 0..23 (output[i] = bit perm[i] of x). perm[24..31] must be zero: those lanes contribute no output bits, but VPERMB still uses them as gather indices into the bit-spread of x, so nonzero values would contaminate the intermediate result before the final 24-bit mask.

The assembly body is ~7 vector instructions plus argument loads, totalling ~30–40 cycles per call including Go ABI overhead — vs ~80–100 cycles for the pure-Go softPermute24 fallback.

Caller is responsible for the AVX-512 VBMI runtime gate via HasAVX512Permute. Calling on a host without VBMI raises #UD.
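The documented semantics (output[i] = bit perm[i] of x) can be stated as a scalar reference model, useful for cross-checking the vector kernel. This is a sketch under that reading of the contract; `refPermute24` and `validPerm` are hypothetical names and are not the package's softPermute24.

```go
package main

import "fmt"

// refPermute24 builds the result one bit at a time:
// output bit i is bit perm[i] of x, for i in 0..23.
func refPermute24(x uint32, perm *[32]byte) uint32 {
	var y uint32
	for i := uint(0); i < 24; i++ {
		y |= ((x >> perm[i]) & 1) << i
	}
	return y
}

// validPerm checks the buffer contract: perm[0..23] is a bijection on
// 0..23 and the tail perm[24..31] is zero.
func validPerm(perm *[32]byte) bool {
	var seen uint32
	for i := 0; i < 24; i++ {
		if perm[i] >= 24 {
			return false
		}
		seen |= uint32(1) << perm[i]
	}
	for i := 24; i < 32; i++ {
		if perm[i] != 0 {
			return false
		}
	}
	return seen == 0xFFFFFF // every source bit used exactly once
}

func main() {
	// Bit reversal within the 24-bit field.
	var rev [32]byte
	for i := 0; i < 24; i++ {
		rev[i] = byte(23 - i)
	}
	fmt.Println(validPerm(&rev))
	fmt.Printf("%#06x\n", refPermute24(0x000001, &rev))
}
```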

func Unchunk24Lock ¶

func Unchunk24Lock(l0, l1, l2, m0, m1, m2 uint32) (x uint32)

Unchunk24Lock is the inverse of Chunk24Lock. Given three lane bytes (l0, l1, l2 in the low byte of each uint32) and the same mask triple, returns the original packed 24-bit input. Three BMI2 PDEPL instructions plus two ORs.
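The inverse relationship can be checked with a portable model of PDEP: three deposits combined with two ORs, as described above. This is a sketch of the documented semantics, not the package's softPDEP24; `softPDEP`, `softPEXT`, and `refUnchunk24` are hypothetical names.

```go
package main

import "fmt"

// softPEXT packs the bits of x selected by mask into the low bits of the result.
func softPEXT(x, mask uint32) uint32 {
	var out uint32
	for k := uint(0); mask != 0; k++ {
		low := mask & -mask
		if x&low != 0 {
			out |= uint32(1) << k
		}
		mask &= mask - 1
	}
	return out
}

// softPDEP models the BMI2 PDEP instruction: it scatters the low bits of v,
// in order, to the positions of the set bits of mask.
func softPDEP(v, mask uint32) uint32 {
	var out uint32
	for k := uint(0); mask != 0; k++ {
		low := mask & -mask // lowest set bit of mask
		if v&(uint32(1)<<k) != 0 {
			out |= low
		}
		mask &= mask - 1 // clear that bit
	}
	return out
}

// refUnchunk24 deposits each lane under its mask and ORs the results.
func refUnchunk24(l0, l1, l2, m0, m1, m2 uint32) uint32 {
	return softPDEP(l0, m0) | softPDEP(l1, m1) | softPDEP(l2, m2)
}

func main() {
	m0, m1, m2 := uint32(0x249249), uint32(0x492492), uint32(0x924924)
	x := uint32(0xABCDEF)

	// Forward: extract the three lanes. Inverse: deposit them back.
	l0, l1, l2 := softPEXT(x, m0), softPEXT(x, m1), softPEXT(x, m2)
	back := refUnchunk24(l0, l1, l2, m0, m1, m2)
	fmt.Println(back == x)
}
```

The round trip holds only for a valid balanced triple: because the masks are disjoint and cover all 24 bits, each input bit is extracted by exactly one lane and deposited back to its original position.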

Types ¶

This section is empty.

Source Files ¶

locksoupasm_amd64.go
