Documentation ¶
Overview ¶
Package hashes provides cached, pre-keyed wrappers around the nine PRF-grade hash primitives that ITB ships with as built-in factories for the C / FFI / mobile shared-library distribution.
Every factory in this package returns a function value compatible with one of itb.HashFunc128 / itb.HashFunc256 / itb.HashFunc512 (matching the primitive's native intermediate-state width). The returned closure carries a per-instance random fixed key plus pre-computed primitive state (an AES cipher.Block, a BLAKE3 keyed template, a sync.Pool of scratch buffers); subsequent invocations allocate nothing on the heap.
Without these cached wrappers, per-pixel hashing would re-derive every primitive's keyed state on every call, and that re-derivation is the dominant cost in ITB's encrypt / decrypt path. The factories are taken directly from the bench-validated reference wrappers; see BENCH.md for measured throughput across all nine primitives × three ITB key widths (512 / 1024 / 2048).
Canonical names and ordering (used by Registry, Find, Make128, Make256, Make512 and exposed through the FFI surface as the public hash identifier):
areion256, areion512, siphash24, aescmac, blake2b256, blake2b512, blake2s, blake3, chacha20
Each factory has an optional WithKey variant accepting a fixed key of the primitive's native key length, intended for serialization / deserialization of long-lived seeds across processes (Encrypt today, Decrypt tomorrow). SipHash-2-4 has no WithKey variant — the seed components themselves are the entire SipHash key, and no fixed-key state lives in the factory closure.
All primitives in this package are PRF-grade. The below-spec lab stress controls (CRC128, FNV-1a, MD5) used in REDTEAM.md / SCIENCE.md are intentionally absent here — they are research instruments, not shippable cipher primitives.
Custom-primitive builders ¶
Beyond the nine shipped primitives, the package exposes three builder families for safely wrapping user-supplied PRFs:
- BuildCBCMACChainAbsorb128 / BuildCBCMACChainAbsorb256 / BuildCBCMACChainAbsorb512 — wrap a keyed block cipher into a CBC-MAC chain-absorb HashFunc closure.
- BuildSpongeChainAbsorb128 / BuildSpongeChainAbsorb256 / BuildSpongeChainAbsorb512 — wrap an unkeyed permutation + fixed-key into a keyed-sponge HashFunc closure.
- BuildARXChainAbsorb128 / BuildARXChainAbsorb256 / BuildARXChainAbsorb512 — wrap a full hash function (such as crypto/sha256.Sum256) + fixed-key into a Merkle-Damgard-style HashFunc closure.
These builders exist primarily to close the silent-nonce-truncation trap that a naive user wrapper falls into. A user who writes
	func(data []byte, seed [8]uint64) [8]uint64 {
		h := sha256.Sum256(data)
		// ... zero-pad upper 32 bytes ...
		return [8]uint64{...}
	}
silently truncates the upper half of ITB's 512-bit intermediate state to a constant value, destroying half the entropy of ChainHash's per-call XOR chain. The builders absorb the full ITB nonce width into the digest through the appropriate chain pattern; the user only writes a primitive call.
Performance trade-off: the builders dispatch through interface callbacks and []byte state buffers, losing 5-15% throughput vs the inline per-primitive closures shipped in this package. The trade is correctness-by-construction for any user primitive vs peak throughput for the nine built-in primitives. See [CONSTRUCTIONS.md] "Why use builders for custom user primitives" for the silent-truncation failure modes the builders prevent.
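The truncation trap described above can be reproduced with the standard library alone. The following is a minimal sketch, not code from this package: naiveHash512 and upperHalfIsZero are hypothetical names, and the wrapper deliberately mirrors the naive snippet (32-byte digest, zero-padded, seed never absorbed).

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// naiveHash512 reproduces the trap: a 32-byte SHA-256 digest is zero-padded
// into a 512-bit result, and the seed parameter is never absorbed.
func naiveHash512(data []byte, seed [8]uint64) [8]uint64 {
	h := sha256.Sum256(data)
	var out [8]uint64
	for i := 0; i < 4; i++ {
		out[i] = binary.LittleEndian.Uint64(h[i*8 : i*8+8])
	}
	return out // out[4] through out[7] stay zero for every input
}

// upperHalfIsZero reports whether words 4..7 of the state are all zero.
func upperHalfIsZero(s [8]uint64) bool {
	return (s[4] | s[5] | s[6] | s[7]) == 0
}

func main() {
	a := naiveHash512([]byte("pixel-0"), [8]uint64{1, 2, 3, 4, 5, 6, 7, 8})
	b := naiveHash512([]byte("pixel-1"), [8]uint64{8, 7, 6, 5, 4, 3, 2, 1})
	// Different data and different seeds, yet the upper half never moves.
	fmt.Println(upperHalfIsZero(a), upperHalfIsZero(b))
}
```

Whatever the input, the upper 256 bits of the result are the same constant, which is the failure the builder families are meant to rule out.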
Index ¶
- Variables
- func AESCMAC(key ...[16]byte) (itb.HashFunc128, [16]byte)
- func AESCMACPair(key ...[16]byte) (itb.HashFunc128, itb.BatchHashFunc128, [16]byte)
- func AESCMACPairWithKey(aesKey [16]byte) (itb.HashFunc128, itb.BatchHashFunc128)
- func AESCMACWithKey(aesKey [16]byte) itb.HashFunc128
- func Areion256Pair(key ...[32]byte) (itb.HashFunc256, itb.BatchHashFunc256, [32]byte)
- func Areion256PairWithKey(fixedKey [32]byte) (itb.HashFunc256, itb.BatchHashFunc256)
- func Areion512Pair(key ...[64]byte) (itb.HashFunc512, itb.BatchHashFunc512, [64]byte)
- func Areion512PairWithKey(fixedKey [64]byte) (itb.HashFunc512, itb.BatchHashFunc512)
- func BLAKE2b256(key ...[32]byte) (itb.HashFunc256, [32]byte)
- func BLAKE2b256Pair(key ...[32]byte) (itb.HashFunc256, itb.BatchHashFunc256, [32]byte)
- func BLAKE2b256PairWithKey(fixedKey [32]byte) (itb.HashFunc256, itb.BatchHashFunc256)
- func BLAKE2b256WithKey(b2key [32]byte) itb.HashFunc256
- func BLAKE2b512(key ...[64]byte) (itb.HashFunc512, [64]byte)
- func BLAKE2b512Pair(key ...[64]byte) (itb.HashFunc512, itb.BatchHashFunc512, [64]byte)
- func BLAKE2b512PairWithKey(fixedKey [64]byte) (itb.HashFunc512, itb.BatchHashFunc512)
- func BLAKE2b512WithKey(b2key [64]byte) itb.HashFunc512
- func BLAKE2s(key ...[32]byte) (itb.HashFunc256, [32]byte)
- func BLAKE2s256Pair(key ...[32]byte) (itb.HashFunc256, itb.BatchHashFunc256, [32]byte)
- func BLAKE2s256PairWithKey(fixedKey [32]byte) (itb.HashFunc256, itb.BatchHashFunc256)
- func BLAKE2sWithKey(b2key [32]byte) itb.HashFunc256
- func BLAKE3(key ...[32]byte) (itb.HashFunc256, [32]byte)
- func BLAKE3WithKey(key [32]byte) itb.HashFunc256
- func BLAKE3256Pair(key ...[32]byte) (itb.HashFunc256, itb.BatchHashFunc256, [32]byte)
- func BLAKE3256PairWithKey(fixedKey [32]byte) (itb.HashFunc256, itb.BatchHashFunc256)
- func BuildARXChainAbsorb128(hashFn Hash256Fn, fixedKey []byte) itb.HashFunc128
- func BuildARXChainAbsorb256(hashFn Hash256Fn, fixedKey []byte) itb.HashFunc256
- func BuildARXChainAbsorb512(hashFn Hash512Fn, fixedKey []byte) itb.HashFunc512
- func BuildCBCMACChainAbsorb128(block cipher.Block) itb.HashFunc128
- func BuildCBCMACChainAbsorb256(block cipher.Block) itb.HashFunc256
- func BuildCBCMACChainAbsorb512(block cipher.Block) itb.HashFunc512
- func BuildSpongeChainAbsorb128(permute Permute, rate, capacity int, fixedKey []byte) itb.HashFunc128
- func BuildSpongeChainAbsorb256(permute Permute, rate, capacity int, fixedKey []byte) itb.HashFunc256
- func BuildSpongeChainAbsorb512(permute Permute, rate, capacity int, fixedKey []byte) itb.HashFunc512
- func ChaCha20(key ...[32]byte) (itb.HashFunc256, [32]byte)
- func ChaCha20WithKey(fixedKey [32]byte) itb.HashFunc256
- func ChaCha20256Pair(key ...[32]byte) (itb.HashFunc256, itb.BatchHashFunc256, [32]byte)
- func ChaCha20256PairWithKey(fixedKey [32]byte) (itb.HashFunc256, itb.BatchHashFunc256)
- func Make128(name string, key ...[]byte) (itb.HashFunc128, []byte, error)
- func Make128Pair(name string, key ...[]byte) (itb.HashFunc128, itb.BatchHashFunc128, []byte, error)
- func Make256(name string, key ...[]byte) (itb.HashFunc256, []byte, error)
- func Make256Pair(name string, key ...[]byte) (itb.HashFunc256, itb.BatchHashFunc256, []byte, error)
- func Make512(name string, key ...[]byte) (itb.HashFunc512, []byte, error)
- func Make512Pair(name string, key ...[]byte) (itb.HashFunc512, itb.BatchHashFunc512, []byte, error)
- func SipHash24() itb.HashFunc128
- func SipHash24Pair() (itb.HashFunc128, itb.BatchHashFunc128)
- type Hash256Fn
- type Hash512Fn
- type Permute
- type Spec
- type Width
Constants ¶
This section is empty.
Variables ¶
var Registry = [9]Spec{ {"areion256", W256}, {"areion512", W512}, {"siphash24", W128}, {"aescmac", W128}, {"blake2b256", W256}, {"blake2b512", W512}, {"blake2s", W256}, {"blake3", W256}, {"chacha20", W256}, }
Registry lists every shippable PRF-grade primitive in canonical order. The same order is used by the FFI iteration surface (ITB_HashName, ITB_HashWidth) so that index 0..8 is stable across releases.
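A name-to-index lookup over a registry of this shape might look as follows. This is a sketch under assumptions: the field names Name and Width, the numeric values of W128 / W256 / W512, and the helper find are all hypothetical, since only the Registry literal is shown here.

```go
package main

import "fmt"

// Width and Spec mirror the shapes implied by the Registry literal.
// Field names and constant values are assumptions made for this sketch.
type Width int

const (
	W128 Width = 128
	W256 Width = 256
	W512 Width = 512
)

type Spec struct {
	Name  string
	Width Width
}

// registry copies the canonical order listed in the documentation.
var registry = [9]Spec{
	{"areion256", W256}, {"areion512", W512}, {"siphash24", W128},
	{"aescmac", W128}, {"blake2b256", W256}, {"blake2b512", W512},
	{"blake2s", W256}, {"blake3", W256}, {"chacha20", W256},
}

// find returns the canonical index and spec for name, or ok == false.
func find(name string) (index int, spec Spec, ok bool) {
	for i, s := range registry {
		if s.Name == name {
			return i, s, true
		}
	}
	return -1, Spec{}, false
}

func main() {
	i, s, ok := find("blake3")
	fmt.Println(i, s.Name, s.Width, ok)
}
```

Because the array order is fixed, the index returned for a given name is the same value an FFI caller would see when iterating positions 0..8.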
Functions ¶
func AESCMAC ¶
func AESCMAC(key ...[16]byte) (itb.HashFunc128, [16]byte)
AESCMAC returns a cached itb.HashFunc128 backed by AES along with the 16-byte fixed key the closure is bound to. With no argument the key is freshly generated via crypto/rand; passing a single caller-supplied [16]byte uses that key instead.
The returned key is always the actual key in use — callers on the persistence path must save it (encrypt today, decrypt tomorrow); test fixtures and other throw-away usages can discard via `_`.
Construction:
- key (16 bytes) is loaded once into a cipher.Block (AES-NI hardware path on amd64 / arm64 hosts that expose the AES round instructions; software AES fallback otherwise);
- per call: seed0||seed1 is XOR'd into the first 16 data bytes, then encrypted in-place; remaining 16-byte data chunks are XOR'd into state and encrypted; the final 16-byte block is returned as (lo64, hi64).
The cipher.Block is shared across all invocations of the closure (it carries no per-call state), so concurrent goroutines may call the returned function in parallel — Go's stdlib AES Encrypt path is reentrant.
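For the persistence path, the returned 16-byte key has to survive a round trip through storage before it is handed back to AESCMACWithKey. The sketch below shows one way to do that round trip with the standard library only; hex encoding and the helper names encodeKey / decodeKey are choices of this example, not part of the package, and the stored string is secret key material that needs the same protection as any other key.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

// encodeKey renders a 16-byte fixed key as hex for storage.
func encodeKey(key [16]byte) string {
	return hex.EncodeToString(key[:])
}

// decodeKey parses a stored hex string back into the [16]byte form that a
// WithKey-style factory expects, rejecting any other length.
func decodeKey(s string) ([16]byte, error) {
	var key [16]byte
	raw, err := hex.DecodeString(s)
	if err != nil {
		return key, err
	}
	if len(raw) != len(key) {
		return key, errors.New("key must be exactly 16 bytes")
	}
	copy(key[:], raw)
	return key, nil
}

func main() {
	// Stand-in for the key a factory would return on the encrypt side.
	var key [16]byte
	if _, err := rand.Read(key[:]); err != nil {
		panic(err)
	}
	restored, err := decodeKey(encodeKey(key))
	fmt.Println(err == nil && restored == key)
}
```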
func AESCMACPair ¶
func AESCMACPair(key ...[16]byte) (itb.HashFunc128, itb.BatchHashFunc128, [16]byte)
AESCMACPair returns a fresh (single, batched) AES-CMAC-128 hash pair for itb.Seed128 integration. The two arms share the same internally-generated random 16-byte AES key so per-pixel hashes computed via the batched dispatch match the single-call path bit-exact (the parity invariant required by itb.BatchHashFunc128).
On amd64 with VAES + AVX-512 the batched arm dispatches to a fused ZMM-batched chain-absorb kernel for ITB's three SetNonceBits buf shapes (20 / 36 / 68 byte inputs) — VAESENC on ZMM operates on four independent AES blocks per instruction, so the per-pixel AES-CMAC chain advances four lanes through one VAESENC instead of four serial cipher.Block.Encrypt calls. On hosts without VAES + AVX-512, and for non-{20,36,68} input lengths, the batched arm falls back to four single-call invocations and remains bit-exact.
With no argument a fresh 16-byte AES key is generated via crypto/rand; passing a single caller-supplied [16]byte uses that key instead. The returned key (random or supplied) is always emitted as the third return value — save it for cross-process persistence.
Realistic uplift target: substantial over the upstream crypto/aes-driven scalar dispatch on Rocket Lake; higher on AMD Zen 5 / Sapphire Rapids+ where full-width 512-bit ALUs and VAESENC per-cycle throughput (4 AES rounds/cycle on Zen 5 vs ~2-3 on Rocket Lake) widen the envelope. The gain is a mix of 4-lane parallelism (four independent AES-CMAC chains advancing through one VAESENC) and per-call cipher.Block.Encrypt interface-dispatch amortisation across the lanes.
func AESCMACPairWithKey ¶
func AESCMACPairWithKey(aesKey [16]byte) (itb.HashFunc128, itb.BatchHashFunc128)
AESCMACPairWithKey returns the (single, batched) AES-CMAC-128 pair built around a caller-supplied 16-byte AES key, for the persistence-restore path where the original key has been saved across processes (encrypt today, decrypt tomorrow).
The single arm is identical to AESCMACWithKey(aesKey). The batched arm hot-dispatches to the fused ZMM-batched chain-absorb kernel when all four lanes share an input length in {20, 36, 68}; for any other lane-length configuration it falls back to four single-call invocations of the single arm.
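The dispatch rule in the paragraph above reduces to a small predicate plus a per-lane fallback. The sketch below is illustrative only: fusedShape, single128, and fallbackBatch are hypothetical names, and the single-arm signature is an assumed shape rather than the real itb.HashFunc128 definition.

```go
package main

import "fmt"

// fusedShape reports whether all four lanes share one input length that the
// fused kernel handles (20, 36 or 68 bytes).
func fusedShape(lanes [4][]byte) bool {
	n := len(lanes[0])
	if n != 20 && n != 36 && n != 68 {
		return false
	}
	for _, l := range lanes[1:] {
		if len(l) != n {
			return false
		}
	}
	return true
}

// single128 is a hypothetical stand-in for the single-call arm.
type single128 func(data []byte, seed0, seed1 uint64) (lo, hi uint64)

// fallbackBatch runs the single arm once per lane, so the batched result is
// bit-exact with the single-call path by construction.
func fallbackBatch(single single128, lanes [4][]byte, seeds [4][2]uint64) [4][2]uint64 {
	var out [4][2]uint64
	for i := range lanes {
		out[i][0], out[i][1] = single(lanes[i], seeds[i][0], seeds[i][1])
	}
	return out
}

func main() {
	// Toy single arm, used only to exercise the dispatch logic.
	toy := func(data []byte, s0, s1 uint64) (uint64, uint64) {
		return uint64(len(data)) ^ s0, s1
	}
	var lanes [4][]byte
	for i := range lanes {
		lanes[i] = make([]byte, 20)
	}
	fmt.Println(fusedShape(lanes)) // uniform 20-byte lanes
	lanes[3] = make([]byte, 36)
	fmt.Println(fusedShape(lanes)) // mixed lengths take the fallback
	fmt.Println(fallbackBatch(toy, lanes, [4][2]uint64{}))
}
```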
The AES-128 round-key schedule (11 × 16-byte round keys = 176 bytes) is pre-expanded once via aescmacasm.ExpandKeyAES128 and captured by the batched closure; the kernels broadcast each round key to all 4 lanes via VBROADCASTI32X4 at function entry.
func AESCMACWithKey ¶
func AESCMACWithKey(aesKey [16]byte) itb.HashFunc128
AESCMACWithKey returns the AESCMAC closure built around a caller-supplied 16-byte key, intended for serialization paths where the fixed key must persist across processes (Encrypt today / Decrypt tomorrow).
func Areion256Pair ¶
func Areion256Pair(key ...[32]byte) (itb.HashFunc256, itb.BatchHashFunc256, [32]byte)
Areion256Pair returns a fresh (single, batched) Areion-SoEM-256 hash pair for itb.Seed256 integration. The two arms share the same internally-generated random fixed key so that per-pixel hashes computed via the batched dispatch match the single-call path bit-exact (the parity invariant required by itb.BatchHashFunc256).
On amd64 with VAES + AVX-512 the batched arm routes per-pixel hashing four pixels per call through AreionSoEM256x4, yielding ~2× throughput over the single-call path. On hosts without those extensions the batched arm falls back to four single-call invocations and remains bit-exact.
This is a thin wrapper over the in-package itb.MakeAreionSoEM256Hash helper; it exists so that Areion-SoEM-256 fits the same name-keyed factory shape as the rest of the hashes/ package.
With no argument a fresh 32-byte fixed key is generated via crypto/rand; passing a single caller-supplied [32]byte uses that key instead. The returned key (random or supplied) is always emitted as the third return value — save it for cross-process persistence.
func Areion256PairWithKey ¶
func Areion256PairWithKey(fixedKey [32]byte) (itb.HashFunc256, itb.BatchHashFunc256)
Areion256PairWithKey returns the (single, batched) Areion-SoEM-256 pair built around a caller-supplied 32-byte fixed key. Same role as the WithKey variants on the other hashes/ primitives — meant for the persistence-restore path where the original fixed key has been saved across processes (encrypt today, decrypt tomorrow).
Thin wrapper over itb.MakeAreionSoEM256HashWithKey for symmetry with the rest of the hashes/ package's WithKey factories.
func Areion512Pair ¶
func Areion512Pair(key ...[64]byte) (itb.HashFunc512, itb.BatchHashFunc512, [64]byte)
Areion512Pair returns a fresh (single, batched) Areion-SoEM-512 hash pair for itb.Seed512 integration. Same construction principle as Areion256Pair: a fresh random 64-byte fixed key shared between the single-call and batched arms, ensuring bit-exact agreement between the two dispatch paths.
On amd64 with VAES + AVX-512 the batched arm uses the AreionSoEM512x4 ASM kernel; on other hosts both arms degrade to the portable Go fallback while remaining bit-identical. With no argument a fresh 64-byte fixed key is generated via crypto/rand; passing a single caller-supplied [64]byte uses that key instead. The returned key (random or supplied) is always emitted as the third return value — save it for cross-process persistence.
func Areion512PairWithKey ¶
func Areion512PairWithKey(fixedKey [64]byte) (itb.HashFunc512, itb.BatchHashFunc512)
Areion512PairWithKey returns the (single, batched) Areion-SoEM-512 pair built around a caller-supplied 64-byte fixed key. Same role as the WithKey variants on the other hashes/ primitives — meant for the persistence-restore path where the original fixed key has been saved across processes (encrypt today, decrypt tomorrow).
Thin wrapper over itb.MakeAreionSoEM512HashWithKey.
func BLAKE2b256 ¶
func BLAKE2b256(key ...[32]byte) (itb.HashFunc256, [32]byte)
BLAKE2b256 returns a cached BLAKE2b-256 itb.HashFunc256 along with the 32-byte fixed key the closure is bound to. With no argument a fresh key is generated via crypto/rand; passing a single caller-supplied [32]byte uses that key instead. Save the returned key for cross-process persistence.
Construction prepends the fixed key as a 32-byte prefix to the hash input and mixes seed components by XOR over the next 32 bytes: H(key || data ^ seed). blake2b.Sum256 is the entry point (no allocation, no keyed-mode handle), so the closure has zero per-call allocations modulo the pooled scratch buffer.
The payload region is zero-padded out to 32 bytes when len(data) is shorter, ensuring all four seed uint64's contribute regardless of how short the caller's data is. This matters for ITB, which hashes 20-byte (pixel_le + nonce) inputs in the inner loop.
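The H(key || data ^ seed) layout with its zero-padded payload can be sketched with the standard library. Two substitutions are made here and should be read as assumptions: sha256.Sum256 stands in for blake2b.Sum256 (which lives in golang.org/x/crypto), and a plain make replaces the pooled scratch buffer. The little-endian seed mixing and the name prefixHash256 are likewise choices of this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// prefixHash256 sketches H(key || data ^ seed): the fixed key is a 32-byte
// prefix, the payload is zero-padded to at least 32 bytes, and the four seed
// words are XOR'd over the first 32 payload bytes.
func prefixHash256(key [32]byte, data []byte, seed [4]uint64) [4]uint64 {
	payloadLen := len(data)
	if payloadLen < 32 {
		payloadLen = 32 // zero-pad so every seed word lands on payload bytes
	}
	buf := make([]byte, 32+payloadLen)
	copy(buf[:32], key[:])
	payload := buf[32:]
	copy(payload, data)
	for i, s := range seed {
		word := binary.LittleEndian.Uint64(payload[i*8:])
		binary.LittleEndian.PutUint64(payload[i*8:], word^s)
	}
	d := sha256.Sum256(buf) // stand-in for blake2b.Sum256
	var out [4]uint64
	for i := range out {
		out[i] = binary.LittleEndian.Uint64(d[i*8:])
	}
	return out
}

func main() {
	var key [32]byte
	data := make([]byte, 20) // shorter than 32 bytes, like ITB's inner-loop input
	a := prefixHash256(key, data, [4]uint64{1, 2, 3, 4})
	b := prefixHash256(key, data, [4]uint64{1, 2, 3, 5})
	// The fourth seed word covers bytes 24..31, beyond the 20 data bytes,
	// and still changes the digest thanks to the padding.
	fmt.Println(a != b)
}
```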
func BLAKE2b256Pair ¶
func BLAKE2b256Pair(key ...[32]byte) (itb.HashFunc256, itb.BatchHashFunc256, [32]byte)
BLAKE2b256Pair returns a fresh (single, batched) BLAKE2b-256 hash pair for itb.Seed256 integration. The two arms share the same internally-generated random 32-byte fixed key so per-pixel hashes computed via the batched dispatch match the single-call path bit-exact (the parity invariant required by itb.BatchHashFunc256).
On amd64 with AVX-512+VL the batched arm dispatches to a fused ZMM-batched chain-absorb kernel for ITB's three SetNonceBits buf shapes (20 / 36 / 68 byte inputs). On hosts without AVX-512+VL, and for non-{20,36,68} input lengths, the batched arm falls back to four single-call invocations and remains bit-exact.
With no argument a fresh 32-byte fixed key is generated via crypto/rand; passing a single caller-supplied [32]byte uses that key instead. The returned key (random or supplied) is always emitted as the third return value — save it for cross-process persistence.
func BLAKE2b256PairWithKey ¶
func BLAKE2b256PairWithKey(fixedKey [32]byte) (itb.HashFunc256, itb.BatchHashFunc256)
BLAKE2b256PairWithKey returns the (single, batched) BLAKE2b-256 pair built around a caller-supplied 32-byte fixed key. Same role as the WithKey variants on the other hashes/ primitives — meant for the persistence-restore path where the original fixed key has been saved across processes (encrypt today, decrypt tomorrow).
The single arm is identical to BLAKE2b256WithKey(fixedKey). The batched arm hot-dispatches to the fused ZMM-batched chain-absorb kernel when all four lanes share an input length in {20, 36, 68}; for any other lane-length configuration it falls back to four single-call invocations of the single arm.
func BLAKE2b256WithKey ¶
func BLAKE2b256WithKey(b2key [32]byte) itb.HashFunc256
BLAKE2b256WithKey returns the BLAKE2b-256 closure built around a caller-supplied 32-byte fixed key, for serialization paths.
The closure runs on the upstream golang.org/x/crypto/blake2b path (which itself uses the BLAKE2b AVX2 kernel on amd64). For ITB throughput-critical use, prefer BLAKE2b256Pair: the batched arm of the pair dispatches to a 4-pixel-parallel AVX-512 ZMM kernel that amortises the per-call overhead the upstream single-pixel path cannot.
func BLAKE2b512 ¶
func BLAKE2b512(key ...[64]byte) (itb.HashFunc512, [64]byte)
BLAKE2b512 returns a cached BLAKE2b-512 itb.HashFunc512 along with the 64-byte fixed key the closure is bound to. With no argument a fresh key is generated via crypto/rand; passing a single caller-supplied [64]byte uses that key instead. Save the returned key for cross-process persistence.
BLAKE2b natively supports 512-bit output and up to a 64-byte key. The construction is identical to BLAKE2b256 modulo widths: H(key || data ^ seed) where the payload is zero-padded out to 64 bytes when shorter, ensuring all eight seed uint64's contribute regardless of how short the caller's data is.
func BLAKE2b512Pair ¶
func BLAKE2b512Pair(key ...[64]byte) (itb.HashFunc512, itb.BatchHashFunc512, [64]byte)
BLAKE2b512Pair returns a fresh (single, batched) BLAKE2b-512 hash pair for itb.Seed512 integration. The two arms share the same internally-generated random 64-byte fixed key so per-pixel hashes computed via the batched dispatch match the single-call path bit-exact (the parity invariant required by itb.BatchHashFunc512).
On amd64 with AVX-512+VL the batched arm dispatches to a fused ZMM-batched chain-absorb kernel for ITB's three SetNonceBits buf shapes (20 / 36 / 68 byte inputs), holding four lane-isolated BLAKE2b states in 16 ZMM registers across all 12 mixing rounds. On hosts without AVX-512+VL, and for non-{20,36,68} input lengths, the batched arm falls back to four single-call invocations and remains bit-exact.
With no argument a fresh 64-byte fixed key is generated via crypto/rand; passing a single caller-supplied [64]byte uses that key instead. The returned key (random or supplied) is always emitted as the third return value — save it for cross-process persistence.
func BLAKE2b512PairWithKey ¶
func BLAKE2b512PairWithKey(fixedKey [64]byte) (itb.HashFunc512, itb.BatchHashFunc512)
BLAKE2b512PairWithKey returns the (single, batched) BLAKE2b-512 pair built around a caller-supplied 64-byte fixed key. Same role as the WithKey variants on the other hashes/ primitives — meant for the persistence-restore path where the original fixed key has been saved across processes (encrypt today, decrypt tomorrow).
The single arm is identical to BLAKE2b512WithKey(fixedKey). The batched arm hot-dispatches to the fused ZMM-batched chain-absorb kernel when all four lanes share an input length in {20, 36, 68}; for any other lane-length configuration it falls back to four single-call invocations of the single arm.
func BLAKE2b512WithKey ¶
func BLAKE2b512WithKey(b2key [64]byte) itb.HashFunc512
BLAKE2b512WithKey returns the BLAKE2b-512 closure built around a caller-supplied 64-byte fixed key, for serialization paths.
The closure runs on the upstream golang.org/x/crypto/blake2b path (which itself uses the BLAKE2b AVX2 kernel on amd64). For ITB throughput-critical use, prefer BLAKE2b512Pair: the batched arm of the pair dispatches to a 4-pixel-parallel AVX-512 ZMM kernel that amortises the per-call overhead the upstream single-pixel path cannot.
func BLAKE2s ¶
func BLAKE2s(key ...[32]byte) (itb.HashFunc256, [32]byte)
BLAKE2s returns a cached BLAKE2s-256 itb.HashFunc256 along with the 32-byte fixed key the closure is bound to. With no argument a fresh key is generated via crypto/rand; passing a single caller-supplied [32]byte uses that key instead. Save the returned key for cross-process persistence.
Same construction as BLAKE2b256: H(key || data ^ seed) using blake2s.Sum256 (no allocation, no keyed-mode handle). The payload region is zero-padded to 32 bytes for short inputs so all four seed uint64's contribute to the digest.
func BLAKE2s256Pair ¶
func BLAKE2s256Pair(key ...[32]byte) (itb.HashFunc256, itb.BatchHashFunc256, [32]byte)
BLAKE2s256Pair returns a fresh (single, batched) BLAKE2s-256 hash pair for itb.Seed256 integration. The two arms share the same internally-generated random 32-byte fixed key so per-pixel hashes computed via the batched dispatch match the single-call path bit-exact (the parity invariant required by itb.BatchHashFunc256).
On amd64 with AVX-512+VL the batched arm dispatches to a fused ZMM-batched chain-absorb kernel for ITB's three SetNonceBits buf shapes (20 / 36 / 68 byte inputs). On hosts without AVX-512+VL, and for non-{20,36,68} input lengths, the batched arm falls back to four single-call invocations and remains bit-exact.
With no argument a fresh 32-byte fixed key is generated via crypto/rand; passing a single caller-supplied [32]byte uses that key instead. The returned key (random or supplied) is always emitted as the third return value — save it for cross-process persistence.
func BLAKE2s256PairWithKey ¶
func BLAKE2s256PairWithKey(fixedKey [32]byte) (itb.HashFunc256, itb.BatchHashFunc256)
BLAKE2s256PairWithKey returns the (single, batched) BLAKE2s-256 pair built around a caller-supplied 32-byte fixed key. Same role as the WithKey variants on the other hashes/ primitives — meant for the persistence-restore path where the original fixed key has been saved across processes (encrypt today, decrypt tomorrow).
The single arm is identical to BLAKE2sWithKey(fixedKey). The batched arm hot-dispatches to the fused ZMM-batched chain-absorb kernel when all four lanes share an input length in {20, 36, 68}; for any other lane-length configuration it falls back to four single-call invocations of the single arm.
The ASM kernel returns 8 × uint32 per lane (32 bytes of digest); the closure repacks each lane's 8 uint32 into 4 uint64 for the itb.BatchHashFunc256 contract (LE byte ordering).
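The repacking step can be written in a few lines. This is a sketch assuming the natural little-endian pairing, where the even-indexed uint32 becomes the low half of each uint64; repackLE is a hypothetical name. The main function cross-checks the result against reading the same 32 digest bytes directly as little-endian uint64 values.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// repackLE folds one lane's 8 x uint32 digest words into 4 x uint64,
// with words[2*i] as the low 32 bits (little-endian byte ordering).
func repackLE(words [8]uint32) [4]uint64 {
	var out [4]uint64
	for i := range out {
		out[i] = uint64(words[2*i]) | uint64(words[2*i+1])<<32
	}
	return out
}

func main() {
	words := [8]uint32{0x11223344, 0x55667788, 1, 2, 3, 4, 5, 6}

	// Serialize the words as the 32 digest bytes they represent.
	var digest [32]byte
	for i, w := range words {
		binary.LittleEndian.PutUint32(digest[i*4:], w)
	}

	out := repackLE(words)
	ok := true
	for i := range out {
		ok = ok && out[i] == binary.LittleEndian.Uint64(digest[i*8:])
	}
	fmt.Printf("%#x %v\n", out[0], ok)
}
```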
func BLAKE2sWithKey ¶
func BLAKE2sWithKey(b2key [32]byte) itb.HashFunc256
BLAKE2sWithKey returns the BLAKE2s-256 closure built around a caller-supplied 32-byte fixed key, for serialization paths.
func BLAKE3 ¶
func BLAKE3(key ...[32]byte) (itb.HashFunc256, [32]byte)
BLAKE3 returns a cached BLAKE3-256 itb.HashFunc256 along with the 32-byte BLAKE3 key the closure is bound to. With no argument a fresh key is generated via crypto/rand; passing a single caller-supplied [32]byte uses that key instead. Save the returned key for cross-process persistence.
The pre-keyed BLAKE3 hasher template is created once via blake3.NewKeyed; each call clones the template instead of re-keying, sidestepping the data race that Reset() on a shared hasher would cause when ITB's process256 dispatches multiple goroutines on the same seed. A sync.Pool of scratch buffers keeps per-call allocation at zero.
Seed components are mixed into the hashed payload as XOR over the first 32 bytes; the input is zero-padded out to 32 bytes when the caller's data is shorter, so all four seed uint64's contribute regardless of how short the caller's data is.
func BLAKE3WithKey ¶
func BLAKE3WithKey(key [32]byte) itb.HashFunc256
BLAKE3WithKey returns the BLAKE3 closure built around a caller-supplied 32-byte BLAKE3 key, for serialization across processes.
func BLAKE3256Pair ¶
func BLAKE3256Pair(key ...[32]byte) (itb.HashFunc256, itb.BatchHashFunc256, [32]byte)
BLAKE3256Pair returns a fresh (single, batched) BLAKE3-256 hash pair for itb.Seed256 integration. The two arms share the same internally-generated random 32-byte BLAKE3 key so per-pixel hashes computed via the batched dispatch match the single-call path bit-exact (the parity invariant required by itb.BatchHashFunc256).
On amd64 with AVX-512+VL the batched arm dispatches to a fused ZMM-batched chain-absorb kernel for ITB's three SetNonceBits buf shapes (20 / 36 / 68 byte inputs). On hosts without AVX-512+VL, and for non-{20,36,68} input lengths, the batched arm falls back to four single-call invocations and remains bit-exact.
With no argument a fresh 32-byte BLAKE3 key is generated via crypto/rand; passing a single caller-supplied [32]byte uses that key instead. The returned key (random or supplied) is always emitted as the third return value — save it for cross-process persistence.
Realistic uplift target: 1.3-2.0× over the upstream zeebo/blake3 per-call dispatch. github.com/zeebo/blake3 already carries hand-written AVX-512 assembly for the BLAKE3 compression, so the batched arm's gain over upstream is mostly from amortising the per-call Hasher.Clone / Write / Sum overhead across 4 lanes rather than from kernel-internal speedup.
func BLAKE3256PairWithKey ¶
func BLAKE3256PairWithKey(fixedKey [32]byte) (itb.HashFunc256, itb.BatchHashFunc256)
BLAKE3256PairWithKey returns the (single, batched) BLAKE3-256 pair built around a caller-supplied 32-byte BLAKE3 key, for the persistence-restore path where the original key has been saved across processes (encrypt today, decrypt tomorrow).
The single arm is identical to BLAKE3WithKey(fixedKey). The batched arm hot-dispatches to the fused ZMM-batched chain-absorb kernel when all four lanes share an input length in {20, 36, 68}; for any other lane-length configuration it falls back to four single-call invocations of the single arm.
The ASM kernel returns 8 × uint32 per lane (32 bytes of digest); the closure repacks each lane's 8 uint32 into 4 uint64 for the itb.BatchHashFunc256 contract (LE byte ordering).
func BuildARXChainAbsorb128 ¶
func BuildARXChainAbsorb128(hashFn Hash256Fn, fixedKey []byte) itb.HashFunc128
BuildARXChainAbsorb128 wraps a 32-byte full hash function into an itb.HashFunc128 closure. The closure constructs a single absorb buffer of the form:
buf = fixedKey || lenTag(8) || seed0(8) || seed1(8) || domain(1) || data
then computes hashFn(buf) and returns the first 16 bytes of the digest as a (lo, hi) uint64 pair. The full ITB nonce is part of the data segment and reaches the hash through the underlying hash function's native variable-length absorption.
fixedKey is hashed into the prefix as a long-lived keying source; seed0/seed1 are the per-call PRF key supplied by ITB's ChainHash. Length tag and domain byte protect against length-extension and cross-call collision attacks within the construction.
For SHA-256 wrappers this is the canonical safe usage. The hash function does its own MD chaining internally, so no chain-absorb loop is needed at the builder level — the builder's role is to arrange the buffer correctly so all of {fixedKey, seed, length, domain, data} reach the digest with no silent truncation.
func BuildARXChainAbsorb256 ¶
func BuildARXChainAbsorb256(hashFn Hash256Fn, fixedKey []byte) itb.HashFunc256
BuildARXChainAbsorb256 wraps a 32-byte full hash function into an itb.HashFunc256 closure. Same construction as BuildARXChainAbsorb128 but returns the full 32-byte digest as [4]uint64.
func BuildARXChainAbsorb512 ¶
func BuildARXChainAbsorb512(hashFn Hash512Fn, fixedKey []byte) itb.HashFunc512
BuildARXChainAbsorb512 wraps a 64-byte full hash function into an itb.HashFunc512 closure. Same construction as BuildARXChainAbsorb128 but uses a Hash512Fn (e.g. crypto/sha512.Sum512) so the full 64-byte digest comes from a single hash call. The fixedKey + seed (4 of 8 components) + length + domain prefix is built once; the remaining 4 seed components are mixed via a second hash call with a different domain marker, and the two 32-byte halves are concatenated.
func BuildCBCMACChainAbsorb128 ¶
func BuildCBCMACChainAbsorb128(block cipher.Block) itb.HashFunc128
BuildCBCMACChainAbsorb128 wraps a keyed block cipher into an itb.HashFunc128 closure that absorbs arbitrary-length data via CBC-MAC chain. The full ITB nonce reaches the digest with no silent truncation: data is absorbed in BlockSize()-byte chunks via XOR followed by block.Encrypt; a length tag is folded into the initial state to break the trailing-zero collision class.
The block cipher must have BlockSize() >= 16. AES-128 / AES-192 / AES-256, Camellia, ARIA, SM4 — all qualify. The cipher's key is embedded inside the cipher.Block; the builder does not see the key material.
Construction:
- state := zeros(BlockSize())
- state[0:8] = seed0 ^ len(data)
- state[8:16] = seed1 ^ len(data)
- state[0:firstChunkLen] ^= data[:firstChunkLen]
- state = block.Encrypt(state)
- For each subsequent BlockSize()-byte chunk of data:
- state[0:chunkLen] ^= data[offset:offset+chunkLen]
- state = block.Encrypt(state)
- Output: (uint64_le(state[0:8]), uint64_le(state[8:16]))
The closure runs at least one block.Encrypt call (even for empty data) so the length-tagged initial state is always permuted before output extraction.
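The construction steps above can be followed with crypto/aes as the block cipher. This is a sketch, not the builder's code: cbcmacAbsorb128 and newDemoBlock are hypothetical names, the all-zero AES key is for demonstration only, and little-endian encoding of the seed and length words is an assumption.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
)

// newDemoBlock returns AES-128 under an all-zero key (demo only).
func newDemoBlock() cipher.Block {
	block, err := aes.NewCipher(make([]byte, 16))
	if err != nil {
		panic(err)
	}
	return block
}

// cbcmacAbsorb128 follows the listed steps: length-tagged seeded state,
// first chunk XOR'd in before the first Encrypt, then one Encrypt per
// remaining BlockSize()-byte chunk.
func cbcmacAbsorb128(block cipher.Block, data []byte, seed0, seed1 uint64) (lo, hi uint64) {
	bs := block.BlockSize()
	state := make([]byte, bs)
	n := uint64(len(data))
	binary.LittleEndian.PutUint64(state[0:8], seed0^n)
	binary.LittleEndian.PutUint64(state[8:16], seed1^n)

	first := len(data)
	if first > bs {
		first = bs
	}
	for i := 0; i < first; i++ {
		state[i] ^= data[i]
	}
	block.Encrypt(state, state) // always runs, even for empty data

	for off := bs; off < len(data); off += bs {
		end := off + bs
		if end > len(data) {
			end = len(data)
		}
		for i, b := range data[off:end] {
			state[i] ^= b
		}
		block.Encrypt(state, state)
	}
	return binary.LittleEndian.Uint64(state[0:8]), binary.LittleEndian.Uint64(state[8:16])
}

func main() {
	block := newDemoBlock()
	// Without the length tag, {1} and {1, 0} would produce the same state.
	alo, ahi := cbcmacAbsorb128(block, []byte{1}, 7, 9)
	blo, bhi := cbcmacAbsorb128(block, []byte{1, 0}, 7, 9)
	fmt.Println(alo != blo || ahi != bhi)
}
```

The final comparison exercises the trailing-zero case: both inputs fit in one block, their pre-encryption states differ only through the length tag, and a block cipher maps distinct blocks to distinct outputs.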
func BuildCBCMACChainAbsorb256 ¶
func BuildCBCMACChainAbsorb256(block cipher.Block) itb.HashFunc256
BuildCBCMACChainAbsorb256 wraps a keyed block cipher into an itb.HashFunc256 closure. Internally runs two independent CBC-MAC chain-absorb passes over the same data, each domain-separated by a distinct constant XOR'd into the initial state, and concatenates the 16-byte halves into a 32-byte digest. The construction inherits the full nonce absorption property of BuildCBCMACChainAbsorb128 and adds a 2x throughput cost.
func BuildCBCMACChainAbsorb512 ¶
func BuildCBCMACChainAbsorb512(block cipher.Block) itb.HashFunc512
BuildCBCMACChainAbsorb512 wraps a keyed block cipher into an itb.HashFunc512 closure. Internally runs four independent CBC-MAC chain-absorb passes over the same data, each domain-separated by a distinct constant XOR'd into the initial state, and concatenates the 16-byte quarters into a 64-byte digest. The construction inherits the full nonce absorption property of BuildCBCMACChainAbsorb128 and adds a 4x throughput cost.
For 64-byte ITB nonce (SetNonceBits(512)) configurations, this closure runs 4 * ceil(68 / BlockSize()) block.Encrypt calls per HashFunc512 invocation. For AES-128 (BlockSize=16) this is 20 Encrypt calls; for a hypothetical 32-byte block cipher 12 Encrypt calls.
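The call-count arithmetic can be checked with a short helper. encryptCalls is a hypothetical name; the minimum of one chunk reflects the at-least-one-Encrypt behavior described for the 128-bit builder and is assumed to carry over to each pass.

```go
package main

import "fmt"

// encryptCalls returns passes * ceil(inputLen / blockSize), with at least
// one block per pass so that empty input still costs one Encrypt.
func encryptCalls(passes, inputLen, blockSize int) int {
	chunks := (inputLen + blockSize - 1) / blockSize
	if chunks == 0 {
		chunks = 1
	}
	return passes * chunks
}

func main() {
	fmt.Println(encryptCalls(4, 68, 16)) // AES: 4 * ceil(68/16) = 4 * 5
	fmt.Println(encryptCalls(4, 68, 32)) // 32-byte block: 4 * ceil(68/32) = 4 * 3
}
```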
func BuildSpongeChainAbsorb128 ¶
func BuildSpongeChainAbsorb128(permute Permute, rate, capacity int, fixedKey []byte) itb.HashFunc128
BuildSpongeChainAbsorb128 wraps an unkeyed permutation into a keyed-sponge itb.HashFunc128 closure. The permutation is invoked on a (rate + capacity)-byte state buffer. The fixedKey is XOR'd into the capacity region for keying (standard keyed-sponge pattern). Data is absorbed in rate-byte chunks; output is extracted from the first 16 bytes of the rate region.
Requirements:
- rate >= 16 (so output extraction is direct, no squeeze loop)
- capacity >= 16 (so the seed components fit in the capacity slot after fixedKey injection)
- len(fixedKey) <= capacity
Construction:
- state := zeros(rate + capacity)
- copy(state[rate:rate+len(fixedKey)], fixedKey)
- state[rate:rate+8] ^= LE(seed0)
- state[rate+8:rate+16] ^= LE(seed1)
- state[0:8] = LE(len(data) ^ domain)
- permute(state)
- For each rate-byte chunk of data:
- state[0:chunkLen] ^= data[offset:offset+chunkLen]
- permute(state)
- Output: (uint64_le(state[0:8]), uint64_le(state[8:16]))
The permutation runs at least once (over the initialized state) even for empty data, so the length-tagged state is always mixed before output.
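The construction steps above can be sketched in self-contained Go. Several things here are assumptions made for illustration: the `hashFunc128` shape stands in for itb.HashFunc128 (whose exact signature is not shown in this section), `domain` is passed explicitly because its value is not given, and `toyPermute` is a placeholder mixer with no security value. Unlike the real closure, this sketch allocates its state per call.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// permute mirrors the documented Permute signature.
type permute func(state []byte)

// hashFunc128 is an ASSUMED shape for itb.HashFunc128; the real type may differ.
type hashFunc128 func(data []byte, seed0, seed1 uint64) (lo, hi uint64)

// toyPermute is a placeholder mixer for demonstration only. It is NOT
// cryptographically secure; substitute a real permutation in practice.
func toyPermute(state []byte) {
	n := len(state)
	for round := 0; round < 8; round++ {
		for i := 0; i < n; i++ {
			v := state[i] + state[(i+1)%n]
			state[i] = (v<<3 | v>>5) ^ byte(i+round)
		}
	}
}

// xorUint64LE XORs v, little-endian, into the first 8 bytes of dst.
func xorUint64LE(dst []byte, v uint64) {
	binary.LittleEndian.PutUint64(dst, binary.LittleEndian.Uint64(dst)^v)
}

// buildSponge128 follows the listed steps. The domain parameter is a
// hypothetical stand-in for the builder's internal constant.
func buildSponge128(p permute, rate, capacity int, fixedKey []byte, domain uint64) (hashFunc128, error) {
	if rate < 16 || capacity < 16 || len(fixedKey) > capacity {
		return nil, errors.New("need rate >= 16, capacity >= 16, len(fixedKey) <= capacity")
	}
	key := append([]byte(nil), fixedKey...) // private copy of the fixed key
	return func(data []byte, seed0, seed1 uint64) (uint64, uint64) {
		state := make([]byte, rate+capacity)       // zeros(rate + capacity)
		copy(state[rate:], key)                    // fixed key into capacity
		xorUint64LE(state[rate:rate+8], seed0)     // seed components
		xorUint64LE(state[rate+8:rate+16], seed1)
		binary.LittleEndian.PutUint64(state[0:8], uint64(len(data))^domain)
		p(state) // always runs, even for empty data
		for off := 0; off < len(data); off += rate {
			end := off + rate
			if end > len(data) {
				end = len(data)
			}
			for i, b := range data[off:end] {
				state[i] ^= b // absorb one chunk into the rate region
			}
			p(state)
		}
		return binary.LittleEndian.Uint64(state[0:8]), binary.LittleEndian.Uint64(state[8:16])
	}, nil
}

func main() {
	h, err := buildSponge128(toyPermute, 16, 16, []byte("demo fixed key"), 0x01)
	if err != nil {
		panic(err)
	}
	lo, hi := h(make([]byte, 20), 1, 2)
	fmt.Printf("%016x %016x\n", lo, hi)
}
```

With rate = 16, a 20-byte input is absorbed as one full chunk and one 4-byte tail, so the permutation runs three times in total: once over the initialized state and once per chunk.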
func BuildSpongeChainAbsorb256 ¶
func BuildSpongeChainAbsorb256(permute Permute, rate, capacity int, fixedKey []byte) itb.HashFunc256
BuildSpongeChainAbsorb256 wraps an unkeyed permutation into a keyed-sponge itb.HashFunc256 closure. Internally runs two domain-separated sponge chain-absorb passes and concatenates 16-byte halves. Same guarantees as BuildSpongeChainAbsorb128 at 2x cost.
func BuildSpongeChainAbsorb512 ¶
func BuildSpongeChainAbsorb512(permute Permute, rate, capacity int, fixedKey []byte) itb.HashFunc512
BuildSpongeChainAbsorb512 wraps an unkeyed permutation into a keyed-sponge itb.HashFunc512 closure. Internally runs four domain-separated sponge chain-absorb passes and concatenates 16-byte quarters. Same guarantees as BuildSpongeChainAbsorb128 at 4x cost.
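The widening pattern shared by the 256- and 512-bit builders (run several domain-separated 16-byte passes, then concatenate) can be illustrated generically. This is a sketch under stated assumptions: `pass128`, `standInPass` and `widen` are hypothetical names, the domain constants 0..n-1 are arbitrary, and truncated SHA-256 is swapped in for the real chain-absorb pass purely so the example runs with the standard library.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// pass128 is a hypothetical single 16-byte pass parameterised by a domain
// constant. It stands in for one chain-absorb pass of the real builders.
type pass128 func(domain byte, data []byte) [16]byte

// standInPass is NOT the package's construction: it truncates SHA-256 over
// (domain || data) so the example needs nothing beyond the standard library.
func standInPass(domain byte, data []byte) [16]byte {
	sum := sha256.Sum256(append([]byte{domain}, data...))
	var out [16]byte
	copy(out[:], sum[:16])
	return out
}

// widen runs n passes with domain constants 0..n-1 and concatenates the
// 16-byte results: n = 2 yields 32 bytes, n = 4 yields 64 bytes.
func widen(p pass128, n int) func(data []byte) []byte {
	return func(data []byte) []byte {
		out := make([]byte, 0, 16*n)
		for d := 0; d < n; d++ {
			part := p(byte(d), data)
			out = append(out, part[:]...)
		}
		return out
	}
}

func main() {
	h256 := widen(standInPass, 2)
	h512 := widen(standInPass, 4)
	msg := []byte("pixel input")
	fmt.Println(len(h256(msg)), len(h512(msg))) // 32 64
}
```

Each pass touches the whole input, which is why the cost scales linearly with the number of passes.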
func ChaCha20 ¶
func ChaCha20(key ...[32]byte) (itb.HashFunc256, [32]byte)
ChaCha20 returns a cached ChaCha20 itb.HashFunc256 along with the 32-byte fixed key the closure is bound to. With no argument the key is freshly generated via crypto/rand; passing a single caller-supplied [32]byte uses that key instead. Save the returned key for cross-process persistence.
Construction (ARX-only PRF, no S-box / no table lookups): the fixed key is XOR'd with the seed components to derive a per-call 256-bit ChaCha20 key. Data is absorbed CBC-MAC-style into a 32-byte state via repeated `state ← E_K(state ⊕ chunk)`-shaped rounds, where E_K is one ChaCha20 keystream block applied to the state and the counter advances automatically between rounds. A length-tag prefix in the initial state and a 24-byte data window per round (8 bytes of chaining feedback) ensure every byte of the input contributes to the digest regardless of input length, so 128-, 256-, and 512-bit nonce configurations all reach the digest with full strength.
Per-call allocation is bounded by the cipher initialisation; the state, length tag, and chain feedback all live on the closure's stack frame. Concurrent goroutines may invoke the returned closure in parallel because there is no shared mutable state.
func ChaCha20WithKey ¶
func ChaCha20WithKey(fixedKey [32]byte) itb.HashFunc256
ChaCha20WithKey returns the ChaCha20 closure built around a caller-supplied 32-byte fixed key, for serialization paths.
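The optional-key factory shape and the persistence round trip it enables can be sketched without the package itself. Everything named here is hypothetical: `keyedHash` and `newKeyedHash` are illustration-only, and HMAC-SHA256 is swapped in as a stand-in PRF; it is not the ChaCha20 chain-absorb construction.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// keyedHash is a hypothetical closure type used only in this sketch.
type keyedHash func(data []byte) [32]byte

// newKeyedHash mimics the optional-key factory shape: no argument means
// "generate a key with crypto/rand", one argument means "use this key".
// HMAC-SHA256 is a stand-in PRF, not the ChaCha20 construction.
func newKeyedHash(key ...[32]byte) (keyedHash, [32]byte) {
	var k [32]byte
	if len(key) > 0 {
		k = key[0]
	} else if _, err := rand.Read(k[:]); err != nil {
		panic(err) // no usable entropy source
	}
	h := func(data []byte) [32]byte {
		mac := hmac.New(sha256.New, k[:])
		mac.Write(data)
		var out [32]byte
		copy(out[:], mac.Sum(nil))
		return out
	}
	return h, k
}

func main() {
	encryptSide, saved := newKeyedHash()  // "today": fresh key, persist saved
	decryptSide, _ := newKeyedHash(saved) // "tomorrow": rebuild from saved key
	msg := []byte("pixel input")
	fmt.Println(encryptSide(msg) == decryptSide(msg)) // true
}
```

Returning the key from the factory in both cases means the caller has a single code path for persisting it, whether the key was generated or supplied.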
func ChaCha20256Pair ¶
func ChaCha20256Pair(key ...[32]byte) (itb.HashFunc256, itb.BatchHashFunc256, [32]byte)
ChaCha20256Pair returns a fresh (single, batched) ChaCha20-256 hash pair for itb.Seed256 integration. The two arms share the same 32-byte fixed key (randomly generated or caller-supplied) so per-pixel hashes computed via the batched dispatch match the single-call path bit-exact (the parity invariant required by itb.BatchHashFunc256).
On amd64 with AVX-512+VL the batched arm dispatches to a fused ZMM-batched chain-absorb kernel for ITB's three SetNonceBits buf shapes (20 / 36 / 68 byte inputs). On hosts without AVX-512+VL, and for non-{20,36,68} input lengths, the batched arm falls back to four single-call invocations and remains bit-exact.
With no argument a fresh 32-byte fixed key is generated via crypto/rand; passing a single caller-supplied [32]byte uses that key instead. The returned key (random or supplied) is always emitted as the third return value — save it for cross-process persistence.
Realistic uplift target: 2.5×-4.5× over the upstream golang.org/x/crypto/chacha20 per-call dispatch on Rocket Lake; higher on AMD Zen 5 / Sapphire Rapids+ where full-width 512-bit ALUs and absent AVX-512 frequency throttle widen the envelope. The gain is a mix of 4-lane parallelism (four independent ChaCha20 state evolutions retiring through one ZMM dispatch) and per-call cipher.NewUnauthenticatedCipher / XORKeyStream amortisation across the lanes.
func ChaCha20256PairWithKey ¶
func ChaCha20256PairWithKey(fixedKey [32]byte) (itb.HashFunc256, itb.BatchHashFunc256)
ChaCha20256PairWithKey returns the (single, batched) ChaCha20-256 pair built around a caller-supplied 32-byte fixed key, for the persistence-restore path where the original key has been saved across processes (encrypt today, decrypt tomorrow).
The single arm is identical to ChaCha20WithKey(fixedKey). The batched arm hot-dispatches to the fused ZMM-batched chain-absorb kernel when all four lanes share an input length in {20, 36, 68}; for any other lane-length configuration it falls back to four single-call invocations of the single arm.
The ASM kernel returns 4 × uint64 per lane (32 bytes of state) directly — no intermediate [8]uint32 repacking is required since the ChaCha20 chain-absorb output is the 32-byte CBC-MAC-style state buffer in its native LE byte order.
func Make128 ¶
Make128 returns a fresh cached HashFunc128 for the named primitive along with the fixed key the closure is bound to. Pass a single caller-supplied key slice to use that key; pass nothing to generate a fresh random key (returned alongside the closure for persistence).
SipHash-2-4 has no internal fixed key (its keying material is the per-call seed components), so passing a key for "siphash24" is an error; the second return value is nil for siphash24.
Returns an error when name is unknown, its native width is not 128, or the supplied key size does not match the primitive's native key length.
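The three error conditions can be sketched as a small validation routine. This is an illustration only: `spec`, `specs` and `validate` are hypothetical names, and the table is deliberately partial, covering just the two primitives whose width and key length are stated in this documentation.

```go
package main

import (
	"errors"
	"fmt"
)

// spec records a primitive's native width (bits) and fixed-key length
// (bytes). A keyLen of 0 means the primitive takes no fixed key.
type spec struct{ width, keyLen int }

// specs is a partial, illustrative table, not the package registry.
var specs = map[string]spec{
	"siphash24": {width: 128, keyLen: 0},
	"chacha20":  {width: 256, keyLen: 32},
}

// validate mirrors the documented error conditions: unknown name, width
// mismatch, and wrong (or disallowed) key size.
func validate(name string, width int, key ...[]byte) error {
	s, ok := specs[name]
	switch {
	case !ok:
		return fmt.Errorf("unknown primitive %q", name)
	case s.width != width:
		return fmt.Errorf("%s: native width is %d, not %d", name, s.width, width)
	case len(key) > 1:
		return errors.New("at most one key may be supplied")
	case len(key) == 1 && s.keyLen == 0:
		return fmt.Errorf("%s takes no fixed key", name)
	case len(key) == 1 && len(key[0]) != s.keyLen:
		return fmt.Errorf("%s: key must be %d bytes, got %d", name, s.keyLen, len(key[0]))
	}
	return nil
}

func main() {
	fmt.Println(validate("siphash24", 128))                   // <nil>
	fmt.Println(validate("siphash24", 128, make([]byte, 16))) // error: no fixed key
	fmt.Println(validate("chacha20", 128))                    // error: width mismatch
}
```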
func Make128Pair ¶
func Make128Pair(name string, key ...[]byte) (itb.HashFunc128, itb.BatchHashFunc128, []byte, error)
Make128Pair returns the (single, batched) HashFunc128 / BatchHashFunc128 pair for primitives with a 4-way batched implementation, plus the fixed key the pair is bound to. The batched arm is nil for primitives that do not implement a batched path. The single arm is bit-exact equivalent to Make128 for the same name and key.
Primitives currently returning a non-nil batched arm:
- "aescmac" — VAES + AVX-512 ZMM-batched AES-CMAC chain-absorb kernels
- "siphash24" — AVX-512 ZMM-batched SipHash-2-4 chain-absorb kernels
Variadic key arg follows the same pattern as Make128 / Make256Pair.
func Make256 ¶
Make256 returns a fresh cached HashFunc256 for the named primitive along with the fixed key the closure is bound to. Variadic key arg follows the same pattern as Make128: pass nothing for random key, pass one []byte of the primitive's native key length for explicit.
For "areion256" the batched arm is discarded; use Make256Pair if the per-pixel batched dispatch is needed.
Returns an error when name is unknown, width is not 256, or supplied key size is wrong.
func Make256Pair ¶
func Make256Pair(name string, key ...[]byte) (itb.HashFunc256, itb.BatchHashFunc256, []byte, error)
Make256Pair returns the (single, batched) HashFunc256 / BatchHashFunc256 pair for primitives that have a 4-way batched implementation, plus the fixed key the pair is bound to. The batched arm is nil for primitives that do not implement a batched path. The single arm is bit-exact equivalent to Make256 for the same name and key.
Primitives currently returning a non-nil batched arm:
- "areion256" — VAES + AVX-512 AreionSoEM256x4 ASM kernel
- "blake2b256" — AVX-512 ZMM-batched BLAKE2b chain-absorb kernels
- "blake2s" — AVX-512 ZMM-batched BLAKE2s chain-absorb kernels
- "blake3" — AVX-512 ZMM-batched BLAKE3 chain-absorb kernels
- "chacha20" — AVX-512 ZMM-batched ChaCha20 chain-absorb kernels
Variadic key arg follows the same pattern as Make256.
func Make512 ¶
Make512 returns a fresh cached HashFunc512 for the named primitive along with the fixed key the closure is bound to. Variadic key arg follows the same pattern as Make128 / Make256.
For "areion512" the batched arm is discarded; use Make512Pair if the per-pixel batched dispatch is needed.
Returns an error when name is unknown, width is not 512, or supplied key size is wrong.
func Make512Pair ¶
func Make512Pair(name string, key ...[]byte) (itb.HashFunc512, itb.BatchHashFunc512, []byte, error)
Make512Pair returns the (single, batched) HashFunc512 / BatchHashFunc512 pair for primitives with a 4-way batched implementation, plus the fixed key. The batched arm is nil when no batched path exists.
Primitives currently returning a non-nil batched arm:
- "areion512" — VAES + AVX-512 AreionSoEM512x4 ASM kernel
- "blake2b512" — AVX-512 ZMM-batched BLAKE2b chain-absorb kernels
func SipHash24 ¶
func SipHash24() itb.HashFunc128
SipHash24 returns a SipHash-2-4 itb.HashFunc128 closure.
SipHash-2-4 is a designed PRF whose 128-bit key is supplied per call as the (seed0, seed1) pair — exactly the shape ITB's Seed128 ChainHash128 produces from the seed components. There is no pre-keyed state to cache (no fixed key, no internal hasher object, no scratch buffer) so the closure is a direct call into siphash.
Returns: (low64, high64) of SipHash128(key=(seed0, seed1), data).
No WithKey variant — the seed components are the entire SipHash key. Long-lived seed serialization is a matter of saving Components only.
func SipHash24Pair ¶
func SipHash24Pair() (itb.HashFunc128, itb.BatchHashFunc128)
SipHash24Pair returns a (single, batched) SipHash-2-4-128 hash pair for itb.Seed128 integration. SipHash has no fixed key — the per-call (seed0, seed1) pair is the entire SipHash key — so the factory takes no arguments and returns no key, distinguishing it from the AESCMACPair / AESCMACPairWithKey shape used by the other W128 primitive in the registry.
On amd64 with AVX-512+VL the batched arm dispatches to a fused ZMM-batched chain-absorb kernel for ITB's three SetNonceBits buf shapes (20 / 36 / 68 byte inputs) — the 4 SipHash state words (v0..v3) are held in qwords 0..3 of four ZMM registers (Z0..Z3), and the SipRound body (VPADDQ / VPXORQ / VPROLQ on u64) advances four independent SipHash chains concurrently per instruction. On hosts without AVX-512+VL, and for non-{20,36,68} input lengths, the batched arm falls back to four single-call invocations of dchest/siphash and remains bit-exact.
Realistic uplift target: modest on Rocket Lake (the dchest/siphash scalar path is already very fast, leaving little headroom); larger on AMD Zen 5 / Sapphire Rapids+ where the 4-lane parallel SipRound retires through a full-width 512-bit ALU without the AVX-512 frequency throttle.
Types ¶
type Hash256Fn ¶
Hash256Fn represents a full hash function with 32-byte output, such as crypto/sha256.Sum256. The function must compute the full hash over the input byte slice in one call; it must not retain a reference to the input slice after returning.
type Hash512Fn ¶
Hash512Fn represents a full hash function with 64-byte output, such as crypto/sha512.Sum512.
type Permute ¶
type Permute func(state []byte)
Permute is the type for an unkeyed permutation operating on a state buffer. The implementation must mutate state in place; state length is always rate + capacity bytes. Standard sponge permutations like Keccak-f[1600] or Ascon-p match this signature trivially.
Directories
¶
| Path | Synopsis |
|---|---|
| internal/aescmacasm | Package aescmacasm holds the AVX-512 + VAES fused chain-absorb kernel implementation of AES-CMAC for the parent hashes/ package. |
| internal/blake2basm | Package blake2basm holds the AVX-512 + VL fused chain-absorb kernel implementation of BLAKE2b for the parent hashes/ package. |
| internal/blake2sasm | Package blake2sasm holds the AVX-512 + VL fused chain-absorb kernel implementation of BLAKE2s for the parent hashes/ package. |
| internal/blake3asm | Package blake3asm holds the AVX-512 + VL fused chain-absorb kernel implementation of BLAKE3 for the parent hashes/ package. |
| internal/chacha20asm | Package chacha20asm holds the AVX-512 + VL fused chain-absorb kernel implementation of ChaCha20 for the parent hashes/ package. |
| internal/siphashasm | Package siphashasm holds the AVX-512 + VL fused chain-absorb kernel implementation of SipHash-2-4-128 for the parent hashes/ package. |