Documentation ¶
Overview ¶
Package specialize demonstrates the //hwy:specializes and //hwy:targets directives for architecture-specific template specialization.
This example shows a fused multiply-add (MulAdd) with a primary generic implementation and a NEON:asm specialization for half-precision types:
Primary (this file): handles float32/float64 on all targets using generic hwy operations. On NEON:asm this compiles to assembly via GOAT; on AVX2/AVX-512 it uses Go's simd package; on fallback it uses scalar code.
Specialization (muladd_half_base.go): adds Float16/BFloat16 on NEON:asm only, providing a body that the GOAT transpiler compiles to native fp16/bf16 instructions. These types are NOT available on AVX2/AVX-512/fallback targets.
The dispatch group "MulAdd" unifies both under a single MulAdd[T]() entry point.
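The contract every target must satisfy is the accumulate form out[i] += x[i] * y[i]. A scalar reference sketch of that semantics (the names here are illustrative; the real entry point is the generated generic MulAdd, and the SIMD paths may differ by FMA rounding):

```go
package main

import "fmt"

// mulAddRef is a scalar reference for the accumulate form
// out[i] += x[i] * y[i] that every MulAdd implementation in the
// dispatch group must match (up to FMA rounding on SIMD paths).
func mulAddRef[T float32 | float64](x, y, out []T) {
	for i := range out {
		out[i] += x[i] * y[i]
	}
}

func main() {
	x := []float32{1, 2, 3}
	y := []float32{4, 5, 6}
	out := []float32{10, 10, 10}
	mulAddRef(x, y, out)
	fmt.Println(out) // [14 20 28]
}
```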
Usage:
go generate ./...
GOEXPERIMENT=simd go build
Index ¶
Constants ¶
This section is empty.
Variables ¶
var MulAddFloat32 func(x []float32, y []float32, out []float32)
var MulAddFloat64 func(x []float64, y []float64, out []float64)
Functions ¶
func BaseMulAdd ¶
BaseMulAdd computes element-wise fused multiply-add: out[i] += x[i] * y[i].
Uses SIMD FMA instructions for vectorized throughput across all targets. This primary generates for float32 and float64 only. Float16/BFloat16 are added by the specialization in muladd_half_base.go (NEON:asm only).
func BaseMulAddHalf ¶
BaseMulAddHalf computes element-wise fused multiply-add for half-precision types.
Float16 and BFloat16 aren't native to Go's simd package, so this specialization restricts to NEON assembly where the GOAT transpiler can emit native fp16/bf16 instructions. The function loads half-precision values, widens to float32 for the FMA computation, then narrows back.
On non-NEON targets (AVX2, AVX-512, fallback), Float16/BFloat16 are not generated; callers should promote to float32 before dispatch.
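The widen/FMA/narrow pattern is easiest to see for BFloat16, whose bit pattern is simply the upper 16 bits of an IEEE-754 float32. A self-contained scalar sketch of what the NEON specialization does with native instructions (helper names are hypothetical; the stdlib math.FMA operates on float64, which represents every bf16 product exactly, so the result matches a float32 FMA here):

```go
package main

import (
	"fmt"
	"math"
)

// bf16ToF32 widens a BFloat16 bit pattern to float32: bf16 is the
// upper 16 bits of a float32, so widening is a left shift.
func bf16ToF32(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

// f32ToBF16 narrows a float32 to BFloat16 by truncating the mantissa.
// (A production version would round to nearest even; truncation keeps
// this sketch short.)
func f32ToBF16(f float32) uint16 {
	return uint16(math.Float32bits(f) >> 16)
}

// mulAddBF16 is a scalar reference for the widen/FMA/narrow pattern:
// load bf16, widen, fused multiply-add, narrow back.
func mulAddBF16(x, y, out []uint16) {
	for i := range out {
		acc := math.FMA(
			float64(bf16ToF32(x[i])),
			float64(bf16ToF32(y[i])),
			float64(bf16ToF32(out[i])),
		)
		out[i] = f32ToBF16(float32(acc))
	}
}

func main() {
	x := []uint16{0x3F80}   // 1.0 in bf16
	y := []uint16{0x4000}   // 2.0
	out := []uint16{0x4040} // 3.0
	mulAddBF16(x, y, out)
	// 3.0 + 1.0*2.0 = 5.0, whose bf16 bit pattern is 0x40A0.
	fmt.Printf("%#04x\n", out[0]) // 0x40a0
}
```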
func BaseMulAdd_fallback ¶
func MulAdd ¶
MulAdd computes element-wise fused multiply-add: out[i] += x[i] * y[i].
Uses SIMD FMA instructions for vectorized throughput across all targets. This primary generates for float32 and float64 only. Float16/BFloat16 are added by the specialization in muladd_half_base.go (NEON:asm only).
This function dispatches to the appropriate SIMD implementation at runtime.
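The exported function variables (MulAddFloat32, MulAddFloat64) suggest the common Go pattern of binding the selected implementation at init time. A minimal sketch of that pattern, with a hypothetical feature check standing in for real CPU detection:

```go
package main

import "fmt"

// mulAddScalar is a stand-in fallback body; a real package would
// choose between SIMD and scalar implementations.
func mulAddScalar(x, y, out []float32) {
	for i := range out {
		out[i] += x[i] * y[i]
	}
}

// MulAddFloat32 mirrors the exported function variable: callers go
// through the variable, which init points at the chosen body.
var MulAddFloat32 func(x, y, out []float32)

// hasSIMD stands in for real CPU-feature detection (hypothetical).
func hasSIMD() bool { return false }

func init() {
	if hasSIMD() {
		// MulAddFloat32 = mulAddSIMD // bound on capable CPUs
	}
	if MulAddFloat32 == nil {
		MulAddFloat32 = mulAddScalar // scalar fallback
	}
}

func main() {
	out := []float32{0, 0}
	MulAddFloat32([]float32{1, 2}, []float32{3, 4}, out)
	fmt.Println(out) // [3 8]
}
```

Dispatch through a package-level variable costs one indirect call per invocation but keeps the call site uniform across targets, which is what lets the single MulAdd[T] entry point front all implementations.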
Types ¶
This section is empty.