Documentation ¶
Overview ¶
Package specialize demonstrates the //hwy:specializes and //hwy:targets directives for architecture-specific template specialization.
This example shows a fused multiply-add (MulAdd) with a primary generic implementation and a NEON:asm specialization for half-precision types:
Primary (this file): handles float32/float64 on all targets using generic hwy operations. On NEON:asm this compiles to assembly via GOAT; on AVX2/AVX-512 it uses Go's simd package; on fallback it uses scalar code.
Specialization (muladd_half_base.go): adds Float16/BFloat16 on NEON:asm only, providing a body that the GOAT transpiler compiles to native fp16/bf16 instructions. These types are NOT available on AVX2/AVX-512/fallback targets.
The dispatch group "MulAdd" unifies both under a single MulAdd[T]() entry point.
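The contract every target must satisfy is the accumulate form out[i] += x[i] * y[i]. A scalar reference sketch of that semantics (the names here are illustrative; the real entry point is the generated generic MulAdd, and the SIMD paths may differ by FMA rounding):

```go
package main

import "fmt"

// mulAddRef is a scalar reference for the accumulate form
// out[i] += x[i] * y[i] that every MulAdd implementation in the
// dispatch group must match (up to FMA rounding on SIMD paths).
func mulAddRef[T float32 | float64](x, y, out []T) {
	for i := range out {
		out[i] += x[i] * y[i]
	}
}

func main() {
	x := []float32{1, 2, 3}
	y := []float32{4, 5, 6}
	out := []float32{10, 10, 10}
	mulAddRef(x, y, out)
	fmt.Println(out) // [14 20 28]
}
```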
Usage:
go generate ./...
GOEXPERIMENT=simd go build
Index ¶
Constants ¶
This section is empty.
Variables ¶
var MulAddFloat32 func(x []float32, y []float32, out []float32)
var MulAddFloat64 func(x []float64, y []float64, out []float64)
Functions ¶
func BaseMulAdd ¶
BaseMulAdd computes element-wise fused multiply-add: out[i] += x[i] * y[i].
Uses SIMD FMA instructions for vectorized throughput across all targets. This primary generates for float32 and float64 only. Float16/BFloat16 are added by the specialization in muladd_half_base.go (NEON:asm only).
func BaseMulAddHalf ¶
BaseMulAddHalf computes element-wise fused multiply-add for half-precision types.
Float16 and BFloat16 aren't native to Go's simd package, so this specialization restricts to NEON assembly where the GOAT transpiler can emit native fp16/bf16 instructions. The function loads half-precision values, widens to float32 for the FMA computation, then narrows back.
On non-NEON targets (AVX2, AVX-512, fallback), Float16/BFloat16 are not generated; callers should promote to float32 before dispatch.
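The widen/FMA/narrow pattern is easiest to see for BFloat16, whose bit pattern is simply the upper 16 bits of an IEEE-754 float32. A self-contained scalar sketch of what the NEON specialization does with native instructions (helper names are hypothetical; the stdlib math.FMA operates on float64, which represents every bf16 product exactly, so the result matches a float32 FMA here):

```go
package main

import (
	"fmt"
	"math"
)

// bf16ToF32 widens a BFloat16 bit pattern to float32: bf16 is the
// upper 16 bits of a float32, so widening is a left shift.
func bf16ToF32(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

// f32ToBF16 narrows a float32 to BFloat16 by truncating the mantissa.
// (A production version would round to nearest even; truncation keeps
// this sketch short.)
func f32ToBF16(f float32) uint16 {
	return uint16(math.Float32bits(f) >> 16)
}

// mulAddBF16 is a scalar reference for the widen/FMA/narrow pattern:
// load bf16, widen, fused multiply-add, narrow back.
func mulAddBF16(x, y, out []uint16) {
	for i := range out {
		acc := math.FMA(
			float64(bf16ToF32(x[i])),
			float64(bf16ToF32(y[i])),
			float64(bf16ToF32(out[i])),
		)
		out[i] = f32ToBF16(float32(acc))
	}
}

func main() {
	x := []uint16{0x3F80}   // 1.0 in bf16
	y := []uint16{0x4000}   // 2.0
	out := []uint16{0x4040} // 3.0
	mulAddBF16(x, y, out)
	// 3.0 + 1.0*2.0 = 5.0, whose bf16 bit pattern is 0x40A0.
	fmt.Printf("%#04x\n", out[0]) // 0x40a0
}
```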
func BaseMulAdd_fallback ¶
func MulAdd ¶
MulAdd computes element-wise fused multiply-add: out[i] += x[i] * y[i].
Uses SIMD FMA instructions for vectorized throughput across all targets. This primary generates for float32 and float64 only. Float16/BFloat16 are added by the specialization in muladd_half_base.go (NEON:asm only).
This function dispatches to the appropriate SIMD implementation at runtime.
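The exported function variables (MulAddFloat32, MulAddFloat64) suggest the common Go pattern of binding the selected implementation at init time. A minimal sketch of that pattern, with a hypothetical feature check standing in for real CPU detection:

```go
package main

import "fmt"

// mulAddScalar is a stand-in fallback body; a real package would
// choose between SIMD and scalar implementations.
func mulAddScalar(x, y, out []float32) {
	for i := range out {
		out[i] += x[i] * y[i]
	}
}

// MulAddFloat32 mirrors the exported function variable: callers go
// through the variable, which init points at the chosen body.
var MulAddFloat32 func(x, y, out []float32)

// hasSIMD stands in for real CPU-feature detection (hypothetical).
func hasSIMD() bool { return false }

func init() {
	if hasSIMD() {
		// MulAddFloat32 = mulAddSIMD // bound on capable CPUs
	}
	if MulAddFloat32 == nil {
		MulAddFloat32 = mulAddScalar // scalar fallback
	}
}

func main() {
	out := []float32{0, 0}
	MulAddFloat32([]float32{1, 2}, []float32{3, 4}, out)
	fmt.Println(out) // [3 8]
}
```

Dispatch through a package-level variable costs one indirect call per invocation but keeps the call site uniform across targets, which is what lets the single MulAdd[T] entry point front all implementations.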
Types ¶
This section is empty.