specialize

package
v0.0.12 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 9, 2026 License: Apache-2.0 Imports: 1 Imported by: 0

Documentation

Overview

Package specialize demonstrates the //hwy:specializes and //hwy:targets directives for architecture-specific template specialization.

This example shows a fused multiply-add (MulAdd) with a primary generic implementation and a NEON:asm specialization for half-precision types:

  • Primary (this file): handles float32/float64 on all targets using generic hwy operations. On NEON:asm this compiles to assembly via GOAT; on AVX2/AVX-512 it uses Go's simd package; on fallback it uses scalar code.

  • Specialization (muladd_half_base.go): adds Float16/BFloat16 on NEON:asm only, providing a body that the GOAT transpiler compiles to native fp16/bf16 instructions. These types are NOT available on AVX2/AVX-512/fallback targets.

The dispatch group "MulAdd" unifies both under a single MulAdd[T]() entry point.

Usage:

    go generate ./...
    GOEXPERIMENT=simd go build

Constants

This section is empty.

Variables

var MulAddBFloat16 func(x []hwy.BFloat16, y []hwy.BFloat16, out []hwy.BFloat16)
var MulAddFloat16 func(x []hwy.Float16, y []hwy.Float16, out []hwy.Float16)
var MulAddFloat32 func(x []float32, y []float32, out []float32)
var MulAddFloat64 func(x []float64, y []float64, out []float64)
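
The per-type function variables above suggest a pattern in which each variable is bound to one implementation when the package initializes. The following is a minimal sketch of that pattern, not the generated code: hasFastPath, mulAddScalar, mulAddUnrolled, and the lowercase mulAddFloat32 are hypothetical names, and the "fast" path is an unrolled loop standing in for a real SIMD kernel.

```go
package main

import "fmt"

// shortest returns the smallest of three lengths.
func shortest(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// mulAddScalar is a plain-Go stand-in for a fallback implementation.
func mulAddScalar(x, y, out []float32) {
	n := shortest(len(x), len(y), len(out))
	for i := 0; i < n; i++ {
		out[i] += x[i] * y[i]
	}
}

// mulAddUnrolled stands in for an accelerated implementation (hypothetical).
func mulAddUnrolled(x, y, out []float32) {
	n := shortest(len(x), len(y), len(out))
	i := 0
	for ; i+4 <= n; i += 4 {
		out[i] += x[i] * y[i]
		out[i+1] += x[i+1] * y[i+1]
		out[i+2] += x[i+2] * y[i+2]
		out[i+3] += x[i+3] * y[i+3]
	}
	for ; i < n; i++ { // scalar tail for the remaining elements
		out[i] += x[i] * y[i]
	}
}

// hasFastPath is a hypothetical feature flag; real code would probe the CPU.
var hasFastPath = true

// mulAddFloat32 mirrors the shape of the exported MulAddFloat32 variable.
var mulAddFloat32 func(x, y, out []float32)

func init() {
	if hasFastPath {
		mulAddFloat32 = mulAddUnrolled
	} else {
		mulAddFloat32 = mulAddScalar
	}
}

func main() {
	out := []float32{1, 1, 1, 1, 1}
	mulAddFloat32([]float32{1, 2, 3, 4, 5}, []float32{2, 2, 2, 2, 2}, out)
	fmt.Println(out) // [3 5 7 9 11]
}
```

Binding once at init keeps the per-call cost to a single indirect call, which is presumably why the package exposes variables rather than branching inside each function.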

Functions

func BaseMulAdd

func BaseMulAdd[T hwy.Floats](x, y, out []T)

BaseMulAdd computes element-wise fused multiply-add: out[i] += x[i] * y[i].

Uses SIMD FMA instructions for vectorized throughput on the SIMD targets; the fallback target uses scalar code. This primary implementation is generated for float32 and float64 only. Float16 and BFloat16 are added by the specialization in muladd_half_base.go (NEON:asm only).
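
To make the "fused" part of out[i] += x[i] * y[i] concrete, here is a scalar reference sketch for float64 built on the standard library's math.FMA, which rounds once per element instead of once after the multiply and again after the add. The name mulAddFused64 is hypothetical, and truncating to the shortest slice is an assumption; the documentation does not say how BaseMulAdd treats mismatched lengths.

```go
package main

import (
	"fmt"
	"math"
)

// mulAddFused64 is a hypothetical scalar reference, not the generated code.
// It computes out[i] = fma(x[i], y[i], out[i]) with one rounding per element.
func mulAddFused64(x, y, out []float64) {
	// Assumption: process only the shortest common length.
	n := len(out)
	if len(x) < n {
		n = len(x)
	}
	if len(y) < n {
		n = len(y)
	}
	for i := 0; i < n; i++ {
		out[i] = math.FMA(x[i], y[i], out[i])
	}
}

func main() {
	out := []float64{1, 1}
	mulAddFused64([]float64{2, 3}, []float64{4, 5}, out)
	fmt.Println(out) // [9 16]
}
```

A reference like this can serve as an oracle when comparing the output of a vectorized target against scalar results.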

func BaseMulAddHalf

func BaseMulAddHalf[T hwy.Floats](x, y, out []T)

BaseMulAddHalf computes element-wise fused multiply-add for half-precision types.

Float16 and BFloat16 are not native to Go's simd package, so this specialization is restricted to NEON assembly, where the GOAT transpiler can emit native fp16/bf16 instructions. The function loads half-precision values, widens them to float32 for the FMA computation, then narrows the result back.

On non-NEON targets (AVX2, AVX-512, fallback), the Float16/BFloat16 variants are not generated; callers should promote to float32 before dispatch.
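
The widen, compute, narrow sequence described above can be sketched in portable Go for the bfloat16 case, where the 16 stored bits are the upper half of a float32. Everything here is an illustrative assumption: bf16, widen, narrow, and mulAddBF16 are hypothetical names, the layout of hwy.BFloat16 may differ, and NaN handling is omitted.

```go
package main

import (
	"fmt"
	"math"
)

// bf16 is a hypothetical storage type holding the top 16 bits of a float32.
type bf16 uint16

// widen converts bf16 to float32 exactly by restoring the low 16 zero bits.
func widen(b bf16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

// narrow converts float32 to bf16 with round-to-nearest-even.
// NaN inputs are not handled specially in this sketch.
func narrow(f float32) bf16 {
	bits := math.Float32bits(f)
	bits += 0x7FFF + ((bits >> 16) & 1)
	return bf16(bits >> 16)
}

// mulAddBF16 computes out[i] += x[i] * y[i] in float32, then narrows.
func mulAddBF16(x, y, out []bf16) {
	n := len(out)
	if len(x) < n {
		n = len(x)
	}
	if len(y) < n {
		n = len(y)
	}
	for i := 0; i < n; i++ {
		acc := widen(out[i]) + widen(x[i])*widen(y[i])
		out[i] = narrow(acc)
	}
}

func main() {
	x := []bf16{narrow(1.5)}
	y := []bf16{narrow(2)}
	out := []bf16{narrow(0.5)}
	mulAddBF16(x, y, out)
	fmt.Println(widen(out[0])) // 3.5
}
```

Callers on targets without the half-precision variants could apply the same widening step up front and then use the float32 path.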

func BaseMulAdd_fallback

func BaseMulAdd_fallback(x []float32, y []float32, out []float32)

func BaseMulAdd_fallback_Float64

func BaseMulAdd_fallback_Float64(x []float64, y []float64, out []float64)

func MulAdd

func MulAdd[T hwy.Floats](x []T, y []T, out []T)

MulAdd computes element-wise fused multiply-add: out[i] += x[i] * y[i].

Uses SIMD FMA instructions for vectorized throughput on the SIMD targets; the fallback target uses scalar code. This primary implementation is generated for float32 and float64 only. Float16 and BFloat16 are added by the specialization in muladd_half_base.go (NEON:asm only).

This function dispatches to the appropriate SIMD implementation at runtime.
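
One way a generic entry point can forward to per-type implementations is a type switch on the slice argument. The sketch below shows that idea under stated assumptions: floats is a local stand-in for hwy.Floats restricted to two types, mulAddF32 and mulAddF64 are hypothetical stand-ins for the exported function variables, and the actual generated dispatcher may be structured differently.

```go
package main

import "fmt"

// floats is a local stand-in for hwy.Floats, limited to two types here.
type floats interface{ float32 | float64 }

// scalarMulAdd is a plain-Go implementation used as the dispatch target.
func scalarMulAdd[T floats](x, y, out []T) {
	n := len(out)
	if len(x) < n {
		n = len(x)
	}
	if len(y) < n {
		n = len(y)
	}
	for i := 0; i < n; i++ {
		out[i] += x[i] * y[i]
	}
}

// Hypothetical per-type function variables; a real package would rebind
// these to the best available implementation for the running CPU.
var (
	mulAddF32 func(x, y, out []float32) = scalarMulAdd[float32]
	mulAddF64 func(x, y, out []float64) = scalarMulAdd[float64]
)

// mulAdd is a generic entry point that forwards to the per-type variable.
func mulAdd[T floats](x, y, out []T) {
	switch xs := any(x).(type) {
	case []float32:
		mulAddF32(xs, any(y).([]float32), any(out).([]float32))
	case []float64:
		mulAddF64(xs, any(y).([]float64), any(out).([]float64))
	}
}

func main() {
	out := []float64{0.5, 1}
	mulAdd([]float64{2, 4}, []float64{0.25, 0.5}, out)
	fmt.Println(out) // [1 3]
}
```

Because x, y, and out share the type parameter T, the assertions on y and out cannot fail once the switch has matched x.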

Types

This section is empty.
