ops

package
v0.0.0-...-6093f96
Published: Apr 24, 2026 License: Apache-2.0 Imports: 4 Imported by: 0

Documentation

Overview

Package ops provides optimized mathematical operations for neural network inference.

All functions operate on flat float32 slices. Functions with "InPlace" in the name modify their input directly. Functions requiring output buffers take them as the first parameter (dst convention).
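The two calling conventions can be sketched with toy stand-ins (addDst and scaleInPlace below are hypothetical illustrations of the pattern, not functions from this package):

```go
package main

import "fmt"

// addDst writes a[i]+b[i] into out, mirroring the package's
// "output buffer first" (dst) convention.
func addDst(out, a, b []float32) {
	for i := range a {
		out[i] = a[i] + b[i]
	}
}

// scaleInPlace mirrors the mutating "InPlace" style: it modifies x directly.
func scaleInPlace(x []float32, s float32) {
	for i := range x {
		x[i] *= s
	}
}

func main() {
	a := []float32{1, 2, 3}
	b := []float32{10, 20, 30}
	out := make([]float32, len(a)) // caller owns and preallocates the buffer
	addDst(out, a, b)
	scaleInPlace(out, 0.5)
	fmt.Println(out) // [5.5 11 16.5]
}
```

Preallocating the dst buffer once and reusing it across calls avoids per-step allocations in an inference loop.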

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Abs

func Abs(out, a []float32)

Abs computes element-wise absolute value: out[i] = |a[i]|.

func Add

func Add(out, a, b []float32)

Add performs element-wise addition: out[i] = a[i] + b[i].

func AddBias

func AddBias(dst, bias []float32)

AddBias adds a bias vector to dst in-place: dst[i] += bias[i].

func AddScaled

func AddScaled(dst []float32, scale float32, src []float32, n int)

AddScaled computes dst[i] += scale * src[i] with 8x unrolling.

func ApplyFrequencyPresencePenalty

func ApplyFrequencyPresencePenalty(logits []float32, recentTokens []int32, freqPenalty, presencePenalty float32)

ApplyFrequencyPresencePenalty applies frequency and presence penalties. Frequency penalty subtracts freq_penalty * count for each token. Presence penalty subtracts presence_penalty once for any token that appeared.
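The documented behavior can be sketched as follows (applyPenalties is an illustrative reference, not this package's code):

```go
package main

import "fmt"

// applyPenalties sketches the documented scheme: frequency penalty
// subtracts freqPenalty * count per occurrence of a token; presence
// penalty subtracts presencePenalty once for any token that appeared.
func applyPenalties(logits []float32, recent []int32, freqPenalty, presencePenalty float32) {
	counts := make(map[int32]int)
	for _, t := range recent {
		counts[t]++
	}
	for tok, n := range counts {
		logits[int(tok)] -= freqPenalty * float32(n)
		logits[int(tok)] -= presencePenalty
	}
}

func main() {
	logits := []float32{1, 1, 1}
	applyPenalties(logits, []int32{2, 2, 0}, 0.5, 0.1)
	// token 2 appeared twice: 1 - 0.5*2 - 0.1; token 1 is untouched
	fmt.Println(logits)
}
```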

func ApplyMinP

func ApplyMinP(logits []float32, minP float32)

func ApplyNgramBlocking

func ApplyNgramBlocking(logits []float32, recentTokens []int32, ngramSize int)

ApplyNgramBlocking prevents generation of any n-gram that already appeared in recentTokens. Sets the logit of the token that would complete a repeated n-gram to -inf.

func ApplyRepetitionPenalty

func ApplyRepetitionPenalty(logits []float32, recentTokens []int32, penalty float32)

func ApplyRoPE

func ApplyRoPE(vec []float32, pos int, headDim int, freqBase float32, neox bool)

ApplyRoPE applies Rotary Positional Embedding to a single attention head vector.

vec:      [headDim] — one Q or K head vector (modified in-place)
pos:      absolute position of this token
headDim:  dimension per attention head
freqBase: RoPE frequency base (10000.0 for LLaMA, 1000000.0 for some Qwen)
neox:     if true, use split-half pairing (GPT-NeoX/Qwen style);
          if false, use interleaved pairing (LLaMA style)
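The two pairing conventions can be sketched as below. This is an illustrative reference assuming the standard RoPE rotation (angle theta = pos * freqBase^(-2i/headDim) per pair), not this package's exact implementation:

```go
package main

import (
	"fmt"
	"math"
)

// ropeSketch rotates each pair (x0, x1) by theta:
//   x0' = x0*cos(theta) - x1*sin(theta)
//   x1' = x0*sin(theta) + x1*cos(theta)
// neox=true pairs element i with i+headDim/2 (split halves);
// neox=false pairs 2i with 2i+1 (interleaved).
func ropeSketch(vec []float32, pos, headDim int, freqBase float32, neox bool) {
	half := headDim / 2
	for i := 0; i < half; i++ {
		theta := float64(pos) * math.Pow(float64(freqBase), -2*float64(i)/float64(headDim))
		c, s := float32(math.Cos(theta)), float32(math.Sin(theta))
		var j0, j1 int
		if neox {
			j0, j1 = i, i+half
		} else {
			j0, j1 = 2*i, 2*i+1
		}
		x0, x1 := vec[j0], vec[j1]
		vec[j0] = x0*c - x1*s
		vec[j1] = x0*s + x1*c
	}
}

func main() {
	v := []float32{1, 0, 1, 0}
	ropeSketch(v, 0, 4, 10000, true) // position 0: rotation is the identity
	fmt.Println(v)
}
```

Note that the rotation only reorders which elements are paired; either convention preserves the vector's norm.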

func ApplyRoPEBatch

func ApplyRoPEBatch(qFlat []float32, numHeads int, kFlat []float32, numKVHeads int,
	pos, headDim int, freqBase float32, neox bool)

ApplyRoPEBatch applies RoPE to all Q and K heads at a given position.

qFlat: [numHeads * headDim]   — all Q heads concatenated
kFlat: [numKVHeads * headDim] — all K heads concatenated

func ApplyRoPEFromTable

func ApplyRoPEFromTable(vec []float32, pos int, headDim int, cosTable, sinTable []float32, neox bool)

ApplyRoPEFromTable applies RoPE using precomputed cos/sin tables. Faster than ApplyRoPE for repeated calls at different positions.

func ApplyTemperature

func ApplyTemperature(logits []float32, temp float32)

func ApplyTopK

func ApplyTopK(logits []float32, k int)

func ApplyTopP

func ApplyTopP(logits []float32, p float32)

func Argmax

func Argmax(x []float32) int

Argmax returns the index of the maximum value. Returns -1 for empty slices.

func BatchNormInference

func BatchNormInference(out, x, gamma, beta, runMean, runVar []float32, channels int, eps float32)

BatchNormInference applies Batch Normalization in inference mode. Uses precomputed running mean and variance (no batch statistics).

x, out: [channels] (or [channels * spatial] if applied per-channel)
gamma, beta: [channels] (learned scale/shift)
runMean, runVar: [channels] (running statistics from training)

func Clamp

func Clamp(x []float32, minVal, maxVal float32)

Clamp clamps all values in x to the range [minVal, maxVal] in-place.

func Clear

func Clear(s []float32)

Clear zeros out a float32 slice.

func Concat

func Concat(a, b []float32) []float32

Concat concatenates two slices.

func CopySlice

func CopySlice(dst, src []float32)

CopySlice copies src into dst, copying min(len(dst), len(src)) elements.

func DotProduct

func DotProduct(a, b []float32, n int) float32

DotProduct computes the dot product of two float32 slices with 8x loop unrolling.

func ELU

func ELU(x []float32, alpha float32)

ELU applies Exponential Linear Unit in-place.

x[i] = x[i]                 if x[i] >= 0
x[i] = alpha*(exp(x[i])-1)  if x[i] < 0

func FastExp

func FastExp(x float32) float32

FastExp computes a fast approximation of exp(x) in float32 using Schraudolph's algorithm. ~0.3% max error, much faster than math.Exp(float64(x)).

func FastTanh

func FastTanh(x float32) float32

FastTanh computes a fast approximation of tanh(x) = (exp(2x) - 1) / (exp(2x) + 1).

func GELU

func GELU(x []float32)

GELU applies Gaussian Error Linear Unit activation in-place.

gelu(x) = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
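The tanh approximation above can be sketched for a single scalar (geluTanh is illustrative; the package's slice version may use a faster tanh):

```go
package main

import (
	"fmt"
	"math"
)

// geluTanh evaluates the documented formula:
// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715*x^3))).
func geluTanh(x float64) float64 {
	const c = 0.7978845608028654 // sqrt(2/pi)
	return 0.5 * x * (1 + math.Tanh(c*(x+0.044715*x*x*x)))
}

func main() {
	// GELU is ~0 for large negative x, 0 at 0, and ~x for large positive x.
	fmt.Printf("%.4f %.4f %.4f\n", geluTanh(-1), geluTanh(0), geluTanh(2))
}
```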

func GLU

func GLU(out, input []float32, dim int)

GLU applies a Gated Linear Unit. The input has length 2*dim; it is split into two halves, and out[i] = first_half[i] * sigmoid(second_half[i]). out must have length >= dim.

func GeGLU

func GeGLU(out, gate, up []float32, n int)

GeGLU computes the GELU-gated activation used by Gemma.

out[i] = GELU(gate[i]) * up[i]

func GroupNorm

func GroupNorm(out, x, w, b []float32, numGroups, groupSize int, eps float32)

GroupNorm applies Group Normalization (used in diffusion models, U-Nets).

x: [numGroups * groupSize] flat
w, b: [numGroups * groupSize] (affine parameters)
out: [numGroups * groupSize]

Each group of channels is independently normalized.

func HardSigmoid

func HardSigmoid(x []float32)

HardSigmoid applies hard sigmoid in-place: clamp((x + 3) / 6, 0, 1). Fast approximation used in MobileNetV3.

func HardSwish

func HardSwish(x []float32)

HardSwish applies hard swish in-place: x * hard_sigmoid(x). Used in MobileNetV3 and EfficientNet.

func LayerNorm

func LayerNorm(out, x, w, b []float32, eps float32)

LayerNorm applies standard Layer Normalization.

out[i] = w[i] * (x[i] - mean) / sqrt(var + eps) + b[i]
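A reference implementation of the formula above might look like this (layerNormRef is an illustrative sketch, not the package's code):

```go
package main

import (
	"fmt"
	"math"
)

// layerNormRef normalizes x to zero mean and unit variance, then applies
// the learned scale w and shift b, matching the documented formula.
func layerNormRef(out, x, w, b []float32, eps float32) {
	var mean float64
	for _, v := range x {
		mean += float64(v)
	}
	mean /= float64(len(x))
	var variance float64
	for _, v := range x {
		d := float64(v) - mean
		variance += d * d
	}
	variance /= float64(len(x))
	inv := 1 / math.Sqrt(variance+float64(eps))
	for i, v := range x {
		out[i] = w[i]*float32((float64(v)-mean)*inv) + b[i]
	}
}

func main() {
	x := []float32{1, 2, 3, 4}
	out := make([]float32, 4)
	layerNormRef(out, x, []float32{1, 1, 1, 1}, []float32{0, 0, 0, 0}, 1e-5)
	fmt.Printf("%.3f\n", out) // symmetric around 0 for this input
}
```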

func LeakyReLU

func LeakyReLU(x []float32, alpha float32)

LeakyReLU applies Leaky ReLU in-place: x[i] = max(alpha*x[i], x[i]). Commonly used in GANs and some discriminator networks.

func LogSoftmax

func LogSoftmax(x []float32)

LogSoftmax computes numerically stable log-softmax in-place.

func MatVecMul

func MatVecMul(out, W, x []float32, outDim, inDim int)

MatVecMul performs float32 matrix-vector multiply: out[r] = dot(W[r,:], x). W is stored row-major as [outDim * inDim].
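The row-major layout can be sketched as follows (matVecRef is an illustrative scalar version; the package's implementation is optimized):

```go
package main

import "fmt"

// matVecRef computes out[r] = dot(W[r,:], x) with W flat row-major:
// row r occupies W[r*inDim : (r+1)*inDim].
func matVecRef(out, W, x []float32, outDim, inDim int) {
	for r := 0; r < outDim; r++ {
		row := W[r*inDim : (r+1)*inDim]
		var sum float32
		for c := 0; c < inDim; c++ {
			sum += row[c] * x[c]
		}
		out[r] = sum
	}
}

func main() {
	W := []float32{1, 2, 3, 4} // 2x2 matrix: rows [1 2] and [3 4]
	x := []float32{1, 1}
	out := make([]float32, 2)
	matVecRef(out, W, x, 2, 2)
	fmt.Println(out) // [3 7]
}
```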

func Max

func Max(out, a, b []float32)

Max performs element-wise maximum: out[i] = max(a[i], b[i]).

func Min

func Min(out, a, b []float32)

Min performs element-wise minimum: out[i] = min(a[i], b[i]).

func Mish

func Mish(x []float32)

Mish applies Mish activation in-place: x * tanh(softplus(x)). Used in YOLOv4 and some modern architectures.

func Mul

func Mul(out, a, b []float32)

Mul performs element-wise multiplication: out[i] = a[i] * b[i].

func Neg

func Neg(out, a []float32)

Neg negates all elements: out[i] = -a[i].

func Pow

func Pow(out, a []float32, p float32)

Pow computes element-wise power: out[i] = a[i] ^ p.

func RMSNorm

func RMSNorm(out, x, w []float32, eps float32)

RMSNorm applies Root Mean Square Layer Normalization.

out[i] = w[i] * x[i] / sqrt(mean(x²) + eps)

Used by LLaMA, Qwen, Gemma, Mistral, and most modern LLMs instead of LayerNorm. Simpler and faster than LayerNorm: no mean subtraction, no bias term.
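The formula above can be sketched directly (rmsNormRef is an illustrative reference, not the package's code):

```go
package main

import (
	"fmt"
	"math"
)

// rmsNormRef scales x by 1/sqrt(mean(x^2)+eps), then by the weight w.
// No mean subtraction and no bias, unlike LayerNorm.
func rmsNormRef(out, x, w []float32, eps float32) {
	var sumSq float64
	for _, v := range x {
		sumSq += float64(v) * float64(v)
	}
	inv := 1 / math.Sqrt(sumSq/float64(len(x))+float64(eps))
	for i, v := range x {
		out[i] = w[i] * float32(float64(v)*inv)
	}
}

func main() {
	x := []float32{3, 4} // mean(x^2) = 12.5, rms ~= 3.5355
	out := make([]float32, 2)
	rmsNormRef(out, x, []float32{1, 1}, 0)
	fmt.Printf("%.4f\n", out)
}
```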

func RMSNormInPlace

func RMSNormInPlace(x, w []float32, eps float32)

RMSNormInPlace applies RMS normalization in-place (modifies x). Used for QK-norm in models like Qwen3.

func ReLU

func ReLU(x []float32)

ReLU applies ReLU activation in-place: x[i] = max(0, x[i]).

func Reciprocal

func Reciprocal(out, a []float32)

Reciprocal computes element-wise reciprocal: out[i] = 1 / a[i].

func ReduceMax

func ReduceMax(a []float32) float32

ReduceMax returns the maximum value.

func ReduceMean

func ReduceMean(a []float32) float32

ReduceMean returns the arithmetic mean.

func ReduceMin

func ReduceMin(a []float32) float32

ReduceMin returns the minimum value.

func ReduceSum

func ReduceSum(a []float32) float32

ReduceSum returns the sum of all elements.

func ResidualLayerNorm

func ResidualLayerNorm(outNorm, residual, prev, sub, w, b []float32, scale, eps float32)

ResidualLayerNorm fuses residual addition and LayerNorm:

residual[i] = prev[i] + scale*sub[i]
outNorm = LayerNorm(residual)

func RoPEFrequencyTable

func RoPEFrequencyTable(maxLen, headDim int, freqBase float32) (cosTable, sinTable []float32)

RoPEFrequencyTable precomputes the cos/sin tables for RoPE. Returns (cosTable, sinTable) each of shape [maxLen * headDim/2]. Use with ApplyRoPEFromTable for faster inference when processing many positions.

func SampleToken

func SampleToken(logits []float32, cfg SamplerConfig, recentTokens []int32, rng *rand.Rand) int

func ScalarAdd

func ScalarAdd(out, a []float32, s float32)

ScalarAdd adds a scalar to all elements: out[i] = a[i] + s.

func ScalarMul

func ScalarMul(out, a []float32, s float32)

ScalarMul multiplies all elements by a scalar: out[i] = a[i] * s.

func Scale

func Scale(x []float32, s float32)

Scale multiplies all elements in x by s in-place.

func SiLU

func SiLU(x []float32)

SiLU applies SiLU (Swish) activation in-place: x[i] = x[i] * sigmoid(x[i]).

func Sigmoid

func Sigmoid(x float32) float32

Sigmoid computes 1 / (1 + exp(-x)) using exact math.

func SigmoidExact

func SigmoidExact(x []float32)

SigmoidExact applies exact sigmoid via math.Exp (for when precision matters more than speed).

func SigmoidFast

func SigmoidFast(x float32) float32

SigmoidFast computes 1 / (1 + exp(-x)) using Schraudolph approximation.

func Softmax

func Softmax(x []float32)

Softmax computes softmax in-place with numerical stability. Uses standard math.Exp matching llama.cpp's ggml_vec_soft_max_f32. Sum is accumulated in float64 for precision.
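The documented scheme can be sketched as below (softmaxRef is illustrative; the package matches llama.cpp's kernel):

```go
package main

import (
	"fmt"
	"math"
)

// softmaxRef subtracts the max for numerical stability, exponentiates,
// and accumulates the sum in float64 before normalizing, as documented.
func softmaxRef(x []float32) {
	maxV := x[0]
	for _, v := range x[1:] {
		if v > maxV {
			maxV = v
		}
	}
	var sum float64
	for i, v := range x {
		e := float32(math.Exp(float64(v - maxV)))
		x[i] = e
		sum += float64(e)
	}
	for i := range x {
		x[i] = float32(float64(x[i]) / sum)
	}
}

func main() {
	x := []float32{1, 2, 3}
	softmaxRef(x)
	fmt.Printf("%.4f\n", x) // probabilities sum to 1, order preserved
}
```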

func Sqrt

func Sqrt(out, a []float32)

Sqrt computes element-wise square root: out[i] = sqrt(a[i]).

func Sub

func Sub(out, a, b []float32)

Sub performs element-wise subtraction: out[i] = a[i] - b[i].

func SwiGLU

func SwiGLU(out, gate, up []float32, n int)

SwiGLU computes the SwiGLU gated activation used by LLaMA, Qwen, Mistral.

out[i] = SiLU(gate[i]) * up[i]

gate and up are two parallel FFN projections; SwiGLU fuses the activation.
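The fused activation can be sketched as follows (swigluRef is an illustrative reference, not the package's unrolled code):

```go
package main

import (
	"fmt"
	"math"
)

// swigluRef computes out[i] = SiLU(gate[i]) * up[i], where
// SiLU(v) = v * sigmoid(v), rewritten here as v / (1 + e^-v).
func swigluRef(out, gate, up []float32, n int) {
	for i := 0; i < n; i++ {
		g := float64(gate[i])
		silu := g / (1 + math.Exp(-g))
		out[i] = float32(silu) * up[i]
	}
}

func main() {
	out := make([]float32, 2)
	swigluRef(out, []float32{0, 1}, []float32{5, 2}, 2)
	fmt.Printf("%.4f\n", out) // SiLU(0) = 0, so out[0] = 0
}
```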

func SwiGLU_OAI

func SwiGLU_OAI(out, gate, up []float32, n int, alpha, limit float32)

SwiGLU_OAI computes the OpenAI MoE variant of SwiGLU (gpt-oss). Matches llama.cpp ggml_compute_forward_swiglu_oai_f32 exactly:

x = min(gate, limit)
y = clamp(up, -limit, limit)
out_glu = x / (1 + exp(alpha * (-x)))
out = out_glu * (y + 1)

func TanhExact

func TanhExact(x []float32)

TanhExact applies exact tanh via math.Tanh (for when precision matters more than speed).

func TopKIndices

func TopKIndices(vals []float32, k int) []int

TopKIndices returns the indices of the k largest values. Uses simple selection for small k; for production use with large k, a heap-based variant is preferred.
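The simple-selection approach can be sketched as below (topKRef is an illustrative O(k*n) reference; a heap-based variant would be O(n log k)):

```go
package main

import "fmt"

// topKRef repeatedly picks the largest remaining value, returning
// indices in descending order of value.
func topKRef(vals []float32, k int) []int {
	taken := make([]bool, len(vals))
	idx := make([]int, 0, k)
	for len(idx) < k && len(idx) < len(vals) {
		best := -1
		for i, v := range vals {
			if !taken[i] && (best < 0 || v > vals[best]) {
				best = i
			}
		}
		taken[best] = true
		idx = append(idx, best)
	}
	return idx
}

func main() {
	fmt.Println(topKRef([]float32{0.1, 0.9, 0.5, 0.7}, 2)) // [1 3]
}
```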

func Where

func Where(out []float32, cond []bool, trueVal, falseVal []float32)

Where performs element-wise conditional selection:

out[i] = trueVal[i]  if cond[i]
out[i] = falseVal[i] if !cond[i]

func YaRNRoPEFrequencyTable

func YaRNRoPEFrequencyTable(maxLen, headDim int, freqBase float32,
	factor float32, origMaxPos int, betaFast, betaSlow, extFactor, attnFactor float32) (cosTable, sinTable []float32)

YaRNRoPEFrequencyTable precomputes cos/sin tables with YaRN scaling. Matches llama.cpp rope_yarn exactly:

  • extFactor controls the ramp blend (0=pure interpolation, 1=full YaRN blend)
  • attnFactor is the base magnitude scaling (multiplied with log correction)
  • Dimensions below corrDims[0] keep their original frequencies, dimensions above corrDims[1] are fully interpolated, and dimensions in between blend smoothly via ramp * extFactor.

Types

type SamplerConfig

type SamplerConfig struct {
	Temperature       float32
	TopK              int
	TopP              float32
	MinP              float32
	RepetitionPenalty float32
	FrequencyPenalty  float32
	PresencePenalty   float32
	NoRepeatNgramSize int
}

func DefaultSamplerConfig

func DefaultSamplerConfig() SamplerConfig
