Documentation ¶
Overview ¶
Package ops provides optimized mathematical operations for neural network inference.
All functions operate on flat float32 slices. Functions with "InPlace" in the name modify their input directly. Functions requiring output buffers take them as the first parameter (dst convention).
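For example, a minimal sketch of these conventions (assuming the package is imported as ops):

a := []float32{1, -2, 3, -4}
b := []float32{4, 3, 2, 1}
out := make([]float32, len(a))

ops.Add(out, a, b) // output buffer first: out[i] = a[i] + b[i]
ops.ReLU(a)        // no output parameter: a is modified in-place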
Index ¶
- func Abs(out, a []float32)
- func Add(out, a, b []float32)
- func AddBias(dst, bias []float32)
- func AddScaled(dst []float32, scale float32, src []float32, n int)
- func ApplyFrequencyPresencePenalty(logits []float32, recentTokens []int32, freqPenalty, presencePenalty float32)
- func ApplyMinP(logits []float32, minP float32)
- func ApplyNgramBlocking(logits []float32, recentTokens []int32, ngramSize int)
- func ApplyRepetitionPenalty(logits []float32, recentTokens []int32, penalty float32)
- func ApplyRoPE(vec []float32, pos int, headDim int, freqBase float32, neox bool)
- func ApplyRoPEBatch(qFlat []float32, numHeads int, kFlat []float32, numKVHeads int, ...)
- func ApplyRoPEFromTable(vec []float32, pos int, headDim int, cosTable, sinTable []float32, neox bool)
- func ApplyTemperature(logits []float32, temp float32)
- func ApplyTopK(logits []float32, k int)
- func ApplyTopP(logits []float32, p float32)
- func Argmax(x []float32) int
- func BatchNormInference(out, x, gamma, beta, runMean, runVar []float32, channels int, eps float32)
- func Clamp(x []float32, minVal, maxVal float32)
- func Clear(s []float32)
- func Concat(a, b []float32) []float32
- func CopySlice(dst, src []float32)
- func DotProduct(a, b []float32, n int) float32
- func ELU(x []float32, alpha float32)
- func FastExp(x float32) float32
- func FastTanh(x float32) float32
- func GELU(x []float32)
- func GLU(out, input []float32, dim int)
- func GeGLU(out, gate, up []float32, n int)
- func GroupNorm(out, x, w, b []float32, numGroups, groupSize int, eps float32)
- func HardSigmoid(x []float32)
- func HardSwish(x []float32)
- func LayerNorm(out, x, w, b []float32, eps float32)
- func LeakyReLU(x []float32, alpha float32)
- func LogSoftmax(x []float32)
- func MatVecMul(out, W, x []float32, outDim, inDim int)
- func Max(out, a, b []float32)
- func Min(out, a, b []float32)
- func Mish(x []float32)
- func Mul(out, a, b []float32)
- func Neg(out, a []float32)
- func Pow(out, a []float32, p float32)
- func RMSNorm(out, x, w []float32, eps float32)
- func RMSNormInPlace(x, w []float32, eps float32)
- func ReLU(x []float32)
- func Reciprocal(out, a []float32)
- func ReduceMax(a []float32) float32
- func ReduceMean(a []float32) float32
- func ReduceMin(a []float32) float32
- func ReduceSum(a []float32) float32
- func ResidualLayerNorm(outNorm, residual, prev, sub, w, b []float32, scale, eps float32)
- func RoPEFrequencyTable(maxLen, headDim int, freqBase float32) (cosTable, sinTable []float32)
- func SampleToken(logits []float32, cfg SamplerConfig, recentTokens []int32, rng *rand.Rand) int
- func ScalarAdd(out, a []float32, s float32)
- func ScalarMul(out, a []float32, s float32)
- func Scale(x []float32, s float32)
- func SiLU(x []float32)
- func Sigmoid(x float32) float32
- func SigmoidExact(x []float32)
- func SigmoidFast(x float32) float32
- func Softmax(x []float32)
- func Sqrt(out, a []float32)
- func Sub(out, a, b []float32)
- func SwiGLU(out, gate, up []float32, n int)
- func SwiGLU_OAI(out, gate, up []float32, n int, alpha, limit float32)
- func TanhExact(x []float32)
- func TopKIndices(vals []float32, k int) []int
- func Where(out []float32, cond []bool, trueVal, falseVal []float32)
- func YaRNRoPEFrequencyTable(maxLen, headDim int, freqBase float32, factor float32, origMaxPos int, ...) (cosTable, sinTable []float32)
- type SamplerConfig
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func AddBias ¶
func AddBias(dst, bias []float32)
AddBias adds a bias vector to dst in-place: dst[i] += bias[i].
func ApplyFrequencyPresencePenalty ¶
func ApplyFrequencyPresencePenalty(logits []float32, recentTokens []int32, freqPenalty, presencePenalty float32)
ApplyFrequencyPresencePenalty applies frequency and presence penalties. The frequency penalty subtracts freqPenalty * count for each occurrence of a token; the presence penalty subtracts presencePenalty once for any token that appeared at all.
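A reference sketch of that rule, counting over all of recentTokens (a hypothetical helper, not necessarily the package's exact code):

func applyFreqPresencePenalty(logits []float32, recent []int32, freqPenalty, presencePenalty float32) {
	counts := make(map[int32]int, len(recent))
	for _, t := range recent {
		counts[t]++
	}
	for t, c := range counts {
		// one freqPenalty per occurrence, one presencePenalty per distinct token
		logits[t] -= freqPenalty*float32(c) + presencePenalty
	}
}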
func ApplyNgramBlocking ¶
func ApplyNgramBlocking(logits []float32, recentTokens []int32, ngramSize int)
ApplyNgramBlocking prevents generation of any n-gram that already appeared in recentTokens. Sets the logit of the token that would complete a repeated n-gram to -inf.
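A sketch of one way to implement that blocking, assuming the last ngramSize-1 tokens form the prefix to match (hypothetical helper; import "math"):

func blockRepeatedNgrams(logits []float32, recent []int32, n int) {
	if n <= 1 || len(recent) < n {
		return
	}
	prefix := recent[len(recent)-(n-1):] // the n-1 tokens just generated
	for i := 0; i+n <= len(recent); i++ {
		match := true
		for j := 0; j < n-1; j++ {
			if recent[i+j] != prefix[j] {
				match = false
				break
			}
		}
		if match {
			// recent[i+n-1] would complete an n-gram already seen at position i
			logits[recent[i+n-1]] = float32(math.Inf(-1))
		}
	}
}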
func ApplyRepetitionPenalty ¶
func ApplyRepetitionPenalty(logits []float32, recentTokens []int32, penalty float32)
ApplyRepetitionPenalty penalizes the logits of tokens that appear in recentTokens, discouraging the model from repeating itself.
func ApplyRoPE ¶
func ApplyRoPE(vec []float32, pos int, headDim int, freqBase float32, neox bool)
ApplyRoPE applies Rotary Positional Embedding to a single attention head vector.
vec: [headDim] — one Q or K head vector (modified in-place)
pos: absolute position of this token
headDim: dimension per attention head
freqBase: RoPE frequency base (10000.0 for LLaMA, 1000000.0 for some Qwen)
neox: if true, use split-half pairing (GPT-NeoX/Qwen style);
if false, use interleaved pairing (LLaMA style)
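For example, rotating one token's Q heads in-place (numHeads and headDim are illustrative; assumes the ops import alias):

const numHeads, headDim = 8, 64
q := make([]float32, numHeads*headDim) // one token's Q heads, filled by the Q projection
pos := 42
for h := 0; h < numHeads; h++ {
	ops.ApplyRoPE(q[h*headDim:(h+1)*headDim], pos, headDim, 10000.0, false)
}

ApplyRoPEBatch (below) performs this loop over all Q and K heads in one call.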
func ApplyRoPEBatch ¶
func ApplyRoPEBatch(qFlat []float32, numHeads int, kFlat []float32, numKVHeads int, pos, headDim int, freqBase float32, neox bool)
ApplyRoPEBatch applies RoPE to all Q and K heads at a given position.
qFlat: [numHeads * headDim] — all Q heads concatenated
kFlat: [numKVHeads * headDim] — all K heads concatenated
func ApplyRoPEFromTable ¶
func ApplyRoPEFromTable(vec []float32, pos int, headDim int, cosTable, sinTable []float32, neox bool)
ApplyRoPEFromTable applies RoPE using precomputed cos/sin tables. Faster than ApplyRoPE for repeated calls at different positions.
func ApplyTemperature ¶
func ApplyTemperature(logits []float32, temp float32)
ApplyTemperature divides each logit by temp. Values below 1 sharpen the distribution; values above 1 flatten it.
func BatchNormInference ¶
func BatchNormInference(out, x, gamma, beta, runMean, runVar []float32, channels int, eps float32)
BatchNormInference applies Batch Normalization in inference mode. Uses precomputed running mean and variance (no batch statistics).
x, out: [channels] (or [channels * spatial] if applied per-channel)
gamma, beta: [channels] (learned scale/shift)
runMean, runVar: [channels] (running statistics from training)
func CopySlice ¶
func CopySlice(dst, src []float32)
CopySlice copies src into dst, copying min(len(dst), len(src)) elements.
func DotProduct ¶
func DotProduct(a, b []float32, n int) float32
DotProduct computes the dot product of the first n elements of a and b, using 8x loop unrolling.
func ELU ¶
func ELU(x []float32, alpha float32)
ELU applies Exponential Linear Unit in-place.
x[i] = x[i]                  if x[i] >= 0
x[i] = alpha*(exp(x[i])-1)   if x[i] < 0
func FastExp ¶
func FastExp(x float32) float32
FastExp computes a fast approximation of exp(x) in float32 using Schraudolph's algorithm. ~0.3% max error, much faster than math.Exp(float64(x)).
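A quick way to eyeball the approximation against math.Exp (illustrative only; imports "fmt" and "math", assumes the ops import alias):

for _, x := range []float32{-4, -1, 0, 1, 4} {
	fast := ops.FastExp(x)
	exact := float32(math.Exp(float64(x)))
	fmt.Printf("x=%+.0f  fast=%.6f  exact=%.6f\n", x, fast, exact)
}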
func GELU ¶
func GELU(x []float32)
GELU applies Gaussian Error Linear Unit activation in-place.
gelu(x) = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
func GLU ¶
func GLU(out, input []float32, dim int)
GLU applies a Gated Linear Unit. input has length 2*dim and is split into two halves; the result is first_half * sigmoid(second_half). out must have length >= dim.
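A reference sketch of that computation (hypothetical helper; import "math"):

func glu(out, input []float32, dim int) {
	a, b := input[:dim], input[dim:2*dim]
	for i := 0; i < dim; i++ {
		out[i] = a[i] / (1 + float32(math.Exp(float64(-b[i])))) // a[i] * sigmoid(b[i])
	}
}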
func GroupNorm ¶
func GroupNorm(out, x, w, b []float32, numGroups, groupSize int, eps float32)
GroupNorm applies Group Normalization (used in diffusion models, U-Nets).
x: [numGroups * groupSize] flat
w, b: [numGroups * groupSize] (affine parameters)
out: [numGroups * groupSize]
Each group of channels is independently normalized.
func HardSigmoid ¶
func HardSigmoid(x []float32)
HardSigmoid applies hard sigmoid in-place: clamp((x + 3) / 6, 0, 1). Fast approximation used in MobileNetV3.
func HardSwish ¶
func HardSwish(x []float32)
HardSwish applies hard swish in-place: x * hard_sigmoid(x). Used in MobileNetV3 and EfficientNet.
func LayerNorm ¶
func LayerNorm(out, x, w, b []float32, eps float32)
LayerNorm applies standard Layer Normalization.
out[i] = w[i] * (x[i] - mean) / sqrt(var + eps) + b[i]
func LeakyReLU ¶
func LeakyReLU(x []float32, alpha float32)
LeakyReLU applies Leaky ReLU in-place: x[i] = max(alpha*x[i], x[i]). Commonly used in GANs and some discriminator networks.
func LogSoftmax ¶
func LogSoftmax(x []float32)
LogSoftmax computes numerically stable log-softmax in-place.
func MatVecMul ¶
func MatVecMul(out, W, x []float32, outDim, inDim int)
MatVecMul performs float32 matrix-vector multiply: out[r] = dot(W[r,:], x). W is stored row-major as [outDim * inDim].
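The reference computation, without the package's optimizations (hypothetical helper for illustration):

func matVecMul(out, W, x []float32, outDim, inDim int) {
	for r := 0; r < outDim; r++ {
		row := W[r*inDim : (r+1)*inDim] // row r of the row-major matrix
		var sum float32
		for c := 0; c < inDim; c++ {
			sum += row[c] * x[c]
		}
		out[r] = sum
	}
}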
func Max ¶
func Max(out, a, b []float32)
Max performs element-wise maximum: out[i] = max(a[i], b[i]).
func Min ¶
func Min(out, a, b []float32)
Min performs element-wise minimum: out[i] = min(a[i], b[i]).
func Mish ¶
func Mish(x []float32)
Mish applies Mish activation in-place: x * tanh(softplus(x)). Used in YOLOv4 and some modern architectures.
func Mul ¶
func Mul(out, a, b []float32)
Mul performs element-wise multiplication: out[i] = a[i] * b[i].
func RMSNorm ¶
func RMSNorm(out, x, w []float32, eps float32)
RMSNorm applies Root Mean Square Layer Normalization.
out[i] = w[i] * x[i] / sqrt(mean(x²) + eps)
Used by LLaMA, Qwen, Gemma, Mistral, and most modern LLMs instead of LayerNorm. Simpler and faster than LayerNorm: no mean subtraction, no bias term.
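A reference sketch of the formula above (the float64 accumulation is an assumption, not necessarily what the package does; import "math"):

func rmsNorm(out, x, w []float32, eps float32) {
	var ss float64
	for _, v := range x {
		ss += float64(v) * float64(v)
	}
	inv := float32(1 / math.Sqrt(ss/float64(len(x))+float64(eps)))
	for i := range x {
		out[i] = w[i] * x[i] * inv
	}
}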
func RMSNormInPlace ¶
func RMSNormInPlace(x, w []float32, eps float32)
RMSNormInPlace applies RMS normalization in-place (modifies x). Used for QK-norm in models like Qwen3.
func Reciprocal ¶
func Reciprocal(out, a []float32)
Reciprocal computes element-wise reciprocal: out[i] = 1 / a[i].
func ResidualLayerNorm ¶
func ResidualLayerNorm(outNorm, residual, prev, sub, w, b []float32, scale, eps float32)
ResidualLayerNorm fuses residual addition and LayerNorm:
residual[i] = prev[i] + scale*sub[i]
outNorm = LayerNorm(residual)
func RoPEFrequencyTable ¶
func RoPEFrequencyTable(maxLen, headDim int, freqBase float32) (cosTable, sinTable []float32)
RoPEFrequencyTable precomputes the cos/sin tables for RoPE. Returns (cosTable, sinTable) each of shape [maxLen * headDim/2]. Use with ApplyRoPEFromTable for faster inference when processing many positions.
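Typical usage, precomputing once and applying per position (sizes are illustrative; assumes the ops import alias):

const headDim, seqLen = 64, 128
keys := make([]float32, seqLen*headDim) // one K head per position, filled elsewhere
cosT, sinT := ops.RoPEFrequencyTable(seqLen, headDim, 10000.0)
for pos := 0; pos < seqLen; pos++ {
	ops.ApplyRoPEFromTable(keys[pos*headDim:(pos+1)*headDim], pos, headDim, cosT, sinT, false)
}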
func SampleToken ¶
func SampleToken(logits []float32, cfg SamplerConfig, recentTokens []int32, rng *rand.Rand) int
SampleToken selects a token index from logits according to cfg, applying the configured penalties against recentTokens and drawing randomness from rng.
func SiLU ¶
func SiLU(x []float32)
SiLU applies SiLU (Swish) activation in-place: x[i] = x[i] * sigmoid(x[i]).
func SigmoidExact ¶
func SigmoidExact(x []float32)
SigmoidExact applies exact sigmoid via math.Exp (for when precision matters more than speed).
func SigmoidFast ¶
func SigmoidFast(x float32) float32
SigmoidFast computes 1 / (1 + exp(-x)) using Schraudolph approximation.
func Softmax ¶
func Softmax(x []float32)
Softmax computes softmax in-place with numerical stability. Uses standard math.Exp matching llama.cpp's ggml_vec_soft_max_f32. Sum is accumulated in float64 for precision.
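A sketch of the stable algorithm described above (max subtraction, math.Exp, float64 sum; hypothetical helper, import "math"):

func softmax(x []float32) {
	m := x[0]
	for _, v := range x[1:] {
		if v > m {
			m = v
		}
	}
	var sum float64 // float64 accumulation, per the note above
	for i, v := range x {
		e := float32(math.Exp(float64(v - m))) // subtract max for stability
		x[i] = e
		sum += float64(e)
	}
	inv := float32(1 / sum)
	for i := range x {
		x[i] *= inv
	}
}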
func Sqrt ¶
func Sqrt(out, a []float32)
Sqrt computes element-wise square root: out[i] = sqrt(a[i]).
func Sub ¶
func Sub(out, a, b []float32)
Sub performs element-wise subtraction: out[i] = a[i] - b[i].
func SwiGLU ¶
func SwiGLU(out, gate, up []float32, n int)
SwiGLU computes the SwiGLU gated activation used by LLaMA, Qwen, Mistral.
out[i] = SiLU(gate[i]) * up[i]
gate and up are two parallel FFN projections; SwiGLU fuses the activation.
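A reference loop for the formula above (hypothetical helper; import "math"):

func swiGLU(out, gate, up []float32, n int) {
	for i := 0; i < n; i++ {
		g := gate[i]
		out[i] = g / (1 + float32(math.Exp(float64(-g)))) * up[i] // SiLU(g) * up[i]
	}
}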
func SwiGLU_OAI ¶
func SwiGLU_OAI(out, gate, up []float32, n int, alpha, limit float32)
SwiGLU_OAI computes the OpenAI MoE variant of SwiGLU (gpt-oss). Matches llama.cpp ggml_compute_forward_swiglu_oai_f32 exactly:
x = min(gate, limit)
y = clamp(up, -limit, limit)
out_glu = x / (1 + exp(alpha * (-x)))
out = out_glu * (y + 1)
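A direct transcription of those four steps (hypothetical helper; import "math"):

func swiGLUOAI(out, gate, up []float32, n int, alpha, limit float32) {
	for i := 0; i < n; i++ {
		x := gate[i]
		if x > limit {
			x = limit // x = min(gate, limit)
		}
		y := up[i]
		if y > limit {
			y = limit
		} else if y < -limit {
			y = -limit // y = clamp(up, -limit, limit)
		}
		glu := x / (1 + float32(math.Exp(float64(-alpha*x)))) // x * sigmoid(alpha*x)
		out[i] = glu * (y + 1)
	}
}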
func TanhExact ¶
func TanhExact(x []float32)
TanhExact applies exact tanh via math.Tanh (for when precision matters more than speed).
func TopKIndices ¶
func TopKIndices(vals []float32, k int) []int
TopKIndices returns the indices of the k largest values. Uses simple selection for small k; for production use with large k, a heap-based variant is preferred.
func Where ¶
func Where(out []float32, cond []bool, trueVal, falseVal []float32)
Where performs element-wise conditional selection:
out[i] = trueVal[i]  if cond[i]
out[i] = falseVal[i] if !cond[i]
func YaRNRoPEFrequencyTable ¶
func YaRNRoPEFrequencyTable(maxLen, headDim int, freqBase float32, factor float32, origMaxPos int, betaFast, betaSlow, extFactor, attnFactor float32) (cosTable, sinTable []float32)
YaRNRoPEFrequencyTable precomputes cos/sin tables with YaRN scaling. Matches llama.cpp rope_yarn exactly:
- extFactor controls the ramp blend (0=pure interpolation, 1=full YaRN blend)
- attnFactor is the base magnitude scaling (multiplied with log correction)
- Dimensions below corrDims[0] keep their original frequencies, dimensions above corrDims[1] are fully interpolated, and dimensions in between blend smoothly via ramp * extFactor.
Types ¶
type SamplerConfig ¶
type SamplerConfig struct {
	Temperature       float32
	TopK              int
	TopP              float32
	MinP              float32
	RepetitionPenalty float32
	FrequencyPenalty  float32
	PresencePenalty   float32
	NoRepeatNgramSize int
}
func DefaultSamplerConfig ¶
func DefaultSamplerConfig() SamplerConfig
DefaultSamplerConfig returns a SamplerConfig populated with the package's default sampling parameters.
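A sampling-loop sketch tying the pieces together (nextLogits is a hypothetical model step, and the overrides are illustrative; imports "math/rand", assumes the ops import alias):

cfg := ops.DefaultSamplerConfig()
cfg.Temperature = 0.8 // illustrative overrides
cfg.TopP = 0.9

rng := rand.New(rand.NewSource(1))
recent := []int32{}
for step := 0; step < 32; step++ {
	logits := nextLogits() // hypothetical: one forward pass yielding vocab-sized logits
	tok := ops.SampleToken(logits, cfg, recent, rng)
	recent = append(recent, int32(tok))
}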