ops

package
v0.0.0-...-6093f96
Published: Apr 24, 2026 License: Apache-2.0 Imports: 4 Imported by: 0

Documentation

Overview

Package ops provides optimized mathematical operations for neural network inference.

All functions operate on flat float32 slices. Functions with "InPlace" in the name modify their input directly. Functions requiring output buffers take them as the first parameter (dst convention).
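The two calling conventions can be sketched with toy stand-ins (addDst and scaleInPlace below are hypothetical illustrations of the pattern, not functions from this package):

```go
package main

import "fmt"

// addDst writes a[i]+b[i] into out, mirroring the package's
// "output buffer first" (dst) convention.
func addDst(out, a, b []float32) {
	for i := range a {
		out[i] = a[i] + b[i]
	}
}

// scaleInPlace mirrors the mutating "InPlace" style: it modifies x directly.
func scaleInPlace(x []float32, s float32) {
	for i := range x {
		x[i] *= s
	}
}

func main() {
	a := []float32{1, 2, 3}
	b := []float32{10, 20, 30}
	out := make([]float32, len(a)) // caller owns and preallocates the buffer
	addDst(out, a, b)
	scaleInPlace(out, 0.5)
	fmt.Println(out) // [5.5 11 16.5]
}
```

Preallocating the dst buffer once and reusing it across calls avoids per-step allocations in an inference loop.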

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Abs

func Abs(out, a []float32)

Abs computes element-wise absolute value: out[i] = |a[i]|.

func Add

func Add(out, a, b []float32)

Add performs element-wise addition: out[i] = a[i] + b[i].

func AddBias

func AddBias(dst, bias []float32)

AddBias adds a bias vector to dst in-place: dst[i] += bias[i].

func AddScaled

func AddScaled(dst []float32, scale float32, src []float32, n int)

AddScaled computes dst[i] += scale * src[i] with 8x unrolling.

func ApplyFrequencyPresencePenalty

func ApplyFrequencyPresencePenalty(logits []float32, recentTokens []int32, freqPenalty, presencePenalty float32)

ApplyFrequencyPresencePenalty applies frequency and presence penalties. Frequency penalty subtracts freq_penalty * count for each token. Presence penalty subtracts presence_penalty once for any token that appeared.
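The documented behavior can be sketched as follows (applyPenalties is an illustrative reference, not this package's code):

```go
package main

import "fmt"

// applyPenalties sketches the documented scheme: frequency penalty
// subtracts freqPenalty * count per occurrence of a token; presence
// penalty subtracts presencePenalty once for any token that appeared.
func applyPenalties(logits []float32, recent []int32, freqPenalty, presencePenalty float32) {
	counts := make(map[int32]int)
	for _, t := range recent {
		counts[t]++
	}
	for tok, n := range counts {
		logits[int(tok)] -= freqPenalty * float32(n)
		logits[int(tok)] -= presencePenalty
	}
}

func main() {
	logits := []float32{1, 1, 1}
	applyPenalties(logits, []int32{2, 2, 0}, 0.5, 0.1)
	// token 2 appeared twice: 1 - 0.5*2 - 0.1; token 1 is untouched
	fmt.Println(logits)
}
```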

func ApplyMinP

func ApplyMinP(logits []float32, minP float32)

func ApplyNgramBlocking

func ApplyNgramBlocking(logits []float32, recentTokens []int32, ngramSize int)

ApplyNgramBlocking prevents generation of any n-gram that already appeared in recentTokens. Sets the logit of the token that would complete a repeated n-gram to -inf.

func ApplyRepetitionPenalty

func ApplyRepetitionPenalty(logits []float32, recentTokens []int32, penalty float32)

func ApplyRoPE

func ApplyRoPE(vec []float32, pos int, headDim int, freqBase float32, neox bool)

ApplyRoPE applies Rotary Positional Embedding to a single attention head vector.

vec:      [headDim] — one Q or K head vector (modified in-place)
pos:      absolute position of this token
headDim:  dimension per attention head
freqBase: RoPE frequency base (10000.0 for LLaMA, 1000000.0 for some Qwen)
neox:     if true, use split-half pairing (GPT-NeoX/Qwen style);
          if false, use interleaved pairing (LLaMA style)
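The two pairing conventions can be sketched as below. This is an illustrative reference assuming the standard RoPE rotation (angle theta = pos * freqBase^(-2i/headDim) per pair), not this package's exact implementation:

```go
package main

import (
	"fmt"
	"math"
)

// ropeSketch rotates each pair (x0, x1) by theta:
//   x0' = x0*cos(theta) - x1*sin(theta)
//   x1' = x0*sin(theta) + x1*cos(theta)
// neox=true pairs element i with i+headDim/2 (split halves);
// neox=false pairs 2i with 2i+1 (interleaved).
func ropeSketch(vec []float32, pos, headDim int, freqBase float32, neox bool) {
	half := headDim / 2
	for i := 0; i < half; i++ {
		theta := float64(pos) * math.Pow(float64(freqBase), -2*float64(i)/float64(headDim))
		c, s := float32(math.Cos(theta)), float32(math.Sin(theta))
		var j0, j1 int
		if neox {
			j0, j1 = i, i+half
		} else {
			j0, j1 = 2*i, 2*i+1
		}
		x0, x1 := vec[j0], vec[j1]
		vec[j0] = x0*c - x1*s
		vec[j1] = x0*s + x1*c
	}
}

func main() {
	v := []float32{1, 0, 1, 0}
	ropeSketch(v, 0, 4, 10000, true) // position 0: rotation is the identity
	fmt.Println(v)
}
```

Note that the rotation only reorders which elements are paired; either convention preserves the vector's norm.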

func ApplyRoPEBatch

func ApplyRoPEBatch(qFlat []float32, numHeads int, kFlat []float32, numKVHeads int,
	pos, headDim int, freqBase float32, neox bool)

ApplyRoPEBatch applies RoPE to all Q and K heads at a given position.

qFlat: [numHeads * headDim]   — all Q heads concatenated
kFlat: [numKVHeads * headDim] — all K heads concatenated

func ApplyRoPEFromTable

func ApplyRoPEFromTable(vec []float32, pos int, headDim int, cosTable, sinTable []float32, neox bool)

ApplyRoPEFromTable applies RoPE using precomputed cos/sin tables. Faster than ApplyRoPE for repeated calls at different positions.

func ApplyTemperature

func ApplyTemperature(logits []float32, temp float32)

func ApplyTopK

func ApplyTopK(logits []float32, k int)

func ApplyTopP

func ApplyTopP(logits []float32, p float32)

func Argmax

func Argmax(x []float32) int

Argmax returns the index of the maximum value. Returns -1 for empty slices.

func BatchNormInference

func BatchNormInference(out, x, gamma, beta, runMean, runVar []float32, channels int, eps float32)

BatchNormInference applies Batch Normalization in inference mode. Uses precomputed running mean and variance (no batch statistics).

x, out: [channels] (or [channels * spatial] if applied per-channel)
gamma, beta: [channels] (learned scale/shift)
runMean, runVar: [channels] (running statistics from training)

func Clamp

func Clamp(x []float32, minVal, maxVal float32)

Clamp clamps all values in x to the range [minVal, maxVal] in-place.

func Clear

func Clear(s []float32)

Clear zeros out a float32 slice.

func Concat

func Concat(a, b []float32) []float32

Concat concatenates two slices.

func CopySlice

func CopySlice(dst, src []float32)

CopySlice copies src into dst, copying min(len(dst), len(src)) elements.

func DotProduct

func DotProduct(a, b []float32, n int) float32

DotProduct computes the dot product of two float32 slices with 8x loop unrolling.

func ELU

func ELU(x []float32, alpha float32)

ELU applies Exponential Linear Unit in-place.

x[i] = x[i]                 if x[i] >= 0
x[i] = alpha*(exp(x[i])-1)  if x[i] < 0

func FastExp

func FastExp(x float32) float32

FastExp computes a fast approximation of exp(x) in float32 using Schraudolph's algorithm. ~0.3% max error, much faster than math.Exp(float64(x)).

func FastTanh

func FastTanh(x float32) float32

FastTanh computes a fast approximation of tanh(x) = (exp(2x) - 1) / (exp(2x) + 1).

func GELU

func GELU(x []float32)

GELU applies Gaussian Error Linear Unit activation in-place.

gelu(x) = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
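The tanh approximation above can be sketched for a single scalar (geluTanh is illustrative; the package's slice version may use a faster tanh):

```go
package main

import (
	"fmt"
	"math"
)

// geluTanh evaluates the documented formula:
// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715*x^3))).
func geluTanh(x float64) float64 {
	const c = 0.7978845608028654 // sqrt(2/pi)
	return 0.5 * x * (1 + math.Tanh(c*(x+0.044715*x*x*x)))
}

func main() {
	// GELU is ~0 for large negative x, 0 at 0, and ~x for large positive x.
	fmt.Printf("%.4f %.4f %.4f\n", geluTanh(-1), geluTanh(0), geluTanh(2))
}
```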

func GLU

func GLU(out, input []float32, dim int)

GLU applies a Gated Linear Unit. The input has length 2*dim; it is split into two halves, and out[i] = first_half[i] * sigmoid(second_half[i]). out must have length >= dim.

func GeGLU

func GeGLU(out, gate, up []float32, n int)

GeGLU computes the GELU-gated activation used by Gemma.

out[i] = GELU(gate[i]) * up[i]

func GroupNorm

func GroupNorm(out, x, w, b []float32, numGroups, groupSize int, eps float32)

GroupNorm applies Group Normalization (used in diffusion models, U-Nets).

x: [numGroups * groupSize] flat
w, b: [numGroups * groupSize] (affine parameters)
out: [numGroups * groupSize]

Each group of channels is independently normalized.

func HardSigmoid

func HardSigmoid(x []float32)

HardSigmoid applies hard sigmoid in-place: clamp((x + 3) / 6, 0, 1). Fast approximation used in MobileNetV3.

func HardSwish

func HardSwish(x []float32)

HardSwish applies hard swish in-place: x * hard_sigmoid(x). Used in MobileNetV3 and EfficientNet.

func LayerNorm

func LayerNorm(out, x, w, b []float32, eps float32)

LayerNorm applies standard Layer Normalization.

out[i] = w[i] * (x[i] - mean) / sqrt(var + eps) + b[i]
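A reference implementation of the formula above might look like this (layerNormRef is an illustrative sketch, not the package's code):

```go
package main

import (
	"fmt"
	"math"
)

// layerNormRef normalizes x to zero mean and unit variance, then applies
// the learned scale w and shift b, matching the documented formula.
func layerNormRef(out, x, w, b []float32, eps float32) {
	var mean float64
	for _, v := range x {
		mean += float64(v)
	}
	mean /= float64(len(x))
	var variance float64
	for _, v := range x {
		d := float64(v) - mean
		variance += d * d
	}
	variance /= float64(len(x))
	inv := 1 / math.Sqrt(variance+float64(eps))
	for i, v := range x {
		out[i] = w[i]*float32((float64(v)-mean)*inv) + b[i]
	}
}

func main() {
	x := []float32{1, 2, 3, 4}
	out := make([]float32, 4)
	layerNormRef(out, x, []float32{1, 1, 1, 1}, []float32{0, 0, 0, 0}, 1e-5)
	fmt.Printf("%.3f\n", out) // symmetric around 0 for this input
}
```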

func LeakyReLU

func LeakyReLU(x []float32, alpha float32)

LeakyReLU applies Leaky ReLU in-place: x[i] = max(alpha*x[i], x[i]). Commonly used in GANs and some discriminator networks.

func LogSoftmax

func LogSoftmax(x []float32)

LogSoftmax computes numerically stable log-softmax in-place.

func MatVecMul

func MatVecMul(out, W, x []float32, outDim, inDim int)

MatVecMul performs float32 matrix-vector multiply: out[r] = dot(W[r,:], x). W is stored row-major as [outDim * inDim].
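The row-major layout can be sketched as follows (matVecRef is an illustrative scalar version; the package's implementation is optimized):

```go
package main

import "fmt"

// matVecRef computes out[r] = dot(W[r,:], x) with W flat row-major:
// row r occupies W[r*inDim : (r+1)*inDim].
func matVecRef(out, W, x []float32, outDim, inDim int) {
	for r := 0; r < outDim; r++ {
		row := W[r*inDim : (r+1)*inDim]
		var sum float32
		for c := 0; c < inDim; c++ {
			sum += row[c] * x[c]
		}
		out[r] = sum
	}
}

func main() {
	W := []float32{1, 2, 3, 4} // 2x2 matrix: rows [1 2] and [3 4]
	x := []float32{1, 1}
	out := make([]float32, 2)
	matVecRef(out, W, x, 2, 2)
	fmt.Println(out) // [3 7]
}
```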

func Max

func Max(out, a, b []float32)

Max performs element-wise maximum: out[i] = max(a[i], b[i]).

func Min

func Min(out, a, b []float32)

Min performs element-wise minimum: out[i] = min(a[i], b[i]).

func Mish

func Mish(x []float32)

Mish applies Mish activation in-place: x * tanh(softplus(x)). Used in YOLOv4 and some modern architectures.

func Mul

func Mul(out, a, b []float32)

Mul performs element-wise multiplication: out[i] = a[i] * b[i].

func Neg

func Neg(out, a []float32)

Neg negates all elements: out[i] = -a[i].

func Pow

func Pow(out, a []float32, p float32)

Pow computes element-wise power: out[i] = a[i] ^ p.

func RMSNorm

func RMSNorm(out, x, w []float32, eps float32)

RMSNorm applies Root Mean Square Layer Normalization.

out[i] = w[i] * x[i] / sqrt(mean(x²) + eps)

Used by LLaMA, Qwen, Gemma, Mistral, and most modern LLMs instead of LayerNorm. Simpler and faster than LayerNorm: no mean subtraction, no bias term.
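The formula above can be sketched directly (rmsNormRef is an illustrative reference, not the package's code):

```go
package main

import (
	"fmt"
	"math"
)

// rmsNormRef scales x by 1/sqrt(mean(x^2)+eps), then by the weight w.
// No mean subtraction and no bias, unlike LayerNorm.
func rmsNormRef(out, x, w []float32, eps float32) {
	var sumSq float64
	for _, v := range x {
		sumSq += float64(v) * float64(v)
	}
	inv := 1 / math.Sqrt(sumSq/float64(len(x))+float64(eps))
	for i, v := range x {
		out[i] = w[i] * float32(float64(v)*inv)
	}
}

func main() {
	x := []float32{3, 4} // mean(x^2) = 12.5, rms ~= 3.5355
	out := make([]float32, 2)
	rmsNormRef(out, x, []float32{1, 1}, 0)
	fmt.Printf("%.4f\n", out)
}
```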

func RMSNormInPlace

func RMSNormInPlace(x, w []float32, eps float32)

RMSNormInPlace applies RMS normalization in-place (modifies x). Used for QK-norm in models like Qwen3.

func ReLU

func ReLU(x []float32)

ReLU applies ReLU activation in-place: x[i] = max(0, x[i]).

func Reciprocal

func Reciprocal(out, a []float32)

Reciprocal computes element-wise reciprocal: out[i] = 1 / a[i].

func ReduceMax

func ReduceMax(a []float32) float32

ReduceMax returns the maximum value.

func ReduceMean

func ReduceMean(a []float32) float32

ReduceMean returns the arithmetic mean.

func ReduceMin

func ReduceMin(a []float32) float32

ReduceMin returns the minimum value.

func ReduceSum

func ReduceSum(a []float32) float32

ReduceSum returns the sum of all elements.

func ResidualLayerNorm

func ResidualLayerNorm(outNorm, residual, prev, sub, w, b []float32, scale, eps float32)

ResidualLayerNorm fuses residual addition and LayerNorm:

residual[i] = prev[i] + scale*sub[i]
outNorm = LayerNorm(residual)

func RoPEFrequencyTable

func RoPEFrequencyTable(maxLen, headDim int, freqBase float32) (cosTable, sinTable []float32)

RoPEFrequencyTable precomputes the cos/sin tables for RoPE. Returns (cosTable, sinTable) each of shape [maxLen * headDim/2]. Use with ApplyRoPEFromTable for faster inference when processing many positions.

func SampleToken

func SampleToken(logits []float32, cfg SamplerConfig, recentTokens []int32, rng *rand.Rand) int

func ScalarAdd

func ScalarAdd(out, a []float32, s float32)

ScalarAdd adds a scalar to all elements: out[i] = a[i] + s.

func ScalarMul

func ScalarMul(out, a []float32, s float32)

ScalarMul multiplies all elements by a scalar: out[i] = a[i] * s.

func Scale

func Scale(x []float32, s float32)

Scale multiplies all elements in x by s in-place.

func SiLU

func SiLU(x []float32)

SiLU applies SiLU (Swish) activation in-place: x[i] = x[i] * sigmoid(x[i]).

func Sigmoid

func Sigmoid(x float32) float32

Sigmoid computes 1 / (1 + exp(-x)) using exact math.

func SigmoidExact

func SigmoidExact(x []float32)

SigmoidExact applies exact sigmoid via math.Exp (for when precision matters more than speed).

func SigmoidFast

func SigmoidFast(x float32) float32

SigmoidFast computes 1 / (1 + exp(-x)) using Schraudolph approximation.

func Softmax

func Softmax(x []float32)

Softmax computes softmax in-place with numerical stability. Uses standard math.Exp matching llama.cpp's ggml_vec_soft_max_f32. Sum is accumulated in float64 for precision.
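The documented scheme can be sketched as below (softmaxRef is illustrative; the package matches llama.cpp's kernel):

```go
package main

import (
	"fmt"
	"math"
)

// softmaxRef subtracts the max for numerical stability, exponentiates,
// and accumulates the sum in float64 before normalizing, as documented.
func softmaxRef(x []float32) {
	maxV := x[0]
	for _, v := range x[1:] {
		if v > maxV {
			maxV = v
		}
	}
	var sum float64
	for i, v := range x {
		e := float32(math.Exp(float64(v - maxV)))
		x[i] = e
		sum += float64(e)
	}
	for i := range x {
		x[i] = float32(float64(x[i]) / sum)
	}
}

func main() {
	x := []float32{1, 2, 3}
	softmaxRef(x)
	fmt.Printf("%.4f\n", x) // probabilities sum to 1, order preserved
}
```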

func Sqrt

func Sqrt(out, a []float32)

Sqrt computes element-wise square root: out[i] = sqrt(a[i]).

func Sub

func Sub(out, a, b []float32)

Sub performs element-wise subtraction: out[i] = a[i] - b[i].

func SwiGLU

func SwiGLU(out, gate, up []float32, n int)

SwiGLU computes the SwiGLU gated activation used by LLaMA, Qwen, Mistral.

out[i] = SiLU(gate[i]) * up[i]

gate and up are two parallel FFN projections; SwiGLU fuses the activation.
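The fused activation can be sketched as follows (swigluRef is an illustrative reference, not the package's unrolled code):

```go
package main

import (
	"fmt"
	"math"
)

// swigluRef computes out[i] = SiLU(gate[i]) * up[i], where
// SiLU(v) = v * sigmoid(v), rewritten here as v / (1 + e^-v).
func swigluRef(out, gate, up []float32, n int) {
	for i := 0; i < n; i++ {
		g := float64(gate[i])
		silu := g / (1 + math.Exp(-g))
		out[i] = float32(silu) * up[i]
	}
}

func main() {
	out := make([]float32, 2)
	swigluRef(out, []float32{0, 1}, []float32{5, 2}, 2)
	fmt.Printf("%.4f\n", out) // SiLU(0) = 0, so out[0] = 0
}
```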

func SwiGLU_OAI

func SwiGLU_OAI(out, gate, up []float32, n int, alpha, limit float32)

SwiGLU_OAI computes the OpenAI MoE variant of SwiGLU (gpt-oss). Matches llama.cpp ggml_compute_forward_swiglu_oai_f32 exactly:

x = min(gate, limit)
y = clamp(up, -limit, limit)
out_glu = x / (1 + exp(alpha * (-x)))
out = out_glu * (y + 1)

func TanhExact

func TanhExact(x []float32)

TanhExact applies exact tanh via math.Tanh (for when precision matters more than speed).

func TopKIndices

func TopKIndices(vals []float32, k int) []int

TopKIndices returns the indices of the k largest values. Uses simple selection for small k; for production use with large k, a heap-based variant is preferred.
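The simple-selection approach can be sketched as below (topKRef is an illustrative O(k*n) reference; a heap-based variant would be O(n log k)):

```go
package main

import "fmt"

// topKRef repeatedly picks the largest remaining value, returning
// indices in descending order of value.
func topKRef(vals []float32, k int) []int {
	taken := make([]bool, len(vals))
	idx := make([]int, 0, k)
	for len(idx) < k && len(idx) < len(vals) {
		best := -1
		for i, v := range vals {
			if !taken[i] && (best < 0 || v > vals[best]) {
				best = i
			}
		}
		taken[best] = true
		idx = append(idx, best)
	}
	return idx
}

func main() {
	fmt.Println(topKRef([]float32{0.1, 0.9, 0.5, 0.7}, 2)) // [1 3]
}
```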

func Where

func Where(out []float32, cond []bool, trueVal, falseVal []float32)

Where performs element-wise conditional selection:

out[i] = trueVal[i]  if cond[i]
out[i] = falseVal[i] if !cond[i]

func YaRNRoPEFrequencyTable

func YaRNRoPEFrequencyTable(maxLen, headDim int, freqBase float32,
	factor float32, origMaxPos int, betaFast, betaSlow, extFactor, attnFactor float32) (cosTable, sinTable []float32)

YaRNRoPEFrequencyTable precomputes cos/sin tables with YaRN scaling. Matches llama.cpp rope_yarn exactly:

  • extFactor controls the ramp blend (0=pure interpolation, 1=full YaRN blend)
  • attnFactor is the base magnitude scaling (multiplied with log correction)
  • Dimensions below corrDims[0] keep their original frequencies, dimensions above corrDims[1] are fully interpolated, and dimensions in between blend smoothly via ramp * extFactor.

Types

type SamplerConfig

type SamplerConfig struct {
	Temperature       float32
	TopK              int
	TopP              float32
	MinP              float32
	RepetitionPenalty float32
	FrequencyPenalty  float32
	PresencePenalty   float32
	NoRepeatNgramSize int
}

func DefaultSamplerConfig

func DefaultSamplerConfig() SamplerConfig
