Documentation ¶
Overview ¶
Package optimizer provides optimization algorithms for neural networks, including AdamW and SGD.
Stability: beta
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type AdamW ¶
AdamW implements the AdamW optimizer.
func NewAdamW ¶
func NewAdamW[T tensor.Numeric](engine compute.Engine[T], learningRate, beta1, beta2, epsilon, weightDecay T) *AdamW[T]
NewAdamW creates a new AdamW optimizer.
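The update rule AdamW applies can be sketched in plain Go. This is a hedged illustration of the standard decoupled-weight-decay algorithm, not this package's engine-based implementation: the helper name `adamWStep` is hypothetical, and `float64` slices stand in for the package's generic `T` tensors.

```go
package main

import (
	"fmt"
	"math"
)

// adamWStep applies one AdamW update in place. The weight-decay term is
// decoupled: it multiplies the parameter directly instead of being added
// to the gradient. t is the 1-based step count used for bias correction.
func adamWStep(params, grads, m, v []float64, lr, beta1, beta2, eps, wd float64, t int) {
	for i := range params {
		m[i] = beta1*m[i] + (1-beta1)*grads[i]         // first moment
		v[i] = beta2*v[i] + (1-beta2)*grads[i]*grads[i] // second moment
		mHat := m[i] / (1 - math.Pow(beta1, float64(t)))
		vHat := v[i] / (1 - math.Pow(beta2, float64(t)))
		params[i] -= lr * (mHat/(math.Sqrt(vHat)+eps) + wd*params[i])
	}
}

func main() {
	params := []float64{1.0}
	grads := []float64{0.5}
	m := make([]float64, 1)
	v := make([]float64, 1)
	adamWStep(params, grads, m, v, 0.01, 0.9, 0.999, 1e-8, 0.01, 1)
	fmt.Printf("%.6f\n", params[0]) // 0.989900
}
```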
func (*AdamW[T]) SetLR ¶ added in v1.8.0
func (a *AdamW[T]) SetLR(lr T)
SetLR sets the learning rate. This is typically called by a scheduler.
func (*AdamW[T]) SetMaxGradNorm ¶ added in v1.11.0
SetMaxGradNorm sets the maximum gradient norm for gradient clipping. If maxGradNorm <= 0, gradient clipping is disabled.
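Global-norm gradient clipping of the kind SetMaxGradNorm enables can be sketched as follows. The helper `clipGradNorm` is hypothetical and operates on a flat `float64` slice; the package applies the equivalent operation across parameter tensors.

```go
package main

import (
	"fmt"
	"math"
)

// clipGradNorm scales grads in place so their L2 norm does not exceed
// maxNorm. A maxNorm <= 0 disables clipping, matching the documented
// SetMaxGradNorm contract.
func clipGradNorm(grads []float64, maxNorm float64) {
	if maxNorm <= 0 {
		return
	}
	var sumSq float64
	for _, g := range grads {
		sumSq += g * g
	}
	norm := math.Sqrt(sumSq)
	if norm > maxNorm {
		scale := maxNorm / norm
		for i := range grads {
			grads[i] *= scale
		}
	}
}

func main() {
	g := []float64{3, 4} // L2 norm 5
	clipGradNorm(g, 1.0)
	fmt.Printf("%.1f %.1f\n", g[0], g[1]) // 0.6 0.8
}
```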
type AdamW8bit ¶ added in v1.5.0
AdamW8bit implements the AdamW optimizer with block-wise INT8 quantization for first and second moment estimates. Parameters remain in full precision. This reduces optimizer state memory by ~4x compared to FP32 AdamW.
func NewAdamW8bit ¶ added in v1.5.0
NewAdamW8bit creates a new 8-bit AdamW optimizer.
type EMA ¶ added in v0.2.1
EMA wraps an Optimizer with Exponential Moving Average weight averaging. After each inner optimizer step, it updates shadow weights:
shadow = decay * shadow + (1-decay) * param.Value
Call SwapShadow before validation to use averaged weights, then SwapBack to restore training weights.
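The shadow-weight update above can be sketched directly from the documented formula. The helper name `updateShadow` is hypothetical; the real wrapper updates shadow copies of each `graph.Parameter` after the inner optimizer's Step.

```go
package main

import "fmt"

// updateShadow applies the documented EMA rule per element:
// shadow = decay*shadow + (1-decay)*param
func updateShadow(shadow, params []float64, decay float64) {
	for i := range shadow {
		shadow[i] = decay*shadow[i] + (1-decay)*params[i]
	}
}

func main() {
	shadow := []float64{0}
	params := []float64{1}
	updateShadow(shadow, params, 0.9)
	fmt.Printf("%.2f\n", shadow[0]) // 0.10 after one step
}
```

With a decay near 1 (e.g. 0.999), the shadow weights move slowly toward the training weights, smoothing out step-to-step noise before validation.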
func (*EMA[T]) Step ¶ added in v0.2.1
Step runs the inner optimizer step and then updates shadow weights.
type Int8State ¶ added in v1.5.0
type Int8State struct {
// contains filtered or unexported fields
}
Int8State holds a block-wise INT8-quantized representation of a float32 slice. Each block of blockSize elements shares a single scale factor, reducing memory from 4 bytes/element to ~1 byte/element (+ negligible scale overhead).
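The block-wise scheme described above can be sketched with one shared absmax scale per block. The helpers `quantizeBlock` and `dequantizeBlock` are hypothetical illustrations, not the package's unexported implementation.

```go
package main

import (
	"fmt"
	"math"
)

// quantizeBlock maps one block of float32 values to int8 using a single
// shared scale (absmax / 127), so each element costs ~1 byte instead of 4.
func quantizeBlock(x []float32) ([]int8, float32) {
	var absMax float32
	for _, v := range x {
		if a := float32(math.Abs(float64(v))); a > absMax {
			absMax = a
		}
	}
	if absMax == 0 {
		return make([]int8, len(x)), 0
	}
	scale := absMax / 127
	q := make([]int8, len(x))
	for i, v := range x {
		q[i] = int8(math.Round(float64(v / scale)))
	}
	return q, scale
}

// dequantizeBlock approximately reverses quantizeBlock.
func dequantizeBlock(q []int8, scale float32) []float32 {
	out := make([]float32, len(q))
	for i, v := range q {
		out[i] = float32(v) * scale
	}
	return out
}

func main() {
	x := []float32{2, -2, 1}
	q, scale := quantizeBlock(x)
	fmt.Println(q) // [127 -127 64]
	fmt.Println(dequantizeBlock(q, scale))
}
```

The round trip is lossy (here 1 comes back as roughly 1.008), which is acceptable for moment estimates but not for the parameters themselves; hence the doc's note that parameters remain in full precision.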
type Optimizer ¶
type Optimizer[T tensor.Numeric] interface {
	Step(ctx context.Context, params []*graph.Parameter[T]) error
}
Optimizer defines the interface for optimization algorithms.
type SGD ¶
SGD implements the stochastic gradient descent optimizer.
func NewSGD ¶
func NewSGD[T tensor.Numeric](engine compute.Engine[T], ops numeric.Arithmetic[T], learningRate float32) *SGD[T]
NewSGD creates a new SGD optimizer.
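The plain SGD update can be sketched in a few lines. The helper `sgdStep` is hypothetical and ignores the engine/arithmetic plumbing that the real optimizer uses.

```go
package main

import "fmt"

// sgdStep applies the basic SGD rule per element: param -= lr * grad.
func sgdStep(params, grads []float64, lr float64) {
	for i := range params {
		params[i] -= lr * grads[i]
	}
}

func main() {
	p := []float64{1.0}
	sgdStep(p, []float64{0.5}, 0.1)
	fmt.Printf("%.4f\n", p[0]) // 0.9500
}
```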
type SWA ¶ added in v0.2.1
SWA wraps an Optimizer with Stochastic Weight Averaging. Unlike EMA which averages every step, SWA averages at epoch boundaries. Call UpdateAverage at the end of each epoch (after startEpoch). Call SwapWeights before validation to use averaged weights.
func (*SWA[T]) NAveraged ¶ added in v0.2.1
NAveraged returns the number of checkpoints averaged so far.
func (*SWA[T]) SwapWeights ¶ added in v0.2.1
SwapWeights swaps live params with averaged params.
func (*SWA[T]) UpdateAverage ¶ added in v0.2.1
UpdateAverage updates the running average of parameters. Should be called at the end of each epoch. Only averages when epoch >= startEpoch. Formula: avg = avg + (param - avg) / (n + 1)
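The documented formula is an incremental mean over the checkpoints collected so far, which can be sketched as follows. The helper name `swaUpdate` is hypothetical; `n` plays the role of the count reported by NAveraged.

```go
package main

import "fmt"

// swaUpdate folds the current parameters into the running average using
// the documented formula: avg = avg + (param - avg) / (n + 1), where n
// checkpoints have been averaged so far.
func swaUpdate(avg, params []float64, n int) {
	for i := range avg {
		avg[i] += (params[i] - avg[i]) / float64(n+1)
	}
}

func main() {
	avg := []float64{0}
	swaUpdate(avg, []float64{2}, 0) // first checkpoint: avg = 2
	swaUpdate(avg, []float64{4}, 1) // second: avg = mean(2, 4)
	fmt.Println(avg[0]) // 3
}
```

Because each call weights the new checkpoint by 1/(n+1), the result after k calls is the arithmetic mean of the k checkpoints, regardless of order.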