optimizers

package
v0.1.0
Warning

This package is not in the latest version of its module.

Published: Apr 28, 2023 License: Apache-2.0 Imports: 9 Imported by: 0

Documentation

Overview

Package optimizers implements a collection of ML optimizers that can be used by train.Trainer, or by themselves. They all implement optimizers.Interface.

Index

Constants

const (
	// AdamDefaultLearningRate is used by Adam if no learning rate is set.
	AdamDefaultLearningRate = 0.001

	// AdamDefaultScope is the default scope name for moments and step used by Adam.
	AdamDefaultScope = "AdamOptimizer"
)
const GlobalStepVariableName = "global_step"

GlobalStepVariableName is the name under which the global step is stored in context.Context, usually in the root scope -- but that depends on the caller.

const LearningRateKey = "learning_rate"

LearningRateKey is the string key for learning rate in Context.Params.

const SgdDefaultLearningRate = 0.1

SgdDefaultLearningRate is the default learning rate used by the StochasticGradientDescent optimizer.

Variables

var (
	// KnownOptimizers is a map of known optimizers by name to their default constructors.
	// This provides an easy quick start point. One can hyperparameter-tune the optimizers
	// for usually slightly better results.
	KnownOptimizers = map[string]func() Interface{
		"sgd":    StochasticGradientDescent,
		"adam":   func() Interface { return Adam().Done() },
		"adamax": func() Interface { return Adam().Adamax().Done() },
		"adamw":  func() Interface { return Adam().WeightDecay(0.004).Done() },
	}
)
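KnownOptimizers can also be used directly to look up a constructor by name; a usage sketch (the error handling here is illustrative, the caller decides how to handle unknown names):

```

	constructorFn, found := optimizers.KnownOptimizers["adamw"]
	if !found {
		log.Fatalf("unknown optimizer")
	}
	opt := constructorFn() // An optimizers.Interface built with default hyperparameters.

```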

Functions

func IncrementGlobalStepGraph

func IncrementGlobalStepGraph(ctx *context.Context, g *Graph, dtype shapes.DType) *Node

IncrementGlobalStepGraph creates a global step counter (if one doesn't exist yet) and returns it incremented -- the first returned value will be 1.

It only builds the computation graph, no actual values are generated.

Typically, this is called by an optimizer's UpdateGraph method.

func LearningRateVar

func LearningRateVar(ctx *context.Context, dtype shapes.DType, defaultValue float64) *context.Variable

LearningRateVar returns the learning rate variable -- a scalar value of the given dtype.

If the variable doesn't exist yet, it will be created from the parameter LearningRateKey, if that is set, or from the provided defaultValue (which must be a scalar convertible to dtype) otherwise.

func LearningRateVarWithValue

func LearningRateVarWithValue(ctx *context.Context, dtype shapes.DType, value float64) *context.Variable

LearningRateVarWithValue creates (or reuses) variable for learning rate with the given value.

Types

type AdamConfig

type AdamConfig struct {
	// contains filtered or unexported fields
}

AdamConfig holds the configuration for an Adam optimizer. Create it using Adam(), and once configured, call Done to build an Adam-based optimizers.Interface.

func Adam

func Adam() *AdamConfig

Adam optimization is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments. According to [Kingma et al., 2014](http://arxiv.org/abs/1412.6980), the method is "*computationally efficient, has little memory requirement, invariant to diagonal rescaling of gradients, and is well suited for problems that are large in terms of data/parameters*".

It returns a configuration object that can be used to set its parameters. Once configured, call Done, and it will return an optimizers.Interface.

func (*AdamConfig) Adamax

func (c *AdamConfig) Adamax() *AdamConfig

Adamax configures Adam to use the L-infinity norm (== max, which gives the name) for the second moment, instead of L2, as described in the same Adam paper.

func (*AdamConfig) Betas

func (c *AdamConfig) Betas(beta1, beta2 float64) *AdamConfig

Betas sets the two moving averages constants (exponential decays). They default to 0.9 and 0.999.

func (*AdamConfig) Done

func (c *AdamConfig) Done() Interface

Done finishes the configuration and constructs an optimizers.Interface that implements Adam to specification.
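Putting the configuration methods together, a typical fluent construction looks like this (the values are illustrative only):

```

	opt := optimizers.Adam().
		LearningRate(0.0003).
		Betas(0.9, 0.999).
		Done()
	// opt can now be passed to train.NewTrainer, or used directly.

```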

func (*AdamConfig) Epsilon

func (c *AdamConfig) Epsilon(epsilon float64) *AdamConfig

Epsilon sets a small constant added to the denominator for numerical stability.

func (*AdamConfig) LearningRate

func (c *AdamConfig) LearningRate(value float64) *AdamConfig

LearningRate sets the base learning rate as a floating point value -- eventually converted to the same dtype as the loss.

Default is either the value of LearningRateKey ("learning_rate") global parameter in Context if defined, or 0.001 if not.

func (*AdamConfig) Scope

func (c *AdamConfig) Scope(name string) *AdamConfig

Scope defines the top-level scope to use to store the 1st and 2nd order moments of the gradients and the step number used by the Adam optimizer. Generally this doesn't need to be changed, but if one is using multiple schedules, potentially with different loss functions (so the moments should be different), one can change it.

It defaults to AdamDefaultScope.

func (*AdamConfig) WeightDecay

func (c *AdamConfig) WeightDecay(weightDecay float64) *AdamConfig

WeightDecay configures the optimizer to work as AdamW, with the given static weight decay. This exists because plain L2 regularization doesn't work well with Adam. TODO: (1) allow certain variables (e.g., biases) to be excluded from weight decay; (2) allow dynamically calculated weight decay.

type CosineAnnealingOptions

type CosineAnnealingOptions struct {
	// contains filtered or unexported fields
}

CosineAnnealingOptions is returned by CosineAnnealingSchedule to configure the cosine annealing schedule strategy. When finished configuring, call Done.

func CosineAnnealingSchedule

func CosineAnnealingSchedule(ctx *context.Context, graph *Graph, dtype shapes.DType) *CosineAnnealingOptions

CosineAnnealingSchedule allows one to set up a cosine annealing schedule for the learning rate. See details https://paperswithcode.com/method/cosine-annealing.

It returns a CosineAnnealingOptions that can be configured. When finished configuring, call Done, and it will generate the computation graph that updates the learning rate at every training step.

Example with only one cycle (assuming `*flagNumSteps` is the number of training steps):

```

	func modelGraph(ctx *context.Context, inputs []*Node) *Node {
		graph := inputs[0].Graph()
		if *flagUseCosineSchedule {
			optimizers.CosineAnnealingSchedule(ctx, graph, types.Float32).PeriodInSteps(*flagNumSteps).Done()
		}
		// ... rest of the model ...
	}

```

func (*CosineAnnealingOptions) Done

func (opt *CosineAnnealingOptions) Done()

Done finalizes the configuration of CosineAnnealingSchedule and generates the computation graph code to implement it.

If invalid options are given, an error is raised in the Graph.

func (*CosineAnnealingOptions) LearningRate

func (opt *CosineAnnealingOptions) LearningRate(learningRate float64) *CosineAnnealingOptions

LearningRate at the start of the cosine cycle. If not given, it will try to read from the context params (keyed by LearningRateKey). If neither are set, it will fail and return an error in the context and graph.

func (*CosineAnnealingOptions) MinLearningRate

func (opt *CosineAnnealingOptions) MinLearningRate(minLearningRate float64) *CosineAnnealingOptions

MinLearningRate at the end of the cosine cycle. Defaults to 10^-3 * initial learning rate.

func (*CosineAnnealingOptions) PeriodInSteps

func (opt *CosineAnnealingOptions) PeriodInSteps(periodSteps int) *CosineAnnealingOptions

PeriodInSteps sets the number of steps for one period of the cosine schedule. The effective learning rate decreases over the given period of training steps, and then is restarted at each new period.

It's common to use only one period (so no annealing, just a cosine schedule), in which case just set to the number of steps that will be used for training.

There is no default yet, this value must be given, or an error will be issued in the graph and context.

type Interface

type Interface interface {
	// UpdateGraph is the function called during computation graph building. It
	// calculates the updates to the variables (weights) of the model needed for
	// one training step and applies them at graph building time (see below).
	//
	// Variable values can be updated in graph building time (inside UpdateGraph) using Variable.SetValueGraph,
	// and the trainer (train.Trainer) will make sure these values are returned from the graph execution
	// and the materialized values used to update the variables (Variable.SetValue).
	//
	// ctx holds the variables to train (marked as trainable), the hyperparameters
	// used by the optimizer (in `ctx.Params`) and non-trainable variables
	// that the optimizer itself may create. One should scope it (context.Context.In("<some scope name>"))
	// to avoid naming conflicts on the variables created -- notice that
	// some complex training scheduling scheme may have more than one optimizer
	// on the same Context object.
	//
	// loss must be a scalar value.
	UpdateGraph(ctx *context.Context, graph *Graph, loss *Node)
}

Interface is implemented by all the optimizers in this package.

func MustOptimizerByName

func MustOptimizerByName(optName string) Interface

MustOptimizerByName returns an optimizer given its name, or calls log.Fatal if one does not exist. It looks the name up in KnownOptimizers -- use that map directly in case one wants to better handle invalid values.

Example usage:

```

	var flagOptimizer = flag.String("optimizer", "adamw",
		fmt.Sprintf("Optimizer, options: %q", types.SortedKeys(optimizers.KnownOptimizers)))

	...

	trainer := train.NewTrainer(manager, ctx, ModelGraph,
		losses.SomeLoss,
		optimizers.MustOptimizerByName(*flagOptimizer),
		[]metrics.Interface{someMetric},    // trainMetrics
		[]metrics.Interface{otherMetric})   // evalMetrics

```

func StochasticGradientDescent

func StochasticGradientDescent() Interface

StochasticGradientDescent creates an optimizer that performs SGD. It looks for "learning_rate" in Context.Params for the initial learning rate, otherwise it defaults to SgdDefaultLearningRate.

It has a decay of learning rate given by: `learning_rate = initial_learning_rate / Sqrt(global_step)`
