Documentation ¶
Overview ¶
Package optimizers implements a collection of ML optimizers that can be used by train.Trainer or on their own. They all implement optimizers.Interface.
Index ¶
- Constants
- Variables
- func IncrementGlobalStepGraph(ctx *context.Context, g *Graph, dtype shapes.DType) *Node
- func LearningRateVar(ctx *context.Context, dtype shapes.DType, defaultValue float64) *context.Variable
- func LearningRateVarWithValue(ctx *context.Context, dtype shapes.DType, value float64) *context.Variable
- type AdamConfig
- func (c *AdamConfig) Adamax() *AdamConfig
- func (c *AdamConfig) Betas(beta1, beta2 float64) *AdamConfig
- func (c *AdamConfig) Done() Interface
- func (c *AdamConfig) Epsilon(epsilon float64) *AdamConfig
- func (c *AdamConfig) LearningRate(value float64) *AdamConfig
- func (c *AdamConfig) Scope(name string) *AdamConfig
- func (c *AdamConfig) WeightDecay(weightDecay float64) *AdamConfig
- type CosineAnnealingOptions
- func (opt *CosineAnnealingOptions) Done()
- func (opt *CosineAnnealingOptions) LearningRate(learningRate float64) *CosineAnnealingOptions
- func (opt *CosineAnnealingOptions) MinLearningRate(minLearningRate float64) *CosineAnnealingOptions
- func (opt *CosineAnnealingOptions) PeriodInSteps(periodSteps int) *CosineAnnealingOptions
- type Interface
Constants ¶
const (
	// AdamDefaultLearningRate is used by Adam if no learning rate is set.
	AdamDefaultLearningRate = 0.001

	// AdamDefaultScope is the default scope name for moments and step used by Adam.
	AdamDefaultScope = "AdamOptimizer"
)
const GlobalStepVariableName = "global_step"
GlobalStepVariableName is the name under which the global step counter is stored in context.Context -- usually in the root scope, but that depends on the caller.
const LearningRateKey = "learning_rate"
LearningRateKey is the string key for learning rate in Context.Params.
const SgdDefaultLearningRate = 0.1
SgdDefaultLearningRate is the default learning rate used by the StochasticGradientDescent optimizer.
Variables ¶
var (
	// KnownOptimizers is a map of known optimizers by name to their default constructors.
	// This provides an easy quick start point. One can hyperparameter-tune the optimizers
	// for usually slightly better results.
	KnownOptimizers = map[string]func() Interface{
		"sgd":    StochasticGradientDescent,
		"adam":   func() Interface { return Adam().Done() },
		"adamax": func() Interface { return Adam().Adamax().Done() },
		"adamw":  func() Interface { return Adam().WeightDecay(0.004).Done() },
	}
)
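KnownOptimizers follows a common Go pattern: a registry of names mapped to zero-argument constructors, so every lookup yields a fresh instance. A rough, self-contained sketch of the same pattern (the Optimizer interface, sgd type, and byName helper below are illustrative stand-ins, not part of this package):

```go
package main

import "fmt"

// Optimizer is a stand-in for this package's optimizers.Interface;
// the real interface has an UpdateGraph method instead of Name.
type Optimizer interface{ Name() string }

type sgd struct{}

func (sgd) Name() string { return "sgd" }

// registry mirrors the KnownOptimizers pattern: names map to
// zero-argument constructors, so each lookup builds a new optimizer.
var registry = map[string]func() Optimizer{
	"sgd": func() Optimizer { return sgd{} },
}

// byName is a hypothetical helper that, unlike MustOptimizerByName,
// returns an error instead of terminating the program.
func byName(name string) (Optimizer, error) {
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown optimizer %q", name)
	}
	return ctor(), nil
}

func main() {
	opt, err := byName("sgd")
	fmt.Println(opt.Name(), err == nil)
}
```

Using constructors rather than shared instances matters because optimizers hold per-training state (e.g., Adam's moments), which must not be shared across trainers.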
Functions ¶
func IncrementGlobalStepGraph ¶
func IncrementGlobalStepGraph(ctx *context.Context, g *Graph, dtype shapes.DType) *Node
IncrementGlobalStepGraph creates a global step counter (if one does not exist yet) and returns its incremented value -- on the first call the returned value will be 1.
It only builds the computation graph, no actual values are generated.
Typically, this is called by the optimizer's UpdateGraph method.
func LearningRateVar ¶
func LearningRateVar(ctx *context.Context, dtype shapes.DType, defaultValue float64) *context.Variable
LearningRateVar returns the learning rate variable -- a scalar value of the given dtype.
If the variable doesn't exist yet, it is created using the LearningRateKey parameter if that is set, or the provided defaultValue (which must be a scalar convertible to dtype) otherwise.
Types ¶
type AdamConfig ¶
type AdamConfig struct {
// contains filtered or unexported fields
}
AdamConfig holds the configuration for an Adam optimizer. Create it using Adam(), and once configured, call Done to build an optimizers.Interface that implements Adam.
func Adam ¶
func Adam() *AdamConfig
Adam optimization is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments. According to [Kingma et al., 2014](http://arxiv.org/abs/1412.6980), the method is "*computationally efficient, has little memory requirement, invariant to diagonal rescaling of gradients, and is well suited for problems that are large in terms of data/parameters*".
It returns a configuration object that can be used to set its parameters. Once configured, call Done, and it will return an optimizers.Interface.
func (*AdamConfig) Adamax ¶
func (c *AdamConfig) Adamax() *AdamConfig
Adamax configures Adam to use the L-infinity norm (i.e., max, which gives the method its name) for the second moment, instead of the L2 norm, as described in the same Adam paper.
func (*AdamConfig) Betas ¶
func (c *AdamConfig) Betas(beta1, beta2 float64) *AdamConfig
Betas sets the two moving averages constants (exponential decays). They default to 0.9 and 0.999.
func (*AdamConfig) Done ¶
func (c *AdamConfig) Done() Interface
Done finishes the configuration and constructs an optimizers.Interface that implements Adam to specification.
func (*AdamConfig) Epsilon ¶
func (c *AdamConfig) Epsilon(epsilon float64) *AdamConfig
Epsilon sets the small constant added to the denominator for numerical stability.
func (*AdamConfig) LearningRate ¶
func (c *AdamConfig) LearningRate(value float64) *AdamConfig
LearningRate sets the base learning rate as a floating point value -- eventually converted to the same dtype as the loss.
Default is either the value of LearningRateKey ("learning_rate") global parameter in Context if defined, or 0.001 if not.
func (*AdamConfig) Scope ¶
func (c *AdamConfig) Scope(name string) *AdamConfig
Scope defines the top-level scope used to store the 1st and 2nd order moments of the gradients and the step number used by the Adam optimizer. Generally this doesn't need to be changed, but if one is using multiple schedules, potentially with different loss functions (so the moments should be kept separate), one can change it.
It defaults to AdamDefaultScope.
func (*AdamConfig) WeightDecay ¶
func (c *AdamConfig) WeightDecay(weightDecay float64) *AdamConfig
WeightDecay configures the optimizer to work as AdamW, applying the given static weight decay. This exists because plain L2 regularization doesn't work well with Adam. TODO: (1) allow certain variables (e.g., biases) to be excluded from weight decay; (2) allow dynamically calculated weight decay.
type CosineAnnealingOptions ¶
type CosineAnnealingOptions struct {
// contains filtered or unexported fields
}
CosineAnnealingOptions is returned by CosineAnnealingSchedule to configure the cosine annealing schedule strategy. When finished configuring, call `Done`.
func CosineAnnealingSchedule ¶
func CosineAnnealingSchedule(ctx *context.Context, graph *Graph, dtype shapes.DType) *CosineAnnealingOptions
CosineAnnealingSchedule allows one to set up a cosine annealing schedule for the learning rate. See details https://paperswithcode.com/method/cosine-annealing.
It returns a CosineAnnealingOptions that can be configured. When finished configuring, call `Done`, and it will generate the computation graph that updates the learning rate at every training step.
Example with only one cycle (assuming `*flagNumSteps` is the number of training steps):
```
func modelGraph(ctx *context.Context, inputs []*Node) *Node {
	graph := inputs[0].Graph()
	if *flagUseCosineSchedule {
		optimizers.CosineAnnealingSchedule(ctx, graph, types.Float32).PeriodInSteps(*flagNumSteps).Done()
	}
	// ... build and return the model's output node ...
}
```
func (*CosineAnnealingOptions) Done ¶
func (opt *CosineAnnealingOptions) Done()
Done finalizes the configuration of CosineAnnealingSchedule and generates the computation graph code to implement it.
If invalid options are given, an error is raised in the Graph.
func (*CosineAnnealingOptions) LearningRate ¶
func (opt *CosineAnnealingOptions) LearningRate(learningRate float64) *CosineAnnealingOptions
LearningRate sets the learning rate at the start of the cosine cycle. If not given, it will try to read it from the context params (keyed by LearningRateKey). If neither is set, it will fail and report an error in the context and graph.
func (*CosineAnnealingOptions) MinLearningRate ¶
func (opt *CosineAnnealingOptions) MinLearningRate(minLearningRate float64) *CosineAnnealingOptions
MinLearningRate at the end of the cosine cycle. Defaults to 10^-3 * initial learning rate.
func (*CosineAnnealingOptions) PeriodInSteps ¶
func (opt *CosineAnnealingOptions) PeriodInSteps(periodSteps int) *CosineAnnealingOptions
PeriodInSteps sets the number of steps for one period of the cosine schedule. The effective learning rate decreases over the given period of training steps, and then is restarted at each new period.
It's common to use only one period (so no annealing, just a cosine schedule), in which case just set to the number of steps that will be used for training.
There is no default: this value must be given, or an error will be issued in the graph and context.
type Interface ¶
type Interface interface {
// UpdateGraph is called during computation graph building; it calculates and
// applies the updates to the variables (weights) of the model needed for one
// training step.
//
// Variable values can be updated in graph building time (inside UpdateGraph) using Variable.SetValueGraph,
// and the trainer (train.Trainer) will make sure these values are returned from the graph execution
// and the materialized values used to update the variables (Variable.SetValue).
//
// ctx holds the variables to train (marked as trainable), the hyperparameters
// used by the optimizer (in `ctx.Params`) and non-trainable variables
// that the optimizer itself may create. One should scope it (context.Context.In("<some scope name>"))
// to avoid naming conflicts on the variables created -- notice that
// some complex training scheduling scheme may have more than one optimizer
// on the same Context object.
//
// loss must be a scalar value.
UpdateGraph(ctx *context.Context, graph *Graph, loss *Node)
}
Interface is implemented by all optimizers in this package.
func MustOptimizerByName ¶
MustOptimizerByName returns the optimizer registered under the given name, or calls log.Fatal if it does not exist. It looks up KnownOptimizers -- use that map directly if you want to handle invalid values more gracefully.
Example usage:
```
var flagOptimizer = flag.String("optimizer", "adamw",
	fmt.Sprintf("Optimizer, options: %q", types.SortedKeys(optimizers.KnownOptimizers)))
...
trainer := train.NewTrainer(manager, ctx, ModelGraph,
	losses.SomeLoss,
	optimizers.MustOptimizerByName(*flagOptimizer),
	[]metrics.Interface{someMetric},  // trainMetrics
	[]metrics.Interface{otherMetric}) // evalMetrics
```
func StochasticGradientDescent ¶
func StochasticGradientDescent() Interface
StochasticGradientDescent creates an optimizer that performs SGD. It looks for "learning_rate" in Context.Params for the initial learning rate, otherwise it defaults to SgdDefaultLearningRate.
It decays the learning rate according to: `learning_rate = initial_learning_rate / Sqrt(global_step)`