layers

package
v0.13.0
Published: Oct 7, 2024 License: Apache-2.0 Imports: 16 Imported by: 0

Documentation

Overview

Package layers holds a collection of common modeling layers. It includes dense layers, convolutions, activation functions, dropout, etc.

A small convention on naming: typically layers are nouns (like "Convolution", "Dense" (layer), "MultiHeadAttention"), while computations are usually verbs ("Convolve", "Reduce..", "Multiply (Mul)", etc.).

Index

Constants

const (
	// ParamL2Regularization context hyperparameter defines the L2 regularization of kernels.
	// Each layer may decide independently to implement it or not.
	//
	// This is an alias to regularizers.ParamL2
	// Dense, DenseWithBias, FNN, kan and Convolution kernels look at this hyperparameter.
	// The value should be a float64.
	// The default is `0.0`.
	//
	// Deprecated: use regularizers.ParamL2
	ParamL2Regularization = "l2_regularization"

	// ParamDropoutRate context hyperparameter defines the amount of dropout applied when DropoutFromContext is used.
	// Should be a value from `0.0` to `1.0`, where 0 means no dropout, and 1 would drop everything out.
	//
	// It is only applied if `Context.IsTraining() == true`, that is, during evaluation/inference it is
	// ignored.
	//
	// The default is `0.0`, which means no dropout.
	ParamDropoutRate = "dropout_rate"
)
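
As a hedged sketch (not itself part of this package's docs), hyperparameters like these are typically stored on the context before building the model; `context.New` and `Context.SetParam` are assumed from the `context` package and may differ by version:

```go
// Hypothetical setup: store hyperparameters on the context so that layers
// that read them (e.g. DropoutFromContext) pick up the values for their scope.
ctx := context.New()                      // context creation API assumed
ctx.SetParam(ParamDropoutRate, 0.1)       // 10% dropout while training
ctx.SetParam(ParamL2Regularization, 1e-4) // deprecated: prefer regularizers.ParamL2
```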

Variables

var (
	// ParamLayerNormEpsilon is the context parameter that defines the default layer normalization epsilon value.
	// The default is 1e-3.
	ParamLayerNormEpsilon = "layer_norm_epsilon"

	// ParamLayerNormCenter is the context parameter that defines whether to center the norm by default.
	// The default is true.
	ParamLayerNormCenter = "layer_norm_center"

	// ParamLayerNormLearnedGain is the context parameter that defines whether to learn a gain for the
	// layer norm, that multiplies its output.
	// The default is true.
	ParamLayerNormLearnedGain = "layer_norm_learned_gain"

	// ParamLayerNormLearnedScale is an alias to ParamLayerNormLearnedGain.
	// Deprecated: renamed to follow original papers nomenclature.
	ParamLayerNormLearnedScale = ParamLayerNormLearnedGain

	// ParamLayerNormRescale is the context parameter that defines whether to rescale the layer
	// by dividing it by the square root of the variance.
	// The default is true.
	ParamLayerNormRescale = "layer_norm_rescale"

	// ParamLayerNormL2Regularization is the context parameter that defines the amount of L2 regularization
	// to apply to the learned gain, if one is defined.
	// The default is 0.0.
	ParamLayerNormL2Regularization = "layer_norm_l2_regularization"
)

var (
	// KnownNormalizers is a map of normalizer string to a function that applies them
	// with the default values, with the feature axis set to -1. This will only work
	// for the most standard problems, since anything with a different shape will need
	// special feature axes configuration for each normalization technique.
	//
	// It includes "none", which is a no-op.
	//
	// Notice that some normalizers use variables, and they need to be unique
	// in their scope (`Context.In(scope)`) -- except if one wants to deliberately share
	// normalization variables across more than one application.
	KnownNormalizers = map[string]func(ctx *context.Context, input *Node) *Node{
		"batch": func(ctx *context.Context, input *Node) *Node {
			return batchnorm.New(ctx, input, -1).Done()
		},
		"layer": func(ctx *context.Context, input *Node) *Node {
			return LayerNormalization(ctx, input, -1).Done()
		},
		"none": func(ctx *context.Context, input *Node) *Node {
			return input
		},
	}

	// ParamNormalization context hyperparameter defines the type of normalization to use
	// between layers of a neural network.
	//
	// It is used if the model calls NormalizeFromContext or MaskedNormalizeFromContext on the embeddings in
	// between layers.
	// This is usually applied after a residual sum (but model choices vary).
	//
	// Valid values are "layer" for [LayerNormalization], "batch" for [batchnorm.New] or "none".
	//
	// Notice that this won't work for special shape setups:
	// [batchnorm.New] will normalize on the batch axis (assumed to be axis-0), and
	// [LayerNormalization] will normalize across the layer values, assumed to be the last axis.
	//
	// The default is `layer`.
	ParamNormalization = "normalization"
)
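
For illustration only, a minimal sketch of selecting the normalization via this hyperparameter and applying it between layers (the scope names and the `ctx`/`input`/`hidden` variables are assumptions; `Context.SetParam` is assumed from the context package):

```go
// Choose layer normalization for the whole model.
ctx.SetParam(ParamNormalization, "layer")

// Inside the model graph: a dense layer followed by the context-selected normalization.
hidden := Dense(ctx.In("hidden0"), input, true, 128)
hidden = NormalizeFromContext(ctx.In("norm0"), hidden)
```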

Functions

func AddL2Regularization deprecated

func AddL2Regularization(ctx *context.Context, amount *Node, values ...*Node)

AddL2Regularization calculates the L2 of the given values (typically variable nodes returned by context.Variable.ValueGraph()), scales it by the given amount (typically a constant) and then adds the result to the loss with train.AddLoss, having the effect of regularizing the weights (variables).

Deprecated: use package regularizers instead.

func AddL2RegularizationStatic deprecated added in v0.9.0

func AddL2RegularizationStatic(ctx *context.Context, amount float64, values ...*Node)

AddL2RegularizationStatic is like AddL2Regularization, but takes the `amount` as a static Go float64 value.

Deprecated: use package regularizers instead.

func AssertQuantilesForPWLCalibrationValid added in v0.5.0

func AssertQuantilesForPWLCalibrationValid[T constraints.Ordered](values []T)

AssertQuantilesForPWLCalibrationValid validates that raw values for quantiles are ok to be used for PieceWiseLinearCalibration. It checks for:

  • Enough data points.
  • Monotonicity of data points: quantiles should always be increasing.

Errors are reported back with `panic`.

func Dense

func Dense(ctx *context.Context, input *Node, useBias bool, outputDimensions ...int) *Node

Dense adds a single dense linear layer, a learnable linear transformation. Optionally, it can include a bias term.

It automatically adds regularization to the weights (not to biases) configured in hyperparameters -- see regularizers.FromContext.

If the input has shape `[<batch dimensions...>, featureDimension]`, the output will have shape `[<batch dimensions...>, <outputDimensions...>]`.

See also FNN for a more configurable (including hidden layers) version.
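
A minimal usage sketch; the `ctx` and `input` variables and the scope name are illustrative, not part of the API:

```go
// Project [batch_size, featureDimension] inputs to 64 units, with a bias term.
hidden := Dense(ctx.In("dense0"), input, true, 64)
// hidden is shaped [batch_size, 64].
```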

func DenseWithBias

func DenseWithBias(ctx *context.Context, input *Node, outputDimensions ...int) *Node

DenseWithBias adds a single dense linear layer, a learnable linear transformation plus a bias term.

If the input has shape `[<batch dimensions...>, featureDimension]`, the output will have shape `[<batch dimensions...>, <outputDimensions...>]`.

See also FNN for a more configurable (including hidden layers) version.

func Dropout

func Dropout(ctx *context.Context, input *Node, dropoutRate *Node) *Node

Dropout randomly replaces input values with zeros if ctx.IsTraining() is true. Otherwise, it's a no-op (it returns input). If the input is float, it scales the output by 1/(1-dropoutRate) to preserve the mean of the input values.

func DropoutFromContext added in v0.9.0

func DropoutFromContext(ctx *context.Context, x *Node) *Node

DropoutFromContext applies a dropout configured in the context parameters keyed by ParamDropoutRate.

If it is 0.0 this is a no-op. If `Context.IsTraining() == false` this is also a no-op, so it doesn't impact evaluation or inference.
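
A hedged sketch combining ParamDropoutRate with DropoutFromContext (`Context.SetParam` and the `hidden` variable are assumptions):

```go
// Configure a 20% dropout rate for the model (or a sub-scope of it).
ctx.SetParam(ParamDropoutRate, 0.2)

// In the model graph: dropout is applied only while ctx.IsTraining() is true.
hidden = DropoutFromContext(ctx, hidden)
```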

func DropoutNormalize

func DropoutNormalize(ctx *context.Context, input *Node, dropoutRate *Node, normalize bool) *Node

DropoutNormalize randomly replaces input values with zeros if ctx.IsTraining() is true. Otherwise, it's a no-op (it returns input). If normalize is set, it scales the output by 1/(1-dropoutRate) to preserve the mean of the input values.

func DropoutStatic added in v0.9.0

func DropoutStatic(ctx *context.Context, input *Node, dropoutRate float64) *Node

DropoutStatic is the same as Dropout, but it takes the `dropoutRate` as a static Go float64 value. If `dropoutRate <= 0` or the context is not training, this is a no-op.

func Embedding

func Embedding(ctx *context.Context, input *Node, dtype dtypes.DType, vocabSize, dimension int) *Node

Embedding creates an embedding table with vocabSize elements (typically a vocabulary size) each of dimension values -- so a [vocabSize, dimension] variable table.

It then converts each integer value of the input to an embedding of the given dimension size. The input must have an integer dtype, and its last dimension must be of size 1; if it's not of size one, an extra dimension is added to the end. All values of the input must be smaller than vocabSize, otherwise it will fail -- no explicit checking is made.

The output has rank one larger than the input, with the last dimension the same as the embedding dimension.
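
A minimal sketch; the `tokens` input node and the vocabulary/dimension sizes are hypothetical, and `dtypes.Float32` is assumed from the dtypes package referenced in the signature:

```go
// Embed integer token ids into 32-dimensional float32 vectors, with a
// vocabulary of 10000 entries. tokens must have an integer dtype.
embedded := Embedding(ctx.In("embed"), tokens, dtypes.Float32, 10000, 32)
// The last axis of embedded is the embedding dimension (32); see the shape
// rules described above for the remaining axes.
```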

func MaskedNormalizeFromContext added in v0.9.0

func MaskedNormalizeFromContext(ctx *context.Context, input, mask *Node) *Node

MaskedNormalizeFromContext applies a normalization (or none) according to the hyperparameter ParamNormalization configured in the context. The `mask` is optional and can be set to nil if no mask is used.

This is not recommended for images, since one may want to normalize over specific axes.

func MustNormalizeByName

func MustNormalizeByName(ctx *context.Context, normalization string, input *Node) *Node

MustNormalizeByName applies the requested normalization using default parameters. If an invalid normalization is given, it panics with an error.

This will only work for the most standard problems, since anything with a different shape will need special feature axes configuration for each normalization technique.

It's a simple wrapper around KnownNormalizers; if one wants to handle errors instead of panicking, check the map directly. For valid values see KnownNormalizers.

Some layer libraries will use this by default for you, taking the value from the context -- e.g: fnn.New.

But if not, one example use:

```go
var flagNormalization = flag.String("norm", "none",
	fmt.Sprintf("Type of layer normalization to use. Valid values: %q.",
		types.SortedKeys(layers.KnownNormalizers)))

...

func ModelGraph(...) {
	...
	logits = MustNormalizeByName(ctx, *flagNormalization, logits)
	...
}
```

func Normalize added in v0.4.0

func Normalize(x *Node, independentAxes ...int) *Node

Normalize shifts and scales the input such that the mean becomes zero and the variance one. It calculates `(x - mean(x)) / (sigma(x))`, where sigma is the standard deviation.

The parameter `independentAxes` list axes that should not be normalized together. A typical value is -1, the feature axis (last axis), so that each feature gets its own normalization.
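
A quick sketch, assuming `x` is an activation node shaped `[batch_size, num_features]`:

```go
// Each feature (last axis) gets its own mean/variance, computed over the
// remaining axes.
normalized := Normalize(x, -1)
```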

func NormalizeFromContext added in v0.9.0

func NormalizeFromContext(ctx *context.Context, input *Node) *Node

NormalizeFromContext applies a normalization (or none) according to the hyperparameter ParamNormalization configured in the context.

This is not recommended for images, since one may want to normalize over specific axes.

func PieceWiseLinearCalibration

func PieceWiseLinearCalibration(ctx *context.Context, input, keypoints *Node, outputTrainable bool) *Node

PieceWiseLinearCalibration creates a piece-wise linear function from the input, splitting it at the given keypoints, with outputs initialized with values from 0 to 1.

The keypoints are typically quantiles of the input feature, starting with the minimum value and ending on the maximum. It must have rank-1 and be of the same DType as input. Its values must be ordered, and cannot be repeated (this may lead to NaNs). Consider using AssertQuantilesForPWLCalibrationValid on the quantiles.

If outputTrainable is set to true, the outputs mapped to the keypoints are made trainable, and may change to values outside the range [0, 1].

In any case, if the input is beyond the first or last keypoint, the output of the function will flatten, preventing any extrapolation (extrapolations are often bad in neural networks).

This is a simpler version of the one described here: https://www.tensorflow.org/lattice/api_docs/python/tfl/layers/PWLCalibration
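
A hedged example sketch, using hypothetical quantile values as keypoints; `Const` and `Node.Graph()` are assumed from the graph package:

```go
// Hypothetical quantiles of the input feature, estimated from training data.
quantiles := []float32{0.0, 1.5, 3.0, 10.0, 100.0}
AssertQuantilesForPWLCalibrationValid(quantiles) // panics if they are invalid

// keypoints must be rank-1 and have the same DType as input (float32 here).
keypoints := Const(input.Graph(), quantiles)
calibrated := PieceWiseLinearCalibration(ctx.In("pwl"), input, keypoints, true)
```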

func PieceWiseLinearCalibrationCascaded

func PieceWiseLinearCalibrationCascaded(ctx *context.Context, input, keypoints *Node, outputTrainable bool) *Node

PieceWiseLinearCalibrationCascaded is a similar implementation to PieceWiseLinearCalibration that is equally powerful (expresses the same functions), simpler (fewer ops) and faster, but is parametrized differently (cascaded linear functions), and may have different learning characteristics when doing gradient descent.

Types

type ConvBuilder

type ConvBuilder struct {
	// contains filtered or unexported fields
}

ConvBuilder is a helper to build a convolution computation. Create it with Convolution, set the desired parameters and when all is set, call Done.

func Convolution

func Convolution(ctx *context.Context, x *Node) *ConvBuilder

Convolution prepares one convolution on x with the given kernel for an arbitrary number of spatial dimensions (1D, 2D, 3D, etc.).

It is very flexible, and to ease setting its parameters it returns a ConvBuilder object for configuration. Once it is set up, call `ConvBuilder.Done` and it will return the convolved x. Browse through ConvBuilder to see the capabilities and the defaults.

Two parameters need setting: Filters (or channels) and KernelSize. It will fail if they are not set.

The shape of x should be `[batch, <spatial_dimensions...>, input_channels]` if configured with `ConvBuilder.ChannelsAxis(images.ChannelsLast)`, the default. If one sets `ConvBuilder.ChannelsAxis(images.ChannelsFirst)`, the shape should be `[batch, input_channels, <spatial_dimensions...>]` instead.
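
A minimal configuration sketch; the scope name, filter count and kernel size are illustrative:

```go
// 2D convolution over x shaped [batch, height, width, channels] (channels-last).
conv := Convolution(ctx.In("conv1"), x).
	Filters(32).   // 32 output channels
	KernelSize(3). // 3x3 kernel
	PadSame().     // preserve spatial dimensions (with the default strides=1)
	Done()
```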

func (*ConvBuilder) ChannelsAxis added in v0.3.0

func (conv *ConvBuilder) ChannelsAxis(channelsAxisConfig images.ChannelsAxisConfig) *ConvBuilder

ChannelsAxis configures the axis for the channels (aka. "depth" or "features") dimension. The default is `images.ChannelsLast`, meaning the "channels" dimension comes last.

Note: `images` refers to package `github.com/gomlx/gomlx/types/tensor/image`.

It returns the modified Config object, so calls can be cascaded.

func (*ConvBuilder) CurrentScope added in v0.3.0

func (conv *ConvBuilder) CurrentScope() *ConvBuilder

CurrentScope configures the convolution not to create a sub-scope for the kernel weights it needs, and instead use the current one provided in Convolution.

By default, Convolution will create a sub-scope named "conv".

func (*ConvBuilder) DilationPerDim

func (conv *ConvBuilder) DilationPerDim(dilations ...int) *ConvBuilder

DilationPerDim sets the kernel dilations for each spatial dimension of the convolution. The default is 1 for every dimension.

Specifies the kernel up-sampling rate. In the literature, the same parameter is sometimes called input stride or dilation. The effective kernel size used for the convolution will be `kernel_shape + (kernel_shape - 1) * (dilation - 1)`, obtained by inserting (dilation-1) zeros between consecutive elements of the original filter in the spatial dimension.

One cannot use strides and dilation at the same time.

func (*ConvBuilder) Dilations

func (conv *ConvBuilder) Dilations(dilation int) *ConvBuilder

Dilations sets the dilations of the convolution. It sets the same value for every dimension. The default is 1.

It specifies the kernel up-sampling rate. In the literature, the same parameter is sometimes called input stride or dilation. The effective kernel size used for the convolution will be `kernel_shape + (kernel_shape - 1) * (dilation - 1)`, obtained by inserting (dilation-1) zeros between consecutive elements of the original filter in the spatial dimension.

One cannot use strides and dilation at the same time.

func (*ConvBuilder) Done

func (conv *ConvBuilder) Done() *Node

Done indicates that the Convolution layer is finished being configured. It then creates the convolution and its kernel variables, and returns the resulting Node.

func (*ConvBuilder) Filters

func (conv *ConvBuilder) Filters(filters int) *ConvBuilder

Filters sets the number of filters -- it specifies the number of output channels. There is no default and this must be set before Done is called.

func (*ConvBuilder) KernelSize

func (conv *ConvBuilder) KernelSize(size int) *ConvBuilder

KernelSize sets the kernel size for every axis. There is no default and this must be set before Done is called.

You can also use KernelSizePerDim to set the kernel size per dimension (axis) individually.

func (*ConvBuilder) KernelSizePerDim

func (conv *ConvBuilder) KernelSizePerDim(sizes ...int) *ConvBuilder

KernelSizePerDim sets the kernel size for each dimension (axis). There is no default and these must be set before Done is called.

You can also use KernelSize to set the kernel size the same for all dimensions.

func (*ConvBuilder) NoPadding

func (conv *ConvBuilder) NoPadding() *ConvBuilder

NoPadding removes any padding, so if the kernel spatial dimensions are > 1, the output shape will be reduced on the edges.

This is the default.

func (*ConvBuilder) PadSame

func (conv *ConvBuilder) PadSame() *ConvBuilder

PadSame adds paddings on the edges of x such that in the end the output of the convolution has the same shape as the input (assuming strides=1).

The default is NoPadding.

func (*ConvBuilder) Regularizer added in v0.11.0

func (conv *ConvBuilder) Regularizer(regularizer regularizers.Regularizer) *ConvBuilder

Regularizer to be applied to the learned weights (but not the biases).

To use more than one type of Regularizer, use regularizers.Combine, and set the returned combined regularizer here.

The default is regularizers.FromContext, which is configured by regularizers.ParamL1 and regularizers.ParamL2.

func (*ConvBuilder) StridePerDim

func (conv *ConvBuilder) StridePerDim(strides ...int) *ConvBuilder

StridePerDim sets the strides for each spatial dimension of the convolution. The default is 1 for every dimension.

The stride is how many steps to move after a convolution. A value of 2 will halve the input size, since a convolution will be done at every other position, and so on. It can be defined separately per dimension.

One cannot use strides and dilation at the same time.

func (*ConvBuilder) Strides

func (conv *ConvBuilder) Strides(strides int) *ConvBuilder

Strides sets the strides of the convolution. It sets the same value for every dimension. The default is 1.

The stride is how many steps to move after a convolution. A value of 2 will halve the input size, since a convolution will be done at every other position, and so on. It can be defined separately per dimension.

One cannot use strides and dilation at the same time.

func (*ConvBuilder) UseBias

func (conv *ConvBuilder) UseBias(useBias bool) *ConvBuilder

UseBias sets whether to add a trainable bias term to the convolution. Default is true.

type LayerNormBuilder

type LayerNormBuilder struct {
	// contains filtered or unexported fields
}

LayerNormBuilder is a helper to build a layer normalization computation. Create it with LayerNormalization, set the desired parameters and when all is set, call Done. See LayerNormalization for details.

func LayerNormalization

func LayerNormalization(ctx *context.Context, x *Node, normalizingAxes ...int) *LayerNormBuilder

LayerNormalization performs a layer normalization on the input. It includes a scaling and offset factor, and normalization over the feature entries.

This is an alternative to BatchNormalization, that doesn't suffer from the problem of variance on small batch sizes, nor does it need to keep a moving average of the normalization parameters. Commonly used with transformer layers (see MultiHeadAttention).

normalizingAxes are the axes over which to normalize: mean and variance are calculated over these axes and the values are then normalized. E.g: if your input is `[batch_size, features]` you should use `normalizingAxes=[1]` (same as -1) to normalize over the `features` axis; if your input is an image of shape [batch_size, height, width, channels] one common approach is to normalize over the image, so `normalizingAxes=[1 2]`, but not over the channels.

Notice the difference from BatchNormalization, which normalizes over the batch dimension, as opposed to the feature dimensions.

The layer norm may have a learned gain and offset, controlled by LayerNormBuilder.LearnedGain and LayerNormBuilder.LearnedOffset settings, enabled by default.

To ease setting its parameters it returns a LayerNormBuilder object for configuration. Once it is set up, call `LayerNormBuilder.Done` and it will return the normalized x. Browse through LayerNormBuilder to check its capabilities and the defaults.

Layer normalization behaves the same during training and inference -- as opposed to batch normalization.

Based on paper "Layer Normalization" (Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton), https://arxiv.org/abs/1607.06450

FutureWork: support padding by not normalizing parts that weren't touched ...
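
A minimal sketch for a transformer-style input; the shape and scope name are assumptions:

```go
// x is assumed shaped [batch_size, seq_len, embed_dim]; normalize over the
// embedding (last) axis.
normalized := LayerNormalization(ctx.In("layer_norm"), x, -1).
	Epsilon(1e-6). // optional: override the default from ParamLayerNormEpsilon
	Done()
```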

func (*LayerNormBuilder) Done

func (builder *LayerNormBuilder) Done() *Node

Done finishes configuring the LayerNormalization and generates the graph computation to normalize the input.

func (*LayerNormBuilder) Epsilon

func (builder *LayerNormBuilder) Epsilon(value float64) *LayerNormBuilder

Epsilon is a small float added to variance to avoid dividing by zero. It defaults to the value given by ParamLayerNormEpsilon.

It is not used if ScaleNormalization is set to false.

func (*LayerNormBuilder) LearnedGain added in v0.9.0

func (builder *LayerNormBuilder) LearnedGain(value bool) *LayerNormBuilder

LearnedGain defines whether the layer normalization applies a learned multiplying gain to the input, a tensor with the shape of the combined normalizing axes -- so it changes the direction of the inputs; it's not simply a gain.

Default is true.

func (*LayerNormBuilder) LearnedOffset

func (builder *LayerNormBuilder) LearnedOffset(value bool) *LayerNormBuilder

LearnedOffset defines whether the layer normalization tries to center the input by adding a learned offset. It defaults to true.

The offset will be learned separately for each axis that is not the batch (assumed to be axis 0 only) and not any of the normalizingAxes.

func (*LayerNormBuilder) Mask added in v0.9.0

func (builder *LayerNormBuilder) Mask(mask *Node) *LayerNormBuilder

Mask sets the mask for the input values. False values in the mask should be ignored for the normalization.

func (*LayerNormBuilder) ScaleNormalization

func (builder *LayerNormBuilder) ScaleNormalization(value bool) *LayerNormBuilder

ScaleNormalization defines whether the input's scale is normalized by the square root of the variance. The default is true, and this is the original paper specification, but in some cases it works best without it.

type MultiHeadAttentionBuilder

type MultiHeadAttentionBuilder struct {
	// contains filtered or unexported fields
}

MultiHeadAttentionBuilder is a helper to build a multi-head-attention computation. Create it with MultiHeadAttention, set the desired parameters and when all is set, call Done.

func MultiHeadAttention

func MultiHeadAttention(ctx *context.Context, query, key, value *Node, numHeads int, headDim int) *MultiHeadAttentionBuilder

MultiHeadAttention defines a multi-head attention layer, as described in the paper "Attention Is All You Need", https://arxiv.org/abs/1706.03762, by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin.

It takes query, key and value and projects them numHeads times, to headDim-sized embeddings. Then it uses the dot-product of query and key as weights, and returns a softmax-weighted sum of value, for each head.

Typical shapes:

  • query: `[batch_size, <query_elements>, inputQueryDim]`.
  • key: `[batch_size, <num_key/value_elements>, inputKeyDim]`.
  • value: `[batch_size, <num_key/value_elements>, inputValueDim]`.

And, when calling Done, after another output projection, it returns a node of shape `[batch_size, <num_queries>, inputValueDim]`, if no other settings are given. See settings in MultiHeadAttentionBuilder to control various aspects.

Notice it's common to use key=values, and even query=keys=values. For instance for encoding text, one may use the input sequence as all 3 (query, key and value).

The function returns a MultiHeadAttentionBuilder that can be further configured, and the resulting Node is returned when MultiHeadAttentionBuilder.Done is called. Alternatively one can call MultiHeadAttentionBuilder.DoneWithCoefficients, in which case it returns both the updated state and the attention coefficients.
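
A hedged self-attention sketch; the sequence shape, head count and dimensions are illustrative:

```go
// seq is assumed shaped [batch_size, seq_len, embed_dim]; use it as query,
// key and value (self-attention), with 8 heads of size 64 each.
attended := MultiHeadAttention(ctx.In("attn"), seq, seq, seq, 8, 64).
	SetOutputDim(512). // project the concatenated heads to 512 dimensions
	Done()
// attended is shaped [batch_size, seq_len, 512].
```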

func (*MultiHeadAttentionBuilder) Done

func (b *MultiHeadAttentionBuilder) Done() (output *Node)

Done should be called after all optional settings are configured. It returns the attention output; to also get the attention coefficients, use DoneWithCoefficients instead.

`output` will be shaped `[batch_size, <query_elements>, output_dim]`, where `output_dim` can be configured by `SetOutputDim`.

func (*MultiHeadAttentionBuilder) DoneWithCoefficients

func (b *MultiHeadAttentionBuilder) DoneWithCoefficients() (attentionOutput, attentionCoefficients *Node)

DoneWithCoefficients or Done should be called after all optional settings are configured. It returns both the attention output and the attention coefficients (matrix) used.

`output` will be shaped `[batch_size, <query_elements>, output_dim]`, where `output_dim` can be configured by `SetOutputDim`.

`coefficients` is shaped `[batch_size, <query_elements>, <num_heads>, <key_elements>]` with the attention weights (from 0 to 1).

func (*MultiHeadAttentionBuilder) Dropout

Dropout defines how much dropout to use in the attention coefficients calculation. If set to 0 or lower, it's simply disabled. Default is 0.

func (*MultiHeadAttentionBuilder) SetKeyMask

func (b *MultiHeadAttentionBuilder) SetKeyMask(keyMask *Node) *MultiHeadAttentionBuilder

SetKeyMask sets a mask for keys that are actually valid and can be attended. Defaults to no mask, meaning all keys are accessible. See also SetQueryMask.

Shape should be `[batch_size, numHeads, <key_elements>]`, or `[batch_size, <key_elements>]` if the mask is the same for every head.

Either use SetKeyMask and SetQueryMask separately or use SetQueryKeyMatrixMask, but not both. Optionally, one can also UseCausalMask, which is combined (logical-and) with any given mask.

func (*MultiHeadAttentionBuilder) SetKeyQueryDim

func (b *MultiHeadAttentionBuilder) SetKeyQueryDim(keyQueryDim int) *MultiHeadAttentionBuilder

SetKeyQueryDim allows finer configuration on the dimension of the projection used for the query/key pairs for each head. It defaults to the value given by `headDim`.

func (*MultiHeadAttentionBuilder) SetOutputDim

func (b *MultiHeadAttentionBuilder) SetOutputDim(outputDim int) *MultiHeadAttentionBuilder

SetOutputDim defines the output dimension of the final projection, from the flattened attention heads. It defaults to the value of the last dimension of `values` passed as input (`inputValueDim`).

func (*MultiHeadAttentionBuilder) SetQueryKeyMatrixMask

func (b *MultiHeadAttentionBuilder) SetQueryKeyMatrixMask(queryKeyMatrixMask *Node) *MultiHeadAttentionBuilder

SetQueryKeyMatrixMask sets a mask matrix that defines which queries can attend to which keys. Defaults to no mask, meaning all queries can attend to all keys.

Shape should be `[batch_size, numHeads, <query_elements>, <key_elements>]`, or `[batch_size, <query_elements>, <key_elements>]` if the mask is the same for every head.

Either use SetKeyMask and SetQueryMask separately or use SetQueryKeyMatrixMask, but not both. Optionally, one can also UseCausalMask, which is combined (logical-and) with any given mask.

func (*MultiHeadAttentionBuilder) SetQueryMask

func (b *MultiHeadAttentionBuilder) SetQueryMask(queryMask *Node) *MultiHeadAttentionBuilder

SetQueryMask sets a mask for queries that are actually valid and should be used. Defaults to no mask, meaning all queries are accessible. See also SetKeyMask.

Shape should be `[batch_size, numHeads, <query_elements>]`, or `[batch_size, <query_elements>]` if the mask is the same for every head.

Either use SetKeyMask and SetQueryMask separately or use SetQueryKeyMatrixMask, but not both. Optionally, one can also UseCausalMask, which is combined (logical-and) with any given mask.

func (*MultiHeadAttentionBuilder) SetValueHeadDim

func (b *MultiHeadAttentionBuilder) SetValueHeadDim(valueDim int) *MultiHeadAttentionBuilder

SetValueHeadDim allows finer configuration on the dimension of the projection used for the value for each head. It defaults to the value given by `headDim`.

func (*MultiHeadAttentionBuilder) UseCausalMask

UseCausalMask adds a mask where a query can only attend to keys with lower indices than itself. It assumes that query and key are either the same or have the same inner shape, and that there is only one inner axis -- so key/query should have the rank-3 shape `[batch, inner_dim, key/query_dim]`.

This mask can be used in combination (logical-and) with other masks.

func (*MultiHeadAttentionBuilder) UseProjectionBias

func (b *MultiHeadAttentionBuilder) UseProjectionBias(useProjectionBias bool) *MultiHeadAttentionBuilder

UseProjectionBias defines whether to use a bias term on the final output projection. Default is true.

Directories

Path	Synopsis
activations	Package activations implements several common activations, and includes a generic Apply method to apply an activation by its type.
batchnorm	Package batchnorm implements a batch normalization layer, and associated tools.
bsplines	Package bsplines provides a GoMLX version of github.com/gomlx/bsplines: it provides evaluation of B-spline curves, which can be used as layers.
fnn	Package fnn implements a generic FNN (Feedforward Neural Network) with various configurations.
kan	Package kan implements generic Kolmogorov–Arnold Networks, as described in https://arxiv.org/pdf/2404.19756
rational	Package rational implements "learnable rational functions".
regularizers	Package regularizers adds tools to facilitate adding regularization to the learned weights.
