Documentation ¶
Overview ¶
Package layers holds a collection of common modeling layers. It includes dense layers, convolutions, activation functions, dropout, etc.
A small convention on naming: typically layers are nouns (like "Convolution", "Dense" (layer), "MultiHeadAttention"), while computations are usually verbs ("Convolve", "Reduce..", "Multiply (Mul)", etc.).
Index ¶
- Constants
- Variables
- func AddL2Regularization(ctx *context.Context, amount *Node, values ...*Node) deprecated
- func AddL2RegularizationStatic(ctx *context.Context, amount float64, values ...*Node) deprecated
- func AssertQuantilesForPWLCalibrationValid[T constraints.Ordered](values []T)
- func Dense(ctx *context.Context, input *Node, useBias bool, outputDimensions ...int) *Node
- func DenseWithBias(ctx *context.Context, input *Node, outputDimensions ...int) *Node
- func Dropout(ctx *context.Context, input *Node, dropoutRate *Node) *Node
- func DropoutFromContext(ctx *context.Context, x *Node) *Node
- func DropoutNormalize(ctx *context.Context, input *Node, dropoutRate *Node, normalize bool) *Node
- func DropoutStatic(ctx *context.Context, input *Node, dropoutRate float64) *Node
- func Embedding(ctx *context.Context, input *Node, dtype dtypes.DType, ...) *Node
- func MaskedNormalizeFromContext(ctx *context.Context, input, mask *Node) *Node
- func MustNormalizeByName(ctx *context.Context, normalization string, input *Node) *Node
- func Normalize(x *Node, independentAxes ...int) *Node
- func NormalizeFromContext(ctx *context.Context, input *Node) *Node
- func PieceWiseLinearCalibration(ctx *context.Context, input, keypoints *Node, outputTrainable bool) *Node
- func PieceWiseLinearCalibrationCascaded(ctx *context.Context, input, keypoints *Node, outputTrainable bool) *Node
- type ConvBuilder
- func (conv *ConvBuilder) ChannelsAxis(channelsAxisConfig images.ChannelsAxisConfig) *ConvBuilder
- func (conv *ConvBuilder) CurrentScope() *ConvBuilder
- func (conv *ConvBuilder) DilationPerDim(dilations ...int) *ConvBuilder
- func (conv *ConvBuilder) Dilations(dilation int) *ConvBuilder
- func (conv *ConvBuilder) Done() *Node
- func (conv *ConvBuilder) Filters(filters int) *ConvBuilder
- func (conv *ConvBuilder) KernelSize(size int) *ConvBuilder
- func (conv *ConvBuilder) KernelSizePerDim(sizes ...int) *ConvBuilder
- func (conv *ConvBuilder) NoPadding() *ConvBuilder
- func (conv *ConvBuilder) PadSame() *ConvBuilder
- func (conv *ConvBuilder) Regularizer(regularizer regularizers.Regularizer) *ConvBuilder
- func (conv *ConvBuilder) StridePerDim(strides ...int) *ConvBuilder
- func (conv *ConvBuilder) Strides(strides int) *ConvBuilder
- func (conv *ConvBuilder) UseBias(useBias bool) *ConvBuilder
- type LayerNormBuilder
- func (builder *LayerNormBuilder) Done() *Node
- func (builder *LayerNormBuilder) Epsilon(value float64) *LayerNormBuilder
- func (builder *LayerNormBuilder) LearnedGain(value bool) *LayerNormBuilder
- func (builder *LayerNormBuilder) LearnedOffset(value bool) *LayerNormBuilder
- func (builder *LayerNormBuilder) Mask(mask *Node) *LayerNormBuilder
- func (builder *LayerNormBuilder) ScaleNormalization(value bool) *LayerNormBuilder
- type MultiHeadAttentionBuilder
- func (b *MultiHeadAttentionBuilder) Done() (output *Node)
- func (b *MultiHeadAttentionBuilder) DoneWithCoefficients() (attentionOutput, attentionCoefficients *Node)
- func (b *MultiHeadAttentionBuilder) Dropout(rate float64) *MultiHeadAttentionBuilder
- func (b *MultiHeadAttentionBuilder) SetKeyMask(keyMask *Node) *MultiHeadAttentionBuilder
- func (b *MultiHeadAttentionBuilder) SetKeyQueryDim(keyQueryDim int) *MultiHeadAttentionBuilder
- func (b *MultiHeadAttentionBuilder) SetOutputDim(outputDim int) *MultiHeadAttentionBuilder
- func (b *MultiHeadAttentionBuilder) SetQueryKeyMatrixMask(queryKeyMatrixMask *Node) *MultiHeadAttentionBuilder
- func (b *MultiHeadAttentionBuilder) SetQueryMask(queryMask *Node) *MultiHeadAttentionBuilder
- func (b *MultiHeadAttentionBuilder) SetValueHeadDim(valueDim int) *MultiHeadAttentionBuilder
- func (b *MultiHeadAttentionBuilder) UseCausalMask() *MultiHeadAttentionBuilder
- func (b *MultiHeadAttentionBuilder) UseProjectionBias(useProjectionBias bool) *MultiHeadAttentionBuilder
Constants ¶
const (
	// ParamL2Regularization context hyperparameter defines the L2 regularization of kernels.
	// Each layer may decide independently to implement it or not.
	//
	// This is an alias to regularizers.ParamL2.
	// Dense, DenseWithBias, FNN, kan and Convolution kernels look at this hyperparameter.
	// The value should be a float64. The default is `0.0`.
	//
	// Deprecated: use regularizers.ParamL2
	ParamL2Regularization = "l2_regularization"

	// ParamDropoutRate context hyperparameter defines the amount of dropout applied when DropoutFromContext is used.
	// Should be a value from `0.0` to `1.0`, where 0 means no dropout, and 1 would drop everything out.
	//
	// It is only applied if `Context.IsTraining() == true`, that is, during evaluation/inference it is ignored.
	//
	// The default is `0.0`, which means no dropout.
	ParamDropoutRate = "dropout_rate"
)
Variables ¶
var (
	// ParamLayerNormEpsilon is the context parameter that defines the default layer normalization epsilon value.
	// The default is 1e-3.
	ParamLayerNormEpsilon = "layer_norm_epsilon"

	// ParamLayerNormCenter is the context parameter that defines whether to center the norm by default.
	// The default is true.
	ParamLayerNormCenter = "layer_norm_center"

	// ParamLayerNormLearnedGain is the context parameter that defines whether to learn a gain for the
	// layer norm, which multiplies its output.
	// The default is true.
	ParamLayerNormLearnedGain = "layer_norm_learned_gain"

	// ParamLayerNormLearnedScale is an alias to ParamLayerNormLearnedGain.
	// Deprecated: renamed to follow the original paper's nomenclature.
	ParamLayerNormLearnedScale = ParamLayerNormLearnedGain

	// ParamLayerNormRescale is the context parameter that defines whether to rescale the layer
	// by dividing it by the square root of the variance.
	// The default is true.
	ParamLayerNormRescale = "layer_norm_rescale"

	// ParamLayerNormL2Regularization is the context parameter that defines the amount of L2 regularization
	// to apply to the learned gain, if one is defined.
	// The default is 0.0.
	ParamLayerNormL2Regularization = "layer_norm_l2_regularization"
)
var (
	// KnownNormalizers is a map of normalizer name to a function that applies it
	// with the default values, with the feature axis set to -1. This will only work
	// for the most standard problems, since anything with a different shape will need
	// special feature axes configuration for each normalization technique.
	//
	// It includes "none", which is a no-op.
	//
	// Notice that some normalizers use variables, and they need to be unique
	// in their scope (`Context.In(scope)`) -- except if one wants to deliberately share
	// normalization variables across more than one application.
	KnownNormalizers = map[string]func(ctx *context.Context, input *Node) *Node{
		"batch": func(ctx *context.Context, input *Node) *Node {
			return batchnorm.New(ctx, input, -1).Done()
		},
		"layer": func(ctx *context.Context, input *Node) *Node {
			return LayerNormalization(ctx, input, -1).Done()
		},
		"none": func(ctx *context.Context, input *Node) *Node { return input },
	}

	// ParamNormalization context hyperparameter defines the type of normalization to use
	// between layers of a neural network.
	//
	// It is used if the model calls NormalizeFromContext or MaskedNormalizeFromContext on the embeddings in
	// between layers. This is usually applied after a residual sum (but model choices vary).
	//
	// Valid values are "layer" for [LayerNormalization], "batch" for [batchnorm.New] or "none".
	//
	// Notice that this won't work for special shape setups:
	// [batchnorm.New] will normalize on the batch axis (assumed to be axis-0), and
	// [LayerNormalization] will normalize across the layer values, assumed to be the last axis.
	//
	// The default is `layer`.
	ParamNormalization = "normalization"
)
Functions ¶
func AddL2Regularization ¶ deprecated
AddL2Regularization calculates the L2 of the given values (typically variable nodes returned by context.Variable.ValueGraph()), scales it by the given amount (typically a constant), and adds the resulting value to the loss with train.AddLoss, having the effect of regularizing the weights (variables).
Deprecated: use package regularizers instead.
func AddL2RegularizationStatic ¶ deprecated, added in v0.9.0
AddL2RegularizationStatic is the equivalent of AddL2Regularization, but takes the regularization `amount` as a plain float64 instead of a *Node.
Deprecated: use package regularizers instead.
func AssertQuantilesForPWLCalibrationValid ¶ added in v0.5.0
func AssertQuantilesForPWLCalibrationValid[T constraints.Ordered](values []T)
AssertQuantilesForPWLCalibrationValid validates that raw values for quantiles are ok to be used for PieceWiseLinearCalibration. It checks for:
- Enough data points.
- Monotonicity of data points: quantiles should always be increasing.
Errors are reported back with `panic`.
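For illustration, a minimal sketch of validating quantiles before using them with PieceWiseLinearCalibration -- the values here are purely hypothetical:

```
// quantiles would normally be computed from the data; these values are illustrative only.
quantiles := []float32{-1.0, 0.0, 0.5, 2.3, 10.0}
layers.AssertQuantilesForPWLCalibrationValid(quantiles) // Panics if there are too few points or they are not increasing.
```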
func Dense ¶
Dense adds a single dense linear layer, a learnable linear transformation. Optionally, it can include a bias term.
It automatically adds regularization to the weights (not to biases) configured in hyperparameters -- see regularizers.FromContext.
If the input has shape `[<batch dimensions...>, featureDimension]`, the output will have shape `[<batch dimensions...>, <outputDimensions...>]`.
See also FNN for a more configurable (including hidden layers) version.
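For illustration, a minimal sketch of a final dense projection inside a model graph -- `ctx`, `input` and `numClasses` are assumed to come from the caller:

```
// input is shaped [<batch dimensions...>, featureDimension]; logits will be
// shaped [<batch dimensions...>, numClasses].
logits := layers.DenseWithBias(ctx.In("readout"), input, numClasses)
```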
func DenseWithBias ¶
DenseWithBias adds a single dense linear layer, a learnable linear transformation plus a bias term.
If the input has shape `[<batch dimensions...>, featureDimension]`, the output will have shape `[<batch dimensions...>, <outputDimensions...>]`.
See also FNN for a more configurable (including hidden layers) version.
func Dropout ¶
Dropout randomly replaces values of the input with zeros if ctx.IsTraining() is true. Otherwise, it's a no-op (it returns input). If the input is a float, it scales the output by 1/(1-dropoutRate) to preserve the mean of the input values.
func DropoutFromContext ¶ added in v0.9.0
DropoutFromContext applies a dropout configured in the context parameters keyed by ParamDropoutRate.
If it is 0.0 this is a no-op. If `Context.IsTraining() == false` this is also a no-op, so it doesn't impact evaluation or inference.
func DropoutNormalize ¶
DropoutNormalize randomly replaces values of the input with zeros if ctx.IsTraining() is true. Otherwise, it's a no-op (it returns input). If normalize is set, it scales the output by 1/(1-dropoutRate) to preserve the mean of the input values.
func DropoutStatic ¶ added in v0.9.0
DropoutStatic is the same as Dropout, but it takes the `dropoutRate` as a static value, given as a float64. If `dropoutRate <= 0` or it's not training, this is a no-op.
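For illustration, a hedged sketch applying a fixed 20% dropout to some hidden activations -- `ctx` and `hidden` are assumed to come from the surrounding model graph:

```
// No-op during evaluation/inference; during training, roughly 20% of the values
// are zeroed and the remaining ones are scaled by 1/(1-0.2).
hidden = layers.DropoutStatic(ctx, hidden, 0.2)
```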
func Embedding ¶
func Embedding(ctx *context.Context, input *Node, dtype dtypes.DType, vocabSize, dimension int) *Node
Embedding creates an embedding table with vocabSize elements (typically a vocabulary size) each of dimension values -- so a [vocabSize, dimension] variable table.
It then converts each integer value of the input to an embedding of the given dimension size. The input must have an integer dtype, and the last dimension must be of size 1. If it's not of size one, an extra dimension is added to the end. All values of the input must be smaller than vocabSize, otherwise it will fail -- no explicit checking is made.
The output has rank one larger than the input, with the last dimension the same as the embedding dimension.
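For illustration, a hedged sketch of embedding integer token ids -- `tokenIDs` and `vocabSize` are hypothetical names, and `dtypes.Float32` is assumed to be the desired embedding dtype:

```
// tokenIDs is an integer *Node; each id is mapped to a learned 64-dimensional
// vector from a [vocabSize, 64] embedding table.
embedded := layers.Embedding(ctx.In("embedding"), tokenIDs, dtypes.Float32, vocabSize, 64)
```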
func MaskedNormalizeFromContext ¶ added in v0.9.0
MaskedNormalizeFromContext applies a normalization (or none) according to the hyperparameter ParamNormalization configured in the context. The `mask` is actually optional, and can be set to nil if not using a mask.
This is not recommended for images, since one may want to normalize over specific axes.
func MustNormalizeByName ¶
MustNormalizeByName applies the requested normalization using default parameters. If an invalid normalization is given, it panics with an error.
This will only work for the most standard problems, since anything with a different shape will need special feature axes configuration for each normalization technique.
It's a simple wrapper around KnownNormalizers; if one wants to handle errors instead of panicking, check against its values directly. For valid values see KnownNormalizers.
Some layer libraries will use this by default for you, taking the value from the context -- e.g.: fnn.New.
If not, one example of its use:
```
var flagNormalization = flag.String("norm", "none",
	fmt.Sprintf("Type of layer normalization to use. Valid values: %q.",
		types.SortedKeys(layers.KnownNormalizers)))

...

func ModelGraph(...) {
	...
	logits = MustNormalizeByName(ctx, *flagNormalization, logits)
	...
}
```
func Normalize ¶ added in v0.4.0
func Normalize(x *Node, independentAxes ...int) *Node
Normalize shifts and scales the input such that the mean becomes zero and the variance one. It calculates `(x - mean(x)) / (sigma(x))`, where sigma is the standard deviation.
The parameter `independentAxes` lists the axes that should not be normalized together. A typical value is -1, the feature axis (last axis), so that each feature gets its own normalization.
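For illustration, a minimal sketch normalizing a `[batchSize, numFeatures]` tensor so that each feature keeps its own statistics -- `x` is assumed to come from the caller:

```
// Each feature (last axis) is shifted and scaled independently to mean 0 and
// variance 1, computed over the remaining axes.
normalized := layers.Normalize(x, -1)
```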
func NormalizeFromContext ¶ added in v0.9.0
NormalizeFromContext applies a normalization (or none) according to the hyperparameter ParamNormalization configured in the context.
This is not recommended for images, since one may want to normalize over specific axes.
func PieceWiseLinearCalibration ¶
func PieceWiseLinearCalibration(ctx *context.Context, input, keypoints *Node, outputTrainable bool) *Node
PieceWiseLinearCalibration creates a piece-wise linear function from the input, splitting it in the given keypoints with outputs initialized with values from 0 to 1.
The keypoints are typically quantiles of the input feature, starting with the minimum value and ending on the maximum. It must have rank-1 and be of the same DType as input. Its values must be ordered, and cannot be repeated (this may lead to NaNs). Consider using AssertQuantilesForPWLCalibrationValid on the quantiles.
If outputTrainable is set to true, the outputs mapped to the keypoints are made trainable, and may change to values outside the range [0, 1].
In any case, if the input is beyond the first or last keypoint, the output of the function flattens, preventing any extrapolation (extrapolations are often harmful in neural networks).
This is a simpler version of the one described here: https://www.tensorflow.org/lattice/api_docs/python/tfl/layers/PWLCalibration
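For illustration, a hedged sketch -- `input` is assumed to be a feature Node, and `keypoints` a rank-1 constant *Node holding its quantiles (same dtype as `input`), built elsewhere:

```
// The outputs mapped to the keypoints are kept trainable (last argument true),
// so they may learn values outside the range [0, 1].
calibrated := layers.PieceWiseLinearCalibration(ctx.In("pwl_calibration"), input, keypoints, true)
```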
func PieceWiseLinearCalibrationCascaded ¶
func PieceWiseLinearCalibrationCascaded(ctx *context.Context, input, keypoints *Node, outputTrainable bool) *Node
PieceWiseLinearCalibrationCascaded is a similar implementation to PieceWiseLinearCalibration that is equally powerful (it can express the same functions), simpler (fewer ops) and faster, but it is parametrized differently (as cascaded linear functions), and may have different learning characteristics when doing gradient descent.
Types ¶
type ConvBuilder ¶
type ConvBuilder struct {
// contains filtered or unexported fields
}
ConvBuilder is a helper to build a convolution computation. Create it with Convolution, set the desired parameters and when all is set, call Done.
func Convolution ¶
func Convolution(ctx *context.Context, x *Node) *ConvBuilder
Convolution prepares one convolution on x with the given kernel for arbitrary number of spatial dimensions (1D, 2D, 3D, etc.).
It is very flexible and to ease setting its parameters it returns a ConvBuilder object for configuration. Once it is set up call `ConvBuilder.Done` and it will return the convolved x. Browse through ConvBuilder to see the capabilities, and the defaults.
Two parameters need setting: Filters (or channels) and KernelSize. It will fail if they are not set.
The shape of x should be `[batch, <spatial_dimensions...>, input_channels]` if configured with `ConvBuilder.ChannelsAxis(images.ChannelsLast)`, the default. If one sets `ConvBuilder.ChannelsAxis(images.ChannelsFirst)`, the shape should be `[batch, input_channels, <spatial_dimensions...>]` instead.
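For illustration, a minimal sketch of a 2D convolution over images shaped `[batch, height, width, channels]` (channels-last, the default) -- `ctx` and `images` are assumed to come from the caller:

```
// 32 output channels, 3x3 kernel, and "same" padding so the spatial dimensions
// are preserved (assuming strides of 1).
conv := layers.Convolution(ctx.In("conv1"), images).
	Filters(32).
	KernelSize(3).
	PadSame().
	Done()
```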
func (*ConvBuilder) ChannelsAxis ¶ added in v0.3.0
func (conv *ConvBuilder) ChannelsAxis(channelsAxisConfig images.ChannelsAxisConfig) *ConvBuilder
ChannelsAxis configures the axis for the channels (aka. "depth" or "features") dimension. The default is `images.ChannelsLast`, meaning the "channels" dimension comes last.
Note: `images` refers to package `github.com/gomlx/gomlx/types/tensor/image`.
It returns the modified Config object, so calls can be cascaded.
func (*ConvBuilder) CurrentScope ¶ added in v0.3.0
func (conv *ConvBuilder) CurrentScope() *ConvBuilder
CurrentScope configures the convolution not to create a sub-scope for the kernel weights it needs, and instead use the current one provided in Convolution.
By default, Convolution will create a sub-scope named "conv".
func (*ConvBuilder) DilationPerDim ¶
func (conv *ConvBuilder) DilationPerDim(dilations ...int) *ConvBuilder
DilationPerDim sets the kernel dilations for each spatial dimension of the convolution. The default is 1 for every dimension.
Specifies the kernel up-sampling rate. In the literature, the same parameter is sometimes called input stride or dilation. The effective kernel size used for the convolution will be `kernel_shape + (kernel_shape - 1) * (dilation - 1)`, obtained by inserting (dilation-1) zeros between consecutive elements of the original filter in the spatial dimension.
One cannot use strides and dilation at the same time.
func (*ConvBuilder) Dilations ¶
func (conv *ConvBuilder) Dilations(dilation int) *ConvBuilder
Dilations sets the dilations of the convolution. It sets the same value for every dimension. The default is 1.
It specifies the kernel up-sampling rate. In the literature, the same parameter is sometimes called input stride or dilation. The effective kernel size used for the convolution will be `kernel_shape + (kernel_shape - 1) * (dilation - 1)`, obtained by inserting (dilation-1) zeros between consecutive elements of the original filter in the spatial dimension.
One cannot use strides and dilation at the same time.
func (*ConvBuilder) Done ¶
func (conv *ConvBuilder) Done() *Node
Done indicates that the Convolution layer is finished being configured. It then creates the convolution and its kernels (variables) and returns the resulting Node.
func (*ConvBuilder) Filters ¶
func (conv *ConvBuilder) Filters(filters int) *ConvBuilder
Filters sets the number of filters -- it specifies the number of output channels. There is no default, and this value must be set before Done is called.
func (*ConvBuilder) KernelSize ¶
func (conv *ConvBuilder) KernelSize(size int) *ConvBuilder
KernelSize sets the kernel size for every axis. There is no default, and this value must be set before Done is called.
You can also use KernelSizePerDim to set the kernel size per dimension (axis) individually.
func (*ConvBuilder) KernelSizePerDim ¶
func (conv *ConvBuilder) KernelSizePerDim(sizes ...int) *ConvBuilder
KernelSizePerDim sets the kernel size for each dimension (axis). There is no default, and these values must be set before Done is called.
You can also use KernelSize to set the kernel size the same for all dimensions.
func (*ConvBuilder) NoPadding ¶
func (conv *ConvBuilder) NoPadding() *ConvBuilder
NoPadding removes any padding, so if the kernel spatial dimensions are > 1, the output shape will be reduced at the edges.
This is the default.
func (*ConvBuilder) PadSame ¶
func (conv *ConvBuilder) PadSame() *ConvBuilder
PadSame adds paddings on the edges of x such that in the end the output of the convolution has the same shape as the input (assuming strides=1).
The default is NoPadding.
func (*ConvBuilder) Regularizer ¶ added in v0.11.0
func (conv *ConvBuilder) Regularizer(regularizer regularizers.Regularizer) *ConvBuilder
Regularizer to be applied to the learned weights (but not the biases).
To use more than one type of Regularizer, use regularizers.Combine, and set the returned combined regularizer here.
The default is regularizers.FromContext, which is configured by regularizers.ParamL1 and regularizers.ParamL2.
func (*ConvBuilder) StridePerDim ¶
func (conv *ConvBuilder) StridePerDim(strides ...int) *ConvBuilder
StridePerDim sets the strides for each spatial dimension of the convolution. The default is 1 for every dimension.
The stride is how many steps to move after a convolution. A value of 2 will halve the input size, since a convolution will be done at every other position, and so on. It can be defined separately per dimension.
One cannot use strides and dilation at the same time.
func (*ConvBuilder) Strides ¶
func (conv *ConvBuilder) Strides(strides int) *ConvBuilder
Strides sets the strides of the convolution. It sets the same value for every dimension. The default is 1.
The stride is how many steps to move after a convolution. A value of 2 will halve the input size, since a convolution will be done at every other position, and so on. It can be defined separately per dimension.
One cannot use strides and dilation at the same time.
func (*ConvBuilder) UseBias ¶
func (conv *ConvBuilder) UseBias(useBias bool) *ConvBuilder
UseBias sets whether to add a trainable bias term to the convolution. Default is true.
type LayerNormBuilder ¶
type LayerNormBuilder struct {
// contains filtered or unexported fields
}
LayerNormBuilder is a helper to build a layer normalization computation. Create it with LayerNormalization, set the desired parameters and when all is set, call Done. See LayerNormalization for details.
func LayerNormalization ¶
func LayerNormalization(ctx *context.Context, x *Node, normalizingAxes ...int) *LayerNormBuilder
LayerNormalization performs a layer normalization on the input. It includes a scaling and offset factor, and normalization over the feature entries.
This is an alternative to BatchNormalization, that doesn't suffer from the problem of variance on small batch sizes, nor does it need to keep a moving average of the normalization parameters. Commonly used with transformer layers (see MultiHeadAttention).
normalizingAxes are the axes over which to normalize: mean and variance are calculated over these axes and the values are then normalized. E.g: if your input is `[batch_size, features]` you should use `normalizingAxes=[1]` (same as -1) to normalize over the `features` axis; if your input is an image of shape [batch_size, height, width, channels] one common approach is to normalize over the image, so `normalizingAxes=[1 2]`, but not over the channels.
Notice the difference from BatchNormalization, which normalizes over the batch dimension, as opposed to the feature dimensions.
The layer norm may have a learned gain and offset, controlled by LayerNormBuilder.LearnedGain and LayerNormBuilder.LearnedOffset settings, enabled by default.
To ease setting its parameters it returns a LayerNormBuilder object for configuration. Once it is set up call `LayerNormBuilder.Done` and it will return the normalized x. Browse through LayerNormBuilder to check for its capabilities, and the defaults.
Layer normalization behaves the same during training and inference -- as opposed to batch normalization.
Based on paper "Layer Normalization" (Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton), https://arxiv.org/abs/1607.06450
FutureWork: support padding by not normalizing parts that weren't touched ...
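For illustration, a minimal sketch normalizing the last (feature) axis of `x`, keeping the default learned gain and offset -- `ctx` and `x` are assumed to come from the caller:

```
// Mean and variance are computed over the last axis and used to normalize it.
normalized := layers.LayerNormalization(ctx.In("layer_norm"), x, -1).Done()
```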
func (*LayerNormBuilder) Done ¶
func (builder *LayerNormBuilder) Done() *Node
Done finishes configuring the LayerNormalization and generates the graph computation to normalize the input.
func (*LayerNormBuilder) Epsilon ¶
func (builder *LayerNormBuilder) Epsilon(value float64) *LayerNormBuilder
Epsilon is a small float added to variance to avoid dividing by zero. It defaults to the value given by ParamLayerNormEpsilon.
It is not used if ScaleNormalization is set to false.
func (*LayerNormBuilder) LearnedGain ¶ added in v0.9.0
func (builder *LayerNormBuilder) LearnedGain(value bool) *LayerNormBuilder
LearnedGain defines whether the layer normalization tries to apply a multiplying gain to the input, a tensor with the shape of the combined normalizing axes -- so it changes the direction of the inputs; it's not simply a scalar gain.
Default is true.
func (*LayerNormBuilder) LearnedOffset ¶
func (builder *LayerNormBuilder) LearnedOffset(value bool) *LayerNormBuilder
LearnedOffset defines whether the layer normalization tries to center the input by adding a learned offset. It defaults to true.
The offset will be learned separately for each axis that is not the batch (assumed to be axis 0 only) and not any of the normalizingAxes.
func (*LayerNormBuilder) Mask ¶ added in v0.9.0
func (builder *LayerNormBuilder) Mask(mask *Node) *LayerNormBuilder
Mask sets the mask for the input values. False values in the mask should be ignored for the normalization.
func (*LayerNormBuilder) ScaleNormalization ¶
func (builder *LayerNormBuilder) ScaleNormalization(value bool) *LayerNormBuilder
ScaleNormalization defines whether the input's scale is normalized by the square root of the variance. The default is true, and this is the original paper specification, but in some cases it works best without it.
type MultiHeadAttentionBuilder ¶
type MultiHeadAttentionBuilder struct {
// contains filtered or unexported fields
}
MultiHeadAttentionBuilder is a helper to build a multi-head-attention computation. Create it with MultiHeadAttention, set the desired parameters and when all is set, call Done.
func MultiHeadAttention ¶
func MultiHeadAttention(ctx *context.Context, query, key, value *Node, numHeads int, headDim int) *MultiHeadAttentionBuilder
MultiHeadAttention defines a multi-head attention layer, as described in the paper "Attention Is All You Need", https://arxiv.org/abs/1706.03762, by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin.
It takes query, key and value and projects them numHeads times, to headDim-sized embeddings. Then it uses the dot-product of query and key as weights, and returns a softmax-weighted sum of value, for each head.
Typical shapes:
- query: `[batch_size, <query_elements>, inputQueryDim]`.
- key: `[batch_size, <num_key/value_elements>, inputKeyDim]`.
- value: `[batch_size, <num_key/value_elements>, inputValueDim]`.
After another output projection, Done returns a node of shape `[batch_size, <num_queries>, inputValueDim]`, if no other settings are given. See the settings in MultiHeadAttentionBuilder to control various aspects.
Notice it's common to use key=values, and even query=keys=values. For instance for encoding text, one may use the input sequence as all 3 (query, key and value).
The function returns a MultiHeadAttentionBuilder that can be further configured, and the resulting Node is returned when MultiHeadAttentionBuilder.Done is called. Alternatively one can call MultiHeadAttentionBuilder.DoneWithCoefficients, in which case it returns both the attention output and the attention coefficients.
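For illustration, a hedged sketch of causal self-attention over a sequence shaped `[batchSize, seqLen, embedDim]`, with 8 heads of 64 dimensions each -- `ctx` and `seq` are assumed to come from the caller:

```
// query = key = value = seq (self-attention); the causal mask prevents attending
// to future positions, and 10% dropout is applied to the attention coefficients.
attended := layers.MultiHeadAttention(ctx.In("self_attention"), seq, seq, seq, 8, 64).
	UseCausalMask().
	Dropout(0.1).
	Done()
```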
func (*MultiHeadAttentionBuilder) Done ¶
func (b *MultiHeadAttentionBuilder) Done() (output *Node)
Done or DoneWithCoefficients should be called after all optional settings are configured. Done returns only the attention output.
`output` will be shaped `[batch_size, <query_elements>, output_dim]`, where `output_dim` can be configured by `SetOutputDim`.
func (*MultiHeadAttentionBuilder) DoneWithCoefficients ¶
func (b *MultiHeadAttentionBuilder) DoneWithCoefficients() (attentionOutput, attentionCoefficients *Node)
DoneWithCoefficients or Done should be called after all optional settings are configured. It returns both the attention output and the attention coefficients (matrix) used.
`output` will be shaped `[batch_size, <query_elements>, output_dim]`, where `output_dim` can be configured by `SetOutputDim`.
`coefficients` is shaped `[batch_size, <query_elements>, <num_heads>, <key_elements>]` with the attention weights (from 0 to 1).
func (*MultiHeadAttentionBuilder) Dropout ¶
func (b *MultiHeadAttentionBuilder) Dropout(rate float64) *MultiHeadAttentionBuilder
Dropout defines how much dropout to use in the attention coefficients calculation. If set to 0 or lower, it's simply disabled. Default is 0.
func (*MultiHeadAttentionBuilder) SetKeyMask ¶
func (b *MultiHeadAttentionBuilder) SetKeyMask(keyMask *Node) *MultiHeadAttentionBuilder
SetKeyMask sets a mask for keys that are actually valid and can be attended. Defaults to no mask, meaning all keys are accessible. See also SetQueryMask.
Shape should be `[batch_size, numHeads, <key_elements>]`, or `[batch_size, <key_elements>]` if the mask is the same for every head.
Either use SetKeyMask and SetQueryMask separately or use SetKeyQueryMatrixMask, but not both. Optionally, one can also UseCausalMask, which is combined (logical-and) to any given mask.
func (*MultiHeadAttentionBuilder) SetKeyQueryDim ¶
func (b *MultiHeadAttentionBuilder) SetKeyQueryDim(keyQueryDim int) *MultiHeadAttentionBuilder
SetKeyQueryDim allows finer configuration on the dimension of the projection used for the query/key pairs for each head. It defaults to the value given by `headDim`.
func (*MultiHeadAttentionBuilder) SetOutputDim ¶
func (b *MultiHeadAttentionBuilder) SetOutputDim(outputDim int) *MultiHeadAttentionBuilder
SetOutputDim defines the output dimension of the final projection, from the flattened attention heads. It defaults to the value of the last dimension of `values` passed as input (`inputValueDim`).
func (*MultiHeadAttentionBuilder) SetQueryKeyMatrixMask ¶
func (b *MultiHeadAttentionBuilder) SetQueryKeyMatrixMask(queryKeyMatrixMask *Node) *MultiHeadAttentionBuilder
SetQueryKeyMatrixMask sets a mask matrix that defines which queries can attend to which keys. Defaults to no mask, meaning every query can attend to every key.
Shape should be `[batch_size, numHeads, <query_elements>, <key_elements>]`, or `[batch_size, <query_elements>, <key_elements>]` if the mask is the same for every head.
Either use SetKeyMask and SetQueryMask separately or use SetKeyQueryMatrixMask, but not both. Optionally, one can also UseCausalMask, which is combined (logical-and) to any given mask.
func (*MultiHeadAttentionBuilder) SetQueryMask ¶
func (b *MultiHeadAttentionBuilder) SetQueryMask(queryMask *Node) *MultiHeadAttentionBuilder
SetQueryMask sets a mask for queries that are actually valid and should be used. Defaults to no mask, meaning all queries are accessible. See also SetKeyMask.
Shape should be `[batch_size, numHeads, <query_elements>]`, or `[batch_size, <query_elements>]` if the mask is the same for every head.
Either use SetKeyMask and SetQueryMask separately or use SetKeyQueryMatrixMask, but not both. Optionally, one can also UseCausalMask, which is combined (logical-and) to any given mask.
func (*MultiHeadAttentionBuilder) SetValueHeadDim ¶
func (b *MultiHeadAttentionBuilder) SetValueHeadDim(valueDim int) *MultiHeadAttentionBuilder
SetValueHeadDim allows finer configuration on the dimension of the projection used for the value for each head. It defaults to the value given by `headDim`.
func (*MultiHeadAttentionBuilder) UseCausalMask ¶
func (b *MultiHeadAttentionBuilder) UseCausalMask() *MultiHeadAttentionBuilder
UseCausalMask adds a mask where a query can only attend to keys with lower indices than itself. It assumes that query and key are either the same or have the same inner shape, and there is only one inner rank -- so key/query should have rank-3 shape `[batch, inner_dim, key/query_dim]`.
This mask can be used in combination (logical-and) with other masks.
func (*MultiHeadAttentionBuilder) UseProjectionBias ¶
func (b *MultiHeadAttentionBuilder) UseProjectionBias(useProjectionBias bool) *MultiHeadAttentionBuilder
UseProjectionBias defines whether to use a bias term on the final output projection. Default is true.
Source Files ¶
Directories ¶
| Path | Synopsis |
|---|---|
| activations | Package activations implements several common activations, and includes a generic Apply method to apply an activation by its type. |
| batchnorm | Package batchnorm implements a batch normalization layer, and associated tools. |
| bsplines | Package bsplines provides a GoMLX version of github.com/gomlx/bsplines: it provides evaluation of bsplines curves, which can be used as layers. |
| fnn | Package fnn implements a generic FNN (Feedforward Neural Network) with various configurations. |
| kan | Package kan implements generic Kolmogorov–Arnold Networks, as described in https://arxiv.org/pdf/2404.19756 |
| rational | Package rational implements "learnable rational functions". |
| regularizers | Package regularizers adds tools to facilitate adding regularization to the learned weights. |