layers

package
v0.1.0

Published: Apr 28, 2023 License: Apache-2.0 Imports: 12 Imported by: 0

Documentation

Overview

Package layers holds a collection of common modeling layers. It includes the dense layer, convolutions, activation functions, dropout, etc.

A small convention on naming: typically layers are nouns (like "Convolution", "Dense" (layer), "MultiHeadAttention"), while computations are usually verbs ("Convolve", "Reduce..", "Multiply (Mul)", etc.).

Index

Constants

View Source
const (
	// L2RegularizationKey is the key to a context.Context.Params that defines the default L2 regularization
	// of kernels. Each layer may decide independently to implement it or not. DenseWithBias and Convolution kernels
	// look at this hyperparameter. The value should be a float64.
	L2RegularizationKey = "l2_regularization"
)
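
A hedged sketch of enabling this hyperparameter for the layers that honor it (e.g. DenseWithBias and Convolution kernels). The SetParam method below is an assumption for illustration only -- check the context package for the actual API to set Params:

```
// Hypothetical: set a default L2 regularization of 1e-4 in the context Params.
// SetParam is assumed; consult the context package for the real call.
ctx.SetParam(layers.L2RegularizationKey, 1e-4)
```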

Variables

View Source
var (
	// KnownNormalizers is a map of normalizer string to a function that applies them
	// with the default values, with the feature axis set to -1. This will only work
	// for the most standard problems, since anything with a different shape will need
	// special feature axes configuration for each normalization technique.
	//
	// It includes "none", which is a no-op.
	//
	// Notice that some normalizers use variables, and they need to be unique
	// in their scope (`Context.In(scope)`) -- except if one wants to deliberately share
	// normalization variables across more than one application.
	KnownNormalizers = map[string]func(ctx *context.Context, input *Node) *Node{
		"batch": func(ctx *context.Context, input *Node) *Node {
			return BatchNormalization(ctx, input, -1).Done()
		},
		"norm": func(ctx *context.Context, input *Node) *Node {
			return LayerNormalization(ctx, input, -1).Done()
		},
		"none": func(ctx *context.Context, input *Node) *Node {
			return input
		},
	}
)
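
A minimal sketch of using KnownNormalizers directly, assuming ctx, input and normName (e.g. "batch", "norm" or "none") already exist:

```
// Look up a normalizer by name and apply it with default settings.
normalizerFn, found := layers.KnownNormalizers[normName]
if !found {
	log.Fatalf("unknown normalizer %q", normName)
}
output := normalizerFn(ctx.In("normalization"), input)
```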

Functions

func AddL2Regularization

func AddL2Regularization(ctx *context.Context, amount *Node, values ...*Node)

AddL2Regularization calculates the L2 of the given values (typically variable nodes returned by context.Variable.ValueGraph()), scales it by the given amount (typically a constant) and then adds the result to the loss with train.AddLoss, having the effect of regularizing the weights (variables).
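
A minimal sketch, assuming amount is a scalar *Node with the regularization strength and w1, w2 are the in-graph values of two weight variables:

```
// Add an L2 penalty on two weight nodes to the training loss.
layers.AddL2Regularization(ctx, amount, w1, w2)
```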

func Dense

func Dense(ctx *context.Context, input *Node, useBias bool, outputDimensions ...int) *Node

Dense adds a dense linear layer, a learnable linear transformation. Optionally it can include a bias term.

If the input has shape `[<batch dimensions...>, featureDimension]`, the output will have shape `[<batch dimensions...>, <outputDimensions...>]`.
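
A minimal sketch, assuming ctx and input already exist in a model graph:

```
// A learnable linear projection from [batch, numFeatures] to [batch, 32, 8]
// (two output dimensions), without a bias term.
projected := layers.Dense(ctx.In("proj"), input, false, 32, 8)
```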

func DenseWithBias

func DenseWithBias(ctx *context.Context, input *Node, outputDimensions ...int) *Node

DenseWithBias adds a dense linear layer, a learnable linear transformation plus a bias term.

If the input has shape `[<batch dimensions...>, featureDimension]`, the output will have shape `[<batch dimensions...>, <outputDimensions...>]`.

func Dropout

func Dropout(ctx *context.Context, input *Node, dropoutRate *Node) *Node

Dropout randomly replaces values of the input with zeros if ctx.IsTraining() is true. Otherwise, it's a no-op (it returns input). It scales the output by 1/(1-dropoutRate) to preserve the mean of the input values.

func DropoutNormalize

func DropoutNormalize(ctx *context.Context, input *Node, dropoutRate *Node, normalize bool) *Node

DropoutNormalize randomly replaces values of the input with zeros if ctx.IsTraining() is true. Otherwise, it's a no-op (it returns input). If normalize is set, it scales the output by 1/(1-dropoutRate) to preserve the mean of the input values.
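
A minimal sketch of Dropout, assuming ctx, input and a scalar *Node dropoutRate (e.g. holding 0.1) already exist:

```
// Drops values only during training; a no-op during inference.
out := layers.Dropout(ctx, input, dropoutRate)
```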

func Embedding

func Embedding(ctx *context.Context, input *Node, dtype shapes.DType, vocabSize, dimension int) *Node

Embedding creates an embedding table with vocabSize entries (typically a vocabulary size), each with dimension values -- that is, a [vocabSize, dimension] variable table.

It then converts each integer value of the input to an embedding of the given dimension size. The input must have an integer dtype, and the last dimension must be of size 1. If it's not of size one, an extra dimension is added to the end. All values of the input must be smaller than vocabSize, otherwise it will fail -- no explicit check is made.

The output has rank one larger than the input, with the last dimension the same as the embedding dimension.
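
A minimal sketch, assuming ctx and an integer *Node tokenIDs already exist; shapes.Float32 is assumed to be the float32 DType constant:

```
// Map token ids to 32-dimensional embeddings from a vocabulary of 10000 entries.
embedded := layers.Embedding(ctx.In("embed"), tokenIDs, shapes.Float32, 10000, 32)
```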

func MustNormalizeByName

func MustNormalizeByName(ctx *context.Context, normalization string, input *Node) *Node

MustNormalizeByName applies the requested normalization using default parameters. If an invalid normalization is given, it panics with an error.

This will only work for the most standard problems, since anything with a different shape will need special feature axes configuration for each normalization technique.

It's a simple wrapper around KnownNormalizers; to handle errors instead of panicking, check that map directly. For valid values see KnownNormalizers.

Typical use:

```
var flagNormalization = flag.String("norm", "none",
	fmt.Sprintf("Type of layer normalization to use. Valid values: %q.",
		types.SortedKeys(layers.KnownNormalizers)))

...

func ModelGraph(...) {
	...
	logits = MustNormalizeByName(ctx, *flagNormalization, logits)
	...
}
```

func PieceWiseLinearCalibration

func PieceWiseLinearCalibration(ctx *context.Context, input, keypoints *Node, outputTrainable bool) *Node

PieceWiseLinearCalibration creates a piece-wise linear function from the input, splitting it at the given keypoints, with outputs initialized with values from 0 to 1.

The keypoints are typically quantiles of the input feature, starting with the minimum value and ending on the maximum. It must have rank-1 and be of the same DType as input. Its values must be ordered, and cannot be repeated (this may lead to NaNs). Consider using ValidateQuantilesForPWLCalibration on the quantiles.

If outputTrainable is set to true, the outputs mapped to the keypoints are made trainable, and may change to values outside the range [0, 1].

In any case, if the input is beyond the first or last keypoint, the output of the function flattens, preventing any extrapolation (extrapolations are often bad in NNs).

This is a simpler version of the one described here: https://www.tensorflow.org/lattice/api_docs/python/tfl/layers/PWLCalibration
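
A minimal sketch, assuming ctx, a feature *Node and a rank-1, sorted keypoints *Node (e.g. the feature's quantiles, same dtype as the feature) already exist:

```
// Calibrate a scalar feature with trainable output keypoints.
calibrated := layers.PieceWiseLinearCalibration(ctx.In("pwl_feature"), feature, keypoints, true)
```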

func PieceWiseLinearCalibrationCascaded

func PieceWiseLinearCalibrationCascaded(ctx *context.Context, input, keypoints *Node, outputTrainable bool) *Node

PieceWiseLinearCalibrationCascaded is an implementation similar to PieceWiseLinearCalibration that is equally powerful (it expresses the same functions), simpler (fewer ops) and faster, but it is parametrized differently (as cascaded linear functions) and may have different learning characteristics when doing gradient descent.

func Relu

func Relu(x *Node) *Node

Relu returns Max(x, 0), and is commonly used as an activation function in neural networks.
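
A minimal sketch combining Relu with a dense layer, assuming ctx and input already exist:

```
// A hidden dense layer with bias, followed by a Relu activation.
hidden := layers.Relu(layers.DenseWithBias(ctx.In("hidden"), input, 128))
```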

func ValidateQuantilesForPWLCalibration

func ValidateQuantilesForPWLCalibration[T constraints.Ordered](values []T) error

ValidateQuantilesForPWLCalibration validates that raw quantile values are suitable to be used for PieceWiseLinearCalibration. It checks for:

  • Enough data points.
  • Monotonicity of data points: quantiles should always be increasing.
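
A minimal sketch of validating quantiles before using them as keypoints:

```
// Quantiles computed from the raw feature; must be increasing.
quantiles := []float64{0.0, 0.5, 1.3, 2.7, 10.0}
if err := layers.ValidateQuantilesForPWLCalibration(quantiles); err != nil {
	log.Fatalf("invalid quantiles: %v", err)
}
```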

Types

type BatchNormBuilder

type BatchNormBuilder struct {
	// contains filtered or unexported fields
}

BatchNormBuilder is a helper to build a batch normalization computation. Create it with BatchNormalization, set the desired parameters and when all is set, call Done.

func BatchNormalization

func BatchNormalization(ctx *context.Context, x *Node, featureAxis int) *BatchNormBuilder

BatchNormalization performs a batch normalization layer on the input. It includes a scaling and offset factor, and normalization over the batch entries. It maintains a moving average mean and variance of the inputs which is later used during inference.

featureAxis is the axis over which **not to normalize**: this will normalize over the other dimensions, calculating the mean and variance by reducing all other dimensions. E.g.: if your input is `[batch_size, features]` you should use featureAxis=1 (same as -1) to normalize over the batch; if your input is an image of shape `[batch_size, height, width, channels]` you should use featureAxis=3 (same as -1) to normalize over the batch and all the pixels, so each channel is normalized separately, but normalization happens over all the pixels of the whole batch.

Notice the difference from LayerNormalization, which normalizes over the feature dimensions, as opposed to the batch dimension.

To ease setting its parameters it returns a BatchNormBuilder object for configuration. Once it is set up call `BatchNormBuilder.Done` and it will return the normalized x. Browse through BatchNormBuilder to see the capabilities, and the defaults.

Batch normalization behaves differently during training and inference: during training it normalizes over the batch (so it likely won't work well for very small batch sizes) and in inference it normalizes using the collected moving average of the mean and variance.

Based on paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" (Sergey Ioffe, Christian Szegedy), https://arxiv.org/abs/1502.03167.

FutureWork: 1. Support padding by not normalizing parts that weren't touched. 2. Support selection of multiple feature axes.
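
A minimal usage sketch, assuming ctx and an image batch *Node already exist:

```
// Batch-normalize the channels of images shaped [batch, height, width, channels],
// with a slower-moving average for the running statistics.
normalized := layers.BatchNormalization(ctx.In("batch_norm"), images, -1).
	Momentum(0.999).
	Done()
```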

func (*BatchNormBuilder) Center

func (builder *BatchNormBuilder) Center(value bool) *BatchNormBuilder

Center defines whether the batch normalization tries to center the input by adding a learned offset. Defaults to true.

func (*BatchNormBuilder) Done

func (builder *BatchNormBuilder) Done() *Node

Done finishes configuring the BatchNormalization and generates the graph computation to normalize the input.

func (*BatchNormBuilder) Epsilon

func (builder *BatchNormBuilder) Epsilon(value float64) *BatchNormBuilder

Epsilon is a small float added to variance to avoid dividing by zero. Defaults to 1e-3.

func (*BatchNormBuilder) Momentum

func (builder *BatchNormBuilder) Momentum(value float64) *BatchNormBuilder

Momentum sets the momentum of the moving average of the mean and variance maintained during training. This averaged mean and variance is used during inference for normalization. The default is 0.99.

func (*BatchNormBuilder) Scale

func (builder *BatchNormBuilder) Scale(value bool) *BatchNormBuilder

Scale defines whether the batch normalization tries to scale the input by multiplying it by a learned scale. Defaults to true.

type ConvBuilder

type ConvBuilder struct {
	// contains filtered or unexported fields
}

ConvBuilder is a helper to build a convolution computation. Create it with Convolution, set the desired parameters and when all is set, call Done.

func Convolution

func Convolution(ctx *context.Context, x *Node) *ConvBuilder

Convolution prepares a convolution on x with the given kernel for arbitrary number of spatial dimensions (1D, 2D, 3D, etc.).

It is very flexible and to ease setting its parameters it returns a ConvBuilder object for configuration. Once it is set up call `ConvBuilder.Done` and it will return the convolved x. Browse through ConvBuilder to see the capabilities, and the defaults.

Two parameters need setting: Filters (or channels) and KernelSize. It will fail if they are not set.

The shape of x should be `[batch, <spatial_dimensions...>, input_channels]` if configured with `ConvBuilder.ChannelsAfter()`, the default. If one sets `ConvBuilder.ChannelsFirst()`, the shape should be `[batch, input_channels, <spatial_dimensions...>]` instead.
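
A minimal usage sketch, assuming ctx and an image batch *Node (channels last, the default) already exist:

```
// A 2D convolution over images shaped [batch, height, width, 3], with
// 16 filters, a 3x3 kernel, "same" padding and stride 2.
conv := layers.Convolution(ctx.In("conv_1"), images).
	Filters(16).
	KernelSize(3).
	PadSame().
	Strides(2).
	Done()
```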

func (*ConvBuilder) ChannelsAfter

func (conv *ConvBuilder) ChannelsAfter() *ConvBuilder

ChannelsAfter specifies the order of the dimensions for x and kernel. This is the default.

If this is set x should be shaped `[batch, <spatial_dimensions...>, channels]`.

func (*ConvBuilder) ChannelsFirst

func (conv *ConvBuilder) ChannelsFirst() *ConvBuilder

ChannelsFirst specifies the order of the dimensions for x and kernel. The default is ChannelsAfter.

If this is set x should be shaped `[batch, channels, <spatial_dimensions...>]`.

func (*ConvBuilder) DilationPerDim

func (conv *ConvBuilder) DilationPerDim(dilations ...int) *ConvBuilder

DilationPerDim sets the kernel dilations for each spatial dimension of the convolution. The default is 1 for every dimension.

Specifies the kernel up-sampling rate. In the literature, the same parameter is sometimes called input stride or dilation. The effective kernel size used for the convolution will be `kernel_shape + (kernel_shape - 1) * (dilation - 1)`, obtained by inserting (dilation-1) zeros between consecutive elements of the original filter in the spatial dimension.

One cannot use strides and dilation at the same time.

func (*ConvBuilder) Dilations

func (conv *ConvBuilder) Dilations(dilation int) *ConvBuilder

Dilations sets the dilations of the convolution. It sets the same value for every dimension. The default is 1.

Specifies the kernel up-sampling rate. In the literature, the same parameter is sometimes called input stride or dilation. The effective kernel size used for the convolution will be `kernel_shape + (kernel_shape - 1) * (dilation - 1)`, obtained by inserting (dilation-1) zeros between consecutive elements of the original filter in the spatial dimension.

One cannot use strides and dilation at the same time.

func (*ConvBuilder) Done

func (conv *ConvBuilder) Done() *Node

Done indicates that the Convolution layer is finished being configured. It then creates the convolution and its kernels (variables) and returns the resulting Node.

func (*ConvBuilder) Filters

func (conv *ConvBuilder) Filters(filters int) *ConvBuilder

Filters sets the number of filters -- it specifies the number of output channels. There is no default, and this number must be set before Done is called.

func (*ConvBuilder) KernelSize

func (conv *ConvBuilder) KernelSize(size int) *ConvBuilder

KernelSize sets the kernel size for every axis. There is no default, and this number must be set before Done is called.

You can also use KernelSizePerDim to set the kernel size per dimension (axis) individually.

func (*ConvBuilder) KernelSizePerDim

func (conv *ConvBuilder) KernelSizePerDim(sizes ...int) *ConvBuilder

KernelSizePerDim sets the kernel size for each dimension (axis). There is no default, and these numbers must be set before Done is called.

You can also use KernelSize to set the kernel size the same for all dimensions.

func (*ConvBuilder) NoPadding

func (conv *ConvBuilder) NoPadding() *ConvBuilder

NoPadding removes any padding, so if the kernel spatial dimensions are > 1, the output shape will be reduced on the edges.

This is the default.

func (*ConvBuilder) PadSame

func (conv *ConvBuilder) PadSame() *ConvBuilder

PadSame adds paddings on the edges of x such that in the end the output of the convolution has the same shape as the input (assuming strides=1).

The default is NoPadding.

func (*ConvBuilder) StridePerDim

func (conv *ConvBuilder) StridePerDim(strides ...int) *ConvBuilder

StridePerDim sets the strides for each spatial dimension of the convolution. The default is 1 for every dimension.

The stride is how many steps to move after a convolution. A value of 2 will halve the input size, since a convolution will be done at every other position, and so on. It can be defined separately per dimension.

One cannot use strides and dilation at the same time.

func (*ConvBuilder) Strides

func (conv *ConvBuilder) Strides(strides int) *ConvBuilder

Strides sets the strides of the convolution. It sets the same value for every dimension. The default is 1.

The stride is how many steps to move after a convolution. A value of 2 will halve the input size, since a convolution will be done at every other position, and so on. It can be defined separately per dimension.

One cannot use strides and dilation at the same time.

func (*ConvBuilder) UseBias

func (conv *ConvBuilder) UseBias(useBias bool) *ConvBuilder

UseBias sets whether to add a trainable bias term to the convolution. Default is true.

type LayerNormBuilder

type LayerNormBuilder struct {
	// contains filtered or unexported fields
}

LayerNormBuilder is a helper to build a layer normalization computation. Create it with LayerNormalization, set the desired parameters and when all is set, call Done. See LayerNormalization for details.

func LayerNormalization

func LayerNormalization(ctx *context.Context, x *Node, normalizingAxes ...int) *LayerNormBuilder

LayerNormalization performs a layer normalization on the input. It includes a scaling and offset factor, and normalization over the feature entries.

This is an alternative to BatchNormalization, that doesn't suffer from the problem of variance on small batch sizes, nor does it need to keep a moving average of the normalization parameters. Commonly used with transformer layers (see MultiHeadAttention).

normalizingAxes are the axes over which to normalize: mean and variance are calculated over these axes and the values are then normalized. E.g: if your input is `[batch_size, features]` you should use `normalizingAxes=[1]` (same as -1) to normalize over the `features` axis; if your input is an image of shape `[batch_size, height, width, channels]` one common approach is to normalize over the image, so `normalizingAxes=[1 2]`, but not over the channels (or batch).

Notice the difference from BatchNormalization, which normalizes over the batch dimension, as opposed to the feature dimensions.

The layer norm may have a learned scale and offset, controlled by LayerNormBuilder.LearnedScale and LayerNormBuilder.LearnedOffset settings, enabled by default.

To ease setting its parameters it returns a LayerNormBuilder object for configuration. Once it is set up call `LayerNormBuilder.Done` and it will return the normalized x. Browse through LayerNormBuilder to check for its capabilities, and the defaults.

Layer normalization behaves the same during training and inference -- as opposed to batch normalization.

Based on paper "Layer Normalization" (Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton), https://arxiv.org/abs/1607.06450

FutureWork: support padding by not normalizing parts that weren't touched ...
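
A minimal usage sketch, assuming ctx and a transformer activation *Node x shaped `[batch, seqLen, features]` already exist:

```
// Layer-normalize the last (feature) axis, with the default learned
// scale and offset.
normalized := layers.LayerNormalization(ctx.In("layer_norm"), x, -1).Done()
```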

func (*LayerNormBuilder) Done

func (builder *LayerNormBuilder) Done() *Node

Done finishes configuring the LayerNormalization and generates the graph computation to normalize the input.

func (*LayerNormBuilder) Epsilon

func (builder *LayerNormBuilder) Epsilon(value float64) *LayerNormBuilder

Epsilon is a small float added to variance to avoid dividing by zero. It defaults to 1e-3.

It is not used if ScaleNormalization is set to false.

func (*LayerNormBuilder) LearnedOffset

func (builder *LayerNormBuilder) LearnedOffset(value bool) *LayerNormBuilder

LearnedOffset defines whether the layer normalization tries to center the input by adding a learned offset. It defaults to true.

The offset will be learned separately for each axis that is not the batch (assumed to be axis 0 only) and not any of the normalizingAxes.

func (*LayerNormBuilder) LearnedScale

func (builder *LayerNormBuilder) LearnedScale(value bool) *LayerNormBuilder

LearnedScale defines whether the layer normalization tries to scale the input by adding a learned scale. It defaults to true.

The scale will be learned separately for each axis that is not the batch (assumed to be axis 0 only) and not any of the normalizingAxes.

func (*LayerNormBuilder) ScaleNormalization

func (builder *LayerNormBuilder) ScaleNormalization(value bool) *LayerNormBuilder

ScaleNormalization defines whether the input's scale is normalized by the square root of the variance. The default is true, and this is the original paper specification, but in some cases it works best without it.

type MultiHeadAttentionBuilder

type MultiHeadAttentionBuilder struct {
	// contains filtered or unexported fields
}

MultiHeadAttentionBuilder is a helper to build a multi-head-attention computation. Create it with MultiHeadAttention, set the desired parameters and when all is set, call Done.

func MultiHeadAttention

func MultiHeadAttention(ctx *context.Context, query, key, value *Node, numHeads int, headDim int) *MultiHeadAttentionBuilder

MultiHeadAttention defines a multi-head attention layer, as described in the paper "Attention Is All You Need", https://arxiv.org/abs/1706.03762, by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin.

It takes query, key and value and projects them numHeads times, to headDim-sized embeddings. Then it uses the dot-product of query and key as weights, and returns a softmax-weighted sum of value, for each head.

Typical shapes:

  • query: `[batch_size, <query_elements>, inputQueryDim]`.
  • key: `[batch_size, <num_key/value_elements>, inputKeyDim]`.
  • value: `[batch_size, <num_key/value_elements>, inputValueDim]`.

And, when calling Done, after a final output projection, it returns a node of shape `[batch_size, <num_queries>, inputValueDim]`, if no other settings are given. See the settings in MultiHeadAttentionBuilder to control various aspects.

Notice it's common to use key=values, and even query=keys=values. For instance for encoding text, one may use the input sequence as all 3 (query, key and value).

The function returns a MultiHeadAttentionBuilder that can be further configured, and the resulting Node is returned when MultiHeadAttentionBuilder.Done is called. Alternatively one can call MultiHeadAttentionBuilder.DoneWithCoefficients, in which case it returns both the updated state and the attention coefficients.
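
A minimal self-attention sketch, assuming ctx and a sequence *Node seq shaped `[batch, seqLen, embedDim]` already exist:

```
// Self-attention with 8 heads of dimension 64 each; the same sequence
// is used as query, key and value.
attended := layers.MultiHeadAttention(ctx.In("self_attention"), seq, seq, seq, 8, 64).Done()
```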

func (*MultiHeadAttentionBuilder) Done

func (b *MultiHeadAttentionBuilder) Done() (output *Node)

Done or DoneWithCoefficients should be called after all optional settings are configured. Done returns only the attention output; use DoneWithCoefficients to also get the attention coefficients (matrix) used.

`output` will be shaped `[batch_size, <query_elements>, output_dim]`, where `output_dim` can be configured by `SetOutputDim`.

func (*MultiHeadAttentionBuilder) DoneWithCoefficients

func (b *MultiHeadAttentionBuilder) DoneWithCoefficients() (attentionOutput, attentionCoefficients *Node)

DoneWithCoefficients or Done should be called after all optional settings are configured. It returns both the attention output and the attention coefficients (matrix) used.

`output` will be shaped `[batch_size, <query_elements>, output_dim]`, where `output_dim` can be configured by `SetOutputDim`.

`coefficients` is shaped `[batch_size, <query_elements>, <num_heads>, <key_elements>]` with the attention weights (from 0 to 1).

func (*MultiHeadAttentionBuilder) Dropout

Dropout defines how much dropout to use in the attention coefficients calculation. If set to 0 or lower, it's simply disabled. Default is 0.

func (*MultiHeadAttentionBuilder) SetKeyMask

func (b *MultiHeadAttentionBuilder) SetKeyMask(keyMask *Node) *MultiHeadAttentionBuilder

SetKeyMask sets a mask for keys that are actually valid and can be attended. Defaults to no mask, meaning all keys are accessible. See also SetQueryMask.

Shape should be `[batch_size, numHeads, <key_elements>]`, or `[batch_size, <key_elements>]` if the mask is the same for every head.

Either use SetKeyMask and SetQueryMask separately, or use SetQueryKeyMatrixMask, but not both. Optionally one can also UseCausalMask, which is combined (logical-and) with any given mask.

func (*MultiHeadAttentionBuilder) SetKeyQueryDim

func (b *MultiHeadAttentionBuilder) SetKeyQueryDim(keyQueryDim int) *MultiHeadAttentionBuilder

SetKeyQueryDim allows finer configuration on the dimension of the projection used for the query/key pairs for each head. It defaults to the value given by `headDim`.

func (*MultiHeadAttentionBuilder) SetOutputDim

func (b *MultiHeadAttentionBuilder) SetOutputDim(outputDim int) *MultiHeadAttentionBuilder

SetOutputDim defines the output dimension of the final projection, from the flattened attention heads. It defaults to the value of the last dimension of `values` passed as input (`inputValueDim`).

func (*MultiHeadAttentionBuilder) SetQueryKeyMatrixMask

func (b *MultiHeadAttentionBuilder) SetQueryKeyMatrixMask(queryKeyMatrixMask *Node) *MultiHeadAttentionBuilder

SetQueryKeyMatrixMask sets a mask matrix that defines which queries can attend to which keys. Defaults to no mask, meaning all queries can attend to all keys.

Shape should be `[batch_size, numHeads, <query_elements>, <key_elements>]`, or `[batch_size, <query_elements>, <key_elements>]` if the mask is the same for every head.

Either use SetKeyMask and SetQueryMask separately, or use SetQueryKeyMatrixMask, but not both. Optionally one can also UseCausalMask, which is combined (logical-and) with any given mask.

func (*MultiHeadAttentionBuilder) SetQueryMask

func (b *MultiHeadAttentionBuilder) SetQueryMask(queryMask *Node) *MultiHeadAttentionBuilder

SetQueryMask sets a mask for queries that are actually valid and should be used. Defaults to no mask, meaning all queries are accessible. See also SetKeyMask.

Shape should be `[batch_size, numHeads, <query_elements>]`, or `[batch_size, <query_elements>]` if the mask is the same for every head.

Either use SetKeyMask and SetQueryMask separately, or use SetQueryKeyMatrixMask, but not both. Optionally one can also UseCausalMask, which is combined (logical-and) with any given mask.

func (*MultiHeadAttentionBuilder) SetValueHeadDim

func (b *MultiHeadAttentionBuilder) SetValueHeadDim(valueDim int) *MultiHeadAttentionBuilder

SetValueHeadDim allows finer configuration on the dimension of the projection used for the value for each head. It defaults to the value given by `headDim`.

func (*MultiHeadAttentionBuilder) UseCausalMask

UseCausalMask adds a mask where a query can only attend to keys with lower indices than itself. It assumes that query and key are either the same or have the same inner shape, and there is only one inner rank -- so key/query should have rank-3 shape `[batch, inner_dim, key/query_dim]`.

This mask can be used in combination (logical-and) with other masks.

func (*MultiHeadAttentionBuilder) UseProjectionBias

func (b *MultiHeadAttentionBuilder) UseProjectionBias(useProjectionBias bool) *MultiHeadAttentionBuilder

UseProjectionBias defines whether to use a bias term on the final output projection. Default is true.
