Documentation ¶
Index ¶
- func Affine(g *ag.Graph, xs ...ag.Node) ag.Node
- func BiAffine(g *ag.Graph, w, u, v, b, x1, x2 ag.Node) ag.Node
- func BiLinear(g *ag.Graph, w, x1, x2 ag.Node) ag.Node
- func ClearSupport(m Model)
- func Conv2D(g *ag.Graph, w, x ag.Node, xStride, yStride int) ag.Node
- func DumpParamsVector(model Model) *mat.Dense
- func ForEachParam(m Model, callback func(param *Param))
- func ForEachParamStrict(m Model, callback func(param *Param))
- func LinearAttention(g *ag.Graph, qs, ks, vs []ag.Node, mappingFunction MappingFunc, eps float64) []ag.Node
- func LoadParamsVector(model Model, vector *mat.Dense)
- func PayloadMarshalBinaryTo(supp *Payload, w io.Writer) (int, error)
- func ScaledDotProductAttention(g *ag.Graph, qs, ks, vs []ag.Node, scaleFactor float64) (context []ag.Node, prob []mat.Matrix)
- func ScaledDotProductAttentionConcurrent(g *ag.Graph, qs, ks, vs []ag.Node, scaleFactor float64) (context []ag.Node, prob []mat.Matrix)
- func Separate(g *ag.Graph, x ag.Node) [][]ag.Node
- func SeparateVec(g *ag.Graph, x ag.Node) []ag.Node
- func SplitVec(g *ag.Graph, x ag.Node, chunks int) []ag.Node
- func ZeroGrad(m Model)
- type BaseProcessor
- type Context
- type DefaultParamsIterator
- type MappingFunc
- type Model
- type Param
- func (r *Param) ApplyDelta(delta mat.Matrix)
- func (r *Param) ClearPayload()
- func (r *Param) Grad() mat.Matrix
- func (r *Param) HasGrad() bool
- func (r *Param) MarshalBinary() ([]byte, error)
- func (r *Param) Name() string
- func (r *Param) Payload() *Payload
- func (r *Param) PropagateGrad(grad mat.Matrix)
- func (r *Param) ReplaceValue(value mat.Matrix)
- func (r *Param) RequiresGrad() bool
- func (r *Param) ScalarValue() float64
- func (r *Param) SetName(name string)
- func (r *Param) SetPayload(payload *Payload)
- func (r *Param) SetType(name string)
- func (r *Param) Type() ParamsType
- func (r *Param) UnmarshalBinary(data []byte) error
- func (r *Param) Value() mat.Matrix
- func (r *Param) ZeroGrad()
- type ParamOption
- type ParamSerializer
- type ParamsIterator
- type ParamsSerializer
- type ParamsType
- type Payload
- type ProcessingMode
- type Processor
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Affine ¶
func Affine(g *ag.Graph, xs ...ag.Node) ag.Node
Affine performs an affine transformation over an arbitrary (odd) number of nodes held in the input. The first node is the “bias”, which is added to the output as-is. The remaining nodes are interpreted as (W, x) pairs, whose products are summed. Any pair after the first whose "x" is nil is skipped. y = b + W1x1 + W2x2 + ... + Wnxn
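As a usage sketch, a single transformation y = b + Wx can be built as follows. The graph and matrix constructors (ag.NewGraph, NewVariable, mat.NewDense, mat.NewVecDense) and the import paths are assumed from the surrounding library, not defined in this package.

package main

import (
	"fmt"

	"github.com/nlpodyssey/spago/pkg/mat"
	"github.com/nlpodyssey/spago/pkg/ml/ag"
	"github.com/nlpodyssey/spago/pkg/ml/nn"
)

func main() {
	g := ag.NewGraph()

	// Bias b (2-dim), weights W (2x3) and input x (3-dim) as graph variables.
	b := g.NewVariable(mat.NewVecDense([]float64{0.1, -0.2}), true)
	w := g.NewVariable(mat.NewDense(2, 3, []float64{
		0.5, 0.1, -0.3,
		0.2, -0.4, 0.6,
	}), true)
	x := g.NewVariable(mat.NewVecDense([]float64{1.0, 2.0, 3.0}), false)

	// The bias comes first, followed by the (W, x) pair: y = b + Wx.
	y := nn.Affine(g, b, w, x)
	fmt.Println(y.Value())
}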
func ClearSupport ¶
func ClearSupport(m Model)
ClearSupport clears the support structure of all the model's parameters (including sub-params). TODO: use ParamsIterator?
func ForEachParam ¶
func ForEachParam(m Model, callback func(param *Param))
ForEachParam iterates over all the parameters of a model, also exploring the sub-parameters recursively. TODO: don't loop the field every time, use a lazy initialized "params list" instead.
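For example, the callback can collect the names of every parameter that requires gradients. This is a minimal sketch using only the Param methods listed in this package; the helper name and package are illustrative.

package modelutil // hypothetical helper package for this sketch

import "github.com/nlpodyssey/spago/pkg/ml/nn"

// namesOfTrainableParams collects the names of all parameters that require
// gradients, visiting sub-models recursively via ForEachParam.
func namesOfTrainableParams(m nn.Model) []string {
	var names []string
	nn.ForEachParam(m, func(p *nn.Param) {
		if p.RequiresGrad() {
			names = append(names, p.Name())
		}
	})
	return names
}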
func ForEachParamStrict ¶
func ForEachParamStrict(m Model, callback func(param *Param))
ForEachParamStrict iterates over all the parameters of a model without exploring the sub-models.
func LinearAttention ¶
func LinearAttention(g *ag.Graph, qs, ks, vs []ag.Node, mappingFunction MappingFunc, eps float64) []ag.Node
LinearAttention performs the self-attention as a linear dot-product of kernel feature maps. It operates with O(N) complexity, where N is the sequence length. Reference: "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention" by Katharopoulos et al. (2020)
func LoadParamsVector ¶
func LoadParamsVector(model Model, vector *mat.Dense)
LoadParamsVector loads the values of the given vector into the model's parameters. TODO: use ParamsIterator?
func PayloadMarshalBinaryTo ¶
func PayloadMarshalBinaryTo(supp *Payload, w io.Writer) (int, error)
PayloadMarshalBinaryTo marshals the Payload into w. It returns the number of bytes written and an error, if any.
func ScaledDotProductAttention ¶
func ScaledDotProductAttention(g *ag.Graph, qs, ks, vs []ag.Node, scaleFactor float64) (context []ag.Node, prob []mat.Matrix)
ScaledDotProductAttention is a self-attention mechanism relating different positions of a single sequence in order to compute a representation of the same sequence. This method requires that the query, key, and value vectors have already been obtained from the input sequence. The scale factor is typically the reciprocal of the square root of the key vectors' dimension (1/√dk).
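A minimal sketch of a call follows. The graph and vector constructors are assumed from the surrounding library, and the choice of 1/√dk for scaleFactor is the common convention rather than something the signature enforces.

package main

import (
	"math"

	"github.com/nlpodyssey/spago/pkg/mat"
	"github.com/nlpodyssey/spago/pkg/ml/ag"
	"github.com/nlpodyssey/spago/pkg/ml/nn"
)

func main() {
	g := ag.NewGraph()
	newVec := func(vs ...float64) ag.Node {
		return g.NewVariable(mat.NewVecDense(vs), false)
	}

	// Three positions; queries and keys of size dk = 2, values of size 2.
	qs := []ag.Node{newVec(0.1, 0.2), newVec(0.3, 0.1), newVec(0.0, 0.5)}
	ks := []ag.Node{newVec(0.2, 0.1), newVec(0.4, 0.3), newVec(0.1, 0.1)}
	vs := []ag.Node{newVec(1.0, 0.0), newVec(0.0, 1.0), newVec(0.5, 0.5)}

	scaleFactor := 1.0 / math.Sqrt(2.0) // 1/sqrt(dk), the usual choice

	context, probs := nn.ScaledDotProductAttention(g, qs, ks, vs, scaleFactor)
	_ = context // one context node per input position
	_ = probs   // the attention distribution computed for each query
}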
func ScaledDotProductAttentionConcurrent ¶
func ScaledDotProductAttentionConcurrent(g *ag.Graph, qs, ks, vs []ag.Node, scaleFactor float64) (context []ag.Node, prob []mat.Matrix)
ScaledDotProductAttentionConcurrent does the same thing as ScaledDotProductAttention but processes input concurrently.
func Separate ¶
func Separate(g *ag.Graph, x ag.Node) [][]ag.Node
Separate returns a matrix of Node(s) represented as a slice of slices containing the elements extracted from the input. The dimensions of the resulting matrix are the same as those of the input.
func SeparateVec ¶
func SeparateVec(g *ag.Graph, x ag.Node) []ag.Node
SeparateVec returns a slice of Node(s) containing the elements extracted from the input. The size of the vector equals the number of input elements. You can think of this method as the inverse of the ag.Concat operator.
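For instance, concatenating two vectors and then separating the result yields one scalar node per element. This is a sketch; g.Concat and the variable constructors are assumed from the ag and mat packages.

package main

import (
	"github.com/nlpodyssey/spago/pkg/mat"
	"github.com/nlpodyssey/spago/pkg/ml/ag"
	"github.com/nlpodyssey/spago/pkg/ml/nn"
)

func main() {
	g := ag.NewGraph()
	a := g.NewVariable(mat.NewVecDense([]float64{1, 2}), false)
	b := g.NewVariable(mat.NewVecDense([]float64{3}), false)

	joined := g.Concat(a, b)           // a single 3-element vector node
	elems := nn.SeparateVec(g, joined) // three scalar nodes: 1, 2, 3
	_ = elems
}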
Types ¶
type BaseProcessor ¶
type BaseProcessor struct {
	Model             Model
	Mode              ProcessingMode
	Graph             *ag.Graph
	FullSeqProcessing bool
}
BaseProcessor satisfies some methods of the Processor interface. It is meant to be embedded in other processors to reduce the amount of boilerplate code.
func (*BaseProcessor) GetGraph ¶
func (p *BaseProcessor) GetGraph() *ag.Graph
GetGraph returns the computational graph on which the processor operates.
func (*BaseProcessor) GetMode ¶
func (p *BaseProcessor) GetMode() ProcessingMode
GetMode returns whether the processor is being used for training or inference.
func (*BaseProcessor) GetModel ¶
func (p *BaseProcessor) GetModel() Model
GetModel returns the model the processor belongs to.
func (*BaseProcessor) RequiresFullSeq ¶
func (p *BaseProcessor) RequiresFullSeq() bool
RequiresFullSeq returns whether the processor needs the complete sequence to start processing (as in the case of BiRNN and other bidirectional models), or not.
type Context ¶
type Context struct {
	// Graph is the computational graph on which the processor(s) operate.
	Graph *ag.Graph
	// Mode regulates the different usage of some operations whether you're doing training or inference.
	Mode ProcessingMode
}
Context is used to instantiate a processor to operate on a graph, according to the desired ProcessingMode. If a processor contains other sub-processors, you must instantiate them using the same context to make sure you are operating on the same graph and in the same mode.
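For example, the same model can be instantiated once for training and once for inference. This is a sketch: model stands for any value implementing Model and xs for previously created input nodes.

// Training: operations such as dropout are active.
g := ag.NewGraph()
trainer := model.NewProc(nn.Context{Graph: g, Mode: nn.Training})
ys := trainer.Forward(xs...)

// Inference: a fresh graph and a processor created in Inference mode.
g2 := ag.NewGraph()
predictor := model.NewProc(nn.Context{Graph: g2, Mode: nn.Inference})
_, _ = ys, predictor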
type DefaultParamsIterator ¶
type DefaultParamsIterator struct {
// contains filtered or unexported fields
}
func NewDefaultParamsIterator ¶
func NewDefaultParamsIterator(models ...Model) *DefaultParamsIterator
func (*DefaultParamsIterator) ParamsList ¶
func (i *DefaultParamsIterator) ParamsList() []*Param
type Model ¶
type Model interface {
	// NewProc returns a new processor to execute the forward step.
	NewProc(ctx Context) Processor
}
Model contains the serializable parameters.
type Param ¶
type Param struct {
// contains filtered or unexported fields
}
func NewParam ¶
func NewParam(value mat.Matrix, opts ...ParamOption) *Param
NewParam returns a new param.
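For example, a trainable and a frozen parameter can be created as follows. This is a sketch: the matrix constructors come from the mat package, and the literal strings passed to SetType are assumptions based on the ParamsType constant names.

// A 2x3 weight parameter that takes part in the backward step.
w := nn.NewParam(mat.NewDense(2, 3, make([]float64, 6)), nn.RequiresGrad(true))
w.SetName("w")
w.SetType("weights")

// A frozen vector parameter, excluded from gradient computation.
frozen := nn.NewParam(mat.NewVecDense([]float64{0, 0}), nn.RequiresGrad(false))
frozen.SetName("frozen")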
func (*Param) ApplyDelta ¶
func (r *Param) ApplyDelta(delta mat.Matrix)
ApplyDelta updates the value of the underlying storage by applying the delta.
func (*Param) ClearPayload ¶
func (r *Param) ClearPayload()
ClearPayload clears the support structure.
func (*Param) MarshalBinary ¶
func (r *Param) MarshalBinary() ([]byte, error)
MarshalBinary satisfies the custom marshaling interface of package encoding/gob.
func (*Param) PropagateGrad ¶
func (r *Param) PropagateGrad(grad mat.Matrix)
PropagateGrad accumulates the gradients.
func (*Param) ReplaceValue ¶
ReplaceValue replaces the value of the parameter and clears the support structure.
func (*Param) RequiresGrad ¶
RequiresGrad returns true if the param requires gradients.
func (*Param) ScalarValue ¶
func (r *Param) ScalarValue() float64
ScalarValue returns the scalar value of the node. It panics if the value is not a scalar. Note that it is not possible to start the backward step from a scalar value.
func (*Param) SetPayload ¶
func (*Param) Type ¶
func (r *Param) Type() ParamsType
Type returns the params type (weights, biases, undefined).
func (*Param) UnmarshalBinary ¶
func (r *Param) UnmarshalBinary(data []byte) error
UnmarshalBinary satisfies the custom marshaling interface of package encoding/gob.
type ParamOption ¶
type ParamOption func(*Param)
func RequiresGrad ¶
func RequiresGrad(value bool) ParamOption
func SetStorage ¶
func SetStorage(storage kvdb.KeyValueDB) ParamOption
type ParamSerializer ¶
type ParamSerializer struct {
*Param
}
func (*ParamSerializer) Deserialize ¶
func (s *ParamSerializer) Deserialize(r io.Reader) (n int, err error)
type ParamsIterator ¶
type ParamsIterator interface {
ParamsList() []*Param
}
type ParamsSerializer ¶
type ParamsSerializer struct {
Model
}
func NewParamsSerializer ¶
func NewParamsSerializer(m Model) *ParamsSerializer
func (*ParamsSerializer) Deserialize ¶
func (m *ParamsSerializer) Deserialize(r io.Reader) (n int, err error)
Deserialize assigns to the params the values obtained from the reader. TODO: use ParamsIterator?
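A minimal sketch of restoring a model's parameters from a file using only the functions listed here; the helper name, package, and file path are illustrative.

package modelio // hypothetical helper package for this sketch

import (
	"os"

	"github.com/nlpodyssey/spago/pkg/ml/nn"
)

// loadParams restores the parameter values of m from a previously
// serialized stream, delegating the decoding to ParamsSerializer.
func loadParams(m nn.Model, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = nn.NewParamsSerializer(m).Deserialize(f)
	return err
}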
type ParamsType ¶
type ParamsType int
const (
	Weights ParamsType = iota
	Biases
	Undefined
)
func ToType ¶
func ToType(s string) ParamsType
ToType converts a string to a ParamsType. It returns Undefined if the string doesn't match any ParamsType.
func (ParamsType) String ¶
func (t ParamsType) String() string
type Payload ¶
Payload contains the support data used, for example, by the optimization methods.
func NewEmptySupport ¶
func NewEmptySupport() *Payload
NewEmptySupport returns an empty support structure, not connected to any optimization method.
type ProcessingMode ¶
type ProcessingMode int
ProcessingMode regulates the different usage of some operations (e.g. Dropout, BatchNorm, etc.) inside a Processor, depending on whether you're doing training or inference. Failing to set the right mode will yield inconsistent inference results.
const (
	// Training is to be used during the training phase of a model. For example, dropouts are enabled.
	Training ProcessingMode = iota
	// Inference keeps weights fixed while using the model and disables some operations (e.g. skip dropout).
	Inference
)
type Processor ¶
type Processor interface {
	// GetModel returns the model the processor belongs to.
	GetModel() Model
	// GetMode returns whether the processor is being used for training or inference.
	GetMode() ProcessingMode
	// GetGraph returns the computational graph on which the processor operates.
	GetGraph() *ag.Graph
	// RequiresFullSeq returns whether the processor needs the complete sequence to start processing
	// (as in the case of BiRNN and other bidirectional models), or not.
	RequiresFullSeq() bool
	// Forward performs the forward step for each input and returns the result.
	// Recurrent networks treat the input nodes as a sequence.
	// By contrast, feed-forward networks are stateless, so every computation is independent.
	Forward(xs ...ag.Node) []ag.Node
}
Processor performs the operations on the computational graphs using the model's parameters.
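To make the contract concrete, here is a toy model and processor (not part of the package) that embed BaseProcessor for the bookkeeping methods and implement only Forward; the import path for ag is an assumption.

package toy // hypothetical package, for illustration only

import (
	"github.com/nlpodyssey/spago/pkg/ml/ag"
	"github.com/nlpodyssey/spago/pkg/ml/nn"
)

// echoModel has no parameters; it exists only to show the Model/Processor wiring.
type echoModel struct{}

// NewProc wires the Context into a processor bound to the given graph and mode.
func (m *echoModel) NewProc(ctx nn.Context) nn.Processor {
	return &echoProcessor{
		BaseProcessor: nn.BaseProcessor{
			Model:             m,
			Mode:              ctx.Mode,
			Graph:             ctx.Graph,
			FullSeqProcessing: false, // stateless: the full sequence is not needed
		},
	}
}

// echoProcessor embeds BaseProcessor, which provides GetModel, GetMode,
// GetGraph and RequiresFullSeq; only Forward is implemented here.
type echoProcessor struct {
	nn.BaseProcessor
}

// Forward returns the inputs unchanged: the minimum possible forward step.
func (p *echoProcessor) Forward(xs ...ag.Node) []ag.Node {
	return xs
}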
Directories ¶
Path | Synopsis
---|---
 | Bidirectional Recurrent Neural Network (BiRNN) with a Conditional Random Fields (CRF) on top.
 | Implementation of the Broad Learning System (BLS) described in "Broad Learning System: An Effective and Efficient Incremental Learning System Without the Need for Deep Architecture" by C. L. Philip Chen and Zhulin Liu, 2017.
gnn |
slstm | Reference: "Sentence-State LSTM for Text Representation" by Zhang et al., 2018.
startransformer | StarTransformer is a variant of the model introduced by Qipeng Guo, Xipeng Qiu et al.
 | LSH-Attention as in "Reformer: The Efficient Transformer" by N. Kitaev, Ł. Kaiser, A. Levskaya.
normalization |
adanorm | Reference: "Understanding and Improving Layer Normalization" by Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin (2019).
fixnorm | Reference: "Improving Lexical Choice in Neural Machine Translation" by Toan Q. Nguyen and David Chiang (2018) (https://arxiv.org/pdf/1710.01329.pdf).
layernorm | Reference: "Layer normalization" by Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton (2016).
layernormsimple | Reference: "Understanding and Improving Layer Normalization" by Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin (2019).
rmsnorm | Reference: "Root Mean Square Layer Normalization" by Biao Zhang and Rico Sennrich (2019).
 | Implementation of the recursive auto-encoder strategy described in "Towards Lossless Encoding of Sentences" by Prato et al., 2019.
 | This package contains built-in Residual Connections (RC).
rec |
horn | Higher Order Recurrent Neural Networks (HORN).
lstmsc | LSTM enriched with a PolicyGradient to enable Dynamic Skip Connections.
mist | Implementation of the MIST (MIxed hiSTory) recurrent network as described in "Analyzing and Exploiting NARX Recurrent Neural Networks for Long-Term Dependencies" by Di Pietro et al., 2018 (https://arxiv.org/pdf/1702.07805.pdf).
nru | Implementation of the NRU (Non-Saturating Recurrent Units) recurrent network as described in "Towards Non-Saturating Recurrent Units for Modelling Long-Term Dependencies" by Chandar et al., 2019.
rla | RLA (Recurrent Linear Attention) as in "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention" by Katharopoulos et al., 2020.
srnn | srnn implements the SRNN (Shuffling Recurrent Neural Networks) by Rotman and Wolf, 2020.
 | Implementation of the Synthetic Attention described in "SYNTHESIZER: Rethinking Self-Attention in Transformer Models" by Tay et al., 2020.