Documentation ¶
Overview ¶
Package embeddings provides neural network embedding layers for the Zerfoo ML framework.
Stability: stable
Index ¶
- type RotaryPositionalEmbedding
- func (rpe *RotaryPositionalEmbedding[T]) AttentionScaleFactor() float64
- func (rpe *RotaryPositionalEmbedding[T]) Attributes() map[string]interface{}
- func (rpe *RotaryPositionalEmbedding[T]) Backward(ctx context.Context, mode types.BackwardMode, dOut *tensor.TensorNumeric[T], ...) ([]*tensor.TensorNumeric[T], error)
- func (rpe *RotaryPositionalEmbedding[T]) Forward(ctx context.Context, inputs ...*tensor.TensorNumeric[T]) (*tensor.TensorNumeric[T], error)
- func (rpe *RotaryPositionalEmbedding[T]) GetAngles(offset, seqLen int) (cos, sin *tensor.TensorNumeric[T], halfRotary int, err error)
- func (rpe *RotaryPositionalEmbedding[T]) GetAnglesGPU(counterPtr unsafe.Pointer, seqLen int, stream unsafe.Pointer) (cos, sin *tensor.TensorNumeric[T], halfRotary int, err error)
- func (rpe *RotaryPositionalEmbedding[T]) OpType() string
- func (rpe *RotaryPositionalEmbedding[T]) OutputShape() []int
- func (rpe *RotaryPositionalEmbedding[T]) Parameters() []*graph.Parameter[T]
- func (rpe *RotaryPositionalEmbedding[T]) RotaryDim() int
- func (rpe *RotaryPositionalEmbedding[T]) Scale(ctx context.Context, factor float64) error
- func (rpe *RotaryPositionalEmbedding[T]) SetPositionOffset(offset int)
- type RotaryPositionalEmbeddingOption
- type RotaryPositionalEmbeddingOptions
- type TokenEmbedding
- func (te *TokenEmbedding[T]) Attributes() map[string]interface{}
- func (te *TokenEmbedding[T]) Backward(ctx context.Context, mode types.BackwardMode, ...) ([]*tensor.TensorNumeric[T], error)
- func (te *TokenEmbedding[T]) Forward(ctx context.Context, inputs ...*tensor.TensorNumeric[T]) (*tensor.TensorNumeric[T], error)
- func (te *TokenEmbedding[T]) OpType() string
- func (te *TokenEmbedding[T]) OutputShape() []int
- func (te *TokenEmbedding[T]) Parameters() []*graph.Parameter[T]
- type TokenEmbeddingOption
- type TokenEmbeddingOptions
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type RotaryPositionalEmbedding ¶
type RotaryPositionalEmbedding[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}
RotaryPositionalEmbedding applies Rotary Positional Embedding to a tensor.
func NewRotaryPositionalEmbedding ¶
func NewRotaryPositionalEmbedding[T tensor.Numeric](
	ctx context.Context,
	engine compute.Engine[T],
	headDim int,
	seqLen int,
	options ...RotaryPositionalEmbeddingOption,
) (*RotaryPositionalEmbedding[T], error)
NewRotaryPositionalEmbedding creates a new RotaryPositionalEmbedding layer.
engine: The compute engine to use for tensor operations.
headDim: The dimension of the head. Must be even.
seqLen: The maximum sequence length this embedding will be applied to.
func (*RotaryPositionalEmbedding[T]) AttentionScaleFactor ¶ added in v0.2.1
func (rpe *RotaryPositionalEmbedding[T]) AttentionScaleFactor() float64
AttentionScaleFactor returns the YaRN attention scaling factor. Returns 1.0 when YaRN is not enabled.
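As a rough illustration of what such a factor can look like, the sketch below uses the value recommended in the YaRN paper, 0.1*ln(factor) + 1. This documentation does not show the formula the package actually uses, so the function name `yarnAttentionScale` and the formula itself are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// yarnAttentionScale sketches an attention scale for a context extension factor.
// Assumption: 0.1*ln(factor) + 1, as recommended in the YaRN paper. The
// package's exact formula is not shown in this documentation.
func yarnAttentionScale(enabled bool, factor float64) float64 {
	if !enabled || factor <= 1 {
		return 1.0 // no scaling when YaRN is off or the context is not extended
	}
	return 0.1*math.Log(factor) + 1.0
}

func main() {
	fmt.Printf("%.4f\n", yarnAttentionScale(false, 4.0)) // 1.0000
	fmt.Printf("%.4f\n", yarnAttentionScale(true, 4.0))  // 1.1386
}
```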
func (*RotaryPositionalEmbedding[T]) Attributes ¶ added in v0.2.1
func (rpe *RotaryPositionalEmbedding[T]) Attributes() map[string]interface{}
Attributes returns the attributes of the RotaryPositionalEmbedding layer.
func (*RotaryPositionalEmbedding[T]) Backward ¶
func (rpe *RotaryPositionalEmbedding[T]) Backward(ctx context.Context, mode types.BackwardMode, dOut *tensor.TensorNumeric[T], _ ...*tensor.TensorNumeric[T]) ([]*tensor.TensorNumeric[T], error)
Backward computes the gradients for RoPE. Shapes are derived from dOut so that a single RoPE instance can be shared across Q and K paths whose batch dimensions differ.
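Because RoPE is a pure rotation, its gradient with respect to the input can be obtained by rotating the upstream gradient through the negative angle. The sketch below shows this on a single head vector. It assumes a half-split pairing (element i with element i+half), and `rotate` is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"math"
)

// rotate applies a half-split rotation to v using per-pair angles theta.
// sign = +1 gives the forward rotation, sign = -1 its transpose (inverse).
func rotate(v, theta []float64, sign float64) []float64 {
	half := len(theta)
	out := make([]float64, len(v))
	for i := 0; i < half; i++ {
		sin, cos := math.Sincos(sign * theta[i])
		out[i] = v[i]*cos - v[i+half]*sin
		out[i+half] = v[i]*sin + v[i+half]*cos
	}
	return out
}

func main() {
	theta := []float64{0.3, 1.1}
	dOut := []float64{1, 0, 0, 0}  // upstream gradient for a 4-dim head
	dIn := rotate(dOut, theta, -1) // gradient with respect to the input
	fmt.Println(dIn)
}
```

The sizes come from the slices passed in, which loosely mirrors how Backward derives shapes from dOut rather than from stored state.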
func (*RotaryPositionalEmbedding[T]) Forward ¶
func (rpe *RotaryPositionalEmbedding[T]) Forward(ctx context.Context, inputs ...*tensor.TensorNumeric[T]) (*tensor.TensorNumeric[T], error)
Forward applies Rotary Positional Embedding to the input tensor.
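To make the rotation concrete, here is a minimal sketch on plain slices rather than tensors. It assumes a half-split pairing and the common base^(-2i/d) frequency schedule. `applyRoPE` and `dot` are hypothetical helpers, and the package's internal layout may differ.

```go
package main

import (
	"fmt"
	"math"
)

// applyRoPE rotates one head vector x at absolute position pos.
// Assumption: element i is paired with element i+len(x)/2 (half-split layout).
func applyRoPE(x []float64, pos int, base float64) []float64 {
	half := len(x) / 2
	out := make([]float64, len(x))
	for i := 0; i < half; i++ {
		// Per-pair angle: pos * base^(-2i/d).
		theta := float64(pos) * math.Pow(base, -2*float64(i)/float64(len(x)))
		sin, cos := math.Sincos(theta)
		out[i] = x[i]*cos - x[i+half]*sin
		out[i+half] = x[i]*sin + x[i+half]*cos
	}
	return out
}

// dot returns the inner product of two equal-length vectors.
func dot(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func main() {
	q := []float64{1, 2, 3, 4}
	k := []float64{0.5, -1, 2, 0.25}
	// Both pairs are 3 positions apart, at different absolute positions.
	a := dot(applyRoPE(q, 5, 10000), applyRoPE(k, 2, 10000))
	b := dot(applyRoPE(q, 103, 10000), applyRoPE(k, 100, 10000))
	fmt.Printf("%.6f %.6f\n", a, b)
}
```

The two printed scores agree up to floating-point error, which is the property that makes RoPE a relative position encoding: the query-key score depends only on the distance between positions.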
func (*RotaryPositionalEmbedding[T]) GetAngles ¶ added in v0.2.1
func (rpe *RotaryPositionalEmbedding[T]) GetAngles(offset, seqLen int) (cos, sin *tensor.TensorNumeric[T], halfRotary int, err error)
GetAngles returns the cos/sin angle tensors for the given position range, along with halfRotary. For GPU-resident tables, it returns non-owning views. The fused QK norm+RoPE kernel uses this during decode.
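One way to picture this is a pair of precomputed tables indexed by position, from which a contiguous range of rows is handed out without copying. The sketch below uses nested slices. The table layout, the frequency schedule, and the names `angleTables` and `anglesFor` are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"math"
)

// angleTables precomputes cos and sin for positions [0, maxLen) and
// halfRotary frequency pairs. Layout and names here are illustrative.
func angleTables(maxLen, halfRotary int, base float64) (cos, sin [][]float64) {
	cos = make([][]float64, maxLen)
	sin = make([][]float64, maxLen)
	for p := 0; p < maxLen; p++ {
		cos[p] = make([]float64, halfRotary)
		sin[p] = make([]float64, halfRotary)
		for i := 0; i < halfRotary; i++ {
			freq := math.Pow(base, -float64(i)/float64(halfRotary))
			sin[p][i], cos[p][i] = math.Sincos(float64(p) * freq)
		}
	}
	return cos, sin
}

// anglesFor returns the rows for positions [offset, offset+seqLen) as
// sub-slices that share memory with the tables (no copy).
func anglesFor(cos, sin [][]float64, offset, seqLen int) (c, s [][]float64, err error) {
	if offset < 0 || seqLen < 0 || offset+seqLen > len(cos) {
		return nil, nil, fmt.Errorf("range [%d, %d) exceeds table length %d",
			offset, offset+seqLen, len(cos))
	}
	return cos[offset : offset+seqLen], sin[offset : offset+seqLen], nil
}

func main() {
	cos, sin := angleTables(8, 2, 10000)
	c, s, err := anglesFor(cos, sin, 5, 1) // one decode step at position 5
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(c), len(s), len(c[0])) // 1 1 2
}
```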
func (*RotaryPositionalEmbedding[T]) GetAnglesGPU ¶ added in v0.2.1
func (rpe *RotaryPositionalEmbedding[T]) GetAnglesGPU(counterPtr unsafe.Pointer, seqLen int, stream unsafe.Pointer) (cos, sin *tensor.TensorNumeric[T], halfRotary int, err error)
GetAnglesGPU returns cos/sin angle tensors selected by a GPU-resident counter, avoiding CPU-side offset computation. This enables CUDA graph capture of the decode loop by keeping all position-dependent reads on GPU.
counterPtr: A device pointer to an int32 position counter (from GPUKVCache).
seqLen: The number of positions to select (1 for decode).
stream: The CUDA stream (unsafe.Pointer to cudaStream_t) for kernel launch.
func (*RotaryPositionalEmbedding[T]) OpType ¶ added in v0.2.1
func (rpe *RotaryPositionalEmbedding[T]) OpType() string
OpType returns the operation type of the RotaryPositionalEmbedding layer.
func (*RotaryPositionalEmbedding[T]) OutputShape ¶
func (rpe *RotaryPositionalEmbedding[T]) OutputShape() []int
OutputShape returns the output shape of the RoPE layer.
func (*RotaryPositionalEmbedding[T]) Parameters ¶
func (rpe *RotaryPositionalEmbedding[T]) Parameters() []*graph.Parameter[T]
Parameters returns no parameters, because RoPE has no trainable weights.
func (*RotaryPositionalEmbedding[T]) RotaryDim ¶ added in v0.2.1
func (rpe *RotaryPositionalEmbedding[T]) RotaryDim() int
RotaryDim returns the number of dimensions that receive rotation.
func (*RotaryPositionalEmbedding[T]) Scale ¶ added in v0.2.0
func (rpe *RotaryPositionalEmbedding[T]) Scale(ctx context.Context, factor float64) error
Scale scales the positional embeddings by a given factor.
func (*RotaryPositionalEmbedding[T]) SetPositionOffset ¶ added in v0.2.1
func (rpe *RotaryPositionalEmbedding[T]) SetPositionOffset(offset int)
SetPositionOffset sets the position offset for the next Forward call. During autoregressive decode, call this with the current cache sequence length so that the new token is rotated at the correct absolute position instead of always at position 0.
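The effect of the offset can be sketched without any tensors: each Forward call covers positions offset, offset+1, and so on. The `positions` helper below is hypothetical and only shows which absolute positions a prefill call and the following decode steps would use.

```go
package main

import "fmt"

// positions lists the absolute positions rotated by one Forward call over
// seqLen tokens, given the offset set beforehand (hypothetical helper).
func positions(offset, seqLen int) []int {
	pos := make([]int, seqLen)
	for i := range pos {
		pos[i] = offset + i
	}
	return pos
}

func main() {
	cacheLen := 0
	fmt.Println(positions(cacheLen, 4)) // prefill: [0 1 2 3]
	cacheLen += 4
	for step := 0; step < 2; step++ {
		fmt.Println(positions(cacheLen, 1)) // decode: [4], then [5]
		cacheLen++
	}
}
```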
type RotaryPositionalEmbeddingOption ¶ added in v0.2.0
type RotaryPositionalEmbeddingOption func(*RotaryPositionalEmbeddingOptions)
RotaryPositionalEmbeddingOption is a functional option for configuring RotaryPositionalEmbedding layers.
func WithRotaryBase ¶ added in v0.2.0
func WithRotaryBase(base float64) RotaryPositionalEmbeddingOption
WithRotaryBase sets the base (theta) parameter for the inverse frequency calculation.
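For reference, the usual RoPE inverse frequency schedule is inv_freq[i] = 1 / base^(2i/d) for i in [0, d/2). The sketch below computes it with the standard library. `inverseFrequencies` is a hypothetical name, and the package may compute or store these values differently.

```go
package main

import (
	"fmt"
	"math"
)

// inverseFrequencies returns 1 / base^(2i/rotaryDim) for each rotated pair.
// Assumption: this is the common RoPE schedule, not confirmed for this package.
func inverseFrequencies(rotaryDim int, base float64) []float64 {
	inv := make([]float64, rotaryDim/2)
	for i := range inv {
		inv[i] = 1.0 / math.Pow(base, float64(2*i)/float64(rotaryDim))
	}
	return inv
}

func main() {
	fmt.Println(inverseFrequencies(8, 10000))
	fmt.Println(inverseFrequencies(8, 1000000))
}
```

A larger base lowers every frequency after the first, so each pair rotates more slowly with position. This is why long-context models often raise the base.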
func WithRotaryDimFraction ¶ added in v0.2.1
func WithRotaryDimFraction(fraction float64) RotaryPositionalEmbeddingOption
WithRotaryDimFraction sets the fraction of head dimensions that receive rotation. Default is 1.0 (all dimensions rotated). Phi-4 uses 0.75 for partial RoPE.
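A small sketch of partial rotation follows: only the first rotaryDim elements of a head are rotated, and the rest pass through unchanged. The rounding rule (truncate, then force even), the single shared angle, and the names `rotaryDims` and `applyPartial` are assumptions made for brevity.

```go
package main

import (
	"fmt"
	"math"
)

// rotaryDims derives the rotated width from headDim and a fraction.
// Assumption: truncate, then round down to an even number so dims pair up.
func rotaryDims(headDim int, fraction float64) (rotaryDim, halfRotary int) {
	rotaryDim = int(float64(headDim) * fraction)
	rotaryDim -= rotaryDim % 2
	return rotaryDim, rotaryDim / 2
}

// applyPartial rotates the first rotaryDim elements (half-split pairing) by
// one shared angle theta and leaves the remaining elements untouched.
func applyPartial(x []float64, rotaryDim int, theta float64) []float64 {
	out := append([]float64(nil), x...) // tail dims pass through unchanged
	half := rotaryDim / 2
	sin, cos := math.Sincos(theta)
	for i := 0; i < half; i++ {
		out[i] = x[i]*cos - x[i+half]*sin
		out[i+half] = x[i]*sin + x[i+half]*cos
	}
	return out
}

func main() {
	rotaryDim, half := rotaryDims(128, 0.75)
	fmt.Println(rotaryDim, half) // 96 48

	x := []float64{1, 2, 3, 4, 5, 6, 7, 8}
	rd, _ := rotaryDims(len(x), 0.75)
	fmt.Println(applyPartial(x, rd, 0.5)[rd:]) // [7 8]
}
```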
func WithYaRNScaling ¶ added in v0.2.1
func WithYaRNScaling(factor float64, origMaxLen int) RotaryPositionalEmbeddingOption
WithYaRNScaling enables YaRN (Yet another RoPE extensioN) scaling. factor is the context extension factor (e.g. 4.0 for 4x). origMaxLen is the original maximum sequence length before scaling.
type RotaryPositionalEmbeddingOptions ¶ added in v0.2.0
type RotaryPositionalEmbeddingOptions struct {
	Base              float64 // Base for the inverse frequency calculation (theta parameter)
	YaRN              bool    // Whether to apply YaRN scaling
	YaRNFactor        float64 // YaRN scaling factor (e.g. 4.0 for 4x context extension)
	YaRNOrigML        int     // Original max sequence length before scaling
	RotaryDimFraction float64 // Fraction of head dims to rotate (default 1.0 = all)
}
RotaryPositionalEmbeddingOptions holds configuration options for RotaryPositionalEmbedding layers.
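The functional options pattern used here can be sketched in isolation: start from defaults, then let each option mutate the struct. The types below are local stand-ins that mirror the documented fields. The default base of 10000 is an assumption, and `resolve`, `withBase`, and `withYaRN` are hypothetical names.

```go
package main

import "fmt"

// options mirrors the documented fields of RotaryPositionalEmbeddingOptions.
type options struct {
	Base              float64
	YaRN              bool
	YaRNFactor        float64
	YaRNOrigML        int
	RotaryDimFraction float64
}

// option is a functional option that mutates options in place.
type option func(*options)

func withBase(base float64) option {
	return func(o *options) { o.Base = base }
}

func withYaRN(factor float64, origMaxLen int) option {
	return func(o *options) {
		o.YaRN = true
		o.YaRNFactor = factor
		o.YaRNOrigML = origMaxLen
	}
}

// resolve applies opts in order on top of defaults.
// Assumption: a default base of 10000; the 1.0 fraction default is documented.
func resolve(opts ...option) options {
	o := options{Base: 10000, RotaryDimFraction: 1.0}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", resolve())
	fmt.Printf("%+v\n", resolve(withBase(500000), withYaRN(4.0, 4096)))
}
```

Options are applied in order, so a later option overrides an earlier one that sets the same field.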
type TokenEmbedding ¶
TokenEmbedding converts token IDs into dense vector representations.
func NewTokenEmbedding ¶
func NewTokenEmbedding[T tensor.Numeric](engine compute.Engine[T], vocabSize, embeddingDim int, options ...TokenEmbeddingOption[T]) (*TokenEmbedding[T], error)
NewTokenEmbedding creates a new TokenEmbedding layer.
vocabSize: The size of the vocabulary (number of unique tokens).
embeddingDim: The dimension of the embedding vectors.
func NewTokenEmbeddingFromParam ¶ added in v0.2.0
func NewTokenEmbeddingFromParam[T tensor.Numeric](engine compute.Engine[T], embeddingTable *graph.Parameter[T]) (*TokenEmbedding[T], error)
NewTokenEmbeddingFromParam creates a new TokenEmbedding layer from an existing embedding table.
func (*TokenEmbedding[T]) Attributes ¶ added in v0.2.1
func (te *TokenEmbedding[T]) Attributes() map[string]interface{}
Attributes returns the attributes of the TokenEmbedding layer.
func (*TokenEmbedding[T]) Backward ¶
func (te *TokenEmbedding[T]) Backward(ctx context.Context, mode types.BackwardMode, outputGradient *tensor.TensorNumeric[T], _ ...*tensor.TensorNumeric[T]) ([]*tensor.TensorNumeric[T], error)
Backward computes the gradients for the embedding table.
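Conceptually, the gradient of an embedding lookup is a scatter-add: each row of the output gradient is added into the table row selected by the corresponding token ID. The sketch below shows this on nested slices. `embeddingGrad` is a hypothetical helper, and the package's tensor-based implementation may be organized differently.

```go
package main

import "fmt"

// embeddingGrad accumulates dOut rows into a zeroed gradient table.
// ids[i] selects the table row that produced output row i.
func embeddingGrad(vocabSize int, ids []int, dOut [][]float32) [][]float32 {
	dim := 0
	if len(dOut) > 0 {
		dim = len(dOut[0])
	}
	grad := make([][]float32, vocabSize)
	for v := range grad {
		grad[v] = make([]float32, dim)
	}
	for i, id := range ids {
		for j, g := range dOut[i] {
			grad[id][j] += g // repeated IDs accumulate
		}
	}
	return grad
}

func main() {
	ids := []int{2, 0, 2}
	dOut := [][]float32{{1, 1}, {2, 2}, {3, 3}}
	fmt.Println(embeddingGrad(3, ids, dOut)) // [[2 2] [0 0] [4 4]]
}
```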
func (*TokenEmbedding[T]) Forward ¶
func (te *TokenEmbedding[T]) Forward(ctx context.Context, inputs ...*tensor.TensorNumeric[T]) (*tensor.TensorNumeric[T], error)
Forward performs the embedding lookup.
Input: A tensor of token IDs (type T).
Output: A tensor of embedding vectors (type T).
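The lookup itself amounts to row selection from the embedding table. The sketch below uses integer IDs and nested slices for clarity, whereas the layer takes IDs as a tensor of type T. `lookup` is a hypothetical helper.

```go
package main

import "fmt"

// lookup returns a copy of table[id] for each token ID, in order.
func lookup(table [][]float32, ids []int) ([][]float32, error) {
	out := make([][]float32, len(ids))
	for i, id := range ids {
		if id < 0 || id >= len(table) {
			return nil, fmt.Errorf("token ID %d out of range [0, %d)", id, len(table))
		}
		out[i] = append([]float32(nil), table[id]...) // copy the row
	}
	return out, nil
}

func main() {
	table := [][]float32{{1, 2}, {3, 4}, {5, 6}} // vocabSize=3, embeddingDim=2
	out, err := lookup(table, []int{2, 0, 2})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(out) // [[5 6] [1 2] [5 6]]
}
```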
func (*TokenEmbedding[T]) OpType ¶ added in v0.2.1
func (te *TokenEmbedding[T]) OpType() string
OpType returns the operation type of the TokenEmbedding layer.
func (*TokenEmbedding[T]) OutputShape ¶
func (te *TokenEmbedding[T]) OutputShape() []int
OutputShape returns the output shape of the embedding layer.
func (*TokenEmbedding[T]) Parameters ¶
func (te *TokenEmbedding[T]) Parameters() []*graph.Parameter[T]
Parameters returns the trainable embedding table.
type TokenEmbeddingOption ¶ added in v0.2.0
type TokenEmbeddingOption[T tensor.Numeric] func(*TokenEmbeddingOptions[T])
TokenEmbeddingOption is a functional option for configuring TokenEmbedding layers.
func WithTokenEmbeddingInitializer ¶ added in v0.2.0
func WithTokenEmbeddingInitializer[T tensor.Numeric](initializer components.WeightInitializer[T]) TokenEmbeddingOption[T]
WithTokenEmbeddingInitializer sets a custom weight initializer for the embedding table.
type TokenEmbeddingOptions ¶ added in v0.2.0
type TokenEmbeddingOptions[T tensor.Numeric] struct {
	Initializer components.WeightInitializer[T]
}
TokenEmbeddingOptions holds configuration options for TokenEmbedding layers.