embeddings

package
v1.26.2
Published: Mar 27, 2026 License: Apache-2.0 Imports: 10 Imported by: 1

Documentation

Overview

Package embeddings provides neural network embedding layers for the Zerfoo ML framework.

Stability: stable

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type RotaryPositionalEmbedding

type RotaryPositionalEmbedding[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}

RotaryPositionalEmbedding applies Rotary Positional Embedding to a tensor.

func NewRotaryPositionalEmbedding

func NewRotaryPositionalEmbedding[T tensor.Numeric](
	ctx context.Context,
	engine compute.Engine[T],
	headDim int,
	seqLen int,
	options ...RotaryPositionalEmbeddingOption,
) (*RotaryPositionalEmbedding[T], error)

NewRotaryPositionalEmbedding creates a new RotaryPositionalEmbedding layer. engine: The compute engine to use for tensor operations. headDim: The dimension of each attention head. Must be even. seqLen: The maximum sequence length this embedding will be applied to.

func (*RotaryPositionalEmbedding[T]) AttentionScaleFactor added in v0.2.1

func (rpe *RotaryPositionalEmbedding[T]) AttentionScaleFactor() float64

AttentionScaleFactor returns the YaRN attention scaling factor. Returns 1.0 when YaRN is not enabled.

func (*RotaryPositionalEmbedding[T]) Attributes added in v0.2.1

func (rpe *RotaryPositionalEmbedding[T]) Attributes() map[string]interface{}

Attributes returns the attributes of the RotaryPositionalEmbedding layer.

func (*RotaryPositionalEmbedding[T]) Backward

Backward computes the gradients for RoPE. Shapes are derived from dOut so that a single RoPE instance can be shared across Q and K paths whose batch dimensions differ.

func (*RotaryPositionalEmbedding[T]) Forward

func (rpe *RotaryPositionalEmbedding[T]) Forward(ctx context.Context, inputs ...*tensor.TensorNumeric[T]) (*tensor.TensorNumeric[T], error)

Forward applies Rotary Positional Embedding to the input tensor.

func (*RotaryPositionalEmbedding[T]) GetAngles added in v0.2.1

func (rpe *RotaryPositionalEmbedding[T]) GetAngles(offset, seqLen int) (cos, sin *tensor.TensorNumeric[T], halfRotary int, err error)

GetAngles returns the cos/sin angle tensors for the given position range, along with halfRotary. For GPU-resident tables, returns non-owning views. This is used by the fused QK norm+RoPE kernel during decode.

func (*RotaryPositionalEmbedding[T]) GetAnglesGPU added in v0.2.1

func (rpe *RotaryPositionalEmbedding[T]) GetAnglesGPU(counterPtr unsafe.Pointer, seqLen int, stream unsafe.Pointer) (
	cos, sin *tensor.TensorNumeric[T], halfRotary int, err error,
)

GetAnglesGPU returns cos/sin angle tensors selected by a GPU-resident counter, avoiding CPU-side offset computation. This enables CUDA graph capture of the decode loop by keeping all position-dependent reads on the GPU. counterPtr is a device pointer to an int32 position counter (from GPUKVCache). seqLen is the number of positions to select (1 for decode). stream is the CUDA stream (an unsafe.Pointer to a cudaStream_t) for the kernel launch.

func (*RotaryPositionalEmbedding[T]) OpType added in v0.2.1

func (rpe *RotaryPositionalEmbedding[T]) OpType() string

OpType returns the operation type of the RotaryPositionalEmbedding layer.

func (*RotaryPositionalEmbedding[T]) OutputShape

func (rpe *RotaryPositionalEmbedding[T]) OutputShape() []int

OutputShape returns the output shape of the RoPE layer.

func (*RotaryPositionalEmbedding[T]) Parameters

func (rpe *RotaryPositionalEmbedding[T]) Parameters() []*graph.Parameter[T]

Parameters returns no trainable parameters for RoPE.

func (*RotaryPositionalEmbedding[T]) RotaryDim added in v0.2.1

func (rpe *RotaryPositionalEmbedding[T]) RotaryDim() int

RotaryDim returns the number of dimensions that receive rotation.

func (*RotaryPositionalEmbedding[T]) Scale added in v0.2.0

func (rpe *RotaryPositionalEmbedding[T]) Scale(ctx context.Context, factor float64) error

Scale scales the positional embeddings by a given factor.

func (*RotaryPositionalEmbedding[T]) SetPositionOffset added in v0.2.1

func (rpe *RotaryPositionalEmbedding[T]) SetPositionOffset(offset int)

SetPositionOffset sets the position offset for the next Forward call. During autoregressive decode, call this with the current cache sequence length so that the new token is rotated at the correct absolute position instead of always position 0.

type RotaryPositionalEmbeddingOption added in v0.2.0

type RotaryPositionalEmbeddingOption func(*RotaryPositionalEmbeddingOptions)

RotaryPositionalEmbeddingOption is a functional option for configuring RotaryPositionalEmbedding layers.

func WithRotaryBase added in v0.2.0

func WithRotaryBase(base float64) RotaryPositionalEmbeddingOption

WithRotaryBase sets the base (theta) parameter for the inverse frequency calculation.

func WithRotaryDimFraction added in v0.2.1

func WithRotaryDimFraction(fraction float64) RotaryPositionalEmbeddingOption

WithRotaryDimFraction sets the fraction of head dimensions that receive rotation. Default is 1.0 (all dimensions rotated). Phi-4 uses 0.75 for partial RoPE.
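Mapping the fraction to a rotated-dimension count can be sketched like this; the rounding to an even count (so dimensions stay paired) is an assumption, and `rotaryDim` is a hypothetical helper, not this package's code.

```go
package main

import "fmt"

// rotaryDim maps a head dimension and fraction to the number of dimensions
// that receive rotation, rounded down to an even count so dimensions stay
// paired. The exact rounding rule used by the layer is an assumption.
func rotaryDim(headDim int, fraction float64) int {
	d := int(float64(headDim) * fraction)
	if d%2 != 0 {
		d--
	}
	return d
}

func main() {
	fmt.Println(rotaryDim(128, 0.75)) // Phi-4 style partial RoPE: 96
	fmt.Println(rotaryDim(64, 1.0))   // full rotation: 64
}
```

The remaining headDim - rotaryDim dimensions pass through Forward unchanged, which is what "partial RoPE" refers to.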

func WithYaRNScaling added in v0.2.1

func WithYaRNScaling(factor float64, origMaxLen int) RotaryPositionalEmbeddingOption

WithYaRNScaling enables YaRN (Yet another RoPE extensioN) scaling. factor is the context extension factor (e.g. 4.0 for 4x). origMaxLen is the original maximum sequence length before scaling.

type RotaryPositionalEmbeddingOptions added in v0.2.0

type RotaryPositionalEmbeddingOptions struct {
	Base              float64 // Base for the inverse frequency calculation (theta parameter)
	YaRN              bool    // Whether to apply YaRN scaling
	YaRNFactor        float64 // YaRN scaling factor (e.g. 4.0 for 4x context extension)
	YaRNOrigML        int     // Original max sequence length before scaling
	RotaryDimFraction float64 // Fraction of head dims to rotate (default 1.0 = all)
}

RotaryPositionalEmbeddingOptions holds configuration options for RotaryPositionalEmbedding layers.

type TokenEmbedding

type TokenEmbedding[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}

TokenEmbedding converts token IDs into dense vector representations.

func NewTokenEmbedding

func NewTokenEmbedding[T tensor.Numeric](engine compute.Engine[T], vocabSize, embeddingDim int, options ...TokenEmbeddingOption[T]) (*TokenEmbedding[T], error)

NewTokenEmbedding creates a new TokenEmbedding layer. vocabSize: The size of the vocabulary (number of unique tokens). embeddingDim: The dimension of the embedding vectors.

func NewTokenEmbeddingFromParam added in v0.2.0

func NewTokenEmbeddingFromParam[T tensor.Numeric](engine compute.Engine[T], embeddingTable *graph.Parameter[T]) (*TokenEmbedding[T], error)

NewTokenEmbeddingFromParam creates a new TokenEmbedding layer from an existing embedding table.

func (*TokenEmbedding[T]) Attributes added in v0.2.1

func (te *TokenEmbedding[T]) Attributes() map[string]interface{}

Attributes returns the attributes of the TokenEmbedding layer.

func (*TokenEmbedding[T]) Backward

func (te *TokenEmbedding[T]) Backward(ctx context.Context, mode types.BackwardMode, outputGradient *tensor.TensorNumeric[T], _ ...*tensor.TensorNumeric[T]) ([]*tensor.TensorNumeric[T], error)

Backward computes the gradients for the embedding table.
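Conceptually this is a scatter-add: each output-gradient row accumulates into the table row of its token ID, so repeated tokens sum their gradients. A dense standalone sketch of that computation (not the package's actual code):

```go
package main

import "fmt"

// embeddingGrad scatter-adds the output gradient into an embedding-table
// gradient: row ids[i] of the table receives dOut row i, and repeated IDs
// accumulate rather than overwrite.
func embeddingGrad(ids []int, dOut [][]float64, vocabSize, dim int) [][]float64 {
	dTable := make([][]float64, vocabSize)
	for v := range dTable {
		dTable[v] = make([]float64, dim)
	}
	for i, id := range ids {
		for j := 0; j < dim; j++ {
			dTable[id][j] += dOut[i][j]
		}
	}
	return dTable
}

func main() {
	// Token 1 appears twice, so its gradient row sums both dOut rows.
	dTable := embeddingGrad([]int{1, 1}, [][]float64{{1, 2}, {3, 4}}, 3, 2)
	fmt.Println(dTable[1]) // [4 6]
}
```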

func (*TokenEmbedding[T]) Forward

func (te *TokenEmbedding[T]) Forward(ctx context.Context, inputs ...*tensor.TensorNumeric[T]) (*tensor.TensorNumeric[T], error)

Forward performs the embedding lookup. Input: A tensor of token IDs (T type). Output: A tensor of embedding vectors (T type).
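The lookup itself is a gather over table rows, turning a sequence of IDs into a sequence of embedding vectors. A minimal sketch under the assumption of a dense 2D table (not this package's implementation):

```go
package main

import "fmt"

// lookup gathers embedding rows for a sequence of token IDs, turning an
// input of shape [seqLen] into an output of shape [seqLen][embeddingDim].
func lookup(table [][]float64, ids []int) [][]float64 {
	out := make([][]float64, len(ids))
	for i, id := range ids {
		out[i] = table[id] // each output row references the table row for id
	}
	return out
}

func main() {
	table := [][]float64{{0.1, 0.2}, {0.3, 0.4}, {0.5, 0.6}}
	fmt.Println(lookup(table, []int{2, 0})) // [[0.5 0.6] [0.1 0.2]]
}
```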

func (*TokenEmbedding[T]) OpType added in v0.2.1

func (te *TokenEmbedding[T]) OpType() string

OpType returns the operation type of the TokenEmbedding layer.

func (*TokenEmbedding[T]) OutputShape

func (te *TokenEmbedding[T]) OutputShape() []int

OutputShape returns the output shape of the embedding layer.

func (*TokenEmbedding[T]) Parameters

func (te *TokenEmbedding[T]) Parameters() []*graph.Parameter[T]

Parameters returns the trainable embedding table.

type TokenEmbeddingOption added in v0.2.0

type TokenEmbeddingOption[T tensor.Numeric] func(*TokenEmbeddingOptions[T])

TokenEmbeddingOption is a functional option for configuring TokenEmbedding layers.

func WithTokenEmbeddingInitializer added in v0.2.0

func WithTokenEmbeddingInitializer[T tensor.Numeric](initializer components.WeightInitializer[T]) TokenEmbeddingOption[T]

WithTokenEmbeddingInitializer sets a custom weight initializer for the embedding table.

type TokenEmbeddingOptions added in v0.2.0

type TokenEmbeddingOptions[T tensor.Numeric] struct {
	Initializer components.WeightInitializer[T]
}

TokenEmbeddingOptions holds configuration options for TokenEmbedding layers.
