Overview ¶
Package audio provides audio-related neural network layers.
Stability: alpha
Index ¶
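- type WhisperEncoder
- func NewWhisperEncoder[T tensor.Numeric](name string, engine compute.Engine[T], ops numeric.Arithmetic[T], cfg WhisperEncoderConfig) (*WhisperEncoder[T], error)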
- func (e *WhisperEncoder[T]) Attributes() map[string]interface{}
- func (e *WhisperEncoder[T]) Backward(_ context.Context, _ types.BackwardMode, _ *tensor.TensorNumeric[T], ...) ([]*tensor.TensorNumeric[T], error)
- func (e *WhisperEncoder[T]) Forward(ctx context.Context, inputs ...*tensor.TensorNumeric[T]) (*tensor.TensorNumeric[T], error)
- func (e *WhisperEncoder[T]) OpType() string
- func (e *WhisperEncoder[T]) OutputShape() []int
- func (e *WhisperEncoder[T]) Parameters() []*graph.Parameter[T]
- type WhisperEncoderConfig
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type WhisperEncoder ¶
WhisperEncoder implements a Whisper-style audio encoder: a 2-layer Conv1D frontend (stride 2 for temporal downsampling) followed by NumLayers transformer encoder blocks (self-attention + FFN + layer norm).
Input shape: [batch, num_mels, T_frames]
Output shape: [T_downsampled, hidden_dim]
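How T_frames shrinks to T_downsampled can be sketched with a naive Conv1D on plain Go slices. This is an illustrative sketch, not this package's implementation: it assumes zero padding of KernelSize/2 on each side and, following OpenAI's Whisper, stride 1 for the first convolution and stride 2 for the second. The helpers `outLen` and `conv1d` are hypothetical, and the GELU activation Whisper applies after each convolution is omitted.

```go
package main

import "fmt"

// outLen returns the output length of a 1-D convolution:
// floor((in + 2*pad - kernel) / stride) + 1.
func outLen(in, kernel, stride, pad int) int {
	return (in+2*pad-kernel)/stride + 1
}

// conv1d applies a 1-D convolution to x with shape [inCh][T].
// w has shape [outCh][inCh][kernel] and b has shape [outCh].
// Positions outside [0, T) read as zero (zero padding).
func conv1d(x [][]float64, w [][][]float64, b []float64, stride, pad int) [][]float64 {
	inCh, t := len(x), len(x[0])
	outCh, kernel := len(w), len(w[0][0])
	n := outLen(t, kernel, stride, pad)
	out := make([][]float64, outCh)
	for o := 0; o < outCh; o++ {
		out[o] = make([]float64, n)
		for i := 0; i < n; i++ {
			sum := b[o]
			for c := 0; c < inCh; c++ {
				for k := 0; k < kernel; k++ {
					pos := i*stride + k - pad
					if pos >= 0 && pos < t {
						sum += w[o][c][k] * x[c][pos]
					}
				}
			}
			out[o][i] = sum
		}
	}
	return out
}

func main() {
	// 3000 frames is Whisper's 30 s window; kernel 3 and the strides are assumptions.
	const frames, kernel = 3000, 3
	pad := kernel / 2
	t1 := outLen(frames, kernel, 1, pad) // first conv, stride 1 (assumed)
	t2 := outLen(t1, kernel, 2, pad)     // second conv, stride 2 (assumed)
	fmt.Println(t1, t2)                  // 3000 1500

	// Tiny numeric check: one input channel, one output channel, kernel [1 1 1].
	x := [][]float64{{1, 2, 3, 4}}
	w := [][][]float64{{{1, 1, 1}}}
	fmt.Println(conv1d(x, w, []float64{0}, 2, 1)) // [[3 9]]
}
```

Under these assumptions only the second convolution halves the time axis, so T_downsampled is roughly T_frames / 2.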
func NewWhisperEncoder ¶
func NewWhisperEncoder[T tensor.Numeric](
	name string,
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	cfg WhisperEncoderConfig,
) (*WhisperEncoder[T], error)
NewWhisperEncoder creates a new WhisperEncoder.
func (*WhisperEncoder[T]) Attributes ¶
func (e *WhisperEncoder[T]) Attributes() map[string]interface{}
func (*WhisperEncoder[T]) Backward ¶
func (e *WhisperEncoder[T]) Backward(_ context.Context, _ types.BackwardMode, _ *tensor.TensorNumeric[T], _ ...*tensor.TensorNumeric[T]) ([]*tensor.TensorNumeric[T], error)
func (*WhisperEncoder[T]) Forward ¶
func (e *WhisperEncoder[T]) Forward(ctx context.Context, inputs ...*tensor.TensorNumeric[T]) (*tensor.TensorNumeric[T], error)
Forward runs the Whisper encoder.
Input: [batch, num_mels, T_frames]
Output: [T_downsampled, hidden_dim]
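Each transformer block inside Forward applies self-attention over the [T_downsampled, hidden_dim] sequence. The sketch below shows multi-head scaled dot-product attention on plain slices. It is illustrative only: it assumes the Q, K and V projections have already been applied, splits hidden_dim evenly across NumHeads heads, and uses no mask (encoder self-attention is unmasked). `attention`, `sliceCols` and `multiHeadAttention` are hypothetical helpers, not part of this package.

```go
package main

import (
	"fmt"
	"math"
)

// attention computes softmax(q·kᵀ / sqrt(d)) · v for a single head.
// q and k have shape [T][d]; v has shape [T][dv].
func attention(q, k, v [][]float64) [][]float64 {
	t, d := len(q), len(q[0])
	scale := 1 / math.Sqrt(float64(d))
	out := make([][]float64, t)
	for i := 0; i < t; i++ {
		// Scores of query i against every key; subtract the max for a stable softmax.
		scores := make([]float64, t)
		maxScore := math.Inf(-1)
		for j := 0; j < t; j++ {
			var dot float64
			for x := 0; x < d; x++ {
				dot += q[i][x] * k[j][x]
			}
			scores[j] = dot * scale
			maxScore = math.Max(maxScore, scores[j])
		}
		var sum float64
		for j := range scores {
			scores[j] = math.Exp(scores[j] - maxScore)
			sum += scores[j]
		}
		// Output row i is the softmax-weighted sum of the value rows.
		out[i] = make([]float64, len(v[0]))
		for j := 0; j < t; j++ {
			w := scores[j] / sum
			for x := range v[j] {
				out[i][x] += w * v[j][x]
			}
		}
	}
	return out
}

// sliceCols returns a view of columns [lo, hi) of m.
func sliceCols(m [][]float64, lo, hi int) [][]float64 {
	s := make([][]float64, len(m))
	for i, row := range m {
		s[i] = row[lo:hi]
	}
	return s
}

// multiHeadAttention splits the hidden dimension into numHeads equal heads,
// runs attention on each head, and concatenates the results back to [T][hidden].
func multiHeadAttention(q, k, v [][]float64, numHeads int) [][]float64 {
	hidden := len(q[0])
	headDim := hidden / numHeads // assumes hidden % numHeads == 0
	out := make([][]float64, len(q))
	for i := range out {
		out[i] = make([]float64, 0, hidden)
	}
	for h := 0; h < numHeads; h++ {
		lo, hi := h*headDim, (h+1)*headDim
		head := attention(sliceCols(q, lo, hi), sliceCols(k, lo, hi), sliceCols(v, lo, hi))
		for i := range out {
			out[i] = append(out[i], head[i]...)
		}
	}
	return out
}

func main() {
	// All-zero queries and keys give uniform attention weights,
	// so every output row is the mean of the value rows.
	q := [][]float64{{0, 0}, {0, 0}}
	v := [][]float64{{1, 2}, {3, 4}}
	fmt.Println(multiHeadAttention(q, q, v, 2)) // [[2 3] [2 3]]
}
```

Concatenating the heads keeps each row hidden_dim wide, which matches the [T_downsampled, hidden_dim] shape Forward returns.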
func (*WhisperEncoder[T]) OpType ¶
func (e *WhisperEncoder[T]) OpType() string
func (*WhisperEncoder[T]) OutputShape ¶
func (e *WhisperEncoder[T]) OutputShape() []int
func (*WhisperEncoder[T]) Parameters ¶
func (e *WhisperEncoder[T]) Parameters() []*graph.Parameter[T]
Parameters returns all trainable parameters from the encoder.
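Because Parameters returns every trainable parameter, the total number of scalars it exposes is determined by the config and the layer layout. The sketch below estimates that total. Every structural detail it uses is an assumption, listed in the code comment, since WhisperEncoderConfig has no FFN size field and this page does not say which layers carry biases. `paramCount` and the local `config` type are hypothetical.

```go
package main

import "fmt"

// config is a local stand-in mirroring WhisperEncoderConfig's fields,
// so this sketch compiles on its own.
type config struct {
	NumMels, HiddenDim, NumHeads, NumLayers, KernelSize int
}

// paramCount estimates the number of trainable scalars, ASSUMING:
//   - both conv layers have a weight and a bias;
//   - each block has Q, K, V and output projections with biases,
//     a two-layer FFN with a 4*HiddenDim inner size, and two layer norms
//     (a scale and a shift each);
//   - one final layer norm follows the last block;
//   - positional embeddings are fixed (sinusoidal) and not trainable.
func paramCount(c config) int {
	h, k := c.HiddenDim, c.KernelSize
	conv := (c.NumMels*h*k + h) + (h*h*k + h)
	attn := 4 * (h*h + h)
	ffn := (h*4*h + 4*h) + (4*h*h + h)
	norms := 2 * (2 * h)
	block := attn + ffn + norms
	finalNorm := 2 * h
	return conv + c.NumLayers*block + finalNorm
}

func main() {
	// Whisper "tiny"-sized encoder dimensions.
	tiny := config{NumMels: 80, HiddenDim: 384, NumHeads: 6, NumLayers: 4, KernelSize: 3}
	fmt.Println(paramCount(tiny)) // 7633920
}
```

NumHeads does not appear in the formula: splitting the projections into heads reshapes the weights without adding any.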
type WhisperEncoderConfig ¶
type WhisperEncoderConfig struct {
	NumMels    int // Number of mel channels (input channels for conv frontend).
	HiddenDim  int // Hidden dimension throughout the encoder.
	NumHeads   int // Number of attention heads per transformer block.
	NumLayers  int // Number of transformer encoder blocks.
	KernelSize int // Kernel size for the conv1d frontend layers.
}
WhisperEncoderConfig holds configuration for a WhisperEncoder.
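A config for an encoder the size of OpenAI's Whisper "tiny" model could look like the sketch below. The struct is a local copy so the example compiles without importing the package, and `validate` is a hypothetical pre-flight check: the divisibility rule is the usual requirement for splitting HiddenDim across attention heads, not a constraint documented for NewWhisperEncoder.

```go
package main

import (
	"errors"
	"fmt"
)

// WhisperEncoderConfig is a local copy of the documented struct.
type WhisperEncoderConfig struct {
	NumMels    int
	HiddenDim  int
	NumHeads   int
	NumLayers  int
	KernelSize int
}

// validate is a hypothetical check; the real constructor may enforce
// different (or no) conditions.
func validate(cfg WhisperEncoderConfig) error {
	switch {
	case cfg.NumMels <= 0 || cfg.HiddenDim <= 0 || cfg.NumLayers <= 0 || cfg.KernelSize <= 0:
		return errors.New("dimensions must be positive")
	case cfg.NumHeads <= 0 || cfg.HiddenDim%cfg.NumHeads != 0:
		// NumHeads is checked for zero first, so the modulo never divides by zero.
		return fmt.Errorf("HiddenDim %d must be divisible by NumHeads %d", cfg.HiddenDim, cfg.NumHeads)
	}
	return nil
}

func main() {
	// Dimensions of Whisper "tiny": 80 mels, width 384, 6 heads, 4 layers, kernel 3.
	cfg := WhisperEncoderConfig{NumMels: 80, HiddenDim: 384, NumHeads: 6, NumLayers: 4, KernelSize: 3}
	fmt.Println(validate(cfg))                           // <nil>
	fmt.Println("head dim:", cfg.HiddenDim/cfg.NumHeads) // head dim: 64
}
```

With the real package, such a config would be passed as the cfg argument of NewWhisperEncoder, with engine and ops constructed elsewhere.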