package audio

v1.26.0 (latest) | Published: Mar 27, 2026 | License: Apache-2.0 | Imports: 10 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/zerfoo/zerfoo

Documentation

Overview

Package audio provides audio-related neural network layers.

Stability: alpha

Index

type WhisperEncoder
- func NewWhisperEncoder[T tensor.Numeric](name string, engine compute.Engine[T], ops numeric.Arithmetic[T], ...) (*WhisperEncoder[T], error)
type WhisperEncoderConfig

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type WhisperEncoder

type WhisperEncoder[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}

WhisperEncoder implements a Whisper-style audio encoder: a 2-layer Conv1D frontend (stride 2 for temporal downsampling) followed by NumLayers transformer encoder blocks (self-attention + FFN + layer norm).

Input shape: [batch, num_mels, T_frames]
Output shape: [T_downsampled, hidden_dim]
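The length of the T_downsampled axis follows from the standard 1-D convolution output-length formula, floor((L + 2*pad - kernel) / stride) + 1, applied once per frontend layer. The sketch below assumes the layout of the original OpenAI Whisper model (first layer stride 1, second layer stride 2, both padded by KernelSize/2); the description above only states that the frontend uses stride 2, so this package's actual per-layer strides and padding may differ. `conv1dOutLen` and `frontendOutLen` are hypothetical helpers, not part of the package.

```go
package main

import "fmt"

// conv1dOutLen returns the output length of a 1-D convolution over an
// input of length inLen: floor((inLen + 2*pad - kernel) / stride) + 1.
func conv1dOutLen(inLen, kernel, stride, pad int) int {
	return (inLen+2*pad-kernel)/stride + 1
}

// frontendOutLen applies two conv1d layers in sequence.
// ASSUMPTION: padding of kernel/2, first layer stride 1, second layer
// stride 2, as in the original Whisper model. This package may differ.
func frontendOutLen(tFrames, kernel int) int {
	pad := kernel / 2
	t := conv1dOutLen(tFrames, kernel, 1, pad) // length-preserving
	return conv1dOutLen(t, kernel, 2, pad)     // halves the time axis
}

func main() {
	// Whisper turns a 30 s clip (10 ms hop) into 3000 mel frames.
	fmt.Println(frontendOutLen(3000, 3)) // 1500
}
```

Under these assumptions, the 3000 mel frames of a 30-second clip become 1500 encoder positions along the T_downsampled axis.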

func NewWhisperEncoder

func NewWhisperEncoder[T tensor.Numeric](
	name string,
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	cfg WhisperEncoderConfig,
) (*WhisperEncoder[T], error)

NewWhisperEncoder creates a new WhisperEncoder configured by cfg. It returns a non-nil error if the encoder cannot be constructed.
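Because NewWhisperEncoder can fail, callers may want to check a configuration before calling it. The sketch below is a hypothetical pre-flight check: the rules it enforces (positive sizes, and HiddenDim divisible by NumHeads so attention can split the hidden dimension evenly across heads) are assumptions about what a constructor like this typically rejects, not documented behavior of this package. It uses a local struct that mirrors WhisperEncoderConfig so it compiles without zerfoo.

```go
package main

import (
	"errors"
	"fmt"
)

// whisperEncoderConfig mirrors the exported fields of
// audio.WhisperEncoderConfig so this sketch compiles on its own.
type whisperEncoderConfig struct {
	NumMels    int
	HiddenDim  int
	NumHeads   int
	NumLayers  int
	KernelSize int
}

// validateConfig is a HYPOTHETICAL check; the package does not
// document its actual validation rules.
func validateConfig(cfg whisperEncoderConfig) error {
	switch {
	case cfg.NumMels <= 0:
		return errors.New("NumMels must be positive")
	case cfg.HiddenDim <= 0:
		return errors.New("HiddenDim must be positive")
	case cfg.NumHeads <= 0:
		return errors.New("NumHeads must be positive")
	case cfg.NumLayers <= 0:
		return errors.New("NumLayers must be positive")
	case cfg.KernelSize <= 0:
		return errors.New("KernelSize must be positive")
	case cfg.HiddenDim%cfg.NumHeads != 0:
		// Multi-head attention splits HiddenDim evenly across heads.
		return fmt.Errorf("HiddenDim %d is not divisible by NumHeads %d", cfg.HiddenDim, cfg.NumHeads)
	}
	return nil
}

func main() {
	ok := whisperEncoderConfig{NumMels: 80, HiddenDim: 384, NumHeads: 6, NumLayers: 4, KernelSize: 3}
	bad := ok
	bad.NumHeads = 5
	fmt.Println(validateConfig(ok))  // <nil>
	fmt.Println(validateConfig(bad)) // HiddenDim 384 is not divisible by NumHeads 5
}
```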

func (*WhisperEncoder[T]) Attributes

func (e *WhisperEncoder[T]) Attributes() map[string]interface{}

func (*WhisperEncoder[T]) Backward

func (e *WhisperEncoder[T]) Backward(_ context.Context, _ types.BackwardMode, _ *tensor.TensorNumeric[T], _ ...*tensor.TensorNumeric[T]) ([]*tensor.TensorNumeric[T], error)

func (*WhisperEncoder[T]) Forward

func (e *WhisperEncoder[T]) Forward(ctx context.Context, inputs ...*tensor.TensorNumeric[T]) (*tensor.TensorNumeric[T], error)

Forward runs the Whisper encoder.

Input shape: [batch, num_mels, T_frames]
Output shape: [T_downsampled, hidden_dim]
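The shape contract of Forward can be checked ahead of time. The hypothetical helper below rejects inputs that are not rank 3 or whose mel axis does not match NumMels, then predicts the documented output shape. It assumes the original Whisper frontend layout (stride 1 then stride 2, padding KernelSize/2), which this page does not confirm, and, like the documented output shape, it drops the batch axis.

```go
package main

import "fmt"

// expectedOutputShape is a HYPOTHETICAL helper mirroring Forward's
// documented contract: [batch, num_mels, T_frames] -> [T_downsampled, hidden_dim].
// ASSUMPTION: stride 1 then stride 2 convolutions, padding kernel/2.
func expectedOutputShape(in []int, numMels, hiddenDim, kernel int) ([]int, error) {
	if len(in) != 3 {
		return nil, fmt.Errorf("want rank-3 input [batch, num_mels, T_frames], got rank %d", len(in))
	}
	if in[1] != numMels {
		return nil, fmt.Errorf("want %d mel channels, got %d", numMels, in[1])
	}
	pad := kernel / 2
	t := in[2] + 2*pad - kernel + 1 // first conv, stride 1
	t = (t+2*pad-kernel)/2 + 1      // second conv, stride 2
	// The documented output shape carries no batch axis.
	return []int{t, hiddenDim}, nil
}

func main() {
	out, err := expectedOutputShape([]int{1, 80, 3000}, 80, 384, 3)
	fmt.Println(out, err) // [1500 384] <nil>
	_, err = expectedOutputShape([]int{1, 128, 3000}, 80, 384, 3)
	fmt.Println(err) // want 80 mel channels, got 128
}
```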

func (*WhisperEncoder[T]) OpType

func (e *WhisperEncoder[T]) OpType() string

func (*WhisperEncoder[T]) OutputShape

func (e *WhisperEncoder[T]) OutputShape() []int

func (*WhisperEncoder[T]) Parameters

func (e *WhisperEncoder[T]) Parameters() []*graph.Parameter[T]

Parameters returns all trainable parameters from the encoder.
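A common use of a parameter list is counting trainable scalars, which is the product of each parameter's shape summed over all parameters. This page does not show how graph.Parameter exposes its shape, so the sketch below works on plain shape slices instead. The example shapes for the two Conv1D frontend layers (weight [out, in, kernel] plus a bias, with NumMels=80, HiddenDim=384, KernelSize=3) are hypothetical and cover only the frontend, not the transformer blocks.

```go
package main

import "fmt"

// numElements returns the number of scalars in a tensor of the given shape.
func numElements(shape []int) int {
	n := 1
	for _, d := range shape {
		n *= d
	}
	return n
}

// countParams sums element counts over a list of parameter shapes.
// With zerfoo, the shapes would come from the slice returned by Parameters().
func countParams(shapes [][]int) int {
	total := 0
	for _, s := range shapes {
		total += numElements(s)
	}
	return total
}

func main() {
	// HYPOTHETICAL frontend shapes: conv1 weight and bias, conv2 weight and bias.
	frontend := [][]int{
		{384, 80, 3}, {384},
		{384, 384, 3}, {384},
	}
	fmt.Println(countParams(frontend)) // 535296
}
```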

type WhisperEncoderConfig

type WhisperEncoderConfig struct {
	NumMels    int // Number of mel channels (input channels for conv frontend).
	HiddenDim  int // Hidden dimension throughout the encoder.
	NumHeads   int // Number of attention heads per transformer block.
	NumLayers  int // Number of transformer encoder blocks.
	KernelSize int // Kernel size for the conv1d frontend layers.
}

WhisperEncoderConfig holds configuration for a WhisperEncoder.
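For orientation, the original OpenAI Whisper models use the encoder sizes below (80 mel channels, kernel size 3); the large variants are omitted. The sketch expresses them with a local struct that mirrors WhisperEncoderConfig. The preset names and map are illustrative only: nothing on this page says that zerfoo ships these presets or has been tested with them. In every case the per-head width HiddenDim / NumHeads comes out to 64.

```go
package main

import "fmt"

// whisperEncoderConfig mirrors audio.WhisperEncoderConfig so the sketch
// compiles on its own.
type whisperEncoderConfig struct {
	NumMels, HiddenDim, NumHeads, NumLayers, KernelSize int
}

// encoderPresets holds encoder sizes of the original OpenAI Whisper
// models. ILLUSTRATIVE values, not presets provided by this package.
var encoderPresets = map[string]whisperEncoderConfig{
	"tiny":   {NumMels: 80, HiddenDim: 384, NumHeads: 6, NumLayers: 4, KernelSize: 3},
	"base":   {NumMels: 80, HiddenDim: 512, NumHeads: 8, NumLayers: 6, KernelSize: 3},
	"small":  {NumMels: 80, HiddenDim: 768, NumHeads: 12, NumLayers: 12, KernelSize: 3},
	"medium": {NumMels: 80, HiddenDim: 1024, NumHeads: 16, NumLayers: 24, KernelSize: 3},
}

// headDim is the per-head width used by multi-head attention.
func headDim(c whisperEncoderConfig) int { return c.HiddenDim / c.NumHeads }

func main() {
	for _, name := range []string{"tiny", "base", "small", "medium"} {
		c := encoderPresets[name]
		fmt.Printf("%-6s hidden=%d heads=%d layers=%d headDim=%d\n",
			name, c.HiddenDim, c.NumHeads, c.NumLayers, headDim(c))
	}
}
```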
