package audio

Version: v1.26.0
Warning: This package is not in the latest version of its module.
Published: Mar 27, 2026 License: Apache-2.0 Imports: 10 Imported by: 0

Documentation

Overview

Package audio provides audio-related neural network layers.

Stability: alpha

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type WhisperEncoder

type WhisperEncoder[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}

WhisperEncoder implements a Whisper-style audio encoder with a 2-layer Conv1D frontend (stride 2 for temporal downsampling) followed by N transformer encoder blocks (self-attention + FFN + layer norm).

Input shape: [batch, num_mels, T_frames]
Output shape: [T_downsampled, hidden_dim]
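
The relationship between T_frames and T_downsampled is not spelled out in this documentation. As a rough sketch, the standard Conv1D output-length formula can be applied once per frontend layer. The helper name `convOutLen`, the padding of 1, the 3000-frame input, and the kernel size of 3 are all assumptions for illustration, and it is not stated here whether one or both conv layers use stride 2, so both cases are shown.

```go
package main

import "fmt"

// convOutLen returns the number of output frames of a 1-D convolution
// using the standard formula floor((t + 2*pad - kernel) / stride) + 1.
// It returns 0 when the padded input is shorter than the kernel or the
// stride is not positive. This is a hypothetical helper, not package API.
func convOutLen(t, kernel, stride, pad int) int {
	span := t + 2*pad - kernel
	if span < 0 || stride <= 0 {
		return 0
	}
	return span/stride + 1
}

func main() {
	// Assumed values, not taken from the package: 3000 input frames,
	// kernel size 3, and a padding of 1 on each side.
	const tFrames, kernel, pad = 3000, 3, 1

	// Case A (assumption): only the second conv layer uses stride 2.
	a := convOutLen(convOutLen(tFrames, kernel, 1, pad), kernel, 2, pad)

	// Case B (assumption): both conv layers use stride 2.
	b := convOutLen(convOutLen(tFrames, kernel, 2, pad), kernel, 2, pad)

	fmt.Println(a, b)
}
```

Under these assumptions the time axis shrinks by a factor of 2 or 4. The value actually produced by the encoder depends on its real padding and stride settings, which should be confirmed against the source or by inspecting the shape of the tensor returned by Forward.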

func NewWhisperEncoder

func NewWhisperEncoder[T tensor.Numeric](
	name string,
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	cfg WhisperEncoderConfig,
) (*WhisperEncoder[T], error)

NewWhisperEncoder creates a new WhisperEncoder.

func (*WhisperEncoder[T]) Attributes

func (e *WhisperEncoder[T]) Attributes() map[string]interface{}

func (*WhisperEncoder[T]) Backward

func (*WhisperEncoder[T]) Forward

func (e *WhisperEncoder[T]) Forward(ctx context.Context, inputs ...*tensor.TensorNumeric[T]) (*tensor.TensorNumeric[T], error)

Forward runs the Whisper encoder.

Input: [batch, num_mels, T_frames]
Output: [T_downsampled, hidden_dim]

func (*WhisperEncoder[T]) OpType

func (e *WhisperEncoder[T]) OpType() string

func (*WhisperEncoder[T]) OutputShape

func (e *WhisperEncoder[T]) OutputShape() []int

func (*WhisperEncoder[T]) Parameters

func (e *WhisperEncoder[T]) Parameters() []*graph.Parameter[T]

Parameters returns all trainable parameters from the encoder.

type WhisperEncoderConfig

type WhisperEncoderConfig struct {
	NumMels    int // Number of mel channels (input channels for conv frontend).
	HiddenDim  int // Hidden dimension throughout the encoder.
	NumHeads   int // Number of attention heads per transformer block.
	NumLayers  int // Number of transformer encoder blocks.
	KernelSize int // Kernel size for the conv1d frontend layers.
}

WhisperEncoderConfig holds configuration for a WhisperEncoder.
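
The documentation does not say which checks NewWhisperEncoder performs before returning an error. The sketch below shows the kind of sanity checks a caller might run first. The struct is a local mirror of the documented fields so that the example compiles on its own, `validateConfig` is a hypothetical helper rather than part of the package, and the divisibility rule (HiddenDim must be a multiple of NumHeads) is the usual multi-head attention constraint, assumed here rather than confirmed for this implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// WhisperEncoderConfig is a local mirror of the documented struct,
// declared here only so the sketch is self-contained.
type WhisperEncoderConfig struct {
	NumMels    int // Number of mel channels.
	HiddenDim  int // Hidden dimension throughout the encoder.
	NumHeads   int // Number of attention heads per block.
	NumLayers  int // Number of transformer encoder blocks.
	KernelSize int // Kernel size for the conv1d frontend layers.
}

// validateConfig is a hypothetical pre-flight check, not package API.
func validateConfig(cfg WhisperEncoderConfig) error {
	switch {
	case cfg.NumMels <= 0:
		return errors.New("NumMels must be positive")
	case cfg.HiddenDim <= 0:
		return errors.New("HiddenDim must be positive")
	case cfg.NumHeads <= 0:
		return errors.New("NumHeads must be positive")
	case cfg.NumLayers <= 0:
		return errors.New("NumLayers must be positive")
	case cfg.KernelSize <= 0:
		return errors.New("KernelSize must be positive")
	case cfg.HiddenDim%cfg.NumHeads != 0:
		// Assumed constraint: each head gets HiddenDim/NumHeads features.
		return fmt.Errorf("HiddenDim %d is not divisible by NumHeads %d",
			cfg.HiddenDim, cfg.NumHeads)
	}
	return nil
}

func main() {
	// Illustrative values only; they are not defaults from the package.
	cfg := WhisperEncoderConfig{
		NumMels:    80,
		HiddenDim:  384,
		NumHeads:   6,
		NumLayers:  4,
		KernelSize: 3,
	}
	if err := validateConfig(cfg); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config looks consistent")
}
```

A check like this only catches obvious mistakes early. The error returned by NewWhisperEncoder remains the authoritative signal for whether a configuration is accepted.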
