autowhisper

package
v0.14.3 (Latest)
Warning

This package is not in the latest version of its module.

Published: May 8, 2026 License: GPL-3.0 Imports: 9 Imported by: 0

Documentation

Overview

Package autowhisper provides a simple cross-OS API over whisper.cpp Go bindings.

It automatically converts WAV input to 16 kHz mono and runs transcription without requiring users to set C_INCLUDE_PATH or LIBRARY_PATH. Callers need only provide the path to a whisper GGML model file and the path to a WAV file.

Example:

text, err := autowhisper.TranscribeFile("/path/to/ggml-tiny.bin", "/path/to/audio.wav", autowhisper.Options{Language: "auto"})
if err != nil {
	log.Fatal(err)
}
fmt.Println(text)
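The overview says WAV input is converted to 16 kHz mono automatically but does not say how. As a rough illustration only (not the package's actual code), the sketch below shows one common approach on PCM16 samples: average the interleaved channels into mono, then resample to 16 kHz by linear interpolation. The function name toMono16k and the choice of linear interpolation are assumptions; a production resampler would normally low-pass filter before downsampling.

```go
package main

import "fmt"

// toMono16k is a hypothetical helper illustrating the conversion described in
// the overview: it averages interleaved channels down to mono, then resamples
// to 16 kHz by linear interpolation (no anti-aliasing filter).
func toMono16k(pcm []int16, inRate, inChannels int) []int16 {
	const outRate = 16000
	if inRate <= 0 || inChannels <= 0 || len(pcm) < inChannels {
		return nil
	}
	// Downmix: average each interleaved frame across all channels.
	frames := len(pcm) / inChannels
	mono := make([]float64, frames)
	for i := 0; i < frames; i++ {
		var sum float64
		for c := 0; c < inChannels; c++ {
			sum += float64(pcm[i*inChannels+c])
		}
		mono[i] = sum / float64(inChannels)
	}
	// Resample: each output sample interpolates between its two nearest inputs.
	outLen := int(int64(frames) * outRate / int64(inRate))
	out := make([]int16, outLen)
	step := float64(inRate) / outRate
	for i := range out {
		pos := float64(i) * step
		j := int(pos)
		frac := pos - float64(j)
		next := mono[j]
		if j+1 < frames {
			next = mono[j+1]
		}
		out[i] = int16(mono[j] + frac*(next-mono[j]))
	}
	return out
}

func main() {
	// One second of 48 kHz stereo silence becomes one second of 16 kHz mono.
	stereo := make([]int16, 48000*2)
	fmt.Println(len(toMono16k(stereo, 48000, 2))) // 16000
}
```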

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func GPUDiscrete

func GPUDiscrete() bool

GPUDiscrete returns true if a discrete GPU is being used for inference. On Windows or Linux with Vulkan, it distinguishes discrete GPUs (NVIDIA, AMD Radeon, Intel Arc) from integrated GPUs (Intel UHD/Iris, AMD APU graphics). On other platforms, such as macOS, it returns false; callers should treat macOS as a special case, since Metal performs well even on integrated GPUs.
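The macOS caveat means a caller cannot rely on GPUDiscrete alone to decide whether the hardware is fast. A minimal sketch of a caller-side policy, assuming the two booleans come from autowhisper.GPUEnabled() and autowhisper.GPUDiscrete(); the function preferFastSettings and its rule are hypothetical, not part of the package.

```go
package main

import (
	"fmt"
	"runtime"
)

// preferFastSettings is a hypothetical policy following the GPUDiscrete
// guidance: macOS is treated as fast because Metal performs well even on
// integrated GPUs; elsewhere only an enabled discrete GPU counts as fast.
func preferFastSettings(goos string, gpuEnabled, gpuDiscrete bool) bool {
	if goos == "darwin" {
		return true
	}
	return gpuEnabled && gpuDiscrete
}

func main() {
	// Stand-in values; a real caller would pass autowhisper.GPUEnabled()
	// and autowhisper.GPUDiscrete() here.
	fmt.Println(preferFastSettings(runtime.GOOS, true, false))
}
```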

func GPUEnabled

func GPUEnabled() bool

GPUEnabled returns true if GPU acceleration is being used for inference. On Windows or Linux with Vulkan support, this is true if a Vulkan GPU is available. On macOS, Metal is always used (handled by the whisper.cpp library internally). On Linux or Windows without Vulkan, this returns false (CPU-only).
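The platform rules above can be restated as a small lookup, which may help when logging or testing which backend to expect. This is a sketch only: expectedBackend and its vulkanGPU parameter are hypothetical, and the package does not expose Vulkan availability under that name.

```go
package main

import (
	"fmt"
	"runtime"
)

// expectedBackend encodes the documented rules: Metal on macOS, Vulkan on
// Windows or Linux when a Vulkan GPU is available, and CPU otherwise.
func expectedBackend(goos string, vulkanGPU bool) string {
	switch {
	case goos == "darwin":
		return "metal"
	case (goos == "windows" || goos == "linux") && vulkanGPU:
		return "vulkan"
	default:
		return "cpu"
	}
}

func main() {
	fmt.Println(expectedBackend(runtime.GOOS, false))
}
```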

func ProcessorDescription

func ProcessorDescription() string

ProcessorDescription returns a string describing the processor used for whisper inference. If GPU acceleration is enabled, it returns the GPU device description; when running on the CPU, it returns CPU information including the OS, architecture, and core count. The result is cached because computing it can be expensive (e.g., running system_profiler on macOS).
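Caching an expensive, never-changing value like this maps naturally onto sync.OnceValue (Go 1.21+). The sketch below shows that pattern; it is not the package's implementation, and both the probe function and the "CPU (os/arch, N cores)" format are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// probeCount records how many times the expensive probe actually ran.
var probeCount int

// probeProcessor stands in for the costly detection work (for example,
// invoking system_profiler on macOS). The output format is an assumption.
func probeProcessor() string {
	probeCount++
	return fmt.Sprintf("CPU (%s/%s, %d cores)", runtime.GOOS, runtime.GOARCH, runtime.NumCPU())
}

// processorDescription runs probeProcessor at most once, even under
// concurrent calls, and returns the cached string afterwards.
var processorDescription = sync.OnceValue(probeProcessor)

func main() {
	fmt.Println(processorDescription())
	fmt.Println(processorDescription())
	fmt.Println(probeCount) // 1
}
```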

func TranscribeWithModel

func TranscribeWithModel(m *Model, pcm []int16, inSampleRate, inChannels int, opts Options) (string, error)

TranscribeWithModel transcribes PCM16 audio using a pre-loaded model. This is more efficient when transcribing multiple audio samples as it avoids reloading the model each time.
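TranscribeWithModel takes decoded PCM16 samples rather than a file. If the audio is available as raw bytes, for example the data chunk of a standard 16-bit WAV file (assumed little-endian), it must be decoded first. A minimal sketch; bytesToPCM16 is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bytesToPCM16 decodes little-endian 16-bit PCM (the layout of a standard WAV
// data chunk) into the []int16 slice that TranscribeWithModel accepts.
// A trailing odd byte, if any, is ignored.
func bytesToPCM16(b []byte) []int16 {
	pcm := make([]int16, len(b)/2)
	for i := range pcm {
		pcm[i] = int16(binary.LittleEndian.Uint16(b[2*i:]))
	}
	return pcm
}

func main() {
	// 0x0001 = 1, 0xFFFF = -1, 0x8000 = -32768
	fmt.Println(bytesToPCM16([]byte{0x01, 0x00, 0xFF, 0xFF, 0x00, 0x80})) // [1 -1 -32768]
}
```

The decoded slice can then be passed along with its original sample rate and channel count, e.g. TranscribeWithModel(m, pcm, 44100, 2, opts), to a model that is loaded once and reused across calls.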

Types

type GPUDeviceInfo

type GPUDeviceInfo = whisper.GPUDeviceInfo

GPUDeviceInfo re-exports the GPU device information type.

type GPUInfo

type GPUInfo = whisper.GPUInfo

GPUInfo re-exports the GPU information type.

func GetGPUInfo

func GetGPUInfo() GPUInfo

GetGPUInfo returns detailed information about GPU acceleration status and devices. This includes all available GPU devices, their memory, and which device is selected.

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model wraps a loaded whisper model for reuse across transcriptions.

func LoadModelFromBytes

func LoadModelFromBytes(data []byte) (*Model, error)

LoadModelFromBytes loads a whisper model from bytes for reuse.

func (*Model) Close

func (m *Model) Close() error

Close releases the model resources.

type Options

type Options struct {
	// Language to use for speech recognition. Use "auto" to auto-detect (default).
	Language string
	// Translate to English if supported by the model.
	Translate bool
	// Number of threads to use. If 0, uses runtime.NumCPU().
	Threads int
	// Enable word-level splitting for more granular segments.
	SplitOnWord bool
	// Initial system prompt to bias decoding.
	InitialPrompt string
	// Enable token timestamps (may reduce speed).
	TokenTimestamps bool
	// Max tokens per segment (0 = no limit).
	MaxTokensPerSegment uint
	// RealtimeFactor is the ratio of transcription time to audio duration from benchmarking.
	// A value < 0.05 (20x+ realtime) indicates fast hardware suitable for beam search.
	// A value of 0 means unknown/not benchmarked.
	RealtimeFactor float64
}

Options configures the transcription behavior.

type Transcriber

type Transcriber struct {
	// contains filtered or unexported fields
}

Transcriber accumulates audio while push-to-talk (PTT) is active and transcribes it when Stop is called. It replaces an earlier streaming approach that was wasteful: that approach processed 3 s windows every 500 ms during PTT, discarded all intermediate results, and then reprocessed all of the audio on PTT release anyway.

func NewTranscriber

func NewTranscriber(m *Model, modelMu *sync.Mutex, opts Options) *Transcriber

NewTranscriber creates a new transcriber for accumulating audio. The modelMu mutex is used to serialize access to the whisper model.

func (*Transcriber) AddSamples

func (t *Transcriber) AddSamples(samples []int16)

AddSamples adds new audio samples to the buffer. This should be called from the audio capture callback.

func (*Transcriber) Stop

func (t *Transcriber) Stop() (text string, audioDuration time.Duration)

Stop ends the recording session and returns the final transcription along with the duration of the recorded audio.

func (*Transcriber) StopWithAudio

func (t *Transcriber) StopWithAudio() (text string, audioDuration time.Duration, audio []float32)

StopWithAudio ends the recording session and returns the final transcription, the audio duration, and the raw audio samples (float32, 16 kHz mono). This is useful for evaluation modes that need to reprocess the audio.

Directories

Path Synopsis
internal
