autowhisper

package

v0.14.3 Latest Latest Go to latest Published: May 8, 2026 License: GPL-3.0 Imports: 9 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/mmp/vice

Links

Open Source Insights

Documentation ¶

Overview ¶

Package autowhisper provides a simple cross-OS API over whisper.cpp Go bindings.

It automatically converts WAV input to 16 kHz mono and runs transcription without requiring users to set C_INCLUDE_PATH or LIBRARY_PATH. Consumers only need to provide a path to a whisper GGML model and a WAV file path.

Example:

text, err := autowhisper.TranscribeFile("/path/to/ggml-tiny.bin", "/path/to/audio.wav", autowhisper.Options{ Language: "auto" })
if err != nil { log.Fatal(err) }
fmt.Println(text)

Index ¶

func GPUDiscrete() bool
func GPUEnabled() bool
func ProcessorDescription() string
func TranscribeWithModel(m *Model, pcm []int16, inSampleRate, inChannels int, opts Options) (string, error)
type GPUDeviceInfo
type GPUInfo
- func GetGPUInfo() GPUInfo
type Model
- func LoadModelFromBytes(data []byte) (*Model, error)
- func (m *Model) Close() error
type Options
type Transcriber
- func NewTranscriber(m *Model, modelMu *sync.Mutex, opts Options) *Transcriber

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func GPUDiscrete ¶

func GPUDiscrete() bool

GPUDiscrete returns true if a discrete GPU is being used for inference. On Windows or Linux with Vulkan, this distinguishes between discrete GPUs (NVIDIA, AMD Radeon, Intel Arc) and integrated GPUs (Intel UHD/Iris, AMD APU graphics). On other platforms (macOS), this returns false; callers should handle macOS specially since Metal provides good performance even on integrated GPUs.

func GPUEnabled ¶

func GPUEnabled() bool

GPUEnabled returns true if GPU acceleration is being used for inference. On Windows or Linux with Vulkan support, this is true if a Vulkan GPU is available. On macOS, Metal is always used (handled by the whisper.cpp library internally). On Linux or Windows without Vulkan, this returns false (CPU-only).

func ProcessorDescription ¶

func ProcessorDescription() string

ProcessorDescription returns a string describing the processor being used for whisper. If GPU acceleration is enabled, it returns the GPU device description. If running on CPU, it returns CPU info with OS, architecture, and core count. The result is cached since it involves expensive system calls (e.g., system_profiler on macOS).

func TranscribeWithModel ¶

func TranscribeWithModel(m *Model, pcm []int16, inSampleRate, inChannels int, opts Options) (string, error)

TranscribeWithModel transcribes PCM16 audio using a pre-loaded model. This is more efficient when transcribing multiple audio samples as it avoids reloading the model each time.

Types ¶

type GPUDeviceInfo ¶

type GPUDeviceInfo = whisper.GPUDeviceInfo

GPUDeviceInfo re-exports the GPU device information type.

type GPUInfo ¶

type GPUInfo = whisper.GPUInfo

GPUInfo re-exports the GPU information type.

func GetGPUInfo ¶

func GetGPUInfo() GPUInfo

GetGPUInfo returns detailed information about GPU acceleration status and devices. This includes all available GPU devices, their memory, and which device is selected.

type Model ¶

type Model struct {
	// contains filtered or unexported fields
}

Model wraps a loaded whisper model for reuse across transcriptions.

func LoadModelFromBytes ¶

func LoadModelFromBytes(data []byte) (*Model, error)

LoadModelFromBytes loads a whisper model from bytes for reuse.

func (*Model) Close ¶

func (m *Model) Close() error

Close releases the model resources.

type Options ¶

type Options struct {
	// Language to use for speech recognition. Use "auto" to auto-detect (default).
	Language string
	// Translate to English if supported by model.
	Translate bool
	// Number of threads to use. If 0, uses runtime.NumCPU().
	Threads int
	// Enable word-level splitting for more granular segments.
	SplitOnWord bool
	// Initial system prompt to bias decoding.
	InitialPrompt string
	// Enable token timestamps (may reduce speed).
	TokenTimestamps bool
	// Max tokens per segment (0 = no limit).
	MaxTokensPerSegment uint
	// RealtimeFactor is the ratio of transcription time to audio duration from benchmarking.
	// A value < 0.05 (20x+ realtime) indicates fast hardware suitable for beam search.
	// A value of 0 means unknown/not benchmarked.
	RealtimeFactor float64
}

Options configures the transcription behavior.

type Transcriber ¶

type Transcriber struct {
	// contains filtered or unexported fields
}

Transcriber accumulates audio during PTT and transcribes on Stop(). This replaces the previous streaming approach which was wasteful: it processed 3s windows every 500ms during PTT but discarded all intermediate results and reprocessed ALL audio on PTT release anyway.

func NewTranscriber ¶

func NewTranscriber(m *Model, modelMu *sync.Mutex, opts Options) *Transcriber

NewTranscriber creates a new transcriber for accumulating audio. The modelMu mutex is used to serialize access to the whisper model.

func (*Transcriber) AddSamples ¶

func (t *Transcriber) AddSamples(samples []int16)

AddSamples adds new audio samples to the buffer. This should be called from the audio capture callback.

func (*Transcriber) Stop ¶

func (t *Transcriber) Stop() (text string, audioDuration time.Duration)

Stop ends the recording session and returns the final transcription along with the duration of the recorded audio.

func (*Transcriber) StopWithAudio ¶

func (t *Transcriber) StopWithAudio() (text string, audioDuration time.Duration, audio []float32)

StopWithAudio ends the recording session and returns the final transcription, audio duration, and the raw audio samples (as float32, 16kHz mono). This is useful for evaluation modes that need to re-process the audio.

Source Files ¶

View all Source files

Directories ¶

Path	Synopsis
internal
whisper
whisperlow

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL