voice

package
v1.0.0 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 21, 2026 License: Apache-2.0 Imports: 15 Imported by: 0

Documentation

Overview

Package voice provides speech-to-text (STT) and text-to-speech (TTS) engines for voice-based interaction with OK. It wraps Whisper.cpp (STT) and Piper (TTS) as external processes, so no CGo is required and the pure Go build stays intact.

Audio I/O: the AudioRecorder and AudioPlayer interfaces define the platform-independent contract. The default implementation delegates to the recorder and player in stt.go, which run external programs through os/exec.

The package also provides the full voice interaction loop for OK. It combines STT (Whisper.cpp), TTS (Piper), and the Agent into a complete speak-listen-respond cycle.

Architecture:

User speaks → STT (Whisper.cpp) → text → Agent processes → response text
→ TTS (Piper) → audio → User hears

Package voice also defines the voice builtin tool (see Tool) so the agent can speak and listen. The tool is registered in boot.go, not via init(), to avoid circular imports.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func AgentVoiceLoop

func AgentVoiceLoop(ctx context.Context, eng *Engine, ag *agent.Agent) error

AgentVoiceLoop runs a complete voice interaction with the agent.

func ModelDir

func ModelDir() string

ModelDir returns the directory where voice models are stored.

func WriteWAV

func WriteWAV(w io.Writer, samples []int16, sampleRate int) error

WriteWAV writes PCM audio data as a WAV file.

Types

type AudioPlayer

type AudioPlayer interface {
	Play(ctx context.Context, samples []byte) error
}

AudioPlayer plays raw PCM audio data to the speaker.

type AudioRecorder

type AudioRecorder interface {
	Record(ctx context.Context, duration time.Duration) ([]byte, error)
}

AudioRecorder records audio from the microphone.

func NewRecorder

func NewRecorder() AudioRecorder

NewRecorder returns the platform's best available audio recorder.

type Engine

type Engine struct {
	// contains filtered or unexported fields
}

Engine manages a voice interaction session.

func NewEngine

func NewEngine(lang string) *Engine

NewEngine creates a voice engine for the given language.

func (*Engine) ListenAndRespond

func (e *Engine) ListenAndRespond(ctx context.Context) (string, error)

ListenAndRespond performs one complete voice interaction turn.

func (*Engine) SetLanguage

func (e *Engine) SetLanguage(lang string)

SetLanguage switches the voice engine to a different language. It is not safe for concurrent use on its own: callers inside the package must hold e.mu, and external callers must ensure that no other goroutine is accessing the engine's STT or TTS at the same time.

func (*Engine) SpeakOutput

func (e *Engine) SpeakOutput(ctx context.Context, text string) error

SpeakOutput speaks text to the user.

type Recorder

type Recorder struct{}

Recorder captures audio from the microphone.

func (*Recorder) Record

func (r *Recorder) Record(ctx context.Context, duration time.Duration) ([]byte, error)

Record captures audio for the given duration and returns WAV bytes.

type STT

type STT struct {
	// contains filtered or unexported fields
}

STT wraps Whisper.cpp for speech recognition.

func NewSTT

func NewSTT(modelDir, lang string) *STT

NewSTT creates a speech recognizer using the given whisper model.

func (*STT) DetectLanguage

func (s *STT) DetectLanguage(audio []byte) string

DetectLanguage detects the language of audio.

func (*STT) Transcribe

func (s *STT) Transcribe(ctx context.Context, audio []byte) (string, error)

Transcribe converts audio data (WAV) to text.

type TTS

type TTS struct {
	// contains filtered or unexported fields
}

TTS wraps Piper for speech synthesis.

func NewTTS

func NewTTS(modelDir, lang string) *TTS

NewTTS creates a speech synthesizer using the given Piper voice model.

func (*TTS) Speak

func (t *TTS) Speak(ctx context.Context, text string) error

Speak converts text to audio and plays it.

func (*TTS) Synthesize

func (t *TTS) Synthesize(ctx context.Context, text string) ([]byte, error)

Synthesize converts text to WAV audio bytes without playing them.

type Tool

type Tool struct {
	Engine *Engine
}

Tool implements the voice interaction builtin tool.

func (*Tool) Description

func (v *Tool) Description() string

func (*Tool) Execute

func (v *Tool) Execute(ctx context.Context, args json.RawMessage) (string, error)

func (*Tool) Name

func (v *Tool) Name() string

func (*Tool) ReadOnly

func (v *Tool) ReadOnly() bool

func (*Tool) Schema

func (v *Tool) Schema() json.RawMessage
