Documentation
Overview
Package voice provides speech-to-text and text-to-speech engines.
Audio I/O interface: AudioRecorder and AudioPlayer define the platform-independent contract for capturing and playing audio. The default implementation delegates to the recorder/player in stt.go, which runs external tools through os/exec, so no CGo is needed.
Package voice provides the full voice interaction loop for OK. It combines STT (Whisper.cpp), TTS (Piper), and the Agent into a complete speak-listen-respond cycle.
Package voice provides speech-to-text and text-to-speech engines for voice-based interaction with OK. It wraps Whisper.cpp (STT) and Piper (TTS) as external processes, so no CGo is required and the pure Go build stays intact.
Architecture:
User speaks → STT (Whisper.cpp) → text → Agent processes → response text → TTS (Piper) → audio → User hears
Package voice provides speech interaction with OK. This file registers the voice builtin tool so the agent can speak and listen. Registration happens in boot.go, not via init(), to avoid circular imports.
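To illustrate why explicit registration sidesteps the import cycle: if the voice package registered itself from init(), it would have to import the registry's package, while the agent side already depends on voice. A boot step that imports both can wire them together instead. The sketch below collapses everything into one file; Registry, Tool, registerVoice, and boot are hypothetical names, not the project's real types.

```go
package main

import (
	"fmt"
	"sort"
)

// Tool is a hypothetical builtin tool signature.
type Tool func(input string) (string, error)

// Registry is a hypothetical tool registry owned by the agent side.
type Registry struct{ tools map[string]Tool }

func NewRegistry() *Registry { return &Registry{tools: make(map[string]Tool)} }

// Register adds a tool and rejects duplicate names.
func (r *Registry) Register(name string, t Tool) error {
	if _, dup := r.tools[name]; dup {
		return fmt.Errorf("tool %q already registered", name)
	}
	r.tools[name] = t
	return nil
}

// Names returns the registered tool names in sorted order.
func (r *Registry) Names() []string {
	names := make([]string, 0, len(r.tools))
	for n := range r.tools {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// registerVoice stands in for what the voice package could export.
// It receives the registry as an argument instead of reaching for a global.
func registerVoice(r *Registry) error {
	return r.Register("voice", func(input string) (string, error) {
		return "spoke: " + input, nil
	})
}

// boot plays the role of boot.go: the one place that knows both sides.
func boot() (*Registry, error) {
	r := NewRegistry()
	if err := registerVoice(r); err != nil {
		return nil, err
	}
	return r, nil
}

func main() {
	r, err := boot()
	if err != nil {
		fmt.Println("boot failed:", err)
		return
	}
	fmt.Println(r.Names())
}
```

Explicit wiring also makes the registration order deterministic and lets a failed registration surface as an ordinary error, which init() cannot return.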
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
func AgentVoiceLoop
AgentVoiceLoop runs a complete voice interaction with the agent.
Types
type AudioPlayer
AudioPlayer plays raw PCM audio data to the speaker.
type AudioRecorder
type AudioRecorder interface {
	Record(ctx context.Context, duration time.Duration) ([]byte, error)
}
AudioRecorder records audio from the microphone.
func NewRecorder
func NewRecorder() AudioRecorder
NewRecorder returns the platform's best available audio recorder.
type Engine
type Engine struct {
	// contains filtered or unexported fields
}
Engine manages a voice interaction session.
func (*Engine) ListenAndRespond
ListenAndRespond performs one complete voice interaction turn.
func (*Engine) SetLanguage
SetLanguage switches the voice engine to a different language. The caller must either hold e.mu or ensure that no other goroutine is accessing stt/tts concurrently.
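A common way to honor a "caller holds the lock" contract is to pair the unlocked method with a wrapper that takes the mutex. The sketch below uses a hypothetical engine type with a single lang field; the names setLanguage, SwitchLanguage, and Language are illustrative and do not come from the package.

```go
package main

import (
	"fmt"
	"sync"
)

// engine is a hypothetical stand-in for the real Engine.
type engine struct {
	mu   sync.Mutex
	lang string
}

// setLanguage assumes the caller already holds e.mu, mirroring the
// documented contract. It does no locking of its own.
func (e *engine) setLanguage(lang string) {
	e.lang = lang
}

// SwitchLanguage is a hypothetical wrapper that is safe to call from any
// goroutine because it acquires e.mu before delegating.
func (e *engine) SwitchLanguage(lang string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.setLanguage(lang)
}

// Language reads the current language under the same lock.
func (e *engine) Language() string {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.lang
}

func main() {
	e := &engine{lang: "en"}

	var wg sync.WaitGroup
	for _, l := range []string{"de", "fr", "es"} {
		wg.Add(1)
		go func(lang string) {
			defer wg.Done()
			e.SwitchLanguage(lang) // concurrent callers go through the wrapper
		}(l)
	}
	wg.Wait()

	fmt.Println("final language:", e.Language())
}
```

Code that already holds the mutex, such as a turn handler that locks for its whole duration, would call the unlocked variant directly; sync.Mutex is not reentrant, so calling the wrapper there would deadlock.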
type STT
type STT struct {
	// contains filtered or unexported fields
}
STT wraps Whisper.cpp for speech recognition.
func (*STT) DetectLanguage
DetectLanguage detects the language spoken in the given audio.
type TTS
type TTS struct {
	// contains filtered or unexported fields
}
TTS wraps Piper for speech synthesis.