voice

package

Version: v1.0.0 (latest)
Published: Jun 21, 2026
License: Apache-2.0
Imports: 15
Imported by: 0

Repository

github.com/NB-Agent/ok

Documentation ¶

Overview ¶

Package voice provides speech-to-text (STT) and text-to-speech (TTS) engines for voice-based interaction with OK. It wraps Whisper.cpp (STT) and Piper (TTS) as external processes, so no CGo is required and the pure Go build stays intact.

Package voice also provides the full voice interaction loop for OK. It combines STT (Whisper.cpp), TTS (Piper), and the Agent into a complete speak-listen-respond cycle.

Audio I/O interface: AudioRecorder and AudioPlayer define the platform-independent contract. The default implementation delegates to the recorder and player in stt.go, which use os/exec (no CGo needed).
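Driving an engine as an external process usually comes down to a small os/exec helper. The following is a minimal sketch of that idea, not the package's actual code: runFilter and runWithDefaultTimeout are hypothetical names, the 30-second limit is an assumption, and "cat" only stands in for a real engine binary whose command-line flags are not documented here.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"time"
)

// runFilter starts an external program, feeds stdin to it, and returns what
// the program wrote to stdout. Nothing here is specific to Whisper.cpp or
// Piper; the caller supplies the binary name and arguments.
func runFilter(ctx context.Context, stdin []byte, name string, args ...string) ([]byte, error) {
	cmd := exec.CommandContext(ctx, name, args...) // process is killed if ctx is cancelled
	cmd.Stdin = bytes.NewReader(stdin)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("%s: %w (stderr: %q)", name, err, stderr.String())
	}
	return stdout.Bytes(), nil
}

// runWithDefaultTimeout wraps runFilter with an assumed 30s upper bound.
func runWithDefaultTimeout(stdin []byte, name string, args ...string) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	return runFilter(ctx, stdin, name, args...)
}

func main() {
	// "cat" is a placeholder binary that echoes stdin back on stdout.
	out, err := runWithDefaultTimeout([]byte("hello"), "cat")
	if err != nil {
		fmt.Println("external tool failed:", err)
		return
	}
	fmt.Printf("%s\n", out)
}
```

Because the child process is tied to the context, cancelling a voice turn also stops the engine process, and the build needs no C toolchain.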

Architecture:

User speaks → STT (Whisper.cpp) → text → Agent processes → response text
→ TTS (Piper) → audio → User hears
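The flow above can be sketched as one function that chains the three stages. This is an illustration only: transcriber, responder, synthesizer, turn, and the fake implementations are hypothetical stand-ins, and the Respond method in particular is an assumed shape for the agent, which this page does not document.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Hypothetical stage interfaces; they mirror the diagram, not the real types.
type transcriber interface {
	Transcribe(ctx context.Context, audio []byte) (string, error)
}
type responder interface {
	Respond(ctx context.Context, prompt string) (string, error)
}
type synthesizer interface {
	Synthesize(ctx context.Context, text string) ([]byte, error)
}

// turn runs audio -> text -> response text -> audio, stopping at the first error.
func turn(ctx context.Context, stt transcriber, ag responder, tts synthesizer, audio []byte) ([]byte, error) {
	text, err := stt.Transcribe(ctx, audio)
	if err != nil {
		return nil, fmt.Errorf("stt: %w", err)
	}
	reply, err := ag.Respond(ctx, text)
	if err != nil {
		return nil, fmt.Errorf("agent: %w", err)
	}
	out, err := tts.Synthesize(ctx, reply)
	if err != nil {
		return nil, fmt.Errorf("tts: %w", err)
	}
	return out, nil
}

// Fakes that treat bytes as text so the example runs without any audio tools.
type fakeSTT struct{}

func (fakeSTT) Transcribe(_ context.Context, audio []byte) (string, error) {
	return string(audio), nil
}

type echoAgent struct{}

func (echoAgent) Respond(_ context.Context, prompt string) (string, error) {
	return "you said: " + prompt, nil
}

type fakeTTS struct{}

func (fakeTTS) Synthesize(_ context.Context, text string) ([]byte, error) {
	return []byte(strings.ToUpper(text)), nil
}

// demoTurn wires the fakes together for a single turn.
func demoTurn(spoken string) (string, error) {
	out, err := turn(context.Background(), fakeSTT{}, echoAgent{}, fakeTTS{}, []byte(spoken))
	return string(out), err
}

func main() {
	out, err := demoTurn("hello")
	if err != nil {
		fmt.Println("turn failed:", err)
		return
	}
	fmt.Println(out) // YOU SAID: HELLO
}
```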

Package voice provides speech interaction with OK. One source file in this package provides the voice builtin tool so the agent can speak and listen. Registration of the tool happens in boot.go, not via init(), to avoid circular imports.

Index ¶

func AgentVoiceLoop(ctx context.Context, eng *Engine, ag *agent.Agent) error
func ModelDir() string
func WriteWAV(w io.Writer, samples []int16, sampleRate int) error
type AudioPlayer
type AudioRecorder
- func NewRecorder() AudioRecorder
type Engine
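- func NewEngine(lang string) *Engine
- func (e *Engine) ListenAndRespond(ctx context.Context) (string, error)
- func (e *Engine) SetLanguage(lang string)
- func (e *Engine) SpeakOutput(ctx context.Context, text string) error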
type Recorder
- func (r *Recorder) Record(ctx context.Context, duration time.Duration) ([]byte, error)
type STT
- func NewSTT(modelDir, lang string) *STT
- func (s *STT) DetectLanguage(audio []byte) string
- func (s *STT) Transcribe(ctx context.Context, audio []byte) (string, error)
type TTS
- func NewTTS(modelDir, lang string) *TTS
- func (t *TTS) Speak(ctx context.Context, text string) error
- func (t *TTS) Synthesize(ctx context.Context, text string) ([]byte, error)
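type Tool
- func (v *Tool) Description() string
- func (v *Tool) Execute(ctx context.Context, args json.RawMessage) (string, error)
- func (v *Tool) Name() string
- func (v *Tool) ReadOnly() bool
- func (v *Tool) Schema() json.RawMessage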

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func AgentVoiceLoop ¶

func AgentVoiceLoop(ctx context.Context, eng *Engine, ag *agent.Agent) error

AgentVoiceLoop runs a complete voice interaction with the agent.

func ModelDir ¶

func ModelDir() string

ModelDir returns the directory where voice models are stored.

func WriteWAV ¶

func WriteWAV(w io.Writer, samples []int16, sampleRate int) error

WriteWAV writes PCM audio data as a WAV file.
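To make the output format concrete, here is a sketch of a writer with the same signature. It is not the package's implementation: writeWAVSketch, wavHeader, and wavBytes are hypothetical names, and the choice of mono, 16-bit PCM with the canonical 44-byte RIFF header is an assumption based on the []int16 sample type.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// wavHeader is the canonical 44-byte RIFF/WAVE header for uncompressed PCM.
type wavHeader struct {
	RIFF          [4]byte
	ChunkSize     uint32 // 36 + size of the data chunk
	WAVE          [4]byte
	Fmt           [4]byte
	FmtSize       uint32 // 16 for PCM
	AudioFormat   uint16 // 1 = PCM
	Channels      uint16
	SampleRate    uint32
	ByteRate      uint32 // SampleRate * BlockAlign
	BlockAlign    uint16 // bytes per sample frame
	BitsPerSample uint16
	Data          [4]byte
	DataSize      uint32
}

// writeWAVSketch writes samples as mono 16-bit little-endian PCM (assumed layout).
func writeWAVSketch(w io.Writer, samples []int16, sampleRate int) error {
	const channels, bitsPerSample = 1, 16
	blockAlign := uint16(channels * bitsPerSample / 8)
	dataSize := uint32(len(samples)) * uint32(blockAlign)
	hdr := wavHeader{
		RIFF:          [4]byte{'R', 'I', 'F', 'F'},
		ChunkSize:     36 + dataSize,
		WAVE:          [4]byte{'W', 'A', 'V', 'E'},
		Fmt:           [4]byte{'f', 'm', 't', ' '},
		FmtSize:       16,
		AudioFormat:   1,
		Channels:      channels,
		SampleRate:    uint32(sampleRate),
		ByteRate:      uint32(sampleRate) * uint32(blockAlign),
		BlockAlign:    blockAlign,
		BitsPerSample: bitsPerSample,
		Data:          [4]byte{'d', 'a', 't', 'a'},
		DataSize:      dataSize,
	}
	if err := binary.Write(w, binary.LittleEndian, hdr); err != nil {
		return err
	}
	return binary.Write(w, binary.LittleEndian, samples)
}

// wavBytes is a small helper that collects the output in memory.
func wavBytes(samples []int16, sampleRate int) ([]byte, error) {
	var buf bytes.Buffer
	err := writeWAVSketch(&buf, samples, sampleRate)
	return buf.Bytes(), err
}

func main() {
	b, err := wavBytes([]int16{1, -1, 0}, 16000)
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println(len(b), "bytes") // 50 bytes
}
```

Three samples produce 44 header bytes plus 6 data bytes. Taking an io.Writer rather than a file path lets the same function target a file, a buffer, or a pipe to another process.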

Types ¶

type AudioPlayer ¶

type AudioPlayer interface {
	Play(ctx context.Context, samples []byte) error
}

AudioPlayer plays raw PCM audio data to the speaker.

type AudioRecorder ¶

type AudioRecorder interface {
	Record(ctx context.Context, duration time.Duration) ([]byte, error)
}

AudioRecorder records audio from the microphone.

func NewRecorder ¶

func NewRecorder() AudioRecorder

NewRecorder returns the platform's best available audio recorder.
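One practical benefit of the two interfaces is that code written against them can run without a microphone or speaker. The sketch below redeclares both interfaces locally so that it compiles on its own; cannedRecorder, memoryPlayer, loopback, and roundTrip are hypothetical test doubles and helpers, not part of the package.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Local copies of the documented interfaces, so the example is self-contained.
type AudioRecorder interface {
	Record(ctx context.Context, duration time.Duration) ([]byte, error)
}
type AudioPlayer interface {
	Play(ctx context.Context, samples []byte) error
}

// cannedRecorder returns a fixed clip instead of reading from a microphone.
type cannedRecorder struct{ clip []byte }

func (r cannedRecorder) Record(ctx context.Context, _ time.Duration) ([]byte, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return append([]byte(nil), r.clip...), nil // copy so callers cannot mutate the clip
}

// memoryPlayer stores what it was asked to play instead of using a speaker.
type memoryPlayer struct{ played []byte }

func (p *memoryPlayer) Play(ctx context.Context, samples []byte) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	p.played = append(p.played, samples...)
	return nil
}

// loopback depends only on the interfaces, so any platform backend fits.
func loopback(ctx context.Context, rec AudioRecorder, out AudioPlayer) error {
	clip, err := rec.Record(ctx, time.Second)
	if err != nil {
		return err
	}
	return out.Play(ctx, clip)
}

// roundTrip runs loopback with the in-memory doubles.
func roundTrip(clip []byte) ([]byte, error) {
	player := &memoryPlayer{}
	err := loopback(context.Background(), cannedRecorder{clip: clip}, player)
	return player.played, err
}

func main() {
	got, err := roundTrip([]byte{1, 2, 3})
	fmt.Println(got, err) // [1 2 3] <nil>
}
```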

type Engine ¶

type Engine struct {
	// contains filtered or unexported fields
}

Engine manages a voice interaction session.

func NewEngine ¶

func NewEngine(lang string) *Engine

NewEngine creates a voice engine for the given language.

func (*Engine) ListenAndRespond ¶

func (e *Engine) ListenAndRespond(ctx context.Context) (string, error)

ListenAndRespond performs one complete voice interaction turn.

func (*Engine) SetLanguage ¶

func (e *Engine) SetLanguage(lang string)

SetLanguage switches the voice engine to a different language. It is not safe for concurrent use on its own: internal callers must hold e.mu, and external callers must ensure that no other goroutine is using the engine's STT or TTS at the same time.
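Since e.mu is unexported, code outside the package has to provide its own serialization. One possible approach, sketched here under assumptions, is a thin wrapper that guards every engine call with a single mutex. The voiceEngine interface, the serialized wrapper, and fakeEngine are hypothetical; only the two method signatures are taken from this page.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// voiceEngine lists the two documented methods that must not overlap.
type voiceEngine interface {
	SetLanguage(lang string)
	ListenAndRespond(ctx context.Context) (string, error)
}

// serialized lets only one goroutine touch the wrapped engine at a time.
type serialized struct {
	mu  sync.Mutex
	eng voiceEngine
}

func (s *serialized) SetLanguage(lang string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.eng.SetLanguage(lang)
}

func (s *serialized) ListenAndRespond(ctx context.Context) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.eng.ListenAndRespond(ctx)
}

// fakeEngine has no locking of its own, like an engine used from outside.
type fakeEngine struct {
	lang  string
	turns int
}

func (f *fakeEngine) SetLanguage(lang string) { f.lang = lang }

func (f *fakeEngine) ListenAndRespond(context.Context) (string, error) {
	f.turns++
	return "reply in " + f.lang, nil
}

// stress runs n language switches and n turns concurrently through the
// wrapper and reports how many turns completed.
func stress(n int) int {
	fake := &fakeEngine{lang: "en"}
	s := &serialized{eng: fake}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); s.SetLanguage("de") }()
		go func() { defer wg.Done(); _, _ = s.ListenAndRespond(context.Background()) }()
	}
	wg.Wait()
	return fake.turns
}

func main() {
	fmt.Println(stress(20)) // 20
}
```

The trade-off is that a language switch waits for any turn in progress to finish, which matches the requirement that no other goroutine is using STT or TTS during the switch.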

func (*Engine) SpeakOutput ¶

func (e *Engine) SpeakOutput(ctx context.Context, text string) error

SpeakOutput speaks text to the user.

type Recorder ¶

type Recorder struct{}

Recorder captures audio from the microphone.

func (*Recorder) Record ¶

func (r *Recorder) Record(ctx context.Context, duration time.Duration) ([]byte, error)

Record captures audio for the given duration and returns WAV bytes.

type STT ¶

type STT struct {
	// contains filtered or unexported fields
}

STT wraps Whisper.cpp for speech recognition.

func NewSTT ¶

func NewSTT(modelDir, lang string) *STT

NewSTT creates a speech recognizer using the given whisper model.

func (*STT) DetectLanguage ¶

func (s *STT) DetectLanguage(audio []byte) string

DetectLanguage detects the language of audio.

func (*STT) Transcribe ¶

func (s *STT) Transcribe(ctx context.Context, audio []byte) (string, error)

Transcribe converts audio data (WAV) to text.

type TTS ¶

type TTS struct {
	// contains filtered or unexported fields
}

TTS wraps Piper for speech synthesis.

func NewTTS ¶

func NewTTS(modelDir, lang string) *TTS

NewTTS creates a speech synthesizer using the given Piper voice model.

func (*TTS) Speak ¶

func (t *TTS) Speak(ctx context.Context, text string) error

Speak converts text to audio and plays it.

func (*TTS) Synthesize ¶

func (t *TTS) Synthesize(ctx context.Context, text string) ([]byte, error)

Synthesize converts text to WAV audio bytes without playing them.

type Tool ¶

type Tool struct {
	Engine *Engine
}

Tool implements the voice interaction builtin tool.

func (*Tool) Description ¶

func (v *Tool) Description() string

func (*Tool) Execute ¶

func (v *Tool) Execute(ctx context.Context, args json.RawMessage) (string, error)

func (*Tool) Name ¶

func (v *Tool) Name() string

func (*Tool) ReadOnly ¶

func (v *Tool) ReadOnly() bool

func (*Tool) Schema ¶

func (v *Tool) Schema() json.RawMessage
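The five methods above suggest the contract a builtin tool has to meet. The sketch below infers an interface from that method set and implements it with a toy tool. Everything beyond the method signatures is an assumption: the builtinTool interface name, the speakTool type, the "text" argument, and the use of JSON Schema for Schema are illustrative and may differ from what the agent actually expects.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
)

// builtinTool is inferred from Tool's method set (hypothetical name).
type builtinTool interface {
	Name() string
	Description() string
	Schema() json.RawMessage
	ReadOnly() bool
	Execute(ctx context.Context, args json.RawMessage) (string, error)
}

// speakTool records the text it is asked to speak instead of producing audio.
type speakTool struct{ spoken []string }

func (t *speakTool) Name() string        { return "speak_demo" }
func (t *speakTool) Description() string { return "Records text that would be spoken aloud." }
func (t *speakTool) ReadOnly() bool      { return false }

// Schema describes the assumed argument shape as a JSON Schema document.
func (t *speakTool) Schema() json.RawMessage {
	return json.RawMessage(`{"type":"object","properties":{"text":{"type":"string"}},"required":["text"]}`)
}

// Execute decodes the arguments, validates them, and returns a short result.
func (t *speakTool) Execute(ctx context.Context, args json.RawMessage) (string, error) {
	var in struct {
		Text string `json:"text"`
	}
	if err := json.Unmarshal(args, &in); err != nil {
		return "", fmt.Errorf("decode args: %w", err)
	}
	if in.Text == "" {
		return "", errors.New("text is required")
	}
	if err := ctx.Err(); err != nil {
		return "", err
	}
	t.spoken = append(t.spoken, in.Text)
	return "spoke: " + in.Text, nil
}

// Compile-time check that speakTool satisfies the inferred interface.
var _ builtinTool = (*speakTool)(nil)

// callTool runs a fresh tool against raw JSON arguments.
func callTool(raw string) (string, error) {
	return (&speakTool{}).Execute(context.Background(), json.RawMessage(raw))
}

func main() {
	out, err := callTool(`{"text":"hello"}`)
	fmt.Println(out, err) // spoke: hello <nil>
}
```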
