voice

package
v0.3.0 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 23, 2026 License: MIT Imports: 7 Imported by: 0

Documentation

Overview

Package voice provides voice processing capabilities for omniagent.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	// Enabled indicates whether voice processing is enabled.
	Enabled bool
	// ResponseMode controls when to respond with voice: "auto", "always", "never".
	// "auto" responds with voice when the user sends a voice message.
	ResponseMode string
	// STT configures speech-to-text.
	STT STTConfig
	// TTS configures text-to-speech.
	TTS TTSConfig
}

Config configures voice processing.
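
As a rough illustration of the ResponseMode values described in the field comment, the sketch below uses a local copy of two Config fields so that it compiles without the omniagent module. The helper shouldRespondWithVoice is hypothetical and not part of this package. Only "auto" is described in the documentation; the behavior shown for "always", "never", and unrecognized values is an assumption based on their names.

```go
package main

import "fmt"

// Config mirrors two of the documented fields. The STT and TTS fields are
// omitted to keep the sketch self-contained.
type Config struct {
	Enabled      bool
	ResponseMode string
}

// shouldRespondWithVoice is a hypothetical helper (not part of package voice).
// Assumption: "always" and "never" mean what their names suggest, and
// unrecognized modes fall back to a text reply.
func shouldRespondWithVoice(cfg Config, userSentVoice bool) bool {
	if !cfg.Enabled {
		return false
	}
	switch cfg.ResponseMode {
	case "always":
		return true
	case "auto":
		// Documented behavior: reply with voice when the user sent voice.
		return userSentVoice
	default: // "never" and unrecognized values
		return false
	}
}

func main() {
	cfg := Config{Enabled: true, ResponseMode: "auto"}
	fmt.Println(shouldRespondWithVoice(cfg, true))
	fmt.Println(shouldRespondWithVoice(cfg, false))
}
```

Keeping the decision in one small function makes it easy to test each mode in isolation.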

type Processor

type Processor struct {
	// contains filtered or unexported fields
}

Processor handles voice transcription and synthesis using OmniVoice interfaces.

func New

func New(config Config, logger *slog.Logger) (*Processor, error)

New creates a new voice processor with the configured providers.

func (*Processor) Close

func (p *Processor) Close() error

Close releases provider resources.

func (*Processor) ResponseMode

func (p *Processor) ResponseMode() string

ResponseMode returns the voice response mode.

func (*Processor) SynthesizeSpeech

func (p *Processor) SynthesizeSpeech(ctx context.Context, text string) ([]byte, string, error)

SynthesizeSpeech converts text to audio using the configured TTS provider. It returns the audio bytes and their MIME type.

func (*Processor) TranscribeAudio

func (p *Processor) TranscribeAudio(ctx context.Context, audio []byte, mimeType string) (string, error)

TranscribeAudio converts audio to text using the configured STT provider.

type STTConfig

type STTConfig struct {
	// Provider is the STT provider name (e.g., "deepgram").
	Provider string
	// APIKey is the provider API key.
	APIKey string //nolint:gosec // G117: APIKey loaded from config file
	// Model is the provider-specific model identifier.
	Model string
	// Language is the BCP-47 language code. Empty for auto-detection.
	Language string
}

STTConfig configures the speech-to-text provider.

type TTSConfig

type TTSConfig struct {
	// Provider is the TTS provider name (e.g., "deepgram").
	Provider string
	// APIKey is the provider API key.
	APIKey string //nolint:gosec // G117: APIKey loaded from config file
	// Model is the provider-specific model identifier.
	Model string
	// VoiceID is the provider-specific voice identifier.
	VoiceID string
}

TTSConfig configures the text-to-speech provider.
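
A minimal sketch of filling in both provider configurations from the environment rather than hard-coding the API key follows. The struct definitions are local copies of the documented fields so that the example compiles on its own. The function configsFromEnv and the variable name DEEPGRAM_API_KEY are hypothetical, and Model and VoiceID are left empty because the documentation does not name valid values or say whether empty values select a provider default.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// STTConfig and TTSConfig mirror the documented struct fields.
type STTConfig struct {
	Provider string
	APIKey   string
	Model    string
	Language string
}

type TTSConfig struct {
	Provider string
	APIKey   string
	Model    string
	VoiceID  string
}

// configsFromEnv is a hypothetical helper. It takes the lookup function as
// a parameter so that it can be exercised without touching the real environment.
func configsFromEnv(getenv func(string) string) (STTConfig, TTSConfig, error) {
	key := getenv("DEEPGRAM_API_KEY") // assumed variable name
	if key == "" {
		return STTConfig{}, TTSConfig{}, errors.New("DEEPGRAM_API_KEY is not set")
	}
	stt := STTConfig{
		Provider: "deepgram",
		APIKey:   key,
		Language: "en-US", // BCP-47 code; leave empty for auto-detection
	}
	tts := TTSConfig{
		Provider: "deepgram",
		APIKey:   key,
	}
	return stt, tts, nil
}

func main() {
	stt, tts, err := configsFromEnv(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(stt.Provider, stt.Language, tts.Provider)
}
```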
