tts

package
v0.3.0 Latest
Warning

This package is not in the latest version of its module.

Published: Jan 24, 2026 License: MIT Imports: 3 Imported by: 0

Documentation

Overview

Package tts provides a unified interface for Text-to-Speech providers.

Constants

This section is empty.

Variables

var (
	// ErrNoAvailableProvider is returned when no provider is available.
	ErrNoAvailableProvider = errors.New("tts: no available provider")

	// ErrVoiceNotFound is returned when a voice ID is not found.
	ErrVoiceNotFound = errors.New("tts: voice not found")

	// ErrInvalidConfig is returned when the synthesis config is invalid.
	ErrInvalidConfig = errors.New("tts: invalid configuration")

	// ErrRateLimited is returned when the provider rate limits the request.
	ErrRateLimited = errors.New("tts: rate limited")

	// ErrQuotaExceeded is returned when the provider quota is exceeded.
	ErrQuotaExceeded = errors.New("tts: quota exceeded")

	// ErrStreamClosed is returned when attempting to use a closed stream.
	ErrStreamClosed = errors.New("tts: stream closed")
)
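The sentinel errors above are plain error values, so callers would normally test for them with errors.Is. The sketch below is a minimal illustration under stated assumptions: the package import path is not shown on this page, so two of the sentinels are redeclared locally, and shouldRetry is a hypothetical helper, not part of the package. Whether providers wrap these errors with extra context is also an assumption; errors.Is handles both cases.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for two of the sentinel errors listed above, redeclared
// so the sketch compiles without the real package.
var (
	ErrRateLimited   = errors.New("tts: rate limited")
	ErrQuotaExceeded = errors.New("tts: quota exceeded")
)

// shouldRetry is a hypothetical helper: it treats rate limiting as
// transient and every other error (including quota exhaustion) as final.
func shouldRetry(err error) bool {
	return errors.Is(err, ErrRateLimited)
}

func main() {
	// If a provider wraps a sentinel with extra context, errors.Is still matches.
	wrapped := fmt.Errorf("provider call failed: %w", ErrRateLimited)
	fmt.Println(shouldRetry(wrapped))          // true
	fmt.Println(shouldRetry(ErrQuotaExceeded)) // false
}
```

Comparing with == would miss wrapped errors, which is why errors.Is is the safer default here.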

Functions

This section is empty.

Types

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client provides a unified interface across multiple TTS providers.

func NewClient

func NewClient(providers ...Provider) *Client

NewClient creates a new TTS client with the specified providers.

func (*Client) Provider

func (c *Client) Provider(name string) (Provider, bool)

Provider returns a specific provider by name.

func (*Client) SetFallbacks

func (c *Client) SetFallbacks(names ...string)

SetFallbacks sets the fallback provider order.

func (*Client) SetPrimary

func (c *Client) SetPrimary(name string)

SetPrimary sets the primary provider by name.

func (*Client) Synthesize

func (c *Client) Synthesize(ctx context.Context, text string, config SynthesisConfig) (*SynthesisResult, error)

Synthesize converts text to speech using the primary provider, with automatic fallback.
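This page does not describe how the fallback is carried out. One plausible reading is "try the primary provider, then each fallback in order, and return the first success". The sketch below illustrates that reading only; it is not the Client's actual implementation. The synth interface, the fake type, and synthesizeWithFallback are hypothetical names, and the interface is cut down to the two methods the sketch needs.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Local stand-in for the documented sentinel error.
var ErrNoAvailableProvider = errors.New("tts: no available provider")

// synth is a hypothetical, cut-down view of Provider.
type synth interface {
	Name() string
	Synthesize(ctx context.Context, text string) ([]byte, error)
}

// fake is a hypothetical provider that either fails with err or
// returns "<name>:<text>" as its audio payload.
type fake struct {
	name string
	err  error
}

func (f fake) Name() string { return f.name }

func (f fake) Synthesize(ctx context.Context, text string) ([]byte, error) {
	if f.err != nil {
		return nil, f.err
	}
	return []byte(f.name + ":" + text), nil
}

// synthesizeWithFallback tries each provider in order (primary first) and
// returns the first successful result. If every provider fails, the
// individual errors are joined so the caller can inspect them.
func synthesizeWithFallback(ctx context.Context, text string, order ...synth) ([]byte, error) {
	if len(order) == 0 {
		return nil, ErrNoAvailableProvider
	}
	var errs []error
	for _, p := range order {
		audio, err := p.Synthesize(ctx, text)
		if err == nil {
			return audio, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", p.Name(), err))
	}
	return nil, errors.Join(errs...)
}

func main() {
	primary := fake{name: "primary", err: errors.New("rate limited")}
	backup := fake{name: "backup"}
	audio, err := synthesizeWithFallback(context.Background(), "hi", primary, backup)
	fmt.Printf("%s %v\n", audio, err) // backup:hi <nil>
}
```

A real implementation might fall back only on selected errors (for example rate limiting) rather than on every failure; the page leaves that choice unspecified.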

func (*Client) SynthesizeStream

func (c *Client) SynthesizeStream(ctx context.Context, text string, config SynthesisConfig) (<-chan StreamChunk, error)

SynthesizeStream converts text to speech with streaming output using the primary provider, with automatic fallback.

type Provider

type Provider interface {
	// Name returns the provider name.
	Name() string

	// Synthesize converts text to speech and returns audio data.
	Synthesize(ctx context.Context, text string, config SynthesisConfig) (*SynthesisResult, error)

	// SynthesizeStream converts text to speech with streaming output.
	SynthesizeStream(ctx context.Context, text string, config SynthesisConfig) (<-chan StreamChunk, error)

	// ListVoices returns available voices from this provider.
	ListVoices(ctx context.Context) ([]Voice, error)

	// GetVoice returns a specific voice by ID.
	GetVoice(ctx context.Context, voiceID string) (*Voice, error)
}

Provider defines the interface for TTS providers.
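To show what satisfying this interface involves, here is a minimal stub provider. It is a sketch under stated assumptions: the types are trimmed local copies (only the fields used), echoProvider is a hypothetical name, and the stub simply returns the input text as its "audio". It may be useful as a test double, not as a model for a real provider.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Trimmed local copies of the documented types (only the fields used here).
type SynthesisConfig struct{ VoiceID, OutputFormat string }
type SynthesisResult struct {
	Audio          []byte
	Format         string
	CharacterCount int
}
type StreamChunk struct {
	Audio   []byte
	IsFinal bool
	Error   error
}
type Voice struct{ ID, Name, Language string }

var ErrVoiceNotFound = errors.New("tts: voice not found")

// Provider mirrors the documented interface.
type Provider interface {
	Name() string
	Synthesize(ctx context.Context, text string, config SynthesisConfig) (*SynthesisResult, error)
	SynthesizeStream(ctx context.Context, text string, config SynthesisConfig) (<-chan StreamChunk, error)
	ListVoices(ctx context.Context) ([]Voice, error)
	GetVoice(ctx context.Context, voiceID string) (*Voice, error)
}

// echoProvider is a hypothetical stub that returns the input text as audio.
type echoProvider struct{ voices []Voice }

func (p *echoProvider) Name() string { return "echo" }

func (p *echoProvider) Synthesize(ctx context.Context, text string, config SynthesisConfig) (*SynthesisResult, error) {
	if _, err := p.GetVoice(ctx, config.VoiceID); err != nil {
		return nil, err
	}
	return &SynthesisResult{
		Audio:          []byte(text),
		Format:         config.OutputFormat,
		CharacterCount: len([]rune(text)),
	}, nil
}

func (p *echoProvider) SynthesizeStream(ctx context.Context, text string, config SynthesisConfig) (<-chan StreamChunk, error) {
	res, err := p.Synthesize(ctx, text, config)
	if err != nil {
		return nil, err
	}
	ch := make(chan StreamChunk, 1) // buffered so the send never blocks
	ch <- StreamChunk{Audio: res.Audio, IsFinal: true}
	close(ch)
	return ch, nil
}

func (p *echoProvider) ListVoices(ctx context.Context) ([]Voice, error) { return p.voices, nil }

func (p *echoProvider) GetVoice(ctx context.Context, voiceID string) (*Voice, error) {
	for i := range p.voices {
		if p.voices[i].ID == voiceID {
			return &p.voices[i], nil
		}
	}
	return nil, ErrVoiceNotFound
}

// Compile-time check that the stub satisfies the interface.
var _ Provider = (*echoProvider)(nil)

func main() {
	p := &echoProvider{voices: []Voice{{ID: "v1", Name: "Test", Language: "en-US"}}}
	res, err := p.Synthesize(context.Background(), "hello", SynthesisConfig{VoiceID: "v1", OutputFormat: "pcm"})
	fmt.Println(res.CharacterCount, err) // 5 <nil>
	_, err = p.Synthesize(context.Background(), "hello", SynthesisConfig{VoiceID: "missing"})
	fmt.Println(err) // tts: voice not found
}
```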

type StreamChunk

type StreamChunk struct {
	// Audio is a chunk of audio data.
	Audio []byte

	// IsFinal indicates if this is the last chunk.
	IsFinal bool

	// Error contains any error that occurred during streaming.
	Error error
}

StreamChunk represents a chunk of streaming audio.
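A consumer of the chunk channel has to handle three signals: a chunk with Error set, a chunk with IsFinal set, and the channel being closed. The sketch below drains a channel into one buffer while covering all three. StreamChunk is redeclared locally and collect is a hypothetical helper. Whether a provider closes the channel after the final chunk is not stated on this page, so the loop does not rely on it.

```go
package main

import (
	"errors"
	"fmt"
)

// StreamChunk mirrors the documented struct.
type StreamChunk struct {
	Audio   []byte
	IsFinal bool
	Error   error
}

// collect appends chunk audio to one buffer. It stops at the first chunk
// carrying an Error, at the chunk marked IsFinal, or when the channel is
// closed, whichever comes first. Audio gathered before an error is returned
// alongside that error.
func collect(chunks <-chan StreamChunk) ([]byte, error) {
	var audio []byte
	for chunk := range chunks {
		if chunk.Error != nil {
			return audio, chunk.Error
		}
		audio = append(audio, chunk.Audio...)
		if chunk.IsFinal {
			break
		}
	}
	return audio, nil
}

func main() {
	ok := make(chan StreamChunk, 2)
	ok <- StreamChunk{Audio: []byte("ab")}
	ok <- StreamChunk{Audio: []byte("cd"), IsFinal: true}
	close(ok)
	audio, err := collect(ok)
	fmt.Printf("%s %v\n", audio, err) // abcd <nil>

	bad := make(chan StreamChunk, 2)
	bad <- StreamChunk{Audio: []byte("ab")}
	bad <- StreamChunk{Error: errors.New("connection reset")}
	close(bad)
	audio, err = collect(bad)
	fmt.Printf("%s %v\n", audio, err) // ab connection reset
}
```

When stopping early on an error, real code would likely also cancel the context passed to SynthesizeStream so the producer can stop sending.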

type StreamingProvider

type StreamingProvider interface {
	Provider

	// SynthesizeFromReader reads text from a reader and streams audio output.
	// Useful for streaming LLM output directly to TTS.
	SynthesizeFromReader(ctx context.Context, reader io.Reader, config SynthesisConfig) (<-chan StreamChunk, error)
}

StreamingProvider extends Provider with input streaming support.
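Because StreamingProvider embeds Provider, a caller holding a Provider can discover input streaming support with a type assertion. The sketch below shows that pattern. The interfaces are cut down (Provider keeps only Name), and basic and readerProvider are hypothetical implementations that exist only for this illustration.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
)

// Trimmed local stand-ins for the documented types.
type SynthesisConfig struct{ VoiceID string }
type StreamChunk struct {
	Audio   []byte
	IsFinal bool
	Error   error
}

// Provider is cut down to Name for brevity; the documented interface has more methods.
type Provider interface{ Name() string }

type StreamingProvider interface {
	Provider
	SynthesizeFromReader(ctx context.Context, reader io.Reader, config SynthesisConfig) (<-chan StreamChunk, error)
}

// basic is a hypothetical provider without input streaming.
type basic struct{}

func (basic) Name() string { return "basic" }

// readerProvider is a hypothetical provider that accepts an io.Reader.
type readerProvider struct{}

func (readerProvider) Name() string { return "reader" }

func (readerProvider) SynthesizeFromReader(ctx context.Context, reader io.Reader, config SynthesisConfig) (<-chan StreamChunk, error) {
	text, err := io.ReadAll(reader) // a real provider would read incrementally
	if err != nil {
		return nil, err
	}
	ch := make(chan StreamChunk, 1)
	ch <- StreamChunk{Audio: text, IsFinal: true}
	close(ch)
	return ch, nil
}

func main() {
	for _, p := range []Provider{basic{}, readerProvider{}} {
		// The type assertion succeeds only for providers that add SynthesizeFromReader.
		sp, ok := p.(StreamingProvider)
		if !ok {
			fmt.Println(p.Name(), "does not accept a reader")
			continue
		}
		ch, err := sp.SynthesizeFromReader(context.Background(), strings.NewReader("hello"), SynthesisConfig{})
		if err != nil {
			fmt.Println(p.Name(), "error:", err)
			continue
		}
		for chunk := range ch {
			fmt.Println(p.Name(), len(chunk.Audio), chunk.IsFinal)
		}
	}
	// basic does not accept a reader
	// reader 5 true
}
```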

type SynthesisConfig

type SynthesisConfig struct {
	// VoiceID is the voice to use for synthesis.
	VoiceID string

	// Model is the provider-specific model identifier (optional).
	Model string

	// OutputFormat specifies the audio format ("mp3", "pcm", "wav", "opus").
	OutputFormat string

	// SampleRate is the audio sample rate in Hz (e.g., 22050, 44100).
	SampleRate int

	// Speed is the speech speed multiplier (1.0 = normal).
	Speed float64

	// Pitch adjusts the voice pitch (-1.0 to 1.0, 0 = normal).
	Pitch float64

	// Stability controls voice consistency (0.0 to 1.0, provider-specific).
	Stability float64

	// SimilarityBoost enhances voice similarity (0.0 to 1.0, provider-specific).
	SimilarityBoost float64
}

SynthesisConfig configures a TTS synthesis request.
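The field comments above give value ranges, which suggests a pre-flight check before calling a provider. The sketch below is one hypothetical way to write it; validate is not part of the package, and the types are redeclared locally. It assumes that zero values mean "use the provider default", which this page does not confirm, and that only the four listed output formats are accepted.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-in for the documented sentinel error.
var ErrInvalidConfig = errors.New("tts: invalid configuration")

// SynthesisConfig mirrors the documented struct.
type SynthesisConfig struct {
	VoiceID         string
	Model           string
	OutputFormat    string
	SampleRate      int
	Speed           float64
	Pitch           float64
	Stability       float64
	SimilarityBoost float64
}

// validate is a hypothetical helper that checks the documented ranges.
// Assumption: zero values are allowed and mean "provider default".
func validate(c SynthesisConfig) error {
	switch c.OutputFormat {
	case "", "mp3", "pcm", "wav", "opus":
	default:
		return fmt.Errorf("%w: unsupported output format %q", ErrInvalidConfig, c.OutputFormat)
	}
	if c.SampleRate < 0 {
		return fmt.Errorf("%w: negative sample rate %d", ErrInvalidConfig, c.SampleRate)
	}
	if c.Speed < 0 {
		return fmt.Errorf("%w: negative speed %v", ErrInvalidConfig, c.Speed)
	}
	if c.Pitch < -1 || c.Pitch > 1 {
		return fmt.Errorf("%w: pitch %v outside [-1, 1]", ErrInvalidConfig, c.Pitch)
	}
	if c.Stability < 0 || c.Stability > 1 {
		return fmt.Errorf("%w: stability %v outside [0, 1]", ErrInvalidConfig, c.Stability)
	}
	if c.SimilarityBoost < 0 || c.SimilarityBoost > 1 {
		return fmt.Errorf("%w: similarity boost %v outside [0, 1]", ErrInvalidConfig, c.SimilarityBoost)
	}
	return nil
}

func main() {
	good := SynthesisConfig{VoiceID: "v1", OutputFormat: "mp3", SampleRate: 22050, Speed: 1.0}
	fmt.Println(validate(good)) // <nil>

	bad := good
	bad.Pitch = 1.5
	err := validate(bad)
	fmt.Println(err)                              // tts: invalid configuration: pitch 1.5 outside [-1, 1]
	fmt.Println(errors.Is(err, ErrInvalidConfig)) // true
}
```

Wrapping with %w keeps the sentinel matchable through errors.Is while still reporting which field was rejected.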

type SynthesisResult

type SynthesisResult struct {
	// Audio is the synthesized audio data.
	Audio []byte

	// Format is the audio format of the result.
	Format string

	// SampleRate is the sample rate of the audio.
	SampleRate int

	// DurationMs is the duration of the audio in milliseconds.
	DurationMs int

	// CharacterCount is the number of characters processed.
	CharacterCount int
}

SynthesisResult contains the result of a TTS synthesis.

type Voice

type Voice struct {
	// ID is the provider-specific voice identifier.
	ID string

	// Name is a human-readable name for the voice.
	Name string

	// Language is the BCP-47 language code (e.g., "en-US").
	Language string

	// Gender is the voice gender ("male", "female", "neutral").
	Gender string

	// Provider is the name of the TTS provider.
	Provider string

	// Metadata contains provider-specific additional information.
	Metadata map[string]any
}

Voice represents a voice configuration for TTS.
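A common use of the slice returned by ListVoices is picking voices for a language. The sketch below filters by the Language field, treating a bare primary subtag such as "en" as matching "en-US" and "en-GB". Voice is redeclared locally, filterByLanguage is a hypothetical helper, and the voice data is invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Voice mirrors the documented struct.
type Voice struct {
	ID       string
	Name     string
	Language string
	Gender   string
	Provider string
	Metadata map[string]any
}

// filterByLanguage returns the voices whose Language equals lang or starts
// with lang followed by "-". The comparison is case-insensitive.
func filterByLanguage(voices []Voice, lang string) []Voice {
	want := strings.ToLower(lang)
	var out []Voice
	for _, v := range voices {
		got := strings.ToLower(v.Language)
		if got == want || strings.HasPrefix(got, want+"-") {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	// Invented sample data, not taken from any real provider.
	voices := []Voice{
		{ID: "a", Name: "Ada", Language: "en-US", Gender: "female", Provider: "demo"},
		{ID: "b", Name: "Ben", Language: "en-GB", Gender: "male", Provider: "demo"},
		{ID: "c", Name: "Chloe", Language: "fr-FR", Gender: "female", Provider: "demo"},
	}
	for _, v := range filterByLanguage(voices, "en") {
		fmt.Println(v.ID, v.Name, v.Language)
	}
	// a Ada en-US
	// b Ben en-GB
}
```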

Directories

Path Synopsis
Package providertest provides conformance tests for TTS provider implementations.
