elevenlabs

package module
v0.7.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 24, 2026 License: MIT Imports: 21 Imported by: 0

README

ElevenLabs Go SDK

Build Status Lint Status Go Report Card Docs License

Go SDK for the ElevenLabs API.

Features

  • 🗣️ Text-to-Speech: Convert text to realistic speech with multiple voices and models
  • 📝 Speech-to-Text: Transcribe audio with speaker diarization support
  • 🎙️ Speech-to-Speech: Voice conversion - transform speech to a different voice
  • 🔊 Sound Effects: Generate sound effects from text descriptions
  • 🎨 Voice Design: Create custom AI voices with specific characteristics
  • 🎵 Music Composition: Generate music from text prompts
  • 🎙️ Audio Isolation: Extract vocals/speech from audio
  • ⏱️ Forced Alignment: Get word-level timestamps for audio
  • 💬 Text-to-Dialogue: Generate multi-speaker conversations
  • 🌍 Dubbing: Translate and dub video/audio content
  • 📚 Projects: Manage long-form audio content (audiobooks, podcasts)
  • 📖 Pronunciation Dictionaries: Control pronunciation of specific terms
Real-Time Services
  • WebSocket TTS: Low-latency text-to-speech streaming for real-time voice synthesis
  • WebSocket STT: Real-time speech-to-text with partial results
  • 📞 Twilio Integration: Phone call integration for conversational AI agents
  • 📱 Phone Numbers: Manage phone numbers for voice agents

Installation

go get github.com/agentplexus/go-elevenlabs

Quick Start

Basic Text-to-Speech
package main

import (
    "context"
    "io"
    "log"
    "os"

    elevenlabs "github.com/agentplexus/go-elevenlabs"
)

func main() {
    // Create client (uses ELEVENLABS_API_KEY env var)
    client, err := elevenlabs.NewClient()
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()

    // List available voices
    voices, err := client.Voices().List(ctx)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("Found %d voices", len(voices))

    // Generate speech
    if len(voices) > 0 {
        audio, err := client.TextToSpeech().Simple(ctx,
            voices[0].VoiceID,
            "Hello from the ElevenLabs Go SDK!")
        if err != nil {
            log.Fatal(err)
        }

        // Save to file
        f, _ := os.Create("hello.mp3")
        defer f.Close()
        io.Copy(f, audio)
    }
}
With Custom Options
client, err := elevenlabs.NewClient(
    elevenlabs.WithAPIKey("your-api-key"),
    elevenlabs.WithTimeout(5 * time.Minute),
)

Services

Text-to-Speech
// Simple generation
audio, err := client.TextToSpeech().Simple(ctx, voiceID, "Hello world")

// With full options
resp, err := client.TextToSpeech().Generate(ctx, &elevenlabs.TTSRequest{
    VoiceID: "21m00Tcm4TlvDq8ikWAM",
    Text:    "Hello with custom settings!",
    ModelID: "eleven_multilingual_v2",
    VoiceSettings: &elevenlabs.VoiceSettings{
        Stability:       0.6,
        SimilarityBoost: 0.8,
        Style:           0.1,
        SpeakerBoost:    true,
    },
    OutputFormat: "mp3_44100_192",
})
Speech-to-Text
// Transcribe from URL
result, err := client.SpeechToText().TranscribeURL(ctx, "https://example.com/audio.mp3")
fmt.Printf("Text: %s\n", result.Text)
fmt.Printf("Language: %s\n", result.LanguageCode)

// With speaker diarization
result, err := client.SpeechToText().TranscribeWithDiarization(ctx, audioURL)
for _, word := range result.Words {
    fmt.Printf("[%s] %s (%.2fs - %.2fs)\n", word.Speaker, word.Text, word.Start, word.End)
}
Sound Effects
// Simple sound effect
audio, err := client.SoundEffects().Simple(ctx, "thunder and rain storm")

// With options
sfx, err := client.SoundEffects().Generate(ctx, &elevenlabs.SoundEffectRequest{
    Text:            "spaceship engine humming",
    DurationSeconds: 10,
    PromptInfluence: 0.5,
})
Music Composition
// Generate music from prompt
resp, err := client.Music().Generate(ctx, &elevenlabs.MusicRequest{
    Prompt:     "upbeat electronic music for a tech video",
    DurationMs: 30000,
})

// Instrumental only
audio, err := client.Music().GenerateInstrumental(ctx, "calm piano melody", 60000)

// Generate with composition plan for fine-grained control
plan, _ := client.Music().GeneratePlan(ctx, &elevenlabs.CompositionPlanRequest{
    Prompt:     "pop song about summer",
    DurationMs: 180000,
})
resp, err := client.Music().GenerateDetailed(ctx, &elevenlabs.MusicDetailedRequest{
    CompositionPlan: plan,
})

// Separate stems (vocals, drums, bass, etc.)
f, _ := os.Open("song.mp3")
stems, err := client.Music().SeparateStems(ctx, &elevenlabs.StemSeparationRequest{
    File:     f,
    Filename: "song.mp3",
})
Audio Isolation
// Extract vocals from audio file
f, _ := os.Open("mixed_audio.mp3")
isolated, err := client.AudioIsolation().IsolateFile(ctx, f, "mixed_audio.mp3")
Forced Alignment
// Get word-level timestamps
f, _ := os.Open("speech.mp3")
result, err := client.ForcedAlignment().AlignFile(ctx, f, "speech.mp3",
    "The text that was spoken in the audio")

for _, word := range result.Words {
    fmt.Printf("%s: %.2fs - %.2fs\n", word.Text, word.Start, word.End)
}
Text-to-Dialogue
// Generate multi-speaker dialogue
audio, err := client.TextToDialogue().Simple(ctx, []elevenlabs.DialogueInput{
    {Text: "Hello, how are you?", VoiceID: "voice1"},
    {Text: "I'm doing great, thanks!", VoiceID: "voice2"},
})
Voice Design
// Generate a custom voice
resp, err := client.VoiceDesign().GeneratePreview(ctx, &elevenlabs.VoiceDesignRequest{
    Gender:         elevenlabs.VoiceGenderFemale,
    Age:            elevenlabs.VoiceAgeYoung,
    Accent:         elevenlabs.VoiceAccentAmerican,
    AccentStrength: 1.0,
    Text:           "This is a preview of the generated voice. It should be at least one hundred characters long for best results.",
})
Pronunciation Dictionaries
// Create from a map
dict, err := client.Pronunciation().CreateFromMap(ctx, "Tech Terms", map[string]string{
    "API":     "A P I",
    "kubectl": "kube control",
    "nginx":   "engine X",
})

// Create from JSON file
dict, err := client.Pronunciation().CreateFromJSON(ctx, "Terms", "pronunciation.json")
Dubbing
// Create dubbing job
dub, err := client.Dubbing().Create(ctx, &elevenlabs.DubbingRequest{
    SourceURL:      "https://example.com/video.mp4",
    TargetLanguage: "es",
    Name:           "Video - Spanish",
})

// Check status
status, err := client.Dubbing().GetStatus(ctx, dub.DubbingID)
Projects (Studio)
// Create a project for long-form content
project, err := client.Projects().Create(ctx, &elevenlabs.CreateProjectRequest{
    Name:                    "My Audiobook",
    DefaultModelID:          "eleven_multilingual_v2",
    DefaultParagraphVoiceID: voiceID,
})

// Convert to audio
err = client.Projects().Convert(ctx, project.ProjectID)
Speech-to-Speech (Voice Conversion)
// Convert speech from one voice to another
f, _ := os.Open("input.mp3")
resp, err := client.SpeechToSpeech().Convert(ctx, &elevenlabs.SpeechToSpeechRequest{
    VoiceID: targetVoiceID,
    Audio:   f,
})

// Simple conversion
output, err := client.SpeechToSpeech().Simple(ctx, targetVoiceID, audioReader)
WebSocket TTS (Real-Time Streaming)
// Connect for low-latency TTS (ideal for LLM output)
conn, err := client.WebSocketTTS().Connect(ctx, voiceID, &elevenlabs.WebSocketTTSOptions{
    ModelID:                  "eleven_turbo_v2_5",
    OutputFormat:             "pcm_16000",
    OptimizeStreamingLatency: 3,
})
defer conn.Close()

// Stream text as it arrives (e.g., from LLM)
for text := range llmOutputStream {
    conn.SendText(text)
}
conn.Flush()

// Receive audio chunks
for audio := range conn.Audio() {
    // Play or save audio chunks
}
WebSocket STT (Real-Time Transcription)
// Connect for live transcription
conn, err := client.WebSocketSTT().Connect(ctx, &elevenlabs.WebSocketSTTOptions{
    SampleRate:     16000,
    EnablePartials: true,
})
defer conn.Close()

// Send audio chunks
go func() {
    for audioChunk := range microphoneInput {
        conn.SendAudio(audioChunk)
    }
    conn.EndStream()
}()

// Receive transcripts
for transcript := range conn.Transcripts() {
    if transcript.IsFinal {
        fmt.Println("Final:", transcript.Text)
    } else {
        fmt.Println("Partial:", transcript.Text)
    }
}
Twilio Integration (Phone Calls)
// Register incoming Twilio call with an ElevenLabs agent
resp, err := client.Twilio().RegisterCall(ctx, &elevenlabs.TwilioRegisterCallRequest{
    AgentID: "your-agent-id",
})
// Return resp.TwiML to Twilio webhook

// Make outbound call
call, err := client.Twilio().OutboundCall(ctx, &elevenlabs.TwilioOutboundCallRequest{
    AgentID:            "your-agent-id",
    AgentPhoneNumberID: "phone-number-id",
    ToNumber:           "+1234567890",
})

// List phone numbers
numbers, err := client.PhoneNumbers().List(ctx)

Examples

See the examples/ directory for runnable examples:

Example Description
basic/ Common SDK operations
websocket-tts/ Real-time TTS streaming for LLM integration
websocket-stt/ Live transcription with partial results
speech-to-speech/ Voice conversion
twilio/ Phone call integration with Twilio
ttsscript/ Multi-voice script authoring
retryhttp/ Retry-capable HTTP transport
export ELEVENLABS_API_KEY="your-api-key"
go run examples/basic/main.go

Error Handling

audio, err := client.TextToSpeech().Simple(ctx, voiceID, text)
if err != nil {
    if elevenlabs.IsRateLimitError(err) {
        log.Println("Rate limited, waiting...")
        time.Sleep(time.Minute)
    } else if elevenlabs.IsUnauthorizedError(err) {
        log.Fatal("Invalid API key")
    } else if elevenlabs.IsNotFoundError(err) {
        log.Fatal("Voice not found")
    } else {
        log.Fatalf("Error: %v", err)
    }
}

Environment Variables

  • ELEVENLABS_API_KEY: Your ElevenLabs API key (used automatically if not provided via WithAPIKey)

Documentation

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License

Documentation

Overview

Package elevenlabs provides a Go client for the ElevenLabs API.

The client wraps the ogen-generated API client with a higher-level interface that handles authentication and provides convenient methods for common operations.

Index

Constants

View Source
const DefaultBaseURL = "https://api.elevenlabs.io"

DefaultBaseURL is the default ElevenLabs API base URL.

View Source
const DefaultModelID = "eleven_multilingual_v2"

DefaultModelID is the recommended model for text-to-speech.

View Source
const Version = "0.3.0"

Version is the SDK version.

Variables

View Source
var (
	// ErrNoAPIKey is returned when no API key is provided.
	ErrNoAPIKey = errors.New("elevenlabs: API key is required")

	// ErrEmptyText is returned when text is empty.
	ErrEmptyText = errors.New("elevenlabs: text cannot be empty")

	// ErrEmptyVoiceID is returned when voice ID is empty.
	ErrEmptyVoiceID = errors.New("elevenlabs: voice ID is required")

	// ErrInvalidStability is returned when stability is out of range.
	ErrInvalidStability = errors.New("elevenlabs: stability must be between 0.0 and 1.0")

	// ErrInvalidSimilarityBoost is returned when similarity_boost is out of range.
	ErrInvalidSimilarityBoost = errors.New("elevenlabs: similarity_boost must be between 0.0 and 1.0")

	// ErrInvalidStyle is returned when style is out of range.
	ErrInvalidStyle = errors.New("elevenlabs: style must be between 0.0 and 1.0")

	// ErrInvalidSpeed is returned when speed is out of range.
	ErrInvalidSpeed = errors.New("elevenlabs: speed must be between 0.25 and 4.0")
)

Common errors

View Source
var ValidOutputFormats = map[string]bool{

	"mp3_22050_32":  true,
	"mp3_24000_48":  true,
	"mp3_44100_32":  true,
	"mp3_44100_64":  true,
	"mp3_44100_96":  true,
	"mp3_44100_128": true,
	"mp3_44100_192": true,

	"pcm_8000":  true,
	"pcm_16000": true,
	"pcm_22050": true,
	"pcm_24000": true,
	"pcm_32000": true,
	"pcm_44100": true,
	"pcm_48000": true,

	"ulaw_8000": true,
	"alaw_8000": true,

	"opus_48000_32":  true,
	"opus_48000_64":  true,
	"opus_48000_96":  true,
	"opus_48000_128": true,
	"opus_48000_192": true,
}

ValidOutputFormats lists the valid audio output formats. For highest quality, use pcm_48000 (lossless) or mp3_44100_192.

Functions

func IsForbiddenError added in v0.4.0

func IsForbiddenError(err error) bool

IsForbiddenError returns true if the error is a 403 Forbidden error.

func IsNotFoundError

func IsNotFoundError(err error) bool

IsNotFoundError returns true if the error is a 404 Not Found error.

func IsRateLimitError

func IsRateLimitError(err error) bool

IsRateLimitError returns true if the error is a 429 Too Many Requests error.

func IsUnauthorizedError

func IsUnauthorizedError(err error) bool

IsUnauthorizedError returns true if the error is a 401 Unauthorized error.

func PCMBytesToWAV added in v0.4.0

func PCMBytesToWAV(pcm []byte, sampleRate int) ([]byte, error)

PCMBytesToWAV wraps raw PCM bytes in a WAV header.

func PCMToWAV added in v0.4.0

func PCMToWAV(pcmData io.Reader, sampleRate int) ([]byte, error)

PCMToWAV wraps raw PCM audio data in a WAV header. ElevenLabs PCM is 16-bit signed little-endian mono.

Usage:

resp, _ := client.TextToSpeech().Generate(ctx, &TTSRequest{
    VoiceID:      voiceID,
    Text:         "Hello",
    OutputFormat: "pcm_44100",
})
wavData, _ := elevenlabs.PCMToWAV(resp.Audio, 44100)

func ParsePCMSampleRate added in v0.4.0

func ParsePCMSampleRate(format string) (int, error)

ParsePCMSampleRate extracts the sample rate from a PCM format string. Example: "pcm_44100" returns 44100.

Types

type APIError

type APIError struct {
	StatusCode int
	Message    string
	Detail     string
}

APIError represents an error returned by the ElevenLabs API.

func ParseAPIError added in v0.4.0

func ParseAPIError(err error) *APIError

ParseAPIError extracts API error details from an error returned by the SDK. It handles ogen's UnexpectedStatusCodeError and parses the response body to extract the ElevenLabs error message.

Usage:

resp, err := client.TextToSpeech().Generate(ctx, req)
if err != nil {
    if apiErr := elevenlabs.ParseAPIError(err); apiErr != nil {
        fmt.Printf("Status: %d, Message: %s\n", apiErr.StatusCode, apiErr.Message)
    }
    log.Fatal(err)
}

func (*APIError) Error

func (e *APIError) Error() string

Error implements the error interface.

type AlignmentCharacter

type AlignmentCharacter struct {
	// Text is the character text.
	Text string

	// Start is the start time in seconds.
	Start float64

	// End is the end time in seconds.
	End float64
}

AlignmentCharacter represents a character with timing information.

type AlignmentWord

type AlignmentWord struct {
	// Text is the word text.
	Text string

	// Start is the start time in seconds.
	Start float64

	// End is the end time in seconds.
	End float64

	// Loss is the confidence score for this word.
	Loss float64
}

AlignmentWord represents a word with timing information.

type AudioIsolationRequest

type AudioIsolationRequest struct {
	// Audio is the audio file to process (required).
	Audio io.Reader

	// Filename is the name of the file (required).
	Filename string
}

AudioIsolationRequest contains options for audio isolation.

type AudioIsolationService

type AudioIsolationService struct {
	// contains filtered or unexported fields
}

AudioIsolationService handles audio isolation (vocal/speech extraction).

func (*AudioIsolationService) Isolate

Isolate extracts vocals/speech from audio, removing background noise. Returns an io.Reader containing the isolated audio.

func (*AudioIsolationService) IsolateFile

func (s *AudioIsolationService) IsolateFile(ctx context.Context, audio io.Reader, filename string) (io.Reader, error)

IsolateFile is a convenience method to isolate vocals from an audio file.

func (*AudioIsolationService) IsolateStream

func (s *AudioIsolationService) IsolateStream(ctx context.Context, req *AudioIsolationRequest) (io.Reader, error)

IsolateStream extracts vocals/speech from audio with streaming output. Returns an io.Reader for streaming the isolated audio.

type Chapter

type Chapter struct {
	// ChapterID is the unique identifier.
	ChapterID string

	// Name is the chapter name.
	Name string

	// ConversionProgress is the conversion progress percentage.
	ConversionProgress float64

	// State is the current state.
	State string

	// LastConversionError is the last conversion error if any.
	LastConversionError string
}

Chapter represents a chapter within a project.

type ChapterSnapshot

type ChapterSnapshot struct {
	// ChapterSnapshotID is the unique identifier.
	ChapterSnapshotID string

	// ProjectID is the parent project ID.
	ProjectID string

	// ChapterID is the chapter ID.
	ChapterID string

	// Name is the snapshot name.
	Name string

	// CreatedAt is when the snapshot was created.
	CreatedAt time.Time
}

ChapterSnapshot represents a snapshot of a chapter.

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client is the main ElevenLabs client for interacting with the API.

func NewClient

func NewClient(opts ...Option) (*Client, error)

NewClient creates a new ElevenLabs client with the given options.

func (*Client) API

func (c *Client) API() *api.Client

API returns the underlying ogen-generated API client for advanced usage. Use this when you need access to API endpoints not covered by the high-level wrapper methods.

func (*Client) AudioIsolation

func (c *Client) AudioIsolation() *AudioIsolationService

AudioIsolation returns the audio isolation service.

func (*Client) Dubbing

func (c *Client) Dubbing() *DubbingService

Dubbing returns the dubbing service.

func (*Client) ForcedAlignment

func (c *Client) ForcedAlignment() *ForcedAlignmentService

ForcedAlignment returns the forced alignment service.

func (*Client) History

func (c *Client) History() *HistoryService

History returns the history service.

func (*Client) Models

func (c *Client) Models() *ModelsService

Models returns the models service.

func (*Client) Music

func (c *Client) Music() *MusicService

Music returns the music composition service.

func (*Client) PhoneNumbers

func (c *Client) PhoneNumbers() *PhoneNumberService

PhoneNumbers returns the phone number management service.

func (*Client) Projects

func (c *Client) Projects() *ProjectsService

Projects returns the projects (Studio) service.

func (*Client) Pronunciation

func (c *Client) Pronunciation() *PronunciationService

Pronunciation returns the pronunciation dictionary service.

func (*Client) SoundEffects

func (c *Client) SoundEffects() *SoundEffectsService

SoundEffects returns the sound effects service.

func (*Client) SpeechToSpeech

func (c *Client) SpeechToSpeech() *SpeechToSpeechService

SpeechToSpeech returns the speech-to-speech voice conversion service.

func (*Client) SpeechToText

func (c *Client) SpeechToText() *SpeechToTextService

SpeechToText returns the speech-to-text transcription service.

func (*Client) TextToDialogue

func (c *Client) TextToDialogue() *TextToDialogueService

TextToDialogue returns the text-to-dialogue service.

func (*Client) TextToSpeech

func (c *Client) TextToSpeech() *TextToSpeechService

TextToSpeech returns the text-to-speech service.

func (*Client) Twilio

func (c *Client) Twilio() *TwilioService

Twilio returns the Twilio phone integration service.

func (*Client) User

func (c *Client) User() *UserService

User returns the user service.

func (*Client) VoiceDesign

func (c *Client) VoiceDesign() *VoiceDesignService

VoiceDesign returns the voice design/generation service.

func (*Client) Voices

func (c *Client) Voices() *VoicesService

Voices returns the voices service.

func (*Client) WebSocketSTT

func (c *Client) WebSocketSTT() *WebSocketSTTService

WebSocketSTT returns the WebSocket speech-to-text service for real-time transcription.

func (*Client) WebSocketTTS

func (c *Client) WebSocketTTS() *WebSocketTTSService

WebSocketTTS returns the WebSocket text-to-speech service for real-time streaming.

type CompositionPlan

type CompositionPlan struct {
	// PositiveGlobalStyles are styles that should be present throughout the song.
	PositiveGlobalStyles []string

	// NegativeGlobalStyles are styles that should NOT be present in the song.
	NegativeGlobalStyles []string

	// Sections defines the structure of the song with individual sections.
	Sections []SongSection
}

CompositionPlan represents a detailed music composition plan. This can be used with GenerateDetailed for fine-grained control over music generation.

type CompositionPlanRequest

type CompositionPlanRequest struct {
	// Prompt is the text description of the music to plan.
	Prompt string

	// DurationMs is the target duration in milliseconds (3000-600000).
	DurationMs int

	// SourcePlan is an optional existing plan to use as a starting point.
	SourcePlan *CompositionPlan
}

CompositionPlanRequest contains options for generating a composition plan.

type CreateProjectRequest

type CreateProjectRequest struct {
	// Name is the project name (required).
	Name string

	// Description is an optional description.
	Description string

	// Author is an optional author name.
	Author string

	// Language is the two-letter language code (ISO 639-1).
	Language string

	// DefaultModelID is the model to use for TTS.
	DefaultModelID string

	// DefaultParagraphVoiceID is the default voice for paragraphs.
	DefaultParagraphVoiceID string

	// DefaultTitleVoiceID is the default voice for titles.
	DefaultTitleVoiceID string

	// FromURL is a URL to extract content from.
	FromURL string

	// ContentType is the content type (e.g., "Novel", "Short Story").
	ContentType string

	// Genres is a list of genres.
	Genres []string

	// QualityPreset is the output quality: "standard", "high", "ultra", "ultra lossless".
	QualityPreset string

	// AutoConvert automatically converts the project to audio.
	AutoConvert bool
}

CreateProjectRequest contains options for creating a project.

func (*CreateProjectRequest) Validate

func (r *CreateProjectRequest) Validate() error

Validate validates the create request.

type CreatePronunciationDictionaryRequest

type CreatePronunciationDictionaryRequest struct {
	// Name is the name of the dictionary (required).
	Name string

	// Description is an optional description.
	Description string

	// PLSContent is the PLS (Pronunciation Lexicon Specification) XML content.
	// Use this to provide pronunciation rules directly.
	// You can generate this from PronunciationRules using ToPLSString().
	PLSContent string

	// Rules is a convenient alternative to PLSContent.
	// If provided, it will be converted to PLS format automatically.
	// If both PLSContent and Rules are provided, PLSContent takes precedence.
	Rules PronunciationRules

	// Language is the language code for the rules (default: "en-US").
	// Only used when Rules is provided.
	Language string
}

CreatePronunciationDictionaryRequest contains options for creating a pronunciation dictionary.

type DialogueInput

type DialogueInput struct {
	// Text is the text to be spoken.
	Text string

	// VoiceID is the ID of the voice to use.
	VoiceID string
}

DialogueInput represents a single dialogue turn with text and voice.

type DialogueRequest

type DialogueRequest struct {
	// Inputs is a list of dialogue turns with text and voice pairs.
	Inputs []DialogueInput

	// ModelID is the model to use (default: eleven_multilingual_v2).
	ModelID string

	// LanguageCode is the ISO 639-1 language code (e.g., "en").
	LanguageCode string

	// Seed for deterministic generation (0-4294967295).
	Seed int
}

DialogueRequest contains options for dialogue generation.

type DialogueResponse

type DialogueResponse struct {
	// AudioBase64 is the base64-encoded audio data.
	AudioBase64 string

	// VoiceSegments contains timing info for each voice segment.
	VoiceSegments []VoiceSegment
}

DialogueResponse contains the dialogue generation result with timestamps.

type DubbingProject

type DubbingProject struct {
	// DubbingID is the unique identifier.
	DubbingID string

	// Name is the project name.
	Name string

	// Status is the current status (dubbed, dubbing, failed, cloning).
	Status string

	// TargetLanguages are the target languages for dubbing.
	TargetLanguages []string

	// SourceLanguage is the source language.
	SourceLanguage string

	// Error contains any error message if the project failed.
	Error string

	// CreatedAt is when the project was created.
	CreatedAt time.Time
}

DubbingProject represents a dubbing project.

func (*DubbingProject) IsComplete

func (p *DubbingProject) IsComplete() bool

IsComplete checks if a dubbing project is complete.

func (*DubbingProject) IsFailed

func (p *DubbingProject) IsFailed() bool

IsFailed checks if a dubbing project has failed.

func (*DubbingProject) IsProcessing

func (p *DubbingProject) IsProcessing() bool

IsProcessing checks if a dubbing project is still processing.

type DubbingRequest

type DubbingRequest struct {
	// Name is the name of the dubbing project.
	Name string

	// SourceURL is the URL of the source media (alternative to file upload).
	SourceURL string

	// File is the source media file (alternative to SourceURL).
	File io.Reader

	// SourceLanguage is the source language code (ISO 639-1).
	SourceLanguage string

	// TargetLanguage is the target language code (ISO 639-1).
	TargetLanguage string

	// NumSpeakers is the number of speakers (0 for auto-detection).
	NumSpeakers int

	// Watermark enables watermark (for free tier).
	Watermark bool

	// StartTime is the start time in seconds for dubbing.
	StartTime int

	// EndTime is the end time in seconds for dubbing.
	EndTime int

	// HighestResolution requests highest resolution output.
	HighestResolution bool

	// DropBackgroundAudio removes background audio.
	DropBackgroundAudio bool
}

DubbingRequest contains options for creating a dubbing project.

type DubbingResponse

type DubbingResponse struct {
	// DubbingID is the ID of the created project.
	DubbingID string

	// ExpectedDurationSeconds is the expected duration.
	ExpectedDurationSeconds float64
}

DubbingResponse contains the result of creating a dubbing project.

type DubbingService

type DubbingService struct {
	// contains filtered or unexported fields
}

DubbingService handles dubbing operations.

func (*DubbingService) CreateFromURL

func (s *DubbingService) CreateFromURL(ctx context.Context, req *DubbingRequest) (*DubbingResponse, error)

CreateFromURL creates a dubbing project from a URL source.

func (*DubbingService) Delete

func (s *DubbingService) Delete(ctx context.Context, dubbingID string) error

Delete deletes a dubbing project by ID.

func (*DubbingService) Get

func (s *DubbingService) Get(ctx context.Context, dubbingID string) (*DubbingProject, error)

Get returns a dubbing project metadata by ID.

func (*DubbingService) GetDubbedFile

func (s *DubbingService) GetDubbedFile(ctx context.Context, dubbingID, languageCode string) (io.Reader, error)

GetDubbedFile returns the dubbed audio/video file for a specific language.

type ForcedAlignmentRequest

type ForcedAlignmentRequest struct {
	// File is the audio file to align (required).
	File io.Reader

	// Filename is the name of the file (required).
	Filename string

	// Text is the text to align with the audio (required).
	Text string
}

ForcedAlignmentRequest contains options for forced alignment.

type ForcedAlignmentResponse

type ForcedAlignmentResponse struct {
	// Words contains word-level timing information.
	Words []AlignmentWord

	// Characters contains character-level timing information.
	Characters []AlignmentCharacter

	// Loss is the average alignment confidence score.
	Loss float64
}

ForcedAlignmentResponse contains the alignment result.

type ForcedAlignmentService

type ForcedAlignmentService struct {
	// contains filtered or unexported fields
}

ForcedAlignmentService handles forced alignment between audio and text.

func (*ForcedAlignmentService) Align

Align performs forced alignment between audio and text. This is useful for generating word-level timestamps for captions and subtitles.

func (*ForcedAlignmentService) AlignFile

func (s *ForcedAlignmentService) AlignFile(ctx context.Context, file io.Reader, filename, text string) (*ForcedAlignmentResponse, error)

AlignFile is a convenience method to align audio from a file reader with text.

type HistoryItem

type HistoryItem struct {
	// HistoryItemID is the unique identifier.
	HistoryItemID string

	// VoiceID is the ID of the voice used.
	VoiceID string

	// VoiceName is the name of the voice used.
	VoiceName string

	// VoiceCategory is the category of the voice.
	VoiceCategory string

	// ModelID is the ID of the model used.
	ModelID string

	// Text is the text that was converted to speech.
	Text string

	// State is the state of the history item.
	State string

	// Source is the source of the generation.
	Source string

	// ContentType is the content type of the audio.
	ContentType string

	// CharactersUsed is the number of characters used.
	CharactersUsed int

	// CreatedAt is when the item was created.
	CreatedAt time.Time
}

HistoryItem represents a speech generation history item.

type HistoryListOptions

type HistoryListOptions struct {
	// PageSize is the number of items per page.
	PageSize int

	// StartAfterHistoryItemID is for pagination (fetch items after this ID).
	StartAfterHistoryItemID string

	// VoiceID filters by voice ID.
	VoiceID string
}

HistoryListOptions contains options for listing history items.

type HistoryListResponse

type HistoryListResponse struct {
	// Items is the list of history items.
	Items []*HistoryItem

	// HasMore indicates if there are more items to fetch.
	HasMore bool

	// LastHistoryItemID is the ID of the last item (for pagination).
	LastHistoryItemID string
}

HistoryListResponse contains the list of history items and pagination info.

type HistoryService

type HistoryService struct {
	// contains filtered or unexported fields
}

HistoryService handles history operations.

func (*HistoryService) Delete

func (s *HistoryService) Delete(ctx context.Context, historyItemID string) error

Delete deletes a history item by ID.

func (*HistoryService) Get

func (s *HistoryService) Get(ctx context.Context, historyItemID string) (*HistoryItem, error)

Get returns a specific history item by ID.

func (*HistoryService) GetAudio

func (s *HistoryService) GetAudio(ctx context.Context, historyItemID string) (io.Reader, error)

GetAudio returns the audio for a history item.

func (*HistoryService) List

List returns a list of speech history items.

type Language

type Language struct {
	// LanguageID is the unique identifier (ISO code).
	LanguageID string

	// Name is the display name of the language.
	Name string
}

Language represents a language supported by a model.

type ListPhoneNumbersResponse

type ListPhoneNumbersResponse struct {
	PhoneNumbers []PhoneNumber `json:"phone_numbers"`
}

ListPhoneNumbersResponse is the response from listing phone numbers.

type Model

type Model struct {
	// ModelID is the unique identifier for the model.
	ModelID string

	// Name is the display name of the model.
	Name string

	// Description is the model description.
	Description string

	// CanDoTextToSpeech indicates if the model supports TTS.
	CanDoTextToSpeech bool

	// CanDoVoiceConversion indicates if the model supports voice conversion.
	CanDoVoiceConversion bool

	// CanBeFinetuned indicates if the model can be fine-tuned.
	CanBeFinetuned bool

	// CanUseStyle indicates if the model supports style settings.
	CanUseStyle bool

	// CanUseSpeakerBoost indicates if the model supports speaker boost.
	CanUseSpeakerBoost bool

	// Languages is the list of supported languages.
	Languages []*Language

	// MaxCharactersFreeUser is the max characters for free users.
	MaxCharactersFreeUser int

	// MaxCharactersSubscribedUser is the max characters for subscribed users.
	MaxCharactersSubscribedUser int

	// TokenCostFactor is the cost factor for the model.
	TokenCostFactor float64
}

Model represents an ElevenLabs model.

type ModelsService

type ModelsService struct {
	// contains filtered or unexported fields
}

ModelsService handles model operations.

func (*ModelsService) List

func (s *ModelsService) List(ctx context.Context) ([]*Model, error)

List returns all available models.

func (*ModelsService) ListTTSModels

func (s *ModelsService) ListTTSModels(ctx context.Context) ([]*Model, error)

ListTTSModels returns only models that support text-to-speech.

type MusicDetailedRequest

type MusicDetailedRequest struct {
	// Prompt is a simple text description (cannot be used with CompositionPlan).
	Prompt string

	// CompositionPlan is a detailed plan (cannot be used with Prompt).
	CompositionPlan *CompositionPlan

	// DurationMs is the length in milliseconds (only used with Prompt).
	DurationMs int

	// ForceInstrumental ensures no vocals (only used with Prompt).
	ForceInstrumental bool

	// Seed for deterministic generation.
	Seed int

	// WithTimestamps returns word timestamps in the response.
	WithTimestamps bool
}

MusicDetailedRequest contains options for detailed music generation.

type MusicDetailedResponse

type MusicDetailedResponse struct {
	// Audio is the generated music.
	Audio io.Reader

	// SongID is the unique identifier for this song.
	SongID string
}

MusicDetailedResponse contains the detailed music generation result.

type MusicRequest

type MusicRequest struct {
	// Prompt is a simple text description of the music to generate.
	// Cannot be used with CompositionPlan.
	Prompt string

	// DurationMs is the length of the song in milliseconds (3000-600000).
	// If not provided, the model will choose based on the prompt.
	DurationMs int

	// ForceInstrumental ensures the song has no vocals.
	ForceInstrumental bool

	// Seed for deterministic generation (optional).
	Seed int
}

MusicRequest contains options for music generation.

type MusicResponse

type MusicResponse struct {
	// Audio is the generated music.
	Audio io.Reader

	// SongID is the unique identifier for this song.
	SongID string
}

MusicResponse contains the music generation result.

type MusicService

type MusicService struct {
	// contains filtered or unexported fields
}

MusicService handles music composition and generation.

func (*MusicService) Generate

func (s *MusicService) Generate(ctx context.Context, req *MusicRequest) (*MusicResponse, error)

Generate creates music from a text prompt.

func (*MusicService) GenerateDetailed

func (s *MusicService) GenerateDetailed(ctx context.Context, req *MusicDetailedRequest) (*MusicDetailedResponse, error)

GenerateDetailed creates music with detailed options and metadata. Use either Prompt for simple generation or CompositionPlan for fine-grained control.

Example with prompt:

resp, err := client.Music().GenerateDetailed(ctx, &MusicDetailedRequest{
    Prompt:     "epic orchestral music",
    DurationMs: 60000,
})

Example with composition plan:

plan, _ := client.Music().GeneratePlan(ctx, &CompositionPlanRequest{Prompt: "pop song"})
resp, err := client.Music().GenerateDetailed(ctx, &MusicDetailedRequest{
    CompositionPlan: plan,
})

func (*MusicService) GenerateInstrumental

func (s *MusicService) GenerateInstrumental(ctx context.Context, prompt string, durationMs int) (io.Reader, error)

GenerateInstrumental generates instrumental music from a prompt.

func (*MusicService) GeneratePlan

GeneratePlan creates a composition plan from a text prompt. The returned plan can be modified and used with GenerateDetailed.

Example:

plan, err := client.Music().GeneratePlan(ctx, &MusicCompositionPlanRequest{
    Prompt:     "upbeat pop song about summer",
    DurationMs: 180000, // 3 minutes
})
// Modify the plan if needed
plan.Sections[0].Lines = []string{"Custom lyrics here"}
// Generate music from the plan
resp, err := client.Music().GenerateDetailed(ctx, &MusicDetailedRequest{
    CompositionPlan: plan,
})

func (*MusicService) GenerateStream

func (s *MusicService) GenerateStream(ctx context.Context, req *MusicRequest) (*MusicResponse, error)

GenerateStream creates music with streaming output.

func (*MusicService) SeparateStems

func (s *MusicService) SeparateStems(ctx context.Context, req *StemSeparationRequest) (io.Reader, error)

SeparateStems separates a song into individual stems (vocals, instruments, etc.).

Example:

f, _ := os.Open("song.mp3")
stems, err := client.Music().SeparateStems(ctx, &StemSeparationRequest{
    File:     f,
    Filename: "song.mp3",
})
// Save the separated stems (returned as a zip file)
output, _ := os.Create("stems.zip")
io.Copy(output, stems)

func (*MusicService) SeparateStemsFile

func (s *MusicService) SeparateStemsFile(ctx context.Context, filePath string) (io.Reader, error)

SeparateStemsFile is a convenience method to separate stems from a file path.

func (*MusicService) Simple

func (s *MusicService) Simple(ctx context.Context, prompt string) (io.Reader, error)

Simple generates music from a prompt with default settings.

type Option

type Option func(*clientOptions)

Option is a functional option for configuring the Client.

func WithAPIKey

func WithAPIKey(apiKey string) Option

WithAPIKey sets the API key for authentication.

func WithBaseURL

func WithBaseURL(baseURL string) Option

WithBaseURL sets the API base URL.

func WithHTTPClient

func WithHTTPClient(client *http.Client) Option

WithHTTPClient sets a custom HTTP client.

func WithTimeout

func WithTimeout(timeout time.Duration) Option

WithTimeout sets the request timeout.

type PhoneNumber

type PhoneNumber struct {
	ID          string `json:"phone_number_id"`
	PhoneNumber string `json:"phone_number"`
	Label       string `json:"label"`
	AgentID     string `json:"agent_id,omitempty"`
	Provider    string `json:"provider"` // "twilio", "sip"
	Status      string `json:"status"`
	CreatedAt   string `json:"created_at"`
}

PhoneNumber represents an ElevenLabs phone number.

type PhoneNumberService

type PhoneNumberService struct {
	// contains filtered or unexported fields
}

PhoneNumberService handles phone number management.

func (*PhoneNumberService) Delete

func (s *PhoneNumberService) Delete(ctx context.Context, phoneNumberID string) error

Delete removes a phone number from the workspace.

func (*PhoneNumberService) Get

func (s *PhoneNumberService) Get(ctx context.Context, phoneNumberID string) (*PhoneNumber, error)

Get retrieves a specific phone number by ID.

func (*PhoneNumberService) List

List lists all phone numbers in the workspace.

func (*PhoneNumberService) Update

func (s *PhoneNumberService) Update(ctx context.Context, phoneNumberID string, req *UpdatePhoneNumberRequest) (*PhoneNumber, error)

Update updates a phone number's settings.

type Project

type Project struct {
	// ProjectID is the unique identifier.
	ProjectID string

	// Name is the project name.
	Name string

	// Description is the project description.
	Description string

	// Author is the project author.
	Author string

	// Language is the two-letter language code (ISO 639-1).
	Language string

	// DefaultModelID is the default model for TTS.
	DefaultModelID string

	// DefaultParagraphVoiceID is the default voice for paragraphs.
	DefaultParagraphVoiceID string

	// DefaultTitleVoiceID is the default voice for titles.
	DefaultTitleVoiceID string

	// ContentType is the content type (e.g., "Novel", "Short Story").
	ContentType string

	// CoverImageURL is the cover image URL.
	CoverImageURL string

	// CreatedAt is the creation timestamp.
	CreatedAt time.Time

	// CanBeDownloaded indicates if the project can be downloaded.
	CanBeDownloaded bool

	// AccessLevel is the access level of the project.
	AccessLevel string
}

Project represents a Studio project.

type ProjectSnapshot

type ProjectSnapshot struct {
	// ProjectSnapshotID is the unique identifier.
	ProjectSnapshotID string

	// ProjectID is the parent project ID.
	ProjectID string

	// Name is the snapshot name.
	Name string

	// CreatedAt is when the snapshot was created.
	CreatedAt time.Time
}

ProjectSnapshot represents a snapshot of a project.

type ProjectsService

type ProjectsService struct {
	// contains filtered or unexported fields
}

ProjectsService handles Studio Projects operations. Projects (formerly known as "Studio") allow you to create long-form audio content like audiobooks, podcasts, and video course narration organized into chapters.

func (*ProjectsService) Convert

func (s *ProjectsService) Convert(ctx context.Context, projectID string) error

Convert initiates conversion of a project to audio.

func (*ProjectsService) ConvertChapter

func (s *ProjectsService) ConvertChapter(ctx context.Context, projectID, chapterID string) error

ConvertChapter initiates conversion of a chapter to audio.

func (*ProjectsService) Create

Create creates a new project.

func (*ProjectsService) Delete

func (s *ProjectsService) Delete(ctx context.Context, projectID string) error

Delete deletes a project.

func (*ProjectsService) DeleteChapter

func (s *ProjectsService) DeleteChapter(ctx context.Context, projectID, chapterID string) error

DeleteChapter deletes a chapter from a project.

func (*ProjectsService) DownloadSnapshotArchive

func (s *ProjectsService) DownloadSnapshotArchive(ctx context.Context, projectID, snapshotID string) (io.Reader, error)

DownloadSnapshotArchive downloads a project snapshot as a zip archive.

func (*ProjectsService) List

func (s *ProjectsService) List(ctx context.Context) ([]*Project, error)

List returns all projects.

func (*ProjectsService) ListChapterSnapshots

func (s *ProjectsService) ListChapterSnapshots(ctx context.Context, projectID, chapterID string) ([]*ChapterSnapshot, error)

ListChapterSnapshots returns all snapshots for a chapter.

func (*ProjectsService) ListChapters

func (s *ProjectsService) ListChapters(ctx context.Context, projectID string) ([]*Chapter, error)

ListChapters returns all chapters in a project.

func (*ProjectsService) ListSnapshots

func (s *ProjectsService) ListSnapshots(ctx context.Context, projectID string) ([]*ProjectSnapshot, error)

ListSnapshots returns all snapshots for a project.

func (*ProjectsService) StreamChapterAudio

func (s *ProjectsService) StreamChapterAudio(ctx context.Context, projectID, chapterID, snapshotID string) (io.Reader, error)

StreamChapterAudio streams audio from a chapter snapshot.

func (*ProjectsService) Update

func (s *ProjectsService) Update(ctx context.Context, projectID string, req *UpdateProjectRequest) error

Update updates a project. Note: Name, DefaultParagraphVoiceID, and DefaultTitleVoiceID are required fields.

type PronunciationDictionary

type PronunciationDictionary struct {
	// ID is the unique identifier.
	ID string

	// Name is the display name.
	Name string

	// Description is the dictionary description.
	Description string

	// LatestVersionID is the ID of the latest version.
	LatestVersionID string

	// RulesCount is the number of rules in the latest version.
	RulesCount int

	// CreatedBy is the user ID who created the dictionary.
	CreatedBy string

	// CreatedAt is when the dictionary was created.
	CreatedAt time.Time
}

PronunciationDictionary represents a pronunciation dictionary.

type PronunciationDictionaryListOptions

type PronunciationDictionaryListOptions struct {
	// PageSize is the number of items per page (max 100).
	PageSize int

	// Cursor is the pagination cursor.
	Cursor string
}

PronunciationDictionaryListOptions contains options for listing.

type PronunciationDictionaryListResponse

type PronunciationDictionaryListResponse struct {
	// Dictionaries is the list of pronunciation dictionaries.
	Dictionaries []*PronunciationDictionary

	// HasMore indicates if there are more items to fetch.
	HasMore bool

	// NextCursor is the cursor for pagination.
	NextCursor string
}

PronunciationDictionaryListResponse contains the list result.

type PronunciationRule

type PronunciationRule struct {
	// Grapheme is the text to match (required).
	Grapheme string `json:"grapheme"`

	// Alias is the replacement text (mutually exclusive with Phoneme).
	// This is the easier option - just specify what text should be read instead.
	Alias string `json:"alias,omitempty"`

	// Phoneme is the IPA pronunciation (mutually exclusive with Alias).
	// Use this for precise phonetic control.
	Phoneme string `json:"phoneme,omitempty"`
}

PronunciationRule defines how a word or phrase should be pronounced. Rules can use either an alias (text substitution) or IPA phonemes.

Example JSON:

[
  {"grapheme": "ADK", "alias": "Agent Development Kit"},
  {"grapheme": "kubectl", "alias": "kube control"},
  {"grapheme": "nginx", "phoneme": "ˈɛndʒɪnˈɛks"}
]

func (*PronunciationRule) Validate

func (r *PronunciationRule) Validate() error

Validate checks that the rule is valid.

type PronunciationRules

type PronunciationRules []PronunciationRule

PronunciationRules is a collection of pronunciation rules.

func LoadRulesFromJSON

func LoadRulesFromJSON(filename string) (PronunciationRules, error)

LoadRulesFromJSON loads pronunciation rules from a JSON file.

Example file content:

[
  {"grapheme": "ADK", "alias": "Agent Development Kit"},
  {"grapheme": "API", "alias": "A P I"},
  {"grapheme": "SQL", "alias": "sequel"}
]

func ParseRulesFromJSON

func ParseRulesFromJSON(data []byte) (PronunciationRules, error)

ParseRulesFromJSON parses pronunciation rules from JSON bytes.

func RulesFromMap

func RulesFromMap(m map[string]string) PronunciationRules

RulesFromMap creates pronunciation rules from a simple map. All entries are treated as alias substitutions.

Example:

rules := RulesFromMap(map[string]string{
    "ADK":     "Agent Development Kit",
    "kubectl": "kube control",
    "API":     "A P I",
})

func (PronunciationRules) Graphemes

func (rules PronunciationRules) Graphemes() []string

Graphemes returns a list of all graphemes (the words being defined).

func (PronunciationRules) SavePLS

func (rules PronunciationRules) SavePLS(filename, language string) error

SavePLS writes the rules to a PLS file.

func (PronunciationRules) String

func (rules PronunciationRules) String() string

String returns a human-readable summary of the rules.

func (PronunciationRules) ToPLS

func (rules PronunciationRules) ToPLS(language string) ([]byte, error)

ToPLS converts the rules to PLS (Pronunciation Lexicon Specification) XML format. This is the format required by ElevenLabs API.

func (PronunciationRules) ToPLSString

func (rules PronunciationRules) ToPLSString(language string) (string, error)

ToPLSString is a convenience method that returns the PLS as a string.

type PronunciationService

type PronunciationService struct {
	// contains filtered or unexported fields
}

PronunciationService handles pronunciation dictionary operations. Pronunciation dictionaries help ensure correct pronunciation of technical terms, names, and domain-specific vocabulary.

func (*PronunciationService) Archive

func (s *PronunciationService) Archive(ctx context.Context, dictionaryID string) error

Archive archives a pronunciation dictionary.

func (*PronunciationService) Create

Create creates a new pronunciation dictionary.

Example with rules:

dict, err := client.Pronunciation().Create(ctx, &CreatePronunciationDictionaryRequest{
    Name: "Tech Terms",
    Rules: elevenlabs.RulesFromMap(map[string]string{
        "ADK":     "Agent Development Kit",
        "kubectl": "kube control",
    }),
})

Example with PLS content:

dict, err := client.Pronunciation().Create(ctx, &CreatePronunciationDictionaryRequest{
    Name:       "Tech Terms",
    PLSContent: plsXMLString,
})

func (*PronunciationService) CreateFromJSON

func (s *PronunciationService) CreateFromJSON(ctx context.Context, name, jsonFilePath string) (*PronunciationDictionary, error)

CreateFromJSON creates a pronunciation dictionary from a JSON rules file.

Example JSON file:

[
  {"grapheme": "ADK", "alias": "Agent Development Kit"},
  {"grapheme": "kubectl", "alias": "kube control"}
]

func (*PronunciationService) CreateFromMap

func (s *PronunciationService) CreateFromMap(ctx context.Context, name string, rules map[string]string) (*PronunciationDictionary, error)

CreateFromMap creates a pronunciation dictionary from a simple map. All entries are treated as alias substitutions (text replacements).

Example:

dict, err := client.Pronunciation().CreateFromMap(ctx, "Tech Terms", map[string]string{
    "ADK":     "Agent Development Kit",
    "kubectl": "kube control",
    "API":     "A P I",
})

func (*PronunciationService) DownloadLatestPLS

func (s *PronunciationService) DownloadLatestPLS(ctx context.Context, dictionaryID string) (io.Reader, error)

DownloadLatestPLS downloads the PLS file for the latest version of a dictionary. This is a convenience method that first gets the dictionary metadata to find the latest version ID, then downloads that version.

func (*PronunciationService) Get

Get returns a pronunciation dictionary by ID.

func (*PronunciationService) GetVersionPLS

func (s *PronunciationService) GetVersionPLS(ctx context.Context, dictionaryID, versionID string) (io.Reader, error)

GetVersionPLS returns the PLS (Pronunciation Lexicon Specification) XML file for a specific version of a pronunciation dictionary.

The returned io.Reader contains the XML content that can be saved to a file or parsed directly.

Example:

pls, err := client.Pronunciation().GetVersionPLS(ctx, dictionaryID, versionID)
if err != nil {
    log.Fatal(err)
}
// Save to file
f, _ := os.Create("dictionary.pls")
io.Copy(f, pls)

func (*PronunciationService) List

List returns all pronunciation dictionaries.

func (*PronunciationService) RemoveRules

func (s *PronunciationService) RemoveRules(ctx context.Context, dictionaryID string, ruleStrings []string) error

RemoveRules removes pronunciation rules from a dictionary. The ruleStrings should be the original text strings to remove.

func (*PronunciationService) Rename

func (s *PronunciationService) Rename(ctx context.Context, dictionaryID, newName string) error

Rename renames a pronunciation dictionary.

type SIPOutboundCallRequest

type SIPOutboundCallRequest struct {
	// AgentID is the ElevenLabs agent ID to handle the call.
	AgentID string `json:"agent_id"`

	// ToNumber is the phone number to call (E.164 format).
	ToNumber string `json:"to_number"`

	// SIPTrunkID is the SIP trunk ID to use.
	SIPTrunkID string `json:"sip_trunk_id"`

	// FromNumber is the caller ID to display (must be verified).
	FromNumber string `json:"from_number,omitempty"`

	// CustomLLMExtraBody is additional data to pass to the LLM.
	CustomLLMExtraBody map[string]any `json:"custom_llm_extra_body,omitempty"`

	// DynamicVariables are variables to inject into the agent prompt.
	DynamicVariables map[string]string `json:"dynamic_variables,omitempty"`

	// FirstMessage overrides the agent's default first message.
	FirstMessage string `json:"first_message,omitempty"`

	// SystemPrompt overrides the agent's system prompt.
	SystemPrompt string `json:"system_prompt,omitempty"`
}

SIPOutboundCallRequest is the request to make an outbound call via SIP trunk.

type SIPOutboundCallResponse

type SIPOutboundCallResponse struct {
	// ConversationID is the ElevenLabs conversation ID for this call.
	ConversationID string `json:"conversation_id"`

	// Status is the initial call status.
	Status string `json:"status"`
}

SIPOutboundCallResponse is the response from making a SIP outbound call.

type STTTranscript

type STTTranscript struct {
	// Text is the transcribed text.
	Text string `json:"text"`

	// IsFinal indicates if this is a final (committed) result.
	IsFinal bool `json:"is_final"`

	// Words contains word-level timing if enabled.
	Words []STTWord `json:"words,omitempty"`

	// LanguageCode is the detected language.
	LanguageCode string `json:"language_code,omitempty"`
}

STTTranscript represents a transcription result.

type STTWord

type STTWord struct {
	Text      string  `json:"text"`
	Start     float64 `json:"start"`
	End       float64 `json:"end"`
	Type      string  `json:"type,omitempty"`       // "word" or "spacing"
	SpeakerID string  `json:"speaker_id,omitempty"` // Speaker identification
}

STTWord represents a single word with timing.

type SaveVoiceRequest

type SaveVoiceRequest struct {
	// GeneratedVoiceID from the design response.
	GeneratedVoiceID string

	// VoiceName is the name for the saved voice.
	VoiceName string

	// VoiceDescription describes the voice.
	VoiceDescription string

	// Labels are optional metadata tags.
	Labels map[string]string
}

SaveVoiceRequest contains options for saving a generated voice.

type SongSection

type SongSection struct {
	// SectionName is the name of the section (e.g., "intro", "verse", "chorus").
	SectionName string

	// DurationMs is the duration in milliseconds (3000-120000).
	DurationMs int

	// Lines are the lyrics for this section (max 200 chars per line).
	Lines []string

	// PositiveLocalStyles are styles for this specific section.
	PositiveLocalStyles []string

	// NegativeLocalStyles are styles to avoid in this section.
	NegativeLocalStyles []string
}

SongSection represents a section of a song in a composition plan.

type SoundEffectRequest

type SoundEffectRequest struct {
	// Text is the description of the sound effect to generate.
	// Examples: "car engine starting", "thunder and rain", "crowd cheering"
	Text string

	// DurationSeconds is the target duration (0.5 to 30 seconds).
	// If not set, the optimal duration will be guessed from the prompt.
	DurationSeconds float64

	// PromptInfluence controls how closely the generation follows the prompt (0.0 to 1.0).
	// Higher values = more faithful to prompt but less variation.
	// Default is 0.3.
	PromptInfluence float64

	// Loop creates a sound effect that loops smoothly.
	Loop bool

	// OutputFormat specifies the audio format (e.g., "mp3_44100_128").
	OutputFormat string
}

SoundEffectRequest contains options for generating a sound effect.

func (*SoundEffectRequest) Validate

func (r *SoundEffectRequest) Validate() error

Validate validates the sound effect request.

type SoundEffectResponse

type SoundEffectResponse struct {
	// Audio is the generated sound effect data.
	Audio io.Reader
}

SoundEffectResponse contains the generated sound effect.

type SoundEffectsService

type SoundEffectsService struct {
	// contains filtered or unexported fields
}

SoundEffectsService handles sound effect generation.

func (*SoundEffectsService) Generate

Generate creates a sound effect from a text description.

func (*SoundEffectsService) GenerateLoop

func (s *SoundEffectsService) GenerateLoop(ctx context.Context, description string, durationSeconds float64) (io.Reader, error)

GenerateLoop generates a looping sound effect.

func (*SoundEffectsService) Simple

func (s *SoundEffectsService) Simple(ctx context.Context, description string) (io.Reader, error)

Simple generates a sound effect with minimal configuration.

type SpeechToSpeechRequest

type SpeechToSpeechRequest struct {
	// VoiceID is the target voice to convert to.
	VoiceID string

	// Audio is the source audio data to convert.
	Audio io.Reader

	// AudioFilename is the filename for the audio (optional, helps with format detection).
	AudioFilename string

	// ModelID is the model to use. Defaults to "eleven_english_sts_v2".
	ModelID string

	// VoiceSettings configures the voice parameters.
	VoiceSettings *VoiceSettings

	// OutputFormat specifies the audio output format.
	// Examples: "mp3_44100_128", "pcm_16000", "pcm_22050"
	OutputFormat string

	// RemoveBackgroundNoise removes background noise from the source audio.
	RemoveBackgroundNoise bool

	// SeedAudio is optional seed audio to influence the conversion.
	SeedAudio io.Reader

	// SeedAudioFilename is the filename for the seed audio.
	SeedAudioFilename string
}

SpeechToSpeechRequest is a request to convert speech to a different voice.

func (*SpeechToSpeechRequest) Validate

func (r *SpeechToSpeechRequest) Validate() error

Validate validates the speech-to-speech request.

type SpeechToSpeechResponse

type SpeechToSpeechResponse struct {
	// Audio is the converted audio data.
	Audio io.Reader
}

SpeechToSpeechResponse contains the converted audio.

type SpeechToSpeechService

type SpeechToSpeechService struct {
	// contains filtered or unexported fields
}

SpeechToSpeechService handles voice conversion operations.

func (*SpeechToSpeechService) Convert

Convert converts speech from one voice to another.

func (*SpeechToSpeechService) ConvertStream

ConvertStream converts speech with streaming response.

func (*SpeechToSpeechService) Simple

func (s *SpeechToSpeechService) Simple(ctx context.Context, voiceID string, audio io.Reader) (io.Reader, error)

Simple is a convenience method for basic voice conversion.

type SpeechToTextService

type SpeechToTextService struct {
	// contains filtered or unexported fields
}

SpeechToTextService handles speech-to-text transcription.

func (*SpeechToTextService) Transcribe

Transcribe transcribes audio to text.

func (*SpeechToTextService) TranscribeURL

func (s *SpeechToTextService) TranscribeURL(ctx context.Context, url string) (*TranscriptionResponse, error)

TranscribeURL transcribes audio from a URL.

func (*SpeechToTextService) TranscribeWithDiarization

func (s *SpeechToTextService) TranscribeWithDiarization(ctx context.Context, url string) (*TranscriptionResponse, error)

TranscribeWithDiarization transcribes audio with speaker identification.

type StemSeparationRequest

type StemSeparationRequest struct {
	// File is the audio file to separate.
	File io.Reader

	// Filename is the name of the file.
	Filename string

	// StemVariation specifies which stem variation to use.
	// Options: "two_stems_v1" (vocals + music), "six_stems_v1" (vocals, drums, bass, other - default)
	StemVariation string
}

StemSeparationRequest contains options for stem separation.

type Subscription

type Subscription struct {
	// Tier is the subscription tier (e.g., "free", "starter", "creator").
	Tier string

	// Status is the subscription status.
	Status string

	// CharacterCount is the number of characters used.
	CharacterCount int

	// CharacterLimit is the maximum characters allowed.
	CharacterLimit int

	// VoiceLimit is the maximum number of voices allowed.
	VoiceLimit int

	// VoiceSlotsUsed is the number of voice slots used.
	VoiceSlotsUsed int

	// CanUseInstantVoiceCloning indicates if instant cloning is available.
	CanUseInstantVoiceCloning bool

	// CanUseProfessionalVoiceCloning indicates if pro cloning is available.
	CanUseProfessionalVoiceCloning bool

	// NextCharacterResetUnix is when characters reset (Unix timestamp).
	NextCharacterResetUnix int64
}

Subscription represents a user's subscription details.

func (*Subscription) CharactersRemaining

func (s *Subscription) CharactersRemaining() int

CharactersRemaining returns the number of characters remaining.

type TTSAlignment

type TTSAlignment struct {
	Characters     []string  `json:"characters"`
	CharacterStart []float64 `json:"character_start_times_seconds"`
	CharacterEnd   []float64 `json:"character_end_times_seconds"`
}

TTSAlignment contains word-level timing information.

type TTSRequest

type TTSRequest struct {
	// VoiceID is the voice to use for generation.
	VoiceID string

	// Text is the text to convert to speech.
	Text string

	// ModelID is the model to use. Defaults to DefaultModelID.
	ModelID string

	// VoiceSettings configures the voice parameters.
	// If nil, default settings will be used.
	VoiceSettings *VoiceSettings

	// OutputFormat specifies the audio output format.
	// Examples: "mp3_44100_128", "pcm_16000", "pcm_22050"
	OutputFormat string

	// LanguageCode is the ISO 639-1 language code for text normalization.
	LanguageCode string
}

TTSRequest is a request to generate speech from text.

func (*TTSRequest) Validate

func (r *TTSRequest) Validate() error

Validate validates the TTS request.

type TTSResponse

type TTSResponse struct {
	// Audio is the generated audio data.
	Audio io.Reader
}

TTSResponse contains the generated audio from text-to-speech.

type TextToDialogueService

type TextToDialogueService struct {
	// contains filtered or unexported fields
}

TextToDialogueService handles multi-voice dialogue generation.

func (*TextToDialogueService) Generate

Generate creates dialogue audio from multiple voice inputs. Returns an io.Reader containing the combined audio.

func (*TextToDialogueService) GenerateStream

func (s *TextToDialogueService) GenerateStream(ctx context.Context, req *DialogueRequest) (io.Reader, error)

GenerateStream creates dialogue audio with streaming output.

func (*TextToDialogueService) GenerateWithTimestamps

func (s *TextToDialogueService) GenerateWithTimestamps(ctx context.Context, req *DialogueRequest) (*DialogueResponse, error)

GenerateWithTimestamps creates dialogue audio with timing information.

func (*TextToDialogueService) Simple

func (s *TextToDialogueService) Simple(ctx context.Context, inputs []DialogueInput) (io.Reader, error)

Simple generates dialogue audio from text-voice pairs with default settings.

type TextToSpeechService

type TextToSpeechService struct {
	// contains filtered or unexported fields
}

TextToSpeechService handles text-to-speech operations.

func (*TextToSpeechService) Generate

func (s *TextToSpeechService) Generate(ctx context.Context, req *TTSRequest) (*TTSResponse, error)

Generate generates speech from text.

func (*TextToSpeechService) GenerateToWriter

func (s *TextToSpeechService) GenerateToWriter(ctx context.Context, req *TTSRequest, w io.Writer) error

GenerateToWriter generates speech and writes it to a writer.

func (*TextToSpeechService) Simple

func (s *TextToSpeechService) Simple(ctx context.Context, voiceID, text string) (io.Reader, error)

Simple is a convenience method that generates speech with minimal parameters.

type TranscriptionRequest

type TranscriptionRequest struct {
	// FileURL is the HTTPS URL of the file to transcribe.
	// Either FileURL or FileContent must be provided.
	FileURL string

	// FileContent is the base64-encoded file content.
	// Either FileURL or FileContent must be provided.
	FileContent string

	// LanguageCode is an ISO-639-1 or ISO-639-3 language code.
	// If not provided, language is auto-detected.
	LanguageCode string

	// Diarize enables speaker diarization (who said what).
	Diarize bool

	// NumSpeakers is the expected number of speakers (for diarization).
	NumSpeakers int

	// TagAudioEvents tags audio events like laughter, applause, etc.
	TagAudioEvents bool

	// ModelID is the transcription model to use (default: "scribe_v1").
	ModelID string
}

TranscriptionRequest contains options for transcription.

type TranscriptionResponse

type TranscriptionResponse struct {
	// Text is the full transcribed text.
	Text string

	// LanguageCode is the detected language.
	LanguageCode string

	// Words contains word-level details with timestamps.
	Words []TranscriptionWord

	// Utterances contains speaker-labeled segments (when diarization is enabled).
	Utterances []TranscriptionUtterance
}

TranscriptionResponse contains the transcription result.

type TranscriptionUtterance

type TranscriptionUtterance struct {
	// Text is the utterance text.
	Text string

	// Start is the start time in seconds.
	Start float64

	// End is the end time in seconds.
	End float64

	// Speaker is the speaker ID.
	Speaker string
}

TranscriptionUtterance represents a speaker segment.

type TranscriptionWord

type TranscriptionWord struct {
	// Text is the word text.
	Text string

	// Start is the start time in seconds.
	Start float64

	// End is the end time in seconds.
	End float64

	// Confidence is the confidence score (0-1).
	Confidence float64

	// Speaker is the speaker ID (when diarization is enabled).
	Speaker string

	// Type is the word type (e.g., "word", "punctuation").
	Type string
}

TranscriptionWord represents a single word with timing.

type TwilioOutboundCallRequest

type TwilioOutboundCallRequest struct {
	// AgentID is the ElevenLabs agent ID to handle the call.
	AgentID string `json:"agent_id"`

	// AgentPhoneNumberID is the ElevenLabs phone number ID to call from.
	AgentPhoneNumberID string `json:"agent_phone_number_id"`

	// ToNumber is the phone number to call (E.164 format).
	ToNumber string `json:"to_number"`

	// CustomLLMExtraBody is additional data to pass to the LLM.
	CustomLLMExtraBody map[string]any `json:"custom_llm_extra_body,omitempty"`

	// DynamicVariables are variables to inject into the agent prompt.
	DynamicVariables map[string]string `json:"dynamic_variables,omitempty"`

	// FirstMessage overrides the agent's default first message.
	FirstMessage string `json:"first_message,omitempty"`

	// SystemPrompt overrides the agent's system prompt.
	SystemPrompt string `json:"system_prompt,omitempty"`
}

TwilioOutboundCallRequest is the request to make an outbound call via Twilio.

type TwilioOutboundCallResponse

type TwilioOutboundCallResponse struct {
	// CallSID is the Twilio call SID.
	CallSID string `json:"call_sid"`

	// ConversationID is the ElevenLabs conversation ID for this call.
	ConversationID string `json:"conversation_id"`

	// Status is the initial call status.
	Status string `json:"status"`
}

TwilioOutboundCallResponse is the response from making an outbound call.

type TwilioRegisterCallRequest

type TwilioRegisterCallRequest struct {
	// AgentID is the ElevenLabs agent ID to handle the call.
	AgentID string `json:"agent_id"`

	// AgentPhoneNumberID is the ElevenLabs phone number ID (if using imported number).
	AgentPhoneNumberID string `json:"agent_phone_number_id,omitempty"`

	// CustomLLMExtraBody is additional data to pass to the LLM.
	CustomLLMExtraBody map[string]any `json:"custom_llm_extra_body,omitempty"`

	// DynamicVariables are variables to inject into the agent prompt.
	DynamicVariables map[string]string `json:"dynamic_variables,omitempty"`

	// FirstMessage overrides the agent's default first message.
	FirstMessage string `json:"first_message,omitempty"`

	// SystemPrompt overrides the agent's system prompt.
	SystemPrompt string `json:"system_prompt,omitempty"`
}

TwilioRegisterCallRequest is the request to register an incoming Twilio call.

type TwilioRegisterCallResponse

type TwilioRegisterCallResponse struct {
	// TwiML is the TwiML response to return to Twilio.
	TwiML string `json:"twiml"`

	// ConversationID is the ElevenLabs conversation ID for this call.
	ConversationID string `json:"conversation_id,omitempty"`
}

TwilioRegisterCallResponse is the response from registering a call.

type TwilioService

type TwilioService struct {
	// contains filtered or unexported fields
}

TwilioService handles Twilio phone integration for conversational AI.

func (*TwilioService) OutboundCall

OutboundCall initiates an outbound call via Twilio.

func (*TwilioService) RegisterCall

RegisterCall registers an incoming Twilio call with ElevenLabs. Returns TwiML that should be returned to Twilio's webhook.

func (*TwilioService) SIPOutboundCall

SIPOutboundCall initiates an outbound call via SIP trunk.

type UpdatePhoneNumberRequest

type UpdatePhoneNumberRequest struct {
	// Label is a descriptive label for the phone number.
	Label string `json:"label,omitempty"`

	// AgentID is the agent to associate with this phone number.
	AgentID string `json:"agent_id,omitempty"`
}

UpdatePhoneNumberRequest is the request to update a phone number.

type UpdateProjectRequest

type UpdateProjectRequest struct {
	// Name is the new project name (required).
	Name string

	// DefaultParagraphVoiceID is the new default paragraph voice (required).
	DefaultParagraphVoiceID string

	// DefaultTitleVoiceID is the new default title voice (required).
	DefaultTitleVoiceID string

	// Author is an optional author name.
	Author string

	// Title is an optional title.
	Title string
}

UpdateProjectRequest contains options for updating a project.

type User

type User struct {
	// UserID is the unique user identifier.
	UserID string

	// FirstName is the user's first name.
	FirstName string

	// Subscription contains the user's subscription details.
	Subscription *Subscription

	// CreatedAt is when the user was created.
	CreatedAt time.Time
}

User represents an ElevenLabs user.

type UserService

type UserService struct {
	// contains filtered or unexported fields
}

UserService handles user and subscription operations.

func (*UserService) GetCharactersRemaining

func (s *UserService) GetCharactersRemaining(ctx context.Context) (int, error)

GetCharactersRemaining returns the number of characters remaining in the current period.

func (*UserService) GetInfo

func (s *UserService) GetInfo(ctx context.Context) (*User, error)

GetInfo returns the current user's information including subscription.

func (*UserService) GetSubscription

func (s *UserService) GetSubscription(ctx context.Context) (*Subscription, error)

GetSubscription returns the current user's subscription details. This is a convenience method that calls GetInfo and returns just the subscription.

type ValidationError

type ValidationError struct {
	Field   string
	Message string
}

ValidationError represents a validation error.

func (*ValidationError) Error

func (e *ValidationError) Error() string

Error implements the error interface.

type Voice

type Voice struct {
	// VoiceID is the unique identifier for the voice.
	VoiceID string

	// Name is the display name of the voice.
	Name string

	// Category is the category of the voice (e.g., "premade", "cloned").
	Category string

	// Description is the description of the voice.
	Description string

	// PreviewURL is the URL to preview the voice.
	PreviewURL string

	// Labels contains additional metadata about the voice.
	Labels map[string]string
}

Voice represents an ElevenLabs voice.

type VoiceAccent

type VoiceAccent string

VoiceAccent represents accent options for voice generation.

const (
	VoiceAccentBritish    VoiceAccent = "british"
	VoiceAccentAmerican   VoiceAccent = "american"
	VoiceAccentAfrican    VoiceAccent = "african"
	VoiceAccentAustralian VoiceAccent = "australian"
	VoiceAccentIndian     VoiceAccent = "indian"
)

type VoiceAge

type VoiceAge string

VoiceAge represents the age options for voice generation.

const (
	VoiceAgeYoung      VoiceAge = "young"
	VoiceAgeMiddleAged VoiceAge = "middle_aged"
	VoiceAgeOld        VoiceAge = "old"
)

type VoiceDesignRequest

type VoiceDesignRequest struct {
	// Gender of the voice (required).
	Gender VoiceGender

	// Age category of the voice (required).
	Age VoiceAge

	// Accent of the voice (required).
	Accent VoiceAccent

	// AccentStrength controls accent prominence (0.3 to 2.0).
	AccentStrength float64

	// Text for voice preview (100-1000 characters).
	Text string
}

VoiceDesignRequest contains options for generating a random voice.

type VoiceDesignResponse

type VoiceDesignResponse struct {
	// Audio is the generated voice sample.
	Audio io.Reader

	// GeneratedVoiceID can be used to save this voice permanently.
	GeneratedVoiceID string
}

VoiceDesignResponse contains the generated voice preview.

type VoiceDesignService

type VoiceDesignService struct {
	// contains filtered or unexported fields
}

VoiceDesignService handles AI voice generation and design.

func (*VoiceDesignService) GeneratePreview

GeneratePreview creates a voice preview based on design parameters. Returns audio sample and a generated_voice_id that can be saved.

func (*VoiceDesignService) SaveVoice

func (s *VoiceDesignService) SaveVoice(ctx context.Context, req *SaveVoiceRequest) (*Voice, error)

SaveVoice saves a previously generated voice to your voice library.

func (*VoiceDesignService) Simple

func (s *VoiceDesignService) Simple(ctx context.Context, gender VoiceGender, age VoiceAge, accent VoiceAccent, previewText string) (*VoiceDesignResponse, error)

Simple generates a voice preview with common defaults.

type VoiceGender

type VoiceGender string

VoiceGender represents the gender options for voice generation.

const (
	VoiceGenderFemale VoiceGender = "female"
	VoiceGenderMale   VoiceGender = "male"
)

type VoiceSegment

type VoiceSegment struct {
	// VoiceID is the voice used for this segment.
	VoiceID string

	// StartTime is the start time in seconds.
	StartTime float64

	// EndTime is the end time in seconds.
	EndTime float64
}

VoiceSegment represents a segment of audio for a specific voice.

type VoiceSettings

type VoiceSettings struct {
	// Stability determines how stable the voice is (0.0 to 1.0).
	// Lower values introduce broader emotional range.
	Stability float64

	// SimilarityBoost determines how closely the AI should adhere to
	// the original voice (0.0 to 1.0).
	SimilarityBoost float64

	// Style determines the style exaggeration (0.0 to 1.0).
	// Higher values amplify the original speaker's style.
	Style float64

	// Speed adjusts the speed of the voice (0.25 to 4.0).
	// 1.0 is the default speed.
	Speed float64

	// UseSpeakerBoost boosts similarity to the original speaker.
	UseSpeakerBoost bool
}

VoiceSettings contains the voice configuration for text-to-speech.

func DefaultVoiceSettings

func DefaultVoiceSettings() *VoiceSettings

DefaultVoiceSettings returns sensible default voice settings.

func VoiceSettingsForAudiobook added in v0.4.0

func VoiceSettingsForAudiobook() *VoiceSettings

VoiceSettingsForAudiobook returns settings tuned for audiobook narration. Clear, consistent, easy to listen to for extended periods.

func VoiceSettingsForCoursera added in v0.4.0

func VoiceSettingsForCoursera() *VoiceSettings

VoiceSettingsForCoursera returns settings tuned for Coursera courses. Slightly expressive, engaging for mixed media content.

func VoiceSettingsForEdX added in v0.4.0

func VoiceSettingsForEdX() *VoiceSettings

VoiceSettingsForEdX returns settings tuned for edX courses. Very stable, highly intelligible, slightly faster for dense academic content.

func VoiceSettingsForInstagram added in v0.4.0

func VoiceSettingsForInstagram() *VoiceSettings

VoiceSettingsForInstagram returns settings tuned for Instagram content. Energetic but polished, suitable for brand content.

func VoiceSettingsForPodcast added in v0.4.0

func VoiceSettingsForPodcast() *VoiceSettings

VoiceSettingsForPodcast returns settings tuned for podcast content. Natural conversational tone for long-form audio content.

func VoiceSettingsForTikTok added in v0.4.0

func VoiceSettingsForTikTok() *VoiceSettings

VoiceSettingsForTikTok returns settings tuned for TikTok content. Designed for immediate engagement in the first 1-3 seconds.

func VoiceSettingsForUdemy added in v0.4.0

func VoiceSettingsForUdemy() *VoiceSettings

VoiceSettingsForUdemy returns settings tuned for Udemy courses. Neutral, clear, consistent, safe for long lectures.

func VoiceSettingsForYouTube added in v0.4.0

func VoiceSettingsForYouTube() *VoiceSettings

VoiceSettingsForYouTube returns settings tuned for YouTube content. Designed to hold attention for 5-20 minutes without sounding robotic or theatrical.

func (*VoiceSettings) Validate

func (vs *VoiceSettings) Validate() error

Validate validates the voice settings.

type VoicesService

type VoicesService struct {
	// contains filtered or unexported fields
}

VoicesService handles voice operations.

func (*VoicesService) Delete

func (s *VoicesService) Delete(ctx context.Context, voiceID string) error

Delete deletes a voice by ID.

func (*VoicesService) Get

func (s *VoicesService) Get(ctx context.Context, voiceID string) (*Voice, error)

Get returns a voice by ID.

func (*VoicesService) GetDefaultSettings

func (s *VoicesService) GetDefaultSettings(ctx context.Context) (*VoiceSettings, error)

GetDefaultSettings returns the default voice settings.

func (*VoicesService) GetSettings

func (s *VoicesService) GetSettings(ctx context.Context, voiceID string) (*VoiceSettings, error)

GetSettings returns the settings for a voice.

func (*VoicesService) List

func (s *VoicesService) List(ctx context.Context) ([]*Voice, error)

List returns all available voices.

type WebSocketSTTConnection

type WebSocketSTTConnection struct {
	// contains filtered or unexported fields
}

WebSocketSTTConnection represents an active WebSocket STT connection.

func (*WebSocketSTTConnection) Close

func (wsc *WebSocketSTTConnection) Close() error

Close closes the WebSocket connection gracefully.

func (*WebSocketSTTConnection) Commit added in v0.7.0

func (wsc *WebSocketSTTConnection) Commit() error

Commit forces a commit of the current transcript segment. This sends an empty audio chunk with commit=true.

func (*WebSocketSTTConnection) Errors

func (wsc *WebSocketSTTConnection) Errors() <-chan error

Errors returns a channel that receives errors from the connection.

func (*WebSocketSTTConnection) SendAudio

func (wsc *WebSocketSTTConnection) SendAudio(audio []byte) error

SendAudio sends audio data for transcription. The audio should be in the format specified in WebSocketSTTOptions.AudioFormat.

func (*WebSocketSTTConnection) SendAudioWithCommit added in v0.7.0

func (wsc *WebSocketSTTConnection) SendAudioWithCommit(audio []byte, commit bool) error

SendAudioWithCommit sends audio data and optionally commits the transcript. When commit is true, the server will finalize the current transcript segment. This is useful for manual commit strategy.

func (*WebSocketSTTConnection) SessionID added in v0.7.0

func (wsc *WebSocketSTTConnection) SessionID() string

SessionID returns the session ID assigned by the server. This is available after the connection is established.

func (*WebSocketSTTConnection) StreamAudio

func (wsc *WebSocketSTTConnection) StreamAudio(ctx context.Context, audioStream <-chan []byte) (<-chan *STTTranscript, <-chan error)

StreamAudio is a convenience method that streams audio from a channel. It handles committing automatically when the input channel closes.

func (*WebSocketSTTConnection) Transcripts

func (wsc *WebSocketSTTConnection) Transcripts() <-chan *STTTranscript

Transcripts returns a channel that receives transcription results.

type WebSocketSTTOptions

type WebSocketSTTOptions struct {
	// ModelID is the transcription model to use.
	// Default: "scribe_v2_realtime"
	ModelID string

	// AudioFormat specifies the audio encoding format.
	// Options: "pcm_8000", "pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100", "pcm_48000", "ulaw_8000"
	// Default: "pcm_16000"
	AudioFormat string

	// LanguageCode is the expected language (e.g., "en", "es").
	// If not specified, language will be auto-detected.
	LanguageCode string

	// IncludeTimestamps enables word-level timing information.
	IncludeTimestamps bool

	// IncludeLanguageDetection includes detected language in responses.
	IncludeLanguageDetection bool

	// CommitStrategy determines how transcripts are committed.
	// Options: "manual" (default), "vad" (voice activity detection)
	CommitStrategy string

	// VADSilenceThresholdSecs is the silence duration to trigger commit in VAD mode.
	// Default: 1.5
	VADSilenceThresholdSecs float64

	// VADThreshold is the VAD sensitivity threshold.
	// Default: 0.4
	VADThreshold float64

	// MinSpeechDurationMs is the minimum speech duration in milliseconds.
	// Default: 100
	MinSpeechDurationMs int

	// MinSilenceDurationMs is the minimum silence duration in milliseconds.
	// Default: 100
	MinSilenceDurationMs int
}

WebSocketSTTOptions configures the WebSocket STT connection.

func DefaultWebSocketSTTOptions

func DefaultWebSocketSTTOptions() *WebSocketSTTOptions

DefaultWebSocketSTTOptions returns default options for real-time STT.

type WebSocketSTTService

type WebSocketSTTService struct {
	// contains filtered or unexported fields
}

WebSocketSTTService handles real-time speech-to-text via WebSocket.

func (*WebSocketSTTService) Connect

Connect establishes a WebSocket connection for real-time STT.

type WebSocketTTSConnection

type WebSocketTTSConnection struct {
	// contains filtered or unexported fields
}

WebSocketTTSConnection represents an active WebSocket TTS connection.

func (*WebSocketTTSConnection) Alignments

func (wsc *WebSocketTTSConnection) Alignments() <-chan *TTSAlignment

Alignments returns a channel that receives word alignment information.

func (*WebSocketTTSConnection) Audio

func (wsc *WebSocketTTSConnection) Audio() <-chan []byte

Audio returns a channel that receives audio chunks as they are generated.

func (*WebSocketTTSConnection) Close

func (wsc *WebSocketTTSConnection) Close() error

Close closes the WebSocket connection gracefully.

func (*WebSocketTTSConnection) Done added in v0.7.0

func (wsc *WebSocketTTSConnection) Done() <-chan struct{}

Done returns a channel that is closed when all audio has been received after Flush(). Use this to wait for completion before closing the connection.

func (*WebSocketTTSConnection) Errors

func (wsc *WebSocketTTSConnection) Errors() <-chan error

Errors returns a channel that receives errors from the connection.

func (*WebSocketTTSConnection) Flush

func (wsc *WebSocketTTSConnection) Flush() error

Flush signals that no more text will be sent and flushes remaining audio. This should be called when the text stream is complete. After calling Flush, use Done() to wait for all audio to be received.

func (*WebSocketTTSConnection) SendText

func (wsc *WebSocketTTSConnection) SendText(text string) error

SendText sends text to be converted to speech. The text can be sent in chunks as it becomes available (e.g., from an LLM stream).

func (*WebSocketTTSConnection) SendTextWithContext

func (wsc *WebSocketTTSConnection) SendTextWithContext(text, contextID string) error

SendTextWithContext sends text with a specific context ID for multi-context sessions.

func (*WebSocketTTSConnection) StreamText

func (wsc *WebSocketTTSConnection) StreamText(ctx context.Context, textStream <-chan string) (<-chan []byte, <-chan error)

StreamText is a convenience method that sends all text from a channel and returns audio. It handles flushing automatically when the input channel closes.

func (*WebSocketTTSConnection) TriggerGeneration

func (wsc *WebSocketTTSConnection) TriggerGeneration() error

TriggerGeneration forces audio generation for buffered text.

type WebSocketTTSOptions

type WebSocketTTSOptions struct {
	// ModelID is the model to use. Defaults to "eleven_turbo_v2_5" for low latency.
	ModelID string

	// OutputFormat specifies the audio output format.
	// Recommended for real-time: "pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100"
	// Also supports: "mp3_44100_64", "mp3_44100_96", "mp3_44100_128", "mp3_44100_192"
	OutputFormat string

	// VoiceSettings configures the voice parameters.
	VoiceSettings *VoiceSettings

	// OptimizeStreamingLatency reduces latency at the cost of quality (0-4).
	// 0 = no optimization, 4 = maximum optimization.
	OptimizeStreamingLatency int

	// EnableSSMLParsing enables SSML parsing for the input text.
	EnableSSMLParsing bool

	// LanguageCode is the ISO language code (e.g., "en", "es").
	LanguageCode string

	// ChunkLengthSchedule controls text chunking for audio generation.
	// Array of integers representing character counts before generating audio.
	ChunkLengthSchedule []int

	// InactivityTimeout is the context timeout in seconds (default 20).
	InactivityTimeout int

	// PronunciationDictionaryIDs is a list of pronunciation dictionary IDs to use.
	PronunciationDictionaryIDs []string
}

WebSocketTTSOptions configures the WebSocket TTS connection.

func DefaultWebSocketTTSOptions

func DefaultWebSocketTTSOptions() *WebSocketTTSOptions

DefaultWebSocketTTSOptions returns default options optimized for low latency.

type WebSocketTTSService

type WebSocketTTSService struct {
	// contains filtered or unexported fields
}

WebSocketTTSService handles real-time text-to-speech via WebSocket.

Stream Completion Behavior

ElevenLabs WebSocket TTS does not send an explicit "end of stream" signal. After calling Flush(), the server generates remaining audio and then waits for more input. If no input arrives within the inactivity timeout (default 20 seconds), the server sends an "input_timeout_exceeded" error and closes the connection.

For applications that need faster stream completion detection, set a shorter InactivityTimeout in WebSocketTTSOptions (e.g., 5 seconds) and treat the timeout as a successful completion if audio was received after flush.

func (*WebSocketTTSService) Connect

Connect establishes a WebSocket connection for real-time TTS.

Directories

Path Synopsis
cmd
openapi-convert command
Command openapi-convert converts OpenAPI 3.1 specs to 3.0.3 for ogen compatibility.
Command openapi-convert converts OpenAPI 3.1 specs to 3.0.3 for ogen compatibility.
ttsscript command
Command ttsscript generates TTS audio from a JSON script file using ElevenLabs.
Command ttsscript generates TTS audio from a JSON script file using ElevenLabs.
examples
basic command
Example basic shows how to use the ElevenLabs SDK for common operations.
Example basic shows how to use the ElevenLabs SDK for common operations.
retryhttp command
Example demonstrating retry middleware with ElevenLabs client.
Example demonstrating retry middleware with ElevenLabs client.
simple command
speech-to-speech command
Example: Speech-to-Speech - Voice conversion
Example: Speech-to-Speech - Voice conversion
ttsscript command
Example: Using ttsscript to generate multilingual TTS audio
Example: Using ttsscript to generate multilingual TTS audio
twilio command
Example: Twilio Integration - Phone call handling
Example: Twilio Integration - Phone call handling
websocket-stt command
Example: WebSocket STT - Real-time speech-to-text streaming
Example: WebSocket STT - Real-time speech-to-text streaming
websocket-tts command
Example: WebSocket TTS - Real-time text-to-speech streaming
Example: WebSocket TTS - Real-time text-to-speech streaming
internal
api
Code generated by ogen, DO NOT EDIT.
Code generated by ogen, DO NOT EDIT.
Package omnivoice provides OmniVoice provider implementations using the ElevenLabs API.
Package omnivoice provides OmniVoice provider implementations using the ElevenLabs API.
agent
Package agent provides an OmniVoice Agent provider implementation using ElevenLabs.
Package agent provides an OmniVoice Agent provider implementation using ElevenLabs.
stt
Package stt provides an OmniVoice STT provider implementation using ElevenLabs.
Package stt provides an OmniVoice STT provider implementation using ElevenLabs.
tts
Package tts provides an OmniVoice TTS provider implementation using ElevenLabs.
Package tts provides an OmniVoice TTS provider implementation using ElevenLabs.
Package ttsscript provides a structured format for authoring multilingual TTS (Text-to-Speech) scripts that can be compiled to various output formats.
Package ttsscript provides a structured format for authoring multilingual TTS (Text-to-Speech) scripts that can be compiled to various output formats.
Package voices provides reference information for ElevenLabs voices.
Package voices provides reference information for ElevenLabs voices.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL