omnivoice-deepgram

Module v0.3.1 (Latest)
Published: Feb 22, 2026
License: MIT

OmniVoice Deepgram Provider

OmniVoice provider implementation for Deepgram speech-to-text and text-to-speech services.

This package adapts the official Deepgram Go SDK to the OmniVoice interfaces, enabling Deepgram's STT and TTS capabilities within the OmniVoice framework.

OmniVoice Feature Support

The tables below show which OmniVoice capabilities this provider supports.

Core Voice Capabilities

| Capability | Supported | Notes |
|---|---|---|
| STT (Speech-to-Text) | | Full capability |
| STT Streaming | | Real-time via WebSocket |
| STT Batch | | From audio bytes via REST |
| STT File | | From file path via REST |
| STT URL | | From URL via REST |
| TTS (Text-to-Speech) | | Aura voices via REST and WebSocket |
| TTS Synthesize | | Non-streaming via REST API |
| TTS Streaming | | Real-time via WebSocket |
| TTS Voice List | | Static list of Aura voices |
| Voice Agent | | N/A (use with agent orchestration) |

STT Features

| Feature | Supported | Notes |
|---|---|---|
| Interim results | | Real-time partial transcripts |
| Final results | | Complete utterance transcripts |
| Speech start detection | | EventSpeechStart events |
| Speech end detection | | EventSpeechEnd / utterance end |
| Speaker diarization | | Multi-speaker identification |
| Keyword boosting | | Boost specific terms |
| Punctuation | | Optional auto-punctuation |
| Word-level timestamps | | Per-word timing data |
| Confidence scores | | Per-word and per-utterance |

TTS Features

| Feature | Supported | Notes |
|---|---|---|
| Non-streaming synthesis | | REST API returns full audio |
| Streaming synthesis | | WebSocket streams audio chunks |
| Streaming input | | Pipe LLM output directly to TTS |
| Sentence splitting | | Automatic splitting for natural speech |
| Voice selection | | Aura 1 and Aura 2 voices |
| Output formats | | mp3, linear16, mulaw, alaw, opus, flac |
| Sample rate control | | Configurable output sample rate |

Transport Layer

| Transport | Supported | Notes |
|---|---|---|
| WebSocket | | Native streaming transport |
| HTTP | | Batch/pre-recorded API |
| WebRTC | | Use with transport provider |
| SIP | | Use with transport provider |
| PSTN | | Use with transport provider |

Call System Integration

| Call System | Supported | Notes |
|---|---|---|
| Twilio | | Use with omnivoice-twilio |
| RingCentral | | Use with call system provider |
| Zoom | | Use with call system provider |
| LiveKit | | Use with call system provider |
| Daily | | Use with call system provider |

Legend: ✅ Supported | ❌ Not implemented | — Not applicable (use with other providers)

Features

Speech-to-Text (STT)
  • Real-time streaming transcription via WebSocket
  • Support for telephony audio formats (mu-law, a-law)
  • Interim and final transcription results
  • Speech start/end detection for natural turn-taking
  • Speaker diarization support
  • Keyword boosting

Text-to-Speech (TTS)
  • Non-streaming synthesis via REST API
  • Real-time streaming synthesis via WebSocket
  • Streaming input support (pipe LLM output directly to TTS)
  • Automatic sentence splitting for natural speech
  • Multiple Aura voices (male/female, US/UK/IE accents)
  • Multiple output formats (mp3, linear16, mulaw, opus, etc.)
  • Configurable sample rate

Installation

go get github.com/agentplexus/omnivoice-deepgram

Usage

Batch Transcription (File/URL)

import (
    "context"
    "fmt"
    "log"
    "os"

    deepgramstt "github.com/agentplexus/omnivoice-deepgram/omnivoice/stt"
    "github.com/agentplexus/omnivoice/stt"
)

ctx := context.Background()

// Create provider with API key
provider, err := deepgramstt.New(deepgramstt.WithAPIKey("your-api-key"))
if err != nil {
    log.Fatal(err)
}

config := stt.TranscriptionConfig{
    Model:    "nova-2",
    Language: "en-US",
}

// Transcribe from URL
result, err := provider.TranscribeURL(ctx, "https://example.com/audio.mp3", config)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("Transcript: %s\n", result.Text)
fmt.Printf("Duration: %v\n", result.Duration)

// Access word-level timestamps
for _, segment := range result.Segments {
    for _, word := range segment.Words {
        fmt.Printf("%s: %v - %v\n", word.Text, word.StartTime, word.EndTime)
    }
}

// Transcribe from file
result, err = provider.TranscribeFile(ctx, "/path/to/audio.mp3", config)
if err != nil {
    log.Fatal(err)
}

// Transcribe from bytes
audioData, err := os.ReadFile("/path/to/audio.mp3")
if err != nil {
    log.Fatal(err)
}
result, err = provider.Transcribe(ctx, audioData, config)
if err != nil {
    log.Fatal(err)
}
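
Word-level timestamps like the ones printed above lend themselves to simple post-processing, such as grouping words into caption lines. The following is a standalone sketch, not part of this package: `Word`, `Caption`, and `GroupWords` are hypothetical names, and representing `StartTime` and `EndTime` as `time.Duration` is an assumption about the result type.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Word is a hypothetical stand-in for the word entries of a transcription
// result; the real field types in omnivoice may differ.
type Word struct {
	Text      string
	StartTime time.Duration
	EndTime   time.Duration
}

// Caption is one line of grouped words together with its time span.
type Caption struct {
	Start, End time.Duration
	Text       string
}

// GroupWords starts a new caption whenever adding the next word would make
// the current caption span longer than maxSpan.
func GroupWords(words []Word, maxSpan time.Duration) []Caption {
	var out []Caption
	var cur []string
	var start, end time.Duration
	for _, w := range words {
		if len(cur) > 0 && w.EndTime-start > maxSpan {
			out = append(out, Caption{Start: start, End: end, Text: strings.Join(cur, " ")})
			cur = nil
		}
		if len(cur) == 0 {
			start = w.StartTime
		}
		cur = append(cur, w.Text)
		end = w.EndTime
	}
	if len(cur) > 0 {
		out = append(out, Caption{Start: start, End: end, Text: strings.Join(cur, " ")})
	}
	return out
}

func main() {
	words := []Word{
		{"Hello", 0, 400 * time.Millisecond},
		{"world", 500 * time.Millisecond, 900 * time.Millisecond},
		{"again", 2100 * time.Millisecond, 2500 * time.Millisecond},
	}
	for _, c := range GroupWords(words, 2*time.Second) {
		fmt.Printf("[%v - %v] %s\n", c.Start, c.End, c.Text)
	}
	// Prints:
	// [0s - 900ms] Hello world
	// [2.1s - 2.5s] again
}
```

In real use the input would be built from the `Words` of each result segment rather than from literals.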

Streaming Transcription (Real-time)

import (
    "context"
    "fmt"
    "io"
    "log"

    deepgramstt "github.com/agentplexus/omnivoice-deepgram/omnivoice/stt"
    "github.com/agentplexus/omnivoice/stt"
)

ctx := context.Background()

// Create provider with API key
provider, err := deepgramstt.New(deepgramstt.WithAPIKey("your-api-key"))
if err != nil {
    log.Fatal(err)
}

// Configure for telephony audio
config := stt.TranscriptionConfig{
    Model:      "nova-2",
    Language:   "en-US",
    Encoding:   "mulaw",
    SampleRate: 8000,
}

// Start streaming transcription
writer, events, err := provider.TranscribeStream(ctx, config)
if err != nil {
    log.Fatal(err)
}

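// Send audio data; audioSource is any io.Reader that yields raw audio
go func() {
    defer writer.Close()
    if _, err := io.Copy(writer, audioSource); err != nil {
        log.Printf("Audio copy failed: %v", err)
    }
}()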

// Receive transcription events
for event := range events {
    switch event.Type {
    case stt.EventTranscript:
        if event.IsFinal {
            fmt.Println("Final:", event.Transcript)
        }
    case stt.EventSpeechStart:
        fmt.Println("Speech started")
    case stt.EventSpeechEnd:
        fmt.Println("Speech ended")
    case stt.EventError:
        log.Printf("Error: %v", event.Error)
    }
}

Basic Text-to-Speech

import (
    "context"
    "fmt"
    "log"

    deepgramtts "github.com/agentplexus/omnivoice-deepgram/omnivoice/tts"
    "github.com/agentplexus/omnivoice/tts"
)

ctx := context.Background()

// Create TTS provider with API key
provider, err := deepgramtts.New(deepgramtts.WithAPIKey("your-api-key"))
if err != nil {
    log.Fatal(err)
}

// Configure synthesis
config := tts.SynthesisConfig{
    VoiceID:      "aura-asteria-en",  // Female US voice
    OutputFormat: "mp3",
    SampleRate:   24000,
}

// Synthesize text to speech
result, err := provider.Synthesize(ctx, "Hello, world!", config)
if err != nil {
    log.Fatal(err)
}

// result.Audio contains the synthesized audio bytes
fmt.Printf("Generated %d bytes of audio\n", len(result.Audio))

Streaming Text-to-Speech

// Start streaming synthesis
chunkCh, err := provider.SynthesizeStream(ctx, "Hello, this is streaming TTS.", config)
if err != nil {
    log.Fatal(err)
}

// Receive audio chunks as they're generated
for chunk := range chunkCh {
    if chunk.Error != nil {
        log.Printf("Error: %v", chunk.Error)
        break
    }
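    if len(chunk.Audio) > 0 {
        // Process or play the audio chunk; audioPlayer stands for any io.Writer sink
        if _, err := audioPlayer.Write(chunk.Audio); err != nil {
            log.Printf("Playback error: %v", err)
            break
        }
    }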
    if chunk.IsFinal {
        fmt.Println("Synthesis complete")
    }
}

List Available Voices

voices, err := provider.ListVoices(ctx)
if err != nil {
    log.Fatal(err)
}

for _, voice := range voices {
    fmt.Printf("%s: %s (%s, %s)\n", voice.ID, voice.Name, voice.Language, voice.Gender)
}

Streaming Input from LLM

Stream text from an LLM directly to TTS for low-latency voice responses:

// Create a pipe to connect LLM output to TTS input
pr, pw := io.Pipe()

// Start streaming synthesis from the reader
chunkCh, err := provider.SynthesizeFromReader(ctx, pr, config)
if err != nil {
    log.Fatal(err)
}

// Simulate streaming LLM output in a goroutine
go func() {
    defer pw.Close()

    // Write text chunks as they arrive from the LLM; stop if the reader side has closed
    for _, text := range []string{
        "Hello! ",
        "This is streaming from an LLM. ",
        "Each sentence is synthesized as it arrives.",
    } {
        if _, err := io.WriteString(pw, text); err != nil {
            log.Printf("Pipe write failed: %v", err)
            return
        }
    }
}()

// Receive audio chunks as they're generated
for chunk := range chunkCh {
    if chunk.Error != nil {
        log.Printf("Error: %v", chunk.Error)
        break
    }
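    if len(chunk.Audio) > 0 {
        // audioPlayer stands for any io.Writer sink
        if _, err := audioPlayer.Write(chunk.Audio); err != nil {
            log.Printf("Playback error: %v", err)
            break
        }
    }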
}
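
The README does not describe how the provider splits incoming text into sentences. Purely to illustrate the idea of buffering streamed text until a sentence is complete, here is a minimal sketch; `SentenceBuffer`, `Feed`, and `Flush` are hypothetical names, and the splitting rule (terminal punctuation followed by whitespace) is an assumption that may well differ from the provider's actual behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// SentenceBuffer accumulates streamed text and releases complete sentences.
type SentenceBuffer struct {
	pending strings.Builder
}

// Feed appends a chunk and returns any sentences that are now complete.
// A sentence is considered complete at '.', '!' or '?' followed by a space
// or newline, so abbreviations such as "e.g. " will be split too early.
func (b *SentenceBuffer) Feed(chunk string) []string {
	b.pending.WriteString(chunk)
	text := b.pending.String()
	var out []string
	start := 0
	for i := 0; i < len(text)-1; i++ {
		if strings.ContainsRune(".!?", rune(text[i])) && (text[i+1] == ' ' || text[i+1] == '\n') {
			out = append(out, strings.TrimSpace(text[start:i+1]))
			start = i + 1
		}
	}
	// Keep the unfinished tail for the next call.
	b.pending.Reset()
	b.pending.WriteString(text[start:])
	return out
}

// Flush returns whatever text remains once the input has ended.
func (b *SentenceBuffer) Flush() string {
	rest := strings.TrimSpace(b.pending.String())
	b.pending.Reset()
	return rest
}

func main() {
	var buf SentenceBuffer
	for _, chunk := range []string{"Hello! This is", " streaming. Each sentence", " arrives in pieces"} {
		for _, s := range buf.Feed(chunk) {
			fmt.Printf("sentence: %q\n", s)
		}
	}
	fmt.Printf("tail: %q\n", buf.Flush())
}
```

Each completed sentence could then be handed to synthesis as soon as it is available, while the tail waits for more text.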

With OmniVoice Pipeline

For a complete voice agent example using Deepgram STT and TTS with Twilio Media Streams, see the omnivoice-examples repository.

Supported Audio Formats

| Format | Encoding Value | Typical Use |
|---|---|---|
| mu-law | mulaw | Twilio, telephony |
| A-law | alaw | European telephony |
| Linear PCM | linear16 | General audio |
| FLAC | flac | Compressed lossless |
| Opus | opus | WebRTC |
| MP3 | mp3 | Compressed lossy |
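
For the uncompressed encodings, buffer sizes follow directly from the format: mu-law and A-law use one byte per sample, and linear16 uses two. The sketch below computes the size of an audio frame of a given duration; `FrameBytes` and `bytesPerSample` are illustrative names, not part of this package, and compressed formats (flac, opus, mp3) are reported as unsupported because their size is not a fixed function of duration.

```go
package main

import (
	"fmt"
	"time"
)

// bytesPerSample covers the uncompressed encodings from the table above.
var bytesPerSample = map[string]int{
	"mulaw":    1,
	"alaw":     1,
	"linear16": 2,
}

// FrameBytes returns the number of bytes needed for d of audio in the given
// encoding. The second result is false for encodings without a fixed size.
func FrameBytes(encoding string, sampleRate, channels int, d time.Duration) (int, bool) {
	bps, ok := bytesPerSample[encoding]
	if !ok {
		return 0, false
	}
	samples := int(int64(sampleRate) * int64(d) / int64(time.Second))
	return samples * channels * bps, true
}

func main() {
	n, _ := FrameBytes("mulaw", 8000, 1, 20*time.Millisecond)
	fmt.Println(n) // 160

	n, _ = FrameBytes("linear16", 16000, 1, 20*time.Millisecond)
	fmt.Println(n) // 640

	_, ok := FrameBytes("opus", 48000, 1, 20*time.Millisecond)
	fmt.Println(ok) // false
}
```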

Configuration Options

| Option | Description | Default |
|---|---|---|
| Model | Deepgram model | nova-2 |
| Language | Language code | en-US |
| SampleRate | Audio sample rate | 8000 |
| Channels | Audio channels | 1 |
| EnablePunctuation | Add punctuation | false |
| EnableSpeakerDiarization | Identify speakers | false |
| Keywords | Words to boost | [] |
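
How the provider applies these defaults internally is not shown here. As a rough illustration of the table, the sketch below fills unset fields of a hypothetical local `Config` struct with the documented default values; `Config` and `WithDefaults` are assumptions made for this example and are not the `stt.TranscriptionConfig` type itself.

```go
package main

import "fmt"

// Config is a hypothetical local mirror of the options in the table above.
type Config struct {
	Model                    string
	Language                 string
	SampleRate               int
	Channels                 int
	EnablePunctuation        bool
	EnableSpeakerDiarization bool
	Keywords                 []string
}

// WithDefaults fills unset (zero-valued) fields with the documented defaults.
// The boolean options and Keywords already default to their zero values.
func WithDefaults(c Config) Config {
	if c.Model == "" {
		c.Model = "nova-2"
	}
	if c.Language == "" {
		c.Language = "en-US"
	}
	if c.SampleRate == 0 {
		c.SampleRate = 8000
	}
	if c.Channels == 0 {
		c.Channels = 1
	}
	return c
}

func main() {
	c := WithDefaults(Config{SampleRate: 16000, EnablePunctuation: true})
	fmt.Printf("%s %s %d Hz, %d channel(s), punctuation=%v\n",
		c.Model, c.Language, c.SampleRate, c.Channels, c.EnablePunctuation)
	// Prints: nova-2 en-US 16000 Hz, 1 channel(s), punctuation=true
}
```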

Requirements

License

MIT License - see LICENSE for details.

Directories

| Path | Synopsis |
|---|---|
| omnivoice | Package omnivoice provides OmniVoice provider implementations using Deepgram. |
| omnivoice/stt | Package stt provides an OmniVoice STT provider implementation using Deepgram. |
| omnivoice/tts | Package tts provides an OmniVoice TTS provider implementation using Deepgram. |
