openai

package module

Version: v0.1.1 (latest) Published: Mar 22, 2026 License: MIT Imports: 9 Imported by: 0

Repository

github.com/plexusone/omnivoice-openai

README ¶

OmniVoice OpenAI Provider

OpenAI audio provider for the OmniVoice voice pipeline framework.

Features

STT (Speech-to-Text): Whisper transcription with word and segment timestamps
TTS (Text-to-Speech): OpenAI audio synthesis with multiple voices
OmniVoice Integration: Implements stt.Provider and tts.Provider interfaces

Installation

go get github.com/plexusone/omnivoice-openai

Usage

Direct Client Usage

package main

import (
    "context"
    "log"
    "os"

    openai "github.com/plexusone/omnivoice-openai"
)

func main() {
    // Create client from the OPENAI_API_KEY environment variable
    client, err := openai.NewClientFromEnv()
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()

    // Load the audio to transcribe
    audioData, err := os.ReadFile("audio.mp3")
    if err != nil {
        log.Fatal(err)
    }

    // Transcribe audio
    resp, err := client.Transcribe(ctx, openai.TranscriptionRequest{
        Audio:    audioData,
        Filename: "audio.mp3",
    })
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("Transcription: %s", resp.Text)

    // Synthesize speech
    ttsResp, err := client.Synthesize(ctx, openai.TTSRequest{
        Input: "Hello, world!",
        Voice: openai.VoiceAlloy,
    })
    if err != nil {
        log.Fatal(err)
    }

    // ttsResp.Audio contains the MP3 audio data; write it to disk
    if err := os.WriteFile("hello.mp3", ttsResp.Audio, 0o644); err != nil {
        log.Fatal(err)
    }
}

OmniVoice Provider Usage

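package main

import (
    "context"
    "log"
    "os"

    openaistt "github.com/plexusone/omnivoice-openai/omnivoice/stt"
    openaitts "github.com/plexusone/omnivoice-openai/omnivoice/tts"
)

func main() {
    ctx := context.Background()

    // Load the audio to transcribe
    audioData, err := os.ReadFile("audio.mp3")
    if err != nil {
        log.Fatal(err)
    }

    // Create STT provider (implements stt.Provider from omnivoice-core)
    sttProvider := openaistt.NewProvider()
    transcription, err := sttProvider.Transcribe(ctx, audioData)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("Transcription: %v", transcription)

    // Create TTS provider (implements tts.Provider from omnivoice-core)
    ttsProvider := openaitts.NewProvider()
    audio, err := ttsProvider.Synthesize(ctx, "Hello, world!")
    if err != nil {
        log.Fatal(err)
    }
    _ = audio // synthesized audio, ready to play or store
}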

Configuration

Set the OPENAI_API_KEY environment variable or pass the API key directly:

client := openai.NewClient("your-api-key")
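
An application that accepts a key from a flag or config file, but falls back to the environment, can make that choice in one place. The following is a minimal sketch using only the standard library. `resolveAPIKey` is a hypothetical helper and not part of this package; the key it returns would be passed to `openai.NewClient`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// resolveAPIKey is a hypothetical helper (not part of this package).
// It prefers an explicitly supplied key and falls back to OPENAI_API_KEY,
// read through getenv so the lookup can be replaced in tests.
func resolveAPIKey(explicit string, getenv func(string) string) (string, error) {
	if key := strings.TrimSpace(explicit); key != "" {
		return key, nil
	}
	if key := strings.TrimSpace(getenv("OPENAI_API_KEY")); key != "" {
		return key, nil
	}
	return "", errors.New("no API key: pass one explicitly or set OPENAI_API_KEY")
}

func main() {
	key, err := resolveAPIKey("", os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The resolved key would then be passed to openai.NewClient(key).
	fmt.Printf("resolved a key of length %d\n", len(key))
}
```

Injecting the lookup function keeps the helper free of global state, so the fallback order can be exercised without touching the real environment.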

Available Voices

Voice	Description
alloy	Neutral, balanced
ash	Warm, engaging
ballad	Melodic, expressive
coral	Clear, articulate
echo	Smooth, natural
fable	Storytelling, dramatic
nova	Bright, energetic
onyx	Deep, resonant
sage	Calm, wise
shimmer	Light, cheerful
verse	Poetic, rhythmic
marin	Friendly, approachable
cedar	Grounded, trustworthy

License

MIT License - see LICENSE for details.

Documentation ¶

Overview ¶

Package openai provides a Go client for OpenAI's audio APIs (Whisper STT and TTS).

Index ¶

Constants
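type Client
- func NewClient(apiKey string) *Client
- func NewClientFromEnv() (*Client, error)
- func (c *Client) Synthesize(ctx context.Context, req TTSRequest) (*TTSResponse, error)
- func (c *Client) SynthesizeStream(ctx context.Context, req TTSRequest) (io.ReadCloser, error)
- func (c *Client) Transcribe(ctx context.Context, req TranscriptionRequest) (*TranscriptionResponse, error)
- func (c *Client) TranscribeFile(ctx context.Context, filePath string, req TranscriptionRequest) (*TranscriptionResponse, error)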
type Segment
type TTSRequest
type TTSResponse
type TranscriptionRequest
type TranscriptionResponse
type WordTimestamp

Constants ¶

const (
	VoiceAlloy   = "alloy"
	VoiceAsh     = "ash"
	VoiceBallad  = "ballad"
	VoiceCoral   = "coral"
	VoiceEcho    = "echo"
	VoiceFable   = "fable"
	VoiceOnyx    = "onyx"
	VoiceNova    = "nova"
	VoiceSage    = "sage"
	VoiceShimmer = "shimmer"
	VoiceVerse   = "verse"
	VoiceMarin   = "marin"
	VoiceCedar   = "cedar"
)

Voice constants for TTS.

const (
	ModelWhisper1 = "whisper-1"
	ModelTTS1     = "tts-1"
	ModelTTS1HD   = "tts-1-hd"
)

Model constants.

Types ¶

type Client ¶

type Client struct {
	// contains filtered or unexported fields
}

Client wraps the OpenAI API client for audio operations.

func NewClient ¶

func NewClient(apiKey string) *Client

NewClient creates a new OpenAI client with the given API key.

func NewClientFromEnv ¶

func NewClientFromEnv() (*Client, error)

NewClientFromEnv creates a new OpenAI client using the OPENAI_API_KEY environment variable.

func (*Client) Synthesize ¶

func (c *Client) Synthesize(ctx context.Context, req TTSRequest) (*TTSResponse, error)

Synthesize converts text to speech.

func (*Client) SynthesizeStream ¶

func (c *Client) SynthesizeStream(ctx context.Context, req TTSRequest) (io.ReadCloser, error)

SynthesizeStream converts text to speech with streaming output.
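
Because SynthesizeStream returns an io.ReadCloser, the caller is responsible for reading the audio incrementally and closing the stream. The sketch below shows one way to do that with the standard library only. `consumeStream` and `fakeStream` are hypothetical names introduced for illustration; `fakeStream` stands in for the reader a real call would return, and the chunk size is an arbitrary choice.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// consumeStream is a hypothetical helper (not part of this package).
// It reads rc in chunks of at most chunkSize bytes, passes each chunk to
// handle, and always closes rc. It returns the total number of bytes read.
func consumeStream(rc io.ReadCloser, chunkSize int, handle func(chunk []byte) error) (int64, error) {
	defer rc.Close()
	if chunkSize <= 0 {
		chunkSize = 4096 // fall back to an arbitrary buffer size
	}
	buf := make([]byte, chunkSize)
	var total int64
	for {
		n, readErr := rc.Read(buf)
		if n > 0 {
			total += int64(n)
			// buf is reused, so handle must not retain the slice.
			if err := handle(buf[:n]); err != nil {
				return total, err
			}
		}
		if readErr == io.EOF {
			return total, nil
		}
		if readErr != nil {
			return total, readErr
		}
	}
}

// fakeStream stands in for the io.ReadCloser returned by SynthesizeStream.
func fakeStream(data string) io.ReadCloser {
	return io.NopCloser(strings.NewReader(data))
}

func main() {
	total, err := consumeStream(fakeStream("hello world"), 4, func(chunk []byte) error {
		fmt.Printf("received %d bytes\n", len(chunk))
		return nil
	})
	fmt.Println("total:", total, "err:", err)
}
```

When the audio only needs to be saved, io.Copy into an os.File is a simpler alternative; a chunk callback is useful when the bytes feed a player or a network connection as they arrive.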

func (*Client) Transcribe ¶

func (c *Client) Transcribe(ctx context.Context, req TranscriptionRequest) (*TranscriptionResponse, error)

Transcribe converts audio to text using Whisper.

func (*Client) TranscribeFile ¶

func (c *Client) TranscribeFile(ctx context.Context, filePath string, req TranscriptionRequest) (*TranscriptionResponse, error)

TranscribeFile transcribes audio from a file path.

type Segment ¶

type Segment struct {
	ID               int64   `json:"id"`
	Seek             int64   `json:"seek"`
	Start            float64 `json:"start"`
	End              float64 `json:"end"`
	Text             string  `json:"text"`
	Temperature      float64 `json:"temperature"`
	AvgLogprob       float64 `json:"avg_logprob"`
	CompressionRatio float64 `json:"compression_ratio"`
	NoSpeechProb     float64 `json:"no_speech_prob"`
}

Segment represents a transcription segment.
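
The NoSpeechProb field can be used to discard segments that are probably silence or noise. The sketch below is illustrative only: it declares a local `Segment` type that mirrors a subset of the documented fields so the example runs on its own, `keepLikelySpeech` is a hypothetical helper, and the 0.6 threshold is an assumption to tune per application.

```go
package main

import "fmt"

// Segment mirrors the subset of the documented Segment fields used here.
// It is declared locally so the sketch is self-contained.
type Segment struct {
	Start        float64
	End          float64
	Text         string
	NoSpeechProb float64
}

// keepLikelySpeech is a hypothetical helper (not part of this package).
// It returns the segments whose NoSpeechProb does not exceed maxNoSpeech.
func keepLikelySpeech(segs []Segment, maxNoSpeech float64) []Segment {
	kept := make([]Segment, 0, len(segs))
	for _, s := range segs {
		if s.NoSpeechProb <= maxNoSpeech {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	segs := []Segment{
		{Start: 0.0, End: 2.1, Text: "Hello there.", NoSpeechProb: 0.02},
		{Start: 2.1, End: 5.0, Text: "...", NoSpeechProb: 0.91},
		{Start: 5.0, End: 7.4, Text: "How are you?", NoSpeechProb: 0.05},
	}
	// 0.6 is an assumed threshold, not a value defined by this package.
	for _, s := range keepLikelySpeech(segs, 0.6) {
		fmt.Printf("[%.1f-%.1f] %s\n", s.Start, s.End, s.Text)
	}
}
```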

type TTSRequest ¶

type TTSRequest struct {
	// Input is the text to convert to speech (max 4096 characters).
	Input string

	// Model is the TTS model: "tts-1" (faster) or "tts-1-hd" (higher quality).
	Model string

	// Voice is the voice to use: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse, marin, cedar.
	Voice string

	// ResponseFormat is the audio format: "mp3", "opus", "aac", "flac", "wav", "pcm".
	ResponseFormat string

	// Speed is the speech speed (0.25 to 4.0, default 1.0).
	Speed float64
}

TTSRequest configures a text-to-speech request.
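
The limits documented in the field comments can be checked before a request is sent. The sketch below is an assumption-laden illustration, not the package's own validation: it uses a local copy of the struct so it runs on its own, `validateTTS` is a hypothetical helper, the 4096-character limit is counted in runes, and an empty ResponseFormat or a zero Speed is assumed to mean "use the default".

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// TTSRequest is a local copy of the documented struct, declared here
// so the sketch is self-contained.
type TTSRequest struct {
	Input          string
	Model          string
	Voice          string
	ResponseFormat string
	Speed          float64
}

var knownVoices = map[string]bool{
	"alloy": true, "ash": true, "ballad": true, "coral": true, "echo": true,
	"fable": true, "onyx": true, "nova": true, "sage": true, "shimmer": true,
	"verse": true, "marin": true, "cedar": true,
}

var knownFormats = map[string]bool{
	"mp3": true, "opus": true, "aac": true, "flac": true, "wav": true, "pcm": true,
}

// validateTTS is a hypothetical helper (not part of this package).
// It checks a request against the limits stated in the field comments.
func validateTTS(req TTSRequest) error {
	n := utf8.RuneCountInString(req.Input) // assumption: the limit counts runes
	if n == 0 {
		return errors.New("input is empty")
	}
	if n > 4096 {
		return fmt.Errorf("input has %d characters, max is 4096", n)
	}
	if !knownVoices[req.Voice] {
		return fmt.Errorf("unknown voice %q", req.Voice)
	}
	// Assumption: empty format and zero speed select the defaults.
	if req.ResponseFormat != "" && !knownFormats[req.ResponseFormat] {
		return fmt.Errorf("unknown response format %q", req.ResponseFormat)
	}
	if req.Speed != 0 && (req.Speed < 0.25 || req.Speed > 4.0) {
		return fmt.Errorf("speed %.2f is outside 0.25 to 4.0", req.Speed)
	}
	return nil
}

func main() {
	req := TTSRequest{Input: "Hello, world!", Voice: "alloy", Speed: 5}
	fmt.Println(validateTTS(req))
}
```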

type TTSResponse ¶

type TTSResponse struct {
	// Audio is the generated audio data.
	Audio []byte

	// Format is the audio format.
	Format string
}

TTSResponse contains the generated audio.

type TranscriptionRequest ¶

type TranscriptionRequest struct {
	// Audio is the audio data to transcribe.
	Audio []byte

	// Filename is the name of the audio file (used for format detection).
	Filename string

	// Model is the Whisper model to use (default: "whisper-1").
	Model string

	// Language is the language of the audio (ISO-639-1 code, e.g., "en").
	// Leave empty for automatic detection.
	Language string

	// Prompt is optional text to guide the model's style or continue a previous segment.
	Prompt string

	// ResponseFormat is the output format: "json", "text", "srt", "verbose_json", "vtt".
	ResponseFormat string

	// Temperature is the sampling temperature (0-1). Lower is more deterministic.
	Temperature float64

	// TimestampGranularities specifies timestamp detail: "word", "segment", or both.
	TimestampGranularities []string
}

TranscriptionRequest configures a Whisper transcription request.

type TranscriptionResponse ¶

type TranscriptionResponse struct {
	// Text is the transcribed text.
	Text string

	// Language is the detected language.
	Language string

	// Duration is the audio duration in seconds.
	Duration float64

	// Words contains word-level timestamps (if requested).
	Words []WordTimestamp

	// Segments contains segment-level timestamps (if requested).
	Segments []Segment
}

TranscriptionResponse contains the Whisper transcription result.

type WordTimestamp ¶

type WordTimestamp struct {
	Word  string  `json:"word"`
	Start float64 `json:"start"`
	End   float64 `json:"end"`
}

WordTimestamp represents a word with timing information.
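
Word-level timestamps are convenient for building captions. The sketch below groups words into fixed-size cues and renders them in SubRip (SRT) layout. It is one possible approach under stated assumptions: `WordTimestamp` is redeclared locally so the example runs on its own, `srtTime` and `wordsToSRT` are hypothetical helpers, and grouping by word count is a deliberately simple strategy.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// WordTimestamp is a local copy of the documented struct, declared here
// so the sketch is self-contained.
type WordTimestamp struct {
	Word  string
	Start float64
	End   float64
}

// srtTime formats a non-negative offset in seconds as HH:MM:SS,mmm.
func srtTime(sec float64) string {
	ms := int64(math.Round(sec * 1000))
	h := ms / 3600000
	m := ms % 3600000 / 60000
	s := ms % 60000 / 1000
	return fmt.Sprintf("%02d:%02d:%02d,%03d", h, m, s, ms%1000)
}

// wordsToSRT is a hypothetical helper (not part of this package).
// It emits one numbered cue for every perCue consecutive words.
func wordsToSRT(words []WordTimestamp, perCue int) string {
	if perCue <= 0 {
		perCue = 1
	}
	var b strings.Builder
	for i, cue := 0, 1; i < len(words); i, cue = i+perCue, cue+1 {
		end := i + perCue
		if end > len(words) {
			end = len(words)
		}
		group := words[i:end]
		texts := make([]string, len(group))
		for j, w := range group {
			texts[j] = w.Word
		}
		// Each cue spans from its first word's start to its last word's end.
		fmt.Fprintf(&b, "%d\n%s --> %s\n%s\n\n",
			cue, srtTime(group[0].Start), srtTime(group[len(group)-1].End),
			strings.Join(texts, " "))
	}
	return b.String()
}

func main() {
	words := []WordTimestamp{
		{Word: "Hello", Start: 0.0, End: 0.4},
		{Word: "world", Start: 0.5, End: 0.9},
		{Word: "again", Start: 1.0, End: 1.5},
	}
	fmt.Print(wordsToSRT(words, 2))
}
```

Real captions usually break on pauses or punctuation rather than a fixed word count; the gap between one word's End and the next word's Start is a natural signal for that.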

Source Files ¶

openai.go

Directories ¶

Path	Synopsis
omnivoice	Package omnivoice provides omnivoice STT and TTS provider implementations for OpenAI.
