openai

package module
v0.1.1
Published: Mar 22, 2026 License: MIT Imports: 9 Imported by: 0

README

OmniVoice OpenAI Provider


OpenAI audio provider for the OmniVoice voice pipeline framework.

Features

  • STT (Speech-to-Text): Whisper transcription with word and segment timestamps
  • TTS (Text-to-Speech): OpenAI audio synthesis with multiple voices
  • OmniVoice Integration: Implements stt.Provider and tts.Provider interfaces

Installation

go get github.com/plexusone/omnivoice-openai

Usage

Direct Client Usage
package main

import (
    "context"
    "log"
    "os"

    "github.com/plexusone/omnivoice-openai"
)

func main() {
    // Create client from the OPENAI_API_KEY environment variable
    client, err := openai.NewClientFromEnv()
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()

    // Load the audio to transcribe
    audioData, err := os.ReadFile("audio.mp3")
    if err != nil {
        log.Fatal(err)
    }

    // Transcribe audio
    resp, err := client.Transcribe(ctx, openai.TranscriptionRequest{
        Audio:    audioData,
        Filename: "audio.mp3",
    })
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("Transcription: %s", resp.Text)

    // Synthesize speech
    ttsResp, err := client.Synthesize(ctx, openai.TTSRequest{
        Input: "Hello, world!",
        Voice: openai.VoiceAlloy,
    })
    if err != nil {
        log.Fatal(err)
    }

    // ttsResp.Audio contains the MP3 audio data
    if err := os.WriteFile("output.mp3", ttsResp.Audio, 0o644); err != nil {
        log.Fatal(err)
    }
}
OmniVoice Provider Usage
package main

import (
    "context"
    "log"
    "os"

    openaistt "github.com/plexusone/omnivoice-openai/omnivoice/stt"
    openaitts "github.com/plexusone/omnivoice-openai/omnivoice/tts"
)

func main() {
    ctx := context.Background()

    audioData, err := os.ReadFile("audio.mp3")
    if err != nil {
        log.Fatal(err)
    }

    // Create STT provider
    sttProvider := openaistt.NewProvider()
    transcription, err := sttProvider.Transcribe(ctx, audioData)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("Transcription: %v", transcription)

    // Create TTS provider
    ttsProvider := openaitts.NewProvider()
    audio, err := ttsProvider.Synthesize(ctx, "Hello, world!")
    if err != nil {
        log.Fatal(err)
    }
    _ = audio // synthesized audio bytes
}

Configuration

Set the OPENAI_API_KEY environment variable or pass the API key directly:

client := openai.NewClient("your-api-key")

Available Voices

Voice    Description
-------  ----------------------
alloy    Neutral, balanced
ash      Warm, engaging
ballad   Melodic, expressive
coral    Clear, articulate
echo     Smooth, natural
fable    Storytelling, dramatic
nova     Bright, energetic
onyx     Deep, resonant
sage     Calm, wise
shimmer  Light, cheerful
verse    Poetic, rhythmic
marin    Friendly, approachable
cedar    Grounded, trustworthy

License

MIT License - see LICENSE for details.

Documentation

Overview

Package openai provides a Go client for OpenAI's audio APIs (Whisper STT and TTS).

Index

Constants

View Source
const (
	VoiceAlloy   = "alloy"
	VoiceAsh     = "ash"
	VoiceBallad  = "ballad"
	VoiceCoral   = "coral"
	VoiceEcho    = "echo"
	VoiceFable   = "fable"
	VoiceOnyx    = "onyx"
	VoiceNova    = "nova"
	VoiceSage    = "sage"
	VoiceShimmer = "shimmer"
	VoiceVerse   = "verse"
	VoiceMarin   = "marin"
	VoiceCedar   = "cedar"
)

Voice constants for TTS.

View Source
const (
	ModelWhisper1 = "whisper-1"
	ModelTTS1     = "tts-1"
	ModelTTS1HD   = "tts-1-hd"
)

Model constants.

Variables

This section is empty.

Functions

This section is empty.

Types

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client wraps the OpenAI API client for audio operations.

func NewClient

func NewClient(apiKey string) *Client

NewClient creates a new OpenAI client with the given API key.

func NewClientFromEnv

func NewClientFromEnv() (*Client, error)

NewClientFromEnv creates a new OpenAI client using the OPENAI_API_KEY environment variable.

func (*Client) Synthesize

func (c *Client) Synthesize(ctx context.Context, req TTSRequest) (*TTSResponse, error)

Synthesize converts text to speech.

func (*Client) SynthesizeStream

func (c *Client) SynthesizeStream(ctx context.Context, req TTSRequest) (io.ReadCloser, error)

SynthesizeStream converts text to speech with streaming output.

func (*Client) Transcribe

func (c *Client) Transcribe(ctx context.Context, req TranscriptionRequest) (*TranscriptionResponse, error)

Transcribe converts audio to text using Whisper.

func (*Client) TranscribeFile

func (c *Client) TranscribeFile(ctx context.Context, filePath string, req TranscriptionRequest) (*TranscriptionResponse, error)

TranscribeFile transcribes audio from a file path.

type Segment

type Segment struct {
	ID               int64   `json:"id"`
	Seek             int64   `json:"seek"`
	Start            float64 `json:"start"`
	End              float64 `json:"end"`
	Text             string  `json:"text"`
	Temperature      float64 `json:"temperature"`
	AvgLogprob       float64 `json:"avg_logprob"`
	CompressionRatio float64 `json:"compression_ratio"`
	NoSpeechProb     float64 `json:"no_speech_prob"`
}

Segment represents a transcription segment.

type TTSRequest

type TTSRequest struct {
	// Input is the text to convert to speech (max 4096 characters).
	Input string

	// Model is the TTS model: "tts-1" (faster) or "tts-1-hd" (higher quality).
	Model string

	// Voice is the voice to use: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse, marin, cedar.
	Voice string

	// ResponseFormat is the audio format: "mp3", "opus", "aac", "flac", "wav", "pcm".
	ResponseFormat string

	// Speed is the speech speed (0.25 to 4.0, default 1.0).
	Speed float64
}

TTSRequest configures a text-to-speech request.

type TTSResponse

type TTSResponse struct {
	// Audio is the generated audio data.
	Audio []byte

	// Format is the audio format.
	Format string
}

TTSResponse contains the generated audio.

type TranscriptionRequest

type TranscriptionRequest struct {
	// Audio is the audio data to transcribe.
	Audio []byte

	// Filename is the name of the audio file (used for format detection).
	Filename string

	// Model is the Whisper model to use (default: "whisper-1").
	Model string

	// Language is the language of the audio (ISO-639-1 code, e.g., "en").
	// Leave empty for automatic detection.
	Language string

	// Prompt is optional text to guide the model's style or continue a previous segment.
	Prompt string

	// ResponseFormat is the output format: "json", "text", "srt", "verbose_json", "vtt".
	ResponseFormat string

	// Temperature is the sampling temperature (0-1). Lower is more deterministic.
	Temperature float64

	// TimestampGranularities specifies timestamp detail: "word", "segment", or both.
	TimestampGranularities []string
}

TranscriptionRequest configures a Whisper transcription request.

type TranscriptionResponse

type TranscriptionResponse struct {
	// Text is the transcribed text.
	Text string

	// Language is the detected language.
	Language string

	// Duration is the audio duration in seconds.
	Duration float64

	// Words contains word-level timestamps (if requested).
	Words []WordTimestamp

	// Segments contains segment-level timestamps (if requested).
	Segments []Segment
}

TranscriptionResponse contains the Whisper transcription result.

type WordTimestamp

type WordTimestamp struct {
	Word  string  `json:"word"`
	Start float64 `json:"start"`
	End   float64 `json:"end"`
}

WordTimestamp represents a word with timing information.

Directories

Path Synopsis
Package omnivoice provides omnivoice STT and TTS provider implementations for OpenAI.
