Documentation
Overview ¶
Package openai provides a Go client for OpenAI's audio APIs: speech-to-text (Whisper) and text-to-speech (TTS).
Index ¶
- Constants
- type Client
- func (c *Client) Synthesize(ctx context.Context, req TTSRequest) (*TTSResponse, error)
- func (c *Client) SynthesizeStream(ctx context.Context, req TTSRequest) (io.ReadCloser, error)
- func (c *Client) Transcribe(ctx context.Context, req TranscriptionRequest) (*TranscriptionResponse, error)
- func (c *Client) TranscribeFile(ctx context.Context, filePath string, req TranscriptionRequest) (*TranscriptionResponse, error)
- type Segment
- type TTSRequest
- type TTSResponse
- type TranscriptionRequest
- type TranscriptionResponse
- type WordTimestamp
Constants ¶
const (
	VoiceAlloy   = "alloy"
	VoiceAsh     = "ash"
	VoiceBallad  = "ballad"
	VoiceCoral   = "coral"
	VoiceEcho    = "echo"
	VoiceFable   = "fable"
	VoiceOnyx    = "onyx"
	VoiceNova    = "nova"
	VoiceSage    = "sage"
	VoiceShimmer = "shimmer"
	VoiceVerse   = "verse"
	VoiceMarin   = "marin"
	VoiceCedar   = "cedar"
)
Voice constants for TTS.
const (
	ModelWhisper1 = "whisper-1"
	ModelTTS1     = "tts-1"
	ModelTTS1HD   = "tts-1-hd"
)
Model constants.
Types ¶
type Client ¶
type Client struct {
	// contains filtered or unexported fields
}
Client wraps the OpenAI API client for audio operations.
func NewClientFromEnv ¶
NewClientFromEnv creates a new OpenAI client using the OPENAI_API_KEY environment variable.
func (*Client) Synthesize ¶
func (c *Client) Synthesize(ctx context.Context, req TTSRequest) (*TTSResponse, error)
Synthesize converts text to speech.
func (*Client) SynthesizeStream ¶
func (c *Client) SynthesizeStream(ctx context.Context, req TTSRequest) (io.ReadCloser, error)
SynthesizeStream converts text to speech with streaming output.
func (*Client) Transcribe ¶
func (c *Client) Transcribe(ctx context.Context, req TranscriptionRequest) (*TranscriptionResponse, error)
Transcribe converts audio to text using Whisper.
func (*Client) TranscribeFile ¶
func (c *Client) TranscribeFile(ctx context.Context, filePath string, req TranscriptionRequest) (*TranscriptionResponse, error)
TranscribeFile transcribes audio from a file path.
type Segment ¶
type Segment struct {
	ID               int64   `json:"id"`
	Seek             int64   `json:"seek"`
	Start            float64 `json:"start"`
	End              float64 `json:"end"`
	Text             string  `json:"text"`
	Temperature      float64 `json:"temperature"`
	AvgLogprob       float64 `json:"avg_logprob"`
	CompressionRatio float64 `json:"compression_ratio"`
	NoSpeechProb     float64 `json:"no_speech_prob"`
}
Segment represents a transcription segment.
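The NoSpeechProb field can be used to drop segments that are probably silence or noise. The sketch below shows one way to do that. It declares a trimmed local copy of Segment so that it compiles on its own; the helper speechSegments and the 0.5 threshold are illustrative assumptions, not values recommended by this package.

```go
package main

import "fmt"

// Segment is a trimmed local copy of the documented struct, holding only the
// fields this sketch needs.
type Segment struct {
	ID           int64
	Start        float64
	End          float64
	Text         string
	NoSpeechProb float64
}

// speechSegments is a hypothetical helper. It keeps the segments whose
// NoSpeechProb does not exceed maxNoSpeech.
func speechSegments(segs []Segment, maxNoSpeech float64) []Segment {
	var kept []Segment
	for _, s := range segs {
		if s.NoSpeechProb <= maxNoSpeech {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	segs := []Segment{
		{ID: 0, Start: 0.0, End: 2.5, Text: "Welcome back.", NoSpeechProb: 0.02},
		{ID: 1, Start: 2.5, End: 6.0, Text: "...", NoSpeechProb: 0.91},
		{ID: 2, Start: 6.0, End: 9.0, Text: "Let's begin.", NoSpeechProb: 0.05},
	}
	// The 0.5 cutoff is an arbitrary example value.
	for _, s := range speechSegments(segs, 0.5) {
		fmt.Printf("[%.1fs to %.1fs] %s\n", s.Start, s.End, s.Text)
	}
}
```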
type TTSRequest ¶
type TTSRequest struct {
	// Input is the text to convert to speech (max 4096 characters).
	Input string
	// Model is the TTS model: "tts-1" (faster) or "tts-1-hd" (higher quality).
	Model string
	// Voice is the voice to use: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse, marin, cedar.
	Voice string
	// ResponseFormat is the audio format: "mp3", "opus", "aac", "flac", "wav", "pcm".
	ResponseFormat string
	// Speed is the speech speed (0.25 to 4.0, default 1.0).
	Speed float64
}
TTSRequest configures a text-to-speech request.
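The limits in the field comments (4096 characters of input, speed between 0.25 and 4.0) can be checked before a request is sent. The sketch below is one way to do that. It uses a local copy of TTSRequest so that it compiles on its own. The helper validateTTS is hypothetical; whether the package performs similar checks itself is not stated here. Counting characters as runes and treating a zero Speed as "use the default" are both assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// TTSRequest is a local copy of the documented struct.
type TTSRequest struct {
	Input          string
	Model          string
	Voice          string
	ResponseFormat string
	Speed          float64
}

// validateTTS is a hypothetical pre-flight check based on the documented
// limits. A zero Speed is assumed to mean "use the default".
func validateTTS(req TTSRequest) error {
	if req.Input == "" {
		return errors.New("input is empty")
	}
	if n := utf8.RuneCountInString(req.Input); n > 4096 {
		return fmt.Errorf("input has %d characters, limit is 4096", n)
	}
	if req.Speed != 0 && (req.Speed < 0.25 || req.Speed > 4.0) {
		return fmt.Errorf("speed %.2f is outside 0.25 to 4.0", req.Speed)
	}
	return nil
}

func main() {
	req := TTSRequest{
		Input:          "Hello from the text-to-speech example.",
		Model:          "tts-1",
		Voice:          "alloy",
		ResponseFormat: "mp3",
		Speed:          1.0,
	}
	if err := validateTTS(req); err != nil {
		fmt.Println("invalid request:", err)
		return
	}
	fmt.Println("request looks valid")
}
```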
type TTSResponse ¶
type TTSResponse struct {
	// Audio is the generated audio data.
	Audio []byte
	// Format is the audio format.
	Format string
}
TTSResponse contains the generated audio.
type TranscriptionRequest ¶
type TranscriptionRequest struct {
	// Audio is the audio data to transcribe.
	Audio []byte
	// Filename is the name of the audio file (used for format detection).
	Filename string
	// Model is the Whisper model to use (default: "whisper-1").
	Model string
	// Language is the language of the audio (ISO-639-1 code, e.g., "en").
	// Leave empty for automatic detection.
	Language string
	// Prompt is optional text to guide the model's style or continue a previous segment.
	Prompt string
	// ResponseFormat is the output format: "json", "text", "srt", "verbose_json", "vtt".
	ResponseFormat string
	// Temperature is the sampling temperature (0-1). Lower is more deterministic.
	Temperature float64
	// TimestampGranularities specifies timestamp detail: "word", "segment", or both.
	TimestampGranularities []string
}
TranscriptionRequest configures a Whisper transcription request.
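The sketch below builds a request for word-level and segment-level timestamps and checks it against the documented constraints. It uses a local copy of TranscriptionRequest so that it compiles on its own, and the helper checkTranscription is hypothetical. The OpenAI transcription endpoint only returns timestamp granularities with the "verbose_json" response format; whether this package sets that format automatically is not stated here, so the sketch sets it explicitly.

```go
package main

import (
	"errors"
	"fmt"
)

// TranscriptionRequest is a local copy of the documented struct.
type TranscriptionRequest struct {
	Audio                  []byte
	Filename               string
	Model                  string
	Language               string
	Prompt                 string
	ResponseFormat         string
	Temperature            float64
	TimestampGranularities []string
}

// checkTranscription is a hypothetical pre-flight check, not part of the
// package. It applies the ranges and values listed in the field comments.
func checkTranscription(req TranscriptionRequest) error {
	if len(req.Audio) == 0 {
		return errors.New("audio is empty")
	}
	if req.Temperature < 0 || req.Temperature > 1 {
		return fmt.Errorf("temperature %.2f is outside 0 to 1", req.Temperature)
	}
	for _, g := range req.TimestampGranularities {
		if g != "word" && g != "segment" {
			return fmt.Errorf("unknown timestamp granularity %q", g)
		}
	}
	if len(req.TimestampGranularities) > 0 && req.ResponseFormat != "verbose_json" {
		return errors.New(`timestamp granularities need ResponseFormat "verbose_json"`)
	}
	return nil
}

func main() {
	req := TranscriptionRequest{
		Audio:                  []byte("placeholder audio bytes"), // real audio in practice
		Filename:               "meeting.mp3",                     // example name
		Model:                  "whisper-1",
		Language:               "en",
		ResponseFormat:         "verbose_json",
		TimestampGranularities: []string{"word", "segment"},
	}
	if err := checkTranscription(req); err != nil {
		fmt.Println("invalid request:", err)
		return
	}
	fmt.Println("request looks valid")
}
```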
type TranscriptionResponse ¶
type TranscriptionResponse struct {
	// Text is the transcribed text.
	Text string
	// Language is the detected language.
	Language string
	// Duration is the audio duration in seconds.
	Duration float64
	// Words contains word-level timestamps (if requested).
	Words []WordTimestamp
	// Segments contains segment-level timestamps (if requested).
	Segments []Segment
}
TranscriptionResponse contains the Whisper transcription result.
type WordTimestamp ¶
type WordTimestamp struct {
	Word  string  `json:"word"`
	Start float64 `json:"start"`
	End   float64 `json:"end"`
}
WordTimestamp represents a word with timing information.