Speech-to-text transcription using the Zerfoo OpenAI-compatible API server.
How it works
- Loads a Whisper GGUF model via inference.LoadFile
- Implements the serve.Transcriber interface to bridge the model to the API server
- Starts an in-process HTTP server using serve.NewServer with serve.WithTranscriber
- Sends the audio file to /v1/audio/transcriptions using a multipart form upload
- Prints the transcription result
This demonstrates embedding a full OpenAI-compatible transcription service inside a Go application. The same endpoint works with any OpenAI client library.
Prerequisites
Requires a Whisper-architecture GGUF model file.
Usage
# Build the example
go build -o audio-transcription ./examples/audio-transcription/
# Transcribe an audio file
./audio-transcription --model path/to/whisper.gguf --audio recording.wav
# With language hint
./audio-transcription --model path/to/whisper.gguf --audio recording.mp3 --language en
# With GPU
./audio-transcription --model path/to/whisper.gguf --device cuda --audio recording.wav
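The client side of the example, the multipart upload to /v1/audio/transcriptions, needs only the standard library. The sketch below substitutes an httptest server returning a canned OpenAI-style response for the real in-process Zerfoo server; the form field names ("file", "language") follow the OpenAI transcription API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

// buildTranscriptionRequest assembles the multipart form the endpoint
// expects: a "file" part with the audio bytes plus an optional
// "language" field.
func buildTranscriptionRequest(url string, audio []byte, language string) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("file", "recording.wav")
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}
	if language != "" {
		if err := w.WriteField("language", language); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	// Stand-in for the in-process server: a fake endpoint that returns
	// a canned {"text": ...} response in the OpenAI shape.
	srv := httptest.NewServer(http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		_ = r.ParseMultipartForm(10 << 20)
		_ = json.NewEncoder(rw).Encode(map[string]string{"text": "hello world"})
	}))
	defer srv.Close()

	req, err := buildTranscriptionRequest(srv.URL+"/v1/audio/transcriptions", []byte("fake-audio"), "en")
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Printf("%s", out) // prints: {"text":"hello world"}
}
```

Because the request is plain multipart/form-data against an OpenAI-shaped endpoint, the same call works against the real example server once it is running.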
Command audio-transcription demonstrates speech-to-text using the Zerfoo
OpenAI-compatible API server.
It starts an in-process API server with a Whisper model, then sends an audio
file to the /v1/audio/transcriptions endpoint -- the same API that OpenAI
clients use. This shows how to embed a full transcription service inside a
Go application.
Usage:
go build -o audio-transcription ./examples/audio-transcription/
./audio-transcription --model path/to/whisper.gguf --audio recording.wav