Documentation ¶
Overview ¶
This package provides the higher-level speech-to-text API for whisper.cpp in Go.
Constants ¶
const SampleBits = whisper.SampleBits
SampleBits is the number of bits per sample.
const SampleRate = whisper.SampleRate
SampleRate is the sample rate of the audio data.
Variables ¶
var (
	ErrUnableToLoadModel    = errors.New("unable to load model")
	ErrInternalAppError     = errors.New("internal application error")
	ErrProcessingFailed     = errors.New("processing failed")
	ErrUnsupportedLanguage  = errors.New("unsupported language")
	ErrModelNotMultilingual = errors.New("model is not multilingual")
)
Functions ¶
This section is empty.
Types ¶
type Context ¶
type Context interface {
	SetLanguage(string) error // Set the language to use for speech recognition, use "auto" for auto detect language.
	SetTranslate(bool) // Set translate flag
	IsMultilingual() bool // Return true if the model is multilingual.
	Language() string // Get language
	DetectedLanguage() string // Get detected language

	SetOffset(time.Duration) // Set offset
	SetDuration(time.Duration) // Set duration
	SetThreads(uint) // Set number of threads to use
	SetSplitOnWord(bool) // Set split on word flag
	SetTokenThreshold(float32) // Set timestamp token probability threshold
	SetTokenSumThreshold(float32) // Set timestamp token sum probability threshold
	SetMaxSegmentLength(uint) // Set max segment length in characters
	SetTokenTimestamps(bool) // Set token timestamps flag
	SetMaxTokensPerSegment(uint) // Set max tokens per segment (0 = no limit)
	SetAudioCtx(uint) // Set audio encoder context
	SetMaxContext(n int) // Set maximum number of text context tokens to store
	SetBeamSize(n int) // Set Beam Size
	SetEntropyThold(t float32) // Set Entropy threshold
	SetInitialPrompt(prompt string) // Set initial prompt
	SetTemperature(t float32) // Set temperature
	SetTemperatureFallback(t float32) // Set temperature incrementation

	// Process mono audio data and return any errors.
	// If defined, newly generated segments are passed to the
	// callback function during processing.
	Process([]float32, EncoderBeginCallback, SegmentCallback, ProgressCallback) error

	// After process is called, return segments until the end of the stream
	// is reached, when io.EOF is returned.
	NextSegment() (Segment, error)

	IsBEG(Token) bool // Test for "begin" token
	IsSOT(Token) bool // Test for "start of transcription" token
	IsEOT(Token) bool // Test for "end of transcription" token
	IsPREV(Token) bool // Test for "start of prev" token
	IsSOLM(Token) bool // Test for "start of lm" token
	IsNOT(Token) bool // Test for "No timestamps" token
	IsLANG(Token, string) bool // Test for token associated with a specific language
	IsText(Token) bool // Test for text token

	// Timings
	PrintTimings()
	ResetTimings()
	SystemInfo() string
}
Context is the speech recognition context.
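The interface comments describe a pull-style loop: after Process returns, NextSegment is called repeatedly until it returns io.EOF. A minimal sketch of that loop follows. To stay runnable without the library it uses a local Segment type (without Tokens), a hypothetical segmentSource interface covering only NextSegment, and a hypothetical in-memory fakeContext; with the real package, a Context value would take the place of the fake.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"time"
)

// Segment mirrors the documented Segment type, minus Tokens.
type Segment struct {
	Num        int
	Start, End time.Duration
	Text       string
}

// segmentSource is the subset of Context this sketch relies on.
type segmentSource interface {
	NextSegment() (Segment, error)
}

// fakeContext is a hypothetical in-memory stand-in for a processed Context.
type fakeContext struct {
	segments []Segment
	next     int
}

// NextSegment returns the stored segments in order, then io.EOF.
func (f *fakeContext) NextSegment() (Segment, error) {
	if f.next >= len(f.segments) {
		return Segment{}, io.EOF
	}
	s := f.segments[f.next]
	f.next++
	return s, nil
}

// collect drains src until io.EOF and returns everything it produced.
func collect(src segmentSource) ([]Segment, error) {
	var out []Segment
	for {
		seg, err := src.NextSegment()
		if errors.Is(err, io.EOF) {
			return out, nil // end of stream is not a failure
		}
		if err != nil {
			return out, err
		}
		out = append(out, seg)
	}
}

func main() {
	src := &fakeContext{segments: []Segment{
		{Num: 0, Start: 0, End: 2 * time.Second, Text: "hello"},
		{Num: 1, Start: 2 * time.Second, End: 4 * time.Second, Text: "world"},
	}}
	segs, err := collect(src)
	fmt.Println(len(segs), err)
}
```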
type EncoderBeginCallback ¶
type EncoderBeginCallback func() bool
EncoderBeginCallback is the callback function that decides whether processing should continue. It is called during Process, and its boolean result indicates whether to carry on.
type Model ¶
type Model interface {
	io.Closer

	// Return a new speech-to-text context.
	NewContext() (Context, error)

	// Return true if the model is multilingual.
	IsMultilingual() bool

	// Return all languages supported.
	Languages() []string
}
Model is the interface to a whisper model. Create a new model with the function whisper.New(string).
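Since Model exposes Languages and embeds io.Closer, a caller can check for a language before creating a context and release the model afterwards. The sketch below shows that pattern against a hypothetical fakeModel; languageLister and supports are illustrative names, and NewContext is left out so the example needs nothing beyond the standard library.

```go
package main

import (
	"fmt"
	"io"
)

// languageLister is the part of the Model interface this sketch needs.
type languageLister interface {
	io.Closer
	IsMultilingual() bool
	Languages() []string
}

// fakeModel is a hypothetical stand-in for a loaded model.
type fakeModel struct {
	langs  []string
	closed bool
}

func (m *fakeModel) Close() error         { m.closed = true; return nil }
func (m *fakeModel) IsMultilingual() bool { return len(m.langs) > 1 }
func (m *fakeModel) Languages() []string  { return m.langs }

// supports reports whether lang appears in m.Languages().
func supports(m languageLister, lang string) bool {
	for _, l := range m.Languages() {
		if l == lang {
			return true
		}
	}
	return false
}

func main() {
	m := &fakeModel{langs: []string{"en", "de"}}
	defer m.Close() // release the model when done

	fmt.Println(m.IsMultilingual(), supports(m, "de"), supports(m, "xx"))
}
```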
type ProgressCallback ¶
type ProgressCallback func(int)
ProgressCallback is the callback function for reporting progress during processing. It is called during Process.
type Segment ¶
type Segment struct {
	// Segment Number
	Num int

	// Time beginning and end timestamps for the segment.
	Start, End time.Duration

	// The text of the segment.
	Text string

	// The tokens of the segment.
	Tokens []Token
}
Segment is the text result of speech recognition.
type SegmentCallback ¶
type SegmentCallback func(Segment)
SegmentCallback is the callback function for processing segments in real time. It is called during Process.