Documentation ¶
Overview ¶
Speech recognition with Next-gen Kaldi.
sherpa-ncnn is an open-source speech recognition framework for Next-gen Kaldi. Its only dependency is ncnn, and it supports both streaming and non-streaming speech recognition.
Recognition runs entirely on the local machine and does not need network access.
It supports a variety of platforms, such as Linux (x86_64, aarch64, arm), Windows (x86_64, x86), macOS (x86_64, arm64), RISC-V, etc.
Usage examples:
Real-time speech recognition from a microphone
Decode a file
Please see https://github.com/k2-fsa/sherpa-ncnn/tree/master/go-api-examples/decode-file
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DeleteRecognizer ¶
func DeleteRecognizer(recognizer *Recognizer)
Free the internal pointer inside the recognizer to avoid a memory leak.
func DeleteStream ¶
func DeleteStream(stream *Stream)
Free the internal pointer inside the stream to avoid a memory leak.
Types ¶
type DecoderConfig ¶
type DecoderConfig struct {
	// Decoding method. Supported values are:
	// greedy_search, modified_beam_search
	DecodingMethod string

	// Number of active paths for modified_beam_search.
	// It is ignored when decoding_method is greedy_search.
	NumActivePaths int
}
Configuration for the decoder (greedy search or modified beam search).
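DecodingMethod is a plain string, so a typo is easy to miss. The following is a minimal sketch of a pre-flight check. The decoderConfig type is a local mirror of DecoderConfig so that the example compiles without the cgo-backed package, and validateDecoderConfig is a hypothetical helper that is not part of sherpa-ncnn. The requirement that NumActivePaths be at least 1 is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// decoderConfig is a local stand-in that mirrors the fields of DecoderConfig.
type decoderConfig struct {
	DecodingMethod string
	NumActivePaths int
}

// validateDecoderConfig is a hypothetical helper, not part of sherpa-ncnn.
// It accepts only the two decoding methods listed in the documentation.
func validateDecoderConfig(c decoderConfig) error {
	switch c.DecodingMethod {
	case "greedy_search":
		// NumActivePaths is ignored for greedy search, so it is not checked.
		return nil
	case "modified_beam_search":
		// Assumption: beam search needs at least one active path.
		if c.NumActivePaths < 1 {
			return errors.New("NumActivePaths must be at least 1 for modified_beam_search")
		}
		return nil
	default:
		return fmt.Errorf("unsupported DecodingMethod %q", c.DecodingMethod)
	}
}

func main() {
	cfg := decoderConfig{DecodingMethod: "modified_beam_search", NumActivePaths: 4}
	if err := validateDecoderConfig(cfg); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config looks valid")
}
```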
type FeatureConfig ¶
type FeatureConfig struct {
	// Sample rate expected by the model. It is 16000 for all
	// pre-trained models provided by us
	SampleRate int

	// Feature dimension expected by the model. It is 80 for all
	// pre-trained models provided by us
	FeatureDim int
}
Configuration for the feature extractor.
type ModelConfig ¶
type ModelConfig struct {
	EncoderParam string // Path to the encoder.ncnn.param
	EncoderBin   string // Path to the encoder.ncnn.bin
	DecoderParam string // Path to the decoder.ncnn.param
	DecoderBin   string // Path to the decoder.ncnn.bin
	JoinerParam  string // Path to the joiner.ncnn.param
	JoinerBin    string // Path to the joiner.ncnn.bin
	Tokens       string // Path to tokens.txt
	NumThreads   int    // Number of threads to use for neural network computation
}
Pre-trained models can be downloaded from https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/
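ModelConfig takes seven file paths, so it can help to confirm that every file exists before building the recognizer. The sketch below uses only the standard library. missingFiles is a hypothetical helper, and the paths in main are placeholders, not the names of any particular pre-trained model.

```go
package main

import (
	"fmt"
	"os"
)

// missingFiles returns every path that os.Stat cannot resolve.
// It is a hypothetical helper, not part of sherpa-ncnn.
func missingFiles(paths ...string) []string {
	var missing []string
	for _, p := range paths {
		if _, err := os.Stat(p); err != nil {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	// Placeholder paths: substitute the files of the model you downloaded.
	paths := []string{
		"./model/encoder.ncnn.param",
		"./model/encoder.ncnn.bin",
		"./model/decoder.ncnn.param",
		"./model/decoder.ncnn.bin",
		"./model/joiner.ncnn.param",
		"./model/joiner.ncnn.bin",
		"./model/tokens.txt",
	}
	if m := missingFiles(paths...); len(m) > 0 {
		fmt.Println("missing model files:", m)
		return
	}
	fmt.Println("all model files found")
}
```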
type Recognizer ¶
type Recognizer struct {
// contains filtered or unexported fields
}
The online recognizer class. It wraps a pointer from C.
func NewRecognizer ¶
func NewRecognizer(config *RecognizerConfig) *Recognizer
Create a new recognizer. The caller is responsible for calling DeleteRecognizer() on the returned recognizer to avoid a memory leak.
func (*Recognizer) Decode ¶
func (recognizer *Recognizer) Decode(s *Stream)
Decode the stream. Before calling this function, you must ensure that recognizer.IsReady(s) returns true. Do not call Decode on a stream that is not ready.
You usually use it like below:
for recognizer.IsReady(s) {
	recognizer.Decode(s)
}
func (*Recognizer) GetResult ¶
func (recognizer *Recognizer) GetResult(s *Stream) *RecognizerResult
Return the current recognition result of the stream, accumulated since the last call to Reset().
func (*Recognizer) IsEndpoint ¶
func (recognizer *Recognizer) IsEndpoint(s *Stream) bool
Return true if an endpoint is detected.
You usually use it like below:
if recognizer.IsEndpoint(s) {
	// do your own stuff after detecting an endpoint
	recognizer.Reset(s)
}
func (*Recognizer) IsReady ¶
func (recognizer *Recognizer) IsReady(s *Stream) bool
Check whether the stream has enough feature frames for decoding. Return true if this stream is ready for decoding. Return false otherwise.
You will usually use it like below:
for recognizer.IsReady(s) {
	recognizer.Decode(s)
}
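The loop above drains the stream: it keeps decoding until too few feature frames remain. The sketch below reproduces that control flow with stand-in types so that it runs without the native library. fakeRecognizer, fakeStream, and the chunk size of 8 frames are all made up for illustration; the number of frames the real recognizer consumes per Decode call depends on the model.

```go
package main

import "fmt"

// fakeStream is a stand-in for Stream. It only counts buffered feature frames.
type fakeStream struct {
	frames  int // feature frames waiting to be decoded
	decoded int // number of Decode calls made so far
}

// fakeRecognizer is a stand-in for Recognizer. chunk is a made-up number of
// frames consumed by each Decode call.
type fakeRecognizer struct {
	chunk int
}

// IsReady reports whether the stream holds at least one full chunk.
func (r *fakeRecognizer) IsReady(s *fakeStream) bool {
	return s.frames >= r.chunk
}

// Decode consumes one chunk. It must only be called when IsReady is true.
func (r *fakeRecognizer) Decode(s *fakeStream) {
	s.frames -= r.chunk
	s.decoded++
}

// drain decodes until the stream no longer holds a full chunk.
func drain(r *fakeRecognizer, s *fakeStream) {
	for r.IsReady(s) {
		r.Decode(s)
	}
}

func main() {
	r := &fakeRecognizer{chunk: 8}
	s := &fakeStream{frames: 20}
	drain(r, s)
	fmt.Println(s.decoded, s.frames) // prints "2 4"
}
```

In this stand-in, the four leftover frames stay buffered until more audio arrives, which is why the loop is normally run again after each new batch of samples.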
func (*Recognizer) Reset ¶
func (recognizer *Recognizer) Reset(s *Stream)
Reset the stream. After calling this function, the internal neural network model states are reset, IsEndpoint(s) returns false, and GetResult(s) returns a result whose Text is empty.
type RecognizerConfig ¶
type RecognizerConfig struct {
	Feat    FeatureConfig
	Model   ModelConfig
	Decoder DecoderConfig

	EnableEndpoint int // 1 to enable endpoint detection.

	// Please see
	// https://k2-fsa.github.io/sherpa/ncnn/endpoint.html
	// for the meaning of Rule1MinTrailingSilence, Rule2MinTrailingSilence
	// and Rule3MinUtteranceLength.
	Rule1MinTrailingSilence float32
	Rule2MinTrailingSilence float32
	Rule3MinUtteranceLength float32

	HotwordsFile  string
	HotwordsScore float32
}
Configuration for the online/streaming recognizer.
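To make the three endpoint rules more concrete, here is a sketch of how they might combine. It rests on an assumed reading of the endpoint page linked above: rule 1 fires after enough trailing silence even if nothing has been decoded, rule 2 fires after a shorter trailing silence once something has been decoded, and rule 3 fires when the utterance is long enough regardless of silence. endpointRules, isEndpoint, and the values 2.4, 1.2, and 20 seconds are illustrative and are not taken from this package.

```go
package main

import "fmt"

// endpointRules is a local stand-in for the three rule fields of
// RecognizerConfig. All values are in seconds.
type endpointRules struct {
	Rule1MinTrailingSilence float32
	Rule2MinTrailingSilence float32
	Rule3MinUtteranceLength float32
}

// isEndpoint sketches the assumed rule semantics. It is a hypothetical
// helper; the real decision is made inside the native library.
func isEndpoint(r endpointRules, trailingSilence, utteranceLength float32, decodedSomething bool) bool {
	// Rule 1: long trailing silence, even if nothing was decoded.
	if trailingSilence >= r.Rule1MinTrailingSilence {
		return true
	}
	// Rule 2: shorter trailing silence, but only after something was decoded.
	if decodedSomething && trailingSilence >= r.Rule2MinTrailingSilence {
		return true
	}
	// Rule 3: the utterance is long enough, regardless of silence.
	return utteranceLength >= r.Rule3MinUtteranceLength
}

func main() {
	// Illustrative values only.
	rules := endpointRules{
		Rule1MinTrailingSilence: 2.4,
		Rule2MinTrailingSilence: 1.2,
		Rule3MinUtteranceLength: 20,
	}
	fmt.Println(isEndpoint(rules, 1.5, 3, false)) // false: silence too short for rule 1
	fmt.Println(isEndpoint(rules, 1.5, 3, true))  // true: rule 2 applies
}
```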
type RecognizerResult ¶
type RecognizerResult struct {
Text string
}
It contains the recognition result for an online stream.
type Stream ¶
type Stream struct {
// contains filtered or unexported fields
}
The online stream class. It wraps a pointer from C.
func NewStream ¶
func NewStream(recognizer *Recognizer) *Stream
Create a new stream for the given recognizer. The caller is responsible for calling DeleteStream() on the returned stream to avoid a memory leak.
func (*Stream) AcceptWaveform ¶
func (s *Stream) AcceptWaveform(sampleRate int, samples []float32)
Feed audio samples into the stream.
sampleRate is the actual sample rate of the input audio samples. If it differs from the sample rate expected by the feature extractor, the samples are resampled internally.
samples contains the audio samples. Each sample must be in the range [-1, 1].
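Audio from a microphone or a WAV file often arrives as signed 16-bit PCM, which has to be scaled into the range above before it is passed on. The following is a minimal sketch of that conversion. pcm16ToFloat32 is a hypothetical helper, not part of this package, and dividing by 32768 is one common scaling convention.

```go
package main

import "fmt"

// pcm16ToFloat32 scales signed 16-bit PCM samples to float32 values.
// Dividing by 32768 maps the int16 range [-32768, 32767] into [-1, 1).
func pcm16ToFloat32(pcm []int16) []float32 {
	out := make([]float32, len(pcm))
	for i, v := range pcm {
		out[i] = float32(v) / 32768.0
	}
	return out
}

func main() {
	samples := pcm16ToFloat32([]int16{0, 16384, -32768})
	fmt.Println(samples) // prints "[0 0.5 -1]"
}
```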
func (*Stream) InputFinished ¶
func (s *Stream) InputFinished()
Signal that no more audio samples will arrive. After calling this function, you must not call Stream.AcceptWaveform again.
The main purpose of this function is to flush the remaining audio samples buffered inside for feature extraction.