package sherpa_ncnn

Version: v2.1.11+incompatible (latest) | Published: Apr 15, 2025 | License: Apache-2.0 | Imports: 2 | Imported by: 0

Repository

github.com/k2-fsa/sherpa-ncnn

Documentation ¶

Overview ¶

Speech recognition with Next-gen Kaldi.

sherpa-ncnn is an open-source speech recognition framework for Next-gen Kaldi. It depends only on ncnn and supports both streaming and non-streaming speech recognition.

Recognition runs entirely locally and does not need network access.

It supports a variety of platforms, such as Linux (x86_64, aarch64, arm), Windows (x86_64, x86), macOS (x86_64, arm64), RISC-V, etc.

Usage examples:

- Real-time speech recognition from a microphone: see https://github.com/k2-fsa/sherpa-ncnn/tree/master/go-api-examples/real-time-speech-recognition-from-microphone
- Decode a file: see https://github.com/k2-fsa/sherpa-ncnn/tree/master/go-api-examples/decode-file

Index ¶

func DeleteRecognizer(recognizer *Recognizer)
func DeleteStream(stream *Stream)
type DecoderConfig
type FeatureConfig
type ModelConfig
type Recognizer
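- func NewRecognizer(config *RecognizerConfig) *Recognizer
- func (recognizer *Recognizer) Decode(s *Stream)
- func (recognizer *Recognizer) GetResult(s *Stream) *RecognizerResult
- func (recognizer *Recognizer) IsEndpoint(s *Stream) bool
- func (recognizer *Recognizer) IsReady(s *Stream) bool
- func (recognizer *Recognizer) Reset(s *Stream)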
type RecognizerConfig
type RecognizerResult
type Stream
- func NewStream(recognizer *Recognizer) *Stream
- func (s *Stream) AcceptWaveform(sampleRate int, samples []float32)
- func (s *Stream) InputFinished()

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func DeleteRecognizer ¶

func DeleteRecognizer(recognizer *Recognizer)

Free the internal pointer inside the recognizer to avoid a memory leak.

func DeleteStream ¶

func DeleteStream(stream *Stream)

Free the internal pointer inside the stream to avoid a memory leak.

Types ¶

type DecoderConfig ¶

type DecoderConfig struct {
	// Decoding method. Supported values are:
	// greedy_search, modified_beam_search
	DecodingMethod string

	// Number of active paths for modified_beam_search.
	// It is ignored when DecodingMethod is greedy_search.
	NumActivePaths int
}

Configuration for the decoder (greedy search or modified beam search).
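
The two fields can be sanity-checked before they are handed to the recognizer. The sketch below is not part of this package: checkDecoder is a hypothetical helper, and the requirement that NumActivePaths be at least 1 for modified_beam_search is an assumption made for illustration, not something the package documents.

```go
package main

import (
	"errors"
	"fmt"
)

// checkDecoder is a hypothetical helper (not part of sherpa_ncnn). It checks
// the values intended for DecoderConfig.DecodingMethod and
// DecoderConfig.NumActivePaths.
func checkDecoder(method string, numActivePaths int) error {
	switch method {
	case "greedy_search":
		// NumActivePaths is ignored for greedy search, so there is nothing to check.
		return nil
	case "modified_beam_search":
		// Assumption: at least one active path is needed for a beam search.
		if numActivePaths < 1 {
			return errors.New("modified_beam_search needs NumActivePaths >= 1")
		}
		return nil
	default:
		return fmt.Errorf("unsupported decoding method %q", method)
	}
}

func main() {
	fmt.Println(checkDecoder("greedy_search", 0))
	fmt.Println(checkDecoder("modified_beam_search", 4))
	fmt.Println(checkDecoder("beam_search", 4))
}
```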

type FeatureConfig ¶

type FeatureConfig struct {
	// Sample rate expected by the model. It is 16000 for all
	// pre-trained models provided by us
	SampleRate int
	// Feature dimension expected by the model. It is 80 for all
	// pre-trained models provided by us
	FeatureDim int
}

Configuration for the feature extractor

type ModelConfig ¶

type ModelConfig struct {
	EncoderParam string // Path to the encoder.ncnn.param
	EncoderBin   string // Path to the encoder.ncnn.bin
	DecoderParam string // Path to the decoder.ncnn.param
	DecoderBin   string // Path to the decoder.ncnn.bin
	JoinerParam  string // Path to the joiner.ncnn.param
	JoinerBin    string // Path to the joiner.ncnn.bin
	Tokens       string // Path to tokens.txt
	NumThreads   int    // Number of threads to use for neural network computation
}

Please refer to https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/ to download pre-trained models
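
Since ModelConfig only carries file paths, it can be convenient to check that the files exist before creating a recognizer. The following is a minimal sketch using only the standard library; missingFiles is a hypothetical helper, and the paths in main are placeholders, since the actual file names depend on the model you download.

```go
package main

import (
	"fmt"
	"os"
)

// missingFiles is a hypothetical helper (not part of sherpa_ncnn). It returns
// the entries of paths that do not exist or are not regular files.
func missingFiles(paths ...string) []string {
	var missing []string
	for _, p := range paths {
		info, err := os.Stat(p)
		if err != nil || !info.Mode().IsRegular() {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	// Placeholder paths; substitute the files of the model you downloaded.
	paths := []string{
		"./model/encoder.ncnn.param",
		"./model/encoder.ncnn.bin",
		"./model/decoder.ncnn.param",
		"./model/decoder.ncnn.bin",
		"./model/joiner.ncnn.param",
		"./model/joiner.ncnn.bin",
		"./model/tokens.txt",
	}
	if m := missingFiles(paths...); len(m) > 0 {
		fmt.Println("missing model files:", m)
		return
	}
	fmt.Println("all model files found")
}
```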

type Recognizer ¶

type Recognizer struct {
	// contains filtered or unexported fields
}

The online (streaming) recognizer. It wraps a pointer from C.

func NewRecognizer ¶

func NewRecognizer(config *RecognizerConfig) *Recognizer

Create a recognizer from the given configuration. The caller is responsible for invoking DeleteRecognizer() to free the returned recognizer and avoid a memory leak.

func (*Recognizer) Decode ¶

func (recognizer *Recognizer) Decode(s *Stream)

Decode the stream. Call this function only when recognizer.IsReady(s) returns true; do not call it on a stream that is not ready.

You usually use it like below:

for recognizer.IsReady(s) {
	recognizer.Decode(s)
}
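
The loop above can be wrapped in a small helper and exercised without a model. This is only a sketch: readyDecoder, drain, fakeStream and fakeRecognizer are hypothetical names introduced here. The interface lists just the two documented method signatures, so a *Recognizer would satisfy readyDecoder[*Stream]; the fake types stand in for the real ones so the example runs on its own.

```go
package main

import "fmt"

// readyDecoder lists the two methods the loop needs. With S set to *Stream,
// the documented IsReady and Decode methods of *Recognizer match it.
type readyDecoder[S any] interface {
	IsReady(s S) bool
	Decode(s S)
}

// drain calls Decode for as long as the stream is ready and reports how many
// times it did so.
func drain[S any](r readyDecoder[S], s S) int {
	n := 0
	for r.IsReady(s) {
		r.Decode(s)
		n++
	}
	return n
}

// fakeStream and fakeRecognizer are stand-ins so the sketch runs without a
// model: each Decode call consumes one pending frame.
type fakeStream struct{ frames int }

type fakeRecognizer struct{}

func (fakeRecognizer) IsReady(s *fakeStream) bool { return s.frames > 0 }
func (fakeRecognizer) Decode(s *fakeStream)       { s.frames-- }

func main() {
	s := &fakeStream{frames: 3}
	fmt.Println(drain[*fakeStream](fakeRecognizer{}, s)) // prints 3
}
```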

func (*Recognizer) GetResult ¶

func (recognizer *Recognizer) GetResult(s *Stream) *RecognizerResult

Get the recognition result accumulated for the stream since the last call to Reset().

func (*Recognizer) IsEndpoint ¶

func (recognizer *Recognizer) IsEndpoint(s *Stream) bool

Return true if an endpoint is detected.

You usually use it like below:

if recognizer.IsEndpoint(s) {
	// do your own stuff after detecting an endpoint

	recognizer.Reset(s)
}

func (*Recognizer) IsReady ¶

func (recognizer *Recognizer) IsReady(s *Stream) bool

Check whether the stream has enough feature frames for decoding. Return true if this stream is ready for decoding. Return false otherwise.

You will usually use it like below:

for recognizer.IsReady(s) {
	recognizer.Decode(s)
}

func (*Recognizer) Reset ¶

func (recognizer *Recognizer) Reset(s *Stream)

Reset the internal neural network model states for the stream. After this call, IsEndpoint(s) returns false and GetResult(s) returns an empty text.

type RecognizerConfig ¶

type RecognizerConfig struct {
	Feat    FeatureConfig
	Model   ModelConfig
	Decoder DecoderConfig

	EnableEndpoint int // 1 to enable endpoint detection.

	// Please see
	// https://k2-fsa.github.io/sherpa/ncnn/endpoint.html
	// for the meaning of Rule1MinTrailingSilence, Rule2MinTrailingSilence
	// and Rule3MinUtteranceLength.
	Rule1MinTrailingSilence float32
	Rule2MinTrailingSilence float32
	Rule3MinUtteranceLength float32

	HotwordsFile  string
	HotwordsScore float32
}

Configuration for the online/streaming recognizer.

type RecognizerResult ¶

type RecognizerResult struct {
	Text string
}

It contains the recognition result for an online stream.

type Stream ¶

type Stream struct {
	// contains filtered or unexported fields
}

The online (streaming) input stream. It wraps a pointer from C.

func NewStream ¶

func NewStream(recognizer *Recognizer) *Stream

Create a stream for the given recognizer. The caller is responsible for invoking DeleteStream() to free the returned stream and avoid a memory leak.

func (*Stream) AcceptWaveform ¶

func (s *Stream) AcceptWaveform(sampleRate int, samples []float32)

Input audio samples for the stream.

sampleRate is the actual sample rate of the input audio samples. If it differs from the sample rate expected by the feature extractor, the samples are resampled internally.

samples contains the audio samples. Each sample is in the range [-1, 1].
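
Audio is often captured or stored as signed 16-bit PCM, which has to be scaled into that range before it is passed to AcceptWaveform. The sketch below shows one common way to do the scaling; pcm16ToFloat32 is a hypothetical helper, not part of this package.

```go
package main

import "fmt"

// pcm16ToFloat32 is a hypothetical helper (not part of sherpa_ncnn). It scales
// signed 16-bit PCM samples into [-1, 1) by dividing by 32768.
func pcm16ToFloat32(pcm []int16) []float32 {
	out := make([]float32, len(pcm))
	for i, v := range pcm {
		out[i] = float32(v) / 32768.0
	}
	return out
}

func main() {
	samples := pcm16ToFloat32([]int16{-32768, 0, 16384, 32767})
	fmt.Println(samples)
}
```

Dividing by 32768 rather than 32767 maps the most negative value exactly to -1 and keeps every positive value strictly below 1.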

func (*Stream) InputFinished ¶

func (s *Stream) InputFinished()

Signal that there will be no more incoming audio samples. After calling this function, you cannot call Stream.AcceptWaveform any longer.

The main purpose of this function is to flush the remaining audio samples buffered inside for feature extraction.
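
When decoding a complete recording, the samples are typically pushed in chunks and InputFinished is called once at the end. The sketch below illustrates that ordering. waveformSink, feedAll and recorder are hypothetical names: the interface lists only the two documented Stream methods, so a *Stream would satisfy it, and recorder is a stand-in that lets the example run without the C library.

```go
package main

import "fmt"

// waveformSink lists the two Stream methods used here; the documented
// AcceptWaveform and InputFinished methods of *Stream match it.
type waveformSink interface {
	AcceptWaveform(sampleRate int, samples []float32)
	InputFinished()
}

// feedAll pushes samples to the sink in chunks of at most chunkSize samples
// and then signals that no more input will follow. A chunkSize below 1 sends
// everything in a single call.
func feedAll(sink waveformSink, sampleRate int, samples []float32, chunkSize int) {
	if chunkSize < 1 {
		chunkSize = len(samples)
	}
	for start := 0; start < len(samples); start += chunkSize {
		end := start + chunkSize
		if end > len(samples) {
			end = len(samples)
		}
		sink.AcceptWaveform(sampleRate, samples[start:end])
	}
	sink.InputFinished()
}

// recorder is a stand-in sink that records chunk lengths instead of decoding.
type recorder struct {
	chunks   []int
	finished bool
}

func (r *recorder) AcceptWaveform(sampleRate int, samples []float32) {
	r.chunks = append(r.chunks, len(samples))
}

func (r *recorder) InputFinished() { r.finished = true }

func main() {
	r := &recorder{}
	feedAll(r, 16000, make([]float32, 10), 4)
	fmt.Println(r.chunks, r.finished) // prints [4 4 2] true
}
```

In a real program you would normally run the IsReady/Decode loop after each chunk and once more after InputFinished, so that the flushed frames are decoded as well.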

Source Files ¶

sherpa_ncnn.go
