sherpa_ncnn

package
v2.1.11+incompatible
Warning

This package is not in the latest version of its module.

Published: Apr 15, 2025 License: Apache-2.0 Imports: 2 Imported by: 0

Documentation

Overview

Speech recognition with Next-gen Kaldi.

sherpa-ncnn is an open-source speech recognition framework for Next-gen Kaldi. It depends only on ncnn, supporting both streaming and non-streaming speech recognition.

It does not need to access the network during recognition and everything runs locally.

It supports a variety of platforms, such as Linux (x86_64, aarch64, arm), Windows (x86_64, x86), macOS (x86_64, arm64), RISC-V, etc.

Usage examples:

  1. Real-time speech recognition from a microphone

    Please see https://github.com/k2-fsa/sherpa-ncnn/tree/master/go-api-examples/real-time-speech-recognition-from-microphone

  2. Decode a file

    Please see https://github.com/k2-fsa/sherpa-ncnn/tree/master/go-api-examples/decode-file

Constants

This section is empty.

Variables

This section is empty.

Functions

func DeleteRecognizer

func DeleteRecognizer(recognizer *Recognizer)

Free the internal pointer inside the recognizer to avoid a memory leak.

func DeleteStream

func DeleteStream(stream *Stream)

Free the internal pointer inside the stream to avoid a memory leak.

Types

type DecoderConfig

type DecoderConfig struct {
	// Decoding method. Supported values are:
	// greedy_search, modified_beam_search
	DecodingMethod string

	// Number of active paths for modified_beam_search.
	// It is ignored when DecodingMethod is greedy_search.
	NumActivePaths int
}

Configuration for the decoder (greedy search or modified beam search).
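As a rough illustration of the two supported values, the sketch below checks a DecoderConfig before it is handed to the recognizer. The struct is mirrored locally so the example compiles without the cgo-backed package, and validateDecoderConfig is a hypothetical helper that is not part of sherpa_ncnn. Requiring a positive NumActivePaths for modified_beam_search is an assumption, not something the package documents.

```go
package main

import (
	"errors"
	"fmt"
)

// DecoderConfig mirrors the documented struct so this sketch builds
// without the cgo-backed package.
type DecoderConfig struct {
	DecodingMethod string
	NumActivePaths int
}

// validateDecoderConfig is a hypothetical helper, not part of sherpa_ncnn.
func validateDecoderConfig(c DecoderConfig) error {
	switch c.DecodingMethod {
	case "greedy_search":
		// NumActivePaths is ignored for greedy search.
		return nil
	case "modified_beam_search":
		// Assumption: at least one active path is needed.
		if c.NumActivePaths <= 0 {
			return errors.New("NumActivePaths must be positive for modified_beam_search")
		}
		return nil
	default:
		return fmt.Errorf("unsupported decoding method %q", c.DecodingMethod)
	}
}

func main() {
	cfg := DecoderConfig{DecodingMethod: "modified_beam_search", NumActivePaths: 4}
	if err := validateDecoderConfig(cfg); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config ok")
}
```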

type FeatureConfig

type FeatureConfig struct {
	// Sample rate expected by the model. It is 16000 for all
	// pre-trained models provided by us
	SampleRate int
	// Feature dimension expected by the model. It is 80 for all
	// pre-trained models provided by us
	FeatureDim int
}

Configuration for the feature extractor

type ModelConfig

type ModelConfig struct {
	EncoderParam string // Path to the encoder.ncnn.param
	EncoderBin   string // Path to the encoder.ncnn.bin
	DecoderParam string // Path to the decoder.ncnn.param
	DecoderBin   string // Path to the decoder.ncnn.bin
	JoinerParam  string // Path to the joiner.ncnn.param
	JoinerBin    string // Path to the joiner.ncnn.bin
	Tokens       string // Path to tokens.txt
	NumThreads   int    // Number of threads to use for neural network computation
}

Paths to the model files. Pre-trained models can be downloaded from https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/

type Recognizer

type Recognizer struct {
	// contains filtered or unexported fields
}

The online (streaming) recognizer. It wraps a pointer from C.

func NewRecognizer

func NewRecognizer(config *RecognizerConfig) *Recognizer

The caller is responsible for invoking DeleteRecognizer() on the returned recognizer to avoid a memory leak.

func (*Recognizer) Decode

func (recognizer *Recognizer) Decode(s *Stream)

Decode the stream. Before calling this function, you must ensure that recognizer.IsReady(s) returns true. Do not call it on a stream that is not ready.

Typical usage:

for recognizer.IsReady(s) {
	recognizer.Decode(s)
}

func (*Recognizer) GetResult

func (recognizer *Recognizer) GetResult(s *Stream) *RecognizerResult

Get the recognition result accumulated for the stream since the last call to Reset().

func (*Recognizer) IsEndpoint

func (recognizer *Recognizer) IsEndpoint(s *Stream) bool

Return true if an endpoint is detected.

Typical usage:

if recognizer.IsEndpoint(s) {
	// do your own stuff after detecting an endpoint

	recognizer.Reset(s)
}
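The IsReady, Decode, GetResult, IsEndpoint and Reset calls are usually combined into one step that runs after each chunk of audio is fed in. The sketch below shows one way to arrange them. It runs against a local interface that lists the documented method names, with a fake recognizer and a fake Stream standing in for the cgo-backed types; the fakes, the drain helper, and the "empty frame means silence" rule are all hypothetical and exist only to make the example runnable.

```go
package main

import (
	"fmt"
	"strings"
)

// RecognizerResult mirrors the documented result type.
type RecognizerResult struct{ Text string }

// Stream is a fake stand-in: each queued frame is a word, or "" for silence.
type Stream struct {
	frames  []string // frames waiting to be decoded
	words   []string // words decoded since the last Reset
	silence bool     // whether the last decoded frame was silence
}

// recognizer lists the documented Recognizer methods used by drain.
type recognizer interface {
	IsReady(s *Stream) bool
	Decode(s *Stream)
	GetResult(s *Stream) *RecognizerResult
	IsEndpoint(s *Stream) bool
	Reset(s *Stream)
}

// fakeRecognizer is hypothetical; it only imitates the call pattern.
type fakeRecognizer struct{}

func (fakeRecognizer) IsReady(s *Stream) bool { return len(s.frames) > 0 }

func (fakeRecognizer) Decode(s *Stream) {
	f := s.frames[0]
	s.frames = s.frames[1:]
	s.silence = f == ""
	if f != "" {
		s.words = append(s.words, f)
	}
}

func (fakeRecognizer) GetResult(s *Stream) *RecognizerResult {
	return &RecognizerResult{Text: strings.Join(s.words, " ")}
}

func (fakeRecognizer) IsEndpoint(s *Stream) bool { return s.silence && len(s.words) > 0 }

func (fakeRecognizer) Reset(s *Stream) { s.words, s.silence = nil, false }

// drain decodes every ready frame, reads the current text, and resets the
// stream when an endpoint is detected. ended reports whether text is final.
func drain(r recognizer, s *Stream) (text string, ended bool) {
	for r.IsReady(s) {
		r.Decode(s)
	}
	text = r.GetResult(s).Text
	if r.IsEndpoint(s) {
		r.Reset(s)
		return text, true
	}
	return text, false
}

func main() {
	s := &Stream{}
	for _, chunk := range [][]string{{"hello", "world"}, {""}, {"again"}} {
		s.frames = append(s.frames, chunk...)
		text, ended := drain(fakeRecognizer{}, s)
		fmt.Printf("%q ended=%v\n", text, ended)
	}
}
```

Reading the result before calling Reset matters here, because the text is cleared once the stream is reset.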

func (*Recognizer) IsReady

func (recognizer *Recognizer) IsReady(s *Stream) bool

Check whether the stream has enough feature frames for decoding. Return true if this stream is ready for decoding. Return false otherwise.

Typical usage:

for recognizer.IsReady(s) {
	recognizer.Decode(s)
}

func (*Recognizer) Reset

func (recognizer *Recognizer) Reset(s *Stream)

After calling this function, the internal neural network model states are reset, IsEndpoint(s) returns false, and GetResult(s) returns a result whose Text is empty.

type RecognizerConfig

type RecognizerConfig struct {
	Feat    FeatureConfig
	Model   ModelConfig
	Decoder DecoderConfig

	EnableEndpoint int // 1 to enable endpoint detection.

	// Please see
	// https://k2-fsa.github.io/sherpa/ncnn/endpoint.html
	// for the meaning of Rule1MinTrailingSilence, Rule2MinTrailingSilence
	// and Rule3MinUtteranceLength.
	Rule1MinTrailingSilence float32
	Rule2MinTrailingSilence float32
	Rule3MinUtteranceLength float32

	HotwordsFile  string
	HotwordsScore float32
}

Configuration for the online/streaming recognizer.
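To show how the nested configuration fits together, here is a minimal sketch that fills in a RecognizerConfig for a model directory. The structs are mirrored locally so the example builds without the cgo-backed package, and newStreamingConfig is a hypothetical helper. The file names follow the field comments on ModelConfig and may differ for a particular pre-trained model; the thread count and the three endpoint rule values are illustrative assumptions, so check the endpoint documentation linked above before relying on them.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// The structs below mirror the documented types.
type FeatureConfig struct {
	SampleRate int
	FeatureDim int
}

type ModelConfig struct {
	EncoderParam, EncoderBin string
	DecoderParam, DecoderBin string
	JoinerParam, JoinerBin   string
	Tokens                   string
	NumThreads               int
}

type DecoderConfig struct {
	DecodingMethod string
	NumActivePaths int
}

type RecognizerConfig struct {
	Feat    FeatureConfig
	Model   ModelConfig
	Decoder DecoderConfig

	EnableEndpoint          int
	Rule1MinTrailingSilence float32
	Rule2MinTrailingSilence float32
	Rule3MinUtteranceLength float32

	HotwordsFile  string
	HotwordsScore float32
}

// newStreamingConfig is a hypothetical helper, not part of sherpa_ncnn.
func newStreamingConfig(modelDir string) RecognizerConfig {
	p := func(name string) string { return filepath.Join(modelDir, name) }
	return RecognizerConfig{
		// 16000 Hz and 80 dimensions match the documented pre-trained models.
		Feat: FeatureConfig{SampleRate: 16000, FeatureDim: 80},
		Model: ModelConfig{
			// Assumed file names, taken from the field comments.
			EncoderParam: p("encoder.ncnn.param"),
			EncoderBin:   p("encoder.ncnn.bin"),
			DecoderParam: p("decoder.ncnn.param"),
			DecoderBin:   p("decoder.ncnn.bin"),
			JoinerParam:  p("joiner.ncnn.param"),
			JoinerBin:    p("joiner.ncnn.bin"),
			Tokens:       p("tokens.txt"),
			NumThreads:   1, // assumption
		},
		Decoder:        DecoderConfig{DecodingMethod: "greedy_search"},
		EnableEndpoint: 1,
		// Illustrative values only.
		Rule1MinTrailingSilence: 2.4,
		Rule2MinTrailingSilence: 1.2,
		Rule3MinUtteranceLength: 20,
	}
}

func main() {
	cfg := newStreamingConfig("./model")
	fmt.Printf("%+v\n", cfg)
}
```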

type RecognizerResult

type RecognizerResult struct {
	Text string
}

It contains the recognition result for an online stream.

type Stream

type Stream struct {
	// contains filtered or unexported fields
}

The online stream class. It wraps a pointer from C.

func NewStream

func NewStream(recognizer *Recognizer) *Stream

The caller is responsible for invoking DeleteStream() on the returned stream to avoid a memory leak.
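Because both NewRecognizer and NewStream return objects that must be freed explicitly, a common Go pattern is to pair each constructor with a defer. The sketch below uses local stand-ins that only record the order of calls; the real constructors take the arguments shown in their signatures above. Freeing the stream before the recognizer it was created from is a conservative ordering assumed here, not something this page states.

```go
package main

import "fmt"

// events records the order of calls made by the stand-ins below.
var events []string

// Recognizer and Stream are hypothetical stand-ins for the cgo-backed types.
type Recognizer struct{}
type Stream struct{}

func NewRecognizer() *Recognizer {
	events = append(events, "new recognizer")
	return &Recognizer{}
}

func DeleteRecognizer(r *Recognizer) { events = append(events, "delete recognizer") }

func NewStream(r *Recognizer) *Stream {
	events = append(events, "new stream")
	return &Stream{}
}

func DeleteStream(s *Stream) { events = append(events, "delete stream") }

// run shows the pairing: deferred calls run in reverse order, so the
// stream is deleted before the recognizer.
func run() {
	r := NewRecognizer()
	defer DeleteRecognizer(r)

	s := NewStream(r)
	defer DeleteStream(s)

	events = append(events, "use stream")
}

func main() {
	run()
	for _, e := range events {
		fmt.Println(e)
	}
}
```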

func (*Stream) AcceptWaveform

func (s *Stream) AcceptWaveform(sampleRate int, samples []float32)

Input audio samples for the stream.

sampleRate is the actual sample rate of the input audio samples. If it differs from the sample rate expected by the feature extractor, the samples are resampled internally.

samples contains the audio samples. Each sample must be in the range [-1, 1].
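Audio from microphones and WAV files often arrives as 16-bit signed PCM, which has to be scaled into the [-1, 1] range before it is passed to AcceptWaveform. A minimal sketch of that conversion follows; int16ToFloat32 is a hypothetical helper, not part of sherpa_ncnn, and dividing by 32768 is one common convention.

```go
package main

import "fmt"

// int16ToFloat32 scales 16-bit PCM samples into [-1, 1).
// Dividing by 32768 maps -32768 to -1 and 32767 to just under 1.
func int16ToFloat32(pcm []int16) []float32 {
	out := make([]float32, len(pcm))
	for i, v := range pcm {
		out[i] = float32(v) / 32768
	}
	return out
}

func main() {
	samples := int16ToFloat32([]int16{0, 16384, -32768})
	fmt.Println(samples) // prints [0 0.5 -1]
}
```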

func (*Stream) InputFinished

func (s *Stream) InputFinished()

Signal that there will be no more incoming audio samples. After calling this function, you must not call Stream.AcceptWaveform on the stream again.

The main purpose of this function is to flush the remaining audio samples buffered inside for feature extraction.
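When a whole recording is available up front, the calls can be sequenced as: feed the samples, signal the end of input, decode until nothing is ready, then read the result. The sketch below illustrates that order with fake Stream and Recognizer types. The fakes are hypothetical: they pretend one feature frame needs 160 samples and report how many frames were decoded, purely so the effect of the final flush is visible.

```go
package main

import "fmt"

// frameSize is a made-up number of samples per frame used by the fake.
const frameSize = 160

// RecognizerResult mirrors the documented result type.
type RecognizerResult struct{ Text string }

// Stream is a fake that counts buffered samples and ready frames.
type Stream struct {
	buffered int // samples not yet turned into a frame
	frames   int // frames ready for decoding
}

func (s *Stream) AcceptWaveform(sampleRate int, samples []float32) {
	s.buffered += len(samples)
	s.frames += s.buffered / frameSize
	s.buffered %= frameSize
}

// InputFinished flushes any leftover samples into one last frame.
func (s *Stream) InputFinished() {
	if s.buffered > 0 {
		s.frames++
		s.buffered = 0
	}
}

// Recognizer is a fake that counts decoded frames.
type Recognizer struct{ decoded int }

func (r *Recognizer) IsReady(s *Stream) bool { return s.frames > 0 }

func (r *Recognizer) Decode(s *Stream) {
	s.frames--
	r.decoded++
}

func (r *Recognizer) GetResult(s *Stream) *RecognizerResult {
	return &RecognizerResult{Text: fmt.Sprintf("decoded %d frames", r.decoded)}
}

// decodeAll feeds a complete recording and returns the final text.
func decodeAll(r *Recognizer, s *Stream, sampleRate int, samples []float32) string {
	s.AcceptWaveform(sampleRate, samples)
	s.InputFinished() // flush samples still buffered for feature extraction
	for r.IsReady(s) {
		r.Decode(s)
	}
	return r.GetResult(s).Text
}

func main() {
	samples := make([]float32, 400) // 2 full fake frames plus 80 leftover samples
	fmt.Println(decodeAll(&Recognizer{}, &Stream{}, 16000, samples))
}
```

In this fake, skipping InputFinished would leave the trailing 80 samples undecoded, which mirrors the flushing purpose described above.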
