webrtcvad

package module

v0.0.0-...-66a4d20 Latest Latest Go to latest Published: Jan 9, 2025 License: BSD-3-Clause, MIT Imports: 4 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/aflyinghusky/go-webrtcvad

Links

Open Source Insights

README ¶

A quick n' dirty Go port of py-webrtcvad Voice Activity Detector (VAD).

A VAD classifies a piece of audio data as being voiced or unvoiced. It can be useful for telephony and speech recognition.

The VAD that Google developed for the WebRTC project is reportedly one of the best available, being fast, modern and free.

Overview

A VAD classifies a piece of audio data as being voiced or unvoiced. It is highly beneficial for telephony systems (like FreeSWITCH) and speech recognition pipelines, particularly when dealing with real-time audio streams.

The VAD originally developed for the WebRTC project by Google is one of the best available, being:

Fast Modern Free Installation Go-get the package. You don’t need to have WebRTC installed.


go get github.com/aflyingHusky/go-webrtcvad

Usage Example: Processing Audio in FreeSWITCH Environments This example demonstrates how to use the VAD with raw audio data, tailored for PBX systems:

package main

import (
	"fmt"
	"io"
	"log"

	"github.com/aflyingHusky/go-webrtcvad"
	"github.com/youpy/go-wav" // For reading WAV files
)

func main() {
	// Load audio (PCM, single-channel, 16-bit, 8kHz for FreeSWITCH)
	reader, err := wav.NewReader("test.wav")
	if err != nil {
		log.Fatal(err)
	}

	// Initialize VAD
	vad, err := webrtcvad.New()
	if err != nil {
		log.Fatal(err)
	}

	// Set VAD mode for PBX (mode 3 = very aggressive)
	if err := vad.SetMode(3); err != nil {
		log.Fatal(err)
	}

	rate := 8000 // FreeSWITCH-compatible sample rate (Hz)
	frame := make([]byte, 320) // 20ms frames for 8kHz audio

	// Validate rate and frame length
	if ok := vad.ValidRateAndFrameLength(rate, len(frame)); !ok {
		log.Fatal("invalid rate or frame length for FreeSWITCH")
	}

	for {
		_, err := io.ReadFull(reader, frame)
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}

		// Process the audio frame
		active, err := vad.Process(rate, frame)
		if err != nil {
			log.Fatal(err)
		}

		fmt.Printf("Speech detected: %v\n", active)
	}
}

Adjustments for FreeSWITCH Sampling Rate: FreeSWITCH typically processes audio at 8000 Hz (8kHz), so the example uses this rate. Frame Size: The frame size is set to 320 bytes (20ms of audio for 8kHz) to match PBX real-time requirements. Aggressiveness Mode: Set to 3 (very aggressive) for environments with significant background noise, ensuring minimal false positives. Notes Ensure that your input audio conforms to PCM, mono, 16-bit, and matches the specified sample rate (e.g., 8kHz). For real-time use cases, this VAD can be integrated with FreeSWITCH modules for speech detection, such as mod_audio_fork or WebSocket audio streams. Future Enhancements Add FreeSWITCH-specific examples and integrations. Extend support for dynamic mode adjustments based on live call conditions. Optimize performance for concurrent call processing in PBX environments. Credits Original library by maxhawkins. WAV reader by youpy. License MIT License

Documentation ¶

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type VAD ¶

type VAD struct {
	// contains filtered or unexported fields
}

func New ¶

func New() (*VAD, error)

func (*VAD) Process ¶

func (v *VAD) Process(fs int, audioFrame []byte) (activeVoice bool, err error)

func (*VAD) SetMode ¶

func (v *VAD) SetMode(mode int) error

func (*VAD) ValidRateAndFrameLength ¶

func (v *VAD) ValidRateAndFrameLength(rate int, frameLength int) bool

Source Files ¶

View all Source files

vad.go

Directories ¶

Path	Synopsis
example

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL