yask

v1.0.3
Published: Jul 20, 2022 License: MIT Imports: 11 Imported by: 0

README

yask

Tools for working with Yandex Speech Kit, a speech synthesis and recognition service (see https://cloud.yandex.ru/docs/speechkit/), from the Go programming language. The package synthesizes speech from text and recognizes text from an audio stream.

Before you start, register at https://cloud.yandex.ru/ to obtain an API key and a folder identifier (see https://cloud.yandex.ru/docs).

Audio stream formats
Speech synthesis from text

This example produces a file in WAV format that can be played in any audio player. The default sample rate is 8000 Hz.

package main

import (
	"log"
	"os"

	"github.com/fcg-xvii/go-tools/speech/yask"
)

func main() {
	yaFolderID := "b1g..."                     // Yandex folder ID
	yaAPIKey := "AQVNy..."                     // Yandex API key
	text := "Hi It's test of speech synthesis" // text for synthesis

	// init config for synthesis (the lpcm format is set by default)
	config := yask.TTSDefaultConfigText(yaFolderID, yaAPIKey, text)

	// The default language in the config is Russian. For English,
	// set both the English language and an English voice.
	config.Lang = yask.LangEN
	config.Voice = yask.VoiceNick

	// speech synthesis
	r, err := yask.TextToSpeech(config)
	if err != nil {
		log.Println(err)
		return
	}
	defer r.Close() // release the returned io.ReadCloser

	// open a file for saving the result
	f, err := os.OpenFile("tts.wav", os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0644)
	if err != nil {
		log.Println(err)
		return
	}
	defer f.Close()

	// encode lpcm to wav format (16 bits per sample, 1 channel)
	if err := yask.EncodePCMToWav(r, f, config.Rate, 16, 1); err != nil {
		log.Println(err)
		return
	}
}
Speech recognition to text

An example of short audio recognition. It uses a WAV file, which works with the lpcm value of the configuration format.

package main

import (
	"log"
	"os"

	"github.com/fcg-xvii/go-tools/speech/yask"
)

func main() {
	yaFolderID := "b1g4..."    // Yandex folder ID
	yaAPIKey := "AQVNyr..."    // Yandex API key
	dataFileName := "data.wav" // audio file in wav format to recognize as text

	// open the audio file
	f, err := os.Open(dataFileName)
	if err != nil {
		log.Println(err)
		return
	}
	defer f.Close()

	// init config for recognition
	config := yask.STTConfigDefault(yaFolderID, yaAPIKey, f)

	// set the English language
	config.Lang = yask.LangEN

	// recognize speech as text
	text, err := yask.SpeechToTextShort(config)
	if err != nil {
		log.Println(err)
		return
	}

	log.Println(text)
}

License

The MIT License (MIT), see LICENSE.

Documentation

Constants

const (
	// YaSTTUrl is url for send speech to text requests
	YaSTTUrl = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"

	// YaTTSUrl is url for send text to speech requests
	YaTTSUrl = "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize"

	// FormatLPCM is PCM audio format (wav) without wav header (more details in https://en.wikipedia.org/wiki/Pulse-code_modulation)
	FormatLPCM = "lpcm"
	// FormatOgg is audio ogg format
	FormatOgg = "oggopus"

	// Rate8k is rate of 8kHz
	Rate8k int = 8000
	// Rate16k is rate of 16kHz
	Rate16k int = 16000
	// Rate48k is rate of 48kHz
	Rate48k int = 48000

	// LangRU is russian language
	LangRU = "ru-Ru"
	// LangEN is english language
	LangEN = "en-US"
	// LangTR is turkish language
	LangTR = "tr-TR"

	// SpeedStandard is standard speed of voice (1.0)
	SpeedStandard float32 = 1.0
	// SpeedMostFastest is maximum speed of voice (3.0)
	SpeedMostFastest float32 = 3.0
	// SpeedSlowest is minimum speed of voice (0.1)
	SpeedSlowest float32 = 0.1

	// VoiceOksana is Oksana voice (russian, female, standard)
	VoiceOksana = "oksana"
	// VoiceJane is Jane voice (russian, female, standard)
	VoiceJane = "jane"
	// VoiceOmazh is Omazh voice (russian, female, standard)
	VoiceOmazh = "omazh"
	// VoiceZahar is Zahar voice (russian, male, standard)
	VoiceZahar = "zahar"
	// VoiceErmil is Ermil voice (russian, male, standard)
	VoiceErmil = "ermil"
	// VoiceSilaerkan is Silaerkan voice (turkish, female, standard)
	VoiceSilaerkan = "silaerkan"
	// VoiceErkanyavas is Erkanyavas voice (turkish, male, standard)
	VoiceErkanyavas = "erkanyavas"
	// VoiceAlyss is Alyss voice (english, female, standard)
	VoiceAlyss = "alyss"
	// VoiceNick is Nick voice (english, male, standard)
	VoiceNick = "nick"
	// VoiceAlena is Alena voice (russian, female, premium)
	VoiceAlena = "alena"
	// VoiceFilipp is Filipp voice (russian, male, premium)
	VoiceFilipp = "filipp"

	// EmotionGood is good voice emotion
	EmotionGood = "good"
	// EmotionEvil is evil voice emotion
	EmotionEvil = "evil"
	// EmotionNeutral is neutral voice emotion
	EmotionNeutral = "neutral"

	// TopicGeneral is current version of voice model (available in all languages)
	TopicGeneral = "general"
	// TopicGeneralRC is experimental version of voice model (russian language)
	TopicGeneralRC = "general:rc"
	// TopicGeneralDeprecated is deprecated version of voice model (russian language)
	TopicGeneralDeprecated = "general:deprecated"
	// TopicMaps is model for addresses and company names
	TopicMaps = "maps"

	// SexAll is male and female
	SexAll = 0
	// SexMale is male
	SexMale = 1
	// SexFemale is female
	SexFemale = 2
)
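
The speed constants above describe a range from 0.1 to 3.0. The source does not say whether the package validates the Speed field, so a caller may want to clamp user input before building a config. A minimal sketch under that assumption follows. The constants are local copies of the values listed above, and `clampSpeed` is a hypothetical helper, not part of yask.

```go
package main

import "fmt"

// Local copies of the speed limits listed in the package constants.
const (
	speedSlowest     float32 = 0.1
	speedMostFastest float32 = 3.0
)

// clampSpeed keeps a requested speed inside the documented range.
// Hypothetical helper for illustration; it is not part of yask.
func clampSpeed(s float32) float32 {
	if s < speedSlowest {
		return speedSlowest
	}
	if s > speedMostFastest {
		return speedMostFastest
	}
	return s
}

func main() {
	for _, s := range []float32{0.05, 1.5, 5} {
		fmt.Println(s, "->", clampSpeed(s))
	}
}
```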

Variables

This section is empty.

Functions

func EncodePCMToWav

func EncodePCMToWav(in io.Reader, out io.WriteSeeker, sampleRate, bitDepth, numChans int) error

EncodePCMToWav encodes an input stream of PCM audio to WAV format and writes it to the out stream

func SpeechToTextShort

func SpeechToTextShort(conf *STTConfig) (string, error)

SpeechToTextShort returns text recognized from a PCM or OGG sound stream using the Yandex Speech Kit service
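
For orientation, the sketch below builds (without sending) the kind of HTTP request a short recognition call could make against YaSTTUrl. The query parameter names and the `Api-Key` authorization scheme are assumptions taken from the public Speech Kit v1 REST API, not from this package's source. `sttParams` and `newSTTRequest` are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strconv"
	"strings"
)

// Same value as the YaSTTUrl constant.
const sttURL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"

// sttParams mirrors the STTConfig fields (hypothetical local type).
type sttParams struct {
	Lang, Topic, Format string
	Rate                int
	ProfanityFilter     bool
	FolderID, APIKey    string
}

// newSTTRequest prepares a POST request with the audio as the body.
// Parameter names are assumed from the Speech Kit v1 REST API.
func newSTTRequest(p sttParams, data io.Reader) (*http.Request, error) {
	q := url.Values{}
	q.Set("folderId", p.FolderID)
	q.Set("lang", p.Lang)
	q.Set("topic", p.Topic)
	q.Set("format", p.Format)
	q.Set("sampleRateHertz", strconv.Itoa(p.Rate))
	q.Set("profanityFilter", strconv.FormatBool(p.ProfanityFilter))

	req, err := http.NewRequest(http.MethodPost, sttURL+"?"+q.Encode(), data)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Api-Key "+p.APIKey)
	return req, nil
}

func main() {
	p := sttParams{
		Lang: "en-US", Topic: "general", Format: "lpcm", Rate: 8000,
		FolderID: "my-folder-id", APIKey: "my-api-key", // placeholders
	}
	req, err := newSTTRequest(p, strings.NewReader("raw audio bytes"))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(req.Method, req.URL.Path)
	fmt.Println(req.URL.Query().Get("sampleRateHertz"))
}
```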

func TextToSpeech

func TextToSpeech(config *TTSConfig) (io.ReadCloser, error)

TextToSpeech returns a PCM or OGG sound stream using the Yandex Speech Kit service. The resulting PCM stream can be converted to a WAV stream using EncodePCMToWav
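
TTSConfig carries both a Text and an SSML field, while a synthesis request normally takes one or the other. The sketch below shows one way a form body for YaTTSUrl could be assembled with that rule enforced. The form field names are assumptions taken from the public Speech Kit v1 REST API, and `ttsParams` and `ttsForm` are hypothetical names rather than the package's own code.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strconv"
)

// ttsParams mirrors the TTSConfig fields sent in the request body
// (hypothetical local type).
type ttsParams struct {
	Text, SSML           string
	Lang, Voice, Emotion string
	Speed                float32
	Format               string
	Rate                 int
	FolderID             string
}

// ttsForm builds the form values for a synthesis request.
// Field names are assumed from the Speech Kit v1 REST API.
func ttsForm(p ttsParams) (url.Values, error) {
	if (p.Text == "") == (p.SSML == "") {
		return nil, errors.New("exactly one of Text or SSML must be set")
	}
	form := url.Values{}
	if p.Text != "" {
		form.Set("text", p.Text)
	} else {
		form.Set("ssml", p.SSML)
	}
	form.Set("lang", p.Lang)
	form.Set("voice", p.Voice)
	if p.Emotion != "" {
		form.Set("emotion", p.Emotion)
	}
	form.Set("speed", strconv.FormatFloat(float64(p.Speed), 'f', -1, 32))
	form.Set("format", p.Format)
	form.Set("sampleRateHertz", strconv.Itoa(p.Rate))
	form.Set("folderId", p.FolderID)
	return form, nil
}

func main() {
	form, err := ttsForm(ttsParams{
		SSML: `<speak>Hello<break time="1s"/>world</speak>`,
		Lang: "en-US", Voice: "nick", Speed: 1.0,
		Format: "lpcm", Rate: 8000, FolderID: "my-folder-id", // placeholder
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(form.Encode())
}
```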

Types

type STTConfig

type STTConfig struct {
	Lang            string
	Topic           string
	ProfanityFilter bool
	Format          string
	Rate            int
	YaFolderID      string
	YaAPIKey        string
	Data            io.Reader
}

STTConfig is the config for speech-to-text methods

func STTConfigDefault

func STTConfigDefault(yaFolderID, yaAPIKey string, data io.Reader) *STTConfig

STTConfigDefault returns STTConfig with default parameters

type TTSConfig

type TTSConfig struct {
	Text       string
	SSML       string
	Lang       string
	Voice      string
	Emotion    string
	Speed      float32
	Format     string
	Rate       int
	YaFolderID string
	YaAPIKey   string
}

TTSConfig is the config for the text-to-speech method

func TTSDefaultConfigSSML

func TTSDefaultConfigSSML(yaFolderID, yaAPIKey, SSML string) *TTSConfig

TTSDefaultConfigSSML returns a config with default parameters for synthesizing SSML text with the TextToSpeech method. For details of the SSML language see https://cloud.yandex.ru/docs/speechkit/tts/ssml

func TTSDefaultConfigText

func TTSDefaultConfigText(yaFolderID, yaAPIKey, text string) *TTSConfig

TTSDefaultConfigText returns a config with default parameters for synthesizing raw text with the TextToSpeech method

type Voice

type Voice struct {
	NameEn  string `json:"name_en"`
	MameRu  string `json:"name_ru"`
	Voice   string `json:"voice"`
	Lang    string `json:"lang"`
	Male    bool   `json:"is_male"`
	Premium bool   `json:"is_premium"`
}

Voice is a struct that describes a voice

func Voices

func Voices(lang string, sex, premium int) (res []Voice)

Voices returns a slice of available voices.
lang: empty (all languages), ru-RU, en-EN, tr-TR
sex: 0 - all, 1 - male, 2 - female
premium: 0 - all, 1 - standard only, 2 - premium only
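
The filter semantics described for Voices can be sketched with a small local table. This is an illustration only: the `voice` type, the `filterVoices` function and the language codes in the table are assumptions, and the entries are a subset taken from the voice constants listed earlier.

```go
package main

import "fmt"

// voice is a simplified local stand-in for the package's Voice struct.
type voice struct {
	Name, Lang    string
	Male, Premium bool
}

// A subset of the voices named in the package constants.
var voices = []voice{
	{"oksana", "ru-RU", false, false},
	{"zahar", "ru-RU", true, false},
	{"alyss", "en-US", false, false},
	{"nick", "en-US", true, false},
	{"alena", "ru-RU", false, true},
	{"filipp", "ru-RU", true, true},
}

// filterVoices applies the documented rules:
// lang: empty means all languages
// sex: 0 - all, 1 - male, 2 - female
// premium: 0 - all, 1 - standard only, 2 - premium only
func filterVoices(lang string, sex, premium int) (res []voice) {
	for _, v := range voices {
		if lang != "" && v.Lang != lang {
			continue
		}
		if (sex == 1 && !v.Male) || (sex == 2 && v.Male) {
			continue
		}
		if (premium == 1 && v.Premium) || (premium == 2 && !v.Premium) {
			continue
		}
		res = append(res, v)
	}
	return res
}

func main() {
	for _, v := range filterVoices("ru-RU", 2, 0) {
		fmt.Println(v.Name) // Russian female voices, standard and premium
	}
}
```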
