aurora

package module
v0.1.2 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 9, 2018 License: MIT Imports: 4 Imported by: 0

README

Aurora Golang SDK

Overview

Aurora is the enterprise end-to-end speech solution. This Golang SDK will allow you to quickly and easily use the Aurora service to integrate voice capabilities into your application.

The SDK is currently in a pre-alpha release phase. Bugs and limited functionality should be expected.

Installation

The recommended Go version is 1.9 or later.

The Go SDK currently does not bundle the necessary system headers and binaries to interact with audio hardware in a cross-platform manner. For this reason, before using the SDK, you need to install PortAudio. The Go binding we use needs to link the headers from PortAudio, so you'll also need pkg-config.

macOS
$ brew install portaudio pkg-config
$ go get -u github.com/auroraapi/aurora-go
Linux
$ sudo apt-get install libportaudio-dev pkg-config
$ go get -u github.com/auroraapi/aurora-go

These commands install PortAudio and pkg-config, then fetch the SDK. On RPM-based distributions, use yum instead of apt-get. If your distribution's repository does not include PortAudio, build and install it from source.

Basic Usage

First, make sure you have an account with Aurora and have created an Application.

Text to Speech (TTS)
package main

import (
  "github.com/auroraapi/aurora-go"
)

func main() {
  // Set your application settings
  aurora.Config.AppID = "YOUR_APP_ID"
  aurora.Config.AppToken = "YOUR_APP_TOKEN"
  aurora.Config.DeviceID = "YOUR_DEVICE_ID"

  // Create a Text object and query the TTS service
  speech, err := aurora.NewText("Hello world").Speech()
  if err != nil {
    return
  }

  // Play the resulting audio...
  speech.Audio.Play()

  // ...or save it into a file
  speech.Audio.WriteToFile("test.wav")
}
Speech to Text (STT)
Convert a WAV file to Speech
package main

import (
  "fmt"
  "github.com/auroraapi/aurora-go"
  "github.com/auroraapi/aurora-go/audio"
)

func main() {
  // Set your application settings
  aurora.Config.AppID = "YOUR_APP_ID"
  aurora.Config.AppToken = "YOUR_APP_TOKEN"
  aurora.Config.DeviceID = "YOUR_DEVICE_ID"

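  // Load a WAV file. The variable is named file so that it does not
  // shadow the imported audio package
  file, err := audio.NewFileFromFileName("test.wav")
  if err != nil {
    return
  }

  // Wrap the audio in a Speech object and transcribe it
  speech := aurora.NewSpeech(file)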
  text, err := speech.Text()
  if err != nil {
    return
  }

  fmt.Printf("Transcribed: %s\n", text.Text)
}
Convert a previous Text API call to Speech
package main

import (
  "fmt"
  "github.com/auroraapi/aurora-go"
)

func main() {
  // Set your application settings
  aurora.Config.AppID = "YOUR_APP_ID"
  aurora.Config.AppToken = "YOUR_APP_TOKEN"
  aurora.Config.DeviceID = "YOUR_DEVICE_ID"

  // Call the TTS API to convert "Hello world" to speech
  speech, err := aurora.NewText("Hello world").Speech()
  if err != nil {
    return
  }

  // Convert the generated speech back to text
  text, err := speech.Text()
  if err != nil {
    return
  }

  fmt.Println(text.Text) // "hello world"
}
Listen for a specified amount of time
package main

import (
  "fmt"
  "github.com/auroraapi/aurora-go"
)

func main() {
  // Set your application settings
  aurora.Config.AppID = "YOUR_APP_ID"
  aurora.Config.AppToken = "YOUR_APP_TOKEN"
  aurora.Config.DeviceID = "YOUR_DEVICE_ID"

  // Create listen parameters. You should call this method so that the default
  // values get set. Then override them with whatever you want
  params := aurora.NewListenParams()
  params.Length = 3.0

  // Listen for 3 seconds
  speech, err := aurora.Listen(params)
  if err != nil {
    return
  }

  // Convert the recorded speech to text
  text, err := speech.Text()
  if err != nil {
    return
  }

  fmt.Println(text.Text)
}
Listen for an unspecified amount of time

Calling this API will start listening and will automatically stop listening after a certain amount of silence (default is 0.5 seconds).

package main

import (
  "fmt"
  "github.com/auroraapi/aurora-go"
)

func main() {
  // Set your application settings
  aurora.Config.AppID = "YOUR_APP_ID"
  aurora.Config.AppToken = "YOUR_APP_TOKEN"
  aurora.Config.DeviceID = "YOUR_DEVICE_ID"

  // Create listen parameters. You should call this method so that the default
  // values get set. Then override them with whatever you want
  params := aurora.NewListenParams()
  params.SilenceLen = 1.0

  // Listen until 1 second of silence
  speech, err := aurora.Listen(params)
  if err != nil {
    return
  }

  // Convert the recorded speech to text
  text, err := speech.Text()
  if err != nil {
    return
  }

  fmt.Println(text.Text)
}
Continuously listen

Continuously listen and retrieve speech segments. You can do anything with these segments; here we convert them to text. As in the previous example, segments are delimited by silence (0.5 seconds by default), and you can change that threshold by setting the SilenceLen parameter. Alternatively, you can make the segments fixed-length (as in the "Listen for a specified amount of time" example) by setting the Length parameter.

package main

import (
  "fmt"
  "github.com/auroraapi/aurora-go"
)

// this callback is passed to ContinuouslyListen. It is called every time a
// Speech object is available. The return value specifies whether or not we
// should continue to listen (true if so, false otherwise)
func listenCallback(s *aurora.Speech, err error) bool {
  if err != nil {
    // returning false in this function will stop listening
    // and quit ContinuouslyListen
    return false
  }

  // convert detected speech to text
  text, err := s.Text()
  if err != nil {
    return false
  }

  fmt.Println(text.Text)

  // Continue listening
  return true
}

func main() {
  // Set your application settings
  aurora.Config.AppID = "YOUR_APP_ID"
  aurora.Config.AppToken = "YOUR_APP_TOKEN"
  aurora.Config.DeviceID = "YOUR_DEVICE_ID"

  // Create listen parameters. You should call this method so that the default
  // values get set. Then override them with whatever you want
  params := aurora.NewListenParams()

  // Continuously listen and convert to speech (blocks) with default params
  aurora.ContinuouslyListen(params, listenCallback)

  // Set the amount of silence that separates speech segments
  // (0.5 seconds is also the default)
  params.SilenceLen = 0.5
  aurora.ContinuouslyListen(params, listenCallback)

  // Fixed-length speech segments of 3 seconds (overrides SilenceLen parameter)
  params.Length = 3.0
  aurora.ContinuouslyListen(params, listenCallback)
}
Listen and Transcribe

If you already know that you want the recorded speech converted to text, you can do it in one step, which reduces both the amount of code you need to write and the latency. With the ListenAndTranscribe method, the recorded audio starts uploading as soon as you call the method, and transcription begins immediately. When the recording ends, you get back the final transcription.

package main

import (
  "fmt"
  "github.com/auroraapi/aurora-go"
)

// this callback is passed to ContinuouslyListenAndTranscribe. It is called
// every time a Text object is available. The return value specifies whether
// or not we should continue to listen (true if so, false otherwise)
func listenCallback(t *aurora.Text, err error) bool {
  if err != nil {
    // returning false in this function will stop listening
    // and quit ContinuouslyListenAndTranscribe
    return false
  }

  // Print and continue listening
  fmt.Println(t.Text)
  return true
}

func main() {
  // Set your application settings
  aurora.Config.AppID = "YOUR_APP_ID"
  aurora.Config.AppToken = "YOUR_APP_TOKEN"
  aurora.Config.DeviceID = "YOUR_DEVICE_ID"

  // Create listen parameters. You should call this method so that the default
  // values get set. Then override them with whatever you want
  params := aurora.NewListenParams()

  // Listen and transcribe once
  t, err := aurora.ListenAndTranscribe(params)
  listenCallback(t, err)

  // Continuously listen. While recording, this method also streams the data
  // to the backend. Once recording is finished, a transcript is almost
  // instantly available. The callback here receives an *aurora.Text (as opposed
  // to the *aurora.Speech object in regular ContinuouslyListen).
  aurora.ContinuouslyListenAndTranscribe(params, listenCallback)
}
Listen and echo example
package main

import (
  "github.com/auroraapi/aurora-go"
)

func listenCallback(t *aurora.Text, err error) bool {
  if err != nil {
    return false
  }

  // Perform TTS on the transcribed text
  s, err := t.Speech()
  if err != nil {
    return false
  }

  // Speak and continue listening
  s.Audio.Play()
  return true
}

func main() {
  // Set your application settings
  aurora.Config.AppID = "YOUR_APP_ID"
  aurora.Config.AppToken = "YOUR_APP_TOKEN"
  aurora.Config.DeviceID = "YOUR_DEVICE_ID"

  params := aurora.NewListenParams()
  aurora.ContinuouslyListenAndTranscribe(params, listenCallback)
}
Interpret (Language Understanding)

The Interpret service takes any Aurora Text object, determines the user's intent, and extracts additional query information. Interpret can only be called on Text objects, and it returns an Interpret object on completion. To convert a user's speech into an Interpret object, convert it to text first.

Basic example
package main

import (
  "fmt"
  "github.com/auroraapi/aurora-go"
)

func main() {
  // Set your application settings
  aurora.Config.AppID = "YOUR_APP_ID"
  aurora.Config.AppToken = "YOUR_APP_TOKEN"
  aurora.Config.DeviceID = "YOUR_DEVICE_ID"

  // Create a Text object
  text := aurora.NewText("What is the time in Los Angeles?")

  // Call the interpret service
  i, err := text.Interpret()
  if err != nil {
    return
  }

  // Print the detected intent (string) and entities (map[string]string)
  fmt.Printf("Intent:   %s\nEntities: %v\n", i.Intent, i.Entities)

  // This should print:
  // Intent:   time
  // Entities: map[location:los angeles]
}
User query example
package main

import (
  "bufio"
  "fmt"
  "os"
  "github.com/auroraapi/aurora-go"
)

func main() {
  // Set your application settings
  aurora.Config.AppID = "YOUR_APP_ID"
  aurora.Config.AppToken = "YOUR_APP_TOKEN"
  aurora.Config.DeviceID = "YOUR_DEVICE_ID"

  // Read line-by-line from stdin
  r := bufio.NewReader(os.Stdin)
  for {
    t, err := r.ReadString('\n')
    if err != nil {
      // Stop on EOF or a read error instead of looping forever
      break
    }

    // Interpret what the user typed
    i, err := aurora.NewText(t).Interpret()
    if err != nil {
      break
    }

    // Print out the intent and entities
    fmt.Printf("%s %v\n", i.Intent, i.Entities)
  }
}
Smart Lamp

This example shows how easy it is to voice-enable a smart lamp. It responds to queries in the form of "turn on the lights" or "turn off the lamp". You define what object you're listening for (so that you can ignore queries like "turn on the music").

package main

import (
  "github.com/auroraapi/aurora-go"
)

// handle what the user said
func listenCallback(t *aurora.Text, err error) bool {
  if err != nil {
    return true
  }
  i, err := t.Interpret()
  if err != nil {
    return true
  }

  intent := i.Intent
  object := i.Entities["object"]
  validWords := []string{"light", "lights", "lamp"}

  for _, word := range validWords {
    if object == word {
      if intent == "turn_on" {
        // turn on the lamp
      } else if intent == "turn_off" {
        // turn off the lamp
      }

      break
    }
  }
  return true
}

func main() {
  // Set your application settings
  aurora.Config.AppID = "YOUR_APP_ID"
  aurora.Config.AppToken = "YOUR_APP_TOKEN"
  aurora.Config.DeviceID = "YOUR_DEVICE_ID"

  params := aurora.NewListenParams()
  aurora.ContinuouslyListenAndTranscribe(params, listenCallback)
}

Documentation

Overview

Package aurora is an SDK to interact with the Aurora API, making it easy to integrate voice user interfaces into your application.

Index

Constants

const (
	// ListenDefaultLength is the default length of time (in seconds) to listen for.
	ListenDefaultLength = 0.0
	// ListenDefaultSilenceLen is the default amount of silence (in seconds)
	// that the recording framework will allow before stopping.
	ListenDefaultSilenceLen = 0.5
)

Variables

var Config = config.C

Config is an alias for config.C so that you can easily override the SDK configuration by typing something like:

aurora.Config.AppID    = "My ID"
aurora.Config.AppToken = "My Token"
aurora.Config.DeviceID = "My Device"
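
Hard-coding credentials as in the snippet above is fine for a quick test, but you may prefer to read them from the environment. The following is a minimal sketch of that idea, not part of the SDK: the Credentials struct is a local stand-in for the three Config fields, the AURORA_* variable names are hypothetical, and treating AppID and AppToken as required is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Credentials mirrors the three fields set on aurora.Config in the examples.
// It is a local stand-in so that this sketch compiles without the SDK.
type Credentials struct {
	AppID    string
	AppToken string
	DeviceID string
}

// loadCredentials reads the credentials through the supplied lookup function
// (os.Getenv in a real program). The variable names are hypothetical, and
// requiring AppID and AppToken is an assumption of this sketch.
func loadCredentials(getenv func(string) string) (Credentials, error) {
	c := Credentials{
		AppID:    getenv("AURORA_APP_ID"),
		AppToken: getenv("AURORA_APP_TOKEN"),
		DeviceID: getenv("AURORA_DEVICE_ID"),
	}
	if c.AppID == "" || c.AppToken == "" {
		return Credentials{}, errors.New("AURORA_APP_ID and AURORA_APP_TOKEN must be set")
	}
	return c, nil
}

func main() {
	creds, err := loadCredentials(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	// With the SDK imported, you would now copy the values over, e.g.
	// aurora.Config.AppID = creds.AppID, and likewise for the other fields.
	fmt.Println("loaded credentials for device", creds.DeviceID)
}
```

Passing the lookup function as a parameter keeps loadCredentials easy to test with a plain map instead of the real environment.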

Functions

func ContinuouslyListen

func ContinuouslyListen(params *ListenParams, handleFunc SpeechHandleFunc)

ContinuouslyListen accepts a handler function as an argument, which is called each time a speech utterance is decoded. See the documentation for `SpeechHandleFunc` for more information.

func ContinuouslyListenAndTranscribe

func ContinuouslyListenAndTranscribe(params *ListenParams, handleFunc TextHandleFunc)

ContinuouslyListenAndTranscribe is a combination of `ContinuouslyListen` and `ListenAndTranscribe`. See the documentation for those two functions to understand how it works. The difference is that this handler function receives objects of type *Text instead of *Speech. See the documentation for `TextHandleFunc` for more information on that.

Types

type Interpret

type Interpret struct {
	// Intent represents the intent of the user. This can be an empty string if
	// the intent of the user was unclear. Otherwise, it will be one of the
	// pre-determined values listed in the Aurora dashboard.
	Intent string
	// Entities contain auxiliary information about the user's utterance. This
	// can be an empty map if no such information was detected. Otherwise, it
	// will be a key-value listing according to the entities described on the
	// Aurora dashboard.
	Entities map[string]string
}

Interpret contains the results from a call to the Aurora Interpret service.
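
Once you have an Interpret result, a common next step is to route the intent to some application code. The following is a minimal sketch of one way to do that with a dispatch table. The Interpret struct here is a local copy of the documented type so that the example runs without the SDK; the handler type, the handlers map, and dispatch are hypothetical names, and the turn_on and turn_off intents are borrowed from the Smart Lamp example in the README.

```go
package main

import "fmt"

// Interpret is a local copy of the documented struct.
type Interpret struct {
	Intent   string
	Entities map[string]string
}

// handler acts on one intent and returns a short description of what it did.
type handler func(entities map[string]string) string

// handlers maps intent names to actions. Real intent names depend on what
// is configured for your application.
var handlers = map[string]handler{
	"turn_on":  func(e map[string]string) string { return "turning on the " + e["object"] },
	"turn_off": func(e map[string]string) string { return "turning off the " + e["object"] },
}

// dispatch routes an interpretation to its handler. An empty or unknown
// intent (the documentation notes Intent can be an empty string) falls
// through to a default reply.
func dispatch(i *Interpret) string {
	if h, ok := handlers[i.Intent]; ok {
		return h(i.Entities)
	}
	return "sorry, I did not understand that"
}

func main() {
	lamp := &Interpret{Intent: "turn_on", Entities: map[string]string{"object": "lamp"}}
	fmt.Println(dispatch(lamp))

	unclear := &Interpret{Intent: "", Entities: map[string]string{}}
	fmt.Println(dispatch(unclear))
}
```

A table like this keeps the listen callback short as the number of supported intents grows.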

func NewInterpret

func NewInterpret(res *api.InterpretResponse) *Interpret

NewInterpret takes a response from the API and creates an Interpret object out of it. It doesn't really make sense for a developer to call this, but it is left exported in case it makes sense in the future.

type ListenParams

type ListenParams struct {
	// Length specifies how long to listen for in seconds. A value of 0
	// means that the recording framework will continue to listen until
	// the specified amount of silence. A value greater than 0 will
	// override any value set to `SilenceLen`
	Length float64
	// SilenceLen is how much silence (in seconds) will be allowed before
	// automatically stopping. This value is only taken into consideration if
	// `Length` is 0
	SilenceLen float64
}

ListenParams configures how the recording framework should listen for speech.

func NewListenParams

func NewListenParams() *ListenParams

NewListenParams creates the default set of ListenParams. You should call this function to get the default and then replace the ones you want to customize.
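
The interaction between Length and SilenceLen can be easy to misread, so here is a small sketch that spells out the documented rules. It does not call the SDK: ListenParams is a local copy of the documented struct, and newListenParams and describe are hypothetical helpers. The defaults (0.0 and 0.5) come from the constants listed above, and treating nil as "use the defaults" follows the note in the Listen documentation.

```go
package main

import "fmt"

// ListenParams is a local copy of the documented struct so that this sketch
// compiles without the SDK or PortAudio installed.
type ListenParams struct {
	Length     float64
	SilenceLen float64
}

// newListenParams mimics the documented defaults: Length 0.0, SilenceLen 0.5.
func newListenParams() *ListenParams {
	return &ListenParams{Length: 0.0, SilenceLen: 0.5}
}

// describe reports which stop condition the documented rules select:
// a Length greater than 0 overrides SilenceLen, otherwise silence decides.
func describe(p *ListenParams) string {
	if p == nil {
		p = newListenParams()
	}
	if p.Length > 0 {
		return fmt.Sprintf("fixed length: stop after %.1fs", p.Length)
	}
	return fmt.Sprintf("silence: stop after %.1fs of silence", p.SilenceLen)
}

func main() {
	p := newListenParams()
	fmt.Println(describe(p)) // silence: stop after 0.5s of silence

	p.SilenceLen = 1.0
	fmt.Println(describe(p)) // silence: stop after 1.0s of silence

	p.Length = 3.0
	fmt.Println(describe(p)) // fixed length: stop after 3.0s
}
```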

type Speech

type Speech struct {
	// Audio is the underlying audio that this struct wraps. You can create
	// a speech object and set this directly if you want to operate on some
	// pre-recorded audio
	Audio *audio.File
}

Speech represents a user's utterance. It has high-level methods that let you operate on the speech (like convert it to text) and also allows you to access the underlying audio data to manipulate it, save it, play it, etc.

func Listen

func Listen(params *ListenParams) (*Speech, error)

`Listen` takes in `ListenParams` and generates a speech object based on those parameters by recording from the default input device.

Note that the `ListenParams` is expected to contain values for every field, including defaults for fields that you did not want to change. To avoid having to do this, you should call `NewListenParams` to obtain an instance of `ListenParams` with all of the defaults filled in, and then override the ones you want to change. Alternatively, you can pass `nil` to simply use the default parameters.

Currently, this function uses the default audio input interface (an option to change this will be available at a later time).

func NewSpeech

func NewSpeech(newAudio *audio.File) *Speech

NewSpeech creates a Speech object from the given audio file.

func (*Speech) Text

func (t *Speech) Text() (*Text, error)

Text calls the Aurora STT API and converts a user's utterance into a text transcription. This is populated into a `Text` object, allowing you to chain and combine high-level abstractions.

type SpeechHandleFunc

type SpeechHandleFunc func(s *Speech, err error) bool

SpeechHandleFunc is the type of function that is passed to `ContinuouslyListen`. It is called every time a speech utterance is decoded and passed to the function. If there was an error, that is passed as well. The function must return a boolean that indicates whether or not to continue listening (true to continue listening, false to stop listening).

type Text

type Text struct {
	// Text is the actual text that this object encapsulates
	Text string
}

Text encapsulates some text, whether it is obtained from STT, a user input, or generated programmatically, and allows high-level operations to be conducted and chained on it (like converting to speech, or calling Interpret).

func ListenAndTranscribe

func ListenAndTranscribe(params *ListenParams) (*Text, error)

ListenAndTranscribe starts listening with the given parameters, except instead of waiting for the audio to finish capturing and returning a Speech object, it directly streams it to the API, transcribing it in real-time. When the transcription completes, this function directly returns a Text object. This reduces latency by a significant amount if you already know you want to transcribe the audio.

Note that the `ListenParams` is expected to contain values for every field, including defaults for fields that you did not want to change. To avoid having to do this, you should call `NewListenParams` to obtain an instance of `ListenParams` with all of the defaults filled in, and then override the ones you want to change. Alternatively, you can pass `nil` to simply use the default parameters.
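
To see why streaming while recording reduces latency, it can help to look at the general producer/consumer pattern in isolation. The sketch below is an illustration only and is not how the SDK is implemented: record, upload, and streamAndCount are hypothetical names, the "microphone" is a slice of byte chunks, and the "backend" simply counts the bytes it receives. The point is that the consumer starts reading as soon as the first chunk is written, instead of waiting for the whole recording.

```go
package main

import (
	"fmt"
	"io"
)

// record simulates a microphone: it writes each chunk of audio to the pipe
// as it becomes available and closes the pipe when recording ends.
func record(w *io.PipeWriter, chunks [][]byte) {
	for _, c := range chunks {
		if _, err := w.Write(c); err != nil {
			w.CloseWithError(err)
			return
		}
	}
	w.Close()
}

// upload simulates a transcription backend: it consumes audio as it arrives
// and returns the number of bytes received.
func upload(r io.Reader) (int64, error) {
	return io.Copy(io.Discard, r)
}

// streamAndCount connects the two with an in-memory pipe, so that uploading
// overlaps with recording rather than starting after it.
func streamAndCount(chunks [][]byte) (int64, error) {
	pr, pw := io.Pipe()
	go record(pw, chunks)
	return upload(pr)
}

func main() {
	chunks := [][]byte{[]byte("first chunk"), []byte("second chunk")}
	n, err := streamAndCount(chunks)
	if err != nil {
		fmt.Println("stream error:", err)
		return
	}
	fmt.Println("bytes streamed:", n)
}
```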

func NewText

func NewText(text string) *Text

NewText creates a Text object from the given text.

func (*Text) Interpret

func (t *Text) Interpret() (*Interpret, error)

Interpret calls the Aurora Interpret service on the text encapsulated in this object and converts it to an `Interpret` object, which contains the results from the API call.

func (*Text) Speech

func (t *Text) Speech() (*Speech, error)

Speech calls the Aurora TTS service on the text encapsulated in this object and converts it to a `Speech` object. Further operations can then be done on it, such as saving to file or speaking the resulting audio.

type TextHandleFunc

type TextHandleFunc func(t *Text, err error) bool

TextHandleFunc is the type of function that is passed to `ContinuouslyListenAndTranscribe`. It is called every time a speech utterance is decoded and converted to text. The resulting text object is passed to the function. If there was an error, that is passed as well. The function must return a boolean that indicates whether or not to continue listening (true to continue listening, false to stop listening).
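
Because the handler's return value is the only way to stop listening, it is often convenient to build handlers with closures. The following sketch shows a handler that stops when the user says a chosen phrase or when an error arrives. It runs without the SDK: Text and TextHandleFunc are local copies of the documented types, while stopOn, feed, and errMic are hypothetical names, and feed stands in for ContinuouslyListenAndTranscribe by delivering canned transcripts.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Text and TextHandleFunc are local copies of the documented types.
type Text struct {
	Text string
}

type TextHandleFunc func(t *Text, err error) bool

// errMic is a sample error used to exercise the error path.
var errMic = errors.New("microphone unavailable")

// stopOn returns a handler that prints each transcript and stops listening
// when an error arrives or when the transcript matches the given phrase
// (ignoring case and surrounding whitespace).
func stopOn(phrase string) TextHandleFunc {
	return func(t *Text, err error) bool {
		if err != nil {
			fmt.Println("stopping after error:", err)
			return false
		}
		fmt.Println("heard:", t.Text)
		return !strings.EqualFold(strings.TrimSpace(t.Text), phrase)
	}
}

// feed is a stand-in for ContinuouslyListenAndTranscribe: it delivers canned
// transcripts until the handler returns false and reports how many times
// the handler was called.
func feed(transcripts []string, handle TextHandleFunc) int {
	calls := 0
	for _, s := range transcripts {
		calls++
		if !handle(&Text{Text: s}, nil) {
			break
		}
	}
	return calls
}

func main() {
	handler := stopOn("stop listening")
	calls := feed([]string{"hello", "Stop listening", "never delivered"}, handler)
	fmt.Println("handler calls:", calls)

	// An error also ends the session.
	fmt.Println("continue after error:", handler(nil, errMic))
}
```

With the SDK imported, the same closure shape should work with *aurora.Text in place of the local Text type.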

Directories

Path Synopsis
api
Package api provides methods to contact the Aurora API and process the given requests.
backend
Package backend provides a generic interface to make calls to the Aurora backend.
Package audio contains structs and functions to allow operating on audio data
Package config contains configuration information that is used by the rest of the SDK.
Package error is a collection of error-related functionality that aims to unify all of the errors across the SDK.
