Documentation
Overview
Package symbolset is used to define symbol sets, such as NST-SAMPA, Wikispeech-SAMPA, and so on.
Each symbol set is defined in a .sym file that lists each symbol together with its corresponding IPA representation. Each line has five tab-separated columns: DESCRIPTION, SYMBOL, IPA, IPA UNICODE, and CATEGORY.
Sample lines (Swedish Wikispeech SAMPA):

| DESCRIPTION | SYMBOL | IPA | IPA UNICODE | CATEGORY |
|---|---|---|---|---|
| sil | i: | iː | U+0069U+02D0 | Syllabic |
| aula | au | a⁀ʊ | U+0061U+2040U+028A | Syllabic |
| bok | b | b | U+0062 | NonSyllabic |
| forna | rn | ɳ | U+0273 | NonSyllabic |
| syllable delimiter | . | . | U+002E | SyllableDelimiter |
| accent I | " | ˈ | U+02C8 | Stress |
| accent II | "" | ˈ̀ | U+02C8U+0300 | Stress |
| secondary stress | % | ˌ | U+02CC | Stress |

The header line is required and must be the first line of the file. As the sample shows, the IPA UNICODE column gives the code points of the IPA string in the format U+<NUMBER>, with no space between the code points of a sequence.
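To make the U+<NUMBER> notation concrete, here is a minimal sketch of reading one data line and checking that the IPA column agrees with the IPA UNICODE column. It assumes the columns are separated by tab characters; the `symLine` type and the `parseSymLine` and `decodeUnicode` helpers are hypothetical and not part of the package API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// symLine holds the five columns of one .sym data line (hypothetical type).
type symLine struct {
	Desc, Symbol, IPA, IPAUnicode, Category string
}

// decodeUnicode turns a U+<NUMBER> sequence such as "U+0069U+02D0" into the
// string it encodes; each NUMBER is read as a hexadecimal code point.
func decodeUnicode(s string) (string, error) {
	if !strings.HasPrefix(s, "U+") {
		return "", fmt.Errorf("%q does not start with U+", s)
	}
	var b strings.Builder
	for _, hex := range strings.Split(s, "U+")[1:] {
		n, err := strconv.ParseUint(hex, 16, 32)
		if err != nil {
			return "", fmt.Errorf("bad code point %q in %q: %v", hex, s, err)
		}
		b.WriteRune(rune(n))
	}
	return b.String(), nil
}

// parseSymLine splits a data line on tabs (assumed separator) and checks that
// the IPA column matches the decoded IPA UNICODE column.
func parseSymLine(line string) (symLine, error) {
	fs := strings.Split(line, "\t")
	if len(fs) != 5 {
		return symLine{}, fmt.Errorf("expected 5 tab-separated fields, got %d in %q", len(fs), line)
	}
	l := symLine{Desc: fs[0], Symbol: fs[1], IPA: fs[2], IPAUnicode: fs[3], Category: fs[4]}
	ipa, err := decodeUnicode(l.IPAUnicode)
	if err != nil {
		return symLine{}, err
	}
	if ipa != l.IPA {
		return symLine{}, fmt.Errorf("IPA %q does not match %s (%q)", l.IPA, l.IPAUnicode, ipa)
	}
	return l, nil
}

func main() {
	// The "sil" sample line, with tabs between the columns.
	l, err := parseSymLine("sil\ti:\ti\u02d0\tU+0069U+02D0\tSyllabic")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s -> %s (%s)\n", l.Symbol, l.IPA, l.Category)
}
```

Cross-checking the two IPA columns catches typos in hand-edited files, since a mistyped glyph and its code points will usually disagree.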
Each symbol set has a name, extracted from the .sym file name.
Legal categories (predefined in code):
- Syllabic: syllabic phonemes (typically vowels and syllabic consonants)
- NonSyllabic: non-syllabic phonemes (typically consonants)
- Stress: stress and accent symbols (primary, secondary, tone accents, etc.)
- PhonemeDelimiter: phoneme delimiters (white space, empty string, etc.)
- SyllableDelimiter: syllable delimiters
- MorphemeDelimiter: morpheme delimiters that need not align with morpheme boundaries in the decompounded orthography
- CompoundDelimiter: compound delimiters that should be aligned with compound boundaries in the decompounded orthography
- WordDelimiter: word delimiters
For real-world examples (used for unit tests), see the test_data folder: https://github.com/stts-se/pronlex/tree/master/symbolset/test_data
Index
- Variables
- func LoadSymbolSetsFromDir(dirName string) (map[string]SymbolSet, error)
- type IPASymbol
- type Symbol
- type SymbolCat
- type SymbolSet
  - func LoadSymbolSet(fName string) (SymbolSet, error)
  - func LoadSymbolSetWithName(name string, fName string) (SymbolSet, error)
  - func NewSymbolSet(name string, symbols []Symbol) (SymbolSet, error)
  - func NewSymbolSetWithTests(name string, symbols []Symbol, testLines []string, checkForDups bool) (SymbolSet, error)
  - func (ss SymbolSet) ContainsSymbols(trans string, symbols []Symbol) (bool, error)
  - func (ss SymbolSet) ConvertFromIPA(trans string) (string, error)
  - func (ss SymbolSet) ConvertToIPA(trans string) (string, error)
  - func (ss SymbolSet) Get(symbol string) (Symbol, error)
  - func (ss SymbolSet) GetFromIPA(ipa string) (Symbol, error)
  - func (ss SymbolSet) SplitIPATranscription(input string) ([]string, error)
  - func (ss SymbolSet) SplitTranscription(input string) ([]string, error)
  - func (ss SymbolSet) ValidIPASymbol(symbol string) bool
  - func (ss SymbolSet) ValidSymbol(symbol string) bool
- type Type
Constants
This section is empty.
Variables
var SymbolSetSuffix = ".sym"
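SymbolSetSuffix defines the file name extension for symbol set files.
Functions
func LoadSymbolSetsFromDir
func LoadSymbolSetsFromDir(dirName string) (map[string]SymbolSet, error)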
Types
type Symbol
Symbol represents a phoneme, stress, or delimiter symbol used in transcriptions, including its IPA symbol and Unicode representation.
type SymbolCat
type SymbolCat int
SymbolCat is used to categorize transcription symbols.

    const (
        // Syllabic is used for syllabic phonemes (typically vowels and syllabic consonants)
        Syllabic SymbolCat = iota
        // NonSyllabic is used for non-syllabic phonemes (typically consonants)
        NonSyllabic
        // Stress is used for stress and accent symbols (primary, secondary, tone accents, etc)
        Stress
        // PhonemeDelimiter is used for phoneme delimiters (white space, empty string, etc)
        PhonemeDelimiter
        // SyllableDelimiter is used for syllable delimiters
        SyllableDelimiter
        // MorphemeDelimiter is used for morpheme delimiters that need not align with
        // morpheme boundaries in the decompounded orthography
        MorphemeDelimiter
        // CompoundDelimiter is used for compound delimiters that should be aligned
        // with compound boundaries in the decompounded orthography
        CompoundDelimiter
        // WordDelimiter is used for word delimiters
        WordDelimiter
    )

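The CATEGORY column of a .sym file holds one of the constant names above. A minimal sketch of mapping that text to a SymbolCat follows; the `parseSymbolCat` helper and its lookup table are illustrative assumptions, and the constant block is copied so that the example compiles on its own.

```go
package main

import "fmt"

// SymbolCat and its constants are copied from the declaration above so that
// this sketch compiles on its own.
type SymbolCat int

const (
	Syllabic SymbolCat = iota
	NonSyllabic
	Stress
	PhonemeDelimiter
	SyllableDelimiter
	MorphemeDelimiter
	CompoundDelimiter
	WordDelimiter
)

// catByName maps the category names allowed in a .sym file to constants
// (hypothetical lookup table).
var catByName = map[string]SymbolCat{
	"Syllabic":          Syllabic,
	"NonSyllabic":       NonSyllabic,
	"Stress":            Stress,
	"PhonemeDelimiter":  PhonemeDelimiter,
	"SyllableDelimiter": SyllableDelimiter,
	"MorphemeDelimiter": MorphemeDelimiter,
	"CompoundDelimiter": CompoundDelimiter,
	"WordDelimiter":     WordDelimiter,
}

// parseSymbolCat returns the SymbolCat for a CATEGORY column value; unknown
// names are rejected with an error and the invalid value -1.
func parseSymbolCat(name string) (SymbolCat, error) {
	if c, ok := catByName[name]; ok {
		return c, nil
	}
	return -1, fmt.Errorf("unknown symbol category %q", name)
}

func main() {
	for _, name := range []string{"SyllableDelimiter", "Vowel"} {
		c, err := parseSymbolCat(name)
		fmt.Println(name, int(c), err)
	}
}
```

Returning -1 rather than 0 on error avoids silently yielding Syllabic, the zero value of the iota sequence.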
type SymbolSet

    type SymbolSet struct {
        Name    string
        Type    Type
        Symbols []Symbol
        // Phonemes: actual phonemes (syllabic and non-syllabic)
        Phonemes []Symbol
        // PhoneticSymbols: Phonemes and stress
        PhoneticSymbols []Symbol
        PhonemeRe        *regexp.Regexp
        SyllabicRe       *regexp.Regexp
        NonSyllabicRe    *regexp.Regexp
        SymbolRe         *regexp.Regexp
        PhonemeDelimiter Symbol
        // contains filtered or unexported fields
    }

SymbolSet is a struct for package-private usage. To create a new SymbolSet instance, use NewSymbolSet.
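func LoadSymbolSet
func LoadSymbolSet(fName string) (SymbolSet, error)
LoadSymbolSet loads a SymbolSet from a .sym file; the symbol set's name is taken from the file name.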
func LoadSymbolSetWithName
func LoadSymbolSetWithName(name string, fName string) (SymbolSet, error)
LoadSymbolSetWithName loads a SymbolSet from a .sym file and gives it the specified name.
func NewSymbolSet
func NewSymbolSet(name string, symbols []Symbol) (SymbolSet, error)
NewSymbolSet creates a SymbolSet with the given name and symbols, with built-in error checks.
func NewSymbolSetWithTests
func NewSymbolSetWithTests(name string, symbols []Symbol, testLines []string, checkForDups bool) (SymbolSet, error)
NewSymbolSetWithTests creates a SymbolSet with the given name and symbols, with built-in error checks.
func (SymbolSet) ContainsSymbols
func (ss SymbolSet) ContainsSymbols(trans string, symbols []Symbol) (bool, error)
ContainsSymbols checks whether a transcription contains certain phoneme symbols.
func (SymbolSet) ConvertFromIPA
func (ss SymbolSet) ConvertFromIPA(trans string) (string, error)
ConvertFromIPA maps an input IPA transcription into the current symbol set.
func (SymbolSet) ConvertToIPA
func (ss SymbolSet) ConvertToIPA(trans string) (string, error)
ConvertToIPA maps an input transcription in the current symbol set into an IPA transcription.
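Conversion in both directions can be pictured as a symbol-by-symbol lookup between the SYMBOL and IPA columns. The sketch below assumes the transcription has already been split into symbols and uses hypothetical lookup tables built from the sample lines; the separators (none for IPA, a space for SAMPA) are also assumptions. It is not the package's implementation, which may need to handle delimiters and other details.

```go
package main

import (
	"fmt"
	"strings"
)

// toIPA is a hypothetical SYMBOL -> IPA table built from the sample .sym lines.
var toIPA = map[string]string{
	"i:": "i\u02d0",
	"au": "a\u2040\u028a",
	"b":  "b",
	"rn": "\u0273",
	".":  ".",
	`"`:  "\u02c8",
	`""`: "\u02c8\u0300",
	"%":  "\u02cc",
}

// invert builds the reverse table (IPA -> SYMBOL); it assumes a one-to-one mapping.
func invert(m map[string]string) map[string]string {
	r := make(map[string]string, len(m))
	for k, v := range m {
		r[v] = k
	}
	return r
}

// convert maps each symbol through table and joins the results with sep.
func convert(symbols []string, table map[string]string, sep string) (string, error) {
	out := make([]string, len(symbols))
	for i, s := range symbols {
		t, ok := table[s]
		if !ok {
			return "", fmt.Errorf("no mapping for symbol %q", s)
		}
		out[i] = t
	}
	return strings.Join(out, sep), nil
}

func main() {
	ipa, err := convert([]string{`"`, "b", "au"}, toIPA, "")
	if err != nil {
		panic(err)
	}
	fmt.Println(ipa)

	// Reverse direction, joining symbols with a space (assumed delimiter).
	sym, err := convert([]string{"\u02c8", "b", "a\u2040\u028a"}, invert(toIPA), " ")
	if err != nil {
		panic(err)
	}
	fmt.Println(sym)
}
```

Failing on an unmapped symbol, rather than passing it through, keeps invalid input from leaking into the converted transcription.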
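func (SymbolSet) Get
func (ss SymbolSet) Get(symbol string) (Symbol, error)
func (SymbolSet) GetFromIPA
func (ss SymbolSet) GetFromIPA(ipa string) (Symbol, error)
GetFromIPA searches the SymbolSet for a symbol with the given IPA symbol string.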
func (SymbolSet) SplitIPATranscription
func (ss SymbolSet) SplitIPATranscription(input string) ([]string, error)
SplitIPATranscription splits the input IPA transcription into separate symbols.
func (SymbolSet) SplitTranscription
func (ss SymbolSet) SplitTranscription(input string) ([]string, error)
SplitTranscription splits the input transcription into separate symbols.
func (SymbolSet) ValidIPASymbol
func (ss SymbolSet) ValidIPASymbol(symbol string) bool
ValidIPASymbol checks whether a string is a valid IPA symbol in the symbol set.
func (SymbolSet) ValidSymbol
func (ss SymbolSet) ValidSymbol(symbol string) bool
ValidSymbol checks whether a string is a valid symbol in the symbol set.
type Type
type Type int
Type is used for accent placement, etc.

    const (
        // CMU is used for the phone set used in the CMU lexicon
        CMU Type = iota
        // SAMPA is used for SAMPA transcriptions (http://www.phon.ucl.ac.uk/home/sampa/)
        SAMPA
        // IPA is used for IPA transcriptions
        IPA
        // Other is used for symbol sets not defined in the types above
        Other
    )

Directories

| Path | Synopsis |
|---|---|
| converter | Package converter is used to convert between symbol sets from different languages. |
| mapper | Package mapper is used to map between different phonetic symbol sets, such as NST-SAMPA to Wikispeech-SAMPA, IPA to SAMPA, and so on. |