Documentation ¶
Overview ¶
Package hangulize transcribes non-Korean words into Hangul.
"Hello!" -> "헬로!"
Hangulize was inspired by Brian Jongseong Park (http://iceager.egloos.com/2610028). Based on this idea, the original Hangulize was developed in Python and released in 2010 (https://github.com/sublee/hangulize). Since then, it has served as a web application at https://hangulize.org/ and has been of great help to Korean translators.
This Go re-implementation is a reboot of Hangulize with feature improvements.
Procedure ¶
Hangulize transcribes in 5 steps: "Normalize", "Group", "Rewrite", "Transcribe", and "Syllabify". To clarify these concepts, consider an imaginary example that transcribes the English "Hello!" into "헬로!" (English is not actually supported yet).
First, Hangulize normalizes letter cases:
"Hello" -> "hello!"
And then, it groups letters by meanings:
"hello!" -> "hello", "!"
After that, the grouped chunks are rewritten according to rules specific to the source language. This step usually minimizes the differences between pronunciation and spelling:
"hello", "!" -> "heˈlō", "!"
And it transcribes rewritten chunks into Hangul Jamo phonemes.
"heˈlō", "!" -> "ㅎㅔ-ㄹㄹㅗ", "!"
Finally, it composes Jamo phonemes into Hangul syllabic blocks and joins all groups.
"ㅎㅔ-ㄹㄹㅗ", "!" -> "헬로!"
Extended Procedure ¶
Some languages, such as Japanese, may require 2 more steps: "Transliterate" and "Localize". The former runs before the Normalize step, and the latter runs after the Syllabify step.
Japanese uses Kanji, which are ideograms. The Kana reading of Kanji is called Furigana. Getting Furigana from Kanji requires lexical analysis based on several dictionaries. The Transliterate step guesses the phonograms from a spelling based on such lexical analysis.
"日本語" -> "ニホンゴ"
Furthermore, Japanese uses full-width characters for punctuation while Korean and European languages use half-width ones. Full-width punctuation needs to be replaced with its half-width counterpart and a space to produce a natural-looking Korean word. The Localize step performs this replacement.
"이마、아이니유키마스" -> "이마, 아이니유키마스"
Spec ¶
A spec is written in the HSL format, which is a configuration DSL for Hangulize 2. One spec describes one language transcription system. So we need to describe the language first:
    lang:
        id      = "ita"
        codes   = "it", "ita" # ISO 639-1 and 3 codes
        english = "Italian"
        korean  = "이탈리아어"
        script  = "roman"
Then write about yourself and the stage of this spec:
    config:
        author = "John Doe <john@example.com>"
        stage  = "draft"
We will soon write many patterns in rewrite/transcribe rules. Some expressions may appear many times. To avoid repeating ourselves, we can use variables and macros.
A variable is a set of letters. A variable in a pattern matches one of its letters. Variable "foo" can be referenced with "<foo>" in the patterns.
    vars:
        "vowels" = "a", "e", "i", "o", "u"
A macro expression is replaced with its target before the patterns are parsed. "@" is a commonly used macro for the "<vowels>" variable:
    macros:
        "@" = "<vowels>"
Now we can write "rewrite" rules. A rule consists of a Pattern and an RPattern. The Pattern matches letters in a word. The RPattern represents how the matched letters should be replaced. A word replaced by one rule becomes the input for the next rule:
    rewrite:
        "^gli$"    -> "li"
        "^gli{@}"  -> "li"
        "{@}gli"   -> "li"
        "gn{@}"    -> "nJ"
Pattern is based on regular expressions but has its own custom syntax. We call it "HRE", which stands for "Hangulize-specific Regular Expression".
"transcribe" rules are exactly same with "rewrite" rules. But it's RPatterns represent Hangul Jamo phonemes. In contrast to "rewrite", a replaced word won't become as the input for the next rules:
    transcribe:
        "b" -> "ㅂ"
        "d" -> "ㄷ"
        "f" -> "ㅍ"
        "g" -> "ㄱ"
Finally, we should write expected transcription examples. They are used for unit testing. Verify your spec yourself:
    test:
        "allegretto" -> "알레그레토"
        "gita"       -> "지타"
        "bisnonno"   -> "비스논노"
        "Pinocchio"  -> "피노키오"
Example ¶
    package main

    import (
        "fmt"

        "github.com/hangulize/hangulize"
    )

    func main() {
        // Person names from http://iceager.egloos.com/2610028
        catalin, _ := hangulize.Hangulize("ron", "Cătălin Moroşanu")
        fmt.Println(catalin)

        jerrel, _ := hangulize.Hangulize("nld", "Jerrel Venetiaan")
        fmt.Println(jerrel)

        vitor, _ := hangulize.Hangulize("por", "Vítor Constâncio")
        fmt.Println(vitor)
    }

Output:

    커털린 모로샤누
    예럴 페네티안
    비토르 콘스탄시우
Index ¶
- Variables
- func Hangulize(lang string, word string) (string, error)
- func ListLangs() []string
- func Translits() map[string]Translit
- func UnloadSpec(lang string)
- func UnuseTranslit(scheme string) bool
- func UseTranslit(t Translit) bool
- type Config
- type Hangulizer
- type Language
- type Rule
- type Spec
- type Step
- type Trace
- type Traces
- type Translit
Variables ¶
    var AllSteps = []Step{
        Input,
        Transliterate,
        Normalize,
        Group,
        Rewrite,
        Transcribe,
        Syllabify,
        Localize,
    }
AllSteps is the slice of all steps.
var ErrSpecNotFound = errors.New("spec not found")
ErrSpecNotFound occurs when the spec for the given language is not found.
var ErrTranslit = errors.New("translit error")
ErrTranslit occurs when a transliteration has failed.
var ErrTranslitNotImported = errors.New("translit not imported")
ErrTranslitNotImported occurs when the selected spec requires a Translit but it has not been imported yet.
Functions ¶
func Hangulize ¶
Hangulize transcribes a non-Korean word into Hangul, which is the Korean alphabet.
For example, it will transcribe "Владивосто́к" in Russian into "블라디보스토크".
It is the simplest and most useful API of this package.
Example (Cappuccino) ¶
    package main

    import (
        "fmt"

        "github.com/hangulize/hangulize"
    )

    func main() {
        cappuccino, _ := hangulize.Hangulize("ita", "Cappuccino")
        fmt.Println(cappuccino)
    }
Output: 카푸치노
Example (Nietzsche) ¶
    package main

    import (
        "fmt"

        "github.com/hangulize/hangulize"
    )

    func main() {
        nietzsche, _ := hangulize.Hangulize("deu", "Friedrich Wilhelm Nietzsche")
        fmt.Println(nietzsche)
    }
Output: 프리드리히 빌헬름 니체
Example (ShinkaiMakoto) ¶
    package main

    import (
        "fmt"

        "github.com/hangulize/hangulize"
    )

    func main() {
        // import "github.com/hangulize/hangulize/translit"
        // translit.Install()

        shinkai, _ := hangulize.Hangulize("jpn", "新海誠")
        fmt.Println(shinkai)
    }
Output: 신카이 마코토
func ListLangs ¶
func ListLangs() []string
ListLangs returns the list of language names of the bundled specs. A bundled spec can be loaded by LoadSpec.
Example ¶
Here are all supported languages.

    package main

    import (
        "fmt"

        "github.com/hangulize/hangulize"
    )

    func main() {
        for _, lang := range hangulize.ListLangs() {
            fmt.Println(lang)
        }
    }
Output: aze bel bul cat ces chi cym deu ell epo est fin grc hbs hun isl ita jpn jpn-ck kat-1 kat-2 lat lav lit mkd nld pol por por-br ron rus slk slv spa sqi swe tur ukr vie wlm
func UnloadSpec ¶ added in v0.2.11
func UnloadSpec(lang string)
UnloadSpec flushes a cached spec to free memory.
func UnuseTranslit ¶ added in v0.5.0
UnuseTranslit removes an imported Translit from the default registry.
func UseTranslit ¶ added in v0.5.0
UseTranslit imports a Translit into the default registry.
Types ¶
type Hangulizer ¶
    type Hangulizer interface {
        Spec() *Spec
        Translits() map[string]Translit
        UseTranslit(Translit) bool
        UnuseTranslit(scheme string) bool
        Hangulize(word string) (string, error)
        HangulizeTrace(word string) (string, Traces, error)
    }
Hangulizer is dedicated to a specific language. It provides the transcription logic for the underlying spec.
func New ¶ added in v0.5.0
func New(spec *Spec) Hangulizer
New creates a hangulizer for a Spec.
Example ¶
    package main

    import (
        "fmt"

        "github.com/hangulize/hangulize"
    )

    func main() {
        spec, _ := hangulize.LoadSpec("nld")
        h := hangulize.New(spec)

        gogh, _ := h.Hangulize("Vincent van Gogh")
        fmt.Println(gogh)
    }
Output: 빈센트 반고흐
type Language ¶
    type Language struct {
        ID       string    // Arbitrary, but identifiable language ID.
        Codes    [2]string // [0]: ISO 639-1 code, [1]: ISO 639-3 code
        English  string    // The language name in English.
        Korean   string    // The language name in Korean.
        Script   string
        Translit []string
    }
Language identifies a natural language.
type Rule ¶
Rule is a pair of Pattern and RPattern.
type Spec ¶
    type Spec struct {
        // Meta information sections
        Lang   Language
        Config Config

        // Helper setting sections
        Macros    map[string]string
        Vars      map[string][]string
        Normalize map[string][]string

        // Rewrite/Transcribe
        Rewrite    []Rule
        Transcribe []Rule

        // Test examples
        Test [][2]string

        // Source code
        Source string
        // contains filtered or unexported fields
    }

Spec represents a transcription specification for a language.
func LoadSpec ¶
LoadSpec finds a bundled spec by the given language name. Once it loads a spec, it will cache the spec.
type Step ¶ added in v0.3.0
type Step int
Step is an identifier for each step in the Hangulize procedure.
    const (
        // Input step just records the beginning.
        Input Step

        // Transliterate step converts the spelling to the phonograms.
        Transliterate

        // Normalize step eliminates letter case to make the next steps work easier.
        Normalize

        // Group step associates meaningful letters.
        Group

        // Rewrite step minimizes the gap between pronunciation and spelling.
        Rewrite

        // Transcribe step determines Hangul spelling for the pronunciation.
        Transcribe

        // Syllabify step composes Jamo phonemes into Hangul syllabic blocks.
        Syllabify

        // Localize step converts foreign punctuations to fit in Korean.
        Localize
    )
type Trace ¶
Trace is emitted when a replacement occurs. It is used for tracing the internals of the Hangulize procedure.
type Translit ¶ added in v0.5.0
    type Translit interface {
        // Scheme returns the identifier string of a Translit.
        Scheme() string

        // Transliterate transliterates the given word.
        Transliterate(string) (string, error)
    }
Translit is an interface for a transliterator. It may convert a word from one script to another script. It also may guess phonograms from the spelling based on lexical analysis.
Directories ¶
Path | Synopsis |
---|---|
cmd | |
internal | |
jamo | Package jamo implements a Hangul composer. |
subword | Package subword implements a word replacement with a level. |
pkg | |
hre | Package hre provides the regular expression dialect for Hangulize called HRE. |
hsl | Package hsl implements a parser for the HSL format which is used for Hangulize. |
furigana | Package furigana implements the hangulize.Translit interface for Japanese Kanji. |
pinyin | Package pinyin implements the hangulize.Translit interface for Chinese Hanzi. |