kana

Version: v1.1.0
Published: Mar 8, 2022 License: MIT Imports: 2 Imported by: 0

README

Kana Tools

A Go package for Wapuro Romaji, Katakana, and Hiragana Detection and Conversion

Kana Tools provides Romaji ←→ Kana transliteration based on Wāpuro rōmaji (ワープロローマ字) romanization.

Where possible, the library performs conversions with static mappings rather than computed rules, relying on the order of operations to ensure correct output. This gives a higher degree of Wāpuro conformity and makes the library easier to maintain.
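
The static, order-dependent approach described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: sketchTable and toHiraganaSketch are hypothetical names, and the table covers only a handful of syllables. It relies on strings.NewReplacer comparing candidate keys in argument order, so a longer key listed before its own prefix wins.

```go
package main

import (
	"fmt"
	"strings"
)

// sketchTable is a tiny, hypothetical romaji-to-hiragana table. Where one
// key is a prefix of another ("n" and "na"), the longer key is listed first
// so that it takes priority.
var sketchTable = []string{
	"kyo", "きょ",
	"ka", "か",
	"na", "な",
	"be", "べ",
	"u", "う",
	"n", "ん",
}

// toHiraganaSketch applies the whole table in a single pass over s.
// strings.NewReplacer compares keys in argument order at each position,
// so the order of the table decides which mapping is used.
func toHiraganaSketch(s string) string {
	return strings.NewReplacer(sketchTable...).Replace(s)
}

func main() {
	fmt.Println(toHiraganaSketch("benkyou")) // べんきょう
	fmt.Println(toHiraganaSketch("kana"))    // かな
}
```

If "n" were listed before "na" in this sketch, "kana" would come out as かんa, which is why the ordering of the table matters as much as its contents.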

Usage
import "github.com/mochi-co/kana-tools"
// Convert Hiragana and Katakana to Romaji
// ToRomaji(s string, vocalize bool) string
kana.ToRomaji("ひらがな", false) // -> "hiragana"
kana.ToRomaji("カタカナ", false) // -> "katakana"
kana.ToRomaji("ひらがな and カタカナ", false) // -> "hiragana and katakana"
// Convert Hiragana and Katakana to Cased Romaji
// ToRomajiCased(s string, vocalize bool) string
kana.ToRomajiCased("ひらがな", false) // -> "hiragana"
kana.ToRomajiCased("カタカナ", false) // -> "KATAKANA"
kana.ToRomajiCased("ひらがな and カタカナ", false) // -> "hiragana and KATAKANA"
// By default, ToRomaji outputs the literal transliteration of the kana.
// This means that づ and ぢ are du and di, respectively. To return the correct
// vocal pronunciation of a romaji string, set the vocalize parameter to true:
kana.ToRomaji("つづく", false) // -> "tsuduku"
kana.ToRomaji("つづく", true) // -> "tsuzuku"
// Convert Romaji and Katakana to Hiragana
kana.ToHiragana("hiragana") // -> "ひらがな"
kana.ToHiragana("hiragana + カタカナ") // -> "ひらがな + かたかな"
// Convert Romaji and Hiragana to Katakana
kana.ToKatakana("katakana") // -> "カタカナ"
kana.ToKatakana("katakana + ひらがな") // -> "カタカナ + ヒラガナ"
// Convert Romaji to Hiragana and Katakana (case sensitive romaji)
kana.ToKana("hiragana + KATAKANA") // -> "ひらがな + カタカナ"
// String IS Hiragana
kana.IsHiragana("たべる") // -> true
kana.IsHiragana("食べる") // -> false
// String CONTAINS Hiragana
kana.ContainsHiragana("たべる") // -> true
kana.ContainsHiragana("食べる") // -> true
kana.ContainsHiragana("カタカナ") // -> false
// String IS Katakana
kana.IsKatakana("バナナ") // -> true
kana.IsKatakana("バナナ茶") // -> false
// String CONTAINS Katakana
kana.ContainsKatakana("バナナ") // -> true
kana.ContainsKatakana("バナナ茶") // -> true
kana.ContainsKatakana("ひらがな") // -> false
// String IS Kanji
kana.IsKanji("水") // -> true
kana.IsKanji("also 茶") // -> false
// String CONTAINS Kanji
kana.ContainsKanji("食べる") // -> true
kana.ContainsKanji("also 茶") // -> true
kana.ContainsKanji("ひらがな + カタカナ") // -> false
// Extract Kanji from String
kana.ExtractKanji("また、平易な日本語で伝える週刊ニュースも放送します。日本語") 
// -> []string{"平", "易", "日", "本", "語", "伝", "週", "刊", "放", "送", "日", "本", "語"}
Linguistic Considerations

A number of rule considerations and assumptions have been made while creating this library in order to conform to Wapuro romanization.

  • Long Vowels are indicated using repeated characters instead of macrons or circumflexes (oo/おお instead of ō):
    • benkyou/べんきょう, not benkyō.
    • toukyou/とうきょう, not Tōkyō.
    • obaasan/おばあさん, not obāsan.
    • Chōonpu (ー) are preferred for katakana and loan words, and will be preserved or converted to hyphens (-):
      • セーラー, not セエラア, becoming se-ra-
      • パーティー, not パアティィ, becoming pa-ti-
  • Particles are always converted literally:
    • は is ha, not wa.
    • を is wo, not o.
    • へ is he, not e, etc.
  • Moraic N (ん) is marked with an apostrophe where it would otherwise be ambiguous with な,に,ぬ,ね,の,にゃ,にゅ,にょ:
    • かんい is kan'i
    • しんよう is shin'you
    • ぜんにん is zennin
    • ぜんいん is zen'in
    • あんない is annai
  • Long Consonants marked with sokuons are doubled:
    • いっしょ is issho
    • ぱっぱ is pappa
    • ざっし is zasshi
    • cch uses the Revised Hepburn interpretation (tch) for alignment with English phonology:
      • まっちゃ is matcha, not maccha
      • こっち is kotchi, not kocchi
  • la, li, lu, le, lo are converted to ra, ri, ru, re, ro before transliteration.
  • Nihon-Shiki romanization is used to map input-ambiguous characters:
    • di and DI are ぢ and ヂ
    • du and DU are づ and ヅ
    • Set the vocalize parameter to true on ToRomaji and ToRomajiCased to normalize the returned romaji to its pronunciation: di as ji, du as zu.
  • じゃ, じゅ and じょ are ja, ju, and jo; however, jya, jyu, and jyo are also valid for a one-way romaji→kana conversion.
  • Isolated small vowel kana are romanized with 'x' prefixes, if they are not part of a larger composite:
    • フォト becomes "foto", as the ォ is part of the larger composite フォ.
    • The unnatural spelling パーティィ or ぱーてぃぃ becomes pa-tixi, not pa-tii or pa-texixi. The correct spelling is パーティ (pa-ti).
    • xa and XA are ぁ and ァ
    • xi and XI are ぃ and ィ
    • xu and XU are ぅ and ゥ
    • xe and XE are ぇ and ェ
    • xo and XO are ぉ and ォ
    • Dangling x's that remain after all other transliterations are converted into っ and ッ for hiragana and katakana respectively. The unnatural sequence "xx" will always become っっ or ッッ.
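
Two of the rules above, sokuon doubling (with tch in place of cch) and the apostrophe after a moraic n, can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: sketchKana, nextSyllable and toRomajiSketch are hypothetical names, and the table holds only the syllables needed for the examples.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sketchKana is a small, hypothetical kana-to-romaji table. Composites
// (しょ, ちゃ) are listed before their first kana so they match first.
var sketchKana = []struct{ kana, romaji string }{
	{"しょ", "sho"}, {"ちゃ", "cha"},
	{"い", "i"}, {"か", "ka"}, {"こ", "ko"}, {"ざ", "za"}, {"し", "shi"},
	{"ぜ", "ze"}, {"ち", "chi"}, {"に", "ni"}, {"ま", "ma"},
}

// nextSyllable returns the romaji for the kana at the start of s and the
// number of bytes it occupies, or ("", 0) if the table has no entry.
func nextSyllable(s string) (string, int) {
	for _, e := range sketchKana {
		if strings.HasPrefix(s, e.kana) {
			return e.romaji, len(e.kana)
		}
	}
	return "", 0
}

// toRomajiSketch romanizes hiragana using the sokuon and moraic n rules.
func toRomajiSketch(s string) string {
	var b strings.Builder
	for len(s) > 0 {
		switch {
		case strings.HasPrefix(s, "っ"):
			s = s[len("っ"):]
			next, _ := nextSyllable(s)
			switch {
			case strings.HasPrefix(next, "ch"):
				b.WriteByte('t') // tch rather than cch
			case next != "":
				b.WriteByte(next[0]) // double the following consonant
			}
		case strings.HasPrefix(s, "ん"):
			s = s[len("ん"):]
			b.WriteByte('n')
			next, _ := nextSyllable(s)
			// An apostrophe separates n from a following vowel or y.
			if next != "" && strings.ContainsRune("aiueoy", rune(next[0])) {
				b.WriteByte('\'')
			}
		default:
			romaji, n := nextSyllable(s)
			if n == 0 {
				// Not in the sketch table: copy one rune through unchanged.
				_, size := utf8.DecodeRuneInString(s)
				b.WriteString(s[:size])
				s = s[size:]
				continue
			}
			b.WriteString(romaji)
			s = s[n:]
		}
	}
	return b.String()
}

func main() {
	for _, w := range []string{"いっしょ", "まっちゃ", "かんい", "ぜんにん"} {
		fmt.Println(toRomajiSketch(w)) // issho, matcha, kan'i, zennin
	}
}
```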

Review tables.go for romaji and kana character mapping references.

Contributions

Open an issue to report a bug, ask a question, or make a feature request!

License

MIT License.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func ContainsHiragana

func ContainsHiragana(s string) bool

ContainsHiragana returns true if a string contains any hiragana characters.

func ContainsKanji

func ContainsKanji(s string) bool

ContainsKanji returns true if a string contains any kanji characters.

func ContainsKatakana

func ContainsKatakana(s string) bool

ContainsKatakana returns true if a string contains any katakana characters.
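
The Contains checks can be approximated with the script tables in Go's unicode package. The sketch below is an assumption about one reasonable approach, not the package's source; containsScript and the two wrappers are hypothetical names.

```go
package main

import (
	"fmt"
	"unicode"
)

// containsScript reports whether any rune in s belongs to the given
// Unicode script table.
func containsScript(s string, table *unicode.RangeTable) bool {
	for _, r := range s {
		if unicode.Is(table, r) {
			return true
		}
	}
	return false
}

// containsHiraganaSketch and containsKatakanaSketch are hypothetical
// stand-ins for the documented functions.
func containsHiraganaSketch(s string) bool { return containsScript(s, unicode.Hiragana) }
func containsKatakanaSketch(s string) bool { return containsScript(s, unicode.Katakana) }

func main() {
	fmt.Println(containsHiraganaSketch("食べる"))  // true
	fmt.Println(containsHiraganaSketch("カタカナ")) // false
	fmt.Println(containsKatakanaSketch("バナナ茶")) // true
}
```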

func ExtractKanji

func ExtractKanji(s string) []string

ExtractKanji returns a slice containing all kanji characters found in a string, in the order in which they were found. If a kanji exists multiple times in a string, then each instance of the kanji will be returned.
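
The behaviour described for ExtractKanji (document order, duplicates kept) can be sketched with unicode.Han. This is a hedged approximation: extractKanjiSketch is a hypothetical name, and the set of characters the real package treats as kanji may differ from the Unicode Han script table.

```go
package main

import (
	"fmt"
	"unicode"
)

// extractKanjiSketch collects every Han-script rune in s, in the order
// found, keeping repeated characters.
func extractKanjiSketch(s string) []string {
	var out []string
	for _, r := range s {
		if unicode.Is(unicode.Han, r) {
			out = append(out, string(r))
		}
	}
	return out
}

func main() {
	fmt.Println(extractKanjiSketch("日本語で伝える。日本語")) // [日 本 語 伝 日 本 語]
}
```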

func HiraganaToKatakana

func HiraganaToKatakana(r rune) rune

HiraganaToKatakana replaces a single hiragana character with the unicode equivalent katakana character.

func IsHiragana

func IsHiragana(s string) bool

IsHiragana returns true if every element of a string is hiragana, except for characters indicated in sanitizeIsChecks (spaces and dashes).

func IsKanji

func IsKanji(s string) bool

IsKanji returns true if every element of a string is a kanji character, except for characters indicated in sanitizeIsChecksKanji (spaces).

func IsKatakana

func IsKatakana(s string) bool

IsKatakana returns true if every element of a string is katakana, except for characters indicated in sanitizeIsChecks (spaces and dashes).
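
The Is checks differ from the Contains checks in that every rune must match, apart from an ignored set. A minimal sketch follows, assuming the ignored set is the ASCII space, the ASCII hyphen and the chōonpu (ー); the real contents of sanitizeIsChecks are not shown in this document, and isScript and its wrappers are hypothetical names. Returning false for an empty string is also an assumption.

```go
package main

import (
	"fmt"
	"unicode"
)

// isScript reports whether every rune in s belongs to table, skipping
// spaces and dashes. A string with no matching rune at all yields false.
func isScript(s string, table *unicode.RangeTable) bool {
	seen := false
	for _, r := range s {
		switch {
		case r == ' ' || r == '-' || r == 'ー':
			continue // assumed ignorable characters
		case unicode.Is(table, r):
			seen = true
		default:
			return false
		}
	}
	return seen
}

// Hypothetical stand-ins for the documented functions.
func isHiraganaSketch(s string) bool { return isScript(s, unicode.Hiragana) }
func isKatakanaSketch(s string) bool { return isScript(s, unicode.Katakana) }

func main() {
	fmt.Println(isHiraganaSketch("たべる"))  // true
	fmt.Println(isHiraganaSketch("食べる"))  // false
	fmt.Println(isKatakanaSketch("バナナ茶")) // false
}
```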

func KatakanaToHiragana

func KatakanaToHiragana(r rune) rune

KatakanaToHiragana replaces a single katakana character with the unicode equivalent hiragana character.
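
The rune-level conversions can be sketched using the fixed distance between the two Unicode blocks: hiragana U+3041 to U+3096 and katakana U+30A1 to U+30F6 are laid out in parallel, 0x60 apart. The function names below are hypothetical, and the package itself may handle characters outside these ranges differently.

```go
package main

import (
	"fmt"
	"strings"
)

// kanaOffset is the distance between a hiragana code point and its
// katakana counterpart (U+30A1 - U+3041).
const kanaOffset = 0x60

// hiraganaToKatakanaSketch shifts runes in U+3041..U+3096 up by kanaOffset
// and returns every other rune unchanged.
func hiraganaToKatakanaSketch(r rune) rune {
	if r >= 0x3041 && r <= 0x3096 {
		return r + kanaOffset
	}
	return r
}

// katakanaToHiraganaSketch shifts runes in U+30A1..U+30F6 down by
// kanaOffset and returns every other rune unchanged.
func katakanaToHiraganaSketch(r rune) rune {
	if r >= 0x30A1 && r <= 0x30F6 {
		return r - kanaOffset
	}
	return r
}

func main() {
	// strings.Map applies a rune function across a whole string.
	fmt.Println(strings.Map(hiraganaToKatakanaSketch, "ひらがな")) // ヒラガナ
	fmt.Println(strings.Map(katakanaToHiraganaSketch, "カタカナ")) // かたかな
}
```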

func ToHiragana

func ToHiragana(s string) string

ToHiragana converts wapuro-hepburn romaji into the equivalent hiragana.

func ToKana

func ToKana(s string) string

ToKana converts wapuro-hepburn uppercase and lowercase romaji into katakana and hiragana respectively.

func ToKatakana

func ToKatakana(s string) string

ToKatakana converts wapuro-hepburn romaji into the equivalent katakana.

func ToRomaji

func ToRomaji(s string, vocalize bool) string

ToRomaji converts hiragana and/or katakana to lowercase romaji. By default, the literal transliterations of づ and ぢ are used, returning du and di, respectively. Set vocalize to true to return the romaji in its correctly pronounced form: zu and ji.
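
The effect of vocalize can be pictured as a post-processing step over literal romaji. The sketch below is a naive approximation that rewrites only the two pairs named in this document (du to zu, di to ji); vocalizeSketch is a hypothetical name, and how the package applies the rule internally is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// vocalizer rewrites the literal Nihon-shiki spellings to their
// pronounced forms. Only the two pairs named in the documentation are
// handled in this sketch.
var vocalizer = strings.NewReplacer("du", "zu", "di", "ji")

// vocalizeSketch normalizes literal lowercase romaji to its pronunciation.
func vocalizeSketch(romaji string) string {
	return vocalizer.Replace(romaji)
}

func main() {
	fmt.Println(vocalizeSketch("tsuduku")) // tsuzuku
	fmt.Println(vocalizeSketch("hanadi"))  // hanaji
}
```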

func ToRomajiCased

func ToRomajiCased(s string, vocalize bool) string

ToRomajiCased converts hiragana and/or katakana to cased romaji, where hiragana and katakana are presented in lowercase and uppercase respectively.

Types

This section is empty.
