gomojimoji

v0.0.1
Published: Sep 25, 2022 License: Apache-2.0 Imports: 2 Imported by: 0

README

(go) mojimoji

This is a Go port of the excellent mojimoji library, which was originally written in Python.

It provides two functions:

  • HanToZen - half-width to full-width character conversion.
  • ZenToHan - full-width to half-width character conversion.

Each function accepts the following options:

  • ASCII - enable or disable ASCII translation.
  • Digits - enable or disable Digits translation.
  • Kana - enable or disable Kana translation.

All options are enabled by default; see the examples for their usage.

Logic is implemented as of commit aca2661.

Examples

HanToZen
fmt.Println(HanToZen("ﾆｭｰｼﾞｰﾗﾝﾄﾞ"))
fmt.Println(HanToZen("ﾆｭｰｼﾞｰﾗﾝﾄﾞ Auckland 6012", ASCII(true), Digits(false), Kana(false)))

// Output:
// ニュージーランド
// ﾆｭｰｼﾞｰﾗﾝﾄﾞ　Ａｕｃｋｌａｎｄ　6012
ZenToHan
fmt.Println(ZenToHan("ニュージーランド"))
fmt.Println(ZenToHan("ニュージーランド Ａｕｃｋｌａｎｄ ０１２３", Kana(false), Digits(true)))

// Output:
// ﾆｭｰｼﾞｰﾗﾝﾄﾞ
// ニュージーランド Auckland 0123

Benchmark

Original library etc.

Original mojimoji, zenhan and unicodedata on my system, for comparison:

In [4]: s = u'ＡＢＣＤＥＦＧ０１２３４５' * 10

In [5]: %time for n in range(1000000): mojimoji.zen_to_han(s)
CPU times: user 3.24 s, sys: 1.28 ms, total: 3.24 s
Wall time: 3.24 s

In [6]: %time for n in range(1000000): zenhan.z2h(s)
CPU times: user 26.2 s, sys: 16.3 ms, total: 26.2 s
Wall time: 26.2 s

In [7]: %time for n in range(1000000): unicodedata.normalize('NFKC', s)
CPU times: user 3.12 s, sys: 15.4 ms, total: 3.13 s
Wall time: 3.14 s
This library

ZenToHan and HanToZen use different approaches:

  • ZenToHan uses strings.Builder, which is simpler to implement.
  • HanToZen uses direct slice operations to allow for seeking when needed.
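The two approaches can be sketched as follows. This is a minimal illustration, not the library's code: zenToHanDigits, hanToZenKana and the two small tables are hypothetical names and excerpts chosen for the example. The first function writes one output rune per input rune into a strings.Builder; the second indexes into a rune slice so that it can look one rune ahead and merge a half-width kana with a following voiced sound mark.

```go
package main

import (
	"fmt"
	"strings"
)

// zenToHanDigits is a hypothetical helper (not part of gomojimoji).
// It converts full-width digits to ASCII digits, one rune in, one rune out,
// which is the kind of job strings.Builder handles well.
func zenToHanDigits(text string) string {
	var sb strings.Builder
	sb.Grow(len(text)) // the result never needs more bytes than the input
	for _, r := range text {
		if r >= '０' && r <= '９' {
			r = r - '０' + '0'
		}
		sb.WriteRune(r)
	}
	return sb.String()
}

// Excerpts only; the real tables are much larger.
var (
	plainKana  = map[rune]rune{'ﾆ': 'ニ', 'ｭ': 'ュ', 'ｰ': 'ー', 'ｼ': 'シ', 'ﾗ': 'ラ', 'ﾝ': 'ン', 'ﾄ': 'ト'}
	voicedKana = map[rune]rune{'ｼ': 'ジ', 'ﾄ': 'ド'}
)

// hanToZenKana is a hypothetical helper (not part of gomojimoji).
// It works on a rune slice so it can peek at the next rune: a base kana
// followed by the half-width voiced sound mark becomes one full-width rune.
func hanToZenKana(text string) string {
	in := []rune(text)
	out := make([]rune, 0, len(in))
	for i := 0; i < len(in); i++ {
		r := in[i]
		if i+1 < len(in) && in[i+1] == 'ﾞ' {
			if z, ok := voicedKana[r]; ok {
				out = append(out, z)
				i++ // skip the sound mark that was just merged
				continue
			}
		}
		if z, ok := plainKana[r]; ok {
			r = z
		}
		out = append(out, r)
	}
	return string(out)
}

func main() {
	fmt.Println(zenToHanDigits("６０１２"))     // 6012
	fmt.Println(hanToZenKana("ﾆｭｰｼﾞｰﾗﾝﾄﾞ")) // ニュージーランド
}
```

The lookahead is the reason a plain rune-by-rune writer is not enough in the half-width to full-width direction: two input runes may collapse into one output rune.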

ZenToHan:

mojimoji (master)> go test -bench=BenchmarkZenToHanConv
goos: darwin
goarch: amd64
pkg: github.com/rusq/gomojimoji
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkZenToHanConv-16               1        2880823810 ns/op
--- BENCH: BenchmarkZenToHanConv-16
    mojimoji_test.go:98: 2.88079814s
PASS
ok      github.com/rusq/gomojimoji      2.977s

HanToZen:

mojimoji (master)> go test -bench=BenchmarkHanToZen    
goos: darwin
goarch: amd64
pkg: github.com/rusq/gomojimoji
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkHanToZenConv-16               1        2712209539 ns/op
--- BENCH: BenchmarkHanToZenConv-16
    mojimoji_test.go:107: 2.712166151s
PASS
ok      github.com/rusq/gomojimoji      2.804s
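
For readers who want a per-call figure rather than a total for a fixed loop, a benchmark can let the testing package choose the iteration count through b.N. The sketch below is an assumption-laden illustration, not the repository's benchmark: foldASCII is a hypothetical stand-in for the function under test (it only shifts the U+FF01..U+FF5E block down to ASCII), and testing.Benchmark is used so the example runs as an ordinary program.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// foldASCII is a hypothetical stand-in for the conversion being measured.
// Full-width ASCII forms U+FF01..U+FF5E sit exactly 0xFEE0 above U+0021..U+007E.
func foldASCII(s string) string {
	var sb strings.Builder
	sb.Grow(len(s))
	for _, r := range s {
		if r >= 0xFF01 && r <= 0xFF5E {
			r -= 0xFEE0
		}
		sb.WriteRune(r)
	}
	return sb.String()
}

// runBench measures foldASCII on the same input shape as the Python session
// above and returns the result, including the iteration count chosen by the
// testing package.
func runBench() testing.BenchmarkResult {
	s := strings.Repeat("ＡＢＣＤＥＦＧ０１２３４５", 10)
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			foldASCII(s)
		}
	})
}

func main() {
	res := runBench()
	fmt.Printf("%d iterations, %d ns/op\n", res.N, res.NsPerOp())
}
```

Numbers from this sketch are not comparable with the figures above, since it measures a different function.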

Documentation

Overview

Package mojimoji is a port of the mojimoji package to Go. Original: https://github.com/studio-ousia/mojimoji

Variables

var (
	ASCII_ZENKAKU_CHARS = []rune{'ａ', 'ｂ', 'ｃ', 'ｄ', 'ｅ', 'ｆ', 'ｇ', 'ｈ', 'ｉ', 'ｊ', 'ｋ', 'ｌ', 'ｍ', 'ｎ', 'ｏ', 'ｐ', 'ｑ', 'ｒ', 'ｓ', 'ｔ', 'ｕ', 'ｖ', 'ｗ', 'ｘ', 'ｙ', 'ｚ', 'Ａ', 'Ｂ', 'Ｃ', 'Ｄ', 'Ｅ', 'Ｆ', 'Ｇ', 'Ｈ', 'Ｉ', 'Ｊ', 'Ｋ', 'Ｌ', 'Ｍ', 'Ｎ', 'Ｏ', 'Ｐ', 'Ｑ', 'Ｒ', 'Ｓ', 'Ｔ', 'Ｕ', 'Ｖ', 'Ｗ', 'Ｘ', 'Ｙ', 'Ｚ', '！', '”', '＃', '＄', '％', '＆', '’', '（', '）', '＊', '＋', '，', '－', '．', '／', '：', '；', '＜', '＝', '＞', '？', '＠', '［', '￥', '］', '＾', '＿', '‘', '｛', '｜', '｝', '～', '\u3000'}
	ASCII_HANKAKU_CHARS = []rune{'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '!', '"', '#', '$', '%', '&', '\'', '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '¥', ']', '^', '_', '`', '{', '|', '}', '~', ' '}

	KANA_ZENKAKU_CHARS = []rune{'ア', 'イ', 'ウ', 'エ', 'オ', 'カ', 'キ', 'ク', 'ケ', 'コ', 'サ', 'シ', 'ス', 'セ', 'ソ', 'タ', 'チ', 'ツ', 'テ', 'ト', 'ナ', 'ニ', 'ヌ', 'ネ', 'ノ', 'ハ', 'ヒ', 'フ', 'ヘ', 'ホ', 'マ', 'ミ', 'ム', 'メ', 'モ', 'ヤ', 'ユ', 'ヨ', 'ラ', 'リ', 'ル', 'レ', 'ロ', 'ワ', 'ヲ', 'ン', 'ァ', 'ィ', 'ゥ', 'ェ', 'ォ', 'ッ', 'ャ', 'ュ', 'ョ', '。', '、', '・', '゛', '゜', '「', '」', 'ー'}
	KANA_HANKAKU_CHARS = []rune{'ｱ', 'ｲ', 'ｳ', 'ｴ', 'ｵ', 'ｶ', 'ｷ', 'ｸ', 'ｹ', 'ｺ', 'ｻ', 'ｼ', 'ｽ', 'ｾ', 'ｿ', 'ﾀ', 'ﾁ', 'ﾂ', 'ﾃ', 'ﾄ', 'ﾅ', 'ﾆ', 'ﾇ', 'ﾈ', 'ﾉ', 'ﾊ', 'ﾋ', 'ﾌ', 'ﾍ', 'ﾎ', 'ﾏ', 'ﾐ', 'ﾑ', 'ﾒ', 'ﾓ', 'ﾔ', 'ﾕ', 'ﾖ', 'ﾗ', 'ﾘ', 'ﾙ', 'ﾚ', 'ﾛ', 'ﾜ', 'ｦ', 'ﾝ', 'ｧ', 'ｨ', 'ｩ', 'ｪ', 'ｫ', 'ｯ', 'ｬ', 'ｭ', 'ｮ', '｡', '､', '･', 'ﾞ', 'ﾟ', '｢', '｣', 'ｰ'}

	DIGIT_ZENKAKU_CHARS = []rune{'０', '１', '２', '３', '４', '５', '６', '７', '８', '９'}
	DIGIT_HANKAKU_CHARS = []rune{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}

	KANA_TEN_MAP = map[rune]rune{
		'ガ': 'ｶ', 'ギ': 'ｷ', 'グ': 'ｸ', 'ゲ': 'ｹ', 'ゴ': 'ｺ',
		'ザ': 'ｻ', 'ジ': 'ｼ', 'ズ': 'ｽ', 'ゼ': 'ｾ', 'ゾ': 'ｿ',
		'ダ': 'ﾀ', 'ヂ': 'ﾁ', 'ヅ': 'ﾂ', 'デ': 'ﾃ', 'ド': 'ﾄ',
		'バ': 'ﾊ', 'ビ': 'ﾋ', 'ブ': 'ﾌ', 'ベ': 'ﾍ', 'ボ': 'ﾎ',
		'ヴ': 'ｳ',
	}

	KANA_MARU_MAP = map[rune]rune{
		'パ': 'ﾊ', 'ピ': 'ﾋ', 'プ': 'ﾌ', 'ペ': 'ﾍ', 'ポ': 'ﾎ',
	}
)

Character tables are taken from the original mojimoji Python library.
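
One way such tables can be consumed is sketched below, under stated assumptions: the parallel ZENKAKU/HANKAKU slices are zipped into a lookup map, and each KANA_TEN_MAP entry maps a voiced full-width kana to its half-width base kana, to which the half-width voiced sound mark is appended. The names zip, digitZen, digitHan, tenMap and zenToHan are hypothetical and the tables are short excerpts; the package's actual internals may differ.

```go
package main

import "fmt"

// zip pairs two equally long rune tables into a lookup map (from[i] -> to[i]).
func zip(from, to []rune) map[rune]rune {
	if len(from) != len(to) {
		panic("tables must have the same length")
	}
	m := make(map[rune]rune, len(from))
	for i, r := range from {
		m[r] = to[i]
	}
	return m
}

var (
	digitZen = []rune{'０', '１', '２', '３', '４', '５', '６', '７', '８', '９'}
	digitHan = []rune{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}

	// Excerpt of a voiced-kana table: full-width voiced kana -> half-width base kana.
	tenMap = map[rune]rune{'ガ': 'ｶ', 'ジ': 'ｼ', 'ド': 'ﾄ'}

	digits = zip(digitZen, digitHan)
)

// zenToHan is a hypothetical, reduced converter: digits map one to one,
// voiced kana expand to base kana plus the half-width voiced sound mark.
func zenToHan(text string) string {
	out := make([]rune, 0, len(text))
	for _, r := range text {
		if h, ok := digits[r]; ok {
			out = append(out, h)
			continue
		}
		if h, ok := tenMap[r]; ok {
			out = append(out, h, 'ﾞ')
			continue
		}
		out = append(out, r) // anything not in the tables passes through
	}
	return string(out)
}

func main() {
	fmt.Println(zenToHan("ガド０９")) // ｶﾞﾄﾞ09
}
```

A KANA_MARU_MAP entry would work the same way with the half-width semi-voiced mark 'ﾟ' in place of 'ﾞ'.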

Functions

func HanToZen

func HanToZen(text string, opt ...Option) string

HanToZen converts text to full-width runes. By default, all rune sets are converted; the caller can switch off individual rune sets by passing an Option.

Example
fmt.Println(HanToZen("ﾆｭｰｼﾞｰﾗﾝﾄﾞ"))
fmt.Println(HanToZen("ﾆｭｰｼﾞｰﾗﾝﾄﾞ Auckland 6012", ASCII(true), Digits(false), Kana(false)))
Output:

ニュージーランド
ﾆｭｰｼﾞｰﾗﾝﾄﾞ　Ａｕｃｋｌａｎｄ　6012

func ZenToHan

func ZenToHan(text string, opt ...Option) string

ZenToHan converts text to half-width runes. By default, all rune sets are converted; the caller can switch off individual rune sets by passing an Option.

Example
fmt.Println(ZenToHan("ニュージーランド"))
fmt.Println(ZenToHan("ニュージーランド Ａｕｃｋｌａｎｄ ０１２３", Kana(false), Digits(true)))
Output:

ﾆｭｰｼﾞｰﾗﾝﾄﾞ
ニュージーランド Auckland 0123

Types

type Option

type Option func(*options)

Option is the option function signature.

func ASCII

func ASCII(enable bool) Option

ASCII enables or disables conversion of ASCII runes (A-Z, a-z, and the punctuation and space listed in the ASCII tables).

func Digits

func Digits(enable bool) Option

Digits enables or disables conversion of digit runes (0-9).

func Kana

func Kana(enable bool) Option

Kana enables or disables conversion of Kana runes.
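
The Option type follows the functional options pattern. The sketch below shows how such options could be applied on top of all-enabled defaults. The fields of the unexported options struct and the apply helper are assumptions made for illustration; only the Option signature and the ASCII, Digits and Kana constructors come from the documentation above.

```go
package main

import "fmt"

// options holds the conversion switches. The field names are hypothetical;
// the real struct is unexported and not shown in the documentation.
type options struct {
	ascii, digits, kana bool
}

// Option mutates an options value, matching the documented signature.
type Option func(*options)

func ASCII(enable bool) Option  { return func(o *options) { o.ascii = enable } }
func Digits(enable bool) Option { return func(o *options) { o.digits = enable } }
func Kana(enable bool) Option   { return func(o *options) { o.kana = enable } }

// apply is a hypothetical helper: start with everything enabled, then let
// each Option override its switch in the order given.
func apply(opt ...Option) options {
	o := options{ascii: true, digits: true, kana: true}
	for _, fn := range opt {
		fn(&o)
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", apply())                           // {ascii:true digits:true kana:true}
	fmt.Printf("%+v\n", apply(Kana(false), Digits(false))) // {ascii:true digits:false kana:false}
}
```

Because the defaults are all true, passing ASCII(true) or Digits(true) explicitly, as the examples do, has no effect beyond documenting intent.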
