utf8

package

v1.23.0 Latest Latest Go to latest Published: Aug 14, 2024 License: MIT Imports: 0 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/shogo82148/std

Links

Open Source Insights

Documentation ¶

Overview ¶

utf8パッケージはUTF-8でエンコードされたテキストをサポートするための関数や定数を実装しています。ルーンとUTF-8バイトシーケンスの変換を行うための関数も含まれています。詳細はhttps://en.wikipedia.org/wiki/UTF-8を参照してください。

Index ¶

Constants
func AppendRune(p []byte, r rune) []byte
func DecodeLastRune(p []byte) (r rune, size int)
func DecodeLastRuneInString(s string) (r rune, size int)
func DecodeRune(p []byte) (r rune, size int)
func DecodeRuneInString(s string) (r rune, size int)
func EncodeRune(p []byte, r rune) int
func FullRune(p []byte) bool
func FullRuneInString(s string) bool
func RuneCount(p []byte) int
func RuneCountInString(s string) (n int)
func RuneLen(r rune) int
func RuneStart(b byte) bool
func Valid(p []byte) bool
func ValidRune(r rune) bool
func ValidString(s string) bool

Examples ¶

AppendRune
DecodeLastRune
DecodeLastRuneInString
DecodeRune
DecodeRuneInString
EncodeRune
EncodeRune (OutOfRange)
FullRune
FullRuneInString
RuneCount
RuneCountInString
RuneLen
RuneStart
Valid
ValidRune
ValidString

Constants ¶

View Source

const (
	RuneError = '\uFFFD'
	RuneSelf  = 0x80
	MaxRune   = '\U0010FFFF'
	UTFMax    = 4
)

エンコーディングに基本的な数値。

Variables ¶

This section is empty.

Functions ¶

func AppendRune ¶ added in v1.18.0

func AppendRune(p []byte, r rune) []byte

AppendRuneは、rのUTF-8エンコーディングをpの末尾に追加し、拡張されたバッファを返します。もしルーンが範囲外であれば、 RuneError のエンコーディングを追加します。

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	buf1 := utf8.AppendRune(nil, 0x10000)
	buf2 := utf8.AppendRune([]byte("init"), 0x10000)
	fmt.Println(string(buf1))
	fmt.Println(string(buf2))
}

Output:

𐀀
init𐀀

func DecodeLastRune ¶

func DecodeLastRune(p []byte) (r rune, size int)

DecodeLastRuneはpの最後のUTF-8エンコーディングを展開し、ルーンとそのバイト幅を返します。もしpが空なら、(RuneError, 0)を返します。それ以外の場合、エンコーディングが無効なら、(RuneError, 1)を返します。これらは正しい、空でないUTF-8に対しては不可能な結果です。

エンコーディングが無効な場合は、それが不正なUTF-8である、範囲外のルーンをエンコードしている、または値のための最短可能なUTF-8エンコーディングでない場合です。それ以外の検証は行われません。

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	b := []byte("Hello, 世界")

	for len(b) > 0 {
		r, size := utf8.DecodeLastRune(b)
		fmt.Printf("%c %v\n", r, size)

		b = b[:len(b)-size]
	}
}

Output:

界 3
世 3
  1
, 1
o 1
l 1
l 1
e 1
H 1

func DecodeLastRuneInString ¶

func DecodeLastRuneInString(s string) (r rune, size int)

DecodeLastRuneInStringは DecodeLastRune と同様ですが、入力は文字列です。もしsが空なら、(RuneError, 0)を返します。それ以外の場合、エンコーディングが無効なら、(RuneError, 1)を返します。これらは正しい、空でない UTF-8に対しては不可能な結果です。

エンコーディングが無効な場合は、それが不正なUTF-8である、範囲外のルーンをエンコードしている、または値のための最短可能なUTF-8エンコーディングでない場合です。それ以外の検証は行われません。

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	str := "Hello, 世界"

	for len(str) > 0 {
		r, size := utf8.DecodeLastRuneInString(str)
		fmt.Printf("%c %v\n", r, size)

		str = str[:len(str)-size]
	}
}

Output:

界 3
世 3
  1
, 1
o 1
l 1
l 1
e 1
H 1

func DecodeRune ¶

func DecodeRune(p []byte) (r rune, size int)

DecodeRuneはpの最初のUTF-8エンコーディングを展開し、ルーンとそのバイト幅を返します。もしpが空なら、(RuneError, 0)を返します。それ以外の場合、エンコーディングが無効なら、(RuneError, 1)を返します。これらは正しい、空でないUTF-8に対しては不可能な結果です。

エンコーディングが無効な場合は、それが不正なUTF-8である、範囲外のルーンをエンコードしている、または値のための最短可能なUTF-8エンコーディングでない場合です。それ以外の検証は行われません。

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	b := []byte("Hello, 世界")

	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		fmt.Printf("%c %v\n", r, size)

		b = b[size:]
	}
}

Output:

H 1
e 1
l 1
l 1
o 1
, 1
  1
世 3
界 3

func DecodeRuneInString ¶

func DecodeRuneInString(s string) (r rune, size int)

DecodeRuneInStringは DecodeRune と同様ですが、入力は文字列です。もしsが空なら、(RuneError, 0)を返します。それ以外の場合、エンコーディングが無効なら、(RuneError, 1)を返します。これらは正しい、空でない UTF-8に対しては不可能な結果です。

エンコーディングが無効な場合は、それが不正なUTF-8である、範囲外のルーンをエンコードしている、または値のための最短可能なUTF-8エンコーディングでない場合です。それ以外の検証は行われません。

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	str := "Hello, 世界"

	for len(str) > 0 {
		r, size := utf8.DecodeRuneInString(str)
		fmt.Printf("%c %v\n", r, size)

		str = str[size:]
	}
}

Output:

H 1
e 1
l 1
l 1
o 1
, 1
  1
世 3
界 3

func EncodeRune ¶

func EncodeRune(p []byte, r rune) int

EncodeRuneは、ルーンのUTF-8エンコーディングをpに書き込みます（これは十分に大きくなければなりません）。もしルーンが範囲外であれば、RuneError のエンコーディングを書き込みます。書き込まれたバイト数を返します。

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	r := '世'
	buf := make([]byte, 3)

	n := utf8.EncodeRune(buf, r)

	fmt.Println(buf)
	fmt.Println(n)
}

Output:

[228 184 150]
3

Example (OutOfRange) ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	runes := []rune{
		// 0未満、範囲外です。
		-1,
		// 0x10FFFF より大きいため、範囲外です。
		0x110000,
		// ユニコードの置換文字。
		utf8.RuneError,
	}
	for i, c := range runes {
		buf := make([]byte, 3)
		size := utf8.EncodeRune(buf, c)
		fmt.Printf("%d: %d %[2]s %d\n", i, buf, size)
	}
}

Output:

0: [239 191 189] � 3
1: [239 191 189] � 3
2: [239 191 189] � 3

func FullRune ¶

func FullRune(p []byte) bool

FullRuneは、pのバイトがルーンの完全なUTF-8エンコードで始まるかどうかを報告します。無効なエンコーディングは完全なルーンと見なされるため、幅-1のエラールーンとして変換されます。

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	buf := []byte{228, 184, 150} // こんにちは
	fmt.Println(utf8.FullRune(buf))
	fmt.Println(utf8.FullRune(buf[:2]))
}

Output:

true
false

func FullRuneInString ¶

func FullRuneInString(s string) bool

FullRuneInString は FullRune と似ていますが、入力は文字列です。

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	str := "世"
	fmt.Println(utf8.FullRuneInString(str))
	fmt.Println(utf8.FullRuneInString(str[:2]))
}

Output:

true
false

func RuneCount ¶

func RuneCount(p []byte) int

RuneCount は p 内のルーンの数を返します。間違ったエンコーディングや短いエンコーディングは、1バイトの幅を持つ単一のルーンとして扱われます。

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	buf := []byte("Hello, 世界")
	fmt.Println("bytes =", len(buf))
	fmt.Println("runes =", utf8.RuneCount(buf))
}

Output:

bytes = 13
runes = 9

func RuneCountInString ¶

func RuneCountInString(s string) (n int)

RuneCountInString is like RuneCount but its input is a string.

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	str := "Hello, 世界"
	fmt.Println("bytes =", len(str))
	fmt.Println("runes =", utf8.RuneCountInString(str))
}

Output:

bytes = 13
runes = 9

func RuneLen ¶

func RuneLen(r rune) int

RuneLenはルーンをUTF-8でエンコードするために必要なバイト数を返します。ルーンがUTF-8でエンコードすることができない場合は、-1を返します。

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	fmt.Println(utf8.RuneLen('a'))
	fmt.Println(utf8.RuneLen('界'))
}

Output:

1
3

func RuneStart ¶

func RuneStart(b byte) bool

RuneStartは、バイトが符号化された（おそらく無効な）ルーンの最初のバイトであるかどうかを報告します。2番目以降のバイトは常に上位2ビットが10に設定されます。

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	buf := []byte("a界")
	fmt.Println(utf8.RuneStart(buf[0]))
	fmt.Println(utf8.RuneStart(buf[1]))
	fmt.Println(utf8.RuneStart(buf[2]))
}

Output:

true
true
false

func Valid ¶

func Valid(p []byte) bool

Validは、pが完全に有効なUTF-8エンコードされたルーンで構成されているかどうかを示します。

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	valid := []byte("Hello, 世界")
	invalid := []byte{0xff, 0xfe, 0xfd}

	fmt.Println(utf8.Valid(valid))
	fmt.Println(utf8.Valid(invalid))
}

Output:

true
false

func ValidRune ¶ added in v1.1.0

func ValidRune(r rune) bool

ValidRuneは、rがUTF-8として正当にエンコードされるかどうかを報告します。範囲外のコードポイントやサロゲートペアの半分は不正です。

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	valid := 'a'
	invalid := rune(0xfffffff)

	fmt.Println(utf8.ValidRune(valid))
	fmt.Println(utf8.ValidRune(invalid))
}

Output:

true
false

func ValidString ¶

func ValidString(s string) bool

ValidStringは、sが完全に有効なUTF-8エンコードされたルーンで構成されているかどうかを報告します。

Example ¶

package main

import (
	"github.com/shogo82148/std/fmt"
	"github.com/shogo82148/std/unicode/utf8"
)

func main() {
	valid := "Hello, 世界"
	invalid := string([]byte{0xff, 0xfe, 0xfd})

	fmt.Println(utf8.ValidString(valid))
	fmt.Println(utf8.ValidString(invalid))
}

Output:

true
false

Types ¶

This section is empty.

Source Files ¶

View all Source files

utf8.go

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL