unicode

package

Version: v1.2.0 | Published: Sep 26, 2023 | License: MIT | Imports: 0 | Imported by: 0

Repository

github.com/shogo82148/std

Documentation ¶

Overview ¶

Package unicode provides data and functions to test some properties of Unicode code points.

Index ¶

Constants
Variables
func In(r rune, ranges ...*RangeTable) bool
func Is(rangeTab *RangeTable, r rune) bool
func IsControl(r rune) bool
func IsDigit(r rune) bool
func IsGraphic(r rune) bool
func IsLetter(r rune) bool
func IsLower(r rune) bool
func IsMark(r rune) bool
func IsNumber(r rune) bool
func IsOneOf(ranges []*RangeTable, r rune) bool
func IsPrint(r rune) bool
func IsPunct(r rune) bool
func IsSpace(r rune) bool
func IsSymbol(r rune) bool
func IsTitle(r rune) bool
func IsUpper(r rune) bool
func SimpleFold(r rune) rune
func To(_case int, r rune) rune
func ToLower(r rune) rune
func ToTitle(r rune) rune
func ToUpper(r rune) rune
type CaseRange
type Range16
type Range32
type RangeTable
type SpecialCase
- func (special SpecialCase) ToLower(r rune) rune
- func (special SpecialCase) ToTitle(r rune) rune
- func (special SpecialCase) ToUpper(r rune) rune

Constants ¶

const (
	MaxRune         = '\U0010FFFF'
	ReplacementChar = '\uFFFD'
	MaxASCII        = '\u007F'
	MaxLatin1       = '\u00FF'
)

const (
	UpperCase = iota
	LowerCase
	TitleCase
	MaxCase
)

Indices into the Delta arrays inside CaseRanges for case mapping.

const (
	UpperLower = MaxRune + 1
)

If the Delta field of a CaseRange is UpperLower, it means this CaseRange represents a sequence of the form (say) Upper Lower Upper Lower.

const Version = "6.2.0"

Version is the Unicode edition from which the tables are derived.

Variables ¶

var (
	Cc     = _Cc
	Cf     = _Cf
	Co     = _Co
	Cs     = _Cs
	Digit  = _Nd
	Nd     = _Nd
	Letter = _L
	L      = _L
	Lm     = _Lm
	Lo     = _Lo
	Lower  = _Ll
	Ll     = _Ll
	Mark   = _M
	M      = _M
	Mc     = _Mc
	Me     = _Me
	Mn     = _Mn
	Nl     = _Nl
	No     = _No
	Number = _N
	N      = _N
	Other  = _C
	C      = _C
	Pc     = _Pc
	Pd     = _Pd
	Pe     = _Pe
	Pf     = _Pf
	Pi     = _Pi
	Po     = _Po
	Ps     = _Ps
	Punct  = _P
	P      = _P
	Sc     = _Sc
	Sk     = _Sk
	Sm     = _Sm
	So     = _So
	Space  = _Z
	Z      = _Z
	Symbol = _S
	S      = _S
	Title  = _Lt
	Lt     = _Lt
	Upper  = _Lu
	Lu     = _Lu
	Zl     = _Zl
	Zp     = _Zp
	Zs     = _Zs
)

These variables have type *RangeTable.

var (
	Arabic                 = _Arabic
	Armenian               = _Armenian
	Avestan                = _Avestan
	Balinese               = _Balinese
	Bamum                  = _Bamum
	Batak                  = _Batak
	Bengali                = _Bengali
	Bopomofo               = _Bopomofo
	Brahmi                 = _Brahmi
	Braille                = _Braille
	Buginese               = _Buginese
	Buhid                  = _Buhid
	Canadian_Aboriginal    = _Canadian_Aboriginal
	Carian                 = _Carian
	Chakma                 = _Chakma
	Cham                   = _Cham
	Cherokee               = _Cherokee
	Common                 = _Common
	Coptic                 = _Coptic
	Cuneiform              = _Cuneiform
	Cypriot                = _Cypriot
	Cyrillic               = _Cyrillic
	Deseret                = _Deseret
	Devanagari             = _Devanagari
	Egyptian_Hieroglyphs   = _Egyptian_Hieroglyphs
	Ethiopic               = _Ethiopic
	Georgian               = _Georgian
	Glagolitic             = _Glagolitic
	Gothic                 = _Gothic
	Greek                  = _Greek
	Gujarati               = _Gujarati
	Gurmukhi               = _Gurmukhi
	Han                    = _Han
	Hangul                 = _Hangul
	Hanunoo                = _Hanunoo
	Hebrew                 = _Hebrew
	Hiragana               = _Hiragana
	Imperial_Aramaic       = _Imperial_Aramaic
	Inherited              = _Inherited
	Inscriptional_Pahlavi  = _Inscriptional_Pahlavi
	Inscriptional_Parthian = _Inscriptional_Parthian
	Javanese               = _Javanese
	Kaithi                 = _Kaithi
	Kannada                = _Kannada
	Katakana               = _Katakana
	Kayah_Li               = _Kayah_Li
	Kharoshthi             = _Kharoshthi
	Khmer                  = _Khmer
	Lao                    = _Lao
	Latin                  = _Latin
	Lepcha                 = _Lepcha
	Limbu                  = _Limbu
	Linear_B               = _Linear_B
	Lisu                   = _Lisu
	Lycian                 = _Lycian
	Lydian                 = _Lydian
	Malayalam              = _Malayalam
	Mandaic                = _Mandaic
	Meetei_Mayek           = _Meetei_Mayek
	Meroitic_Cursive       = _Meroitic_Cursive
	Meroitic_Hieroglyphs   = _Meroitic_Hieroglyphs
	Miao                   = _Miao
	Mongolian              = _Mongolian
	Myanmar                = _Myanmar
	New_Tai_Lue            = _New_Tai_Lue
	Nko                    = _Nko
	Ogham                  = _Ogham
	Ol_Chiki               = _Ol_Chiki
	Old_Italic             = _Old_Italic
	Old_Persian            = _Old_Persian
	Old_South_Arabian      = _Old_South_Arabian
	Old_Turkic             = _Old_Turkic
	Oriya                  = _Oriya
	Osmanya                = _Osmanya
	Phags_Pa               = _Phags_Pa
	Phoenician             = _Phoenician
	Rejang                 = _Rejang
	Runic                  = _Runic
	Samaritan              = _Samaritan
	Saurashtra             = _Saurashtra
	Sharada                = _Sharada
	Shavian                = _Shavian
	Sinhala                = _Sinhala
	Sora_Sompeng           = _Sora_Sompeng
	Sundanese              = _Sundanese
	Syloti_Nagri           = _Syloti_Nagri
	Syriac                 = _Syriac
	Tagalog                = _Tagalog
	Tagbanwa               = _Tagbanwa
	Tai_Le                 = _Tai_Le
	Tai_Tham               = _Tai_Tham
	Tai_Viet               = _Tai_Viet
	Takri                  = _Takri
	Tamil                  = _Tamil
	Telugu                 = _Telugu
	Thaana                 = _Thaana
	Thai                   = _Thai
	Tibetan                = _Tibetan
	Tifinagh               = _Tifinagh
	Ugaritic               = _Ugaritic
	Vai                    = _Vai
	Yi                     = _Yi
)

These variables have type *RangeTable.

var (
	ASCII_Hex_Digit                    = _ASCII_Hex_Digit
	Bidi_Control                       = _Bidi_Control
	Dash                               = _Dash
	Deprecated                         = _Deprecated
	Diacritic                          = _Diacritic
	Extender                           = _Extender
	Hex_Digit                          = _Hex_Digit
	Hyphen                             = _Hyphen
	IDS_Binary_Operator                = _IDS_Binary_Operator
	IDS_Trinary_Operator               = _IDS_Trinary_Operator
	Ideographic                        = _Ideographic
	Join_Control                       = _Join_Control
	Logical_Order_Exception            = _Logical_Order_Exception
	Noncharacter_Code_Point            = _Noncharacter_Code_Point
	Other_Alphabetic                   = _Other_Alphabetic
	Other_Default_Ignorable_Code_Point = _Other_Default_Ignorable_Code_Point
	Other_Grapheme_Extend              = _Other_Grapheme_Extend
	Other_ID_Continue                  = _Other_ID_Continue
	Other_ID_Start                     = _Other_ID_Start
	Other_Lowercase                    = _Other_Lowercase
	Other_Math                         = _Other_Math
	Other_Uppercase                    = _Other_Uppercase
	Pattern_Syntax                     = _Pattern_Syntax
	Pattern_White_Space                = _Pattern_White_Space
	Quotation_Mark                     = _Quotation_Mark
	Radical                            = _Radical
	STerm                              = _STerm
	Soft_Dotted                        = _Soft_Dotted
	Terminal_Punctuation               = _Terminal_Punctuation
	Unified_Ideograph                  = _Unified_Ideograph
	Variation_Selector                 = _Variation_Selector
	White_Space                        = _White_Space
)

These variables have type *RangeTable.

var CaseRanges = _CaseRanges

CaseRanges is the table describing case mappings for all letters with non-self mappings.

var Categories = map[string]*RangeTable{
	"C":  C,
	"Cc": Cc,
	"Cf": Cf,
	"Co": Co,
	"Cs": Cs,
	"L":  L,
	"Ll": Ll,
	"Lm": Lm,
	"Lo": Lo,
	"Lt": Lt,
	"Lu": Lu,
	"M":  M,
	"Mc": Mc,
	"Me": Me,
	"Mn": Mn,
	"N":  N,
	"Nd": Nd,
	"Nl": Nl,
	"No": No,
	"P":  P,
	"Pc": Pc,
	"Pd": Pd,
	"Pe": Pe,
	"Pf": Pf,
	"Pi": Pi,
	"Po": Po,
	"Ps": Ps,
	"S":  S,
	"Sc": Sc,
	"Sk": Sk,
	"Sm": Sm,
	"So": So,
	"Z":  Z,
	"Zl": Zl,
	"Zp": Zp,
	"Zs": Zs,
}

Categories is the set of Unicode category tables.
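
As a rough illustration of looking tables up by name, the sketch below walks the Categories map to find the two-letter category of a rune. The helper categoryOf is hypothetical and not part of this package. It skips the one-letter names, which are unions of the two-letter ones, and it assumes that leaf category names always have a lowercase second letter.

```go
package main

import (
	"fmt"
	"unicode"
)

// categoryOf returns the two-letter general category of r, or "" if r is in
// none of the tables. It is a hypothetical helper, not part of package unicode.
func categoryOf(r rune) string {
	for name, table := range unicode.Categories {
		// Keep only leaf names such as "Lu": two letters, the second one
		// lowercase. Union names such as "L" are skipped.
		if len(name) != 2 || name[1] < 'a' || name[1] > 'z' {
			continue
		}
		if unicode.Is(table, r) {
			return name
		}
	}
	return ""
}

func main() {
	for _, r := range "Aa1 $_" {
		fmt.Printf("%q %s\n", r, categoryOf(r))
	}
}
```

Because every assigned code point has exactly one leaf category, the result does not depend on map iteration order.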

var FoldCategory = map[string]*RangeTable{
	"Common":    foldCommon,
	"Greek":     foldGreek,
	"Inherited": foldInherited,
	"L":         foldL,
	"Ll":        foldLl,
	"Lt":        foldLt,
	"Lu":        foldLu,
	"M":         foldM,
	"Mn":        foldMn,
}

FoldCategory maps a category name to a table of code points outside the category that are equivalent under simple case folding to code points inside the category. If there is no entry for a category name, there are no such points.

var FoldScript = map[string]*RangeTable{}

FoldScript maps a script name to a table of code points outside the script that are equivalent under simple case folding to code points inside the script. If there is no entry for a script name, there are no such points.

var GraphicRanges = []*RangeTable{
	L, M, N, P, S, Zs,
}

GraphicRanges defines the set of graphic characters according to Unicode.

var PrintRanges = []*RangeTable{
	L, M, N, P, S,
}

PrintRanges defines the set of printable characters according to Go. ASCII space, U+0020, is handled separately.

var Properties = map[string]*RangeTable{
	"ASCII_Hex_Digit":                    ASCII_Hex_Digit,
	"Bidi_Control":                       Bidi_Control,
	"Dash":                               Dash,
	"Deprecated":                         Deprecated,
	"Diacritic":                          Diacritic,
	"Extender":                           Extender,
	"Hex_Digit":                          Hex_Digit,
	"Hyphen":                             Hyphen,
	"IDS_Binary_Operator":                IDS_Binary_Operator,
	"IDS_Trinary_Operator":               IDS_Trinary_Operator,
	"Ideographic":                        Ideographic,
	"Join_Control":                       Join_Control,
	"Logical_Order_Exception":            Logical_Order_Exception,
	"Noncharacter_Code_Point":            Noncharacter_Code_Point,
	"Other_Alphabetic":                   Other_Alphabetic,
	"Other_Default_Ignorable_Code_Point": Other_Default_Ignorable_Code_Point,
	"Other_Grapheme_Extend":              Other_Grapheme_Extend,
	"Other_ID_Continue":                  Other_ID_Continue,
	"Other_ID_Start":                     Other_ID_Start,
	"Other_Lowercase":                    Other_Lowercase,
	"Other_Math":                         Other_Math,
	"Other_Uppercase":                    Other_Uppercase,
	"Pattern_Syntax":                     Pattern_Syntax,
	"Pattern_White_Space":                Pattern_White_Space,
	"Quotation_Mark":                     Quotation_Mark,
	"Radical":                            Radical,
	"STerm":                              STerm,
	"Soft_Dotted":                        Soft_Dotted,
	"Terminal_Punctuation":               Terminal_Punctuation,
	"Unified_Ideograph":                  Unified_Ideograph,
	"Variation_Selector":                 Variation_Selector,
	"White_Space":                        White_Space,
}

Properties is the set of Unicode property tables.

var Scripts = map[string]*RangeTable{}/* 102 elements not displayed */

Scripts is the set of Unicode script tables.
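
The Scripts map can be used the same way as any other table lookup. The following sketch, with a hypothetical helper scriptOf, returns the name of the script table that contains a rune. It relies on the assumption that each code point appears in at most one script table.

```go
package main

import (
	"fmt"
	"unicode"
)

// scriptOf returns the name of the script table containing r, or "" if no
// table matches. It is a hypothetical helper, not part of package unicode.
func scriptOf(r rune) string {
	for name, table := range unicode.Scripts {
		if unicode.Is(table, r) {
			return name
		}
	}
	return ""
}

func main() {
	// Latin A, Greek capital omega, Cyrillic ya, digit one, and a Han ideograph.
	for _, r := range []rune{'A', '\u03A9', '\u044F', '1', '\u4E16'} {
		fmt.Printf("%U %s\n", r, scriptOf(r))
	}
}
```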

Functions ¶

func In ¶ added in v1.2.0

func In(r rune, ranges ...*RangeTable) bool

In reports whether the rune is a member of one of the ranges.
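
A minimal sketch of passing several tables to In in one call follows. The helper isJapaneseScript and its choice of the Hiragana, Katakana, and Han tables are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"unicode"
)

// isJapaneseScript reports whether r is in the Hiragana, Katakana, or Han
// script table. It is a hypothetical helper, not part of package unicode.
func isJapaneseScript(r rune) bool {
	return unicode.In(r, unicode.Hiragana, unicode.Katakana, unicode.Han)
}

func main() {
	// Hiragana a, Katakana ka, a Han ideograph, and Latin A.
	for _, r := range []rune{'\u3042', '\u30AB', '\u4E16', 'A'} {
		fmt.Printf("%U %t\n", r, isJapaneseScript(r))
	}
}
```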

func Is ¶

func Is(rangeTab *RangeTable, r rune) bool

Is reports whether the rune is in the specified table of ranges.

func IsControl ¶

func IsControl(r rune) bool

IsControl reports whether the rune is a control character. The C (Other) Unicode category includes more code points such as surrogates; use Is(C, r) to test for them.
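
To make the difference between IsControl and category C concrete, the sketch below compares both tests on a few runes. The helper controlAndOther is hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

// controlAndOther reports the result of IsControl and of the broader
// category C test for r. It is a hypothetical helper for illustration.
func controlAndOther(r rune) (control, other bool) {
	return unicode.IsControl(r), unicode.Is(unicode.C, r)
}

func main() {
	// Newline, soft hyphen (a format character), a surrogate code point, and 'a'.
	for _, r := range []rune{'\n', '\u00AD', rune(0xD800), 'a'} {
		control, other := controlAndOther(r)
		fmt.Printf("%U IsControl=%t Is(C)=%t\n", r, control, other)
	}
}
```

Only the newline satisfies IsControl here, while the soft hyphen and the surrogate are matched by Is(C, r) alone.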

func IsDigit ¶

func IsDigit(r rune) bool

IsDigit reports whether the rune is a decimal digit.

func IsGraphic ¶

func IsGraphic(r rune) bool

IsGraphic reports whether the rune is defined as a Graphic by Unicode. Such characters include letters, marks, numbers, punctuation, symbols, and spaces, from categories L, M, N, P, S, Zs.

func IsLetter ¶

func IsLetter(r rune) bool

IsLetter reports whether the rune is a letter (category L).
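
One common use of IsLetter together with IsDigit is validating identifiers. The sketch below is an assumption-laden example: the helper isIdentifier and its rule (a letter or underscore first, then letters, underscores, or decimal digits) are chosen for illustration and are not defined by this package.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIdentifier reports whether s is non-empty, starts with a letter or '_',
// and continues with letters, '_', or decimal digits.
// It is a hypothetical helper, not part of package unicode.
func isIdentifier(s string) bool {
	for i, r := range s {
		switch {
		case r == '_' || unicode.IsLetter(r):
			// Letters and underscores are allowed anywhere.
		case i > 0 && unicode.IsDigit(r):
			// Digits are allowed after the first rune.
		default:
			return false
		}
	}
	return s != ""
}

func main() {
	for _, s := range []string{"x1", "1x", "_tmp", "na\u00EFve", "a-b", ""} {
		fmt.Printf("%q %t\n", s, isIdentifier(s))
	}
}
```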

func IsLower ¶

func IsLower(r rune) bool

IsLower reports whether the rune is a lower case letter.

func IsMark ¶

func IsMark(r rune) bool

IsMark reports whether the rune is a mark character (category M).
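
As an illustration, IsMark can be used to drop combining marks from text that is already in decomposed form. The helper stripMarks below is hypothetical. It does not decompose precomposed characters, so a precomposed U+00E9 passes through unchanged.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripMarks returns s with every rune in category M removed.
// It is a hypothetical helper, not part of package unicode.
func stripMarks(s string) string {
	var b strings.Builder
	for _, r := range s {
		if !unicode.IsMark(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	// 'e' followed by U+0301 COMBINING ACUTE ACCENT.
	fmt.Printf("%q\n", stripMarks("e\u0301"))
	// Precomposed U+00E9 contains no separate mark.
	fmt.Printf("%q\n", stripMarks("\u00E9"))
}
```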

func IsNumber ¶

func IsNumber(r rune) bool

IsNumber reports whether the rune is a number (category N).
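
IsDigit accepts only decimal digits, while IsNumber accepts everything in category N. The sketch below shows the difference with a hypothetical helper, numericKind.

```go
package main

import (
	"fmt"
	"unicode"
)

// numericKind classifies r as a decimal "digit", some other "number",
// or "other". It is a hypothetical helper, not part of package unicode.
func numericKind(r rune) string {
	switch {
	case unicode.IsDigit(r):
		return "digit"
	case unicode.IsNumber(r):
		return "number"
	default:
		return "other"
	}
}

func main() {
	// ASCII 7, Arabic-Indic digit three, Roman numeral eight,
	// vulgar fraction one half, and the letter x.
	for _, r := range []rune{'7', '\u0663', '\u2167', '\u00BD', 'x'} {
		fmt.Printf("%U %s\n", r, numericKind(r))
	}
}
```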

func IsOneOf ¶

func IsOneOf(ranges []*RangeTable, r rune) bool

IsOneOf reports whether the rune is a member of one of the ranges. The function "In" provides a nicer signature and should be used in preference to IsOneOf.

func IsPrint ¶

func IsPrint(r rune) bool

IsPrint reports whether the rune is defined as printable by Go. Such characters include letters, marks, numbers, punctuation, and symbols, from categories L, M, N, P, S, plus the ASCII space character. This categorization is the same as IsGraphic except that the only spacing character is ASCII space, U+0020.
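
The difference between IsPrint and IsGraphic only shows up for spacing characters other than U+0020. A minimal sketch, using a hypothetical helper named visibility:

```go
package main

import (
	"fmt"
	"unicode"
)

// visibility reports whether r is printable, graphic but not printable,
// or neither. It is a hypothetical helper, not part of package unicode.
func visibility(r rune) string {
	switch {
	case unicode.IsPrint(r):
		return "print"
	case unicode.IsGraphic(r):
		return "graphic only"
	default:
		return "neither"
	}
}

func main() {
	// ASCII space, no-break space, ideographic space, tab, and 'a'.
	for _, r := range []rune{' ', '\u00A0', '\u3000', '\t', 'a'} {
		fmt.Printf("%U %s\n", r, visibility(r))
	}
}
```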

func IsPunct ¶

func IsPunct(r rune) bool

IsPunct reports whether the rune is a Unicode punctuation character (category P).

func IsSpace ¶

func IsSpace(r rune) bool

IsSpace reports whether the rune is a space character as defined by Unicode's White Space property; in the Latin-1 space this is

'\t', '\n', '\v', '\f', '\r', ' ', U+0085 (NEL), U+00A0 (NBSP).

Other definitions of spacing characters are set by category Z and property Pattern_White_Space.
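
The three definitions of spacing mentioned above do not agree on every rune. The sketch below compares them side by side. The helper spaceDefs is hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

// spaceDefs reports how three definitions classify r: the White_Space
// property (IsSpace), category Z, and the Pattern_White_Space property.
// It is a hypothetical helper, not part of package unicode.
func spaceDefs(r rune) (whiteSpace, categoryZ, patternWS bool) {
	return unicode.IsSpace(r),
		unicode.Is(unicode.Z, r),
		unicode.Is(unicode.Pattern_White_Space, r)
}

func main() {
	// Tab, no-break space, and left-to-right mark.
	for _, r := range []rune{'\t', '\u00A0', '\u200E'} {
		ws, z, p := spaceDefs(r)
		fmt.Printf("%U IsSpace=%t Z=%t Pattern_White_Space=%t\n", r, ws, z, p)
	}
}
```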

func IsSymbol ¶

func IsSymbol(r rune) bool

IsSymbol reports whether the rune is a symbolic character.
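
Punctuation (category P) and symbols (category S) are easy to confuse for ASCII characters. The sketch below, with a hypothetical helper markKind, shows how a few of them are classified.

```go
package main

import (
	"fmt"
	"unicode"
)

// markKind classifies r as "punct", "symbol", or "other".
// It is a hypothetical helper, not part of package unicode.
func markKind(r rune) string {
	switch {
	case unicode.IsPunct(r):
		return "punct"
	case unicode.IsSymbol(r):
		return "symbol"
	default:
		return "other"
	}
}

func main() {
	for _, r := range "!-$+a" {
		fmt.Printf("%q %s\n", r, markKind(r))
	}
}
```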

func IsTitle ¶

func IsTitle(r rune) bool

IsTitle reports whether the rune is a title case letter.

func IsUpper ¶

func IsUpper(r rune) bool

IsUpper reports whether the rune is an upper case letter.
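
IsUpper, IsLower, and IsTitle test three distinct categories, and a title case letter is neither upper nor lower. A minimal sketch with a hypothetical helper, caseOf:

```go
package main

import (
	"fmt"
	"unicode"
)

// caseOf reports the case of r as "upper", "lower", "title", or "none".
// It is a hypothetical helper, not part of package unicode.
func caseOf(r rune) string {
	switch {
	case unicode.IsUpper(r):
		return "upper"
	case unicode.IsLower(r):
		return "lower"
	case unicode.IsTitle(r):
		return "title"
	default:
		return "none"
	}
}

func main() {
	// U+01C5 is the title case digraph "Dz with caron".
	for _, r := range []rune{'A', 'a', '\u01C5', '1'} {
		fmt.Printf("%U %s\n", r, caseOf(r))
	}
}
```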

func SimpleFold ¶

func SimpleFold(r rune) rune

SimpleFold iterates over Unicode code points equivalent under the Unicode-defined simple case folding. Among the code points equivalent to r (including r itself), SimpleFold returns the smallest rune > r if one exists, or else the smallest rune >= 0.

For example:

SimpleFold('A') = 'a'
SimpleFold('a') = 'A'

SimpleFold('K') = 'k'
SimpleFold('k') = '\u212A' (Kelvin symbol, K)
SimpleFold('\u212A') = 'K'

SimpleFold('1') = '1'
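
Because SimpleFold cycles through the equivalent code points, calling it repeatedly until it returns to the starting rune enumerates the whole set. The helper foldOrbit below is a hypothetical sketch of that loop.

```go
package main

import (
	"fmt"
	"unicode"
)

// foldOrbit returns r followed by every other code point that is equivalent
// to r under simple case folding, in the order SimpleFold visits them.
// It is a hypothetical helper, not part of package unicode.
func foldOrbit(r rune) []rune {
	orbit := []rune{r}
	for next := unicode.SimpleFold(r); next != r; next = unicode.SimpleFold(next) {
		orbit = append(orbit, next)
	}
	return orbit
}

func main() {
	for _, r := range []rune{'A', 'K', '1'} {
		fmt.Printf("%q %q\n", r, foldOrbit(r))
	}
}
```

For 'K' the loop visits 'k' and then U+212A before returning to 'K', matching the examples above.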

func To ¶

func To(_case int, r rune) rune

To maps the rune to the specified case: UpperCase, LowerCase, or TitleCase.
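
The sketch below calls To with each of the three case constants. Upper case and title case usually coincide, but they differ for digraphs such as U+01C6. The helper allCases is hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

// allCases returns the upper, lower, and title case mappings of r.
// It is a hypothetical helper, not part of package unicode.
func allCases(r rune) (upper, lower, title rune) {
	return unicode.To(unicode.UpperCase, r),
		unicode.To(unicode.LowerCase, r),
		unicode.To(unicode.TitleCase, r)
}

func main() {
	// 'a' and U+01C6 (small digraph "dz with caron").
	for _, r := range []rune{'a', '\u01C6'} {
		u, l, t := allCases(r)
		fmt.Printf("%U upper=%U lower=%U title=%U\n", r, u, l, t)
	}
}
```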

func ToLower ¶

func ToLower(r rune) rune

ToLower maps the rune to lower case.

func ToTitle ¶

func ToTitle(r rune) rune

ToTitle maps the rune to title case.

func ToUpper ¶

func ToUpper(r rune) rune

ToUpper maps the rune to upper case.

Types ¶

type CaseRange ¶

type CaseRange struct {
	Lo    uint32
	Hi    uint32
	Delta d // one delta per case, indexed by UpperCase, LowerCase, TitleCase
}

CaseRange represents a range of Unicode code points for simple (one code point to one code point) case conversion. The range runs from Lo to Hi inclusive, with a fixed stride of 1. Each Delta is the number to add to a code point to reach the code point for a different case of that character. Deltas may be negative. A Delta of zero means the character is already in the corresponding case. There is a special case representing sequences of alternating corresponding Upper and Lower pairs. It appears with a fixed Delta of

{UpperLower, UpperLower, UpperLower}

The constant UpperLower has an otherwise impossible delta value.
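
The delta arithmetic described above can be sketched as follows. The function mapCase is a hypothetical re-implementation written for illustration: it scans CaseRanges linearly, and it assumes that an UpperLower range starts with an upper case letter at Lo and alternates from there. It is not presented as the package's actual lookup code. The helper agrees checks the sketch against To.

```go
package main

import (
	"fmt"
	"unicode"
)

// mapCase maps r to case c (unicode.UpperCase, unicode.LowerCase, or
// unicode.TitleCase) by applying the matching CaseRange delta by hand.
// It is a hypothetical sketch, not the package's implementation.
func mapCase(c int, r rune) rune {
	for _, cr := range unicode.CaseRanges {
		if r < rune(cr.Lo) || r > rune(cr.Hi) {
			continue
		}
		delta := cr.Delta[c]
		if delta > unicode.MaxRune {
			// UpperLower range: clear the low bit of the offset from Lo to
			// reach the upper case member of the pair, then set it again
			// when lower case is requested.
			return rune(cr.Lo) + ((r-rune(cr.Lo))&^1 | rune(c&1))
		}
		return r + delta
	}
	return r // no range covers r, so it maps to itself
}

// agrees reports whether mapCase matches unicode.To for all three cases.
func agrees(r rune) bool {
	for c := 0; c < unicode.MaxCase; c++ {
		if mapCase(c, r) != unicode.To(c, r) {
			return false
		}
	}
	return true
}

func main() {
	// U+0101 (a with macron) lies in a range of alternating pairs.
	for _, r := range []rune{'a', 'Z', '\u0101', '1'} {
		fmt.Printf("%U upper=%U agrees=%t\n", r, mapCase(unicode.UpperCase, r), agrees(r))
	}
}
```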

type Range16 ¶

type Range16 struct {
	Lo     uint16
	Hi     uint16
	Stride uint16
}

Range16 represents a range of 16-bit Unicode code points. The range runs from Lo to Hi inclusive and has the specified stride.

type Range32 ¶

type Range32 struct {
	Lo     uint32
	Hi     uint32
	Stride uint32
}

Range32 represents a range of Unicode code points and is used when one or more of the values will not fit in 16 bits. The range runs from Lo to Hi inclusive and has the specified stride. Lo and Hi must always be >= 1<<16.

type RangeTable ¶

type RangeTable struct {
	R16         []Range16
	R32         []Range32
	LatinOffset int
}

RangeTable defines a set of Unicode code points by listing the ranges of code points within the set. The ranges are listed in two slices to save space: a slice of 16-bit ranges and a slice of 32-bit ranges. The two slices must be in sorted order and non-overlapping. Also, R32 should contain only values >= 0x10000 (1<<16).
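
A RangeTable can also be built by hand and passed to Is. The sketch below defines two hypothetical tables, one with three stride-1 ranges and one using a stride of 2. The LatinOffset values are set on the assumption that the field counts the R16 entries whose Hi is at most MaxLatin1, as in the Go standard library. The text above does not describe that field.

```go
package main

import (
	"fmt"
	"unicode"
)

// hexDigits is a hypothetical table covering 0-9, A-F, and a-f.
// The ranges are sorted and do not overlap, as RangeTable requires.
var hexDigits = &unicode.RangeTable{
	R16: []unicode.Range16{
		{Lo: '0', Hi: '9', Stride: 1},
		{Lo: 'A', Hi: 'F', Stride: 1},
		{Lo: 'a', Hi: 'f', Stride: 1},
	},
	LatinOffset: 3, // assumption: number of R16 entries with Hi <= MaxLatin1
}

// evenDigits is a hypothetical table that uses a stride of 2 to select
// '0', '2', '4', '6', and '8'.
var evenDigits = &unicode.RangeTable{
	R16:         []unicode.Range16{{Lo: '0', Hi: '8', Stride: 2}},
	LatinOffset: 1,
}

// isHex reports whether r is in hexDigits.
func isHex(r rune) bool { return unicode.Is(hexDigits, r) }

// isEvenDigit reports whether r is in evenDigits.
func isEvenDigit(r rune) bool { return unicode.Is(evenDigits, r) }

func main() {
	for _, r := range "4f5g" {
		fmt.Printf("%q hex=%t even=%t\n", r, isHex(r), isEvenDigit(r))
	}
}
```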

type SpecialCase ¶

type SpecialCase []CaseRange

SpecialCase represents language-specific case mappings such as Turkish. Methods of SpecialCase customize (by overriding) the standard mappings.

var AzeriCase SpecialCase = _TurkishCase

var TurkishCase SpecialCase = _TurkishCase
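
As an illustration of how a SpecialCase overrides the standard mappings, the sketch below applies TurkishCase to whole strings with strings.Map. The helpers turkishUpper and turkishLower are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// turkishUpper upper-cases s rune by rune, giving priority to the Turkish
// mappings. It is a hypothetical helper, not part of package unicode.
func turkishUpper(s string) string {
	return strings.Map(unicode.TurkishCase.ToUpper, s)
}

// turkishLower lower-cases s rune by rune, giving priority to the Turkish
// mappings. It is a hypothetical helper, not part of package unicode.
func turkishLower(s string) string {
	return strings.Map(unicode.TurkishCase.ToLower, s)
}

func main() {
	// In Turkish, 'i' upper-cases to dotted capital I (U+0130) and 'I'
	// lower-cases to dotless small i (U+0131).
	fmt.Printf("%+q\n", turkishUpper("iki"))
	fmt.Printf("%+q\n", turkishLower("IKI"))
}
```

Runes without a special mapping, such as 'k', fall back to the standard mapping.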

func (SpecialCase) ToLower ¶

func (special SpecialCase) ToLower(r rune) rune

ToLower maps the rune to lower case giving priority to the special mapping.

func (SpecialCase) ToTitle ¶

func (special SpecialCase) ToTitle(r rune) rune

ToTitle maps the rune to title case giving priority to the special mapping.

func (SpecialCase) ToUpper ¶

func (special SpecialCase) ToUpper(r rune) rune

ToUpper maps the rune to upper case giving priority to the special mapping.

Directories ¶

Path	Synopsis
utf16	Package utf16 implements encoding and decoding of UTF-16 sequences.
utf8	Package utf8 implements functions and constants to support text encoded in UTF-8.