Documentation ¶
Overview ¶
Package unicode provides data and functions to test some properties of Unicode code points.
Index ¶
- Constants
- Variables
- func In(r rune, ranges ...*RangeTable) bool
- func Is(rangeTab *RangeTable, r rune) bool
- func IsControl(r rune) bool
- func IsDigit(r rune) bool
- func IsGraphic(r rune) bool
- func IsLetter(r rune) bool
- func IsLower(r rune) bool
- func IsMark(r rune) bool
- func IsNumber(r rune) bool
- func IsOneOf(ranges []*RangeTable, r rune) bool
- func IsPrint(r rune) bool
- func IsPunct(r rune) bool
- func IsSpace(r rune) bool
- func IsSymbol(r rune) bool
- func IsTitle(r rune) bool
- func IsUpper(r rune) bool
- func SimpleFold(r rune) rune
- func To(_case int, r rune) rune
- func ToLower(r rune) rune
- func ToTitle(r rune) rune
- func ToUpper(r rune) rune
- type CaseRange
- type Range16
- type Range32
- type RangeTable
- type SpecialCase
Constants ¶
const ( MaxRune = '\U0010FFFF' ReplacementChar = '\uFFFD' MaxASCII = '\u007F' MaxLatin1 = '\u00FF' )
const ( UpperCase = iota LowerCase TitleCase MaxCase )
Indices into the Delta arrays inside CaseRanges for case mapping.
const (
UpperLower = MaxRune + 1
)
If the Delta field of a CaseRange is UpperLower, it means this CaseRange represents a sequence of the form (say) Upper Lower Upper Lower.
const Version = "6.2.0"
Version is the Unicode edition from which the tables are derived.
Variables ¶
var ( Cc = _Cc Cf = _Cf Co = _Co Cs = _Cs Digit = _Nd Nd = _Nd Letter = _L L = _L Lm = _Lm Lo = _Lo Lower = _Ll Ll = _Ll Mark = _M M = _M Mc = _Mc Me = _Me Mn = _Mn Nl = _Nl No = _No Number = _N N = _N Other = _C C = _C Pc = _Pc Pd = _Pd Pe = _Pe Pf = _Pf Pi = _Pi Po = _Po Ps = _Ps Punct = _P P = _P Sc = _Sc Sk = _Sk Sm = _Sm So = _So Space = _Z Z = _Z Symbol = _S S = _S Title = _Lt Lt = _Lt Upper = _Lu Lu = _Lu Zl = _Zl Zp = _Zp Zs = _Zs )
These variables have type *RangeTable.
var ( Arabic = _Arabic Armenian = _Armenian Avestan = _Avestan Balinese = _Balinese Bamum = _Bamum Batak = _Batak Bengali = _Bengali Bopomofo = _Bopomofo Brahmi = _Brahmi Braille = _Braille Buginese = _Buginese Buhid = _Buhid Canadian_Aboriginal = _Canadian_Aboriginal Carian = _Carian Chakma = _Chakma Cham = _Cham Cherokee = _Cherokee Common = _Common Coptic = _Coptic Cuneiform = _Cuneiform Cypriot = _Cypriot Cyrillic = _Cyrillic Deseret = _Deseret Devanagari = _Devanagari Egyptian_Hieroglyphs = _Egyptian_Hieroglyphs Ethiopic = _Ethiopic Georgian = _Georgian Glagolitic = _Glagolitic Gothic = _Gothic Greek = _Greek Gujarati = _Gujarati Gurmukhi = _Gurmukhi Han = _Han Hangul = _Hangul Hanunoo = _Hanunoo Hebrew = _Hebrew Hiragana = _Hiragana Imperial_Aramaic = _Imperial_Aramaic Inherited = _Inherited Inscriptional_Pahlavi = _Inscriptional_Pahlavi Inscriptional_Parthian = _Inscriptional_Parthian Javanese = _Javanese Kaithi = _Kaithi Kannada = _Kannada Katakana = _Katakana Kayah_Li = _Kayah_Li Kharoshthi = _Kharoshthi Khmer = _Khmer Lao = _Lao Latin = _Latin Lepcha = _Lepcha Limbu = _Limbu Linear_B = _Linear_B Lisu = _Lisu Lycian = _Lycian Lydian = _Lydian Malayalam = _Malayalam Mandaic = _Mandaic Meetei_Mayek = _Meetei_Mayek Meroitic_Cursive = _Meroitic_Cursive Meroitic_Hieroglyphs = _Meroitic_Hieroglyphs Miao = _Miao Mongolian = _Mongolian Myanmar = _Myanmar New_Tai_Lue = _New_Tai_Lue Nko = _Nko Ogham = _Ogham Ol_Chiki = _Ol_Chiki Old_Italic = _Old_Italic Old_Persian = _Old_Persian Old_South_Arabian = _Old_South_Arabian Old_Turkic = _Old_Turkic Oriya = _Oriya Osmanya = _Osmanya Phags_Pa = _Phags_Pa Phoenician = _Phoenician Rejang = _Rejang Runic = _Runic Samaritan = _Samaritan Saurashtra = _Saurashtra Sharada = _Sharada Shavian = _Shavian Sinhala = _Sinhala Sora_Sompeng = _Sora_Sompeng Sundanese = _Sundanese Syloti_Nagri = _Syloti_Nagri Syriac = _Syriac Tagalog = _Tagalog Tagbanwa = _Tagbanwa Tai_Le = _Tai_Le Tai_Tham = _Tai_Tham Tai_Viet = _Tai_Viet Takri = _Takri 
Tamil = _Tamil Telugu = _Telugu Thaana = _Thaana Thai = _Thai Tibetan = _Tibetan Tifinagh = _Tifinagh Ugaritic = _Ugaritic Vai = _Vai Yi = _Yi )
These variables have type *RangeTable.
var ( ASCII_Hex_Digit = _ASCII_Hex_Digit Bidi_Control = _Bidi_Control Dash = _Dash Deprecated = _Deprecated Diacritic = _Diacritic Extender = _Extender Hex_Digit = _Hex_Digit Hyphen = _Hyphen IDS_Binary_Operator = _IDS_Binary_Operator IDS_Trinary_Operator = _IDS_Trinary_Operator Ideographic = _Ideographic Join_Control = _Join_Control Logical_Order_Exception = _Logical_Order_Exception Noncharacter_Code_Point = _Noncharacter_Code_Point Other_Alphabetic = _Other_Alphabetic Other_Default_Ignorable_Code_Point = _Other_Default_Ignorable_Code_Point Other_Grapheme_Extend = _Other_Grapheme_Extend Other_ID_Continue = _Other_ID_Continue Other_ID_Start = _Other_ID_Start Other_Lowercase = _Other_Lowercase Other_Math = _Other_Math Other_Uppercase = _Other_Uppercase Pattern_Syntax = _Pattern_Syntax Pattern_White_Space = _Pattern_White_Space Quotation_Mark = _Quotation_Mark Radical = _Radical STerm = _STerm Soft_Dotted = _Soft_Dotted Terminal_Punctuation = _Terminal_Punctuation Unified_Ideograph = _Unified_Ideograph Variation_Selector = _Variation_Selector White_Space = _White_Space )
These variables have type *RangeTable.
var CaseRanges = _CaseRanges
CaseRanges is the table describing case mappings for all letters with non-self mappings.
var Categories = map[string]*RangeTable{ "C": C, "Cc": Cc, "Cf": Cf, "Co": Co, "Cs": Cs, "L": L, "Ll": Ll, "Lm": Lm, "Lo": Lo, "Lt": Lt, "Lu": Lu, "M": M, "Mc": Mc, "Me": Me, "Mn": Mn, "N": N, "Nd": Nd, "Nl": Nl, "No": No, "P": P, "Pc": Pc, "Pd": Pd, "Pe": Pe, "Pf": Pf, "Pi": Pi, "Po": Po, "Ps": Ps, "S": S, "Sc": Sc, "Sk": Sk, "Sm": Sm, "So": So, "Z": Z, "Zl": Zl, "Zp": Zp, "Zs": Zs, }
Categories is the set of Unicode category tables.
var FoldCategory = map[string]*RangeTable{
"Common": foldCommon,
"Greek": foldGreek,
"Inherited": foldInherited,
"L": foldL,
"Ll": foldLl,
"Lt": foldLt,
"Lu": foldLu,
"M": foldM,
"Mn": foldMn,
}
FoldCategory maps a category name to a table of code points outside the category that are equivalent under simple case folding to code points inside the category. If there is no entry for a category name, there are no such points.
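One possible use of FoldCategory is a case-insensitive category test: a rune matches if it is in the category itself or in the category's fold table. The following is a minimal sketch; the helper name inCategoryFold is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// inCategoryFold is a hypothetical helper. It reports whether r is in the
// named category, or is equivalent under simple case folding to a code point
// in that category. A missing FoldCategory entry means there are no such
// outside code points, so only the category table itself is consulted.
func inCategoryFold(name string, r rune) bool {
	if t, ok := unicode.Categories[name]; ok && unicode.Is(t, r) {
		return true
	}
	if t, ok := unicode.FoldCategory[name]; ok && unicode.Is(t, r) {
		return true
	}
	return false
}

func main() {
	// 'a' is not in Lu, but it folds to 'A', which is.
	fmt.Println(inCategoryFold("Lu", 'a'))
	// Digits have no case, so there is no fold table for Nd.
	fmt.Println(inCategoryFold("Nd", '1'))
}
```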
var FoldScript = map[string]*RangeTable{}
FoldScript maps a script name to a table of code points outside the script that are equivalent under simple case folding to code points inside the script. If there is no entry for a script name, there are no such points.
var GraphicRanges = []*RangeTable{ L, M, N, P, S, Zs, }
GraphicRanges defines the set of graphic characters according to Unicode.
var PrintRanges = []*RangeTable{ L, M, N, P, S, }
PrintRanges defines the set of printable characters according to Go. ASCII space, U+0020, is handled separately.
var Properties = map[string]*RangeTable{ "ASCII_Hex_Digit": ASCII_Hex_Digit, "Bidi_Control": Bidi_Control, "Dash": Dash, "Deprecated": Deprecated, "Diacritic": Diacritic, "Extender": Extender, "Hex_Digit": Hex_Digit, "Hyphen": Hyphen, "IDS_Binary_Operator": IDS_Binary_Operator, "IDS_Trinary_Operator": IDS_Trinary_Operator, "Ideographic": Ideographic, "Join_Control": Join_Control, "Logical_Order_Exception": Logical_Order_Exception, "Noncharacter_Code_Point": Noncharacter_Code_Point, "Other_Alphabetic": Other_Alphabetic, "Other_Default_Ignorable_Code_Point": Other_Default_Ignorable_Code_Point, "Other_Grapheme_Extend": Other_Grapheme_Extend, "Other_ID_Continue": Other_ID_Continue, "Other_ID_Start": Other_ID_Start, "Other_Lowercase": Other_Lowercase, "Other_Math": Other_Math, "Other_Uppercase": Other_Uppercase, "Pattern_Syntax": Pattern_Syntax, "Pattern_White_Space": Pattern_White_Space, "Quotation_Mark": Quotation_Mark, "Radical": Radical, "STerm": STerm, "Soft_Dotted": Soft_Dotted, "Terminal_Punctuation": Terminal_Punctuation, "Unified_Ideograph": Unified_Ideograph, "Variation_Selector": Variation_Selector, "White_Space": White_Space, }
Properties is the set of Unicode property tables.
var Scripts = map[string]*RangeTable{}/* 102 elements not displayed */
Scripts is the set of Unicode script tables.
Functions ¶
func In ¶ added in v1.2.0
func In(r rune, ranges ...*RangeTable) bool
In reports whether the rune is a member of one of the ranges.
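A short sketch of In with several tables follows. The helper name isJapaneseScript is hypothetical, and the choice of three script tables is only for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// isJapaneseScript is a hypothetical helper: it reports whether r belongs to
// any of the Han, Hiragana, or Katakana script tables.
func isJapaneseScript(r rune) bool {
	return unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana)
}

func main() {
	for _, r := range []rune{'a', '\u3042', '\u30A2', '\u6F22'} {
		fmt.Printf("%U %t\n", r, isJapaneseScript(r))
	}
}
```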
func Is ¶
func Is(rangeTab *RangeTable, r rune) bool
Is reports whether the rune is in the specified table of ranges.
func IsControl ¶
func IsControl(r rune) bool
IsControl reports whether the rune is a control character. The C (Other) Unicode category includes more code points such as surrogates; use Is(C, r) to test for them.
func IsGraphic ¶
func IsGraphic(r rune) bool
IsGraphic reports whether the rune is defined as a Graphic by Unicode. Such characters include letters, marks, numbers, punctuation, symbols, and spaces, from categories L, M, N, P, S, Zs.
func IsOneOf ¶
func IsOneOf(ranges []*RangeTable, r rune) bool
IsOneOf reports whether the rune is a member of one of the ranges. The function "In" provides a nicer signature and should be used in preference to IsOneOf.
func IsPrint ¶
func IsPrint(r rune) bool
IsPrint reports whether the rune is defined as printable by Go. Such characters include letters, marks, numbers, punctuation, and symbols, from categories L, M, N, P, S, plus the ASCII space character. This categorization is the same as IsGraphic except that the only spacing character is ASCII space, U+0020.
func IsSpace ¶
func IsSpace(r rune) bool
IsSpace reports whether the rune is a space character as defined by Unicode's White Space property; in the Latin-1 space this is
'\t', '\n', '\v', '\f', '\r', ' ', U+0085 (NEL), U+00A0 (NBSP).
Other definitions of spacing characters are set by category Z and property Pattern_White_Space.
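The three definitions of spacing mentioned above do not agree on every code point. The sketch below compares them; the spaceKinds type and classify helper are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"unicode"
)

// spaceKinds is a hypothetical record of which spacing definitions accept a rune.
type spaceKinds struct {
	WhiteSpace        bool // unicode.IsSpace (White Space property)
	CategoryZ         bool // category Z (separators)
	PatternWhiteSpace bool // Pattern_White_Space property
}

// classify is a hypothetical helper that applies all three tests to r.
func classify(r rune) spaceKinds {
	return spaceKinds{
		WhiteSpace:        unicode.IsSpace(r),
		CategoryZ:         unicode.Is(unicode.Z, r),
		PatternWhiteSpace: unicode.Is(unicode.Pattern_White_Space, r),
	}
}

func main() {
	// Tab, NO-BREAK SPACE, and LEFT-TO-RIGHT MARK each get a different answer.
	for _, r := range []rune{'\t', '\u00A0', '\u200E'} {
		fmt.Printf("%U %+v\n", r, classify(r))
	}
}
```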
func SimpleFold ¶
func SimpleFold(r rune) rune
SimpleFold iterates over Unicode code points equivalent under the Unicode-defined simple case folding. Among the code points equivalent to r (including r itself), SimpleFold returns the smallest rune > r if one exists, or else the smallest rune >= 0.
For example:
SimpleFold('A') = 'a'
SimpleFold('a') = 'A'
SimpleFold('K') = 'k'
SimpleFold('k') = '\u212A' (Kelvin symbol, K)
SimpleFold('\u212A') = 'K'
SimpleFold('1') = '1'
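Because SimpleFold cycles through an equivalence class, calling it repeatedly until it returns to the starting rune enumerates the whole class. A minimal sketch, with foldOrbit as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"unicode"
)

// foldOrbit is a hypothetical helper that returns every code point equivalent
// to r under simple case folding, starting with r itself.
func foldOrbit(r rune) []rune {
	orbit := []rune{r}
	for c := unicode.SimpleFold(r); c != r; c = unicode.SimpleFold(c) {
		orbit = append(orbit, c)
	}
	return orbit
}

func main() {
	fmt.Printf("%U\n", foldOrbit('k')) // 'k', then the Kelvin sign, then 'K'
	fmt.Printf("%U\n", foldOrbit('1')) // only '1'
}
```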
Types ¶
type CaseRange ¶
CaseRange represents a range of Unicode code points for simple (one code point to one code point) case conversion. The range runs from Lo to Hi inclusive, with a fixed stride of 1. Each delta is the number to add to the code point to reach the code point for a different case of that character. Deltas may be negative. A delta of zero means the character is already in the corresponding case. There is a special case representing sequences of alternating corresponding Upper and Lower pairs. It appears with a fixed Delta of
{UpperLower, UpperLower, UpperLower}
The constant UpperLower has an otherwise impossible delta value.
type Range16 ¶
Range16 represents a range of 16-bit Unicode code points. The range runs from Lo to Hi inclusive and has the specified stride.
type Range32 ¶
Range32 represents a range of Unicode code points and is used when one or more of the values will not fit in 16 bits. The range runs from Lo to Hi inclusive and has the specified stride. Lo and Hi must always be >= 1<<16.
type RangeTable ¶
RangeTable defines a set of Unicode code points by listing the ranges of code points within the set. The ranges are listed in two slices to save space: a slice of 16-bit ranges and a slice of 32-bit ranges. The two slices must be in sorted order and non-overlapping. Also, R32 should contain only values >= 0x10000 (1<<16).
type SpecialCase ¶
type SpecialCase []CaseRange
SpecialCase represents language-specific case mappings such as Turkish. Methods of SpecialCase customize (by overriding) the standard mappings.
var AzeriCase SpecialCase = _TurkishCase
var TurkishCase SpecialCase = _TurkishCase
func (SpecialCase) ToLower ¶
func (special SpecialCase) ToLower(r rune) rune
ToLower maps the rune to lower case giving priority to the special mapping.
func (SpecialCase) ToTitle ¶
func (special SpecialCase) ToTitle(r rune) rune
ToTitle maps the rune to title case giving priority to the special mapping.
func (SpecialCase) ToUpper ¶
func (special SpecialCase) ToUpper(r rune) rune
ToUpper maps the rune to upper case giving priority to the special mapping.
Directories ¶
| Path | Synopsis |
|---|---|
| utf16 | Package utf16 implements encoding and decoding of UTF-16 sequences. |
| utf8 | Package utf8 implements functions and constants to support text encoded in UTF-8. |