README
Go Text
This repository holds supplementary Go libraries for text processing, many involving Unicode.
Semantic Versioning
This repo uses Semantic Versioning (http://semver.org/), so it increments the
- MAJOR version when you make incompatible API changes,
- MINOR version when you add functionality in a backwards-compatible manner, and
- PATCH version when you make backwards-compatible bug fixes.
Until version 1.0.0 of x/text is reached, the minor version is considered a major version. So going from 0.1.0 to 0.2.0 is considered to be a major version bump.
A major new CLDR version is mapped to a minor version increase in x/text. Any other new CLDR version is mapped to a patch version increase in x/text.
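The pre-1.0.0 rule above can be sketched as a small check. This is an illustrative sketch, not code from this repository: the helper names parseVersion and isBreaking are hypothetical, and the sketch assumes plain MAJOR.MINOR.PATCH strings with no pre-release or build suffix.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion (hypothetical helper) splits "MAJOR.MINOR.PATCH", with an
// optional leading "v", into its three numeric components.
func parseVersion(s string) (v [3]int, err error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("version %q: want MAJOR.MINOR.PATCH", s)
	}
	for i, p := range parts {
		if v[i], err = strconv.Atoi(p); err != nil {
			return v, fmt.Errorf("version %q: %v", s, err)
		}
	}
	return v, nil
}

// isBreaking (hypothetical helper) reports whether moving between two
// versions may include incompatible API changes under the rule described
// above: before 1.0.0, a minor version change counts as a major one.
func isBreaking(from, to string) (bool, error) {
	a, err := parseVersion(from)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(to)
	if err != nil {
		return false, err
	}
	if a[0] != b[0] {
		return true, nil
	}
	// Same major version: only the 0.x series treats MINOR as breaking.
	return a[0] == 0 && a[1] != b[1], nil
}

func main() {
	pairs := [][2]string{
		{"0.1.0", "0.2.0"},
		{"0.2.0", "0.2.1"},
		{"1.1.0", "1.2.0"},
	}
	for _, p := range pairs {
		breaking, err := isBreaking(p[0], p[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s: breaking=%t\n", p[0], p[1], breaking)
	}
}
```

A check like this could help decide whether a dependency update needs an API review. It does not cover the CLDR mapping, which only determines which component of the x/text version is increased.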
It is important that the Unicode version used in x/text matches the one used by your Go compiler. The x/text repository supports multiple versions of Unicode and will match the version of Unicode to that of the Go compiler. At the moment this is supported for Go compilers from version 1.7.
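To see which Unicode version your Go toolchain uses, you can print the Version constant of the standard library's unicode package. The sketch below uses only the standard library. The wrapper function toolchainUnicodeVersion is a hypothetical name added for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"unicode"
)

// toolchainUnicodeVersion (hypothetical helper) returns the Unicode
// version from which the standard library's tables are derived.
func toolchainUnicodeVersion() string {
	return unicode.Version
}

func main() {
	// runtime.Version reports the Go release that built this program.
	fmt.Println("Go version:     ", runtime.Version())
	fmt.Println("Unicode version:", toolchainUnicodeVersion())
}
```

The printed values depend on the installed Go release, so no fixed output is shown here.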
Download/Install
The easiest way to install is to run go get -u golang.org/x/text. You can also manually git clone the repository to $GOPATH/src/golang.org/x/text.
Contribute
To submit changes to this repository, see http://golang.org/doc/contribute.html.
Testing
Run
go test ./...
from this directory to run all tests. Add the "-tags icu" flag to also run ICU conformance tests (if available). This requires that you have the correct ICU version installed on your system.
TODO:
- updating unversioned source files.
Generating Tables
To generate the tables in this repository (except for the encoding tables), run go generate from this directory. By default, tables are generated for the Unicode version in core and the CLDR version defined in golang.org/x/text/unicode/cldr.
Running go generate will, as a side effect, create a DATA subdirectory in this directory, which holds all files that are used as a source for generating the tables. This directory will also serve as a cache.
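The caching behavior of a directory such as DATA can be pictured with the following sketch. It shows the general fetch-once-then-reuse pattern only and is not the repository's generator code. The names cachedFile and demo, the file name, and the stand-in fetch function are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cachedFile (hypothetical helper) returns the contents of name inside dir.
// It calls fetch and stores the result only when the file is not yet present.
func cachedFile(dir, name string, fetch func() ([]byte, error)) ([]byte, error) {
	path := filepath.Join(dir, name)
	if data, err := os.ReadFile(path); err == nil {
		return data, nil // cache hit: no fetch needed
	}
	data, err := fetch()
	if err != nil {
		return nil, err
	}
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return nil, err
	}
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return nil, err
	}
	return data, nil
}

// demo requests the same file twice from a temporary cache directory and
// returns how many times the fetch function actually ran.
func demo() (fetches int, err error) {
	dir, err := os.MkdirTemp("", "DATA")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	// fetch stands in for a download; the content is made up.
	fetch := func() ([]byte, error) {
		fetches++
		return []byte("example source data\n"), nil
	}
	for i := 0; i < 2; i++ {
		if _, err := cachedFile(dir, "example/source.txt", fetch); err != nil {
			return fetches, err
		}
	}
	return fetches, nil
}

func main() {
	n, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("fetch calls for two requests:", n)
}
```

Under this pattern, deleting the cache directory forces the next run to fetch its sources again.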
Versions
To update a Unicode version run
UNICODE_VERSION=x.x.x go generate
where x.x.x must correspond to a directory in https://www.unicode.org/Public/. If this version is newer than the version in core, it will also update the relevant packages there. The idna package in x/net will always be updated.
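The selection logic described above (use UNICODE_VERSION when it is set, otherwise fall back to the version in core) might look roughly like the sketch below. It is a sketch under stated assumptions and not the repository's generator code. The helper names targetUnicodeVersion and dataDirURL are hypothetical, and the URL layout simply appends the version to the directory named in the text.

```go
package main

import (
	"fmt"
	"os"
	"unicode"
)

// unicodeBaseURL is the directory listing named in the text above.
const unicodeBaseURL = "https://www.unicode.org/Public/"

// targetUnicodeVersion (hypothetical helper) returns the value of the
// UNICODE_VERSION variable as seen through getenv, or the Unicode version
// of the standard library ("core") when that variable is empty.
func targetUnicodeVersion(getenv func(string) string) string {
	if v := getenv("UNICODE_VERSION"); v != "" {
		return v
	}
	return unicode.Version
}

// dataDirURL (hypothetical helper) builds the directory URL that the chosen
// version is assumed to correspond to.
func dataDirURL(version string) string {
	return unicodeBaseURL + version + "/"
}

func main() {
	v := targetUnicodeVersion(os.Getenv)
	fmt.Println("generating tables for Unicode", v)
	fmt.Println("expected source directory:", dataDirURL(v))
}
```

Passing the environment lookup as a parameter keeps the fallback easy to exercise without changing the real environment.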
To update a CLDR version run
CLDR_VERSION=version go generate
where version must correspond to a directory in https://www.unicode.org/Public/cldr/.
Note that the code gets adapted over time to changes in the data and that backwards compatibility is not maintained. So updating to a different version may not work.
The files in DATA/{iana|icu|w3|whatwg} are currently not versioned.
Report Issues / Send Patches
This repository uses Gerrit for code changes. To learn how to submit changes to this repository, see https://golang.org/doc/contribute.html.
The main issue tracker for the text repository is located at https://github.com/golang/go/issues. Prefix your issue with "x/text:" in the subject line, so it is easy to find.
Documentation
Overview
text is a repository of text-related packages for internationalization (i18n) and localization (l10n), such as character encodings, text transformations, and locale-specific text handling.
There is a 30-minute video, recorded on 2017-11-30, on the "State of golang.org/x/text" at https://www.youtube.com/watch?v=uYrDrMEGu58
Directories
Path | Synopsis |
---|---|
cases | Package cases provides general and language-specific case mappers. |
cmd/gotext | gotext is a tool for managing text in Go source code. |
cmd/gotext/examples/extract | |
cmd/gotext/examples/extract_http | |
cmd/gotext/examples/extract_http/pkg | |
cmd/gotext/examples/rewrite | |
collate | Package collate contains types for comparing and sorting Unicode strings according to a given collation order. |
collate/build | |
collate/tools/colcmp | |
currency | Package currency contains currency-related functionality. |
date | |
encoding | Package encoding defines an interface for character encodings, such as Shift JIS and Windows 1252, that can convert to and from UTF-8. |
encoding/charmap | Package charmap provides simple character encodings such as IBM Code Page 437 and Windows 1252. |
encoding/htmlindex | Package htmlindex maps character set encoding names to Encodings as recommended by the W3C for use in HTML 5. |
encoding/ianaindex | Package ianaindex maps names to Encodings as specified by the IANA registry. |
encoding/internal | Package internal contains code that is shared among encoding implementations. |
encoding/internal/enctest | |
encoding/internal/identifier | Package identifier defines the contract between implementations of Encoding and Index by defining identifiers that uniquely identify standardized coded character sets (CCS) and character encoding schemes (CES), which we will together refer to as encodings, for which Encoding implementations provide converters to and from UTF-8. |
encoding/japanese | Package japanese provides Japanese encodings such as EUC-JP and Shift JIS. |
encoding/korean | Package korean provides Korean encodings such as EUC-KR. |
encoding/simplifiedchinese | Package simplifiedchinese provides Simplified Chinese encodings such as GBK. |
encoding/traditionalchinese | Package traditionalchinese provides Traditional Chinese encodings such as Big5. |
encoding/unicode | Package unicode provides Unicode encodings such as UTF-16. |
encoding/unicode/utf32 | Package utf32 provides the UTF-32 Unicode encoding. |
feature/plural | Package plural provides utilities for handling linguistic plurals in text. |
internal | Package internal contains non-exported functionality that is used by packages in the text repository. |
internal/catmsg | Package catmsg contains support types for package x/text/message/catalog. |
internal/cldrtree | Package cldrtree builds and generates a CLDR index file, including all inheritance. |
internal/colltab | Package colltab contains functionality related to collation tables. |
internal/export/idna | Package idna implements IDNA2008 using the compatibility processing defined by UTS (Unicode Technical Standard) #46, which defines a standard to deal with the transition from IDNA2003. |
internal/export/unicode | Package unicode generates the Unicode tables in core. |
internal/format | Package format contains types for defining language-specific formatting of values. |
internal/gen | Package gen contains common code for the various code generation tools in the text repository. |
internal/gen/bitfield | Package bitfield converts annotated structs into integer values. |
internal/language | |
internal/language/compact | Package compact defines a compact representation of language tags. |
internal/number | Package number contains tools and data for formatting numbers. |
internal/stringset | Package stringset provides a way to represent a collection of strings compactly. |
internal/tag | Package tag contains functionality handling tags and related data. |
internal/testtext | Package testtext contains test data that is of common use to the text repository. |
internal/triegen | Package triegen implements a code generator for a trie for associating unsigned integer values with UTF-8 encoded runes. |
internal/ucd | Package ucd provides a parser for Unicode Character Database files, the format of which is defined in https://www.unicode.org/reports/tr44/. |
internal/utf8internal | Package utf8internal contains low-level utf8-related constants, tables, etc. |
language | Package language implements BCP 47 language tags and related functionality. |
language/display | Package display provides display names for languages, scripts and regions in a requested language. |
message | Package message implements formatted I/O for localized strings with functions analogous to fmt's print functions. |
message/catalog | Package catalog defines collections of translated format strings. |
message/pipeline | Package pipeline provides tools for creating translation pipelines. |
number | Package number formats numbers according to the customs of different locales. |
runes | Package runes provides transforms for UTF-8 encoded text. |
search | Package search provides language-specific search and string matching. |
secure | secure is a repository of text security related packages. |
secure/bidirule | Package bidirule implements the Bidi Rule defined by RFC 5893. |
secure/precis | Package precis contains types and functions for the preparation, enforcement, and comparison of internationalized strings ("PRECIS") as defined in RFC 8264. |
transform | Package transform provides reader and writer wrappers that transform the bytes passing through as well as various transformations. |
unicode | unicode holds packages with implementations of Unicode standards that are mostly used as building blocks for other packages in golang.org/x/text, layout engines, or are otherwise more low-level in nature. |
unicode/bidi | Package bidi contains functionality for bidirectional text support. |
unicode/cldr | Package cldr provides a parser for LDML and related XML formats. |
unicode/norm | Package norm contains types and functions for normalizing Unicode strings. |
unicode/rangetable | Package rangetable provides utilities for creating and inspecting unicode.RangeTables. |
unicode/runenames | Package runenames provides rune names from the Unicode Character Database. |
width | Package width provides functionality for handling different widths in text. |