Documentation
Overview
Package unidecode provides ASCII transliterations of Unicode text. Unicode characters are mapped to ASCII characters based on their phonetic representation.
The package provides three ways to transliterate Unicode text:
- The Unidecode function transliterates a string into plain 7-bit ASCII.
- The Append function transliterates a string into plain 7-bit ASCII and appends the result to a byte slice.
- The NewWriter function creates a writer that transliterates Unicode text into plain 7-bit ASCII and writes the result to an io.Writer.
The package also provides an ErrorHandling type that specifies how characters without an ASCII transliteration are handled.
Best results are achieved by first applying NFC or NFKC normalization to the input text:
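package main

import (
	"fmt"

	"github.com/aisbergg/go-unidecode/pkg/unidecode"
	"golang.org/x/text/unicode/norm"
)

func main() {
	s := "北京kožušček"
	// NFKC composes decomposed sequences and folds compatibility
	// characters (e.g. the ligature "ﬁ" becomes "fi") before transliteration.
	n := norm.NFKC.String(s)
	d, _ := unidecode.Unidecode(n, unidecode.Ignore)
	fmt.Println(d)
	// Output: Bei Jing kozuscek
}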
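Why normalization matters can be seen with a letter such as ž, which Unicode can encode either as one precomposed code point (U+017E, the NFC form) or as z followed by a combining caron (U+030C, the NFD form). The sketch below uses a hypothetical per-rune table, `table`, and a hypothetical `naiveTranslit` function that only know the precomposed letters; it is not how this package stores its data, only an illustration of why normalized input tends to transliterate more predictably.

```go
package main

import (
	"fmt"
	"strings"
)

// table is a hypothetical, deliberately tiny transliteration table that
// contains only precomposed letters, the form NFC/NFKC produces.
var table = map[rune]string{
	'\u017e': "z", // ž LATIN SMALL LETTER Z WITH CARON
	'\u0161': "s", // š LATIN SMALL LETTER S WITH CARON
	'\u010d': "c", // č LATIN SMALL LETTER C WITH CARON
}

// naiveTranslit passes ASCII through, maps known runes via table, and
// writes "?" for anything else so the gaps are visible.
func naiveTranslit(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r < 0x80 {
			b.WriteRune(r)
			continue
		}
		if t, ok := table[r]; ok {
			b.WriteString(t)
			continue
		}
		b.WriteByte('?')
	}
	return b.String()
}

func main() {
	composed := "ko\u017eu\u0161\u010dek"      // "kožušček", precomposed letters (NFC)
	decomposed := "koz\u030cus\u030cc\u030cek" // same word, base letter + combining caron (NFD)

	fmt.Println(naiveTranslit(composed))   // kozuscek
	fmt.Println(naiveTranslit(decomposed)) // koz?us?c?ek
}
```

Running the input through norm.NFC or norm.NFKC first collapses each decomposed pair into the single code point such a table expects. Whether this package's own tables also cover combining marks is not stated here.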
Index
- func Append(b []byte, s string, errors ErrorHandling, replacement ...string) ([]byte, error)
- func AppendBytes(b, s []byte, errors ErrorHandling, replacement ...string) ([]byte, error)
- func Unidecode(s string, errors ErrorHandling, replacement ...string) (string, error)
- func UnidecodeBytes(b []byte, errors ErrorHandling, replacement ...string) ([]byte, error)
- type Buffer
- type Error
- type ErrorHandling
- type Writer
  - func NewWriter(w io.Writer, errors ErrorHandling, replacement ...string) Writer
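Examples
- Append
- NewWriter
- Unidecode
- Unidecode (ErrorPreserve)
- Unidecode (ErrorReplace)
- Unidecode (ErrorStrict)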
Constants
This section is empty.
Variables
This section is empty.
Functions
func Append (added in v1.2.0)
func Append(b []byte, s string, errors ErrorHandling, replacement ...string) ([]byte, error)
Append transliterates Unicode text into plain 7-bit ASCII, appends the result to b, and returns the updated slice.
Example
package main

import (
	"fmt"

	"github.com/aisbergg/go-unidecode/pkg/unidecode"
)

func main() {
	s := "北京kožušček"
	// The capacity is only a hint; Append grows the slice if the
	// transliterated output needs more room.
	buf := make([]byte, 0, len(s)+len(s)/3)
	b, err := unidecode.Append(buf, s, unidecode.Ignore)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(b))
}
Output: Bei Jing kozuscek
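Append follows the same append-style convention as the standard library's strconv.AppendInt: the caller owns the buffer, the result must always be assigned back (append may reallocate), and one buffer can be reused across calls. The sketch below shows that calling pattern with a hypothetical stand-in, `appendASCII`, which simply drops non-ASCII runes; it is not this package's transliteration, only an illustration of how the returned slice is meant to be used.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// appendASCII is a hypothetical stand-in for an append-style
// transliterator: it appends the ASCII runes of s to b and drops the rest.
func appendASCII(b []byte, s string) []byte {
	for _, r := range s {
		if r < utf8.RuneSelf {
			b = append(b, byte(r))
		}
	}
	return b
}

func main() {
	// Appending after existing content keeps the prefix intact.
	line := appendASCII([]byte("name="), "Grüße")
	fmt.Println(string(line)) // name=Gre

	// Reusing one buffer: buf[:0] resets the length but keeps the
	// capacity, so no new allocation is needed while results fit.
	buf := make([]byte, 0, 32)
	for _, s := range []string{"kožušček", "naïve"} {
		buf = appendASCII(buf[:0], s)
		fmt.Println(string(buf))
	}
}
```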
func AppendBytes (added in v1.2.0)
func AppendBytes(b, s []byte, errors ErrorHandling, replacement ...string) ([]byte, error)
AppendBytes transliterates the Unicode text in s into plain 7-bit ASCII, appends the result to b, and returns the updated slice.
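Because AppendBytes and UnidecodeBytes take a []byte rather than a string, the input has to be decoded one rune at a time. The sketch below shows a common way to do that with unicode/utf8, including an ASCII fast path. The helper name `walkRunes` and the choice to report invalid bytes as utf8.RuneError are assumptions for illustration, not a description of how this package treats malformed UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// walkRunes is a hypothetical helper that decodes b one rune at a time
// and calls visit with each rune and its byte offset. Invalid UTF-8
// bytes are reported as utf8.RuneError (U+FFFD) with width 1.
func walkRunes(b []byte, visit func(r rune, offset int)) {
	for i := 0; i < len(b); {
		if b[i] < utf8.RuneSelf {
			// ASCII fast path: a single byte is a whole rune.
			visit(rune(b[i]), i)
			i++
			continue
		}
		r, size := utf8.DecodeRune(b[i:])
		visit(r, i)
		i += size
	}
}

func main() {
	input := []byte("až\xff") // 'a', 'ž' (2 bytes), one invalid byte
	walkRunes(input, func(r rune, off int) {
		fmt.Printf("offset %d: %U\n", off, r)
	})
}
```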
func Unidecode
func Unidecode(s string, errors ErrorHandling, replacement ...string) (string, error)
Unidecode transliterates Unicode text into plain 7-bit ASCII.
Example
package main

import (
	"fmt"

	"github.com/aisbergg/go-unidecode/pkg/unidecode"
)

func main() {
	s := "北京kožušček"
	d, _ := unidecode.Unidecode(s, unidecode.Ignore)
	fmt.Println(d)
}
Output: Bei Jing kozuscek
Example (ErrorPreserve)
package main

import (
	"fmt"

	"github.com/aisbergg/go-unidecode/pkg/unidecode"
)

func main() {
	s := "⁐"
	d, _ := unidecode.Unidecode(s, unidecode.Preserve)
	fmt.Println(d)
}
Output: ⁐
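With Preserve, characters that have no transliteration are copied through unchanged, so the result is no longer guaranteed to be 7-bit ASCII (the output above is ⁐ itself). If downstream code relies on pure ASCII, a check such as the hypothetical `isASCII` below can be applied to the result; it uses only the standard library and is not part of this package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isASCII reports whether every byte of s is in the 7-bit ASCII range.
// Any byte >= utf8.RuneSelf (0x80) belongs to a multi-byte sequence.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isASCII("Bei Jing kozuscek")) // true
	fmt.Println(isASCII("⁐"))                 // false
}
```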
Example (ErrorReplace)
package main

import (
	"fmt"

	"github.com/aisbergg/go-unidecode/pkg/unidecode"
)

func main() {
	s := "⁐"
	d, _ := unidecode.Unidecode(s, unidecode.Replace, "?")
	fmt.Println(d)
}
Output: ?
Example (ErrorStrict)
package main

import (
	"fmt"

	"github.com/aisbergg/go-unidecode/pkg/unidecode"
)

func main() {
	s := "北京⁐"
	_, err := unidecode.Unidecode(s, unidecode.Strict)
	fmt.Println(err)
}
Output: no replacement found for character '⁐' at offset 6
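The offset in this message appears to be a byte offset into the UTF-8 input rather than a character index: 北 and 京 take three bytes each, so ⁐ starts at byte 6 even though it is the third character. That the package counts bytes is inferred from the output above, not stated in its documentation. The standard-library sketch below computes both numbers for the same string; `byteOffsetOf` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteOffsetOf returns the byte offset at which the n-th rune (0-based)
// of s starts, or -1 if s has fewer than n+1 runes. Ranging over a
// string yields byte offsets, not rune indexes.
func byteOffsetOf(s string, n int) int {
	i := 0
	for off := range s {
		if i == n {
			return off
		}
		i++
	}
	return -1
}

func main() {
	s := "北京⁐"
	fmt.Println(byteOffsetOf(s, 2))        // 6: byte offset of the third rune
	fmt.Println(utf8.RuneCountInString(s)) // 3: number of runes
	fmt.Println(len(s))                    // 9: number of bytes
}
```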
func UnidecodeBytes (added in v1.2.0)
func UnidecodeBytes(b []byte, errors ErrorHandling, replacement ...string) ([]byte, error)
UnidecodeBytes transliterates Unicode text into plain 7-bit ASCII.
Types
type Error
type Error struct {
// contains filtered or unexported fields
}
Error represents an error that occurred during transliteration.
type ErrorHandling
type ErrorHandling uint8
ErrorHandling specifies how Unidecode and the other transliteration functions handle characters that have no ASCII transliteration.
const (
	// Ignore specifies that untransliteratable characters should be ignored.
	Ignore ErrorHandling = iota
	// Strict specifies that untransliteratable characters should cause an
	// error.
	Strict
	// Replace specifies that untransliteratable characters should be replaced
	// with a given replacement value.
	Replace
	// Preserve specifies that untransliteratable characters should be
	// preserved.
	Preserve
)
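The four modes differ only in what happens when a character has no entry in the transliteration table. The sketch below reimplements that decision with a hypothetical two-entry table, a hypothetical `translit` function, and a local `Mode` type standing in for ErrorHandling. It mirrors the documented behavior of Ignore, Strict, Replace, and Preserve, but it is not this package's code; in particular, the replacement here is a plain string parameter rather than the variadic replacement ...string of the real functions.

```go
package main

import (
	"fmt"
	"strings"
)

// Mode is a local stand-in for unidecode.ErrorHandling.
type Mode uint8

const (
	Ignore   Mode = iota // drop untransliterable characters
	Strict               // stop with an error
	Replace              // substitute the replacement string
	Preserve             // copy the character through unchanged
)

// table is a hypothetical, tiny transliteration table.
var table = map[rune]string{'北': "Bei ", '京': "Jing "}

// translit applies table to s and handles missing entries per mode.
func translit(s string, mode Mode, replacement string) (string, error) {
	var b strings.Builder
	for off, r := range s {
		if r < 0x80 {
			b.WriteRune(r)
			continue
		}
		if t, ok := table[r]; ok {
			b.WriteString(t)
			continue
		}
		switch mode {
		case Ignore:
			// Write nothing for this character.
		case Strict:
			return "", fmt.Errorf("no replacement found for character '%c' at offset %d", r, off)
		case Replace:
			b.WriteString(replacement)
		case Preserve:
			b.WriteRune(r)
		}
	}
	return b.String(), nil
}

func main() {
	for _, m := range []Mode{Ignore, Replace, Preserve, Strict} {
		out, err := translit("北京⁐", m, "?")
		fmt.Printf("mode %d: %q, err: %v\n", m, out, err)
	}
}
```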
type Writer (added in v1.2.0)
type Writer struct {
// contains filtered or unexported fields
}
Writer is an io.Writer that transliterates Unicode text into plain 7-bit ASCII.
func NewWriter (added in v1.2.0)
func NewWriter(w io.Writer, errors ErrorHandling, replacement ...string) Writer
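NewWriter returns a new Writer that transliterates text written to it and writes the result to w, handling untransliteratable characters according to errors.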
Example
package main

import (
	"fmt"
	"strings"

	"github.com/aisbergg/go-unidecode/pkg/unidecode"
)

func main() {
	s := "北京kožušček"
	var bld strings.Builder
	w := unidecode.NewWriter(&bld, unidecode.Ignore)
	if _, err := w.Write([]byte(s)); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(bld.String())
}
Output: Bei Jing kozuscek