unidecode

package
v1.2.0
Published: Mar 5, 2025 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package unidecode provides ASCII transliterations of Unicode text. Unicode characters are mapped to ASCII characters based on their phonetic representation.

The package provides three ways to transliterate Unicode text:

  1. The Unidecode function transliterates a string into plain 7-bit ASCII and returns the result as a string.
  2. The Append function transliterates a string into plain 7-bit ASCII and appends the result to a byte slice.
  3. The NewWriter function creates a writer that transliterates Unicode text into plain 7-bit ASCII and writes the result to an io.Writer.

The package also provides an ErrorHandling type that specifies how to handle errors during transliteration.

For best results, apply NFC or NFKC normalization to the input text before transliterating it:

import (
	"fmt"

	"github.com/aisbergg/go-unidecode/pkg/unidecode"
	"golang.org/x/text/unicode/norm"
)

s := "北京kožušček"
n := norm.NFKC.String(s)
d, _ := unidecode.Unidecode(n, unidecode.Ignore)
fmt.Println(d)
// Output: Bei Jing kozuscek

Constants

This section is empty.

Variables

This section is empty.

Functions

func Append added in v1.2.0

func Append(b []byte, s string, errors ErrorHandling, replacement ...string) ([]byte, error)

Append transliterates Unicode text into plain 7-bit ASCII, appends the result to the byte slice, and returns the updated slice.

Example
package main

import (
	"fmt"

	"github.com/aisbergg/go-unidecode/pkg/unidecode"
)

func main() {
	s := "北京kožušček"
	buf := make([]byte, 0, len(s)+len(s)/3)
	b, err := unidecode.Append(buf, s, unidecode.Ignore)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(b))
}
Output:

Bei Jing kozuscek

func AppendBytes added in v1.2.0

func AppendBytes(b, s []byte, errors ErrorHandling, replacement ...string) ([]byte, error)

AppendBytes is like Append, but takes the input text as a byte slice instead of a string. It transliterates s into plain 7-bit ASCII, appends the result to b, and returns the updated slice.
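
Append and AppendBytes have the same shape as the append-style helpers in the standard library, such as strconv.AppendInt: the caller owns the destination slice and can reuse it across calls. The following is a minimal sketch of that calling pattern only. The appendASCII function and its three-entry table are hypothetical stand-ins, not the package's implementation, and unmapped characters are simply dropped here.

```go
package main

import "fmt"

// table is a tiny, hypothetical transliteration table used for illustration.
var table = map[rune]byte{'ž': 'z', 'š': 's', 'č': 'c'}

// appendASCII is a toy stand-in for an Append-style function. It appends the
// ASCII form of s to dst and returns the (possibly reallocated) slice.
func appendASCII(dst []byte, s string) []byte {
	for _, r := range s {
		if r < 0x80 { // already ASCII
			dst = append(dst, byte(r))
		} else if c, ok := table[r]; ok {
			dst = append(dst, c)
		}
		// Runes without a mapping are dropped in this sketch.
	}
	return dst
}

func main() {
	buf := make([]byte, 0, 64)
	for _, s := range []string{"kožušček", "žš"} {
		// buf[:0] keeps the backing array, so no new allocation is
		// needed as long as the result fits into the capacity.
		buf = appendASCII(buf[:0], s)
		fmt.Println(string(buf))
	}
}
```

Passing a non-empty slice works the same way: the result is placed after the bytes that are already there.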

func Unidecode

func Unidecode(s string, errors ErrorHandling, replacement ...string) (string, error)

Unidecode transliterates Unicode text into plain 7-bit ASCII.

Example
package main

import (
	"fmt"

	"github.com/aisbergg/go-unidecode/pkg/unidecode"
)

func main() {
	s := "北京kožušček"
	d, _ := unidecode.Unidecode(s, unidecode.Ignore)
	fmt.Println(d)
}
Output:

Bei Jing kozuscek
Example (ErrorPreserve)
package main

import (
	"fmt"

	"github.com/aisbergg/go-unidecode/pkg/unidecode"
)

func main() {
	s := "⁐"
	d, _ := unidecode.Unidecode(s, unidecode.Preserve)
	fmt.Println(d)
}
Output:

⁐
Example (ErrorReplace)
package main

import (
	"fmt"

	"github.com/aisbergg/go-unidecode/pkg/unidecode"
)

func main() {
	s := "⁐"
	d, _ := unidecode.Unidecode(s, unidecode.Replace, "?")
	fmt.Println(d)
}
Output:

?
Example (ErrorStrict)
package main

import (
	"fmt"

	"github.com/aisbergg/go-unidecode/pkg/unidecode"
)

func main() {
	s := "北京⁐"
	_, err := unidecode.Unidecode(s, unidecode.Strict)
	fmt.Println(err)
}
Output:

no replacement found for character '⁐' at offset 6

func UnidecodeBytes added in v1.2.0

func UnidecodeBytes(b []byte, errors ErrorHandling, replacement ...string) ([]byte, error)

UnidecodeBytes is like Unidecode, but takes the input text as a byte slice and returns the plain 7-bit ASCII result as a byte slice.

Types

type Buffer added in v1.2.0

type Buffer interface {
	io.StringWriter
	fmt.Stringer
}

type Error

type Error struct {
	// contains filtered or unexported fields
}

Error represents an error that occurred during transliteration.

func (*Error) Error

func (e *Error) Error() string

Error returns the formatted error message.

type ErrorHandling

type ErrorHandling uint8

ErrorHandling specifies how the transliteration functions behave when they encounter a character that cannot be transliterated.

const (
	// Ignore specifies that untransliteratable characters should be ignored.
	Ignore ErrorHandling = iota
	// Strict specifies that untransliteratable characters should cause an
	// error.
	Strict
	// Replace specifies that untransliteratable characters should be replaced
	// with a given replacement value.
	Replace
	// Preserve specifies that untransliteratable characters should be
	// preserved.
	Preserve
)
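
The four modes can be pictured with a small, self-contained sketch. Everything in it is a stand-in: the mode type, the transliterate function and the three-entry table are hypothetical and only mirror the documented behavior of Ignore, Strict, Replace and Preserve. It assumes that Ignore drops the character from the output.

```go
package main

import (
	"fmt"
	"strings"
)

// mode is a stand-in for the package's ErrorHandling type.
type mode uint8

const (
	ignore mode = iota
	strict
	replace
	preserve
)

// table is a tiny, hypothetical transliteration table.
var table = map[rune]string{'ž': "z", 'š': "s", 'č': "c"}

// transliterate maps s to ASCII using table. Runes that are neither ASCII
// nor in table are handled according to m.
func transliterate(s string, m mode, repl string) (string, error) {
	var sb strings.Builder
	for i, r := range s {
		if r < 0x80 { // already ASCII
			sb.WriteRune(r)
			continue
		}
		if t, ok := table[r]; ok {
			sb.WriteString(t)
			continue
		}
		switch m {
		case ignore:
			// Drop the rune (assumed meaning of "ignored").
		case strict:
			return "", fmt.Errorf("no replacement found for character %q at offset %d", r, i)
		case replace:
			sb.WriteString(repl)
		case preserve:
			sb.WriteRune(r)
		}
	}
	return sb.String(), nil
}

func main() {
	for _, m := range []mode{ignore, strict, replace, preserve} {
		out, err := transliterate("kožušček⁐", m, "?")
		fmt.Printf("mode %d: %q, err: %v\n", m, out, err)
	}
}
```

Note that Preserve can leave non-ASCII characters in the result, so output produced in that mode is not guaranteed to be 7-bit clean.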

type Writer added in v1.2.0

type Writer struct {
	// contains filtered or unexported fields
}

Writer is an io.Writer that transliterates Unicode text into plain 7-bit ASCII.

func NewWriter added in v1.2.0

func NewWriter(w io.Writer, errors ErrorHandling, replacement ...string) Writer

NewWriter returns a new Writer that transliterates the text written to it into plain 7-bit ASCII and writes the result to w.

Example
package main

import (
	"fmt"
	"strings"

	"github.com/aisbergg/go-unidecode/pkg/unidecode"
)

func main() {
	s := "北京kožušček"
	bld := strings.Builder{}
	w := unidecode.NewWriter(&bld, unidecode.Ignore)
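	// Write can fail, for example in Strict mode or if the underlying
	// writer returns an error, so check the result.
	if _, err := w.Write([]byte(s)); err != nil {
		fmt.Println(err)
		return
	}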
	fmt.Println(bld.String())
}
Output:

Bei Jing kozuscek

func (Writer) Write added in v1.2.0

func (uw Writer) Write(p []byte) (n int, err error)

func (Writer) WriteString added in v1.2.0

func (uw Writer) WriteString(s string) (n int, err error)
