utf-deconfuse

command module
v0.0.0-...-282c629 Latest
Published: Nov 14, 2025 License: MIT Imports: 6 Imported by: 0

README

Unicode Deconfuser

utf-deconfuse detects and replaces Unicode confusable characters.

See background and security risks for more information on why this program exists.

Quick start

go install codeberg.org/pft/utf-deconfuse@latest

# go installs the executable under ${GOPATH}/bin
# run 'go env' to find out more.
utf-deconfuse -help
Usage of ./utf-deconfuse:
  -diff
    	Enable diff mode
  -file string
    	Path to file to check for confusables
  -ignore-ascii
    	Do not consider ASCII codepoints (even if confusable, e.g., '"') (default true)
  -output string
    	Path to output file (defaults to stdout)

Usage notes:

  • utf-deconfuse can also read from stdin when -file is not specified.
  • ASCII characters are ignored by default (pass -ignore-ascii=false to check them as well).
  • -diff prints what has been changed and is meant for use in a terminal.
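
The help output above has the shape produced by Go's standard flag package. Assuming the tool parses its options with that package (an assumption; the tool's source is not shown here), a boolean flag only accepts an explicit value in the -name=value form. The sketch below uses a hypothetical helper, parseIgnoreASCII, to show the difference:

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseIgnoreASCII is a hypothetical helper. It parses args with a flag set
// that mirrors only the -ignore-ascii option from the help text (default
// true) and returns the flag value plus any leftover positional arguments.
func parseIgnoreASCII(args []string) (bool, []string, error) {
	fs := flag.NewFlagSet("utf-deconfuse", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	ignore := fs.Bool("ignore-ascii", true, "Do not consider ASCII codepoints")
	if err := fs.Parse(args); err != nil {
		return false, nil, err
	}
	return *ignore, fs.Args(), nil
}

func main() {
	for _, args := range [][]string{
		{"-ignore-ascii=false"},    // value attached with '=': flag becomes false
		{"-ignore-ascii", "false"}, // flag stays true, "false" is a positional argument
	} {
		ignore, rest, err := parseIgnoreASCII(args)
		fmt.Printf("%q -> ignore-ascii=%v positional=%q err=%v\n", args, ignore, rest, err)
	}
}
```

With the space-separated form, the flag package treats -ignore-ascii as "set to true" and stops parsing at the word false, so the default is not overridden.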

Background

Unicode defines more than 150,000 characters in a code space of 1,114,112 code points. Many of them look alike and can be confused with each other. For example, a (U+0061, LATIN SMALL LETTER A) looks much like 𝖺 (U+1D5BA, MATHEMATICAL SANS-SERIF SMALL A). Unicode refers to such characters as "confusables".
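
To make the example concrete, here is a minimal Go sketch that prints the code points of the two characters and lists the non-ASCII runes in a string. The helper nonASCII is hypothetical and only illustrates the idea; it is not how utf-deconfuse itself detects confusables.

```go
package main

import (
	"fmt"
	"unicode"
)

// nonASCII is a hypothetical helper that returns the runes in s
// which fall outside the ASCII range.
func nonASCII(s string) []rune {
	var out []rune
	for _, r := range s {
		if r > unicode.MaxASCII {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	latin := 'a'         // LATIN SMALL LETTER A
	math := '\U0001D5BA' // MATHEMATICAL SANS-SERIF SMALL A

	// The glyphs look alike, but the code points differ.
	fmt.Printf("%c = %U\n", latin, latin)
	fmt.Printf("%c = %U\n", math, math)
	fmt.Println("same rune:", latin == math)

	// A string that looks like "password" but hides one confusable.
	for _, r := range nonASCII("p\U0001D5BAssword") {
		fmt.Printf("non-ASCII rune: %U\n", r)
	}
}
```

A plain non-ASCII check like this flags every accented or non-Latin character, which is why a real detector needs a table of known confusables instead.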

Security risks

Confusable characters can be and have been abused in a number of attacks:

  • IDN homograph attack: confusable characters used in domain names to impersonate a legitimate name, e.g., wikipediа.org instead of wikipedia.org.
  • Trojan source attack: confusable characters used in source code.

An in-depth discussion of security issues is given in Unicode Technical Report #36. Unicode Technical Standard #39 discusses detection mechanisms.
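
As a rough illustration of how a replacement step can defuse a homograph, the following Go sketch maps a few confusables back to ASCII. The table sampleConfusables and the function deconfuse are hypothetical and hand-written for this example; the real tool works from the confusables data published on unicode.org, and its internal API may look quite different. The sketch also assumes the spoofed domain uses CYRILLIC SMALL LETTER A (U+0430).

```go
package main

import (
	"fmt"
	"strings"
)

// sampleConfusables is a tiny hand-written table for illustration only.
// It is NOT the table used by utf-deconfuse.
var sampleConfusables = map[rune]rune{
	'\u0430':     'a', // CYRILLIC SMALL LETTER A
	'\u043E':     'o', // CYRILLIC SMALL LETTER O
	'\U0001D5BA': 'a', // MATHEMATICAL SANS-SERIF SMALL A
}

// deconfuse replaces every rune found in sampleConfusables with its
// ASCII target and reports how many runes were replaced.
func deconfuse(s string) (string, int) {
	var b strings.Builder
	replaced := 0
	for _, r := range s {
		if target, ok := sampleConfusables[r]; ok {
			r = target
			replaced++
		}
		b.WriteRune(r)
	}
	return b.String(), replaced
}

func main() {
	spoofed := "wikipedi\u0430.org" // last 'a' before the dot is Cyrillic
	clean, n := deconfuse(spoofed)
	fmt.Printf("%q -> %q (%d replaced)\n", spoofed, clean, n)
	fmt.Println("matches legitimate name:", clean == "wikipedia.org")
}
```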

A note on the name

Strictly speaking, it should be called unicode-deconfuse, since UTF is only an encoding, but utf-deconfuse is much shorter.

Documentation

There is no documentation for this package.

Directories

Path Synopsis
Package deconfuse provides convenient wrappers to convert UTF confusable runes.
transform
Code generated.
internal
files
Package transform provides helper functions to manage files.
mapgen
Package util provides convenient methods to fetch and parse the table of confusable characters from unicode.org.