anonymizer

Version: v1.0.1
Published: Nov 24, 2024 | License: MIT | Imports: 7 | Imported by: 0

README

anonymizer

Go package for anonymizing text. It removes all kinds of PII: names, places, phone numbers, etc.

The main design principle is "better safe than sorry": if the package isn't sure whether a word should be anonymized, it anonymizes it. That covers all non-dictionary words and every word that starts with a capital letter, unless it is at the beginning of a sentence.
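The rule above can be sketched as a small predicate. This is only an illustration under stated assumptions, not the package's implementation: `shouldAnonymize` is a hypothetical helper, a plain map stands in for the dictionary, and the case-insensitive lookup is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// shouldAnonymize is a hypothetical helper, not part of the package API.
// It reports whether a word would be masked under the "better safe than
// sorry" rule: mask the word unless it is clearly safe.
func shouldAnonymize(word string, sentenceStart bool, dict map[string]struct{}) bool {
	// Assumption: the dictionary lookup ignores case.
	if _, known := dict[strings.ToLower(word)]; !known {
		// Non-dictionary words are always masked.
		return true
	}
	// A capitalized dictionary word is masked unless it opens a sentence.
	first, _ := utf8.DecodeRuneInString(word)
	return unicode.IsUpper(first) && !sentenceStart
}

func main() {
	// A tiny stand-in dictionary for the demo.
	dict := map[string]struct{}{"good": {}, "morning": {}, "my": {}, "name": {}, "is": {}, "gram": {}}

	fmt.Println(shouldAnonymize("Good", true, dict))       // false: dictionary word at sentence start
	fmt.Println(shouldAnonymize("Gram", false, dict))      // true: capitalized mid-sentence
	fmt.Println(shouldAnonymize("amsterdam", false, dict)) // true: not in the dictionary
}
```

Note that "gram" is a dictionary word here, yet "Gram" is still masked because of its capital letter: when in doubt, the rule errs toward masking.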

Example

Input:

Good morning, doctor. My name is Gram. I live in amsterdam, at kerkstraat 42. My social number is 123-456.

Output:

Good morning, doctor. My name is █▄▄▄. I live in ▄▄▄▄▄▄▄▄▄, at ▄▄▄▄▄▄▄▄▄▄ 00. My social number is 000-000.
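The output above suggests that masking works per character: uppercase letters, lowercase letters, and digits each get their own placeholder, while punctuation such as the hyphen is kept. A minimal sketch of that idea follows, assuming this per-character behavior; `maskWord` is a hypothetical helper and not the package's code.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// maskWord is a hypothetical helper, not part of the package API.
// It replaces each rune by the placeholder for its character class
// and leaves everything else (such as punctuation) untouched.
func maskWord(word string, upper, lower, digit rune) string {
	var b strings.Builder
	for _, r := range word {
		switch {
		case unicode.IsUpper(r):
			b.WriteRune(upper)
		case unicode.IsLower(r):
			b.WriteRune(lower)
		case unicode.IsDigit(r):
			b.WriteRune(digit)
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(maskWord("Gram", '█', '▄', '0'))    // █▄▄▄
	fmt.Println(maskWord("123-456", '█', '▄', '0')) // 000-000
}
```

Keeping the length and character classes of each word preserves the shape of the text while hiding its content.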

Installation

go get github.com/orsinium-labs/anonymizer

Make sure you have dictionaries installed for the language you're going to anonymize. For example, for American English:

sudo apt install wamerican

To list dictionaries that you already have installed:

ls /usr/share/dict

To list all dictionaries that can be installed:

sudo apt install aptitude
aptitude search '?provides(wordlist)'

If the language is not found or not provided, the default one will be used. Run sudo select-default-wordlist to change the system default.

Usage

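// Requires imports "fmt" and "github.com/orsinium-labs/anonymizer".
input := "Hi, my name is Gram."
dict, err := anonymizer.LoadDict("en")
if err != nil {
    panic(err)
}
a := anonymizer.New(dict)
// Set the letter placeholders explicitly so that the output below
// doesn't depend on the defaults.
a.Uppercase = 'X'
a.Lowercase = 'x'
output := a.Anonymize(input)
fmt.Println(output)
// Output: Hi, my name is Xxxx.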

Documentation

Types

type Anonymizer

type Anonymizer struct {
	// The dictionary to use to see if a word is a dictionary word.
	Dict *Dict
	// The placeholder to use instead of uppercase letters.
	Uppercase rune
	// The placeholder to use instead of lowercase letters.
	Lowercase rune
	// The placeholder to use instead of digits.
	Digit rune
}

func New

func New(dict *Dict) Anonymizer

func (Anonymizer) Anonymize

func (a Anonymizer) Anonymize(text string) string

Anonymize replaces all non-dictionary words in the text with placeholders.

type Dict

type Dict = trie.Trie[struct{}]

Dictionary of words.

func LoadDict

func LoadDict(lang string) (*Dict, error)

LoadDict loads the dictionary for the given language.

If the language is not found or not provided, the default one will be used. Run `sudo select-default-wordlist` to change the system default.

func MustLoadDict

func MustLoadDict(lang string) *Dict

A wrapper around LoadDict that panics on error.
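The "Must" wrapper is a common Go pattern for setup code where an error cannot be handled meaningfully. The sketch below shows the pattern in isolation with stdlib-only stand-ins: `loadWordList` and `mustLoadWordList` are hypothetical names and do not reflect how LoadDict is implemented.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// loadWordList is a hypothetical stand-in for a loader that can fail.
func loadWordList(path string) ([]string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("load word list: %w", err)
	}
	return strings.Fields(string(data)), nil
}

// mustLoadWordList wraps loadWordList and panics on error,
// mirroring the relationship between LoadDict and MustLoadDict.
func mustLoadWordList(path string) []string {
	words, err := loadWordList(path)
	if err != nil {
		panic(err)
	}
	return words
}

func main() {
	// Recover only so that the demo prints the panic instead of crashing.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("panicked:", r)
		}
	}()
	mustLoadWordList("/nonexistent/wordlist")
}
```

A wrapper like this suits package-level variables and program start-up; code that can report errors to its caller is usually better served by the error-returning variant.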
