Documentation ¶
Overview ¶
Package anonymize provides deterministic, keyed-hash anonymization of text.
Model: input text is deterministically split into blocks. Each block is replaced by an HMAC-SHA256 digest keyed by a per-TENANT secret. Because HMAC is deterministic, the SAME block under the SAME tenant key always yields the IDENTICAL digest, which enables deduplication and correlation WITHIN a tenant. Because the tenant secret is the MAC key, the SAME block under a DIFFERENT tenant key yields a DIFFERENT digest (up to a negligible collision probability), which gives cross-tenant unlinkability. The raw text never appears in the anonymized output; only fixed-width hex digests (64 characters each) do.
The package is pure Go and uses only the standard library (crypto/hmac, crypto/sha256). It performs no I/O and is deterministic.
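The model above can be illustrated with a minimal sketch that uses only the standard library. The helper name hashBlock and the tenant secrets are hypothetical and not part of this package; the sketch also ignores the scheme domain and only shows the keyed-hash step.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// hashBlock is a hypothetical helper (not part of the package API).
// It returns the lowercase-hex HMAC-SHA256 of block under tenantKey.
func hashBlock(tenantKey []byte, block string) string {
	mac := hmac.New(sha256.New, tenantKey)
	mac.Write([]byte(block))
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	// Illustrative secrets only; real tenant keys should be random.
	keyA := []byte("tenant-a-secret")
	keyB := []byte("tenant-b-secret")

	// "alice" appears twice: its digest repeats under keyA (dedup within a
	// tenant) but differs between keyA and keyB (cross-tenant unlinkability).
	for _, block := range strings.Fields("alice paid alice") {
		fmt.Println(block, hashBlock(keyA, block)[:16], hashBlock(keyB, block)[:16])
	}
}
```

The digests are truncated to 16 characters here purely to keep the printed lines short.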
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func SplitBlocks ¶
SplitBlocks deterministically splits text into blocks.
Splitting is on runs of whitespace (so it does not depend on the exact spacing in the input), and empty blocks are dropped. This is exported so tests and callers can reason about block boundaries independently of hashing.
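As a rough illustration of the documented behavior, the sketch below splits on runs of whitespace with strings.Fields, which also drops empty blocks. Whether SplitBlocks is implemented this way is an assumption; splitBlocksSketch is a hypothetical stand-in.

```go
package main

import (
	"fmt"
	"strings"
)

// splitBlocksSketch is a hypothetical stand-in for SplitBlocks. It assumes
// that splitting on runs of Unicode whitespace via strings.Fields matches
// the documented behavior.
func splitBlocksSketch(text string) []string {
	return strings.Fields(text)
}

func main() {
	// Different spacing, same blocks.
	fmt.Printf("%q\n", splitBlocksSketch("  alice \t paid\n\nbob  ")) // ["alice" "paid" "bob"]
	fmt.Printf("%q\n", splitBlocksSketch("alice paid bob"))           // ["alice" "paid" "bob"]
}
```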
Types ¶
type Anonymizer ¶
type Anonymizer struct {
// contains filtered or unexported fields
}
Anonymizer turns text into per-block keyed-hash tokens.
A non-empty domain is mixed into every MAC as a scheme/namespace separator, so that digests produced under one domain do not coincide with digests produced under a different domain, even for the same tenant key and block text.
func New ¶
func New(domain string) *Anonymizer
New returns an Anonymizer for the given scheme domain (namespace). The domain is a configuration label (e.g. "prompt-v1"); it is NOT a secret and is the same across tenants. Cross-tenant separation comes from the per-tenant key supplied to Anonymize, not from the domain.
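How the domain is encoded into the MAC input is not specified here. One plausible approach, shown purely as an assumption, is to write the domain, a single zero byte, and then the block. The helper macWithDomain is hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// macWithDomain is a hypothetical helper. It feeds the domain, one zero
// byte, and then the block into HMAC-SHA256. The zero-byte separator is an
// assumption of this sketch; it keeps ("ab", "c") and ("a", "bc") apart as
// long as the domain itself contains no zero byte.
func macWithDomain(tenantKey []byte, domain, block string) string {
	mac := hmac.New(sha256.New, tenantKey)
	mac.Write([]byte(domain))
	mac.Write([]byte{0})
	mac.Write([]byte(block))
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	key := []byte("tenant-a-secret") // illustrative secret only

	// Same key and block, different domain labels: the digests differ.
	fmt.Println(macWithDomain(key, "prompt-v1", "alice"))
	fmt.Println(macWithDomain(key, "prompt-v2", "alice"))
}
```

Without a separator or length prefix, concatenating domain and block directly would make distinct (domain, block) pairs with the same concatenation hash identically, which is why the sketch inserts one.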
func (*Anonymizer) Anonymize ¶
func (a *Anonymizer) Anonymize(tenantKey []byte, text string) []string
Anonymize splits text into blocks and returns one lowercase-hex HMAC-SHA256 digest per block, keyed by tenantKey. The returned slice is positionally aligned with SplitBlocks(text).
Determinism: identical (tenantKey, domain, block) inputs always produce the identical digest. Unlinkability: changing tenantKey changes every digest.
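The positional alignment, determinism, and unlinkability properties can be checked with a small stand-in. The type sketchAnonymizer and its method are hypothetical; they only mirror the documented shape of Anonymize, and both the splitting rule and the domain encoding are assumptions of this sketch.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sketchAnonymizer is a hypothetical stand-in for Anonymizer.
type sketchAnonymizer struct{ domain string }

// anonymize returns one lowercase-hex HMAC-SHA256 digest per
// whitespace-separated block, in input order.
func (a sketchAnonymizer) anonymize(tenantKey []byte, text string) []string {
	blocks := strings.Fields(text) // assumed splitting rule
	out := make([]string, 0, len(blocks))
	for _, b := range blocks {
		mac := hmac.New(sha256.New, tenantKey)
		mac.Write([]byte(a.domain))
		mac.Write([]byte{0}) // assumed domain/block separator
		mac.Write([]byte(b))
		out = append(out, hex.EncodeToString(mac.Sum(nil)))
	}
	return out
}

func main() {
	a := sketchAnonymizer{domain: "prompt-v1"}
	t1 := a.anonymize([]byte("tenant-1-secret"), "alice paid alice")
	t2 := a.anonymize([]byte("tenant-2-secret"), "alice paid alice")

	fmt.Println(len(t1))        // 3: one digest per block
	fmt.Println(t1[0] == t1[2]) // true: repeated block, same tenant key
	fmt.Println(t1[0] == t2[0]) // false: same block, different tenant key
}
```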