token

package
v0.1.0 Latest
Warning

This package is not in the latest version of its module.

Published: May 4, 2026 License: Apache-2.0 Imports: 16 Imported by: 0

Documentation

Overview

Package token implements deterministic (HMAC-SHA256) and probabilistic (Bloom filter) tokenization plus the comparison primitives Equal, DicePerField, Score, and Match.

Most callers want Match — it wraps DicePerField + Score and returns the thresholded decision in one call. Even simpler: package session bundles a Tokenizer with a FieldSet so you don't have to thread the schema through every call.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DicePerField

func DicePerField(a, b sriracha.ProbabilisticToken) ([]float64, error)

DicePerField returns the Sørensen–Dice coefficient between corresponding fields of a and b. The result is one score in [0, 1] per field, in FieldSet order. A field with an all-zero filter on either side scores 0.

Returns an error if FieldSetVersion, KeyID, FieldSetFingerprint (when both sides set it), ProbabilisticParams, or field count differ — scores would not be comparable.

Most callers want Match — it wraps DicePerField + Score and returns the thresholded decision.
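
The per-field computation can be sketched in plain Go. This is an illustrative stand-in, not the package's code: filters are modeled as byte slices and the coefficient is 2·|A∩B| / (|A| + |B|) over set bits, with the documented all-zero rule.

```go
package main

import (
	"fmt"
	"math/bits"
)

// dice computes the Sørensen–Dice coefficient over the set bits of two
// equal-length Bloom filters: 2*|A∩B| / (|A| + |B|).
// An all-zero filter on either side scores 0, per the rule above.
func dice(a, b []byte) float64 {
	var inter, ca, cb int
	for i := range a {
		inter += bits.OnesCount8(a[i] & b[i])
		ca += bits.OnesCount8(a[i])
		cb += bits.OnesCount8(b[i])
	}
	if ca == 0 || cb == 0 {
		return 0
	}
	return 2 * float64(inter) / float64(ca+cb)
}

func main() {
	a := []byte{0b10110010} // 4 bits set
	b := []byte{0b10010011} // 4 bits set, 3 shared with a
	fmt.Printf("%.2f\n", dice(a, b)) // 2*3/(4+4) = 0.75
}
```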

func Equal

func Equal(a, b sriracha.DeterministicToken) bool

Equal reports whether a and b are bit-identical across every field. It returns false if FieldSetVersion, KeyID, FieldSetFingerprint (when both sides set it), or field count differ. A field that is nil on one side and non-nil (or differently-sized) on the other compares unequal. Per-field byte comparison is constant-time.
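
A minimal sketch of those comparison rules using the standard library's crypto/subtle; the flat [][]byte field layout here is an assumption for illustration, not the package's actual token representation.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// fieldsEqual mirrors the documented rules: a field-count mismatch is
// unequal, nil-vs-non-nil (or differently sized) fields are unequal, and
// same-length fields are compared byte-for-byte in constant time.
func fieldsEqual(a, b [][]byte) bool {
	if len(a) != len(b) {
		return false
	}
	equal := true
	for i := range a {
		if (a[i] == nil) != (b[i] == nil) || len(a[i]) != len(b[i]) {
			equal = false
			continue
		}
		if subtle.ConstantTimeCompare(a[i], b[i]) != 1 {
			equal = false
		}
	}
	return equal
}

func main() {
	x := [][]byte{{0xde, 0xad}, nil}
	y := [][]byte{{0xde, 0xad}, nil}
	fmt.Println(fieldsEqual(x, y))                          // true
	fmt.Println(fieldsEqual(x, [][]byte{{0xde, 0xad}, {}})) // false: nil vs non-nil
}
```

Note that the sketch keeps scanning after the first mismatch instead of returning early, so timing does not reveal which field differed.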

func Score

func Score(perField []float64, fs sriracha.FieldSet) (float64, error)

Score returns the weight-normalized aggregate of perField against fs.Fields[i].Weight. Fields with non-positive weight are excluded from both numerator and denominator, so callers can mask out absent fields by zeroing their weight. Returns an error if the lengths do not match or no field has positive weight.
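
The aggregation rule can be sketched as follows. This is illustrative only; weights are a plain slice here rather than fs.Fields[i].Weight.

```go
package main

import (
	"errors"
	"fmt"
)

// weightedScore aggregates per-field Dice scores by their weights.
// Non-positive weights are excluded from both numerator and denominator,
// which is how callers mask out absent fields.
func weightedScore(perField, weights []float64) (float64, error) {
	if len(perField) != len(weights) {
		return 0, errors.New("length mismatch")
	}
	var num, den float64
	for i, w := range weights {
		if w <= 0 {
			continue
		}
		num += w * perField[i]
		den += w
	}
	if den == 0 {
		return 0, errors.New("no field has positive weight")
	}
	return num / den, nil
}

func main() {
	s, _ := weightedScore([]float64{1.0, 0.5}, []float64{2, 2})
	fmt.Printf("%.2f\n", s) // (2*1.0 + 2*0.5) / 4 = 0.75
	masked, _ := weightedScore([]float64{1.0, 0.5}, []float64{2, 0})
	fmt.Printf("%.2f\n", masked) // second field masked out: 1.00
}
```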

Types

type Calibration

type Calibration struct {
	OptimalThreshold float64    `json:"optimal_threshold"`
	F1               float64    `json:"f1"`
	Precision        float64    `json:"precision"`
	Recall           float64    `json:"recall"`
	ROC              []ROCPoint `json:"roc"`
}

Calibration is the output of Calibrate: the threshold that maximizes F1 over the labeled pairs, plus the full ROC curve at 0.01 step granularity.

func Calibrate

func Calibrate(pairs []LabeledPair, fs sriracha.FieldSet) (Calibration, error)

Calibrate sweeps thresholds in 0.01 steps from 0.00 to 1.00 (101 points) and reports the threshold that maximizes F1 over pairs. Use this to pick the threshold for production Match calls instead of guessing.

Cost is O(N × fields_per_token) Dice operations plus O(N × 101) threshold comparisons: for N labeled pairs it computes Match exactly once per pair and reuses each resulting Score across all 101 thresholds.

Returns an error if pairs is empty, or if any pair fails the underlying Match call (mismatched FieldSetVersion, KeyID, fingerprint, params, etc.).
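
The sweep itself is simple enough to sketch. This illustrative version takes precomputed scores and labels instead of token pairs and a FieldSet:

```go
package main

import "fmt"

// sweep tries thresholds 0.00..1.00 in 0.01 steps (101 points) and returns
// the threshold with the best F1 over the labeled scores, plus that F1.
func sweep(scores []float64, labels []bool) (bestThr, bestF1 float64) {
	for t := 0; t <= 100; t++ {
		thr := float64(t) / 100
		var tp, fp, fn float64
		for i, s := range scores {
			switch {
			case s >= thr && labels[i]:
				tp++
			case s >= thr && !labels[i]:
				fp++
			case s < thr && labels[i]:
				fn++
			}
		}
		if tp == 0 {
			continue // no true positives at this threshold: F1 is 0
		}
		precision := tp / (tp + fp)
		recall := tp / (tp + fn)
		if f1 := 2 * precision * recall / (precision + recall); f1 > bestF1 {
			bestThr, bestF1 = thr, f1
		}
	}
	return bestThr, bestF1
}

func main() {
	scores := []float64{0.92, 0.81, 0.33, 0.12}
	labels := []bool{true, true, false, false}
	thr, f1 := sweep(scores, labels)
	fmt.Printf("threshold=%.2f f1=%.2f\n", thr, f1)
}
```

On the toy data above the first threshold that separates the two true pairs from the two false ones wins with F1 = 1.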

type LabeledPair

type LabeledPair struct {
	A     sriracha.ProbabilisticToken `json:"a"`
	B     sriracha.ProbabilisticToken `json:"b"`
	Match bool                        `json:"match"`
}

LabeledPair is one row of ground-truth: two ProbabilisticTokens believed to be either the same person (Match=true) or different people (Match=false).

type MatchResult

type MatchResult struct {
	Score            float64              `json:"score"`
	PerField         []float64            `json:"per_field"`
	Paths            []sriracha.FieldPath `json:"paths"`
	IsMatch          bool                 `json:"is_match"`
	ComparableFields int                  `json:"comparable_fields"`
}

MatchResult holds the output of Match: per-field Dice scores, the weighted aggregate Score in [0, 1], the threshold decision, the FieldSet paths in the same order as PerField, and a count of fields that contributed to the weighted average (excludes both-absent fields and fields with non-positive weight).

func Match

func Match(a, b sriracha.ProbabilisticToken, fs sriracha.FieldSet, threshold float64) (MatchResult, error)

Match is the canonical entry point for probabilistic comparison: it wraps DicePerField + Score and returns the threshold decision in a single call.

Match compares a and b under fs and returns per-field Dice scores, the weighted aggregate, and a threshold decision. Fields with all-zero filters on both sides are treated as absent and drop from the weighted average; asymmetric absence (zero on one side, populated on the other) keeps its score of 0 and counts as a real mismatch signal.

If every field is both-absent (or zero-weighted), the returned MatchResult has Score=0, IsMatch=false, ComparableFields=0 — never an error. The error return is reserved for genuine mismatches: threshold out of range, version / key / fingerprint / params drift, or field-count disagreement between the tokens and fs.
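
The absence rules can be sketched end to end. Everything here (the byte-slice filters, the helper names) is an illustrative assumption, not the package's internals:

```go
package main

import (
	"fmt"
	"math/bits"
)

// dice is the Sørensen–Dice coefficient over set bits; an all-zero filter
// on either side scores 0.
func dice(a, b []byte) float64 {
	var inter, ca, cb int
	for i := range a {
		inter += bits.OnesCount8(a[i] & b[i])
		ca += bits.OnesCount8(a[i])
		cb += bits.OnesCount8(b[i])
	}
	if ca == 0 || cb == 0 {
		return 0
	}
	return 2 * float64(inter) / float64(ca+cb)
}

func allZero(f []byte) bool {
	for _, x := range f {
		if x != 0 {
			return false
		}
	}
	return true
}

// matchSketch applies the documented absence rules: a field that is
// all-zero on both sides drops out of the weighted average, while a field
// that is zero on only one side keeps its Dice score of 0 and counts as a
// real mismatch signal.
func matchSketch(a, b [][]byte, weights []float64, threshold float64) (score float64, comparable int, isMatch bool) {
	var num, den float64
	for i := range a {
		if weights[i] <= 0 {
			continue
		}
		if allZero(a[i]) && allZero(b[i]) {
			continue // both-absent: not a comparable field
		}
		comparable++
		num += weights[i] * dice(a[i], b[i])
		den += weights[i]
	}
	if den > 0 {
		score = num / den
	}
	return score, comparable, comparable > 0 && score >= threshold
}

func main() {
	a := [][]byte{{0xff}, {0x00}, {0x0f}}
	b := [][]byte{{0xff}, {0x00}, {0x00}} // field 1 absent on both sides, field 2 on one
	s, n, ok := matchSketch(a, b, []float64{1, 1, 1}, 0.8)
	fmt.Println(s, n, ok) // 0.5 2 false
}
```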

func (MatchResult) ByPath

func (r MatchResult) ByPath() map[sriracha.FieldPath]float64

ByPath returns a fresh map keyed by FieldPath with each path's Dice score. Useful for downstream code that wants to look up scores without scanning.

func (MatchResult) ScoreFor

func (r MatchResult) ScoreFor(path sriracha.FieldPath) (float64, bool)

ScoreFor returns the per-field Dice score for path along with true if the path appears in the result. Paths with zero or negative weight that were dropped from the weighted average still appear here with their raw Dice score.

type Option

type Option func(*tokenizerOpts)

Option configures a Tokenizer at construction time.

func WithKeyID

func WithKeyID(id string) Option

WithKeyID labels every token emitted by the Tokenizer with the given key identifier. Comparison helpers use it to surface post-rotation mismatches.

type ROCPoint

type ROCPoint struct {
	Threshold float64 `json:"threshold"`
	Precision float64 `json:"precision"`
	Recall    float64 `json:"recall"`
	F1        float64 `json:"f1"`
}

ROCPoint is one threshold and the precision/recall/F1 it produces over the supplied labeled pairs.

type Tokenizer

type Tokenizer interface {
	// TokenizeDeterministic tokenizes a RawRecord in deterministic mode (HMAC-SHA256
	// per field). The returned token's Fields slice is aligned with fs.Fields:
	// each entry is a 32-byte HMAC for a present field, or nil for an absent
	// optional field. Missing required fields return an error.
	TokenizeDeterministic(record sriracha.RawRecord, fs sriracha.FieldSet) (sriracha.DeterministicToken, error)
	// TokenizeProbabilistic tokenizes a RawRecord in probabilistic (Bloom filter)
	// mode. The returned token's Fields slice is aligned with fs.Fields:
	// present fields contain the populated filter, absent optional fields
	// contain an all-zero filter of the same length. Missing required fields
	// return an error.
	TokenizeProbabilistic(record sriracha.RawRecord, fs sriracha.FieldSet) (sriracha.ProbabilisticToken, error)
	// TokenizeField returns the deterministic 32-byte HMAC for a single
	// (value, path) pair, after running the same normalization pipeline
	// TokenizeDeterministic uses. Useful for stable indexing of one field outside
	// the FieldSet flow.
	TokenizeField(value string, path sriracha.FieldPath) ([]byte, error)
	// Destroy wipes the secret buffer that backs this Tokenizer. Pooled HMAC
	// instances created from the secret may still hold derived key material
	// (inner/outer pad) on the heap until garbage-collected. The Tokenizer
	// must not be used after this call.
	Destroy()
}

Tokenizer produces tokens from RawRecords using a shared secret. Call Destroy when finished to wipe the source secret buffer; if you forget, a runtime cleanup wipes it once the Tokenizer becomes unreachable.

Tokenizer is safe for concurrent use by multiple goroutines until Destroy is called; HMAC instances are pooled internally. Calling any tokenize method after Destroy is undefined.

Most callers want a session.Session — it bundles a Tokenizer with a FieldSet so you don't have to thread the schema through every call.

func New

func New(secret []byte, opts ...Option) (Tokenizer, error)

New creates a Tokenizer with the given HMAC secret. The secret is copied into a locked, non-swappable memory region and the source slice is wiped. Returns an error if secret is empty.

A runtime finalizer wipes the locked buffer if the returned Tokenizer becomes unreachable without an explicit Destroy call.
