Documentation ¶
Overview ¶
Package textutil contains small normalization and cleanup helpers used across imports and API responses.
Index ¶
Constants ¶
const (
	AuthorMatchAutoThreshold    = 0.94
	AuthorMatchAmbiguousMinimum = 0.88
)
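Jaro-Winkler thresholds for author-name matching. A best score that clears AuthorMatchAutoThreshold is auto-accepted; a score that is close but below it, down to AuthorMatchAmbiguousMinimum, is ambiguous and should go to review. Keep both values conservative so that distinct authors are not silently merged.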
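As a rough illustration of how the two thresholds partition a score, the sketch below maps a Jaro-Winkler score to a band. The helper name classifyScore and the string labels are hypothetical and not part of textutil, and treating both boundaries as inclusive is an assumption, since the documentation does not say whether a score exactly at a threshold counts.

```go
package main

// Threshold values copied from the constants documented above.
const (
	AuthorMatchAutoThreshold    = 0.94
	AuthorMatchAmbiguousMinimum = 0.88
)

// classifyScore is a hypothetical helper, not part of textutil.
// Assumption: both boundaries are inclusive.
func classifyScore(score float64) string {
	switch {
	case score >= AuthorMatchAutoThreshold:
		return "auto" // high enough to accept without review
	case score >= AuthorMatchAmbiguousMinimum:
		return "ambiguous" // close, but should be reviewed
	default:
		return "none" // too far apart to consider
	}
}
```

Under these assumptions, a score of 0.96 falls in the "auto" band, 0.90 in the "ambiguous" band, and 0.50 in neither.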
Variables ¶
This section is empty.
Functions ¶
func CleanDescription ¶
CleanDescription normalizes provider descriptions for plain-text UI display.
func JaroWinkler ¶
JaroWinkler returns the Jaro-Winkler similarity between s and t, in the range 0–1. The caller should pre-normalize both inputs (for example, by lowercasing them). The Winkler scaling factor p = 0.1 is applied to a common prefix of up to 4 characters.
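For readers unfamiliar with the metric, here is a minimal sketch of the textbook Jaro-Winkler computation with p = 0.1 and a prefix cap of 4. It is not the package's implementation: the name jaroWinklerSketch is hypothetical, comparing by rune is an assumption, and returning 0 when either input is empty is also an assumption.

```go
package main

// jaroWinklerSketch is a hypothetical, textbook-style implementation.
// It is not the textutil source.
func jaroWinklerSketch(s, t string) float64 {
	a, b := []rune(s), []rune(t)
	// Assumption: an empty input never matches anything.
	if len(a) == 0 || len(b) == 0 {
		return 0
	}

	// Two equal characters count as a match only if their positions
	// differ by at most this window.
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	window := longest/2 - 1
	if window < 0 {
		window = 0
	}

	aMatched := make([]bool, len(a))
	bMatched := make([]bool, len(b))
	matches := 0
	for i := range a {
		lo := i - window
		if lo < 0 {
			lo = 0
		}
		hi := i + window + 1
		if hi > len(b) {
			hi = len(b)
		}
		for j := lo; j < hi; j++ {
			if !bMatched[j] && a[i] == b[j] {
				aMatched[i], bMatched[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}

	// Walk the matched characters of both strings in order and count
	// positions where they disagree; half of that is the transposition count.
	k, outOfOrder := 0, 0
	for i := range a {
		if !aMatched[i] {
			continue
		}
		for !bMatched[k] {
			k++
		}
		if a[i] != b[k] {
			outOfOrder++
		}
		k++
	}

	m := float64(matches)
	jaro := (m/float64(len(a)) + m/float64(len(b)) + (m-float64(outOfOrder)/2)/m) / 3

	// Winkler bonus: reward a shared prefix of up to 4 characters.
	prefix := 0
	for prefix < len(a) && prefix < len(b) && prefix < 4 && a[prefix] == b[prefix] {
		prefix++
	}
	return jaro + float64(prefix)*0.1*(1-jaro)
}
```

With the classic pair "martha" and "marhta", this sketch gives a Jaro score of about 0.944 and a Jaro-Winkler score of about 0.961, since the shared prefix "mar" contributes 3 × 0.1 × (1 − 0.944).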
func NormalizeAuthorName ¶
NormalizeAuthorName lowercases the name, strips punctuation and diacritics-adjacent characters, and collapses whitespace. The returned form is suitable for key-style equality comparisons but still preserves the spacing between tokens.
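The general shape of this kind of normalization can be sketched with the standard library alone. The function normalizeSketch below is hypothetical and is not the package's code: it assumes punctuation is replaced by a space rather than deleted, and it makes no attempt at diacritic handling.

```go
package main

import (
	"strings"
	"unicode"
)

// normalizeSketch is a hypothetical approximation of the behaviour described
// above. Assumption: every rune that is not a letter or digit becomes a space.
func normalizeSketch(name string) string {
	mapped := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return unicode.ToLower(r)
		}
		return ' '
	}, name)
	// Fields splits on runs of whitespace, so joining with a single space
	// collapses repeats and trims both ends.
	return strings.Join(strings.Fields(mapped), " ")
}
```

Under these assumptions, normalizeSketch("R. R.  Haywood") yields "r r haywood": the tokens stay separated, but casing, punctuation, and extra spaces are gone.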
func NormalizeAuthorNameWithVariants ¶
NormalizeAuthorNameWithVariants returns a de-duplicated list of normalized forms of the author name, suitable for equality-style comparisons:
- base normalized ("r r haywood")
- suffix-stripped ("john smith" from "John Smith Jr.")
- compact-initials ("rr haywood")
- expanded-initials ("r r haywood" from "rr haywood")
- last-first ("haywood r r")
When comparing two names, callers should treat any match between the two names' variant sets as equivalent. The first element is always the canonical base form.
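The "any match across the two variant sets" rule amounts to a set-intersection test. The sketch below assumes the variants arrive as []string (the signature is not shown here, so that is an assumption), and the helper name anyVariantEqual is hypothetical.

```go
package main

// anyVariantEqual is a hypothetical helper, not part of textutil. It reports
// whether the two variant lists share at least one normalized form.
func anyVariantEqual(a, b []string) bool {
	seen := make(map[string]struct{}, len(a))
	for _, v := range a {
		seen[v] = struct{}{}
	}
	for _, v := range b {
		if _, ok := seen[v]; ok {
			return true
		}
	}
	return false
}
```

For example, the hand-written lists {"r r haywood", "rr haywood", "haywood r r"} and {"rr haywood", "r r haywood"} share a form, so they would be treated as the same author; neither shares a form with {"john smith"}.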
Types ¶
type AuthorMatchKind ¶
type AuthorMatchKind int
AuthorMatchKind classifies how confident a name match is.
const (
	// AuthorMatchNone means no variant pairing was close enough to consider.
	AuthorMatchNone AuthorMatchKind = iota
	// AuthorMatchExact means a normalized variant of each side compared equal.
	AuthorMatchExact
	// AuthorMatchFuzzyAuto means the best Jaro-Winkler score across variants
	// cleared the auto-accept threshold.
	AuthorMatchFuzzyAuto
	// AuthorMatchFuzzyAmbiguous means the best score was close but below the
	// auto threshold; the caller should surface a review rather than silently
	// merging.
	AuthorMatchFuzzyAmbiguous
)
type AuthorMatchResult ¶
type AuthorMatchResult struct {
	Kind  AuthorMatchKind
	Score float64 // best Jaro-Winkler score observed (0 when Kind is None)
}
AuthorMatchResult is the outcome of comparing two author names across all supported variants.
func MatchAuthorName ¶
func MatchAuthorName(a, b string) AuthorMatchResult
MatchAuthorName compares two raw author names (no prior normalization required) and reports the strongest class of match. An exact match via variants takes precedence over a fuzzy match, and an ambiguous fuzzy match is never auto-accepted.
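A caller would typically branch on the returned Kind. The sketch below redeclares the documented types locally so it compiles on its own; the function nextStep and its "merge", "review", and "keep separate" labels are hypothetical, and what a caller does for AuthorMatchNone is an assumption.

```go
package main

// Local copies of the documented types so the sketch is self-contained.
type AuthorMatchKind int

const (
	AuthorMatchNone AuthorMatchKind = iota
	AuthorMatchExact
	AuthorMatchFuzzyAuto
	AuthorMatchFuzzyAmbiguous
)

type AuthorMatchResult struct {
	Kind  AuthorMatchKind
	Score float64
}

// nextStep is a hypothetical caller-side policy, not part of textutil.
func nextStep(r AuthorMatchResult) string {
	switch r.Kind {
	case AuthorMatchExact, AuthorMatchFuzzyAuto:
		return "merge" // confident enough to treat as the same author
	case AuthorMatchFuzzyAmbiguous:
		return "review" // surface to a human instead of merging silently
	default:
		return "keep separate" // assumption: no match means distinct authors
	}
}
```

In real code the AuthorMatchResult would come from MatchAuthorName(a, b); the point of the switch is that only the ambiguous case needs human attention.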