textutil

package
v1.11.0
Warning

This package is not in the latest version of its module.

Published: May 15, 2026 License: MIT Imports: 5 Imported by: 0

Documentation

Overview

Package textutil contains small normalization and cleanup helpers used across imports and API responses.

Constants

const (
	AuthorMatchAutoThreshold    = 0.94
	AuthorMatchAmbiguousMinimum = 0.88
)

Jaro-Winkler thresholds for author-name matching. Keep them conservative so that distinct authors are not silently merged.
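
As a rough illustration of how the two thresholds partition a score, the sketch below maps a similarity value to one of three bands. The helper classifyScore is hypothetical and not part of this package, and the inclusive (>=) comparisons are an assumption; the documentation does not say whether a score exactly on a threshold counts.

```go
package main

import "fmt"

// Values copied from the constants documented above.
const (
	AuthorMatchAutoThreshold    = 0.94
	AuthorMatchAmbiguousMinimum = 0.88
)

// classifyScore is a hypothetical helper (not part of textutil) that
// places a Jaro-Winkler score into one of three bands.
// Assumption: both boundaries are inclusive.
func classifyScore(score float64) string {
	switch {
	case score >= AuthorMatchAutoThreshold:
		return "auto"
	case score >= AuthorMatchAmbiguousMinimum:
		return "ambiguous"
	default:
		return "none"
	}
}

func main() {
	for _, s := range []float64{0.97, 0.90, 0.70} {
		fmt.Printf("%.2f -> %s\n", s, classifyScore(s))
	}
}
```

The gap between 0.88 and 0.94 is the band where a caller would be expected to ask for review rather than merge.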

Variables

This section is empty.

Functions

func CleanDescription

func CleanDescription(description string) string

CleanDescription normalizes provider descriptions for plain-text UI display.

func JaroWinkler

func JaroWinkler(s, t string) float64

JaroWinkler returns the Jaro-Winkler similarity between s and t, in the range 0–1. Both inputs should be pre-normalised (e.g. lowercased) by the caller. The Winkler scaling factor p=0.1 is applied to at most 4 common prefix characters.
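
For readers unfamiliar with the metric, here is a minimal sketch of the textbook Jaro-Winkler computation using the parameters described above (p=0.1, prefix capped at 4). It is not the package's source. The name jaroWinklerSketch, the rune-based comparison, and returning 1 for two empty strings are assumptions made for illustration.

```go
package main

import "fmt"

// jaroWinklerSketch is an illustrative textbook implementation, not the
// package's own code. It compares runes rather than bytes.
func jaroWinklerSketch(s, t string) float64 {
	a, b := []rune(s), []rune(t)
	if len(a) == 0 && len(b) == 0 {
		return 1 // assumption: two empty strings are treated as identical
	}
	if len(a) == 0 || len(b) == 0 {
		return 0
	}
	// Characters match only if they are within this distance of each other.
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	window := longest/2 - 1
	if window < 0 {
		window = 0
	}
	aMatched := make([]bool, len(a))
	bMatched := make([]bool, len(b))
	matches := 0
	for i := range a {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > len(b) {
			hi = len(b)
		}
		for j := lo; j < hi; j++ {
			if !bMatched[j] && a[i] == b[j] {
				aMatched[i], bMatched[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Count matched characters that appear in a different order.
	outOfOrder, j := 0, 0
	for i := range a {
		if !aMatched[i] {
			continue
		}
		for !bMatched[j] {
			j++
		}
		if a[i] != b[j] {
			outOfOrder++
		}
		j++
	}
	m := float64(matches)
	transpositions := float64(outOfOrder) / 2
	jaro := (m/float64(len(a)) + m/float64(len(b)) + (m-transpositions)/m) / 3
	// Winkler bonus: p=0.1 per shared prefix character, at most 4.
	prefix := 0
	for prefix < 4 && prefix < len(a) && prefix < len(b) && a[prefix] == b[prefix] {
		prefix++
	}
	return jaro + float64(prefix)*0.1*(1-jaro)
}

func main() {
	fmt.Printf("%.4f\n", jaroWinklerSketch("martha", "marhta"))
}
```

The prefix bonus is what makes the metric favour names that agree at the start, which is why the inputs should be normalised consistently before comparison.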

func NormalizeAuthorName

func NormalizeAuthorName(name string) string

NormalizeAuthorName lower-cases the name, strips punctuation and diacritic-adjacent characters, and collapses whitespace. The returned form is suitable for key-style equality comparisons while still preserving token spacing.
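
The sketch below shows one plausible way to get this kind of normal form using only the standard library. It is an assumption-laden approximation rather than the package's implementation: normalizeSketch is a hypothetical name, punctuation is replaced by a space (so "R.R." becomes two tokens), and no diacritic handling is attempted.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalizeSketch is a hypothetical stand-in for NormalizeAuthorName.
// Assumptions: anything that is not a letter or digit becomes a space,
// and diacritics are left untouched.
func normalizeSketch(name string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(name) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		} else {
			b.WriteRune(' ')
		}
	}
	// Fields splits on runs of whitespace, so joining collapses them.
	return strings.Join(strings.Fields(b.String()), " ")
}

func main() {
	fmt.Println(normalizeSketch("R.R.  Haywood"))
	fmt.Println(normalizeSketch("  John   Smith, Jr. "))
}
```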

func NormalizeAuthorNameWithVariants

func NormalizeAuthorNameWithVariants(name string) []string

NormalizeAuthorNameWithVariants returns a de-duplicated list of normalized forms of the author name, suitable for equality-style comparisons:

  • base normalized ("r r haywood")
  • suffix-stripped ("john smith" from "John Smith Jr.")
  • compact-initials ("rr haywood")
  • expanded-initials ("r r haywood" from "rr haywood")
  • last-first ("haywood r r")

Callers should treat any match across the two variant sets as equivalent. The first element is always the canonical base form.
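
To make the "any match across the two variant sets" rule concrete, the following sketch builds a small variant list from an already-normalized name and tests two lists for overlap. Both variantsSketch and variantsOverlap are hypothetical helpers; only the compact-initials and last-first forms are generated here, and the real function may derive its variants differently.

```go
package main

import (
	"fmt"
	"strings"
)

// variantsSketch is a hypothetical, partial stand-in for
// NormalizeAuthorNameWithVariants. It expects an already-normalized
// base form and keeps that base as the first element.
func variantsSketch(base string) []string {
	tokens := strings.Fields(base)
	out := []string{base}
	seen := map[string]bool{base: true}
	add := func(v string) {
		if v != "" && !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	// Compact-initials: merge two or more leading single-letter tokens.
	n := 0
	for n < len(tokens) && len([]rune(tokens[n])) == 1 {
		n++
	}
	if n >= 2 {
		parts := append([]string{strings.Join(tokens[:n], "")}, tokens[n:]...)
		add(strings.Join(parts, " "))
	}
	// Last-first: move the final token to the front.
	if len(tokens) >= 2 {
		last := tokens[len(tokens)-1]
		add(last + " " + strings.Join(tokens[:len(tokens)-1], " "))
	}
	return out
}

// variantsOverlap reports whether the two lists share any element.
func variantsOverlap(a, b []string) bool {
	set := make(map[string]bool, len(a))
	for _, v := range a {
		set[v] = true
	}
	for _, v := range b {
		if set[v] {
			return true
		}
	}
	return false
}

func main() {
	left := variantsSketch("r r haywood")
	right := variantsSketch("rr haywood")
	fmt.Println(left, right, variantsOverlap(left, right))
}
```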

Types

type AuthorMatchKind

type AuthorMatchKind int

AuthorMatchKind classifies how confident a name match is.

const (
	// AuthorMatchNone means no variant pairing was close enough to consider.
	AuthorMatchNone AuthorMatchKind = iota
	// AuthorMatchExact means a normalized variant of each side compared equal.
	AuthorMatchExact
	// AuthorMatchFuzzyAuto means the best Jaro-Winkler score across variants
	// cleared the auto-accept threshold.
	AuthorMatchFuzzyAuto
	// AuthorMatchFuzzyAmbiguous means the best score was close but below the
	// auto threshold; the caller should surface a review rather than silently
	// merging.
	AuthorMatchFuzzyAmbiguous
)

type AuthorMatchResult

type AuthorMatchResult struct {
	Kind  AuthorMatchKind
	Score float64 // best Jaro-Winkler score observed (0 when Kind is None)
}

AuthorMatchResult is the outcome of comparing two author names across all supported variants.

func MatchAuthorName

func MatchAuthorName(a, b string) AuthorMatchResult

MatchAuthorName compares two raw author names (no prior normalization required) and reports the strongest class of match. Exact-via-variants beats fuzzy; ambiguous-fuzzy never auto-matches.
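
The precedence described above (exact beats fuzzy, ambiguous never auto-matches) can be sketched as follows. This is not the package's code: matchSketch, the local kind and result types, and the injected sim function are all hypothetical, the variant lists are assumed to be precomputed, and reporting a score of 1 for an exact match is an assumption.

```go
package main

import "fmt"

// Local mirrors of the documented constants and types, for illustration.
const (
	autoThreshold    = 0.94
	ambiguousMinimum = 0.88
)

type kind int

const (
	kindNone kind = iota
	kindExact
	kindFuzzyAuto
	kindFuzzyAmbiguous
)

type result struct {
	Kind  kind
	Score float64
}

// matchSketch compares every variant pairing. sim is any similarity
// function in the range 0-1, such as a Jaro-Winkler implementation.
func matchSketch(as, bs []string, sim func(s, t string) float64) result {
	best := 0.0
	for _, a := range as {
		for _, b := range bs {
			if a == b {
				// Exact-via-variants wins outright.
				// Assumption: the reported score is 1.
				return result{Kind: kindExact, Score: 1}
			}
			if sc := sim(a, b); sc > best {
				best = sc
			}
		}
	}
	switch {
	case best >= autoThreshold:
		return result{Kind: kindFuzzyAuto, Score: best}
	case best >= ambiguousMinimum:
		return result{Kind: kindFuzzyAmbiguous, Score: best}
	default:
		return result{Kind: kindNone, Score: 0}
	}
}

func main() {
	// A fixed-score stub keeps the example independent of any metric.
	stub := func(s, t string) float64 { return 0.90 }
	r := matchSketch([]string{"jon smith"}, []string{"john smith"}, stub)
	fmt.Println(r.Kind == kindFuzzyAmbiguous, r.Score)
}
```

A caller would typically merge on the exact and fuzzy-auto kinds, queue the fuzzy-ambiguous kind for review, and treat none as distinct authors.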
