stringutil

package
v1.1.0
Published: Apr 2, 2026 License: MIT Imports: 7 Imported by: 0

Documentation

Overview

Package stringutil provides string manipulation utilities.

The package is organized into several categories:

Search and Indexing

Functions for finding substrings and patterns:

indices := stringutil.AllIndexes("banana", "an")  // [1, 3]
hasScheme := stringutil.HasAnyPrefix(s, "http://", "https://")
hasBoth := stringutil.ContainsAll(s, "foo", "bar")

Transformation

Functions for transforming strings:

reversed := stringutil.Reverse("hello")  // "olleh"
truncated := stringutil.Truncate(s, 100, "...")
padded := stringutil.PadLeft("42", 5, '0')  // "00042"

Validation

Functions for checking string properties:

if stringutil.IsNumeric(s) { ... }
if stringutil.IsAlpha(s) { ... }
if stringutil.IsPalindrome(s, false) { ... }

Similarity (see similarity.go)

Algorithms for measuring string similarity:

distance := stringutil.LevenshteinDistance("kitten", "sitting")  // 3
score := stringutil.JaroWinklerSimilarity("martha", "marhta", 0.1)  // ~0.96
coefficient := stringutil.DiceCoefficient("night", "nacht")

All functions are designed to handle empty strings and other edge cases gracefully.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func AllIndexes

func AllIndexes(s, substr string) []int

AllIndexes returns all starting positions of substr in s. Returns nil if substr is empty or s doesn't contain substr.

Example:

indices := AllIndexes("banana", "an")
// indices = [1, 3]

func Between

func Between(s, start, end string) (string, bool)

Between extracts the substring between the start and end markers. Returns an empty string and false if the markers are not found in the proper order.

Example:

Between("[hello]", "[", "]")  // "hello", true

func BetweenAll

func BetweenAll(s, start, end string) []string

BetweenAll extracts all substrings between start and end markers.

Example:

BetweenAll("a[1]b[2]c[3]", "[", "]")  // ["1", "2", "3"]

func CamelCase

func CamelCase(s string) string

CamelCase converts s to camelCase.

Example:

CamelCase("hello_world")  // "helloWorld"
CamelCase("hello-world")  // "helloWorld"

func Capitalize

func Capitalize(s string) string

Capitalize returns s with the first character uppercased and the rest lowercased.

Example:

Capitalize("hELLO")  // "Hello"

func CleanString added in v1.1.0

func CleanString(input string, options ...CleanOption) (string, error)

CleanString applies the specified cleaning options to the input string. Options are applied in a fixed order for consistency and correctness:

  1. HTML stripping (remove tags, decode entities)
  2. Unicode normalization (NFKD + diacritic removal)
  3. Database sanitization (UTF-8 validation, NUL removal, truncation)

This order ensures that:

  • HTML entities are decoded before Unicode normalization processes the text
  • Database constraints (length, encoding) are applied last on the final result

Returns the cleaned string. An error is returned only if a text transformer fails, which is extremely unlikely with valid Go strings.

If no options are provided, the input string is returned unchanged.

Example:

// Apply all three cleaning modes:
result, err := CleanString(
    "<p>Héllo &amp; Wörld</p>",
    WithHTMLStrip(),
    WithUnicodeNorm(),
    WithDBSanitize(50),
)
// result = "Hello & World"

// Apply only Unicode normalization:
result, _ = CleanString("café", WithUnicodeNorm())
// result = "cafe"

func CommonPrefix

func CommonPrefix(strs ...string) string

CommonPrefix returns the longest common prefix of the given strings. Returns empty string if no common prefix or fewer than 2 strings.

Example:

CommonPrefix("interstellar", "internet", "internal")  // "inter"

func CommonSuffix

func CommonSuffix(strs ...string) string

CommonSuffix returns the longest common suffix of the given strings.

func ContainsAll

func ContainsAll(s string, substrs ...string) bool

ContainsAll reports whether s contains all of the given substrings.

func ContainsAny

func ContainsAny(s string, substrs ...string) bool

ContainsAny reports whether s contains any of the given substrings.

Example:

if ContainsAny(text, "error", "fail", "warning") { ... }

func CosineSimilarity

func CosineSimilarity(s1, s2 string, n int) float64

CosineSimilarity computes the cosine similarity of two strings based on their character n-gram vectors. Returns a value between 0 and 1.

This is useful for comparing longer texts.
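As a rough illustration of the n-gram vector approach described above, the sketch below counts character n-grams in each string and takes the cosine of the two count vectors. The names cosine and ngramCounts are hypothetical, and this is not the package's implementation; in particular, returning 0 for n <= 0 or for inputs shorter than n is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// ngramCounts returns how often each character n-gram occurs in s.
func ngramCounts(s string, n int) map[string]float64 {
	r := []rune(s)
	counts := make(map[string]float64)
	for i := 0; i+n <= len(r); i++ {
		counts[string(r[i:i+n])]++
	}
	return counts
}

// cosine returns dot(v1, v2) / (|v1| * |v2|) over n-gram count vectors.
// Assumption: it returns 0 when n <= 0 or either string has no n-grams.
func cosine(s1, s2 string, n int) float64 {
	if n <= 0 {
		return 0
	}
	v1, v2 := ngramCounts(s1, n), ngramCounts(s2, n)
	var dot, sq1, sq2 float64
	for g, c := range v1 {
		dot += c * v2[g] // missing n-grams contribute 0
		sq1 += c * c
	}
	for _, c := range v2 {
		sq2 += c * c
	}
	if sq1 == 0 || sq2 == 0 {
		return 0
	}
	return dot / (math.Sqrt(sq1) * math.Sqrt(sq2))
}

func main() {
	// "night" and "nacht" share one bigram ("ht") out of four on each side.
	fmt.Println(cosine("night", "nacht", 2))
}
```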

func CountLines

func CountLines(s string) int

CountLines returns the number of lines in s. An empty string returns 0; a string without newlines returns 1.

func DamerauLevenshteinDistance

func DamerauLevenshteinDistance(s1, s2 string) int

DamerauLevenshteinDistance extends Levenshtein to include transpositions (swapping two adjacent characters) as a single edit operation.

Example:

DamerauLevenshteinDistance("ca", "ac")  // 1 (transposition)
LevenshteinDistance("ca", "ac")         // 2 (delete + insert)
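To make the transposition rule concrete, here is a minimal sketch of the restricted variant (often called optimal string alignment). The source does not say whether the package uses the restricted or the unrestricted algorithm, so treat the choice and the name osaDistance as assumptions.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// osaDistance computes the optimal string alignment distance: Levenshtein
// edits plus swapping two adjacent characters as a single edit.
func osaDistance(s1, s2 string) int {
	a, b := []rune(s1), []rune(s2)
	d := make([][]int, len(a)+1)
	for i := range d {
		d[i] = make([]int, len(b)+1)
		d[i][0] = i // deleting i characters
	}
	for j := range d[0] {
		d[0][j] = j // inserting j characters
	}
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			d[i][j] = min3(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1]+cost)
			// Adjacent transposition: "xy" in a lines up with "yx" in b.
			if i > 1 && j > 1 && a[i-1] == b[j-2] && a[i-2] == b[j-1] {
				if t := d[i-2][j-2] + 1; t < d[i][j] {
					d[i][j] = t
				}
			}
		}
	}
	return d[len(a)][len(b)]
}

func main() {
	fmt.Println(osaDistance("ca", "ac")) // 1
}
```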

func Dedent

func Dedent(s string) string

Dedent removes common leading whitespace from all lines.

Example:

Dedent("  a\n  b\n  c")  // "a\nb\nc"

func DiceCoefficient

func DiceCoefficient(s1, s2 string) float64

DiceCoefficient returns the Sørensen–Dice coefficient comparing bigrams. Returns a value between 0 and 1, where 1 means identical sets of bigrams.

This metric is useful for comparing short strings or when order matters less.

Example:

DiceCoefficient("night", "nacht")  // ~0.25
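The coefficient can be sketched as twice the number of shared bigrams divided by the total number of bigrams in both strings. The version below works on bigram sets, following the wording above; the names dice and bigramSet are hypothetical, and returning 0 when neither string has a bigram is an assumption.

```go
package main

import "fmt"

// bigramSet returns the set of adjacent rune pairs in s.
func bigramSet(s string) map[string]bool {
	r := []rune(s)
	set := make(map[string]bool)
	for i := 0; i+1 < len(r); i++ {
		set[string(r[i:i+2])] = true
	}
	return set
}

// dice returns 2*|A ∩ B| / (|A| + |B|) for the bigram sets A and B.
// Assumption: it returns 0 when neither string has any bigram.
func dice(s1, s2 string) float64 {
	a, b := bigramSet(s1), bigramSet(s2)
	if len(a)+len(b) == 0 {
		return 0
	}
	shared := 0
	for g := range a {
		if b[g] {
			shared++
		}
	}
	return 2 * float64(shared) / float64(len(a)+len(b))
}

func main() {
	// Only "ht" is shared: 2*1 / (4+4).
	fmt.Println(dice("night", "nacht")) // 0.25
}
```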

func HammingDistance

func HammingDistance(s1, s2 string) int

HammingDistance returns the number of positions where corresponding characters differ. Only defined for strings of equal length. Returns -1 if strings have different lengths.

Example:

HammingDistance("karolin", "kathrin")  // 3

func HasAnyPrefix

func HasAnyPrefix(s string, prefixes ...string) bool

HasAnyPrefix reports whether s starts with any of the given prefixes.

Example:

if HasAnyPrefix(url, "http://", "https://") { ... }

func HasAnySuffix

func HasAnySuffix(s string, suffixes ...string) bool

HasAnySuffix reports whether s ends with any of the given suffixes.

func Indent

func Indent(s, prefix string) string

Indent adds prefix to the beginning of each line in s.

Example:

Indent("a\nb\nc", "  ")  // "  a\n  b\n  c"

func IsASCII

func IsASCII(s string) bool

IsASCII reports whether s contains only ASCII characters.

func IsAlpha

func IsAlpha(s string) bool

IsAlpha reports whether s contains only alphabetic characters.

func IsAlphanumeric

func IsAlphanumeric(s string) bool

IsAlphanumeric reports whether s contains only letters and digits.

func IsBlank

func IsBlank(s string) bool

IsBlank reports whether s contains only whitespace characters.

func IsEmpty

func IsEmpty(s string) bool

IsEmpty reports whether s is empty (zero length).

func IsLower

func IsLower(s string) bool

IsLower reports whether all letters in s are lowercase. Returns true for strings with no letters.

func IsNumeric

func IsNumeric(s string) bool

IsNumeric reports whether s contains only numeric digits.

func IsPalindrome

func IsPalindrome(s string, normalize bool) bool

IsPalindrome reports whether s reads the same forwards and backwards. If normalize is false, the comparison is exact and case-sensitive. If normalize is true, case, whitespace, and punctuation are ignored.

Example:

IsPalindrome("racecar", false)           // true
IsPalindrome("A man a plan a canal Panama", true)  // true (normalized)
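One way to picture the normalize flag is the sketch below, which drops everything except letters and digits and lowercases the rest before comparing. This is an assumption about what normalization covers, inferred from the example above; isPalindrome is a hypothetical stand-in, not the package's code.

```go
package main

import (
	"fmt"
	"unicode"
)

// isPalindrome compares runes from both ends. With normalize set, it first
// keeps only letters and digits and lowercases them (assumed behavior).
func isPalindrome(s string, normalize bool) bool {
	var r []rune
	for _, c := range s {
		if normalize {
			if !unicode.IsLetter(c) && !unicode.IsDigit(c) {
				continue
			}
			c = unicode.ToLower(c)
		}
		r = append(r, c)
	}
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		if r[i] != r[j] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isPalindrome("A man a plan a canal Panama", true))  // true
	fmt.Println(isPalindrome("A man a plan a canal Panama", false)) // false
}
```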

func IsPrintable

func IsPrintable(s string) bool

IsPrintable reports whether s contains only printable characters.

func IsUpper

func IsUpper(s string) bool

IsUpper reports whether all letters in s are uppercase. Returns true for strings with no letters.

func JaroSimilarity

func JaroSimilarity(s1, s2 string) float64

JaroSimilarity returns the Jaro similarity between two strings. Returns a value between 0 (completely different) and 1 (identical).

The algorithm considers:

  • Number of matching characters
  • Number of transpositions

Example:

JaroSimilarity("martha", "marhta")  // ~0.944

func JaroWinklerSimilarity

func JaroWinklerSimilarity(s1, s2 string, prefixScale float64) float64

JaroWinklerSimilarity returns the Jaro-Winkler similarity between two strings. This is an extension of Jaro that gives more weight to strings with a common prefix.

The prefixScale parameter (0 to 0.25) determines how much weight to give to the common prefix. Standard value is 0.1.

Example:

JaroWinklerSimilarity("martha", "marhta", 0.1)  // ~0.961
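The relationship between the two scores can be sketched as follows: compute the Jaro score, then add prefixScale times the shared prefix length (capped at 4) times the remaining distance to 1. The names jaro and jaroWinkler are hypothetical, and edge-case choices such as scoring two empty strings as 1 are assumptions rather than documented behavior.

```go
package main

import "fmt"

// jaro returns the Jaro similarity of two strings, compared rune by rune.
func jaro(s1, s2 string) float64 {
	a, b := []rune(s1), []rune(s2)
	if len(a) == 0 && len(b) == 0 {
		return 1 // assumption: two empty strings count as identical
	}
	if len(a) == 0 || len(b) == 0 {
		return 0
	}
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	window := longest/2 - 1 // how far apart matching characters may sit
	if window < 0 {
		window = 0
	}
	aHit, bHit := make([]bool, len(a)), make([]bool, len(b))
	matches := 0
	for i := range a {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > len(b) {
			hi = len(b)
		}
		for j := lo; j < hi; j++ {
			if !bHit[j] && a[i] == b[j] {
				aHit[i], bHit[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Count matched characters that appear in a different order.
	outOfOrder, k := 0, 0
	for i := range a {
		if !aHit[i] {
			continue
		}
		for !bHit[k] {
			k++
		}
		if a[i] != b[k] {
			outOfOrder++
		}
		k++
	}
	m := float64(matches)
	t := float64(outOfOrder) / 2 // transpositions
	return (m/float64(len(a)) + m/float64(len(b)) + (m-t)/m) / 3
}

// jaroWinkler boosts the Jaro score by the shared prefix length (up to 4).
func jaroWinkler(s1, s2 string, prefixScale float64) float64 {
	j := jaro(s1, s2)
	a, b := []rune(s1), []rune(s2)
	l := 0
	for l < len(a) && l < len(b) && l < 4 && a[l] == b[l] {
		l++
	}
	return j + float64(l)*prefixScale*(1-j)
}

func main() {
	fmt.Printf("%.3f\n", jaro("martha", "marhta"))             // 0.944
	fmt.Printf("%.3f\n", jaroWinkler("martha", "marhta", 0.1)) // 0.961
}
```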

func Join

func Join(elems []string, sep string) string

Join concatenates elements with sep. Wrapper around strings.Join for API completeness.

func KebabCase

func KebabCase(s string) string

KebabCase converts s to kebab-case.

Example:

KebabCase("HelloWorld")  // "hello-world"

func LevenshteinDistance

func LevenshteinDistance(s1, s2 string) int

LevenshteinDistance returns the minimum number of single-character edits (insertions, deletions, substitutions) required to change s1 into s2.

Time complexity: O(len(s1) * len(s2))

Space complexity: O(min(len(s1), len(s2)))

Example:

LevenshteinDistance("kitten", "sitting")  // 3
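The stated space bound is typically reached by keeping only two rows of the dynamic-programming table, sized by the shorter input. The sketch below shows that technique under the assumption that distances are counted in runes; levenshtein is a hypothetical name, and the package's actual implementation may differ.

```go
package main

import "fmt"

// levenshtein returns the edit distance between s1 and s2 using two rows
// of the DP table. The rows are sized by the shorter input; the rune
// slices themselves still take space proportional to both inputs.
func levenshtein(s1, s2 string) int {
	a, b := []rune(s1), []rune(s2)
	if len(a) < len(b) {
		a, b = b, a // make b the shorter one
	}
	prev := make([]int, len(b)+1)
	cur := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(a); i++ {
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			best := prev[j-1] + cost // substitution or match
			if del := prev[j] + 1; del < best {
				best = del
			}
			if ins := cur[j-1] + 1; ins < best {
				best = ins
			}
			cur[j] = best
		}
		prev, cur = cur, prev
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // 3
}
```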

func LevenshteinSimilarity

func LevenshteinSimilarity(s1, s2 string) float64

LevenshteinSimilarity returns a similarity score between 0 and 1 based on Levenshtein distance. 1 means identical strings.

Example:

LevenshteinSimilarity("hello", "hallo")  // ~0.8

func Lines

func Lines(s string) []string

Lines splits s into lines. Unlike strings.Split, handles \r\n properly.

Example:

lines := Lines("a\nb\nc")  // ["a", "b", "c"]

func LongestCommonSubsequence

func LongestCommonSubsequence(s1, s2 string) int

LongestCommonSubsequence returns the length of the longest common subsequence. A subsequence is a sequence that can be derived by deleting some elements without changing the order of remaining elements.

Example:

LongestCommonSubsequence("ABCDGH", "AEDFHR")  // 3 ("ADH")
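For readers unfamiliar with the recurrence, a minimal sketch follows: when the current characters match, extend the diagonal; otherwise take the better of skipping a character from either string. The name lcsLength is hypothetical and the rune-based comparison is an assumption.

```go
package main

import "fmt"

// lcsLength returns the length of the longest common subsequence,
// keeping only two rows of the DP table.
func lcsLength(s1, s2 string) int {
	a, b := []rune(s1), []rune(s2)
	prev := make([]int, len(b)+1)
	cur := make([]int, len(b)+1)
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			switch {
			case a[i-1] == b[j-1]:
				cur[j] = prev[j-1] + 1 // extend the common subsequence
			case prev[j] >= cur[j-1]:
				cur[j] = prev[j] // skip a[i-1]
			default:
				cur[j] = cur[j-1] // skip b[j-1]
			}
		}
		prev, cur = cur, prev
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(lcsLength("ABCDGH", "AEDFHR")) // 3
}
```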

func LongestCommonSubstring

func LongestCommonSubstring(s1, s2 string) string

LongestCommonSubstring returns the longest common contiguous substring.

Example:

LongestCommonSubstring("ABABC", "BABCA")  // "BABC"

func NormalizeUnicode added in v1.1.0

func NormalizeUnicode(s string) (string, error)

NormalizeUnicode applies NFKD normalization and removes combining marks (diacritics) from the input string.

The process:

  1. NFKD (Compatibility Decomposition): decomposes characters into their base form + combining marks (e.g., "é" → "e" + combining acute accent)
  2. Remove combining marks: strips all Unicode Mn (Mark, Nonspacing) characters
  3. Recompose to NFC for consistent output

Characters that are not letters or numbers are preserved as-is (spaces, punctuation, etc.).

Example:

NormalizeUnicode("café résumé")  // "cafe resume", nil
NormalizeUnicode("naïve")        // "naive", nil
NormalizeUnicode("Ångström")     // "Angstrom", nil

func NormalizeWhitespace added in v1.1.0

func NormalizeWhitespace(s string) string

NormalizeWhitespace collapses all consecutive whitespace characters (spaces, tabs, newlines) into a single space, and trims leading/trailing whitespace.

This is useful for cleaning user input or text extracted from HTML where whitespace may be irregular.

Example:

NormalizeWhitespace("  Hello   World  \n\t ")  // "Hello World"
NormalizeWhitespace("\t\n")                     // ""
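With only the standard library, the same collapsing and trimming behavior can be approximated by splitting on whitespace and rejoining with single spaces. This is a sketch of the idea, not necessarily how the package does it; normalizeWhitespace is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeWhitespace splits on runs of Unicode whitespace and rejoins the
// pieces with single spaces, which also drops leading and trailing runs.
func normalizeWhitespace(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	fmt.Printf("%q\n", normalizeWhitespace("  Hello   World  \n\t ")) // "Hello World"
	fmt.Printf("%q\n", normalizeWhitespace("\t\n"))                   // ""
}
```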

func NthRune

func NthRune(s string, n int) (rune, bool)

NthRune returns the rune at rune position n (0-indexed). Returns (0, false) if n is out of bounds.

The previous implementation compared a byte offset to n, which silently returned wrong results for strings containing multi-byte UTF-8 characters. This version counts runes explicitly.
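The difference between byte offsets and rune positions is easy to see in a range loop, where the index is a byte offset. A minimal sketch of explicit rune counting, using the hypothetical name nthRune, might look like this:

```go
package main

import "fmt"

// nthRune returns the rune at rune position n by counting runes,
// not by comparing n with the byte offset that range yields.
func nthRune(s string, n int) (rune, bool) {
	if n < 0 {
		return 0, false
	}
	count := 0
	for _, r := range s { // the ignored index would be a byte offset
		if count == n {
			return r, true
		}
		count++
	}
	return 0, false
}

func main() {
	r, ok := nthRune("日本語", 1)
	fmt.Println(string(r), ok) // 本 true
}
```

In "日本語" each character occupies three bytes, so the byte offsets seen by a range loop are 0, 3, and 6; comparing those to n = 1 would never match.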

func PadCenter

func PadCenter(s string, length int, padChar rune) string

PadCenter centers s by adding padChar on both sides. If odd padding needed, extra character goes on the right.

Example:

PadCenter("hello", 11, '*')  // "***hello***"

func PadLeft

func PadLeft(s string, length int, padChar rune) string

PadLeft pads s on the left with padChar to reach the target length. If s is already >= length, returns s unchanged.

Example:

PadLeft("42", 5, '0')  // "00042"

func PadRight

func PadRight(s string, length int, padChar rune) string

PadRight pads s on the right with padChar to reach the target length.

Example:

PadRight("42", 5, '0')  // "42000"

func PascalCase

func PascalCase(s string) string

PascalCase converts s to PascalCase.

Example:

PascalCase("hello_world")  // "HelloWorld"

func RemoveAccents added in v1.1.0

func RemoveAccents(s string) (string, error)

RemoveAccents is an alias for NormalizeUnicode that removes diacritical marks from characters. This is a common operation name used in many string-processing libraries.

Example:

RemoveAccents("café")  // "cafe", nil
RemoveAccents("über")  // "uber", nil

func RemoveAll

func RemoveAll(s string, substrs ...string) string

RemoveAll removes all occurrences of the given substrings from s.

Example:

clean := RemoveAll("hello world", "l", "o")  // "he wrd"

func RemoveNonPrintable added in v1.1.0

func RemoveNonPrintable(s string) string

RemoveNonPrintable removes all non-printable characters from s, except for common whitespace (space, tab, newline, carriage return).

This is useful for cleaning user input that may contain control characters, zero-width characters, or other invisible Unicode characters.

Example:

RemoveNonPrintable("Hello\x07World")  // "HelloWorld" (bell character removed)
RemoveNonPrintable("Hello\tWorld\n")  // "Hello\tWorld\n" (whitespace preserved)

func Repeat

func Repeat(s string, n int) string

Repeat returns s repeated n times. If n <= 0, returns empty string.

Example:

Repeat("ab", 3)  // "ababab"

func Reverse

func Reverse(s string) string

Reverse returns s with its characters in reverse order. Correctly handles multi-byte UTF-8 characters.

Example:

rev := Reverse("hello")  // "olleh"
rev = Reverse("日本語")    // "語本日"

func RuneCount

func RuneCount(s string) int

RuneCount returns the number of runes (Unicode code points) in s. This differs from len(s), which returns bytes.

Example:

RuneCount("日本語")  // 3
len("日本語")      // 9 (bytes)

func SafeSlice

func SafeSlice(s string, start, end int) string

SafeSlice safely slices s by rune indices, returning an empty string for invalid ranges. Useful when working with user input where indices might be out of bounds.

func SanitizeUTF8 added in v1.1.0

func SanitizeUTF8(s string) string

SanitizeUTF8 ensures the string contains only valid UTF-8 and removes NUL bytes (\x00) which are problematic for most databases.

Invalid UTF-8 byte sequences are replaced with U+FFFD (Unicode replacement character), following Go's standard behavior.

Example:

SanitizeUTF8("Hello\x00World")     // "HelloWorld"
SanitizeUTF8("Hello\xffWorld")     // "Hello\uFFFDWorld"
SanitizeUTF8("Valid UTF-8 string") // "Valid UTF-8 string" (unchanged)
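The two steps map onto standard library calls, as in the sketch below. It is an approximation under stated assumptions: strings.ToValidUTF8 replaces each run of invalid bytes with a single U+FFFD, and the package may treat runs differently. The name sanitizeUTF8 is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeUTF8 replaces invalid UTF-8 with U+FFFD and removes NUL bytes.
func sanitizeUTF8(s string) string {
	s = strings.ToValidUTF8(s, "\uFFFD")
	return strings.ReplaceAll(s, "\x00", "")
}

func main() {
	fmt.Printf("%q\n", sanitizeUTF8("Hello\x00World")) // "HelloWorld"
	fmt.Printf("%q\n", sanitizeUTF8("Hello\xffWorld")) // "Hello�World"
}
```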

func Slugify added in v1.1.0

func Slugify(s string) (string, error)

Slugify converts a string to a URL-friendly slug. It normalizes Unicode, lowercases, replaces non-alphanumeric characters with hyphens, collapses multiple hyphens, and trims leading/trailing hyphens.

Example:

Slugify("Hello, World!")        // "hello-world", nil
Slugify("Café Résumé")         // "cafe-resume", nil
Slugify("  Multiple   Spaces ") // "multiple-spaces", nil

func SnakeCase

func SnakeCase(s string) string

SnakeCase converts s to snake_case.

Example:

SnakeCase("HelloWorld")  // "hello_world"
SnakeCase("helloWorld")  // "hello_world"

func SplitAfter

func SplitAfter(s, sep string) []string

SplitAfter splits s after each instance of sep. Wrapper around strings.SplitAfter for consistency.

func SplitAndTrim added in v1.1.0

func SplitAndTrim(s, sep string) []string

SplitAndTrim splits s by sep, trims whitespace from each token, and drops any tokens that are empty after trimming.

This is a very common string-processing pattern in Go backends: parsing comma-separated config values, CSV-like user input, and HTTP header lists all require split + trim + discard-blanks.

Example:

SplitAndTrim("  a , b ,  ,  c  ", ",") // ["a", "b", "c"]
SplitAndTrim("", ",")                  // nil
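Written out by hand with the standard library, the split + trim + discard-blanks pattern looks roughly like the sketch below. The name splitAndTrim is a hypothetical stand-in, and the package's implementation may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAndTrim splits s on sep, trims each token, and skips empty tokens.
// The result is nil when no token survives.
func splitAndTrim(s, sep string) []string {
	var out []string
	for _, tok := range strings.Split(s, sep) {
		if t := strings.TrimSpace(tok); t != "" {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", splitAndTrim("  a , b ,  ,  c  ", ",")) // ["a" "b" "c"]
	fmt.Println(splitAndTrim("", ",") == nil)                  // true
}
```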

func SplitN

func SplitN(s, sep string, n int) []string

SplitN splits s by sep into at most n parts. If n <= 0, returns all parts (same as strings.Split). Wrapper around strings.SplitN for consistency.

func StripHTMLEntities added in v1.1.0

func StripHTMLEntities(s string) string

StripHTMLEntities removes all HTML/XML tags and decodes HTML entities.

Tag removal handles:

  • Standard tags: <p>, <br/>, <div class="x">
  • Self-closing tags: <br />, <img />
  • Script and style tags (only the tags are removed and their content stays in the output; for full script removal, use a proper HTML parser)

Entity decoding handles (via html.UnescapeString):

  • Named entities: &amp; &lt; &gt; &quot; &apos; &nbsp; &copy; etc.
  • Decimal numeric: &#169; &#8212;
  • Hex numeric: &#x00A9; &#x2014;

Example:

StripHTMLEntities("<p>Hello &amp; World</p>")  // "Hello & World"
StripHTMLEntities("Price: &euro;10")           // "Price: €10"
StripHTMLEntities("5 &gt; 3 &amp;&amp; 2 &lt; 4") // "5 > 3 && 2 < 4"
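A simplified version of the two stages can be sketched with a regular expression for tags followed by html.UnescapeString. The pattern below is an assumption, not the package's actual tag matcher: it would, for example, misread a literal "<" and ">" in running text as a tag. Tags are removed before decoding so that a decoded &lt; is never mistaken for markup.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// tagPattern is a deliberately simple, assumed pattern: "<", anything
// that is not ">", then ">".
var tagPattern = regexp.MustCompile(`<[^>]*>`)

// stripHTML removes tags first, then decodes entities.
func stripHTML(s string) string {
	return html.UnescapeString(tagPattern.ReplaceAllString(s, ""))
}

func main() {
	fmt.Println(stripHTML("<p>Hello &amp; World</p>"))     // Hello & World
	fmt.Println(stripHTML("5 &gt; 3 &amp;&amp; 2 &lt; 4")) // 5 > 3 && 2 < 4
}
```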

func StripTags

func StripTags(s string) string

StripTags removes HTML/XML tags from s. This is a simple implementation that may not handle all edge cases.

Example:

StripTags("<p>Hello <b>World</b></p>")  // "Hello World"

func SwapCase

func SwapCase(s string) string

SwapCase swaps the case of each letter in s.

Example:

SwapCase("Hello World")  // "hELLO wORLD"

func Title

func Title(s string) string

Title returns s with the first character of each word uppercased.

Example:

Title("hello world")  // "Hello World"

func ToASCII added in v1.1.0

func ToASCII(s string) (string, error)

ToASCII converts a Unicode string to its closest ASCII representation by removing diacritics, replacing non-letter/non-number characters with spaces, and collapsing whitespace.

This is useful for generating slugs, search keys, or filenames from Unicode input.

Example:

ToASCII("Héllo, Wörld!")    // "Hello World", nil
ToASCII("café résumé")      // "cafe resume", nil

func Truncate

func Truncate(s string, maxLen int, suffix string) string

Truncate shortens s to maxLen characters, appending suffix if truncated. The total length including suffix will not exceed maxLen.

Example:

Truncate("Hello World", 8, "...")  // "Hello..."

func TruncateRunes added in v1.1.0

func TruncateRunes(s string, maxLen int) string

TruncateRunes truncates s to at most maxLen runes. Unlike byte-level truncation, this is Unicode-safe and will never split a multi-byte character.

If maxLen <= 0, returns empty string. If s has maxLen runes or fewer, returns s unchanged.

Example:

TruncateRunes("Hello, 世界!", 8)  // "Hello, 世"  (correct, not "Hello, \xe4")
TruncateRunes("café", 3)          // "caf"
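Rune-safe truncation can be done without allocating a rune slice by walking the string and cutting at the byte offset where the next rune would begin. The sketch below illustrates that idea; truncateRunes is a hypothetical name and the package may implement this differently.

```go
package main

import "fmt"

// truncateRunes returns the first maxLen runes of s. The range index is the
// byte offset of each rune's first byte, so s[:i] always ends on a boundary.
func truncateRunes(s string, maxLen int) string {
	if maxLen <= 0 {
		return ""
	}
	count := 0
	for i := range s {
		if count == maxLen {
			return s[:i]
		}
		count++
	}
	return s // s has maxLen runes or fewer
}

func main() {
	fmt.Println(truncateRunes("日本語", 2)) // 日本
	fmt.Println(truncateRunes("café", 3)) // caf
}
```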

func TruncateWords

func TruncateWords(s string, maxLen int, suffix string) string

TruncateWords truncates s at a word boundary, appending suffix if truncated. Attempts to break at word boundaries rather than mid-word.

func Words

func Words(s string) []string

Words splits s into words, treating any non-alphanumeric character as separator.

Example:

Words("hello, world!")  // ["hello", "world"]

func Wrap

func Wrap(s string, width int) string

Wrap wraps text at the specified width, breaking at word boundaries. Preserves existing line breaks.

Types

type CleanOption added in v1.1.0

type CleanOption func(*cleanConfig)

CleanOption configures a cleaning step for CleanString. Options are applied in a fixed, safe order regardless of the order they are passed:

  1. HTML stripping (first, to remove markup before text processing)
  2. Unicode normalization (second, to normalize the text content)
  3. Database sanitization (last, to enforce encoding/length constraints)
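The reason argument order does not matter is that each option only records a setting, and the cleaning function then runs its steps in its own order. The sketch below illustrates that pattern with two of the three steps. All lowercase names and the fields of the config struct are hypothetical, and the HTML step here only decodes entities to keep the example short.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// config is a hypothetical stand-in for the unexported cleanConfig type.
type config struct {
	stripHTML  bool
	dbSanitize bool
	maxLen     int
}

// option records a setting; it does no cleaning work itself.
type option func(*config)

func withHTMLStrip() option { return func(c *config) { c.stripHTML = true } }

func withDBSanitize(maxLen int) option {
	return func(c *config) { c.dbSanitize, c.maxLen = true, maxLen }
}

// clean collects all options first, then applies the steps in a fixed order.
func clean(input string, opts ...option) string {
	var cfg config
	for _, o := range opts {
		o(&cfg)
	}
	out := input
	if cfg.stripHTML { // step 1 (simplified: entity decoding only)
		out = html.UnescapeString(out)
	}
	if cfg.dbSanitize { // last step: encoding and length constraints
		out = strings.ReplaceAll(strings.ToValidUTF8(out, "\uFFFD"), "\x00", "")
		if r := []rune(out); cfg.maxLen > 0 && len(r) > cfg.maxLen {
			out = string(r[:cfg.maxLen])
		}
	}
	return out
}

func main() {
	in := "a &amp; b\x00"
	fmt.Printf("%q\n", clean(in, withDBSanitize(3), withHTMLStrip())) // "a &"
	fmt.Printf("%q\n", clean(in, withHTMLStrip(), withDBSanitize(3))) // "a &"
}
```

Because truncation runs after entity decoding, the three-rune limit applies to the decoded text rather than cutting "&amp;" in half.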

func WithDBSanitize added in v1.1.0

func WithDBSanitize(maxLen int) CleanOption

WithDBSanitize enables database sanitization:

  • Replaces invalid UTF-8 sequences with U+FFFD (replacement character)
  • Removes NUL bytes (\x00), which break PostgreSQL, MySQL, etc.
  • Optionally truncates to maxLen runes (0 = no truncation)

Truncation is rune-aware: it will never cut a multi-byte character in half.

Note: This does NOT escape SQL. Use parameterized queries for SQL injection prevention. This function handles encoding-level sanitization only.

Example:

result, _ := CleanString("Hello\x00World", WithDBSanitize(0))
// result = "HelloWorld"

result, _ = CleanString("Hello World", WithDBSanitize(5))
// result = "Hello"

func WithHTMLStrip added in v1.1.0

func WithHTMLStrip() CleanOption

WithHTMLStrip enables HTML tag removal and entity decoding. All HTML/XML tags are stripped, and HTML entities are decoded to their Unicode equivalents (e.g., &amp; → &, &lt; → <, &nbsp; → non-breaking space (U+00A0), &#169; → ©).

Uses Go's standard html.UnescapeString for entity decoding, which handles all named HTML entities, decimal (&#123;), and hex (&#x7B;) numeric entities.

Example:

result, _ := CleanString("<p>Hello &amp; World</p>", WithHTMLStrip())
// result = "Hello & World"

func WithUnicodeNorm added in v1.1.0

func WithUnicodeNorm() CleanOption

WithUnicodeNorm enables Unicode normalization: NFKD decomposition followed by removal of combining marks (diacritics). This converts characters like "é" → "e", "ñ" → "n", "ü" → "u".

This is useful for search indexing, comparison, and ensuring ASCII-compatible text from Unicode input.

Example:

result, _ := CleanString("café résumé", WithUnicodeNorm())
// result = "cafe resume"
