strsim

Version: v0.1.0
Published: Mar 16, 2026 | License: MIT | Imports: 6 | Imported by: 0

README

strsim

Comprehensive string similarity metrics for Go. Edit distance, token-based similarity, and phonetic encoding — all in one package with a unified API.

Features

  • 15+ algorithms in a single zero-dependency package
  • Unified interfaces: every metric implements Metric (similarity) or DistanceMetric (distance + similarity)
  • Correct Unicode handling: all algorithms operate on runes, not bytes
  • Safe edge case handling: two empty strings return 1.0 similarity (never NaN), and Hamming similarity returns 0.0 for strings of different lengths
  • Batch operations: FindBestMatch, FindTopN, FindAboveThreshold
  • Zero dependencies: stdlib only

Quick Start

go get github.com/jcoruiz/strsim

package main

import (
    "fmt"
    "github.com/jcoruiz/strsim"
)

func main() {
    // Edit distance
    fmt.Println(strsim.Levenshtein("kitten", "sitting"))  // 3

    // Normalized similarity [0, 1]
    fmt.Println(strsim.JaroWinklerSimilarity("martha", "marhta"))  // ~0.961

    // Phonetic matching
    fmt.Println(strsim.SoundexMatch("Robert", "Rupert"))  // true

    // Find best match from candidates
    candidates := []string{"golang", "gold", "golem", "gopher", "python"}
    match := strsim.FindBestMatch("go", candidates, strsim.NewJaroWinkler())
    fmt.Printf("%s (%.2f)\n", match.Value, match.Similarity)
}

Algorithms

Edit Distance
Algorithm                    Function                   Returns
Hamming                      Hamming(a, b)              int, error (equal-length only)
Levenshtein                  Levenshtein(a, b)          int
Optimal String Alignment     OSA(a, b)                  int
Damerau-Levenshtein          DamerauLevenshtein(a, b)   int
Longest Common Subsequence   LCSDistance(a, b)          int

All edit distance functions have a corresponding *Similarity variant returning a normalized float64 in [0, 1].

Token-based / Set-based Similarity
Algorithm                Function                      Notes
Jaro                     JaroSimilarity(a, b)          Matching characters + transpositions
Jaro-Winkler             JaroWinklerSimilarity(a, b)   Jaro + prefix bonus
Cosine (n-gram)          CosineSimilarity(a, b)        TF-weighted n-gram vectors
Jaccard (n-gram)         JaccardSimilarity(a, b)       Set intersection / union
Sørensen-Dice (n-gram)   DiceSimilarity(a, b)          2·|intersection| / (|A| + |B|)
Overlap Coefficient      OverlapSimilarity(a, b)       |intersection| / min(|A|, |B|)

Phonetic Encoding
Algorithm          Function             Returns
American Soundex   Soundex(s)           string (4-char code)
Metaphone          Metaphone(s)         string
Double Metaphone   DoubleMetaphone(s)   string, string (primary, alternate)
NYSIIS             NYSIIS(s)            string

All phonetic functions have a corresponding *Match(a, b) variant returning bool.

Interfaces

Every metric implements at least one interface, making them interchangeable:

// Similarity metric — returns [0, 1] where 1.0 = identical.
type Metric interface {
    Similarity(a, b string) float64
}

// Distance metric — also returns raw edit distance.
type DistanceMetric interface {
    Metric
    Distance(a, b string) int
}

// Phonetic encoder.
type Encoder interface {
    Encode(s string) string
}

Using interfaces

// Use any Metric interchangeably
func findSimilar(query string, items []string, m strsim.Metric) {
    for _, item := range items {
        if m.Similarity(query, item) > 0.8 {
            fmt.Println(item)
        }
    }
}

// Works with any metric; items is a []string supplied by the caller
findSimilar("golang", items, strsim.NewJaroWinkler())
findSimilar("golang", items, strsim.NewLevenshtein())
findSimilar("golang", items, strsim.NewDamerauLevenshtein())

Batch Operations

candidates := []string{"golang", "gold", "golem", "gopher", "python", "ruby"}
m := strsim.NewJaroWinkler()

// Best single match
best := strsim.FindBestMatch("go", candidates, m)

// Top N matches, sorted by similarity descending
top3 := strsim.FindTopN("go", candidates, 3, m)

// All matches above threshold
matches := strsim.FindAboveThreshold("go", candidates, 0.7, m)

Configurable Metrics

Most algorithms have configurable variants:

// Custom Levenshtein costs
lev := &strsim.LevenshteinMetric{
    InsertCost:  1,
    DeleteCost:  1,
    ReplaceCost: 2,
}
dist := lev.Distance("kitten", "sitting")

// Custom Jaro-Winkler parameters
jw := &strsim.JaroWinklerMetric{
    BoostThreshold: 0.7,
    PrefixSize:     4,
}
sim := jw.Similarity("martha", "marhta")

// Custom n-gram size
ng := &strsim.NgramMetric{Size: 3}  // trigrams instead of bigrams
sim = ng.Cosine("night", "nacht")

// Custom phonetic code length
enc := &strsim.SoundexEncoder{MaxLength: 6}
code := enc.Encode("Washington")

ASCII Fast Path

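The edit-distance and Jaro-family metrics (Hamming, Levenshtein, OSA, Damerau-Levenshtein, LCS, Jaro, Jaro-Winkler) expose an ASCIIOnly field that skips rune conversion for faster processing of pure-ASCII input. NgramMetric and the phonetic encoders have no such field. ASCIIOnly produces incorrect results for multi-byte UTF-8 strings, so use it only when you know your input is ASCII (identifiers, codes, URLs, unaccented English text).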
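
To see why byte-wise processing is unsafe for multi-byte input, here is a minimal standalone sketch. It does not use strsim; `editDistance` is a hypothetical helper written for this illustration, and it runs the same unit-cost Levenshtein recurrence over runes and over bytes.

```go
package main

import "fmt"

// editDistance is a hypothetical helper (not part of strsim): unit-cost
// Levenshtein distance over any comparable element type, so the same code
// can be run on []rune and on []byte.
func editDistance[T comparable](a, b []T) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			best := prev[j-1] + cost // match or substitute
			if v := prev[j] + 1; v < best { // delete from a
				best = v
			}
			if v := curr[j-1] + 1; v < best { // insert into a
				best = v
			}
			curr[j] = best
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

func main() {
	a, b := "café", "cafe"
	fmt.Println(editDistance([]rune(a), []rune(b))) // 1: one character differs
	fmt.Println(editDistance([]byte(a), []byte(b))) // 2: "é" is two bytes in UTF-8
}
```

The byte-level result counts one substitution plus one deletion for the two-byte encoding of "é", which is the kind of discrepancy the ASCIIOnly warning refers to.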

// ASCII fast path: the speedup varies by algorithm, from about 1.05x (Levenshtein) to 11x (Hamming)
m := &strsim.LevenshteinMetric{
    InsertCost: 1, DeleteCost: 1, ReplaceCost: 1,
    ASCIIOnly: true,
}
dist := m.Distance("kitten", "sitting")

// Works with batch operations too
jw := &strsim.JaroWinklerMetric{
    BoostThreshold: 0.7, PrefixSize: 4,
    ASCIIOnly: true,
}
match := strsim.FindBestMatch("query", candidates, jw)

Algorithm             Rune (ns/op)   ASCII (ns/op)   Speedup
Hamming               113            10              11x
Damerau-Levenshtein   17,337         7,113           2.4x
Jaro-Winkler          474            341             1.4x
LCS                   1,574          1,373           1.15x
Levenshtein           3,039          2,885           1.05x

Inputs: 43-char ASCII strings.

Benchmarks

Measured on AMD Ryzen, Go 1.22 (go test -bench=. -benchmem):

Algorithm          ns/op   B/op   allocs/op
Soundex            30      21     3
Hamming            39      0      0
Jaro-Winkler       48      16     2
Jaro               54      32     2
Metaphone          56      32     3
NYSIIS             86      64     4
Double Metaphone   134     32     4
LCS                534     448    2
Cosine (bigram)    850     507    18
Levenshtein        1017    448    2
OSA                1030    672    3
Jaccard (bigram)   1172    1084   22

Inputs: "kitten"/"sitting" for edit distance, "Schneider" for phonetic, "night"/"nacht" for n-gram.

Why strsim?

The Go ecosystem for string similarity is fragmented across 8+ libraries, each with a subset of algorithms, inconsistent APIs, and known bugs:

  • go-edlib — NaN on empty strings, float32 precision, no phonetic
  • strutil — NaN on empty strings, negative similarity with custom costs
  • smetrics — operates on bytes not runes (broken Unicode), Soundex panics on empty input
  • matchr — most complete phonetic support but GPLv3 licensed

strsim consolidates everything into one MIT-licensed package with correct Unicode handling, consistent APIs, and no edge-case surprises.

License

MIT

Documentation

Overview

Package strsim provides comprehensive string similarity metrics, distance functions, and phonetic encoding algorithms.

It combines edit-distance, token-based, and phonetic algorithms in a single package with a unified API. Every similarity metric returns a normalized similarity in [0, 1], where 1.0 means identical; distance metrics additionally return the raw edit distance.

Quick Start

// Edit distance
d := strsim.Levenshtein("kitten", "sitting")  // 3

// Normalized similarity [0, 1]
s := strsim.JaroWinklerSimilarity("martha", "marhta")  // ~0.961

// Phonetic encoding
code := strsim.Soundex("Robert")  // "R163"

// Find best match from a list
m := strsim.FindBestMatch("golang", candidates, strsim.NewJaroWinkler())

Interfaces

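All similarity metrics implement the Metric interface, allowing them to be used interchangeably. Distance metrics additionally implement DistanceMetric. Phonetic encoders implement the Encoder interface, except DoubleMetaphoneEncoder, whose Encode method returns two codes and therefore matches DualEncoder.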

Functions

func CosineSimilarity

func CosineSimilarity(a, b string) float64

CosineSimilarity returns the cosine similarity between a and b using bigrams.

func DamerauLevenshtein

func DamerauLevenshtein(a, b string) int

DamerauLevenshtein returns the true Damerau-Levenshtein distance between a and b.

func DamerauLevenshteinSimilarity

func DamerauLevenshteinSimilarity(a, b string) float64

DamerauLevenshteinSimilarity returns the normalized similarity between a and b using the true Damerau-Levenshtein distance as a value in [0, 1].

func DiceSimilarity

func DiceSimilarity(a, b string) float64

DiceSimilarity returns the Sorensen-Dice similarity between a and b using bigrams.

func DoubleMetaphone

func DoubleMetaphone(s string) (primary, alternate string)

DoubleMetaphone returns the primary and alternate Double Metaphone codes for the given string using the default code length of 4.

func DoubleMetaphoneMatch

func DoubleMetaphoneMatch(a, b string) bool

DoubleMetaphoneMatch reports whether two strings share at least one common Double Metaphone code (primary or alternate).

func Hamming

func Hamming(a, b string) (int, error)

Hamming returns the Hamming distance between two strings. It returns an error if the strings have different rune lengths, since Hamming distance is only defined for strings of equal length.

func HammingSimilarity

func HammingSimilarity(a, b string) float64

HammingSimilarity returns the normalized Hamming similarity between a and b as a value in [0, 1]. Returns 1.0 when both strings are empty. Returns 0.0 when the strings have different rune lengths.
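
The documented behavior of Hamming and HammingSimilarity can be sketched in a few lines. This is a standalone illustration, not the package's source: the names `hamming`, `hammingSimilarity`, and `errLength` are hypothetical, and the normalization `1 - distance/length` is an assumption consistent with the edge cases described above.

```go
package main

import (
	"errors"
	"fmt"
)

// errLength is a hypothetical sentinel error for this sketch.
var errLength = errors.New("hamming: strings have different rune lengths")

// hamming counts differing positions, comparing rune by rune.
func hamming(a, b string) (int, error) {
	ra, rb := []rune(a), []rune(b)
	if len(ra) != len(rb) {
		return 0, errLength
	}
	d := 0
	for i := range ra {
		if ra[i] != rb[i] {
			d++
		}
	}
	return d, nil
}

// hammingSimilarity assumes the normalization 1 - d/n, returning 1.0 for two
// empty strings and 0.0 when the rune lengths differ.
func hammingSimilarity(a, b string) float64 {
	d, err := hamming(a, b)
	if err != nil {
		return 0
	}
	n := len([]rune(a))
	if n == 0 {
		return 1
	}
	return 1 - float64(d)/float64(n)
}

func main() {
	d, err := hamming("karolin", "kathrin")
	fmt.Println(d, err) // 3 <nil>
	fmt.Println(hammingSimilarity("abc", "ab")) // 0
	fmt.Println(hammingSimilarity("", ""))      // 1
}
```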

func JaccardSimilarity

func JaccardSimilarity(a, b string) float64

JaccardSimilarity returns the Jaccard similarity between a and b using bigrams.

func JaroSimilarity

func JaroSimilarity(a, b string) float64

JaroSimilarity returns the Jaro similarity between a and b as a value in [0, 1], where 1.0 means identical. The algorithm considers characters as matching if they appear within a window of max(len(a), len(b))/2 - 1 positions and counts transpositions among matched characters.

func JaroWinklerSimilarity

func JaroWinklerSimilarity(a, b string) float64

JaroWinklerSimilarity returns the Jaro-Winkler similarity between a and b using default settings (BoostThreshold = 0.7, PrefixSize = 4).
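
As a rough illustration of the matching window and the prefix bonus, the sketch below implements textbook Jaro and Jaro-Winkler on runes. It is not the package's code. The function names are hypothetical, and the scaling factor 0.1 is the conventional Winkler value, assumed here because the documentation above does not state it.

```go
package main

import "fmt"

// jaro is a hypothetical textbook implementation operating on runes.
func jaro(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	if len(ra) == 0 && len(rb) == 0 {
		return 1
	}
	if len(ra) == 0 || len(rb) == 0 {
		return 0
	}
	longest := len(ra)
	if len(rb) > longest {
		longest = len(rb)
	}
	window := longest/2 - 1
	if window < 0 {
		window = 0
	}
	matchedA := make([]bool, len(ra))
	matchedB := make([]bool, len(rb))
	matches := 0
	for i := range ra {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > len(rb) {
			hi = len(rb)
		}
		for j := lo; j < hi; j++ {
			if !matchedB[j] && ra[i] == rb[j] {
				matchedA[i], matchedB[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Walk the matched characters of both strings in order; each position
	// where they disagree is half a transposition.
	half, j := 0, 0
	for i := range ra {
		if !matchedA[i] {
			continue
		}
		for !matchedB[j] {
			j++
		}
		if ra[i] != rb[j] {
			half++
		}
		j++
	}
	m := float64(matches)
	return (m/float64(len(ra)) + m/float64(len(rb)) + (m-float64(half)/2)/m) / 3
}

// jaroWinkler adds the prefix bonus when the Jaro score reaches the threshold.
// The 0.1 scaling factor is an assumption (the conventional value).
func jaroWinkler(a, b string, boostThreshold float64, prefixSize int) float64 {
	score := jaro(a, b)
	if score < boostThreshold {
		return score
	}
	ra, rb := []rune(a), []rune(b)
	p := 0
	for p < prefixSize && p < len(ra) && p < len(rb) && ra[p] == rb[p] {
		p++
	}
	return score + float64(p)*0.1*(1-score)
}

func main() {
	fmt.Printf("%.4f\n", jaro("martha", "marhta"))                // 0.9444
	fmt.Printf("%.4f\n", jaroWinkler("martha", "marhta", 0.7, 4)) // 0.9611
}
```

For "martha" and "marhta" all six characters match, with one transposition (t/h), and the shared prefix "mar" contributes the bonus that lifts 0.9444 to 0.9611.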

func LCS

func LCS(a, b string) int

LCS returns the length of the longest common subsequence of a and b. It uses O(min(m, n)) space with a two-row optimization.

func LCSDistance

func LCSDistance(a, b string) int

LCSDistance returns the LCS distance between a and b, defined as runeLen(a) + runeLen(b) - 2*LCS(a, b).
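
The two-row optimization and the distance formula can be sketched as follows. This is a standalone illustration under the definitions above, with hypothetical names `lcs` and `lcsDistance`; it is not taken from the package source.

```go
package main

import "fmt"

// lcs returns the length of the longest common subsequence using two rows
// sized by the shorter input, i.e. O(min(m, n)) extra space.
func lcs(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	if len(rb) > len(ra) {
		ra, rb = rb, ra // keep the shorter string along the row
	}
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for i := 1; i <= len(ra); i++ {
		for j := 1; j <= len(rb); j++ {
			switch {
			case ra[i-1] == rb[j-1]:
				curr[j] = prev[j-1] + 1
			case prev[j] >= curr[j-1]:
				curr[j] = prev[j]
			default:
				curr[j] = curr[j-1]
			}
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// lcsDistance applies runeLen(a) + runeLen(b) - 2*LCS(a, b).
func lcsDistance(a, b string) int {
	return len([]rune(a)) + len([]rune(b)) - 2*lcs(a, b)
}

func main() {
	fmt.Println(lcs("kitten", "sitting"))         // 4 ("ittn")
	fmt.Println(lcsDistance("kitten", "sitting")) // 5
}
```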

func LCSSimilarity

func LCSSimilarity(a, b string) float64

LCSSimilarity returns the normalized similarity between a and b based on the LCS length as a value in [0, 1]. Returns 1.0 when both strings are empty.

func Levenshtein

func Levenshtein(a, b string) int

Levenshtein returns the standard Levenshtein distance between a and b using unit costs for all operations.

func LevenshteinSimilarity

func LevenshteinSimilarity(a, b string) float64

LevenshteinSimilarity returns the normalized similarity between a and b as a value in [0, 1] using the standard Levenshtein distance.

func Metaphone

func Metaphone(s string) string

Metaphone returns the Metaphone code for the given string using the default code length of 4.

func MetaphoneMatch

func MetaphoneMatch(a, b string) bool

MetaphoneMatch reports whether two strings have the same Metaphone code.

func NYSIIS

func NYSIIS(s string) string

NYSIIS returns the NYSIIS code for the given string using the default code length of 6.

func NYSIISMatch

func NYSIISMatch(a, b string) bool

NYSIISMatch reports whether two strings produce the same NYSIIS code.

func OSA

func OSA(a, b string) int

OSA returns the Optimal String Alignment distance between a and b.

func OSASimilarity

func OSASimilarity(a, b string) float64

OSASimilarity returns the normalized similarity between a and b using the OSA distance as a value in [0, 1].

func OverlapSimilarity

func OverlapSimilarity(a, b string) float64

OverlapSimilarity returns the overlap coefficient between a and b using bigrams.

func Soundex

func Soundex(s string) string

Soundex returns the American Soundex code for the given string using the default code length of 4.

func SoundexMatch

func SoundexMatch(a, b string) bool

SoundexMatch reports whether two strings have the same Soundex code using the default code length.

Types

type DamerauLevenshteinMetric

type DamerauLevenshteinMetric struct {
	// ASCIIOnly skips rune conversion for faster ASCII processing.
	// Produces incorrect results for multi-byte UTF-8 input.
	ASCIIOnly bool
}

DamerauLevenshteinMetric computes the true Damerau-Levenshtein distance between two strings. Unlike OSA, it allows unrestricted transpositions, meaning substrings may be edited more than once.
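
The difference between the two distances shows up on the classic pair "ca" and "abc". The sketch below is a standalone textbook implementation of both recurrences (full matrices for clarity), with hypothetical names; it is not the package's code and makes no claim about how strsim stores its rows.

```go
package main

import "fmt"

func minInt(vals ...int) int {
	m := vals[0]
	for _, v := range vals[1:] {
		if v < m {
			m = v
		}
	}
	return m
}

// osa: Levenshtein plus adjacent transpositions, each substring edited once.
func osa(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	d := make([][]int, len(ra)+1)
	for i := range d {
		d[i] = make([]int, len(rb)+1)
		d[i][0] = i
	}
	for j := range d[0] {
		d[0][j] = j
	}
	for i := 1; i <= len(ra); i++ {
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			d[i][j] = minInt(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1]+cost)
			if i > 1 && j > 1 && ra[i-1] == rb[j-2] && ra[i-2] == rb[j-1] {
				d[i][j] = minInt(d[i][j], d[i-2][j-2]+1)
			}
		}
	}
	return d[len(ra)][len(rb)]
}

// damerauLevenshtein: unrestricted transpositions, tracked with a map from
// each rune to the last row in which it was seen.
func damerauLevenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	m, n := len(ra), len(rb)
	maxDist := m + n
	lastRow := make(map[rune]int)
	d := make([][]int, m+2)
	for i := range d {
		d[i] = make([]int, n+2)
	}
	d[0][0] = maxDist
	for i := 0; i <= m; i++ {
		d[i+1][0], d[i+1][1] = maxDist, i
	}
	for j := 0; j <= n; j++ {
		d[0][j+1], d[1][j+1] = maxDist, j
	}
	for i := 1; i <= m; i++ {
		lastCol := 0
		for j := 1; j <= n; j++ {
			k, l := lastRow[rb[j-1]], lastCol
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost, lastCol = 0, j
			}
			d[i+1][j+1] = minInt(
				d[i][j]+cost, // substitution or match
				d[i+1][j]+1,  // insertion
				d[i][j+1]+1,  // deletion
				d[k][l]+(i-k-1)+1+(j-l-1), // transposition
			)
		}
		lastRow[ra[i-1]] = i
	}
	return d[m+1][n+1]
}

func main() {
	fmt.Println(osa("ca", "abc"))                // 3
	fmt.Println(damerauLevenshtein("ca", "abc")) // 2
}
```

True Damerau-Levenshtein can transpose "ca" to "ac" and then insert "b" between the swapped characters (2 edits); OSA forbids editing the transposed pair again, so it needs 3.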

Set ASCIIOnly to true for faster processing of pure-ASCII strings.

func NewDamerauLevenshtein

func NewDamerauLevenshtein() *DamerauLevenshteinMetric

NewDamerauLevenshtein returns a new DamerauLevenshteinMetric instance.

func (*DamerauLevenshteinMetric) Distance

func (m *DamerauLevenshteinMetric) Distance(a, b string) int

Distance returns the true Damerau-Levenshtein distance between a and b using the algorithm with a DA (last-row-seen) map. Time and space are O(m * n).

func (*DamerauLevenshteinMetric) Similarity

func (m *DamerauLevenshteinMetric) Similarity(a, b string) float64

Similarity returns the normalized similarity between a and b as a value in [0, 1]. Returns 1.0 when both strings are empty.

type DistanceMetric

type DistanceMetric interface {
	Metric
	Distance(a, b string) int
}

DistanceMetric computes both raw edit distance and normalized similarity.

type DoubleMetaphoneEncoder

type DoubleMetaphoneEncoder struct {
	// MaxLength is the maximum length of each generated code. Default is 4.
	MaxLength int
}

DoubleMetaphoneEncoder implements the Double Metaphone algorithm by Lawrence Philips. It produces two phonetic codes (primary and alternate) that capture different possible pronunciations, accounting for non-English name origins including Germanic, Slavic, Celtic, Greek, Italian, Spanish, and Chinese.

func NewDoubleMetaphone

func NewDoubleMetaphone() *DoubleMetaphoneEncoder

NewDoubleMetaphone returns a new DoubleMetaphoneEncoder with the default code length of 4.

func (*DoubleMetaphoneEncoder) Encode

func (e *DoubleMetaphoneEncoder) Encode(s string) (primary, alternate string)

Encode returns the primary and alternate Double Metaphone codes for the given string. Both codes are empty if the input contains no ASCII letters.

func (*DoubleMetaphoneEncoder) Match

func (e *DoubleMetaphoneEncoder) Match(a, b string) bool

Match reports whether two strings share at least one common Double Metaphone code.

type DualEncoder

type DualEncoder interface {
	Encode(s string) (primary, alternate string)
}

DualEncoder produces primary and alternate phonetic encodings.

type Encoder

type Encoder interface {
	Encode(s string) string
}

Encoder produces a phonetic encoding of a string.

type HammingMetric

type HammingMetric struct {
	// ASCIIOnly skips rune conversion for faster ASCII processing.
	// In this mode, length is measured in bytes instead of runes.
	ASCIIOnly bool
}

HammingMetric computes the Hamming distance between two strings of equal length. Hamming distance is the number of positions at which the corresponding characters differ.

Set ASCIIOnly to true for faster processing of pure-ASCII strings. In this mode, length is measured in bytes instead of runes.

func NewHamming

func NewHamming() *HammingMetric

NewHamming returns a new HammingMetric instance.

func (*HammingMetric) Distance

func (m *HammingMetric) Distance(a, b string) int

Distance returns the Hamming distance between a and b. It panics if the strings have different lengths (rune length, or byte length if ASCIIOnly).

func (*HammingMetric) Similarity

func (m *HammingMetric) Similarity(a, b string) float64

Similarity returns the normalized similarity between a and b as a value in [0, 1]. Returns 1.0 when both strings are empty. Returns 0.0 when the strings have different lengths.

type JaroMetric

type JaroMetric struct {
	// ASCIIOnly skips rune conversion for faster ASCII processing.
	// Produces incorrect results for multi-byte UTF-8 input.
	ASCIIOnly bool
}

JaroMetric computes the Jaro similarity between two strings. Jaro similarity is based on the number of matching characters and transpositions.

Set ASCIIOnly to true for faster processing of pure-ASCII strings.

func NewJaro

func NewJaro() *JaroMetric

NewJaro returns a new JaroMetric instance.

func (*JaroMetric) Similarity

func (m *JaroMetric) Similarity(a, b string) float64

Similarity returns the Jaro similarity between a and b.

type JaroWinklerMetric

type JaroWinklerMetric struct {
	// BoostThreshold is the minimum Jaro score required to apply the prefix
	// bonus. Default: 0.7.
	BoostThreshold float64

	// PrefixSize is the maximum number of prefix characters considered for
	// the bonus. Default: 4.
	PrefixSize int

	// ASCIIOnly skips rune conversion for faster ASCII processing.
	// Produces incorrect results for multi-byte UTF-8 input.
	ASCIIOnly bool
}

JaroWinklerMetric computes the Jaro-Winkler similarity between two strings. Jaro-Winkler extends Jaro with a prefix bonus that increases the score when the strings share a common prefix.

Set ASCIIOnly to true for faster processing of pure-ASCII strings.

func NewJaroWinkler

func NewJaroWinkler() *JaroWinklerMetric

NewJaroWinkler returns a new JaroWinklerMetric with default settings (BoostThreshold = 0.7, PrefixSize = 4).

func (*JaroWinklerMetric) Similarity

func (m *JaroWinklerMetric) Similarity(a, b string) float64

Similarity returns the Jaro-Winkler similarity between a and b as a value in [0, 1], where 1.0 means identical.

type LCSMetric

type LCSMetric struct {
	// ASCIIOnly skips rune conversion for faster ASCII processing.
	// Produces incorrect results for multi-byte UTF-8 input.
	ASCIIOnly bool
}

LCSMetric computes the Longest Common Subsequence (LCS) between two strings. The LCS distance is defined as len(a) + len(b) - 2*LCS(a, b), representing the minimum number of characters that must be deleted from both strings to make them equal.

Set ASCIIOnly to true for faster processing of pure-ASCII strings.

func NewLCS

func NewLCS() *LCSMetric

NewLCS returns a new LCSMetric instance.

func (*LCSMetric) Distance

func (m *LCSMetric) Distance(a, b string) int

Distance returns the LCS distance between a and b.

func (*LCSMetric) Similarity

func (m *LCSMetric) Similarity(a, b string) float64

Similarity returns the normalized similarity between a and b as a value in [0, 1]. Returns 1.0 when both strings are empty.

type LevenshteinMetric

type LevenshteinMetric struct {
	// InsertCost is the cost of inserting a character. Default: 1.
	InsertCost int
	// DeleteCost is the cost of deleting a character. Default: 1.
	DeleteCost int
	// ReplaceCost is the cost of replacing a character. Default: 1.
	ReplaceCost int
	// ASCIIOnly skips rune conversion for faster ASCII processing.
	// Produces incorrect results for multi-byte UTF-8 input.
	ASCIIOnly bool
}

LevenshteinMetric computes the Levenshtein edit distance between two strings. Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to transform one string into the other.

The metric supports configurable operation costs via InsertCost, DeleteCost, and ReplaceCost fields.

Set ASCIIOnly to true for faster processing of pure-ASCII strings. This skips the rune conversion but produces incorrect results for multi-byte UTF-8 input.

func NewLevenshtein

func NewLevenshtein() *LevenshteinMetric

NewLevenshtein returns a new LevenshteinMetric with all operation costs set to 1 (standard Levenshtein distance).

func (*LevenshteinMetric) Distance

func (m *LevenshteinMetric) Distance(a, b string) int

Distance returns the Levenshtein distance between a and b using the configured operation costs. It uses O(min(m, n)) space.

func (*LevenshteinMetric) Similarity

func (m *LevenshteinMetric) Similarity(a, b string) float64

Similarity returns the normalized similarity between a and b as a value in [0, 1]. It normalizes by dividing the distance by the maximum possible distance (max rune length * max operation cost). Returns 1.0 when both strings are empty.
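
The cost-weighted recurrence and the normalization described above can be sketched as follows. This is a standalone illustration with hypothetical names (`costs`, `levenshtein`, `similarity`), assuming all costs are positive; unlike the documented method, it does not swap inputs to reach O(min(m, n)) space and simply keeps two rows sized by the second string.

```go
package main

import "fmt"

// costs is a hypothetical stand-in for the configurable operation costs.
type costs struct{ ins, del, rep int }

// levenshtein computes the weighted distance for transforming a into b,
// keeping only two rows of the DP table.
func levenshtein(a, b string, c costs) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := 1; j <= len(rb); j++ {
		prev[j] = prev[j-1] + c.ins
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = prev[0] + c.del
		for j := 1; j <= len(rb); j++ {
			best := prev[j-1] // match
			if ra[i-1] != rb[j-1] {
				best += c.rep // replace
			}
			if v := prev[j] + c.del; v < best {
				best = v
			}
			if v := curr[j-1] + c.ins; v < best {
				best = v
			}
			curr[j] = best
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// similarity normalizes by max rune length * max operation cost.
func similarity(a, b string, c costs) float64 {
	maxLen := len([]rune(a))
	if n := len([]rune(b)); n > maxLen {
		maxLen = n
	}
	if maxLen == 0 {
		return 1
	}
	maxCost := c.ins
	if c.del > maxCost {
		maxCost = c.del
	}
	if c.rep > maxCost {
		maxCost = c.rep
	}
	d := levenshtein(a, b, c)
	return 1 - float64(d)/float64(maxLen*maxCost)
}

func main() {
	unit := costs{ins: 1, del: 1, rep: 1}
	fmt.Println(levenshtein("kitten", "sitting", unit))                       // 3
	fmt.Println(levenshtein("kitten", "sitting", costs{ins: 1, del: 1, rep: 2})) // 5
	fmt.Printf("%.4f\n", similarity("kitten", "sitting", unit))               // 0.5714
}
```

With ReplaceCost raised to 2, each of the two substitutions costs as much as a delete plus an insert, so the distance rises from 3 to 5.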

type Match

type Match struct {
	// Value is the matched candidate string.
	Value string
	// Similarity is the similarity score in [0, 1].
	Similarity float64
	// Index is the position of this candidate in the original slice.
	Index int
}

Match represents a candidate string and its similarity score.

func FindAboveThreshold

func FindAboveThreshold(query string, candidates []string, threshold float64, m Metric) []Match

FindAboveThreshold returns all candidates whose similarity to query meets or exceeds the given threshold, sorted by similarity descending.

func FindBestMatch

func FindBestMatch(query string, candidates []string, m Metric) Match

FindBestMatch returns the candidate with the highest similarity to query according to the given Metric. If candidates is empty, it returns a Match with Index -1 and Similarity 0.

func FindTopN

func FindTopN(query string, candidates []string, n int, m Metric) []Match

FindTopN returns the top n candidates with the highest similarity to query, sorted by similarity descending. If n exceeds the number of candidates, all candidates are returned.
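
The batch helpers only need the one-method Metric interface, which the following standalone sketch illustrates. Everything here is hypothetical and redefined locally (the interface, the Match struct, `prefixMetric`, and the lowercase helper names); it mirrors the documented behavior, including Index -1 for an empty candidate list, but is not the package's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

type Metric interface {
	Similarity(a, b string) float64
}

type Match struct {
	Value      string
	Similarity float64
	Index      int
}

// prefixMetric is a toy metric for this sketch: shared prefix length divided
// by the longer rune length.
type prefixMetric struct{}

func (prefixMetric) Similarity(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	longest := len(ra)
	if len(rb) > longest {
		longest = len(rb)
	}
	if longest == 0 {
		return 1
	}
	n := 0
	for n < len(ra) && n < len(rb) && ra[n] == rb[n] {
		n++
	}
	return float64(n) / float64(longest)
}

// rank scores every candidate and sorts by similarity, highest first.
func rank(query string, candidates []string, m Metric) []Match {
	out := make([]Match, len(candidates))
	for i, c := range candidates {
		out[i] = Match{Value: c, Similarity: m.Similarity(query, c), Index: i}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Similarity > out[j].Similarity })
	return out
}

func findBestMatch(query string, candidates []string, m Metric) Match {
	r := rank(query, candidates, m)
	if len(r) == 0 {
		return Match{Index: -1}
	}
	return r[0]
}

func findTopN(query string, candidates []string, n int, m Metric) []Match {
	r := rank(query, candidates, m)
	if n < 0 {
		n = 0
	}
	if n > len(r) {
		n = len(r)
	}
	return r[:n]
}

func findAboveThreshold(query string, candidates []string, threshold float64, m Metric) []Match {
	var out []Match
	for _, mt := range rank(query, candidates, m) {
		if mt.Similarity >= threshold {
			out = append(out, mt)
		}
	}
	return out
}

func main() {
	candidates := []string{"golang", "gold", "golem", "gopher", "python"}
	best := findBestMatch("go", candidates, prefixMetric{})
	fmt.Printf("%s (%.2f) at index %d\n", best.Value, best.Similarity, best.Index)
	fmt.Println(len(findTopN("go", candidates, 10, prefixMetric{})))
	fmt.Println(findAboveThreshold("go", candidates, 0.35, prefixMetric{}))
}
```

A full sort keeps the sketch short; a real implementation could track only the best candidate, or a bounded heap for the top n, to avoid sorting the whole slice.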

type MetaphoneEncoder

type MetaphoneEncoder struct {
	// MaxLength is the maximum length of the generated code. Default is 4.
	MaxLength int
}

MetaphoneEncoder implements the original Metaphone phonetic algorithm by Lawrence Philips. It transforms an English word into a phonetic key that represents its approximate pronunciation.

func NewMetaphone

func NewMetaphone() *MetaphoneEncoder

NewMetaphone returns a new MetaphoneEncoder with the default code length of 4.

func (*MetaphoneEncoder) Encode

func (e *MetaphoneEncoder) Encode(s string) string

Encode returns the Metaphone code for the given string. It returns an empty string if the input contains no ASCII letters.

func (*MetaphoneEncoder) Match

func (e *MetaphoneEncoder) Match(a, b string) bool

Match reports whether two strings produce the same Metaphone code.

type Metric

type Metric interface {
	Similarity(a, b string) float64
}

Metric computes similarity between two strings. Implementations return values in [0, 1] where 1.0 means identical.

type NYSIISEncoder

type NYSIISEncoder struct {
	// MaxLength is the maximum length of the generated code. Default is 6.
	// Set to 0 for unlimited length.
	MaxLength int
}

NYSIISEncoder implements the New York State Identification and Intelligence System (NYSIIS) phonetic algorithm. It produces a code that groups similar-sounding names together. The algorithm handles common English name patterns and is particularly effective for American names.

func NewNYSIIS

func NewNYSIIS() *NYSIISEncoder

NewNYSIIS returns a new NYSIISEncoder with the default code length of 6.

func (*NYSIISEncoder) Encode

func (e *NYSIISEncoder) Encode(s string) string

Encode returns the NYSIIS code for the given string. It returns an empty string if the input contains no ASCII letters.

func (*NYSIISEncoder) Match

func (e *NYSIISEncoder) Match(a, b string) bool

Match reports whether two strings produce the same NYSIIS code.

type NgramMetric

type NgramMetric struct {
	// Size is the n-gram size. Default: 2 (bigrams).
	Size int
}

NgramMetric computes string similarity using n-gram based coefficients. It supports Cosine, Jaccard, Sorensen-Dice, and Overlap similarity.

func NewNgram

func NewNgram() *NgramMetric

NewNgram returns a new NgramMetric with Size = 2 (bigrams).

func (*NgramMetric) Cosine

func (m *NgramMetric) Cosine(a, b string) float64

Cosine returns the cosine similarity between a and b using term-frequency n-gram vectors: dot(A, B) / (||A|| * ||B||). Returns 1.0 when both strings are empty, 0.0 when only one is empty.
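
A minimal sketch of the term-frequency formula follows. It is a standalone illustration with hypothetical names, assuming n >= 1 and no padding characters at the string boundaries; whether the package pads its n-grams is not stated above.

```go
package main

import (
	"fmt"
	"math"
)

// ngramCounts returns term frequencies of rune n-grams (no padding, n >= 1).
func ngramCounts(s string, n int) map[string]int {
	r := []rune(s)
	counts := make(map[string]int)
	for i := 0; i+n <= len(r); i++ {
		counts[string(r[i:i+n])]++
	}
	return counts
}

// cosine computes dot(A, B) / (||A|| * ||B||) over n-gram count vectors.
func cosine(a, b string, n int) float64 {
	if a == "" && b == "" {
		return 1
	}
	ca, cb := ngramCounts(a, n), ngramCounts(b, n)
	if len(ca) == 0 || len(cb) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for g, x := range ca {
		dot += float64(x * cb[g]) // a missing n-gram contributes 0
		normA += float64(x * x)
	}
	for _, y := range cb {
		normB += float64(y * y)
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	fmt.Println(cosine("night", "nacht", 2))             // 0.25
	fmt.Printf("%.3f\n", cosine("banana", "bandana", 2)) // 0.825
}
```

"night" and "nacht" share only the bigram "ht", so the dot product is 1 against norms of 2 each. In "banana" the repeated bigrams "an" and "na" carry weight 2, which is what distinguishes the term-frequency version from a set-based coefficient.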

func (*NgramMetric) Dice

func (m *NgramMetric) Dice(a, b string) float64

Dice returns the Sorensen-Dice similarity between a and b using set semantics: 2 * |intersection| / (|A| + |B|). Returns 1.0 when both strings are empty.

func (*NgramMetric) Jaccard

func (m *NgramMetric) Jaccard(a, b string) float64

Jaccard returns the Jaccard similarity between a and b using set semantics: |intersection| / |union|. Returns 1.0 when both strings are empty.

func (*NgramMetric) Overlap

func (m *NgramMetric) Overlap(a, b string) float64

Overlap returns the overlap coefficient between a and b: |intersection| / min(|A|, |B|). Returns 1.0 when both strings are empty.
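
The three set-based coefficients differ only in their denominators, as this standalone sketch shows. The names are hypothetical, n-grams are taken without padding (an assumption), and the handling of strings too short to produce any n-gram (identical strings score 1.0, otherwise 0.0) is also an assumption made to avoid dividing by zero.

```go
package main

import "fmt"

// ngramSet returns the set of distinct rune n-grams (no padding, n >= 1).
func ngramSet(s string, n int) map[string]struct{} {
	r := []rune(s)
	set := make(map[string]struct{})
	for i := 0; i+n <= len(r); i++ {
		set[string(r[i:i+n])] = struct{}{}
	}
	return set
}

// sizes returns |A|, |B| and |intersection| for the two n-gram sets.
func sizes(a, b string, n int) (sa, sb, inter int) {
	A, B := ngramSet(a, n), ngramSet(b, n)
	for g := range A {
		if _, ok := B[g]; ok {
			inter++
		}
	}
	return len(A), len(B), inter
}

// coefficient guards the degenerate cases, then applies the given formula.
func coefficient(a, b string, n int, f func(sa, sb, inter float64) float64) float64 {
	if a == b {
		return 1 // covers two empty strings
	}
	sa, sb, inter := sizes(a, b, n)
	if sa == 0 || sb == 0 {
		return 0
	}
	return f(float64(sa), float64(sb), float64(inter))
}

func jaccard(a, b string, n int) float64 {
	return coefficient(a, b, n, func(sa, sb, i float64) float64 { return i / (sa + sb - i) })
}

func dice(a, b string, n int) float64 {
	return coefficient(a, b, n, func(sa, sb, i float64) float64 { return 2 * i / (sa + sb) })
}

func overlap(a, b string, n int) float64 {
	return coefficient(a, b, n, func(sa, sb, i float64) float64 {
		if sb < sa {
			sa = sb
		}
		return i / sa
	})
}

func main() {
	// Bigrams of "night": ni ig gh ht; of "nacht": na ac ch ht; one in common.
	fmt.Printf("%.4f\n", jaccard("night", "nacht", 2)) // 0.1429 (1/7)
	fmt.Println(dice("night", "nacht", 2))             // 0.25 (2/8)
	fmt.Println(overlap("night", "nacht", 2))          // 0.25 (1/4)
}
```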

func (*NgramMetric) Similarity

func (m *NgramMetric) Similarity(a, b string) float64

Similarity returns the Sorensen-Dice similarity, which is the default similarity coefficient for n-gram based comparison.

type OSAMetric

type OSAMetric struct {
	// ASCIIOnly skips rune conversion for faster ASCII processing.
	// Produces incorrect results for multi-byte UTF-8 input.
	ASCIIOnly bool
}

OSAMetric computes the Optimal String Alignment (restricted edit) distance between two strings. OSA extends the Levenshtein distance by also counting transpositions of two adjacent characters as a single operation, with the restriction that no substring is edited more than once.

Set ASCIIOnly to true for faster processing of pure-ASCII strings.

func NewOSA

func NewOSA() *OSAMetric

NewOSA returns a new OSAMetric instance.

func (*OSAMetric) Distance

func (m *OSAMetric) Distance(a, b string) int

Distance returns the OSA distance between a and b. It keeps only three rows of the DP table, so space is O(min(m, n)).

func (*OSAMetric) Similarity

func (m *OSAMetric) Similarity(a, b string) float64

Similarity returns the normalized similarity between a and b as a value in [0, 1]. Returns 1.0 when both strings are empty.

type SoundexEncoder

type SoundexEncoder struct {
	// MaxLength is the length of the generated Soundex code. Default is 4.
	MaxLength int
}

SoundexEncoder implements the American Soundex phonetic algorithm. By default it maps a name to a four-character code consisting of one letter followed by three digits, enabling approximate matching of names that sound alike. MaxLength changes the code length.
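
For readers unfamiliar with the encoding, here is a minimal standalone sketch of standard American Soundex with a fixed four-character code. The function name is hypothetical and the code is not taken from the package; it follows the usual rules (keep the first letter, code consonants by group, collapse adjacent equal codes, treat H and W as transparent, pad with zeros).

```go
package main

import (
	"fmt"
	"strings"
)

// Consonant groups of American Soundex; vowels, H, W and Y have no code.
var soundexCodes = map[rune]byte{
	'B': '1', 'F': '1', 'P': '1', 'V': '1',
	'C': '2', 'G': '2', 'J': '2', 'K': '2', 'Q': '2', 'S': '2', 'X': '2', 'Z': '2',
	'D': '3', 'T': '3',
	'L': '4',
	'M': '5', 'N': '5',
	'R': '6',
}

// soundex returns a four-character code, or "" if s has no ASCII letters.
func soundex(s string) string {
	out := make([]byte, 0, 4)
	var prev byte // code of the previous significant letter (0 = none)
	for _, r := range strings.ToUpper(s) {
		if r < 'A' || r > 'Z' {
			continue // skip anything that is not an ASCII letter
		}
		code := soundexCodes[r]
		if len(out) == 0 {
			out = append(out, byte(r)) // the first letter is kept as-is
			prev = code
			continue
		}
		if r == 'H' || r == 'W' {
			continue // H and W do not separate equal codes
		}
		if code != 0 && code != prev {
			out = append(out, code)
			if len(out) == 4 {
				break
			}
		}
		prev = code // a vowel resets prev so a repeated code counts again
	}
	if len(out) == 0 {
		return ""
	}
	for len(out) < 4 {
		out = append(out, '0')
	}
	return string(out)
}

func main() {
	fmt.Println(soundex("Robert"), soundex("Rupert")) // R163 R163
	fmt.Println(soundex("Ashcraft"))                  // A261
	fmt.Println(soundex("Tymczak"))                   // T522
}
```

"Robert" and "Rupert" collapse to the same code because B and P share group 1, which is the behavior SoundexMatch relies on.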

func NewSoundex

func NewSoundex() *SoundexEncoder

NewSoundex returns a new SoundexEncoder with the default code length of 4.

func (*SoundexEncoder) Encode

func (e *SoundexEncoder) Encode(s string) string

Encode returns the Soundex code for the given string. It returns an empty string if the input contains no ASCII letters.

func (*SoundexEncoder) Match

func (e *SoundexEncoder) Match(a, b string) bool

Match reports whether two strings produce the same Soundex code.

Directories

Path: examples/basic (command)
Synopsis: Package main demonstrates basic usage of the strsim library.