Documentation
Index
- func Distance(s1, s2 string, weights Weights, opts ...Option) int
- func Match(s1, s2 string, weights Weights, cutoff float64, similarityFunc SimilarityFunc, ...) bool
- func NormalizedDistance(s1, s2 string, weights Weights, opts ...Option) float64
- func NormalizedSimilarity(s1, s2 string, weights Weights, opts ...Option) float64
- func PartialSimilarity(s1, s2 string, weights Weights, opts ...Option) float64
- func RemovePunctuation(s1, s2 *string)
- func RemoveWhitespace(s1, s2 *string)
- func ToAlphanumeric(s1, s2 *string)
- func ToLowercase(s1, s2 *string)
- type BestMatchResult
  - func GetBestMatch(s1 string, candidates []string, weights Weights, cutoff float64, ...) *BestMatchResult
  - func GetBestMatches(s1 string, candidates []string, weights Weights, cutoff float64, ...) []BestMatchResult
  - func GetBestMatchesSorted(s1 string, candidates []string, weights Weights, cutoff float64, ...) []BestMatchResult
- type Option
- type SimilarityFunc
- type Weights
  - func DefaultWeights() Weights
  - func IndelWeights() Weights
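Examples
- Distance
- GetBestMatch
- GetBestMatches
- GetBestMatchesSorted
- Match
- NormalizedDistance
- NormalizedSimilarity
- PartialSimilarity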
Constants
This section is empty.
Variables
This section is empty.
Functions
func Distance
func Distance(s1, s2 string, weights Weights, opts ...Option) int
Distance calculates the weighted Levenshtein (edit) distance between two strings s1 and s2, using the provided weights as the costs of substitution, insertion, and deletion. Optional Option functions, such as ToLowercase, RemoveWhitespace, or RemovePunctuation, preprocess the input strings before the distance is calculated.
Example
weights := DefaultWeights()
s1 := "kitten"
s2 := "  SITTING  "
fmt.Printf("Normal Distance: %d\n", Distance(s1, s2, weights))
fmt.Printf("Distance With Options: %d\n", Distance(s1, s2, weights, ToLowercase, RemoveWhitespace))
weightsCustom := Weights{
	Substitution: 35,
	Insertion:    1,
	Deletion:     35,
}
s2Clean := "sitting"
fmt.Printf("Distance With Custom Weights: %d\n", Distance(s1, s2Clean, weightsCustom))
Output:
Normal Distance: 11
Distance With Options: 3
Distance With Custom Weights: 71
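A weighted edit distance like the one Distance describes can be computed with the classic Wagner-Fischer dynamic program. The sketch below is an illustration, not this package's implementation: it redeclares a local Weights struct mirroring the documented fields, and weightedDistance is a hypothetical helper. Comparing rune by rune, and charging Insertion for characters taken from s2 and Deletion for characters dropped from s1, are assumptions, chosen because they reproduce the documented outputs above.

```go
package main

import "fmt"

// Weights mirrors the documented struct; it is redeclared here so that the
// sketch compiles on its own.
type Weights struct {
	Substitution int
	Insertion    int
	Deletion     int
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// weightedDistance computes a weighted Levenshtein distance with the
// Wagner-Fischer dynamic program, keeping only two rows of the table.
// Assumptions: strings are compared rune by rune, Insertion is charged for
// each character taken from s2, and Deletion for each character dropped from s1.
func weightedDistance(s1, s2 string, w Weights) int {
	a, b := []rune(s1), []rune(s2)
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	// Building b[:j] from an empty prefix of s1 takes j insertions.
	for j := 1; j <= len(b); j++ {
		prev[j] = prev[j-1] + w.Insertion
	}
	for i := 1; i <= len(a); i++ {
		// Reducing a[:i] to the empty string takes i deletions.
		curr[0] = prev[0] + w.Deletion
		for j := 1; j <= len(b); j++ {
			// Cost of aligning a[i-1] with b[j-1]: free if equal, else a substitution.
			sub := prev[j-1]
			if a[i-1] != b[j-1] {
				sub += w.Substitution
			}
			curr[j] = min3(sub, prev[j]+w.Deletion, curr[j-1]+w.Insertion)
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

func main() {
	unit := Weights{Substitution: 1, Insertion: 1, Deletion: 1}
	custom := Weights{Substitution: 35, Insertion: 1, Deletion: 35}
	fmt.Println(weightedDistance("kitten", "sitting", unit))   // 3
	fmt.Println(weightedDistance("kitten", "sitting", custom)) // 71
}
```

With the custom weights, deleting a character and inserting another costs 36, so the two substitutions (35 each) plus one insertion (1) remain the cheapest path, which is where 71 comes from.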
func Match added in v0.6.0
func Match(s1, s2 string, weights Weights, cutoff float64, similarityFunc SimilarityFunc, opts ...Option) bool
Match reports whether the similarity between s1 and s2, as computed by similarityFunc (for example NormalizedSimilarity or PartialSimilarity), is greater than or equal to cutoff.
Example
weights := IndelWeights()
fmt.Println(Match("a test", "this is a test", weights, 0.5, NormalizedSimilarity))
fmt.Println(Match("a test", "this is a test", weights, 0.9, NormalizedSimilarity))
Output:
true
false
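func NormalizedDistance
func NormalizedDistance(s1, s2 string, weights Weights, opts ...Option) float64
NormalizedDistance calculates the normalized distance between two strings s1 and s2 using the provided weights for substitution, insertion, and deletion. The normalized distance is the distance divided by the sum of the lengths of the two strings. It is 0 when the strings are identical and grows as they become more different; with DefaultWeights or IndelWeights it never exceeds 1. With DefaultWeights, two strings of equal length n are at most n edits apart, so their normalized distance is at most 0.5; with IndelWeights, it reaches 1 when the strings share no characters.
Example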
weights := DefaultWeights()
s1 := "kitten"
s2 := "sitting"
fmt.Printf("Normalized Distance: %.8f\n", NormalizedDistance(s1, s2, weights))
Output: Normalized Distance: 0.23076923
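Written out, the definition gives the documented value for "kitten" and "sitting": with DefaultWeights the distance is 3 (substitute k with s and e with i, then insert g), and the lengths sum to 13.

```latex
d_{\text{norm}}(s_1, s_2) = \frac{d(s_1, s_2)}{|s_1| + |s_2|},
\qquad
d_{\text{norm}}(\text{kitten}, \text{sitting}) = \frac{3}{6 + 7} = \frac{3}{13} \approx 0.23076923
```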
func NormalizedSimilarity
func NormalizedSimilarity(s1, s2 string, weights Weights, opts ...Option) float64
NormalizedSimilarity calculates the normalized similarity between two strings s1 and s2 using the provided weights for substitution, insertion, and deletion. The normalized similarity is 1 minus the normalized distance, so it is 1 when the strings are identical and decreases as they become more different.
Example
weights := DefaultWeights()
s1 := "kitten"
s2 := "sitting"
fmt.Printf("Normalized Similarity: %.8f\n", NormalizedSimilarity(s1, s2, weights))
Output: Normalized Similarity: 0.76923077
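For the same pair of strings, the similarity is the complement of the normalized distance computed from a distance of 3 over a total length of 13:

```latex
\text{sim}_{\text{norm}}(s_1, s_2) = 1 - d_{\text{norm}}(s_1, s_2),
\qquad
\text{sim}_{\text{norm}}(\text{kitten}, \text{sitting}) = 1 - \frac{3}{13} = \frac{10}{13} \approx 0.76923077
```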
func PartialSimilarity added in v0.4.0
func PartialSimilarity(s1, s2 string, weights Weights, opts ...Option) float64
PartialSimilarity calculates the partial similarity between two strings s1 and s2 using the provided weights for substitution, insertion, and deletion. It measures how closely the shorter string matches the best-matching substring of the longer string, then applies a penalty based on the difference in their lengths. Depending on the use case, this can give more useful results than NormalizedSimilarity when comparing strings of very different lengths.
Example
weights := IndelWeights()
s1 := "a test"
s2 := "this is a test"
fmt.Printf("Partial Similarity: %.8f\n", PartialSimilarity(s1, s2, weights))
Output: Partial Similarity: 0.85000000
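The first step of partial matching, finding the substring of the longer string that best matches the shorter one, can be sketched as a sliding window. This is an illustration under stated assumptions: it only considers windows exactly as long as the shorter string, scores them with an Indel-based normalized similarity (what IndelWeights would give), and omits the length penalty, whose formula is not documented here. It therefore returns 1 for the example above rather than the 0.85 that PartialSimilarity reports after the penalty. bestWindowSimilarity and its helpers are hypothetical.

```go
package main

import "fmt"

// lcsLen returns the length of the longest common subsequence of a and b.
func lcsLen(a, b []rune) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			switch {
			case a[i-1] == b[j-1]:
				curr[j] = prev[j-1] + 1
			case prev[j] >= curr[j-1]:
				curr[j] = prev[j]
			default:
				curr[j] = curr[j-1]
			}
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// indelSimilarity is 1 minus the Indel distance divided by the total length.
func indelSimilarity(a, b []rune) float64 {
	total := len(a) + len(b)
	if total == 0 {
		return 1 // assumption: two empty strings count as identical
	}
	return 1 - float64(total-2*lcsLen(a, b))/float64(total)
}

// bestWindowSimilarity slides a window the length of the shorter string
// across the longer one and keeps the best score. No length penalty is applied.
func bestWindowSimilarity(s1, s2 string) float64 {
	short, long := []rune(s1), []rune(s2)
	if len(short) > len(long) {
		short, long = long, short
	}
	best := 0.0
	for start := 0; start+len(short) <= len(long); start++ {
		window := long[start : start+len(short)]
		if s := indelSimilarity(short, window); s > best {
			best = s
		}
	}
	return best
}

func main() {
	fmt.Println(bestWindowSimilarity("a test", "this is a test")) // 1
}
```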
func RemovePunctuation
func RemovePunctuation(s1, s2 *string)
RemovePunctuation removes common punctuation characters from both strings.
func RemoveWhitespace
func RemoveWhitespace(s1, s2 *string)
RemoveWhitespace removes all whitespace characters from both strings.
func ToAlphanumeric added in v0.2.0
func ToAlphanumeric(s1, s2 *string)
ToAlphanumeric removes all non-alphanumeric characters from both strings.
func ToLowercase
func ToLowercase(s1, s2 *string)
ToLowercase converts both strings to lowercase.
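Each of these preprocessing functions rewrites both strings through the same per-character rule. The sketch below shows one plausible way to write such filters with strings.Map, which drops a character when the mapping function returns a negative value. The helper names are hypothetical, and the character classes (unicode.IsSpace for whitespace, unicode.IsLetter and unicode.IsDigit for alphanumerics) are assumptions; the package's exact character sets, in particular its set of "common punctuation characters", may differ.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// option has the same shape as the documented Option type.
type option func(s1, s2 *string)

// filterBoth returns an option that keeps only the runes for which keep returns true.
func filterBoth(keep func(rune) bool) option {
	return func(s1, s2 *string) {
		mapping := func(r rune) rune {
			if keep(r) {
				return r
			}
			return -1 // a negative value tells strings.Map to drop the rune
		}
		*s1 = strings.Map(mapping, *s1)
		*s2 = strings.Map(mapping, *s2)
	}
}

var (
	removeWhitespace = filterBoth(func(r rune) bool { return !unicode.IsSpace(r) })
	toAlphanumeric   = filterBoth(func(r rune) bool { return unicode.IsLetter(r) || unicode.IsDigit(r) })
)

// toLowercase lowercases both strings in place.
func toLowercase(s1, s2 *string) {
	*s1 = strings.ToLower(*s1)
	*s2 = strings.ToLower(*s2)
}

func main() {
	a, b := "Hello, World!", "  hello world  "
	toAlphanumeric(&a, &b)
	toLowercase(&a, &b)
	fmt.Printf("%q %q\n", a, b) // "helloworld" "helloworld"

	c, d := " SITTING ", "sit ting"
	removeWhitespace(&c, &d)
	fmt.Printf("%q %q\n", c, d) // "SITTING" "sitting"
}
```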
Types
type BestMatchResult added in v0.6.0
type BestMatchResult struct {
	Candidate  string  `json:"candidate"`
	Similarity float64 `json:"similarity"`
}
BestMatchResult represents a candidate string and its similarity score to the input string.
func GetBestMatch added in v0.6.0
func GetBestMatch(s1 string, candidates []string, weights Weights, cutoff float64, similarityFunc SimilarityFunc, opts ...Option) *BestMatchResult
GetBestMatch returns the candidate with the highest similarity to s1, as computed by similarityFunc, provided that similarity is greater than or equal to cutoff. If no candidate meets the cutoff, it returns nil.
Example
weights := IndelWeights()
candidates := []string{"kitten", "sitting", "bitten", "written"}
result := GetBestMatch("kittens", candidates, weights, 0.5, NormalizedSimilarity)
if result != nil {
	fmt.Printf("Best Match: %s (Similarity: %.8f)\n", result.Candidate, result.Similarity)
}
noMatch := GetBestMatch("xyz", candidates, weights, 0.99, NormalizedSimilarity)
fmt.Printf("No Match: %v\n", noMatch)
Output:
Best Match: kitten (Similarity: 0.92307692)
No Match: <nil>
func GetBestMatches added in v0.6.0
func GetBestMatches(s1 string, candidates []string, weights Weights, cutoff float64, similarityFunc SimilarityFunc, opts ...Option) []BestMatchResult
GetBestMatches returns a BestMatchResult for every candidate whose similarity to s1, as computed by similarityFunc, is greater than or equal to cutoff. The results follow the order of candidates; use GetBestMatchesSorted to order them by similarity.
Example
weights := IndelWeights()
candidates := []string{"kitten", "sitting", "bitten", "written"}
results := GetBestMatches("kittens", candidates, weights, 0.5, NormalizedSimilarity)
fmt.Printf("Matches: %d\n", len(results))
for _, r := range results {
	fmt.Printf(" %s: %.8f\n", r.Candidate, r.Similarity)
}
Output:
Matches: 4
 kitten: 0.92307692
 sitting: 0.57142857
 bitten: 0.76923077
 written: 0.71428571
func GetBestMatchesSorted added in v0.6.0
func GetBestMatchesSorted(s1 string, candidates []string, weights Weights, cutoff float64, similarityFunc SimilarityFunc, opts ...Option) []BestMatchResult
GetBestMatchesSorted returns a BestMatchResult for every candidate whose similarity to s1, as computed by similarityFunc, is greater than or equal to cutoff, sorted in descending order of similarity. Duplicate candidates produce separate results.
Example
weights := IndelWeights()
candidates := []string{"kitten", "sitting", "sitting", "bitten", "written"}
results := GetBestMatchesSorted("kittens", candidates, weights, 0.5, NormalizedSimilarity)
fmt.Printf("Sorted Matches: %d\n", len(results))
for _, r := range results {
	fmt.Printf(" %s: %.8f\n", r.Candidate, r.Similarity)
}
Output:
Sorted Matches: 5
 kitten: 0.92307692
 bitten: 0.76923077
 written: 0.71428571
 sitting: 0.57142857
 sitting: 0.57142857
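type Option
type Option func(s1, s2 *string)
Option is a function type that modifies two strings in place. Options such as ToLowercase, RemoveWhitespace, and RemovePunctuation are passed as the trailing opts arguments to preprocess s1 and s2 before they are compared.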
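Because Option is an ordinary function type, callers can define their own preprocessing steps alongside the built-in ones. The sketch below redeclares Option locally and adds a hypothetical trimSpace option; applyOptions is also hypothetical and assumes options run in the order given, which this page does not state explicitly.

```go
package main

import (
	"fmt"
	"strings"
)

// Option mirrors the documented type.
type Option func(s1, s2 *string)

// trimSpace is a hypothetical custom Option that trims leading and trailing whitespace.
func trimSpace(s1, s2 *string) {
	*s1 = strings.TrimSpace(*s1)
	*s2 = strings.TrimSpace(*s2)
}

// toLowercase stands in for the package's ToLowercase.
func toLowercase(s1, s2 *string) {
	*s1 = strings.ToLower(*s1)
	*s2 = strings.ToLower(*s2)
}

// applyOptions runs each option in turn on copies of the inputs.
func applyOptions(s1, s2 string, opts ...Option) (string, string) {
	for _, opt := range opts {
		opt(&s1, &s2)
	}
	return s1, s2
}

func main() {
	a, b := applyOptions("kitten", " SITTING ", trimSpace, toLowercase)
	fmt.Printf("%q %q\n", a, b) // "kitten" "sitting"
}
```

Since the strings are passed by value into applyOptions, the caller's originals stay untouched; only the local copies are rewritten through the pointers.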
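type SimilarityFunc added in v0.6.0
type SimilarityFunc func(s1, s2 string, weights Weights, opts ...Option) float64
SimilarityFunc is the signature shared by NormalizedSimilarity and PartialSimilarity; Match, GetBestMatch, GetBestMatches, and GetBestMatchesSorted use it to score a pair of strings.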
type Weights
type Weights struct {
	// Substitution is the cost of substituting one character for another.
	// By default it should be set to 1.
	Substitution int `json:"substitution"`
	// Insertion is the cost of inserting a character into a string.
	// By default it should be set to 1.
	Insertion int `json:"insertion"`
	// Deletion is the cost of deleting a character from a string.
	// By default it should be set to 1.
	Deletion int `json:"deletion"`
}
Weights defines the weights for substitution, insertion, and deletion operations.
func DefaultWeights added in v0.6.0
func DefaultWeights() Weights
DefaultWeights returns the default distance weights for substitution, insertion, and deletion. Substitution: 1, Insertion: 1, Deletion: 1
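func IndelWeights added in v0.6.0
func IndelWeights() Weights
IndelWeights returns distance weights in which a substitution is more expensive than an insertion or a deletion and costs exactly as much as one of each: Substitution: 2, Insertion: 1, Deletion: 1. Because a substitution is never cheaper than the equivalent deletion and insertion, the resulting distance effectively counts only insertions and deletions (the Indel distance).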
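Because every substitution (cost 2) can be replaced by a deletion and an insertion of the same total cost, the distance under IndelWeights depends only on the longest common subsequence (LCS) of the two strings. For "kitten" and "sitting", the LCS is "ittn", of length 4, matching the direct count of two substitutions (2 each) plus one insertion (1):

```latex
d_{\text{indel}}(s_1, s_2) = |s_1| + |s_2| - 2\,\mathrm{LCS}(s_1, s_2),
\qquad
d_{\text{indel}}(\text{kitten}, \text{sitting}) = 6 + 7 - 2 \cdot 4 = 5
```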