Documentation
¶
Overview ¶
Package fuzzysearch provides fuzzy string matching with Levenshtein distance ranking, useful for filtering data quickly based on lightweight user input.
It is NOT safe for concurrent use; callers must synchronize access.
Index ¶
- func Analyze(text string) []string
- func Find(source string, targets []string) []string
- func FindFold(source string, targets []string) []string
- func FindNormalized(source string, targets []string) []string
- func FindNormalizedFold(source string, targets []string) []string
- func Intersection(a []int, b []int) []int
- func LevenshteinDistance(s, t string) int
- func LowercaseFilter(tokens []string) []string
- func Match(source, target string) bool
- func MatchFold(source, target string) bool
- func MatchNormalized(source, target string) bool
- func MatchNormalizedFold(source, target string) bool
- func NoldTransformer() transform.Transformer
- func NoopTransformer() transform.Transformer
- func NormalizeTransformer() transform.Transformer
- func NormalizedFoldTransformer() transform.Transformer
- func RankMatch(source, target string) int
- func RankMatchFold(source, target string) int
- func RankMatchNormalized(source, target string) int
- func RankMatchNormalizedFold(source, target string) int
- func StopwordFilter(tokens []string, stopwords set.GenericDataSet[string]) []string
- func StringTransform(s string, t transform.Transformer) (transformed string)
- func Tokenize(text string) []string
- type Document
- type Index
- type Rank
- type Ranks
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Find ¶
Find will return a list of strings in targets that fuzzy matches source.
Example ¶
fmt.Print(Find("whl", []string{"cartwheel", "foobar", "wheel", "baz"}))
Output: [cartwheel wheel]
func FindFold ¶
FindFold is a case-insensitive version of Find.
Example ¶
package main
import (
"fmt"
"github.com/FrogoAI/memory/fuzzysearch"
)
func main() {
results := fuzzysearch.FindFold("WHL", []string{"cartwheel", "foobar", "Wheel"})
fmt.Println(results)
}
Output: [cartwheel Wheel]
func FindNormalized ¶
FindNormalized is a unicode-normalized version of Find.
func FindNormalizedFold ¶
FindNormalizedFold is a unicode-normalized and case-insensitive version of Find.
func LevenshteinDistance ¶
LevenshteinDistance measures the difference between two strings. The Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.
This implemention is optimized to use O(minInt(m,n)) space and is based on the optimized C version found here: http://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Levenshtein_distance#C
Example ¶
package main
import (
"fmt"
"github.com/FrogoAI/memory/fuzzysearch"
)
func main() {
fmt.Println(fuzzysearch.LevenshteinDistance("kitten", "sitting"))
}
Output: 3
func LowercaseFilter ¶
LowercaseFilter lowercase all tokens
func Match ¶
Match returns true if source matches target using a fuzzy-searching algorithm. Note that it doesn't implement Levenshtein distance (see RankMatch instead), but rather a simplified version where there's no approximation. The method will return true only if each character in the source can be found in the target and occurs after the preceding matches.
Example ¶
fmt.Print(Match("twl", "cartwheel"))
Output: true
func MatchFold ¶
MatchFold is a case-insensitive version of Match.
Example ¶
package main
import (
"fmt"
"github.com/FrogoAI/memory/fuzzysearch"
)
func main() {
fmt.Println(fuzzysearch.MatchFold("WHL", "cartwheel"))
}
Output: true
func MatchNormalized ¶
MatchNormalized is a unicode-normalized version of Match.
func MatchNormalizedFold ¶
MatchNormalizedFold is a unicode-normalized and case-insensitive version of Match.
Example ¶
package main
import (
"fmt"
"github.com/FrogoAI/memory/fuzzysearch"
)
func main() {
fmt.Println(fuzzysearch.MatchNormalizedFold("cafe", "Café"))
}
Output: true
func NoldTransformer ¶
func NoldTransformer() transform.Transformer
NoldTransformer returns a transformer that folds Unicode characters to lowercase.
func NoopTransformer ¶
func NoopTransformer() transform.Transformer
NoopTransformer returns a no-op transformer that passes strings through unchanged.
func NormalizeTransformer ¶
func NormalizeTransformer() transform.Transformer
NormalizeTransformer returns a transformer that applies Unicode NFD normalization and removes combining marks.
func NormalizedFoldTransformer ¶
func NormalizedFoldTransformer() transform.Transformer
NormalizedFoldTransformer returns a transformer that normalizes Unicode and folds to lowercase.
func RankMatch ¶
RankMatch is similar to Match except it will measure the Levenshtein distance between the source and the target and return its result. If there was no match, it will return -1. Given the requirements of match, RankMatch only needs to perform a subset of the Levenshtein calculation, only deletions need be considered, required additions and substitutions would fail the match test.
Example ¶
fmt.Print(RankMatch("twl", "cartwheel"))
Output: 6
func RankMatchFold ¶
RankMatchFold is a case-insensitive version of RankMatch.
func RankMatchNormalized ¶
RankMatchNormalized is a unicode-normalized version of RankMatch.
Example ¶
package main
import (
"fmt"
"github.com/FrogoAI/memory/fuzzysearch"
)
func main() {
distance := fuzzysearch.RankMatchNormalized("cafe", "café")
fmt.Println(distance)
}
Output: 0
func RankMatchNormalizedFold ¶
RankMatchNormalizedFold is a unicode-normalized and case-insensitive version of RankMatch.
func StopwordFilter ¶
func StopwordFilter(tokens []string, stopwords set.GenericDataSet[string]) []string
StopwordFilter filer of stopworld
func StringTransform ¶
func StringTransform(s string, t transform.Transformer) (transformed string)
StringTransform applies the given text transformer to s, returning the original on error.
Types ¶
type Rank ¶
type Rank struct {
// Source is used as the source for matching.
Source string
// Target is the word matched against.
Target string
// Distance is the Levenshtein distance between Source and Target.
Distance int
// Location of Target in original list
OriginalIndex int
}
Rank holds the result of a fuzzy match, including the Levenshtein distance and original index.
type Ranks ¶
type Ranks []Rank
Ranks is a sortable slice of Rank results ordered by Distance.
func RankFind ¶
RankFind is similar to Find, except it will also rank all matches using Levenshtein distance.
Example ¶
fmt.Printf("%+v", RankFind("whl", []string{"cartwheel", "foobar", "wheel", "baz"}))
Output: [{Source:whl Target:cartwheel Distance:6 OriginalIndex:0} {Source:whl Target:wheel Distance:2 OriginalIndex:2}]
func RankFindFold ¶
RankFindFold is a case-insensitive version of RankFind.
Example ¶
package main
import (
"fmt"
"github.com/FrogoAI/memory/fuzzysearch"
)
func main() {
ranks := fuzzysearch.RankFindFold("WHL", []string{"cartwheel", "foobar", "Wheel"})
for _, r := range ranks {
fmt.Printf("%s: %d\n", r.Target, r.Distance)
}
}
Output: cartwheel: 9 Wheel: 4
func RankFindNormalized ¶
RankFindNormalized is a unicode-normalized version of RankFind.
func RankFindNormalizedFold ¶
RankFindNormalizedFold is a unicode-normalized and case-insensitive version of RankFind.