Documentation ¶
Overview ¶
Package fuzzy contains interfaces and code to facilitate fuzzy matching of name-strings to scientific names collected in the gnames database.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func EditDistance ¶
EditDistance calculates the edit distance (**ed**) according to the Levenshtein algorithm. It also runs additional checks and returns -1 if any of them fails.
Checks:
- the result should not exceed maxEditDistance
- the number of characters divided by ed should be bigger than charsPerED
The checks are applied only to the second string, so the result depends on the order of the arguments:
EditDistance("Pomatomus", "Pom atomus")
returns -1
EditDistance("Pom atomus", "Pomatomus")
returns 1
It also assumes that the number of spaces between words has already been normalized to 1 space, and that s1 and s2 always have the same number of words.
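The checks described above can be illustrated with a minimal sketch. It is not the package's implementation: the function name `editDistanceSketch`, passing the thresholds as parameters, the values 2 and 5 used in `main`, and applying the characters-per-edit check to every word of the second string are all assumptions made for illustration. Under those assumptions the sketch reproduces the two documented results.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// min3 returns the smallest of three integers.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the Levenshtein distance between a and b,
// counted in runes, using two rolling rows of the DP table.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	cur := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev
	}
	return prev[len(rb)]
}

// editDistanceSketch is a hypothetical helper, not the package's EditDistance.
// maxED and charsPerED stand in for maxEditDistance and charsPerED, whose
// real values are not given in this documentation.
func editDistanceSketch(s1, s2 string, maxED, charsPerED int) int {
	ed := levenshtein(s1, s2)
	if ed == 0 {
		return 0 // identical strings, nothing to check
	}
	if ed > maxED {
		return -1
	}
	// Assumption: the characters-per-edit check is applied to each word
	// of the second string only; chars/ed > charsPerED is written
	// without division as chars > charsPerED*ed.
	for _, w := range strings.Fields(s2) {
		if utf8.RuneCountInString(w) <= charsPerED*ed {
			return -1
		}
	}
	return ed
}

func main() {
	fmt.Println(editDistanceSketch("Pomatomus", "Pom atomus", 2, 5)) // -1
	fmt.Println(editDistanceSketch("Pom atomus", "Pomatomus", 2, 5)) // 1
}
```

In the first call the second string contains the short word "Pom", which fails the characters-per-edit check; in the second call the only word checked is "Pomatomus", which passes.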
Types ¶
type FuzzyMatcher ¶
type FuzzyMatcher interface {
	// Initialize data for the matcher.
	Init()
	// MatchStem takes a stemmed scientific name. The search stops if the
	// current edit distance becomes bigger than the maximum edit
	// distance. The method returns 0 or more stems that matched the
	// input stem within the edit distance constraint.
	MatchStem(stem string) []string
	// MatchStemExact takes a stem and returns true if an exact
	// match of the stem is found.
	MatchStemExact(stem string) bool
	// StemToMatchItems takes a stem and returns the match items
	// (canonicals) that correspond to that stem.
	StemToMatchItems(stem string) []mlib.MatchItem
}
FuzzyMatcher describes methods needed for fuzzy matching.
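To make the contract of the interface concrete, here is a rough in-memory sketch of a type that satisfies it. Everything in it is an assumption for illustration: `MatchItem` is a local stand-in for `mlib.MatchItem` (whose fields are not shown here), `naiveMatcher` and its `maxED` field are hypothetical, the sample stems are made up, and a linear scan with plain Levenshtein distance is swapped in for whatever search structure the package actually uses.

```go
package main

import (
	"fmt"
	"sort"
)

// MatchItem is a hypothetical stand-in for mlib.MatchItem.
type MatchItem struct {
	Name string
}

// FuzzyMatcher mirrors the documented interface, using the stand-in type.
type FuzzyMatcher interface {
	Init()
	MatchStem(stem string) []string
	MatchStemExact(stem string) bool
	StemToMatchItems(stem string) []MatchItem
}

// naiveMatcher is an illustrative implementation, not the package's own.
type naiveMatcher struct {
	maxED int                    // assumed maximum edit distance
	items map[string][]MatchItem // stem -> items sharing that stem
	stems []string               // sorted stems, built by Init
}

// Compile-time check that naiveMatcher satisfies the interface.
var _ FuzzyMatcher = (*naiveMatcher)(nil)

// Init builds a sorted list of stems so results are deterministic.
func (m *naiveMatcher) Init() {
	m.stems = m.stems[:0]
	for s := range m.items {
		m.stems = append(m.stems, s)
	}
	sort.Strings(m.stems)
}

// MatchStem scans every known stem and keeps those within maxED.
func (m *naiveMatcher) MatchStem(stem string) []string {
	var res []string
	for _, s := range m.stems {
		if levenshtein(stem, s) <= m.maxED {
			res = append(res, s)
		}
	}
	return res
}

func (m *naiveMatcher) MatchStemExact(stem string) bool {
	_, ok := m.items[stem]
	return ok
}

func (m *naiveMatcher) StemToMatchItems(stem string) []MatchItem {
	return m.items[stem]
}

// levenshtein returns the rune-based Levenshtein distance between a and b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	cur := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			best := prev[j-1] // substitution, free if runes match
			if ra[i-1] != rb[j-1] {
				best++
			}
			if prev[j]+1 < best {
				best = prev[j] + 1 // deletion
			}
			if cur[j-1]+1 < best {
				best = cur[j-1] + 1 // insertion
			}
			cur[j] = best
		}
		prev, cur = cur, prev
	}
	return prev[len(rb)]
}

func main() {
	// Made-up sample data; these are not real stems from gnames.
	m := &naiveMatcher{maxED: 1, items: map[string][]MatchItem{
		"Pomatomus saltatr": {{Name: "Pomatomus saltatrix"}},
		"Pardosa moest":     {{Name: "Pardosa moesta"}},
	}}
	m.Init()
	fmt.Println(m.MatchStemExact("Pomatomus saltatr")) // true
	fmt.Println(m.MatchStem("Pomatomus saltetr"))      // [Pomatomus saltatr]
}
```

A linear scan costs one distance computation per stored stem, so it only suits small data sets; it is used here purely to show how the four methods relate to each other.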