Documentation
Overview
Package fuzzy contains interfaces and code that facilitate fuzzy matching of name-strings to scientific names collected in the gnames database.
Constants
This section is empty.
Variables
This section is empty.
Functions
func EditDistance
EditDistance calculates the edit distance (ed) according to the Levenshtein algorithm. It also runs additional checks and returns -1 if any of them fails.
Checks:
- the result should not exceed maxEditDistance;
- the number of characters divided by ed should be bigger than charsPerED.
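For reference, here is the standard Levenshtein recurrence for strings $a$ and $b$, together with the two checks written as inequalities. $[a_i \neq b_j]$ is 1 when the characters differ and 0 otherwise, and $n$ stands for the number of characters being checked; the ratio check is only meaningful when $\mathit{ed} > 0$.

$$
D(i,0) = i, \qquad D(0,j) = j,
$$
$$
D(i,j) = \min\bigl(D(i-1,j) + 1,\; D(i,j-1) + 1,\; D(i-1,j-1) + [a_i \neq b_j]\bigr),
$$
$$
\mathit{ed} = D(|a|,|b|), \qquad \mathit{ed} \le \mathit{maxEditDistance}, \qquad \frac{n}{\mathit{ed}} > \mathit{charsPerED}.
$$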
It applies the checks only to the second string, so argument order matters:
    EditDistance("Pomatomus", "Pom atomus") // returns -1
    EditDistance("Pom atomus", "Pomatomus") // returns 1
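A minimal sketch of these checks, under stated assumptions. The thresholds (maxEditDistance = 2, charsPerED = 4) are hypothetical because the real unexported values are not documented here, and checkedEditDistance is a hypothetical name, not the package's EditDistance. The ratio check is applied to each word of the second string; that is one reading of the rules which reproduces both examples above, and the actual implementation may differ.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// Hypothetical thresholds: the real maxEditDistance and charsPerED are
// unexported and their values are not given in this documentation.
const (
	maxEditDistance = 2
	charsPerED      = 4
)

// levenshtein computes the classic Levenshtein distance over runes,
// keeping only two rows of the dynamic-programming table.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev, curr := make([]int, len(rb)+1), make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = prev[j-1] + cost // substitution or match
			if prev[j]+1 < curr[j] {   // deletion
				curr[j] = prev[j] + 1
			}
			if curr[j-1]+1 < curr[j] { // insertion
				curr[j] = curr[j-1] + 1
			}
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// checkedEditDistance is a hypothetical stand-in for EditDistance. It
// returns the distance, or -1 if a check fails. The checks look only at
// the second string; applying the ratio check per word is an assumption.
func checkedEditDistance(s1, s2 string) int {
	ed := levenshtein(s1, s2)
	if ed == 0 {
		return 0 // identical strings: nothing to check
	}
	if ed > maxEditDistance {
		return -1
	}
	for _, w := range strings.Fields(s2) {
		// Require chars/ed > charsPerED, i.e. chars > charsPerED*ed,
		// which avoids floating-point division.
		if utf8.RuneCountInString(w) <= charsPerED*ed {
			return -1
		}
	}
	return ed
}

func main() {
	fmt.Println(checkedEditDistance("Pomatomus", "Pom atomus")) // -1
	fmt.Println(checkedEditDistance("Pom atomus", "Pomatomus")) // 1
}
```

With these assumed values, the word "Pom" has only 3 characters for an edit distance of 1, so the first call fails the ratio check, while the single 9-character word "Pomatomus" passes it.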
EditDistance also assumes that the number of spaces between words has already been normalized to 1 space, and that s1 and s2 always have the same number of words.
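Callers can establish these preconditions before calling EditDistance. A minimal sketch, assuming standard-library whitespace handling is acceptable; normalizeSpaces and sameWordCount are hypothetical helpers, not part of package fuzzy.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeSpaces is a hypothetical helper: strings.Fields splits on any
// run of Unicode whitespace, so joining with " " leaves exactly one space
// between words and none at the ends.
func normalizeSpaces(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// sameWordCount is a hypothetical helper that checks the second
// precondition: both strings contain the same number of words.
func sameWordCount(s1, s2 string) bool {
	return len(strings.Fields(s1)) == len(strings.Fields(s2))
}

func main() {
	a := normalizeSpaces("  Pomatomus   saltatrix ")
	b := normalizeSpaces("Pomatomus\tsaltator")
	fmt.Printf("%q %q %v\n", a, b, sameWordCount(a, b))
	// prints "Pomatomus saltatrix" "Pomatomus saltator" true
}
```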
Types
type FuzzyMatcher
type FuzzyMatcher interface {
	// Init initializes data for the matcher.
	Init()

	// MatchStem takes a stemmed scientific name and returns 0 or more
	// stems that match it within the maximum edit distance. The search
	// stops once the current edit distance exceeds that maximum.
	MatchStem(stem string) []string

	// MatchStemExact takes a stem and returns true if an exact match
	// of the stem is found.
	MatchStemExact(stem string) bool

	// StemToMatchItems takes a stem and returns the match items for the
	// canonical names that correspond to that stem.
	StemToMatchItems(stem string) []mlib.MatchItem
}
FuzzyMatcher describes methods needed for fuzzy matching.
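To illustrate the contract, here is a toy implementation. It is a sketch under assumptions: MatchItem is a hypothetical local stand-in for mlib.MatchItem (whose fields are not documented here), the stems are made-up examples, and sliceMatcher scans every stem with a bounded Levenshtein distance instead of using an index such as a trie, so it does not necessarily reflect how the package matches stems.

```go
package main

import (
	"fmt"
	"sort"
)

// MatchItem is a hypothetical stand-in for mlib.MatchItem.
type MatchItem struct{ ID, MatchStr string }

// FuzzyMatcher mirrors the documented interface with the local MatchItem.
type FuzzyMatcher interface {
	Init()
	MatchStem(stem string) []string
	MatchStemExact(stem string) bool
	StemToMatchItems(stem string) []MatchItem
}

// sliceMatcher is a hypothetical brute-force matcher (no trie or index).
type sliceMatcher struct {
	maxED int
	data  map[string][]MatchItem // stem -> items sharing that stem
}

// Init is a no-op here: the map given at construction is all the data.
func (m *sliceMatcher) Init() {}

// boundedLevenshtein returns the Levenshtein distance of a and b, or -1 once
// it must exceed limit: row minima never decrease, so the scan stops as soon
// as a whole DP row is above the limit.
func boundedLevenshtein(a, b string, limit int) int {
	ra, rb := []rune(a), []rune(b)
	prev, curr := make([]int, len(rb)+1), make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		rowMin := i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = prev[j-1] + cost // substitution or match
			if prev[j]+1 < curr[j] {   // deletion
				curr[j] = prev[j] + 1
			}
			if curr[j-1]+1 < curr[j] { // insertion
				curr[j] = curr[j-1] + 1
			}
			if curr[j] < rowMin {
				rowMin = curr[j]
			}
		}
		if rowMin > limit {
			return -1
		}
		prev, curr = curr, prev
	}
	if prev[len(rb)] > limit {
		return -1
	}
	return prev[len(rb)]
}

// MatchStem returns every known stem within maxED of stem, sorted.
func (m *sliceMatcher) MatchStem(stem string) []string {
	var res []string
	for s := range m.data {
		if boundedLevenshtein(stem, s, m.maxED) >= 0 {
			res = append(res, s)
		}
	}
	sort.Strings(res) // map iteration order is random
	return res
}

func (m *sliceMatcher) MatchStemExact(stem string) bool {
	_, ok := m.data[stem]
	return ok
}

func (m *sliceMatcher) StemToMatchItems(stem string) []MatchItem {
	return m.data[stem]
}

func main() {
	var fm FuzzyMatcher = &sliceMatcher{maxED: 1, data: map[string][]MatchItem{
		"pomatom saltatr": {{ID: "1", MatchStr: "Pomatomus saltatrix"}},
		"pomatom":         {{ID: "2", MatchStr: "Pomatomus"}},
	}}
	fm.Init()
	fmt.Println(fm.MatchStem("pomatom saltatrr")) // [pomatom saltatr]
	fmt.Println(fm.MatchStemExact("pomatom"))     // true
	fmt.Println(fm.StemToMatchItems("pomatom"))   // [{2 Pomatomus}]
}
```

The bounded distance abandons a candidate as soon as it cannot come back under the limit, which mirrors the early stop described for MatchStem; the linear scan keeps the sketch short but visits every stem.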