fuzzy

package
v0.7.2
Warning

This package is not in the latest version of its module.

Published: Feb 6, 2022
License: MIT
Imports: 3
Imported by: 0

Documentation

Overview

Package fuzzy contains interfaces and code that facilitate fuzzy matching of name-strings to scientific names collected in the gnames database.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func EditDistance

func EditDistance(s1, s2 string) int

EditDistance calculates the edit distance (ed) between s1 and s2 using the Levenshtein algorithm. It also runs additional checks and returns -1 if any of them fail.

Checks:
- the result must not exceed maxEditDistance;
- the number of characters divided by ed must be greater than charsPerED.
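As a point of reference, the Levenshtein distance itself can be computed with the classic dynamic-programming recurrence. The sketch below is a minimal, self-contained version that works on runes and keeps only two rows of the table. It is not the package's code: the helper name levenshtein is hypothetical, and the extra checks are omitted entirely.

```go
package main

import "fmt"

// levenshtein returns the minimum number of single-rune insertions,
// deletions and substitutions needed to turn a into b.
// It is a hypothetical helper, not part of package fuzzy.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	// prev is the previous DP row: distances from a[:i-1] to each prefix of b.
	// Initially it holds the distances from "" to each prefix of b.
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr := make([]int, len(rb)+1)
		curr[0] = i // distance from a[:i] to ""
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = prev[j-1] + cost // substitution or match
			if prev[j]+1 < curr[j] {
				curr[j] = prev[j] + 1 // deletion
			}
			if curr[j-1]+1 < curr[j] {
				curr[j] = curr[j-1] + 1 // insertion
			}
		}
		prev = curr
	}
	return prev[len(rb)]
}

func main() {
	fmt.Println(levenshtein("Pomatomus", "Pom atomus")) // 1
	fmt.Println(levenshtein("kitten", "sitting"))       // 3
}
```

Keeping two rows instead of the full table reduces memory to one row per call, while the running time stays proportional to len(a) * len(b).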

The checks are applied only to the second string, so the order of the arguments matters:

	EditDistance("Pomatomus", "Pom atomus") // returns -1
	EditDistance("Pom atomus", "Pomatomus") // returns 1

It also assumes that the number of spaces between words has already been normalized to 1 space, and that s1 and s2 always have the same number of words.
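One possible reading of the checks can be sketched as follows. The documentation names maxEditDistance and charsPerED but does not give their values, so the values used here (2 and 4) are hypothetical, as is the function name checkedEditDistance. Applying the chars-per-ed check to each word of the second string is also an assumption; under these assumptions the sketch happens to reproduce both documented examples, but it is not the package's implementation.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// Hypothetical values: the real maxEditDistance and charsPerED
// are named in the documentation, but their values are not shown.
const (
	maxEditDistance = 2
	charsPerED      = 4.0
)

// levenshtein: rune-based edit distance, two-row dynamic programming.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr := make([]int, len(rb)+1)
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = prev[j-1] + cost // substitution or match
			if prev[j]+1 < curr[j] {
				curr[j] = prev[j] + 1 // deletion
			}
			if curr[j-1]+1 < curr[j] {
				curr[j] = curr[j-1] + 1 // insertion
			}
		}
		prev = curr
	}
	return prev[len(rb)]
}

// checkedEditDistance is a hypothetical sketch of EditDistance's checks.
// It returns the Levenshtein distance, or -1 if a check fails. Only s2
// is checked, and the chars-per-ed check is applied to each word of s2
// (an assumed reading, not documented behavior).
func checkedEditDistance(s1, s2 string) int {
	ed := levenshtein(s1, s2)
	if ed == 0 {
		return 0 // identical strings: nothing to check
	}
	if ed > maxEditDistance {
		return -1 // too many edits overall
	}
	// Spaces are assumed to be normalized to single spaces already.
	for _, w := range strings.Split(s2, " ") {
		if float64(utf8.RuneCountInString(w))/float64(ed) <= charsPerED {
			return -1 // word too short to tolerate this many edits
		}
	}
	return ed
}

func main() {
	fmt.Println(checkedEditDistance("Pomatomus", "Pom atomus")) // -1
	fmt.Println(checkedEditDistance("Pom atomus", "Pomatomus")) // 1
}
```

Note the asymmetry: the raw Levenshtein distance is 1 in both directions, and in this sketch only the check on the short word "Pom" in the second argument turns the first call into -1.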

Types

type FuzzyMatcher

type FuzzyMatcher interface {
	// Initialize data for the matcher.
	Init()

	// MatchStem takes a stemmed scientific name and returns 0 or more
	// stems that match it within the maximum edit distance. The search
	// stops as soon as the current edit distance exceeds that maximum.
	MatchStem(stem string) []string

	// MatchStemExact takes a stem and returns true if an exact match
	// of the stem is found.
	MatchStemExact(stem string) bool

	// StemToMatchItems takes a stem and returns the match items
	// (canonicals) that correspond to that stem.
	StemToMatchItems(stem string) []mlib.MatchItem
}

FuzzyMatcher describes methods needed for fuzzy matching.
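To illustrate how the methods fit together, here is a toy implementation that keeps stems in a map and finds fuzzy matches by a brute-force linear scan with a Levenshtein distance. Everything in it is hypothetical: MatchItem stands in for mlib.MatchItem (whose fields are not shown on this page), linearMatcher and its maxED field are invented names, and the linear scan is a swapped-in technique, not the package's search.

```go
package main

import (
	"fmt"
	"sort"
)

// MatchItem is a hypothetical stand-in for mlib.MatchItem,
// whose real fields are not shown on this page.
type MatchItem struct {
	ID       string
	MatchStr string
}

// FuzzyMatcher mirrors the documented interface, using the local
// MatchItem in place of mlib.MatchItem.
type FuzzyMatcher interface {
	Init()
	MatchStem(stem string) []string
	MatchStemExact(stem string) bool
	StemToMatchItems(stem string) []MatchItem
}

// linearMatcher is a hypothetical toy matcher: it compares the input
// stem against every known stem instead of searching an index.
type linearMatcher struct {
	maxED int                    // hypothetical maximum edit distance
	data  map[string][]MatchItem // stem -> match items
	stems []string               // sorted list of stems, built by Init
}

// Init collects and sorts the stems so MatchStem output is deterministic.
func (m *linearMatcher) Init() {
	m.stems = m.stems[:0]
	for s := range m.data {
		m.stems = append(m.stems, s)
	}
	sort.Strings(m.stems)
}

// MatchStem returns every stored stem within maxED edits of the input.
func (m *linearMatcher) MatchStem(stem string) []string {
	var res []string
	for _, s := range m.stems {
		if levenshtein(stem, s) <= m.maxED {
			res = append(res, s)
		}
	}
	return res
}

// MatchStemExact reports whether the stem is stored verbatim.
func (m *linearMatcher) MatchStemExact(stem string) bool {
	_, ok := m.data[stem]
	return ok
}

// StemToMatchItems returns the items stored for the stem (nil if none).
func (m *linearMatcher) StemToMatchItems(stem string) []MatchItem {
	return m.data[stem]
}

// levenshtein: rune-based edit distance, two-row dynamic programming.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr := make([]int, len(rb)+1)
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = prev[j-1] + cost // substitution or match
			if prev[j]+1 < curr[j] {
				curr[j] = prev[j] + 1 // deletion
			}
			if curr[j-1]+1 < curr[j] {
				curr[j] = curr[j-1] + 1 // insertion
			}
		}
		prev = curr
	}
	return prev[len(rb)]
}

func main() {
	var fm FuzzyMatcher = &linearMatcher{
		maxED: 1,
		data: map[string][]MatchItem{
			"Pomatom": {{ID: "1", MatchStr: "Pomatomus"}},
			"Parus":   {{ID: "2", MatchStr: "Parus"}},
		},
	}
	fm.Init()
	fmt.Println(fm.MatchStem("Pomatum"))        // [Pomatom]
	fmt.Println(fm.MatchStemExact("Parus"))     // true
	fmt.Println(fm.StemToMatchItems("Pomatom")) // [{1 Pomatomus}]
}
```

The linear scan costs one edit-distance computation per stored stem. That is fine for a toy, but it is why a production matcher would prefer a structure that can stop exploring a branch once the edit distance exceeds the limit, as the MatchStem comment describes.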
