Documentation ¶
Overview ¶
This package implements the Levenshtein algorithm for computing the similarity between two strings. The central function is MatrixForStrings, which computes the Levenshtein matrix. The functions DistanceForMatrix, EditScriptForMatrix and RatioForMatrix read various interesting properties off the matrix. The package also provides the convenience functions DistanceForStrings, EditScriptForStrings and RatioForStrings for going directly from two strings to the property of interest.
Index ¶
- Constants
- func DistanceForMatrix(matrix [][]int) int
- func DistanceForStrings(source []rune, target []rune, op Options) int
- func IdenticalRunes(a rune, b rune) bool
- func LogMatrix(source []rune, target []rune, matrix [][]int)
- func MatrixForStrings(source []rune, target []rune, op Options) [][]int
- func RatioForMatrix(matrix [][]int) float64
- func RatioForStrings(source []rune, target []rune, op Options) float64
- func WriteMatrix(source []rune, target []rune, matrix [][]int, writer io.Writer)
- type EditOperation
- type EditScript
- type MatchFunction
- type Options
Examples ¶
Constants ¶
const ( Ins = iota Del Sub Match )
Variables ¶
This section is empty.
Functions ¶
func DistanceForMatrix ¶
DistanceForMatrix reads the edit distance off the given Levenshtein matrix.
func DistanceForStrings ¶
DistanceForStrings returns the edit distance between source and target.
It has a runtime proportional to len(source) * len(target) and memory use proportional to len(target).
Example ¶
source := "a" target := "aa" distance := DistanceForStrings([]rune(source), []rune(target), DefaultOptions) fmt.Printf(`Distance between "%s" and "%s" computed as %d`, source, target, distance)
Output: Distance between "a" and "aa" computed as 1
func IdenticalRunes ¶
IdenticalRunes is the default MatchFunction: it checks whether two runes are identical.
func LogMatrix ¶
LogMatrix writes a visual representation of the given matrix for the given strings to os.Stderr. This function is deprecated, use WriteMatrix(source, target, matrix, os.Stderr) instead.
func MatrixForStrings ¶
MatrixForStrings generates a 2-D array representing the dynamic programming table used by the Levenshtein algorithm, as described e.g. here: http://www.let.rug.nl/kleiweg/lev/ The reason for putting the creation of the table into a separate function is that it cannot only be used for reading of the edit distance between two strings, but also e.g. to backtrace an edit script that provides an alignment between the characters of both strings.
func RatioForMatrix ¶
RatioForMatrix returns the Levenshtein ratio for the given matrix. The ratio is computed as follows:
(sourceLength + targetLength - distance) / (sourceLength + targetLength)
func RatioForStrings ¶
RatioForStrings returns the Levenshtein ratio for the given strings. The ratio is computed as follows:
(sourceLength + targetLength - distance) / (sourceLength + targetLength)
func WriteMatrix ¶
WriteMatrix writes a visual representation of the given matrix for the given strings to the given writer.
Example ¶
source := []rune("neighbor") target := []rune("Neighbour") matrix := MatrixForStrings(source, target, DefaultOptions) WriteMatrix(source, target, matrix, os.Stdout)
Output: N e i g h b o u r 0 1 2 3 4 5 6 7 8 9 n 1 2 3 4 5 6 7 8 9 10 e 2 3 2 3 4 5 6 7 8 9 i 3 4 3 2 3 4 5 6 7 8 g 4 5 4 3 2 3 4 5 6 7 h 5 6 5 4 3 2 3 4 5 6 b 6 7 6 5 4 3 2 3 4 5 o 7 8 7 6 5 4 3 2 3 4 r 8 9 8 7 6 5 4 3 4 3
Types ¶
type EditOperation ¶
type EditOperation int
func (EditOperation) String ¶
func (operation EditOperation) String() string
type EditScript ¶
type EditScript []EditOperation
func EditScriptForMatrix ¶
func EditScriptForMatrix(matrix [][]int, op Options) EditScript
EditScriptForMatrix returns an optimal edit script based on the given Levenshtein matrix.
func EditScriptForStrings ¶
func EditScriptForStrings(source []rune, target []rune, op Options) EditScript
EditScriptForStrings returns an optimal edit script to turn source into target.
type MatchFunction ¶
type Options ¶
type Options struct { InsCost int DelCost int SubCost int Matches MatchFunction }
var DefaultOptions Options = Options{ InsCost: 1, DelCost: 1, SubCost: 2, Matches: IdenticalRunes, }
DefaultOptions is the default options without substitution: insertion cost is 1, deletion cost is 1, substitution cost is 2 (meaning insert and delete will be used instead), and two runes match iff they are identical.
var DefaultOptionsWithSub Options = Options{ InsCost: 1, DelCost: 1, SubCost: 1, Matches: IdenticalRunes, }
DefaultOptionsWithSub is the default options with substitution: insertion cost is 1, deletion cost is 1, substitution cost is 1, and two runes match iff they are identical.