levenshtein

package module
v0.0.0-...-cae8b0e Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 5, 2020 License: MIT Imports: 3 Imported by: 114

Documentation

Overview

This package implements the Levenshtein algorithm for computing the similarity between two strings. The central function is MatrixForStrings, which computes the Levenshtein matrix. The functions DistanceForMatrix, EditScriptForMatrix and RatioForMatrix read various interesting properties off the matrix. The package also provides the convenience functions DistanceForStrings, EditScriptForStrings and RatioForStrings for going directly from two strings to the property of interest.

Index

Examples

Constants

View Source
const (
	Ins = iota
	Del
	Sub
	Match
)

Variables

This section is empty.

Functions

func DistanceForMatrix

func DistanceForMatrix(matrix [][]int) int

DistanceForMatrix reads the edit distance off the given Levenshtein matrix.

func DistanceForStrings

func DistanceForStrings(source []rune, target []rune, op Options) int

DistanceForStrings returns the edit distance between source and target.

It has a runtime proportional to len(source) * len(target) and memory use proportional to len(target).

Example
source := "a"
target := "aa"
distance := DistanceForStrings([]rune(source), []rune(target), DefaultOptions)
fmt.Printf(`Distance between "%s" and "%s" computed as %d`, source, target, distance)
Output:

Distance between "a" and "aa" computed as 1

func IdenticalRunes

func IdenticalRunes(a rune, b rune) bool

IdenticalRunes is the default MatchFunction: it checks whether two runes are identical.

func LogMatrix

func LogMatrix(source []rune, target []rune, matrix [][]int)

LogMatrix writes a visual representation of the given matrix for the given strings to os.Stderr. This function is deprecated, use WriteMatrix(source, target, matrix, os.Stderr) instead.

func MatrixForStrings

func MatrixForStrings(source []rune, target []rune, op Options) [][]int

MatrixForStrings generates a 2-D array representing the dynamic programming table used by the Levenshtein algorithm, as described e.g. here: http://www.let.rug.nl/kleiweg/lev/ The reason for putting the creation of the table into a separate function is that it cannot only be used for reading of the edit distance between two strings, but also e.g. to backtrace an edit script that provides an alignment between the characters of both strings.

func RatioForMatrix

func RatioForMatrix(matrix [][]int) float64

RatioForMatrix returns the Levenshtein ratio for the given matrix. The ratio is computed as follows:

(sourceLength + targetLength - distance) / (sourceLength + targetLength)

func RatioForStrings

func RatioForStrings(source []rune, target []rune, op Options) float64

RatioForStrings returns the Levenshtein ratio for the given strings. The ratio is computed as follows:

(sourceLength + targetLength - distance) / (sourceLength + targetLength)

func WriteMatrix

func WriteMatrix(source []rune, target []rune, matrix [][]int, writer io.Writer)

WriteMatrix writes a visual representation of the given matrix for the given strings to the given writer.

Example
source := []rune("neighbor")
target := []rune("Neighbour")
matrix := MatrixForStrings(source, target, DefaultOptions)
WriteMatrix(source, target, matrix, os.Stdout)
Output:

      N  e  i  g  h  b  o  u  r
   0  1  2  3  4  5  6  7  8  9
n  1  2  3  4  5  6  7  8  9 10
e  2  3  2  3  4  5  6  7  8  9
i  3  4  3  2  3  4  5  6  7  8
g  4  5  4  3  2  3  4  5  6  7
h  5  6  5  4  3  2  3  4  5  6
b  6  7  6  5  4  3  2  3  4  5
o  7  8  7  6  5  4  3  2  3  4
r  8  9  8  7  6  5  4  3  4  3

Types

type EditOperation

type EditOperation int

func (EditOperation) String

func (operation EditOperation) String() string

type EditScript

type EditScript []EditOperation

func EditScriptForMatrix

func EditScriptForMatrix(matrix [][]int, op Options) EditScript

EditScriptForMatrix returns an optimal edit script based on the given Levenshtein matrix.

func EditScriptForStrings

func EditScriptForStrings(source []rune, target []rune, op Options) EditScript

EditScriptForStrings returns an optimal edit script to turn source into target.

type MatchFunction

type MatchFunction func(rune, rune) bool

type Options

type Options struct {
	InsCost int
	DelCost int
	SubCost int
	Matches MatchFunction
}
var DefaultOptions Options = Options{
	InsCost: 1,
	DelCost: 1,
	SubCost: 2,
	Matches: IdenticalRunes,
}

DefaultOptions is the default options without substitution: insertion cost is 1, deletion cost is 1, substitution cost is 2 (meaning insert and delete will be used instead), and two runes match iff they are identical.

var DefaultOptionsWithSub Options = Options{
	InsCost: 1,
	DelCost: 1,
	SubCost: 1,
	Matches: IdenticalRunes,
}

DefaultOptionsWithSub is the default options with substitution: insertion cost is 1, deletion cost is 1, substitution cost is 1, and two runes match iff they are identical.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL