levenshtein

package module
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 27, 2025 License: MIT Imports: 3 Imported by: 0

README

levenshtein

Go Report Card Go Reference Tests coverage

Small, idiomatic Go implementation of the Levenshtein edit distance algorithm. This package provides utilities to compute distances, ratios, edit matrices and edit scripts for sequences of runes (Unicode code points).

It is based on the original texttheater/golang-levenshtein implementation and provides a compact API that is convenient for Go projects that need fuzzy string comparisons, similarity scoring, or to produce edit scripts.

Installation

Use Go modules (recommended):

    go get github.com/TomTonic/levenshtein

Then import:

    import "github.com/TomTonic/levenshtein"

Quick API overview

Key functions and types you'll use:

  • DistanceForStrings(source []rune, target []rune, op Options) int — computes the Levenshtein distance between two rune slices.
  • MatrixForStrings(source []rune, target []rune, op Options) [][]int — returns the full DP matrix (useful for backtraces or visualization).
  • EditScriptForStrings(source []rune, target []rune, op Options) EditScript — returns an optimal edit script (sequence of ops: Ins, Del, Sub, Match).
  • RatioForStrings(source []rune, target []rune, op Options) float64 — returns a normalized similarity ratio.
  • DefaultOptions and DefaultOptionsWithSub — convenience Options values.

Why []rune? (the important hint)

Go strings are byte sequences and indexing a string yields a byte. For Unicode-aware string comparison you should convert to []rune (which are Unicode code points) before calling this package. That ensures multi-byte characters (e.g. emoji, accented letters) are treated as single elements.

Example: source := "café" and target := "cafe", if you call the library with []rune(source) and []rune(target) you will get correct behavior for Unicode text.

Usage examples

Simple distance:
package main

import (
        "fmt"
        "github.com/TomTonic/levenshtein"
)

func main() {
        a := "kitten"
        b := "sitting"
        d := levenshtein.DistanceForStrings([]rune(a), []rune(b), levenshtein.DefaultOptions)
        fmt.Println(d) // prints 5 with DefaultOptions (no substitution)
}
Using substitution (counts substitution as 1 instead of 2):
d := levenshtein.DistanceForStrings([]rune("kitten"), []rune("sitting"), levenshtein.DefaultOptionsWithSub)
// d == 3
Get the edit script:
script := levenshtein.EditScriptForStrings([]rune("a"), []rune("aa"), levenshtein.DefaultOptions)
// script will contain EditOperation values such as Match, Ins, Del, Sub
Visualize the matrix:
matrix := levenshtein.MatrixForStrings([]rune("neighbor"), []rune("Neighbour"), levenshtein.DefaultOptions)
levenshtein.WriteMatrix([]rune("neighbor"), []rune("Neighbour"), matrix, os.Stdout)

Options and customization

The Options struct lets you tweak insertion, deletion and substitution costs and also provide a custom Matches function. This is handy for case-insensitive comparisons or language-specific matching.

Example (case-insensitive matching):

import "unicode"

caseInsensitive := levenshtein.Options{
        InsCost: 1,
        DelCost: 1,
        SubCost: 1,
        Matches: func(a, b rune) bool {
                return unicode.ToLower(a) == unicode.ToLower(b)
        },
}

dist := levenshtein.DistanceForStrings([]rune("Apple"), []rune("apple"), caseInsensitive)
// dist == 0

Quality Management

Automatic Quality Checks

This repository aims to be small, well-tested and CI-driven. Practices for maintaining quality:

  • Tests: keep the existing unit tests in levenshtein_test.go. Run them with go test ./... and check coverage with go test ./... -cover.

  • Linting: run go vet and a linter such as golangci-lint to catch suspicious code patterns. Example:

      golangci-lint run --enable=govet --enable=staticcheck
    
  • Formatting: enforce gofmt (or go fmt) before commits.

Release management

For small Go libraries the standard approach is Semantic Versioning using git tags and GitHub Releases.

Problem Reports

Contributors are encouraged to include a one-line summary in the PRs header and steps-to-reproduce (test cases) in the body.

Contributing

Contributions are welcome. Please use GitHub functionality to propose changes.

License

This project is released under the terms found in the LICENSE file.

Acknowledgements

This implementation is adapted from the original texttheater/golang-levenshtein project.

Contact / Support

If you find issues, please open an issue or a PR on GitHub. For general help or usage questions, include the inputs you used and expected output.

Documentation

Overview

Package levenshtein implements the Levenshtein edit distance algorithm for sequences of runes (Unicode code points). The package focuses on clarity and small API surface: it provides functions to compute an edit distance, the underlying dynamic programming matrix, a normalized similarity ratio, and an edit script (sequence of operations) to transform one sequence into another.

Important: callers should convert Go strings to []rune before calling the public functions in this package. Go strings are byte sequences and indexing them yields bytes; converting to []rune ensures Unicode code points are compared as single elements (see README for examples).

The main functions are:

  • MatrixForStrings: build the full DP matrix (useful for backtraces).
  • DistanceForStrings: compute the edit distance using an optimized two-row approach (lower memory).
  • EditScriptForStrings: backtrace an optimal edit script given the input sequences and options.

The Options struct lets you customize insertion, deletion and substitution costs and supply a custom Match function (for case-insensitive comparisons or language-specific matching).

Index

Examples

Constants

View Source
const (
	Ins = iota
	Del
	Sub
	Match
)

Variables

This section is empty.

Functions

func DistanceForMatrix

func DistanceForMatrix(matrix [][]int) int

DistanceForMatrix extracts the edit distance from a Levenshtein matrix produced by MatrixForStrings. The matrix must be a (height x width) table where height == len(source)+1 and width == len(target)+1 when built by MatrixForStrings.

Return value: bottom-right cell containing the minimal edit cost.

func DistanceForStrings

func DistanceForStrings(source []rune, target []rune, op Options) int

DistanceForStrings returns the minimal edit distance between the provided source and target rune slices using the costs defined in op.

Parameters:

  • source: source sequence as []rune. Length >= 0.
  • target: target sequence as []rune. Length >= 0.
  • op: Options controlling costs and matching function (see Options documentation). Costs should be non-negative integers.

Return value:

  • an int representing the minimal total cost to transform source into target using insertions, deletions and (optionally) substitutions.

Complexity: time O(len(source)*len(target)), memory O(len(target)). For large inputs consider using MatrixForStrings when you need the full matrix (EditScriptForStrings calls MatrixForStrings internally).

Example:

distance := DistanceForStrings([]rune("kitten"), []rune("sitting"), DefaultOptions)

Notes:

  • The function expects rune slices rather than raw strings so that Unicode code points are handled correctly.
Example
source := "a"
target := "aa"
distance := DistanceForStrings([]rune(source), []rune(target), DefaultOptions)
fmt.Printf(`Distance between "%s" and "%s" computed as %d`, source, target, distance)
Output:
Distance between "a" and "aa" computed as 1

func IdenticalRunes

func IdenticalRunes(a rune, b rune) bool

IdenticalRunes is the default MatchFunction: it checks whether two runes are identical (==). Use this for strict, case-sensitive Unicode-aware comparisons.

func LogMatrix

func LogMatrix(source []rune, target []rune, matrix [][]int)

LogMatrix writes a visual representation of the given matrix to os.Stderr. It is a convenience wrapper around WriteMatrix and is deprecated in favor of using WriteMatrix with an explicit writer.

func MatrixForStrings

func MatrixForStrings(source []rune, target []rune, op Options) [][]int

MatrixForStrings builds and returns the full Levenshtein dynamic programming matrix for the given source and target rune slices and Options. The returned matrix has dimensions (len(source)+1) x (len(target)+1). Rows correspond to prefixes of source and columns to prefixes of target; the bottom-right cell contains the minimal edit cost.

Use this function when you need the full matrix for backtracing an edit script, visualizing the computation, or computing additional properties. If you only need the scalar distance, prefer DistanceForStrings which uses O(len(target)) memory.

func RatioForMatrix

func RatioForMatrix(matrix [][]int) float64

RatioForMatrix computes the normalized similarity ratio from a DP matrix produced by MatrixForStrings. The ratio is in [0,1]. If the combined length of source and target is zero the function returns 0 to avoid division by zero.

func RatioForStrings

func RatioForStrings(source []rune, target []rune, op Options) float64

RatioForStrings returns a normalized similarity score in the range [0,1] computed from the Levenshtein distance. The formula used is:

(sourceLength + targetLength - distance) / (sourceLength + targetLength)

where sourceLength == len(source) and targetLength == len(target).

Notes:

  • If both source and target are empty (sum == 0) the function returns 0. This mirrors the behavior in RatioForMatrix and avoids division by zero.
  • Higher values indicate more similar sequences; 1.0 means identical and 0.0 means no similarity under the chosen cost model.

func WriteMatrix

func WriteMatrix(source []rune, target []rune, matrix [][]int, writer io.Writer)

WriteMatrix prints a human-readable representation of the DP matrix to the supplied writer. This can be used for debugging and demonstration. The matrix must correspond to the provided source and target rune slices (i.e. have dimensions (len(source)+1) x (len(target)+1)).

Example
source := []rune("neighbor")
target := []rune("Neighbour")
matrix := MatrixForStrings(source, target, DefaultOptions)
WriteMatrix(source, target, matrix, os.Stdout)
Output:
      N  e  i  g  h  b  o  u  r
   0  1  2  3  4  5  6  7  8  9
n  1  2  3  4  5  6  7  8  9 10
e  2  3  2  3  4  5  6  7  8  9
i  3  4  3  2  3  4  5  6  7  8
g  4  5  4  3  2  3  4  5  6  7
h  5  6  5  4  3  2  3  4  5  6
b  6  7  6  5  4  3  2  3  4  5
o  7  8  7  6  5  4  3  2  3  4
r  8  9  8  7  6  5  4  3  4  3

Types

type EditOperation

type EditOperation int

EditOperation describes a single edit operation in an edit script produced by EditScriptForStrings or EditScriptForMatrix.

Possible values:

  • Ins: insertion (element present in target but not source)
  • Del: deletion (element present in source but not target)
  • Sub: substitution (source element replaced by target element)
  • Match: elements match (no cost if Matches returns true)

func (EditOperation) String

func (operation EditOperation) String() string

String prints a concise human-readable representation for an EditOperation. Useful for debugging and tests.

type EditScript

type EditScript []EditOperation

EditScript is a sequence of EditOperation values describing an edit procedure to transform a source sequence into a target sequence. The operations are ordered from left-to-right (source/target prefixes).

func EditScriptForMatrix

func EditScriptForMatrix(matrix [][]int, op Options) EditScript

EditScriptForMatrix computes an optimal edit script using an already computed Levenshtein matrix. The matrix must be produced by MatrixForStrings and have dimensions (len(source)+1) x (len(target)+1).

func EditScriptForStrings

func EditScriptForStrings(source []rune, target []rune, op Options) EditScript

EditScriptForStrings returns an optimal EditScript (sequence of EditOperation values) that transforms source into target using the costs defined in op. The function computes the full matrix and then backtraces an optimal path.

Example:

script := EditScriptForStrings([]rune("a"), []rune("aa"), DefaultOptions)
// script is something like {Match, Ins}

type MatchFunction

type MatchFunction func(rune, rune) bool

MatchFunction is used to determine whether two runes are considered a match. The function should be symmetric and must not have side-effects. It must return true if the two runes are considered equal for matching purposes (for example, case-insensitive matching can be implemented by comparing unicode.ToLower(a) == unicode.ToLower(b)).

type Options

type Options struct {
	InsCost int
	DelCost int
	SubCost int
	Matches MatchFunction
}

Options controls the costs associated with edit operations and the matching function used to decide whether two runes count as equal.

Fields:

  • InsCost: cost of inserting a rune from the target into the source.
  • DelCost: cost of deleting a rune from the source.
  • SubCost: cost of substituting (replacing) one rune with another. If SubCost is greater than InsCost + DelCost the algorithm will prefer a delete+insert instead of a substitution.
  • Matches: a MatchFunction that returns true when two runes are considered equal (no substitution cost).

Contracts / boundaries: costs are treated as ints and the algorithm assumes non-negative costs. Negative costs will produce undefined or nonsensical results and should be avoided. There is no explicit overflow checking on very large costs; prefer small positive integers (1, 2, ...).

var DefaultOptions Options = Options{
	InsCost: 1,
	DelCost: 1,
	SubCost: 2,
	Matches: IdenticalRunes,
}

DefaultOptions is a sensible default Options value where substitution is effectively disabled (SubCost == 2 while InsCost + DelCost == 2) so that a substitution is implemented as delete+insert. Use this when you prefer substitutions to be treated as two operations.

var DefaultOptionsWithSub Options = Options{
	InsCost: 1,
	DelCost: 1,
	SubCost: 1,
	Matches: IdenticalRunes,
}

DefaultOptionsWithSub is similar to DefaultOptions but gives substitution a cost of 1. With these options a substitution is cheaper than delete+insert and will often be chosen if two runes differ.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL