edlib

Version: v1.1.0 (not the latest version of this module)
Published: Sep 7, 2020 License: MIT Imports: 2 Imported by: 63

README

Go-edlib : Edit distance and string comparison library


Go library of string comparison and edit distance algorithms, featuring Levenshtein, LCS, Hamming, Damerau-Levenshtein (OSA and adjacent transpositions variants), Jaro-Winkler, and more.


Requirements

  • Go (v1.13+)

Introduction

Open-source Go library that includes most (and soon all) edit-distance and string comparison algorithms, with some extras!
Designed to be fully compatible with Unicode and ASCII characters!
The library has 100% test coverage 😁

Features

  • Levenshtein

  • LCS (Longest common subsequence) with edit distance, backtrack and diff functions ✨

  • Hamming

  • Damerau-Levenshtein, with the following variants:

    • OSA (Optimal string alignment) ✨
    • Adjacent transpositions ✨
  • Jaro & Jaro-Winkler similarity algorithms ✨

  • Computed similarity percentage functions based on all available edit distance algorithms in this lib ✨

  • ASCII and Unicode compatibility! 🥳

Installation

Open a shell in your project folder and run:

go get github.com/hbollon/go-edlib

And import it into your project.

Run tests

To run all unit tests, run:

go test ./... -coverpkg=./... # Add desired parameters to this command if you want

Documentation

You can find all the documentation in the Documentation section below.

Author

👤 Hugo Bollon

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check the issues page.

Show your support

Give a ⭐️ if this project helped you!

📝 License

Copyright © 2020 Hugo Bollon.
This project is licensed under the MIT License.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DamerauLevenshteinDistance

func DamerauLevenshteinDistance(str1, str2 string) int

DamerauLevenshteinDistance calculates the distance between two strings. This algorithm computes the true Damerau–Levenshtein distance with adjacent transpositions, allowing insertions, deletions, substitutions and transpositions to change one string into the other. Compatible with non-ASCII characters.
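The description above can be illustrated with a minimal, standard-library-only sketch of the unrestricted Damerau–Levenshtein recurrence. It is not the library's own code: the names damerauLevenshtein and minInt are hypothetical, and the sketch only assumes the textbook algorithm with a sentinel row and column.

```go
package main

import "fmt"

// minInt returns the smallest of its arguments.
func minInt(first int, rest ...int) int {
	for _, v := range rest {
		if v < first {
			first = v
		}
	}
	return first
}

// damerauLevenshtein (hypothetical name) computes the unrestricted
// Damerau-Levenshtein distance on runes, so non-ASCII input is handled.
func damerauLevenshtein(a, b string) int {
	r1, r2 := []rune(a), []rune(b)
	n, m := len(r1), len(r2)
	maxDist := n + m

	// d has an extra leading row and column holding maxDist as a sentinel.
	d := make([][]int, n+2)
	for i := range d {
		d[i] = make([]int, m+2)
	}
	d[0][0] = maxDist
	for i := 0; i <= n; i++ {
		d[i+1][0] = maxDist
		d[i+1][1] = i
	}
	for j := 0; j <= m; j++ {
		d[0][j+1] = maxDist
		d[1][j+1] = j
	}

	lastRow := map[rune]int{} // last 1-based position in a where each rune was seen
	for i := 1; i <= n; i++ {
		lastCol := 0 // last 1-based position in b that matched r1[i-1] in this row
		for j := 1; j <= m; j++ {
			k, l := lastRow[r2[j-1]], lastCol
			cost := 1
			if r1[i-1] == r2[j-1] {
				cost = 0
				lastCol = j
			}
			d[i+1][j+1] = minInt(
				d[i][j]+cost,              // substitution or match
				d[i+1][j]+1,               // insertion
				d[i][j+1]+1,               // deletion
				d[k][l]+(i-k-1)+1+(j-l-1), // transposition
			)
		}
		lastRow[r1[i-1]] = i
	}
	return d[n+1][m+1]
}

func main() {
	fmt.Println(damerauLevenshtein("CA", "ABC")) // prints 2
	fmt.Println(damerauLevenshtein("ab", "ba"))  // prints 1
}
```

The pair "CA" / "ABC" is the usual example separating this variant from OSA: transposing to "AC" and then inserting "B" between the two letters costs 2 here.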

func HammingDistance

func HammingDistance(str1, str2 string) (int, error)

HammingDistance calculates the edit distance between two given strings using only substitutions. It returns the edit distance as an integer, and an error.
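A minimal sketch of a substitution-only distance with the same return shape is shown below. The name hamming is hypothetical, and the sketch assumes the error is returned when the two strings differ in length (the Hamming distance is only defined for equal lengths). The documentation above does not state when the library returns its error.

```go
package main

import (
	"errors"
	"fmt"
)

// hamming (hypothetical name) counts the positions at which two strings
// of equal rune length differ.
func hamming(a, b string) (int, error) {
	r1, r2 := []rune(a), []rune(b)
	if len(r1) != len(r2) {
		// Assumption: unequal lengths are reported as an error.
		return 0, errors.New("hamming distance is undefined for strings of unequal length")
	}
	d := 0
	for i := range r1 {
		if r1[i] != r2[i] {
			d++
		}
	}
	return d, nil
}

func main() {
	d, err := hamming("karolin", "kathrin")
	fmt.Println(d, err) // prints 3 <nil>

	_, err = hamming("abc", "ab")
	fmt.Println(err != nil) // prints true
}
```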

func JaroSimilarity added in v1.1.0

func JaroSimilarity(str1, str2 string) float32

JaroSimilarity returns a similarity index (between 0 and 1). It uses the Jaro distance algorithm and allows only the transposition operation.

func JaroWinklerSimilarity added in v1.1.0

func JaroWinklerSimilarity(str1, str2 string) float32

JaroWinklerSimilarity returns a similarity index (between 0 and 1). It uses the Jaro similarity and then looks for a common prefix (length <= 4).
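The two similarity measures above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the names jaro and jaroWinkler are hypothetical, the sketch returns float64 rather than float32, it treats two empty strings as identical, and it uses the conventional prefix scaling factor of 0.1.

```go
package main

import "fmt"

// jaro (hypothetical name) returns the Jaro similarity of a and b in [0, 1].
func jaro(a, b string) float64 {
	r1, r2 := []rune(a), []rune(b)
	if len(r1) == 0 && len(r2) == 0 {
		return 1 // assumption: two empty strings are considered identical
	}
	if len(r1) == 0 || len(r2) == 0 {
		return 0
	}
	longest := len(r1)
	if len(r2) > longest {
		longest = len(r2)
	}
	window := longest/2 - 1
	if window < 0 {
		window = 0
	}

	m1 := make([]bool, len(r1))
	m2 := make([]bool, len(r2))
	matches := 0
	for i := range r1 {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > len(r2) {
			hi = len(r2)
		}
		for j := lo; j < hi; j++ {
			if !m2[j] && r1[i] == r2[j] {
				m1[i], m2[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}

	// Count matched runes that appear in a different order in the two strings.
	outOfOrder, k := 0, 0
	for i := range r1 {
		if !m1[i] {
			continue
		}
		for !m2[k] {
			k++
		}
		if r1[i] != r2[k] {
			outOfOrder++
		}
		k++
	}
	m := float64(matches)
	t := float64(outOfOrder) / 2
	return (m/float64(len(r1)) + m/float64(len(r2)) + (m-t)/m) / 3
}

// jaroWinkler (hypothetical name) boosts the Jaro score for a shared
// prefix of at most 4 runes.
func jaroWinkler(a, b string) float64 {
	j := jaro(a, b)
	r1, r2 := []rune(a), []rune(b)
	prefix := 0
	for prefix < len(r1) && prefix < len(r2) && prefix < 4 && r1[prefix] == r2[prefix] {
		prefix++
	}
	return j + float64(prefix)*0.1*(1-j) // 0.1 is the assumed scaling factor
}

func main() {
	fmt.Printf("%.4f\n", jaro("MARTHA", "MARHTA"))        // prints 0.9444
	fmt.Printf("%.4f\n", jaroWinkler("MARTHA", "MARHTA")) // prints 0.9611
}
```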

func LCS

func LCS(str1, str2 string) int

LCS takes two strings and computes the length of their LCS (longest common subsequence).
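For readers unfamiliar with the technique, here is a minimal sketch of the classic dynamic-programming computation of the LCS length. The name lcsLength is hypothetical and the two-row layout is a choice made for this sketch, not a claim about how the library stores its matrix.

```go
package main

import "fmt"

// lcsLength (hypothetical name) returns the length of the longest common
// subsequence of a and b, compared rune by rune.
func lcsLength(a, b string) int {
	r1, r2 := []rune(a), []rune(b)
	prev := make([]int, len(r2)+1)
	curr := make([]int, len(r2)+1)
	for i := 1; i <= len(r1); i++ {
		for j := 1; j <= len(r2); j++ {
			switch {
			case r1[i-1] == r2[j-1]:
				curr[j] = prev[j-1] + 1 // extend the common subsequence
			case prev[j] >= curr[j-1]:
				curr[j] = prev[j] // skip a rune of a
			default:
				curr[j] = curr[j-1] // skip a rune of b
			}
		}
		prev, curr = curr, prev
	}
	return prev[len(r2)]
}

func main() {
	fmt.Println(lcsLength("AGGTAB", "GXTXAYB")) // prints 4 ("GTAB")
	fmt.Println(lcsLength("ABCDGH", "AEDFHR"))  // prints 3 ("ADH")
}
```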

func LCSBacktrack added in v1.1.0

func LCSBacktrack(str1, str2 string) (string, error)

LCSBacktrack returns the choices taken during the LCS process as a single string (one longest common subsequence of str1 and str2).
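Backtracking can be sketched as below: fill the full LCS matrix, then walk from the bottom-right corner back to the origin, collecting the runes where both strings match. The name lcsBacktrack is hypothetical and, unlike the library function, this sketch returns no error value.

```go
package main

import "fmt"

// lcsBacktrack (hypothetical name) returns one longest common subsequence
// of a and b by walking back through the LCS matrix.
func lcsBacktrack(a, b string) string {
	r1, r2 := []rune(a), []rune(b)

	// m[i][j] holds the LCS length of r1[:i] and r2[:j].
	m := make([][]int, len(r1)+1)
	for i := range m {
		m[i] = make([]int, len(r2)+1)
	}
	for i := 1; i <= len(r1); i++ {
		for j := 1; j <= len(r2); j++ {
			switch {
			case r1[i-1] == r2[j-1]:
				m[i][j] = m[i-1][j-1] + 1
			case m[i-1][j] >= m[i][j-1]:
				m[i][j] = m[i-1][j]
			default:
				m[i][j] = m[i][j-1]
			}
		}
	}

	// Walk back from the bottom-right corner, collecting matched runes.
	i, j := len(r1), len(r2)
	out := make([]rune, 0, m[i][j])
	for i > 0 && j > 0 {
		switch {
		case r1[i-1] == r2[j-1]:
			out = append(out, r1[i-1])
			i--
			j--
		case m[i-1][j] >= m[i][j-1]:
			i--
		default:
			j--
		}
	}

	// The runes were collected back to front, so reverse them.
	for l, r := 0, len(out)-1; l < r; l, r = l+1, r-1 {
		out[l], out[r] = out[r], out[l]
	}
	return string(out)
}

func main() {
	fmt.Println(lcsBacktrack("AGGTAB", "GXTXAYB")) // prints GTAB
}
```

When several subsequences share the maximal length, the tie-breaking rule in the walk decides which one is returned.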

func LCSBacktrackAll added in v1.1.0

func LCSBacktrackAll(str1, str2 string) ([]string, error)

LCSBacktrackAll returns a slice containing all longest common subsequences between str1 and str2.

func LCSDiff added in v1.1.0

func LCSDiff(str1, str2 string) ([]string, error)

LCSDiff backtracks through the LCS matrix and returns the diff between the two sequences.

func LCSEditDistance

func LCSEditDistance(str1, str2 string) int

LCSEditDistance determines the edit distance between two strings using the LCS function (allows only insert and delete operations).
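When only insertions and deletions are allowed, every rune outside the longest common subsequence must be deleted from one string or inserted from the other. Assuming |a| and |b| denote lengths in runes, the distance follows directly from the LCS length. The worked line uses "AGGTAB" and "GXTXAYB", whose LCS is "GTAB".

```latex
d_{\mathrm{LCS}}(a, b) = |a| + |b| - 2 \cdot \mathrm{LCS}(a, b)

% Worked example: |a| = 6, |b| = 7, LCS(a, b) = 4
d_{\mathrm{LCS}}(\texttt{AGGTAB}, \texttt{GXTXAYB}) = 6 + 7 - 2 \cdot 4 = 5
```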

func LevenshteinDistance

func LevenshteinDistance(str1, str2 string) int

LevenshteinDistance calculates the distance between two strings. This algorithm allows insertions, deletions and substitutions to change one string into the other. Compatible with non-ASCII characters.
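As an illustration of the technique, here is a minimal two-row sketch of the Levenshtein recurrence. The names levenshtein and min3 are hypothetical. Converting to []rune first is what makes the sketch count characters rather than bytes on non-ASCII input.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein (hypothetical name) returns the minimum number of single-rune
// insertions, deletions and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	r1, r2 := []rune(a), []rune(b)
	prev := make([]int, len(r2)+1)
	curr := make([]int, len(r2)+1)
	for j := range prev {
		prev[j] = j // turning "" into r2[:j] takes j insertions
	}
	for i := 1; i <= len(r1); i++ {
		curr[0] = i // turning r1[:i] into "" takes i deletions
		for j := 1; j <= len(r2); j++ {
			cost := 1
			if r1[i-1] == r2[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(r2)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // prints 3
	fmt.Println(levenshtein("héllo", "hello"))    // prints 1
}
```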

func OSADamerauLevenshteinDistance

func OSADamerauLevenshteinDistance(str1, str2 string) int

OSADamerauLevenshteinDistance calculates the distance between two strings. This optimal string alignment distance variant uses an extension of the Wagner-Fischer dynamic programming algorithm. It doesn't allow multiple transformations on the same substring. It allows insertions, deletions, substitutions and transpositions to change one string into the other. Compatible with non-ASCII characters.
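The restriction described above shows up in a minimal sketch of the optimal string alignment recurrence: the transposition case only looks two cells back, so a transposed pair cannot be edited again. The names osa and min3 are hypothetical, and this is the textbook recurrence rather than the library's code.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// osa (hypothetical name) returns the optimal string alignment distance
// between a and b, compared rune by rune.
func osa(a, b string) int {
	r1, r2 := []rune(a), []rune(b)
	d := make([][]int, len(r1)+1)
	for i := range d {
		d[i] = make([]int, len(r2)+1)
		d[i][0] = i
	}
	for j := range d[0] {
		d[0][j] = j
	}
	for i := 1; i <= len(r1); i++ {
		for j := 1; j <= len(r2); j++ {
			cost := 1
			if r1[i-1] == r2[j-1] {
				cost = 0
			}
			d[i][j] = min3(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1]+cost)
			// Adjacent transposition: the last two runes are swapped.
			if i > 1 && j > 1 && r1[i-1] == r2[j-2] && r1[i-2] == r2[j-1] {
				if t := d[i-2][j-2] + 1; t < d[i][j] {
					d[i][j] = t
				}
			}
		}
	}
	return d[len(r1)][len(r2)]
}

func main() {
	fmt.Println(osa("ab", "ba"))  // prints 1
	fmt.Println(osa("CA", "ABC")) // prints 3
}
```

For "CA" and "ABC" the result is 3, whereas the unrestricted Damerau–Levenshtein distance is 2, because OSA cannot insert "B" between two runes it has just transposed.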

func StringsSimilarity added in v1.1.0

func StringsSimilarity(str1 string, str2 string, algo AlgorithMethod) (float32, error)

StringsSimilarity returns a similarity index [0..1] between two strings, based on the edit distance algorithm given as a parameter. Use the defined AlgorithMethod type.
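One common way to turn an edit distance into a similarity index in [0..1] is to normalize it by the length of the longer string. The sketch below assumes that normalization; the documentation above does not state which formula the library uses. The name similarityFromDistance is hypothetical, and the distance is passed in precomputed to keep the sketch short.

```go
package main

import "fmt"

// similarityFromDistance (hypothetical name) maps an edit distance between
// a and b to a similarity index in [0, 1], assuming the distance never
// exceeds the rune length of the longer string.
func similarityFromDistance(a, b string, distance int) float32 {
	longest := len([]rune(a))
	if n := len([]rune(b)); n > longest {
		longest = n
	}
	if longest == 0 {
		return 1 // assumption: two empty strings are fully similar
	}
	return float32(longest-distance) / float32(longest)
}

func main() {
	// The Levenshtein distance between "kitten" and "sitting" is 3.
	fmt.Printf("%.4f\n", similarityFromDistance("kitten", "sitting", 3)) // prints 0.5714
}
```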

Types

type AlgorithMethod added in v1.1.0

type AlgorithMethod uint8

AlgorithMethod is an integer type (uint8) used to identify edit distance algorithms.

const (
	Levenshtein           AlgorithMethod = iota
	DamerauLevenshtein    AlgorithMethod = iota
	OSADamerauLevenshtein AlgorithMethod = iota
	Lcs                   AlgorithMethod = iota
	Hamming               AlgorithMethod = iota
	Jaro                  AlgorithMethod = iota
	JaroWinkler           AlgorithMethod = iota
)

type StringHashMap added in v1.1.0

type StringHashMap map[string]struct{}

StringHashMap is a HashMap substitute for strings.
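A map with empty-struct values is the idiomatic Go way to represent a set of strings without storing any payload. The sketch below shows that pattern with a locally defined, hypothetical stringSet type and helper methods; it does not describe methods of the library's StringHashMap.

```go
package main

import (
	"fmt"
	"sort"
)

// stringSet (hypothetical type) mirrors the map[string]struct{} shape.
type stringSet map[string]struct{}

// add inserts v into the set; adding the same value twice has no effect.
func (s stringSet) add(v string) { s[v] = struct{}{} }

// has reports whether v is in the set.
func (s stringSet) has(v string) bool {
	_, ok := s[v]
	return ok
}

// sorted returns the members in ascending order, since map iteration
// order is not deterministic.
func (s stringSet) sorted() []string {
	out := make([]string, 0, len(s))
	for v := range s {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

func main() {
	s := stringSet{}
	s.add("GTAB")
	s.add("GTAB") // duplicate, ignored
	s.add("GXAB")
	fmt.Println(len(s), s.has("GXAB"), s.sorted()) // prints 2 true [GTAB GXAB]
}
```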
