similarity

package
v0.0.0-...-e098d3e Latest
Warning

This package is not in the latest version of its module.

Published: Apr 4, 2021 License: BSD-3-Clause Imports: 1 Imported by: 0

Documentation

Overview

Package similarity provides a diff-like implementation for determining how similar two byte streams are.

Functions

func LCS

func LCS(a, b []byte) (max, indexA, indexB int)

LCS is an implementation of the longest common subsequence problem, optimized for space.

Most LCS implementations are optimized for time and allocate heavily for memoization, which makes them unsuitable for larger inputs.

Since our use case only needs the length of the common chunk, most of those allocations can be avoided.

In the worst case (a and b have almost nothing in common), the time complexity is O(N^2). In the best case (a == b), the time complexity is O(N).
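The trade-off described above can be illustrated with a minimal sketch. It is not the package's source: longestCommonRun is a hypothetical name, and the sketch assumes that "common chunk" means a contiguous run of bytes shared by both inputs. It allocates nothing, compares each pair of start offsets once when nothing matches, and finishes in a single pass when a == b. Inputs that fall between those two cases can cost more than either bound.

```go
package main

// longestCommonRun is a hypothetical helper, not the package's LCS.
// It returns the length of the longest contiguous run of bytes shared by
// a and b, together with the offset of that run in each slice.
// No memoization table is allocated; only a few integers are kept.
func longestCommonRun(a, b []byte) (length, indexA, indexB int) {
	// A run starting at i can be at most len(a)-i bytes long, so the scan
	// stops as soon as the remainder cannot beat the best run found so far.
	for i := 0; len(a)-i > length; i++ {
		for j := 0; len(b)-j > length; j++ {
			// Count how many bytes match starting at a[i] and b[j].
			k := 0
			for i+k < len(a) && j+k < len(b) && a[i+k] == b[j+k] {
				k++
			}
			if k > length {
				length, indexA, indexB = k, i, j
			}
		}
	}
	return length, indexA, indexB
}
```

Called as longestCommonRun([]byte("xxabcyy"), []byte("zabcz")), the sketch reports a run of length 3 that starts at offset 2 in the first slice and offset 1 in the second.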

func MaxSimilarity

func MaxSimilarity(a, b []byte) float64

MaxSimilarity returns the larger of Similarity(a, b) / len(a) and Similarity(a, b) / len(b).

A value of 1 means that a and b are identical or that one is a superset of the other (for example, a = "abcdef" and b = "abcfoodef").

func MinSimilarity

func MinSimilarity(a, b []byte) float64

MinSimilarity returns the smaller of Similarity(a, b) / len(a) and Similarity(a, b) / len(b).

A value of 1 means that a and b are identical; 0 means that they have nothing in common.
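The two ratios behind MaxSimilarity and MinSimilarity can be worked through with a short sketch. similarityRatios is a hypothetical helper, not part of the package. It takes the value that Similarity(a, b) would return as a plain integer and assumes the division is done in floating point, since both functions return float64. The handling of empty inputs is also an assumption, as the documentation does not describe it.

```go
package main

// similarityRatios is a hypothetical helper, not part of the package.
// common stands in for the result of Similarity(a, b); lenA and lenB
// stand in for len(a) and len(b). It returns the smaller ratio (lo)
// and the larger ratio (hi).
func similarityRatios(common, lenA, lenB int) (lo, hi float64) {
	if lenA == 0 || lenB == 0 {
		// Assumption: return 0 for empty input to avoid dividing by zero.
		return 0, 0
	}
	ra := float64(common) / float64(lenA)
	rb := float64(common) / float64(lenB)
	if ra < rb {
		return ra, rb
	}
	return rb, ra
}
```

For a = "abcdef" and b = "abcfoodef", the common length is 6, so the ratios are 6/6 and 6/9. The larger one is 1 because every byte of a appears in b, while the smaller one is about 0.67 because b contains extra bytes.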

func Similarity

func Similarity(a, b []byte) int

Similarity returns how similar a and b are.

The return value is the total length of the chunks a and b have in common. For example, when a is "abcdef" and b is "abcfoodef", they have 2 chunks in common: "abc" and "def", thus 6 is returned.
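One way to arrive at the total in the example above is to find the longest shared chunk, then repeat the search on the bytes to its left and to its right. The sketch below follows that idea. It is an assumption about how such a total could be computed, not the package's implementation, and commonRun and totalCommon are hypothetical names.

```go
package main

// commonRun is a hypothetical helper that returns the length and offsets
// of the longest contiguous run of bytes shared by a and b.
func commonRun(a, b []byte) (length, indexA, indexB int) {
	for i := range a {
		for j := range b {
			k := 0
			for i+k < len(a) && j+k < len(b) && a[i+k] == b[j+k] {
				k++
			}
			if k > length {
				length, indexA, indexB = k, i, j
			}
		}
	}
	return length, indexA, indexB
}

// totalCommon is a hypothetical stand-in for Similarity. It adds the
// length of the longest shared run to the totals found on either side
// of that run, so the matched chunks keep their relative order.
func totalCommon(a, b []byte) int {
	length, ia, ib := commonRun(a, b)
	if length == 0 {
		return 0
	}
	return length +
		totalCommon(a[:ia], b[:ib]) +
		totalCommon(a[ia+length:], b[ib+length:])
}
```

With a = "abcdef" and b = "abcfoodef", the first call finds a chunk of length 3, the recursion on the remaining bytes finds the other chunk of length 3, and the total is 6.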
