lcs

package
v0.0.0-...-e2c53ed Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 26, 2024 License: Apache-2.0 Imports: 3 Imported by: 0

README

Package cloudeng.io/algo/lcs

import cloudeng.io/algo/lcs

Package lcs provides implementations of algorithms to find the longest common subsequence/shortest edit script (LCS/SES) between two slices suitable for use with unicode/utf8 and other alphabets.

Types

Type DP
type DP[T comparable] struct {
	// contains filtered or unexported fields
}

DP represents a dynamic programming based implementation for finding the longest common subsequence and shortest edit script (LCS/SES) for transforming A to B. See https://en.wikipedia.org/wiki/Longest_common_subsequence_problem. This implementation can return all LCS and SES rather than just the first one found. If a single LCS or SES is sufficient then the Myer's algorithm implementation is lilkey a better choice.

Functions
func NewDP[T comparable](a, b []T) *DP[T]

NewDP creates a new instance of DP. The implementation supports slices of bytes/uint8, rune/int32 and int64s.

Methods
func (dp *DP[T]) AllLCS() [][]T

AllLCS returns all of the the longest common subsquences.

func (dp *DP[T]) Fprint(out io.Writer)
func (dp *DP[T]) LCS() []T

LCS returns the longest common subsquence.

func (dp *DP[T]) SES() *EditScript[T]

SES returns the shortest edit script.

Type Edit
type Edit[T comparable] struct {
	Op   EditOp
	A, B int
	Val  T
}

Edit represents a single edit. For deletions, an edit specifies the index in the original (A) slice to be deleted. For insertions, an edit specifies the new value and the index in the original (A) slice that the new value is to be inserted at, but immediately after the existing value if that value was not deleted. Insertions also provide the index of the new value in the new (B) slice. A third operation is provided, that identifies identical values, ie. the members of the LCS and their position in the original and new slices. This allows for the LCS to retrieved from the SES.

An EditScript that can be trivially 'replayed' to create the new slice from the original one.

var b []uint8
 for _, action := range actions {
   switch action.Op {
   case Insert:
     b = append(b, action.Val.(int64))
   case Identical:
     b = append(b, a[action.A])
   }
 }
Type EditOp
type EditOp int

EditOp represents an edit operation.

Constants
Insert, Delete, Identical
Insert EditOp = iota
Delete
Identical

Values for EditOP.

Type EditScript
type EditScript[T comparable] []Edit[T]

EditScript represents a series of Edits.

Methods
func (es *EditScript[T]) Apply(a []T) []T

Apply transforms the original slice to the new slice by applying the SES.

func (es *EditScript[T]) FormatHorizontal(out io.Writer, a []T)

FormatVertical prints a representation of the edit script across three lines, with the top line showing the result of applying the edit, the middle line the operations applied and the bottom line any items deleted, eg:

 CB AB AC
-+|-||-|+
A  C  B
func (es *EditScript[T]) FormatVertical(out io.Writer, a []T)

FormatVertical prints a representation of the edit script with one item per line, eg:

  • 6864772235558415538 -8997218578518345818
  • -6615550055289275125
  • -7192184552745107772 5717881983045765875
func (es *EditScript[T]) Reverse() *EditScript[T]

Reverse returns a new edit script that is the inverse of the one supplied. That is, of the original script would transform A to B, then the results of this function will transform B to A.

func (es *EditScript[T]) String() string

String implements stringer.

Type Myers
type Myers[T comparable] struct {
	// contains filtered or unexported fields
}

Myers represents an implementation of Myer's longest common subsequence and shortest edit script algorithm as as documented in: An O(ND) Difference Algorithm and Its Variations, 1986.

Functions
func NewMyers[T comparable](a, b []T) *Myers[T]

NewMyers returns a new instance of Myers. The implementation supports slices of bytes/uint8, rune/int32 and int64s.

Methods
func (m *Myers[T]) LCS() []T

LCS returns the longest common subsquence.

func (m *Myers[T]) SES() *EditScript[T]

SES returns the shortest edit script.

Examples

ExampleDP
ExampleMyers
TODO
  • cnicolaou: improve DP implementation to use only one row+column to store lcs lengths rather than row * column.
  • cnicolaou: improve the Myers implementation as described in An O(NP) Sequence Comparison Algorithm, Wu, Manber, Myers.

Documentation

Overview

Package lcs provides implementations of algorithms to find the longest common subsequence/shortest edit script (LCS/SES) between two slices suitable for use with unicode/utf8 and other alphabets.

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type DP

type DP[T comparable] struct {
	// contains filtered or unexported fields
}

DP represents a dynamic programming based implementation for finding the longest common subsequence and shortest edit script (LCS/SES) for transforming A to B. See https://en.wikipedia.org/wiki/Longest_common_subsequence_problem. This implementation can return all LCS and SES rather than just the first one found. If a single LCS or SES is sufficient then the Myer's algorithm implementation is lilkey a better choice.

Example
package main

import (
	"fmt"
	"unicode/utf8"

	"cloudeng.io/algo/codec"
	"cloudeng.io/algo/lcs"
)

func main() {
	runeDecoder := codec.NewDecoder(utf8.DecodeRune)
	a, b := runeDecoder.Decode([]byte("AGCAT")), runeDecoder.Decode([]byte("GAC"))
	all := lcs.NewDP(a, b).AllLCS()
	for _, lcs := range all {
		fmt.Printf("%s\n", string(lcs))
	}
}
Output:

GA
GA
GC
AC

func NewDP

func NewDP[T comparable](a, b []T) *DP[T]

NewDP creates a new instance of DP. The implementation supports slices of bytes/uint8, rune/int32 and int64s.

func (*DP[T]) AllLCS

func (dp *DP[T]) AllLCS() [][]T

AllLCS returns all of the the longest common subsquences.

func (*DP[T]) Fprint

func (dp *DP[T]) Fprint(out io.Writer)

func (*DP[T]) LCS

func (dp *DP[T]) LCS() []T

LCS returns the longest common subsquence.

func (*DP[T]) SES

func (dp *DP[T]) SES() *EditScript[T]

SES returns the shortest edit script.

type Edit

type Edit[T comparable] struct {
	Op   EditOp
	A, B int
	Val  T
}

Edit represents a single edit. For deletions, an edit specifies the index in the original (A) slice to be deleted. For insertions, an edit specifies the new value and the index in the original (A) slice that the new value is to be inserted at, but immediately after the existing value if that value was not deleted. Insertions also provide the index of the new value in the new (B) slice. A third operation is provided, that identifies identical values, ie. the members of the LCS and their position in the original and new slices. This allows for the LCS to retrieved from the SES.

An EditScript that can be trivially 'replayed' to create the new slice from the original one.

var b []uint8
 for _, action := range actions {
   switch action.Op {
   case Insert:
     b = append(b, action.Val.(int64))
   case Identical:
     b = append(b, a[action.A])
   }
 }

type EditOp

type EditOp int

EditOp represents an edit operation.

const (
	Insert EditOp = iota
	Delete
	Identical
)

Values for EditOP.

type EditScript

type EditScript[T comparable] []Edit[T]

EditScript represents a series of Edits.

func (*EditScript[T]) Apply

func (es *EditScript[T]) Apply(a []T) []T

Apply transforms the original slice to the new slice by applying the SES.

func (*EditScript[T]) FormatHorizontal

func (es *EditScript[T]) FormatHorizontal(out io.Writer, a []T)

FormatVertical prints a representation of the edit script across three lines, with the top line showing the result of applying the edit, the middle line the operations applied and the bottom line any items deleted, eg:

 CB AB AC
-+|-||-|+
A  C  B

func (*EditScript[T]) FormatVertical

func (es *EditScript[T]) FormatVertical(out io.Writer, a []T)

FormatVertical prints a representation of the edit script with one item per line, eg:

  • 6864772235558415538 -8997218578518345818
  • -6615550055289275125
  • -7192184552745107772 5717881983045765875

func (*EditScript[T]) Reverse

func (es *EditScript[T]) Reverse() *EditScript[T]

Reverse returns a new edit script that is the inverse of the one supplied. That is, of the original script would transform A to B, then the results of this function will transform B to A.

func (*EditScript[T]) String

func (es *EditScript[T]) String() string

String implements stringer.

type Myers

type Myers[T comparable] struct {
	// contains filtered or unexported fields
}

Myers represents an implementation of Myer's longest common subsequence and shortest edit script algorithm as as documented in: An O(ND) Difference Algorithm and Its Variations, 1986.

Example
package main

import (
	"fmt"
	"unicode/utf8"

	"cloudeng.io/algo/codec"
	"cloudeng.io/algo/lcs"
)

func main() {
	runeDecoder := codec.NewDecoder(utf8.DecodeRune)
	a, b := runeDecoder.Decode([]byte("ABCABBA")), runeDecoder.Decode([]byte("CBABAC"))
	fmt.Printf("%v\n", string(lcs.NewMyers(a, b).LCS()))
}
Output:

BABA

func NewMyers

func NewMyers[T comparable](a, b []T) *Myers[T]

NewMyers returns a new instance of Myers. The implementation supports slices of bytes/uint8, rune/int32 and int64s.

func (*Myers[T]) LCS

func (m *Myers[T]) LCS() []T

LCS returns the longest common subsquence.

func (*Myers[T]) SES

func (m *Myers[T]) SES() *EditScript[T]

SES returns the shortest edit script.

Directories

Path Synopsis
Package textdiff providers support for diff'ing text.
Package textdiff providers support for diff'ing text.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL