Redirected from github.com/merovius/diff.

diff

package module

v1.0.1 Latest Latest Go to latest Published: Oct 7, 2020 License: Apache-2.0 Imports: 4 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/Merovius/diff

README ¶

Documentation

Simple, yet flexible diffing package for Go

This is an implementation of the algorithm from "An Algorithm for Differential File Comparison" by Hunt and McIlroy. I wrote it because I needed something more flexible than the common line-based diff and it seemed to cumbersome to do it as a transformation on the input (and the reverse on the output). As such, I attempted to make it flexible enough for whatever semantics you might need. It's intended to be mostly finished and frozen, but feel free to file issues if you have a problem.

The implementation is intentionally kept close to the paper, with just a little bit of extra convenience provided by Go. As such, the performance is probably "meh" (though I haven't perceived any issues yet) and I don't intend to improve it considerably.

License

Copyright 2020 Axel Wagner

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Documentation ¶

Overview ¶

Package diff calculates the differences between two sequences.

It implements the algorithm from "An Algorithm for Differential File Comparison" by Hunt and McIlroy: https://www.cs.dartmouth.edu/~doug/diff.pdf

For flexibility, the algorithm itself operates on a sequence of integers. This allows you to compare arbitrary sequences, as long as you can map their elements to a uint64.

To generate a diff for text, the inputs need to be split and hashed. Splitting should be done to reduce algorithmic complexity (which is O(m•n•log(m)) in the worst case). It also creates diffs that are better suited for human consumption. Hashing means that collisions are possible, but they should be rare enough in practice to not matter. If they do happen, the resulting diff might be subpoptimal.

Index ¶

func SplitLines(b []byte) (tok, skip int)
type HashFunc
- func DefaultHash() HashFunc
type Op
- func Uint64(a, b []uint64) []Op
type SplitFunc
type TextDelta
- func Lines(a, b []byte) []TextDelta
- func Text(a, b []byte, s SplitFunc, h HashFunc) []TextDelta

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func SplitLines ¶

func SplitLines(b []byte) (tok, skip int)

SplitLines splits at newlines, stripping them in the process.

Types ¶

type HashFunc ¶

type HashFunc func(b []byte) uint64

A HashFunc maps a token to an integer.

func DefaultHash ¶

func DefaultHash() HashFunc

DefaultHash returns a sensible (but unspecified) hash function for TextDiff.

type Op ¶

type Op int

const (
	OpA, OpEq, OpB Op = -1, 0, 1
)

func Uint64 ¶

func Uint64(a, b []uint64) []Op

Diff calculates a minimal diff between a and b as a series of operations. See the example for how to interpret the result.

Example ¶

package main

import (
	"fmt"

	"github.com/Merovius/diff"
)

func main() {
	a := []uint64{1, 1, 1, 3, 4, 4}
	b := []uint64{0, 1, 0, 1, 0, 3, 1, 4, 5, 4, 6}
	for _, o := range diff.Uint64(a, b) {
		switch o {
		case diff.OpA:
			fmt.Printf("-%d\n", a[0])
			a = a[1:]
		case diff.OpEq:
			fmt.Printf(" %d\n", a[0])
			a, b = a[1:], b[1:]
		case diff.OpB:
			fmt.Printf("+%d\n", b[0])
			b = b[1:]
		}
	}
}

Output:

+0
 1
+0
 1
+0
+3
 1
-3
 4
+5
 4
+6

type SplitFunc ¶

type SplitFunc func(b []byte) (tok, skip int)

A SplitFunc splits a token from b. tok specifies the length of the next token and skip specifies how many bytes to skip after the token. If neither of them is positive or either of them is negative, TextDiff will panic.

type TextDelta ¶

type TextDelta struct {
	Op
	Text []byte
}

TextDelta describes a line of the resulting diff.

func Lines ¶

func Lines(a, b []byte) []TextDelta

LineDiff is equivalent to TextDiff(a, b, nil, nil).

func Text ¶

func Text(a, b []byte, s SplitFunc, h HashFunc) []TextDelta

TextDiff calculates a diff between a and b. s is used to separate them into tokens and h is used to map those to integers. If s is nil, SplitLines is used. If h is nil, DefaultHash is used.

The resulting diff will contain separate TextDelta values per token (even if consecutive elements use the same Op). See the example for how to use construct the diff from it.

In case of an EqOp delta where the corresponding tokens of a and b differ (but hash to the same value), it is unspecified which of the two is returned.

Example ¶

package main

import (
	"fmt"

	"github.com/Merovius/diff"
)

func main() {
	a := []byte("a\nb\nc\nd\nf\ng\nh\nj\nq\nz")
	b := []byte("a\nb\nc\nd\ne\nf\ng\ni\nj\nk\nr\nx\ny\nz")
	for _, δ := range diff.Text(a, b, nil, nil) {
		switch δ.Op {
		case diff.OpA:
			fmt.Printf("- %s\n", δ.Text)
		case diff.OpEq:
			fmt.Printf("  %s\n", δ.Text)
		case diff.OpB:
			fmt.Printf("+ %s\n", δ.Text)
		}
	}
}

Output:

  a
  b
  c
  d
+ e
  f
  g
- h
+ i
  j
- q
+ k
+ r
+ x
+ y
  z

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL