diff

package module
v1.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 7, 2020 License: Apache-2.0 Imports: 4 Imported by: 0

README

Documentation

Simple, yet flexible diffing package for Go

This is an implementation of the algorithm from "An Algorithm for Differential File Comparison" by Hunt and McIlroy. I wrote it because I needed something more flexible than the common line-based diff and it seemed to cumbersome to do it as a transformation on the input (and the reverse on the output). As such, I attempted to make it flexible enough for whatever semantics you might need. It's intended to be mostly finished and frozen, but feel free to file issues if you have a problem.

The implementation is intentionally kept close to the paper, with just a little bit of extra convenience provided by Go. As such, the performance is probably "meh" (though I haven't perceived any issues yet) and I don't intend to improve it considerably.

License

Copyright 2020 Axel Wagner

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Documentation

Overview

Package diff calculates the differences between two sequences.

It implements the algorithm from "An Algorithm for Differential File Comparison" by Hunt and McIlroy: https://www.cs.dartmouth.edu/~doug/diff.pdf

For flexibility, the algorithm itself operates on a sequence of integers. This allows you to compare arbitrary sequences, as long as you can map their elements to a uint64.

To generate a diff for text, the inputs need to be split and hashed. Splitting should be done to reduce algorithmic complexity (which is O(m•n•log(m)) in the worst case). It also creates diffs that are better suited for human consumption. Hashing means that collisions are possible, but they should be rare enough in practice to not matter. If they do happen, the resulting diff might be subpoptimal.

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

func SplitLines

func SplitLines(b []byte) (tok, skip int)

SplitLines splits at newlines, stripping them in the process.

Types

type HashFunc

type HashFunc func(b []byte) uint64

A HashFunc maps a token to an integer.

func DefaultHash

func DefaultHash() HashFunc

DefaultHash returns a sensible (but unspecified) hash function for TextDiff.

type Op

type Op int
const (
	OpA, OpEq, OpB Op = -1, 0, 1
)

func Uint64

func Uint64(a, b []uint64) []Op

Diff calculates a minimal diff between a and b as a series of operations. See the example for how to interpret the result.

Example
package main

import (
	"fmt"

	"github.com/Merovius/diff"
)

func main() {
	a := []uint64{1, 1, 1, 3, 4, 4}
	b := []uint64{0, 1, 0, 1, 0, 3, 1, 4, 5, 4, 6}
	for _, o := range diff.Uint64(a, b) {
		switch o {
		case diff.OpA:
			fmt.Printf("-%d\n", a[0])
			a = a[1:]
		case diff.OpEq:
			fmt.Printf(" %d\n", a[0])
			a, b = a[1:], b[1:]
		case diff.OpB:
			fmt.Printf("+%d\n", b[0])
			b = b[1:]
		}
	}
}
Output:

+0
 1
+0
 1
+0
+3
 1
-3
 4
+5
 4
+6

type SplitFunc

type SplitFunc func(b []byte) (tok, skip int)

A SplitFunc splits a token from b. tok specifies the length of the next token and skip specifies how many bytes to skip after the token. If neither of them is positive or either of them is negative, TextDiff will panic.

type TextDelta

type TextDelta struct {
	Op
	Text []byte
}

TextDelta describes a line of the resulting diff.

func Lines

func Lines(a, b []byte) []TextDelta

LineDiff is equivalent to TextDiff(a, b, nil, nil).

func Text

func Text(a, b []byte, s SplitFunc, h HashFunc) []TextDelta

TextDiff calculates a diff between a and b. s is used to separate them into tokens and h is used to map those to integers. If s is nil, SplitLines is used. If h is nil, DefaultHash is used.

The resulting diff will contain separate TextDelta values per token (even if consecutive elements use the same Op). See the example for how to use construct the diff from it.

In case of an EqOp delta where the corresponding tokens of a and b differ (but hash to the same value), it is unspecified which of the two is returned.

Example
package main

import (
	"fmt"

	"github.com/Merovius/diff"
)

func main() {
	a := []byte("a\nb\nc\nd\nf\ng\nh\nj\nq\nz")
	b := []byte("a\nb\nc\nd\ne\nf\ng\ni\nj\nk\nr\nx\ny\nz")
	for _, δ := range diff.Text(a, b, nil, nil) {
		switch δ.Op {
		case diff.OpA:
			fmt.Printf("- %s\n", δ.Text)
		case diff.OpEq:
			fmt.Printf("  %s\n", δ.Text)
		case diff.OpB:
			fmt.Printf("+ %s\n", δ.Text)
		}
	}
}
Output:

  a
  b
  c
  d
+ e
  f
  g
- h
+ i
  j
- q
+ k
+ r
+ x
+ y
  z

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL