diff

package module
v0.0.0-...-b13b7c6 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 30, 2025 License: Apache-2.0 Imports: 5 Imported by: 1

README

znkr.io/diff

Go Reference Go Report Card

A high-performance difference algorithm module for Go.

Difference algorithms compare two inputs and find the edits that transform one to the other. This is very useful to understand changes, for example when comparing a test result with the expected result or to understand which changes have been made to a file.

This module provides diffing for arbitrary Go slices and text.

Installation

To use this module in your Go project, run:

go get znkr.io/diff

API Documentation

Full documentation available at pkg.go.dev/znkr.io/diff.

Examples

Comparing Slices

Diffing two slices produces either the full list of edits

x := strings.Fields("calm seas reflect the sky")
y := strings.Fields("restless seas reflect the sky defiantly")
edits := diff.Edits(x, y)
for i, edit := range edits {
    if i > 0 {
        fmt.Print(" ")
    }
    switch edit.Op {
    case diff.Match:
        fmt.Printf("%s", edit.X)
    case diff.Delete:
        fmt.Printf("[-%s-]", edit.X)
    case diff.Insert:
        fmt.Printf("{+%s+}", edit.Y)
    default:
        panic("never reached")
    }
}
// Output:
// [-calm-] {+restless+} seas reflect the sky {+defiantly+}

or a list of hunks representing consecutive edits

x := strings.Fields("calm seas reflect the sky")
y := strings.Fields("restless seas reflect the sky defiantly")
hunks := diff.Hunks(x, y, diff.Context(1))
for i, h := range hunks {
    if i > 0 {
        fmt.Print(" … ")
    }
    for i, edit := range h.Edits {
        if i > 0 {
            fmt.Print(" ")
        }
        switch edit.Op {
        case diff.Match:
            fmt.Printf("%s", edit.X)
        case diff.Delete:
            fmt.Printf("[-%s-]", edit.X)
        case diff.Insert:
            fmt.Printf("{+%s+}", edit.Y)
        default:
            panic("never reached")
        }
    }
}
// Output:
// [-calm-] {+restless+} seas … sky {+defiantly+}

For both functions, a ...Func variant exists that works with arbitrary slices by taking an equality function.

Comparing Text

Because of its importance, comparing text line by line has special support and produces output in the unified diff format:

x := `this paragraph
is not
changed and
barely long
enough to
create a
new hunk

this paragraph
is going to be
removed
`

y := `this is a new paragraph
that is inserted at the top

this paragraph
is not
changed and
barely long
enough to
create a
new hunk
`
fmt.Print(textdiff.Unified(x, y))
// Output:
// @@ -1,3 +1,6 @@
// +this is a new paragraph
// +that is inserted at the top
// +
//  this paragraph
//  is not
//  changed and
// @@ -5,7 +8,3 @@
//  enough to
//  create a
//  new hunk
// -
// -this paragraph
// -is going to be
// -removed

Stability

Status: Beta - This project is in beta, pending API reviews and general feedback, both are very welcome.

As a general rule, the exact diff output will never be guaranteed to be stable: I expect that performance and quality improvements will always be possible and they will likely change the output of a diff. Therefore, committing to a stable diff result would be too limiting.

Diff Readability

Diffs produced by this module are intended to be readable by humans.

Readable diffs have been the subject of a lot of discussions and have even resulted in some new diffing algorithms like the patience or histogram algorithms in git. However, the best work about diff readability by far is diff-slider-tools by Michael Haggerty. He implemented a heuristic that's applied in a post-processing step to improve the readability. This module implements this heuristic in the textdiff package.

For example:

x := `// ...
["foo", "bar", "baz"].map do |i|
  i.upcase
end
`

y := `// ...
["foo", "bar", "baz"].map do |i|
  i
end

["foo", "bar", "baz"].map do |i|
  i.upcase
end
`

fmt.Println("With textdiff.IndentHeuristic:")
fmt.Print(textdiff.Unified(x, y, textdiff.IndentHeuristic()))
fmt.Println()
fmt.Println("Without textdiff.IndentHeuristic:")
fmt.Print(textdiff.Unified(x, y))
// Output:
// With textdiff.IndentHeuristic:
// @@ -1,4 +1,8 @@
//  // ...
// +["foo", "bar", "baz"].map do |i|
// +  i
// +end
// +
//  ["foo", "bar", "baz"].map do |i|
//    i.upcase
//  end
//
// Without textdiff.IndentHeuristic:
// @@ -1,4 +1,8 @@
//  // ...
//  ["foo", "bar", "baz"].map do |i|
// +  i
// +end
// +
// +["foo", "bar", "baz"].map do |i|
//    i.upcase
//  end

Performance

By default, the underlying diff algorithm used is Myers' algorithm augmented by a number of heuristics to speed up the algorithm in exchange for non-minimal diffs. The diff.Optimal option is provided to skip these heuristics to get a minimal diff independent of the costs and diff.Fast to use a fast heuristic to get a non-minimal diff as fast as possible.

On an M1 Mac, the default settings almost always result in runtimes < 1 ms, but truly large diffs (e.g. caused by changing generators for generated files) can result in runtimes of almost 100 ms. Below is the distribution of runtimes applying textdiff.Unified to every commit in the Go repository (y-axis is in log scale):

histogram of textdiff.Unified runtime

Comparison with other Implementations

Comparing the performance with other Go modules that implement the same features is always interesting, because it can surface missed optimization opportunities. This is especially interesting for larger inputs where superlinear growth can become a problem. Below are benchmarks of znkr.io/diff against other popular Go diff modules:

  • znkr: Default configuration with performance optimizations enabled
  • znkr-optimal: With diff.Optimal() option for minimal diffs
  • znkr-fast: With diff.Fast() option for fastest possible diffing
  • go-internal: Patience diff algorithm from github.com/rogpeppe/go-internal
  • diffmatchpatch: Implementation from github.com/sergi/go-diff

Note: It's possible that the benchmark is using diffmatchpatch incorrectly, the benchmark numbers certainly look suspiciously high. However, the way it's used in the benchmark is used in at least one large open source project.

Runtime Performance (seconds per operation)

On the benchmarks used for this comparison znkr.io/diff almost always outperforms the other implementations. However, there's one case where go-internal is significantly faster, but the resulting diff is 10% larger (see numbers below).

Test Case znkr (baseline) znkr-optimal znkr-fast go-internal diffmatchpatch
large_01 2.605ms 10.813ms
(+315.06%)
2.606ms
(±0%)
4.888ms
(+87.60%)
42.546ms
(+1533.09%)
large_02 20.611ms 49.379ms
(+139.58%)
1.805ms
(-91.24%)
4.190ms
(-79.67%)
628.690ms
(+2950.32%)
large_03 3.054ms 14.910ms
(+388.18%)
3.129ms
(±0%)
4.789ms
(+56.80%)
30.803ms
(+908.56%)
large_04 6.707ms 251.801ms
(+3654.45%)
5.749ms
(-14.28%)
8.735ms
(+30.24%)
1011.684ms
(+14984.61%)
medium 25.00µs 24.72µs
(±0%)
27.46µs
(+9.86%)
67.55µs
(+170.24%)
247.63µs
(+890.72%)
small 17.15µs 16.99µs
(±0%)
16.71µs
(-2.56%)
39.48µs
(+130.15%)
72.75µs
(+324.13%)
Diff Minimality (number of edits produced)
Test Case znkr (baseline) znkr-optimal znkr-fast go-internal diffmatchpatch
large_01 5.615k edits 5.615k edits
(±0%)
5.615k edits
(±0%)
5.617k edits
(+0.04%)
5.616k edits
(+0.02%)
large_02 28.87k edits 28.83k edits
(-0.15%)
31.80k edits
(+10.15%)
31.81k edits
(+10.17%)
28.83k edits
(-0.14%)
large_03 5.504k edits 5.504k edits
(±0%)
5.504k edits
(±0%)
5.506k edits
(+0.04%)
5.505k edits
(+0.02%)
large_04 26.99k edits 26.99k edits
(-0.01%)
27.80k edits
(+2.99%)
27.80k edits
(+2.99%)
60.36k edits
(+123.65%)
medium 277 edits 277 edits
(±0%)
277 edits
(±0%)
283 edits
(+2.17%)
278 edits
(+0.36%)
small 108 edits 108 edits
(±0%)
114 edits
(+5.56%)
120 edits
(+11.11%)
109 edits
(+0.93%)

Correctness

I tested this diff implementation against every commit in the Go repository using the standard unix patch tool to ensure that all diff results are correct.

This test is part of the test suite for this module and can be run with

go run ./internal/cmd/eval -repo <repo>

License

This module is distributed under the Apache License, Version 2.0, see LICENSE for more information.

Documentation

Overview

Package diff provides functions to efficiently compare two slices similar to the Unix diff command line tool to compare files.

The main functions are Hunks, which groups changes into contextual blocks, and Edits, which returns every individual change. By default, the algorithms are optimized for performance and may use heuristics for very large inputs. Use Optimal to disable these heuristics when you need the shortest possible diff.

Performance: Default complexity is O(N^1.5 log N) time and O(N) space. With Optimal, time complexity becomes O(ND) where N = len(x) + len(y) and D is the number of edits.

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Edit

type Edit[T any] struct {
	Op   Op
	X, Y T
}

Edit describes a single edit of a diff.

  • For Match, both X and Y contain the matching element.
  • For Delete, X contains the deleted element and Y is unset (zero value).
  • For Insert, Y contains the inserted element and X is unset (zero value).

func Edits

func Edits[T comparable](x, y []T, opts ...Option) []Edit[T]

Edits compares the contents of x and y and returns the changes necessary to convert from one to the other.

Edits returns one edit for every element in the input slices. If x and y are identical, the output will consist of a match edit for every input element.

The following option is supported: diff.Optimal, diff.Fast

Important: The output is not guaranteed to be stable and may change with minor version upgrades. DO NOT rely on the output being stable.

Example (Runes)

Compare two strings rune by rune.

package main

import (
	"fmt"

	"znkr.io/diff"
)

func main() {
	x := []rune("Hello, World")
	y := []rune("Hello, 世界")
	edits := diff.Edits(x, y)
	for _, edit := range edits {
		switch edit.Op {
		case diff.Match:
			fmt.Printf("%s", string(edit.X))
		case diff.Delete:
			fmt.Printf("[-%s-]", string(edit.X))
		case diff.Insert:
			fmt.Printf("{+%s+}", string(edit.Y))
		default:
			panic("never reached")
		}
	}
}
Output:

Hello, [-W-][-o-][-r-][-l-][-d-]{+世+}{+界+}
Example (Words)

Compare two strings word by word.

package main

import (
	"fmt"
	"strings"

	"znkr.io/diff"
)

func main() {
	x := strings.Fields("calm seas reflect the sky")
	y := strings.Fields("restless seas reflect the sky defiantly")
	edits := diff.Edits(x, y)
	for i, edit := range edits {
		if i > 0 {
			fmt.Print(" ")
		}
		switch edit.Op {
		case diff.Match:
			fmt.Printf("%s", edit.X)
		case diff.Delete:
			fmt.Printf("[-%s-]", edit.X)
		case diff.Insert:
			fmt.Printf("{+%s+}", edit.Y)
		default:
			panic("never reached")
		}
	}
}
Output:

[-calm-] {+restless+} seas reflect the sky {+defiantly+}

func EditsFunc

func EditsFunc[T any](x, y []T, eq func(a, b T) bool, opts ...Option) []Edit[T]

EditsFunc compares the contents of x and y using the provided equality comparison and returns the changes necessary to convert from one to the other.

EditsFunc returns edits for every element in the input. If both x and y are identical, the output will consist of a match edit for every input element.

The following option is supported: diff.Optimal

Note that this function has generally worse performance than Edits for diffs with many changes.

Important: The output is not guaranteed to be stable and may change with minor version upgrades. DO NOT rely on the output being stable.

type Hunk

type Hunk[T any] struct {
	PosX, EndX int       // Start and end position in x.
	PosY, EndY int       // Start and end position in y.
	Edits      []Edit[T] // Edits to transform x[PosX:EndX] to y[PosY:EndY]
}

Hunk describes a sequence of consecutive edits.

func Hunks

func Hunks[T comparable](x, y []T, opts ...Option) []Hunk[T]

Hunks compares the contents of x and y and returns the changes necessary to convert from one to the other.

The output is a sequence of hunks. A hunk represents a contiguous block of changes (insertions and deletions) along with some surrounding context. The amount of context can be configured using Context.

If x and y are identical, the output has length zero.

The following options are supported: diff.Context, diff.Optimal, diff.Fast

Important: The output is not guaranteed to be stable and may change with minor version upgrades. DO NOT rely on the output being stable.

Example (PseudoUnified)

Compare to strings line by line and output the difference as a pseudo-unified diff output (i.e. it's similar to what diff -u would produce). The format is not a correct unified diff though, in particular line endings (esp. at the end of the input) are handled differently.

More generally, comparing text line by line is better solved with the textdiff subpackage.

package main

import (
	"fmt"
	"strings"

	"znkr.io/diff"
)

func main() {
	x := `this paragraph
is not
changed and
barely long
enough to
create a
new hunk

this paragraph
is going to be
removed`

	y := `this is a new paragraph
that is inserted at the top

this paragraph
is not
changed and
barely long
enough to
create a
new hunk`

	xlines := strings.Split(x, "\n")
	ylines := strings.Split(y, "\n")
	hunks := diff.Hunks(xlines, ylines)
	for _, h := range hunks {
		fmt.Printf("@@ -%d,%d +%d,%d @@\n", h.PosX+1, h.EndX-h.PosX, h.PosY+1, h.EndY-h.PosY)
		for _, edit := range h.Edits {
			switch edit.Op {
			case diff.Match:
				fmt.Printf(" %s\n", edit.X)
			case diff.Delete:
				fmt.Printf("-%s\n", edit.X)
			case diff.Insert:
				fmt.Printf("+%s\n", edit.Y)
			default:
				panic("never reached")
			}
		}
	}
}
Output:

@@ -1,3 +1,6 @@
+this is a new paragraph
+that is inserted at the top
+
 this paragraph
 is not
 changed and
@@ -5,7 +8,3 @@
 enough to
 create a
 new hunk
-
-this paragraph
-is going to be
-removed

func HunksFunc

func HunksFunc[T any](x, y []T, eq func(a, b T) bool, opts ...Option) []Hunk[T]

HunksFunc compares the contents of x and y using the provided equality comparison and returns the changes necessary to convert from one to the other.

The output is a sequence of hunks that each describe a number of consecutive edits. Hunks include a number of matching elements before and after the last delete or insert operation. The number of elements can be configured using Context.

If x and y are identical, the output has length zero.

The following options are supported: diff.Context, diff.Optimal

Note that this function has generally worse performance than Hunks for diffs with many changes.

Important: The output is not guaranteed to be stable and may change with minor version upgrades. DO NOT rely on the output being stable.

type Op

type Op int

Op describes an edit operation.

const (
	Match  Op = iota // Two slice elements match
	Delete           // A deletion from an element on the left slice
	Insert           // An insertion of an element from the right side
)

func (Op) String

func (i Op) String() string

type Option

type Option = config.Option

Option configures the behavior of comparison functions.

func Context

func Context(n int) Option

Context sets the number of unchanged elements to include around each hunk. The default is 3.

Context anchors diffs in the surrounding context in addition to position information. For example, with Context(2), you'll see 2 unchanged elements before and after each group of changes.

Only supported by functions that return hunks.

Example
package main

import (
	"fmt"
	"strings"

	"znkr.io/diff"
)

func main() {
	x := strings.Fields("calm seas reflect the sky")
	y := strings.Fields("restless seas reflect the sky defiantly")
	hunks := diff.Hunks(x, y, diff.Context(1))
	for i, h := range hunks {
		if i > 0 {
			fmt.Print(" … ")
		}
		for i, edit := range h.Edits {
			if i > 0 {
				fmt.Print(" ")
			}
			switch edit.Op {
			case diff.Match:
				fmt.Printf("%s", edit.X)
			case diff.Delete:
				fmt.Printf("[-%s-]", edit.X)
			case diff.Insert:
				fmt.Printf("{+%s+}", edit.Y)
			default:
				panic("never reached")
			}
		}
	}
}
Output:

[-calm-] {+restless+} seas … sky {+defiantly+}

func Fast

func Fast() Option

Fast uses a heuristic to find a reasonable diff instead of trying to find a minimal diff.

This option trades diff minimality for runtime performance. The resulting diff can be a lot larger than the diff created by default. The speedup from using Fast only really manifests for relatively few, very large inputs because the default already use the underlying heuristic to speed up large inputs.

The heuristic only works for comparable types.

Performance impact: This option changes the complexity to O(N log N).

func Optimal

func Optimal() Option

Optimal ensures the diff algorithm finds the shortest possible diff by disabling performance heuristics.

By default, the diff functions use heuristics to speed up computation for large inputs with many changes, which may produce slightly longer diffs. Use this option when you need the absolute shortest diff, at the cost of potentially slower performance.

Performance impact: Changes time complexity from O(N^1.5 log N) to O(ND) where N = len(x) + len(y) and D is the number of differences.

Directories

Path Synopsis
internal
byteview
Package byteview provides a mechanism to handle strings and []byte as immutable byte views.
Package byteview provides a mechanism to handle strings and []byte as immutable byte views.
cmd/eval command
eval provides a way to validate the diffing algorithm by applying the resulting diffs using the unix patch tool and checking that they produce the input again.
eval provides a way to validate the diffing algorithm by applying the resulting diffs using the unix patch tool and checking that they produce the input again.
cmd/eval/internal/git
Package git provides a simplified git interface for reading a repository for evaluations
Package git provides a simplified git interface for reading a repository for evaluations
cmd/gitdiff command
gitdiff is a tool that can be used with git using GIT_EXTERNAL_DIFF.
gitdiff is a tool that can be used with git using GIT_EXTERNAL_DIFF.
cmd/specializemyers command
specializemyers is a bit of an abomination, it takes the myers implementation and generates a specialization for int.
specializemyers is a bit of an abomination, it takes the myers implementation and generates a specialization for int.
config
Package config provides shared configuration mechanisms for packages this module.
Package config provides shared configuration mechanisms for packages this module.
impl
Package impl contains an implementation of diff algorithms.
Package impl contains an implementation of diff algorithms.
indentheuristic
Package indentheuristic is an implementation of the indentation heuristic by Michael Haggerty (https://github.com/mhagger/diff-slider-tools).
Package indentheuristic is an implementation of the indentation heuristic by Michael Haggerty (https://github.com/mhagger/diff-slider-tools).
rvecs
Package rvecs contains functions to work with the result vectors, the internal representation that's used by the myers algorithm and is then translated to a user facing API.
Package rvecs contains functions to work with the result vectors, the internal representation that's used by the myers algorithm and is then translated to a user facing API.
unixpatch
Package unixpatch provides a simple wrapper around the unix patch tool.
Package unixpatch provides a simple wrapper around the unix patch tool.
Package textdiff provides functions to efficiently compare text line-by-line.
Package textdiff provides functions to efficiently compare text line-by-line.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL