Documentation ¶
Overview ¶
Package deepdiff is a structured data differ that aims for near-linear time complexity. It is intended to calculate differences and apply patches to structured data ranging from 0 to roughly 500MB of encoded JSON.
Diffing structured data carries additional complexity compared to the standard Unix diff utility, which operates on lines of text. By using the structure of the data, deepdiff can provide a rich description of changes that maps onto that structure. deepdiff ignores semantically irrelevant changes like whitespace, and can isolate changes such as column changes in tabular data to only the relevant switches.
Most algorithms in this space have quadratic time complexity, which in our testing makes them very slow on 3MB JSON documents and unable to complete on documents of 5MB or more. deepdiff currently hovers around 0.9 seconds per MB on 4-core processors.
Instead of operating on JSON directly, deepdiff operates on document trees consisting of the Go types created by unmarshaling from JSON, which are two complex types:
map[string]interface{}, []interface{}
and five scalar types:
string, int, float64, bool, nil
By operating on native Go types, deepdiff can compare documents encoded in different formats, for example decoded CSV or CBOR.
deepdiff is based on an algorithm designed for diffing XML documents, outlined in Detecting Changes in XML Documents by Grégory Cobéna & Amélie Marian (https://ieeexplore.ieee.org/document/994696). It has been adapted to fit the purposes of diffing for Qri (https://github.com/qri-io/qri), the guiding use case for this work.
deepdiff also includes a tool for applying patches; see the documentation for details.
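As a rough illustration of what such a document tree looks like, the sketch below decodes JSON with the standard library and walks the result with a type switch. The helper names `decode` and `countNodes` are hypothetical and not part of deepdiff. Note that `encoding/json` decodes every JSON number into `float64` when the target is `interface{}`, so an `int` in a tree would have to come from another decoder or from hand-built data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decode is a hypothetical helper that unmarshals JSON into native Go types.
func decode(data string) (interface{}, error) {
	var v interface{}
	err := json.Unmarshal([]byte(data), &v)
	return v, err
}

// countNodes walks a decoded document tree and counts every node,
// including the containers themselves.
func countNodes(v interface{}) int {
	switch x := v.(type) {
	case map[string]interface{}:
		n := 1
		for _, child := range x {
			n += countNodes(child)
		}
		return n
	case []interface{}:
		n := 1
		for _, child := range x {
			n += countNodes(child)
		}
		return n
	default:
		// Scalars: string, float64, bool, nil (and int, if a decoder produces it).
		return 1
	}
}

func main() {
	doc, err := decode(`{"a": [1, "two", true, null], "b": {"c": 3.5}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println("nodes:", countNodes(doc))
}
```

Any decoder that produces these same Go types (a CSV reader emitting `[]interface{}` rows, for instance) yields a tree that can be walked the same way.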
Index ¶
- Constants
- Variables
- func FormatPretty(changes []*Delta) (string, error)
- func FormatPrettyColor(changes []*Delta) (string, error)
- func FormatPrettyStats(diffStat *Stats) string
- func FormatPrettyStatsColor(diffStat *Stats) string
- func Patch(v interface{}, patch []*Delta) (err error)
- type Delta
- type DiffConfig
- type DiffOption
- type Operation
- type Stats
Constants ¶
const (
	// DTDelete means making the children of a node
	// become the children of a node's parent
	DTDelete = Operation("delete")
	// DTInsert is the complement of deleting, adding
	// children of a parent node to a new node, and making
	// that node a child of the original parent
	DTInsert = Operation("insert")
	// DTMove is the succession of a deletion & insertion
	// of the same node
	DTMove = Operation("move")
	// DTUpdate is an alteration of a scalar data type (string, bool, float, etc)
	DTUpdate = Operation("update")
)
Variables ¶
var NewHash = func() hash.Hash { return fnv.New64() }
NewHash returns a new hash interface, wrapped in a function for easy hash algorithm switching. Package consumers can override NewHash with their own hash.Hash implementation if the value space is particularly large. The default is 64-bit FNV-1 for fast, cheap, non-cryptographic hashing.
Functions ¶
func FormatPretty ¶
FormatPretty converts a []*Delta into a colored text report, with:
- red "-" for deletions
- green "+" for insertions
- blue "~" for changes (an insert & delete at the same path)
This is very much a work in progress.
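To illustrate only the marker convention, here is a minimal sketch of a formatter that emits one uncolored line per change. It does not reproduce the package's actual output format, which isn't shown here. The `delta` type is a reduced local stand-in for Delta, `formatPlain` is a hypothetical name, and mapping the "update" operation to "~" is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// delta is a local stand-in for the package's Delta type, reduced to the
// fields this sketch needs.
type delta struct {
	Type  string
	Path  string
	Value interface{}
}

// marker returns the one-character prefix for an operation. Mapping
// "update" to "~" is an assumption of this sketch.
func marker(op string) string {
	switch op {
	case "delete":
		return "-"
	case "insert":
		return "+"
	case "update":
		return "~"
	default:
		return "?"
	}
}

// formatPlain is a hypothetical formatter: one line per delta, no color.
func formatPlain(changes []delta) string {
	var b strings.Builder
	for _, d := range changes {
		fmt.Fprintf(&b, "%s %s: %v\n", marker(d.Type), d.Path, d.Value)
	}
	return b.String()
}

func main() {
	fmt.Print(formatPlain([]delta{
		{Type: "insert", Path: "/a", Value: 1},
		{Type: "delete", Path: "/b", Value: "x"},
	}))
}
```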
func FormatPrettyColor ¶
FormatPrettyColor is the same as FormatPretty, but with TTY color tags to print colored text to terminals.
func FormatPrettyStats ¶
FormatPrettyStats prints a string of stats info
func FormatPrettyStatsColor ¶
FormatPrettyStatsColor prints a string of stats info with ANSI colors
Types ¶
type Delta ¶
type Delta struct {
	// the type of change
	Type Operation `json:"type"`
	// Path is a string representation of the path to where the delta operation
	// begins in the destination document
	// path should conform to the IETF JSON-pointer specification, outlined
	// in RFC 6901: https://tools.ietf.org/html/rfc6901
	Path string `json:"path"`
	// The value to change in the destination document
	Value interface{} `json:"value"`

	// To make deltas reversible, original values are included
	// the original path this change is from
	SourcePath string `json:"SourcePath,omitempty"`
	// the original value this was changed from, will not always be present
	SourceValue interface{} `json:"originalValue,omitempty"`
}
Delta represents a change between a source & destination document. A delta is a single "edit" that describes changes to the destination document.
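Because Path follows RFC 6901, it can be resolved against a decoded document tree by splitting on "/" and unescaping each token ("~1" becomes "/", then "~0" becomes "~"). The `resolve` function below is a hypothetical helper written for illustration and is not part of deepdiff. It simplifies array handling: it doesn't support the "-" token and is more lenient about index syntax than the RFC.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolve looks up an RFC 6901 JSON pointer in a decoded document tree.
func resolve(doc interface{}, pointer string) (interface{}, error) {
	if pointer == "" {
		return doc, nil // the empty pointer refers to the whole document
	}
	if !strings.HasPrefix(pointer, "/") {
		return nil, fmt.Errorf("pointer %q must start with '/'", pointer)
	}
	cur := doc
	for _, tok := range strings.Split(pointer[1:], "/") {
		// Unescape per RFC 6901: "~1" first, then "~0".
		tok = strings.ReplaceAll(tok, "~1", "/")
		tok = strings.ReplaceAll(tok, "~0", "~")
		switch node := cur.(type) {
		case map[string]interface{}:
			child, ok := node[tok]
			if !ok {
				return nil, fmt.Errorf("key %q not found", tok)
			}
			cur = child
		case []interface{}:
			i, err := strconv.Atoi(tok)
			if err != nil || i < 0 || i >= len(node) {
				return nil, fmt.Errorf("invalid array index %q", tok)
			}
			cur = node[i]
		default:
			return nil, fmt.Errorf("cannot descend into scalar at %q", tok)
		}
	}
	return cur, nil
}

func main() {
	doc := map[string]interface{}{
		"rows": []interface{}{
			map[string]interface{}{"name": "a"},
			map[string]interface{}{"name": "b"},
		},
	}
	v, err := resolve(doc, "/rows/1/name")
	fmt.Println(v, err)
}
```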
func Diff ¶
func Diff(d1, d2 interface{}, opts ...DiffOption) ([]*Delta, error)
Diff computes a slice of deltas that define an edit script for turning the value at d1 into d2. Currently Diff never returns an error; the error return is reserved for future use, specifically bailing before delta calculation based on a configurable threshold.
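To make the idea of an edit script concrete, here is a deliberately naive sketch that is not the package's algorithm. It compares only the top-level keys of two maps and emits insert, delete, and update entries. The `delta` type is a local stand-in for Delta and `naiveDiff` is a hypothetical name. Treating any changed value as an "update", and storing the removed value in a delete's Value field, are choices made for this sketch only.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// delta is a local stand-in for the package's Delta type.
type delta struct {
	Type        string
	Path        string
	Value       interface{}
	SourceValue interface{}
}

// naiveDiff compares the top-level keys of a and b and returns an edit
// script that turns a into b. Keys are visited in sorted order so the
// result is deterministic.
func naiveDiff(a, b map[string]interface{}) []delta {
	seen := map[string]struct{}{}
	for k := range a {
		seen[k] = struct{}{}
	}
	for k := range b {
		seen[k] = struct{}{}
	}
	keys := make([]string, 0, len(seen))
	for k := range seen {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var script []delta
	for _, k := range keys {
		av, inA := a[k]
		bv, inB := b[k]
		path := "/" + k // assumes keys contain no "/" or "~" needing escapes
		switch {
		case inA && !inB:
			script = append(script, delta{Type: "delete", Path: path, Value: av})
		case !inA && inB:
			script = append(script, delta{Type: "insert", Path: path, Value: bv})
		case !reflect.DeepEqual(av, bv):
			script = append(script, delta{Type: "update", Path: path, Value: bv, SourceValue: av})
		}
	}
	return script
}

func main() {
	a := map[string]interface{}{"x": 1.0, "y": "old", "z": true}
	b := map[string]interface{}{"x": 1.0, "y": "new", "w": nil}
	for _, d := range naiveDiff(a, b) {
		fmt.Printf("%s %s\n", d.Type, d.Path)
	}
}
```

A real structural differ also has to descend into nested containers and match subtrees that moved, which is where the hashing and near-linear matching described in the overview come in.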
type DiffConfig ¶
type DiffConfig struct {
	// If true Diff will calculate "moves" that describe changing the parent of
	// a subtree
	MoveDeltas bool
	// Provide a non-nil stats pointer & diff will populate it with data from
	// the diff process
	Stats *Stats
}
DiffConfig holds the configuration parameters for calculating diffs
type DiffOption ¶
type DiffOption func(cfg *DiffConfig)
DiffOption is a function that adjusts a config. Zero or more DiffOptions can be passed to the Diff function.
func OptionSetStats ¶
func OptionSetStats(st *Stats) DiffOption
OptionSetStats sets the passed-in Stats pointer on the config when Diff is called, so that Diff populates it
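DiffConfig, DiffOption, and OptionSetStats together follow Go's functional options pattern. The sketch below mirrors that shape with lowercase local types so it compiles without the package. The `run` function is a hypothetical stand-in for Diff, the numbers it writes into the stats are placeholders, and the zero-value defaults are those of this sketch, not necessarily the package's.

```go
package main

import "fmt"

// stats and diffConfig are local stand-ins for the package's Stats and
// DiffConfig types.
type stats struct{ Left, Right int }

type diffConfig struct {
	MoveDeltas bool
	Stats      *stats
}

// diffOption mirrors DiffOption: a function that adjusts a config.
type diffOption func(cfg *diffConfig)

// optionSetStats mirrors OptionSetStats: it stores the caller's pointer on
// the config so the diff can fill it in.
func optionSetStats(st *stats) diffOption {
	return func(cfg *diffConfig) { cfg.Stats = st }
}

// run is a hypothetical stand-in for Diff. It applies each option in order
// and, if a stats pointer was supplied, writes placeholder numbers to it.
func run(opts ...diffOption) *diffConfig {
	cfg := &diffConfig{}
	for _, opt := range opts {
		opt(cfg)
	}
	if cfg.Stats != nil {
		cfg.Stats.Left, cfg.Stats.Right = 3, 5 // placeholder values
	}
	return cfg
}

func main() {
	st := &stats{}
	run(optionSetStats(st))
	fmt.Printf("left=%d right=%d\n", st.Left, st.Right)
}
```

The variadic parameter keeps the common call short (no options at all) while letting new settings be added later without changing the function signature.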
type Stats ¶
type Stats struct {
	Left  int `json:"leftNodes"`  // count of nodes in the left tree
	Right int `json:"rightNodes"` // count of nodes in the right tree

	LeftWeight  int `json:"leftWeight"`  // byte-ish count of left tree
	RightWeight int `json:"rightWeight"` // byte-ish count of right tree

	Inserts int `json:"inserts,omitempty"` // number of nodes inserted
	Updates int `json:"updates,omitempty"` // number of nodes updated
	Deletes int `json:"deletes,omitempty"` // number of nodes deleted
	Moves   int `json:"moves,omitempty"`   // number of nodes moved
}
Stats holds statistical metadata about a diff
func (Stats) NodeChange ¶
NodeChange returns a count of the shift between left & right trees
func (Stats) PctWeightChange ¶
PctWeightChange returns a value from -1.0 to max(float64) representing the size shift between left & right trees
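The implementations of these two methods aren't shown here. One pair of formulas consistent with the descriptions is sketched below as an assumption: the node change as right minus left, and the weight change as the relative difference, which bottoms out at -1.0 when the right tree is empty and grows without bound as the right tree gets larger. The `stats` type is a reduced local stand-in, and returning 0 for an empty left tree is a choice of this sketch.

```go
package main

import "fmt"

// stats is a local stand-in holding only the fields these formulas need.
type stats struct {
	Left, Right             int
	LeftWeight, RightWeight int
}

// nodeChange is an assumed formula: the difference in node counts.
func nodeChange(s stats) int {
	return s.Right - s.Left
}

// pctWeightChange is an assumed formula: the relative change in weight
// from the left tree to the right tree.
func pctWeightChange(s stats) float64 {
	if s.LeftWeight == 0 {
		return 0 // avoids dividing by zero; a choice made for this sketch
	}
	return float64(s.RightWeight-s.LeftWeight) / float64(s.LeftWeight)
}

func main() {
	s := stats{Left: 4, Right: 7, LeftWeight: 100, RightWeight: 150}
	fmt.Println(nodeChange(s), pctWeightChange(s))
}
```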