Documentation
¶
Overview ¶
Package deepdiff is a structured data differ that aims for near-linear time complexity. It's intended to calculate differences & apply patches to structured data ranging from 0-500MBish of encoded JSON
Diffing structured data carries additional complexity when compared to the standard unix diff utility, which operates on lines of text. By using the structure of data itself, deepdiff is able to provide a rich description of changes that maps onto the structure of the data itself. deepdiff ignores semantically irrelevant changes like whitespace, and can isolate changes like column changes to tabular data to only the relevant switches
Most algorithms in this space have quadratic time complexity, which from our testing makes them very slow on 3MB JSON documents and unable to complete on 5MB or more. deepdiff currently hovers around the 0.9Sec/MB range on 4 core processors
Instead of operating on JSON directly, deepdiff operates on document trees consisting of the go types created by unmarshaling from JSON, which aretwo complex types:
map[string]interface{} []interface{}
and five scalar types:
string, int, float64, bool, nil
by operating on native go types deepdiff can compare documents encoded in different formats, for example decoded CSV or CBOR.
deepdiff is based off an algorithm designed for diffing XML documents outlined in: Detecting Changes in XML Documents by Grégory Cobéna & Amélie Marian https://ieeexplore.ieee.org/document/994696 it's been adapted to fit purposes of diffing for Qri: https://github.com/qri-io/qri the guiding use case for this work
deepdiff also includes a tool for applying patches, see documentation for details
Example ¶
// we'll use the background as our execution context ctx := context.Background() // start with two slightly different json documents aJSON := []byte(`{ "a": 100, "baz": { "a": { "d": "apples-and-oranges" } } }`) bJSON := []byte(`{ "a": 99, "baz": { "a": { "d": "apples-and-oranges" }, "e": "thirty-thousand-something-dogecoin" } }`) // unmarshal the data into generic interfaces var a, b interface{} if err := json.Unmarshal(aJSON, &a); err != nil { panic(err) } if err := json.Unmarshal(bJSON, &b); err != nil { panic(err) } // create a differ, using the default configuration dd := New() // Diff will produce a slice of Deltas that describe the structured changes. // by default Diff will not calculate moves, only inserts, deletes, and // updates diffs, err := dd.Diff(ctx, a, b) if err != nil { panic(err) } // diffs use a custom compact JSON Marshaller output, err := json.MarshalIndent(diffs, "", " ") if err != nil { panic(err) } fmt.Println(string(output))
Output: [ [ "-", "a", 100 ], [ "+", "a", 99 ], [ " ", "baz", null, [ [ " ", "a", { "d": "apples-and-oranges" } ], [ "+", "e", "thirty-thousand-something-dogecoin" ] ] ] ]
Index ¶
- Constants
- Variables
- func FormatPretty(w io.Writer, changes Deltas, colorTTY bool) error
- func FormatPrettyStats(w io.Writer, diffStat *Stats, colorTTY bool)
- func FormatPrettyStatsString(diffStat *Stats, colorTTY bool) string
- func FormatPrettyString(changes Deltas, colorTTY bool) (string, error)
- func Patch(deltas Deltas, target interface{}) error
- type Addr
- type Config
- type DeepDiff
- type Delta
- type Deltas
- type DiffOption
- type IndexAddr
- type Operation
- type RootAddr
- type Stats
- type StringAddr
Examples ¶
Constants ¶
const ( // DTContext indicates unchanged contextual details present in both A and B DTContext = Operation(" ") // DTDelete means making the children of a node // become the children of a node's parent DTDelete = Operation("-") // DTInsert is the compliment of deleting, adding // children of a parent node to a new node, and making // that node a child of the original parent DTInsert = Operation("+") // DTUpdate is an alteration of a scalar data type (string, bool, float, etc) DTUpdate = Operation("~") )
Variables ¶
var NewHash = func() hash.Hash { return fnv.New64() }
NewHash returns a new hash interface, wrapped in a function for easy hash algorithm switching, package consumers can override NewHash with their own desired hash.Hash implementation if the value space is particularly large. default is 64-bit FNV 1 for fast, cheap, (non-cryptographic) hashing
Functions ¶
func FormatPretty ¶
FormatPretty writes a text report to w. if colorTTY is true it will add red "-" for deletions green "+" for insertions blue "~" for changes (an insert & delete at the same path) This is very much a work in progress
func FormatPrettyStats ¶
FormatPrettyStats writes stats info to a supplied writer destination, optionally adding terminal color tags
func FormatPrettyStatsString ¶ added in v0.2.0
FormatPrettyStatsString prints a string of stats info
func FormatPrettyString ¶ added in v0.2.0
FormatPrettyString is a convenience wrapper that outputs to a string instead of an io.Writer
Types ¶
type Addr ¶ added in v0.2.0
Addr is a single location within a data structure. Multiple path elements can be stitched together into a single
type Config ¶ added in v0.2.0
type Config struct { // Setting CalcChanges to true will have diff represent in-place value shifts // as changes instead of add-delete pairs CalcChanges bool }
Config are any possible configuration parameters for calculating diffs
type DeepDiff ¶ added in v0.2.0
type DeepDiff struct {
// contains filtered or unexported fields
}
DeepDiff is a configuration for performing diffs
func (*DeepDiff) Diff ¶ added in v0.2.0
Diff computes a slice of deltas that define an edit script for turning a into b. currently Diff will never return an error, error returns are reserved for future use. specifically: bailing before delta calculation based on a configurable threshold
type Delta ¶
type Delta struct { // the type of change Type Operation `json:"type"` // Path is a string representation of the patch to where the delta operation // begins in the destination documents // path should conform to the IETF JSON-pointer specification, outlined // in RFC 6901: https://tools.ietf.org/html/rfc6901 Path Addr `json:"path"` // The value to change in the destination document Value interface{} `json:"value"` // To make delta's revesible, original values are included // the original path this change from SourcePath string `json:"SourcePath,omitempty"` // the original value this was changed from, will not always be present SourceValue interface{} `json:"originalValue,omitempty"` // Child Changes Deltas `json:"deltas,omitempty"` }
Delta represents a change between a source & destination document a delta is a single "edit" that describes changes to the destination document
func (*Delta) MarshalJSON ¶ added in v0.2.0
MarshalJSON implements a custom JOSN Marshaller
type Deltas ¶ added in v0.2.0
type Deltas []*Delta
Deltas is a sortable slice of changes
type DiffOption ¶
type DiffOption func(cfg *Config)
DiffOption is a function that adjust a config, zero or more DiffOptions can be passed to the Diff function
type IndexAddr ¶ added in v0.2.0
type IndexAddr int
IndexAddr is the address of a location within list-type structures
func (IndexAddr) MarshalJSON ¶ added in v0.2.0
MarshalJSON implements the json.Marshaller interface
type RootAddr ¶ added in v0.2.0
type RootAddr struct{}
RootAddr is a nihlic address, or a reference to the outside address space
func (RootAddr) MarshalJSON ¶ added in v0.2.0
MarshalJSON writes "null". RootAddress is represented as null in JSON
type Stats ¶
type Stats struct { Left int `json:"leftNodes"` // count of nodes in the left tree Right int `json:"rightNodes"` // count of nodes in the right tree LeftWeight int `json:"leftWeight"` // byte-ish count of left tree RightWeight int `json:"rightWeight"` // byte-ish count of right tree Inserts int `json:"inserts,omitempty"` // number of nodes inserted Updates int `json:"updates,omitempty"` // number of nodes updated Deletes int `json:"deletes,omitempty"` // number of nodes deleted }
Stats holds statistical metadata about a diff
func (Stats) NodeChange ¶
NodeChange returns a count of the shift between left & right trees
func (Stats) PctWeightChange ¶
PctWeightChange returns a value from -1.0 to max(float64) representing the size shift between left & right trees
type StringAddr ¶ added in v0.2.0
type StringAddr string
StringAddr is an arbitrary key representation within a data structure. Most-often used to represent map keys
func (StringAddr) Eq ¶ added in v0.2.0
func (p StringAddr) Eq(b Addr) bool
Eq tests for equality with another address
func (StringAddr) MarshalJSON ¶ added in v0.2.0
func (p StringAddr) MarshalJSON() ([]byte, error)
MarshalJSON implements the json.Marshaller interface
func (StringAddr) String ¶ added in v0.2.0
func (p StringAddr) String() string
String returns this address as a string
func (StringAddr) Value ¶ added in v0.2.0
func (p StringAddr) Value() interface{}
Value returns StringAddr as a string, a common go type