Version: v0.0.0-...-78545d0

This package is not in the latest version of its module.

Published: Dec 3, 2020 License: BSD-3-Clause Imports: 7 Imported by: 6



This library provides a binary diff and patch API in Go.

Supported today:
  • Command line utilities to diff and patch binary files
  • Library for fingerprint generation, rolling hash and block matching
  • NEW: For large files, fingerprint generation automatically switches to parallel mode, in which multiple goroutines are used concurrently. For files larger than 20 MB, this gives an improvement of roughly 50% compared to sequential fingerprint generation.

Reference: [Rsync Algorithm](https://rsync.samba.org/tech_report/node2.html)

  • Go must be installed; see [golang downloads](https://golang.org/dl/)

  • Fetch the package with go get

     `go get github.com/monmohan/xferspdy`
  • Install the command line utilities

    Run go install ./... from the xferspdy directory

Using the API

See GoDoc. The docs also contain an example usage of the API.

Using the fpgen, diff and patch CLI utilities:

The library also provides CLI wrappers around the API.

  • You can see the usage of any of these commands with $GOPATH/bin/<command> --help

  • Let's say you have a binary file (e.g. a PowerPoint presentation, MyPrezVersion1.pptx).

  • First, generate a fingerprint of version 1

    $ $GOPATH/bin/fpgen -file <path>/MyPrezVersion1.pptx

    This will generate the fingerprint file <path>/MyPrezVersion1.pptx.fingerprint.

  • Let's say the file has now been changed (for example, a slide or image was added) and saved as MyPrezVersion2.pptx

  • Now generate a diff (this doesn't require the original file)

    $ $GOPATH/bin/diff -fingerprint <path>/MyPrezVersion1.pptx.fingerprint -file <path>/MyPrezVersion2.pptx

    This will create a patch file <path>/MyPrezVersion2.pptx.patch

  • Now patch the Version 1 file to get Version 2

    $ $GOPATH/bin/patch -patch <path>/MyPrezVersion2.pptx.patch -base <path>/MyPrezVersion1.pptx

    This will generate <path>/Patched_MyPrezVersion1.pptx, which is exactly the same as MyPrezVersion2.pptx.

NOTE: diff and patch are also common utilities present on most distributions, so it's better to give the explicit path to these binaries, for example $GOPATH/bin/diff and $GOPATH/bin/patch.



Package xferspdy provides the basic interfaces around the binary diff and patching process.

// Create a fingerprint of the base file, using 1024-byte blocks.
fingerprint := NewFingerprint("/path/foo_v1.binary", 1024)

// Say the file was updated; generate the diff against the fingerprint.
diff := NewDiff("/path/foo_v2.binary", *fingerprint)

// The diff plus the base/source file is sufficient to recreate the modified file.
// O_TRUNC avoids leaving stale trailing bytes if the output file already exists.
modifiedFile, err := os.OpenFile("/path/foo_v2_from_v1.binary", os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0777)
if err != nil {
	log.Fatal(err)
}
defer modifiedFile.Close()

// PatchFile writes the output to modifiedFile (an io.Writer).
// The result will be the same binary as /path/foo_v2.binary.
if err := PatchFile(diff, "/path/foo_v1.binary", modifiedFile); err != nil {
	log.Fatal(err)
}






var (
	DEFAULT_GENERATOR = &FingerprintGenerator{ConcurrentMode: true, NumWorkers: 8}
)


func Patch

func Patch(delta []Block, sign Fingerprint, t io.Writer)

Patch is a wrapper around PatchFile (the current version only supports patching of local files).

func PatchFile

func PatchFile(delta []Block, source string, t io.Writer) error

PatchFile takes a source file and a Diff as input, and writes the result to the Writer. The source file is normally the base version of the file, and the Diff is the delta computed from the Fingerprint generated for the base file and the new version of the file.


type Block

type Block struct {
	Start, End int64
	Checksum32 uint32
	Sha256hash [sha256.Size]byte
	HasData    bool
	RawBytes   []byte
}

Block represents a byte slice from the file. For each block, the following are computed:

* Adler-32 and SHA256 checksums,

* Start and End byte positions of the block,

* Whether or not it is a data block. If it is a data block, RawBytes holds the byte data represented by this block.

func NewDiff

func NewDiff(filename string, sign Fingerprint) []Block

NewDiff computes a diff between a given file and a Fingerprint created from some other file. The diff is represented as a slice of Blocks. Matching Blocks are represented just by their hashes and their start and end byte positions. Non-matching blocks are raw binary arrays.
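
The matching idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's algorithm: block, index, buildIndex and diff are hypothetical names, only full-size blocks of the old data are indexed, and the Adler-32 checksum is recomputed at every offset instead of being rolled.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"hash/adler32"
)

// block is a hypothetical stand-in for the package's Block type.
type block struct {
	Start, End int64 // for matches: byte range in the OLD data (End exclusive)
	HasData    bool
	RawBytes   []byte
}

// index maps weak checksum -> strong hash -> block of the old data.
type index map[uint32]map[[sha256.Size]byte]block

// buildIndex indexes the full-size blocks of old (a trailing short block is skipped for brevity).
func buildIndex(old []byte, n int) index {
	idx := index{}
	if n <= 0 {
		return idx
	}
	for s := 0; s+n <= len(old); s += n {
		chunk := old[s : s+n]
		weak := adler32.Checksum(chunk)
		if idx[weak] == nil {
			idx[weak] = map[[sha256.Size]byte]block{}
		}
		idx[weak][sha256.Sum256(chunk)] = block{Start: int64(s), End: int64(s + n)}
	}
	return idx
}

// diff scans updated and emits matches (by reference) and literals (by value).
func diff(idx index, updated []byte, n int) []block {
	if n <= 0 {
		return nil
	}
	var out []block
	var lit []byte
	flush := func() {
		if len(lit) > 0 {
			out = append(out, block{HasData: true, RawBytes: lit})
			lit = nil
		}
	}
	for i := 0; i < len(updated); {
		if i+n <= len(updated) {
			w := updated[i : i+n]
			// The weak checksum filters cheaply; the strong hash confirms the match.
			if strong, ok := idx[adler32.Checksum(w)]; ok {
				if m, ok := strong[sha256.Sum256(w)]; ok {
					flush()
					out = append(out, m)
					i += n
					continue
				}
			}
		}
		lit = append(lit, updated[i]) // no match here: byte becomes literal data
		i++
	}
	flush()
	return out
}

func main() {
	idx := buildIndex([]byte("abcdefgh"), 4)
	for _, b := range diff(idx, []byte("abcdXYefgh"), 4) {
		fmt.Printf("match=%v range=[%d,%d) raw=%q\n", !b.HasData, b.Start, b.End, b.RawBytes)
	}
}
```

A real implementation would slide the weak checksum one byte at a time rather than recomputing it, which is what the State type in this package is for.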

func (Block) String

func (b Block) String() string

type Fingerprint

type Fingerprint struct {
	Blocksz  uint32
	BlockMap map[uint32]map[[sha256.Size]byte]Block
	Source   string
}

Fingerprint of a given file. It encapsulates the following mapping:

Adler-32 hash of Block --> SHA256 hash of Block --> Block

It also stores the block size and the source.
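
To make the two-level mapping concrete, here is a minimal sketch that builds such a map over an in-memory byte slice and looks a chunk up in it. The fingerprint and block types, buildFingerprint and lookup are hypothetical names; including the final short block is an assumption about how trailing data is handled.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"hash/adler32"
)

// block and fingerprint are hypothetical copies of the documented shapes.
type block struct {
	Start, End int64
	Checksum32 uint32
	Sha256hash [sha256.Size]byte
}

type fingerprint struct {
	Blocksz  uint32
	BlockMap map[uint32]map[[sha256.Size]byte]block
	Source   string
}

// buildFingerprint splits data into blocksz-sized blocks and indexes each one
// first by Adler-32 checksum, then by SHA-256 hash.
func buildFingerprint(data []byte, blocksz uint32, source string) fingerprint {
	fp := fingerprint{
		Blocksz:  blocksz,
		BlockMap: map[uint32]map[[sha256.Size]byte]block{},
		Source:   source,
	}
	if blocksz == 0 {
		return fp
	}
	for start := 0; start < len(data); start += int(blocksz) {
		end := start + int(blocksz)
		if end > len(data) {
			end = len(data) // final block may be shorter (assumption)
		}
		chunk := data[start:end]
		b := block{
			Start:      int64(start),
			End:        int64(end),
			Checksum32: adler32.Checksum(chunk),
			Sha256hash: sha256.Sum256(chunk),
		}
		if fp.BlockMap[b.Checksum32] == nil {
			fp.BlockMap[b.Checksum32] = map[[sha256.Size]byte]block{}
		}
		fp.BlockMap[b.Checksum32][b.Sha256hash] = b
	}
	return fp
}

// lookup checks the weak checksum first and only then the strong hash.
func (fp fingerprint) lookup(chunk []byte) (block, bool) {
	inner, ok := fp.BlockMap[adler32.Checksum(chunk)]
	if !ok {
		return block{}, false
	}
	b, ok := inner[sha256.Sum256(chunk)]
	return b, ok
}

func main() {
	fp := buildFingerprint([]byte("abcdefghij"), 4, "in-memory")
	b, ok := fp.lookup([]byte("efgh"))
	fmt.Println(ok, b.Start, b.End) // true 4 8
}
```

The outer Adler-32 key makes most non-matching lookups fail fast; the SHA-256 key guards against weak-checksum collisions.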

func NewFingerprint

func NewFingerprint(filename string, blocksize uint32) *Fingerprint

NewFingerprint creates a Fingerprint for a given file and block size. By default it processes blocks concurrently to generate the fingerprint. Generation switches to sequential mode if the number of blocks is less than 50.
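
The mode switch described above might look roughly like the worker-pool sketch below. It is an illustration only: sums, hashBlock and hashBlocks are hypothetical names, the 50-block threshold is taken from the description, and counting blocks with ceiling division (so that a trailing partial block is included) is an assumption.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"hash/adler32"
	"sync"
)

// sums holds the weak and strong checksums of one block (hypothetical type).
type sums struct {
	Weak   uint32
	Strong [sha256.Size]byte
}

// minBlocksForConcurrency is the threshold quoted in the description above.
const minBlocksForConcurrency = 50

// hashBlock computes both checksums for block number i.
func hashBlock(data []byte, i, blocksz int) sums {
	start := i * blocksz
	end := start + blocksz
	if end > len(data) {
		end = len(data)
	}
	chunk := data[start:end]
	return sums{Weak: adler32.Checksum(chunk), Strong: sha256.Sum256(chunk)}
}

// hashBlocks hashes every block, sequentially for small inputs and with a
// pool of worker goroutines otherwise.
func hashBlocks(data []byte, blocksz, workers int) []sums {
	if blocksz <= 0 {
		return nil
	}
	n := (len(data) + blocksz - 1) / blocksz // ceiling division (assumption)
	out := make([]sums, n)
	if n < minBlocksForConcurrency || workers < 2 {
		for i := range out {
			out[i] = hashBlock(data, i, blocksz)
		}
		return out
	}
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one worker, so no lock is needed.
				out[i] = hashBlock(data, i, blocksz)
			}
		}()
	}
	for i := 0; i < n; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	data := make([]byte, 64*1024)
	for i := range data {
		data[i] = byte(i % 251)
	}
	seq := hashBlocks(data, 1024, 1)
	par := hashBlocks(data, 1024, 8)
	same := len(seq) == len(par)
	for i := 0; same && i < len(seq); i++ {
		same = seq[i] == par[i]
	}
	fmt.Println(len(par), same) // 64 true
}
```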

func NewFingerprintFromReader

func NewFingerprintFromReader(r io.Reader, blocksz uint32) *Fingerprint

NewFingerprintFromReader creates a Fingerprint for a given reader and block size. By default it processes blocks concurrently to generate the fingerprint. However, if the number of blocks is small (fewer than 50), the caller should use sequential generation, since concurrent processing would not add much value. Alternatively, use NewFingerprint(file, blocksize) when dealing with files; it switches mode based on the number of blocks. The number of blocks can be calculated as file size / block size.

func (*Fingerprint) DeepEqual

func (f *Fingerprint) DeepEqual(other *Fingerprint) bool

func (Fingerprint) String

func (f Fingerprint) String() string

type FingerprintGenerator

type FingerprintGenerator struct {
	Source         io.Reader
	BlockSize      uint32
	ConcurrentMode bool
	NumWorkers     int
}

func (*FingerprintGenerator) Generate

func (g *FingerprintGenerator) Generate() *Fingerprint

Generate creates a fingerprint using the FingerprintGenerator. Whether processing is concurrent or sequential depends on the generator field ConcurrentMode.

type State

type State struct {
	// contains filtered or unexported fields
}

State of the Adler-32 computation. It contains the byte array window from the most recent computation and the interim sum values.

func Checksum

func Checksum(p []byte) (uint32, *State)

Checksum returns the Adler-32 checksum computed for the given byte slice. In addition, it returns a State that captures the interim results of the computation. This State can then be used to update the []byte window and compute the rolling hash.

func (*State) UpdateWindow

func (s *State) UpdateWindow(nb byte) uint32

UpdateWindow provides a mechanism to compute the checksum of a rolling window in single-byte increments, using the hash parts computed earlier. The checksum is not calculated from scratch. Instead, the byte slice window captured in the State struct is updated, similar to a circular buffer, and a rolling hash is calculated.

