chunker

package
v0.0.0-...-ff5f600 Latest
Published: Apr 30, 2016 License: BSD-3-Clause Imports: 5 Imported by: 0

Documentation

Overview

Package chunker breaks a stream of bytes into context-defined chunks whose boundaries are chosen based on content checksums of a window that slides over the data. An edited sequence with insertions and removals can share many chunks with the original sequence.

The intent is that when a sequence of bytes is to be transmitted to a recipient that may have much of the data, the sequence can be broken down into chunks. The checksums of the resulting chunks may then be transmitted to the recipient, which can then discover which of the chunks it has, and which it needs.
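The exchange described above can be sketched as follows. This is only an illustration: the package does not prescribe a checksum, so SHA-256 is an assumption here, and checksums and needed are hypothetical helper names that are not part of chunker.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// checksums returns one SHA-256 digest per chunk. The sender would transmit
// these digests instead of the chunks themselves.
func checksums(chunks [][]byte) [][sha256.Size]byte {
	sums := make([][sha256.Size]byte, len(chunks))
	for i, c := range chunks {
		sums[i] = sha256.Sum256(c)
	}
	return sums
}

// needed returns the indices of the sender's chunks whose digests are not
// among the digests of the chunks the recipient already holds.
func needed(senderSums, recipientSums [][sha256.Size]byte) []int {
	have := make(map[[sha256.Size]byte]bool, len(recipientSums))
	for _, s := range recipientSums {
		have[s] = true
	}
	var missing []int
	for i, s := range senderSums {
		if !have[s] {
			missing = append(missing, i)
		}
	}
	return missing
}

func main() {
	sender := [][]byte{[]byte("alpha"), []byte("beta"), []byte("gamma")}
	recipient := [][]byte{[]byte("gamma"), []byte("alpha")}
	// Only the chunk at index 1 ("beta") has to be sent.
	fmt.Println(needed(checksums(sender), checksums(recipient))) // prints [1]
}
```

Only the chunks at the returned indices then need to be transmitted in full.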

Example:

	// ctx is a *context.T and anIOReader is an io.Reader.
	var s *chunker.Stream = chunker.NewStream(ctx, &chunker.DefaultParam, anIOReader)
	for s.Advance() {
		var chunk []byte = s.Value()
		// process chunk
	}
	if s.Err() != nil {
		// anIOReader generated an error.
	}

Types

type Param

type Param struct {
	WindowWidth int    // the window size to use when looking for chunk boundaries
	MinChunk    int64  // minimum chunk size
	MaxChunk    int64  // maximum chunk size
	Primary     uint64 // primary divisor; the expected chunk size
	Secondary   uint64 // secondary divisor
}

A Param contains the parameters for chunking.

Chunks are broken based on a hash of a sliding window of width WindowWidth bytes. Each chunk is at most MaxChunk bytes long, and, unless end-of-file or an error is reached, at least MinChunk bytes long.

Subject to those constraints, a chunk boundary is introduced at the first point where the hash of the sliding window is 1 mod Primary; or, if that does not occur before MaxChunk bytes, at the last position where the hash is 1 mod Secondary; or, if that does not occur either, after MaxChunk bytes. Normally, MinChunk < Primary < MaxChunk. Primary is the expected chunk size. The Secondary divisor exists to make it more likely that a chunk boundary is selected based on the local data when the Primary divisor by chance does not find a match for a long distance. It should be a few times smaller than Primary.
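The boundary rule above can be sketched as a small function. This is an illustration under stated assumptions, not the package's implementation: the window hashes are taken as a precomputed input rather than derived from bytes, the local Param type only mirrors the documented size and divisor fields, chunkLen is a hypothetical name, and a chunk is assumed to run to the end of the data when the data ends before MaxChunk bytes without a Primary match.

```go
package main

import "fmt"

// Param mirrors the size and divisor fields documented above.
type Param struct {
	MinChunk, MaxChunk int64
	Primary, Secondary uint64
}

// chunkLen returns the length of the next chunk. hashes[i] is the
// sliding-window hash after i+1 bytes of the chunk have been consumed.
func chunkLen(hashes []uint64, p Param) int64 {
	n := int64(len(hashes))
	limit := n
	if limit > p.MaxChunk {
		limit = p.MaxChunk
	}
	start := p.MinChunk
	if start < 1 {
		start = 1
	}
	secondary := int64(-1) // last Secondary match seen, if any
	for length := start; length <= limit; length++ {
		h := hashes[length-1]
		if h%p.Primary == 1 {
			return length // first Primary match wins
		}
		if h%p.Secondary == 1 {
			secondary = length
		}
	}
	if n < p.MaxChunk {
		return n // assumption: end of data closes the chunk
	}
	if secondary >= 0 {
		return secondary
	}
	return p.MaxChunk
}

func main() {
	p := Param{MinChunk: 2, MaxChunk: 6, Primary: 7, Secondary: 3}
	fmt.Println(chunkLen([]uint64{8, 8, 0, 0, 0, 0, 0}, p)) // 2: Primary match at MinChunk
	fmt.Println(chunkLen([]uint64{0, 0, 4, 0, 4, 0, 0}, p)) // 5: last Secondary match
	fmt.Println(chunkLen([]uint64{0, 0, 0, 0, 0, 0, 0}, p)) // 6: no match, MaxChunk
}
```

In the first call the hash at length 1 is also 1 mod Primary, but it is ignored because the chunk would be shorter than MinChunk.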

Using primes for Primary and Secondary is not essential, but recommended because it guarantees mixing of the checksum bits should their distribution be non-uniform.

var DefaultParam Param = Param{WindowWidth: 48, MinChunk: 512, MaxChunk: 3072, Primary: 601, Secondary: 307}

DefaultParam contains default chunking parameters.
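When choosing custom parameters, the relationships described for Param can be checked up front. The sketch below is an assumption-laden illustration: checkParam is a hypothetical helper, the Param type is a local copy of the documented struct, and nothing in the documentation says the package itself enforces these conditions (they are described as what is normally the case).

```go
package main

import (
	"errors"
	"fmt"
)

// Param is a local copy of the documented struct, for illustration only.
type Param struct {
	WindowWidth int
	MinChunk    int64
	MaxChunk    int64
	Primary     uint64
	Secondary   uint64
}

// checkParam reports whether p follows the documented guidance:
// MinChunk < Primary < MaxChunk, and Secondary smaller than Primary.
func checkParam(p Param) error {
	switch {
	case p.WindowWidth <= 0:
		return errors.New("WindowWidth must be positive")
	case p.MinChunk < 0 || p.MaxChunk <= p.MinChunk:
		return errors.New("expected 0 <= MinChunk < MaxChunk")
	case p.Primary <= uint64(p.MinChunk) || p.Primary >= uint64(p.MaxChunk):
		return errors.New("expected MinChunk < Primary < MaxChunk")
	case p.Secondary == 0 || p.Secondary >= p.Primary:
		return errors.New("expected 0 < Secondary < Primary")
	}
	return nil
}

func main() {
	// The values of DefaultParam as documented above.
	def := Param{WindowWidth: 48, MinChunk: 512, MaxChunk: 3072, Primary: 601, Secondary: 307}
	fmt.Println(checkParam(def)) // prints <nil>

	bad := def
	bad.Primary = 4000 // larger than MaxChunk
	fmt.Println(checkParam(bad))
}
```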

type PosStream

type PosStream struct {
	// contains filtered or unexported fields
}

A PosStream is just like a Stream, except that the Value() method returns only the byte offsets of the ends of chunks, rather than the chunks themselves. It can be used when the chunks are too large for even a small number of them to be buffered comfortably in memory.
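Because only end offsets are reported, a client usually has to turn them into byte ranges itself. A minimal sketch, assuming the offsets are measured from the start of the stream and increase from one chunk to the next; span and spans are hypothetical names, not part of the package:

```go
package main

import "fmt"

// span is a half-open byte range [Start, End) within the input.
type span struct{ Start, End int64 }

// spans converts a sequence of chunk end offsets, such as the values a
// PosStream would yield, into the byte range covered by each chunk.
func spans(ends []int64) []span {
	out := make([]span, 0, len(ends))
	var start int64
	for _, end := range ends {
		out = append(out, span{Start: start, End: end})
		start = end // the next chunk begins where this one ended
	}
	return out
}

func main() {
	for _, s := range spans([]int64{700, 1500, 1800}) {
		fmt.Printf("chunk [%d, %d) is %d bytes\n", s.Start, s.End, s.End-s.Start)
	}
}
```

If the underlying data is available through an io.ReaderAt, each range can then be read on demand, for example with io.NewSectionReader, without holding more than one chunk in memory.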

func NewPosStream

func NewPosStream(ctx *context.T, param *Param, rd io.Reader) *PosStream

NewPosStream() returns a pointer to a new PosStream instance, with the parameters in *param.

func (*PosStream) Advance

func (ps *PosStream) Advance() bool

Advance() stages the offset of the end of the next chunk so that it may be retrieved via Value(). Returns true iff there is an item to retrieve. Advance() must be called before Value() is called.

func (*PosStream) Cancel

func (ps *PosStream) Cancel()

Cancel() causes the next call to Advance() to return false. It should be used when the client does not wish to iterate to the end of the stream. Never blocks. May be called concurrently with other method calls on ps.

func (*PosStream) Err

func (ps *PosStream) Err() error

Err() returns any error encountered by Advance(). Never blocks.

func (*PosStream) Value

func (ps *PosStream) Value() int64

Value() returns the byte offset of the end of the chunk that was staged by Advance(). May panic if Advance() returned false or was not called. Never blocks.

type Stream

type Stream struct {
	// contains filtered or unexported fields
}

A Stream allows a client to iterate over the chunks within an io.Reader byte stream.

func NewStream

func NewStream(ctx *context.T, param *Param, rd io.Reader) *Stream

NewStream() returns a pointer to a new Stream instance, with the parameters in *param.

func (*Stream) Advance

func (s *Stream) Advance() bool

Advance() stages the next chunk so that it may be retrieved via Value(). Returns true iff there is an item to retrieve. Advance() must be called before Value() is called.

func (*Stream) Cancel

func (s *Stream) Cancel()

Cancel() causes the next call to Advance() to return false. It should be used when the client does not wish to iterate to the end of the stream. Never blocks. May be called concurrently with other method calls on s.

func (*Stream) Err

func (s *Stream) Err() (err error)

Err() returns any error encountered by Advance(). Never blocks.

func (*Stream) Value

func (s *Stream) Value() []byte

Value() returns the chunk that was staged by Advance(). May panic if Advance() returned false or was not called. Never blocks.
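The calling pattern for Advance(), Value(), Err() and Cancel() can be sketched with a stand-in type. Everything here is hypothetical: fixedStream is not part of the package, it cuts fixed-size chunks rather than content-defined ones, and it exists only so the example compiles without the real Stream.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync/atomic"
)

// fixedStream is a hypothetical stand-in with the same method set as Stream.
type fixedStream struct {
	rd        io.Reader
	size      int
	chunk     []byte
	err       error
	cancelled int32 // set atomically so Cancel may run concurrently
}

func newFixedStream(data string, size int) *fixedStream {
	return &fixedStream{rd: strings.NewReader(data), size: size}
}

// Advance stages the next chunk; it returns false at end of input,
// after an error, or after Cancel.
func (s *fixedStream) Advance() bool {
	if atomic.LoadInt32(&s.cancelled) != 0 || s.err != nil {
		return false
	}
	buf := make([]byte, s.size)
	n, err := io.ReadFull(s.rd, buf)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		s.err = err
		return false
	}
	s.chunk = buf[:n]
	return n > 0
}

func (s *fixedStream) Value() []byte { return s.chunk }
func (s *fixedStream) Err() error    { return s.err }
func (s *fixedStream) Cancel()       { atomic.StoreInt32(&s.cancelled, 1) }

// firstChunks collects at most limit chunks, then cancels the stream so
// that the next Advance call ends the loop.
func firstChunks(s *fixedStream, limit int) ([]string, error) {
	var out []string
	for s.Advance() {
		out = append(out, string(s.Value()))
		if len(out) == limit {
			s.Cancel()
		}
	}
	return out, s.Err()
}

func main() {
	chunks, err := firstChunks(newFixedStream("abcdefghij", 4), 2)
	fmt.Println(chunks, err) // prints [abcd efgh] <nil>
}
```

Err() is checked once, after the loop, because Advance() returning false does not by itself distinguish end of input from a read error or a cancellation.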
