Documentation
Overview
Package fastcdc is a Go implementation of the FastCDC content-defined chunking algorithm. See https://www.usenix.org/system/files/conference/atc16/atc16-paper-xia.pdf for details.
Example (Basic)
data := make([]byte, 10*1024*1024)
rand.Seed(4542)
rand.Read(data)
rd := bytes.NewReader(data)

chunker, err := fastcdc.NewChunker(rd, fastcdc.Options{
	AverageSize: 1024 * 1024, // target 1 MiB average chunk size
})
if err != nil {
	log.Fatal(err)
}

fmt.Printf("%-32s %s\n", "CHECKSUM", "CHUNK SIZE")

for {
	chunk, err := chunker.Next()
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%x %d\n", md5.Sum(chunk.Data), chunk.Length)
}

Output:
CHECKSUM                         CHUNK SIZE
d5bb40f862d68f4c3a2682e6d433f0d7 1788060
113a0aa2023d7dce6a2aac1f807b5bd2 1117240
5b9147b10d4fe6f96282da481ce848ca 1180487
dcc4644befb599fa644635b0c5a1ea2c 1655501
224db3de422ad0dd2c840e3e24e0cb03 363172
e071658eccda587789f1dabb6f773851 1227750
215868103f0b4ea7f715e179e5b9a6c7 1451026
21e65e40970ec22f5b13ddf60493b746 1150129
b8209a1dbef955ef64636af796450252 552395
Types
type Chunk
type Chunk struct {
	// Offset is the number of bytes from the start of the reader to the beginning of
	// the chunk.
	Offset int

	// Length is the length of the chunk in bytes. Same as len(Data).
	Length int

	// Data is the chunk data.
	Data []byte

	// Fingerprint is the value of the rolling hash algorithm for the chunk data.
	Fingerprint uint64
}
Chunk stores a content-defined chunk returned by a Chunker.
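The relationship between Offset, Length, and Data can be checked when chunks are collected and joined back together. The following is a minimal sketch, not part of the package: the chunk type is a local stand-in that mirrors the documented fields, reassemble is a hypothetical helper, and it assumes chunks are produced in order and cover the input without gaps.

```go
package main

import (
	"bytes"
	"fmt"
)

// chunk is a local stand-in mirroring the documented fields of fastcdc.Chunk.
// It is defined here only so the sketch compiles without the package.
type chunk struct {
	Offset int
	Length int
	Data   []byte
}

// reassemble (hypothetical helper) joins chunks back into the original byte
// stream. It checks the two documented invariants: Length == len(Data), and
// Offset equals the number of bytes that precede the chunk. The second check
// assumes chunks arrive in order and are contiguous.
func reassemble(chunks []chunk) ([]byte, error) {
	var buf bytes.Buffer
	for i, c := range chunks {
		if c.Length != len(c.Data) {
			return nil, fmt.Errorf("chunk %d: Length %d != len(Data) %d", i, c.Length, len(c.Data))
		}
		if c.Offset != buf.Len() {
			return nil, fmt.Errorf("chunk %d: Offset %d, expected %d", i, c.Offset, buf.Len())
		}
		buf.Write(c.Data)
	}
	return buf.Bytes(), nil
}

func main() {
	chunks := []chunk{
		{Offset: 0, Length: 3, Data: []byte("abc")},
		{Offset: 3, Length: 2, Data: []byte("de")},
	}
	out, err := reassemble(chunks)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s\n", out)
}
```

With real chunks, the same checks could be applied to the values returned by Next in the basic example.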
type Chunker
type Chunker struct {
// contains filtered or unexported fields
}
Chunker implements the FastCDC content-defined chunking algorithm. See https://www.usenix.org/system/files/conference/atc16/atc16-paper-xia.pdf.
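To give a feel for what content-defined chunking does, here is a deliberately simplified sketch of a gear-style rolling hash that looks for cut points. It is not the package's implementation: it omits FastCDC's normalization, and every name in it (gearTable, cutPoint, split, the way the seed fills the table, the choice of mask) is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// gearTable fills a 256-entry lookup table with pseudo-random 64-bit values
// derived from seed (splitmix64). How the real package derives its table from
// Options.Seed is not shown in this documentation; this is an assumption.
func gearTable(seed uint64) [256]uint64 {
	var t [256]uint64
	x := seed
	for i := range t {
		x += 0x9e3779b97f4a7c15
		z := x
		z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9
		z = (z ^ (z >> 27)) * 0x94d049bb133111eb
		t[i] = z ^ (z >> 31)
	}
	return t
}

// cutPoint returns the length of the next chunk at the start of data.
// Hashing starts at minSize, so no chunk is cut before that point, and the
// scan stops at maxSize. Requires 0 <= minSize < maxSize.
func cutPoint(data []byte, table *[256]uint64, minSize, maxSize int, mask uint64) int {
	if len(data) <= minSize {
		return len(data)
	}
	n := len(data)
	if n > maxSize {
		n = maxSize
	}
	var fp uint64
	for i := minSize; i < n; i++ {
		fp = (fp << 1) + table[data[i]] // gear rolling hash update
		if fp&mask == 0 {
			return i + 1 // boundary chosen by content, not by position
		}
	}
	return n
}

// split returns the chunk sizes for data. avgBits sets how many hash bits must
// be zero at a boundary; more bits means larger chunks on average.
func split(data []byte, seed uint64, minSize int, avgBits uint, maxSize int) []int {
	table := gearTable(seed)
	mask := ((uint64(1) << avgBits) - 1) << (64 - avgBits)
	var sizes []int
	for len(data) > 0 {
		n := cutPoint(data, &table, minSize, maxSize, mask)
		sizes = append(sizes, n)
		data = data[n:]
	}
	return sizes
}

func main() {
	data := make([]byte, 1<<20)
	rand.New(rand.NewSource(1)).Read(data)
	sizes := split(data, 42, 2048, 13, 32768)
	fmt.Println("chunks:", len(sizes))
}
```

Because boundaries depend on the bytes themselves, an insertion near the start of the input tends to shift only nearby boundaries, which is what makes this family of algorithms useful for deduplication.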
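func NewChunker
NewChunker returns a Chunker that reads from the given reader and is configured with the given Options. It also returns an error, which must be checked before the Chunker is used.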
type Options
type Options struct {
	// AverageSize is the target chunk size. Typically a power of 2. It must be in the
	// range 64 B to 1 GiB.
	AverageSize int

	// (Optional) MinSize is the minimum allowed chunk size. By default, it's set to
	// AverageSize / 4.
	MinSize int

	// (Optional) MaxSize is the maximum allowed chunk size. By default, it's set to
	// AverageSize * 4.
	MaxSize int

	// (Optional) Normalization sets the chunk normalization level. It may be set to
	// 1, 2 or 3, unless DisableNormalization is set, in which case it's ignored. By
	// default, it's set to 2.
	Normalization int

	// (Optional) DisableNormalization turns normalization off. By default, it's set to
	// false.
	DisableNormalization bool

	// (Optional) Seed alters the lookup table of the rolling hash algorithm to mitigate
	// chunk-size based fingerprinting attacks. It may be set to a random uint64.
	Seed uint64

	// (Optional) BufSize is the size of the internal buffer used while chunking. It has
	// no effect on the chunking output, but performance is improved with larger buffers.
	// It must be at least MaxSize. Recommended values are 1 to 3 times MaxSize. By
	// default, it's set to MaxSize * 2.
	BufSize int
}
Options configures a Chunker.
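The defaults described in the field comments can be summarized in code. The sketch below is an illustration only: options is a local copy of the documented fields (Seed is left out because it has no documented default), withDefaults is a hypothetical helper, and the error messages are invented. It assumes a zero value means "not set", which the documentation does not state explicitly.

```go
package main

import (
	"errors"
	"fmt"
)

// options is a local stand-in for the documented fields of fastcdc.Options.
type options struct {
	AverageSize          int
	MinSize              int
	MaxSize              int
	Normalization        int
	DisableNormalization bool
	BufSize              int
}

// withDefaults (hypothetical) fills unset fields using the defaults given in
// the field comments and enforces the documented constraints. Treating zero
// as "unset" is an assumption of this sketch.
func withDefaults(o options) (options, error) {
	const minAverage, maxAverage = 64, 1 << 30 // 64 B to 1 GiB
	if o.AverageSize < minAverage || o.AverageSize > maxAverage {
		return o, errors.New("AverageSize must be between 64 B and 1 GiB")
	}
	if o.MinSize == 0 {
		o.MinSize = o.AverageSize / 4
	}
	if o.MaxSize == 0 {
		o.MaxSize = o.AverageSize * 4
	}
	if !o.DisableNormalization {
		if o.Normalization == 0 {
			o.Normalization = 2
		}
		if o.Normalization < 1 || o.Normalization > 3 {
			return o, errors.New("Normalization must be 1, 2 or 3")
		}
	}
	if o.BufSize == 0 {
		o.BufSize = o.MaxSize * 2
	}
	if o.BufSize < o.MaxSize {
		return o, errors.New("BufSize must be at least MaxSize")
	}
	return o, nil
}

func main() {
	o, err := withDefaults(options{AverageSize: 1024 * 1024})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```

For a 1 MiB AverageSize, the documented defaults work out to a 256 KiB MinSize, a 4 MiB MaxSize, and an 8 MiB BufSize.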