Documentation ¶
Overview ¶
Package buzhash implements the buzhash rolling hash algorithm for content-defined chunking.
Content-defined chunking splits a byte stream into variable-size chunks based on the data content rather than fixed boundaries. This enables deduplication: identical data produces identical chunk boundaries, so unchanged regions between backups yield the same chunks.
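The effect is easiest to see with a toy boundary rule. The sketch below is not the package's algorithm: the function name split, the 4-byte window, and the "window sum is a multiple of 8" rule are all assumptions chosen for brevity. It shows only that a boundary decided by the trailing bytes moves with the data, so inserting a byte at the front leaves the later chunks unchanged.

```go
package main

import "fmt"

// window is the number of trailing bytes that decide a boundary (toy value).
const window = 4

// split cuts data after every position whose trailing window of bytes sums
// to a multiple of 8. This is a toy rule for illustration, not buzhash.
func split(data []byte) [][]byte {
	var chunks [][]byte
	start := 0
	for end := window; end <= len(data); end++ {
		sum := 0
		for _, b := range data[end-window : end] {
			sum += int(b)
		}
		if sum%8 == 0 {
			chunks = append(chunks, data[start:end])
			start = end
		}
	}
	// Whatever remains after the last boundary forms the final chunk.
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

func main() {
	text := "content-defined chunking splits data by content"
	// The second input has one extra byte in front; only the leading
	// chunks differ, the rest are byte-for-byte the same.
	for _, in := range []string{text, "X" + text} {
		fmt.Printf("%q\n", split([]byte(in)))
	}
}
```

With fixed-size chunks, the same one-byte insertion would shift every later boundary and change every chunk.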
Configuration ¶
Use NewConfig to create a Config from a desired average chunk size. The config determines minimum, maximum, and average chunk sizes as well as the hash mask and threshold used for boundary detection:
cfg, err := buzhash.NewConfig(4096) // ~4 KiB average chunks
if err != nil {
	log.Fatal(err)
}
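How the sizes and the mask follow from the average is not spelled out here, so the sketch below shows one common approach rather than what NewConfig actually does. The names toyConfig and deriveConfig, the power-of-two rounding, the accepted range, and the 1/4 and 4x bounds are all assumptions. The Threshold field is left out because its role is not described above.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// toyConfig mirrors some field names of Config. The derivation rules are
// assumptions made for illustration, not the package's formulas.
type toyConfig struct {
	AvgChunkSize int
	MinChunkSize int
	MaxChunkSize int
	Mask         uint32
}

// deriveConfig rounds avg down to a power of two and builds a low-bit mask.
func deriveConfig(avg int) (toyConfig, error) {
	if avg < 64 || avg > 1<<30 {
		return toyConfig{}, errors.New("average chunk size out of range")
	}
	size := 1 << (bits.Len(uint(avg)) - 1) // largest power of two <= avg
	return toyConfig{
		AvgChunkSize: size,
		MinChunkSize: size / 4, // assumed lower bound
		MaxChunkSize: size * 4, // assumed upper bound
		Mask:         uint32(size - 1),
	}, nil
}

func main() {
	cfg, err := deriveConfig(4096)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

The idea behind such a mask: if the hash bits are roughly uniform, comparing hash&Mask against a fixed value succeeds about once every AvgChunkSize positions, which gives the target average spacing between boundaries.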
Chunking a Stream ¶
Create a Chunker from an io.Reader and call Next repeatedly:
chunker := buzhash.NewChunker(reader, cfg)
for {
	chunk, err := chunker.Next()
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Fatal(err)
	}
	// process chunk
}
Low-level Hasher ¶
For custom chunking logic, use Hasher directly. It provides a 32-bit rolling hash over a 64-byte sliding window:
h := buzhash.NewHasher()
h.Update(b) // b is the next input byte
sum := h.Sum()
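For readers unfamiliar with the technique, here is a minimal, self-contained sketch of a 32-bit buzhash over a 64-byte window. It is not the package's Hasher: the lookup table contents, the xorshift seed, and the function names direct and roll are assumptions. It demonstrates the property that makes the hash "rolling": sliding the window by one byte gives the same value as rehashing the new window from scratch.

```go
package main

import (
	"fmt"
	"math/bits"
)

const windowSize = 64 // same value as the package's WindowSize

// table maps each byte value to a pseudo-random 32-bit word. The xorshift
// generator and its seed are arbitrary choices for this sketch.
var table = func() [256]uint32 {
	var t [256]uint32
	x := uint32(2463534242)
	for i := range t {
		x ^= x << 13
		x ^= x >> 17
		x ^= x << 5
		t[i] = x
	}
	return t
}()

// direct hashes a whole window from scratch.
func direct(win []byte) uint32 {
	var h uint32
	for _, b := range win {
		h = bits.RotateLeft32(h, 1) ^ table[b]
	}
	return h
}

// roll slides the window by one byte: out leaves, in enters.
// The outgoing byte's word has been rotated windowSize times by now,
// so the same rotation is applied before cancelling it with XOR.
func roll(h uint32, out, in byte) uint32 {
	return bits.RotateLeft32(h, 1) ^
		bits.RotateLeft32(table[out], windowSize) ^
		table[in]
}

func main() {
	data := make([]byte, 200)
	for i := range data {
		data[i] = byte(i*31 + 7)
	}
	h := direct(data[:windowSize])
	for i := windowSize; i < len(data); i++ {
		h = roll(h, data[i-windowSize], data[i])
		if h != direct(data[i-windowSize+1:i+1]) {
			fmt.Println("mismatch at", i)
			return
		}
	}
	fmt.Println("rolling and direct hashes agree")
}
```

Each roll costs a constant number of operations regardless of window size, which is why a chunker can test every byte position for a boundary.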
Index ¶
Constants ¶
const WindowSize = 64
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Chunker ¶
type Chunker struct {
// contains filtered or unexported fields
}
Chunker splits a data stream into variable-size chunks using buzhash content-defined chunking. It performs zero heap allocations during scanning.
The returned slice from Next references internal buffers and is valid only until the next call to Next. Callers must copy the data if they need to retain it.
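The copy requirement can be illustrated without the package itself. In the sketch below, reusingChunker is a hypothetical stand-in that follows the same buffer-reuse contract. It cuts fixed 4-byte pieces only to stay short and is not how Chunker works internally. The collect helper is likewise an assumption.

```go
package main

import (
	"fmt"
	"io"
)

// reusingChunker is a stand-in that, like Chunker, returns slices into an
// internal buffer that the next call overwrites.
type reusingChunker struct {
	src []byte
	buf [4]byte
}

func (c *reusingChunker) Next() ([]byte, error) {
	if len(c.src) == 0 {
		return nil, io.EOF
	}
	n := copy(c.buf[:], c.src)
	c.src = c.src[n:]
	return c.buf[:n], nil
}

// collect gathers every chunk. With copyChunks set, each chunk is copied
// into fresh memory before it is stored; otherwise the stored slices all
// alias the chunker's buffer.
func collect(c *reusingChunker, copyChunks bool) ([][]byte, error) {
	var kept [][]byte
	for {
		chunk, err := c.Next()
		if err == io.EOF {
			return kept, nil
		}
		if err != nil {
			return nil, err
		}
		if copyChunks {
			chunk = append([]byte(nil), chunk...)
		}
		kept = append(kept, chunk)
	}
}

func main() {
	for _, copyChunks := range []bool{true, false} {
		chunks, err := collect(&reusingChunker{src: []byte("abcdefghij")}, copyChunks)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("copy=%v: %q\n", copyChunks, chunks)
	}
}
```

Without the copy, earlier entries silently change as later chunks are read, which is the failure mode the note above warns about.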
func NewChunker ¶
NewChunker creates a chunker that reads from r with the given config.
type Config ¶
type Config struct {
AvgChunkSize int
MinChunkSize int
MaxChunkSize int
Mask uint32
Threshold uint32
}
Config holds chunking parameters derived from an average chunk size.
func DefaultConfig ¶
func DefaultConfig() Config
DefaultConfig returns the standard configuration, which uses a 4 MB average chunk size.
type Hasher ¶
type Hasher struct {
// contains filtered or unexported fields
}
Hasher implements the buzhash rolling hash.
func (*Hasher) BytesProcessed ¶
BytesProcessed returns the total number of bytes fed to the hasher.
func (*Hasher) InitFromData ¶
InitFromData initializes the hash from the first WindowSize bytes of data. If data has fewer than WindowSize bytes, all bytes are consumed.