buzhash

package
v0.28.0
Published: Jun 12, 2026 License: MIT Imports: 2 Imported by: 0

Documentation

Overview

Package buzhash implements the buzhash rolling hash algorithm for content-defined chunking.

Content-defined chunking splits a byte stream into variable-size chunks based on the data content rather than fixed boundaries. This enables deduplication: identical data produces identical chunk boundaries, so unchanged regions between backups yield the same chunks.
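The deduplication property described above can be illustrated with a small self-contained sketch. It does not use this package: the window length, the mask, and the helper names (windowHash, split, sampleData, sharedTrailing) are all assumptions chosen for illustration, and the window is rehashed from scratch with FNV-1a instead of using a rolling hash, which keeps the code short at the cost of speed. The sketch also omits minimum and maximum chunk sizes.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	window = 16   // bytes examined per boundary decision (illustrative value)
	mask   = 0x3F // cut when the low 6 hash bits are zero, about 1 position in 64
)

// windowHash hashes one window from scratch with FNV-1a.
// A rolling hash would produce the per-position value in O(1) instead.
func windowHash(w []byte) uint32 {
	h := fnv.New32a()
	h.Write(w)
	return h.Sum32()
}

// split cuts data after every position whose trailing window hashes to a
// value with all masked bits zero. The decision depends only on content.
func split(data []byte) [][]byte {
	var chunks [][]byte
	start := 0
	for i := window - 1; i < len(data); i++ {
		if windowHash(data[i-window+1:i+1])&mask == 0 {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

// sampleData returns n deterministic pseudo-random bytes (xorshift32).
func sampleData(n int) []byte {
	out := make([]byte, n)
	x := uint32(2463534242)
	for i := range out {
		x ^= x << 13
		x ^= x >> 17
		x ^= x << 5
		out[i] = byte(x >> 8)
	}
	return out
}

// sharedTrailing counts how many chunks at the end of a and b are identical.
func sharedTrailing(a, b [][]byte) int {
	n := 0
	for n < len(a) && n < len(b) && string(a[len(a)-1-n]) == string(b[len(b)-1-n]) {
		n++
	}
	return n
}

func main() {
	orig := sampleData(8192)
	edited := append([]byte("new header "), orig...) // insert bytes at the front
	a, b := split(orig), split(edited)
	fmt.Printf("original: %d chunks, edited: %d chunks, identical trailing chunks: %d\n",
		len(a), len(b), sharedTrailing(a, b))
}
```

Because each cut depends only on the bytes in the window, inserting data at the front changes the first chunk but leaves every later chunk of the original intact. With fixed-size blocks, the same insertion would shift every block.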

Configuration

Use NewConfig to create a Config from a desired average chunk size. The config determines minimum, maximum, and average chunk sizes as well as the hash mask and threshold used for boundary detection:

cfg, err := buzhash.NewConfig(4096) // ~4 KiB average chunks
if err != nil {
    log.Fatal(err)
}
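As a rough illustration of why the average chunk size must be a power of two, the following sketch shows one common way to turn such a size into a hash mask. The helper maskForAverage is hypothetical and not part of this package; how NewConfig actually derives Mask, Threshold, and the minimum and maximum sizes is not documented here, so the sketch leaves those out.

```go
package main

import (
	"errors"
	"fmt"
)

// maskForAverage is a hypothetical helper, not the package's implementation.
// For a power-of-two avg, avg-1 has exactly log2(avg) low bits set, so a
// uniformly distributed hash satisfies hash&mask == 0 at about 1 in avg
// positions.
func maskForAverage(avg int) (uint32, error) {
	if avg <= 0 || avg&(avg-1) != 0 {
		return 0, errors.New("average chunk size must be a positive power of two")
	}
	return uint32(avg - 1), nil
}

func main() {
	for _, avg := range []int{4096, 3000} {
		mask, err := maskForAverage(avg)
		if err != nil {
			fmt.Printf("avg=%d: %v\n", avg, err)
			continue
		}
		fmt.Printf("avg=%d: mask=%#x\n", avg, mask)
	}
}
```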

Chunking a Stream

Create a Chunker from an io.Reader and call Next repeatedly:

chunker := buzhash.NewChunker(reader, cfg)
for {
    chunk, err := chunker.Next()
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
    // process chunk
}

Low-level Hasher

For custom chunking logic, use Hasher directly. It provides a 32-bit rolling hash over a 64-byte sliding window:

h := buzhash.NewHasher()
h.Update(b) // b is the next input byte
sum := h.Sum()
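For readers unfamiliar with the algorithm, here is a minimal sketch of a textbook buzhash update over a 64-byte window with a 32-bit sum. It is not this package's code: the roller type, the direct helper, and the substitution table (generated here with xorshift32 as a placeholder) are assumptions made for illustration, and the package's internal table and state layout may differ.

```go
package main

import (
	"fmt"
	"math/bits"
)

const windowSize = 64

// table maps each byte value to a pseudo-random 32-bit word.
// Placeholder values generated with xorshift32; real tables are usually fixed.
var table = func() [256]uint32 {
	var t [256]uint32
	x := uint32(0x9E3779B9)
	for i := range t {
		x ^= x << 13
		x ^= x >> 17
		x ^= x << 5
		t[i] = x
	}
	return t
}()

// roller keeps the hash of the last windowSize bytes it has seen.
type roller struct {
	sum uint32
	win [windowSize]byte // ring buffer of the current window
	n   int              // total bytes fed so far
}

// update rotates the sum by one bit, mixes in the new byte, and cancels the
// byte that leaves the window. That byte has been rotated windowSize times,
// which RotateLeft32 reduces modulo 32.
func (r *roller) update(in byte) {
	r.sum = bits.RotateLeft32(r.sum, 1) ^ table[in]
	if r.n >= windowSize {
		out := r.win[r.n%windowSize]
		r.sum ^= bits.RotateLeft32(table[out], windowSize)
	}
	r.win[r.n%windowSize] = in
	r.n++
}

// direct computes the same hash over a window without rolling.
func direct(window []byte) uint32 {
	var sum uint32
	for _, b := range window {
		sum = bits.RotateLeft32(sum, 1) ^ table[b]
	}
	return sum
}

// rollingMatchesDirect checks the rolled sum against a fresh computation
// over the trailing window after every byte.
func rollingMatchesDirect(data []byte) bool {
	var r roller
	for i, b := range data {
		r.update(b)
		start := i + 1 - windowSize
		if start < 0 {
			start = 0
		}
		if r.sum != direct(data[start:i+1]) {
			return false
		}
	}
	return true
}

func main() {
	data := make([]byte, 300)
	for i := range data {
		data[i] = byte(i*7 + 3)
	}
	fmt.Println("rolling matches direct:", rollingMatchesDirect(data))
}
```

Each update costs a constant number of operations regardless of window size, which is what makes scanning every byte position of a stream practical.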

Constants

const WindowSize = 64

Types

type Chunker

type Chunker struct {
	// contains filtered or unexported fields
}

Chunker splits a data stream into variable-size chunks using buzhash content-defined chunking. It performs zero heap allocations during scanning.

The returned slice from Next references internal buffers and is valid only until the next call to Next. Callers must copy the data if they need to retain it.

func NewChunker

func NewChunker(r io.Reader, config Config) *Chunker

NewChunker creates a chunker that reads from r with the given config.

func (*Chunker) Next

func (c *Chunker) Next() ([]byte, error)

Next returns the next chunk of data. The returned slice references internal buffers and is valid only until the next call to Next. It returns io.EOF when there is no more data.
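The sketch below shows one way to retain chunks safely given that lifetime rule. To stay self-contained it does not import this package: chunkSource mirrors the documented signature of Next, and reusingSource is a toy stand-in that cuts fixed 4-byte pieces into a single reused buffer. Both names, and the demo helper, are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// chunkSource has the same method shape as the documented (*Chunker).Next.
type chunkSource interface {
	Next() ([]byte, error)
}

// reusingSource is a toy stand-in: every slice it returns aliases buf,
// so earlier results are overwritten by later calls.
type reusingSource struct {
	r   io.Reader
	buf [4]byte
}

func (s *reusingSource) Next() ([]byte, error) {
	n, err := io.ReadFull(s.r, s.buf[:])
	if n > 0 {
		return s.buf[:n], nil // a short final piece is still a valid chunk
	}
	return nil, err // io.EOF once the reader is drained
}

// collect drains src and keeps a private copy of every chunk.
func collect(src chunkSource) ([][]byte, error) {
	var kept [][]byte
	for {
		chunk, err := src.Next()
		if err == io.EOF {
			return kept, nil
		}
		if err != nil {
			return nil, err
		}
		// Copy before the next call invalidates chunk.
		kept = append(kept, append([]byte(nil), chunk...))
	}
}

// demo runs collect over a short fixed input.
func demo() ([][]byte, error) {
	return collect(&reusingSource{r: strings.NewReader("abcdefghij")})
}

func main() {
	kept, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range kept {
		fmt.Printf("%q\n", c)
	}
}
```

Appending the returned slice directly, without the copy, would leave every retained entry pointing at the same buffer and showing only the most recent data.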

func (*Chunker) Reset

func (c *Chunker) Reset(r io.Reader)

Reset resets the chunker to process a new stream.

type Config

type Config struct {
	AvgChunkSize int
	MinChunkSize int
	MaxChunkSize int
	Mask         uint32
	Threshold    uint32
}

Config holds chunking parameters derived from an average chunk size.

func DefaultConfig

func DefaultConfig() Config

DefaultConfig returns the standard 4MB average chunk size configuration.

func NewConfig

func NewConfig(avgChunkSize int) (Config, error)

NewConfig creates a Config from an average chunk size, which must be a power of two.

type Hasher

type Hasher struct {
	// contains filtered or unexported fields
}

Hasher implements the buzhash rolling hash.

func NewHasher

func NewHasher() *Hasher

NewHasher creates a new Hasher.

func (*Hasher) BytesProcessed

func (h *Hasher) BytesProcessed() int

BytesProcessed returns the total number of bytes fed to the hasher.

func (*Hasher) InitFromData

func (h *Hasher) InitFromData(data []byte)

InitFromData initializes the hash from the first WindowSize bytes of data. If data has fewer than WindowSize bytes, all bytes are consumed.

func (*Hasher) Reset

func (h *Hasher) Reset()

Reset returns the hasher to its initial state.

func (*Hasher) Sum

func (h *Hasher) Sum() uint32

Sum returns the current hash value.

func (*Hasher) Update

func (h *Hasher) Update(in byte)

Update slides the window by one byte.
