zstd

package module
v0.0.0-...-e60ae61
Published: Mar 20, 2026 License: BSD-3-Clause Imports: 11 Imported by: 0

README

Go zstd library

This is a proposed implementation of compress/zstd.

See https://github.com/golang/go/issues/62513

It is based on github.com/klauspost/compress/zstd, but rewritten for the interface outlined in the issue above.

Performance is roughly 5-10% slower than the upstream Go implementations, mainly due to code simplification for easier maintainability.

Browse documentation: Go Reference

Notable differences to upstream:

  • Fully single-threaded.
  • All assembly removed.
  • Simplified errors.
  • Dictionary code simplified.
  • All "unsafe" removed.
  • Allows using the zero value Reader/Writer (not proposed explicitly, but seems reasonable).

Additions to proposal

Levels 0-9 are mapped to four internal encoder levels:

Level  Encoder            Window
0      raw blocks         n/a
1–2    fastEncoder        4/8 MB
3–4    doubleFastEncoder  4/8 MB
5–7    betterFastEncoder  8 MB
8–9    bestFastEncoder    8 MB

The default is level 3, matching zstandard.

We can make changes to the individual levels, but that would require some code duplication or slower processing due to more branching. Given that level 1, the default level, and level 9 are by far the most common, this is not a top priority.

We can also reduce the lower levels' default window size to somewhere between 1 and 2 MB, though that would mainly affect RAM usage, not speed.

Supporting io.WriterTo and io.ReaderFrom on Writer and Reader:

  • Added (*Writer).ReadFrom(r io.Reader) (int64, error)
  • Added (*Reader).WriteTo(w io.Writer) (int64, error)

Bytes interface:

  • Added (*Writer).AppendCompress(dst, src []byte) []byte
  • Added (*Reader).AppendDecompress(dst, src []byte) ([]byte, error)

Pre-PR

I have not added any significant test data yet, since I didn't want to bloat the standard library. My plan is to add roughly 1 MB of (compressed) fuzz base data, which would serve as the regression tests for the implementation.

_testref contains sanity checks that cross-check against github.com/klauspost/compress/zstd. It will not be part of the code submitted to the Go repo.

I haven't done detailed benchmarking yet, beyond checking that performance looks reasonable compared to the upstream implementations.

Documentation

Overview

Package zstd implements the Zstandard compression algorithm (RFC 8878).

Example (Reset)
proverbs := []string{
	"Don't communicate by sharing memory, share memory by communicating.",
	"Concurrency is not parallelism.",
	"The bigger the interface, the weaker the abstraction.",
	"Documentation is for users.",
}

var buf bytes.Buffer
w := NewWriter(&buf)
r, err := NewReader(&buf)
if err != nil {
	log.Fatal(err)
}

for _, p := range proverbs {
	buf.Reset()
	w.Reset(&buf)

	if _, err := w.Write([]byte(p)); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}

	if err := r.Reset(&buf); err != nil {
		log.Fatal(err)
	}
	if _, err := io.Copy(os.Stdout, r); err != nil {
		log.Fatal(err)
	}
	fmt.Println()
}
r.Close()
Output:
Don't communicate by sharing memory, share memory by communicating.
Concurrency is not parallelism.
The bigger the interface, the weaker the abstraction.
Documentation is for users.
Example (WriterReader)
var buf bytes.Buffer
w := NewWriter(&buf)
_, err := w.Write([]byte("Hello, zstd!"))
if err != nil {
	log.Fatal(err)
}
if err := w.Close(); err != nil {
	log.Fatal(err)
}

r, err := NewReader(&buf)
if err != nil {
	log.Fatal(err)
}
if _, err := io.Copy(os.Stdout, r); err != nil {
	log.Fatal(err)
}
_ = r.Close()
Output:
Hello, zstd!

Index

Examples

Constants

const (
	MinWindowSize = 1 << 10 // 1 KiB
	MaxWindowSize = 1 << 29 // 512 MiB
)

Limits for the decoder window size, as defined by the zstd specification.

const (
	DefaultCompression = -1 // level 3
	NoCompression      = 0  // store blocks without compression
	BestSpeed          = 1  // lowest compression, fastest speed
	BestCompression    = 9  // highest compression, slowest speed
)

Compression level constants. These map to the levels accepted by Writer.SetLevel.

Variables

var (
	ErrWindowSizeExceeded = errors.New("window size exceeded")
	ErrUnknownDictionary  = errors.New("unknown dictionary")
	ErrDecoderClosed      = errors.New("decoder used after Close")
	ErrEncoderClosed      = errors.New("encoder used after Close")
)

Errors returned by the zstd encoder and decoder.

Functions

This section is empty.

Types

type Dict

type Dict struct {
	// contains filtered or unexported fields
}

Dict is a parsed zstd dictionary.

func ParseDict

func ParseDict(b []byte) (*Dict, error)

ParseDict parses a zstd dictionary from its binary representation.

func (*Dict) Bytes

func (d *Dict) Bytes() []byte

Bytes returns the raw dictionary bytes.

func (*Dict) ID

func (d *Dict) ID() uint32

ID returns the dictionary ID.

func (*Dict) MarshalBinary

func (d *Dict) MarshalBinary() ([]byte, error)

MarshalBinary implements encoding.BinaryMarshaler.

func (*Dict) UnmarshalBinary

func (d *Dict) UnmarshalBinary(b []byte) error

UnmarshalBinary implements encoding.BinaryUnmarshaler.

type ErrCorrupted

type ErrCorrupted struct {
	// contains filtered or unexported fields
}

ErrCorrupted indicates that the input data is not valid zstd. Use errors.Is(err, &ErrCorrupted{}) to test for any corruption error.

func (*ErrCorrupted) Error

func (e *ErrCorrupted) Error() string

Error implements the error interface.

func (*ErrCorrupted) Is

func (e *ErrCorrupted) Is(target error) bool

Is reports whether target is an *ErrCorrupted.

func (*ErrCorrupted) Unwrap

func (e *ErrCorrupted) Unwrap() error

Unwrap returns the underlying error, if any.

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader decompresses a zstd-compressed stream.

func NewReader

func NewReader(r io.Reader) (*Reader, error)

NewReader creates a new Reader reading from r. If r is nil, the Reader may only be used with [Reader.AppendDecompress]; call Reader.Reset before streaming.

func (*Reader) AddDict

func (z *Reader) AddDict(d *Dict)

AddDict registers a dictionary for decompression.

func (*Reader) AppendDecompress

func (z *Reader) AppendDecompress(dst, src []byte) ([]byte, error)

AppendDecompress decompresses src and appends the decompressed bytes to dst, returning the extended buffer. It is a one-shot alternative to the streaming Read interface.

src may contain one or more concatenated zstd frames.

The Reader must not be in the middle of a streaming Read; call Reset before switching from streaming to one-shot use.

Passing nil for dst allocates a new slice. Passing a non-nil dst lets the caller reuse memory or prepend existing data:

result, err := r.AppendDecompress(existingPrefix, compressed)

Any registered dictionaries (via AddDict or SetRawDict) apply.

Example
src := []byte("appended to existing buffer")

w := NewWriter(nil)
compressed := w.AppendCompress(nil, src)

r, err := NewReader(bytes.NewReader(nil))
if err != nil {
	log.Fatal(err)
}
defer r.Close()

prefix := []byte("data: ")
result, err := r.AppendDecompress(prefix, compressed)
if err != nil {
	log.Fatal(err)
}
fmt.Println(string(result))
Output:
data: appended to existing buffer

func (*Reader) Close

func (z *Reader) Close() error

Close releases resources. After Close, the Reader may be reused by calling Reader.Reset.

func (*Reader) Read

func (z *Reader) Read(p []byte) (int, error)

Read decompresses data into p.

func (*Reader) Reset

func (z *Reader) Reset(r io.Reader) error

Reset discards the Reader's state and makes it read from r. If r is nil, Reset is equivalent to Reader.Close.

func (*Reader) SetMaxWindowSize

func (z *Reader) SetMaxWindowSize(n uint64)

SetMaxWindowSize sets the maximum allowed window size for decoding. The default is 128 MiB. The maximum is MaxWindowSize (512 MiB).

func (*Reader) SetRawDict

func (z *Reader) SetRawDict(b []byte)

SetRawDict registers raw bytes as a dictionary with ID 0.

func (*Reader) WriteTo

func (z *Reader) WriteTo(w io.Writer) (int64, error)

WriteTo decompresses data and writes it to w until all frames are consumed or an error occurs. It implements io.WriterTo.

WriteTo writes decoded blocks directly to w without an intermediate copy, making it more efficient than reading into a buffer and writing separately.

If the Reader has buffered data from a previous Read call, that data is flushed to w first.

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer compresses data written to it as a zstd stream. Writes are buffered internally; callers must call Writer.Close to flush any remaining data and write the frame trailer.

func NewWriter

func NewWriter(w io.Writer) *Writer

NewWriter returns a new Writer compressing data at the default level. If w is nil the Writer may only be used with [Writer.AppendCompress]; call Writer.Reset before streaming.

func (*Writer) AddDict

func (w *Writer) AddDict(d *Dict)

AddDict registers a parsed dictionary for compression.

func (*Writer) AppendCompress

func (w *Writer) AppendCompress(dst, src []byte) []byte

AppendCompress compresses src as a single zstd frame and appends the result to dst, returning the extended buffer. It is a one-shot alternative to the streaming Write/Close interface.

The Writer's current level, dictionary, CRC, and window-size settings apply. The returned frame is self-contained: it includes a frame header, one or more blocks, and an optional checksum.

Passing nil for dst allocates a new slice. Passing a non-nil dst (e.g. buf[:0]) lets the caller reuse memory. Multiple calls may be made to concatenate frames:

var frames []byte
frames = w.AppendCompress(frames, part1)
frames = w.AppendCompress(frames, part2)

If src is nil or empty, a minimal valid frame is appended to dst.

AppendCompress must not be called concurrently with other Writer methods, but successive calls on the same Writer are safe without Reset.

Example
src := []byte("One-shot compression is the simplest API.")

w := NewWriter(nil)
compressed := w.AppendCompress(nil, src)

r, err := NewReader(bytes.NewReader(nil))
if err != nil {
	log.Fatal(err)
}
defer r.Close()
decompressed, err := r.AppendDecompress(nil, compressed)
if err != nil {
	log.Fatal(err)
}
fmt.Println(string(decompressed))
Output:
One-shot compression is the simplest API.

func (*Writer) Close

func (w *Writer) Close() error

Close flushes any remaining data, writes the frame trailer (and optional checksum), and releases encoder resources. After Close, the Writer must be Writer.Reset before it can be used again.

func (*Writer) Flush

func (w *Writer) Flush() error

Flush writes any buffered data to the underlying writer as a compressed block. It does not write the frame trailer; use Writer.Close to finalize the frame.

Example
var buf bytes.Buffer
w := NewWriter(&buf)

if _, err := w.Write([]byte("first part.")); err != nil {
	log.Fatal(err)
}
if err := w.Flush(); err != nil {
	log.Fatal(err)
}

if _, err := w.Write([]byte("second part.")); err != nil {
	log.Fatal(err)
}
if err := w.Close(); err != nil {
	log.Fatal(err)
}

r, err := NewReader(&buf)
if err != nil {
	log.Fatal(err)
}
defer r.Close()
if _, err := io.Copy(os.Stdout, r); err != nil {
	log.Fatal(err)
}
Output:
first part.second part.

func (*Writer) ReadFrom

func (w *Writer) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom reads data from r until io.EOF and compresses it to the underlying writer. It implements io.ReaderFrom.

Example
input := strings.NewReader("ReadFrom compresses data from an io.Reader efficiently.")
var buf bytes.Buffer
w := NewWriter(&buf)

if _, err := w.ReadFrom(input); err != nil {
	log.Fatal(err)
}
if err := w.Close(); err != nil {
	log.Fatal(err)
}

r, err := NewReader(&buf)
if err != nil {
	log.Fatal(err)
}
defer r.Close()
if _, err := io.Copy(os.Stdout, r); err != nil {
	log.Fatal(err)
}
Output:
ReadFrom compresses data from an io.Reader efficiently.

func (*Writer) Reset

func (w *Writer) Reset(wr io.Writer)

Reset discards the Writer's state and prepares it to write a new frame to wr. Configuration (level, window size, CRC, dictionary) is preserved.

func (*Writer) ResetContentSize

func (w *Writer) ResetContentSize(wr io.Writer, size int64)

ResetContentSize resets the Writer for a new stream to wr and records the uncompressed content size in the frame header. If size is negative the content size is omitted.

func (*Writer) SetCRC

func (w *Writer) SetCRC(b bool)

SetCRC controls whether the writer appends an xxHash-64 checksum to each frame. The default is true.

func (*Writer) SetLevel

func (w *Writer) SetLevel(level int) error

SetLevel sets the compression level. Valid values range from NoCompression (0) to BestCompression (9); DefaultCompression (-1) selects level 3. SetLevel must be called before Writer.Reset or Writer.Write to take effect.

Example
data := []byte(strings.Repeat("the quick brown fox jumps over the lazy dog. ", 100))

compress := func(level int) []byte {
	w := NewWriter(nil)
	if err := w.SetLevel(level); err != nil {
		log.Fatal(err)
	}
	return w.AppendCompress(nil, data)
}

fast := compress(BestSpeed)
best := compress(BestCompression)

r, err := NewReader(bytes.NewReader(nil))
if err != nil {
	log.Fatal(err)
}
defer r.Close()

dec, err := r.AppendDecompress(nil, fast)
if err != nil || string(dec) != string(data) {
	log.Fatal("BestSpeed mismatch")
}
fmt.Println("BestSpeed: OK")

dec, err = r.AppendDecompress(nil, best)
if err != nil || string(dec) != string(data) {
	log.Fatal("BestCompression mismatch")
}
fmt.Println("BestCompression: OK")
fmt.Println("BestCompression <= BestSpeed:", len(best) <= len(fast))
Output:
BestSpeed: OK
BestCompression: OK
BestCompression <= BestSpeed: true

func (*Writer) SetLowMemory

func (w *Writer) SetLowMemory(b bool)

SetLowMemory controls whether the encoder should trade speed for lower memory usage.

func (*Writer) SetRawDict

func (w *Writer) SetRawDict(b []byte)

SetRawDict registers raw bytes as a dictionary prefix.

Example
dict := []byte("the quick brown fox jumps over the lazy dog")
data := []byte("the quick brown fox leaps over the sleepy dog")

compressWithDict := func(d []byte) []byte {
	w := NewWriter(nil)
	if d != nil {
		w.SetRawDict(d)
	}
	return w.AppendCompress(nil, data)
}

without := compressWithDict(nil)
with := compressWithDict(dict)

r, err := NewReader(bytes.NewReader(nil))
if err != nil {
	log.Fatal(err)
}
defer r.Close()

dec, err := r.AppendDecompress(nil, without)
if err != nil || string(dec) != string(data) {
	log.Fatal("without dict mismatch")
}
fmt.Println("without dict: OK")

r.SetRawDict(dict)
dec, err = r.AppendDecompress(nil, with)
if err != nil || string(dec) != string(data) {
	log.Fatal("with dict mismatch")
}
fmt.Println("with dict: OK")
fmt.Println("dict smaller:", len(with) < len(without))
Output:
without dict: OK
with dict: OK
dict smaller: true

func (*Writer) SetWindowSize

func (w *Writer) SetWindowSize(n int) error

SetWindowSize overrides the window size for compression. This allows limiting memory usage both for compression and decompression.

n must be in the range [MinWindowSize, MaxWindowSize].

func (*Writer) Write

func (w *Writer) Write(p []byte) (n int, err error)

Write compresses p and writes it to the underlying writer. The compressed bytes are not necessarily flushed until Writer.Close or Writer.Flush is called.

Directories

Path Synopsis
internal
fse
Package fse provides Finite State Entropy encoding.
huff0
Package huff0 implements Huffman entropy coding for zstd.
le
Package le provides little endian loading and storing.
xxhash
Package xxhash implements the 64-bit xxHash algorithm (XXH64).
