zstd

v0.0.0-...-a55d1ff
Warning

This package is not in the latest version of its module.

Published: Jul 11, 2021 License: BSD-3-Clause Imports: 20 Imported by: 0

README

zstd-go

zstd-go is zstd compression in pure Go based on klauspost's version with these changes:

  1. Concurrent encoding and decoding have been removed.
  2. The third-party xxhash package is used directly.

Documentation

Overview

Package zstd provides compression and decompression of zstandard content.

For advanced usage and examples, go to the README: https://github.com/klauspost/compress/tree/master/zstd#zstd

Constants

const (
	// MinWindowSize is the minimum Window Size, which is 1 KB.
	MinWindowSize = 1 << 10

	// MaxWindowSize is the maximum encoder window size
	// and the default decoder maximum window size.
	MaxWindowSize = 1 << 29
)
const HeaderMaxSize = 14 + 3

HeaderMaxSize is the maximum size of a Frame and Block Header. If less is sent to Header.Decode it *may* still contain enough information.

const ZipMethodPKWare = 20

ZipMethodPKWare is the original method number used by PKWARE to indicate Zstandard compression. Deprecated: This has been deprecated by PKWARE, use ZipMethodWinZip instead for compression. See https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.3.9.TXT

const ZipMethodWinZip = 93

ZipMethodWinZip is the method for Zstandard compressed data inside Zip files for WinZip. See https://www.winzip.com/win/en/comp_info.html

Variables

var (
	// ErrSnappyCorrupt reports that the input is invalid.
	ErrSnappyCorrupt = errors.New("snappy: corrupt input")
	// ErrSnappyTooLarge reports that the uncompressed length is too large.
	ErrSnappyTooLarge = errors.New("snappy: decoded block is too large")
	// ErrSnappyUnsupported reports that the input isn't supported.
	ErrSnappyUnsupported = errors.New("snappy: unsupported input")
)
var (
	// ErrReservedBlockType is returned when a reserved block type is found.
	// Typically this indicates wrong or corrupted input.
	ErrReservedBlockType = errors.New("invalid input: reserved block type encountered")

	// ErrCompressedSizeTooBig is returned when a block is bigger than allowed.
	// Typically this indicates wrong or corrupted input.
	ErrCompressedSizeTooBig = errors.New("invalid input: compressed size too big")

	// ErrBlockTooSmall is returned when a block is too small to be decoded.
	// Typically returned on invalid input.
	ErrBlockTooSmall = errors.New("block too small")

	// ErrMagicMismatch is returned when a "magic" number isn't what is expected.
	// Typically this indicates wrong or corrupted input.
	ErrMagicMismatch = errors.New("invalid input: magic number mismatch")

	// ErrWindowSizeExceeded is returned when a reference exceeds the valid window size.
	// Typically this indicates wrong or corrupted input.
	ErrWindowSizeExceeded = errors.New("window size exceeded")

	// ErrWindowSizeTooSmall is returned when no window size is specified.
	// Typically this indicates wrong or corrupted input.
	ErrWindowSizeTooSmall = errors.New("invalid input: window size was too small")

	// ErrDecoderSizeExceeded is returned if decompressed size exceeds the configured limit.
	ErrDecoderSizeExceeded = errors.New("decompressed size exceeds configured limit")

	// ErrUnknownDictionary is returned if the dictionary ID is unknown.
	// For the time being dictionaries are not supported.
	ErrUnknownDictionary = errors.New("unknown dictionary")

	// ErrFrameSizeExceeded is returned if the stated frame size is exceeded.
	// This is only returned if SingleSegment is specified on the frame.
	ErrFrameSizeExceeded = errors.New("frame size exceeded")

	// ErrCRCMismatch is returned if CRC mismatches.
	ErrCRCMismatch = errors.New("CRC check failed")

	// ErrDecoderClosed will be returned if the Decoder was used after
	// Close has been called.
	ErrDecoderClosed = errors.New("decoder used after Close")

	// ErrDecoderNilInput is returned when a nil Reader was provided
	// and an operation other than Reset/DecodeAll/Close was attempted.
	ErrDecoderNilInput = errors.New("nil input provided as reader")
)

Functions

func ZipCompressor

func ZipCompressor(opts ...EOption) func(w io.Writer) (io.WriteCloser, error)

ZipCompressor returns a compressor that can be registered with zip libraries. The provided encoder options will be used on all encodes.

Example
// Get zstandard de/compressors for zip.
// These can be used by multiple readers and writers.
compr := zstd.ZipCompressor(zstd.WithWindowSize(1<<20), zstd.WithEncoderCRC(false))
decomp := zstd.ZipDecompressor()

// Try it out...
var buf bytes.Buffer
zw := zip.NewWriter(&buf)
zw.RegisterCompressor(zstd.ZipMethodWinZip, compr)
zw.RegisterCompressor(zstd.ZipMethodPKWare, compr)

// Create 1MB data
tmp := make([]byte, 1<<20)
for i := range tmp {
	tmp[i] = byte(i)
}
w, err := zw.CreateHeader(&zip.FileHeader{
	Name:   "file1.txt",
	Method: zstd.ZipMethodWinZip,
})
if err != nil {
	panic(err)
}
w.Write(tmp)

// Another...
w, err = zw.CreateHeader(&zip.FileHeader{
	Name:   "file2.txt",
	Method: zstd.ZipMethodPKWare,
})
if err != nil {
	panic(err)
}
w.Write(tmp)
zw.Close()

zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
if err != nil {
	panic(err)
}
zr.RegisterDecompressor(zstd.ZipMethodWinZip, decomp)
zr.RegisterDecompressor(zstd.ZipMethodPKWare, decomp)
for _, file := range zr.File {
	rc, err := file.Open()
	if err != nil {
		panic(err)
	}
	b, err := ioutil.ReadAll(rc)
	rc.Close()
	if err != nil {
		panic(err)
	}
	if bytes.Equal(b, tmp) {
		fmt.Println(file.Name, "ok")
	} else {
		fmt.Println(file.Name, "mismatch")
	}
}
Output:

file1.txt ok
file2.txt ok

func ZipDecompressor

func ZipDecompressor() func(r io.Reader) io.ReadCloser

ZipDecompressor returns a decompressor that can be registered with zip libraries. See ZipCompressor for example.

Types

type DOption

type DOption func(*decoderOptions) error

DOption is an option for creating a decoder.

func WithDecoderConcurrency

func WithDecoderConcurrency(n int) DOption

WithDecoderConcurrency will set the concurrency, meaning the maximum number of decoders to run concurrently. The value supplied must be at least 1. By default this will be set to GOMAXPROCS. Deprecated.

func WithDecoderDicts

func WithDecoderDicts(dicts ...[]byte) DOption

WithDecoderDicts registers one or more dictionaries for the decoder. If several dictionaries with the same ID are provided, the last one will be used.

func WithDecoderLowmem

func WithDecoderLowmem(b bool) DOption

WithDecoderLowmem sets whether the decoder should use less memory, at the cost of possibly having to allocate more while running.

func WithDecoderMaxMemory

func WithDecoderMaxMemory(n uint64) DOption

WithDecoderMaxMemory sets a maximum decoded size for in-memory non-streaming operations, or a maximum window size for streaming operations. This can be used to control the memory usage of potentially hostile content. Maximum and default is 1 << 63 bytes.

func WithDecoderMaxWindow

func WithDecoderMaxWindow(size uint64) DOption

WithDecoderMaxWindow sets a maximum window size for decodes. This allows rejecting input that would cause high memory usage. The Decoder will likely allocate more memory than this, depending on the WithDecoderLowmem setting. If WithDecoderMaxMemory is set to a lower value, that value will be used. Default is 512MB, maximum is ~3.75 TB as per the zstandard spec.
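The ~3.75 TB upper bound follows from how the Zstandard frame format encodes the window size in a single Window_Descriptor byte. The sketch below reproduces that arithmetic from the format specification; windowSizeFromDescriptor is a hypothetical helper written for illustration and is not part of this package.

```go
package main

import "fmt"

// windowSizeFromDescriptor computes Window_Size from the one-byte
// Window_Descriptor as laid out in the Zstandard format specification:
// the upper 5 bits are the exponent, the lower 3 bits the mantissa.
// Hypothetical helper, not an API of this package.
func windowSizeFromDescriptor(b byte) uint64 {
	exponent := uint64(b >> 3)
	mantissa := uint64(b & 7)
	windowBase := uint64(1) << (10 + exponent)
	windowAdd := (windowBase / 8) * mantissa
	return windowBase + windowAdd
}

func main() {
	const defaultMax = 1 << 29 // same value as MaxWindowSize (512MB)

	smallest := windowSizeFromDescriptor(0x00) // exponent 0, mantissa 0
	largest := windowSizeFromDescriptor(0xFF)  // exponent 31, mantissa 7

	fmt.Println("smallest window:", smallest)
	fmt.Println("largest window: ", largest)
	fmt.Println("largest exceeds the default limit:", largest > defaultMax)
}
```

A frame that declares a window above the configured limit is the kind of input this option is meant to reject before any large buffer is allocated.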

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder provides decoding of zstandard streams. The decoder has been designed to operate without allocations after a warmup. This means that you should store the decoder for best performance. To re-use a stream decoder, call Reset(r io.Reader) error to switch to another stream. A decoder can safely be re-used even if the previous stream failed. To release the resources, you must call the Close() function on a decoder.

func NewReader

func NewReader(r io.Reader, opts ...DOption) (*Decoder, error)

NewReader creates a new decoder. A nil Reader can be provided in which case Reset can be used to start a decode.

A Decoder can be used in two modes:

1) As a stream, or 2) For stateless decoding using DecodeAll.

Only a single stream can be decoded concurrently, but the same decoder can run multiple concurrent stateless decodes. It is even possible to use stateless decodes while a stream is being decoded.

The Reset function can be used to initiate a new stream, which will considerably reduce the allocations normally caused by NewReader.

func (*Decoder) Close

func (d *Decoder) Close()

Close will release all resources. It is NOT possible to reuse the decoder after this.

func (*Decoder) DecodeAll

func (d *Decoder) DecodeAll(input, dst []byte) ([]byte, error)

DecodeAll allows stateless decoding of a blob of bytes. Output will be appended to dst, so if the destination size is known you can pre-allocate the destination slice to avoid allocations. DecodeAll can be used concurrently. The Decoder concurrency limits will be respected.
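The append-to-dst contract can be sketched without the package itself. In the example below, appendAll is a hypothetical stand-in with the same shape as DecodeAll(input, dst); its "decoding" is a plain copy, so only the buffer handling is meaningful here.

```go
package main

import "fmt"

// appendAll is a stand-in with the same shape as DecodeAll(input, dst):
// it appends its result to dst and returns the extended slice.
// The "decoding" is just a copy to keep the sketch dependency-free.
func appendAll(input, dst []byte) ([]byte, error) {
	return append(dst, input...), nil
}

func main() {
	// Pre-allocate once with enough capacity for the largest expected output.
	buf := make([]byte, 0, 64)

	for _, msg := range []string{"first payload", "second"} {
		// buf[:0] keeps the capacity but discards the previous contents,
		// so no new allocation is needed while the output fits.
		out, err := appendAll([]byte(msg), buf[:0])
		if err != nil {
			panic(err)
		}
		reused := &out[0] == &buf[:1][0]
		fmt.Printf("%s (reused buffer: %v)\n", out, reused)
	}
}
```

Passing a non-empty dst keeps its existing contents and places the new output after them, which is why buf[:0] is used when the old contents are no longer needed.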

func (*Decoder) IOReadCloser

func (d *Decoder) IOReadCloser() io.ReadCloser

IOReadCloser returns the decoder as an io.ReadCloser for convenience. Any changes to the decoder will be reflected, so the returned ReadCloser can be reused along with the decoder. io.WriterTo is also supported by the returned ReadCloser.

func (*Decoder) Read

func (d *Decoder) Read(p []byte) (int, error)

Read bytes from the decompressed stream into p. Returns the number of bytes written and any error that occurred. When the stream is done, io.EOF will be returned.

func (*Decoder) Reset

func (d *Decoder) Reset(r io.Reader) error

Reset will reset the decoder to the supplied stream after the current one has finished processing. Note that this functionality cannot be used after Close has been called. Reset can be called with a nil reader to release references to the previous reader. After being called with a nil reader, no operations other than Reset, DecodeAll or Close should be used.

func (*Decoder) WriteTo

func (d *Decoder) WriteTo(w io.Writer) (int64, error)

WriteTo writes data to w until there's no more data to write or when an error occurs. The return value n is the number of bytes written. Any error encountered during the write is also returned.

type EOption

type EOption func(*encoderOptions) error

EOption is an option for creating an encoder.

func WithAllLitEntropyCompression

func WithAllLitEntropyCompression(b bool) EOption

WithAllLitEntropyCompression will apply entropy compression if no matches are found. Disabling this will skip incompressible data faster, but in cases with no matches but skewed character distribution compression is lost. Default value depends on the compression level selected.

func WithEncoderCRC

func WithEncoderCRC(b bool) EOption

WithEncoderCRC will add CRC value to output. Output will be 4 bytes larger.

func WithEncoderConcurrency

func WithEncoderConcurrency(n int) EOption

WithEncoderConcurrency will set the concurrency, meaning the maximum number of encoders to run concurrently. The value supplied must be at least 1. By default this will be set to GOMAXPROCS. Deprecated.

func WithEncoderDict

func WithEncoderDict(dict []byte) EOption

WithEncoderDict registers a dictionary that will be used for the encode. The encoder *may* choose to use no dictionary instead for certain payloads.

func WithEncoderLevel

func WithEncoderLevel(l EncoderLevel) EOption

WithEncoderLevel specifies a predefined compression level.

func WithEncoderPadding

func WithEncoderPadding(n int) EOption

WithEncoderPadding will add padding to all output so the size will be a multiple of n. This can be used to obfuscate the exact output size or make blocks of a certain size. The contents will be a skippable frame, so it will be invisible to the decoder. n must be > 0 and <= 1GB, 1<<30 bytes. The padded area will be filled with data from crypto/rand.Reader. If `EncodeAll` is used with data already in the destination, the total size will be a multiple of this.
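The padding arithmetic can be illustrated with a small sketch. A Zstandard skippable frame always carries 8 bytes of overhead (a 4-byte magic number and a 4-byte length), so a gap smaller than that cannot be filled by a single frame. The helper paddingFor below is hypothetical, and the way it handles a too-small gap (moving on to a later multiple of n) is an assumption about a reasonable strategy, not a statement about this package's internals.

```go
package main

import "fmt"

// skippableHeaderSize is the fixed overhead of a Zstandard skippable
// frame: a 4-byte magic number plus a 4-byte length field.
const skippableHeaderSize = 8

// paddingFor is a hypothetical helper (not part of this package) that
// returns how many bytes of skippable frame would bring `written` up to
// a multiple of n. If the gap is too small to hold a skippable frame
// header, it moves on to a later multiple (an assumed strategy).
func paddingFor(written, n int64) int64 {
	if n <= 0 || written < 0 {
		panic("paddingFor: invalid arguments")
	}
	rem := written % n
	if rem == 0 {
		return 0 // already aligned, nothing to add
	}
	pad := n - rem
	for pad < skippableHeaderSize {
		pad += n
	}
	return pad
}

func main() {
	const multiple = 4096
	for _, written := range []int64{4096, 1000, 4093} {
		pad := paddingFor(written, multiple)
		fmt.Println(written, pad, (written+pad)%multiple == 0)
	}
}
```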

func WithLowerEncoderMem

func WithLowerEncoderMem(b bool) EOption

WithLowerEncoderMem will, in some cases, trade lower memory usage for slower encoding speed. This will not change the window size, which is the primary means of reducing memory usage. See WithWindowSize.

func WithNoEntropyCompression

func WithNoEntropyCompression(b bool) EOption

WithNoEntropyCompression will always skip entropy compression of literals. This can be useful if content has matches, but unlikely to benefit from entropy compression. Usually the slight speed improvement is not worth enabling this.

func WithSingleSegment

func WithSingleSegment(b bool) EOption

WithSingleSegment will set the "single segment" flag when EncodeAll is used. If this flag is set, data must be regenerated within a single continuous memory segment. In this case, Window_Descriptor byte is skipped, but Frame_Content_Size is necessarily present. As a consequence, the decoder must allocate a memory segment of size equal or larger than size of your content. In order to preserve the decoder from unreasonable memory requirements, a decoder is allowed to reject a compressed frame which requests a memory size beyond decoder's authorized range. For broader compatibility, decoders are recommended to support memory sizes of at least 8 MB. This is only a recommendation, each decoder is free to support higher or lower limits, depending on local limitations. If this is not specified, block encodes will automatically choose this based on the input size. This setting has no effect on streamed encodes.

func WithWindowSize

func WithWindowSize(n int) EOption

WithWindowSize will set the maximum allowed back-reference distance. The value must be a power of two between MinWindowSize and MaxWindowSize. A larger value will enable better compression but allocate more memory and, for above-default values, take considerably longer. The default value is determined by the compression level.
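The constraint on the value can be checked up front before building an encoder. The following is a minimal sketch: validWindowSize is a hypothetical helper, and the two constants mirror MinWindowSize and MaxWindowSize from this package rather than importing them.

```go
package main

import "fmt"

const (
	minWindowSize = 1 << 10 // mirrors MinWindowSize (1 KB)
	maxWindowSize = 1 << 29 // mirrors MaxWindowSize (512MB)
)

// validWindowSize is a hypothetical pre-check (not part of this package)
// that reports whether n is a power of two within the allowed range.
func validWindowSize(n int) bool {
	if n < minWindowSize || n > maxWindowSize {
		return false
	}
	// A power of two has exactly one bit set, so n & (n-1) clears it to zero.
	return n&(n-1) == 0
}

func main() {
	for _, n := range []int{512, 1 << 10, 1 << 20, 3 << 20, 1 << 29, 1 << 30} {
		fmt.Println(n, validWindowSize(n))
	}
}
```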

func WithZeroFrames

func WithZeroFrames(b bool) EOption

WithZeroFrames will encode 0 length input as full frames. This can be needed for compatibility with zstandard usage, but is not needed for this package.

type Encoder

type Encoder struct {
	// contains filtered or unexported fields
}

Encoder provides encoding to Zstandard. An Encoder can be used for either compressing a stream via the io.WriteCloser interface supported by the Encoder or as multiple independent tasks via the EncodeAll function. Smaller encodes are encouraged to use the EncodeAll function. Use NewWriter to create a new instance.

func NewWriter

func NewWriter(w io.Writer, opts ...EOption) (*Encoder, error)

NewWriter will create a new Zstandard encoder. If the encoder will be used for encoding blocks a nil writer can be used.

func (*Encoder) Close

func (e *Encoder) Close() error

Close will flush the final output and close the stream. The function will block until everything has been written. The Encoder can still be re-used after calling this.

func (*Encoder) EncodeAll

func (e *Encoder) EncodeAll(src, dst []byte) []byte

EncodeAll will encode all input in src and append it to dst. This function can be called concurrently, but each call will only run on a single goroutine. If empty input is given, nothing is returned, unless WithZeroFrames is specified. Encoded blocks can be concatenated and the result will be the combined input stream. Data compressed with EncodeAll can be decoded with the Decoder, using either a stream or DecodeAll.

func (*Encoder) Flush

func (e *Encoder) Flush() error

Flush will send the currently written data to output and block until everything has been written. This should only be used on rare occasions where pushing the currently queued data is critical.

func (*Encoder) ReadFrom

func (e *Encoder) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom reads data from r until EOF or error. The return value n is the number of bytes read. Any error except io.EOF encountered during the read is also returned.

io.Copy uses io.ReaderFrom if available.

func (*Encoder) Reset

func (e *Encoder) Reset(w io.Writer)

Reset will re-initialize the writer and new writes will encode to the supplied writer as a new, independent stream.

func (*Encoder) ResetContentSize

func (e *Encoder) ResetContentSize(w io.Writer, size int64)

ResetContentSize will reset and set a content size for the next stream. If the number of bytes written does not match the size given, an error will be returned when calling Close(). This is removed when Reset is called. Sizes <= 0 result in no content size being set.

func (*Encoder) Write

func (e *Encoder) Write(p []byte) (n int, err error)

Write data to the encoder. Input data will be buffered and as the buffer fills up content will be compressed and written to the output. When done writing, use Close to flush the remaining output and write CRC if requested.

type EncoderLevel

type EncoderLevel int

EncoderLevel predefines encoder compression levels. Only use the constants made available, since the actual mapping of these values is very likely to change and your compression could change unpredictably when upgrading the library.

const (

	// SpeedFastest will choose the fastest reasonable compression.
	// This is roughly equivalent to the fastest Zstandard mode.
	SpeedFastest EncoderLevel

	// SpeedDefault is the default "pretty fast" compression option.
	// This is roughly equivalent to the default Zstandard mode (level 3).
	SpeedDefault

	// SpeedBetterCompression will yield better compression than the default.
	// Currently it is about zstd level 7-8 with ~ 2x-3x the default CPU usage.
	// By using this, notice that CPU usage may go up in the future.
	SpeedBetterCompression

	// SpeedBestCompression will choose the best available compression option.
	// This will offer the best compression no matter the CPU cost.
	SpeedBestCompression
)

func EncoderLevelFromString

func EncoderLevelFromString(s string) (bool, EncoderLevel)

EncoderLevelFromString will convert a string representation of an encoding level back to a compression level. The compare is not case sensitive. If the string wasn't recognized, (false, SpeedDefault) will be returned.

func EncoderLevelFromZstd

func EncoderLevelFromZstd(level int) EncoderLevel

EncoderLevelFromZstd will return the encoder level that most closely matches the compression ratio of a specific zstd compression level. Many input values will provide the same compression level.

func (EncoderLevel) String

func (e EncoderLevel) String() string

String provides a string representation of the compression level.

type Header

type Header struct {
	// WindowSize is the window of data to keep while decoding.
	// Will only be set if HasFCS is false.
	WindowSize uint64

	// Frame content size.
	// Expected size of the entire frame.
	FrameContentSize uint64

	// Dictionary ID.
	// If 0, no dictionary.
	DictionaryID uint32

	// First block information.
	FirstBlock struct {
		// OK will be set if first block could be decoded.
		OK bool

		// Is this the last block of a frame?
		Last bool

		// Is the data compressed?
		// If true CompressedSize will be populated.
		// Unfortunately DecompressedSize cannot be determined
		// without decoding the blocks.
		Compressed bool

		// DecompressedSize is the expected decompressed size of the block.
		// Will be 0 if it cannot be determined.
		DecompressedSize int

		// CompressedSize of the data in the block.
		// Does not include the block header.
		// Will be equal to DecompressedSize if not Compressed.
		CompressedSize int
	}

	// Skippable will be true if the frame is meant to be skipped.
	// No other information will be populated.
	Skippable bool

	// If set there is a checksum present for the block content.
	HasCheckSum bool

	// If this is true FrameContentSize will have a valid value
	HasFCS bool

	SingleSegment bool
}

Header contains information about the first frame and block within that.
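The FirstBlock fields correspond to the 3-byte block header defined by the Zstandard format: a little-endian value holding the last-block flag in bit 0, the block type in bits 1-2 and the block size in the remaining 21 bits. The sketch below decodes that layout with the standard library only; blockInfo and parseBlockHeader are hypothetical names, and the errors are local stand-ins rather than this package's error values.

```go
package main

import (
	"errors"
	"fmt"
)

// blockInfo holds the fields of a Zstandard block header.
// Hypothetical type for illustration, not part of this package.
type blockInfo struct {
	Last bool   // Last_Block flag
	Type uint8  // 0 raw, 1 RLE, 2 compressed, 3 reserved
	Size uint32 // Block_Size field (21 bits)
}

// parseBlockHeader decodes the 3-byte little-endian block header.
func parseBlockHeader(b []byte) (blockInfo, error) {
	if len(b) < 3 {
		return blockInfo{}, errors.New("need 3 bytes for a block header")
	}
	v := uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16
	info := blockInfo{
		Last: v&1 == 1,
		Type: uint8(v>>1) & 3,
		Size: v >> 3,
	}
	if info.Type == 3 {
		// The reserved type is not valid in a well-formed stream.
		return info, errors.New("reserved block type encountered")
	}
	return info, nil
}

func main() {
	// Last block, compressed (type 2), size 100:
	// 100<<3 | 2<<1 | 1 = 0x000325, stored little-endian.
	info, err := parseBlockHeader([]byte{0x25, 0x03, 0x00})
	if err != nil {
		panic(err)
	}
	fmt.Printf("last=%v type=%d size=%d\n", info.Last, info.Type, info.Size)
}
```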

func (*Header) Decode

func (h *Header) Decode(in []byte) error

Decode the header from the beginning of the stream. This will decode the frame header and the first block header if enough bytes are provided. It is recommended to provide at least HeaderMaxSize bytes. If the frame header cannot be read an error will be returned. If there isn't enough input, io.ErrUnexpectedEOF is returned. The FirstBlock.OK will indicate if enough information was available to decode the first block header.
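Before any header fields can be read, the first four bytes identify what kind of frame follows. As a rough illustration using only the standard library, the sketch below classifies input by the magic numbers defined in the Zstandard format; classify is a hypothetical helper and does not reproduce what Header.Decode does internally.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	frameMagic         = 0xFD2FB528 // regular Zstandard frame
	skippableMagicBase = 0x184D2A50 // skippable frames use 0x184D2A50..0x184D2A5F
)

// classify is a hypothetical helper (not part of this package) that
// inspects the little-endian magic number at the start of in.
func classify(in []byte) string {
	if len(in) < 4 {
		return "short" // not enough input to decide
	}
	m := binary.LittleEndian.Uint32(in)
	switch {
	case m == frameMagic:
		return "zstd"
	case m&0xFFFFFFF0 == skippableMagicBase:
		return "skippable"
	default:
		return "unknown" // the case a magic-mismatch error would report
	}
}

func main() {
	fmt.Println(classify([]byte{0x28, 0xB5, 0x2F, 0xFD}))
	fmt.Println(classify([]byte{0x50, 0x2A, 0x4D, 0x18}))
	fmt.Println(classify([]byte{0x1F, 0x8B, 0x08, 0x00}))
	fmt.Println(classify([]byte{0x28}))
}
```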

type SnappyConverter

type SnappyConverter struct {
	// contains filtered or unexported fields
}

SnappyConverter can read Snappy-compressed streams and convert them to zstd. Conversion is done by converting the stream directly from Snappy without intermediate full decoding. Therefore the compression ratio is much lower than what can be achieved by a full decompression and compression, and a faulty Snappy stream may lead to a faulty Zstandard stream without any errors being generated. No CRC value is generated and not all CRC values of the Snappy stream are checked. However, it provides really fast recompression of Snappy streams. The converter can be reused to avoid allocations, even after errors.

func (*SnappyConverter) Convert

func (r *SnappyConverter) Convert(in io.Reader, w io.Writer) (int64, error)

Convert the Snappy stream supplied in 'in' and write the Zstandard stream to 'w'. If any error is detected on the Snappy stream it is returned. The number of bytes written is returned.
