lz

package module
v0.2.1
Published: Apr 30, 2023 License: BSD-3-Clause Imports: 11 Imported by: 2

README

Module LZ

The LZ module provides sequencers that convert byte streams into blocks of Lempel-Ziv 77 sequences. It is designed to support multiple compression methods that differ in how they encode those LZ77 sequences.

Documentation

Overview

Package lz supports encoding and decoding of LZ77 sequences. A sequence, as described in the Zstandard specification, consists of a literal copy command followed by a match copy command. The literal copy command is described by the length in literal bytes to be copied and the match command consists of the distance of the match to copy and the length of the match in bytes.
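
The sequence model described above can be illustrated with a small, self-contained sketch. The seq and block types are local stand-ins modeled on the Seq and Block types documented further down, and decode is a hypothetical helper that is not part of the package API. The sketch assumes that offsets count backwards from the end of the already decoded data.

```go
package main

import (
	"errors"
	"fmt"
)

// seq mirrors the documented Seq fields that matter for decoding.
type seq struct {
	LitLen   uint32
	MatchLen uint32
	Offset   uint32
}

// block is a local stand-in for the documented Block type.
type block struct {
	Sequences []seq
	Literals  []byte
}

// decode appends the decoded content of blk to dst. It is an illustrative
// helper and not a function of the lz package.
func decode(dst []byte, blk block) ([]byte, error) {
	lits := blk.Literals
	for _, s := range blk.Sequences {
		if int(s.LitLen) > len(lits) {
			return dst, errors.New("not enough literals")
		}
		// Literal copy command: take LitLen bytes from the literals.
		dst = append(dst, lits[:s.LitLen]...)
		lits = lits[s.LitLen:]
		if s.MatchLen > 0 && (s.Offset == 0 || int(s.Offset) > len(dst)) {
			return dst, errors.New("match offset out of range")
		}
		// Match copy command: copy byte by byte, because the match may
		// overlap the bytes it is producing.
		for i := uint32(0); i < s.MatchLen; i++ {
			dst = append(dst, dst[len(dst)-int(s.Offset)])
		}
	}
	// Literals that no sequence consumed are appended last.
	return append(dst, lits...), nil
}

func main() {
	blk := block{
		Sequences: []seq{{LitLen: 3, MatchLen: 6, Offset: 3}},
		Literals:  []byte("abc!"),
	}
	out, err := decode(nil, blk)
	fmt.Printf("%s %v\n", out, err)
}
```

The match in this example is longer than its offset, so it repeats the three literal bytes twice before the trailing literal is appended.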

A Sequencer is an encoder that converts a byte stream into blocks of sequences. A Decoder converts the block of sequences into the original decompressed byte stream. We provide a Sequencer interface only supporting the Sequence interface.

The actual basic Sequencer provided by the package supports the SeqBuffer interface, which has methods for writing and reading from the buffer. A pure Sequencer is provided by the Wrap function.

The module provides multiple sequencer implementations that provide different combinations of encoding speed and compression ratios. Usually a slower sequencer will generate a better compression ratio.

The Decoder slides the decompression window through a larger buffer implemented by DecBuffer.

The library supports the implementation of Sequencers outside the package that can then be used by real compressors as provided by the github.com/ulikunitz/xz module.

Constants

const (
	// NoTrailingLiterals tells a sequencer that trailing literals don't
	// need to be included in the block.
	NoTrailingLiterals = 1 << iota
)

Flags for the sequence function stored in the block structure.

Variables

var (
	ErrOutOfBuffer = errors.New("lz: offset out of buffer")
	ErrEndOfBuffer = errors.New("lz: end of buffer")
)

Errors returned by SeqBuffer.ReadAt

var ErrEmptyBuffer = errors.New("lz: no more data in buffer")

ErrEmptyBuffer indicates that no more data is available in the buffer. It will be returned by the Sequence method of Sequencer.

var ErrFullBuffer = errors.New("lz: buffer is full")

ErrFullBuffer indicates that the buffer is full. It will be returned by the Write and ReadFrom methods of the Sequencer.

Functions

func XZCost added in v0.2.0

func XZCost(m, o uint32) uint64

XZCost models the cost of the bits going into the XZ encoding. The maximum edge length is 273.

Types

type BDHSConfig

type BDHSConfig struct {
	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int

	InputLen1 int
	HashBits1 int
	InputLen2 int
	HashBits2 int
}

BDHSConfig provides the configuration parameters for the backward-looking double Hash Sequencer.

func (*BDHSConfig) BufConfig added in v0.2.1

func (cfg *BDHSConfig) BufConfig() BufConfig

BufConfig returns the BufConfig value containing the buffer parameters.

func (*BDHSConfig) MarshalJSON added in v0.2.1

func (cfg *BDHSConfig) MarshalJSON() (p []byte, err error)

MarshalJSON creates the JSON string for the configuration. Note that it adds a property Name with value "BDHS" to the structure.

func (BDHSConfig) NewSequencer added in v0.1.0

func (cfg BDHSConfig) NewSequencer() (s Sequencer, err error)

NewSequencer creates a new DoubleHashSequencer.

func (*BDHSConfig) SetBufConfig added in v0.2.1

func (cfg *BDHSConfig) SetBufConfig(bc BufConfig)

func (*BDHSConfig) SetDefaults added in v0.2.1

func (cfg *BDHSConfig) SetDefaults()

SetDefaults uses the defaults for the configuration parameters that are set to zero.

func (*BDHSConfig) Verify

func (cfg *BDHSConfig) Verify() error

Verify checks the configuration for errors.

type BHSConfig

type BHSConfig struct {
	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int

	InputLen int
	HashBits int
}

BHSConfig provides the parameters for the backward hash sequencer.

func (*BHSConfig) BufConfig added in v0.2.1

func (cfg *BHSConfig) BufConfig() BufConfig

BufConfig returns the BufConfig value containing the buffer parameters.

func (*BHSConfig) MarshalJSON added in v0.2.1

func (cfg *BHSConfig) MarshalJSON() (p []byte, err error)

MarshalJSON creates the JSON string for the configuration. Note that it adds a property Name with value "BHS" to the structure.

func (BHSConfig) NewSequencer added in v0.1.0

func (cfg BHSConfig) NewSequencer() (s Sequencer, err error)

NewSequencer creates a new Backward Hash Sequencer.

func (*BHSConfig) SetBufConfig added in v0.2.1

func (cfg *BHSConfig) SetBufConfig(bc BufConfig)

SetBufConfig sets the buffer configuration parameters of the backward hash sequencer configuration.

func (*BHSConfig) SetDefaults added in v0.2.1

func (cfg *BHSConfig) SetDefaults()

SetDefaults sets values that are zero to their default values.

func (*BHSConfig) Verify

func (cfg *BHSConfig) Verify() error

Verify checks the configuration for correctness.

type BUHSConfig added in v0.1.1

type BUHSConfig struct {
	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int

	InputLen   int
	HashBits   int
	BucketSize int
}

BUHSConfig provides the configuration parameters for the bucket hash sequencer.

func (*BUHSConfig) BufConfig added in v0.2.1

func (cfg *BUHSConfig) BufConfig() BufConfig

BufConfig returns the BufConfig value containing the buffer parameters.

func (*BUHSConfig) MarshalJSON added in v0.2.1

func (cfg *BUHSConfig) MarshalJSON() (p []byte, err error)

MarshalJSON creates the JSON string for the configuration. Note that it adds a property Name with value "BUHS" to the structure.

func (BUHSConfig) NewSequencer added in v0.1.1

func (cfg BUHSConfig) NewSequencer() (s Sequencer, err error)

NewSequencer creates a new hash sequencer.

func (*BUHSConfig) SetBufConfig added in v0.2.1

func (cfg *BUHSConfig) SetBufConfig(bc BufConfig)

SetBufConfig sets the buffer configuration parameters of the sequencer configuration.

func (*BUHSConfig) SetDefaults added in v0.2.1

func (cfg *BUHSConfig) SetDefaults()

SetDefaults sets values that are zero to their default values.

func (*BUHSConfig) Verify added in v0.1.1

func (cfg *BUHSConfig) Verify() error

Verify checks the config for correctness.

type Block

type Block struct {
	Sequences []Seq
	Literals  []byte
}

Block stores sequences and literals. Note that the sequences stored in the Sequences slice might not consume the whole Literals slice. The remaining literals must be added to the decoded text after all the sequences have been decoded and their content added to the decoder buffer.

func (*Block) Len

func (b *Block) Len() int64

Len computes the length of the block in bytes. It assumes that the sum of the literal lengths in the sequences doesn't exceed the length of the Literals byte slice.
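
One way to read this description is that the block length is the number of literal bytes plus the sum of all match lengths. The following sketch computes the length under that assumption; seq, block and blockLen are local stand-ins, and the actual implementation of Len may differ.

```go
package main

import "fmt"

// seq and block are local stand-ins for the documented Seq and Block types.
type seq struct{ LitLen, MatchLen, Offset, Aux uint32 }

type block struct {
	Sequences []seq
	Literals  []byte
}

// blockLen returns the number of bytes the block decodes to, assuming that
// every literal byte is emitted exactly once and each sequence adds
// MatchLen further bytes.
func blockLen(b *block) int64 {
	n := int64(len(b.Literals))
	for _, s := range b.Sequences {
		n += int64(s.MatchLen)
	}
	return n
}

func main() {
	b := &block{
		Sequences: []seq{{LitLen: 3, MatchLen: 6, Offset: 3}},
		Literals:  []byte("abc!"),
	}
	// 3 sequence literals + 6 match bytes + 1 trailing literal
	fmt.Println(blockLen(b))
}
```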

type BufConfig added in v0.2.0

type BufConfig struct {
	ShrinkSize int
	BufferSize int

	WindowSize int
	BlockSize  int
}

BufConfig describes the various sizes relevant for the buffer. Note that ShrinkSize should be significantly smaller than BufferSize, at most 50% of it. The WindowSize is independent of the BufferSize, but usually the BufferSize should be larger than or equal to the WindowSize. A typical BlockSize, for instance for Zstandard compression, is 128 KiB; it limits the largest match length.

func (*BufConfig) BufferConfig added in v0.2.0

func (cfg *BufConfig) BufferConfig() BufConfig

BufferConfig returns itself, which will be used by the structures embedding the value.

func (*BufConfig) SetDefaults added in v0.2.1

func (cfg *BufConfig) SetDefaults()

SetDefaults sets the defaults for the various size values. The defaults are given below.

BufferSize:   8 MiB
ShrinkSize:  32 KiB (or half of BufferSize, if it is smaller than 64 KiB)
WindowSize: BufferSize
BlockSize:  128 KiB
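
The defaults listed above could be applied roughly as in the following sketch. The bufConfig type and its setDefaults method are local stand-ins. The sketch assumes that only zero fields are replaced and that the half-of-BufferSize rule applies when BufferSize is below 64 KiB; the package may resolve these details differently.

```go
package main

import "fmt"

// bufConfig is a local stand-in for the documented BufConfig type.
type bufConfig struct {
	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int
}

const (
	kiB = 1 << 10
	miB = 1 << 20
)

// setDefaults replaces zero fields with the defaults from the list above.
func (cfg *bufConfig) setDefaults() {
	if cfg.BufferSize == 0 {
		cfg.BufferSize = 8 * miB
	}
	if cfg.ShrinkSize == 0 {
		if cfg.BufferSize < 64*kiB {
			// Small buffers get half of the buffer as shrink size.
			cfg.ShrinkSize = cfg.BufferSize / 2
		} else {
			cfg.ShrinkSize = 32 * kiB
		}
	}
	if cfg.WindowSize == 0 {
		cfg.WindowSize = cfg.BufferSize
	}
	if cfg.BlockSize == 0 {
		cfg.BlockSize = 128 * kiB
	}
}

func main() {
	var a bufConfig
	a.setDefaults()
	fmt.Printf("%+v\n", a)

	b := bufConfig{BufferSize: 16 * kiB}
	b.setDefaults()
	fmt.Printf("%+v\n", b)
}
```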

func (*BufConfig) Verify added in v0.2.0

func (cfg *BufConfig) Verify() error

Verify checks the buffer configuration. Note that window size and block size are independent of the other sizes; only the shrink size must be less than the buffer size.

type DHSConfig

type DHSConfig struct {
	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int

	InputLen1 int
	HashBits1 int
	InputLen2 int
	HashBits2 int
}

DHSConfig provides the configuration parameters for the DoubleHashSequencer.

func (*DHSConfig) BufConfig added in v0.2.1

func (cfg *DHSConfig) BufConfig() BufConfig

BufConfig returns the BufConfig value containing the buffer parameters.

func (*DHSConfig) MarshalJSON added in v0.2.1

func (cfg *DHSConfig) MarshalJSON() (p []byte, err error)

MarshalJSON creates the JSON string for the configuration. Note that it adds a property Name with value "DHS" to the structure.

func (DHSConfig) NewSequencer added in v0.1.0

func (cfg DHSConfig) NewSequencer() (s Sequencer, err error)

NewSequencer creates a new DoubleHashSequencer.

func (*DHSConfig) SetBufConfig added in v0.2.1

func (cfg *DHSConfig) SetBufConfig(bc BufConfig)

SetBufConfig sets the buffer configuration parameters of the sequencer configuration.

func (*DHSConfig) SetDefaults added in v0.2.1

func (cfg *DHSConfig) SetDefaults()

SetDefaults uses the defaults for the configuration parameters that are set to zero.

func (*DHSConfig) Verify

func (cfg *DHSConfig) Verify() error

Verify checks the configuration for errors.

type DecBuffer added in v0.1.1

type DecBuffer struct {
	// Data is the actual buffer. The end of the slice is also the head of
	// the dictionary window.
	Data []byte
	// R tracks the position of the reads from the buffer and must be less
	// or equal the length of the Data slice.
	R int
	// Off records the total offset and marks the end of the Data slice,
	// which is also the end of the dictionary window.
	Off int64

	// DecConfig provides the configuration parameters WindowSize and
	// BufferSize.
	DecConfig
}

DecBuffer provides a simple buffer for the decoding of LZ77 sequences.

func (*DecBuffer) ByteAtEnd added in v0.1.1

func (b *DecBuffer) ByteAtEnd(off int) byte

ByteAtEnd returns the byte at the end of the buffer.

func (*DecBuffer) Init added in v0.1.1

func (b *DecBuffer) Init(cfg DecConfig) error

Init initializes the DecBuffer value.

func (*DecBuffer) Read added in v0.1.1

func (b *DecBuffer) Read(p []byte) (n int, err error)

Read reads decoded data from the buffer.

func (*DecBuffer) Reset added in v0.1.1

func (b *DecBuffer) Reset()

Reset puts the DecBuffer back to the initialized status.

func (*DecBuffer) Write added in v0.1.1

func (b *DecBuffer) Write(p []byte) (n int, err error)

Write puts the slice into the buffer. The method writes the slice either completely or not at all; in the latter case it returns 0 and ErrFullBuffer.

func (*DecBuffer) WriteBlock added in v0.1.1

func (b *DecBuffer) WriteBlock(blk Block) (n, k, l int, err error)

WriteBlock writes sequences from the block into the buffer. A single sequence will be written in an atomic manner, because the block value will not be modified. If there is not enough space in the buffer ErrFullBuffer will be returned.

We are not limiting the growth of the array to BufferSize. We may consume more memory but we are faster.

The return values n, k and l provide the number of bytes written into the buffer, the number of sequences as well as the number of literals.

func (*DecBuffer) WriteByte added in v0.1.1

func (b *DecBuffer) WriteByte(c byte) error

WriteByte writes a single byte into the buffer.

func (*DecBuffer) WriteMatch added in v0.1.1

func (b *DecBuffer) WriteMatch(m, o uint32) (n int, err error)

WriteMatch puts the match at the end of the buffer. The match is either written completely, or n=0 and ErrFullBuffer are returned.
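
The all-or-nothing behavior can be sketched as follows. The writeMatch function, the limit parameter and errFullBuffer are hypothetical names for this illustration; the sketch assumes that m is the match length and o the distance back from the end of the buffer.

```go
package main

import (
	"errors"
	"fmt"
)

// errFullBuffer stands in for the documented ErrFullBuffer.
var errFullBuffer = errors.New("buffer is full")

// writeMatch appends a match of length m at distance o to data, unless the
// result would exceed limit bytes. In that case nothing is written.
func writeMatch(data []byte, limit int, m, o uint32) ([]byte, int, error) {
	if o == 0 || int(o) > len(data) {
		return data, 0, errors.New("offset out of range")
	}
	if len(data)+int(m) > limit {
		// The match does not fit completely, so the buffer stays untouched.
		return data, 0, errFullBuffer
	}
	// Copy byte by byte because the match may overlap its own output.
	for i := 0; i < int(m); i++ {
		data = append(data, data[len(data)-int(o)])
	}
	return data, int(m), nil
}

func main() {
	data := []byte("ab")
	data, n, err := writeMatch(data, 8, 4, 2)
	fmt.Printf("%s %d %v\n", data, n, err)

	// The second match would grow the buffer to 10 bytes and is rejected.
	data, n, err = writeMatch(data, 8, 4, 2)
	fmt.Printf("%s %d %v\n", data, n, err)
}
```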

func (*DecBuffer) WriteTo added in v0.1.1

func (b *DecBuffer) WriteTo(w io.Writer) (n int64, err error)

WriteTo writes the decoded data to the writer.

type DecConfig added in v0.2.1

type DecConfig struct {
	// Size of the sliding dictionary window in bytes.
	WindowSize int
	// Maximum size of the buffer in bytes.
	BufferSize int
}

DecConfig contains the parameters for the DecBuffer and Decoder types. The WindowSize must be smaller than the BufferSize. It is recommended to set the BufferSize twice as large as the WindowSize.

func (*DecConfig) SetDefaults added in v0.2.1

func (cfg *DecConfig) SetDefaults()

SetDefaults sets the zero values in DecConfig to default values. Note that the default BufferSize is twice the WindowSize.

func (*DecConfig) Verify added in v0.2.1

func (cfg *DecConfig) Verify() error

Verify checks the parameters of the DecConfig value and returns an error for the first problem.

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder decodes LZ77 sequences and writes them into the writer.

func NewDecoder

func NewDecoder(w io.Writer, cfg DecConfig) (*Decoder, error)

NewDecoder creates a new decoder. The first issue with the configuration will be reported.

func (*Decoder) Flush

func (d *Decoder) Flush() error

Flush writes all remaining data in the buffer to the underlying writer.

func (*Decoder) Init

func (d *Decoder) Init(w io.Writer, cfg DecConfig) error

Init initializes the decoder. The first issue of the configuration value will be reported as error.

func (*Decoder) Reset

func (d *Decoder) Reset(w io.Writer)

Reset initializes the decoder with a new io.Writer.

func (*Decoder) Write

func (d *Decoder) Write(p []byte) (n int, err error)

Write writes the slice into the buffer.

func (*Decoder) WriteBlock

func (d *Decoder) WriteBlock(blk Block) (n, k, l int, err error)

WriteBlock writes the block into the decoder. It returns the number n of bytes, the number k of sequences and the number l of literal bytes written to the decoder.

func (*Decoder) WriteByte added in v0.2.1

func (d *Decoder) WriteByte(c byte) error

WriteByte writes a single byte into the decoder.

type GSASConfig

type GSASConfig struct {
	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int

	// minimum match len
	MinMatchLen int
}

GSASConfig defines the configuration parameter for the greedy suffix array sequencer.

func (*GSASConfig) BufConfig added in v0.2.1

func (cfg *GSASConfig) BufConfig() BufConfig

BufConfig returns the BufConfig value containing the buffer parameters.

func (*GSASConfig) MarshalJSON added in v0.2.1

func (cfg *GSASConfig) MarshalJSON() (p []byte, err error)

MarshalJSON creates the JSON string for the configuration. Note that it adds a property Name with value "GSAS" to the structure.

func (GSASConfig) NewSequencer added in v0.1.0

func (cfg GSASConfig) NewSequencer() (s Sequencer, err error)

NewSequencer generates a new sequencer using the configuration parameters in the structure.

func (*GSASConfig) SetBufConfig added in v0.2.1

func (cfg *GSASConfig) SetBufConfig(bc BufConfig)

SetBufConfig sets the buffer configuration parameters of the sequencer configuration.

func (*GSASConfig) SetDefaults added in v0.2.1

func (cfg *GSASConfig) SetDefaults()

SetDefaults sets the configuration parameters to their defaults. The code doesn't guarantee a consistent configuration.

func (*GSASConfig) Verify

func (cfg *GSASConfig) Verify() error

Verify checks the configuration for inconsistencies.

type HSConfig

type HSConfig struct {
	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int

	InputLen int
	HashBits int
}

HSConfig provides the configuration parameters for the HashSequencer. The sequencer doesn't use ShrinkSize and BufferSize itself, but it provides them to other code that has to handle the buffer.

func (*HSConfig) BufConfig added in v0.2.1

func (cfg *HSConfig) BufConfig() BufConfig

BufConfig returns the BufConfig value containing the buffer parameters.

func (*HSConfig) MarshalJSON added in v0.2.1

func (cfg *HSConfig) MarshalJSON() (p []byte, err error)

MarshalJSON creates the JSON string for the configuration. Note that it adds a property Name with value "HS" to the structure.

func (HSConfig) NewSequencer added in v0.1.0

func (cfg HSConfig) NewSequencer() (s Sequencer, err error)

NewSequencer creates a new hash sequencer.

func (*HSConfig) SetBufConfig added in v0.2.1

func (cfg *HSConfig) SetBufConfig(bc BufConfig)

SetBufConfig sets the buffer configuration parameters of the sequencer configuration.

func (*HSConfig) SetDefaults added in v0.2.1

func (cfg *HSConfig) SetDefaults()

SetDefaults sets values that are zero to their default values.

func (*HSConfig) Verify

func (cfg *HSConfig) Verify() error

Verify checks the configuration for correctness.

type OSASConfig

type OSASConfig struct {
	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int

	MinMatchLen int
	MaxMatchLen int

	Cost string
}

OSASConfig provides the configuration parameters for the Optimizing Suffix Array Sequencer (OSAS).

func (*OSASConfig) BufConfig added in v0.2.1

func (cfg *OSASConfig) BufConfig() BufConfig

BufConfig returns the BufConfig value for the OSAS configuration.

func (*OSASConfig) NewSequencer added in v0.2.0

func (cfg *OSASConfig) NewSequencer() (s Sequencer, err error)

NewSequencer returns the Optimizing Suffix Array Sequencer.

func (*OSASConfig) SetBufConfig added in v0.2.1

func (cfg *OSASConfig) SetBufConfig(bc BufConfig)

SetBufConfig sets the buffer configuration parameters of the sequencer configuration.

func (*OSASConfig) SetDefaults added in v0.2.1

func (cfg *OSASConfig) SetDefaults()

SetDefaults sets the defaults for the zero values of the OSAS configuration.

func (*OSASConfig) Verify

func (cfg *OSASConfig) Verify() error

Verify verifies the configuration for the Optimizing Suffix Array Sequencer.

type Seq

type Seq struct {
	LitLen   uint32
	MatchLen uint32
	Offset   uint32
	Aux      uint32
}

Seq represents a single Lempel-Ziv 77 Sequence describing a match, consisting of the offset, the length of the match and the number of literals preceding the match. The Aux field can be used on upper layers to store additional information.

func (Seq) Len

func (s Seq) Len() int64

Len returns the complete length of the sequence.

type SeqBuffer added in v0.1.1

type SeqBuffer struct {
	// actual buffer data
	Data []byte

	// w position of the head of the window in data.
	W int

	// off start of the data slice, counts all data written and discarded
	// from the buffer.
	Off int64

	BufConfig
}

SeqBuffer provides a base for Sequencer implementations. Since the package allows implementations outside of the package, all members are public.

func (*SeqBuffer) ByteAt added in v0.2.1

func (b *SeqBuffer) ByteAt(off int64) (c byte, err error)

ByteAt returns the byte at total offset off, if it can be provided. If off points to the end of the buffer, ErrEndOfBuffer will be returned; for any other offset outside the buffer, ErrOutOfBuffer will be returned.
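
The total-offset addressing can be sketched as follows. The seqBuffer type here is a reduced local stand-in, and the sketch assumes that Off is the total offset of the first byte in Data; the real method may handle the boundaries differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the documented ErrOutOfBuffer and ErrEndOfBuffer.
var (
	errOutOfBuffer = errors.New("offset out of buffer")
	errEndOfBuffer = errors.New("end of buffer")
)

// seqBuffer is a reduced stand-in. Off is assumed to be the total offset of
// Data[0], so total offset off maps to index off-Off.
type seqBuffer struct {
	Data []byte
	Off  int64
}

// byteAt translates the total offset into an index and checks the bounds.
func (b *seqBuffer) byteAt(off int64) (byte, error) {
	i := off - b.Off
	switch {
	case i < 0 || i > int64(len(b.Data)):
		return 0, errOutOfBuffer
	case i == int64(len(b.Data)):
		return 0, errEndOfBuffer
	}
	return b.Data[i], nil
}

func main() {
	b := &seqBuffer{Data: []byte("hello"), Off: 100}
	for _, off := range []int64{99, 100, 104, 105, 106} {
		c, err := b.byteAt(off)
		fmt.Printf("%d: %q %v\n", off, c, err)
	}
}
```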

func (*SeqBuffer) Init added in v0.1.1

func (b *SeqBuffer) Init(cfg BufConfig) error

Init initializes the buffer. The function sets the defaults for the buffer configuration if required and verifies it. Errors will be reported.

func (*SeqBuffer) PeekAt added in v0.2.1

func (b *SeqBuffer) PeekAt(n int, off int64) (p []byte, err error)

PeekAt returns part of the internal data slice starting at total offset off. The total offset takes all data written to the buffer into account. If the off parameter is outside the current buffer ErrOutOfBuffer will be returned. If less than n bytes of data can be provided ErrEndOfBuffer will be returned.

func (*SeqBuffer) ReadAt added in v0.1.1

func (b *SeqBuffer) ReadAt(p []byte, off int64) (n int, err error)

ReadAt reads data from the buffer at position off. If off is outside the buffer, ErrOutOfBuffer will be reported. If there is not enough data to fill p, ErrEndOfBuffer will be reported. See SeqBuffer.PeekAt for avoiding the copy.

func (*SeqBuffer) ReadFrom added in v0.1.1

func (b *SeqBuffer) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom reads the data from reader into the buffer. If there is an error it will be reported. If the buffer is full, ErrFullBuffer will be reported.

func (*SeqBuffer) Reset added in v0.1.1

func (b *SeqBuffer) Reset(data []byte) error

Reset initializes the buffer with new data. The data slice requires a margin of 7 bytes for the hash sequencers to be used directly. If there is no margin the data will be copied into a slice with enough capacity.

func (*SeqBuffer) Shrink added in v0.2.1

func (b *SeqBuffer) Shrink() int

Shrink will move the window head to the shrink size if it is larger. The amount of data discarded from the buffer, named delta, will be returned.
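
A minimal sketch of such a shrink operation is shown below. The seqBuffer type is a reduced local stand-in, and the sketch assumes that shrinking drops the oldest bytes so that ShrinkSize bytes remain in front of the window head, while Off advances by the discarded amount. The real method may differ in detail.

```go
package main

import "fmt"

// seqBuffer is a reduced stand-in for the documented SeqBuffer.
type seqBuffer struct {
	Data       []byte
	W          int   // position of the window head in Data
	Off        int64 // assumed: total offset of Data[0]
	ShrinkSize int
}

// shrink discards data at the start of the buffer if the window head lies
// beyond ShrinkSize and returns the number of discarded bytes (delta).
func (b *seqBuffer) shrink() int {
	delta := b.W - b.ShrinkSize
	if delta <= 0 {
		return 0
	}
	// copy handles overlapping slices correctly.
	n := copy(b.Data, b.Data[delta:])
	b.Data = b.Data[:n]
	b.W -= delta
	b.Off += int64(delta)
	return delta
}

func main() {
	b := &seqBuffer{Data: []byte("0123456789"), W: 8, ShrinkSize: 3}
	delta := b.shrink()
	fmt.Println(delta, string(b.Data), b.W, b.Off)
}
```

Any positions a sequencer keeps for its hash table would have to be reduced by delta as well, which is presumably why the value is returned.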

func (*SeqBuffer) Write added in v0.1.1

func (b *SeqBuffer) Write(p []byte) (n int, err error)

Write writes data into the buffer. If the complete p slice cannot be copied into the buffer, Write will return ErrFullBuffer.

type SeqConfig added in v0.1.1

type SeqConfig interface {
	NewSequencer() (s Sequencer, err error)
	BufConfig() BufConfig
	SetBufConfig(bc BufConfig)
	SetDefaults()
	Verify() error
}

SeqConfig generates new sequencer instances. Note that the sequencer doesn't use ShrinkSize and BufferSize directly, but we added them here so they can be used for the WriteSequencer, which provides a WriteCloser interface.

func ParseJSON added in v0.2.1

func ParseJSON(p []byte) (s SeqConfig, err error)

ParseJSON parses a JSON structure and returns the sequencer configuration it describes.
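
The MarshalJSON methods documented above add a Name property such as "HS" or "BDHS" to the JSON output, which lets a parser select the configuration type. The sketch below shows one way such a round trip could work with encoding/json. The hsConfig type, the parseJSON function and the JSON field layout are assumptions for illustration and are not taken from the package source.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// hsConfig is a reduced local stand-in for a sequencer configuration.
type hsConfig struct {
	WindowSize int
	InputLen   int
	HashBits   int
}

// MarshalJSON adds a Name property so that a parser can tell which
// configuration type it is looking at.
func (cfg *hsConfig) MarshalJSON() ([]byte, error) {
	type plain hsConfig // has no methods, which avoids infinite recursion
	return json.Marshal(struct {
		Name string
		plain
	}{Name: "HS", plain: plain(*cfg)})
}

// parseJSON inspects the Name property first and then decodes the data
// into the matching configuration type.
func parseJSON(p []byte) (interface{}, error) {
	var probe struct{ Name string }
	if err := json.Unmarshal(p, &probe); err != nil {
		return nil, err
	}
	switch probe.Name {
	case "HS":
		cfg := new(hsConfig)
		if err := json.Unmarshal(p, cfg); err != nil {
			return nil, err
		}
		return cfg, nil
	}
	return nil, errors.New("unknown configuration name " + probe.Name)
}

func main() {
	p, err := json.Marshal(&hsConfig{WindowSize: 1 << 20, InputLen: 3, HashBits: 18})
	fmt.Println(string(p), err)

	cfg, err := parseJSON(p)
	fmt.Printf("%+v %v\n", cfg, err)
}
```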

type Sequencer

type Sequencer interface {
	Sequence(blk *Block, flags int) (n int, err error)
	Reset(data []byte) error
	Shrink() int
	SeqConfig() SeqConfig
	BufferConfig() BufConfig
	Write(p []byte) (n int, err error)
	ReadFrom(r io.Reader) (n int64, err error)
	ReadAt(p []byte, off int64) (n int, err error)
	ByteAt(off int64) (c byte, err error)
}

Sequencer provides the basic interface of a Sequencer. It provides the functions provided by SeqBuffer.

type WrappedSequencer

type WrappedSequencer struct {
	// contains filtered or unexported fields
}

WrappedSequencer is returned by the Wrap function. It provides the Sequence method and reads the data required automatically from the stored reader.

func Wrap

func Wrap(r io.Reader, seq Sequencer) *WrappedSequencer

Wrap combines a reader and a Sequencer and makes a Sequencer. The user doesn't need to take care of filling the Sequencer with additional data. The returned sequencer returns io.EOF if no further data is available.

Wrap chooses the minimum of 32 KiB and half of the window size as shrink size.

func (*WrappedSequencer) Reset

func (s *WrappedSequencer) Reset(r io.Reader)

Reset puts the WrappedSequencer in its initial state and changes the wrapped reader to another reader.

func (*WrappedSequencer) Sequence

func (s *WrappedSequencer) Sequence(blk *Block, flags int) (n int, err error)

Sequence creates a block of sequences but reads the required data from the reader if necessary. The function returns io.EOF if no further data is available.

