lz

package module
v0.1.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 16, 2023 License: BSD-3-Clause Imports: 9 Imported by: 2

README

Module LZ

The LZ module provides sequencers that convert byte streams into blocks of Lempel-Ziv 77 sequences. It is designed to support multiple compression methods that differ in the way they are encoding those LZ77 sequences.

Documentation

Overview

Package lz provides encoders and decoders for LZ77 sequences. A sequence, as described in the zstd specification, describes a number of literal bytes and a match.

A Sequencer is an encoder that converts a byte stream into blocks of sequences. A Decoder converts the block of sequences into the original decompressed byte stream. A wrapped Sequencer reads the byte stream from a reader. The sequencers are provided here separately because they are more efficient for encoding byte slices directly.

The module provides multiple sequencers that provide different combinations of encoding speed and compression ratios. Usually a slower sequencer will generate a better compression ratio.

The Decoder slides the decompression window through a larger buffer implemented by DecBuffer.

Index

Constants

View Source
const (
	// NoTrailingLiterals tells a sequencer that trailing literals don't
	// need to be included in the block.
	NoTrailingLiterals = 1 << iota
)

Flags for the Sequence function.

Variables

View Source
var ErrEmptyBuffer = errors.New("lz: empty buffer")

ErrEmptyBuffer indicates that the buffer is empty and no more data can be read or processed. More data must be provided to the buffer.

View Source
var ErrFullBuffer = errors.New("lz: full buffer")

ErrFullBuffer indicates that the buffer is full and no further data can be written.

Functions

This section is empty.

Types

type BDHSConfig

type BDHSConfig struct {
	// sequence buffer configuration
	SBConfig

	// smaller hash input length; range 2 to 8
	InputLen1 int
	// hash bits for the smaller hash input length
	HashBits1 int
	// larger input length; range 2 to 8
	InputLen2 int
	// hash bits for the larger hash input length
	HashBits2 int
}

BDHSConfig provides the configuration parameters for the backward-looking double Hash Sequencer.

func (*BDHSConfig) ApplyDefaults

func (cfg *BDHSConfig) ApplyDefaults()

ApplyDefaults sets for the zero fields in the configuration to the default values.

func (BDHSConfig) NewSequencer added in v0.1.0

func (cfg BDHSConfig) NewSequencer() (s Sequencer, err error)

NewSequencer creates a new BackwardDoubleHashSequencer.

func (*BDHSConfig) Verify

func (cfg *BDHSConfig) Verify() error

Verify checks the configuration for errors.

type BHSConfig

type BHSConfig struct {
	SBConfig
	// number of bits of the hash index
	HashBits int
	// length of the input used; range [2,8]
	InputLen int
}

BHSConfig provides the parameters for the backward hash sequencer.

func (*BHSConfig) ApplyDefaults

func (cfg *BHSConfig) ApplyDefaults()

ApplyDefaults sets values that are zero to their defaults values.

func (BHSConfig) NewSequencer added in v0.1.0

func (cfg BHSConfig) NewSequencer() (s Sequencer, err error)

NewSequencer create a new backward hash sequencer.

func (*BHSConfig) Verify

func (cfg *BHSConfig) Verify() error

Verify checks the config for correctness.

type BTreeConfig added in v0.1.1

type BTreeConfig struct {
	Order   int
	Matches int
}

func (*BTreeConfig) ApplyDefaults added in v0.1.1

func (cfg *BTreeConfig) ApplyDefaults()

func (*BTreeConfig) NewMatchFinder added in v0.1.1

func (cfg *BTreeConfig) NewMatchFinder() (mf MatchFinder, err error)

func (*BTreeConfig) Verify added in v0.1.1

func (cfg *BTreeConfig) Verify() error

type BTreeHashConfig added in v0.1.1

type BTreeHashConfig struct {
	Order    int
	Matches  int
	InputLen int
	HashBits int
}

func (*BTreeHashConfig) ApplyDefaults added in v0.1.1

func (cfg *BTreeHashConfig) ApplyDefaults()

func (*BTreeHashConfig) NewMatchFinder added in v0.1.1

func (cfg *BTreeHashConfig) NewMatchFinder() (mf MatchFinder, err error)

func (*BTreeHashConfig) Verify added in v0.1.1

func (cfg *BTreeHashConfig) Verify() error

type BUHSConfig added in v0.1.1

type BUHSConfig struct {
	SBConfig
	// number of bits of the hash index
	HashBits int
	// length of the input used; range [2,8]
	InputLen int
	// size of a bucket; range [1,128]
	BucketSize int
}

BUHSConfig provides the configuration parameters for the bucket hash sequencer.

func (*BUHSConfig) ApplyDefaults added in v0.1.1

func (cfg *BUHSConfig) ApplyDefaults()

ApplyDefaults sets values that are zero to their defaults values.

func (BUHSConfig) NewSequencer added in v0.1.1

func (cfg BUHSConfig) NewSequencer() (s Sequencer, err error)

NewSequencer creates a new hash sequencer.

func (*BUHSConfig) Verify added in v0.1.1

func (cfg *BUHSConfig) Verify() error

Verify checks the config for correctness.

type Block

type Block struct {
	Sequences []Seq
	Literals  []byte
}

Block stores sequences and literals. Note that literals that are not consumed by the Sequences slice need to be added to the end of the reconstructed data.

func (*Block) Len

func (b *Block) Len() int64

Len computes the length of the block in bytes. It assumes that the sum of the literal lengths in the sequences doesn't exceed that length of the Literals byte slice.

type CostEstimator added in v0.1.1

type CostEstimator interface {
	Cost(m, o uint32) uint64
	Push(o uint32)
	Reset()
}

CostEstimator provides a cost estimation to encode matches and literals. The costs are provided for a match with a non-zero offset or m literal bytes with a zero o value. The Costs should be provided in bits, but other measures like 1/100th of a bit are also possible. The Update method is provided to update the offset history.

type DConfig

type DConfig struct {
	WindowSize int
	MaxSize    int
}

DConfig contains the configuration for a simple Decoder. It provides the window size and the MaxSize of the buffer.

func (*DConfig) ApplyDefaults

func (cfg *DConfig) ApplyDefaults()

ApplyDefaults applies the defaults for the configuration.

func (*DConfig) Verify

func (cfg *DConfig) Verify() error

Verify checks the configuration and returns any errors.

type DHSConfig

type DHSConfig struct {
	SBConfig
	// smaller hash input length; range 2 to 8
	InputLen1 int
	// hash bits for the smaller hash input length
	HashBits1 int
	// larger input length; range 2 to 8
	InputLen2 int
	// hash bits for the larger hash input length
	HashBits2 int
}

DHSConfig provides the configuration parameters for the DoubleHashSequencer.

func (*DHSConfig) ApplyDefaults

func (cfg *DHSConfig) ApplyDefaults()

ApplyDefaults uses the defaults for the configuration parameters that are set to zero.

func (DHSConfig) NewSequencer added in v0.1.0

func (cfg DHSConfig) NewSequencer() (s Sequencer, err error)

NewSequencer creates a new DoubleHashSequencer.

func (*DHSConfig) Verify

func (cfg *DHSConfig) Verify() error

Verify checks the configuration for errors.

type DecBuffer added in v0.1.1

type DecBuffer struct {
	// contains filtered or unexported fields
}

DecBuffer provides a simple buffer to decode sequences. The max field gives a target that can be exceeded once.

func (*DecBuffer) Available added in v0.1.1

func (buf *DecBuffer) Available() int

Available provides the amount of data that can be written into the buffer.

func (*DecBuffer) ByteAtEnd added in v0.1.1

func (buf *DecBuffer) ByteAtEnd(i int) byte

ByteAtEnd reads the byte with offset i from the end. If it it points outside the window the value returned is 0.

func (*DecBuffer) Init added in v0.1.1

func (buf *DecBuffer) Init(windowSize, max int) error

Init initialized the buffer. The window size must be larger than 1 and max must be larger then the windowSize.

func (*DecBuffer) Len added in v0.1.1

func (buf *DecBuffer) Len() int

Len returns the number of bytes in the unread portion of the buffer.

func (*DecBuffer) Pos added in v0.1.1

func (buf *DecBuffer) Pos() int64

Pos returns the file position of the window head.

func (*DecBuffer) Read added in v0.1.1

func (buf *DecBuffer) Read(p []byte) (n int, err error)

Read reads data from the buffer. The function never returns an error.

func (*DecBuffer) Reset added in v0.1.1

func (buf *DecBuffer) Reset()

Reset puts the buffer into its initial state.

func (*DecBuffer) Write added in v0.1.1

func (buf *DecBuffer) Write(p []byte) (n int, err error)

Write writes the provided byte slice into the buffer and extends the window accordingly.

func (*DecBuffer) WriteBlock added in v0.1.1

func (buf *DecBuffer) WriteBlock(blk Block) (k, l, n int, err error)

WriteBlock writes a whole list of sequences, each sequence will be written atomically. The functions returns the number of sequences k written, the number of literals l consumed and the number of bytes n generated.

func (*DecBuffer) WriteByte added in v0.1.1

func (buf *DecBuffer) WriteByte(c byte) error

WriteByte writes a single byte to the buffer and extends the window.

func (*DecBuffer) WriteMatch added in v0.1.1

func (buf *DecBuffer) WriteMatch(n, offset int) error

WriteMatch writes a match into the buffer and extends the window by the match.

func (*DecBuffer) WriteTo added in v0.1.1

func (buf *DecBuffer) WriteTo(w io.Writer) (n int64, err error)

WriteTo writes all data to read into the writer. It only returns an error if the Write fails.

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

A Decoder decodes sequences and writes data into the writer.

func NewDecoder

func NewDecoder(w io.Writer, cfg DConfig) (*Decoder, error)

NewDecoder allocates and initializes a decoder. If the windowSize is not positive an error will be returned.

func (*Decoder) Flush

func (d *Decoder) Flush() error

Flush writes all decoded data to the underlying writer.

func (*Decoder) Init

func (d *Decoder) Init(w io.Writer, cfg DConfig) error

Init initializes the decoder. Internal buffers will be reused if they are large enough.

func (*Decoder) Reset

func (d *Decoder) Reset(w io.Writer)

Reset resets the decoder to its initial state.

func (*Decoder) Write

func (d *Decoder) Write(p []byte) (n int, err error)

Write writes data directly into the decoder.

func (*Decoder) WriteBlock

func (d *Decoder) WriteBlock(blk Block) (k, l, n int, err error)

WriteBlock writes a complete block into the decoder.

func (*Decoder) WriteMatch

func (d *Decoder) WriteMatch(n int, offset int) error

WriteMatch writes a single match into the decoder.

type GSASConfig

type GSASConfig struct {
	SBConfig
	// minimum match len
	MinMatchLen int
}

GSASConfig defines the configuration parameter for the greedy suffix array sequencer.

func (*GSASConfig) ApplyDefaults

func (cfg *GSASConfig) ApplyDefaults()

ApplyDefaults sets configuration parameters to its defaults. The code doesn't provide consistency.

func (GSASConfig) NewSequencer added in v0.1.0

func (cfg GSASConfig) NewSequencer() (s Sequencer, err error)

NewSequencer generates a new sequencer using the configuration parameters in the structure.

func (*GSASConfig) Verify

func (cfg *GSASConfig) Verify() error

Verify checks the configuration for inconsistencies.

type GenericSequencerConfig added in v0.1.1

type GenericSequencerConfig struct {
	SBConfig
	MatchFinderConfigs []MatchFinderConfig
	CostEstimator      CostEstimator
}

func (*GenericSequencerConfig) ApplyDefaults added in v0.1.1

func (cfg *GenericSequencerConfig) ApplyDefaults()

func (*GenericSequencerConfig) NewSequencer added in v0.1.1

func (cfg *GenericSequencerConfig) NewSequencer() (s Sequencer, err error)

func (*GenericSequencerConfig) Verify added in v0.1.1

func (cfg *GenericSequencerConfig) Verify() error

type HSConfig

type HSConfig struct {
	SBConfig
	// number of bits of the hash index
	HashBits int
	// length of the input used; range [2,8]
	InputLen int
}

HSConfig provides the configuration parameters for the HashSequencer.

func (*HSConfig) ApplyDefaults

func (cfg *HSConfig) ApplyDefaults()

ApplyDefaults sets values that are zero to their defaults values.

func (HSConfig) NewSequencer added in v0.1.0

func (cfg HSConfig) NewSequencer() (s Sequencer, err error)

NewSequencer creates a new hash sequencer.

func (*HSConfig) Verify

func (cfg *HSConfig) Verify() error

Verify checks the config for correctness.

type HashConfig added in v0.1.1

type HashConfig struct {
	InputLen int
	HashBits int
}

func (*HashConfig) ApplyDefaults added in v0.1.1

func (cfg *HashConfig) ApplyDefaults()

func (*HashConfig) NewMatchFinder added in v0.1.1

func (cfg *HashConfig) NewMatchFinder() (mf MatchFinder, err error)

func (*HashConfig) Verify added in v0.1.1

func (cfg *HashConfig) Verify() error

type MatchFinder added in v0.1.1

type MatchFinder interface {
	Add(pos uint32, x uint64)
	AppendMatchesAndAdd(m []uint32, pos uint32, x uint64) []uint32
	Adapt(delta uint32)

	// Resets the match finder and sets the pointer to the new data slice. The
	// pointer is used to ensure that length changes are available to the match
	// finders.
	Reset(pdata *[]byte)
}

type MatchFinderConfig added in v0.1.1

type MatchFinderConfig interface {
	NewMatchFinder() (mf MatchFinder, err error)
	ApplyDefaults()
	Verify() error
}

type Params added in v0.1.1

type Params struct {
	// MemoryBudget specifies the memory budget in bytes for the sequencer. The
	// budget controls how much memory the sequencer has for the window size and the
	// match search data structures. It doesn't control temporary memory
	// allocations. It is a budget, so it can be overdrawn, right?
	MemoryBudget int
	// Effort is scale from 1 to 10 controlling the CPU consumption. A
	// sequencer with an effort of 1 might be extremely fast but will have a
	// worse compression ratio. The default effort is 6 and will provide a
	// reasonable compromise between compression speed and compression
	// ratio. Effort 10 will provide the best compression ratio but will be
	// very slow.
	Effort int
	// BlockSize defines a maximum block size. Note that the configurator
	// might create a smaller block size to fit the match search data
	// structures into the memory budget. The main consumer is ZStandard
	// which has a maximum block size of 128 kByte.
	BlockSize int
	// WindowSize fixes the window size.
	WindowSize int
}

Params provides a general method to create sequencers.

func (*Params) ApplyDefaults added in v0.1.1

func (p *Params) ApplyDefaults()

ApplyDefaults applies the defaults to the Config structure. The memory budget is set to 2 MB, the effort to 5 and the block size to 128 kByte unless no other non-zero values have been set.

func (*Params) Verify added in v0.1.1

func (p *Params) Verify() error

Verify checks the configuration for errors. Use ApplyDefaults before this function because it doesn't support zero values in all cases.

type SBConfig added in v0.1.1

type SBConfig struct {
	// WindowSize is the maximum window size in bytes
	WindowSize int
	// ShrinkSize provides the size the buffer is shrunk to if the buffer
	// has been completely filled and encoded. It must be smaller than the
	// BufferSize, and should be significantly so.
	ShrinkSize int
	// BufferSize defines the maximum size of the buffer. The BufferSize
	// must be greater or equal the window size.
	BufferSize int

	// BlockSize provides the block size.
	BlockSize int
}

SBConfig stores the parameter for the Window.

func (*SBConfig) ApplyDefaults added in v0.1.1

func (cfg *SBConfig) ApplyDefaults()

ApplyDefaults sets the defaults for the sequencer buffer configuration.

func (*SBConfig) BufferConfig added in v0.1.1

func (cfg *SBConfig) BufferConfig() *SBConfig

BufferConfig returns the a pointer to the sequencer buffer configuration, SBConfig.

func (*SBConfig) SetWindowSize added in v0.1.1

func (cfg *SBConfig) SetWindowSize(s int) error

SetWindowSize sets the window size. BufferSize and ShrinkSize will be adapted.

func (*SBConfig) Verify added in v0.1.1

func (cfg *SBConfig) Verify() error

Verify checks the sequencer buffer configuration for issues and returns the first issue found as error.

type Seq

type Seq struct {
	LitLen   uint32
	MatchLen uint32
	Offset   uint32
	Aux      uint32
}

Seq represents a single Lempel-Ziv 77 Sequence describing a match, consisting of the offset, the length of the match and the number of literals preceding the match. The Aux field can be used on upper layers to store additional information.

func (Seq) Len

func (s Seq) Len() int64

Len returns the complete length of the sequence.

type SeqBuffer added in v0.1.1

type SeqBuffer struct {

	// SBConfig stores the configuration parameters
	SBConfig
	// contains filtered or unexported fields
}

SeqBuffer acts as a buffer for the sequencers. The buffer contains the window from which matches can't be copied in a sequence. Data is written into the buffer, the sequencer creates Lempel-Ziv sequences and advances the window head. Since all positions behind the window head are in the window we even save one check in the sequencer loop.

The Sequencer ensures that len(w.data)+7 < cap(w.data), which allows 64-bit reads on all byte position of the window.

func (*SeqBuffer) Available added in v0.1.1

func (w *SeqBuffer) Available() int

Available returns the number of bytes are available for writing into the buffer.

func (*SeqBuffer) Buffer added in v0.1.1

func (w *SeqBuffer) Buffer() *SeqBuffer

Buffer returns a pointer to itself. It provides the function to the sequencer structure who embed SeqBuffer.

func (*SeqBuffer) Buffered added in v0.1.1

func (w *SeqBuffer) Buffered() int

Buffered returns the number of bytes buffered but are not yet part of the window. They have to be sequenced first.

func (*SeqBuffer) Init added in v0.1.1

func (w *SeqBuffer) Init(cfg SBConfig) error

Init initializes the window. The parameter size must be positive.

func (*SeqBuffer) Len added in v0.1.1

func (w *SeqBuffer) Len() int

Len returns the actual length of the current window

func (*SeqBuffer) Pos added in v0.1.1

func (w *SeqBuffer) Pos() int64

Pos returns the absolute position of the window head

func (*SeqBuffer) ReadAt added in v0.1.1

func (w *SeqBuffer) ReadAt(p []byte, pos int64) (n int, err error)

ReadAt allows to read data from the window directly.

func (*SeqBuffer) ReadByteAt added in v0.1.1

func (w *SeqBuffer) ReadByteAt(pos int64) (c byte, err error)

ReadByteAt returns the byte at the absolute position pos unless pos is outside of the data stored in window.

func (*SeqBuffer) ReadFrom added in v0.1.1

func (w *SeqBuffer) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom transfers data from the reader into the buffer.

func (*SeqBuffer) Reset added in v0.1.1

func (w *SeqBuffer) Reset(data []byte) error

Reset cleans the window structure for reuse. It will use the data structure for the data. Note that the condition cap(data) > len(data) + 7 must be met to avoid copying. The data length must not exceed the buffer size.

func (*SeqBuffer) Write added in v0.1.1

func (w *SeqBuffer) Write(p []byte) (n int, err error)

Write puts data into the window. It will return ErrFullBuffer

type SeqConfig added in v0.1.1

type SeqConfig interface {
	NewSequencer() (s Sequencer, err error)
	BufferConfig() *SBConfig
	ApplyDefaults()
	Verify() error
}

SeqConfig generates new sequencer instances.

func Config

func Config(p Params) (c SeqConfig, err error)

Config converts the parameters into an actual configuration.

type Sequencer

type Sequencer interface {
	// Sequence finds Lempel-Ziv sequences.
	Sequence(blk *Block, flags int) (n int, err error)
	// Shrink reduces the actual window length to make more buffer space
	// available.
	Shrink()
	// Buffer returns a pointer to the sequencer buffer.
	Buffer() *SeqBuffer
	// Reset allows the reuse of the Sequencer. The data slice provides new
	// data to sequence but Sequencers are usually also Writers for
	// providing the data.
	Reset(data []byte) error
}

Sequencer transforms byte streams into Lempel-Ziv sequences, that allow the reconstruction of the input data.

type SimpleEstimator added in v0.1.1

type SimpleEstimator struct {
	Rep [4]uint32
}

SimpleEstimator provides a very simple cost model for compression. It supports offset repeats as in LZMA.

func (*SimpleEstimator) Cost added in v0.1.1

func (e *SimpleEstimator) Cost(m, o uint32) uint64

Cost provides a simple cost estimation for the match or a literal, offset 0.

func (*SimpleEstimator) Push added in v0.1.1

func (e *SimpleEstimator) Push(o uint32)

Push writes the offset o into the history.

func (*SimpleEstimator) Reset added in v0.1.1

func (e *SimpleEstimator) Reset()

type WrappedSequencer

type WrappedSequencer struct {
	// contains filtered or unexported fields
}

WrappedSequencer is returned by the Wrap function. It provides the Sequence method and reads the data required automatically from the stored reader.

func Wrap

func Wrap(r io.Reader, seq Sequencer) *WrappedSequencer

Wrap combines a reader and a Sequencer and makes a Sequencer. The user doesn't need to take care of filling the Sequencer with additional data. The returned sequencer returns EOF if no further data is available.

Wrap chooses the minimum of 32 kbyte or half of the window size as shrink size.

func (*WrappedSequencer) MemSize

func (s *WrappedSequencer) MemSize() uintptr

MemSize returns the memory consumption of the wrapped sequencer.

func (*WrappedSequencer) Reset

func (s *WrappedSequencer) Reset(r io.Reader)

Reset puts the WrappedSequencer in its initial state and changes the wrapped reader to another reader.

func (*WrappedSequencer) Sequence

func (s *WrappedSequencer) Sequence(blk *Block, flags int) (n int, err error)

Sequence creates a block of sequences but reads the required data from the reader if necessary. The function returns io.EOF if no further data is available.

Directories

Path Synopsis
Package suffix provides a suffix sort algorithm.
Package suffix provides a suffix sort algorithm.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL