Documentation ¶
Overview ¶
Package lz supports encoding and decoding of LZ77 sequences. A sequence, as described in the Zstandard specification, consists of a literal copy command followed by a match copy command. The literal copy command is described by the length in literal bytes to be copied and the match command consists of the distance of the match to copy and the length of the match in bytes.
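To illustrate how one sequence expands into output bytes, here is a minimal sketch. The `seq` type, its field names and the `apply` helper are assumptions made for this example; they are not the package's own types or API.

```go
package main

import "fmt"

// seq is a hypothetical stand-in for a sequence: copy LitLen literal bytes,
// then copy MatchLen bytes that start Offset bytes back in the output.
type seq struct {
	LitLen   uint32
	MatchLen uint32
	Offset   uint32
}

// apply appends the bytes described by s to out. It returns the extended
// output and the literals that have not been consumed yet.
func apply(out, literals []byte, s seq) ([]byte, []byte) {
	// Literal copy command.
	out = append(out, literals[:s.LitLen]...)
	literals = literals[s.LitLen:]
	// Match copy command. The copy runs byte by byte because a match may
	// overlap the bytes it produces (Offset < MatchLen).
	start := len(out) - int(s.Offset)
	for i := 0; i < int(s.MatchLen); i++ {
		out = append(out, out[start+i])
	}
	return out, literals
}

func main() {
	out, rest := apply(nil, []byte("abc"), seq{LitLen: 3, MatchLen: 6, Offset: 3})
	fmt.Printf("%s %d\n", out, len(rest)) // prints "abcabcabc 0"
}
```

The overlapping match in `main` shows why a plain block copy would not be enough: the match reads bytes that the same match has just written.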
A Sequencer is an encoder that converts a byte stream into blocks of sequences. A Decoder converts the block of sequences into the original decompressed byte stream. We provide a Sequencer interface only supporting the Sequence interface.
The basic sequencers provided by the package also support the SeqBuffer methods for writing to and reading from the buffer. A pure Sequencer is provided by the Wrap function.
The module provides multiple sequencer implementations that offer different trade-offs between encoding speed and compression ratio. Usually a slower sequencer achieves a better compression ratio.
The Decoder slides the decompression window through a larger buffer implemented by DecBuffer.
The library supports the implementation of Sequencers outside the package that can then be used by real compressors as provided by the github.com/ulikunitz/xz module.
Index ¶
- Constants
- Variables
- func XZCost(m, o uint32) uint64
- type BDHSConfig
- type BHSConfig
- type BUHSConfig
- type Block
- type BufConfig
- type DHSConfig
- type DecBuffer
- func (b *DecBuffer) ByteAtEnd(off int) byte
- func (b *DecBuffer) Init(cfg DecConfig) error
- func (b *DecBuffer) Read(p []byte) (n int, err error)
- func (b *DecBuffer) Reset()
- func (b *DecBuffer) Write(p []byte) (n int, err error)
- func (b *DecBuffer) WriteBlock(blk Block) (n, k, l int, err error)
- func (b *DecBuffer) WriteByte(c byte) error
- func (b *DecBuffer) WriteMatch(m, o uint32) (n int, err error)
- func (b *DecBuffer) WriteTo(w io.Writer) (n int64, err error)
- type DecConfig
- type Decoder
- type GSASConfig
- type HSConfig
- type OSASConfig
- type Seq
- type SeqBuffer
- func (b *SeqBuffer) ByteAt(off int64) (c byte, err error)
- func (b *SeqBuffer) Init(cfg BufConfig) error
- func (b *SeqBuffer) PeekAt(n int, off int64) (p []byte, err error)
- func (b *SeqBuffer) ReadAt(p []byte, off int64) (n int, err error)
- func (b *SeqBuffer) ReadFrom(r io.Reader) (n int64, err error)
- func (b *SeqBuffer) Reset(data []byte) error
- func (b *SeqBuffer) Shrink() int
- func (b *SeqBuffer) Write(p []byte) (n int, err error)
- type SeqConfig
- type Sequencer
- type WrappedSequencer
Constants ¶
const (
	// NoTrailingLiterals tells a sequencer that trailing literals don't
	// need to be included in the block.
	NoTrailingLiterals = 1 << iota
)
Flags for the sequence function stored in the block structure.
Variables ¶
var (
	ErrOutOfBuffer = errors.New("lz: offset out of buffer")
	ErrEndOfBuffer = errors.New("lz: end of buffer")
)
Errors returned by SeqBuffer.ReadAt
var ErrEmptyBuffer = errors.New("lz: no more data in buffer")
ErrEmptyBuffer indicates that no more data is available in the buffer. It will be returned by the Sequence method of Sequencer.
var ErrFullBuffer = errors.New("lz: buffer is full")
ErrFullBuffer indicates that the buffer is full. It will be returned by the Write and ReadFrom methods of the Sequencer.
Functions ¶
Types ¶
type BDHSConfig ¶
type BDHSConfig struct {
ShrinkSize int
BufferSize int
WindowSize int
BlockSize int
InputLen1 int
HashBits1 int
InputLen2 int
HashBits2 int
}
BDHSConfig provides the configuration parameters for the backward-looking double Hash Sequencer.
func (*BDHSConfig) BufConfig ¶ added in v0.2.1
func (cfg *BDHSConfig) BufConfig() BufConfig
BufConfig returns the BufConfig value containing the buffer parameters.
func (*BDHSConfig) MarshalJSON ¶ added in v0.2.1
func (cfg *BDHSConfig) MarshalJSON() (p []byte, err error)
MarshalJSON creates the JSON string for the configuration. Note that it adds a property Name with value "BDHS" to the structure.
func (BDHSConfig) NewSequencer ¶ added in v0.1.0
func (cfg BDHSConfig) NewSequencer() (s Sequencer, err error)
NewSequencer creates a new DoubleHashSequencer.
func (*BDHSConfig) SetBufConfig ¶ added in v0.2.1
func (cfg *BDHSConfig) SetBufConfig(bc BufConfig)
func (*BDHSConfig) SetDefaults ¶ added in v0.2.1
func (cfg *BDHSConfig) SetDefaults()
SetDefaults uses the defaults for the configuration parameters that are set to zero.
func (*BDHSConfig) Verify ¶
func (cfg *BDHSConfig) Verify() error
Verify checks the configuration for errors.
type BHSConfig ¶
type BHSConfig struct {
ShrinkSize int
BufferSize int
WindowSize int
BlockSize int
InputLen int
HashBits int
}
BHSConfig provides the parameters for the backward hash sequencer.
func (*BHSConfig) BufConfig ¶ added in v0.2.1
BufConfig returns the BufConfig value containing the buffer parameters.
func (*BHSConfig) MarshalJSON ¶ added in v0.2.1
MarshalJSON creates the JSON string for the configuration. Note that it adds a property Name with value "BHS" to the structure.
func (BHSConfig) NewSequencer ¶ added in v0.1.0
NewSequencer creates a new Backward Hash Sequencer.
func (*BHSConfig) SetBufConfig ¶ added in v0.2.1
SetBufConfig sets the buffer configuration parameters of the backward hash sequencer configuration.
func (*BHSConfig) SetDefaults ¶ added in v0.2.1
func (cfg *BHSConfig) SetDefaults()
SetDefaults sets values that are zero to their default values.
type BUHSConfig ¶ added in v0.1.1
type BUHSConfig struct {
ShrinkSize int
BufferSize int
WindowSize int
BlockSize int
InputLen int
HashBits int
BucketSize int
}
BUHSConfig provides the configuration parameters for the bucket hash sequencer.
func (*BUHSConfig) BufConfig ¶ added in v0.2.1
func (cfg *BUHSConfig) BufConfig() BufConfig
BufConfig returns the BufConfig value containing the buffer parameters.
func (*BUHSConfig) MarshalJSON ¶ added in v0.2.1
func (cfg *BUHSConfig) MarshalJSON() (p []byte, err error)
MarshalJSON creates the JSON string for the configuration. Note that it adds a property Name with value "BUHS" to the structure.
func (BUHSConfig) NewSequencer ¶ added in v0.1.1
func (cfg BUHSConfig) NewSequencer() (s Sequencer, err error)
NewSequencer creates a new hash sequencer.
func (*BUHSConfig) SetBufConfig ¶ added in v0.2.1
func (cfg *BUHSConfig) SetBufConfig(bc BufConfig)
SetBufConfig sets the buffer configuration parameters of the sequencer configuration.
func (*BUHSConfig) SetDefaults ¶ added in v0.2.1
func (cfg *BUHSConfig) SetDefaults()
SetDefaults sets values that are zero to their default values.
func (*BUHSConfig) Verify ¶ added in v0.1.1
func (cfg *BUHSConfig) Verify() error
Verify checks the config for correctness.
type Block ¶
Block stores sequences and literals. Note that the sequences stored in the Sequences slice might not consume the whole Literals slice. The remaining literals must be appended to the decoded text after all the sequences have been decoded and their content added to the decoder buffer.
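The accounting for trailing literals can be sketched as follows. The `seq` and `block` types and their field names are assumptions for illustration, not the package's definitions.

```go
package main

import "fmt"

// seq and block are hypothetical stand-ins for a sequence and a block.
type seq struct{ LitLen, MatchLen, Offset uint32 }

type block struct {
	Sequences []seq
	Literals  []byte
}

// trailing returns the literals that no sequence consumes. They belong at
// the end of the decoded text.
func trailing(blk block) []byte {
	used := 0
	for _, s := range blk.Sequences {
		used += int(s.LitLen)
	}
	return blk.Literals[used:]
}

// decodedLen returns the number of bytes the block decodes to: every
// literal appears once and every sequence adds MatchLen matched bytes.
func decodedLen(blk block) int {
	n := len(blk.Literals)
	for _, s := range blk.Sequences {
		n += int(s.MatchLen)
	}
	return n
}

func main() {
	blk := block{
		Sequences: []seq{{LitLen: 3, MatchLen: 3, Offset: 3}},
		Literals:  []byte("abcxy"),
	}
	fmt.Printf("%q %d\n", trailing(blk), decodedLen(blk)) // prints "xy" 8
}
```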
type BufConfig ¶ added in v0.2.0
BufConfig describes the various sizes relevant for the buffer. Note that ShrinkSize should be significantly smaller than BufferSize, at most 50% of it. The WindowSize is independent of the BufferSize, but usually the BufferSize should be larger than or equal to the WindowSize. A typical BlockSize, for instance for Zstandard compression, is 128 KiB; the BlockSize limits the largest match length.
func (*BufConfig) BufferConfig ¶ added in v0.2.0
BufferConfig returns itself, which will be used by the structures embedding the value.
func (*BufConfig) SetDefaults ¶ added in v0.2.1
func (cfg *BufConfig) SetDefaults()
SetDefaults sets the defaults for the various size values. The defaults are given below.
BufferSize: 8 MiB
ShrinkSize: 32 KiB (or half of BufferSize, if it is smaller than 64 KiB)
WindowSize: BufferSize
BlockSize: 128 KiB
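The defaults listed above could be applied to zero-valued fields roughly as in the sketch below. The `bufConfig` type and `setDefaults` method are hypothetical and only mirror the documented values; the actual implementation may differ.

```go
package main

import "fmt"

// bufConfig is a hypothetical copy of the four buffer size parameters.
type bufConfig struct {
	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int
}

// setDefaults replaces zero values with the documented defaults and leaves
// all other values untouched.
func (cfg *bufConfig) setDefaults() {
	if cfg.BufferSize == 0 {
		cfg.BufferSize = 8 << 20 // 8 MiB
	}
	if cfg.ShrinkSize == 0 {
		cfg.ShrinkSize = 32 << 10 // 32 KiB
		if cfg.BufferSize < 64<<10 {
			cfg.ShrinkSize = cfg.BufferSize / 2
		}
	}
	if cfg.WindowSize == 0 {
		cfg.WindowSize = cfg.BufferSize
	}
	if cfg.BlockSize == 0 {
		cfg.BlockSize = 128 << 10 // 128 KiB
	}
}

func main() {
	cfg := bufConfig{BufferSize: 16 << 10}
	cfg.setDefaults()
	fmt.Printf("%+v\n", cfg)
	// prints {ShrinkSize:8192 BufferSize:16384 WindowSize:16384 BlockSize:131072}
}
```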
type DHSConfig ¶
type DHSConfig struct {
ShrinkSize int
BufferSize int
WindowSize int
BlockSize int
InputLen1 int
HashBits1 int
InputLen2 int
HashBits2 int
}
DHSConfig provides the configuration parameters for the DoubleHashSequencer.
func (*DHSConfig) BufConfig ¶ added in v0.2.1
BufConfig returns the BufConfig value containing the buffer parameters.
func (*DHSConfig) MarshalJSON ¶ added in v0.2.1
MarshalJSON creates the JSON string for the configuration. Note that it adds a property Name with value "DHS" to the structure.
func (DHSConfig) NewSequencer ¶ added in v0.1.0
NewSequencer creates a new DoubleHashSequencer.
func (*DHSConfig) SetBufConfig ¶ added in v0.2.1
SetBufConfig sets the buffer configuration parameters of the sequencer configuration.
func (*DHSConfig) SetDefaults ¶ added in v0.2.1
func (cfg *DHSConfig) SetDefaults()
SetDefaults uses the defaults for the configuration parameters that are set to zero.
type DecBuffer ¶ added in v0.1.1
type DecBuffer struct {
// Data is the actual buffer. The end of the slice is also the head of
// the dictionary window.
Data []byte
// R tracks the position of the reads from the buffer and must be less
// than or equal to the length of the Data slice.
R int
// Off records the total offset and marks the end of the Data slice,
// which is also the end of the dictionary window.
Off int64
// DecConfig provides the configuration parameters WindowSize and
// BufferSize.
DecConfig
}
DecBuffer provides a simple buffer for the decoding of LZ77 sequences.
func (*DecBuffer) Reset ¶ added in v0.1.1
func (b *DecBuffer) Reset()
Reset puts the DecBuffer back to the initialized status.
func (*DecBuffer) Write ¶ added in v0.1.1
Write puts the slice into the buffer. The method either writes the whole slice or writes nothing and returns 0 and ErrFullBuffer.
func (*DecBuffer) WriteBlock ¶ added in v0.1.1
WriteBlock writes sequences from the block into the buffer. A single sequence will be written in an atomic manner, because the block value will not be modified. If there is not enough space in the buffer ErrFullBuffer will be returned.
We are not limiting the growth of the array to BufferSize. We may consume more memory but we are faster.
The return values n, k and l provide the number of bytes written into the buffer, the number of sequences written and the number of literal bytes written.
func (*DecBuffer) WriteMatch ¶ added in v0.1.1
WriteMatch puts the match at the end of the buffer. The match is either written completely, or nothing is written and n=0 and ErrFullBuffer are returned.
type DecConfig ¶ added in v0.2.1
type DecConfig struct {
// Size of the sliding dictionary window in bytes.
WindowSize int
// Maximum size of the buffer in bytes.
BufferSize int
}
DecConfig contains the parameters for the DecBuffer and Decoder types. The WindowSize must be smaller than the BufferSize. It is recommended to set the BufferSize twice as large as the WindowSize.
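A buffer larger than the window allows the decoder to drop old data only occasionally instead of after every write. The sketch below shows one way this could work; the `decBuffer` type and its `shrink` method are hypothetical and do not describe the actual DecBuffer implementation.

```go
package main

import "fmt"

// decBuffer is a hypothetical buffer that holds a sliding window.
type decBuffer struct {
	data       []byte
	r          int // bytes before index r have already been read
	windowSize int
}

// shrink discards bytes that have been read and lie outside of the window.
// It returns the number of discarded bytes.
func (b *decBuffer) shrink() int {
	k := len(b.data) - b.windowSize
	if k > b.r {
		k = b.r // unread bytes must stay in the buffer
	}
	if k <= 0 {
		return 0
	}
	n := copy(b.data, b.data[k:])
	b.data = b.data[:n]
	b.r -= k
	return k
}

func main() {
	b := &decBuffer{data: []byte("abcdefgh"), r: 6, windowSize: 4}
	k := b.shrink()
	fmt.Printf("%d %s %d\n", k, b.data, b.r) // prints "4 efgh 2"
}
```

With a BufferSize of twice the WindowSize, such a shrink step is needed roughly once per WindowSize bytes of output.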
func (*DecConfig) SetDefaults ¶ added in v0.2.1
func (cfg *DecConfig) SetDefaults()
SetDefaults sets the zero values in DecConfig to default values. Note that the default BufferSize is twice the WindowSize.
type Decoder ¶
type Decoder struct {
// contains filtered or unexported fields
}
Decoder decodes LZ77 sequences and writes them into the writer.
func NewDecoder ¶
NewDecoder creates a new decoder. The first issue with the configuration will be reported.
func (*Decoder) Init ¶
Init initializes the decoder. The first issue of the configuration value will be reported as an error.
func (*Decoder) WriteBlock ¶
WriteBlock writes the block into the decoder. It returns the number n of bytes, the number k of sequences and the number l of literal bytes written to the decoder.
type GSASConfig ¶
type GSASConfig struct {
ShrinkSize int
BufferSize int
WindowSize int
BlockSize int
// minimum match len
MinMatchLen int
}
GSASConfig defines the configuration parameters for the greedy suffix array sequencer.
func (*GSASConfig) BufConfig ¶ added in v0.2.1
func (cfg *GSASConfig) BufConfig() BufConfig
BufConfig returns the BufConfig value containing the buffer parameters.
func (*GSASConfig) MarshalJSON ¶ added in v0.2.1
func (cfg *GSASConfig) MarshalJSON() (p []byte, err error)
MarshalJSON creates the JSON string for the configuration. Note that it adds a property Name with value "GSAS" to the structure.
func (GSASConfig) NewSequencer ¶ added in v0.1.0
func (cfg GSASConfig) NewSequencer() (s Sequencer, err error)
NewSequencer generates a new sequencer using the configuration parameters in the structure.
func (*GSASConfig) SetBufConfig ¶ added in v0.2.1
func (cfg *GSASConfig) SetBufConfig(bc BufConfig)
SetBufConfig sets the buffer configuration parameters of the sequencer configuration.
func (*GSASConfig) SetDefaults ¶ added in v0.2.1
func (cfg *GSASConfig) SetDefaults()
SetDefaults sets the configuration parameters to their defaults. It doesn't ensure that the resulting configuration is consistent.
func (*GSASConfig) Verify ¶
func (cfg *GSASConfig) Verify() error
Verify checks the configuration for inconsistencies.
type HSConfig ¶
type HSConfig struct {
ShrinkSize int
BufferSize int
WindowSize int
BlockSize int
InputLen int
HashBits int
}
HSConfig provides the configuration parameters for the HashSequencer. The sequencer doesn't use ShrinkSize and BufferSize itself, but it provides them to other code that has to handle the buffer.
func (*HSConfig) BufConfig ¶ added in v0.2.1
BufConfig returns the BufConfig value containing the buffer parameters.
func (*HSConfig) MarshalJSON ¶ added in v0.2.1
MarshalJSON creates the JSON string for the configuration. Note that it adds a property Name with value "HS" to the structure.
func (HSConfig) NewSequencer ¶ added in v0.1.0
NewSequencer creates a new hash sequencer.
func (*HSConfig) SetBufConfig ¶ added in v0.2.1
SetBufConfig sets the buffer configuration parameters of the sequencer configuration.
func (*HSConfig) SetDefaults ¶ added in v0.2.1
func (cfg *HSConfig) SetDefaults()
SetDefaults sets values that are zero to their default values.
type OSASConfig ¶
type OSASConfig struct {
ShrinkSize int
BufferSize int
WindowSize int
BlockSize int
MinMatchLen int
MaxMatchLen int
Cost string
}
OSASConfig provides the configuration parameters for the Optimizing Suffix Array Sequencer (OSAS).
func (*OSASConfig) BufConfig ¶ added in v0.2.1
func (cfg *OSASConfig) BufConfig() BufConfig
BufConfig returns the BufConfig value for the OSAS configuration.
func (*OSASConfig) NewSequencer ¶ added in v0.2.0
func (cfg *OSASConfig) NewSequencer() (s Sequencer, err error)
NewSequencer returns the Optimizing Suffix Array Sequencer.
func (*OSASConfig) SetBufConfig ¶ added in v0.2.1
func (cfg *OSASConfig) SetBufConfig(bc BufConfig)
SetBufConfig sets the buffer configuration parameters of the sequencer configuration.
func (*OSASConfig) SetDefaults ¶ added in v0.2.1
func (cfg *OSASConfig) SetDefaults()
SetDefaults sets the defaults for the zero values of the OSAS configuration.
func (*OSASConfig) Verify ¶
func (cfg *OSASConfig) Verify() error
Verify verifies the configuration for the Optimizing Suffix Array Sequencer.
type Seq ¶
Seq represents a single Lempel-Ziv 77 Sequence describing a match, consisting of the offset, the length of the match and the number of literals preceding the match. The Aux field can be used on upper layers to store additional information.
type SeqBuffer ¶ added in v0.1.1
type SeqBuffer struct {
// actual buffer data
Data []byte
// W is the position of the head of the window in Data.
W int
// Off is the start of the Data slice; it counts all data written and
// discarded from the buffer.
Off int64
BufConfig
}
SeqBuffer provides a base for Sequencer implementations. Since the package allows implementations outside of the package, all members are public.
func (*SeqBuffer) ByteAt ¶ added in v0.2.1
ByteAt returns the byte at total offset off, if it can be provided. If off points to the end of the buffer, ErrEndOfBuffer will be returned; otherwise ErrOutOfBuffer will be returned.
func (*SeqBuffer) Init ¶ added in v0.1.1
Init initializes the buffer. The function sets the defaults for the buffer configuration if required and verifies it. Errors will be reported.
func (*SeqBuffer) PeekAt ¶ added in v0.2.1
PeekAt returns part of the internal data slice starting at total offset off. The total offset takes all data written to the buffer into account. If the off parameter is outside the current buffer, ErrOutOfBuffer will be returned. If fewer than n bytes of data can be provided, ErrEndOfBuffer will be returned.
func (*SeqBuffer) ReadAt ¶ added in v0.1.1
ReadAt reads data from the buffer at position off. If off is outside the buffer, ErrOutOfBuffer will be reported. If there is not enough data to fill p, ErrEndOfBuffer will be reported. See SeqBuffer.PeekAt for avoiding the copy.
func (*SeqBuffer) ReadFrom ¶ added in v0.1.1
ReadFrom reads the data from reader into the buffer. If there is an error it will be reported. If the buffer is full, ErrFullBuffer will be reported.
func (*SeqBuffer) Reset ¶ added in v0.1.1
Reset initializes the buffer with new data. The data slice requires a margin of 7 bytes for the hash sequencers to be used directly. If there is no margin the data will be copied into a slice with enough capacity.
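The margin check can be sketched as follows. The `withMargin` helper is hypothetical. A plausible reason for the margin, which is an assumption and not stated here, is that a hash sequencer loads several bytes at once near the end of the data.

```go
package main

import "fmt"

// margin is the number of spare bytes required behind the data.
const margin = 7

// withMargin returns data unchanged if its capacity leaves room for the
// margin. Otherwise it copies data into a new slice with enough capacity.
func withMargin(data []byte) []byte {
	if cap(data)-len(data) >= margin {
		return data
	}
	buf := make([]byte, len(data), len(data)+margin)
	copy(buf, data)
	return buf
}

func main() {
	tight := make([]byte, 3, 3)
	roomy := make([]byte, 3, 16)
	fmt.Println(cap(withMargin(tight)), cap(withMargin(roomy))) // prints "10 16"
}
```

A caller that wants to avoid the copy can allocate its input with at least 7 bytes of extra capacity.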
type SeqConfig ¶ added in v0.1.1
type SeqConfig interface {
NewSequencer() (s Sequencer, err error)
BufConfig() BufConfig
SetBufConfig(bc BufConfig)
SetDefaults()
Verify() error
}
SeqConfig generates new sequencer instances. Note that the sequencer doesn't use ShrinkSize and BufferSize directly, but they are included here so that they can be used by the WriteSequencer, which provides a WriteCloser interface.
type Sequencer ¶
type Sequencer interface {
Sequence(blk *Block, flags int) (n int, err error)
Reset(data []byte) error
Shrink() int
SeqConfig() SeqConfig
BufferConfig() BufConfig
Write(p []byte) (n int, err error)
ReadFrom(r io.Reader) (n int64, err error)
ReadAt(p []byte, off int64) (n int, err error)
ByteAt(off int64) (c byte, err error)
}
Sequencer provides the basic interface of a sequencer. It includes the methods provided by SeqBuffer.
type WrappedSequencer ¶
type WrappedSequencer struct {
// contains filtered or unexported fields
}
WrappedSequencer is returned by the Wrap function. It provides the Sequence method and reads the data required automatically from the stored reader.
func Wrap ¶
func Wrap(r io.Reader, seq Sequencer) *WrappedSequencer
Wrap combines a reader and a Sequencer and makes a Sequencer. The user doesn't need to take care of filling the Sequencer with additional data. The returned sequencer returns EOF if no further data is available.
Wrap chooses the minimum of 32 KiB and half of the window size as the shrink size.
func (*WrappedSequencer) Reset ¶
func (s *WrappedSequencer) Reset(r io.Reader)
Reset puts the WrappedSequencer in its initial state and changes the wrapped reader to another reader.
Directories ¶
| Path | Synopsis |
|---|---|
| | Package lz provides encoders and decoders for LZ77 sequences. |
| | Package lz provides encoders and decoders for LZ77 sequences. |
| | Package suffix provides a suffix sort algorithm. |