Documentation
¶
Overview ¶
Package lz supports encoding and decoding of LZ77 sequences. A sequence, as described in the Zstandard specification, consists of a literal copy command followed by a match copy command. The literal copy command is described by the length in literal bytes to be copied, and the match command consists of the distance of the match to copy and the length of the match in bytes.
A Parser converts a byte stream into blocks of sequences. The Decoder converts the block of sequences into the original decompressed byte stream.
The module provides multiple parser implementations that offer different combinations of encoding speed and compression ratios. Usually, a slower parser will generate a better compression ratio.
The package has a generic parser that needs a path finder, which selects the sequences for data block and a mapper that finds potential matches in the byte stream.
For optimization we may provide custom implementations that integrate matcher or path finder into the parser and avoids the calling overhead through interfaces.
The library supports the implementation of parsers outside of this package.
Index ¶
- Variables
- type AllOption
- type Block
- type Buffer
- func (b *Buffer) ByteAt(off int64) (c byte, err error)
- func (b *Buffer) Init(size, retentionSize int, shift ShiftFunc) error
- func (b *Buffer) Parsable() int
- func (b *Buffer) ReadAt(p []byte, off int64) (n int, err error)
- func (b *Buffer) ReadFrom(r io.Reader) (n int64, err error)
- func (b *Buffer) Reset(data []byte) error
- func (b *Buffer) Write(p []byte) (n int, err error)
- type Decoder
- func (d *Decoder) ByteAtEnd(off int) byte
- func (d *Decoder) Init(opts ...DecoderOption) error
- func (d *Decoder) Read(p []byte) (n int, err error)
- func (d *Decoder) Reset()
- func (d *Decoder) Write(p []byte) (n int, err error)
- func (d *Decoder) WriteBlock(blk *Block) (n int, err error)
- func (d *Decoder) WriteByte(c byte) error
- func (d *Decoder) WriteMatch(mu, ou uint32) (n int, err error)
- func (d *Decoder) WriteTo(w io.Writer) (n int64, err error)
- type DecoderConfig
- type DecoderOption
- type Entry
- type Mapper
- type Matcher
- type Parser
- type ParserConfig
- type ParserFlags
- type ParserOption
- type PathFinder
- type Seq
- type ShiftFunc
Constants ¶
This section is empty.
Variables ¶
var ErrEndOfBuffer = errors.New("lz: end of buffer")
ErrEndOfBuffer is returned at the end of the buffer.
var ErrFullBuffer = errors.New("lz: full buffer")
ErrFullBuffer is returned when the buffer is full and no more data can be written to it.
var ErrOutOfBuffer = errors.New("lz: offset outside of buffer")
ErrOutOfBuffer is returned when the offset is outside of the buffer.
var ErrStartOfBuffer = errors.New("lz: start of buffer")
ErrStartOfBuffer is returned at the start of the buffer.
Functions ¶
This section is empty.
Types ¶
type AllOption ¶ added in v0.7.4
type AllOption interface {
ParserOption
DecoderOption
}
func WithBufferSize ¶ added in v0.7.4
WithBufferSize sets the buffer size for the parser and decoder. The buffer size is the maximum size of the internal buffer for the parser and decoder. The buffer size must be greater than or equal to 2 bytes, and the retention size for the parser and the window size for the decoder must be less than the buffer size.
func WithWindowSize ¶ added in v0.7.4
WithWindowSize sets the window size for the parser and decoder. The window size is the maximum distance of a match to be copied. The window size must be greater than 0. For the decoder it must be significantly smaller than the buffer size, and for the parser it should be significantly smaller than the buffer size.
type Block ¶
Block stores sequences and literals. Note that the sequences stored in the Sequences slice might not consume the entire Literals slice. The remaining literal bytes must be added to the decoded text after all sequences have been decoded.
type Buffer ¶
type Buffer struct {
Data []byte
// Window end index
W int
// maximum buffer size
Size int
// RetentionSize is the number of bytes to keep when pruning the buffer.
RetentionSize int
// offset of Data
Off int64
// Shift will be called when the buffer is pruned to inform other
// components about the number of bytes removed from the start of
// the buffer.
Shift ShiftFunc
}
Buffer is the Buffer used for LZ parsing.
The Off field describes the offset of Data[0] in the original stream. The W points to the end of sliding window used for copying matches.
Data is not fully allocated at the beginning. It grows with the usage. There must be always 7 extra bytes allocated at the end of Data to allow easy reads of data from the Buffer.
func (*Buffer) ByteAt ¶ added in v0.5.0
ByteAt returns the byte at offset off. If off is outside of the buffer, ErrOutOfBuffer is returned.
func (*Buffer) Init ¶
Init initializes the buffer. The old data slice is reused and the capacity might be larger than the new buffer size.
func (*Buffer) Parsable ¶ added in v0.6.11
Parsable returns the number of bytes that can be parsed from the buffer.
func (*Buffer) ReadAt ¶ added in v0.5.0
ReadAt reads len(p) bytes from the buffer starting at byte offset off. It returns the number of bytes read and any error encountered. If off is outside of the buffer, ErrOutOfBuffer is returned.
func (*Buffer) ReadFrom ¶ added in v0.5.0
ReadFrom reads data from r until EOF or error. It returns the number of bytes read and any error encountered.
func (*Buffer) Reset ¶
Reset resets the buffer with the provided data slice. If the data slice is larger than the buffer size, the buffer size will be updated. Note that the data slice should have 7 extra bytes, len(data)+7 <= cap(data). Otherwise the old slice will be used or a new one need to be allocated.
type Decoder ¶
type Decoder struct {
// Data is the actual buffer. The end of the slice is also the head of
// the dictionary window.
Data []byte
// R tracks the position of the reads from the buffer and must be less
// or equal to the length of the Data slice.
R int
// Off records the total offset and marks the end of the Data slice,
// which is also the end of the dictionary window.
Off int64
// DecoderConfig provides the configuration parameters WindowSize and
// BufferSize.
DecoderConfig
}
Decoder provides a simple buffer for decoding LZ77 sequences. Data is the actual buffer. The end of the slice is also the head of the dictionary window. R tracks the read position in the buffer and must be less than or equal to the length of the Data slice. Off records the total offset and marks the end of the Data slice, which is also the end of the dictionary window. DecoderConfig provides the configuration parameters WindowSize and BufferSize.
func NewDecoder ¶
func NewDecoder(opts ...DecoderOption) (*Decoder, error)
func (*Decoder) Init ¶
func (d *Decoder) Init(opts ...DecoderOption) error
Init initializes the DecoderBuffer.
func (*Decoder) Reset ¶
func (d *Decoder) Reset()
Reset returns the DecoderBuffer to its initialized state.
func (*Decoder) Write ¶
Write inserts the slice into the buffer. The method will write the entire slice or return 0 and ErrFullBuffer.
func (*Decoder) WriteBlock ¶
WriteBlock writes sequences from the block into the buffer. Each sequence is written atomically, as the block value is not modified. If there is not enough space in the buffer, ErrFullBuffer will be returned. All written sequences and literals will be removed from the block.
The capacity of the block slices will not be maintained. You have to keep a copy of the block to achieve that.
The growth of the array is limited to BufferSize.
The function returns the number of bytes written.
func (*Decoder) WriteMatch ¶
WriteMatch appends the match to the end of the buffer. The match will be written completely, or n=0 and ErrFullBuffer will be returned.
type DecoderConfig ¶ added in v0.3.0
type DecoderConfig struct {
// Size of the sliding dictionary window in bytes.
WindowSize int
// Maximum size of the buffer in bytes.
BufferSize int
}
DecoderConfig contains the parameters for the DecoderBuffer and decoder types. WindowSize must be smaller than BufferSize. It is recommended to set BufferSize to twice the WindowSize.
type DecoderOption ¶ added in v0.7.4
type DecoderOption interface {
UpdateDecoderConfig(*DecoderConfig) error
}
type Entry ¶ added in v0.6.3
type Entry struct {
// contains filtered or unexported fields
}
Entry is returned by a Mapper for a found match. It provides the position i of the match in the Data slice of the buffer and v contains the leading 4 bytes of the match to avoid a lookup in the buffer.
type Mapper ¶ added in v0.6.3
type Mapper interface {
// InputLen returns the length of the input data into the table. We are
// supporting length from 2 to 8 bytes.
InputLen() int
// Reset resets the internal state of the mapper.
Reset()
// Shift is called by the number of bytes pruned from the buffer and
// provide the new extended buffer to the mapper.
Shift(delta int)
// Put adds all values between a and w to the mapper. We assume that
// cap(p) >= len(p) + 7.
Put(p []byte, a, w int) int
// Get returns all candidate entries for the provided hash value. The
// entry value v contains the all 4 bytes stored a position i.
Get(v uint64) []Entry
}
Mapper provides potential matches for a given position in the byte stream. Is it usually implemented by hash tables.
func NewMapper ¶ added in v0.7.1
NewMapper creates a new Mapper for the provided name of the algorithm. The mappers supported are described below.
hash_<inputLen>:<hashBits> A hash table with the provided input length and hash bits. The input length is between 2 and 8 bytes, and the hash bits can be 24 bits at maximum.
type Matcher ¶ added in v0.6.0
type Matcher interface {
// Edges returns the potential sequence at the current position.
Edges(n int) []Seq
// Skip is called to advance the current position by n bytes. An error
// is only returned if the there are not enough bytes in the buffer.
// Note that n can be negative, to allow to set the current position
// backwards.
Skip(n int) (skipped int, err error)
// Parsable returns the number of bytes that are available in the buffer
// for parsing.
Parsable() int
}
Matcher is responsible to find matches or Literal bytes in the byte stream. It is only relevant for the PathFinder.
type Parser ¶ added in v0.3.0
type Parser interface {
// Parse up to block size bytes from the internal buffer and provides
// the sequences in the block structure. While slices will be reused,
// not old information will be maintained.
Parse(blk *Block, n int, flags ParserFlags) (parsed int, err error)
// Write writes data into the internal buffer.
Write(p []byte) (n int, err error)
// ReadFrom reads data from the provided reader into the internal
// buffer.
ReadFrom(r io.Reader) (n int64, err error)
// ReadAt reads len(p) bytes from the internal buffer at offset off.
ReadAt(p []byte, off int64) (n int, err error)
// ByteAt returns the byte at offset off in the internal buffer.
ByteAt(off int64) (c byte, err error)
// Reset resets the internal buffer to the provided data.
Reset(data []byte) error
// Config returns the parser configuration.
Config() ParserConfig
}
Parser provides the possibility to parse a byte stream into LZ77 sequences.
func NewParser ¶ added in v0.6.4
func NewParser(opts ...ParserOption) (Parser, error)
NewParser creates a new parser for the provided options.
type ParserConfig ¶ added in v0.3.0
type ParserConfig struct {
PathFinder string
Mapper string
WindowSize int
RetentionSize int
BufferSize int
MinMatchLen int
MaxMatchLen int
}
ParserConfig provides the configuration for a parser. PathFinder describes the algorithm to find a path through the different matches. The Mapper is used to find potential matches.
type ParserFlags ¶ added in v0.6.0
type ParserFlags int
ParserFlags define optional parser behavior.
const ( // NoTrailingLiterals indicates that the parser should not generate // trailing literal bytes in the output. NoTrailingLiterals ParserFlags = 1 << iota )
type ParserOption ¶ added in v0.7.4
type ParserOption interface {
UpdateParserConfig(*ParserConfig) error
}
ParserOption represents a functional option for the parser configuration.
func WithMapper ¶ added in v0.7.4
func WithMapper(name string) ParserOption
WithMapper sets the mapper for the parser. The supported mappers are described below.
hash_<inputLen>:<hashBits> A hash table with the provided input length and hash bits. The input length is between 2 and 8 bytes, and the hash bits can be 24 bits at maximum.
func WithMaxMatchLen ¶ added in v0.7.4
func WithMaxMatchLen(size int) ParserOption
WithMaxMatchLen sets the maximum match length for the parser. The maximum match length must be greater than or equal to the minimum match length.
func WithMinMatchLen ¶ added in v0.7.4
func WithMinMatchLen(size int) ParserOption
WithMinMatchLen sets the minimum match length for the parser. The minimum match length must be less than or equal to the maximum match length.
func WithPathFinder ¶ added in v0.7.4
func WithPathFinder(name string) ParserOption
WithPathFinder sets the path finder for the parser. The supported path finders are described below.
greedy The greedy path finder selects the longest match at each position. ^
func WithRetentionSize ¶ added in v0.7.4
func WithRetentionSize(size int) ParserOption
WithRetentionSize sets the retention size for the parser. The retention size is the number of bytes that are kept in the buffer after they have been parsed. The retention size must be less than the buffer size.
type PathFinder ¶ added in v0.7.0
type PathFinder interface {
Parse(block *Block, n int, flags ParserFlags) (parsed int, err error)
Reset()
}
PathFinder implements the central Parse Function of the Parser.
func NewPathFinder ¶ added in v0.7.1
func NewPathFinder(name string, m Matcher) (PathFinder, error)
NewPathFinder creates a new PathFinder for the provided name of the algorithm and the Matcher. The path finders supported are described below.
greedy The greedy path finder selects the longest match at each position.
type Seq ¶
Seq represents a single Lempel-Ziv 77 sequence describing a match, consisting of the offset, the length of the match, and the number of literals preceding the match. The Aux field can be used in upper layers to store additional information. One use case is to store up to 4 bytes of literals.