lz

v0.5.1
Published: Aug 26, 2025 License: BSD-3-Clause Imports: 12 Imported by: 2

README

Module LZ

The LZ module provides sequencers that convert byte streams into blocks of Lempel-Ziv 77 sequences. It is designed to support multiple compression methods that differ in how they encode those LZ77 sequences.

Documentation

Overview

Package lz supports encoding and decoding of LZ77 sequences. A sequence, as described in the Zstandard specification, consists of a literal copy command followed by a match copy command. The literal copy command is described by the number of literal bytes to copy; the match copy command consists of the distance of the match to copy and the length of the match in bytes.

A Parser converts a byte stream into blocks of sequences. The DecoderBuffer converts the block of sequences into the original decompressed byte stream.

The module provides multiple parser implementations that offer different combinations of encoding speed and compression ratios. Usually, a slower parser will generate a better compression ratio.

The library supports the implementation of parsers outside of this package.

Constants

const (
	// NoTrailingLiterals tells a parser that trailing literals do not
	// need to be included in the block.
	NoTrailingLiterals = 1 << iota
)

Flags for the sequence function stored in the block structure.
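
The following sketch shows how a flags value built from such a constant could be tested inside a parser. The constant is redeclared locally so the example compiles on its own, and the helper wantsTrailingLiterals is a hypothetical name that is not part of the package.

```go
package main

import "fmt"

// NoTrailingLiterals mirrors the documented constant. With a single entry
// in the const block, 1 << iota evaluates to 1.
const (
	NoTrailingLiterals = 1 << iota
)

// wantsTrailingLiterals is a hypothetical helper that reports whether the
// caller expects trailing literals to be included in the block.
func wantsTrailingLiterals(flags int) bool {
	return flags&NoTrailingLiterals == 0
}

func main() {
	fmt.Println(wantsTrailingLiterals(0))
	fmt.Println(wantsTrailingLiterals(NoTrailingLiterals))
}
```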

Variables

var (
	ErrOutOfBuffer = errors.New("lz: offset outside of buffer")
	ErrEndOfBuffer = errors.New("lz: end of buffer")
)

Errors returned by [Buffer.ReadAt].

var ErrEmptyBuffer = errors.New("lz: no more data in buffer")

ErrEmptyBuffer indicates that no more data is available in the buffer. It is returned by the Parse method of Parser.

var ErrFullBuffer = errors.New("lz: buffer is full")

ErrFullBuffer indicates that the buffer is full. It is returned by the Write and ReadFrom methods of Parser.

Functions

func XZCost added in v0.2.0

func XZCost(m, o uint32) uint64

XZCost models the cost of the bits going into the XZ encoding. The maximum edge length is 273.

Types

type BDHPConfig added in v0.3.0

type BDHPConfig struct {
	InputLen1 int
	HashBits1 int
	InputLen2 int
	HashBits2 int

	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int
}

BDHPConfig provides the configuration parameters for the Backward-looking Double Hash Parser.

func (*BDHPConfig) BufConfig added in v0.3.0

func (cfg *BDHPConfig) BufConfig() BufConfig

BufConfig returns the BufConfig value containing the buffer parameters.

func (*BDHPConfig) Clone added in v0.3.1

func (cfg *BDHPConfig) Clone() ParserConfig

Clone creates a copy of the configuration.

func (*BDHPConfig) MarshalJSON added in v0.3.0

func (cfg *BDHPConfig) MarshalJSON() (p []byte, err error)

MarshalJSON creates the JSON string for the configuration. Note that it adds a property Type with value "BDHP" to the structure.

func (BDHPConfig) NewParser added in v0.3.0

func (cfg BDHPConfig) NewParser() (s Parser, err error)

NewParser creates a new Backward-looking Double Hash Parser.

func (*BDHPConfig) SetBufConfig added in v0.3.0

func (cfg *BDHPConfig) SetBufConfig(bc BufConfig)

SetBufConfig sets the buffer configuration.

func (*BDHPConfig) SetDefaults added in v0.3.0

func (cfg *BDHPConfig) SetDefaults()

SetDefaults uses the defaults for the configuration parameters that are set to zero.

func (*BDHPConfig) UnmarshalJSON added in v0.3.0

func (cfg *BDHPConfig) UnmarshalJSON(p []byte) error

UnmarshalJSON parses the JSON value and sets the fields of BDHPConfig.

func (*BDHPConfig) Verify added in v0.3.0

func (cfg *BDHPConfig) Verify() error

Verify checks the configuration for errors.

type BHPConfig added in v0.3.0

type BHPConfig struct {
	InputLen int
	HashBits int

	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int
}

BHPConfig provides the parameters for the backward hash parser.

func (*BHPConfig) BufConfig added in v0.3.0

func (cfg *BHPConfig) BufConfig() BufConfig

BufConfig returns the BufConfig value containing the buffer parameters.

func (*BHPConfig) Clone added in v0.3.1

func (cfg *BHPConfig) Clone() ParserConfig

Clone creates a copy of the configuration.

func (*BHPConfig) MarshalJSON added in v0.3.0

func (cfg *BHPConfig) MarshalJSON() (p []byte, err error)

MarshalJSON creates the JSON string for the configuration. Note that it adds a property Type with value "BHP" to the structure.

func (BHPConfig) NewParser added in v0.3.0

func (cfg BHPConfig) NewParser() (s Parser, err error)

NewParser creates a new Backward Hash Parser.

func (*BHPConfig) SetBufConfig added in v0.3.0

func (cfg *BHPConfig) SetBufConfig(bc BufConfig)

SetBufConfig sets the buffer configuration parameters of the backward hash parser configuration.

func (*BHPConfig) SetDefaults added in v0.3.0

func (cfg *BHPConfig) SetDefaults()

SetDefaults sets values that are zero to their default values.

func (*BHPConfig) UnmarshalJSON added in v0.3.0

func (cfg *BHPConfig) UnmarshalJSON(p []byte) error

UnmarshalJSON parses the JSON value and sets the fields of BHPConfig.

func (*BHPConfig) Verify added in v0.3.0

func (cfg *BHPConfig) Verify() error

Verify checks the configuration for correctness.

type BUPConfig added in v0.3.0

type BUPConfig struct {
	InputLen   int
	HashBits   int
	BucketSize int

	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int
}

BUPConfig provides the configuration parameters for the bucket hash parser.

func (*BUPConfig) BufConfig added in v0.3.0

func (cfg *BUPConfig) BufConfig() BufConfig

BufConfig returns the BufConfig value containing the buffer parameters.

func (*BUPConfig) Clone added in v0.3.1

func (cfg *BUPConfig) Clone() ParserConfig

Clone creates a copy of the configuration.

func (*BUPConfig) MarshalJSON added in v0.3.0

func (cfg *BUPConfig) MarshalJSON() (p []byte, err error)

MarshalJSON creates the JSON string for the configuration. Note that it adds a property Type with value "BUP" to the structure.

func (BUPConfig) NewParser added in v0.3.0

func (cfg BUPConfig) NewParser() (s Parser, err error)

NewParser creates a new bucket hash parser.

func (*BUPConfig) SetBufConfig added in v0.3.0

func (cfg *BUPConfig) SetBufConfig(bc BufConfig)

SetBufConfig sets the buffer configuration parameters of the parser configuration.

func (*BUPConfig) SetDefaults added in v0.3.0

func (cfg *BUPConfig) SetDefaults()

SetDefaults sets values that are zero to their default values.

func (*BUPConfig) UnmarshalJSON added in v0.3.0

func (cfg *BUPConfig) UnmarshalJSON(p []byte) error

UnmarshalJSON parses the JSON value and sets the fields of BUPConfig.

func (*BUPConfig) Verify added in v0.3.0

func (cfg *BUPConfig) Verify() error

Verify checks the config for correctness.

type Block

type Block struct {
	Sequences []Seq
	Literals  []byte
}

Block stores sequences and literals. Note that the sequences stored in the Sequences slice might not consume the entire Literals slice. The remaining literal bytes must be added to the decoded text after all sequences have been decoded.
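
The decoding rule described above can be sketched as follows. Seq and Block are local copies of the documented struct layouts, decodeBlock is a hypothetical helper rather than the package's decoder, and the sketch assumes that Offset counts backwards from the end of the decoded data (1 is the last byte).

```go
package main

import (
	"errors"
	"fmt"
)

// Seq and Block mirror the documented struct layouts so the example
// compiles without importing the package.
type Seq struct {
	LitLen, MatchLen, Offset, Aux uint32
}

type Block struct {
	Sequences []Seq
	Literals  []byte
}

// decodeBlock appends the decoded bytes of blk to dst.
func decodeBlock(dst []byte, blk Block) ([]byte, error) {
	lits := blk.Literals
	for _, s := range blk.Sequences {
		if int64(s.LitLen) > int64(len(lits)) {
			return dst, errors.New("sequence needs more literals than available")
		}
		// Literal copy command.
		dst = append(dst, lits[:s.LitLen]...)
		lits = lits[s.LitLen:]
		if s.MatchLen == 0 {
			continue
		}
		if s.Offset == 0 || int64(s.Offset) > int64(len(dst)) {
			return dst, errors.New("match offset outside of decoded data")
		}
		// Match copy command, byte by byte so overlapping matches work.
		for i := uint32(0); i < s.MatchLen; i++ {
			dst = append(dst, dst[len(dst)-int(s.Offset)])
		}
	}
	// Literals not consumed by any sequence are appended last.
	return append(dst, lits...), nil
}

func main() {
	blk := Block{
		Sequences: []Seq{{LitLen: 3, MatchLen: 6, Offset: 3}},
		Literals:  []byte("abc!"),
	}
	out, err := decodeBlock(nil, blk)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", out) // prints "abcabcabc!"
}
```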

func (*Block) Len

func (b *Block) Len() int64

Len computes the length of the block in bytes. It assumes that the sum of the literal lengths in the sequences does not exceed the length of the Literals byte slice.
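
Under the stated assumption every literal byte is emitted exactly once, so the decoded length can be computed without walking the literals. The sketch below uses local copies of the documented types, and blockLen is a hypothetical name; it is one plausible way to compute the value, not necessarily how the package does it.

```go
package main

import "fmt"

// Seq and Block mirror the documented struct layouts.
type Seq struct {
	LitLen, MatchLen, Offset, Aux uint32
}

type Block struct {
	Sequences []Seq
	Literals  []byte
}

// blockLen returns the number of bytes the block decodes to: all literals
// (including trailing ones) plus the match length of every sequence.
func blockLen(b *Block) int64 {
	n := int64(len(b.Literals))
	for _, s := range b.Sequences {
		n += int64(s.MatchLen)
	}
	return n
}

func main() {
	blk := &Block{
		Sequences: []Seq{{LitLen: 3, MatchLen: 6, Offset: 3}},
		Literals:  []byte("abc!"),
	}
	fmt.Println(blockLen(blk)) // prints 10
}
```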

type BufConfig added in v0.2.0

type BufConfig struct {
	ShrinkSize int
	BufferSize int

	WindowSize int
	BlockSize  int
}

BufConfig describes the various sizes relevant for the buffer. ShrinkSize should be significantly smaller than BufferSize, and at most 50% of it. WindowSize is independent of BufferSize, but BufferSize should usually be greater than or equal to WindowSize. The actual sequencing happens in blocks. A typical BlockSize is 128 KiB, as used by the Zstandard specification.

func (*BufConfig) SetDefaults added in v0.2.1

func (cfg *BufConfig) SetDefaults()

SetDefaults sets the defaults for the various size values. The defaults are given below.

BufferSize:   8 MiB
ShrinkSize:  32 KiB (or half of BufferSize, if it is smaller than 64 KiB)
WindowSize: BufferSize
BlockSize:  128 KiB
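
The defaults listed above can be expressed as the following sketch. The type bufConfig is a local stand-in for BufConfig, and the sketch assumes that only fields left at zero are replaced, as the SetDefaults methods of the parser configurations document.

```go
package main

import "fmt"

// bufConfig is a local stand-in with the documented BufConfig fields.
type bufConfig struct {
	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int
}

// setDefaults fills zero fields with the documented default values.
func (cfg *bufConfig) setDefaults() {
	if cfg.BufferSize == 0 {
		cfg.BufferSize = 8 << 20 // 8 MiB
	}
	if cfg.ShrinkSize == 0 {
		if cfg.BufferSize < 64<<10 {
			// Small buffers use half of the buffer size.
			cfg.ShrinkSize = cfg.BufferSize / 2
		} else {
			cfg.ShrinkSize = 32 << 10 // 32 KiB
		}
	}
	if cfg.WindowSize == 0 {
		cfg.WindowSize = cfg.BufferSize
	}
	if cfg.BlockSize == 0 {
		cfg.BlockSize = 128 << 10 // 128 KiB
	}
}

func main() {
	var cfg bufConfig
	cfg.setDefaults()
	fmt.Printf("%+v\n", cfg)
}
```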

func (*BufConfig) Verify added in v0.2.0

func (cfg *BufConfig) Verify() error

Verify checks the buffer configuration. Note that the window size and block size are independent of the other sizes; only the shrink size must be less than the buffer size.

type Buffer

type Buffer struct {
	// actual buffer data
	Data []byte

	// w position of the head of the window in data.
	W int

	// off start of the data slice, counts all data written and discarded
	// from the buffer.
	Off int64

	BufConfig
}

Buffer provides a base for Parser implementations. All members are public because the package allows parser implementations outside of the package.

func (*Buffer) ByteAt added in v0.5.0

func (b *Buffer) ByteAt(off int64) (c byte, err error)

ByteAt returns the byte at total offset off, if it can be provided. If off points to the end of the buffer, ErrEndOfBuffer will be returned; for any other offset outside the buffer, ErrOutOfBuffer will be returned.

func (*Buffer) Init

func (b *Buffer) Init(cfg BufConfig) error

Init initializes the buffer. The function sets the defaults for the buffer configuration if required and verifies it. Errors will be reported.

func (*Buffer) PeekAt added in v0.5.0

func (b *Buffer) PeekAt(n int, off int64) (p []byte, err error)

PeekAt returns part of the internal data slice starting at total offset off. The total offset takes all data written to the buffer into account. If the off parameter is outside the current buffer ErrOutOfBuffer will be returned. If less than n bytes of data can be provided ErrEndOfBuffer will be returned.
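
The mapping from a total offset to an index in the data slice can be sketched as below. The buffer type and the error variables are local stand-ins. Treating an offset exactly at the end of the data as inside the buffer follows the ByteAt description; returning the available bytes together with the end-of-buffer error is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errOutOfBuffer = errors.New("offset outside of buffer")
	errEndOfBuffer = errors.New("end of buffer")
)

// buffer is a local stand-in: data holds the bytes currently kept and off
// is the total offset of data[0].
type buffer struct {
	data []byte
	off  int64
}

// peekAt returns up to n bytes starting at total offset off without
// copying. n is expected to be non-negative.
func (b *buffer) peekAt(n int, off int64) ([]byte, error) {
	i := off - b.off
	if i < 0 || i > int64(len(b.data)) {
		return nil, errOutOfBuffer
	}
	p := b.data[i:]
	if len(p) < n {
		return p, errEndOfBuffer
	}
	return p[:n], nil
}

func main() {
	// Six bytes have already been discarded from the front of the buffer.
	b := &buffer{data: []byte("world"), off: 6}
	p, err := b.peekAt(3, 6)
	fmt.Printf("%q %v\n", p, err)
	p, err = b.peekAt(3, 9)
	fmt.Printf("%q %v\n", p, err)
	_, err = b.peekAt(1, 2)
	fmt.Println(err)
}
```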

func (*Buffer) ReadAt added in v0.5.0

func (b *Buffer) ReadAt(p []byte, off int64) (n int, err error)

ReadAt reads data from the buffer at position off. If off is outside the buffer, ErrOutOfBuffer will be reported. If there is not enough data to fill p, ErrEndOfBuffer will be reported. See [Buffer.PeekAt] for avoiding the copy.

func (*Buffer) ReadFrom added in v0.5.0

func (b *Buffer) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom reads the data from reader into the buffer. If there is an error it will be reported. If the buffer is full, ErrFullBuffer will be reported.

func (*Buffer) Reset

func (b *Buffer) Reset(data []byte) error

Reset initializes the buffer with new data. The data slice requires a margin of 7 bytes for the hash parsers to be used directly. If there is no margin the data will be copied into a slice with enough capacity.

func (*Buffer) Shrink added in v0.5.0

func (b *Buffer) Shrink() int

Shrink will move the window head to the shrink size if it is larger. The amount of data discarded from the buffer, named delta, will be returned.
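
A minimal sketch of this operation, using a local stand-in type with lowercase counterparts of the Data, W, Off and ShrinkSize fields. It assumes that the discarded bytes are removed from the front of the data slice and that the total offset grows by delta; the package's implementation may differ in detail.

```go
package main

import "fmt"

// buffer is a local stand-in for the documented Buffer fields.
type buffer struct {
	data       []byte
	w          int   // position of the window head in data
	off        int64 // total offset of data[0]
	shrinkSize int
}

// shrink discards data from the front so that the window head ends up at
// shrinkSize. It returns the number of discarded bytes.
func (b *buffer) shrink() int {
	delta := b.w - b.shrinkSize
	if delta <= 0 {
		return 0
	}
	n := copy(b.data, b.data[delta:])
	b.data = b.data[:n]
	b.w = b.shrinkSize
	b.off += int64(delta)
	return delta
}

func main() {
	b := &buffer{data: []byte("0123456789"), w: 8, shrinkSize: 3}
	delta := b.shrink()
	fmt.Printf("%d %s %d %d\n", delta, b.data, b.w, b.off)
}
```

After the call the byte under the window head is the same as before, only its index in the slice has moved.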

func (*Buffer) Write

func (b *Buffer) Write(p []byte) (n int, err error)

Write writes data into the buffer. If the complete p slice cannot be copied into the buffer, Write will return ErrFullBuffer.

type DHPConfig added in v0.3.0

type DHPConfig struct {
	InputLen1 int
	HashBits1 int
	InputLen2 int
	HashBits2 int

	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int
}

DHPConfig provides the configuration parameters for the DoubleHashParser.

func (*DHPConfig) BufConfig added in v0.3.0

func (cfg *DHPConfig) BufConfig() BufConfig

BufConfig returns the BufConfig value containing the buffer parameters.

func (*DHPConfig) Clone added in v0.3.1

func (cfg *DHPConfig) Clone() ParserConfig

Clone creates a copy of the configuration.

func (*DHPConfig) MarshalJSON added in v0.3.0

func (cfg *DHPConfig) MarshalJSON() (p []byte, err error)

MarshalJSON creates the JSON string for the configuration. Note that it adds a property Type with value "DHP" to the structure.

func (DHPConfig) NewParser added in v0.3.0

func (cfg DHPConfig) NewParser() (s Parser, err error)

NewParser creates a new DoubleHashParser.

func (*DHPConfig) SetBufConfig added in v0.3.0

func (cfg *DHPConfig) SetBufConfig(bc BufConfig)

SetBufConfig sets the buffer configuration parameters of the parser configuration.

func (*DHPConfig) SetDefaults added in v0.3.0

func (cfg *DHPConfig) SetDefaults()

SetDefaults uses the defaults for the configuration parameters that are set to zero.

func (*DHPConfig) UnmarshalJSON added in v0.3.0

func (cfg *DHPConfig) UnmarshalJSON(p []byte) error

UnmarshalJSON parses the JSON value and sets the fields of DHPConfig.

func (*DHPConfig) Verify added in v0.3.0

func (cfg *DHPConfig) Verify() error

Verify checks the configuration for errors.

type DecoderBuffer added in v0.3.0

type DecoderBuffer struct {
	// Data is the actual buffer. The end of the slice is also the head of
	// the dictionary window.
	Data []byte
	// R tracks the position of the reads from the buffer and must be less
	// or equal to the length of the Data slice.
	R int
	// Off records the total offset and marks the end of the Data slice,
	// which is also the end of the dictionary window.
	Off int64

	// DecoderConfig provides the configuration parameters WindowSize and
	// BufferSize.
	DecoderConfig
}

DecoderBuffer provides a simple buffer for decoding LZ77 sequences. Data is the actual buffer. The end of the slice is also the head of the dictionary window. R tracks the read position in the buffer and must be less than or equal to the length of the Data slice. Off records the total offset and marks the end of the Data slice, which is also the end of the dictionary window. DecoderConfig provides the configuration parameters WindowSize and BufferSize.

func (*DecoderBuffer) ByteAtEnd added in v0.3.0

func (b *DecoderBuffer) ByteAtEnd(off int) byte

ByteAtEnd returns the byte at the end of the buffer.

func (*DecoderBuffer) Init added in v0.3.0

func (b *DecoderBuffer) Init(cfg DecoderConfig) error

Init initializes the DecoderBuffer.

func (*DecoderBuffer) Read added in v0.3.0

func (b *DecoderBuffer) Read(p []byte) (n int, err error)

Read reads decoded data from the buffer.

func (*DecoderBuffer) Reset added in v0.3.0

func (b *DecoderBuffer) Reset()

Reset returns the DecoderBuffer to its initialized state.

func (*DecoderBuffer) Write added in v0.3.0

func (b *DecoderBuffer) Write(p []byte) (n int, err error)

Write inserts the slice into the buffer. The method will write the entire slice or return 0 and ErrFullBuffer.

func (*DecoderBuffer) WriteBlock added in v0.3.0

func (b *DecoderBuffer) WriteBlock(blk Block) (n, k, l int, err error)

WriteBlock writes sequences from the block into the buffer. Each sequence is written atomically, as the block value is not modified. If there is not enough space in the buffer, ErrFullBuffer will be returned.

The growth of the array is not limited to BufferSize. This may consume more memory, but increases speed.

The return values n, k, and l indicate the number of bytes written to the buffer, the number of sequences, and the number of literals, respectively.

func (*DecoderBuffer) WriteByte added in v0.3.0

func (b *DecoderBuffer) WriteByte(c byte) error

WriteByte writes a single byte into the buffer.

func (*DecoderBuffer) WriteMatch added in v0.3.0

func (b *DecoderBuffer) WriteMatch(m, o uint32) (n int, err error)

WriteMatch appends the match to the end of the buffer. The match will be written completely, or n=0 and ErrFullBuffer will be returned.
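
The all-or-nothing behaviour and the handling of overlapping matches can be sketched as follows. The function writeMatch works on a plain byte slice with an explicit size limit instead of a DecoderBuffer; it is a hypothetical helper, and the separate error for an invalid offset is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

var errFullBuffer = errors.New("buffer is full")

// writeMatch appends a match of length m that starts o bytes before the
// end of data. Nothing is written if the result would exceed bufferSize.
func writeMatch(data []byte, bufferSize int, m, o uint32) ([]byte, int, error) {
	if o == 0 || int64(o) > int64(len(data)) {
		return data, 0, errors.New("match offset outside of buffer")
	}
	if int64(len(data))+int64(m) > int64(bufferSize) {
		return data, 0, errFullBuffer
	}
	// Copy byte by byte: if m > o the match overlaps the bytes that are
	// being written, which repeats the last o bytes.
	for i := uint32(0); i < m; i++ {
		data = append(data, data[len(data)-int(o)])
	}
	return data, int(m), nil
}

func main() {
	data, n, err := writeMatch([]byte("ab"), 16, 5, 2)
	fmt.Printf("%s %d %v\n", data, n, err) // prints "abababa 5 <nil>"
}
```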

func (*DecoderBuffer) WriteTo added in v0.3.0

func (b *DecoderBuffer) WriteTo(w io.Writer) (n int64, err error)

WriteTo writes the decoded data to the writer.

type DecoderConfig added in v0.3.0

type DecoderConfig struct {
	// Size of the sliding dictionary window in bytes.
	WindowSize int
	// Maximum size of the buffer in bytes.
	BufferSize int
}

DecoderConfig contains the parameters for the DecoderBuffer and decoder types. WindowSize must be smaller than BufferSize. It is recommended to set BufferSize to twice the WindowSize.

func (*DecoderConfig) SetDefaults added in v0.3.0

func (cfg *DecoderConfig) SetDefaults()

SetDefaults assigns default values to zero fields in DecoderConfig.

func (*DecoderConfig) Verify added in v0.3.0

func (cfg *DecoderConfig) Verify() error

Verify checks the parameters of the DecoderConfig value and returns an error for the first issue found.

type GSAPConfig added in v0.3.0

type GSAPConfig struct {
	// minimum match len
	MinMatchLen int

	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int
}

GSAPConfig defines the configuration parameters for the greedy suffix array parser.

func (*GSAPConfig) BufConfig added in v0.3.0

func (cfg *GSAPConfig) BufConfig() BufConfig

BufConfig returns the BufConfig value containing the buffer parameters.

func (*GSAPConfig) Clone added in v0.3.1

func (cfg *GSAPConfig) Clone() ParserConfig

Clone creates a copy of the configuration.

func (*GSAPConfig) MarshalJSON added in v0.3.0

func (cfg *GSAPConfig) MarshalJSON() (p []byte, err error)

MarshalJSON creates the JSON string for the configuration. Note that it adds a property Type with value "GSAP" to the structure.

func (GSAPConfig) NewParser added in v0.3.0

func (cfg GSAPConfig) NewParser() (s Parser, err error)

NewParser generates a new parser using the configuration parameters in the structure.

func (*GSAPConfig) SetBufConfig added in v0.3.0

func (cfg *GSAPConfig) SetBufConfig(bc BufConfig)

SetBufConfig sets the buffer configuration parameters of the parser configuration.

func (*GSAPConfig) SetDefaults added in v0.3.0

func (cfg *GSAPConfig) SetDefaults()

SetDefaults sets the configuration parameters to their defaults. It does not ensure that the resulting configuration is consistent.

func (*GSAPConfig) UnmarshalJSON added in v0.3.0

func (cfg *GSAPConfig) UnmarshalJSON(p []byte) error

UnmarshalJSON parses the JSON value and sets the fields of GSAPConfig.

func (*GSAPConfig) Verify added in v0.3.0

func (cfg *GSAPConfig) Verify() error

Verify checks the configuration for inconsistencies.

type HPConfig added in v0.3.0

type HPConfig struct {
	InputLen int
	HashBits int

	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int
}

HPConfig provides the configuration parameters for the HashParser. The parser doesn't use ShrinkSize and BufferSize itself, but provides them to other code that has to handle the buffer.

func (*HPConfig) BufConfig added in v0.3.0

func (cfg *HPConfig) BufConfig() BufConfig

BufConfig returns the BufConfig value containing the buffer parameters.

func (*HPConfig) Clone added in v0.3.1

func (cfg *HPConfig) Clone() ParserConfig

Clone creates a copy of the configuration.

func (*HPConfig) MarshalJSON added in v0.3.0

func (cfg *HPConfig) MarshalJSON() (p []byte, err error)

MarshalJSON creates the JSON string for the configuration. Note that it adds a property Type with value "HP" to the structure.

func (HPConfig) NewParser added in v0.3.0

func (cfg HPConfig) NewParser() (s Parser, err error)

NewParser creates a new hash parser.

func (*HPConfig) SetBufConfig added in v0.3.0

func (cfg *HPConfig) SetBufConfig(bc BufConfig)

SetBufConfig sets the buffer configuration parameters of the parser configuration.

func (*HPConfig) SetDefaults added in v0.3.0

func (cfg *HPConfig) SetDefaults()

SetDefaults sets values that are zero to their default values.

func (*HPConfig) UnmarshalJSON added in v0.3.0

func (cfg *HPConfig) UnmarshalJSON(p []byte) error

UnmarshalJSON converts the JSON into the HPConfig structure.

func (*HPConfig) Verify added in v0.3.0

func (cfg *HPConfig) Verify() error

Verify checks the configuration for correctness.

type OSAPConfig added in v0.3.0

type OSAPConfig struct {
	MinMatchLen int
	MaxMatchLen int

	Cost string

	ShrinkSize int
	BufferSize int
	WindowSize int
	BlockSize  int
}

OSAPConfig provides the configuration parameters for the Optimizing Suffix Array Parser (OSAP).

func (*OSAPConfig) BufConfig added in v0.3.0

func (cfg *OSAPConfig) BufConfig() BufConfig

BufConfig returns the BufConfig value for the OSAP configuration.

func (*OSAPConfig) Clone added in v0.3.1

func (cfg *OSAPConfig) Clone() ParserConfig

Clone creates a copy of the configuration.

func (*OSAPConfig) MarshalJSON added in v0.3.0

func (cfg *OSAPConfig) MarshalJSON() (p []byte, err error)

MarshalJSON creates the JSON string for the configuration. Note that it adds a property Type with value "OSAP" to the structure.

func (*OSAPConfig) NewParser added in v0.3.0

func (cfg *OSAPConfig) NewParser() (s Parser, err error)

NewParser returns the Optimizing Suffix Array Parser.

func (*OSAPConfig) SetBufConfig added in v0.3.0

func (cfg *OSAPConfig) SetBufConfig(bc BufConfig)

SetBufConfig sets the buffer configuration parameters of the parser configuration.

func (*OSAPConfig) SetDefaults added in v0.3.0

func (cfg *OSAPConfig) SetDefaults()

SetDefaults sets the defaults for the zero values of the OSAP configuration.

func (*OSAPConfig) UnmarshalJSON added in v0.3.0

func (cfg *OSAPConfig) UnmarshalJSON(p []byte) error

UnmarshalJSON parses the JSON value and sets the fields of OSAPConfig.

func (*OSAPConfig) Verify added in v0.3.0

func (cfg *OSAPConfig) Verify() error

Verify verifies the configuration for the Optimizing Suffix Array Parser.

type Parser added in v0.3.0

type Parser interface {
	Parse(blk *Block, flags int) (n int, err error)
	Reset(data []byte) error
	Shrink() int
	ParserConfig() ParserConfig
	Write(p []byte) (n int, err error)
	ReadFrom(r io.Reader) (n int64, err error)
	ReadAt(p []byte, off int64) (n int, err error)
	ByteAt(off int64) (c byte, err error)
}

Parser provides the basic interface for a parser. Most functions are provided by the underlying Buffer.
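
A typical driver writes data into the parser and calls Parse until it reports ErrEmptyBuffer. The sketch below shows that loop against a local two-method subset of the interface. The literalParser type is a toy stand-in that finds no matches, not one of the package's parsers, and it assumes that the n returned by Parse is the number of input bytes covered by the block.

```go
package main

import (
	"errors"
	"fmt"
)

var errEmptyBuffer = errors.New("no more data in buffer")

// Seq and Block mirror the documented struct layouts.
type Seq struct {
	LitLen, MatchLen, Offset, Aux uint32
}

type Block struct {
	Sequences []Seq
	Literals  []byte
}

// parser is the subset of the documented Parser interface used here.
type parser interface {
	Write(p []byte) (n int, err error)
	Parse(blk *Block, flags int) (n int, err error)
}

// literalParser is a toy parser that emits all input as trailing literals.
type literalParser struct {
	data      []byte
	w         int // position of the next unparsed byte
	blockSize int
}

func (p *literalParser) Write(b []byte) (int, error) {
	p.data = append(p.data, b...)
	return len(b), nil
}

func (p *literalParser) Parse(blk *Block, flags int) (int, error) {
	n := len(p.data) - p.w
	if n == 0 {
		return 0, errEmptyBuffer
	}
	if n > p.blockSize {
		n = p.blockSize
	}
	blk.Sequences = blk.Sequences[:0]
	blk.Literals = append(blk.Literals[:0], p.data[p.w:p.w+n]...)
	p.w += n
	return n, nil
}

// parseAll feeds data to the parser and records the size of each block.
func parseAll(p parser, data []byte) ([]int, error) {
	if _, err := p.Write(data); err != nil {
		return nil, err
	}
	var sizes []int
	var blk Block
	for {
		n, err := p.Parse(&blk, 0)
		if errors.Is(err, errEmptyBuffer) {
			return sizes, nil // all buffered data has been parsed
		}
		if err != nil {
			return sizes, err
		}
		sizes = append(sizes, n)
	}
}

func main() {
	sizes, err := parseAll(&literalParser{blockSize: 4}, []byte("hello world"))
	fmt.Println(sizes, err) // prints "[4 4 3] <nil>"
}
```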

type ParserConfig added in v0.3.0

type ParserConfig interface {
	NewParser() (p Parser, err error)
	BufConfig() BufConfig
	SetBufConfig(bcfg BufConfig)
	json.Marshaler
	json.Unmarshaler
	Clone() ParserConfig
	SetDefaults()
	Verify() error
}

ParserConfig provides the interface to parser configurations.

func ParseJSON added in v0.2.1

func ParseJSON(data []byte) (ParserConfig, error)

type Seq

type Seq struct {
	LitLen   uint32
	MatchLen uint32
	Offset   uint32
	Aux      uint32
}

Seq represents a single Lempel-Ziv 77 sequence describing a match, consisting of the offset, the length of the match, and the number of literals preceding the match. The Aux field can be used in upper layers to store additional information.

func (Seq) Len

func (s Seq) Len() int64

Len returns the complete length of the sequence in bytes.

Directories

Package suffix provides a suffix sort algorithm.