lz

package module
v0.6.10
Published: Dec 20, 2025 License: BSD-3-Clause Imports: 7 Imported by: 2

README

Module LZ

The LZ module provides sequencers that convert byte streams into blocks of Lempel-Ziv 77 (LZ77) sequences. It is designed to support multiple compression methods that differ in how they encode those sequences.

Documentation

Overview

Package lz supports encoding and decoding of LZ77 sequences. A sequence, as described in the Zstandard specification, consists of a literal copy command followed by a match copy command. The literal copy command is described by the length in literal bytes to be copied, and the match command consists of the distance of the match to copy and the length of the match in bytes.

A Parser converts a byte stream into blocks of sequences. The Decoder converts the block of sequences into the original decompressed byte stream.

The module provides multiple parser implementations that offer different combinations of encoding speed and compression ratios. Usually, a slower parser will generate a better compression ratio.

Parsers may use different matchers to provide their functionality. One example is [greedyParser], which can use multiple Matcher implementations.

The library supports the implementation of parsers outside of this package.

Constants

This section is empty.

Variables

var ErrEndOfBuffer = errors.New("lz: end of buffer")

ErrEndOfBuffer is returned at the end of the buffer.

var ErrFullBuffer = errors.New("lz: full buffer")

ErrFullBuffer is returned when the buffer is full and no more data can be written to it.

var ErrOutOfBuffer = errors.New("lz: offset outside of buffer")

ErrOutOfBuffer is returned when the offset is outside of the buffer.

var ErrStartOfBuffer = errors.New("lz: start of buffer")

ErrStartOfBuffer is returned at the start of the buffer.

Functions

func BufferSize added in v0.6.8

func BufferSize(opts Configurator) int

BufferSize returns the buffer size included in the provided options.

func RetentionSize added in v0.6.9

func RetentionSize(opts Configurator) int

RetentionSize returns the retention size included in the provided options. The retention size describes the amount of data that will be kept in the buffer. It must not be larger than the WindowSize.

func SetBufferSize added in v0.6.8

func SetBufferSize(opts Configurator, bufferSize int) error

SetBufferSize sets the buffer size in the provided options.

func SetRetentionSize added in v0.6.9

func SetRetentionSize(opts Configurator, retentionSize int) error

SetRetentionSize sets the retention size in the provided options.

func SetWindowSize added in v0.6.7

func SetWindowSize(opts Configurator, windowSize int) error

SetWindowSize sets the window size in the provided options.

func WindowSize added in v0.6.7

func WindowSize(opts Configurator) int

WindowSize returns the window size from the provided options.

Types

type Block

type Block struct {
	Sequences []Seq
	Literals  []byte
}

Block stores sequences and literals. Note that the sequences stored in the Sequences slice might not consume the entire Literals slice. The remaining literal bytes must be added to the decoded text after all sequences have been decoded.
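The handling of trailing literals can be illustrated with a minimal, self-contained sketch. The Seq and Block types below are local copies of the documented field layout, and decodeBlock is a hypothetical helper that is not part of this package; it only shows the order in which literals and matches would be applied.

```go
package main

import (
	"errors"
	"fmt"
)

// Seq and Block mirror the documented field layout. They are local copies
// so that the sketch compiles without importing the module.
type Seq struct {
	LitLen, MatchLen, Offset, Aux uint32
}

type Block struct {
	Sequences []Seq
	Literals  []byte
}

// decodeBlock is a hypothetical helper. It appends the text described by
// blk to dst and returns the extended slice.
func decodeBlock(dst []byte, blk Block) ([]byte, error) {
	lits := blk.Literals
	for _, s := range blk.Sequences {
		if uint64(s.LitLen) > uint64(len(lits)) {
			return dst, errors.New("literal length exceeds remaining literals")
		}
		// Literal copy command: take the next LitLen literal bytes.
		dst = append(dst, lits[:s.LitLen]...)
		lits = lits[s.LitLen:]
		if s.MatchLen == 0 {
			continue
		}
		if s.Offset == 0 || uint64(s.Offset) > uint64(len(dst)) {
			return dst, errors.New("match offset outside of decoded data")
		}
		// Match copy command: copy byte by byte because the match may
		// overlap the bytes it is producing.
		start := len(dst) - int(s.Offset)
		for i := 0; i < int(s.MatchLen); i++ {
			dst = append(dst, dst[start+i])
		}
	}
	// Literals not consumed by any sequence follow the last match.
	return append(dst, lits...), nil
}

func main() {
	blk := Block{
		Sequences: []Seq{{LitLen: 3, MatchLen: 6, Offset: 3}},
		Literals:  []byte("abcd"),
	}
	out, err := decodeBlock(nil, blk)
	fmt.Printf("%q %v\n", out, err)
}
```

In this example the single sequence consumes three of the four literal bytes, so the last byte is appended after the match.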

func (*Block) Len

func (b *Block) Len() int64

Len computes the length of the block in bytes. It assumes that the sum of the literal lengths in the sequences does not exceed the length of the Literals byte slice.

func (*Block) LenCheck added in v0.6.5

func (b *Block) LenCheck() (n int64, err error)

LenCheck computes the length of the block in bytes and verifies that the sum of the literal lengths in the sequences does not exceed the number of bytes in the Literals field. If it does, an error is returned.
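A checked length computation might look like the following sketch. It assumes that the block length is the number of decoded bytes, that is, all literal bytes plus all match lengths; blockLenCheck and the local types are hypothetical stand-ins, not the package's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the documented types, so the sketch is self-contained.
type Seq struct {
	LitLen, MatchLen, Offset, Aux uint32
}

type Block struct {
	Sequences []Seq
	Literals  []byte
}

// blockLenCheck is a hypothetical helper. It assumes the block length is
// len(Literals) plus the sum of all match lengths, and it rejects blocks
// whose sequences ask for more literals than the block holds.
func blockLenCheck(b *Block) (int64, error) {
	var litSum, matchSum int64
	for _, s := range b.Sequences {
		litSum += int64(s.LitLen)
		matchSum += int64(s.MatchLen)
	}
	if litSum > int64(len(b.Literals)) {
		return 0, errors.New("sequences need more literals than the block holds")
	}
	return int64(len(b.Literals)) + matchSum, nil
}

func main() {
	b := &Block{
		Sequences: []Seq{
			{LitLen: 2, MatchLen: 4, Offset: 1},
			{LitLen: 1, MatchLen: 3, Offset: 2},
		},
		Literals: []byte("abcde"),
	}
	n, err := blockLenCheck(b)
	fmt.Println(n, err) // 5 literal bytes + 7 match bytes
}
```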

type Buffer

type Buffer struct {
	Data []byte
	// Window end index
	W int
	// maximum buffer size
	Size int
	// RetentionSize is the number of bytes to keep when pruning the buffer.
	RetentionSize int
	// offset of Data
	Off int64

	// Shift will be called when the buffer is pruned to inform other
	// components about the number of bytes removed from the start of
	// the buffer.
	Shift ShiftFunc
}

Buffer is the buffer used for LZ parsing.

The Off field describes the offset of Data[0] in the original stream. W points to the end of the sliding window used for copying matches.

Data is not fully allocated at the beginning. It grows with usage. There must always be 7 extra bytes allocated at the end of Data to allow easy reads of data from the buffer.

func (*Buffer) ByteAt added in v0.5.0

func (b *Buffer) ByteAt(off int64) (c byte, err error)

ByteAt returns the byte at offset off. If off is outside of the buffer, ErrOutOfBuffer is returned.
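The relation between a stream offset and an index into Data can be sketched as follows. The window type, its byteAt method and errOutOfBuffer are hypothetical names for this illustration; the only assumption taken from the text is that Off is the stream offset of Data[0].

```go
package main

import (
	"errors"
	"fmt"
)

// errOutOfBuffer stands in for the package's ErrOutOfBuffer.
var errOutOfBuffer = errors.New("offset outside of buffer")

// window is a hypothetical, reduced view of a buffer: Data holds the
// buffered bytes and Off is the stream offset of Data[0].
type window struct {
	Data []byte
	Off  int64
}

// byteAt translates the stream offset off into an index into Data.
func (w *window) byteAt(off int64) (byte, error) {
	i := off - w.Off
	if i < 0 || i >= int64(len(w.Data)) {
		return 0, errOutOfBuffer
	}
	return w.Data[i], nil
}

func main() {
	w := &window{Data: []byte("hello"), Off: 100}
	c, err := w.byteAt(101)
	fmt.Printf("%q %v\n", c, err)
	_, err = w.byteAt(99)
	fmt.Println(err)
}
```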

func (*Buffer) Init

func (b *Buffer) Init(size, retentionSize int, shift ShiftFunc) error

Init initializes the buffer. The old data slice is reused and the capacity might be larger than the new buffer size.

func (*Buffer) ReadAt added in v0.5.0

func (b *Buffer) ReadAt(p []byte, off int64) (n int, err error)

ReadAt reads len(p) bytes from the buffer starting at byte offset off. It returns the number of bytes read and any error encountered. If off is outside of the buffer, ErrOutOfBuffer is returned.

func (*Buffer) ReadFrom added in v0.5.0

func (b *Buffer) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom reads data from r until EOF or error. It returns the number of bytes read and any error encountered.

func (*Buffer) Reset

func (b *Buffer) Reset(data []byte) error

Reset resets the buffer with the provided data slice. If the data slice is larger than the buffer size, the buffer size will be updated. Note that the data slice should have 7 extra bytes, len(data)+7 <= cap(data). Otherwise the old slice will be used or a new one needs to be allocated.

func (*Buffer) Write

func (b *Buffer) Write(p []byte) (n int, err error)

Write writes data to the buffer. If not all data can be written, ErrFullBuffer is returned.

type Configurator

type Configurator interface {
	NewParser() (Parser, error)
}

Configurator creates a parser. Usually an Options type implements the interface.

func UnmarshalJSONOptions added in v0.6.7

func UnmarshalJSONOptions(data []byte) (Configurator, error)

UnmarshalJSONOptions unmarshals parser options from JSON data.

type Decoder

type Decoder struct {
	// Data is the actual buffer. The end of the slice is also the head of
	// the dictionary window.
	Data []byte
	// R tracks the position of the reads from the buffer and must be less
	// or equal to the length of the Data slice.
	R int
	// Off records the total offset and marks the end of the Data slice,
	// which is also the end of the dictionary window.
	Off int64

	// DecoderOptions provides the configuration parameters WindowSize and
	// BufferSize.
	DecoderOptions
}

Decoder provides a simple buffer for decoding LZ77 sequences. Data is the actual buffer. The end of the slice is also the head of the dictionary window. R tracks the read position in the buffer and must be less than or equal to the length of the Data slice. Off records the total offset and marks the end of the Data slice, which is also the end of the dictionary window. DecoderOptions provides the configuration parameters WindowSize and BufferSize.

func (*Decoder) ByteAtEnd added in v0.6.0

func (d *Decoder) ByteAtEnd(off int) byte

ByteAtEnd returns the byte at the end of the buffer.

func (*Decoder) Init

func (d *Decoder) Init(opts DecoderOptions) error

Init initializes the Decoder.

func (*Decoder) Read added in v0.6.0

func (d *Decoder) Read(p []byte) (n int, err error)

Read reads decoded data from the buffer.

func (*Decoder) Reset

func (d *Decoder) Reset()

Reset returns the Decoder to its initialized state.

func (*Decoder) Write

func (d *Decoder) Write(p []byte) (n int, err error)

Write inserts the slice into the buffer. The method will write the entire slice or return 0 and ErrFullBuffer.
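The all-or-nothing behavior can be sketched with a reduced model. The sink type and errFullBuffer are hypothetical; the sketch ignores that a real decoder may first discard data that has already been read, and only shows that a write is either complete or rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// errFullBuffer stands in for the package's ErrFullBuffer.
var errFullBuffer = errors.New("full buffer")

// sink is a hypothetical, reduced decoder buffer with a size limit.
type sink struct {
	Data       []byte
	BufferSize int
}

// Write appends p completely or leaves the buffer untouched.
func (s *sink) Write(p []byte) (int, error) {
	if len(p) > s.BufferSize-len(s.Data) {
		return 0, errFullBuffer
	}
	s.Data = append(s.Data, p...)
	return len(p), nil
}

func main() {
	s := &sink{BufferSize: 8}
	fmt.Println(s.Write([]byte("hello")))
	fmt.Println(s.Write([]byte("world"))) // does not fit, nothing is written
	fmt.Printf("%q\n", s.Data)
}
```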

func (*Decoder) WriteBlock

func (d *Decoder) WriteBlock(blk *Block) (n int, err error)

WriteBlock writes sequences from the block into the buffer. Each sequence is written atomically, as the block value is not modified. If there is not enough space in the buffer, ErrFullBuffer will be returned. All written sequences and literals will be removed from the block.

The capacity of the block slices will not be maintained. You have to keep a copy of the block to achieve that.

The growth of the array is limited to BufferSize.

The function returns the number of bytes written.

func (*Decoder) WriteByte added in v0.2.1

func (d *Decoder) WriteByte(c byte) error

WriteByte writes a single byte into the buffer.

func (*Decoder) WriteMatch

func (d *Decoder) WriteMatch(mu, ou uint32) (n int, err error)

WriteMatch appends the match to the end of the buffer. The match will be written completely, or n=0 and ErrFullBuffer will be returned.

func (*Decoder) WriteTo added in v0.6.0

func (d *Decoder) WriteTo(w io.Writer) (n int64, err error)

WriteTo writes the decoded data to the writer.

type DecoderOptions added in v0.6.0

type DecoderOptions struct {
	// Size of the sliding dictionary window in bytes.
	WindowSize int
	// Maximum size of the buffer in bytes.
	BufferSize int
}

DecoderOptions contains the parameters for the DecoderBuffer and decoder types. WindowSize must be smaller than BufferSize. It is recommended to set BufferSize to twice the WindowSize.
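The size constraint and the recommendation can be expressed as a small sketch. The struct is a local copy of the documented fields; checkOptions and recommended are hypothetical helpers, and the requirement that WindowSize is positive is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// DecoderOptions is a local copy of the documented fields.
type DecoderOptions struct {
	WindowSize int
	BufferSize int
}

// checkOptions is a hypothetical validation helper.
func checkOptions(o DecoderOptions) error {
	if o.WindowSize <= 0 {
		// Assumption of this sketch: a window must hold at least one byte.
		return errors.New("WindowSize must be positive")
	}
	if o.WindowSize >= o.BufferSize {
		return errors.New("WindowSize must be smaller than BufferSize")
	}
	return nil
}

// recommended follows the advice to use twice the window size as buffer size.
func recommended(windowSize int) DecoderOptions {
	return DecoderOptions{WindowSize: windowSize, BufferSize: 2 * windowSize}
}

func main() {
	opts := recommended(1 << 20)
	fmt.Println(opts.BufferSize, checkOptions(opts))
	fmt.Println(checkOptions(DecoderOptions{WindowSize: 8, BufferSize: 8}))
}
```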

func (*DecoderOptions) NewDecoder added in v0.6.6

func (opts *DecoderOptions) NewDecoder() (*Decoder, error)

type Entry added in v0.6.3

type Entry struct {
	// contains filtered or unexported fields
}

Entry is returned by a Mapper for a found match.

type GenericMatcherOptions added in v0.6.6

type GenericMatcherOptions struct {
	BufferSize    int
	WindowSize    int
	RetentionSize int
	MinMatchLen   int
	MaxMatchLen   int

	MapperOptions MapperConfigurator
}

GenericMatcherOptions provides the options for a generic matcher.

func (*GenericMatcherOptions) MarshalJSON added in v0.6.6

func (opts *GenericMatcherOptions) MarshalJSON() ([]byte, error)

MarshalJSON marshals the matcher options into JSON and adds the MatcherType field.

func (*GenericMatcherOptions) NewMatcher added in v0.6.6

func (opts *GenericMatcherOptions) NewMatcher() (Matcher, error)

NewMatcher creates a new generic matcher using the generic matcher options.

func (*GenericMatcherOptions) UnmarshalJSON added in v0.6.6

func (opts *GenericMatcherOptions) UnmarshalJSON(data []byte) error

type GreedyParserOptions added in v0.6.0

type GreedyParserOptions struct {
	MatcherOptions MatcherConfigurator
}

GreedyParserOptions defines the configuration options for a greedy parser.

func (*GreedyParserOptions) MarshalJSON added in v0.6.7

func (gpo *GreedyParserOptions) MarshalJSON() ([]byte, error)

MarshalJSON provides a custom JSON marshaller that adds a type field to the JSON structure.

func (*GreedyParserOptions) NewParser added in v0.6.2

func (gpo *GreedyParserOptions) NewParser() (Parser, error)

NewParser creates a new greedy parser using the greedy parser options.

func (*GreedyParserOptions) UnmarshalJSON added in v0.6.7

func (gpo *GreedyParserOptions) UnmarshalJSON(data []byte) error

UnmarshalJSON provides a custom JSON unmarshaler that parses the type field from the JSON structure.

type HashOptions added in v0.6.0

type HashOptions struct {
	InputLen int
	HashBits int
}

HashOptions provides the parameters for the Hash Mapper.
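How the two parameters could interact is sketched below. This is an assumption about a typical hash mapper, not the package's hash function: hashValue is a hypothetical helper that hashes the low InputLen bytes of a value into a table index of HashBits bits using a multiplicative hash.

```go
package main

import "fmt"

// hashValue is a hypothetical hash function. It keeps only the low inputLen
// bytes of v (1 to 8) and maps them to an index with hashBits bits (1 to 32).
func hashValue(v uint64, inputLen, hashBits int) uint32 {
	// An odd 64-bit multiplier; the specific constant is a choice made for
	// this sketch.
	const prime uint64 = 0x9E3779B185EBCA87
	// Shift the unused high bytes out so they cannot influence the hash.
	v <<= uint(64 - 8*inputLen)
	// The top hashBits bits of the product form the table index.
	return uint32((v * prime) >> uint(64-hashBits))
}

func main() {
	a := hashValue(0x1122334455667788, 4, 16)
	b := hashValue(0xAABBCCDD55667788, 4, 16)
	// Both inputs share their low 4 bytes, so they map to the same index.
	fmt.Println(a == b, a < 1<<16)
}
```

A larger HashBits value gives a larger table with fewer collisions, while a larger InputLen makes candidates more selective but misses shorter matches.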

func (*HashOptions) GetInputLen added in v0.6.6

func (hopts *HashOptions) GetInputLen() int

GetInputLen returns the input length.

func (*HashOptions) MarshalJSON added in v0.6.6

func (hopts *HashOptions) MarshalJSON() ([]byte, error)

MarshalJSON generates the JSON representation of HashOptions by adding the Mapper field and setting it to "hash".

func (*HashOptions) NewMapper added in v0.6.6

func (hopts *HashOptions) NewMapper() (Mapper, error)

NewMapper creates the hash mapper.

type Mapper added in v0.6.3

type Mapper interface {
	InputLen() int
	Reset()
	// Shift is called with the number of bytes pruned from the buffer.
	Shift(delta int)
	Put(a, w int, p []byte) int

	// Get returns all candidate entries for the provided hash value. The
	// entry value v contains all 4 bytes stored at position i.
	Get(v uint64) []Entry
}

Mapper will typically be implemented by hash tables.

The Put method returns the number of trailing bytes that could not be hashed. Shift is called when delta bytes have been pruned from the buffer.

type MapperConfigurator added in v0.6.6

type MapperConfigurator interface {
	NewMapper() (Mapper, error)
}

MapperConfigurator creates a mapper. Usually an Options type implements this interface.

func UnmarshalJSONMapperOptions added in v0.6.6

func UnmarshalJSONMapperOptions(data []byte) (MapperConfigurator, error)

UnmarshalJSONMapperOptions unmarshals mapper options from JSON data. The function looks first for the MapperType field to determine the type of mapper to create.
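The two-step approach of reading a type field first and decoding the concrete options afterwards can be sketched with encoding/json. The JSON layout, the hashOptions type and unmarshalMapper are assumptions made for this illustration; only the MapperType field name and the "hash" value are taken from the documentation above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hashOptions is a hypothetical options type for this sketch.
type hashOptions struct {
	InputLen int
	HashBits int
}

// unmarshalMapper is a hypothetical helper. It probes the MapperType field
// and then decodes the same data into the matching options type.
func unmarshalMapper(data []byte) (interface{}, error) {
	var probe struct{ MapperType string }
	if err := json.Unmarshal(data, &probe); err != nil {
		return nil, err
	}
	switch probe.MapperType {
	case "hash":
		var o hashOptions
		if err := json.Unmarshal(data, &o); err != nil {
			return nil, err
		}
		return &o, nil
	default:
		return nil, fmt.Errorf("unknown mapper type %q", probe.MapperType)
	}
}

func main() {
	data := []byte(`{"MapperType":"hash","InputLen":4,"HashBits":16}`)
	c, err := unmarshalMapper(data)
	fmt.Printf("%+v %v\n", c, err)
}
```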

type Matcher added in v0.6.0

type Matcher interface {
	Edges(n int) []Seq
	Skip(n int) (skipped int, err error)

	Write(p []byte) (n int, err error)
	ReadFrom(r io.Reader) (n int64, err error)

	ReadAt(p []byte, off int64) (n int, err error)
	ByteAt(off int64) (c byte, err error)

	Reset(data []byte) error
	Buf() *Buffer

	Options() MatcherConfigurator
}

Matcher is responsible for finding matches or literal bytes in the byte stream.

type MatcherConfigurator added in v0.6.6

type MatcherConfigurator interface {
	NewMatcher() (Matcher, error)
}

MatcherConfigurator creates a matcher. Usually an Options type implements the interface.

func UnmarshalJSONMatcherOptions added in v0.6.6

func UnmarshalJSONMatcherOptions(data []byte) (MatcherConfigurator, error)

UnmarshalJSONMatcherOptions unmarshals matcher options from JSON data. The function looks first for the MatcherType field to determine the type of matcher to create.

type Parser added in v0.3.0

type Parser interface {
	// Parse parses up to block size bytes from the internal buffer and
	// provides the sequences in the block structure. While slices will be
	// reused, no old information will be maintained.
	Parse(blk *Block, n int, flags ParserFlags) (parsed int, err error)

	// Write writes data into the internal buffer.
	Write(p []byte) (n int, err error)

	// ReadFrom reads data from the provided reader into the internal
	// buffer.
	ReadFrom(r io.Reader) (n int64, err error)

	// ReadAt reads len(p) bytes from the internal buffer at offset off.
	ReadAt(p []byte, off int64) (n int, err error)

	// ByteAt returns the byte at offset off in the internal buffer.
	ByteAt(off int64) (c byte, err error)

	// Reset resets the internal buffer to the provided data.
	Reset(data []byte) error

	// Buf returns the internal buffer used by the parser.
	Buf() *Buffer

	// Options returns the options used to create the parser.
	Options() Configurator
}

Parser parses a byte stream into LZ77 sequences.

type ParserFlags added in v0.6.0

type ParserFlags int

ParserFlags defines optional parser behavior.

const (
	// NoTrailingLiterals indicates that the parser should not generate
	// trailing literal bytes in the output.
	NoTrailingLiterals ParserFlags = 1 << iota
)
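Because the flags are declared with 1 << iota, they can be combined and tested as a bit set. The sketch below redeclares the type locally so that it compiles on its own; the has helper is hypothetical.

```go
package main

import "fmt"

// ParserFlags and NoTrailingLiterals are local copies of the declarations
// shown above.
type ParserFlags int

const (
	NoTrailingLiterals ParserFlags = 1 << iota
)

// has reports whether all bits of f are set in flags.
func has(flags, f ParserFlags) bool {
	return flags&f == f
}

func main() {
	var flags ParserFlags
	fmt.Println(has(flags, NoTrailingLiterals))
	flags |= NoTrailingLiterals
	fmt.Println(has(flags, NoTrailingLiterals))
}
```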

type Seq

type Seq struct {
	LitLen   uint32
	MatchLen uint32
	Offset   uint32
	Aux      uint32
}

Seq represents a single Lempel-Ziv 77 sequence describing a match, consisting of the offset, the length of the match, and the number of literals preceding the match. The Aux field can be used in upper layers to store additional information.

func (Seq) Len

func (s Seq) Len() int64

Len returns the complete length of the sequence in bytes.

type ShiftFunc added in v0.6.9

type ShiftFunc func(delta int)

ShiftFunc defines the type of the shift function, called when the buffer is pruned to provide more available space.
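The interplay between pruning and the shift callback can be sketched as follows. The buffer type is a reduced, hypothetical model of the documented fields, and the pruning policy (keep RetentionSize bytes in front of W) is an assumption of this sketch rather than the package's exact behavior.

```go
package main

import "fmt"

// ShiftFunc is a local copy of the documented function type.
type ShiftFunc func(delta int)

// buffer is a hypothetical, reduced model of the documented Buffer fields.
type buffer struct {
	Data          []byte
	W             int
	RetentionSize int
	Off           int64
	Shift         ShiftFunc
}

// prune removes bytes from the start of Data, keeping RetentionSize bytes
// in front of W, and reports the number of removed bytes through Shift.
func (b *buffer) prune() int {
	delta := b.W - b.RetentionSize
	if delta <= 0 {
		return 0
	}
	n := copy(b.Data, b.Data[delta:])
	b.Data = b.Data[:n]
	b.W -= delta
	b.Off += int64(delta)
	if b.Shift != nil {
		// Components that store buffer positions, such as a hash table,
		// can adjust their entries by delta.
		b.Shift(delta)
	}
	return delta
}

func main() {
	b := &buffer{
		Data:          []byte("0123456789"),
		W:             8,
		RetentionSize: 3,
		Shift:         func(delta int) { fmt.Println("shifted by", delta) },
	}
	b.prune()
	fmt.Printf("%q W=%d Off=%d\n", b.Data, b.W, b.Off)
}
```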

Directories

Path Synopsis
Package lz supports encoding and decoding of LZ77 sequences.
Package suffix provides a suffix sort algorithm.
