lz

Version: v0.6.1 (latest) | Published: Oct 15, 2025 | License: BSD-3-Clause | Imports: 5 | Imported by: 2

Repository

github.com/ulikunitz/lz

README ¶

Module LZ

The LZ module provides parsers that convert byte streams into blocks of Lempel-Ziv 77 (LZ77) sequences. It is designed to support multiple compression methods that differ in how they encode those LZ77 sequences.

Documentation ¶

Overview ¶

Package lz supports encoding and decoding of LZ77 sequences. A sequence, as described in the Zstandard specification, consists of a literal copy command followed by a match copy command. The literal copy command is described by the length in literal bytes to be copied, and the match command consists of the distance of the match to copy and the length of the match in bytes.
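To make the sequence model concrete, the following sketch decodes a list of sequences by hand. The seq type and the decode function are local stand-ins written for this illustration and are not the package API. The sketch assumes that the offset counts bytes backwards from the current end of the output.

```go
package main

import "fmt"

// seq is a local stand-in for an LZ77 sequence (not the package's Seq type).
type seq struct {
	LitLen   uint32 // number of literal bytes to copy first
	MatchLen uint32 // number of bytes to copy from the earlier output
	Offset   uint32 // assumed: distance back from the current end of the output
}

// decode applies each sequence in order: first the literal copy, then the
// match copy. Literals left over after the last sequence are appended.
func decode(seqs []seq, literals []byte) []byte {
	var out []byte
	for _, s := range seqs {
		out = append(out, literals[:s.LitLen]...)
		literals = literals[s.LitLen:]
		// Copy byte by byte so that a match may overlap its own output.
		start := len(out) - int(s.Offset)
		for i := 0; i < int(s.MatchLen); i++ {
			out = append(out, out[start+i])
		}
	}
	return append(out, literals...)
}

func main() {
	seqs := []seq{{LitLen: 3, MatchLen: 6, Offset: 3}}
	fmt.Printf("%s\n", decode(seqs, []byte("abc!"))) // prints "abcabcabc!"
}
```

The single sequence copies the three literals "abc" and then a six-byte match at distance 3, which overlaps its own output. The trailing "!" is not covered by any sequence and is appended at the end.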

A Parser converts a byte stream into blocks of sequences. The Decoder converts the block of sequences into the original decompressed byte stream.

The module provides multiple parser implementations that offer different combinations of encoding speed and compression ratios. Usually, a slower parser will generate a better compression ratio.

Parsers may use different matchers to provide their functionality. One example is GreedyParser, which can use multiple Matcher implementations.

The library supports the implementation of parsers outside of this package.

Index ¶

Variables
type Block
- func (b *Block) Len() int64
type Buffer
- func (b *Buffer) ByteAt(off int64) (c byte, err error)
- func (b *Buffer) Init(size int) error
- func (b *Buffer) Prune(n int) int
- func (b *Buffer) ReadAt(p []byte, off int64) (n int, err error)
- func (b *Buffer) ReadFrom(r io.Reader) (n int64, err error)
- func (b *Buffer) Reset(data []byte) error
- func (b *Buffer) Write(p []byte) (n int, err error)
type Decoder
- func NewDecoder(opts *DecoderOptions) (b *Decoder, err error)
- func (b *Decoder) ByteAtEnd(off int) byte
- func (b *Decoder) Init(opts DecoderOptions) error
- func (b *Decoder) Read(p []byte) (n int, err error)
- func (b *Decoder) Reset()
- func (b *Decoder) Write(p []byte) (n int, err error)
- func (b *Decoder) WriteBlock(blk *Block) (n int, err error)
- func (b *Decoder) WriteByte(c byte) error
- func (b *Decoder) WriteMatch(mu, ou uint32) (n int, err error)
- func (b *Decoder) WriteTo(w io.Writer) (n int64, err error)
type DecoderOptions
- func (cfg *DecoderOptions) SetDefaults()
- func (cfg *DecoderOptions) Verify() error
type GreedyParser
- func NewGreedyParser(m Matcher, opts *GreedyParserOptions) (p *GreedyParser, err error)
- func (p *GreedyParser) Parse(blk *Block, n int, flags ParserFlags) (parsed int, err error)
type GreedyParserOptions
- func (opts *GreedyParserOptions) SetDefaults()
- func (opts *GreedyParserOptions) Verify() error
type HashMatcher
- func NewHashMatcher(opts *HashOptions) (m *HashMatcher, err error)
- func (m *HashMatcher) AppendEdges(q []Seq, n int) []Seq
- func (m *HashMatcher) Buf() *Buffer
- func (m *HashMatcher) Prune(n int) int
- func (m *HashMatcher) Reset(data []byte) error
- func (m *HashMatcher) Skip(n int) (skipped int, err error)
type HashOptions
- func (opt *HashOptions) SetDefaults()
- func (opt *HashOptions) Verify() error
type Matcher
type Parser
type ParserFlags
type Seq
- func (s Seq) Len() int64

Constants ¶

This section is empty.

Variables ¶

var ErrEndOfBuffer = errors.New("lz: end of buffer")

ErrEndOfBuffer is returned at the end of the buffer.

var ErrFullBuffer = errors.New("lz: full buffer")

ErrFullBuffer is returned when the buffer is full and no more data can be written to it.

var ErrOutOfBuffer = errors.New("lz: offset outside of buffer")

ErrOutOfBuffer is returned when the offset is outside of the buffer.

var ErrStartOfBuffer = errors.New("lz: start of buffer")

ErrStartOfBuffer is returned at the start of the buffer.

Functions ¶

This section is empty.

Types ¶

type Block ¶

type Block struct {
	Sequences []Seq
	Literals  []byte
}

Block stores sequences and literals. Note that the sequences stored in the Sequences slice might not consume the entire Literals slice. The remaining literal bytes must be added to the decoded text after all sequences have been decoded.

func (*Block) Len ¶

func (b *Block) Len() int64

Len computes the length of the block in bytes. It assumes that the sum of the literal lengths in the sequences does not exceed the length of the Literals byte slice.
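One plausible reading of this definition, shown here only as a sketch, is that the block length is the number of bytes the block decodes to: all bytes of Literals plus the match lengths of all sequences. The block and seq types below are local copies of the documented fields, and decodedLen is a hypothetical helper, not the package's Len method.

```go
package main

import "fmt"

// seq and block mirror the documented fields for illustration only.
type seq struct{ LitLen, MatchLen, Offset uint32 }

type block struct {
	Sequences []seq
	Literals  []byte
}

// decodedLen counts every literal byte once (including literals that trail
// the last sequence) and adds the match length of each sequence. This is only
// meaningful if the LitLen values do not sum to more than len(b.Literals).
func decodedLen(b *block) int64 {
	n := int64(len(b.Literals))
	for _, s := range b.Sequences {
		n += int64(s.MatchLen)
	}
	return n
}

func main() {
	b := &block{
		Sequences: []seq{{LitLen: 3, MatchLen: 6, Offset: 3}},
		Literals:  []byte("abc!"),
	}
	fmt.Println(decodedLen(b)) // 4 literal bytes + 6 match bytes = 10
}
```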

type Buffer ¶

type Buffer struct {
	Data []byte
	// Window end index
	W int
	// maximum buffer size
	Size int
	// offset of Data
	Off int64
}

Buffer is the buffer used for LZ parsing.

The Off field describes the offset of Data[0] in the original stream. W points to the end of the sliding window used for copying matches.

Data is not fully allocated at the beginning; it grows with usage. There must always be 7 extra bytes allocated at the end of Data to allow easy reads of data from the buffer.
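The documentation does not say why exactly 7 spare bytes are required. A plausible reason, assumed in the sketch below, is that it lets the matcher load 8 bytes at once from any valid index without a bounds problem. The load64 helper is hypothetical and not part of the package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// load64 reads 8 bytes starting at index i. It reslices up to i+8, which is
// only valid because the caller guarantees 7 spare bytes of capacity beyond
// the length of data.
func load64(data []byte, i int) uint64 {
	return binary.LittleEndian.Uint64(data[i : i+8])
}

func main() {
	payload := []byte("hello")
	// Allocate the payload with 7 bytes of extra, zeroed capacity.
	data := make([]byte, len(payload), len(payload)+7)
	copy(data, payload)

	// Reading at the last valid index still stays within the capacity.
	v := load64(data, len(data)-1)
	fmt.Printf("%#x\n", v&0xff) // low byte is 'o', printed as 0x6f
}
```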

func (*Buffer) ByteAt ¶ added in v0.5.0

func (b *Buffer) ByteAt(off int64) (c byte, err error)

ByteAt returns the byte at offset off. If off is outside of the buffer, ErrOutOfBuffer is returned.

func (*Buffer) Init ¶

func (b *Buffer) Init(size int) error

Init initializes the buffer. The old data slice is reused and the capacity might be larger than the new buffer size.

func (*Buffer) Prune ¶ added in v0.6.0

func (b *Buffer) Prune(n int) int

Prune cuts the first n bytes from the buffer. If n is larger than the window index W it will be set to W. The number of bytes actually pruned is returned.
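The clamping behavior can be sketched with a small local type. The buffer type below only mirrors the documented fields Data, W and Off. How the real implementation moves data is not described in the source, so the copy-down step and the updates to W and Off are assumptions derived from the field descriptions.

```go
package main

import "fmt"

// buffer mirrors the documented fields for illustration only.
type buffer struct {
	Data []byte
	W    int   // window end index into Data
	Off  int64 // offset of Data[0] in the original stream
}

// prune drops up to n bytes from the front of the buffer. n is clamped to W
// and the number of bytes actually removed is returned.
func (b *buffer) prune(n int) int {
	if n > b.W {
		n = b.W
	}
	if n <= 0 {
		return 0
	}
	k := copy(b.Data, b.Data[n:]) // move the remaining bytes to the front
	b.Data = b.Data[:k]
	b.W -= n          // assumed: W is an index into Data and moves with it
	b.Off += int64(n) // Data[0] now sits n bytes later in the stream
	return n
}

func main() {
	b := &buffer{Data: []byte("abcdefgh"), W: 5}
	pruned := b.prune(10)
	fmt.Println(pruned, string(b.Data), b.W, b.Off) // prints "5 fgh 0 5"
}
```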

func (*Buffer) ReadAt ¶ added in v0.5.0

func (b *Buffer) ReadAt(p []byte, off int64) (n int, err error)

ReadAt reads len(p) bytes from the buffer starting at byte offset off. It returns the number of bytes read and any error encountered. If off is outside of the buffer, ErrOutOfBuffer is returned.

func (*Buffer) ReadFrom ¶ added in v0.5.0

func (b *Buffer) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom reads data from r until EOF or error. It returns the number of bytes read and any error encountered.

func (*Buffer) Reset ¶

func (b *Buffer) Reset(data []byte) error

Reset resets the buffer with the provided data slice. If the data slice is larger than the buffer size, the buffer size will be updated. Note that the data slice should have 7 extra bytes of capacity, len(data)+7 <= cap(data). Otherwise the old slice will be used or a new one needs to be allocated.

func (*Buffer) Write ¶

func (b *Buffer) Write(p []byte) (n int, err error)

Write writes data to the buffer. If not all data can be written, ErrFullBuffer is returned.

type Decoder ¶

type Decoder struct {
	// Data is the actual buffer. The end of the slice is also the head of
	// the dictionary window.
	Data []byte
	// R tracks the position of the reads from the buffer and must be less
	// or equal to the length of the Data slice.
	R int
	// Off records the total offset and marks the end of the Data slice,
	// which is also the end of the dictionary window.
	Off int64

	// DecoderOptions provides the configuration parameters WindowSize and
	// BufferSize.
	DecoderOptions
}

Decoder provides a simple buffer for decoding LZ77 sequences. Data is the actual buffer. The end of the slice is also the head of the dictionary window. R tracks the read position in the buffer and must be less than or equal to the length of the Data slice. Off records the total offset and marks the end of the Data slice, which is also the end of the dictionary window. The embedded DecoderOptions provides the configuration parameters WindowSize and BufferSize.

func NewDecoder ¶

func NewDecoder(opts *DecoderOptions) (b *Decoder, err error)

NewDecoder creates and initializes a new Decoder.

func (*Decoder) ByteAtEnd ¶ added in v0.6.0

func (b *Decoder) ByteAtEnd(off int) byte

ByteAtEnd returns the byte at the end of the buffer.

func (*Decoder) Init ¶

func (b *Decoder) Init(opts DecoderOptions) error

Init initializes the Decoder.

func (*Decoder) Read ¶ added in v0.6.0

func (b *Decoder) Read(p []byte) (n int, err error)

Read reads decoded data from the buffer.

func (*Decoder) Reset ¶

func (b *Decoder) Reset()

Reset returns the Decoder to its initialized state.

func (*Decoder) Write ¶

func (b *Decoder) Write(p []byte) (n int, err error)

Write inserts the slice into the buffer. The method will write the entire slice or return 0 and ErrFullBuffer.
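The all-or-nothing contract can be illustrated with a small bounded buffer. The boundedBuffer type and the errFullBuffer value below are local stand-ins for this sketch. They are not the package's Decoder or ErrFullBuffer, and the real Decoder may free space in other ways before reporting a full buffer.

```go
package main

import (
	"errors"
	"fmt"
)

// errFullBuffer is a local sentinel error used only in this sketch.
var errFullBuffer = errors.New("sketch: full buffer")

// boundedBuffer holds at most max bytes.
type boundedBuffer struct {
	data []byte
	max  int
}

// Write appends p completely, or writes nothing and returns errFullBuffer.
func (b *boundedBuffer) Write(p []byte) (int, error) {
	if len(p) > b.max-len(b.data) {
		return 0, errFullBuffer
	}
	b.data = append(b.data, p...)
	return len(p), nil
}

func main() {
	b := &boundedBuffer{max: 8}
	n, err := b.Write([]byte("hello"))
	fmt.Println(n, err) // prints "5 <nil>"

	// The second write does not fit, so the buffer stays unchanged.
	n, err = b.Write([]byte("world"))
	fmt.Println(n, errors.Is(err, errFullBuffer), string(b.data)) // prints "0 true hello"
}
```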

func (*Decoder) WriteBlock ¶

func (b *Decoder) WriteBlock(blk *Block) (n int, err error)

WriteBlock writes sequences from the block into the buffer. Each sequence is written atomically, as the block value is not modified. If there is not enough space in the buffer, ErrFullBuffer will be returned. All written sequences and literals will be removed from the block.

The capacity of the block slices will not be maintained. You have to keep a copy of the block to achieve that.

The growth of the array is limited to BufferSize.

The function returns the number of bytes written.

func (*Decoder) WriteByte ¶ added in v0.2.1

func (b *Decoder) WriteByte(c byte) error

WriteByte writes a single byte into the buffer.

func (*Decoder) WriteMatch ¶

func (b *Decoder) WriteMatch(mu, ou uint32) (n int, err error)

WriteMatch appends the match to the end of the buffer. The match will be written completely, or n=0 and ErrFullBuffer will be returned.

func (*Decoder) WriteTo ¶ added in v0.6.0

func (b *Decoder) WriteTo(w io.Writer) (n int64, err error)

WriteTo writes the decoded data to the writer.

type DecoderOptions ¶ added in v0.6.0

type DecoderOptions struct {
	// Size of the sliding dictionary window in bytes.
	WindowSize int
	// Maximum size of the buffer in bytes.
	BufferSize int
}

DecoderOptions contains the parameters for the Decoder type. WindowSize must be smaller than BufferSize. It is recommended to set BufferSize to twice the WindowSize.
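The constraint and the recommendation can be expressed as a small validation sketch. The decoderOptions type and its methods below are local illustrations. The actual default values and error messages of the package are not documented here, so deriving BufferSize as twice WindowSize is an assumption based on the recommendation above.

```go
package main

import (
	"errors"
	"fmt"
)

// decoderOptions mirrors the two documented fields for illustration only.
type decoderOptions struct {
	WindowSize int
	BufferSize int
}

// setDefaults fills in a zero BufferSize using the recommended ratio.
// The real package may choose different defaults.
func (o *decoderOptions) setDefaults() {
	if o.BufferSize == 0 {
		o.BufferSize = 2 * o.WindowSize
	}
}

// verify reports the first violated constraint.
func (o *decoderOptions) verify() error {
	if o.WindowSize <= 0 {
		return errors.New("sketch: window size must be positive")
	}
	if o.WindowSize >= o.BufferSize {
		return errors.New("sketch: window size must be smaller than buffer size")
	}
	return nil
}

func main() {
	opts := decoderOptions{WindowSize: 1 << 20}
	opts.setDefaults()
	fmt.Println(opts.BufferSize, opts.verify()) // prints "2097152 <nil>"
}
```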

func (*DecoderOptions) SetDefaults ¶ added in v0.6.0

func (cfg *DecoderOptions) SetDefaults()

SetDefaults assigns default values to zero fields in DecoderOptions.

func (*DecoderOptions) Verify ¶ added in v0.6.0

func (cfg *DecoderOptions) Verify() error

Verify checks the parameters of the DecoderOptions value and returns an error for the first issue found.

type GreedyParser ¶ added in v0.6.0

type GreedyParser struct {
	Matcher

	GreedyParserOptions
}

GreedyParser is a simple parser that always chooses the longest match.
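The greedy strategy itself can be sketched independently of the package. The code below uses a brute-force search over all earlier positions instead of a Matcher, so it is far slower than a real parser. The names seq, minMatch, longestMatch and greedyParse are hypothetical and exist only in this example.

```go
package main

import "fmt"

// seq is a local stand-in for an LZ77 sequence.
type seq struct{ LitLen, MatchLen, Offset uint32 }

const minMatch = 3 // hypothetical minimum match length for this sketch

// longestMatch searches all earlier positions for the longest match of
// data[pos:] and returns its length and distance. Matches may overlap pos.
func longestMatch(data []byte, pos int) (length, dist int) {
	for cand := 0; cand < pos; cand++ {
		n := 0
		for pos+n < len(data) && data[cand+n] == data[pos+n] {
			n++
		}
		if n > length {
			length, dist = n, pos-cand
		}
	}
	return length, dist
}

// greedyParse takes the longest match at every position where one of at
// least minMatch bytes exists, and emits everything else as literals.
func greedyParse(data []byte) (seqs []seq, literals []byte) {
	litStart := 0
	for pos := 0; pos < len(data); {
		n, d := longestMatch(data, pos)
		if n < minMatch {
			pos++ // no usable match: this byte becomes a literal
			continue
		}
		literals = append(literals, data[litStart:pos]...)
		seqs = append(seqs, seq{
			LitLen:   uint32(pos - litStart),
			MatchLen: uint32(n),
			Offset:   uint32(d),
		})
		pos += n
		litStart = pos
	}
	// Bytes after the last match are trailing literals.
	literals = append(literals, data[litStart:]...)
	return seqs, literals
}

func main() {
	seqs, literals := greedyParse([]byte("abcabcabc!"))
	fmt.Println(seqs, string(literals)) // prints "[{3 6 3}] abc!"
}
```

Greedy parsing never looks ahead: it accepts the longest match at the current position even if a shorter match there would enable a longer one at the next position.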

func NewGreedyParser ¶ added in v0.6.0

func NewGreedyParser(m Matcher, opts *GreedyParserOptions) (p *GreedyParser, err error)

NewGreedyParser creates a new GreedyParser with the given options. If opts is nil, the default options are used.

func (*GreedyParser) Parse ¶ added in v0.6.0

func (p *GreedyParser) Parse(blk *Block, n int, flags ParserFlags) (parsed int, err error)

Parse parses up to n bytes from the underlying byte stream and appends the resulting sequences and literals to blk. If blk is nil, the parser will skip n bytes in the input stream. The number of bytes parsed or skipped is returned. If no more data is available, ErrEndOfBuffer is returned.

If the NoTrailingLiterals flag is set, the parser will not include trailing literals in the block. This can be used to parse a stream in fixed size blocks without overlapping literals.
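The effect of the flag can be sketched as follows. This is one reading of the description above, not the package's implementation: it assumes that with the flag set, the bytes after the last match are neither added to the block nor counted as parsed, so the next call can start with them. The types, the lowercase flag constant and the finish helper are local to this example.

```go
package main

import "fmt"

// seq is a local stand-in for an LZ77 sequence.
type seq struct{ LitLen, MatchLen, Offset uint32 }

// parserFlags and noTrailingLiterals mirror the documented flag type.
type parserFlags int

const noTrailingLiterals parserFlags = 1 << iota

// finish decides what happens to tail, the bytes left after the last match.
// It returns the final literals of the block and the number of input bytes
// the block covers.
func finish(seqs []seq, literals, tail []byte, flags parserFlags) ([]byte, int) {
	parsed := 0
	for _, s := range seqs {
		parsed += int(s.LitLen) + int(s.MatchLen)
	}
	if flags&noTrailingLiterals != 0 {
		// Assumed behavior: leave the tail for the next block.
		return literals, parsed
	}
	return append(literals, tail...), parsed + len(tail)
}

func main() {
	seqs := []seq{{LitLen: 3, MatchLen: 6, Offset: 3}}

	lits, n := finish(seqs, []byte("abc"), []byte("!"), 0)
	fmt.Println(string(lits), n) // prints "abc! 10"

	lits, n = finish(seqs, []byte("abc"), []byte("!"), noTrailingLiterals)
	fmt.Println(string(lits), n) // prints "abc 9"
}
```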

type GreedyParserOptions ¶ added in v0.6.0

type GreedyParserOptions struct {
	BlockSize int
}

GreedyParserOptions contains the options for the GreedyParser. Right now only the default block size can be configured.

func (*GreedyParserOptions) SetDefaults ¶ added in v0.6.0

func (opts *GreedyParserOptions) SetDefaults()

SetDefaults sets the default values for the GreedyParser options.

func (*GreedyParserOptions) Verify ¶ added in v0.6.0

func (opts *GreedyParserOptions) Verify() error

Verify checks that the options are valid.

type HashMatcher ¶ added in v0.6.0

type HashMatcher struct {
	Buffer

	HashOptions
	// contains filtered or unexported fields
}

HashMatcher implements a matcher that uses a simple hash table with one entry per hash value.
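A table with one entry per hash value can be sketched as below. The hash function, the table size and the names hash4, match and findMatches are assumptions made for this illustration. The package's actual hash and its handling of InputLen and HashBits may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	hashBits = 12 // hypothetical table size of 2^12 entries
	inputLen = 4  // number of bytes hashed per position
)

// hash4 maps the 4 bytes at data[i:] to a table index using a
// multiplicative hash.
func hash4(data []byte, i int) uint32 {
	x := binary.LittleEndian.Uint32(data[i:])
	return (x * 2654435761) >> (32 - hashBits)
}

// match describes a match found at Pos that copies Len bytes from Dist back.
type match struct{ Pos, Dist, Len int }

// findMatches looks up every position in the table. Each slot remembers only
// the most recent position with that hash, so older candidates are lost.
func findMatches(data []byte) []match {
	table := make([]int32, 1<<hashBits) // stores position+1; 0 means empty
	var out []match
	for i := 0; i+inputLen <= len(data); i++ {
		h := hash4(data, i)
		cand := int(table[h]) - 1
		table[h] = int32(i + 1) // overwrite the single slot
		if cand < 0 {
			continue
		}
		// Verify the candidate, since different inputs can share a hash.
		n := 0
		for i+n < len(data) && data[cand+n] == data[i+n] {
			n++
		}
		if n >= inputLen {
			out = append(out, match{Pos: i, Dist: i - cand, Len: n})
		}
	}
	return out
}

func main() {
	matches := findMatches([]byte("aaaaaaaa"))
	fmt.Println(matches[0]) // prints "{1 1 7}"
}
```

Keeping a single entry per hash value makes lookups and updates constant time, at the cost of missing matches whose candidate position has been overwritten.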

func NewHashMatcher ¶ added in v0.6.0

func NewHashMatcher(opts *HashOptions) (m *HashMatcher, err error)

NewHashMatcher creates a new HashMatcher with the given options.

func (*HashMatcher) AppendEdges ¶ added in v0.6.0

func (m *HashMatcher) AppendEdges(q []Seq, n int) []Seq

AppendEdges appends the literal and the matches found at the current position. This function returns the literal and at most one match.

n limits the maximum length for a match and can be used to restrict the matches to the end of the block to parse.

func (*HashMatcher) Buf ¶ added in v0.6.0

func (m *HashMatcher) Buf() *Buffer

Buf returns the buffer used by the matcher.

func (*HashMatcher) Prune ¶ added in v0.6.0

func (m *HashMatcher) Prune(n int) int

Prune removes n bytes from the beginning of the buffer and updates the hash table accordingly. It returns the actual number of bytes removed which can be less than n if n is greater than the buffer that can be pruned.

func (*HashMatcher) Reset ¶ added in v0.6.0

func (m *HashMatcher) Reset(data []byte) error

Reset resets the matcher to its initial state and loads the data slice into the buffer.

func (*HashMatcher) Skip ¶ added in v0.6.0

func (m *HashMatcher) Skip(n int) (skipped int, err error)

Skip skips n bytes in the buffer and updates the hash table.

type HashOptions ¶ added in v0.6.0

type HashOptions struct {
	InputLen int
	HashBits int

	BufferSize   int
	WindowSize   int
	MinMatchSize int
	MaxMatchSize int
}

HashOptions contains the options for the HashMatcher.

func (*HashOptions) SetDefaults ¶ added in v0.6.0

func (opt *HashOptions) SetDefaults()

SetDefaults sets the default values for the hash options.

func (*HashOptions) Verify ¶ added in v0.6.0

func (opt *HashOptions) Verify() error

Verify checks that the options are valid.

type Matcher ¶ added in v0.6.0

type Matcher interface {
	AppendEdges(q []Seq, n int) []Seq
	Skip(n int) (skipped int, err error)

	Prune(n int) int
	Write(p []byte) (n int, err error)
	ReadFrom(r io.Reader) (n int64, err error)

	ReadAt(p []byte, off int64) (n int, err error)
	ByteAt(off int64) (c byte, err error)

	Reset(data []byte) error
	Buf() *Buffer
}

Matcher is responsible for finding matches or literal bytes in the byte stream.

type Parser ¶ added in v0.3.0

type Parser interface {
	Parse(blk *Block, n int, flags ParserFlags) (parsed int, err error)

	Prune(n int) int
	Write(p []byte) (n int, err error)
	ReadFrom(r io.Reader) (n int64, err error)

	ReadAt(p []byte, off int64) (n int, err error)
	ByteAt(off int64) (c byte, err error)

	Reset(data []byte) error
}

Parser can parse the underlying byte stream into blocks of sequences.

type ParserFlags ¶ added in v0.6.0

type ParserFlags int

ParserFlags define optional parser behavior.

const (
	// NoTrailingLiterals indicates that the parser should not generate
	// trailing literal bytes in the output.
	NoTrailingLiterals ParserFlags = 1 << iota
)

type Seq ¶

type Seq struct {
	LitLen   uint32
	MatchLen uint32
	Offset   uint32
	Aux      uint32
}

Seq represents a single Lempel-Ziv 77 sequence describing a match, consisting of the offset, the length of the match, and the number of literals preceding the match. The Aux field can be used in upper layers to store additional information.

func (Seq) Len ¶

func (s Seq) Len() int64

Len returns the complete length of the sequence in bytes.

Directories ¶

Path	Synopsis
olz	Package lz supports encoding and decoding of LZ77 sequences.
suffix	Package suffix provides a suffix sort algorithm.