zion

package
v0.0.0-...-86e9f11
Warning

This package is not in the latest version of its module.

Published: Jan 7, 2024 License: Apache-2.0 Imports: 10 Imported by: 0

Documentation

Overview

Package zion implements a "zipped" ion encoding that compresses streams of ion structures in a manner such that fields within structures in the stream can be decompressed without decompressing the entire input stream.

Currently, the implementation of Encoder.Encode splits fields into buckets (by hashing) and then writes field contents into these buckets and compresses each bucket separately, along with a "shape" bitstream that is also compressed. Decoder.Decode decompresses the shape from the input bitstream and then uses a user-provided field selection to determine which buckets need to be decompressed as it walks the data.

## Shape Encoding Format

The "shape" of a structure encodes just enough information to reconstruct a structure from the sixteen compressed buckets.

Each structure is composed of one or more shape "sequences." Each sequence is composed of a 1-byte length+class descriptor followed by zero or more field descriptors. Each field descriptor is just a bucket number (from 0x0 to 0xf), so these are encoded as individual nibbles. The low 5 bits of the 1-byte length+class descriptor determine the number of fields that follow the descriptor (from 0 to 16 inclusive), and the top two bits of the descriptor encode a "size class" hint. (The size class hint is encoded as the number of bytes *minus one* that the ion structure descriptor would occupy. For example, a structure that was originally 8 bytes would have a descriptor 0xd8, so the size class descriptor would be 0. For 0xde1f it would be 1, and so forth; we do not support descriptors above 3.)

For example, a structure with one field that lives in bucket 0xe would be encoded as:

0x01 0xe0

A structure with four fields that live in buckets 0, 1, 2, 3:

0x04 0x01 0x23

Notice that structures with odd field lengths still consume an integral number of bytes; the final (missing) field must be encoded as the 0 nibble. In other words, the length of the fields following a descriptor can be computed by:

class := shape[0]>>6            // size class hint (top two bits)
size := shape[0]&0x1f           // number of fields in this sequence (0 to 16)
body := shape[1:1+((size+1)/2)] // bytes of nibbles, two fields per byte

Since we can only record up to 16 fields in one sequence, a sequence of 16 fields does not terminate a structure, and the next sequence continues the fields where the previous one left off. (So, a structure with 16 fields will be composed of two sequences, where the second sequence is simply the 0x00 byte.)
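The encoding rules above can be sketched as a small function. This is an illustrative sketch only, not the package's implementation: the name encodeShape is hypothetical, and the size class bits are simply left at zero. It reproduces the two byte sequences shown earlier and the 16-field continuation case.

```go
package main

import "fmt"

// maxSeqFields is the largest field count a single sequence can carry.
const maxSeqFields = 16

// encodeShape (hypothetical helper) appends to dst the shape sequences for
// one structure whose fields live in the given buckets (each 0x0 to 0xf).
// The size class bits are left at zero in this sketch.
func encodeShape(dst []byte, buckets []byte) []byte {
	for {
		n := len(buckets)
		if n > maxSeqFields {
			n = maxSeqFields
		}
		dst = append(dst, byte(n)) // descriptor: low 5 bits hold the field count
		for i := 0; i < n; i += 2 {
			b := buckets[i] << 4 // first field of each pair goes in the high nibble
			if i+1 < n {
				b |= buckets[i+1] & 0xf
			}
			dst = append(dst, b) // a missing final field stays as the 0 nibble
		}
		buckets = buckets[n:]
		if n < maxSeqFields {
			return dst // a sequence shorter than 16 fields ends the structure
		}
	}
}

func main() {
	fmt.Printf("% x\n", encodeShape(nil, []byte{0xe}))        // 01 e0
	fmt.Printf("% x\n", encodeShape(nil, []byte{0, 1, 2, 3})) // 04 01 23
	// Exactly 16 fields: a full sequence followed by the terminating 0x00 byte.
	fmt.Printf("% x\n", encodeShape(nil, make([]byte, 16)))
}
```

Appending to a caller-supplied dst mirrors the append-style signatures used elsewhere in this package, which lets the shapes of many structures accumulate in one buffer.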

## Decoding Process

A "shape" stream composed of multiple structures *must* be decoded sequentially, since the shape stream itself only consists of bucket references.

In order to unpack a structure, the decoder must consume the next ion label *and* ion value in each bucket that it steps through so that the bucket produces a new value each time it is referenced in the stream. (For example, the sequence of fields 0xffff would imply that bucket 15 would have to be decoded four times in sequence.)
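To make the sequential walk concrete, the following sketch steps through a shape stream and tallies how often each bucket is referenced. It is a simplified illustration under assumptions, not the package's decoder: walkShape and errShape are hypothetical names, the size class bits are ignored, and no bucket contents are actually decoded.

```go
package main

import (
	"errors"
	"fmt"
)

// errShape is a hypothetical error value for this sketch.
var errShape = errors.New("malformed shape stream")

// walkShape (hypothetical helper) visits every sequence in shape and reports
// how many structures it describes and how often each bucket is referenced.
func walkShape(shape []byte) (structs int, refs [16]int, err error) {
	open := false // true while a structure continues into the next sequence
	for len(shape) > 0 {
		n := int(shape[0] & 0x1f) // field count; size class bits are ignored here
		end := 1 + (n+1)/2        // descriptor byte plus packed nibbles
		if n > 16 || end > len(shape) {
			return structs, refs, errShape
		}
		body := shape[1:end]
		for i := 0; i < n; i++ {
			b := body[i/2]
			if i%2 == 0 {
				b >>= 4 // even-numbered fields sit in the high nibble
			}
			refs[b&0xf]++
		}
		shape = shape[end:]
		open = n == 16 // a full sequence does not terminate the structure
		if !open {
			structs++
		}
	}
	if open {
		return structs, refs, errShape // stream ended in the middle of a structure
	}
	return structs, refs, nil
}

func main() {
	// Four fields, all in bucket 15: the nibbles 0xffff.
	n, refs, err := walkShape([]byte{0x04, 0xff, 0xff})
	fmt.Println(n, refs[0xf], err) // 1 4 <nil>
}
```

Each tally corresponds to one label and value that a real decoder would have to consume from that bucket, which is why the stream cannot be entered at an arbitrary offset.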


Constants

const (
	DefaultTargetWrite = 128 * 1024
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Decoder

type Decoder struct {
	TargetWriteSize int
	// contains filtered or unexported fields
}

Decoder is a stateful decoder of compressed data produced with Encoder.Encode.

The zero value of Decoder has the "wildcard" flag set, which means it decodes all of the structure fields in the input. Calls to Decoder.SetComponents can pick a subset of the fields that are to be projected, and calls to Decoder.SetWildcard can re-enable the wildcard flag.

func (*Decoder) CopyBytes

func (d *Decoder) CopyBytes(dst io.Writer, src []byte) (int64, error)

CopyBytes writes ion data into dst as it is decoded from src. CopyBytes works similarly to Decode except that it does not require as much data to be buffered at once.

func (*Decoder) Count

func (d *Decoder) Count(src []byte) (int, error)

Count counts the number of structures in src without decompressing the body of src. Note that Count is stateful (it processes symbol tables), so it may be substituted for a call to Decode where only the number of stored records is of interest.

func (*Decoder) Decode

func (d *Decoder) Decode(src, dst []byte) ([]byte, error)

Decode performs a stateful decoding of src by appending into dst. If a particular field selection has been made via d.SetComponents, then Decode *may* omit fields that are not part of the selection. Sequential calls to Decode build an ion symbol table internally, so the order in which blocks are presented to Decode as src should match the order in which they were presented to Encoder.Encode.
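The ordering requirement can be illustrated with a short loop. This sketch is written against a hypothetical blockDecoder interface that only mirrors the Decode signature shown above. The copyDecoder type is a stand-in so the example runs on its own; it is not part of this package and performs no decompression.

```go
package main

import "fmt"

// blockDecoder is a hypothetical interface mirroring the documented
// signature of Decoder.Decode.
type blockDecoder interface {
	Decode(src, dst []byte) ([]byte, error)
}

// decodeInOrder (hypothetical helper) feeds blocks to d in the order given,
// which should be the order in which they were encoded, and accumulates the
// decoded output in a single buffer.
func decodeInOrder(d blockDecoder, blocks [][]byte) ([]byte, error) {
	var out []byte
	for i, blk := range blocks {
		var err error
		out, err = d.Decode(blk, out) // Decode appends to the dst it is given
		if err != nil {
			return nil, fmt.Errorf("block %d: %w", i, err)
		}
	}
	return out, nil
}

// copyDecoder is a stand-in that copies src to dst unchanged. It exists only
// to make the sketch runnable.
type copyDecoder struct{}

func (copyDecoder) Decode(src, dst []byte) ([]byte, error) {
	return append(dst, src...), nil
}

func main() {
	out, err := decodeInOrder(copyDecoder{}, [][]byte{[]byte("ab"), []byte("cd")})
	fmt.Printf("%s %v\n", out, err)
}
```

Reusing one decoder value across the loop matters because the symbol table is built up across calls; a fresh decoder per block would lose it.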

func (*Decoder) Reset

func (d *Decoder) Reset()

Reset resets the internal decoder state, including the internal symbol table.

func (*Decoder) SetComponents

func (d *Decoder) SetComponents(x []string)

SetComponents sets the leading path components that should be copied out during calls to Decode. SetComponents may be overridden by another call to SetComponents or SetWildcard.

The "leading path component" is the first component of a path, so the path x.y.z has x as its first component.
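As an illustration of the definition above, the following sketch reduces a list of paths to their leading components, which is the form SetComponents expects. It assumes paths are plain dot-separated names, as in the x.y.z example, with no quoting or indexing syntax. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// leadingComponent (hypothetical helper) returns the first component of a
// dot-separated path, so "x.y.z" yields "x".
func leadingComponent(path string) string {
	first, _, _ := strings.Cut(path, ".")
	return first
}

// leadingComponents (hypothetical helper) maps paths to their leading
// components, dropping duplicates while preserving first-seen order.
func leadingComponents(paths []string) []string {
	seen := make(map[string]bool, len(paths))
	var out []string
	for _, p := range paths {
		c := leadingComponent(p)
		if !seen[c] {
			seen[c] = true
			out = append(out, c)
		}
	}
	return out
}

func main() {
	fmt.Println(leadingComponents([]string{"x.y.z", "x.w", "a"})) // [x a]
}
```

The resulting slice could then be passed to SetComponents. Because selection is by leading component only, a query touching x.y.z and x.w needs just the single entry x.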

func (*Decoder) SetPortable

func (d *Decoder) SetPortable(p bool)

SetPortable sets the decoder's portability flag. If the portability flag is set, then the Decoder uses a pure-Go decoding routine even if an architecture-specific non-portable decoding routine is available.

func (*Decoder) SetWildcard

func (d *Decoder) SetWildcard()

SetWildcard tells the decoder to decode all input fields. This clears any field selection made by SetComponents.

The zero value of Decoder has the wildcard flag set.

func (*Decoder) Wildcard

func (d *Decoder) Wildcard() bool

Wildcard reads the status of the decoder wildcard flag. See also SetWildcard and SetComponents.

type Encoder

type Encoder struct {
	// Algo is the current encoder bucket algorithm.
	// Algo may be changed between calls to Encoder.Encode.
	Algo zll.BucketAlgo
	// contains filtered or unexported fields
}

Encoder is used to compress sequential blocks of ion data. See Encoder.Encode and Decoder.Decode.

func (*Encoder) Encode

func (e *Encoder) Encode(src, dst []byte) ([]byte, error)

Encode encodes ion data from src by appending it to dst. Encode parses ion symbol tables from src as they appear, so the output stream is not guaranteed to be order-independent: the chunks encoded via Encode should be decoded via Decoder.Decode in the same order in which they were encoded.

Note that the compression format does not preserve nop padding in ion data. In other words, data passed to Encode may not be bit-identical to data received from Decode if the input data contains nop pads.

func (*Encoder) Reset

func (e *Encoder) Reset()

Reset resets the Encoder's internal symbol table and its seed.

func (*Encoder) SetSymbols

func (e *Encoder) SetSymbols(st *ion.Symtab)

SetSymbols sets the current state of the internal symbol table in the Encoder. This can be used to resume encoding in the middle of an existing ion stream.

Directories

Path    Synopsis
iguana  Package iguana implements a Lizard-derived compression/decompression pipeline
zll     Package zll exposes types and procedures related to low-level zion decoding.
