encoding

package
v0.1.0 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 27, 2026 License: MIT Imports: 10 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func BoolValues

func BoolValues(arr *vortex.Array) ([]bool, error)

BoolValues extracts boolean values from a decoded Bool array.

func DecodeArray

func DecodeArray(node *vortex.ArrayNode, allBuffers [][]byte, dtype *vortex.DType, rowCount uint64) (*vortex.Array, error)

DecodeArray recursively decodes an ArrayNode tree into an Array. It resolves the encoding for each node and delegates to the appropriate decoder. allBuffers contains all buffers extracted from the segment; each node's BufferIndices select which buffers it uses.

func DecodeChild

func DecodeChild(node *vortex.ArrayNode, childIdx int, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

DecodeChild is a convenience for decoders that need to recursively decode a child node.

func ExportUnpackFastLanesTyped

func ExportUnpackFastLanesTyped(packed []byte, bw, offset, count, elemSize int) ([]byte, error)

ExportUnpackFastLanesTyped is an exported wrapper around unpackFastLanesTyped for use in cross-package tests (e.g. writer pack→unpack round-trip verification).

func Float32Values

func Float32Values(arr *vortex.Array) ([]float32, error)

Float32Values returns the f32 values from a Primitive array.

func Float64Values

func Float64Values(arr *vortex.Array) ([]float64, error)

Float64Values returns the f64 values from a Primitive array.

func Int32Values

func Int32Values(arr *vortex.Array) ([]int32, error)

Int32Values returns the i32 values from a Primitive array.

func Int64Values

func Int64Values(arr *vortex.Array) ([]int64, error)

Int64Values returns the i64 values from a Primitive array.

func Register

func Register(id string, dec Decoder)

Register adds a decoder for the given encoding ID.

func StringValues

func StringValues(arr *vortex.Array) ([]string, error)

StringValues extracts string values from a VarBinView or Dict string array.

func Uint8Values

func Uint8Values(arr *vortex.Array) ([]uint8, error)

Uint8Values returns the u8 values from a Primitive array.

func Uint16Values

func Uint16Values(arr *vortex.Array) ([]uint16, error)

Uint16Values returns the u16 values from a Primitive array.

func Uint32Values

func Uint32Values(arr *vortex.Array) ([]uint32, error)

Uint32Values returns the u32 values from a Primitive array.

func Uint64Values

func Uint64Values(arr *vortex.Array) ([]uint64, error)

Uint64Values returns the u64 values from a Primitive array.

Types

type ALPDecoder

type ALPDecoder struct{}

ALPDecoder decodes vortex.alp (Adaptive Lossless floating-Point) arrays.

ALP encodes floating-point values as integers using powers-of-10 scaling:

encoded_int = round(float * 10^e / 10^f)
float       = float(encoded_int) * 10^f * 10^(-e)

Children:

  • child[0]: encoded integers (i32 for f32, i64 for f64)
  • child[1..]: optional patches (indices, values, chunk_offsets) for values that don't round-trip cleanly through the ALP transform
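The decode direction of the transform above can be sketched as follows. This is a minimal illustration, not the package's implementation: alpDecode and its parameters are hypothetical names, and patch handling for values that do not round-trip is left out.

```go
package main

import (
	"fmt"
	"math"
)

// alpDecode reverses the ALP transform for a single value:
// float = encoded * 10^f * 10^(-e).
// The name and signature are illustrative and not part of this package.
func alpDecode(encoded int64, e, f int) float64 {
	return float64(encoded) * math.Pow10(f) * math.Pow10(-e)
}

func main() {
	// 12345 with e=2, f=0 decodes to approximately 123.45.
	fmt.Println(alpDecode(12345, 2, 0))
}
```

Because the scaling happens in floating point, some inputs do not come back bit-identical, which is why the encoding carries the optional patch children.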

func (*ALPDecoder) Decode

func (d *ALPDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type ALPRDDecoder

type ALPRDDecoder struct{}

ALPRDDecoder decodes vortex.alprd (ALP Right-Digit / "Real Doubles") arrays.

ALPRD splits each float's bit representation into:

  • left_parts: most-significant bits, dictionary-coded to u16 codes, then bitpacked
  • right_parts: least-significant bits, bitpacked as uint (u32 for f32, u64 for f64)

Decoding:

  1. Decode left_parts child → u16 codes
  2. Decode right_parts child → uint values
  3. Dict-lookup left codes → u16 bit patterns
  4. Apply patches on left parts (exceptions not in the dictionary)
  5. Reconstruct: float = from_bits((uint(left_u16) << right_bit_width) | right)

Children: [left_parts, right_parts, patch_indices?, patch_values?]
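Step 5 of the decoding list can be sketched for f64 as below. The function names are hypothetical, the right bit width of 48 is only an example value, and dictionary lookup and patching (steps 3 and 4) are assumed to have already produced the left bit pattern.

```go
package main

import (
	"fmt"
	"math"
)

// joinParts rebuilds an f64 from its left and right bit parts.
// left is the dictionary-decoded u16 bit pattern; right holds the low
// rightBitWidth bits. Illustrative only, not this package's API.
func joinParts(left uint16, right uint64, rightBitWidth uint) float64 {
	return math.Float64frombits(uint64(left)<<rightBitWidth | right)
}

// splitParts is the encoder-side inverse, included only to exercise joinParts.
// It assumes the bits above rightBitWidth fit in 16 bits.
func splitParts(v float64, rightBitWidth uint) (uint16, uint64) {
	bits := math.Float64bits(v)
	mask := uint64(1)<<rightBitWidth - 1
	return uint16(bits >> rightBitWidth), bits & mask
}

func main() {
	left, right := splitParts(3.14, 48)
	fmt.Println(joinParts(left, right, 48) == 3.14) // prints true
}
```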

func (*ALPRDDecoder) Decode

func (d *ALPRDDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type BitPackedDecoder

type BitPackedDecoder struct{}

BitPackedDecoder decodes fastlanes.bitpacked arrays.

FastLanes bitpacking stores N-bit integers in a dense buffer using a transposed layout designed for SIMD parallelism. The packed buffer is typed — it's an array of T-sized elements (where T matches the logical PType width) arranged across "lanes".

For a 1024-element block with element type T (having tBits = sizeof(T)*8 bits):

nLanes = 1024 / tBits
rowsPerLane = tBits

The spiraldb/fastlanes Rust crate uses a reordered transposition (not the original FastLanes paper order). The mapping from (row, lane) → logical index is:

FL_ORDER = [0, 4, 2, 6, 1, 5, 3, 7]
logIdx = FL_ORDER[row/8]*16 + (row%8)*128 + lane

The inverse (logical index → row, lane) is:

lane = logIdx % nLanes
s = logIdx / 128
fl = (logIdx - s*128 - lane) / 16
o = FL_ORDER[fl]          // FL_ORDER is its own inverse
row = o*8 + s

Then the physical packed position for a value at (row, lane) with bit_width W:

startBit = row * W
word = startBit / tBits
shift = startBit % tBits
physIdx = word * nLanes + lane
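The index formulas above translate directly into Go. The sketch below uses hypothetical function names and only computes positions; it does not extract values, and it does not handle a value that straddles two words (when shift + W exceeds tBits).

```go
package main

import "fmt"

// flOrder is the FL_ORDER table quoted above.
var flOrder = [8]int{0, 4, 2, 6, 1, 5, 3, 7}

// logicalIndex maps (row, lane) to the logical index in a 1024-element block.
func logicalIndex(row, lane int) int {
	return flOrder[row/8]*16 + (row%8)*128 + lane
}

// rowLane is the inverse mapping; tBits is the element width in bits.
func rowLane(logIdx, tBits int) (row, lane int) {
	nLanes := 1024 / tBits
	lane = logIdx % nLanes
	s := logIdx / 128
	fl := (logIdx - s*128 - lane) / 16
	row = flOrder[fl]*8 + s // flOrder is its own inverse
	return row, lane
}

// packedPosition returns the index of the T-sized word holding the start of
// the value at (row, lane) for bit width w, and the bit shift inside that word.
func packedPosition(row, lane, w, tBits int) (physIdx, shift int) {
	startBit := row * w
	nLanes := 1024 / tBits
	return (startBit/tBits)*nLanes + lane, startBit % tBits
}

func main() {
	idx := logicalIndex(9, 3)
	row, lane := rowLane(idx, 32)
	fmt.Println(idx, row, lane) // prints 195 9 3
}
```

Running the two mappings against each other for every (row, lane) pair is a cheap way to check a port of the layout before unpacking real data.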

func (*BitPackedDecoder) Decode

func (d *BitPackedDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type BoolDecoder

type BoolDecoder struct{}

BoolDecoder decodes vortex.bool arrays. A Bool array has one buffer containing bit-packed booleans (1 bit per value).

func (*BoolDecoder) Decode

func (d *BoolDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type ByteBoolDecoder

type ByteBoolDecoder struct{}

ByteBoolDecoder decodes vortex.bytebool arrays.

ByteBool encoding stores one byte per boolean value (0x00=false, 0x01=true). This is more compute-friendly than bitpacked bools for certain operations. Single buffer: one byte per element.

func (*ByteBoolDecoder) Decode

func (d *ByteBoolDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type ChunkedDecoder

type ChunkedDecoder struct{}

ChunkedDecoder decodes vortex.chunked arrays (array-level, not layout-level). A Chunked array has N children, each a sub-array. The decoded array is the concatenation of all children.

func (*ChunkedDecoder) Decode

func (d *ChunkedDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type ConstantDecoder

type ConstantDecoder struct{}

ConstantDecoder decodes vortex.constant arrays. A Constant array has no buffers — the scalar value is in the metadata (protobuf ScalarValue). One child may be present (the validity array).

func (*ConstantDecoder) Decode

func (d *ConstantDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type DateTimePartsDecoder

type DateTimePartsDecoder struct{}

DateTimePartsDecoder decodes vortex.datetimeparts arrays.

DateTimeParts splits timestamps into three components for better compression:

days     — number of days since the Unix epoch
seconds  — seconds within the day (0–86399)
subseconds — fractional part in the target time unit

Children: [0] days, [1] seconds, [2] subseconds

Metadata: DateTimePartsMetadata { days_ptype, seconds_ptype, subseconds_ptype }

The time unit is extracted from the Extension DType metadata:

byte 0:   TimeUnit tag (0=ns, 1=µs, 2=ms, 3=s)
bytes 1-2: timezone string length (u16 LE)
bytes 3+:  timezone string (UTF-8)

Reconstruction: timestamp[i] = days[i]*86400*divisor + seconds[i]*divisor + subseconds[i], where divisor is the number of target time units per second (for example, 1000 for ms).
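A minimal sketch of the reconstruction follows. It assumes that divisor means the number of target time units per second, derived from the TimeUnit tag listed above; unitsPerSecond and joinTimestamp are hypothetical names.

```go
package main

import "fmt"

// unitsPerSecond maps a TimeUnit tag (0=ns, 1=µs, 2=ms, 3=s) to the divisor
// used in reconstruction. Treating the divisor as units per second is an
// assumption of this sketch.
func unitsPerSecond(tag byte) (int64, error) {
	switch tag {
	case 0:
		return 1_000_000_000, nil
	case 1:
		return 1_000_000, nil
	case 2:
		return 1_000, nil
	case 3:
		return 1, nil
	}
	return 0, fmt.Errorf("unknown TimeUnit tag %d", tag)
}

// joinTimestamp recombines the three parts into one timestamp value.
func joinTimestamp(days, seconds, subseconds, divisor int64) int64 {
	return days*86400*divisor + seconds*divisor + subseconds
}

func main() {
	div, err := unitsPerSecond(2) // milliseconds
	if err != nil {
		panic(err)
	}
	// One day, one second and five milliseconds after the epoch.
	fmt.Println(joinTimestamp(1, 1, 5, div)) // prints 86401005
}
```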

func (*DateTimePartsDecoder) Decode

func (d *DateTimePartsDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type DecimalBytePartsDecoder

type DecimalBytePartsDecoder struct{}

DecimalBytePartsDecoder decodes vortex.decimal_byte_parts arrays.

This encoding stores decimal values as their fixed-point integer representation in a single child (the "most significant part" or MSP). The child is a signed integer primitive array whose values, when divided by 10^scale, give the actual decimal values.

Metadata: zeroth_child_ptype (PType of the child), lower_part_count (always 0 currently)

Children: [msp] — a single signed integer array

func (*DecimalBytePartsDecoder) Decode

func (d *DecimalBytePartsDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type DecodeContext

type DecodeContext struct {
	// AllBuffers is the complete buffer set extracted from the segment.
	AllBuffers [][]byte
}

DecodeContext carries all state needed for recursive array decoding.

type Decoder

type Decoder interface {
	// Decode decodes the given node. nodeBuffers are the buffers selected by the node's
	// BufferIndices. ctx provides access to allBuffers for child decoding.
	Decode(node *vortex.ArrayNode, nodeBuffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)
}

Decoder decodes an ArrayNode with its buffer data into a decoded Array.

func Get

func Get(id string) (Decoder, error)

Get returns the decoder for the given encoding ID.

type DeltaDecoder

type DeltaDecoder struct{}

DeltaDecoder decodes fastlanes.delta arrays.

Delta encoding stores differences between consecutive values in a FastLanes-transposed layout; decoding recovers the values by prefix-sum. Each 1024-element chunk has LANES base values; the deltas are stored in transposed FL_ORDER. Decompression applies a per-lane prefix-sum (undelta) in transposed order, then untransposes to recover the original logical order.

For remainders (< 1024 elements), scalar prefix-sum is used.

Metadata: DeltaMetadata { deltas_len: u64, offset: u32 }

Children: [bases, deltas] — both same unsigned integer ptype

func (*DeltaDecoder) Decode

func (d *DeltaDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type DictDecoder

type DictDecoder struct{}

DictDecoder decodes vortex.dict arrays. A Dict array has two children:

  • Child 0: codes — integer array of indices into the values dictionary
  • Child 1: values — the dictionary (e.g., a VarBinView string array)
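Once both children are decoded, expansion is a bounds-checked lookup. The sketch below uses a hypothetical generic helper over plain slices rather than this package's Array type, and assumes the codes have already been widened to uint32.

```go
package main

import "fmt"

// dictLookup expands codes into values: out[i] = dict[codes[i]].
// Illustrative helper, not part of this package.
func dictLookup[T any](codes []uint32, dict []T) ([]T, error) {
	out := make([]T, len(codes))
	for i, c := range codes {
		if uint64(c) >= uint64(len(dict)) {
			return nil, fmt.Errorf("code %d at position %d is out of range (dictionary has %d values)", c, i, len(dict))
		}
		out[i] = dict[c]
	}
	return out, nil
}

func main() {
	out, err := dictLookup([]uint32{1, 0, 1}, []string{"red", "blue"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints [blue red blue]
}
```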

func (*DictDecoder) Decode

func (d *DictDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type FSSTDecoder

type FSSTDecoder struct{}

FSSTDecoder decodes vortex.fsst arrays.

FSST (Fast Static Symbol Table) compresses strings by replacing common substrings (1–8 bytes) with single-byte codes. The encoding stores:

Buffer[0]: symbols — 8 bytes per symbol (up to 255 symbols, each padded to 8 bytes)
Buffer[1]: symbol_lengths — 1 byte per symbol (actual length of each symbol, 1–8)
Buffer[2]: compressed_codes — concatenated compressed data for all strings
Child[0]:  uncompressed_lengths — integer array of expected decompressed lengths
Child[1]:  codes_offsets — integer array of N+1 VarBin offsets into buffer[2]
Child[2]:  (optional) validity mask — boolean array

Decompression: for each byte in the compressed stream, if the byte value is less than the number of symbols, emit the corresponding symbol bytes. If the byte is 0xFF (escape code), emit the next byte literally.
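The decompression rule can be sketched for a single compressed string as below. fsstDecompress is a hypothetical name; the sketch takes the symbol and length buffers as plain byte slices and does not use the uncompressed_lengths child to pre-size the output.

```go
package main

import (
	"errors"
	"fmt"
)

const escapeCode = 0xFF

// fsstDecompress expands one compressed string.
// symbols holds 8 bytes per symbol; lengths holds each symbol's real length.
func fsstDecompress(symbols, lengths, compressed []byte) ([]byte, error) {
	if len(symbols) < 8*len(lengths) {
		return nil, errors.New("fsst: symbol buffer shorter than 8 bytes per symbol")
	}
	var out []byte
	for i := 0; i < len(compressed); i++ {
		code := compressed[i]
		if code == escapeCode {
			// Escape: the next byte is emitted literally.
			i++
			if i >= len(compressed) {
				return nil, errors.New("fsst: escape code at end of input")
			}
			out = append(out, compressed[i])
			continue
		}
		if int(code) >= len(lengths) {
			return nil, fmt.Errorf("fsst: code %d has no symbol", code)
		}
		n := int(lengths[code])
		if n > 8 {
			return nil, fmt.Errorf("fsst: symbol %d has invalid length %d", code, n)
		}
		start := int(code) * 8
		out = append(out, symbols[start:start+n]...)
	}
	return out, nil
}

func main() {
	symbols := make([]byte, 16)
	copy(symbols[0:], "he")
	copy(symbols[8:], "llo")
	out, err := fsstDecompress(symbols, []byte{2, 3}, []byte{0, 1, escapeCode, '!'})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // prints hello!
}
```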

func (*FSSTDecoder) Decode

func (d *FSSTDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type FastLanesRLEDecoder

type FastLanesRLEDecoder struct{}

FastLanesRLEDecoder decodes fastlanes.rle arrays.

FastLanes RLE stores data as 1024-element chunks. Within each chunk, values are dictionary-encoded: an indices array maps each position to a value in the values array. Multiple chunks share a single values array, with per-chunk offsets stored in values_idx_offsets.

Children: [0] values, [1] indices, [2] values_idx_offsets

Metadata: RLEMetadata { values_len, indices_len, indices_ptype, values_idx_offsets_len, values_idx_offsets_ptype, offset }

For element i in chunk c:

global_val_idx = indices[c*1024 + i] + (offsets[c] - offsets[0])
output[c*1024 + i] = values[global_val_idx]
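The per-element lookup above can be sketched as a loop over output positions. The concrete slice types (int32 values, uint16 indices, uint32 offsets) are assumptions for illustration, the metadata offset field is ignored, and bounds checks are omitted for brevity.

```go
package main

import "fmt"

const chunkSize = 1024

// rleDecode expands n elements. For position pos in chunk c the value index is
// indices[pos] + (offsets[c] - offsets[0]). Illustrative helper only.
func rleDecode(values []int32, indices []uint16, offsets []uint32, n int) []int32 {
	out := make([]int32, n)
	for pos := 0; pos < n; pos++ {
		c := pos / chunkSize
		globalValIdx := uint32(indices[pos]) + (offsets[c] - offsets[0])
		out[pos] = values[globalValIdx]
	}
	return out
}

func main() {
	values := []int32{10, 20, 30} // chunk 0 uses values[0:2], chunk 1 starts at values[2]
	offsets := []uint32{0, 2}
	indices := make([]uint16, chunkSize+1)
	indices[1] = 1
	out := rleDecode(values, indices, offsets, len(indices))
	fmt.Println(out[0], out[1], out[chunkSize]) // prints 10 20 30
}
```

Subtracting offsets[0] keeps the lookup correct when the offsets do not start at zero, for example after the array has been sliced.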

func (*FastLanesRLEDecoder) Decode

func (d *FastLanesRLEDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type FoRDecoder

type FoRDecoder struct{}

FoRDecoder decodes fastlanes.for (Frame-of-Reference) arrays.

FoR encoding stores values as offsets from a reference (minimum) value. The metadata contains the reference value as a protobuf ScalarValue. The single child is the encoded offsets (typically bitpacked).

Decoding: decoded[i] = child[i] + reference
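As a sketch of the decoding rule, the helper below adds the reference back to each offset. It is fixed to int64 for brevity, which is an assumption of this example; the name forDecode is hypothetical.

```go
package main

import "fmt"

// forDecode adds the reference back to every encoded offset.
// Go integer addition wraps on overflow rather than panicking.
func forDecode(encoded []int64, reference int64) []int64 {
	out := make([]int64, len(encoded))
	for i, v := range encoded {
		out[i] = v + reference
	}
	return out
}

func main() {
	fmt.Println(forDecode([]int64{0, 3, 7}, 100)) // prints [100 103 107]
}
```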

func (*FoRDecoder) Decode

func (d *FoRDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type MaskedDecoder

type MaskedDecoder struct{}

MaskedDecoder decodes vortex.masked arrays. A Masked array has two children:

  • Child 0: values — the actual data array
  • Child 1: mask — boolean validity array (true = valid, false = null)

func (*MaskedDecoder) Decode

func (d *MaskedDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type NullDecoder

type NullDecoder struct{}

NullDecoder decodes vortex.null arrays. A Null array has no buffers — it represents rowCount null values.

func (*NullDecoder) Decode

func (d *NullDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type PcoDecoder

type PcoDecoder struct{}

PcoDecoder decodes vortex.pco arrays using the pure-Go pcodec package.

Vortex PcoArray layout:

Metadata (protobuf): PcoMetadata { header, chunks[] { pages[] { n_values } } }
Buffers: [chunk_meta_0, ..., chunk_meta_N, page_0, page_1, ..., page_M]
    First len(chunks) buffers are chunk metadata (pco ChunkMeta bytes).
    Remaining buffers are page data (ordered by chunk, then page within chunk).
Children: 0 or 1 (optional validity bitmap).

func (*PcoDecoder) Decode

func (d *PcoDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type PrimitiveDecoder

type PrimitiveDecoder struct{}

PrimitiveDecoder decodes vortex.primitive arrays. A Primitive array has one buffer containing raw typed values (little-endian).

func (*PrimitiveDecoder) Decode

func (d *PrimitiveDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type RunEndDecoder

type RunEndDecoder struct{}

RunEndDecoder decodes vortex.runend arrays.

RunEnd encoding stores runs of identical values as (end_position, value) pairs.

Children: [0] ends (integer array of run end positions), [1] values (one per run)

Metadata: RunEndMetadata { ends_ptype, num_runs, offset }

To expand: for each output position i, find the first run where ends[run] > i+offset, then output values[run].
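The expansion rule can be sketched with a binary search over the run ends, which are ascending. runEndExpand is a hypothetical helper over plain slices, and string values are used only to keep the example readable.

```go
package main

import (
	"fmt"
	"sort"
)

// runEndExpand produces n output values. For each position i it picks the
// first run whose end is greater than i+offset.
func runEndExpand(ends []uint64, values []string, offset uint64, n int) ([]string, error) {
	out := make([]string, n)
	for i := range out {
		target := uint64(i) + offset
		run := sort.Search(len(ends), func(r int) bool { return ends[r] > target })
		if run == len(ends) || run >= len(values) {
			return nil, fmt.Errorf("position %d is past the last run", target)
		}
		out[i] = values[run]
	}
	return out, nil
}

func main() {
	out, err := runEndExpand([]uint64{2, 5, 6}, []string{"a", "b", "c"}, 0, 6)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints [a a b b b c]
}
```

A single linear pass over the runs would avoid the per-element search when the whole array is expanded; the binary search is more useful for random access.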

func (*RunEndDecoder) Decode

func (d *RunEndDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type SequenceDecoder

type SequenceDecoder struct{}

SequenceDecoder decodes vortex.sequence arrays. A Sequence array has no buffers and no children. It represents the formula:

A[i] = base + i * multiplier

The base and multiplier are protobuf ScalarValues stored in the metadata.
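A minimal sketch of materializing the formula, assuming int64 values; the helper name is hypothetical and parsing the ScalarValue metadata is not shown.

```go
package main

import "fmt"

// sequence materializes A[i] = base + i*multiplier for i in [0, n).
func sequence(base, multiplier int64, n int) []int64 {
	out := make([]int64, n)
	for i := range out {
		out[i] = base + int64(i)*multiplier
	}
	return out
}

func main() {
	fmt.Println(sequence(10, 3, 4)) // prints [10 13 16 19]
}
```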

func (*SequenceDecoder) Decode

func (d *SequenceDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type SparseDecoder

type SparseDecoder struct{}

SparseDecoder decodes vortex.sparse arrays.

Sparse encoding stores a fill value (in a buffer) plus patches at specific indices that override the fill value.

Children: [0] patch_indices (integer array), [1] patch_values (same dtype as column)

Buffers: [0] fill value as protobuf ScalarValue

Metadata: SparseMetadata { patches: PatchesMetadata }
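Expansion amounts to filling the output and then applying the patches. The sketch below is a hypothetical generic helper over plain slices; it assumes the patch indices are already absolute positions, so any offset carried in PatchesMetadata is not modeled.

```go
package main

import "fmt"

// sparseExpand builds n values equal to fill, then overrides the patched positions.
func sparseExpand[T any](fill T, indices []uint64, patchValues []T, n int) ([]T, error) {
	if len(indices) != len(patchValues) {
		return nil, fmt.Errorf("%d patch indices but %d patch values", len(indices), len(patchValues))
	}
	out := make([]T, n)
	for i := range out {
		out[i] = fill
	}
	for k, idx := range indices {
		if idx >= uint64(n) {
			return nil, fmt.Errorf("patch index %d is out of range for %d rows", idx, n)
		}
		out[idx] = patchValues[k]
	}
	return out, nil
}

func main() {
	out, err := sparseExpand(0, []uint64{1, 4}, []int{7, 9}, 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints [0 7 0 0 9]
}
```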

func (*SparseDecoder) Decode

func (d *SparseDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type StructDecoder

type StructDecoder struct{}

StructDecoder decodes vortex.struct arrays (array-level). A Struct array has one child per field, no buffers of its own.

func (*StructDecoder) Decode

func (d *StructDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type VarBinViewDecoder

type VarBinViewDecoder struct{}

VarBinViewDecoder decodes vortex.varbinview arrays. This is the canonical encoding for Utf8 and Binary types.

Buffer layout:

  • Buffer 0: views — one 16-byte view descriptor per string
  • Buffers 1..N: data buffers containing string bytes for long strings

View descriptor (16 bytes):

  • length: uint32 (byte length of the string)
  • If length <= 12: next 12 bytes are the string data inline
  • If length > 12: next 4 bytes are a prefix, then uint32 buffer_index + uint32 offset
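Resolving one view descriptor can be sketched as below. The little-endian byte order of the integer fields is an assumption of this example, viewBytes is a hypothetical name, and data stands for the data buffers only (buffers 1..N in the layout above), so a buffer_index of 0 refers to the first data buffer.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// viewBytes returns the string bytes described by one 16-byte view.
func viewBytes(view []byte, data [][]byte) ([]byte, error) {
	if len(view) != 16 {
		return nil, fmt.Errorf("view must be 16 bytes, got %d", len(view))
	}
	length := binary.LittleEndian.Uint32(view[0:4])
	if length <= 12 {
		// Short string: the bytes are stored inline after the length.
		return view[4 : 4+length], nil
	}
	// Long string: bytes 4..8 are a prefix, then buffer index and offset.
	bufIdx := binary.LittleEndian.Uint32(view[8:12])
	off := binary.LittleEndian.Uint32(view[12:16])
	if uint64(bufIdx) >= uint64(len(data)) {
		return nil, fmt.Errorf("buffer index %d out of range", bufIdx)
	}
	end := uint64(off) + uint64(length)
	if end > uint64(len(data[bufIdx])) {
		return nil, fmt.Errorf("view [%d, %d) exceeds buffer %d", off, end, bufIdx)
	}
	return data[bufIdx][off:end], nil
}

func main() {
	inline := []byte{5, 0, 0, 0, 'h', 'e', 'l', 'l', 'o', 0, 0, 0, 0, 0, 0, 0}
	s, err := viewBytes(inline, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(s)) // prints hello
}
```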

func (*VarBinViewDecoder) Decode

func (d *VarBinViewDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type ZigZagDecoder

type ZigZagDecoder struct{}

ZigZagDecoder decodes vortex.zigzag arrays.

ZigZag encoding maps signed integers to unsigned integers so that small absolute values have small encoded values (good for bitpacking).

Inverse: signed = (unsigned >> 1) ^ -(unsigned & 1)

Single child: the unsigned integer array.
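The inverse formula is a one-liner in Go. The sketch below is fixed to 64-bit values for illustration and uses a hypothetical function name.

```go
package main

import "fmt"

// zigzagDecode applies signed = (unsigned >> 1) ^ -(unsigned & 1).
func zigzagDecode(u uint64) int64 {
	return int64(u>>1) ^ -int64(u&1)
}

func main() {
	for _, u := range []uint64{0, 1, 2, 3, 4} {
		fmt.Print(zigzagDecode(u), " ")
	}
	fmt.Println() // the loop prints 0 -1 1 -2 2
}
```

Even encoded values map to non-negative integers and odd ones to negative integers, so magnitudes near zero stay small in the unsigned domain.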

func (*ZigZagDecoder) Decode

func (d *ZigZagDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)

type ZstdDecoder

type ZstdDecoder struct{}

ZstdDecoder decodes vortex.zstd arrays.

Zstd compresses primitive or string data using the Zstandard algorithm. An optional shared dictionary improves compression across multiple frames.

Buffers: [dictionary?] [frame0] [frame1] ...

If dictionary_size > 0, the first buffer is the Zstd dictionary.
Remaining buffers are compressed frames.

Metadata: ZstdMetadata { dictionary_size(u32), frames: [ZstdFrameMetadata] }

Each frame has: uncompressed_size(u64), n_values(u64)

Children: 0 or 1 (validity bitmap if nullable).

For primitive dtypes: frames decompress to raw typed bytes. For string dtypes: frames decompress to length-prefixed strings (u32 LE length + bytes), which are reconstructed into VarBinView format.
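For string dtypes, the length-prefixed layout of a decompressed frame can be parsed as sketched below. The helper name is hypothetical, and the input is assumed to be a frame that has already been decompressed; Zstandard decompression itself and the conversion into VarBinView views are not shown.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// splitLengthPrefixed parses n strings from a decompressed frame, each stored
// as a u32 little-endian length followed by that many bytes.
func splitLengthPrefixed(frame []byte, n int) ([]string, error) {
	if n < 0 {
		return nil, errors.New("negative value count")
	}
	out := make([]string, 0, n)
	for len(out) < n {
		if len(frame) < 4 {
			return nil, errors.New("truncated length prefix")
		}
		size := binary.LittleEndian.Uint32(frame)
		frame = frame[4:]
		if uint64(size) > uint64(len(frame)) {
			return nil, errors.New("string length exceeds remaining frame bytes")
		}
		out = append(out, string(frame[:size]))
		frame = frame[size:]
	}
	return out, nil
}

func main() {
	frame := []byte{2, 0, 0, 0, 'h', 'i', 0, 0, 0, 0, 3, 0, 0, 0, 'a', 'b', 'c'}
	out, err := splitLengthPrefixed(frame, 3)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out) // prints ["hi" "" "abc"]
}
```

The n argument corresponds to the frame's n_values field from the metadata, which tells the parser when to stop.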

func (*ZstdDecoder) Decode

func (d *ZstdDecoder) Decode(node *vortex.ArrayNode, buffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)
