Documentation ¶
Index ¶
- func BoolValues(arr *vortex.Array) ([]bool, error)
- func DecodeArray(node *vortex.ArrayNode, allBuffers [][]byte, dtype *vortex.DType, ...) (*vortex.Array, error)
- func DecodeChild(node *vortex.ArrayNode, childIdx int, dtype *vortex.DType, rowCount uint64, ...) (*vortex.Array, error)
- func ExportUnpackFastLanesTyped(packed []byte, bw, offset, count, elemSize int) ([]byte, error)
- func Float32Values(arr *vortex.Array) ([]float32, error)
- func Float64Values(arr *vortex.Array) ([]float64, error)
- func Int32Values(arr *vortex.Array) ([]int32, error)
- func Int64Values(arr *vortex.Array) ([]int64, error)
- func Register(id string, dec Decoder)
- func StringValues(arr *vortex.Array) ([]string, error)
- func Uint8Values(arr *vortex.Array) ([]uint8, error)
- func Uint16Values(arr *vortex.Array) ([]uint16, error)
- func Uint32Values(arr *vortex.Array) ([]uint32, error)
- func Uint64Values(arr *vortex.Array) ([]uint64, error)
- type ALPDecoder
- type ALPRDDecoder
- type BitPackedDecoder
- type BoolDecoder
- type ByteBoolDecoder
- type ChunkedDecoder
- type ConstantDecoder
- type DateTimePartsDecoder
- type DecimalBytePartsDecoder
- type DecodeContext
- type Decoder
- type DeltaDecoder
- type DictDecoder
- type FSSTDecoder
- type FastLanesRLEDecoder
- type FoRDecoder
- type MaskedDecoder
- type NullDecoder
- type PcoDecoder
- type PrimitiveDecoder
- type RunEndDecoder
- type SequenceDecoder
- type SparseDecoder
- type StructDecoder
- type VarBinViewDecoder
- type ZigZagDecoder
- type ZstdDecoder
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func BoolValues ¶
BoolValues extracts boolean values from a decoded Bool array.
func DecodeArray ¶
func DecodeArray(node *vortex.ArrayNode, allBuffers [][]byte, dtype *vortex.DType, rowCount uint64) (*vortex.Array, error)
DecodeArray recursively decodes an ArrayNode tree into an Array. It resolves the encoding for each node and delegates to the appropriate decoder. allBuffers contains all buffers extracted from the segment; each node's BufferIndices select which buffers it uses.
func DecodeChild ¶
func DecodeChild(node *vortex.ArrayNode, childIdx int, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)
DecodeChild is a convenience for decoders that need to recursively decode a child node.
func ExportUnpackFastLanesTyped ¶
ExportUnpackFastLanesTyped is an exported wrapper around unpackFastLanesTyped for use in cross-package tests (e.g. writer pack→unpack round-trip verification).
func Float32Values ¶
Float32Values returns the f32 values from a Primitive array.
func Float64Values ¶
Float64Values returns the f64 values from a Primitive array.
func Int32Values ¶
Int32Values returns the i32 values from a Primitive array.
func Int64Values ¶
Int64Values returns the i64 values from a Primitive array.
func StringValues ¶
StringValues extracts string values from a VarBinView or Dict string array.
func Uint8Values ¶
Uint8Values returns the u8 values from a Primitive array.
func Uint16Values ¶
Uint16Values returns the u16 values from a Primitive array.
func Uint32Values ¶
Uint32Values returns the u32 values from a Primitive array.
Types ¶
type ALPDecoder ¶
type ALPDecoder struct{}
ALPDecoder decodes vortex.alp (Adaptive Lossless floating-Point) arrays.
ALP encodes floating-point values as integers using powers-of-10 scaling:
    encoded_int = round(float * 10^e / 10^f)
    float       = int(encoded_int) * 10^f * 10^(-e)
Children:
- child[0]: encoded integers (i32 for f32, i64 for f64)
- child[1..]: optional patches (indices, values, chunk_offsets) for values that don't round-trip cleanly through the ALP transform
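The inverse transform above can be sketched as follows. This is a minimal illustration for f64 only, not the package's implementation: decodeALP and its parameters are hypothetical names, and patch handling is left out.

```go
package main

import (
	"fmt"
	"math"
)

// decodeALP reverses the ALP transform for f64 values using the documented
// formula: float = encoded * 10^f * 10^(-e).
// Patches are ignored here; a real decoder would overwrite the patched
// positions with their stored exception values afterwards.
func decodeALP(encoded []int64, e, f int) []float64 {
	out := make([]float64, len(encoded))
	for i, v := range encoded {
		out[i] = float64(v) * math.Pow10(f) * math.Pow10(-e)
	}
	return out
}

func main() {
	// 1.23 and -4.5 encoded with e=2, f=0 are stored as 123 and -450.
	fmt.Println(decodeALP([]int64{123, -450}, 2, 0))
}
```

Values that do not survive this multiplication exactly are the ones the patches child is there to correct.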
type ALPRDDecoder ¶
type ALPRDDecoder struct{}
ALPRDDecoder decodes vortex.alprd (ALP Right-Digit / "Real Doubles") arrays.
ALPRD splits each float's bit representation into:
- left_parts: most-significant bits, dictionary-coded to u16 codes, then bitpacked
- right_parts: least-significant bits, bitpacked as uint (u32 for f32, u64 for f64)
Decoding:
- Decode left_parts child → u16 codes
- Decode right_parts child → uint values
- Dict-lookup left codes → u16 bit patterns
- Apply patches on left parts (exceptions not in the dictionary)
- Reconstruct: float = from_bits((uint(left_u16) << right_bit_width) | right)
Children: [left_parts, right_parts, patch_indices?, patch_values?]
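The final reconstruction step can be illustrated for f64 as below. This is a sketch under assumptions: reconstructF64 and splitF64 are hypothetical helpers, and the dictionary lookup and patching steps are assumed to have already produced the left bit pattern.

```go
package main

import (
	"fmt"
	"math"
)

// reconstructF64 joins a left bit pattern (after dictionary lookup and
// patching) with its right part: bits = (left << rightBitWidth) | right.
func reconstructF64(left uint16, right uint64, rightBitWidth uint) float64 {
	return math.Float64frombits((uint64(left) << rightBitWidth) | right)
}

// splitF64 is a hypothetical inverse used only to build example input.
func splitF64(v float64, rightBitWidth uint) (uint16, uint64) {
	bits := math.Float64bits(v)
	mask := (uint64(1) << rightBitWidth) - 1
	return uint16(bits >> rightBitWidth), bits & mask
}

func main() {
	left, right := splitF64(3.14159, 48)
	fmt.Println(reconstructF64(left, right, 48))
}
```

Because the split and the join operate on raw bits, the round trip is exact for any rightBitWidth that leaves at most 16 bits on the left.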
type BitPackedDecoder ¶
type BitPackedDecoder struct{}
BitPackedDecoder decodes fastlanes.bitpacked arrays.
FastLanes bitpacking stores N-bit integers in a dense buffer using a transposed layout designed for SIMD parallelism. The packed buffer is typed — it's an array of T-sized elements (where T matches the logical PType width) arranged across "lanes".
For a 1024-element block with element type T (having tBits = sizeof(T)*8 bits):
    nLanes      = 1024 / tBits
    rowsPerLane = tBits
The spiraldb/fastlanes Rust crate uses a reordered transposition (not the original FastLanes paper order). The mapping from (row, lane) → logical index is:
    FL_ORDER = [0, 4, 2, 6, 1, 5, 3, 7]
    logIdx   = FL_ORDER[row/8]*16 + (row%8)*128 + lane
The inverse (logical index → row, lane) is:
    lane = logIdx % nLanes
    s    = logIdx / 128
    fl   = (logIdx - s*128 - lane) / 16
    o    = FL_ORDER[fl]   // FL_ORDER is its own inverse
    row  = o*8 + s
Then the physical packed position for a value at (row, lane) with bit_width W:
    startBit = row * W
    word     = startBit / tBits
    shift    = startBit % tBits
    physIdx  = word * nLanes + lane
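The index arithmetic above translates directly into code. The sketch below uses hypothetical function names and only computes positions; it does not extract values, and it does not handle a value whose bits straddle two words.

```go
package main

import "fmt"

// flOrder is the FL_ORDER permutation quoted above; it is its own inverse.
var flOrder = [8]int{0, 4, 2, 6, 1, 5, 3, 7}

// logicalIndex maps a (row, lane) position to the logical element index.
func logicalIndex(row, lane int) int {
	return flOrder[row/8]*16 + (row%8)*128 + lane
}

// rowLane inverts logicalIndex for a block whose element type has tBits bits.
func rowLane(logIdx, tBits int) (row, lane int) {
	nLanes := 1024 / tBits
	lane = logIdx % nLanes
	s := logIdx / 128
	fl := (logIdx - s*128 - lane) / 16
	return flOrder[fl]*8 + s, lane
}

// physicalPos returns the packed word index and the bit shift at which the
// value at (row, lane) starts, for bit width w.
func physicalPos(row, lane, w, tBits int) (physIdx, shift int) {
	startBit := row * w
	nLanes := 1024 / tBits
	return (startBit/tBits)*nLanes + lane, startBit % tBits
}

func main() {
	row, lane := rowLane(69, 32)
	fmt.Println(row, lane, logicalIndex(row, lane)) // prints "8 5 69"
}
```

Iterating rowLane over all 1024 logical indices and feeding the result back into logicalIndex is a cheap way to check the two mappings against each other for each element width.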
type BoolDecoder ¶
type BoolDecoder struct{}
BoolDecoder decodes vortex.bool arrays. A Bool array has one buffer containing bit-packed booleans (1 bit per value).
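A minimal sketch of expanding such a buffer follows. The bit order is an assumption: the text above does not state it, and the example assumes least-significant-bit-first within each byte. unpackBools is a hypothetical name.

```go
package main

import "fmt"

// unpackBools expands n bit-packed booleans from buf.
// ASSUMPTION: bit i lives in byte i/8 at bit position i%8 (LSB first).
func unpackBools(buf []byte, n int) ([]bool, error) {
	if len(buf)*8 < n {
		return nil, fmt.Errorf("buffer holds %d bits, need %d", len(buf)*8, n)
	}
	out := make([]bool, n)
	for i := range out {
		out[i] = (buf[i/8]>>(uint(i)%8))&1 == 1
	}
	return out, nil
}

func main() {
	vals, err := unpackBools([]byte{0x05}, 3)
	fmt.Println(vals, err)
}
```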
type ByteBoolDecoder ¶
type ByteBoolDecoder struct{}
ByteBoolDecoder decodes vortex.bytebool arrays.
ByteBool encoding stores one byte per boolean value (0x00=false, 0x01=true). This is more compute-friendly than bitpacked bools for certain operations. Single buffer: one byte per element.
type ChunkedDecoder ¶
type ChunkedDecoder struct{}
ChunkedDecoder decodes vortex.chunked arrays (array-level, not layout-level). A Chunked array has N children, each a sub-array. The decoded array is the concatenation of all children.
type ConstantDecoder ¶
type ConstantDecoder struct{}
ConstantDecoder decodes vortex.constant arrays. A Constant array has no buffers — the scalar value is in the metadata (protobuf ScalarValue). One child may be present (the validity array).
type DateTimePartsDecoder ¶
type DateTimePartsDecoder struct{}
DateTimePartsDecoder decodes vortex.datetimeparts arrays.
DateTimeParts splits timestamps into three components for better compression:
    days:       number of days since the Unix epoch
    seconds:    seconds within the day (0–86399)
    subseconds: fractional part in the target time unit

Children: [0] days, [1] seconds, [2] subseconds
Metadata: DateTimePartsMetadata { days_ptype, seconds_ptype, subseconds_ptype }
The time unit is extracted from the Extension DType metadata:
    byte 0:    TimeUnit tag (0=ns, 1=µs, 2=ms, 3=s)
    bytes 1-2: timezone string length (u16 LE)
    bytes 3+:  timezone string (UTF-8)

Reconstruction, where divisor is the number of target time units per second:

    timestamp[i] = days[i]*86400*divisor + seconds[i]*divisor + subseconds[i]
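The reconstruction can be sketched as below. The mapping from TimeUnit tag to divisor is an assumption inferred from the formula (divisor taken as the number of target units per second), and both function names are hypothetical.

```go
package main

import "fmt"

// unitsPerSecond maps the TimeUnit tag (0=ns, 1=µs, 2=ms, 3=s) to the
// divisor used in the reconstruction formula.
// ASSUMPTION: divisor is the number of target time units in one second.
func unitsPerSecond(tag byte) (int64, error) {
	switch tag {
	case 0:
		return 1_000_000_000, nil // nanoseconds
	case 1:
		return 1_000_000, nil // microseconds
	case 2:
		return 1_000, nil // milliseconds
	case 3:
		return 1, nil // seconds
	}
	return 0, fmt.Errorf("unknown TimeUnit tag %d", tag)
}

// reconstructTimestamps applies
// timestamp[i] = days[i]*86400*divisor + seconds[i]*divisor + subseconds[i].
// The three slices must have equal length.
func reconstructTimestamps(days, seconds, subseconds []int64, divisor int64) []int64 {
	out := make([]int64, len(days))
	for i := range days {
		out[i] = days[i]*86400*divisor + seconds[i]*divisor + subseconds[i]
	}
	return out
}

func main() {
	div, _ := unitsPerSecond(2)
	// One day, one second and 500 ms after the epoch, in milliseconds.
	fmt.Println(reconstructTimestamps([]int64{1}, []int64{1}, []int64{500}, div))
}
```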
type DecimalBytePartsDecoder ¶
type DecimalBytePartsDecoder struct{}
DecimalBytePartsDecoder decodes vortex.decimal_byte_parts arrays.
This encoding stores decimal values as their fixed-point integer representation in a single child (the "most significant part" or MSP). The child is a signed integer primitive array whose values, when divided by 10^scale, give the actual decimal values.
Metadata: zeroth_child_ptype (PType of the child), lower_part_count (always 0 currently)
Children: [msp], a single signed integer array
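To illustrate the relationship between a stored integer and the decimal it represents, the sketch below renders msp / 10^scale exactly using math/big. decimalString is a hypothetical helper, and a non-negative scale is assumed.

```go
package main

import (
	"fmt"
	"math/big"
)

// decimalString renders the fixed-point integer msp as msp / 10^scale with
// exactly scale digits after the decimal point.
// ASSUMPTION: scale is non-negative.
func decimalString(msp int64, scale int) string {
	den := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(scale)), nil)
	return new(big.Rat).SetFrac(big.NewInt(msp), den).FloatString(scale)
}

func main() {
	fmt.Println(decimalString(12345, 2))
	fmt.Println(decimalString(-5, 2))
}
```

Using a rational type avoids the rounding that a float64 division would introduce for large values.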
type DecodeContext ¶
type DecodeContext struct {
// AllBuffers is the complete buffer set extracted from the segment.
AllBuffers [][]byte
}
DecodeContext carries all state needed for recursive array decoding.
type Decoder ¶
type Decoder interface {
// Decode decodes the given node. nodeBuffers are the buffers selected by the node's
// BufferIndices. ctx provides access to allBuffers for child decoding.
Decode(node *vortex.ArrayNode, nodeBuffers [][]byte, dtype *vortex.DType, rowCount uint64, ctx *DecodeContext) (*vortex.Array, error)
}
Decoder decodes an ArrayNode with its buffer data into a decoded Array.
type DeltaDecoder ¶
type DeltaDecoder struct{}
DeltaDecoder decodes fastlanes.delta arrays.
Delta encoding stores values as prefix-sums over a FastLanes-transposed layout. Each 1024-element chunk has LANES base values; the deltas are stored in transposed FL_ORDER. Decompression applies a per-lane prefix-sum (undelta) in transposed order, then untransposes to recover the original logical order.
For remainders (< 1024 elements), scalar prefix-sum is used.
Metadata: DeltaMetadata { deltas_len: u64, offset: u32 }
Children: [bases, deltas], both of the same unsigned integer ptype
type DictDecoder ¶
type DictDecoder struct{}
DictDecoder decodes vortex.dict arrays. A Dict array has two children:
- Child 0: codes — integer array of indices into the values dictionary
- Child 1: values — the dictionary (e.g., a VarBinView string array)
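Once both children are decoded, expansion is a lookup per row. A minimal sketch, with hypothetical names and the codes fixed to uint32 and the values to strings for simplicity:

```go
package main

import "fmt"

// dictLookup expands dictionary codes into values: out[i] = values[codes[i]].
func dictLookup(codes []uint32, values []string) ([]string, error) {
	out := make([]string, len(codes))
	for i, c := range codes {
		if int(c) >= len(values) {
			return nil, fmt.Errorf("code %d at row %d is out of range (%d values)", c, i, len(values))
		}
		out[i] = values[c]
	}
	return out, nil
}

func main() {
	out, err := dictLookup([]uint32{1, 0, 1, 2}, []string{"red", "green", "blue"})
	fmt.Println(out, err)
}
```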
type FSSTDecoder ¶
type FSSTDecoder struct{}
FSSTDecoder decodes vortex.fsst arrays.
FSST (Fast Static Symbol Table) compresses strings by replacing common substrings (1–8 bytes) with single-byte codes. The encoding stores:
    Buffer[0]: symbols: 8 bytes per symbol (up to 255 symbols, padded to 8 bytes)
    Buffer[1]: symbol_lengths: 1 byte per symbol (actual length of each symbol, 1–8)
    Buffer[2]: compressed_codes: concatenated compressed data for all strings
    Child[0]:  uncompressed_lengths: integer array of expected decompressed lengths
    Child[1]:  codes_offsets: integer array of N+1 VarBin offsets into buffer[2]
    Child[2]:  (optional) validity mask: boolean array
Decompression: for each byte in the compressed stream, if the byte value is less than the number of symbols, emit the corresponding symbol bytes. If the byte is 0xFF (escape code), emit the next byte literally.
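The decompression loop described above can be sketched for a single string as follows. fsstDecompress is a hypothetical name; slicing the compressed buffer by codes_offsets and checking uncompressed_lengths are left to the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// fsstDecompress expands one compressed string. symbols holds 8 bytes per
// symbol and lengths holds the real length (1–8) of each symbol.
func fsstDecompress(symbols, lengths, compressed []byte) ([]byte, error) {
	if len(symbols) < 8*len(lengths) {
		return nil, errors.New("symbol buffer shorter than 8 bytes per symbol")
	}
	var out []byte
	for i := 0; i < len(compressed); i++ {
		code := compressed[i]
		switch {
		case code == 0xFF: // escape: the next byte is a literal
			i++
			if i >= len(compressed) {
				return nil, errors.New("escape code at end of input")
			}
			out = append(out, compressed[i])
		case int(code) < len(lengths): // symbol code
			start := int(code) * 8
			out = append(out, symbols[start:start+int(lengths[code])]...)
		default:
			return nil, fmt.Errorf("code %d has no symbol", code)
		}
	}
	return out, nil
}

func main() {
	symbols := make([]byte, 16)
	copy(symbols[0:], "he")
	copy(symbols[8:], "llo")
	out, err := fsstDecompress(symbols, []byte{2, 3}, []byte{0, 1, 0xFF, '!'})
	fmt.Println(string(out), err)
}
```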
type FastLanesRLEDecoder ¶
type FastLanesRLEDecoder struct{}
FastLanesRLEDecoder decodes fastlanes.rle arrays.
FastLanes RLE stores data as 1024-element chunks. Within each chunk, values are dictionary-encoded: an indices array maps each position to a value in the values array. Multiple chunks share a single values array, with per-chunk offsets stored in values_idx_offsets.
Children: [0] values, [1] indices, [2] values_idx_offsets
Metadata: RLEMetadata { values_len, indices_len, indices_ptype, values_idx_offsets_len, values_idx_offsets_ptype, offset }
For element i in chunk c:
    global_val_idx = indices[c*1024 + i] + (offsets[c] - offsets[0])
    output[i]      = values[global_val_idx]
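The per-element lookup above can be sketched as below. Element types are fixed for brevity, decodeRLE is a hypothetical name, the output is indexed by global position rather than within-chunk position, and the metadata offset field is ignored (an assumption of this sketch, not a statement about the decoder).

```go
package main

import "fmt"

// decodeRLE expands n elements. For position pos in chunk c = pos/1024:
//   global = indices[pos] + (offsets[c] - offsets[0])
//   out[pos] = values[global]
// No bounds checking is done on indices, offsets or values.
func decodeRLE(values []int64, indices []uint16, offsets []uint64, n int) []int64 {
	out := make([]int64, n)
	for pos := 0; pos < n; pos++ {
		c := pos / 1024
		global := uint64(indices[pos]) + (offsets[c] - offsets[0])
		out[pos] = values[global]
	}
	return out
}

func main() {
	// A single short chunk: four elements drawn from three distinct values.
	fmt.Println(decodeRLE([]int64{10, 20, 30}, []uint16{0, 0, 1, 2}, []uint64{0}, 4))
}
```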
type FoRDecoder ¶
type FoRDecoder struct{}
FoRDecoder decodes fastlanes.for (Frame-of-Reference) arrays.
FoR encoding stores values as offsets from a reference (minimum) value. The metadata contains the reference value as a protobuf ScalarValue. The single child is the encoded offsets (typically bitpacked).
Decoding: decoded[i] = child[i] + reference
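As a small illustration of the formula, assuming int64 storage and relying on Go's wrapping integer addition (decodeFoR is a hypothetical name):

```go
package main

import "fmt"

// decodeFoR adds the reference back to every encoded offset:
// decoded[i] = child[i] + reference. Overflow wraps, as Go integer
// addition does.
func decodeFoR(child []int64, reference int64) []int64 {
	out := make([]int64, len(child))
	for i, v := range child {
		out[i] = v + reference
	}
	return out
}

func main() {
	// Offsets 0, 3, 7 from a minimum of 1000.
	fmt.Println(decodeFoR([]int64{0, 3, 7}, 1000))
}
```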
type MaskedDecoder ¶
type MaskedDecoder struct{}
MaskedDecoder decodes vortex.masked arrays. A Masked array has two children:
- Child 0: values — the actual data array
- Child 1: mask — boolean validity array (true = valid, false = null)
type NullDecoder ¶
type NullDecoder struct{}
NullDecoder decodes vortex.null arrays. A Null array has no buffers — it represents rowCount null values.
type PcoDecoder ¶
type PcoDecoder struct{}
PcoDecoder decodes vortex.pco arrays using the pure-Go pcodec package.
Vortex PcoArray layout:
Metadata (protobuf): PcoMetadata { header, chunks[] { pages[] { n_values } } }
Buffers: [chunk_meta_0, ..., chunk_meta_N, page_0, page_1, ..., page_M]
First len(chunks) buffers are chunk metadata (pco ChunkMeta bytes).
Remaining buffers are page data (ordered by chunk, then page within chunk).
Children: 0 or 1 (optional validity bitmap).
type PrimitiveDecoder ¶
type PrimitiveDecoder struct{}
PrimitiveDecoder decodes vortex.primitive arrays. A Primitive array has one buffer containing raw typed values (little-endian).
type RunEndDecoder ¶
type RunEndDecoder struct{}
RunEndDecoder decodes vortex.runend arrays.
RunEnd encoding stores runs of identical values as (end_position, value) pairs.

Children: [0] ends (integer array of run end positions), [1] values (one per run).
Metadata: RunEndMetadata { ends_ptype, num_runs, offset }

To expand: for each output position i, find the first run where ends[run] > i+offset, then output values[run].
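The expansion rule can be sketched with a single forward scan over the runs. expandRunEnd is a hypothetical name, and the element types are fixed for brevity.

```go
package main

import "fmt"

// expandRunEnd produces rowCount values. For output position i it selects
// the first run whose end is greater than i+offset. Because positions only
// increase, the run cursor never needs to move backwards.
func expandRunEnd(ends []uint64, values []string, offset uint64, rowCount int) ([]string, error) {
	if len(ends) != len(values) {
		return nil, fmt.Errorf("%d ends but %d values", len(ends), len(values))
	}
	out := make([]string, rowCount)
	run := 0
	for i := range out {
		pos := uint64(i) + offset
		for run < len(ends) && ends[run] <= pos {
			run++
		}
		if run == len(ends) {
			return nil, fmt.Errorf("position %d is past the last run end", pos)
		}
		out[i] = values[run]
	}
	return out, nil
}

func main() {
	out, err := expandRunEnd([]uint64{3, 5, 8}, []string{"a", "b", "c"}, 2, 4)
	fmt.Println(out, err)
}
```

A binary search per position would also work and may suit random access better; the linear cursor is simply the shortest form for a full expansion.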
type SequenceDecoder ¶
type SequenceDecoder struct{}
SequenceDecoder decodes vortex.sequence arrays. A Sequence array has no buffers and no children. It represents the formula:
A[i] = base + i * multiplier
The base and multiplier are protobuf ScalarValues stored in the metadata.
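For illustration, with int64 standing in for whatever integer type the ScalarValues carry (decodeSequence is a hypothetical name):

```go
package main

import "fmt"

// decodeSequence materializes A[i] = base + i*multiplier for rowCount rows.
func decodeSequence(base, multiplier int64, rowCount int) []int64 {
	out := make([]int64, rowCount)
	for i := range out {
		out[i] = base + int64(i)*multiplier
	}
	return out
}

func main() {
	fmt.Println(decodeSequence(10, 3, 4))
}
```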
type SparseDecoder ¶
type SparseDecoder struct{}
SparseDecoder decodes vortex.sparse arrays.
Sparse encoding stores a fill value (in a buffer) plus patches at specific indices that override the fill value.

Children: [0] patch_indices (integer array), [1] patch_values (same dtype as column).
Buffers: [0] fill value as protobuf ScalarValue.
Metadata: SparseMetadata { patches: PatchesMetadata }
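Expansion amounts to filling the output and then applying the patches. A minimal sketch with hypothetical names, int64 values, and the fill value assumed to be already parsed from its ScalarValue:

```go
package main

import "fmt"

// decodeSparse fills rowCount slots with fill, then overrides the slots
// named by patchIndices with the matching patchValues.
func decodeSparse(fill int64, patchIndices []uint64, patchValues []int64, rowCount int) ([]int64, error) {
	if len(patchIndices) != len(patchValues) {
		return nil, fmt.Errorf("%d indices but %d values", len(patchIndices), len(patchValues))
	}
	out := make([]int64, rowCount)
	for i := range out {
		out[i] = fill
	}
	for i, idx := range patchIndices {
		if idx >= uint64(rowCount) {
			return nil, fmt.Errorf("patch index %d is out of range", idx)
		}
		out[idx] = patchValues[i]
	}
	return out, nil
}

func main() {
	out, err := decodeSparse(0, []uint64{1, 4}, []int64{7, 9}, 6)
	fmt.Println(out, err)
}
```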
type StructDecoder ¶
type StructDecoder struct{}
StructDecoder decodes vortex.struct arrays (array-level). A Struct array has one child per field, no buffers of its own.
type VarBinViewDecoder ¶
type VarBinViewDecoder struct{}
VarBinViewDecoder decodes vortex.varbinview arrays. This is the canonical encoding for Utf8 and Binary types.
Buffer layout:
- Buffer 0: views — one 16-byte view descriptor per string
- Buffers 1..N: data buffers containing string bytes for long strings
View descriptor (16 bytes):
- length: uint32 (byte length of the string)
- If length <= 12: next 12 bytes are the string data inline
- If length > 12: next 4 bytes are a prefix, then uint32 buffer_index + uint32 offset
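Reading one view descriptor can be sketched as below. Two points are assumptions rather than facts from the text above: the integer fields are read as little-endian, and buffer_index is taken to count from the first data buffer (Buffer 1). viewString is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// viewString resolves a 16-byte view against the data buffers.
// ASSUMPTIONS: little-endian fields; dataBuffers[0] is the first data buffer.
func viewString(view []byte, dataBuffers [][]byte) (string, error) {
	if len(view) != 16 {
		return "", fmt.Errorf("view is %d bytes, want 16", len(view))
	}
	n := binary.LittleEndian.Uint32(view[0:4])
	if n <= 12 {
		// Short string: the bytes are stored inline after the length.
		return string(view[4 : 4+n]), nil
	}
	// Long string: 4-byte prefix, then buffer index and offset.
	bufIdx := binary.LittleEndian.Uint32(view[8:12])
	off := binary.LittleEndian.Uint32(view[12:16])
	if int(bufIdx) >= len(dataBuffers) {
		return "", fmt.Errorf("buffer index %d is out of range", bufIdx)
	}
	buf := dataBuffers[bufIdx]
	if uint64(off)+uint64(n) > uint64(len(buf)) {
		return "", fmt.Errorf("string at offset %d, length %d exceeds the buffer", off, n)
	}
	return string(buf[off : off+n]), nil
}

func main() {
	inline := []byte{5, 0, 0, 0, 'h', 'e', 'l', 'l', 'o', 0, 0, 0, 0, 0, 0, 0}
	s, err := viewString(inline, nil)
	fmt.Println(s, err)
}
```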
type ZigZagDecoder ¶
type ZigZagDecoder struct{}
ZigZagDecoder decodes vortex.zigzag arrays.
ZigZag encoding maps signed integers to unsigned integers so that small absolute values have small encoded values (good for bitpacking).

Inverse:

    signed = (unsigned >> 1) ^ -(unsigned & 1)
Single child: the unsigned integer array.
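The inverse formula maps directly onto Go's integer operators. A sketch for 64-bit values (zigzagDecode is a hypothetical name):

```go
package main

import "fmt"

// zigzagDecode applies signed = (unsigned >> 1) ^ -(unsigned & 1).
// Even inputs map to non-negative values, odd inputs to negative ones.
func zigzagDecode(u uint64) int64 {
	return int64(u>>1) ^ -int64(u&1)
}

func main() {
	for _, u := range []uint64{0, 1, 2, 3, 4} {
		fmt.Println(u, zigzagDecode(u))
	}
}
```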
type ZstdDecoder ¶
type ZstdDecoder struct{}
ZstdDecoder decodes vortex.zstd arrays.
Zstd compresses primitive or string data using the Zstandard algorithm. An optional shared dictionary improves compression across multiple frames.
Buffers: [dictionary?] [frame0] [frame1] ...
If dictionary_size > 0, the first buffer is the Zstd dictionary. Remaining buffers are compressed frames.
Metadata: ZstdMetadata { dictionary_size(u32), frames: [ZstdFrameMetadata] }
Each frame has: uncompressed_size(u64), n_values(u64)
Children: 0 or 1 (validity bitmap if nullable).
For primitive dtypes: frames decompress to raw typed bytes. For string dtypes: frames decompress to length-prefixed strings (u32 LE length + bytes), which are reconstructed into VarBinView format.
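The string case can be illustrated once a frame has been decompressed (decompression itself is out of scope here, since it needs a Zstandard library). The sketch below splits the length-prefixed bytes into Go strings; splitLengthPrefixed is a hypothetical name, and it stops short of building VarBinView views.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// splitLengthPrefixed reads nValues strings from an already-decompressed
// frame, each stored as a u32 little-endian length followed by its bytes.
func splitLengthPrefixed(frame []byte, nValues int) ([]string, error) {
	out := make([]string, 0, nValues)
	pos := 0
	for i := 0; i < nValues; i++ {
		if len(frame)-pos < 4 {
			return nil, fmt.Errorf("value %d: truncated length prefix", i)
		}
		n := int(binary.LittleEndian.Uint32(frame[pos:]))
		pos += 4
		if len(frame)-pos < n {
			return nil, fmt.Errorf("value %d: need %d bytes, have %d", i, n, len(frame)-pos)
		}
		out = append(out, string(frame[pos:pos+n]))
		pos += n
	}
	return out, nil
}

func main() {
	frame := []byte{2, 0, 0, 0, 'h', 'i', 3, 0, 0, 0, 'a', 'b', 'c'}
	out, err := splitLengthPrefixed(frame, 2)
	fmt.Println(out, err)
}
```

The n_values field of each frame's metadata is what would supply nValues in this sketch.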