intcomp

package module
v1.1.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 18, 2025 License: Apache-2.0 Imports: 2 Imported by: 3

README

Integer Compression

Go Reference

This library provides high performance (GB/s) compression and decompression of integers (int32/uint32/int64/uint64).

Good compression factor can be achieved when, on average, the difference between 2 consecutive values of the input remains small and thus can be encoded with fewer bits.

Common use cases:

  • Timestamps
  • Offsets
  • Counter based identifiers

The encoding schemes used here are based on Dr. Daniel Lemire research.

Encoding Logic

Data is encoded in blocks of multiple of 128x32bit or 256x64bits inputs in the following manner:

  1. Difference of consecutive inputs is computed (differential coding)
  2. ZigZag encoding is applied if a block contains at least one negative delta value
  3. The result is bit packed into the optimal number of bits for the block

The remaining input that won't fit within a 128x32bits or 256x64bits block will be encoded in an additional block using Variable Byte encoding (with delta)

Append to compressed arrays

In stream processing systems data is usually received by chunks. Compressing and aggregating small chunks can be inneficient and impractical.

This API provides a convenient way to handle such inputs: When adding data to a compressed buffer, if the last block is a small block, encoded with Variable Byte, it will be rewritten in order to provide better compression using bit packing.

Encoding of timestamps with nanosecond resolution

Timestamps with nanosecond resolution sometimes have an actual lower internal resolution (eg. microsecond). To provide better compression for that type of data, the encoding algorithm for int64 has a specific optimization that will provide better compression factor in such case.

Usage

input := []int32{1, 2, 3}

// compress
compressed := intcomp.CompressInt32(input, nil)
// compress more data (append)
compressed = intcomp.CompressInt32([]int32{4, 5, 6}, compressed)

// uncompress
data := intcomp.UncompressInt32(compressed, nil)
// data: [1, 2, 3, 4, 5, 6]

Performance

Benchmarks for the bitpacking compression/decompression (MacBook pro M1). The result vary depending on the number of bits used to encode integers.

Compression
  • 32bits: between 4.0 and 7.2 GB/s
  • 64bits: between 8.0 and 14.8 GB/s
Decompression
  • 32bits: between 3.6 and 11.5 GB/s
  • 64bits: between 14.9 and 24.0 GB/s

TODO

  • Support float32/64 (using something similar to Gorilla compression)
  • Force creation of blocks at fixed boundaries to enable arbitrary position decoding and reversed iteration
  • Implement Block iterators with low memory usage
  • Add Binary search for sorted arrays

Documentation

Index

Constants

View Source
const (
	BitPackingBlockSize32 = 128
	BitPackingBlockSize64 = 256
)

Variables

This section is empty.

Functions

func CompressDeltaBinPackInt32

func CompressDeltaBinPackInt32(in []int32, out []uint32) ([]int32, []uint32)

CompressDeltaBinPackInt32 compress blocks of 128 integers from `in` and append to `out`. `out` slice will be resized if necessary. The function returns the values from `in` that were not compressed (could not fit into a block), and the updated `out` slice.

Compression logic:

  1. Difference of consecutive inputs is computed (differential coding)
  2. ZigZag encoding is applied if a block contains at least one negative delta value
  3. The result is bit packed into the optimal number of bits for the block

func CompressDeltaBinPackInt64

func CompressDeltaBinPackInt64(in []int64, out []uint64) ([]int64, []uint64)

CompressDeltaBinPackInt64 compress blocks of 256 integers from `in` and append to `out`. `out` slice will be resized if necessary. The function returns the values from `in` that were not compressed (could not fit into a block), and the updated `out` slice.

Compression logic:

  1. Difference of consecutive inputs is computed (differential coding)
  2. ZigZag encoding is applied if a block contains at least one negative delta value
  3. The result is bit packed into the optimal number of bits for the block

func CompressDeltaBinPackUint32

func CompressDeltaBinPackUint32(in, out []uint32) ([]uint32, []uint32)

CompressDeltaBinPackUint32 compress blocks of 128 integers from `in` and append to `out`. `out` slice will be resized if necessary. The function returns the values from `in` that were not compressed (could not fit into a block), and the updated `out` slice.

Compression logic:

  1. Difference of consecutive inputs is computed (differential coding)
  2. ZigZag encoding is applied if a block contains at least one negative delta value
  3. The result is bit packed into the optimal number of bits for the block

func CompressDeltaBinPackUint64

func CompressDeltaBinPackUint64(in, out []uint64) ([]uint64, []uint64)

CompressDeltaBinPackUint64 compress blocks of 256 integers from `in` and append to `out`. `out` slice will be resized if necessary. The function returns the values from `in` that were not compressed (could not fit into a block), and the updated `out` slice.

Compression logic:

  1. Difference of consecutive inputs is computed (differential coding)
  2. ZigZag encoding is applied if a block contains at least one negative delta value
  3. The result is bit packed into the optimal number of bits for the block

func CompressDeltaVarByteInt32

func CompressDeltaVarByteInt32(in []int32, out []uint32) []uint32

CompressDeltaVarByteInt32 compress integers from `in` and append to `out`. `out` slice will be resized if necessary, the modified slice is returned.

Compression logic:

  1. Difference of consecutive inputs is computed (differential coding)
  2. Variable byte encoding is applied

func CompressDeltaVarByteInt64

func CompressDeltaVarByteInt64(in []int64, out []uint64) []uint64

CompressDeltaVarByteInt64 compress integers from `in` and append to `out`. `out` slice will be resized if necessary, the modified slice is returned.

Compression logic:

  1. Difference of consecutive inputs is computed (differential coding)
  2. Variable byte encoding is applied

func CompressDeltaVarByteUint32

func CompressDeltaVarByteUint32(in, out []uint32) []uint32

CompressDeltaVarByteUint32 compress integers from `in` and append to `out`. `out` slice will be resized if necessary, the modified slice is returned.

Compression logic:

  1. Difference of consecutive inputs is computed (differential coding)
  2. Variable byte encoding is applied

func CompressDeltaVarByteUint64

func CompressDeltaVarByteUint64(in, out []uint64) []uint64

CompressDeltaVarByteUint64 compress integers from `in` and append to `out`. `out` slice will be resized if necessary, the modified slice is returned.

Compression logic:

  1. Difference of consecutive inputs is computed (differential coding)
  2. Variable byte encoding is applied

func CompressInt32

func CompressInt32(in []int32, out []uint32) []uint32

CompressInt32 compress integers from `in` and append to `out`. `out` slice will be resized if necessary, the modified slice is returned.

Compression logic:

  1. Compress as many input as possible using `CompressDeltaBinPackInt32`
  2. Compress the remaining input with `CompressDeltaBinPackInt32`

func CompressInt64

func CompressInt64(in []int64, out []uint64) []uint64

CompressInt64 compress integers from `in` and append to `out`. `out` slice will be resized if necessary, the modified slice is returned.

Compression logic:

  1. Compress as many input as possible using `CompressDeltaBinPackInt64`
  2. Compress the remaining input with `CompressDeltaBinPackInt64`

func CompressUint32

func CompressUint32(in, out []uint32) []uint32

CompressUint32 compress integers from `in` and append to `out`. `out` slice will be resized if necessary, the modified slice is returned.

Compression logic:

  1. Compress as many input as possible using `CompressDeltaBinPackUint32`
  2. Compress the remaining input with `CompressDeltaBinPackUint32`

func CompressUint64

func CompressUint64(in, out []uint64) []uint64

CompressUint64 compress integers from `in` and append to `out`. `out` slice will be resized if necessary, the modified slice is returned.

Compression logic:

  1. Compress as many input as possible using `CompressDeltaBinPackUint64`
  2. Compress the remaining input with `CompressDeltaBinPackUint64`

func UncompressDeltaBinPackInt32

func UncompressDeltaBinPackInt32(in []uint32, out []int32) ([]uint32, []int32)

UncompressDeltaBinPackInt32 uncompress one ore more blocks of 128 integers from `in` and append the result to `out`. `out` slice will be resized if necessary. The function returns the values from `in` that were not uncompressed, and the updated `out` slice.

func UncompressDeltaBinPackInt64

func UncompressDeltaBinPackInt64(in []uint64, out []int64) ([]uint64, []int64)

UncompressDeltaBinPackInt64 uncompress one ore more blocks of 256 integers from `in` and append the result to `out`. `out` slice will be resized if necessary. The function returns the values from `in` that were not uncompressed, and the updated `out` slice.

func UncompressDeltaBinPackUint32

func UncompressDeltaBinPackUint32(in, out []uint32) ([]uint32, []uint32)

UncompressDeltaBinPackUint32 uncompress one ore more blocks of 128 integers from `in` and append the result to `out`. `out` slice will be resized if necessary. The function returns the values from `in` that were not uncompressed, and the updated `out` slice.

func UncompressDeltaBinPackUint64

func UncompressDeltaBinPackUint64(in, out []uint64) ([]uint64, []uint64)

UncompressDeltaBinPackUint64 uncompress one ore more blocks of 256 integers from `in` and append the result to `out`. `out` slice will be resized if necessary. The function returns the values from `in` that were not uncompressed, and the updated `out` slice.

func UncompressDeltaVarByteInt32

func UncompressDeltaVarByteInt32(in []uint32, out []int32) ([]uint32, []int32)

UncompressDeltaVarByteInt32 uncompress one block of integers from `in` (compressed with `CompressDeltaVarByteInt32`) and append the result to `out`. `out` slice will be resized if necessary.

The function returns the values from `in` that were not uncompressed, and the updated `out` slice.

func UncompressDeltaVarByteInt64

func UncompressDeltaVarByteInt64(in []uint64, out []int64) ([]uint64, []int64)

UncompressDeltaVarByteInt64 uncompress one block of integers from `in` (compressed with `CompressDeltaVarByteInt32`) and append the result to `out`. `out` slice will be resized if necessary.

The function returns the values from `in` that were not uncompressed, and the updated `out` slice.

func UncompressDeltaVarByteUint32

func UncompressDeltaVarByteUint32(in, out []uint32) ([]uint32, []uint32)

UncompressDeltaVarByteUint32 uncompress one block of integers from `in` (compressed with `CompressDeltaVarByteInt32`) and append the result to `out`. `out` slice will be resized if necessary.

The function returns the values from `in` that were not uncompressed, and the updated `out` slice.

func UncompressDeltaVarByteUint64

func UncompressDeltaVarByteUint64(in, out []uint64) ([]uint64, []uint64)

UncompressDeltaVarByteUint64 uncompress one block of integers from `in` (compressed with `CompressDeltaVarByteInt32`) and append the result to `out`. `out` slice will be resized if necessary.

The function returns the values from `in` that were not uncompressed, and the updated `out` slice.

func UncompressInt32

func UncompressInt32(in []uint32, out []int32) []int32

UncompressInt32 uncompress integers from `in` (compressed with `CompressInt32`) and append the result to `out`. `out` slice will be resized if necessary.

func UncompressInt64

func UncompressInt64(in []uint64, out []int64) []int64

UncompressInt64 uncompress integers from `in` (compressed with `CompressInt64`) and append the result to `out`. `out` slice will be resized if necessary.

func UncompressUint32

func UncompressUint32(in, out []uint32) []uint32

UncompressUint32 uncompress integers from `in` (compressed with `CompressUint32`) and append the result to `out`. `out` slice will be resized if necessary.

func UncompressUint64

func UncompressUint64(in, out []uint64) []uint64

UncompressUint64 uncompress integers from `in` (compressed with `CompressUint64`) and append the result to `out`. `out` slice will be resized if necessary.

Types

This section is empty.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL