encoding

package
v0.7.1 Latest
Warning

This package is not in the latest version of its module.

Published: May 15, 2026 License: MIT Imports: 8 Imported by: 0

Documentation

Overview

Package encoding handles the binary .pulse file format: reading, writing, and schema management.

Index

Constants

const FormatVersion byte = 0x01

FormatVersion is the current .pulse format version.

const HeaderSize = 9

HeaderSize is the total byte size of the file header (magic + version).

const MaxDecimalPrecision = 38

MaxDecimalPrecision is the upper bound on decimal128 precision.

const MaxDescriptionBytes = 1000

MaxDescriptionBytes is the maximum allowed byte length for a field description.

const MinDecimalScale = 4

MinDecimalScale is the floor on division result scale (matches Arrow).

Variables

var MagicBytes = [8]byte{'P', 'U', 'L', 'S', 'E', 0x00, 0x00, 0x00}

MagicBytes identifies a .pulse file. 8 bytes: "PULSE\x00\x00\x00"

Functions

func CrossesAntimeridian added in v0.2.0

func CrossesAntimeridian(pts []PointF64) bool

CrossesAntimeridian reports whether a set of points spans the antimeridian. Per the plan, a set spans it when it contains any pair of points with |lon_a - lon_b| > 180.
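Read literally, the definition checks every pair of points, which is O(n²). But some pair differs by more than 180 exactly when the overall longitude range (max minus min) does, so one pass suffices. A minimal sketch under that observation; `Point` is a stand-in for PointF64 with the same field names, and the package's own implementation may differ.

```go
package main

// Point mirrors encoding.PointF64 (degrees) so the sketch compiles on its own.
type Point struct {
	Lat float64
	Lon float64
}

// crossesAntimeridian reports whether any two points have longitudes more than
// 180 degrees apart. The largest pairwise gap is max(lon) - min(lon), so a
// single pass over the points is enough.
func crossesAntimeridian(pts []Point) bool {
	if len(pts) < 2 {
		return false
	}
	minLon, maxLon := pts[0].Lon, pts[0].Lon
	for _, p := range pts[1:] {
		if p.Lon < minLon {
			minLon = p.Lon
		}
		if p.Lon > maxLon {
			maxLon = p.Lon
		}
	}
	return maxLon-minLon > 180
}
```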

func EncodeDecimal128 added in v0.2.0

func EncodeDecimal128(d Decimal128) [16]byte

EncodeDecimal128 serializes a Decimal128 as 16 bytes of two's-complement little-endian integer.

func EncodeH3Cell added in v0.2.0

func EncodeH3Cell(c H3Cell) [8]byte

EncodeH3Cell serializes an H3 cell as 8 little-endian bytes.

func EncodePointF64 added in v0.2.0

func EncodePointF64(p PointF64) [16]byte

EncodePointF64 packs the point into 16 little-endian bytes: lat, lon.
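A sketch of that layout, assuming Lat occupies bytes 0-7 and Lon bytes 8-15 (the order stated above), each as IEEE 754 binary64 bits; DecodePointF64 would be the inverse. `Point` and the function names are stand-ins, not the package's API.

```go
package main

import (
	"encoding/binary"
	"math"
)

type Point struct {
	Lat float64
	Lon float64
}

// encodePoint stores Lat then Lon as little-endian float64 bit patterns.
func encodePoint(p Point) [16]byte {
	var buf [16]byte
	binary.LittleEndian.PutUint64(buf[0:8], math.Float64bits(p.Lat))
	binary.LittleEndian.PutUint64(buf[8:16], math.Float64bits(p.Lon))
	return buf
}

// decodePoint reverses encodePoint.
func decodePoint(buf [16]byte) Point {
	return Point{
		Lat: math.Float64frombits(binary.LittleEndian.Uint64(buf[0:8])),
		Lon: math.Float64frombits(binary.LittleEndian.Uint64(buf[8:16])),
	}
}
```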

func FormatH3CellHex added in v0.2.0

func FormatH3CellHex(c H3Cell) string

FormatH3CellHex returns the canonical 15-char hex representation.

func FormatWKTPoint added in v0.2.0

func FormatWKTPoint(p PointF64) string

FormatWKTPoint renders a PointF64 as `POINT(lon lat)`.

func HaversineMeters added in v0.2.0

func HaversineMeters(a, b PointF64) float64

func NullDecimalSentinel added in v0.2.0

func NullDecimalSentinel() [16]byte

NullDecimalSentinel returns a copy of the canonical 16-byte NULL pattern for nullable_decimal128 fields.

func PromoteAdd added in v0.2.0

func PromoteAdd(p1, s1, p2, s2 uint8) (uint8, uint8)

PromoteAdd returns the (precision, scale) of a SUM/SUB result given two operand types per SQL:2016 / Arrow Decimal128 rules:

(p1, s1) ± (p2, s2) => (max(p1-s1, p2-s2) + max(s1, s2) + 1, max(s1, s2))

The result precision is clamped at MaxDecimalPrecision; because of this clamping, callers must check ClampedPrecision and emit PULSE_DECIMAL_OVERFLOW when an overflow surfaces at runtime.

func PromoteDiv added in v0.2.0

func PromoteDiv(p1, s1, p2, s2 uint8) (uint8, uint8)

PromoteDiv returns the (precision, scale) of a DIV result.

(p1, s1) ÷ (p2, s2) => (p1 + s2 + 1, max(s1+s2, MinDecimalScale))

func PromoteMul added in v0.2.0

func PromoteMul(p1, s1, p2, s2 uint8) (uint8, uint8)

PromoteMul returns the (precision, scale) of a MUL result.

(p1, s1) × (p2, s2) => (p1 + p2, s1 + s2)
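The three promotion rules can be transcribed directly. A sketch, not the package's code: it clamps precision at 38 (MaxDecimalPrecision) for all three operations, whereas the documentation above only states clamping for PromoteAdd, and the scale floor in the DIV rule is taken to be MinDecimalScale (4). Arithmetic is done in int to avoid uint8 wraparound.

```go
package main

const (
	maxDecimalPrecision = 38 // mirrors MaxDecimalPrecision
	minDecimalScale     = 4  // mirrors MinDecimalScale
)

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// clampPrecision caps a computed precision at maxDecimalPrecision.
func clampPrecision(p int) uint8 {
	if p > maxDecimalPrecision {
		return maxDecimalPrecision
	}
	return uint8(p)
}

// promoteAdd: (max(p1-s1, p2-s2) + max(s1, s2) + 1, max(s1, s2)).
func promoteAdd(p1, s1, p2, s2 uint8) (uint8, uint8) {
	scale := maxInt(int(s1), int(s2))
	intDigits := maxInt(int(p1)-int(s1), int(p2)-int(s2))
	return clampPrecision(intDigits + scale + 1), uint8(scale)
}

// promoteMul: (p1 + p2, s1 + s2).
func promoteMul(p1, s1, p2, s2 uint8) (uint8, uint8) {
	return clampPrecision(int(p1) + int(p2)), s1 + s2
}

// promoteDiv: (p1 + s2 + 1, max(s1 + s2, MinDecimalScale)); p2 does not appear in the rule.
func promoteDiv(p1, s1, p2, s2 uint8) (uint8, uint8) {
	return clampPrecision(int(p1) + int(s2) + 1), uint8(maxInt(int(s1)+int(s2), minDecimalScale))
}
```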

func ReadBit

func ReadBit(r io.Reader, bitPos uint) (bool, error)

ReadBit reads a single bit from the byte at the current position in r. bitPos is 0-7 within that byte.

func ReadDescription

func ReadDescription(r io.Reader) (string, error)

ReadDescription reads a field description from r. Format: u16 length + utf8 bytes.

func ReadFieldValue

func ReadFieldValue(r io.Reader, ft FieldType) (uint64, error)

ReadFieldValue reads a single field value from r, returning raw bits as uint64. For packed types (PackedBool, NullableBool, NullableU4), use ReadBit/ReadNibble instead.

func ReadHeader

func ReadHeader(r io.Reader) error

ReadHeader reads and validates the .pulse file header from r.

func ReadNibble

func ReadNibble(r io.Reader, high bool) (uint8, error)

ReadNibble reads a 4-bit value from a byte. If high is true, it reads bits 4-7; otherwise bits 0-3.

func ValidatePrecisionScale added in v0.2.0

func ValidatePrecisionScale(precision, scale uint8) error

ValidatePrecisionScale reports whether (precision, scale) form a legal decimal128 type spec (1 ≤ precision ≤ 38, 0 ≤ scale ≤ precision).
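The range check is small enough to state as code. A sketch; the error messages are hypothetical, since the package's error codes for this case are not documented here. Because scale is unsigned, 0 ≤ scale holds by construction.

```go
package main

import "fmt"

// validatePrecisionScale checks 1 <= precision <= 38 and scale <= precision.
func validatePrecisionScale(precision, scale uint8) error {
	if precision < 1 || precision > 38 {
		return fmt.Errorf("decimal128 precision %d out of range [1, 38]", precision)
	}
	if scale > precision {
		return fmt.Errorf("decimal128 scale %d exceeds precision %d", scale, precision)
	}
	return nil
}
```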

func WriteBit

func WriteBit(w io.Writer, bitPos uint, val bool) error

WriteBit writes a byte with a single bit set or cleared at bitPos.
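A sketch of the single-bit helpers. Two assumptions the documentation does not settle: bit 0 is the least-significant bit, and WriteBit leaves every other bit of the written byte at zero (as WriteNibble does for its unused nibble). Each call consumes or produces exactly one byte.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// readBit consumes one byte from r and reports whether bit bitPos (0 = LSB) is set.
func readBit(r io.Reader, bitPos uint) (bool, error) {
	if bitPos > 7 {
		return false, fmt.Errorf("bit position %d out of range 0-7", bitPos)
	}
	var b [1]byte
	if _, err := io.ReadFull(r, b[:]); err != nil {
		return false, err
	}
	return b[0]&(1<<bitPos) != 0, nil
}

// writeBit writes one byte whose only possibly-set bit is bitPos.
func writeBit(w io.Writer, bitPos uint, val bool) error {
	if bitPos > 7 {
		return fmt.Errorf("bit position %d out of range 0-7", bitPos)
	}
	var b byte
	if val {
		b = 1 << bitPos
	}
	_, err := w.Write([]byte{b})
	return err
}

// roundTripBit writes val at bitPos, then reads it back; it also returns the raw byte.
func roundTripBit(bitPos uint, val bool) (byte, bool, error) {
	var buf bytes.Buffer
	if err := writeBit(&buf, bitPos, val); err != nil {
		return 0, false, err
	}
	raw := buf.Bytes()[0]
	got, err := readBit(&buf, bitPos)
	return raw, got, err
}
```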

func WriteDecimal128 added in v0.2.0

func WriteDecimal128(w io.Writer, d Decimal128) error

WriteDecimal128 writes a Decimal128 to the .pulse record stream.

func WriteDecimal128Null added in v0.2.0

func WriteDecimal128Null(w io.Writer) error

WriteDecimal128Null writes the nullable_decimal128 NULL sentinel.

func WriteDescription

func WriteDescription(w io.Writer, desc string) error

WriteDescription writes a field description to w. Format: u16 length + utf8 bytes. Returns PULSE_IMPORT_DESCRIPTION_TOO_LONG if the UTF-8 byte length exceeds MaxDescriptionBytes.

func WriteFieldValue

func WriteFieldValue(w io.Writer, ft FieldType, val uint64) error

WriteFieldValue writes a single field value (as raw bits in uint64) to w. For packed types (PackedBool, NullableBool, NullableU4), use WriteBit/WriteNibble instead. For decimal128 / nullable_decimal128 / point_f64 (16-byte types), use the dedicated WriteDecimal128 / WritePointF64 helpers; this function will reject those types with ENCODING_TYPE_MISMATCH.

func WriteH3Cell added in v0.2.0

func WriteH3Cell(w io.Writer, c H3Cell) error

WriteH3Cell writes an H3 cell to the record stream.

func WriteHeader

func WriteHeader(w io.Writer) error

WriteHeader writes the .pulse file header to w.

func WriteNibble

func WriteNibble(w io.Writer, high bool, val uint8) error

WriteNibble writes a 4-bit value into a byte. If high is true, it writes to bits 4-7; otherwise bits 0-3. The other nibble is zero.
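ReadNibble and WriteNibble follow directly from the bit ranges above. A sketch; rejecting values above 15 in the writer is an assumption, since the documentation does not say what happens to out-of-range input.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// writeNibble packs a 4-bit value into bits 4-7 (high) or bits 0-3 of one byte;
// the other nibble is zero.
func writeNibble(w io.Writer, high bool, val uint8) error {
	if val > 0x0F {
		return fmt.Errorf("nibble value %d does not fit in 4 bits", val)
	}
	b := val
	if high {
		b = val << 4
	}
	_, err := w.Write([]byte{b})
	return err
}

// readNibble consumes one byte and extracts bits 4-7 (high) or bits 0-3.
func readNibble(r io.Reader, high bool) (uint8, error) {
	var b [1]byte
	if _, err := io.ReadFull(r, b[:]); err != nil {
		return 0, err
	}
	if high {
		return b[0] >> 4, nil
	}
	return b[0] & 0x0F, nil
}

// nibbleRoundTrip writes val and reads it back, also returning the raw byte.
func nibbleRoundTrip(high bool, val uint8) (byte, uint8, error) {
	var buf bytes.Buffer
	if err := writeNibble(&buf, high, val); err != nil {
		return 0, 0, err
	}
	raw := buf.Bytes()[0]
	got, err := readNibble(&buf, high)
	return raw, got, err
}
```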

func WritePointF64 added in v0.2.0

func WritePointF64(w io.Writer, p PointF64) error

WritePointF64 writes a point to the record stream.

func WriteSchema

func WriteSchema(w io.Writer, s *Schema) error

WriteSchema serializes a schema to w. Format:

u16 field_count
per field:
  u8 type
  u16 name_length + utf8 name
  u32 byte_offset
  u8 bit_position
  u16 csv_column_idx
  u16 description_length + utf8 description
  (if categorical) dictionary block

Types

type Decimal128 added in v0.2.0

type Decimal128 struct {
	// contains filtered or unexported fields
}

Decimal128 is a fixed-point decimal value with up to 38 digits of precision. Logically it is a signed 128-bit two's-complement integer mantissa, paired with a per-field scale that places the implicit decimal point. Internally it is carried as *big.Int (always copied, never aliased) bounded to the decimal128 representable range [-(10^38 - 1), 10^38 - 1].

func DecodeDecimal128 added in v0.2.0

func DecodeDecimal128(buf [16]byte) (Decimal128, bool)

DecodeDecimal128 deserializes 16 bytes of two's-complement little-endian integer into a Decimal128. Returns the null sentinel detection result in the second return value (true == NULL bit pattern).

func NewDecimal128FromBigInt added in v0.2.0

func NewDecimal128FromBigInt(m *big.Int) (Decimal128, error)

NewDecimal128FromBigInt builds a Decimal128 from a big.Int mantissa. Returns PULSE_DECIMAL_OVERFLOW if |m| >= 10^38.

func NewDecimal128FromInt added in v0.2.0

func NewDecimal128FromInt(i int64) Decimal128

NewDecimal128FromInt builds a Decimal128 with the given integer mantissa.

func ParseDecimal128 added in v0.2.0

func ParseDecimal128(s string) (Decimal128, uint8, error)

ParseDecimal128 parses a strict decimal string into a (Decimal128, scale) pair. The scale is inferred from the input: it is exactly the number of digits after the decimal point. It accepts an optional single leading sign character and rejects whitespace, thousands separators, scientific notation, and currency symbols.

func ReadDecimal128 added in v0.2.0

func ReadDecimal128(r io.Reader) (Decimal128, bool, error)

ReadDecimal128 reads 16 bytes of decimal128 from r and decodes them. Returns the value and a flag indicating whether the bit pattern matches the NULL sentinel.

func ZeroDecimal128 added in v0.2.0

func ZeroDecimal128() Decimal128

ZeroDecimal128 returns a Decimal128 with mantissa 0.

func (Decimal128) Add added in v0.2.0

func (d Decimal128) Add(o Decimal128) (Decimal128, error)

Add returns d + o, assuming both operands have the same scale. The result has that same scale; the caller is responsible for SQL:2016 precision propagation at the schema level.

func (Decimal128) Cmp added in v0.2.0

func (d Decimal128) Cmp(o Decimal128) int

Cmp compares two Decimal128 values with the same scale. Comparing values at different scales requires the caller to align scales first.

func (Decimal128) Div added in v0.2.0

func (d Decimal128) Div(o Decimal128, s1, s2, resultScale uint8) (Decimal128, error)

Div divides d by o using banker's rounding to produce a result at scale `resultScale` given the operands at scales s1 and s2. Returns PULSE_DECIMAL_DIVIDE_BY_ZERO when o is zero.

func (Decimal128) FitsPrecision added in v0.2.0

func (d Decimal128) FitsPrecision(precision uint8) bool

FitsPrecision reports whether the mantissa fits in `precision` digits.

func (Decimal128) Float64 added in v0.2.0

func (d Decimal128) Float64(scale uint8) float64

Float64 returns the decimal as a float64 at the given scale. Lossy for values that exceed float64 precision; callers preserving precision should keep using Decimal128 directly.

func (Decimal128) Mantissa added in v0.2.0

func (d Decimal128) Mantissa() *big.Int

Mantissa returns a copy of the unscaled integer mantissa.

func (Decimal128) Mul added in v0.2.0

func (d Decimal128) Mul(o Decimal128) (Decimal128, error)

Mul returns d * o. The product mantissa is the product of mantissas; callers are responsible for tracking scale propagation (s1 + s2).

func (Decimal128) Rescale added in v0.2.0

func (d Decimal128) Rescale(sourceScale, targetScale uint8) (Decimal128, error)

Rescale converts d at sourceScale to targetScale using banker's rounding. Returns PULSE_DECIMAL_OVERFLOW if the rescaled mantissa exceeds 10^38-1.

func (Decimal128) Sign added in v0.2.0

func (d Decimal128) Sign() int

Sign returns -1, 0, or +1 depending on the mantissa sign.

func (Decimal128) Sqrt added in v0.2.0

func (d Decimal128) Sqrt(sourceScale, targetScale uint8) (Decimal128, error)

Sqrt returns sqrt(d) at targetScale, given d at sourceScale. The result is computed entirely in decimal arithmetic: big.Int.Sqrt yields the floor of the scaled root, and a half-to-even rounding step on the residual produces the final value. Returns PULSE_DECIMAL_OVERFLOW if intermediate state cannot fit in the integer representation; returns PROCESSING_RUNTIME for negative inputs (the square root of a negative decimal is undefined).

func (Decimal128) String added in v0.2.0

func (d Decimal128) String(scale uint8) string

String renders the decimal at the given scale. Trailing zeros are preserved; the leading sign is included only for negative values.

func (Decimal128) Sub added in v0.2.0

func (d Decimal128) Sub(o Decimal128) (Decimal128, error)

Sub returns d - o.

type Dictionary

type Dictionary struct {
	// contains filtered or unexported fields
}

Dictionary maps string values to sequential uint32 IDs. It is used for categorical field types to encode string categories as compact integer IDs in the binary record format.

func NewDictionary

func NewDictionary() *Dictionary

NewDictionary creates an empty dictionary.

func (*Dictionary) Add

func (d *Dictionary) Add(s string) (uint32, error)

Add inserts a string into the dictionary, returning its ID. If the string already exists, the existing ID is returned. There is no capacity limit with Add; use AddWithLimit to enforce one.

func (*Dictionary) AddWithLimit

func (d *Dictionary) AddWithLimit(s string, maxEntries uint32) (uint32, error)

AddWithLimit inserts a string, enforcing a maximum entry count. Returns PULSE_IMPORT_CATEGORICAL_OVERFLOW if the dictionary is full.

func (*Dictionary) Count

func (d *Dictionary) Count() int

Count returns the number of entries in the dictionary.

func (*Dictionary) IDFor

func (d *Dictionary) IDFor(s string) (uint32, bool)

IDFor looks up the ID for a string. Returns the ID and true if found, or 0 and false otherwise.

func (*Dictionary) ReadFrom

func (d *Dictionary) ReadFrom(r io.Reader) (int64, error)

ReadFrom deserializes a dictionary from r, replacing current contents. Format: u32 count + (u16 strlen + utf8 bytes) x count

Performance: a single byte buffer is grown to hold all string payloads across the dictionary, instead of allocating one []byte per entry. Each resolved string is copied once via the standard string([]byte) conversion (no unsafe), so we drop one allocation per entry while preserving the safety contract that callers can mutate the underlying buffer afterwards without affecting the stored strings.

func (*Dictionary) Resolve

func (d *Dictionary) Resolve(id uint32) string

Resolve returns the string for a given ID. Returns "" if the ID is out of range.

func (*Dictionary) Values

func (d *Dictionary) Values() []string

Values returns a copy of all dictionary values in insertion order.

func (*Dictionary) WriteTo

func (d *Dictionary) WriteTo(w io.Writer) (int64, error)

WriteTo serializes the dictionary to w. Format: u32 count + (u16 strlen + utf8 bytes) x count
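A sketch of that wire format, assuming little-endian integers (as elsewhere in the format), IDs assigned from 0 in insertion order, and entries written in ID order so that reading them back reproduces the same IDs. The `dict` type is a hypothetical stand-in for Dictionary; its reader reuses one scratch buffer, a simpler variant of the single-buffer strategy described for ReadFrom.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"io"
)

// dict is a stand-in for Dictionary: IDs are 0, 1, 2, ... in insertion order.
type dict struct {
	ids    map[string]uint32
	values []string
}

func newDict() *dict { return &dict{ids: map[string]uint32{}} }

// add returns the existing ID for s, or assigns the next sequential one.
func (d *dict) add(s string) uint32 {
	if id, ok := d.ids[s]; ok {
		return id
	}
	id := uint32(len(d.values))
	d.ids[s] = id
	d.values = append(d.values, s)
	return id
}

// writeTo emits u32 count, then (u16 strlen + UTF-8 bytes) per entry in ID order.
func (d *dict) writeTo(w io.Writer) error {
	if err := binary.Write(w, binary.LittleEndian, uint32(len(d.values))); err != nil {
		return err
	}
	for _, v := range d.values {
		if len(v) > 0xFFFF {
			return errors.New("entry does not fit a u16 length prefix")
		}
		if err := binary.Write(w, binary.LittleEndian, uint16(len(v))); err != nil {
			return err
		}
		if _, err := io.WriteString(w, v); err != nil {
			return err
		}
	}
	return nil
}

// readFrom replaces d's contents; entry i receives ID i. string(scratch)
// copies the bytes, so stored strings never alias the reused buffer.
func (d *dict) readFrom(r io.Reader) error {
	var count uint32
	if err := binary.Read(r, binary.LittleEndian, &count); err != nil {
		return err
	}
	fresh := newDict()
	var scratch []byte
	for i := uint32(0); i < count; i++ {
		var n uint16
		if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
			return err
		}
		if cap(scratch) < int(n) {
			scratch = make([]byte, n)
		}
		scratch = scratch[:n]
		if _, err := io.ReadFull(r, scratch); err != nil {
			return err
		}
		s := string(scratch)
		fresh.ids[s] = i
		fresh.values = append(fresh.values, s)
	}
	*d = *fresh
	return nil
}

// roundTrip serializes d and parses the bytes into a new dictionary.
func roundTrip(d *dict) (*dict, int, error) {
	var buf bytes.Buffer
	if err := d.writeTo(&buf); err != nil {
		return nil, 0, err
	}
	size := buf.Len()
	out := newDict()
	err := out.readFrom(&buf)
	return out, size, err
}
```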

type Field

type Field struct {
	Name         string
	Type         FieldType
	ByteOffset   int
	BitPosition  int
	CsvColumnIdx int
	Description  string      // empty = synthesized at inspect time
	Dictionary   *Dictionary // non-nil only for categorical types

	// Precision is the decimal128 precision (1-38). Meaningful only when
	// Type is FieldTypeDecimal128 or FieldTypeNullableDecimal128.
	Precision uint8
	// Scale is the decimal128 scale (0-Precision). Meaningful only when
	// Type is FieldTypeDecimal128 or FieldTypeNullableDecimal128.
	Scale uint8
	// H3Resolution is the native cell resolution recorded at import time
	// (0-15). Meaningful only when Type is FieldTypeH3Cell. The sentinel
	// 0xFF means "unspecified" and is treated as missing metadata.
	H3Resolution uint8
}

Field describes a single column in a .pulse schema.

type FieldType

type FieldType byte

FieldType identifies the data type stored in a schema field.

const (
	FieldTypeU8                 FieldType = iota // 0
	FieldTypeU16                                 // 1
	FieldTypeU32                                 // 2
	FieldTypeU64                                 // 3
	FieldTypeF32                                 // 4
	FieldTypeF64                                 // 5
	FieldTypeNullableBool                        // 6
	FieldTypeNullableU4                          // 7
	FieldTypeNullableU8                          // 8
	FieldTypeNullableU16                         // 9
	FieldTypeDate                                // 10
	FieldTypePackedBool                          // 11
	FieldTypeCategoricalU8                       // 12
	FieldTypeCategoricalU16                      // 13
	FieldTypeCategoricalU32                      // 14
	FieldTypeDecimal128                          // 15
	FieldTypeNullableDecimal128                  // 16
	FieldTypePointF64                            // 17
	FieldTypeH3Cell                              // 18
)

All 19 field types supported by the .pulse format.

func (FieldType) ByteSize

func (ft FieldType) ByteSize() int

ByteSize returns the number of bytes this field type occupies in a record. Packed types (PackedBool, NullableBool, NullableU4) share bytes with adjacent fields via bit packing and return 0 here.

func (FieldType) IsCategorical

func (ft FieldType) IsCategorical() bool

IsCategorical reports whether the field type is one of the categorical types.

func (FieldType) IsDecimal added in v0.2.0

func (ft FieldType) IsDecimal() bool

IsDecimal reports whether the field type is a decimal128 variant.

func (FieldType) IsGeo added in v0.2.0

func (ft FieldType) IsGeo() bool

IsGeo reports whether the field type is a geospatial type.

func (FieldType) IsKnown added in v0.2.0

func (ft FieldType) IsKnown() bool

IsKnown reports whether the byte value corresponds to a registered type. Used by the schema reader to reject files written by a future binary version that introduces unknown type bytes.

func (FieldType) MaxCategoricalEntries

func (ft FieldType) MaxCategoricalEntries() uint32

MaxCategoricalEntries returns the maximum dictionary size for a categorical type. Returns 0 for non-categorical types.

func (FieldType) String

func (ft FieldType) String() string

String returns a human-readable name for the field type.

type H3Cell added in v0.2.0

type H3Cell uint64

H3Cell wraps a 64-bit Uber H3 cell index. H3Cell is a defined type with underlying type uint64 (not a type alias); the distinct name documents intent at the call site without adding runtime cost.

func DecodeH3Cell added in v0.2.0

func DecodeH3Cell(buf [8]byte) H3Cell

DecodeH3Cell deserializes 8 little-endian bytes.

func ParseH3CellHex added in v0.2.0

func ParseH3CellHex(s string) (H3Cell, error)

ParseH3CellHex parses a 15-character lowercase hex H3 string into an H3Cell. The H3 library treats this as the canonical string form. Validation (IsValid) is left to the caller; typical callers use the h3-go runtime to validate.
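A sketch of the hex round trip using strconv. It checks only the shape described here (15 lowercase hex characters), not cell validity. A valid H3 cell index has its reserved high bit clear and mode 1 in bits 59-62, so its value lies in [2^59, 2^60) and always prints as exactly 15 hex digits. `h3Cell` and the function names are stand-ins.

```go
package main

import (
	"errors"
	"strconv"
)

type h3Cell uint64

// parseH3Hex accepts exactly 15 lowercase hex digits.
func parseH3Hex(s string) (h3Cell, error) {
	if len(s) != 15 {
		return 0, errors.New("H3 hex string must be 15 characters")
	}
	for i := 0; i < len(s); i++ {
		c := s[i]
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return 0, errors.New("H3 hex string must be lowercase hex")
		}
	}
	v, err := strconv.ParseUint(s, 16, 64)
	return h3Cell(v), err
}

// formatH3Hex renders the index in lowercase hex; valid cells yield 15 digits.
func formatH3Hex(c h3Cell) string {
	return strconv.FormatUint(uint64(c), 16)
}
```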

func ReadH3Cell added in v0.2.0

func ReadH3Cell(r io.Reader) (H3Cell, error)

ReadH3Cell reads an H3 cell from the record stream.

type PointF64 added in v0.2.0

type PointF64 struct {
	Lat float64
	Lon float64
}

PointF64 is a (lat, lon) pair stored as two LE float64s. Values are in degrees. For non-geo Cartesian usage the same struct can be read as (Y, X) — the encoding is symmetric in the two fields.

func CentroidUnitSphere added in v0.2.0

func CentroidUnitSphere(pts []PointF64) (PointF64, bool)

CentroidUnitSphere computes the 3D unit-sphere centroid of a point set, converting each point to (x, y, z) on the unit sphere, summing, normalizing, and converting back to (lat, lon). Correct at poles and across the antimeridian.

func DecodePointF64 added in v0.2.0

func DecodePointF64(buf [16]byte) PointF64

DecodePointF64 unpacks 16 little-endian bytes into a PointF64.

func ParseWKTPoint added in v0.2.0

func ParseWKTPoint(s string) (PointF64, error)

ParseWKTPoint parses a WKT POINT(lon lat) string into a PointF64. Note WKT order is lon-first; this function flips them so the returned PointF64 has Lat in .Lat and Lon in .Lon.
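A sketch of both directions: formatting writes lon first, and parsing swaps back into Lat/Lon. How tolerant the real ParseWKTPoint is (case, `POINT (` with a space, extra whitespace) is not documented; this sketch requires the uppercase `POINT(` prefix but allows any whitespace between coordinates. Shortest fixed-notation float formatting ('f', -1) is also an assumption.

```go
package main

import (
	"errors"
	"strconv"
	"strings"
)

type Point struct {
	Lat float64
	Lon float64
}

// formatWKTPoint renders p as POINT(lon lat).
func formatWKTPoint(p Point) string {
	lon := strconv.FormatFloat(p.Lon, 'f', -1, 64)
	lat := strconv.FormatFloat(p.Lat, 'f', -1, 64)
	return "POINT(" + lon + " " + lat + ")"
}

// parseWKTPoint reads POINT(lon lat) and returns Lat and Lon in their own fields.
func parseWKTPoint(s string) (Point, error) {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, "POINT(") || !strings.HasSuffix(s, ")") {
		return Point{}, errors.New("expected POINT(lon lat)")
	}
	coords := strings.Fields(s[len("POINT(") : len(s)-1])
	if len(coords) != 2 {
		return Point{}, errors.New("expected exactly two coordinates")
	}
	lon, err := strconv.ParseFloat(coords[0], 64)
	if err != nil {
		return Point{}, err
	}
	lat, err := strconv.ParseFloat(coords[1], 64)
	if err != nil {
		return Point{}, err
	}
	return Point{Lat: lat, Lon: lon}, nil // WKT is lon-first
}
```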

func ReadPointF64 added in v0.2.0

func ReadPointF64(r io.Reader) (PointF64, error)

ReadPointF64 reads a point from the record stream.

func (PointF64) Validate added in v0.2.0

func (p PointF64) Validate() error

Validate returns an error if the point is outside the legal lat/lon range, i.e. unless |lat| ≤ 90 and |lon| ≤ 180.

type Polygon added in v0.2.0

type Polygon struct {
	Ring []PointF64
}

Polygon is a closed simple polygon represented as a single outer ring. MULTIPOLYGON and inner rings (holes) are not supported in v1.

func ParseWKTPolygon added in v0.2.0

func ParseWKTPolygon(s string) (*Polygon, error)

ParseWKTPolygon parses a WKT POLYGON((lon lat, lon lat, ..., lon lat)) into a Polygon. Rejects MULTIPOLYGON, polygons with inner rings, and non-closed rings (first and last vertex must match).

func (*Polygon) Contains added in v0.2.0

func (poly *Polygon) Contains(p PointF64) bool

Contains reports whether p is inside or on the boundary of the polygon. Uses the standard ray-cast (even-odd) test on (lon, lat). Edge cases: points on a vertical edge of the ring are reported as inside via the half-open edge convention, which is consistent with shapely.
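A sketch of an even-odd ray cast on (lon, lat) with the common half-open rule: an edge counts only when it straddles the point's latitude, i.e. (a.Lat > p.Lat) != (b.Lat > p.Lat). Pure ray casting does not classify every boundary point consistently, so this sketch adds an explicit on-segment test to honor the "inside or on the boundary" contract. Whether the package does the same, and what floating-point tolerance it uses, is not documented; this version uses exact comparisons and assumes a closed ring (first vertex == last vertex).

```go
package main

import "math"

type Point struct {
	Lat float64
	Lon float64
}

// onSegment reports whether p lies exactly on segment ab (no tolerance).
func onSegment(p, a, b Point) bool {
	cross := (b.Lon-a.Lon)*(p.Lat-a.Lat) - (b.Lat-a.Lat)*(p.Lon-a.Lon)
	if cross != 0 {
		return false
	}
	return math.Min(a.Lon, b.Lon) <= p.Lon && p.Lon <= math.Max(a.Lon, b.Lon) &&
		math.Min(a.Lat, b.Lat) <= p.Lat && p.Lat <= math.Max(a.Lat, b.Lat)
}

// containsPoint treats lon as x and lat as y and walks consecutive ring vertices.
func containsPoint(ring []Point, p Point) bool {
	inside := false
	for i := 0; i+1 < len(ring); i++ {
		a, b := ring[i], ring[i+1]
		if onSegment(p, a, b) {
			return true // boundary counts as contained
		}
		if (a.Lat > p.Lat) != (b.Lat > p.Lat) {
			// Longitude where edge ab crosses the horizontal line through p.
			xCross := a.Lon + (p.Lat-a.Lat)*(b.Lon-a.Lon)/(b.Lat-a.Lat)
			if p.Lon < xCross {
				inside = !inside
			}
		}
	}
	return inside
}
```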

type RecordReader

type RecordReader struct {
	// contains filtered or unexported fields
}

RecordReader reads records one at a time from a binary stream. It reads directly from the io.Reader without buffering the entire file.

func NewRecordReader

func NewRecordReader(r io.Reader, schema *Schema) *RecordReader

NewRecordReader creates a RecordReader. The reader must be positioned immediately after the header and schema (i.e., at the first record byte).

func (*RecordReader) ReadRecord

func (rr *RecordReader) ReadRecord(values map[string]float64, nulls map[string]bool) error

ReadRecord reads a single record from the stream, populating the values and nulls maps. Returns io.EOF when no more records are available.

The caller provides pre-allocated maps to avoid per-record allocation. Maps are cleared at the start of each call.

Reuse contract: the maps are owned by the caller. ReadRecord does not retain references to them after returning. If the caller plans to reuse the same maps across calls (the typical pattern), they must consume the populated values BEFORE invoking ReadRecord again, because the next call clears and repopulates the maps in-place. If the caller needs to retain the values past the next call (e.g., collecting Records into a slice for later aggregation), it must pass distinct map instances per record OR copy the contents out before the next ReadRecord call.

To populate typed wide values for fields whose representation does not fit in float64 (decimal128, point_f64, h3_cell), call ReadRecordWithWide instead and pass a third map.

func (*RecordReader) ReadRecordReused added in v0.2.0

func (rr *RecordReader) ReadRecordReused(rec ReusableRecord) error

ReadRecordReused reads one record into an existing ReusableRecord, reusing the record's internal maps. Returns io.EOF when the underlying reader is exhausted.

Hot path semantics:

  • Caller MUST consume the populated rec before the next call.
  • Fixed-size numeric fields are decoded via a single stack-resident [16]byte scratch + binary.LittleEndian, avoiding the per-field allocation of binary.Read's internal buffer.
  • Bit-packed and 16-byte fields fall back to the existing typed readers.
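The fixed-size decoding path in the second bullet can be sketched as: read exactly the field's byte width into one reusable [16]byte array with io.ReadFull, then decode with binary.LittleEndian instead of calling binary.Read per field. The `kind` enum and the float64 result are simplifications standing in for FieldType and the values map; this is not the package's decoder.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"io"
	"math"
)

type kind int

const (
	kindU8 kind = iota
	kindU16
	kindU32
	kindU64
	kindF32
	kindF64
)

var kindSize = [...]int{kindU8: 1, kindU16: 2, kindU32: 4, kindU64: 8, kindF32: 4, kindF64: 8}

// readFixed decodes one little-endian field into a float64 using caller-owned scratch.
func readFixed(r io.Reader, scratch *[16]byte, k kind) (float64, error) {
	buf := scratch[:kindSize[k]]
	if _, err := io.ReadFull(r, buf); err != nil {
		return 0, err
	}
	switch k {
	case kindU8:
		return float64(buf[0]), nil
	case kindU16:
		return float64(binary.LittleEndian.Uint16(buf)), nil
	case kindU32:
		return float64(binary.LittleEndian.Uint32(buf)), nil
	case kindU64:
		return float64(binary.LittleEndian.Uint64(buf)), nil // lossy above 2^53
	case kindF32:
		return float64(math.Float32frombits(binary.LittleEndian.Uint32(buf))), nil
	default: // kindF64
		return math.Float64frombits(binary.LittleEndian.Uint64(buf)), nil
	}
}

// decodeRow decodes consecutive fields from data, reusing one scratch array.
func decodeRow(data []byte, kinds []kind) ([]float64, error) {
	r := bytes.NewReader(data)
	var scratch [16]byte
	out := make([]float64, 0, len(kinds))
	for _, k := range kinds {
		v, err := readFixed(r, &scratch, k)
		if err != nil {
			return nil, err
		}
		out = append(out, v)
	}
	return out, nil
}
```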

func (*RecordReader) ReadRecordWithWide added in v0.2.0

func (rr *RecordReader) ReadRecordWithWide(values map[string]float64, nulls map[string]bool, wide map[string]any) error

ReadRecordWithWide reads a record and populates a wide map with typed values for decimal128, point_f64, and h3_cell fields. The wide map may be nil to skip wide population.

type ReusableRecord added in v0.2.0

type ReusableRecord interface {
	SetNumeric(name string, value float64)
	SetNullField(name string)
	SetWideField(name string, value any)
	ClearForRow()
}

ReusableRecord is the subset of *processing.Record needed by the reuse fast path. It is declared here as an interface so that encoding/ does not depend on processing/. Implementations (processing.Record) MUST clear their own null and wide maps (via ClearForRow) before this call returns successfully; the reader populates them in place, but only for fields where a value applies.
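A minimal implementation of the interface, to illustrate the contract: ClearForRow empties all three maps in place, so a field that is absent from the next row cannot leak through from the previous one, while the maps themselves are reused. The `mapRecord` type is hypothetical; in this codebase the real implementation is processing.Record.

```go
package main

// ReusableRecord copies the interface declared by the encoding package.
type ReusableRecord interface {
	SetNumeric(name string, value float64)
	SetNullField(name string)
	SetWideField(name string, value any)
	ClearForRow()
}

// mapRecord keeps its maps across rows and clears them in place.
type mapRecord struct {
	values map[string]float64
	nulls  map[string]bool
	wide   map[string]any
}

func newMapRecord() *mapRecord {
	return &mapRecord{values: map[string]float64{}, nulls: map[string]bool{}, wide: map[string]any{}}
}

func (m *mapRecord) SetNumeric(name string, value float64) { m.values[name] = value }
func (m *mapRecord) SetNullField(name string)              { m.nulls[name] = true }
func (m *mapRecord) SetWideField(name string, value any)   { m.wide[name] = value }

// ClearForRow deletes every entry without reallocating the maps.
func (m *mapRecord) ClearForRow() {
	for k := range m.values {
		delete(m.values, k)
	}
	for k := range m.nulls {
		delete(m.nulls, k)
	}
	for k := range m.wide {
		delete(m.wide, k)
	}
}

var _ ReusableRecord = (*mapRecord)(nil) // compile-time interface check
```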

type Schema

type Schema struct {
	Fields []Field
}

Schema holds all field descriptors for a .pulse file.

func ReadSchema

func ReadSchema(r io.Reader) (*Schema, error)

ReadSchema deserializes a schema from r.

func (*Schema) Categorical

func (s *Schema) Categorical(name string) (*Dictionary, bool)

Categorical returns the dictionary for a named categorical field. Returns nil, false if the field is not found or is not categorical.

func (*Schema) Field

func (s *Schema) Field(name string) *Field

Field returns a pointer to the named field, or nil if not found.
