recordops

package
v0.45.0
Warning

This package is not in the latest version of its module.

Published: Jun 2, 2026 License: MIT Imports: 12 Imported by: 0

README

recordops

Pure, dependency-free analytical helpers over collections of dalgo records. The first capability is a streaming K-way merge diff across one baseline recordset and N candidate recordsets, with renderers for git-style and cross-candidate output. The godoc is the reference; this README is a quick-start.

API surface

Entrypoints:

  • Diff[K cmp.Ordered](baseline, candidates, opts...) — diff with native < ordering.
  • DiffFunc[K comparable](baseline, candidates, less, opts...) — diff with a caller-supplied strict weak order (use for [16]byte UUIDs etc.).

Options:

  • WithIncludeMatched() — emit an IDDiff for every touched ID, not just divergent ones.
  • WithOnlyChangedFields() — trim IDDiff.Baseline.Fields to only fields that have a delta.
  • WithIgnoreFields(names...) — drop named fields (e.g. "UpdatedAt") from comparison.
  • WithAbsentEqualsNil() — treat field-absent and field-with-nil-value as equivalent.

Bridge helpers:

  • SliceToSeq(records) — wrap an already-sorted slice as a RecordSeq. Does NOT sort.
  • ReaderToSeq(reader, idOf) — adapt a dal.RecordsReader to a RecordSeq (closes the reader on completion).

Renderers:

  • RenderYAMLGitStyle(diffs, candidateIndex, name) — per-candidate git-diff view.
  • RenderYAMLByID(diffs, name) — cross-candidate divergence view, one block per ID.
  • RenderYAML(diffs, name) / RenderJSON(diffs, name) — structured serialization.

End-to-end example

Compare a baseline of two users (u1 and u3) against one candidate that lacks u1, adds u2, and renames u3. The same diff stream feeds two renderers via intermediate materialization.

package main

import (
    "fmt"

    "github.com/dal-go/dalgo/dal"
    "github.com/dal-go/dalgo/record"
    "github.com/dal-go/dalgo/recordops"
)

func mk(id string, data map[string]any) record.WithID[string] {
    r := dal.NewRecordWithData(dal.NewKeyWithID("Users", id), data)
    r.SetError(nil)
    return record.WithID[string]{ID: id, Record: r}
}

func main() {
    // Inputs MUST already be sorted ascending by ID.
    baseline := recordops.SliceToSeq([]record.WithID[string]{
        mk("u1", map[string]any{}),
        mk("u3", map[string]any{"first_name": "Alex"}),
    })
    cand := recordops.SliceToSeq([]record.WithID[string]{
        mk("u2", map[string]any{"first_name": "Jack", "gender": "male"}),
        mk("u3", map[string]any{"first_name": "Alexander"}),
    })

    // Materialize once so we can feed multiple renderers.
    var diffs []recordops.IDDiff[string]
    for d, err := range recordops.Diff[string](baseline, []recordops.RecordSeq[string]{cand}) {
        if err != nil {
            panic(err)
        }
        diffs = append(diffs, d)
    }
    replay := func(yield func(recordops.IDDiff[string], error) bool) {
        for _, d := range diffs {
            if !yield(d, nil) {
                return
            }
        }
    }

    gitStyle, err := recordops.RenderYAMLGitStyle[string](replay, 0, "users")
    if err != nil {
        panic(err)
    }
    fmt.Print(gitStyle)

    byID, err := recordops.RenderYAMLByID[string](replay, "users")
    if err != nil {
        panic(err)
    }
    fmt.Print(byID)
}

RenderYAMLGitStyle produces the per-candidate view:

users:
- u1
+ u2:
    first_name: Jack
    gender: male
u3:
-   first_name: Alex
+   first_name: Alexander

RenderYAMLByID produces the cross-candidate view (one block per ID, each showing baseline plus each candidate's status and deltas).

Notes

  • Callers MUST sort input streams ascending by ID. There is no internal sort — that's the price of streaming. Monotonicity is validated per stream; violations fail with ErrUnsortedInput and duplicates fail with ErrDuplicateID.
  • Diff vs. DiffFunc. Diff requires K cmp.Ordered (strings, ints, floats). For keys that are comparable but not orderable (e.g. [16]byte UUIDs), use DiffFunc with an explicit less function, for example one that returns bytes.Compare(a[:], b[:]) < 0.
  • Streams are single-pass. Renderers consume the diff stream exactly once. If you need to feed multiple renderers, materialize first (collect into a slice and rewrap, as shown above).
  • Memory footprint. O(N) records at any moment (one current per stream) plus the in-flight IDDiff.
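
Because there is no internal sort, a slice usually needs to be sorted and checked for duplicate IDs before it is wrapped as a stream. The following is a minimal sketch using only the standard library; rec is a hypothetical stand-in for record.WithID[string], and sortByID is an illustrative helper, not part of recordops.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// rec is a hypothetical stand-in for record.WithID[string]; only the ID matters here.
type rec struct {
	ID   string
	Name string
}

// sortByID sorts records ascending by ID in place and reports whether any
// two adjacent records share an ID after sorting.
func sortByID(records []rec) (hasDuplicates bool) {
	slices.SortFunc(records, func(a, b rec) int { return cmp.Compare(a.ID, b.ID) })
	for i := 1; i < len(records); i++ {
		if records[i].ID == records[i-1].ID {
			return true
		}
	}
	return false
}

func main() {
	users := []rec{{"u3", "Alex"}, {"u1", "Ann"}, {"u2", "Jack"}}
	if sortByID(users) {
		fmt.Println("duplicate IDs; the diff would reject this input")
		return
	}
	for _, u := range users {
		fmt.Println(u.ID, u.Name)
	}
}
```

Sorting up front costs O(n log n) and holds the whole slice in memory, so it suits inputs that are already materialized; for large sources it is likely cheaper to have the backend return rows ordered by ID.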

Reference

  • Feature spec
  • Godoc: go doc github.com/dal-go/dalgo/recordops

Documentation

Overview

specscore: feat-recordops/diff

Package recordops provides pure, dependency-free analytical helpers over collections of dalgo records.

The first and only capability in MVP is Diff (and its sibling DiffFunc) — a streaming, single-pass comparison of one baseline recordset against N candidate recordsets. Inputs are pull-based iter.Seq2 streams that MUST be sorted ascending by ID. Output is also iter.Seq2: one IDDiff per ID where at least one candidate diverges from baseline. Use WithIncludeMatched to emit fully matched IDs too.

The algorithm is a K-way merge over the N+1 input streams. Memory footprint at any point: O(N) records (one current per stream) plus the in-flight IDDiff being yielded.

Each IDDiff carries the baseline snapshot once (the single source of truth) and per-candidate deltas — never duplicates of baseline values across candidates.

Renderers translate the structured stream into output formats: RenderYAMLGitStyle (per-candidate git-diff style — the visual anchor that matches the source idea spec/ideas/recordops.md), RenderYAMLByID (cross-candidate divergence view), RenderYAML and RenderJSON (structured serialization).

Renderers consume the input stream exactly once; consumers that need multiple views must materialize first via slices.Collect or equivalent.

Constants

This section is empty.

Variables

var ErrDuplicateID = errors.New("recordops: duplicate ID in input stream")

ErrDuplicateID indicates an input stream yielded two records with the same ID. Within a single stream, IDs must be unique.

var ErrIncomparableField = errors.New("recordops: incomparable field")

ErrIncomparableField indicates field comparison via reflect.DeepEqual panicked (e.g., a func or chan field). The panic is recovered and surfaced as a stream error wrapping this sentinel.

var ErrInvalidArgument = errors.New("recordops: invalid argument")

ErrInvalidArgument indicates a programmer error in calling Diff/DiffFunc (e.g., nil less function passed to DiffFunc).

var ErrUnsortedInput = errors.New("recordops: input stream not sorted ascending by ID")

ErrUnsortedInput indicates an input stream yielded a record whose ID is not strictly greater than the previously yielded ID from the same stream. Diff requires ID-sorted input streams.

Functions

func Diff

func Diff[K cmp.Ordered](
	baseline RecordSeq[K],
	candidates []RecordSeq[K],
	opts ...Option,
) iter.Seq2[IDDiff[K], error]

Diff compares baseline against candidates via K-way merge over ID-sorted streams and yields one IDDiff per ID where at least one candidate diverges (default) or every ID touched by any input (with WithIncludeMatched).

Inputs MUST be sorted ascending by ID. Monotonicity is validated per stream; violations terminate with ErrUnsortedInput. Duplicate IDs within a stream terminate with ErrDuplicateID. Upstream stream errors propagate verbatim.

Diff requires K to be cmp.Ordered (string/int/float/etc.). For types that are comparable but not orderable (e.g., [16]byte UUIDs), use DiffFunc with an explicit less function.

See Package recordops doc for the K-way merge model and memory footprint. See spec/features/recordops/diff for the full contract.

Renderers consume the returned stream once; multi-view consumers must materialize first (slices.Collect-equivalent).

func DiffFunc

func DiffFunc[K comparable](
	baseline RecordSeq[K],
	candidates []RecordSeq[K],
	less func(a, b K) bool,
	opts ...Option,
) iter.Seq2[IDDiff[K], error]

DiffFunc is Diff for any K comparable, with a caller-supplied strict weak order. For UUID-keyed records typed as [16]byte, pass a less function that returns bytes.Compare(a[:], b[:]) < 0.

less MUST be a strict weak order (irreflexive, asymmetric, transitive, with transitive incomparability). If less is nil, the returned stream yields exactly one (zero, ErrInvalidArgument) pair and stops.
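
Keys other than byte arrays can need DiffFunc too. As a hedged illustration, the sketch below defines a less function for a composite struct key, which is comparable but does not satisfy cmp.Ordered, and spot-checks the ordering properties on a few values. tenantKey and lessTenantKey are hypothetical names introduced only for this example.

```go
package main

import (
	"cmp"
	"fmt"
)

// tenantKey is a hypothetical composite ID: comparable, but not cmp.Ordered.
type tenantKey struct {
	Tenant string
	Seq    int
}

// lessTenantKey orders keys by Tenant first, then by Seq.
func lessTenantKey(a, b tenantKey) bool {
	if c := cmp.Compare(a.Tenant, b.Tenant); c != 0 {
		return c < 0
	}
	return a.Seq < b.Seq
}

func main() {
	a := tenantKey{"acme", 1}
	b := tenantKey{"acme", 2}
	c := tenantKey{"zeta", 0}

	// Never true for a key against itself.
	fmt.Println("a<a:", lessTenantKey(a, a))
	// Never true in both directions.
	fmt.Println("a<b:", lessTenantKey(a, b), "b<a:", lessTenantKey(b, a))
	// a<b and b<c imply a<c.
	fmt.Println("a<c:", lessTenantKey(a, c))
}
```

Comparing field by field and returning at the first difference is a common way to keep such an order consistent; mixing `<` and `<=` across fields is the usual source of orders that are not strict.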

Example

ExampleDiffFunc shows the canonical DiffFunc use case: comparing two recordsets keyed by [16]byte UUIDs, where ID ordering is provided by bytes.Compare instead of cmp.Ordered. The baseline has u1 only; the candidate has u2 only — one Missing emission and one Extra emission.

package main

import (
	"bytes"
	"encoding/hex"
	"fmt"

	"github.com/dal-go/dalgo/dal"
	"github.com/dal-go/dalgo/record"
	"github.com/dal-go/dalgo/recordops"
)

func main() {
	type uuid = [16]byte
	u1 := uuid{0x01}
	u2 := uuid{0x02}

	mk := func(id uuid) record.WithID[uuid] {
		key := dal.NewKeyWithID("Users", hex.EncodeToString(id[:]))
		r := dal.NewRecordWithData(key, map[string]any{"name": "alice"})
		r.SetError(nil)
		return record.WithID[uuid]{ID: id, Record: r}
	}

	// Inputs MUST be sorted ascending by ID.
	baseline := recordops.SliceToSeq([]record.WithID[uuid]{mk(u1)})
	cand := recordops.SliceToSeq([]record.WithID[uuid]{mk(u2)})

	less := func(a, b uuid) bool { return bytes.Compare(a[:], b[:]) < 0 }

	for d, err := range recordops.DiffFunc[uuid](
		baseline,
		[]recordops.RecordSeq[uuid]{cand},
		less,
	) {
		if err != nil {
			fmt.Println("err:", err)
			return
		}
		fmt.Printf("id=%s status=%d\n", hex.EncodeToString(d.ID[:]), d.Candidates[0].Status)
	}

}
Output:
id=01000000000000000000000000000000 status=0
id=02000000000000000000000000000000 status=1

func RenderJSON

func RenderJSON[K comparable](
	diffs iter.Seq2[IDDiff[K], error],
	collectionName string,
) (string, error)

RenderJSON serializes the entire diff stream as a JSON document with a single top-level key matching collectionName, whose value is the array of IDDiff entries in stream order.

The diff stream is consumed exactly once. If the stream yields an error, RenderJSON returns ("", err) verbatim. If nothing was emitted, the output is a one-key object with an empty array, e.g. {"users":[]}.

Output is deterministic for a given input: encoding/json marshals slices in order and the wrapper map has a single key. Indented for diffability.

RecordStatus is serialized as its numeric int8 value (0=Missing, 1=Extra, 2=Matched, 3=Changed). Consumers needing the string form should map the int themselves; this keeps the renderer faithful to the wire types.

FieldValue's `absent` flag round-trips natively via the json struct tag, preserving the Absent vs. nil-value distinction.

func RenderYAML

func RenderYAML[K comparable](
	diffs iter.Seq2[IDDiff[K], error],
	collectionName string,
) (string, error)

RenderYAML serializes the entire diff stream as a YAML document with a single top-level key matching collectionName, whose value is the sequence of IDDiff entries in stream order.

The diff stream is consumed exactly once. If the stream yields an error, RenderYAML returns ("", err) verbatim. If nothing was emitted, the output is a one-key mapping with an empty sequence.

Output is deterministic for a given input: yaml.v3 marshals slices in order and the wrapper map has a single key.

RecordStatus is serialized as its numeric int8 value (0=Missing, 1=Extra, 2=Matched, 3=Changed). Consumers needing the string form should map the int themselves; this keeps the renderer faithful to the wire types.

FieldValue's `absent` flag round-trips natively via the yaml struct tag, preserving the Absent vs. nil-value distinction.

func RenderYAMLByID

func RenderYAMLByID[K comparable](
	diffs iter.Seq2[IDDiff[K], error],
	collectionName string,
) (string, error)

RenderYAMLByID emits the cross-candidate divergence view — one block per emitted IDDiff in the stream, showing baseline (if present) and each candidate (in index order) with its status and any deltas.

The top-level YAML container is keyed by collectionName. Each ID maps to a block with an optional baseline section and a candidates section keyed by stringified integer index ("0", "1", ...).

Per-candidate field encoding:

  • Changed candidates emit a fields map. A normal value delta renders as {new: <value>}. A field absent from the candidate (FieldValue.Absent == true) renders as {absent: true} — structurally distinct from a real nil value, which renders as YAML null inside {new: null}.

The renderer consumes the stream ONCE. Callers wanting multiple renders of the same Diff result must materialize first. If the stream yields a (zero, err) pair, RenderYAMLByID returns ("", err).

Output is valid YAML and is deterministic for a given input stream. Empty streams still emit a valid empty mapping: "<collectionName>: {}\n".

func RenderYAMLGitStyle

func RenderYAMLGitStyle[K comparable](
	diffs iter.Seq2[IDDiff[K], error],
	candidateIndex int,
	collectionName string,
) (string, error)

RenderYAMLGitStyle renders a single candidate's diff view as a YAML-shaped string with git-diff markers ("- " for missing IDs, "+ " for extra IDs, and per-field "- " / "+ " lines for changed records).

The diff stream is consumed exactly once. Callers needing multi-view rendering must materialize the stream first.

Matched candidates and candidateIndex values outside the [0, len(Candidates)) range are silently skipped. If the stream yields an error, RenderYAMLGitStyle returns ("", err) verbatim. If nothing was emitted, the output is "<collectionName>: {}\n" — an explicit empty collection.

Types

type CandidateState

type CandidateState struct {
	Status RecordStatus
	Fields []FieldValue
}

CandidateState describes one candidate's state for one ID:

  • Status: Missing | Extra | Matched | Changed
  • Fields: deltas only — never duplicates baseline values. See the per-Status semantics in the Feature spec (spec/features/recordops/diff/ REQ id-diff-shape).

type FieldValue

type FieldValue struct {
	Name   string `json:"name"             yaml:"name"`
	Value  any    `json:"value,omitempty"  yaml:"value,omitempty"`
	Absent bool   `json:"absent,omitempty" yaml:"absent,omitempty"`
}

FieldValue is used in BOTH RecordSnapshot.Fields and CandidateState.Fields:

  • In RecordSnapshot.Fields, Value is the baseline's value for Name; Absent is always false.
  • In CandidateState.Fields, Value is the candidate's value. Fields is populated only for Extra and Changed statuses; Missing and Matched have Fields == nil.

When a field exists in baseline but is absent from a Changed candidate's record, Absent is true and Value is the zero value; consumers MUST NOT interpret Value when Absent is true. This is structurally distinct from Value == nil with Absent == false (a real Go-nil value the candidate explicitly holds).

Name may be empty for future helpers that ingest positional/unnamed-column records; MVP comparison paths always produce non-empty Name.
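
To see how the struct tags shape the serialized form, the sketch below marshals three FieldValue cases with encoding/json. The type is re-declared locally, mirroring the definition above, so the example compiles without the package; encode is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FieldValue mirrors the type definition above so the sketch compiles on its own.
type FieldValue struct {
	Name   string `json:"name" yaml:"name"`
	Value  any    `json:"value,omitempty" yaml:"value,omitempty"`
	Absent bool   `json:"absent,omitempty" yaml:"absent,omitempty"`
}

// encode marshals one FieldValue to compact JSON.
func encode(f FieldValue) string {
	b, err := json.Marshal(f)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(FieldValue{Name: "nick", Value: "Al"}))  // {"name":"nick","value":"Al"}
	fmt.Println(encode(FieldValue{Name: "nick", Value: nil}))   // {"name":"nick"}
	fmt.Println(encode(FieldValue{Name: "nick", Absent: true})) // {"name":"nick","absent":true}
}
```

Because both tags use omitempty, a real nil value leaves only the name key, while an absent field adds "absent":true. A consumer can therefore tell the two apart by checking for the absent key rather than for a null value.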

type IDDiff

type IDDiff[K comparable] struct {
	ID         K
	Baseline   *RecordSnapshot
	Candidates []CandidateState
}

IDDiff is the per-ID emission of Diff/DiffFunc. It carries the baseline snapshot (if baseline had this ID) and each candidate's state for this ID, in parallel-index order with the input candidates slice — Candidates[i] always describes input candidates[i].

type Option

type Option func(*options)

Option configures Diff/DiffFunc behavior. The package exports four orthogonal options:

  • WithIgnoreFields(names...) — exclude named fields from comparison.
  • WithIncludeMatched() — emit IDDiff for every ID, including fully matched.
  • WithOnlyChangedFields() — trim Baseline.Fields to only fields with deltas.
  • WithAbsentEqualsNil() — treat field-absent as equivalent to field-with-nil-value during comparison.

func WithAbsentEqualsNil

func WithAbsentEqualsNil() Option

WithAbsentEqualsNil instructs Diff to treat "field absent from a record" as equivalent to "field present with nil value" during comparison. Default is to distinguish the two via FieldValue.Absent. Use this when the dataset is sourced from heterogeneous backends where one stores "no value" as an absent column and another stores it as NULL.

When set: a baseline field with nil value and a candidate that lacks the field (or vice versa) produces no delta. Records whose differences all reduce to absent-vs-nil report Status == Matched.
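
The rule can be sketched for map-backed records as follows. This only illustrates the semantics described above and is not the package's comparison code; fieldEqual is a hypothetical helper, and it treats only an untyped nil as "nil".

```go
package main

import (
	"fmt"
	"reflect"
)

// fieldEqual compares one named field across two map-backed records.
// With absentEqualsNil set, a missing key and a key holding nil compare equal.
// Only an untyped nil counts as nil here; typed nil pointers do not.
func fieldEqual(baseline, candidate map[string]any, name string, absentEqualsNil bool) bool {
	bv, bok := baseline[name]
	cv, cok := candidate[name]
	if bok != cok {
		// Present on one side only: equal solely when the option is on
		// and the present side holds nil (the absent side reads as nil).
		return absentEqualsNil && bv == nil && cv == nil
	}
	return reflect.DeepEqual(bv, cv)
}

func main() {
	baseline := map[string]any{"nick": nil}
	candidate := map[string]any{}
	fmt.Println(fieldEqual(baseline, candidate, "nick", false)) // false
	fmt.Println(fieldEqual(baseline, candidate, "nick", true))  // true
}
```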

func WithIgnoreFields

func WithIgnoreFields(names ...string) Option

WithIgnoreFields instructs Diff to omit named fields from comparison. Matching is by Go struct field name (when Record.Data() returns a struct) or by map key (when Record.Data() returns a map[string]any). Case-sensitive. Multiple calls compose additively. Unknown names are silently ignored.

Canonical use case: WithIgnoreFields("UpdatedAt") drops a timestamp field that always changes between snapshots.

func WithIncludeMatched

func WithIncludeMatched() Option

WithIncludeMatched instructs Diff to emit IDDiff for every ID touched by any input — including IDs where every candidate is Matched. Default is to skip those.

func WithOnlyChangedFields

func WithOnlyChangedFields() Option

WithOnlyChangedFields trims IDDiff.Baseline.Fields to only the fields that have a delta on at least one candidate. Default is to populate the full baseline record snapshot for context.

type RecordSeq

type RecordSeq[K comparable] = iter.Seq2[record.WithID[K], error]

RecordSeq is the streaming input shape for Diff and DiffFunc. Implementations MUST yield records sorted ascending by ID and MUST propagate any source error as a (zero, err) pair (after which iteration stops).

func ReaderToSeq

func ReaderToSeq[K comparable](r dal.RecordsReader, idOf func(dal.Record) (K, error)) RecordSeq[K]

ReaderToSeq adapts a dalgo dal.RecordsReader to a RecordSeq. idOf extracts the ID from each dal.Record yielded by the reader. Reader errors propagate via the seq2 error channel.

The underlying reader is closed exactly once when iteration ends, whether by exhausting records (dal.ErrNoMoreRecords), by the consumer breaking out of the range loop early, or by any upstream stream error.

dal.Reader.Cursor() is NOT surfaced through this bridge in MVP; callers needing pagination must drive the reader directly. See spec/ideas/dal-records-reader-iter-seq.md.

func SliceToSeq

func SliceToSeq[K comparable](records []record.WithID[K]) RecordSeq[K]

SliceToSeq turns an already-sorted slice into a RecordSeq. The slice MUST be sorted ascending by ID; SliceToSeq does NOT sort. A nil or empty slice produces a stream that yields zero items.

type RecordSnapshot

type RecordSnapshot struct {
	Fields []FieldValue
}

RecordSnapshot is baseline's record contents for a given ID — the single source of truth for field values. Candidates carry only deltas; consumers reading "the old value for a changed field" look it up here by Name.
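
Looking up the old value for a changed field might look like the following. The types are re-declared locally (without struct tags) so the example compiles on its own, and baselineValue is a hypothetical helper rather than a package function.

```go
package main

import "fmt"

// Local mirrors of the documented types, reduced to what the sketch needs.
type FieldValue struct {
	Name   string
	Value  any
	Absent bool
}

type RecordSnapshot struct {
	Fields []FieldValue
}

// baselineValue returns the baseline value for name. The second result is
// false when the snapshot is nil or does not contain the field.
func baselineValue(snap *RecordSnapshot, name string) (any, bool) {
	if snap == nil {
		return nil, false
	}
	for _, f := range snap.Fields {
		if f.Name == name {
			return f.Value, true
		}
	}
	return nil, false
}

func main() {
	baseline := &RecordSnapshot{Fields: []FieldValue{{Name: "first_name", Value: "Alex"}}}
	delta := FieldValue{Name: "first_name", Value: "Alexander"}

	old, _ := baselineValue(baseline, delta.Name)
	fmt.Printf("%s: %v -> %v\n", delta.Name, old, delta.Value)
}
```

The nil check matters because IDDiff.Baseline is a pointer and is only set when the baseline had the ID. A linear scan is fine for small records; building a map keyed by Name may be worthwhile for wide ones.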

type RecordStatus

type RecordStatus int8

RecordStatus classifies one candidate's relationship to baseline for one ID.

const (
	// Missing — baseline has this ID; this candidate doesn't.
	Missing RecordStatus = iota
	// Extra — this candidate has this ID; baseline doesn't.
	Extra
	// Matched — both have the ID; all fields equal.
	Matched
	// Changed — both have the ID; at least one field differs.
	Changed
)
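
Since the renderers serialize RecordStatus as its numeric value, consumers that want labels need their own mapping. A minimal sketch, with the type and constants re-declared locally to mirror the block above; statusName and the lowercase labels are illustrative choices, not part of the package.

```go
package main

import "fmt"

// RecordStatus and its constants mirror the declarations above.
type RecordStatus int8

const (
	Missing RecordStatus = iota
	Extra
	Matched
	Changed
)

// statusName maps the numeric wire value to a readable label.
func statusName(s RecordStatus) string {
	switch s {
	case Missing:
		return "missing"
	case Extra:
		return "extra"
	case Matched:
		return "matched"
	case Changed:
		return "changed"
	default:
		return fmt.Sprintf("unknown(%d)", int8(s))
	}
}

func main() {
	for _, raw := range []int8{0, 1, 2, 3, 9} {
		fmt.Println(raw, statusName(RecordStatus(raw)))
	}
}
```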
