format

package
v0.1.0
Published: Apr 21, 2026 | License: MIT | Imports: 15 | Imported by: 0

Documentation

Overview

Package format handles reading and writing the Automerge binary file format.

File structure

An Automerge file is a sequence of chunks. Each chunk starts with a 4-byte magic number, a 4-byte checksum, a type byte, and a LEB128-encoded length, followed by the payload. Use ReadChunk to parse one chunk at a time.

There are two kinds of chunk. (ChunkTypeCompressedChange is a compressed storage form of a change, not a third kind.)

A ChangeChunk records a single edit made by one peer. It is the unit of replication: peers exchange change chunks to converge on the same document state. Each change carries a set of operations (inserts, deletes, assignments), metadata (author, timestamp, message), and the hashes of the changes it directly depends on. Operations are listed in creation order. A change does not know whether its operations were later overwritten by other peers.

A DocumentChunk is a fully merged snapshot of the entire document. Instead of replaying a history of edits, it stores every operation from every change in one place, with successor lists that indicate which operations were later overwritten. An operation with no successors is the current live value. A DocumentChunk also stores per-change metadata (author, sequence number, maxOp, dependencies) so the change history remains queryable, but it does not duplicate the operations — a change's operations are identified by their counter range, derived from consecutive maxOp values.

Typical lifecycle

Every edit a peer makes produces a ChangeChunk. Peers replicate by exchanging these chunks — sending only the changes the other side hasn't seen yet.

Periodically, the accumulated changes are compacted into a DocumentChunk. This merges all operations into a single columnar structure and records which values are still live. The individual ChangeChunks are no longer needed once a DocumentChunk covers them.

A file saved to disk is usually a single DocumentChunk (the latest compaction) optionally followed by any ChangeChunks that arrived after it. A reader applies the ChangeChunks on top of the snapshot to reach the current state.

Hashes

Every change is globally identified by a hash: the SHA-256 of its serialized binary representation (type byte + length + payload). Two peers that have the same change — whether stored compressed or uncompressed — agree on its hash. ReadChunk verifies the checksum and stores the full hash in ChangeChunk.Hash. DocumentChunks have no individual hash identity.

Functions

func WriteChange

func WriteChange(w io.Writer, cc *ChangeChunk, ops *ChangeOpsWriter) error

WriteChange serialises cc as a complete Automerge change chunk and writes it to w. Payloads larger than deflateMinSize bytes are written as ChunkTypeCompressedChange; smaller ones as plain ChunkTypeChange.
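
The size-based choice between the two change chunk types can be sketched as follows. This is a minimal illustration, not the package's code: the value of deflateMinSize is unexported and not documented here, so 256 is a placeholder, and the use of raw DEFLATE via compress/flate is an assumption based on the constant's name. packPayload and unpackPayload are hypothetical helpers.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

const (
	chunkTypeChange           byte = 0x01
	chunkTypeCompressedChange byte = 0x02

	// deflateMinSize is a placeholder value (assumption); the real
	// threshold is unexported and not stated in the documentation.
	deflateMinSize = 256
)

// packPayload returns the chunk type to use and the bytes to store.
// Payloads larger than deflateMinSize are DEFLATE-compressed.
func packPayload(payload []byte) (byte, []byte, error) {
	if len(payload) <= deflateMinSize {
		return chunkTypeChange, payload, nil
	}
	var buf bytes.Buffer
	zw, err := flate.NewWriter(&buf, flate.DefaultCompression)
	if err != nil {
		return 0, nil, err
	}
	if _, err := zw.Write(payload); err != nil {
		return 0, nil, err
	}
	if err := zw.Close(); err != nil {
		return 0, nil, err
	}
	return chunkTypeCompressedChange, buf.Bytes(), nil
}

// unpackPayload reverses packPayload for either chunk type.
func unpackPayload(chunkType byte, data []byte) ([]byte, error) {
	switch chunkType {
	case chunkTypeChange:
		return data, nil
	case chunkTypeCompressedChange:
		zr := flate.NewReader(bytes.NewReader(data))
		defer zr.Close()
		return io.ReadAll(zr)
	default:
		return nil, fmt.Errorf("unexpected chunk type %#x", chunkType)
	}
}

func main() {
	for _, size := range []int{16, 4096} {
		typ, data, err := packPayload(bytes.Repeat([]byte{'a'}, size))
		fmt.Println(size, typ, len(data), err)
	}
}
```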

After WriteChange returns, cc.Hash is set and cc is ready for ApplyChange.

func WriteDocument

func WriteDocument(w io.Writer, actors []types.ActorId, heads []types.ChangeHash, headIndexes []uint64, changes *ChangeMetaWriter, ops *DocOpsWriter) error

WriteDocument serialises a document chunk and writes it to w. Pass nil for changes to omit the change metadata section; the resulting chunk is still valid but carries no sync metadata.

Types

type ChangeChunk

type ChangeChunk struct {
	// Hash is set by ReadChunk after the checksum is verified. It is the full
	// 32-byte SHA-256 of the serialized change and serves as the change's
	// globally unique identifier used in dependency references between changes.
	Hash         types.ChangeHash
	Dependencies []types.ChangeHash
	Actor        types.ActorId
	SeqNum       uint64
	StartOp      uint64
	Time         types.Timestamp
	Message      string
	OtherActors  []types.ActorId

	OpMetadata column.Metadata
	OpColumns  OperationColumns

	ExtraBytes []byte
	// contains filtered or unexported fields
}

ChangeChunk is the parsed form of an Automerge change chunk.

A change records a set of operations authored by a single actor at a point in time. Changes are the unit of replication: peers exchange changes to converge on the same document state.

The operations are stored in a columnar layout: instead of one record per operation, each field (object, key, action, …) is stored as its own compressed column. OperationColumns holds re-readable SubReaders for those columns; fresh readers are created each time Operations() is called.

OtherActors lists every actor referenced by the operations in this change other than the change's own Actor. Operation actor-index 0 always refers to Actor; an index i in 1..N refers to OtherActors[i-1], where N is len(OtherActors). This local numbering keeps repeated actor IDs compact on the wire.
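
The local numbering can be illustrated with a small lookup. This is a sketch only: ActorID is a stand-in for types.ActorId, and resolveActor is a hypothetical helper, not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// ActorID is a hypothetical stand-in for types.ActorId.
type ActorID string

// resolveActor maps a change-local actor index to an actor ID:
// index 0 is the change's own actor, index i >= 1 is others[i-1].
func resolveActor(actor ActorID, others []ActorID, idx uint64) (ActorID, error) {
	if idx == 0 {
		return actor, nil
	}
	if idx > uint64(len(others)) {
		return "", errors.New("actor index out of range")
	}
	return others[idx-1], nil
}

func main() {
	others := []ActorID{"bob", "carol"}
	for idx := uint64(0); idx < 4; idx++ {
		id, err := resolveActor("alice", others, idx)
		fmt.Println(idx, id, err)
	}
}
```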

ExtraBytes is a reserved extension field in the binary format. It is preserved intact so files using future extensions can be round-tripped without loss.

func (ChangeChunk) Operations

func (cc ChangeChunk) Operations() iter.Seq2[types.ChangeOperation, error]

func (ChangeChunk) String

func (cc ChangeChunk) String() string

type ChangeColumns

type ChangeColumns struct {
	ActorId *ioutil.SubReader
	SeqNum  *ioutil.SubReader

	MaxOp *ioutil.SubReader

	Time *ioutil.SubReader

	Message *ioutil.SubReader

	DependenciesGroup *ioutil.SubReader
	DependenciesIndex *ioutil.SubReader

	ExtraMetadata *ioutil.SubReader
	ExtraData     *ioutil.SubReader
}

ChangeColumns holds SubReader references for the change summary table inside a document chunk. Each column stores one field for all changes in a compressed, run-length-encoded form.

Dependencies are stored in a two-column group: DependenciesGroup gives the number of dependencies for each change, and DependenciesIndex gives the actual dependency indices (delta-encoded for compactness). This split is the standard "group column" pattern used throughout the Automerge binary format wherever a field has a variable-length list per row.
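
As a rough sketch of the group column pattern, the function below rebuilds per-row lists from an already-decompressed count column and value column. splitGroups is a hypothetical helper, and treating the delta encoding as a running sum across the whole value column is an assumption, not something the text above confirms.

```go
package main

import "fmt"

// splitGroups rebuilds per-row lists from a group column. counts[i] is the
// number of entries for row i; deltas holds every entry, flattened and
// delta-encoded as a running sum (an assumption about the encoding).
func splitGroups(counts []int, deltas []int64) ([][]int64, error) {
	rows := make([][]int64, 0, len(counts))
	var value int64
	pos := 0
	for _, n := range counts {
		if n < 0 || n > len(deltas)-pos {
			return nil, fmt.Errorf("group of %d entries overruns column", n)
		}
		row := make([]int64, 0, n)
		for _, d := range deltas[pos : pos+n] {
			value += d
			row = append(row, value)
		}
		rows = append(rows, row)
		pos += n
	}
	if pos != len(deltas) {
		return nil, fmt.Errorf("%d trailing entries", len(deltas)-pos)
	}
	return rows, nil
}

func main() {
	// Change 0 has no dependencies, change 1 depends on 0,
	// change 2 depends on 0 and 1.
	rows, err := splitGroups([]int{0, 1, 2}, []int64{0, 0, 1})
	fmt.Println(rows, err)
}
```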

ExtraMetadata / ExtraData are a reserved extension point in the format.

type ChangeMetaWriter

type ChangeMetaWriter struct {
	// contains filtered or unexported fields
}

ChangeMetaWriter streams per-change metadata into column writers for a document chunk. Call Append for each change in order, then pass to WriteDocument.

func NewChangeMetaWriter

func NewChangeMetaWriter() *ChangeMetaWriter

NewChangeMetaWriter creates a ChangeMetaWriter ready to accept Append calls.

func (*ChangeMetaWriter) Append

func (w *ChangeMetaWriter) Append(m column.RawChangeMeta)

Append encodes one change's metadata. m.ActorIdx must be an index into the document's actor table (same mapping used for op columns).

type ChangeOpsWriter

type ChangeOpsWriter struct {
	// contains filtered or unexported fields
}

ChangeOpsWriter streams operations into per-column writers for a change chunk. Call Append for each operation, then flush once, then pass to WriteChange.

func NewChangeOpsWriter

func NewChangeOpsWriter() *ChangeOpsWriter

func (*ChangeOpsWriter) Append

func (w *ChangeOpsWriter) Append(obj types.ObjectId, key types.Key, insert bool, action types.Action, preds []types.OpId, mapper types.ActorMapper)

Append encodes one operation into the per-column writers. mapper maps global actor indices to local indices in the change.

type Chunk

type Chunk interface {
	// contains filtered or unexported methods
}

Chunk is the top-level unit of an Automerge binary file.

An Automerge file is a sequence of one or more chunks. Each chunk is either a document snapshot (DocumentChunk) or a single peer edit (ChangeChunk). A file with just one DocumentChunk is a compact, fully-merged snapshot. A file may also hold a series of ChangeChunks representing the edit history of a document, in which case the reader applies them in dependency order to reconstruct the current state.

The only concrete implementations are *DocumentChunk and *ChangeChunk. The unexported chunk() method prevents other packages from satisfying this interface, making those two the exhaustive set of variants. Callers must type-assert to the concrete type to access chunk-specific fields.
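
The sealed-interface pattern described above looks roughly like this. The types below are local stand-ins with invented fields, declared only so the sketch compiles on its own; with the real package, the same type switch would be applied to the value returned by ReadChunk.

```go
package main

import "fmt"

// Chunk is sealed by its unexported method: only types declared in this
// package can implement it.
type Chunk interface{ chunk() }

// DocumentChunk and ChangeChunk are simplified stand-ins (assumption).
type DocumentChunk struct{ Heads []string }
type ChangeChunk struct{ SeqNum uint64 }

func (*DocumentChunk) chunk() {}
func (*ChangeChunk) chunk()   {}

// describe type-switches over the two possible variants.
func describe(c Chunk) string {
	switch c := c.(type) {
	case *DocumentChunk:
		return fmt.Sprintf("document with %d head(s)", len(c.Heads))
	case *ChangeChunk:
		return fmt.Sprintf("change seq=%d", c.SeqNum)
	default:
		return "unknown chunk"
	}
}

func main() {
	fmt.Println(describe(&DocumentChunk{Heads: []string{"h1"}}))
	fmt.Println(describe(&ChangeChunk{SeqNum: 3}))
}
```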

func ReadChunk

func ReadChunk(r *ioutil.SubReader) (Chunk, int, error)

ReadChunk reads one chunk from r. The second return value is the total number of payload bytes belonging to this chunk. It is returned even on error so the caller can decide whether to skip and continue.

The file format for each chunk is:

[4 magic][4 checksum][1 type][varint length][...payload...]

The checksum is the first 4 bytes of SHA-256(type || length || payload). ReadChunk verifies the checksum before returning.
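
The framing and checksum rule can be sketched with the standard library alone. This is not ReadChunk itself: encodeChunk and decodeChunk are hypothetical helpers, the magic bytes are taken from the upstream Automerge format description rather than from this page, and the varint is read with encoding/binary, whose unsigned varint is the same encoding as unsigned LEB128.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

// magic is an assumption: these bytes come from the upstream Automerge
// format description, not from this package's documentation.
var magic = [4]byte{0x85, 0x6f, 0x4a, 0x83}

// encodeChunk frames payload as
// [4 magic][4 checksum][1 type][varint length][payload].
func encodeChunk(chunkType byte, payload []byte) []byte {
	// body is the checksummed region: type || length || payload.
	body := []byte{chunkType}
	body = binary.AppendUvarint(body, uint64(len(payload)))
	body = append(body, payload...)

	sum := sha256.Sum256(body)
	out := append([]byte{}, magic[:]...)
	out = append(out, sum[:4]...)
	return append(out, body...)
}

// decodeChunk parses one chunk from the front of buf and verifies its
// checksum. It returns the type, the payload, and the bytes consumed.
func decodeChunk(buf []byte) (byte, []byte, int, error) {
	if len(buf) < 9 {
		return 0, nil, 0, errors.New("short chunk header")
	}
	if !bytes.Equal(buf[:4], magic[:]) {
		return 0, nil, 0, errors.New("bad magic")
	}
	chunkType := buf[8]
	length, n := binary.Uvarint(buf[9:])
	if n <= 0 {
		return 0, nil, 0, errors.New("bad length varint")
	}
	start := 9 + n
	if length > uint64(len(buf)-start) {
		return 0, nil, 0, errors.New("truncated payload")
	}
	end := start + int(length)
	sum := sha256.Sum256(buf[8:end])
	if !bytes.Equal(sum[:4], buf[4:8]) {
		return 0, nil, end, errors.New("checksum mismatch")
	}
	return chunkType, buf[start:end], end, nil
}

func main() {
	chunk := encodeChunk(0x01, []byte("hello"))
	typ, payload, n, err := decodeChunk(chunk)
	fmt.Println(typ, string(payload), n, err)
}
```

The full 32-byte digest computed over the same region is what the Hashes section describes as a change's identity; the stored checksum is its first four bytes.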

type ChunkType

type ChunkType byte
const (
	ChunkTypeDocument         ChunkType = 0x00
	ChunkTypeChange           ChunkType = 0x01
	ChunkTypeCompressedChange ChunkType = 0x02
)

type DocOpsWriter

type DocOpsWriter struct {
	// contains filtered or unexported fields
}

DocOpsWriter streams operations into per-column writers for a document chunk. Unlike ChangeOpsWriter, it encodes an explicit OpId (actorId + counter) for each operation and writes a successor list instead of a predecessor list. Call Append for each operation in object order, then pass to WriteDocument.

func NewDocOpsWriter

func NewDocOpsWriter() *DocOpsWriter

NewDocOpsWriter creates a DocOpsWriter ready to accept Append calls.

func (*DocOpsWriter) Append

func (w *DocOpsWriter) Append(obj types.ObjectId, key types.Key, id types.OpId, insert bool, action types.Action, succs []types.OpId, mapper types.ActorMapper)

Append encodes one operation into the per-column writers. mapper maps global actor indices to local indices in the document chunk.

type DocumentChunk

type DocumentChunk struct {
	Actors      []types.ActorId
	Heads       []types.ChangeHash
	HeadIndexes []uint64

	ChangeMetadata column.Metadata
	ChangesColumns ChangeColumns

	OpMetadata column.Metadata
	OpColumns  OperationColumns
}

DocumentChunk is the parsed form of an Automerge document snapshot chunk.

A document chunk is a merged representation of an entire document's history. Rather than storing individual changes, it stores:

  • Every operation from every change, in columnar form (OpColumns). Each operation carries a successor list: the operations that later overwrote it. An operation with no successors is the current live value; one with successors has been overwritten or deleted.
  • Per-change metadata in ChangesColumns: actor, sequence number, maxOp, timestamp, message, and dependency indices. This is enough to reconstruct the change graph and map operations back to their originating change. The operations themselves are not duplicated here; the range of operations belonging to change i is derived from MaxOp[i-1]+1 .. MaxOp[i]. Dependency references are integer indices (not hashes) into this same array, which is stored in topological order: if change A depends on change B, B always appears at a lower index than A.
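
Taking the MaxOp formula above at face value, the counter range of each change can be computed as below. opRanges is a hypothetical helper, and the starting point (the first change begins at counter 1, as if MaxOp[-1] were 0) is an assumption the text does not state.

```go
package main

import "fmt"

// opRange is an inclusive range of operation counters.
type opRange struct{ First, Last uint64 }

// opRanges derives the counter range of change i as
// MaxOp[i-1]+1 .. MaxOp[i], treating MaxOp[-1] as 0 (assumption).
// A change with no operations yields a range where First > Last.
func opRanges(maxOps []uint64) ([]opRange, error) {
	ranges := make([]opRange, len(maxOps))
	var prev uint64
	for i, m := range maxOps {
		if m < prev {
			return nil, fmt.Errorf("maxOp decreases at change %d", i)
		}
		ranges[i] = opRange{First: prev + 1, Last: m}
		prev = m
	}
	return ranges, nil
}

func main() {
	ranges, err := opRanges([]uint64{3, 5, 9})
	fmt.Println(ranges, err)
}
```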

Heads contains the hashes of the "tip" changes — the changes that no other change in this document depends on. They identify the document's current version. HeadIndexes is a parallel array: HeadIndexes[i] is the index of Heads[i] in the change summary table, for fast lookup without scanning all changes.
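
The definition of a head (a change that no other change depends on) can be checked against the dependency indices directly. The sketch below uses plain integer slices in place of the decoded dependency columns; headIndexes is a hypothetical helper.

```go
package main

import "fmt"

// headIndexes returns the indices of changes that no other change lists
// as a dependency. deps[i] holds the dependency indices of change i.
func headIndexes(deps [][]int) []int {
	referenced := make([]bool, len(deps))
	for _, ds := range deps {
		for _, d := range ds {
			if d >= 0 && d < len(deps) {
				referenced[d] = true
			}
		}
	}
	var heads []int
	for i, r := range referenced {
		if !r {
			heads = append(heads, i)
		}
	}
	return heads
}

func main() {
	// Changes 1 and 2 both depend on change 0 and were made concurrently,
	// so the document has two heads.
	fmt.Println(headIndexes([][]int{{}, {0}, {0}}))
}
```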

Actors is the document-wide actor table. All actor references in both the change and operation columns are stored as indices into this table.

The SubReaders in OpColumns and ChangesColumns point directly into the paged reader's pages — no copy is made. Callers must not call Skip on the paged reader while this DocumentChunk (or any OpSet derived from it) is still in use.

func (DocumentChunk) Changes

Changes iterates over the change summaries embedded in this document snapshot, yielding each change's raw metadata as stored in the column data. ActorIdx is an index into d.Actors; Time and Message are nil when absent. Dependency references are indices into the document's change array (not hashes).

func (DocumentChunk) Operations

func (d DocumentChunk) Operations() iter.Seq2[types.DocOperation, error]

Operations iterates over every operation stored in this document snapshot.

Unlike a change chunk — where operations are listed in creation order — a document chunk stores operations grouped by the object they belong to. This object-local ordering is part of the document chunk format definition.

func (DocumentChunk) String

func (d DocumentChunk) String() string

type OperationColumns

type OperationColumns struct {
	ObjectActorId *ioutil.SubReader
	ObjectCounter *ioutil.SubReader

	KeyActorId *ioutil.SubReader
	KeyCounter *ioutil.SubReader
	KeyString  *ioutil.SubReader

	ActorId *ioutil.SubReader
	Counter *ioutil.SubReader

	Insert *ioutil.SubReader

	Action        *ioutil.SubReader
	ValueMetadata *ioutil.SubReader
	Value         *ioutil.SubReader

	PredecessorGroup   *ioutil.SubReader
	PredecessorActorId *ioutil.SubReader
	PredecessorCounter *ioutil.SubReader

	SuccessorGroup   *ioutil.SubReader
	SuccessorActorId *ioutil.SubReader
	SuccessorCounter *ioutil.SubReader

	ExpandControl *ioutil.SubReader

	Mark *ioutil.SubReader
}

OperationColumns holds SubReader references for a set of operations, used by both ChangeChunk and DocumentChunk.

In the Automerge binary format, operations are not stored as records. Instead, each field of every operation is stored in its own column — a single compressed stream for all values of that field across all operations. To reconstruct one operation, you read one value from each relevant column in lockstep.

Each field is nil when its column was absent from the binary data (meaning all values for that field default to null/zero).
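
A toy model of the lockstep read, including the rule that an absent column yields default values, might look like this. Plain slices stand in for the compressed column streams, and the columns, operation, and rows names are all hypothetical.

```go
package main

import "fmt"

// columns is a toy stand-in for OperationColumns: each field of every
// operation lives in its own slice. A nil slice models an absent column.
type columns struct {
	key    []string
	insert []bool
	action []uint8
}

// operation is one reconstructed row.
type operation struct {
	Key    string
	Insert bool
	Action uint8
}

// at returns col[i], or the zero value when the column is absent.
func at[T any](col []T, i int) T {
	var zero T
	if len(col) == 0 {
		return zero
	}
	return col[i]
}

// rows reads one value from each column in lockstep to rebuild n operations.
func rows(n int, c columns) ([]operation, error) {
	for _, l := range []int{len(c.key), len(c.insert), len(c.action)} {
		if l != 0 && l != n {
			return nil, fmt.Errorf("column has %d values, want %d", l, n)
		}
	}
	ops := make([]operation, n)
	for i := range ops {
		ops[i] = operation{
			Key:    at(c.key, i),
			Insert: at(c.insert, i),
			Action: at(c.action, i),
		}
	}
	return ops, nil
}

func main() {
	// The insert column is absent, so every Insert defaults to false.
	ops, err := rows(2, columns{key: []string{"a", "b"}, action: []uint8{1, 3}})
	fmt.Println(ops, err)
}
```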

Object (ObjectActorId + ObjectCounter) identifies which map, list, or text object the operation targets. The root object is represented as (0, 0).

Key identifies the position within that object:

  • For maps: KeyString holds the property name; the actor/counter pair is unused.
  • For lists and text: KeyActorId + KeyCounter identify the list element after which this operation is inserted (i.e. the OpId of its left neighbour). A null key means "insert at the head of the list."

ActorId + Counter together form the operation's own OpId — its globally unique identity. In a ChangeChunk these columns are absent; the OpId is derived from the change's Actor and StartOp counter instead.
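
One way to picture the derivation in a ChangeChunk: if operations are numbered consecutively from StartOp in the order they are listed (an assumption consistent with the text above, but not spelled out by it), the OpIds follow directly. OpID and changeOpIDs are hypothetical stand-ins.

```go
package main

import "fmt"

// OpID is a hypothetical stand-in for types.OpId.
type OpID struct {
	Actor   string
	Counter uint64
}

// changeOpIDs assigns the i-th operation of a change the counter
// startOp+i, all under the change's own actor.
func changeOpIDs(actor string, startOp uint64, n int) []OpID {
	if n < 0 {
		return nil
	}
	ids := make([]OpID, n)
	for i := range ids {
		ids[i] = OpID{Actor: actor, Counter: startOp + uint64(i)}
	}
	return ids
}

func main() {
	for _, id := range changeOpIDs("alice", 5, 3) {
		fmt.Printf("%d@%s\n", id.Counter, id.Actor)
	}
}
```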

Insert distinguishes an insertion from an assignment at an existing position. For maps, Insert is always false. For lists and text, true means a new element is being created; false means the operation targets an existing element.

Action encodes what the operation does (set a value, delete, make a map/list/text object, increment a counter, etc.), together with ValueMetadata and Value which carry the actual scalar value when the action is a set.

Predecessors (PredecessorGroup + PredecessorActorId + PredecessorCounter) list the operations that this operation supersedes — the previous value(s) at the same position. An operation with no predecessors creates a new value; one with predecessors overwrites or deletes an existing one.

Successors (SuccessorGroup + SuccessorActorId + SuccessorCounter) are the inverse: operations that later superseded this one. Only present in a DocumentChunk — a ChangeChunk does not know its future. An operation with no successors is the current live value at its position.
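
Because successors are the inverse of predecessors, they can be derived by inverting the predecessor lists once all changes are known. The sketch below shows that inversion on simplified stand-in types; OpID, Op, successors, and live are hypothetical names, and this is not necessarily how the package builds a DocumentChunk.

```go
package main

import "fmt"

// OpID and Op are simplified stand-ins for the package's types.
type OpID struct {
	Actor   string
	Counter uint64
}

type Op struct {
	ID    OpID
	Preds []OpID
}

// successors inverts the predecessor lists: if op B names A as a
// predecessor, then B is a successor of A.
func successors(ops []Op) map[OpID][]OpID {
	succs := make(map[OpID][]OpID, len(ops))
	for _, op := range ops {
		for _, p := range op.Preds {
			succs[p] = append(succs[p], op.ID)
		}
	}
	return succs
}

// live returns the IDs of operations that have no successors.
func live(ops []Op) []OpID {
	succs := successors(ops)
	var out []OpID
	for _, op := range ops {
		if len(succs[op.ID]) == 0 {
			out = append(out, op.ID)
		}
	}
	return out
}

func main() {
	a1 := OpID{"alice", 1}
	b2 := OpID{"bob", 2}
	c3 := OpID{"carol", 3}
	ops := []Op{{ID: a1}, {ID: b2, Preds: []OpID{a1}}, {ID: c3}}
	fmt.Println(successors(ops)[a1], live(ops))
}
```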

ExpandControl and Mark support rich-text mark operations (bold, italic, etc.) and are only relevant for text objects.
