automerge

package module
v0.0.0-...-4281e50 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 9, 2020 License: Apache-2.0 Imports: 8 Imported by: 1

README

automerge-go

automerge-go is an experiment in Go of implementing columnar automerge as described by Martin Kleppman (columnar experiment writeup).

*** THIS NON-FUNCTIONAL CODE ***

While the code will be functional at some point, the current intent of the code is to understand the performance characteristics of columnar automerge.

Current Performance

Run from mac pro desktop. https://github.com/savaki/automerge-benchmark

=== RUN   TestPerformance
applying edits ...
 25000:  62388 bytes, 3.1 µs/op
 50000: 126464 bytes, 3.1 µs/op
 75000: 189055 bytes, 4.2 µs/op
100000: 252513 bytes, 3.5 µs/op
125000: 314784 bytes, 2.9 µs/op
150000: 376359 bytes, 7.2 µs/op
175000: 439171 bytes, 3.5 µs/op
200000: 502771 bytes, 6.7 µs/op
225000: 569985 bytes, 7.5 µs/op
250000: 634307 bytes, 4.1 µs/op

edits -> 259778
bytes -> 659432
pages -> 5056
--- PASS: TestPerformance (1.52s)

General Concepts

automerge-go organizes automerge operations into a series of pages. Each page currently consists of 6 fields, with data stored in columnar format:

  • op_counter - together with op_actor form the lamport timestamp of the current operation (int64)
  • op_actor - together with op_id form the lamport timestamp of the current operation ([]byte)
  • ref_counter - counter for reference lamport timestamp (int64)
  • ref_actor - actor for reference lamport timestamp ([]byte)
  • op_type - identifies the operation to be performed (int64)
  • value - the data associated with the operation (varies)

Each field is encoded using one of following encodings which are borrowed heavily from parquet:

  • rle - run length encoding int64
  • delta_rle - delta encoding on top of run length encoding
  • dictionary_rle - encodes []byte to an int64 and stores the int64 using run length encoding. Similar to parquet's DICTIONARY_RLE
  • plain - records are stored in adjancent series based on parquet's PLAIN format

Encodings

rle

rle provides a simple run length encoding for int64 values. Each sequence of int64 is stored as a pair of values, the repetition count along with the value.

For example, the sequence [1] would be stored as [1,1] which you can think of as 1 repetition of 1. The sequence [1,1,1] would be stored as [3,1]; 3 repetitions of 1. And sequence like [1,1,1,2,2,1,1,1,1] would be stored as [3,1,2,2,4,1].

delta_rle

delta_rle sits atop rle and first encodes int64 values as a series of deltas.

For example, with the sequence, [1,2,3], we could first delta encode this to [1,1,1] which can be though of as start with 1, add 1 to get to the second value, 2, and add 1 to that to get to the third value, 3. [1,1,1] can then be further reduced using rle to [3,1].

As the changes of repetition get longer, delta_rle compression gets better. For example, a sequence from 1 to 100 would encode as [100,1] or just two bytes.

plain

plain encoding stores data in series, one record after the next. plain currently supports the following logical data types:

  • []byte
  • string
  • rune (utf8 character)
  • int64
  • key value pair

Underneath the logical data type is a raw data type, currently only varint and byte array. varints are stored using a dynamic number of bytes in the same manner that protobuf uses. byte arrays are stored as a varint that represents the length of the byte array followed by the bytes.

dictionary_rle

dictionary_rle combines plain and rle encodings to store []byte values. dictionary_rle first encodes the []byte using plain encoding, ensuring there are no duplicates. dictionary_rle then takes the order index from plain and uses that as an int64 which is then encoded using rle

Fields

op_counter, op_actor

Although stored as separate columns, these two fields together form the lamport timestamp of the operation.

op_counter is encoded using delta_rle to take advantage of the fact that sequences of text characters will be simple sequences e.g.

op_actor is encoded using dictionary_rle. The expectation is that for any given page, there will likely be far fewer distinct op_actor values than there will be operations in the page.

ref_counter, ref_actor

Also stored as separate columns, the two fields identify the causal reference for the operation.

ref_counter is encoded using delta_rle

ref_actor is encoded using dictionary_rle

To indicate start of document, the ref_counter, ref_actor pair of 0, nil should be used.

op_type

op_type identifies the operation to be performed. The assumption is that each operation can be encoded into an int64 and that readers are responsible for interpreting the results.

op_type is encoded using rle.

value

value contains the data associated with the operation and may store multiple types of data.

value is encoded using plain encoding.

Page

A page is a collection of operations. To keep pages from growing too large (and slowing down the app), pages may be split into smaller pages.

Each page maintains a bloom filter of all the op_ids contained within the page to avoid users having to search all the records to find a given element.

Node

A node (naming?) represents a collection of pages that share common logical and raw types. The current intent is to allow logical portions of a document like automerge Text and Table elements to be stored under a single node. May need to revisit this idea later.

When pages get too large, the node will split the pages into multiple smaller pages for performance reasons.

Document

A document represents the top level entity that the user will interact with. Documents contain a set of nodes.

Documentation

Index

Constants

View Source
const (
	TextInsert = 0
	TextDelete = 1
)

Variables

This section is empty.

Functions

This section is empty.

Types

type ID

type ID struct {
	Counter int64
	Actor   []byte
}

func NewID

func NewID(counter int64, actor []byte) ID

func (ID) Equal

func (i ID) Equal(that ID) bool

type IDToken

type IDToken struct {
	Counter int64
	Actor   []byte
	// contains filtered or unexported fields
}

type Object

type Object struct {
	// contains filtered or unexported fields
}

Object encapsulates a logical object within the document e.g. a Text block, an Object, an Array, etc

func NewObject

func NewObject(rawType encoding.RawType, opts ...ObjectOption) *Object

NewObject returns a new object whose value is of RawType using the options provided

func (*Object) Apply

func (o *Object) Apply(op Op) (int64, error)

func (*Object) NextValue

func (o *Object) NextValue(token ValueToken) (ValueToken, error)

func (*Object) RowCount

func (o *Object) RowCount() (n int64)

func (*Object) Size

func (o *Object) Size() int

type ObjectOption

type ObjectOption func(*objectOptions)

ObjectOption provides functional options to Object

func WithBloomOptions

func WithBloomOptions(m, k uint) ObjectOption

func WithMaxPageSize

func WithMaxPageSize(n int64) ObjectOption

WithMaxPageSize defines maximum number of records contained in a single page

type Op

type Op struct {
	ID    ID
	Ref   ID
	Type  int64
	Value encoding.Value
}

type Page

type Page struct {
	// contains filtered or unexported fields
}

func NewPage

func NewPage(rawType encoding.RawType) *Page

func (*Page) FindIndex

func (p *Page) FindIndex(counter int64, actor []byte) (int64, error)

func (*Page) InsertAt

func (p *Page) InsertAt(index int64, op Op) error

func (*Page) InsertAtTranslated

func (p *Page) InsertAtTranslated(index int64, op Op, isDelete func(int64) bool) error

func (*Page) NextID

func (p *Page) NextID(token IDToken) (IDToken, error)

func (*Page) NextValue

func (p *Page) NextValue(token PageValueToken) (PageValueToken, error)

func (*Page) Size

func (p *Page) Size() int

func (*Page) SplitAt

func (p *Page) SplitAt(index int64) (left, right *Page, err error)

type PageToken

type PageToken struct {
	Op Op
}

type PageValueToken

type PageValueToken struct {
	OpType int64
	Value  encoding.Value
	// contains filtered or unexported fields
}

type Text

type Text struct {
	// contains filtered or unexported fields
}

func NewText

func NewText(opts ...ObjectOption) *Text

func (*Text) Apply

func (t *Text) Apply(op Op) error

func (*Text) InsertAt

func (t *Text) InsertAt(rr ...rune) error

func (*Text) RowCount

func (t *Text) RowCount() int64

func (*Text) Size

func (t *Text) Size() int

type ValueToken

type ValueToken struct {
	PageValueToken
	// contains filtered or unexported fields
}

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL