skiptake

package module
v0.9.4 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 21, 2022 License: MIT Imports: 5 Imported by: 0

README

SkipTakeList

SkipTakeList is a datatype for storing a sequence of strictly increasing integers, most efficient for sequences which have many contiguous sub-sequences with contiguous gaps.

The basic idea is an interleaved list of skip and take instructions on how to build the sequence from preforming 'skip' and 'take' on the sequence of all integers.

Eg: Skip 1, take 4, skip 2, take 3 would create the sequence: (1, 2, 3, 4, 7, 8, 9), and is effectively (Skip [0]), (Take [1-4]), (Skip [5-6]), (Take [8-9])

As most skip and take values are not actual output values, but always positive differences between such values, the integer type used to store the differences can be of a smaller width to conserve memory.

It is a form of low complexity bitmap compression well suited to very long sequences of unknown length. It was written with the storage of table indexes in mind.

The sequence stored must always be strictly increasing and non-negative, such as a sorted sequence of array indicies.

The skip take list is stored as a sequence of variable width integers in a byte sequence.

Documentation

Overview

Package skiptake is an implementation of a low-complexity integer sequence compression.

A Skip-Take List is a datatype for storing a sequence of strictly increasing integers, most efficient for sequences which have many contiguous sub-sequences with contiguous gaps.

The basic idea is an interleaved list of skip and take instructions on how to build the sequence from preforming 'skip' and 'take' on the sequence of all integers.

Eg:

Skip 1, take 4, skip 2, take 3

would create the sequence:

(1, 2, 3, 4, 7, 8, 9),

and is effectively

Skip: 0         5 6
Take:   1 2 3 4     7 8 9

As most skip and take values are not actual output values, but always positive differences between such values, the integer type used to store the differences can be of a smaller bitwidth to conserve memory.

In this package, the operation 'skip' always implies a take of at lest 1 following the skip, but the operation take does not imply a non-zero take before it.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Equal added in v0.0.2

func Equal(a, b List) bool

Equal returns true if two lists are the same. That is, they contain the same subsequence.

Types

type Builder added in v0.0.2

type Builder struct {
	Encoder Encoder
	// contains filtered or unexported fields
}

Builder incrementally constructs a skip-take list from a sequence of increasing integers or raw skip and take values.

The resulting list is built by maintaining a current take count which is flushed whenever a non-zero skip occurs.

func Build

func Build(l *List) Builder

Build returns a skip take builder that stores the list it creates in the passed argument. The passed list will also be returned by Finish().

Note that Build returns a Builder, not a pointer to a Builder. In order for changes to propage, this builder should be passed by reference.

func (*Builder) Finish added in v0.0.2

func (b *Builder) Finish() List

Finish flushes any pending data to the built list and returns it.

func (*Builder) Next added in v0.0.2

func (b *Builder) Next(n uint64) bool

Next feeds the value 'n' of the strictly increasing sequence to encode to the builder. Next automatically detects skip and non-skips in the sequence, encoding appropriately. Returns true if 'n' is greater than all previous values add, meaning that the sequence is strictly increasing. Otherwise the value 'n' is ignored and false is returned.

func (*Builder) Skip added in v0.0.2

func (b *Builder) Skip(skip uint64)

Skip adds a skip value to the list being built. Every call to skip implies a take of one. Repeat calls to Skip() will NOT sum together due to this implied take.

A skip of 0 has the same effect as incrementing the current take count. A non-zero skip always flushes the current take count.

func (*Builder) Take added in v0.0.2

func (b *Builder) Take(take uint64)

Take increases the current take count by the passed amount.

type Decoder added in v0.0.2

type Decoder struct {
	Elements List
	// contains filtered or unexported fields
}

Decoder abstracts reading pairs from the list.

See skiptake.Iterator for a general-purpose iterator.

func (Decoder) EOS added in v0.0.2

func (d Decoder) EOS() bool

EOS returns if the decoder is a that end of the sequence. Unlike Iterator.EOS(), this will return true before a call to Next().

func (*Decoder) Next added in v0.0.2

func (d *Decoder) Next() (skip, take uint64)

Next returns the next pair of skip, take values. Returns (0,0) in the case of end-of-sequence, although this sequence can occur within a list.

func (*Decoder) PeekSkip added in v0.9.0

func (d *Decoder) PeekSkip() uint64

PeekSkip returns the next skip values, without advancing the current decode location.

func (*Decoder) Reset added in v0.0.2

func (d *Decoder) Reset()

Reset resets the location of the decoder to the beginning of the sequence.

type Encoder added in v0.0.2

type Encoder struct {
	Elements *List
	// contains filtered or unexported fields
}

Encoder abstracts appending items to the list.

See skiptake.Builder for a general-purpose list builder.

func (*Encoder) Add added in v0.0.2

func (e *Encoder) Add(skip, take uint64)

Add adds a new skip-take pair to the sequence.

func (Encoder) Flush added in v0.9.0

func (e Encoder) Flush()

Flush instructs the encoder to write out any pending state.

type Iterator added in v0.0.2

type Iterator struct {
	Decoder *Decoder
	// contains filtered or unexported fields
}

Iterator holds the state for iterating and seeking through the original input values.

func (Iterator) EOS added in v0.0.2

func (t Iterator) EOS() bool

EOS returns true if the stream is at end-of-stream. Because Iterator can iterate by either individual sequence values or intervals, EOS will not return true until AFTER a call of either Next() or NextSkipTake() causes EOS to be reached. Use of EOS should therefor occur after calls to Next() or NextSkipTake(). Eg:

for skip, take := iter.Next(); !iter.EOS(); skip, take = iter.Next() {
	...
}

func (*Iterator) Interval added in v0.9.0

func (t *Iterator) Interval() (first uint64, last uint64)

Interval returns the contigious interval from the subsequence that correlates to the current skip-take pair. Returns the first and last values inclusive of this contigious interval.

Returns (math.MaxUint64, 0) in the case of EOS or iterator that has never had Next() or NextSkipTake() called.

func (*Iterator) Next added in v0.0.2

func (t *Iterator) Next() uint64

Next returns the next value in the subsequence. Returns math.MaxUint64 at end-of-sequence, althougth this is a legitimate subsequence value. Use EOS() to differentiate in this case.

If following a call to NextSkipTake() or Seek(), returns the first subsequence value of the new take interval.

Calling Next shrinks the current interval by one, as returned by Interval().

func (*Iterator) NextInterval added in v0.9.1

func (t *Iterator) NextInterval() (first uint64, last uint64)

NextInterval fetches the next interval range in the expanded sequence. The values returned are inclusive, that is both first and last are members of the subsequence. In the case of a single-element interval, first and last will be equal.

Values returned between two calls never abut, that is the value of first is always at least two more than the previous value of last.

Returns (math.MaxUint64, 0) in the case of end of stream.

first is always less than or equal to last, except for EOS.

func (*Iterator) NextSkipTake added in v0.0.2

func (t *Iterator) NextSkipTake() (skip, take uint64)

NextSkipTake advances the iterator to the next skip-take interval, returning the skip and take values. A following call to Next() returns the expanded sequence value at the beginning of the interval. Returns (0, 0) if-and-only-if the iterator is at the end-of-sequence.

NextSkipTake will coalesce any zero skip or zero take values that occur within the underlying list. Callers are safe to assume that while not at the end-of-sequence, all returned intervals will have a discontinuity (non-zero skip), and have a greater than zero size (non-zero take.)

This means that Iterator.NextSkipTake() may not always return the same values as would be returned by Decoder.Next(), which can return zero skip and take values if they are present in the source list.

func (*Iterator) Reset added in v0.0.2

func (t *Iterator) Reset()

Reset resets the iterator to it's inital state at the beginning of the list.

func (*Iterator) Seek added in v0.0.2

func (t *Iterator) Seek(pos uint64) (uint64, uint64)

Seek seeks to the i'th position in the subsequence. Returns the subsequence value at position i as skip, and the count of how many following sequential values as take. These values are identical to what would be the first skip-take pair of a subsequence that was truncated to start at the i'th position.

A following call to Next() start returns the same i'th sequence value, and subsequent calls to Next() continue as normal within the expanded sequence.

A following call to NextSkipTake() returns next skip take pair, not the skip take pair that was seeked to.

type List added in v0.0.2

type List []byte

List is the type that holds a Skip-Take list.

A Skip-Take list is encoded as a series of variable width integers, with additional run length encoding.

func Complement

func Complement(list List) List

Complement returns a new List that is the set algebra complement of the passed List set. The range of the returned list is [0, math.MaxUint64].

func ComplementMax added in v0.0.2

func ComplementMax(list List, max uint64) List

ComplementMax returns a new List that is the set algebra complement of the passed List set, bounded to the range [0, max].

func Create

func Create(values ...uint64) List

Create creates a skip-take list from the passed slice of values. These values should be a strictly increasing sequence. Returns nil if not.

func FromRaw

func FromRaw(v ...uint64) List

FromRaw creates a skip-take list from a slice of []uint64 values representing a sequence of alternating skip and take values.

func Intersection

func Intersection(lists ...List) List

Intersection returns a new List that is the computed set algebra intersection of the passed slice of lists.

func Union

func Union(lists ...List) List

Union returns a new List that is the computed set algebra union of the passed slice of lists.

func (List) Decode added in v0.0.2

func (l List) Decode() Decoder

Decode returns a new skiptake.Decoder for the list. Note that the decoder should be passed by reference to maintain state.

func (*List) Encode added in v0.0.2

func (l *List) Encode() Encoder

Encode returns a new skiptake.Encoder for the list. Note that the encoder must be passed by reference to maintain state.

func (List) Expand added in v0.0.2

func (l List) Expand() []uint64

Expand expands the sequence as a slice of uint64 values of length Len().

Note: Caution is to be exercised. A naive use of Expand() on a result of skiptake.Complement() can create an array of a multiple of 2^64, which is ~147 EB (147,000,000 GB) in size.

func (List) Format added in v0.0.2

func (l List) Format(maxLen int) string

Format returns a human-friendly representation of the list of values as a sequence of ranges, or individual members for ranges of length 1.

If the argument maxLen > 0, it specifies the maximum length of the string to return. List which exceed this limit will be truncated with `...`. Otherwise all ranges will be included.

func (List) GetRaw added in v0.0.2

func (l List) GetRaw() []uint64

GetRaw returns a []uint64 sequence of alternating skip and take values.

func (List) Iterate added in v0.0.2

func (l List) Iterate() Iterator

Iterate returns a new skiptake.Iterator for the passed list.

func (List) Len added in v0.0.2

func (l List) Len() uint64

Len returns how many values are in the expanded sequence.

func (*List) Reset added in v0.9.0

func (l *List) Reset()

Reset the list as a new empty list.

func (List) String added in v0.0.2

func (l List) String() string

Implement the fmt.Stringer interface. Returns

l.Format(120).

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL