roaringset

package
v1.23.10 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 19, 2024 License: BSD-3-Clause Imports: 13 Imported by: 0

Documentation

Overview

The "roaringset" package contains all the LSM business logic that is unique to the "RoaringSet" strategy

This package alone does not contain an entire LSM store. It's intended to be used as part of the github.com/weaviate/weaviate/adapters/repos/db/lsmkv package.

Motivation

What makes the RoaringSet strategy unique is that it's essentially a fully persistent Roaring Bitmap that can be built up and updated incrementally (without write amplification) while being extremely fast to query.

Without this specific strategy, it would not be efficient to use roaring bitmaps in an LSM store. For example:

  • Lucene uses posting lists in the inverted index on disk and supports converting them to a Roaring Bitmap at query time. This resulting bitmap can then be cached. However, the cost to initially convert a posting list to a roaring bitmap is quite huge. In our own tests, inserting 90M out of 100M possible ids into a github.com/weaviate/sroar.Bitmap takes about 3.5s.

  • You could store a regular roaring bitmap, such as github.com/weaviate/sroar.Bitmap in a regular LSM store, such as RocksDB. This would fix the retrieval issue and you should be able to retrieve and initialize a bitmap containing 90M objects in a few milliseconds. However, the cost to incrementally update this bitmap would be extreme. You would have to use a read-modify-write pattern which would lead to huge write-amplification on large setups. A 90M roaring bitmap is about 10.5MB, so to add a single entry (which would take up anywhere from 1 bit to 2 bytes), you would have to read 10.5MB and write 10.5MB again. That's not feasible except for bulk-loading. In Weaviate we cannot always assume bulk loading, as user behavior and insert orders are generally unpredictable.

We solve this issue by making the LSM store roaring-bitmap-native. This way, we can keep the benefits of an LSM store (very fast writes) with the benefits of a serialized roaring bitmap (very fast reads/initializations).

Essentially this means the RoaringSet strategy behaves like a fully persistent (and durable) Roaring Bitmap. See the next section to learn how it works under the hood.

Internals

The public-facing methods make use of github.com/weaviate/sroar.Bitmap. This serialized bitmap already fulfills many of the criteria needed in Weaviate. It can be initialized at almost no cost (sharing memory) or very little cost (copying some memory). Furthermore, its set, remove, and intersection methods work well for the inverted index use cases in Weaviate.

So, the novel part in the lsmkv.RoaringSet strategy does not sit in the roaring bitmap itself, but rather in the way it's persisted. It uses the standard principles of an LSM store where each new write is first cached in a memtable (and of course written into a Write-Ahead-Log to make it durable). The memtable is flushed into a disk segment when specific criteria are met (memtable size, WAL size, idle time, time since last flush, etc.).

This means that each layer (represented by BitmapLayer) only contains the deltas that were written in a specific time interval. When reading, all layers must be combined into a single bitmap (see BitmapLayers.Flatten).

Over time segments can be combined into fewer, larger segments using an LSM Compaction process. The logic for that can be found in BitmapLayers.Merge.

To make sure access is efficient the entire RoaringSet strategy is built to avoid encoding/decoding steps. Instead we internally store data as simple byte slices. For example, see SegmentNode. You can access bitmaps without any meaningful allocations using SegmentNode.Additions and SegmentNode.Deletions. If you plan to hold on to the bitmap for a time window that is longer than holding a lock that prevents a compaction, you need to copy data (e.g. using SegmentNode.AdditionsWithCopy). Even with such a copy, reading a 90M-ids bitmap takes only single-digit milliseconds.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Condense

func Condense(bm *sroar.Bitmap) *sroar.Bitmap

Operations on bitmaps may result in oversized instances in relation to number of elements currently contained in bitmap Examples of such operations: - And-ing bitmaps may results in size being sum of both sizes (especially and-ing bitmap with itself) - Removing elements from bitmap results in size not being reduced (even if there is only few or no elements left)

Method should be used before saving bitmap to file, to ensure minimal required size

For most cases Or between empty bitmap and used bitmap works pretty well for reducing its final size, except for usecase, where used bitmap uses internally bitmap - it will not be converted to underlying array, even if there are single elements left

func NewBitmap

func NewBitmap(values ...uint64) *sroar.Bitmap

Types

type BinarySearchNode

type BinarySearchNode struct {
	Key   []byte
	Value BitmapLayer
	// contains filtered or unexported fields
}

func BinarySearchNodeFromRB

func BinarySearchNodeFromRB(rbNode rbtree.Node) (bsNode *BinarySearchNode)

func (*BinarySearchNode) IsNil

func (n *BinarySearchNode) IsNil() bool

func (*BinarySearchNode) IsRed

func (n *BinarySearchNode) IsRed() bool

func (*BinarySearchNode) Left

func (n *BinarySearchNode) Left() rbtree.Node

func (*BinarySearchNode) Parent

func (n *BinarySearchNode) Parent() rbtree.Node

func (*BinarySearchNode) Right

func (n *BinarySearchNode) Right() rbtree.Node

func (*BinarySearchNode) SetLeft

func (n *BinarySearchNode) SetLeft(left rbtree.Node)

func (*BinarySearchNode) SetParent

func (n *BinarySearchNode) SetParent(parent rbtree.Node)

func (*BinarySearchNode) SetRed

func (n *BinarySearchNode) SetRed(isRed bool)

func (*BinarySearchNode) SetRight

func (n *BinarySearchNode) SetRight(right rbtree.Node)

type BinarySearchTree

type BinarySearchTree struct {
	// contains filtered or unexported fields
}

func (*BinarySearchTree) FlattenInOrder

func (t *BinarySearchTree) FlattenInOrder() []*BinarySearchNode

FlattenInOrder creates list of ordered copies of bst nodes Only Key and Value fields are populated

func (*BinarySearchTree) Get

func (t *BinarySearchTree) Get(key []byte) (BitmapLayer, error)

Get creates copies of underlying bitmaps to prevent future (concurrent) read and writes after layer being returned

func (*BinarySearchTree) Insert

func (t *BinarySearchTree) Insert(key []byte, values Insert)

type BinarySearchTreeCursor

type BinarySearchTreeCursor struct {
	// contains filtered or unexported fields
}

func NewBinarySearchTreeCursor

func NewBinarySearchTreeCursor(bst *BinarySearchTree) *BinarySearchTreeCursor

func (*BinarySearchTreeCursor) First

func (c *BinarySearchTreeCursor) First() ([]byte, BitmapLayer, error)

func (*BinarySearchTreeCursor) Next

func (c *BinarySearchTreeCursor) Next() ([]byte, BitmapLayer, error)

func (*BinarySearchTreeCursor) Seek

func (c *BinarySearchTreeCursor) Seek(key []byte) ([]byte, BitmapLayer, error)

type BitmapLayer

type BitmapLayer struct {
	Additions *sroar.Bitmap
	Deletions *sroar.Bitmap
}

A BitmapLayer contains all the bitmap related delta-information stored for a specific key in one layer. A layer typically corresponds to one disk segment or a memtable layer

A layer is essentially a snapshot in time and to get an accurate few of the set in its entirety multiple layers need to be combined using BitmapLayers.

The contents of Additions and Deletions must be mutually exclusive. A layer cannot both add and delete an element. The only way to create new layers is through inserting into a Memtable. The memtable must make sure that:

  • When an element is added, any previous deletion of this element is removed
  • When an element is deleted, any previous addition of this element is removed.

As a result, an element is either a net addition or a net deletion in a layer, but it can never be both.

func (*BitmapLayer) Clone

func (l *BitmapLayer) Clone() BitmapLayer

type BitmapLayers

type BitmapLayers []BitmapLayer

BitmapLayers are a helper type to perform operations on multiple layers, such as BitmapLayers.Flatten or BitmapLayers.Merge.

func (BitmapLayers) Flatten

func (bml BitmapLayers) Flatten() *sroar.Bitmap

Flatten reduces all snapshots into a single Bitmap. This bitmap no longer contains separate additions and deletions, but a single set where all additions and deletions have been applied in the correct order.

If you do not wish to flatten all of history, but rather combine two layers, such as would happen in a Compaction, use BitmapLayers.Merge instead.

Flatten is typically used when serving a specific key to the user: It flattens all disk segments, a currently flushing memtable if it exists, and the active memtable into a single bitmap. The final bitmap is returned to the user.

Flattening Logic

  • The first layer is seen as chronologically first. Deletions in the first layers are ignored, as there is nothing to be deleted. As a result, the additions of the first segment become the root state in the first iteration.
  • Any subsequent layer is merged into the root layer in the following way: Deletions remove any existing additions, Additions are added.
  • This process happens one layer at a time. This way delete-and-readd cycles are reflected correctly. For example, if layer 2 deletes an element X and layer 3 adds element X, then it is a net addition overall, and X should be represented in the final bitmap. If the order is reversed and layer 2 adds X, whereas layer 3 removes X, it is should not be contained in the final map.

func (BitmapLayers) Merge

func (bml BitmapLayers) Merge() (BitmapLayer, error)

Merge turns two successive layers into one. It does not flatten the segment, but keeps additions and deletions separate. This is because there are no guarantees that the first segment was the root segment. A merge could run on segments 3+4 and they could contain deletions of elements that were added in segments 1 or 2.

Merge is intended to be used as part of compactions.

type CombinedCursor

type CombinedCursor struct {
	// contains filtered or unexported fields
}

func NewCombinedCursor

func NewCombinedCursor(innerCursors []InnerCursor, keyOnly bool) *CombinedCursor

When keyOnly flag is set, only keys are returned by First/Next/Seek access methods, 2nd value returned is expected to be nil When keyOnly is not set, 2nd value is always bitmap. Returned bitmap can be empty (e.g. for Next call after last element was already returned)

func (*CombinedCursor) First

func (c *CombinedCursor) First() ([]byte, *sroar.Bitmap)

func (*CombinedCursor) Next

func (c *CombinedCursor) Next() ([]byte, *sroar.Bitmap)

func (*CombinedCursor) Seek

func (c *CombinedCursor) Seek(key []byte) ([]byte, *sroar.Bitmap)

type Compactor

type Compactor struct {
	// contains filtered or unexported fields
}

Compactor takes in a left and a right segment and merges them into a single segment. The input segments are represented by cursors without their respective segmentindexes. A new segmentindex is built from the merged nodes without taking the old indexes into account at all.

The left segment must precede the right one in its creation time, as the compactor applies latest-takes-presence rules when there is a conflict.

Merging independent key/value pairs

The new segment's nodes will be in sorted fashion (this is a requirement for the segment index and segment cursors to function). To achieve a sorted end result, the Compactor goes over both input cursors simultaneously and always works on the smaller of the two keys. After a key/value pair has been added to the output only the input cursor that provided the pair is advanced.

Merging key/value pairs with identical keys

When both segment have a key/value pair with an overlapping key, the value has to be merged. The merge logic is not part of the compactor itself. Instead it makes use of BitmapLayers.Merge.

Exit Criterium

When both cursors no longer return values, all key/value pairs are considered compacted. The compactor then deals with metadata.

Index and Header metadata

Only once the key/value pairs have been compacted, will the compactor write the primary index based on the new key/value payload. Finally, the input writer is rewinded to be able to write the header metadata at the beginning of the file. Because of this, the input writer must be an io.WriteSeeker, such as *os.File.

The level of the resulting segment is the input level increased by one. Levels help the "eligible for compaction" cycle to find suitable compaction pairs.

func NewCompactor

func NewCompactor(w io.WriteSeeker,
	left, right *SegmentCursor, level uint16,
	scratchSpacePath string, cleanupDeletions bool,
) *Compactor

NewCompactor from left (older) and right (newer) seeker. See Compactor for an explanation of what goes on under the hood, and why the input requirements are the way they are.

func (*Compactor) Do

func (c *Compactor) Do() error

Do starts a compaction. See Compactor for an explanation of this process.

type InnerCursor

type InnerCursor interface {
	First() ([]byte, BitmapLayer, error)
	Next() ([]byte, BitmapLayer, error)
	Seek(key []byte) ([]byte, BitmapLayer, error)
}

type Insert

type Insert struct {
	Additions []uint64
	Deletions []uint64
}

type Seeker

type Seeker interface {
	Seek(key []byte) (segmentindex.Node, error)
}

type SegmentCursor

type SegmentCursor struct {
	// contains filtered or unexported fields
}

A SegmentCursor iterates over all key-value pairs in a single disk segment. You can either start at the beginning using *SegmentCursor.First or start at an arbitrary key that you may find using *SegmentCursor.Seek

func NewSegmentCursor

func NewSegmentCursor(data []byte, index Seeker) *SegmentCursor

NewSegmentCursor creates a cursor for a single disk segment. Make sure that the data buf is already sliced correctly to start at the payload, as calling *SegmentCursor.First will start reading at offset 0 relative to the passed in buffer. Similarly, the buffer may only contain payloads, as the buffer end is used to determine if more keys can be found.

Therefore if the payload is part of a longer continuous buffer, the cursor should be initialized with data[payloadStartPos:payloadEndPos]

func (*SegmentCursor) First

func (c *SegmentCursor) First() ([]byte, BitmapLayer, error)

func (*SegmentCursor) Next

func (c *SegmentCursor) Next() ([]byte, BitmapLayer, error)

func (*SegmentCursor) Seek

func (c *SegmentCursor) Seek(key []byte) ([]byte, BitmapLayer, error)

type SegmentNode

type SegmentNode struct {
	// contains filtered or unexported fields
}

SegmentNode stores one Key-Value pair (without its index) in the LSM Segment. It uses a single []byte internally. As a result there is no decode step required at runtime. Instead you can use

to access the contents. Those helpers in turn do not require a decoding step. The accessor methods that return Roaring Bitmaps only point to existing memory (methods without WithCopy suffix), or in the worst case copy one byte slice (methods with WithCopy suffix).

This makes the SegmentNode very fast to access at query time, even when it contains a large amount of data.

The internal structure of the data is:

byte begin-start    | description
--------------------|-----------------------------------------------------
0-8                 | uint64 indicating the total length of the node,
                    | this is used in cursors to identify the next node.
8-16                | uint64 length indicator for additions sraor bm -> x
16-(x+16)           | additions bitmap
(x+16)-(x+24)       | uint64 length indicator for deletions sroar bm -> y
(x+24)-(x+y+24)     | deletions bitmap
(x+y+24)-(x+y+28)   | uint32 length indicator for primary key length -> z
(x+y+28)-(x+y+z+28) | primary key

func NewSegmentNode

func NewSegmentNode(
	key []byte, additions, deletions *sroar.Bitmap,
) (*SegmentNode, error)

func NewSegmentNodeFromBuffer

func NewSegmentNodeFromBuffer(buf []byte) *SegmentNode

NewSegmentNodeFromBuffer creates a new segment node by using the underlying buffer without copying data. Only use this when you can be sure that it's safe to share the data or create your own copy.

func (*SegmentNode) Additions

func (sn *SegmentNode) Additions() *sroar.Bitmap

Additions returns the additions roaring bitmap with shared state. Only use this method if you can guarantee that you will only use it while holding a maintenance lock or can otherwise be sure that no compaction can occur. If you can't guarantee that, instead use *SegmentNode.AdditionsWithCopy.

func (*SegmentNode) AdditionsWithCopy

func (sn *SegmentNode) AdditionsWithCopy() *sroar.Bitmap

AdditionsWithCopy returns the additions roaring bitmap without sharing state. It creates a copy of the underlying buffer. This is safe to use indefinitely, but much slower than *SegmentNode.Additions as it requires copying all the memory. If you know that you will only need the contents of the node for a duration of time where a lock is held that prevents compactions, it is more efficient to use *SegmentNode.Additions.

func (*SegmentNode) Deletions

func (sn *SegmentNode) Deletions() *sroar.Bitmap

Deletions returns the deletions roaring bitmap with shared state. Only use this method if you can guarantee that you will only use it while holding a maintenance lock or can otherwise be sure that no compaction can occur. If you can't guarantee that, instead use *SegmentNode.DeletionsWithCopy.

func (*SegmentNode) DeletionsWithCopy

func (sn *SegmentNode) DeletionsWithCopy() *sroar.Bitmap

DeletionsWithCopy returns the deletions roaring bitmap without sharing state. It creates a copy of the underlying buffer. This is safe to use indefinitely, but much slower than *SegmentNode.Deletions as it requires copying all the memory. If you know that you will only need the contents of the node for a duration of time where a lock is held that prevents compactions, it is more efficient to use *SegmentNode.Deletions.

func (*SegmentNode) KeyIndexAndWriteTo

func (sn *SegmentNode) KeyIndexAndWriteTo(w io.Writer, offset int) (segmentindex.Key, error)

KeyIndexAndWriteTo is a helper to flush a memtables full of SegmentNodes. It writes itself into the given writer and returns a segmentindex.Key with start and end indicators (respecting SegmentNode.Offset). Those keys can then be used to build an index for the nodes. The combination of index and node make up an LSM segment.

RoaringSets do not support secondary keys, thus the segmentindex.Key will only ever contain a primary key.

func (*SegmentNode) Len

func (sn *SegmentNode) Len() uint64

Len indicates the total length of the SegmentNode. When reading multiple segments back-2-back, such as in a cursor situation, the offset of element (n+1) is the offset of element n + Len()

func (*SegmentNode) PrimaryKey

func (sn *SegmentNode) PrimaryKey() []byte

func (*SegmentNode) ToBuffer

func (sn *SegmentNode) ToBuffer() []byte

ToBuffer returns the internal buffer without copying data. Only use this, when you can be sure that it's safe to share the data, or create your own copy.

It truncates the buffer at is own length, in case it was initialized with a long buffer that only had a beginning offset, but no end. Such a situation may occur with cursors. If we then returned the whole buffer and don't know what the caller plans on doing with the data, we risk passing around too much memory. Truncating at the length prevents this and has no other negative effects.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL