roaringset

package

v1.23.16 Latest Latest Go to latest Published: May 29, 2024 License: BSD-3-Clause Imports: 13 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/weaviate/weaviate

Links

Open Source Insights

Documentation ¶

Overview ¶

The "roaringset" package contains all the LSM business logic that is unique to the "RoaringSet" strategy

This package alone does not contain an entire LSM store. It's intended to be used as part of the github.com/weaviate/weaviate/adapters/repos/db/lsmkv package.

Motivation ¶

What makes the RoaringSet strategy unique is that it's essentially a fully persistent Roaring Bitmap that can be built up and updated incrementally (without write amplification) while being extremely fast to query.

Without this specific strategy, it would not be efficient to use roaring bitmaps in an LSM store. For example:

Lucene uses posting lists in the inverted index on disk and supports converting them to a Roaring Bitmap at query time. This resulting bitmap can then be cached. However, the cost to initially convert a posting list to a roaring bitmap is quite huge. In our own tests, inserting 90M out of 100M possible ids into a github.com/weaviate/sroar.Bitmap takes about 3.5s.
You could store a regular roaring bitmap, such as github.com/weaviate/sroar.Bitmap in a regular LSM store, such as RocksDB. This would fix the retrieval issue and you should be able to retrieve and initialize a bitmap containing 90M objects in a few milliseconds. However, the cost to incrementally update this bitmap would be extreme. You would have to use a read-modify-write pattern which would lead to huge write-amplification on large setups. A 90M roaring bitmap is about 10.5MB, so to add a single entry (which would take up anywhere from 1 bit to 2 bytes), you would have to read 10.5MB and write 10.5MB again. That's not feasible except for bulk-loading. In Weaviate we cannot always assume bulk loading, as user behavior and insert orders are generally unpredictable.

We solve this issue by making the LSM store roaring-bitmap-native. This way, we can keep the benefits of an LSM store (very fast writes) with the benefits of a serialized roaring bitmap (very fast reads/initializations).

Essentially this means the RoaringSet strategy behaves like a fully persistent (and durable) Roaring Bitmap. See the next section to learn how it works under the hood.

Internals ¶

The public-facing methods make use of github.com/weaviate/sroar.Bitmap. This serialized bitmap already fulfills many of the criteria needed in Weaviate. It can be initialized at almost no cost (sharing memory) or very little cost (copying some memory). Furthermore, its set, remove, and intersection methods work well for the inverted index use cases in Weaviate.

So, the novel part in the lsmkv.RoaringSet strategy does not sit in the roaring bitmap itself, but rather in the way it's persisted. It uses the standard principles of an LSM store where each new write is first cached in a memtable (and of course written into a Write-Ahead-Log to make it durable). The memtable is flushed into a disk segment when specific criteria are met (memtable size, WAL size, idle time, time since last flush, etc.).

This means that each layer (represented by BitmapLayer) only contains the deltas that were written in a specific time interval. When reading, all layers must be combined into a single bitmap (see BitmapLayers.Flatten).

Over time segments can be combined into fewer, larger segments using an LSM Compaction process. The logic for that can be found in BitmapLayers.Merge.

To make sure access is efficient the entire RoaringSet strategy is built to avoid encoding/decoding steps. Instead we internally store data as simple byte slices. For example, see SegmentNode. You can access bitmaps without any meaningful allocations using SegmentNode.Additions and SegmentNode.Deletions. If you plan to hold on to the bitmap for a time window that is longer than holding a lock that prevents a compaction, you need to copy data (e.g. using SegmentNode.AdditionsWithCopy). Even with such a copy, reading a 90M-ids bitmap takes only single-digit milliseconds.

Index ¶

func Condense(bm *sroar.Bitmap) *sroar.Bitmap
func NewBitmap(values ...uint64) *sroar.Bitmap
type BinarySearchNode
- func BinarySearchNodeFromRB(rbNode rbtree.Node) (bsNode *BinarySearchNode)
- func (n *BinarySearchNode) IsNil() bool
- func (n *BinarySearchNode) IsRed() bool
- func (n *BinarySearchNode) Left() rbtree.Node
- func (n *BinarySearchNode) Parent() rbtree.Node
- func (n *BinarySearchNode) Right() rbtree.Node
- func (n *BinarySearchNode) SetLeft(left rbtree.Node)
- func (n *BinarySearchNode) SetParent(parent rbtree.Node)
- func (n *BinarySearchNode) SetRed(isRed bool)
- func (n *BinarySearchNode) SetRight(right rbtree.Node)
type BinarySearchTree
- func (t *BinarySearchTree) FlattenInOrder() []*BinarySearchNode
- func (t *BinarySearchTree) Get(key []byte) (BitmapLayer, error)
- func (t *BinarySearchTree) Insert(key []byte, values Insert)
type BinarySearchTreeCursor
- func NewBinarySearchTreeCursor(bst *BinarySearchTree) *BinarySearchTreeCursor
- func (c *BinarySearchTreeCursor) First() ([]byte, BitmapLayer, error)
- func (c *BinarySearchTreeCursor) Next() ([]byte, BitmapLayer, error)
- func (c *BinarySearchTreeCursor) Seek(key []byte) ([]byte, BitmapLayer, error)
type BitmapLayer
- func (l *BitmapLayer) Clone() BitmapLayer
type BitmapLayers
- func (bml BitmapLayers) Flatten() *sroar.Bitmap
- func (bml BitmapLayers) Merge() (BitmapLayer, error)
type CombinedCursor
- func NewCombinedCursor(innerCursors []InnerCursor, keyOnly bool) *CombinedCursor
- func (c *CombinedCursor) First() ([]byte, *sroar.Bitmap)
- func (c *CombinedCursor) Next() ([]byte, *sroar.Bitmap)
- func (c *CombinedCursor) Seek(key []byte) ([]byte, *sroar.Bitmap)
type Compactor
- func NewCompactor(w io.WriteSeeker, left, right *SegmentCursor, level uint16, ...) *Compactor
- func (c *Compactor) Do() error
type InnerCursor
type Insert
type Seeker
type SegmentCursor
- func NewSegmentCursor(data []byte, index Seeker) *SegmentCursor
- func (c *SegmentCursor) First() ([]byte, BitmapLayer, error)
- func (c *SegmentCursor) Next() ([]byte, BitmapLayer, error)
- func (c *SegmentCursor) Seek(key []byte) ([]byte, BitmapLayer, error)
type SegmentNode
- func NewSegmentNode(key []byte, additions, deletions *sroar.Bitmap) (*SegmentNode, error)
- func NewSegmentNodeFromBuffer(buf []byte) *SegmentNode
- func (sn *SegmentNode) Additions() *sroar.Bitmap
- func (sn *SegmentNode) AdditionsWithCopy() *sroar.Bitmap
- func (sn *SegmentNode) Deletions() *sroar.Bitmap
- func (sn *SegmentNode) DeletionsWithCopy() *sroar.Bitmap
- func (sn *SegmentNode) KeyIndexAndWriteTo(w io.Writer, offset int) (segmentindex.Key, error)
- func (sn *SegmentNode) Len() uint64
- func (sn *SegmentNode) PrimaryKey() []byte
- func (sn *SegmentNode) ToBuffer() []byte

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func Condense ¶

func Condense(bm *sroar.Bitmap) *sroar.Bitmap

Operations on bitmaps may result in oversized instances in relation to number of elements currently contained in bitmap Examples of such operations: - And-ing bitmaps may results in size being sum of both sizes (especially and-ing bitmap with itself) - Removing elements from bitmap results in size not being reduced (even if there is only few or no elements left)

Method should be used before saving bitmap to file, to ensure minimal required size

For most cases Or between empty bitmap and used bitmap works pretty well for reducing its final size, except for usecase, where used bitmap uses internally bitmap - it will not be converted to underlying array, even if there are single elements left

func NewBitmap ¶

func NewBitmap(values ...uint64) *sroar.Bitmap

Types ¶

type BinarySearchNode ¶

type BinarySearchNode struct {
	Key   []byte
	Value BitmapLayer
	// contains filtered or unexported fields
}

func BinarySearchNodeFromRB ¶

func BinarySearchNodeFromRB(rbNode rbtree.Node) (bsNode *BinarySearchNode)

func (*BinarySearchNode) IsNil ¶

func (n *BinarySearchNode) IsNil() bool

func (*BinarySearchNode) IsRed ¶

func (n *BinarySearchNode) IsRed() bool

func (*BinarySearchNode) Left ¶

func (n *BinarySearchNode) Left() rbtree.Node

func (*BinarySearchNode) Parent ¶

func (n *BinarySearchNode) Parent() rbtree.Node

func (*BinarySearchNode) Right ¶

func (n *BinarySearchNode) Right() rbtree.Node

func (*BinarySearchNode) SetLeft ¶

func (n *BinarySearchNode) SetLeft(left rbtree.Node)

func (*BinarySearchNode) SetParent ¶

func (n *BinarySearchNode) SetParent(parent rbtree.Node)

func (*BinarySearchNode) SetRed ¶

func (n *BinarySearchNode) SetRed(isRed bool)

func (*BinarySearchNode) SetRight ¶

func (n *BinarySearchNode) SetRight(right rbtree.Node)

type BinarySearchTree ¶

type BinarySearchTree struct {
	// contains filtered or unexported fields
}

func (*BinarySearchTree) FlattenInOrder ¶

func (t *BinarySearchTree) FlattenInOrder() []*BinarySearchNode

FlattenInOrder creates list of ordered copies of bst nodes Only Key and Value fields are populated

func (*BinarySearchTree) Get ¶

func (t *BinarySearchTree) Get(key []byte) (BitmapLayer, error)

Get creates copies of underlying bitmaps to prevent future (concurrent) read and writes after layer being returned

func (*BinarySearchTree) Insert ¶

func (t *BinarySearchTree) Insert(key []byte, values Insert)

type BinarySearchTreeCursor ¶

type BinarySearchTreeCursor struct {
	// contains filtered or unexported fields
}

func NewBinarySearchTreeCursor ¶

func NewBinarySearchTreeCursor(bst *BinarySearchTree) *BinarySearchTreeCursor

func (*BinarySearchTreeCursor) First ¶

func (c *BinarySearchTreeCursor) First() ([]byte, BitmapLayer, error)

func (*BinarySearchTreeCursor) Next ¶

func (c *BinarySearchTreeCursor) Next() ([]byte, BitmapLayer, error)

func (*BinarySearchTreeCursor) Seek ¶

func (c *BinarySearchTreeCursor) Seek(key []byte) ([]byte, BitmapLayer, error)

type BitmapLayer ¶

type BitmapLayer struct {
	Additions *sroar.Bitmap
	Deletions *sroar.Bitmap
}

A BitmapLayer contains all the bitmap related delta-information stored for a specific key in one layer. A layer typically corresponds to one disk segment or a memtable layer

A layer is essentially a snapshot in time and to get an accurate few of the set in its entirety multiple layers need to be combined using BitmapLayers.

The contents of Additions and Deletions must be mutually exclusive. A layer cannot both add and delete an element. The only way to create new layers is through inserting into a Memtable. The memtable must make sure that:

When an element is added, any previous deletion of this element is removed
When an element is deleted, any previous addition of this element is removed.

As a result, an element is either a net addition or a net deletion in a layer, but it can never be both.

func (*BitmapLayer) Clone ¶

func (l *BitmapLayer) Clone() BitmapLayer

type BitmapLayers ¶

type BitmapLayers []BitmapLayer

BitmapLayers are a helper type to perform operations on multiple layers, such as BitmapLayers.Flatten or BitmapLayers.Merge.

func (BitmapLayers) Flatten ¶

func (bml BitmapLayers) Flatten() *sroar.Bitmap

Flatten reduces all snapshots into a single Bitmap. This bitmap no longer contains separate additions and deletions, but a single set where all additions and deletions have been applied in the correct order.

If you do not wish to flatten all of history, but rather combine two layers, such as would happen in a Compaction, use BitmapLayers.Merge instead.

Flatten is typically used when serving a specific key to the user: It flattens all disk segments, a currently flushing memtable if it exists, and the active memtable into a single bitmap. The final bitmap is returned to the user.

Flattening Logic ¶

The first layer is seen as chronologically first. Deletions in the first layers are ignored, as there is nothing to be deleted. As a result, the additions of the first segment become the root state in the first iteration.
Any subsequent layer is merged into the root layer in the following way: Deletions remove any existing additions, Additions are added.
This process happens one layer at a time. This way delete-and-readd cycles are reflected correctly. For example, if layer 2 deletes an element X and layer 3 adds element X, then it is a net addition overall, and X should be represented in the final bitmap. If the order is reversed and layer 2 adds X, whereas layer 3 removes X, it is should not be contained in the final map.

func (BitmapLayers) Merge ¶

func (bml BitmapLayers) Merge() (BitmapLayer, error)

Merge turns two successive layers into one. It does not flatten the segment, but keeps additions and deletions separate. This is because there are no guarantees that the first segment was the root segment. A merge could run on segments 3+4 and they could contain deletions of elements that were added in segments 1 or 2.

Merge is intended to be used as part of compactions.

type CombinedCursor ¶

type CombinedCursor struct {
	// contains filtered or unexported fields
}

func NewCombinedCursor ¶

func NewCombinedCursor(innerCursors []InnerCursor, keyOnly bool) *CombinedCursor

When keyOnly flag is set, only keys are returned by First/Next/Seek access methods, 2nd value returned is expected to be nil When keyOnly is not set, 2nd value is always bitmap. Returned bitmap can be empty (e.g. for Next call after last element was already returned)

func (*CombinedCursor) First ¶

func (c *CombinedCursor) First() ([]byte, *sroar.Bitmap)

func (*CombinedCursor) Next ¶

func (c *CombinedCursor) Next() ([]byte, *sroar.Bitmap)

func (*CombinedCursor) Seek ¶

func (c *CombinedCursor) Seek(key []byte) ([]byte, *sroar.Bitmap)

type Compactor ¶

type Compactor struct {
	// contains filtered or unexported fields
}

Compactor takes in a left and a right segment and merges them into a single segment. The input segments are represented by cursors without their respective segmentindexes. A new segmentindex is built from the merged nodes without taking the old indexes into account at all.

The left segment must precede the right one in its creation time, as the compactor applies latest-takes-presence rules when there is a conflict.

Merging independent key/value pairs ¶

The new segment's nodes will be in sorted fashion (this is a requirement for the segment index and segment cursors to function). To achieve a sorted end result, the Compactor goes over both input cursors simultaneously and always works on the smaller of the two keys. After a key/value pair has been added to the output only the input cursor that provided the pair is advanced.

Merging key/value pairs with identical keys ¶

When both segment have a key/value pair with an overlapping key, the value has to be merged. The merge logic is not part of the compactor itself. Instead it makes use of BitmapLayers.Merge.

Exit Criterium ¶

When both cursors no longer return values, all key/value pairs are considered compacted. The compactor then deals with metadata.

Index and Header metadata ¶

Only once the key/value pairs have been compacted, will the compactor write the primary index based on the new key/value payload. Finally, the input writer is rewinded to be able to write the header metadata at the beginning of the file. Because of this, the input writer must be an io.WriteSeeker, such as *os.File.

The level of the resulting segment is the input level increased by one. Levels help the "eligible for compaction" cycle to find suitable compaction pairs.

func NewCompactor ¶

func NewCompactor(w io.WriteSeeker,
	left, right *SegmentCursor, level uint16,
	scratchSpacePath string, cleanupDeletions bool,
) *Compactor

NewCompactor from left (older) and right (newer) seeker. See Compactor for an explanation of what goes on under the hood, and why the input requirements are the way they are.

func (*Compactor) Do ¶

func (c *Compactor) Do() error

Do starts a compaction. See Compactor for an explanation of this process.

type InnerCursor ¶

type InnerCursor interface {
	First() ([]byte, BitmapLayer, error)
	Next() ([]byte, BitmapLayer, error)
	Seek(key []byte) ([]byte, BitmapLayer, error)
}

type Insert ¶

type Insert struct {
	Additions []uint64
	Deletions []uint64
}

type Seeker ¶

type Seeker interface {
	Seek(key []byte) (segmentindex.Node, error)
}

type SegmentCursor ¶

type SegmentCursor struct {
	// contains filtered or unexported fields
}

A SegmentCursor iterates over all key-value pairs in a single disk segment. You can either start at the beginning using *SegmentCursor.First or start at an arbitrary key that you may find using *SegmentCursor.Seek

func NewSegmentCursor ¶

func NewSegmentCursor(data []byte, index Seeker) *SegmentCursor

NewSegmentCursor creates a cursor for a single disk segment. Make sure that the data buf is already sliced correctly to start at the payload, as calling *SegmentCursor.First will start reading at offset 0 relative to the passed in buffer. Similarly, the buffer may only contain payloads, as the buffer end is used to determine if more keys can be found.

Therefore if the payload is part of a longer continuous buffer, the cursor should be initialized with data[payloadStartPos:payloadEndPos]

func (*SegmentCursor) First ¶

func (c *SegmentCursor) First() ([]byte, BitmapLayer, error)

func (*SegmentCursor) Next ¶

func (c *SegmentCursor) Next() ([]byte, BitmapLayer, error)

func (*SegmentCursor) Seek ¶

func (c *SegmentCursor) Seek(key []byte) ([]byte, BitmapLayer, error)

type SegmentNode ¶

type SegmentNode struct {
	// contains filtered or unexported fields
}

SegmentNode stores one Key-Value pair (without its index) in the LSM Segment. It uses a single []byte internally. As a result there is no decode step required at runtime. Instead you can use

*SegmentNode.Additions
*SegmentNode.AdditionsWithCopy
*SegmentNode.Deletions
*SegmentNode.DeletionsWithCopy
*SegmentNode.PrimaryKey

to access the contents. Those helpers in turn do not require a decoding step. The accessor methods that return Roaring Bitmaps only point to existing memory (methods without WithCopy suffix), or in the worst case copy one byte slice (methods with WithCopy suffix).

This makes the SegmentNode very fast to access at query time, even when it contains a large amount of data.

The internal structure of the data is:

byte begin-start    | description
--------------------|-----------------------------------------------------
0-8                 | uint64 indicating the total length of the node,
                    | this is used in cursors to identify the next node.
8-16                | uint64 length indicator for additions sraor bm -> x
16-(x+16)           | additions bitmap
(x+16)-(x+24)       | uint64 length indicator for deletions sroar bm -> y
(x+24)-(x+y+24)     | deletions bitmap
(x+y+24)-(x+y+28)   | uint32 length indicator for primary key length -> z
(x+y+28)-(x+y+z+28) | primary key

func NewSegmentNode ¶

func NewSegmentNode(
	key []byte, additions, deletions *sroar.Bitmap,
) (*SegmentNode, error)

func NewSegmentNodeFromBuffer ¶

func NewSegmentNodeFromBuffer(buf []byte) *SegmentNode

NewSegmentNodeFromBuffer creates a new segment node by using the underlying buffer without copying data. Only use this when you can be sure that it's safe to share the data or create your own copy.

func (*SegmentNode) Additions ¶

func (sn *SegmentNode) Additions() *sroar.Bitmap

Additions returns the additions roaring bitmap with shared state. Only use this method if you can guarantee that you will only use it while holding a maintenance lock or can otherwise be sure that no compaction can occur. If you can't guarantee that, instead use *SegmentNode.AdditionsWithCopy.

func (*SegmentNode) AdditionsWithCopy ¶

func (sn *SegmentNode) AdditionsWithCopy() *sroar.Bitmap

AdditionsWithCopy returns the additions roaring bitmap without sharing state. It creates a copy of the underlying buffer. This is safe to use indefinitely, but much slower than *SegmentNode.Additions as it requires copying all the memory. If you know that you will only need the contents of the node for a duration of time where a lock is held that prevents compactions, it is more efficient to use *SegmentNode.Additions.

func (*SegmentNode) Deletions ¶

func (sn *SegmentNode) Deletions() *sroar.Bitmap

Deletions returns the deletions roaring bitmap with shared state. Only use this method if you can guarantee that you will only use it while holding a maintenance lock or can otherwise be sure that no compaction can occur. If you can't guarantee that, instead use *SegmentNode.DeletionsWithCopy.

func (*SegmentNode) DeletionsWithCopy ¶

func (sn *SegmentNode) DeletionsWithCopy() *sroar.Bitmap

DeletionsWithCopy returns the deletions roaring bitmap without sharing state. It creates a copy of the underlying buffer. This is safe to use indefinitely, but much slower than *SegmentNode.Deletions as it requires copying all the memory. If you know that you will only need the contents of the node for a duration of time where a lock is held that prevents compactions, it is more efficient to use *SegmentNode.Deletions.

func (*SegmentNode) KeyIndexAndWriteTo ¶

func (sn *SegmentNode) KeyIndexAndWriteTo(w io.Writer, offset int) (segmentindex.Key, error)

KeyIndexAndWriteTo is a helper to flush a memtables full of SegmentNodes. It writes itself into the given writer and returns a segmentindex.Key with start and end indicators (respecting SegmentNode.Offset). Those keys can then be used to build an index for the nodes. The combination of index and node make up an LSM segment.

RoaringSets do not support secondary keys, thus the segmentindex.Key will only ever contain a primary key.

func (*SegmentNode) Len ¶

func (sn *SegmentNode) Len() uint64

Len indicates the total length of the SegmentNode. When reading multiple segments back-2-back, such as in a cursor situation, the offset of element (n+1) is the offset of element n + Len()

func (*SegmentNode) PrimaryKey ¶

func (sn *SegmentNode) PrimaryKey() []byte

func (*SegmentNode) ToBuffer ¶

func (sn *SegmentNode) ToBuffer() []byte

ToBuffer returns the internal buffer without copying data. Only use this, when you can be sure that it's safe to share the data, or create your own copy.

It truncates the buffer at is own length, in case it was initialized with a long buffer that only had a beginning offset, but no end. Such a situation may occur with cursors. If we then returned the whole buffer and don't know what the caller plans on doing with the data, we risk passing around too much memory. Truncating at the length prevents this and has no other negative effects.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL