store

package
v0.0.0-...-60fa597 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 12, 2020 License: Apache-2.0 Imports: 11 Imported by: 3

README

store - persisted data structures

The data structures provided by the store package are intended to avoid memory allocations whenever possible, at the potential tradeoff in ease-of-use of the API's.

When memory usage grows too large, these data structures are designed to spill to mmap()'ed files.

These data structures are intended for ephemeral data processing, where data spilled to files, for example, are considered temporary. Long term persistence concerns like versioning, atomic writes, crash recovery, etc, are not part of these implementations.

RHStore

RHStore is a persisted hashmap that uses the robinhood algorithm.

Unlike an rhmap.RHMap, the key/val bytes placed into an RHStore are owned or managed by the RHStore.

Heap

Heap is a min-heap that can spill out to files, which works in conjunction with golang's container/heap package. It can be useful for sorting and "OFFSET/LIMIT" processing.

Chunks

Chunks represents an "append-only" sequence of persisted chunk files, where each chunk file has the same physical size (e.g., 4MB).

Documentation

Overview

Package store provides a map[[]byte][]byte based on the robin-hood hashmap algorithm that's hookable with persistance or storage.

Index

Constants

View Source
const ItemLen = 3 // Number of uint64's needed for item metadata.
View Source
const MaskDistance = uint64(0xFFFC000000000000) // 14 bits << ShiftDistance.
View Source
const MaskKeySize = uint64(0x0000000001FFFFFF) // 25 bits.
View Source
const MaskValSize = uint64(0x0003FFFFFE000000) // 25 bits << ShiftValSize.
View Source
const MaxKeyLen = (1 << 25) - 1

MaxKeyLen is representable by 25 bit number, or ~33MB.

View Source
const MaxValLen = (1 << 25) - 1

MaxKeyLen is representable by 25 bit number, or ~33MB.

View Source
const ShiftDistance = 50 // # of bits to left-shift a 14-bit Distance.
View Source
const ShiftValSize = 25 // # of bits to left-shift a 25-bit ValSize.

Variables

View Source
var DefaultRHStoreFileOptions = RHStoreFileOptions{
	StartSize:      5303,
	ChunkSizeBytes: 4 * 1024 * 1024,
	MaxDistance:    10,
	FileSuffix:     ".rhstore",
}

DefaultRHStoreFileOptions are the default values for options.

View Source
var ErrKeyTooBig = errors.New("key too big")

ErrKeyTooBig means a key was too large.

View Source
var ErrKeyZeroLen = errors.New("key zero len")

ErrKeyZeroLen means a key was nil.

View Source
var ErrValTooBig = errors.New("val too big")

ErrValTooBig means a val was too large.

View Source
var MMapPageGranularity = MMapPageSize

MMapPageGranularity defines the granularity of mmap'ed region. Some operating systems require mmap to occur on particular boundaries that are not equal to a page size.

View Source
var MMapPageSize = int64(4096)

MMapPageSize is the default page size in bytes used by rhstore.

Functions

func ByteSliceToUint64Slice

func ByteSliceToUint64Slice(in []byte) ([]uint64, error)

ByteSliceToUint64Slice gives access to []byte as []uint64. By default, an efficient O(1) implementation of this function is used, but which requires the unsafe package. See the "safe" build tag to use an O(N) implementation that does not need the unsafe package.

func BytesAppend

func BytesAppend(m *RHStore, b []byte) (offset, size uint64, err error)

BytesAppend is the default implementation to append data to the backing bytes of an RHStore.

func BytesRead

func BytesRead(m *RHStore, offset, size uint64) ([]byte, error)

BytesRead is the default implementation to read a bytes from the backing bytes of an RHStore.

func BytesTruncate

func BytesTruncate(m *RHStore, size uint64) error

BytesTruncate is the default implementation to truncate the backing bytes of an RHStore to a given length.

func Grow

func Grow(m *RHStore, newSize int) error

Grow is the default implementation to grow a RHStore.

func Uint64SliceToByteSlice

func Uint64SliceToByteSlice(in []uint64) ([]byte, error)

Uint64SliceToByteSlice gives access to []uint64 as []byte. By default, an efficient O(1) implementation of this function is used, but which requires the unsafe package. See the "safe" build tag to use an O(N) implementation that does not need the unsafe package.

Types

type BytesLessFunc

type BytesLessFunc func(a, b []byte) bool

BytesLessFunc returns true when a is less than b.

type Chunks

type Chunks struct {
	PathPrefix, FileSuffix string

	// ChunkSizeBytes is the size of each chunk file.
	ChunkSizeBytes int

	// Chunks is a sequence of append-only chunk files. An example
	// usage is to hold the underlying key/val bytes for a
	// hashmap. The 0'th chunk is a special, in-memory-only chunk
	// without an actual backing file.
	Chunks []*MMapRef

	// LastChunkLen is the logical length of the last chunk, which is
	// the chunk that is still being appended to when there are new,
	// incoming data items.
	LastChunkLen int

	// Recycled is a stack of chunks that are ready to reuse.
	Recycled []*MMapRef
}

Chunks tracks a sequence of persisted chunk files.

func (*Chunks) AddChunk

func (cs *Chunks) AddChunk() (err error)

AddChunk appends a new chunk file to the chunks, or reuses a previously recycled chunk file.

func (*Chunks) BytesAppend

func (cs *Chunks) BytesAppend(b []byte) (
	offsetOut, sizeOut uint64, err error)

func (*Chunks) BytesRead

func (cs *Chunks) BytesRead(offset, size uint64) (
	[]byte, error)

BytesRead returns a slice of data from the chunks.

func (*Chunks) BytesTruncate

func (cs *Chunks) BytesTruncate(size uint64) error

func (*Chunks) Close

func (cs *Chunks) Close() error

Close releases resources used by the chunk files.

func (*Chunks) PrevChunkLens

func (cs *Chunks) PrevChunkLens() int

PrevChunkLens returns the sum of the chunk lengths for all but the last chunk.

type Heap

type Heap struct {
	// LessFunc is used to compare two data items.
	LessFunc BytesLessFunc

	// CurItems holds the # of current, live items on the heap.
	CurItems int64

	// MaxItems holds the maximum # of items the heap has ever held.
	MaxItems int64

	// Heap is a min-heap of offset (uint64) and size (uint64) pairs,
	// which refer into the Data. The Chunks of the Heap must be
	// configured with a ChunksSizeBytes that's a multiple of 16.
	Heap *Chunks

	// Data represents the application data items held in chunks,
	// where each item is prefixed by its length as a uint64.
	Data *Chunks

	// Free represents unused but reusable slices in the Data. The
	// free list is appended to as items are popped from the heap.
	Free []OffsetSize

	// Temp is used during mutations.
	Temp []byte

	// Extra is application specific data associated with this heap.
	Extra interface{}

	// Err tracks the first error encountered.
	Err error
}

Heap provides a min-heap using a given BytesLessFunc. When the min-heap grows too large, it will automatically spill data to temporary, mmap()'ed files based on the features from rhmap/store/Chunks. The implementation is meant to be used with golang's container/heap package. The implementation is not concurrent safe. The implementation is designed to avoid allocations and reuse existing []byte buffers when possible.

The heap can also be used directly with the PushBytes() API without using golang's container/heap package, in which case this data structure behaves as a appendable sequence of []byte entries.

func (*Heap) Close

func (h *Heap) Close() error

func (*Heap) Error

func (h *Heap) Error(err error) error

Error records the first error encountered.

func (*Heap) Get

func (h *Heap) Get(i int64) ([]byte, error)

Get returns the i'th item from the min-heap.

func (*Heap) GetOffsetSize

func (h *Heap) GetOffsetSize(i int64) ([]byte, uint64, uint64, error)

Get returns the i'th item from the min-heap, along with its holding area offset and holding area size in the data chunks.

func (*Heap) Len

func (h *Heap) Len() int

func (*Heap) Less

func (h *Heap) Less(i, j int) bool

func (*Heap) Pop

func (h *Heap) Pop() interface{}

func (*Heap) Push

func (h *Heap) Push(x interface{})

Push will error if the incoming "data length + 8" is greater than the configured ChunkSizeBytes of the heap's data chunks.

func (*Heap) PushBytes

func (h *Heap) PushBytes(xbytes []byte) error

PushBytes is more direct than Push, avoiding interface{} casting.

func (*Heap) Reset

func (h *Heap) Reset() error

func (*Heap) SetOffsetSize

func (h *Heap) SetOffsetSize(i int64, offset, size uint64) error

func (*Heap) Sort

func (h *Heap) Sort(offset int64) error

Sort pops items off the heap and places them at the end of the heap slots in reverse order, leaving sorted items at the end of the heap slots. This approach does not allocate additional space. If there are n items in the heap, then n-offset items will be left sorted at the end of the heap slots. An offset of 0 sorts the entire heap.

func (*Heap) Swap

func (h *Heap) Swap(i, j int)

type Item

type Item []uint64

Item represents an entry in the RHStore, where each item uses 3 contiguous slots (uint64's) for encoding...

MSB....................................................LSB

uint64 0: [64-bits keyOffset ] uint64 1: [64-bits valOffset ] uint64 2: [14-bits distance] | [25 bits valSize] | [25 bits keySize]

The len(Item) == 3 (i.e., 3 uint64's). The key/val offsets are into the RHStore's backing bytes.

func (Item) Distance

func (item Item) Distance() uint64

func (Item) DistanceAdd

func (item Item) DistanceAdd(x int)

func (Item) Encode

func (item Item) Encode(
	keyOffset, keySize, valOffset, valSize, distance uint64)

func (Item) KeyOffsetSize

func (item Item) KeyOffsetSize() (uint64, uint64)

func (Item) ValOffsetSize

func (item Item) ValOffsetSize() (uint64, uint64)

type Key

type Key []byte

Key is the type for a key. A key with len() of 0 is invalid. Key max length is 2^25-1 (~33MB).

type MMapRef

type MMapRef struct {
	Path string
	File *os.File
	MMap mmap.MMap
	Buf  []byte
	Refs int
}

MMapRef provides a ref-counting wrapper around a mmap handle. The implementation is not concurrent safe.

func CreateFileAsMMapRef

func CreateFileAsMMapRef(path string, size int) (*MMapRef, error)

CreateFileAsMMapRef creates a new, empty file of the given size in bytes and mmap()'s it. If the path is "", then an in-memory-only MMapRef is returned, which is an MMapRef that really isn't mmap()'ing an actual file.

func MMapFileRegion

func MMapFileRegion(path string, file *os.File, offset, size int64,
	readWrite bool) (*MMapRef, error)

MMapFileRegion mmap()'s a region of bytes in an os.File.

func (*MMapRef) AddRef

func (r *MMapRef) AddRef() *MMapRef

func (*MMapRef) Close

func (r *MMapRef) Close() error

Close is an alias for DecRef(), allowing MMapRef to implement the io.Closer interface.

func (*MMapRef) DecRef

func (r *MMapRef) DecRef() error

func (*MMapRef) Remove

func (r *MMapRef) Remove() error

Remove should be called only on a closed MMapRef.

type OffsetSize

type OffsetSize struct {
	Offset, Size uint64
}

OffsetSize associates an offset with a size.

type RHStore

type RHStore struct {
	// Slots are the backing data for item metadata of the hashmap.
	// 3 slots are used to represent a single item's metadata.
	Slots []uint64

	// Size is the max number of items this hashmap can hold.
	// Size * 3 == len(Slots).
	Size int

	// Bytes is the backing slice for key/val data that's used by the
	// default BytesTruncate/Append/Read() implementations.
	Bytes []byte

	// Number of items in the RHStore.
	Count int

	// Overridable hash func. Defaults to hash/fnv.New32a().
	HashFunc func(Key) uint32

	// When any item's distance gets too large, grow the RHStore.
	// Defaults to 10.
	MaxDistance int

	// Overridable func to calculate a size multiplier when resizing
	// for growth is needed. Default returns a constant 2.0.
	Growth func(*RHStore) float64

	// Overridable func to grow the RHStore.
	Grow func(m *RHStore, newSize int) error

	// Overridable func to truncate the backing bytes.
	BytesTruncate func(m *RHStore, n uint64) error

	// Overridable func to append data to the backing bytes.
	BytesAppend func(m *RHStore, b []byte) (offset, size uint64, err error)

	// Overridable func to read data from the backing bytes.
	BytesRead func(m *RHStore, offset, size uint64) ([]byte, error)

	// Extra is for optional data that the application wants to
	// associate with the RHStore instance.
	Extra interface{}

	Close func() error

	// Temp is used during mutations to avoid memory allocations.
	Temp Item
}

RHStore is a hashmap that uses the robinhood algorithm. This implementation is not concurrent safe.

Unlike an RHMap, the key/val bytes placed into an RHStore are copied and owned by the RHStore. The RHStore's internal data structures are also more "flat" than an RHMap's, allowing for easier persistence with an RHStore.

RHStore has more hook-points or callback options than an RHMap, which are intended for advanced users who might use the hook-points to build a persistent data structure.

func NewRHStore

func NewRHStore(size int) *RHStore

NewRHStore returns a ready-to-use RHStore.

func (*RHStore) CopyTo

func (m *RHStore) CopyTo(dest *RHStore)

CopyTo copies key/val's to the dest RHStore.

func (*RHStore) Del

func (m *RHStore) Del(k Key) (prev Val, existed bool, err error)

Del removes a key/val from the RHStore. The previous val, if it existed, is returned.

NOTE: RHStore does not remove key/val data from its backing bytes, so deletes of items will not reduce memory usage. Applications may instead use CopyTo() to copy any remaining live key/val data to another, potentially smaller RHStore.

func (*RHStore) Get

func (m *RHStore) Get(k Key) (v Val, found bool)

Get retrieves the val for a given key. The returned val, if found, is a slice into the RHStore's backing bytes and should only be used within its returned len() -- don't append() to the returned val as that might incorrectly overwrite unrelated data.

func (*RHStore) Item

func (m *RHStore) Item(idx int) Item

func (*RHStore) ItemKey

func (m *RHStore) ItemKey(item Item) (Key, error)

func (*RHStore) ItemVal

func (m *RHStore) ItemVal(item Item) (Val, error)

func (*RHStore) Reset

func (m *RHStore) Reset() error

Reset clears RHStore, where already allocated memory will be reused.

func (*RHStore) Set

func (m *RHStore) Set(k Key, v Val) (wasNew bool, err error)

Set inserts or updates a key/val into the RHStore. The returned wasNew will be true if the mutation was on a newly-seen inserted key, and wasNew will be false if the mutation was an update to an existing key.

NOTE: RHStore appends or copies the incoming key/val into its backing bytes. Multiple updates to the same key will continue to grow the backing bytes -- i.e., the backing bytes are not reused or recycled during a Set(). Applications that need to really remove deleted bytes may instead use CopyTo() to copy live key/val data to another RHStore. Applications might also mutate val bytes in-place as another way to save allocations.

func (*RHStore) SetOffsets

func (m *RHStore) SetOffsets(kOffset, kSize, vOffset, vSize uint64) (
	wasNew bool, err error)

func (*RHStore) Visit

func (m *RHStore) Visit(
	callback func(k Key, v Val) (keepGoing bool)) error

Visit invokes the callback on key/val. The callback can return false to stop the visitation early.

func (*RHStore) VisitOffsets

func (m *RHStore) VisitOffsets(
	callback func(kOffset, kSize, vOffset, vSize uint64) (keepGoing bool)) error

VisitOffsets invokes the callback on each item's key/val offsets. The callback can return false to stop the visitation early.

type RHStoreFile

type RHStoreFile struct {
	// PathPrefix is the path prefix of any persisted files.
	PathPrefix string

	// Options configured for this RHStoreFile instance.
	Options RHStoreFileOptions

	// The embedded RHStore is hooked with callbacks that spill data
	// out to disk during growth, to ever larger "slots" file for item
	// metadata, and to an ever-growing sequence of "chunk" files
	// where key/val bytes are stored.
	RHStore

	// Generation is incremented whenever the hashmap metadata slots
	// are grown. See Grow(). The initial, in-memory only hashmap
	// that has the size of Options.StartSize is generation number 0.
	Generation int64

	// Slots holds the hashmap's item metadata slots. The item
	// metadata entries in the slots have offset+size uint64's that
	// reference key/val byte slices in the chunks.
	Slots *MMapRef

	// Chunks is a sequence of append-only chunk files which hold the
	// underlying key/val bytes for the hashmap.
	Chunks
}

RHStoreFile represents a persisted hashmap. Its implementation is not concurrent safe.

Unlike an RHMap or RHStore, the key's and val's in an RHStoreFile may not be larger than the Options.ChunkSizeBytes.

The design point is to support applications that need to process or analyze ephemeral data which becomes large enough to not fit comfortably into memory, where an temporary, spillable hashmap is appropriate (e.g., performing a GROUP BY aggregation on CSV data files). A different use-case of long-term database-like storage (e.g., with checksums, versioning, reloading, and multithreaded access, etc) is not the current design target of RHStoreFile.

func CreateRHStoreFile

func CreateRHStoreFile(pathPrefix string, options RHStoreFileOptions) (
	rv *RHStoreFile, err error)

CreateRHStoreFile starts a brand new RHStoreFile, which is a hashmap based on the robin-hood algorithm, and which will also spill out to mmap()'ed files if the hashmap becomes too big. The returned RHStoreFile is not concurrent safe. Providing a pathPrefix that's already in-use has undefined behavior.

func (*RHStoreFile) Close

func (sf *RHStoreFile) Close() error

Close releases resources used by the RHStoreFile.

func (*RHStoreFile) Grow

func (sf *RHStoreFile) Grow(nextSize int) error

Grow creates a new slots file and copies over existing metadata items from RHStore.Slots, if any.

type RHStoreFileOptions

type RHStoreFileOptions struct {
	// Initial size of the first in-memory-only hashmap in the # of
	// items that its metadata slots can theoretically hold "fully
	// loaded" before growing.
	StartSize int

	// MaxDistance is a config on hashmap growth in that when the
	// distance of any hashmap item becomes > MaxDistance, the hashmap
	// metadata slots will be grown (and spilled to mmap()'ed files).
	MaxDistance int

	// ChunkSizeBytes is the size of each chunk file in bytes.
	// No key or val can be larger than ChunkSizeBytes.
	// ChunkSizeBytes must be > 0.
	ChunkSizeBytes int

	// FileSuffix is the file suffix used for all the files that were
	// created or managed by an RHStoreFile.
	FileSuffix string
}

RHStoreFileOptions represents creation-time configurable options for an RHStoreFile.

type Val

type Val []byte

Val is the type for a val. A nil val is valid. Val max length is 2^25-1 (~33MB).

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL