Back to godoc.org

Package mapio

v0.0.9
Latest Go to latest
Published: Jun 24, 2020 | License: Apache-2.0 | Module: github.com/grailbio/base

Overview

Package mapio implements a sorted, on-disk map, similar to the SSTable data structure used in Bigtable [1], Cassandra [2], and others. Maps are read-only, and are produced by a Writer. Each Writer expects keys to be appended in lexicographic order. Buf provides a means of buffering writes to be sorted before appended to a Writer.

Mapio's on-disk layout loosely follows that of LevelDB [3]. Each Map is a sequence of blocks; each block comprises a sequence of entries, followed by a trailer:

block := blockEntry* blockTrailer
blockEntry :=
	nshared:   uvarint           // number of bytes shared with previous key
	nunshared: uvarint           // number of new bytes in this entry's key
	nvalue:    uvarint           // number of bytes in value
	key:       uint8[nunshared]  // the (prefix compressed) key
	value:     uint8[nvalue]     // the entry's value
blockTrailer :=
	restarts:  uint32[nrestart]  // array of key restarts
	nrestart:  uint32            // size of restart array
	type:      uint8             // block type (should be 0; reserved for future use)
	crc32:     uint32            // IEEE crc32 of contents and trailer

Maps prefix compress each key by storing the number of bytes shared with the previous key. Maps contain a number of restart points: points at which the full key is specified (and nshared = 0). The restart point are stored in an array in the block trailer. This array can be used to perform binary search for keys.

A Map is a sequence of data blocks, followed by an index block, followed by a trailer.

map := block(data)* block(meta)* block(index) mapTrailer
mapTrailer :=
	meta:	blockAddr[20]  // zero-padded address of the meta block index (tbd)
	index:  blockAddr[20]  // zero-padded address of index
	magic:	uint64         // magic (0xa8b2374e8558bc76)
blockAddr :=
	offset: uvarint        // offset of block in map
	len:    uvarint        // length of block

The index block contains one entry for each block in the map: each entry's key is the last key in that block; the entry's value is a blockAddr containing the position of that block. This arrangement allows the reader to binary search the index block then search the found block.

[1] https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf [2] https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf [3] https://github.com/google/leveldb

Index

Package Files

type Buf

type Buf struct {
	// contains filtered or unexported fields
}

A Buf is an unordered write buffer for maps. It holds entries in memory; these are then sorted and written to a map.

func (*Buf) Append

func (b *Buf) Append(key, value []byte)

Append append the given entry to the buffer.

func (*Buf) Len

func (b *Buf) Len() int

Len implements sort.Interface

func (*Buf) Less

func (b *Buf) Less(i, j int) bool

Less implements sort.Interface

func (*Buf) Size

func (b *Buf) Size() int

Size returns the number size of this buffer in bytes.

func (*Buf) Swap

func (b *Buf) Swap(i, j int)

Swap implements sort.Interface

func (*Buf) WriteTo

func (b *Buf) WriteTo(w *Writer) error

WriteTo sorts and then writes all of the entries in this buffer to the provided writer.

type Map

type Map struct {
	// contains filtered or unexported fields
}

Map is a read-only, sorted map backed by an io.ReadSeeker. The on-disk layout of maps are described by the package documentation. Maps support both lookup and (ordered) iteration. A Map instance maintains a current position, starting out at the first entry.

func New

func New(r io.ReadSeeker) (*Map, error)

New opens the map at the provided io.ReadSeeker (usually a file).

func (*Map) Seek

func (m *Map) Seek(key []byte) *MapScanner

Seek returns a map scanner beginning at the first key in the map >= the provided key.

type MapScanner

type MapScanner struct {
	// contains filtered or unexported fields
}

MapScanner implements ordered iteration over a map.

func (*MapScanner) Err

func (m *MapScanner) Err() error

Err returns the last error encountered while scanning.

func (*MapScanner) Key

func (m *MapScanner) Key() []byte

Key returns the key that was last scanned.

func (*MapScanner) Scan

func (m *MapScanner) Scan() bool

Scan scans the next entry, returning true on success. When Scan returns false, the caller should inspect Err to distinguish between scan completion and scan error.

func (*MapScanner) Value

func (m *MapScanner) Value() []byte

Value returns the value that was last scanned.

type Merged

type Merged []*Map

Merged represents the merged contents of multiple underlying maps. Like Map, Merged presents a sorted, scannable map, but it does not guarantee that the order of traversal is stable.

func (Merged) Seek

func (m Merged) Seek(key []byte) *MergedScanner

Seek returns a scanner for the merged map that starts at the first entry where entryKey <= key.

type MergedScanner

type MergedScanner []*MapScanner

MergedScanner is a scanner for merged maps.

func (MergedScanner) Err

func (m MergedScanner) Err() error

Err returns the last error encountered while scanning, if any.

func (MergedScanner) Key

func (m MergedScanner) Key() []byte

Key returns the last key scanned.

func (MergedScanner) Len

func (m MergedScanner) Len() int

Len implements heap.Interface

func (MergedScanner) Less

func (m MergedScanner) Less(i, j int) bool

Less implements heap.Interface

func (*MergedScanner) Pop

func (m *MergedScanner) Pop() interface{}

Pop implements heap.Interface

func (*MergedScanner) Push

func (m *MergedScanner) Push(x interface{})

Push implements heap.Interface

func (*MergedScanner) Scan

func (m *MergedScanner) Scan() bool

Scan scans the next entry in the merged map, returning true on success. If Scan returns false, the caller should check Err to distinguish between scan completion and scan error.

func (MergedScanner) Swap

func (m MergedScanner) Swap(i, j int)

Swap implements heap.Interface

func (MergedScanner) Value

func (m MergedScanner) Value() []byte

Value returns the last value scanned.

type Scanner

type Scanner interface {
	// Scan scans the next entry, returning true on success, after which
	// time the entry is available to inspect using the Key and Value
	// methods.
	Scan() bool
	// Err returns the last error encountered while scanning, if any.
	Err() error
	// Key returns the key of the last scanned entry.
	Key() []byte
	// Value returns the value of the last scanned entry.
	Value() []byte
}

Scanner is an ordered iterator over map entries.

type WriteOption

type WriteOption func(*Writer)

WriteOption represents a tunable writer parameter.

func BlockSize

func BlockSize(sz int) WriteOption

BlockSize sets the writer's target block size to sz (in bytes). Note that keys and values cannot straddle blocks, so that if large data are added to a map, block sizes can grow large. The default target block size is 4KB.

func RestartInterval

func RestartInterval(iv int) WriteOption

RestartInterval sets the writer's restart interval to provided value. The default restart interval is 16.

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

A Writer appends key-value pairs to a map. Keys must be appended in lexicographic order.

func NewWriter

func NewWriter(w io.Writer, opts ...WriteOption) *Writer

NewWriter returns a new Writer that writes a map to the provided io.Writer. BlockSize specifies the target block size, while restartInterval determines the frequency of key restart points, which trades off lookup performance with size. See package docs for more details.

func (*Writer) Append

func (w *Writer) Append(key, value []byte) error

Append appends an entry to the maps. Keys must be provided in lexicographic order.

func (*Writer) Close

func (w *Writer) Close() error

Close flushes the last block of the writer and writes the map's trailer. After successful close, the map is ready to be opened.

func (*Writer) Flush

func (w *Writer) Flush() error

Flush creates a new block with the current contents. It forces the creation of a new block, and overrides the Writer's block size parameter.

Documentation was rendered with GOOS=linux and GOARCH=amd64.

Jump to identifier

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to identifier