bs

package module
v0.4.5
Published: Jan 23, 2022 License: MIT Imports: 11 Imported by: 0

README

BS, a content-addressable blob store


This is BS, an implementation of a content-addressable blob store.

A blob store stores arbitrarily sized sequences of bytes, or blobs, and indexes them by their hash, which is used as a unique key. This key is called the blob’s reference, or ref.

With a sufficiently good hash algorithm, the likelihood of any two distinct blobs “colliding” is so small that you’re better off worrying about the much-more-likely danger of spontaneously combusting while also facing a thundering herd of angry rhinoceroses, at the very moment of a solar eclipse.

This module uses sha2-256, which is a sufficiently good hash algorithm.

The fact that the lookup key is computed from a blob’s content, rather than by its location in memory or the order in which it was added, is the meaning of “content-addressable.”

Content addressability has some desirable properties, but it does mean that if some data changes, so does its ref, which can make it tricky to keep track of a piece of data over its lifetime. So in addition to a plain blob store, this module provides an anchor store. An anchor is a structured blob mapping a name to a timestamp and a blob ref. You can give a blob a name (such as a filename) by storing an anchor pointing to the blob’s ref. As the data changes, you can store new anchors with the same name but an updated ref and timestamp. An anchor store lets you retrieve the latest ref for a given name as of a given timestamp.

Blob stores work best when blobs are not too big, so when storing potentially large bytestreams, use a split.Writer (in the split subpackage). This splits the input into multiple blobs organized as a tree, and it returns the ref of the tree’s root. The bytestream can be reassembled with split.Read.

When splitting, blob boundaries are determined not by position or size but by content, using the technique of hashsplitting. This same technique is used by rsync and other projects to represent file changes very compactly: if two versions of a file have a small difference, only the blob containing the difference is affected. The other blobs of the file are unchanged. This is the same reason that the blobs are organized into a tree. If the blobs were organized as a list, the whole list would have to change any time a blob is added, removed, or replaced. But as a tree, only the subtree with the affected blob has to change.
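
To get a feel for how content-defined boundaries behave, here is a deliberately simplified sketch in Go. It illustrates the general technique only; it is not the algorithm the split subpackage actually uses.

// chunk splits data at content-defined boundaries. A boundary is declared
// whenever a rolling sum over the last few bytes matches a fixed bit pattern,
// so the decision depends only on nearby content: a local edit moves only the
// boundaries near that edit, leaving the other chunks (and their refs) unchanged.
func chunk(data []byte) [][]byte {
	const (
		window = 16   // size of the rolling window
		mask   = 0x3f // on average, one boundary every 64 bytes
	)
	var (
		chunks [][]byte
		start  int
		sum    int
	)
	for i, b := range data {
		sum += int(b)
		if i >= window {
			sum -= int(data[i-window]) // slide the window forward
		}
		if sum&mask == mask {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}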

The bs package describes an abstract Store interface. The store subpackage contains a registry for different concrete types of blob store: a memory-based one, a file-based one, a SQLite-based one, a Postgresql-based one, and a Google Cloud Storage one. There is also a blob store that is an LRU cache for an underlying blob store, and a blob store that transforms blobs (compressing or encrypting them, for instance) on their way into and out of an underlying blob store. And store/rpc supplies a simple gRPC server wrapping a Store, and a client for it that is itself a Store.

BS is inspired by, and a simplification of, the Perkeep project, which is presently dormant.

Documentation

Overview

Package bs is a content-addressable blob store.

A blob store stores arbitrarily sized sequences of bytes, or _blobs_, and indexes them by their hash, which is used as a unique key. This key is called the blob’s reference, or _ref_.

With a sufficiently good hash algorithm, the likelihood of any two distinct blobs “colliding” is so small that you’re better off worrying about the much-more-likely danger of spontaneously combusting while also facing a thundering herd of angry rhinoceroses, at the very moment of a solar eclipse.

This module uses sha2-256, which is a sufficiently good hash algorithm.

The fact that the lookup key is computed from a blob’s content, rather than by its location in memory or the order in which it was added, is the meaning of “content-addressable.”

Content addressability has some desirable properties, but it does mean that if some data changes, so does its ref, which can make it tricky to keep track of a piece of data over its lifetime. So in addition to a plain blob store, this module provides an "anchor" store. An anchor is a structured blob containing a name, a timestamp, and a blob ref. You can give a blob a name (such as a filename) by storing an anchor pointing to the blob’s ref. As the data changes, you can store new anchors with the same name but an updated ref and timestamp. An anchor store lets you retrieve the latest ref for a given name as of a given timestamp.
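
Purely as an illustration of the concept, and not the anchor subpackage's actual API, an anchor store behaves roughly like a time-ordered collection of name/ref/timestamp records, queried for the newest record at or before a given time. The types below are hypothetical and assume the bs and time packages are imported.

// Hypothetical illustration only; these are not the anchor subpackage's actual types.
type anchorRecord struct {
	Name string    // a stable name, e.g. a filename
	Ref  bs.Ref    // the ref the name pointed to at time At
	At   time.Time // when this name-to-ref mapping was recorded
}

// latestAsOf answers "what ref did name refer to as of time t?"
func latestAsOf(anchors []anchorRecord, name string, t time.Time) (bs.Ref, bool) {
	var (
		best  bs.Ref
		when  time.Time
		found bool
	)
	for _, a := range anchors {
		if a.Name != name || a.At.After(t) {
			continue
		}
		if !found || a.At.After(when) {
			best, when, found = a.Ref, a.At, true
		}
	}
	return best, found
}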

Blob stores work best when blobs are not too big, so when storing potentially large bytestreams, use the split.Write function (in the split subpackage). This splits the input into multiple blobs organized as a tree, and it returns the ref of the tree’s root. The bytestream can be reassembled with split.Read.

BS is inspired by, and a simplification of, the Perkeep project (https://perkeep.org/), which is presently dormant.

Index

Constants

This section is empty.

Variables

var ErrNotFound = errors.New("not found")

ErrNotFound is the error returned when a Getter tries to access a non-existent ref.

Functions

func GetMulti added in v0.3.0

func GetMulti(ctx context.Context, g Getter, refs []Ref) (map[Ref]Blob, error)

GetMulti gets multiple blobs with a single call. By default this is implemented as a bunch of concurrent individual Get calls. However, if g implements MultiGetter, its GetMulti method is used instead. The return value is a mapping of input refs to the blobs that were found in g. The returned error may be a MultiErr, mapping input refs to errors encountered retrieving those specific refs. This function may return a successful partial result even in case of error. In particular, when the error return is a MultiErr, every input ref appears in either the result map or the MultiErr map.
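
For example, a caller might fetch several refs in one call and report any per-ref failures. This sketch uses only the signatures documented here, plus the standard context, errors, and log packages:

func fetchSome(ctx context.Context, g bs.Getter, refs []bs.Ref) (map[bs.Ref]bs.Blob, error) {
	got, err := bs.GetMulti(ctx, g, refs)
	if err != nil {
		var me bs.MultiErr
		if errors.As(err, &me) {
			// Partial result: every input ref appears in either got or me.
			for ref, e := range me {
				log.Printf("could not get %s: %v", ref, e)
			}
			return got, nil // or propagate the error, as the caller prefers
		}
		return nil, err
	}
	return got, nil
}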

func GetProto

func GetProto(ctx context.Context, g Getter, ref Ref, m proto.Message) error

GetProto reads a blob from a blob store and parses it into the given protocol buffer.

func PutMulti added in v0.3.0

func PutMulti(ctx context.Context, s Store, blobs []Blob) (map[Ref]bool, error)

PutMulti stores multiple blobs with a single call. By default this is implemented as a bunch of concurrent individual Put calls. However, if s implements MultiPutter, its PutMulti method is used instead. The return value is a mapping of input blobs' refs to a boolean indicating whether each was a new addition to s. The returned error may be a MultiErr, mapping input blobs' refs to errors encountered writing those specific blobs. This function may return a successful partial result even in case of error. In particular, when the error return is a MultiErr, the ref of every input blob appears in either the result map or the MultiErr map.
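
A corresponding sketch for writes, again using only the documented signatures:

func storeSome(ctx context.Context, s bs.Store, blobs []bs.Blob) error {
	added, err := bs.PutMulti(ctx, s, blobs)
	for ref, isNew := range added {
		if isNew {
			log.Printf("stored new blob %s", ref)
		}
	}
	return err // may be a bs.MultiErr mapping refs to per-blob failures
}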

Types

type Blob

type Blob []byte

Blob is a data blob.

func (Blob) Ref

func (b Blob) Ref() Ref

Ref computes the Ref of a blob.
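
For example (assuming this package and fmt are imported):

b := bs.Blob("hello, world")
ref := b.Ref()              // the SHA2-256 hash of b's content
fmt.Println(ref)            // 64 hex digits
fmt.Println(ref == b.Ref()) // true: identical content always yields the same ref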

type DeleterStore

type DeleterStore interface {
	Store

	// Delete deletes the blob identified by the given ref from the store.
	// Implementations may choose to return nil or ErrNotFound in the case where the ref does not exist.
	//
	// TODO: What if the store is also an anchor.Store and this deletes the target of an anchor?
	// Probably should delete that anchor, too.
	Delete(context.Context, Ref) error
}

DeleterStore is the type of a Store that can also delete blobs.
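
Deletion is optional, so a caller typically discovers support with a type assertion. A sketch:

func deleteIfSupported(ctx context.Context, s bs.Store, ref bs.Ref) error {
	ds, ok := s.(bs.DeleterStore)
	if !ok {
		return fmt.Errorf("store of type %T cannot delete blobs", s)
	}
	return ds.Delete(ctx, ref)
}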

type Getter

type Getter interface {
	// Get gets a blob by its ref.
	Get(context.Context, Ref) (Blob, error)

	// ListRefs calls a function for each blob ref in the store in lexicographic order,
	// beginning with the first ref _after_ the specified one.
	//
	// The calls reflect at least the set of refs
	// known at the moment ListRefs was called.
	// It is unspecified whether later changes,
	// that happen concurrently with ListRefs,
	// are reflected.
	//
	// If the callback function returns an error,
	// ListRefs exits with that error.
	ListRefs(context.Context, Ref, func(r Ref) error) error
}

Getter is a read-only Store (qv).
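
For example, every ref in a store can be enumerated by starting the traversal just after the zero ref (a sketch assuming this package, context, and fmt are imported):

func dumpRefs(ctx context.Context, g bs.Getter) error {
	// bs.Zero is the smallest possible ref, so starting just after it
	// visits every ref in the store in lexicographic order.
	return g.ListRefs(ctx, bs.Zero, func(r bs.Ref) error {
		fmt.Println(r)
		return nil
	})
}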

type MultiErr added in v0.3.0

type MultiErr map[Ref]error

MultiErr is a type of error returned by GetMulti and PutMulti. It maps individual refs to errors encountered trying to Get or Put them.

func (MultiErr) Error added in v0.3.0

func (e MultiErr) Error() string

Error implements the error interface.

type MultiGetter added in v0.3.0

type MultiGetter interface {
	GetMulti(context.Context, []Ref) (map[Ref]Blob, error)
}

MultiGetter is an interface that Getters may optionally implement to make the GetMulti function efficient.

type MultiPutter added in v0.3.0

type MultiPutter interface {
	PutMulti(context.Context, []Blob) (map[Ref]bool, error)
}

MultiPutter is an interface that Stores may optionally implement to make the PutMulti function efficient.

type Ref

type Ref [sha256.Size]byte

Ref is the reference of a blob: its sha256 hash.

var Zero Ref

Zero is the zero ref.

func ProtoRef

func ProtoRef(m proto.Message) (Ref, error)

ProtoRef is a convenience function for computing the ref of a serialized protobuf.

func PutProto

func PutProto(ctx context.Context, s Store, m proto.Message) (Ref, bool, error)

PutProto serializes m and stores it as a blob in s.

The boolean result tells whether m's blob was newly added.
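
A round trip through PutProto and GetProto might look like the sketch below; timestamppb (from google.golang.org/protobuf/types/known/timestamppb) is used here only as a convenient stand-in for any proto.Message:

func protoRoundTrip(ctx context.Context, s bs.Store) error {
	msg := timestamppb.Now() // any proto.Message will do

	ref, added, err := bs.PutProto(ctx, s, msg)
	if err != nil {
		return err
	}
	log.Printf("stored proto at %s (newly added: %v)", ref, added)

	var got timestamppb.Timestamp
	return bs.GetProto(ctx, s, ref, &got)
}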

func RefFromBytes

func RefFromBytes(b []byte) Ref

RefFromBytes produces a Ref from a byte slice. The length of the byte slice is not checked.

func RefFromHex

func RefFromHex(s string) (Ref, error)

RefFromHex produces a Ref from a hex string.
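
For example, the hex form of a ref round-trips; the constant below is the SHA2-256 hash of the empty blob:

// The SHA2-256 hash of the empty blob, written as hex.
const emptyHex = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

ref, err := bs.RefFromHex(emptyHex)
if err != nil {
	panic(err) // emptyHex is well-formed, so this does not happen
}
fmt.Println(ref.String() == emptyHex)  // true
fmt.Println(ref == bs.Blob(nil).Ref()) // true: it is the ref of the empty blob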

func (*Ref) FromHex

func (r *Ref) FromHex(s string) error

FromHex parses the hex string `s` and places the result in `r`.

func (Ref) IsZero

func (r Ref) IsZero() bool

IsZero tells whether r is the zero ref.

func (Ref) Less

func (r Ref) Less(other Ref) bool

Less tells whether `r` is lexicographically less than `other`.

func (*Ref) Scan

func (r *Ref) Scan(src interface{}) error

Scan implements the "database/sql".Scanner interface.

func (Ref) String

func (r Ref) String() string

String converts a Ref to a hexadecimal string.

func (Ref) Value

func (r Ref) Value() (driver.Value, error)

Value implements the "database/sql/driver".Valuer interface.

type Store

type Store interface {
	Getter

	// Put adds b to the store if it was not already present.
	// It returns b's ref and a boolean that is true iff the blob had to be added.
	Put(ctx context.Context, b Blob) (ref Ref, added bool, err error)
}

Store is a blob store. It stores byte sequences - "blobs" - of arbitrary length. Each blob can be retrieved using its "ref" as a lookup key. A ref is simply the SHA2-256 hash of the blob's content.
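
As a minimal usage sketch against any Store implementation (for instance one of those in the store subpackage): putting the same content twice yields the same ref and stores it only once.

func roundTrip(ctx context.Context, s bs.Store) error {
	b := bs.Blob("some content")

	ref, added, err := s.Put(ctx, b)
	if err != nil {
		return err
	}
	log.Printf("ref %s, newly added: %v", ref, added)

	// Putting identical content again yields the same ref and adds nothing.
	ref2, added2, err := s.Put(ctx, b)
	if err != nil {
		return err
	}
	log.Printf("same ref: %v, added again: %v", ref == ref2, added2) // true, false

	got, err := s.Get(ctx, ref)
	if err != nil {
		return err
	}
	log.Printf("read back %d bytes", len(got))
	return nil
}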

Directories

Path Synopsis
anchor
	Package anchor defines anchor.Store, an extension to bs.Store that indexes "anchors," which are constant lookup names for changing blobs.
cmd/bs
	Command bs is a general purpose CLI interface to blob stores.
fs
	Package fs implements blob store structures for representing files and directories.
gc
	Package gc implements garbage collection for blob stores.
schema
	Package schema implements miscellaneous data structures that can be converted to and from blobs.
split
	Package split implements reading and writing of hashsplit trees in a blob store.
store
	Package store is a registry for Store factories.
store/file
	Package file implements a blob store as a file hierarchy.
store/gcs
	Package gcs implements a blob store on Google Cloud Storage.
store/logging
	Package logging implements a store that delegates everything to a nested store, logging operations as they happen.
store/lru
	Package lru implements a blob store that acts as a least-recently-used cache for a nested blob store.
store/mem
	Package mem implements an in-memory blob store.
store/pg
	Package pg implements a blob store in a Postgresql relational database schema.
store/rpc
	Package rpc defines an RPC server managing a blob store, and a client for it that implements bs.Store (and anchor.Store).
store/sqlite3
	Package sqlite3 implements a blob store in a Sqlite3 relational database schema.
store/transform
	Package transform implements a blob store that can transform blobs into and out of a nested store.
