README

What is diskv?

Diskv (disk-vee) is a simple, persistent key-value store written in the Go language. It starts with an incredibly simple API for storing arbitrary data on a filesystem by key, and builds several layers of performance-enhancing abstraction on top. The end result is a conceptually simple, but highly performant, disk-backed storage system.

Installing

Install Go 1, either from source or with a prepackaged binary. Then,

$ go get github.com/peterbourgon/diskv

Usage

package main

import (
	"fmt"
	"github.com/peterbourgon/diskv"
)

func main() {
	// Simplest transform function: put all the data files into the base dir.
	flatTransform := func(s string) []string { return []string{} }

	// Initialize a new diskv store, rooted at "my-data-dir", with a 1MB cache.
	d := diskv.New(diskv.Options{
		BasePath:     "my-data-dir",
		Transform:    flatTransform,
		CacheSizeMax: 1024 * 1024,
	})

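	// Write three bytes to the key "alpha".
	key := "alpha"
	if err := d.Write(key, []byte{'1', '2', '3'}); err != nil {
		panic(err)
	}

	// Read the value back out of the store.
	value, err := d.Read(key)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%v\n", value) // prints the raw bytes: [49 50 51]

	// Erase the key+value from the store (and the disk).
	if err := d.Erase(key); err != nil {
		panic(err)
	}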
}

More complex examples can be found in the "examples" subdirectory.

Theory

Basic idea

At its core, diskv is a map of a key (string) to arbitrary data ([]byte). The data is written to a single file on disk, with the same name as the key. The key determines where that file will be stored, via a user-provided TransformFunction, which takes a key and returns a slice ([]string) of directory names that form the path under which the key's file will be stored. The simplest TransformFunction,

func SimpleTransform(key string) []string {
	return []string{}
}

will place all keys in the same base directory. The design is inspired by Redis diskstore; a TransformFunction which emulates the default diskstore behavior is available in the content-addressable-storage example.

Note that your TransformFunction should ensure that the file path of one valid key is never a directory on the path of another valid key. That is, it shouldn't be possible to construct valid keys that resolve to directory names. As a concrete example, if your TransformFunction splits on every 3 characters, then

d.Write("abcabc", val) // OK: written to <base>/abc/abc/abcabc
d.Write("abc", val)    // Error: attempted write to <base>/abc/abc, but it's a directory

This will be addressed in an upcoming version of diskv.

Probably the most important design principle behind diskv is that your data is always flatly available on the disk. diskv will never do anything that would prevent you from accessing, copying, backing up, or otherwise interacting with your data via common UNIX commandline tools.

Adding a cache

An in-memory caching layer is provided by combining the BasicStore functionality with a simple map structure, and keeping it up-to-date as appropriate. Since the map structure in Go is not threadsafe, it's combined with a RWMutex to provide safe concurrent access.
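
As an illustration of the map-plus-RWMutex pattern described above, here is a minimal sketch. It is not diskv's actual cache: the cache type and its get, set, and drop methods are hypothetical names, and size accounting against CacheSizeMax is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical in-memory layer: a plain map guarded by a RWMutex.
type cache struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func newCache() *cache {
	return &cache{data: make(map[string][]byte)}
}

// get takes the read lock, so many readers can proceed concurrently.
func (c *cache) get(key string) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	val, ok := c.data[key]
	return val, ok
}

// set takes the write lock, which excludes readers and other writers.
func (c *cache) set(key string, val []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = val
}

// drop removes a key, for example after the backing file has been erased.
func (c *cache) drop(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.data, key)
}

func main() {
	c := newCache()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			c.set(fmt.Sprintf("key-%d", n), []byte{byte(n)})
		}(i)
	}
	wg.Wait()
	_, ok := c.get("key-2")
	fmt.Println(ok) // true
}
```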

Adding order

diskv is a key-value store and therefore inherently unordered. An ordering system can be injected into the store by passing something which satisfies the diskv.Index interface. (A default implementation, using Petar Maymounkov's LLRB tree, is provided.) Basically, diskv keeps an ordered (by a user-provided Less function) index of the keys, which can be queried.

Adding compression

Something which implements the diskv.Compression interface may be passed during store creation, so that all Writes and Reads are filtered through a compression/decompression pipeline. Several default implementations, using stdlib compression algorithms, are provided. Note that data is cached compressed; the cost of decompression is borne with each Read.

Streaming

diskv also now provides ReadStream and WriteStream methods, to allow very large data to be handled efficiently.

Future plans

  • Needs plenty of robust testing: huge datasets, etc...
  • More thorough benchmarking
  • Your suggestions for use-cases I haven't thought of

Documentation

Types

type Compression

type Compression interface {
	Writer(dst io.Writer) (io.WriteCloser, error)
	Reader(src io.Reader) (io.ReadCloser, error)
}

Compression is an interface that Diskv uses to implement compression of data. Writer takes a destination io.Writer and returns a WriteCloser that compresses all data written through it. Reader takes a source io.Reader and returns a ReadCloser that decompresses all data read through it. You may define these methods on your own type, or use one of the NewCompression helpers.

func NewGzipCompression

func NewGzipCompression() Compression

NewGzipCompression returns a Gzip-based Compression.

func NewGzipCompressionLevel

func NewGzipCompressionLevel(level int) Compression

NewGzipCompressionLevel returns a Gzip-based Compression with the given level.

func NewZlibCompression

func NewZlibCompression() Compression

NewZlibCompression returns a Zlib-based Compression.

func NewZlibCompressionLevel

func NewZlibCompressionLevel(level int) Compression

NewZlibCompressionLevel returns a Zlib-based Compression with the given level.

func NewZlibCompressionLevelDict

func NewZlibCompressionLevelDict(level int, dict []byte) Compression

NewZlibCompressionLevelDict returns a Zlib-based Compression with the given level, based on the given dictionary.

type Diskv

type Diskv struct {
	Options
	// contains filtered or unexported fields
}

Diskv implements the Diskv interface. You shouldn't construct Diskv structures directly; instead, use the New constructor.

func New

func New(o Options) *Diskv

New returns an initialized Diskv structure, ready to use. If the path identified by BasePath already contains data, it will be accessible, but not yet cached.

func (*Diskv) Erase

func (d *Diskv) Erase(key string) error

Erase synchronously erases the given key from the disk and the cache.

func (*Diskv) EraseAll

func (d *Diskv) EraseAll() error

EraseAll will delete all of the data from the store, both in the cache and on the disk. Note that EraseAll doesn't distinguish diskv-related data from non-diskv-related data. Care should be taken to always specify a diskv base directory that is exclusively for diskv data.

func (*Diskv) Has

func (d *Diskv) Has(key string) bool

Has returns true if the given key exists.

func (*Diskv) Import

func (d *Diskv) Import(srcFilename, dstKey string, move bool) (err error)

Import imports the source file into diskv under the destination key. If the destination key already exists, it's overwritten. If move is true, the source file is removed after a successful import.

func (*Diskv) Keys

func (d *Diskv) Keys(cancel <-chan struct{}) <-chan string

Keys returns a channel that will yield every key accessible by the store, in undefined order. If a cancel channel is provided, closing it will terminate and close the keys channel.

func (*Diskv) KeysPrefix

func (d *Diskv) KeysPrefix(prefix string, cancel <-chan struct{}) <-chan string

KeysPrefix returns a channel that will yield every key accessible by the store with the given prefix, in undefined order. If a cancel channel is provided, closing it will terminate and close the keys channel. If the provided prefix is the empty string, all keys will be yielded.

func (*Diskv) Read

func (d *Diskv) Read(key string) ([]byte, error)

Read reads the key and returns the value. If the key is available in the cache, Read won't touch the disk. If the key is not in the cache, Read will have the side-effect of lazily caching the value.

func (*Diskv) ReadStream

func (d *Diskv) ReadStream(key string, direct bool) (io.ReadCloser, error)

ReadStream reads the key and returns the value (data) as an io.ReadCloser. If the value is cached from a previous read, and direct is false, ReadStream will use the cached value. Otherwise, it will return a handle to the file on disk, and cache the data on read.

If direct is true, ReadStream will lazily delete any cached value for the key, and return a direct handle to the file on disk.

If compression is enabled, ReadStream taps into the io.Reader stream prior to decompression, and caches the compressed data.

func (*Diskv) Write

func (d *Diskv) Write(key string, val []byte) error

Write synchronously writes the key-value pair to disk, making it immediately available for reads. Write relies on the filesystem to perform an eventual sync to physical media. If you need stronger guarantees, see WriteStream.

func (*Diskv) WriteStream

func (d *Diskv) WriteStream(key string, r io.Reader, sync bool) error

WriteStream writes the data represented by the io.Reader to the disk, under the provided key. If sync is true, WriteStream performs an explicit sync on the file as soon as it's written.

bytes.Buffer provides io.Reader semantics for basic data types.

type Index

type Index interface {
	Initialize(less LessFunction, keys <-chan string)
	Insert(key string)
	Delete(key string)
	Keys(from string, n int) []string
}

Index is a generic interface for things that can provide an ordered list of keys.

type LLRBIndex

type LLRBIndex struct {
	sync.RWMutex
	LessFunction
	*llrb.LLRB
}

LLRBIndex is an implementation of the Index interface using Petar Maymounkov's LLRB tree.

func (*LLRBIndex) Delete

func (i *LLRBIndex) Delete(key string)

Delete removes the given key (only) from the LLRB tree.

func (*LLRBIndex) Initialize

func (i *LLRBIndex) Initialize(less LessFunction, keys <-chan string)

Initialize populates the LLRB tree with data from the keys channel, according to the passed less function. It's destructive to the LLRBIndex.

func (*LLRBIndex) Insert

func (i *LLRBIndex) Insert(key string)

Insert inserts the given key (only) into the LLRB tree.

func (*LLRBIndex) Keys

func (i *LLRBIndex) Keys(from string, n int) []string

Keys yields a maximum of n keys in order. If the passed 'from' key is empty, Keys will return the first n keys. If the passed 'from' key is non-empty, the first key in the returned slice will be the key that immediately follows the passed key, in key order.

type LessFunction

type LessFunction func(string, string) bool

LessFunction is used to initialize an Index of keys in a specific order.

type Options

type Options struct {
	BasePath     string
	Transform    TransformFunction
	CacheSizeMax uint64 // bytes
	PathPerm     os.FileMode
	FilePerm     os.FileMode

	Index     Index
	IndexLess LessFunction

	Compression Compression
}

Options define a set of properties that dictate Diskv behavior. All values are optional.

type TransformFunction

type TransformFunction func(s string) []string

TransformFunction transforms a key into a slice of strings, with each element in the slice representing a directory in the file path where the key's entry will eventually be stored.

For example, if a TransformFunction transforms "abcdef" to ["ab", "cde", "f"], the final location of the data file will be <basedir>/ab/cde/f/abcdef

Directories

Path Synopsis
examples