hybrid

package
v0.1.2
Warning

This package is not in the latest version of its module.
Published: Mar 30, 2019 License: BSD-3-Clause Imports: 16 Imported by: 0

Documentation

Overview

Package hybrid provides a hybrid FSDB implementation.

A hybrid FSDB is backed by a local FSDB and a remote bucket. All data is written locally first; a background thread then uploads it to the remote bucket and deletes the local copy. Read operations check the local FSDB first and fetch from the bucket if the data is not present locally. When a remote read happens, the data is saved locally until the next upload loop.

Data stored on the remote bucket is gzipped using the best compression level.
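As an illustration of what "best compression level" means in Go terms, here is a minimal sketch using the standard library's compress/gzip with gzip.BestCompression. The helper names gzipBest and gunzip are hypothetical and are not part of this package; the package's actual compression code may differ.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBest is a hypothetical helper that compresses data with gzip at the
// best (slowest, smallest) compression level.
func gzipBest(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := gzip.NewWriterLevel(&buf, gzip.BestCompression)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed data and writes the gzip footer.
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzip is a hypothetical helper that reverses gzipBest.
func gunzip(data []byte) ([]byte, error) {
	r, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	packed, err := gzipBest([]byte("Hello, world!"))
	if err != nil {
		panic(err)
	}
	plain, err := gunzip(packed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", plain) // prints "Hello, world!"
}
```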

Concurrency

If you turn off the optional row lock (default is on), there are two possible cases where we might lose data due to race conditions, but they are very unlikely.

The first case is remote read. The read process is:

  1. Check local FSDB.
  2. Read fully from remote bucket.
  3. Check local FSDB again to prevent using stale remote data to overwrite local data.
  4. If there's still no local data in Step 3, write remote data locally.
  5. Return local data.

If another write happens between Steps 3 and 4, it might be overwritten by stale remote data.
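The five read steps can be sketched with in-memory maps standing in for the local FSDB and the remote bucket. The store type, errNotFound and hybridRead are hypothetical names for this illustration only, and the sketch is single-goroutine (Go maps are not safe for concurrent use); it is not the package's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel error used only in this sketch.
var errNotFound = errors.New("not found")

// store is a hypothetical in-memory stand-in for both the local FSDB and the
// remote bucket.
type store map[string][]byte

func (s store) read(key string) ([]byte, error) {
	data, ok := s[key]
	if !ok {
		return nil, errNotFound
	}
	return data, nil
}

// hybridRead follows the five read steps without any row lock.
func hybridRead(local, remote store, key string) ([]byte, error) {
	// Step 1: check local first.
	if data, err := local.read(key); err == nil {
		return data, nil
	}
	// Step 2: read fully from remote.
	remoteData, err := remote.read(key)
	if err != nil {
		return nil, err
	}
	// Step 3: check local again, so stale remote data does not overwrite
	// something written locally while the remote read was in flight.
	if _, err := local.read(key); err != nil {
		// Step 4: still nothing local, so cache the remote data.
		// Without a row lock, a write landing between Step 3 and this
		// assignment is the race described above.
		local[key] = remoteData
	}
	// Step 5: return whatever is stored locally now.
	return local.read(key)
}

func main() {
	local := store{}
	remote := store{"key": []byte("remote value")}

	data, err := hybridRead(local, remote, "key")
	if err != nil {
		panic(err)
	}
	_, cached := local["key"]
	fmt.Printf("%s (cached locally: %v)\n", data, cached)
}
```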

The other case is during upload. The upload process for each key is:

  1. Read local data, calculate crc32c.
  2. Gzip local data, upload to remote bucket.
  3. Calculate local data crc32c again.
  4. If the crc32c from Step 1 and Step 3 match, delete local data.

If another write happens between Steps 3 and 4, it might be deleted in Step 4, leaving only stale data in the system.
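The per-key upload steps can be sketched as follows, using hash/crc32 with the Castagnoli polynomial for crc32c. The store type, the uploadKey function and its afterUpload hook (used here to simulate a concurrent write) are hypothetical, and gzip is left out for brevity; this is an illustration of the checksum comparison, not the package's code.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// castagnoli is the table for the crc32c (Castagnoli) polynomial.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// store is a hypothetical in-memory stand-in for the local FSDB and the
// remote bucket.
type store map[string][]byte

// uploadKey follows the four upload steps and reports whether the local copy
// was deleted. afterUpload is a hypothetical hook that runs right after the
// upload, letting the caller simulate a write that happens mid-upload.
func uploadKey(local, remote store, key string, afterUpload func()) bool {
	// Step 1: read local data and calculate its crc32c.
	data, ok := local[key]
	if !ok {
		return false
	}
	before := crc32.Checksum(data, castagnoli)

	// Step 2: upload to the remote bucket (gzip omitted in this sketch).
	remote[key] = append([]byte(nil), data...)
	if afterUpload != nil {
		afterUpload()
	}

	// Step 3: calculate the crc32c of the current local data again.
	current, ok := local[key]
	if !ok {
		return false
	}
	after := crc32.Checksum(current, castagnoli)

	// Step 4: delete the local copy only if nothing changed during the upload.
	if before != after {
		return false
	}
	delete(local, key)
	return true
}

func main() {
	local := store{"key": []byte("v1")}
	remote := store{}
	fmt.Println(uploadKey(local, remote, "key", nil)) // true: local copy removed

	local["key"] = []byte("v1")
	deleted := uploadKey(local, remote, "key", func() {
		local["key"] = []byte("v2") // simulated write during the upload
	})
	fmt.Println(deleted, string(local["key"])) // false v2: newer data kept
}
```

When the checksums differ, the key simply stays local and would be picked up again by a later upload pass.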

Turning on the optional row lock ensures that the data loss scenarios discussed above cannot happen, but it also degrades performance slightly. The lock is held for only part of each operation (the whole local write operation, remote read from Step 3, upload from Step 3).
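To make the idea of a row lock concrete, here is one possible shape for a per-key lock: a map of mutexes guarded by an outer mutex. The rowLock type and countWithLock function are hypothetical, and the package's real row lock may be implemented quite differently; the sketch only shows how holding a per-key lock serializes the critical section for that key.

```go
package main

import (
	"fmt"
	"sync"
)

// rowLock is a hypothetical per-key lock. Entries are never removed in this
// sketch, so it is only suitable for a bounded set of keys.
type rowLock struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newRowLock() *rowLock {
	return &rowLock{locks: make(map[string]*sync.Mutex)}
}

// lock acquires the lock for key and returns the function that releases it.
func (r *rowLock) lock(key string) (unlock func()) {
	r.mu.Lock()
	l, ok := r.locks[key]
	if !ok {
		l = new(sync.Mutex)
		r.locks[key] = l
	}
	r.mu.Unlock()

	l.Lock()
	return l.Unlock
}

// countWithLock increments a shared counter from n goroutines, each holding
// the row lock for the same key while it does so.
func countWithLock(n int) int {
	rl := newRowLock()
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			unlock := rl.lock("key")
			defer unlock()
			counter++ // critical section, e.g. "check local again, then write"
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countWithLock(100)) // prints 100
}
```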

There are no other locks used in the code, except a few atomic numbers in the upload loop for logging purposes.

Example
package main

import (
	"context"
	"io/ioutil"
	"os"
	"strings"

	"github.com/fishy/fsdb"
	"github.com/fishy/fsdb/bucket"
	"github.com/fishy/fsdb/hybrid"
	"github.com/fishy/fsdb/local"
)

func main() {
	root, err := ioutil.TempDir("", "fsdb_")
	if err != nil {
		panic(err) // TODO: handle error
	}
	defer os.RemoveAll(root)

	var bucket bucket.Bucket
	// TODO: open bucket from an implementation

	ctx, cancel := context.WithCancel(context.Background())
	db := hybrid.Open(
		ctx,
		local.Open(local.NewDefaultOptions(root)),
		bucket,
		hybrid.NewDefaultOptions(),
	)
	defer cancel() // Stop the upload loop, not really necessary

	key := fsdb.Key("key")

	if err := db.Write(ctx, key, strings.NewReader("Hello, world!")); err != nil {
		// TODO: handle error
	}

	reader, err := db.Read(ctx, key)
	if err != nil {
		// TODO: handle error
	}
	defer reader.Close()
	// TODO: read from reader

	if err := db.Delete(ctx, key); err != nil {
		// TODO: handle error
	}
}

Constants

const (
	DefaultUploadDelay     time.Duration = time.Minute * 5
	DefaultUploadThreadNum               = 5
	DefaultUseLock                       = true
)

Default options values.

Variables

var DefaultSkipFunc = UploadAll

DefaultSkipFunc is the default skip function used.

Functions

func DefaultNameFunc

func DefaultNameFunc(key fsdb.Key) string

DefaultNameFunc is the default name function used.

The format is:

fsdb/data/<sha-512/224 of key>.gz
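A name in this format could be produced along the following lines. The function remoteName is hypothetical, the key is taken as a plain byte slice, and lowercase hex encoding of the digest is an assumption; the documentation above does not state how the hash is encoded.

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// remoteName is a hypothetical re-creation of the documented name format.
// Hex encoding of the SHA-512/224 digest is assumed, not confirmed.
func remoteName(key []byte) string {
	sum := sha512.Sum512_224(key) // 28-byte digest
	return fmt.Sprintf("fsdb/data/%x.gz", sum)
}

func main() {
	fmt.Println(remoteName([]byte("key")))
}
```

Hashing the key keeps remote object names at a fixed length and free of characters that a bucket might reject, regardless of what the key contains.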

func Open

func Open(
	ctx context.Context,
	local fsdb.Local,
	bucket bucket.Bucket,
	opts Options,
) fsdb.FSDB

Open creates a hybrid FSDB, which is backed by a local FSDB and a remote bucket.

There's no need to close, but you could cancel the context to stop the upload loop.

Read reads from local first, then reads from the remote bucket if the data does not exist locally. In that case, the data will be saved locally as a cache until the next upload loop.

Write writes locally. There is a background scan loop that uploads everything from local to remote, then deletes the local copy after the upload succeeds.

Delete deletes from both local and remote, and returns combined errors, if any.

github.com/fishy/gcsbucket and github.com/fishy/s3bucket provide bucket.Bucket implementations for Google Cloud Storage and AWS S3, respectively. And github.com/fishy/blobbucket provides a bucket.Bucket implementation based on Go-Cloud Blob interface.
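The background scan loop that Open starts, and that stops when the context is cancelled, might look roughly like the sketch below: a ticker fires every upload delay, and each pass fans the scanned keys out to a fixed number of worker goroutines. The uploadLoop and demo functions and the scan and upload callbacks are hypothetical; the real loop is internal to the package and may be structured differently. The demo uses a one-millisecond delay and two workers instead of the documented defaults (five minutes and five threads) so that it finishes quickly.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// uploadLoop is a hypothetical scan loop. Every delay it calls scan and hands
// the returned keys to threads worker goroutines, until ctx is cancelled.
func uploadLoop(
	ctx context.Context,
	delay time.Duration,
	threads int,
	scan func() []string,
	upload func(key string),
) {
	ticker := time.NewTicker(delay)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return // cancelling the context stops the loop
		case <-ticker.C:
		}

		keys := make(chan string)
		var wg sync.WaitGroup
		for i := 0; i < threads; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				for key := range keys {
					upload(key)
				}
			}()
		}
		for _, key := range scan() {
			keys <- key
		}
		close(keys)
		wg.Wait() // finish this pass before waiting for the next tick
	}
}

// demo runs the loop until three keys have been uploaded, then cancels it.
func demo() int64 {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var scans, uploaded int64
	scan := func() []string {
		if atomic.AddInt64(&scans, 1) > 1 {
			return nil // nothing left after the first pass
		}
		return []string{"a", "b", "c"}
	}
	upload := func(string) {
		if atomic.AddInt64(&uploaded, 1) == 3 {
			cancel()
		}
	}
	uploadLoop(ctx, time.Millisecond, 2, scan, upload)
	return atomic.LoadInt64(&uploaded)
}

func main() {
	fmt.Println(demo()) // prints 3
}
```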

func SkipAll

func SkipAll(key fsdb.Key) bool

SkipAll is the skip function that retains everything locally.

func UploadAll

func UploadAll(key fsdb.Key) bool

UploadAll is the skip function that uploads everything to remote bucket.
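A custom skip function has the same shape as SkipAll and UploadAll and can be installed with SetSkipFunc. The sketch below retains keys with a given prefix locally and uploads everything else. The Key type is a local stand-in for fsdb.Key, assumed here to be a byte slice, and skipPrefix is a hypothetical helper that is not part of this package.

```go
package main

import (
	"bytes"
	"fmt"
)

// Key is a stand-in for fsdb.Key, assumed in this sketch to be a byte slice.
type Key []byte

// skipPrefix is a hypothetical helper. The returned skip function reports
// true (retain locally, do not upload) for keys starting with prefix.
func skipPrefix(prefix []byte) func(Key) bool {
	return func(key Key) bool {
		return bytes.HasPrefix(key, prefix)
	}
}

func main() {
	skip := skipPrefix([]byte("tmp/"))
	fmt.Println(skip(Key("tmp/scratch")))  // true: kept locally
	fmt.Println(skip(Key("data/report"))) // false: uploaded to the bucket
}
```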

Types

type Options

type Options interface {
	// GetUploadDelay returns the delay between two upload scan loops.
	GetUploadDelay() time.Duration

	// GetUploadThreadNum returns the number of threads used in upload scan loops.
	//
	// The higher the number, the faster the uploads,
	// but it also means heavier disk I/O load.
	GetUploadThreadNum() int

	// GetUseLock returns whether we should use a row lock.
	//
	// Using a row lock guarantees that we do not overwrite newer data with stale
	// data, but it also degrades the performance of all operations.
	//
	// Refer to the package documentation for more details.
	GetUseLock() bool

	// GetLogger returns the logger to be used in hybrid FSDB.
	//
	// If it returns nil, nothing will be logged.
	GetLogger() *log.Logger

	// GetRemoteName returns the name for the data file on remote bucket.
	GetRemoteName(key fsdb.Key) string

	// SkipKey returns true if the key should not be uploaded to remote bucket
	// (retain locally), or false if the key should be uploaded to remote bucket.
	SkipKey(key fsdb.Key) bool

	// It's possible that this function needs to read from the hybrid FSDB,
	// so it's allowed to be changed in read-only Options.
	SetSkipFunc(f func(fsdb.Key) bool)
}

Options defines a read-only view of options used in hybrid FSDB.

type OptionsBuilder

type OptionsBuilder interface {
	Options

	// Build builds the read-only view of the options.
	Build() Options

	// SetUploadDelay sets the delay between two upload scan loops.
	SetUploadDelay(delay time.Duration) OptionsBuilder

	// SetUploadThreadNum sets the number of threads used in upload scan loops.
	SetUploadThreadNum(threads int) OptionsBuilder

	// SetUseLock sets whether to use a row lock.
	SetUseLock(lock bool) OptionsBuilder

	// SetLogger sets the logger used in hybrid FSDB.
	SetLogger(logger *log.Logger) OptionsBuilder

	// SetRemoteNameFunc sets the function for GetRemoteName.
	SetRemoteNameFunc(f func(fsdb.Key) string) OptionsBuilder
}

OptionsBuilder defines a read-write view of options used in hybrid FSDB.

func NewDefaultOptions

func NewDefaultOptions() OptionsBuilder

NewDefaultOptions creates the default options.
