flatfs

package module
v0.0.0-...-691bb08 Latest
Published: Mar 5, 2024 License: MIT Imports: 24 Imported by: 0

README

sia-ds

[WARNING] Work in progress: can lead to data loss!

A datastore implementation that uses sharded directories and flat files to store data, with Sia renterd as a backup.

sia-ds is used by go-ipfs to store raw block contents on disk. It supports several sharding functions (prefix, suffix, next-to-last/*). It is based on the go-ds-flatfs implementation, with additional functionality to connect to Sia renterd. All blocks are backed up to renterd and are automatically restored during lookup operations if they are not found locally.

It is not a general-purpose datastore and has several important restrictions. See the restrictions section for details.

Install

sia-ds can be used like any Go module:

import "github.com/IPFSR/sia-ds"

Design

  • Each PUT operation within IPFS is consistently forwarded to the Renterd node. If the API call to Renterd fails, the operation is discarded in IPFS too. This was done to choose consistency over availability in case of a network partition.
  • DELETE operations are not synced with Renterd by default. This strategy allows users to restore any data they might have deleted from IPFS just by trying to access it.
  • Since all blocks are unique, and thus the resulting file names are unique, we don't need to account for an object already existing in the bucket. IPFS already does a GET check before doing a PUT operation.
  • The default Renterd bucket name is IPFS, but it can be changed to allow multiple IPFS nodes to connect to the same Renterd node. Users can choose a different bucket name for each IPFS node.
  • If DELETE operations are not being synced, any accidental delete from IPFS or from the filesystem can be restored just by accessing the block. If they are being synced, only filesystem-level deletes can be restored.

Usage

You can run the Sia-backed IPFS node either by downloading the customised go-IPFS implementation (the IPFSR Kubo fork used in the instructions below) or by using this plugin with a vanilla Kubo, replacing github.com/ipfs/go-ds-flatfs with github.com/IPFSR/sia-ds in the file plugin/plugins/flatfs/flatfs.go.

Please make sure to set up your renterd node first. Once it is running, export the following variables in a terminal and initiate a new IPFS node.

Env Var                          Default   Description
IPFS_SIA_RENTERD_PASSWORD        -         Renterd password
IPFS_SIA_RENTERD_WORKER_ADDRESS  -         Renterd worker API address (ex: http://127.0.0.1:9980)
IPFS_SIA_RENTERD_BUCKET          IPFS      A private bucket with this name will be created and used
IPFS_SIA_SYNC_DELETE             False(0)  If set to True(1), the DELETE operation will be synced to the Renterd bucket
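
As an illustration of how these variables and defaults fit together, here is a minimal sketch of a loader. The `renterdConfig` type and `loadConfig` function are hypothetical and are not part of this package. Two assumptions are made: the password and worker address are treated as required because the table lists no default for them, and only the literal value "1" enables delete syncing.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// renterdConfig is a hypothetical holder for the four settings in the table.
type renterdConfig struct {
	Password      string
	WorkerAddress string
	Bucket        string
	SyncDelete    bool
}

// loadConfig reads the settings through getenv (os.Getenv in normal use).
func loadConfig(getenv func(string) string) (renterdConfig, error) {
	cfg := renterdConfig{
		Password:      getenv("IPFS_SIA_RENTERD_PASSWORD"),
		WorkerAddress: getenv("IPFS_SIA_RENTERD_WORKER_ADDRESS"),
		Bucket:        getenv("IPFS_SIA_RENTERD_BUCKET"),
	}
	// Assumption: both values are mandatory because no default is documented.
	if cfg.Password == "" || cfg.WorkerAddress == "" {
		return cfg, errors.New("renterd password and worker address must be set")
	}
	if cfg.Bucket == "" {
		cfg.Bucket = "IPFS" // documented default bucket name
	}
	// Assumption: "1" means true; anything else keeps the default (false).
	cfg.SyncDelete = getenv("IPFS_SIA_SYNC_DELETE") == "1"
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Printf("bucket=%s syncDelete=%v worker=%s\n", cfg.Bucket, cfg.SyncDelete, cfg.WorkerAddress)
}
```

Passing the lookup function as a parameter keeps the loader easy to exercise without touching the real process environment.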

Instructions

  • Set up a renterd node and note down the password and API address.
  • Open a new terminal and clone the IPFSR Kubo (go-IPFS):
    git clone https://github.com/IPFSR/kubo
  • Enter the directory and build the binary:
    cd kubo
    make build
  • Export the IPFS_SIA_RENTERD_PASSWORD and IPFS_SIA_RENTERD_WORKER_ADDRESS environment variables.
  • Initiate a new IPFS node:
    cmd/ipfs/ipfs init
  • Verify that a new bucket named "IPFS" was created in Renterd.
  • Your IPFS node is now connected to the Renterd node.

Restrictions

FlatFS keys are severely restricted. Only keys that match /[0-9A-Z+-_=]\+ are allowed. That is, keys may only contain upper-case alpha-numeric characters, '-', '+', '_', and '='. This is because values are written directly to the filesystem without encoding.

Importantly, this means namespaced keys (e.g., /FOO/BAR), are not allowed. Attempts to write to such keys will result in an error.
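
To make the restriction concrete, the following is a minimal sketch of a validity check written from the description above. The function name `keyIsValid` and its string-based signature are assumptions for illustration; the package itself works with datastore.Key values and reports ErrInvalidKey.

```go
package main

import "fmt"

// keyIsValid reports whether key is a leading "/" followed by one or more
// characters from 0-9, A-Z, '+', '-', '_' and '='.
func keyIsValid(key string) bool {
	if len(key) < 2 || key[0] != '/' {
		return false
	}
	for i := 1; i < len(key); i++ {
		c := key[i]
		switch {
		case c >= '0' && c <= '9':
		case c >= 'A' && c <= 'Z':
		case c == '+' || c == '-' || c == '_' || c == '=':
		default:
			// Lower-case letters, a second "/", and anything else are rejected.
			return false
		}
	}
	return true
}

func main() {
	for _, k := range []string{"/CIQABC", "/FOO/BAR", "/abc", "/"} {
		fmt.Println(k, keyIsValid(k))
	}
}
```

The namespaced key /FOO/BAR fails because the second slash is not in the allowed set, which is why such writes result in an error.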

DiskUsage and Accuracy

This datastore implements the PersistentDatastore interface. It offers a DiskUsage() method which strives to find a balance between accuracy and performance. This implies:

  • The total disk usage of a datastore is calculated when opening the datastore
  • The current disk usage is cached frequently in a file in the datastore root (diskUsage.cache by default). This file is also written when the datastore is closed.
  • If this file is not present when the datastore is opened:
    • The disk usage will be calculated by walking the datastore's directory tree and estimating the size of each folder.
    • This may be a very slow operation for huge datastores or datastores with slow disks
    • The operation is time-limited (5 minutes by default).
    • Upon timeout, the remaining folders will be assumed to have the average of the previously processed ones.
  • After opening, the disk usage is updated in every write/delete operation.

This means that for certain datastores (huge ones, those with very slow disks or special content), the values reported by DiskUsage() might have reduced accuracy, and the first startup (without a diskUsage.cache file present) might be slow.
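
The averaging idea can be illustrated with a small pure function. This is a sketch under assumptions, not the package's code: `estimateTotal` is a hypothetical name, it works on a slice of already-known sizes instead of walking a directory, and it ignores the time limit.

```go
package main

import "fmt"

// estimateTotal sums at most maxStat entry sizes exactly and assumes every
// remaining entry has the average size of the sampled ones.
// The returned label follows the accuracy values used in diskUsage.cache.
func estimateTotal(sizes []int64, maxStat int) (int64, string) {
	if maxStat <= 0 || len(sizes) <= maxStat {
		var sum int64
		for _, s := range sizes {
			sum += s
		}
		return sum, "initial-exact"
	}
	var sampled int64
	for _, s := range sizes[:maxStat] {
		sampled += s
	}
	avg := sampled / int64(maxStat)
	rest := int64(len(sizes) - maxStat)
	return sampled + avg*rest, "initial-approximate"
}

func main() {
	sizes := []int64{10, 20, 30, 40}
	fmt.Println(estimateTotal(sizes, 2))  // first two sampled, last two averaged
	fmt.Println(estimateTotal(sizes, 10)) // everything counted exactly
}
```

The trade-off is visible in the first call: the estimate is cheaper to compute but can drift from the true total when the unsampled entries differ in size from the sampled ones.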

If you need increased accuracy or a fast start from the first time, you can manually create or update the diskUsage.cache file.

The file diskUsage.cache is a JSON file with two fields, diskUsage and accuracy. For example, the JSON file for a small repo might be:

{"diskUsage":6357,"accuracy":"initial-exact"}

diskUsage is the calculated disk usage and accuracy is a note on the accuracy of the initial calculation. If the initial calculation was accurate the file will contain the value initial-exact. If some of the directories have too many entries and the disk usage for that directory was estimated based on the first 2000 entries, the file will contain initial-approximate. If the calculation took too long and timed out as indicated above, the file will contain initial-timed-out.

If the initial calculation timed out the JSON file might be:

{"diskUsage":7589482442898,"accuracy":"initial-timed-out"}

To fix this with a more accurate value you could do (in the datastore root):

$ du -sb .
7536515831332    .
$ echo -n '{"diskUsage":7536515831332,"accuracy":"initial-exact"}' > diskUsage.cache
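
The same file content can also be produced or parsed from Go. The sketch below only models the two documented fields; the `diskUsageCache` type and the helper names are hypothetical, and the package's own internal representation may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// diskUsageCache mirrors the two fields of diskUsage.cache described above.
type diskUsageCache struct {
	DiskUsage uint64 `json:"diskUsage"`
	Accuracy  string `json:"accuracy"`
}

// encodeCache renders the cache as compact JSON, ready to be written to disk.
func encodeCache(c diskUsageCache) ([]byte, error) {
	return json.Marshal(c)
}

// decodeCache parses the contents of an existing diskUsage.cache file.
func decodeCache(b []byte) (diskUsageCache, error) {
	var c diskUsageCache
	err := json.Unmarshal(b, &c)
	return c, err
}

func main() {
	b, err := encodeCache(diskUsageCache{DiskUsage: 7536515831332, Accuracy: "initial-exact"})
	fmt.Println(string(b), err)

	c, err := decodeCache([]byte(`{"diskUsage":7589482442898,"accuracy":"initial-timed-out"}`))
	fmt.Println(c.DiskUsage, c.Accuracy, err)
}
```

The bytes returned by encodeCache could then be written to diskUsage.cache in the datastore root, for example with os.WriteFile, while the datastore is closed.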

Contribute

PRs accepted.

Small note: If editing the README, please conform to the standard-readme specification.

License

MIT

Documentation

Overview

Package flatfs is a Datastore implementation that stores all objects in a two-level directory structure in the local file system, regardless of the hierarchy of the keys.

Index

Constants

const (
	SIA_PASS     = "IPFS_SIA_RENTERD_PASSWORD"
	SIA_ADDR     = "IPFS_SIA_RENTERD_WORKER_ADDRESS"
	SIA_BUCKET   = "IPFS_SIA_RENTERD_BUCKET"
	SIA_SYNC_DEL = "IPFS_SIA_SYNC_DELETE"
)
const PREFIX = "/repo/flatfs/shard/"
const README_FN = "_README"
const SHARDING_FN = "SHARDING"
const SyncThreadsMax = 16

Don't block more than 16 threads on a sync operation. 16 should be able to saturate most RAIDs: in the case of two used disks per write (RAID 1, 5) and a queue depth of 2, 16 concurrent Sync calls should be able to saturate a 16-HDD RAID. TODO: benchmark it out, maybe provide a tweak parameter.
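
One common way to cap the number of goroutines blocked in a sync call is a buffered channel used as a counting semaphore. The sketch below demonstrates that technique in isolation; whether this package enforces SyncThreadsMax in exactly this way is an assumption, and `runLimited` is a hypothetical helper that also records the peak concurrency it observed.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runLimited starts one goroutine per task but lets at most limit of them
// execute work at the same time. It returns the peak concurrency observed.
func runLimited(tasks, limit int, work func()) int {
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	var mu sync.Mutex
	active, peak := 0, 0

	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot, blocking if all are taken
			defer func() { <-sem }() // release the slot when done

			mu.Lock()
			active++
			if active > peak {
				peak = active
			}
			mu.Unlock()

			work() // stands in for an fsync-style call

			mu.Lock()
			active--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	peak := runLimited(100, 16, func() { time.Sleep(time.Millisecond) })
	fmt.Println("peak concurrent sync calls:", peak)
}
```

The reported peak never exceeds the limit, because a goroutine only increments the counter while it holds a semaphore slot.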

Variables

var (
	// DiskUsageFile is the name of the file to cache the size of the
	// datastore in disk
	DiskUsageFile = "diskUsage.cache"
	// DiskUsageFilesAverage is the maximum number of files per folder
	// to stat in order to calculate the size of the datastore.
	// The size of the rest of the files in a folder will be assumed
	// to be the average of the values obtained. This includes
	// regular files and directories.
	DiskUsageFilesAverage = 2000
	// DiskUsageCalcTimeout is the maximum time to spend
	// calculating the DiskUsage upon a start when no
	// DiskUsageFile is present.
	// If this period did not suffice to read the size of the datastore,
	// the remaining sizes will be estimated.
	DiskUsageCalcTimeout = 5 * time.Minute
	// RetryDelay is a timeout for a backoff on retrying operations
	// that fail due to transient errors like too many file descriptors open.
	RetryDelay = time.Millisecond * 200

	// RetryAttempts is the maximum number of retries that will be attempted
	// before giving up.
	RetryAttempts = 6
)
var (
	ErrDatastoreExists       = errors.New("datastore already exists")
	ErrDatastoreDoesNotExist = errors.New("datastore directory does not exist")
	ErrShardingFileMissing   = fmt.Errorf("%s file not found in datastore", SHARDING_FN)
	ErrClosed                = errors.New("datastore closed")
	ErrInvalidKey            = errors.New("key not supported by flatfs")
)
var IPFS_DEF_SHARD = NextToLast(2)
var IPFS_DEF_SHARD_STR = IPFS_DEF_SHARD.String()
var README_IPFS_DEF_SHARD = `` /* 1123-byte string literal not displayed */

Functions

func Create

func Create(path string, fun *ShardIdV1) error

func DowngradeV1toV0

func DowngradeV1toV0(path string) error

func Move

func Move(oldPath string, newPath string, out io.Writer) error

func UpgradeV0toV1

func UpgradeV0toV1(path string, prefixLen int) error

func WriteReadme

func WriteReadme(dir string, id *ShardIdV1) error

func WriteShardFunc

func WriteShardFunc(dir string, id *ShardIdV1) error

Types

type Datastore

type Datastore struct {
	// contains filtered or unexported fields
}

Datastore implements the go-datastore Interface. Note this datastore cannot guarantee order of concurrent write operations to the same key. See the explanation in Put().

func CreateOrOpen

func CreateOrOpen(path string, fun *ShardIdV1, sync bool) (*Datastore, error)

convenience method

func Open

func Open(path string, syncFiles bool) (*Datastore, error)

func (*Datastore) Accuracy

func (fs *Datastore) Accuracy() string

Accuracy returns a string representing the accuracy of the DiskUsage() result, the value returned is implementation defined and for informational purposes only

func (*Datastore) Batch

func (fs *Datastore) Batch(_ context.Context) (datastore.Batch, error)

func (*Datastore) Close

func (fs *Datastore) Close() error

func (*Datastore) Delete

func (fs *Datastore) Delete(ctx context.Context, key datastore.Key) error

Delete removes a key/value from the Datastore. Please read the Put() explanation about the handling of concurrent write operations to the same key.

func (*Datastore) DiskUsage

func (fs *Datastore) DiskUsage(ctx context.Context) (uint64, error)

DiskUsage implements the PersistentDatastore interface and returns the current disk usage in bytes used by this datastore.

The size is approximate and may differ slightly from the real on-disk values.

func (*Datastore) Get

func (fs *Datastore) Get(ctx context.Context, key datastore.Key) ([]byte, error)

func (*Datastore) GetSize

func (fs *Datastore) GetSize(ctx context.Context, key datastore.Key) (int, error)

func (*Datastore) Has

func (fs *Datastore) Has(ctx context.Context, key datastore.Key) (bool, error)

func (*Datastore) Put

func (fs *Datastore) Put(ctx context.Context, key datastore.Key, value []byte) error

Put stores a key/value in the datastore.

Note that we do not guarantee the order of write operations (Put or Delete) to the same key in this datastore.

For example, in the case of two concurrent Puts, we only guarantee that one of them will come through, but cannot assure which one, even if one arrived slightly later than the other. In the case of a concurrent Put and Delete, we cannot guarantee which one will win.

func (*Datastore) Query

func (fs *Datastore) Query(ctx context.Context, q query.Query) (query.Results, error)

func (*Datastore) ShardStr

func (fs *Datastore) ShardStr() string

func (*Datastore) Sync

func (fs *Datastore) Sync(ctx context.Context, prefix datastore.Key) error

type ShardFunc

type ShardFunc func(string) string

type ShardIdV1

type ShardIdV1 struct {
	// contains filtered or unexported fields
}

func NextToLast

func NextToLast(suffixLen int) *ShardIdV1

NextToLast returns a sharding function taking the suffixLen characters of the key before the very last character. If too short, the key is padded with "_".

func ParseShardFunc

func ParseShardFunc(str string) (*ShardIdV1, error)

func Prefix

func Prefix(prefixLen int) *ShardIdV1

Prefix returns a sharding function taking the first prefixLen characters of the key. If too short, the key is padded with "_".

func ReadShardFunc

func ReadShardFunc(dir string) (*ShardIdV1, error)

func Suffix

func Suffix(suffixLen int) *ShardIdV1

Suffix returns a sharding function taking the last suffixLen characters of the key. If too short, the key is padded with "_".

func (*ShardIdV1) Func

func (f *ShardIdV1) Func() ShardFunc

func (*ShardIdV1) String

func (f *ShardIdV1) String() string
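
The three sharding strategies described for Prefix, Suffix and NextToLast can be sketched as plain string functions. These helpers are hypothetical and operate on a key without its leading slash; which side the "_" padding is applied to (right for prefix, left for the other two) is an assumption based on the descriptions, not a statement about the package internals.

```go
package main

import (
	"fmt"
	"strings"
)

// shardPrefix returns the first n characters of key, right-padded with "_".
// n must be non-negative.
func shardPrefix(key string, n int) string {
	return (key + strings.Repeat("_", n))[:n]
}

// shardSuffix returns the last n characters of key, left-padded with "_".
func shardSuffix(key string, n int) string {
	s := strings.Repeat("_", n) + key
	return s[len(s)-n:]
}

// shardNextToLast returns the n characters just before the final character
// of key, left-padded with "_" when the key is too short.
func shardNextToLast(key string, n int) string {
	s := strings.Repeat("_", n+1) + key
	off := len(s) - n - 1
	return s[off : off+n]
}

func main() {
	key := "CIQABCDEF"
	fmt.Println(shardPrefix(key, 2), shardSuffix(key, 2), shardNextToLast(key, 2))
	fmt.Println(shardPrefix("A", 2), shardSuffix("A", 2), shardNextToLast("A", 2))
}
```

The returned string would serve as the name of the directory that holds the key's file, which is how a flat key space ends up spread over a two-level directory structure.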
