grasure

package module
v0.0.4
Published: Dec 21, 2021 License: MIT Imports: 24 Imported by: 0

README


Grasure

Universal Erasure Coding Architecture in Go. It implements the most popular erasure-coding-based filesystem operations and is readily integrated into other filesystems.

Project home: https://github.com/DurantVivado/Grasure

Godoc: https://pkg.go.dev/github.com/DurantVivado/Grasure

Project Architecture:

  • erasure-global.go contains the system-level interfaces and global structs and variables

  • erasure-init.go contains the basic config file (.hdr.sys) read and write operations; whenever a file changes, the config file is updated.

  • erasure-errors.go contains the definitions for various possible errors.

  • erasure-encode.go contains the operations for striped file encoding; notably, you can specify the data layout.

  • erasure-layout.go lets you specify the layout, for example, random data distribution or other heuristics.

  • erasure-read.go contains the operations for striped file reading; if some parts are lost, we try to recover them.

  • erasure-update.go contains the operations for striped file updating; if some parts are lost, we try to recover them first.

  • erasure-recover.go deals with multi-disk recovery, covering both data and metadata.

Dependency: the reedsolomon library.
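
The README does not pin the exact import path, but as a minimal sketch of the underlying primitive, this is how the widely used github.com/klauspost/reedsolomon package encodes and verifies shards for a k=12, m=4 configuration (Grasure's actual dependency and wrapper code may differ):

package main

import (
	"fmt"
	"log"

	// NOTE: used here for illustration; the exact reedsolomon module
	// Grasure imports is not pinned in this README.
	"github.com/klauspost/reedsolomon"
)

func main() {
	// 12 data shards + 4 parity shards, matching the k=12, m=4 defaults below.
	enc, err := reedsolomon.New(12, 4)
	if err != nil {
		log.Fatal(err)
	}

	data := make([]byte, 1<<20) // 1 MiB of sample data

	// Split the data into 16 equally sized shards (12 data + 4 empty parity).
	shards, err := enc.Split(data)
	if err != nil {
		log.Fatal(err)
	}
	// Compute the parity shards.
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}
	// Verify that the parity is consistent with the data.
	ok, err := enc.Verify(shards)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("shards:", len(shards), "verified:", ok)
}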

Usage

A complete demonstration of the CLI usage can be found in examples/buildAndRun.sh; take a glimpse for an overview. The steps below are carried out in the ./examples directory:

  1. Build the project:
go build -o main ./main.go  
  2. Create a file named .hdr.disks.path in ./examples and list the paths of your local disks, e.g.,
/home/server1/data/data1
/home/server1/data/data2
/home/server1/data/data3
/home/server1/data/data4
/home/server1/data/data5
/home/server1/data/data6
/home/server1/data/data7
/home/server1/data/data8
/home/server1/data/data9
/home/server1/data/data10
/home/server1/data/data11
/home/server1/data/data12
/home/server1/data/data13
/home/server1/data/data14
/home/server1/data/data15
/home/server1/data/data16
  3. Initialise the system. You should explicitly specify the number of data shards (k) and parity shards (m) as well as the block size (in bytes); remember that k+m must NOT exceed 256.
./main -md init -k 12 -m 4 -bs 4096 -dn 16

bs is the blockSize in bytes and dn is the diskNum you intend to use from .hdr.disks.path. Obviously, you should spare some disks for fault-tolerance purposes.

  4. Encode an example file.
./main -md encode -f {source file path} -conStripes 100 -o
  5. Decode (read) the example file.
./main -md read -f {source file basename} -conStripes 100 -sp {destination file path} 

Here conStripes denotes how many stripes are allowed to be operated on concurrently (default 100), and sp means save path.

Use fn to simulate a number of failed disks (default 0); for example, -fn 2 simulates the shutdown of two arbitrary disks. Relax, the data will not really be lost.

  6. Check the hash strings to verify that encode/decode worked correctly.
sha256sum {source file path}
sha256sum {destination file path}
  7. To delete a file from storage (currently irreversible; we are working on that):
./main -md delete -f {filebasename} -o
  8. To update a file in storage:
./main -md update -f {filebasename} -nf {local newfile path} -o
  9. Recover a disk (i.e., all the file blobs on the failed disk(s)) and transfer them to backup disks. This tends to be a time-consuming job. The previous disk path file will be renamed to .hdr.disks.path.old, and in the new disk path file every failed path is replaced with a redundant one.
./main -md recover 

Storage System Structure

We display the structure of the storage system using the tree command. As shown below, each file is encoded and split into k+m parts that are saved across N disks. Every part, named BLOB, is placed in a folder with the same basename as the file, and the system's metadata (e.g., filename, file size, file hash and file distribution) is recorded in META. For reliability, we replicate the META file K-fold (this K is uppercase and not the same as the aforementioned k). The system serves as a general erasure-coding experiment setting and is easily integrated into other systems. It currently supports encode, read and update, with more operations coming soon.

server1@ubuntu:~/data$  tree . -Rh
.
├── [4.0K]  data1
│   ├── [4.0K]  Goprogramming.pdf
│   │   └── [1.3M]  BLOB
│   └── [ 46K]  META
├── [4.0K]  data10
│   └── [4.0K]  Goprogramming.pdf
│       └── [1.4M]  BLOB
├── [4.0K]  data11
│   └── [4.0K]  Goprogramming.pdf
│       └── [1.4M]  BLOB
├── [4.0K]  data12
│   └── [4.0K]  Goprogramming.pdf
│       └── [1.4M]  BLOB
├── [4.0K]  data13
│   └── [4.0K]  Goprogramming.pdf
│       └── [1.4M]  BLOB
├── [4.0K]  data14
│   └── [4.0K]  Goprogramming.pdf
│       └── [1.4M]  BLOB
├── [4.0K]  data15
│   └── [4.0K]  Goprogramming.pdf
│       └── [1.4M]  BLOB
├── [4.0K]  data16
│   └── [4.0K]  Goprogramming.pdf
│       └── [1.4M]  BLOB
├── [4.0K]  data17
│   └── [4.0K]  Goprogramming.pdf
│       └── [1.4M]  BLOB
├── [4.0K]  data18
│   └── [4.0K]  Goprogramming.pdf
│       └── [1.4M]  BLOB
├── [4.0K]  data19
│   └── [4.0K]  Goprogramming.pdf
│       └── [1.4M]  BLOB
├── [4.0K]  data2
│   ├── [4.0K]  Goprogramming.pdf
│   │   └── [1.4M]  BLOB
│   └── [ 46K]  META
├── [4.0K]  data20
│   └── [4.0K]  Goprogramming.pdf
│       └── [1.5M]  BLOB
├── [4.0K]  data21
│   └── [4.0K]  Goprogramming.pdf
│       └── [1.4M]  BLOB
├── [4.0K]  data22
│   └── [4.0K]  Goprogramming.pdf
│       └── [1.3M]  BLOB
├── [4.0K]  data23
│   └── [4.0K]  Goprogramming.pdf
│       └── [1.4M]  BLOB
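
As a hedged illustration (not part of the package), the following standalone sketch walks the disks listed in .hdr.disks.path and prints every BLOB together with its size, recovering the same information shown by tree above:

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"path/filepath"
)

func main() {
	// .hdr.disks.path lists one disk path per line (see the Usage section).
	f, err := os.Open(".hdr.disks.path")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		disk := scanner.Text()
		if disk == "" {
			continue
		}
		// Report every BLOB under this disk together with its size in bytes.
		err := filepath.Walk(disk, func(path string, info os.FileInfo, err error) error {
			if err != nil {
				return err
			}
			if !info.IsDir() && filepath.Base(path) == "BLOB" {
				fmt.Printf("%s\t%d bytes\n", path, info.Size())
			}
			return nil
		})
		if err != nil {
			log.Fatal(err)
		}
	}
}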

CLI parameters

The command-line parameters of ./examples/main.go are listed below.

  • blockSize (bs): the block size in bytes (default 4096)
  • mode (md): the mode of the EC system, one of (encode, decode, update, scaling, recover)
  • dataNum (k): the number of data shards (default 12)
  • parityNum (m): the number of parity shards, i.e. the fault tolerance (default 4)
  • diskNum (dn): the number of disks, which may be fewer than those listed in .hdr.disks.path (default 4)
  • filePath (f): for upload, the local file path; for download and update, the remote file basename
  • savePath: the local save path (default file.save)
  • newDataNum (new_k): the new number of data shards (default 32)
  • newParityNum (new_m): the new number of parity shards (default 8)
  • recoveredDiskPath (rDP): the data path for the recovered disk (default /tmp/restore)
  • override (o): whether to override former files or directories (default false)
  • conWrites (cw): whether to enable concurrent writes (default false)
  • conReads (cr): whether to enable concurrent reads (default false)
  • failMode (fmd): simulate diskFail or bitRot mode (default diskFail)
  • failNum (fn): the number of disk failures to simulate (default 0)
  • conStripes (cs): how many stripes are allowed to encode/decode concurrently (default 100)
  • quiet (q): whether or not to mute terminal output (default false)
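
For orientation, here is a minimal sketch of how these flags could be registered with Go's standard flag package; the actual registration in ./examples/main.go may differ in names, defaults, and wiring, so treat this as hypothetical:

package main

import (
	"flag"
	"fmt"
)

func main() {
	// Hypothetical flag registration mirroring the list above.
	mode := flag.String("md", "", "the mode of the EC system, e.g. init, encode, read, update, recover, delete, scaling")
	k := flag.Int("k", 12, "the number of data shards")
	m := flag.Int("m", 4, "the number of parity shards")
	blockSize := flag.Int64("bs", 4096, "the block size in bytes")
	diskNum := flag.Int("dn", 4, "the number of disks to use")
	filePath := flag.String("f", "", "upload: local file path; download/update: remote file basename")
	savePath := flag.String("sp", "file.save", "the local save path")
	failNum := flag.Int("fn", 0, "the number of disk failures to simulate")
	conStripes := flag.Int("cs", 100, "how many stripes are encoded/decoded concurrently")
	override := flag.Bool("o", false, "whether to override former files or directories")
	quiet := flag.Bool("q", false, "whether to mute terminal output")
	flag.Parse()

	fmt.Println(*mode, *k, *m, *blockSize, *diskNum, *filePath, *savePath, *failNum, *conStripes, *override, *quiet)
}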

Performance

Performance is tested in the test files.

Contributions

Please fork the project and open an issue whenever you run into trouble.

You can also email durantthorvals@gmail.com.

Documentation

Overview

Package grasure is a Universal Erasure Coding Architecture in Go

For usage and examples, see https://github.com/DurantVivado/Grasure

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Erasure

type Erasure struct {
	// the number of data blocks in a stripe
	K int `json:"dataShards"`

	// the number of parity blocks in a stripe
	M int `json:"parityShards"`

	// the block size. default to 4KiB
	BlockSize int64 `json:"blockSize"`

	// the disk number, only the first diskNum disks are used in diskPathFile
	DiskNum int `json:"diskNum"`

	//FileMeta lists, indicating fileName, fileSize, fileHash, fileDist...
	FileMeta []*fileInfo `json:"fileLists"`

	//how many stripes are allowed to encode/decode concurrently
	ConStripes int `json:"-"`

	// the replication factor for config file
	ReplicateFactor int

	// configuration file path
	ConfigFile string `json:"-"`

	// the path of file recording all disks path
	DiskFilePath string `json:"-"`

	// whether or not to override former files or directories, default to false
	Override bool `json:"-"`

	//whether or not to mute outputs
	Quiet bool `json:"-"`
	// contains filtered or unexported fields
}

func (*Erasure) Destroy

func (e *Erasure) Destroy(mode string, failNum int, fileName string)

Destroy simulates disk failure or bitrot:

For `diskFail` mode, `failNum` random disks are marked as unavailable and `fileName` is ignored.

For `bitRot` mode, `failNum` random blocks in a stripe of the file are corrupted; this only works in read mode.

Since it's a simulation, no real data will be lost. Note that failNum = min(failNum, DiskNum).
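
As an illustrative sketch (not an official example), the program below follows the same setup pattern as the examples in this document and simulates two disk failures; the parameter values are assumptions borrowed from those examples:

package main

import (
	"fmt"
	"log"

	grasure "github.com/DurantVivado/Grasure"
)

func main() {
	erasure := &grasure.Erasure{
		DiskFilePath:    ".hdr.disks.path",
		ConfigFile:      "config.json",
		DiskNum:         10,
		K:               6,
		M:               3,
		BlockSize:       4096,
		ReplicateFactor: 3,
		ConStripes:      100,
	}
	if err := erasure.ReadDiskPath(); err != nil {
		log.Fatal(err)
	}
	if err := erasure.ReadConfig(); err != nil {
		log.Fatal(err)
	}
	// Mark two random disks as unavailable; no real data is lost.
	erasure.Destroy("diskFail", 2, "")
	// A bit-rot simulation targeting a specific file would instead be:
	// erasure.Destroy("bitRot", 1, "example.file")
	fmt.Println("failure simulated")
}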

func (*Erasure) EncodeFile

func (e *Erasure) EncodeFile(filename string) (*fileInfo, error)

EncodeFile takes filepath as input and encodes the file into data and parity blocks concurrently.

It returns `*fileInfo` and an error. Specify `blocksize` and `conStripe` for better performance.

Example

An intriguing example of how to encode a file into the system

package main

import (
	"fmt"
	"log"
	"math/rand"
	"os"

	grasure "github.com/DurantVivado/Grasure"
)

func fillRandom(p []byte) {
	for i := 0; i < len(p); i += 7 {
		val := rand.Int63()
		for j := 0; i+j < len(p) && j < 7; j++ {
			p[i+j] = byte(val)
			val >>= 8
		}
	}
}

func prepareDir(diskNum int) error {

	f, err := os.Create(".hdr.disks.path")
	if err != nil {
		return err
	}
	defer f.Close()

	for i := 0; i < diskNum; i++ {
		path := fmt.Sprintf("disk%d", i)
		if err := os.RemoveAll(path); err != nil {
			return err
		}
		// use 0755 so that blobs can later be created inside the directory
		if err := os.Mkdir(path, 0755); err != nil {
			return err
		}
		_, err := f.WriteString(path + "\n")
		if err != nil {
			return err
		}
	}
	return nil

}

func main() {
	// Create some sample data
	data := make([]byte, 250000)
	filepath := "example.file"
	fillRandom(data)
	// write it into a file
	f, err := os.OpenFile(filepath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0666)
	if err != nil {
		log.Fatal(err)
	}
	_, err = f.Write(data)
	if err != nil {
		log.Fatal(err)
	}
	f.Close()
	// define the struct Erasure
	erasure := &grasure.Erasure{
		DiskFilePath:    ".hdr.disks.path",
		ConfigFile:      "config.json",
		DiskNum:         10,
		K:               6,
		M:               3,
		BlockSize:       4096,
		ReplicateFactor: 3,
		ConStripes:      100,
		Override:        true,
	}
	err = prepareDir(13)
	if err != nil {
		log.Fatal(err)
	}
	//read the disk paths
	err = erasure.ReadDiskPath()
	if err != nil {
		log.Fatal(err)
	}
	//first init the system
	err = erasure.InitSystem(true)
	if err != nil {
		log.Fatal(err)
	}
	//read the config file (auto-generated)
	err = erasure.ReadConfig()
	if err != nil {
		log.Fatal(err)
	}
	//encode the file into system
	_, err = erasure.EncodeFile(filepath)
	if err != nil {
		log.Fatal(err)
	}
	//write the config
	err = erasure.WriteConfig()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("encode ok!")
}
Output:

Warning: you are intializing a new erasure-coded system, which means the previous data will also be reset.
System init!
 Erasure parameters: dataShards:6, parityShards:3,blocksize:4096,diskNum:10
encode ok!

func (*Erasure) InitSystem

func (e *Erasure) InitSystem(assume bool) error

InitSystem initiates the erasure-coded system; this func can NOT be called concurrently. It will clear all the data in storage, so a confirmation step is performed in advance of this perilous action.

Note that if `assume` is true, the confirmation step is skipped.

func (*Erasure) ReadConfig

func (e *Erasure) ReadConfig() error

ReadConfig reads the config file during system warm-up.

Calling it before actions like encode and read is a good habit.

func (*Erasure) ReadDiskPath

func (e *Erasure) ReadDiskPath() error

ReadDiskPath reads the disk paths from diskFilePath. There should be exactly ONE disk path per line.

This func can NOT be called concurrently.

func (*Erasure) ReadFile

func (e *Erasure) ReadFile(filename string, savepath string, degrade bool) error

ReadFile reads ONE file from the system and saves it to the local `savepath`.

In case of any failure within fault tolerance, the file will be decoded first. `degrade` indicates whether degraded read is enabled.
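
ReadFile has no official example here, so the following hedged sketch shows a plain read of a previously encoded file; the file name example.file and the parameter values are assumptions borrowed from the EncodeFile example:

package main

import (
	"fmt"
	"log"

	grasure "github.com/DurantVivado/Grasure"
)

func main() {
	erasure := &grasure.Erasure{
		DiskFilePath:    ".hdr.disks.path",
		ConfigFile:      "config.json",
		DiskNum:         10,
		K:               6,
		M:               3,
		BlockSize:       4096,
		ReplicateFactor: 3,
		ConStripes:      100,
		Override:        true,
	}
	if err := erasure.ReadDiskPath(); err != nil {
		log.Fatal(err)
	}
	if err := erasure.ReadConfig(); err != nil {
		log.Fatal(err)
	}
	// Read "example.file" back from the system and save it locally,
	// with degraded read disabled.
	if err := erasure.ReadFile("example.file", "example.file.save", false); err != nil {
		log.Fatal(err)
	}
	fmt.Println("read ok")
}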

func (*Erasure) Recover

func (e *Erasure) Recover() (map[string]string, error)

Recover mainly deals with disk-level disaster reconstruction. The user should provide enough backup devices in `.hdr.disks.path` for the data transfer.

An (oldPath -> replacedPath) replacement map is returned as the first return value.

Example

A fabulous example on recovery of disks

package main

import (
	"fmt"
	"log"

	grasure "github.com/DurantVivado/Grasure"
)

func main() {
	erasure := &grasure.Erasure{
		DiskFilePath:    ".hdr.disks.path",
		ConfigFile:      "config.json",
		DiskNum:         10,
		K:               6,
		M:               3,
		BlockSize:       4096,
		ReplicateFactor: 3,
		ConStripes:      100,
		Override:        true,
	}
	//read the disk paths
	err := erasure.ReadDiskPath()
	if err != nil {
		log.Fatal(err)
	}
	err = erasure.ReadConfig()
	if err != nil {
		log.Fatal(err)
	}
	erasure.Destroy("diskFail", 2, "")
	_, err = erasure.Recover()
	if err != nil {
		log.Fatal(err)
	}
	err = erasure.WriteConfig()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("system recovered")
}
Output:

system recovered

func (*Erasure) RemoveFile

func (e *Erasure) RemoveFile(filename string) error

RemoveFile deletes the specified file `filename` in the system.

Both the file blobs and metadata are deleted. The operation is currently irreversible.

Example

An example of file removal; please encode the file into the system first.

package main

import (
	"fmt"
	"log"

	grasure "github.com/DurantVivado/Grasure"
)

func main() {
	filepath := "example.file"
	erasure := &grasure.Erasure{
		DiskFilePath:    ".hdr.disks.path",
		ConfigFile:      "config.json",
		DiskNum:         10,
		K:               6,
		M:               3,
		BlockSize:       4096,
		ReplicateFactor: 3,
		ConStripes:      100,
		Override:        true,
	}
	//read the disk paths
	err := erasure.ReadDiskPath()
	if err != nil {
		log.Fatal(err)
	}
	err = erasure.ReadConfig()
	if err != nil {
		log.Fatal(err)
	}
	err = erasure.RemoveFile(filepath)
	if err != nil {
		log.Fatal(err)
	}
	err = erasure.WriteConfig()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("file removed")
}
Output:

file removed

func (*Erasure) Scale added in v0.0.4

func (e *Erasure) Scale(new_k, new_m int) error

Scale expands the storage system to a new k and a new m. For example, you may start with a (2,1) system, but as more data flows in, the system needs to be scaled up to a larger one, say (6,4).

One advantage is that a bigger k supports higher storage efficiency.

Another is that the fault-tolerance requirement may need to be raised over time.

It unavoidably incurs substantial data migration; we are working to minimize the traffic.
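
As a hedged sketch (not an official example), the program below scales a running system to a hypothetical (8,4) layout; the parameter values are assumptions borrowed from the other examples:

package main

import (
	"fmt"
	"log"

	grasure "github.com/DurantVivado/Grasure"
)

func main() {
	erasure := &grasure.Erasure{
		DiskFilePath:    ".hdr.disks.path",
		ConfigFile:      "config.json",
		DiskNum:         10,
		K:               6,
		M:               3,
		BlockSize:       4096,
		ReplicateFactor: 3,
		ConStripes:      100,
		Override:        true,
	}
	if err := erasure.ReadDiskPath(); err != nil {
		log.Fatal(err)
	}
	if err := erasure.ReadConfig(); err != nil {
		log.Fatal(err)
	}
	// Scale the system to k=8, m=4 (values chosen for illustration);
	// this migrates data across disks.
	if err := erasure.Scale(8, 4); err != nil {
		log.Fatal(err)
	}
	if err := erasure.WriteConfig(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("scaled to (8,4)")
}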

func (*Erasure) Update added in v0.0.4

func (e *Erasure) Update(oldFile, newFile string) error

Update updates a file with a new local file: the contents of the local `newFile` replace the file stored in the system under the basename `oldFile`.
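
As a hedged sketch (not an official example), the program below updates the stored example.file from a local file example.file.new; both files are assumed to exist, and the parameter values are borrowed from the other examples:

package main

import (
	"fmt"
	"log"

	grasure "github.com/DurantVivado/Grasure"
)

func main() {
	erasure := &grasure.Erasure{
		DiskFilePath:    ".hdr.disks.path",
		ConfigFile:      "config.json",
		DiskNum:         10,
		K:               6,
		M:               3,
		BlockSize:       4096,
		ReplicateFactor: 3,
		ConStripes:      100,
		Override:        true,
	}
	if err := erasure.ReadDiskPath(); err != nil {
		log.Fatal(err)
	}
	if err := erasure.ReadConfig(); err != nil {
		log.Fatal(err)
	}
	// Replace the stored "example.file" with the contents of the local
	// "example.file.new".
	if err := erasure.Update("example.file", "example.file.new"); err != nil {
		log.Fatal(err)
	}
	if err := erasure.WriteConfig(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("update ok")
}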

func (*Erasure) WriteConfig

func (e *Erasure) WriteConfig() error

WriteConfig writes the erasure parameters and file information list into config files.

Calling it after actions like encode and read is a good habit.

Directories

Path Synopsis
