Redirected from github.com/openacid/erasurecode.

xrs

package module

v0.0.1 Latest Latest Go to latest Published: Dec 17, 2019 License: MIT Imports: 4 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/templexxx/xrs

Links

Open Source Insights

README ¶

X-Reed-Solomon

Introduction:

Fast and efficient data reconstruction Erasure Code engine in pure Go.

Systematic Codes with MDS property.

More than 10GB/s per physics core.

Saving about 30% I/O in reconstruction.

Has been used for a distributed storage system with more than 10PB data.

Based on papers:

<A “Hitchhiker’s” Guide to Fast and Efﬁcient Data Reconstruction in Erasure-coded Data Centers>

<A Piggybacking Design Framework for Read-and Download-efﬁcient Distributed Storage Codes>

Getting Started

Make sure you have read the papers.

XRS splits row vector into two equal parts.

e.g. 10+4:

+---------+
| a1 | b1 |
+---------+
| a2 | b2 |
+---------+
| a3 | b3 |
+---------+
    ...
+---------+
| a10| b10|
+---------+
| a11| b11|
+---------+
| a12| b12|
+---------+
| a13| b13|
+---------+

So it's important to choose a fit size for reading/write disks efficiently.

APIs are almost as same as normal Reed-Solomon Erasure Codes.

Performance

Performance depends mainly on:

CPU instruction extension.

Number of data/parity row vectors.

Platform:

MacBook Pro 15-inch, 2017 (Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz)

All test run on a single Core.

RS means Reed-Solomon Codes(for comparing), the RS lib is here

Encode:

I/O = (data + parity) * vector_size / cost

Base means no SIMD.

Data	Parity	Vector size	RS I/O (MB/S)	XRS I/O (MB/S)
12	4	4KB	12658.00	10895.15
12	4	1MB	8989.67	7530.84
12	4	8MB	8509.06	6579.53

Reconstruct:

Need Data = Data size need read in reconstruction

I/O = (need_data + reconstruct_data_num * vector_size) / cost

Data	Parity	Vector size	Reconstruct Data Num	RS Need Data	XRS Need Data	RS Cost	XRS Cost	RS I/O (MB/S)	XRS I/O (MB/S)
12	4	4KB	1	48KB	34KB	2140 ns/op	3567 ns/op	24885.17	10334.99
12	4	4KB	2	48KB	48KB	3395 ns/op	5940 ns/op	16890.41	9654.17
12	4	4KB	3	48KB	48KB	4746 ns/op	7525 ns/op	12945.61	8164.76
12	4	4KB	4	48KB	48KB	5958 ns/op	8851 ns/op	10999.75	7404.41

Update:

I/O = (2 + parity_num + parity_num) * vector_size / cost

Data	Parity	Vector size	RS I/O (MB/S)	XRS I/O (MB/S)
12	4	4KB	32739.22	26312.14

Replace:

I/O = (parity_num + parity_num + replace_data_num) * vector_size / cost

Data	Parity	Vector size	Replace Data Num	RS I/O (MB/S)	XRS I/O (MB/S)
12	4	4KB	1	63908.06	44082.57
12	4	4KB	2	39966.65	26554.30
12	4	4KB	3	30007.81	19583.16
12	4	4KB	4	25138.38	16636.82
12	4	4KB	5	21261.91	14301.15
12	4	4KB	6	19833.14	13121.98
12	4	4KB	7	18395.47	12028.10
12	4	4KB	8	17364.02	11300.55

PS:

And we must know the benchmark test is quite different with encoding/decoding in practice. Because in benchmark test loops, the CPU Cache may help a lot.

Links & Deps

Documentation ¶

Index ¶

type XRS
- func New(dataNum, parityNum int) (x *XRS, err error)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type XRS ¶

type XRS struct {
	// RS is the backend of XRS>
	RS *rs.RS
	// XORSet shows how XRS combines sub-vectors by xor.
	//
	// Key: Parity index(except first parity).
	// Value: Data indexes.
	XORSet map[int][]int
}

XRS X-Reed-Solomon Codes receiver.

func New ¶

func New(dataNum, parityNum int) (x *XRS, err error)

New create an XRS with specific data & parity numbers.

Warn: parityNum can't be 1.

func (*XRS) Encode ¶

func (x *XRS) Encode(vects [][]byte) (err error)

Encode encodes data for generating parity. Write parity vectors into vects[r.DataNum:].

func (*XRS) GetNeedVects ¶

func (x *XRS) GetNeedVects(needReconst int) (aNeed, bNeed []int, err error)

GetNeedVects receives needReconst index(it must be data vector) returns a_vectors' indexes and b_parity_vectors' indexes for reconstructing needReconst. It's used for ReconstOne to read correct vectors for saving I/O.

bNeed always has two elements, the first one is DataNum.

func (*XRS) Reconst ¶

func (x *XRS) Reconst(vects [][]byte, dpHas, needReconst []int) (err error)

Reconst reconstructs missing vectors, vects: All vectors, len(vects) = dataNum + parityNum. dpHas: Survived data & parity index, need dataNum indexes at least. needReconst: Vectors indexes which need to be reconstructed.

Warn: If there is only one needReconst, it will call ReconstOne, so make sure you have correct data, if there is only one vectors need to repair.

e.g: in 3+2, the whole index: [0,1,2,3,4], if vects[0,4] are lost & they need to be reconstructed (Maybe you only need vects[0], so the needReconst should be [0], but not [0,4]). the "dpHas" will be [1,2,3] ,and you must be sure that vects[1] vects[2] vects[3] have correct data, results will be written into vects[0]&vects[4] directly.

func (*XRS) ReconstOne ¶

func (x *XRS) ReconstOne(vects [][]byte, needReconst int) (err error)

ReconstOne reconstruct one data vector, it saves I/O. Make sure you have some specific vectors data. ( you can get the vectors' indexes from GetNeedVects)

func (*XRS) Replace ¶

func (x *XRS) Replace(data [][]byte, replaceRows []int, parity [][]byte) (err error)

Replace replaces oldData vectors with 0 or replaces 0 with newData vectors.

In practice, If len(replaceRows) > dataNum-parityNum, it's better to use Encode, because Replace need to read len(replaceRows) + parityNum vectors, if replaceRows are too many, the cost maybe larger than Encode (Encode only need read dataNum). Think about an EC compute node, and dataNum+parityNum data nodes model.

It's used in two situations: 1. We didn't have enough data for filling in a stripe, but still did ec encode, we need replace several zero vectors with new vectors which have data after we get enough data finally. 2. After compact, we may have several useless vectors in a stripe, we need replaces these useless vectors with zero vectors for free space.

Warn: data's index & replaceRows must has the same sort.

func (*XRS) Update ¶

func (x *XRS) Update(oldData, newData []byte, row int, parity [][]byte) (err error)

Update updates parity_data when one data_vect changes. row: It's the new data's index in the whole vectors.

Source Files ¶

View all Source files

xrs.go

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL