reedsolomon

v1.9.11
Published: Jan 13, 2021 License: MIT Imports: 9 Imported by: 428

README

Reed-Solomon

Reed-Solomon Erasure Coding in Go, with speeds exceeding 1GB/s/cpu core implemented in pure Go.

This is a Go port of the JavaReedSolomon library released by Backblaze, with some additional optimizations.

For an introduction on erasure coding, see the post on the Backblaze blog.

Package home: https://github.com/klauspost/reedsolomon

Godoc: https://pkg.go.dev/github.com/klauspost/reedsolomon?tab=doc

Installation

To get the package use the standard:

go get -u github.com/klauspost/reedsolomon

Using Go modules is recommended.

Changes

May 2020

  • ARM64 optimizations, up to 2.5x faster.
  • Added WithFastOneParityMatrix for faster operation with 1 parity shard.
  • Much better performance when using a limited number of goroutines.
  • AVX512 is now using multiple cores.
  • Stream processing overhaul, big speedups in most cases.
  • AVX512 optimizations

March 6, 2019

The pure Go implementation is about 30% faster. Minor tweaks to assembler implementations.

February 8, 2019

AVX512 accelerated version added for Intel Skylake CPUs. This can give up to a 4x speed improvement as compared to AVX2. See here for more details.

December 18, 2018

Assembly code for ppc64le has been contributed; this boosts performance by about 10x on this platform.

November 18, 2017

Added WithAutoGoroutines which will attempt to calculate the optimal number of goroutines to use based on your expected shard size and detected CPU.

October 1, 2017

  • Cauchy Matrix is now an option. Thanks to templexxx for the basis of this.

  • Default maximum number of goroutines has been increased for better multi-core scaling.

  • After several requests, Reconstruct and ReconstructData now accept slices of zero length but sufficient capacity, which are used instead of allocating new memory.

August 26, 2017

  • The Encoder interface now contains an Update function contributed by chenzhongtao.

  • Frank Wessels kindly contributed ARM 64 bit assembly, which gives a huge performance boost on this platform.

July 20, 2017

ReconstructData added to Encoder interface. This can cause compatibility issues if you implement your own Encoder. A simple workaround can be added:

func (e *YourEnc) ReconstructData(shards [][]byte) error {
	// Reconstruct rebuilds data and parity shards, a superset of what ReconstructData requires.
	return e.Reconstruct(shards)
}

You can of course also do your own implementation. The StreamEncoder handles this without modifying the interface. This is a good lesson on why returning interfaces is not a good design.

Usage

This section assumes you know the basics of Reed-Solomon encoding. A good start is this Backblaze blog post.

This package performs the calculation of the parity sets. The usage is therefore relatively simple.

First of all, you need to choose your distribution of data and parity shards. A 'good' distribution is very subjective, and will depend a lot on your usage scenario. A good starting point is above 5 and below 257 data shards (the maximum supported number), and the number of parity shards to be 2 or above, and below the number of data shards.
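To compare candidate distributions, it can help to put numbers on the trade-off. The following is a minimal sketch of that arithmetic; the `Layout` type and its methods are illustrative names and not part of this package.

```go
package main

import "fmt"

// Layout describes a hypothetical data/parity distribution.
// It is an illustration only and not a type from the reedsolomon package.
type Layout struct {
	Data, Parity int
}

// Overhead returns the number of stored bytes per byte of payload.
func (l Layout) Overhead() float64 {
	return float64(l.Data+l.Parity) / float64(l.Data)
}

// MaxLost returns how many shards may be missing while the set
// can still be reconstructed.
func (l Layout) MaxLost() int {
	return l.Parity
}

func main() {
	for _, l := range []Layout{{5, 2}, {10, 3}, {10, 4}, {50, 20}} {
		fmt.Printf("%d+%d: overhead %.2fx, survives %d lost shards\n",
			l.Data, l.Parity, l.Overhead(), l.MaxLost())
	}
}
```

More data shards lower the storage overhead for the same number of parity shards, at the cost of having to read more shards for a reconstruction.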

To create an encoder with 10 data shards (where your data goes) and 3 parity shards (calculated):

    enc, err := reedsolomon.New(10, 3)

This encoder will work for all parity sets with this distribution of data and parity shards. The error will only be set if you specify 0 or negative values in any of the parameters, or if the total number of data and parity shards exceeds 256.

If you will primarily be using it with one shard size it is recommended to use WithAutoGoroutines(shardSize) as an additional parameter. This will attempt to calculate the optimal number of goroutines to use for the best speed. It is not required that all shards are this size.

The data you send and receive is a simple slice of byte slices; [][]byte. In the example above, the top slice must have a length of 13.

    data := make([][]byte, 13)

You should then create all 13 shards with equal size and fill the first 10 with your data; the remaining 3 will be populated with parity data. In this case we create the data in memory, but you could for instance also use mmap to map files.

    // Create all shards, size them at 50000 each
    for i := range data {
      data[i] = make([]byte, 50000)
    }

    // Fill some data into the data shards
    for i, in := range data[:10] {
      for j := range in {
         in[j] = byte((i+j)&0xff)
      }
    }

To populate the parity shards, you simply call Encode() with your data.

    err = enc.Encode(data)

The only case where you should get an error is if the data shards aren't of equal size. The last 3 shards now contain parity data. You can verify this by calling Verify():

    ok, err := enc.Verify(data)

The final (and important) part is to be able to reconstruct missing shards. For this to work, you need to know which parts of your data are missing. The encoder does not know which parts are invalid, so if data corruption is a likely scenario, you need to implement a hash check for each shard.

If a byte has changed in your set, and you don't know which it is, there is no way to reconstruct the data set.
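One possible shape for such a hash check is sketched below, using only the standard library. The helper names `hashShards` and `dropCorrupt` are made up for this illustration, and SHA-256 is just one reasonable choice of hash; the package itself does not provide or require either.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashShards records a SHA-256 digest for every shard. Call it right after
// encoding and store the digests alongside the shards.
func hashShards(shards [][]byte) [][sha256.Size]byte {
	sums := make([][sha256.Size]byte, len(shards))
	for i, s := range shards {
		sums[i] = sha256.Sum256(s)
	}
	return sums
}

// dropCorrupt sets every shard whose digest no longer matches to nil, so it
// is treated as missing. It returns the number of shards that are missing
// afterwards (already nil or dropped here).
func dropCorrupt(shards [][]byte, sums [][sha256.Size]byte) int {
	missing := 0
	for i, s := range shards {
		if s == nil || sha256.Sum256(s) != sums[i] {
			shards[i] = nil
			missing++
		}
	}
	return missing
}

func main() {
	shards := [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("cccc")}
	sums := hashShards(shards)

	shards[1][2] ^= 0xff // simulate silent corruption of a single byte

	missing := dropCorrupt(shards, sums)
	fmt.Println("missing shards:", missing, "shard 1 dropped:", shards[1] == nil)
}
```

After a pass like this, the shard set only contains data you know is valid, which is the precondition for reconstruction.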

To indicate missing data, you set the shard to nil before calling Reconstruct():

    // Delete two data shards
    data[3] = nil
    data[7] = nil
    
    // Reconstruct the missing shards
    err = enc.Reconstruct(data)

The missing data and parity shards will be recreated. If more than 3 shards are missing, the reconstruction will fail.

If you are only interested in the data shards (for reading purposes) you can call ReconstructData():

    // Delete two data shards
    data[3] = nil
    data[7] = nil
    
    // Reconstruct just the missing data shards
    err = enc.ReconstructData(data)

So to sum up reconstruction:

  • The number of data/parity shards must match the numbers used for encoding.
  • The order of shards must be the same as used when encoding.
  • You may only supply data you know is valid.
  • Invalid shards should be set to nil.

For complete examples of an encoder and decoder see the examples folder.

Splitting/Joining Data

You might have a large slice of data. To help you split this, there are some helper functions that can split and join a single byte slice.

   bigfile, _ := ioutil.ReadFile("myfile.data")
   
   // Split the file
   split, err := enc.Split(bigfile)

This will split the file into the number of data shards set when creating the encoder and create empty parity shards.

An important thing to note is that you have to keep track of the exact input size. If the size of the input isn't divisible by the number of data shards, extra zeros will be inserted in the last shard.
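To see why the original size matters, the padding can be worked out by hand. This is a sketch of the arithmetic only, under the assumption that the shard size is the input size divided by the number of data shards, rounded up; `shardLayout` is a hypothetical helper, not a function of this package.

```go
package main

import "fmt"

// shardLayout returns the per-shard size and the number of zero bytes of
// padding for an input of n bytes split across dataShards shards.
// Assumption: the shard size is n/dataShards rounded up.
func shardLayout(n, dataShards int) (perShard, padding int) {
	perShard = (n + dataShards - 1) / dataShards
	padding = perShard*dataShards - n
	return perShard, padding
}

func main() {
	perShard, padding := shardLayout(250000, 17)
	fmt.Printf("shard size %d, padding %d bytes\n", perShard, padding)
}
```

With 250000 input bytes and 17 data shards, each shard holds 14706 bytes and the set carries 2 padding bytes. Those bytes are indistinguishable from real zeros in the input, so the original length has to be stored separately and passed back when joining.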

To join a data set, use the Join() function, which will join the shards and write it to the io.Writer you supply:

   // Join a data set and write it to io.Discard.
   err = enc.Join(io.Discard, data, len(bigfile))

Streaming/Merging

It might seem like a limitation that all data should be in memory, but an important property is that as long as the number of data/parity shards are the same, you can merge/split data sets, and they will remain valid as a separate set.

    // Split the data set of 50000 elements into two of 25000
    splitA := make([][]byte, 13)
    splitB := make([][]byte, 13)
    
    // Merge into a 100000 element set
    merged := make([][]byte, 13)
    
    for i := range data {
      splitA[i] = data[i][:25000]
      splitB[i] = data[i][25000:]

      // Concatenate it to itself
      merged[i] = append(make([]byte, 0, len(data[i])*2), data[i]...)
      merged[i] = append(merged[i], data[i]...)
    }
    
    // Each part should still verify as ok.
    ok, err := enc.Verify(splitA)
    if ok && err == nil {
        log.Println("splitA ok")
    }
    
    ok, err = enc.Verify(splitB)
    if ok && err == nil {
        log.Println("splitB ok")
    }
    
    ok, err = enc.Verify(merged)
    if ok && err == nil {
        log.Println("merge ok")
    }

This means that if you have a data set that may not fit into memory, you can split processing into smaller blocks. For the best throughput, don't use too small blocks.
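Block-wise processing can be sketched with plain slicing. The helper below is hypothetical (`forEachBlock` is not part of this package) and assumes all shards are non-nil and of equal length; it hands the callback views of each block, so anything the callback writes lands in the original shards.

```go
package main

import (
	"errors"
	"fmt"
)

// forEachBlock calls fn once per block of at most blockSize bytes. The block
// passed to fn contains one sub-slice per shard, all covering the same byte
// range. Assumes every shard is non-nil and has the same length.
func forEachBlock(shards [][]byte, blockSize int, fn func(block [][]byte) error) error {
	if blockSize <= 0 {
		return errors.New("blockSize must be positive")
	}
	if len(shards) == 0 {
		return nil
	}
	size := len(shards[0])
	block := make([][]byte, len(shards))
	for off := 0; off < size; off += blockSize {
		end := off + blockSize
		if end > size {
			end = size
		}
		for i, s := range shards {
			block[i] = s[off:end] // a view, no copy
		}
		if err := fn(block); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	shards := make([][]byte, 13)
	for i := range shards {
		shards[i] = make([]byte, 50000)
	}
	calls := 0
	err := forEachBlock(shards, 20000, func(block [][]byte) error {
		calls++
		fmt.Printf("block %d: %d shards of %d bytes\n", calls, len(block), len(block[0]))
		return nil // a real callback might call enc.Encode(block) or enc.Verify(block)
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```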

This also means that you can divide big input up into smaller blocks, and do reconstruction on parts of your data. This doesn't give the same flexibility of a higher number of data shards, but it will be much more performant.

Streaming API

Support for a streaming API has been added, which lets you perform the same operations on streams instead of in-memory slices. To use the stream API, call the NewStream function to create the encoding/decoding interfaces.

You can use WithConcurrentStreams to ready an interface that reads/writes concurrently from the streams.

You can specify the size of each operation using WithStreamBlockSize. This will set the size of each read/write operation.

Input is delivered as []io.Reader, output as []io.Writer, and functionality corresponds to the in-memory API. Each stream must supply the same amount of data, similar to how each slice must be the same size with the in-memory API. If an error occurs in relation to a stream, a StreamReadError or StreamWriteError will help you determine which stream was the offender.

There is no buffering or timeouts/retry specified. If you want to add that, you need to add it to the Reader/Writer.

For complete examples of a streaming encoder and decoder see the examples folder.

Advanced Options

You can modify internal options which affect how jobs are split between and processed by goroutines.

To create options, use the WithXXX functions. You can supply options to New and NewStream. If no Options are supplied, default options are used.

Example of how to supply options:

    enc, err := reedsolomon.New(10, 3, reedsolomon.WithMaxGoroutines(25))

Performance

Performance depends mainly on the number of parity shards. In rough terms, doubling the number of parity shards will double the encoding time.

Here are the throughput numbers with some different selections of data and parity shards. For reference each shard is 1MB random data, and 16 CPU cores are used for encoding.

Data  Parity  Go MB/s  SSSE3 MB/s  AVX2 MB/s
5     2       14287    66355       108755
8     8       5569     34298       70516
10    4       6766     48237       93875
50    20      1540     12130       22090

The throughput numbers here are based on the combined size of the encoded data and parity shards.

If runtime.GOMAXPROCS() is set to a value higher than 1, the encoder will use multiple goroutines to perform the calculations in Verify, Encode and Reconstruct.

Example of performance scaling on AMD Ryzen 3950X - 16 physical cores, 32 logical cores, AVX 2. The example uses 10 blocks with 1MB data each and 4 parity blocks.

Threads  Speed
1        9979 MB/s
2        18870 MB/s
4        33697 MB/s
8        51531 MB/s
16       59204 MB/s

Benchmarking Reconstruct() followed by a Verify() (=all) versus just calling ReconstructData() (=data) gives the following result:

benchmark                            all MB/s     data MB/s    speedup
BenchmarkReconstruct10x2x10000-8     2011.67      10530.10     5.23x
BenchmarkReconstruct50x5x50000-8     4585.41      14301.60     3.12x
BenchmarkReconstruct10x2x1M-8        8081.15      28216.41     3.49x
BenchmarkReconstruct5x2x1M-8         5780.07      28015.37     4.85x
BenchmarkReconstruct10x4x1M-8        4352.56      14367.61     3.30x
BenchmarkReconstruct50x20x1M-8       1364.35      4189.79      3.07x
BenchmarkReconstruct10x4x16M-8       1484.35      5779.53      3.89x

Performance on AVX512

The performance on AVX512 has been accelerated for Intel CPUs. This gives speedups on a per-core basis typically up to 2x compared to AVX2 as can be seen in the following table:

[...]

This speedup has been achieved by computing multiple parity blocks in parallel as opposed to one after the other. In doing so it is possible to minimize the memory bandwidth required for loading all data shards. At the same time the calculations are performed in the 512-bit wide ZMM registers and the surplus of ZMM registers (32 in total) is used to keep more data around (most notably the matrix coefficients).

Performance on ARM64 NEON

By exploiting NEON instructions the performance for ARM has been accelerated. Below are the performance numbers for a single core on an EC2 m6g.16xlarge (Graviton2) instance (Amazon Linux 2):

BenchmarkGalois128K-64        119562     10028 ns/op        13070.78 MB/s
BenchmarkGalois1M-64           14380     83424 ns/op        12569.22 MB/s
BenchmarkGaloisXor128K-64      96508     12432 ns/op        10543.29 MB/s
BenchmarkGaloisXor1M-64        10000    100322 ns/op        10452.13 MB/s

Performance on ppc64le

The performance for ppc64le has been accelerated. This gives roughly a 10x performance improvement on this architecture as can be seen below:

benchmark                      old MB/s     new MB/s     speedup
BenchmarkGalois128K-160        948.87       8878.85      9.36x
BenchmarkGalois1M-160          968.85       9041.92      9.33x
BenchmarkGaloisXor128K-160     862.02       7905.00      9.17x
BenchmarkGaloisXor1M-160       784.60       6296.65      8.03x

asm2plan9s

asm2plan9s is used for assembling the AVX2 instructions into their BYTE/WORD/LONG equivalents.

License

This code, like the original JavaReedSolomon, is published under an MIT license. See LICENSE file for more information.

Documentation

Overview

Package reedsolomon enables Erasure Coding in Go

For usage and examples, see https://github.com/klauspost/reedsolomon

Constants

This section is empty.

Variables

var ErrInvShardNum = errors.New("cannot create Encoder with zero or less data/parity shards")

ErrInvShardNum will be returned by New, if you attempt to create an Encoder where either data or parity shards is zero or less.

var ErrInvalidInput = errors.New("invalid input")

ErrInvalidInput is returned if an invalid input parameter is passed to Update.

var ErrMaxShardNum = errors.New("cannot create Encoder with more than 256 data+parity shards")

ErrMaxShardNum will be returned by New, if you attempt to create an Encoder where data and parity shards are bigger than the order of GF(2^8).

var ErrReconstructMismatch = errors.New("valid shards and fill shards are mutually exclusive")

ErrReconstructMismatch is returned by the StreamEncoder, if you supply "valid" and "fill" streams on the same index. Therefore it is impossible to see if you consider the shard valid or would like to have it reconstructed.

var ErrReconstructRequired = errors.New("reconstruction required as one or more required data shards are nil")

ErrReconstructRequired is returned if too few data shards are intact and a reconstruction is required before you can successfully join the shards.

var ErrShardNoData = errors.New("no shard data")

ErrShardNoData will be returned if there are no shards, or if the length of all shards is zero.

var ErrShardSize = errors.New("shard sizes do not match")

ErrShardSize is returned if shard length isn't the same for all shards.

var ErrShortData = errors.New("not enough data to fill the number of requested shards")

ErrShortData will be returned by Split(), if there isn't enough data to fill the number of shards.

var ErrTooFewShards = errors.New("too few shards given")

ErrTooFewShards is returned if too few shards were given to Encode/Verify/Reconstruct/Update. It will also be returned from Reconstruct if there were too few shards to reconstruct the missing data.

Functions

This section is empty.

Types

type Encoder

type Encoder interface {
	// Encode parity for a set of data shards.
	// Input is 'shards' containing data shards followed by parity shards.
	// The number of shards must match the number given to New().
	// Each shard is a byte array, and they must all be the same size.
	// The parity shards will always be overwritten and the data shards
	// will remain the same, so it is safe for you to read from the
	// data shards while this is running.
	Encode(shards [][]byte) error

	// Verify returns true if the parity shards contain correct data.
	// The data is the same format as Encode. No data is modified, so
	// you are allowed to read from data while this is running.
	Verify(shards [][]byte) (bool, error)

	// Reconstruct will recreate the missing shards if possible.
	//
	// Given a list of shards, some of which contain data, fills in the
	// ones that don't have data.
	//
	// The length of the array must be equal to the total number of shards.
	// You indicate that a shard is missing by setting it to nil or zero-length.
	// If a shard is zero-length but has sufficient capacity, that memory will
	// be used, otherwise a new []byte will be allocated.
	//
	// If there are too few shards to reconstruct the missing
	// ones, ErrTooFewShards will be returned.
	//
	// The reconstructed shard set is complete, but integrity is not verified.
	// Use the Verify function to check if data set is ok.
	Reconstruct(shards [][]byte) error

	// ReconstructData will recreate any missing data shards, if possible.
	//
	// Given a list of shards, some of which contain data, fills in the
	// data shards that don't have data.
	//
	// The length of the array must be equal to Shards.
	// You indicate that a shard is missing by setting it to nil or zero-length.
	// If a shard is zero-length but has sufficient capacity, that memory will
	// be used, otherwise a new []byte will be allocated.
	//
	// If there are too few shards to reconstruct the missing
	// ones, ErrTooFewShards will be returned.
	//
	// As the reconstructed shard set may contain missing parity shards,
	// calling the Verify function is likely to fail.
	ReconstructData(shards [][]byte) error

	// Update recomputes the parity after a few data shards have changed.
	// Input 'newDatashards' contains the changed data shards.
	// Input 'shards' contains the old data shards (an unchanged data shard can be nil) and the old parity shards.
	// The new parity shards will be in shards[DataShards:].
	// Update is very useful if DataShards is much larger than ParityShards and only a few data shards have changed.
	// It will be faster than Encode and does not need to read all data shards to encode.
	Update(shards [][]byte, newDatashards [][]byte) error

	// Split a data slice into the number of shards given to the encoder,
	// and create empty parity shards.
	//
	// The data will be split into equally sized shards.
	// If the data size isn't divisible by the number of shards,
	// the last shard will contain extra zeros.
	//
	// There must be at least 1 byte otherwise ErrShortData will be
	// returned.
	//
	// The data will not be copied, except for the last shard, so you
	// should not modify the data of the input slice afterwards.
	Split(data []byte) ([][]byte, error)

	// Join the shards and write the data segment to dst.
	//
	// Only the data shards are considered.
	// You must supply the exact output size you want.
	// If there are too few shards given, ErrTooFewShards will be returned.
	// If the total data size is less than outSize, ErrShortData will be returned.
	Join(dst io.Writer, shards [][]byte, outSize int) error
}

Encoder is an interface to encode Reed-Solomon parity sets for your data.

Example

Simple example of how to use all functions of the Encoder. Note that all error checks have been removed to keep it short.

package main

import (
	"fmt"
	"math/rand"

	"github.com/klauspost/reedsolomon"
)

func fillRandom(p []byte) {
	for i := 0; i < len(p); i += 7 {
		val := rand.Int63()
		for j := 0; i+j < len(p) && j < 7; j++ {
			p[i+j] = byte(val)
			val >>= 8
		}
	}
}

func main() {
	// Create some sample data
	var data = make([]byte, 250000)
	fillRandom(data)

	// Create an encoder with 17 data and 3 parity slices.
	enc, _ := reedsolomon.New(17, 3)

	// Split the data into shards
	shards, _ := enc.Split(data)

	// Encode the parity set
	_ = enc.Encode(shards)

	// Verify the parity set
	ok, _ := enc.Verify(shards)
	if ok {
		fmt.Println("ok")
	}

	// Delete two shards
	shards[10], shards[11] = nil, nil

	// Reconstruct the shards
	_ = enc.Reconstruct(shards)

	// Verify the data set
	ok, _ = enc.Verify(shards)
	if ok {
		fmt.Println("ok")
	}
}
Output:

ok
ok
Example (Slicing)

This demonstrates that shards can be arbitrarily sliced and merged and still remain valid.

package main

import (
	"fmt"
	"math/rand"

	"github.com/klauspost/reedsolomon"
)

func fillRandom(p []byte) {
	for i := 0; i < len(p); i += 7 {
		val := rand.Int63()
		for j := 0; i+j < len(p) && j < 7; j++ {
			p[i+j] = byte(val)
			val >>= 8
		}
	}
}

func main() {
	// Create some sample data
	var data = make([]byte, 250000)
	fillRandom(data)

	// Create 5 data slices of 50000 elements each
	enc, _ := reedsolomon.New(5, 3)
	shards, _ := enc.Split(data)
	err := enc.Encode(shards)
	if err != nil {
		panic(err)
	}

	// Check that it verifies
	ok, err := enc.Verify(shards)
	if ok && err == nil {
		fmt.Println("encode ok")
	}

	// Split the data set of 50000 elements into two of 25000
	splitA := make([][]byte, 8)
	splitB := make([][]byte, 8)

	// Merge into a 100000 element set
	merged := make([][]byte, 8)

	// Split/merge the shards
	for i := range shards {
		splitA[i] = shards[i][:25000]
		splitB[i] = shards[i][25000:]

		// Concatenate it to itself
		merged[i] = append(make([]byte, 0, len(shards[i])*2), shards[i]...)
		merged[i] = append(merged[i], shards[i]...)
	}

	// Each part should still verify as ok.
	ok, err = enc.Verify(splitA)
	if ok && err == nil {
		fmt.Println("splitA ok")
	}

	ok, err = enc.Verify(splitB)
	if ok && err == nil {
		fmt.Println("splitB ok")
	}

	ok, err = enc.Verify(merged)
	if ok && err == nil {
		fmt.Println("merge ok")
	}
}
Output:

encode ok
splitA ok
splitB ok
merge ok
Example (Xor)

This demonstrates that shards can be xor'ed and still remain a valid set.

The xor value must be the same for element 'n' in each shard, except if you xor with a similar sized encoded shard set.

package main

import (
	"fmt"
	"math/rand"

	"github.com/klauspost/reedsolomon"
)

func fillRandom(p []byte) {
	for i := 0; i < len(p); i += 7 {
		val := rand.Int63()
		for j := 0; i+j < len(p) && j < 7; j++ {
			p[i+j] = byte(val)
			val >>= 8
		}
	}
}

func main() {
	// Create some sample data
	var data = make([]byte, 25000)
	fillRandom(data)

	// Create 5 data slices of 5000 elements each
	enc, _ := reedsolomon.New(5, 3)
	shards, _ := enc.Split(data)
	err := enc.Encode(shards)
	if err != nil {
		panic(err)
	}

	// Check that it verifies
	ok, err := enc.Verify(shards)
	if !ok || err != nil {
		fmt.Println("failed initial verify", err)
	}

	// Create an xor'ed set
	xored := make([][]byte, 8)

	// We xor by the index, so you can see that the xor can change,
	// It should however be constant vertically through your slices.
	for i := range shards {
		xored[i] = make([]byte, len(shards[i]))
		for j := range xored[i] {
			xored[i][j] = shards[i][j] ^ byte(j&0xff)
		}
	}

	// Each part should still verify as ok.
	ok, err = enc.Verify(xored)
	if ok && err == nil {
		fmt.Println("verified ok after xor")
	}
}
Output:

verified ok after xor

func New

func New(dataShards, parityShards int, opts ...Option) (Encoder, error)

New creates a new encoder and initializes it to the number of data shards and parity shards that you want to use. You can reuse this encoder. Note that the maximum number of total shards is 256. If no options are supplied, default options are used.

type Option

type Option func(*options)

Option allows overriding processing parameters.

func WithAutoGoroutines

func WithAutoGoroutines(shardSize int) Option

WithAutoGoroutines will adjust the number of goroutines for optimal speed with a specific shard size. Send in the shard size you expect to send. Other shard sizes will work, but may not run at the optimal speed. Overwrites WithMaxGoroutines. If shardSize <= 0, it is ignored.

func WithCauchyMatrix

func WithCauchyMatrix() Option

WithCauchyMatrix will make the encoder build a Cauchy style matrix. The output of this is not compatible with the standard output. A Cauchy matrix is faster to generate. This does not affect data throughput, but will result in slightly faster start-up time.

func WithConcurrentStreamReads added in v1.9.7

func WithConcurrentStreamReads(enabled bool) Option

WithConcurrentStreamReads will enable concurrent reads from the input streams. Default: Disabled, meaning only one stream will be read at a time. Ignored if not used on a stream input.

func WithConcurrentStreamWrites added in v1.9.7

func WithConcurrentStreamWrites(enabled bool) Option

WithConcurrentStreamWrites will enable concurrent writes to the output streams. Default: Disabled, meaning only one stream will be written at a time. Ignored if not used on a stream input.

func WithConcurrentStreams added in v1.9.7

func WithConcurrentStreams(enabled bool) Option

WithConcurrentStreams will enable concurrent reads and writes on the streams. Default: Disabled, meaning only one stream will be read/written at a time. Ignored if not used on a stream input.

func WithFastOneParityMatrix added in v1.9.8

func WithFastOneParityMatrix() Option

WithFastOneParityMatrix will switch the matrix to a simple xor if there is only one parity shard. The PAR1 matrix already has this property so it has little effect there.

func WithInversionCache added in v1.9.11

func WithInversionCache(enabled bool) Option

WithInversionCache controls the inversion cache. This will cache reconstruction matrices so they can be reused. Enabled by default.

func WithMaxGoroutines

func WithMaxGoroutines(n int) Option

WithMaxGoroutines sets the maximum number of goroutines used for encoding and decoding. Jobs will be split into this many parts, unless each goroutine would have to process less than minSplitSize bytes (set with WithMinSplitSize). For the best speed, keep this well above the GOMAXPROCS number for more fine-grained scheduling. If n <= 0, it is ignored.

func WithMinSplitSize

func WithMinSplitSize(n int) Option

WithMinSplitSize is the minimum encoding size in bytes per goroutine. By default this parameter is determined by CPU cache characteristics. See WithMaxGoroutines on how jobs are split. If n <= 0, it is ignored.

func WithPAR1Matrix

func WithPAR1Matrix() Option

WithPAR1Matrix causes the encoder to build the matrix how PARv1 does. Note that the method they use is buggy, and may lead to cases where recovery is impossible, even if there are enough parity shards.

func WithStreamBlockSize added in v1.9.7

func WithStreamBlockSize(n int) Option

WithStreamBlockSize sets a custom block size per round of reads/writes. If not set, any shard size set with WithAutoGoroutines will be used. If WithAutoGoroutines is also unset, 4MB will be used. Ignored if not used on a stream.

type StreamEncoder

type StreamEncoder interface {
	// Encode parity shards for a set of data shards.
	//
	// Input is 'shards' containing readers for data shards followed by parity shards
	// io.Writer.
	//
	// The number of shards must match the number given to NewStream().
	//
	// Each reader must supply the same number of bytes.
	//
	// The parity shards will be written to the writer.
	// The number of bytes written will match the input size.
	//
	// If a data stream returns an error, a StreamReadError type error
	// will be returned. If a parity writer returns an error, a
	// StreamWriteError will be returned.
	Encode(data []io.Reader, parity []io.Writer) error

	// Verify returns true if the parity shards contain correct data.
	//
	// The number of shards must match the number total data+parity shards
	// given to NewStream().
	//
	// Each reader must supply the same number of bytes.
	// If a shard stream returns an error, a StreamReadError type error
	// will be returned.
	Verify(shards []io.Reader) (bool, error)

	// Reconstruct will recreate the missing shards if possible.
	//
	// Given a list of valid shards (to read) and invalid shards (to write)
	//
	// You indicate that a shard is missing by setting it to nil in the 'valid'
	// slice and at the same time setting a non-nil writer in "fill".
	// An index cannot contain both non-nil 'valid' and 'fill' entry.
	// If both are provided 'ErrReconstructMismatch' is returned.
	//
	// If there are too few shards to reconstruct the missing
	// ones, ErrTooFewShards will be returned.
	//
	// The reconstructed shard set is complete, but integrity is not verified.
	// Use the Verify function to check if data set is ok.
	Reconstruct(valid []io.Reader, fill []io.Writer) error

	// Split an input stream into the number of shards given to the encoder.
	//
	// The data will be split into equally sized shards.
	// If the data size isn't divisible by the number of shards,
	// the last shard will contain extra zeros.
	//
	// You must supply the total size of your input.
	// 'ErrShortData' will be returned if it is unable to retrieve the
	// number of bytes indicated.
	Split(data io.Reader, dst []io.Writer, size int64) (err error)

	// Join the shards and write the data segment to dst.
	//
	// Only the data shards are considered.
	//
	// You must supply the exact output size you want.
	// If there are too few shards given, ErrTooFewShards will be returned.
	// If the total data size is less than outSize, ErrShortData will be returned.
	Join(dst io.Writer, shards []io.Reader, outSize int64) error
}

StreamEncoder is an interface to encode Reed-Solomon parity sets for your data. It provides a fully streaming interface, and processes data in blocks of up to 4MB.

For small shard sizes, 10MB and below, it is recommended to use the in-memory interface, since the streaming interface has a start up overhead.

For all operations, readers and writers should not assume any order/size of individual reads/writes.

For usage examples, see "stream-encoder.go" and "stream-decoder.go" in the examples folder.

Example

This shows a simple stream encoder where we encode from a []io.Reader which contains a reader for each shard.

Input and output can be exchanged with files, network streams or what may suit your needs.

package main

import (
	"bytes"
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"math/rand"

	"github.com/klauspost/reedsolomon"
)

func fillRandom(p []byte) {
	for i := 0; i < len(p); i += 7 {
		val := rand.Int63()
		for j := 0; i+j < len(p) && j < 7; j++ {
			p[i+j] = byte(val)
			val >>= 8
		}
	}
}

func main() {
	dataShards := 5
	parityShards := 2

	// Create a StreamEncoder with the number of data and
	// parity shards.
	rs, err := reedsolomon.NewStream(dataShards, parityShards)
	if err != nil {
		log.Fatal(err)
	}

	shardSize := 50000

	// Create input data shards.
	input := make([][]byte, dataShards)
	for s := range input {
		input[s] = make([]byte, shardSize)
		fillRandom(input[s])
	}

	// Convert our buffers to io.Readers
	readers := make([]io.Reader, dataShards)
	for i := range readers {
		readers[i] = io.Reader(bytes.NewBuffer(input[i]))
	}

	// Create our output io.Writers
	out := make([]io.Writer, parityShards)
	for i := range out {
		out[i] = ioutil.Discard
	}

	// Encode from input to output.
	err = rs.Encode(readers, out)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("ok")
}
Output:

ok

func NewStream

func NewStream(dataShards, parityShards int, o ...Option) (StreamEncoder, error)

NewStream creates a new encoder and initializes it to the number of data shards and parity shards that you want to use. You can reuse this encoder. Note that the maximum number of total shards is 256.

func NewStreamC

func NewStreamC(dataShards, parityShards int, conReads, conWrites bool, o ...Option) (StreamEncoder, error)

NewStreamC creates a new encoder and initializes it to the number of data shards and parity shards given.

This functions like 'NewStream', but allows you to enable CONCURRENT reads and writes.

type StreamReadError

type StreamReadError struct {
	Err    error // The error
	Stream int   // The stream number on which the error occurred
}

StreamReadError is returned when a read error is encountered that relates to a supplied stream. This will allow you to find out which reader has failed.

func (StreamReadError) Error

func (s StreamReadError) Error() string

Error returns the error as a string

func (StreamReadError) String

func (s StreamReadError) String() string

String returns the error as a string

type StreamWriteError

type StreamWriteError struct {
	Err    error // The error
	Stream int   // The stream number on which the error occurred
}

StreamWriteError is returned when a write error is encountered that relates to a supplied stream. This will allow you to find out which writer has failed.

func (StreamWriteError) Error

func (s StreamWriteError) Error() string

Error returns the error as a string

func (StreamWriteError) String

func (s StreamWriteError) String() string

String returns the error as a string

Directories

Path Synopsis
_gen module
