carbites

package module
v0.6.0
Published: Feb 23, 2023 License: Apache-2.0, MIT Imports: 16 Imported by: 16

README

carbites


Chunking for CAR files. Split a single CAR into multiple CARs.

Install

go get github.com/alanshaw/go-carbites

Usage

Carbites supports two strategies:

  1. Simple (default) - fast but naive. Only the first CAR output has the roots in its header; subsequent CARs carry the placeholder "empty" root CID bafkqaaa, as recommended.
  2. Treewalk - walks the DAG to pack sub-graphs into each CAR file that is output. Every CAR file has the same root CID but contains a different portion of the DAG. The DAG is traversed from the root node; each block is decoded and its links are extracted to determine which sub-graph to include in each CAR.

package main

import (
	"fmt"
	"io"
	"os"

	"github.com/alanshaw/go-carbites"
)

func main() {
	bigCar, err := os.Open("big.car")
	if err != nil {
		panic(err)
	}
	defer bigCar.Close()

	targetSize := 1024 * 1024   // 1MiB chunks
	strategy := carbites.Simple // also carbites.Treewalk
	spltr, err := carbites.Split(bigCar, targetSize, strategy)
	if err != nil {
		panic(err)
	}

	var i int
	for {
		// Next returns io.EOF once the input CAR is exhausted.
		car, err := spltr.Next()
		if err != nil {
			if err == io.EOF {
				break
			}
			panic(err)
		}
		b, err := io.ReadAll(car)
		if err != nil {
			panic(err)
		}
		if err := os.WriteFile(fmt.Sprintf("chunk-%d.car", i), b, 0644); err != nil {
			panic(err)
		}
		i++
	}
}
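
To make the "simple" strategy described above more concrete, here is a minimal sketch of greedy size-based packing. It is a toy model using only the standard library, not the library's implementation: the block and chunk types, the packSimple function, and the decision to count only payload bytes (ignoring CAR header and framing overhead) are all assumptions made for illustration. The only details taken from the README are that the first output keeps the original roots and later outputs carry the placeholder root bafkqaaa.

```go
package main

import "fmt"

// emptyRoot is the placeholder root CID named in the README.
const emptyRoot = "bafkqaaa"

// block is a hypothetical stand-in for a CAR block: an ID plus its payload.
type block struct {
	id   string
	data []byte
}

// chunk is a hypothetical stand-in for one output CAR.
type chunk struct {
	roots  []string
	blocks []block
}

// packSimple groups blocks, in their original order, into chunks whose total
// payload stays at or below targetSize where possible. A block larger than
// targetSize ends up in a chunk of its own. Only the first chunk keeps the
// original roots; later chunks get the placeholder root.
func packSimple(roots []string, blocks []block, targetSize int) []chunk {
	var chunks []chunk
	cur := chunk{roots: roots}
	size := 0
	for _, b := range blocks {
		if len(cur.blocks) > 0 && size+len(b.data) > targetSize {
			chunks = append(chunks, cur)
			cur = chunk{roots: []string{emptyRoot}}
			size = 0
		}
		cur.blocks = append(cur.blocks, b)
		size += len(b.data)
	}
	if len(cur.blocks) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}

func main() {
	blocks := []block{
		{"a", make([]byte, 400)},
		{"b", make([]byte, 400)},
		{"c", make([]byte, 400)},
		{"d", make([]byte, 1500)},
	}
	for i, c := range packSimple([]string{"rootcid"}, blocks, 1000) {
		fmt.Printf("chunk %d: roots=%v blocks=%d\n", i, c.roots, len(c.blocks))
	}
}
```

Because packing follows block order and never inspects links, a chunk produced this way is generally not a self-contained sub-graph, which is the trade-off the Treewalk strategy addresses.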

API

pkg.go.dev Reference

Contribute

Feel free to dive in! Open an issue or submit PRs.

License

Dual-licensed under MIT + Apache 2.0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Join added in v0.1.1

func Join(in []io.Reader, s Strategy) (io.Reader, error)

Join together multiple CAR files into a single CAR file.

func JoinSimple added in v0.1.1

func JoinSimple(in []io.Reader) (io.Reader, error)

Join together multiple CAR files that were split using the "simple" strategy into a single CAR file.

func JoinTreewalk added in v0.3.0

func JoinTreewalk(in []io.Reader) (io.Reader, error)

Join together multiple CAR files into a single CAR file using the "treewalk" strategy. Note that binary equality between the original CAR and the joined CAR is not guaranteed.

func NewCarMerger added in v0.3.0

func NewCarMerger(in []io.Reader) (io.Reader, error)

NewCarMerger creates a new CAR file (an io.Reader) that is a result of merging the passed CAR files. The resultant CAR has the combined roots of the passed CAR files and any duplicate blocks are removed.
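
The merge behaviour described above (combined roots, duplicate blocks removed) can be sketched with plain Go values. This is an illustrative model only: the block and car types and the merge function are hypothetical, blocks are identified by a string ID rather than a real CID, and de-duplicating the roots as well as the blocks is an assumption the source does not confirm.

```go
package main

import "fmt"

// block is a hypothetical stand-in for a CAR block.
type block struct {
	id   string
	data []byte
}

// car is a hypothetical in-memory model of a CAR file.
type car struct {
	roots  []string
	blocks []block
}

// merge combines the roots of every input and keeps only the first
// occurrence of each block, preserving input order.
func merge(in []car) car {
	var out car
	seenRoots := map[string]bool{}
	seenBlocks := map[string]bool{}
	for _, c := range in {
		for _, r := range c.roots {
			if !seenRoots[r] { // assumption: repeated roots are listed once
				seenRoots[r] = true
				out.roots = append(out.roots, r)
			}
		}
		for _, b := range c.blocks {
			if !seenBlocks[b.id] {
				seenBlocks[b.id] = true
				out.blocks = append(out.blocks, b)
			}
		}
	}
	return out
}

func main() {
	a := car{roots: []string{"r1"}, blocks: []block{{"x", nil}, {"y", nil}}}
	b := car{roots: []string{"r2"}, blocks: []block{{"y", nil}, {"z", nil}}}
	m := merge([]car{a, b})
	fmt.Println(len(m.roots), len(m.blocks))
}
```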

Types

type BlockReader added in v0.4.0

type BlockReader interface {
	Get(context.Context, cid.Cid) (blocks.Block, error)
}

type SimpleSplitter added in v0.4.0

type SimpleSplitter struct {
	// contains filtered or unexported fields
}

func NewSimpleSplitter added in v0.4.0

func NewSimpleSplitter(in io.Reader, targetSize int) (*SimpleSplitter, error)

Create a new CAR file splitter to create multiple smaller CAR files using the "simple" strategy.

func (*SimpleSplitter) Next added in v0.4.0

func (spltr *SimpleSplitter) Next() (io.Reader, error)

type Splitter added in v0.4.0

type Splitter interface {
	// Next splits the next CAR file out from the input CAR file.
	Next() (io.Reader, error)
}
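
Since both splitters expose the same Next method, code that consumes chunks can be written once against the interface. The sketch below uses only the standard library: splitter is a local copy of the interface shown above, while fakeSplitter and drain are hypothetical names introduced for illustration. It assumes, as the README usage example does, that Next returns io.EOF once there are no more chunks.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// splitter is a local copy of the documented Splitter interface.
type splitter interface {
	Next() (io.Reader, error)
}

// fakeSplitter is a hypothetical stand-in that yields pre-built chunks.
type fakeSplitter struct {
	chunks [][]byte
}

// Next returns the next chunk, or io.EOF when none remain.
func (f *fakeSplitter) Next() (io.Reader, error) {
	if len(f.chunks) == 0 {
		return nil, io.EOF
	}
	c := f.chunks[0]
	f.chunks = f.chunks[1:]
	return bytes.NewReader(c), nil
}

// drain reads every chunk from s into memory and returns the chunks in order.
func drain(s splitter) ([][]byte, error) {
	var out [][]byte
	for {
		r, err := s.Next()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		b, err := io.ReadAll(r)
		if err != nil {
			return out, err
		}
		out = append(out, b)
	}
}

func main() {
	s := &fakeSplitter{chunks: [][]byte{[]byte("one"), []byte("two")}}
	chunks, err := drain(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(chunks))
}
```

A value returned by Split, NewSimpleSplitter, or any of the treewalk constructors should satisfy the same method set, so a helper shaped like drain would accept it unchanged. For large inputs, streaming each reader to its destination is likely preferable to buffering whole chunks.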

func Split

func Split(in io.Reader, targetSize int, s Strategy) (Splitter, error)

Split a CAR file and create multiple smaller CAR files.

type Strategy

type Strategy int

Strategy describes how CAR files should be split.

const (
	// Simple is fast but naive, only the first CAR output has a root CID,
	// subsequent CARs have a placeholder "empty" CID.
	Simple Strategy = iota
	// Treewalk walks the DAG to pack sub-graphs into each CAR file that is
	// output. Every CAR has the same root CID, but contains a different portion
	// of the DAG.
	Treewalk
)
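
Because the constants are declared with iota, Simple is 0 and therefore the zero value of Strategy, which is consistent with the README calling it the default. The sketch below mirrors the type locally to show one way a command-line tool might map a flag value to a strategy. The strategy type, parseStrategy, and the accepted strings "simple" and "treewalk" are hypothetical and not part of the package.

```go
package main

import "fmt"

// strategy is a local mirror of the documented Strategy type.
type strategy int

const (
	simple   strategy = iota // 0, the zero value
	treewalk                 // 1
)

// String returns a human-readable name for the strategy.
func (s strategy) String() string {
	switch s {
	case simple:
		return "simple"
	case treewalk:
		return "treewalk"
	default:
		return fmt.Sprintf("strategy(%d)", int(s))
	}
}

// parseStrategy is a hypothetical helper that maps a flag value to a strategy.
func parseStrategy(name string) (strategy, error) {
	switch name {
	case "simple":
		return simple, nil
	case "treewalk":
		return treewalk, nil
	default:
		return simple, fmt.Errorf("unknown strategy %q", name)
	}
}

func main() {
	var zero strategy
	fmt.Println(zero) // the zero value is the simple strategy

	s, err := parseStrategy("treewalk")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```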

type TreewalkSplitter added in v0.4.0

type TreewalkSplitter struct {
	// contains filtered or unexported fields
}

func NewTreewalkSplitter added in v0.4.0

func NewTreewalkSplitter(r io.Reader, targetSize int) (*TreewalkSplitter, error)

Split a CAR file and create multiple smaller CAR files using the "treewalk" strategy. Note: the entire CAR will be cached in memory. Use NewTreewalkSplitterFromPath or NewTreewalkSplitterFromBlockReader for splitting that is not memory-bound.
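
The idea behind the "treewalk" strategy can be illustrated with a toy DAG. The sketch below is not the library's algorithm: node, packTreewalk, and the string IDs are hypothetical, and real blocks would be decoded to discover their links. It assumes that when a new chunk is started, the ancestors of the current node are repeated so that every chunk stays connected to the shared root. The source only states that every CAR has the same root CID, so that detail is an assumption.

```go
package main

import "fmt"

// node is a toy DAG node: its encoded size in bytes and the IDs it links to.
type node struct {
	size  int
	links []string
}

// packTreewalk walks the DAG depth-first from root and groups node IDs into
// chunks of roughly targetSize bytes. Each new chunk starts with the path
// from the root to the node being visited.
func packTreewalk(dag map[string]node, root string, targetSize int) [][]string {
	var chunks [][]string
	var cur []string
	size := 0
	visited := map[string]bool{}

	add := func(id string) {
		cur = append(cur, id)
		size += dag[id].size
	}
	flush := func() {
		chunks = append(chunks, cur)
		cur, size = nil, 0
	}

	var walk func(id string, path []string)
	walk = func(id string, path []string) {
		if visited[id] {
			return // shared sub-graphs are emitted only once
		}
		visited[id] = true
		if len(cur) > 0 && size+dag[id].size > targetSize {
			flush()
			for _, p := range path {
				add(p) // repeat ancestors in the new chunk
			}
		}
		add(id)
		next := append(append([]string{}, path...), id)
		for _, l := range dag[id].links {
			walk(l, next)
		}
	}
	walk(root, nil)
	if len(cur) > 0 {
		flush()
	}
	return chunks
}

func main() {
	dag := map[string]node{
		"root": {100, []string{"a", "b"}},
		"a":    {300, []string{"a1", "a2"}},
		"b":    {300, []string{"b1"}},
		"a1":   {400, nil},
		"a2":   {400, nil},
		"b1":   {400, nil},
	}
	for i, c := range packTreewalk(dag, "root", 1000) {
		fmt.Printf("chunk %d: %v\n", i, c)
	}
}
```

In this model a chunk can exceed targetSize when the ancestor path plus a single node is already larger than the target, so the size acts as a goal rather than a hard limit.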

func NewTreewalkSplitterFromBlockReader added in v0.4.0

func NewTreewalkSplitterFromBlockReader(root cid.Cid, br BlockReader, targetSize int) (*TreewalkSplitter, error)

Split a CAR file (passed as a root CID and a block reader populated with the blocks from the CAR) and create multiple smaller CAR files using the "treewalk" strategy.

func NewTreewalkSplitterFromPath added in v0.4.0

func NewTreewalkSplitterFromPath(path string, targetSize int) (*TreewalkSplitter, error)

Split a CAR file found on disk at the given path and create multiple smaller CAR files using the "treewalk" strategy.

func (*TreewalkSplitter) Next added in v0.4.0

func (spltr *TreewalkSplitter) Next() (io.Reader, error)

Directories

Path Synopsis
cmd module
