ranger

v0.9.0
Published: Nov 7, 2023 License: MIT Imports: 7 Imported by: 2

README

RANGER

Download large files in parallel chunks in Go.

Why?

Go's HTTP clients download a file as a single stream of bytes, usually through a buffer. If you think of a file as an array of bytes, starting a download opens one connection and hands you an io.Reader. As you Read bytes off this Reader, more bytes are loaded into an internal buffer (an in-memory byte array that holds a certain amount of data in the expectation that you'll read it soon). As you keep reading, the HTTP client refills the buffer from the server as fast as it can.

So? Why is this a problem?

Most of the time this is what we want and need. But when we're downloading large files (say from Amazon S3 or CloudFront, or any other file server), this is usually not optimal. These services limit outgoing speed per connection, and if you're downloading a very large file (say over 25 GB) you're also unlikely to be served from their caches. This means the number of bytes coming in per second (throughput) is usually lower than what your connection can actually support.

What does Ranger do?

Ranger does two orthogonal things to speed up transfers. One, it downloads files in chunks: if a file has 1000 bytes, for example, it can download it in chunks of 100 bytes by requesting byte ranges 0-99, 100-199, 200-299 and so on, each with an HTTP GET that carries a Range header. This allows the service to cache each chunk, because even if the whole file is too large to cache, each chunk is still small enough to fit. See the CloudFront Developer Guide for more information.

Two, it downloads upcoming chunks in parallel: if the parallelism count is set to 3, in the example above it would download byte ranges 0-99, 100-199 and 200-299 in parallel, even while the first range is being Read. It starts downloading the fourth range once the first one has been read, and so on. This trades RAM for speed: dedicating 3 x 100 bytes of memory lets the download proceed that much faster. In practice, 8 MB to 16 MB is a good chunk size, especially if that lines up with the multipart upload boundaries in a system like S3. See the S3 Developer Guide for more information.

Usage & Integration

The lowest-level usage is to create a new Ranger with chunk size and parallelism, and a fetcher function passed in. When the Ranger is invoked with a file length, it calls the fetch function with the byte range, in parallel, and collects and orders the resulting chunk Readers. This is a low level API that you can use if you have a custom protocol to fetch data.

For regular use, RangingHTTPClient uses a given http.Client to fetch chunks as configured. RangingHTTPClient also exposes a standard http.Client via the RangingHTTPClient.StandardClient method. The returned client will fetch chunk ranges using the RangingHTTPClient.

This means that Ranger integrates well on both sides: Grab and other download managers can use a ranging client through a standard http.Client, and the ranging client can in turn wrap other HTTP clients that provide automatic retries and similar features, such as Heimdall or go-retryablehttp.

Documentation

Constants

This section is empty.

Variables

var (
	// ErrNoOverlap is returned by ParseRange if first-byte-pos of
	// all of the byte-range-spec values is greater than the content size.
	ErrNoOverlap = errors.New("invalid range: failed to overlap")

	// ErrInvalid is returned by ParseRange on invalid input.
	ErrInvalid = errors.New("invalid range")
)

Functions

func NewSeqRangingClient added in v0.6.0

func NewSeqRangingClient(ranger Ranger, client *http.Client) http.RoundTripper

func NewSeqReader added in v0.6.0

func NewSeqReader(client *http.Client, url string, ranger SizedRanger) io.ReadSeekCloser

NewSeqReader returns a new io.ReadSeekCloser that reads from the given url using the given client. Instead of reading the whole file at once, it reads the file in sequential chunks, using the given ranger to determine the ranges to read. This allows for reading very large files in CDN-cacheable chunks using RANGE GETs.

Types

type ByteRange

type ByteRange struct {
	From int64
	To   int64
}

ByteRange represents a range of bytes available in a file.

func (ByteRange) Contains added in v0.2.0

func (br ByteRange) Contains(offset int64) bool

func (ByteRange) Floor added in v0.6.0

func (br ByteRange) Floor(offset int64) ByteRange

func (ByteRange) Length

func (br ByteRange) Length() int64

Length returns the length of the byte range.

func (ByteRange) RangeHeader added in v0.2.0

func (br ByteRange) RangeHeader() string

RangeHeader returns the HTTP header representation of the byte range, suitable for use in the Range header, as described in https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range

func (ByteRange) Request added in v0.6.0

func (br ByteRange) Request(url string) (req *http.Request, err error)

type Range added in v0.7.1

type Range struct {
	Start  int64
	Length int64
}

Range specifies the byte range to be sent to the client.

func ParseRange added in v0.7.1

func ParseRange(s string, size int64) ([]Range, error)

ParseRange parses a Range header string as per RFC 7233. ErrNoOverlap is returned if none of the ranges overlap the content. ErrInvalid is returned if s is not a valid range.

func (Range) ContentRange added in v0.7.1

func (r Range) ContentRange(size int64) string

ContentRange returns Content-Range header value.

type Ranger

type Ranger struct {
	// contains filtered or unexported fields
}

Ranger can split a file into chunks of a given size.

func NewRanger

func NewRanger(chunkSize int64) Ranger

NewRanger creates a new Ranger with the given chunk size. If the chunk size is <= 0, the default chunk size is used.

func (Ranger) Index

func (r Ranger) Index(i int64) int

Index returns the index of the chunk that contains the given offset.

func (Ranger) Ranges

func (r Ranger) Ranges(length int64) []ByteRange

Ranges creates a list of byte ranges with the given chunk size.

type SizedRanger added in v0.6.0

type SizedRanger struct {
	// contains filtered or unexported fields
}

func NewSizedRanger added in v0.6.0

func NewSizedRanger(length int64, ranger Ranger) SizedRanger

func (SizedRanger) Length added in v0.6.0

func (r SizedRanger) Length() int64

func (SizedRanger) RangeContaining added in v0.7.0

func (r SizedRanger) RangeContaining(offset int64) (br ByteRange)
