chunk

package module
v1.1.2
Published: Nov 13, 2023 License: MIT Imports: 14 Imported by: 1

README

Chunk


Chunk is a download tool for slow and unstable servers.

Usage

CLI

Install it with go install github.com/cuducos/chunk/cmd/chunk@latest then:

$ chunk <URLs>

Use --help for detailed instructions.

API

The Download method returns a channel of DownloadStatus values. The channel is closed once all downloads are finished, but the user is in charge of handling any errors it reports.

Simplest use case

d := chunk.DefaultDownloader()
ch := d.Download(urls...)

Customizing some options

d := chunk.DefaultDownloader()
d.MaxRetries = 42
ch := d.Download(urls...)

Customizing everything

d := chunk.Downloader{...}
ch := d.Download(urls...)
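
Below is a minimal sketch (not taken from the project) of how the returned channel might be consumed; the URL is a placeholder, and errors are surfaced through each DownloadStatus as described in the type documentation further down:

package main

import (
	"fmt"
	"log"

	"github.com/cuducos/chunk"
)

func main() {
	d := chunk.DefaultDownloader()
	// https://example.com/file.zip is a placeholder, not a real endpoint
	for status := range d.Download("https://example.com/file.zip") {
		if status.Error != nil {
			log.Printf("download of %s failed: %v", status.URL, status.Error)
			continue
		}
		if status.IsFinished() {
			fmt.Printf("finished %s: %s (%d bytes)\n", status.URL, status.DownloadedFilePath, status.FileSizeBytes)
		}
	}
}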

How?

It uses HTTP range requests, retries per HTTP request (not per file), avoids re-downloading content ranges that were already completed, and supports a wait time to give servers time to recover.

Download using HTTP range requests

In order to complete downloads from slow and unstable servers, the download is done in “chunks” using HTTP range requests. This avoids relying on long-standing HTTP connections and makes it predictable how long is too long to wait for a response.
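
For illustration only (this is a generic sketch of the technique, not chunk's internal code), a single content-range request in Go could look like this; fetchRange, start, and end are hypothetical names:

import (
	"fmt"
	"io"
	"net/http"
)

// fetchRange requests only bytes start through end of url.
// Generic illustration of an HTTP range request, not chunk's implementation.
func fetchRange(client *http.Client, url string, start, end int64) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	// a server that honors range requests replies with 206 Partial Content
	if resp.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}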

Retries by chunk, not by file

In order to be quicker and avoid rework, the primary way to handle failure is to retry that “chunk” (content range), not the whole file.
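
A hedged sketch of this idea, reusing the hypothetical fetchRange helper above with made-up maxRetries and waitRetry values (assumes time is also imported); only the failing content range is requested again, never the whole file:

// retryRange retries a single content range, pausing between attempts.
// Illustrative only; chunk's actual retry logic is not reproduced here.
func retryRange(client *http.Client, url string, start, end int64, maxRetries uint, waitRetry time.Duration) ([]byte, error) {
	var lastErr error
	for attempt := uint(0); attempt <= maxRetries; attempt++ {
		b, err := fetchRange(client, url, start, end)
		if err == nil {
			return b, nil
		}
		lastErr = err
		time.Sleep(waitRetry) // give the server a moment before retrying
	}
	return nil, fmt.Errorf("bytes %d-%d: giving up after %d retries: %w", start, end, maxRetries, lastErr)
}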

Control of which chunks are already downloaded

In order to avoid restarting from the beginning in case of non-handled errors, chunk knows which ranges of each file were already downloaded; so, when restarted, it only downloads what is still needed to complete the downloads.
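
One way to picture this bookkeeping (a guess at the idea only; chunk's real on-disk format under its progress directory is not described here) is a per-file set of completed chunk indices consulted before each request:

// progress records which chunks of one file are already complete.
// Purely illustrative; not chunk's actual data structure.
type progress struct {
	done map[int64]bool // chunk index -> already downloaded
}

func (p *progress) needs(idx int64) bool { return !p.done[idx] }

func (p *progress) markDone(idx int64) {
	if p.done == nil {
		p.done = map[int64]bool{}
	}
	p.done[idx] = true
}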

Detect server failures and give it a break

In order to avoid putting unnecessary stress on the server, chunk relies not only on HTTP responses but also on other signs that the connection is stale; it can recover from that and give the server some time to recover before trying again.

Why?

The idea for the project emerged as it was difficult for Minha Receita to handle the download of 37 files that add up to only about 5 GB. Most of the download solutions out there (e.g. got) seem to be prepared for downloading large files, not for downloading from slow and unstable servers, which is the case at hand.

Documentation

Index

Constants

const (
	DefaultTimeout              = 90 * time.Second
	DefaultConcurrencyPerServer = 8
	DefaultMaxRetries           = 5
	DefaultChunkSize            = 8192
	DefaultWaitRetry            = 1 * time.Second
	DefaultRestartDownload      = false
	DefaultUserAgent            = ""
)
const DefaultChunkDir = ".chunk"

DefaultChunkDir is the directory where Chunk keeps track of the downloaded chunks of each file. It is created under the user's home directory by default. It can be replaced by setting the environment variable CHUNK_DIR.
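
For example (a hedged sketch with a made-up path), the progress directory can be pointed somewhere else either through the environment variable or through the Downloader's ProgressDir field documented below, which appears to serve the same purpose:

// both lines point the bookkeeping at an arbitrary example directory
os.Setenv("CHUNK_DIR", "/var/cache/chunk")

d := chunk.DefaultDownloader()
d.ProgressDir = "/var/cache/chunk"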

Variables

This section is empty.

Functions

This section is empty.

Types

type DownloadStatus

type DownloadStatus struct {
	// URL this status refers to
	URL string

	// DownloadedFilePath in the user local system
	DownloadedFilePath string

	// FileSizeBytes is the total size of the file as informed by the server
	FileSizeBytes int64

	// DownloadedFileBytes already downloaded from this URL
	DownloadedFileBytes int64

	// Error is any non-recoverable error captured during the download (this
	// means that some errors are ignored and the download is retried instead
	// of propagating the error).
	Error error
}

DownloadStatus is the data sent back to the user via the channel returned by Download; it contains information about the download of each URL.

func (*DownloadStatus) IsFinished

func (s *DownloadStatus) IsFinished() bool

IsFinished informs the user whether a download is done (successfully or with error).
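
As a hedged illustration of combining these fields with IsFinished (assumes fmt and the chunk package are imported; the size guard avoids dividing by zero):

// report prints a one-line summary for a single status update.
func report(s chunk.DownloadStatus) {
	switch {
	case s.Error != nil:
		fmt.Printf("%s: failed: %v\n", s.URL, s.Error)
	case s.IsFinished():
		fmt.Printf("%s: done, saved to %s\n", s.URL, s.DownloadedFilePath)
	case s.FileSizeBytes > 0:
		pct := 100 * float64(s.DownloadedFileBytes) / float64(s.FileSizeBytes)
		fmt.Printf("%s: %.1f%% downloaded\n", s.URL, pct)
	}
}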

type Downloader

type Downloader struct {
	// OutputDir is where the downloaded files will be saved.  If not set,
	// defaults to the current working directory.
	OutputDir string

	// Client is the HTTP client used for all requests. It uses a customized
	// HTTP transport and timeout to handle content ranges download and
	// parallel requests to the same server. Check NewHTTPClient for
	// customizing it.
	Client *http.Client

	// Timeout is the timeout for the download of each chunk from each URL. A
	// chunk is a part of a file requested using the content range HTTP
	// header. Thus, this is not the timeout for each file or for the download
	// of every file.
	Timeout time.Duration

	// ConcurrencyPerServer controls the maximum number of concurrent
	// connections opened to the same server. If all the URLs are from the
	// same server, this is the total number of concurrent connections. If the
	// user is downloading files from different servers, this limit is applied
	// to each server independently.
	ConcurrencyPerServer int

	// MaxRetries is the maximum number of retries for each HTTP request using
	// the content range header that fails.
	MaxRetries uint

	// ChunkSize is the maximum size of each HTTP request done using the
	// content range header. There is no way to specify how many chunks a
	// download will need; the focus is on slicing it into smaller chunks so
	// slow and unstable servers can respond before dropping the connection.
	ChunkSize int64

	// WaitRetry is an optional pause before retrying an HTTP request that has
	// failed.
	WaitRetry time.Duration

	// RestartDownloads controls whether to continue previous download
	// attempts, skipping chunks already downloaded, or to restart the
	// downloads from scratch.
	RestartDownloads bool

	// ProgressDir is the directory where Chunk keeps track of each chunk
	// downloaded of each file.
	ProgressDir string

	// UserAgent is the user agent used for all requests. If not set, no
	// user agent is sent.
	UserAgent string
}

Downloader can be configured by the user before starting the download using the fields above. This configuration impacts how the download will be handled, including the number of retries, the number of requests, and the size of each request.
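
A hedged sketch of tuning some of these fields, starting from DefaultDownloader so that anything left untouched keeps its default value (the values and the urls variable are arbitrary examples; assumes time is imported):

d := chunk.DefaultDownloader()
d.OutputDir = "downloads"      // example output directory
d.ConcurrencyPerServer = 4     // fewer parallel connections per server
d.MaxRetries = 10              // more retries per failed chunk
d.ChunkSize = 4096             // smaller content ranges
d.WaitRetry = 5 * time.Second  // longer pause before retrying
d.UserAgent = "my-app/0.1"     // example user agent string
ch := d.Download(urls...)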

func DefaultDownloader

func DefaultDownloader() *Downloader

DefaultDownloader creates a downloader with the default configuration. Check the constants in this package for their values.

func (*Downloader) Download

func (d *Downloader) Download(urls ...string) <-chan DownloadStatus

Download downloads from all URLs, slicing each one into a series of chunks, i.e. small HTTP requests using the content range header.

func (*Downloader) DownloadWithContext

func (d *Downloader) DownloadWithContext(ctx context.Context, urls ...string) <-chan DownloadStatus

DownloadWithContext is a version of Download that takes a context. The context can be used to stop all downloads in progress.
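
A hedged sketch of stopping downloads via context cancellation (the timeout is an arbitrary example; assumes context, log, time, and the chunk package are imported and urls is a []string):

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
defer cancel()

d := chunk.DefaultDownloader()
for status := range d.DownloadWithContext(ctx, urls...) {
	if status.Error != nil {
		log.Printf("%s: %v", status.URL, status.Error)
	}
}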

Directories

Path Synopsis
cmd
