repro-get

Version: v0.4.1
Published: Sep 14, 2023 License: Apache-2.0

README

[⬇️ Download] [📖 Quick start] [❓ FAQs & Troubleshooting]

repro-get: reproducible apt, dnf, apk, and pacman, with content-addressing

✅ HTTP and HTTPS

✅ Filesystems

✅ OCI (Open Container Initiative) registries

✅ IPFS

repro-get installs a specific snapshot of packages using SHA256SUMS, for the sake of reproducible builds:

$ cat SHA256SUMS-amd64
35b1508eeee9c1dfba798c4c04304ef0f266990f936a51f165571edf53325cbc  pool/main/h/hello/hello_2.10-2_amd64.deb

$ repro-get install SHA256SUMS-amd64
(001/001) hello_2.10-2_amd64.deb Downloading from http://debian.notset.fr/snapshot/by-hash/SHA256/35b1508eeee9c1dfba798c4c04304ef0f266990f936a51f165571edf53325cbc
...
Preparing to unpack .../35b1508eeee9c1dfba798c4c04304ef0f266990f936a51f165571edf53325cbc ...
Unpacking hello (2.10-2) ...
Setting up hello (2.10-2) ...
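
The checksum comparison behind this step can be pictured with coreutils alone. The sketch below is an illustration, not repro-get's implementation: it assumes a `sha256sum` binary is available, and `verify_blob` is a hypothetical helper name. It accepts a file only when its digest equals the one listed in the hash file.

```shell
#!/bin/sh
# Hypothetical helper (not part of repro-get): succeed only if FILE
# hashes to EXPECTED, a hex-encoded SHA256 digest.
verify_blob() {
    expected="$1"
    file="$2"
    # sha256sum prints "<digest>  <path>"; keep only the digest field.
    actual="$(sha256sum "$file" | cut -d' ' -f1)"
    [ "$actual" = "$expected" ]
}
```

For example, `verify_blob 35b1508eeee9c1dfba798c4c04304ef0f266990f936a51f165571edf53325cbc hello_2.10-2_amd64.deb && echo OK` prints OK only for a byte-identical package file.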

repro-get supports the following distros:

| Distro | "Batteries included" | Support generating Dockerfiles | Support verifying package signatures |
|---|---|---|---|
| debian | ✅ | ✅ | ❌ |
| ubuntu | ✅ | ❌ | ❌ |
| fedora (Experimental) | ✅ | ❌ | ✅ |
| alpine (Experimental) | ❌ | ❌ | ✅ |
| arch | ✅ | ✅ | ✅ |

"Batteries included" for Debian, Ubuntu, Fedora, and Arch Linux.

On Debian, the packages are fetched from the following URLs by default:

  • http://deb.debian.org/debian/{{.Name}} for recent packages (fast, but ephemeral)
  • http://snapshot-cloudflare.debian.org/archive/debian/{{timeToDebianSnapshot .Epoch}}/{{.Name}} for archived packages (slow, but persistent)

On Ubuntu: http://launchpad.net/ubuntu/+archive/primary/+files/{{.Basename}}

On Fedora: https://kojipkgs.fedoraproject.org/packages/{{.Name}}

On Arch Linux: https://archive.archlinux.org/packages/{{.Name}}

On other distros, the file provider must be specified manually with the --provider=... flag for long-term persistence.

The following file providers are supported:

  • HTTP/HTTPS URLs, such as http://debian.notset.fr/snapshot/by-hash/SHA256/{{.SHA256}}
  • Filesystems, such as file:///mnt/nfs/files/{{.Basename}}, or file:///mnt/nfs/blobs/{{.SHA256}}
  • OCI-compliant container registries, such as oci://ghcr.io/USERNAME/REPO
  • IPFS gateways, such as http://ipfs.io/ipfs/{{.CID}}
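
To make the placeholders in these provider strings concrete, here is a minimal sketch of how one hash-file entry could be turned into a download location. It is an illustration under assumptions, not repro-get's own code: `expand_provider` is a hypothetical helper, `{{.Name}}` is read as the path recorded in the hash file, and `{{.Basename}}` as its last path component. `{{.CID}}` and `{{timeToDebianSnapshot .Epoch}}` are not handled here.

```shell
#!/bin/sh
# Hypothetical helper (not part of repro-get): expand a provider template
# for one hash-file entry.
# Arguments: TEMPLATE SHA256 NAME
expand_provider() {
    template="$1"
    sha256="$2"
    name="$3"
    # Strip everything up to the last "/" to get the file name.
    basename="${name##*/}"
    # "|" is used as the sed delimiter because NAME contains slashes.
    # Values containing "|", "&" or "\" are not handled by this sketch.
    printf '%s\n' "$template" | sed \
        -e "s|{{\.SHA256}}|$sha256|g" \
        -e "s|{{\.Name}}|$name|g" \
        -e "s|{{\.Basename}}|$basename|g"
}
```

With the template file:///mnt/nfs/files/{{.Basename}} and the name pool/main/h/hello/hello_2.10-2_amd64.deb, the helper prints file:///mnt/nfs/files/hello_2.10-2_amd64.deb.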

Quick start

Set up

Download the latest binary release from https://github.com/reproducible-containers/repro-get/releases .

To install repro-get from source, install Go, then run make and sudo make install. The recommended Go version is specified in the go.mod file.

The binary release can be reproduced locally by checking out the related tag and running make artifacts.docker.

Installing packages with the hash file

Create the SHA256SUMS-amd64 file for the hello package, using the information from apt-cache show hello:

35b1508eeee9c1dfba798c4c04304ef0f266990f936a51f165571edf53325cbc  pool/main/h/hello/hello_2.10-2_amd64.deb
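
If copying the fields by hand is error-prone, the line can be assembled from the `Filename:` and `SHA256:` fields that apt-cache show prints. This is a small sketch rather than a repro-get feature: `apt_show_to_sha256sums` is a hypothetical helper that reads the apt-cache output on stdin.

```shell
#!/bin/sh
# Hypothetical helper (not part of repro-get): read `apt-cache show` output
# on stdin and print "<SHA256>  <Filename>" for each package stanza that
# carries both fields.
apt_show_to_sha256sums() {
    awk '
        /^Filename: / { filename = $2 }
        /^SHA256: /   { sha256 = $2 }
        # A blank line ends a stanza: emit the entry and reset.
        /^$/ {
            if (filename != "" && sha256 != "") print sha256 "  " filename
            filename = ""; sha256 = ""
        }
        # Emit the last stanza if the input did not end with a blank line.
        END {
            if (filename != "" && sha256 != "") print sha256 "  " filename
        }
    '
}
```

A possible invocation is `apt-cache show hello | apt_show_to_sha256sums >SHA256SUMS-amd64`. If apt-cache show lists several versions of a package, the sketch prints one line per version, so the result should be reviewed before use.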

Then run repro-get install SHA256SUMS-amd64:

$ repro-get install SHA256SUMS-amd64
(001/001) hello_2.10-2_amd64.deb Downloading from http://debian.notset.fr/snapshot/by-hash/SHA256/35b1508eeee9c1dfba798c4c04304ef0f266990f936a51f165571edf53325cbc
...
Preparing to unpack .../35b1508eeee9c1dfba798c4c04304ef0f266990f936a51f165571edf53325cbc ...
Unpacking hello (2.10-2) ...
Setting up hello (2.10-2) ...

See also Dockerfile for running repro-get inside containers.

Generating the hash file

Note

Make sure to run apt-get update before running repro-get hash generate.

See also Dockerfile for how to run apt-get update in a container image such as debian:bullseye-yyyyMMdd.

To generate the hash for all the installed packages, including the system packages:

repro-get hash generate >SHA256SUMS-amd64

To generate the hash for specific packages:

repro-get hash generate hello >SHA256SUMS-amd64

To generate the hash for newly installed packages:

repro-get hash generate >SHA256SUMS-amd64.old
apt-get install -y hello
repro-get hash generate --dedupe=SHA256SUMS-amd64.old >SHA256SUMS-amd64
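
The example above suggests that --dedupe leaves out entries already listed in the given file; the exact semantics are not spelled out here, so treat that reading as an assumption. Under that assumption, the effect resembles a line-wise set difference, sketched below with a hypothetical `new_entries` helper.

```shell
#!/bin/sh
# Hypothetical helper (not part of repro-get): print the lines of NEW that
# do not appear verbatim in OLD.
new_entries() {
    old="$1"
    new="$2"
    # -F: fixed strings, -x: whole-line match, -v: invert the match,
    # -f: read the patterns from OLD.
    # grep exits with status 1 when nothing is printed (no new entries).
    grep -F -x -v -f "$old" "$new"
}
```

The sketch compares whole lines, so an entry whose hash changed between the two files is reported as new.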

Updating the hash file

Note

Make sure to run apt-get update before running repro-get hash update.

To update the hash file:

repro-get hash update SHA256SUMS-amd64

Advanced usage

Dockerfile

Warning

repro-get dockerfile generate is an experimental feature.

The following example produces an image with gcc, using the packages from 2021-12-20.

# Generate "Dockerfile.generate-hash" and "Dockerfile" in the current directory
repro-get --distro=debian dockerfile generate . debian:bullseye-20211220 gcc build-essential

# Enable BuildKit
export DOCKER_BUILDKIT=1

# Generate "SHA256SUMS-amd64" file in the current directory (needed by the next step)
docker build --output . -f Dockerfile.generate-hash .

# Build the image
docker build .

See ./examples/gcc for an example output.

See also FAQs for "bit-to-bit" reproducibility of container images.

Cache management

The cache directory (--cache) defaults to /var/cache/repro-get.

Populate

To populate the package files into the cache without installing them:

repro-get download SHA256SUMS-amd64

Export

To export the cached package files to the current directory:

repro-get cache export .

Import

To import package files in the current directory into the cache:

repro-get cache import .

Clean

To clean the cache:

repro-get cache clean

Container registries

repro-get supports downloading package files from OCI-compliant container registries.

Note

Make sure that the container registry credentials are stored in ~/.docker/config.json .

Push

To push the package files into a container registry such as https://ghcr.io/ , use ORAS:

repro-get cache export .
oras push ghcr.io/USERNAME/dpkgs:latest *.deb

Pull

To pull and install packages from the registry:

repro-get --provider=oci://ghcr.io/USERNAME/dpkgs install SHA256SUMS-amd64

Tips about the oci://... provider strings:

  • The provider string does not need to contain the :<TAG>@<DIGEST> value, as repro-get ignores the container manifests.
  • The provider defaults to HTTPS for non-localhost registries. Use the oci+http://... scheme to disable HTTPS.

IPFS

repro-get also supports uploading package files to IPFS, and downloading them from IPFS via an IPFS gateway such as http://ipfs.io/ipfs/{{.CID}} .

Note

The ipfs command (Kubo) needs to be installed for pushing (not for pulling).

Push

Run repro-get ipfs push to push the package files, and update the hash file to include the IPFS CIDs:

$ cat SHA256SUMS-amd64
35b1508eeee9c1dfba798c4c04304ef0f266990f936a51f165571edf53325cbc  pool/main/h/hello/hello_2.10-2_amd64.deb

$ repro-get ipfs push SHA256SUMS-amd64
35b1508eeee9c1dfba798c4c04304ef0f266990f936a51f165571edf53325cbc  /ipfs/QmRY19HEWeTJtRC6vAdz7rDfX3PjSMgXmd1KYi9guAACUj

$ cat SHA256SUMS-amd64
35b1508eeee9c1dfba798c4c04304ef0f266990f936a51f165571edf53325cbc  pool/main/h/hello/hello_2.10-2_amd64.deb
35b1508eeee9c1dfba798c4c04304ef0f266990f936a51f165571edf53325cbc  /ipfs/QmRY19HEWeTJtRC6vAdz7rDfX3PjSMgXmd1KYi9guAACUj

Pull

To pull and install packages from IPFS:

repro-get --provider=http://ipfs.io/ipfs/{{.CID}} install SHA256SUMS-amd64

The hash file must contain the ... /ipfs/... lines.

The hash file may contain multiple CIDs for a single SHA256, but only a single CID is used for pulling.
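
To see which CID a given SHA256 would map to, the /ipfs/ lines can be extracted from the hash file. The sketch below is only an illustration: `first_cid_per_sha256` is a hypothetical helper, and picking the first CID listed for each SHA256 is an assumption, since the selection rule is not stated here.

```shell
#!/bin/sh
# Hypothetical helper (not part of repro-get): print "<SHA256> <CID>" using
# the first /ipfs/ line seen for each SHA256 in the hash file given as $1.
first_cid_per_sha256() {
    awk '
        # Only consider lines whose second field starts with /ipfs/,
        # and only the first such line per SHA256.
        $2 ~ /^\/ipfs\// && !($1 in seen) {
            seen[$1] = 1
            cid = $2
            sub(/^\/ipfs\//, "", cid)   # drop the /ipfs/ prefix
            print $1, cid
        }
    ' "$1"
}
```

The printed CID is the value that would be substituted for {{.CID}} in a gateway URL such as http://ipfs.io/ipfs/{{.CID}}.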

FAQs

Why do we need reproducibility?

For supply chain security.

If a binary can be reproduced bit-to-bit by multiple independent people, the binary (and its distributor) can be considered more trustworthy than others.

Achieving bit-to-bit reproducibility is still challenging (see below), but even "quasi-"reproducibility is useful for avoiding regressions that could be introduced by installing unexpected updates.

See also https://reproducible-builds.org/docs/buy-in/ .

Why not just use snapshot.debian.org with apt-get?

It is already possible to reproduce a specific snapshot of Debian by specifying deb [...] http://snapshot.debian.org/archive/debian/yyyyMMddTHHmmssZ/ ... ... in /etc/apt/sources.list, but this would put a huge load on snapshot.debian.org once everybody begins to make builds reproducible.

repro-get mitigates this issue by content-addressing: A package file can be fetched from anywhere, such as HTTP(S) sites, local filesystems, OCI registries, or even IPFS, by its SHA256 (or CID) checksum. Also, as the package files are verified by checksums, existing package files are not affected by potential GPG key leakage.

Are container images "bit-to-bit" reproducible?

Yes, with BuildKit v0.11 or later.

See ./hack/test-dockerfile-repro.sh for testing reproducibility.

However, reproducibility is not guaranteed across different versions of BuildKit. The host operating system version, filesystem configuration, and similar factors may also affect reproducibility.

How to use HTTPS on Debian/Ubuntu?

repro-get --provider='https://deb.debian.org/debian/{{.Name}},https://debian.notset.fr/snapshot/by-hash/SHA256/{{.SHA256}}' install

Using HTTPS requires the ca-certificates package to be installed. The ca-certificates package is not installed by default in the debian and ubuntu images on Docker Hub.

Why not use HTTPS by default on Debian/Ubuntu?

Because apt-get does not use HTTPS by default, either. See an archive of whydoesaptnotusehttps.com for the reason.

Acknowledgement

A huge thanks to Frédéric Pierret (@fepitre) for maintaining the snapshot server http://snapshot.notset.fr/ . Also, huge thanks to the maintainers of http://snapshot.debian.org/ , https://kojipkgs.fedoraproject.org/ , and other package snapshot servers. repro-get could not have been implemented without these snapshot servers.

Directories

| Path | Synopsis |
|---|---|
| cmd | |
| pkg | |
| cache | Package cache provides the blob cache. |
