cosmovisor

v0.0.0-...-3d6b546
Published: Dec 11, 2020 License: Apache-2.0 Imports: 18 Imported by: 0

README

Cosmovisor

Cosmovisor is a tiny shim around Cosmos SDK binaries that use the upgrade module. It allows smooth and configurable management of binary upgrades as a live chain is upgraded, and can be used to simplify validator devops during upgrades or to make syncing a full node from genesis simple. The cosmovisor monitors the stdout of the daemon for messages from the upgrade module indicating a pending or required upgrade and acts appropriately (better integrations are possible in the future).

Arguments

cosmovisor is a shim around a native binary. All arguments passed to the cosmovisor command are passed on to the current daemon binary (run as a subprocess). It returns the stdout and stderr of the subprocess as its own. Because of that, it cannot accept any command-line arguments of its own, nor print anything to output (unless it dies before executing a binary).

Configuration is passed in through the following environment variables:

  • DAEMON_HOME is the location where upgrade binaries should be kept (can be $HOME/.gaiad or $HOME/.xrnd)
  • DAEMON_NAME is the name of the binary itself (eg. xrnd, gaiad, simd)
  • DAEMON_ALLOW_DOWNLOAD_BINARIES (optional) if set to true will enable auto-downloading of new binaries (for security reasons, this is intended for fullnodes rather than validators)
  • DAEMON_RESTART_AFTER_UPGRADE (optional) if set to true it will restart the sub-process with the same args (but new binary) after a successful upgrade. By default, the cosmovisor dies afterward and allows its supervisor (e.g. systemd) to restart it if needed. Note that this will not auto-restart the child if there was an error.
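
As a rough illustration of how these four variables might be read, the sketch below loads them into a struct. The `envConfig` type and `loadConfig` function are hypothetical names chosen for this example, not the package's API, and treating only the literal string `true` as "enabled" is an assumption based on the wording above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// envConfig mirrors the four environment variables described above.
// The type and its field names are hypothetical.
type envConfig struct {
	Home                  string
	Name                  string
	AllowDownloadBinaries bool
	RestartAfterUpgrade   bool
}

// loadConfig reads the variables through getenv, so it can be exercised
// without touching the real process environment.
func loadConfig(getenv func(string) string) (*envConfig, error) {
	cfg := &envConfig{
		Home: getenv("DAEMON_HOME"),
		Name: getenv("DAEMON_NAME"),
		// Assumption: only the exact string "true" enables an option.
		AllowDownloadBinaries: getenv("DAEMON_ALLOW_DOWNLOAD_BINARIES") == "true",
		RestartAfterUpgrade:   getenv("DAEMON_RESTART_AFTER_UPGRADE") == "true",
	}
	if cfg.Home == "" {
		return nil, errors.New("DAEMON_HOME is not set")
	}
	if cfg.Name == "" {
		return nil, errors.New("DAEMON_NAME is not set")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Printf("%+v\n", *cfg)
}
```

Passing the lookup function in as a parameter is only a convenience for testing; reading `os.Getenv` directly would work equally well.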

Folder Layout

$DAEMON_HOME/cosmovisor is expected to belong completely to the cosmovisor and subprocesses controlled by it. Under this folder, we will see the following:

.
├── current -> genesis or upgrades/<name>
├── genesis
│   └── bin
│       └── $DAEMON_NAME
└── upgrades
    └── <name>
        └── bin
            └── $DAEMON_NAME

Each version of the chain is stored under either genesis or upgrades/<name>, which holds bin/$DAEMON_NAME along with any other needed files (for example a CLI client or shared libraries). current is a symlink to the currently active folder (so current/bin/$DAEMON_NAME is the binary).

Note: the <name> after upgrades is the URI-encoded name of the upgrade as specified in the upgrade module plan.

Please note that $DAEMON_HOME/cosmovisor just stores the binaries and associated program code. The cosmovisor binary can be stored in any typical location (e.g. /usr/local/bin). The actual blockchain program will store its data under $GAIA_HOME etc., which is independent of $DAEMON_HOME. You can choose to export GAIA_HOME=$DAEMON_HOME and end up with a configuration like the following, but the best directory layout is left as a choice to the admin.

.gaiad
├── config
├── data
└── cosmovisor

Usage

Basic Usage:

  • The admin is responsible for installing the cosmovisor and setting it up as a service that auto-restarts (e.g. with systemd), along with the proper environment variables
  • The admin is responsible for installing the genesis folder manually
  • The cosmovisor will set the current link to point to genesis at first start (when no current link exists)
  • The admin is (generally) responsible for installing the upgrades/<name> folders manually
  • The cosmovisor handles switching over the binaries at the correct points, so the admin can prepare days in advance and relax at upgrade time

Note that chains that wish to support upgrades may package up a genesis cosmovisor tar file with this info, just as they prepare the genesis binary tar file. In fact, they may offer a tar file with all upgrades up to the current point for easy download by those who wish to sync a fullnode from the start.

Everything specific to the DAEMON, like the tendermint config, the application db, syncing blocks, etc., works as normal. The same directives (e.g. GAIA_HOME) and command-line flags work; only the binary name is different.

Upgradeable Binary Specification

In the basic version, the cosmovisor will read the stdout log messages to determine when an upgrade is needed. We are considering more complex solutions via signaling of some sort, but starting with the simple design:

  • when an upgrade is needed the binary will print a line that matches this regular expression: UPGRADE "(.*)" NEEDED at height (\d+):(.*)
  • the last capture group in the above regular expression (the text after the colon) can be a JSON object with a binaries key, as described under Auto-Download below

The name (first capture group) will be used to select the new binary to run. If a binary for that name is present, the current subprocess will be killed, the current link will be updated to point to the new directory, and the new binary will be launched.
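
To make the capture groups concrete, here is a small sketch that applies the regular expression quoted above to a single log line. The `upgradeNotice` type and `parseUpgradeLine` function are hypothetical, the sample line in `main` is made up, and trimming whitespace from the trailing info is a choice of this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// upgradeLine is the expression quoted in the text above.
var upgradeLine = regexp.MustCompile(`UPGRADE "(.*)" NEEDED at height (\d+):(.*)`)

// upgradeNotice holds the three capture groups (hypothetical type).
type upgradeNotice struct {
	Name   string // group 1: upgrade name
	Height int64  // group 2: block height
	Info   string // group 3: free-form info, possibly JSON
}

// parseUpgradeLine reports whether line announces an upgrade and, if so,
// returns its parsed parts.
func parseUpgradeLine(line string) (*upgradeNotice, bool) {
	m := upgradeLine.FindStringSubmatch(line)
	if m == nil {
		return nil, false
	}
	height, err := strconv.ParseInt(m[2], 10, 64)
	if err != nil {
		return nil, false
	}
	return &upgradeNotice{Name: m[1], Height: height, Info: strings.TrimSpace(m[3])}, true
}

func main() {
	// Made-up sample line for illustration only.
	n, ok := parseUpgradeLine(`UPGRADE "chain2" NEEDED at height 49: {}`)
	fmt.Println(ok, n.Name, n.Height, n.Info)
}
```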

Question: should we just kill the cosmovisor after it does the update, so that it gets a clean restart and just runs the new binary (under current)? It should be safe to restart (as a service).

Auto-Download

Generally, the system requires that the administrator place all relevant binaries on the disk before the upgrade happens. However, for people who don't need such control and want an easier setup (maybe they are syncing a non-validating fullnode and want to do little maintenance), there is another option.

If you set DAEMON_ALLOW_DOWNLOAD_BINARIES=true, then when an upgrade is triggered and no local binary can be found, the cosmovisor will attempt to download and install the binary itself. The plan stored in the upgrade module has an info field for arbitrary JSON. This info is expected to be output in the halt log message. There are two valid formats for specifying a download in such a message:

  1. Store an os/architecture -> binary URI map in the upgrade plan info field as JSON under the "binaries" key, eg:
{
  "binaries": {
    "linux/amd64":"https://example.com/gaia.zip?checksum=sha256:aec070645fe53ee3b3763059376134f058cc337247c978add178b6ccdfb0019f"
  }
}

The "any" key, if it exists, will be used as a default if there is no specific os/architecture key.

  2. Store a link to a file that contains all information in the above format (e.g. if you want to specify lots of binaries, changelog info, etc. without filling up the blockchain), e.g.
https://example.com/testnet-1001-info.json?checksum=sha256:deaaa99fda9407c4dbe1d04bd49bab0cc3c1dd76fa392cd55a9425be074af01e

The file at this link will be retrieved by go-getter and the "binaries" field will be parsed as above.
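
The key lookup for the first format (inline JSON) can be sketched as below, including the fallback to "any". `binariesInfo` and `pickBinaryURL` are hypothetical names for this example, the "any" URL in `main` is invented, and the second format (a link to a JSON file) is not handled here because it requires a download step.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"runtime"
)

// binariesInfo models the JSON shape shown above (hypothetical type).
type binariesInfo struct {
	Binaries map[string]string `json:"binaries"`
}

// pickBinaryURL returns the URL for osArch (for example "linux/amd64"),
// falling back to the "any" key when no specific entry exists.
func pickBinaryURL(info, osArch string) (string, error) {
	var parsed binariesInfo
	if err := json.Unmarshal([]byte(info), &parsed); err != nil {
		return "", fmt.Errorf("info is not valid JSON: %w", err)
	}
	if u, ok := parsed.Binaries[osArch]; ok {
		return u, nil
	}
	if u, ok := parsed.Binaries["any"]; ok {
		return u, nil
	}
	return "", errors.New("no binary listed for " + osArch)
}

func main() {
	info := `{"binaries": {
		"linux/amd64": "https://example.com/gaia.zip?checksum=sha256:aec070645fe53ee3b3763059376134f058cc337247c978add178b6ccdfb0019f",
		"any": "https://example.com/gaia-any.zip"
	}}`
	// Build the key for the machine this program runs on.
	u, err := pickBinaryURL(info, runtime.GOOS+"/"+runtime.GOARCH)
	fmt.Println(u, err)
}
```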

If there is no local binary, DAEMON_ALLOW_DOWNLOAD_BINARIES=true, and we can access a canonical URL for the new binary, then the cosmovisor will download it with go-getter and unpack it into the upgrades/<name> folder, to be run as if we had installed it manually.

Note that for this mechanism to provide strong security guarantees, all URLs should include a sha{256,512} checksum. This ensures that no false binary is run, even if someone hacks the server or hijacks the DNS. go-getter will always ensure the downloaded file matches the checksum if it is provided. It also handles unpacking archives into directories (so these download links should be a zip of all data in the bin directory).

To properly create a checksum on Linux, you can use the sha256sum utility, e.g. sha256sum ./testdata/repo/zip_directory/autod.zip, which should return 29139e1381b8177aec909fab9a75d11381cab5adf7d3af0c05ff1c9c117743a7. You can also use sha512sum if you like longer hashes, or md5sum if you like to use broken hashes. Make sure to set the hash algorithm properly in the checksum argument to the URL.
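
For those who prefer to compute the digest programmatically, the following sketch produces the same hex string that sha256sum prints and appends it in the `?checksum=sha256:` form used in the URLs above. `sha256Hex` and `withChecksum` are hypothetical helpers, and the base URL in `main` is only a placeholder.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
)

// sha256Hex returns the lowercase hex SHA-256 digest of data.
func sha256Hex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// withChecksum appends the digest in the query form shown in the text above.
func withChecksum(baseURL, digest string) string {
	return baseURL + "?checksum=sha256:" + digest
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: checksum <archive.zip>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Placeholder URL; substitute the real download location.
	fmt.Println(withChecksum("https://example.com/gaia.zip", sha256Hex(data)))
}
```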

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func DoUpgrade

func DoUpgrade(cfg *Config, info *UpgradeInfo) error

DoUpgrade will be called after the log message has been parsed and the process has terminated. We can now make any changes to the underlying directory without interference and leave it in a state from which we can make a proper restart.

func DownloadBinary

func DownloadBinary(cfg *Config, info *UpgradeInfo) error

DownloadBinary will grab the binary and place it in the proper directory

func EnsureBinary

func EnsureBinary(path string) error

EnsureBinary ensures the file exists and is executable, or returns an error

func GetDownloadURL

func GetDownloadURL(info *UpgradeInfo) (string, error)

GetDownloadURL will check if there is an arch-dependent binary specified in Info

func LaunchProcess

func LaunchProcess(cfg *Config, args []string, stdout, stderr io.Writer) (bool, error)

LaunchProcess runs a subprocess and returns when the subprocess exits, either when it dies, or *after* a successful upgrade.

func MarkExecutable

func MarkExecutable(path string) error

MarkExecutable will try to set the executable bits if not already set. Fails if the file doesn't exist or we cannot set those bits.

func OSArch

func OSArch() string

Types

type Config

type Config struct {
	Home                  string
	Name                  string
	AllowDownloadBinaries bool
	RestartAfterUpgrade   bool
}

Config is the information passed in to control the daemon

func GetConfigFromEnv

func GetConfigFromEnv() (*Config, error)

GetConfigFromEnv will read the environment variables into a config and then validate that it is reasonable

func (*Config) CurrentBin

func (cfg *Config) CurrentBin() (string, error)

CurrentBin is the path to the currently selected binary (genesis if no link is set). This will resolve the symlink to the underlying directory to make it easier to debug.

func (*Config) GenesisBin

func (cfg *Config) GenesisBin() string

GenesisBin is the path to the genesis binary - must be in place to start manager

func (*Config) Root

func (cfg *Config) Root() string

Root returns the root directory where all info lives

func (*Config) SetCurrentUpgrade

func (cfg *Config) SetCurrentUpgrade(upgradeName string) error

SetCurrentUpgrade sets the named upgrade to be the current link, returns error if this binary doesn't exist

func (*Config) SymLinkToGenesis

func (cfg *Config) SymLinkToGenesis() (string, error)

SymLinkToGenesis sets the current link to point to genesis

func (*Config) UpgradeBin

func (cfg *Config) UpgradeBin(upgradeName string) string

UpgradeBin is the path to the binary for the named upgrade

func (*Config) UpgradeDir

func (cfg *Config) UpgradeDir(upgradeName string) string

UpgradeDir is the directory for the named upgrade

type UpgradeConfig

type UpgradeConfig struct {
	Binaries map[string]string `json:"binaries"`
}

UpgradeConfig is expected format for the info field to allow auto-download

type UpgradeInfo

type UpgradeInfo struct {
	Name string
	Info string
}

UpgradeInfo is the details from the regexp

func WaitForUpdate

func WaitForUpdate(scanner *bufio.Scanner) (*UpgradeInfo, error)

WaitForUpdate will listen to the scanner until a line matches upgradeRegexp. It returns (info, nil) on a matching line. It returns (nil, err) if the input stream errored. It returns (nil, nil) if the input closed without ever matching the regexp.

func WaitForUpgradeOrExit

func WaitForUpgradeOrExit(cmd *exec.Cmd, scanOut, scanErr *bufio.Scanner) (*UpgradeInfo, error)

WaitForUpgradeOrExit listens to both output streams of the process, as well as the process state itself. When it returns, the process is finished and all streams have closed.

It returns (info, nil) if an upgrade should be initiated (and we killed the process). It returns (nil, err) if the process died by itself, or there was an issue reading the pipes. It returns (nil, nil) if the process exited normally without triggering an upgrade. This is very unlikely to happen with "start" but may happen with short-lived commands like `gaiad export ...`

type WaitResult

type WaitResult struct {
	// contains filtered or unexported fields
}

WaitResult is used to wrap feedback on cmd state with some mutex logic. This is needed as multiple goroutines can affect this: two read pipes that can trigger an upgrade, as well as the command, which can fail.

func (*WaitResult) AsResult

func (u *WaitResult) AsResult() (*UpgradeInfo, error)

AsResult reads the data protected by mutex to avoid race conditions

func (*WaitResult) SetError

func (u *WaitResult) SetError(myErr error)

SetError will set the first error, using a mutex. It does not set the error once info is set, since that means we chose to kill the process.

func (*WaitResult) SetUpgrade

func (u *WaitResult) SetUpgrade(up *UpgradeInfo)

SetUpgrade sets the first non-nil upgrade info and ensures the error is then nil. Pass in a command to shut down on successful upgrade.

Directories

Path Synopsis
cmd
