wr

v0.15.0 (published Sep 3, 2018; license: GPL-3.0)

README

wr - workflow runner

wr is a workflow runner. You use it to run the commands in your workflow easily, automatically, reliably, with repeatability, and while making optimal use of your available computing resources.

wr is implemented as a polling-free in-memory job queue with an on-disk ACID transactional embedded database, written in Go.

Its main benefits over other software workflow management systems are its very low latency and overhead, its high performance at scale, its real-time status updates with a view on all your workflows on one screen, its permanent searchable history of all the commands you have ever run, and its "live" dependencies enabling easy automation of on-going projects.

Furthermore, wr has best-in-class support for OpenStack, providing incredibly easy deployment and auto-scaling without you having to know anything about OpenStack. And it has built-in support for mounting S3-like object stores, providing an easy way of running commands against remote files whilst enjoying high performance.

Current Status

wr is still being actively developed, with some significant features unimplemented, and some likelihood of encountering bugs.

However, for simple usage, for example easily running your own manually-specified commands in an OpenStack environment, it is probably safe to use (it is being used in production by multiple groups at the Sanger Institute).

So if you want to be adventurous and provide feedback...

Download

Download a pre-built release .zip from the project's GitHub releases page.

Alternatively, build it yourself (at least v1.10 of Go is required):

  1. Install Go on your machine and set up the environment according to golang.org/doc/install (make sure to set your $GOPATH). An example way of setting up a personal Go installation in your home directory would be:

     wget "https://dl.google.com/go/go1.10.3.linux-amd64.tar.gz"
     tar -xvzf go1.10.3.linux-amd64.tar.gz && rm go1.10.3.linux-amd64.tar.gz
     export GOROOT=$HOME/go
     export PATH=$PATH:$GOROOT/bin
     mkdir work
     export GOPATH=$HOME/work
     mkdir $GOPATH/bin
     export PATH=$GOPATH/bin:$PATH
    
  2. Download, compile, and install wr:

     go get -u -d -tags netgo github.com/VertebrateResequencing/wr
     cd $GOPATH/src/github.com/VertebrateResequencing/wr
     make
    
  3. The wr executable should now be in $GOPATH/bin
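
To confirm the build worked, you can run the executable (wr version relies on the make step above to report a real version):

$GOPATH/bin/wr version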

If you don't have make installed, and don't mind that wr version won't work, you can instead replace make above with:

curl -s https://glide.sh/get | sh
$GOPATH/bin/glide install
go install -tags netgo

Usage instructions

The download .zip should contain the wr executable, this README.md, a CHANGELOG.md and an example config file called wr_config.yml, which details all the config options available. The main things you need to know are:

  • You can use the wr executable directly from where you extracted it, or move it to where you normally install software.
  • Use the -h option on wr and all its sub commands to get further help and instructions.
  • The default config should be fine for most people, but if you want to change something, copy the example config file to ~/.wr_config.yml and make changes to that. Alternatively, as the example config file explains, add environment variables to your shell login script and then source it (a sketch follows this list). If you'll be using OpenStack, it is strongly recommended to configure database backups to go to S3.
  • The wr executable must be available at that same absolute path on all compute nodes in your cluster, so you either need to place it on a shared disk, or install it in the same place on all machines (e.g. have it as part of your OS image). If you use config files, these must also be readable by all nodes (when you don't have a shared disk, it's best to configure using environment variables).
  • If you are ssh tunnelling to the node where you are running wr and wish to use the web interface, you will have to forward the host and port that it tells you the web interface can be reached on, and/or perhaps also set up dynamic forwarding using something like nc. An example .ssh/config is at the end of this document.
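
A minimal configuration sketch (the cp line is the copy step described above; the environment variable names below are assumptions based on the config option names, and wr_config.yml documents the real ones):

cp wr_config.yml ~/.wr_config.yml

# or, in your shell login script; names and values are illustrative only
export WR_MANAGERPORT=11301
export WR_DEPLOYMENT=production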

Right now, with the limited functionality available, you will run something like the following (change the options as appropriate; a sketch of a command file follows this list):

  • wr manager start -s lsf
  • wr add -f cmds_in_a_file.txt -m 1G -t 2h -i my_first_cmds -r mycmd_x_mode
  • [view status on the web interface]
  • wr manager stop

(It isn't necessary to stop the manager; you can just leave it running forever.)
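
The file given to wr add simply lists the commands to run, one per line. A hypothetical example (myanalysis and its arguments are invented for illustration):

# each line of the file becomes one command in the queue; hypothetical commands
cat > cmds_in_a_file.txt <<'EOF'
myanalysis --input chunk1.dat --output results1.dat
myanalysis --input chunk2.dat --output results2.dat
EOF
wr add -f cmds_in_a_file.txt -m 1G -t 2h -i my_first_cmds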

Note that for viewing the web interface, your browser will raise a security warning, since by default wr will generate and use its own self-signed certificate. So the first time you view it you will need to allow an exception. See the wiki for more details regarding security.

For usage on OpenStack, while you can bring up your own OpenStack server, ssh there, and run wr manager start -s openstack [options] as normal, it's easier to:

  • wr cloud deploy [options]
  • wr add [options]
  • [view status on the web interface]
  • wr cloud teardown

This way, you don't have to directly interact with OpenStack at all, or even know how it works.

If you have any problems getting things to start up, check out the wiki for additional guidance.

An alternative way of interacting with wr is to use its REST API, also documented on the wiki.

Performance considerations

For the most part, you should be able to throw as many jobs at wr as you like, running on as many compute nodes as you have available, and trust that wr will cope. There are no performance-related parameters to fiddle with: fast mode is always on!

However, you should be aware that wr's performance will typically be limited by that of the disk you configure wr's database to be stored on (by default it is stored in your home directory). To ensure that workflows don't break and that recovery is possible after crashes or power outages, every time you add jobs to wr, and every time you finish running a job, the operation must wait for the job state to be persisted to disk in the database before it completes.

This means that in extreme edge cases, e.g. you're trying to run thousands of jobs in parallel, each of which completes in milliseconds, and each of which wants to add new jobs to the system, you could become limited by disk performance if you're using old or slow hardware.

You're unlikely to see any performance degradation even in extreme edge cases if using an SSD and a modern disk controller. Even an NFS mount could give more than acceptable performance.

But an old spinning disk or an old disk controller (e.g. limited to 100MB/s) could cause things to slow to a crawl in this edge case. "High performance" disk systems like Lustre should also be avoided, since these tend to have incredibly bad performance when dealing with many tiny writes to small files.

If this is the only hardware you have available to you, you can halve the impact of disk performance by reorganising your workflow such that you add all your jobs in a single wr add call, instead of calling wr add many times with subsets of those jobs.
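
For instance, assuming your commands are spread across several files (file names here are made up), concatenating them and adding once means a single wait for persistence instead of many:

# slow in the edge case: each add call waits for its own database write
# for f in cmds.*.txt; do wr add -f "$f"; done

# better: one add call, one write to wait for
cat cmds.*.txt > all_cmds.txt
wr add -f all_cmds.txt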

Implemented so far

  • Adding manually generated commands to the manager's queue.
  • Automatically running those commands on the local machine, or via LSF or OpenStack.
  • Mounting of S3-like object stores.
  • Getting the status of your commands.
  • Manually retrying failed commands.
  • Automatic retrying of failed commands, using more memory/time reservation as necessary.
  • Learning of how much memory and time commands take for best resource utilization.
  • Draining the queue if you want to stop the system as gracefully as possible, and recovering from drains, stops and crashes (example commands follow this list).
  • Specifying command dependencies, and allowing for automation by these dependencies being "live", automatically re-running commands if their dependencies get re-run or added to.
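
A sketch of the graceful-stop commands (the assumption that drain lets running commands finish before shutting down should be checked against wr manager -h):

wr manager drain   # let running commands finish, then shut down
wr manager stop    # stop now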

Not yet implemented

  • While the help mentions workflows, nothing workflow-related has been implemented (though you can manually build a workflow by specifying command dependencies). Common Workflow Language (CWL) compatibility is planned.
  • Get a complete listing of all commands with a given id via the webpage.
  • Checkpointing for long running commands.
  • Re-run button in web interface for successfully completed commands.
  • Ability to alter expected memory and time or change env-vars of commands.

Example .ssh/config

If you're having difficulty accessing the web frontend via an ssh tunnel, the following example ~/.ssh/config file may help. (In this example, 11302 is the web interface port that wr tells you about.)

Host ssh.myserver.org
    LocalForward 11302 login.internal.myserver.org:11302
    DynamicForward 20002
    ProxyCommand none
Host *.internal.myserver.org
    User myusername
    ProxyCommand nc -X 5 -x localhost:20002 %h %p

You'll then be able to access the website at https://login.internal.myserver.org:11302 or perhaps at https://localhost:11302.

Documentation

Overview

Package main is a stub for wr's command line interface, with the actual implementation in the cmd package.

wr is a workflow runner. You use it to run the commands in your workflow easily, automatically, reliably, with repeatability, and while making optimal use of your available computing resources.

wr is implemented as a polling-free in-memory job queue with an on-disk ACID transactional embedded database, written in Go.

Its main benefits over other software workflow management systems are its very low latency and overhead, its high performance at scale, its real-time status updates with a view on all your workflows on one screen, its permanent searchable history of all the commands you have ever run, and its "live" dependencies enabling easy automation of on-going projects.

Basics

Start up the manager daemon, which gives you a url you can view the web interface on:

wr manager start -s local

In addition to the "local" scheduler, which will run your commands on all available cores of the local machine, you can also have it run your commands on your LSF cluster or in your OpenStack environment (where it will scale the number of servers needed up and down automatically).
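
For example (the OpenStack variant takes further options; see the output of wr manager start -h):

wr manager start -s local       # use all available cores on this machine
wr manager start -s lsf         # run commands via your LSF cluster
wr manager start -s openstack [options]   # auto-scale cloud servers up and down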

Now, stick the commands you want to run in a text file and:

wr add -f myCommands.txt

Arbitrarily complex workflows can be formed by specifying command dependencies. Use the --help option of `wr add` for details.

Package Overview

wr's core is implemented in the queue package. This is the in-memory job queue that holds commands that still need to be run. Its multiple sub-queues enable certain guarantees: a given command will only get run by a single client at any one time; if a client dies, the command will get run by another client instead; if a command cannot be run, it is buried until the user takes action; if a command has a dependency, it won't run until its dependencies are complete.
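
As a conceptual illustration of these sub-queue guarantees, the following runnable Go sketch models a job moving between states; the type and method names are invented for illustration and are not the queue package's actual API:

package main

import "fmt"

// State stands in for the sub-queues described above; the names are
// illustrative, not the queue package's real identifiers.
type State int

const (
    Ready   State = iota // waiting for a client to reserve it
    Running              // reserved by exactly one client
    Buried               // failed; waits for user action
    Delayed              // client died; will be offered to another client
    Depend               // waiting for dependencies to complete
)

// Job is a minimal stand-in for a queued command.
type Job struct {
    Cmd      string
    State    State
    DepsDone bool
}

// reserve hands the job to a single client; a job that is not ready, or
// whose dependencies are incomplete, cannot be reserved.
func (j *Job) reserve() bool {
    if j.State != Ready || !j.DepsDone {
        return false
    }
    j.State = Running
    return true
}

// clientDied returns the job to circulation so another client can run it.
func (j *Job) clientDied() { j.State = Delayed }

// fail buries the job until the user takes action.
func (j *Job) fail() { j.State = Buried }

func main() {
    j := &Job{Cmd: "echo hello", State: Depend}
    fmt.Println(j.reserve()) // false: dependencies incomplete
    j.DepsDone = true
    j.State = Ready
    fmt.Println(j.reserve()) // true: exactly one client gets it
    fmt.Println(j.reserve()) // false: already reserved by that client
}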

The jobqueue package provides client+server code for interacting with the in-memory queue from the queue package, and by storing all new commands in an on-disk database, provides an additional guarantee: that (dynamic) workflows won't break because a job that was added got "lost" before it got run. It also retains all completed jobs, enabling searching of past workflows and allowing for "live" dependencies, triggering the rerunning of previously completed commands if their dependencies change.

The jobqueue package is also what actually does the main "work" of the system: the server component knows how many commands need to be run and what their resource requirements (memory, time, CPUs etc.) are, and submits the appropriate number of jobqueue runner clients to the job scheduler.

The jobqueue/scheduler package has the scheduler-specific code that ensures that these runner clients get run on the configured system in the most efficient way possible. E.g. for LSF, if we have 10 commands that need 2GB of memory to run, we will submit a job array of size 10 with 2GB of memory reservation to LSF. The most limited (and therefore potentially least contended) queue capable of running the commands will be chosen. For OpenStack, the cheapest server (in terms of cores and memory) that can run the commands will be spawned, and once there is no more work to do on those servers, they get terminated to free up resources.

The cloud package implements methods for interacting with cloud environments such as OpenStack. The corresponding jobqueue/scheduler package uses these methods to do its work.

The static subdirectory contains the HTML, CSS and JavaScript needed for the web interface. See jobqueue/serverWebI.go for how the web interface backend is implemented.

The internal package contains general utility functions, and most notably config.go holds the code for how the command line interface deals with config options.

Directories

Path                Synopsis
cloud               Package cloud provides functions to interact with cloud providers, used to create cloud resources so that you can spawn servers, then delete those resources when you're done.
cmd                 Package cmd implements wr's command line interface.
internal            Package internal houses code for wr's general utility functions.
jobqueue            Package jobqueue provides server/client functions to interact with the queue structure provided by the queue package over a network.
jobqueue/scheduler  Package scheduler lets the jobqueue server interact with the configured job scheduler (if any) to submit jobqueue runner clients and have them run on a compute cluster (or local machine).
queue               Package queue provides an in-memory queue structure suitable for the safe and low latency implementation of a real job queue.
rp                  Package rp ("resource protector") provides functions that help control access to some limited resource.
