scheduler

package
v0.15.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 3, 2018 License: GPL-3.0 Imports: 21 Imported by: 0

Documentation

Overview

Package scheduler lets the jobqueue server interact with the configured job scheduler (if any) to submit jobqueue runner clients and have them run on a compute cluster (or local machine).

Currently implemented schedulers are local, LSF and OpenStack. The implementation of each supported scheduler type is in its own .go file.

It's a pseudo plug-in system in that it is designed so that you can easily add a go file that implements the methods of the scheduleri interface, to support a new job scheduler. On the other hand, there is no dynamic loading of these go files; they are all imported (they all belong to the scheduler package), and the correct one used at run time. To "register" a new scheduleri implementation you must add a case for it to New() and rebuild.

import "github.com/VertebrateResequencing/wr/jobqueue/scheduler"
s, err := scheduler.New("local", &scheduler.ConfigLocal{"bash"})
req := &scheduler.Requirements{RAM: 300, Time: 2 * time.Hour, Cores: 1}
err = s.Schedule("myWRRunnerClient -args", req, 24)
// wait, and when s.Busy() returns false, your command has been run 24 times

Index

Constants

This section is empty.

Variables

View Source
var (
	ErrBadScheduler = "unknown scheduler name"
	ErrImpossible   = "scheduler cannot accept the job, since its resource requirements are too high"
	ErrBadFlavor    = "unknown server flavor"
)

Err* constants are found in the returned Errors under err.Err, so you can cast and check if it's a certain type of error.

Functions

This section is empty.

Types

type BadServerCallBack added in v0.10.0

type BadServerCallBack func(server *cloud.Server)

BadServerCallBack functions receive a server when a cloud scheduler discovers that a server it spawned no longer seems functional. It's possible that this was due to a temporary networking issue, in which case the callback will be called again with the same server when it is working fine again: check server.IsBad(). If it's bad, you'd probably call server.Destroy() after confirming the server is definitely unusable (eg. ask the end user to manually check).

type CloudConfig added in v0.12.0

type CloudConfig interface {
	// AddConfigFile takes a value like that of the ConfigFiles property of the
	// struct implementing this interface, and appends this value to what is
	// in ConfigFiles, or sets it if unset.
	AddConfigFile(spec string)
}

CloudConfig interface could be satisfied by the config option taken by cloud schedulers which have a ConfigFiles property.

type CmdStatus

type CmdStatus struct {
	Count   int
	Running [][2]int // a slice of [id, index] tuples
	Pending [][2]int // ditto
	Other   [][2]int // ditto, for jobs in some strange state
}

CmdStatus lets you describe how many of a given cmd are already in the job scheduler, and gives the details of those jobs.

type ConfigLSF

type ConfigLSF struct {
	// deployment is one of "development" or "production".
	Deployment string

	// shell is the shell to use to run the commands to interact with your job
	// scheduler; 'bash' is recommended.
	Shell string
}

ConfigLSF represents the configuration options required by the LSF scheduler. All are required with no usable defaults.

type ConfigLocal

type ConfigLocal struct {
	// Shell is the shell to use to run your commands with; 'bash' is
	// recommended.
	Shell string

	// StateUpdateFrequency is the frequency at which to re-check the queue to
	// see if anything can now run. 0 (default) is treated as 1 minute.
	StateUpdateFrequency time.Duration

	// MaxCores is the maximum number of CPU cores on the machine to use for
	// running jobs. Specifying more cores than the machine has results in using
	// as many cores as the machine has, which is also the default. Values
	// below 1 are treated as default.
	MaxCores int

	// MaxRAM is the maximum amount of machine memory to use for running jobs.
	// The unit is in MB, and defaults to all available memory. Specifying more
	// than this uses the default amount. Values below 1 are treated as default.
	MaxRAM int
}

ConfigLocal represents the configuration options required by the local scheduler. All are required with no usable defaults.

type ConfigOpenStack

type ConfigOpenStack struct {
	// ResourceName is the resource name prefix used to name any resources (such
	// as keys, security groups and servers) that need to be created.
	ResourceName string

	// OSPrefix is the prefix or full name of the Operating System image you
	// wish spawned servers to run by default (overridden during Schedule() by a
	// Requirements.Other["cloud_os"] value)
	OSPrefix string

	// OSUser is the login username of your chosen Operating System from
	// OSPrefix. (Overridden during Schedule() by a
	// Requirements.Other["cloud_user"] value.)
	OSUser string

	// OSRAM is the minimum RAM in MB needed to bring up a server instance that
	// runs your Operating System image. It defaults to 2048. (Overridden during
	// Schedule() by a Requirements.Other["cloud_os_ram"] value.)
	OSRAM int

	// OSDisk is the minimum disk in GB with which to bring up a server instance
	// that runs your Operating System image. It defaults to 1. (Overridden
	// during Schedule() by a Requirements.Disk value.)
	OSDisk int

	// FlavorRegex is a regular expression that you can use to limit what
	// flavors of server will be created to run commands on. The default of an
	// empty string means there is no limit, and any available flavor can be
	// used. (The flavor chosen for a command will be the flavor with the least
	// specifications (RAM, CPUs, Disk) capable of running the command, that
	// also satisfies this regex.)
	FlavorRegex string

	// PostCreationScript is the []byte content of a script you want executed
	// after a server is Spawn()ed. (Overridden during Schedule() by a
	// Requirements.Other["cloud_script"] value.)
	PostCreationScript []byte

	// ConfigFiles is a comma separated list of paths to config files that
	// should be copied over to all spawned servers. Absolute paths are copied
	// over to the same absolute path on the new server. To handle a config file
	// that should remain relative to the home directory (and where the spawned
	// server may have a different username and thus home directory path
	// compared to the current server), use the prefix ~/ to signify the home
	// directory. It silently ignores files that don't exist locally.
	// (Appended to during Schedule() by a
	// Requirements.Other["cloud_config_files"] value.)
	ConfigFiles string

	// ServerPorts are the TCP port numbers you need to be open for
	// communication with any spawned servers. At a minimum you will need to
	// specify []int{22}.
	ServerPorts []int

	// SavePath is an absolute path to a file on disk where details of any
	// created resources can be read from and written to.
	SavePath string

	// ServerKeepTime is the time to wait before an idle server is destroyed.
	// Zero duration means "never destroy due to being idle".
	ServerKeepTime time.Duration

	// StateUpdateFrequency is the frequency at which to check spawned servers
	// that are being used to run things, to see if they're still alive.
	// 0 (default) is treated as 1 minute.
	StateUpdateFrequency time.Duration

	// MaxInstances is the maximum number of instances we are allowed to spawn.
	// -1 means we will be limited by your quota, if any. 0 (the default) means
	// no additional instances will be spawned (commands will run locally on the
	// same instance the manager is running on).
	MaxInstances int

	// Shell is the shell to use to run your commands with; 'bash' is
	// recommended.
	Shell string

	// CIDR describes the range of network ips that can be used to spawn
	// OpenStack servers on which to run our commands. The default is
	// "192.168.0.0/18", which allows for 16381 servers to be spawned. This
	// range ends at 192.168.63.254.
	CIDR string

	// GatewayIP is the gateway ip address for the subnet that will be created
	// with the given CIDR. It defaults to 192.168.0.1.
	GatewayIP string

	// DNSNameServers is a slice of DNS IP addresses to use for lookups on the
	// created subnet. It defaults to Google's: []string{"8.8.4.4", "8.8.8.8"}
	DNSNameServers []string
}

ConfigOpenStack represents the configuration options required by the OpenStack scheduler. All are required with no usable defaults, unless otherwise noted. This struct implements the CloudConfig interface.

func (*ConfigOpenStack) AddConfigFile added in v0.12.0

func (c *ConfigOpenStack) AddConfigFile(configFile string)

AddConfigFile takes a value as per the ConfigFiles property, and appends it to the existing ConfigFiles value (or sets it if unset).

type Error

type Error struct {
	Scheduler string // the scheduler's Name
	Op        string // name of the method
	Err       string // one of our Err* vars
}

Error records an error and the operation and scheduler that caused it.

func (Error) Error

func (e Error) Error() string

type MessageCallBack added in v0.10.0

type MessageCallBack func(msg string)

MessageCallBack functions receive a message that would be good to display to end users, so they understand current error conditions related to the scheduler.

type Requirements

type Requirements struct {
	RAM   int               // the expected peak RAM in MB Cmd will use while running
	Time  time.Duration     // the expected time Cmd will take to run
	Cores float64           // how many processor cores the Cmd will use
	Disk  int               // the required local disk space in GB the Cmd needs to run
	Other map[string]string // a map that will be passed through to the job scheduler, defining further arbitrary resource requirements
}

Requirements describes the resource requirements of the commands you want to run, so that when provided to a scheduler it will be able to schedule things appropriately.

func (*Requirements) Stringify added in v0.4.0

func (req *Requirements) Stringify() string

Stringify represents the contents of the Requirements as a string, sorting the keys of Other to ensure the same result is returned for the same content every time. Note that the data in Other undergoes a 1-way transformation, so you cannot recreate the Requirements from the the output of this method.

type Scheduler

type Scheduler struct {
	Name string

	sync.Mutex
	log15.Logger
	// contains filtered or unexported fields
}

Scheduler gives you access to all of the methods you'll need to interact with a job scheduler.

func New

func New(name string, config interface{}, logger ...log15.Logger) (*Scheduler, error)

New creates a new Scheduler to interact with the given job scheduler. Possible names so far are "lsf", "local" and "openstack". You must also provide a config struct appropriate for your chosen scheduler, eg. for the local scheduler you will provide a ConfigLocal.

Providing a logger allows for debug messages to be logged somewhere, along with any "harmless" or unreturnable errors. If not supplied, we use a default logger that discards all log messages.

func (*Scheduler) Busy

func (s *Scheduler) Busy() bool

Busy reports true if there are any Schedule()d cmds still in the job scheduler's system. This is useful when testing and other situations where you want to avoid shutting down the server while there are still clients running/ about to run.

func (*Scheduler) Cleanup

func (s *Scheduler) Cleanup()

Cleanup means you've finished using a scheduler and it can delete any remaining jobs in its system and clean up any other used resources.

func (*Scheduler) HostToID added in v0.10.0

func (s *Scheduler) HostToID(host string) string

HostToID will return the server id of the server with the given host name, if the scheduler is cloud based. Otherwise this just returns an empty string.

func (*Scheduler) MaxQueueTime

func (s *Scheduler) MaxQueueTime(req *Requirements) time.Duration

MaxQueueTime returns the maximum amount of time that jobs with the given resource requirements are allowed to run for in the job scheduler's queue. If the job scheduler doesn't have a queue system, or if the queue allows jobs to run forever, then this returns a 0 length duration, which should be regarded as "infinite" queue time.

func (*Scheduler) ReserveTimeout

func (s *Scheduler) ReserveTimeout() int

ReserveTimeout returns the number of seconds that runners spawned in this scheduler should wait for new jobs to appear in the manager's queue.

func (*Scheduler) Schedule

func (s *Scheduler) Schedule(cmd string, req *Requirements, count int) error

Schedule gets your cmd scheduled in the job scheduler. You give it a command that you would like `count` identical instances of running via your job scheduler. If you already had `count` many scheduled, it will do nothing. If you had less than `count`, it will schedule more to run. If you have more than `count`, it will remove the appropriate number of scheduled (but not yet running) jobs that were previously scheduled for this same cmd (counts of 0 are legitimate - it will get rid of all non-running jobs for the cmd). If no error is returned, you know all `count` of your jobs are now scheduled and will eventually run unless you call Schedule() again with the same command and a lower count. NB: there is no guarantee that the jobs run successfully, and no feedback on their success or failure is given.

func (*Scheduler) SetBadServerCallBack added in v0.10.0

func (s *Scheduler) SetBadServerCallBack(cb BadServerCallBack)

SetBadServerCallBack sets the function that will be called when a cloud scheduler discovers that one of the servers it spawned seems to no longer be functional or reachable. Only relevant for cloud schedulers.

func (*Scheduler) SetMessageCallBack added in v0.10.0

func (s *Scheduler) SetMessageCallBack(cb MessageCallBack)

SetMessageCallBack sets the function that will be called when a scheduler has some message that could be informative to end users wondering why something is not getting scheduled. The message typically describes an error condition.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL