README

go-drmaa


This is a job submission library for Go (#golang) that is compatible with the DRMAA standard. The Go library is a wrapper around the DRMAA C library implementation provided by many distributed resource managers (cluster schedulers).

The library was developed using Univa Grid Engine's libdrmaa.so. It was tested with Grid Engine, Torque, and SLURM, but it should also work with other resource managers / cluster schedulers that provide libdrmaa.so.

The "gestatus" subpackage only works with Grid Engine (some values are only available on Univa Grid Engine).

The DRMAA (Distributed Resource Management Application API) standard is now also available in version 2. DRMAA2 provides more functionality around cluster monitoring and job session management. DRMAA and DRMAA2 are not compatible, so both libraries are expected to coexist for a while. The Go DRMAA2 can be found here.

Note: Univa Grid Engine 8.3.0 and later added new functions that allow you to submit a job on behalf of another user. This helps when creating a DRMAA service (like a web portal) that submits jobs. This functionality is available in the UGE83_sudo branch (https://github.com/dgruber/drmaa/tree/UGE83_sudo). The functions are RunJobsAs(), RunBulkJobsAs(), and ControlAs().

Compilation

First download the package:

   export GOPATH=${GOPATH:-~/src/go}
   mkdir -p $GOPATH
   go get -d github.com/dgruber/drmaa
   cd $GOPATH/src/github.com/dgruber/drmaa

Next, we need to compile the code.

For Univa Grid Engine and original SGE:

   source /path/to/grid/engine/installation/default/settings.sh
   ./build.sh
   cd examples/simplesubmit
   go build
   export LD_LIBRARY_PATH=$SGE_ROOT/lib/lx-amd64
   ./simplesubmit

For Son of Grid Engine ("loveshack"):

   source /path/to/grid/engine/installation/default/settings.sh
   ./build.sh --sog
   cd examples/simplesubmit
   go build
   export LD_LIBRARY_PATH=$SGE_ROOT/lib/lx-amd64
   ./simplesubmit

For Torque:

If your Torque drmaa.h header file is not located under /usr/include/torque, you will have to modify the build.sh script before running it.

   ./build.sh --torque
   cd examples/simplesubmit
   go build
   ./simplesubmit

For SLURM:

   ./build.sh --slurm
   cd examples/simplesubmit
   go build
   ./simplesubmit

The example program submits a sleep job into the system and prints out detailed job information as soon as the job is started.

Short Introduction to Go DRMAA

Go DRMAA applications need to open a DRMAA session before any DRMAA calls can be executed. Opening a DRMAA session usually establishes a connection to the cluster scheduler (distributed resource manager). Once no more DRMAA calls are going to be made, the Exit() method of the session must be called; this tears down the connection. If an application does not call Exit(), a communication handle can be left open on the cluster scheduler side, and it can take a while until it is removed automatically. Always make sure Exit() is called. In Go the defer statement can be used for this, but remember that deferred functions are not executed when os.Exit() is called.

Creating a DRMAA session:

s, err := drmaa.MakeSession()
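
Because deferred calls are skipped by os.Exit(), one common Go pattern is to keep all session work in a helper that returns an exit code and to call os.Exit() only from main(). The following is a minimal sketch of that pattern. The exiter interface, the fakeSession type, and the run() helper are hypothetical stand-ins so that the example runs without libdrmaa.so; a real program would pass a pointer to the session returned by drmaa.MakeSession() instead.

```go
package main

import (
	"fmt"
	"os"
)

// exiter is a hypothetical interface covering only the Exit() method
// that the Session type documents.
type exiter interface {
	Exit() error
}

// fakeSession is a stand-in for a real DRMAA session (an assumption of
// this sketch); it only records whether Exit() was called.
type fakeSession struct {
	closed bool
}

func (f *fakeSession) Exit() error {
	f.closed = true
	return nil
}

// run does all the work and returns an exit code instead of calling
// os.Exit() itself, so the deferred Exit() runs on every return path.
func run(s exiter, fail bool) int {
	defer s.Exit()
	if fail {
		fmt.Fprintln(os.Stderr, "job submission failed")
		return 1
	}
	return 0
}

func main() {
	s := &fakeSession{}
	code := run(s, false)
	fmt.Println("session closed:", s.closed)
	// os.Exit() is reached only after run() returned and its defers ran.
	if code != 0 {
		os.Exit(code)
	}
}
```

The design choice here is that only main() knows about process exit codes, while every function that holds a session can rely on defer.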

Jobs and job workflows are usually submitted from within DRMAA applications. To submit a job, a job template needs to be allocated first:

jt, errJT := s.AllocateJobTemplate()
if errJT != nil {
   fmt.Printf("Error during allocating a new job template: %s\n", errJT)
   return
}

Underneath, a C job template is allocated, which is outside the scope of the Go runtime. Hence it must be ensured that the job template is deleted when it is no longer used. Here, too, the Go defer statement is useful.

// prevent memory leaks by freeing the allocated C job template at the end
defer s.DeleteJobTemplate(&jt)

The job template contains the specification of the job, like the command to be executed and its parameters. These can be set with the setter methods of the job template.

// set the application to submit
jt.SetRemoteCommand("sleep")
// set the parameter (use SetArgs() when having more parameters)
jt.SetArg("1")

A job can be executed with the session's RunJob() method. If the same command should be executed many times, it makes sense to run it as a job array. In Grid Engine each instance gets a task ID assigned, which the job can see in the SGE_TASK_ID environment variable (which is set to undefined for normal jobs). The task ID can be used to find the right data set that the job (array job task) needs to process. An array job is submitted with the RunBulkJobs() method.

jobID, errSubmit := s.RunJob(&jt)

// submitting 1000 instances of the same job
jobIDs, errBulkSubmit := s.RunBulkJobs(&jt, 1, 1000, 1)
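
To illustrate how the start, end, and increment arguments relate to task IDs, and how a task could pick its data set, here is a small self-contained sketch. It assumes that task IDs run from start to end in steps of incr, and that SGE_TASK_ID holds a non-numeric value outside of array jobs. The helper names and the dataset_NNNN.csv naming scheme are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// taskIDs lists the task IDs an array job with the given start, end and
// increment would get, e.g. (1, 1000, 1) yields 1, 2, ..., 1000.
func taskIDs(start, end, incr int) []int {
	if incr <= 0 {
		return nil
	}
	var ids []int
	for id := start; id <= end; id += incr {
		ids = append(ids, id)
	}
	return ids
}

// parseTaskID converts the value of SGE_TASK_ID into a task number.
// ok is false for anything that is not a positive integer, which is
// the case for jobs that are not array job tasks.
func parseTaskID(value string) (id int, ok bool) {
	id, err := strconv.Atoi(value)
	if err != nil || id < 1 {
		return 0, false
	}
	return id, true
}

// inputFile is a hypothetical naming scheme that maps a task ID to the
// data set the task should process.
func inputFile(id int) string {
	return fmt.Sprintf("dataset_%04d.csv", id)
}

func main() {
	fmt.Println("tasks for start=1, end=1000, incr=1:", len(taskIDs(1, 1000, 1)))
	if id, ok := parseTaskID(os.Getenv("SGE_TASK_ID")); ok {
		fmt.Println("processing", inputFile(id))
	} else {
		fmt.Println("not running as an array job task")
	}
}
```

The second half of main() is what would typically live in the program that runs on the cluster, not in the submitting DRMAA application.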

The state of a job can also be changed (suspended / resumed / put on hold / deleted):

errTerm := s.TerminateJob(jobID)

The JobInfo data structure contains the runtime information of the job, like the exit status or the amount of used resources (memory / IO / etc.). The JobInfo data structure can be obtained with the Wait() method.

jinfo, errWait := s.Wait(jobID, drmaa.TimeoutWaitForever)

For more details please consult the documentation and the DRMAA standard specifications.

More examples can be found on my blog at http://www.gridengine.eu.

Documentation

Overview

Package drmaa is a job submission library for job schedulers like Univa Grid Engine. It is based on the open Distributed Resource Management Application API standard (version 1). It requires a C library (libdrmaa.so), which is usually shipped with a job scheduler.

Constants

const (
	// TimeoutWaitForever is a timeout value that means waiting indefinitely.
	TimeoutWaitForever int64 = -1
	// TimeoutNoWait is a timeout value of zero (return immediately).
	TimeoutNoWait int64 = 0
)

A timeout is either a positive number of seconds or one of these constants.
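
Since the timeout is a plain int64 number of seconds, Go code that works with time.Duration needs a conversion. The following sketch shows one possible way to do it. The timeoutSeconds() helper is hypothetical and not part of the package, and the two constants are local copies so the example compiles on its own.

```go
package main

import (
	"fmt"
	"time"
)

// Local copies of the two documented constants, so that the sketch
// compiles without the drmaa package.
const (
	timeoutWaitForever int64 = -1
	timeoutNoWait      int64 = 0
)

// timeoutSeconds is a hypothetical helper that converts a time.Duration
// into the int64 timeout taken by Wait() and Synchronize(). Negative
// durations mean "wait forever"; positive durations are rounded up to
// whole seconds so that a short, non-zero duration does not silently
// turn into "no wait".
func timeoutSeconds(d time.Duration) int64 {
	switch {
	case d < 0:
		return timeoutWaitForever
	case d == 0:
		return timeoutNoWait
	default:
		return int64((d + time.Second - 1) / time.Second)
	}
}

func main() {
	fmt.Println(timeoutSeconds(90 * time.Second))        // 90
	fmt.Println(timeoutSeconds(1500 * time.Millisecond)) // 2
	fmt.Println(timeoutSeconds(-1))                      // -1
}
```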

const (
	// Suspend is a control action for suspending a job (usually sending SIGSTOP).
	Suspend controlType = iota
	// Resume is a control action for resuming a suspended job.
	Resume
	// Hold is a control action for preventing a job from being scheduled.
	Hold
	// Release is a control action for allowing a job in hold state to be scheduled.
	Release
	// Terminate is a control action for deleting a job.
	Terminate
)
const PlaceholderHomeDirectory string = "$drmaa_hd_ph$"

PlaceholderHomeDirectory is a placeholder for the user's home directory when filling out a job template.

const PlaceholderTaskID string = "$drmaa_incr_ph$"

PlaceholderTaskID is a placeholder for the array job task ID which can be used in the job template (like in the input or output path specification).

const PlaceholderWorkingDirectory string = "$drmaa_wd_ph$"

PlaceholderWorkingDirectory is a placeholder for the working directory path which can be used in the job template (like in the input or output path specification).
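
As an illustration of how the placeholders could be combined into a path, here is a self-contained sketch. It uses local copies of the three constants. The leading ":" assumes the "[hostname]:path" form that the DRMAA specification describes for input, output, and error paths. The expand() function is not part of the library; it only imitates the substitution the DRM performs, so the effect can be seen locally.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the placeholder constants documented above.
const (
	placeholderHomeDirectory    = "$drmaa_hd_ph$"
	placeholderTaskID           = "$drmaa_incr_ph$"
	placeholderWorkingDirectory = "$drmaa_wd_ph$"
)

// outputPathTemplate builds a per-task output path below the home
// directory. The leading ":" stands for an empty host name.
func outputPathTemplate(name string) string {
	return ":" + placeholderHomeDirectory + "/logs/" + name + "." + placeholderTaskID + ".out"
}

// expand is a hypothetical helper that mimics what the DRM does with the
// placeholders when the job is started.
func expand(path, home, wd string, taskID int) string {
	r := strings.NewReplacer(
		placeholderHomeDirectory, home,
		placeholderWorkingDirectory, wd,
		placeholderTaskID, fmt.Sprint(taskID),
	)
	return r.Replace(path)
}

func main() {
	tmpl := outputPathTemplate("sleep")
	fmt.Println(tmpl)                                   // :$drmaa_hd_ph$/logs/sleep.$drmaa_incr_ph$.out
	fmt.Println(expand(tmpl, "/home/alice", "/tmp", 7)) // :/home/alice/logs/sleep.7.out
}
```

With a real job template, a string like the one returned by outputPathTemplate() would be passed to SetOutputPath().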

Functions

func GetContact

func GetContact() (string, error)

GetContact returns the contact string of the DRM system.

func GetVersion

func GetVersion() (int, int, error)

GetVersion returns the version of the DRMAA standard.

func StrError

func StrError(id ErrorID) string

StrError maps an ErrorID to an error string.

Types

type Error

type Error struct {
	Message string
	ID      ErrorID
}

Error is the Go DRMAA error type. It implements the Go error interface with the Error() method. Each error returned by the library can be type-asserted to a pointer to this struct in order to get more information about the error (the error ID).

func (Error) Error

func (ce Error) Error() string

Error implements the Go error interface for the drmaa.Error type.
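
The following sketch shows one way to extract the error ID from an error value. ErrorID and Error are local stand-ins that mirror the shape of the documented types (only three IDs are reproduced), and errorID() is a hypothetical helper. It checks for both *Error and Error, because the value receiver makes both satisfy the error interface.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrorID and Error are local stand-ins mirroring the documented types,
// so that the sketch runs without libdrmaa.so.
type ErrorID int

const (
	Success ErrorID = iota
	InternalError
	DrmCommunicationFailure
)

type Error struct {
	Message string
	ID      ErrorID
}

func (ce Error) Error() string { return ce.Message }

// errorID extracts the ErrorID from err, also when err was wrapped.
// The second return value is false if err carries no DRMAA error.
func errorID(err error) (ErrorID, bool) {
	var pe *Error
	if errors.As(err, &pe) && pe != nil {
		return pe.ID, true
	}
	var ve Error
	if errors.As(err, &ve) {
		return ve.ID, true
	}
	return Success, false
}

func main() {
	var err error = &Error{Message: "no connection to the DRM", ID: DrmCommunicationFailure}
	wrapped := fmt.Errorf("submit failed: %w", err)
	if id, ok := errorID(wrapped); ok {
		fmt.Println("DRMAA error ID:", int(id)) // DRMAA error ID: 2
	}
}
```

Using errors.As() instead of a direct type assertion keeps the check working when the application wraps errors with %w.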

type ErrorID

type ErrorID int

ErrorID is the type that represents a DRMAA error ID.

const (
	// Success indicates that no errors occurred
	Success ErrorID = iota
	// InternalError indicates an error within the DRM
	InternalError
	// DrmCommunicationFailure indicates a communication problem
	DrmCommunicationFailure
	// AuthFailure indicates an error during authentication
	AuthFailure
	// InvalidArgument indicates a wrong input parameter or an unsupported method call
	InvalidArgument
	// NoActiveSession indicates an error due to an invalid session state
	NoActiveSession
	// NoMemory indicates an OOM situation
	NoMemory
	// InvalidContactString indicates a wrong contact string
	InvalidContactString
	// DefaultContactStringError indicates an error with the contact string
	DefaultContactStringError
	// NoDefaultContactStringSelected indicates an error with the contact string
	NoDefaultContactStringSelected
	// DrmsInitFailed indicates an error when establishing a connection to the DRM
	DrmsInitFailed
	// AlreadyActiveSession indicates an error with an already existing connection
	AlreadyActiveSession
	// DrmsExitError indicates an error when shutting down the connection to the DRM
	DrmsExitError
	// InvalidAttributeFormat is an attribute format error
	InvalidAttributeFormat
	// InvalidAttributeValue is an attribute value error
	InvalidAttributeValue
	// ConflictingAttributeValues is a semantic error with conflicting attribute settings
	ConflictingAttributeValues
	// TryLater indicates a temporary problem with the DRM
	TryLater
	// DeniedByDrm indicates a permission problem
	DeniedByDrm
	// InvalidJob indicates a problem with the job or job ID
	InvalidJob
	// ResumeInconsistentState indicates a state problem
	ResumeInconsistentState
	// SuspendInconsistentState indicates a state problem
	SuspendInconsistentState
	// HoldInconsistentState indicates a state problem
	HoldInconsistentState
	// ReleaseInconsistentState indicates a state problem
	ReleaseInconsistentState
	// ExitTimeout indicates a timeout issue
	ExitTimeout
	// NoRusage indicates an issue with resource usage values
	NoRusage
	// NoMoreElements indicates that no more elements are available
	NoMoreElements
)

type FileTransferMode

type FileTransferMode struct {
	ErrorStream  bool
	InputStream  bool
	OutputStream bool
}

FileTransferMode determines which files should be staged.

type JobInfo

type JobInfo struct {
	// contains filtered or unexported fields
}

JobInfo contains all runtime information about a job.

func (*JobInfo) ExitStatus

func (ji *JobInfo) ExitStatus() int64

ExitStatus returns the exit status of the job.

func (*JobInfo) HasAborted

func (ji *JobInfo) HasAborted() bool

HasAborted returns whether the job was aborted.

func (*JobInfo) HasCoreDump

func (ji *JobInfo) HasCoreDump() bool

HasCoreDump returns whether the job has generated a core dump.

func (*JobInfo) HasExited

func (ji *JobInfo) HasExited() bool

HasExited returns whether the job has exited.

func (*JobInfo) HasSignaled

func (ji *JobInfo) HasSignaled() bool

HasSignaled returns whether the job has been signaled.

func (*JobInfo) JobID

func (ji *JobInfo) JobID() string

JobID returns the job ID as a string.

func (*JobInfo) ResourceUsage

func (ji *JobInfo) ResourceUsage() map[string]string

ResourceUsage returns the resource usage as a map.

func (*JobInfo) TerminationSignal

func (ji *JobInfo) TerminationSignal() string

TerminationSignal returns the termination signal of the job.

type JobTemplate

type JobTemplate struct {
	// contains filtered or unexported fields
}

JobTemplate represents a job template, which is required to submit a job. A JobTemplate contains job submission parameters like a name, an accounting string, the command to execute, its arguments, and so on. In this implementation, a job template stores a pointer to an allocated C job template, which must be freed by the user. The values can only be accessed through the defined methods.

func (*JobTemplate) Args

func (jt *JobTemplate) Args() ([]string, error)

Args returns the arguments set in the job template for the job's process.

func (*JobTemplate) BlockEmail

func (jt *JobTemplate) BlockEmail() (bool, error)

BlockEmail returns true if BLOCK_EMAIL is set in the job template.

func (*JobTemplate) DeadlineTime

func (jt *JobTemplate) DeadlineTime() (deadlineTime time.Duration, err error)

DeadlineTime returns the deadline time. Unsupported in Grid Engine.

func (*JobTemplate) Email

func (jt *JobTemplate) Email() ([]string, error)

Email returns the email addresses set in the job template which are notified on defined events of the underlying job.

func (*JobTemplate) Env

func (jt *JobTemplate) Env() ([]string, error)

Env returns the environment variables set in the job template for the job's process.

func (*JobTemplate) ErrorPath

func (jt *JobTemplate) ErrorPath() (string, error)

ErrorPath returns the error path set in the job template.

func (*JobTemplate) HardRunDurationLimit

func (jt *JobTemplate) HardRunDurationLimit() (deadlineTime time.Duration, err error)

HardRunDurationLimit returns the hard run-time limit for the job from the job template.

func (*JobTemplate) HardWallclockTimeLimit

func (jt *JobTemplate) HardWallclockTimeLimit() (deadlineTime time.Duration, err error)

HardWallclockTimeLimit returns the wall-clock time set in the job template.

func (*JobTemplate) InputPath

func (jt *JobTemplate) InputPath() (string, error)

InputPath returns the input file of the remote command set in the job template.

func (*JobTemplate) JobName

func (jt *JobTemplate) JobName() (string, error)

JobName returns the name set in the job template.

func (*JobTemplate) JobSubmissionState

func (jt *JobTemplate) JobSubmissionState() (SubmissionState, error)

JobSubmissionState returns the job submission state set in the job template.

func (*JobTemplate) JoinFiles

func (jt *JobTemplate) JoinFiles() (bool, error)

JoinFiles returns whether join files is set in the job template.

func (*JobTemplate) NativeSpecification

func (jt *JobTemplate) NativeSpecification() (string, error)

NativeSpecification returns the native specification set in the job template. The native specification string is used for injecting DRM-specific job submission requests into the system.

func (*JobTemplate) OutputPath

func (jt *JobTemplate) OutputPath() (string, error)

OutputPath returns the output path set in the job template.

func (*JobTemplate) RemoteCommand

func (jt *JobTemplate) RemoteCommand() (string, error)

RemoteCommand returns the binary currently set in the job template, which is going to be executed.

func (*JobTemplate) SetArg

func (jt *JobTemplate) SetArg(arg string) error

SetArg sets a single argument. Simple wrapper for SetArgs([]string{arg}).

func (*JobTemplate) SetArgs

func (jt *JobTemplate) SetArgs(args []string) error

SetArgs sets the arguments for the job executable in the job template.

func (*JobTemplate) SetBlockEmail

func (jt *JobTemplate) SetBlockEmail(blockmail bool) error

SetBlockEmail sets DRMAA_BLOCK_EMAIL in the job template. When it is set, it overrides any default behavior that might send emails when a job reaches a specific state. It is used to prevent emails from being sent.

func (*JobTemplate) SetDeadlineTime

func (jt *JobTemplate) SetDeadlineTime(deadline time.Duration) error

SetDeadlineTime sets the deadline time in the job template. Unsupported in Grid Engine.

func (*JobTemplate) SetEmail

func (jt *JobTemplate) SetEmail(emails []string) error

SetEmail sets the email addresses in the job template to which the cluster scheduler sends emails.

func (*JobTemplate) SetEnv

func (jt *JobTemplate) SetEnv(envs []string) error

SetEnv sets a set of environment variables, inherited from the current environment, that are forwarded to the environment of the job when it is executed.
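
A possible way to build the slice for SetEnv() from the current environment is sketched below. It assumes that each entry has the form "NAME=value", as the DRMAA specification describes for the environment attribute. The forwardEnv() helper is hypothetical; the lookup function is passed in so that os.LookupEnv can be swapped out in tests.

```go
package main

import (
	"fmt"
	"os"
)

// forwardEnv builds "NAME=value" entries for the given variable names.
// Names for which lookup reports "not set" are skipped.
func forwardEnv(names []string, lookup func(string) (string, bool)) []string {
	var envs []string
	for _, name := range names {
		if value, ok := lookup(name); ok {
			envs = append(envs, name+"="+value)
		}
	}
	return envs
}

func main() {
	envs := forwardEnv([]string{"HOME", "PATH"}, os.LookupEnv)
	for _, e := range envs {
		fmt.Println(e)
	}
	// With a real job template the result would be passed on:
	//   err := jt.SetEnv(envs)
}
```

Forwarding an explicit list of names, rather than the whole environment, keeps host-specific variables of the submitting machine out of the job.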

func (*JobTemplate) SetErrorPath

func (jt *JobTemplate) SetErrorPath(path string) error

SetErrorPath sets the path to a directory or a file which is used as error file or directory. Everything the job writes to standard error (stderr) is written to that file.

func (*JobTemplate) SetHardRunDurationLimit

func (jt *JobTemplate) SetHardRunDurationLimit(limit time.Duration) error

SetHardRunDurationLimit sets a hard run-time limit for the job in the job template.

func (*JobTemplate) SetHardWallclockTimeLimit

func (jt *JobTemplate) SetHardWallclockTimeLimit(limit time.Duration) error

SetHardWallclockTimeLimit sets a hard wall-clock time limit for the job.

func (*JobTemplate) SetInputPath

func (jt *JobTemplate) SetInputPath(path string) error

SetInputPath sets the input file for the job. The content of the file is forwarded to the job as standard input (stdin) when it is executed.

func (*JobTemplate) SetJobName

func (jt *JobTemplate) SetJobName(jobname string) error

SetJobName sets the name of the job in the job template.

func (*JobTemplate) SetJobSubmissionState

func (jt *JobTemplate) SetJobSubmissionState(state SubmissionState) error

SetJobSubmissionState sets the job submission state (like the hold state) in the job template.

func (*JobTemplate) SetJoinFiles

func (jt *JobTemplate) SetJoinFiles(join bool) error

SetJoinFiles sets whether the error and output files of the job are joined.

func (*JobTemplate) SetNativeSpecification

func (jt *JobTemplate) SetNativeSpecification(native string) error

SetNativeSpecification sets the native specification (DRM-system-dependent job submission settings) for the job.

func (*JobTemplate) SetOutputPath

func (jt *JobTemplate) SetOutputPath(path string) error

SetOutputPath sets the path to a directory or a file which is used as output file or directory. Everything the job writes to standard output (stdout) is written to that file.

func (*JobTemplate) SetRemoteCommand

func (jt *JobTemplate) SetRemoteCommand(cmd string) error

SetRemoteCommand sets the path or the name of the binary to be started as a job in the job template.

func (*JobTemplate) SetSoftRunDurationLimit

func (jt *JobTemplate) SetSoftRunDurationLimit(limit time.Duration) error

SetSoftRunDurationLimit sets the soft run-time limit for the job in the job template.

func (*JobTemplate) SetSoftWallclockTimeLimit

func (jt *JobTemplate) SetSoftWallclockTimeLimit(limit time.Duration) error

SetSoftWallclockTimeLimit sets a soft wall-clock time limit for the job in the job template.

func (*JobTemplate) SetStartTime

func (jt *JobTemplate) SetStartTime(time time.Time) error

SetStartTime sets the earliest job start time for the job.

func (*JobTemplate) SetTransferFiles

func (jt *JobTemplate) SetTransferFiles(mode FileTransferMode) error

SetTransferFiles sets the file transfer mode in the job template.

func (*JobTemplate) SetWD

func (jt *JobTemplate) SetWD(dir string) error

SetWD sets the working directory for the job in the job template.

func (*JobTemplate) SoftRunDurationLimit

func (jt *JobTemplate) SoftRunDurationLimit() (deadlineTime time.Duration, err error)

SoftRunDurationLimit returns the soft run-time limit set in the job template.

func (*JobTemplate) SoftWallclockTimeLimit

func (jt *JobTemplate) SoftWallclockTimeLimit() (deadlineTime time.Duration, err error)

SoftWallclockTimeLimit returns the soft wall-clock time limit for the job set in the job template.

func (*JobTemplate) StartTime

func (jt *JobTemplate) StartTime() (time.Time, error)

StartTime returns the job start time set for the job.

func (*JobTemplate) String

func (jt *JobTemplate) String() string

String implements the Stringer interface for the JobTemplate. Note that this operation is not very efficient since it needs to get all values out of the C object.

func (*JobTemplate) TransferFiles

func (jt *JobTemplate) TransferFiles() (FileTransferMode, error)

TransferFiles returns the FileTransferMode set in the job template.

func (*JobTemplate) WD

func (jt *JobTemplate) WD() (string, error)

WD returns the working directory set in the job template.

type PsType

type PsType int

PsType specifies a job state (output of JobPs()).

const (
	// PsUndetermined represents an unknown job state
	PsUndetermined PsType = iota
	// PsQueuedActive means the job is queued and eligible to run
	PsQueuedActive
	// PsSystemOnHold means the job is put into a hold state by the system
	PsSystemOnHold
	// PsUserOnHold means the job is put in the hold state by the user
	PsUserOnHold
	// PsUserSystemOnHold means the job is put in the hold state by the system and by the user
	PsUserSystemOnHold
	// PsRunning means the job is currently running
	PsRunning
	// PsSystemSuspended means the job is suspended by the DRM
	PsSystemSuspended
	// PsUserSuspended means the job is suspended by the user
	PsUserSuspended
	// PsUserSystemSuspended means the job is suspended by the DRM and by the user
	PsUserSystemSuspended
	// PsDone means the job finished normally
	PsDone
	// PsFailed means the job finished and failed
	PsFailed
)

func (PsType) String

func (pt PsType) String() string

String implements the Stringer interface for simplified output of the job state (PsType).
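
The job states can be used to poll a job until it has finished, as an alternative to blocking in Wait() or Synchronize(). The sketch below treats PsDone and PsFailed as final states. PsType and its constants are local copies in the documented order, and waitFinal() is a hypothetical helper that takes the state query as a function, so a fake can stand in for s.JobPs(jobID).

```go
package main

import (
	"fmt"
	"time"
)

// PsType and the constants below are local copies of the documented job
// states, so that the sketch runs without libdrmaa.so.
type PsType int

const (
	PsUndetermined PsType = iota
	PsQueuedActive
	PsSystemOnHold
	PsUserOnHold
	PsUserSystemOnHold
	PsRunning
	PsSystemSuspended
	PsUserSuspended
	PsUserSystemSuspended
	PsDone
	PsFailed
)

// isFinal reports whether a job in this state has finished.
func isFinal(ps PsType) bool {
	return ps == PsDone || ps == PsFailed
}

// waitFinal calls jobPs until the job reaches a final state, an error
// occurs, or maxPolls polls have been made.
func waitFinal(jobPs func() (PsType, error), interval time.Duration, maxPolls int) (PsType, error) {
	for i := 0; i < maxPolls; i++ {
		ps, err := jobPs()
		if err != nil {
			return PsUndetermined, err
		}
		if isFinal(ps) {
			return ps, nil
		}
		time.Sleep(interval)
	}
	return PsUndetermined, fmt.Errorf("job not finished after %d polls", maxPolls)
}

func main() {
	// Fake state sequence standing in for repeated s.JobPs(jobID) calls.
	states := []PsType{PsQueuedActive, PsRunning, PsDone}
	i := 0
	fake := func() (PsType, error) {
		ps := states[i]
		if i < len(states)-1 {
			i++
		}
		return ps, nil
	}
	ps, err := waitFinal(fake, time.Millisecond, 10)
	fmt.Println(ps == PsDone, err) // true <nil>
}
```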

type Session

type Session struct {
	// contains filtered or unexported fields
}

Session is a type which represents a DRMAA session.

func MakeSession

func MakeSession() (Session, error)

MakeSession creates and initializes a new DRMAA session.

func (*Session) AllocateJobTemplate

func (s *Session) AllocateJobTemplate() (jt JobTemplate, err error)

AllocateJobTemplate allocates a new C drmaa job template. On successful allocation the DeleteJobTemplate() method must be called in order to avoid memory leaks.

func (*Session) Control

func (s *Session) Control(jobID string, action controlType) error

Control sends a job modification request, i.e. it terminates, suspends, or resumes a job, puts it into the hold state, or releases it from the hold state.

func (*Session) DeleteJobTemplate

func (s *Session) DeleteJobTemplate(jt *JobTemplate) error

DeleteJobTemplate deletes an allocated job template and frees its memory. It must be called to prevent memory leaks, since JobTemplates are not managed by the Go garbage collector.

func (*Session) Exit

func (s *Session) Exit() error

Exit disengages a session from the DRMAA library and cleans it up.

func (*Session) GetAttributeNames

func (s *Session) GetAttributeNames() ([]string, error)

GetAttributeNames returns a set of supported DRMAA attributes allowed in the job template.

func (*Session) GetDrmSystem

func (s *Session) GetDrmSystem() (string, error)

GetDrmSystem returns the DRM system identification string.

func (*Session) GetDrmaaImplementation

func (s *Session) GetDrmaaImplementation() string

GetDrmaaImplementation returns information about the DRMAA implementation.

func (*Session) GetVectorAttributeNames

func (s *Session) GetVectorAttributeNames() ([]string, error)

GetVectorAttributeNames returns a set of supported DRMAA vector attributes allowed in a C job template. This functionality is not required for the Go DRMAA implementation.

func (*Session) HoldJob

func (s *Session) HoldJob(jobID string) error

HoldJob puts a job into the hold state.

func (*Session) Init

func (s *Session) Init(contactString string) error

Init initializes a DRMAA session. If the contact string is "", a new session is created; otherwise an existing session is connected.

func (*Session) JobPs

func (s *Session) JobPs(jobID string) (PsType, error)

JobPs returns the current state of a job.

func (*Session) ReleaseJob

func (s *Session) ReleaseJob(jobID string) error

ReleaseJob removes a hold state from a job.

func (*Session) ResumeJob

func (s *Session) ResumeJob(jobID string) error

ResumeJob sends a job resume request to the job executor.

func (*Session) RunBulkJobs

func (s *Session) RunBulkJobs(jt *JobTemplate, start, end, incr int) ([]string, error)

RunBulkJobs submits a job as an array job.

func (*Session) RunJob

func (s *Session) RunJob(jt *JobTemplate) (string, error)

RunJob submits a job in an (initialized) session to the cluster scheduler.

func (*Session) SuspendJob

func (s *Session) SuspendJob(jobID string) error

SuspendJob sends a job suspension request to the job executor.

func (*Session) Synchronize

func (s *Session) Synchronize(jobIds []string, timeout int64, dispose bool) error

Synchronize blocks the program until the given jobs have finished or the specified timeout is reached.

func (*Session) TerminateJob

func (s *Session) TerminateJob(jobID string) error

TerminateJob sends a job termination request to the job executor.

func (*Session) Wait

func (s *Session) Wait(jobID string, timeout int64) (jobinfo JobInfo, err error)

Wait blocks until the job has left the DRM system or a timeout is reached and returns a JobInfo structure.

type SubmissionState

type SubmissionState int

SubmissionState is the initial job state when the job is submitted.

const (
	// HoldState is a job submission state which means the job should not be scheduled.
	HoldState SubmissionState = iota
	// ActiveState is a job submission state which means the job is allowed to be scheduled.
	ActiveState
)

Directories

Path Synopsis
examples
private_gestatus
Package geparser contains functions for parsing Univa Grid Engine qstat -xml output.