Documentation
Index
- func AllFiles(dir string, exclude ...string) ([]string, error)
- func JobIDsFromPB(ids []int64) []int
- func JobIDsToPB(ids []int) []int64
- func JobToPB(job *Job) *pb.Job
- func JobsToPB(jobs []*Job) *pb.JobList
- func TarFiles(w io.Writer, dir string, gz bool, files ...string) error
- func Untar(r io.Reader, dir string, gz bool) error
- type BareMetal
- func (bm *BareMetal) CancelJobs(ids ...int) error
- func (bm *BareMetal) FetchResults(resultsGlob string, ids ...int) ([]*Job, error)
- func (bm *BareMetal) Init()
- func (bm *BareMetal) Interactive()
- func (bm *BareMetal) JobStatus(ids ...int) []*Job
- func (bm *BareMetal) OpenLog(filename string) error
- func (bm *BareMetal) RecoverJob(job *Job) (*Job, error)
- func (bm *BareMetal) Server(name string) (*Server, error)
- func (bm *BareMetal) StartBGUpdates()
- func (bm *BareMetal) Submit(src, path, script, results string, files []byte) *Job
- func (bm *BareMetal) UpdateJobs() (nrun, nfinished int, err error)
- type Client
- func (cl *Client) CancelJobs(ids ...int) error
- func (cl *Client) Connect() error
- func (cl *Client) FetchResults(resultsGlob string, ids ...int) ([]*Job, error)
- func (cl *Client) JobStatus(ids ...int) ([]*Job, error)
- func (cl *Client) RecoverJob(job *Job) (*Job, error)
- func (cl *Client) Submit(source, path, script, resultsGlob string, files []byte) (*Job, error)
- func (cl *Client) UpdateJobs()
- type Job
- type Jobs
- type Server
- type ServerAvail
- type Servers
- type Status
- func (i Status) Desc() string
- func (i Status) Int64() int64
- func (i Status) MarshalText() ([]byte, error)
- func (i *Status) SetInt64(in int64)
- func (i *Status) SetString(s string) error
- func (i Status) String() string
- func (i *Status) UnmarshalText(text []byte) error
- func (i Status) Values() []enums.Enum
Constants
This section is empty.
Variables
This section is empty.
Functions
func AllFiles
func AllFiles(dir string, exclude ...string) ([]string, error)
AllFiles returns all file names within the given directory, including those in subdirectories, excluding those matching the given glob expressions. File names are relative to dir and do not include the full path.
func JobIDsFromPB
func JobIDsFromPB(ids []int64) []int
JobIDsFromPB returns job ID numbers from the int64 values in a pb.JobIDList.
func JobIDsToPB
func JobIDsToPB(ids []int) []int64
JobIDsToPB returns job ID numbers as int64 values for a pb.JobIDList.
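JobIDsToPB and JobIDsFromPB convert between Go's int and the int64 that the protobuf messages use for job IDs. The sketch below shows the element-wise conversion on plain slices; the lower-case helpers are hypothetical stand-ins, and the pb.JobIDList wrapper type is not reproduced.

```go
package main

import "fmt"

// jobIDsToPB (hypothetical) widens Go int job IDs to int64.
func jobIDsToPB(ids []int) []int64 {
	out := make([]int64, len(ids))
	for i, id := range ids {
		out[i] = int64(id)
	}
	return out
}

// jobIDsFromPB (hypothetical) is the reverse conversion. On 32-bit
// platforms int is 32 bits wide, so very large IDs would be truncated.
func jobIDsFromPB(ids []int64) []int {
	out := make([]int, len(ids))
	for i, id := range ids {
		out[i] = int(id)
	}
	return out
}

func main() {
	pb := jobIDsToPB([]int{3, 7, 12})
	fmt.Println(pb)               // [3 7 12]
	fmt.Println(jobIDsFromPB(pb)) // [3 7 12]
}
```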
Types
type BareMetal
type BareMetal struct {
	// Servers is the ordered list of server machines.
	Servers Servers `json:"-"`
	// NextID is the next job ID to assign.
	NextID int `edit:"-"`
	// Active has all the active (pending, running) jobs being managed,
	// in the order submitted.
	// The unique key is the bare metal job ID (int).
	Active Jobs
	// Done has all the completed jobs that have been run.
	// This list can be purged by time as needed.
	// The unique key is the bare metal job ID (int).
	Done Jobs
	// Lock for responding to inputs.
	// Everything below top-level input processing is assumed to be locked.
	sync.Mutex `json:"-" toml:"-"`
	// contains filtered or unexported fields
}
BareMetal is the overall bare metal job manager.
func NewBareMetal
func NewBareMetal() *BareMetal
func (*BareMetal) CancelJobs
func (bm *BareMetal) CancelJobs(ids ...int) error
CancelJobs cancels the given list of job IDs. It returns an error for any jobs not found.
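The documented error behavior of CancelJobs can be sketched as follows. The map-based job table, the lower-case types, and the choice to keep canceling the jobs that are found while collecting missing IDs into a single error are all assumptions for illustration, not the package's implementation.

```go
package main

import "fmt"

// status mirrors the order of the documented Status constants (hypothetical copy).
type status int32

const (
	noStatus status = iota
	pending
	running
	completed
	canceled
	errored
)

// job is a hypothetical stand-in holding only the fields this sketch needs.
type job struct {
	ID     int
	Status status
}

// cancelJobs marks each listed job as canceled. IDs that are not found are
// collected and reported together in one error; jobs that were found are
// still canceled (an assumption about the real behavior).
func cancelJobs(active map[int]*job, ids ...int) error {
	var missing []int
	for _, id := range ids {
		j, ok := active[id]
		if !ok {
			missing = append(missing, id)
			continue
		}
		j.Status = canceled
	}
	if len(missing) > 0 {
		return fmt.Errorf("jobs not found: %v", missing)
	}
	return nil
}

func main() {
	active := map[int]*job{
		1: {ID: 1, Status: running},
		2: {ID: 2, Status: pending},
	}
	err := cancelJobs(active, 2, 9)
	fmt.Println(active[2].Status == canceled, err) // true jobs not found: [9]
}
```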
func (*BareMetal) FetchResults
func (bm *BareMetal) FetchResults(resultsGlob string, ids ...int) ([]*Job, error)
FetchResults gets job results back from the server for the given job ID(s). Results are available in job.Results as a compressed tar file.
func (*BareMetal) Init
func (bm *BareMetal) Init()
Init does the full initialization of the baremetal server.
func (*BareMetal) Interactive
func (bm *BareMetal) Interactive()
Interactive runs the interpreter in interactive mode.
func (*BareMetal) JobStatus
func (bm *BareMetal) JobStatus(ids ...int) []*Job
JobStatus gets current job data for the given job ID(s). An empty list returns all of the currently Active jobs.
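A sketch of the lookup rule that JobStatus describes, where no IDs means all Active jobs. The hypothetical jobStatus helper also searches the Done list and silently skips unknown IDs; both behaviors are assumptions, since the documentation only specifies the empty-list case.

```go
package main

import "fmt"

// job is a hypothetical stand-in with just an ID and a status label.
type job struct {
	ID     int
	Status string
}

// jobStatus returns a copy of all active jobs (in submission order) when no
// IDs are given; otherwise it returns the listed jobs found in either the
// active or done lists, skipping unknown IDs.
func jobStatus(active, done []*job, ids ...int) []*job {
	if len(ids) == 0 {
		return append([]*job(nil), active...)
	}
	byID := make(map[int]*job, len(active)+len(done))
	for _, j := range active {
		byID[j.ID] = j
	}
	for _, j := range done {
		byID[j.ID] = j
	}
	var out []*job
	for _, id := range ids {
		if j, ok := byID[id]; ok {
			out = append(out, j)
		}
	}
	return out
}

func main() {
	active := []*job{{ID: 3, Status: "Running"}, {ID: 4, Status: "Pending"}}
	done := []*job{{ID: 1, Status: "Completed"}}
	for _, j := range jobStatus(active, done) {
		fmt.Println(j.ID, j.Status) // 3 Running, then 4 Pending
	}
	for _, j := range jobStatus(active, done, 1) {
		fmt.Println(j.ID, j.Status) // 1 Completed
	}
}
```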
func (*BareMetal) RecoverJob added in v0.1.3
func (bm *BareMetal) RecoverJob(job *Job) (*Job, error)
RecoverJob reinstates job information so that files can be recovered, etc.
func (*BareMetal) StartBGUpdates
func (bm *BareMetal) StartBGUpdates()
StartBGUpdates starts a ticker to update job status periodically.
func (*BareMetal) UpdateJobs
func (bm *BareMetal) UpdateJobs() (nrun, nfinished int, err error)
UpdateJobs runs any pending jobs if there are available GPUs to run on. It returns the number of jobs started and any errors incurred in starting jobs.
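The scheduling rule in UpdateJobs (start pending jobs while GPUs are free), together with the documented contract of Server.NextGPU, can be sketched as a first-fit loop. Everything here is a hypothetical simplification: servers are tried in list order (Servers is documented as ordered), jobs in submission order, and the lower-case types keep only the fields the sketch needs.

```go
package main

import "fmt"

// server mirrors the documented Server fields that matter for scheduling.
type server struct {
	Name  string
	NGPUs int
	Used  map[int]bool
}

// nextGPU follows the documented NextGPU contract: return the next free GPU
// index and mark it used, or -1 if none is available.
func (s *server) nextGPU() int {
	for i := 0; i < s.NGPUs; i++ {
		if !s.Used[i] {
			s.Used[i] = true
			return i
		}
	}
	return -1
}

type job struct {
	ID         int
	ServerName string
	ServerGPU  int
}

// startPending assigns pending jobs to the first server with a free GPU.
// It returns the number started and the jobs that are still waiting.
func startPending(servers []*server, pending []*job) (nrun int, waiting []*job) {
	for _, j := range pending {
		started := false
		for _, s := range servers {
			if gpu := s.nextGPU(); gpu >= 0 {
				j.ServerName, j.ServerGPU = s.Name, gpu
				nrun++
				started = true
				break
			}
		}
		if !started {
			waiting = append(waiting, j)
		}
	}
	return nrun, waiting
}

func main() {
	servers := []*server{
		{Name: "hpc1", NGPUs: 2, Used: map[int]bool{}},
		{Name: "hpc2", NGPUs: 1, Used: map[int]bool{}},
	}
	pending := []*job{{ID: 1}, {ID: 2}, {ID: 3}, {ID: 4}}
	nrun, waiting := startPending(servers, pending)
	fmt.Println(nrun, len(waiting)) // 3 1
	for _, j := range pending[:3] {
		fmt.Println(j.ID, j.ServerName, j.ServerGPU) // 1 hpc1 0 / 2 hpc1 1 / 3 hpc2 0
	}
}
```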
type Client
type Client struct {
	// Host is the server address, including the port number.
	Host string `default:"localhost:8585"`
	Timeout time.Duration
	// contains filtered or unexported fields
}
func (*Client) CancelJobs
func (cl *Client) CancelJobs(ids ...int) error
CancelJobs cancels the given list of job IDs. It returns an error for any jobs not found.
func (*Client) FetchResults
func (cl *Client) FetchResults(resultsGlob string, ids ...int) ([]*Job, error)
FetchResults gets job results back from the server for the given job ID(s). Results are available in job.Results as a compressed tar file.
func (*Client) RecoverJob added in v0.1.3
func (cl *Client) RecoverJob(job *Job) (*Job, error)
RecoverJob recovers a job that has been lost somehow. It just adds the given job to the job table.
func (*Client) UpdateJobs
func (cl *Client) UpdateJobs()
UpdateJobs pings the server to run its updates. This happens automatically every 10 seconds; calling it directly is for the impatient.
type Job
type Job struct {
	// ID is the overall baremetal unique ID number.
	ID int
	// Status is the current status of the job.
	Status Status
	// Source is info about the source of the job, e.g., simrun sim project.
	Source string
	// Path is the path from the SSH home directory to launch the job in.
	// This path will be created on the server when the job is run.
	Path string
	// Script is the name of the job script to run, which must be at the top level
	// within the given tar file.
	Script string
	// Files is the gzipped tar file of the job files set at submission.
	Files []byte `display:"-"`
	// ResultsGlob is a glob expression for the result files to get back
	// from the server (e.g., *.tsv). job.out is automatically included as well,
	// which has the job stdout and stderr output.
	ResultsGlob string `display:"-"`
	// Results is the gzipped tar file of the job result files, gathered
	// at completion or when queried for results.
	Results []byte `display:"-"`
	// Submit is the time submitted.
	Submit time.Time
	// Start is the time it actually started.
	Start time.Time
	// End is the time it stopped running.
	End time.Time
	// ServerName is the name of the server it is running / ran on. Empty for pending.
	ServerName string
	// ServerGPU is the logical index of the GPU assigned to this job (0..N-1).
	ServerGPU int
	// PID is the process ID of the job script.
	PID int
}
Job is one bare metal job.
func JobsFromPB
JobsFromPB returns Jobs from the protobuf version of given Jobs list.
type Server
type Server struct {
	// Name is the alias used for gossh.
	Name string
	// SSH is the string to gossh to.
	SSH string
	// NGPUs is the number of GPUs on this server.
	NGPUs int
	// Used is a map of GPUs currently being used.
	Used map[int]bool `edit:"-" toml:"-"`
}
Server specifies a bare metal Server.
func (*Server) NextGPU
NextGPU returns the next GPU index available, and adds it to the Used list. Returns -1 if none available.
type ServerAvail
ServerAvail is used to report the number of available GPUs per server.
type Status
type Status int32 //enums:enum
Status is the type of the job status values.
const (
	// NoStatus is the unknown status state.
	NoStatus Status = iota
	// Pending means the job has been submitted, but not yet run.
	Pending
	// Running means the job is running.
	Running
	// Completed means the job finished on its own, with no error status.
	Completed
	// Canceled means the job was canceled by the user.
	Canceled
	// Errored means the job quit with an error.
	Errored
)
const StatusN Status = 6
StatusN is the highest valid value for type Status, plus one.
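The enum can be mirrored in plain Go to show how StatusN bounds the valid values and what StatusValues returns. This is a hand-written sketch: the real String method is generated from the //enums:enum directive, and both the assumption that each value's string form is its constant name and the fallback text for out-of-range values are illustrative only.

```go
package main

import "fmt"

// Status mirrors the documented job status enum.
type Status int32

const (
	NoStatus Status = iota
	Pending
	Running
	Completed
	Canceled
	Errored
)

// StatusN is the highest valid value plus one, as documented.
const StatusN Status = 6

// statusNames is a hypothetical lookup table standing in for generated code.
var statusNames = [...]string{"NoStatus", "Pending", "Running", "Completed", "Canceled", "Errored"}

// String returns the constant name, or a numeric fallback for invalid values.
func (i Status) String() string {
	if i < 0 || i >= StatusN {
		return fmt.Sprintf("Status(%d)", int32(i))
	}
	return statusNames[i]
}

// StatusValues returns every valid value, from NoStatus up to StatusN-1.
func StatusValues() []Status {
	vals := make([]Status, 0, StatusN)
	for i := NoStatus; i < StatusN; i++ {
		vals = append(vals, i)
	}
	return vals
}

func main() {
	for _, s := range StatusValues() {
		fmt.Println(int(s), s)
	}
}
```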
func StatusValues
func StatusValues() []Status
StatusValues returns all possible values for the type Status.
func (Status) MarshalText
func (i Status) MarshalText() ([]byte, error)
MarshalText implements the encoding.TextMarshaler interface.
func (*Status) SetString
func (i *Status) SetString(s string) error
SetString sets the Status value from its string representation, and returns an error if the string is invalid.
func (*Status) UnmarshalText
func (i *Status) UnmarshalText(text []byte) error
UnmarshalText implements the encoding.TextUnmarshaler interface.