Documentation ¶
Overview ¶
Package scheduler provides primitives for scheduling imbalanced data. In addition to static scheduling that reduces the load imbalance, it supports a feedback-directed optimization that adaptively adjusts the workload on each worker and a guided optimization that minimizes the zero padding for packed sequences.
Index ¶
Constants ¶
const (
	STATIC = iota
	DYNAMIC
	GUIDED
)
Variables ¶
This section is empty.
Functions ¶
Types ¶
type DynamicScheduler ¶
type DynamicScheduler struct {
// contains filtered or unexported fields
}
DynamicScheduler provides a feedback-directed optimization. It adaptively adjusts the workload on each worker, which can be useful in heterogeneous clusters where the workers have different compute capabilities.
func NewDynamicScheduler ¶
func NewDynamicScheduler(dataset data.Dataset, worldSize, batchSize int, sizes []int64) *DynamicScheduler
NewDynamicScheduler creates a new dynamic scheduler with the given arguments.
func (*DynamicScheduler) Len ¶
func (s *DynamicScheduler) Len() int
func (*DynamicScheduler) OnEpochEnd ¶
func (s *DynamicScheduler) OnEpochEnd(epoch, rank int64, coefficient, intercept float64)
OnEpochEnd updates the worker profile with the given feedback.
func (*DynamicScheduler) OnTrainEnd ¶
func (s *DynamicScheduler) OnTrainEnd()
func (*DynamicScheduler) Schedule ¶
func (s *DynamicScheduler) Schedule(step int) [][]int64
Schedule assigns the next mini-batch to each of the workers based on their performance indicators. It adopts best-fit-decreasing with a random first pivot, equalizing the training time across workers while randomizing the training sequence. This is a revised version of our original scheme for straggler mitigation against imbalanced data, which was proposed at the 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid).

Chronica paper: https://ieeexplore.ieee.org/document/10171495
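The load-equalizing idea behind the decreasing heuristic can be illustrated with a standalone sketch. This is not the package's internal implementation: the function name, the seed parameter, and the simplified rule of assigning each item to the least-loaded worker (with a random pivot only for the first, largest item) are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// bestFitDecreasing sketches a decreasing-order assignment of sample sizes
// to workers: items are sorted largest-first, the first item goes to a
// randomly chosen pivot worker (randomizing the training sequence across
// epochs), and every later item goes to the currently least-loaded worker,
// keeping per-worker loads as equal as possible.
func bestFitDecreasing(sizes []int64, workers int, seed int64) [][]int64 {
	rng := rand.New(rand.NewSource(seed))
	order := append([]int64(nil), sizes...)
	sort.Slice(order, func(i, j int) bool { return order[i] > order[j] })

	assigned := make([][]int64, workers)
	loads := make([]int64, workers)
	pivot := rng.Intn(workers) // random first pivot

	for i, s := range order {
		w := pivot
		if i > 0 {
			// pick the least-loaded worker for every subsequent item
			w = 0
			for j := 1; j < workers; j++ {
				if loads[j] < loads[w] {
					w = j
				}
			}
		}
		assigned[w] = append(assigned[w], s)
		loads[w] += s
	}
	return assigned
}

func main() {
	// each inner slice holds the sample sizes assigned to one worker
	fmt.Println(bestFitDecreasing([]int64{7, 6, 5, 4, 3, 2, 1, 1}, 3, 42))
}
```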
type GuidedScheduler ¶ added in v0.2.0
type GuidedScheduler struct {
// contains filtered or unexported fields
}
GuidedScheduler is a padding-aware scheduler that provides a guided optimization for packed sequences. It accelerates training by reducing unnecessary operations caused by zero padding.
func NewGuidedScheduler ¶ added in v0.2.0
func NewGuidedScheduler(dataset data.Dataset, worldSize, batchSize int, sizes []int64) *GuidedScheduler
NewGuidedScheduler creates a new guided scheduler with the given arguments.
func (*GuidedScheduler) Len ¶ added in v0.2.0
func (s *GuidedScheduler) Len() int
func (*GuidedScheduler) OnEpochEnd ¶ added in v0.2.0
func (s *GuidedScheduler) OnEpochEnd(epoch, rank int64, coefficient, intercept float64)
func (*GuidedScheduler) OnTrainEnd ¶ added in v0.2.0
func (s *GuidedScheduler) OnTrainEnd()
func (*GuidedScheduler) Schedule ¶ added in v0.2.0
func (s *GuidedScheduler) Schedule(step int) [][]int64
Schedule assigns the next mini-batch to each of the workers while minimizing the zero padding. The higher the rank, the larger the mini-batch it is assigned, since additional workloads such as scheduling and parameter synchronization fall on the master.
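Why grouping similar-sized sequences reduces wasted work can be shown with a small sketch. This is a simplification, not the scheduler's actual algorithm; `paddingCost` and `sortedBatches` are hypothetical helper names, and the sketch assumes the number of sequences divides evenly into batches.

```go
package main

import (
	"fmt"
	"sort"
)

// paddingCost returns the number of zero-pad tokens needed when the given
// sequence lengths are packed into one mini-batch (every sequence is padded
// up to the longest one in the batch).
func paddingCost(lengths []int64) int64 {
	var max, sum int64
	for _, l := range lengths {
		if l > max {
			max = l
		}
		sum += l
	}
	return max*int64(len(lengths)) - sum
}

// sortedBatches is a simplified padding-aware grouping: sorting by length
// and batching neighbors keeps similar-sized sequences together, which
// shrinks the gap to the batch maximum and hence the pad tokens.
// Assumes len(lengths) is a multiple of batchSize.
func sortedBatches(lengths []int64, batchSize int) [][]int64 {
	sorted := append([]int64(nil), lengths...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	var batches [][]int64
	for i := 0; i < len(sorted); i += batchSize {
		batches = append(batches, sorted[i:i+batchSize])
	}
	return batches
}

func main() {
	lengths := []int64{3, 17, 4, 18, 5, 16, 2, 19}
	var padded int64
	for _, b := range sortedBatches(lengths, 4) {
		padded += paddingCost(b)
	}
	// total pad tokens with length-sorted batches; batching these lengths
	// in their original order would pad far more
	fmt.Println(padded)
}
```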
type Scheduler ¶
type Scheduler interface {
	// Schedule selects data samples for the next mini-batch.
	Schedule(step int) [][]int64

	// Len returns the number of required steps for each training epoch.
	Len() int

	// OnEpochEnd is called at the end of an epoch during training.
	OnEpochEnd(epoch, rank int64, coefficient, intercept float64)

	// OnTrainEnd terminates the training environment.
	OnTrainEnd()
}
Scheduler represents the data scheduler. In addition to the scheduling algorithm, an implementation must provide the callbacks OnEpochEnd and OnTrainEnd, which are invoked at the end of each epoch and at the end of training, respectively.
type StaticScheduler ¶
type StaticScheduler struct {
// contains filtered or unexported fields
}
StaticScheduler provides a balanced workload to each of the workers while limiting the peak device memory usage; this allows a larger batch size, which reduces communication overhead and thereby improves scalability.
func NewStaticScheduler ¶
func NewStaticScheduler(dataset data.Dataset, worldSize, batchSize int, sizes []int64) *StaticScheduler
NewStaticScheduler creates a new static scheduler with the given arguments.
func (*StaticScheduler) Len ¶
func (s *StaticScheduler) Len() int
func (*StaticScheduler) OnEpochEnd ¶
func (s *StaticScheduler) OnEpochEnd(epoch, rank int64, coefficient, intercept float64)
func (*StaticScheduler) OnTrainEnd ¶
func (s *StaticScheduler) OnTrainEnd()
func (*StaticScheduler) Schedule ¶
func (s *StaticScheduler) Schedule(step int) [][]int64
Schedule assigns the next mini-batch to each of the workers. It adopts first-fit-decreasing (FFD), an approximately optimal heuristic for bin packing.

FFD paper: https://dspace.mit.edu/bitstream/handle/1721.1/57819/17595570-MIT.pdf

Python implementation: https://github.com/erelsgl/prtpy/blob/ebe54010513ea725f7a3221e4aa0258afa15d6fb/prtpy/packing/first_fit.py
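The FFD heuristic itself is easy to sketch in isolation. The function name, bin capacity, and sample sizes below are illustrative assumptions, not this package's internals.

```go
package main

import (
	"fmt"
	"sort"
)

// firstFitDecreasing packs sample sizes into bins of the given capacity:
// sort the items in decreasing order, then place each item into the first
// bin that still has room, opening a new bin when none fits.
func firstFitDecreasing(sizes []int64, capacity int64) [][]int64 {
	sorted := append([]int64(nil), sizes...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })

	var bins [][]int64
	var loads []int64
	for _, s := range sorted {
		placed := false
		for b := range bins {
			if loads[b]+s <= capacity {
				bins[b] = append(bins[b], s)
				loads[b] += s
				placed = true
				break
			}
		}
		if !placed {
			bins = append(bins, []int64{s})
			loads = append(loads, s)
		}
	}
	return bins
}

func main() {
	// each inner slice is one bin whose total stays within the capacity
	fmt.Println(firstFitDecreasing([]int64{9, 8, 5, 4, 2, 2}, 10))
}
```

Placing the largest items first is what makes the heuristic effective: small items left for the end slot neatly into the remaining gaps.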