edl

package
v0.2.0
Warning

This package is not in the latest version of its module.

Published: Apr 8, 2020 License: Apache-2.0 Imports: 24 Imported by: 0

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

func AddResourceList

func AddResourceList(a v1.ResourceList, b v1.ResourceList)

AddResourceList adds the quantities in b to the matching quantities in a. v1.ResourceList is a map from resource name to Quantity.
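
The in-place accumulation described above can be sketched as follows. This is a simplified model, not the package's code: resourceList and addResourceList are hypothetical stand-ins that use plain int64 amounts instead of Kubernetes Quantity values, and the in-place update of a is inferred from the signature having no return value.

```go
package main

import "fmt"

// resourceList is a hypothetical stand-in for v1.ResourceList; values are
// plain int64 amounts instead of Quantity values.
type resourceList map[string]int64

// addResourceList adds every amount in b to the matching entry in a.
// Entries missing from a start at zero. a is modified in place.
func addResourceList(a, b resourceList) {
	for name, amount := range b {
		a[name] += amount
	}
}

func main() {
	a := resourceList{"cpu": 500, "memory": 1024}
	b := resourceList{"cpu": 250, "gpu": 1}
	addResourceList(a, b)
	fmt.Println(a["cpu"], a["memory"], a["gpu"]) // prints "750 1024 1"
}
```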

Types

type Autoscaler

type Autoscaler struct {
	// contains filtered or unexported fields
}

Autoscaler launches and scales the training jobs.

func (*Autoscaler) OnAdd

func (a *Autoscaler) OnAdd(trainingjob *edlresource.TrainingJob)

OnAdd notifies the autoscaler that a job has been added.

func (*Autoscaler) OnDel

func (a *Autoscaler) OnDel(trainingjob *edlresource.TrainingJob)

OnDel notifies the autoscaler that a job has been deleted.

func (*Autoscaler) OnUpdate

func (a *Autoscaler) OnUpdate(trainingjob *edlresource.TrainingJob)

OnUpdate notifies the autoscaler that a job has been updated.

func (*Autoscaler) Run

func (a *Autoscaler) Run()

Run monitors the cluster resources and training jobs in a loop and scales the training jobs according to the available cluster resources.
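
One common way to connect notification methods like OnAdd, OnUpdate and OnDel to a processing loop is an event queue. The sketch below illustrates that pattern only; the scaler type, its fields, and the drain method are hypothetical, a job name stands in for *edlresource.TrainingJob, and nothing here is taken from the Autoscaler's unexported implementation.

```go
package main

import "fmt"

type eventType int

const (
	evAdd eventType = iota
	evUpdate
	evDel
)

// event pairs a notification kind with the job it refers to.
type event struct {
	kind eventType
	job  string // hypothetical: a job name stands in for *TrainingJob
}

// scaler is a hypothetical stand-in for Autoscaler.
type scaler struct {
	events chan event
	jobs   map[string]int // job name -> number of updates seen
}

func newScaler() *scaler {
	return &scaler{events: make(chan event, 16), jobs: map[string]int{}}
}

// The notification methods only enqueue; the loop owns the jobs map.
func (s *scaler) OnAdd(job string)    { s.events <- event{evAdd, job} }
func (s *scaler) OnUpdate(job string) { s.events <- event{evUpdate, job} }
func (s *scaler) OnDel(job string)    { s.events <- event{evDel, job} }

// drain applies all queued events and returns. A long-running loop would
// block on the channel instead of returning when it is empty.
func (s *scaler) drain() {
	for {
		select {
		case e := <-s.events:
			switch e.kind {
			case evAdd:
				s.jobs[e.job] = 0
			case evUpdate:
				s.jobs[e.job]++
			case evDel:
				delete(s.jobs, e.job)
			}
		default:
			return
		}
	}
}

func main() {
	s := newScaler()
	s.OnAdd("mnist")
	s.OnUpdate("mnist")
	s.OnAdd("resnet")
	s.OnDel("resnet")
	s.drain()
	fmt.Println(len(s.jobs), s.jobs["mnist"]) // prints "1 1"
}
```

Keeping all map access inside the loop avoids locking, since the notification methods never touch shared state directly.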

type Cluster

type Cluster struct {
	// contains filtered or unexported fields
}

Cluster is our interface to the Kubernetes cluster. It can query the cluster's overall status and the status of a specific PaddlePaddle training job. It can also create training jobs and ReplicaSets.

TODO(yi): The above functionalities are NOT logically related with each other. I am not sure if it is a good idea to group them in this source file.

func (*Cluster) CreateJob

func (c *Cluster) CreateJob(j *batchv1.Job) (*batchv1.Job, error)

CreateJob creates a Job.

func (*Cluster) CreateReplicaSet

func (c *Cluster) CreateReplicaSet(r *v1beta1.ReplicaSet) (*v1beta1.ReplicaSet, error)

CreateReplicaSet creates a ReplicaSet.

func (*Cluster) DeleteReplicaSet

func (c *Cluster) DeleteReplicaSet(namespace, name string) error

DeleteReplicaSet deletes a ReplicaSet and its pods.

func (*Cluster) DeleteTrainerJob

func (c *Cluster) DeleteTrainerJob(namespace, name string) error

DeleteTrainerJob deletes a trainer job and its pods. See: https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/

func (*Cluster) GetReplicaSet

func (c *Cluster) GetReplicaSet(namespace, name string) (*v1beta1.ReplicaSet, error)

GetReplicaSet gets a ReplicaSet.

func (Cluster) GetTrainerJob

func (c Cluster) GetTrainerJob(job *edlresource.TrainingJob) (*batchv1.Job, error)

GetTrainerJob gets the trainer job spec.

func (Cluster) GetTrainerJobByName

func (c Cluster) GetTrainerJobByName(namespace, name string) (*batchv1.Job, error)

GetTrainerJobByName gets the trainer job spec by namespace and name.

func (*Cluster) InquiryResource

func (c *Cluster) InquiryResource() (res ClusterResource, err error)

InquiryResource returns the idle and total resources of the Kubernetes cluster.

func (Cluster) JobPods

func (c Cluster) JobPods(job *edlresource.TrainingJob) (total, running, pending int, err error)

JobPods returns the total number of desired pods of a job, together with the numbers of its running and pending pods.

func (Cluster) UpdateTrainerJob

func (c Cluster) UpdateTrainerJob(job *batchv1.Job) error

UpdateTrainerJob updates the trainer job spec; this performs the actual scaling up or down.

type ClusterResource

type ClusterResource struct {
	NodeCount int // The total number of nodes in the cluster.

	// Each Kubernetes job could require some number of GPUs in
	// the range of [request, limit].
	GPURequest int // \sum_job num_gpu_request(job)
	GPULimit   int // \sum_job num_gpu_limit(job)
	GPUTotal   int // The total number of GPUs in the cluster

	// Each Kubernetes job could require some CPU timeslices in
	// the unit of *milli*.
	CPURequestMilli int64 // \sum_job cpu_request_in_milli(job)
	CPULimitMilli   int64 // \sum_job cpu_limit_in_milli(job)
	CPUTotalMilli   int64 // The total amount of CPUs in the cluster in milli.

	// Each Kubernetes job could require some amount of memory in
	// the unit of *mega*.
	MemoryRequestMega int64 // \sum_job memory_request_in_mega(job)
	MemoryLimitMega   int64 // \sum_job memory_limit_in_mega(job)
	MemoryTotalMega   int64 // The total amount of memory in the cluster in mega.

	Nodes Nodes
}

ClusterResource summarizes the requested, limited, and total resources of a cluster.
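
The request, limit and total fields lend themselves to simple headroom checks. The sketch below copies a subset of the fields into a local clusterResource type; idleGPU and canScaleUp are hypothetical helpers, and the load policy is an assumption for illustration (the package's New function takes a maxLoadDesired parameter, but how it is applied is not documented here).

```go
package main

import "fmt"

// clusterResource copies a subset of the ClusterResource fields shown above.
type clusterResource struct {
	GPULimit        int
	GPUTotal        int
	CPURequestMilli int64
	CPUTotalMilli   int64
}

// idleGPU is the number of GPUs not covered by any job's GPU limit.
func idleGPU(r clusterResource) int {
	return r.GPUTotal - r.GPULimit
}

// canScaleUp reports whether adding extraCPUMilli of CPU requests keeps the
// request-to-total ratio at or below maxLoad. Hypothetical policy.
func canScaleUp(r clusterResource, extraCPUMilli int64, maxLoad float64) bool {
	if r.CPUTotalMilli == 0 {
		return false // an empty cluster has no headroom
	}
	load := float64(r.CPURequestMilli+extraCPUMilli) / float64(r.CPUTotalMilli)
	return load <= maxLoad
}

func main() {
	r := clusterResource{GPULimit: 6, GPUTotal: 8, CPURequestMilli: 8000, CPUTotalMilli: 10000}
	fmt.Println(idleGPU(r))                // prints "2"
	fmt.Println(canScaleUp(r, 1000, 0.97)) // prints "true"  (load 0.9)
	fmt.Println(canScaleUp(r, 2000, 0.97)) // prints "false" (load 1.0)
}
```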

type Controller

type Controller struct {
	// contains filtered or unexported fields
}

Controller for dispatching TrainingJob resource.

func New

func New(c *rest.RESTClient, cs *kubernetes.Clientset, maxLoadDesired float64) (*Controller, error)

New constructs a new Controller.

func (*Controller) Run

func (c *Controller) Run()

Run starts watching Kubernetes events and invokes the event handlers.

func (*Controller) WatchTrainingJobs

func (c *Controller) WatchTrainingJobs()

WatchTrainingJobs monitors TrainingJob resources.

type DefaultJobParser

type DefaultJobParser int

DefaultJobParser implements a basic JobParser.

func (*DefaultJobParser) ParseToMaster

func (p *DefaultJobParser) ParseToMaster(job *edlresource.TrainingJob) *v1beta1.ReplicaSet

ParseToMaster parses a TrainingJob into a Kubernetes ReplicaSet resource.

func (*DefaultJobParser) ParseToPserver

func (p *DefaultJobParser) ParseToPserver(job *edlresource.TrainingJob) *v1beta1.ReplicaSet

ParseToPserver generates a pserver ReplicaSet resource according to the TrainingJob resource specs.

func (*DefaultJobParser) ParseToTrainer

func (p *DefaultJobParser) ParseToTrainer(job *edlresource.TrainingJob) *batchv1.Job

ParseToTrainer parses a TrainingJob into a Kubernetes Job resource.

func (*DefaultJobParser) Validate

func (p *DefaultJobParser) Validate(job *edlresource.TrainingJob) error

Validate updates default values for the added job and validates the fields.

type EtcdClient added in v0.2.0

type EtcdClient struct {
	// contains filtered or unexported fields
}

EtcdClient is the etcd client that the pserver uses for fault tolerance, service registry and coordination.

func NewEtcdClient added in v0.2.0

func NewEtcdClient(endpoints string, numPservers int, dialtimeout time.Duration, ttlSec int) *EtcdClient

NewEtcdClient creates an EtcdClient.

func (*EtcdClient) GetKey added in v0.2.0

func (e *EtcdClient) GetKey(key string, timeout time.Duration) ([]byte, error)

GetKey gets the value stored under the specified key.

func (*EtcdClient) PutKey added in v0.2.0

func (e *EtcdClient) PutKey(key string, value []byte, timeout time.Duration, withLease bool) error

PutKey stores the value in etcd under the specified key.

func (*EtcdClient) Register added in v0.2.0

func (e *EtcdClient) Register(port int) error

Register returns the index of the current pserver.

func (*EtcdClient) Shutdown added in v0.2.0

func (e *EtcdClient) Shutdown() error

Shutdown shuts down the etcd client gracefully.
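
Code that calls GetKey and PutKey can be written against a small interface so it can be exercised without a running etcd. In the sketch below, the keyValue interface mirrors the two method signatures listed above; memStore and roundTrip are hypothetical, the fake ignores the timeout and withLease arguments, and returning an error for a missing key is an assumption, not documented EtcdClient behavior.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// keyValue mirrors the GetKey and PutKey signatures listed above.
type keyValue interface {
	GetKey(key string, timeout time.Duration) ([]byte, error)
	PutKey(key string, value []byte, timeout time.Duration, withLease bool) error
}

// memStore is a hypothetical in-memory fake. It ignores timeout and
// withLease, and reports a missing key as an error (an assumption).
type memStore map[string][]byte

func (m memStore) GetKey(key string, _ time.Duration) ([]byte, error) {
	v, ok := m[key]
	if !ok {
		return nil, errors.New("key not found: " + key)
	}
	return v, nil
}

func (m memStore) PutKey(key string, value []byte, _ time.Duration, _ bool) error {
	m[key] = append([]byte(nil), value...) // store a copy of the caller's slice
	return nil
}

// roundTrip writes a value and reads it back through the interface.
func roundTrip(kv keyValue, key, value string) (string, error) {
	if err := kv.PutKey(key, []byte(value), time.Second, false); err != nil {
		return "", err
	}
	got, err := kv.GetKey(key, time.Second)
	return string(got), err
}

func main() {
	got, err := roundTrip(memStore{}, "/ps/0", "10.0.0.1:8000")
	fmt.Println(got, err) // prints "10.0.0.1:8000 <nil>"
}
```

Since *EtcdClient has methods with these signatures, it would satisfy the same interface in production code.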

type JobParser

type JobParser interface {
	Validate(job *edlresource.TrainingJob) error
	ParseToTrainer(job *edlresource.TrainingJob) *batchv1.Job
	ParseToPserver(job *edlresource.TrainingJob) *v1beta1.ReplicaSet
	ParseToMaster(job *edlresource.TrainingJob) *v1beta1.ReplicaSet
}

JobParser is an interface that parses a TrainingJob into ReplicaSet and Job resources.
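
A typical way to use an interface of this shape is to validate first and parse only on success. The sketch below shows that flow with reduced stand-ins: trainingJob, workload, jobParser, simpleParser and submit are all hypothetical, the field names are assumptions rather than the real TrainingJob schema, and only one of the three Parse methods is modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// trainingJob is a hypothetical, heavily reduced stand-in for
// edlresource.TrainingJob; the field names are assumptions.
type trainingJob struct {
	Name        string
	MinInstance int
	MaxInstance int
}

// workload is a hypothetical stand-in for the Kubernetes Job object.
type workload struct {
	Name     string
	Replicas int
}

// jobParser is a reduced version of the JobParser interface.
type jobParser interface {
	Validate(job *trainingJob) error
	ParseToTrainer(job *trainingJob) *workload
}

type simpleParser struct{}

// Validate fills in defaults first, then rejects inconsistent fields.
func (simpleParser) Validate(job *trainingJob) error {
	if job.MinInstance == 0 {
		job.MinInstance = 1
	}
	if job.MaxInstance == 0 {
		job.MaxInstance = job.MinInstance
	}
	if job.Name == "" {
		return errors.New("job name must not be empty")
	}
	if job.MinInstance > job.MaxInstance {
		return errors.New("MinInstance exceeds MaxInstance")
	}
	return nil
}

// ParseToTrainer derives the trainer workload from a validated job.
func (simpleParser) ParseToTrainer(job *trainingJob) *workload {
	return &workload{Name: job.Name + "-trainer", Replicas: job.MinInstance}
}

// submit validates the job and parses it only if validation succeeds.
func submit(p jobParser, job *trainingJob) (*workload, error) {
	if err := p.Validate(job); err != nil {
		return nil, err
	}
	return p.ParseToTrainer(job), nil
}

func main() {
	w, err := submit(simpleParser{}, &trainingJob{Name: "mnist", MaxInstance: 4})
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println(w.Name, w.Replicas) // prints "mnist-trainer 1"
}
```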

type Nodes

type Nodes struct {
	NodesCPUIdleMilli   map[string]int64 // node id -> idle CPU
	NodesMemoryFreeMega map[string]int64 // node id -> free memory
}

Nodes records the amount of idle CPU and free memory of each node in the cluster.

func (*Nodes) String

func (ns *Nodes) String() string
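
The per-node maps in Nodes can be used to find nodes with room for a new pod. The sketch below copies the two exported fields into a local nodes type; candidates is a hypothetical helper, not part of the package, and it assumes both maps are keyed by the same node IDs.

```go
package main

import (
	"fmt"
	"sort"
)

// nodes copies the two exported fields of Nodes shown above.
type nodes struct {
	NodesCPUIdleMilli   map[string]int64 // node id -> idle CPU
	NodesMemoryFreeMega map[string]int64 // node id -> free memory
}

// candidates returns, in sorted order, the IDs of nodes that have at least
// cpuMilli idle CPU and memMega free memory. Hypothetical helper.
func candidates(ns nodes, cpuMilli, memMega int64) []string {
	var ids []string
	for id, idle := range ns.NodesCPUIdleMilli {
		if idle >= cpuMilli && ns.NodesMemoryFreeMega[id] >= memMega {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids) // map iteration order is random; sort for stable output
	return ids
}

func main() {
	ns := nodes{
		NodesCPUIdleMilli:   map[string]int64{"node-a": 4000, "node-b": 500, "node-c": 2000},
		NodesMemoryFreeMega: map[string]int64{"node-a": 8192, "node-b": 8192, "node-c": 1024},
	}
	fmt.Println(candidates(ns, 1000, 2048)) // prints "[node-a]"
}
```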

Directories

Path                                                  Synopsis
apis/paddlepaddle/v1                                  Package v1 is the v1 version of the API.
client/clientset/versioned                            This package has the automatically generated clientset.
client/clientset/versioned/fake                       This package has the automatically generated fake clientset.
client/clientset/versioned/scheme                     This package contains the scheme of the automatically generated clientset.
client/clientset/versioned/typed/paddlepaddle/v1      This package has the automatically generated typed clients.
client/clientset/versioned/typed/paddlepaddle/v1/fake Package fake has the automatically generated clients.
