trainer

package
v0.0.0-...-2c5db8d Latest
Warning

This package is not in the latest version of its module.

Published: Jun 18, 2018 License: Apache-2.0 Imports: 21 Imported by: 0

Documentation

Overview

Package trainer manages Caffe2 training jobs.

Index

Constants

const (
	SuccessfulCreateReason = "SuccessfulCreate"
	FailedCreateReason     = "FailedCreate"
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Caffe2Config

type Caffe2Config struct {
	// Cluster represents a Caffe2 ClusterSpec.
	// See: https://www.tensorflow.org/api_docs/python/tf/train/ClusterSpec
	Cluster     ClusterSpec `json:"cluster"`
	Task        TaskSpec    `json:"task"`
	Environment string      `json:"environment"`
}

Caffe2Config is a struct mirroring the TensorFlow config. It is turned into an environment that the training processes use to configure themselves.
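As a rough illustration of how a struct like this could be turned into a value for a process environment, the sketch below marshals it to JSON using the struct tags shown above. The types are local copies of the documented declarations. The helper `encodeConfig`, the addresses, and the `"cloud"` value are hypothetical, and the encoding the package actually uses is not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented types so the sketch compiles on its own.
type ClusterSpec map[string][]string

type TaskSpec struct {
	Type  string `json:"type"`
	Index int    `json:"index"`
}

type Caffe2Config struct {
	Cluster     ClusterSpec `json:"cluster"`
	Task        TaskSpec    `json:"task"`
	Environment string      `json:"environment"`
}

// encodeConfig is a hypothetical helper (not part of the package) that
// serializes the config to a JSON string suitable for an environment value.
func encodeConfig(c Caffe2Config) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	cfg := Caffe2Config{
		// Addresses are made up for illustration.
		Cluster:     ClusterSpec{"worker": {"job-worker-0:2222", "job-worker-1:2222"}},
		Task:        TaskSpec{Type: "worker", Index: 1},
		Environment: "cloud",
	}
	s, err := encodeConfig(cfg)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(s)
}
```

Each replica would receive the same Cluster value but its own Task, which is how a process could tell which entry in the cluster it is.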

type Caffe2ReplicaSet

type Caffe2ReplicaSet struct {
	ClientSet kubernetes.Interface

	// Job is a pointer to the TrainingJob to which this replica belongs.
	Job  *TrainingJob
	Spec api.Caffe2ReplicaSpec
	// contains filtered or unexported fields
}

Caffe2ReplicaSet is a set of Caffe2 processes all acting as the same role (e.g. worker).

func NewCaffe2ReplicaSet

func NewCaffe2ReplicaSet(clientSet kubernetes.Interface, recorder record.EventRecorder, caffe2ReplicaSpec api.Caffe2ReplicaSpec, job *TrainingJob) (*Caffe2ReplicaSet, error)

func (*Caffe2ReplicaSet) Create

func (s *Caffe2ReplicaSet) Create(config *api.ControllerConfig) error

func (*Caffe2ReplicaSet) Delete

func (s *Caffe2ReplicaSet) Delete() error

Delete deletes the replicas.

func (*Caffe2ReplicaSet) GetSingleReplicaStatus

func (s *Caffe2ReplicaSet) GetSingleReplicaStatus(index int32) api.ReplicaState

func (*Caffe2ReplicaSet) GetStatus

func (s *Caffe2ReplicaSet) GetStatus() (api.Caffe2ReplicaStatus, error)

GetStatus returns the status of the replica set.

func (*Caffe2ReplicaSet) Labels

func (s *Caffe2ReplicaSet) Labels() KubernetesLabels

Labels returns the labels for this replica set.

type Caffe2ReplicaSetInterface

type Caffe2ReplicaSetInterface interface {
	Create() error
	Delete() error
	GetStatus() (api.Caffe2ReplicaStatus, error)
}

Caffe2ReplicaSetInterface is an interface for managing a set of replicas.
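One use for an interface of this shape is substituting an in-memory fake in unit tests. The sketch below assumes a stand-in `replicaStatus` type, because the fields of api.Caffe2ReplicaStatus are not shown on this page; `fakeReplicaSet` and its behavior are likewise hypothetical. Note that the documented (*Caffe2ReplicaSet).Create takes a *api.ControllerConfig, so its signature differs from the parameterless Create listed in the interface.

```go
package main

import (
	"errors"
	"fmt"
)

// replicaStatus is a stand-in for api.Caffe2ReplicaStatus (assumed shape).
type replicaStatus struct {
	State string
}

// replicaSetInterface mirrors the method set of Caffe2ReplicaSetInterface,
// with the stand-in status type substituted.
type replicaSetInterface interface {
	Create() error
	Delete() error
	GetStatus() (replicaStatus, error)
}

// fakeReplicaSet is a hypothetical in-memory implementation for tests.
type fakeReplicaSet struct {
	created bool
}

func (f *fakeReplicaSet) Create() error {
	if f.created {
		return errors.New("replica set already created")
	}
	f.created = true
	return nil
}

func (f *fakeReplicaSet) Delete() error {
	f.created = false
	return nil
}

func (f *fakeReplicaSet) GetStatus() (replicaStatus, error) {
	if !f.created {
		return replicaStatus{}, errors.New("replica set not created")
	}
	return replicaStatus{State: "Running"}, nil
}

// Compile-time check that the fake satisfies the interface.
var _ replicaSetInterface = (*fakeReplicaSet)(nil)

func main() {
	var rs replicaSetInterface = &fakeReplicaSet{}
	if err := rs.Create(); err != nil {
		fmt.Println("create failed:", err)
		return
	}
	status, err := rs.GetStatus()
	fmt.Println(status.State, err)
}
```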

type ClusterSpec

type ClusterSpec map[string][]string

ClusterSpec represents a Caffe2 cluster specification. It is a map from job names to network addresses. See https://www.tensorflow.org/deploy/distributed#create_a_tftrainclusterspec_to_describe_the_cluster
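To make the map shape concrete, here is a minimal sketch that builds a ClusterSpec from replica counts. The helper `buildClusterSpec`, the `<job>-<role>-<index>:<port>` address scheme, and the port number are assumptions made for illustration; the addresses the package really generates are not documented on this page.

```go
package main

import "fmt"

// ClusterSpec is a local copy of the documented type:
// a map from job names to network addresses.
type ClusterSpec map[string][]string

// buildClusterSpec is a hypothetical helper that fills in one address per
// replica for each role, using an assumed "<job>-<role>-<index>:<port>" scheme.
func buildClusterSpec(jobName string, replicas map[string]int, port int) ClusterSpec {
	spec := make(ClusterSpec, len(replicas))
	for role, count := range replicas {
		addrs := make([]string, 0, count)
		for i := 0; i < count; i++ {
			addrs = append(addrs, fmt.Sprintf("%s-%s-%d:%d", jobName, role, i, port))
		}
		spec[role] = addrs
	}
	return spec
}

func main() {
	spec := buildClusterSpec("demo", map[string]int{"worker": 2}, 2222)
	for _, addr := range spec["worker"] {
		fmt.Println(addr)
	}
}
```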

type KubernetesLabels

type KubernetesLabels map[string]string

KubernetesLabels represents a set of labels to apply to a Kubernetes resource.

func (KubernetesLabels) ToSelector

func (l KubernetesLabels) ToSelector() (string, error)

ToSelector converts the labels to a selector matching the labels.
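A conversion like this can be sketched with Kubernetes' equality-based selector syntax, where `key=value` pairs are joined by commas. The function `toSelector` below is a hypothetical stand-in, not the package's implementation; sorting the keys is an added choice so the output is deterministic, and the label names are made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KubernetesLabels is a local copy of the documented type.
type KubernetesLabels map[string]string

// toSelector is a hypothetical stand-in for ToSelector. It joins the labels
// as comma-separated key=value pairs, sorted by key so that the result does
// not depend on Go's randomized map iteration order.
func toSelector(l KubernetesLabels) string {
	keys := make([]string, 0, len(l))
	for k := range l {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pieces := make([]string, 0, len(keys))
	for _, k := range keys {
		pieces = append(pieces, k+"="+l[k])
	}
	return strings.Join(pieces, ",")
}

func main() {
	labels := KubernetesLabels{"job_type": "worker", "app": "caffe2"}
	fmt.Println(toSelector(labels))
}
```

A selector string in this form can be passed as the label selector when listing resources, so that only objects carrying all of the labels are returned.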

type TaskSpec

type TaskSpec struct {
	Type  string `json:"type"`
	Index int    `json:"index"`
}

type TrainingJob

type TrainingJob struct {
	KubeCli kubernetes.Interface

	Replicas []*Caffe2ReplicaSet
	// contains filtered or unexported fields
}

TODO: We should switch to a New pattern and make trainingJob private so we can ensure correctness on creation.

func NewJob

func NewJob(kubeCli kubernetes.Interface, jobClient jobclient.Interface, recorder record.EventRecorder, job *api.Caffe2Job, config *api.ControllerConfig) (*TrainingJob, error)

func (*TrainingJob) ClusterSpec

func (j *TrainingJob) ClusterSpec() ClusterSpec

func (*TrainingJob) Delete

func (j *TrainingJob) Delete()

func (*TrainingJob) GetStatus

func (j *TrainingJob) GetStatus() (api.State, []*api.Caffe2ReplicaStatus, error)

func (*TrainingJob) Reconcile

func (j *TrainingJob) Reconcile(config *api.ControllerConfig) error

Reconcile tries to get the job into the desired state.

func (*TrainingJob) UID

func (j *TrainingJob) UID() types.UID
