agentrm

package
v0.0.0-...-3511abf Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 2, 2023 License: Apache-2.0 Imports: 51 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func BestFit

func BestFit(req *sproto.AllocateRequest, agent *agentState) float64

BestFit returns a float affinity score between 0 and 1 for the affinity between the task and the agent. This method attempts to allocate tasks to the agent that is both most utilized and offers the fewest slots. This method should be used when the cluster is dominated by multi-slot applications.

func MakeFitFunction

func MakeFitFunction(fittingPolicy string) func(
	*sproto.AllocateRequest, *agentState) float64

MakeFitFunction returns the corresponding fitting function.

func Single

func Single[K comparable, V any](m map[K]V) (kr K, vr V, ok bool)

Single asserts there's a single element in the map and take it.

func WorstFit

func WorstFit(req *sproto.AllocateRequest, agent *agentState) float64

WorstFit returns a float affinity score between 0 and 1 for the affinity between the task and the agent. This method attempts to allocate tasks to the agent that is least utilized. This method should be used when the cluster is dominated by single-slot applications.

Types

type HardConstraint

type HardConstraint func(req *sproto.AllocateRequest, agent *agentState) bool

HardConstraint returns true if the task can be assigned to the agent and false otherwise.

type ResourceManager

type ResourceManager struct {
	*actorrm.ResourceManager
}

ResourceManager is a resource manager for Determined-managed resources.

func New

func New(
	system *actor.System,
	db *db.PgDB,
	echo *echo.Echo,
	config *config.ResourceConfig,
	opts *aproto.MasterSetAgentOptions,
	cert *tls.Certificate,
) *ResourceManager

New returns a new ResourceManager, which manages communicating with and scheduling on Determined agents.

func (ResourceManager) CheckMaxSlotsExceeded

func (a ResourceManager) CheckMaxSlotsExceeded(name string, slots int) (bool, error)

CheckMaxSlotsExceeded checks if the job exceeded the maximum number of slots.

func (ResourceManager) DisableSlot

func (a ResourceManager) DisableSlot(
	req *apiv1.DisableSlotRequest,
) (resp *apiv1.DisableSlotResponse, err error)

DisableSlot implements 'det slot disable...' functionality.

func (ResourceManager) EnableSlot

func (a ResourceManager) EnableSlot(
	req *apiv1.EnableSlotRequest,
) (resp *apiv1.EnableSlotResponse, err error)

EnableSlot implements 'det slot enable...' functionality.

func (ResourceManager) GetAgents

func (a ResourceManager) GetAgents(
	msg *apiv1.GetAgentsRequest,
) (resp *apiv1.GetAgentsResponse, err error)

GetAgents gets the state of connected agents. Go around the RM and directly to the agents actor to avoid blocking asks through it.

func (*ResourceManager) GetSlot

func (a *ResourceManager) GetSlot(
	req *apiv1.GetSlotRequest,
) (resp *apiv1.GetSlotResponse, err error)

GetSlot implements rm.ResourceManager.

func (ResourceManager) NotifyContainerRunning

func (a ResourceManager) NotifyContainerRunning(
	msg sproto.NotifyContainerRunning,
) error

NotifyContainerRunning receives a notification from the container to let the master know that the container is running.

func (ResourceManager) ResolveResourcePool

func (a ResourceManager) ResolveResourcePool(name string, workspaceID, slots int) (string, error)

ResolveResourcePool fully resolves the resource pool name.

func (ResourceManager) TaskContainerDefaults

func (a ResourceManager) TaskContainerDefaults(
	pool string,
	fallbackConfig model.TaskContainerDefaultsConfig,
) (result model.TaskContainerDefaultsConfig, err error)

TaskContainerDefaults returns TaskContainerDefaults for the specified pool.

func (ResourceManager) ValidateResourcePool

func (a ResourceManager) ValidateResourcePool(name string) error

ValidateResourcePool validates existence of a resource pool.

func (ResourceManager) ValidateResourcePoolAvailability

func (a ResourceManager) ValidateResourcePoolAvailability(name string, slots int) (
	[]command.LaunchWarning,
	error,
)

ValidateResourcePoolAvailability is a default implementation to satisfy the interface.

func (ResourceManager) ValidateResources

func (a ResourceManager) ValidateResources(name string, slots int, command bool) error

ValidateResources ensures enough resources are available for a command.

type Scheduler

type Scheduler interface {
	Schedule(rp *resourcePool) ([]*sproto.AllocateRequest, []model.AllocationID)
	JobQInfo(rp *resourcePool) map[model.JobID]*sproto.RMJobInfo
}

Scheduler schedules tasks on agents. Its only function Schedule is called to determine which pending requests can be fulfilled and which scheduled tasks can be terminated. Schedule is expected to ba called every time there is a change to the cluster status, for example, new agents being connected, devices being disabled, and etc,. Schedule should avoid unnecessary shuffling tasks on agents to avoid the overhead of restarting a preempted task.

func MakeScheduler

func MakeScheduler(conf *config.SchedulerConfig) Scheduler

MakeScheduler returns the corresponding scheduler implementation.

func NewFairShareScheduler

func NewFairShareScheduler() Scheduler

NewFairShareScheduler creates a new scheduler that schedules tasks according to the max-min fairness of groups. For groups that are above their fair share, the scheduler requests them to terminate their idle tasks until they have achieved their fair share.

func NewPriorityScheduler

func NewPriorityScheduler(config *config.SchedulerConfig) Scheduler

NewPriorityScheduler creates a new scheduler that schedules tasks via priority.

func NewRoundRobinScheduler

func NewRoundRobinScheduler() Scheduler

NewRoundRobinScheduler creates a new scheduler that schedules tasks via round-robin of groups sorted low to high by their current allocated slots.

type SoftConstraint

type SoftConstraint func(req *sproto.AllocateRequest, agent *agentState) float64

SoftConstraint returns a score from 0 (lowest) to 1 (highest) representing how optimal is the state of the cluster if the task were assigned to the agent.

Directories

Path Synopsis
aws
gcp

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL