nodemanager

package
v1.1.3
Published: Apr 22, 2021 License: Apache-2.0 Imports: 23 Imported by: 1

Documentation

Overview

Not sure if this is a good pattern for decoupling the pod_controller from the node controller; going to give it a try.

Index

Constants

const (
	ParameterCACertificate     = "ca.crt"
	ParameterServerCertificate = "server.crt"
	ParameterServerKey         = "server.key"
	ParameterItzoVersion       = "itzo_version"
	ParameterItzoURL           = "itzo_url"
	ParameterCellConfig        = "cell_config.yaml"
)

Variables

var (
	// TODO: this was changed to handle mac1.metal boot, ideally we should have different
	// bootTimeouts depending on instance family
	BootTimeout         time.Duration = 20 * time.Minute
	HealthyTimeout      time.Duration = 90 * time.Second
	HealthcheckPause    time.Duration = 5 * time.Second
	SpotRequestPause    time.Duration = 60 * time.Second
	BootImage           cloud.Image   = cloud.Image{}
	MaxBootPerIteration int           = 10
)

These are vars rather than consts to make testing easier; non-const timeouts were endorsed by Mitchell Hashimoto.
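For example, a test can temporarily shorten these timeouts. This is a minimal sketch, assuming it lives in a _test.go file in this package; the test name and restore pattern are illustrative, not part of the package:

func TestWithShortTimeouts(t *testing.T) {
	// Save the package-level timeouts and restore them afterwards so
	// other tests see the production values.
	origBoot, origHealthy := BootTimeout, HealthyTimeout
	defer func() { BootTimeout, HealthyTimeout = origBoot, origHealthy }()

	// Shorten the timeouts so the code under test fails fast instead
	// of waiting the production 20 minutes.
	BootTimeout = 50 * time.Millisecond
	HealthyTimeout = 10 * time.Millisecond

	// ... exercise boot and health-check paths here ...
}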

Functions

This section is empty.

Types

type BindingNodeScaler

type BindingNodeScaler struct {
	// contains filtered or unexported fields
}

func NewBindingNodeScaler

func NewBindingNodeScaler(nodeReg StatusUpdater, standbyNodes []StandbyNodeSpec, bootLimiter *InstanceBootLimiter, defaultVolumeSize string, fixedSizeVolume bool) *BindingNodeScaler

func (*BindingNodeScaler) Compute

func (s *BindingNodeScaler) Compute(nodes []*api.Node, pods []*api.Pod) ([]*api.Node, []*api.Node, map[string]string)

A brief summary of how we figure out what nodes need to be started and what nodes need to be shut down:

1. We only care about waiting pods and nodes that are available or creating/created.

2. Re-generate the podNodeBinding map by looking at the existing bindings from nodes to pods in each node's node.Status.BoundPodName. Also make sure that any pods listed there are actually still waiting (the user might have killed a pod). Along the way, we keep track of unbound pods and nodes.

3. Match any unbound pods to unbound nodes. Before doing that, we order our pods and nodes so that we choose the most specific matches for them.

4. Any remaining unbound pods that haven't been matched get a node booted for them, except for node requests that we know cannot be fulfilled due to unavailability in the cloud.

5. Finally, make sure that we have enough nodes to satisfy our standby pools of nodes.

At the end of this process, return the nodes that should be started, the nodes that should be shut down, and the current bindings map (so that the dispatcher can be fast). A simplified sketch of this flow follows.
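The sketch below restates the five steps in code. It is illustrative only: the helpers isWaiting, isUsable, bestMatch, removeNode, instanceTypeFor, wantsSpot, and nodeForPod are hypothetical stand-ins for unexported logic; api.Node and api.Pod are assumed to expose a Name field; the limiter is passed explicitly here, whereas the real Compute reads it from the scaler's fields (set by NewBindingNodeScaler); and the standby-pool step is elided.

func computeSketch(nodes []*api.Node, pods []*api.Pod, limiter *InstanceBootLimiter) (start, stop []*api.Node, bindings map[string]string) {
	bindings = make(map[string]string)

	// Step 1: only waiting pods and available/creating/created nodes matter.
	waiting := map[string]*api.Pod{}
	for _, p := range pods {
		if isWaiting(p) { // hypothetical helper
			waiting[p.Name] = p
		}
	}

	// Step 2: rebuild the bindings map from node.Status.BoundPodName,
	// dropping bindings whose pod is no longer waiting; collect the
	// rest as unbound nodes.
	var unboundNodes []*api.Node
	for _, n := range nodes {
		if !isUsable(n) { // hypothetical: available or creating/created
			continue
		}
		if p, ok := waiting[n.Status.BoundPodName]; ok {
			bindings[p.Name] = n.Name
			delete(waiting, p.Name)
		} else {
			unboundNodes = append(unboundNodes, n)
		}
	}

	// Step 3: match unbound pods to unbound nodes, most specific first.
	for name, p := range waiting {
		if n := bestMatch(p, unboundNodes); n != nil { // hypothetical
			bindings[name] = n.Name
			delete(waiting, name)
			unboundNodes = removeNode(unboundNodes, n) // hypothetical
		}
	}

	// Step 4: boot a node for each pod still unmatched, unless the
	// cloud has reported the requested instance type as unavailable.
	for _, p := range waiting {
		if !limiter.IsUnavailableInstance(instanceTypeFor(p), wantsSpot(p)) {
			start = append(start, nodeForPod(p)) // hypothetical
		}
	}

	// Step 5 (elided): top up the standby pools and add surplus nodes to stop.
	return start, stop, bindings
}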

type InstanceBootLimiter added in v1.0.0

type InstanceBootLimiter struct {
	// contains filtered or unexported fields
}

func NewInstanceBootLimiter added in v1.0.0

func NewInstanceBootLimiter() *InstanceBootLimiter

func (*InstanceBootLimiter) AddUnavailableInstance added in v1.0.0

func (s *InstanceBootLimiter) AddUnavailableInstance(instanceType string, spot bool)

func (*InstanceBootLimiter) IsUnavailableInstance added in v1.0.0

func (s *InstanceBootLimiter) IsUnavailableInstance(instanceType string, spot bool) bool

func (*InstanceBootLimiter) Start added in v1.0.0

func (s *InstanceBootLimiter) Start()
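A minimal usage sketch of the limiter. It assumes Start should be called before the limiter is consulted (what Start does internally is unexported), and the instance type is illustrative:

func exampleBootLimiter() {
	limiter := NewInstanceBootLimiter()
	limiter.Start()

	// Record that the cloud could not supply on-demand mac1.metal capacity.
	limiter.AddUnavailableInstance("mac1.metal", false)

	// Before the next boot attempt, skip instance types the cloud has
	// recently reported as unavailable.
	if limiter.IsUnavailableInstance("mac1.metal", false) {
		// choose another instance type or retry later
	}
}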

type InstanceConfig added in v1.0.5

type InstanceConfig struct {
	ItzoVersion string
	ItzoURL     string
	CellConfig  map[string]string
}

type NodeController

type NodeController struct {
	Config             NodeControllerConfig
	NodeRegistry       *registry.NodeRegistry
	LogRegistry        *registry.LogRegistry
	PodReader          registry.PodLister
	NodeDispenser      *NodeDispenser
	NodeScaler         ScalingAlgorithm
	CloudClient        cloud.CloudClient
	NodeClientFactory  nodeclient.ItzoClientFactoryer
	Events             *events.EventSystem
	PoolLoopTimer      *stats.LoopTimer
	ImageIdCache       *timeoutmap.TimeoutMap
	CloudInitFile      *cloudinitfile.File
	CertificateFactory *certs.CertificateFactory
	BootLimiter        *InstanceBootLimiter
	BootImageSpec      cloud.BootImageSpec
}

func (*NodeController) Dump

func (c *NodeController) Dump() []byte

func (*NodeController) ResumeWaits

func (c *NodeController) ResumeWaits()

When the server restarts, there may be old nodes that were starting or restarting and that we were polling in order to know when to change their state to available. ResumeWaits restarts those polls.

func (*NodeController) Start

func (c *NodeController) Start(quit <-chan struct{}, wg *sync.WaitGroup)

func (*NodeController) StopCreatingNodes

func (c *NodeController) StopCreatingNodes()

If the controller was shut down while creating a node, that node will remain in the creating state indefinitely, since we don't have an instanceID for it. Kill it here.

type NodeControllerConfig

type NodeControllerConfig struct {
	PoolInterval           time.Duration
	HeartbeatInterval      time.Duration
	ReaperInterval         time.Duration
	ItzoVersion            string
	ItzoURL                string
	CellConfig             map[string]string
	UseCloudParameterStore bool
	DefaultIAMPermissions  string
}

When configuring these intervals, we want the following constraints to be satisfied:

1. The pool interval should be longer than the heartbeat interval.

2. The heartbeat interval should be longer than the heartbeat client timeout.
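For instance, a configuration satisfying both constraints might look like the sketch below. The durations are illustrative, not defaults, and the heartbeat client timeout is configured on the node client rather than in this struct:

func exampleConfig() NodeControllerConfig {
	return NodeControllerConfig{
		// PoolInterval > HeartbeatInterval, per constraint 1.
		PoolInterval:      10 * time.Second,
		HeartbeatInterval: 5 * time.Second,
		// HeartbeatInterval must in turn exceed the heartbeat client
		// timeout (constraint 2), which lives on the node client.
		ReaperInterval: 30 * time.Second,
	}
}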

type NodeDispenser

type NodeDispenser struct {
	NodeRequestChan chan NodeRequest
	NodeReturnChan  chan NodeReturn
}

func NewNodeDispenser

func NewNodeDispenser() *NodeDispenser

func (*NodeDispenser) RequestNode

func (e *NodeDispenser) RequestNode(requestingPod api.Pod) NodeReply

We pass in a copy of the requesting pod for safety reasons.

func (*NodeDispenser) ReturnNode

func (e *NodeDispenser) ReturnNode(nodeName string, unused bool)
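A sketch of the request/return round trip from the caller's side. The work done with the node is elided, the handling of NoBinding (see NodeReply below) is illustrative, and api.Node is assumed to expose a Name field:

func exampleDispense(d *NodeDispenser, pod api.Pod) {
	// RequestNode takes the pod by value, so the dispenser works on a copy.
	reply := d.RequestNode(pod)
	if reply.NoBinding {
		// No node can currently be created for this pod; retry later.
		return
	}

	// ... dispatch the pod to reply.Node ...

	// Hand the node back when done; unused=false means it ran a workload.
	d.ReturnNode(reply.Node.Name, false)
}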

type NodeReply

type NodeReply struct {
	Node *api.Node
	// When there's no binding for a pod, either the pod is new or
	// something went wrong with the pod spec; possibly it was
	// created by a replicaSet and we can't satisfy the placement
	// spec of the pod. We use NoBinding to signal that we can't
	// currently create a node for the pod. If a pod remains
	// unbound for too long, we can act accordingly (e.g. for a
	// replicaSet pod, we kill the pod).
	NoBinding bool
}

type NodeRequest

type NodeRequest struct {
	ReplyChan chan NodeReply
	// contains filtered or unexported fields
}

type NodeReturn

type NodeReturn struct {
	NodeName string
	Unused   bool
}

type ScalingAlgorithm

type ScalingAlgorithm interface {
	// todo, figure out what we really need to pass in
	// and return value will likely get much more complex
	//Compute(nodes []*api.Node, pods []*api.Pod) ([]*api.Node, int)
	Compute(nodes []*api.Node, pods []*api.Pod) ([]*api.Node, []*api.Node, map[string]string)
}

type StandbyNodeSpec

type StandbyNodeSpec struct {
	InstanceType string `json:"instanceType"`
	Count        int    `json:"count"`
	Spot         bool   `json:"spot"`
	Dedicated    bool   `json:"dedicated"`
}

Used externally in provider.yaml to specify a pool of buffered (standby) nodes.
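As a sketch, a pool of two warm on-demand nodes would be built with fields like these (the instance type and count are illustrative); marshaling shows the serialized keys per the struct's JSON tags, assuming encoding/json and fmt are imported:

func exampleStandbySpec() {
	specs := []StandbyNodeSpec{
		// Keep two warm on-demand t3.nano nodes in the standby pool.
		{InstanceType: "t3.nano", Count: 2, Spot: false, Dedicated: false},
	}
	out, _ := json.Marshal(specs)
	fmt.Println(string(out))
	// [{"instanceType":"t3.nano","count":2,"spot":false,"dedicated":false}]
}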

type StatusUpdater

type StatusUpdater interface {
	UpdateStatus(*api.Node) (*api.Node, error)
}
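Because StatusUpdater is a one-method interface, a trivial fake makes code that depends on it (such as BindingNodeScaler) easy to unit test. A sketch; the fake type is hypothetical:

type fakeStatusUpdater struct{}

func (f fakeStatusUpdater) UpdateStatus(n *api.Node) (*api.Node, error) {
	// Accept every update unchanged.
	return n, nil
}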
