nodemanager

package
v1.1.3
Published: Apr 22, 2021 License: Apache-2.0 Imports: 23 Imported by: 1

Documentation

Overview

Not sure if this is a good pattern for decoupling the pod_controller from the node controller; going to give it a try.

Index

Constants

const (
	ParameterCACertificate     = "ca.crt"
	ParameterServerCertificate = "server.crt"
	ParameterServerKey         = "server.key"
	ParameterItzoVersion       = "itzo_version"
	ParameterItzoURL           = "itzo_url"
	ParameterCellConfig        = "cell_config.yaml"
)

Variables

var (
	// TODO: this was changed to handle mac1.metal boot, ideally we should have different
	// bootTimeouts depending on instance family
	BootTimeout         time.Duration = 20 * time.Minute
	HealthyTimeout      time.Duration = 90 * time.Second
	HealthcheckPause    time.Duration = 5 * time.Second
	SpotRequestPause    time.Duration = 60 * time.Second
	BootImage           cloud.Image   = cloud.Image{}
	MaxBootPerIteration int           = 10
)

These are vars rather than consts to make testing easier; non-const timeouts were endorsed by Mitchell Hashimoto.
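For example, a test can temporarily shorten these timeouts. This is a minimal sketch, assuming it lives in a _test.go file in this package; the test name and restore pattern are illustrative, not part of the package:

func TestWithShortTimeouts(t *testing.T) {
	// Save the package-level timeouts and restore them afterwards so
	// other tests see the production values.
	origBoot, origHealthy := BootTimeout, HealthyTimeout
	defer func() { BootTimeout, HealthyTimeout = origBoot, origHealthy }()

	// Shorten the timeouts so the code under test fails fast instead
	// of waiting the production 20 minutes.
	BootTimeout = 50 * time.Millisecond
	HealthyTimeout = 10 * time.Millisecond

	// ... exercise boot and health-check paths here ...
}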

Functions

This section is empty.

Types

type BindingNodeScaler

type BindingNodeScaler struct {
	// contains filtered or unexported fields
}

func NewBindingNodeScaler

func NewBindingNodeScaler(nodeReg StatusUpdater, standbyNodes []StandbyNodeSpec, bootLimiter *InstanceBootLimiter, defaultVolumeSize string, fixedSizeVolume bool) *BindingNodeScaler

func (*BindingNodeScaler) Compute

func (s *BindingNodeScaler) Compute(nodes []*api.Node, pods []*api.Pod) ([]*api.Node, []*api.Node, map[string]string)

A brief summary of how we figure out what nodes need to be started and what nodes need to be shut down:

1. We only care about waiting pods and nodes that are available or creating/created.

2. Re-generate the podNodeBinding map by looking at the existing bindings from nodes to pods in each node's node.Status.BoundPodName. Also make sure that any pods listed there are actually still waiting (the user might have killed a pod). Along the way, we keep track of unbound pods and nodes.

3. Match any unbound pods to unbound nodes. Before doing that, we order our pods and nodes so that we choose the most specific matches for them.

4. Any remaining unbound pods that haven't been matched get a node booted for them, except for node requests that we know cannot be fulfilled due to unavailability in the cloud.

5. Finally, make sure that we have enough nodes to satisfy our standby pools of nodes.

At the end of this process, return the nodes that should be started, the nodes that should be shut down, and the current bindings map (so that the dispatcher can be fast). A simplified sketch of this flow follows.
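The sketch below restates the five steps in code. It is illustrative only: the helpers isWaiting, isUsable, bestMatch, removeNode, instanceTypeFor, wantsSpot, and nodeForPod are hypothetical stand-ins for unexported logic; api.Node and api.Pod are assumed to expose a Name field; the limiter is passed explicitly here, whereas the real Compute reads it from the scaler's fields (set by NewBindingNodeScaler); and the standby-pool step is elided.

func computeSketch(nodes []*api.Node, pods []*api.Pod, limiter *InstanceBootLimiter) (start, stop []*api.Node, bindings map[string]string) {
	bindings = make(map[string]string)

	// Step 1: only waiting pods and available/creating/created nodes matter.
	waiting := map[string]*api.Pod{}
	for _, p := range pods {
		if isWaiting(p) { // hypothetical helper
			waiting[p.Name] = p
		}
	}

	// Step 2: rebuild the bindings map from node.Status.BoundPodName,
	// dropping bindings whose pod is no longer waiting; collect the
	// rest as unbound nodes.
	var unboundNodes []*api.Node
	for _, n := range nodes {
		if !isUsable(n) { // hypothetical: available or creating/created
			continue
		}
		if p, ok := waiting[n.Status.BoundPodName]; ok {
			bindings[p.Name] = n.Name
			delete(waiting, p.Name)
		} else {
			unboundNodes = append(unboundNodes, n)
		}
	}

	// Step 3: match unbound pods to unbound nodes, most specific first.
	for name, p := range waiting {
		if n := bestMatch(p, unboundNodes); n != nil { // hypothetical
			bindings[name] = n.Name
			delete(waiting, name)
			unboundNodes = removeNode(unboundNodes, n) // hypothetical
		}
	}

	// Step 4: boot a node for each pod still unmatched, unless the
	// cloud has reported the requested instance type as unavailable.
	for _, p := range waiting {
		if !limiter.IsUnavailableInstance(instanceTypeFor(p), wantsSpot(p)) {
			start = append(start, nodeForPod(p)) // hypothetical
		}
	}

	// Step 5 (elided): top up the standby pools and add surplus nodes to stop.
	return start, stop, bindings
}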

type InstanceBootLimiter added in v1.0.0

type InstanceBootLimiter struct {
	// contains filtered or unexported fields
}

func NewInstanceBootLimiter added in v1.0.0

func NewInstanceBootLimiter() *InstanceBootLimiter

func (*InstanceBootLimiter) AddUnavailableInstance added in v1.0.0

func (s *InstanceBootLimiter) AddUnavailableInstance(instanceType string, spot bool)

func (*InstanceBootLimiter) IsUnavailableInstance added in v1.0.0

func (s *InstanceBootLimiter) IsUnavailableInstance(instanceType string, spot bool) bool

func (*InstanceBootLimiter) Start added in v1.0.0

func (s *InstanceBootLimiter) Start()
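A minimal usage sketch of the limiter. It assumes Start should be called before the limiter is consulted (what Start does internally is unexported), and the instance type is illustrative:

func exampleBootLimiter() {
	limiter := NewInstanceBootLimiter()
	limiter.Start()

	// Record that the cloud could not supply on-demand mac1.metal capacity.
	limiter.AddUnavailableInstance("mac1.metal", false)

	// Before the next boot attempt, skip instance types the cloud has
	// recently reported as unavailable.
	if limiter.IsUnavailableInstance("mac1.metal", false) {
		// choose another instance type or retry later
	}
}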

type InstanceConfig added in v1.0.5

type InstanceConfig struct {
	ItzoVersion string
	ItzoURL     string
	CellConfig  map[string]string
}

type NodeController

type NodeController struct {
	Config             NodeControllerConfig
	NodeRegistry       *registry.NodeRegistry
	LogRegistry        *registry.LogRegistry
	PodReader          registry.PodLister
	NodeDispenser      *NodeDispenser
	NodeScaler         ScalingAlgorithm
	CloudClient        cloud.CloudClient
	NodeClientFactory  nodeclient.ItzoClientFactoryer
	Events             *events.EventSystem
	PoolLoopTimer      *stats.LoopTimer
	ImageIdCache       *timeoutmap.TimeoutMap
	CloudInitFile      *cloudinitfile.File
	CertificateFactory *certs.CertificateFactory
	BootLimiter        *InstanceBootLimiter
	BootImageSpec      cloud.BootImageSpec
}

func (*NodeController) Dump

func (c *NodeController) Dump() []byte

func (*NodeController) ResumeWaits

func (c *NodeController) ResumeWaits()

When the server restarts, there may be old nodes that were starting or restarting and that we were polling in order to know when to change their state to available. ResumeWaits restarts those polls.

func (*NodeController) Start

func (c *NodeController) Start(quit <-chan struct{}, wg *sync.WaitGroup)

func (*NodeController) StopCreatingNodes

func (c *NodeController) StopCreatingNodes()

If the controller was shut down while creating a node, that node will remain in the creating state indefinitely, since we don't have an instanceID for it. Kill it here.

type NodeControllerConfig

type NodeControllerConfig struct {
	PoolInterval           time.Duration
	HeartbeatInterval      time.Duration
	ReaperInterval         time.Duration
	ItzoVersion            string
	ItzoURL                string
	CellConfig             map[string]string
	UseCloudParameterStore bool
	DefaultIAMPermissions  string
}

When configuring these intervals, we want the following constraints to be satisfied:

1. The pool interval should be longer than the heartbeat interval.

2. The heartbeat interval should be longer than the heartbeat client timeout.
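For instance, a configuration satisfying both constraints might look like the sketch below. The durations are illustrative, not defaults, and the heartbeat client timeout is configured on the node client rather than in this struct:

func exampleConfig() NodeControllerConfig {
	return NodeControllerConfig{
		// PoolInterval > HeartbeatInterval, per constraint 1.
		PoolInterval:      10 * time.Second,
		HeartbeatInterval: 5 * time.Second,
		// HeartbeatInterval must in turn exceed the heartbeat client
		// timeout (constraint 2), which lives on the node client.
		ReaperInterval: 30 * time.Second,
	}
}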

type NodeDispenser

type NodeDispenser struct {
	NodeRequestChan chan NodeRequest
	NodeReturnChan  chan NodeReturn
}

func NewNodeDispenser

func NewNodeDispenser() *NodeDispenser

func (*NodeDispenser) RequestNode

func (e *NodeDispenser) RequestNode(requestingPod api.Pod) NodeReply

We pass in a copy of the requesting pod for safety reasons.

func (*NodeDispenser) ReturnNode

func (e *NodeDispenser) ReturnNode(nodeName string, unused bool)
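A sketch of the request/return round trip from the caller's side. The work done with the node is elided, the handling of NoBinding (see NodeReply below) is illustrative, and api.Node is assumed to expose a Name field:

func exampleDispense(d *NodeDispenser, pod api.Pod) {
	// RequestNode takes the pod by value, so the dispenser works on a copy.
	reply := d.RequestNode(pod)
	if reply.NoBinding {
		// No node can currently be created for this pod; retry later.
		return
	}

	// ... dispatch the pod to reply.Node ...

	// Hand the node back when done; unused=false means it ran a workload.
	d.ReturnNode(reply.Node.Name, false)
}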

type NodeReply

type NodeReply struct {
	Node *api.Node
	// When there's no binding for a pod, either the pod is new or
	// something went wrong with the pod spec; possibly it was
	// created by a replicaSet and we can't satisfy the placement
	// spec of the pod. We use NoBinding to signal that we can't
	// currently create a node for the pod. If a pod remains
	// unbound for too long, we can act accordingly (e.g. for a
	// replicaSet pod, we kill the pod).
	NoBinding bool
}

type NodeRequest

type NodeRequest struct {
	ReplyChan chan NodeReply
	// contains filtered or unexported fields
}

type NodeReturn

type NodeReturn struct {
	NodeName string
	Unused   bool
}

type ScalingAlgorithm

type ScalingAlgorithm interface {
	// todo, figure out what we really need to pass in
	// and return value will likely get much more complex
	//Compute(nodes []*api.Node, pods []*api.Pod) ([]*api.Node, int)
	Compute(nodes []*api.Node, pods []*api.Pod) ([]*api.Node, []*api.Node, map[string]string)
}

type StandbyNodeSpec

type StandbyNodeSpec struct {
	InstanceType string `json:"instanceType"`
	Count        int    `json:"count"`
	Spot         bool   `json:"spot"`
	Dedicated    bool   `json:"dedicated"`
}

Used externally in provider.yaml to specify a pool of buffered (standby) nodes.
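As a sketch, a pool of two warm on-demand nodes would be built with fields like these (the instance type and count are illustrative); marshaling shows the serialized keys per the struct's JSON tags, assuming encoding/json and fmt are imported:

func exampleStandbySpec() {
	specs := []StandbyNodeSpec{
		// Keep two warm on-demand t3.nano nodes in the standby pool.
		{InstanceType: "t3.nano", Count: 2, Spot: false, Dedicated: false},
	}
	out, _ := json.Marshal(specs)
	fmt.Println(string(out))
	// [{"instanceType":"t3.nano","count":2,"spot":false,"dedicated":false}]
}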

type StatusUpdater

type StatusUpdater interface {
	UpdateStatus(*api.Node) (*api.Node, error)
}
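Because StatusUpdater is a one-method interface, a trivial fake makes code that depends on it (such as BindingNodeScaler) easy to unit test. A sketch; the fake type is hypothetical:

type fakeStatusUpdater struct{}

func (f fakeStatusUpdater) UpdateStatus(n *api.Node) (*api.Node, error) {
	// Accept every update unchanged.
	return n, nil
}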
