scalesim

README

scaler-simulator

This project is WIP - DO NOT TRY TILL RELEASE

Scaling Simulator that determines which garden worker pool must be scaled to host unschedulable pods

Setup

  1. Ensure you are using Go version 1.22. Use go version to check your version.
  2. Run ./hack/setup.sh
    1. This will generate a launch.env file in the project dir
    2. Ex: ./hack/setup.sh -p scalesim # sets up scalesim for sap-landscape-dev (default) and the scalesim cluster project
    3. Ex: ./hack/setup.sh -l staging -p scalesim # sets up scalesim for sap-landscape-staging and the scalesim cluster project
  3. Take a look at the generated launch.env and change params to your liking.
  4. Source the launch.env file using the command below (only necessary once per terminal session)
    1. set -o allexport && source launch.env && set +o allexport
  5. Run the simulation server: go run cmd/scalesim/main.go
  6. The KUBECONFIG for the simulated control plane should be generated at /tmp/scalesim-kubeconfig.yaml
    1. export KUBECONFIG=/tmp/scalesim-kubeconfig.yaml
    2. kubectl get ns
Executing within the GoLand/IntelliJ IDE
  1. Install the EnvFile plugin.
  2. There is a run configuration already checked-in at .idea/.idea/runConfigurations/LaunchSimServer.xml
    1. This will automatically source the generated launch.env leveraging the plugin
    2. You should be able to execute using Run > LaunchSimServer

Usage

Op Commands
Sync Virtual Cluster with Shoot Cluster

curl -XPOST localhost:8080/op/sync/<myShoot>

Clear Virtual Cluster

curl -XDELETE localhost:8080/op/virtual-cluster

Scenario Commands
Execute Scenario A

curl -XPOST localhost:8080/scenarios/A
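
The same endpoints can also be driven programmatically. Below is a minimal Go sketch using only the standard library; it assumes the simulation server from the Setup section is running on localhost:8080, and the shoot name is a placeholder for your own shoot.

package main

import (
	"fmt"
	"log"
	"net/http"
)

// call sends a request to the local simulation server and prints the response status.
func call(method, path string) error {
	req, err := http.NewRequest(method, "http://localhost:8080"+path, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	fmt.Println(method, path, "->", resp.Status)
	return nil
}

func main() {
	shoot := "my-shoot" // placeholder: substitute your shoot name, as in /op/sync/<myShoot>
	steps := []struct{ method, path string }{
		{http.MethodPost, "/op/sync/" + shoot},     // sync virtual cluster with the shoot
		{http.MethodPost, "/scenarios/A"},          // execute scenario A
		{http.MethodDelete, "/op/virtual-cluster"}, // clear the virtual cluster afterwards
	}
	for _, s := range steps {
		if err := call(s.method, s.path); err != nil {
			log.Fatal(err)
		}
	}
}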

Objectives

TODO: REFINE THE BELOW

Given a garden shoot configured with different worker pools and Pod(s) to be deployed on the shoot cluster, the simulator will report the following advice:

  1. In case scale-up is needed, the simulator will recommend which worker pool must be scaled-up to host the unschedulable pod(s).
  2. The simulator will recommend which node belonging to which worker pool will host the Pod(s).
  3. ?? Then a check will be made against the real shoot cluster on which the Pods will be deployed. The simulator's advice will be verified against real-world node scale-up and pod assignment.

The above will be repeated for different worker pool and Pod specs representing various simulation scenarios

Simulator Mechanics

The Simulator works by replicating the shoot cluster into its own virtual cluster, maintaining an independent copy of the api-server and scheduler. The engine then executes various simulation scenarios.

graph LR
    engine--1:GetShootWorkerPoolAndClusterData-->ShootCluster
    subgraph ScalerSimulator
       engine--2:PopulateVirtualCluster-->apiserver
       engine--3:RunSimulation-->simulation
       simulation--DeployPods-->apiserver
       simulation--LaunchNodesIfPodUnschedulable-->apiserver
       simulation--QueryAssignedNode-->apiserver
       scheduler--AssignPodToNode-->apiserver
       simulation--ScalingRecommendation-->advice
    end
    advice[(ScalingRecommendation)]
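
A rough Go sketch of this flow using the interfaces documented below (Engine and VirtualClusterAccess). Engine construction, imports, and the scalesim module path are omitted, and the pod YAML path is a placeholder.

// runFlow mirrors the diagram above: populate the virtual cluster from the shoot,
// deploy pods, and query the assignments made by the virtual kube-scheduler.
func runFlow(ctx context.Context, eng scalesim.Engine, shootName string) error {
	// Steps 1 and 2: read shoot worker pool/cluster data and populate the virtual cluster.
	if err := eng.SyncVirtualNodesWithShoot(ctx, shootName); err != nil {
		return err
	}
	virtual := eng.VirtualClusterAccess()
	// Step 3: run the simulation by deploying pods into the virtual cluster...
	if err := virtual.CreatePodsFromYaml(ctx, "pods/app.yaml", 3); err != nil { // placeholder YAML path
		return err
	}
	// ...then query the node assignments made by the virtual scheduler.
	pods, err := virtual.ListPods(ctx)
	if err != nil {
		return err
	}
	for _, p := range pods {
		fmt.Printf("pod %s -> node %q\n", p.Name, p.Spec.NodeName)
	}
	return nil
}
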
Demo Simulation Scenarios (4th March)

Other simulations are Work In Progress at the moment.

Scenario with daemon set pods and virtual nodes with reserved space

Simple Worker Pool with m5.large (vCPU:2,8GB).

graph TB
 subgraph WorkerPool-P1
  SpecB["machineType: m5.large\n(vCPU:2, 8GB)\nmin:1,max:5"]
 end
  1. We taint the existing nodes in the real shoot cluster.
  2. We create replicas num of App pods in the real shoot cluster.
  3. We get the daemon set pods from the real shoot cluster.
  4. We get the unscheduled app pods from the real shoot cluster.
  5. We synchronize the virtual cluster nodes with the real shoot cluster nodes.
  6. We scale all the virtual worker pools till max
    1. Node.Allocatable is now considered.
  7. We deploy the daemon set pods into the virtual cluster.
  8. We deploy the unscheduled application pods into the virtual cluster.
  9. We wait till there are no unscheduled pods or till timeout.
  10. We "Trim" the virtual cluster after the scheduler assigns pods (deleting empty nodes and the daemon set pods on those nodes).
  11. We obtain the Node<->Pod assignments.
  12. We compute the scaling recommendation and print it.
  13. We scale up the real shoot cluster and compare our scale-up recommendation against the shoot's actual scale-up.
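
A hedged Go sketch of these steps using the interfaces documented further below. The scenario name, pod YAML path, and replica count are placeholders, and waiting/timeout handling for the virtual scheduler is elided.

func runDaemonSetScenario(ctx context.Context, eng scalesim.Engine, shootName string, w http.ResponseWriter) error {
	shoot := eng.ShootAccess(shootName)

	// Steps 1-4: taint real nodes, deploy app pods, then collect daemon set and unscheduled pods.
	if err := shoot.TaintNodes(); err != nil {
		return err
	}
	if err := shoot.CreatePods("pods/app.yaml", 10); err != nil { // placeholder pod YAML and replica count
		return err
	}
	dsPods, err := shoot.GetDSPods()
	if err != nil {
		return err
	}
	appPods, err := shoot.GetUnscheduledPods()
	if err != nil {
		return err
	}

	// Steps 5-6: sync virtual nodes with the shoot and scale all virtual worker pools till max.
	if err := eng.SyncVirtualNodesWithShoot(ctx, shootName); err != nil {
		return err
	}
	shootObj, err := shoot.GetShootObj()
	if err != nil {
		return err
	}
	if _, err := eng.ScaleAllWorkerPoolsTillMax(ctx, "daemonset-scenario", shootObj, w); err != nil {
		return err
	}

	// Steps 7-9: deploy daemon set and app pods into the virtual cluster (waiting is elided).
	virtual := eng.VirtualClusterAccess()
	if err := virtual.AddPods(ctx, dsPods...); err != nil {
		return err
	}
	if err := virtual.AddPods(ctx, appPods...); err != nil {
		return err
	}

	// Steps 10-11: trim the virtual cluster, then read the Node<->Pod assignments.
	if err := virtual.TrimCluster(ctx); err != nil {
		return err
	}
	pods, err := virtual.ListPods(ctx)
	if err != nil {
		return err
	}
	for _, p := range pods {
		fmt.Printf("pod %s -> node %q\n", p.Name, p.Spec.NodeName)
	}
	return nil
}
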
Scenario with Declaration based Priority for Worker Pool Scale-Up

This demonstrates preferring one worker pool over others through the simple order of declaration.

3 Worker Pools in decreasing order of resources. We ask the operator to configure the shoot with a declaration-based priority, paying careful attention to each pool's max bound.

graph TB
   subgraph WorkerPool-P3
      SpecC["m5.large\n(vCPU:2, 8GB)\nmin:1,max:2"]
   end
   subgraph WorkerPool-P2
      SpecB["m5.xlarge\n(vCPU:4, 16GB)\nmin:1,max:2"]
   end
   subgraph WorkerPool-P1
      SpecA["m5.2xlarge\n(vCPU:8, 32GB)\nmin:1,max:2"]
   end
  1. We sync the virtual cluster nodes with real shoot cluster nodes.
  2. We deploy podA count of Pod-A's and podB count of Pod-B's.
  3. We go through each worker pool in order of declaration.
    1. We scale the worker pool till max.
    2. We wait for an interval to permit the scheduler to assign pods to nodes.
    3. If there are still un-schedulable Pods we continue to the next worker pool, else break.
  4. We trim the Virtual Cluster after the scheduler finishes.
  5. We obtain the Node<->Pod assignments.
  6. We compute the scaling recommendation and print it.
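
A hedged Go sketch of this declaration-order loop. It takes the worker pools already in declared order, uses an arbitrary wait interval in place of real readiness handling, and treats a pod with an empty Spec.NodeName as unschedulable; the scenario name is a placeholder.

func scaleByDeclarationOrder(ctx context.Context, eng scalesim.Engine, pools []gardencore.Worker, w http.ResponseWriter) error {
	virtual := eng.VirtualClusterAccess()
	for i := range pools {
		// Scale this worker pool till its max in the virtual cluster.
		if _, err := eng.ScaleWorkerPoolTillMax(ctx, "priority-scenario", &pools[i], w); err != nil {
			return err
		}
		// Give the virtual scheduler an interval to assign pods (arbitrary duration).
		time.Sleep(10 * time.Second)

		pods, err := virtual.ListPods(ctx)
		if err != nil {
			return err
		}
		unscheduled := 0
		for _, p := range pods {
			if p.Spec.NodeName == "" {
				unscheduled++
			}
		}
		if unscheduled == 0 {
			break // all pods assigned; lower-priority pools need not be scaled
		}
	}
	// Trim empty nodes before computing the recommendation.
	return virtual.TrimCluster(ctx)
}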

This mechanism ensures that Nodes belonging to the preferred worker pool of higher priority are scaled first, before pools of lower priority.

TODO: We can also enhance this scenario with a simulated back-off when WPs run out of capacity.

Scenario: Tainted Worker Pools.
graph TB
 subgraph WP-B
  SpecB["machineType: m5.large\nmin:1,max:2"]
 end
 subgraph WP-A
  SpecA["machineType: m5.large\nmin:1,max:2,Taint:foo=bar:NoSchedule"]
 end
  • The first worker pool is tainted with NoSchedule.
  • 2 Pod specs, X and Y, are created: one with a toleration for the taint and one without, respectively.
Step-A
  1. Replicas of Pod-X are deployed which cross the capacity of the tainted node belonging to WP-A.
  2. The simulation should advise scaling WP-A and assign the Pods to the tainted nodes of WP-A.
Step-B
  1. More replicas of Pod-X are created which cannot fit into WP-A since it has reached its max.
  2. The simulator should report that WP-A's max is exceeded, the pod replicas remain unschedulable, and no other WP should be scaled.
Step-C
  1. Many replicas of Pod-Y (the spec without toleration) are deployed which cross the capacity of the existing node in WP-B.
  2. The simulation should scale WP-B and assign the Pods to nodes of WP-B.
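
Expressed with the standard k8s.io/api/core/v1 types (imported as corev1), the taint on WP-A and the toleration that distinguishes Pod-X from Pod-Y look roughly as follows; the pod variables are illustrative and not part of this package.

var (
	// Taint carried by WP-A nodes, per the worker pool spec above.
	wpATaint = corev1.Taint{Key: "foo", Value: "bar", Effect: corev1.TaintEffectNoSchedule}

	// Pod-X tolerates the taint and may therefore be assigned to the tainted WP-A nodes.
	podX = corev1.Pod{Spec: corev1.PodSpec{
		Tolerations: []corev1.Toleration{{
			Key:      "foo",
			Operator: corev1.TolerationOpEqual,
			Value:    "bar",
			Effect:   corev1.TaintEffectNoSchedule,
		}},
	}}

	// Pod-Y carries no toleration, so the scheduler can only place it on the untainted WP-B nodes.
	podY = corev1.Pod{Spec: corev1.PodSpec{}}
)
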
Scenario: Topology Spread Constraints
graph TB
 subgraph WP-A
  SpecB["machineType: m5.large\nmin:1,max:3, zones:a,b,c"]
 end

One existing worker pool with 3 assigned zones. There is one node started in the first zone, a.

Pod-X has a spec with replicas: 3 and topologySpreadConstraints with maxSkew: 1 and whenUnsatisfiable: DoNotSchedule.

Step-A
  1. Deploy Pod-X, mandating distribution of each replica to a separate zone.
  2. The simulator should recommend scaling Nodes for zones b and c.
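
The constraint on Pod-X can be sketched with the standard core/v1 and meta/v1 types; the topology key and the app label used for the selector are assumptions for illustration.

var podX = corev1.Pod{
	ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": "pod-x"}}, // illustrative label
	Spec: corev1.PodSpec{
		TopologySpreadConstraints: []corev1.TopologySpreadConstraint{{
			MaxSkew:           1,
			TopologyKey:       "topology.kubernetes.io/zone", // spread replicas across zones a, b, c
			WhenUnsatisfiable: corev1.DoNotSchedule,
			LabelSelector:     &metav1.LabelSelector{MatchLabels: map[string]string{"app": "pod-x"}},
		}},
	},
}

With maxSkew: 1 and only one node running in zone a, the scheduler cannot place all 3 replicas there without violating the constraint, which is why the simulator should recommend scaling nodes in zones b and c.
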
Scenario: High Load with large number of Diff Pods

Check out how much time such a simulation of node scale-up would take here.

  • 400+pods
Scenario: Worker Pool Expansion By Priority
  • Scale up the WP in order of priority until its max is reached, then move to the next WP in priority.
  • Analogous to our CA priority expander.

PROBLEM:

  • We need a better algorithm than launching virtual nodes one-by-one across pools with priority.
  • We need to measure how fast this approach is using virtual nodes with a large number of Pods and Worker Pools.
  • TODO: Look into whether kube-scheduler has recommendation advice.
Scenario: Workload Redistribution (STRETCH)
  • Karpenter-like mechanics
Simple Scale Down of empty node(s).

We have a worker pool with started nodes and min: 0.

Step-A
  1. All Pods are un-deployed.
  2. After scaleDownThreshold time, the WP should be scaled down to min.
Scale Down of un-needed node (STRETCH)

This requires resource utilization computation and we won't do this for now.

TODO: Maddy will describe this.

WP Out of Capacity (STRETCH)

TODO: describe me

MoM 14th

Vedran's concerns:

  • Load with large number of Pods and Pools
  • Reduce computational weight when there is a priority expander. Check performance.
  • How to determine the scheduler ran into an error and failed the assignment.
  • How easy is it to consume the result of the kube-scheduler in case there is no assigned node.
  • The machine selector approach may not be computationally scalable ??
    • In order to be computationally feasible we need the node priority scores from the scheduler.
Prep for Vedran

What to demo for Vedran today?

  • Daemon set + allocatable is taken care of.
  • Declaration-based priority -

We will let him know that we will take up: a) machine cost minimization b) machine resource minimization c) performance load test d) stretch: simple scale-down and then wind up the POC

Documentation

Overview

Package scalesim contains the API interface and structural types for the scaler simulator project

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AllPricing

type AllPricing struct {
	Results []InstancePricing `json:"results"`
}

type Engine

type Engine interface {
	http.Handler
	VirtualClusterAccess() VirtualClusterAccess
	ShootAccess(shootName string) ShootAccess
	SyncVirtualNodesWithShoot(ctx context.Context, shootName string) error
	ScaleWorkerPoolsTillMaxOrNoUnscheduledPods(ctx context.Context, scenarioName string, since time.Time, shoot *gardencore.Shoot, w http.ResponseWriter) (int, error)
	ScaleAllWorkerPoolsTillMax(ctx context.Context, scenarioName string, shoot *gardencore.Shoot, w http.ResponseWriter) (int, error)
	ScaleWorkerPoolsTillNumZonesMultPoolsMax(ctx context.Context, scenarioName string, shoot *gardencore.Shoot, w http.ResponseWriter) (int, error)
	ScaleWorkerPoolTillMax(ctx context.Context, scenarioName string, pool *gardencore.Worker, w http.ResponseWriter) (int, error)
}

Engine is the primary simulation driver facade of the scaling simulator. Since Engine registers routes for driving simulation scenarios, it extends http.Handler
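
Because Engine embeds http.Handler, a constructed engine can be served directly as the root handler; a minimal sketch, where engine construction is not shown and port 8080 matches the curl examples in the README:

func serve(eng scalesim.Engine) error {
	// The engine registers its own routes (such as /op/... and /scenarios/...),
	// so it can be passed straight to the HTTP server.
	return http.ListenAndServe(":8080", eng)
}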

type InstancePricing

type InstancePricing struct {
	InstanceType string       `json:"instance_type"`
	VCPU         float64      `json:"vcpu"`
	Memory       float64      `json:"memory"`
	EDPPrice     PriceDetails `json:"edp_price"`
}

type NodePodAssignment

type NodePodAssignment struct {
	NodeName     string   `json:"nodeName"`
	ZoneName     string   `json:"zoneName"`
	PoolName     string   `json:"poolName"`
	InstanceType string   `json:"instanceType"`
	PodNames     []string `json:"podNames"`
}

func (NodePodAssignment) String

func (n NodePodAssignment) String() string

type NodePool

type NodePool struct {
	Name        string
	Zones       []string
	Max         int32
	Current     int32
	MachineType string
}

NodePool describes a worker pool in the shoot.

type NodeRunResult

type NodeRunResult struct {
	NodeName         string
	Pool             *gardencore.Worker
	WasteRatio       float64
	UnscheduledRatio float64
	CostRatio        float64
	CumulativeScore  float64
	//NumAssignedPods  int
	NumAssignedPodsToNode int
	NumAssignedPodsTotal  int
}

func (NodeRunResult) String

func (n NodeRunResult) String() string

type NodeRunResults

type NodeRunResults map[string]NodeRunResult

func (NodeRunResults) GetTotalAssignedPods

func (ns NodeRunResults) GetTotalAssignedPods() int

func (NodeRunResults) GetWinner

func (ns NodeRunResults) GetWinner() NodeRunResult

type PriceDetails

type PriceDetails struct {
	PayAsYouGo    float64 `json:"pay_as_you_go"`
	Reserved1Year float64 `json:"ri_1_year"`
	Reserved3Year float64 `json:"ri_3_years"`
}
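
The json tags on AllPricing, InstancePricing, and PriceDetails imply a pricing document with a top-level results array. A hedged decoding sketch using encoding/json; the file path is a placeholder and imports are elided.

func loadPricing(path string) (*scalesim.AllPricing, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var pricing scalesim.AllPricing
	if err := json.Unmarshal(data, &pricing); err != nil {
		return nil, err
	}
	// Each entry describes one instance type with its vCPU, memory, and EDP price details.
	for _, p := range pricing.Results {
		fmt.Printf("%s: vcpu=%.0f memory=%.0f pay-as-you-go=%.4f\n",
			p.InstanceType, p.VCPU, p.Memory, p.EDPPrice.PayAsYouGo)
	}
	return &pricing, nil
}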

type Recommendation

type Recommendation struct {
	WorkerPoolName string
	Replicas       int
	Cost           float64
	Waste          resource.Quantity
	Allocatable    resource.Quantity
}

type Recommendations

type Recommendations map[string]*Recommendation

func (Recommendations) String

func (r Recommendations) String() string

type Recommender

type Recommender struct {
	Engine          Engine
	ScenarioName    string
	ShootName       string
	StrategyWeights StrategyWeights
	LogWriter       http.ResponseWriter
}

type ScalerRecommendations

type ScalerRecommendations map[string]int

func (ScalerRecommendations) String

func (s ScalerRecommendations) String() string

type Scenario

type Scenario interface {
	http.Handler
	Description() string
	// Name is the name of this scenario
	Name() string
	//ShootName is the name of the shoot that the scenario executes against
	ShootName() string
}

Scenario represents a scaling simulation scenario. Each scenario is invocable by an HTTP endpoint and hence extends http.Handler
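
A minimal sketch of what a Scenario implementation could look like; the concrete scenarios in this repository may be structured differently, and the type name, shoot name, and description below are illustrative.

type scenarioA struct {
	engine scalesim.Engine
}

func (s *scenarioA) Name() string        { return "A" }
func (s *scenarioA) ShootName() string   { return "my-shoot" } // placeholder shoot name
func (s *scenarioA) Description() string { return "scales worker pools until no unschedulable pods remain" }

// ServeHTTP satisfies http.Handler so the engine can route /scenarios/A to this scenario.
func (s *scenarioA) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if err := s.engine.SyncVirtualNodesWithShoot(r.Context(), s.ShootName()); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	// ...remaining scenario steps against the virtual cluster go here...
}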

type ShootAccess

type ShootAccess interface {
	// ProjectName returns the project name that the shoot belongs to
	ProjectName() string

	// GetShootObj returns the shoot object describing the shoot cluster
	GetShootObj() (*gardencore.Shoot, error)

	// GetNodes returns slice of nodes of the shoot cluster
	GetNodes() ([]*corev1.Node, error)

	// GetUnscheduledPods returns slice of unscheduled pods of the shoot cluster
	GetUnscheduledPods() ([]corev1.Pod, error)

	GetDSPods() ([]corev1.Pod, error)

	// GetMachineDeployments returns slice of machine deployments of the shoot cluster
	GetMachineDeployments() ([]*machinev1alpha1.MachineDeployment, error)

	// ScaleMachineDeployment scales the given machine deployment to the given number of replicas
	ScaleMachineDeployment(machineDeploymentName string, replicas int32) error

	// CreatePods creates the given slice of k8s Pods in the shoot cluster
	CreatePods(filePath string, replicas int) error

	TaintNodes() error

	UntaintNodes() error

	DeleteAllPods() error

	CleanUp() error
}

ShootAccess is a facade to the real-world shoot data and real shoot cluster
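
A short usage sketch that reads real shoot state through ShootAccess; imports and error details are trimmed.

func reportShoot(access scalesim.ShootAccess) error {
	nodes, err := access.GetNodes()
	if err != nil {
		return err
	}
	unscheduled, err := access.GetUnscheduledPods()
	if err != nil {
		return err
	}
	fmt.Printf("project=%s nodes=%d unscheduled-pods=%d\n",
		access.ProjectName(), len(nodes), len(unscheduled))
	return nil
}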

type StrategyWeights

type StrategyWeights struct {
	LeastWaste float64
	LeastCost  float64
}

type VirtualClusterAccess

type VirtualClusterAccess interface {
	// KubeConfigPath gets the path to the kubeconfig.yaml file that can be used by kubectl to connect to this virtual cluster
	KubeConfigPath() string

	// AddNodes adds the given slice of k8s Nodes to the virtual cluster
	AddNodes(context.Context, ...*corev1.Node) error

	// RemoveAllTaintsFromVirtualNodes removes the NoSchedule taint from all nodes in the virtual cluster
	RemoveAllTaintsFromVirtualNodes(context.Context) error

	// RemoveAllTaintsFromVirtualNode removes the NoSchedule taint from the given node in the virtual cluster
	RemoveAllTaintsFromVirtualNode(context.Context, string) error

	RemoveTaintFromVirtualNodes(ctx context.Context, taintKey string) error

	RemoveTaintFromVirtualNode(ctx context.Context, nodeName, taintKey string) error

	// AddTaintToNode adds the NoSchedule taint to the given node in the virtual cluster
	AddTaintToNode(context.Context, *corev1.Node) error

	// CreatePods creates the given slice of k8s Pods in the virtual cluster
	CreatePods(context.Context, string, ...corev1.Pod) error

	// AddPods adds pods in the virtual cluster
	AddPods(context.Context, ...corev1.Pod) error

	CreatePodsWithNodeAndScheduler(ctx context.Context, schedulerName string, nodeName string, pods ...corev1.Pod) error

	// CreatePodsFromYaml loads the pod yaml at the given podYamlPath and creates Pods for given number of replicas.
	CreatePodsFromYaml(ctx context.Context, podYamlPath string, replicas int) error

	// ApplyK8sObject applies all Objects into the virtual cluster
	ApplyK8sObject(context.Context, ...runtime.Object) error

	// ClearAll clears all k8s objects from the virtual cluster.
	ClearAll(ctx context.Context) error

	// ClearNodes  clears all nodes from the virtual cluster
	ClearNodes(context.Context) error

	// UpdateNodes updates nodes in the virtual cluster
	UpdateNodes(ctx context.Context, nodes ...corev1.Node) error

	// ClearPods clears all pods from the virtual cluster
	ClearPods(context.Context) error

	// Shutdown shuts down all components of the virtual cluster. Logs all errors encountered during shutdown
	Shutdown()

	GetPod(ctx context.Context, fullName types.NamespacedName) (*corev1.Pod, error)
	GetNode(ctx context.Context, namespaceName types.NamespacedName) (*corev1.Node, error)
	ListEvents(cts context.Context) ([]corev1.Event, error)
	ListNodes(ctx context.Context) ([]corev1.Node, error)
	ListNodesInNodePool(ctx context.Context, nodePoolName string) ([]corev1.Node, error)

	// ListPods lists all pods from all namespaces
	ListPods(ctx context.Context) ([]corev1.Pod, error)

	// TrimCluster deletes unused nodes and daemonset pods on these nodes
	TrimCluster(ctx context.Context) error

	UpdatePods(ctx context.Context, pods ...corev1.Pod) error

	DeleteNode(ctx context.Context, name string) error

	DeletePods(ctx context.Context, pods ...corev1.Pod) error

	DeleteNodesWithMatchingLabels(ctx context.Context, labels map[string]string) error

	DeletePodsWithMatchingLabels(ctx context.Context, labels map[string]string) error

	ListNodesMatchingLabels(ctx context.Context, labels map[string]string) ([]corev1.Node, error)

	ListPodsMatchingLabels(ctx context.Context, labels map[string]string) ([]corev1.Pod, error)
	ListPodsMatchingPodNames(ctx context.Context, namespace string, podNames []string) ([]corev1.Pod, error)

	GetReferenceNode(ctx context.Context, instanceType string) (*corev1.Node, error)
}

VirtualClusterAccess represents access to the virtual cluster managed by the simulator that shadows the real cluster
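
A short usage sketch that shadows real nodes in the virtual cluster, deploys pods, and inspects the resulting assignments; the pod YAML path and replica count are placeholders.

func replayOnVirtualCluster(ctx context.Context, virtual scalesim.VirtualClusterAccess, nodes []*corev1.Node) error {
	// Shadow the real nodes in the virtual cluster.
	if err := virtual.AddNodes(ctx, nodes...); err != nil {
		return err
	}
	// Create candidate pods from a YAML spec.
	if err := virtual.CreatePodsFromYaml(ctx, "pods/app.yaml", 5); err != nil {
		return err
	}
	// Inspect assignments made by the virtual scheduler.
	pods, err := virtual.ListPods(ctx)
	if err != nil {
		return err
	}
	for _, p := range pods {
		if p.Spec.NodeName == "" {
			fmt.Printf("pod %s is still unscheduled\n", p.Name)
		}
	}
	// Drop empty nodes and their daemonset pods before reading the final picture.
	return virtual.TrimCluster(ctx)
}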
