Multicluster-Scheduler

Multicluster-scheduler is a system of Kubernetes controllers—a scheduler and its agents—that intelligently schedules workloads across clusters. It differs from Federation v2 in several ways:

  • Multi-cluster workloads can be declared in any member cluster and/or the scheduler's control plane (the Kubernetes API of the cluster hosting the scheduler), allowing greater architectural flexibility; e.g., you don't have to give users direct access to the scheduler's cluster.
  • The agents push observations and pull scheduling decisions to/from the scheduler's control plane. The scheduler reconciles scheduling decisions with observations, but never calls the Kubernetes APIs of the member clusters.

Multicluster-scheduler implements a basic spread scheduler, which can be replaced by more advanced implementations leveraging pod, node, and node-pool observations. For example, Admiralty's hybrid and multicloud scheduler as a service, built upon multicluster-scheduler, supports strategies including burst-to-cloud and real-time arbitrage (cost optimization).

How it Works

Multicluster-scheduler includes custom resource definitions (CRDs) for:

  • node pools, which hold min/max node counts and pricing information;
  • multi-cluster workloads (e.g., multi-cluster deployments), which are multicluster-scheduler's user-facing API;
  • observations, images of observed node pools, nodes, pods, and multi-cluster workloads;
  • and decisions, images of desired single-cluster workloads.

Note: For now, multicluster-scheduler defines its own multi-cluster workload API, but we're considering integrations with Federation v2's API and/or Crossplane.

Node pools and multi-cluster workloads are defined in each member cluster, whereas observations and decisions are only defined in the scheduler's control plane.
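
Once the scheduler and agents are installed (see Installation below), you can discover these custom resource types with kubectl. A minimal sketch, assuming the CRD groups all contain "admiralty" (as the multicluster.admiralty.io group used in the Example section does):

# List the custom resource types registered by multicluster-scheduler and its agents.
# The grep filter is an assumption about the API group names; adjust it if needed.
kubectl api-resources | grep admiralty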

The custom resources are controlled by two managers:

  • The agent, deployed in each member cluster, manages three controllers:
    • The node pool controller automatically creates node pool objects in the agent's cluster. In GKE and AKS, it uses the cloud.google.com/gke-nodepool or agentpool label, respectively; in the absence of those labels, a default node pool object is created. Min/max node counts and pricing information can be updated by the user or controlled by other tools. Custom node pool objects can also be created using label selectors; a sketch of such an object follows this list.
    • The observations controller, a multi-cluster controller, watches node pools, nodes, pods, and multi-cluster workloads in the agent's cluster and reconciles corresponding observations in the scheduler's control plane.
    • The decisions controller, another multi-cluster controller, watches scheduling decisions in the scheduler's control plane and reconciles corresponding single-cluster workloads in the agent's cluster.
  • The scheduler, which can be deployed in any cluster, manages an eponymous controller that watches observations and makes scheduling decisions in the scheduler's cluster. For example, it creates one or several deployment decisions from a multi-cluster deployment observation, using node pool, node, and pod observations to inform those decisions. The scheduler doesn't push anything to the member clusters.
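
For illustration, a custom node pool object for, say, preemptible nodes might look like the sketch below, written in the same heredoc style as the Example section. The apiVersion is assumed to match the group used by MulticlusterDeployment, and the spec field names (labelSelector, minCount, maxCount, pricePerUnitHour) are hypothetical; check the NodePool type in pkg/apis/multicluster/v1alpha1 for the actual schema.

cat <<EOF | kubectl apply -f -
apiVersion: multicluster.admiralty.io/v1alpha1
kind: NodePool
metadata:
  name: preemptible
spec:
  # Hypothetical field names; see the v1alpha1 NodePool type for the real schema.
  labelSelector:
    matchLabels:
      cloud.google.com/gke-preemptible: "true"
  minCount: 0
  maxCount: 10
  pricePerUnitHour: 0.01
EOF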

Getting Started

We assume that you are a cluster admin for two clusters, associated with, e.g., the contexts "cluster1" and "cluster2" in your kubeconfig. We're going to install a basic scheduler in cluster1 and agents in cluster1 and cluster2, but the scripts can easily be adapted for other configurations. Then, we will deploy a multi-cluster NGINX.

CLUSTER1=cluster1 # change me
CLUSTER2=cluster2 # change me
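
If you want to double-check that both contexts exist in your kubeconfig before proceeding:

kubectl config get-contexts "$CLUSTER1" "$CLUSTER2"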

Note: with Admiralty's hybrid and multicloud scheduler as a service, which is basically a managed scheduler with advanced features, you only need to install the agent and a Secret. To install the full multicluster-scheduler, please read on.

Installation
Scheduler

Choose a cluster to host the scheduler, then download the basic scheduler's manifest and install it:

SCHEDULER_CLUSTER_NAME="$CLUSTER1"
RELEASE_URL=https://github.com/admiraltyio/multicluster-scheduler/releases/download/v0.1.0
kubectl config use-context "$SCHEDULER_CLUSTER_NAME"
kubectl apply -f "$RELEASE_URL/scheduler.yaml"
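
As a quick sanity check, verify that the scheduler's pod is running. The exact namespace and names depend on scheduler.yaml, so the filter below is intentionally loose:

kubectl get pods --all-namespaces | grep scheduler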
Federation

In the same cluster as the scheduler, create a namespace for each cluster federation and, in it, a service account and role binding for each member cluster. The scheduler's cluster can be a member too.

FEDERATION_NAMESPACE=foo
kubectl create namespace "$FEDERATION_NAMESPACE"
MEMBER_CLUSTER_NAMES=("$CLUSTER1" "$CLUSTER2") # Add as many member clusters as you want.
for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    kubectl create serviceaccount "$CLUSTER_NAME" \
        --namespace "$FEDERATION_NAMESPACE"
    kubectl create rolebinding "$CLUSTER_NAME" \
        --namespace "$FEDERATION_NAMESPACE" \
        --serviceaccount "$FEDERATION_NAMESPACE:$CLUSTER_NAME" \
        --clusterrole multicluster-scheduler-member
done
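
You can verify that the federation namespace now contains a service account and role binding per member cluster:

kubectl --namespace "$FEDERATION_NAMESPACE" get serviceaccounts,rolebindings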
Multicluster-Service-Account

If you're already running multicluster-service-account in each member cluster, and the scheduler's cluster is known to them as $SCHEDULER_CLUSTER_NAME, you can skip this step and install the agents. Otherwise, read on.

Download the multicluster-service-account manifest and install it in each member cluster:

MCSA_RELEASE_URL=https://github.com/admiraltyio/multicluster-service-account/releases/download/v0.2.0
for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    kubectl --context "$CLUSTER_NAME" apply -f "$MCSA_RELEASE_URL/install.yaml"
done
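
Optionally, check that the multicluster-service-account components are up in each member cluster; the grep pattern is a guess at how install.yaml names its pods, so adjust it if nothing matches:

for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    kubectl --context "$CLUSTER_NAME" get pods --all-namespaces | grep service-account
done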

Then, download the kubemcsa binary and run the bootstrap command to allow member clusters to import service accounts from the scheduler's cluster:

OS=linux # or darwin (i.e., OS X) or windows
ARCH=amd64 # if you're on a different platform, you must know how to build from source
curl -Lo kubemcsa "$MCSA_RELEASE_URL/kubemcsa-$OS-$ARCH"
chmod +x kubemcsa
sudo mv kubemcsa /usr/local/bin

for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    kubemcsa bootstrap "$CLUSTER_NAME" "$SCHEDULER_CLUSTER_NAME"
done
Agent

Download the agent manifest and install it in each member cluster:

curl -LO $RELEASE_URL/agent.yaml
for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    sed -e "s/SCHEDULER_CLUSTER_NAME/$SCHEDULER_CLUSTER_NAME/g" \
        -e "s/FEDERATION_NAMESPACE/$FEDERATION_NAMESPACE/g" \
        -e "s/CLUSTER_NAME/$CLUSTER_NAME/g" \
        agent.yaml > "agent-$CLUSTER_NAME.yaml"
    kubectl --context "$CLUSTER_NAME" apply -f "agent-$CLUSTER_NAME.yaml"
done

Check that node pool objects have been created in the agents' clusters and that observations appear in the scheduler's control plane:

for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    kubectl --context "$CLUSTER_NAME" get nodepools
done
kubectl --namespace "$FEDERATION_NAMESPACE" get nodepoolobservations
kubectl --namespace "$FEDERATION_NAMESPACE" get nodeobservations
kubectl --namespace "$FEDERATION_NAMESPACE" get podobservations
# or
kubectl --namespace "$FEDERATION_NAMESPACE" get observations # by category
Example

Deploy NGINX as a MulticlusterDeployment in any of the member clusters, e.g., cluster2:

cat <<EOF | kubectl --context "$CLUSTER2" apply -f -
apiVersion: multicluster.admiralty.io/v1alpha1
kind: MulticlusterDeployment
metadata:
  name: nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
EOF

Check that an observation of the multi-cluster deployment has been created in the scheduler's control plane, as well as one deployment decision for each member cluster, hence one deployment in each member cluster:

kubectl --namespace "$FEDERATION_NAMESPACE" get multiclusterdeploymentobservations
kubectl --namespace "$FEDERATION_NAMESPACE" get deploymentdecisions
for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    kubectl --context "$CLUSTER_NAME" get deployments
done
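
To see the scheduler react to a change, you could, for example, bump the multi-cluster deployment's replica count and watch the per-cluster deployments follow. The resource name multiclusterdeployments is an assumption (the default plural for the MulticlusterDeployment kind):

kubectl --context "$CLUSTER2" patch multiclusterdeployments nginx \
    --type merge --patch '{"spec":{"replicas":10}}'
for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    kubectl --context "$CLUSTER_NAME" get deployments
done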

Bring Your Own Scheduler

You can easily implement your own multi-cluster scheduler using the Scheduler interface. For now, only deployments are supported. Here's a basic manager scaffolding, where the custom scheduler struct, which implements the Scheduler interface, is passed as an argument to the controller:

package main

import (
  "log"

  "admiralty.io/multicluster-controller/pkg/cluster"
  "admiralty.io/multicluster-controller/pkg/manager"
  "admiralty.io/multicluster-scheduler/pkg/apis/multicluster/v1alpha1"
  "admiralty.io/multicluster-scheduler/pkg/controllers/schedule"
  "admiralty.io/multicluster-service-account/pkg/config"
  appsv1 "k8s.io/api/apps/v1"
  corev1 "k8s.io/api/core/v1"
  _ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
  "k8s.io/sample-controller/pkg/signals"
)

func main() {
  cfg, _, err := config.ConfigAndNamespace()
  if err != nil {
    log.Fatalf("cannot load config: %v", err)
  }
  cl := cluster.New("", cfg, cluster.Options{})

  co, err := schedule.NewController(cl, &scheduler{})
  if err != nil {
    log.Fatalf("cannot create scheduler controller: %v", err)
  }

  m := manager.New()
  m.AddController(co)

  if err := m.Start(signals.SetupSignalHandler()); err != nil {
    log.Fatalf("while or after starting manager: %v", err)
  }
}

type scheduler struct {
    // ... e.g., structured observations
}

func (s *scheduler) SetNodePool(np *v1alpha1.NodePool) {
    // ... e.g., store to inform scheduling decisions
    // Note: the controller has added a ClusterName to the object's metadata.
}

func (s *scheduler) SetNode(n *corev1.Node) {
    // ... e.g., store to inform scheduling decisions
    // Note: the controller has added a ClusterName to the object's metadata.
}
func (s *scheduler) SetPod(p *corev1.Pod) {
    // ... e.g., store to inform scheduling decisions
    // Note: the controller has added a ClusterName to the object's metadata.
}

func (s *scheduler) Schedule(mcd *v1alpha1.MulticlusterDeployment) ([]*appsv1.Deployment, error) {
    // ... Given a MulticlusterDeployment, produce a slice of Deployments,
    // where the ClusterName in each object's metadata MUST be provided.
    return nil, nil // placeholder so the scaffolding compiles; replace with real scheduling logic
}

API Reference

https://godoc.org/admiralty.io/multicluster-scheduler/

or

go get admiralty.io/multicluster-scheduler
godoc -http=:6060

then http://localhost:6060/pkg/admiralty.io/multicluster-scheduler/

Directories

Path                            Synopsis
cmd
pkg
pkg/apis                        Package apis contains Kubernetes API groups.
pkg/apis/multicluster           Package multicluster contains multicluster API versions.
pkg/apis/multicluster/v1alpha1  Package v1alpha1 contains API Schema definitions for the multicluster v1alpha1 API group. +k8s:openapi-gen=true +k8s:deepcopy-gen=package,register +k8s:conversion-gen=admiralty.io/multicluster-scheduler/pkg/apis/multicluster +k8s:defaulter-gen=TypeMeta +groupName=multicluster.admiralty.io
