multicluster-scheduler

module
Version: v0.8.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 20, 2020 License: Apache-2.0

README

Multicluster-Scheduler

Multicluster-scheduler is a system of Kubernetes controllers that intelligently schedules workloads across clusters. It is simple to use and simple to integrate with other tools.

  1. Install multicluster-scheduler in each cluster that you want to federate. Configure clusters as sources and/or targets to build a centralized or decentralized topology.
  2. Annotate any pod or pod template (e.g., of a Deployment, Job, or Argo Workflow, among others) in any source cluster with multicluster.admiralty.io/elect="".
  3. Multicluster-scheduler mutates the elected pods into proxy pods scheduled on virtual-kubelet nodes representing target clusters, and creates delegate pods in the remote clusters (actually running the containers).
  4. A feedback loop updates the statuses and annotations of the proxy pods to reflect the statuses and annotations of the delegate pods.
  5. Services that target proxy pods are rerouted to their delegates, replicated across clusters, and annotated with io.cilium/global-service=true to be load-balanced across a Cilium cluster mesh, if installed (other integrations are possible, please tell us about your network setup).

Check out Admiralty's blog post demonstrating how to run an Argo workflow across clusters to combine data from different regions or clouds and better utilize resources, or ITNEXT's blog post describing an integration with Argo CD (scroll down to the relevant section). There are many other use cases: dynamic CDNs, multi-region high availability and disaster recovery, central access control and auditing, cloud bursting, clusters as cattle... Tell us about your use case.

Getting Started

We assume that you are a cluster admin for two clusters, associated with, e.g., the contexts "cluster1" and "cluster2" in your kubeconfig. We're going to install multicluster-scheduler in both clusters, and configure cluster1 as a source and target, and cluster2 as a target only. This topology is typical of a cloud bursting use case. Then, we will deploy a multi-cluster NGINX.

CLUSTER1=cluster1 # change me
CLUSTER2=cluster2 # change me

Note: you can easily create two clusters on your machine with kind.

Installation
Prerequisites

Important! Multicluster-scheduler requires Kubernetes v1.17 or 1.18 (unless you build from source on a fork k8s.io/kubernetes, cf. #19).

Cert-manager v0.11+ must be installed in each cluster:

helm repo add jetstack https://charts.jetstack.io
helm repo update

for CONTEXT in $CLUSTER1 $CLUSTER2
do
  kubectl --context $CONTEXT apply --validate=false -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.12/deploy/manifests/00-crds.yaml
  kubectl --context $CONTEXT create namespace cert-manager
  helm install cert-manager \
    --kube-context $CONTEXT \
    --namespace cert-manager \
    --version v0.12.0 \
    --wait \
    jetstack/cert-manager
done
Optional: Cilium cluster mesh

For cross-cluster service calls, we rely in this guide on a Cilium cluster mesh and global services. If you need this feature, install Cilium and set up a cluster mesh. If you install Cilium later, you may have to restart pods.

Helm

The recommended way to install multicluster-scheduler is with Helm (v3):

helm repo add admiralty https://charts.admiralty.io
helm repo update

kubectl --context "$CLUSTER1" create namespace admiralty
helm install multicluster-scheduler admiralty/multicluster-scheduler \
  --kube-context "$CLUSTER1" \
  --namespace admiralty \
  --version 0.8.2 \
  --set clusterName=c1 \
  --set targetSelf=true \
  --set targets[0].name=c2

kubectl --context "$CLUSTER2" create namespace admiralty
helm install multicluster-scheduler admiralty/multicluster-scheduler \
  --kube-context "$CLUSTER2" \
  --namespace admiralty \
  --version 0.8.2 \
  --set clusterName=c2

Important! At this point, multicluster-scheduler will be stuck at ContainerCreating in cluster1, because it needs a secret from its remote target cluster2, see below. Note: when we move to defining targets at runtime with a CRD, this won't happen.

Service Account Exchange

For cross-cluster source-target communications, i.e., for multicluster-scheduler in a source cluster (here, cluster1) to talk to the Kubernetes API servers of remote target clusters (here, cluster2), we need to create service accounts in the target clusters, extract their tokens as kubeconfig files, and save those files inside secrets in their source clusters.

Note: for a source cluster that targets itself (here, cluster1), multicluster-scheduler simply uses its own service account to talk to its own Kubernetes API server.

In this getting started guide, we use klum to create a service account for cluster1 in cluster2 (there are other ways, contact us while we work on documenting them).

In cluster2, install klum and create a User named c1, bound to the multicluster-scheduler-source cluster role at the cluster scope (you could bind it to one or several namespaces only, and configure multicluster-scheduler with namespaced targets, cf. full installation guide).

kubectl --context "$CLUSTER2" apply -f https://raw.githubusercontent.com/ibuildthecloud/klum/v0.0.1/deploy.yaml

# klum registers the User CRD at runtime so wait a bit, then

cat <<EOF | kubectl --context "$CLUSTER2" apply -f -
kind: User
apiVersion: klum.cattle.io/v1alpha1
metadata:
  name: c1
spec:
  clusterRoles:
    - multicluster-scheduler-source
EOF

The kubemcsa export command of multicluster-service-account makes it easy to prepare a kubeconfig secret. First, install kubemcsa (you don't need to deploy multicluster-service-account):

MCSA_RELEASE_URL=https://github.com/admiraltyio/multicluster-service-account/releases/download/v0.6.1
OS=linux # or darwin (i.e., OS X) or windows
ARCH=amd64 # if you're on a different platform, you must know how to build from source
curl -Lo kubemcsa "$MCSA_RELEASE_URL/kubemcsa-$OS-$ARCH"
chmod +x kubemcsa

Then, run kubemcsa export to generate a template for a secret containing a kubeconfig equivalent to the c1 service account (that was created by klum), and apply the template with kubectl in cluster1:

./kubemcsa export --context "$CLUSTER2" -n klum c1 --as c2 \
  | kubectl --context "$CLUSTER1" -n admiralty apply -f -

Important! kubemcsa export combines a service account token with the Kubernetes API server addresses and associated certificates of the clusters found in your local kubeconfig. The addresses and certificates are routable and valid from your machine, but they need to be routable/valid from pods in the scheduler's cluster as well. For example, if you're using kind, by default the address is 127.0.0.1:SOME_PORT, because kind exposes API servers on random ports of your machine. However, 127.0.0.1 has a different meaning from the scheduler pod. On Linux, you can generate a kubeconfig with kind get kubeconfig --internal that will work from your machine and from pods, because it uses the master node container's IP in the overlay network (e.g., 172.17.0.x), instead of 127.0.0.1. Unfortunately, that won't work on Windows/Mac. In that case, you can either run the commands above from a container, or tweak the result of kubemcsa export before piping it into kubectl apply, to override the secret's server and ca.crt data fields (TODO: support overrides in kubemcsa export).

Verification

After a minute, check that virtual nodes named admiralty-c1 and admiralty-c2 have been created in cluster1:

kubectl --context "$CLUSTER1" get node
Multi-Cluster Deployment

Multicluster-scheduler's pod admission controller operates in namespaces labeled with multicluster-scheduler=enabled. In cluster1, label the default namespace:

kubectl --context "$CLUSTER1" label namespace default multicluster-scheduler=enabled

Then, deploy NGINX in it with the election annotation on the pod template:

cat <<EOF | kubectl --context "$CLUSTER1" apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        multicluster.admiralty.io/elect: ""
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: 100m
            memory: 32Mi
        ports:
        - containerPort: 80
EOF

Things to check:

  1. The original pods have been transformed into proxy pods "running" on virtual nodes. Notice the original manifest saved as an annotation.
  2. Delegate pods have been created in either cluster. Notice that their spec matches the original manifest.
kubectl --context "$CLUSTER1" get pods -o wide # (-o yaml for details)
kubectl --context "$CLUSTER2" get pods -o wide # (-o yaml for details)
Advanced Scheduling

Multicluster-scheduler supports standard Kubernetes scheduling constraints including node selectors, affinities, etc. It will ensure that delegate pods target clusters that have nodes that match those constraints. For example, your nodes may be labeled with failure-domain.beta.kubernetes.io/region or topology.kubernetes.io/region (among other common labels).

kubectl --context "$CLUSTER1" get nodes --show-labels
kubectl --context "$CLUSTER2" get nodes --show-labels

If your test setup doesn't have region labels, you can add some:

kubectl --context "$CLUSTER1" label nodes -l virtual-kubelet.io/provider!=admiralty topology.kubernetes.io/region=us
kubectl --context "$CLUSTER2" label nodes -l virtual-kubelet.io/provider!=admiralty topology.kubernetes.io/region=eu

To schedule a deployment to a particular region, just add a node selector to its pod template:

kubectl --context "$CLUSTER1" patch deployment nginx -p '{
  "spec":{
    "template":{
      "spec": {
        "nodeSelector": {
          "topology.kubernetes.io/region": "eu"
        }
      }
    }
  }
}'

After a little while, delegate pods in cluster1 (US) will be terminated and more will be created in cluster2 (EU).

Optional: Service Reroute and Globalization

Our NGINX deployment isn't much use without a service to expose it. Kubernetes services route traffic to pods based on label selectors. We could directly create a service to match the labels of the delegate pods, but that would make it tightly coupled with multicluster-scheduler. Instead, let's create a service as usual, targeting the proxy pods. If a proxy pod were to receive traffic, it wouldn't know how to handle it, so multicluster-scheduler will change the service's label selector for us, to match the delegate pods instead, whose labels are similar to those of the proxy pods, except that their keys are prefixed with multicluster.admiralty.io/.

If some or all of the delegate pods are in a different cluster, we also need the service to route traffic to them. For that, we rely in this guide on a Cilium cluster mesh and global services. Multicluster-scheduler will annotate the service with io.cilium/global-service=true and replicate it across clusters. (Multicluster-scheduler replicates any global service across clusters, not just services targeting proxy pods.)

kubectl --context "$CLUSTER1" expose deployment nginx

We just created a service in cluster1, alongside our deployment. However, in the previous step, we rescheduled all NGINX pods to cluster2. Check that the service was rerouted, globalized, and replicated to cluster2:

kubectl --context "$CLUSTER1" get service nginx -o yaml
# Check the annotations and the selector,
# then check that a copy exists in cluster2:
kubectl --context "$CLUSTER2" get service nginx -o yaml

Now call the delegate pods in cluster2 from cluster1:

kubectl --context "$CLUSTER1" run foo -it --rm --image alpine --command -- sh -c "apk add curl && curl nginx"

Community

Need help to install/use multicluster-scheduler or integrate it with your stack? Found a bug? Or perhaps you'd like to request or even contribute a feature. Please file an issue or talk to us on Admiralty's community chat.

Directories

Path Synopsis
cmd
pkg
apis
Package apis contains Kubernetes API groups.
Package apis contains Kubernetes API groups.
apis/multicluster
Package multicluster contains multicluster API versions
Package multicluster contains multicluster API versions
apis/multicluster/v1alpha1
Package v1alpha1 contains API Schema definitions for the multicluster v1alpha1 API group +k8s:openapi-gen=true +k8s:deepcopy-gen=package,register +k8s:conversion-gen=admiralty.io/multicluster-scheduler/pkg/apis/multicluster +k8s:defaulter-gen=TypeMeta +groupName=multicluster.admiralty.io Package v1alpha1 contains API Schema definitions for the multicluster v1alpha1 API group +k8s:openapi-gen=true +k8s:deepcopy-gen=package,register +k8s:conversion-gen=admiralty.io/multicluster-scheduler/pkg/apis/multicluster +k8s:defaulter-gen=TypeMeta +groupName=multicluster.admiralty.io
Package v1alpha1 contains API Schema definitions for the multicluster v1alpha1 API group +k8s:openapi-gen=true +k8s:deepcopy-gen=package,register +k8s:conversion-gen=admiralty.io/multicluster-scheduler/pkg/apis/multicluster +k8s:defaulter-gen=TypeMeta +groupName=multicluster.admiralty.io Package v1alpha1 contains API Schema definitions for the multicluster v1alpha1 API group +k8s:openapi-gen=true +k8s:deepcopy-gen=package,register +k8s:conversion-gen=admiralty.io/multicluster-scheduler/pkg/apis/multicluster +k8s:defaulter-gen=TypeMeta +groupName=multicluster.admiralty.io
generated/clientset/versioned
This package has the automatically generated clientset.
This package has the automatically generated clientset.
generated/clientset/versioned/fake
This package has the automatically generated fake clientset.
This package has the automatically generated fake clientset.
generated/clientset/versioned/scheme
This package contains the scheme of the automatically generated clientset.
This package contains the scheme of the automatically generated clientset.
generated/clientset/versioned/typed/multicluster/v1alpha1
This package has the automatically generated typed clients.
This package has the automatically generated typed clients.
generated/clientset/versioned/typed/multicluster/v1alpha1/fake
Package fake has the automatically generated clients.
Package fake has the automatically generated clients.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL