multicluster-scheduler

module

Version: v0.8.2. Published: May 20, 2020. License: Apache-2.0

Repository

github.com/admiraltyio/multicluster-scheduler


Multicluster-Scheduler

Multicluster-scheduler is a system of Kubernetes controllers that intelligently schedules workloads across clusters. It is simple to use and simple to integrate with other tools.

1. Install multicluster-scheduler in each cluster that you want to federate. Configure clusters as sources and/or targets to build a centralized or decentralized topology.
2. Annotate any pod or pod template (e.g., of a Deployment, Job, or Argo Workflow, among others) in any source cluster with multicluster.admiralty.io/elect="".
3. Multicluster-scheduler mutates the elected pods into proxy pods scheduled on virtual-kubelet nodes representing target clusters, and creates delegate pods in the remote clusters (actually running the containers).
4. A feedback loop updates the statuses and annotations of the proxy pods to reflect the statuses and annotations of the delegate pods.
5. Services that target proxy pods are rerouted to their delegates, replicated across clusters, and annotated with io.cilium/global-service=true to be load-balanced across a Cilium cluster mesh, if installed (other integrations are possible; please tell us about your network setup).

Check out Admiralty's blog post demonstrating how to run an Argo workflow across clusters to combine data from different regions or clouds and better utilize resources, or ITNEXT's blog post describing an integration with Argo CD (scroll down to the relevant section). There are many other use cases: dynamic CDNs, multi-region high availability and disaster recovery, central access control and auditing, cloud bursting, clusters as cattle... Tell us about your use case.

Getting Started

We assume that you are a cluster admin for two clusters, associated with, e.g., the contexts "cluster1" and "cluster2" in your kubeconfig. We're going to install multicluster-scheduler in both clusters, and configure cluster1 as a source and target, and cluster2 as a target only. This topology is typical of a cloud bursting use case. Then, we will deploy a multi-cluster NGINX.

CLUSTER1=cluster1 # change me
CLUSTER2=cluster2 # change me

Note: you can easily create two clusters on your machine with kind.

Installation

Prerequisites

Important! Multicluster-scheduler requires Kubernetes v1.17 or v1.18 (unless you build from source on a fork of k8s.io/kubernetes, cf. #19).
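
As a rough pre-flight check for the version requirement above, the helper below tests whether a version string falls in the v1.17 or v1.18 series. This is a minimal sketch: the function name is_supported_k8s_version is hypothetical and not part of multicluster-scheduler, and how you obtain the server version (for example, from the output of kubectl version) is left to you.

```shell
# is_supported_k8s_version VERSION
# Succeeds if VERSION (e.g. "v1.18.2" or "v1.17.5-gke.1") is in the
# v1.17 or v1.18 series. Hypothetical helper, for illustration only.
is_supported_k8s_version() {
  case "$1" in
    v1.17|v1.17.*|v1.18|v1.18.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (the version string is passed in by hand here):
#   is_supported_k8s_version v1.18.2 && echo "supported"
```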

Cert-manager v0.11+ must be installed in each cluster:

helm repo add jetstack https://charts.jetstack.io
helm repo update

# Install the cert-manager CRDs and chart in each cluster
for CONTEXT in "$CLUSTER1" "$CLUSTER2"
do
  kubectl --context "$CONTEXT" apply --validate=false -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.12/deploy/manifests/00-crds.yaml
  kubectl --context "$CONTEXT" create namespace cert-manager
  helm install cert-manager \
    --kube-context "$CONTEXT" \
    --namespace cert-manager \
    --version v0.12.0 \
    --wait \
    jetstack/cert-manager
done

Optional: Cilium cluster mesh

For cross-cluster service calls, this guide relies on a Cilium cluster mesh and global services. If you need this feature, install Cilium and set up a cluster mesh. If you install Cilium later, you may have to restart pods.

Helm

The recommended way to install multicluster-scheduler is with Helm (v3):

helm repo add admiralty https://charts.admiralty.io
helm repo update

kubectl --context "$CLUSTER1" create namespace admiralty
helm install multicluster-scheduler admiralty/multicluster-scheduler \
  --kube-context "$CLUSTER1" \
  --namespace admiralty \
  --version 0.8.2 \
  --set clusterName=c1 \
  --set targetSelf=true \
  --set 'targets[0].name=c2'

kubectl --context "$CLUSTER2" create namespace admiralty
helm install multicluster-scheduler admiralty/multicluster-scheduler \
  --kube-context "$CLUSTER2" \
  --namespace admiralty \
  --version 0.8.2 \
  --set clusterName=c2

Important! At this point, multicluster-scheduler will be stuck at ContainerCreating in cluster1, because it needs a secret from its remote target, cluster2 (see below). Note: when we move to defining targets at runtime with a CRD, this won't happen.

Service Account Exchange

For cross-cluster source-target communications, i.e., for multicluster-scheduler in a source cluster (here, cluster1) to talk to the Kubernetes API servers of remote target clusters (here, cluster2), we need to create service accounts in the target clusters, extract their tokens as kubeconfig files, and save those files inside secrets in their source clusters.

Note: for a source cluster that targets itself (here, cluster1), multicluster-scheduler simply uses its own service account to talk to its own Kubernetes API server.

In this getting started guide, we use klum to create a service account for cluster1 in cluster2 (there are other ways; contact us while we work on documenting them).

In cluster2, install klum and create a User named c1, bound to the multicluster-scheduler-source cluster role at the cluster scope (you could bind it to one or several namespaces only, and configure multicluster-scheduler with namespaced targets, cf. full installation guide).

kubectl --context "$CLUSTER2" apply -f https://raw.githubusercontent.com/ibuildthecloud/klum/v0.0.1/deploy.yaml

# klum registers the User CRD at runtime so wait a bit, then

cat <<EOF | kubectl --context "$CLUSTER2" apply -f -
kind: User
apiVersion: klum.cattle.io/v1alpha1
metadata:
  name: c1
spec:
  clusterRoles:
    - multicluster-scheduler-source
EOF
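
The "wait a bit" step above can be made explicit with a small polling helper. A minimal sketch, assuming a POSIX shell: retry is a hypothetical helper, and users.klum.cattle.io is only a guess at the CRD name klum registers (derived from the kind and API group in the manifest above); check kubectl get crd for the actual name.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping
# DELAY_SECONDS between tries. The body runs in a subshell so that its
# variables do not leak into the caller.
retry() (
  attempts=$1
  delay=$2
  shift 2
  i=1
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
)

# Example (the CRD name is an assumption):
#   retry 30 2 kubectl --context "$CLUSTER2" get crd users.klum.cattle.io
```

Once the command succeeds, the User manifest can be applied as shown above.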

The kubemcsa export command of multicluster-service-account makes it easy to prepare a kubeconfig secret. First, install kubemcsa (you don't need to deploy multicluster-service-account):

MCSA_RELEASE_URL=https://github.com/admiraltyio/multicluster-service-account/releases/download/v0.6.1
OS=linux # or darwin (i.e., OS X) or windows
ARCH=amd64 # if you're on a different platform, you must know how to build from source
curl -Lo kubemcsa "$MCSA_RELEASE_URL/kubemcsa-$OS-$ARCH"
chmod +x kubemcsa

Then, run kubemcsa export to generate a template for a secret containing a kubeconfig equivalent to the c1 service account (that was created by klum), and apply the template with kubectl in cluster1:

./kubemcsa export --context "$CLUSTER2" -n klum c1 --as c2 \
  | kubectl --context "$CLUSTER1" -n admiralty apply -f -

Important! kubemcsa export combines a service account token with the Kubernetes API server addresses and associated certificates of the clusters found in your local kubeconfig. The addresses and certificates are routable and valid from your machine, but they must also be routable and valid from pods in the scheduler's cluster. For example, if you're using kind, the address is 127.0.0.1:SOME_PORT by default, because kind exposes API servers on random ports of your machine. However, 127.0.0.1 means something different inside the scheduler pod: it refers to the pod itself. On Linux, kind get kubeconfig --internal generates a kubeconfig that works both from your machine and from pods, because it uses the master node container's IP in the overlay network (e.g., 172.17.0.x) instead of 127.0.0.1. Unfortunately, that won't work on Windows/Mac. In that case, you can either run the commands above from a container, or tweak the output of kubemcsa export before piping it into kubectl apply, overriding the secret's server and ca.crt data fields (TODO: support overrides in kubemcsa export).
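
One way to catch the problem described above before exporting is to look at the API server address in your kubeconfig and flag loopback addresses. A minimal sketch: is_loopback_server is a hypothetical helper (not part of kubemcsa), and the prefix match is a heuristic rather than full URL parsing.

```shell
# is_loopback_server URL
# Succeeds if the host part of URL looks like a loopback address
# (127.x.x.x, localhost, or [::1]). Heuristic only.
is_loopback_server() {
  case "${1#*://}" in   # strip the scheme, e.g. "https://"
    127.*|localhost|localhost:*|localhost/*|"[::1]"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   SERVER=$(kubectl config view --minify --context "$CLUSTER2" \
#     -o jsonpath='{.clusters[0].cluster.server}')
#   is_loopback_server "$SERVER" && echo "likely unreachable from pods: $SERVER"
```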

Verification

After a minute, check that virtual nodes named admiralty-c1 and admiralty-c2 have been created in cluster1:

kubectl --context "$CLUSTER1" get node
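
To script this check instead of reading the node list by eye, you can compare the output of kubectl get node -o name against the expected virtual node names. A minimal sketch; has_nodes is a hypothetical helper name.

```shell
# has_nodes NAME...
# Reads `kubectl get node -o name` output (lines like "node/<name>") on
# stdin and succeeds only if every NAME is present as an exact match.
has_nodes() (
  listing=$(cat)
  for want in "$@"; do
    printf '%s\n' "$listing" | grep -qxF "node/$want" || return 1
  done
)

# Example:
#   kubectl --context "$CLUSTER1" get node -o name \
#     | has_nodes admiralty-c1 admiralty-c2 && echo "virtual nodes are present"
```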

Multi-Cluster Deployment

Multicluster-scheduler's pod admission controller operates in namespaces labeled with multicluster-scheduler=enabled. In cluster1, label the default namespace:

kubectl --context "$CLUSTER1" label namespace default multicluster-scheduler=enabled

Then, deploy NGINX in it with the election annotation on the pod template:

cat <<EOF | kubectl --context "$CLUSTER1" apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        multicluster.admiralty.io/elect: ""
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: 100m
            memory: 32Mi
        ports:
        - containerPort: 80
EOF

Things to check:

1. The original pods have been transformed into proxy pods "running" on virtual nodes. Notice the original manifest saved as an annotation.
2. Delegate pods have been created in either cluster. Notice that their spec matches the original manifest.

kubectl --context "$CLUSTER1" get pods -o wide # (-o yaml for details)
kubectl --context "$CLUSTER2" get pods -o wide # (-o yaml for details)
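
To see how the replicas were spread, you can tally pods per node. A minimal sketch: count_by_node is a hypothetical helper that reads one node name per line, which kubectl can produce with custom-columns output. The app=nginx selector matches the proxy pods in cluster1, since they keep the labels from the pod template.

```shell
# count_by_node
# Reads one node name per line on stdin and prints "<node> <count>" lines,
# sorted by node name. Blank lines are ignored.
count_by_node() {
  awk 'NF { n[$1]++ } END { for (k in n) print k, n[k] }' | sort
}

# Example:
#   kubectl --context "$CLUSTER1" get pods -l app=nginx --no-headers \
#     -o custom-columns=NODE:.spec.nodeName | count_by_node
```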

Advanced Scheduling

Multicluster-scheduler supports standard Kubernetes scheduling constraints including node selectors, affinities, etc. It will ensure that delegate pods target clusters that have nodes that match those constraints. For example, your nodes may be labeled with failure-domain.beta.kubernetes.io/region or topology.kubernetes.io/region (among other common labels).

kubectl --context "$CLUSTER1" get nodes --show-labels
kubectl --context "$CLUSTER2" get nodes --show-labels

If your test setup doesn't have region labels, you can add some:

kubectl --context "$CLUSTER1" label nodes -l virtual-kubelet.io/provider!=admiralty topology.kubernetes.io/region=us
kubectl --context "$CLUSTER2" label nodes -l virtual-kubelet.io/provider!=admiralty topology.kubernetes.io/region=eu

To schedule a deployment to a particular region, just add a node selector to its pod template:

kubectl --context "$CLUSTER1" patch deployment nginx -p '{
  "spec":{
    "template":{
      "spec": {
        "nodeSelector": {
          "topology.kubernetes.io/region": "eu"
        }
      }
    }
  }
}'

After a little while, delegate pods in cluster1 (US) will be terminated and more will be created in cluster2 (EU).

Optional: Service Reroute and Globalization

Our NGINX deployment isn't much use without a service to expose it. Kubernetes services route traffic to pods based on label selectors. We could directly create a service to match the labels of the delegate pods, but that would make it tightly coupled with multicluster-scheduler. Instead, let's create a service as usual, targeting the proxy pods. If a proxy pod were to receive traffic, it wouldn't know how to handle it, so multicluster-scheduler will change the service's label selector for us, to match the delegate pods instead, whose labels are similar to those of the proxy pods, except that their keys are prefixed with multicluster.admiralty.io/.

If some or all of the delegate pods are in a different cluster, we also need the service to route traffic to them. For that, this guide relies on a Cilium cluster mesh and global services. Multicluster-scheduler will annotate the service with io.cilium/global-service=true and replicate it across clusters. (Multicluster-scheduler replicates any global service across clusters, not just services targeting proxy pods.)

kubectl --context "$CLUSTER1" expose deployment nginx

We just created a service in cluster1, alongside our deployment. However, in the previous step, we rescheduled all NGINX pods to cluster2. Check that the service was rerouted, globalized, and replicated to cluster2:

kubectl --context "$CLUSTER1" get service nginx -o yaml
# Check the annotations and the selector,
# then check that a copy exists in cluster2:
kubectl --context "$CLUSTER2" get service nginx -o yaml
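
The selector check can also be scripted. As described earlier in this section, a rerouted service selects on label keys prefixed with multicluster.admiralty.io/. A minimal sketch; selector_is_rerouted is a hypothetical helper that only looks for that prefix in whatever text kubectl prints for the selector.

```shell
# selector_is_rerouted SELECTOR_TEXT
# Succeeds if the printed selector contains a label key with the
# multicluster.admiralty.io/ prefix. Plain substring match, no parsing.
selector_is_rerouted() {
  case "$1" in
    *multicluster.admiralty.io/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   SELECTOR=$(kubectl --context "$CLUSTER1" get service nginx \
#     -o jsonpath='{.spec.selector}')
#   selector_is_rerouted "$SELECTOR" && echo "service targets delegate pods"
```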

Now call the delegate pods in cluster2 from cluster1:

kubectl --context "$CLUSTER1" run foo -it --rm --image alpine --command -- sh -c "apk add curl && curl nginx"

Community

Need help installing or using multicluster-scheduler, or integrating it with your stack? Found a bug? Or perhaps you'd like to request or even contribute a feature? Please file an issue or talk to us on Admiralty's community chat.

Directories

Path	Synopsis
cmd
agent
remove-finalizers
scheduler
hack
pkg
apis	Package apis contains Kubernetes API groups.
apis/config/v1alpha3
apis/multicluster	Package multicluster contains multicluster API versions
apis/multicluster/v1alpha1	Package v1alpha1 contains API Schema definitions for the multicluster v1alpha1 API group
common
config/agent
controller
controllers/chaperon
controllers/feedback
controllers/globalsvc
controllers/svcreroute
generated/clientset/versioned	This package has the automatically generated clientset.
generated/clientset/versioned/fake	This package has the automatically generated fake clientset.
generated/clientset/versioned/scheme	This package contains the scheme of the automatically generated clientset.
generated/clientset/versioned/typed/multicluster/v1alpha1	This package has the automatically generated typed clients.
generated/clientset/versioned/typed/multicluster/v1alpha1/fake	Package fake has the automatically generated clients.
generated/informers/externalversions
generated/informers/externalversions/internalinterfaces
generated/informers/externalversions/multicluster
generated/informers/externalversions/multicluster/v1alpha1
generated/listers/multicluster/v1alpha1
model/delegatepod
model/proxypod
scheduler_plugins/candidate
scheduler_plugins/proxy
vk/node
webhooks/proxypod
