dask-operator

A Kubernetes operator to deploy Dask (https://www.dask.org/) clusters.

Installation

Apply the manifests in the deploy/ directory. Note that this deploys into the current namespace; to deploy into a custom namespace, create the namespace and add a namespace: ... field to kustomization.yaml.
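
For example, one way to do this, assuming the kustomize CLI is available (a sketch, not the project's documented workflow; the namespace name is hypothetical):

kubectl create namespace dask-operator        # hypothetical namespace name
cd deploy/
kustomize edit set namespace dask-operator    # adds the namespace: field to kustomization.yaml
kubectl apply -k .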

To create a test cluster, apply the manifests in the example/ directory. The cluster can be scaled with (change --replicas as desired):

kubectl scale clusters.dask.charmtx.com/example-dask --replicas=2

Autoscaling

This implementation supports the Horizontal Pod Autoscaler, with the caveat that scaling to zero requires enabling an alpha feature gate (see https://github.com/kubernetes/enhancements/issues/2021). A few additional steps are required to expose the desired number of workers to the autoscaler.
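
The gate in question is HPAScaleToZero; it must be enabled on the API server if you want minReplicas: 0. How the flag is set depends on how your control plane is managed:

# kube-apiserver flag; the exact mechanism varies by distribution
--feature-gates=HPAScaleToZero=true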

The recommended solution is to use the Prometheus Operator and the Prometheus Adapter.

First, your scheduler pod must have the prometheus_client library installed; otherwise, Dask does not serve the metrics endpoint.
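
For example, if your cluster uses the stock daskdev/dask image (an assumption; substitute whatever image you actually run), that image's entrypoint can install extra packages at startup:

# Scheduler container spec fragment. The daskdev/dask entrypoint installs
# anything listed in EXTRA_PIP_PACKAGES before starting the scheduler.
env:
  - name: EXTRA_PIP_PACKAGES
    value: prometheus_client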

Second, you need to scrape the metrics with a ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata: { name: dask-scheduler }
spec:
  selector: { matchLabels: { "dask.charmtx.com/role": "scheduler" } }
  jobLabel: dask.charmtx.com/cluster
  endpoints: [{ port: "http-dashboard" }]
  namespaceSelector: { any: true }

Finally, you need to configure the Prometheus Adapter to serve the desired_workers metric, by adding the following item to the adapter's rules:

- seriesQuery: dask_scheduler_desired_workers{namespace!="",service!=""}
  name:
    matches: ^(dask)_scheduler_(.*)$
    as: $1_$2
  resources:
    overrides:
      job: { group: dask.charmtx.com, resource: cluster }
      namespace: { resource: namespace }
  metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
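
With this rule in place (note it renames dask_scheduler_desired_workers to dask_desired_workers), a HorizontalPodAutoscaler can target the cluster object; kubectl scale working above implies the Cluster CRD exposes a scale subresource. A sketch, assuming the CRD's API version is dask.charmtx.com/v1alpha1 (an assumption, check your installed CRD) and the example cluster from earlier:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: example-dask }
spec:
  scaleTargetRef:
    apiVersion: dask.charmtx.com/v1alpha1   # assumption: check the CRD
    kind: Cluster
    name: example-dask
  minReplicas: 0     # requires the HPAScaleToZero feature gate
  maxReplicas: 16
  metrics:
    - type: Object
      object:
        describedObject:
          apiVersion: dask.charmtx.com/v1alpha1   # assumption, as above
          kind: Cluster
          name: example-dask
        metric: { name: dask_desired_workers }
        target: { type: Value, value: "1" }   # one replica per desired worker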

Background

This project was created after hitting some bugs in the upstream operator that were difficult to resolve without a complete rewrite. So, here is the rewrite!

Overview

This file contains the logic required to interact with the Dask API. This is required for graceful scale-down: workers can be retired without losing the data stored on them.
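
For context, worker retirement is a scheduler-side operation. A minimal sketch of what that looks like over HTTP, assuming the scheduler's optional API routes (distributed.http.scheduler.api) are enabled and using placeholder addresses; the operator's actual client code may work differently:

# Ask the scheduler to gracefully retire a worker, migrating its data first.
curl -X POST http://dask-scheduler:8787/api/v1/retire_workers \
  -H 'Content-Type: application/json' \
  -d '{"workers": ["tcp://10.0.0.12:40565"]}'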

This file contains the standard kubernetes boilerplate, very similar to the sample controller. It sets up watches on all of the resources we care about, and calls `syncHandler` whenever anything interesting happens.

This file contains the main business logic of the handler. Any changes to watched resources trigger `syncHandler`, which reconciles the cluster state with the dependent objects.

This file contains all of the template rendering code, which injects required configuration (e.g. env vars) into the templates provided by the cluster configuration.
