Dedicated Admin Operator

Summary

The Dedicated Admin Operator was created for the OpenShift Dedicated platform to manage permissions (via k8s RoleBindings) in all of the projects/namespaces owned by clients. It monitors the creation of new namespaces and adds the proper permissions for the dedicated-admins, a group of local admins (not cluster admins) managed by the client.

It contains the following components:

  • Namespace controller: watches for new namespaces and guarantees that the proper RoleBindings are assigned to them.
  • RoleBinding controller: watches for RoleBinding changes; if someone removes a dedicated-admin RoleBinding, the controller adds it back.
  • Operator controller: watches the operator's namespace and installs the resources that cannot be installed by OLM (a Service and a ServiceMonitor).

To avoid granting admin permissions in infra/cluster-admin-related namespaces, a blacklist determines which namespaces should receive the RoleBinding assignment.
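As a rough illustration of the idea, here is a minimal Go sketch of blacklist matching. It assumes the blacklist is a list of namespace-name regular expressions; the patterns and function names are hypothetical, not the operator's actual configuration.

package main

import (
	"fmt"
	"regexp"
)

// Hypothetical blacklist: patterns for infra/cluster-admin namespaces that
// must not receive dedicated-admin RoleBindings.
var blacklist = []*regexp.Regexp{
	regexp.MustCompile(`^kube-`),
	regexp.MustCompile(`^openshift-`),
	regexp.MustCompile(`^default$`),
}

// isBlacklisted reports whether a namespace should be skipped by the operator.
func isBlacklisted(namespace string) bool {
	for _, re := range blacklist {
		if re.MatchString(namespace) {
			return true
		}
	}
	return false
}

func main() {
	for _, ns := range []string{"openshift-monitoring", "customer-app"} {
		fmt.Printf("%s blacklisted: %v\n", ns, isBlacklisted(ns))
	}
}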

Metrics

The Dedicated Admin Operator exposes the following Prometheus metrics:

  • dedicated_admin_blacklisted: gauge of blacklisted namespaces
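For reference, a minimal sketch of how a gauge like this can be registered and served with the Prometheus Go client. Only the metric name comes from the list above; the Help text, the placeholder value, and the listen address are illustrative assumptions.

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Gauge of blacklisted namespaces; the metric name is the operator's,
// the Help string is illustrative.
var blacklistedNamespaces = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "dedicated_admin_blacklisted",
	Help: "Number of namespaces currently matching the blacklist.",
})

func main() {
	prometheus.MustRegister(blacklistedNamespaces)

	// A controller would call Set whenever its view of the namespaces
	// changes; 3 is just a placeholder value.
	blacklistedNamespaces.Set(3)

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}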

On OLM and Bundling

OLM can deploy a resource that is bundled in a CatalogSource, but it won't update that resource afterwards, and it won't delete it if the resource is removed in a future version of the CatalogSource. There are two other options for managing these resources: 1) a controller in the operator code, or 2) managing them externally. Option #1 is a lot of work, though it covers the case where a resource is deleted. Option #2 is less work, but has a gap: fixing a broken config requires external action.

In July 2019 a PR switched to option #2, relying on Hive to manage the resources via a SelectorSyncSet. Hive fixes anything that breaks within 2 hours, or a human can force a sync by removing the related SyncSetInstance CR. This means there is no Go code to manage the resources and a simpler deployment; ALL of these resources move out of the bundle.

We can go back to bundling in the future, once OLM manages bundled resources; bundling is causing pain now, and Hive will reconcile the resources going forward.

Building

Dependencies

  • oyaml: pip install oyaml

Makefile

The following make targets are included.

  • clean - remove any generated output
  • build - run docker build
  • push - run docker push
  • gocheck - run go vet
  • gotest - run go test
  • gobuild - run go build
  • env - export useful env vars for use in other scripts

The following variables (with defaults) are available for overriding by the user of make:

  • OPERATOR_NAME - the name of the operator (dedicated-admin-operator)
  • OPERATOR_NAMESPACE - the operator namespace (openshift-dedicated-admin)
  • IMAGE_REGISTRY - target container registry (quay.io)
  • IMAGE_REPOSITORY - target container repository ($USER)
  • IMAGE_NAME - target image name ($OPERATOR_NAME)
  • ALLOW_DIRTY_CHECKOUT - if a dirty local checkout is allowed (false)

Note that IMAGE_REPOSITORY defaults to the current user's name, so the default behavior of make build and make push is to create images in the user's namespace. Automation would override this to push to an organization, for example:

IMAGE_REGISTRY=quay.io IMAGE_REPOSITORY=openshift-sre make build push

For local testing you might want to build with a dirty checkout. Keep in mind that the version is based on the number of commits and the latest git hash, so a dirty build is not desirable for any officially published image, and reusing tags (which are based on the version) can cause issues when pulling latest images in some scenarios.

ALLOW_DIRTY_CHECKOUT=true make build

Docker

The provided Dockerfile (build/Dockerfile) takes advantage of the multi-stage build feature, so Docker version >= 17.05 is required. See make build.

OLM

The OLM catalog source is not generated by this codebase, but the make env target exists to support that process. See osd-operators [subject to rename / moving].

Testing, Manual

To test a new version of the operator in a cluster you need to:

  1. build a new image
  2. deploy the image to a registry that's available to the cluster
  3. deploy the updated operator to the cluster
  4. do validation

The following steps make some assumptions:

  • you can push images to a repository in quay.io called $USER
  • you are logged into an OCP cluster with enough permissions to deploy the operator and resources

Furthermore, if you have installed the operator via OLM you'll need to remove it first, or OLM will replace your deployment:

# remove subscription and operatorgroup
oc -n openshift-dedicated-admin delete subscription dedicated-admin-operator
oc -n openshift-dedicated-admin delete operatorgroup dedicated-admin-operator

Build and deploy an updated version of the operator for test purposes with the following:

export IMAGE_REGISTRY=quay.io
export IMAGE_REPOSITORY=$USER
# build & push (with dirty checkout)
ALLOW_DIRTY_CHECKOUT=true make build push
# create deployment with correct image
sed "s|\(^[ ]*image:\).*|\1 $IMAGE_REGISTRY/$IMAGE_REPOSITORY/dedicated-admin-operator:latest|" manifests/10-dedicated-admin-operator.Deployment.yaml > /tmp/dedicated-admin-operator.Deployment.yaml
# deploy operator
find manifests/ -name '*Namespace*.yaml' -exec oc replace -f {} \;
find manifests/ -name '*Role*.yaml' -exec oc replace -f {} \;
find manifests/ -name '*Service.yaml' -exec oc apply -f {} \;
find manifests/ -name '*ServiceMonitor.yaml' -exec oc replace -f {} \;
find manifests/ -name '*Prometheus*.yaml' -exec oc replace -f {} \;
oc replace -f /tmp/dedicated-admin-operator.Deployment.yaml
# cleanup
unset IMAGE_REGISTRY
unset IMAGE_REPOSITORY
rm -f /tmp/dedicated-admin-operator.Deployment.yaml

Controllers

Namespace Controller

Watches for the creation of new Namespaces that are not part of the blacklist. When one is discovered, RoleBindings are created in that namespace binding the dedicated-admins group to the following ClusterRoles (see the sketch after this list):

  • admin
  • dedicated-admins-project
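A minimal sketch of what building such a RoleBinding looks like with the Kubernetes rbac/v1 types. The ClusterRole names and the dedicated-admins group come from this README; the helper function and the "dedicated-admins-" naming scheme are assumptions.

package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newRoleBinding builds a RoleBinding granting the dedicated-admins group a
// ClusterRole within the given namespace. The "dedicated-admins-" name prefix
// is a hypothetical naming scheme for this sketch.
func newRoleBinding(namespace, clusterRole string) *rbacv1.RoleBinding {
	return &rbacv1.RoleBinding{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "dedicated-admins-" + clusterRole,
			Namespace: namespace,
		},
		RoleRef: rbacv1.RoleRef{
			APIGroup: rbacv1.GroupName,
			Kind:     "ClusterRole",
			Name:     clusterRole,
		},
		Subjects: []rbacv1.Subject{{
			APIGroup: rbacv1.GroupName,
			Kind:     "Group",
			Name:     "dedicated-admins",
		}},
	}
}

func main() {
	for _, cr := range []string{"admin", "dedicated-admins-project"} {
		rb := newRoleBinding("customer-app", cr)
		fmt.Printf("would create RoleBinding %s/%s -> ClusterRole %s\n",
			rb.Namespace, rb.Name, rb.RoleRef.Name)
	}
}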

RoleBinding Controller

Watches for the deletion of RoleBindings owned by this operator. If a RoleBinding owned by this operator is deleted, it is recreated, as in the sketch below.
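A minimal sketch of the recreate-on-delete idea, written against the controller-runtime reconciler pattern. The ownership check (a name prefix), the naming scheme, and the reconciler wiring are all assumptions, not the operator's actual code.

package controllers

import (
	"context"
	"strings"

	rbacv1 "k8s.io/api/rbac/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// RoleBindingReconciler recreates dedicated-admin RoleBindings that have
// been deleted out from under the operator.
type RoleBindingReconciler struct {
	client.Client
}

func (r *RoleBindingReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
	ctx := context.Background()

	var rb rbacv1.RoleBinding
	err := r.Get(ctx, req.NamespacedName, &rb)
	if err == nil || !apierrors.IsNotFound(err) {
		// Still present (nothing to do) or a transient error (retry).
		return ctrl.Result{}, err
	}

	// Only recreate RoleBindings this operator owns; the prefix check is a
	// hypothetical stand-in for the real ownership test.
	if !strings.HasPrefix(req.Name, "dedicated-admins-") {
		return ctrl.Result{}, nil
	}

	recreated := &rbacv1.RoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: req.Name, Namespace: req.Namespace},
		RoleRef: rbacv1.RoleRef{
			APIGroup: rbacv1.GroupName,
			Kind:     "ClusterRole",
			Name:     strings.TrimPrefix(req.Name, "dedicated-admins-"),
		},
		Subjects: []rbacv1.Subject{{
			APIGroup: rbacv1.GroupName,
			Kind:     "Group",
			Name:     "dedicated-admins",
		}},
	}
	return ctrl.Result{}, r.Create(ctx, recreated)
}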

Operator Controller

OLM currently cannot create arbitrary resources when an operator is installed. The following are therefore created by this controller at operator startup:

  • Service - for exposing metrics
  • ServiceMonitor - for prometheus to scrape metrics from the Service
  • ClusterRoleBinding - for dedicated-admins-cluster ClusterRole to dedicated-admins Group

Note that creating a ClusterRoleBinding is possible via the ClusterServiceVersion CR used to deploy an operator, but only with a ServiceAccount as the subject. At this time you cannot create a ClusterRoleBinding to other subjects in a ClusterServiceVersion.

The namespaced resources (the Service and ServiceMonitor) are created in the operator's Namespace.
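To make the third item concrete, here is a minimal sketch of the ClusterRoleBinding described above, built with the Kubernetes rbac/v1 types. The ClusterRole and Group names come from the list; the binding's own name is a hypothetical choice.

package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	crb := &rbacv1.ClusterRoleBinding{
		// The binding's own name is hypothetical for this sketch.
		ObjectMeta: metav1.ObjectMeta{Name: "dedicated-admins-cluster"},
		RoleRef: rbacv1.RoleRef{
			APIGroup: rbacv1.GroupName,
			Kind:     "ClusterRole",
			Name:     "dedicated-admins-cluster",
		},
		Subjects: []rbacv1.Subject{{
			APIGroup: rbacv1.GroupName,
			Kind:     "Group",
			Name:     "dedicated-admins",
		}},
	}
	fmt.Printf("would create ClusterRoleBinding %s binding Group %s to ClusterRole %s\n",
		crb.Name, crb.Subjects[0].Name, crb.RoleRef.Name)
}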
