TrustyAI Kubernetes Operator
Overview
The TrustyAI Kubernetes Operator aims at simplifying the deployment and management of the TrustyAI service on Kubernetes and OpenShift clusters by watching for custom resources of kind TrustyAIService
in the trustyai.opendatahub.io
API group and manages deployments, services, and optionally, routes and ServiceMonitors
corresponding to these resources.
The operator ensures the service is properly configured, is discoverable by Prometheus for metrics scraping (on both Kubernetes and OpenShift), and is accessible via a Route on OpenShift.
Prerequisites
- Kubernetes cluster v1.19+ or OpenShift cluster v4.6+
kubectl
v1.19+ or oc
client v4.6+
Installation using pre-built Operator image
This operator is available as an image on Quay.io.
To deploy it on your cluster:
-
Install the Custom Resource Definition (CRD):
Apply the CRD to your cluster (replace the URL with the relevant one, if using another repository):
kubectl apply -f https://raw.githubusercontent.com/trustyai-explainability/trustyai-service-operator/main/config/crd/bases/trustyai.opendatahub.io_trustyaiservices.yaml
-
Deploy the Operator:
Apply the following Kubernetes manifest to deploy the operator:
apiVersion: apps/v1
kind: Deployment
metadata:
name: trustyai-operator
namespace: trustyai-operator-system
spec:
replicas: 1
selector:
matchLabels:
control-plane: trustyai-operator
template:
metadata:
labels:
control-plane: trustyai-operator
spec:
containers:
- name: trustyai-operator
image: quay.io/trustyai/trustyai-service-operator:latest
command:
- /manager
resources:
limits:
cpu: 100m
memory: 30Mi
requests:
cpu: 100m
memory: 20Mi
or run
kubectl apply -f https://raw.githubusercontent.com/trustyai-explainability/trustyai-service-operator/main/artifacts/examples/deploy-operator.yaml
Usage
Once the operator is installed, you can create TrustyAIService
resources, and the operator will create corresponding TrustyAI deployments, services, and (on OpenShift) routes.
Here's an example TrustyAIService
manifest:
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: TrustyAIService
metadata:
name: trustyai-service-example
spec:
storage:
format: "PVC"
folder: "/inputs"
size: "1Gi"
data:
filename: "data.csv"
format: "CSV"
metrics:
schedule: "5s"
batchSize: 5000 # Optional, defaults to 5000
You can apply this manifest with
kubectl apply -f <file-name.yaml> -n $NAMESPACE
to create a service, where $NAMESPACE
is the namespace where you want to deploy it.
Additionally, in that namespace:
- a
ServiceMonitor
will be created to allow Prometheus to scrape metrics from the service.
- (if on OpenShift) a
Route
will be created to allow external access to the service.
Custom Image Configuration using ConfigMap
You can specify a custom TrustyAI-service image via adding parameters to the TrustyAI-Operator KFDef, for example:
apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
name: trustyai-service-operator
namespace: opendatahub
spec:
applications:
- kustomizeConfig:
repoRef:
name: manifests
path: config
parameters:
- name: trustyaiServiceImage
value: NEW_IMAGE_NAME
name: trustyai-service-operator
repos:
- name: manifests
uri: https://github.com/trustyai-explainability/trustyai-service-operator/tarball/main
version: v1.0.0
If these parameters are unspecified, the default image and tag will be used.
If you'd like to change the service image/tag after deploying the operator, simply change the parameters in the KFDef. Any
TrustyAI service deployed subsequently will use the new image and tag.
TrustyAIService
Status Updates
The TrustyAIService
custom resource tracks the availability of InferenceServices
and PersistentVolumeClaims (PVCs)
through its status
field. Below are the status types and reasons that are available:
InferenceService
Status
Status Type |
Status Reason |
Description |
InferenceServicesPresent |
InferenceServicesNotFound |
InferenceServices were not found. |
InferenceServicesPresent |
InferenceServicesFound |
InferenceServices were found. |
PersistentVolumeClaim
(PVCs) Status
Status Type |
Status Reason |
Description |
PVCAvailable |
PVCNotFound |
PersistentVolumeClaim not found. |
PVCAvailable |
PVCFound |
PersistentVolumeClaim found. |
Status Behavior
- If a PVC is not available, the
Ready
status of TrustyAIService
will be set to False
.
- However, if
InferenceServices
are not found, the Ready
status of TrustyAIService
will not be affected, i.e., it is Ready
by all other conditions, it will remain so.
Contributing
Please see the CONTRIBUTING.md file for more details on how to contribute to this project.
License
This project is licensed under the Apache License Version 2.0 - see the LICENSE file for details.