Cambricon Device Plugin for Kubernetes
About Cambricon Device Plugin
The Cambricon device plugin for Kubernetes is a DaemonSet that allows you to automatically:
- Report the number of MLUs on each node of your cluster.
- Monitor the health status of MLUs.
- Run MLU-enabled containers.
This repository contains Cambricon's official implementation of the Kubernetes device plugin.
Prerequisites
The prerequisites for running the Cambricon Device Plugin:
- MLU270, MLU270-X5K, MLU220, MLU290, MLU370 devices
- MLU driver >= 4.15.11
- cntoolkit on your building machine >= 2.7.0
- cncl on your building machine >= 1.0.1
For MLU driver version 4.9.x, please use release v1.1.3.
For Kubernetes versions earlier than 1.19.0, the mlulink topology-aware mode cannot be used. To use this feature, make sure your Kubernetes version is >= 1.19.0.
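The version rules above can be checked mechanically before deploying. The following is a minimal sketch, not part of the plugin: the function names are hypothetical, and it assumes plain dotted numeric version strings such as `1.19.0` or `v1.19.0` (suffixes like `-rc1` are not handled).

```python
def parse_version(text):
    """Turn '1.19.0', 'v1.19.0' or '1.19' into a 3-tuple of ints."""
    parts = [int(p) for p in text.lstrip("v").split(".")]
    parts += [0] * (3 - len(parts))  # pad '1.19' to (1, 19, 0)
    return tuple(parts)


def driver_supported(driver_version):
    """The current release expects MLU driver >= 4.15.11."""
    return parse_version(driver_version) >= (4, 15, 11)


def needs_legacy_release(driver_version):
    """Driver 4.9.x should use release v1.1.3 instead."""
    return parse_version(driver_version)[:2] == (4, 9)


def topology_aware_supported(kubernetes_version):
    """mlulink topology-aware mode needs Kubernetes >= 1.19.0."""
    return parse_version(kubernetes_version) >= (1, 19, 0)
```

For example, `topology_aware_supported("v1.18.9")` is false, so a cluster at that version would have to stay on one of the other modes.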
Quick Start
Preparing your MLU Nodes
It is assumed that the Cambricon MLU driver and neuware are installed on your MLU nodes.
Download and build
git clone https://github.com/Cambricon/cambricon-k8s-device-plugin.git
cd cambricon-k8s-device-plugin/device-plugin
Set the following environment variables as needed.
| env | description |
| --- | --- |
| APT_PROXY | apt proxy address |
| GOPROXY | golang proxy address |
| ARCH | target platform architecture, amd64 or arm64, amd64 by default |
| LIBCNDEV | absolute path of the libcndev.so binary, neuware installation path by default |
| BASE_IMAGE | device plugin base image |
| CNTOPO | absolute path of the cntopo binary, neuware installation path by default |
Docker on your build machine should be >= 17.05.0. To cross-build, make sure the Docker version is >= 19.03.
For amd64:
GOPROXY=https://goproxy.cn ./build_image.sh
For arm64:
ARCH=arm64 GOPROXY=https://goproxy.cn ./build_image.sh
Please make sure Cambricon neuware and cncl are installed in your build environment.
The build uses the libcndev.so and cntopo binaries on your build machine and generates the Docker image in the ./image folder.
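If the build is driven from a script, the Docker version requirement and the `ARCH` and `GOPROXY` variables can be validated up front. This is a rough sketch under stated assumptions: `docker_ok` and `build_env` are hypothetical helper names, the Docker version is passed in as a string rather than queried, and `cross_build` is set by the caller when the target architecture differs from the host.

```python
import os

SUPPORTED_ARCHES = ("amd64", "arm64")


def docker_ok(docker_version, cross_build=False):
    """Check Docker >= 17.05 for a native build, >= 19.03 for a cross build."""
    minimum = (19, 3) if cross_build else (17, 5)
    major, minor = (int(p) for p in docker_version.split(".")[:2])
    return (major, minor) >= minimum


def build_env(arch="amd64", goproxy=None, base=None):
    """Return the environment to pass to ./build_image.sh."""
    if arch not in SUPPORTED_ARCHES:
        raise ValueError(f"ARCH must be one of {SUPPORTED_ARCHES}, got {arch!r}")
    env = dict(os.environ if base is None else base)
    env["ARCH"] = arch
    if goproxy:
        env["GOPROXY"] = goproxy
    return env
```

The result could then be handed to the build script, for instance with `subprocess.run(["./build_image.sh"], env=build_env("arm64", "https://goproxy.cn"), check=True)`.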
Enabling MLU Support in Kubernetes
- Push the docker image to the docker repo of your cluster, or load the docker image on all your MLU nodes by:
docker load -i image/cambricon-k8s-device-plugin-amd64.tar
- Enable MLU support in your cluster by deploying the daemonset in the examples folder:
Set the args in the yaml file:
args:
  - --mode=default # device plugin mode: default, sriov, env-share, mlu-share or topology-aware
  - --virtualization-num=1 # virtualization number for each MLU, used only in sriov mode or env-share mode
  - --mlulink-policy=best-effort # MLULink topology policy: best-effort, guaranteed or restricted, used only in topology-aware mode
  - --cnmon-path=/usr/bin/cnmon # host machine cnmon path, must be an absolute path. Comment out this line to avoid mounting cnmon.
  #- --enable-console # uncomment to enable UART console device (/dev/ttyMS) in container
  #- --disable-health-check # uncomment to disable health check
  #- --enable-device-type # uncomment to enable device registration with type info
Supported features:
- default: default mode
- sriov: supports SR-IOV. Set virtualization-num to the number of VFs on the host.
- env-share: a whole card can be allocated to multiple containers. A container should use only one card in this mode. Set virtualization-num to the maximum number of containers one MLU can be allocated to.
- mlu-share: MLU resources are allocated by memory. Only works when deployed along with Cambricon MLU Scheduler Extender.
- topology-aware: the device plugin is aware of MLULink topology and tries to allocate MLUs forming a ring. Set mlulink-policy as described below.
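Since some flags only matter in certain modes, a small helper can assemble a consistent args list for the manifest. This is an illustrative sketch only: `plugin_args` is a hypothetical name, and leaving out the flags a mode does not use is a choice made here for clarity, not something the plugin is stated to require.

```python
MODES = ("default", "sriov", "env-share", "mlu-share", "topology-aware")
POLICIES = ("best-effort", "guaranteed", "restricted")


def plugin_args(mode="default", virtualization_num=1, mlulink_policy="best-effort"):
    """Build the container args for the chosen device plugin mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    args = [f"--mode={mode}"]
    # virtualization-num is used only in sriov and env-share modes.
    if mode in ("sriov", "env-share"):
        if virtualization_num < 1:
            raise ValueError("virtualization-num must be at least 1")
        args.append(f"--virtualization-num={virtualization_num}")
    # mlulink-policy is used only in topology-aware mode.
    if mode == "topology-aware":
        if mlulink_policy not in POLICIES:
            raise ValueError(f"unknown MLULink policy: {mlulink_policy!r}")
        args.append(f"--mlulink-policy={mlulink_policy}")
    return args
```

Calling `plugin_args("env-share", virtualization_num=4)` yields `["--mode=env-share", "--virtualization-num=4"]`, which can be pasted under `args:` in the yaml file.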
MLULink topology policies (guaranteed and restricted only work for 1, 2, 4 or 8 requested MLUs):
- best-effort: allocate devices forming the maximum number of rings whenever possible
- guaranteed: allocated devices must form at least one ring; otherwise an error is returned
- restricted: for 2 MLUs, and for 4 MLUs on MLU290/MLU370-M8, allocated devices must have 2 mlulink rings; otherwise an error is returned
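The three policies can be read as a decision rule over the number of rings a candidate allocation forms. The sketch below is not the plugin's implementation; it is one reading of the rules above. The function name and parameters are hypothetical, and three points are assumptions: unsupported request sizes are rejected with an exception, a single MLU is treated as trivially acceptable, and restricted cases not listed above fall back to the guaranteed rule.

```python
TWO_RING_MODELS = ("MLU290", "MLU370-M8")


def allocation_allowed(policy, requested, rings, model=""):
    """Decide whether an allocation forming `rings` MLULink rings is acceptable."""
    if policy == "best-effort":
        return True  # never rejects; it only prefers more rings
    if policy not in ("guaranteed", "restricted"):
        raise ValueError(f"unknown policy: {policy!r}")
    if requested not in (1, 2, 4, 8):
        # ASSUMPTION: the sketch rejects sizes the policy does not cover.
        raise ValueError(f"{policy} only works for 1, 2, 4 or 8 MLUs")
    if requested == 1:
        return True  # ASSUMPTION: one device needs no ring
    needs_two_rings = policy == "restricted" and (
        requested == 2 or (requested == 4 and model in TWO_RING_MODELS)
    )
    if needs_two_rings:
        return rings >= 2
    # guaranteed, and (ASSUMPTION) the remaining restricted cases
    return rings >= 1
```

Under this reading, two MLUs joined by a single ring pass `guaranteed` but fail `restricted`.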
kubectl create -f examples/cambricon-device-plugin-daemonset.yaml
(Optional) If you do not want to deploy as a daemonset, edit the static pod template in the examples folder and put the file into your configured static pod folder (/etc/kubernetes/manifests by default).
- If you want to use topology-aware mode or mlu-share mode, grant the device plugin permission to get and update nodes:
kubectl create -f examples/cambricon-device-plugin-rbac.yaml
Then add the service account to the device plugin spec, as in the example:
serviceAccount: cambricon-device-plugin
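A forgotten service account is easy to miss when switching modes, so a pre-deploy check may help. This is a minimal sketch with a hypothetical function name; it assumes the pod spec from the daemonset yaml has already been loaded into a Python dict.

```python
RBAC_MODES = ("topology-aware", "mlu-share")
SERVICE_ACCOUNT = "cambricon-device-plugin"


def service_account_problem(mode, pod_spec):
    """Return a message if the mode needs the service account and it is missing."""
    if mode not in RBAC_MODES:
        return None  # other modes do not need to get or update nodes
    if pod_spec.get("serviceAccount") != SERVICE_ACCOUNT:
        return f"mode {mode} needs serviceAccount: {SERVICE_ACCOUNT}"
    return None
```

A return value of `None` means the spec is consistent with the chosen mode.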
Running MLU Jobs
Cambricon MLUs can now be consumed via container-level resource requirements using the resource name cambricon.com/mlu:
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  restartPolicy: OnFailure
  containers:
    - image: ubuntu:16.04
      name: pod1-ctr
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          cambricon.com/mlu: "1" # use this when device type is not enabled, else delete this line.
          #cambricon.com/mlu270: "1" # use this when device type is enabled, supports mlu220, mlu270, mlu270-x5k, mlu290 and mlu370
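When many such pods are generated, the choice between the generic and the typed resource name can be made in code. The sketch below builds the same pod as a Python dict and prints it as JSON, which kubectl accepts in place of YAML. The helper names are hypothetical, and the `cambricon.com/<type>` naming for types other than mlu270 is extrapolated from the single example above, so treat it as an assumption.

```python
import json

DEVICE_TYPES = ("mlu220", "mlu270", "mlu270-x5k", "mlu290", "mlu370")


def mlu_resource_name(device_type=None):
    """Generic name when device type is not enabled, typed name otherwise."""
    if device_type is None:
        return "cambricon.com/mlu"
    if device_type not in DEVICE_TYPES:
        raise ValueError(f"unsupported device type: {device_type!r}")
    # ASSUMPTION: typed resources follow the cambricon.com/<type> pattern.
    return f"cambricon.com/{device_type}"


def mlu_pod(name, count=1, device_type=None, image="ubuntu:16.04"):
    """Build a pod manifest requesting `count` MLUs for one container."""
    limits = {mlu_resource_name(device_type): str(count)}
    container = {
        "image": image,
        "name": f"{name}-ctr",
        "command": ["sleep"],
        "args": ["100000"],
        "resources": {"limits": limits},
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"restartPolicy": "OnFailure", "containers": [container]},
    }


if __name__ == "__main__":
    print(json.dumps(mlu_pod("pod1"), indent=2))
```

Saved as a script (the file name is up to you), its output can be piped to `kubectl create -f -`.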
Upgrade Notice
Please see the changelog for deprecations and breaking changes.