Status: Experimental 🧪
Easily deploy GPU models as serverless APIs
```bash
$ kuda deploy -f https://raw.githubusercontent.com/cyrildiagne/kuda/0.4/examples/hello-gpu-flask/kuda.yaml
$ curl https://hello-gpu.default.$your_domain

Hello GPU!

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P8    10W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
## Serverless GPU inference
Kuda builds on Knative to allocate cloud GPUs only when there is traffic to your app. This is ideal when you want to share ML projects online without keeping expensive GPUs allocated all the time. Kuda reduces cold-start time (GPU node allocation and service instantiation) as much as possible and tries to manage cooldown times intelligently.
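For intuition, here is a minimal sketch of the kind of Knative Service that traffic-based GPU scaling relies on. This illustrates the mechanism, not Kuda's actual generated manifest: the image name and scale bounds are placeholders, and it assumes a cluster with the NVIDIA device plugin installed.

```yaml
# Illustrative sketch only — Kuda generates and manages the real
# Knative resources for you.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-gpu
spec:
  template:
    metadata:
      annotations:
        # Scale to zero when idle, so no GPU stays allocated without traffic.
        autoscaling.knative.dev/minScale: "0"
        autoscaling.knative.dev/maxScale: "3"
    spec:
      containers:
        - image: gcr.io/my-project/hello-gpu # placeholder image
          resources:
            limits:
              nvidia.com/gpu: "1" # one GPU per running instance
```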
## Turn any model into a serverless API
Kuda deploys APIs as Docker containers, so you can use any language and any framework, and there is no library to import in your code. All you need is a Dockerfile.
Here's a minimal example that just prints the result of `nvidia-smi` using Flask:
```python
import os

import flask

app = flask.Flask(__name__)

@app.route('/')
def hello():
    # Return the output of nvidia-smi to show that a GPU is attached.
    return 'Hello GPU:\n' + os.popen('nvidia-smi').read()
```
The Dockerfile installs Flask and Gunicorn on top of an NVIDIA CUDA base image:

```dockerfile
FROM nvidia/cuda:10.1-base

# Install Python and the web server dependencies.
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip3 install setuptools Flask gunicorn

COPY main.py ./main.py

# Serve the Flask app on port 80.
CMD exec gunicorn --bind :80 --workers 1 --threads 8 main:app
```
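If you want to sanity-check the container before deploying, you can build and run it locally. This is a plain Docker workflow, not a Kuda command; it assumes Docker 19.03+ with the NVIDIA container toolkit, and the image tag is arbitrary:

```bash
$ docker build -t hello-gpu .
$ docker run --rm --gpus all -p 8080:80 hello-gpu
$ curl http://localhost:8080
```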
The `kuda.yaml` manifest then simply points to that Dockerfile:

```yaml
name: hello-gpu
deploy:
  dockerfile: ./Dockerfile
```
Running `kuda deploy` in this example would build and deploy the API to a URL such as `https://hello-gpu.my-namespace.example.com`.
Check out the full example with annotations in `examples/hello-gpu-flask`.
## Features
- Provision GPUs & scale based on traffic (from zero to N)
- Interactive development on cloud GPUs from any workstation
- Protect & control access to your APIs using API Keys (see the sketch after this list)
- HTTPS with TLS termination & automatic certificate management
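As a purely hypothetical illustration of the API Keys feature, a protected deployment could be called by passing the key in a request header. The `x-api-key` header name and `$KUDA_API_KEY` variable are assumptions, not confirmed by this README; check the Kuda docs for the actual scheme.

```bash
# Hypothetical: the header name and env var are assumptions.
$ curl -H "x-api-key: $KUDA_API_KEY" https://hello-gpu.default.$your_domain
```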
## Get Started