kuda

v0.4.0-preview
Published: Feb 16, 2020 License: Apache-2.0

README

Status: Experimental 🧪

Easily deploy GPU models as serverless APIs

  • Deploy a template
$ kuda deploy -f https://raw.githubusercontent.com/cyrildiagne/kuda/0.4/examples/hello-gpu-flask/kuda.yaml
  • Call it!
$ curl https://hello-gpu.default.$your_domain
Hello GPU!

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P8    10W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Serverless GPU inference

Kuda builds on Knative to allocate cloud GPUs only when there is traffic to your app.

This is ideal when you want to share ML projects online without keeping expensive GPUs allocated all the time.

It tries to reduce cold-start time (GPU node allocation and service instantiation) as much as possible and to manage cooldown times intelligently.
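Because GPUs are only allocated when traffic arrives, the first request after an idle period may be slow or fail while a node is provisioned. A client can absorb that by retrying with exponential backoff. The following is a minimal sketch using only the Python standard library; the function name `fetch_with_retry`, the retry count, and the delays are illustrative assumptions, not part of Kuda.

```python
import time
import urllib.request


def fetch_with_retry(url, attempts=5, first_delay=2.0, timeout=60.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """GET `url`, retrying with exponential backoff while the service warms up.

    `opener` and `sleep` are injectable so the retry logic can be exercised
    without a network connection. All defaults are illustrative assumptions.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read().decode("utf-8")
        except OSError:
            # urllib.error.URLError, HTTPError and socket timeouts are all
            # OSError subclasses, so any of them triggers a retry.
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2  # wait twice as long before the next attempt
```

Note that this sketch retries on every HTTP error, including ones that will never succeed (such as 404); a real client would probably retry only on timeouts and 5xx responses.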

Turn any model into a serverless API

Kuda deploys APIs as Docker containers, so you can use any language and any framework, and there is no library to import in your code.

All you need is a Dockerfile.

Here's a minimal example that just prints the result of nvidia-smi using Flask:

  • main.py
import os
import flask

app = flask.Flask(__name__)

@app.route('/')
def hello():
    return 'Hello GPU:\n' + os.popen('nvidia-smi').read()
  • Dockerfile
FROM nvidia/cuda:10.1-base

# Refresh the package index first; the base image ships without one.
RUN apt-get update && apt-get install -y python3 python3-pip

RUN pip3 install setuptools Flask gunicorn

COPY main.py ./main.py

CMD exec gunicorn --bind :80 --workers 1 --threads 8 main:app
  • kuda.yaml
name: hello-gpu
deploy:
  dockerfile: ./Dockerfile

Running kuda deploy in this example would build and deploy the API to a URL such as https://hello-gpu.my-namespace.example.com.
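The two URLs shown in this README (https://hello-gpu.default.$your_domain and https://hello-gpu.my-namespace.example.com) suggest a name.namespace.domain pattern. A small helper can make that explicit in client code. The pattern is inferred from these examples only, and `service_url` is a hypothetical name, not a Kuda function.

```python
def service_url(name, namespace="default", domain="example.com"):
    """Build the HTTPS URL of a deployed API.

    Assumes the <name>.<namespace>.<domain> layout inferred from the
    README examples; the default values are placeholders.
    """
    return f"https://{name}.{namespace}.{domain}"


print(service_url("hello-gpu", "my-namespace", "example.com"))
```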

Check out the full example with annotations in examples/hello-gpu-flask.
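The handler in main.py shells out to nvidia-smi, which is only present where an NVIDIA driver is installed. When trying the app on a workstation without a GPU, a helper along these lines could return a readable notice instead of an empty string. This is a sketch, not code from the Kuda example; `gpu_report` is a hypothetical name.

```python
import shutil
import subprocess


def gpu_report(timeout=10.0):
    """Return the output of nvidia-smi, or a short notice if it cannot run."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found (no NVIDIA driver in this environment?)\n"
    try:
        result = subprocess.run(
            ["nvidia-smi"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,  # report a non-zero exit via stderr instead of raising
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"nvidia-smi failed: {exc}\n"
    return result.stdout or result.stderr
```

In the Flask handler, `os.popen('nvidia-smi').read()` could then be replaced by `gpu_report()`.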

Features

  • Provision GPUs & scale based on traffic (from zero to N)
  • Interactive development on cloud GPUs from any workstation
  • Protect & control access to your APIs using API Keys
  • HTTPS with TLS termination & automatic certificate management

Get Started

Directories

  • cmd/api
  • cmd/cli
  • pkg/api
