envd
Development Environment for Data Scientists
envd is a development environment management tool for data scientists.
🐍 No Docker, only Python - Write Python code to build the development environment; envd takes care of Docker for you.
🖨️ Built-in Jupyter/VSCode - Jupyter and the VSCode remote extension have first-class support.
⏱️ Save time - Better cache management saves you time, so you can focus on the model instead of dependencies.
☁️ Local & cloud - Run the environment locally or in the cloud, without any code change.
🐳 Container native - Leverage container technologies without having to learn how to use them; envd optimizes them for you.
🤟 Infrastructure as code - Describe your project in a declarative way, 100% reproducible.
Why use envd?
It is still too difficult to configure development environments and reproduce results for data scientists and AI/ML researchers.
They have to play with Docker, conda, CUDA, GPU Drivers, and even Kubernetes if the training jobs are running in the cloud, to make things happen.
As a result, researchers have to ask infra engineers for help, but infra engineers also struggle to build environments for machine learning. Infra engineers love immutable infrastructure, whereas researchers optimize AI/ML models by trial and error: the environment is updated, modified, or rebuilt in place, again and again. Researchers do not have the bandwidth to become Dockerfile experts. They prefer docker commit, which leaves images that are error-prone and hard to maintain or debug.
envd provides another way to solve the problem. As infra engineers ourselves, we accept the reality of the differences between AI/ML and traditional workloads. We do not expect researchers to learn the basics of infrastructure. Instead, we build tools that help researchers manage their development environments easily and in a cloud-native way.
envd provides a build language similar to Python and has first-class support for Jupyter, VSCode, and Python dependencies in container technologies.
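To make the idea of a Python-like build language concrete, here is a minimal sketch of how declarative calls could be collected into a build spec. It is an illustration only: the evaluate_manifest helper and the layout of the spec dict are hypothetical, the function names base, pip_package, shell, and jupyter are borrowed from the Quickstart manifest, and nothing here claims to mirror how envd is implemented internally.

```python
# Hypothetical sketch: evaluate a Python-like manifest into a plain dict.
# The helper name and the spec layout are assumptions, not envd internals.

MANIFEST = '''
base(os="ubuntu20.04", language="python3")
pip_package(name=["tensorflow", "numpy"])
shell("zsh")
jupyter(password="", port=8888)
'''


def evaluate_manifest(source):
    """Run manifest text against recording stubs and return the collected spec."""
    spec = {"pip": []}

    def base(os, language):
        spec["os"] = os
        spec["language"] = language

    def pip_package(name):
        spec["pip"].extend(name)

    def shell(name):
        spec["shell"] = name

    def jupyter(password="", port=8888):
        spec["jupyter"] = {"password": password, "port": port}

    # Only the declarative functions are visible to the manifest.
    # Emptying __builtins__ keeps the namespace small; it is not a security sandbox.
    namespace = {
        "__builtins__": {},
        "base": base,
        "pip_package": pip_package,
        "shell": shell,
        "jupyter": jupyter,
    }
    exec(source, namespace)
    return spec


spec = evaluate_manifest(MANIFEST)
print(spec)
```

Because each call only records intent, the order of the declarations does not matter in this sketch, which is what makes the manifest declarative rather than a script of build commands.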
How does envd work?
Install
From binary
You can download the binary from the latest release page.
After the download, run envd bootstrap to bootstrap envd.
From source code
git clone https://github.com/tensorchord/envd
cd envd
go mod tidy
make
./bin/envd --version
Quickstart
Check out the examples, and configure envd with the manifest build.envd:
vscode(plugins=[
    "ms-python.python",
])
base(os="ubuntu20.04", language="python3")
pip_package(name=[
    "tensorflow",
    "numpy",
])
cuda(version="11.6", cudnn="8")
shell("zsh")
jupyter(password="", port=8888)
Then you can run envd up to create the development environment.
![](https://asciinema.org/a/498012.svg)
$ envd up
[+] ⌚ parse build.envd and download/cache dependencies 0.0s ✅ (finished)
 => 💽 (cached) download oh-my-zsh 0.0s
 => 💽 (cached) download ms-python.python 0.0s
[+] 🐋 build envd environment 7.7s (24/25)
 => 💽 (cached) (built-in packages) apt-get install curl openssh-client g 0.0s
 => 💽 (cached) create user group envd 0.0s
 => 💽 (cached) create user envd 0.0s
 => 💽 (cached) add user envd to sudoers 0.0s
 => 💽 (cached) (user-defined packages) apt-get install screenfetch 0.0s
 => 💽 (cached) install system packages 0.0s
 => 💽 (cached) pip install jupyter 0.0s
 => 💽 (cached) install PyPI packages 0.0s
 => 💽 (cached) install envd-ssh 0.0s
 => 💽 (cached) install vscode plugin ms-python.python 0.0s
 => 💽 (cached) copy /oh-my-zsh /home/envd/.oh-my-zsh 0.0s
 => 💽 (cached) mkfile /home/envd/install.sh 0.0s
 => 💽 (cached) install oh-my-zsh 0.0s
...
# You are in the docker container for dev
(envd 🐳) ➜ mnist-dev git:(master) python3 ./main.py
...
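Every step in the log above is marked (cached). The general idea behind step-level caching in container builders can be sketched as follows; this is a generic illustration under stated assumptions, not envd's actual cache implementation. The step_key and run_steps helpers are hypothetical, and the cache is an in-memory set here, whereas a persistent cache would keep the keys and results on disk.

```python
import hashlib


def step_key(parent_key, command):
    """Derive a cache key from the previous step's key and this step's command."""
    return hashlib.sha256(f"{parent_key}\n{command}".encode()).hexdigest()


def run_steps(commands, cache):
    """Return (command, was_cached) pairs; `cache` is a set of known step keys."""
    results = []
    key = ""
    for command in commands:
        key = step_key(key, command)
        hit = key in cache
        if not hit:
            # A real builder would execute the step here before recording it.
            cache.add(key)
        results.append((command, hit))
    return results


cache = set()
steps = ["apt-get install curl", "pip install jupyter", "install oh-my-zsh"]

first = run_steps(steps, cache)    # nothing cached yet
second = run_steps(steps, cache)   # identical steps, everything is reused

# Changing the second step invalidates it and every step after it.
changed = run_steps(
    ["apt-get install curl", "pip install numpy", "install oh-my-zsh"], cache
)

for name, run in (("first", first), ("second", second), ("changed", changed)):
    print(name, [hit for _, hit in run])
```

Chaining each key to its parent is the design choice that matters: a step is reused only when everything before it is unchanged, which is why an early edit to the manifest tends to trigger more rebuilding than a late one.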
The Jupyter notebook service and the sshd server run inside the container. You can use Jupyter or the VSCode Remote-SSH extension to develop AI/ML models.
$ envd get envs
NAME   JUPYTER                SSH TARGET  CONTEXT  IMAGE      GPU   CUDA  CUDNN  STATUS       CONTAINER ID
mnist  http://localhost:9999  mnist.envd  /mnist   mnist:dev  true  11.6  8      Up 23 hours  74a9f1007004
$ envd get images
NAME       CONTEXT  GPU   CUDA  CUDNN  IMAGE ID      CREATED       SIZE
mnist:dev  /mnist   true  11.6  8      034ae55c5f4f  23 hours ago  7.28GB
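If you want to use these listings in a script, the text can be turned into dictionaries. The sketch below is hedged: it assumes the CLI pads columns with at least two spaces, that cell values contain only single spaces, and that no cell is empty. None of that is documented here, and parse_table is a hypothetical helper rather than part of envd.

```python
import re

# Sample taken from the `envd get envs` listing, with columns padded by two or more spaces.
SAMPLE = """\
NAME   JUPYTER                SSH TARGET  CONTEXT  IMAGE      GPU   CUDA  CUDNN  STATUS       CONTAINER ID
mnist  http://localhost:9999  mnist.envd  /mnist   mnist:dev  true  11.6  8      Up 23 hours  74a9f1007004
"""


def split_cells(line):
    """Split one line on runs of two or more whitespace characters."""
    return re.split(r"\s{2,}", line.strip())


def parse_table(text):
    """Parse a padded text table into a list of {column: value} dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = split_cells(lines[0])
    return [dict(zip(header, split_cells(line))) for line in lines[1:]]


envs = parse_table(SAMPLE)
for env in envs:
    print(env["NAME"], env["SSH TARGET"], env["STATUS"])
```

Splitting on two or more spaces keeps multi-word headers such as SSH TARGET and values such as Up 23 hours intact. If the real output can contain empty cells, slicing by the header's column offsets would be the safer approach.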
Features
Pause and resume
$ envd pause --env mnist
mnist
$ envd get envs
NAME   JUPYTER                SSH TARGET  CONTEXT  IMAGE      GPU   CUDA  CUDNN  STATUS               CONTAINER ID
mnist  http://localhost:9999  mnist.envd  /mnist   mnist:dev  true  11.6  8      Up 23 hours(Paused)  74a9f1007004
$ envd resume --env mnist
$ ssh mnist.envd
(envd 🐳) $ # The environment is resumed!
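When pause and resume are driven from a script, it helps to guarantee that the environment is resumed even if the work in between fails. The following is a minimal sketch of that pattern. It uses only the envd pause --env and envd resume --env commands shown above; the envd_argv and paused helpers are hypothetical, and the demo swaps in a fake runner so that it runs without envd installed.

```python
import subprocess
from contextlib import contextmanager


def envd_argv(action, env):
    """Build the argument list for `envd pause` or `envd resume`."""
    if action not in ("pause", "resume"):
        raise ValueError(f"unsupported action: {action}")
    return ["envd", action, "--env", env]


@contextmanager
def paused(env, run=subprocess.run):
    """Pause `env` for the duration of the with-block, then always resume it."""
    run(envd_argv("pause", env), check=True)
    try:
        yield
    finally:
        run(envd_argv("resume", env), check=True)


# Demo with a fake runner that records commands instead of executing them.
calls = []


def fake_run(argv, check=False):
    calls.append(argv)


with paused("mnist", run=fake_run):
    pass  # do other work while the environment is paused

print(calls)
```

To run it for real, drop the run=fake_run argument so that the default subprocess.run executes the commands.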
envd supports PyPI mirror and apt source configuration. You can configure them in build.envd, or in $HOME/.config/envd/config.envd to apply them to all environments.
$ cat ~/.config/envd/config.envd
ubuntu_apt(source="""
deb https://mirror.sjtu.edu.cn/ubuntu focal main restricted
deb https://mirror.sjtu.edu.cn/ubuntu focal-updates main restricted
deb https://mirror.sjtu.edu.cn/ubuntu focal universe
deb https://mirror.sjtu.edu.cn/ubuntu focal-updates universe
deb https://mirror.sjtu.edu.cn/ubuntu focal multiverse
deb https://mirror.sjtu.edu.cn/ubuntu focal-updates multiverse
deb https://mirror.sjtu.edu.cn/ubuntu focal-backports main restricted universe multiverse
deb http://archive.canonical.com/ubuntu focal partner
deb https://mirror.sjtu.edu.cn/ubuntu focal-security main restricted universe multiverse
""")
pip_index(url = "https://mirror.sjtu.edu.cn/pypi/web/simple")
vscode(plugins = [
    "ms-python.python",
    "github.copilot"
])
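Writing the apt source list by hand is repetitive, so a small generator can help when switching mirrors. The sketch below is an assumption-laden convenience, not part of envd: ubuntu_sources and render_config are hypothetical helpers, the output uses the compact one-line-per-suite form of the standard deb line format instead of the per-component lines shown above, and it omits the Canonical partner entry. Only ubuntu_apt(source=...) and pip_index(url=...) come from the config above.

```python
def ubuntu_sources(mirror, release="focal"):
    """Return deb lines for the main suites of `release` on `mirror`."""
    mirror = mirror.rstrip("/")
    components = "main restricted universe multiverse"
    suites = [
        release,
        f"{release}-updates",
        f"{release}-backports",
        f"{release}-security",
    ]
    return "\n".join(f"deb {mirror} {suite} {components}" for suite in suites)


def render_config(mirror, pip_index_url):
    """Render config text using the ubuntu_apt and pip_index declarations."""
    return (
        f'ubuntu_apt(source="""\n{ubuntu_sources(mirror)}\n""")\n'
        f'pip_index(url = "{pip_index_url}")\n'
    )


config = render_config(
    "https://mirror.sjtu.edu.cn/ubuntu",
    "https://mirror.sjtu.edu.cn/pypi/web/simple",
)
print(config)
```

The printed text can be reviewed and then saved as $HOME/.config/envd/config.envd.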
Join Us
envd is backed by TensorChord and licensed under Apache-2.0. We are actively hiring engineers to build developer tools for machine learning practitioners in open source.
Contribute
We welcome all kinds of contributions from the open-source community, individuals, and partners.