gomlx

module

Version: v0.11.3 | Published: Aug 29, 2024 | License: Apache-2.0

Repository

github.com/gomlx/gomlx

README

GoMLX, an Accelerated ML and Math Framework

📖 About GoMLX

GoMLX is a fast and easy-to-use set of Machine Learning and generic math libraries and tools. It can be seen as a PyTorch/Jax/TensorFlow for Go.

It uses just-in-time compilation to CPU and GPU (hopefully soon TPUs also) and is built on top of OpenXLA/PJRT, which itself uses LLVM to JIT-compile code. It's the same engine that powers Google's Jax and TensorFlow, and it has the same speed in many cases.

🎓 Quick Start: see our tutorial, or a guided example for Kaggle Dogs Vs Cats.

It was developed to be a full-featured ML platform for Go, and to make it easy to experiment with ML ideas -- see Long-Term Goals below.

It strives to be simple to read and reason about, leading the user to a correct and transparent mental model of what is going on (no surprises) -- aligned with the Go philosophy, at the cost of more typing (more verbosity) at times.

It is also incredibly flexible, and easy to extend and try non-conventional things: use it to experiment with new optimizer ideas, complex regularizers, funky multi-tasking, etc.

Documentation is kept up-to-date (if it is not well documented, it is as if the code is not there) and error messages are useful and try to make it easy to solve issues.

GoMLX is still under development, and should be considered experimental.

🗺️ Overview

GoMLX has many important components of an ML framework in place, from the bottom to the top of the stack. But it is still only a slice of what a major ML library/framework should provide (like TensorFlow, Jax or PyTorch).

It includes:

- Examples:
  - Adult/Census model;
  - Cifar-10 demo;
  - Dogs & Cats classifier demo;
  - IMDB Movie Review demo;
  - Diffusion model for Oxford Flowers 102 dataset (generates random flowers);
  - GNN model for OGBN-MAG (experimental);
  - Last, a trivial synthetic linear model, for those curious to see a barebones simple model.
- Pre-trained models to use: InceptionV3 (image model) -- more to come.
- Docker image with integrated JupyterLab and GoNB (a Go kernel for Jupyter notebooks).
- Just-In-Time (JIT) compilation using OpenXLA for CPUs and GPUs -- hopefully soon TPUs.
- Autograd: automatic differentiation -- only gradients for now, no Jacobian.
- Context: automatic variable management for ML models.
- ML layers library with some of the most popular machine learning "layers": FFN layers, activation functions, layer and batch normalization, convolutions, pooling, dropout, Multi-Head-Attention (for transformer layers), KAN (with B-Splines), PiecewiseLinear (for calibration and normalization), regularizations, FFT (reverse/differentiable), etc.
- Training library, with some pretty-printing, including plots for Jupyter notebooks using GoNB, a Go kernel.
  - Also, various debugging tools: collecting values for particular nodes for plotting, simply logging the value of nodes during training, stack-trace of the code where nodes are created (TODO: automatic printing of the stack-trace when the first NaN appears during training).
- SGD and Adam (AdamW and Adamax) optimizers.
- Various losses and metrics.
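
To make the Autograd item above more concrete, here is a minimal sketch of reverse-mode differentiation ("gradients only") in plain Go. It does not use GoMLX: the `Node` type and the `Const`, `Add`, `Mul` and `Gradient` functions are hypothetical names invented for this illustration, and it works on scalars, whereas GoMLX builds graphs over tensors and compiles them through XLA.

```go
package main

import "fmt"

// Node is one value in a tiny computation graph.
// Hypothetical type for illustration only; it is not GoMLX's graph.Node.
type Node struct {
	Value    float64
	Grad     float64 // d(output)/d(this node), filled in by Gradient
	inputs   []*Node
	backward func() // routes this node's Grad to its inputs
}

// Const creates a leaf node holding v.
func Const(v float64) *Node { return &Node{Value: v} }

// Add returns a+b and records how to propagate gradients back.
func Add(a, b *Node) *Node {
	out := &Node{Value: a.Value + b.Value, inputs: []*Node{a, b}}
	out.backward = func() {
		a.Grad += out.Grad
		b.Grad += out.Grad
	}
	return out
}

// Mul returns a*b and records how to propagate gradients back.
func Mul(a, b *Node) *Node {
	out := &Node{Value: a.Value * b.Value, inputs: []*Node{a, b}}
	out.backward = func() {
		a.Grad += b.Value * out.Grad
		b.Grad += a.Value * out.Grad
	}
	return out
}

// Gradient fills in Grad for every node that the scalar node out depends on.
func Gradient(out *Node) {
	// Build a topological order with a depth-first search.
	var order []*Node
	visited := map[*Node]bool{}
	var visit func(n *Node)
	visit = func(n *Node) {
		if visited[n] {
			return
		}
		visited[n] = true
		for _, in := range n.inputs {
			visit(in)
		}
		order = append(order, n)
	}
	visit(out)

	// Walk the graph from the output back to the leaves.
	out.Grad = 1
	for i := len(order) - 1; i >= 0; i-- {
		if order[i].backward != nil {
			order[i].backward()
		}
	}
}

func main() {
	x, y := Const(3), Const(4)
	f := Add(Mul(x, y), Mul(x, x)) // f(x, y) = x*y + x*x
	Gradient(f)
	fmt.Println(f.Value, x.Grad, y.Grad) // prints: 21 10 3
}
```

Here df/dx = y + 2x = 10 and df/dy = x = 3, which matches what the backward pass accumulates. A Jacobian would need one such pass per output component, which is why "gradients only" is the simpler starting point.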

👥 Support

Q&A and discussions
Issues
Random brainstorming on projects: just start a Q&A and I'm happy to meet on Discord or over a video call.

🛠️ + ⚙️ Installation

TL;DR: Two options: (1) use the Docker image; (2) use the pre-built release, available only for Linux (works in Windows WSL): install gopjrt (see installation instructions), optionally with Nvidia's CUDA support, and run sudo apt install hdf5-tools.

GoMLX is mostly a normal Go library, but it depends on gopjrt, which includes C wrappers to XLA (itself a C++ code base). Installing gopjrt is relatively straightforward: follow the installation instructions (notice the optional Nvidia CUDA support, if you are interested).

Releases are for Linux only for now. They do work well with WSL (Windows Subsystem for Linux) in Windows. I don't have a Mac, but XLA works for Mac/DarwinOS (on arm64), so gopjrt should compile as well and GoMLX should work (contributions are welcome 😃).

🐳 Pre-built Docker

The easiest way to start playing with it is to pull the Docker image that includes GoMLX + JupyterLab + GoNB (a Go kernel for Jupyter) and Nvidia's CUDA runtime (for optional GPU support) pre-installed -- it is a ~5 GB download.

From a directory you want to make visible in Jupyter, run the commands below. For GPU support, add the flag --gpus all to the docker run command.

docker pull janpfeifer/gomlx_jupyterlab:latest
docker run -it --rm -p 8888:8888 -v "${PWD}":/home/jupyter/work janpfeifer/gomlx_jupyterlab:latest

It will display in the terminal a URL starting with 127.0.0.1:8888 (including the required secret token) that you can open in your browser.

You can open and interact with the tutorial from there; it is included in the Docker image under the directory Projects/gomlx/examples/tutorial.

More details on the docker here.

🧭 Tutorial

See the tutorial here. It covers a bit of everything.

After that look at the demos in the examples/ directory.

The library itself is well documented (please open issues if something is missing), and the code is not too hard to read. Godoc is available on pkg.go.dev.

Finally, feel free to ask questions: time allowing (when not at work) I'm always happy to help -- I created groups.google.com/g/gomlx-discuss, or use the GitHub discussions page.

🎯 Long-term Goals

- Building and training models in Go -- as opposed to Python (or some other language) -- with focus on:
  - Being simple to read and reason about, leading the user to a correct and transparent mental model of what is going on, even if that means being more verbose when writing.
  - Clean, separable APIs: individual APIs should be self-contained and decoupled where possible.
  - Composability: any component should be replaceable, so it can be customized and experimented with. That sometimes means more coding (there is not one magic train object that does everything), but it makes it clear what is happening, and it's easy to replace parts with third-party versions or something custom.
  - Up-to-date documentation: if the documentation is not there or if it's badly written, it's as if the code was not there either.
  - Clear and actionable error reporting.
- To be a productive research and educational platform to experiment with new ML ideas and learn.
  - Support mirrored training on multiple devices and various forms of distributed training (model and/or data parallelism), in particular to support large language models and similarly large model training.
- To be a robust and reliable platform for production. Some sub-goals:
  - Support modern accelerator hardware like TPUs and GPUs.
  - Save models to industry tools like TensorFlow Serving.
  - Import pre-trained models from Hugging Face Hub and TensorFlow Hub where possible.
  - Compile models to binary as in C-libraries and/or WebAssembly, to be linked and consumed (inference) anywhere (any language).

🤝 Collaborating

The project welcomes contributions from anyone interested. Many parts are not yet set in stone, so there is plenty of room for improvements and redesigns for those interested and with good experience in Go, Machine Learning and APIs in general. See the TODO file for inspiration.

No governance guidelines have been established yet.

🚀 Advanced Topics

⚖️ License

Copyright 2024 Jan Pfeifer

GoMLX is distributed under the terms of the Apache License Version 2.0. Unless it is explicitly stated otherwise, any contribution intentionally submitted for inclusion in this project shall be licensed under Apache License Version 2.0 without any additional terms or conditions.

Directories

Path	Synopsis
backends	Package backends defines the interface that a computation building and execution system needs to implement to be used by GoMLX.
backends/xla	Package xla implements the XLA/PJRT (https://openxla.org/) based backend for GoMLX.
cmd
cmd/backends_generator	backends_generator generates parts of the backends.Builder interface based on the github.com/gomlx/gopjrt/xlabuilder implementation.
cmd/backends_generator/parsexlabuilder	Package parsexlabuilder parses the xlabuilder API to enumerate graph building functions, and the `op_types.txt` file to get a list of the supported ops.
cmd/constraint_generator	constraint_generator prints out various lists of constraints used by generics, which can then be copy&pasted into the code.
cmd/gomlx_checkpoints	gomlx_checkpoints reports back on model size (and memory) usage (--summary), individual variables shapes and sizes (--vars), hyperparameters used with the model (--params) or metrics collected during model training (--metrics, --metrics_labels).
cmd/graph_generator
cmd/graph_generator/parsebackends	Package parsebackends parses the backends.Builder API to enumerate graph building methods.
cmd/xla_generator	xla_generator generates the xla.Backend implementation based on the github.com/gomlx/gopjrt/xlabuilder implementation.
examples
examples/adult	Package adult provides an `InMemoryDataset` implementation for the UCI Adult Census dataset.
examples/adult/demo	UCI-Adult demo trainer.
examples/cifar	Package cifar provides a library of tools to download and manipulate the Cifar-10 dataset.
examples/cifar/demo	CIFAR-10 demo trainer.
examples/discretekan
examples/dogsvscats
examples/dogsvscats/demo	demo for Dogs vs Cats library: you can run this program in 3 different ways:
examples/imdb	Package imdb contains code to download and prepare datasets with the IMDB Dataset of 50k Movie Reviews.
examples/imdb/demo	IMDB Movie Review library (imdb) demo: you can run this program in 4 different ways:
examples/linear	Linear generates random synthetic data, based on some linear model + noise.
examples/notebook	Package notebook allows one to check if running within a notebook.
examples/notebook/bashkernel	Package bashkernel implements tools to output rich content to a Jupyter notebook running the bash_kernel (https://github.com/takluyver/bash_kernel).
examples/notebook/bashkernel/chartjs
examples/notebook/gonb/margaid	Package margaid implements automatic plotting of all metrics registered in a trainer, using the Margaid library (https://github.com/erkkah/margaid/) to draw SVG, and GoNB (https://github.com/janpfeifer/gonb/) to display it in a Jupyter Notebook.
examples/notebook/gonb/plotly	Package plotly uses GoNB plotly support (`github.com/janpfeifer/gonb/gonbui/plotly`) to plot dynamically while training, or to quickly plot previously saved results from a checkpoints directory.
examples/notebook/gonb/plots	Package plots defines common types and utilities for the different plot libraries.
examples/ogbnmag	Package ogbnmag provides a `Download` method for the corresponding dataset, and some dataset tools.
examples/ogbnmag/demo
examples/ogbnmag/fnn	Package fnn implements a feed-forward neural network for the OGBN-MAG problem.
examples/ogbnmag/gnn	Package gnn implements generic GNN modeling based on [TF-GNN MtAlbis].
examples/ogbnmag/sampler
examples/oxfordflowers102	Package oxfordflowers102 provides tools to download and cache the dataset and a `train.Dataset` implementation that can be used to train models using GoMLX (http://github.com/gomlx/gomlx/).
examples/oxfordflowers102/diffusion	Package diffusion contains an example diffusion model, trained on the Oxford Flowers 102 dataset.
examples/oxfordflowers102/diffusion/train
graph	Package graph is the core package for GoMLX.
graph/graphtest	Package graphtest holds test utilities for packages that depend on the graph package.
graph/nanlogger	Package nanlogger collects `graph.Node` objects to monitor for `NaN` ("not-a-number") or `Inf` (infinity) values.
ml
ml/context	Package context defines the Context and Variable types: Context organizes variables, and Variable manages the storage of values typically used as variables.
ml/context/checkpoints	Package checkpoints implements checkpoint management: saving and loading of checkpoints.
ml/context/ctxtest	Package ctxtest holds test utilities for packages that depend on the context package.
ml/context/initializers	Package initializers includes several weight initializers, to be used with context.
ml/data	Package data is a collection of tools that facilitate data loading and preprocessing.
ml/data/hdf5	Package hdf5 provides a trivial API to access HDF5 file contents.
ml/layers	Package layers holds a collection of common modeling layers.
ml/layers/activations	Package activations implements several common activations, and includes a generic Apply method to apply an activation by its type.
ml/layers/batchnorm	Package batchnorm implements a batch normalization layer, and associated tools.
ml/layers/bsplines	Package bsplines provides a GoMLX version of github.com/gomlx/bsplines: it provides evaluation of B-spline curves, which can be used as layers.
ml/layers/fnn	Package fnn implements a generic FNN (Feedforward Neural Network) with various configurations.
ml/layers/kan	Package kan implements generic Kolmogorov–Arnold Networks, as described in https://arxiv.org/pdf/2404.19756
ml/layers/regularizers	Package regularizers adds tools to facilitate adding regularization to the weights learned.
ml/train	Package train holds tools to help run a training loop.
ml/train/commandline	Package commandline contains convenience UI training tools for the command line.
ml/train/losses	Package losses has several standard losses that implement the train.LossFn interface.
ml/train/metrics	Package metrics holds a library of metrics and defines
ml/train/optimizers	Package optimizers implements a collection of ML optimizers that can be used by train.Trainer, or by themselves.
models
models/inceptionv3	Package inceptionv3 provides a pre-trained InceptionV3 model, or simply its structure.
types	Package types is mostly a top-level directory for GoMLX's important types.
types/shapes	Package shapes defines Shape and DType and associated tools.
types/tensors	Package tensors implements a `Tensor`, a representation of a multi-dimensional array.
types/tensors/images	Package images provides several functions to transform images back and forth from tensors.
types/xslices	Package xslices provides functionality missing from the slices package.