spago

command module
v0.1.0
Published: Dec 9, 2020 License: BSD-2-Clause Imports: 5 Imported by: 0

README


If you like the project, please ★ star this repository to show your support! 🤩

A beautiful and maintainable machine learning library written in Go. It is designed to support relevant neural architectures in Natural Language Processing.

spaGO is compatible with 🤗 BERT-like Transformers and with the Flair sequence labeler architecture.

Features

Automatic differentiation
  • You write the forward(), it does all the backward() derivatives for you (see the sketch after this feature list):
    • Define-by-Run (default, just like PyTorch does)
    • Define-and-Run (similar to the static graph of TensorFlow)
Optimization methods
  • Gradient descent:
    • Adam, RAdam, RMS-Prop, AdaGrad, SGD
  • Differential Evolution
Neural networks
  • Feed-forward models (Linear, Highway, Convolution, ...)
  • Recurrent models (LSTM, GRU, BiLSTM...)
  • Attention mechanisms (Self-Attention, Multi-Head Attention, ...)
  • Recursive auto-encoders
Natural Language Processing
  • Memory-efficient Word Embeddings (with badger key–value store)
  • Character Language Models
  • Recurrent Sequence Labeler with CRF on top (e.g. Named Entity Recognition)
  • Transformer models (BERT-like)
    • Masked language model
    • Next sentence prediction
    • Token Classification
    • Text Classification (e.g. Sentiment Analysis)
    • Question Answering
    • Textual Entailment
    • Text Similarity
Compatible with pre-trained state-of-the-art neural models, such as 🤗 BERT-like Transformers and the Flair sequence labeler.
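
To give a feel for the define-by-run idea (you write the forward pass as ordinary code; the library records each operation and replays it backwards to compute gradients), here is a minimal, self-contained sketch in plain Go. Nothing below is spaGO's actual API: the `node` type and the `mul`/`add` helpers are hypothetical names used only to illustrate the mechanism.

```go
package main

import "fmt"

// node is a scalar value in a dynamically built computation graph.
// Each operation records a closure that propagates gradients backwards.
type node struct {
	value    float64
	grad     float64
	backward func()
}

func newNode(v float64) *node { return &node{value: v} }

// mul records z = a * b together with how to push dL/dz back to a and b.
func mul(a, b *node) *node {
	z := newNode(a.value * b.value)
	z.backward = func() {
		a.grad += b.value * z.grad
		b.grad += a.value * z.grad
	}
	return z
}

// add records z = a + b.
func add(a, b *node) *node {
	z := newNode(a.value + b.value)
	z.backward = func() {
		a.grad += z.grad
		b.grad += z.grad
	}
	return z
}

func main() {
	// Forward pass: y = w*x + b, built "by run" as ordinary Go code.
	w, x, b := newNode(2), newNode(3), newNode(1)
	wx := mul(w, x)
	y := add(wx, b)

	// Backward pass: seed dL/dy = 1 and replay the recorded closures in reverse
	// creation order (a real engine would topologically sort the graph for us).
	y.grad = 1
	for _, n := range []*node{y, wx} {
		n.backward()
	}
	fmt.Println(w.grad, x.grad, b.grad) // prints: 3 2 1
}
```

A define-and-run (static graph) mode works on the same recorded structure; the difference is simply that the graph is built once up front and then reused for many forward/backward passes.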

Documentation

Usage

Requirements:

Clone this repo or get the library:

go get -u github.com/nlpodyssey/spago

To get started, you can find some tutorials on the Wiki about the Machine Learning Framework.

Several demo programs can be used to tour spaGO's current capabilities. The demos are documented on the Wiki.

There is also a repo with handy examples, such as MNIST classification.

Project Goals

Is spaGO right for me?

Are you looking for a highly optimized, scalable, battle-tested, production-ready machine-learning/NLP framework? Are you also a Python lover who enjoys manipulating tensors? If so, you won't find much to your satisfaction here.

PyTorch, plus the wonders of our friends at Hugging Face, is the answer you seek!

If instead you prefer a statically typed, compiled programming language, and what you need is a simpler yet well-structured machine-learning framework that is almost ready to use, then you are in the right place!

The idea is that you could have written spaGO. Most of it, from the computational graph to the LSTM, is straightforward Go code :)

Why spaGO?

I've been writing more or less the same software for almost 20 years. I guess it's my way of learning a new language. Now it's Go's turn, and spaGO is the result of a few days of pure fun!

Let me explain a little further. It's not precisely the very same software I've been writing for 20 years: I've been working in NLP for that long, experimenting with different approaches and techniques, and therefore writing software in the same field. I've always taken satisfaction in limiting the use of third-party dependencies, writing the algorithms that interest me most firsthand. So, I took the opportunity to speed up my understanding of the deep learning techniques and methodologies underlying cutting-edge NLP results, implementing them almost from scratch in straightforward Go code. I'm aware that reinventing the wheel is an anti-pattern; nevertheless, I wanted to build something with my own concepts in my own (Italian) style: that's the way I learn best, and it could be your best chance to understand what's going on under the hood of artificial intelligence :)

When I start programming in a new language, I usually do not know much of it. I often combine the techniques I have acquired by writing in other languages and other paradigms, so some choices may not be the most idiomatic... but who cares, right?

It's with this approach that I jumped on Go and created spaGO: a work-in-progress, (hopefully) understandable, easy-to-use library for machine learning and natural language processing.

What direction did you take for the development of spaGO?

I started spaGO to deepen my first-hand understanding of the mechanisms underlying a machine learning framework. In doing so, I thought it was an excellent opportunity to design the library so that non-experts, too, could use and understand such algorithms.

In my experience, the first barrier to (deep) machine learning for developers who do not enjoy mathematics, at least not too much, is getting familiar with the use of tensors rather than understanding neural architectures. Well, in spaGO, we only use well-known 2D matrices, with which we can represent vectors and scalars too. That's all we need (performance aside). You won't lose sleep anymore over tensor axes when figuring out how to do math operations.
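
As a rough illustration of this choice (a hypothetical minimal type, not spaGO's actual mat API), a column vector is just an n×1 matrix and a scalar a 1×1 matrix, so a single dense 2D representation with plain nested loops covers all cases:

```go
package main

import "fmt"

// dense is a hypothetical row-major 2D matrix; vectors are n×1, scalars 1×1.
type dense struct {
	rows, cols int
	data       []float64
}

func newDense(rows, cols int, data []float64) *dense {
	return &dense{rows: rows, cols: cols, data: data}
}

// mul computes the matrix product a×b with plain nested loops.
func mul(a, b *dense) *dense {
	out := newDense(a.rows, b.cols, make([]float64, a.rows*b.cols))
	for i := 0; i < a.rows; i++ {
		for j := 0; j < b.cols; j++ {
			var sum float64
			for k := 0; k < a.cols; k++ {
				sum += a.data[i*a.cols+k] * b.data[k*b.cols+j]
			}
			out.data[i*out.cols+j] = sum
		}
	}
	return out
}

func main() {
	w := newDense(2, 3, []float64{1, 0, 2, 0, 1, 3}) // 2×3 weight matrix
	x := newDense(3, 1, []float64{1, 2, 3})          // a "vector" is just a 3×1 matrix
	bias := newDense(1, 1, []float64{0.5})           // a "scalar" is just a 1×1 matrix
	y := mul(w, x)                                   // 2×1 result, no tensor axes involved
	fmt.Println(y.data, bias.data[0])                // [7 11] 0.5
}
```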

Since it's a counter-trend decision, let me argue it a little more. It has happened a few times that friends and colleagues, who are super cool full-stack developers, tried to understand the NLP algorithms I was programming in PyTorch. Sometimes they gave up just because "the forward() method doesn't look like the usual code" to them.

Honestly, I don't find it hard to believe that by combining Python's dynamism with the versatility of tensors, the flow of a program can become hard to digest. It is undoubtedly essential to devote a good amount of time to reading the documentation, which may not be immediately available. Hence, you find yourself forced to inspect the content of the variables at runtime with your favorite IDE (PyCharm, of course). This happens in general, but I believe it happens in machine learning in particular.

In other words, I wanted to limit as much as possible the use of tensors larger than two dimensions, preferring built-in types such as slices and maps. For example, batches are explicit slices of nodes, not part of the same forward() computation. Too much detail here, sorry. In the end, I guess we do gain static code analysis this way, by shifting the focus from tensor operations back to traditional control flow. Of course, the type checker still can't verify the correct shapes of matrices and the like; that still requires runtime checks (panics and so on). I agree that it is hard to see where to draw the line, but so far, I'm pretty happy with my decision.
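
To make that concrete, here is a hypothetical sketch (again, not spaGO's API): the batch is a plain Go slice, and the "batched" forward pass is an ordinary loop that the compiler, linters, and the type checker can follow like any other control flow.

```go
package main

import "fmt"

// forward is a stand-in for a model's per-example computation.
func forward(x []float64) float64 {
	var sum float64
	for _, v := range x {
		sum += v
	}
	return sum
}

func main() {
	// A batch is just a slice of examples, not an extra tensor dimension.
	batch := [][]float64{
		{1, 2, 3},
		{4, 5, 6},
	}

	// The "batched" computation is an explicit, readable loop.
	outputs := make([]float64, 0, len(batch))
	for _, example := range batch {
		outputs = append(outputs, forward(example))
	}
	fmt.Println(outputs) // [6 15]
}
```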

Does spaGO support GPU?

Sadly, since it does not use tensors, spaGO is not GPU or TPU friendly by design. You bet I'm going to do some experiments integrating CUDA, but I can already tell you that I will not reach satisfactory performance levels.

In spaGO, using slices of (slices of) matrices, we have to "loop" often to do mathematical operations, whereas with tensors they are performed in one go. Any time your code has a loop, it is not GPU or TPU friendly.

The first thing mainstream tensor-based machine-learning frameworks such as PyTorch and TensorFlow want to do is convert whatever you're doing into a big matrix-multiplication problem, which is where the GPU does its best. Yeah, that's an overstatement, but not so far from reality. Storing all data in tensors and applying batched operations to them is the way to go for hardware acceleration. On GPU it's a must, and even on CPU it can give a 10x speedup or more with cache-aware BLAS libraries.
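
For contrast, here is the kind of rewrite those frameworks push you towards, sketched with plain slices and hypothetical helpers: the per-example mat-vec loop becomes a single matrix-matrix product with the whole batch stacked as columns, and it is that one call that maps well onto a BLAS routine or a GPU kernel.

```go
package main

import "fmt"

// matVec multiplies an m×n matrix (row-major) by an n-vector: one example at a time.
func matVec(w []float64, m, n int, x []float64) []float64 {
	y := make([]float64, m)
	for i := 0; i < m; i++ {
		for k := 0; k < n; k++ {
			y[i] += w[i*n+k] * x[k]
		}
	}
	return y
}

// matMul multiplies an m×n matrix by an n×b matrix: the whole batch in one go.
// This single call is what tensor frameworks hand to a BLAS library or a GPU kernel.
func matMul(w []float64, m, n int, x []float64, b int) []float64 {
	y := make([]float64, m*b)
	for i := 0; i < m; i++ {
		for j := 0; j < b; j++ {
			for k := 0; k < n; k++ {
				y[i*b+j] += w[i*n+k] * x[k*b+j]
			}
		}
	}
	return y
}

func main() {
	w := []float64{1, 0, 2, 0, 1, 3} // 2×3 weights

	// Loop style: one mat-vec per example (readable, but not accelerator friendly).
	for _, x := range [][]float64{{1, 2, 3}, {4, 5, 6}} {
		fmt.Println(matVec(w, 2, 3, x)) // [7 11], then [16 23]
	}

	// Batched style: both examples stacked as the columns of a 3×2 matrix.
	xBatch := []float64{1, 4, 2, 5, 3, 6}
	fmt.Println(matMul(w, 2, 3, xBatch, 2)) // [7 16 11 23]
}
```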

Beyond that, I think there are a lot of basic design improvements that would be necessary before spaGO could be fit for mainstream use. A lot of boilerplate could go away by using reflection, or more simply by careful engineering. It's perfectly normal; the more I program in Go, the more choices I find that I would revisit.

Is spaGO stable?

We're not at v1.0.0 yet, so spaGO is currently an experimental work in progress. It's pretty easy to get your hands on, though, so you might want to use it in your real applications. Early adopters may use it in production today, as long as they understand and accept that spaGO is not fully tested and that its APIs will change (maybe extensively).

If you're wondering, I haven't used spaGO in production myself yet, but I plan to do the first integration tests soon.

That said, spaGO has been running smoothly for a couple of months now in a system that analyzes thousands of news items a day!

Contact

I encourage you to open an issue. This helps the community grow.

If you really want to write to me privately, please email Matteo Grella with your questions or comments.

Acknowledgments

spaGO is a personal project that is part of the open-source NLP Odyssey initiative, started by members of the EXOP team. I would therefore like to thank EXOP GmbH here, which fully supports development by promoting the project and giving it increasing importance.

Documentation

There is no documentation for this package.

Directories

Path Synopsis
cmd
ner
This is the first attempt to launch a sequence labeling server from the command line.
embeddings
graphviz module
nn
approxlinear Module
pkg
mat
mat/internal/asm/f64
Package f64 provides float64 vector primitives.
ml/ag/fn
SparseMax implementation based on https://github.com/gokceneraslan/SparseMax.torch
ml/nn/birnncrf
Bidirectional Recurrent Neural Network (BiRNN) with a Conditional Random Fields (CRF) on top.
ml/nn/bls
Implementation of the Broad Learning System (BLS) described in "Broad Learning System: An Effective and Efficient Incremental Learning System Without the Need for Deep Architecture" by C. L. Philip Chen and Zhulin Liu, 2017.
ml/nn/gnn/slstm
slstm Reference: "Sentence-State LSTM for Text Representation" by Zhang et al, 2018.
ml/nn/gnn/startransformer
StarTransformer is a variant of the model introduced by Qipeng Guo, Xipeng Qiu et al.
ml/nn/lshattention
LSH-Attention as in `Reformer: The Efficient Transformer` by N. Kitaev, Ł. Kaiser, A. Levskaya.
ml/nn/normalization/adanorm
Reference: "Understanding and Improving Layer Normalization" by Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin (2019).
ml/nn/normalization/fixnorm
Reference: "Improving Lexical Choice in Neural Machine Translation" by Toan Q. Nguyen and David Chiang (2018) (https://arxiv.org/pdf/1710.01329.pdf)
ml/nn/normalization/layernorm
Reference: "Layer normalization" by Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton (2016).
ml/nn/normalization/layernormsimple
Reference: "Understanding and Improving Layer Normalization" by Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin (2019).
ml/nn/normalization/rmsnorm
Reference: "Root Mean Square Layer Normalization" by Biao Zhang and Rico Sennrich (2019).
ml/nn/rae
Implementation of the recursive auto-encoder strategy described in "Towards Lossless Encoding of Sentences" by Prato et al., 2019.
ml/nn/rc
This package contains built-in Residual Connections (RC).
ml/nn/rec/horn
Higher Order Recurrent Neural Networks (HORN)
ml/nn/rec/lstmsc
LSTM enriched with a PolicyGradient to enable Dynamic Skip Connections.
ml/nn/rec/mist
Implementation of the MIST (MIxed hiSTory) recurrent network as described in "Analyzing and Exploiting NARX Recurrent Neural Networks for Long-Term Dependencies" by Di Pietro et al., 2018 (https://arxiv.org/pdf/1702.07805.pdf).
ml/nn/rec/nru
Implementation of the NRU (Non-Saturating Recurrent Units) recurrent network as described in "Towards Non-Saturating Recurrent Units for Modelling Long-Term Dependencies" by Chandar et al., 2019.
ml/nn/rec/rla
RLA (Recurrent Linear Attention) "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention" by Katharopoulos et al., 2020.
ml/nn/rec/srnn
srnn implements the SRNN (Shuffling Recurrent Neural Networks) by Rotman and Wolf, 2020.
ml/nn/syntheticattention
This is an implementation of the Synthetic Attention described in: "SYNTHESIZER: Rethinking Self-Attention in Transformer Models" by Tay et al., 2020.
nlp/charlm
CharLM implements a character-level language model that uses a recurrent neural network as its backbone.
nlp/contextualstringembeddings
Implementation of the "Contextual String Embeddings" of words (Akbik et al., 2018).
nlp/evolvingembeddings
A word embedding model that evolves itself by dynamically aggregating contextual embeddings over time during inference.
nlp/sequencelabeler
Implementation of a sequence labeling architecture composed by Embeddings -> BiRNN -> Scorer -> CRF.
nlp/stackedembeddings
StackedEmbeddings is a convenient module that stacks multiple word embedding representations by concatenating them.
nlp/tokenizers
This package is an interim solution while developing `gotokenizers` (https://github.com/nlpodyssey/gotokenizers).
nlp/tokenizers/basetokenizer
BaseTokenizer is a very simple tokenizer that splits per white-spaces (and alike) and punctuation symbols.
nlp/transformers/bert
Reference: "Attention Is All You Need" by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin (2017) (http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf).
