gonet

package module
v0.0.0-...-5b2c132
Published: May 11, 2026 License: MIT Imports: 9 Imported by: 0

README

gonet

Motivation

gonet is a neural network library written in Go as a project for learning, experimentation, and fun. It is not intended for production use. The goal is to make neural-network internals easier to inspect by implementing the core pieces directly: fully connected layers, embeddings, attention, normalization, losses, training loops, and automatic differentiation.

The repository contains two independent implementations:

  • The root gonet package builds dynamic computation graphs and performs reverse-mode automatic differentiation (a short sketch follows this list).
  • The arrimpl package contains an array-based fully connected network with explicit forward and backward propagation, meant as a cross-check for the graph implementation.
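
For a feel of the computation-graph API, here is a minimal sketch of building a graph and differentiating through it. The import path is a placeholder, and using NewInputNode for leaf values is an assumption based on the index below:

package main

import (
	"fmt"

	gonet "example.com/gonet" // placeholder import path; the real module path is not shown here
)

func main() {
	// Build the graph y = a*b + c.
	a := gonet.NewInputNode(2, "a")
	b := gonet.NewInputNode(3, "b")
	c := gonet.NewInputNode(1, "c")
	y := gonet.Plus(gonet.Multiply(a, b), c)

	// One forward pass, then reverse-mode backpropagation.
	y.ForwardBackward()

	fmt.Println(y.V()) // value of y: 2*3 + 1 = 7
	fmt.Println(a.G()) // gradient dy/da = b = 3
}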

Examples

The examples/ directory contains small demonstration projects that exercise different parts of the framework:

  • Binary classifier - trains a small neural-net binary classifier inspired by Karpathy's micrograd introduction. It can run with either the computation-graph implementation or the array-based MLP.
  • Digit OCR - trains a digit classifier on sklearn digits or MNIST-style data, again with both graph and array-based training modes.
  • Word embedding - trains a tiny word embedding model and shows how similar words can converge toward similar learned vectors.
  • Makemore neural bigram - implements a character-level neural bigram language model, with both one-hot linear and embedding-based variants.
  • Makemore neural quadgram - implements an MLP character language model in the style of Bengio et al. 2003, using multiple previous characters as context.
  • Makemore WaveNet - builds a character language model with a WaveNet-like hierarchical structure.
  • Makemore decoder-only transformer - trains a small character-level GPT-style model with masked self-attention, multi-head attention, attention blocks, and token generation.

Decoder-Only Transformer Highlight

The decoder-only transformer example is currently the most sophisticated one in this repository. It demonstrates that this small Go deep-learning framework can express and train a relatively complex model: token embeddings, positional/context handling, masked self-attention, multi-head attention, stacked transformer blocks, and autoregressive character generation.
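
For orientation, constructing the whole model reduces to a single call. A minimal sketch, with illustrative hyperparameters (not taken from the example itself) and a placeholder import path:

package main

import gonet "example.com/gonet" // placeholder import path

func main() {
	// All hyperparameter values below are illustrative assumptions.
	model := gonet.DecoderOnlyTransformer(
		65, // vocabSize: distinct characters in the corpus
		32, // ctxLen: context window length, in tokens
		4,  // layerNum: stacked transformer blocks
		4,  // headNum: attention heads per block
		64, // embDim: embedding dimension
	)

	// model satisfies Model, so it can be handed to Train together
	// with samples, a util.TrainConfig, and a loss function.
	_ = model
}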

The example is still primarily educational. Performance is not competitive with production deep-learning frameworks (e.g., PyTorch), but that is also the point: the model is implemented in a way that keeps the mechanics visible and hackable instead of hiding them behind highly optimized tensor kernels.

Documentation

Overview

Package gonet provides a computation-graph-based implementation of the neural-network forward and backward propagation algorithms.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NodeValues

func NodeValues(ns []*Node) []float64

func SingleLinear

func SingleLinear(inputSize int, bias bool) *singleLinear

func Train

func Train(model Model, samples []util.Sample, cfg *util.TrainConfig, lf LossFunction) time.Duration

Types

type E2ELoss

type E2ELoss func([]util.Sample) *Node

func TrainLossFunc

func TrainLossFunc(model FeedForwarder, lf LossFunction) E2ELoss

type E2EPredictLoss

type E2EPredictLoss func([]util.Sample) float64

func PredictLossFunc

func PredictLossFunc(model FeedForwarder, lf LossFunction) E2EPredictLoss

type Embedding

type Embedding struct {
	// contains filtered or unexported fields
}

func NewEmbedding

func NewEmbedding(vocabSize, dim int, initNorm bool) *Embedding

func (*Embedding) E

func (e *Embedding) E(index int) []*Node

func (*Embedding) EmbeddingFeed

func (e *Embedding) EmbeddingFeed(in []*Node) (out []*Node)

func (*Embedding) S

func (e *Embedding) S(index int) string

func (*Embedding) Sub

func (e *Embedding) Sub(i, j int) (diff []float64)

func (*Embedding) UnembeddingFeed

func (e *Embedding) UnembeddingFeed(in, bias []*Node) (out []*Node)
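
A minimal sketch of the Embedding accessors. The meaning of initNorm and the exact semantics of Sub are assumptions read off the signatures:

package main

import (
	"fmt"

	gonet "example.com/gonet" // placeholder import path
)

func main() {
	// vocabSize=100 tokens, dim=8 values per embedding; initNorm is
	// assumed to request normalized initialization.
	emb := gonet.NewEmbedding(100, 8, true)

	vec := emb.E(42)      // assumed: the 8 embedding nodes for token index 42
	diff := emb.Sub(1, 2) // assumed: element-wise difference of embeddings 1 and 2

	fmt.Println(len(vec), len(diff)) // 8 8
}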

type FeedForwarder

type FeedForwarder interface {
	Feed([]*Node) []*Node
}

type Layer

type Layer interface {
	FeedForwarder
	Parameters() []util.Parameter
	Name() string
}

func AttentionBlockLayer

func AttentionBlockLayer(embDim, headNum int, buildAttention func(int, int) Layer) Layer

func EmbeddingLayer

func EmbeddingLayer(vocabSize, dim int) Layer

func EmbeddingLayerFrom

func EmbeddingLayerFrom(emb *Embedding) Layer

func KQVLayer

func KQVLayer(embDim, headSize int) Layer

func LayerNormLayer

func LayerNormLayer(dim int) Layer

func LinearLayer

func LinearLayer(fanIn, fanOut int, bias bool) Layer

func MaskedSelfAttentionLayer

func MaskedSelfAttentionLayer(embDim, headSize int) Layer

func MultiHeadAttentionLayer

func MultiHeadAttentionLayer(embDim, headNum int, buildAttention func(int, int) Layer) Layer
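
Because MaskedSelfAttentionLayer has the signature func(int, int) Layer, it can be passed directly as the buildAttention callback. A minimal sketch with illustrative dimensions:

package main

import gonet "example.com/gonet" // placeholder import path

func main() {
	// embDim=64 split across headNum=4 heads; presumably each head
	// receives headSize = embDim/headNum, though that detail is an
	// assumption about the wiring.
	attn := gonet.MultiHeadAttentionLayer(64, 4, gonet.MaskedSelfAttentionLayer)
	_ = attn
}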

func PositionalEmbeddingLayer

func PositionalEmbeddingLayer(vocabSize, ctxLen, dim int) Layer

func ReluLayer

func ReluLayer() Layer

func SigmoidLayer

func SigmoidLayer() Layer

func SoftmaxLayer

func SoftmaxLayer(t float64) Layer

func TanhLayer

func TanhLayer() Layer

func UnembeddingLayer

func UnembeddingLayer(dim, vocabSize int, bias bool) Layer

func UnembeddingLayerFrom

func UnembeddingLayerFrom(emb *Embedding, bias bool) Layer

type LossFunction

type LossFunction func(actual, predicted []*Node) *Node

type Model

type Model interface {
	Layer
	util.Predictor
}

func DecoderOnlyTransformer

func DecoderOnlyTransformer(vocabSize, ctxLen, layerNum, headNum, embDim int) Model

func EmbeddingModel

func EmbeddingModel(vocabSize, dim int) Model

func LinearModel

func LinearModel(fanIn, fanOut int, bias bool) Model

func SequentialModel

func SequentialModel(layers ...Layer) Model
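
A minimal sketch of composing a small MLP from the layer constructors above. The import path is a placeholder and the naming format passed to NewInputNodeBatch is a guess:

package main

import (
	"fmt"

	gonet "example.com/gonet" // placeholder import path
)

func main() {
	// A small MLP: 3 inputs -> 4 hidden units (tanh) -> 1 output.
	mlp := gonet.SequentialModel(
		gonet.LinearLayer(3, 4, true), // fanIn=3, fanOut=4, with bias
		gonet.TanhLayer(),
		gonet.LinearLayer(4, 1, true),
	)

	// "x%d" is a guess at NewInputNodeBatch's naming convention.
	xs := gonet.NewInputNodeBatch(3, "x%d", false)
	out := mlp.Feed(xs)
	fmt.Println(len(out)) // 1 output node
}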

type Node

type Node struct {
	// contains filtered or unexported fields
}

func CrossEntropyLoss

func CrossEntropyLoss(actual, predicted []*Node) *Node

CrossEntropyLoss already takes the softmax activation into account, so the values in the `predicted` nodes are logits rather than probabilities. To use the cross-entropy function without softmax fused into it, use RawCrossEntropyLoss instead. Also note that, unlike RawCrossEntropyLoss, the values in the `actual` nodes are ground-truth class indexes, so expect `actual` (indexes) and `predicted` (logits) to be of different lengths.
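
A minimal sketch of the shape convention described above, with a made-up class count and a placeholder import path:

package main

import gonet "example.com/gonet" // placeholder import path

func main() {
	// predicted: one logit node per class (4 classes here, made up).
	logits := gonet.NewInputNodeBatch(4, "logit%d", false)

	// actual: ground-truth class indexes, not probabilities, so the
	// slice is shorter than the logits. One sample whose true class is 2.
	target := []*gonet.Node{gonet.NewInputNodeNoGrad(2, "target")}

	loss := gonet.CrossEntropyLoss(target, logits)
	loss.ForwardBackward() // gradients reach the logit nodes
}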

func DotProduct

func DotProduct(left, right []*Node) *Node

func Identity

func Identity(n *Node) *Node

func LayerNorm

func LayerNorm(xs, gamma, beta []*Node, eps float64) (ys []*Node)

func Linear

func Linear(ws, xs []*Node, bias *Node) *Node

func MaskedAttention

func MaskedAttention(ks, qs, vs [][]*Node) []*Node

func MaxMarginLoss

func MaxMarginLoss(actual, predicted []*Node) *Node

func Mean

func Mean(xs ...*Node) *Node

func MeanVariance

func MeanVariance(xs ...*Node) (mean, variance *Node)

func Multiply

func Multiply(prev ...*Node) *Node

func NewInputNode

func NewInputNode(v float64, name string) *Node

func NewInputNodeBatch

func NewInputNodeBatch(size int, nameFmt string, noGrad bool) []*Node

func NewInputNodeNoGrad

func NewInputNodeNoGrad(v float64, name string) *Node

func NewNode

func NewNode(v float64, name string) *Node

func Normalize

func Normalize(eps float64, xs ...*Node) (ys []*Node)

func Plus

func Plus(prev ...*Node) *Node

func RawCrossEntropyLoss

func RawCrossEntropyLoss(actual, predicted []*Node) *Node

RawCrossEntropyLoss defines the cross-entropy loss function. `actual` represents the actual probability of each predefined class and should contain exactly one non-vanishing entry, with value 1, meaning that class is the one observed in the data-set sample (hence its probability equals 1).
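
For contrast with the fused CrossEntropyLoss, a minimal sketch of the one-hot convention. That `predicted` should hold probabilities (e.g. the output of Softmax) is an assumption:

package main

import gonet "example.com/gonet" // placeholder import path

func main() {
	// Probabilities from an explicit softmax (temperature 1).
	logits := gonet.NewInputNodeBatch(3, "logit%d", false)
	probs := gonet.Softmax(1.0, logits...)

	// One-hot target: class 1 is the observed one, so its entry is 1.
	actual := []*gonet.Node{
		gonet.NewInputNodeNoGrad(0, "y0"),
		gonet.NewInputNodeNoGrad(1, "y1"),
		gonet.NewInputNodeNoGrad(0, "y2"),
	}

	loss := gonet.RawCrossEntropyLoss(actual, probs)
	loss.ForwardBackward()
}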

func Relu

func Relu(prev *Node) *Node

func ResidualSumSquaredLoss

func ResidualSumSquaredLoss(actual, predicted []*Node) *Node

ResidualSumSquaredLoss is the Residual Sum of Squares (RSS), also known as the Sum of Squared Errors (SSE).

func Sigmoid

func Sigmoid(prev *Node) *Node

func Softmax

func Softmax(t float64, prev ...*Node) []*Node

Softmax also accepts a parameter `t`, which is sometimes called temperature.
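
A minimal sketch of the effect of t. The usual convention softmax(x/t) and the per-node Forward semantics are assumptions:

package main

import (
	"fmt"

	gonet "example.com/gonet" // placeholder import path
)

func main() {
	logits := []*gonet.Node{
		gonet.NewInputNode(1, "a"),
		gonet.NewInputNode(2, "b"),
		gonet.NewInputNode(3, "c"),
	}

	// Assuming softmax(x/t): t < 1 sharpens the distribution toward
	// the largest logit, t > 1 flattens it toward uniform.
	sharp := gonet.Softmax(0.5, logits...)
	flat := gonet.Softmax(2.0, logits...)

	// Forward is assumed to evaluate each node's subgraph.
	for _, n := range append(sharp, flat...) {
		n.Forward()
	}
	fmt.Println(gonet.NodeValues(sharp))
	fmt.Println(gonet.NodeValues(flat))
}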

func Tanh

func Tanh(prev *Node) *Node

func VectorAdd

func VectorAdd(left, right []*Node) (out []*Node)

func (*Node) Backward

func (n *Node) Backward()

func (*Node) Forward

func (n *Node) Forward()

func (*Node) ForwardBackward

func (n *Node) ForwardBackward()

func (*Node) G

func (n *Node) G() float64

func (*Node) Learn

func (n *Node) Learn(delta float64)

func (*Node) Name

func (n *Node) Name() string

func (*Node) SetName

func (n *Node) SetName(name string)

func (*Node) SetV

func (n *Node) SetV(v float64)

func (*Node) String

func (n *Node) String() string

func (*Node) V

func (n *Node) V() float64

func (*Node) ZeroG

func (n *Node) ZeroG()

type Sample

type Sample struct {
	X []*Node
	Y []*Node
}

func NewSample

func NewSample(inputSize, outputSize int, noGrad bool) *Sample

func (*Sample) Update

func (s *Sample) Update(us util.Sample)

type SampleBatch

type SampleBatch []*Sample

func NewSampleBatch

func NewSampleBatch(inputSize, outputSize, batchSize int) SampleBatch

func (SampleBatch) Update

func (sb SampleBatch) Update(samples []util.Sample)

Directories

Path	Synopsis
arrimpl	Package arrimpl provides an array-based implementation of a Fully Connected Neural Network, which is much faster for large and deep networks.
examples
	digit_ocr	command
	makemore	Package makemore defines the common utilities that are shared by all "makemore" examples.
	word_embedding	command
util	Package util ...
