goformer

package module
v0.1.0
Published: Mar 8, 2026 License: MIT Imports: 9 Imported by: 0

README

goformer

Pure Go BERT-family transformer inference. No CGO. No ONNX. No native dependencies.

import "github.com/MichaelAyles/goformer"

model, err := goformer.Load("./bge-small-en-v1.5")
if err != nil {
    log.Fatal(err)
}

embedding, err := model.Embed("DMA channel configuration")
// embedding is a []float32 of length model.Dims()

What This Is

A Go library that loads BERT-family model weights directly from HuggingFace safetensors format and runs inference to produce embeddings. Point it at a model directory downloaded from HuggingFace, call Embed(), get a float32 slice. No Python export step, no ONNX conversion, no native libraries.

Reference model: BGE-small-en-v1.5 (384-dim, 6 layers, 33M params). Any BERT-family model in safetensors format with a compatible tokeniser should work.

Why

Every existing pure Go option for running transformer models requires ONNX format. To get ONNX, you need a Python environment with transformers, optimum, torch, and onnx — a non-trivial dependency chain just to produce the artifact your Go binary needs. If the point of writing in Go is to escape Python in production, requiring Python in your build pipeline undermines the argument.

goformer loads directly from the canonical safetensors weights published by model authors on HuggingFace. Same files, same format, same config. No intermediate conversion.

API

// Load reads model weights from a HuggingFace model directory
// (config.json, tokenizer.json, model.safetensors).
func Load(path string) (*Model, error)

// Embed produces a normalised embedding vector for the input text.
func (m *Model) Embed(text string) ([]float32, error)

// EmbedBatch produces embeddings for multiple texts, padded to the
// longest sequence and processed together.
func (m *Model) EmbedBatch(texts []string) ([][]float32, error)

// Dims returns the embedding dimensionality (e.g. 384 for BGE-small).
func (m *Model) Dims() int

// MaxSeqLen returns the maximum sequence length the model supports.
func (m *Model) MaxSeqLen() int

That is the entire public surface.

Usage

  1. Download a model from HuggingFace:

    # Using git (requires git-lfs)
    git clone https://huggingface.co/BAAI/bge-small-en-v1.5
    
    # Or download files manually: config.json, tokenizer.json, model.safetensors
    
  2. Load and embed:

    model, err := goformer.Load("./bge-small-en-v1.5")
    if err != nil {
        log.Fatal(err)
    }
    
    // Single embedding
    vec, err := model.Embed("What is a DMA controller?")
    
    // Batch embedding
    vecs, err := model.EmbedBatch([]string{
        "What is a DMA controller?",
        "How does SPI communication work?",
        "Configure the UART baud rate",
    })
    
  3. Compute similarity:

    func cosineSimilarity(a, b []float32) float32 {
        var dot float32
        for i := range a {
            dot += a[i] * b[i]
        }
        return dot // vectors are already L2-normalised
    }
    

Performance

Benchmarked with BGE-small-en-v1.5 on Apple M1:

Inference comparison

Input                 goformer (pure Go)   PyTorch (CPU)   ONNX Runtime (CPU)
Short (~5 tokens)     154ms                12.9ms          3.0ms
Medium (~11 tokens)   287ms                13.4ms          4.2ms
Long (~40 tokens)     1.1s                 13.8ms          8.8ms
Batch of 8            2.4s                 22.5ms          17.6ms

goformer is roughly 10-140x slower than optimised native runtimes, with the gap widening as input length grows. The trade-off is zero native dependencies: no CGO, no ONNX conversion pipeline, no Python in your build. For applications where embedding latency is not the bottleneck (offline indexing, RAG pipelines, document processing), this is an acceptable cost.

Component breakdown

Operation              Time     Allocs
Model load             91ms     261MB
MatMul 384×384         31ms     0.6MB
MatMul 384×1536        158ms    2.3MB
LayerNorm (128×384)    139µs    0
Softmax (12×128×128)   1.0ms    0
GELU (128×1536)        436µs    0
Tokenise               1.9µs    1KB

MatMul dominates inference time. There is headroom for further tile-size tuning and SIMD.
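The kind of loop tiling involved can be sketched as follows. This is a minimal illustration, not goformer's actual kernel; the name `matMulTiled` and the 64-element tile size are assumptions for the example:

```go
package main

import "fmt"

// matMulTiled computes C += A×B for row-major float32 matrices,
// iterating over fixed-size tiles so the slice of B being read
// stays hot in cache. A is m×k, B is k×n, C is m×n.
func matMulTiled(a, b, c []float32, m, k, n int) {
	const tile = 64
	for i0 := 0; i0 < m; i0 += tile {
		for p0 := 0; p0 < k; p0 += tile {
			for j0 := 0; j0 < n; j0 += tile {
				for i := i0; i < min(i0+tile, m); i++ {
					for p := p0; p < min(p0+tile, k); p++ {
						aip := a[i*k+p]
						for j := j0; j < min(j0+tile, n); j++ {
							c[i*n+j] += aip * b[p*n+j]
						}
					}
				}
			}
		}
	}
}

func main() {
	// 2×2 sanity check: [[1,2],[3,4]] × [[5,6],[7,8]] = [[19,22],[43,50]]
	a := []float32{1, 2, 3, 4}
	b := []float32{5, 6, 7, 8}
	c := make([]float32, 4)
	matMulTiled(a, b, c, 2, 2, 2)
	fmt.Println(c)
}
```

Hoisting `a[i*k+p]` out of the inner loop keeps the innermost body a single sequential read of B and write of C, which is what the tiling buys on cache-sized blocks.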

How It Works

  1. Safetensors parser reads the binary weight file directly — 8-byte header length, JSON metadata, raw float32 data at specified offsets.
  2. WordPiece tokeniser parses HuggingFace tokenizer.json and produces token IDs + attention masks.
  3. BERT forward pass: embedding lookup → N transformer layers (self-attention + FFN + layer norm with residual connections) → mean pooling → L2 normalisation.
  4. All math is pure Go float32 operations. Matrix multiplication uses loop tiling for cache locality.

Correctness

All outputs validated against the HuggingFace Python transformers library:

Test case                                      Cosine similarity   Max element-wise diff
DMA channel configuration                      1.000000            0.000292
The quick brown fox jumps over the lazy dog    0.999999            0.000389
Hello                                          0.999999            0.000213
Long paragraph (40 tokens)                     0.999999            0.000241
café résumé naïve (unicode)                    0.999999            0.000211
Hello, world! How's it going?                  1.000000            0.000199

  • Token IDs: exact match against Python for all test cases
  • Batch embeddings match single embeddings
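The two metrics in the table above take only a few lines to compute. A hypothetical helper, not part of goformer's test suite:

```go
package main

import (
	"fmt"
	"math"
)

// parity returns the cosine similarity and the maximum element-wise
// absolute difference between two embeddings of equal length, the
// metrics used to compare against a reference implementation.
func parity(a, b []float32) (cos, maxDiff float32) {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
		if d := float32(math.Abs(float64(a[i] - b[i]))); d > maxDiff {
			maxDiff = d
		}
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb))), maxDiff
}

func main() {
	// Two nearly identical unit vectors stand in for a goformer
	// embedding and a Python reference embedding.
	a := []float32{0.6, 0.8}
	b := []float32{0.6004, 0.7997}
	cos, maxDiff := parity(a, b)
	fmt.Printf("cos=%.6f maxDiff=%.6f\n", cos, maxDiff)
}
```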

Limitations

  • CPU only. No GPU acceleration.
  • Inference only. No training or fine-tuning.
  • BERT-family only. Encoder models with safetensors weights. Not GPT, T5, or other architectures.
  • F32 and F16 weights. Float16 weights are converted to float32 at load time.

License

MIT

Documentation

Overview

Package goformer provides pure Go BERT-family transformer inference.

It loads model weights directly from HuggingFace safetensors format and runs inference to produce embeddings. No CGO, no ONNX, no native dependencies.

Quick Start

Point it at a HuggingFace model directory containing config.json, tokenizer.json, and model.safetensors:

model, err := goformer.Load("./bge-small-en-v1.5")
if err != nil {
    log.Fatal(err)
}

embedding, err := model.Embed("DMA channel configuration")
// embedding is a []float32 of length model.Dims()

Supported Models

Any BERT-family encoder model published in safetensors format on HuggingFace should work. The reference model is BGE-small-en-v1.5 (384-dim, 6 layers, 33M params). Both F32 and F16 safetensors weights are supported (F16 is converted to F32 at load time).

Embeddings

Embed and EmbedBatch produce L2-normalised embeddings using mean pooling over non-padding tokens. The output vectors can be compared directly using dot product (equivalent to cosine similarity for unit vectors).

Example
package main

import (
	"fmt"
	"log"

	"github.com/MichaelAyles/goformer"
)

func main() {
	model, err := goformer.Load("./bge-small-en-v1.5")
	if err != nil {
		log.Fatal(err)
	}

	embedding, err := model.Embed("DMA channel configuration")
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("dims: %d\n", len(embedding))
}
Example (Batch)
package main

import (
	"fmt"
	"log"

	"github.com/MichaelAyles/goformer"
)

func main() {
	model, err := goformer.Load("./bge-small-en-v1.5")
	if err != nil {
		log.Fatal(err)
	}

	embeddings, err := model.EmbedBatch([]string{
		"What is a DMA controller?",
		"How does SPI communication work?",
		"Configure the UART baud rate",
	})
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("batch size: %d, dims: %d\n", len(embeddings), len(embeddings[0]))
}
Example (Similarity)
package main

import (
	"fmt"
	"log"

	"github.com/MichaelAyles/goformer"
)

func main() {
	model, err := goformer.Load("./bge-small-en-v1.5")
	if err != nil {
		log.Fatal(err)
	}

	a, _ := model.Embed("What is a DMA controller?")
	b, _ := model.Embed("Direct memory access configuration")

	// Dot product of L2-normalised vectors equals cosine similarity.
	var similarity float32
	for i := range a {
		similarity += a[i] * b[i]
	}

	fmt.Printf("similarity: %.4f\n", similarity)
}

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model holds a loaded BERT-family transformer model ready for inference. A Model is safe for concurrent use by multiple goroutines.

func Load

func Load(path string) (*Model, error)

Load reads model weights, config, and tokeniser from a HuggingFace model directory. The directory must contain config.json, tokenizer.json, and a .safetensors weight file. Both F32 and F16 safetensors are supported.

func (*Model) Dims

func (m *Model) Dims() int

Dims returns the embedding dimensionality (e.g. 384 for BGE-small-en-v1.5).

func (*Model) Embed

func (m *Model) Embed(text string) ([]float32, error)

Embed produces a normalised embedding vector for the input text. The returned slice has length Model.Dims. Texts longer than Model.MaxSeqLen tokens are truncated.

func (*Model) EmbedBatch

func (m *Model) EmbedBatch(texts []string) ([][]float32, error)

EmbedBatch produces embeddings for multiple texts. All inputs are padded to the longest sequence in the batch and processed together. Each returned slice has length Model.Dims.

func (*Model) MaxSeqLen

func (m *Model) MaxSeqLen() int

MaxSeqLen returns the maximum sequence length the model supports (e.g. 512 for BERT). Inputs longer than this are truncated during tokenisation.
