goformer

package module
v0.1.0
Published: Mar 8, 2026 License: MIT Imports: 9 Imported by: 0

README

goformer

Pure Go BERT-family transformer inference. No CGO. No ONNX. No native dependencies.

import "github.com/MichaelAyles/goformer"

model, err := goformer.Load("./bge-small-en-v1.5")
if err != nil {
    log.Fatal(err)
}

embedding, err := model.Embed("DMA channel configuration")
// embedding is a []float32 of length model.Dims()

What This Is

A Go library that loads BERT-family model weights directly from HuggingFace safetensors format and runs inference to produce embeddings. Point it at a model directory downloaded from HuggingFace, call Embed(), get a float32 slice. No Python export step, no ONNX conversion, no native libraries.

Reference model: BGE-small-en-v1.5 (384-dim, 6 layers, 33M params). Any BERT-family model in safetensors format with a compatible tokeniser should work.

Why

Every existing pure Go option for running transformer models requires ONNX format. To get ONNX, you need a Python environment with transformers, optimum, torch, and onnx — a non-trivial dependency chain just to produce the artifact your Go binary needs. If the point of writing in Go is to escape Python in production, requiring Python in your build pipeline undermines the argument.

goformer loads directly from the canonical safetensors weights published by model authors on HuggingFace. Same files, same format, same config. No intermediate conversion.

API

// Load reads model weights from a HuggingFace model directory
// (config.json, tokenizer.json, model.safetensors).
func Load(path string) (*Model, error)

// Embed produces a normalised embedding vector for the input text.
func (m *Model) Embed(text string) ([]float32, error)

// EmbedBatch produces embeddings for multiple texts, padded to the
// longest sequence and processed together.
func (m *Model) EmbedBatch(texts []string) ([][]float32, error)

// Dims returns the embedding dimensionality (e.g. 384 for BGE-small).
func (m *Model) Dims() int

// MaxSeqLen returns the maximum sequence length the model supports.
func (m *Model) MaxSeqLen() int

That is the entire public surface.

Usage

  1. Download a model from HuggingFace:

    # Using git (requires git-lfs)
    git clone https://huggingface.co/BAAI/bge-small-en-v1.5
    
    # Or download files manually: config.json, tokenizer.json, model.safetensors
    
  2. Load and embed:

    model, err := goformer.Load("./bge-small-en-v1.5")
    if err != nil {
        log.Fatal(err)
    }
    
    // Single embedding
    vec, err := model.Embed("What is a DMA controller?")
    
    // Batch embedding
    vecs, err := model.EmbedBatch([]string{
        "What is a DMA controller?",
        "How does SPI communication work?",
        "Configure the UART baud rate",
    })
    
  3. Compute similarity:

    func cosineSimilarity(a, b []float32) float32 {
        var dot float32
        for i := range a {
            dot += a[i] * b[i]
        }
        return dot // vectors are already L2-normalised
    }
    

Performance

Benchmarked with BGE-small-en-v1.5 on Apple M1:

Inference comparison

Input                 goformer (pure Go)   PyTorch (CPU)   ONNX Runtime (CPU)
Short (~5 tokens)     154ms                12.9ms          3.0ms
Medium (~11 tokens)   287ms                13.4ms          4.2ms
Long (~40 tokens)     1.1s                 13.8ms          8.8ms
Batch of 8            2.4s                 22.5ms          17.6ms

goformer is roughly 10-140x slower than optimised native runtimes, with the gap widening as input length grows. The trade-off is zero native dependencies: no CGO, no ONNX conversion pipeline, no Python in your build. For applications where embedding latency is not the bottleneck (offline indexing, RAG pipelines, document processing), this is an acceptable cost.

Component breakdown

Operation              Time     Allocs
Model load             91ms     261MB
MatMul 384×384         31ms     0.6MB
MatMul 384×1536        158ms    2.3MB
LayerNorm (128×384)    139µs    0
Softmax (12×128×128)   1.0ms    0
GELU (128×1536)        436µs    0
Tokenise               1.9µs    1KB

MatMul dominates inference time. There is headroom for further tile-size tuning and SIMD.
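The kind of loop tiling involved can be sketched as follows. This is a minimal illustration, not goformer's actual kernel; the name `matMulTiled` and the 64-element tile size are assumptions for the example:

```go
package main

import "fmt"

// matMulTiled computes C += A×B for row-major float32 matrices,
// iterating over fixed-size tiles so the slice of B being read
// stays hot in cache. A is m×k, B is k×n, C is m×n.
func matMulTiled(a, b, c []float32, m, k, n int) {
	const tile = 64
	for i0 := 0; i0 < m; i0 += tile {
		for p0 := 0; p0 < k; p0 += tile {
			for j0 := 0; j0 < n; j0 += tile {
				for i := i0; i < min(i0+tile, m); i++ {
					for p := p0; p < min(p0+tile, k); p++ {
						aip := a[i*k+p]
						for j := j0; j < min(j0+tile, n); j++ {
							c[i*n+j] += aip * b[p*n+j]
						}
					}
				}
			}
		}
	}
}

func main() {
	// 2×2 sanity check: [[1,2],[3,4]] × [[5,6],[7,8]] = [[19,22],[43,50]]
	a := []float32{1, 2, 3, 4}
	b := []float32{5, 6, 7, 8}
	c := make([]float32, 4)
	matMulTiled(a, b, c, 2, 2, 2)
	fmt.Println(c)
}
```

Hoisting `a[i*k+p]` out of the inner loop keeps the innermost body a single sequential read of B and write of C, which is what the tiling buys on cache-sized blocks.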

How It Works

  1. Safetensors parser reads the binary weight file directly — 8-byte header length, JSON metadata, raw float32 data at specified offsets.
  2. WordPiece tokeniser parses HuggingFace tokenizer.json and produces token IDs + attention masks.
  3. BERT forward pass: embedding lookup → N transformer layers (self-attention + FFN + layer norm with residual connections) → mean pooling → L2 normalisation.
  4. All math is pure Go float32 operations. Matrix multiplication uses loop tiling for cache locality.

Correctness

All outputs validated against the HuggingFace Python transformers library:

Test case                                      Cosine similarity   Max element-wise diff
DMA channel configuration                      1.000000            0.000292
The quick brown fox jumps over the lazy dog    0.999999            0.000389
Hello                                          0.999999            0.000213
Long paragraph (40 tokens)                     0.999999            0.000241
café résumé naïve (unicode)                    0.999999            0.000211
Hello, world! How's it going?                  1.000000            0.000199

  • Token IDs: exact match against Python for all test cases
  • Batch embeddings match single embeddings
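The two metrics in the table above take only a few lines to compute. A hypothetical helper, not part of goformer's test suite:

```go
package main

import (
	"fmt"
	"math"
)

// parity returns the cosine similarity and the maximum element-wise
// absolute difference between two embeddings of equal length, the
// metrics used to compare against a reference implementation.
func parity(a, b []float32) (cos, maxDiff float32) {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
		if d := float32(math.Abs(float64(a[i] - b[i]))); d > maxDiff {
			maxDiff = d
		}
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb))), maxDiff
}

func main() {
	// Two nearly identical unit vectors stand in for a goformer
	// embedding and a Python reference embedding.
	a := []float32{0.6, 0.8}
	b := []float32{0.6004, 0.7997}
	cos, maxDiff := parity(a, b)
	fmt.Printf("cos=%.6f maxDiff=%.6f\n", cos, maxDiff)
}
```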

Limitations

  • CPU only. No GPU acceleration.
  • Inference only. No training or fine-tuning.
  • BERT-family only. Encoder models with safetensors weights. Not GPT, T5, or other architectures.
  • F32 and F16 weights. Float16 weights are converted to float32 at load time.

License

MIT

Documentation

Overview

Package goformer provides pure Go BERT-family transformer inference.

It loads model weights directly from HuggingFace safetensors format and runs inference to produce embeddings. No CGO, no ONNX, no native dependencies.

Quick Start

Point it at a HuggingFace model directory containing config.json, tokenizer.json, and model.safetensors:

model, err := goformer.Load("./bge-small-en-v1.5")
if err != nil {
    log.Fatal(err)
}

embedding, err := model.Embed("DMA channel configuration")
// embedding is a []float32 of length model.Dims()

Supported Models

Any BERT-family encoder model published in safetensors format on HuggingFace should work. The reference model is BGE-small-en-v1.5 (384-dim, 6 layers, 33M params). Both F32 and F16 safetensors weights are supported (F16 is converted to F32 at load time).

Embeddings

Embed and EmbedBatch produce L2-normalised embeddings using mean pooling over non-padding tokens. The output vectors can be compared directly using dot product (equivalent to cosine similarity for unit vectors).

Example
package main

import (
	"fmt"
	"log"

	"github.com/MichaelAyles/goformer"
)

func main() {
	model, err := goformer.Load("./bge-small-en-v1.5")
	if err != nil {
		log.Fatal(err)
	}

	embedding, err := model.Embed("DMA channel configuration")
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("dims: %d\n", len(embedding))
}
Example (Batch)
package main

import (
	"fmt"
	"log"

	"github.com/MichaelAyles/goformer"
)

func main() {
	model, err := goformer.Load("./bge-small-en-v1.5")
	if err != nil {
		log.Fatal(err)
	}

	embeddings, err := model.EmbedBatch([]string{
		"What is a DMA controller?",
		"How does SPI communication work?",
		"Configure the UART baud rate",
	})
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("batch size: %d, dims: %d\n", len(embeddings), len(embeddings[0]))
}
Example (Similarity)
package main

import (
	"fmt"
	"log"

	"github.com/MichaelAyles/goformer"
)

func main() {
	model, err := goformer.Load("./bge-small-en-v1.5")
	if err != nil {
		log.Fatal(err)
	}

	a, _ := model.Embed("What is a DMA controller?")
	b, _ := model.Embed("Direct memory access configuration")

	// Dot product of L2-normalised vectors equals cosine similarity.
	var similarity float32
	for i := range a {
		similarity += a[i] * b[i]
	}

	fmt.Printf("similarity: %.4f\n", similarity)
}

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model holds a loaded BERT-family transformer model ready for inference. A Model is safe for concurrent use by multiple goroutines.

func Load

func Load(path string) (*Model, error)

Load reads model weights, config, and tokeniser from a HuggingFace model directory. The directory must contain config.json, tokenizer.json, and a .safetensors weight file. Both F32 and F16 safetensors are supported.

func (*Model) Dims

func (m *Model) Dims() int

Dims returns the embedding dimensionality (e.g. 384 for BGE-small-en-v1.5).

func (*Model) Embed

func (m *Model) Embed(text string) ([]float32, error)

Embed produces a normalised embedding vector for the input text. The returned slice has length Model.Dims. Texts longer than Model.MaxSeqLen tokens are truncated.

func (*Model) EmbedBatch

func (m *Model) EmbedBatch(texts []string) ([][]float32, error)

EmbedBatch produces embeddings for multiple texts. All inputs are padded to the longest sequence in the batch and processed together. Each returned slice has length Model.Dims.

func (*Model) MaxSeqLen

func (m *Model) MaxSeqLen() int

MaxSeqLen returns the maximum sequence length the model supports (e.g. 512 for BERT). Inputs longer than this are truncated during tokenisation.
