Documentation ¶
Overview ¶
Package model provides weight loading and model serialization for gorch.
Index ¶
- func CausalLMLoss(model *nn.GPT, tokens []int) *g.Tensor
- func DownloadGPT2(modelName, dir string) error
- func ExportLinearToONNX(l *nn.Linear, batchSize int, path string) error
- func ExportSequentialToONNX(seq *nn.Sequential, inputShape []int, path string) error
- func Generate(model *nn.GPT, tokenIDs []int, maxNewTokens int) []int
- func GenerateText(model *nn.GPT, tok *BPETokenizer, prompt string, cfg GenerateConfig) string
- func GenerateWithConfig(model *nn.GPT, tokenIDs []int, cfg GenerateConfig) []int
- func LoadGPT2(dir string, cfg GPT2Config) (*nn.GPT, error)
- func LoadGPT2Verbose(dir string, cfg GPT2Config) (*nn.GPT, error)
- func LoadModelWeights(path string, params []*g.Tensor, nameMap map[string]int) error
- func SaveModelWeights(path string, params []*g.Tensor, nameMap map[int]string) error
- func SaveSafetensors(path string, tensors map[string]*g.Tensor) error
- type BPETokenizer
- type GPT2Config
- type GenerateConfig
- type KVCache
- type ONNXFile
- type ONNXNodeInfo
- type SafetensorsFile
- type SafetensorsHeader
- type SimpleTokenizer
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CausalLMLoss ¶
func CausalLMLoss(model *nn.GPT, tokens []int) *g.Tensor
CausalLMLoss runs the model forward on tokens[:n-1] and returns the next-token cross-entropy loss against tokens[1:]. The returned tensor is a scalar with autograd attached, ready for Backward() followed by an optimiser step.
This is a thin convenience for the standard LM training pattern where the input and target sequences differ by a one-position shift. It does not handle batching across sequences (the rest of gorch operates one sequence at a time), so for multi-sequence training, sum or average losses across sequences in the caller.
All parameters created by nn.NewGPT already have requires_grad=true, and LoadGPT2 preserves that, so fine-tuning a pretrained model requires nothing more than computing this loss, calling Backward(), and running optim.Step in a loop.
func DownloadGPT2 ¶
func DownloadGPT2(modelName, dir string) error
DownloadGPT2 downloads model files for a HuggingFace GPT-2 model.
func ExportLinearToONNX ¶
func ExportLinearToONNX(l *nn.Linear, batchSize int, path string) error
ExportLinearToONNX is a convenience wrapper for a single Linear layer (for tests and minimal models).
func ExportSequentialToONNX ¶
func ExportSequentialToONNX(seq *nn.Sequential, inputShape []int, path string) error
ExportSequentialToONNX serialises a Sequential of supported layers to an ONNX model file at path. inputShape is the shape of the input tensor, typically (batch, features) for MLPs or (batch, channels, H, W) for CNNs. The batch dimension is emitted as a symbolic -1 so that downstream tools can handle a dynamic batch size.
Supported layers:
- nn.Linear → Gemm (with transB=1)
- nn.ReLUModule → Relu
- nn.SigmoidModule → Sigmoid
- nn.TanhModule → Tanh
- nn.Conv2d → Conv
- nn.MaxPool2d → MaxPool
- nn.Flatten → Flatten (axis=1)
Returns an error if a layer type is unsupported.
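To make the Linear → Gemm (transB=1) mapping concrete: ONNX Gemm with transB=1 computes Y = A·Bᵀ + C. The standalone sketch below evaluates that formula on plain slices. It assumes the Linear weight is stored as (out_features, in_features), which is why the transpose flag would be needed; that storage layout is an assumption about gorch, and gemmTransB is a hypothetical helper.

```go
package main

import "fmt"

// gemmTransB computes Y = X * W^T + b for X of shape (batch, in),
// W of shape (out, in) and b of length out. This mirrors what an ONNX
// Gemm node with transB=1 evaluates (alpha = beta = 1).
func gemmTransB(x, w [][]float32, b []float32) [][]float32 {
	y := make([][]float32, len(x))
	for i := range x {
		y[i] = make([]float32, len(w))
		for o := range w {
			sum := b[o]
			for k := range x[i] {
				sum += x[i][k] * w[o][k] // row o of W is one output unit
			}
			y[i][o] = sum
		}
	}
	return y
}

func main() {
	x := [][]float32{{1, 2}}         // (batch=1, in=2)
	w := [][]float32{{3, 4}, {5, 6}} // (out=2, in=2)
	b := []float32{0.5, -1}
	fmt.Println(gemmTransB(x, w, b)) // [[11.5 16]]
}
```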
func Generate ¶
func Generate(model *nn.GPT, tokenIDs []int, maxNewTokens int) []int
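Generate produces token IDs autoregressively from a GPT model, starting from tokenIDs and generating up to maxNewTokens new tokens. It uses greedy decoding (argmax at each step), so no random sampling is involved.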
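Greedy decoding can be sketched independently of the model. In the example below, nextLogits is a hypothetical stand-in for a forward pass, and returning the prompt followed by the new tokens is an assumption of the sketch rather than documented behaviour of Generate.

```go
package main

import "fmt"

// nextLogits is a hypothetical stand-in for a model forward pass: it returns
// one score per vocabulary entry for the token that follows ids.
type nextLogits func(ids []int) []float32

// argmax returns the index of the largest element (first one on ties).
func argmax(xs []float32) int {
	best := 0
	for i, v := range xs {
		if v > xs[best] {
			best = i
		}
	}
	return best
}

// greedyDecode appends maxNewTokens tokens to a copy of prompt, always
// choosing the highest-scoring next token.
func greedyDecode(step nextLogits, prompt []int, maxNewTokens int) []int {
	out := append([]int(nil), prompt...)
	for i := 0; i < maxNewTokens; i++ {
		out = append(out, argmax(step(out)))
	}
	return out
}

func main() {
	// Toy "model" over a 4-token vocabulary: prefers (last token + 1) mod 4.
	toy := func(ids []int) []float32 {
		logits := make([]float32, 4)
		logits[(ids[len(ids)-1]+1)%4] = 1
		return logits
	}
	fmt.Println(greedyDecode(toy, []int{2}, 3)) // [2 3 0 1]
}
```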
func GenerateText ¶
func GenerateText(model *nn.GPT, tok *BPETokenizer, prompt string, cfg GenerateConfig) string
GenerateText produces text from a prompt using the given model and config.
func GenerateWithConfig ¶
func GenerateWithConfig(model *nn.GPT, tokenIDs []int, cfg GenerateConfig) []int
GenerateWithConfig generates tokens with temperature, top-k, and top-p sampling.
func LoadGPT2 ¶
func LoadGPT2(dir string, cfg GPT2Config) (*nn.GPT, error)
LoadGPT2 loads a pretrained GPT-2 model from safetensors. Handles the GPT-2 Conv1D convention (transposed weights) and fused QKV.
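The two conventions mentioned above come from the HuggingFace GPT-2 checkpoint layout: Conv1D weights are stored as (in_features, out_features), and the attention projection c_attn concatenates Q, K, and V along the output dimension. The standalone sketch below shows both transformations on flat row-major slices. transpose and splitQKV are hypothetical helpers; how LoadGPT2 performs these steps internally is not specified here.

```go
package main

import "fmt"

// transpose converts a row-major (rows x cols) matrix into (cols x rows),
// e.g. a Conv1D weight (in, out) into a Linear-style weight (out, in).
func transpose(w []float32, rows, cols int) []float32 {
	out := make([]float32, len(w))
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			out[c*rows+r] = w[r*cols+c]
		}
	}
	return out
}

// splitQKV splits a fused row-major (in x 3*hidden) weight into three
// (in x hidden) matrices holding the Q, K and V column blocks.
func splitQKV(fused []float32, in, hidden int) (q, k, v []float32) {
	q = make([]float32, in*hidden)
	k = make([]float32, in*hidden)
	v = make([]float32, in*hidden)
	for r := 0; r < in; r++ {
		row := fused[r*3*hidden : (r+1)*3*hidden]
		copy(q[r*hidden:(r+1)*hidden], row[:hidden])
		copy(k[r*hidden:(r+1)*hidden], row[hidden:2*hidden])
		copy(v[r*hidden:(r+1)*hidden], row[2*hidden:])
	}
	return q, k, v
}

func main() {
	fused := []float32{1, 2, 3, 4, 5, 6} // in=2, hidden=1: rows [q k v]
	q, k, v := splitQKV(fused, 2, 1)
	fmt.Println(q, k, v) // [1 4] [2 5] [3 6]

	fmt.Println(transpose([]float32{1, 2, 3, 4, 5, 6}, 2, 3)) // [1 4 2 5 3 6]
}
```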
func LoadGPT2Verbose ¶
func LoadGPT2Verbose(dir string, cfg GPT2Config) (*nn.GPT, error)
LoadGPT2Verbose is like LoadGPT2 but prints a one-line summary to log after a successful load. Existing tools that rely on the old log line can switch to this; new code should prefer LoadGPT2 (silent).
func LoadModelWeights ¶
func LoadModelWeights(path string, params []*g.Tensor, nameMap map[string]int) error
LoadModelWeights loads a safetensors file and assigns its tensors to the entries of params. nameMap maps each safetensors tensor name to the index of the corresponding parameter in params.
func SaveModelWeights ¶
func SaveModelWeights(path string, params []*g.Tensor, nameMap map[int]string) error
SaveModelWeights saves model parameters to a safetensors file. nameMap maps each index in params to the tensor name written to the file.
Types ¶
type BPETokenizer ¶
type BPETokenizer struct {
Encoder map[string]int // token string → ID
Decoder map[int]string // ID → token string
BPERanks map[[2]string]int // merge pair → priority rank
VocabSize int
ByteEncode map[byte]rune // byte → unicode char mapping
ByteDecode map[rune]byte // unicode char → byte mapping
}
BPETokenizer implements byte-pair encoding tokenization. Compatible with GPT-2/GPT-NeoX style vocab.json + merges.txt.
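For readers unfamiliar with how merge ranks drive byte-pair encoding, here is a minimal standalone sketch of the standard merge loop, using the same key type as BPERanks. bpeMerge is a hypothetical helper; the byte-to-unicode mapping and pre-tokenisation that a GPT-2 tokenizer also performs are left out, and this is not necessarily how Encode is written.

```go
package main

import "fmt"

// bpeMerge repeatedly merges the adjacent symbol pair with the lowest rank
// until no adjacent pair appears in ranks.
func bpeMerge(symbols []string, ranks map[[2]string]int) []string {
	word := append([]string(nil), symbols...)
	for len(word) > 1 {
		// Find the best-ranked (lowest rank) adjacent pair.
		best, bestRank := -1, 0
		for i := 0; i+1 < len(word); i++ {
			r, ok := ranks[[2]string{word[i], word[i+1]}]
			if ok && (best < 0 || r < bestRank) {
				best, bestRank = i, r
			}
		}
		if best < 0 {
			break // nothing left to merge
		}
		// Merge every occurrence of that pair, scanning left to right.
		a, b := word[best], word[best+1]
		merged := make([]string, 0, len(word))
		for i := 0; i < len(word); {
			if i+1 < len(word) && word[i] == a && word[i+1] == b {
				merged = append(merged, a+b)
				i += 2
			} else {
				merged = append(merged, word[i])
				i++
			}
		}
		word = merged
	}
	return word
}

func main() {
	ranks := map[[2]string]int{
		{"l", "o"}:  0,
		{"lo", "w"}: 1,
		{"e", "r"}:  2,
	}
	fmt.Println(bpeMerge([]string{"l", "o", "w", "e", "r"}, ranks)) // [low er]
}
```

Each resulting symbol would then be looked up in a table like Encoder to obtain its token ID.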
func LoadTokenizer ¶
func LoadTokenizer(vocabPath, mergesPath string) (*BPETokenizer, error)
LoadTokenizer loads a BPE tokenizer from vocab.json and merges.txt.
func (*BPETokenizer) Decode ¶
func (t *BPETokenizer) Decode(ids []int) string
Decode converts token IDs back to text.
func (*BPETokenizer) Encode ¶
func (t *BPETokenizer) Encode(text string) []int
Encode converts text to token IDs.
func (*BPETokenizer) EncodeBatch ¶
func (t *BPETokenizer) EncodeBatch(texts []string) [][]int
EncodeBatch encodes texts in parallel using GOMAXPROCS workers. Output order matches input order. The tokenizer's read-only state (Encoder, ByteEncode, BPE merges) makes per-text Encode safe for concurrent use.
For inputs of more than a few short strings this is a noticeable win over a sequential loop in caller code, especially when the caller is embedding many small chunks (RAG, retrieval, classification).
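The pattern described above (a fixed pool of workers, results written back by index so output order matches input order) can be sketched as follows. This is an illustrative standalone version, not the library's code: encodeBatch takes the per-text encoder as a function argument, and that encoder is assumed to be safe for concurrent use.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// encodeBatch applies encode to every text using up to GOMAXPROCS workers.
// Results are stored by input index, so output order matches input order.
func encodeBatch(texts []string, encode func(string) []int) [][]int {
	out := make([][]int, len(texts))
	workers := runtime.GOMAXPROCS(0)
	if workers > len(texts) {
		workers = len(texts)
	}
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handed to exactly one worker, so writes to
				// distinct elements of out do not race.
				out[i] = encode(texts[i])
			}
		}()
	}
	for i := range texts {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	// Toy encoder: one ID per byte.
	byteIDs := func(s string) []int {
		ids := make([]int, 0, len(s))
		for i := 0; i < len(s); i++ {
			ids = append(ids, int(s[i]))
		}
		return ids
	}
	fmt.Println(encodeBatch([]string{"ab", "", "c"}, byteIDs)) // [[97 98] [] [99]]
}
```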
type GPT2Config ¶
GPT2Config holds GPT-2 architecture parameters.
func GPT2Small ¶
func GPT2Small() GPT2Config
GPT2Small returns the config for openai-community/gpt2 (124M params).
func TinyStories1M ¶
func TinyStories1M() GPT2Config
TinyStories1M returns the config for roneneldan/TinyStories-1M.
type GenerateConfig ¶
type GenerateConfig struct {
MaxNewTokens int // maximum tokens to generate
Temperature float32 // 0 = greedy, >0 = sample with temperature
TopK int // 0 = disabled, >0 = sample from top-K
TopP float32 // 0 = disabled, >0 = nucleus sampling threshold
StopToken int // -1 = disabled, otherwise stop at this token
UseKVCache bool // true = incremental decoding via KV cache
}
GenerateConfig controls text generation behavior.
func DefaultGenerateConfig ¶
func DefaultGenerateConfig() GenerateConfig
DefaultGenerateConfig returns sensible defaults for text generation.
func GreedyConfig ¶
func GreedyConfig(maxTokens int) GenerateConfig
GreedyConfig returns config for deterministic greedy decoding.
type KVCache ¶
type KVCache struct {
Keys [][][]float32 // [layer][head] → flat (seqSoFar * headDim)
Values [][][]float32 // [layer][head] → flat (seqSoFar * headDim)
Layers int
Heads int
HeadDim int
SeqLen int // number of tokens cached so far
}
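KVCache stores the key and value vectors already computed for previous tokens, per layer and head, for efficient autoregressive generation. Each decoding step then only computes keys and values for the new token instead of re-running attention inputs for the whole prefix.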
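The flat per-head layout in the struct comments (seqSoFar * headDim floats per layer and head) implies that appending a token extends each slice by HeadDim and that position p occupies the range [p*HeadDim, (p+1)*HeadDim). The standalone sketch below mirrors the struct fields in a local type; the appendToken and keyAt helpers are hypothetical and not part of the documented API.

```go
package main

import "fmt"

// kvCache mirrors the documented KVCache fields for illustration only.
type kvCache struct {
	Keys, Values                   [][][]float32 // [layer][head] → flat (SeqLen * HeadDim)
	Layers, Heads, HeadDim, SeqLen int
}

func newKVCache(layers, heads, headDim int) *kvCache {
	c := &kvCache{Layers: layers, Heads: heads, HeadDim: headDim}
	c.Keys = make([][][]float32, layers)
	c.Values = make([][][]float32, layers)
	for l := range c.Keys {
		c.Keys[l] = make([][]float32, heads)
		c.Values[l] = make([][]float32, heads)
	}
	return c
}

// appendToken stores one new token's key and value vectors; k[l][h] and
// v[l][h] must each hold HeadDim floats.
func (c *kvCache) appendToken(k, v [][][]float32) {
	for l := 0; l < c.Layers; l++ {
		for h := 0; h < c.Heads; h++ {
			c.Keys[l][h] = append(c.Keys[l][h], k[l][h]...)
			c.Values[l][h] = append(c.Values[l][h], v[l][h]...)
		}
	}
	c.SeqLen++
}

// keyAt returns the cached key vector of the token at position pos.
func (c *kvCache) keyAt(layer, head, pos int) []float32 {
	return c.Keys[layer][head][pos*c.HeadDim : (pos+1)*c.HeadDim]
}

func main() {
	c := newKVCache(1, 1, 2)
	c.appendToken([][][]float32{{{1, 2}}}, [][][]float32{{{9, 9}}})
	c.appendToken([][][]float32{{{3, 4}}}, [][][]float32{{{8, 8}}})
	fmt.Println(c.SeqLen, c.keyAt(0, 0, 1)) // 2 [3 4]
}
```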
func NewKVCache ¶
NewKVCache creates an empty KV cache for a model.
type ONNXFile ¶
type ONNXFile struct {
Tensors map[string]*g.Tensor // initializer name → tensor
Names []string // initializer names in file order
Nodes []ONNXNodeInfo // graph node summary (for inspection)
IRVer int64
Producer string
}
ONNXFile is a partial parse of an ONNX model. We only decode the pieces gorch can act on today — initializer tensors keyed by name — so users can load weights from any ONNX producer without implementing the full graph spec. Op nodes are recorded as a flat list for inspection but not executed.
type ONNXNodeInfo ¶
ONNXNodeInfo is a lightweight summary of a graph node for inspection. We don't reconstruct execution from these.
type SafetensorsFile ¶
SafetensorsFile represents a loaded safetensors file.
func LoadSafetensors ¶
func LoadSafetensors(path string) (*SafetensorsFile, error)
LoadSafetensors loads a .safetensors file and returns all tensors.
Safetensors format:
- 8 bytes: little-endian uint64 header length
- N bytes: JSON header mapping tensor name → {dtype, shape, data_offsets}
- Remaining: raw tensor data
Supports F32, F16 (converted to F32), and BF16 (converted to F32).
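The two widening conversions follow directly from the bit layouts: BF16 is the upper 16 bits of an IEEE 754 float32, while F16 needs its 5-bit exponent rebiased and its 10-bit mantissa shifted. The sketch below shows one way to write them; bf16ToF32 and f16ToF32 are hypothetical helper names, not functions exported by this package.

```go
package main

import (
	"fmt"
	"math"
)

// bf16ToF32 widens a bfloat16 bit pattern: it is exactly the high half of
// the corresponding float32.
func bf16ToF32(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

// f16ToF32 widens an IEEE 754 half-precision bit pattern to float32.
func f16ToF32(h uint16) float32 {
	sign := uint32(h>>15) << 31
	exp := uint32(h>>10) & 0x1f
	mant := uint32(h) & 0x3ff
	switch {
	case exp == 0x1f: // Inf or NaN: all-ones exponent, keep mantissa bits
		return math.Float32frombits(sign | 0x7f800000 | mant<<13)
	case exp != 0: // normal number: rebias exponent from 15 to 127
		return math.Float32frombits(sign | (exp+112)<<23 | mant<<13)
	case mant == 0: // signed zero
		return math.Float32frombits(sign)
	default: // subnormal: value is mant * 2^-24
		v := float32(mant) * float32(math.Ldexp(1, -24))
		if sign != 0 {
			v = -v
		}
		return v
	}
}

func main() {
	fmt.Println(f16ToF32(0x3C00), f16ToF32(0xC000), bf16ToF32(0x3F80)) // 1 -2 1
}
```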
Streams tensor data: only the JSON header and one tensor's raw bytes are alive at any time, plus the running set of decoded F32 tensors. For a 622 MB file this drops peak transient RSS from ~1.24 GB (raw bytes + decoded floats both alive) to roughly the size of the largest single tensor + decoded total. See issue #10.
type SafetensorsHeader ¶
type SafetensorsHeader struct {
DType string `json:"dtype"`
Shape []int `json:"shape"`
Offsets [2]int `json:"data_offsets"`
}
SafetensorsHeader represents the metadata for one tensor in a safetensors file.
type SimpleTokenizer ¶
SimpleTokenizer is a minimal character-level tokenizer for testing. Maps each unique byte to a token ID.
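A byte-level tokenizer of this kind fits in a few lines. The standalone sketch below is an illustration under stated assumptions, not the library's type: byteTokenizer is a hypothetical name, IDs are assigned in ascending byte order, and bytes that were absent from the corpus are silently dropped on encode. SimpleTokenizer may differ on both of the latter points.

```go
package main

import "fmt"

// byteTokenizer maps each distinct byte of a corpus to a small integer ID.
type byteTokenizer struct {
	toID   map[byte]int
	toByte []byte
}

// newByteTokenizer builds the vocabulary from the bytes present in corpus,
// assigning IDs in ascending byte order (an assumption of this sketch).
func newByteTokenizer(corpus string) *byteTokenizer {
	var seen [256]bool
	for i := 0; i < len(corpus); i++ {
		seen[corpus[i]] = true
	}
	t := &byteTokenizer{toID: map[byte]int{}}
	for b := 0; b < 256; b++ {
		if seen[b] {
			t.toID[byte(b)] = len(t.toByte)
			t.toByte = append(t.toByte, byte(b))
		}
	}
	return t
}

// encode converts text to IDs; bytes outside the vocabulary are skipped.
func (t *byteTokenizer) encode(text string) []int {
	ids := make([]int, 0, len(text))
	for i := 0; i < len(text); i++ {
		if id, ok := t.toID[text[i]]; ok {
			ids = append(ids, id)
		}
	}
	return ids
}

// decode converts IDs back to text.
func (t *byteTokenizer) decode(ids []int) string {
	buf := make([]byte, len(ids))
	for i, id := range ids {
		buf[i] = t.toByte[id]
	}
	return string(buf)
}

func main() {
	tok := newByteTokenizer("hello") // vocabulary: e=0 h=1 l=2 o=3
	ids := tok.encode("hole")
	fmt.Println(ids, tok.decode(ids)) // [1 3 2 0] hole
}
```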
func NewSimpleTokenizer ¶
func NewSimpleTokenizer(text string) *SimpleTokenizer
NewSimpleTokenizer creates a character-level tokenizer from a text corpus.
func (*SimpleTokenizer) Decode ¶
func (t *SimpleTokenizer) Decode(ids []int) string
func (*SimpleTokenizer) Encode ¶
func (t *SimpleTokenizer) Encode(text string) []int
func (*SimpleTokenizer) VocabSize ¶
func (t *SimpleTokenizer) VocabSize() int