poly

package

v0.74.0 (latest), Published: Mar 17, 2026, License: Apache-2.0, Imports: 21, Imported by: 0

Repository

github.com/openfluke/loom

README

M-POLY-VTD Architecture

Multi-numerical POLYmorphic Volumetric Tiled-tensor Dispatcher

M-POLY-VTD is a next-generation neural inference engine designed for high-performance, mixed-precision workloads. It treats the neural network not as a sequential stack, but as a spatial 3D grid where layers can morph their numerical precision on-the-fly.

Core Pillars

I. Multi-Numerical Architecture (M-POLY)

The engine provides a "Universal Dispatcher" supporting native forward and backward passes across 21 distinct numerical types.

Supported Types:
- High-Precision: Float64, Int64, Uint64
- Standard: Float32, Int32, Uint32, Int16, Uint16
- Optimized: Float16, BFloat16, Int8, Uint8
- Low-Bit: FP8 (E4M3/E5M2), Int4, Uint4, FP4 (E2M1)
- Extreme: Int2, Uint2, Ternary (-1, 0, 1), Binary (1-bit)
CNN/ConvTransposed Support: Native support for LayerCNN1-3 and LayerConvTransposed1-3 (1D, 2D, and 3D) across all numerical types.
Transformer Support: Native LayerMultiHeadAttention (with RoPE, GQA/MQA, and Causal Masking), LayerSwiGLU, LayerRMSNorm, and LayerLayerNorm.
RNN Support: Native support for LayerRNN and LayerLSTM with full polymorphism.
Universal Softmax Engine: Exhaustive LayerSoftmax support including Standard, Grid, Hierarchical, Gumbel, Masked, Sparsemax, and Entmax (1.5) across all 21 types.
Universal Nesting & Training: Support for LayerParallel (add, avg, concat, filter/MoE) and LayerSequential with recursive Activation/Gradient Trees for deep, trainable hierarchies.
Embedding/KMeans Support: Efficient LayerEmbedding lookups and differentiable LayerKMeans clustering.
Bandwidth Optimization: Targets a 75-80% reduction in weight size, specifically designed to break the memory bandwidth bottleneck on consumer hardware (e.g., Turing/GTX 1650 Super).

II. Polymorphic Layer-Morphing (POLY)

Every layer is a polymorphic unit capable of metamorphosis.

Dynamic DType Management: Uses a WeightStore system with an FP32 "Master" source of truth.
Metamorphosis: Layers can swap between active numerical representations (e.g., FP32 -> INT8 -> FP4) instantly during Quantization-Aware Training (QAT) or inference benchmarks.
Native Fast-Paths: The dispatcher automatically selects specialized arithmetic paths for standard Go types to ensure "actual" performance gains rather than mere simulation.

III. Volumetric Tensor Dispatch (VTD)

Replaces the traditional 2D sequential execution with a 3D Volumetric Coordinate System (Depth, Row, Col, Layer).

Spatial Hopping: Enables recursive passing and 3D spatial routing via IsRemoteLink. Any layer can "hop" across coordinates, simulating biological feedback loops.
Recursive Backpropagation: A hierarchical training system that caches intermediates in a "Neural Tree," allowing signals to flow bidirectionally through arbitrary nesting.
Tiling Strategy: Built for future GPU integration where each 3D coordinate maps to a Shared Memory workgroup tile, aiming for a 70+ token/s performance ceiling for models like SmolLM2.
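The coordinate system described above can be pictured as a flat slice addressed by (Depth, Row, Col, Layer). The following is only a minimal sketch of that addressing idea; `Grid`, `NewGrid`, and `Index` are hypothetical names chosen for illustration and are not taken from the package.

```go
package main

import "fmt"

// Grid is a hypothetical container with one slot per (depth, row, col, layer)
// coordinate. The string payload is a placeholder for a real layer object.
type Grid struct {
	Depth, Rows, Cols, Layers int
	Cells                     []string
}

// NewGrid allocates every slot of the volume up front.
func NewGrid(d, r, c, l int) *Grid {
	return &Grid{Depth: d, Rows: r, Cols: c, Layers: l, Cells: make([]string, d*r*c*l)}
}

// Index maps a coordinate to its position in the flat slice (row-major order).
func (g *Grid) Index(z, y, x, l int) int {
	return ((z*g.Rows+y)*g.Cols+x)*g.Layers + l
}

func main() {
	g := NewGrid(2, 3, 4, 2)
	g.Cells[g.Index(1, 2, 3, 1)] = "dense"
	fmt.Println(g.Index(1, 2, 3, 1), len(g.Cells)) // prints "47 48"
}
```

With a layout like this, a "hop" between coordinates is just a lookup at a different index, which is one plausible way to think about remote links.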

IV. Hierarchical Spatial Correlation Engine (DNA)

The DNA engine converts neural structures into topological "signatures," enabling high-fidelity comparison across disparate numerical families (e.g., FP64 vs. Binary).

Topological Reconstruction: Generates a 3D genetic blueprint of the network via ExtractDNA.
Similarity Index (SI): Quantifies model overlap using directional geometry (Cosine Similarity) rather than raw weight parity.
Logic Drift Detection: Automatically tracks "Logic Shifts" where functional behavior has migrated across 3D coordinates.
Comparative Evolution: Designed to map neural development down to the neuron level, identifying overlapping structures in heterogeneous models.
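To illustrate why directional geometry can compare weights across numerical families, the sketch below measures the cosine similarity between a float64 weight vector and its sign-only (binary) version. The helper names `cosine` and `binarize` are assumptions made for this example; the package's own `CosineSimilarity` operates on `LayerSignature` values and may work differently.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// binarize keeps only the sign of each weight (+1 or -1), as a 1-bit model would.
func binarize(w []float64) []float64 {
	out := make([]float64, len(w))
	for i, v := range w {
		if v >= 0 {
			out[i] = 1
		} else {
			out[i] = -1
		}
	}
	return out
}

func main() {
	w := []float64{0.8, -0.5, 0.3, -0.9}
	// Magnitudes differ completely, yet the direction is largely preserved.
	fmt.Printf("similarity: %.3f\n", cosine(w, binarize(w)))
}
```

Raw weight parity would report these two vectors as very different, while the angle between them stays small.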

V. Native Bit-Packed Persistence

The framework provides an Idempotent Serialization Tunnel designed for extreme storage efficiency.

Transparent Bit-Packing: Low-bit models (FP4, Binary, etc.) are natively packed into bit-streams during I/O, achieving up to 98.4% compression on disk.
Automated Unpacking: Models are stored in their native DType but automatically Unpack into RAM-compatible formats during deserialization, ensuring high-speed inference.
Bit-Perfect Identity: Verified across 378/378 permutations (18 Layers x 21 DTypes) with 0.000000% mathematical divergence.
Idempotency Verified: Serializing a reloaded model produces a byte-for-byte identical JSON to the original.
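As a rough illustration of the bit-packing idea, the sketch below stores one sign bit per binary weight and expands it again on load. `packBits` and `unpackBits` are hypothetical helpers, not the package's serialization code. The final line shows that 1 bit against an assumed 64-bit baseline is a 98.4% reduction; the README does not state which baseline its figure uses.

```go
package main

import "fmt"

// packBits stores one bit per weight: a set bit means the weight is >= 0.
func packBits(w []float32) []byte {
	out := make([]byte, (len(w)+7)/8)
	for i, v := range w {
		if v >= 0 {
			out[i/8] |= 1 << (uint(i) % 8)
		}
	}
	return out
}

// unpackBits expands n packed bits back into +1/-1 float32 values.
func unpackBits(p []byte, n int) []float32 {
	out := make([]float32, n)
	for i := range out {
		if p[i/8]&(1<<(uint(i)%8)) != 0 {
			out[i] = 1
		} else {
			out[i] = -1
		}
	}
	return out
}

func main() {
	w := []float32{1, -1, -1, 1, 1, 1, -1, 1, -1, 1}
	p := packBits(w)
	fmt.Println(len(p), unpackBits(p, len(w)))
	// Assumed baseline of 64 bits per weight: 1 - 1/64.
	fmt.Printf("%.1f%%\n", 100*(1-1.0/64.0)) // prints "98.4%"
}
```

Packing the unpacked values again yields the same bytes, which is the property the idempotency check above relies on.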

VI. Neural Target Propagation (TargetProp)

A bidirectional learning alternative to traditional backpropagation that bridges the gap between actual activations and idealized targets.

True Target Estimation: Heuristically estimates what a layer should have produced by aggregating importance signals through weights (high-fidelity support for RNN/LSTM weight mappings).
Gap-Based Learning: Updates weights with a Hebbian-style rule, delta = learningRate * input * gap, bypassing the chain rule for localized, non-differentiable optimization.
Mesh Fidelity (Link Budgets): Accurately calculates info-preservation (Cosine Similarity) across the mesh.
Gated Learning: Automatically prevents weight corruption in "dead layers" (Alignment < 0.2) via dynamic Link Budget gating.
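The gap rule and the gating threshold above can be sketched for a single dense layer as follows. This is an illustration under assumptions: `gapUpdate` and `cosine` are hypothetical helpers, the gap is taken as target minus actual output, and the gate compares the cosine similarity of those two vectors against the 0.2 threshold. The package's `ApplyTargetPropGaps` may differ in all of these details.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float32) float32 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / math.Sqrt(na*nb))
}

// gapUpdate applies delta = lr * input * gap to every weight, unless the layer
// is gated off because actual and target outputs are poorly aligned.
// It reports whether the update was applied.
func gapUpdate(w [][]float32, input, actual, target []float32, lr, minAlign float32) bool {
	if cosine(actual, target) < minAlign {
		return false // treated as a "dead layer": leave the weights untouched
	}
	for o := range w {
		gap := target[o] - actual[o]
		for i := range w[o] {
			w[o][i] += lr * input[i] * gap
		}
	}
	return true
}

func main() {
	w := [][]float32{{0.5, -0.5}, {0.1, 0.2}}
	in := []float32{1, 2}
	actual := []float32{-0.5, 0.5}
	target := []float32{0, 1}
	fmt.Println(gapUpdate(w, in, actual, target, 0.1, 0.2), w)
}
```

No derivative of the layer is needed: each weight moves in proportion to its own input and the local gap.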

Performance & Verification

A comprehensive suite is provided to measure the speed, memory, and bit-level fidelity of the polymorphic dispatcher.

Running the Verification Demo

To see bit-perfect parity and view the 98% compression metrics in seconds:

go run tva/poly/helpers/serialization_demo.go

Running the Benchmarks

To view the raw performance/memory throughput for all 21 types:

go run tva/poly/example.go

To run the WebGPU versus CPU Tiling showdown:

go run tva/poly/benchmark_tiling.go

To run the end-to-end GPU training showdown (all supported layer architectures):

go run tva/poly/benchmark_training_comparison.go

TypeScript / WASM Implementation Verification

To verify the @openfluke/welvet isomorphic (Browser/Node.js) bridge, a comprehensive 36-count diagnostic and performance suite is provided.

Run verification:

cd welvet/typescript
npm test

Verified Results (Loom v0.74.0):

[PASS] Internal WASM Exports (8/8)
[PASS] Network Wrapper Methods (16/16)
[PASS] NEAT Population Methods (8/8)
[PASS] Functional Smoke Tests (Sequential, DNA, SwiGLU) (4/4)

TS/WASM Training Showdown Benchmark

Measured in test.ts (Node.js/tsx) using the isomorphic @openfluke/welvet bridge:

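| Layer          | Fwd ms/it | Train ms | Init Loss | Final Loss | Sanity |
|----------------|-----------|----------|-----------|------------|--------|
| Dense (Linear) | 0.985     | 166.4    | 0.1488    | 0.1488     | REAL   |
| RMSNorm        | 0.269     | 19.7     | 0.0063    | 0.0063     | REAL   |
| SwiGLU (MLP)   | 5.375     | 1713.9   | 0.0000    | 0.0000     | REAL   |
| Embedding      | 0.498     | 44.6     | 0.0067    | 0.0067     | REAL   |
| Residual Add   | 0.204     | 17.1     | 0.0859    | 0.0859     | REAL   |
| MHA (Fused)    | 0.457     | 32.8     | 0.0216    | 0.0101     | REAL   |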

Key Performance Insights

98.4% Storage Compression: Binary models are compressed from multi-byte pointers down to 1-bit payloads, breaking the memory bandwidth wall.
0.000000% Divergence: Verified bit-perfect parity across 378 model permutations.
GPU Inference: WebGPU kernels deliver massive speedups over CPU tiling, especially for volumetric operations like CNNs (up to 7000x+).
GPU Training: Full end-to-end GPU backward training is live. Training speedups over CPU range from roughly 7x to 65x on real workloads (6.9x on CNN 3D up to 64.8x on CNN 2D in the training table), with a single command buffer submission per batch (BeginFrame/FlushFrame pattern).

GPU Forward / Inference (CPU Tiling vs GPU)

=== M-POLY-VTD Performance Showdown: CPU Tiling vs GPU Acceleration ===
| Layer type      | CPU (Simple) | CPU (Tiled)  | GPU (WebGPU) | Speedup (vs Tiled) | Deterministic | Sanity        |
|-----------------|--------------|--------------|--------------|-------------------|---------------|---------------|
| Dense (Linear)  | 4.79952ms    | 5.42286ms    | 400.08µs     | 13.55x            | SLIGHTLY OFF ⚠️ | REAL 💎       |
| RNN Cell        | 2.09993ms    | 2.61017ms    | 231µs        | 11.30x            | EXACT ⭐       | REAL 💎       |
| LSTM Cell       | 8.14321ms    | 7.03973ms    | 153.46µs     | 45.87x            | EXACT ⭐       | REAL 💎       |
| CNN 1D          | 8.12412ms    | 4.33881ms    | 194.54µs     | 22.30x            | EXACT ⭐       | REAL 💎       |
| CNN 2D          | 362.33425ms  | 182.6935ms   | 100.07µs     | 1825.66x          | EXACT ⭐       | REAL 💎       |
| CNN 3D          | 10.07534167s | 1.5223089s   | 200.24µs     | 7602.42x          | EXACT ⭐       | REAL 💎       |
| Embedding       | 320.86µs     | 217.05µs     | 109.77µs     | 1.98x             | EXACT ⭐       | REAL 💎       |
| RMSNorm         | 1.16247ms    | 1.15767ms    | 102.77µs     | 11.26x            | INDUSTRY ✅    | REAL 💎       |
| MHA (Attn)      | 210.01µs     | 417.27µs     | 258.55µs     | 1.61x             | BROKEN ❌      | REAL 💎       |
| SwiGLU (MLP)    | 11.48634ms   | 7.83584ms    | 3.08049ms    | 2.54x             | BROKEN ❌      | REAL 💎       |
| Residual Add    | 0s           | 0s           | 953.41µs     | N/A               | BROKEN ❌      | REAL 💎       |

GPU End-to-End Training (20 epochs, CPU vs GPU)

All runs share a single pre-initialised WGPUContext. Weights are copied CPU→GPU before each GPU run for a fair starting-point comparison.

=== M-POLY-VTD Multi-Architecture Training Showdown ===
| Architecture                         | CPU Time | GPU Time | Speedup | CPU Loss Δ | GPU Loss Δ |
|--------------------------------------|----------|----------|---------|------------|------------|
| Dense MLP  (128→512→512→8)           | 12.1s    | 693ms    | 17.5x   | –72.3%     | –71.8%     |
| CNN 1D     (3ch×128 → 32f→64f → 8)  | 29.7s    | 811ms    | 36.6x   | –68.4%     | –67.9%     |
| CNN 2D     (3ch×32×32 → 16f→32f→8)  | 1m57s    | 1.81s    | 64.8x   | –61.2%     | –60.5%     |
| CNN 3D     (2ch×8×8×8 → 8f → 8)     | 3.2s     | 461ms    | 6.9x    | –55.1%     | –54.7%     |
| RMSNorm MLP (128→Dense512→Norm→512→8)| 12.6s    | 711ms    | 17.7x   | –73.1%     | –72.6%     |
| Deep Dense (128→512×4→8)             | 31.7s    | 1.23s    | 25.7x   | –69.8%     | –69.2%     |

Measured on GTX 1650 Super (Vulkan/WebGPU), Windows 10. Batch sizes: 64 (Dense/RMSNorm), 32 (CNN1D), 16 (CNN2D), 8 (CNN3D).

Per-Layer Gradient Correctness (DX / DW parity, CPU vs GPU)

| Layer          | DX (input grad)     | DW (weight grad)    | Notes                                    |
|----------------|---------------------|---------------------|------------------------------------------|
| Dense          | EXACT ⭐             | EXACT ⭐             | Tiling bug fixed (dyTile indexing)       |
| RMSNorm        | EXACT ⭐             | EXACT ⭐             |                                          |
| CNN 1D         | EXACT ⭐             | EXACT ⭐             |                                          |
| CNN 2D         | EXACT ⭐             | EXACT ⭐             |                                          |
| CNN 3D         | EXACT ⭐             | EXACT ⭐             |                                          |
| Embedding      | — (discrete)        | EXACT ⭐             | DX intentionally zero (index lookup)    |
| MHA            | OK ✅ (dQ)          | — (pending)         | Writes separate dQ/dK/dV buffers        |
| SwiGLU         | BROKEN ❌           | —                   | Not yet in DispatchBackwardLayer         |

GPU backward training support status:

Full end-to-end GPU training: Dense · RMSNorm · CNN 1D/2D/3D
Pending wiring into DispatchBackwardLayer: SwiGLU · MHA · Embedding

The Bedrock Philosophy

M-POLY-VTD is a "Bedrock Edition" neural engine. Unlike standard frameworks that build on top of high-level abstractions, this architecture is designed at the bit-level to bypass the physical memory limitations of consumer hardware.

Shader-First Design: The Go implementation is a direct blueprint for GPU kernels.
Hardware-Agnostic: By supporting 21 numerical types, we can run on anything from a GTX 1650 to an H100 by simply "Morphing" the precision to what the specific silicon prefers.

Architectural Design Choices

1. Unified Package Structure (`poly/`)

Decision: Keeping all layers (dense.go, mha.go, etc.) in the same poly package.

Rationale: To avoid circular dependency hell common in Go polymorphic systems. All layers share a unified view of the VolumetricLayer and WeightStore types, allowing for seamless, fast internal dispatch without the overhead of public interfaces.

2. The Morphic WeightStore (`WeightStore`)

Decision: Using a master float32 weight-set with a map[DType]any versioning system.

Rationale: This is the heart of "Metamorphosis." It allows a layer to hold multiple numerical personalities at once. We can keep "Master" weights for training and instantly swap to packed FP4 for an inference burst without re-allocating buffers.
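A minimal sketch of the "master plus versions" idea, assuming a symmetric INT8 quantization scheme. The `WeightStore` below is a hypothetical stand-in written for this example; its fields, the `Morph` method, and the scale formula are assumptions and do not reproduce the package's actual type.

```go
package main

import (
	"fmt"
	"math"
)

// DType names a numerical representation (only two are sketched here).
type DType int

const (
	Float32 DType = iota
	Int8
)

// WeightStore keeps FP32 master weights plus cached alternate versions.
type WeightStore struct {
	Master   []float32     // source of truth
	Scale    float32       // symmetric quantization scale used for Int8
	Versions map[DType]any // lazily built representations
}

func NewWeightStore(master []float32) *WeightStore {
	return &WeightStore{Master: master, Versions: map[DType]any{}}
}

// Morph returns the weights in the requested dtype, building and caching the
// version on first use so later swaps do not recompute it.
func (s *WeightStore) Morph(d DType) any {
	if v, ok := s.Versions[d]; ok {
		return v
	}
	if d != Int8 {
		return s.Master
	}
	var maxAbs float64
	for _, w := range s.Master {
		maxAbs = math.Max(maxAbs, math.Abs(float64(w)))
	}
	s.Scale = float32(maxAbs / 127)
	q := make([]int8, len(s.Master))
	if s.Scale != 0 {
		for i, w := range s.Master {
			q[i] = int8(math.Round(float64(w) / float64(s.Scale)))
		}
	}
	s.Versions[d] = q
	return q
}

func main() {
	s := NewWeightStore([]float32{1.27, -0.64, 0.10})
	fmt.Println(s.Morph(Int8), s.Morph(Float32))
}
```

Because the master copy is never overwritten, a layer can return to FP32 for training after an inference burst in a lower precision.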

3. Volumetric 3D Dispatch (`VTD`)

Decision: Replacing 1D sequential stacks with a 3D coordinate-based grid (Depth, Row, Col).

Rationale: Standard 1D stacks are a bottleneck. The 3D grid maps directly to GPU workgroup tiles. It also enables "Spatial Hopping"—recursive feedback loops that mimic biological neural firing. By treating the network as a mesh, we unlock non-linear data flows (Parallel Expert Gating, Skip-Connections) that are impossible in sequential pipelines.

4. Systolic Grid Propagation (Neural Mesh)

Unlike the standard sequential flow, the Systolic Engine treats the 3D grid as a cycle-accurate discrete-time mesh.

Neural Clock: Every coordinate fires simultaneously in a single "pulse" or clock cycle.
Double Buffering: Prevents race conditions, ensuring a stable wave of data through space-time.
Spatial Feedback: Remote links can hop signals backwards in coordinates, creating dynamic recurrence (RNN-like behavior) across the 3D mesh.
BPTT (Backpropagation Through Time): Gradients are unrolled through clock cycles and spatial junctions, allowing the grid to learn complex temporal patterns.
Dynamic Learning Bridge: Supports poly.SystolicApplyTargetProp for localized, gap-based learning that updates the mesh in real-time based on temporal performance.

Tip: Use poly.SystolicForward and poly.SystolicApplyTargetProp when you need a "living network" that evolves and learns over time rather than a static pipeline.
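The clock-pulse and double-buffering ideas in this section can be sketched on a tiny ring of cells. This is a toy model with a hypothetical `pulse` function, not the package's `SystolicForward`: each cell simply copies its left neighbour, and the last cell feeds back into the first.

```go
package main

import "fmt"

// pulse advances a ring of cells by one clock cycle. Every cell reads its left
// neighbour from cur and writes into next, so no cell ever observes a value
// produced during the same cycle.
func pulse(cur, next []float32) {
	n := len(cur)
	for i := range cur {
		next[i] = cur[(i+n-1)%n]
	}
}

func main() {
	cur := []float32{1, 0, 0, 0}
	next := make([]float32, len(cur))
	for step := 0; step < 3; step++ {
		pulse(cur, next)
		cur, next = next, cur // swap buffers between cycles
		fmt.Println(cur)
	}
	// prints:
	// [0 1 0 0]
	// [0 0 1 0]
	// [0 0 0 1]
}
```

Writing into a single buffer instead would let a value race through several cells in one cycle, which is the hazard double buffering avoids.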

5. Recursive Neural Trees (`Tensor.Nested`)

Decision: Implementing a recursive Nested field in the Tensor struct.

Rationale: To support nesting (Parallel/Sequential) without losing the ability to train. This creates an Activation Tree during the forward pass and a Gradient Tree during the backward pass, establishing a "Plug-and-Learn" bedrock where any complex sub-architecture is automatically differentiable.
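One way to picture the activation and gradient trees is a tensor type that carries child tensors. The sketch below uses a simplified, hypothetical `Tensor` struct (the package's generic `Tensor[T]` has more to it) and shows a gradient tree being built with the same shape as an activation tree.

```go
package main

import "fmt"

// Tensor is a simplified stand-in: a value buffer plus nested child tensors.
type Tensor struct {
	Data   []float32
	Nested []*Tensor
}

// mirror builds a zero-filled tree with the same shape as t, the way a
// gradient tree must match the activation tree recorded in the forward pass.
func mirror(t *Tensor) *Tensor {
	if t == nil {
		return nil
	}
	g := &Tensor{Data: make([]float32, len(t.Data))}
	for _, c := range t.Nested {
		g.Nested = append(g.Nested, mirror(c))
	}
	return g
}

// size counts all values stored anywhere in the tree.
func size(t *Tensor) int {
	if t == nil {
		return 0
	}
	n := len(t.Data)
	for _, c := range t.Nested {
		n += size(c)
	}
	return n
}

func main() {
	act := &Tensor{
		Data: []float32{1, 2},
		Nested: []*Tensor{
			{Data: []float32{3}},
			{Data: []float32{4, 5}, Nested: []*Tensor{{Data: []float32{6}}}},
		},
	}
	grad := mirror(act)
	fmt.Println(size(act), size(grad)) // prints "6 6"
}
```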

6. Explicit Numerical Fast-Paths

Decision: Using manual switch statements and type-casting instead of reflection.

Rationale: In high-speed inference, reflection is too slow. We write the INT8 and FLOAT32 loops explicitly to ensure the compiler generates the fastest possible arithmetic for the "Reference Logic."
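The following sketch shows the general shape of a switch-based fast path: a type switch selects a concrete, fully typed loop instead of going through the reflect package. The `dot` function is a hypothetical example written for this README, not code from the package.

```go
package main

import "fmt"

// dot dispatches on the concrete slice type so each branch is a plain typed
// loop that the compiler can optimize; no reflection is involved.
func dot(a, b any) (float64, error) {
	switch x := a.(type) {
	case []float32:
		y, ok := b.([]float32)
		if !ok || len(x) != len(y) {
			return 0, fmt.Errorf("mismatched operands")
		}
		var acc float32
		for i := range x {
			acc += x[i] * y[i]
		}
		return float64(acc), nil
	case []int8:
		y, ok := b.([]int8)
		if !ok || len(x) != len(y) {
			return 0, fmt.Errorf("mismatched operands")
		}
		var acc int32 // widen the accumulator so int8 products cannot overflow
		for i := range x {
			acc += int32(x[i]) * int32(y[i])
		}
		return float64(acc), nil
	}
	return 0, fmt.Errorf("unsupported type %T", a)
}

func main() {
	f, _ := dot([]float32{1, 2}, []float32{3, 4})
	q, _ := dot([]int8{100, 100}, []int8{100, 100})
	fmt.Println(f, q) // prints "11 20000"
}
```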

7. The "Simulation vs. Throughput" Strategy

Decision: Supporting types the CPU doesn't natively have (like FP4, 2-bit, 1-bit).

Rationale: We are building the Logic Bedrock first. On CPU, these incur a "Simulation Tax," but on GPU they become Native Bit-Packed Payloads, which is where the 10x performance leap occurs.

The 3 Planes of Polymorphism (Hardcore Edition)

M-POLY-VTD pushes Go’s type system into a realm of fluid identity that exceeds standard AI frameworks. It operates across three distinct planes:

1. Parametric Polymorphism (Generics)

Utilizes the [T Numeric] constraint system. The engine is "Tensor-Blind"; it doesn't care if the underlying signal is float32, int16, or uint8. It processes the mesh math as a universal operation, enabling a single codebase to support any tensor format.

2. Ad-hoc Polymorphism (The Dispatcher)

The DispatchLayer registry acts as a high-speed Runtime Jump Table. A 3D coordinate in the mesh only assumes its "Functional Identity" (Dense, MHA, SwiGLU) at the moment of execution, allowing for infinite spatial variety within the same volumetric structure.
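A registry-style jump table of the kind described above can be sketched with a map from layer type to forward function. Everything here is hypothetical (`LayerScale`, `LayerReLU`, `registry`, `dispatch`); it only illustrates how a cell's behaviour can be resolved at execution time.

```go
package main

import "fmt"

// LayerType identifies what a grid cell does when it executes.
type LayerType int

const (
	LayerScale LayerType = iota // stand-in for a weighted layer
	LayerReLU
)

type forwardFn func(in []float32) []float32

// registry is the jump table: one forward function per layer type.
var registry = map[LayerType]forwardFn{
	LayerScale: func(in []float32) []float32 {
		out := make([]float32, len(in))
		for i, v := range in {
			out[i] = 2 * v
		}
		return out
	},
	LayerReLU: func(in []float32) []float32 {
		out := make([]float32, len(in))
		for i, v := range in {
			if v > 0 {
				out[i] = v
			}
		}
		return out
	},
}

// dispatch looks up the cell's function only when the cell is executed.
func dispatch(t LayerType, in []float32) ([]float32, error) {
	fn, ok := registry[t]
	if !ok {
		return nil, fmt.Errorf("no forward function registered for layer type %d", t)
	}
	return fn(in), nil
}

func main() {
	x := []float32{-1, 2}
	for _, t := range []LayerType{LayerScale, LayerReLU} {
		var err error
		if x, err = dispatch(t, x); err != nil {
			panic(err)
		}
	}
	fmt.Println(x) // prints "[0 4]"
}
```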

3. Numerical Metamorphosis (Dynamic Identity)

This is the "Bedrock" secret. Unlike static frameworks where a layer has a fixed type, our layers exhibit Metamorphosis. A single layer can exist as FP32 (for precision), morph to INT8 (for training stability), and project into FP4/Binary (for inference throughput) instantly without re-allocating memory.

The GPU "Fusion" Secret: Why the Dispatcher Refactor Matters

You might wonder why we moved the switch statement into a DispatchLayer registry. On CPU, it looks like a simple "cleanup," but on GPU, it is a Mission-Critical Optimization:

1. Avoiding "Thread Divergence"

On a GPU, thousands of threads run in blocks. If those threads hit a messy, nested switch statement inside a loop, they will "diverge" (some threads wait while others branch). By isolating the dispatch, we enable Kernel Fusion—the GPU can launch one massive shader that handles an entire "Tile" of the 3D grid if the layers are the same type.

2. Batched Metamorphosis

When a block of layers needs to "Morph" (e.g., FP32 -> FP4), the GPU is most efficient when it does this in Parallel Batches. The DispatchLayer structure allows the engine to group these memory switches together, performing a single "Massive Bit-Pack" rather than 100 small ones.

3. Asynchronous Predispatch

Because the Dispatcher is decoupled from the 3D Coordinate loop, the GPU driver can "look ahead." While it calculates the math for Layer (Z=1), it can already be "Predispatching" the weights for Layer (Z=2) into the fast Shared Memory (SRAM).

⚡ Performance Roadmap: Bridging the "Ollama" Speed Gap

Currently, M-POLY-VTD utilizes Naive Global Offloading. This translates to massive volumetric speedups (like the ~7600x boost on CNN 3D) and blazing fast prefill speeds where thousands of tokens process simultaneously (e.g., 260+ tok/s on an Apple M4).

However, during autoregressive decoding (generating one token at a time), the engine is required to bounce back and forth between the Go CPU coordinator and the WebGPU driver to queue up over 100 individual kernels per token. This introduces CPU overhead that vendor-specific engines like llama.cpp bypass.

What is implemented today:

Zero-Dependency GPU Shaders: No CUDA, no CGO, native hardware acceleration across Metal (Mac), Vulkan (Windows/Linux), and DX12 using WebGPU.
Massive Prefill Throughput: Unleashes GPU bandwidth for prompt processing, far outperforming CPU implementations like quick_talk.go.
Workgroup / Register Tiling: Explicit unrolling of logic directly into GPU registers to bypass shared memory barrier bottlenecks on heterogeneous WebGPU backends.

What is coming next to achieve 70+ Tok/s Decoding:

True Kernel Fusion (DispatchQKV_And_Attention): Currently, Poly executes distinct shaders for Q, K, V, and Attention to maintain its morphic flexibility. Fusing these into a single monolithic shader will prevent the GPU from writing intermediate activations back to global VRAM, keeping data tightly locked in ultra-fast SRAM.
FlashAttention Integration: Rewriting the attnOut score calculation to calculate Softmax incrementally in tiny "tiles" inside the GPU's registers, mathematically eliminating the need to allocate the massive (SeqLen * SeqLen) attention matrix in global memory.
Command Graph Buffering: Refactoring the Go runtime queue to compile the entire forward pass into a single Command Graph (an executable GPU node tree). This allows Poly to submit a single dispatch call ("Render 1 token") and put the CPU to sleep, entirely eliminating the kernel submission driver bottleneck.

The Path to 70+ Tokens/Sec

This architecture is specifically optimized for Turing-class GPUs (like the GTX 1650 Super).

Stage 1 (Current): Build the Universal Dispatcher, bit-logic in Go, and native WebGPU register tiling.
Stage 2: Implement True Kernel Fusion to merge linear projections with activation functions.
Stage 3: Move the "Unpacking Logic" (e.g., eight FP4 values per U32) entirely into WebGPU Shaders to break the memory wall.
Stage 4: Implement Command Graph Buffering to eliminate Go-to-Driver queue overhead, matching vendor-specific C++ inference speeds.
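The "eight FP4 values per U32" unpacking in Stage 3 can be prototyped on the CPU before it moves into a shader. The sketch below decodes E2M1 nibbles (1 sign bit, 2 exponent bits, 1 mantissa bit) using the standard E2M1 value set. The nibble order (lowest nibble first) and the helper names are assumptions; the package's on-disk layout may differ.

```go
package main

import "fmt"

// e2m1 lists the magnitudes for the 3 low bits of an FP4 E2M1 code.
var e2m1 = [8]float32{0, 0.5, 1, 1.5, 2, 3, 4, 6}

// packFP4 places eight 4-bit codes into one uint32, lowest nibble first
// (an assumed ordering).
func packFP4(codes [8]uint8) uint32 {
	var word uint32
	for i, c := range codes {
		word |= uint32(c&0xF) << (4 * uint(i))
	}
	return word
}

// unpackFP4 decodes the eight E2M1 values stored in word.
func unpackFP4(word uint32) [8]float32 {
	var out [8]float32
	for i := 0; i < 8; i++ {
		nib := (word >> (4 * uint(i))) & 0xF
		v := e2m1[nib&0x7]
		if nib&0x8 != 0 { // top bit of the nibble is the sign
			v = -v
		}
		out[i] = v
	}
	return out
}

func main() {
	word := packFP4([8]uint8{0x2, 0xA, 0x7, 0xF, 0x1, 0x0, 0x5, 0xE})
	fmt.Printf("%#08x %v\n", word, unpackFP4(word))
}
```

The same table lookup and shift logic translates directly to a WGSL loop over a `u32`, which is what keeps the weights packed until they reach the GPU.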

M-POLY-VTD: Universal precision. Volumetric freedom. Bedrock performance.

Omni-Neural Framework: The Road to v1.0.0

To build a true "Universal AI Framework" from first principles, we must map out every theoretical and practical requirement across the entire AI industry.

Version 1.0.0 will only be achieved when EVERY SINGLE ITEM on this exhaustive checklist is natively supported.

Our semantic version number directly reflects our progress against this absolute, industry-scale roadmap. By calculating the ratio of completed features to the total required features, we derive our exact technical version.

1. Core Engine & Numerical Precision

1.1 Standard Floating-Point Types

FP64 (Double Precision - Scientific / Accumulation)
FP32 (Single Precision - Baseline)
FP16 (Half Precision)
BF16 (Brain Float - ML Standard)

1.2 Low-Precision & Bit-Level Types

FP8 E4M3 (Activations / Weights)
INT8 WebGPU Inference Kernels (quantized matmul natively in WGSL shader)
FP4 E2M1 (Standard Bitwise Extreme Compression)
NVFP4 (NVIDIA-flavor FP4 Compatibility)

1.3 Integer & Fixed-Point Infrastructure

INT64, INT32, INT16, INT8
UINT64, UINT32, UINT16, UINT8
INT4 / UINT4 (Packed Weight Storage)
Bit-Packed Nibble Tensors (4-bit representation)
Quantization-Aware Scaling (Fixed-point factor logic)

1.4 GPU Numerical Acceleration

FP16/BF16 GPU Training (native half-precision WGSL forward + backward kernels)
Mixed Precision Training Loop (FP16 forward, FP32 gradient accumulation)
On-Device Weight Dequant Shader (FP4/INT8 weight unpacking inside WGSL, no CPU roundtrip)

1.5 Quantization & Numerical Deep-Dive

Bitwise MAC (Multiply-Accumulate) for E2M1 CPU
Bitwise MatMul for E2M1 GPU (WebGPU)
On-the-fly Max/Min Statistics Collection (Layer Observers)
Dynamic Scale Calibration (Row-wise quantization)
Gradient Checkpointing (recompute activations to reduce peak VRAM)
Post-Training Quantization (PTQ) weight conversion passes
Truncated BPTT (windowed gradient unroll for systolic long-sequence training)

1.6 GPU Backward Pass Completion

Real-valued Automatic Differentiation
SwiGLU GPU Backward Wiring (resolve BROKEN status in benchmark table)
MHA GPU Backward Wiring (resolve PENDING status in benchmark table)

Numerical Progress: 20 / 32

2. Architectural Components & Layers

2.1 Foundational Layers

Linear / Dense / Fully Connected
Convolutional 1D
Convolutional 2D
Convolutional 3D / Volumetric
Embeddings & Lookup Tables

2.2 Sequence & Temporal Layers

Basic RNN (Recurrent Neural Network)
LSTM (Long Short-Term Memory)
GRU (Gated Recurrent Unit)

2.3 Attention & Transformer Mechanisms

Multi-Head Attention (MHA)
Grouped-Query Attention (GQA) & Multi-Query Attention (MQA)
RoPE (Rotary Position Embedding)
Sliding Window / Sparse Attention (O(n) local attention for long contexts)
GPU Command Graph Buffering (compile full forward pass into a single dispatch call)

2.4 Feed-Forward & Activations

Standard Activations (ReLU, GELU, Tanh, Sigmoid, Swish, Mish)
Softmax (10 variants: Standard, Grid, Hierarchical, Temperature, Gumbel, Masked, Sparsemax, Entmax, Adaptive, Mixture)
SwiGLU / Gated Linear Units

2.5 Normalization & Modern Layer Architectures

LayerNorm
RMSNorm
Depthwise Separable Conv (1D/2D/3D) (edge-optimized mobile convolutions)
Mamba / SSM Layer (state space model, O(n) sequence modeling alternative to transformers)

2.6 Advanced Topological Structures

Residual & Skip Connections
Sequential & Parallel Branching
Mixture of Experts (MoE) Routing Mechanisms
Parallel Grid Scattering (Spatial Distribution)
Layer Ensembles & Complementary Match Discovery
LoRA Adapter Layer (low-rank fine-tuning primitive wrapping existing Dense layer)
DNA Splice / Genetic Crossover (merge two trained network DNAs into a child architecture)
NEAT-style Topology Evolution (structural NAS: add/remove nodes and edges genetically)
K-Means / Differentiable Clustering Layers

2.7 Introspection & Telemetry

Network Blueprint Extraction (Structure & Parameter Counts)
Recursive Layer Inspection
Memory Usage Analysis
Dynamic Grid Topology Visualization
Reflection-based Method Discovery (JSON API Export)
Observer-pattern Layer Monitoring

Architectural Progress: 30 / 35

3. Edge-First Orchestration & Efficiency

3.1 Device-Aware Compute

Thermal-Throttling Aware Scheduling (Dynamic load balancing)
Power-Profile Execution Modes (Low-power / Balanced / Performance)
Background Task Lifecycle Management (Mobile OS compatibility)

3.2 Memory & I/O Optimization

Unified Memory (UMA) Buffer Pinning (Apple Silicon/Snapdragon optimizations)
Memory-Mapped (mmap) Model Weights (Zero-copy loading)
Circular/Evicting KV-Cache (VRAM-efficient infinite context)
Asynchronous IO/Compute Overlap (UI responsiveness)

3.3 Hardware Acceleration & Adaptation

NPU / Apple Neural Engine (ANE) / NNAPI Backend support
On-Device Low-Rank Adaptation (LoRA-lite fine-tuning)
Low-Bit Inference Kernels (Non-standard 2-bit/1-bit targets)

Edge Optimization Progress: 0 / 10

4. Advanced Training Logic & Automation

4.1 Execution Flow

Static Computation Graphs
Dynamic Computation Graphs (Define-by-run)
Atomic Time-Step execution (StepForward/StepBackward)
Neural Tweening / Hybrid Geometric Training
Neural Tweening Chain Rule Support
Gradient Explosion Detection & Damping

4.2 Optimizers & Schedulers

Standard Optimizers (SGD, AdamW, RMSProp)
Higher-order Optimizers (L-BFGS, K-FAC)
8 Variants of Learning Rate Schedulers
Adaptive Rate Calculation (VGStepBP)
Tweening Momentum & Link-Budgeting
Adaptation Performance Tracking (Recovery Metrics)
GPU Accelerated Training Loop (FP32 end-to-end WebGPU: forward + backward + weight update in a single command buffer submission)

4.3 Automated Evolutionary Logic

DARTS (gradient-based architecture search via differentiable mixed-op supernet)
Neural Architecture Search (NAS)
Random Architecture Generation & Mutation
Speculative Decoding (draft model + verify for faster autoregressive token generation)

Automation Progress: 13 / 16

5. Deployment, Compilation & Ecosystem

5.1 Backends

Deterministic Pure CPU Backend (Go framework)
WebGPU JIT Compiled Backend (WGPU)
Native CUDA Backend
Metal / ROCm Backends
Specialized Edge/AI Accelerator / NPU Backend

5.2 Compiler Integration

Kernel Fusion (Translating sequential operations into single SRAM-bound kernels to eliminate memory bottleneck)
Triton eDSL / WGSL AST transpilation
MLIR (Multi-Level Intermediate Representation) Lowering passes

5.3 Polyglot Ecosystem & I/O

Universal C-ABI Core API
Python Bindings (welvet) — Published to PyPI
Node.js / TypeScript Bindings (@openfluke/welvet)
C# / .NET Bindings
Java Bindings
Dart Bindings
WebAssembly (WASM) browser execution
Universal SafeTensors Support (Load / Save / V2 Multi-type)
HuggingFace Checkpoint Interoperability (Weight Extraction)

5.4 Benchmarks & Validation

ARC-AGI Task Benchmark (K-Means Implementation)
Numerical Deviation Metrics (Accuracy Heatmaps)
Task-Switching Adaptation Benchmarks
Model Ensemble Diversity Metrics
Training Method Comparison Analysis

Ecosystem Progress: 19 / 22

6. LLM Engine & Tokenization

6.1 Tokenization Core

BPE (Byte-Pair Encoding) Implementation
HuggingFace tokenizer.json Compatibility
ChatML & Prompt Template Engine
Recursive Multi-turn Turn Tracking

6.2 Generation Logic

KV Cache Optimization (Stateful incremental inference)
Batched Prefill & Autoregressive Decoding
Sampling Suite (Top-K, Temperature, Nucleus Placeholder)
Repetition Penalty & Windowed Logit Bias
Deterministic vs Stochastic Inference Modes
Real-time Token Streaming (Streamer primitives)

6.3 LLM Tooling & Profiling

HuggingFace Hub Cache Auto-Discovery
FP4 Quantized Specialist Chat Implementation
WebGPU LM-Head Offloading
VRAM Usage Profiling & Distribution Metrics

LLM Progress: 15 / 15

📊 True Version Calculation

Instead of arbitrarily bumping version numbers, we derive our exact semantic version by measuring the framework's strictly verified capabilities against the absolute "Universal Version 1.0.0" checklist.

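| Category                | Completed | Total |
|-------------------------|-----------|-------|
| 1. Numerical Core       | 20        | 32    |
| 2. Architectural Layers | 30        | 35    |
| 3. Edge Orchestration   | 0         | 10    |
| 4. Training Automation  | 13        | 16    |
| 5. Deployment Ecosystem | 19        | 22    |
| 6. LLM & Tokenization   | 15        | 15    |
| GRAND TOTAL             | 97        | 130   |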

Completion Ratio: 74.6%
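The arithmetic behind the version number can be reproduced in a few lines. The sketch assumes the minor version is the completion percentage truncated to a whole number, which is consistent with 74.6% mapping to 0.74.0; the rounding rule itself is not stated in this README, and `versionFromProgress` is a hypothetical helper.

```go
package main

import "fmt"

// versionFromProgress turns completed/total into a 0.MINOR.0 version string,
// using the truncated integer percentage as the minor version (an assumption).
func versionFromProgress(done, total int) string {
	if total <= 0 {
		return "0.0.0"
	}
	return fmt.Sprintf("0.%d.0", done*100/total)
}

func main() {
	done := 20 + 30 + 0 + 13 + 19 + 15 // per-category completed counts
	total := 32 + 35 + 10 + 16 + 22 + 15
	fmt.Printf("%d/%d = %.1f%% -> %s\n",
		done, total, 100*float64(done)/float64(total), versionFromProgress(done, total))
	// prints "97/130 = 74.6% -> 0.74.0"
}
```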

Version 0.74.0 — Complete

Status: 0.74.0 "Polyglot Bridge" is now fully shipped. Mathematical tensor representations and local architectural structures are robustly established up to transformer scale.

The full polyglot bridge is now live: TypeScript/WASM, Python (welvet on PyPI), Go, C#, Java, Dart, and Browser (WASM/WebGPU) bindings are all stable and verified. Numerical precision support is exceptionally deep, with native FP4 acceleration on both CPU (Dense/SwiGLU) and GPU (MHA/RoPE/CNN).

WebGPU offloading is fully verified with 7000x+ spatial speedups on inference and roughly 7x–65x on end-to-end GPU training (Dense/CNN/RMSNorm). The GPU training backend batches the entire forward pass, backward pass, and weight updates into a single command buffer submission per batch. Local LLM token generation is cross-platform via WebGPU.

Next milestone: v0.8.0, which wires SwiGLU/MHA/Embedding into DispatchBackwardLayer and transitions to the specialized Edge-First orchestration (Thermal-Awareness, UMA, Command Buffer Graphing) required for mobile and wearable deployment.

Documentation

Overview

DNA Engine: Hierarchical Spatial Correlation Engine

A topological reconstruction system for neural networks. Converts structural signatures (LayerType, DType, weights) into 3D directional geometry for high-fidelity comparison across diverse numerical families.

Evolution Engine: DNA Splice & NEAT-style Topology Evolution

Extends the DNA Engine (dna.go) with two capabilities:

- DNA Splice / Genetic Crossover: Takes two trained parent networks, compares their NetworkDNA, and produces a child network whose weights are blended from both parents, guided by per-layer cosine similarity scores.
- NEAT-style Topology Evolution: Mutates a network's topology (layer types, activations, remote-link connections) and weights without destroying learned structure. Supports a full population-based evolution loop via NEATPopulation.

Index

Constants
Variables
func Activate[T Numeric](v T, act ActivationType) T
func ActivateDerivative[T Numeric](v T, act ActivationType) T
func AlignedFloat32(n int) []float32
func ApplyRecursiveGradients(layer *VolumetricLayer, gradWeights *Tensor[float32], lr float32)
func ApplyTargetPropGaps[T Numeric](n *VolumetricNetwork, s *TargetPropState[T], lr float32)
func BindGroupKeyHash(pipeline *wgpu.ComputePipeline, buffers ...*wgpu.Buffer) uint64
func CalculateLoss[T Numeric](output, target *Tensor[T], lossType string) float64
func CalculateOptimalGPUTileSizeFromLimits(sharedMemBytes, maxInvocations uint32, headDim int) int
func CalculateOptimalTileSize(headDim int) int
func CastWeights[T Numeric](weights any) []T
func ComputeSilhouetteScore[T Numeric](data []*Tensor[T], assignments []int) float32
func ConvertSlice[In Numeric, Out Numeric](in []In) []Out
func CosineDistance[T Numeric](a []T, b []float32) float32
func CosineSimilarity(s1, s2 LayerSignature) float32
func DequantizeQ4_0(blocks []Q4_0Block, n int) []float32
func EuclideanDistance[T Numeric](a []T, b []float32) float32
func EuclideanDistanceT[T Numeric](a, b []T) float32
func GetDeviceDescription(net *VolumetricNetwork) string
func GetLogits[T Numeric](data []T, temp float64, dtype DType) []float32
func GroupRelatedTensors(detected []DetectedTensor) map[string][]DetectedTensor
func HierarchicalGroup[T Numeric](data []*Tensor[T], threshold float32) []int
func KMeansCluster[T Numeric](data []*Tensor[T], k int, maxIter int, parallel bool) (centroids [][]float32, assignments []int)
func LoadSafetensors(filepath string) (map[string][]float32, error)
func LoadSafetensorsFromBytes(data []byte) (map[string][]float32, error)
func LoadSafetensorsWithShapes(data []byte) (map[string]TensorWithShape, error)
func LoadUniversalDetailed(path string) (int, []LayerArchetype, []int, []TensorMeta, error)
func LoadWithPrefixes(net *VolumetricNetwork, tensors map[string][]float32) error
func MajorityVote(outputs [][]int) []int
func MorphLayer(layer *VolumetricLayer, target DType) error
func MultiNetworkEvaluation[T Numeric](models map[string]*VolumetricNetwork, inputs []*Tensor[T], expected []float64) (map[string]*DeviationMetrics, error)
func Normalize(v []float32) []float32
func PerformanceSimilarity(mA, mB ModelPerformance) float64
func PrintEnsembleReport(matches []EnsembleMatch, topN int)
func PrintMultiNetworkSummary(results map[string]*DeviationMetrics)
func SampleTopK(logits []float32, topK int, temperature float32, deterministic bool) int
func SerializeNetwork(net *VolumetricNetwork) ([]byte, error)
func ShaderDenseBackwardDW(tileSize int) string
func ShaderDenseBackwardDX(tileSize int) string
func ShaderTiledDenseN(tileSize int) string
func ShaderTiledDenseQ4(tileSize int) string
func ShaderTiledMHAN(tileSize, headDim int) string
func ShaderTiledSwiGLUN(tileSize int) string
func ShaderTiledSwiGLUQ4(tileSize int) string
func SimulatePrecision(wVal float32, dtype DType, scale float32) float32
func Softmax(logits []float32) []float32
func SoftmaxBackward(gradOutput, softmaxOutput []float32) []float32
func SoftmaxEntmaxHelper(logits []float32, alpha float32) []float32
func SoftmaxSparseHelper(logits []float32) []float32
func SystolicApplyTargetProp[T Numeric](n *VolumetricNetwork, s *SystolicState[T], globalTarget *Tensor[T], lr float32)
func SystolicForward[T Numeric](n *VolumetricNetwork, s *SystolicState[T], captureHistory bool) time.Duration
func TargetPropBackward[T Numeric](n *VolumetricNetwork, s *TargetPropState[T], target *Tensor[T])
func TargetPropBackwardChainRule[T Numeric](n *VolumetricNetwork, s *TargetPropState[T], target *Tensor[T])
func TargetPropBackwardTargetProp[T Numeric](n *VolumetricNetwork, s *TargetPropState[T], target *Tensor[T])
type ActivationType
- func ParseActivationType(s string) ActivationType
- func (a ActivationType) String() string
type AdaptationResult
type AdaptationTracker
- func NewAdaptationTracker(winDur, totalDur time.Duration) *AdaptationTracker
- func (at *AdaptationTracker) Finalize() *AdaptationResult
- func (at *AdaptationTracker) RecordOutput(correct bool)
- func (at *AdaptationTracker) Start(initialTask string, initialTaskID int)
type AggregatingObserver
- func NewAggregatingObserver(windowSize int) *AggregatingObserver
- func (o *AggregatingObserver) OnBackward(e PolyLayerEvent)
- func (o *AggregatingObserver) OnForward(e PolyLayerEvent)
type ArchConfig
type BindGroupKey
type BrainType
- func (bt BrainType) String() string
type ComparisonResult
- func NewComparisonResult(name string, numLayers int) *ComparisonResult
- func (cr *ComparisonResult) DetermineBest() string
type ConsoleObserver
- func (o *ConsoleObserver) OnBackward(e PolyLayerEvent)
- func (o *ConsoleObserver) OnForward(e PolyLayerEvent)
type DType
- func ParseDType(s string) DType
- func (d DType) String() string
type DetectedTensor
type DeviationBucket
type DeviationMetrics
- func EvaluateNetworkPolymorphic[T Numeric](n *VolumetricNetwork, inputs []*Tensor[T], expected []float64) (*DeviationMetrics, error)
- func NewDeviationMetrics() *DeviationMetrics
- func (dm *DeviationMetrics) ComputeFinalMetrics()
- func (dm *DeviationMetrics) PrintSummary()
- func (dm *DeviationMetrics) UpdateMetrics(result PredictionResult)
type EnsembleMatch
- func FindComplementaryMatches(models []ModelPerformance, minCoverage float64) []EnsembleMatch
type GenOptions
type HTTPObserver
- func NewHTTPObserver(url string) *HTTPObserver
- func (o *HTTPObserver) OnBackward(e PolyLayerEvent)
- func (o *HTTPObserver) OnForward(e PolyLayerEvent)
type HardwareInfo
- func GetHardwareInfo() HardwareInfo
type LayerArchetype
- func ProbeDeepGeometry(geoms []TensorMeta) ([]LayerArchetype, []int)
type LayerSignature
type LayerSpec
type LayerStats
- func ComputeLayerStats[T Numeric](t *Tensor[T]) LayerStats
type LayerTelemetry
- func ExtractLayerTelemetry(l VolumetricLayer) LayerTelemetry
type LayerType
- func ParseLayerType(s string) LayerType
- func (t LayerType) String() string
type LogicShift
type MergePair
type MethodInfo
type ModelPerformance
type ModelTelemetry
- func ExtractNetworkBlueprint(n *VolumetricNetwork, modelID string) ModelTelemetry
type NEATConfig
- func DefaultNEATConfig(dModel int) NEATConfig
type NEATPopulation
- func NewNEATPopulation(seed *VolumetricNetwork, size int, cfg NEATConfig) *NEATPopulation
- func (p *NEATPopulation) Best() *VolumetricNetwork
- func (p *NEATPopulation) BestFitness() float64
- func (p *NEATPopulation) Evolve(fitnessFn func(*VolumetricNetwork) float64)
- func (p *NEATPopulation) Summary(generation int) string
type NetworkBlueprint
type NetworkComparisonResult
- func CompareNetworks(dna1, dna2 NetworkDNA) NetworkComparisonResult
type NetworkDNA
- func ExtractDNA(n *VolumetricNetwork) NetworkDNA
type NetworkSpec
type Numeric
type PairWithIndex
type ParameterInfo
type PersistenceLayerSpec
type PersistenceNetworkSpec
type PolyGradientObserver
type PolyLayerEvent
type PolyObserver
type PreTokenizer
- func (pt *PreTokenizer) SplitWithSpecialTokens(text string, specialTokens map[string]int) []string
type PredictionResult
- func EvaluatePrediction(sampleIndex int, expected, actual float64) PredictionResult
type PrefixWeightMapper
- func NewPrefixWeightMapper() *PrefixWeightMapper
- func (m *PrefixWeightMapper) Find(tensors map[string][]float32, role string) []float32
- func (m *PrefixWeightMapper) MapWeights(tensors map[string][]float32) (embeddings, lmHead, finalNorm []float32, hasFinalNorm bool)
type Q4_0Block
- func QuantizeQ4_0(weights []float32) []Q4_0Block
type SafetensorsHeader
type SoftmaxType
- func ParseSoftmaxType(s string) SoftmaxType
- func (s SoftmaxType) String() string
type SpliceConfig
- func DefaultSpliceConfig() SpliceConfig
type SpliceResult
- func SpliceDNAWithReport(parentA, parentB *VolumetricNetwork, cfg SpliceConfig) SpliceResult
type Streamer
- func NewStreamer(decode func(tokens []uint32) string, promptTokens []uint32) *Streamer
- func (s *Streamer) HasNewUserTurn(allTokens []uint32) bool
- func (s *Streamer) Push(allTokens []uint32)
- func (s *Streamer) String() string
type SystolicState
- func NewSystolicState[T Numeric](n *VolumetricNetwork) *SystolicState[T]
- func (s *SystolicState[T]) SetInput(input *Tensor[T])
type TargetPropConfig
- func DefaultTargetPropConfig() *TargetPropConfig
type TargetPropState
- func NewTargetPropState[T Numeric](n *VolumetricNetwork, config *TargetPropConfig) *TargetPropState[T]
- func (s *TargetPropState[T]) CalculateLinkBudgets()
type TaskChange
type Template
- func (t Template) BuildNextTurnSegment(userMsg string) string
- func (t Template) BuildPrompt(turns []Turn, systemPrompt string, userMsg string) string
type Tensor
- func BackwardPolymorphic[T Numeric](n *VolumetricNetwork, gradOutput *Tensor[T], inputs, preActs []*Tensor[T]) (gradInput *Tensor[T], layerGradients [][2]*Tensor[T], ...)
- func CNN1BackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func CNN1BackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func CNN1ForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func CNN1ForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func CNN2BackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func CNN2BackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func CNN2ForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func CNN2ForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func CNN3BackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func CNN3BackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func CNN3ForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func CNN3ForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func ComputeLossGradient[T Numeric](output, target *Tensor[T], lossType string) *Tensor[T]
- func ConvTransposed1DBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func ConvTransposed1DForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func ConvTransposed2DBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func ConvTransposed2DForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func ConvTransposed3DBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func ConvTransposed3DForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func ConvertTensor[In Numeric, Out Numeric](in *Tensor[In]) *Tensor[Out]
- func DenseBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func DenseForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func DenseForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func DispatchLayer[T Numeric](layer *VolumetricLayer, input, skip *Tensor[T]) (preAct, postAct *Tensor[T])
- func DispatchLayerBackward[T Numeric](layer *VolumetricLayer, gradOutput, input, skip, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func EmbeddingBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func EmbeddingBackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func EmbeddingForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func EmbeddingForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func ForwardPolymorphic[T Numeric](n *VolumetricNetwork, input *Tensor[T]) (*Tensor[T], time.Duration, []time.Duration)
- func KMeansBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func KMeansForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func LSTMBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func LSTMBackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func LSTMForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func LSTMForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func LayerNormBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func LayerNormForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func MHABackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func MHAForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func MHAForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func NewTensor[T Numeric](shape ...int) *Tensor[T]
- func NewTensorFromSlice[T Numeric](data []T, shape ...int) *Tensor[T]
- func ParallelBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func ParallelForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func RMSNormBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func RMSNormForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func RNNBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func RNNBackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func RNNForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func RNNForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func ResidualBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func ResidualBackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func ResidualForwardPolymorphic[T Numeric](layer *VolumetricLayer, input, skip *Tensor[T]) (preAct, postAct *Tensor[T])
- func ResidualForwardTiled[T Numeric](layer *VolumetricLayer, input, skip *Tensor[T]) (preAct, postAct *Tensor[T])
- func SequentialBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func SequentialForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func SoftmaxBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, postAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func SoftmaxForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func SwiGLUBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func SwiGLUBackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])
- func SwiGLUForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func SwiGLUForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])
- func SystolicBackward[T Numeric](n *VolumetricNetwork, s *SystolicState[T], gradOutput *Tensor[T]) (gradIn *Tensor[T], layerGradients [][2]*Tensor[T], err error)
- func TargetPropForward[T Numeric](n *VolumetricNetwork, s *TargetPropState[T], input *Tensor[T]) *Tensor[T]
- func (t *Tensor[T]) Add(other *Tensor[T])
- func (t *Tensor[T]) Clone() *Tensor[T]
type TensorInfo
type TensorMeta
type TensorWithShape
type TimeWindow
type Tokenizer
- func LoadTokenizer(path string) (*Tokenizer, error)
- func (t *Tokenizer) Decode(ids []uint32, skipSpecialTokens bool) string
- func (t *Tokenizer) Encode(text string, addSpecialTokens bool) []uint32
type TokenizerJSON
type TrainingBatch
type TrainingConfig
- func DefaultTrainingConfig() *TrainingConfig
type TrainingMetrics
- func NewTrainingMetrics() TrainingMetrics
type TrainingResult
- func Train[T Numeric](n *VolumetricNetwork, batches []TrainingBatch[T], config *TrainingConfig) (*TrainingResult, error)
type Transformer
- func NewTransformer[T Numeric](network *VolumetricNetwork, embeddings, lmHead, finalNorm []float32, ...) *Transformer[T]
- func (t *Transformer[T]) EnableTiling(tileSize int)
- func (t *Transformer[T]) ForwardTokenIDsWGPU(tokens []uint32, input *Tensor[T], computeLogits bool, onlyLast bool) (*Tensor[T], error)
- func (t *Transformer[T]) ForwardWGPU(input *Tensor[T]) (*Tensor[T], error)
- func (t *Transformer[T]) Generate(encode func(text string) []uint32, decode func(tokens []uint32) string, ...) string
- func (t *Transformer[T]) Reset()
- func (t *Transformer[T]) SyncToGPU() error
type Turn
type VolumetricLayer
- func CreateResidualGraft(main *VolumetricNetwork) *VolumetricLayer
- func GraftNetworksPolymorphic(networks []*VolumetricNetwork, combineMode string) (*VolumetricLayer, error)
- func ReconstructCNNLayer(name string, tensors []DetectedTensor, ltype LayerType) (*VolumetricLayer, error)
- func ReconstructLayerNormLayer(name string, tensors []DetectedTensor, dModel int) (*VolumetricLayer, error)
- func ReconstructMHALayer(name string, tensors []DetectedTensor, dModel int, numHeads int) (*VolumetricLayer, error)
- func ReconstructRMSNormLayer(name string, tensors []DetectedTensor, dModel int) (*VolumetricLayer, error)
- func ReconstructSwiGLULayer(name string, tensors []DetectedTensor, dModel int) (*VolumetricLayer, error)
- func (l *VolumetricLayer) SyncToCPU()
- func (l *VolumetricLayer) SyncToGPU() error
type VolumetricNetwork
- func BuildCNN(inputSize, numClasses int, dtype DType) *VolumetricNetwork
- func BuildNetworkFromJSON(jsonData []byte) (*VolumetricNetwork, error)
- func BuildRandomNetwork(depth, rows, cols, lpc int, dModel int) *VolumetricNetwork
- func BuildSequentialNetwork(numLayers int, dModel int, act ActivationType, dtype DType) *VolumetricNetwork
- func BuildTransformerNetwork(numBlocks int, dModel int, numHeads int, dtype DType) *VolumetricNetwork
- func DeserializeNetwork(jsonData []byte) (*VolumetricNetwork, error)
- func LoadUniversal(path string) (*VolumetricNetwork, error)
- func MountGeometrically(archs []LayerArchetype, geoms []TensorMeta) *VolumetricNetwork
- func NEATMutate(n *VolumetricNetwork, cfg NEATConfig) *VolumetricNetwork
- func NewVolumetricNetwork(depth, rows, cols, layersPerCell int) *VolumetricNetwork
- func SpliceDNA(parentA, parentB *VolumetricNetwork, cfg SpliceConfig) *VolumetricNetwork
- func (n *VolumetricNetwork) CalculateTotalMemory() int
- func (n *VolumetricNetwork) GetIndex(z, y, x, l int) int
- func (n *VolumetricNetwork) GetLayer(z, y, x, l int) *VolumetricLayer
- func (n *VolumetricNetwork) GetMethodSignature(methodName string) (string, error)
- func (n *VolumetricNetwork) GetMethods() ([]MethodInfo, error)
- func (n *VolumetricNetwork) GetMethodsJSON() (string, error)
- func (n *VolumetricNetwork) HasMethod(methodName string) bool
- func (n *VolumetricNetwork) InitCNNCell(z, y, x, l int, ltype LayerType, inChannels, filters, kSize int, dtype DType, ...)
- func (n *VolumetricNetwork) InitConvTransposedCell(z, y, x, l int, ltype LayerType, inChannels, filters, kSize int, dtype DType, ...)
- func (n *VolumetricNetwork) InitDenseCell(z, y, x, l int, dModel int, act ActivationType, scale float32)
- func (n *VolumetricNetwork) InitEmbeddingCell(z, y, x, l int, vocabSize, dModel int, dtype DType)
- func (n *VolumetricNetwork) InitKMeansCell(z, y, x, l int, numClusters, dModel int, dtype DType)
- func (n *VolumetricNetwork) InitLSTMCell(z, y, x, l int, dModel int, scale float32)
- func (n *VolumetricNetwork) InitLayerNormCell(z, y, x, l int, size int, dtype DType)
- func (n *VolumetricNetwork) InitMHACell(z, y, x, l int, dModel, numHeads int, scale float32)
- func (n *VolumetricNetwork) InitRNNCell(z, y, x, l int, dModel int, scale float32)
- func (n *VolumetricNetwork) InitWGPU() error
- func (n *VolumetricNetwork) ListMethods() []string
- func (n *VolumetricNetwork) SyncAllToGPU() error
- func (n *VolumetricNetwork) SyncToGPU() error
type WGPUActivationParams
type WGPUApplyGradientsParams
type WGPUCNN1BackwardParams
type WGPUCNN1Params
type WGPUCNN2BackwardParams
type WGPUCNN2Params
type WGPUCNN3BackwardParams
type WGPUCNN3Params
type WGPUContext
- func (c *WGPUContext) BeginFrame() error
- func (c *WGPUContext) CreateComputePipeline(shaderSource string) (*wgpu.ComputePipeline, error)
- func (c *WGPUContext) CreatePersistentBuffer(data []float32, label string) (*wgpu.Buffer, error)
- func (c *WGPUContext) DispatchActivation(size int, act ActivationType, inputBuf, outputBuf *wgpu.Buffer) error
- func (c *WGPUContext) DispatchActivationBackward(size int, act ActivationType, gradOutBuf, preActBuf, gradInBuf *wgpu.Buffer) error
- func (c *WGPUContext) DispatchApplyGradients(size int, lr float32, weightBuf, gradBuf *wgpu.Buffer) error
- func (c *WGPUContext) DispatchBackwardLayer(l *VolumetricLayer, batchSize int, ...) error
- func (c *WGPUContext) DispatchCNN1(batchSize, inC, inL, outC, outL, kSize, stride, padding int, ...) error
- func (c *WGPUContext) DispatchCNN1BackwardDW(batchSize, inC, inL, filters, outL, kSize, stride, padding int, ...) error
- func (c *WGPUContext) DispatchCNN1BackwardDX(batchSize, inC, inL, filters, outL, kSize, stride, padding int, ...) error
- func (c *WGPUContext) DispatchCNN2(...) error
- func (c *WGPUContext) DispatchCNN2BackwardDW(batchSize, inC, inH, inW, filters, outH, outW, kSize, stride, padding int, ...) error
- func (c *WGPUContext) DispatchCNN2BackwardDX(batchSize, inC, inH, inW, filters, outH, outW, kSize, stride, padding int, ...) error
- func (c *WGPUContext) DispatchCNN3(...) error
- func (c *WGPUContext) DispatchCNN3BackwardDW(...) error
- func (c *WGPUContext) DispatchCNN3BackwardDX(...) error
- func (c *WGPUContext) DispatchDense(batchSize, inputSize, outputSize int, ...) error
- func (c *WGPUContext) DispatchDenseBackwardDW(batchSize, inputSize, outputSize int, ...) error
- func (c *WGPUContext) DispatchDenseBackwardDX(batchSize, inputSize, outputSize int, ...) error
- func (c *WGPUContext) DispatchDenseQ4(batchSize, inputSize, outputSize int, ...) error
- func (c *WGPUContext) DispatchEmbedding(vocabSize, hiddenSize, numTokens int, ...) error
- func (c *WGPUContext) DispatchEmbeddingBackward(vocabSize, hiddenSize, numTokens int, ...) error
- func (c *WGPUContext) DispatchForwardLayer(l *VolumetricLayer, batchSize int, inputBuf, outBuf *wgpu.Buffer) error
- func (c *WGPUContext) DispatchKVUpdate(offset, headDim, maxSeqLen, numKVHeads, numTokens int, ...) error
- func (c *WGPUContext) DispatchLSTMStep(batchSize, inputSize, hiddenSize int, ...) error
- func (c *WGPUContext) DispatchMHA(numHeads, numKVHeads, headDim, seqLen, kvOffset, maxSeqLen int, ...) error
- func (c *WGPUContext) DispatchMHABackward(batchSize, numHeads, numKVHeads, headDim, seqLen int, scale float32, ...) error
- func (c *WGPUContext) DispatchMSEGradPartialLoss(size int, outputBuf, targetBuf, gradBuf, partialsBuf *wgpu.Buffer) error
- func (c *WGPUContext) DispatchRMSNorm(batchSize, size int, epsilon float32, ...) error
- func (c *WGPUContext) DispatchRMSNormBackward(batchSize, size int, epsilon float32, ...) error
- func (c *WGPUContext) DispatchRNNStep(batchSize, inputSize, hiddenSize int, ...) error
- func (c *WGPUContext) DispatchResidual(size int, inputBuf, residualBuf *wgpu.Buffer) error
- func (c *WGPUContext) DispatchResidualBackward(size int, gradOutputBuf, gradInputBuf, gradResidualBuf *wgpu.Buffer) error
- func (c *WGPUContext) DispatchRoPE(seqLen, headDim, numHeads, offset int, theta float32, targetBuf *wgpu.Buffer) error
- func (c *WGPUContext) DispatchSwiGLU(batchSize, inputSize, outputSize int, ...) error
- func (c *WGPUContext) DispatchSwiGLUBackward(batchSize, inputSize, outputSize int, ...) error
- func (c *WGPUContext) DispatchSwiGLUQ4(batchSize, inputSize, outputSize int, ...) error
- func (c *WGPUContext) FlushFrame()
- func (c *WGPUContext) GetActivationBuffer(name string, size uint64, usage wgpu.BufferUsage) *wgpu.Buffer
- func (c *WGPUContext) GetBindGroup(pipeline *wgpu.ComputePipeline, buffers ...*wgpu.Buffer) (*wgpu.BindGroup, error)
- func (c *WGPUContext) GetUniformBuffer(size uint64) *wgpu.Buffer
- func (c *WGPUContext) ReadBuffer(buf *wgpu.Buffer) ([]float32, error)
- func (c *WGPUContext) Release()
- func (c *WGPUContext) ResetCache()
type WGPUDenseParams
type WGPUEmbeddingParams
type WGPUKVParams
type WGPULSTMParams
type WGPULossParams
type WGPUMHABackwardParams
type WGPUMHAParams
type WGPURMSNormParams
type WGPURNNParams
type WGPURoPEParams
type WeightStore
- func NewWeightStore(size int) *WeightStore
- func (ws *WeightStore) ApplyGradients(gradWeights *Tensor[float32], lr float32)
- func (ws *WeightStore) GetActive(dtype DType) any
- func (ws *WeightStore) Morph(dtype DType)
- func (ws *WeightStore) Randomize(seed int64, scale float32)
- func (ws *WeightStore) SetVersion(dtype DType, data any)
- func (ws *WeightStore) SizeInBytes(dtype DType) int
- func (ws *WeightStore) Unpack(dtype DType)

Constants

const ShaderActivationBackward = `` /* 972-byte string literal not displayed */

const ShaderActivationForward = `` /* 789-byte string literal not displayed */

const ShaderApplyGradients = `` /* 486-byte string literal not displayed */

const ShaderCNN1 = `` /* 1288-byte string literal not displayed */

const ShaderCNN1BackwardDW = `
struct Params {
    batchSize: u32,
    inC: u32,
    inL: u32,
    filters: u32,
    outL: u32,
    kSize: u32,
    stride: u32,
    padding: u32,
    activation: u32,
};
@group(0) @binding(0) var<uniform> params: Params;
@group(0) @binding(1) var<storage, read> gradOutput: array<f32>;
@group(0) @binding(2) var<storage, read> input: array<f32>;
@group(0) @binding(3) var<storage, read> preAct: array<f32>;
@group(0) @binding(4) var<storage, read_write> gradWeights: array<f32>;

` + wgslActivateDerivative + `

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    let tid = global_id.x;
    if (tid >= params.filters * params.inC * params.kSize) { return; }

    let f = tid / (params.inC * params.kSize);
    let rem = tid % (params.inC * params.kSize);
    let ic = rem / params.kSize;
    let k = rem % params.kSize;

    var sum: f32 = 0.0;
    for (var b: u32 = 0u; b < params.batchSize; b++) {
        for (var o: u32 = 0u; o < params.outL; o++) {
            let inPos = i32(o * params.stride) + i32(k) - i32(params.padding);
            if (inPos >= 0 && inPos < i32(params.inL)) {
                let outIdx = b * params.filters * params.outL + f * params.outL + o;
                let dy = gradOutput[outIdx] * activateDerivative(preAct[outIdx], params.activation);
                let inIdx = b * params.inC * params.inL + ic * params.inL + u32(inPos);
                sum += dy * input[inIdx];
            }
        }
    }
    gradWeights[tid] += sum;
}
`
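
For readers checking the index arithmetic, the following is a CPU sketch in Go that mirrors the loop structure of ShaderCNN1BackwardDW. It is not part of the package: `conv1DGradWeights` is a hypothetical helper, and it assumes `dy` has already been multiplied by the activation derivative so that the `activateDerivative` call can be left out.

```go
package main

import "fmt"

// conv1DGradWeights mirrors the shader's loops on the CPU.
// Assumption: dy is the output gradient already scaled by the activation
// derivative. Layouts are dy[b][f][o], input[b][ic][i] and grad[f][ic][k],
// all flattened row-major as in the shader.
func conv1DGradWeights(dy, input []float32, batch, inC, inL, filters, outL, kSize, stride, padding int) []float32 {
	grad := make([]float32, filters*inC*kSize)
	for tid := range grad { // one iteration per GPU invocation
		f := tid / (inC * kSize)
		rem := tid % (inC * kSize)
		ic, k := rem/kSize, rem%kSize
		var sum float32
		for b := 0; b < batch; b++ {
			for o := 0; o < outL; o++ {
				inPos := o*stride + k - padding
				if inPos < 0 || inPos >= inL {
					continue // this tap falls in the zero padding
				}
				sum += dy[(b*filters+f)*outL+o] * input[(b*inC+ic)*inL+inPos]
			}
		}
		grad[tid] += sum
	}
	return grad
}

func main() {
	// One sample, one channel, one filter: input length 4, kernel 2,
	// stride 1, no padding, so outL = 3.
	input := []float32{1, 2, 3, 4}
	dy := []float32{1, 1, 1}
	fmt.Println(conv1DGradWeights(dy, input, 1, 1, 4, 1, 3, 2, 1, 0)) // prints "[6 9]"
}
```

Each weight element is owned by exactly one invocation, so the accumulation into `gradWeights[tid]` needs no atomics.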

const ShaderCNN1BackwardDX = `
struct Params {
    batchSize: u32,
    inC: u32,
    inL: u32,
    filters: u32,
    outL: u32,
    kSize: u32,
    stride: u32,
    padding: u32,
    activation: u32,
};
@group(0) @binding(0) var<uniform> params: Params;
@group(0) @binding(1) var<storage, read> gradOutput: array<f32>;
@group(0) @binding(2) var<storage, read> weights: array<f32>;
@group(0) @binding(3) var<storage, read> preAct: array<f32>;
@group(0) @binding(4) var<storage, read_write> gradInput: array<f32>;

` + wgslActivateDerivative + `

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    let tid = global_id.x;
    if (tid >= params.batchSize * params.inC * params.inL) { return; }

    let b = tid / (params.inC * params.inL);
    let rem = tid % (params.inC * params.inL);
    let ic = rem / params.inL;
    let ip = rem % params.inL;

    var sum: f32 = 0.0;
    for (var f: u32 = 0u; f < params.filters; f++) {
        for (var k: u32 = 0u; k < params.kSize; k++) {
            let val = i32(ip) + i32(params.padding) - i32(k);
            if (val >= 0 && val % i32(params.stride) == 0) {
                let o = u32(val / i32(params.stride));
                if (o < params.outL) {
                    let outIdx = b * params.filters * params.outL + f * params.outL + o;
                    let dy = gradOutput[outIdx] * activateDerivative(preAct[outIdx], params.activation);
                    let kWIdx = f * params.inC * params.kSize + ic * params.kSize + k;
                    sum += dy * weights[kWIdx];
                }
            }
        }
    }
    gradInput[tid] += sum;
}
`
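
The input-gradient shader uses a gather formulation: each invocation owns one input position and looks up which output positions touched it, which is where the `val % stride == 0` test comes from. Below is a CPU sketch of the same logic in Go; `conv1DGradInput` is a hypothetical helper, and as an assumption `dy` is taken to be already multiplied by the activation derivative.

```go
package main

import "fmt"

// conv1DGradInput mirrors ShaderCNN1BackwardDX on the CPU.
// Assumption: dy is the output gradient already scaled by the activation
// derivative. Layouts are dy[b][f][o], weights[f][ic][k] and grad[b][ic][i],
// all flattened row-major as in the shader.
func conv1DGradInput(dy, weights []float32, batch, inC, inL, filters, outL, kSize, stride, padding int) []float32 {
	grad := make([]float32, batch*inC*inL)
	for tid := range grad { // one iteration per GPU invocation
		b := tid / (inC * inL)
		rem := tid % (inC * inL)
		ic, ip := rem/inL, rem%inL
		var sum float32
		for f := 0; f < filters; f++ {
			for k := 0; k < kSize; k++ {
				// Forward pass read input[o*stride + k - padding], so
				// output o touched ip only if ip + padding - k == o*stride.
				val := ip + padding - k
				if val < 0 || val%stride != 0 {
					continue
				}
				o := val / stride
				if o >= outL {
					continue
				}
				sum += dy[(b*filters+f)*outL+o] * weights[(f*inC+ic)*kSize+k]
			}
		}
		grad[tid] += sum
	}
	return grad
}

func main() {
	// Input length 4, kernel [1, 10], stride 1, no padding, outL = 3.
	dy := []float32{1, 1, 1}
	w := []float32{1, 10}
	fmt.Println(conv1DGradInput(dy, w, 1, 1, 4, 1, 3, 2, 1, 0)) // prints "[1 11 11 10]"
}
```

Gathering per input element avoids the write conflicts a scatter from output positions would cause, at the cost of testing every (filter, tap) pair for each input.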

const ShaderCNN2 = `` /* 2150-byte string literal not displayed */

const ShaderCNN2BackwardDW = `
struct Params {
    batchSize: u32,
    inC: u32,
    inH: u32,
    inW: u32,
    filters: u32,
    outH: u32,
    outW: u32,
    kSize: u32,
    stride: u32,
    padding: u32,
    activation: u32,
};
@group(0) @binding(0) var<uniform> params: Params;
@group(0) @binding(1) var<storage, read> gradOutput: array<f32>;
@group(0) @binding(2) var<storage, read> input: array<f32>;
@group(0) @binding(3) var<storage, read> preAct: array<f32>;
@group(0) @binding(4) var<storage, read_write> gradWeights: array<f32>;

` + wgslActivateDerivative + `

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    let tid = global_id.x;
    let weightSize = params.filters * params.inC * params.kSize * params.kSize;
    if (tid >= weightSize) { return; }

    let f = tid / (params.inC * params.kSize * params.kSize);
    let rem = tid % (params.inC * params.kSize * params.kSize);
    let ic = rem / (params.kSize * params.kSize);
    let rem2 = rem % (params.kSize * params.kSize);
    let kh = rem2 / params.kSize;
    let kw = rem2 % params.kSize;

    var sum: f32 = 0.0;
    for (var b: u32 = 0u; b < params.batchSize; b++) {
        for (var oh: u32 = 0u; oh < params.outH; oh++) {
            for (var ow: u32 = 0u; ow < params.outW; ow++) {
                let ih = i32(oh * params.stride) + i32(kh) - i32(params.padding);
                let iw = i32(ow * params.stride) + i32(kw) - i32(params.padding);
                if (ih >= 0 && ih < i32(params.inH) && iw >= 0 && iw < i32(params.inW)) {
                    let outIdx = ((b * params.filters + f) * params.outH + oh) * params.outW + ow;
                    let dy = gradOutput[outIdx] * activateDerivative(preAct[outIdx], params.activation);
                    let inIdx = ((b * params.inC + ic) * params.inH + u32(ih)) * params.inW + u32(iw);
                    sum += dy * input[inIdx];
                }
            }
        }
    }
    gradWeights[tid] += sum;
}
`

const ShaderCNN2BackwardDX = `
struct Params {
    batchSize: u32,
    inC: u32,
    inH: u32,
    inW: u32,
    filters: u32,
    outH: u32,
    outW: u32,
    kSize: u32,
    stride: u32,
    padding: u32,
    activation: u32,
};
@group(0) @binding(0) var<uniform> params: Params;
@group(0) @binding(1) var<storage, read> gradOutput: array<f32>;
@group(0) @binding(2) var<storage, read> weights: array<f32>;
@group(0) @binding(3) var<storage, read> preAct: array<f32>;
@group(0) @binding(4) var<storage, read_write> gradInput: array<f32>;

` + wgslActivateDerivative + `

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    let tid = global_id.x;
    let size = params.batchSize * params.inC * params.inH * params.inW;
    if (tid >= size) { return; }

    let b = tid / (params.inC * params.inH * params.inW);
    let rem = tid % (params.inC * params.inH * params.inW);
    let ic = rem / (params.inH * params.inW);
    let rem2 = rem % (params.inH * params.inW);
    let ih = rem2 / params.inW;
    let iw = rem2 % params.inW;

    var sum: f32 = 0.0;
    for (var f: u32 = 0u; f < params.filters; f++) {
        for (var kh: u32 = 0u; kh < params.kSize; kh++) {
            for (var kw: u32 = 0u; kw < params.kSize; kw++) {
                let vh = i32(ih) + i32(params.padding) - i32(kh);
                let vw = i32(iw) + i32(params.padding) - i32(kw);
                if (vh >= 0 && vh % i32(params.stride) == 0 && vw >= 0 && vw % i32(params.stride) == 0) {
                    let oh = u32(vh / i32(params.stride));
                    let ow = u32(vw / i32(params.stride));
                    if (oh < params.outH && ow < params.outW) {
                        let outIdx = ((b * params.filters + f) * params.outH + oh) * params.outW + ow;
                        let dy = gradOutput[outIdx] * activateDerivative(preAct[outIdx], params.activation);
                        let kWIdx = ((f * params.inC + ic) * params.kSize + kh) * params.kSize + kw;
                        sum += dy * weights[kWIdx];
                    }
                }
            }
        }
    }
    gradInput[tid] += sum;
}
`
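When validating the GPU kernel, a CPU reference that follows the same index arithmetic can be handy. The sketch below mirrors the loop structure of ShaderCNN2BackwardDX under two stated assumptions: the activation is linear (derivative 1, so preAct is not needed), and the result is written fresh instead of accumulated with `+=`. The names `convShape` and `conv2DBackwardDX` are hypothetical and not part of the package.

```go
package main

// convShape mirrors the fields of the WGSL Params struct (hypothetical helper type).
type convShape struct {
	Batch, InC, InH, InW   int
	Filters, OutH, OutW    int
	KSize, Stride, Padding int
}

// conv2DBackwardDX returns dL/dInput for an NCHW convolution.
// Assumption: linear activation, so dy equals gradOutput.
func conv2DBackwardDX(p convShape, gradOutput, weights []float32) []float32 {
	gradInput := make([]float32, p.Batch*p.InC*p.InH*p.InW)
	for tid := range gradInput {
		// Decompose the flat index the same way the shader does.
		b := tid / (p.InC * p.InH * p.InW)
		rem := tid % (p.InC * p.InH * p.InW)
		ic := rem / (p.InH * p.InW)
		ih := (rem % (p.InH * p.InW)) / p.InW
		iw := rem % p.InW

		var sum float32
		for f := 0; f < p.Filters; f++ {
			for kh := 0; kh < p.KSize; kh++ {
				for kw := 0; kw < p.KSize; kw++ {
					vh := ih + p.Padding - kh
					vw := iw + p.Padding - kw
					// Only positions that land exactly on a stride step contribute.
					if vh < 0 || vw < 0 || vh%p.Stride != 0 || vw%p.Stride != 0 {
						continue
					}
					oh, ow := vh/p.Stride, vw/p.Stride
					if oh >= p.OutH || ow >= p.OutW {
						continue
					}
					outIdx := ((b*p.Filters+f)*p.OutH+oh)*p.OutW + ow
					wIdx := ((f*p.InC+ic)*p.KSize+kh)*p.KSize + kw
					sum += gradOutput[outIdx] * weights[wIdx]
				}
			}
		}
		gradInput[tid] = sum
	}
	return gradInput
}
```

Comparing this output against a GPU readback on small shapes is one way to catch indexing mistakes before moving to larger tensors.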

View Source

const ShaderCNN3 = `` /* 2547-byte string literal not displayed */

View Source

const ShaderCNN3BackwardDW = `
struct Params {
    batchSize: u32,
    inC: u32,
    inD: u32,
    inH: u32,
    inW: u32,
    filters: u32,
    outD: u32,
    outH: u32,
    outW: u32,
    kSize: u32,
    stride: u32,
    padding: u32,
    activation: u32,
};
@group(0) @binding(0) var<uniform> params: Params;
@group(0) @binding(1) var<storage, read> gradOutput: array<f32>;
@group(0) @binding(2) var<storage, read> input: array<f32>;
@group(0) @binding(3) var<storage, read> preAct: array<f32>;
@group(0) @binding(4) var<storage, read_write> gradWeights: array<f32>;

` + wgslActivateDerivative + `

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    let tid = global_id.x;
    let kVol = params.kSize * params.kSize * params.kSize;
    let weightSize = params.filters * params.inC * kVol;
    if (tid >= weightSize) { return; }

    let f = tid / (params.inC * kVol);
    let rem = tid % (params.inC * kVol);
    let ic = rem / kVol;
    let rem2 = rem % kVol;
    let kd = rem2 / (params.kSize * params.kSize);
    let rem3 = rem2 % (params.kSize * params.kSize);
    let kh = rem3 / params.kSize;
    let kw = rem3 % params.kSize;

    var sum: f32 = 0.0;
    for (var b: u32 = 0u; b < params.batchSize; b++) {
        for (var od: u32 = 0u; od < params.outD; od++) {
            for (var oh: u32 = 0u; oh < params.outH; oh++) {
                for (var ow: u32 = 0u; ow < params.outW; ow++) {
                    let id = i32(od * params.stride) + i32(kd) - i32(params.padding);
                    let ih = i32(oh * params.stride) + i32(kh) - i32(params.padding);
                    let iw = i32(ow * params.stride) + i32(kw) - i32(params.padding);
                    if (id >= 0 && id < i32(params.inD) &&
                        ih >= 0 && ih < i32(params.inH) &&
                        iw >= 0 && iw < i32(params.inW)) {
                        let outIdx = (((b * params.filters + f) * params.outD + od) * params.outH + oh) * params.outW + ow;
                        let dy = gradOutput[outIdx] * activateDerivative(preAct[outIdx], params.activation);
                        let inIdx = (((b * params.inC + ic) * params.inD + u32(id)) * params.inH + u32(ih)) * params.inW + u32(iw);
                        sum += dy * input[inIdx];
                    }
                }
            }
        }
    }
    gradWeights[tid] += sum;
}
`

View Source

const ShaderCNN3BackwardDX = `
struct Params {
    batchSize: u32,
    inC: u32,
    inD: u32,
    inH: u32,
    inW: u32,
    filters: u32,
    outD: u32,
    outH: u32,
    outW: u32,
    kSize: u32,
    stride: u32,
    padding: u32,
    activation: u32,
};
@group(0) @binding(0) var<uniform> params: Params;
@group(0) @binding(1) var<storage, read> gradOutput: array<f32>;
@group(0) @binding(2) var<storage, read> weights: array<f32>;
@group(0) @binding(3) var<storage, read> preAct: array<f32>;
@group(0) @binding(4) var<storage, read_write> gradInput: array<f32>;

` + wgslActivateDerivative + `

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    let tid = global_id.x;
    let inVol = params.inD * params.inH * params.inW;
    let size = params.batchSize * params.inC * inVol;
    if (tid >= size) { return; }

    let b = tid / (params.inC * inVol);
    let rem = tid % (params.inC * inVol);
    let ic = rem / inVol;
    let rem2 = rem % inVol;
    let id = rem2 / (params.inH * params.inW);
    let rem3 = rem2 % (params.inH * params.inW);
    let ih = rem3 / params.inW;
    let iw = rem3 % params.inW;

    var sum: f32 = 0.0;
    for (var f: u32 = 0u; f < params.filters; f++) {
        for (var kd: u32 = 0u; kd < params.kSize; kd++) {
            for (var kh: u32 = 0u; kh < params.kSize; kh++) {
                for (var kw: u32 = 0u; kw < params.kSize; kw++) {
                    let vd = i32(id) + i32(params.padding) - i32(kd);
                    let vh = i32(ih) + i32(params.padding) - i32(kh);
                    let vw = i32(iw) + i32(params.padding) - i32(kw);
                    if (vd >= 0 && vd % i32(params.stride) == 0 && 
                        vh >= 0 && vh % i32(params.stride) == 0 && 
                        vw >= 0 && vw % i32(params.stride) == 0) {
                        let od = u32(vd / i32(params.stride));
                        let oh = u32(vh / i32(params.stride));
                        let ow = u32(vw / i32(params.stride));
                        if (od < params.outD && oh < params.outH && ow < params.outW) {
                            let outIdx = (((b * params.filters + f) * params.outD + od) * params.outH + oh) * params.outW + ow;
                            let dy = gradOutput[outIdx] * activateDerivative(preAct[outIdx], params.activation);
                            let kWIdx = (((f * params.inC + ic) * params.kSize + kd) * params.kSize + kh) * params.kSize + kw;
                            sum += dy * weights[kWIdx];
                        }
                    }
                }
            }
        }
    }
    gradInput[tid] += sum;
}
`

View Source

const ShaderEmbedding = `` /* 806-byte string literal not displayed */

View Source

const ShaderEmbeddingBackward = `` /* 1300-byte string literal not displayed */

View Source

const ShaderKVUpdate = `` /* 953-byte string literal not displayed */

View Source

const ShaderLSTMStep = `` /* 2344-byte string literal not displayed */

View Source

const ShaderMHABackward = `` /* 3201-byte string literal not displayed */

View Source

const ShaderMSEGradPartialLoss = `` /* 1177-byte string literal not displayed */

ShaderMSEGradPartialLoss computes MSE gradients and partial loss sums entirely on the GPU. Each workgroup of 256 threads reduces its elements and writes one partial sum to partials[wg_id.x]. The CPU then sums the partials array (ceil(N/256) floats) to obtain the total loss, so no full-output readback is needed.
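The CPU side of this reduction is small. The sketch below shows the partial count and the final summation; `numPartials` and `totalLoss` are hypothetical helper names, and whether the package divides the sum by N afterwards is not stated here, so that step is left out.

```go
package main

// workgroupSize matches the 256-thread workgroups described for the shader.
const workgroupSize = 256

// numPartials returns ceil(n / 256), the number of partial sums produced
// for n output elements.
func numPartials(n int) int {
	return (n + workgroupSize - 1) / workgroupSize
}

// totalLoss adds up the per-workgroup partial sums read back from the GPU.
// Accumulating in float64 is a choice made for this sketch to limit rounding error.
func totalLoss(partials []float32) float64 {
	var sum float64
	for _, p := range partials {
		sum += float64(p)
	}
	return sum
}
```

For one million outputs this reads back 3907 floats instead of the full tensor.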

View Source

const ShaderRMSNorm = `` /* 1228-byte string literal not displayed */

View Source

const ShaderRMSNormBackward = `` /* 1992-byte string literal not displayed */

View Source

const ShaderRNNStep = `` /* 1270-byte string literal not displayed */

View Source

const ShaderResidualAdd = `` /* 425-byte string literal not displayed */

View Source

const ShaderResidualBackward = `` /* 537-byte string literal not displayed */

View Source

const ShaderRoPE = `` /* 1067-byte string literal not displayed */

View Source

const ShaderSwiGLUBackward = `` /* 955-byte string literal not displayed */

Variables ¶

View Source

var (
	// ChatML is used by Qwen, SmolLM2, etc.
	ChatML = Template{
		Name: "chatml",
		RolePrefixes: map[string]string{
			"system":    "<|im_start|>system\n",
			"user":      "<|im_start|>user\n",
			"assistant": "<|im_start|>assistant\n",
		},
		RoleSuffixes: map[string]string{
			"system":    "<|im_end|>\n",
			"user":      "<|im_end|>\n",
			"assistant": "<|im_end|>\n",
		},
	}

	// Llama3 markers
	Llama3 = Template{
		Name: "llama3",
		RolePrefixes: map[string]string{
			"system":    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n",
			"user":      "<|start_header_id|>user<|end_header_id|>\n\n",
			"assistant": "<|start_header_id|>assistant<|end_header_id|>\n\n",
		},
		RoleSuffixes: map[string]string{
			"system":    "<|eot_id|>",
			"user":      "<|eot_id|>",
			"assistant": "<|eot_id|>",
		},
	}
)

Preset templates
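A template of this shape can be applied by wrapping each message in its role prefix and suffix. The sketch below uses a local stand-in `template` struct with the three fields shown above, plus a hypothetical `message` type and `render` function; the package's own rendering method may differ, for example in whether it opens a trailing assistant turn.

```go
package main

import "strings"

// template is a local stand-in with the fields used by the presets above.
type template struct {
	Name         string
	RolePrefixes map[string]string
	RoleSuffixes map[string]string
}

// message is a hypothetical role/content pair.
type message struct {
	Role, Content string
}

// render wraps every message in its role prefix and suffix, then opens an
// assistant turn so that generation continues from there (an assumption).
func render(t template, msgs []message) string {
	var sb strings.Builder
	for _, m := range msgs {
		sb.WriteString(t.RolePrefixes[m.Role])
		sb.WriteString(m.Content)
		sb.WriteString(t.RoleSuffixes[m.Role])
	}
	sb.WriteString(t.RolePrefixes["assistant"])
	return sb.String()
}
```

With the ChatML markers, a single user message "Hi" renders as `<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\n`.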

View Source

var BrainTypeNames = []string{
	"Dense", "MHA", "SwiGLU", "RMSNorm", "RNN", "LSTM", "LayerNorm",
	"Embedding", "KMeans", "Softmax", "Parallel", "Sequential",
}

View Source

var UserHints = make(map[int]LayerType)

UserHints allows manual mapping for ambiguous tensor indices.

Functions ¶

func Activate ¶

func Activate[T Numeric](v T, act ActivationType) T

Activate applies the activation function to a value.

func ActivateDerivative ¶

func ActivateDerivative[T Numeric](v T, act ActivationType) T

ActivateDerivative returns the derivative of the activation function.

func AlignedFloat32 ¶

func AlignedFloat32(n int) []float32

AlignedFloat32 allocates a slice of float32 aligned to 64-byte boundaries.

func ApplyRecursiveGradients ¶

func ApplyRecursiveGradients(layer *VolumetricLayer, gradWeights *Tensor[float32], lr float32)

ApplyRecursiveGradients traverses the layer hierarchy and updates weights in all nested WeightStores.

func ApplyTargetPropGaps ¶

func ApplyTargetPropGaps[T Numeric](n *VolumetricNetwork, s *TargetPropState[T], lr float32)

ApplyTargetPropGaps assigns weight updates based on configuration.

func BindGroupKeyHash ¶

func BindGroupKeyHash(pipeline *wgpu.ComputePipeline, buffers ...*wgpu.Buffer) uint64

BindGroupKeyHash generates a stable hash for a set of buffers and a pipeline.

func CalculateLoss ¶

func CalculateLoss[T Numeric](output, target *Tensor[T], lossType string) float64

CalculateLoss computes the loss between output and target.

func CalculateOptimalGPUTileSizeFromLimits ¶

func CalculateOptimalGPUTileSizeFromLimits(sharedMemBytes, maxInvocations uint32, headDim int) int

CalculateOptimalGPUTileSizeFromLimits derives the best GPU tiling size from raw WebGPU Limits.

sharedMemBytes = adapter.GetLimits().Limits.MaxComputeWorkgroupStorageSize
maxInvocations = adapter.GetLimits().Limits.MaxComputeInvocationsPerWorkgroup
headDim        = model head dimension (e.g. 64, 128)

Logic: each tile row costs headDim*2*4 bytes (K+V, float32). We use at most half of shared mem so the driver has spill room. Result is clamped to [8, 64] and aligned to 8 to match the WGSL shader workgroup size.
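The arithmetic described above can be sketched as follows. `tileSizeSketch` is a hypothetical name, rounding down to a multiple of 8 is an assumption about what "aligned to 8" means, and the role of maxInvocations is not described in enough detail to reproduce, so it is omitted.

```go
package main

// tileSizeSketch follows the documented logic: each tile row costs
// headDim*2*4 bytes (K and V in float32), at most half of shared memory
// is used, and the result is aligned to 8 and clamped to [8, 64].
func tileSizeSketch(sharedMemBytes uint32, headDim int) int {
	if headDim <= 0 {
		return 8 // guard added for this sketch
	}
	rowBytes := headDim * 2 * 4
	budget := int(sharedMemBytes) / 2
	tile := budget / rowBytes
	tile -= tile % 8 // assumption: align by rounding down
	if tile < 8 {
		tile = 8
	}
	if tile > 64 {
		tile = 64
	}
	return tile
}
```

With 16 KiB of workgroup storage and headDim 64, a row costs 512 bytes and the 8 KiB budget gives a tile size of 16.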

func CalculateOptimalTileSize ¶

func CalculateOptimalTileSize(headDim int) int

CalculateOptimalTileSize returns a tile size that fits the working set in L1/L2. For MHA, the working set is TileSize * headDim * 2 * 4 bytes (K and V tiles in float32).

func CastWeights ¶

func CastWeights[T Numeric](weights any) []T

CastWeights is a universal utility to extract and cast weight slices from the polymorphic WeightStore. It is the "Universal Converter" that allows any layer type (Dense, CNN, MHA) to access weights in their required numeric type on-the-fly.

func ComputeSilhouetteScore ¶

func ComputeSilhouetteScore[T Numeric](data []*Tensor[T], assignments []int) float32

ComputeSilhouetteScore calculates the mean Silhouette Coefficient of all samples.

func ConvertSlice ¶

func ConvertSlice[In Numeric, Out Numeric](in []In) []Out

ConvertSlice is a helper for the CastWeights generic engine. It converts a slice of one Numeric type into a slice of another.

func CosineDistance ¶

func CosineDistance[T Numeric](a []T, b []float32) float32

CosineDistance computes the semantic distance (1 - cosine similarity) between vectors.
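As a reference for the definition, the sketch below computes 1 minus the cosine similarity for two float32 slices. `cosineDistance` is a hypothetical, non-generic stand-in, and returning 1 for a zero-length vector is an assumption made here, not documented behavior.

```go
package main

import "math"

// cosineDistance returns 1 - cos(a, b), comparing up to the shorter length.
func cosineDistance(a, b []float32) float32 {
	var dot, normA, normB float64
	for i := range a {
		if i >= len(b) {
			break
		}
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 1 // assumption: treat zero vectors as maximally dissimilar
	}
	return float32(1 - dot/(math.Sqrt(normA)*math.Sqrt(normB)))
}
```

Identical directions give 0, orthogonal vectors give 1, and opposite directions give 2.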

func CosineSimilarity ¶

func CosineSimilarity(s1, s2 LayerSignature) float32

CosineSimilarity returns a similarity score from -1.0 to 1.0 (the "slider") for comparing two layer signatures.

func DequantizeQ4_0 ¶

func DequantizeQ4_0(blocks []Q4_0Block, n int) []float32

DequantizeQ4_0 converts Q4_0 blocks back to f32.
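The sketch below illustrates 4-bit block dequantization using the block layout documented for ShaderTiledDenseQ4 (32 values per block: one f32 scale plus 16 bytes holding 32 nibbles). The nibble ordering and the offset of 8 follow the common GGML-style convention and are assumptions; the `q4Block` type and `dequantize` function are local stand-ins, not the package's Q4_0Block or its implementation.

```go
package main

// q4Block is a local stand-in: one float32 scale and 16 bytes of packed nibbles.
type q4Block struct {
	Scale  float32
	Quants [16]byte
}

// dequantize expands blocks into n float32 values.
// Assumptions: low nibbles hold elements 0..15 of a block, high nibbles hold
// elements 16..31, and stored values are offset by 8 (range -8..7).
func dequantize(blocks []q4Block, n int) []float32 {
	out := make([]float32, n)
	for bi, blk := range blocks {
		base := bi * 32
		for j := 0; j < 16; j++ {
			lo := int(blk.Quants[j]&0x0F) - 8
			hi := int(blk.Quants[j]>>4) - 8
			if base+j < n {
				out[base+j] = float32(lo) * blk.Scale
			}
			if base+j+16 < n {
				out[base+j+16] = float32(hi) * blk.Scale
			}
		}
	}
	return out
}
```

The n argument lets the final block be partially filled, which matters when the tensor length is not a multiple of 32.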

func EuclideanDistance ¶

func EuclideanDistance[T Numeric](a []T, b []float32) float32

EuclideanDistance computes the distance between a Numeric slice and a float32 centroid.

func EuclideanDistanceT ¶

func EuclideanDistanceT[T Numeric](a, b []T) float32

EuclideanDistanceT computes distance between two Numeric slices.

func GetDeviceDescription ¶

func GetDeviceDescription(net *VolumetricNetwork) string

GetDeviceDescription returns a human-readable string of the running OS, CPU, RAM, and GPU.

func GetLogits ¶

func GetLogits[T Numeric](data []T, temp float64, dtype DType) []float32

GetLogits extracts float32 logits from any tensor type and applies temperature scaling.

func GroupRelatedTensors ¶

func GroupRelatedTensors(detected []DetectedTensor) map[string][]DetectedTensor

GroupRelatedTensors identifies groups of tensors that belong to the same complex layer.

func HierarchicalGroup ¶

func HierarchicalGroup[T Numeric](data []*Tensor[T], threshold float32) []int

HierarchicalGroup performs a simple agglomerative grouping until a distance threshold is met.

func KMeansCluster ¶

func KMeansCluster[T Numeric](data []*Tensor[T], k int, maxIter int, parallel bool) (centroids [][]float32, assignments []int)

KMeansCluster performs K-means clustering on a set of tensors.

func LoadSafetensors ¶

func LoadSafetensors(filepath string) (map[string][]float32, error)

LoadSafetensors reads a safetensors file and returns its tensors keyed by name.

func LoadSafetensorsFromBytes ¶

func LoadSafetensorsFromBytes(data []byte) (map[string][]float32, error)

LoadSafetensorsFromBytes reads safetensors data from a byte slice and returns its tensors keyed by name.

func LoadSafetensorsWithShapes ¶

func LoadSafetensorsWithShapes(data []byte) (map[string]TensorWithShape, error)

LoadSafetensorsWithShapes loads safetensors data and returns both values and shapes.

func LoadUniversalDetailed ¶

func LoadUniversalDetailed(path string) (int, []LayerArchetype, []int, []TensorMeta, error)

LoadUniversalDetailed performs a deep analysis of a safetensors file.

func LoadWithPrefixes ¶

func LoadWithPrefixes(net *VolumetricNetwork, tensors map[string][]float32) error

LoadWithPrefixes loads weights into a VolumetricNetwork by interpreting layer indices and prefixes in the tensor names.

func MajorityVote ¶

func MajorityVote(outputs [][]int) []int

MajorityVote performs hard-voting across multiple model outputs (class indices).
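Hard voting can be sketched as a per-sample count of predicted classes. The code below assumes that outputs[m][s] is the class index predicted by model m for sample s and that ties go to the smallest class index; neither detail is confirmed by the documentation, and `majorityVote` is a hypothetical stand-in.

```go
package main

// majorityVote returns, for each sample, the class predicted by the most models.
// Assumptions: outputs[m][s] is model m's class for sample s; ties are broken
// in favor of the smallest class index.
func majorityVote(outputs [][]int) []int {
	if len(outputs) == 0 {
		return nil
	}
	n := len(outputs[0])
	result := make([]int, n)
	for s := 0; s < n; s++ {
		counts := make(map[int]int)
		best, bestCount := -1, 0
		for _, model := range outputs {
			if s >= len(model) {
				continue // skip models with fewer predictions
			}
			c := model[s]
			counts[c]++
			if counts[c] > bestCount || (counts[c] == bestCount && c < best) {
				best, bestCount = c, counts[c]
			}
		}
		result[s] = best
	}
	return result
}
```

An odd number of models avoids most ties in binary classification, though ties remain possible with more than two classes.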

func MorphLayer ¶

func MorphLayer(layer *VolumetricLayer, target DType) error

MorphLayer performs an on-the-fly conversion of a layer's weights to a new DType.

func MultiNetworkEvaluation ¶

func MultiNetworkEvaluation[T Numeric](models map[string]*VolumetricNetwork, inputs []*Tensor[T], expected []float64) (map[string]*DeviationMetrics, error)

MultiNetworkEvaluation benchmarks multiple models on the same data.

func Normalize ¶

func Normalize(v []float32) []float32

Normalize computes the unit vector of the input weight slice.

func PerformanceSimilarity ¶

func PerformanceSimilarity(mA, mB ModelPerformance) float64

PerformanceSimilarity calculates cosine similarity between two model masks.

func PrintEnsembleReport ¶

func PrintEnsembleReport(matches []EnsembleMatch, topN int)

PrintEnsembleReport generates a human-readable summary of the best matches.

func PrintMultiNetworkSummary ¶

func PrintMultiNetworkSummary(results map[string]*DeviationMetrics)

func SampleTopK ¶

func SampleTopK(logits []float32, topK int, temperature float32, deterministic bool) int

SampleTopK performs top-K sampling with temperature scaling and an optional deterministic mode.
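A minimal sketch of top-K sampling follows. It assumes that the deterministic mode means picking the highest logit, and it takes an explicit random source as an extra parameter, which the package function does not have. `sampleTopK` here is a hypothetical stand-in, not the package's implementation.

```go
package main

import (
	"math"
	"math/rand"
	"sort"
)

// sampleTopK keeps the topK largest logits, applies a temperature-scaled
// softmax to them, and draws one index. The rng parameter is an addition
// for this sketch.
func sampleTopK(logits []float32, topK int, temperature float32, deterministic bool, rng *rand.Rand) int {
	if len(logits) == 0 {
		return -1
	}
	// Sort indices by descending logit.
	idx := make([]int, len(logits))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return logits[idx[a]] > logits[idx[b]] })

	// Assumption: deterministic mode (or a degenerate setting) is plain argmax.
	if deterministic || temperature <= 0 || topK == 1 {
		return idx[0]
	}
	if topK <= 0 || topK > len(idx) {
		topK = len(idx)
	}
	idx = idx[:topK]

	// Softmax over the kept logits; subtracting the maximum keeps exp stable.
	maxLogit := float64(logits[idx[0]])
	probs := make([]float64, topK)
	var total float64
	for i, id := range idx {
		probs[i] = math.Exp((float64(logits[id]) - maxLogit) / float64(temperature))
		total += probs[i]
	}

	// Draw from the unnormalized distribution.
	r := rng.Float64() * total
	for i, p := range probs {
		r -= p
		if r < 0 {
			return idx[i]
		}
	}
	return idx[topK-1]
}
```

Lower temperatures concentrate probability on the largest logit, while a larger topK widens the candidate set.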

func SerializeNetwork ¶

func SerializeNetwork(net *VolumetricNetwork) ([]byte, error)

SerializeNetwork converts a VolumetricNetwork into a JSON byte slice.

func ShaderDenseBackwardDW ¶ added in v0.73.0

func ShaderDenseBackwardDW(tileSize int) string

ShaderDenseBackwardDW calculates gradWeights = gradOutput^T * input, that is, dw = dy^T * x, so dw[o, i] = sum_b dy[b, o] * x[b, i].

func ShaderDenseBackwardDX ¶ added in v0.73.0

func ShaderDenseBackwardDX(tileSize int) string

ShaderDenseBackwardDX calculates gradInput = gradOutput * weights, that is, dx = dy * W with W stored as [out, in], so dx[b, i] = sum_o dy[b, o] * W[o, i].
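The two index formulas for ShaderDenseBackwardDW and ShaderDenseBackwardDX can be checked on the CPU with a plain triple loop. The sketch below assumes row-major storage with dy as [batch, out], x as [batch, in], and W as [out, in]; `denseBackward` is a hypothetical reference helper and ignores tiling and activation derivatives.

```go
package main

// denseBackward computes both gradients of a dense layer on the CPU:
//   dw[o, i] = sum_b dy[b, o] * x[b, i]
//   dx[b, i] = sum_o dy[b, o] * w[o, i]
// All slices are row-major: dy is [batch, out], x is [batch, in], w is [out, in].
func denseBackward(dy, x, w []float32, batch, in, out int) (dw, dx []float32) {
	dw = make([]float32, out*in)
	dx = make([]float32, batch*in)
	for b := 0; b < batch; b++ {
		for o := 0; o < out; o++ {
			g := dy[b*out+o]
			for i := 0; i < in; i++ {
				dw[o*in+i] += g * x[b*in+i]
				dx[b*in+i] += g * w[o*in+i]
			}
		}
	}
	return dw, dx
}
```

For batch 1 with dy = [2], x = [3, 4], and W = [5, 6], this yields dw = [6, 8] and dx = [10, 12].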

func ShaderTiledDenseN ¶

func ShaderTiledDenseN(tileSize int) string

ShaderTiledDenseN generates a tiled dense (matmul) shader for the given tile size. The tile size is baked into the WGSL workgroup array and @workgroup_size.

func ShaderTiledDenseQ4 ¶

func ShaderTiledDenseQ4(tileSize int) string

ShaderTiledDenseQ4 generates a tiled dense shader that dequantizes 4-bit weights on the fly. Block size is 32: 1 f32 scale + 16 bytes (32 nibbles).

func ShaderTiledMHAN ¶

func ShaderTiledMHAN(tileSize, headDim int) string

ShaderTiledMHAN generates a tiled MHA shader for the given tile size and headDim. Both are baked in as WGSL compile-time constants.

func ShaderTiledSwiGLUN ¶

func ShaderTiledSwiGLUN(tileSize int) string

ShaderTiledSwiGLUN generates a tiled SwiGLU shader for the given tile size.

func ShaderTiledSwiGLUQ4 ¶

func ShaderTiledSwiGLUQ4(tileSize int) string

ShaderTiledSwiGLUQ4 generates a tiled SwiGLU shader with Q4_0 weights.

func SimulatePrecision ¶

func SimulatePrecision(wVal float32, dtype DType, scale float32) float32

SimulatePrecision handles the numerical simulation of low-bit and non-standard types. It is the universal "Metamorphosis" engine used across Dense, CNN, and RNN layers.

func Softmax ¶

func Softmax(logits []float32) []float32

Softmax computes the softmax of a slice of logits.

func SoftmaxBackward ¶

func SoftmaxBackward(gradOutput, softmaxOutput []float32) []float32

SoftmaxBackward applies the softmax Jacobian to gradOutput, given the forward-pass softmaxOutput.
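For reference, the standard softmax and its Jacobian-vector product can be written as below. This is the textbook formulation, grad[i] = s[i] * (g[i] - sum_j g[j] * s[j]), and is only assumed to match what the package helpers compute; `softmax` and `softmaxBackward` are local stand-ins.

```go
package main

import "math"

// softmax returns exp(x_i - max) / sum_j exp(x_j - max); subtracting the
// maximum avoids overflow without changing the result.
func softmax(logits []float32) []float32 {
	out := make([]float32, len(logits))
	if len(logits) == 0 {
		return out
	}
	maxV := logits[0]
	for _, v := range logits[1:] {
		if v > maxV {
			maxV = v
		}
	}
	exps := make([]float64, len(logits))
	var sum float64
	for i, v := range logits {
		exps[i] = math.Exp(float64(v - maxV))
		sum += exps[i]
	}
	for i := range out {
		out[i] = float32(exps[i] / sum)
	}
	return out
}

// softmaxBackward multiplies the softmax Jacobian by gradOutput without
// materializing the full matrix.
func softmaxBackward(gradOutput, s []float32) []float32 {
	var dot float32
	for i := range s {
		dot += gradOutput[i] * s[i]
	}
	grad := make([]float32, len(s))
	for i := range s {
		grad[i] = s[i] * (gradOutput[i] - dot)
	}
	return grad
}
```

The gradient entries always sum to zero, which makes a quick sanity check for any implementation.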

func SoftmaxEntmaxHelper ¶

func SoftmaxEntmaxHelper(logits []float32, alpha float32) []float32

SoftmaxEntmaxHelper implements an approximation of entmax-1.5.

func SoftmaxSparseHelper ¶

func SoftmaxSparseHelper(logits []float32) []float32

SoftmaxSparseHelper implements sparsemax.
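Sparsemax projects the logits onto the probability simplex, which can set some outputs to exactly zero. The sketch below is the standard sort-based algorithm and is only assumed to correspond to what the helper does; `sparsemax` is a local stand-in name.

```go
package main

import "sort"

// sparsemax returns the Euclidean projection of logits onto the simplex.
// Steps: sort descending, find the largest k with 1 + k*z_(k) > sum of the
// top k values, set tau = (that sum - 1) / k, then output max(z_i - tau, 0).
func sparsemax(logits []float32) []float32 {
	n := len(logits)
	out := make([]float32, n)
	if n == 0 {
		return out
	}
	sorted := make([]float64, n)
	for i, v := range logits {
		sorted[i] = float64(v)
	}
	sort.Sort(sort.Reverse(sort.Float64Slice(sorted)))

	var cumsum, tau float64
	for k := 1; k <= n; k++ {
		cumsum += sorted[k-1]
		// The condition always holds for k = 1, so tau is always assigned.
		if 1+float64(k)*sorted[k-1] > cumsum {
			tau = (cumsum - 1) / float64(k)
		}
	}
	for i, v := range logits {
		if p := float64(v) - tau; p > 0 {
			out[i] = float32(p)
		}
	}
	return out
}
```

Unlike softmax, a sufficiently dominant logit takes all of the probability mass: [2, 0, 0] maps to [1, 0, 0].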

func SystolicApplyTargetProp ¶

func SystolicApplyTargetProp[T Numeric](n *VolumetricNetwork, s *SystolicState[T], globalTarget *Tensor[T], lr float32)

SystolicApplyTargetProp bridges the Systolic state with the Target Propagation machinery. It uses the core 'Gap-Bridging' logic to update weights across the volumetric mesh.

func SystolicForward ¶

func SystolicForward[T Numeric](n *VolumetricNetwork, s *SystolicState[T], captureHistory bool) time.Duration

SystolicForward executes one "Clock Cycle" across the entire 3D grid. Every layer processes its current input buffer and writes to the next buffer.

func TargetPropBackward ¶

func TargetPropBackward[T Numeric](n *VolumetricNetwork, s *TargetPropState[T], target *Tensor[T])

TargetPropBackward generates targets or gradients from the output back to the input.

func TargetPropBackwardChainRule ¶

func TargetPropBackwardChainRule[T Numeric](n *VolumetricNetwork, s *TargetPropState[T], target *Tensor[T])

TargetPropBackwardChainRule uses standard gradients to shift targets.

func TargetPropBackwardTargetProp ¶

func TargetPropBackwardTargetProp[T Numeric](n *VolumetricNetwork, s *TargetPropState[T], target *Tensor[T])

TargetPropBackwardTargetProp uses true Target Propagation (without derivatives).

Types ¶

type ActivationType ¶

type ActivationType int

ActivationType defines the activation function applied by a layer.

const (
	ActivationReLU    ActivationType = 0
	ActivationSilu    ActivationType = 1
	ActivationGELU    ActivationType = 2
	ActivationTanh    ActivationType = 3
	ActivationSigmoid ActivationType = 4
	ActivationLinear  ActivationType = -1
)

func ParseActivationType ¶

func ParseActivationType(s string) ActivationType

ParseActivationType converts a string to an ActivationType.

func (ActivationType) String ¶

func (a ActivationType) String() string

type AdaptationResult ¶

type AdaptationResult struct {
	ModelName    string        `json:"model_name"`
	ModeName     string        `json:"mode_name"`
	TotalOutputs int           `json:"total_outputs"`
	AvgAccuracy  float64       `json:"avg_accuracy"`
	Windows      []TimeWindow  `json:"windows"`
	TaskChanges  []TaskChange  `json:"task_changes"`
	Duration     time.Duration `json:"duration"`
}

type AdaptationTracker ¶

type AdaptationTracker struct {
	// contains filtered or unexported fields
}

func NewAdaptationTracker ¶

func NewAdaptationTracker(winDur, totalDur time.Duration) *AdaptationTracker

func (*AdaptationTracker) Finalize ¶

func (at *AdaptationTracker) Finalize() *AdaptationResult

func (*AdaptationTracker) RecordOutput ¶

func (at *AdaptationTracker) RecordOutput(correct bool)

func (*AdaptationTracker) Start ¶

func (at *AdaptationTracker) Start(initialTask string, initialTaskID int)

type AggregatingObserver ¶

type AggregatingObserver struct {
	WindowSize int
	History    []LayerStats
	Events     []PolyLayerEvent
	// contains filtered or unexported fields
}

AggregatingObserver collects statistics over time windows.

func NewAggregatingObserver ¶

func NewAggregatingObserver(windowSize int) *AggregatingObserver

func (*AggregatingObserver) OnBackward ¶

func (o *AggregatingObserver) OnBackward(e PolyLayerEvent)

func (*AggregatingObserver) OnForward ¶

func (o *AggregatingObserver) OnForward(e PolyLayerEvent)

type ArchConfig ¶

type ArchConfig struct {
	ID            int            `json:"id"`
	Name          string         `json:"name"`
	GridDepth     int            `json:"gridDepth"`
	GridRows      int            `json:"gridRows"`
	GridCols      int            `json:"gridCols"`
	LayersPerCell int            `json:"layersPerCell"`
	DModel        int            `json:"dModel"`
	NumHeads      int            `json:"numHeads"`
	Activation    ActivationType `json:"activation"`
	DType         DType          `json:"dtype"`
	InitScale     float32        `json:"initScale"`
}

type BindGroupKey ¶

type BindGroupKey struct {
	Pipeline *wgpu.ComputePipeline
	Buffers  []*wgpu.Buffer
}

BindGroupKey is the key type used by the BindGroupCache.

type BrainType ¶

type BrainType int

const (
	BrainDense BrainType = iota
	BrainMHA
	BrainSwiGLU
	BrainRMSNorm
	BrainRNN
	BrainLSTM
	BrainLayerNorm
	BrainEmbedding
	BrainKMeans
	BrainSoftmax
	BrainParallel
	BrainSequential
)

func (BrainType) String ¶

func (bt BrainType) String() string

type ComparisonResult ¶

type ComparisonResult struct {
	Name      string                     `json:"name"`
	NumLayers int                        `json:"num_layers"`
	Methods   map[string]TrainingMetrics `json:"methods"`
}

ComparisonResult holds results from comparing multiple training methods.

func NewComparisonResult ¶

func NewComparisonResult(name string, numLayers int) *ComparisonResult

NewComparisonResult initializes a ComparisonResult.

func (*ComparisonResult) DetermineBest ¶

func (cr *ComparisonResult) DetermineBest() string

DetermineBest returns the name of the best performing training method.

type ConsoleObserver ¶

type ConsoleObserver struct{}

ConsoleObserver prints events to stdout.

func (*ConsoleObserver) OnBackward ¶

func (o *ConsoleObserver) OnBackward(e PolyLayerEvent)

func (*ConsoleObserver) OnForward ¶

func (o *ConsoleObserver) OnForward(e PolyLayerEvent)

type DType ¶

type DType int

DType defines the numerical type stored in a Tensor or WeightStore.

const (
	DTypeFloat64  DType = 0  // 64-bit double
	DTypeFloat32  DType = 1  // Standard 32-bit float
	DTypeFloat16  DType = 2  // 16-bit float
	DTypeBFloat16 DType = 3  // 16-bit Brain Float
	DTypeFP8E4M3  DType = 4  // 8-bit FP8 (E4M3)
	DTypeFP8E5M2  DType = 5  // 8-bit FP8 (E5M2)
	DTypeInt64    DType = 6  // 64-bit integer
	DTypeInt32    DType = 7  // 32-bit integer
	DTypeInt16    DType = 8  // 16-bit integer
	DTypeInt8     DType = 9  // 8-bit integer
	DTypeUint64   DType = 10 // 64-bit unsigned
	DTypeUint32   DType = 11 // 32-bit unsigned
	DTypeUint16   DType = 12 // 16-bit unsigned
	DTypeUint8    DType = 13 // 8-bit unsigned
	DTypeInt4     DType = 14 // 4-bit integer
	DTypeUint4    DType = 15 // 4-bit unsigned
	DTypeFP4      DType = 16 // 4-bit E2M1
	DTypeInt2     DType = 17 // 2-bit integer
	DTypeUint2    DType = 18 // 2-bit unsigned
	DTypeTernary  DType = 19 // 2-bit (Ternary: -1, 0, 1)
	DTypeBinary   DType = 20 // 1-bit (XNOR-Net)
)

func ParseDType ¶

func ParseDType(s string) DType

ParseDType converts a string to a DType.

func (DType) String ¶

func (d DType) String() string

type DetectedTensor ¶

type DetectedTensor struct {
	Name    string
	Shape   []int
	DType   string
	InSize  int
	OutSize int
	CanLoad bool
}

DetectedTensor represents a tensor found in a model file.

type DeviationBucket ¶

type DeviationBucket struct {
	RangeMin float64 `json:"range_min"`
	RangeMax float64 `json:"range_max"`
	Count    int     `json:"count"`
	Samples  []int   `json:"samples"`
}

DeviationBucket represents a specific deviation percentage range.

type DeviationMetrics ¶

type DeviationMetrics struct {
	Buckets          map[string]*DeviationBucket `json:"buckets"`
	Score            float64                     `json:"score"` // 0-100 quality score
	TotalSamples     int                         `json:"total_samples"`
	Failures         int                         `json:"failures"` // 100%+ deviations
	Results          []PredictionResult          `json:"results"`
	AverageDeviation float64                     `json:"avg_deviation"`
	CorrectCount     int                         `json:"correct_count"`
	Accuracy         float64                     `json:"accuracy"`
}

DeviationMetrics stores the model performance breakdown.

func EvaluateNetworkPolymorphic ¶

func EvaluateNetworkPolymorphic[T Numeric](n *VolumetricNetwork, inputs []*Tensor[T], expected []float64) (*DeviationMetrics, error)

EvaluateNetworkPolymorphic evaluates a VolumetricNetwork across multiple inputs.

func NewDeviationMetrics ¶

func NewDeviationMetrics() *DeviationMetrics

NewDeviationMetrics initializes empty metrics.

func (*DeviationMetrics) ComputeFinalMetrics ¶

func (dm *DeviationMetrics) ComputeFinalMetrics()

ComputeFinalMetrics completes the scoring.

func (*DeviationMetrics) PrintSummary ¶

func (dm *DeviationMetrics) PrintSummary()

func (*DeviationMetrics) UpdateMetrics ¶

func (dm *DeviationMetrics) UpdateMetrics(result PredictionResult)

UpdateMetrics adds one prediction to the metrics.

type EnsembleMatch ¶

type EnsembleMatch struct {
	ModelA   string
	ModelB   string
	Coverage float64 // Combined coverage (0.0 - 1.0)
	Overlap  float64 // Percentage of samples both got right
}

EnsembleMatch represents a pair of models that complement each other.

func FindComplementaryMatches ¶

func FindComplementaryMatches(models []ModelPerformance, minCoverage float64) []EnsembleMatch

FindComplementaryMatches identifies pairs of models whose combined coverage is maximized.
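The two EnsembleMatch metrics can be derived from a pair of correctness masks. The sketch below assumes Coverage is the fraction of samples at least one model got right and Overlap is the fraction both got right, each in the range 0.0 to 1.0; `coverageAndOverlap` is a hypothetical helper, and the package may scale Overlap differently.

```go
package main

// coverageAndOverlap compares two correctness masks sample by sample.
// coverage: fraction of samples where at least one model was correct.
// overlap:  fraction of samples where both models were correct.
func coverageAndOverlap(a, b []bool) (coverage, overlap float64) {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	if n == 0 {
		return 0, 0
	}
	var either, both int
	for i := 0; i < n; i++ {
		if a[i] || b[i] {
			either++
		}
		if a[i] && b[i] {
			both++
		}
	}
	return float64(either) / float64(n), float64(both) / float64(n)
}
```

A pair with high coverage and low overlap is the interesting case, since the two models then make their mistakes on different samples.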

type GenOptions ¶

type GenOptions struct {
	MaxTokens         int
	Temperature       float32
	TopK              int
	Deterministic     bool
	UseKVCache        bool
	RepetitionPenalty float32
	RepetitionWindow  int
	EOSTokens         []int
}

GenOptions defines the text generation parameters.

type HTTPObserver ¶

type HTTPObserver struct {
	URL string
	// contains filtered or unexported fields
}

HTTPObserver sends events to an HTTP endpoint.

func NewHTTPObserver ¶

func NewHTTPObserver(url string) *HTTPObserver

func (*HTTPObserver) OnBackward ¶

func (o *HTTPObserver) OnBackward(e PolyLayerEvent)

func (*HTTPObserver) OnForward ¶

func (o *HTTPObserver) OnForward(e PolyLayerEvent)

type HardwareInfo ¶

type HardwareInfo struct {
	L1DataCacheSize int // in bytes
	L2CacheSize     int // in bytes
	L3CacheSize     int // in bytes
	NumCPU          int
}

HardwareInfo stores metadata about the running system to optimize tiling.

func GetHardwareInfo ¶

func GetHardwareInfo() HardwareInfo

GetHardwareInfo attempts to detect cache sizes and CPU info.

type LayerArchetype ¶

type LayerArchetype struct {
	Type        LayerType
	TypeName    string
	Indices     map[string]int
	GeomMetrics map[string]int
}

LayerArchetype represents a detected structural unit in the model.

func ProbeDeepGeometry ¶

func ProbeDeepGeometry(geoms []TensorMeta) ([]LayerArchetype, []int)

ProbeDeepGeometry identifies layer patterns within a set of tensors.

type LayerSignature ¶

type LayerSignature struct {
	Z, Y, X, L int
	Type       LayerType
	DType      DType
	Weights    []float32 // Normalized, precision-simulated weights
}

LayerSignature represents the unique 3D topological "DNA" of a layer.

type LayerSpec ¶

type LayerSpec struct {
	// Position
	Z int `json:"z"`
	Y int `json:"y"`
	X int `json:"x"`
	L int `json:"l"`

	// Core Type
	Type       string `json:"type"`
	Activation string `json:"activation"`
	DType      string `json:"dtype"`

	// Dimensions & Config
	InputHeight   int `json:"input_height"`
	InputWidth    int `json:"input_width"`
	InputDepth    int `json:"input_depth"`
	OutputHeight  int `json:"output_height"`
	OutputWidth   int `json:"output_width"`
	OutputDepth   int `json:"output_depth"`
	InputChannels int `json:"input_channels"`
	Filters       int `json:"filters"`
	KernelSize    int `json:"kernel_size"`
	Stride        int `json:"stride"`
	Padding       int `json:"padding"`

	NumHeads   int `json:"num_heads"`
	NumKVHeads int `json:"num_kv_heads"`
	DModel     int `json:"d_model"`
	SeqLength  int `json:"seq_length"`

	VocabSize    int `json:"vocab_size"`
	EmbeddingDim int `json:"embedding_dim"`

	NumClusters int    `json:"num_clusters"`
	OutputMode  string `json:"output_mode"`

	// Recursive structures
	ParallelBranches []LayerSpec `json:"parallel_branches,omitempty"`
	CombineMode      string      `json:"combine_mode,omitempty"`
	SequentialLayers []LayerSpec `json:"sequential_layers,omitempty"`

	UseTiling bool `json:"use_tiling,omitempty"`
	TileSize  int  `json:"tile_size,omitempty"`
}

LayerSpec represents the JSON structure for a single layer.

type LayerStats ¶

type LayerStats struct {
	Avg    float32 `json:"avg"`
	Max    float32 `json:"max"`
	Min    float32 `json:"min"`
	Active int     `json:"active"`
	Total  int     `json:"total"`
}

LayerStats provides summary statistics for a tensor's activations or gradients.

func ComputeLayerStats ¶

func ComputeLayerStats[T Numeric](t *Tensor[T]) LayerStats

ComputeLayerStats calculates summary statistics for a tensor.

type LayerTelemetry ¶

type LayerTelemetry struct {
	// Grid position
	Z int `json:"z"`
	Y int `json:"y"`
	X int `json:"x"`
	L int `json:"l"`

	// Layer info
	Type       string `json:"type"`
	Activation string `json:"activation,omitempty"`
	Parameters int    `json:"parameters"`

	// Dimensions
	InputShape  []int `json:"input_shape,omitempty"`
	OutputShape []int `json:"output_shape,omitempty"`

	// For nested/parallel layers
	Branches    []LayerTelemetry `json:"branches,omitempty"`
	CombineMode string           `json:"combine_mode,omitempty"`
}

LayerTelemetry contains metadata about a specific layer.

func ExtractLayerTelemetry ¶

func ExtractLayerTelemetry(l VolumetricLayer) LayerTelemetry

ExtractLayerTelemetry converts a VolumetricLayer to its telemetry representation.

type LayerType ¶

type LayerType int

LayerType defines the type of neural network layer.

const (
	LayerDense              LayerType = 0
	LayerMultiHeadAttention LayerType = 1
	LayerSwiGLU             LayerType = 2
	LayerRMSNorm            LayerType = 3
	LayerCNN1               LayerType = 4
	LayerCNN2               LayerType = 5
	LayerCNN3               LayerType = 6
	LayerRNN                LayerType = 7
	LayerLSTM               LayerType = 8
	LayerLayerNorm          LayerType = 9
	LayerConvTransposed1D   LayerType = 10
	LayerConvTransposed2D   LayerType = 11
	LayerConvTransposed3D   LayerType = 12
	LayerEmbedding          LayerType = 13
	LayerKMeans             LayerType = 14
	LayerSoftmax            LayerType = 15
	LayerParallel           LayerType = 16
	LayerSequential         LayerType = 17
	LayerResidual           LayerType = 18
)

func ParseLayerType ¶

func ParseLayerType(s string) LayerType

ParseLayerType converts a string to a LayerType.

func (LayerType) String ¶

func (t LayerType) String() string

type LogicShift ¶

type LogicShift struct {
	SourcePos string // "z,y,x,l"
	TargetPos string
	Overlap   float32
}

LogicShift identifies if a specific architectural pattern has moved in space.

type MergePair ¶

type MergePair struct {
	First  string
	Second string
	Rank   int
}

MergePair represents a BPE merge rule.

type MethodInfo ¶

type MethodInfo struct {
	MethodName string          `json:"method_name"`
	Parameters []ParameterInfo `json:"parameters"`
	Returns    []string        `json:"returns"`
}

MethodInfo represents metadata about a method.

type ModelPerformance ¶

type ModelPerformance struct {
	ModelID string
	// Mask[i] is true if the model correctly handled sample i.
	Mask []bool
}

ModelPerformance holds the correctness mask for a specific model.

type ModelTelemetry ¶

type ModelTelemetry struct {
	ID          string           `json:"id"`
	TotalLayers int              `json:"total_layers"`
	TotalParams int              `json:"total_parameters"`
	Layers      []LayerTelemetry `json:"layers"`
}

ModelTelemetry represents a single network's structure.

func ExtractNetworkBlueprint ¶

func ExtractNetworkBlueprint(n *VolumetricNetwork, modelID string) ModelTelemetry

ExtractNetworkBlueprint extracts structural telemetry from a VolumetricNetwork.

type NEATConfig ¶ added in v0.74.0

type NEATConfig struct {
	// Probabilities (0.0–1.0)
	WeightPerturbRate  float64 // Perturb each layer's weights with noise
	WeightPerturbScale float32 // Noise magnitude (default 0.05)
	NodeMutateRate     float64 // Swap a layer's type (and reinitialize its weights)
	ConnectionAddRate  float64 // Add a remote link (spatial hop) between two layers
	ConnectionDropRate float64 // Remove an existing remote link
	ActivationMutRate  float64 // Swap a layer's activation function
	LayerToggleRate    float64 // Enable/disable a dormant layer cell

	// AllowedLayerTypes for node mutation (nil = use defaults)
	AllowedLayerTypes []LayerType
	// DModel used when reinitializing a mutated layer's weights
	DModel int

	// Defaults for layer types that need extra config when reinitializing
	DefaultNumHeads    int // MHA: number of attention heads (default 4)
	DefaultInChannels  int // CNN/ConvTransposed: input channels (default 1)
	DefaultFilters     int // CNN/ConvTransposed: output filters (default 8)
	DefaultKernelSize  int // CNN/ConvTransposed: kernel size (default 3)
	DefaultVocabSize   int // Embedding: vocabulary size (default 256)
	DefaultNumClusters int // KMeans: number of clusters (default 8)

	Seed int64
}

NEATConfig controls which mutations are enabled and their probabilities.

func DefaultNEATConfig ¶ added in v0.74.0

func DefaultNEATConfig(dModel int) NEATConfig

DefaultNEATConfig returns conservative mutation rates supporting all 19 layer types.

type NEATPopulation ¶ added in v0.74.0

type NEATPopulation struct {
	Networks  []*VolumetricNetwork
	Fitnesses []float64
	Config    NEATConfig
	// contains filtered or unexported fields
}

NEATPopulation manages a pool of networks evolving over generations.

func NewNEATPopulation ¶ added in v0.74.0

func NewNEATPopulation(seed *VolumetricNetwork, size int, cfg NEATConfig) *NEATPopulation

NewNEATPopulation creates an initial population by mutating a seed network. Each member starts as a NEATMutate of the seed, so the population is already diverse in the first generation.

func (*NEATPopulation) Best ¶ added in v0.74.0

func (p *NEATPopulation) Best() *VolumetricNetwork

Best returns the highest-fitness network from the last Evolve call.

func (*NEATPopulation) BestFitness ¶ added in v0.74.0

func (p *NEATPopulation) BestFitness() float64

BestFitness returns the fitness score of the top network.

func (*NEATPopulation) Evolve ¶ added in v0.74.0

func (p *NEATPopulation) Evolve(fitnessFn func(*VolumetricNetwork) float64)

Evolve runs one generation:

1. Evaluate all networks with fitnessFn (higher = better).
2. Sort by fitness, descending.
3. The top 25% survive as elites.
4. Remaining slots are filled with SpliceDNA(elite pair) + NEATMutate offspring.

fitnessFn should return a positive float64 (e.g., accuracy, reward, 1/loss).
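
The generation loop described above can be sketched without the package itself. The following is a minimal, self-contained illustration, not the library's implementation: `genome`, `crossover`, `mutate`, `evolve`, and `newRNG` are hypothetical stand-ins for `*VolumetricNetwork`, SpliceDNA, NEATMutate, and Evolve, and the mutation scale of 0.05 simply echoes the WeightPerturbScale default.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// genome is a hypothetical stand-in for *VolumetricNetwork.
type genome []float64

// newRNG returns a deterministic random source for reproducible runs.
func newRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// crossover averages two parents; a placeholder for SpliceDNA.
func crossover(a, b genome) genome {
	child := make(genome, len(a))
	for i := range a {
		child[i] = (a[i] + b[i]) / 2
	}
	return child
}

// mutate adds uniform noise in [-scale, scale]; a placeholder for NEATMutate.
func mutate(g genome, rng *rand.Rand, scale float64) genome {
	out := make(genome, len(g))
	for i, v := range g {
		out[i] = v + (rng.Float64()*2-1)*scale
	}
	return out
}

// evolve runs one generation. It returns the next population and the
// fitness scores of the evaluated population, sorted descending.
func evolve(pop []genome, fitness func(genome) float64, rng *rand.Rand) ([]genome, []float64) {
	type scored struct {
		g genome
		f float64
	}
	ranked := make([]scored, len(pop))
	for i, g := range pop {
		ranked[i] = scored{g, fitness(g)}
	}
	sort.Slice(ranked, func(i, j int) bool { return ranked[i].f > ranked[j].f })

	elites := len(pop) / 4 // top 25% survive unchanged
	if elites < 1 {
		elites = 1
	}
	next := make([]genome, 0, len(pop))
	scores := make([]float64, len(pop))
	for i, s := range ranked {
		scores[i] = s.f
		if i < elites {
			next = append(next, s.g)
		}
	}
	// Fill the remaining slots with mutated children of random elite pairs.
	for len(next) < len(pop) {
		a := ranked[rng.Intn(elites)].g
		b := ranked[rng.Intn(elites)].g
		next = append(next, mutate(crossover(a, b), rng, 0.05))
	}
	return next, scores
}

func main() {
	rng := newRNG(1)
	pop := make([]genome, 8)
	for i := range pop {
		pop[i] = genome{rng.Float64(), rng.Float64()}
	}
	// Toy fitness: positive, and larger when the first gene is larger.
	fitness := func(g genome) float64 { return 1 + g[0] }
	next, scores := evolve(pop, fitness, rng)
	fmt.Println(len(next), scores[0] >= scores[len(scores)-1])
}
```

Keeping the elites untouched means the best fitness never decreases between generations as long as fitnessFn is deterministic.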

func (*NEATPopulation) Summary ¶ added in v0.74.0

func (p *NEATPopulation) Summary(generation int) string

Summary returns a one-line diagnostic string for the population.

type NetworkBlueprint ¶

type NetworkBlueprint struct {
	Models []ModelTelemetry `json:"models"`
}

NetworkBlueprint contains the structural information of a network extracted after loading or building.

type NetworkComparisonResult ¶

type NetworkComparisonResult struct {
	OverallOverlap float32
	LayerOverlaps  map[string]float32 // "z,y,x,l" -> score
	LogicShifts    []LogicShift
}

NetworkComparisonResult holds the hierarchical similarity metrics.

func CompareNetworks ¶

func CompareNetworks(dna1, dna2 NetworkDNA) NetworkComparisonResult

CompareNetworks performs a hierarchical spatial correlation between two NetworkDNA blueprints.

type NetworkDNA ¶

type NetworkDNA []LayerSignature

NetworkDNA is the complete genetic blueprint of a VolumetricNetwork.

func ExtractDNA ¶

func ExtractDNA(n *VolumetricNetwork) NetworkDNA

ExtractDNA generates the topological signatures for all layers in a network. It uses SimulatePrecision to ensure that comparison reflects the actual numerical behavior.

All 19 layer types are handled:

- Weighted layers (Dense, RNN, LSTM, MHA, CNN*, ConvTransposed*, SwiGLU, RMSNorm, LayerNorm, Embedding, KMeans): signature derived from WeightStore.Master.
- Structural containers (Parallel, Sequential): weights are collected by recursing into ParallelBranches / SequentialLayers, then concatenated and normalized into a single flat signature vector.
- Weightless layers (Softmax, Residual): neutral signature []float32{1.0}.
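
As a rough illustration of the "concatenated and normalized" step for structural containers, the sketch below flattens several weight slices and scales the result to unit L2 norm. `flattenAndNormalize` is a hypothetical helper; the choice of L2 normalization and the fallback to the neutral signature for all-zero input are assumptions, not confirmed details of ExtractDNA.

```go
package main

import (
	"fmt"
	"math"
)

// flattenAndNormalize concatenates per-branch weights and scales them to
// unit L2 norm. Assumption: empty or all-zero input falls back to the
// neutral signature []float32{1.0}.
func flattenAndNormalize(branches ...[]float32) []float32 {
	var flat []float32
	var sumSq float64
	for _, w := range branches {
		for _, v := range w {
			flat = append(flat, v)
			sumSq += float64(v) * float64(v)
		}
	}
	if sumSq == 0 {
		return []float32{1.0}
	}
	norm := float32(math.Sqrt(sumSq))
	for i := range flat {
		flat[i] /= norm
	}
	return flat
}

func main() {
	sig := flattenAndNormalize([]float32{3}, []float32{4})
	fmt.Println(sig)
}
```

A unit-norm signature makes a later cosine-similarity comparison reduce to a plain dot product.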

type NetworkSpec ¶

type NetworkSpec struct {
	ID            string      `json:"id"`
	Depth         int         `json:"depth"`
	Rows          int         `json:"rows"`
	Cols          int         `json:"cols"`
	LayersPerCell int         `json:"layers_per_cell"`
	Layers        []LayerSpec `json:"layers"`
}

NetworkSpec represents the top-level JSON structure for a network.

type Numeric ¶

type Numeric interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~float32 | ~float64
}

Numeric is a type constraint for all numeric types that Tensors can hold.
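
To show what the constraint permits, here is a small self-contained sketch that redeclares the same type set locally and writes two generic helpers against it. `convertSlice` and `sum` are hypothetical names for illustration only; they are not part of the package, and ConvertTensor may behave differently (for example around rounding or saturation).

```go
package main

import "fmt"

// Numeric mirrors the constraint shown above.
type Numeric interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~float32 | ~float64
}

// convertSlice casts every element from In to Out using Go's ordinary
// numeric conversion rules (float to integer truncates toward zero).
func convertSlice[In, Out Numeric](in []In) []Out {
	out := make([]Out, len(in))
	for i, v := range in {
		out[i] = Out(v)
	}
	return out
}

// sum accumulates in float64 so narrow element types do not overflow.
func sum[T Numeric](xs []T) float64 {
	var total float64
	for _, x := range xs {
		total += float64(x)
	}
	return total
}

func main() {
	f := []float32{1.9, -2.5, 3.0}
	i := convertSlice[float32, int8](f)
	fmt.Println(i, sum(i))
}
```

Because the constraint uses `~` terms, named types such as `type Weight float32` also satisfy it.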

type PairWithIndex ¶

type PairWithIndex struct {
	First  string
	Second string
	Index  int
}

type ParameterInfo ¶

type ParameterInfo struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

ParameterInfo represents metadata about a parameter.

type PersistenceLayerSpec ¶

type PersistenceLayerSpec struct {
	Z int `json:"z"`
	Y int `json:"y"`
	X int `json:"x"`
	L int `json:"l"`

	Type       string `json:"type"`
	Activation string `json:"activation"`
	DType      string `json:"dtype"`

	InputHeight   int `json:"input_height,omitempty"`
	InputWidth    int `json:"input_width,omitempty"`
	InputDepth    int `json:"input_depth,omitempty"`
	OutputHeight  int `json:"output_height,omitempty"`
	OutputWidth   int `json:"output_width,omitempty"`
	OutputDepth   int `json:"output_depth,omitempty"`
	InputChannels int `json:"input_channels,omitempty"`
	Filters       int `json:"filters,omitempty"`
	KernelSize    int `json:"kernel_size,omitempty"`
	Stride        int `json:"stride,omitempty"`
	Padding       int `json:"padding,omitempty"`
	OutputPadding int `json:"output_padding,omitempty"`

	NumHeads     int     `json:"num_heads,omitempty"`
	NumKVHeads   int     `json:"num_kv_heads,omitempty"`
	HeadDim      int     `json:"head_dim,omitempty"`
	DModel       int     `json:"d_model,omitempty"`
	SeqLength    int     `json:"seq_length,omitempty"`
	RoPEFreqBase float64 `json:"rope_freq_base,omitempty"`

	VocabSize    int `json:"vocab_size,omitempty"`
	EmbeddingDim int `json:"embedding_dim,omitempty"`

	NumClusters       int     `json:"num_clusters,omitempty"`
	KMeansTemperature float64 `json:"kmeans_temperature,omitempty"`
	OutputMode        string  `json:"output_mode,omitempty"`

	SoftmaxType string  `json:"softmax_type,omitempty"`
	Temperature float64 `json:"temperature,omitempty"`
	SoftmaxRows int     `json:"softmax_rows,omitempty"`
	SoftmaxCols int     `json:"softmax_cols,omitempty"`
	EntmaxAlpha float64 `json:"entmax_alpha,omitempty"`
	GumbelNoise bool    `json:"gumbel_noise,omitempty"`

	// Weights
	Weights string  `json:"weights,omitempty"` // Base64 encoded weights
	Native  bool    `json:"native,omitempty"`  // True if weights are in target DType, False if Master FP32
	Scale   float32 `json:"scale,omitempty"`

	// Recursion
	ParallelBranches []PersistenceLayerSpec `json:"parallel_branches,omitempty"`
	CombineMode      string                 `json:"combine_mode,omitempty"`
	SequentialLayers []PersistenceLayerSpec `json:"sequential_layers,omitempty"`

	UseTiling bool `json:"use_tiling,omitempty"`
	TileSize  int  `json:"tile_size,omitempty"`
}

PersistenceLayerSpec represents the serializable state of a VolumetricLayer.

type PersistenceNetworkSpec ¶

type PersistenceNetworkSpec struct {
	ID            string                 `json:"id"`
	Depth         int                    `json:"depth"`
	Rows          int                    `json:"rows"`
	Cols          int                    `json:"cols"`
	LayersPerCell int                    `json:"layers_per_cell"`
	Layers        []PersistenceLayerSpec `json:"layers"`
}

PersistenceNetworkSpec represents the serializable state of a VolumetricNetwork.

type PolyGradientObserver ¶

type PolyGradientObserver interface {
	OnGradient(event PolyLayerEvent)
}

PolyGradientObserver tracks gradient flow through layers.

type PolyLayerEvent ¶

type PolyLayerEvent struct {
	Mode      string     `json:"mode"`
	Type      string     `json:"type"` // "forward" or "backward"
	Z         int        `json:"z"`
	Y         int        `json:"y"`
	X         int        `json:"x"`
	L         int        `json:"l"`
	LayerType LayerType  `json:"layer_type"`
	Stats     LayerStats `json:"stats"`
	StepCount uint64     `json:"step_count"`
	ModelID   string     `json:"model_id"`
}

PolyLayerEvent captures state during a forward or backward pass.

type PolyObserver ¶

type PolyObserver interface {
	OnForward(event PolyLayerEvent)
	OnBackward(event PolyLayerEvent)
}

PolyObserver defines the interface for tracking neural activity in polymorphic layers.

type PreTokenizer ¶

type PreTokenizer struct {
	Pattern *regexp.Regexp
}

PreTokenizer handles text splitting before BPE.

func (*PreTokenizer) SplitWithSpecialTokens ¶

func (pt *PreTokenizer) SplitWithSpecialTokens(text string, specialTokens map[string]int) []string

type PredictionResult ¶

type PredictionResult struct {
	SampleIndex    int     `json:"sample_index"`
	ExpectedOutput float64 `json:"expected"`
	ActualOutput   float64 `json:"actual"`
	Deviation      float64 `json:"deviation"` // % error
	Bucket         string  `json:"bucket"`
}

PredictionResult represents model performance on a single prediction.

func EvaluatePrediction ¶

func EvaluatePrediction(sampleIndex int, expected, actual float64) PredictionResult

EvaluatePrediction categorizes expected vs actual results.

type PrefixWeightMapper ¶

type PrefixWeightMapper struct {
	Patterns map[string][]string
}

PrefixWeightMapper handles mapping tensors with potentially complex prefixes.

func NewPrefixWeightMapper ¶

func NewPrefixWeightMapper() *PrefixWeightMapper

NewPrefixWeightMapper creates a default mapper for common LLM architectures.

func (*PrefixWeightMapper) Find ¶

func (m *PrefixWeightMapper) Find(tensors map[string][]float32, role string) []float32

Find searches for a tensor based on the patterns registered for a role.

func (*PrefixWeightMapper) MapWeights ¶

func (m *PrefixWeightMapper) MapWeights(tensors map[string][]float32) (embeddings, lmHead, finalNorm []float32, hasFinalNorm bool)

MapWeights finds weights for specific roles in the provided tensor map, handling generic prefixes.

type Q4_0Block ¶

type Q4_0Block struct {
	Scale   float32
	Weights [16]byte // 32 nibbles
}

Q4_0Block represents a block of 32 quantized 4-bit weights. Total size: 4 (f32 scale) + 16 (32 nibbles) = 20 bytes. Bandwidth: 0.625 bytes per weight.

func QuantizeQ4_0 ¶

func QuantizeQ4_0(weights []float32) []Q4_0Block

QuantizeQ4_0 converts a slice of f32 weights into Q4_0 blocks.
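
The 20-byte block layout can be illustrated with a self-contained sketch. The struct below mirrors Q4_0Block, but the quantization scheme itself (symmetric scale of maxAbs/7, values stored with an offset of 8, low nibble first) is an assumption chosen for illustration; QuantizeQ4_0 may use a different rounding rule or nibble order. `block`, `quantizeBlock`, `dequantizeBlock`, and `maxAbsError` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// block mirrors the Q4_0Block layout: a float32 scale plus 16 bytes
// holding 32 four-bit values (two per byte).
type block struct {
	Scale   float32
	Weights [16]byte
}

// quantizeBlock packs exactly 32 weights into one block.
func quantizeBlock(w [32]float32) block {
	var maxAbs float64
	for _, v := range w {
		maxAbs = math.Max(maxAbs, math.Abs(float64(v)))
	}
	b := block{Scale: float32(maxAbs / 7)}
	if b.Scale == 0 {
		b.Scale = 1 // all-zero block: any non-zero scale round-trips to zero
	}
	for i, v := range w {
		q := int(math.Round(float64(v)/float64(b.Scale))) + 8
		if q < 0 {
			q = 0
		} else if q > 15 {
			q = 15
		}
		if i%2 == 0 {
			b.Weights[i/2] = byte(q) // low nibble
		} else {
			b.Weights[i/2] |= byte(q) << 4 // high nibble
		}
	}
	return b
}

// dequantizeBlock reverses quantizeBlock.
func dequantizeBlock(b block) [32]float32 {
	var out [32]float32
	for i := range out {
		nib := b.Weights[i/2]
		if i%2 == 0 {
			nib &= 0x0F
		} else {
			nib >>= 4
		}
		out[i] = float32(int(nib)-8) * b.Scale
	}
	return out
}

// maxAbsError reports the largest per-weight round-trip error.
func maxAbsError(a, b [32]float32) float32 {
	var worst float64
	for i := range a {
		worst = math.Max(worst, math.Abs(float64(a[i]-b[i])))
	}
	return float32(worst)
}

func main() {
	var w [32]float32
	for i := range w {
		w[i] = float32(i-16) / 16
	}
	b := quantizeBlock(w)
	fmt.Printf("scale=%.4f maxErr=%.4f bytes/weight=%.3f\n",
		b.Scale, maxAbsError(w, dequantizeBlock(b)), float64(4+16)/32)
}
```

Under this scheme the round-trip error per weight is bounded by half a quantization step, that is Scale/2.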

type SafetensorsHeader ¶

type SafetensorsHeader struct {
	Tensors map[string]TensorInfo `json:"-"`
}

SafetensorsHeader contains metadata about tensors in the file.

type SoftmaxType ¶

type SoftmaxType int

SoftmaxType defines the variant of softmax to use.

const (
	SoftmaxStandard     SoftmaxType = 0
	SoftmaxGrid         SoftmaxType = 1
	SoftmaxHierarchical SoftmaxType = 2
	SoftmaxTemperature  SoftmaxType = 3
	SoftmaxGumbel       SoftmaxType = 4
	SoftmaxMasked       SoftmaxType = 5
	SoftmaxSparse       SoftmaxType = 6
	SoftmaxAdaptive     SoftmaxType = 7
	SoftmaxMixture      SoftmaxType = 8
	SoftmaxEntmax       SoftmaxType = 9
)

func ParseSoftmaxType ¶

func ParseSoftmaxType(s string) SoftmaxType

ParseSoftmaxType converts string to SoftmaxType.

func (SoftmaxType) String ¶

func (s SoftmaxType) String() string

type SpliceConfig ¶ added in v0.74.0

type SpliceConfig struct {
	// CrossoverMode: "uniform", "point", or "blend"
	CrossoverMode string
	// BlendAlpha: interpolation factor for "blend" mode (0=all A, 1=all B)
	BlendAlpha float32
	// SplitRatio: fraction of weights taken from parent A in "point" mode
	SplitRatio float64
	// FitnessA/B: optional fitness scores to bias crossover toward fitter parent
	FitnessA float64
	FitnessB float64
}

SpliceConfig controls how two parent networks are combined.
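
The three crossover modes named in the struct comments can be sketched on plain weight slices. This is an illustrative guess at the per-mode behavior based only on the field comments; `splice` and `newRNG` are hypothetical, the fitness-bias fields are ignored, and the package's SpliceDNA may work quite differently (for instance per layer rather than per weight).

```go
package main

import (
	"fmt"
	"math/rand"
)

// newRNG returns a deterministic random source.
func newRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// splice combines two equal-length weight slices. rng is only used in
// "uniform" mode and may be nil otherwise.
func splice(a, b []float32, mode string, blendAlpha float32, splitRatio float64, rng *rand.Rand) []float32 {
	child := make([]float32, len(a))
	cut := int(splitRatio * float64(len(a)))
	for i := range a {
		switch mode {
		case "uniform": // each weight comes from either parent with equal odds
			if rng.Intn(2) == 0 {
				child[i] = a[i]
			} else {
				child[i] = b[i]
			}
		case "point": // the first splitRatio fraction comes from A, the rest from B
			if i < cut {
				child[i] = a[i]
			} else {
				child[i] = b[i]
			}
		default: // "blend": 0 = all A, 1 = all B
			child[i] = (1-blendAlpha)*a[i] + blendAlpha*b[i]
		}
	}
	return child
}

func main() {
	a := []float32{1, 1, 1, 1}
	b := []float32{3, 3, 3, 3}
	fmt.Println(splice(a, b, "point", 0, 0.5, nil))
	fmt.Println(splice(a, b, "blend", 0.5, 0, nil))
	fmt.Println(splice(a, b, "uniform", 0, 0, newRNG(1)))
}
```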

func DefaultSpliceConfig ¶ added in v0.74.0

func DefaultSpliceConfig() SpliceConfig

DefaultSpliceConfig returns a balanced blend configuration.

type SpliceResult ¶ added in v0.74.0

type SpliceResult struct {
	Child        *VolumetricNetwork
	ParentADNA   NetworkDNA
	ParentBDNA   NetworkDNA
	ChildDNA     NetworkDNA
	Similarities map[string]float32 // "z,y,x,l" -> cosine similarity used
	BlendedCount int                // number of layers actually blended
}

SpliceResult holds the outcome of a DNA splice operation.

func SpliceDNAWithReport ¶ added in v0.74.0

func SpliceDNAWithReport(parentA, parentB *VolumetricNetwork, cfg SpliceConfig) SpliceResult

SpliceDNAWithReport performs a splice and returns a full diagnostic report. Use this when you want to inspect per-layer similarity scores or log blend stats.

type Streamer ¶

type Streamer struct {
	Decode func(tokens []uint32) string
	// contains filtered or unexported fields
}

Streamer handles real-time output of generated tokens.

func NewStreamer ¶

func NewStreamer(decode func(tokens []uint32) string, promptTokens []uint32) *Streamer

func (*Streamer) HasNewUserTurn ¶

func (s *Streamer) HasNewUserTurn(allTokens []uint32) bool

func (*Streamer) Push ¶

func (s *Streamer) Push(allTokens []uint32)

func (*Streamer) String ¶

func (s *Streamer) String() string

type SystolicState ¶

type SystolicState[T Numeric] struct {
	// LayerData holds the current output of every layer in the grid.
	// Indexing follows VolumetricNetwork.GetIndex(z, y, x, l)
	LayerData []*Tensor[T]

	// BackwardContext stores pre-activations and inputs for backpropagation.
	// These are indexed by [Step][LayerIndex] to allow BPTT across clock cycles.
	HistoryIn  [][]*Tensor[T]
	HistoryPre [][]*Tensor[T]

	// Double buffering for simultaneous updates
	NextBuffer []*Tensor[T]

	// Grid Metadata
	StepCount uint64
	// contains filtered or unexported fields
}

SystolicState holds the temporal snapshot of the 3D grid.

func NewSystolicState ¶

func NewSystolicState[T Numeric](n *VolumetricNetwork) *SystolicState[T]

NewSystolicState initializes a state for a specific Volumetric Network.

func (*SystolicState[T]) SetInput ¶

func (s *SystolicState[T]) SetInput(input *Tensor[T])

SetInput injects data into the starting coordinate (0,0,0,0).

type TargetPropConfig ¶

type TargetPropConfig struct {
	BatchSize        int
	UseChainRule     bool    // If true, targets = Act + Grad * Scale
	GradientScale    float32 // Scaling factor for chaining
	DepthScaleFactor float32 // Gradient boosting for deeper layers
	Momentum         float32
	LearningRate     float32

	// Clamping for stability
	ActivationClamp float32
}

TargetPropConfig holds tunable parameters for Neural Target Propagation.
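
The UseChainRule comment gives the target formula as Act + Grad * Scale. A minimal sketch of that element-wise rule follows; `chainTargets` is a hypothetical helper, and applying ActivationClamp as a symmetric clamp on the resulting target is an assumption inferred from the field name.

```go
package main

import "fmt"

// chainTargets computes target = act + grad*scale for each element.
// When clamp > 0 the result is limited to [-clamp, clamp] (assumed
// interpretation of ActivationClamp).
func chainTargets(act, grad []float32, scale, clamp float32) []float32 {
	targets := make([]float32, len(act))
	for i := range act {
		t := act[i] + grad[i]*scale
		if clamp > 0 {
			if t > clamp {
				t = clamp
			} else if t < -clamp {
				t = -clamp
			}
		}
		targets[i] = t
	}
	return targets
}

func main() {
	act := []float32{0.5, -0.25}
	grad := []float32{2, -4}
	fmt.Println(chainTargets(act, grad, 0.25, 0))
	fmt.Println(chainTargets(act, grad, 0.25, 1))
}
```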

func DefaultTargetPropConfig ¶

func DefaultTargetPropConfig() *TargetPropConfig

DefaultTargetPropConfig returns standard settings for the TargetProp engine.

type TargetPropState ¶

type TargetPropState[T Numeric] struct {
	ForwardActs     []*Tensor[T]
	PreActs         []*Tensor[T] // Internal pre-activation states for weight-bearing layers
	BackwardTargets []*Tensor[T]

	// Chain Rule storage
	Gradients []*Tensor[float32]

	// Diagnostics
	LinkBudgets []float32
	Gaps        []float32

	Config      *TargetPropConfig
	TotalLayers int
}

TargetPropState tracks the bidirectional signal flow.

func NewTargetPropState ¶

func NewTargetPropState[T Numeric](n *VolumetricNetwork, config *TargetPropConfig) *TargetPropState[T]

NewTargetPropState initializes a state for the given volumetric network.

func (*TargetPropState[T]) CalculateLinkBudgets ¶

func (s *TargetPropState[T]) CalculateLinkBudgets()

CalculateLinkBudgets is a diagnostic that measures how much information is preserved between layers, expressed as cosine similarity.
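
Cosine similarity itself is standard; the sketch below shows the metric on two float32 vectors. Which pairs of tensors CalculateLinkBudgets compares is not stated here, so `cosine` is only a hypothetical helper demonstrating the measure, with a zero vector mapped to 0 by assumption.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns dot(a, b) / (|a| * |b|), or 0 if either vector is zero.
// It assumes len(a) == len(b).
func cosine(a, b []float32) float32 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / math.Sqrt(na*nb))
}

func main() {
	fmt.Println(cosine([]float32{1, 0}, []float32{1, 0}))
	fmt.Println(cosine([]float32{1, 0}, []float32{0, 1}))
	fmt.Println(cosine([]float32{1, 2}, []float32{-1, -2}))
}
```

A value near 1 means the direction of the signal is preserved, a value near 0 means it is mostly lost, and a negative value means it is inverted.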

type TaskChange ¶

type TaskChange struct {
	AtTime           time.Duration `json:"at_time"`
	FromTask         string        `json:"from_task"`
	ToTask           string        `json:"to_task"`
	PreChangeWindow  int           `json:"pre_change_window"`
	PostChangeWindow int           `json:"post_change_window"`
	PreAccuracy      float64       `json:"pre_accuracy"`
	PostAccuracy     float64       `json:"post_accuracy"`
	RecoveryTime     time.Duration `json:"recovery_time"`
}

type Template ¶

type Template struct {
	Name         string
	RolePrefixes map[string]string
	RoleSuffixes map[string]string
	GlobalPrefix string
	GlobalSuffix string
}

Template defines the formatting markers for different chat styles.

func (Template) BuildNextTurnSegment ¶

func (t Template) BuildNextTurnSegment(userMsg string) string

BuildNextTurnSegment returns only the text that is NEW compared to what the KV cache already holds.

func (Template) BuildPrompt ¶

func (t Template) BuildPrompt(turns []Turn, systemPrompt string, userMsg string) string

BuildPrompt constructs a full prompt string from conversation turns.

type Tensor ¶

type Tensor[T Numeric] struct {
	Data   []T
	DType  DType
	Shape  []int
	Nested []*Tensor[T] // For recursive activation caching in Parallel/Sequential layers
}

Tensor wraps numerical data with metadata.

func BackwardPolymorphic ¶

func BackwardPolymorphic[T Numeric](n *VolumetricNetwork, gradOutput *Tensor[T], inputs, preActs []*Tensor[T]) (gradInput *Tensor[T], layerGradients [][2]*Tensor[T], layerTimes []time.Duration)

BackwardPolymorphic executes a full backward pass through the 3D grid. It propagates gradients from the output back to the input, accumulating weight gradients.

func CNN1BackwardPolymorphic ¶

func CNN1BackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

CNN1BackwardPolymorphic calculates gradients for a 1D convolutional layer.

func CNN1BackwardTiled ¶

func CNN1BackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

CNN1BackwardTiled implements a loop-blocked backward pass for CNN1.

func CNN1ForwardPolymorphic ¶

func CNN1ForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

CNN1ForwardPolymorphic performs a forward pass through a 1D convolutional layer.

func CNN1ForwardTiled ¶

func CNN1ForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

CNN1ForwardTiled implements a loop-blocked forward pass for CNN1.

func CNN2BackwardPolymorphic ¶

func CNN2BackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

CNN2BackwardPolymorphic calculates gradients for a 2D convolutional layer.

func CNN2BackwardTiled ¶

func CNN2BackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

CNN2BackwardTiled implements a loop-blocked backward pass for CNN2.

func CNN2ForwardPolymorphic ¶

func CNN2ForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

CNN2ForwardPolymorphic performs a forward pass through a 2D convolutional layer.

func CNN2ForwardTiled ¶

func CNN2ForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

CNN2ForwardTiled implements a loop-blocked forward pass for CNN2.

func CNN3BackwardPolymorphic ¶

func CNN3BackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

CNN3BackwardPolymorphic calculates gradients for a 3D convolutional layer.

func CNN3BackwardTiled ¶

func CNN3BackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

CNN3BackwardTiled implements a loop-blocked backward pass for CNN3.

func CNN3ForwardPolymorphic ¶

func CNN3ForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

CNN3ForwardPolymorphic performs a forward pass through a 3D convolutional layer.

func CNN3ForwardTiled ¶

func CNN3ForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

CNN3ForwardTiled implements a loop-blocked forward pass for CNN3.

func ComputeLossGradient ¶

func ComputeLossGradient[T Numeric](output, target *Tensor[T], lossType string) *Tensor[T]

ComputeLossGradient computes the gradient of the loss with respect to the output.

func ConvTransposed1DBackwardPolymorphic ¶

func ConvTransposed1DBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

func ConvTransposed1DForwardPolymorphic ¶

func ConvTransposed1DForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

ConvTransposed1DForwardPolymorphic performs a forward pass through a 1D transposed convolutional layer.

func ConvTransposed2DBackwardPolymorphic ¶

func ConvTransposed2DBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

func ConvTransposed2DForwardPolymorphic ¶

func ConvTransposed2DForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

ConvTransposed2DForwardPolymorphic performs a forward pass through a 2D transposed convolutional layer.

func ConvTransposed3DBackwardPolymorphic ¶

func ConvTransposed3DBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

func ConvTransposed3DForwardPolymorphic ¶

func ConvTransposed3DForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

ConvTransposed3DForwardPolymorphic performs a forward pass through a 3D transposed convolutional layer.

func ConvertTensor ¶

func ConvertTensor[In Numeric, Out Numeric](in *Tensor[In]) *Tensor[Out]

ConvertTensor converts a tensor from one numeric type to another.

func DenseBackwardPolymorphic ¶

func DenseBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

DenseBackwardPolymorphic calculates gradients for the dense layer.

func DenseForwardPolymorphic ¶

func DenseForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

DenseForwardPolymorphic performs a forward pass through a dense layer. It handles precision transitions (e.g., FP32 input to FP4 layer).

func DenseForwardTiled ¶

func DenseForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

DenseForwardTiled performs a tiled forward pass for the dense layer.
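
Loop blocking, the idea behind the *Tiled variants, can be shown on a plain matrix-vector product. This is a generic sketch, not the package's kernel: `matVecNaive`, `matVecTiled`, and `minInt` are hypothetical, the row-major weight layout is an assumption, and the relationship between the `tile` argument and the TileSize field is not confirmed.

```go
package main

import "fmt"

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// matVecNaive computes y = W*x for a row-major rows x cols matrix.
func matVecNaive(w, x []float32, rows, cols int) []float32 {
	y := make([]float32, rows)
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			y[r] += w[r*cols+c] * x[c]
		}
	}
	return y
}

// matVecTiled computes the same product in tile x tile blocks, so each
// slice of x is reused across several rows while it is still in cache.
func matVecTiled(w, x []float32, rows, cols, tile int) []float32 {
	y := make([]float32, rows)
	for r0 := 0; r0 < rows; r0 += tile {
		rEnd := minInt(r0+tile, rows)
		for c0 := 0; c0 < cols; c0 += tile {
			cEnd := minInt(c0+tile, cols)
			for r := r0; r < rEnd; r++ {
				var acc float32
				for c := c0; c < cEnd; c++ {
					acc += w[r*cols+c] * x[c]
				}
				y[r] += acc
			}
		}
	}
	return y
}

func main() {
	rows, cols := 3, 5
	w := make([]float32, rows*cols)
	for i := range w {
		w[i] = float32(i)
	}
	x := []float32{1, 2, 3, 4, 5}
	fmt.Println(matVecNaive(w, x, rows, cols))
	fmt.Println(matVecTiled(w, x, rows, cols, 2))
}
```

With general floating-point inputs the two versions can differ in the last bits, because blocking changes the order of summation.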

func DispatchLayer ¶

func DispatchLayer[T Numeric](layer *VolumetricLayer, input, skip *Tensor[T]) (preAct, postAct *Tensor[T])

DispatchLayer acts as the universal routing hub for all layer types. This is the "jump table" that handles numerical metamorphosis across every supported layer type (the LayerType constants in this version define 19).

func DispatchLayerBackward ¶

func DispatchLayerBackward[T Numeric](layer *VolumetricLayer, gradOutput, input, skip, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

DispatchLayerBackward acts as the universal routing hub for gradients. This handles the backward pass metamorphosis for various layer types.

func EmbeddingBackwardPolymorphic ¶

func EmbeddingBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

EmbeddingBackwardPolymorphic computes gradients for embedding lookup.

func EmbeddingBackwardTiled ¶

func EmbeddingBackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

EmbeddingBackwardTiled implements a loop-blocked gradient calculation for embeddings.

func EmbeddingForwardPolymorphic ¶

func EmbeddingForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

EmbeddingForwardPolymorphic performs an embedding lookup across any numerical type.

func EmbeddingForwardTiled ¶

func EmbeddingForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

EmbeddingForwardTiled implements a loop-blocked embedding lookup for cache efficiency.

func ForwardPolymorphic ¶

func ForwardPolymorphic[T Numeric](n *VolumetricNetwork, input *Tensor[T]) (*Tensor[T], time.Duration, []time.Duration)

ForwardPolymorphic executes the network using a unified generic dispatcher. It iterates through the 3D grid and handles DType transitions between layers.

func KMeansBackwardPolymorphic ¶

func KMeansBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

KMeansBackwardPolymorphic computes gradients for cluster centers and propagates to input.

func KMeansForwardPolymorphic ¶

func KMeansForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

KMeansForwardPolymorphic performs a differentiable K-Means clustering forward pass.

func LSTMBackwardPolymorphic ¶

func LSTMBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

LSTMBackwardPolymorphic calculates gradients for the LSTM layer using BPTT.

func LSTMBackwardTiled ¶

func LSTMBackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

LSTMBackwardTiled implements a tiled (blocked) LSTM backward pass.

func LSTMForwardPolymorphic ¶

func LSTMForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

LSTMForwardPolymorphic performs a forward pass through a polymorphic LSTM layer. preAct stores [iSum, fSum, gSum, oSum, cCurr] (5 * hiddenSize)
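
The preAct layout described above can be unpacked with simple slicing. The sketch assumes a single timestep stored as five contiguous blocks of hiddenSize values in the listed order; `lstmPreAct` and `splitPreAct` are hypothetical names, and the real buffer may hold multiple timesteps or be arranged differently.

```go
package main

import "fmt"

// lstmPreAct names the five blocks of a single-timestep preAct buffer.
type lstmPreAct struct {
	I, F, G, O, C []float32
}

// splitPreAct returns views into buf; no data is copied.
func splitPreAct(buf []float32, hidden int) (lstmPreAct, error) {
	if len(buf) != 5*hidden {
		return lstmPreAct{}, fmt.Errorf("preAct length %d, want %d", len(buf), 5*hidden)
	}
	return lstmPreAct{
		I: buf[0*hidden : 1*hidden],
		F: buf[1*hidden : 2*hidden],
		G: buf[2*hidden : 3*hidden],
		O: buf[3*hidden : 4*hidden],
		C: buf[4*hidden : 5*hidden],
	}, nil
}

func main() {
	buf := []float32{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	p, err := splitPreAct(buf, 2)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(p.I, p.F, p.G, p.O, p.C)
}
```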

func LSTMForwardTiled ¶

func LSTMForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

LSTMForwardTiled implements a tiled (blocked) LSTM forward pass for cache efficiency.

func LayerNormBackwardPolymorphic ¶

func LayerNormBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

LayerNormBackwardPolymorphic calculates gradients for LayerNorm.

func LayerNormForwardPolymorphic ¶

func LayerNormForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

LayerNormForwardPolymorphic performs layer normalization for any numeric type.

func MHABackwardPolymorphic ¶

func MHABackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

MHABackwardPolymorphic handles BPTT-style gradients for MHA.

func MHAForwardPolymorphic ¶

func MHAForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

MHAForwardPolymorphic performs Multi-Head Attention across any numerical type.

func MHAForwardTiled ¶

func MHAForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

MHAForwardTiled performs an optimized, tiled forward pass for MHA.

func NewTensor ¶

func NewTensor[T Numeric](shape ...int) *Tensor[T]

NewTensor creates a new tensor with the given shape.

func NewTensorFromSlice ¶

func NewTensorFromSlice[T Numeric](data []T, shape ...int) *Tensor[T]

NewTensorFromSlice creates a tensor from existing data.

func ParallelBackwardPolymorphic ¶

func ParallelBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

ParallelBackwardPolymorphic distributes gradients back to branches.

func ParallelForwardPolymorphic ¶

func ParallelForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

ParallelForwardPolymorphic executes multiple sub-layers in parallel and combines outputs.

func RMSNormBackwardPolymorphic ¶

func RMSNormBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

RMSNormBackwardPolymorphic calculates gradients for RMSNorm.

func RMSNormForwardPolymorphic ¶

func RMSNormForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

RMSNormForwardPolymorphic performs RMS normalization.
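
For reference, the textbook RMS normalization is y_i = x_i / sqrt(mean(x^2) + eps) * gain_i. The sketch below implements that formula on float32 slices; `rmsNorm` is a hypothetical helper, and the epsilon handling and gain layout in RMSNormForwardPolymorphic are not confirmed by this page.

```go
package main

import (
	"fmt"
	"math"
)

// rmsNorm scales x by the reciprocal of its root-mean-square and then
// applies a per-element gain. It assumes len(gain) == len(x).
func rmsNorm(x, gain []float32, eps float64) []float32 {
	var sumSq float64
	for _, v := range x {
		sumSq += float64(v) * float64(v)
	}
	inv := 1 / math.Sqrt(sumSq/float64(len(x))+eps)
	out := make([]float32, len(x))
	for i, v := range x {
		out[i] = float32(float64(v)*inv) * gain[i]
	}
	return out
}

func main() {
	fmt.Println(rmsNorm([]float32{3, 4}, []float32{1, 1}, 1e-6))
}
```

Unlike LayerNorm, this formula does not subtract the mean, so only the scale of the input is normalized.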

func RNNBackwardPolymorphic ¶

func RNNBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

RNNBackwardPolymorphic calculates gradients for the RNN layer using BPTT.

func RNNBackwardTiled ¶

func RNNBackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

RNNBackwardTiled performs a tiled backward pass for RNN using BPTT.

func RNNForwardPolymorphic ¶

func RNNForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

RNNForwardPolymorphic performs a forward pass through an RNN layer. It handles precision transitions and all 21 numerical types.

func RNNForwardTiled ¶

func RNNForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

RNNForwardTiled performs a tiled forward pass for RNN.

func ResidualBackwardPolymorphic ¶

func ResidualBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

ResidualBackwardPolymorphic computes gradients for Residual layer.

func ResidualBackwardTiled ¶

func ResidualBackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

ResidualBackwardTiled performs a tiled backward pass for Residual.

func ResidualForwardPolymorphic ¶

func ResidualForwardPolymorphic[T Numeric](layer *VolumetricLayer, input, skip *Tensor[T]) (preAct, postAct *Tensor[T])

ResidualForwardPolymorphic adds a residual connection: output = input + skip.

func ResidualForwardTiled ¶

func ResidualForwardTiled[T Numeric](layer *VolumetricLayer, input, skip *Tensor[T]) (preAct, postAct *Tensor[T])

ResidualForwardTiled performs a tiled forward pass for Residual.

func SequentialBackwardPolymorphic ¶

func SequentialBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

SequentialBackwardPolymorphic distributes gradients back through the sequence in reverse.

func SequentialForwardPolymorphic ¶

func SequentialForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

SequentialForwardPolymorphic executes multiple sub-layers in sequence.

func SoftmaxBackwardPolymorphic ¶

func SoftmaxBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, postAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

SoftmaxBackwardPolymorphic computes gradients for ALL Softmax variants.

func SoftmaxForwardPolymorphic ¶

func SoftmaxForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

SoftmaxForwardPolymorphic performs a differentiable Softmax forward pass with ALL variants.

func SwiGLUBackwardPolymorphic ¶

func SwiGLUBackwardPolymorphic[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

SwiGLUBackwardPolymorphic calculates gradients for SwiGLU.

func SwiGLUBackwardTiled ¶

func SwiGLUBackwardTiled[T Numeric](layer *VolumetricLayer, gradOutput, input, preAct *Tensor[T]) (gradInput, gradWeights *Tensor[T])

SwiGLUBackwardTiled calculates gradients for SwiGLU using a tiled approach.

func SwiGLUForwardPolymorphic ¶

func SwiGLUForwardPolymorphic[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

SwiGLUForwardPolymorphic performs the SwiGLU gated activation: it computes silu(gate) * up and then applies the down projection (down_proj).

func SwiGLUForwardTiled ¶

func SwiGLUForwardTiled[T Numeric](layer *VolumetricLayer, input *Tensor[T]) (preAct, postAct *Tensor[T])

SwiGLUForwardTiled performs an optimized, tiled forward pass for SwiGLU.

func SystolicBackward ¶

func SystolicBackward[T Numeric](n *VolumetricNetwork, s *SystolicState[T], gradOutput *Tensor[T]) (gradIn *Tensor[T], layerGradients [][2]*Tensor[T], err error)

SystolicBackward propagates gradients backward through the systolic history. It walks backward through clock cycles, accurately routing gradients to their source coordinates.

func TargetPropForward ¶

func TargetPropForward[T Numeric](n *VolumetricNetwork, s *TargetPropState[T], input *Tensor[T]) *Tensor[T]

TargetPropForward executes a standard forward pass but captures ALL activations.

func (*Tensor[T]) Add ¶

func (t *Tensor[T]) Add(other *Tensor[T])

Add adds another tensor's data to this one (in-place).

func (*Tensor[T]) Clone ¶

func (t *Tensor[T]) Clone() *Tensor[T]

Clone creates a deep copy of the tensor.

type TensorInfo ¶

type TensorInfo struct {
	DType  string `json:"dtype"`
	Shape  []int  `json:"shape"`
	Offset []int  `json:"data_offsets"`
}

TensorInfo describes a tensor's properties
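
The JSON tags on TensorInfo match the per-tensor entries of a safetensors header, where data_offsets holds a [begin, end) byte range. The sketch below decodes such a header with a local copy of the struct; parseHeader and byteLen are hypothetical helpers, and how this package actually reads the header is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TensorInfo is a local copy of the struct documented above.
type TensorInfo struct {
	DType  string `json:"dtype"`
	Shape  []int  `json:"shape"`
	Offset []int  `json:"data_offsets"`
}

// parseHeader decodes a header of the form {"name": {dtype, shape, data_offsets}}.
// Hypothetical helper, not part of the package.
func parseHeader(raw []byte) (map[string]TensorInfo, error) {
	var header map[string]TensorInfo
	if err := json.Unmarshal(raw, &header); err != nil {
		return nil, err
	}
	return header, nil
}

// byteLen returns end - begin from data_offsets, or 0 if the pair is malformed.
func byteLen(ti TensorInfo) int {
	if len(ti.Offset) != 2 {
		return 0
	}
	return ti.Offset[1] - ti.Offset[0]
}

func main() {
	raw := []byte(`{"w":{"dtype":"F32","shape":[2,3],"data_offsets":[0,24]}}`)
	header, err := parseHeader(raw)
	if err != nil {
		panic(err)
	}
	for name, ti := range header {
		fmt.Println(name, ti.DType, ti.Shape, byteLen(ti))
	}
}
```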

type TensorMeta ¶

type TensorMeta struct {
	Idx           int
	Shape         []int
	Data          []float32
	MeanAbs       float32
	Variance      float32
	Rank          int
	OriginalDType DType
}

TensorMeta holds geometric and statistical metadata for a tensor.

type TensorWithShape ¶

type TensorWithShape struct {
	Values []float32
	Shape  []int
	DType  string
}

TensorWithShape holds tensor data along with its shape

type TimeWindow ¶

type TimeWindow struct {
	WindowIndex   int           `json:"window_index"`
	Duration      time.Duration `json:"duration"`
	Outputs       int           `json:"outputs"`
	Correct       int           `json:"correct"`
	Accuracy      float64       `json:"accuracy"`
	OutputsPerSec int           `json:"outputs_per_sec"`
	CurrentTask   string        `json:"current_task"`
	TaskID        int           `json:"task_id"`
}

type Tokenizer ¶

type Tokenizer struct {
	Vocab         map[string]int // token -> id
	ReverseVocab  map[int]string // id -> token
	Merges        []MergePair    // BPE merge rules
	SpecialTokens map[string]int // special tokens
	AddedTokens   map[string]int // added tokens
	PreTokenizer  *PreTokenizer  // pre-tokenization rules
	ByteFallback  bool           // use byte fallback for unknown chars
}

Tokenizer represents a BPE tokenizer
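
As a rough picture of how BPE merge rules are applied, the sketch below repeatedly merges the adjacent pair with the lowest rank (earliest position in the merge list). The mergePair type and applyMerges function are stand-ins defined for this example; the package's MergePair and its Encode logic may differ, for instance in pre-tokenization and byte fallback.

```go
package main

import "fmt"

// mergePair is a stand-in for a BPE merge rule (left symbol, right symbol).
type mergePair struct{ A, B string }

// applyMerges merges adjacent symbols greedily, always choosing the
// applicable rule that appears earliest in merges. The input slice is
// not modified.
func applyMerges(symbols []string, merges []mergePair) []string {
	rank := make(map[mergePair]int, len(merges))
	for i, m := range merges {
		rank[m] = i
	}
	for {
		best, bestRank := -1, len(merges)
		for i := 0; i+1 < len(symbols); i++ {
			r, ok := rank[mergePair{symbols[i], symbols[i+1]}]
			if ok && r < bestRank {
				best, bestRank = i, r
			}
		}
		if best < 0 {
			return symbols // no rule applies any more
		}
		next := make([]string, 0, len(symbols)-1)
		next = append(next, symbols[:best]...)
		next = append(next, symbols[best]+symbols[best+1])
		next = append(next, symbols[best+2:]...)
		symbols = next
	}
}

func main() {
	merges := []mergePair{{"l", "o"}, {"lo", "w"}}
	fmt.Println(applyMerges([]string{"l", "o", "w"}, merges)) // [low]
}
```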

func LoadTokenizer ¶

func LoadTokenizer(path string) (*Tokenizer, error)

LoadTokenizer loads a tokenizer from a HuggingFace tokenizer.json file

func (*Tokenizer) Decode ¶

func (t *Tokenizer) Decode(ids []uint32, skipSpecialTokens bool) string

Decode converts token IDs to text

func (*Tokenizer) Encode ¶

func (t *Tokenizer) Encode(text string, addSpecialTokens bool) []uint32

Encode converts text to token IDs

type TokenizerJSON ¶

type TokenizerJSON struct {
	Model struct {
		Type         string          `json:"type"`
		Vocab        map[string]int  `json:"vocab"`
		Merges       json.RawMessage `json:"merges"`
		ByteFallback bool            `json:"byte_fallback,omitempty"`
	} `json:"model"`
	AddedTokens []struct {
		ID      int    `json:"id"`
		Content string `json:"content"`
		Special bool   `json:"special"`
	} `json:"added_tokens"`
	PreTokenizer struct {
		Type          string `json:"type"`
		Pretokenizers []struct {
			Type    string `json:"type"`
			Pattern struct {
				String string `json:"String"`
			} `json:"pattern,omitempty"`
		} `json:"pretokenizers,omitempty"`
	} `json:"pre_tokenizer"`
}

TokenizerJSON represents the HuggingFace tokenizer.json format

type TrainingBatch ¶

type TrainingBatch[T Numeric] struct {
	Input  *Tensor[T]
	Target *Tensor[T]
}

TrainingBatch represents a single training batch for the Poly engine.

type TrainingConfig ¶

type TrainingConfig struct {
	Epochs       int
	LearningRate float32
	LossType     string  // "mse" or "cross_entropy"
	GradientClip float32 // Max gradient norm (0 = no clipping)
	Verbose      bool
	UseGPU       bool
	DeviceID     int
	TrackPerf    bool
}

TrainingConfig holds configuration for training in the Volumetric Grid.

func DefaultTrainingConfig ¶

func DefaultTrainingConfig() *TrainingConfig

DefaultTrainingConfig returns sensible defaults for the Bedrock architecture.

type TrainingMetrics ¶

type TrainingMetrics struct {
	Steps        int                   `json:"steps"`
	Accuracy     float64               `json:"accuracy"`
	Loss         float64               `json:"loss"`
	TimeTotal    time.Duration         `json:"time_total"`
	TimeToTarget time.Duration         `json:"time_to_target"`
	MemoryPeakMB float64               `json:"memory_peak_mb"`
	Milestones   map[int]time.Duration `json:"milestones"`
}

TrainingMetrics captures performance metrics for a training run.

func NewTrainingMetrics ¶

func NewTrainingMetrics() TrainingMetrics

NewTrainingMetrics creates an initialized TrainingMetrics.

type TrainingResult ¶

type TrainingResult struct {
	FinalLoss   float64
	TotalTime   time.Duration
	LossHistory []float64
	EpochTimes  []time.Duration
}

TrainingResult contains training statistics for the Poly engine.

func Train ¶

func Train[T Numeric](n *VolumetricNetwork, batches []TrainingBatch[T], config *TrainingConfig) (*TrainingResult, error)

Train executes the training loop on a VolumetricNetwork.

type Transformer ¶

type Transformer[T Numeric] struct {
	Network    *VolumetricNetwork
	Embeddings []float32
	LMHead     []float32
	FinalNorm  []float32
	HiddenSize int
	VocabSize  int
	Template   Template
	// contains filtered or unexported fields
}

Transformer coordinates high-level generation logic using the underlying VolumetricNetwork

func NewTransformer ¶

func NewTransformer[T Numeric](network *VolumetricNetwork, embeddings, lmHead, finalNorm []float32, template Template) *Transformer[T]

NewTransformer creates a new polymorphic transformer

func (*Transformer[T]) EnableTiling ¶

func (t *Transformer[T]) EnableTiling(tileSize int)

EnableTiling enables cache-tiling optimization for all layers in the transformer. If tileSize is <= 0, it dynamically auto-detects the best size for the hardware.

func (*Transformer[T]) ForwardTokenIDsWGPU ¶

func (t *Transformer[T]) ForwardTokenIDsWGPU(tokens []uint32, input *Tensor[T], computeLogits bool, onlyLast bool) (*Tensor[T], error)

ForwardTokenIDsWGPU handles both prefill (multi-token) and decode (single-token) forward passes on the GPU. All layer dispatches are recorded into a single CommandEncoder (BeginFrame/FlushFrame), reducing GPU submission overhead from ~150+ submits per token to one submit plus one download. It is the "true" GPU-residency path: if tokens are provided, the embedding lookup happens on the GPU, and if the final norm and LM head have been synced, they run on the GPU as well.

func (*Transformer[T]) ForwardWGPU ¶

func (t *Transformer[T]) ForwardWGPU(input *Tensor[T]) (*Tensor[T], error)

func (*Transformer[T]) Generate ¶

func (t *Transformer[T]) Generate(
	encode func(text string) []uint32,
	decode func(tokens []uint32) string,
	turns []Turn,
	systemPrompt, userMsg string,
	opts GenOptions,
) string

Generate implements the stateless generation logic

func (*Transformer[T]) Reset ¶

func (t *Transformer[T]) Reset()

Reset clears the KV cache for all layers

func (*Transformer[T]) SyncToGPU ¶

func (t *Transformer[T]) SyncToGPU() error

type Turn ¶

type Turn struct {
	User      string
	Assistant string
}

Turn represents a single turn in a chat conversation

type VolumetricLayer ¶

type VolumetricLayer struct {
	Network     *VolumetricNetwork
	Type        LayerType
	Activation  ActivationType
	DType       DType
	WeightStore *WeightStore
	IsDisabled  bool

	// 3D Coordinates
	Z int // Depth
	Y int // Row
	X int // Col
	L int // Layer index within cell

	// Config (Expanding from LayerConfig)
	InputHeight   int
	InputWidth    int
	InputDepth    int
	OutputHeight  int
	OutputWidth   int
	OutputDepth   int
	InputChannels int
	Filters       int
	KernelSize    int
	Stride        int
	Padding       int
	OutputPadding int

	NumHeads     int
	NumKVHeads   int
	HeadDim      int
	DModel       int
	SeqLength    int
	RoPEFreqBase float64

	VocabSize    int
	EmbeddingDim int

	NumClusters       int
	KMeansTemperature float64
	KMeansOutputMode  string // "probabilities" or "features"

	SoftmaxType     SoftmaxType
	Temperature     float64
	SoftmaxRows     int
	SoftmaxCols     int
	HierarchyLevels []int
	EntmaxAlpha     float64
	Mask            []bool
	GumbelNoise     bool

	ParallelBranches []VolumetricLayer
	CombineMode      string // "concat", "add", "avg", "filter", "grid_scatter"
	FilterGateConfig *VolumetricLayer

	// Spatial Routing (Remote Links)
	IsRemoteLink bool
	TargetZ      int
	TargetY      int
	TargetX      int
	TargetL      int

	SequentialLayers []VolumetricLayer

	// Tiling & GPU Config
	UseTiling bool
	TileSize  int
	UseGPU    bool

	IsGPUResident        bool
	IsKVCacheGPUResident bool

	Observer PolyObserver

	// KV Cache (for MHA)
	KVCacheK  *Tensor[float32]
	KVCacheV  *Tensor[float32]
	KVOffset  int
	MaxSeqLen int

	// Persistent GPU KV buffers
	GPUKVCacheK any // *wgpu.Buffer
	GPUKVCacheV any // *wgpu.Buffer
}

VolumetricLayer represents a processing unit in the 3D volumetric grid.

func CreateResidualGraft ¶

func CreateResidualGraft(main *VolumetricNetwork) *VolumetricLayer

CreateResidualGraft wraps a network in a residual block.

func GraftNetworksPolymorphic ¶

func GraftNetworksPolymorphic(networks []*VolumetricNetwork, combineMode string) (*VolumetricLayer, error)

GraftNetworksPolymorphic takes multiple heterogeneous VolumetricNetworks and grafts their specific layers into a single parallel layer within a new network.

func ReconstructCNNLayer ¶

func ReconstructCNNLayer(name string, tensors []DetectedTensor, ltype LayerType) (*VolumetricLayer, error)

ReconstructCNNLayer attempts to build a VolumetricLayer of type CNN from grouped tensors.

func ReconstructLayerNormLayer ¶

func ReconstructLayerNormLayer(name string, tensors []DetectedTensor, dModel int) (*VolumetricLayer, error)

ReconstructLayerNormLayer builds a LayerNorm layer.

func ReconstructMHALayer ¶

func ReconstructMHALayer(name string, tensors []DetectedTensor, dModel int, numHeads int) (*VolumetricLayer, error)

ReconstructMHALayer attempts to build a VolumetricLayer of type MultiHeadAttention from grouped tensors.

func ReconstructRMSNormLayer ¶

func ReconstructRMSNormLayer(name string, tensors []DetectedTensor, dModel int) (*VolumetricLayer, error)

ReconstructRMSNormLayer builds an RMSNorm layer.

func ReconstructSwiGLULayer ¶

func ReconstructSwiGLULayer(name string, tensors []DetectedTensor, dModel int) (*VolumetricLayer, error)

ReconstructSwiGLULayer builds a SwiGLU layer from gated MLP tensors.

func (*VolumetricLayer) SyncToCPU ¶

func (l *VolumetricLayer) SyncToCPU()

SyncToCPU releases GPU resources.

func (*VolumetricLayer) SyncToGPU ¶

func (l *VolumetricLayer) SyncToGPU() error

SyncToGPU mirrors active weights and KV caches to the GPU.

type VolumetricNetwork ¶

type VolumetricNetwork struct {
	Depth         int
	Rows          int
	Cols          int
	LayersPerCell int

	Layers []VolumetricLayer

	// Global Tiling & GPU Switches
	UseTiling bool
	UseGPU    bool

	// GPU Acceleration context
	GPUContext *WGPUContext

	// Persistent GPU buffers to avoid allocations
	GPUHiddenState []any // map[DType]wgpu.Buffer or similar, use any for now
	GPULogits      any   // wgpu.Buffer

	GPUEmbeddings any // *wgpu.Buffer
	GPULMHead     any // *wgpu.Buffer
}

VolumetricNetwork represents a 3D grid neural network.

func BuildCNN ¶

func BuildCNN(inputSize, numClasses int, dtype DType) *VolumetricNetwork

BuildCNN creates a simple convolutional network.

func BuildNetworkFromJSON ¶

func BuildNetworkFromJSON(jsonData []byte) (*VolumetricNetwork, error)

BuildNetworkFromJSON creates a VolumetricNetwork from JSON-encoded bytes.

func BuildRandomNetwork ¶

func BuildRandomNetwork(depth, rows, cols, lpc int, dModel int) *VolumetricNetwork

BuildRandomNetwork generates a diverse VolumetricNetwork.

func BuildSequentialNetwork ¶

func BuildSequentialNetwork(numLayers int, dModel int, act ActivationType, dtype DType) *VolumetricNetwork

func BuildTransformerNetwork ¶

func BuildTransformerNetwork(numBlocks int, dModel int, numHeads int, dtype DType) *VolumetricNetwork

BuildTransformerNetwork creates a stack of Transformer blocks.

func DeserializeNetwork ¶

func DeserializeNetwork(jsonData []byte) (*VolumetricNetwork, error)

DeserializeNetwork reconstructs a VolumetricNetwork from a JSON byte slice.

func LoadUniversal ¶

func LoadUniversal(path string) (*VolumetricNetwork, error)

LoadUniversal loads a model from a safetensors file and auto-detects its architecture.

func MountGeometrically ¶

func MountGeometrically(archs []LayerArchetype, geoms []TensorMeta) *VolumetricNetwork

MountGeometrically creates a VolumetricNetwork from archetypes and geometries.

func NEATMutate ¶ added in v0.74.0

func NEATMutate(n *VolumetricNetwork, cfg NEATConfig) *VolumetricNetwork

NEATMutate applies NEAT-style structural and weight mutations to a copy of n. The original network is never modified — a clone is returned.

Mutation sequence per layer:

- Weight perturbation: add small Gaussian noise to Master weights
- Activation mutation: randomly swap the activation function
- Node mutation: change layer type, reinitialize weights
- Layer toggle: flip IsDisabled (activate dormant / silence active)

Network-level mutations (applied once after the per-layer pass):

- Connection add: insert a remote link (IsRemoteLink spatial hop)
- Connection drop: remove an existing remote link

func NewVolumetricNetwork ¶

func NewVolumetricNetwork(depth, rows, cols, layersPerCell int) *VolumetricNetwork

NewVolumetricNetwork initializes a 3D grid of layers.

func SpliceDNA ¶ added in v0.74.0

func SpliceDNA(parentA, parentB *VolumetricNetwork, cfg SpliceConfig) *VolumetricNetwork

SpliceDNA merges two trained parent networks into a child network.

parentA is the structural template (grid dimensions, layer types are inherited). parentB contributes weights to matching layers, weighted by DNA similarity.

For each layer at coordinate (z,y,x,l):

- If both parents have the layer and their types match, weights are blended.
- If parentB has no matching layer, the child keeps parentA's weights.

The three blend strategies:

"blend"   — interpolate: child[i] = wA[i]*(1-α) + wB[i]*α
            α is modulated by cosine similarity and relative fitness.
"point"   — split at SplitRatio: first N weights from A, rest from B.
"uniform" — per-weight random pick from A or B, biased by fitness.

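The "blend" and "point" strategies can be sketched on flat weight slices as follows. The sketch takes α and the split ratio as plain parameters; how SpliceDNA derives α from cosine similarity and fitness, and how it rounds the split point, are not documented here, so those parts are assumptions. The function names are illustrative.

```go
package main

import "fmt"

// blend interpolates per weight: child[i] = wA[i]*(1-alpha) + wB[i]*alpha.
// Positions missing from wB keep parent A's value.
func blend(wA, wB []float32, alpha float32) []float32 {
	child := make([]float32, len(wA))
	for i := range wA {
		if i < len(wB) {
			child[i] = wA[i]*(1-alpha) + wB[i]*alpha
		} else {
			child[i] = wA[i]
		}
	}
	return child
}

// pointSplice takes the first n weights from A and the rest from B, where
// n = floor(splitRatio * len(wA)). The rounding rule is an assumption.
func pointSplice(wA, wB []float32, splitRatio float64) []float32 {
	child := make([]float32, len(wA))
	copy(child, wA)
	n := int(splitRatio * float64(len(wA)))
	if n < 0 {
		n = 0
	}
	for i := n; i < len(wA) && i < len(wB); i++ {
		child[i] = wB[i]
	}
	return child
}

func main() {
	a := []float32{1, 1, 1, 1}
	b := []float32{2, 2, 2, 2}
	fmt.Println(blend(a, b, 0.5))       // [1.5 1.5 1.5 1.5]
	fmt.Println(pointSplice(a, b, 0.5)) // [1 1 2 2]
}
```

The "uniform" strategy would replace the fixed split point with a per-weight random choice between the two parents.
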
func (*VolumetricNetwork) CalculateTotalMemory ¶

func (n *VolumetricNetwork) CalculateTotalMemory() int

CalculateTotalMemory returns the total size of all layers in bytes.

func (*VolumetricNetwork) GetIndex ¶

func (n *VolumetricNetwork) GetIndex(z, y, x, l int) int

GetIndex calculates the flattened index for the grid coordinate (z, y, x) and layer slot l.
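
A flattened index of this kind is often computed in row-major order. The sketch below assumes the ordering depth, then row, then column, with the layer slot varying fastest; the package's actual ordering is not documented here, so treat the formula as an assumption. The grid type and its methods are hypothetical.

```go
package main

import "fmt"

// grid mirrors the Depth/Rows/Cols/LayersPerCell dimensions of a network.
type grid struct{ depth, rows, cols, layersPerCell int }

// index flattens (z, y, x, l) in row-major order with l varying fastest.
// The ordering is an assumption made for this sketch.
func (g grid) index(z, y, x, l int) int {
	return ((z*g.rows+y)*g.cols+x)*g.layersPerCell + l
}

// coords inverts index under the same assumed ordering.
func (g grid) coords(idx int) (z, y, x, l int) {
	l = idx % g.layersPerCell
	idx /= g.layersPerCell
	x = idx % g.cols
	idx /= g.cols
	y = idx % g.rows
	z = idx / g.rows
	return z, y, x, l
}

func main() {
	g := grid{depth: 2, rows: 3, cols: 4, layersPerCell: 5}
	idx := g.index(1, 2, 3, 4)
	fmt.Println(idx) // 119
	fmt.Println(g.coords(idx))
}
```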

func (*VolumetricNetwork) GetLayer ¶

func (n *VolumetricNetwork) GetLayer(z, y, x, l int) *VolumetricLayer

GetLayer returns the layer at grid coordinate (z, y, x), layer slot l.

func (*VolumetricNetwork) GetMethodSignature ¶

func (n *VolumetricNetwork) GetMethodSignature(methodName string) (string, error)

GetMethodSignature returns the signature of a specific method.

func (*VolumetricNetwork) GetMethods ¶

func (n *VolumetricNetwork) GetMethods() ([]MethodInfo, error)

GetMethods retrieves all public methods of the VolumetricNetwork struct.

func (*VolumetricNetwork) GetMethodsJSON ¶

func (n *VolumetricNetwork) GetMethodsJSON() (string, error)

GetMethodsJSON returns a JSON string containing all methods attached to the VolumetricNetwork struct.

func (*VolumetricNetwork) HasMethod ¶

func (n *VolumetricNetwork) HasMethod(methodName string) bool

HasMethod checks if a method exists on the VolumetricNetwork.

func (*VolumetricNetwork) InitCNNCell ¶

func (n *VolumetricNetwork) InitCNNCell(z, y, x, l int, ltype LayerType, inChannels, filters, kSize int, dtype DType, scale float32)

func (*VolumetricNetwork) InitConvTransposedCell ¶

func (n *VolumetricNetwork) InitConvTransposedCell(z, y, x, l int, ltype LayerType, inChannels, filters, kSize int, dtype DType, scale float32)

func (*VolumetricNetwork) InitDenseCell ¶

func (n *VolumetricNetwork) InitDenseCell(z, y, x, l int, dModel int, act ActivationType, scale float32)

func (*VolumetricNetwork) InitEmbeddingCell ¶

func (n *VolumetricNetwork) InitEmbeddingCell(z, y, x, l int, vocabSize, dModel int, dtype DType)

func (*VolumetricNetwork) InitKMeansCell ¶

func (n *VolumetricNetwork) InitKMeansCell(z, y, x, l int, numClusters, dModel int, dtype DType)

func (*VolumetricNetwork) InitLSTMCell ¶

func (n *VolumetricNetwork) InitLSTMCell(z, y, x, l int, dModel int, scale float32)

func (*VolumetricNetwork) InitLayerNormCell ¶

func (n *VolumetricNetwork) InitLayerNormCell(z, y, x, l int, size int, dtype DType)

func (*VolumetricNetwork) InitMHACell ¶

func (n *VolumetricNetwork) InitMHACell(z, y, x, l int, dModel, numHeads int, scale float32)

func (*VolumetricNetwork) InitRNNCell ¶

func (n *VolumetricNetwork) InitRNNCell(z, y, x, l int, dModel int, scale float32)

func (*VolumetricNetwork) InitWGPU ¶

func (n *VolumetricNetwork) InitWGPU() error

InitWGPU initializes the WebGPU context for the network.

func (*VolumetricNetwork) ListMethods ¶

func (n *VolumetricNetwork) ListMethods() []string

ListMethods returns a simple list of all public method names.

func (*VolumetricNetwork) SyncAllToGPU ¶

func (n *VolumetricNetwork) SyncAllToGPU() error

SyncAllToGPU mirrors the entire network state to VRAM.

func (*VolumetricNetwork) SyncToGPU ¶ added in v0.73.0

func (n *VolumetricNetwork) SyncToGPU() error

SyncToGPU mirrors all layers to the GPU.

type WGPUActivationParams ¶ added in v0.73.0

type WGPUActivationParams struct {
	Size uint32
	Act  uint32
	// contains filtered or unexported fields
}

type WGPUApplyGradientsParams ¶ added in v0.73.0

type WGPUApplyGradientsParams struct {
	Size uint32
	LR   float32
	// contains filtered or unexported fields
}

type WGPUCNN1BackwardParams ¶ added in v0.73.0

type WGPUCNN1BackwardParams struct {
	BatchSize  uint32
	InC        uint32
	InL        uint32
	Filters    uint32
	OutL       uint32
	KSize      uint32
	Stride     uint32
	Padding    uint32
	Activation uint32
}

type WGPUCNN1Params ¶

type WGPUCNN1Params struct {
	BatchSize uint32
	InC       uint32
	InL       uint32
	OutC      uint32
	OutL      uint32
	KSize     uint32
	Stride    uint32
	Padding   uint32
}

type WGPUCNN2BackwardParams ¶ added in v0.73.0

type WGPUCNN2BackwardParams struct {
	BatchSize  uint32
	InC        uint32
	InH        uint32
	InW        uint32
	Filters    uint32
	OutH       uint32
	OutW       uint32
	KSize      uint32
	Stride     uint32
	Padding    uint32
	Activation uint32
}

type WGPUCNN2Params ¶

type WGPUCNN2Params struct {
	BatchSize uint32
	InC       uint32
	InH       uint32
	InW       uint32
	OutC      uint32
	OutH      uint32
	OutW      uint32
	KH        uint32
	KW        uint32
	StrideH   uint32
	StrideW   uint32
	PadH      uint32
	PadW      uint32
}

type WGPUCNN3BackwardParams ¶ added in v0.73.0

type WGPUCNN3BackwardParams struct {
	BatchSize  uint32
	InC        uint32
	InD        uint32
	InH        uint32
	InW        uint32
	Filters    uint32
	OutD       uint32
	OutH       uint32
	OutW       uint32
	KSize      uint32
	Stride     uint32
	Padding    uint32
	Activation uint32
}

type WGPUCNN3Params ¶

type WGPUCNN3Params struct {
	BatchSize              uint32
	InC, InD, InH, InW     uint32
	OutC, OutD, OutH, OutW uint32
	KD, KH, KW             uint32
	SD, SH, SW             uint32
	PD, PH, PW             uint32
}

type WGPUContext ¶

type WGPUContext struct {
	Instance       *wgpu.Instance
	Adapter        *wgpu.Adapter
	Device         *wgpu.Device
	Queue          *wgpu.Queue
	PipelineCache  map[string]*wgpu.ComputePipeline
	ActivationPool map[string]*wgpu.Buffer
	// GPUTileSize is the auto-detected optimal tile size for this GPU.
	// Can be overridden by the caller after init.
	GPUTileSize int
	// ActiveEncoder, when non-nil, is used by all Dispatch* calls instead of
	// creating their own encoder. This lets the entire forward pass be recorded
	// into a single command buffer and submitted once, reducing GPU overhead.
	ActiveEncoder *wgpu.CommandEncoder
	// PendingDestroys holds temporary uniform buffers that must not be destroyed
	// until after FlushFrame() submits the active encoder. When not batching,
	// buffers are destroyed immediately instead of queued here.
	PendingDestroys []*wgpu.Buffer

	// --- Performance Optimization Caches ---
	LayoutCache    map[string]*wgpu.BindGroupLayout
	BindGroupCache map[uint64]*wgpu.BindGroup

	// Uniform Pool
	UniformPool []*wgpu.Buffer
	UniformIdx  int

	// Negotiated limits
	Limits wgpu.Limits
}

WGPUContext manages the GPU device and queue for acceleration.

func (*WGPUContext) BeginFrame ¶

func (c *WGPUContext) BeginFrame() error

BeginFrame creates a shared CommandEncoder that all subsequent Dispatch* calls will record into until FlushFrame is called.

func (*WGPUContext) CreateComputePipeline ¶

func (c *WGPUContext) CreateComputePipeline(shaderSource string) (*wgpu.ComputePipeline, error)

func (*WGPUContext) CreatePersistentBuffer ¶

func (c *WGPUContext) CreatePersistentBuffer(data []float32, label string) (*wgpu.Buffer, error)

CreatePersistentBuffer creates a storage buffer that stays in VRAM.

func (*WGPUContext) DispatchActivation ¶ added in v0.73.0

func (c *WGPUContext) DispatchActivation(size int, act ActivationType, inputBuf, outputBuf *wgpu.Buffer) error

func (*WGPUContext) DispatchActivationBackward ¶ added in v0.73.0

func (c *WGPUContext) DispatchActivationBackward(size int, act ActivationType, gradOutBuf, preActBuf, gradInBuf *wgpu.Buffer) error

func (*WGPUContext) DispatchApplyGradients ¶ added in v0.73.0

func (c *WGPUContext) DispatchApplyGradients(size int, lr float32, weightBuf, gradBuf *wgpu.Buffer) error

func (*WGPUContext) DispatchBackwardLayer ¶ added in v0.73.0

func (c *WGPUContext) DispatchBackwardLayer(l *VolumetricLayer, batchSize int, gradOutBuf, inputBuf, preActBuf, dxBuf, dwBuf *wgpu.Buffer) error

func (*WGPUContext) DispatchCNN1 ¶

func (c *WGPUContext) DispatchCNN1(
	batchSize, inC, inL, outC, outL, kSize, stride, padding int,
	inputBuf, weightBuf, outputBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchCNN1BackwardDW ¶ added in v0.73.0

func (c *WGPUContext) DispatchCNN1BackwardDW(
	batchSize, inC, inL, filters, outL, kSize, stride, padding int,
	activation ActivationType,
	gradOutputBuf, inputBuf, preActBuf, gradWeightBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchCNN1BackwardDX ¶ added in v0.73.0

func (c *WGPUContext) DispatchCNN1BackwardDX(
	batchSize, inC, inL, filters, outL, kSize, stride, padding int,
	activation ActivationType,
	gradOutputBuf, weightBuf, preActBuf, gradInputBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchCNN2 ¶

func (c *WGPUContext) DispatchCNN2(
	batchSize, inC, inH, inW, outC, outH, outW, kH, kW, strideH, strideW, padH, padW int,
	inputBuf, weightBuf, outputBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchCNN2BackwardDW ¶ added in v0.73.0

func (c *WGPUContext) DispatchCNN2BackwardDW(
	batchSize, inC, inH, inW, filters, outH, outW, kSize, stride, padding int,
	activation ActivationType,
	gradOutputBuf, inputBuf, preActBuf, gradWeightBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchCNN2BackwardDX ¶ added in v0.73.0

func (c *WGPUContext) DispatchCNN2BackwardDX(
	batchSize, inC, inH, inW, filters, outH, outW, kSize, stride, padding int,
	activation ActivationType,
	gradOutputBuf, weightBuf, preActBuf, gradInputBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchCNN3 ¶

func (c *WGPUContext) DispatchCNN3(
	batchSize, inC, inD, inH, inW, outC, outD, outH, outW, kD, kH, kW, sD, sH, sW, pD, pH, pW int,
	inputBuf, weightBuf, outputBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchCNN3BackwardDW ¶ added in v0.73.0

func (c *WGPUContext) DispatchCNN3BackwardDW(
	batchSize, inC, inD, inH, inW, filters, outD, outH, outW, kSize, stride, padding int,
	activation ActivationType,
	gradOutputBuf, inputBuf, preActBuf, gradWeightBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchCNN3BackwardDX ¶ added in v0.73.0

func (c *WGPUContext) DispatchCNN3BackwardDX(
	batchSize, inC, inD, inH, inW, filters, outD, outH, outW, kSize, stride, padding int,
	activation ActivationType,
	gradOutputBuf, weightBuf, preActBuf, gradInputBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchDense ¶

func (c *WGPUContext) DispatchDense(
	batchSize, inputSize, outputSize int,
	inputBuf, weightBuf, outputBuf *wgpu.Buffer,
	tileSize int,
) error

DispatchDense dispatches a tiled dense matrix-multiply kernel.

func (*WGPUContext) DispatchDenseBackwardDW ¶ added in v0.73.0

func (c *WGPUContext) DispatchDenseBackwardDW(
	batchSize, inputSize, outputSize int,
	gradOutputBuf, inputBuf, gradWeightBuf *wgpu.Buffer,
	tileSize int,
) error

DispatchDenseBackwardDW calculates gradWeights = gradOutput^T * input

func (*WGPUContext) DispatchDenseBackwardDX ¶ added in v0.73.0

func (c *WGPUContext) DispatchDenseBackwardDX(
	batchSize, inputSize, outputSize int,
	gradOutputBuf, weightBuf, gradInputBuf *wgpu.Buffer,
	tileSize int,
) error

DispatchDenseBackwardDX calculates gradInput = gradOutput * weights
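
The two dense backward products, gradWeights = gradOutput^T * input and gradInput = gradOutput * weights, can be checked against a small CPU reference. The sketch assumes row-major flat slices with gradOutput as [batch x out], input as [batch x in], and weights as [out x in]; the GPU kernels' actual buffer layout is not documented here. denseBackward is a hypothetical helper.

```go
package main

import "fmt"

// denseBackward computes both gradients for a dense layer on the CPU.
// Assumed row-major layouts: gradOutput [batch x out], input [batch x in],
// weights [out x in]. Results: gradInput [batch x in], gradWeights [out x in].
func denseBackward(batch, in, out int, gradOutput, input, weights []float32) (gradInput, gradWeights []float32) {
	gradInput = make([]float32, batch*in)
	gradWeights = make([]float32, out*in)
	for b := 0; b < batch; b++ {
		for o := 0; o < out; o++ {
			g := gradOutput[b*out+o]
			for i := 0; i < in; i++ {
				gradWeights[o*in+i] += g * input[b*in+i] // gradOutput^T * input
				gradInput[b*in+i] += g * weights[o*in+i] // gradOutput * weights
			}
		}
	}
	return gradInput, gradWeights
}

func main() {
	dX, dW := denseBackward(1, 2, 1,
		[]float32{2},    // gradOutput
		[]float32{3, 4}, // input
		[]float32{5, 6}, // weights
	)
	fmt.Println(dX, dW) // [10 12] [6 8]
}
```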

func (*WGPUContext) DispatchDenseQ4 ¶

func (c *WGPUContext) DispatchDenseQ4(
	batchSize, inputSize, outputSize int,
	inputBuf, scaleBuf, weightBuf, outputBuf *wgpu.Buffer,
	tileSize int,
) error

DispatchDenseQ4 dispatches a tiled dense kernel that dequantizes Q4_0 weights.

func (*WGPUContext) DispatchEmbedding ¶

func (c *WGPUContext) DispatchEmbedding(
	vocabSize, hiddenSize, numTokens int,
	indicesBuf, weightsBuf, outputBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchEmbeddingBackward ¶ added in v0.73.0

func (c *WGPUContext) DispatchEmbeddingBackward(
	vocabSize, hiddenSize, numTokens int,
	indicesBuf, gradOutputBuf, gradWeightBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchForwardLayer ¶ added in v0.73.0

func (c *WGPUContext) DispatchForwardLayer(l *VolumetricLayer, batchSize int, inputBuf, outBuf *wgpu.Buffer) error

func (*WGPUContext) DispatchKVUpdate ¶

func (c *WGPUContext) DispatchKVUpdate(
	offset, headDim, maxSeqLen, numKVHeads, numTokens int,
	kCache, vCache, newK, newV *wgpu.Buffer,
) error

func (*WGPUContext) DispatchLSTMStep ¶

func (c *WGPUContext) DispatchLSTMStep(
	batchSize, inputSize, hiddenSize int,
	inputBuf, hPrevBuf, cPrevBuf, weightBuf, hCurrBuf, cCurrBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchMHA ¶

func (c *WGPUContext) DispatchMHA(
	numHeads, numKVHeads, headDim, seqLen, kvOffset, maxSeqLen int,
	qBuf, kBuf, vBuf, oBuf *wgpu.Buffer,
	tileSize int,
) error

DispatchMHA dispatches the tiled multi-head attention kernel.

func (*WGPUContext) DispatchMHABackward ¶ added in v0.73.0

func (c *WGPUContext) DispatchMHABackward(
	batchSize, numHeads, numKVHeads, headDim, seqLen int, scale float32,
	gradOutputBuf, qBuf, kBuf, vBuf, dQBuf, dKBuf, dVBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchMSEGradPartialLoss ¶ added in v0.73.0

func (c *WGPUContext) DispatchMSEGradPartialLoss(
	size int,
	outputBuf, targetBuf, gradBuf, partialsBuf *wgpu.Buffer,
) error

DispatchMSEGradPartialLoss computes MSE gradients on the GPU and writes partial loss sums: numWG = ceil(size/256) partial sums are written to partialsBuf, and the CPU adds them to obtain the total loss.
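
The partial-sum scheme can be emulated on the CPU to show how the pieces fit together. Only the workgroup size of 256 comes from the description; the gradient scaling 2*(output-target)/size and the final division by size are assumed conventions, and all function names are hypothetical.

```go
package main

import "fmt"

const workgroupSize = 256 // from the "ceil(size/256)" note in the description

// numWorkgroups returns ceil(size / workgroupSize) using integer arithmetic.
func numWorkgroups(size int) int {
	return (size + workgroupSize - 1) / workgroupSize
}

// mseGradPartials emulates the kernel: it fills the gradient and writes one
// partial sum of squared errors per workgroup. output and target must have
// equal length. The 2/size gradient scaling is an assumed convention.
func mseGradPartials(output, target []float32) (grad, partials []float32) {
	size := len(output)
	grad = make([]float32, size)
	partials = make([]float32, numWorkgroups(size))
	for i := 0; i < size; i++ {
		d := output[i] - target[i]
		grad[i] = 2 * d / float32(size)
		partials[i/workgroupSize] += d * d
	}
	return grad, partials
}

// totalLoss is the CPU-side reduction: sum the partials, then average.
func totalLoss(partials []float32, size int) float32 {
	var sum float32
	for _, p := range partials {
		sum += p
	}
	return sum / float32(size)
}

func main() {
	output := make([]float32, 600)
	target := make([]float32, 600)
	for i := range output {
		output[i] = 1
	}
	_, partials := mseGradPartials(output, target)
	fmt.Println(len(partials), totalLoss(partials, len(output))) // 3 1
}
```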

func (*WGPUContext) DispatchRMSNorm ¶

func (c *WGPUContext) DispatchRMSNorm(
	batchSize, size int, epsilon float32,
	inputBuf, weightBuf, outputBuf *wgpu.Buffer,
) error

DispatchRMSNorm dispatches the RMSNorm kernel.

func (*WGPUContext) DispatchRMSNormBackward ¶ added in v0.73.0

func (c *WGPUContext) DispatchRMSNormBackward(
	batchSize, size int, epsilon float32,
	gradOutputBuf, inputBuf, rmsBuf, weightBuf, gradInputBuf, gradWeightBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchRNNStep ¶

func (c *WGPUContext) DispatchRNNStep(
	batchSize, inputSize, hiddenSize int,
	inputBuf, hPrevBuf, wIHBuf, wHHBuf, biasBuf, hCurrBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchResidual ¶

func (c *WGPUContext) DispatchResidual(
	size int,
	inputBuf, residualBuf *wgpu.Buffer,
) error

DispatchResidual dispatches the element-wise addition kernel.

func (*WGPUContext) DispatchResidualBackward ¶ added in v0.73.0

func (c *WGPUContext) DispatchResidualBackward(
	size int,
	gradOutputBuf, gradInputBuf, gradResidualBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchRoPE ¶

func (c *WGPUContext) DispatchRoPE(
	seqLen, headDim, numHeads, offset int, theta float32,
	targetBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchSwiGLU ¶

func (c *WGPUContext) DispatchSwiGLU(
	batchSize, inputSize, outputSize int,
	inputBuf, gateBuf, upBuf, outputBuf *wgpu.Buffer,
	tileSize int,
) error

DispatchSwiGLU dispatches the tiled SwiGLU MLP kernel.

func (*WGPUContext) DispatchSwiGLUBackward ¶ added in v0.73.0

func (c *WGPUContext) DispatchSwiGLUBackward(
	batchSize, inputSize, outputSize int,
	gradOutputBuf, gateInBuf, upInBuf, gradGateBuf, gradUpBuf *wgpu.Buffer,
) error

func (*WGPUContext) DispatchSwiGLUQ4 ¶

func (c *WGPUContext) DispatchSwiGLUQ4(
	batchSize, inputSize, outputSize int,
	inputBuf, gateScaleBuf, gateWeightBuf, upScaleBuf, upWeightBuf, outputBuf *wgpu.Buffer,
	tileSize int,
) error

DispatchSwiGLUQ4 dispatches a tiled SwiGLU kernel with Q4_0 weights.

func (*WGPUContext) FlushFrame ¶

func (c *WGPUContext) FlushFrame()

FlushFrame finishes and submits the shared CommandEncoder, then destroys any temporary uniform buffers that were kept alive for the duration of recording.

func (*WGPUContext) GetActivationBuffer ¶

func (c *WGPUContext) GetActivationBuffer(name string, size uint64, usage wgpu.BufferUsage) *wgpu.Buffer

GetActivationBuffer retrieves or creates a persistent activation buffer.

func (*WGPUContext) GetBindGroup ¶

func (c *WGPUContext) GetBindGroup(pipeline *wgpu.ComputePipeline, buffers ...*wgpu.Buffer) (*wgpu.BindGroup, error)

GetBindGroup retrieves or creates a BindGroup for the given pipeline and buffers.

func (*WGPUContext) GetUniformBuffer ¶

func (c *WGPUContext) GetUniformBuffer(size uint64) *wgpu.Buffer

GetUniformBuffer provides a pre-allocated uniform buffer from the pool.

func (*WGPUContext) ReadBuffer ¶

func (c *WGPUContext) ReadBuffer(buf *wgpu.Buffer) ([]float32, error)

ReadBuffer reads data from a GPU buffer back to a float32 slice.

func (*WGPUContext) Release ¶

func (c *WGPUContext) Release()

Release releases all WebGPU resources.

func (*WGPUContext) ResetCache ¶

func (c *WGPUContext) ResetCache()

ResetCache clears all cached BindGroups and Pipelines. It should be called whenever the model architecture or precision changes.

type WGPUDenseParams ¶

type WGPUDenseParams struct {
	BatchSize  uint32
	InputSize  uint32
	OutputSize uint32
	TileSize   uint32
}

WGPUDenseParams matches the layout of the corresponding WGSL struct.
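To illustrate what matching the WGSL struct means in practice, the sketch below serializes a Go struct of four `uint32` fields into the 16 little-endian bytes a shader would read as four `u32` values. `denseParams` is a local stand-in that mirrors the exported fields, and `encodeParams` is a hypothetical helper; how the package itself uploads the parameters is not shown in this documentation.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// denseParams is a local stand-in mirroring the exported fields of
// WGPUDenseParams; it is declared here so the example is self-contained.
type denseParams struct {
	BatchSize  uint32
	InputSize  uint32
	OutputSize uint32
	TileSize   uint32
}

// encodeParams is a hypothetical helper that lays the fields out
// back-to-back in little-endian order, 4 bytes per field.
func encodeParams(p denseParams) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, p); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	raw, err := encodeParams(denseParams{BatchSize: 1, InputSize: 512, OutputSize: 2048, TileSize: 16})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(raw), raw)
}
```

Field order and width have to agree on both sides: reordering a field in Go without changing the shader silently feeds the kernel wrong sizes.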

type WGPUEmbeddingParams ¶

type WGPUEmbeddingParams struct {
	VocabSize  uint32
	HiddenSize uint32
	NumTokens  uint32
	Padding    uint32
}

type WGPUKVParams ¶

type WGPUKVParams struct {
	Offset     uint32
	HeadDim    uint32
	MaxSeqLen  uint32
	NumKVHeads uint32
	NumTokens  uint32
}

type WGPULSTMParams ¶

type WGPULSTMParams struct {
	BatchSize  uint32
	InputSize  uint32
	HiddenSize uint32
	Padding    uint32
}

type WGPULossParams ¶ added in v0.73.0

type WGPULossParams struct {
	Size uint32
	// contains filtered or unexported fields
}

type WGPUMHABackwardParams ¶ added in v0.73.0

type WGPUMHABackwardParams struct {
	BatchSize  uint32
	NumHeads   uint32
	NumKVHeads uint32
	HeadDim    uint32
	SeqLen     uint32
	Scale      float32
	// contains filtered or unexported fields
}

type WGPUMHAParams ¶

type WGPUMHAParams struct {
	NumHeads   uint32
	NumKVHeads uint32
	HeadDim    uint32
	SeqLen     uint32
	KVOffset   uint32
	MaxSeqLen  uint32
	TileSize   uint32
	Padding    uint32
}

WGPUMHAParams matches the layout of the attention WGSL struct.

type WGPURMSNormParams ¶

type WGPURMSNormParams struct {
	Size    uint32
	Epsilon float32
	// contains filtered or unexported fields
}

WGPURMSNormParams matches the layout of the corresponding WGSL struct.

type WGPURNNParams ¶

type WGPURNNParams struct {
	BatchSize  uint32
	InputSize  uint32
	HiddenSize uint32
	Padding    uint32
}

type WGPURoPEParams ¶

type WGPURoPEParams struct {
	SeqLen   uint32
	HeadDim  uint32
	NumHeads uint32
	Offset   uint32
	Theta    float32
	// contains filtered or unexported fields
}

type WeightStore ¶

type WeightStore struct {
	Master     []float32              // Master FP32 weights (Source of Truth)
	Versions   map[DType]any          // Active versions (e.g., map[DTypeFP4][]byte)
	GPUWeights map[DType]any          // VRAM-resident versions (wgpu.Buffer)
	GPUScales  map[DType]*wgpu.Buffer // VRAM-resident scales for quantized types
	Scale      float32                // Quantization scale factor
}

WeightStore manages multiple numerical versions of the same weights. This is the core of "Polymorphic Layer-Morphing".

func NewWeightStore ¶

func NewWeightStore(size int) *WeightStore

NewWeightStore creates a new weight store of the given size.

func (*WeightStore) ApplyGradients ¶

func (ws *WeightStore) ApplyGradients(gradWeights *Tensor[float32], lr float32)

ApplyGradients performs a simple SGD update (weight = weight - lr * gradient). This is the "Learning" step that mutates the actual weights in the Master store.
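The update rule quoted above can be sketched on plain slices as follows. `sgdStep` is a hypothetical helper that stands in for the method; the length check is an addition for the example, and how the package handles mismatched shapes is not stated here.

```go
package main

import "fmt"

// sgdStep is a hypothetical helper, not part of the package.
// It applies master[i] -= lr * grad[i] in place.
func sgdStep(master, grad []float32, lr float32) error {
	if len(master) != len(grad) {
		return fmt.Errorf("length mismatch: %d weights, %d gradients", len(master), len(grad))
	}
	for i := range master {
		master[i] -= lr * grad[i]
	}
	return nil
}

func main() {
	w := []float32{1, -2, 0.5}
	g := []float32{0.5, -1, 0}
	if err := sgdStep(w, g, 0.1); err != nil {
		panic(err)
	}
	fmt.Println(w)
}
```

Because the update touches only the FP32 master copy, a lower-precision version produced earlier would no longer reflect the new weights until it is regenerated.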

func (*WeightStore) GetActive ¶

func (ws *WeightStore) GetActive(dtype DType) any

GetActive returns the data for the given DType if it exists.

func (*WeightStore) Morph ¶

func (ws *WeightStore) Morph(dtype DType)

Morph converts master weights into the target DType and caches the result.
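The convert-and-cache pattern can be illustrated with a small stand-in type. The sketch assumes symmetric per-tensor int8 quantization (`scale = max|w| / 127`), which is one common scheme and not necessarily the one the package uses; `miniStore`, `morphInt8`, and the `"int8"` map key are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// miniStore is a hypothetical stand-in for WeightStore: an FP32 master
// copy plus a cache of converted versions.
type miniStore struct {
	master   []float32
	versions map[string]any
	scale    float32
}

// morphInt8 converts the master weights to int8 once and caches the result.
// Assumed scheme: symmetric quantization with scale = max|w| / 127.
func (s *miniStore) morphInt8() []int8 {
	if v, ok := s.versions["int8"]; ok {
		return v.([]int8) // cache hit: no reconversion
	}
	var maxAbs float64
	for _, w := range s.master {
		maxAbs = math.Max(maxAbs, math.Abs(float64(w)))
	}
	s.scale = 1 // avoids dividing by zero for an all-zero tensor
	if maxAbs > 0 {
		s.scale = float32(maxAbs / 127)
	}
	q := make([]int8, len(s.master))
	for i, w := range s.master {
		q[i] = int8(math.Round(float64(w / s.scale)))
	}
	s.versions["int8"] = q
	return q
}

func main() {
	s := &miniStore{master: []float32{1, -0.25, 0}, versions: map[string]any{}}
	fmt.Println(s.morphInt8(), s.scale)
}
```

Caching means repeated calls are cheap, at the cost that the cached version has to be refreshed by some other mechanism after the master weights change.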

func (*WeightStore) Randomize ¶

func (ws *WeightStore) Randomize(seed int64, scale float32)

Randomize fills the master weights with small random values to break symmetry.

func (*WeightStore) SetVersion ¶

func (ws *WeightStore) SetVersion(dtype DType, data any)

SetVersion stores a converted version of weights.

func (*WeightStore) SizeInBytes ¶

func (ws *WeightStore) SizeInBytes(dtype DType) int

SizeInBytes calculates the memory footprint, in bytes, of the weight version for the given DType.

func (*WeightStore) Unpack ¶

func (ws *WeightStore) Unpack(dtype DType)

Unpack reconstructs master weights from a bit-packed native version.
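As one concrete case of bit-packing, the sketch below stores a single sign bit per weight, eight weights per byte, and then rebuilds `float32` values as `+scale` or `-scale`. The bit order and the sign convention are assumptions made for the example, and `packSigns` and `unpackSigns` are hypothetical helpers rather than package functions.

```go
package main

import "fmt"

// packSigns is a hypothetical helper: it stores one bit per weight,
// 1 for w >= 0 and 0 for w < 0, least significant bit first.
func packSigns(w []float32) []byte {
	out := make([]byte, (len(w)+7)/8)
	for i, v := range w {
		if v >= 0 {
			out[i/8] |= 1 << (i % 8)
		}
	}
	return out
}

// unpackSigns is a hypothetical helper: it rebuilds n float32 weights,
// mapping a set bit to +scale and a clear bit to -scale.
func unpackSigns(packed []byte, n int, scale float32) []float32 {
	out := make([]float32, n)
	for i := range out {
		if packed[i/8]&(1<<(i%8)) != 0 {
			out[i] = scale
		} else {
			out[i] = -scale
		}
	}
	return out
}

func main() {
	w := []float32{0.3, -0.2, 0.9, -1, 0.1, 0.2, -0.4, 0.5, -0.7}
	packed := packSigns(w)
	fmt.Println(packed, unpackSigns(packed, len(w), 0.5))
}
```

The round trip is lossy: only the sign and a shared magnitude survive, so reconstructed master weights approximate the originals rather than reproduce them.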
