ztensor

module v0.1.0
Published: Mar 16, 2026
License: Apache-2.0

README

ztensor

GPU-accelerated tensor, compute engine, and computation graph library for Go.

Part of the Zerfoo ML ecosystem.

Install

go get github.com/zerfoo/ztensor

What's included

  • tensor/ - Multi-type tensor storage (CPU, GPU, quantized, FP16, BF16, FP8)
  • compute/ - Engine interface with CPU, CUDA, ROCm, and OpenCL backends
  • graph/ - Computation graph compiler with fusion passes and CUDA graph capture
  • numeric/ - Type-safe arithmetic operations for all numeric types
  • device/ - Device abstraction and memory allocators
  • internal/cuda/ - Zero-CGo CUDA bindings via purego with 25+ custom kernels
  • internal/xblas/ - ARM NEON and x86 AVX2 SIMD assembly for GEMM, RMSNorm, RoPE, SiLU, softmax

Quick start

package main

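import (
    "fmt"
    "log"

    "github.com/zerfoo/ztensor/compute"
    "github.com/zerfoo/ztensor/numeric"
    "github.com/zerfoo/ztensor/tensor"
)

func main() {
    // CPU engine for float32, using the float32 arithmetic ops from numeric.
    eng := compute.NewCPUEngine[float32](numeric.Float32Ops{})

    // A is 2x3 and B is 3x2; values are listed row by row.
    a, err := tensor.New[float32]([]int{2, 3}, []float32{1, 2, 3, 4, 5, 6})
    if err != nil {
        log.Fatal(err)
    }
    b, err := tensor.New[float32]([]int{3, 2}, []float32{1, 2, 3, 4, 5, 6})
    if err != nil {
        log.Fatal(err)
    }

    // (2x3) x (3x2) gives a 2x2 result.
    c, err := eng.MatMul(a, b)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(c.Shape()) // [2 2]
    fmt.Println(c.Data())  // [22 28 49 64]
}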

License

Apache 2.0

Directories

Path Synopsis
compute - Package compute implements tensor computation engines and operations.
device - Package device provides device abstraction and memory allocation interfaces.
graph - Package graph provides a computational graph abstraction.
internal
  clblast - Package clblast provides Go wrappers for the CLBlast BLAS library.
  codegen - Package codegen generates CUDA megakernel source code from a compiled ExecutionPlan instruction tape.
  cublas - Package cublas provides low-level purego bindings for the cuBLAS library.
  cuda - Package cuda provides low-level bindings for the CUDA runtime API using dlopen/dlsym (no CGo).
  cuda/kernels - Package kernels provides Go wrappers for custom CUDA kernels.
  cudnn - Package cudnn provides purego bindings for the NVIDIA cuDNN library.
  gpuapi - Package gpuapi defines internal interfaces for GPU runtime operations.
  hip - Package hip provides low-level bindings for the AMD HIP runtime API using purego dlopen.
  hip/kernels - Package kernels provides Go wrappers for custom HIP kernels via purego dlopen.
  miopen - Package miopen provides low-level bindings for the AMD MIOpen library using purego dlopen.
  nccl - Package nccl provides CGo bindings for the NVIDIA Collective Communications Library (NCCL).
  opencl - Package opencl provides Go wrappers for the OpenCL 2.0 runtime API.
  opencl/kernels - Package kernels provides OpenCL kernel source and dispatch for elementwise operations.
  rocblas - Package rocblas provides low-level bindings for the AMD rocBLAS library using purego dlopen.
  tensorrt - Package tensorrt provides bindings for the NVIDIA TensorRT inference library via purego (dlopen/dlsym, no CGo).
  workerpool - Package workerpool provides a persistent pool of goroutines that process submitted tasks.
Package log provides a structured, leveled logging abstraction.
runtime - Package runtime provides a backend-agnostic metrics collection abstraction for runtime observability.
numeric - Package numeric provides precision types, arithmetic operations, and generic constraints for the Zerfoo ML framework.
tensor - Package tensor provides a multi-dimensional array (tensor) implementation.
testing
Package types contains shared, fundamental types for the Zerfoo framework.
