codegen

package

v0.2.0 Latest Published: Mar 16, 2026 License: Apache-2.0 Imports: 12 Imported by: 0

Repository

github.com/zerfoo/ztensor

Documentation ¶

Overview ¶

Package codegen generates CUDA megakernel source code from a compiled ExecutionPlan instruction tape. Each primitive op maps to a CUDA device function call that operates on register-resident or shared-memory data.

Index ¶

func CachedCompile(source, cacheDir, modelName string) (string, error)
func CheckSupport(instructions []graph.InstructionMeta) []string
func Emit(op graph.InstructionMeta, inputs []SlotInfo) (string, error)
func EmitMegakernel(cfg MegakernelConfig) (string, error)
func NvccPath() (string, error)
func Supported(opName string) bool
type FrozenSlotMeta
type MegakernelConfig
type MegakernelRunner
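- func LoadMegakernel(soPath string) (*MegakernelRunner, error)
- func (r *MegakernelRunner) Close() error
- func (r *MegakernelRunner) HasKVCache() bool
- func (r *MegakernelRunner) Launch(inputData []float32, pos int) ([]float32, error)
- func (r *MegakernelRunner) OutputShape() []int
- func (r *MegakernelRunner) PrepareWorkspace(cfg MegakernelConfig, frozenData [][]float32) error
- func (r *MegakernelRunner) SetKVCache(kvK, kvV unsafe.Pointer)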
type OpEmitter
type SlotInfo
type WorkspaceLayout
- func ComputeWorkspaceLayout(cfg MegakernelConfig) WorkspaceLayout

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func CachedCompile ¶

func CachedCompile(source, cacheDir, modelName string) (string, error)

CachedCompile compiles a CUDA source string to a shared library (.so), caching the result alongside the model. If the cached .so exists and the source hash matches, the cached version is returned immediately.

Parameters:

source: the CUDA source code string
cacheDir: directory for the cached .so and hash file
modelName: base name for the output files

Returns the path to the compiled .so file.

func CheckSupport ¶

func CheckSupport(instructions []graph.InstructionMeta) []string

CheckSupport verifies that all instructions in the tape have emitters. Returns the list of unsupported op names (empty if all are supported).
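
As a rough sketch of the idea, a support check can be a lookup against a registry of emitters keyed by op name. The types below are local stand-ins, not the package's own: instr is a hypothetical substitute for graph.InstructionMeta (whose fields are not shown on this page), and the registry contents and the deduplication of repeated names are assumptions.

```go
package main

import "fmt"

// instr is a hypothetical stand-in for graph.InstructionMeta.
type instr struct {
	OpName string
}

// emitter mirrors the role of OpEmitter in a simplified form.
type emitter func(op instr) (string, error)

// registry maps op names to emitters. The entries are illustrative only.
var registry = map[string]emitter{
	"MatMul":  func(instr) (string, error) { return "/* matmul */", nil },
	"Add":     func(instr) (string, error) { return "/* add */", nil },
	"RMSNorm": func(instr) (string, error) { return "/* rmsnorm */", nil },
}

// supported reports whether an emitter is registered for opName.
func supported(opName string) bool {
	_, ok := registry[opName]
	return ok
}

// checkSupport returns the op names on the tape that have no emitter,
// in first-seen order. Reporting each name once is a choice made here.
func checkSupport(tape []instr) []string {
	var missing []string
	seen := make(map[string]bool)
	for _, in := range tape {
		if supported(in.OpName) || seen[in.OpName] {
			continue
		}
		seen[in.OpName] = true
		missing = append(missing, in.OpName)
	}
	return missing
}

func main() {
	tape := []instr{{"MatMul"}, {"Add"}, {"Softmax"}, {"RMSNorm"}, {"Softmax"}}
	if missing := checkSupport(tape); len(missing) > 0 {
		fmt.Println("cannot emit megakernel, unsupported ops:", missing)
		return
	}
	fmt.Println("all ops supported")
}
```

Running the check before emission lets a caller fall back to another execution path with a complete list of missing ops, instead of stopping at the first failure.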

func Emit ¶

func Emit(op graph.InstructionMeta, inputs []SlotInfo) (string, error)

Emit generates CUDA code for a single instruction. Returns an error if the op is unsupported.

func EmitMegakernel ¶

func EmitMegakernel(cfg MegakernelConfig) (string, error)

EmitMegakernel generates a complete CUDA .cu source string from the compiled instruction tape. Returns an error if any op is unsupported.

The generated kernel uses a flat workspace buffer where each slot occupies a contiguous region. Frozen slots (model weights) are passed as a separate pointer array. A host-callable launch wrapper is emitted for dlopen/dlsym.

func NvccPath ¶

func NvccPath() (string, error)

NvccPath returns the path to the nvcc compiler, or an error if not found.

func Supported ¶

func Supported(opName string) bool

Supported returns true if the op has a registered emitter.

Types ¶

type FrozenSlotMeta ¶

type FrozenSlotMeta struct {
	SlotIdx int
}

FrozenSlotMeta describes a frozen (constant/weight) slot for the emitter.

type MegakernelConfig ¶

type MegakernelConfig struct {
	Instructions []graph.InstructionMeta
	SlotShapes   [][]int
	FrozenSlots  []FrozenSlotMeta
	InputSlots   []int
	OutputSlot   int
	NumKVLayers  int // number of KV cache layers (0 = no KV cache)
}

MegakernelConfig holds all information needed to emit a megakernel .cu file.

type MegakernelRunner ¶

type MegakernelRunner struct {
	// contains filtered or unexported fields
}

MegakernelRunner manages a compiled megakernel .so and its GPU resources.

func LoadMegakernel ¶

func LoadMegakernel(soPath string) (*MegakernelRunner, error)

LoadMegakernel opens a compiled megakernel .so and resolves the launch symbol.

func (*MegakernelRunner) Close ¶

func (r *MegakernelRunner) Close() error

Close releases all GPU resources.

func (*MegakernelRunner) HasKVCache ¶

func (r *MegakernelRunner) HasKVCache() bool

HasKVCache reports whether KV cache pointers have been configured.

func (*MegakernelRunner) Launch ¶

func (r *MegakernelRunner) Launch(inputData []float32, pos int) ([]float32, error)

Launch runs the megakernel on inputData and returns the output. When a KV cache has been configured via SetKVCache, pos serves as both the rotary embedding position and the KV cache sequence position.

func (*MegakernelRunner) OutputShape ¶

func (r *MegakernelRunner) OutputShape() []int

OutputShape returns the shape of the megakernel output slot.

func (*MegakernelRunner) PrepareWorkspace ¶

func (r *MegakernelRunner) PrepareWorkspace(cfg MegakernelConfig, frozenData [][]float32) error

PrepareWorkspace allocates GPU memory for the workspace and frozen slots. frozenData provides the float32 data for each frozen slot, indexed by position in cfg.FrozenSlots (not by slot index).
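
The indexing rule is easy to get wrong when weights are stored by slot index. The sketch below shows one way to build a frozenData slice in the required order. It uses a local copy of the FrozenSlotMeta shape, and the helper orderFrozenData and the map of weights keyed by slot index are hypothetical.

```go
package main

import "fmt"

// frozenSlotMeta is a local copy of the documented FrozenSlotMeta shape.
type frozenSlotMeta struct {
	SlotIdx int
}

// orderFrozenData arranges weights so that entry i holds the data for
// frozen[i], matching the "indexed by position in cfg.FrozenSlots" rule.
// weights is keyed by slot index; that storage choice is an assumption.
func orderFrozenData(frozen []frozenSlotMeta, weights map[int][]float32) ([][]float32, error) {
	out := make([][]float32, len(frozen))
	for i, fs := range frozen {
		data, ok := weights[fs.SlotIdx]
		if !ok {
			return nil, fmt.Errorf("no weight data for frozen slot %d", fs.SlotIdx)
		}
		out[i] = data
	}
	return out, nil
}

func main() {
	// Frozen slots listed in an order that differs from their slot indices.
	frozen := []frozenSlotMeta{{SlotIdx: 7}, {SlotIdx: 3}}
	weights := map[int][]float32{
		3: {0.5, 0.25},
		7: {1, 2, 3},
	}

	frozenData, err := orderFrozenData(frozen, weights)
	if err != nil {
		panic(err)
	}
	for i, d := range frozenData {
		fmt.Printf("frozenData[%d] -> slot %d, %d values\n", i, frozen[i].SlotIdx, len(d))
	}
}
```

Here frozenData[0] carries the data for slot 7 because slot 7 comes first in the frozen list, even though its slot index is larger.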

func (*MegakernelRunner) SetKVCache ¶

func (r *MegakernelRunner) SetKVCache(kvK, kvV unsafe.Pointer)

SetKVCache configures the runner to pass KV cache device pointers to the megakernel. kvK and kvV are GPU arrays of float* pointers, one per layer.

type OpEmitter ¶

type OpEmitter func(op graph.InstructionMeta, inputs []SlotInfo) (string, error)

OpEmitter generates CUDA code for a single instruction. It returns a code fragment that will be inserted into the megakernel body.
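
To make the shape of an emitter concrete, here is a minimal sketch of one for an element-wise add. Everything specific in it is an assumption: instr stands in for graph.InstructionMeta with invented fields, slotInfo is a local copy of SlotInfo, and the emitted dev_add call and slot_N variable names are hypothetical rather than the package's actual CUDA conventions.

```go
package main

import "fmt"

// slotInfo is a local copy of the documented SlotInfo shape.
type slotInfo struct {
	Shape []int
}

// instr is a hypothetical stand-in for graph.InstructionMeta.
type instr struct {
	OpName string
	Inputs []int // input slot indices (assumed field)
	Output int   // output slot index (assumed field)
}

// numElems returns the number of elements in a tensor of the given shape.
func numElems(shape []int) int {
	n := 1
	for _, d := range shape {
		n *= d
	}
	return n
}

// emitAdd has the same signature shape as OpEmitter and returns one CUDA
// statement. The dev_add device function name is invented for illustration.
func emitAdd(op instr, inputs []slotInfo) (string, error) {
	if len(op.Inputs) != 2 || len(inputs) != 2 {
		return "", fmt.Errorf("%s: expected 2 inputs, got %d", op.OpName, len(inputs))
	}
	n := numElems(inputs[0].Shape)
	if m := numElems(inputs[1].Shape); m != n {
		return "", fmt.Errorf("%s: element counts differ (%d vs %d)", op.OpName, n, m)
	}
	return fmt.Sprintf("dev_add(slot_%d, slot_%d, slot_%d, %d);",
		op.Output, op.Inputs[0], op.Inputs[1], n), nil
}

func main() {
	op := instr{OpName: "Add", Inputs: []int{0, 1}, Output: 2}
	shapes := []slotInfo{{Shape: []int{4, 4}}, {Shape: []int{4, 4}}}

	frag, err := emitAdd(op, shapes)
	if err != nil {
		panic(err)
	}
	fmt.Println(frag)
}
```

Because shapes are known at emission time, the element count can be baked into the fragment as a literal, and shape mismatches surface as Go errors before any CUDA is compiled.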

type SlotInfo ¶

type SlotInfo struct {
	Shape []int
}

SlotInfo describes a slot's shape for the emitter.

type WorkspaceLayout ¶

type WorkspaceLayout struct {
	SlotOffsets  map[int]int // slot index -> offset in workspace (element count)
	TotalSize    int         // total workspace size in elements
	InputOffset  int         // offset of first input slot
	OutputOffset int         // offset of output slot
	OutputSize   int         // size of output slot in elements
}

WorkspaceLayout describes the memory layout for megakernel slot buffers. Frozen slots are not part of the workspace; they are passed through their own pointer array.

func ComputeWorkspaceLayout ¶

func ComputeWorkspaceLayout(cfg MegakernelConfig) WorkspaceLayout

ComputeWorkspaceLayout computes the workspace memory layout for the megakernel. Frozen slots are excluded since they use a separate pointer array. The layout is deterministic (sorted by slot index).
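
A layout of this kind can be sketched as a single pass over the slots in increasing index order, skipping frozen ones and packing the rest back to back. The code below is an illustration under stated assumptions, not the package's algorithm: it assumes SlotShapes is indexed by slot index, that slots are packed without alignment padding, and it uses local types (workspaceLayout, computeLayout) that only mirror some of the documented fields.

```go
package main

import "fmt"

// workspaceLayout mirrors a subset of the documented WorkspaceLayout fields.
type workspaceLayout struct {
	SlotOffsets  map[int]int // slot index -> offset in elements
	TotalSize    int         // total workspace size in elements
	OutputOffset int         // offset of the output slot
	OutputSize   int         // size of the output slot in elements
}

// numElems returns the number of elements in a tensor of the given shape.
func numElems(shape []int) int {
	n := 1
	for _, d := range shape {
		n *= d
	}
	return n
}

// computeLayout packs non-frozen slots contiguously in slot-index order.
// Tight packing with no padding is an assumption made for this sketch.
func computeLayout(slotShapes [][]int, frozenSlots []int, outputSlot int) workspaceLayout {
	frozen := make(map[int]bool, len(frozenSlots))
	for _, idx := range frozenSlots {
		frozen[idx] = true
	}

	l := workspaceLayout{SlotOffsets: make(map[int]int)}
	// Ranging over the slice visits indices in increasing order, so the
	// result does not depend on map iteration order.
	for idx, shape := range slotShapes {
		if frozen[idx] {
			continue // frozen slots live in a separate pointer array
		}
		l.SlotOffsets[idx] = l.TotalSize
		l.TotalSize += numElems(shape)
	}
	l.OutputOffset = l.SlotOffsets[outputSlot]
	l.OutputSize = numElems(slotShapes[outputSlot])
	return l
}

func main() {
	shapes := [][]int{{2, 3}, {4}, {3, 3}, {1, 8}}
	l := computeLayout(shapes, []int{1}, 3) // slot 1 is a frozen weight

	for idx := range shapes {
		if off, ok := l.SlotOffsets[idx]; ok {
			fmt.Printf("slot %d at offset %d\n", idx, off)
		}
	}
	fmt.Println("total elements:", l.TotalSize)
}
```

With these shapes, slot 0 starts at offset 0, slot 2 at 6, and slot 3 at 15, for a total of 23 elements; slot 1 receives no offset because it is frozen.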
