codegen

package
v0.2.0 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 16, 2026 License: Apache-2.0 Imports: 12 Imported by: 0

Documentation

Overview

Package codegen generates CUDA megakernel source code from a compiled ExecutionPlan instruction tape. Each primitive op maps to a CUDA device function call that operates on register-resident or shared-memory data.

Functions

func CachedCompile

func CachedCompile(source, cacheDir, modelName string) (string, error)

CachedCompile compiles a CUDA source string to a shared library (.so) and caches the result in cacheDir, alongside the model. If a cached .so exists and its stored hash matches the hash of source, the cached path is returned immediately without recompiling.

Parameters:

  • source: the CUDA source code string
  • cacheDir: directory for the cached .so and hash file
  • modelName: base name for the output files

Returns the path to the compiled .so file.
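
The cache check described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the use of SHA-256, the "<modelName>.so" and "<modelName>.hash" file names, and the helper names sourceHash and cachedLibrary are all assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"os"
	"path/filepath"
	"strings"
)

// sourceHash returns the hex-encoded SHA-256 digest of the CUDA source.
// SHA-256 is an assumption; the package does not document its hash function.
func sourceHash(source string) string {
	sum := sha256.Sum256([]byte(source))
	return hex.EncodeToString(sum[:])
}

// cachedLibrary reports the path of a reusable .so, if one exists.
// The "<modelName>.so" and "<modelName>.hash" file names are hypothetical.
func cachedLibrary(source, cacheDir, modelName string) (string, bool) {
	soPath := filepath.Join(cacheDir, modelName+".so")
	hashPath := filepath.Join(cacheDir, modelName+".hash")

	stored, err := os.ReadFile(hashPath)
	if err != nil {
		return "", false // no hash file: treat as a cache miss
	}
	if strings.TrimSpace(string(stored)) != sourceHash(source) {
		return "", false // source changed since the last compile
	}
	if _, err := os.Stat(soPath); err != nil {
		return "", false // hash present but library missing
	}
	return soPath, true
}
```

On a miss, a caller would compile the source, then write the new hash next to the library so the next call can take the fast path.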

func CheckSupport

func CheckSupport(instructions []graph.InstructionMeta) []string

CheckSupport verifies that all instructions in the tape have emitters. Returns the list of unsupported op names (empty if all are supported).
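
A support check of this kind can be sketched with a name-keyed registry. The sketch works on plain op names because the fields of graph.InstructionMeta are not shown here; emitterFunc, registry, supported, and unsupportedOps are hypothetical names, and the sorting and deduplication of the result are choices made for this example only.

```go
package main

import "sort"

// emitterFunc is a hypothetical stand-in for the package's OpEmitter type.
type emitterFunc func(args []string) (string, error)

// registry maps op names to emitters. The op names are illustrative only.
var registry = map[string]emitterFunc{
	"Add": func(args []string) (string, error) { return "dev_add(...);", nil },
	"Mul": func(args []string) (string, error) { return "dev_mul(...);", nil },
}

// supported reports whether an emitter is registered for opName.
func supported(opName string) bool {
	_, ok := registry[opName]
	return ok
}

// unsupportedOps returns the sorted, deduplicated names of ops that have
// no registered emitter. The result is empty when every op is supported.
func unsupportedOps(opNames []string) []string {
	seen := make(map[string]bool)
	var missing []string
	for _, name := range opNames {
		if !supported(name) && !seen[name] {
			seen[name] = true
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}
```

Running the check before emission lets a caller fall back to another execution path instead of failing midway through code generation.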

func Emit

func Emit(op graph.InstructionMeta, inputs []SlotInfo) (string, error)

Emit generates CUDA code for a single instruction. Returns an error if the op is unsupported.

func EmitMegakernel

func EmitMegakernel(cfg MegakernelConfig) (string, error)

EmitMegakernel generates a complete CUDA .cu source string from the compiled instruction tape. Returns an error if any op is unsupported.

The generated kernel uses a flat workspace buffer in which each slot occupies a contiguous region. Frozen slots (model weights) are passed as a separate pointer array. A host-callable launch wrapper is also emitted so that the compiled library can be opened with dlopen and the entry point resolved with dlsym.
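
The overall shape of such a source file can be sketched as below. The kernel name, its parameter list, the launch_megakernel wrapper, and the launch dimensions are all hypothetical; the real generated code is not shown on this page and will differ.

```go
package main

import (
	"fmt"
	"strings"
)

// emitSkeleton assembles a .cu source string whose kernel body is the given
// per-instruction fragments, followed by a host wrapper with C linkage.
// All CUDA identifiers in the output are hypothetical.
func emitSkeleton(fragments []string) string {
	var b strings.Builder
	b.WriteString("extern \"C\" __global__ void megakernel(float* workspace, float** frozen) {\n")
	for i, frag := range fragments {
		fmt.Fprintf(&b, "    // instruction %d\n    %s\n", i, frag)
	}
	b.WriteString("}\n\n")

	// C linkage keeps the symbol name unmangled so dlsym can find it.
	b.WriteString("extern \"C\" int launch_megakernel(float* workspace, float** frozen) {\n")
	b.WriteString("    megakernel<<<1, 256>>>(workspace, frozen);\n")
	b.WriteString("    return (int)cudaDeviceSynchronize();\n")
	b.WriteString("}\n")
	return b.String()
}
```

Emitting one kernel for the whole tape, rather than one kernel per op, is what lets intermediate results stay in the workspace between instructions.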

func NvccPath

func NvccPath() (string, error)

NvccPath returns the path to the nvcc compiler, or an error if not found.
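
One plausible lookup strategy is sketched below: search PATH first, then fall back to conventional CUDA environment variables. The search order, the CUDA_HOME and CUDA_PATH fallbacks, and the name findNvcc are assumptions; NvccPath's actual search rules are not documented here.

```go
package main

import (
	"errors"
	"os"
	"os/exec"
	"path/filepath"
)

// findNvcc looks for nvcc on PATH, then under $CUDA_HOME/bin and
// $CUDA_PATH/bin. The fallback variables are an assumption of this sketch.
func findNvcc() (string, error) {
	if p, err := exec.LookPath("nvcc"); err == nil {
		return p, nil
	}
	for _, env := range []string{"CUDA_HOME", "CUDA_PATH"} {
		root := os.Getenv(env)
		if root == "" {
			continue
		}
		candidate := filepath.Join(root, "bin", "nvcc")
		if info, err := os.Stat(candidate); err == nil && !info.IsDir() {
			return candidate, nil
		}
	}
	return "", errors.New("nvcc not found on PATH or under CUDA_HOME/CUDA_PATH")
}
```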

func Supported

func Supported(opName string) bool

Supported returns true if the op has a registered emitter.

Types

type FrozenSlotMeta

type FrozenSlotMeta struct {
	SlotIdx int
}

FrozenSlotMeta describes a frozen (constant/weight) slot for the emitter.

type MegakernelConfig

type MegakernelConfig struct {
	Instructions []graph.InstructionMeta
	SlotShapes   [][]int
	FrozenSlots  []FrozenSlotMeta
	InputSlots   []int
	OutputSlot   int
	NumKVLayers  int // number of KV cache layers (0 = no KV cache)
}

MegakernelConfig holds all information needed to emit a megakernel .cu file.

type MegakernelRunner

type MegakernelRunner struct {
	// contains filtered or unexported fields
}

MegakernelRunner manages a compiled megakernel .so and its GPU resources.

func LoadMegakernel

func LoadMegakernel(soPath string) (*MegakernelRunner, error)

LoadMegakernel opens a compiled megakernel .so and resolves the launch symbol.

func (*MegakernelRunner) Close

func (r *MegakernelRunner) Close() error

Close releases all GPU resources.

func (*MegakernelRunner) HasKVCache

func (r *MegakernelRunner) HasKVCache() bool

HasKVCache reports whether KV cache pointers have been configured.

func (*MegakernelRunner) Launch

func (r *MegakernelRunner) Launch(inputData []float32, pos int) ([]float32, error)

Launch runs the megakernel with input data and returns the output. When KV cache is configured via SetKVCache, pos is used as both the rotary embedding position and the KV cache sequence position.
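
Because pos drives both the rotary position and the KV cache position, a decode loop would typically advance it by one per call. The sketch below shows that pattern against a small interface matching the documented Launch signature; launcher, runSteps, and fakeRunner are hypothetical names, and fakeRunner stands in for a real GPU-backed runner.

```go
package main

import "fmt"

// launcher captures the one method this sketch needs. The documented
// Launch method on *MegakernelRunner has this signature.
type launcher interface {
	Launch(inputData []float32, pos int) ([]float32, error)
}

// runSteps launches once per input, using startPos+i as the position of
// step i, and collects the outputs in order.
func runSteps(r launcher, steps [][]float32, startPos int) ([][]float32, error) {
	outputs := make([][]float32, 0, len(steps))
	for i, in := range steps {
		out, err := r.Launch(in, startPos+i)
		if err != nil {
			return nil, fmt.Errorf("launch at pos %d: %w", startPos+i, err)
		}
		outputs = append(outputs, out)
	}
	return outputs, nil
}

// fakeRunner is a test double that returns pos instead of running a kernel.
type fakeRunner struct{}

func (fakeRunner) Launch(inputData []float32, pos int) ([]float32, error) {
	return []float32{float32(pos)}, nil
}
```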

func (*MegakernelRunner) OutputShape

func (r *MegakernelRunner) OutputShape() []int

OutputShape returns the shape of the megakernel output slot.

func (*MegakernelRunner) PrepareWorkspace

func (r *MegakernelRunner) PrepareWorkspace(cfg MegakernelConfig, frozenData [][]float32) error

PrepareWorkspace allocates GPU memory for the workspace and frozen slots. frozenData provides the float32 data for each frozen slot, indexed by position in cfg.FrozenSlots (not by slot index).
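
The positional indexing is easy to get wrong when weights are stored by slot index. The helper below is a sketch of one way to build frozenData in the expected order; orderFrozenData and the bySlot map are hypothetical, and FrozenSlotMeta is redeclared locally only to keep the example self-contained.

```go
package main

import "fmt"

// FrozenSlotMeta mirrors the documented type of the same name.
type FrozenSlotMeta struct {
	SlotIdx int
}

// orderFrozenData arranges weights keyed by slot index into the positional
// order of frozenSlots, so that out[i] belongs to frozenSlots[i].
func orderFrozenData(frozenSlots []FrozenSlotMeta, bySlot map[int][]float32) ([][]float32, error) {
	out := make([][]float32, len(frozenSlots))
	for i, fs := range frozenSlots {
		data, ok := bySlot[fs.SlotIdx]
		if !ok {
			return nil, fmt.Errorf("no data for frozen slot %d", fs.SlotIdx)
		}
		out[i] = data
	}
	return out, nil
}
```

With FrozenSlots listing slot 7 and then slot 2, the result holds slot 7's data at index 0 and slot 2's data at index 1, regardless of the numeric order of the slot indices.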

func (*MegakernelRunner) SetKVCache

func (r *MegakernelRunner) SetKVCache(kvK, kvV unsafe.Pointer)

SetKVCache configures the runner to pass KV cache device pointers to the megakernel. kvK and kvV are GPU arrays of float* pointers, one per layer.

type OpEmitter

type OpEmitter func(op graph.InstructionMeta, inputs []SlotInfo) (string, error)

OpEmitter generates CUDA code for a single instruction. It returns a code fragment that will be inserted into the megakernel body.
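
As an illustration of what an emitter might produce, the sketch below builds a fragment for an element-wise add. It does not use the OpEmitter signature directly, since the fields of graph.InstructionMeta are not shown on this page: the destination and operand names are passed explicitly instead. The device function dev_add and the helper names are hypothetical; SlotInfo is redeclared locally to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// SlotInfo mirrors the documented type of the same name.
type SlotInfo struct {
	Shape []int
}

// numElements returns the product of the dimensions in shape.
func numElements(shape []int) int {
	n := 1
	for _, d := range shape {
		n *= d
	}
	return n
}

// emitAdd returns a CUDA fragment that calls a hypothetical device function
// dev_add(dst, a, b, n). It rejects inputs whose element counts differ.
func emitAdd(dst, a, b string, inputs []SlotInfo) (string, error) {
	if len(inputs) != 2 {
		return "", errors.New("add expects exactly two inputs")
	}
	n := numElements(inputs[0].Shape)
	if m := numElements(inputs[1].Shape); m != n {
		return "", fmt.Errorf("add: element counts differ (%d vs %d)", n, m)
	}
	return fmt.Sprintf("dev_add(%s, %s, %s, %d);", dst, a, b, n), nil
}
```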

type SlotInfo

type SlotInfo struct {
	Shape []int
}

SlotInfo describes a slot's shape for the emitter.

type WorkspaceLayout

type WorkspaceLayout struct {
	SlotOffsets  map[int]int // slot index -> offset in workspace (element count)
	TotalSize    int         // total workspace size in elements
	InputOffset  int         // offset of first input slot
	OutputOffset int         // offset of output slot
	OutputSize   int         // size of output slot in elements
}

WorkspaceLayout describes the memory layout for megakernel slot buffers. Frozen slots are NOT in the workspace -- they have their own pointer array.

func ComputeWorkspaceLayout

func ComputeWorkspaceLayout(cfg MegakernelConfig) WorkspaceLayout

ComputeWorkspaceLayout computes the workspace memory layout for the megakernel. Frozen slots are excluded since they use a separate pointer array. The layout is deterministic (sorted by slot index).
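
A layout with these properties can be sketched as below. This is an illustration under stated assumptions, not the package's algorithm: it assumes every non-frozen slot receives a region, that regions are packed back to back with no alignment padding, and that outputSlot is a valid, non-frozen index. The names layout and computeLayout are hypothetical.

```go
package main

// layout mirrors the fields of the documented WorkspaceLayout type.
type layout struct {
	SlotOffsets  map[int]int
	TotalSize    int
	InputOffset  int
	OutputOffset int
	OutputSize   int
}

// numElements returns the product of the dimensions in shape.
func numElements(shape []int) int {
	n := 1
	for _, d := range shape {
		n *= d
	}
	return n
}

// computeLayout packs non-frozen slots back to back in ascending slot-index
// order. Offsets and sizes are element counts, not bytes.
func computeLayout(slotShapes [][]int, frozen []int, inputSlots []int, outputSlot int) layout {
	isFrozen := make(map[int]bool, len(frozen))
	for _, idx := range frozen {
		isFrozen[idx] = true
	}

	l := layout{SlotOffsets: make(map[int]int)}
	offset := 0
	// Ranging over the slice visits slots in index order, which keeps the
	// result deterministic.
	for idx, shape := range slotShapes {
		if isFrozen[idx] {
			continue // frozen slots live in a separate pointer array
		}
		l.SlotOffsets[idx] = offset
		offset += numElements(shape)
	}
	l.TotalSize = offset

	if len(inputSlots) > 0 {
		l.InputOffset = l.SlotOffsets[inputSlots[0]]
	}
	l.OutputOffset = l.SlotOffsets[outputSlot]
	l.OutputSize = numElements(slotShapes[outputSlot])
	return l
}
```

For slot shapes [2 3], [4], and [3] with slot 1 frozen, slot 0 starts at offset 0, slot 2 starts at offset 6, and the workspace holds 9 elements in total.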
