Documentation
Overview ¶
Package codegen generates CUDA megakernel source code from a compiled ExecutionPlan instruction tape. Each primitive op maps to a CUDA device function call that operates on register-resident or shared-memory data.
Index ¶
- func CachedCompile(source, cacheDir, modelName string) (string, error)
- func CheckSupport(instructions []graph.InstructionMeta) []string
- func Emit(op graph.InstructionMeta, inputs []SlotInfo) (string, error)
- func EmitMegakernel(cfg MegakernelConfig) (string, error)
- func NvccPath() (string, error)
- func Supported(opName string) bool
- type FrozenSlotMeta
- type MegakernelConfig
- type MegakernelRunner
- func LoadMegakernel(soPath string) (*MegakernelRunner, error)
- func (r *MegakernelRunner) Close() error
- func (r *MegakernelRunner) HasKVCache() bool
- func (r *MegakernelRunner) Launch(inputData []float32, pos int) ([]float32, error)
- func (r *MegakernelRunner) OutputShape() []int
- func (r *MegakernelRunner) PrepareWorkspace(cfg MegakernelConfig, frozenData [][]float32) error
- func (r *MegakernelRunner) SetKVCache(kvK, kvV unsafe.Pointer)
- type OpEmitter
- type SlotInfo
- type WorkspaceLayout
- func ComputeWorkspaceLayout(cfg MegakernelConfig) WorkspaceLayout
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CachedCompile ¶
func CachedCompile(source, cacheDir, modelName string) (string, error)
CachedCompile compiles a CUDA source string into a shared library (.so) and caches the result in cacheDir alongside the model. If the cached .so exists and its recorded source hash matches, the cached path is returned immediately without recompiling.
Parameters:
- source: the CUDA source code string
- cacheDir: directory for the cached .so and hash file
- modelName: base name for the output files
Returns the path to the compiled .so file.
func CheckSupport ¶
func CheckSupport(instructions []graph.InstructionMeta) []string
CheckSupport verifies that all instructions in the tape have emitters. Returns the list of unsupported op names (empty if all are supported).
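A registry-based check like the one described above might look like the sketch below. The instructionMeta type (with a single OpName field), the emitters map, and the op names are hypothetical stand-ins for graph.InstructionMeta and the package's internal registry; reporting each unsupported name only once is also an assumption.

```go
package main

import "fmt"

// instructionMeta is a hypothetical stand-in for graph.InstructionMeta.
type instructionMeta struct {
	OpName string
}

// slotInfo mirrors SlotInfo (shape only).
type slotInfo struct{ Shape []int }

// opEmitter mirrors the OpEmitter signature.
type opEmitter func(op instructionMeta, inputs []slotInfo) (string, error)

// stubEmitter returns a placeholder fragment (illustration only).
func stubEmitter(op instructionMeta, inputs []slotInfo) (string, error) {
	return "// " + op.OpName + "\n", nil
}

// emitters is a hypothetical registry keyed by op name.
var emitters = map[string]opEmitter{
	"Add":    stubEmitter,
	"MatMul": stubEmitter,
}

// checkSupport returns each op name that has no emitter, in order of
// first appearance on the tape; the result is empty if all are supported.
func checkSupport(instructions []instructionMeta) []string {
	seen := make(map[string]bool)
	var missing []string
	for _, inst := range instructions {
		if _, ok := emitters[inst.OpName]; !ok && !seen[inst.OpName] {
			seen[inst.OpName] = true
			missing = append(missing, inst.OpName)
		}
	}
	return missing
}

func main() {
	tape := []instructionMeta{{"MatMul"}, {"Add"}, {"Softmax"}, {"Softmax"}, {"TopK"}}
	fmt.Println(checkSupport(tape)) // [Softmax TopK]
}
```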
func Emit ¶
func Emit(op graph.InstructionMeta, inputs []SlotInfo) (string, error)
Emit generates CUDA code for a single instruction. Returns an error if the op is unsupported.
func EmitMegakernel ¶
func EmitMegakernel(cfg MegakernelConfig) (string, error)
EmitMegakernel generates a complete CUDA .cu source string from the compiled instruction tape. Returns an error if any op is unsupported.
The generated kernel uses a flat workspace buffer in which each slot occupies a contiguous region. Frozen slots (model weights) are passed through a separate pointer array. A host-callable launch wrapper is also emitted so that the compiled library can be loaded with dlopen and its entry point resolved with dlsym.
Types ¶
type FrozenSlotMeta ¶
type FrozenSlotMeta struct {
	SlotIdx int
}
FrozenSlotMeta describes a frozen (constant/weight) slot for the emitter.
type MegakernelConfig ¶
type MegakernelConfig struct {
	Instructions []graph.InstructionMeta
	SlotShapes   [][]int
	FrozenSlots  []FrozenSlotMeta
	InputSlots   []int
	OutputSlot   int
	NumKVLayers  int // number of KV cache layers (0 = no KV cache)
}
MegakernelConfig holds all information needed to emit a megakernel .cu file.
type MegakernelRunner ¶
type MegakernelRunner struct {
	// contains filtered or unexported fields
}
MegakernelRunner manages a compiled megakernel .so and its GPU resources.
func LoadMegakernel ¶
func LoadMegakernel(soPath string) (*MegakernelRunner, error)
LoadMegakernel opens a compiled megakernel .so and resolves the launch symbol.
func (*MegakernelRunner) Close ¶
func (r *MegakernelRunner) Close() error
Close releases all GPU resources.
func (*MegakernelRunner) HasKVCache ¶
func (r *MegakernelRunner) HasKVCache() bool
HasKVCache reports whether KV cache pointers have been configured.
func (*MegakernelRunner) Launch ¶
func (r *MegakernelRunner) Launch(inputData []float32, pos int) ([]float32, error)
Launch runs the megakernel on inputData and returns the output. When a KV cache has been configured via SetKVCache, pos is used as both the rotary-embedding position and the KV cache sequence position.
func (*MegakernelRunner) OutputShape ¶
func (r *MegakernelRunner) OutputShape() []int
OutputShape returns the shape of the megakernel output slot.
func (*MegakernelRunner) PrepareWorkspace ¶
func (r *MegakernelRunner) PrepareWorkspace(cfg MegakernelConfig, frozenData [][]float32) error
PrepareWorkspace allocates GPU memory for the workspace and frozen slots. frozenData provides the float32 data for each frozen slot, indexed by position in cfg.FrozenSlots (not by slot index).
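Because frozenData is matched to cfg.FrozenSlots by position rather than by slot index, a pre-upload check is a cheap way to catch ordering mistakes. The sketch below is hypothetical: checkFrozenData and the trimmed-down mirror types are not part of the package, and it assumes each buffer must hold exactly the element count implied by SlotShapes.

```go
package main

import "fmt"

// Trimmed-down mirrors of FrozenSlotMeta and MegakernelConfig.
type frozenSlotMeta struct{ SlotIdx int }

type megakernelConfig struct {
	SlotShapes  [][]int
	FrozenSlots []frozenSlotMeta
}

// numElements returns the product of the dimensions (1 for a scalar).
func numElements(shape []int) int {
	n := 1
	for _, d := range shape {
		n *= d
	}
	return n
}

// checkFrozenData verifies that frozenData[i] matches the shape of the
// slot named by cfg.FrozenSlots[i] (position i, not slot index).
func checkFrozenData(cfg megakernelConfig, frozenData [][]float32) error {
	if len(frozenData) != len(cfg.FrozenSlots) {
		return fmt.Errorf("got %d frozen buffers, want %d", len(frozenData), len(cfg.FrozenSlots))
	}
	for i, fs := range cfg.FrozenSlots {
		if fs.SlotIdx < 0 || fs.SlotIdx >= len(cfg.SlotShapes) {
			return fmt.Errorf("frozen entry %d: slot index %d out of range", i, fs.SlotIdx)
		}
		want := numElements(cfg.SlotShapes[fs.SlotIdx])
		if len(frozenData[i]) != want {
			return fmt.Errorf("frozen entry %d (slot %d): got %d floats, want %d",
				i, fs.SlotIdx, len(frozenData[i]), want)
		}
	}
	return nil
}

func main() {
	cfg := megakernelConfig{
		SlotShapes:  [][]int{{4}, {2, 3}, {3}},
		FrozenSlots: []frozenSlotMeta{{SlotIdx: 1}, {SlotIdx: 2}},
	}
	// frozenData[0] belongs to slot 1 (6 floats), frozenData[1] to slot 2 (3 floats).
	data := [][]float32{make([]float32, 6), make([]float32, 3)}
	fmt.Println(checkFrozenData(cfg, data)) // <nil>
}
```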
func (*MegakernelRunner) SetKVCache ¶
func (r *MegakernelRunner) SetKVCache(kvK, kvV unsafe.Pointer)
SetKVCache configures the runner to pass KV cache device pointers to the megakernel. kvK and kvV are GPU arrays of float* pointers, one per layer.
type OpEmitter ¶
type OpEmitter func(op graph.InstructionMeta, inputs []SlotInfo) (string, error)
OpEmitter generates CUDA code for a single instruction. It returns a code fragment that will be inserted into the megakernel body.
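As an illustration of the OpEmitter contract, the hypothetical emitter below handles an elementwise add and returns a single device-function call as the code fragment. The instructionMeta fields (OpName, InputSlots, OutputSlot), the slot_N operand naming, and the mk_add device function are all assumptions; the real graph.InstructionMeta and generated identifiers may differ.

```go
package main

import "fmt"

// instructionMeta is a hypothetical stand-in for graph.InstructionMeta.
type instructionMeta struct {
	OpName     string
	InputSlots []int
	OutputSlot int
}

// slotInfo mirrors SlotInfo.
type slotInfo struct{ Shape []int }

// opEmitter mirrors the OpEmitter signature.
type opEmitter func(op instructionMeta, inputs []slotInfo) (string, error)

func numElements(shape []int) int {
	n := 1
	for _, d := range shape {
		n *= d
	}
	return n
}

func sameShape(a, b []int) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// emitAdd emits a call to a hypothetical mk_add device function.
var emitAdd opEmitter = func(op instructionMeta, inputs []slotInfo) (string, error) {
	if len(inputs) != 2 || len(op.InputSlots) != 2 {
		return "", fmt.Errorf("%s: want 2 inputs, got %d", op.OpName, len(inputs))
	}
	if !sameShape(inputs[0].Shape, inputs[1].Shape) {
		return "", fmt.Errorf("%s: shape mismatch %v vs %v", op.OpName, inputs[0].Shape, inputs[1].Shape)
	}
	n := numElements(inputs[0].Shape)
	return fmt.Sprintf("mk_add(slot_%d, slot_%d, slot_%d, %d);",
		op.OutputSlot, op.InputSlots[0], op.InputSlots[1], n), nil
}

func main() {
	op := instructionMeta{OpName: "Add", InputSlots: []int{1, 2}, OutputSlot: 3}
	frag, err := emitAdd(op, []slotInfo{{Shape: []int{4, 4}}, {Shape: []int{4, 4}}})
	fmt.Println(frag, err) // mk_add(slot_3, slot_1, slot_2, 16); <nil>
}
```

Validating shapes inside the emitter lets an unsupported or malformed instruction surface as an error during code generation rather than as a CUDA compile or runtime failure.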
type SlotInfo ¶
type SlotInfo struct {
	Shape []int
}
SlotInfo describes a slot's shape for the emitter.
type WorkspaceLayout ¶
type WorkspaceLayout struct {
	SlotOffsets  map[int]int // slot index -> offset in workspace (element count)
	TotalSize    int         // total workspace size in elements
	InputOffset  int         // offset of first input slot
	OutputOffset int         // offset of output slot
	OutputSize   int         // size of output slot in elements
}
WorkspaceLayout describes the memory layout of the megakernel's slot buffers. Frozen slots are not part of the workspace; they are passed through their own pointer array.
func ComputeWorkspaceLayout ¶
func ComputeWorkspaceLayout(cfg MegakernelConfig) WorkspaceLayout
ComputeWorkspaceLayout computes the workspace memory layout for the megakernel. Frozen slots are excluded since they use a separate pointer array. The layout is deterministic (sorted by slot index).