transmla

package

v1.40.0

Published: Apr 2, 2026 License: Apache-2.0 Imports: 9 Imported by: 0

Documentation

Overview

Package transmla converts standard multi-head attention (MHA) weights into multi-head latent attention (MLA) form via truncated SVD.

The conversion decomposes the concatenated key-value projection matrix W_KV = [W_K; W_V] into three smaller matrices:

wDKV  (d_model × rank)   — shared down-projection
wUK   (d_k × rank)       — key up-projection
wUV   (d_v × rank)       — value up-projection

such that W_KV ≈ [wUK; wUV] · wDKV^T, reducing the KV cache from (d_k + d_v) values to rank values per token per layer.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ConvertGGUF added in v1.29.0

func ConvertGGUF(src io.ReadSeeker, dst io.Writer, opts ConvertGGUFOptions) error

ConvertGGUF reads a source GGUF file, decomposes K/V projection weights per layer via truncated SVD, and writes a new GGUF with:

  • All original tensors except k_proj/v_proj weights
  • Three new tensors per layer: transmla.{layer}.wDKV, wUK, wUV
  • Metadata: transmla.kv_lora_dim, transmla.source_arch

The source GGUF metadata is preserved in the output.

func DecomposeKVProjection

func DecomposeKVProjection(wK, wV [][]float64, rank int) (wDKV, wUK, wUV [][]float64, err error)

DecomposeKVProjection takes key and value projection weight matrices and decomposes their vertical concatenation [W_K; W_V] via truncated SVD.

W_K is (d_k × d_model) and W_V is (d_v × d_model). The concatenated matrix is ((d_k+d_v) × d_model).

Returns:

  • wDKV: (d_model × rank) shared down-projection
  • wUK: (d_k × rank) key up-projection
  • wUV: (d_v × rank) value up-projection

such that W_K ≈ wUK · wDKV^T and W_V ≈ wUV · wDKV^T.

func ReconstructionError

func ReconstructionError(original [][]float64, wDKV, wUK, wUV [][]float64) float64

ReconstructionError computes the relative Frobenius norm error between the original concatenated [W_K; W_V] matrix and its low-rank reconstruction from wDKV, wUK, wUV: reconstructed = [wUK; wUV] · wDKV^T.

Returns ||original - reconstructed||_F / ||original||_F.

Types

type ConvertGGUFOptions added in v1.29.0

type ConvertGGUFOptions struct {
	// Rank is the KV LoRA dimension (truncated SVD rank).
	Rank int
	// SourceArch is the original model architecture name (e.g., "llama").
	SourceArch string
	// OnLayerDone is called after each layer's K/V projection is decomposed.
	// Arguments are the layer index (0-based) and total number of layers.
	// May be nil.
	OnLayerDone func(layer, total int)
}

ConvertGGUFOptions configures the TransMLA GGUF conversion.

type SVDResult

type SVDResult struct {
	// U is an m×k matrix of left singular vectors.
	U [][]float64
	// S is a length-k vector of singular values in descending order.
	S []float64
	// Vt is a k×n matrix of right singular vectors (transposed).
	Vt [][]float64
}

SVDResult holds the full or truncated singular value decomposition of a matrix A = U · diag(S) · Vt.

func TruncatedSVD

func TruncatedSVD(matrix [][]float64, rank int) (*SVDResult, error)

TruncatedSVD computes a rank-r truncated SVD of an m×n matrix using the one-sided Jacobi method. The full SVD is computed first, then truncated to the top-r singular values/vectors.

The input matrix is a row-major m×n slice of slices. All rows must have the same length. Rank must be positive and at most min(m, n).
