Documentation ¶
Overview ¶
Package transmla converts standard multi-head attention (MHA) weights into multi-head latent attention (MLA) form via truncated SVD.
The conversion decomposes the concatenated key-value projection matrix W_KV = [W_K; W_V] into three smaller matrices:
- wDKV (d_model × rank): shared down-projection
- wUK (d_k × rank): key up-projection
- wUV (d_v × rank): value up-projection
such that W_KV ≈ [wUK; wUV] · wDKV^T, reducing the per-token KV cache from d_k + d_v values to rank values.
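The size reduction follows directly from the shapes above. The sketch below works through the arithmetic; the helper names and the dimensions are hypothetical, chosen only for illustration, and are not part of this package.

```go
package main

import "fmt"

// kvCacheValues returns how many values are cached per token (per layer)
// before the conversion (separate K and V) and after it (shared latent).
// Hypothetical helper, not part of package transmla.
func kvCacheValues(dK, dV, rank int) (before, after int) {
	return dK + dV, rank
}

// kvParamCounts returns the number of weights in the original
// ((d_k+d_v) × d_model) matrix W_KV and in the three factors
// wDKV (d_model × rank), wUK (d_k × rank), wUV (d_v × rank).
func kvParamCounts(dModel, dK, dV, rank int) (original, factored int) {
	original = (dK + dV) * dModel
	factored = rank * (dModel + dK + dV)
	return original, factored
}

func main() {
	// Assumed dimensions for illustration only.
	dModel, dK, dV, rank := 4096, 1024, 1024, 512

	before, after := kvCacheValues(dK, dV, rank)
	orig, fact := kvParamCounts(dModel, dK, dV, rank)

	fmt.Printf("cache per token: %d -> %d values\n", before, after)
	fmt.Printf("projection weights: %d -> %d\n", orig, fact)
}
```

With these assumed dimensions the per-token cache shrinks by a factor of four; the actual saving depends on the rank chosen for a given model.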
Index ¶
- func ConvertGGUF(src io.ReadSeeker, dst io.Writer, opts ConvertGGUFOptions) error
- func DecomposeKVProjection(wK, wV [][]float64, rank int) (wDKV, wUK, wUV [][]float64, err error)
- func ReconstructionError(original [][]float64, wDKV, wUK, wUV [][]float64) float64
- type ConvertGGUFOptions
- type SVDResult
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ConvertGGUF ¶ added in v1.29.0
func ConvertGGUF(src io.ReadSeeker, dst io.Writer, opts ConvertGGUFOptions) error
ConvertGGUF reads a source GGUF file, decomposes K/V projection weights per layer via truncated SVD, and writes a new GGUF with:
- All original tensors except k_proj/v_proj weights
- Three new tensors per layer: transmla.{layer}.wDKV, wUK, wUV
- Metadata: transmla.kv_lora_dim, transmla.source_arch
The source GGUF metadata is preserved in the output.
func DecomposeKVProjection ¶
func DecomposeKVProjection(wK, wV [][]float64, rank int) (wDKV, wUK, wUV [][]float64, err error)
DecomposeKVProjection takes key and value projection weight matrices and decomposes their vertical concatenation [W_K; W_V] via truncated SVD.
W_K is (d_k × d_model) and W_V is (d_v × d_model). The concatenated matrix is ((d_k+d_v) × d_model).
Returns:
- wDKV: (d_model × rank) shared down-projection
- wUK: (d_k × rank) key up-projection
- wUV: (d_v × rank) value up-projection
such that W_K ≈ wUK · wDKV^T and W_V ≈ wUV · wDKV^T.
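To make the shapes concrete, the following sketch multiplies an up-projection by the transposed down-projection. The helper mulABt and the tiny rank-1 matrices are assumptions made for illustration; they are not part of this package and do not call DecomposeKVProjection.

```go
package main

import "fmt"

// mulABt returns a · b^T, where a is (p × r) and b is (n × r).
// The result is (p × n). Hypothetical helper, not part of package transmla.
func mulABt(a, b [][]float64) [][]float64 {
	out := make([][]float64, len(a))
	for i := range a {
		out[i] = make([]float64, len(b))
		for j := range b {
			for k := range a[i] {
				out[i][j] += a[i][k] * b[j][k]
			}
		}
	}
	return out
}

func main() {
	// Hand-written factors with d_model = 3, d_k = 2, d_v = 1, rank = 1.
	wDKV := [][]float64{{1}, {2}, {3}} // (d_model × rank)
	wUK := [][]float64{{2}, {0}}       // (d_k × rank)
	wUV := [][]float64{{1}}            // (d_v × rank)

	wK := mulABt(wUK, wDKV) // (d_k × d_model)
	wV := mulABt(wUV, wDKV) // (d_v × d_model)

	fmt.Println(len(wK), "x", len(wK[0]), wK)
	fmt.Println(len(wV), "x", len(wV[0]), wV)
}
```

Both products share the same wDKV factor, which is why only the rank-dimensional latent needs to be cached per token.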
func ReconstructionError ¶
func ReconstructionError(original [][]float64, wDKV, wUK, wUV [][]float64) float64
ReconstructionError computes the relative Frobenius norm error between the original concatenated [W_K; W_V] matrix and its low-rank reconstruction from wDKV, wUK, wUV: reconstructed = [wUK; wUV] · wDKV^T.
Returns ||original - reconstructed||_F / ||original||_F.
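The metric itself can be sketched as follows. This is a minimal illustration of the relative Frobenius error between two equally shaped matrices, not the package's implementation; the helper name is hypothetical, and it assumes the original matrix is non-zero (a zero matrix would divide by zero).

```go
package main

import (
	"fmt"
	"math"
)

// relativeFrobeniusError returns ||original - reconstructed||_F / ||original||_F.
// Hypothetical helper; assumes both matrices have the same shape and that
// original is not all zeros.
func relativeFrobeniusError(original, reconstructed [][]float64) float64 {
	var diffSq, origSq float64
	for i := range original {
		for j := range original[i] {
			d := original[i][j] - reconstructed[i][j]
			diffSq += d * d
			origSq += original[i][j] * original[i][j]
		}
	}
	return math.Sqrt(diffSq) / math.Sqrt(origSq)
}

func main() {
	original := [][]float64{{3, 4}}
	reconstructed := [][]float64{{3, 0}}

	// ||diff||_F = 4 and ||original||_F = 5, so the relative error is 0.8.
	fmt.Println(relativeFrobeniusError(original, reconstructed))
}
```

A value of 0 means the reconstruction is exact; values near 1 mean the low-rank factors capture almost none of the original matrix.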
Types ¶
type ConvertGGUFOptions ¶ added in v1.29.0
type ConvertGGUFOptions struct {
// Rank is the KV LoRA dimension (truncated SVD rank).
Rank int
// SourceArch is the original model architecture name (e.g., "llama").
SourceArch string
// OnLayerDone is called after each layer's K/V projection is decomposed.
// Arguments are the layer index (0-based) and total number of layers.
// May be nil.
OnLayerDone func(layer, total int)
}
ConvertGGUFOptions configures the TransMLA GGUF conversion.
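The callback contract (0-based layer index, total layer count, callback may be nil) can be illustrated with a stand-in. The sketch below uses a local options type that mirrors the documented fields and a simulated layer loop; it does not import this package or read a GGUF file, and simulateLayers and progressLine are hypothetical names.

```go
package main

import "fmt"

// options is a local stand-in that mirrors the documented fields of
// ConvertGGUFOptions so the example is self-contained.
type options struct {
	Rank        int
	SourceArch  string
	OnLayerDone func(layer, total int)
}

// progressLine formats a 0-based layer index as a 1-based progress string.
func progressLine(layer, total int) string {
	return fmt.Sprintf("layer %d/%d (%d%%)", layer+1, total, (layer+1)*100/total)
}

// simulateLayers mimics the documented callback contract: the callback is
// invoked once per layer and skipped when it is nil. It performs no
// decomposition; it only stands in for the real conversion loop.
func simulateLayers(total int, opts options) {
	for layer := 0; layer < total; layer++ {
		if opts.OnLayerDone != nil {
			opts.OnLayerDone(layer, total)
		}
	}
}

func main() {
	opts := options{
		Rank:       512, // assumed value for illustration
		SourceArch: "llama",
		OnLayerDone: func(layer, total int) {
			fmt.Println(progressLine(layer, total))
		},
	}
	simulateLayers(4, opts)

	// A nil callback is allowed and simply produces no progress output.
	simulateLayers(4, options{Rank: 512, SourceArch: "llama"})
}
```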
type SVDResult ¶
type SVDResult struct {
// U is an m×k matrix of left singular vectors.
U [][]float64
// S is a length-k vector of singular values in descending order.
S []float64
// Vt is a k×n matrix of right singular vectors (transposed).
Vt [][]float64
}
SVDResult holds the full or truncated singular value decomposition of a matrix A = U · diag(S) · Vt.
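The identity A = U · diag(S) · Vt can be checked by multiplying the three factors back together. The sketch below uses a local type that mirrors the documented fields; reconstruct is a hypothetical helper and not part of this package.

```go
package main

import "fmt"

// svdResult is a local stand-in mirroring the documented SVDResult fields.
type svdResult struct {
	U  [][]float64 // m × k
	S  []float64   // length k
	Vt [][]float64 // k × n
}

// reconstruct returns U · diag(S) · Vt as an m × n matrix.
// Hypothetical helper; assumes the shapes above are consistent and k > 0.
func reconstruct(r svdResult) [][]float64 {
	m, k, n := len(r.U), len(r.S), len(r.Vt[0])
	a := make([][]float64, m)
	for i := 0; i < m; i++ {
		a[i] = make([]float64, n)
		for t := 0; t < k; t++ {
			scaled := r.U[i][t] * r.S[t] // scale column t of U by its singular value
			for j := 0; j < n; j++ {
				a[i][j] += scaled * r.Vt[t][j]
			}
		}
	}
	return a
}

func main() {
	r := svdResult{
		U:  [][]float64{{1, 0}, {0, 1}},
		S:  []float64{3, 2},
		Vt: [][]float64{{0, 1}, {1, 0}},
	}
	fmt.Println(reconstruct(r))
}
```

Dropping trailing entries of S together with the matching columns of U and rows of Vt yields the truncated form.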
func TruncatedSVD ¶
TruncatedSVD computes a rank-r truncated SVD of an m×n matrix using the one-sided Jacobi method. The full SVD is computed first, then truncated to the top-r singular values/vectors.
The input matrix is a row-major m×n slice of slices. All rows must have the same length. Rank must be positive and at most min(m, n).
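For orientation, a one-sided (Hestenes) Jacobi iteration can be sketched as below. This is a simplified illustration that returns only the top singular values, not the package's implementation: the function name, tolerance, sweep limit, and error messages are assumptions, and the singular vectors are omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sort"
)

// jacobiSingularValues returns the top-rank singular values of a, in
// descending order, by rotating column pairs until they are orthogonal.
// Hypothetical sketch; it validates the documented input constraints.
func jacobiSingularValues(a [][]float64, rank int) ([]float64, error) {
	m := len(a)
	if m == 0 || len(a[0]) == 0 {
		return nil, errors.New("empty matrix")
	}
	n := len(a[0])
	w := make([][]float64, m) // working copy; the input is left untouched
	for i, row := range a {
		if len(row) != n {
			return nil, errors.New("rows have different lengths")
		}
		w[i] = append([]float64(nil), row...)
	}
	if rank <= 0 || rank > m || rank > n {
		return nil, errors.New("rank must be in [1, min(m, n)]")
	}
	const eps, maxSweeps = 1e-12, 60 // assumed tolerance and sweep limit
	for sweep := 0; sweep < maxSweeps; sweep++ {
		rotated := false
		for p := 0; p < n-1; p++ {
			for q := p + 1; q < n; q++ {
				var alpha, beta, gamma float64
				for i := 0; i < m; i++ {
					alpha += w[i][p] * w[i][p]
					beta += w[i][q] * w[i][q]
					gamma += w[i][p] * w[i][q]
				}
				if math.Abs(gamma) <= eps*math.Sqrt(alpha*beta) {
					continue // columns p and q are already orthogonal
				}
				rotated = true
				zeta := (beta - alpha) / (2 * gamma)
				t := 1 / (math.Abs(zeta) + math.Sqrt(1+zeta*zeta))
				if zeta < 0 {
					t = -t
				}
				c := 1 / math.Sqrt(1+t*t)
				s := c * t
				for i := 0; i < m; i++ {
					wp, wq := w[i][p], w[i][q]
					w[i][p] = c*wp - s*wq
					w[i][q] = s*wp + c*wq
				}
			}
		}
		if !rotated {
			break // a full sweep without rotations means convergence
		}
	}
	sv := make([]float64, n)
	for j := 0; j < n; j++ {
		var sum float64
		for i := 0; i < m; i++ {
			sum += w[i][j] * w[i][j]
		}
		sv[j] = math.Sqrt(sum) // singular value = norm of the rotated column
	}
	sort.Sort(sort.Reverse(sort.Float64Slice(sv)))
	return sv[:rank], nil
}

func main() {
	sv, err := jacobiSingularValues([][]float64{{3, 0}, {0, 4}}, 2)
	fmt.Println(sv, err)
}
```

As described above, the sketch computes all column norms first and truncates afterwards, so the cost does not depend on the requested rank.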