transmla

package

v1.40.0 (latest) Published: Apr 2, 2026 License: Apache-2.0 Imports: 9 Imported by: 0

Repository

github.com/zerfoo/zerfoo

Documentation ¶

Overview ¶

Package transmla converts standard multi-head attention (MHA) weights into multi-head latent attention (MLA) form via truncated SVD.

The conversion decomposes the concatenated key-value projection matrix W_KV = [W_K; W_V] into three smaller matrices:

wDKV  (d_model × rank)   shared down-projection
wUK   (d_k × rank)       key up-projection
wUV   (d_v × rank)       value up-projection

such that W_KV ≈ [wUK; wUV] · wDKV^T. Caching the rank-dimensional latent vector instead of the full keys and values reduces the KV cache from d_k + d_v values to rank values per token.
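
As a rough illustration of the cache saving described above, the following sketch compares the number of cached values per token before and after conversion. The helper name kvCacheValues and the dimensions are hypothetical and are not part of this package.

```go
package main

import "fmt"

// kvCacheValues is a hypothetical helper (not part of transmla). It returns
// the number of values cached per token before conversion (d_k + d_v) and
// after conversion (rank).
func kvCacheValues(dK, dV, rank int) (before, after int) {
	return dK + dV, rank
}

func main() {
	// Dimensions are made up for illustration only.
	before, after := kvCacheValues(1024, 1024, 512)
	fmt.Printf("before=%d after=%d ratio=%.2fx\n",
		before, after, float64(before)/float64(after))
	// prints: before=2048 after=512 ratio=4.00x
}
```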

Index ¶

func ConvertGGUF(src io.ReadSeeker, dst io.Writer, opts ConvertGGUFOptions) error
func DecomposeKVProjection(wK, wV [][]float64, rank int) (wDKV, wUK, wUV [][]float64, err error)
func ReconstructionError(original [][]float64, wDKV, wUK, wUV [][]float64) float64
type ConvertGGUFOptions
type SVDResult
    func TruncatedSVD(matrix [][]float64, rank int) (*SVDResult, error)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func ConvertGGUF ¶ added in v1.29.0

func ConvertGGUF(src io.ReadSeeker, dst io.Writer, opts ConvertGGUFOptions) error

ConvertGGUF reads a source GGUF file, decomposes K/V projection weights per layer via truncated SVD, and writes a new GGUF with:

All original tensors except k_proj/v_proj weights
Three new tensors per layer: transmla.{layer}.wDKV, wUK, wUV
Metadata: transmla.kv_lora_dim, transmla.source_arch

The source GGUF metadata is preserved in the output.

func DecomposeKVProjection ¶

func DecomposeKVProjection(wK, wV [][]float64, rank int) (wDKV, wUK, wUV [][]float64, err error)

DecomposeKVProjection takes key and value projection weight matrices and decomposes their vertical concatenation [W_K; W_V] via truncated SVD.

W_K is (d_k × d_model) and W_V is (d_v × d_model). The concatenated matrix is ((d_k+d_v) × d_model).

Returns:

wDKV: (d_model × rank) shared down-projection
wUK: (d_k × rank) key up-projection
wUV: (d_v × rank) value up-projection

such that W_K ≈ wUK · wDKV^T and W_V ≈ wUV · wDKV^T.
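
The shape contract above can be checked by hand on a tiny case. This sketch does not call DecomposeKVProjection. It uses hand-written rank-1 factors and a hypothetical local helper, mulABt, to show that multiplying each up-projection by wDKV^T gives matrices of shape (d_k × d_model) and (d_v × d_model).

```go
package main

import "fmt"

// mulABt returns a · bᵀ for row-major matrices a (p×r) and b (q×r).
// It is a local helper for this sketch, not a function of the package.
func mulABt(a, b [][]float64) [][]float64 {
	out := make([][]float64, len(a))
	for i := range a {
		out[i] = make([]float64, len(b))
		for j := range b {
			for k := range a[i] {
				out[i][j] += a[i][k] * b[j][k]
			}
		}
	}
	return out
}

func main() {
	// Hand-written factors with d_model=3, d_k=2, d_v=1, rank=1.
	wDKV := [][]float64{{1}, {2}, {3}} // d_model × rank
	wUK := [][]float64{{1}, {-1}}      // d_k × rank
	wUV := [][]float64{{2}}            // d_v × rank

	wK := mulABt(wUK, wDKV) // d_k × d_model
	wV := mulABt(wUV, wDKV) // d_v × d_model
	fmt.Println(wK)         // [[1 2 3] [-1 -2 -3]]
	fmt.Println(wV)         // [[2 4 6]]
}
```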

func ReconstructionError ¶

func ReconstructionError(original [][]float64, wDKV, wUK, wUV [][]float64) float64

ReconstructionError computes the relative Frobenius norm error between the original concatenated [W_K; W_V] matrix and its low-rank reconstruction from wDKV, wUK, wUV: reconstructed = [wUK; wUV] · wDKV^T.

Returns ||original - reconstructed||_F / ||original||_F.
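
For intuition, here is a minimal sketch of the relative Frobenius error formula itself. It is a standalone illustration with a hypothetical helper, relFrobError, that takes the already reconstructed matrix. It is not the package's implementation and assumes both matrices have the same shape and that the original is non-zero.

```go
package main

import (
	"fmt"
	"math"
)

// relFrobError returns ||orig - recon||_F / ||orig||_F.
// Hypothetical helper: assumes equal shapes and a non-zero orig.
func relFrobError(orig, recon [][]float64) float64 {
	var diff, norm float64
	for i := range orig {
		for j := range orig[i] {
			d := orig[i][j] - recon[i][j]
			diff += d * d
			norm += orig[i][j] * orig[i][j]
		}
	}
	return math.Sqrt(diff) / math.Sqrt(norm)
}

func main() {
	orig := [][]float64{{3, 0}, {0, 4}}
	// Rank-1 approximation that keeps only the larger singular value (4).
	recon := [][]float64{{0, 0}, {0, 4}}
	fmt.Printf("%.2f\n", relFrobError(orig, recon)) // 0.60
}
```

An error of 0 means the low-rank factors reproduce the original exactly; values closer to 1 mean most of the matrix's energy was discarded.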

Types ¶

type ConvertGGUFOptions ¶ added in v1.29.0

type ConvertGGUFOptions struct {
	// Rank is the KV LoRA dimension (truncated SVD rank).
	Rank int
	// SourceArch is the original model architecture name (e.g., "llama").
	SourceArch string
	// OnLayerDone is called after each layer's K/V projection is decomposed.
	// Arguments are the layer index (0-based) and total number of layers.
	// May be nil.
	OnLayerDone func(layer, total int)
}

ConvertGGUFOptions configures the TransMLA GGUF conversion.
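
A progress callback for OnLayerDone might look like the sketch below. To keep the example self-contained, it declares a local options struct that mirrors the documented fields rather than importing the package, and it simulates the per-layer loop. The names options and progressLine and the values used are illustrative assumptions.

```go
package main

import "fmt"

// options mirrors the documented fields of ConvertGGUFOptions so that this
// sketch compiles without importing the package.
type options struct {
	Rank        int
	SourceArch  string
	OnLayerDone func(layer, total int)
}

// progressLine turns the 0-based layer index into a 1-based progress message.
func progressLine(layer, total int) string {
	pct := 100 * float64(layer+1) / float64(total)
	return fmt.Sprintf("layer %d/%d (%.0f%%)", layer+1, total, pct)
}

func main() {
	opts := options{
		Rank:       512,     // illustrative value
		SourceArch: "llama", // example architecture name from the docs
		OnLayerDone: func(layer, total int) {
			fmt.Println(progressLine(layer, total))
		},
	}
	// Stand-in for the converter's per-layer loop. The callback may be nil,
	// so it is checked before each call.
	const total = 4
	for layer := 0; layer < total; layer++ {
		if opts.OnLayerDone != nil {
			opts.OnLayerDone(layer, total)
		}
	}
}
```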

type SVDResult ¶

type SVDResult struct {
	// U is an m×k matrix of left singular vectors.
	U [][]float64
	// S is a length-k vector of singular values in descending order.
	S []float64
	// Vt is a k×n matrix of right singular vectors (transposed).
	Vt [][]float64
}

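SVDResult holds the full or truncated singular value decomposition of a matrix A. For the full decomposition A = U · diag(S) · Vt; for a truncated one the product is the best rank-k approximation of A.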

func TruncatedSVD ¶

func TruncatedSVD(matrix [][]float64, rank int) (*SVDResult, error)

TruncatedSVD computes a rank-r truncated SVD of an m×n matrix using the one-sided Jacobi method. The full SVD is computed first, then truncated to the top-r singular values/vectors.

The input matrix is a row-major m×n slice of slices. All rows must have the same length. Rank must be positive and at most min(m, n).
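
To make the method concrete, here is a compact sketch of a one-sided Jacobi (Hestenes) SVD followed by truncation. It is an independent illustration, not the package's source: the function name jacobiSVD, the sweep limit, and the tolerance are assumptions. It rotates pairs of columns until they are mutually orthogonal, reads the singular values off the column norms, then keeps the top `rank` components.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sort"
)

// jacobiSVD is an illustrative one-sided Jacobi SVD, not the package's code.
// It returns the top-rank factors (u: m×rank, s: rank, vt: rank×n) of a.
func jacobiSVD(a [][]float64, rank int) (u [][]float64, s []float64, vt [][]float64, err error) {
	m := len(a)
	if m == 0 || len(a[0]) == 0 {
		return nil, nil, nil, errors.New("empty matrix")
	}
	n := len(a[0])
	if rank < 1 || rank > m || rank > n {
		return nil, nil, nil, errors.New("rank must be in [1, min(m, n)]")
	}
	// w starts as a copy of a; v starts as the identity and accumulates rotations.
	w := make([][]float64, m)
	for i, row := range a {
		if len(row) != n {
			return nil, nil, nil, errors.New("rows have different lengths")
		}
		w[i] = append([]float64(nil), row...)
	}
	v := make([][]float64, n)
	for i := range v {
		v[i] = make([]float64, n)
		v[i][i] = 1
	}
	for sweep, rotated := 0, true; rotated && sweep < 60; sweep++ {
		rotated = false
		for p := 0; p < n-1; p++ {
			for q := p + 1; q < n; q++ {
				var alpha, beta, gamma float64
				for i := 0; i < m; i++ {
					alpha += w[i][p] * w[i][p]
					beta += w[i][q] * w[i][q]
					gamma += w[i][p] * w[i][q]
				}
				if math.Abs(gamma) <= 1e-14*math.Sqrt(alpha*beta) {
					continue // columns p and q are already orthogonal
				}
				rotated = true
				zeta := (beta - alpha) / (2 * gamma)
				t := 1 / (math.Abs(zeta) + math.Sqrt(1+zeta*zeta))
				if zeta < 0 {
					t = -t
				}
				c := 1 / math.Sqrt(1+t*t)
				sn := c * t
				for _, r := range w { // rotate columns p and q of w
					r[p], r[q] = c*r[p]-sn*r[q], sn*r[p]+c*r[q]
				}
				for _, r := range v { // apply the same rotation to v
					r[p], r[q] = c*r[p]-sn*r[q], sn*r[p]+c*r[q]
				}
			}
		}
	}
	// Column norms of w are the singular values; sort their indices descending.
	norms, order := make([]float64, n), make([]int, n)
	for j := range order {
		order[j] = j
		for i := 0; i < m; i++ {
			norms[j] += w[i][j] * w[i][j]
		}
		norms[j] = math.Sqrt(norms[j])
	}
	sort.Slice(order, func(x, y int) bool { return norms[order[x]] > norms[order[y]] })
	u, s, vt = make([][]float64, m), make([]float64, rank), make([][]float64, rank)
	for i := range u {
		u[i] = make([]float64, rank)
	}
	for k := 0; k < rank; k++ {
		j := order[k]
		s[k] = norms[j]
		vt[k] = make([]float64, n)
		for i := 0; i < n; i++ {
			vt[k][i] = v[i][j]
		}
		for i := 0; i < m && s[k] > 0; i++ {
			u[i][k] = w[i][j] / s[k] // normalized column of w
		}
	}
	return u, s, vt, nil
}

func main() {
	_, s, _, err := jacobiSVD([][]float64{{3, 0}, {4, 5}}, 2)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.4f %.4f\n", s[0], s[1]) // 6.7082 2.2361
}
```

Computing the full decomposition before truncating, as the documentation describes, keeps the code simple at the cost of doing work for components that are later discarded.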
