Documentation ¶
Overview ¶
Package lora implements loading and validation of LoRA (Low-Rank Adaptation) adapter weights from GGUF files.
A LoRA adapter contains per-layer delta matrices A and B such that W_adapted = W_base + scaleFactor * (B @ A), where scaleFactor = alpha / rank. Tensors are identified by naming convention: "lora_a.<layer_name>" and "lora_b.<layer_name>", with rank and alpha stored in GGUF metadata.
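The weight update above can be illustrated with a tiny dense example. The helper applyLoRADelta below is purely illustrative (it is not part of this package) and assumes row-major float64 matrices:

```go
package main

import "fmt"

// applyLoRADelta returns W_adapted = W + (alpha/rank) * B @ A for dense
// row-major matrices. Dimensions: W is [outDim, inDim], A is [rank, inDim],
// B is [outDim, rank]. Hypothetical helper, not the package's API.
func applyLoRADelta(w, a, b []float64, outDim, inDim, rank int, alpha float64) []float64 {
	scale := alpha / float64(rank) // scaleFactor = alpha / rank
	out := make([]float64, len(w))
	for i := 0; i < outDim; i++ {
		for j := 0; j < inDim; j++ {
			var delta float64
			for r := 0; r < rank; r++ {
				delta += b[i*rank+r] * a[r*inDim+j] // (B @ A)[i][j]
			}
			out[i*inDim+j] = w[i*inDim+j] + scale*delta
		}
	}
	return out
}

func main() {
	// 2x2 identity base weight, rank-1 adapter, alpha = 2, so scale = 2.
	w := []float64{1, 0, 0, 1}
	a := []float64{0.5, 0.5} // A: [1, 2]
	b := []float64{1, -1}    // B: [2, 1]
	fmt.Println(applyLoRADelta(w, a, b, 2, 2, 1, 2)) // [2 1 -1 0]
}
```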
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Apply ¶ added in v1.29.0
func Apply(baseOutput []float32, x []float32, layer *Layer, scaleFactor float64, batch, inDim, outDim int) []float32
Apply computes the LoRA delta for a single layer and adds it to the base output.
The LoRA formula: output = base_output + scaleFactor * (x @ A^T @ B^T)
This is computed as two sequential small matmuls:
    hidden = x @ A^T                           (shape: [batch, rank])
    delta  = hidden @ B^T                      (shape: [batch, outDim])
    output = base_output + scaleFactor * delta

x: input activation [batch, inDim] (flattened row-major)
baseOutput: output from the base model's linear layer [batch, outDim] (flattened, modified in place)
layer: LoRA layer with A [rank, inDim] and B [outDim, rank]
scaleFactor: alpha / rank
batch: number of batch elements
inDim: input dimension
outDim: output dimension
Returns the modified baseOutput slice.
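The two-matmul pipeline described above can be sketched as follows. applySketch mirrors the documented behavior of Apply, but takes the A and B matrices as raw slices since the Layer type's fields are not shown in these docs:

```go
package main

import "fmt"

// applySketch computes hidden = x @ A^T, then delta = hidden @ B^T, and adds
// scaleFactor*delta to baseOutput in place. Illustrative only; the real Apply
// takes a *Layer rather than raw loraA/loraB slices.
func applySketch(baseOutput, x, loraA, loraB []float32, scaleFactor float64, batch, inDim, outDim, rank int) []float32 {
	hidden := make([]float32, batch*rank) // [batch, rank]
	for bi := 0; bi < batch; bi++ {
		for r := 0; r < rank; r++ {
			var sum float32
			for j := 0; j < inDim; j++ {
				sum += x[bi*inDim+j] * loraA[r*inDim+j] // A is [rank, inDim]
			}
			hidden[bi*rank+r] = sum
		}
	}
	for bi := 0; bi < batch; bi++ {
		for o := 0; o < outDim; o++ {
			var sum float32
			for r := 0; r < rank; r++ {
				sum += hidden[bi*rank+r] * loraB[o*rank+r] // B is [outDim, rank]
			}
			baseOutput[bi*outDim+o] += float32(scaleFactor) * sum
		}
	}
	return baseOutput
}

func main() {
	// batch=1, inDim=2, outDim=2, rank=1, scaleFactor=2.
	base := []float32{10, 20}
	x := []float32{1, 1}
	a := []float32{0.5, 0.5}
	b := []float32{1, -1}
	fmt.Println(applySketch(base, x, a, b, 2, 1, 2, 2, 1)) // [12 18]
}
```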
func ApplyBatch ¶ added in v1.29.0
func ApplyBatch(baseOutput []float32, x []float32, layer *Layer, scaleFactor float64, batch, inDim, outDim int)
ApplyBatch applies LoRA adaptation to a batch of inputs.

x: [batch * inDim] flattened input
baseOutput: [batch * outDim] flattened base output (modified in place)
Types ¶
type Adapter ¶
Adapter holds LoRA delta matrices and scaling parameters loaded from a GGUF file.
func LoadAdapter ¶
func LoadAdapter(path string, r io.ReadSeeker) (*Adapter, error)
LoadAdapter parses a GGUF file containing LoRA adapter weights and returns a validated Adapter. The GGUF file must contain metadata keys "lora.rank" (uint32) and "lora.alpha" (float32), and tensor pairs named "lora_a.<layer_name>" and "lora_b.<layer_name>".
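A typical call might look like the following sketch. The import path "example.com/lora" is a placeholder, since these docs do not show the module path, and *os.File satisfies io.ReadSeeker:

```go
package main

import (
	"fmt"
	"log"
	"os"

	"example.com/lora" // placeholder import path
)

func main() {
	f, err := os.Open("adapter.gguf")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	adapter, err := lora.LoadAdapter("adapter.gguf", f)
	if err != nil {
		log.Fatal(err) // e.g. missing "lora.rank" / "lora.alpha" metadata
	}
	for _, name := range adapter.LayerNames() {
		fmt.Println(name, adapter.HasLayer(name))
	}
}
```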
func (*Adapter) HasLayer ¶
func (a *Adapter) HasLayer(name string) bool
HasLayer reports whether the adapter has a LoRA adaptation for the given layer name.
func (*Adapter) LayerNames ¶
func (a *Adapter) LayerNames() []string
LayerNames returns the sorted list of layer names that have LoRA adaptations.
type AdapterCache ¶ added in v1.29.0
type AdapterCache struct {
// contains filtered or unexported fields
}
AdapterCache manages loaded LoRA adapters with LRU eviction. Thread-safe for concurrent access from multiple request goroutines.
func NewAdapterCache ¶ added in v1.29.0
func NewAdapterCache(maxSize int) *AdapterCache
NewAdapterCache creates a cache that holds up to maxSize adapters.
func (*AdapterCache) Evict ¶ added in v1.29.0
func (c *AdapterCache) Evict(name string)
Evict removes a specific adapter from the cache.
func (*AdapterCache) Get ¶ added in v1.29.0
func (c *AdapterCache) Get(name string) *Adapter
Get returns a cached adapter by name, or nil if it is not cached. A successful lookup moves the entry to the most-recently-used position.
func (*AdapterCache) GetOrLoad ¶ added in v1.29.0
func (c *AdapterCache) GetOrLoad(name, path string) (*Adapter, error)
GetOrLoad returns a cached adapter or loads it from path. If loading causes the cache to exceed maxSize, the least-recently-used adapter is evicted.
func (*AdapterCache) Names ¶ added in v1.29.0
func (c *AdapterCache) Names() []string
Names returns the names of all cached adapters in LRU order (oldest first).
func (*AdapterCache) Put ¶ added in v1.29.0
func (c *AdapterCache) Put(name string, adapter *Adapter)
Put inserts an adapter directly into the cache. If the cache is full, the least-recently-used adapter is evicted.
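The LRU behavior documented for Get, GetOrLoad, Put, and Names can be sketched with a minimal cache over strings. miniLRU is an illustration of the eviction policy only, not the package's actual (adapter-holding, thread-safe) implementation:

```go
package main

import (
	"container/list"
	"fmt"
)

// miniLRU illustrates LRU eviction: front of the list is least recently
// used, back is most recently used. Hypothetical, keys only, not thread-safe.
type miniLRU struct {
	max   int
	order *list.List // front = oldest
	items map[string]*list.Element
}

func newMiniLRU(max int) *miniLRU {
	return &miniLRU{max: max, order: list.New(), items: map[string]*list.Element{}}
}

// Put inserts a name; if the cache is full, the least-recently-used
// entry is evicted, mirroring the documented Put/GetOrLoad behavior.
func (c *miniLRU) Put(name string) {
	if el, ok := c.items[name]; ok {
		c.order.MoveToBack(el) // already cached: mark most recently used
		return
	}
	if c.order.Len() >= c.max {
		oldest := c.order.Front()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(string))
	}
	c.items[name] = c.order.PushBack(name)
}

// Names returns cached names in LRU order (oldest first), like AdapterCache.Names.
func (c *miniLRU) Names() []string {
	var names []string
	for el := c.order.Front(); el != nil; el = el.Next() {
		names = append(names, el.Value.(string))
	}
	return names
}

func main() {
	c := newMiniLRU(2)
	c.Put("a")
	c.Put("b")
	c.Put("a")             // touch "a": "b" becomes least recently used
	c.Put("c")             // cache full: evicts "b"
	fmt.Println(c.Names()) // [a c]
}
```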
func (*AdapterCache) Size ¶ added in v1.29.0
func (c *AdapterCache) Size() int
Size returns the number of cached adapters.