Documentation ¶
Overview ¶
Package lora implements loading and validation of LoRA (Low-Rank Adaptation) adapter weights from GGUF files.
A LoRA adapter contains per-layer delta matrices A and B such that W_adapted = W_base + scaleFactor * (B @ A), where scaleFactor = alpha / rank. Tensors are identified by naming convention: "lora_a.<layer_name>" and "lora_b.<layer_name>", with rank and alpha stored in GGUF metadata.
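The weight update above can be illustrated with a tiny dense example. The helper applyLoRADelta below is purely illustrative (it is not part of this package) and assumes row-major float64 matrices:

```go
package main

import "fmt"

// applyLoRADelta returns W_adapted = W + (alpha/rank) * B @ A for dense
// row-major matrices. Dimensions: W is [outDim, inDim], A is [rank, inDim],
// B is [outDim, rank]. Hypothetical helper, not the package's API.
func applyLoRADelta(w, a, b []float64, outDim, inDim, rank int, alpha float64) []float64 {
	scale := alpha / float64(rank) // scaleFactor = alpha / rank
	out := make([]float64, len(w))
	for i := 0; i < outDim; i++ {
		for j := 0; j < inDim; j++ {
			var delta float64
			for r := 0; r < rank; r++ {
				delta += b[i*rank+r] * a[r*inDim+j] // (B @ A)[i][j]
			}
			out[i*inDim+j] = w[i*inDim+j] + scale*delta
		}
	}
	return out
}

func main() {
	// 2x2 identity base weight, rank-1 adapter, alpha = 2, so scale = 2.
	w := []float64{1, 0, 0, 1}
	a := []float64{0.5, 0.5} // A: [1, 2]
	b := []float64{1, -1}    // B: [2, 1]
	fmt.Println(applyLoRADelta(w, a, b, 2, 2, 1, 2)) // [2 1 -1 0]
}
```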
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Apply ¶ added in v1.29.0
func Apply(baseOutput []float32, x []float32, layer *Layer, scaleFactor float64, batch, inDim, outDim int) []float32
Apply computes the LoRA delta for a single layer and adds it to the base output.
The LoRA formula: output = base_output + scaleFactor * (x @ A^T @ B^T)
This is computed as two sequential small matmuls:
    hidden = x @ A^T                           (shape: [batch, rank])
    delta  = hidden @ B^T                      (shape: [batch, outDim])
    output = base_output + scaleFactor * delta

x: input activation [batch, inDim] (flattened row-major)
baseOutput: output from the base model's linear layer [batch, outDim] (flattened, modified in place)
layer: LoRA layer with A [rank, inDim] and B [outDim, rank]
scaleFactor: alpha / rank
batch: number of batch elements
inDim: input dimension
outDim: output dimension
Returns the modified baseOutput slice.
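The two-matmul pipeline described above can be sketched as follows. applySketch mirrors the documented behavior of Apply, but takes the A and B matrices as raw slices since the Layer type's fields are not shown in these docs:

```go
package main

import "fmt"

// applySketch computes hidden = x @ A^T, then delta = hidden @ B^T, and adds
// scaleFactor*delta to baseOutput in place. Illustrative only; the real Apply
// takes a *Layer rather than raw loraA/loraB slices.
func applySketch(baseOutput, x, loraA, loraB []float32, scaleFactor float64, batch, inDim, outDim, rank int) []float32 {
	hidden := make([]float32, batch*rank) // [batch, rank]
	for bi := 0; bi < batch; bi++ {
		for r := 0; r < rank; r++ {
			var sum float32
			for j := 0; j < inDim; j++ {
				sum += x[bi*inDim+j] * loraA[r*inDim+j] // A is [rank, inDim]
			}
			hidden[bi*rank+r] = sum
		}
	}
	for bi := 0; bi < batch; bi++ {
		for o := 0; o < outDim; o++ {
			var sum float32
			for r := 0; r < rank; r++ {
				sum += hidden[bi*rank+r] * loraB[o*rank+r] // B is [outDim, rank]
			}
			baseOutput[bi*outDim+o] += float32(scaleFactor) * sum
		}
	}
	return baseOutput
}

func main() {
	// batch=1, inDim=2, outDim=2, rank=1, scaleFactor=2.
	base := []float32{10, 20}
	x := []float32{1, 1}
	a := []float32{0.5, 0.5}
	b := []float32{1, -1}
	fmt.Println(applySketch(base, x, a, b, 2, 1, 2, 2, 1)) // [12 18]
}
```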
func ApplyBatch ¶ added in v1.29.0
func ApplyBatch(baseOutput []float32, x []float32, layer *Layer, scaleFactor float64, batch, inDim, outDim int)
ApplyBatch applies LoRA adaptation to a batch of inputs.

x: [batch * inDim] flattened input
baseOutput: [batch * outDim] flattened base output (modified in place)
Types ¶
type Adapter ¶
Adapter holds LoRA delta matrices and scaling parameters loaded from a GGUF file.
func LoadAdapter ¶
func LoadAdapter(path string, r io.ReadSeeker) (*Adapter, error)
LoadAdapter parses a GGUF file containing LoRA adapter weights and returns a validated Adapter. The GGUF file must contain metadata keys "lora.rank" (uint32) and "lora.alpha" (float32), and tensor pairs named "lora_a.<layer_name>" and "lora_b.<layer_name>".
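A typical call might look like the following sketch. The import path "example.com/lora" is a placeholder, since these docs do not show the module path, and *os.File satisfies io.ReadSeeker:

```go
package main

import (
	"fmt"
	"log"
	"os"

	"example.com/lora" // placeholder import path
)

func main() {
	f, err := os.Open("adapter.gguf")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	adapter, err := lora.LoadAdapter("adapter.gguf", f)
	if err != nil {
		log.Fatal(err) // e.g. missing "lora.rank" / "lora.alpha" metadata
	}
	for _, name := range adapter.LayerNames() {
		fmt.Println(name, adapter.HasLayer(name))
	}
}
```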
func (*Adapter) HasLayer ¶
func (a *Adapter) HasLayer(name string) bool
HasLayer reports whether the adapter has a LoRA adaptation for the given layer name.
func (*Adapter) LayerNames ¶
func (a *Adapter) LayerNames() []string
LayerNames returns the sorted list of layer names that have LoRA adaptations.
type AdapterCache ¶ added in v1.29.0
type AdapterCache struct {
// contains filtered or unexported fields
}
AdapterCache manages loaded LoRA adapters with LRU eviction. Thread-safe for concurrent access from multiple request goroutines.
func NewAdapterCache ¶ added in v1.29.0
func NewAdapterCache(maxSize int) *AdapterCache
NewAdapterCache creates a cache that holds up to maxSize adapters.
func (*AdapterCache) Evict ¶ added in v1.29.0
func (c *AdapterCache) Evict(name string)
Evict removes a specific adapter from the cache.
func (*AdapterCache) Get ¶ added in v1.29.0
func (c *AdapterCache) Get(name string) *Adapter
Get returns a cached adapter by name, or nil if it is not cached. A successful lookup moves the entry to the most-recently-used position.
func (*AdapterCache) GetOrLoad ¶ added in v1.29.0
func (c *AdapterCache) GetOrLoad(name, path string) (*Adapter, error)
GetOrLoad returns a cached adapter or loads it from path. If loading causes the cache to exceed maxSize, the least-recently-used adapter is evicted.
func (*AdapterCache) Names ¶ added in v1.29.0
func (c *AdapterCache) Names() []string
Names returns the names of all cached adapters in LRU order (oldest first).
func (*AdapterCache) Put ¶ added in v1.29.0
func (c *AdapterCache) Put(name string, adapter *Adapter)
Put inserts an adapter directly into the cache. If the cache is full, the least-recently-used adapter is evicted.
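The LRU behavior documented for Get, GetOrLoad, Put, and Names can be sketched with a minimal cache over strings. miniLRU is an illustration of the eviction policy only, not the package's actual (adapter-holding, thread-safe) implementation:

```go
package main

import (
	"container/list"
	"fmt"
)

// miniLRU illustrates LRU eviction: front of the list is least recently
// used, back is most recently used. Hypothetical, keys only, not thread-safe.
type miniLRU struct {
	max   int
	order *list.List // front = oldest
	items map[string]*list.Element
}

func newMiniLRU(max int) *miniLRU {
	return &miniLRU{max: max, order: list.New(), items: map[string]*list.Element{}}
}

// Put inserts a name; if the cache is full, the least-recently-used
// entry is evicted, mirroring the documented Put/GetOrLoad behavior.
func (c *miniLRU) Put(name string) {
	if el, ok := c.items[name]; ok {
		c.order.MoveToBack(el) // already cached: mark most recently used
		return
	}
	if c.order.Len() >= c.max {
		oldest := c.order.Front()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(string))
	}
	c.items[name] = c.order.PushBack(name)
}

// Names returns cached names in LRU order (oldest first), like AdapterCache.Names.
func (c *miniLRU) Names() []string {
	var names []string
	for el := c.order.Front(); el != nil; el = el.Next() {
		names = append(names, el.Value.(string))
	}
	return names
}

func main() {
	c := newMiniLRU(2)
	c.Put("a")
	c.Put("b")
	c.Put("a")             // touch "a": "b" becomes least recently used
	c.Put("c")             // cache full: evicts "b"
	fmt.Println(c.Names()) // [a c]
}
```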
func (*AdapterCache) Size ¶ added in v1.29.0
func (c *AdapterCache) Size() int
Size returns the number of cached adapters.