kv

package
v0.3.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 21, 2026 License: Apache-2.0 Imports: 4 Imported by: 0

Documentation

Overview

Package kv provides paged-attention KV cache primitives for transformer inference. The BlockPool pre-allocates fixed-size blocks of key/value storage so that the hot path (Alloc / Free) is allocation-free.

Index

Constants

View Source
const DefaultBlockSize = 16

DefaultBlockSize is the number of token positions per block.

Variables

This section is empty.

Functions

This section is empty.

Types

type Block

type Block[T tensor.Numeric] struct {
	K    []T
	V    []T
	Used int // number of token positions written (0..blockSize)
}

Block holds pre-allocated key and value data for a fixed number of token positions across all layers. K and V each have numLayers * blockSize * headDim elements laid out as [layer][position][headDim] in row-major order.

type BlockPool

type BlockPool[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}

BlockPool manages a fixed-size pool of pre-allocated KV cache blocks. Blocks are allocated at startup and recycled via Alloc/Free. All methods are safe for concurrent use.

func NewBlockPool

func NewBlockPool[T tensor.Numeric](numBlocks, numLayers, blockSize, headDim int) (*BlockPool[T], error)

NewBlockPool creates a pool with numBlocks pre-allocated blocks. Each block holds K and V data for blockSize token positions across numLayers, with headDim elements per position per layer. blockSize defaults to DefaultBlockSize (16) when zero.

func (*BlockPool[T]) Alloc

func (p *BlockPool[T]) Alloc() (*Block[T], error)

Alloc returns a free block from the pool. Returns an error if the pool is exhausted. The returned block has Used reset to 0.

func (*BlockPool[T]) Available

func (p *BlockPool[T]) Available() int

Available returns the number of free blocks.

func (*BlockPool[T]) BlockSize

func (p *BlockPool[T]) BlockSize() int

BlockSize returns the number of token positions per block.

func (*BlockPool[T]) Cap

func (p *BlockPool[T]) Cap() int

Cap returns the total number of blocks in the pool.

func (*BlockPool[T]) FragmentationRatio

func (p *BlockPool[T]) FragmentationRatio() float64

FragmentationRatio returns the fraction of allocated blocks that are partially used (0 < Used < blockSize). A ratio of 0.0 means all allocated blocks are either empty or full; 1.0 means every allocated block is partially filled. Returns 0.0 when no blocks are allocated.

func (*BlockPool[T]) Free

func (p *BlockPool[T]) Free(b *Block[T])

Free returns a block to the pool. The block's Used counter is reset to 0.

func (*BlockPool[T]) HeadDim

func (p *BlockPool[T]) HeadDim() int

HeadDim returns the head dimension per position per layer.

func (*BlockPool[T]) NumLayers

func (p *BlockPool[T]) NumLayers() int

NumLayers returns the number of layers per block.

type BlockTable

type BlockTable[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}

BlockTable maps a single sequence's logical token positions to physical blocks allocated from a BlockPool. It grows automatically as tokens are appended and returns all blocks to the pool on Free.

func NewBlockTable

func NewBlockTable[T tensor.Numeric](pool *BlockPool[T]) *BlockTable[T]

NewBlockTable creates a BlockTable backed by the given pool.

func (*BlockTable[T]) Append

func (bt *BlockTable[T]) Append(tokenCount int) error

Append adds tokenCount new token positions to the table, allocating additional blocks from the pool as needed.

func (*BlockTable[T]) BlockCount

func (bt *BlockTable[T]) BlockCount() int

BlockCount returns the number of blocks currently held.

func (*BlockTable[T]) Blocks

func (bt *BlockTable[T]) Blocks() []*Block[T]

Blocks returns the ordered slice of physical blocks held by this table.

func (*BlockTable[T]) Free

func (bt *BlockTable[T]) Free()

Free returns all blocks to the pool and resets the table.

func (*BlockTable[T]) Lookup

func (bt *BlockTable[T]) Lookup(logicalPos int) (*Block[T], int, error)

Lookup returns the physical block and offset within that block for a logical token position.

func (*BlockTable[T]) Pool

func (bt *BlockTable[T]) Pool() *BlockPool[T]

Pool returns the BlockPool backing this table.

func (*BlockTable[T]) TokenCount

func (bt *BlockTable[T]) TokenCount() int

TokenCount returns the total number of token positions tracked.

type PagedKVCache

type PagedKVCache[T tensor.Numeric] struct {
	Table *BlockTable[T]
}

PagedKVCache bundles a BlockTable and its backing BlockPool into a single value that can be passed to the compute engine's paged GQA method.

func NewPagedKVCache

func NewPagedKVCache[T tensor.Numeric](table *BlockTable[T]) *PagedKVCache[T]

NewPagedKVCache creates a PagedKVCache wrapping the given BlockTable.

func (*PagedKVCache[T]) BlockSize

func (c *PagedKVCache[T]) BlockSize() int

BlockSize returns the number of token positions per block.

func (*PagedKVCache[T]) HeadDim

func (c *PagedKVCache[T]) HeadDim() int

HeadDim returns the head dimension from the backing pool.

func (*PagedKVCache[T]) SeqLen

func (c *PagedKVCache[T]) SeqLen() int

SeqLen returns the number of token positions in the cache.

type RadixNode

type RadixNode[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}

RadixNode is a node in the radix tree. Each edge stores a slice of token IDs and a corresponding slice of KV blocks. Children are keyed by the first token ID of their edge.

type RadixTree

type RadixTree[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}

RadixTree stores shared prompt prefixes so that KV blocks can be reused across sequences that share a common prefix. It is safe for concurrent use.

func NewRadixTree

func NewRadixTree[T tensor.Numeric](capacity int) *RadixTree[T]

NewRadixTree creates a radix tree that caches up to capacity KV blocks.

func (*RadixTree[T]) Evict

func (rt *RadixTree[T]) Evict()

Evict removes the least recently used leaf node from the tree.

func (*RadixTree[T]) Insert

func (rt *RadixTree[T]) Insert(tokenIDs []int32, blocks []*Block[T])

Insert adds a token prefix and its associated KV blocks into the tree. If the prefix already exists, the blocks are updated. When the tree exceeds capacity, LRU leaf nodes are evicted.

func (*RadixTree[T]) Match

func (rt *RadixTree[T]) Match(prefix []int32) (matchedBlocks []*Block[T], matchedLen int)

Match returns the KV blocks for the longest matching prefix and the number of tokens matched. If no prefix matches, matchedLen is 0 and blocks is nil.

func (*RadixTree[T]) Size

func (rt *RadixTree[T]) Size() int

Size returns the total number of blocks cached in the tree.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL