kv

package

v0.3.0 Latest Latest Go to latest Published: Mar 21, 2026 License: Apache-2.0 Imports: 4 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/zerfoo/ztensor

Links

Open Source Insights

Documentation ¶

Overview ¶

Package kv provides paged-attention KV cache primitives for transformer inference. The BlockPool pre-allocates fixed-size blocks of key/value storage so that the hot path (Alloc / Free) is allocation-free.

Index ¶

Constants
type Block
type BlockPool
- func NewBlockPool[T tensor.Numeric](numBlocks, numLayers, blockSize, headDim int) (*BlockPool[T], error)
type BlockTable
- func NewBlockTable[T tensor.Numeric](pool *BlockPool[T]) *BlockTable[T]
type PagedKVCache
- func NewPagedKVCache[T tensor.Numeric](table *BlockTable[T]) *PagedKVCache[T]
type RadixNode
type RadixTree
- func NewRadixTree[T tensor.Numeric](capacity int) *RadixTree[T]

Constants ¶

View Source

const DefaultBlockSize = 16

DefaultBlockSize is the number of token positions per block.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Block ¶

type Block[T tensor.Numeric] struct {
	K    []T
	V    []T
	Used int // number of token positions written (0..blockSize)
}

Block holds pre-allocated key and value data for a fixed number of token positions across all layers. K and V each have numLayers * blockSize * headDim elements laid out as [layer][position][headDim] in row-major order.

type BlockPool ¶

type BlockPool[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}

BlockPool manages a fixed-size pool of pre-allocated KV cache blocks. Blocks are allocated at startup and recycled via Alloc/Free. All methods are safe for concurrent use.

func NewBlockPool ¶

func NewBlockPool[T tensor.Numeric](numBlocks, numLayers, blockSize, headDim int) (*BlockPool[T], error)

NewBlockPool creates a pool with numBlocks pre-allocated blocks. Each block holds K and V data for blockSize token positions across numLayers, with headDim elements per position per layer. blockSize defaults to DefaultBlockSize (16) when zero.

func (*BlockPool[T]) Alloc ¶

func (p *BlockPool[T]) Alloc() (*Block[T], error)

Alloc returns a free block from the pool. Returns an error if the pool is exhausted. The returned block has Used reset to 0.

func (*BlockPool[T]) Available ¶

func (p *BlockPool[T]) Available() int

Available returns the number of free blocks.

func (*BlockPool[T]) BlockSize ¶

func (p *BlockPool[T]) BlockSize() int

BlockSize returns the number of token positions per block.

func (*BlockPool[T]) Cap ¶

func (p *BlockPool[T]) Cap() int

Cap returns the total number of blocks in the pool.

func (*BlockPool[T]) FragmentationRatio ¶

func (p *BlockPool[T]) FragmentationRatio() float64

FragmentationRatio returns the fraction of allocated blocks that are partially used (0 < Used < blockSize). A ratio of 0.0 means all allocated blocks are either empty or full; 1.0 means every allocated block is partially filled. Returns 0.0 when no blocks are allocated.

func (*BlockPool[T]) Free ¶

func (p *BlockPool[T]) Free(b *Block[T])

Free returns a block to the pool. The block's Used counter is reset to 0.

func (*BlockPool[T]) HeadDim ¶

func (p *BlockPool[T]) HeadDim() int

HeadDim returns the head dimension per position per layer.

func (*BlockPool[T]) NumLayers ¶

func (p *BlockPool[T]) NumLayers() int

NumLayers returns the number of layers per block.

type BlockTable ¶

type BlockTable[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}

BlockTable maps a single sequence's logical token positions to physical blocks allocated from a BlockPool. It grows automatically as tokens are appended and returns all blocks to the pool on Free.

func NewBlockTable ¶

func NewBlockTable[T tensor.Numeric](pool *BlockPool[T]) *BlockTable[T]

NewBlockTable creates a BlockTable backed by the given pool.

func (*BlockTable[T]) Append ¶

func (bt *BlockTable[T]) Append(tokenCount int) error

Append adds tokenCount new token positions to the table, allocating additional blocks from the pool as needed.

func (*BlockTable[T]) BlockCount ¶

func (bt *BlockTable[T]) BlockCount() int

BlockCount returns the number of blocks currently held.

func (*BlockTable[T]) Blocks ¶

func (bt *BlockTable[T]) Blocks() []*Block[T]

Blocks returns the ordered slice of physical blocks held by this table.

func (*BlockTable[T]) Free ¶

func (bt *BlockTable[T]) Free()

Free returns all blocks to the pool and resets the table.

func (*BlockTable[T]) Lookup ¶

func (bt *BlockTable[T]) Lookup(logicalPos int) (*Block[T], int, error)

Lookup returns the physical block and offset within that block for a logical token position.

func (*BlockTable[T]) Pool ¶

func (bt *BlockTable[T]) Pool() *BlockPool[T]

Pool returns the BlockPool backing this table.

func (*BlockTable[T]) TokenCount ¶

func (bt *BlockTable[T]) TokenCount() int

TokenCount returns the total number of token positions tracked.

type PagedKVCache ¶

type PagedKVCache[T tensor.Numeric] struct {
	Table *BlockTable[T]
}

PagedKVCache bundles a BlockTable and its backing BlockPool into a single value that can be passed to the compute engine's paged GQA method.

func NewPagedKVCache ¶

func NewPagedKVCache[T tensor.Numeric](table *BlockTable[T]) *PagedKVCache[T]

NewPagedKVCache creates a PagedKVCache wrapping the given BlockTable.

func (*PagedKVCache[T]) BlockSize ¶

func (c *PagedKVCache[T]) BlockSize() int

BlockSize returns the number of token positions per block.

func (*PagedKVCache[T]) HeadDim ¶

func (c *PagedKVCache[T]) HeadDim() int

HeadDim returns the head dimension from the backing pool.

func (*PagedKVCache[T]) SeqLen ¶

func (c *PagedKVCache[T]) SeqLen() int

SeqLen returns the number of token positions in the cache.

type RadixNode ¶

type RadixNode[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}

RadixNode is a node in the radix tree. Each edge stores a slice of token IDs and a corresponding slice of KV blocks. Children are keyed by the first token ID of their edge.

type RadixTree ¶

type RadixTree[T tensor.Numeric] struct {
	// contains filtered or unexported fields
}

RadixTree stores shared prompt prefixes so that KV blocks can be reused across sequences that share a common prefix. It is safe for concurrent use.

func NewRadixTree ¶

func NewRadixTree[T tensor.Numeric](capacity int) *RadixTree[T]

NewRadixTree creates a radix tree that caches up to capacity KV blocks.

func (*RadixTree[T]) Evict ¶

func (rt *RadixTree[T]) Evict()

Evict removes the least recently used leaf node from the tree.

func (*RadixTree[T]) Insert ¶

func (rt *RadixTree[T]) Insert(tokenIDs []int32, blocks []*Block[T])

Insert adds a token prefix and its associated KV blocks into the tree. If the prefix already exists, the blocks are updated. When the tree exceeds capacity, LRU leaf nodes are evicted.

func (*RadixTree[T]) Match ¶

func (rt *RadixTree[T]) Match(prefix []int32) (matchedBlocks []*Block[T], matchedLen int)

Match returns the KV blocks for the longest matching prefix and the number of tokens matched. If no prefix matches, matchedLen is 0 and blocks is nil.

func (*RadixTree[T]) Size ¶

func (rt *RadixTree[T]) Size() int

Size returns the total number of blocks cached in the tree.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL