capacity

package
v0.32.7
Published: Jun 24, 2026 License: Apache-2.0 Imports: 6 Imported by: 0

Documentation

Overview

Package capacity is modeld's hardware capacity planner: it resolves the EFFECTIVE context window a model can actually be served at on this device, from the model's KV-cache footprint and the device's free memory — not the model's trained ceiling alone. modeld owns this calculation because it owns the backend process and hardware telemetry; the runtime consumes the resolved value and does not inspect model files.

Constants

const DefaultHeadroomFrac = 0.1

DefaultHeadroomFrac is the fraction of free memory reserved for activations, the compute graph, and fragmentation; the remainder is available for model weights and the KV cache.

const DefaultHostColdFrac = 0.25

DefaultHostColdFrac is the launch-time cap for the host-RAM KV cold store when the user did not set one explicitly.

const DefaultMaxResidentFrac = 0.8

DefaultMaxResidentFrac caps modeld's resident footprint at this fraction of the device's CURRENTLY free memory when the user did not set an explicit ceiling. It is evaluated fresh on every resolution, so the budget tracks the device live instead of freezing a launch-time view.

Variables

This section is empty.

Functions

func HeadroomFromEnv

func HeadroomFromEnv() float64

HeadroomFromEnv reads CONTENOX_MODELD_MEM_HEADROOM (a fraction in (0,1)), falling back to DefaultHeadroomFrac.
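The fallback rule can be illustrated with a minimal sketch. The helper names parseHeadroom and headroomFromEnv are hypothetical stand-ins, and the handling of malformed input (treating it like a missing value) is an assumption; only the variable name, the (0,1) range, and the 0.1 default come from the documentation above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultHeadroomFrac mirrors the documented DefaultHeadroomFrac constant.
const defaultHeadroomFrac = 0.1

// parseHeadroom (hypothetical) accepts only fractions strictly inside (0,1).
// Empty, malformed, NaN, or out-of-range input yields the default; that
// behaviour for malformed input is an assumption of this sketch.
func parseHeadroom(raw string) float64 {
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil || !(v > 0 && v < 1) {
		return defaultHeadroomFrac
	}
	return v
}

// headroomFromEnv (hypothetical) reads the documented environment variable.
func headroomFromEnv() float64 {
	return parseHeadroom(os.Getenv("CONTENOX_MODELD_MEM_HEADROOM"))
}

func main() {
	os.Setenv("CONTENOX_MODELD_MEM_HEADROOM", "0.2")
	fmt.Println(headroomFromEnv()) // value taken from the environment

	os.Setenv("CONTENOX_MODELD_MEM_HEADROOM", "1.5")
	fmt.Println(headroomFromEnv()) // out of range, so the default applies
}
```

Splitting the parsing from the environment lookup keeps the range check testable without touching process state.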

func KVBytesPerToken

func KVBytesPerToken(nLayers, nKVHeads, headDim int, kvType string) int64

KVBytesPerToken is the memory one token of context costs in the KV cache: K and V, across every layer and KV head, at the KV precision.
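As a worked illustration of that description, the cost is the product of two tensors (K and V), the layer count, the KV head count, the head dimension, and the element size. The sketch below is not the package's implementation: the kvType labels, their byte sizes, and the 16-bit fallback are assumptions chosen for the example.

```go
package main

import "fmt"

// bytesPerElement maps a KV precision label to an element size in bytes.
// The labels and the fallback are assumptions for illustration; the real
// package may accept other kvType strings, including quantized types.
func bytesPerElement(kvType string) int64 {
	switch kvType {
	case "f32":
		return 4
	case "f16", "bf16":
		return 2
	default:
		return 2 // assumed fallback: treat unknown types as 16-bit
	}
}

// kvBytesPerToken (hypothetical) multiplies out the per-token KV footprint.
func kvBytesPerToken(nLayers, nKVHeads, headDim int, kvType string) int64 {
	// The factor 2 accounts for storing both K and V.
	return 2 * int64(nLayers) * int64(nKVHeads) * int64(headDim) * bytesPerElement(kvType)
}

func main() {
	// 32 layers, 8 KV heads, head dimension 128, 16-bit cache:
	// 2 * 32 * 8 * 128 * 2 = 131072 bytes (128 KiB) per token.
	fmt.Println(kvBytesPerToken(32, 8, 128, "f16"))
}
```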

func ParseBytes added in v0.32.4

func ParseBytes(s string) (int64, error)

ParseBytes parses the byte-size strings used by modeld memory settings, such as "8GiB" or "512MiB".
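A parser of this kind might look like the following sketch. The accepted suffix set (KiB, MiB, GiB, TiB), the acceptance of bare integers, and the rejection of negative values are all assumptions; the documentation only confirms strings in the style of "8GiB" and "512MiB". Overflow is not checked here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBytes (hypothetical) converts strings like "8GiB" into a byte count.
// A bare integer is treated as bytes. The suffix list is an assumption.
func parseBytes(s string) (int64, error) {
	s = strings.TrimSpace(s)
	units := []struct {
		suffix string
		mult   int64
	}{
		{"KiB", 1 << 10},
		{"MiB", 1 << 20},
		{"GiB", 1 << 30},
		{"TiB", 1 << 40},
	}
	mult := int64(1)
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			mult = u.mult
			s = strings.TrimSpace(strings.TrimSuffix(s, u.suffix))
			break
		}
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("parse byte size %q: %w", s, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("negative byte size %d", n)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"8GiB", "512MiB", "4096", "lots"} {
		n, err := parseBytes(in)
		fmt.Println(in, n, err)
	}
}
```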

Types

type DeviceSnapshot added in v0.32.4

type DeviceSnapshot struct {
	Kind              string `json:"kind,omitempty"`
	DeviceID          string `json:"device_id,omitempty"`
	TotalBytes        int64  `json:"total_bytes,omitempty"`
	FreeBytes         int64  `json:"free_bytes,omitempty"`
	SharedWithDisplay bool   `json:"shared_with_display,omitempty"`
}

DeviceSnapshot describes the memory pool the backend will allocate from.

func Snapshot added in v0.32.4

func Snapshot(src MemorySource) (DeviceSnapshot, error)

Snapshot returns a DeviceSnapshot for src. A richer source that also provides its own Snapshot method is asked for the full snapshot; a legacy source that only implements FreeBytes is still accepted.
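One common way to support both kinds of source in Go is an optional-interface check, sketched below. The snapshotter interface name, the trimmed DeviceSnapshot fields, and the choice to fill only FreeBytes for legacy sources are assumptions of this example, not details confirmed by the package.

```go
package main

import "fmt"

// DeviceSnapshot is a trimmed copy of the documented struct.
type DeviceSnapshot struct {
	Kind       string
	TotalBytes int64
	FreeBytes  int64
}

// MemorySource matches the documented interface.
type MemorySource interface {
	FreeBytes() (int64, error)
}

// snapshotter is a hypothetical name for the richer, optional capability.
type snapshotter interface {
	Snapshot() (DeviceSnapshot, error)
}

// snapshot prefers the richer method and falls back to FreeBytes.
func snapshot(src MemorySource) (DeviceSnapshot, error) {
	if s, ok := src.(snapshotter); ok {
		return s.Snapshot()
	}
	free, err := src.FreeBytes()
	if err != nil {
		return DeviceSnapshot{}, err
	}
	// Assumption: a legacy source can only populate the free-byte count.
	return DeviceSnapshot{FreeBytes: free}, nil
}

// legacySource implements only FreeBytes.
type legacySource struct{ free int64 }

func (l legacySource) FreeBytes() (int64, error) { return l.free, nil }

// richSource additionally implements Snapshot.
type richSource struct{ legacySource }

func (r richSource) Snapshot() (DeviceSnapshot, error) {
	return DeviceSnapshot{Kind: "cpu", TotalBytes: 2 * r.free, FreeBytes: r.free}, nil
}

func main() {
	fmt.Println(snapshot(legacySource{free: 100}))
	fmt.Println(snapshot(richSource{legacySource{free: 100}}))
}
```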

type MemorySource

type MemorySource interface {
	FreeBytes() (int64, error)
}

MemorySource reports the free memory of the device a backend serves on. modeld picks the source by device: system RAM for CPU; GPU VRAM (ov::Core / ggml) is a CGO seam filled per backend when a GPU device is selected.

type ModelCapacity

type ModelCapacity struct {
	ModelMaxContext         int
	EffectiveContext        int
	MemoryContextTokens     int
	HotContextTokens        int
	PlannerEffectiveContext int
	KVBytesPerToken         int64
	FreeBytes               int64
	WeightsBytes            int64
	OverheadBytes           int64
	ReservedBytes           int64
	UserLimitBytes          int64
	MinFreeBytes            int64
	HostColdBudgetBytes     int64
	UsableBytes             int64
	RequiredBytes           int64
	Clamped                 bool
	Reason                  string
}

ModelCapacity is the resolved result reported to the runtime. EffectiveContext remains the dense context window modeld will actually serve today and the value the cache identity must use. MemoryContextTokens is the raw KV-token budget from memory before model/request clamping. HotContextTokens is the physical hot KV budget. PlannerEffectiveContext is the logical planner window: it equals the dense window when no host cold budget exists, and can grow by the cold KV token budget once host offload is configured.

func Resolve

func Resolve(p Params) ModelCapacity

Resolve computes the dense compatibility window, physical hot context budget, and logical planner window:

usable = min(free - minFree, userLimit - reserved) * (1 - headroom)
effective = clamp(request, 0, min(modelMax, (usable - weights - overhead) / kvBytesPerToken))

Unknown inputs degrade gracefully: with no KV cost it falls back to the model ceiling (clamped by request); with no ceiling it uses the memory budget.
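The two formulas and the degradation rules can be traced with a simplified sketch. It covers only the dense window (not the hot or planner budgets), uses a hypothetical effectiveContext function and a trimmed params struct, and makes assumptions where the documentation is silent: a zero UserLimitBytes skips the user-limit term, a negative KV budget clamps to zero tokens, and the request is returned unchanged when neither limit is known.

```go
package main

import "fmt"

// params is a trimmed copy of the documented Params struct.
type params struct {
	ModelMaxCtx     int
	KVBytesPerToken int64
	WeightsBytes    int64
	OverheadBytes   int64
	FreeBytes       int64
	ReservedBytes   int64
	UserLimitBytes  int64
	MinFreeBytes    int64
	Request         int
	HeadroomFrac    float64
}

// effectiveContext (hypothetical) follows the documented formulas for the
// dense window only.
func effectiveContext(p params) int {
	headroom := p.HeadroomFrac
	if headroom <= 0 || headroom >= 1 {
		headroom = 0.1 // documented DefaultHeadroomFrac
	}
	limit, known := p.ModelMaxCtx, p.ModelMaxCtx > 0
	if p.KVBytesPerToken > 0 {
		budget := p.FreeBytes - p.MinFreeBytes
		// Assumption: UserLimitBytes == 0 means no cap, so the term is skipped.
		if capped := p.UserLimitBytes - p.ReservedBytes; p.UserLimitBytes > 0 && capped < budget {
			budget = capped
		}
		usable := int64(float64(budget) * (1 - headroom))
		memTokens := 0
		if kvRoom := usable - p.WeightsBytes - p.OverheadBytes; kvRoom > 0 {
			memTokens = int(kvRoom / p.KVBytesPerToken)
		}
		if !known || memTokens < limit {
			limit = memTokens
		}
		known = true
	}
	if !known {
		return p.Request // assumption: nothing to clamp against
	}
	if p.Request > 0 && p.Request < limit {
		return p.Request
	}
	return limit
}

func main() {
	p := params{
		ModelMaxCtx:     131072,
		KVBytesPerToken: 131072, // 128 KiB per token
		WeightsBytes:    4 << 30,
		OverheadBytes:   512 << 20,
		FreeBytes:       16 << 30,
		MinFreeBytes:    2 << 30,
	}
	// usable = 14 GiB * 0.9; (usable - 4.5 GiB) / 128 KiB = 66355 tokens.
	fmt.Println(effectiveContext(p))
}
```

In this example the memory budget (66355 tokens) is lower than the trained ceiling (131072), so memory is the binding constraint.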

type Params

type Params struct {
	ModelMaxCtx         int     // model's trained context ceiling (0 = unknown)
	KVBytesPerToken     int64   // 0 = unknown (cannot budget by memory)
	WeightsBytes        int64   // resident model weight footprint
	OverheadBytes       int64   // fixed runtime buffers (compute graph, staging)
	FreeBytes           int64   // device free memory
	ReservedBytes       int64   // memory already reserved by resident sessions
	UserLimitBytes      int64   // user cap for modeld resident memory (0 = no cap)
	MinFreeBytes        int64   // memory to leave free for the desktop/other workloads
	HostColdBudgetBytes int64   // host-RAM budget for cold KV blocks (0 = none)
	Request             int     // requested window (0 = use the resolved max)
	HeadroomFrac        float64 // <=0 or >=1 falls back to DefaultHeadroomFrac
}

Params are the inputs to a capacity resolution. Zero values mean "unknown": an unknown ModelMaxCtx or KVBytesPerToken disables that side of the clamp rather than producing a bogus window.

type Policy added in v0.32.4

type Policy struct {
	MaxResidentBytes    int64   `json:"max_resident_bytes,omitempty"`
	MinFreeBytes        int64   `json:"min_free_bytes,omitempty"`
	HostColdBudgetBytes int64   `json:"host_cold_budget_bytes,omitempty"`
	HeadroomFrac        float64 `json:"headroom_frac,omitempty"`
}

Policy is the user/operator memory policy modeld applies before opening a resident session. MaxResidentBytes is a hard ceiling on modeld's resident footprint for the served device; MinFreeBytes preserves memory for the desktop or other local workloads that may share the same device.

func LoadPolicy added in v0.32.4

func LoadPolicy(dataRoot string) Policy

LoadPolicy reads <dataRoot>/modeld.json and then applies env overrides. The JSON accepts either numeric byte fields or string fields ("8GiB", "512MiB"):

{"memory":{"max_resident":"8GiB","reserve_free":"2GiB","headroom_frac":0.15}}

func WithHostColdDefaults added in v0.32.6

func WithHostColdDefaults(p Policy, host DeviceSnapshot) Policy

WithHostColdDefaults fills the host-RAM cold-store budget from a host memory snapshot. It is separate from WithResidentDefault because the hot model budget may come from VRAM while the cold store always lives in host RAM.

func WithResidentDefault added in v0.32.6

func WithResidentDefault(p Policy, dev DeviceSnapshot) Policy

WithResidentDefault fills a missing resident-memory cap from the device's CURRENT free memory. Services call it with a fresh snapshot on every resolution, so the default tracks the device live — it rises when memory frees up and falls when other workloads claim it. An explicit MaxResidentBytes (the user's hard cap) always wins and is left untouched.
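The default-filling rule can be sketched as follows, combining this description with the documented DefaultMaxResidentFrac of 0.8. The function name withResidentDefault, the trimmed structs, and the treatment of zero as "missing" are assumptions made for the example.

```go
package main

import "fmt"

// defaultMaxResidentFrac mirrors the documented DefaultMaxResidentFrac.
const defaultMaxResidentFrac = 0.8

// policy and deviceSnapshot are trimmed copies of the documented structs.
type policy struct{ MaxResidentBytes int64 }
type deviceSnapshot struct{ FreeBytes int64 }

// withResidentDefault (hypothetical) leaves an explicit cap untouched and
// otherwise derives one from the device's current free memory.
func withResidentDefault(p policy, dev deviceSnapshot) policy {
	if p.MaxResidentBytes > 0 {
		return p // the user's hard cap always wins
	}
	p.MaxResidentBytes = int64(float64(dev.FreeBytes) * defaultMaxResidentFrac)
	return p
}

func main() {
	// Two snapshots taken at different times give different defaults.
	fmt.Println(withResidentDefault(policy{}, deviceSnapshot{FreeBytes: 1000}))
	fmt.Println(withResidentDefault(policy{}, deviceSnapshot{FreeBytes: 500}))
	// An explicit cap is returned unchanged.
	fmt.Println(withResidentDefault(policy{MaxResidentBytes: 300}, deviceSnapshot{FreeBytes: 1000}))
}
```

Because the policy is passed and returned by value, the caller's original policy is not mutated, which suits a default that is recomputed on every resolution.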

type SystemRAM

type SystemRAM struct{}

SystemRAM reports available host RAM via gopsutil — the CPU-device source.

func (SystemRAM) FreeBytes

func (SystemRAM) FreeBytes() (int64, error)

func (SystemRAM) Snapshot added in v0.32.4

func (SystemRAM) Snapshot() (DeviceSnapshot, error)
