cloud

package
v1.9.0
Warning

This package is not in the latest version of its module.

Published: Mar 21, 2026 License: Apache-2.0 Imports: 11 Imported by: 0

Documentation

Overview

Package cloud provides multi-tenant namespace isolation for the serving layer.

Stability: alpha

Constants

This section is empty.

Variables

This section is empty.

Functions

func BillingMiddleware

func BillingMiddleware(recorder UsageRecorder) func(http.Handler) http.Handler

BillingMiddleware returns an HTTP middleware that meters prompt and completion tokens per request and publishes usage events to the given recorder. It expects the TenantRegistry middleware to run first so that TenantFromContext returns a valid tenant. The tenant ID is taken from the apiKeyHeader value.
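The ordering requirement can be illustrated with stand-in middlewares. The sketch below does not use this package: tag is a hypothetical helper that only records the order in which each layer runs, with one instance standing in for (*TenantRegistry).Middleware and another for BillingMiddleware(recorder). It shows that the outermost wrapper runs first, so the tenant middleware should wrap the billing middleware.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// tag returns a stand-in middleware (hypothetical, not part of this package)
// that appends name to trace before calling the next handler.
func tag(name string, trace *[]string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*trace = append(*trace, name)
			next.ServeHTTP(w, r)
		})
	}
}

// runChain builds tenant -> billing -> handler and reports the execution order.
func runChain() []string {
	var trace []string
	tenantMW := tag("tenant", &trace)   // stands in for (*TenantRegistry).Middleware
	billingMW := tag("billing", &trace) // stands in for BillingMiddleware(recorder)
	final := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		trace = append(trace, "handler")
	})

	// The outermost wrapper sees the request first.
	h := tenantMW(billingMW(final))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodPost, "/", nil))
	return trace
}

func main() {
	fmt.Println(runChain()) // prints "[tenant billing handler]"
}
```

With the real types, the equivalent composition would presumably be registry.Middleware(BillingMiddleware(recorder)(handler)).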

Types

type ModelInfo

type ModelInfo struct {
	ModelID   string
	VRAMBytes uint64
	LoadedAt  time.Time
	LastUsed  time.Time
}

ModelInfo describes a loaded model tracked by the ResourceManager.

type NDJSONRecorder

type NDJSONRecorder struct {
	// contains filtered or unexported fields
}

NDJSONRecorder writes usage events as newline-delimited JSON to an io.Writer.

func NewNDJSONRecorder

func NewNDJSONRecorder(w io.Writer) *NDJSONRecorder

NewNDJSONRecorder creates a recorder that writes NDJSON to w.

func (*NDJSONRecorder) Record

func (r *NDJSONRecorder) Record(event UsageEvent) error

Record serializes the event as a single JSON line followed by a newline.
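A minimal sketch of the one-event-per-line behavior, using local stand-in types rather than this package. The names usageEvent, ndjsonRecorder, and encodeEvents are hypothetical, and the mutex is an assumption about how concurrent writes might be kept from interleaving; the real implementation's fields are unexported.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"sync"
)

// usageEvent is a local stand-in that mirrors the documented UsageEvent fields.
type usageEvent struct {
	TenantID         string `json:"tenant_id"`
	Model            string `json:"model"`
	PromptTokens     int    `json:"prompt_tokens"`
	CompletionTokens int    `json:"completion_tokens"`
	Timestamp        int64  `json:"timestamp"`
}

// ndjsonRecorder is a hypothetical recorder writing one JSON object per line.
type ndjsonRecorder struct {
	mu sync.Mutex // assumption: serialize writes so lines never interleave
	w  io.Writer
}

func newNDJSONRecorder(w io.Writer) *ndjsonRecorder { return &ndjsonRecorder{w: w} }

// record marshals the event and writes it as a single line ending in '\n'.
func (r *ndjsonRecorder) record(ev usageEvent) error {
	line, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	_, err = r.w.Write(append(line, '\n'))
	return err
}

// encodeEvents records the events into a buffer and returns the NDJSON text.
func encodeEvents(events ...usageEvent) (string, error) {
	var buf bytes.Buffer
	rec := newNDJSONRecorder(&buf)
	for _, ev := range events {
		if err := rec.record(ev); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	out, err := encodeEvents(
		usageEvent{TenantID: "t1", Model: "m", PromptTokens: 3, CompletionTokens: 5, Timestamp: 100},
	)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Because each event occupies exactly one line, a consumer can process the stream with a line scanner and decode each line independently.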

type ResourceManager

type ResourceManager struct {
	// contains filtered or unexported fields
}

ResourceManager tracks loaded models and their VRAM usage, evicting least-recently-used models when a new load would exceed the memory budget.

func NewResourceManager

func NewResourceManager(budgetBytes uint64) (*ResourceManager, error)

NewResourceManager creates a ResourceManager with the given VRAM budget in bytes.

func (*ResourceManager) Evict

func (rm *ResourceManager) Evict(modelID string) error

Evict explicitly removes a model from the manager.

func (*ResourceManager) Load

func (rm *ResourceManager) Load(modelID string, vramBytes uint64) error

Load registers a model with the given VRAM footprint. If loading it would exceed the budget, least-recently-used models are evicted until there is enough space. It returns an error if the model alone exceeds the entire budget.
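The eviction rule can be sketched with container/list. This is a stand-in under stated assumptions, not the package's implementation: lruManager, entry, and demo are hypothetical names, reloading an already loaded model is assumed to replace its entry, and locking is omitted for brevity (a manager shared across goroutines would need a mutex).

```go
package main

import (
	"container/list"
	"errors"
	"fmt"
)

// entry is one loaded model in the LRU list.
type entry struct {
	id   string
	vram uint64
}

// lruManager is a hypothetical stand-in for ResourceManager (not goroutine-safe).
type lruManager struct {
	budget, used uint64
	order        *list.List               // front = most recently used
	items        map[string]*list.Element // model ID -> list element
	onEvict      func(modelID string)     // optional eviction callback
}

func newLRUManager(budget uint64) *lruManager {
	return &lruManager{budget: budget, order: list.New(), items: map[string]*list.Element{}}
}

// remove drops an element and optionally notifies the eviction callback.
func (m *lruManager) remove(el *list.Element, notify bool) {
	e := m.order.Remove(el).(*entry)
	delete(m.items, e.id)
	m.used -= e.vram
	if notify && m.onEvict != nil {
		m.onEvict(e.id)
	}
}

// load registers a model, evicting from the back of the list until it fits.
func (m *lruManager) load(id string, vram uint64) error {
	if vram > m.budget {
		return errors.New("model exceeds the entire budget")
	}
	if el, ok := m.items[id]; ok {
		m.remove(el, false) // assumption: a reload replaces the old entry
	}
	for vram > m.budget-m.used {
		m.remove(m.order.Back(), true)
	}
	m.items[id] = m.order.PushFront(&entry{id: id, vram: vram})
	m.used += vram
	return nil
}

// touch marks a model as most recently used.
func (m *lruManager) touch(id string) error {
	el, ok := m.items[id]
	if !ok {
		return errors.New("model not loaded: " + id)
	}
	m.order.MoveToFront(el)
	return nil
}

// demo loads three 4-byte models into a 10-byte budget, touching "a" first.
func demo() (evicted []string, used uint64) {
	m := newLRUManager(10)
	m.onEvict = func(id string) { evicted = append(evicted, id) }
	_ = m.load("a", 4)
	_ = m.load("b", 4)
	_ = m.touch("a") // "b" is now the least recently used
	_ = m.load("c", 4)
	return evicted, m.used
}

func main() {
	evicted, used := demo()
	fmt.Println(evicted, used) // prints "[b] 8"
}
```

In the demo, touching "a" before loading "c" is what makes "b" the eviction victim, which is why Touch should be called on each inference request.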

func (*ResourceManager) LoadedModels

func (rm *ResourceManager) LoadedModels() []ModelInfo

LoadedModels returns a snapshot of all currently loaded models.

func (*ResourceManager) SetEvictCallback

func (rm *ResourceManager) SetEvictCallback(fn func(modelID string))

SetEvictCallback sets an optional function called when a model is evicted.

func (*ResourceManager) Stats

func (rm *ResourceManager) Stats() (used, budget uint64, loaded int)

Stats returns the current memory usage statistics.

func (*ResourceManager) Touch

func (rm *ResourceManager) Touch(modelID string) error

Touch updates the last-used time for a model, moving it to the front of the LRU list. Call this on each inference request.

type Tenant

type Tenant struct {
	Config TenantConfig
	// contains filtered or unexported fields
}

Tenant represents a registered tenant with its quota state.

func TenantFromContext

func TenantFromContext(ctx context.Context) *Tenant

TenantFromContext extracts the Tenant from the request context.

func (*Tenant) ConsumeTokens

func (t *Tenant) ConsumeTokens(n int64) bool

ConsumeTokens attempts to consume n tokens from the tenant's per-minute budget. It returns true if the tokens were consumed, false if the budget is exhausted.
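One way a per-minute budget could work is a fixed-window counter, sketched below. This is an assumption for illustration only: the package may use a different scheme (for example a token bucket), and minuteBudget, consume, and demoBudget are hypothetical names. The clock is injected so the window rollover can be exercised without sleeping.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// minuteBudget is a hypothetical fixed-window token budget.
type minuteBudget struct {
	mu          sync.Mutex
	limit       int64
	used        int64
	windowStart time.Time
	now         func() time.Time // injectable clock
}

func newMinuteBudget(limit int64, now func() time.Time) *minuteBudget {
	return &minuteBudget{limit: limit, windowStart: now(), now: now}
}

// consume takes n tokens from the current window, or reports false
// without consuming anything if they do not fit.
func (b *minuteBudget) consume(n int64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Start a fresh window once a full minute has elapsed.
	if t := b.now(); t.Sub(b.windowStart) >= time.Minute {
		b.windowStart, b.used = t, 0
	}
	if n < 0 || b.used+n > b.limit {
		return false
	}
	b.used += n
	return true
}

// demoBudget exercises a 100-token budget across a window boundary.
func demoBudget() []bool {
	clock := time.Unix(0, 0)
	b := newMinuteBudget(100, func() time.Time { return clock })

	results := []bool{
		b.consume(60), // fits
		b.consume(50), // would reach 110, rejected
		b.consume(40), // exactly fills the window
	}
	clock = clock.Add(time.Minute) // next window
	return append(results, b.consume(100))
}

func main() {
	fmt.Println(demoBudget()) // prints "[true false true true]"
}
```

A rejected call leaves the counter unchanged, so a smaller request in the same window can still succeed.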

func (*Tenant) ModelAllowed

func (t *Tenant) ModelAllowed(model string) bool

ModelAllowed returns true if the model is in the tenant's allow list. An empty allow list permits all models.
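The allow-list rule, including the empty-list case, can be sketched as a plain function. The name modelAllowed is hypothetical, and exact, case-sensitive matching is an assumption; the documentation does not say how names are compared.

```go
package main

import "fmt"

// modelAllowed reports whether model is permitted by allow.
// An empty (or nil) allow list permits every model.
func modelAllowed(allow []string, model string) bool {
	if len(allow) == 0 {
		return true
	}
	for _, m := range allow {
		if m == model { // assumption: exact, case-sensitive match
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(modelAllowed(nil, "any-model"))             // true: empty list permits all
	fmt.Println(modelAllowed([]string{"small"}, "small"))   // true: listed
	fmt.Println(modelAllowed([]string{"small"}, "large"))   // false: not listed
}
```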

type TenantConfig

type TenantConfig struct {
	MaxConcurrentRequests int      `json:"max_concurrent_requests"`
	MaxTokensPerMinute    int64    `json:"max_tokens_per_minute"`
	ModelAllowList        []string `json:"model_allow_list,omitempty"`
}

TenantConfig holds per-tenant quota configuration.

func (TenantConfig) Validate

func (c TenantConfig) Validate() error

Validate checks that the configuration has valid values.

type TenantRegistry

type TenantRegistry struct {
	// contains filtered or unexported fields
}

TenantRegistry manages per-API-key tenant registrations and quotas.

func NewTenantRegistry

func NewTenantRegistry() *TenantRegistry

NewTenantRegistry creates a new empty registry.

func (*TenantRegistry) Get

func (r *TenantRegistry) Get(apiKey string) (*Tenant, error)

Get retrieves the tenant for the given API key.

func (*TenantRegistry) Middleware

func (r *TenantRegistry) Middleware(next http.Handler) http.Handler

Middleware returns an HTTP middleware that enforces tenant isolation. It extracts the API key from the Authorization header (Bearer <key>), enforces the tenant's concurrency and token rate limits, and injects the Tenant into the request context.
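A minimal sketch of the key extraction, concurrency limit, and context injection steps, using stand-in types. Everything here is hypothetical: registry, tenant, ctxKey, and statusFor are illustrative names, the 401 and 429 status codes are assumptions, and token rate limiting is left out to keep the example short.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type ctxKey struct{} // private context key for the tenant

// tenant is a stand-in; slots is buffered to the concurrency limit.
type tenant struct {
	apiKey string
	slots  chan struct{}
}

type registry struct{ tenants map[string]*tenant }

// bearerKey extracts <key> from "Authorization: Bearer <key>".
func bearerKey(r *http.Request) (string, bool) {
	const prefix = "Bearer "
	h := r.Header.Get("Authorization")
	if !strings.HasPrefix(h, prefix) || len(h) == len(prefix) {
		return "", false
	}
	return h[len(prefix):], true
}

func (reg *registry) middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key, ok := bearerKey(r)
		t := reg.tenants[key]
		if !ok || t == nil {
			http.Error(w, "unknown API key", http.StatusUnauthorized) // assumed status
			return
		}
		// Non-blocking acquire: reject instead of queueing when all slots are busy.
		select {
		case t.slots <- struct{}{}:
			defer func() { <-t.slots }()
		default:
			http.Error(w, "too many concurrent requests", http.StatusTooManyRequests) // assumed status
			return
		}
		ctx := context.WithValue(r.Context(), ctxKey{}, t)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// statusFor sends one request with the given Authorization header and
// returns the response status code.
func statusFor(authHeader string) int {
	reg := &registry{tenants: map[string]*tenant{
		"secret": &tenant{apiKey: "secret", slots: make(chan struct{}, 1)},
	}}
	h := reg.middleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if t, _ := r.Context().Value(ctxKey{}).(*tenant); t == nil {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("Bearer secret"), statusFor("Bearer nope"), statusFor(""))
}
```

The buffered channel acts as a counting semaphore; releasing the slot in a deferred function keeps the count correct even if the downstream handler panics.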

func (*TenantRegistry) Register

func (r *TenantRegistry) Register(apiKey string, cfg TenantConfig) error

Register adds a tenant with the given API key and configuration.

func (*TenantRegistry) Remove

func (r *TenantRegistry) Remove(apiKey string) error

Remove deletes a tenant registration.

type UsageEvent

type UsageEvent struct {
	TenantID         string `json:"tenant_id"`
	Model            string `json:"model"`
	PromptTokens     int    `json:"prompt_tokens"`
	CompletionTokens int    `json:"completion_tokens"`
	Timestamp        int64  `json:"timestamp"`
}

UsageEvent records token consumption for a single request.

type UsageRecorder

type UsageRecorder interface {
	Record(event UsageEvent) error
}

UsageRecorder defines the interface for recording usage events. The default implementation writes NDJSON; a Kafka adapter can implement this interface for production deployments.
