cloud

package
v1.38.2 Latest
Published: Mar 31, 2026 License: Apache-2.0 Imports: 24 Imported by: 0

Documentation

Overview

Package cloud provides a multi-tenant managed inference service for Zerfoo.

It wraps the serve.Server with tenant isolation, token-based billing, rate limiting, and health checking for cloud deployments.

Stability: alpha

Constants

This section is empty.

Variables

This section is empty.

Functions

func BillingMiddleware added in v1.18.0

func BillingMiddleware(recorder UsageRecorder) func(http.Handler) http.Handler

BillingMiddleware returns an HTTP middleware that meters prompt and completion tokens per request and publishes usage events to the given recorder. It expects the tenant authentication middleware to run first so that tenantFromContext returns a valid tenant. The tenant ID is taken from the Authorization header's Bearer token value.

For streaming (SSE) responses, the JSON response body cannot be parsed as a single object. The middleware injects a generate.TokenUsage into the request context; the generation session writes prompt/completion counts there, which works for both streaming and non-streaming responses. JSON body parsing is used as a fallback for handlers that do not use context-based usage tracking.
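The context-based flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the lowercase types are local stand-ins, tokenUsage is a hypothetical substitute for generate.TokenUsage, and the context keys, the API-key table, and the /generate path are invented for the demo.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// usageEvent and usageRecorder mirror the documented UsageEvent and
// UsageRecorder; tokenUsage is a hypothetical stand-in for generate.TokenUsage.
type usageEvent struct {
	TenantID         string
	PromptTokens     int
	CompletionTokens int
	Timestamp        int64
}

type usageRecorder interface{ Record(usageEvent) error }

type tokenUsage struct{ Prompt, Completion int }

type ctxKey int

const (
	tenantKey ctxKey = iota // set by the (assumed) auth middleware
	usageKey                // set by the billing middleware
)

// billing injects a mutable *tokenUsage into the request context, lets the
// handler run, then publishes whatever counts the handler wrote there.
func billing(rec usageRecorder) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			usage := &tokenUsage{}
			ctx := context.WithValue(r.Context(), usageKey, usage)
			next.ServeHTTP(w, r.WithContext(ctx))

			tenantID, _ := ctx.Value(tenantKey).(string)
			// A real middleware would log a failed Record call.
			_ = rec.Record(usageEvent{
				TenantID:         tenantID,
				PromptTokens:     usage.Prompt,
				CompletionTokens: usage.Completion,
				Timestamp:        time.Now().Unix(),
			})
		})
	}
}

// sliceRecorder collects events in memory for the demo.
type sliceRecorder struct{ events []usageEvent }

func (s *sliceRecorder) Record(e usageEvent) error {
	s.events = append(s.events, e)
	return nil
}

// demo sends one request through auth -> billing -> streaming handler.
func demo() usageEvent {
	rec := &sliceRecorder{}
	keys := map[string]string{"key-a": "tenant-a"} // hypothetical key table
	auth := func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			token := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
			ctx := context.WithValue(r.Context(), tenantKey, keys[token])
			next.ServeHTTP(w, r.WithContext(ctx))
		})
	}
	stream := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "data: hello\n\n") // SSE-style body, not one JSON object
		if u, ok := r.Context().Value(usageKey).(*tokenUsage); ok {
			u.Prompt, u.Completion = 12, 34
		}
	})
	req := httptest.NewRequest(http.MethodPost, "/generate", nil)
	req.Header.Set("Authorization", "Bearer key-a")
	auth(billing(rec)(stream)).ServeHTTP(httptest.NewRecorder(), req)
	return rec.events[0]
}

func main() {
	e := demo()
	fmt.Println(e.TenantID, e.PromptTokens, e.CompletionTokens)
}
```

Because the handler writes counts into a shared pointer rather than into the response body, the same path works whether the body is streamed or returned as a single JSON document.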

Types

type AuditAction

type AuditAction string

AuditAction identifies the type of API operation being logged.

const (
	AuditActionInference AuditAction = "inference"
	AuditActionCreate    AuditAction = "create"
	AuditActionUpdate    AuditAction = "update"
	AuditActionDelete    AuditAction = "delete"
	AuditActionList      AuditAction = "list"
	AuditActionAuth      AuditAction = "auth"
)

type AuditEntry

type AuditEntry struct {
	Timestamp  time.Time   `json:"timestamp"`
	TenantID   string      `json:"tenant_id"`
	Action     AuditAction `json:"action"`
	Result     AuditResult `json:"result"`
	Resource   string      `json:"resource"`
	StatusCode int         `json:"status_code"`
	Method     string      `json:"method"`
	Path       string      `json:"path"`
	RemoteAddr string      `json:"remote_addr"`
}

AuditEntry records a single auditable event for SOC 2 compliance. Sensitive data (API keys, request bodies) is never stored.

type AuditLogger

type AuditLogger struct {
	// contains filtered or unexported fields
}

AuditLogger records API requests for SOC 2 compliance. It deliberately omits sensitive fields (API keys, request/response bodies).

func NewAuditLogger

func NewAuditLogger(store AuditStore) *AuditLogger

NewAuditLogger creates an AuditLogger backed by the given store.

func (*AuditLogger) Log

func (a *AuditLogger) Log(entry AuditEntry) error

Log records an audit entry.

func (*AuditLogger) Query

func (a *AuditLogger) Query(tenantID string, from, to time.Time) ([]AuditEntry, error)

Query returns audit entries for a tenant within the given time range.

type AuditResult

type AuditResult string

AuditResult records the outcome of an API request.

const (
	AuditResultSuccess      AuditResult = "success"
	AuditResultDenied       AuditResult = "denied"
	AuditResultRateLimited  AuditResult = "rate_limited"
	AuditResultError        AuditResult = "error"
	AuditResultUnauthorized AuditResult = "unauthorized"
)

type AuditStore

type AuditStore interface {
	// Append persists an audit entry.
	Append(entry AuditEntry) error

	// Query returns audit entries for a tenant within the given time range.
	Query(tenantID string, from, to time.Time) ([]AuditEntry, error)
}

AuditStore is the persistence interface for audit entries.
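A custom store only needs the two methods above. The following is a minimal sketch of a slice-backed implementation, using a trimmed local copy of AuditEntry; the type names are hypothetical, and the half-open range (from inclusive, to exclusive) is an assumption, since this page does not say how the bounds are treated.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// auditEntry is a trimmed local copy of the documented AuditEntry.
type auditEntry struct {
	Timestamp time.Time
	TenantID  string
	Action    string
	Result    string
}

// sliceAuditStore is a hypothetical store with the same method set as AuditStore.
type sliceAuditStore struct {
	mu      sync.Mutex
	entries []auditEntry
}

// Append adds an entry under the lock so concurrent requests can log safely.
func (s *sliceAuditStore) Append(e auditEntry) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries = append(s.entries, e)
	return nil
}

// Query returns the tenant's entries with from <= Timestamp < to.
// The half-open range is an assumption of this sketch.
func (s *sliceAuditStore) Query(tenantID string, from, to time.Time) ([]auditEntry, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []auditEntry
	for _, e := range s.entries {
		if e.TenantID == tenantID && !e.Timestamp.Before(from) && e.Timestamp.Before(to) {
			out = append(out, e)
		}
	}
	return out, nil
}

// demo stores three entries and queries one tenant over a one-hour window.
func demo() []auditEntry {
	t0 := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	s := &sliceAuditStore{}
	_ = s.Append(auditEntry{t0, "tenant-a", "inference", "success"})
	_ = s.Append(auditEntry{t0.Add(2 * time.Hour), "tenant-a", "inference", "rate_limited"})
	_ = s.Append(auditEntry{t0, "tenant-b", "auth", "unauthorized"})
	got, _ := s.Query("tenant-a", t0, t0.Add(time.Hour))
	return got
}

func main() {
	for _, e := range demo() {
		fmt.Println(e.TenantID, e.Action, e.Result)
	}
}
```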

type BboltTenantStoreBackend added in v1.18.0

type BboltTenantStoreBackend struct {
	// contains filtered or unexported fields
}

BboltTenantStoreBackend is a persistent TenantStoreBackend backed by a bbolt database.

func NewBboltTenantStoreBackend added in v1.18.0

func NewBboltTenantStoreBackend(path string) (*BboltTenantStoreBackend, error)

NewBboltTenantStoreBackend opens or creates a bbolt database at path and returns a backend ready for use with NewTenantManager(WithTenantBackend(...)).

func (*BboltTenantStoreBackend) Close added in v1.18.0

func (b *BboltTenantStoreBackend) Close() error

Close closes the underlying bbolt database.

func (*BboltTenantStoreBackend) DeleteTenant added in v1.18.0

func (b *BboltTenantStoreBackend) DeleteTenant(id string) error

DeleteTenant removes a tenant by ID.

func (*BboltTenantStoreBackend) ListTenants added in v1.18.0

func (b *BboltTenantStoreBackend) ListTenants() []TenantConfig

ListTenants returns all stored tenant configurations.

func (*BboltTenantStoreBackend) LoadTenant added in v1.18.0

func (b *BboltTenantStoreBackend) LoadTenant(id string) (*TenantConfig, bool)

LoadTenant retrieves a tenant configuration by ID.

func (*BboltTenantStoreBackend) SaveTenant added in v1.18.0

func (b *BboltTenantStoreBackend) SaveTenant(id string, cfg TenantConfig) error

SaveTenant persists a tenant configuration as JSON keyed by its ID.

type BillingRecord

type BillingRecord struct {
	TenantID     string    `json:"tenant_id"`
	InputTokens  int       `json:"input_tokens"`
	OutputTokens int       `json:"output_tokens"`
	Timestamp    time.Time `json:"timestamp"`
}

BillingRecord captures token usage for a single inference request.

type BillingStore

type BillingStore interface {
	// Store persists a billing record.
	Store(record BillingRecord) error

	// Query returns all billing records for a tenant within the given time range.
	Query(tenantID string, from, to time.Time) ([]BillingRecord, error)
}

BillingStore is the persistence interface for billing records.

type CloudServer

type CloudServer struct {
	// contains filtered or unexported fields
}

CloudServer wraps an HTTP handler with multi-tenant isolation, token billing, rate limiting, and health checking for cloud deployments.

func NewCloudServer

func NewCloudServer(handler http.Handler, tenants *TenantManager, meter *TokenMeter) *CloudServer

NewCloudServer creates a CloudServer that routes authenticated requests to the given handler through tenant isolation middleware.

func (*CloudServer) Handler

func (cs *CloudServer) Handler() http.Handler

Handler returns the root HTTP handler with all middleware applied.

func (*CloudServer) Meter

func (cs *CloudServer) Meter() *TokenMeter

Meter returns the TokenMeter for external billing queries.

func (*CloudServer) SetHealthy

func (cs *CloudServer) SetHealthy(healthy bool)

SetHealthy sets the health status of the cloud server.
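This page does not say how the health status is exposed. One common arrangement, sketched here purely as an assumption, is an atomic flag behind a probe endpoint that answers 200 when healthy and 503 otherwise. The healthGate type and the /healthz path are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// healthGate is a hypothetical stand-in for the state behind SetHealthy.
type healthGate struct{ healthy atomic.Bool }

// SetHealthy flips the flag; it is safe to call from any goroutine.
func (h *healthGate) SetHealthy(ok bool) { h.healthy.Store(ok) }

// ServeHTTP answers a probe with 200 when healthy and 503 otherwise.
func (h *healthGate) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	if h.healthy.Load() {
		w.WriteHeader(http.StatusOK)
		return
	}
	w.WriteHeader(http.StatusServiceUnavailable)
}

// probe returns the status code a health check would see.
func probe(h http.Handler) int {
	rr := httptest.NewRecorder()
	h.ServeHTTP(rr, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rr.Code
}

func main() {
	g := &healthGate{}
	g.SetHealthy(true)
	fmt.Println(probe(g))
	g.SetHealthy(false) // for example, while draining before shutdown
	fmt.Println(probe(g))
}
```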

func (*CloudServer) Tenants

func (cs *CloudServer) Tenants() *TenantManager

Tenants returns the TenantManager for external CRUD operations.

type MemoryAuditStore

type MemoryAuditStore struct {
	// contains filtered or unexported fields
}

MemoryAuditStore is an in-memory AuditStore for testing and development.

func NewMemoryAuditStore

func NewMemoryAuditStore() *MemoryAuditStore

NewMemoryAuditStore creates a new in-memory audit store.

func (*MemoryAuditStore) All

func (s *MemoryAuditStore) All() []AuditEntry

All returns a copy of all stored entries.

func (*MemoryAuditStore) Append

func (s *MemoryAuditStore) Append(entry AuditEntry) error

Append appends an entry to the in-memory store.

func (*MemoryAuditStore) Query

func (s *MemoryAuditStore) Query(tenantID string, from, to time.Time) ([]AuditEntry, error)

Query returns entries matching the tenant and time range.

type MemoryBillingStore

type MemoryBillingStore struct {
	// contains filtered or unexported fields
}

MemoryBillingStore is an in-memory BillingStore for testing and development.

func NewMemoryBillingStore

func NewMemoryBillingStore() *MemoryBillingStore

NewMemoryBillingStore creates a new in-memory billing store.

func (*MemoryBillingStore) All

func (s *MemoryBillingStore) All() []BillingRecord

All returns a copy of all stored records.

func (*MemoryBillingStore) Query

func (s *MemoryBillingStore) Query(tenantID string, from, to time.Time) ([]BillingRecord, error)

Query returns records matching the tenant and time range.

func (*MemoryBillingStore) Store

func (s *MemoryBillingStore) Store(record BillingRecord) error

Store appends a record to the in-memory store.

type ModelInfo added in v1.18.0

type ModelInfo struct {
	ModelID   string
	VRAMBytes uint64
	LoadedAt  time.Time
	LastUsed  time.Time
}

ModelInfo describes a loaded model tracked by the ResourceManager.

type NDJSONRecorder added in v1.18.0

type NDJSONRecorder struct {
	// contains filtered or unexported fields
}

NDJSONRecorder writes usage events as newline-delimited JSON to an io.Writer.

func NewNDJSONRecorder added in v1.18.0

func NewNDJSONRecorder(w io.Writer) *NDJSONRecorder

NewNDJSONRecorder creates a recorder that writes NDJSON to w.

func (*NDJSONRecorder) Record added in v1.18.0

func (r *NDJSONRecorder) Record(event UsageEvent) error

Record serializes the event as a single JSON line followed by a newline.
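The NDJSON format is one JSON object per line. A minimal sketch of such a recorder, using a local copy of the documented UsageEvent fields and a hypothetical lineRecorder type, might look like this; the mutex is an assumption added so that concurrent calls cannot interleave partial lines.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"sync"
)

// usageEvent is a local copy of the documented UsageEvent, including its tags.
type usageEvent struct {
	TenantID         string `json:"tenant_id"`
	Model            string `json:"model"`
	PromptTokens     int    `json:"prompt_tokens"`
	CompletionTokens int    `json:"completion_tokens"`
	Timestamp        int64  `json:"timestamp"`
}

// lineRecorder is a hypothetical recorder that writes one JSON object per line.
type lineRecorder struct {
	mu sync.Mutex
	w  io.Writer
}

// Record marshals the event and writes it with a trailing newline in one call.
func (r *lineRecorder) Record(e usageEvent) error {
	line, err := json.Marshal(e)
	if err != nil {
		return err
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	_, err = r.w.Write(append(line, '\n'))
	return err
}

// demo records two events into a buffer and returns the raw NDJSON.
func demo() string {
	var buf bytes.Buffer
	rec := &lineRecorder{w: &buf}
	_ = rec.Record(usageEvent{"tenant-a", "demo-model", 3, 5, 1700000000})
	_ = rec.Record(usageEvent{"tenant-b", "demo-model", 7, 2, 1700000060})
	return buf.String()
}

func main() {
	fmt.Print(demo())
}
```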

type ResourceManager added in v1.18.0

type ResourceManager struct {
	// contains filtered or unexported fields
}

ResourceManager tracks loaded models and their VRAM usage, evicting least-recently-used models when a new load would exceed the memory budget.

func NewResourceManager added in v1.18.0

func NewResourceManager(budgetBytes uint64) (*ResourceManager, error)

NewResourceManager creates a ResourceManager with the given VRAM budget in bytes.

func (*ResourceManager) Evict added in v1.18.0

func (rm *ResourceManager) Evict(modelID string) error

Evict explicitly removes a model from the manager.

func (*ResourceManager) Load added in v1.18.0

func (rm *ResourceManager) Load(modelID string, vramBytes uint64) error

Load registers a model with the given VRAM footprint. If loading would exceed the budget, LRU models are evicted until there is enough space. Returns an error if the model alone exceeds the entire budget.

func (*ResourceManager) LoadedModels added in v1.18.0

func (rm *ResourceManager) LoadedModels() []ModelInfo

LoadedModels returns a snapshot of all currently loaded models.

func (*ResourceManager) SetEvictCallback added in v1.18.0

func (rm *ResourceManager) SetEvictCallback(fn func(modelID string))

SetEvictCallback sets an optional function called when a model is evicted.

func (*ResourceManager) Stats added in v1.18.0

func (rm *ResourceManager) Stats() (used, budget uint64, loaded int)

Stats returns the current memory usage statistics.

func (*ResourceManager) Touch added in v1.18.0

func (rm *ResourceManager) Touch(modelID string) error

Touch updates the last-used time for a model, moving it to the front of the LRU list. Call this on each inference request.
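The Load, Touch, and eviction behavior described in this section can be illustrated with a small LRU structure. The sketch below is an assumption about one way to do it, built on container/list; lruBudget and its fields are hypothetical, and treating a repeated Load of the same model as a touch is a choice made here, not documented behavior.

```go
package main

import (
	"container/list"
	"errors"
	"fmt"
)

type model struct {
	id   string
	vram uint64
}

// lruBudget is a hypothetical LRU tracker; the front of order is most recent.
type lruBudget struct {
	budget, used uint64
	order        *list.List
	byID         map[string]*list.Element
	onEvict      func(modelID string)
}

func newLRUBudget(budget uint64) *lruBudget {
	return &lruBudget{budget: budget, order: list.New(), byID: map[string]*list.Element{}}
}

// Load registers a model, evicting from the back until it fits.
func (b *lruBudget) Load(id string, vram uint64) error {
	if vram > b.budget {
		return errors.New("model exceeds the entire budget")
	}
	if el, ok := b.byID[id]; ok { // already loaded: treated as a touch (assumption)
		b.order.MoveToFront(el)
		return nil
	}
	for b.used+vram > b.budget {
		b.evict(b.order.Back())
	}
	b.byID[id] = b.order.PushFront(model{id, vram})
	b.used += vram
	return nil
}

// Touch marks a model as most recently used.
func (b *lruBudget) Touch(id string) error {
	el, ok := b.byID[id]
	if !ok {
		return errors.New("model not loaded")
	}
	b.order.MoveToFront(el)
	return nil
}

func (b *lruBudget) evict(el *list.Element) {
	m := b.order.Remove(el).(model)
	delete(b.byID, m.id)
	b.used -= m.vram
	if b.onEvict != nil {
		b.onEvict(m.id)
	}
}

// demo loads three 4-byte models into a 10-byte budget.
func demo() (evicted []string, used uint64) {
	b := newLRUBudget(10)
	b.onEvict = func(id string) { evicted = append(evicted, id) }
	_ = b.Load("a", 4)
	_ = b.Load("b", 4)
	_ = b.Touch("a")   // "b" is now least recently used
	_ = b.Load("c", 4) // 12 > 10, so "b" is evicted
	return evicted, b.used
}

func main() {
	evicted, used := demo()
	fmt.Println(evicted, used)
}
```

The sketch is not safe for concurrent use; a shared manager would need a mutex around every method.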

type SAMLMetadata

type SAMLMetadata struct {
	EntityID        string `json:"entity_id"`
	SignOnURL       string `json:"sign_on_url"`
	Certificate     string `json:"certificate"`
	NameIDFormat    string `json:"name_id_format,omitempty"`
	WantAuthnSigned bool   `json:"want_authn_signed"`
}

SAMLMetadata holds identity provider configuration parsed from SAML 2.0 metadata XML.

func ParseSAMLMetadata

func ParseSAMLMetadata(data []byte) (*SAMLMetadata, error)

ParseSAMLMetadata parses SAML 2.0 IdP metadata XML into a SAMLMetadata struct.
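As an illustration of what such parsing involves, here is a minimal sketch using encoding/xml. The mapping from XML to fields is an assumption: it takes the first SingleSignOnService location as the sign-on URL and reads WantAuthnSigned from the WantAuthnRequestsSigned attribute. The struct names, the sample document, and the placeholder certificate are hypothetical.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"strings"
)

// samlMetadata mirrors the documented SAMLMetadata fields.
type samlMetadata struct {
	EntityID, SignOnURL, Certificate, NameIDFormat string
	WantAuthnSigned                                bool
}

// entityDescriptor maps the parts of IdP metadata this sketch reads.
// Tags without a namespace match elements by local name.
type entityDescriptor struct {
	EntityID string `xml:"entityID,attr"`
	IDP      struct {
		WantSigned   bool   `xml:"WantAuthnRequestsSigned,attr"`
		Certificate  string `xml:"KeyDescriptor>KeyInfo>X509Data>X509Certificate"`
		NameIDFormat string `xml:"NameIDFormat"`
		SSO          []struct {
			Location string `xml:"Location,attr"`
		} `xml:"SingleSignOnService"`
	} `xml:"IDPSSODescriptor"`
}

func parseMetadata(data []byte) (*samlMetadata, error) {
	var ed entityDescriptor
	if err := xml.Unmarshal(data, &ed); err != nil {
		return nil, fmt.Errorf("parse metadata: %w", err)
	}
	if ed.EntityID == "" || len(ed.IDP.SSO) == 0 {
		return nil, errors.New("metadata lacks entityID or SingleSignOnService")
	}
	return &samlMetadata{
		EntityID:        ed.EntityID,
		SignOnURL:       ed.IDP.SSO[0].Location, // first endpoint: an assumption
		Certificate:     strings.TrimSpace(ed.IDP.Certificate),
		NameIDFormat:    strings.TrimSpace(ed.IDP.NameIDFormat),
		WantAuthnSigned: ed.IDP.WantSigned,
	}, nil
}

// sampleMetadata is a made-up document with a placeholder certificate.
const sampleMetadata = `<md:EntityDescriptor
    xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    xmlns:ds="http://www.w3.org/2000/09/xmldsig#"
    entityID="https://idp.example.com/metadata">
  <md:IDPSSODescriptor WantAuthnRequestsSigned="true"
      protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:KeyDescriptor use="signing"><ds:KeyInfo><ds:X509Data>
      <ds:X509Certificate>BASE64CERT</ds:X509Certificate>
    </ds:X509Data></ds:KeyInfo></md:KeyDescriptor>
    <md:NameIDFormat>urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress</md:NameIDFormat>
    <md:SingleSignOnService
        Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"
        Location="https://idp.example.com/sso"/>
  </md:IDPSSODescriptor>
</md:EntityDescriptor>`

func main() {
	m, err := parseMetadata([]byte(sampleMetadata))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.EntityID, m.SignOnURL, m.WantAuthnSigned)
}
```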

type SAMLProvider

type SAMLProvider struct {
	// contains filtered or unexported fields
}

SAMLProvider implements SSOProvider for SAML 2.0.

func NewSAMLProvider

func NewSAMLProvider(metadata *SAMLMetadata, tenantID string) *SAMLProvider

NewSAMLProvider creates a SAML 2.0 SSO provider from parsed metadata, bound to a specific tenant.

func (*SAMLProvider) EntityID

func (p *SAMLProvider) EntityID() string

EntityID returns the identity provider's entity ID.

func (*SAMLProvider) ValidateAssertion

func (p *SAMLProvider) ValidateAssertion(assertion []byte) (*SSOIdentity, error)

ValidateAssertion parses and validates a SAML 2.0 assertion, including XXE protection, XML digital signature verification, NotBefore clock skew tolerance, and assertion replay prevention.
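Two of the checks listed above, the NotBefore clock skew tolerance and replay prevention, lend themselves to a short sketch. The code below is an assumption about how such checks are commonly structured, with hypothetical names throughout; it does not cover signature verification or XML parsing, and the actual tolerance used by the package is not stated on this page.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var (
	errNotYetValid = errors.New("assertion not yet valid")
	errExpired     = errors.New("assertion expired")
	errReplayed    = errors.New("assertion already used")
)

// assertionGuard is a hypothetical helper; its policy is an assumption.
type assertionGuard struct {
	mu   sync.Mutex
	skew time.Duration        // tolerance applied to NotBefore only
	seen map[string]time.Time // assertion ID -> NotOnOrAfter
}

// Check validates the time window, then records the ID so that a second
// presentation of the same assertion is rejected.
func (g *assertionGuard) Check(id string, notBefore, notOnOrAfter, now time.Time) error {
	if now.Add(g.skew).Before(notBefore) {
		return errNotYetValid
	}
	if !now.Before(notOnOrAfter) {
		return errExpired
	}
	g.mu.Lock()
	defer g.mu.Unlock()
	for k, exp := range g.seen { // forget IDs that have expired anyway
		if !now.Before(exp) {
			delete(g.seen, k)
		}
	}
	if _, dup := g.seen[id]; dup {
		return errReplayed
	}
	g.seen[id] = notOnOrAfter
	return nil
}

// demo runs four checks against a five-minute validity window.
func demo() []error {
	t0 := time.Date(2026, 1, 1, 12, 0, 0, 0, time.UTC)
	end := t0.Add(5 * time.Minute)
	g := &assertionGuard{skew: 30 * time.Second, seen: map[string]time.Time{}}
	return []error{
		g.Check("id-1", t0, end, t0.Add(-10*time.Second)), // accepted: inside skew
		g.Check("id-1", t0, end, t0.Add(time.Minute)),     // replayed ID
		g.Check("id-2", t0, end, t0.Add(-time.Minute)),    // too early
		g.Check("id-3", t0, end, end),                     // expired
	}
}

func main() {
	for _, err := range demo() {
		fmt.Println(err)
	}
}
```

Remembering each ID only until its NotOnOrAfter time keeps the replay cache bounded, since an expired assertion is rejected by the window check regardless.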

type SSOIdentity

type SSOIdentity struct {
	Subject    string            `json:"subject"`
	TenantID   string            `json:"tenant_id"`
	Email      string            `json:"email,omitempty"`
	Attributes map[string]string `json:"attributes,omitempty"`
	ExpiresAt  time.Time         `json:"expires_at"`
}

SSOIdentity represents an authenticated user from an SSO provider.

type SSOProvider

type SSOProvider interface {
	// EntityID returns the identity provider's entity ID.
	EntityID() string

	// ValidateAssertion validates an assertion and returns the authenticated identity.
	ValidateAssertion(assertion []byte) (*SSOIdentity, error)
}

SSOProvider defines the interface for SSO authentication. Implementations handle protocol-specific details (SAML 2.0, OIDC, etc.).

type Tenant

type Tenant struct {
	ID string
	// contains filtered or unexported fields
}

Tenant represents a registered cloud tenant with runtime rate-limit state. Always accessed via pointer; must not be copied.

func (*Tenant) AllowConcurrent added in v1.18.0

func (t *Tenant) AllowConcurrent() bool

AllowConcurrent checks whether the tenant can accept another concurrent request. If MaxConcurrentRequests is 0 (unset), concurrency is unlimited. Returns true and increments the in-flight counter if allowed.

func (*Tenant) AllowRequest

func (t *Tenant) AllowRequest() bool

AllowRequest checks whether the tenant can make another request this minute. Returns true and increments the counter if allowed.

func (*Tenant) Config

func (t *Tenant) Config() TenantConfig

Config returns a copyable snapshot of the tenant's configuration. The APIKey field is redacted to prevent accidental credential leakage.

func (*Tenant) ConsumeTokens

func (t *Tenant) ConsumeTokens(n int64) bool

ConsumeTokens attempts to consume n tokens from the per-minute budget. Returns true if the tokens were consumed.

func (*Tenant) DeductTokens added in v1.16.0

func (t *Tenant) DeductTokens(n int64)

DeductTokens unconditionally adds n tokens to the consumed count without checking the budget. This is used to charge excess usage when actual token generation exceeds the pre-authorized estimate (e.g. max_tokens=1 but the model produced more tokens). Unlike ConsumeTokens, it never fails.

func (*Tenant) ModelAllowed added in v1.18.0

func (t *Tenant) ModelAllowed(model string) bool

ModelAllowed returns true if the model is in the tenant's allow list. An empty allow list permits all models.
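The empty-list rule is easy to get backwards, so a short sketch may help. This stand-alone function assumes exact, case-sensitive matching, which this page does not specify; the model names are made up.

```go
package main

import "fmt"

// modelAllowed reports whether model may be used given an allow list.
// An empty (or nil) list permits every model.
func modelAllowed(allowList []string, model string) bool {
	if len(allowList) == 0 {
		return true
	}
	for _, m := range allowList {
		if m == model { // exact match: an assumption of this sketch
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(modelAllowed(nil, "any-model"))
	fmt.Println(modelAllowed([]string{"small", "large"}, "large"))
	fmt.Println(modelAllowed([]string{"small", "large"}, "huge"))
}
```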

func (*Tenant) RefundTokens added in v1.12.0

func (t *Tenant) RefundTokens(n int64)

RefundTokens returns n tokens to the per-minute budget, used to reconcile pre-authorized estimates with actual usage after inference completes.
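ConsumeTokens, DeductTokens, and RefundTokens together describe a reserve-then-settle flow: reserve an estimate before inference, then correct it once the real count is known. The sketch below illustrates that flow with a hypothetical tokenBudget type and settle helper. The per-minute window reset is omitted, and the use of sync/atomic is an assumption about the implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tokenBudget is a hypothetical per-window counter.
type tokenBudget struct {
	limit int64
	used  int64 // accessed atomically
}

// Consume reserves n tokens only if the budget would not be exceeded.
func (b *tokenBudget) Consume(n int64) bool {
	for {
		cur := atomic.LoadInt64(&b.used)
		if cur+n > b.limit {
			return false
		}
		if atomic.CompareAndSwapInt64(&b.used, cur, cur+n) {
			return true
		}
	}
}

// Deduct charges n tokens without checking the budget; it cannot fail.
func (b *tokenBudget) Deduct(n int64) { atomic.AddInt64(&b.used, n) }

// Refund returns n tokens to the budget.
func (b *tokenBudget) Refund(n int64) { atomic.AddInt64(&b.used, -n) }

// settle reconciles a pre-authorized estimate with the actual count.
func settle(b *tokenBudget, estimated, actual int64) {
	switch {
	case actual < estimated:
		b.Refund(estimated - actual)
	case actual > estimated:
		b.Deduct(actual - estimated)
	}
}

// demo walks through two requests against a budget of 100 tokens.
func demo() (steps []bool, used int64) {
	b := &tokenBudget{limit: 100}
	steps = append(steps, b.Consume(80)) // reserve an estimate of 80
	settle(b, 80, 30)                    // only 30 were used: refund 50
	steps = append(steps, b.Consume(80)) // rejected: 30+80 > 100
	steps = append(steps, b.Consume(60)) // accepted: 30+60 <= 100
	settle(b, 60, 75)                    // 15 over the estimate: charged anyway
	return steps, atomic.LoadInt64(&b.used)
}

func main() {
	steps, used := demo()
	fmt.Println(steps, used)
}
```

The final count can exceed the limit because the unconditional charge never fails; subsequent Consume calls are then rejected until the window resets.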

func (*Tenant) ReleaseConcurrent added in v1.18.0

func (t *Tenant) ReleaseConcurrent()

ReleaseConcurrent decrements the in-flight counter after a request completes.
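AllowConcurrent and ReleaseConcurrent form an acquire/release pair. A minimal sketch of that pairing follows, using a hypothetical gate type; the increment-then-check approach is one possible implementation, not necessarily the package's.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// gate is a hypothetical in-flight limiter.
type gate struct {
	max      int64 // 0 means unlimited
	inFlight int64 // accessed atomically
}

// Allow increments the in-flight counter and rolls back if the cap is exceeded.
func (g *gate) Allow() bool {
	n := atomic.AddInt64(&g.inFlight, 1)
	if g.max > 0 && n > g.max {
		atomic.AddInt64(&g.inFlight, -1)
		return false
	}
	return true
}

// Release must be called exactly once for every successful Allow.
func (g *gate) Release() { atomic.AddInt64(&g.inFlight, -1) }

// demo tries three requests against a cap of two, then frees a slot.
func demo() []bool {
	g := &gate{max: 2}
	out := []bool{g.Allow(), g.Allow(), g.Allow()} // third is over the cap
	g.Release()
	return append(out, g.Allow()) // a slot is free again
}

func main() {
	fmt.Println(demo())
}
```

In a request handler the usual pattern is to return early when the acquire call reports false and otherwise defer the release, so the counter is decremented even if the handler panics.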

type TenantConfig

type TenantConfig struct {
	ID                    string   `json:"id"`
	APIKey                string   `json:"api_key"`
	RateLimit             int64    `json:"rate_limit"`   // max requests per minute
	TokenBudget           int64    `json:"token_budget"` // max tokens per minute
	MaxConcurrentRequests int      `json:"max_concurrent_requests,omitempty"`
	ModelAllowList        []string `json:"model_allow_list,omitempty"`
}

TenantConfig is the input for creating or describing a tenant. It contains no atomic fields and is safe to copy.

type TenantManager

type TenantManager struct {
	// contains filtered or unexported fields
}

TenantManager provides CRUD operations on tenants, keyed by both tenant ID and API key for O(1) lookups in either direction.

func NewTenantManager

func NewTenantManager(opts ...TenantManagerOption) *TenantManager

NewTenantManager creates a new empty TenantManager. By default it uses an in-memory backend. Use WithTenantBackend to supply a persistent backend.

func (*TenantManager) Create

func (m *TenantManager) Create(cfg TenantConfig) error

Create registers a new tenant. The tenant ID and API key must be unique.

func (*TenantManager) Delete

func (m *TenantManager) Delete(id string) error

Delete removes a tenant by ID.

func (*TenantManager) Get

func (m *TenantManager) Get(id string) (*Tenant, error)

Get retrieves a tenant by ID.

func (*TenantManager) GetByAPIKey

func (m *TenantManager) GetByAPIKey(apiKey string) (*Tenant, error)

GetByAPIKey retrieves a tenant by API key. The input key is hashed with SHA-256 for O(1) map lookup, then verified with constant-time comparison on the hashes to prevent timing side-channel attacks.

func (*TenantManager) List

func (m *TenantManager) List() []TenantConfig

List returns a copyable snapshot of all tenant configurations.

func (*TenantManager) Update

func (m *TenantManager) Update(id string, rateLimit, tokenBudget int64) error

Update modifies a tenant's rate limit and token budget.

type TenantManagerOption added in v1.18.0

type TenantManagerOption func(*TenantManager)

TenantManagerOption configures a TenantManager.

func WithTenantBackend added in v1.18.0

func WithTenantBackend(b TenantStoreBackend) TenantManagerOption

WithTenantBackend sets the persistence backend for a TenantManager.

type TenantStoreBackend added in v1.18.0

type TenantStoreBackend interface {
	SaveTenant(id string, cfg TenantConfig) error
	LoadTenant(id string) (*TenantConfig, bool)
	DeleteTenant(id string) error
	ListTenants() []TenantConfig
}

TenantStoreBackend abstracts persistence operations for tenant configurations.
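To show how a backend plugs into a manager through a functional option, here is a minimal sketch with local stand-ins for the documented types. The map-backed backend, the manager type, and the option names are hypothetical; treating deletion of a missing ID as a no-op and sorting the list by ID are choices made for this sketch.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// tenantConfig is a trimmed local copy of the documented TenantConfig.
type tenantConfig struct {
	ID          string
	RateLimit   int64
	TokenBudget int64
}

// tenantStoreBackend mirrors the documented interface.
type tenantStoreBackend interface {
	SaveTenant(id string, cfg tenantConfig) error
	LoadTenant(id string) (*tenantConfig, bool)
	DeleteTenant(id string) error
	ListTenants() []tenantConfig
}

type mapBackend struct {
	mu sync.RWMutex
	m  map[string]tenantConfig
}

func newMapBackend() *mapBackend { return &mapBackend{m: map[string]tenantConfig{}} }

func (b *mapBackend) SaveTenant(id string, cfg tenantConfig) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.m[id] = cfg
	return nil
}

// LoadTenant returns a pointer to a copy, so callers cannot mutate the store.
func (b *mapBackend) LoadTenant(id string) (*tenantConfig, bool) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	cfg, ok := b.m[id]
	if !ok {
		return nil, false
	}
	return &cfg, true
}

func (b *mapBackend) DeleteTenant(id string) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.m, id) // deleting a missing ID is a no-op here
	return nil
}

func (b *mapBackend) ListTenants() []tenantConfig {
	b.mu.RLock()
	defer b.mu.RUnlock()
	out := make([]tenantConfig, 0, len(b.m))
	for _, cfg := range b.m {
		out = append(out, cfg)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

// manager and managerOption sketch the functional-option wiring.
type manager struct{ backend tenantStoreBackend }

type managerOption func(*manager)

func withBackend(b tenantStoreBackend) managerOption {
	return func(m *manager) { m.backend = b }
}

func newManager(opts ...managerOption) *manager {
	m := &manager{backend: newMapBackend()} // in-memory by default
	for _, opt := range opts {
		opt(m)
	}
	return m
}

// demo shares one backend between two managers, as after a restart.
func demo() []tenantConfig {
	shared := newMapBackend()
	first := newManager(withBackend(shared))
	_ = first.backend.SaveTenant("b", tenantConfig{ID: "b", RateLimit: 60, TokenBudget: 10000})
	_ = first.backend.SaveTenant("a", tenantConfig{ID: "a", RateLimit: 30, TokenBudget: 5000})
	_ = first.backend.DeleteTenant("b")
	second := newManager(withBackend(shared))
	return second.backend.ListTenants()
}

func main() {
	fmt.Println(demo())
}
```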

type TokenMeter

type TokenMeter struct {
	// contains filtered or unexported fields
}

TokenMeter tracks input and output token usage per tenant and emits billing records to a BillingStore.

func NewTokenMeter

func NewTokenMeter(store BillingStore) *TokenMeter

NewTokenMeter creates a TokenMeter backed by the given BillingStore.

func (*TokenMeter) Query

func (m *TokenMeter) Query(tenantID string, from, to time.Time) ([]BillingRecord, error)

Query returns billing records for a tenant within the given time range.

func (*TokenMeter) Record

func (m *TokenMeter) Record(tenantID string, inputTokens, outputTokens int) error

Record records token usage for a tenant and persists a BillingRecord.

type UsageEvent added in v1.18.0

type UsageEvent struct {
	TenantID         string `json:"tenant_id"`
	Model            string `json:"model"`
	PromptTokens     int    `json:"prompt_tokens"`
	CompletionTokens int    `json:"completion_tokens"`
	Timestamp        int64  `json:"timestamp"`
}

UsageEvent records token consumption for a single request.

type UsageRecorder added in v1.18.0

type UsageRecorder interface {
	Record(event UsageEvent) error
}

UsageRecorder defines the interface for recording usage events. The default implementation writes NDJSON; a Kafka adapter can implement this interface for production deployments.
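Any type with a matching Record method satisfies the interface. As a stand-in for a message-queue adapter, the sketch below hands events to a buffered channel that a background publisher could drain. The queueRecorder type is hypothetical, and dropping events with an error when the buffer is full is a design assumption; a production adapter might block, retry, or spill to disk instead.

```go
package main

import (
	"errors"
	"fmt"
)

// usageEvent and usageRecorder are trimmed local copies of the documented types.
type usageEvent struct {
	TenantID                       string
	PromptTokens, CompletionTokens int
}

type usageRecorder interface{ Record(usageEvent) error }

var errQueueFull = errors.New("usage queue full")

// queueRecorder is a hypothetical adapter that enqueues events for a publisher.
type queueRecorder struct{ ch chan usageEvent }

// Record never blocks the request path; it fails fast when the buffer is full.
func (q *queueRecorder) Record(e usageEvent) error {
	select {
	case q.ch <- e:
		return nil
	default:
		return errQueueFull
	}
}

// Compile-time check that the adapter satisfies the interface.
var _ usageRecorder = (*queueRecorder)(nil)

// demo fills a one-slot queue, overflows it, then drains the queued event.
func demo() (first, second error, delivered usageEvent) {
	q := &queueRecorder{ch: make(chan usageEvent, 1)}
	first = q.Record(usageEvent{"tenant-a", 3, 5})
	second = q.Record(usageEvent{"tenant-a", 1, 1}) // the buffer is already full
	delivered = <-q.ch                              // a publisher goroutine would do this
	return first, second, delivered
}

func main() {
	first, second, delivered := demo()
	fmt.Println(first, second, delivered.TenantID)
}
```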
