Documentation ¶
Overview ¶
Package cloud provides a multi-tenant managed inference service for Zerfoo.
It wraps the serve.Server with tenant isolation, token-based billing, rate limiting, and health checking for cloud deployments.
Stability: alpha
Index ¶
- func BillingMiddleware(recorder UsageRecorder) func(http.Handler) http.Handler
- type AuditAction
- type AuditEntry
- type AuditLogger
- type AuditResult
- type AuditStore
- type BboltTenantStoreBackend
- func (b *BboltTenantStoreBackend) Close() error
- func (b *BboltTenantStoreBackend) DeleteTenant(id string) error
- func (b *BboltTenantStoreBackend) ListTenants() []TenantConfig
- func (b *BboltTenantStoreBackend) LoadTenant(id string) (*TenantConfig, bool)
- func (b *BboltTenantStoreBackend) SaveTenant(id string, cfg TenantConfig) error
- type BillingRecord
- type BillingStore
- type CloudServer
- type MemoryAuditStore
- type MemoryBillingStore
- type ModelInfo
- type NDJSONRecorder
- type ResourceManager
- func (rm *ResourceManager) Evict(modelID string) error
- func (rm *ResourceManager) Load(modelID string, vramBytes uint64) error
- func (rm *ResourceManager) LoadedModels() []ModelInfo
- func (rm *ResourceManager) SetEvictCallback(fn func(modelID string))
- func (rm *ResourceManager) Stats() (used, budget uint64, loaded int)
- func (rm *ResourceManager) Touch(modelID string) error
- type SAMLMetadata
- type SAMLProvider
- type SSOIdentity
- type SSOProvider
- type Tenant
- func (t *Tenant) AllowConcurrent() bool
- func (t *Tenant) AllowRequest() bool
- func (t *Tenant) Config() TenantConfig
- func (t *Tenant) ConsumeTokens(n int64) bool
- func (t *Tenant) DeductTokens(n int64)
- func (t *Tenant) ModelAllowed(model string) bool
- func (t *Tenant) RefundTokens(n int64)
- func (t *Tenant) ReleaseConcurrent()
- type TenantConfig
- type TenantManager
- func (m *TenantManager) Create(cfg TenantConfig) error
- func (m *TenantManager) Delete(id string) error
- func (m *TenantManager) Get(id string) (*Tenant, error)
- func (m *TenantManager) GetByAPIKey(apiKey string) (*Tenant, error)
- func (m *TenantManager) List() []TenantConfig
- func (m *TenantManager) Update(id string, rateLimit, tokenBudget int64) error
- type TenantManagerOption
- type TenantStoreBackend
- type TokenMeter
- type UsageEvent
- type UsageRecorder
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func BillingMiddleware ¶ added in v1.18.0
func BillingMiddleware(recorder UsageRecorder) func(http.Handler) http.Handler
BillingMiddleware returns an HTTP middleware that meters prompt and completion tokens per request and publishes usage events to the given recorder. It expects the tenant authentication middleware to run first so that tenantFromContext returns a valid tenant. The tenant ID is taken from the Authorization header's Bearer token value.
For streaming (SSE) responses, the JSON response body cannot be parsed as a single object. The middleware injects a generate.TokenUsage into the request context; the generation session writes prompt/completion counts there, which works for both streaming and non-streaming responses. JSON body parsing is used as a fallback for handlers that do not use context-based usage tracking.
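The context-based approach described above can be illustrated with a minimal sketch. It is not the package's implementation: tokenUsage stands in for generate.TokenUsage, and meter, usageKey, handleGenerate, and runOnce are hypothetical names. The tenant ID is passed in directly to keep the example short, whereas the real middleware resolves it from the request.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// tokenUsage is a hypothetical stand-in for generate.TokenUsage.
type tokenUsage struct {
	Prompt, Completion int
}

// usageKey is the private context key under which the usage pointer is stored.
type usageKey struct{}

// usageEvent mirrors a subset of the documented UsageEvent fields.
type usageEvent struct {
	TenantID         string
	PromptTokens     int
	CompletionTokens int
}

// meter injects a *tokenUsage into the request context, runs next, and
// passes whatever counts the handler wrote to record.
func meter(tenantID string, record func(usageEvent), next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u := &tokenUsage{}
		ctx := context.WithValue(r.Context(), usageKey{}, u)
		next.ServeHTTP(w, r.WithContext(ctx))
		record(usageEvent{TenantID: tenantID, PromptTokens: u.Prompt, CompletionTokens: u.Completion})
	})
}

// handleGenerate simulates a streaming handler that reports its counts
// through the context instead of a parseable JSON body.
func handleGenerate(w http.ResponseWriter, r *http.Request) {
	if u, ok := r.Context().Value(usageKey{}).(*tokenUsage); ok {
		u.Prompt, u.Completion = 12, 34
	}
	fmt.Fprint(w, "data: done\n\n")
}

// runOnce sends one request through the middleware and returns the recorded events.
func runOnce() []usageEvent {
	var events []usageEvent
	h := meter("tenant-a", func(e usageEvent) { events = append(events, e) }, http.HandlerFunc(handleGenerate))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodPost, "/v1/completions", nil))
	return events
}

func main() {
	fmt.Printf("%+v\n", runOnce())
}
```

Because the handler writes through a pointer stored in the context, the counts are available after ServeHTTP returns regardless of how the response body was framed.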
Types ¶
type AuditAction ¶
type AuditAction string
AuditAction identifies the type of API operation being logged.
const (
AuditActionInference AuditAction = "inference"
AuditActionCreate AuditAction = "create"
AuditActionUpdate AuditAction = "update"
AuditActionDelete AuditAction = "delete"
AuditActionList AuditAction = "list"
AuditActionAuth AuditAction = "auth"
)
type AuditEntry ¶
type AuditEntry struct {
Timestamp time.Time `json:"timestamp"`
TenantID string `json:"tenant_id"`
Action AuditAction `json:"action"`
Result AuditResult `json:"result"`
Resource string `json:"resource"`
StatusCode int `json:"status_code"`
Method string `json:"method"`
Path string `json:"path"`
RemoteAddr string `json:"remote_addr"`
}
AuditEntry records a single auditable event for SOC 2 compliance. Sensitive data (API keys, request bodies) is never stored.
type AuditLogger ¶
type AuditLogger struct {
// contains filtered or unexported fields
}
AuditLogger records API requests for SOC 2 compliance. It deliberately omits sensitive fields (API keys, request/response bodies).
func NewAuditLogger ¶
func NewAuditLogger(store AuditStore) *AuditLogger
NewAuditLogger creates an AuditLogger backed by the given store.
func (*AuditLogger) Log ¶
func (a *AuditLogger) Log(entry AuditEntry) error
Log records an audit entry.
func (*AuditLogger) Query ¶
func (a *AuditLogger) Query(tenantID string, from, to time.Time) ([]AuditEntry, error)
Query returns audit entries for a tenant within the given time range.
type AuditResult ¶
type AuditResult string
AuditResult records the outcome of an API request.
const (
AuditResultSuccess AuditResult = "success"
AuditResultDenied AuditResult = "denied"
AuditResultRateLimited AuditResult = "rate_limited"
AuditResultError AuditResult = "error"
)
type AuditStore ¶
type AuditStore interface {
// Append persists an audit entry.
Append(entry AuditEntry) error
// Query returns audit entries for a tenant within the given time range.
Query(tenantID string, from, to time.Time) ([]AuditEntry, error)
}
AuditStore is the persistence interface for audit entries.
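As a rough illustration of what a store with these two methods might look like, here is a self-contained sketch. The types auditEntry and sliceAuditStore are local, hypothetical stand-ins, and the half-open range (from inclusive, to exclusive) is an assumption; the documentation does not state how the bounds are treated.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// auditEntry carries a subset of the documented AuditEntry fields.
type auditEntry struct {
	Timestamp time.Time
	TenantID  string
	Action    string
}

// sliceAuditStore is a hypothetical in-memory store guarded by a mutex.
type sliceAuditStore struct {
	mu      sync.Mutex
	entries []auditEntry
}

// Append stores the entry at the end of the slice.
func (s *sliceAuditStore) Append(e auditEntry) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries = append(s.entries, e)
	return nil
}

// Query returns the tenant's entries with from <= Timestamp < to (assumed bounds).
func (s *sliceAuditStore) Query(tenantID string, from, to time.Time) ([]auditEntry, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []auditEntry
	for _, e := range s.entries {
		if e.TenantID == tenantID && !e.Timestamp.Before(from) && e.Timestamp.Before(to) {
			out = append(out, e)
		}
	}
	return out, nil
}

// demoAudit stores three entries and counts those of tenant-a in the first hour.
func demoAudit() int {
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	s := &sliceAuditStore{}
	_ = s.Append(auditEntry{Timestamp: t0, TenantID: "tenant-a", Action: "inference"})
	_ = s.Append(auditEntry{Timestamp: t0.Add(30 * time.Minute), TenantID: "tenant-b", Action: "inference"})
	_ = s.Append(auditEntry{Timestamp: t0.Add(2 * time.Hour), TenantID: "tenant-a", Action: "list"})
	got, _ := s.Query("tenant-a", t0, t0.Add(time.Hour))
	return len(got)
}

func main() {
	fmt.Println(demoAudit()) // prints 1
}
```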
type BboltTenantStoreBackend ¶ added in v1.18.0
type BboltTenantStoreBackend struct {
// contains filtered or unexported fields
}
BboltTenantStoreBackend is a persistent TenantStoreBackend backed by a bbolt database.
func NewBboltTenantStoreBackend ¶ added in v1.18.0
func NewBboltTenantStoreBackend(path string) (*BboltTenantStoreBackend, error)
NewBboltTenantStoreBackend opens or creates a bbolt database at path and returns a backend ready for use with NewTenantManager(WithTenantBackend(...)).
func (*BboltTenantStoreBackend) Close ¶ added in v1.18.0
func (b *BboltTenantStoreBackend) Close() error
Close closes the underlying bbolt database.
func (*BboltTenantStoreBackend) DeleteTenant ¶ added in v1.18.0
func (b *BboltTenantStoreBackend) DeleteTenant(id string) error
DeleteTenant removes a tenant by ID.
func (*BboltTenantStoreBackend) ListTenants ¶ added in v1.18.0
func (b *BboltTenantStoreBackend) ListTenants() []TenantConfig
ListTenants returns all stored tenant configurations.
func (*BboltTenantStoreBackend) LoadTenant ¶ added in v1.18.0
func (b *BboltTenantStoreBackend) LoadTenant(id string) (*TenantConfig, bool)
LoadTenant retrieves a tenant configuration by ID.
func (*BboltTenantStoreBackend) SaveTenant ¶ added in v1.18.0
func (b *BboltTenantStoreBackend) SaveTenant(id string, cfg TenantConfig) error
SaveTenant persists a tenant configuration as JSON keyed by its ID.
type BillingRecord ¶
type BillingRecord struct {
TenantID string `json:"tenant_id"`
InputTokens int `json:"input_tokens"`
OutputTokens int `json:"output_tokens"`
Timestamp time.Time `json:"timestamp"`
}
BillingRecord captures token usage for a single inference request.
type BillingStore ¶
type BillingStore interface {
// Store persists a billing record.
Store(record BillingRecord) error
// Query returns all billing records for a tenant within the given time range.
Query(tenantID string, from, to time.Time) ([]BillingRecord, error)
}
BillingStore is the persistence interface for billing records.
type CloudServer ¶
type CloudServer struct {
// contains filtered or unexported fields
}
CloudServer wraps an HTTP handler with multi-tenant isolation, token billing, rate limiting, and health checking for cloud deployments.
func NewCloudServer ¶
func NewCloudServer(handler http.Handler, tenants *TenantManager, meter *TokenMeter) *CloudServer
NewCloudServer creates a CloudServer that routes authenticated requests to the given handler through tenant isolation middleware.
func (*CloudServer) Handler ¶
func (cs *CloudServer) Handler() http.Handler
Handler returns the root HTTP handler with all middleware applied.
func (*CloudServer) Meter ¶
func (cs *CloudServer) Meter() *TokenMeter
Meter returns the TokenMeter for external billing queries.
func (*CloudServer) SetHealthy ¶
func (cs *CloudServer) SetHealthy(healthy bool)
SetHealthy sets the health status of the cloud server.
func (*CloudServer) Tenants ¶
func (cs *CloudServer) Tenants() *TenantManager
Tenants returns the TenantManager for external CRUD operations.
type MemoryAuditStore ¶
type MemoryAuditStore struct {
// contains filtered or unexported fields
}
MemoryAuditStore is an in-memory AuditStore for testing and development.
func NewMemoryAuditStore ¶
func NewMemoryAuditStore() *MemoryAuditStore
NewMemoryAuditStore creates a new in-memory audit store.
func (*MemoryAuditStore) All ¶
func (s *MemoryAuditStore) All() []AuditEntry
All returns a copy of all stored entries.
func (*MemoryAuditStore) Append ¶
func (s *MemoryAuditStore) Append(entry AuditEntry) error
Append appends an entry to the in-memory store.
func (*MemoryAuditStore) Query ¶
func (s *MemoryAuditStore) Query(tenantID string, from, to time.Time) ([]AuditEntry, error)
Query returns entries matching the tenant and time range.
type MemoryBillingStore ¶
type MemoryBillingStore struct {
// contains filtered or unexported fields
}
MemoryBillingStore is an in-memory BillingStore for testing and development.
func NewMemoryBillingStore ¶
func NewMemoryBillingStore() *MemoryBillingStore
NewMemoryBillingStore creates a new in-memory billing store.
func (*MemoryBillingStore) All ¶
func (s *MemoryBillingStore) All() []BillingRecord
All returns a copy of all stored records.
func (*MemoryBillingStore) Query ¶
func (s *MemoryBillingStore) Query(tenantID string, from, to time.Time) ([]BillingRecord, error)
Query returns records matching the tenant and time range.
func (*MemoryBillingStore) Store ¶
func (s *MemoryBillingStore) Store(record BillingRecord) error
Store appends a record to the in-memory store.
type ModelInfo ¶ added in v1.18.0
ModelInfo describes a loaded model tracked by the ResourceManager.
type NDJSONRecorder ¶ added in v1.18.0
type NDJSONRecorder struct {
// contains filtered or unexported fields
}
NDJSONRecorder writes usage events as newline-delimited JSON to an io.Writer.
func NewNDJSONRecorder ¶ added in v1.18.0
func NewNDJSONRecorder(w io.Writer) *NDJSONRecorder
NewNDJSONRecorder creates a recorder that writes NDJSON to w.
func (*NDJSONRecorder) Record ¶ added in v1.18.0
func (r *NDJSONRecorder) Record(event UsageEvent) error
Record serializes the event as a single JSON line followed by a newline.
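The one-object-per-line format can be sketched with the standard library alone, since json.Encoder.Encode writes a newline after each value. The lineRecorder type below is a hypothetical illustration rather than NDJSONRecorder itself; the usageEvent struct copies the documented UsageEvent fields and tags, and the mutex is an assumption about concurrent use.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"sync"
)

// usageEvent copies the documented UsageEvent fields and JSON tags.
type usageEvent struct {
	TenantID         string `json:"tenant_id"`
	Model            string `json:"model"`
	PromptTokens     int    `json:"prompt_tokens"`
	CompletionTokens int    `json:"completion_tokens"`
	Timestamp        int64  `json:"timestamp"`
}

// lineRecorder is a hypothetical recorder that emits one JSON object per line.
type lineRecorder struct {
	mu  sync.Mutex
	enc *json.Encoder
}

func newLineRecorder(w io.Writer) *lineRecorder {
	return &lineRecorder{enc: json.NewEncoder(w)}
}

// Record encodes the event; Encode appends the trailing newline itself.
func (r *lineRecorder) Record(e usageEvent) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.enc.Encode(e)
}

// demoNDJSON records two events into a buffer and returns the raw output.
func demoNDJSON() string {
	var buf bytes.Buffer
	rec := newLineRecorder(&buf)
	_ = rec.Record(usageEvent{TenantID: "tenant-a", Model: "m1", PromptTokens: 3, CompletionTokens: 5, Timestamp: 1700000000})
	_ = rec.Record(usageEvent{TenantID: "tenant-b", Model: "m1", PromptTokens: 1, CompletionTokens: 2, Timestamp: 1700000001})
	return buf.String()
}

func main() {
	fmt.Print(demoNDJSON())
}
```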
type ResourceManager ¶ added in v1.18.0
type ResourceManager struct {
// contains filtered or unexported fields
}
ResourceManager tracks loaded models and their VRAM usage, evicting least-recently-used models when a new load would exceed the memory budget.
func NewResourceManager ¶ added in v1.18.0
func NewResourceManager(budgetBytes uint64) (*ResourceManager, error)
NewResourceManager creates a ResourceManager with the given VRAM budget in bytes.
func (*ResourceManager) Evict ¶ added in v1.18.0
func (rm *ResourceManager) Evict(modelID string) error
Evict explicitly removes a model from the manager.
func (*ResourceManager) Load ¶ added in v1.18.0
func (rm *ResourceManager) Load(modelID string, vramBytes uint64) error
Load registers a model with the given VRAM footprint. If loading would exceed the budget, LRU models are evicted until there is enough space. Returns an error if the model alone exceeds the entire budget.
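The eviction policy described above can be sketched with container/list. This is an illustration under stated assumptions, not the ResourceManager source: lruBudget, model, load, and touch are hypothetical names, and treating a repeated load of the same ID as a touch is an assumption.

```go
package main

import (
	"container/list"
	"errors"
	"fmt"
)

// model is one tracked entry in the LRU list.
type model struct {
	id   string
	vram uint64
}

// lruBudget is a hypothetical budgeted LRU; the list front is the most recently used.
type lruBudget struct {
	budget, used uint64
	order        *list.List
	byID         map[string]*list.Element
}

func newLRUBudget(budget uint64) *lruBudget {
	return &lruBudget{budget: budget, order: list.New(), byID: make(map[string]*list.Element)}
}

// load registers a model, evicting from the back of the list until it fits.
// It returns the IDs that were evicted to make room.
func (b *lruBudget) load(id string, vram uint64) ([]string, error) {
	if vram > b.budget {
		return nil, errors.New("model exceeds the entire budget")
	}
	if el, ok := b.byID[id]; ok { // assumed: re-loading only refreshes recency
		b.order.MoveToFront(el)
		return nil, nil
	}
	var evicted []string
	for b.used+vram > b.budget {
		el := b.order.Back()
		m := el.Value.(model)
		b.order.Remove(el)
		delete(b.byID, m.id)
		b.used -= m.vram
		evicted = append(evicted, m.id)
	}
	b.byID[id] = b.order.PushFront(model{id: id, vram: vram})
	b.used += vram
	return evicted, nil
}

// touch marks a loaded model as most recently used.
func (b *lruBudget) touch(id string) error {
	el, ok := b.byID[id]
	if !ok {
		return errors.New("model not loaded")
	}
	b.order.MoveToFront(el)
	return nil
}

// demoLRU loads a and b, touches a, then loads c, which forces b out.
func demoLRU() ([]string, uint64) {
	b := newLRUBudget(10)
	_, _ = b.load("a", 4)
	_, _ = b.load("b", 4)
	_ = b.touch("a")
	evicted, _ := b.load("c", 4)
	return evicted, b.used
}

func main() {
	evicted, used := demoLRU()
	fmt.Println(evicted, used) // prints [b] 8
}
```

Touching a before loading c is what saves it: b becomes the least recently used entry and is the one removed.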
func (*ResourceManager) LoadedModels ¶ added in v1.18.0
func (rm *ResourceManager) LoadedModels() []ModelInfo
LoadedModels returns a snapshot of all currently loaded models.
func (*ResourceManager) SetEvictCallback ¶ added in v1.18.0
func (rm *ResourceManager) SetEvictCallback(fn func(modelID string))
SetEvictCallback sets an optional function called when a model is evicted.
func (*ResourceManager) Stats ¶ added in v1.18.0
func (rm *ResourceManager) Stats() (used, budget uint64, loaded int)
Stats returns the current memory usage statistics.
func (*ResourceManager) Touch ¶ added in v1.18.0
func (rm *ResourceManager) Touch(modelID string) error
Touch updates the last-used time for a model, moving it to the front of the LRU list. Call this on each inference request.
type SAMLMetadata ¶
type SAMLMetadata struct {
EntityID string `json:"entity_id"`
SignOnURL string `json:"sign_on_url"`
Certificate string `json:"certificate"`
NameIDFormat string `json:"name_id_format,omitempty"`
WantAuthnSigned bool `json:"want_authn_signed"`
}
SAMLMetadata holds identity provider configuration parsed from SAML 2.0 metadata XML.
func ParseSAMLMetadata ¶
func ParseSAMLMetadata(data []byte) (*SAMLMetadata, error)
ParseSAMLMetadata parses SAML 2.0 IdP metadata XML into a SAMLMetadata struct.
type SAMLProvider ¶
type SAMLProvider struct {
// contains filtered or unexported fields
}
SAMLProvider implements SSOProvider for SAML 2.0.
func NewSAMLProvider ¶
func NewSAMLProvider(metadata *SAMLMetadata, tenantID string) *SAMLProvider
NewSAMLProvider creates a SAML 2.0 SSO provider from parsed metadata, bound to a specific tenant.
func (*SAMLProvider) EntityID ¶
func (p *SAMLProvider) EntityID() string
EntityID returns the identity provider's entity ID.
func (*SAMLProvider) ValidateAssertion ¶
func (p *SAMLProvider) ValidateAssertion(assertion []byte) (*SSOIdentity, error)
ValidateAssertion parses and validates a SAML 2.0 assertion, including XXE protection, XML digital signature verification, NotBefore clock skew tolerance, and assertion replay prevention.
type SSOIdentity ¶
type SSOIdentity struct {
Subject string `json:"subject"`
TenantID string `json:"tenant_id"`
Email string `json:"email,omitempty"`
Attributes map[string]string `json:"attributes,omitempty"`
ExpiresAt time.Time `json:"expires_at"`
}
SSOIdentity represents an authenticated user from an SSO provider.
type SSOProvider ¶
type SSOProvider interface {
// EntityID returns the identity provider's entity ID.
EntityID() string
// ValidateAssertion validates an assertion and returns the authenticated identity.
ValidateAssertion(assertion []byte) (*SSOIdentity, error)
}
SSOProvider defines the interface for SSO authentication. Implementations handle protocol-specific details (SAML 2.0, OIDC, etc.).
type Tenant ¶
type Tenant struct {
ID string
// contains filtered or unexported fields
}
Tenant represents a registered cloud tenant with runtime rate-limit state. Always accessed via pointer; must not be copied.
func (*Tenant) AllowConcurrent ¶ added in v1.18.0
func (t *Tenant) AllowConcurrent() bool
AllowConcurrent checks whether the tenant can accept another concurrent request. If MaxConcurrentRequests is 0 (unset), concurrency is unlimited. Returns true and increments the in-flight counter if allowed.
func (*Tenant) AllowRequest ¶
func (t *Tenant) AllowRequest() bool
AllowRequest checks whether the tenant can make another request this minute. Returns true and increments the counter if allowed.
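One common way to implement a per-minute request limit is a fixed-window counter, sketched below. The source does not say which algorithm Tenant uses, so the fixed window, the minuteLimiter type, and the explicit now parameter (added to keep the example deterministic) are all assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// minuteLimiter is a hypothetical fixed-window limiter: at most limit
// requests are admitted per one-minute window.
type minuteLimiter struct {
	mu          sync.Mutex
	limit       int64
	windowStart time.Time
	count       int64
}

// allow reports whether a request at time now fits in the current window
// and, if so, counts it.
func (l *minuteLimiter) allow(now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if now.Sub(l.windowStart) >= time.Minute {
		// A new window begins; reset the counter.
		l.windowStart = now
		l.count = 0
	}
	if l.count >= l.limit {
		return false
	}
	l.count++
	return true
}

// demoLimiter issues three requests in one window and one in the next.
func demoLimiter() [4]bool {
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	l := &minuteLimiter{limit: 2}
	return [4]bool{
		l.allow(t0),
		l.allow(t0.Add(time.Second)),
		l.allow(t0.Add(2 * time.Second)),
		l.allow(t0.Add(61 * time.Second)),
	}
}

func main() {
	fmt.Println(demoLimiter()) // prints [true true false true]
}
```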
func (*Tenant) Config ¶
func (t *Tenant) Config() TenantConfig
Config returns a copyable snapshot of the tenant's configuration. The APIKey field is redacted to prevent accidental credential leakage.
func (*Tenant) ConsumeTokens ¶
func (t *Tenant) ConsumeTokens(n int64) bool
ConsumeTokens attempts to consume n tokens from the per-minute budget. Returns true if the tokens were consumed.
func (*Tenant) DeductTokens ¶ added in v1.16.0
func (t *Tenant) DeductTokens(n int64)
DeductTokens unconditionally adds n tokens to the consumed count without checking the budget. This is used to charge excess usage when actual token generation exceeds the pre-authorized estimate (e.g. max_tokens=1 but the model produced more tokens). Unlike ConsumeTokens, it never fails.
func (*Tenant) ModelAllowed ¶ added in v1.18.0
func (t *Tenant) ModelAllowed(model string) bool
ModelAllowed returns true if the model is in the tenant's allow list. An empty allow list permits all models.
func (*Tenant) RefundTokens ¶ added in v1.12.0
func (t *Tenant) RefundTokens(n int64)
RefundTokens returns n tokens to the per-minute budget, used to reconcile pre-authorized estimates with actual usage after inference completes.
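The interplay of ConsumeTokens, DeductTokens, and RefundTokens amounts to a pre-authorize and reconcile flow. The sketch below shows one way such a flow could work; tokenBudget and settle are hypothetical, the per-minute reset is omitted, and the use of an atomic counter with a compare-and-swap loop is an assumption about the implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tokenBudget is a hypothetical budget: used may never be pushed past limit
// by consume, but deduct can exceed it.
type tokenBudget struct {
	limit int64
	used  atomic.Int64
}

// consume reserves n tokens if they fit, retrying on concurrent updates.
func (b *tokenBudget) consume(n int64) bool {
	for {
		cur := b.used.Load()
		if cur+n > b.limit {
			return false
		}
		if b.used.CompareAndSwap(cur, cur+n) {
			return true
		}
	}
}

// deduct charges n tokens unconditionally.
func (b *tokenBudget) deduct(n int64) { b.used.Add(n) }

// refund returns n tokens to the budget.
func (b *tokenBudget) refund(n int64) { b.used.Add(-n) }

// settle reconciles a pre-authorized estimate with the actual usage.
func (b *tokenBudget) settle(estimate, actual int64) {
	switch {
	case actual > estimate:
		b.deduct(actual - estimate)
	case actual < estimate:
		b.refund(estimate - actual)
	}
}

// demoBudget walks through two requests against a budget of 100 tokens.
func demoBudget() (bool, bool, int64) {
	b := &tokenBudget{limit: 100}
	first := b.consume(60)    // reserve the estimate
	b.settle(60, 25)          // only 25 used: 35 refunded, used = 25
	tooLarge := b.consume(80) // 25 + 80 > 100, rejected
	b.consume(70)             // used = 95
	b.settle(70, 90)          // 20 over the estimate: used = 115
	return first, tooLarge, b.used.Load()
}

func main() {
	fmt.Println(demoBudget()) // prints true false 115
}
```

The final value exceeds the limit on purpose: an unconditional deduction records real usage even when it overshoots, so later reservations in the same window are refused.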
func (*Tenant) ReleaseConcurrent ¶ added in v1.18.0
func (t *Tenant) ReleaseConcurrent()
ReleaseConcurrent decrements the in-flight counter after a request completes.
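The AllowConcurrent and ReleaseConcurrent pair follows the usual acquire and release pattern for an in-flight counter. A minimal sketch follows, with the hypothetical names inflight, acquire, and release; the increment-then-roll-back strategy is an assumption, chosen because it needs no lock.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// inflight is a hypothetical concurrency gate; max == 0 means unlimited.
type inflight struct {
	max int64
	n   atomic.Int64
}

// acquire increments the counter and rolls back if the limit is exceeded.
func (c *inflight) acquire() bool {
	n := c.n.Add(1)
	if c.max > 0 && n > c.max {
		c.n.Add(-1)
		return false
	}
	return true
}

// release decrements the counter; call it once per successful acquire.
func (c *inflight) release() { c.n.Add(-1) }

// demoInflight fills a gate of size 2, is refused, frees a slot, and retries.
func demoInflight() [4]bool {
	c := &inflight{max: 2}
	a := c.acquire()
	b := c.acquire()
	refused := c.acquire()
	c.release()
	d := c.acquire()
	return [4]bool{a, b, refused, d}
}

func main() {
	fmt.Println(demoInflight()) // prints [true true false true]
}
```

In a handler, a successful acquire would typically be followed immediately by a deferred release so the slot is returned on every exit path.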
type TenantConfig ¶
type TenantConfig struct {
ID string `json:"id"`
APIKey string `json:"api_key"`
RateLimit int64 `json:"rate_limit"` // max requests per minute
TokenBudget int64 `json:"token_budget"` // max tokens per minute
MaxConcurrentRequests int `json:"max_concurrent_requests,omitempty"`
ModelAllowList []string `json:"model_allow_list,omitempty"`
}
TenantConfig is the input for creating or describing a tenant. It contains no atomic fields and is safe to copy.
type TenantManager ¶
type TenantManager struct {
// contains filtered or unexported fields
}
TenantManager provides CRUD operations on tenants, keyed by both tenant ID and API key for O(1) lookups in either direction.
func NewTenantManager ¶
func NewTenantManager(opts ...TenantManagerOption) *TenantManager
NewTenantManager creates a new empty TenantManager. By default it uses an in-memory backend. Use WithTenantBackend to supply a persistent backend.
func (*TenantManager) Create ¶
func (m *TenantManager) Create(cfg TenantConfig) error
Create registers a new tenant. The tenant ID and API key must be unique.
func (*TenantManager) Delete ¶
func (m *TenantManager) Delete(id string) error
Delete removes a tenant by ID.
func (*TenantManager) Get ¶
func (m *TenantManager) Get(id string) (*Tenant, error)
Get retrieves a tenant by ID.
func (*TenantManager) GetByAPIKey ¶
func (m *TenantManager) GetByAPIKey(apiKey string) (*Tenant, error)
GetByAPIKey retrieves a tenant by API key. The input key is hashed with SHA-256 for O(1) map lookup, then verified with constant-time comparison on the hashes to prevent timing side-channel attacks.
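The hash-then-compare lookup described above might look roughly like the following. The keyIndex and keyRecord types are hypothetical and the internal layout of TenantManager is not documented here; the sketch only demonstrates the combination of sha256.Sum256 for the map key and subtle.ConstantTimeCompare for the final check.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"errors"
	"fmt"
)

// keyRecord keeps the stored hash next to the tenant ID so it can be
// re-verified after the map lookup.
type keyRecord struct {
	tenantID string
	hash     [sha256.Size]byte
}

// keyIndex is a hypothetical index from API-key hash to tenant.
type keyIndex struct {
	byHash map[[sha256.Size]byte]keyRecord
}

func newKeyIndex() *keyIndex {
	return &keyIndex{byHash: make(map[[sha256.Size]byte]keyRecord)}
}

// add stores only the hash of the key, never the key itself.
func (k *keyIndex) add(tenantID, apiKey string) {
	h := sha256.Sum256([]byte(apiKey))
	k.byHash[h] = keyRecord{tenantID: tenantID, hash: h}
}

// lookup hashes the presented key, finds the record in O(1), and verifies
// the hashes with a constant-time comparison.
func (k *keyIndex) lookup(apiKey string) (string, error) {
	h := sha256.Sum256([]byte(apiKey))
	rec, ok := k.byHash[h]
	if !ok || subtle.ConstantTimeCompare(rec.hash[:], h[:]) != 1 {
		return "", errors.New("unknown API key")
	}
	return rec.tenantID, nil
}

// demoLookup resolves one valid key and one invalid key.
func demoLookup() (string, bool) {
	idx := newKeyIndex()
	idx.add("tenant-a", "key-a")
	id, _ := idx.lookup("key-a")
	_, err := idx.lookup("wrong-key")
	return id, err != nil
}

func main() {
	fmt.Println(demoLookup()) // prints tenant-a true
}
```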
func (*TenantManager) List ¶
func (m *TenantManager) List() []TenantConfig
List returns a copyable snapshot of all tenant configurations.
type TenantManagerOption ¶ added in v1.18.0
type TenantManagerOption func(*TenantManager)
TenantManagerOption configures a TenantManager.
func WithTenantBackend ¶ added in v1.18.0
func WithTenantBackend(b TenantStoreBackend) TenantManagerOption
WithTenantBackend sets the persistence backend for a TenantManager.
type TenantStoreBackend ¶ added in v1.18.0
type TenantStoreBackend interface {
SaveTenant(id string, cfg TenantConfig) error
LoadTenant(id string) (*TenantConfig, bool)
DeleteTenant(id string) error
ListTenants() []TenantConfig
}
TenantStoreBackend abstracts persistence operations for tenant configurations.
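To show what satisfying this interface involves, here is a sketch of a map-backed implementation. TenantConfig and TenantStoreBackend are redeclared locally so the example compiles on its own; mapBackend is a hypothetical type, and sorting ListTenants by ID is an assumption made for deterministic output, not documented behavior.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// TenantConfig is a local copy of the documented struct.
type TenantConfig struct {
	ID                    string   `json:"id"`
	APIKey                string   `json:"api_key"`
	RateLimit             int64    `json:"rate_limit"`
	TokenBudget           int64    `json:"token_budget"`
	MaxConcurrentRequests int      `json:"max_concurrent_requests,omitempty"`
	ModelAllowList        []string `json:"model_allow_list,omitempty"`
}

// TenantStoreBackend is a local copy of the documented interface.
type TenantStoreBackend interface {
	SaveTenant(id string, cfg TenantConfig) error
	LoadTenant(id string) (*TenantConfig, bool)
	DeleteTenant(id string) error
	ListTenants() []TenantConfig
}

// mapBackend is a hypothetical in-memory backend.
type mapBackend struct {
	mu      sync.RWMutex
	tenants map[string]TenantConfig
}

// Compile-time check that mapBackend satisfies the interface.
var _ TenantStoreBackend = (*mapBackend)(nil)

func newMapBackend() *mapBackend {
	return &mapBackend{tenants: make(map[string]TenantConfig)}
}

func (b *mapBackend) SaveTenant(id string, cfg TenantConfig) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.tenants[id] = cfg
	return nil
}

// LoadTenant returns a pointer to a copy of the stored struct.
func (b *mapBackend) LoadTenant(id string) (*TenantConfig, bool) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	cfg, ok := b.tenants[id]
	if !ok {
		return nil, false
	}
	return &cfg, true
}

func (b *mapBackend) DeleteTenant(id string) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.tenants, id)
	return nil
}

// ListTenants returns all configurations sorted by ID (assumed ordering).
func (b *mapBackend) ListTenants() []TenantConfig {
	b.mu.RLock()
	defer b.mu.RUnlock()
	out := make([]TenantConfig, 0, len(b.tenants))
	for _, cfg := range b.tenants {
		out = append(out, cfg)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

// demoBackend saves two tenants, deletes one, and reports what remains.
func demoBackend() (int, bool) {
	b := newMapBackend()
	_ = b.SaveTenant("t2", TenantConfig{ID: "t2", RateLimit: 60})
	_ = b.SaveTenant("t1", TenantConfig{ID: "t1", RateLimit: 30})
	_ = b.DeleteTenant("t2")
	_, found := b.LoadTenant("t1")
	return len(b.ListTenants()), found
}

func main() {
	fmt.Println(demoBackend()) // prints 1 true
}
```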
type TokenMeter ¶
type TokenMeter struct {
// contains filtered or unexported fields
}
TokenMeter tracks input and output token usage per tenant and emits billing records to a BillingStore.
func NewTokenMeter ¶
func NewTokenMeter(store BillingStore) *TokenMeter
NewTokenMeter creates a TokenMeter backed by the given BillingStore.
func (*TokenMeter) Query ¶
func (m *TokenMeter) Query(tenantID string, from, to time.Time) ([]BillingRecord, error)
Query returns billing records for a tenant within the given time range.
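Records returned by a range query are typically aggregated before invoicing. The sketch below groups input and output token totals by UTC day; billingRecord copies the documented BillingRecord fields, while totalsByDay and the day-level grouping are hypothetical choices made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// billingRecord copies the documented BillingRecord fields.
type billingRecord struct {
	TenantID     string
	InputTokens  int
	OutputTokens int
	Timestamp    time.Time
}

// totalsByDay sums tokens per UTC day; index 0 is input, index 1 is output.
func totalsByDay(records []billingRecord) map[string][2]int {
	out := make(map[string][2]int)
	for _, r := range records {
		day := r.Timestamp.UTC().Format("2006-01-02")
		t := out[day]
		t[0] += r.InputTokens
		t[1] += r.OutputTokens
		out[day] = t
	}
	return out
}

// demoTotals aggregates three records spread over two days.
func demoTotals() map[string][2]int {
	d1 := time.Date(2024, 1, 1, 9, 0, 0, 0, time.UTC)
	d2 := time.Date(2024, 1, 2, 9, 0, 0, 0, time.UTC)
	return totalsByDay([]billingRecord{
		{TenantID: "tenant-a", InputTokens: 10, OutputTokens: 20, Timestamp: d1},
		{TenantID: "tenant-a", InputTokens: 5, OutputTokens: 5, Timestamp: d1.Add(time.Hour)},
		{TenantID: "tenant-a", InputTokens: 1, OutputTokens: 2, Timestamp: d2},
	})
}

func main() {
	fmt.Println(demoTotals())
}
```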
type UsageEvent ¶ added in v1.18.0
type UsageEvent struct {
TenantID string `json:"tenant_id"`
Model string `json:"model"`
PromptTokens int `json:"prompt_tokens"`
CompletionTokens int `json:"completion_tokens"`
Timestamp int64 `json:"timestamp"`
}
UsageEvent records token consumption for a single request.
type UsageRecorder ¶ added in v1.18.0
type UsageRecorder interface {
Record(event UsageEvent) error
}
UsageRecorder defines the interface for recording usage events. The default implementation writes NDJSON; a Kafka adapter can implement this interface for production deployments.
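Because the interface has a single method, adapters compose easily. As one hypothetical example, a fan-out recorder could forward each event to several sinks, such as a local log plus a message-queue adapter. The names usageRecorder, fanOut, memoryRecorder, and failingRecorder below are illustrative only, and returning the first error while still trying every sink is an assumed policy.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// usageEvent carries a subset of the documented UsageEvent fields.
type usageEvent struct {
	TenantID         string
	PromptTokens     int
	CompletionTokens int
}

// usageRecorder mirrors the documented single-method interface.
type usageRecorder interface {
	Record(event usageEvent) error
}

// memoryRecorder keeps events in a slice.
type memoryRecorder struct {
	mu     sync.Mutex
	events []usageEvent
}

func (m *memoryRecorder) Record(e usageEvent) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.events = append(m.events, e)
	return nil
}

// failingRecorder simulates an unavailable sink.
type failingRecorder struct{}

func (failingRecorder) Record(usageEvent) error {
	return errors.New("sink unavailable")
}

// fanOut forwards each event to every recorder and returns the first error,
// without letting one failing sink block the others.
type fanOut []usageRecorder

func (f fanOut) Record(e usageEvent) error {
	var first error
	for _, r := range f {
		if err := r.Record(e); err != nil && first == nil {
			first = err
		}
	}
	return first
}

// demoFanOut records one event through a failing sink and a working one.
func demoFanOut() (int, bool) {
	mem := &memoryRecorder{}
	rec := fanOut{failingRecorder{}, mem}
	err := rec.Record(usageEvent{TenantID: "tenant-a", PromptTokens: 3, CompletionTokens: 5})
	return len(mem.events), err != nil
}

func main() {
	fmt.Println(demoFanOut()) // prints 1 true
}
```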