cloud

package

v1.38.2 (latest) Published: Mar 31, 2026 License: Apache-2.0 Imports: 24 Imported by: 0

Repository

github.com/zerfoo/zerfoo

Documentation ¶

Overview ¶

Package cloud provides a multi-tenant managed inference service for Zerfoo.

It wraps the serve.Server with tenant isolation, token-based billing, rate limiting, and health checking for cloud deployments.

Stability: alpha

Index ¶

func BillingMiddleware(recorder UsageRecorder) func(http.Handler) http.Handler
type AuditAction
type AuditEntry
type AuditLogger
- func NewAuditLogger(store AuditStore) *AuditLogger
- func (a *AuditLogger) Log(entry AuditEntry) error
- func (a *AuditLogger) Query(tenantID string, from, to time.Time) ([]AuditEntry, error)
type AuditResult
type AuditStore
type BboltTenantStoreBackend
- func NewBboltTenantStoreBackend(path string) (*BboltTenantStoreBackend, error)
- func (b *BboltTenantStoreBackend) Close() error
- func (b *BboltTenantStoreBackend) DeleteTenant(id string) error
- func (b *BboltTenantStoreBackend) ListTenants() []TenantConfig
- func (b *BboltTenantStoreBackend) LoadTenant(id string) (*TenantConfig, bool)
- func (b *BboltTenantStoreBackend) SaveTenant(id string, cfg TenantConfig) error
type BillingRecord
type BillingStore
type CloudServer
- func NewCloudServer(handler http.Handler, tenants *TenantManager, meter *TokenMeter) *CloudServer
- func (cs *CloudServer) Handler() http.Handler
- func (cs *CloudServer) Meter() *TokenMeter
- func (cs *CloudServer) SetHealthy(healthy bool)
- func (cs *CloudServer) Tenants() *TenantManager
type MemoryAuditStore
- func NewMemoryAuditStore() *MemoryAuditStore
- func (s *MemoryAuditStore) All() []AuditEntry
- func (s *MemoryAuditStore) Append(entry AuditEntry) error
- func (s *MemoryAuditStore) Query(tenantID string, from, to time.Time) ([]AuditEntry, error)
type MemoryBillingStore
- func NewMemoryBillingStore() *MemoryBillingStore
- func (s *MemoryBillingStore) All() []BillingRecord
- func (s *MemoryBillingStore) Query(tenantID string, from, to time.Time) ([]BillingRecord, error)
- func (s *MemoryBillingStore) Store(record BillingRecord) error
type ModelInfo
type NDJSONRecorder
- func NewNDJSONRecorder(w io.Writer) *NDJSONRecorder
- func (r *NDJSONRecorder) Record(event UsageEvent) error
type ResourceManager
- func NewResourceManager(budgetBytes uint64) (*ResourceManager, error)
- func (rm *ResourceManager) Evict(modelID string) error
- func (rm *ResourceManager) Load(modelID string, vramBytes uint64) error
- func (rm *ResourceManager) LoadedModels() []ModelInfo
- func (rm *ResourceManager) SetEvictCallback(fn func(modelID string))
- func (rm *ResourceManager) Stats() (used, budget uint64, loaded int)
- func (rm *ResourceManager) Touch(modelID string) error
type SAMLMetadata
- func ParseSAMLMetadata(data []byte) (*SAMLMetadata, error)
type SAMLProvider
- func NewSAMLProvider(metadata *SAMLMetadata, tenantID string) *SAMLProvider
- func (p *SAMLProvider) EntityID() string
- func (p *SAMLProvider) ValidateAssertion(assertion []byte) (*SSOIdentity, error)
type SSOIdentity
type SSOProvider
type Tenant
- func (t *Tenant) AllowConcurrent() bool
- func (t *Tenant) AllowRequest() bool
- func (t *Tenant) Config() TenantConfig
- func (t *Tenant) ConsumeTokens(n int64) bool
- func (t *Tenant) DeductTokens(n int64)
- func (t *Tenant) ModelAllowed(model string) bool
- func (t *Tenant) RefundTokens(n int64)
- func (t *Tenant) ReleaseConcurrent()
type TenantConfig
type TenantManager
- func NewTenantManager(opts ...TenantManagerOption) *TenantManager
- func (m *TenantManager) Create(cfg TenantConfig) error
- func (m *TenantManager) Delete(id string) error
- func (m *TenantManager) Get(id string) (*Tenant, error)
- func (m *TenantManager) GetByAPIKey(apiKey string) (*Tenant, error)
- func (m *TenantManager) List() []TenantConfig
- func (m *TenantManager) Update(id string, rateLimit, tokenBudget int64) error
type TenantManagerOption
- func WithTenantBackend(b TenantStoreBackend) TenantManagerOption
type TenantStoreBackend
type TokenMeter
- func NewTokenMeter(store BillingStore) *TokenMeter
- func (m *TokenMeter) Query(tenantID string, from, to time.Time) ([]BillingRecord, error)
- func (m *TokenMeter) Record(tenantID string, inputTokens, outputTokens int) error
type UsageEvent
type UsageRecorder

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func BillingMiddleware ¶ added in v1.18.0

func BillingMiddleware(recorder UsageRecorder) func(http.Handler) http.Handler

BillingMiddleware returns an HTTP middleware that meters prompt and completion tokens per request and publishes usage events to the given recorder. It expects the tenant authentication middleware to run first so that tenantFromContext returns a valid tenant. The tenant ID is taken from the Authorization header's Bearer token value.

For streaming (SSE) responses, the JSON response body cannot be parsed as a single object. The middleware injects a generate.TokenUsage into the request context; the generation session writes prompt/completion counts there, which works for both streaming and non-streaming responses. JSON body parsing is used as a fallback for handlers that do not use context-based usage tracking.
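The context-injection approach described above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: tokenUsage stands in for generate.TokenUsage, and usageKey, meter, streamHandler, and the request path are hypothetical names chosen for the example.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// tokenUsage is a hypothetical stand-in for generate.TokenUsage.
type tokenUsage struct {
	Prompt, Completion int
}

// usageKey is an unexported context key type, so it cannot collide with other packages.
type usageKey struct{}

// meter is a hypothetical middleware: it injects a *tokenUsage into the
// request context, runs the handler, then reports whatever the handler wrote.
func meter(report func(tokenUsage)) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			u := &tokenUsage{}
			ctx := context.WithValue(r.Context(), usageKey{}, u)
			next.ServeHTTP(w, r.WithContext(ctx))
			report(*u)
		})
	}
}

// streamHandler simulates an SSE handler that records counts in the context
// instead of relying on a response body that parses as one JSON object.
func streamHandler(w http.ResponseWriter, r *http.Request) {
	if u, ok := r.Context().Value(usageKey{}).(*tokenUsage); ok {
		u.Prompt, u.Completion = 12, 34
	}
	w.Header().Set("Content-Type", "text/event-stream")
	fmt.Fprint(w, "data: hello\n\n")
}

// runOnce sends one request through the middleware and returns the usage it saw.
func runOnce() tokenUsage {
	var got tokenUsage
	h := meter(func(u tokenUsage) { got = u })(http.HandlerFunc(streamHandler))
	// The path below is an arbitrary example, not a documented route.
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodPost, "/v1/completions", nil))
	return got
}

func main() {
	fmt.Printf("%+v\n", runOnce())
}
```

Because the middleware reads the shared struct only after the handler returns, the same code path works whether the body was streamed or written in one piece.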

Types ¶

type AuditAction ¶

type AuditAction string

AuditAction identifies the type of API operation being logged.

const (
	AuditActionInference AuditAction = "inference"
	AuditActionCreate    AuditAction = "create"
	AuditActionUpdate    AuditAction = "update"
	AuditActionDelete    AuditAction = "delete"
	AuditActionList      AuditAction = "list"
	AuditActionAuth      AuditAction = "auth"
)

type AuditEntry ¶

type AuditEntry struct {
	Timestamp  time.Time   `json:"timestamp"`
	TenantID   string      `json:"tenant_id"`
	Action     AuditAction `json:"action"`
	Result     AuditResult `json:"result"`
	Resource   string      `json:"resource"`
	StatusCode int         `json:"status_code"`
	Method     string      `json:"method"`
	Path       string      `json:"path"`
	RemoteAddr string      `json:"remote_addr"`
}

AuditEntry records a single auditable event for SOC 2 compliance. Sensitive data (API keys, request bodies) is never stored.

type AuditLogger ¶

type AuditLogger struct {
	// contains filtered or unexported fields
}

AuditLogger records API requests for SOC 2 compliance. It deliberately omits sensitive fields (API keys, request/response bodies).

func NewAuditLogger ¶

func NewAuditLogger(store AuditStore) *AuditLogger

NewAuditLogger creates an AuditLogger backed by the given store.

func (*AuditLogger) Log ¶

func (a *AuditLogger) Log(entry AuditEntry) error

Log records an audit entry.

func (*AuditLogger) Query ¶

func (a *AuditLogger) Query(tenantID string, from, to time.Time) ([]AuditEntry, error)

Query returns audit entries for a tenant within the given time range.

type AuditResult ¶

type AuditResult string

AuditResult records the outcome of an API request.

const (
	AuditResultSuccess      AuditResult = "success"
	AuditResultDenied       AuditResult = "denied"
	AuditResultRateLimited  AuditResult = "rate_limited"
	AuditResultError        AuditResult = "error"
	AuditResultUnauthorized AuditResult = "unauthorized"
)

type AuditStore ¶

type AuditStore interface {
	// Append persists an audit entry.
	Append(entry AuditEntry) error

	// Query returns audit entries for a tenant within the given time range.
	Query(tenantID string, from, to time.Time) ([]AuditEntry, error)
}

AuditStore is the persistence interface for audit entries.
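A custom store only needs the two methods shown above. The sketch below mirrors that method set with local types so it runs on its own; auditEntry, sliceStore, and demo are hypothetical names, and the half-open [from, to) range is an assumption, since the documentation does not say whether the bounds are inclusive.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// auditEntry mirrors a subset of the documented AuditEntry fields.
type auditEntry struct {
	Timestamp time.Time
	TenantID  string
	Action    string
}

// sliceStore is a hypothetical store with the same method set as AuditStore.
type sliceStore struct {
	mu      sync.Mutex
	entries []auditEntry
}

// Append stores the entry under a lock so concurrent requests can log safely.
func (s *sliceStore) Append(e auditEntry) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries = append(s.entries, e)
	return nil
}

// Query treats the range as half-open [from, to). This is an assumption of
// the sketch; the package's own stores may handle the bounds differently.
func (s *sliceStore) Query(tenantID string, from, to time.Time) ([]auditEntry, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []auditEntry
	for _, e := range s.entries {
		if e.TenantID == tenantID && !e.Timestamp.Before(from) && e.Timestamp.Before(to) {
			out = append(out, e)
		}
	}
	return out, nil
}

// demo appends three entries and counts those for tenant "acme" in a one-hour window.
func demo() int {
	base := time.Date(2026, 1, 1, 12, 0, 0, 0, time.UTC)
	s := &sliceStore{}
	s.Append(auditEntry{base, "acme", "inference"})
	s.Append(auditEntry{base.Add(2 * time.Hour), "acme", "list"})
	s.Append(auditEntry{base, "other", "auth"})
	got, _ := s.Query("acme", base, base.Add(time.Hour))
	return len(got)
}

func main() {
	fmt.Println(demo())
}
```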

type BboltTenantStoreBackend ¶ added in v1.18.0

type BboltTenantStoreBackend struct {
	// contains filtered or unexported fields
}

BboltTenantStoreBackend is a persistent TenantStoreBackend backed by a bbolt database.

func NewBboltTenantStoreBackend ¶ added in v1.18.0

func NewBboltTenantStoreBackend(path string) (*BboltTenantStoreBackend, error)

NewBboltTenantStoreBackend opens or creates a bbolt database at path and returns a backend ready for use with NewTenantManager(WithTenantBackend(...)).

func (*BboltTenantStoreBackend) Close ¶ added in v1.18.0

func (b *BboltTenantStoreBackend) Close() error

Close closes the underlying bbolt database.

func (*BboltTenantStoreBackend) DeleteTenant ¶ added in v1.18.0

func (b *BboltTenantStoreBackend) DeleteTenant(id string) error

DeleteTenant removes a tenant by ID.

func (*BboltTenantStoreBackend) ListTenants ¶ added in v1.18.0

func (b *BboltTenantStoreBackend) ListTenants() []TenantConfig

ListTenants returns all stored tenant configurations.

func (*BboltTenantStoreBackend) LoadTenant ¶ added in v1.18.0

func (b *BboltTenantStoreBackend) LoadTenant(id string) (*TenantConfig, bool)

LoadTenant retrieves a tenant configuration by ID.

func (*BboltTenantStoreBackend) SaveTenant ¶ added in v1.18.0

func (b *BboltTenantStoreBackend) SaveTenant(id string, cfg TenantConfig) error

SaveTenant persists a tenant configuration as JSON keyed by its ID.

type BillingRecord ¶

type BillingRecord struct {
	TenantID     string    `json:"tenant_id"`
	InputTokens  int       `json:"input_tokens"`
	OutputTokens int       `json:"output_tokens"`
	Timestamp    time.Time `json:"timestamp"`
}

BillingRecord captures token usage for a single inference request.

type BillingStore ¶

type BillingStore interface {
	// Store persists a billing record.
	Store(record BillingRecord) error

	// Query returns all billing records for a tenant within the given time range.
	Query(tenantID string, from, to time.Time) ([]BillingRecord, error)
}

BillingStore is the persistence interface for billing records.

type CloudServer ¶

type CloudServer struct {
	// contains filtered or unexported fields
}

CloudServer wraps an HTTP handler with multi-tenant isolation, token billing, rate limiting, and health checking for cloud deployments.

func NewCloudServer ¶

func NewCloudServer(handler http.Handler, tenants *TenantManager, meter *TokenMeter) *CloudServer

NewCloudServer creates a CloudServer that routes authenticated requests to the given handler through tenant isolation middleware.

func (*CloudServer) Handler ¶

func (cs *CloudServer) Handler() http.Handler

Handler returns the root HTTP handler with all middleware applied.

func (*CloudServer) Meter ¶

func (cs *CloudServer) Meter() *TokenMeter

Meter returns the TokenMeter for external billing queries.

func (*CloudServer) SetHealthy ¶

func (cs *CloudServer) SetHealthy(healthy bool)

SetHealthy sets the health status of the cloud server.
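One common way to back a health flag like this is an atomic boolean read by a probe handler. The sketch below is an assumption about the general pattern, not a description of CloudServer: the health type, the /healthz path, and the 503 status for the unhealthy case are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// health is a hypothetical probe handler guarded by an atomic flag.
type health struct {
	ok atomic.Bool
}

// SetHealthy flips the flag; it is safe to call from any goroutine.
func (h *health) SetHealthy(v bool) { h.ok.Store(v) }

// ServeHTTP answers 200 when healthy and 503 otherwise (status codes assumed).
func (h *health) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if !h.ok.Load() {
		http.Error(w, "unhealthy", http.StatusServiceUnavailable)
		return
	}
	fmt.Fprintln(w, "ok")
}

// probe returns the status code a load balancer would see for the given state.
func probe(healthy bool) int {
	h := &health{}
	h.SetHealthy(healthy)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code
}

func main() {
	fmt.Println(probe(true), probe(false))
}
```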

func (*CloudServer) Tenants ¶

func (cs *CloudServer) Tenants() *TenantManager

Tenants returns the TenantManager for external CRUD operations.

type MemoryAuditStore ¶

type MemoryAuditStore struct {
	// contains filtered or unexported fields
}

MemoryAuditStore is an in-memory AuditStore for testing and development.

func NewMemoryAuditStore ¶

func NewMemoryAuditStore() *MemoryAuditStore

NewMemoryAuditStore creates a new in-memory audit store.

func (*MemoryAuditStore) All ¶

func (s *MemoryAuditStore) All() []AuditEntry

All returns a copy of all stored entries.

func (*MemoryAuditStore) Append ¶

func (s *MemoryAuditStore) Append(entry AuditEntry) error

Append appends an entry to the in-memory store.

func (*MemoryAuditStore) Query ¶

func (s *MemoryAuditStore) Query(tenantID string, from, to time.Time) ([]AuditEntry, error)

Query returns entries matching the tenant and time range.

type MemoryBillingStore ¶

type MemoryBillingStore struct {
	// contains filtered or unexported fields
}

MemoryBillingStore is an in-memory BillingStore for testing and development.

func NewMemoryBillingStore ¶

func NewMemoryBillingStore() *MemoryBillingStore

NewMemoryBillingStore creates a new in-memory billing store.

func (*MemoryBillingStore) All ¶

func (s *MemoryBillingStore) All() []BillingRecord

All returns a copy of all stored records.

func (*MemoryBillingStore) Query ¶

func (s *MemoryBillingStore) Query(tenantID string, from, to time.Time) ([]BillingRecord, error)

Query returns records matching the tenant and time range.

func (*MemoryBillingStore) Store ¶

func (s *MemoryBillingStore) Store(record BillingRecord) error

Store appends a record to the in-memory store.

type ModelInfo ¶ added in v1.18.0

type ModelInfo struct {
	ModelID   string
	VRAMBytes uint64
	LoadedAt  time.Time
	LastUsed  time.Time
}

ModelInfo describes a loaded model tracked by the ResourceManager.

type NDJSONRecorder ¶ added in v1.18.0

type NDJSONRecorder struct {
	// contains filtered or unexported fields
}

NDJSONRecorder writes usage events as newline-delimited JSON to an io.Writer.

func NewNDJSONRecorder ¶ added in v1.18.0

func NewNDJSONRecorder(w io.Writer) *NDJSONRecorder

NewNDJSONRecorder creates a recorder that writes NDJSON to w.

func (*NDJSONRecorder) Record ¶ added in v1.18.0

func (r *NDJSONRecorder) Record(event UsageEvent) error

Record serializes the event as a single JSON line followed by a newline.
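For illustration, the one-JSON-object-per-line format can be produced with encoding/json, whose Encoder.Encode appends a newline after each value. The sketch uses a local usageEvent that copies the documented UsageEvent field tags; lineRecorder and render are hypothetical names, and the mutex is an assumption about how concurrent writers might be serialized.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"sync"
)

// usageEvent copies the JSON tags of the documented UsageEvent type.
type usageEvent struct {
	TenantID         string `json:"tenant_id"`
	Model            string `json:"model"`
	PromptTokens     int    `json:"prompt_tokens"`
	CompletionTokens int    `json:"completion_tokens"`
	Timestamp        int64  `json:"timestamp"`
}

// lineRecorder is a hypothetical NDJSON writer.
type lineRecorder struct {
	mu  sync.Mutex
	enc *json.Encoder
}

func newLineRecorder(w io.Writer) *lineRecorder {
	return &lineRecorder{enc: json.NewEncoder(w)}
}

// Record writes one event as one line. The lock keeps concurrent writes from interleaving.
func (r *lineRecorder) Record(e usageEvent) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.enc.Encode(e) // Encode terminates each value with '\n'
}

// render records two events into a buffer and returns the raw output.
func render() string {
	var buf bytes.Buffer
	rec := newLineRecorder(&buf)
	rec.Record(usageEvent{TenantID: "acme", Model: "m", PromptTokens: 3, CompletionTokens: 5, Timestamp: 1})
	rec.Record(usageEvent{TenantID: "acme", Model: "m", PromptTokens: 1, CompletionTokens: 2, Timestamp: 2})
	return buf.String()
}

func main() {
	fmt.Print(render())
}
```

Any io.Writer works as the sink, so the same recorder can target a file, a pipe, or a network connection.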

type ResourceManager ¶ added in v1.18.0

type ResourceManager struct {
	// contains filtered or unexported fields
}

ResourceManager tracks loaded models and their VRAM usage, evicting least-recently-used models when a new load would exceed the memory budget.

func NewResourceManager ¶ added in v1.18.0

func NewResourceManager(budgetBytes uint64) (*ResourceManager, error)

NewResourceManager creates a ResourceManager with the given VRAM budget in bytes.

func (*ResourceManager) Evict ¶ added in v1.18.0

func (rm *ResourceManager) Evict(modelID string) error

Evict explicitly removes a model from the manager.

func (*ResourceManager) Load ¶ added in v1.18.0

func (rm *ResourceManager) Load(modelID string, vramBytes uint64) error

Load registers a model with the given VRAM footprint. If loading would exceed the budget, LRU models are evicted until there is enough space. Returns an error if the model alone exceeds the entire budget.

func (*ResourceManager) LoadedModels ¶ added in v1.18.0

func (rm *ResourceManager) LoadedModels() []ModelInfo

LoadedModels returns a snapshot of all currently loaded models.

func (*ResourceManager) SetEvictCallback ¶ added in v1.18.0

func (rm *ResourceManager) SetEvictCallback(fn func(modelID string))

SetEvictCallback sets an optional function called when a model is evicted.

func (*ResourceManager) Stats ¶ added in v1.18.0

func (rm *ResourceManager) Stats() (used, budget uint64, loaded int)

Stats returns the current memory usage statistics.

func (*ResourceManager) Touch ¶ added in v1.18.0

func (rm *ResourceManager) Touch(modelID string) error

Touch updates the last-used time for a model, moving it to the front of the LRU list. Call this on each inference request.
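The Load, Touch, and Evict behavior described above follows the classic LRU-under-a-budget pattern. The sketch below shows that pattern with container/list; it is not the ResourceManager source. lruBudget and its methods are hypothetical, the type is not safe for concurrent use, and treating a repeated load of the same ID as a touch is an assumption.

```go
package main

import (
	"container/list"
	"errors"
	"fmt"
)

// model is one tracked entry: an ID and its VRAM footprint in bytes.
type model struct {
	id   string
	vram uint64
}

// lruBudget is a hypothetical, non-thread-safe LRU tracker with a byte budget.
type lruBudget struct {
	budget, used uint64
	order        *list.List               // front = most recently used
	index        map[string]*list.Element // model ID -> list element
	evicted      []string                 // eviction log, kept for illustration
}

func newLRUBudget(budget uint64) *lruBudget {
	return &lruBudget{budget: budget, order: list.New(), index: map[string]*list.Element{}}
}

// load registers a model, evicting from the back of the list until it fits.
func (b *lruBudget) load(id string, vram uint64) error {
	if vram > b.budget {
		return errors.New("model exceeds the entire budget")
	}
	if el, ok := b.index[id]; ok { // assumption: reloading counts as a touch
		b.order.MoveToFront(el)
		return nil
	}
	// vram <= budget, so this loop ends at the latest when the list is empty.
	for b.used+vram > b.budget {
		m := b.order.Remove(b.order.Back()).(model)
		delete(b.index, m.id)
		b.used -= m.vram
		b.evicted = append(b.evicted, m.id)
	}
	b.index[id] = b.order.PushFront(model{id, vram})
	b.used += vram
	return nil
}

// touch marks a model as most recently used.
func (b *lruBudget) touch(id string) error {
	el, ok := b.index[id]
	if !ok {
		return errors.New("model not loaded")
	}
	b.order.MoveToFront(el)
	return nil
}

// demo loads three 4-byte models into a 10-byte budget, touching "a" first.
func demo() []string {
	b := newLRUBudget(10)
	b.load("a", 4)
	b.load("b", 4)
	b.touch("a")   // "b" is now the least recently used
	b.load("c", 4) // needs room, so "b" is evicted
	return b.evicted
}

func main() {
	fmt.Println(demo())
}
```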

type SAMLMetadata ¶

type SAMLMetadata struct {
	EntityID        string `json:"entity_id"`
	SignOnURL       string `json:"sign_on_url"`
	Certificate     string `json:"certificate"`
	NameIDFormat    string `json:"name_id_format,omitempty"`
	WantAuthnSigned bool   `json:"want_authn_signed"`
}

SAMLMetadata holds identity provider configuration parsed from SAML 2.0 metadata XML.

func ParseSAMLMetadata ¶

func ParseSAMLMetadata(data []byte) (*SAMLMetadata, error)

ParseSAMLMetadata parses SAML 2.0 IdP metadata XML into a SAMLMetadata struct.

type SAMLProvider ¶

type SAMLProvider struct {
	// contains filtered or unexported fields
}

SAMLProvider implements SSOProvider for SAML 2.0.

func NewSAMLProvider ¶

func NewSAMLProvider(metadata *SAMLMetadata, tenantID string) *SAMLProvider

NewSAMLProvider creates a SAML 2.0 SSO provider from parsed metadata, bound to a specific tenant.

func (*SAMLProvider) EntityID ¶

func (p *SAMLProvider) EntityID() string

EntityID returns the identity provider's entity ID.

func (*SAMLProvider) ValidateAssertion ¶

func (p *SAMLProvider) ValidateAssertion(assertion []byte) (*SSOIdentity, error)

ValidateAssertion parses and validates a SAML 2.0 assertion, including XXE protection, XML digital signature verification, NotBefore clock skew tolerance, and assertion replay prevention.

type SSOIdentity ¶

type SSOIdentity struct {
	Subject    string            `json:"subject"`
	TenantID   string            `json:"tenant_id"`
	Email      string            `json:"email,omitempty"`
	Attributes map[string]string `json:"attributes,omitempty"`
	ExpiresAt  time.Time         `json:"expires_at"`
}

SSOIdentity represents an authenticated user from an SSO provider.

type SSOProvider ¶

type SSOProvider interface {
	// EntityID returns the identity provider's entity ID.
	EntityID() string

	// ValidateAssertion validates an assertion and returns the authenticated identity.
	ValidateAssertion(assertion []byte) (*SSOIdentity, error)
}

SSOProvider defines the interface for SSO authentication. Implementations handle protocol-specific details (SAML 2.0, OIDC, etc.).

type Tenant ¶

type Tenant struct {
	ID string
	// contains filtered or unexported fields
}

Tenant represents a registered cloud tenant with runtime rate-limit state. It must always be accessed through a pointer and must not be copied.

func (*Tenant) AllowConcurrent ¶ added in v1.18.0

func (t *Tenant) AllowConcurrent() bool

AllowConcurrent checks whether the tenant can accept another concurrent request. If MaxConcurrentRequests is 0 (unset), concurrency is unlimited. Returns true and increments the in-flight counter if allowed.

func (*Tenant) AllowRequest ¶

func (t *Tenant) AllowRequest() bool

AllowRequest checks whether the tenant can make another request this minute. Returns true and increments the counter if allowed.
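A per-minute check-and-increment like this is often built as a fixed-window counter. The sketch below shows one such counter with an injected clock so it can be exercised deterministically. Whether the package uses a fixed window, a sliding window, or something else is not stated here, so minuteLimiter and its behavior at window boundaries are assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// minuteLimiter is a hypothetical fixed-window request counter.
type minuteLimiter struct {
	mu     sync.Mutex
	limit  int64
	count  int64
	window time.Time // start of the current one-minute window
}

// allow reports whether a request at time now fits the window, and counts it if so.
func (l *minuteLimiter) allow(now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if w := now.Truncate(time.Minute); !w.Equal(l.window) {
		l.window, l.count = w, 0 // a new minute started: reset the counter
	}
	if l.count >= l.limit {
		return false
	}
	l.count++
	return true
}

// demo issues four requests against a limit of two per minute.
func demo() [4]bool {
	t0 := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	l := &minuteLimiter{limit: 2}
	return [4]bool{
		l.allow(t0),                       // first request in the window
		l.allow(t0.Add(10 * time.Second)), // second request, still allowed
		l.allow(t0.Add(20 * time.Second)), // third request, over the limit
		l.allow(t0.Add(61 * time.Second)), // next window, counter was reset
	}
}

func main() {
	fmt.Println(demo())
}
```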

func (*Tenant) Config ¶

func (t *Tenant) Config() TenantConfig

Config returns a copyable snapshot of the tenant's configuration. The APIKey field is redacted to prevent accidental credential leakage.

func (*Tenant) ConsumeTokens ¶

func (t *Tenant) ConsumeTokens(n int64) bool

ConsumeTokens attempts to consume n tokens from the per-minute budget. Returns true if the tokens were consumed.

func (*Tenant) DeductTokens ¶ added in v1.16.0

func (t *Tenant) DeductTokens(n int64)

DeductTokens unconditionally adds n tokens to the consumed count without checking the budget. This is used to charge excess usage when actual token generation exceeds the pre-authorized estimate (e.g. max_tokens=1 but the model produced more tokens). Unlike ConsumeTokens, it never fails.

func (*Tenant) ModelAllowed ¶ added in v1.18.0

func (t *Tenant) ModelAllowed(model string) bool

ModelAllowed returns true if the model is in the tenant's allow list. An empty allow list permits all models.

func (*Tenant) RefundTokens ¶ added in v1.12.0

func (t *Tenant) RefundTokens(n int64)

RefundTokens returns n tokens to the per-minute budget, used to reconcile pre-authorized estimates with actual usage after inference completes.
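ConsumeTokens, DeductTokens, and RefundTokens together describe a reserve-then-settle flow: reserve an estimate up front, then correct it once the real count is known. The sketch below models that flow with an atomic counter. tokenBudget, settle, and demo are hypothetical, and the per-minute reset is left out to keep the example short.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tokenBudget is a hypothetical budget with a lock-free consumed counter.
type tokenBudget struct {
	limit    int64
	consumed atomic.Int64
}

// consume reserves n tokens only if they fit, using a compare-and-swap loop.
func (b *tokenBudget) consume(n int64) bool {
	for {
		cur := b.consumed.Load()
		if cur+n > b.limit {
			return false
		}
		if b.consumed.CompareAndSwap(cur, cur+n) {
			return true
		}
	}
}

// deduct charges n tokens unconditionally; it may push usage past the limit.
func (b *tokenBudget) deduct(n int64) { b.consumed.Add(n) }

// refund gives n tokens back.
func (b *tokenBudget) refund(n int64) { b.consumed.Add(-n) }

// settle reconciles a pre-authorized estimate with the actual token count.
func (b *tokenBudget) settle(estimate, actual int64) {
	switch {
	case actual < estimate:
		b.refund(estimate - actual)
	case actual > estimate:
		b.deduct(actual - estimate)
	}
}

// demo walks through two requests against a budget of 100 tokens.
func demo() (rejected bool, used int64) {
	b := &tokenBudget{limit: 100}
	b.consume(60)             // reserve an estimate: 60 used
	b.settle(60, 40)          // actual was lower: 40 used
	rejected = !b.consume(70) // 40+70 exceeds 100, so this is refused
	b.consume(50)             // 90 used
	b.settle(50, 80)          // actual was higher: 120 used, over budget
	return rejected, b.consumed.Load()
}

func main() {
	fmt.Println(demo())
}
```

The unconditional deduct is what lets usage end above the limit; later consume calls then fail until the budget recovers.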

func (*Tenant) ReleaseConcurrent ¶ added in v1.18.0

func (t *Tenant) ReleaseConcurrent()

ReleaseConcurrent decrements the in-flight counter after a request completes.
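AllowConcurrent and ReleaseConcurrent form an acquire/release pair, so callers typically release in a defer. The sketch below illustrates that pairing with a hypothetical inflight type; the zero-means-unlimited rule comes from the AllowConcurrent description, while the compare-and-swap loop and the run helper are assumptions of the example.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// inflight is a hypothetical in-flight request counter.
type inflight struct {
	max int64 // 0 means unlimited
	n   atomic.Int64
}

// acquire increments the counter if another concurrent request is allowed.
func (c *inflight) acquire() bool {
	if c.max == 0 {
		c.n.Add(1)
		return true
	}
	for {
		cur := c.n.Load()
		if cur >= c.max {
			return false
		}
		if c.n.CompareAndSwap(cur, cur+1) {
			return true
		}
	}
}

// release decrements the counter; call it exactly once per successful acquire.
func (c *inflight) release() { c.n.Add(-1) }

// run executes fn under the limit and reports whether it was admitted.
func (c *inflight) run(fn func()) bool {
	if !c.acquire() {
		return false
	}
	defer c.release() // released even if fn panics
	fn()
	return true
}

// demo tries a nested request while one is in flight, then another afterwards.
func demo() (nested, after bool) {
	c := &inflight{max: 1}
	c.run(func() { nested = c.run(func() {}) }) // inner call sees the limit reached
	after = c.run(func() {})                    // the slot is free again
	return
}

func main() {
	fmt.Println(demo())
}
```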

type TenantConfig ¶

type TenantConfig struct {
	ID                    string   `json:"id"`
	APIKey                string   `json:"api_key"`
	RateLimit             int64    `json:"rate_limit"`   // max requests per minute
	TokenBudget           int64    `json:"token_budget"` // max tokens per minute
	MaxConcurrentRequests int      `json:"max_concurrent_requests,omitempty"`
	ModelAllowList        []string `json:"model_allow_list,omitempty"`
}

TenantConfig is the input for creating or describing a tenant. It contains no atomic fields and is safe to copy.

type TenantManager ¶

type TenantManager struct {
	// contains filtered or unexported fields
}

TenantManager provides CRUD operations on tenants, keyed by both tenant ID and API key for O(1) lookups in either direction.

func NewTenantManager ¶

func NewTenantManager(opts ...TenantManagerOption) *TenantManager

NewTenantManager creates a new empty TenantManager. By default it uses an in-memory backend. Use WithTenantBackend to supply a persistent backend.

func (*TenantManager) Create ¶

func (m *TenantManager) Create(cfg TenantConfig) error

Create registers a new tenant. The tenant ID and API key must be unique.

func (*TenantManager) Delete ¶

func (m *TenantManager) Delete(id string) error

Delete removes a tenant by ID.

func (*TenantManager) Get ¶

func (m *TenantManager) Get(id string) (*Tenant, error)

Get retrieves a tenant by ID.

func (*TenantManager) GetByAPIKey ¶

func (m *TenantManager) GetByAPIKey(apiKey string) (*Tenant, error)

GetByAPIKey retrieves a tenant by API key. The input key is hashed with SHA-256 for O(1) map lookup, then verified with constant-time comparison on the hashes to prevent timing side-channel attacks.
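The lookup described above, a hash used as the map key and then a constant-time comparison of the hashes, can be sketched with crypto/sha256 and crypto/subtle. The keyIndex type, its fields, and the example key are hypothetical; only the overall technique comes from the description.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"errors"
	"fmt"
)

// entry keeps the stored hash next to the tenant ID so it can be re-verified.
type entry struct {
	hash     [sha256.Size]byte
	tenantID string
}

// keyIndex is a hypothetical API-key index keyed by SHA-256 digest.
type keyIndex struct {
	byHash map[[sha256.Size]byte]entry
}

func newKeyIndex() *keyIndex {
	return &keyIndex{byHash: map[[sha256.Size]byte]entry{}}
}

// add stores only the digest of the key, never the key itself.
func (k *keyIndex) add(apiKey, tenantID string) {
	h := sha256.Sum256([]byte(apiKey))
	k.byHash[h] = entry{hash: h, tenantID: tenantID}
}

// lookup hashes the candidate key, finds it in O(1), then compares the
// digests in constant time before returning the tenant ID.
func (k *keyIndex) lookup(apiKey string) (string, error) {
	h := sha256.Sum256([]byte(apiKey))
	e, ok := k.byHash[h]
	if !ok || subtle.ConstantTimeCompare(h[:], e.hash[:]) != 1 {
		return "", errors.New("unknown API key")
	}
	return e.tenantID, nil
}

func main() {
	idx := newKeyIndex()
	idx.add("sk-example", "acme") // example key, not a real credential
	fmt.Println(idx.lookup("sk-example"))
}
```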

func (*TenantManager) List ¶

func (m *TenantManager) List() []TenantConfig

List returns a copyable snapshot of all tenant configurations.

func (*TenantManager) Update ¶

func (m *TenantManager) Update(id string, rateLimit, tokenBudget int64) error

Update modifies a tenant's rate limit and token budget.

type TenantManagerOption ¶ added in v1.18.0

type TenantManagerOption func(*TenantManager)

TenantManagerOption configures a TenantManager.

func WithTenantBackend ¶ added in v1.18.0

func WithTenantBackend(b TenantStoreBackend) TenantManagerOption

WithTenantBackend sets the persistence backend for a TenantManager.
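TenantManagerOption follows Go's functional-options pattern: the constructor installs a default and each option overrides one field. The sketch below reproduces the pattern with local stand-ins so it runs without the package; manager, backend, memBackend, fileBackend, and withBackend are hypothetical names.

```go
package main

import "fmt"

// backend is a hypothetical, cut-down stand-in for TenantStoreBackend.
type backend interface {
	Name() string
}

type memBackend struct{}

func (memBackend) Name() string { return "memory" }

type fileBackend struct{ path string }

func (f fileBackend) Name() string { return "file:" + f.path }

// manager is a hypothetical stand-in for TenantManager.
type manager struct {
	backend backend
}

// option mutates a manager during construction.
type option func(*manager)

// withBackend overrides the default backend.
func withBackend(b backend) option {
	return func(m *manager) { m.backend = b }
}

// newManager starts from an in-memory default and applies options in order.
func newManager(opts ...option) *manager {
	m := &manager{backend: memBackend{}}
	for _, o := range opts {
		o(m)
	}
	return m
}

func main() {
	fmt.Println(newManager().backend.Name())
	fmt.Println(newManager(withBackend(fileBackend{"tenants.db"})).backend.Name())
}
```

With the real package, the equivalent call would pass the result of NewBboltTenantStoreBackend to WithTenantBackend, as the NewBboltTenantStoreBackend description notes.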

type TenantStoreBackend ¶ added in v1.18.0

type TenantStoreBackend interface {
	SaveTenant(id string, cfg TenantConfig) error
	LoadTenant(id string) (*TenantConfig, bool)
	DeleteTenant(id string) error
	ListTenants() []TenantConfig
}

TenantStoreBackend abstracts persistence operations for tenant configurations.

type TokenMeter ¶

type TokenMeter struct {
	// contains filtered or unexported fields
}

TokenMeter tracks input and output token usage per tenant and emits billing records to a BillingStore.

func NewTokenMeter ¶

func NewTokenMeter(store BillingStore) *TokenMeter

NewTokenMeter creates a TokenMeter backed by the given BillingStore.

func (*TokenMeter) Query ¶

func (m *TokenMeter) Query(tenantID string, from, to time.Time) ([]BillingRecord, error)

Query returns billing records for a tenant within the given time range.

func (*TokenMeter) Record ¶

func (m *TokenMeter) Record(tenantID string, inputTokens, outputTokens int) error

Record records token usage for a tenant and persists a BillingRecord.

type UsageEvent ¶ added in v1.18.0

type UsageEvent struct {
	TenantID         string `json:"tenant_id"`
	Model            string `json:"model"`
	PromptTokens     int    `json:"prompt_tokens"`
	CompletionTokens int    `json:"completion_tokens"`
	Timestamp        int64  `json:"timestamp"`
}

UsageEvent records token consumption for a single request.

type UsageRecorder ¶ added in v1.18.0

type UsageRecorder interface {
	Record(event UsageEvent) error
}

UsageRecorder defines the interface for recording usage events. The default implementation writes NDJSON; a Kafka adapter can implement this interface for production deployments.
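Because the interface has a single method, adapters compose easily. As a sketch, the code below fans one event out to several recorders, for example a local NDJSON file plus a message-queue adapter. All names here (usageEvent, usageRecorder, recorderFunc, tee) are local stand-ins, and no real Kafka client is involved.

```go
package main

import (
	"errors"
	"fmt"
)

// usageEvent mirrors a subset of the documented UsageEvent fields.
type usageEvent struct {
	TenantID         string
	PromptTokens     int
	CompletionTokens int
}

// usageRecorder has the same method shape as the documented interface.
type usageRecorder interface {
	Record(event usageEvent) error
}

// recorderFunc lets a plain function satisfy usageRecorder.
type recorderFunc func(usageEvent) error

func (f recorderFunc) Record(e usageEvent) error { return f(e) }

// tee forwards each event to every recorder and joins any errors, so one
// failing sink does not stop delivery to the others.
func tee(rs ...usageRecorder) usageRecorder {
	return recorderFunc(func(e usageEvent) error {
		var errs []error
		for _, r := range rs {
			if err := r.Record(e); err != nil {
				errs = append(errs, err)
			}
		}
		return errors.Join(errs...) // nil when no recorder failed
	})
}

// demo sends one event to a counting sink and a failing sink.
func demo() (delivered int, err error) {
	counter := recorderFunc(func(usageEvent) error { delivered++; return nil })
	broken := recorderFunc(func(usageEvent) error { return errors.New("sink unavailable") })
	err = tee(broken, counter).Record(usageEvent{TenantID: "acme", PromptTokens: 3, CompletionTokens: 5})
	return delivered, err
}

func main() {
	fmt.Println(demo())
}
```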
