Documentation ¶
Index ¶
- Variables
- func DetectServerType(ctx context.Context, baseURL string) string
- func FormatEndpointDisplay(endpoints []EndpointInfo) string
- func ValidateName(name string) error
- type Client
- type ContainerManager
- type Deployment
- type EndpointInfo
- type Gateway
- type GatewayConfig
- type ModelInfo
- type Provenance
- type Store
Constants ¶
This section is empty.
Variables ¶
var ErrDeploymentExists = errors.New("inference: deployment already exists")
ErrDeploymentExists is returned by Create when a deployment with the same name already exists and --force was not specified.
var ErrDeploymentNotFound = errors.New("inference: deployment not found")
ErrDeploymentNotFound is returned when a named inference deployment does not exist in the store.
Functions ¶
func DetectServerType ¶ added in v0.8.1
func DetectServerType(ctx context.Context, baseURL string) string
DetectServerType probes baseURL to determine the server software. It returns "ollama", "llama-server", "openai-compat", or "" if the server type cannot be determined.
func FormatEndpointDisplay ¶ added in v0.8.1
func FormatEndpointDisplay(endpoints []EndpointInfo) string
FormatEndpointDisplay pretty-prints a list of discovered endpoints.
func ValidateName ¶ added in v0.8.1
func ValidateName(name string) error
ValidateName checks that name is a safe deployment identifier. It rejects empty strings, path traversal attempts, and shell metacharacters.
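The exact validation rules are not spelled out here, but the documented contract can be sketched as follows. validateName is a hypothetical stand-in; the rejected character sets are assumptions, not the package's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateName is a hypothetical stand-in for ValidateName, assuming it
// rejects empty strings, path separators/traversal, and common shell
// metacharacters. The real function may differ in detail.
func validateName(name string) error {
	if name == "" {
		return errors.New("inference: name must not be empty")
	}
	if strings.Contains(name, "..") || strings.ContainsAny(name, `/\`) {
		return errors.New("inference: name must not contain path separators or traversal")
	}
	if strings.ContainsAny(name, "$`;|&<>\"' ") {
		return errors.New("inference: name must not contain shell metacharacters")
	}
	return nil
}

func main() {
	fmt.Println(validateName("llama3-prod") == nil) // true
	fmt.Println(validateName("../etc") == nil)      // false
}
```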
Types ¶
type Client ¶ added in v0.8.1
type Client struct {
// GatewayURL is the base URL of the inference gateway (e.g. "http://localhost:8402").
GatewayURL string
// HTTP is the underlying transport. Defaults to http.DefaultTransport.
HTTP http.RoundTripper
// contains filtered or unexported fields
}
Client is an http.RoundTripper that transparently encrypts request bodies to an Obol SE inference gateway and optionally decrypts encrypted responses.
The SE public key is fetched once and cached for the lifetime of the Client.
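The fetch-once-and-cache behaviour can be sketched with sync.Once. This is an illustrative pattern, not the package's actual internals; fetch stands in for the HTTP call to the gateway's pubkey endpoint.

```go
package main

import (
	"fmt"
	"sync"
)

// pubkeyCache sketches caching the SE public key for the lifetime of a
// client. fetch is a hypothetical stand-in for the gateway HTTP call.
type pubkeyCache struct {
	once   sync.Once
	fetch  func() ([]byte, error)
	pubkey []byte
	err    error
}

// Pubkey runs fetch at most once; every later call returns the cached result.
func (c *pubkeyCache) Pubkey() ([]byte, error) {
	c.once.Do(func() {
		c.pubkey, c.err = c.fetch()
	})
	return c.pubkey, c.err
}

func main() {
	calls := 0
	c := &pubkeyCache{fetch: func() ([]byte, error) {
		calls++
		return []byte{0x04, 0x01, 0x02}, nil // placeholder key bytes
	}}
	c.Pubkey()
	c.Pubkey()
	fmt.Println(calls) // 1: the fetch ran exactly once
}
```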
func NewClient ¶ added in v0.8.1
NewClient creates a Client targeting the given gateway URL and eagerly fetches the SE public key so the first request does not block on the fetch.
func (*Client) Do ¶ added in v0.8.1
Do sends req using the client's transport (with SE encryption applied). It is a convenience wrapper around RoundTrip that matches http.Client.Do's signature.
func (*Client) EnableEncryptedReplies ¶ added in v0.8.1
EnableEncryptedReplies generates an ephemeral local key. When set, the client attaches X-Obol-Reply-Pubkey to every encrypted request so the gateway encrypts the response back to this key, and Do() decrypts it transparently before returning.
On non-darwin builds this returns enclave.ErrNotSupported because the decryption half requires the SE; encryption (for the request) is always available.
func (*Client) Pubkey ¶ added in v0.8.1
Pubkey returns the cached SE public key bytes (65-byte uncompressed P-256). Returns nil if the key has not been fetched yet.
type ContainerManager ¶ added in v0.8.1
type ContainerManager struct {
// contains filtered or unexported fields
}
ContainerManager manages an Ollama Linux container using the apple/container CLI (github.com/apple/container v0.9.0+).
The container runs Ollama on its internal port 11434, mapped to a host-local port (default 11435) so only the gateway process can reach it. No bridge to the external network is provided — the container can receive inference requests from the gateway but cannot initiate outbound connections.
Install the CLI before use:
curl -L -o /tmp/container-installer-signed.pkg \
  https://github.com/apple/container/releases/download/0.9.0/container-installer-signed.pkg
sudo installer -pkg /tmp/container-installer-signed.pkg -target /
func (*ContainerManager) EnsureSystemRunning ¶ added in v0.8.1
func (m *ContainerManager) EnsureSystemRunning(ctx context.Context) error
EnsureSystemRunning starts the container system daemon if it is not already active. Safe to call when the daemon is already running.
func (*ContainerManager) Start ¶ added in v0.8.1
Start pulls the OCI image (if not cached) and runs the Ollama container, then blocks until the Ollama API responds or ctx is cancelled.
Any stale container with the same name is removed before starting.
func (*ContainerManager) Stop ¶ added in v0.8.1
func (m *ContainerManager) Stop(ctx context.Context) error
Stop gracefully stops and removes the named container. Returns nil if the container does not exist.
func (*ContainerManager) UpstreamURL ¶ added in v0.8.1
func (m *ContainerManager) UpstreamURL() string
UpstreamURL returns the URL where the running Ollama can be reached from the host.
type Deployment ¶ added in v0.8.1
type Deployment struct {
// Name is the human-readable identifier for this deployment.
// Used as the keychain tag suffix and directory name.
Name string `json:"name"`
// EnclaveTag is the macOS keychain application tag for the SE key.
// Derived from Name if not explicitly set:
// "com.obol.inference.<name>"
EnclaveTag string `json:"enclave_tag"`
// ListenAddr is the gateway listen address (default ":8402").
ListenAddr string `json:"listen_addr"`
// UpstreamURL is the inference backend URL (default "http://localhost:11434").
UpstreamURL string `json:"upstream_url"`
// WalletAddress is the USDC payment recipient.
WalletAddress string `json:"wallet_address"`
// PricePerRequest is the USDC price per inference call (default "0.001").
PricePerRequest string `json:"price_per_request"`
// PricePerMTok is the original per-million-token price when request pricing
// was derived from the temporary phase-1 approximation.
PricePerMTok string `json:"price_per_mtok,omitempty"`
// ApproxTokensPerRequest records the fixed approximation used to derive the
// charged request price from PricePerMTok.
ApproxTokensPerRequest int `json:"approx_tokens_per_request,omitempty"`
// Chain is the x402 payment chain name (e.g. "base", "base-sepolia").
Chain string `json:"chain"`
// FacilitatorURL is the x402 facilitator URL.
FacilitatorURL string `json:"facilitator_url"`
// VMMode enables running the upstream inference engine inside an Apple
// Containerization Linux micro-VM instead of pointing at an existing
// Ollama process. Requires the apple/container CLI to be installed.
// See: https://github.com/apple/container
VMMode bool `json:"vm_mode,omitempty"`
// VMImage is the OCI image to run (default "ollama/ollama:latest").
VMImage string `json:"vm_image,omitempty"`
// VMCPUs is the number of vCPUs to allocate to the VM (default 4).
VMCPUs int `json:"vm_cpus,omitempty"`
// VMMemoryMB is the RAM to allocate to the VM in MiB (default 8192).
VMMemoryMB int `json:"vm_memory_mb,omitempty"`
// VMHostPort is the host-local port mapped to Ollama's 11434 inside the
// container (default 11435). Must not conflict with other deployments.
VMHostPort int `json:"vm_host_port,omitempty"`
// TEEType is the Linux TEE backend ("tdx", "snp", "nitro", "stub").
// Empty means macOS Secure Enclave mode.
// Mutually exclusive with EnclaveTag-based SE mode on macOS.
TEEType string `json:"tee_type,omitempty"`
// ModelHash is the hex-encoded SHA-256 of the model being served.
// Required when TEEType is set. Bound into the TEE attestation user_data.
ModelHash string `json:"model_hash,omitempty"`
// NoPaymentGate disables the built-in x402 payment middleware when the
// gateway runs behind the cluster's x402 verifier to avoid double-gating.
NoPaymentGate bool `json:"no_payment_gate,omitempty"`
// Provenance holds optional metadata about how the model was produced
// (e.g. autoresearch experiment results). Stored alongside the deployment
// config and passed to the registration document when selling.
Provenance *Provenance `json:"provenance,omitempty"`
// CreatedAt is the RFC3339 timestamp of when this deployment was created.
CreatedAt string `json:"created_at"`
// UpdatedAt is the RFC3339 timestamp of the most recent update.
UpdatedAt string `json:"updated_at,omitempty"`
}
Deployment is a named, persisted inference gateway configuration: a long-lived entity with a stable identity (its SE public key) and configurable parameters.
type EndpointInfo ¶ added in v0.8.1
EndpointInfo describes a discovered local inference endpoint.
func ProbeEndpoint ¶ added in v0.8.1
func ProbeEndpoint(host string, port int) (*EndpointInfo, error)
ProbeEndpoint hits host:port/v1/models and returns discovered info.
func ProbeEndpointContext ¶ added in v0.8.1
ProbeEndpointContext is the context-aware version of ProbeEndpoint. It creates a shared HTTP client used for both server type detection and model fetching to avoid redundant connections.
func ScanLocalEndpoints ¶ added in v0.8.1
func ScanLocalEndpoints() ([]EndpointInfo, error)
ScanLocalEndpoints probes all common local ports and returns any that respond.
func ScanLocalEndpointsContext ¶ added in v0.8.1
func ScanLocalEndpointsContext(ctx context.Context) ([]EndpointInfo, error)
ScanLocalEndpointsContext probes common ports concurrently with context support. All ports are probed in parallel using goroutines; results are collected and returned in the same order as commonPorts.
func (EndpointInfo) BaseURL ¶ added in v0.8.1
func (e EndpointInfo) BaseURL() string
BaseURL returns the HTTP base URL for this endpoint.
type Gateway ¶
type Gateway struct {
// contains filtered or unexported fields
}
Gateway is an x402-enabled reverse proxy for LLM inference with optional Secure Enclave or TEE request encryption and optional container-isolated upstream.
func NewGateway ¶
func NewGateway(cfg GatewayConfig) (*Gateway, error)
NewGateway creates a new inference gateway with the given configuration.
type GatewayConfig ¶
type GatewayConfig struct {
// ListenAddr is the address to listen on (e.g., ":8402").
ListenAddr string
// UpstreamURL is the upstream inference service URL (e.g., "http://localhost:11434").
UpstreamURL string
// WalletAddress is the USDC recipient address for payments.
WalletAddress string
// PricePerRequest is the USDC amount charged per inference request (e.g., "0.001").
PricePerRequest string
// Chain is the x402 chain configuration (e.g., x402pkg.ChainBaseMainnet).
Chain x402pkg.ChainInfo
// FacilitatorURL is the x402 facilitator service URL.
FacilitatorURL string
// VerifyOnly skips blockchain settlement after successful verification.
// Useful for testing and staging environments where no real funds are involved.
VerifyOnly bool
// EnclaveTag is the macOS Secure Enclave keychain application tag used for
// request decryption. When non-empty the gateway enables two additional
// behaviours:
//
// 1. GET /v1/enclave/pubkey — returns the SE public key as JSON so that
// clients can encrypt their request bodies.
//
// 2. Inference endpoints accept Content-Type: application/x-obol-encrypted
// bodies. The gateway decrypts them via the SE private key before
// forwarding to the upstream service. If the request also contains a
// X-Obol-Reply-Pubkey header, the response is re-encrypted to the
// client's ephemeral key (end-to-end confidentiality).
//
// When empty, all enclave functionality is disabled and the gateway
// operates in plain x402-only mode.
EnclaveTag string
// VMMode enables running the upstream inference engine inside an Apple
// Containerization Linux micro-VM via the apple/container CLI.
// When true, the gateway starts the container on Start() and stops it on
// Stop(), overriding UpstreamURL with the container's mapped local port.
VMMode bool
// VMImage is the OCI image to run (default "ollama/ollama:latest").
VMImage string
// VMCPUs is the number of vCPUs to allocate (default 4).
VMCPUs int
// VMMemoryMB is the RAM to allocate in MiB (default 8192).
VMMemoryMB int
// VMHostPort is the host-local port mapped from the container's Ollama
// port 11434 (default 11435).
VMHostPort int
// VMBinary is the path to the container CLI binary.
// Defaults to "container" (PATH lookup).
VMBinary string
// TEEType specifies the Linux TEE backend. When non-empty, the gateway
// uses internal/tee instead of internal/enclave for key management.
// Valid values: "tdx", "snp", "nitro", "stub".
// Mutually exclusive with EnclaveTag.
TEEType string
// ModelHash is the hex-encoded SHA-256 of the model being served.
// Required when TEEType is set. Bound into the TEE attestation user_data
// so verifiers can confirm the model identity.
ModelHash string
// NoPaymentGate disables the built-in x402 payment middleware. Use this
// when the gateway runs behind the cluster's x402 verifier (via Traefik
// ForwardAuth) to avoid double-gating requests. Enclave/TEE encryption
// middleware remains active when enabled.
NoPaymentGate bool
}
GatewayConfig holds configuration for the x402 inference gateway.
type ModelInfo ¶ added in v0.8.1
type ModelInfo struct {
ID string `json:"id"`
OwnedBy string `json:"owned_by"`
Created int64 `json:"created"`
}
ModelInfo describes a single model exposed by an inference server.
func ParseModelsResponse ¶ added in v0.8.1
ParseModelsResponse parses raw JSON bytes into a slice of ModelInfo. Exported for testing.
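Since the endpoints probed are /v1/models, the input is presumably the OpenAI-style {"data":[...]} envelope. A sketch under that assumption, with a local mirror of ModelInfo:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelInfo mirrors the ModelInfo struct documented above.
type modelInfo struct {
	ID      string `json:"id"`
	OwnedBy string `json:"owned_by"`
	Created int64  `json:"created"`
}

// parseModels is a hypothetical stand-in for ParseModelsResponse,
// assuming the OpenAI-compatible /v1/models response envelope.
func parseModels(raw []byte) ([]modelInfo, error) {
	var envelope struct {
		Data []modelInfo `json:"data"`
	}
	if err := json.Unmarshal(raw, &envelope); err != nil {
		return nil, err
	}
	return envelope.Data, nil
}

func main() {
	raw := []byte(`{"data":[{"id":"llama3:8b","owned_by":"library","created":1717000000}]}`)
	models, err := parseModels(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(models[0].ID) // llama3:8b
}
```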
type Provenance ¶ added in v0.8.1
type Provenance struct {
Framework string `json:"framework,omitempty"` // e.g. "autoresearch"
MetricName string `json:"metricName,omitempty"` // e.g. "val_bpb"
MetricValue string `json:"metricValue,omitempty"` // e.g. "0.9973"
ExperimentID string `json:"experimentId,omitempty"` // commit hash or UUID
TrainHash string `json:"trainHash,omitempty"` // e.g. "sha256:..."
ParamCount string `json:"paramCount,omitempty"` // e.g. "50000000"
}
Provenance tracks how a model or service was produced. JSON field names use camelCase so the same document can flow through publish.py -> --provenance-file -> ServiceOffer -> agent-registration.json.
type Store ¶ added in v0.8.1
type Store struct {
// contains filtered or unexported fields
}
Store manages named inference deployment configurations on disk. Layout: <configDir>/inference/<name>/config.json
func (*Store) Create ¶ added in v0.8.1
func (s *Store) Create(d *Deployment, force bool) error
Create persists a new Deployment. Returns ErrDeploymentExists if a deployment with that name is already stored and force is false.
func (*Store) Delete ¶ added in v0.8.1
Delete removes a deployment's config directory from disk. The SE key in the keychain is NOT deleted by this method — call enclave.DeleteKey(d.EnclaveTag) separately if desired.
func (*Store) Get ¶ added in v0.8.1
func (s *Store) Get(name string) (*Deployment, error)
Get loads a Deployment by name. Returns ErrDeploymentNotFound if missing.
func (*Store) List ¶ added in v0.8.1
func (s *Store) List() ([]*Deployment, error)
List returns all deployments, sorted alphabetically by name.
func (*Store) Update ¶ added in v0.8.1
func (s *Store) Update(d *Deployment) error
Update persists changes to an existing Deployment.