Documentation
Overview ¶
Package computer defines the Computer interface for screen-based environments. This package contains only the interface and types, with no implementations. Implementations live in github.com/apteva/computer (browserbase, service, etc.).
Index ¶
- func AnthropicBetaHeader(toolVersion string) string
- func HandleComputerAction(comp Computer, args map[string]string) (text string, screenshot []byte, err error)
- func HandleGeminiComputerAction(comp Computer, name string, args map[string]string) (text string, screenshot []byte, err error)
- func HandleSessionAction(comp Computer, args map[string]string) (text string, screenshot []byte, err error)
- func IsGeminiComputerAction(name string) bool
- func NormalizeActionType(action string) string
- type Action
- type AnthropicToolSpec
- type Computer
- type Context
- type ContextInfo
- type DisplaySize
- type OpenOptions
- type Resumable
- type SessionInfo
- type SessionOpener
- type Timeoutable
- type ToolDefinition
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func AnthropicBetaHeader ¶
func AnthropicBetaHeader(toolVersion string) string
AnthropicBetaHeader returns the appropriate beta header for computer use, given the tool version.
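As an illustration of the kind of mapping involved, the sketch below pairs two Anthropic computer-use tool versions with their published beta headers. The function name betaHeaderFor and the fallback for unknown versions are assumptions made for this example; the package's actual behavior is not documented here.

```go
package main

import "fmt"

// betaHeaderFor is a hypothetical stand-in for AnthropicBetaHeader.
// The two version/header pairs follow Anthropic's published values;
// falling back to the newer header is an assumption of this sketch.
func betaHeaderFor(toolVersion string) string {
	switch toolVersion {
	case "computer_20241022":
		return "computer-use-2024-10-22"
	case "computer_20250124":
		return "computer-use-2025-01-24"
	default:
		return "computer-use-2025-01-24"
	}
}

func main() {
	fmt.Println(betaHeaderFor("computer_20241022"))
	fmt.Println(betaHeaderFor("computer_20250124"))
}
```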
func HandleComputerAction ¶
func HandleComputerAction(comp Computer, args map[string]string) (text string, screenshot []byte, err error)
HandleComputerAction executes a screen interaction action; it does not handle navigate. It normalizes provider-specific action names (e.g. Claude's left_click → click) and retries the screenshot if it fails after a click, since the page may be mid-navigation.
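The retry behavior described above could look roughly like the following. This is a minimal sketch: screenshotWithRetry, the flaky test double, the attempt count, and the delay are all hypothetical, and the real retry policy may differ.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// screenshotter is the subset of the Computer interface this sketch needs.
type screenshotter interface {
	Screenshot() ([]byte, error)
}

// screenshotWithRetry retries a failed capture up to attempts times,
// sleeping for delay between tries. Both values are illustrative.
func screenshotWithRetry(s screenshotter, attempts int, delay time.Duration) ([]byte, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		img, err := s.Screenshot()
		if err == nil {
			return img, nil
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return nil, fmt.Errorf("screenshot failed after %d attempts: %w", attempts, lastErr)
}

// flaky fails a fixed number of times before succeeding, which
// simulates a page that is still navigating right after a click.
type flaky struct{ failures int }

func (f *flaky) Screenshot() ([]byte, error) {
	if f.failures > 0 {
		f.failures--
		return nil, errors.New("page is navigating")
	}
	return []byte("png-bytes"), nil
}

func main() {
	img, err := screenshotWithRetry(&flaky{failures: 1}, 3, 10*time.Millisecond)
	fmt.Println(string(img), err)
}
```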
func HandleGeminiComputerAction ¶
func HandleGeminiComputerAction(comp Computer, name string, args map[string]string) (text string, screenshot []byte, err error)
HandleGeminiComputerAction translates a Gemini Computer Use action to our Computer interface. Gemini uses normalized 0-999 coordinates; we denormalize to actual pixels.
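To make the coordinate conversion concrete, here is a small sketch of denormalizing a value on the 0-999 grid to pixels. The helper name denormalize, the clamping, and the integer scaling by 1000 are assumptions; the package may round differently.

```go
package main

import "fmt"

// denormalize maps a coordinate on Gemini's 0-999 grid to a pixel
// offset along an axis of the given size. Out-of-range inputs are
// clamped; scaling by 1000 with integer division is an assumption.
func denormalize(v, size int) int {
	if v < 0 {
		v = 0
	}
	if v > 999 {
		v = 999
	}
	return v * size / 1000
}

func main() {
	// On a 1280x800 display, the grid point (500, 999) maps to (640, 799).
	fmt.Println(denormalize(500, 1280), denormalize(999, 800))
}
```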
func HandleSessionAction ¶
func HandleSessionAction(comp Computer, args map[string]string) (text string, screenshot []byte, err error)
HandleSessionAction manages browser session lifecycle.
func IsGeminiComputerAction ¶
func IsGeminiComputerAction(name string) bool
IsGeminiComputerAction returns true if the function name is a Gemini Computer Use predefined action.
func NormalizeActionType ¶
func NormalizeActionType(action string) string
NormalizeActionType maps provider-specific action names to our standard names. Claude sends left_click, right_click, etc.; these are normalized to what Computer.Execute understands.
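A lookup table with pass-through is one plausible shape for this normalization. In the sketch below, normalize and aliases are hypothetical names, and only the left_click entry is taken from the documentation; the real table presumably holds more entries.

```go
package main

import "fmt"

// aliases maps provider-specific names to standard ones. Only the
// left_click entry is documented; a real table would hold more.
var aliases = map[string]string{
	"left_click": "click",
}

// normalize is a hypothetical stand-in for NormalizeActionType.
// Names without an alias are returned unchanged.
func normalize(action string) string {
	if std, ok := aliases[action]; ok {
		return std
	}
	return action
}

func main() {
	fmt.Println(normalize("left_click"), normalize("scroll"))
}
```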
Types ¶
type Action ¶
type Action struct {
	Type      string `json:"type"`                // "click", "double_click", "type", "key", "scroll", "screenshot", "navigate", "wait"
	X         int    `json:"x,omitempty"`         // click/scroll coordinate
	Y         int    `json:"y,omitempty"`         // click/scroll coordinate
	Text      string `json:"text,omitempty"`      // for "type" action
	Key       string `json:"key,omitempty"`       // for "key" action (e.g. "Enter", "Escape")
	Direction string `json:"direction,omitempty"` // for "scroll": "up", "down", "left", "right"
	Amount    int    `json:"amount,omitempty"`    // scroll amount
	URL       string `json:"url,omitempty"`       // for "navigate"
	Duration  int    `json:"duration,omitempty"`  // for "wait" (milliseconds)

	// Label: Set-of-Mark target. When non-zero, click/double_click
	// resolve the target via the label→bbox map populated by the
	// most recent screenshot. Takes precedence over X/Y when set.
	// Implementations that don't support SoM fall back to X/Y.
	Label int `json:"label,omitempty"`
}
Action represents a normalized computer use action.
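The JSON tags determine how an Action looks on the wire. The sketch below copies the struct locally so it runs on its own; encode is a hypothetical helper, not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Action mirrors the documented struct, copied here so the example
// is self-contained.
type Action struct {
	Type      string `json:"type"`
	X         int    `json:"x,omitempty"`
	Y         int    `json:"y,omitempty"`
	Text      string `json:"text,omitempty"`
	Key       string `json:"key,omitempty"`
	Direction string `json:"direction,omitempty"`
	Amount    int    `json:"amount,omitempty"`
	URL       string `json:"url,omitempty"`
	Duration  int    `json:"duration,omitempty"`
	Label     int    `json:"label,omitempty"`
}

// encode returns the JSON form of an action, or "" if marshaling fails.
func encode(a Action) string {
	out, err := json.Marshal(a)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	fmt.Println(encode(Action{Type: "click", X: 100, Y: 200}))
	// prints {"type":"click","x":100,"y":200}
	fmt.Println(encode(Action{Type: "scroll", X: 640, Y: 360, Direction: "down", Amount: 3}))
	// prints {"type":"scroll","x":640,"y":360,"direction":"down","amount":3}
	fmt.Println(encode(Action{Type: "click", Label: 7}))
	// prints {"type":"click","label":7}
}
```

Because every field except Type uses omitempty, zero values disappear from the output: a click at (0, 0) encodes as {"type":"click"} with no coordinates.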
type AnthropicToolSpec ¶
type AnthropicToolSpec struct {
	Type            string `json:"type"`
	Name            string `json:"name"`
	DisplayWidthPx  int    `json:"display_width_px"`
	DisplayHeightPx int    `json:"display_height_px"`
}
AnthropicToolSpec is the native Claude computer use tool format.
func GetAnthropicToolSpec ¶
func GetAnthropicToolSpec(display DisplaySize, toolVersion string) AnthropicToolSpec
GetAnthropicToolSpec returns the native Anthropic computer use tool spec.
type Computer ¶
type Computer interface {
	// Execute performs an action and returns a screenshot.
	Execute(action Action) (screenshot []byte, err error)
	// Screenshot takes a screenshot without performing any action.
	Screenshot() ([]byte, error)
	// DisplaySize returns the screen dimensions.
	DisplaySize() DisplaySize
	// Close terminates the session and releases resources.
	Close() error
}
Computer is the interface for screen-based environments.
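An in-memory test double is a convenient way to see what the interface requires. The sketch below is not one of the real implementations: fakeComputer is hypothetical, Action is trimmed to three fields, and the Width/Height fields on DisplaySize are an assumption, since the struct's fields are not shown in this documentation.

```go
package main

import (
	"errors"
	"fmt"
)

// Action is trimmed to the fields this sketch uses.
type Action struct {
	Type string
	X, Y int
}

// DisplaySize field names are an assumption made for this example.
type DisplaySize struct {
	Width, Height int
}

// Computer mirrors the documented interface.
type Computer interface {
	Execute(action Action) (screenshot []byte, err error)
	Screenshot() ([]byte, error)
	DisplaySize() DisplaySize
	Close() error
}

// fakeComputer records executed action types and returns a fixed
// placeholder instead of real image bytes.
type fakeComputer struct {
	size   DisplaySize
	log    []string
	closed bool
}

func (f *fakeComputer) Execute(a Action) ([]byte, error) {
	if f.closed {
		return nil, errors.New("computer is closed")
	}
	f.log = append(f.log, a.Type)
	return f.Screenshot()
}

func (f *fakeComputer) Screenshot() ([]byte, error) {
	if f.closed {
		return nil, errors.New("computer is closed")
	}
	return []byte("fake-png"), nil
}

func (f *fakeComputer) DisplaySize() DisplaySize { return f.size }

func (f *fakeComputer) Close() error {
	f.closed = true
	return nil
}

// Compile-time check that fakeComputer satisfies Computer.
var _ Computer = (*fakeComputer)(nil)

func main() {
	var c Computer = &fakeComputer{size: DisplaySize{Width: 1280, Height: 800}}
	img, err := c.Execute(Action{Type: "click", X: 10, Y: 20})
	fmt.Println(string(img), err, c.DisplaySize())
	_ = c.Close()
}
```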
type Context ¶
type Context struct {
	// ID is the provider-issued identifier returned at context-create time.
	ID string `json:"id"`
	// Persist controls whether changes (new cookies, storage writes) are
	// saved back to the context at session close. Default true mirrors
	// Browserbase's default; set false for one-shot read-only attaches.
	Persist bool `json:"persist"`
}
Context binds a session to a persistent state bundle (cookies, localStorage, IndexedDB, ServiceWorkers, Cache) that survives across sessions. Per-provider mapping:
- Browserbase → browserSettings.context = {id, persist}
- Browser Engine → context = {id, persist}
- Steel → profileId / persistProfile (Steel calls these "profiles" but the lifecycle and intent match)
IDs are provider-scoped: a Browserbase context id will not resolve on Steel, and vice versa. Concurrent attaches to the same context on the same provider are unsafe (Chrome can't share a user-data-dir); each backend either serializes the attaches or rejects the later one with an HTTP 409. Local / service backends ignore Context.
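For the Browserbase mapping listed above, the Context struct can be nested directly under browserSettings.context. This is a sketch only: browserbaseBody and browserbasePayload are hypothetical names, and a real session-create request would carry additional fields not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Context mirrors the documented struct.
type Context struct {
	ID      string `json:"id"`
	Persist bool   `json:"persist"`
}

// browserbaseBody is a hypothetical request fragment that covers only
// the documented browserSettings.context = {id, persist} mapping.
type browserbaseBody struct {
	BrowserSettings struct {
		Context Context `json:"context"`
	} `json:"browserSettings"`
}

// browserbasePayload renders the fragment as JSON, or "" on failure.
func browserbasePayload(c Context) string {
	var body browserbaseBody
	body.BrowserSettings.Context = c
	out, err := json.Marshal(body)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	fmt.Println(browserbasePayload(Context{ID: "ctx_123", Persist: true}))
	// prints {"browserSettings":{"context":{"id":"ctx_123","persist":true}}}
}
```

Note that a Go bool is false when unset, so a caller constructing Context by hand needs to set Persist explicitly to get the documented default of true.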
type ContextInfo ¶
type ContextInfo interface {
	ContextID() string
}
ContextInfo is an optional interface for computers attached to a persistent context. status surfaces the bound context id so the agent can confirm which identity it's running as. Implementations that do not support contexts (or aren't currently bound) should not implement this interface — the type assertion in status will simply skip it.
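The optional-interface pattern described here relies on a type assertion that quietly fails for types lacking the method. In the sketch below, boundComputer, plainComputer, and describeContext are hypothetical names used only to show the mechanics.

```go
package main

import "fmt"

// ContextInfo mirrors the documented optional interface.
type ContextInfo interface {
	ContextID() string
}

// boundComputer is a hypothetical computer attached to a context.
type boundComputer struct{ id string }

func (b boundComputer) ContextID() string { return b.id }

// plainComputer is a hypothetical computer with no context support,
// so it deliberately does not implement ContextInfo.
type plainComputer struct{}

// describeContext reports the bound context id when the value
// implements ContextInfo, and returns "" otherwise.
func describeContext(c any) string {
	if ci, ok := c.(ContextInfo); ok {
		return "context=" + ci.ContextID()
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", describeContext(boundComputer{id: "ctx_123"}))
	fmt.Printf("%q\n", describeContext(plainComputer{}))
}
```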
type DisplaySize ¶
DisplaySize holds screen dimensions.
type OpenOptions ¶
type OpenOptions struct {
	// URL to navigate to after the session is established. Optional;
	// when empty the session is opened but no navigation is issued
	// (useful when resuming a session that's already on a page).
	URL string
	// ContextID binds the new session to a persistent context. Mutually
	// exclusive with SessionID. Provider-scoped; see Context.
	ContextID string
	// Persist controls whether changes are saved back to the context
	// at session close. Defaults to true (matches Browserbase default).
	Persist bool
	// SessionID, when set, attaches to an existing session instead of
	// creating a new one. Mutually exclusive with ContextID. Provider
	// requirements vary: Browserbase needs the session to have been
	// created with KeepAlive=true; Browser Engine accepts both live
	// and snapshot-saved sessions; Steel and local backends reject it.
	SessionID string
	// Timeout sets the new session's max lifetime in seconds. Ignored
	// for SessionID attaches (the timeout was set at original create).
	// Zero leaves the provider's server-side default in place.
	Timeout int
	// Proxy, when non-nil, decides whether the new session routes
	// egress through the backend's managed residential proxy. nil
	// leaves the harness/backend default; &true forces on; &false
	// forces off. Honored by browser-engine, browserbase, steel;
	// ignored by local. Set by the agent via the browser_session
	// open tool; the agent owns the policy decision.
	Proxy *bool
	// ProxyCountry is an ISO-2 country code for the residential
	// proxy exit (e.g. "US"). Honored by browser-engine; ignored by
	// browserbase and steel (they need a custom proxy list for that).
	ProxyCountry string
}
OpenOptions describes a session-open intent: which URL to land on, which persistent context (if any) to bind, and whether to attach to an existing session id instead of creating a new one. The agent owns these decisions: they're tool-call arguments, not factory config.
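Two details of OpenOptions are easy to get wrong: ContextID and SessionID are mutually exclusive, and Proxy is a three-state value. The sketch below illustrates both; validate, proxyMode, and boolPtr are hypothetical helpers, and the package may enforce these rules elsewhere or differently.

```go
package main

import (
	"errors"
	"fmt"
)

// OpenOptions mirrors the documented struct, without its comments.
type OpenOptions struct {
	URL          string
	ContextID    string
	Persist      bool
	SessionID    string
	Timeout      int
	Proxy        *bool
	ProxyCountry string
}

// validate rejects option sets that name both a context and a session.
func validate(o OpenOptions) error {
	if o.ContextID != "" && o.SessionID != "" {
		return errors.New("ContextID and SessionID are mutually exclusive")
	}
	return nil
}

// proxyMode spells out the three states of the Proxy pointer.
func proxyMode(o OpenOptions) string {
	switch {
	case o.Proxy == nil:
		return "default"
	case *o.Proxy:
		return "on"
	default:
		return "off"
	}
}

// boolPtr returns a pointer to b, since Go cannot take the address
// of a literal such as true.
func boolPtr(b bool) *bool { return &b }

func main() {
	opts := OpenOptions{URL: "https://example.com", ContextID: "ctx_123", Persist: true, Proxy: boolPtr(false)}
	fmt.Println(validate(opts), proxyMode(opts))

	bad := OpenOptions{ContextID: "ctx_123", SessionID: "sess_456"}
	fmt.Println(validate(bad))
}
```

A pointer is used for Proxy because a plain bool cannot distinguish "not specified" from "explicitly off".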
type SessionInfo ¶
type SessionInfo interface {
	SessionType() string // "local", "browserbase", "service"
	SessionID() string   // empty for local
	CurrentURL() string  // current page URL
}
SessionInfo is an optional interface for computers that can report session details.
type SessionOpener ¶
type SessionOpener interface {
	OpenSession(opts OpenOptions) error
}
SessionOpener is implemented by Computers that own session lifecycle. One method covers create-with-context, attach-by-id, and re-bind to a different context, all by varying OpenOptions. Implementations MUST tear down the current session (if different) before establishing the new one. Local / service backends implement this as a thin wrapper around navigation.
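The teardown requirement can be illustrated with an in-memory implementation. Everything in this sketch is hypothetical (sessionManager, the generated session ids, the tornDown log); it models only the ordering rule, not a real provider.

```go
package main

import (
	"errors"
	"fmt"
)

// OpenOptions is trimmed to the fields this sketch uses.
type OpenOptions struct {
	ContextID string
	SessionID string
}

// sessionManager is a hypothetical in-memory SessionOpener that
// records which sessions it tore down.
type sessionManager struct {
	currentID string
	nextID    int
	tornDown  []string
}

// OpenSession tears down the current session, if it differs from the
// requested one, before attaching to or creating the new session.
func (m *sessionManager) OpenSession(opts OpenOptions) error {
	if opts.ContextID != "" && opts.SessionID != "" {
		return errors.New("ContextID and SessionID are mutually exclusive")
	}
	if opts.SessionID != "" && opts.SessionID == m.currentID {
		return nil // already attached to the requested session
	}
	if m.currentID != "" {
		m.tornDown = append(m.tornDown, m.currentID)
	}
	if opts.SessionID != "" {
		m.currentID = opts.SessionID
		return nil
	}
	m.nextID++
	m.currentID = fmt.Sprintf("session-%d", m.nextID)
	return nil
}

func main() {
	m := &sessionManager{}
	_ = m.OpenSession(OpenOptions{ContextID: "ctx_123"}) // create with context
	_ = m.OpenSession(OpenOptions{SessionID: "abc"})     // attach by id
	fmt.Println(m.currentID, m.tornDown)
}
```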
type Timeoutable ¶
Timeoutable is an optional interface for computers whose backend session has a configurable max lifetime that the agent may want to extend mid-task. Browser Engine implements this; local Chrome and providers without an API-controlled lease return ErrNotSupported.
type ToolDefinition ¶
type ToolDefinition struct {
	Name        string
	Description string
	Syntax      string
	Rules       string
	Parameters  map[string]any
}
ToolDefinition describes a tool for non-Anthropic providers.
func GetComputerToolDef ¶
func GetComputerToolDef(display DisplaySize) ToolDefinition
GetComputerToolDef returns the computer_use tool definition.
func GetSessionToolDef ¶
func GetSessionToolDef() ToolDefinition
GetSessionToolDef returns the browser_session tool definition.