computer

package
v0.2.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 18, 2026 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package computer defines the Computer interface for screen-based environments. This package contains only the interface and types — no implementations. Implementations live in github.com/apteva/computer (browserbase, service, etc.)

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func AnthropicBetaHeader

func AnthropicBetaHeader(toolVersion string) string

AnthropicBetaHeader returns the appropriate beta header for computer use.

func HandleComputerAction

func HandleComputerAction(comp Computer, args map[string]string) (text string, screenshot []byte, err error)

HandleComputerAction executes a screen interaction action (no navigate). Normalizes provider-specific action names (e.g. Claude's left_click → click). Retries screenshot if it fails after a click (page may be mid-navigation).

func HandleGeminiComputerAction

func HandleGeminiComputerAction(comp Computer, name string, args map[string]string) (text string, screenshot []byte, err error)

HandleGeminiComputerAction translates a Gemini Computer Use action to our Computer interface. Gemini uses normalized 0-999 coordinates; we denormalize to actual pixels.

func HandleSessionAction

func HandleSessionAction(comp Computer, args map[string]string) (text string, screenshot []byte, err error)

HandleSessionAction manages browser session lifecycle.

func IsGeminiComputerAction

func IsGeminiComputerAction(name string) bool

IsGeminiComputerAction returns true if the function name is a Gemini Computer Use predefined action.

func NormalizeActionType

func NormalizeActionType(action string) string

geminiComputerUseActions maps Gemini native Computer Use function names. NormalizeActionType maps provider-specific action names to our standard names. Claude sends left_click, right_click, etc. — we normalize to what Computer.Execute understands.

Types

type Action

type Action struct {
	Type      string `json:"type"`                // "click", "double_click", "type", "key", "scroll", "screenshot", "navigate", "wait"
	X         int    `json:"x,omitempty"`         // click/scroll coordinate
	Y         int    `json:"y,omitempty"`         // click/scroll coordinate
	Text      string `json:"text,omitempty"`      // for "type" action
	Key       string `json:"key,omitempty"`       // for "key" action (e.g. "Enter", "Escape")
	Direction string `json:"direction,omitempty"` // for "scroll": "up", "down", "left", "right"
	Amount    int    `json:"amount,omitempty"`    // scroll amount
	URL       string `json:"url,omitempty"`       // for "navigate"
	Duration  int    `json:"duration,omitempty"`  // for "wait" (milliseconds)
	// Label: Set-of-Mark target. When non-zero, click/double_click
	// resolve the target via the label→bbox map populated by the
	// most recent screenshot. Takes precedence over X/Y when set.
	// Implementations that don't support SoM fall back to X/Y.
	Label int `json:"label,omitempty"`
}

Action represents a normalized computer use action.

type AnthropicToolSpec

type AnthropicToolSpec struct {
	Type            string `json:"type"`
	Name            string `json:"name"`
	DisplayWidthPx  int    `json:"display_width_px"`
	DisplayHeightPx int    `json:"display_height_px"`
}

AnthropicToolSpec is the native Claude computer use tool format.

func GetAnthropicToolSpec

func GetAnthropicToolSpec(display DisplaySize, toolVersion string) AnthropicToolSpec

GetAnthropicToolSpec returns the native Anthropic computer use tool spec.

type Computer

type Computer interface {
	// Execute performs an action and returns a screenshot.
	Execute(action Action) (screenshot []byte, err error)

	// Screenshot takes a screenshot without performing any action.
	Screenshot() ([]byte, error)

	// DisplaySize returns the screen dimensions.
	DisplaySize() DisplaySize

	// Close terminates the session and releases resources.
	Close() error
}

Computer is the interface for screen-based environments.

type Context

type Context struct {
	// ID is the provider-issued identifier returned at context-create time.
	ID string `json:"id"`
	// Persist controls whether changes (new cookies, storage writes) are
	// saved back to the context at session close. Default true mirrors
	// Browserbase's default; set false for one-shot read-only attaches.
	Persist bool `json:"persist"`
}

Context binds a session to a persistent state bundle (cookies, localStorage, IndexedDB, ServiceWorkers, Cache) that survives across sessions. Per-provider mapping:

  • Browserbase → browserSettings.context = {id, persist}
  • Browser Engine → context = {id, persist}
  • Steel → profileId / persistProfile (Steel calls these "profiles" but the lifecycle and intent match)

IDs are provider-scoped: a Browserbase context id will not resolve on Steel, and vice versa. Concurrent attaches to the same context on the same provider are unsafe (Chrome can't share a user-data-dir); each backend serializes or 409s. Local / service backends ignore Context.

type ContextInfo

type ContextInfo interface {
	ContextID() string
}

ContextInfo is an optional interface for computers attached to a persistent context. status surfaces the bound context id so the agent can confirm which identity it's running as. Implementations that do not support contexts (or aren't currently bound) should not implement this interface — the type assertion in status will simply skip it.

type DisplaySize

type DisplaySize struct {
	Width  int `json:"width"`
	Height int `json:"height"`
}

DisplaySize holds screen dimensions.

type OpenOptions

type OpenOptions struct {
	// URL to navigate to after the session is established. Optional;
	// when empty the session is opened but no navigation is issued
	// (useful for resume to a session that's already on a page).
	URL string

	// ContextID binds the new session to a persistent context. Mutually
	// exclusive with SessionID. Provider-scoped — see Context.
	ContextID string
	// Persist controls whether changes are saved back to the context
	// at session close. Defaults true (matches Browserbase default).
	Persist bool

	// SessionID, when set, attaches to an existing session instead of
	// creating a new one. Mutually exclusive with ContextID. Provider
	// requirements vary: Browserbase needs the session to have been
	// created with KeepAlive=true; Browser Engine accepts both live
	// and snapshot-saved sessions; Steel and local backends reject it.
	SessionID string

	// Timeout sets the new session's max lifetime in seconds. Ignored
	// for SessionID attaches (the timeout was set at original create).
	// Zero leaves the provider's server-side default in place.
	Timeout int

	// Proxy, when non-nil, decides whether the new session routes
	// egress through the backend's managed residential proxy. nil
	// leaves the harness/backend default; &true forces on; &false
	// forces off. Honored by browser-engine, browserbase, steel;
	// ignored by local. Set by the agent via the browser_session
	// open tool — the agent owns the policy decision.
	Proxy *bool

	// ProxyCountry is an ISO-2 country code for the residential
	// proxy exit (e.g. "US"). Honored by browser-engine; ignored by
	// browserbase + steel (they need a custom proxy list for that).
	ProxyCountry string
}

OpenOptions describes a session-open intent: which url to land on, which persistent context (if any) to bind, and whether to attach to an existing session id instead of creating a new one. The agent owns these decisions — they're tool-call arguments, not factory config.

type Resumable

type Resumable interface {
	Resume(sessionID string) error
}

Resumable is an optional interface for computers that support session resumption.

type SessionInfo

type SessionInfo interface {
	SessionType() string // "local", "browserbase", "service"
	SessionID() string   // empty for local
	CurrentURL() string  // current page URL
}

SessionInfo is an optional interface for computers that can report session details.

type SessionOpener

type SessionOpener interface {
	OpenSession(opts OpenOptions) error
}

SessionOpener is implemented by Computers that own session lifecycle. One method covers create-with-context, attach-by-id, and re-bind to a different context — all by varying OpenOptions. Implementations MUST tear down the current session (if different) before establishing the new one. Local / service backends implement this as a thin nav.

type Timeoutable

type Timeoutable interface {
	ExtendTimeout(seconds int) error
}

Timeoutable is an optional interface for computers whose backend session has a configurable max lifetime that the agent may want to extend mid-task. Browser Engine implements this; local Chrome and providers without an API-controlled lease return ErrNotSupported.

type ToolDefinition

type ToolDefinition struct {
	Name        string
	Description string
	Syntax      string
	Rules       string
	Parameters  map[string]any
}

ToolDefinition describes a tool for non-Anthropic providers.

func GetComputerToolDef

func GetComputerToolDef(display DisplaySize) ToolDefinition

GetComputerToolDef returns the computer_use tool definition.

func GetSessionToolDef

func GetSessionToolDef() ToolDefinition

GetSessionToolDef returns the browser_session tool definition.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL