computeruse

package module
v0.0.0-...-bfc404f Latest
Warning

This package is not in the latest version of its module.

Published: Oct 20, 2025 License: MIT Imports: 9 Imported by: 1

README

Computer Use Library

A Go library for browser-based computer use automation, designed for LLM agents (Claude Computer Use, Google Gemini, etc.). Built on go-rod for robust browser control.

Features

  • Unified API: Single set of commands that work for both Claude and Gemini with minimal adaptation
  • Flexible Coordinate System: Choose between normalized coordinates (a 0-999 grid, as used by Gemini) and pixel-based coordinates
  • Idiomatic Go: Proper error handling and clean interface design
  • Comprehensive Actions: Supports clicking, typing, scrolling, dragging, keyboard shortcuts, and more
  • Screenshot Capability: Capture browser state for visual feedback to LLMs
  • Session Management: Easy browser lifecycle management with context support

Installation

go get github.com/PeronGH/computer-use-lib

Quick Start

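package main

import (
    "context"
    "log"

    computeruse "github.com/PeronGH/computer-use-lib"
)

func main() {
    if err := run(); err != nil {
        log.Fatal(err)
    }
}

// run keeps the deferred Close reachable: log.Fatal in main would skip defers.
func run() error {
    // Create a new browser session
    session, err := computeruse.NewSession(context.Background(), computeruse.SessionConfig{
        ScreenWidth:          1440,
        ScreenHeight:         900,
        NormalizeCoordinates: true, // Use 0-999 grid
        InitialURL:           "https://www.google.com",
    })
    if err != nil {
        return err
    }
    defer session.Close()

    // Use the session; every method returns an error, so check each one
    if err := session.Navigate("https://example.com"); err != nil {
        return err
    }
    if err := session.ClickAt(500, 500); err != nil {
        return err
    }
    if err := session.TypeText("Hello, World!"); err != nil {
        return err
    }
    screenshot, err := session.Screenshot()
    if err != nil {
        return err
    }
    log.Printf("captured %d bytes of PNG data", len(screenshot))
    return nil
}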

API Reference

Session Configuration
type SessionConfig struct {
    ScreenWidth          int    // Browser viewport width
    ScreenHeight         int    // Browser viewport height
    NormalizeCoordinates bool   // If true, use 0-999 grid; if false, use pixels
    InitialURL           string // Starting URL (default: "https://www.google.com")
    SearchEngineURL      string // URL for Search() action (default: "https://www.google.com")
    Headless             bool   // Run browser in headless mode
}
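
The NormalizeCoordinates option implies a mapping from the 0-999 grid onto the viewport. The README does not state the exact formula or rounding the library uses, so the following is only a minimal sketch of one plausible mapping. The helper name denormalize is hypothetical and not part of the library.

```go
package main

import "fmt"

// denormalize maps a coordinate on the 0-999 grid onto a pixel axis of the
// given size. Hypothetical helper: the library's own rounding may differ.
func denormalize(v, size int) int {
	// Clamp to the documented grid so out-of-range input stays on screen.
	if v < 0 {
		v = 0
	}
	if v > 999 {
		v = 999
	}
	// Integer division truncates toward zero.
	return v * size / 1000
}

func main() {
	// A 1440x900 viewport, as in the Quick Start configuration.
	x := denormalize(500, 1440)
	y := denormalize(500, 900)
	fmt.Println(x, y) // 720 450
}
```

With this mapping, the grid point (500, 500) lands at the center of a 1440x900 viewport.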
Available Commands

Every method returns an error as its last (or only) result, so callers can check each action.

| Method | Signature | Claude Mapping | Gemini Mapping |
|---|---|---|---|
| Screenshot | Screenshot() ([]byte, error) | screenshot | N/A (call separately) |
| ClickAt | ClickAt(x, y int) error | left_click | click_at |
| RightClickAt | RightClickAt(x, y int) error | right_click | N/A |
| MiddleClickAt | MiddleClickAt(x, y int) error | middle_click | N/A |
| DoubleClickAt | DoubleClickAt(x, y int) error | double_click | N/A |
| TripleClickAt | TripleClickAt(x, y int) error | triple_click | N/A |
| MouseDown | MouseDown(x, y int) error | left_mouse_down | N/A |
| MouseUp | MouseUp(x, y int) error | left_mouse_up | N/A |
| MouseMove | MouseMove(x, y int) error | mouse_move | N/A |
| HoverAt | HoverAt(x, y int) error | mouse_move | hover_at |
| ClickDrag | ClickDrag(fromX, fromY, toX, toY int) error | left_click_drag | drag_and_drop |
| TypeText | TypeText(text string) error | type | N/A |
| TypeTextAt | TypeTextAt(x, y int, text string, clearBefore, pressEnter bool) error | left_click + type + key | type_text_at |
| Key | Key(keys ...string) error | key | key_combination |
| Scroll | Scroll(direction string, amount int) error | scroll | scroll_document |
| ScrollAt | ScrollAt(x, y int, direction string, magnitude int) error | mouse_move + scroll | scroll_at |
| Navigate | Navigate(url string) error | N/A | navigate |
| GoBack | GoBack() error | key ("Alt+Left") | go_back |
| GoForward | GoForward() error | key ("Alt+Right") | go_forward |
| Search | Search() error | N/A | search |
| GetURL | GetURL() (string, error) | N/A | N/A |
| Close | Close() error | N/A | N/A |
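
The mapping columns suggest a thin dispatch layer between a model's action names and the methods listed here. The sketch below routes a few Claude action names to methods with the documented signatures. The Clicker interface, the dispatchClaude function, and the recorder test double are all hypothetical names introduced for illustration. A *Session should satisfy the interface if its methods match the signatures listed, but that is an assumption, not something the README states.

```go
package main

import "fmt"

// Clicker is a hypothetical interface covering four of the documented methods.
type Clicker interface {
	ClickAt(x, y int) error
	RightClickAt(x, y int) error
	DoubleClickAt(x, y int) error
	MouseMove(x, y int) error
}

// dispatchClaude routes a Claude action name to the method the table pairs it with.
func dispatchClaude(c Clicker, action string, x, y int) error {
	switch action {
	case "left_click":
		return c.ClickAt(x, y)
	case "right_click":
		return c.RightClickAt(x, y)
	case "double_click":
		return c.DoubleClickAt(x, y)
	case "mouse_move":
		return c.MouseMove(x, y)
	default:
		return fmt.Errorf("unsupported action %q", action)
	}
}

// recorder is a stand-in for a real session; it only logs the calls it receives.
type recorder struct{ calls []string }

func (r *recorder) log(name string, x, y int) error {
	r.calls = append(r.calls, fmt.Sprintf("%s(%d,%d)", name, x, y))
	return nil
}

func (r *recorder) ClickAt(x, y int) error       { return r.log("ClickAt", x, y) }
func (r *recorder) RightClickAt(x, y int) error  { return r.log("RightClickAt", x, y) }
func (r *recorder) DoubleClickAt(x, y int) error { return r.log("DoubleClickAt", x, y) }
func (r *recorder) MouseMove(x, y int) error     { return r.log("MouseMove", x, y) }

func main() {
	r := &recorder{}
	if err := dispatchClaude(r, "left_click", 10, 20); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(r.calls) // [ClickAt(10,20)]
}
```

Depending on an interface rather than the concrete type keeps the dispatch logic testable without launching a browser.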

Architecture

The library provides a unified API layer that translates high-level actions into go-rod browser commands:

LLM Agent (Claude/Gemini)
         ↓
Computer Use Library API
         ↓
go-rod (Browser Control)
         ↓
Chrome/Chromium Browser

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Session

type Session struct {
	// contains filtered or unexported fields
}

Session represents a browser automation session

func NewSession

func NewSession(ctx context.Context, config SessionConfig) (*Session, error)

NewSession creates a new browser session with the given configuration

func (*Session) ClickAt

func (s *Session) ClickAt(x, y int) error

ClickAt performs a left click at the specified coordinates

func (*Session) ClickDrag

func (s *Session) ClickDrag(fromX, fromY, toX, toY int) error

ClickDrag performs a click and drag operation from one coordinate to another

func (*Session) Close

func (s *Session) Close() error

Close closes the browser session

func (*Session) DoubleClickAt

func (s *Session) DoubleClickAt(x, y int) error

DoubleClickAt performs a double click at the specified coordinates

func (*Session) GetURL

func (s *Session) GetURL() (string, error)

GetURL returns the current page URL

func (*Session) GoBack

func (s *Session) GoBack() error

GoBack navigates back in browser history

func (*Session) GoForward

func (s *Session) GoForward() error

GoForward navigates forward in browser history

func (*Session) HoverAt

func (s *Session) HoverAt(x, y int) error

HoverAt is an alias for MouseMove; it hovers at the specified coordinates

func (*Session) Key

func (s *Session) Key(keys ...string) error

Key presses a key or key combination. Examples: Key("Enter"), Key("Control", "C"), Key("Alt", "F4")
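
Key takes each key as a separate argument, while the mapping table writes combinations as a single string such as "Alt+Left". A caller holding the single-string form may need to split it first. The helper below is a minimal sketch of that step. The name splitCombo is hypothetical, and the sketch assumes "+" is always a separator, so it cannot express a literal plus key.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCombo turns a combination such as "Alt+Left" into its individual keys.
// Hypothetical helper: empty segments and surrounding spaces are dropped.
func splitCombo(combo string) []string {
	var keys []string
	for _, part := range strings.Split(combo, "+") {
		if k := strings.TrimSpace(part); k != "" {
			keys = append(keys, k)
		}
	}
	return keys
}

func main() {
	keys := splitCombo("Control + C")
	fmt.Println(keys) // [Control C]
	// With a real session this could be passed on as session.Key(keys...).
}
```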

func (*Session) MiddleClickAt

func (s *Session) MiddleClickAt(x, y int) error

MiddleClickAt performs a middle click at the specified coordinates

func (*Session) MouseDown

func (s *Session) MouseDown(x, y int) error

MouseDown presses the left mouse button at the specified coordinates

func (*Session) MouseMove

func (s *Session) MouseMove(x, y int) error

MouseMove moves the cursor to the specified coordinates

func (*Session) MouseUp

func (s *Session) MouseUp(x, y int) error

MouseUp releases the left mouse button at the specified coordinates

func (*Session) Navigate

func (s *Session) Navigate(url string) error

Navigate navigates the browser to the specified URL

func (*Session) RightClickAt

func (s *Session) RightClickAt(x, y int) error

RightClickAt performs a right click at the specified coordinates

func (*Session) Screenshot

func (s *Session) Screenshot() ([]byte, error)

Screenshot captures the current browser viewport as a PNG image. Returns the PNG image data as a byte slice.
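
Since Screenshot returns raw PNG bytes, a caller that sends the image to a model over a JSON API will typically need to base64-encode it. The sketch below shows that step with a basic signature check. The function name encodeScreenshot is hypothetical, and the input in main is only the 8-byte PNG signature standing in for real screenshot data.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"errors"
	"fmt"
)

// pngMagic is the 8-byte signature that starts every PNG file.
var pngMagic = []byte{0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'}

// encodeScreenshot verifies the PNG signature and returns standard base64.
// Hypothetical helper, not part of the library.
func encodeScreenshot(data []byte) (string, error) {
	if !bytes.HasPrefix(data, pngMagic) {
		return "", errors.New("data does not start with a PNG signature")
	}
	return base64.StdEncoding.EncodeToString(data), nil
}

func main() {
	// Stand-in for the bytes a real Screenshot() call would return.
	fake := append([]byte{}, pngMagic...)
	encoded, err := encodeScreenshot(fake)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(encoded) // iVBORw0KGgo=
}
```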

func (*Session) Scroll

func (s *Session) Scroll(direction string, amount int) error

Scroll scrolls the page in the specified direction by the given amount.

  • direction: "up", "down", "left", "right"
  • amount: scroll distance (in pixels if not normalized, or 0-999 if normalized)

func (*Session) ScrollAt

func (s *Session) ScrollAt(x, y int, direction string, magnitude int) error

ScrollAt scrolls at a specific location on the page.

  • x, y: coordinates to scroll at
  • direction: "up", "down", "left", "right"
  • magnitude: scroll amount (0-999 if normalized, pixels otherwise)
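
Both scroll methods take a direction string and a magnitude, which at some point have to become horizontal and vertical offsets. The sketch below shows one way to do that conversion and to reject unknown directions. The function scrollDelta is hypothetical, and the sign convention (positive dy scrolls down, positive dx scrolls right) is an assumption rather than documented library behavior.

```go
package main

import "fmt"

// scrollDelta converts a direction and amount into horizontal and vertical
// offsets. Hypothetical helper; the sign convention is an assumption.
func scrollDelta(direction string, amount int) (dx, dy int, err error) {
	switch direction {
	case "up":
		return 0, -amount, nil
	case "down":
		return 0, amount, nil
	case "left":
		return -amount, 0, nil
	case "right":
		return amount, 0, nil
	default:
		return 0, 0, fmt.Errorf("unknown scroll direction %q", direction)
	}
}

func main() {
	dx, dy, err := scrollDelta("down", 300)
	fmt.Println(dx, dy, err) // 0 300 <nil>
}
```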

func (*Session) Search

func (s *Session) Search() error

Search navigates to the configured search engine URL

func (*Session) TripleClickAt

func (s *Session) TripleClickAt(x, y int) error

TripleClickAt performs a triple click at the specified coordinates

func (*Session) TypeText

func (s *Session) TypeText(text string) error

TypeText types the given text string

func (*Session) TypeTextAt

func (s *Session) TypeTextAt(x, y int, text string, clearBefore, pressEnter bool) error

TypeTextAt clicks at the specified coordinates and types text.

  • clearBefore: if true, selects all and deletes before typing
  • pressEnter: if true, presses Enter after typing
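
The description above, together with the "left_click + type + key" mapping in the README table, implies a fixed sequence of simpler actions. The sketch below composes that sequence from ClickAt, Key, and TypeText against a hypothetical inputDriver interface. It is not the library's implementation, and the key names used for clearing ("Control", "A", "Delete") are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// inputDriver is a hypothetical interface using three documented signatures.
type inputDriver interface {
	ClickAt(x, y int) error
	Key(keys ...string) error
	TypeText(text string) error
}

// typeTextAt focuses a field by clicking, optionally clears it, types the
// text, and optionally presses Enter. Illustrative only.
func typeTextAt(d inputDriver, x, y int, text string, clearBefore, pressEnter bool) error {
	if err := d.ClickAt(x, y); err != nil {
		return fmt.Errorf("click: %w", err)
	}
	if clearBefore {
		// Assumed key names for select-all and delete.
		if err := d.Key("Control", "A"); err != nil {
			return fmt.Errorf("select all: %w", err)
		}
		if err := d.Key("Delete"); err != nil {
			return fmt.Errorf("delete: %w", err)
		}
	}
	if err := d.TypeText(text); err != nil {
		return fmt.Errorf("type: %w", err)
	}
	if pressEnter {
		if err := d.Key("Enter"); err != nil {
			return fmt.Errorf("enter: %w", err)
		}
	}
	return nil
}

// logDriver is a test double that records each step instead of driving a browser.
type logDriver struct{ steps []string }

func (l *logDriver) ClickAt(x, y int) error {
	l.steps = append(l.steps, "click")
	return nil
}

func (l *logDriver) Key(keys ...string) error {
	l.steps = append(l.steps, "key:"+strings.Join(keys, "+"))
	return nil
}

func (l *logDriver) TypeText(text string) error {
	l.steps = append(l.steps, "type:"+text)
	return nil
}

func main() {
	d := &logDriver{}
	if err := typeTextAt(d, 100, 200, "hello", true, true); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(d.steps) // [click key:Control+A key:Delete type:hello key:Enter]
}
```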

type SessionConfig

type SessionConfig struct {
	ScreenWidth          int    // Browser viewport width
	ScreenHeight         int    // Browser viewport height
	NormalizeCoordinates bool   // If true, use 0-999 grid; if false, use pixels
	InitialURL           string // Starting URL (default: "https://www.google.com")
	SearchEngineURL      string // URL for Search() action (default: "https://www.google.com")
	Headless             bool   // Run browser in headless mode
}

SessionConfig holds configuration for a browser session
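
The field comments document "https://www.google.com" as the default for both InitialURL and SearchEngineURL. How the library applies those defaults is not shown, so the following is only a sketch of the usual zero-value pattern. The config type is a local stand-in for the two URL fields, and withDefaults is a hypothetical name.

```go
package main

import "fmt"

// config is a local stand-in mirroring the two URL fields of SessionConfig.
type config struct {
	InitialURL      string
	SearchEngineURL string
}

// defaultURL is the default documented in the SessionConfig field comments.
const defaultURL = "https://www.google.com"

// withDefaults fills empty URL fields with the documented default.
// Hypothetical helper showing the zero-value-means-default pattern.
func withDefaults(c config) config {
	if c.InitialURL == "" {
		c.InitialURL = defaultURL
	}
	if c.SearchEngineURL == "" {
		c.SearchEngineURL = defaultURL
	}
	return c
}

func main() {
	c := withDefaults(config{SearchEngineURL: "https://duckduckgo.com"})
	fmt.Println(c.InitialURL)      // https://www.google.com
	fmt.Println(c.SearchEngineURL) // https://duckduckgo.com
}
```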
