bua

Version: v0.0.0-...-2fe4d4b

Note: This package is not in the latest version of its module.

Published: Jan 4, 2026 • License: MIT • Imports: 9 • Imported by: 0

README

BUA - Browser Use Agent for Go

🤖 Make websites accessible for AI agents. Automate the web with natural language.

Installation • Quick Start • Features • Configuration • Examples • Architecture



Why BUA?

Traditional browser automation is fragile. CSS selectors break. XPaths change. Every website update means rewriting scripts.

BUA changes the game. Instead of writing brittle selectors, describe what you want in plain English:

result, _ := agent.Run(ctx, "Go to Amazon and find the best-rated wireless headphones under $100")

The AI agent reads the page (its DOM plus a screenshot), works out your intent, and adapts to the layout it finds. No selectors to write or maintain. Just results.


✨ What Makes BUA Special

Feature                 Traditional Automation     BUA
Selector Maintenance    Constant updates needed    Zero maintenance
Dynamic Content         Complex waits & retries    AI understands state
Multi-step Workflows    Hundreds of lines          One sentence
Layout Changes          Scripts break              Adapts automatically
New Sites               Write new selectors        Works immediately

🎯 Perfect For

  • Web Scraping - Extract data from any site without writing parsers
  • Form Automation - Fill applications, registrations, checkout flows
  • E2E Testing - Test user journeys with natural language
  • Data Entry - Automate repetitive web-based tasks
  • Research - Gather information across multiple sources
  • Monitoring - Track prices, inventory, content changes

🚀 Installation

go get github.com/anxuanzi/bua

Prerequisites

  • Go 1.25+
  • Chrome/Chromium installed on your system
  • Gemini API Key from Google AI Studio

⚑ Quick Start

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	"github.com/anxuanzi/bua"
)

func main() {
	// Create agent with your Gemini API key
	agent, err := bua.New(bua.Config{
		APIKey:   os.Getenv("GEMINI_API_KEY"),
		Headless: false, // Watch the magic happen
		Debug:    true,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer agent.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()

	// Start the browser
	if err := agent.Start(ctx); err != nil {
		log.Fatal(err)
	}

	// Run a task with natural language
	result, err := agent.Run(ctx,
		"Go to Hacker News and find the top 3 stories about AI")
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("✅ Success: %v\n", result.Success)
	fmt.Printf("📊 Steps taken: %d\n", len(result.Steps))
	fmt.Printf("⏱️  Duration: %v\n", result.Duration)
}

That's it. The agent navigates to Hacker News, scans the stories, identifies AI-related content, and returns what it found in result.Data.


🛠️ Features

🧠 Intelligent Navigation

BUA doesn't just click buttons; it understands web pages:

// The agent figures out HOW to accomplish the task
agent.Run(ctx, "Find flights from NYC to London for next weekend, sort by price")

// Multi-step workflows handled automatically
agent.Run(ctx, "Log into my account, go to settings, and change my timezone to PST")

👁️ Vision-Enabled

Screenshots are analyzed by the LLM for visual understanding:

cfg := bua.Config{
	Preset: bua.PresetQuality, // High-res screenshots
	// Vision is enabled by default
}

🥷 Stealth Mode

Built-in anti-detection measures help avoid bot blocking:

  • Navigator property spoofing
  • WebGL fingerprint masking
  • Plugin emulation
  • Human-like mouse movements
  • Random action delays

📸 Screenshot Annotations

Visual debugging with element indices overlaid on screenshots:

cfg := bua.Config{
	ShowAnnotations: true, // See what the AI sees
	ScreenshotDir:   "./debug",
}


🎛️ Flexible Presets

Optimize for speed, cost, or quality:

Preset            Token budget   Screenshots                    Best For
PresetFast        8K             None (text-only)               Simple tasks, lowest cost
PresetEfficient   16K            800px wide, 60% JPEG quality   Balanced cost/capability
PresetBalanced    32K            1280px wide, 75% JPEG quality  Default; most tasks
PresetQuality     64K            1920px wide, 85% JPEG quality  Complex visual tasks
PresetMax         128K           2560px wide, 95% JPEG quality  Maximum accuracy
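Config says MaxTokens, ScreenshotMaxWidth, ScreenshotQuality and TextOnly are "set automatically based on Preset if not specified". This sketch shows one way to resolve that, using the values from the table above. It uses local mirror types so it runs without the bua module.

Several details are assumptions: reading "8K" as 8,000, falling back to PresetBalanced for unknown presets, and letting only non-zero numeric fields override. The `resolvePreset` name is hypothetical.

```go
package main

import "fmt"

// Preset mirrors bua.Preset for this standalone sketch.
type Preset string

const (
	PresetFast      Preset = "fast"
	PresetEfficient Preset = "efficient"
	PresetBalanced  Preset = "balanced"
	PresetQuality   Preset = "quality"
	PresetMax       Preset = "max"
)

// presetSettings holds the per-preset values listed in the README table.
type presetSettings struct {
	MaxTokens          int
	ScreenshotMaxWidth int // 0 when TextOnly is true
	ScreenshotQuality  int // JPEG quality 1-100, 0 when TextOnly is true
	TextOnly           bool
}

// presetTable reproduces the table; "8K" is read as 8,000 (an assumption).
var presetTable = map[Preset]presetSettings{
	PresetFast:      {MaxTokens: 8_000, TextOnly: true},
	PresetEfficient: {MaxTokens: 16_000, ScreenshotMaxWidth: 800, ScreenshotQuality: 60},
	PresetBalanced:  {MaxTokens: 32_000, ScreenshotMaxWidth: 1280, ScreenshotQuality: 75},
	PresetQuality:   {MaxTokens: 64_000, ScreenshotMaxWidth: 1920, ScreenshotQuality: 85},
	PresetMax:       {MaxTokens: 128_000, ScreenshotMaxWidth: 2560, ScreenshotQuality: 95},
}

// resolvePreset starts from the preset's values (PresetBalanced when empty or
// unknown) and lets explicit non-zero numeric fields win.
func resolvePreset(p Preset, overrides presetSettings) presetSettings {
	s, ok := presetTable[p]
	if !ok {
		s = presetTable[PresetBalanced]
	}
	if overrides.MaxTokens != 0 {
		s.MaxTokens = overrides.MaxTokens
	}
	if overrides.ScreenshotMaxWidth != 0 {
		s.ScreenshotMaxWidth = overrides.ScreenshotMaxWidth
	}
	if overrides.ScreenshotQuality != 0 {
		s.ScreenshotQuality = overrides.ScreenshotQuality
	}
	return s
}

func main() {
	fmt.Printf("%+v\n", resolvePreset("", presetSettings{}))
	fmt.Printf("%+v\n", resolvePreset(PresetQuality, presetSettings{ScreenshotQuality: 70}))
}
```

A plain bool like TextOnly cannot signal "not specified" by itself, so the sketch never overrides it. How bua actually reconciles Preset with explicit fields is not documented here.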

🔐 Sensitive Data Protection

Automatic redaction of sensitive information in logs:

// API keys, passwords, SSNs, credit cards are automatically masked
// <secret type="api_key">[REDACTED]</secret>
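Pattern-based masking is one common way to produce output like the `<secret>` marker shown above. This sketch is illustrative only, because bua's actual detection rules are not documented. The three regular expressions are assumptions:

- a Google-style key beginning with `AIza`
- US-format SSNs
- 13 to 16 digit card numbers

Rules like these will miss some secrets and flag some harmless numbers.

```go
package main

import (
	"fmt"
	"regexp"
)

// redactionRule pairs a secret type with a detection pattern (both illustrative).
type redactionRule struct {
	kind    string
	pattern *regexp.Regexp
}

// rules run in order; earlier replacements remove text later rules could match.
var rules = []redactionRule{
	// Google API keys: "AIza" followed by 35 URL-safe characters.
	{"api_key", regexp.MustCompile(`AIza[0-9A-Za-z_-]{35}`)},
	// US Social Security numbers written as 123-45-6789.
	{"ssn", regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`)},
	// 13 to 16 digit card numbers, optionally grouped with spaces or dashes.
	{"credit_card", regexp.MustCompile(`\b(?:\d[ -]?){12,15}\d\b`)},
}

// redact replaces every match with a <secret type="..."> marker.
func redact(s string) string {
	for _, r := range rules {
		marker := fmt.Sprintf(`<secret type="%s">[REDACTED]</secret>`, r.kind)
		// Literal replacement so "$" in the marker is never treated as a group.
		s = r.pattern.ReplaceAllLiteralString(s, marker)
	}
	return s
}

func main() {
	line := "key=AIzaSyA1234567890abcdefghijklmnopqrstuv ssn=123-45-6789"
	fmt.Println(redact(line))
}
```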

🗂️ Tab Management

Handle complex multi-tab workflows:

// Open comparison shopping tabs
tab1, _ := agent.NewTab(ctx, "https://amazon.com")
tab2, _ := agent.NewTab(ctx, "https://ebay.com")

agent.SwitchTab(tab1)
agent.Run(ctx, "Search for 'mechanical keyboard'")

agent.SwitchTab(tab2)
agent.Run(ctx, "Search for 'mechanical keyboard' and compare prices")

💾 Session Persistence

Save and restore browser sessions:

cfg := bua.Config{
	ProfileName: "my-shopping-session",
	ProfileDir:  "~/.bua/profiles",
	// Cookies, localStorage, login state preserved
}

⚙️ Configuration

Full Configuration Options

cfg := bua.Config{
	// Required
	APIKey: "your-gemini-api-key",

	// LLM Settings
	Model: "gemini-2.5-flash", // or "gemini-2.0-flash", etc.

	// Browser Settings
	Headless:    false,        // true for background operation
	ProfileName: "persistent", // empty = temporary profile
	ProfileDir:  "~/.bua/profiles",
	Viewport:    &bua.Viewport{Width: 1920, Height: 1080},

	// Agent Behavior
	MaxSteps: 100, // Max actions before giving up
	Preset:   bua.PresetBalanced,

	// Screenshot Settings
	ScreenshotDir:      "./screenshots",
	ScreenshotMaxWidth: 1280,
	ScreenshotQuality:  75,
	TextOnly:           false, // true disables screenshots
	ShowAnnotations:    false, // true shows element indices

	// Visual Feedback
	ShowHighlight:       true,
	HighlightDurationMs: 300,

	// Debugging
	Debug: true,
}

Environment Variables

export GEMINI_API_KEY="your-api-key-here"

📖 Examples

Example 1: Web Research

result, _ := agent.Run(ctx, `
    Go to Wikipedia and find information about the Go programming language.
    Extract the release date, original author, and main features.
`)

fmt.Println(result.Data) // Extracted information

Example 2: E-commerce Automation

result, _ := agent.Run(ctx, `
    Go to Amazon, search for "USB-C hub", filter by 4+ stars,
    and find the cheapest option with Prime shipping.
`)

for _, step := range result.Steps {
	fmt.Printf("[%d] %s: %s\n", step.Number, step.Action, step.NextGoal)
}

Example 3: Form Filling

result, _ := agent.Run(ctx, `
    Go to the contact form at example.com/contact.
    Fill in:
    - Name: John Doe
    - Email: john@example.com
    - Message: I'm interested in your services
    Then submit the form.
`)

Example 4: Multi-Page Workflow

// Navigate first
agent.Navigate(ctx, "https://github.com/login")

// Then automate
result, _ := agent.Run(ctx, `
    Log in with username 'myuser' and password from the password field.
    After logging in, go to my repositories and find the most starred one.
`)

🏗️ Architecture

┌──────────────────────────────────────────────────────────────┐
│                    Your Application                          │
├──────────────────────────────────────────────────────────────┤
│                   BUA Public API                             │
│         bua.New() → Start() → Run() → Close()                │
├──────────────────────────────────────────────────────────────┤
│                   Agent Layer                                │
│    ┌─────────────┐  ┌──────────────┐  ┌────────────────┐     │
│    │ BrowserAgent│  │ ADK Toolkit  │  │ Message Builder│     │
│    │  (LLM Loop) │  │ (20+ Tools)  │  │  (History)     │     │
│    └─────────────┘  └──────────────┘  └────────────────┘     │
├──────────────────────────────────────────────────────────────┤
│                   Browser Layer                              │
│    ┌─────────────┐  ┌──────────────┐  ┌────────────────┐     │
│    │   Browser   │  │    Page      │  │   Stealth      │     │
│    │ (Lifecycle) │  │ (Actions)    │  │   (Evasion)    │     │
│    └─────────────┘  └──────────────┘  └────────────────┘     │
├──────────────────────────────────────────────────────────────┤
│                   Support Layer                              │
│    ┌─────────────┐  ┌──────────────┐                         │
│    │     DOM     │  │  Screenshot  │                         │
│    │ (Extraction)│  │ (Annotation) │                         │
│    └─────────────┘  └──────────────┘                         │
├──────────────────────────────────────────────────────────────┤
│              go-rod (Chrome DevTools Protocol)               │
├──────────────────────────────────────────────────────────────┤
│                   Chrome / Chromium                          │
└──────────────────────────────────────────────────────────────┘

Agent Loop

Task: "Search for Go tutorials"
           │
           ▼
┌──────────────────────┐
│   Get Page State     │◄──────────────────────────┐
│   (DOM + Screenshot) │                           │
└──────────┬───────────┘                           │
           │                                       │
           ▼                                       │
┌──────────────────────┐                           │
│   LLM Reasoning      │                           │
│   (Gemini + Tools)   │                           │
└──────────┬───────────┘                           │
           │                                       │
           ▼                                       │
┌──────────────────────┐                           │
│   Execute Action     │                           │
│   (click, type, etc) │                           │
└──────────┬───────────┘                           │
           │                                       │
           ▼                                       │
    ┌──────────────┐         ┌───────────────┐     │
    │ Task Done?   │───No───►│ Update State  │─────┘
    └──────┬───────┘         └───────────────┘
           │Yes
           ▼
    ┌──────────────┐
    │ Return Result│
    └──────────────┘

🔧 Available Tools

The agent has access to 20+ browser automation tools:

Category      Tools
Navigation    navigate, go_back, go_forward, reload
Interaction   click, type_text, clear_and_type, hover, double_click, focus
Scrolling     scroll, scroll_to_element
Keyboard      send_keys (Enter, Tab, Escape, etc.)
Observation   get_page_state, screenshot, extract_content
JavaScript    evaluate_js
Tabs          new_tab, switch_tab, close_tab, list_tabs
Completion    done
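The model picks tools by name, so the agent needs some way to dispatch a name to code. The diagram labels the tools an ADK toolkit, but their Go signatures aren't documented here. The `toolFunc` signature and `registry` type below are therefore hypothetical, a sketch of name-based dispatch with unknown names rejected.

```go
package main

import (
	"fmt"
	"sort"
)

// toolFunc is a hypothetical tool signature: string arguments in, text out.
type toolFunc func(args map[string]string) (string, error)

// registry maps tool names (as in the table) to implementations.
type registry map[string]toolFunc

// call dispatches a model-requested tool and rejects names it doesn't know.
func (r registry) call(name string, args map[string]string) (string, error) {
	fn, ok := r[name]
	if !ok {
		return "", fmt.Errorf("unknown tool %q", name)
	}
	return fn(args)
}

// names lists registered tools in sorted order, e.g. for a system prompt.
func (r registry) names() []string {
	out := make([]string, 0, len(r))
	for n := range r {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	r := registry{
		"navigate":  func(a map[string]string) (string, error) { return "navigated to " + a["url"], nil },
		"send_keys": func(a map[string]string) (string, error) { return "pressed " + a["keys"], nil },
		"done":      func(a map[string]string) (string, error) { return a["result"], nil },
	}
	fmt.Println(r.names()) // [done navigate send_keys]
	out, err := r.call("navigate", map[string]string{"url": "https://news.ycombinator.com"})
	fmt.Println(out, err)
	_, err = r.call("teleport", nil)
	fmt.Println(err) // unknown tool "teleport"
}
```

Returning an error for unknown names, rather than panicking, lets the loop feed the mistake back to the model on the next step.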

📊 Comparison with Browser-Use (Python)

BUA is inspired by the popular browser-use Python library. Here's how they compare:

Aspect           Browser-Use (Python)             BUA (Go)
Language         Python 3.11+                     Go 1.25+
LLM Support      OpenAI, Claude, Gemini, Ollama   Gemini (via ADK); other models planned
Browser Engine   Playwright                       go-rod (direct CDP)
Performance      Good                             Excellent (compiled to native code)
Deployment       Python environment               Single binary
Memory           Higher (Python + Node.js)        Lower (native Go)
Concurrency      asyncio                          Native goroutines
Anti-Detection   ✅                               ✅
Vision Support   ✅                               ✅
Custom Tools     ✅                               ✅

Why Choose BUA?

  • 🚀 Performance: Go compiles to native code, so the agent process starts faster and uses less memory
  • 📦 Simple Deployment: Single binary, no Python/Node.js dependencies (Chrome/Chromium is still required)
  • ⚡ Concurrency: Native goroutines for parallel operations
  • 🔒 Type Safety: Catch errors at compile time
  • 🏢 Enterprise Ready: Go is a common choice for production services

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments


Built with ❤️ for the Go community

Report Bug • Request Feature

Documentation

Index

Constants

This section is empty.

Variables

var (
	// ErrMissingAPIKey is returned when Config.APIKey is not set.
	ErrMissingAPIKey = errors.New("bua: API key is required")

	// ErrNotStarted is returned when Run is called before Start.
	ErrNotStarted = errors.New("bua: agent not started, call Start() first")

	// ErrAlreadyStarted is returned when Start is called twice.
	ErrAlreadyStarted = errors.New("bua: agent already started")

	// ErrMaxStepsReached is returned when the agent exceeds MaxSteps.
	ErrMaxStepsReached = errors.New("bua: maximum steps reached without completing task")

	// ErrBrowserClosed is returned when the browser is unexpectedly closed.
	ErrBrowserClosed = errors.New("bua: browser was closed")

	// ErrElementNotFound is returned when an element index is invalid.
	ErrElementNotFound = errors.New("bua: element not found")

	// ErrElementNotVisible is returned when an element is not visible.
	ErrElementNotVisible = errors.New("bua: element is not visible")

	// ErrNavigationFailed is returned when page navigation fails.
	ErrNavigationFailed = errors.New("bua: navigation failed")

	// ErrTimeout is returned when an operation times out.
	ErrTimeout = errors.New("bua: operation timed out")

	// ErrHumanTakeoverTimeout is returned when human intervention times out.
	ErrHumanTakeoverTimeout = errors.New("bua: human takeover timed out")
)

Common errors returned by the bua package.

Functions

This section is empty.

Types

type Agent

type Agent struct {
	// contains filtered or unexported fields
}

Agent is the main interface for browser automation with an LLM.

func New

func New(cfg Config) (*Agent, error)

New creates a new browser automation agent. Call Start() before using Run().

func (*Agent) Close

func (a *Agent) Close() error

Close shuts down the browser and cleans up resources.

func (*Agent) CloseTab

func (a *Agent) CloseTab(tabID string) error

CloseTab closes a tab by ID.

func (*Agent) GetTitle

func (a *Agent) GetTitle() string

GetTitle returns the current page title.

func (*Agent) GetURL

func (a *Agent) GetURL() string

GetURL returns the current page URL.

func (*Agent) IsStarted

func (a *Agent) IsStarted() bool

IsStarted returns whether the agent has been started.

func (*Agent) ListTabs

func (a *Agent) ListTabs() []TabInfo

ListTabs returns information about all open tabs.

func (*Agent) Navigate

func (a *Agent) Navigate(ctx context.Context, url string) error

Navigate opens a URL in the browser. This is a convenience method for direct navigation without a task.

func (*Agent) NewTab

func (a *Agent) NewTab(ctx context.Context, url string) (string, error)

NewTab opens a new browser tab at url and returns its tab ID, which can be passed to SwitchTab or CloseTab.

func (*Agent) Run

func (a *Agent) Run(ctx context.Context, task string) (*Result, error)

Run executes a task described in natural language. Returns a Result containing the outcome and execution details.

func (*Agent) Start

func (a *Agent) Start(ctx context.Context) error

Start launches the browser and initializes the agent.

func (*Agent) SwitchTab

func (a *Agent) SwitchTab(tabID string) error

SwitchTab switches to a different tab by ID.

func (*Agent) WithContext

func (a *Agent) WithContext(ctx context.Context) *ContextualAgent

WithContext returns a helper for chaining operations with context.

type Config

type Config struct {
	// APIKey is the Gemini API key (required).
	APIKey string

	// Model is the Gemini model to use. Default: "gemini-2.5-flash".
	Model string

	// Headless runs the browser without a visible window. Default: false.
	Headless bool

	// Debug enables verbose logging. Default: false.
	Debug bool

	// ProfileName specifies a named browser profile for session persistence.
	// Empty string uses a temporary profile that is deleted on close.
	ProfileName string

	// ProfileDir is the directory to store browser profiles.
	// Default: ~/.bua/profiles
	ProfileDir string

	// Viewport sets the browser viewport dimensions.
	// Default: 1280x720
	Viewport *Viewport

	// MaxSteps is the maximum number of agent steps before giving up.
	// Default: 100
	MaxSteps int

	// Preset configures token/quality tradeoffs.
	// Default: PresetBalanced
	Preset Preset

	// MaxTokens is the maximum token budget for context.
	// Set automatically based on Preset if not specified.
	MaxTokens int

	// MaxElements is the maximum number of elements to include in state.
	// Set automatically based on Preset if not specified.
	MaxElements int

	// ScreenshotMaxWidth is the maximum width for screenshots.
	// Set automatically based on Preset if not specified.
	ScreenshotMaxWidth int

	// ScreenshotQuality is the JPEG quality (1-100) for screenshots.
	// Set automatically based on Preset if not specified.
	ScreenshotQuality int

	// TextOnly disables screenshots entirely for minimum token usage.
	// Set automatically based on Preset if not specified.
	TextOnly bool

	// ShowAnnotations displays element indices on the page during execution.
	// Useful for debugging. Default: false.
	ShowAnnotations bool

	// ShowHighlight highlights elements before actions.
	// Default: true.
	ShowHighlight bool

	// HighlightDurationMs is how long to show action highlights, in milliseconds.
	// Default: 300.
	HighlightDurationMs int

	// ScreenshotDir is the directory to save screenshots.
	// Default: system temp directory.
	ScreenshotDir string
}

Config holds agent configuration.

type ContextualAgent

type ContextualAgent struct {
	// contains filtered or unexported fields
}

ContextualAgent wraps Agent with a context for convenience methods.

func (*ContextualAgent) Navigate

func (ca *ContextualAgent) Navigate(url string) error

Navigate opens a URL using the stored context.

func (*ContextualAgent) NewTab

func (ca *ContextualAgent) NewTab(url string) (string, error)

NewTab opens a new tab using the stored context.

func (*ContextualAgent) Run

func (ca *ContextualAgent) Run(task string) (*Result, error)

Run executes a task using the stored context.

type Preset

type Preset string

Preset defines token/quality tradeoffs for different use cases.

const (
	// PresetFast uses text-only mode for lowest token usage.
	PresetFast Preset = "fast"

	// PresetEfficient uses low quality screenshots.
	PresetEfficient Preset = "efficient"

	// PresetBalanced is the default with good balance of quality and cost.
	PresetBalanced Preset = "balanced"

	// PresetQuality uses higher quality screenshots.
	PresetQuality Preset = "quality"

	// PresetMax uses maximum quality for complex pages.
	PresetMax Preset = "max"
)

type Result

type Result struct {
	// Success indicates whether the task completed successfully.
	Success bool

	// Data contains the extracted data or task output.
	// The type depends on what the agent was asked to do.
	Data any

	// Error contains the error message if Success is false.
	Error string

	// Steps contains the sequence of actions taken during execution.
	Steps []Step

	// Duration is the total execution time.
	Duration time.Duration

	// TokensUsed is the approximate number of tokens consumed.
	TokensUsed int

	// ScreenshotPaths contains paths to saved screenshots.
	ScreenshotPaths []string
}

Result represents the outcome of a task execution.

type Step

type Step struct {
	// Number is the step index (1-based).
	Number int

	// Action is the tool that was called (e.g., "click", "type_text").
	Action string

	// Target describes what the action was performed on.
	Target string

	// Thinking contains the agent's reasoning for this step.
	Thinking string

	// Evaluation is the agent's assessment of the previous action.
	Evaluation string

	// NextGoal describes what the agent planned to do.
	NextGoal string

	// Memory contains what the agent chose to remember.
	Memory string

	// URL is the page URL at this step.
	URL string

	// Title is the page title at this step.
	Title string

	// ScreenshotPath is the path to the screenshot for this step.
	ScreenshotPath string

	// Duration is how long this step took.
	Duration time.Duration

	// Error contains any error that occurred during this step.
	Error string
}

Step represents a single action in the execution sequence.

type TabInfo

type TabInfo struct {
	ID     string
	URL    string
	Title  string
	Active bool
}

TabInfo contains information about a browser tab.
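ListTabs returns a []TabInfo, and the IDs it carries are what SwitchTab and CloseTab take. This small sketch picks tabs out of that slice, using a local copy of the struct so it runs standalone. `activeTab` and `findByHost` are hypothetical helpers, and the tab IDs are made up.

```go
package main

import (
	"fmt"
	"net/url"
)

// TabInfo mirrors bua.TabInfo.
type TabInfo struct {
	ID     string
	URL    string
	Title  string
	Active bool
}

// activeTab returns the tab flagged Active, if any.
func activeTab(tabs []TabInfo) (TabInfo, bool) {
	for _, t := range tabs {
		if t.Active {
			return t, true
		}
	}
	return TabInfo{}, false
}

// findByHost returns the ID of the first tab whose URL host equals host exactly.
func findByHost(tabs []TabInfo, host string) (string, bool) {
	for _, t := range tabs {
		u, err := url.Parse(t.URL)
		if err == nil && u.Hostname() == host {
			return t.ID, true
		}
	}
	return "", false
}

func main() {
	tabs := []TabInfo{
		{ID: "tab-1", URL: "https://amazon.com/s?k=keyboard", Title: "Amazon"},
		{ID: "tab-2", URL: "https://ebay.com/sch?q=keyboard", Title: "eBay", Active: true},
	}
	if t, ok := activeTab(tabs); ok {
		fmt.Println("active:", t.ID, t.Title)
	}
	if id, ok := findByHost(tabs, "amazon.com"); ok {
		fmt.Println("switch to:", id) // id would be passed to agent.SwitchTab
	}
}
```

Matching on the parsed hostname rather than a substring avoids false hits such as a search URL that merely mentions another site. Note that "www.ebay.com" and "ebay.com" count as different hosts.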

type Viewport

type Viewport struct {
	Width  int
	Height int
}

Viewport defines browser viewport dimensions.

func DefaultViewport

func DefaultViewport() Viewport

DefaultViewport returns the default viewport size.

Directories

Path                      Synopsis
examples/01_quick_start   command
examples/02_annotations   command
tests/e2e                 command
