browse

v1.3.1
Published: Mar 26, 2026 License: MIT Imports: 19 Imported by: 0

README

Scout

AI-powered browser automation for Go. Pure CDP over WebSocket — no rod, no chromedp, no Node.js.

A single scout binary gives you a full CLI, a 66-tool MCP server, and a Go library with Gin-like middleware composition.

brew install felixgeelhaar/tap/scout

Quick Start

# CLI — visible browser, one-shot commands
scout observe https://example.com          # structured page snapshot
scout markdown https://news.ycombinator.com # page as compact markdown
scout screenshot https://github.com         # save screenshot
scout extract https://example.com h1        # extract element text
scout frameworks https://react.dev          # detect React, Vue, etc.

# MCP Server — give AI agents browser superpowers
claude mcp add scout -- scout mcp serve

# Browser UI — conversational browser automation
scout ui serve --provider=ollama --model=mistral
cd ui && npm install && npm run dev  # open http://localhost:3000

Install

# Homebrew
brew install felixgeelhaar/tap/scout

# Direct binary
curl -fsSL https://raw.githubusercontent.com/felixgeelhaar/scout/main/install.sh | bash

# Go
go install github.com/felixgeelhaar/scout/cmd/scout@latest

# As a library
go get github.com/felixgeelhaar/scout

MCP Server — 66 Tools

Single binary, zero runtime dependencies. Configure in any MCP client:

claude mcp add scout -- scout mcp serve           # Claude Code
{"mcpServers": {"scout": {"command": "scout", "args": ["mcp", "serve"]}}}

Tool Categories

| Category | Tools |
|---|---|
| Navigation | navigate, observe, observe_diff, observe_with_budget |
| Interaction | click, click_label, type, hover, double_click, right_click, select_option, scroll_to, scroll_by, focus, drag_drop, dispatch_event |
| Forms | fill_form, fill_form_semantic, discover_form |
| Extraction | extract, extract_all, extract_table, auto_extract, scroll_and_collect, markdown, readable_text, accessibility_tree |
| Capture | screenshot, annotated_screenshot, pdf |
| Network | enable_network_capture, network_requests |
| Tabs | open_tab, switch_tab, close_tab, list_tabs |
| Frameworks | wait_spa, detect_frameworks, component_state, app_state |
| Playback | start_recording, stop_recording, save_playbook, replay_playbook |
| Smart Helpers | dismiss_cookies, check_readiness, suggest_selectors, session_history |
| Vision | hybrid_observe, find_by_coordinates |
| Batch | execute_batch |
| Iframe | switch_to_frame, switch_to_main_frame |
| Trace | start_trace, stop_trace |
| Diagnostics | detect_dialog, detect_auth_wall, console_errors, compare_tabs, upload_file |
| Utility | has_element, wait_for, configure, web_vitals, select_by_prompt |

All tools have MCP annotations (ReadOnly, OpenWorld, ClosedWorld, Idempotent) for smart auto-approval. Read-only tools like observe, extract, and screenshot run without permission prompts.

Runtime Configuration

Switch between headless and visible browser without restarting:

Agent: configure(headless: false)   → browser window appears
Agent: navigate("https://...")       → watch it work
Agent: configure(headless: true)     → back to headless

Browser UI

A conversational browser automation interface: type natural language and watch the browser respond in real time.

# Start the AG-UI server (Go backend)
scout ui serve --provider=ollama --model=mistral    # local, no API key
scout ui serve --provider=claude                     # needs ANTHROPIC_API_KEY
scout ui serve --provider=openai --model=gpt-4o     # needs OPENAI_API_KEY
scout ui serve --provider=groq --base-url=https://api.groq.com/openai --model=llama-3.3-70b-versatile

# Start the Vue frontend
cd ui && npm install && npm run dev                  # http://localhost:3000

The UI streams AG-UI protocol events over SSE and provides:

  • Chat panel with markdown rendering and quick-action pills
  • Live browser viewport with screenshot streaming and URL bar
  • Activity timeline showing tool calls in real-time
  • Stop button to cancel mid-stream

The Go server runs the agentic loop: the LLM decides which scout tools to call, and the server executes them and streams browser state deltas back to the frontend. Any OpenAI-compatible endpoint is supported via --base-url.

Agent Package

High-level Go API for AI agents. Structured output, auto-wait, goroutine-safe.

session, _ := agent.NewSession(agent.SessionConfig{Headless: true})
defer session.Close()

// Navigate and observe
session.Navigate("https://example.com")
obs, _ := session.Observe()               // links, inputs, buttons, text + action costs

// DOM diff — only what changed (saves 50-80% tokens)
session.Click("#submit")
_, diff, _ := session.ObserveDiff()
// diff.Classification: "modal_appeared"
// diff.Summary: "Modal/dialog appeared: Login required"

// Semantic form filling — no CSS selectors
session.FillFormSemantic(map[string]string{
    "Email": "user@example.com", "Password": "secret",
})

// Visual grounding — click by number, not selector
result, _ := session.AnnotatedScreenshot()  // numbered labels on elements
session.ClickLabel(7)                        // click element [7]

// Multi-tab coordination
session.OpenTab("pricing", "https://example.com/pricing")
session.SwitchTab("default")

// Framework detection (19 frameworks)
frameworks, _ := session.DetectedFrameworks() // ["react", "nextjs"]
state, _ := session.ComponentState("#app")    // read React/Vue state

// Network capture — read API responses directly
session.EnableNetworkCapture("/api/")
captured := session.CapturedRequests("/api/users")

// Action replay — record once, replay without LLM
session.StartRecordingPlaybook("login-flow")
// ... do stuff ...
pb, _ := session.StopRecordingPlaybook()
agent.SavePlaybook(pb, "login.json")
// Later: session.ReplayPlaybook(pb)  // 100x cheaper

// Persistent profiles
session.SaveProfile("session.json")   // cookies + localStorage
session.LoadProfile("session.json")

// Content distillation (5 levels)
session.Markdown()          // ~2-8KB compact markdown
session.ReadableText()      // ~1-4KB main content only
session.AccessibilityTree() // ~1-4KB semantic tree
session.ObserveWithBudget(500) // fit in ~500 tokens

Core Library

Gin-like Engine/Context/Group/HandlerFunc with middleware composition:

engine := browse.Default(browse.WithHeadless(true))
engine.MustLaunch()
defer engine.Close()

engine.Use(middleware.Stealth())
engine.Use(middleware.Retry(middleware.RetryConfig{MaxAttempts: 3}))
engine.Use(middleware.Timeout(30 * time.Second))

admin := engine.Group("admin", middleware.BasicAuth("admin", "secret"))
admin.Task("export", func(c *browse.Context) {
    c.MustNavigate("https://app.example.com/admin")
    table, _ := c.ExtractTable("#users")
    c.Set("data", table)
})

engine.RunGroup("admin")

Middleware

| Category | Middleware |
|---|---|
| Resilience | Retry, Timeout, CircuitBreaker, RateLimit, Bulkhead |
| Auth | BearerAuth, BasicAuth, CookieAuth, HeaderAuth |
| Anti-detection | Stealth (10 patches: webdriver, plugins, WebGL, etc.) |
| Network | BlockResources, WaitNetworkIdle |
| Utilities | ScreenshotOnError, SlowMotion, Viewport |

CLI

The CLI defaults to a visible browser; pass --headless to hide it:

scout navigate <url>                  # page info as JSON
scout observe <url>                   # structured observation
scout markdown <url>                  # compact markdown
scout screenshot <url> [--output f]   # save screenshot
scout pdf <url> [--output f]          # save PDF
scout extract <url> <selector>        # extract text
scout eval <url> <expression>         # run JavaScript
scout form discover <url>             # discover form fields
scout frameworks <url>                # detect frameworks
scout watch <url> [--interval=5s]     # live-watch page changes
scout pipe <command> [selector]       # batch process URLs from stdin
scout record <url> [--output f]       # interactive recording → playbook
scout mcp serve                       # start MCP server
scout version                         # print version

Architecture

scout/
├── browse.go, engine.go, context.go   # Gin-like API
├── page.go, selection.go              # CDP page & element interaction
├── recorder.go                        # Video recording (screencast → MP4/GIF)
├── middleware/                        # stealth, resilience, auth, network
├── agent/                             # AI agent API (50+ methods)
│   ├── session.go                     # Session lifecycle, Navigate, Click, Type
│   ├── observe.go, diff.go            # Observe, ObserveDiff, cost estimation
│   ├── content.go                     # Markdown, ReadableText, AccessibilityTree
│   ├── form.go                        # DiscoverForm, FillFormSemantic, MatchFormField
│   ├── annotate.go                    # AnnotatedScreenshot, ClickLabel
│   ├── network.go                     # EnableNetworkCapture, CapturedRequests
│   ├── spa.go                         # DetectedFrameworks, ComponentState, GetAppState
│   ├── tabs.go                        # OpenTab, SwitchTab, CloseTab, ListTabs
│   ├── playbook.go                    # StartRecording, ReplayPlaybook, SavePlaybook
│   ├── interact.go                    # Hover, DragDrop, SelectOption, ScrollTo
│   ├── profile.go                     # CaptureProfile, ApplyProfile, SaveProfile
│   ├── selector.go                    # Playwright :text() selector translation
│   ├── budget.go                      # ObserveWithBudget, EstimateTokens
│   ├── nlselect.go                    # SelectByPrompt, fuzzy NL element matching
│   ├── batch.go                       # ExecuteBatch, sequential multi-action
│   ├── vision.go                      # HybridObserve, FindByCoordinates
│   ├── trace.go                       # StartTrace, StopTrace, action tracing
│   ├── iframe.go                      # SwitchToFrame, SwitchToMainFrame
│   └── vitals.go                      # WebVitals (LCP/CLS/INP)
├── internal/cdp/                      # WebSocket CDP client (context-aware)
├── internal/launcher/                 # Chrome process management
├── cmd/scout/                         # CLI + MCP server (66 tools)
└── docs/                              # Landing page (GitHub Pages)

License

MIT

Documentation

Overview

Package browse provides a Gin-like API for browser automation using pure CDP over WebSocket.

browse-go applies Gin's middleware/context/group patterns to browser automation, giving Go developers a familiar, composable way to script browser interactions.

engine := browse.Default(browse.WithHeadless(true))
engine.MustLaunch()
defer engine.Close()

engine.Task("search", func(c *browse.Context) {
    c.MustNavigate("https://example.com")
    c.El("input[name=q]").MustInput("hello")
    c.El("button[type=submit]").MustClick()
})

engine.Run("search")

Index

Constants

const (
	EventStart   statekit.EventType = "START"
	EventSuccess statekit.EventType = "SUCCESS"
	EventFail    statekit.EventType = "FAIL"
	EventRetry   statekit.EventType = "RETRY"
	EventAbort   statekit.EventType = "ABORT"
	EventReset   statekit.EventType = "RESET"
)

Task lifecycle events

const (
	StatePending  statekit.StateID = "pending"
	StateRunning  statekit.StateID = "running"
	StateSuccess  statekit.StateID = "success"
	StateFailed   statekit.StateID = "failed"
	StateAborted  statekit.StateID = "aborted"
	StateRetrying statekit.StateID = "retrying"
)

Task lifecycle states

Variables

var DefaultURLValidator = URLValidator{AllowPrivateIPs: false}

DefaultURLValidator blocks private IPs and non-http(s) schemes.
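
The kind of check described above can be sketched with the standard library alone. This is a minimal illustration, not scout's implementation: validateURL is a hypothetical helper, it only inspects literal IP hosts (no DNS resolution), and the exact set of address classes it rejects is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// validateURL is a hypothetical helper, not part of the browse package.
// It rejects non-http(s) schemes and, unless allowPrivate is true,
// literal private, loopback, link-local, and unspecified IP hosts.
func validateURL(raw string, allowPrivate bool) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("scheme %q is not allowed", u.Scheme)
	}
	host := u.Hostname()
	if host == "" {
		return errors.New("missing host")
	}
	// Only literal IPs are checked. A hostname that resolves to a private
	// address would need a DNS lookup, which this sketch skips.
	if ip := net.ParseIP(host); ip != nil && !allowPrivate {
		if ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
			return fmt.Errorf("private address %s is blocked", ip)
		}
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"https://example.com",
		"http://127.0.0.1:8080",
		"file:///etc/passwd",
	} {
		fmt.Printf("%-24s allowed=%v\n", raw, validateURL(raw, false) == nil)
	}
}
```

In this sketch, passing allowPrivate=true plays the role that AllowPrivateIPs plays on the documented URLValidator struct.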

Functions

func NewTaskLifecycle

func NewTaskLifecycle(taskName string) (*statekit.MachineConfig[TaskLifecycleContext], error)

NewTaskLifecycle creates a statekit machine modeling the task execution lifecycle.

pending → START → running
running → SUCCESS → success (final)
running → FAIL → failed (final)
running → ABORT → aborted (final)
running → RETRY → retrying
retrying → START → running
failed → RESET → pending
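
The transition list above can be read as a lookup table. The following sketch encodes it as a plain Go map rather than a statekit machine; the transitionKey type and the next function are illustrative names, not part of the package.

```go
package main

import "fmt"

// transitionKey pairs a current state with an incoming event.
type transitionKey struct{ state, event string }

// transitions copies the documented lifecycle table one row per entry.
var transitions = map[transitionKey]string{
	{"pending", "START"}:   "running",
	{"running", "SUCCESS"}: "success",
	{"running", "FAIL"}:    "failed",
	{"running", "ABORT"}:   "aborted",
	{"running", "RETRY"}:   "retrying",
	{"retrying", "START"}:  "running",
	{"failed", "RESET"}:    "pending",
}

// next returns the target state, or false when the event is not valid
// in the given state.
func next(state, event string) (string, bool) {
	to, ok := transitions[transitionKey{state, event}]
	return to, ok
}

func main() {
	state := "pending"
	for _, ev := range []string{"START", "RETRY", "START", "SUCCESS"} {
		to, ok := next(state, ev)
		if !ok {
			fmt.Printf("%s: event %s rejected\n", state, ev)
			return
		}
		fmt.Printf("%s --%s--> %s\n", state, ev, to)
		state = to
	}
}
```

Any pair missing from the map, such as START in the success state, is rejected, which is how the final states stay final.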

func SetLogger

func SetLogger(l *bolt.Logger)

SetLogger replaces the default logger used by the Logger and Recovery middleware.

Types

type Browser

type Browser interface {
	NewPage() (*Page, error)
	NewPageAt(url string) (*Page, error)
	ExistingPage() (*Page, error)
	Close() error
}

Browser is the interface for browser lifecycle management. Engine implements this. The agent package depends on this interface rather than on *Engine directly, enabling testing without a real browser.

type BulkheadFullError

type BulkheadFullError struct {
	TaskName string
}

BulkheadFullError is returned when the bulkhead is at capacity.

func (*BulkheadFullError) Error

func (e *BulkheadFullError) Error() string

type CircuitOpenError

type CircuitOpenError struct {
	TaskName string
}

CircuitOpenError is returned when the circuit breaker is open.

func (*CircuitOpenError) Error

func (e *CircuitOpenError) Error() string

type ClipRegion

type ClipRegion struct {
	X, Y, Width, Height float64
}

ClipRegion defines a rectangular area for screenshot clipping.

type Context

type Context struct {
	// contains filtered or unexported fields
}

Context carries the page state, middleware chain, and data for a single task execution.

func NewTestContext

func NewTestContext(taskName string, handlers HandlersChain) *Context

NewTestContext creates a Context without a page for testing middleware chains.

func (*Context) Abort

func (c *Context) Abort()

Abort stops the middleware chain from continuing.

func (*Context) AbortWithError

func (c *Context) AbortWithError(err error)

AbortWithError stops the chain and records an error.

func (*Context) Cookies

func (c *Context) Cookies() ([]Cookie, error)

Cookies returns all cookies for the current page.

func (*Context) El

func (c *Context) El(selector string) *Selection

El returns a Selection for the first element matching the CSS selector.

func (*Context) ElAll

func (c *Context) ElAll(selector string) *SelectionAll

ElAll returns a SelectionAll for all elements matching the CSS selector.

func (*Context) Errors

func (c *Context) Errors() []error

Errors returns all errors recorded on this context.

func (*Context) Eval

func (c *Context) Eval(js string) (any, error)

Eval executes JavaScript on the page and returns the result.

func (*Context) ExtractTable

func (c *Context) ExtractTable(tableSelector string) (*Table, error)

ExtractTable extracts data from an HTML table element. It reads <th> cells for headers and <td> cells for row data.

func (*Context) FillForm

func (c *Context) FillForm(fields map[string]string) error

FillForm fills multiple form fields at once. The map keys are CSS selectors, values are the text to input.

func (*Context) Get

func (c *Context) Get(key string) (any, bool)

Get retrieves a value by key. The second return value indicates existence.

func (*Context) GetString

func (c *Context) GetString(key string) string

GetString returns the string value for key, or "" if not found.

func (*Context) GoContext

func (c *Context) GoContext() context.Context

GoContext returns the underlying context.Context for cancellation propagation. Use this in middleware that wraps fortify or other context-aware libraries.

func (*Context) HTML

func (c *Context) HTML() (string, error)

HTML returns the full page HTML.

func (*Context) HasEl

func (c *Context) HasEl(selector string) bool

HasEl checks whether at least one element matches the selector.

func (*Context) IsAborted

func (c *Context) IsAborted() bool

IsAborted returns whether the chain has been aborted.

func (*Context) MustNavigate

func (c *Context) MustNavigate(url string) *Context

MustNavigate calls Navigate and panics on error.

func (*Context) Navigate

func (c *Context) Navigate(url string) error

Navigate loads the given URL and waits for the page to be ready.

func (*Context) Next

func (c *Context) Next()

Next calls the next handler in the middleware chain.
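
How Next and Abort interact is easiest to see on a stripped-down chain. The sketch below follows the Gin-style pattern the package describes; the ctx type and the logger, allow, deny, and task handlers are hypothetical stand-ins, and the real Context may track its position differently.

```go
package main

import "fmt"

// ctx is a hypothetical stand-in for browse.Context that keeps only the
// bookkeeping a handler chain needs.
type ctx struct {
	handlers []func(*ctx)
	index    int
	aborted  bool
	trace    []string
}

// Next runs the remaining handlers in order and stops once the chain is aborted.
func (c *ctx) Next() {
	c.index++
	for c.index < len(c.handlers) && !c.aborted {
		c.handlers[c.index](c)
		c.index++
	}
}

// Abort prevents handlers that have not started yet from running.
func (c *ctx) Abort() {
	c.aborted = true
}

// logger wraps everything downstream: it runs code before and after Next.
func logger(c *ctx) {
	c.trace = append(c.trace, "logger:before")
	c.Next()
	c.trace = append(c.trace, "logger:after")
}

// allow lets the chain continue without doing anything.
func allow(c *ctx) {}

// deny records the refusal and aborts the chain.
func deny(c *ctx) {
	c.trace = append(c.trace, "denied")
	c.Abort()
}

// task is the final handler.
func task(c *ctx) {
	c.trace = append(c.trace, "task")
}

// run executes the handlers as one chain and returns the recorded trace.
func run(handlers ...func(*ctx)) []string {
	c := &ctx{handlers: handlers, index: -1}
	c.Next()
	return c.trace
}

func main() {
	fmt.Println(run(logger, allow, task)) // [logger:before task logger:after]
	fmt.Println(run(logger, deny, task))  // [logger:before denied logger:after]
}
```

Note that an aborted chain still unwinds: logger's code after Next runs in both cases, only the task is skipped.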

func (*Context) PDF

func (c *Context) PDF() ([]byte, error)

PDF generates a PDF of the current page with default options.

func (*Context) PDFTo

func (c *Context) PDFTo(path string) error

PDFTo generates a PDF and writes it to the given file path.

func (*Context) Page

func (c *Context) Page() *Page

Page returns the underlying Page for advanced usage.

func (*Context) RestoreIndex

func (c *Context) RestoreIndex(idx int)

RestoreIndex resets the handler chain position for re-execution. Used by resilience middleware to replay the downstream handler chain.

func (*Context) SaveIndex

func (c *Context) SaveIndex() int

SaveIndex returns the current handler index so it can be restored for retry. Used by resilience middleware (retry, timeout, circuit breaker, bulkhead).
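
A retry middleware built on SaveIndex and RestoreIndex could look roughly like the sketch below. The ctx type, the retry and flaky helpers, and the rule "an attempt failed if it recorded a new error" are all assumptions made for illustration; the shipped Retry middleware may decide success differently.

```go
package main

import (
	"errors"
	"fmt"
)

// ctx is a hypothetical stand-in for browse.Context.
type ctx struct {
	handlers []func(*ctx)
	index    int
	errs     []error
}

// Next runs the remaining handlers in order.
func (c *ctx) Next() {
	c.index++
	for c.index < len(c.handlers) {
		c.handlers[c.index](c)
		c.index++
	}
}

// SaveIndex returns the current chain position.
func (c *ctx) SaveIndex() int {
	return c.index
}

// RestoreIndex rewinds the chain to a saved position.
func (c *ctx) RestoreIndex(idx int) {
	c.index = idx
}

// retry replays the downstream handlers until an attempt records no new
// error or maxAttempts is reached. Errors from failed attempts are
// discarded except for the last one.
func retry(maxAttempts int) func(*ctx) {
	return func(c *ctx) {
		saved := c.SaveIndex()
		for attempt := 1; attempt <= maxAttempts; attempt++ {
			before := len(c.errs)
			c.RestoreIndex(saved)
			c.Next()
			if len(c.errs) == before {
				return // downstream succeeded
			}
			if attempt < maxAttempts {
				c.errs = c.errs[:before]
			}
		}
	}
}

// flaky returns a handler that fails on its first `failures` calls.
func flaky(failures int, calls *int) func(*ctx) {
	return func(c *ctx) {
		*calls++
		if *calls <= failures {
			c.errs = append(c.errs, errors.New("transient failure"))
		}
	}
}

// run executes the handlers as one chain and returns the context.
func run(handlers ...func(*ctx)) *ctx {
	c := &ctx{handlers: handlers, index: -1}
	c.Next()
	return c
}

func main() {
	calls := 0
	c := run(retry(3), flaky(2, &calls))
	fmt.Printf("calls=%d errors=%d\n", calls, len(c.errs)) // calls=3 errors=0
}
```

Because each replay leaves the index past the end of the chain, the outer loop that invoked the retry handler does not run the downstream handlers a further time.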

func (*Context) Screenshot

func (c *Context) Screenshot() ([]byte, error)

Screenshot captures the full page as PNG bytes.

func (*Context) ScreenshotElement

func (c *Context) ScreenshotElement(selector string) ([]byte, error)

ScreenshotElement captures a screenshot of a single element.

func (*Context) ScreenshotFullPage

func (c *Context) ScreenshotFullPage() ([]byte, error)

ScreenshotFullPage captures the entire scrollable page as PNG bytes.

func (*Context) ScreenshotTo

func (c *Context) ScreenshotTo(path string) error

ScreenshotTo captures a screenshot and writes it to the given file path.

func (*Context) Set

func (c *Context) Set(key string, value any)

Set stores a key-value pair on the context.

func (*Context) SetCookie

func (c *Context) SetCookie(cookie Cookie) error

SetCookie sets a cookie on the current page.

func (*Context) StartRecording

func (c *Context) StartRecording(opts RecorderOptions) (*Recorder, error)

StartRecording begins capturing screencast frames for video. Returns a Recorder that must be stopped and saved.

func (*Context) TaskName

func (c *Context) TaskName() string

TaskName returns the name of the currently executing task.

func (*Context) URL

func (c *Context) URL() string

URL returns the current page URL.

func (*Context) WaitLoad

func (c *Context) WaitLoad() error

WaitLoad waits for the page load event.

func (*Context) WaitNavigation

func (c *Context) WaitNavigation() error

WaitNavigation waits for a navigation to complete after performing an action. Call this after clicking a link that triggers a page load.

func (*Context) WaitSelector

func (c *Context) WaitSelector(selector string) error

WaitSelector waits until an element matching the selector appears in the DOM.

func (*Context) WaitStable

func (c *Context) WaitStable() error

WaitStable waits until the page DOM is stable.

type Cookie

type Cookie struct {
	Name     string  `json:"name"`
	Value    string  `json:"value"`
	Domain   string  `json:"domain,omitempty"`
	Path     string  `json:"path,omitempty"`
	Expires  float64 `json:"expires,omitempty"`
	Secure   bool    `json:"secure,omitempty"`
	HTTPOnly bool    `json:"httpOnly,omitempty"`
	SameSite string  `json:"sameSite,omitempty"`
}

Cookie represents a browser cookie.

type ElementNotFoundError

type ElementNotFoundError struct {
	Selector string
}

ElementNotFoundError is returned when a selector matches no elements.

func (*ElementNotFoundError) Error

func (e *ElementNotFoundError) Error() string

type Engine

type Engine struct {
	// contains filtered or unexported fields
}

Engine manages the browser lifecycle, global middleware, and task registry.

func Default

func Default(opts ...Option) *Engine

Default creates a new Engine with Logger and Recovery middleware attached.

func New

func New(opts ...Option) *Engine

New creates a new Engine with no middleware attached.

func (*Engine) Close

func (e *Engine) Close() error

Close shuts down the browser and releases resources.

func (*Engine) ExistingPage added in v1.2.1

func (e *Engine) ExistingPage() (*Page, error)

ExistingPage attaches to an existing browser page (e.g. the initial about:blank tab). Returns nil if no existing page target is found.

func (*Engine) Group

func (e *Engine) Group(name string, middleware ...HandlerFunc) *Group

Group creates a named group with optional middleware.

func (*Engine) Launch

func (e *Engine) Launch() error

Launch starts the browser or connects to a remote CDP endpoint. If WithRemoteCDP was set, connects to the remote WebSocket URL instead of launching Chrome.

func (*Engine) MustLaunch

func (e *Engine) MustLaunch() *Engine

MustLaunch calls Launch and panics on error.

func (*Engine) NewPage

func (e *Engine) NewPage() (*Page, error)

NewPage creates a new browser page/tab at about:blank.

func (*Engine) NewPageAt added in v0.9.1

func (e *Engine) NewPageAt(url string) (*Page, error)

NewPageAt creates a new browser page/tab and navigates directly to the URL. Faster than NewPage + Navigate because Chrome loads the URL during target creation.

func (*Engine) Run

func (e *Engine) Run(taskName string) error

Run executes a single task by name.

func (*Engine) RunAll

func (e *Engine) RunAll() error

RunAll executes all registered tasks (root and groups). When WithPoolSize is set, tasks run concurrently up to the pool limit. Otherwise, tasks run sequentially.

func (*Engine) RunGroup

func (e *Engine) RunGroup(groupName string) error

RunGroup executes all tasks within a named group.

func (*Engine) Task

func (e *Engine) Task(name string, handlers ...HandlerFunc)

Task registers a named task with handlers on the engine (root group).

func (*Engine) Use

func (e *Engine) Use(middleware ...HandlerFunc)

Use appends global middleware to the engine.

type Group

type Group struct {
	// contains filtered or unexported fields
}

Group represents a named collection of tasks with shared middleware.

func (*Group) Group

func (g *Group) Group(name string, middleware ...HandlerFunc) *Group

Group creates a sub-group that inherits this group's middleware.

func (*Group) Task

func (g *Group) Task(name string, handlers ...HandlerFunc)

Task registers a named task within this group. The final handler chain is: engine middleware + group middleware + task handlers.

func (*Group) Use

func (g *Group) Use(middleware ...HandlerFunc)

Use appends middleware to this group. Group middleware runs after engine middleware.

type HandlerFunc

type HandlerFunc func(*Context)

HandlerFunc defines the handler function signature for middleware and tasks.

func Logger

func Logger() HandlerFunc

Logger returns middleware that logs task execution using bolt.

func Recovery

func Recovery() HandlerFunc

Recovery returns middleware that recovers from panics and records the error.

type HandlersChain

type HandlersChain []HandlerFunc

HandlersChain is a slice of HandlerFunc used to build middleware chains.

type NavigationError

type NavigationError struct {
	URL string
	Err error
}

NavigationError is returned when page navigation fails.

func (*NavigationError) Error

func (e *NavigationError) Error() string

func (*NavigationError) Unwrap

func (e *NavigationError) Unwrap() error
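
Because NavigationError exposes Unwrap, callers can inspect it with errors.As and errors.Is even after further wrapping. The sketch below redeclares a local type with the documented fields so it runs on its own; the Error message format, the errTimeout sentinel, and the classify helper are assumptions, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// NavigationError mirrors the documented struct. The message format
// produced by Error is assumed.
type NavigationError struct {
	URL string
	Err error
}

func (e *NavigationError) Error() string {
	return fmt.Sprintf("navigate %s: %v", e.URL, e.Err)
}

func (e *NavigationError) Unwrap() error {
	return e.Err
}

// errTimeout is a hypothetical underlying cause.
var errTimeout = errors.New("timeout")

// classify shows how a caller might branch on the error chain.
func classify(err error) string {
	var navErr *NavigationError
	if errors.As(err, &navErr) {
		if errors.Is(err, errTimeout) {
			return "timeout loading " + navErr.URL
		}
		return "navigation failed for " + navErr.URL
	}
	return "other"
}

func main() {
	err := fmt.Errorf("task export: %w", &NavigationError{URL: "https://example.com", Err: errTimeout})
	fmt.Println(classify(err)) // timeout loading https://example.com
}
```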

type Option

type Option func(*options)

Option configures an Engine.

func WithAllowPrivateIPs

func WithAllowPrivateIPs(allow bool) Option

WithAllowPrivateIPs permits navigation to private/loopback IP addresses. By default, navigation to private IPs is blocked to prevent SSRF. Enable for testing or internal network automation.

func WithHeadless

func WithHeadless(h bool) Option

WithHeadless sets whether the browser runs in headless mode.

func WithPoolSize

func WithPoolSize(n int) Option

WithPoolSize sets the number of reusable pages in the page pool. When > 0, RunAll executes tasks concurrently up to this limit. Default 0 means sequential execution with no pooling.
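
The "concurrent up to this limit" behaviour is commonly built from a buffered channel used as a semaphore. The sketch below shows that pattern in isolation; runAll and peakConcurrency are hypothetical helpers, and the engine's actual page pool is not modelled.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runAll is a hypothetical scheduler. With poolSize <= 0 it runs tasks
// sequentially; otherwise at most poolSize tasks run at the same time.
func runAll(tasks []func(), poolSize int) {
	if poolSize <= 0 {
		for _, task := range tasks {
			task()
		}
		return
	}
	sem := make(chan struct{}, poolSize) // one slot per pooled page
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while the pool is full
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }()
			task()
		}(task)
	}
	wg.Wait()
}

// peakConcurrency runs n short tasks and reports the highest number that
// were observed running at once.
func peakConcurrency(n, poolSize int) int64 {
	var running, peak int64
	tasks := make([]func(), n)
	for i := range tasks {
		tasks[i] = func() {
			cur := atomic.AddInt64(&running, 1)
			for {
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			time.Sleep(5 * time.Millisecond)
			atomic.AddInt64(&running, -1)
		}
	}
	runAll(tasks, poolSize)
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("pool of 3, peak:", peakConcurrency(8, 3))
	fmt.Println("no pool, peak:  ", peakConcurrency(4, 0))
}
```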

func WithProxy

func WithProxy(proxy string) Option

WithProxy routes browser traffic through the specified proxy server. Format: "http://host:port" or "socks5://host:port".

func WithRemoteCDP

func WithRemoteCDP(wsURL string) Option

WithRemoteCDP connects to an already-running Chrome instance via WebSocket URL instead of launching a local browser. Use with Browserbase, Steel, or self-hosted Chrome.

engine := browse.New(browse.WithRemoteCDP("ws://localhost:9222/devtools/browser/..."))

func WithSlowMotion

func WithSlowMotion(d time.Duration) Option

WithSlowMotion adds artificial delay between actions.

func WithTimeout

func WithTimeout(d time.Duration) Option

WithTimeout sets the default timeout for browser operations.

func WithUserAgent

func WithUserAgent(ua string) Option

WithUserAgent sets a custom User-Agent string for all pages.

func WithViewport

func WithViewport(width, height int) Option

WithViewport sets the browser viewport dimensions.

type PDFOptions

type PDFOptions struct {
	Landscape       bool
	PrintBackground bool
	Scale           float64
	PaperWidth      float64 // inches, default 8.5
	PaperHeight     float64 // inches, default 11
	MarginTop       float64 // inches, default 0.4
	MarginBottom    float64 // inches, default 0.4
	MarginLeft      float64 // inches, default 0.4
	MarginRight     float64 // inches, default 0.4
	PageRanges      string  // e.g. "1-5", "1,3,5-7"
}

PDFOptions configures PDF generation.
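
To make the PageRanges format concrete, the sketch below expands a range string into page numbers. It is purely illustrative: parsePageRanges is a hypothetical helper, and in practice the string is presumably handed to Chrome, which does its own parsing.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePageRanges expands a spec such as "1,3,5-7" into individual
// one-based page numbers. Hypothetical helper for illustration only.
func parsePageRanges(spec string) ([]int, error) {
	var pages []int
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part)
		lo, hi, isRange := strings.Cut(part, "-")
		first, err := strconv.Atoi(strings.TrimSpace(lo))
		if err != nil {
			return nil, fmt.Errorf("invalid page %q", part)
		}
		last := first
		if isRange {
			if last, err = strconv.Atoi(strings.TrimSpace(hi)); err != nil {
				return nil, fmt.Errorf("invalid range %q", part)
			}
		}
		if first < 1 || last < first {
			return nil, fmt.Errorf("invalid range %q", part)
		}
		for p := first; p <= last; p++ {
			pages = append(pages, p)
		}
	}
	return pages, nil
}

func main() {
	pages, err := parsePageRanges("1,3,5-7")
	fmt.Println(pages, err) // [1 3 5 6 7] <nil>
}
```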

type Page

type Page struct {
	// contains filtered or unexported fields
}

Page wraps a CDP session for a single browser tab.

func (*Page) Call

func (p *Page) Call(method string, params any) (json.RawMessage, error)

Call sends a raw CDP command scoped to this page's session. This is an escape hatch for advanced CDP operations not covered by the Page API.

func (*Page) Close

func (p *Page) Close() error

Close closes the page/tab and cleans up resources.

func (*Page) Cookies

func (p *Page) Cookies() ([]Cookie, error)

Cookies returns all cookies for the current page.

func (*Page) Evaluate

func (p *Page) Evaluate(expression string) (any, error)

Evaluate executes JavaScript and returns the result value.

func (*Page) HTML

func (p *Page) HTML() (string, error)

HTML returns the full page HTML.

func (*Page) Navigate

func (p *Page) Navigate(rawURL string) error

Navigate loads the given URL and waits for the page to finish loading. Only http:// and https:// URLs are allowed. Private IPs are blocked by default.

func (*Page) OnSession

func (p *Page) OnSession(method string, handler func(params map[string]any)) func()

OnSession registers an event handler scoped to this page's session. Events from other pages/sessions are filtered out. Returns an unsubscribe function to remove the handler.
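
The subscribe-and-return-an-unsubscribe-function shape can be sketched with a small in-memory event bus. Everything here (the bus, subscription, and event types and their methods) is hypothetical and only illustrates session-scoped filtering; it does not talk to a browser.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a simplified CDP event tagged with the session it came from.
type event struct {
	sessionID string
	method    string
	params    map[string]any
}

// subscription records which session and method a handler is scoped to.
type subscription struct {
	sessionID string
	method    string
	fn        func(params map[string]any)
}

// bus is a hypothetical dispatcher shared by all pages.
type bus struct {
	mu       sync.Mutex
	nextID   int
	handlers map[int]subscription
}

func newBus() *bus {
	return &bus{handlers: make(map[int]subscription)}
}

// on registers fn for one session and method and returns a function
// that removes the registration again.
func (b *bus) on(sessionID, method string, fn func(params map[string]any)) func() {
	b.mu.Lock()
	defer b.mu.Unlock()
	id := b.nextID
	b.nextID++
	b.handlers[id] = subscription{sessionID: sessionID, method: method, fn: fn}
	return func() {
		b.mu.Lock()
		defer b.mu.Unlock()
		delete(b.handlers, id)
	}
}

// dispatch delivers ev only to handlers registered for its session and method.
func (b *bus) dispatch(ev event) {
	b.mu.Lock()
	var matched []func(map[string]any)
	for _, s := range b.handlers {
		if s.sessionID == ev.sessionID && s.method == ev.method {
			matched = append(matched, s.fn)
		}
	}
	b.mu.Unlock()
	// Handlers run outside the lock so they may unsubscribe themselves.
	for _, fn := range matched {
		fn(ev.params)
	}
}

func main() {
	b := newBus()
	delivered := 0
	off := b.on("session-1", "Page.loadEventFired", func(map[string]any) { delivered++ })

	b.dispatch(event{sessionID: "session-1", method: "Page.loadEventFired"})
	b.dispatch(event{sessionID: "session-2", method: "Page.loadEventFired"}) // filtered out
	off()
	b.dispatch(event{sessionID: "session-1", method: "Page.loadEventFired"}) // unsubscribed

	fmt.Println("delivered:", delivered) // delivered: 1
}
```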

func (*Page) PDF

func (p *Page) PDF() ([]byte, error)

PDF generates a PDF of the current page with default options.

func (*Page) PDFWithOptions

func (p *Page) PDFWithOptions(opts PDFOptions) ([]byte, error)

PDFWithOptions generates a PDF with the given options.

func (*Page) QuerySelector

func (p *Page) QuerySelector(selector string) (int64, error)

QuerySelector finds the first element matching the CSS selector and returns its node ID.

func (*Page) QuerySelectorAll

func (p *Page) QuerySelectorAll(selector string) ([]int64, error)

QuerySelectorAll finds all elements matching the CSS selector.

func (*Page) QuerySelectorPiercing added in v0.5.0

func (p *Page) QuerySelectorPiercing(selector string) (int64, error)

QuerySelectorPiercing finds the first element matching the selector, piercing through shadow DOM boundaries. Uses DOM.getFlattenedDocument with pierce:true for a single-call flattened DOM traversal, falling back to JS-based search if the flattened approach finds no match.

func (*Page) ResolveNode

func (p *Page) ResolveNode(nodeID int64) (string, error)

ResolveNode resolves a DOM nodeId to a Runtime remote object ID.

func (*Page) Screenshot

func (p *Page) Screenshot() ([]byte, error)

Screenshot captures the page as a PNG image (viewport only, no size limit).

func (*Page) ScreenshotCompact

func (p *Page) ScreenshotCompact() ([]byte, error)

ScreenshotCompact captures the page with a 5MB size limit. Automatically switches to JPEG and downscales if needed. Use this for LLM/agent contexts where size matters.

func (*Page) ScreenshotElement

func (p *Page) ScreenshotElement(nodeID int64) ([]byte, error)

ScreenshotElement captures a screenshot of a specific element by its node ID.

func (*Page) ScreenshotFullPage

func (p *Page) ScreenshotFullPage() ([]byte, error)

ScreenshotFullPage captures the entire scrollable page as a PNG image.

func (*Page) ScreenshotWithOptions

func (p *Page) ScreenshotWithOptions(opts ScreenshotOptions) ([]byte, error)

ScreenshotWithOptions captures the page with the given options. If MaxSize is set and the result exceeds it, the image is automatically re-captured with progressive quality/resolution reduction.

func (*Page) SetCookie

func (p *Page) SetCookie(c Cookie) error

SetCookie sets a cookie on the page.

func (*Page) SetUserAgent

func (p *Page) SetUserAgent(ua string) error

SetUserAgent sets the user agent string for this page.

func (*Page) SetViewport

func (p *Page) SetViewport(width, height int) error

SetViewport sets the page viewport dimensions.

func (*Page) URL

func (p *Page) URL() (string, error)

URL returns the current page URL.

func (*Page) WaitForSelector

func (p *Page) WaitForSelector(selector string) error

WaitForSelector waits until an element matching the selector exists in the DOM.

func (*Page) WaitLoad

func (p *Page) WaitLoad() error

WaitLoad waits for the page load event (document.readyState == "complete").

func (*Page) WaitStable

func (p *Page) WaitStable(d time.Duration) error

WaitStable waits until no DOM mutations occur for the given duration. Has a hard timeout of max(d*3, 3s) to prevent hanging on SPAs with constant updates.
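
The timing rule above can be sketched with two timers: a quiet timer that restarts on every mutation and a hard deadline of max(d*3, 3s). In this sketch the mutations channel stands in for DOM mutation notifications; hardTimeout and waitStable are hypothetical names, and how the library actually observes mutations is not shown.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// hardTimeout returns max(d*3, 3s), the documented upper bound.
func hardTimeout(d time.Duration) time.Duration {
	if t := 3 * d; t > 3*time.Second {
		return t
	}
	return 3 * time.Second
}

// waitStable returns nil once no value arrives on mutations for d, or an
// error when the hard timeout expires first.
func waitStable(mutations <-chan struct{}, d time.Duration) error {
	deadline := time.NewTimer(hardTimeout(d))
	defer deadline.Stop()
	quiet := time.NewTimer(d)
	defer quiet.Stop()

	for {
		select {
		case <-mutations:
			// A mutation restarts the quiet period.
			if !quiet.Stop() {
				select {
				case <-quiet.C:
				default:
				}
			}
			quiet.Reset(d)
		case <-quiet.C:
			return nil
		case <-deadline.C:
			return errors.New("page did not become stable before the hard timeout")
		}
	}
}

func main() {
	fmt.Println(hardTimeout(500 * time.Millisecond)) // 3s
	fmt.Println(hardTimeout(2 * time.Second))        // 6s

	mutations := make(chan struct{})
	go func() {
		for i := 0; i < 3; i++ {
			time.Sleep(5 * time.Millisecond)
			mutations <- struct{}{}
		}
	}()
	fmt.Println(waitStable(mutations, 20*time.Millisecond)) // <nil>
}
```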

type RateLimitError

type RateLimitError struct {
	TaskName string
}

RateLimitError is returned when a task is rejected by the rate limiter.

func (*RateLimitError) Error

func (e *RateLimitError) Error() string

type Recorder

type Recorder struct {
	// contains filtered or unexported fields
}

Recorder captures screencast frames from a page and assembles them into a video.

func NewRecorder

func NewRecorder(page *Page, opts RecorderOptions) (*Recorder, error)

NewRecorder creates a recorder for the given page. Frames are saved to a temporary directory until Stop is called.

func (*Recorder) Cleanup

func (r *Recorder) Cleanup() error

Cleanup removes the temporary frames directory.

func (*Recorder) FrameCount

func (r *Recorder) FrameCount() int64

FrameCount returns the number of frames captured so far.

func (*Recorder) Frames

func (r *Recorder) Frames() ([]string, error)

Frames returns all captured frame file paths.

func (*Recorder) FramesDir

func (r *Recorder) FramesDir() string

FramesDir returns the directory containing captured frames.

func (*Recorder) SaveGIF

func (r *Recorder) SaveGIF(outputPath string, fps int) error

SaveGIF assembles captured frames into an animated GIF using ffmpeg.

func (*Recorder) SaveVideo

func (r *Recorder) SaveVideo(outputPath string, fps int) error

SaveVideo assembles captured frames into an MP4 video using ffmpeg and writes it to outputPath. Requires ffmpeg to be installed on the system.

func (*Recorder) Start

func (r *Recorder) Start() error

Start begins capturing screencast frames.

func (*Recorder) Stop

func (r *Recorder) Stop() error

Stop ends the screencast capture and waits for in-flight frame acks to complete.

type RecorderOptions

type RecorderOptions struct {
	// Format is "jpeg" (default, smaller) or "png" (lossless).
	Format string
	// Quality is JPEG quality 1-100. Default 80.
	Quality int
	// MaxWidth limits the frame width. 0 means no limit.
	MaxWidth int
	// MaxHeight limits the frame height. 0 means no limit.
	MaxHeight int
}

RecorderOptions configures video recording.

type ScreenshotOptions

type ScreenshotOptions struct {
	// Format is "png" (default) or "jpeg".
	Format string
	// Quality is JPEG quality 1-100. Ignored for PNG.
	Quality int
	// FullPage captures the entire scrollable page, not just the viewport.
	FullPage bool
	// Clip captures a specific region of the page.
	Clip *ClipRegion
	// MaxSize is the maximum allowed size in bytes. If the screenshot exceeds
	// this limit, it is automatically re-captured as JPEG with progressively
	// lower quality and downscaled resolution until it fits.
	// 0 means no limit. Recommended: 5*1024*1024 (5MB) for LLM contexts.
	MaxSize int
	// MaxWidth downscales the capture to this width if set. Height scales proportionally.
	// Applied via CDP's clip.scale parameter. 0 means no downscaling.
	MaxWidth int
}

ScreenshotOptions configures screenshot capture.
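
The MaxSize behaviour described above (re-encoding as JPEG at progressively lower quality until the result fits) can be sketched with the standard library alone. This is not the package's implementation: the starting quality of 90, the step of 10, and the helper names fitJPEG and gradient are assumptions, and the downscaling step is omitted.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
)

// gradient builds a small synthetic test image.
func gradient(w, h int) image.Image {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.Set(x, y, color.RGBA{R: uint8(x * 255 / w), G: uint8(y * 255 / h), B: 128, A: 255})
		}
	}
	return img
}

// fitJPEG encodes img as JPEG, lowering quality in steps of 10 (assumed
// values) until the output is at most maxSize bytes. maxSize <= 0 means no
// limit. It returns the encoded bytes and the quality that was used.
func fitJPEG(img image.Image, maxSize int) ([]byte, int, error) {
	for q := 90; q >= 10; q -= 10 {
		var buf bytes.Buffer
		if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: q}); err != nil {
			return nil, 0, err
		}
		if maxSize <= 0 || buf.Len() <= maxSize {
			return buf.Bytes(), q, nil
		}
	}
	return nil, 0, fmt.Errorf("image does not fit in %d bytes even at quality 10", maxSize)
}

func main() {
	data, q, err := fitJPEG(gradient(256, 256), 5*1024*1024)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("encoded %d bytes at quality %d\n", len(data), q)
}
```

A fuller version would fall back to reducing resolution once the lowest quality still exceeds the limit, which is the role the documentation assigns to downscaling.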

type Selection

type Selection struct {
	// contains filtered or unexported fields
}

Selection wraps a single page element with a fluent, chainable API.

func NewSelection

func NewSelection(page *Page, nodeID int64, selector string) *Selection

NewSelection creates a Selection for the given page and node ID.

func (*Selection) Attr

func (s *Selection) Attr(name string) (string, error)

Attr returns the value of the given attribute.

func (*Selection) Clear

func (s *Selection) Clear() error

Clear clears the element's value.

func (*Selection) Click

func (s *Selection) Click() error

Click clicks the element.

func (*Selection) Err

func (s *Selection) Err() error

Err returns the accumulated error, if any.

func (*Selection) Hover

func (s *Selection) Hover() error

Hover moves the mouse over the element.

func (*Selection) Input

func (s *Selection) Input(text string) error

Input focuses the element, clears it, and types the given text.

func (*Selection) MustClick

func (s *Selection) MustClick() *Selection

MustClick clicks the element and panics on error.

func (*Selection) MustInput

func (s *Selection) MustInput(text string) *Selection

MustInput types text and panics on error.

func (*Selection) MustText

func (s *Selection) MustText() string

MustText returns the element's text content, panicking on error.

func (*Selection) Screenshot

func (s *Selection) Screenshot() ([]byte, error)

Screenshot captures a screenshot of this element only.

func (*Selection) Text

func (s *Selection) Text() (string, error)

Text returns the element's text content.

func (*Selection) Value

func (s *Selection) Value() (string, error)

Value returns the element's value property (for inputs).

func (*Selection) Visible

func (s *Selection) Visible() (bool, error)

Visible reports whether the element is visible.

func (*Selection) WaitEnabled

func (s *Selection) WaitEnabled() *Selection

WaitEnabled waits until the element is enabled.

func (*Selection) WaitStable

func (s *Selection) WaitStable() *Selection

WaitStable waits until the element's position is stable.

func (*Selection) WaitVisible

func (s *Selection) WaitVisible() *Selection

WaitVisible waits until the element is visible.
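
The chainable Wait methods together with Err suggest an error-accumulating fluent style. The sketch below shows that general pattern with a hypothetical stand-in type (chain); it does not use the package, and the short-circuit behaviour after the first failure is an assumption about how such an API is typically built, not a statement about Selection itself.

```go
package main

import (
	"errors"
	"fmt"
)

// chain is a hypothetical stand-in for a fluent element wrapper.
type chain struct {
	err   error    // first error seen, if any
	steps []string // names of the steps that actually ran
}

// do runs fn unless an earlier step already failed, and records the first error.
func (c *chain) do(name string, fn func() error) *chain {
	if c.err != nil {
		return c // assumed behaviour: skip remaining steps after a failure
	}
	c.steps = append(c.steps, name)
	if err := fn(); err != nil {
		c.err = fmt.Errorf("%s: %w", name, err)
	}
	return c
}

// Err returns the accumulated error, mirroring the Err method documented above.
func (c *chain) Err() error { return c.err }

var errNotVisible = errors.New("element not visible")

// run chains three steps where the first one fails.
func run() *chain {
	ok := func() error { return nil }
	fail := func() error { return errNotVisible }
	return new(chain).do("waitVisible", fail).do("waitEnabled", ok).do("waitStable", ok)
}

func main() {
	c := run()
	fmt.Println("steps run:", c.steps)
	fmt.Println("error:", c.Err())
}
```

With this style the caller checks Err once at the end of the chain instead of after every call.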

type SelectionAll

type SelectionAll struct {
	// contains filtered or unexported fields
}

SelectionAll wraps multiple elements for batch operations.

func (*SelectionAll) At

func (sa *SelectionAll) At(i int) *Selection

At returns the element at the given index.

func (*SelectionAll) Count

func (sa *SelectionAll) Count() int

Count returns the number of matched elements.

func (*SelectionAll) Each

func (sa *SelectionAll) Each(fn func(int, *Selection)) error

Each iterates over each matched element with its index.

func (*SelectionAll) Filter

func (sa *SelectionAll) Filter(fn func(*Selection) bool) *SelectionAll

Filter returns a new SelectionAll containing only elements that pass the predicate.

func (*SelectionAll) First

func (sa *SelectionAll) First() *Selection

First returns the first matched element.

func (*SelectionAll) Last

func (sa *SelectionAll) Last() *Selection

Last returns the last matched element.

func (*SelectionAll) Texts

func (sa *SelectionAll) Texts() ([]string, error)

Texts returns the text content of all matched elements.

type Table

type Table struct {
	Headers []string
	Rows    [][]string
}

Table represents extracted HTML table data.
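
One common use of extracted table data is writing it out as CSV. The sketch below redeclares a local Table with the same two fields so that it runs without the package; the helper name tableToCSV and the sample data are illustrative only.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// Table mirrors the documented struct; it is redeclared here so the example
// is self-contained.
type Table struct {
	Headers []string
	Rows    [][]string
}

// tableToCSV renders the headers followed by all rows as CSV text.
func tableToCSV(t Table) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(t.Headers); err != nil {
		return "", err
	}
	if err := w.WriteAll(t.Rows); err != nil { // WriteAll also flushes
		return "", err
	}
	return buf.String(), nil
}

func main() {
	t := Table{
		Headers: []string{"Name", "Price"},
		Rows:    [][]string{{"Widget", "9.99"}, {"Gadget, large", "19.50"}},
	}
	out, err := tableToCSV(t)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

The csv package quotes fields that contain commas, so cell text survives the round trip intact.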

type TaskLifecycleContext

type TaskLifecycleContext struct {
	TaskName string
	Attempt  int
	LastErr  error
}

TaskLifecycleContext holds the context for a task's state machine.

type TaskTracker

type TaskTracker struct {
	// contains filtered or unexported fields
}

TaskTracker wraps a statekit Interpreter to track task execution state.

func NewTaskTracker

func NewTaskTracker(taskName string) (*TaskTracker, error)

NewTaskTracker creates a tracker for the given task name.

func (*TaskTracker) Abort

func (t *TaskTracker) Abort()

Abort transitions the task to aborted state.

func (*TaskTracker) Context

func (t *TaskTracker) Context() TaskLifecycleContext

Context returns the current task lifecycle context.

func (*TaskTracker) Fail

func (t *TaskTracker) Fail(err error)

Fail transitions the task to failed state with an error.

func (*TaskTracker) IsDone

func (t *TaskTracker) IsDone() bool

IsDone returns true if the task is in a terminal state.

func (*TaskTracker) Matches

func (t *TaskTracker) Matches(state statekit.StateID) bool

Matches checks if the task is in the given state.

func (*TaskTracker) Reset

func (t *TaskTracker) Reset()

Reset transitions a failed task back to pending.

func (*TaskTracker) Retry

func (t *TaskTracker) Retry()

Retry transitions the task to retrying state.

func (*TaskTracker) Start

func (t *TaskTracker) Start()

Start transitions the task to running state.

func (*TaskTracker) State

func (t *TaskTracker) State() statekit.StateID

State returns the current state ID.

func (*TaskTracker) Stop

func (t *TaskTracker) Stop()

Stop cleans up the interpreter.

func (*TaskTracker) Success

func (t *TaskTracker) Success()

Success transitions the task to success state.
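
The methods above describe a small state machine over the states pending, running, retrying, failed, success, and aborted. The sketch below models one plausible transition table without statekit. The event names, the exact set of allowed transitions, and the choice to treat only success and aborted as terminal are all assumptions; the package's real machine may differ.

```go
package main

import "fmt"

type state string
type event string

const (
	pending  state = "pending"
	running  state = "running"
	retrying state = "retrying"
	failed   state = "failed"
	success  state = "success"
	aborted  state = "aborted"

	// Event names are hypothetical.
	evStart   event = "START"
	evSuccess event = "SUCCESS"
	evFail    event = "FAIL"
	evRetry   event = "RETRY"
	evReset   event = "RESET"
	evAbort   event = "ABORT"
)

// transitions is an assumed table; states without an entry are terminal.
var transitions = map[state]map[event]state{
	pending:  {evStart: running, evAbort: aborted},
	running:  {evSuccess: success, evFail: failed, evAbort: aborted},
	failed:   {evRetry: retrying, evReset: pending},
	retrying: {evStart: running, evAbort: aborted},
}

type tracker struct{ state state }

// send applies an event and reports whether it was valid in the current state.
func (t *tracker) send(e event) bool {
	next, ok := transitions[t.state][e]
	if !ok {
		return false
	}
	t.state = next
	return true
}

// isDone reports whether the current state has no outgoing transitions.
func (t *tracker) isDone() bool { return len(transitions[t.state]) == 0 }

func main() {
	tr := &tracker{state: pending}
	for _, e := range []event{evStart, evFail, evRetry, evStart, evSuccess} {
		ok := tr.send(e)
		fmt.Printf("%-8s ok=%v state=%s\n", e, ok, tr.state)
	}
	fmt.Println("done:", tr.isDone())
}
```

Rejecting events that are invalid for the current state keeps a task from, for example, succeeding without ever having been started.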

type TimeoutError

type TimeoutError struct {
	Operation string
	Selector  string
}

TimeoutError is returned when an operation exceeds its deadline.

func (*TimeoutError) Error

func (e *TimeoutError) Error() string
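
Because TimeoutError is a pointer-receiver error type with exported fields, callers can detect it through wrapped errors with errors.As. The sketch below redeclares a local TimeoutError so that it runs standalone; the message format, the waitFor helper, and the "wait_visible" operation name are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// TimeoutError mirrors the documented struct; the Error format is an assumption.
type TimeoutError struct {
	Operation string
	Selector  string
}

func (e *TimeoutError) Error() string {
	return fmt.Sprintf("timeout during %s on %q", e.Operation, e.Selector)
}

// waitFor is a hypothetical operation that always times out and wraps the error.
func waitFor(selector string) error {
	return fmt.Errorf("login flow: %w", &TimeoutError{Operation: "wait_visible", Selector: selector})
}

// asTimeout unwraps err and reports whether it contains a *TimeoutError.
func asTimeout(err error) (*TimeoutError, bool) {
	var te *TimeoutError
	ok := errors.As(err, &te)
	return te, ok
}

func main() {
	err := waitFor("#login")
	if te, ok := asTimeout(err); ok {
		fmt.Printf("timed out: operation=%s selector=%s\n", te.Operation, te.Selector)
		return
	}
	fmt.Println("other error:", err)
}
```

Matching on the type rather than the message lets retry logic distinguish timeouts from other failures even after the error has been wrapped.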

type URLValidator

type URLValidator struct {
	AllowPrivateIPs bool
}

URLValidator controls URL validation for navigation. Set AllowPrivateIPs to true to permit loopback/private IP navigation (e.g., for testing).

func (URLValidator) Validate

func (v URLValidator) Validate(rawURL string) error

Validate checks that a URL is safe for navigation. Blocks non-http(s) schemes and private/loopback IPs (unless AllowPrivateIPs is set).
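
The checks described for Validate can be approximated with net/url and net. The sketch below is an independent illustration using a hypothetical urlValidator type, not the package's code. It only inspects literal IP addresses in the URL and does not resolve hostnames, which a hardened validator would likely also need to consider.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// urlValidator is a hypothetical stand-in with the same single option.
type urlValidator struct {
	AllowPrivateIPs bool
}

// Validate rejects non-http(s) schemes and, unless allowed, literal
// loopback, private, link-local, or unspecified IP hosts.
func (v urlValidator) Validate(rawURL string) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return fmt.Errorf("parse url: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("scheme %q is not allowed", u.Scheme)
	}
	host := u.Hostname()
	if host == "" {
		return errors.New("url has no host")
	}
	if v.AllowPrivateIPs {
		return nil
	}
	if ip := net.ParseIP(host); ip != nil {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
			return fmt.Errorf("address %s is private or loopback", ip)
		}
	}
	return nil
}

func main() {
	strict := urlValidator{}
	for _, raw := range []string{"https://example.com", "file:///etc/passwd", "http://127.0.0.1:8080"} {
		fmt.Printf("%-28s -> %v\n", raw, strict.Validate(raw))
	}
	relaxed := urlValidator{AllowPrivateIPs: true}
	fmt.Println("relaxed loopback ->", relaxed.Validate("http://127.0.0.1:8080"))
}
```

Using a value receiver, as the documented signature does, means the zero value is a usable strict validator.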

Directories

Path Synopsis
Package agent provides a high-level, agent-optimized API for browser automation.
cmd/scout (command): scout is the CLI for AI-powered browser automation.
examples/demo (command): Example: demo showcases browse-go's middleware, groups, and task composition.
examples/login (command): Example: login demonstrates authenticating to a web application.
examples/scrape (command): Example: scrape demonstrates extracting structured data from a web page.
internal/agui: Package agui implements an AG-UI protocol server for scout browser automation.
internal/cdp: Package cdp provides a low-level Chrome DevTools Protocol client over WebSocket.
internal/launcher: Package launcher finds and starts a Chrome/Chromium process.
internal/wait: Package wait provides context-aware auto-wait utilities for page readiness.
Package middleware provides reusable middleware for browse-go tasks.
