browser

package
v0.1.6

This package is not in the latest version of its module.
Published: Mar 4, 2026 License: Apache-2.0 Imports: 10 Imported by: 0

Documentation

Overview

Package browser provides a chromedp-backed browser automation provider. It exposes small, composable trpc-agent tools that an AI agent can invoke to navigate pages, interact with elements, extract content, and take screenshots. Without this package the agent has no way to observe or manipulate live web pages.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func AllTools

func AllTools(b *Browser) []tool.CallableTool

AllTools returns every browser tool wired to the given Browser instance. This is a convenience function for registering all tools at once.

func NewClickTool

func NewClickTool(b *Browser) tool.CallableTool

NewClickTool creates the browser_click tool. It waits for the element to become visible and then clicks it. Without this tool the agent cannot interact with buttons, links, or other clickable elements.

func NewEvalJSTool

func NewEvalJSTool(b *Browser) tool.CallableTool

NewEvalJSTool creates the browser_eval_js tool. It evaluates an arbitrary JavaScript expression in the page context. This is the escape hatch for any interaction that the other tools cannot cover.

func NewNavigateTool

func NewNavigateTool(b *Browser) tool.CallableTool

NewNavigateTool creates the browser_navigate tool. It opens the requested URL in the shared browser tab. Without this tool the agent has no way to load a web page.

func NewReadHTMLTool

func NewReadHTMLTool(b *Browser) tool.CallableTool

NewReadHTMLTool creates the browser_read_html tool. It returns the outer HTML of an element, useful when the agent needs structural information. Without this tool the agent can only see text, not the underlying markup.

func NewReadTextTool

func NewReadTextTool(b *Browser) tool.CallableTool

NewReadTextTool creates the browser_read_text tool. It extracts the visible text content of an element. Without this tool the agent cannot read page content as plain text.

func NewScreenshotTool

func NewScreenshotTool(b *Browser) tool.CallableTool

NewScreenshotTool creates the browser_screenshot tool. It captures a PNG screenshot of the viewport or a specific element and returns it as base64. Without this tool the agent has no visual feedback of the page state.

func NewTypeTool

func NewTypeTool(b *Browser) tool.CallableTool

NewTypeTool creates the browser_type tool. It focuses the element and types the given text. Without this tool the agent cannot fill out forms.

func NewWaitTool

func NewWaitTool(b *Browser) tool.CallableTool

NewWaitTool creates the browser_wait tool. It lets the agent pause until one of three conditions is met: a fixed duration elapses, a selector becomes visible, or the network goes idle.

Types

type Browser

type Browser struct {
	// contains filtered or unexported fields
}

Browser manages a shared chromedp browser session. All tools operate on the same browser tab so that navigation state is preserved across calls. Without this struct every tool call would launch a new browser, losing cookies, logins, and page context.

func New

func New(ctx context.Context, opts ...Option) (*Browser, error)

New allocates a new Chrome browser process (headless by default) and returns a Browser that tools can share. Callers MUST call Close when finished to avoid leaking Chrome processes.

func (*Browser) Close

func (b *Browser) Close()

Close tears down the browser process and releases all resources. It is safe to call multiple times.
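
The package's implementation of Close is not shown on this page. As a minimal sketch of one common way to make a teardown method safe to call repeatedly, the example below guards it with sync.Once. The session type, its cancel field, and countCloses are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// session is a hypothetical stand-in for a type that owns a browser process.
type session struct {
	closeOnce sync.Once
	cancel    func() // releases the underlying resources
}

// Close runs the teardown at most once, so repeated calls are harmless.
func (s *session) Close() {
	s.closeOnce.Do(s.cancel)
}

// countCloses calls Close n times and reports how often the teardown ran.
func countCloses(n int) int {
	calls := 0
	s := &session{cancel: func() { calls++ }}
	for i := 0; i < n; i++ {
		s.Close()
	}
	return calls
}

func main() {
	fmt.Println(countCloses(3)) // prints 1
}
```

With a guard like this, a deferred Close and an explicit Close on an error path can coexist without double-release bugs.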

func (*Browser) GetTools

func (b *Browser) GetTools() []tool.Tool

GetTools satisfies the tools.ToolProviders interface so a Browser instance can be passed directly to tools.NewRegistry. Without this, browser tool construction would be inlined in the registry.

func (*Browser) NewTab

func (b *Browser) NewTab(parent context.Context) (context.Context, context.CancelFunc, error)

NewTab creates a new isolated browser context (tab). The caller is responsible for cancelling the returned context to close the tab. The tab will also be closed if the underlying browser context is cancelled (for example, via Close).

Note: the parent argument is currently ignored when the tab is created, which guarantees that the tab belongs to this Browser instance. chromedp does not support attaching the tab directly to an unrelated context. If the tab must follow the lifecycle of an existing context, derive a context from the returned one with context.WithCancel or context.WithTimeout and cancel it when your parent context is done.
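
One way to close a tab when an unrelated context ends is to register the tab's cancel function with context.AfterFunc (Go 1.21 or later). This is an alternative technique, not something the package is documented to do. The sketch below uses plain contexts in place of a real tab: tabCtx and cancelTab stand in for the values NewTab would return, and linkToParent is a hypothetical helper.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// linkToParent arranges for cancelTab to run once parent is done.
// Calling the returned stop function removes the link again.
func linkToParent(parent context.Context, cancelTab context.CancelFunc) (stop func() bool) {
	return context.AfterFunc(parent, cancelTab)
}

// tabClosedAfterParentCancel simulates the pattern with plain contexts.
// tabCtx stands in for the context that NewTab would return.
func tabClosedAfterParentCancel() bool {
	parent, cancelParent := context.WithCancel(context.Background())
	tabCtx, cancelTab := context.WithCancel(context.Background())
	defer cancelTab()

	stop := linkToParent(parent, cancelTab)
	defer stop()

	cancelParent() // the parent ends, so cancelTab is scheduled
	select {
	case <-tabCtx.Done():
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println(tabClosedAfterParentCancel())
}
```

context.AfterFunc runs the function in its own goroutine, so the tab context is cancelled shortly after the parent rather than synchronously.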

type ClickRequest

type ClickRequest struct {
	Selector string `json:"selector" jsonschema:"description=CSS selector of the element to click,required"`
}

ClickRequest is the input for the browser_click tool.

type ClickResponse

type ClickResponse struct {
	Status string `json:"status"`
}

ClickResponse is the output for the browser_click tool.

type Config

type Config struct {
	BlockedDomains []string `yaml:"blocked_domains,omitempty" toml:"blocked_domains,omitempty"`
}

Config holds configuration for the browser tool provider. BlockedDomains prevents the agent from navigating to specific domains (e.g. internal admin panels, payment processors). Matching is suffix-based so "example.com" also blocks "sub.example.com".
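
The matching rule can be sketched as follows. This is an illustration under stated assumptions, not the package's code: hostBlocked and urlBlocked are hypothetical helpers, and the sketch requires a dot boundary so that "example.com" does not match "notexample.com". Whether the package enforces that boundary is not stated here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostBlocked reports whether host equals a blocked domain or is a subdomain of one.
func hostBlocked(host string, blocked []string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	for _, d := range blocked {
		d = strings.ToLower(strings.TrimSpace(d))
		if d == "" {
			continue
		}
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

// urlBlocked parses rawURL and applies hostBlocked to its hostname.
func urlBlocked(rawURL string, blocked []string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	return hostBlocked(u.Hostname(), blocked), nil
}

func main() {
	blocked := []string{"example.com"}
	for _, raw := range []string{
		"https://example.com/admin",
		"https://sub.example.com/",
		"https://notexample.com/",
	} {
		isBlocked, err := urlBlocked(raw, blocked)
		fmt.Println(raw, isBlocked, err)
	}
}
```

Comparing against u.Hostname() rather than u.Host keeps a port such as ":8443" from defeating the suffix check.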

type EvalJSRequest

type EvalJSRequest struct {
	Expression string `json:"expression" jsonschema:"description=JavaScript expression to evaluate in the page context,required"`
}

EvalJSRequest is the input for the browser_eval_js tool.

type EvalJSResponse

type EvalJSResponse struct {
	Result string `json:"result"`
}

EvalJSResponse is the output for the browser_eval_js tool.

type NavigateRequest

type NavigateRequest struct {
	URL string `json:"url" jsonschema:"description=The URL to navigate to,required"`
}

NavigateRequest is the input for the browser_navigate tool.

type NavigateResponse

type NavigateResponse struct {
	Status string `json:"status"`
	URL    string `json:"url"`
}

NavigateResponse is the output for the browser_navigate tool.

type Option

type Option func(*browserOpts)

Option configures a Browser instance.

func WithBlockedDomains

func WithBlockedDomains(domains []string) Option

WithBlockedDomains sets domains that the browser is not allowed to navigate to. Matching is suffix-based: "example.com" blocks both "example.com" and "sub.example.com". This is a safety measure to prevent the agent from accessing sensitive internal services.

func WithHeadless

func WithHeadless(v bool) Option

WithHeadless controls whether the browser runs without a visible window. It defaults to true. Setting this to false is useful during local debugging.

func WithTimeout

func WithTimeout(d time.Duration) Option

WithTimeout overrides the default per-action timeout of 30 seconds.

func WithViewport

func WithViewport(width, height int) Option

WithViewport sets the browser window size.

type ReadHTMLRequest

type ReadHTMLRequest struct {
	Selector string `json:"selector" jsonschema:"description=CSS selector of the element whose outer HTML to read,required"`
}

ReadHTMLRequest is the input for the browser_read_html tool.

type ReadHTMLResponse

type ReadHTMLResponse struct {
	HTML string `json:"html"`
}

ReadHTMLResponse is the output for the browser_read_html tool.

type ReadTextRequest

type ReadTextRequest struct {
	Selector string `json:"selector" jsonschema:"description=CSS selector of the element whose visible text to read,required"`
}

ReadTextRequest is the input for the browser_read_text tool.

type ReadTextResponse

type ReadTextResponse struct {
	Text string `json:"text"`
}

ReadTextResponse is the output for the browser_read_text tool.

type ScreenshotRequest

type ScreenshotRequest struct {
	Selector string `` /* 146-byte string literal not displayed */
}

ScreenshotRequest is the input for the browser_screenshot tool.

type ScreenshotResponse

type ScreenshotResponse struct {
	ImageBase64 string `json:"image_base64"`
}

ScreenshotResponse is the output for the browser_screenshot tool.
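
A caller that receives ImageBase64 will usually want to turn it back into an image. The sketch below assumes the field holds standard (not URL-safe) base64 with no data-URI prefix, which this page does not confirm. decodeScreenshot and samplePNG are hypothetical helpers. samplePNG generates a small in-memory PNG so the example runs without a browser.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"image"
	"image/png"
)

// decodeScreenshot decodes a base64 PNG and returns its pixel dimensions.
func decodeScreenshot(b64 string) (width, height int, err error) {
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return 0, 0, fmt.Errorf("decode base64: %w", err)
	}
	img, err := png.Decode(bytes.NewReader(raw))
	if err != nil {
		return 0, 0, fmt.Errorf("decode png: %w", err)
	}
	b := img.Bounds()
	return b.Dx(), b.Dy(), nil
}

// samplePNG builds a blank PNG of the given size and returns it as base64.
func samplePNG(w, h int) (string, error) {
	var buf bytes.Buffer
	if err := png.Encode(&buf, image.NewRGBA(image.Rect(0, 0, w, h))); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

func main() {
	b64, err := samplePNG(4, 3)
	if err != nil {
		panic(err)
	}
	w, h, err := decodeScreenshot(b64)
	fmt.Println(w, h, err)
}
```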

type TypeRequest

type TypeRequest struct {
	Selector string `json:"selector" jsonschema:"description=CSS selector of the input element,required"`
	Text     string `json:"text" jsonschema:"description=Text to type into the element,required"`
}

TypeRequest is the input for the browser_type tool.

type TypeResponse

type TypeResponse struct {
	Status string `json:"status"`
}

TypeResponse is the output for the browser_type tool.

type WaitRequest

type WaitRequest struct {
	Selector    string `json:"selector,omitempty" jsonschema:"description=CSS selector to wait for visibility."`
	Duration    string `json:"duration,omitempty" jsonschema:"description=Duration to wait (e.g. '2s', '500ms')."`
	NetworkIdle bool   `json:"network_idle,omitempty" jsonschema:"description=If true, wait for network (HTML+images+CSS) to be idle."`
}

WaitRequest is the input for the browser_wait tool.

type WaitResponse

type WaitResponse struct {
	Status string `json:"status"`
}

WaitResponse is the output for the browser_wait tool.
