Documentation ¶
Overview ¶
Package browser provides a chromedp-backed browser automation provider. It exposes small, composable trpc-agent tools that an AI agent can invoke to navigate pages, interact with elements, extract content, and take screenshots. Without this package the agent has no way to observe or manipulate live web pages.
Index ¶
- func AllTools(b *Browser) []tool.CallableTool
- func NewClickTool(b *Browser) tool.CallableTool
- func NewEvalJSTool(b *Browser) tool.CallableTool
- func NewNavigateTool(b *Browser) tool.CallableTool
- func NewReadHTMLTool(b *Browser) tool.CallableTool
- func NewReadTextTool(b *Browser) tool.CallableTool
- func NewScreenshotTool(b *Browser) tool.CallableTool
- func NewTypeTool(b *Browser) tool.CallableTool
- func NewWaitTool(b *Browser) tool.CallableTool
- type Browser
- type ClickRequest
- type ClickResponse
- type Config
- type EvalJSRequest
- type EvalJSResponse
- type NavigateRequest
- type NavigateResponse
- type Option
- type ReadHTMLRequest
- type ReadHTMLResponse
- type ReadTextRequest
- type ReadTextResponse
- type ScreenshotRequest
- type ScreenshotResponse
- type TypeRequest
- type TypeResponse
- type WaitRequest
- type WaitResponse
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func AllTools ¶
func AllTools(b *Browser) []tool.CallableTool
AllTools returns every browser tool wired to the given Browser instance. This is a convenience function for registering all tools at once.
func NewClickTool ¶
func NewClickTool(b *Browser) tool.CallableTool
NewClickTool creates the browser_click tool. It waits for the element to become visible and then clicks it. Without this tool the agent cannot interact with buttons, links, or other clickable elements.
func NewEvalJSTool ¶
func NewEvalJSTool(b *Browser) tool.CallableTool
NewEvalJSTool creates the browser_eval_js tool. It evaluates an arbitrary JavaScript expression in the page context. This is the escape hatch for any interaction that the other tools cannot cover.
func NewNavigateTool ¶
func NewNavigateTool(b *Browser) tool.CallableTool
NewNavigateTool creates the browser_navigate tool. It opens the requested URL in the shared browser tab. Without this tool the agent has no way to load a web page.
func NewReadHTMLTool ¶
func NewReadHTMLTool(b *Browser) tool.CallableTool
NewReadHTMLTool creates the browser_read_html tool. It returns the outer HTML of an element, useful when the agent needs structural information. Without this tool the agent can only see text, not the underlying markup.
func NewReadTextTool ¶
func NewReadTextTool(b *Browser) tool.CallableTool
NewReadTextTool creates the browser_read_text tool. It extracts the visible text content of an element. Without this tool the agent cannot read page content as plain text.
func NewScreenshotTool ¶
func NewScreenshotTool(b *Browser) tool.CallableTool
NewScreenshotTool creates the browser_screenshot tool. It captures a PNG screenshot of the viewport or a specific element and returns it as base64. Without this tool the agent has no visual feedback of the page state.
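A caller that receives the base64 payload will usually want to turn it back into an image. The following is a minimal sketch of that step using only the standard library. It assumes the tool uses standard (not URL-safe) base64, which this documentation does not state; decodeScreenshot and samplePNG are hypothetical helpers, and samplePNG only stands in for a real screenshot so the example runs without a browser.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"image"
	"image/png"
)

// decodeScreenshot converts a base64-encoded PNG into an image.Image.
// Assumption: the payload uses standard base64 encoding.
func decodeScreenshot(b64 string) (image.Image, error) {
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return nil, fmt.Errorf("decode base64: %w", err)
	}
	img, err := png.Decode(bytes.NewReader(raw))
	if err != nil {
		return nil, fmt.Errorf("decode png: %w", err)
	}
	return img, nil
}

// samplePNG builds a blank w×h PNG in memory and returns it as base64.
// It is a stand-in for the image_base64 field of a real response.
func samplePNG(w, h int) string {
	var buf bytes.Buffer
	if err := png.Encode(&buf, image.NewRGBA(image.Rect(0, 0, w, h))); err != nil {
		panic(err)
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes())
}

func main() {
	img, err := decodeScreenshot(samplePNG(4, 3))
	if err != nil {
		panic(err)
	}
	fmt.Println(img.Bounds().Dx(), img.Bounds().Dy()) // prints "4 3"
}
```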
func NewTypeTool ¶
func NewTypeTool(b *Browser) tool.CallableTool
NewTypeTool creates the browser_type tool. It focuses the element and types the given text. Without this tool the agent cannot fill out forms.
func NewWaitTool ¶
func NewWaitTool(b *Browser) tool.CallableTool
NewWaitTool creates the browser_wait tool. It lets the agent pause execution until a specific condition is met: a fixed duration elapses, a selector becomes visible, or the network goes idle.
Types ¶
type Browser ¶
type Browser struct {
// contains filtered or unexported fields
}
Browser manages a shared chromedp browser session. All tools operate on the same browser tab so that navigation state is preserved across calls. Without this struct every tool call would launch a new browser, losing cookies, logins, and page context.
func New ¶
New allocates a new Chrome browser process (headless by default) and returns a Browser that tools can share. Callers MUST call Close when finished to avoid leaking Chrome processes.
func (*Browser) Close ¶
func (b *Browser) Close()
Close tears down the browser process and releases all resources. It is safe to call multiple times.
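The "safe to call multiple times" guarantee is commonly achieved with sync.Once. The sketch below illustrates that pattern on a hypothetical session type; it is not taken from this package's source, and the teardown counter exists only to make the behaviour observable.

```go
package main

import (
	"fmt"
	"sync"
)

// session is a hypothetical stand-in for a type that owns a browser process.
type session struct {
	once      sync.Once
	teardowns int // how many times the teardown body actually ran
}

// Close runs the teardown exactly once, no matter how often it is called.
func (s *session) Close() {
	s.once.Do(func() {
		// A real implementation would cancel contexts and stop Chrome here.
		s.teardowns++
	})
}

func main() {
	s := &session{}
	defer s.Close() // deferring Close right after construction avoids leaks
	s.Close()
	s.Close()
	fmt.Println(s.teardowns) // prints "1"
}
```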
func (*Browser) GetTools ¶
GetTools satisfies the tools.ToolProviders interface so a Browser instance can be passed directly to tools.NewRegistry. Without this, browser tool construction would be inlined in the registry.
func (*Browser) NewTab ¶
NewTab creates a new isolated browser context (tab). The caller is responsible for cancelling the returned context to close the tab. The tab will also be closed if the underlying browser context is cancelled (for example, via Close).
Note: the parent argument is currently not used as the parent of the tab's context; this ensures the tab always belongs to this Browser instance. If you need to tie the tab to an existing context's lifecycle, wrap the returned context with context.WithCancel or context.WithTimeout and use your parent context as the reference for when to cancel it. Linking the two contexts directly is not supported by chromedp's structure.
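One way to tie a tab's lifetime to an unrelated context is context.AfterFunc (Go 1.21 and later), which runs a function once a context is done. This is a sketch under assumptions: NewTab's signature is not shown in this documentation, so the tab is simulated with context.WithCancel, and bindToParent is a hypothetical helper.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// bindToParent arranges for cancelTab to run once parent is done.
// The returned stop function removes the link again.
func bindToParent(parent context.Context, cancelTab context.CancelFunc) (stop func() bool) {
	return context.AfterFunc(parent, cancelTab)
}

// tabClosesWithParent reports whether cancelling the parent also ends the tab.
func tabClosesWithParent() bool {
	// Stand-in for the context and cancel function a tab would provide.
	tabCtx, cancelTab := context.WithCancel(context.Background())
	defer cancelTab()

	parent, cancelParent := context.WithCancel(context.Background())
	stop := bindToParent(parent, cancelTab)
	defer stop()

	cancelParent() // ending the parent should now end the tab as well

	select {
	case <-tabCtx.Done():
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println(tabClosesWithParent())
}
```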
type ClickRequest ¶
type ClickRequest struct {
Selector string `json:"selector" jsonschema:"description=CSS selector of the element to click,required"`
}
ClickRequest is the input for the browser_click tool.
type ClickResponse ¶
type ClickResponse struct {
Status string `json:"status"`
}
ClickResponse is the output for the browser_click tool.
type Config ¶
type Config struct {
BlockedDomains []string `yaml:"blocked_domains,omitempty" toml:"blocked_domains,omitempty"`
}
Config holds configuration for the browser tool provider. BlockedDomains prevents the agent from navigating to specific domains (e.g. internal admin panels, payment processors). Matching is suffix-based so "example.com" also blocks "sub.example.com".
type EvalJSRequest ¶
type EvalJSRequest struct {
Expression string `json:"expression" jsonschema:"description=JavaScript expression to evaluate in the page context,required"`
}
EvalJSRequest is the input for the browser_eval_js tool.
type EvalJSResponse ¶
type EvalJSResponse struct {
Result string `json:"result"`
}
EvalJSResponse is the output for the browser_eval_js tool.
type NavigateRequest ¶
type NavigateRequest struct {
}
NavigateRequest is the input for the browser_navigate tool.
type NavigateResponse ¶
type NavigateResponse struct {
}
NavigateResponse is the output for the browser_navigate tool.
type Option ¶
type Option func(*browserOpts)
Option configures a Browser instance.
func WithBlockedDomains ¶
WithBlockedDomains sets domains that the browser is not allowed to navigate to. Matching is suffix-based: "example.com" blocks both "example.com" and "sub.example.com". This is a safety measure to prevent the agent from accessing sensitive internal services.
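The suffix rule can be sketched as follows. isBlocked is a hypothetical helper, not this package's implementation. It assumes matching happens on a label boundary, so "example.com" does not block "notexample.com"; the actual package may apply a plain string-suffix check instead.

```go
package main

import (
	"fmt"
	"strings"
)

// isBlocked reports whether host equals a blocked domain or is a subdomain of one.
// Comparison is case-insensitive and ignores a trailing dot on the host.
func isBlocked(host string, blocked []string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	for _, d := range blocked {
		d = strings.ToLower(strings.TrimPrefix(d, "."))
		if d == "" {
			continue // an empty entry must not block everything
		}
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

func main() {
	blocked := []string{"example.com"}
	fmt.Println(isBlocked("example.com", blocked))     // true
	fmt.Println(isBlocked("sub.example.com", blocked)) // true
	fmt.Println(isBlocked("notexample.com", blocked))  // false
}
```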
func WithHeadless ¶
WithHeadless controls whether the browser runs without a visible window. It defaults to true. Setting this to false is useful during local debugging.
func WithTimeout ¶
WithTimeout overrides the default per-action timeout of 30 seconds.
func WithViewport ¶
WithViewport sets the browser window size.
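Option follows Go's functional options pattern. The sketch below shows how such options are typically applied on top of defaults. The opts struct and its fields are hypothetical, since browserOpts is unexported; only the defaults (headless true, 30 second timeout) come from the descriptions above.

```go
package main

import (
	"fmt"
	"time"
)

// opts is a hypothetical mirror of the unexported browserOpts.
type opts struct {
	headless bool
	timeout  time.Duration
}

// option mutates opts, in the same shape as Option above.
type option func(*opts)

func withHeadless(v bool) option         { return func(o *opts) { o.headless = v } }
func withTimeout(d time.Duration) option { return func(o *opts) { o.timeout = d } }

// resolve starts from the documented defaults and applies each option in order.
func resolve(options ...option) opts {
	o := opts{headless: true, timeout: 30 * time.Second}
	for _, apply := range options {
		apply(&o)
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", resolve())
	fmt.Printf("%+v\n", resolve(withHeadless(false), withTimeout(5*time.Second)))
}
```

Because later options overwrite earlier ones, callers can layer a debugging override such as a visible window on top of a shared base configuration.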
type ReadHTMLRequest ¶
type ReadHTMLRequest struct {
Selector string `json:"selector" jsonschema:"description=CSS selector of the element whose outer HTML to read,required"`
}
ReadHTMLRequest is the input for the browser_read_html tool.
type ReadHTMLResponse ¶
type ReadHTMLResponse struct {
HTML string `json:"html"`
}
ReadHTMLResponse is the output for the browser_read_html tool.
type ReadTextRequest ¶
type ReadTextRequest struct {
Selector string `json:"selector" jsonschema:"description=CSS selector of the element whose visible text to read,required"`
}
ReadTextRequest is the input for the browser_read_text tool.
type ReadTextResponse ¶
type ReadTextResponse struct {
Text string `json:"text"`
}
ReadTextResponse is the output for the browser_read_text tool.
type ScreenshotRequest ¶
type ScreenshotRequest struct {
Selector string `` /* 146-byte string literal not displayed */
}
ScreenshotRequest is the input for the browser_screenshot tool.
type ScreenshotResponse ¶
type ScreenshotResponse struct {
ImageBase64 string `json:"image_base64"`
}
ScreenshotResponse is the output for the browser_screenshot tool.
type TypeRequest ¶
type TypeRequest struct {
Selector string `json:"selector" jsonschema:"description=CSS selector of the input element,required"`
Text string `json:"text" jsonschema:"description=Text to type into the element,required"`
}
TypeRequest is the input for the browser_type tool.
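The json tags determine the argument object an agent sends when it calls the tool. The sketch below mirrors those tags on a local, hypothetical typeRequest struct (jsonschema tags omitted) to show the resulting wire shape; the selector and text values are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// typeRequest mirrors the json tags of TypeRequest for illustration only.
type typeRequest struct {
	Selector string `json:"selector"`
	Text     string `json:"text"`
}

// encode returns the JSON form of a request.
func encode(r typeRequest) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(typeRequest{Selector: "#q", Text: "hello"}))
	// prints {"selector":"#q","text":"hello"}
}
```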
type TypeResponse ¶
type TypeResponse struct {
Status string `json:"status"`
}
TypeResponse is the output for the browser_type tool.
type WaitRequest ¶
type WaitRequest struct {
Selector string `json:"selector,omitempty" jsonschema:"description=CSS selector to wait for visibility."`
Duration string `json:"duration,omitempty" jsonschema:"description=Duration to wait (e.g. '2s', '500ms')."`
NetworkIdle bool `json:"network_idle,omitempty" jsonschema:"description=If true, wait for network (HTML+images+CSS) to be idle."`
}
WaitRequest is the input for the browser_wait tool.
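The example values '2s' and '500ms' match the syntax accepted by Go's time.ParseDuration. Assuming the tool parses the field that way (the documentation does not say), a caller could validate a value up front as sketched here; parseWait is a hypothetical helper, and rejecting negative durations is an added assumption.

```go
package main

import (
	"fmt"
	"time"
)

// parseWait validates a duration string such as "2s" or "500ms".
func parseWait(s string) (time.Duration, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, fmt.Errorf("invalid duration %q: %w", s, err)
	}
	if d < 0 {
		return 0, fmt.Errorf("negative duration %q", s)
	}
	return d, nil
}

func main() {
	for _, s := range []string{"2s", "500ms", "soon"} {
		d, err := parseWait(s)
		fmt.Println(d, err)
	}
}
```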
type WaitResponse ¶
type WaitResponse struct {
Status string `json:"status"`
}
WaitResponse is the output for the browser_wait tool.