README
ΒΆ
kbr (ko-browser)
A simple, fast, token-efficient browser for AI agents β CLI + Go Library.
δΈζζζ‘£ β’ Quick Start β’ Agent Notes β’ Commands β’ Library β’ Snapshot Format Spec
ko-browser is a browser automation tool built in Go for AI agents. It exposes the same core model in two forms:
- a CLI for shell-driven agent workflows
- a Go library for embedding browser control into agent runtimes and tools
Its custom accessibility-tree snapshot format reduces prompt footprint by 46%+ compared with more verbose alternatives.
β¨ Key Features
- π Single binary β no Node.js, no Playwright runtime needed
- π€ AI-optimized snapshot format β
id: role "name" statessaves 46%+ tokens - π¦ Dual-use β works as a CLI tool AND a Go library (
go get) - β‘ Fast startup β ~50ms (Go binary) vs ~500ms (Node.js-based tools)
- π’ Simple element references β
click 5 - π Optional OCR β Tesseract integration via
-tags=ocrbuild flag for image-heavy pages - π ~86 commands β broad coverage for browser automation workflows
Snapshot Format Comparison
ββ kbr (46% fewer tokens) βββββββββββ ββ verbose snapshot output βββββββββββ
β Page: "Example" β β - document "Example" β
β β β - navigation "main": β
β 1: link "Home" β β - link "Home" [ref=@e1] β
β 2: link "About" β β - link "About" [ref=@e2] β
β 3: textbox "Search" focused β β - search: β
β 4: button "Go" β β - textbox "Search" [ref=@e3] β
β β β - button "Go" [ref=@e4] β
ββββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββββ
π Read the full Snapshot Format Specification for detailed design decisions, BNF grammar, and examples. (δΈζη)
π¦ Installation
Choose an install path
| Use case | Recommended path | Notes |
|---|---|---|
| macOS local usage | Homebrew | Installs an OCR-enabled build and pulls in tesseract |
| Linux/macOS manual deployment | GitHub Releases | Download the prebuilt archive and make sure Tesseract runtime libraries are present |
| Go-based integration or custom builds | From source | Best when embedding kbr into your own toolchain |
Homebrew
brew tap libi/tap
brew install ko-browser
# or without tapping first:
brew install libi/tap/ko-browser
Homebrew installs
kbrwith OCR enabled. It pulls intesseractautomatically and builds from source. The tap repository is libi/homebrew-tap.
Pre-built binaries
Download from GitHub Releases:
Release binaries are also built with OCR enabled. Install Tesseract first so the runtime OCR libraries are available:
brew install tesseracton macOS,apt install libtesseract-devon Linux.
# macOS (Apple Silicon)
curl -LO https://github.com/libi/ko-browser/releases/latest/download/ko-browser-darwin-arm64.tar.gz
tar xzf ko-browser-darwin-arm64.tar.gz
mv kbr /usr/local/bin/kbr
# macOS (Intel)
curl -LO https://github.com/libi/ko-browser/releases/latest/download/ko-browser-darwin-amd64.tar.gz
tar xzf ko-browser-darwin-amd64.tar.gz
mv kbr /usr/local/bin/kbr
# Linux (amd64)
curl -LO https://github.com/libi/ko-browser/releases/latest/download/ko-browser-linux-amd64.tar.gz
tar xzf ko-browser-linux-amd64.tar.gz
mv kbr /usr/local/bin/kbr
From source
# Install kbr with OCR support (requires Tesseract to be installed)
CGO_ENABLED=1 go install -tags=ocr github.com/libi/ko-browser/cmd/kbr@latest
OCR is required for the published packages. This requires Tesseract:
brew install tesseract(macOS) /apt install libtesseract-dev(Linux).
Install Chrome (if not already installed)
kbr install # check & download Chromium
kbr install --with-deps # also install system dependencies (Linux)
Runtime requirements
- Chrome or Chromium is required. Run
kbr installif you do not already have a compatible browser. - OCR-enabled builds expect Tesseract runtime libraries on the host.
- On Linux CI or sandboxed environments, Chrome may require
--no-sandbox-style fallbacks depending on the runner configuration.
π Quick Start
Agent loop
This is the most common shell-driven agent workflow:
kbr open https://example.com
kbr snapshot
kbr click 5
kbr type 8 "hello world"
kbr press Enter
kbr wait load
kbr get text 12
Use snapshot to get the current page structure, act on numeric refs, then re-snapshot after the page changes.
CLI
# Open a page and take a snapshot
kbr open https://www.google.com
kbr snapshot
# Output:
# Page: "Google"
#
# 1: link "Gmail"
# 2: link "Images"
# 3: textbox "Search" focused
# 4: button "Google Search"
# 5: button "I'm Feeling Lucky"
# Interact with elements by ID
kbr click 3
kbr type 3 "ko-browser github"
kbr press Enter
# Take a screenshot
kbr screenshot result.png
# Close the browser
kbr close
Go Library
package main
import (
"fmt"
"log"
"github.com/libi/ko-browser/browser"
)
func main() {
b, err := browser.New(browser.Options{Headless: true})
if err != nil {
log.Fatal(err)
}
defer b.Close()
b.Open("https://www.google.com")
snap, _ := b.Snapshot()
fmt.Println(snap.Text)
// 1: link "Gmail"
// 2: link "Images"
// 3: textbox "Search" focused
// ...
b.Click(3)
b.Type(3, "ko-browser github")
b.Press("Enter")
}
π Commands
Common flows
| Goal | Commands |
|---|---|
| Navigate and inspect | open, snapshot, get title, get url |
| Interact with forms | click, type, fill, select, check, press |
| Wait for state changes | wait load, wait selector, wait text, wait url |
| Debug and verify | screenshot, console messages, errors list, highlight |
| Work with sessions | tab *, cookies *, storage *, state * |
Core Interaction
| Command | CLI Usage | Library API |
|---|---|---|
| open | kbr open <url> |
b.Open(url) |
| click | kbr click <id> |
b.Click(id) |
| dblclick | kbr dblclick <id> |
b.DblClick(id) |
| type | kbr type <id> "text" |
b.Type(id, text) |
| fill | kbr fill <id> "text" |
b.Fill(id, text) |
| press | kbr press <key> |
b.Press(key) |
| hover | kbr hover <id> |
b.Hover(id) |
| focus | kbr focus <id> |
b.Focus(id) |
| check | kbr check <id> |
b.Check(id) |
| uncheck | kbr uncheck <id> |
b.Uncheck(id) |
| select | kbr select <id> "val" |
b.Select(id, vals...) |
| scroll | kbr scroll down 500 |
b.Scroll("down", 500) |
| drag | kbr drag <src> <dst> |
b.Drag(srcID, dstID) |
| close | kbr close |
b.Close() |
Keyboard
kbr press Enter
kbr press Control+a
kbr keyboard type "Hello, World!"
kbr keyboard inserttext "pasted content"
Snapshot & Screenshot
kbr snapshot # full accessibility tree
kbr snapshot -i # interactive elements only
kbr snapshot -c # compact mode
kbr snapshot -d 5 # max depth 5
kbr snapshot -C # include cursor elements
kbr snapshot -s "#main" # scope to CSS selector
kbr snapshot --ocr # with OCR for images
kbr screenshot out.png # viewport screenshot
kbr screenshot --full out.png # full page
kbr screenshot --annotate out.png # annotated with element IDs
Navigation
kbr back
kbr forward
kbr reload
Get Information
kbr get title # page title
kbr get url # current URL
kbr get text <id> # element inner text
kbr get html <id> # element innerHTML
kbr get value <id> # input value
kbr get attr <id> href # element attribute
kbr get count ".item" # count matching elements
kbr get box <id> # bounding box
kbr get styles <id> # computed styles
kbr get cdp-url # CDP WebSocket URL
Check State
kbr is visible <id> # β true/false
kbr is enabled <id>
kbr is checked <id>
Find Elements
kbr find role button --name Submit
kbr find text "Sign In"
kbr find label "Email"
kbr find placeholder "Search..."
kbr find alt "Logo"
kbr find title "tooltip"
kbr find testid "login-form"
kbr find first ".card"
kbr find last ".card"
kbr find nth 2 ".card"
# Add --exact for exact text matching
kbr find text "Sign In" --exact
Wait
kbr wait time 2s # wait 2 seconds
kbr wait selector "#loading" # wait for element
kbr wait url "**/dashboard" # wait for URL match
kbr wait load # wait for networkidle
kbr wait text "Welcome" # wait for text
kbr wait fn "window.ready" # wait for JS expression
kbr wait hidden "#spinner" # wait for element to hide
kbr wait download ./file.pdf # wait for download
Mouse (Low-level)
kbr mouse move 100 200
kbr mouse click 100 200
kbr mouse down 100 200
kbr mouse up 100 200
kbr mouse wheel 100 200 0 300 # x y deltaX deltaY
Tabs
kbr tab list
kbr tab new https://example.com
kbr tab switch 2
kbr tab close 1
Network
kbr network route "**/api/*" --action block
kbr network unroute "**/api/*"
kbr network requests
kbr network start-logging
kbr network clear
Storage & Cookies
kbr cookies get
kbr cookies set session_id "abc123" --domain example.com
kbr cookies delete session_id
kbr cookies clear
kbr storage get theme --type local
kbr storage set theme "dark" --type local
kbr storage delete theme --type local
kbr storage clear --type local
kbr storage list --type session
Browser Settings
kbr set viewport 1920 1080
kbr set viewport 1920 1080 2 # with 2x scale
kbr set device "iPhone 12"
kbr set geo 37.7749 -122.4194
kbr set offline true
kbr set headers '{"X-Custom":"value"}'
kbr set credentials admin secret
kbr set media dark
kbr set colorscheme dark
File Operations
kbr upload <id> ./document.pdf
kbr download <id> ./output/
JavaScript Evaluation
kbr eval "document.title"
kbr eval -b "ZG9jdW1lbnQudGl0bGU=" # base64 encoded
cat script.js | kbr eval --stdin
Diff
kbr diff snapshot # compare with last snapshot
kbr diff snapshot --baseline before.txt
kbr diff screenshot --baseline before.png
kbr diff url https://v1.example.com https://v2.example.com
Debug & Clipboard
kbr console messages
kbr console clear
kbr errors list
kbr highlight <id>
kbr inspect # open DevTools
kbr clipboard read
kbr clipboard write "text"
kbr clipboard copy # Ctrl+C
kbr clipboard paste # Ctrl+V
Trace & Record
kbr trace start
kbr trace stop ./trace.zip
kbr profiler start
kbr profiler stop ./profile.json
kbr record start ./recording
kbr record stop
Auth
kbr auth save github --url https://github.com/login --username user --password pass
kbr auth login github
kbr auth list
kbr auth show github
kbr auth delete github
Session Management
kbr session # show current session
kbr session list # list all active sessions
kbr --session test open example.com # use named session
Selector Syntax
kbr supports three selector formats, auto-detected:
| Input | Type | Example |
|---|---|---|
| Number | Snapshot ID | kbr click 5 |
| CSS | CSS Selector | kbr click "#submit" |
| XPath | XPath | kbr click "//button[@type='submit']" |
π§ Global Options
| Flag | Description |
|---|---|
--session <name> |
Isolated session name (default: "default") |
--headed |
Show browser window |
--json |
JSON output |
--timeout <duration> |
Operation timeout (default: 30s) |
--profile <path> |
Persistent Chrome user data directory |
--state <path> |
Load saved browser state (cookies + localStorage) |
--config <path> |
Config file path |
--user-agent <ua> |
Custom User-Agent |
--proxy <url> |
Proxy server URL |
--proxy-bypass <hosts> |
Hosts to bypass proxy |
--ignore-https-errors |
Ignore HTTPS certificate errors |
--allow-file-access |
Allow file:// URLs |
--extension <path> |
Load Chrome extension (repeatable) |
--args <args> |
Extra Chrome arguments |
--download-path <path> |
Default download directory |
--screenshot-dir <path> |
Default screenshot output directory |
--screenshot-format <fmt> |
Screenshot format: png, jpeg |
--content-boundaries |
Wrap output with boundary markers |
--debug |
Debug output |
π Library API
Installation
go get github.com/libi/ko-browser@latest
Quick Example
package main
import (
"fmt"
"time"
"github.com/libi/ko-browser/browser"
)
func main() {
b, _ := browser.New(browser.Options{
Headless: true,
Timeout: 30 * time.Second,
})
defer b.Close()
// Navigate
b.Open("https://www.baidu.com")
// Snapshot β interact
snap, _ := b.Snapshot()
fmt.Println(snap.Text)
// Page: "ηΎεΊ¦δΈδΈοΌδ½ ε°±η₯ι"
//
// 1: link "ζ°ι»"
// 2: link "hao123"
// 3: textbox "ζη΄’" focused
// 4: button "ηΎεΊ¦δΈδΈ"
b.Click(3)
b.Type(3, "hello world")
b.Press("Enter")
// Wait & screenshot
b.WaitLoad()
b.Screenshot("result.png")
}
Advanced: Direct AX Tree Access
import (
"github.com/libi/ko-browser/axtree"
"github.com/libi/ko-browser/browser"
"github.com/libi/ko-browser/ocr"
)
b, _ := browser.New(browser.Options{Headless: true})
defer b.Close()
b.Open("https://example.com")
// Low-level AX Tree API
rawNodes, _ := axtree.Extract(b.Context())
tree := axtree.BuildAndFilter(rawNodes)
text := axtree.Format(tree)
idMap := axtree.BuildIDMap(tree)
// Snapshot with OCR
snap, _ := b.Snapshot(browser.SnapshotOptions{
EnableOCR: true,
OCRLanguages: []string{"eng", "chi_sim"},
})
Connect to Existing Browser
// Connect via CDP WebSocket
b, _ := browser.Connect("ws://localhost:9222/devtools/browser/...", browser.Options{})
defer b.Close()
snap, _ := b.Snapshot()
fmt.Println(snap.Text)
Full API Reference
Click to expand all Browser methods
Navigation
Open(url string) errorBack() errorForward() errorReload() error
Snapshot
Snapshot(opts ...SnapshotOptions) (*SnapshotResult, error)
Interaction
Click(id int) errorClickNewTab(id int) errorDblClick(id int) errorType(id int, text string) errorFill(id int, text string) errorPress(key string) errorKeyboardType(text string) errorKeyboardInsertText(text string) errorHover(id int) errorFocus(id int) errorCheck(id int) errorUncheck(id int) errorSelect(id int, values ...string) errorScroll(direction string, pixels int) errorScrollIntoView(id int) error
Mouse
MouseMove(x, y float64) errorMouseClick(x, y float64, opts ...MouseOptions) errorMouseDown(x, y float64, opts ...MouseOptions) errorMouseUp(x, y float64, opts ...MouseOptions) errorMouseWheel(x, y, deltaX, deltaY float64) errorDrag(srcID, dstID int) errorDragCoords(srcX, srcY, dstX, dstY float64) error
Query
GetTitle() (string, error)GetURL() (string, error)GetText(id int) (string, error)GetHTML(id int) (string, error)GetValue(id int) (string, error)GetAttr(id int, name string) (string, error)GetCount(cssSelector string) (int, error)GetBox(id int) (*BoxResult, error)GetStyles(id int) (string, error)GetCDPURL() (string, error)
State
IsVisible(id int) (bool, error)IsEnabled(id int) (bool, error)IsChecked(id int) (bool, error)
Find
FindRole(role, name string, opts ...FindOption) (*FindResults, error)FindText(text string, opts ...FindOption) (*FindResults, error)FindLabel(label string, opts ...FindOption) (*FindResults, error)FindPlaceholder(text string, opts ...FindOption) (*FindResults, error)FindAlt(text string, opts ...FindOption) (*FindResults, error)FindTitle(text string, opts ...FindOption) (*FindResults, error)FindTestID(testID string) (*FindResults, error)FindFirst(css string) (*FindResults, error)FindLast(css string) (*FindResults, error)FindNth(css string, n int) (*FindResults, error)
Wait
Wait(d time.Duration) errorWaitSelector(css string, timeout ...time.Duration) errorWaitURL(pattern string, timeout ...time.Duration) errorWaitLoad(timeout ...time.Duration) errorWaitText(text string, timeout ...time.Duration) errorWaitFunc(expression string, timeout ...time.Duration) errorWaitHidden(css string, timeout ...time.Duration) errorWaitDownload(savePath string, timeout ...time.Duration) (string, error)
Screenshot & PDF
Screenshot(path string, opts ...ScreenshotOptions) errorScreenshotToBytes(opts ...ScreenshotOptions) ([]byte, error)ScreenshotAnnotated(path string, opts ...ScreenshotOptions) errorPDF(path string, opts ...PDFOptions) error
Eval
Eval(expression string) (string, error)
File
Upload(id int, files ...string) errorUploadCSS(css string, files ...string) errorDownload(id int, saveDir string, opts ...DownloadOptions) (string, error)
Tabs
TabList() ([]TabInfo, error)TabNew(url string) errorTabClose(index int) errorTabSwitch(index int) error
Network
NetworkRoute(pattern string, action RouteAction) errorNetworkUnroute(pattern string) errorNetworkRequests() ([]NetworkRequest, error)NetworkStartLogging() errorNetworkClearRequests()
Storage & Cookies
CookiesGet() ([]CookieInfo, error)CookieSet(cookie CookieInfo) errorCookieDelete(name string) errorCookiesClear() errorStorageGet(storageType, key string) (string, error)StorageSet(storageType, key, value string) errorStorageDelete(storageType, key string) errorStorageClear(storageType string) errorStorageGetAll(storageType string) (map[string]string, error)
Settings
SetViewport(width, height int, scale ...float64) errorSetDevice(name string) errorSetGeo(lat, lon float64) errorClearGeo() errorSetOffline(offline bool) errorSetHeaders(headers map[string]string) errorSetCredentials(user, pass string) errorSetMedia(features ...MediaFeature) errorSetColorScheme(scheme string) error
Debug
ConsoleStart() errorConsoleMessages() ([]ConsoleMessage, error)ConsoleMessagesByLevel(level string) ([]ConsoleMessage, error)ConsoleClear()PageErrors() ([]PageError, error)PageErrorsClear()Highlight(id int) errorOpenDevTools() error
Clipboard
ClipboardRead() (string, error)ClipboardWrite(text string) errorClipboardCopy() errorClipboardPaste() error
Diff
DiffSnapshot(opts ...DiffSnapshotOptions) (*DiffSnapshotResult, error)DiffScreenshot(opts DiffScreenshotOptions) (*DiffScreenshotResult, error)DiffURL(url1, url2 string, opts ...DiffURLOptions) (*DiffURLResult, error)
Trace & Record
TraceStart(categories ...string) errorTraceStop(outputPath string) errorProfilerStart() errorProfilerStop(outputPath string) errorRecordStart(outputPath string) errorRecordStop() (int, error)
State
ExportState(outputPath string) errorImportState(inputPath string) errorApplyState(state *BrowserState) error
βοΈ Configuration
kbr loads configuration in this priority (low β high):
~/.ko-browser/config.jsonβ user-level defaults./ko-browser.jsonβ project-level overrides- Environment variables (
KO_BROWSER_*) - CLI flags β override everything
{
"headed": true,
"proxy": "http://localhost:8080",
"profile": "./browser-data",
"screenshotFormat": "jpeg",
"downloadPath": "./downloads"
}
ποΈ Architecture
kbr
βββ cmd/kbr/ β
CLI entry point β `go install .../cmd/kbr@latest`
βββ browser/ β
Public Go library β core browser API
βββ axtree/ β
Public β AX Tree extraction, filtering, formatting
βββ selector/ β
Public β element selector parsing (ID/CSS/XPath)
βββ ocr/ β
Public β Tesseract OCR engine (build tag: ocr)
βββ cmd/ CLI β cobra command definitions
βββ internal/ CLI-only β daemon, session management
The browser/, axtree/, selector/, and ocr/ packages are all public and importable via go get. The internal/ package is only used by the CLI daemon. OCR requires -tags=ocr at build time.
Agent Notes
If you are an agent reading this README, use kbr as an interactive browser with a compact text interface.
- Prefer
kbr snapshotas the primary page representation. It is the most token-efficient output and gives stable numeric refs. - Use numeric refs like
kbr click 5,kbr type 8 "hello", andkbr get text 12after taking a fresh snapshot. - Re-snapshot after navigation or DOM mutations. Element IDs are snapshot-local, not permanent selectors.
- Use
kbr get,kbr find, andkbr waitfor targeted reads instead of repeatedly taking full screenshots. - Use OCR only when the page content is image-heavy or inaccessible through the DOM/accessibility tree.
- Prefer CSS/XPath selectors only when you already know the target precisely or when numeric refs are unavailable.
- The published Homebrew package and release binaries are built with OCR enabled, so Tesseract runtime libraries are expected on the host.
Install as an agent skill
If your agent framework supports local skills, install kbr first, then copy the repository skill directory into your agent's skills directory as ko-browser/.
<agent-skills-dir>/
ko-browser/
SKILL.md
The source skill lives here:
Example:
mkdir -p <agent-skills-dir>/ko-browser
cp skills/ko-browser/SKILL.md <agent-skills-dir>/ko-browser/SKILL.md
After that, make sure the kbr binary is available in the agent runtime PATH. For agent hosts that also support executable-tool registration, you can expose kbr directly in addition to the skill file.
π€ Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
git clone https://github.com/libi/ko-browser.git
cd ko-browser
go build -o kbr ./cmd/kbr/ # without OCR
go build -tags=ocr -o kbr ./cmd/kbr/ # with OCR
go test ./tests/ -v -timeout 180s
π License
MIT