kbr (ko-browser)
A simple, fast, token-efficient browser for AI agents: CLI + Go Library
Quick Start • Commands • Library • Snapshot Format Spec • Chinese documentation
ko-browser is a browser automation tool built in Go, designed for AI agents. It provides both a CLI tool and a Go library, with a custom accessibility tree snapshot format that uses 46%+ fewer tokens than alternatives such as agent-browser.
Key Features
- Single binary: no Node.js, no Playwright runtime needed
- AI-optimized snapshot format: `id: role "name" states` saves 46%+ tokens
- Dual-use: works as a CLI tool AND a Go library (`go get`)
- Fast startup: ~50ms (Go binary) vs ~500ms (Node.js-based tools)
- Simple element references: `click 5`
- Optional OCR: Tesseract integration via the `-tags=ocr` build flag for image-heavy pages
- ~86 commands: full parity with agent-browser v0.19.0
Snapshot Format Comparison
┌─ kbr (46% fewer tokens) ───────────┐  ┌─ agent-browser ────────────────────┐
│ Page: "Example"                    │  │ - document "Example"               │
│                                    │  │   - navigation "main":             │
│ 1: link "Home"                     │  │     - link "Home" [ref=@e1]        │
│ 2: link "About"                    │  │     - link "About" [ref=@e2]       │
│ 3: textbox "Search" focused        │  │   - search:                        │
│ 4: button "Go"                     │  │     - textbox "Search" [ref=@e3]   │
│                                    │  │     - button "Go" [ref=@e4]        │
└────────────────────────────────────┘  └────────────────────────────────────┘
Read the full Snapshot Format Specification for detailed design decisions, BNF grammar, and examples. (Chinese version)
Installation
Homebrew
brew tap libi/tap
brew install ko-browser
Homebrew installs `kbr` with OCR enabled. It pulls in `tesseract` automatically and builds from source.
Pre-built binaries
Download from GitHub Releases:
Release binaries are also built with OCR enabled. Install Tesseract first so the runtime OCR libraries are available: `brew install tesseract` on macOS, `apt install libtesseract-dev` on Linux.
# macOS (Apple Silicon)
curl -LO https://github.com/libi/ko-browser/releases/latest/download/ko-browser-darwin-arm64.tar.gz
tar xzf ko-browser-darwin-arm64.tar.gz
mv kbr /usr/local/bin/kbr
# macOS (Intel)
curl -LO https://github.com/libi/ko-browser/releases/latest/download/ko-browser-darwin-amd64.tar.gz
tar xzf ko-browser-darwin-amd64.tar.gz
mv kbr /usr/local/bin/kbr
# Linux (amd64)
curl -LO https://github.com/libi/ko-browser/releases/latest/download/ko-browser-linux-amd64.tar.gz
tar xzf ko-browser-linux-amd64.tar.gz
mv kbr /usr/local/bin/kbr
From source
# Install kbr with OCR support (requires Tesseract to be installed)
CGO_ENABLED=1 go install -tags=ocr github.com/libi/ko-browser/cmd/kbr@latest
The published packages are built with OCR, which requires Tesseract: `brew install tesseract` (macOS) / `apt install libtesseract-dev` (Linux).
Install Chrome (if not already installed)
kbr install # check & download Chromium
kbr install --with-deps # also install system dependencies (Linux)
Quick Start
CLI
# Open a page and take a snapshot
kbr open https://www.google.com
kbr snapshot
# Output:
# Page: "Google"
#
# 1: link "Gmail"
# 2: link "Images"
# 3: textbox "Search" focused
# 4: button "Google Search"
# 5: button "I'm Feeling Lucky"
# Interact with elements by ID
kbr click 3
kbr type 3 "ko-browser github"
kbr press Enter
# Take a screenshot
kbr screenshot result.png
# Close the browser
kbr close
Go Library
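package main

import (
	"fmt"
	"log"

	"github.com/libi/ko-browser/browser"
)

func main() {
	if err := run(); err != nil {
		log.Fatal(err)
	}
}

// run returns errors instead of exiting so the deferred Close always executes.
func run() error {
	b, err := browser.New(browser.Options{Headless: true})
	if err != nil {
		return err
	}
	defer b.Close()

	if err := b.Open("https://www.google.com"); err != nil {
		return err
	}
	snap, err := b.Snapshot()
	if err != nil {
		return err
	}
	fmt.Println(snap.Text)
	// 1: link "Gmail"
	// 2: link "Images"
	// 3: textbox "Search" focused
	// ...

	// Interact with elements by snapshot ID.
	if err := b.Click(3); err != nil {
		return err
	}
	if err := b.Type(3, "ko-browser github"); err != nil {
		return err
	}
	return b.Press("Enter")
}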
Commands
Core Interaction
| Command | CLI Usage | Library API |
|---|---|---|
| open | kbr open <url> | b.Open(url) |
| click | kbr click <id> | b.Click(id) |
| dblclick | kbr dblclick <id> | b.DblClick(id) |
| type | kbr type <id> "text" | b.Type(id, text) |
| fill | kbr fill <id> "text" | b.Fill(id, text) |
| press | kbr press <key> | b.Press(key) |
| hover | kbr hover <id> | b.Hover(id) |
| focus | kbr focus <id> | b.Focus(id) |
| check | kbr check <id> | b.Check(id) |
| uncheck | kbr uncheck <id> | b.Uncheck(id) |
| select | kbr select <id> "val" | b.Select(id, vals...) |
| scroll | kbr scroll down 500 | b.Scroll("down", 500) |
| drag | kbr drag <src> <dst> | b.Drag(srcID, dstID) |
| close | kbr close | b.Close() |
Keyboard
kbr press Enter
kbr press Control+a
kbr keyboard type "Hello, World!"
kbr keyboard inserttext "pasted content"
Snapshot & Screenshot
kbr snapshot # full accessibility tree
kbr snapshot -i # interactive elements only
kbr snapshot -c # compact mode
kbr snapshot -d 5 # max depth 5
kbr snapshot -C # include cursor elements
kbr snapshot -s "#main" # scope to CSS selector
kbr snapshot --ocr # with OCR for images
kbr screenshot out.png # viewport screenshot
kbr screenshot --full out.png # full page
kbr screenshot --annotate out.png # annotated with element IDs
Navigation
kbr back
kbr forward
kbr reload
Get Information
kbr get title # page title
kbr get url # current URL
kbr get text <id> # element inner text
kbr get html <id> # element innerHTML
kbr get value <id> # input value
kbr get attr <id> href # element attribute
kbr get count ".item" # count matching elements
kbr get box <id> # bounding box
kbr get styles <id> # computed styles
kbr get cdp-url # CDP WebSocket URL
Check State
kbr is visible <id> # → true/false
kbr is enabled <id>
kbr is checked <id>
Find Elements
kbr find role button --name Submit
kbr find text "Sign In"
kbr find label "Email"
kbr find placeholder "Search..."
kbr find alt "Logo"
kbr find title "tooltip"
kbr find testid "login-form"
kbr find first ".card"
kbr find last ".card"
kbr find nth 2 ".card"
# Add --exact for exact text matching
kbr find text "Sign In" --exact
Wait
kbr wait time 2s # wait 2 seconds
kbr wait selector "#loading" # wait for element
kbr wait url "**/dashboard" # wait for URL match
kbr wait load # wait for networkidle
kbr wait text "Welcome" # wait for text
kbr wait fn "window.ready" # wait for JS expression
kbr wait hidden "#spinner" # wait for element to hide
kbr wait download ./file.pdf # wait for download
Mouse (Low-level)
kbr mouse move 100 200
kbr mouse click 100 200
kbr mouse down 100 200
kbr mouse up 100 200
kbr mouse wheel 100 200 0 300 # x y deltaX deltaY
Tabs
kbr tab list
kbr tab new https://example.com
kbr tab switch 2
kbr tab close 1
Network
kbr network route "**/api/*" --action block
kbr network unroute "**/api/*"
kbr network requests
kbr network start-logging
kbr network clear
Storage & Cookies
kbr cookies get
kbr cookies set session_id "abc123" --domain example.com
kbr cookies delete session_id
kbr cookies clear
kbr storage get theme --type local
kbr storage set theme "dark" --type local
kbr storage delete theme --type local
kbr storage clear --type local
kbr storage list --type session
Browser Settings
kbr set viewport 1920 1080
kbr set viewport 1920 1080 2 # with 2x scale
kbr set device "iPhone 12"
kbr set geo 37.7749 -122.4194
kbr set offline true
kbr set headers '{"X-Custom":"value"}'
kbr set credentials admin secret
kbr set media dark
kbr set colorscheme dark
File Operations
kbr upload <id> ./document.pdf
kbr download <id> ./output/
JavaScript Evaluation
kbr eval "document.title"
kbr eval -b "ZG9jdW1lbnQudGl0bGU=" # base64 encoded
cat script.js | kbr eval --stdin
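The `-b` form above takes the script as a base64 string, which avoids shell-quoting problems when an agent generates JavaScript. A minimal sketch of producing that argument in Go, assuming standard padded base64 (the sample value above decodes that way); `EncodeScript` is a hypothetical helper name, not part of kbr.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// EncodeScript returns the standard (padded) base64 encoding of a JavaScript
// snippet, suitable for passing as the argument of `kbr eval -b`.
func EncodeScript(js string) string {
	return base64.StdEncoding.EncodeToString([]byte(js))
}

func main() {
	// Prints the same value used in the example above: ZG9jdW1lbnQudGl0bGU=
	fmt.Println(EncodeScript("document.title"))
}
```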
Diff
kbr diff snapshot # compare with last snapshot
kbr diff snapshot --baseline before.txt
kbr diff screenshot --baseline before.png
kbr diff url https://v1.example.com https://v2.example.com
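To illustrate what a snapshot comparison can tell an agent, here is a rough line-level diff of two snapshot texts. It is only a sketch of the idea and makes no claim about how `kbr diff snapshot` is implemented; `DiffLines` is a hypothetical helper, and it treats each snapshot as a set of lines, so repeated identical lines are not counted.

```go
package main

import (
	"fmt"
	"strings"
)

// DiffLines reports the lines that appear only in before (removed) and only
// in after (added), each in their original order.
func DiffLines(before, after string) (removed, added []string) {
	b := strings.Split(before, "\n")
	a := strings.Split(after, "\n")

	inBefore := make(map[string]bool, len(b))
	for _, l := range b {
		inBefore[l] = true
	}
	inAfter := make(map[string]bool, len(a))
	for _, l := range a {
		inAfter[l] = true
	}

	for _, l := range b {
		if !inAfter[l] {
			removed = append(removed, l)
		}
	}
	for _, l := range a {
		if !inBefore[l] {
			added = append(added, l)
		}
	}
	return removed, added
}

func main() {
	before := "1: link \"Home\"\n3: textbox \"Search\" focused"
	after := "1: link \"Home\"\n3: textbox \"Search\"\n5: link \"Results\""

	removed, added := DiffLines(before, after)
	for _, l := range removed {
		fmt.Println("-", l)
	}
	for _, l := range added {
		fmt.Println("+", l)
	}
}
```

A state change such as losing `focused` shows up as one removed and one added line for the same ID.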
Debug & Clipboard
kbr console messages
kbr console clear
kbr errors list
kbr highlight <id>
kbr inspect # open DevTools
kbr clipboard read
kbr clipboard write "text"
kbr clipboard copy # Ctrl+C
kbr clipboard paste # Ctrl+V
Trace & Record
kbr trace start
kbr trace stop ./trace.zip
kbr profiler start
kbr profiler stop ./profile.json
kbr record start ./recording
kbr record stop
Auth
kbr auth save github --url https://github.com/login --username user --password pass
kbr auth login github
kbr auth list
kbr auth show github
kbr auth delete github
Session Management
kbr session # show current session
kbr session list # list all active sessions
kbr --session test open example.com # use named session
Selector Syntax
kbr supports three selector formats, auto-detected:
| Input | Type | Example |
|---|---|---|
| Number | Snapshot ID | kbr click 5 |
| CSS | CSS Selector | kbr click "#submit" |
| XPath | XPath | kbr click "//button[@type='submit']" |
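The auto-detection in the table above can be pictured as a small classifier. The heuristic below is an assumption made for illustration (all digits means snapshot ID, a leading `/` or `(` means XPath, anything else is CSS); the actual rules live in the selector/ package and may differ. `Kind` and `Classify` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Kind is the selector type guessed from the input string.
type Kind int

const (
	KindID Kind = iota
	KindCSS
	KindXPath
)

func (k Kind) String() string {
	switch k {
	case KindID:
		return "snapshot ID"
	case KindXPath:
		return "XPath"
	default:
		return "CSS"
	}
}

// Classify guesses the selector type. Assumed heuristic: digits only is a
// snapshot ID, a leading "/" or "(" is XPath, everything else is CSS.
func Classify(s string) Kind {
	s = strings.TrimSpace(s)
	if s != "" && strings.Trim(s, "0123456789") == "" {
		return KindID
	}
	if strings.HasPrefix(s, "/") || strings.HasPrefix(s, "(") {
		return KindXPath
	}
	return KindCSS
}

func main() {
	for _, in := range []string{"5", "#submit", "//button[@type='submit']"} {
		fmt.Printf("%q -> %s\n", in, Classify(in))
	}
}
```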
Global Options
| Flag | Description |
|---|---|
| --session <name> | Isolated session name (default: "default") |
| --headed | Show browser window |
| --json | JSON output |
| --timeout <duration> | Operation timeout (default: 30s) |
| --profile <path> | Persistent Chrome user data directory |
| --state <path> | Load saved browser state (cookies + localStorage) |
| --config <path> | Config file path |
| --user-agent <ua> | Custom User-Agent |
| --proxy <url> | Proxy server URL |
| --proxy-bypass <hosts> | Hosts to bypass proxy |
| --ignore-https-errors | Ignore HTTPS certificate errors |
| --allow-file-access | Allow file:// URLs |
| --extension <path> | Load Chrome extension (repeatable) |
| --args <args> | Extra Chrome arguments |
| --download-path <path> | Default download directory |
| --screenshot-dir <path> | Default screenshot output directory |
| --screenshot-format <fmt> | Screenshot format: png, jpeg |
| --content-boundaries | Wrap output with boundary markers |
| --debug | Debug output |
Library API
Installation
go get github.com/libi/ko-browser@latest
Quick Example
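package main

import (
	"fmt"
	"log"
	"time"

	"github.com/libi/ko-browser/browser"
)

func main() {
	if err := run(); err != nil {
		log.Fatal(err)
	}
}

// run returns errors instead of exiting so the deferred Close always executes.
func run() error {
	b, err := browser.New(browser.Options{
		Headless: true,
		Timeout:  30 * time.Second,
	})
	if err != nil {
		return err
	}
	defer b.Close()

	// Navigate
	if err := b.Open("https://www.baidu.com"); err != nil {
		return err
	}

	// Snapshot → interact
	snap, err := b.Snapshot()
	if err != nil {
		return err
	}
	fmt.Println(snap.Text)
	// Page: "百度一下，你就知道"
	//
	// 1: link "新闻"
	// 2: link "hao123"
	// 3: textbox "搜索" focused
	// 4: button "百度一下"

	if err := b.Click(3); err != nil {
		return err
	}
	if err := b.Type(3, "hello world"); err != nil {
		return err
	}
	if err := b.Press("Enter"); err != nil {
		return err
	}

	// Wait & screenshot
	if err := b.WaitLoad(); err != nil {
		return err
	}
	return b.Screenshot("result.png")
}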
Advanced: Direct AX Tree Access
import (
	"github.com/libi/ko-browser/axtree"
	"github.com/libi/ko-browser/browser"
	"github.com/libi/ko-browser/ocr"
)

b, _ := browser.New(browser.Options{Headless: true})
defer b.Close()
b.Open("https://example.com")

// Low-level AX Tree API
rawNodes, _ := axtree.Extract(b.Context())
tree := axtree.BuildAndFilter(rawNodes)
text := axtree.Format(tree)
idMap := axtree.BuildIDMap(tree)

// Snapshot with OCR
snap, _ := b.Snapshot(browser.SnapshotOptions{
	EnableOCR:    true,
	OCRLanguages: []string{"eng", "chi_sim"},
})
Connect to Existing Browser
// Connect via CDP WebSocket
b, _ := browser.Connect("ws://localhost:9222/devtools/browser/...", browser.Options{})
defer b.Close()
snap, _ := b.Snapshot()
fmt.Println(snap.Text)
Full API Reference
Navigation
- Open(url string) error
- Back() error
- Forward() error
- Reload() error
Snapshot
- Snapshot(opts ...SnapshotOptions) (*SnapshotResult, error)
Interaction
- Click(id int) error
- ClickNewTab(id int) error
- DblClick(id int) error
- Type(id int, text string) error
- Fill(id int, text string) error
- Press(key string) error
- KeyboardType(text string) error
- KeyboardInsertText(text string) error
- Hover(id int) error
- Focus(id int) error
- Check(id int) error
- Uncheck(id int) error
- Select(id int, values ...string) error
- Scroll(direction string, pixels int) error
- ScrollIntoView(id int) error
Mouse
- MouseMove(x, y float64) error
- MouseClick(x, y float64, opts ...MouseOptions) error
- MouseDown(x, y float64, opts ...MouseOptions) error
- MouseUp(x, y float64, opts ...MouseOptions) error
- MouseWheel(x, y, deltaX, deltaY float64) error
- Drag(srcID, dstID int) error
- DragCoords(srcX, srcY, dstX, dstY float64) error
Query
- GetTitle() (string, error)
- GetURL() (string, error)
- GetText(id int) (string, error)
- GetHTML(id int) (string, error)
- GetValue(id int) (string, error)
- GetAttr(id int, name string) (string, error)
- GetCount(cssSelector string) (int, error)
- GetBox(id int) (*BoxResult, error)
- GetStyles(id int) (string, error)
- GetCDPURL() (string, error)
State
- IsVisible(id int) (bool, error)
- IsEnabled(id int) (bool, error)
- IsChecked(id int) (bool, error)
Find
- FindRole(role, name string, opts ...FindOption) (*FindResults, error)
- FindText(text string, opts ...FindOption) (*FindResults, error)
- FindLabel(label string, opts ...FindOption) (*FindResults, error)
- FindPlaceholder(text string, opts ...FindOption) (*FindResults, error)
- FindAlt(text string, opts ...FindOption) (*FindResults, error)
- FindTitle(text string, opts ...FindOption) (*FindResults, error)
- FindTestID(testID string) (*FindResults, error)
- FindFirst(css string) (*FindResults, error)
- FindLast(css string) (*FindResults, error)
- FindNth(css string, n int) (*FindResults, error)
Wait
- Wait(d time.Duration) error
- WaitSelector(css string, timeout ...time.Duration) error
- WaitURL(pattern string, timeout ...time.Duration) error
- WaitLoad(timeout ...time.Duration) error
- WaitText(text string, timeout ...time.Duration) error
- WaitFunc(expression string, timeout ...time.Duration) error
- WaitHidden(css string, timeout ...time.Duration) error
- WaitDownload(savePath string, timeout ...time.Duration) (string, error)
Screenshot & PDF
- Screenshot(path string, opts ...ScreenshotOptions) error
- ScreenshotToBytes(opts ...ScreenshotOptions) ([]byte, error)
- ScreenshotAnnotated(path string, opts ...ScreenshotOptions) error
- PDF(path string, opts ...PDFOptions) error
Eval
- Eval(expression string) (string, error)
File
- Upload(id int, files ...string) error
- UploadCSS(css string, files ...string) error
- Download(id int, saveDir string, opts ...DownloadOptions) (string, error)
Tabs
- TabList() ([]TabInfo, error)
- TabNew(url string) error
- TabClose(index int) error
- TabSwitch(index int) error
Network
- NetworkRoute(pattern string, action RouteAction) error
- NetworkUnroute(pattern string) error
- NetworkRequests() ([]NetworkRequest, error)
- NetworkStartLogging() error
- NetworkClearRequests()
Storage & Cookies
- CookiesGet() ([]CookieInfo, error)
- CookieSet(cookie CookieInfo) error
- CookieDelete(name string) error
- CookiesClear() error
- StorageGet(storageType, key string) (string, error)
- StorageSet(storageType, key, value string) error
- StorageDelete(storageType, key string) error
- StorageClear(storageType string) error
- StorageGetAll(storageType string) (map[string]string, error)
Settings
- SetViewport(width, height int, scale ...float64) error
- SetDevice(name string) error
- SetGeo(lat, lon float64) error
- ClearGeo() error
- SetOffline(offline bool) error
- SetHeaders(headers map[string]string) error
- SetCredentials(user, pass string) error
- SetMedia(features ...MediaFeature) error
- SetColorScheme(scheme string) error
Debug
- ConsoleStart() error
- ConsoleMessages() ([]ConsoleMessage, error)
- ConsoleMessagesByLevel(level string) ([]ConsoleMessage, error)
- ConsoleClear()
- PageErrors() ([]PageError, error)
- PageErrorsClear()
- Highlight(id int) error
- OpenDevTools() error
Clipboard
- ClipboardRead() (string, error)
- ClipboardWrite(text string) error
- ClipboardCopy() error
- ClipboardPaste() error
Diff
- DiffSnapshot(opts ...DiffSnapshotOptions) (*DiffSnapshotResult, error)
- DiffScreenshot(opts DiffScreenshotOptions) (*DiffScreenshotResult, error)
- DiffURL(url1, url2 string, opts ...DiffURLOptions) (*DiffURLResult, error)
Trace & Record
- TraceStart(categories ...string) error
- TraceStop(outputPath string) error
- ProfilerStart() error
- ProfilerStop(outputPath string) error
- RecordStart(outputPath string) error
- RecordStop() (int, error)
State
- ExportState(outputPath string) error
- ImportState(inputPath string) error
- ApplyState(state *BrowserState) error
Configuration
kbr loads configuration in this priority (low → high):
1. ~/.ko-browser/config.json: user-level defaults
2. ./ko-browser.json: project-level overrides
3. Environment variables (KO_BROWSER_*)
4. CLI flags: override everything
{
  "headed": true,
  "proxy": "http://localhost:8080",
  "profile": "./browser-data",
  "screenshotFormat": "jpeg",
  "downloadPath": "./downloads"
}
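The low-to-high priority order can be pictured as layers applied one after another, with later layers overwriting earlier ones. The sketch below assumes a simple key-by-key override and uses plain maps as stand-ins for each source; `Layer` and `Merge` are hypothetical names, and the way kbr actually maps KO_BROWSER_* variables or flags to keys is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Layer is one configuration source decoded into a generic map.
type Layer map[string]any

// Merge applies layers from lowest to highest priority; a key set in a later
// layer replaces the value from any earlier layer.
func Merge(layers ...Layer) Layer {
	out := Layer{}
	for _, l := range layers {
		for k, v := range l {
			out[k] = v
		}
	}
	return out
}

// mustParse decodes a JSON object into a Layer and panics on invalid input.
func mustParse(s string) Layer {
	var l Layer
	if err := json.Unmarshal([]byte(s), &l); err != nil {
		panic(err)
	}
	return l
}

func main() {
	user := mustParse(`{"headed": true, "proxy": "http://localhost:8080"}`)
	project := mustParse(`{"headed": false, "screenshotFormat": "jpeg"}`)
	env := Layer{"downloadPath": "./downloads"} // stand-in for KO_BROWSER_* variables
	flags := Layer{"headed": true}              // stand-in for the --headed flag

	cfg := Merge(user, project, env, flags)
	fmt.Println(cfg["headed"], cfg["proxy"], cfg["screenshotFormat"], cfg["downloadPath"])
}
```

Here the project file turns `headed` off, and the CLI flag, being the highest layer, turns it back on.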
Architecture
kbr
├── cmd/kbr/     CLI entry point: `go install .../cmd/kbr@latest`
├── browser/     Public Go library: core browser API
├── axtree/      Public: AX Tree extraction, filtering, formatting
├── selector/    Public: element selector parsing (ID/CSS/XPath)
├── ocr/         Public: Tesseract OCR engine (build tag: ocr)
├── cmd/         CLI: cobra command definitions
└── internal/    CLI-only: daemon, session management
The browser/, axtree/, selector/, and ocr/ packages are all public and importable via go get. The internal/ package is only used by the CLI daemon. OCR requires -tags=ocr at build time.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
git clone https://github.com/libi/ko-browser.git
cd ko-browser
go build -o kbr ./cmd/kbr/ # without OCR
go build -tags=ocr -o kbr ./cmd/kbr/ # with OCR
go test ./tests/ -v -timeout 180s
License
MIT