ko-browser

Version: v0.1.1
Published: Apr 2, 2026 License: MIT

kbr (ko-browser)

A simple, fast, token-efficient browser for AI agents: CLI + Go Library

Quick Start • Commands • Library • Snapshot Format Spec • Chinese docs (中文文档)


ko-browser is a browser automation tool built in Go, designed for AI agents. It provides both a CLI tool and a Go library with a custom accessibility tree snapshot format that saves 46%+ tokens compared to alternatives.

✨ Key Features

  • 🚀 Single binary: no Node.js, no Playwright runtime needed
  • 🤖 AI-optimized snapshot format: each line is id: role "name" states, which saves 46%+ tokens
  • 📦 Dual-use: works as a CLI tool AND a Go library (go get)
  • ⚡ Fast startup: ~50ms (Go binary) vs ~500ms (Node.js-based tools)
  • 🔢 Simple element references: click 5
  • 🔍 Optional OCR: Tesseract integration via -tags=ocr build flag for image-heavy pages
  • 🌐 ~86 commands: full parity with agent-browser v0.19.0

Snapshot Format Comparison

┌─ kbr (46% fewer tokens) ───────────┐  ┌─ agent-browser ────────────────────┐
│ Page: "Example"                    │  │ - document "Example"               │
│                                    │  │   - navigation "main":             │
│ 1: link "Home"                     │  │     - link "Home" [ref=@e1]        │
│ 2: link "About"                    │  │     - link "About" [ref=@e2]       │
│ 3: textbox "Search" focused        │  │   - search:                        │
│ 4: button "Go"                     │  │     - textbox "Search" [ref=@e3]   │
│                                    │  │     - button "Go" [ref=@e4]        │
└────────────────────────────────────┘  └────────────────────────────────────┘

📖 Read the full Snapshot Format Specification for detailed design decisions, BNF grammar, and examples. (Chinese version: 中文版)
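
To show how an agent might consume the format above, here is a minimal sketch of a parser for one snapshot line of the form id: role "name" states. It is not part of ko-browser: the Element type and parseLine function are hypothetical names, and the sketch assumes names are double-quoted without escaped quotes. The specification's BNF grammar remains the authority.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Element is one parsed snapshot line. The type and its field names are
// illustrative only; they are not types exported by ko-browser.
type Element struct {
	ID     int
	Role   string
	Name   string
	States []string
}

// parseLine parses a line such as `3: textbox "Search" focused`.
// Simplifying assumptions: the name is a double-quoted string without
// escaped quotes, and states are space-separated words after the name.
func parseLine(line string) (Element, bool) {
	idPart, rest, ok := strings.Cut(strings.TrimSpace(line), ": ")
	if !ok {
		return Element{}, false // blank or malformed line
	}
	id, err := strconv.Atoi(idPart)
	if err != nil {
		return Element{}, false // e.g. the `Page: "..."` header line
	}
	role, rest, _ := strings.Cut(rest, " ")
	el := Element{ID: id, Role: role}
	if strings.HasPrefix(rest, `"`) {
		end := strings.Index(rest[1:], `"`)
		if end < 0 {
			return Element{}, false // unterminated name
		}
		el.Name = rest[1 : 1+end]
		rest = rest[end+2:]
	}
	el.States = strings.Fields(rest)
	return el, true
}

func main() {
	el, ok := parseLine(`3: textbox "Search" focused`)
	fmt.Println(ok, el.ID, el.Role, el.Name, el.States)
}
```

Because every element sits on one flat line keyed by its numeric ID, a parser like this needs no tree walking, which is part of why the format is compact.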


📦 Installation

Homebrew

brew tap libi/tap
brew install ko-browser

Homebrew installs kbr with OCR enabled. It pulls in tesseract automatically and builds from source.

Pre-built binaries

Download from GitHub Releases:

Release binaries are also built with OCR enabled. Install Tesseract first so the runtime OCR libraries are available: brew install tesseract on macOS, apt install libtesseract-dev on Linux.

# macOS (Apple Silicon)
curl -LO https://github.com/libi/ko-browser/releases/latest/download/ko-browser-darwin-arm64.tar.gz
tar xzf ko-browser-darwin-arm64.tar.gz
mv kbr /usr/local/bin/kbr

# macOS (Intel)
curl -LO https://github.com/libi/ko-browser/releases/latest/download/ko-browser-darwin-amd64.tar.gz
tar xzf ko-browser-darwin-amd64.tar.gz
mv kbr /usr/local/bin/kbr

# Linux (amd64)
curl -LO https://github.com/libi/ko-browser/releases/latest/download/ko-browser-linux-amd64.tar.gz
tar xzf ko-browser-linux-amd64.tar.gz
mv kbr /usr/local/bin/kbr

From source

# Install kbr with OCR support (requires Tesseract to be installed)
CGO_ENABLED=1 go install -tags=ocr github.com/libi/ko-browser/cmd/kbr@latest

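The published packages (Homebrew and release binaries) are built with OCR enabled, so they need Tesseract at runtime. Building from source with -tags=ocr also requires Tesseract: brew install tesseract (macOS) / apt install libtesseract-dev (Linux).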

Install Chrome (if not already installed)

kbr install              # check & download Chromium
kbr install --with-deps  # also install system dependencies (Linux)

🚀 Quick Start

CLI

# Open a page and take a snapshot
kbr open https://www.google.com
kbr snapshot

# Output:
# Page: "Google"
#
# 1: link "Gmail"
# 2: link "Images"
# 3: textbox "Search" focused
# 4: button "Google Search"
# 5: button "I'm Feeling Lucky"

# Interact with elements by ID
kbr click 3
kbr type 3 "ko-browser github"
kbr press Enter

# Take a screenshot
kbr screenshot result.png

# Close the browser
kbr close
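
An agent written in Go can also drive the CLI by shelling out to it. The following is a rough sketch under stated assumptions, not an official integration: kbrArgs and runKbr are hypothetical helpers, and the session name "demo" is made up. It uses only commands shown above plus the global --session flag, and skips itself if kbr is not on PATH.

```go
package main

import (
	"fmt"
	"os/exec"
)

// kbrArgs prepends the global --session flag so each agent can use its
// own isolated session, mirroring `kbr --session test open example.com`.
func kbrArgs(session string, args ...string) []string {
	if session == "" {
		return args
	}
	return append([]string{"--session", session}, args...)
}

// runKbr runs one kbr command and returns its combined stdout/stderr.
func runKbr(session string, args ...string) (string, error) {
	out, err := exec.Command("kbr", kbrArgs(session, args...)...).CombinedOutput()
	return string(out), err
}

func main() {
	if _, err := exec.LookPath("kbr"); err != nil {
		fmt.Println("kbr not found on PATH; skipping")
		return
	}
	steps := [][]string{
		{"open", "https://www.google.com"},
		{"snapshot"},
		{"close"},
	}
	for _, step := range steps {
		out, err := runKbr("demo", step...) // "demo" is a hypothetical session name
		if err != nil {
			fmt.Println("kbr failed:", err)
			return
		}
		fmt.Print(out)
	}
}
```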

Go Library

package main

import (
    "fmt"
    "log"

    "github.com/libi/ko-browser/browser"
)

func main() {
    b, err := browser.New(browser.Options{Headless: true})
    if err != nil {
        log.Fatal(err)
    }
    defer b.Close()

    if err := b.Open("https://www.google.com"); err != nil {
        log.Fatal(err)
    }

    snap, err := b.Snapshot()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(snap.Text)
    // 1: link "Gmail"
    // 2: link "Images"
    // 3: textbox "Search" focused
    // ...

    b.Click(3)
    b.Type(3, "ko-browser github")
    b.Press("Enter")
}

📖 Commands

Core Interaction

Command    CLI Usage                Library API
open       kbr open <url>           b.Open(url)
click      kbr click <id>           b.Click(id)
dblclick   kbr dblclick <id>        b.DblClick(id)
type       kbr type <id> "text"     b.Type(id, text)
fill       kbr fill <id> "text"     b.Fill(id, text)
press      kbr press <key>          b.Press(key)
hover      kbr hover <id>           b.Hover(id)
focus      kbr focus <id>           b.Focus(id)
check      kbr check <id>           b.Check(id)
uncheck    kbr uncheck <id>         b.Uncheck(id)
select     kbr select <id> "val"    b.Select(id, vals...)
scroll     kbr scroll down 500      b.Scroll("down", 500)
drag       kbr drag <src> <dst>     b.Drag(srcID, dstID)
close      kbr close                b.Close()

Keyboard

kbr press Enter
kbr press Control+a
kbr keyboard type "Hello, World!"
kbr keyboard inserttext "pasted content"

Snapshot & Screenshot

kbr snapshot                     # full accessibility tree
kbr snapshot -i                  # interactive elements only
kbr snapshot -c                  # compact mode
kbr snapshot -d 5                # max depth 5
kbr snapshot -C                  # include cursor elements
kbr snapshot -s "#main"          # scope to CSS selector
kbr snapshot --ocr               # with OCR for images

kbr screenshot out.png           # viewport screenshot
kbr screenshot --full out.png    # full page
kbr screenshot --annotate out.png # annotated with element IDs

Navigation

kbr back
kbr forward
kbr reload

Get Information

kbr get title                    # page title
kbr get url                      # current URL
kbr get text <id>                # element inner text
kbr get html <id>                # element innerHTML
kbr get value <id>               # input value
kbr get attr <id> href           # element attribute
kbr get count ".item"            # count matching elements
kbr get box <id>                 # bounding box
kbr get styles <id>              # computed styles
kbr get cdp-url                  # CDP WebSocket URL

Check State

kbr is visible <id>              # → true/false
kbr is enabled <id>
kbr is checked <id>

Find Elements

kbr find role button --name Submit
kbr find text "Sign In"
kbr find label "Email"
kbr find placeholder "Search..."
kbr find alt "Logo"
kbr find title "tooltip"
kbr find testid "login-form"
kbr find first ".card"
kbr find last ".card"
kbr find nth 2 ".card"
# Add --exact for exact text matching
kbr find text "Sign In" --exact

Wait

kbr wait time 2s                 # wait 2 seconds
kbr wait selector "#loading"     # wait for element
kbr wait url "**/dashboard"      # wait for URL match
kbr wait load                    # wait for networkidle
kbr wait text "Welcome"          # wait for text
kbr wait fn "window.ready"       # wait for JS expression
kbr wait hidden "#spinner"       # wait for element to hide
kbr wait download ./file.pdf     # wait for download

Mouse (Low-level)

kbr mouse move 100 200
kbr mouse click 100 200
kbr mouse down 100 200
kbr mouse up 100 200
kbr mouse wheel 100 200 0 300    # x y deltaX deltaY

Tabs

kbr tab list
kbr tab new https://example.com
kbr tab switch 2
kbr tab close 1

Network

kbr network route "**/api/*" --action block
kbr network unroute "**/api/*"
kbr network requests
kbr network start-logging
kbr network clear

Storage & Cookies

kbr cookies get
kbr cookies set session_id "abc123" --domain example.com
kbr cookies delete session_id
kbr cookies clear

kbr storage get theme --type local
kbr storage set theme "dark" --type local
kbr storage delete theme --type local
kbr storage clear --type local
kbr storage list --type session

Browser Settings

kbr set viewport 1920 1080
kbr set viewport 1920 1080 2     # with 2x scale
kbr set device "iPhone 12"
kbr set geo 37.7749 -122.4194
kbr set offline true
kbr set headers '{"X-Custom":"value"}'
kbr set credentials admin secret
kbr set media dark
kbr set colorscheme dark

File Operations

kbr upload <id> ./document.pdf
kbr download <id> ./output/

JavaScript Evaluation

kbr eval "document.title"
kbr eval -b "ZG9jdW1lbnQudGl0bGU="   # base64 encoded
cat script.js | kbr eval --stdin
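
Base64 input is useful when a script contains quotes that are awkward to escape in a shell. A small sketch of producing the argument for kbr eval -b from Go follows. It assumes standard padded base64, which matches the example above; encodeScript is a hypothetical helper name.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeScript returns the standard (padded) base64 form of a JavaScript
// expression, suitable for passing as the argument to `kbr eval -b`.
func encodeScript(js string) string {
	return base64.StdEncoding.EncodeToString([]byte(js))
}

func main() {
	fmt.Println(encodeScript("document.title")) // prints ZG9jdW1lbnQudGl0bGU=
}
```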

Diff

kbr diff snapshot                              # compare with last snapshot
kbr diff snapshot --baseline before.txt
kbr diff screenshot --baseline before.png
kbr diff url https://v1.example.com https://v2.example.com
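
Because snapshots are line-oriented text, they compare well with simple tools. The sketch below shows one naive way to diff two snapshot texts by whole lines. It is only an illustration of the idea, not the algorithm kbr diff snapshot uses; diffLines is a hypothetical function, and the "disabled" state in the sample data is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// diffLines reports lines that appear only in after (added) and only in
// before (removed). Lines are compared as a multiset; order is ignored.
func diffLines(before, after string) (added, removed []string) {
	beforeLines := strings.Split(before, "\n")
	counts := map[string]int{}
	for _, l := range beforeLines {
		counts[l]++
	}
	for _, l := range strings.Split(after, "\n") {
		if counts[l] > 0 {
			counts[l]-- // matched an unchanged line
		} else {
			added = append(added, l)
		}
	}
	// Whatever count is left over was never matched by a line in after.
	for _, l := range beforeLines {
		if counts[l] > 0 {
			counts[l]--
			removed = append(removed, l)
		}
	}
	return added, removed
}

func main() {
	before := "1: link \"Home\"\n2: button \"Go\""
	after := "1: link \"Home\"\n2: button \"Go\" disabled"
	added, removed := diffLines(before, after)
	for _, l := range removed {
		fmt.Println("- " + l)
	}
	for _, l := range added {
		fmt.Println("+ " + l)
	}
}
```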

Debug & Clipboard

kbr console messages
kbr console clear
kbr errors list
kbr highlight <id>
kbr inspect                      # open DevTools

kbr clipboard read
kbr clipboard write "text"
kbr clipboard copy               # Ctrl+C
kbr clipboard paste              # Ctrl+V

Trace & Record

kbr trace start
kbr trace stop ./trace.zip

kbr profiler start
kbr profiler stop ./profile.json

kbr record start ./recording
kbr record stop

Auth

kbr auth save github --url https://github.com/login --username user --password pass
kbr auth login github
kbr auth list
kbr auth show github
kbr auth delete github

Session Management

kbr session                      # show current session
kbr session list                 # list all active sessions
kbr --session test open example.com  # use named session

Selector Syntax

kbr supports three selector formats, auto-detected:

Input    Type           Example
Number   Snapshot ID    kbr click 5
CSS      CSS Selector   kbr click "#submit"
XPath    XPath          kbr click "//button[@type='submit']"
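
The auto-detection rule can be pictured roughly as follows. This is a guess at the heuristic, not the code in the selector package: classify and the Kind constants are hypothetical, and the sketch assumes that all-digit input is a snapshot ID, input starting with "/" is XPath, and everything else is CSS.

```go
package main

import (
	"fmt"
	"strings"
)

// Kind names the three selector formats from the table above.
type Kind string

const (
	KindID    Kind = "id"
	KindXPath Kind = "xpath"
	KindCSS   Kind = "css"
)

// isDigits reports whether s is non-empty and contains only ASCII digits.
func isDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// classify guesses the selector format. Assumed rules: digits only means
// a snapshot ID, a leading "/" means XPath, anything else is CSS.
func classify(input string) Kind {
	s := strings.TrimSpace(input)
	switch {
	case isDigits(s):
		return KindID
	case strings.HasPrefix(s, "/"):
		return KindXPath
	default:
		return KindCSS
	}
}

func main() {
	for _, in := range []string{"5", "#submit", "//button[@type='submit']"} {
		fmt.Printf("%-28s %s\n", in, classify(in))
	}
}
```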

🔧 Global Options

Flag                        Description
--session <name>            Isolated session name (default: "default")
--headed                    Show browser window
--json                      JSON output
--timeout <duration>        Operation timeout (default: 30s)
--profile <path>            Persistent Chrome user data directory
--state <path>              Load saved browser state (cookies + localStorage)
--config <path>             Config file path
--user-agent <ua>           Custom User-Agent
--proxy <url>               Proxy server URL
--proxy-bypass <hosts>      Hosts to bypass proxy
--ignore-https-errors       Ignore HTTPS certificate errors
--allow-file-access         Allow file:// URLs
--extension <path>          Load Chrome extension (repeatable)
--args <args>               Extra Chrome arguments
--download-path <path>      Default download directory
--screenshot-dir <path>     Default screenshot output directory
--screenshot-format <fmt>   Screenshot format: png, jpeg
--content-boundaries        Wrap output with boundary markers
--debug                     Debug output

📚 Library API

Installation

go get github.com/libi/ko-browser@latest

Quick Example

package main

import (
    "fmt"
    "log"
    "time"

    "github.com/libi/ko-browser/browser"
)

func main() {
    b, err := browser.New(browser.Options{
        Headless: true,
        Timeout:  30 * time.Second,
    })
    if err != nil {
        log.Fatal(err)
    }
    defer b.Close()

    // Navigate
    if err := b.Open("https://www.baidu.com"); err != nil {
        log.Fatal(err)
    }

    // Snapshot → interact
    snap, err := b.Snapshot()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(snap.Text)
    // Page: "百度一下，你就知道"
    //
    // 1: link "新闻"
    // 2: link "hao123"
    // 3: textbox "搜索" focused
    // 4: button "百度一下"

    b.Click(3)
    b.Type(3, "hello world")
    b.Press("Enter")

    // Wait & screenshot
    b.WaitLoad()
    b.Screenshot("result.png")
}

Advanced: Direct AX Tree Access

import (
    "github.com/libi/ko-browser/axtree"
    "github.com/libi/ko-browser/browser"
    "github.com/libi/ko-browser/ocr"
)

b, _ := browser.New(browser.Options{Headless: true})
defer b.Close()
b.Open("https://example.com")

// Low-level AX Tree API
rawNodes, _ := axtree.Extract(b.Context())
tree := axtree.BuildAndFilter(rawNodes)
text := axtree.Format(tree)
idMap := axtree.BuildIDMap(tree)

// Snapshot with OCR
snap, _ := b.Snapshot(browser.SnapshotOptions{
    EnableOCR:    true,
    OCRLanguages: []string{"eng", "chi_sim"},
})

Connect to Existing Browser

// Connect via CDP WebSocket
b, _ := browser.Connect("ws://localhost:9222/devtools/browser/...", browser.Options{})
defer b.Close()

snap, _ := b.Snapshot()
fmt.Println(snap.Text)
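
The WebSocket URL passed to browser.Connect changes on every Chrome launch. When Chrome is started with --remote-debugging-port=9222, its HTTP endpoint /json/version reports the current URL in the webSocketDebuggerUrl field. The sketch below reads that field using only the standard library; fetchWSURL and parseWSURL are hypothetical helpers, and the port is an assumption taken from the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// versionInfo holds the one field needed from Chrome's /json/version reply.
type versionInfo struct {
	WebSocketDebuggerURL string `json:"webSocketDebuggerUrl"`
}

// parseWSURL extracts the browser-level WebSocket URL from the JSON body.
func parseWSURL(body []byte) (string, error) {
	var v versionInfo
	if err := json.Unmarshal(body, &v); err != nil {
		return "", err
	}
	if v.WebSocketDebuggerURL == "" {
		return "", fmt.Errorf("no webSocketDebuggerUrl in response")
	}
	return v.WebSocketDebuggerURL, nil
}

// fetchWSURL queries a running Chrome's debugging endpoint at base,
// e.g. "http://localhost:9222".
func fetchWSURL(base string) (string, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(base + "/json/version")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return parseWSURL(body)
}

func main() {
	ws, err := fetchWSURL("http://localhost:9222")
	if err != nil {
		fmt.Println("no debuggable Chrome found:", err)
		return
	}
	fmt.Println(ws) // pass this string to browser.Connect
}
```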

Full API Reference

All Browser methods:

Navigation

  • Open(url string) error
  • Back() error
  • Forward() error
  • Reload() error

Snapshot

  • Snapshot(opts ...SnapshotOptions) (*SnapshotResult, error)

Interaction

  • Click(id int) error
  • ClickNewTab(id int) error
  • DblClick(id int) error
  • Type(id int, text string) error
  • Fill(id int, text string) error
  • Press(key string) error
  • KeyboardType(text string) error
  • KeyboardInsertText(text string) error
  • Hover(id int) error
  • Focus(id int) error
  • Check(id int) error
  • Uncheck(id int) error
  • Select(id int, values ...string) error
  • Scroll(direction string, pixels int) error
  • ScrollIntoView(id int) error

Mouse

  • MouseMove(x, y float64) error
  • MouseClick(x, y float64, opts ...MouseOptions) error
  • MouseDown(x, y float64, opts ...MouseOptions) error
  • MouseUp(x, y float64, opts ...MouseOptions) error
  • MouseWheel(x, y, deltaX, deltaY float64) error
  • Drag(srcID, dstID int) error
  • DragCoords(srcX, srcY, dstX, dstY float64) error

Query

  • GetTitle() (string, error)
  • GetURL() (string, error)
  • GetText(id int) (string, error)
  • GetHTML(id int) (string, error)
  • GetValue(id int) (string, error)
  • GetAttr(id int, name string) (string, error)
  • GetCount(cssSelector string) (int, error)
  • GetBox(id int) (*BoxResult, error)
  • GetStyles(id int) (string, error)
  • GetCDPURL() (string, error)

State

  • IsVisible(id int) (bool, error)
  • IsEnabled(id int) (bool, error)
  • IsChecked(id int) (bool, error)

Find

  • FindRole(role, name string, opts ...FindOption) (*FindResults, error)
  • FindText(text string, opts ...FindOption) (*FindResults, error)
  • FindLabel(label string, opts ...FindOption) (*FindResults, error)
  • FindPlaceholder(text string, opts ...FindOption) (*FindResults, error)
  • FindAlt(text string, opts ...FindOption) (*FindResults, error)
  • FindTitle(text string, opts ...FindOption) (*FindResults, error)
  • FindTestID(testID string) (*FindResults, error)
  • FindFirst(css string) (*FindResults, error)
  • FindLast(css string) (*FindResults, error)
  • FindNth(css string, n int) (*FindResults, error)

Wait

  • Wait(d time.Duration) error
  • WaitSelector(css string, timeout ...time.Duration) error
  • WaitURL(pattern string, timeout ...time.Duration) error
  • WaitLoad(timeout ...time.Duration) error
  • WaitText(text string, timeout ...time.Duration) error
  • WaitFunc(expression string, timeout ...time.Duration) error
  • WaitHidden(css string, timeout ...time.Duration) error
  • WaitDownload(savePath string, timeout ...time.Duration) (string, error)

Screenshot & PDF

  • Screenshot(path string, opts ...ScreenshotOptions) error
  • ScreenshotToBytes(opts ...ScreenshotOptions) ([]byte, error)
  • ScreenshotAnnotated(path string, opts ...ScreenshotOptions) error
  • PDF(path string, opts ...PDFOptions) error

Eval

  • Eval(expression string) (string, error)

File

  • Upload(id int, files ...string) error
  • UploadCSS(css string, files ...string) error
  • Download(id int, saveDir string, opts ...DownloadOptions) (string, error)

Tabs

  • TabList() ([]TabInfo, error)
  • TabNew(url string) error
  • TabClose(index int) error
  • TabSwitch(index int) error

Network

  • NetworkRoute(pattern string, action RouteAction) error
  • NetworkUnroute(pattern string) error
  • NetworkRequests() ([]NetworkRequest, error)
  • NetworkStartLogging() error
  • NetworkClearRequests()

Storage & Cookies

  • CookiesGet() ([]CookieInfo, error)
  • CookieSet(cookie CookieInfo) error
  • CookieDelete(name string) error
  • CookiesClear() error
  • StorageGet(storageType, key string) (string, error)
  • StorageSet(storageType, key, value string) error
  • StorageDelete(storageType, key string) error
  • StorageClear(storageType string) error
  • StorageGetAll(storageType string) (map[string]string, error)

Settings

  • SetViewport(width, height int, scale ...float64) error
  • SetDevice(name string) error
  • SetGeo(lat, lon float64) error
  • ClearGeo() error
  • SetOffline(offline bool) error
  • SetHeaders(headers map[string]string) error
  • SetCredentials(user, pass string) error
  • SetMedia(features ...MediaFeature) error
  • SetColorScheme(scheme string) error

Debug

  • ConsoleStart() error
  • ConsoleMessages() ([]ConsoleMessage, error)
  • ConsoleMessagesByLevel(level string) ([]ConsoleMessage, error)
  • ConsoleClear()
  • PageErrors() ([]PageError, error)
  • PageErrorsClear()
  • Highlight(id int) error
  • OpenDevTools() error

Clipboard

  • ClipboardRead() (string, error)
  • ClipboardWrite(text string) error
  • ClipboardCopy() error
  • ClipboardPaste() error

Diff

  • DiffSnapshot(opts ...DiffSnapshotOptions) (*DiffSnapshotResult, error)
  • DiffScreenshot(opts DiffScreenshotOptions) (*DiffScreenshotResult, error)
  • DiffURL(url1, url2 string, opts ...DiffURLOptions) (*DiffURLResult, error)

Trace & Record

  • TraceStart(categories ...string) error
  • TraceStop(outputPath string) error
  • ProfilerStart() error
  • ProfilerStop(outputPath string) error
  • RecordStart(outputPath string) error
  • RecordStop() (int, error)

Browser State (export/import)

  • ExportState(outputPath string) error
  • ImportState(inputPath string) error
  • ApplyState(state *BrowserState) error

⚙️ Configuration

kbr loads configuration from the following sources, in order of increasing priority (low → high):

  1. ~/.ko-browser/config.json: user-level defaults
  2. ./ko-browser.json: project-level overrides
  3. Environment variables (KO_BROWSER_*)
  4. CLI flags: override everything

Example config file:

{
  "headed": true,
  "proxy": "http://localhost:8080",
  "profile": "./browser-data",
  "screenshotFormat": "jpeg",
  "downloadPath": "./downloads"
}
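
The priority order above amounts to a layered merge in which later sources win. The sketch below illustrates that idea only. It assumes overrides apply per key, which the README does not state explicitly, and Settings, merge, and the sample env and flag values are all hypothetical.

```go
package main

import "fmt"

// Settings maps config keys (such as "headed" or "proxy") to values.
type Settings map[string]any

// merge applies layers from lowest to highest priority; a key set in a
// later layer overrides the same key from an earlier one.
func merge(layers ...Settings) Settings {
	out := Settings{}
	for _, layer := range layers {
		for k, v := range layer {
			out[k] = v
		}
	}
	return out
}

func main() {
	user := Settings{"headed": true, "proxy": "http://localhost:8080"} // ~/.ko-browser/config.json
	project := Settings{"screenshotFormat": "jpeg"}                    // ./ko-browser.json
	env := Settings{"proxy": "http://localhost:3128"}                  // hypothetical env override
	flags := Settings{"headed": false}                                 // hypothetical CLI override

	cfg := merge(user, project, env, flags)
	fmt.Println(cfg["headed"], cfg["proxy"], cfg["screenshotFormat"])
}
```

In this sketch the project file adds a key without disturbing user defaults, while the environment and flag layers each replace one earlier value.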

🏗️ Architecture

kbr
├── cmd/kbr/      ★ CLI entry point - `go install .../cmd/kbr@latest`
├── browser/      ★ Public Go library - core browser API
├── axtree/       ★ Public - AX Tree extraction, filtering, formatting
├── selector/     ★ Public - element selector parsing (ID/CSS/XPath)
├── ocr/          ★ Public - Tesseract OCR engine (build tag: ocr)
├── cmd/            CLI - cobra command definitions
└── internal/       CLI-only - daemon, session management

The browser/, axtree/, selector/, and ocr/ packages are all public and importable via go get. The internal/ package is only used by the CLI daemon. OCR requires -tags=ocr at build time.


🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

git clone https://github.com/libi/ko-browser.git
cd ko-browser
go build -o kbr ./cmd/kbr/              # without OCR
go build -tags=ocr -o kbr ./cmd/kbr/     # with OCR
go test ./tests/ -v -timeout 180s

📄 License

MIT

