unblink

v0.23.1
Published: Jul 5, 2026 License: MIT

README

A pure-Go (no cgo, no Chromium) "web browser" whose purpose is not visual rendering but exposing web information to an AI model. unblink fetches a page, parses it, strips visual-only junk (scripts, styles, navigation, ads), and hands the model a clean, token-budgeted Markdown representation — over the Model Context Protocol (MCP).

An AI doesn't need pixels. It needs structured meaning.

fetch(url) → parse HTML5 → [optionally execute JS] → semantic reduction → emit Markdown

Status

v0.23.0. The full pipeline works end to end: 18 MCP tools covering reading, navigation, sessions, forms, structured data, schema extraction, page inspection, site discovery, and search. The static read path turns most server-rendered pages into clean Markdown with zero JavaScript. The JavaScript engine, on by default (opt out with --disable-js), renders mainstream SPA frameworks, powers live interactive sessions (interact), and can runtime-load a WebExtension (e.g. uBlock Origin Lite) for ad/tracker blocking.

unblink ships as one static binary (~37 MB) with a ~29 MB idle footprint. Measured head-to-head on identical pages, it is roughly 5× lighter at idle and 10× faster to start than a headless Chromium, and it turns SPA fixtures around in ~2–10 ms per render. That is ahead of the warm-browser MCP tools on the same corpus, because the settle proves idleness instead of waiting out a quiet window (ADR 0004). It also reads a nav-heavy page for ~1% of the tokens of a browser-tool accessibility snapshot (measured against all four alternatives).

See docs/architecture.md for the full design (phases 0–26, plus the extract/collections and WebExtensions work) and its non-goals, and docs/comparison.md for how unblink compares to other AI web-browsing tools (Playwright MCP, Charlotte, Obscura, Lightpanda).

Requirements

The official MCP Go SDK requires Go ≥ 1.25. The Makefile sets GOTOOLCHAIN=auto, so the go command downloads the toolchain pinned in go.mod automatically — you do not need to install Go 1.25 yourself, and your global go env is left untouched. (If you run go directly rather than via make, prefix commands with GOTOOLCHAIN=auto.)

Install

GOTOOLCHAIN=auto go install github.com/christopherdavenport/unblink/cmd/unblink@latest

Or run the multi-arch (amd64/arm64) Docker image — no Go toolchain needed:

docker run -i --rm ghcr.io/christopherdavenport/unblink:latest --version

Or download a prebuilt binary from the GitHub releases page, or build from source (see Build & run).

Use it with an MCP client

unblink speaks MCP over stdio, so any MCP-capable client launches it as a subprocess. For Claude Code:

claude mcp add unblink -- /path/to/unblink
# or, via Docker (no install):
claude mcp add unblink -- docker run -i --rm ghcr.io/christopherdavenport/unblink:latest

For Claude Desktop (or any client using the mcpServers config shape), add to claude_desktop_config.json:

{
  "mcpServers": {
    "unblink": {
      "command": "/path/to/unblink",
      "args": []
    }
  }
}

To run via Docker instead, set "command" to "docker" and "args" to ["run", "-i", "--rm", "ghcr.io/christopherdavenport/unblink:latest"].

JavaScript rendering is on by default; add --disable-js for the zero-JavaScript static read path (lighter, still handles most server-rendered pages). Add flags like --search-provider or --tls-mimic to args as needed — see Configuration.

unblink is also listed in the MCP registry as io.github.ChristopherDavenport/unblink, and the repo ships a Claude Code plugin manifest (.claude-plugin/plugin.json) pinned to the current release image.

Build & run

make build           # -> bin/unblink
make test            # run the test suite
./bin/unblink            # serve MCP over stdio (JavaScript rendering on by default)
./bin/unblink --disable-js  # zero-JavaScript static read path only
./bin/unblink --version

JavaScript rendering (pure Go, no cgo, no Chromium) runs a page's scripts against a hand-rolled DOM over the parsed tree: inline and external scripts, ES modules (<script type=module>, import/export, dynamic import(), import maps; bundled with esbuild), window.fetch + XMLHttpRequest, DOM events with full capture/bubble propagation (delegated listeners, once/passive/{signal} options, AbortController, typed Event subclasses), and document.cookie (backed by the session jar).

Page-JS network requests are guarded: requests to private/loopback/metadata IPs are blocked and a per-render download budget applies (--js-no-network, --js-allow-private, --js-max-bytes). A background pool of fresh runtimes keeps render latency low (--js-prewarm, 0 disables); the per-render time budget defaults to 5s (--js-timeout).

With a session, interact keeps a live runtime alive for the page so JS state persists across calls (a true browser-tab session). Live runtimes are capped (--js-max-live, LRU torn down), and both window.localStorage and window.sessionStorage persist per session (a session is a tab), so SPA auth/state flows survive across calls.

Common globals that bundles use without feature-detection are covered: structuredClone, a connection-less WebSocket stub (error→close), inert Worker, append-mode document.write, and hashchange.

The engine (on by default) renders the mainstream SPA frameworks (React, Vue, Preact, Svelte, Lit / web components) via a flat-DOM model: a real Node/Element/HTMLElement prototype chain, MutationObserver, custom-element upgrade, and an encapsulating, composed Shadow DOM. Each shadow root is a detached subtree (so page JS querySelector respects the boundary), and a compose pass flattens it, resolving <slot> distribution, into the light tree for extraction. Events cross the boundary correctly (composed path, target retargeting, composedPath()), and declarative Shadow DOM (<template shadowrootmode>) renders on the static no-JS path.

Layout/geometry is constant-stubbed (no pixel layout engine), and canvas/WebGL, Workers/WebSocket/IndexedDB, and Shadow-DOM style scoping (:host/::slotted/::part) remain out of scope.

Try it

unblink speaks MCP over stdio. Point any MCP-capable client at the unblink binary, or drive it by hand:

{ printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-06-18","capabilities":{},"clientInfo":{"name":"probe","version":"0"}}}'; \
  sleep 0.3; \
  printf '%s\n' '{"jsonrpc":"2.0","method":"notifications/initialized"}'; \
  printf '%s\n' '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"read","arguments":{"url":"https://example.com"}}}'; \
  sleep 2; } | ./bin/unblink
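
The same three messages can be built programmatically. The sketch below is only an illustration of the newline-delimited JSON-RPC framing used in the shell probe above; the type and function names (rpcMessage, handshakeLines) are hypothetical and are not part of unblink or the MCP Go SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcMessage is a minimal JSON-RPC 2.0 envelope (hypothetical helper type).
// ID is omitted for notifications, which expect no response.
type rpcMessage struct {
	JSONRPC string `json:"jsonrpc"`
	ID      *int   `json:"id,omitempty"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

func intPtr(n int) *int { return &n }

// handshakeLines returns the three messages from the shell probe, one JSON
// document per line: initialize, the initialized notification, a read call.
func handshakeLines(url string) ([]string, error) {
	msgs := []rpcMessage{
		{JSONRPC: "2.0", ID: intPtr(1), Method: "initialize", Params: map[string]any{
			"protocolVersion": "2025-06-18",
			"capabilities":    map[string]any{},
			"clientInfo":      map[string]any{"name": "probe", "version": "0"},
		}},
		{JSONRPC: "2.0", Method: "notifications/initialized"},
		{JSONRPC: "2.0", ID: intPtr(2), Method: "tools/call", Params: map[string]any{
			"name":      "read",
			"arguments": map[string]any{"url": url},
		}},
	}
	lines := make([]string, 0, len(msgs))
	for _, m := range msgs {
		b, err := json.Marshal(m)
		if err != nil {
			return nil, err
		}
		lines = append(lines, string(b))
	}
	return lines, nil
}

func main() {
	lines, err := handshakeLines("https://example.com")
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l) // pipe this output into ./bin/unblink
	}
}
```

A real client would also read the server's responses from stdout before sending the next request; the shell probe approximates that with sleep.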

Tools

Every page tool accepts an optional session (any string — cookies and history persist across calls; auto-created on first use), use_current (act on the session's current page instead of fetching a URL), and render (run the page's JavaScript first — on by default; pass render=false to skip it, or start the server with --disable-js to turn JS off entirely).

Idle sessions are evicted (default 30 minutes, tune with --session-ttl / --session-cap); an evicted id then errors with session_expired and must be re-created via session(action=new) — credentials are never carried over silently. Errors follow a stable error [code]: message convention (bad_input, session_expired, no_current_page, js_required, not_configured, blocked (SSRF guard), cursor_expired, timeout, fetch_failed…), and every tool carries MCP annotations (read-only vs state-changing) so hosts can gate sensitive actions. JS render diagnostics (framework, js_errors, article_fallback) are reported in read/interact results rather than logged away.
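
A client can branch on the error code instead of matching message text. The sketch below assumes the literal shape is "error [code]: message" (the exact spelling of the prefix is an assumption taken from the sentence above), and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// toolErrRe matches the assumed "error [code]: message" shape.
// (?s) lets the message span multiple lines.
var toolErrRe = regexp.MustCompile(`(?s)^error \[([a-z_]+)\]: (.*)$`)

// parseToolError splits an error string into its code and message.
// ok is false when the text does not follow the convention.
func parseToolError(s string) (code, msg string, ok bool) {
	m := toolErrRe.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

// needsNewSession reports whether the caller should re-create the session
// via session(action=new) before retrying.
func needsNewSession(code string) bool {
	return code == "session_expired"
}

func main() {
	code, msg, ok := parseToolError("error [session_expired]: session was evicted")
	fmt.Println(code, msg, ok, needsNewSession(code))
}
```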

Tool Input Returns
read { url?, session?, use_current?, mode?, format?, selector?, max_tokens?, cursor?, wait_for?, wait_text?, wait_timeout?, headers?, auth? } Main content (mode=article, default) or whole page (full) as Markdown, paginated via cursor. format=raw_html returns the unreduced source (optionally scoped by a CSS selector) — the escape hatch for scripts/forms/SSR-embedded JSON that reduction strips; format=text returns visible plain text. wait_for (CSS selector) / wait_text hold the JS render open until that content hydrates (forces the render even if render=false); wait_timeout (seconds, capped ~30s) extends the wait, and wait_met in the result reports whether it appeared. headers/auth attach one-shot credentials for a stateless gated GET (see Authentication).
browse { url?, session?, use_current?, headers?, auth? } Cheap orientation: title, description, lang, heading outline, link/form/image counts, excerpt, semantic region map, plus llms_txt/robots presence hints. Also returns a collections inventory — auto-detected repeating record-sets (product lists, search results, table-like rows), each with a ready-to-use extract schema (root selector + field selectors, tagged with its region). Hand one straight to extract — no need to read raw HTML to find selectors.
links { url?, session?, use_current?, filter?, internal_only?, limit? } The page's links (text + absolute href), optionally filtered. limit defaults to 200 (cap 1000); total/truncated report the rest.
forms { url?, session?, use_current? } The page's forms and their fields (name, type, required, options).
find { url?, session?, use_current?, query, max_hits? } Matching text snippets with the heading path locating each.
site { url?, session?, use_current? } A host's agent-facing metadata: robots.txt summary (allow/disallow for a browser agent, crawl-delay, sitemaps) + llms.txt content + whether llms-full.txt exists. Context only — never blocks a fetch.
click { session, link_index? | match?, render? } Follows a link from the session's current page (cookies carried); returns a summary. The destination's JavaScript runs by default; pass render=false to skip it.
submit_form { session, form?, values?, files?, render? } Submits a form from the current page (cookies carried); returns a summary. Forms declaring enctype=multipart/form-data are encoded as multipart automatically; files attaches uploads ({field, filename?, mime?, content | content_base64}, capped 8 files / 4 MiB — content is supplied inline, never read from disk; needs a POST form). The result page's JavaScript runs by default; pass render=false to skip it.
controls { url?, session?, use_current? } Non-link interactive controls (buttons, role=button, onclick/tabindex, submit/reset inputs, tabs, summaries), each with a stable CSS selector for interact.
interact { session, selector, event?, value?, key? } Dispatches an interaction at a selector and runs the page's JS so its handlers fire, then returns the updated page. event defaults to click, which emulates a full primary-button press (pointerdown→mousedown→focus→pointerup→mouseup→click) so press/pointer-based widgets (react-aria/Radix tabs, toggles, menus) actually activate, not just plain onclick handlers. Other events: hover (reveal hover menus/tooltips), focus (focus-triggered dropdowns), input, change, keydown/keyup/keypress, submit. For key events, key names the key (Enter, Escape, ArrowDown, a single character…; defaults to Enter) so handlers reading e.key/e.keyCode fire, and Enter on a control inside a form submits it; combine with value to type-then-press ({event:"keydown", value:"query", key:"Enter"} drives a search box). The session keeps a live JS runtime, so state (variables, listeners, timers, fetched data) persists across calls. Needs JS (on by default; --disable-js turns it off). Does not navigate, but a handler that requests a cross-document navigation (location.href/assign/replace) surfaces the target as pending_navigation so you can follow it with read/click.
data { url?, session?, use_current?, kind? } Machine-readable structured data embedded in a page: JSON-LD (schema.org), HTML data tables (caption/headers/rows), and microdata (itemscope/itemprop). kind selects jsonld, tables, microdata, or all (default). HTML only. (Tables: colspan and rowspan are expanded onto the real grid; microdata itemref unsupported; JSON-LD @graph is flattened. raw_html returns source with relative URLs left as-is.)
extract { url?, session?, use_current?, root?, fields, limit? } Caller-directed structured extraction with a CSS-selector schema: fields maps each output name to a selector (a string takes the element's collapsed text; {selector, attr} takes an attribute value instead). An optional root selector emits one record per matching container (e.g. root="li.product"); omit it to treat the whole document as one record. Returns an array of records; limit defaults to 50 (cap 200) and truncated reports whether more matched. Read-only, HTML only, no JS injection — pure selection over the parsed DOM. Fields that match nothing are omitted; attribute values are returned verbatim (not URL-resolved). Use data for auto-discovered JSON-LD/tables/microdata; call browse first to have unblink propose a root+fields schema for you.
requests { url?, session?, use_current? } The network requests the page's JavaScript made while rendering (fetch/XHR, scripts, modules, dynamic imports), each with method/url/status. The escape hatch for data-driven pages: render once, see the JSON endpoint the page fetched, then read it directly instead of scraping the hydrated DOM. Requires JavaScript (not exposed under --disable-js).
console { url?, session?, use_current?, level? } The page's captured console output (log/info/warn/error/debug) from its JavaScript render, in order — for debugging why a page rendered as it did (boot errors, failed data loads, framework warnings). level filters to one severity. Requires JavaScript (not exposed under --disable-js).
cookies { session, action?, url?, cookies? } Inspect or change a session's cookies, scoped to an origin (url, else the session's current page). action=list (default) returns the jar's cookies as name/value; set adds/updates the cookies you pass; clear expires them. Requires a session.
session { action: new|list|state|history|back|forward|close, session?, url?, headers?, cookies?, auth? } Manage a session's lifecycle and navigation. new accepts url + headers/cookies/auth to attach credentials for that origin (see Authentication); re-creating a live id with new credentials errors (close it first). list returns every live session's state (including live_js, whether a persistent runtime is attached).
map { url, max_urls?, max_depth? } Discover a site's URLs: harvests sitemap.xml (robots.txt + /sitemap.xml, following sitemap indexes) and crawls same-origin links breadth-first from the seed. Returns a bounded, de-duplicated list tagged source=sitemap|crawl with depth. Exposure-grade — surfaces robots.txt but never gates on it. Send an MCP progress token (_meta.progressToken) to stream progress while the walk (up to 60s) runs.
search { query, count?, site? } Web search via the configured provider (SearXNG or Brave): ranked results (title, url, snippet). site restricts to one domain. Requires --search-provider (see Search); not exposed otherwise.
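
As an illustration of the extract schema described in the table, the sketch below builds the arguments object for one call: a root selector plus two fields, one taking text and one taking an attribute. The Go types, the selectors, and the shop.example URL are hypothetical; only the argument names (url, root, fields, limit, selector, attr) come from the table.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// attrField selects an attribute value instead of the element's text.
type attrField struct {
	Selector string `json:"selector"`
	Attr     string `json:"attr"`
}

// extractArgs mirrors the documented input of the extract tool.
// A field value is either a selector string or an attrField.
type extractArgs struct {
	URL    string         `json:"url,omitempty"`
	Root   string         `json:"root,omitempty"`
	Fields map[string]any `json:"fields"`
	Limit  int            `json:"limit,omitempty"`
}

// buildExtractArgs serializes the arguments for a tools/call request.
func buildExtractArgs(a extractArgs) (string, error) {
	b, err := json.Marshal(a)
	return string(b), err
}

func main() {
	out, err := buildExtractArgs(extractArgs{
		URL:  "https://shop.example/list", // hypothetical page
		Root: "li.product",                // one record per matching container
		Fields: map[string]any{
			"name": "h3",                                     // collapsed text
			"link": attrField{Selector: "a", Attr: "href"}, // attribute value
		},
		Limit: 10,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because attribute values are returned verbatim, a caller that needs absolute links would resolve each href against the page URL itself.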

Tool selection

Every tool the server advertises costs the model context on each turn, so unblink exposes only tools that can actually do something, and lets you narrow that further:

  • Unusable tools are hidden automatically. search isn't exposed without a --search-provider, and interact/requests/console aren't exposed under --disable-js — a tool that could only return an error is noise in the tool list.
  • Pick a subset with --tools. Pass a comma-separated list of tool names and/or presets. Presets: core (read, browse, find), read-only (every read-only tool — no sessions, navigation, or form/interaction writes), and full (everything, the default). --disable-tools subtracts from the set.
./bin/unblink --tools core                 # minimal reading surface: read, browse, find
./bin/unblink --tools read-only            # all read-only tools, no state changes
./bin/unblink --tools core,data            # a preset plus an extra tool
./bin/unblink --disable-tools map,search   # everything except these

A tool named explicitly that can't run (e.g. --tools interact with --disable-js) is dropped with a stderr warning; unknown names are ignored with a warning.
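
The selection rules above can be pictured as set arithmetic: expand presets, keep only usable tools, then subtract. The sketch below is an illustration under that reading, not unblink's implementation; the function names are hypothetical, the usable list is shortened, and only the core preset is filled in because its members are the ones listed above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// expand turns a comma-separated list of tool and preset names into a set.
func expand(list string, presets map[string][]string) map[string]bool {
	set := map[string]bool{}
	for _, name := range strings.Split(list, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		if members, ok := presets[name]; ok {
			for _, m := range members {
				set[m] = true
			}
			continue
		}
		set[name] = true
	}
	return set
}

// selectTools applies --tools (empty means every usable tool) and then
// --disable-tools. Names that are not usable never appear in the result.
func selectTools(tools, disable string, usable []string, presets map[string][]string) []string {
	want := expand(tools, presets)
	drop := expand(disable, presets)
	var out []string
	for _, t := range usable {
		if len(want) > 0 && !want[t] {
			continue
		}
		if drop[t] {
			continue
		}
		out = append(out, t)
	}
	sort.Strings(out)
	return out
}

func main() {
	usable := []string{"read", "browse", "find", "data", "map", "search"} // shortened list
	presets := map[string][]string{"core": {"read", "browse", "find"}, "full": usable}
	fmt.Println(selectTools("core,data", "", usable, presets))
	fmt.Println(selectTools("", "map,search", usable, presets))
}
```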

Authentication

Reach gated pages and JSON APIs by attaching credentials. Prefer a session — set them once and they persist across calls, out of every per-call payload:

// session(action=new): bearer/basic + custom headers + cookies, all scoped to url
{ "action": "new", "session": "api",
  "url": "https://api.example.com",              // the origin credentials are pinned to
  "auth": { "type": "bearer", "token_env": "API_TOKEN" },   // or {type:"basic", username, password_env}
  "headers": { "X-Api-Key": "…" },
  "cookies": [ { "name": "sid", "value": "…" } ] }

Then any page tool using session: "api" carries the credentials. read/browse also accept one-shot headers/auth for a quick stateless gated GET.

  • Secrets stay out of the transcript. Give a secret literally (token/password) or, better, by env-var name (token_env/password_env); unblink then reads the value from its own environment. Session state reports only a redacted auth_type/auth_scope, never the secret; credential query params are masked in logs.
  • Origin-scoped, no cross-origin leak. Credentials are pinned to the url's origin: they are sent only there and are stripped on any cross-origin redirect (including a same-domain port change), so a bearer token can't be exfiltrated to another host. A url is required whenever you supply auth/headers/cookies.
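
The origin rule in the last bullet can be sketched as a scheme, host, and port comparison. This is a minimal illustration of the idea, assuming the usual web definition of an origin; the helper names are hypothetical and unblink's own check may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// originOf reduces a URL to scheme://host:port, filling in the default port
// for http and https so an explicit :443 equals an implicit one.
func originOf(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("not an absolute URL: %q", raw)
	}
	port := u.Port()
	if port == "" {
		switch u.Scheme {
		case "http":
			port = "80"
		case "https":
			port = "443"
		}
	}
	return u.Scheme + "://" + strings.ToLower(u.Hostname()) + ":" + port, nil
}

// keepCredentials reports whether a request to target may carry the
// credentials pinned to the origin of pinned. Any parse failure fails closed.
func keepCredentials(pinned, target string) bool {
	a, errA := originOf(pinned)
	b, errB := originOf(target)
	return errA == nil && errB == nil && a == b
}

func main() {
	pinned := "https://api.example.com"
	fmt.Println(keepCredentials(pinned, "https://api.example.com/v2"))      // same origin
	fmt.Println(keepCredentials(pinned, "https://api.example.com:8443/v2")) // port change
	fmt.Println(keepCredentials(pinned, "https://cdn.example.net/asset"))   // other host
}
```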

Search

The search tool is opt-in. unblink stays fully self-contained until you point it at a provider, so it never reaches an external service by default:

# Self-hosted SearXNG (no key):
./bin/unblink --search-provider=searxng --search-endpoint=https://searx.example/

# Brave Search API (key from the environment, never a flag):
UNBLINK_SEARCH_API_KEY=… ./bin/unblink --search-provider=brave

Without --search-provider the search tool isn't exposed at all (a tool that could only return "not configured" is context cost with no value — see Tool selection). The API key is read only from UNBLINK_SEARCH_API_KEY, sent only as a request header, and never logged. Search only queries the provider — result URLs are fetched later by read/browse through the SSRF-guarded path.

The map tool needs no configuration: it discovers URLs from a site's own sitemap.xml and same-origin links, bounded by max_urls/max_depth.
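
The crawl half of that discovery is a bounded breadth-first walk. The sketch below shows the bounding logic only, over an in-memory link graph that stands in for fetching pages; it assumes the graph already contains same-origin links only, and all names are hypothetical.

```go
package main

import "fmt"

// found is one discovered URL and the link depth it was reached at.
type found struct {
	URL   string
	Depth int
}

// crawl walks links breadth-first from seed, de-duplicating URLs and
// stopping at maxURLs results or maxDepth hops, whichever comes first.
func crawl(seed string, links map[string][]string, maxURLs, maxDepth int) []found {
	seen := map[string]bool{seed: true}
	queue := []found{{URL: seed, Depth: 0}}
	var out []found
	for len(queue) > 0 && len(out) < maxURLs {
		cur := queue[0]
		queue = queue[1:]
		out = append(out, cur)
		if cur.Depth == maxDepth {
			continue // do not expand links past the depth bound
		}
		for _, next := range links[cur.URL] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, found{URL: next, Depth: cur.Depth + 1})
			}
		}
	}
	return out
}

func main() {
	links := map[string][]string{ // toy site graph
		"/":  {"/a", "/b"},
		"/a": {"/c", "/"},
		"/c": {"/d"},
	}
	for _, f := range crawl("/", links, 10, 2) {
		fmt.Println(f.Depth, f.URL)
	}
}
```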

Networking

The HTTP layer decodes brotli/gzip/deflate, can rate-limit per host (an opt-in politeness limiter, --rate-limit; off by default so throughput is bounded by the site, not by unblink), retries transient failures (429/5xx, honoring Retry-After), and writes structured logs to stderr (--log-level). --tls-mimic presents a browser fingerprint to get past naive anti-bot blocks: a Chrome JA3/JA4 ClientHello (utls) plus best-effort Chrome-tuned HTTP/2 SETTINGS and request headers (sec-ch-ua, Sec-Fetch-*). Full h2 fingerprint fidelity (SETTINGS order, window sizes, pseudo-header order) and Cloudflare/Turnstile remain out of scope. Tunable: --rate-limit, --rate-burst, --retries.

Every page fetch runs behind an SSRF dial guard that rejects connections to private/loopback/link-local/metadata IPs — including CGNAT/100.64.0.0/10 (Alibaba metadata) — checked against the resolved address. It is on by default; pass --allow-private for localhost/internal targets. (In-page JavaScript subrequests are guarded separately by --js-allow-private.)
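
The address classes named above map directly onto checks in Go's net/netip package. The sketch below shows that classification for an already-resolved address; it is an illustration, not unblink's guard, and the function names are hypothetical. In a real dialer the check has to run on the address actually connected to, which is why the paragraph above stresses the resolved address.

```go
package main

import (
	"fmt"
	"net/netip"
)

// cgnat is the shared address space 100.64.0.0/10, which the standard
// IsPrivate check does not cover.
var cgnat = netip.MustParsePrefix("100.64.0.0/10")

// blocked reports whether a resolved address is private, loopback,
// link-local (which includes 169.254.169.254), unspecified, or CGNAT.
func blocked(addr netip.Addr) bool {
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 like 10.0.0.1
	return addr.IsLoopback() ||
		addr.IsPrivate() ||
		addr.IsLinkLocalUnicast() ||
		addr.IsLinkLocalMulticast() ||
		addr.IsUnspecified() ||
		cgnat.Contains(addr)
}

// blockedString parses s and applies blocked; unparseable input is rejected.
func blockedString(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return true // fail closed
	}
	return blocked(addr)
}

func main() {
	for _, s := range []string{"127.0.0.1", "10.0.0.5", "169.254.169.254", "100.100.100.200", "93.184.216.34"} {
		fmt.Println(s, blockedString(s))
	}
}
```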

Because unblink feeds pages to an AI agent, returned web content is treated as untrusted by default: it is wrapped in a provenance/"untrusted content" fence so the model treats it as data (not instructions), human-hidden text and comments are stripped, and Markdown image beacons (![](url) — a zero-click data-exfil channel) are defanged to inert text. This is defense-in-depth against indirect prompt injection, not a guarantee. Pass --no-safe-output to get the raw, unmodified reduction instead. Page JavaScript cannot read host files (require is disabled) or read HttpOnly cookies via document.cookie.
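
To show what defanging an image beacon can mean in practice, the sketch below rewrites Markdown image syntax into plain text so that a renderer has nothing to fetch. The regular expression and the replacement format are assumptions chosen for illustration; the output format unblink actually emits is not specified here.

```go
package main

import (
	"fmt"
	"regexp"
)

// imageRe matches inline Markdown images: ![alt](url "optional title").
var imageRe = regexp.MustCompile(`!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)`)

// defangImages replaces each inline image with inert text that keeps the
// alt text and URL visible but is no longer image syntax.
func defangImages(markdown string) string {
	return imageRe.ReplaceAllString(markdown, "[image: $1] $2")
}

func main() {
	in := "See ![x](https://evil.example/p?d=secret) now"
	fmt.Println(defangImages(in))
}
```

This simple pattern does not handle reference-style images or URLs containing parentheses, so it should be read as a demonstration of the idea only.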

robots.txt and llms.txt are surfaced as context, never enforced — unblink reports a host's crawl rules (and allowed_for_us for the path) but never blocks a fetch on them. browse folds in lightweight presence hints (host-cached, so repeat browses are free); --no-site-hints disables that probe while the site tool stays available.

Extensions (ad-blocking)

unblink can load WebExtensions at runtime to extend what it does with best-in-class third-party tooling — most usefully, ad/tracker blocking. Point --extension at an unpacked extension directory or a .xpi/.crx/.zip archive (repeatable), or --extensions-dir at a folder of them:

./bin/unblink --extension ./ublock-origin-lite

unblink ships no extension code — you supply it — which keeps GPL-licensed extensions (uBlock Origin is GPL-3) fully separate from unblink's MIT source, the same way a browser loads a user-installed add-on (see ADR 0010).

Recommended: uBlock Origin Lite (MV3) — verified working. Download uBOLite_*.chromium.zip from uBlockOrigin/uBOL-home releases and point --extension at it. unblink compiles its declarativeNetRequest rulesets (EasyList, EasyPrivacy, uBlock filters — 18,249 rules) into its own host matcher and evaluates them directly, so it blocks real ad/tracker requests (adscore.com, …) while passing first-party/benign traffic — a full render in ~0.1 s, with no service worker or in-extension filter compilation. (Full uBlock Origin — the .xpi/.zip on gorhill/uBlock — is MV2 and never gets its filter engine ready in a pure-Go interpreter, so it is not viable in-process; Lite is the answer. Privacy Badger loads and fully initializes too.) unblink currently supports:

  • Network filtering — an extension's declarativeNetRequest static rules cancel page-JavaScript requests to blocked ad/tracker hosts before they leave the process (visible in the requests tool as blocked).

  • Cosmetic filtering — element-hiding CSS (content-script stylesheets / insertCSS) removes ad markup from the extracted Markdown (unblink has no CSSOM, so a display:none rule becomes physical node removal).

  • Content scripts — an extension's content_scripts JS/CSS is injected into matching pages, with a chrome/browser API surface (runtime, i18n, storage, scripting, tabs, …).

  • Background worker + messaging — the extension's background scripts run on their own event loop, and chrome.runtime.sendMessage round-trips between a content script and the background (uBlock's model: a content script asks the background which selectors to hide for the current host, then hides them).

  • Extension resources & storage: fetch(chrome.runtime.getURL(...)) serves packaged files (web_accessible_resources); chrome.storage.local/sync persist to disk and fire onChanged; declarativeNetRequest dynamic/session rules added at runtime take effect.

  • MV2 webRequest — a Manifest-V2 extension's background can block/redirect requests from a blocking webRequest.onBeforeRequest listener (full uBlock Origin's model).

Extensions require the JavaScript engine (they are rejected under --disable-js); extension JS runs in the same sandbox as page JS (heap/byte/SSRF guards apply). Still in progress toward a fully stock uBlock Origin build: isolated content-script worlds, scriptlet injection (##+js), and an IndexedDB/cacheStorage shim.

Configuration

All configuration is via CLI flags (pass them in your MCP client's args). --version prints the build and exits.

Flag Default What it does
--log-level warn Log level (debug/info/warn/error); logs go to stderr, stdout is reserved for MCP.
--rate-limit off Per-host requests/sec politeness limiter (opt-in; e.g. 5 to crawl politely).
--rate-burst 10 Per-host request burst (used when --rate-limit is set).
--retries 2 Retries for transient fetch failures (429/5xx, honoring Retry-After).
--tls-mimic off Present a Chrome TLS/h2 fingerprint (utls) to get past naive anti-bot blocks; also sets navigator.webdriver=false.
--allow-private off Permit page fetches to private/loopback/metadata IPs (needed for localhost/internal targets).
--no-site-hints off Omit robots.txt/llms.txt presence hints from browse.
--no-safe-output off Disable the untrusted-content safety pass (fence, hidden-text strip, image-beacon defang) — return raw reduction.
--disable-js off Disable JavaScript rendering entirely (JS is on by default; reads render unless the caller passes render=false).
--js-timeout 5s Per-render wall-clock budget for JavaScript.
--js-no-network off Disable page-JS network requests (DOM-only render).
--js-allow-private off Permit page-JS subrequests to private/loopback IPs.
--js-max-bytes 64 MiB of page-JS downloads allowed per render / per live-session action (0 disables the download budget). The primary network bound.
--js-max-requests 0 Optional hard cap on page-JS request count (runaway backstop; 0 disables — the real bound is --js-max-bytes).
--js-prewarm 4 Pre-warmed JS runtimes kept ready (0 disables).
--js-concurrency auto Max concurrent JS renders (auto = CPU count clamped to 4..16). Same-host fetch pacing stays --rate-limit's job.
--js-max-live 16 Max concurrent live per-session JS runtimes (LRU torn down over the cap).
--js-memory-limit 1024 MiB of Go heap page JS may grow before every render is interrupted (0 disables the guard).
--js-asset-cache on Cache page-JS script/module/bundle downloads across renders for 60s (data fetch/XHR never cached).
--session-ttl 30m Idle time before a session is evicted.
--session-cap 256 Max concurrent sessions (oldest evicted on overflow).
--search-provider none Web-search backend for the search tool: searxng or brave (empty disables it).
--search-endpoint none Search endpoint URL (SearXNG base URL; optional Brave override). API key comes from UNBLINK_SEARCH_API_KEY.
--tools all Limit the exposed tools to a comma-separated list of tool names and/or presets (core, read-only, full). Empty exposes every usable tool.
--disable-tools none Remove tools from the exposed set (comma-separated names/presets), applied after --tools.
--extension none Load a WebExtension (unpacked dir or .xpi/.crx/.zip) — e.g. uBlock Origin Lite for ad-blocking; repeatable. Requires JS.
--extensions-dir none Load every WebExtension in a directory (each subdir with manifest.json or each archive); repeatable.

License

MIT

Security

unblink runs untrusted page content (and, with JS enabled — the default — untrusted page JavaScript) as part of its job. See SECURITY.md for the threat model and how to report a vulnerability.

Contributing

See CONTRIBUTING.md for the build/test workflow, the eval gate, and the ADR process. Changes to the JS engine or dependency pins follow the architecture doc and ADRs.

Directories

Path Synopsis
cmd/unblink (command) Command unblink is a pure-Go "browser for AI": it fetches web pages, reduces them to their meaningful content, and exposes them to an AI model over MCP.
internal/browser Package browser is unblink's orchestrator.
internal/content Package content converts non-HTML page bodies (JSON, plain text, RSS/Atom/JSON feeds, PDF, and images) into Markdown for the read pipeline.
internal/content/feed Package feed converts RSS, Atom, and JSON feeds into a Markdown summary.
internal/content/image Package image produces a Markdown manifest for an image response (type, dimensions, byte size, and filename) without inlining the pixels.
internal/content/pdf Package pdf extracts text from PDF documents and renders it as Markdown.
internal/dom Package dom is the only package that imports golang.org/x/net/html (and, in later phases, goquery/cascadia).
internal/emit Package emit serializes reduced content into the representation an AI agent consumes.
internal/fetch Package fetch is unblink's HTTP-client-as-browser.
internal/js Package js is unblink's quarantined JavaScript engine.
internal/mcpserver Package mcpserver exposes the browser as an MCP server.
internal/page Package page defines the core data model that flows through the unblink pipeline.
internal/ratelimit Package ratelimit provides a process-global, per-host request rate limiter so unblink throttles requests to each host across all its HTTP clients.
internal/reduce Package reduce performs unblink's semantic reduction: it strips visual-only junk (scripts, styles, nav, ads, hidden elements) and extracts the meaningful content of a page.
internal/robots Package robots parses robots.txt (the Robots Exclusion Protocol) for the purpose of *exposing* a site's crawl rules to an agent, not enforcing them.
internal/search Package search adds an optional web-search entry point behind a small Provider interface, so unblink stays fully self-contained by default and only reaches an external service when the operator configures one.
internal/session Package session holds unblink's per-session state: a cookie jar (via a dedicated fetch.Client), a navigation history, and an optional persistent JavaScript runtime ("live context") bound to the current page.
internal/sitemap Package sitemap parses the sitemaps.org XML formats (a <urlset> of page URLs or a <sitemapindex> of child sitemaps) using only the standard library.
internal/tokens Package tokens provides cheap token estimation and Markdown pagination so the browser never returns an unbounded blob to the model.
internal/webext Package webext models Firefox/Chrome WebExtensions well enough to load a user-supplied extension (uBlock Origin and friends) at runtime.
third_party/pdf Package pdf implements reading of PDF files.
