semantic

Version v0.1.3 · Published: Apr 28, 2026 · License: MIT · Imports: 2 · Imported by: 0

README

semantic

Zero-dependency Go library for semantic matching of accessibility tree elements.

Matches natural language queries ("sign in button") against UI element descriptors using lexical similarity, synonym expansion, and embedding-based fuzzy matching.

The Problem

Browser automation tools find UI elements via CSS selectors, XPath, or explicit IDs — all brittle. When the DOM changes (SPA re-renders, layout shifts, framework updates), selectors break silently. AI agents make this worse: they describe elements in natural language ("click the sign in button") but the accessibility tree has structured labels ("Sign In", role: button, ref: e4).

The gap between how agents describe elements and how browsers expose them is what semantic solves.

Install

go get github.com/pinchtab/semantic

Or via npm (downloads the Go binary):

npx @pinchtab/semantic find "sign in" --snapshot page.json

Or Homebrew:

brew install pinchtab/tap/semantic

Usage

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/pinchtab/semantic"
)

func main() {
    ctx := context.Background()

    // Build descriptors from your accessibility tree
    elements := []semantic.ElementDescriptor{
        {Ref: "e0", Role: "button", Name: "Sign In"},
        {Ref: "e1", Role: "textbox", Name: "Email"},
        {Ref: "e2", Role: "link", Name: "Forgot password?"},
    }

    // Create a matcher (combined = lexical + embedding)
    matcher := semantic.NewCombinedMatcher(semantic.NewHashingEmbedder(128))

    // Find matching elements
    result, err := matcher.Find(ctx, "log in button", elements, semantic.FindOptions{
        Threshold: 0.3,
        TopK:      3,
    })
    if err != nil {
        log.Fatal(err)
    }
    // result.BestRef = "e0" ("Sign In" matches "log in" via synonyms)
    // result.BestScore = 0.82
    fmt.Println(result.BestRef, result.BestScore)
}

Structured locators are also supported when descriptors include the corresponding fields:

result, err := matcher.Find(ctx, "role:button Sign In", elements, semantic.FindOptions{})
result, err = matcher.Find(ctx, "placeholder:Search", elements, semantic.FindOptions{})
result, err = matcher.Find(ctx, "nth:1:role:button", elements, semantic.FindOptions{})

nth:<n> is 1-based: nth:1 selects the first ordered candidate, nth:2 selects the second, and nth:0 is not the first match.

Use find:<query> or semantic:<query> to force natural-language matching for locator-like text.
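
The locator forms above can be read as a chain of prefixes peeled off the front of the query. The following is a hypothetical parser written only to illustrate that reading, not semantic's implementation: the `Locator` struct and `parseLocator` function are invented for this sketch, and rejecting `nth:0` with an error is an assumption (the README only says that `nth:0` does not select the first match).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Locator is a hypothetical parsed form of a structured query.
type Locator struct {
	Nth         int    // 1-based position; 0 means "not specified"
	Role        string // from "role:<role>"
	Placeholder string // from "placeholder:<text>"
	Text        string // remaining free text (or the whole query in natural mode)
	Natural     bool   // true when the query is matched as natural language
}

// parseLocator peels known prefixes off the front of q.
func parseLocator(q string) (Locator, error) {
	var loc Locator
	q = strings.TrimSpace(q)

	// Escape hatch: find:<query> and semantic:<query> skip locator parsing.
	for _, p := range []string{"find:", "semantic:"} {
		if strings.HasPrefix(q, p) {
			return Locator{Text: strings.TrimPrefix(q, p), Natural: true}, nil
		}
	}

	// nth:<n>:<rest> selects the n-th ordered candidate (1-based).
	if strings.HasPrefix(q, "nth:") {
		parts := strings.SplitN(strings.TrimPrefix(q, "nth:"), ":", 2)
		n, err := strconv.Atoi(parts[0])
		if err != nil || n < 1 {
			return loc, fmt.Errorf("nth must be a positive integer, got %q", parts[0])
		}
		loc.Nth = n
		q = ""
		if len(parts) == 2 {
			q = parts[1]
		}
	}

	switch {
	case strings.HasPrefix(q, "role:"):
		// "role:button Sign In" -> role "button", text "Sign In"
		role, text, _ := strings.Cut(strings.TrimPrefix(q, "role:"), " ")
		loc.Role, loc.Text = role, strings.TrimSpace(text)
	case strings.HasPrefix(q, "placeholder:"):
		loc.Placeholder = strings.TrimPrefix(q, "placeholder:")
	default:
		loc.Text = q
		loc.Natural = true
	}
	return loc, nil
}

func main() {
	for _, q := range []string{"role:button Sign In", "placeholder:Search", "nth:1:role:button", "find:role:button"} {
		loc, err := parseLocator(q)
		fmt.Printf("%-22q %+v err=%v\n", q, loc, err)
	}
}
```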

Package Layout

semantic.go              Public API (types + constructors)
internal/types/          Type definitions (interfaces, structs)
internal/engine/         Matching implementations (hidden from consumers)
recovery/                Error recovery + intent caching (public subpackage)
cmd/semantic/            CLI tool

Implementations are internal — consumers use the ElementMatcher interface and constructors. Swap matching strategies without breaking your code.
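
Because callers only see the interface, code written against it keeps working when the strategy behind it changes. The sketch below illustrates that pattern with a deliberately simplified, local `Matcher` interface and two toy strategies; it does not reproduce semantic's `ElementMatcher`, whose `Find` method takes a context, descriptors, and `FindOptions`.

```go
package main

import (
	"fmt"
	"strings"
)

// Matcher is a simplified stand-in for an interface like ElementMatcher.
type Matcher interface {
	Score(query, name string) float64
}

// exactMatcher scores 1 only on a case-insensitive exact match.
type exactMatcher struct{}

func (exactMatcher) Score(q, name string) float64 {
	if strings.EqualFold(q, name) {
		return 1
	}
	return 0
}

// containsMatcher scores 1 when the name contains the query as a substring.
type containsMatcher struct{}

func (containsMatcher) Score(q, name string) float64 {
	if strings.Contains(strings.ToLower(name), strings.ToLower(q)) {
		return 1
	}
	return 0
}

// best is written once against the interface; swapping strategies needs no change here.
func best(m Matcher, query string, names []string) (string, float64) {
	bestName, bestScore := "", -1.0
	for _, n := range names {
		if s := m.Score(query, n); s > bestScore {
			bestName, bestScore = n, s
		}
	}
	return bestName, bestScore
}

func main() {
	names := []string{"Sign In", "Email", "Forgot password?"}
	for _, m := range []Matcher{exactMatcher{}, containsMatcher{}} {
		n, s := best(m, "password", names)
		fmt.Printf("%T -> %q (%.1f)\n", m, n, s)
	}
}
```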

How It Works

┌─────────────────────────────────────────────────────┐
│                   ElementMatcher                    │
│                     (interface)                     │
├─────────────┬───────────────────┬───────────────────┤
│  Lexical    │    Embedding      │    Combined       │
│  Matcher    │    Matcher        │    Matcher        │
│             │                   │                   │
│  Jaccard    │  Cosine sim on    │  0.6 × lexical    │
│  + synonyms │  hashed vectors   │  + 0.4 × embed    │
│  + stopwords│  (feature hashing,│                   │
│  + prefix   │   char n-grams,   │  Runs both in     │
│  + role     │   zero deps)      │  parallel, fuses  │
│    boosting │                   │  by ref           │
└─────────────┴───────────────────┴───────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│                   RecoveryEngine                    │
│                                                     │
│  IntentCache → reconstruct original query           │
│  SnapshotRefresher → get fresh DOM (callback)       │
│  Re-match → find element in new snapshot            │
│  Re-execute → run the original action               │
└─────────────────────────────────────────────────────┘

Lexical matching uses Jaccard similarity with context-aware stopword removal, 54 synonym groups covering UI vocabulary (login ↔ signin, cart ↔ basket, submit ↔ send), prefix matching for abbreviations ("btn" → "button"), and role boosting when queries mention ARIA roles.
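
The core lexical score is Jaccard similarity, |A ∩ B| / |A ∪ B|, over normalized token sets. The sketch below shows how phrase joining, stopword removal, and synonym canonicalization feed into it. The two-group synonym table, the three stopwords, and the `lexicalScore` name are illustrative assumptions; semantic's real table has 54 groups, its stopword removal is context-aware, and prefix matching and role boosting are omitted here.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative synonym table: every variant maps to one canonical token.
var canonical = map[string]string{
	"login": "signin", "signin": "signin",
	"basket": "cart", "cart": "cart",
}

// Illustrative stopwords (a fixed list, not context-aware).
var stopwords = map[string]bool{"the": true, "a": true, "to": true}

// tokens lowercases, joins "log in"/"sign in", drops stopwords, and canonicalizes synonyms.
func tokens(s string) map[string]bool {
	words := strings.Fields(strings.ToLower(s))
	set := map[string]bool{}
	for i := 0; i < len(words); i++ {
		w := words[i]
		// Join two-word phrases into one token before synonym lookup.
		if (w == "log" || w == "sign") && i+1 < len(words) && words[i+1] == "in" {
			w += "in"
			i++
		}
		if stopwords[w] {
			continue
		}
		if c, ok := canonical[w]; ok {
			w = c
		}
		set[w] = true
	}
	return set
}

// lexicalScore returns the Jaccard similarity |A∩B| / |A∪B| of the token sets.
func lexicalScore(query, desc string) float64 {
	a, b := tokens(query), tokens(desc)
	inter := 0
	for t := range a {
		if b[t] {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

func main() {
	fmt.Println(lexicalScore("log in button", "Sign In button")) // 1
	fmt.Println(lexicalScore("the cart", "Basket"))             // 1
	fmt.Println(lexicalScore("log in button", "Email textbox")) // 0
}
```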

Embedding matching uses feature hashing (the "hashing trick", Weinberger et al. 2009) to convert text into fixed-dimension vectors via word unigrams and character n-grams. No vocabulary construction, no model downloads — sub-millisecond latency.

Combined matching (recommended) fuses both: 60% lexical + 40% embedding. Lexical handles exact matches and synonyms; embedding adds fuzzy sub-word similarity for partial queries.
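
Fusing by ref reduces to a weighted sum per element. The sketch below assumes each strategy has already produced a map from ref to a score in [0, 1] and treats a ref missing from one strategy as scoring 0 there; both are assumptions of this sketch, and unlike the combined matcher it does not run the strategies in parallel. `fuse` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

type scored struct {
	Ref   string
	Score float64
}

// fuse combines per-ref scores as wLex*lexical + wEmb*embedding, sorted best-first.
func fuse(lexical, embedding map[string]float64, wLex, wEmb float64) []scored {
	refs := map[string]bool{}
	for r := range lexical {
		refs[r] = true
	}
	for r := range embedding {
		refs[r] = true
	}
	out := make([]scored, 0, len(refs))
	for r := range refs {
		// A missing key reads as 0, so a ref seen by only one strategy is kept.
		out = append(out, scored{r, wLex*lexical[r] + wEmb*embedding[r]})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].Ref < out[j].Ref // deterministic tie-break
	})
	return out
}

func main() {
	lex := map[string]float64{"e0": 1.0, "e1": 0.2}
	emb := map[string]float64{"e0": 0.5, "e2": 0.8}
	for _, s := range fuse(lex, emb, 0.6, 0.4) {
		fmt.Printf("%s %.2f\n", s.Ref, s.Score)
	}
	// e0 0.80
	// e2 0.32
	// e1 0.12
}
```

Scoring absent refs as 0 means an element found by only one strategy can still rank, but its fused score is capped at that strategy's weight (0.6 or 0.4).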

Features

  • Synonym expansion — 54 UI synonym groups ("sign in" ↔ "log in", "cart" ↔ "basket", "preferences" ↔ "settings", etc.)
  • Visual position hints — Understand layout cues like top, bottom, left, right, and above/below anchors
  • Confidence calibration — Scores mapped to high (≥ 0.8) / medium (≥ 0.6) / low labels
  • Error classification — Classify browser errors (CDP, chromedp) as recoverable or not
  • Self-healing recovery — Re-locate stale elements after DOM changes via callback interfaces
  • Intent caching — Per-tab LRU cache (200 entries, 10min TTL) of element intents for recovery

Matchers

Matcher                         Speed                     Accuracy                   Use case
NewLexicalMatcher()             < 0.5 ms / 100 elements   Best for exact + synonym   Simple UIs, speed-critical
NewCombinedMatcher(embedder)    < 1 ms / 100 elements     Best overall               General purpose (recommended)
NewEmbeddingMatcher(embedder)   < 1 ms / 100 elements     Best for fuzzy/partial     Sub-word similarity

Error Classification

import "github.com/pinchtab/semantic/recovery"

ft := recovery.ClassifyFailure(err)
// ft.Recoverable() → true/false

Type                       Examples                                           Recoverable
element_not_found          "could not find node", "ref not found"             yes
element_stale              "node is detached", "execution context destroyed"
element_not_interactable   "not visible", "overlapped", "disabled"
navigation                 "frame detached", "page crashed"
network                    "connection refused", "timeout"
unknown                    Everything else

Self-Healing Recovery

When an action fails on a stale ref, the RecoveryEngine reconstructs the original query from its intent cache, refreshes the DOM, re-matches, and re-executes — all through callback interfaces so it works with any browser automation framework:

import (
    "time"

    "github.com/pinchtab/semantic"
    "github.com/pinchtab/semantic/recovery"
)

intentCache := recovery.NewIntentCache(200, 10*time.Minute)

re := recovery.NewRecoveryEngine(
    recovery.DefaultRecoveryConfig(),
    matcher,
    intentCache,
    refreshSnapshot,  // your callback to refresh the DOM
    resolveNodeID,    // your callback to map ref → node ID
    buildDescriptors, // your callback to build descriptors
)

// Cache intent after successful find
re.RecordIntent(tabID, "e5", recovery.IntentEntry{
    Query:      "checkout button",
    Descriptor: semantic.ElementDescriptor{Ref: "e5", Role: "button", Name: "Checkout"},
})

// Recover on failure
if err != nil && re.ShouldAttempt(err, ref) {
    rr, _, _ := re.Attempt(ctx, tabID, ref, "click", executeAction)
    // rr.Recovered = true, rr.NewRef = "e12"
}

CLI

go install github.com/pinchtab/semantic/cmd/semantic@latest

# Find elements matching a query
semantic find "sign in button" --snapshot page.json

# Pipe from pinchtab or any tool that outputs accessibility JSON
curl -s localhost:9999/snapshot | semantic find "search box"

# Output formats
semantic find "login" --snapshot page.json --format json    # machine-readable
semantic find "login" --snapshot page.json --format table   # human-readable
semantic find "login" --snapshot page.json --format refs    # just refs

# Visual position hints
semantic find "button in top right corner" --snapshot page.json
semantic find "link below the search box" --snapshot page.json
semantic find "sidebar on the left" --snapshot page.json

# Score a specific element
semantic match "login" e4 --snapshot page.json

# Classify an error
semantic classify "could not find node with given id"
# → element_not_found (recoverable: true)

Zero Dependencies

The library uses only the Go standard library. No external dependencies, no model downloads, no supply chain risk. The hashing-based embedder gets ~85% of the quality of real ML embeddings with zero cost.

Design Trade-offs

See docs/architecture/design-decisions.md for detailed discussion of architectural decisions: hashing vs real embeddings, fixed synonym table vs learned, Jaccard vs TF-IDF, and recovery callbacks vs direct integration.

Origin

This project was extracted from pinchtab's internal semantic matching package. The original implementation was contributed by @YashJadhav21, @Djain912, and @Chetnapadhi in PR #109 — a collaboration that built the lexical matching, synonym expansion, embedding, and recovery systems that form the core of this library.

License

MIT

Documentation

Overview

Package semantic provides zero-dependency semantic matching for accessibility tree elements. Match natural language queries like "sign in button" against UI element descriptors using lexical similarity, synonym expansion, and embedding-based fuzzy matching.

Implementations are internal — consumers use the ElementMatcher interface returned by constructors.

Functions

func CalibrateConfidence

func CalibrateConfidence(score float64) string

CalibrateConfidence maps a score to "high", "medium", or "low".
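
Using the thresholds listed under Features (high ≥ 0.8, medium ≥ 0.6), the mapping reduces to two comparisons. The sketch below assumes both boundaries are inclusive, as the ≥ signs suggest; `calibrate` is an illustrative stand-in, not the package's function.

```go
package main

import "fmt"

// calibrate maps a [0, 1] score to a confidence label using inclusive lower bounds.
func calibrate(score float64) string {
	switch {
	case score >= 0.8:
		return "high"
	case score >= 0.6:
		return "medium"
	default:
		return "low"
	}
}

func main() {
	for _, s := range []float64{0.82, 0.6, 0.3} {
		fmt.Println(s, calibrate(s))
	}
}
```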

func CosineSimilarity

func CosineSimilarity(a, b []float32) float64

CosineSimilarity computes cosine similarity between two float32 vectors.

func LexicalScore

func LexicalScore(query, desc string) float64

LexicalScore computes lexical similarity between a query and an element description string. Returns [0, 1].

Types

type ElementDescriptor

type ElementDescriptor = types.ElementDescriptor

ElementDescriptor describes a single accessibility tree node.

type ElementMatch

type ElementMatch = types.ElementMatch

ElementMatch is a single scored match.

type ElementMatcher

type ElementMatcher = types.ElementMatcher

ElementMatcher scores accessibility tree elements against a natural language query.

func NewCombinedMatcher

func NewCombinedMatcher(embedder Embedder) ElementMatcher

NewCombinedMatcher creates a matcher that fuses lexical and embedding strategies with default weights (0.6 lexical, 0.4 embedding).

func NewEmbeddingMatcher

func NewEmbeddingMatcher(e Embedder) ElementMatcher

NewEmbeddingMatcher creates a standalone embedding-based matcher (cosine similarity on dense vectors).

func NewEmbeddingMatcherWithNeighborWeight added in v0.1.1

func NewEmbeddingMatcherWithNeighborWeight(e Embedder, weight float64) ElementMatcher

NewEmbeddingMatcherWithNeighborWeight creates a standalone embedding matcher and configures how much immediate neighbors influence each element embedding. Weight is clamped to [0, 1].
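
One plausible reading of "how much immediate neighbors influence each element embedding" is a linear blend of an element's own vector with the mean of its neighbors' vectors. The sketch below implements that reading with the weight clamped to [0, 1]; the blending formula, the definition of a neighbor, and the `blend` name are assumptions, not documented behavior.

```go
package main

import "fmt"

// clamp01 limits w to [0, 1].
func clamp01(w float64) float64 {
	if w < 0 {
		return 0
	}
	if w > 1 {
		return 1
	}
	return w
}

// blend returns (1-w)*self + w*mean(neighbors); with no neighbors, self is returned unchanged.
func blend(self []float32, neighbors [][]float32, w float64) []float32 {
	w = clamp01(w)
	out := make([]float32, len(self))
	copy(out, self)
	if len(neighbors) == 0 || w == 0 {
		return out
	}
	for i := range out {
		var mean float64
		for _, n := range neighbors {
			mean += float64(n[i])
		}
		mean /= float64(len(neighbors))
		out[i] = float32((1-w)*float64(self[i]) + w*mean)
	}
	return out
}

func main() {
	self := []float32{1, 0}
	nbrs := [][]float32{{0, 1}, {0, 3}}
	fmt.Println(blend(self, nbrs, 0.25)) // [0.75 0.5]
	fmt.Println(blend(self, nbrs, 7))    // weight clamped to 1: [0 2]
}
```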

func NewLexicalMatcher

func NewLexicalMatcher() ElementMatcher

NewLexicalMatcher creates a standalone lexical matcher (Jaccard similarity with synonym expansion and role boosting).

type Embedder

type Embedder = types.Embedder

Embedder converts text into dense vectors.

func NewHashingEmbedder

func NewHashingEmbedder(dim int) Embedder

NewHashingEmbedder creates a zero-dependency hashing-based embedder with the given vector dimensionality. Default: 128.

type FindOptions

type FindOptions = types.FindOptions

FindOptions controls matching behavior.

type FindResult

type FindResult = types.FindResult

FindResult holds the top matches from a Find call.

type MatchExplain

type MatchExplain = types.MatchExplain

MatchExplain is the per-strategy score breakdown.

type PositionalHints added in v0.1.1

type PositionalHints = types.PositionalHints

PositionalHints captures optional AX-tree relationship metadata.

Directories

Path                 Synopsis
cmd/semantic         Package main provides the semantic CLI tool for matching accessibility tree elements against natural language queries.
cmd/semantic-bench
internal/types       Package types defines the public API types for the semantic library.
