Documentation
Overview
Package sites provides generic, site-agnostic discovery and replay helpers. The goal is to identify a marketplace's hidden search backend (Algolia, Elastic, GraphQL, REST, ...) by sniffing real XHR traffic during a normal browse, then let the agent replay those requests directly over HTTP: no browser, no anti-bot detection, no fingerprint.
Constants
This section is empty.
Variables
var HTTPClient = &http.Client{Timeout: 15 * time.Second}
HTTPClient is a package-level variable so that tests can swap in their own client.
Functions
func MutateAlgoliaParams
MutateAlgoliaParams is a small convenience for the most common case: the request body is `{"params": "<url-encoded query string>"}` (the Algolia wire format) and the agent wants to override one or more keys (filters, hitsPerPage, page, ...). It returns the new body string, ready to drop into ReplayInput.Body.
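The real signature of MutateAlgoliaParams is not shown here, so the sketch below uses a hypothetical `mutateAlgoliaParams(body, overrides)` to illustrate the transformation described: decode the JSON body, parse `params` as a URL query string, override keys, and re-encode. Other top-level fields are carried through untouched as raw JSON.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// encodeJSON marshals v without HTML escaping, so '&' in the params
// string stays literal instead of becoming \u0026.
func encodeJSON(v any) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	return strings.TrimSuffix(buf.String(), "\n"), nil
}

// mutateAlgoliaParams is a hypothetical stand-in for MutateAlgoliaParams;
// the real signature is not shown in this documentation.
func mutateAlgoliaParams(body string, overrides map[string]string) (string, error) {
	// Keep every other top-level field verbatim as raw JSON.
	var payload map[string]json.RawMessage
	if err := json.Unmarshal([]byte(body), &payload); err != nil {
		return "", fmt.Errorf("decode body: %w", err)
	}
	var params string
	if err := json.Unmarshal(payload["params"], &params); err != nil {
		return "", fmt.Errorf(`decode "params": %w`, err)
	}
	q, err := url.ParseQuery(params)
	if err != nil {
		return "", fmt.Errorf("parse params: %w", err)
	}
	for k, v := range overrides {
		q.Set(k, v)
	}
	// url.Values.Encode sorts keys, so the output is deterministic.
	encoded, err := encodeJSON(q.Encode())
	if err != nil {
		return "", err
	}
	payload["params"] = json.RawMessage(encoded)
	return encodeJSON(payload)
}

func main() {
	body := `{"params":"query=&hitsPerPage=20&page=0"}`
	out, err := mutateAlgoliaParams(body, map[string]string{"page": "2", "filters": "brand:Acme"})
	fmt.Println(out, err)
}
```

Two details matter for the wire format: `url.Values.Encode` re-sorts the keys, and the default HTML escaping in `encoding/json` would turn `&` into `\u0026` (still valid JSON, but harder to read in logs), hence `SetEscapeHTML(false)`.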
Types
type CapturedRequest
type CapturedRequest struct {
	Method        string            `json:"method"`
	URL           string            `json:"url"`
	Host          string            `json:"host"`
	Path          string            `json:"path"`
	Status        int               `json:"status"`
	MimeType      string            `json:"mime_type,omitempty"`
	RequestBody   string            `json:"request_body,omitempty"`
	ResponseBody  string            `json:"response_body,omitempty"`
	ResponseBytes int               `json:"response_bytes"`
	DurationMs    int64             `json:"duration_ms"`
	Headers       map[string]string `json:"headers,omitempty"`
	// Score is computed by the heuristic (higher = more likely a search API).
	Score int `json:"score"`
	// ItemCount is the size of the longest top-level array in the response,
	// when JSON. Heuristic for "this is a list of results".
	ItemCount int `json:"item_count,omitempty"`
	// ItemPath is the dotted JSON path to that array (e.g. "hits", "data.ads").
	ItemPath string `json:"item_path,omitempty"`
	// SampleItem is the first item of the array, truncated to ~400 chars.
	SampleItem string `json:"sample_item,omitempty"`
}
CapturedRequest is one XHR/fetch round-trip recorded during a sniff. All fields are JSON-friendly so the agent can transform and reuse them.
func Sniff
func Sniff(ctx context.Context, page *rod.Page, navigateTo string, opts SniffOptions) ([]CapturedRequest, error)
Sniff drives the page through the lifecycle (navigate → wait → drain), captures every request whose response is JSON-shaped (or every candidate, when SniffOptions.IncludeNonJSON is set), scores each one, and returns them ranked by score (highest first). Sniff does not close the page when it returns; the caller decides whether to close the session.
Pass the listing-page URL as navigateTo (what most agents do), or pass an empty navigateTo when the page has already been navigated, for example in a pre-warmed session.
type ReplayInput
type ReplayInput struct {
	Method       string            `json:"method"`
	URL          string            `json:"url"`
	Headers      map[string]string `json:"headers,omitempty"`
	Body         string            `json:"body,omitempty"`
	TimeoutMs    int               `json:"timeout_ms,omitempty"`
	MaxBodyBytes int               `json:"max_body_bytes,omitempty"`
}
ReplayInput is the agent-facing payload for replaying an HTTP request captured by Sniff. All fields are JSON-friendly and trivially mutable.
type ReplayResult
type ReplayResult struct {
	Status     int               `json:"status"`
	URL        string            `json:"url"`
	Headers    map[string]string `json:"headers,omitempty"`
	MimeType   string            `json:"mime_type,omitempty"`
	Body       string            `json:"body,omitempty"`
	BodyBytes  int               `json:"body_bytes"`
	DurationMs int64             `json:"duration_ms"`
	ItemCount  int               `json:"item_count,omitempty"`
	ItemPath   string            `json:"item_path,omitempty"`
	SampleItem string            `json:"sample_item,omitempty"`
	Truncated  bool              `json:"truncated,omitempty"`
}
ReplayResult is what the agent gets back: the status, the possibly truncated body (see Truncated), and optional stats on the decoded top-level array (ItemCount, ItemPath, SampleItem). These let the LLM decide whether the params it just sent yielded the expected list.
func Replay
func Replay(ctx context.Context, in ReplayInput) (*ReplayResult, error)
Replay re-issues an HTTP request and returns the parsed response. It uses plain HTTP: no browser, no Chrome, no fingerprint variance. After Sniff, the agent uses Replay to query the discovered backend with mutated params (a different brand, page, or filters).
type SniffOptions
type SniffOptions struct {
	// Duration is how long to keep the listener attached after navigation.
	// Default 5s.
	Duration time.Duration
	// MaxBodyBytes caps the response body bytes captured per request (avoids
	// holding huge HTML/JS in memory). Default 32 KiB. Bodies above the cap
	// still get parsed for top-level array detection; only the SampleItem is
	// truncated.
	MaxBodyBytes int
	// IncludeNonJSON, when true, also keeps non-JSON candidates. Default false.
	IncludeNonJSON bool
}
SniffOptions controls how aggressively we sniff.