sites

package
v1.0.0 Latest
Published: May 13, 2026 License: MIT Imports: 13 Imported by: 0

Documentation

Overview

Package sites provides generic, site-agnostic discovery and replay helpers. The goal is to identify a marketplace's hidden search backend (Algolia, Elastic, GraphQL, REST, ...) by sniffing real XHR traffic during a normal browse, then let the agent replay those requests directly over HTTP, with no browser, no anti-bot detection, and no fingerprint.

Variables

var HTTPClient = &http.Client{Timeout: 15 * time.Second}

HTTPClient is a package-level variable so that tests can inject their own client.

Functions

func MutateAlgoliaParams

func MutateAlgoliaParams(originalBody string, overrides map[string]string) (string, error)

MutateAlgoliaParams is a small convenience for the most common case: the body is `{"params": "url-encoded-string"}` (the Algolia wire format) and the agent wants to override one or more keys (filters, hitsPerPage, page, ...). It returns the new body string, ready to drop into ReplayInput.Body.
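
The transformation described above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: `mutateParams` is a hypothetical stand-in, and details such as how the real function handles a missing "params" key are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// mutateParams is a hypothetical stand-in for MutateAlgoliaParams.
// It decodes {"params": "<url-encoded>"}, applies the overrides, and
// re-encodes the body. The real function may behave differently.
func mutateParams(originalBody string, overrides map[string]string) (string, error) {
	// Keep any sibling keys of "params" untouched by decoding lazily.
	var envelope map[string]json.RawMessage
	if err := json.Unmarshal([]byte(originalBody), &envelope); err != nil {
		return "", fmt.Errorf("decode body: %w", err)
	}
	raw, ok := envelope["params"]
	if !ok {
		return "", fmt.Errorf("body has no \"params\" key")
	}
	var encoded string
	if err := json.Unmarshal(raw, &encoded); err != nil {
		return "", fmt.Errorf("params is not a string: %w", err)
	}

	// Parse the URL-encoded string, override keys, and encode it again.
	values, err := url.ParseQuery(encoded)
	if err != nil {
		return "", fmt.Errorf("parse params: %w", err)
	}
	for k, v := range overrides {
		values.Set(k, v)
	}
	reencoded, err := json.Marshal(values.Encode())
	if err != nil {
		return "", err
	}
	envelope["params"] = reencoded

	out, err := json.Marshal(envelope)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	body := `{"params":"query=bike&hitsPerPage=20&page=0"}`
	out, err := mutateParams(body, map[string]string{"page": "2"})
	if err != nil {
		panic(err)
	}
	// Note: json.Marshal writes "&" as the equivalent escape \u0026.
	fmt.Println(out)
}
```

In this sketch `url.Values.Encode` sorts keys alphabetically, so the order of parameters in the output can differ from the input; that does not change how a URL-encoded string is parsed.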

Types

type CapturedRequest

type CapturedRequest struct {
	Method        string            `json:"method"`
	URL           string            `json:"url"`
	Host          string            `json:"host"`
	Path          string            `json:"path"`
	Status        int               `json:"status"`
	MimeType      string            `json:"mime_type,omitempty"`
	RequestBody   string            `json:"request_body,omitempty"`
	ResponseBody  string            `json:"response_body,omitempty"`
	ResponseBytes int               `json:"response_bytes"`
	DurationMs    int64             `json:"duration_ms"`
	Headers       map[string]string `json:"headers,omitempty"`
	// Score is computed by the heuristic (higher = more likely a search API).
	Score int `json:"score"`
	// ItemCount is the size of the longest top-level array in the response,
	// when JSON. Heuristic for "this is a list of results".
	ItemCount int `json:"item_count,omitempty"`
	// ItemPath is the dotted JSON path to that array (e.g. "hits", "data.ads").
	ItemPath string `json:"item_path,omitempty"`
	// SampleItem is the first item of the array, truncated to ~400 chars.
	SampleItem string `json:"sample_item,omitempty"`
}

CapturedRequest is one XHR/fetch round-trip recorded during a sniff. All fields are JSON-friendly so the agent can transform and reuse them.
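
The ItemCount and ItemPath fields describe the longest array found in a JSON response. The docs do not show how that array is located, so the following is only a sketch of one possible approach: `findLongestArray` and `itemStats` are hypothetical helpers, and the depth limit of 3 and the tie-breaking rule are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// findLongestArray walks nested JSON objects (it does not descend into
// arrays) up to the given depth and returns the dotted path and length of
// the longest array it finds. Hypothetical helper, not the package's code.
func findLongestArray(v any, prefix string, depth int) (path string, n int) {
	obj, ok := v.(map[string]any)
	if !ok || depth <= 0 {
		return "", 0
	}
	consider := func(p string, m int) {
		// Prefer longer arrays; break ties by path so that Go's random
		// map iteration order cannot change the result.
		if m > n || (m == n && m > 0 && p < path) {
			path, n = p, m
		}
	}
	for key, child := range obj {
		childPath := key
		if prefix != "" {
			childPath = prefix + "." + key
		}
		switch c := child.(type) {
		case []any:
			consider(childPath, len(c))
		case map[string]any:
			consider(findLongestArray(c, childPath, depth-1))
		}
	}
	return path, n
}

// itemStats decodes a JSON body and reports the longest array's path and size.
func itemStats(body []byte) (string, int, error) {
	var doc any
	if err := json.Unmarshal(body, &doc); err != nil {
		return "", 0, err
	}
	path, count := findLongestArray(doc, "", 3) // depth 3 is an arbitrary choice
	return path, count, nil
}

func main() {
	body := []byte(`{"meta":{"tags":["a"]},"data":{"ads":[{"id":1},{"id":2},{"id":3}]}}`)
	path, count, err := itemStats(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(path, count) // prints "data.ads 3"
}
```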

func Sniff

func Sniff(ctx context.Context, page *rod.Page, navigateTo string, opts SniffOptions) ([]CapturedRequest, error)

Sniff drives the page through the lifecycle (navigate → wait → drain), captures every request whose response is JSON-shaped, scores each one, and returns them ranked by score (highest first). The page is unchanged at exit; caller decides whether to close the session.

The caller must either pass the URL to load in `navigateTo`, or pass an empty `navigateTo` together with a page that has already been navigated (a pre-warmed session). Most agents pass the listing-page URL as `navigateTo`.

type ReplayInput

type ReplayInput struct {
	Method       string            `json:"method"`
	URL          string            `json:"url"`
	Headers      map[string]string `json:"headers,omitempty"`
	Body         string            `json:"body,omitempty"`
	TimeoutMs    int               `json:"timeout_ms,omitempty"`
	MaxBodyBytes int               `json:"max_body_bytes,omitempty"`
}

ReplayInput is the agent-facing payload for replaying an HTTP request captured by Sniff. All fields are JSON-friendly and trivially mutable.
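
As an illustration of how a captured request might be turned into a replay payload, the sketch below copies the relevant fields across. The lowercase types are trimmed local mirrors of CapturedRequest and ReplayInput so the example compiles alone, `toReplayInput` is a hypothetical helper that the package may not provide, and the URL is a placeholder.

```go
package main

import "fmt"

// capturedRequest mirrors only the CapturedRequest fields used here.
type capturedRequest struct {
	Method      string
	URL         string
	RequestBody string
	Headers     map[string]string
}

// replayInput mirrors the ReplayInput fields used here.
type replayInput struct {
	Method  string
	URL     string
	Headers map[string]string
	Body    string
}

// toReplayInput is a hypothetical helper: it builds a replay payload from a
// captured request. The header map is copied so that later mutations of the
// payload do not alter the captured record.
func toReplayInput(c capturedRequest) replayInput {
	headers := make(map[string]string, len(c.Headers))
	for k, v := range c.Headers {
		headers[k] = v
	}
	return replayInput{
		Method:  c.Method,
		URL:     c.URL,
		Headers: headers,
		Body:    c.RequestBody,
	}
}

func main() {
	captured := capturedRequest{
		Method:      "POST",
		URL:         "https://search.example.com/query", // placeholder URL
		RequestBody: `{"params":"query=bike&page=0"}`,
		Headers:     map[string]string{"Content-Type": "application/json"},
	}
	in := toReplayInput(captured)
	in.Body = `{"params":"query=bike&page=1"}` // mutate before replaying
	fmt.Println(in.Method, in.URL, in.Body)
}
```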

type ReplayResult

type ReplayResult struct {
	Status     int               `json:"status"`
	URL        string            `json:"url"`
	Headers    map[string]string `json:"headers,omitempty"`
	MimeType   string            `json:"mime_type,omitempty"`
	Body       string            `json:"body,omitempty"`
	BodyBytes  int               `json:"body_bytes"`
	DurationMs int64             `json:"duration_ms"`
	ItemCount  int               `json:"item_count,omitempty"`
	ItemPath   string            `json:"item_path,omitempty"`
	SampleItem string            `json:"sample_item,omitempty"`
	Truncated  bool              `json:"truncated,omitempty"`
}

ReplayResult is what the agent sees back: the status, the (possibly truncated) body, and optional stats on the decoded top-level array, so the LLM can decide whether the params it just sent yielded the expected list.

func Replay

func Replay(ctx context.Context, in ReplayInput) (*ReplayResult, error)

Replay re-issues an HTTP request and returns the parsed response. It is pure HTTP: no browser, no Chrome, no fingerprint variance. The agent uses it after Sniff to query the discovered backend with mutated params (a different brand, page, or filters).
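
To make the idea concrete, here is a rough sketch of re-issuing a request with net/http and capping the body that is kept. It is not the package's Replay: `replay`, `replayInput`, `replayResult`, and `demo` are hypothetical names, the local test server stands in for a real backend, and the sketch assumes the caller sets a positive MaxBodyBytes.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// replayInput and replayResult are trimmed local mirrors of the documented types.
type replayInput struct {
	Method       string
	URL          string
	Headers      map[string]string
	Body         string
	MaxBodyBytes int // assumed positive in this sketch
}

type replayResult struct {
	Status    int
	Body      string
	Truncated bool
}

// replay is a hypothetical, simplified stand-in for Replay.
func replay(ctx context.Context, client *http.Client, in replayInput) (*replayResult, error) {
	req, err := http.NewRequestWithContext(ctx, in.Method, in.URL, strings.NewReader(in.Body))
	if err != nil {
		return nil, err
	}
	for k, v := range in.Headers {
		req.Header.Set(k, v)
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// Read one byte past the cap so truncation can be detected.
	limit := int64(in.MaxBodyBytes)
	data, err := io.ReadAll(io.LimitReader(resp.Body, limit+1))
	if err != nil {
		return nil, err
	}
	res := &replayResult{Status: resp.StatusCode}
	if int64(len(data)) > limit {
		data, res.Truncated = data[:limit], true
	}
	res.Body = string(data)
	return res, nil
}

// demo replays a request against a local test server that returns a fixed
// JSON list, so the sketch runs without network access.
func demo(maxBodyBytes int) (*replayResult, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, `{"hits":[1,2,3]}`)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return replay(ctx, srv.Client(), replayInput{
		Method:       http.MethodPost,
		URL:          srv.URL,
		Headers:      map[string]string{"Content-Type": "application/json"},
		Body:         `{"params":"page=2"}`,
		MaxBodyBytes: maxBodyBytes,
	})
}

func main() {
	res, err := demo(1024)
	if err != nil {
		panic(err)
	}
	fmt.Println(res.Status, res.Body, res.Truncated)
}
```

Passing the client as a parameter is a design choice of this sketch; it serves the same purpose of test injection that the package-level HTTPClient variable is documented to serve.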

type SniffOptions

type SniffOptions struct {
	// Duration to keep the listener attached after navigate. Default 5s.
	Duration time.Duration
	// MaxBodyBytes: cap on response body bytes captured per request (avoids
	// huge HTML/JS in memory). Default 32 KiB. Bodies above the cap still
	// get parsed for top-level array detection — only the SampleItem is
	// truncated.
	MaxBodyBytes int
	// IncludeNonJSON: when true, also keep non-JSON candidates. Default false.
	IncludeNonJSON bool
}

SniffOptions controls how long Sniff listens and how much of each response it keeps.
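
The documented defaults (5s for Duration, 32 KiB for MaxBodyBytes) could be applied along these lines. This is a sketch only: `sniffOptions` is a local mirror of the struct above, `withDefaults` is a hypothetical helper, and treating zero or negative values as "unset" is an assumption about how the package decides when to apply a default.

```go
package main

import (
	"fmt"
	"time"
)

// sniffOptions is a local mirror of SniffOptions so the example compiles alone.
type sniffOptions struct {
	Duration       time.Duration
	MaxBodyBytes   int
	IncludeNonJSON bool
}

// withDefaults fills unset fields with the defaults stated in the field
// comments. Hypothetical helper; zero or negative means "unset" here.
func withDefaults(opts sniffOptions) sniffOptions {
	if opts.Duration <= 0 {
		opts.Duration = 5 * time.Second
	}
	if opts.MaxBodyBytes <= 0 {
		opts.MaxBodyBytes = 32 * 1024 // 32 KiB
	}
	return opts
}

func main() {
	opts := withDefaults(sniffOptions{Duration: 8 * time.Second})
	fmt.Println(opts.Duration, opts.MaxBodyBytes, opts.IncludeNonJSON) // prints "8s 32768 false"
}
```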
