amz

package
v0.2.1 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 13, 2026 License: Apache-2.0 Imports: 23 Imported by: 0

Documentation

Overview

Package amz is a read-only client library for amazon.com: it fetches public pages, detects the bot wall, and normalizes each surface into a rich record.

Index

Constants

const (
	DefaultDelay   = 3 * time.Second
	DefaultTimeout = 30 * time.Second
	DefaultWorkers = 2
	DefaultRetries = 3
	UserAgent      = "amz/0.1 (+https://github.com/tamnd/amz-cli)"
)

Defaults for the polite read path.

const (
	EntityProduct    = "product"
	EntityReviews    = "reviews"
	EntityQA         = "qa"
	EntityOffers     = "offers"
	EntityBrand      = "brand"
	EntityAuthor     = "author"
	EntityCategory   = "category"
	EntitySeller     = "seller"
	EntitySearch     = "search"
	EntityBestseller = "bestseller"
)

Entity kinds used by the crawl queue and the seed command.

Variables

var ErrBlocked = errors.New("blocked by amazon (CAPTCHA / robot check); slow down with --rate, try --cookies, switch --marketplace, or use --api")

ErrBlocked is returned when amazon serves a CAPTCHA or robot-check wall instead of the requested page. It maps to CLI exit code 5.

var ErrNoDuckDB = errors.New("duckdb binary not found on PATH (install it to use --db)")

ErrNoDuckDB is returned when the duckdb binary is not on PATH.

var ErrNoPACreds = errors.New("PA-API requires AMZ_PAAPI_ACCESS_KEY, AMZ_PAAPI_SECRET_KEY and AMZ_PAAPI_PARTNER_TAG")

ErrNoPACreds is returned when PA-API credentials are missing.

var ErrNoQA = errNoQA{}

ErrNoQA is returned by FetchQA, which streams Q&A pairs for an ASIN, when the product has no classic Q&A section (amazon is deprecating it across many categories).

var ErrNotFound = errors.New("not found")

ErrNotFound is returned when a page is a hard 404 (e.g. an unknown ASIN).

Functions

func ConfigDir

func ConfigDir() string

ConfigDir returns the XDG config directory for amz.

func DetectBlocked

func DetectBlocked(body []byte) bool

DetectBlocked reports whether the response body is a CAPTCHA / robot wall.
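A bot-wall check can be as simple as scanning the body for fingerprints of the robot-check page. The marker strings below are assumptions about that page's markup, not the package's actual list.

```go
package main

import (
	"bytes"
	"fmt"
)

// blockMarkers are assumed fingerprints of amazon's robot-check page.
var blockMarkers = [][]byte{
	[]byte("/errors/validateCaptcha"),
	[]byte("Robot Check"),
	[]byte("Enter the characters you see below"),
}

// detectBlocked reports whether body contains any wall marker.
func detectBlocked(body []byte) bool {
	for _, m := range blockMarkers {
		if bytes.Contains(body, m) {
			return true
		}
	}
	return false
}

func main() {
	wall := []byte(`<form method="get" action="/errors/validateCaptcha">`)
	fmt.Println(detectBlocked(wall), detectBlocked([]byte("<title>USB-C Cable</title>")))
}
```

Substring checks are cheap enough to run on every response. The trade-off is that a marker change on amazon's side silently turns a wall into an apparently empty page.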

func ExtractASIN

func ExtractASIN(s string) string

ExtractASIN pulls the 10-character ASIN out of any amazon product URL or a bare ASIN argument. It returns "" when no ASIN is present.
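The sketch below is a regular-expression version of ASIN extraction. The set of URL path shapes it recognizes (`/dp/`, `/gp/product/`, `/gp/aw/d/`, `/product-reviews/`) is an assumption. A bare argument is upper-cased before matching.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// asinInPath matches common detail-page path shapes (assumed set).
var asinInPath = regexp.MustCompile(`/(?:dp|gp/product|gp/aw/d|product-reviews)/([A-Z0-9]{10})(?:[/?#]|$)`)

// bareASIN matches a standalone 10-character ASIN argument.
var bareASIN = regexp.MustCompile(`^[A-Z0-9]{10}$`)

func extractASIN(s string) string {
	s = strings.TrimSpace(s)
	if m := asinInPath.FindStringSubmatch(s); m != nil {
		return m[1]
	}
	if up := strings.ToUpper(s); bareASIN.MatchString(up) {
		return up
	}
	return "" // no ASIN present
}

func main() {
	fmt.Println(extractASIN("https://www.amazon.com/Some-Title/dp/B08N5WRWNW/ref=sr_1_1?th=1"))
	fmt.Println(extractASIN("b08n5wrwnw"))
}
```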

func IsURL

func IsURL(s string) bool

IsURL reports whether s looks like an http(s) URL rather than a bare id/slug.

func ParsePrice

func ParsePrice(s string) (float64, string)

ParsePrice extracts a numeric price and a best-effort currency code from a display string like "$1,299.00" or "1.299,00 €".
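Both example strings can be parsed with one rule: the last `.` or `,` is the decimal separator only if one or two digits follow it, and every other separator is a thousands mark. The sketch assumes that heuristic and a small symbol table. A bare `$` is read as USD, which is wrong for other dollar marketplaces and is why the currency is only best-effort.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// currencySymbols is an assumed table, checked in order.
var currencySymbols = []struct{ sym, code string }{
	{"€", "EUR"}, {"£", "GBP"}, {"¥", "JPY"}, {"₹", "INR"}, {"$", "USD"},
}

func parsePrice(s string) (float64, string) {
	cur := ""
	for _, c := range currencySymbols {
		if strings.Contains(s, c.sym) {
			cur = c.code
			break
		}
	}
	// Keep only ASCII digits and the two candidate separators.
	var b strings.Builder
	for _, r := range s {
		if (r >= '0' && r <= '9') || r == '.' || r == ',' {
			b.WriteRune(r)
		}
	}
	num := b.String()
	if num == "" {
		return 0, cur
	}
	intPart, frac := num, ""
	// The last separator is decimal only if one or two digits follow it,
	// so "1.299" and "1,299" both read as 1299.
	if i := strings.LastIndexAny(num, ".,"); i >= 0 && len(num)-i-1 <= 2 {
		intPart, frac = num[:i], num[i+1:]
	}
	intPart = strings.NewReplacer(".", "", ",", "").Replace(intPart)
	if intPart == "" {
		intPart = "0"
	}
	if frac != "" {
		intPart += "." + frac
	}
	v, err := strconv.ParseFloat(intPart, 64)
	if err != nil {
		return 0, cur
	}
	return v, cur
}

func main() {
	fmt.Println(parsePrice("$1,299.00"))
	fmt.Println(parsePrice("1.299,00 €"))
}
```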

Types

type Author

type Author struct {
	Slug          string    `json:"slug"`
	Name          string    `json:"name"`
	Bio           string    `json:"bio,omitempty"`
	PhotoURL      string    `json:"photo_url,omitempty"`
	Website       string    `json:"website,omitempty"`
	BookASINs     []string  `json:"book_asins,omitempty"`
	FollowerCount int       `json:"follower_count,omitempty"`
	URL           string    `json:"url"`
	FetchedAt     time.Time `json:"fetched_at"`
}

Author is an Author Central page.

type BestsellerEntry

type BestsellerEntry struct {
	ListType     string    `json:"list_type"`
	Category     string    `json:"category,omitempty"`
	NodeID       string    `json:"node_id,omitempty"`
	Rank         int       `json:"rank"`
	ASIN         string    `json:"asin"`
	Title        string    `json:"title"`
	Price        float64   `json:"price"`
	Currency     string    `json:"currency,omitempty"`
	Rating       float64   `json:"rating,omitempty"`
	RatingsCount int64     `json:"ratings_count,omitempty"`
	URL          string    `json:"url"`
	FetchedAt    time.Time `json:"fetched_at"`
}

BestsellerEntry is one ranked item in a chart.

type Brand

type Brand struct {
	Slug          string    `json:"slug"`
	Name          string    `json:"name"`
	Description   string    `json:"description,omitempty"`
	LogoURL       string    `json:"logo_url,omitempty"`
	BannerURL     string    `json:"banner_url,omitempty"`
	FollowerCount int       `json:"follower_count,omitempty"`
	FeaturedASINs []string  `json:"featured_asins,omitempty"`
	URL           string    `json:"url"`
	FetchedAt     time.Time `json:"fetched_at"`
}

Brand is a brand storefront.

type Cache

type Cache struct {
	// contains filtered or unexported fields
}

Cache is a tiny on-disk page cache keyed by a hash of the URL.

func NewCache

func NewCache(dir string) *Cache

NewCache returns a cache rooted at dir (created on first write).

func (*Cache) Dir

func (c *Cache) Dir() string

Dir returns the cache root.

func (*Cache) Get

func (c *Cache) Get(rawURL string, ttl time.Duration) ([]byte, bool)

Get returns the cached body if present and fresher than ttl.

func (*Cache) Put

func (c *Cache) Put(rawURL string, body []byte) error

Put writes the body to the cache.

type Card

type Card struct {
	Position        int     `json:"position,omitempty"`
	Rank            int     `json:"rank,omitempty"`
	ASIN            string  `json:"asin"`
	Title           string  `json:"title"`
	Price           float64 `json:"price"`
	ListPrice       float64 `json:"list_price,omitempty"`
	Currency        string  `json:"currency,omitempty"`
	Rating          float64 `json:"rating,omitempty"`
	RatingsCount    int64   `json:"ratings_count,omitempty"`
	Image           string  `json:"image,omitempty"`
	Badge           string  `json:"badge,omitempty"`
	Prime           bool    `json:"prime,omitempty"`
	BoughtPastMonth string  `json:"bought_past_month,omitempty"`
	Sponsored       bool    `json:"sponsored,omitempty"`
	Kind            string  `json:"kind,omitempty"`
	URL             string  `json:"url"`
}

Card is a lightweight hit from a search page, chart, or recommendation rail.

type Category

type Category struct {
	NodeID       string    `json:"node_id"`
	Name         string    `json:"name"`
	ParentNodeID string    `json:"parent_node_id,omitempty"`
	Breadcrumb   []string  `json:"breadcrumb,omitempty"`
	ChildNodeIDs []string  `json:"child_node_ids,omitempty"`
	TopASINs     []string  `json:"top_asins,omitempty"`
	URL          string    `json:"url"`
	FetchedAt    time.Time `json:"fetched_at"`
}

Category is a browse node.

type ChartKind

type ChartKind string

ChartKind identifies one of amazon's ranked lists.

const (
	ChartBestsellers ChartKind = "bestsellers"
	ChartNewReleases ChartKind = "new-releases"
	ChartMovers      ChartKind = "movers-and-shakers"
	ChartWished      ChartKind = "most-wished-for"
	ChartGifted      ChartKind = "most-gifted"
)

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client is a polite, block-aware HTTP client for one marketplace.

func NewClient

func NewClient(cfg Config) *Client

NewClient builds a client from a resolved config.

func (*Client) AuthorURL

func (c *Client) AuthorURL(slug string) string

AuthorURL builds an Author Central page URL from a slug or author id.

func (*Client) BaseURL

func (c *Client) BaseURL() string

BaseURL returns the marketplace origin, or the override when set.

func (*Client) BrandURL

func (c *Client) BrandURL(slug string) string

BrandURL builds a brand storefront URL from a slug or page id.

func (*Client) CategoryURL

func (c *Client) CategoryURL(node string) string

CategoryURL builds the browse-node URL.

func (*Client) ChartURL

func (c *Client) ChartURL(kind ChartKind, category, node string, page int) string

ChartURL builds a chart URL for a category slug or browse node.

func (*Client) DealsURL

func (c *Client) DealsURL() string

DealsURL builds the deals grid URL.

func (*Client) FetchAuthor

func (c *Client) FetchAuthor(ctx context.Context, slugOrURL string) (Author, error)

FetchAuthor fetches and normalizes an Author Central page.

func (*Client) FetchBrand

func (c *Client) FetchBrand(ctx context.Context, slugOrURL string) (Brand, error)

FetchBrand fetches and normalizes a brand storefront.

func (*Client) FetchCategory

func (c *Client) FetchCategory(ctx context.Context, nodeOrURL string) (Category, error)

FetchCategory fetches and normalizes a browse-node page.

func (*Client) FetchChart

func (c *Client) FetchChart(ctx context.Context, kind ChartKind, category, node string, limit int, emit func(BestsellerEntry) error) error

FetchChart streams ranked entries from a chart, paging until a page is empty (or the limit is reached). Ranks are offset by page so page two continues the numbering even when amazon drops the rank badges.

func (*Client) FetchDeals

func (c *Client) FetchDeals(ctx context.Context, limit int, emit func(Deal) error) error

FetchDeals streams entries from the deals grid.

func (*Client) FetchOffers

func (c *Client) FetchOffers(ctx context.Context, asin string, q OfferQuery, emit func(Offer) error) error

FetchOffers streams buying options for an ASIN.

func (*Client) FetchProduct

func (c *Client) FetchProduct(ctx context.Context, asinOrURL string) (Product, error)

FetchProduct fetches and normalizes one product detail page.

func (*Client) FetchQA

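func (c *Client) FetchQA(ctx context.Context, asin string, emit func(QA) error) error

FetchQA streams Q&A pairs for an ASIN, calling emit once per pair. It returns ErrNoQA when the product has no classic Q&A section.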

func (*Client) FetchRelated

func (c *Client) FetchRelated(ctx context.Context, asin string, limit int, emit func(Card) error) error

FetchRelated streams recommendation cards (similar items, "frequently bought together", sponsored rails) found on a product detail page.

func (*Client) FetchReviews

func (c *Client) FetchReviews(ctx context.Context, asin string, q ReviewQuery, emit func(Review) error) error

FetchReviews streams reviews for an ASIN, paging until q.Limit is reached.

func (*Client) FetchSeller

func (c *Client) FetchSeller(ctx context.Context, idOrURL string) (Seller, error)

FetchSeller fetches and normalizes a seller profile.

func (*Client) Get

func (c *Client) Get(ctx context.Context, rawURL string, ttl time.Duration) ([]byte, error)

Get fetches a URL and returns its body, using the cache when allowed and detecting the bot wall. It retries transient 429/503/5xx with backoff.

func (*Client) Marketplace

func (c *Client) Marketplace() Marketplace

Marketplace returns the client's marketplace.

func (*Client) OffersURL

func (c *Client) OffersURL(asin string) string

OffersURL builds the offer-listing URL for an ASIN.

func (*Client) ProductURL

func (c *Client) ProductURL(asin string) string

ProductURL builds the canonical detail URL for an ASIN in this marketplace.

func (*Client) QAURL

func (c *Client) QAURL(asin string) string

QAURL builds the Q&A page URL for an ASIN.

func (*Client) ResolveProductURL

func (c *Client) ResolveProductURL(asinOrURL string) (asin, url string)

ResolveProductURL turns an ASIN or any amazon URL into a canonical detail URL.

func (*Client) ReviewURL

func (c *Client) ReviewURL(asin string, q ReviewQuery, page int) string

ReviewURL builds the product-reviews URL.

func (*Client) Search

func (c *Client) Search(ctx context.Context, query string, q SearchQuery, emit func(Card) error) error

Search streams result cards for a query, paging until Limit is reached.

func (*Client) SearchURL

func (c *Client) SearchURL(query string, q SearchQuery, page int) string

SearchURL builds the /s URL for a query and page.
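The sketch below builds a search URL with `net/url`. It assumes amazon's `k` (keywords), `i` (department) and `page` parameters, and assumes SearchQuery.Sort maps onto `s=` codes such as `price-asc-rank`. These parameter names and codes are assumptions, not the package's confirmed encoding.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// sortParam maps SearchQuery.Sort values to assumed `s=` codes.
var sortParam = map[string]string{
	"price-asc":  "price-asc-rank",
	"price-desc": "price-desc-rank",
	"review":     "review-rank",
	"newest":     "date-desc-rank",
}

// searchURL builds base + "/s?k=..."; "relevance" (or "") adds no sort
// parameter, and page 1 is left implicit.
func searchURL(base, query, sort, department string, page int) string {
	v := url.Values{}
	v.Set("k", query)
	if s, ok := sortParam[sort]; ok {
		v.Set("s", s)
	}
	if department != "" {
		v.Set("i", department)
	}
	if page > 1 {
		v.Set("page", strconv.Itoa(page))
	}
	return base + "/s?" + v.Encode() // Encode sorts keys and escapes spaces as '+'
}

func main() {
	fmt.Println(searchURL("https://www.amazon.com", "usb c cable", "price-asc", "", 2))
}
```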

func (*Client) SellerURL

func (c *Client) SellerURL(id string) string

SellerURL builds a third-party seller profile URL.

func (*Client) SetBaseURL

func (c *Client) SetBaseURL(base string)

SetBaseURL overrides the marketplace origin. It exists so the fetchers can be pointed at a local fixture server or an outbound proxy; production code leaves it unset and uses the marketplace host.

type Config

type Config struct {
	Marketplace string
	Cookies     string
	UseAPI      bool
	Workers     int
	Delay       time.Duration
	Retries     int
	Timeout     time.Duration
	DataDir     string
	CacheDir    string
	DBPath      string
	NoCache     bool
	Refresh     bool

	// PA-API credentials (opt-in path).
	PAAPIAccessKey  string
	PAAPISecretKey  string
	PAAPIPartnerTag string
	PAAPIHost       string
	PAAPIRegion     string
}

Config carries the resolved settings for a run.

func DefaultConfig

func DefaultConfig() Config

DefaultConfig returns the built-in defaults with XDG-resolved paths.

type Deal

type Deal struct {
	ASIN        string    `json:"asin"`
	Title       string    `json:"title"`
	DealPrice   float64   `json:"deal_price"`
	ListPrice   float64   `json:"list_price,omitempty"`
	DiscountPct int       `json:"discount_pct,omitempty"`
	Badge       string    `json:"badge,omitempty"`
	Currency    string    `json:"currency,omitempty"`
	URL         string    `json:"url"`
	FetchedAt   time.Time `json:"fetched_at"`
}

Deal is one entry from the deals grid.

type Marketplace

type Marketplace struct {
	Slug     string
	Host     string
	Currency string
	Language string
}

Marketplace is one regional amazon storefront.

func LookupMarketplace

func LookupMarketplace(slug string) (Marketplace, bool)

LookupMarketplace returns the marketplace for a slug, defaulting to US for an unknown or empty slug. The second return reports whether the slug was known.

func Marketplaces

func Marketplaces() []Marketplace

Marketplaces returns every registered marketplace in a stable-ish order.

func (Marketplace) BaseURL

func (m Marketplace) BaseURL() string

BaseURL is the https origin for the marketplace.
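A registry with a US fallback and a known/unknown flag fits in a few lines. The sketch assumes an abbreviated marketplace table: the slugs, hosts and language tags are illustrative only. Backing it with a slice rather than a map is one way to make iteration order stable.

```go
package main

import "fmt"

type Marketplace struct {
	Slug, Host, Currency, Language string
}

func (m Marketplace) BaseURL() string { return "https://" + m.Host }

// registry is an assumed, abbreviated table; a slice keeps order fixed.
var registry = []Marketplace{
	{"us", "www.amazon.com", "USD", "en_US"},
	{"uk", "www.amazon.co.uk", "GBP", "en_GB"},
	{"de", "www.amazon.de", "EUR", "de_DE"},
	{"jp", "www.amazon.co.jp", "JPY", "ja_JP"},
}

// lookupMarketplace falls back to US for unknown or empty slugs and
// reports whether the slug was known.
func lookupMarketplace(slug string) (Marketplace, bool) {
	for _, m := range registry {
		if m.Slug == slug {
			return m, true
		}
	}
	return registry[0], false
}

func main() {
	m, ok := lookupMarketplace("de")
	fmt.Println(m.BaseURL(), m.Currency, ok)
}
```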

type Offer

type Offer struct {
	ASIN         string    `json:"asin"`
	Price        float64   `json:"price"`
	Currency     string    `json:"currency"`
	Shipping     string    `json:"shipping,omitempty"`
	Condition    string    `json:"condition"`
	SellerName   string    `json:"seller_name"`
	SellerID     string    `json:"seller_id,omitempty"`
	SellerRating string    `json:"seller_rating,omitempty"`
	FulfilledBy  string    `json:"fulfilled_by,omitempty"`
	Delivery     string    `json:"delivery,omitempty"`
	IsBuyBox     bool      `json:"is_buybox,omitempty"`
	URL          string    `json:"url"`
	FetchedAt    time.Time `json:"fetched_at"`
}

Offer is one buying option from the offer-listing page.

type OfferQuery

type OfferQuery struct {
	Condition string // new|used|...
	Prime     bool
}

OfferQuery filters the offer-listing.

type PAClient

type PAClient struct {
	// contains filtered or unexported fields
}

PAClient talks to the official Product Advertising API 5.0. It signs requests with SigV4 using only the standard library (no AWS SDK dependency).

func NewPAClient

func NewPAClient(cfg Config) (*PAClient, error)

NewPAClient builds a PA-API client from config, or returns ErrNoPACreds.

func (*PAClient) GetItems

func (p *PAClient) GetItems(ctx context.Context, asins []string) ([]map[string]any, error)

GetItems fetches one or more ASINs via the official API and returns raw item maps (the caller maps them into Product records).

func (*PAClient) SearchItems

func (p *PAClient) SearchItems(ctx context.Context, keywords string, count int) ([]map[string]any, error)

SearchItems runs a keyword search via the official API.

type Product

type Product struct {
	ASIN            string            `json:"asin"`
	Title           string            `json:"title"`
	Brand           string            `json:"brand"`
	BrandID         string            `json:"brand_id,omitempty"`
	Price           float64           `json:"price"`
	Currency        string            `json:"currency"`
	ListPrice       float64           `json:"list_price,omitempty"`
	Savings         float64           `json:"savings,omitempty"`
	SavingsPct      int               `json:"savings_pct,omitempty"`
	Coupon          string            `json:"coupon,omitempty"`
	Rating          float64           `json:"rating"`
	RatingsCount    int64             `json:"ratings_count"`
	ReviewsCount    int64             `json:"reviews_count,omitempty"`
	AnsweredQs      int               `json:"answered_qs,omitempty"`
	BoughtPastMonth string            `json:"bought_past_month,omitempty"`
	Availability    string            `json:"availability"`
	InStock         bool              `json:"in_stock"`
	Description     string            `json:"description,omitempty"`
	BulletPoints    []string          `json:"bullet_points,omitempty"`
	Specs           map[string]string `json:"specs,omitempty"`
	Images          []string          `json:"images,omitempty"`
	Videos          []string          `json:"videos,omitempty"`
	CategoryPath    []string          `json:"category_path,omitempty"`
	BrowseNodeIDs   []string          `json:"browse_node_ids,omitempty"`
	SellerID        string            `json:"seller_id,omitempty"`
	SellerName      string            `json:"seller_name,omitempty"`
	SoldBy          string            `json:"sold_by,omitempty"`
	ShipsFrom       string            `json:"ships_from,omitempty"`
	FulfilledBy     string            `json:"fulfilled_by,omitempty"`
	VariantASINs    []string          `json:"variant_asins,omitempty"`
	ParentASIN      string            `json:"parent_asin,omitempty"`
	SimilarASINs    []string          `json:"similar_asins,omitempty"`
	Rank            int               `json:"rank,omitempty"`
	RankCategory    string            `json:"rank_category,omitempty"`
	Ranks           []ProductRank     `json:"ranks,omitempty"`
	Marketplace     string            `json:"marketplace"`
	URL             string            `json:"url"`
	FetchedAt       time.Time         `json:"fetched_at"`
}

Product is a normalized amazon.com product detail page.

type ProductRank

type ProductRank struct {
	Rank     int    `json:"rank"`
	Category string `json:"category"`
}

ProductRank is one Best Sellers Rank line: a position within a named category. A product is usually ranked once overall and again in one or more subcategories.

type QA

type QA struct {
	QAID         string    `json:"qa_id"`
	ASIN         string    `json:"asin"`
	Question     string    `json:"question"`
	QuestionBy   string    `json:"question_by,omitempty"`
	Answer       string    `json:"answer"`
	AnswerBy     string    `json:"answer_by,omitempty"`
	HelpfulVotes int       `json:"helpful_votes,omitempty"`
	URL          string    `json:"url"`
	FetchedAt    time.Time `json:"fetched_at"`
}

QA is a question-and-answer pair.

type QueueItem

type QueueItem struct {
	ID       int64  `json:"id"`
	URL      string `json:"url"`
	Entity   string `json:"entity"`
	Priority int    `json:"priority"`
	Status   string `json:"status"`
}

QueueItem is a row from the crawl queue.

type Review

type Review struct {
	ReviewID         string            `json:"review_id"`
	ASIN             string            `json:"asin"`
	ReviewerID       string            `json:"reviewer_id,omitempty"`
	ReviewerName     string            `json:"reviewer_name"`
	Rating           int               `json:"rating"`
	Title            string            `json:"title"`
	Text             string            `json:"text"`
	Date             string            `json:"date,omitempty"`
	Country          string            `json:"country,omitempty"`
	VerifiedPurchase bool              `json:"verified_purchase"`
	HelpfulVotes     int               `json:"helpful_votes"`
	Images           []string          `json:"images,omitempty"`
	VariantAttrs     map[string]string `json:"variant_attrs,omitempty"`
	URL              string            `json:"url"`
	FetchedAt        time.Time         `json:"fetched_at"`
}

Review is a single product review.

type ReviewQuery

type ReviewQuery struct {
	Sort       string // recent|helpful
	Stars      int    // 1..5, 0 = all
	Verified   bool
	WithImages bool
	StartPage  int
	Limit      int
}

ReviewQuery holds review-page refinements.

type SearchQuery

type SearchQuery struct {
	Sort       string // relevance|price-asc|price-desc|review|newest
	MinPrice   int
	MaxPrice   int
	MinRating  int
	Prime      bool
	Brand      string
	Department string
	StartPage  int
	Limit      int
}

SearchQuery holds the refinements for a catalog search.

type Seller

type Seller struct {
	SellerID    string    `json:"seller_id"`
	Name        string    `json:"name"`
	Rating      string    `json:"rating,omitempty"`
	RatingCount int       `json:"rating_count,omitempty"`
	PositivePct float64   `json:"positive_pct,omitempty"`
	NeutralPct  float64   `json:"neutral_pct,omitempty"`
	NegativePct float64   `json:"negative_pct,omitempty"`
	URL         string    `json:"url"`
	FetchedAt   time.Time `json:"fetched_at"`
}

Seller is a third-party seller profile.

type Store

type Store struct {
	// contains filtered or unexported fields
}

Store is an optional DuckDB-backed sink. It shells out to the `duckdb` binary so the build never depends on cgo. A missing binary yields ErrNoDuckDB.

DuckDB takes an exclusive lock on the database file, so two `duckdb` processes cannot write to it at the same time. A crawl fetches with many workers but funnels every write through a single mutex, so concurrency stays on the network, where it pays off, and the single-writer database never sees a lock conflict.
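The fan-out/funnel shape can be shown in isolation: workers fetch in parallel, and every write takes one lock. The in-memory `store` below is a hypothetical stand-in for the DuckDB sink. The `maxSeen` counter only records that writes never overlapped.

```go
package main

import (
	"fmt"
	"sync"
)

// store stands in for the DuckDB sink; writes must never overlap.
type store struct {
	mu      sync.Mutex
	active  int // writes in progress (never exceeds 1 under mu)
	maxSeen int
	rows    []string
}

func (s *store) put(row string) {
	s.mu.Lock() // every write funnels through one lock
	defer s.mu.Unlock()
	s.active++
	if s.active > s.maxSeen {
		s.maxSeen = s.active
	}
	s.rows = append(s.rows, row) // a real sink would invoke duckdb here
	s.active--
}

// crawl fetches with `workers` goroutines but writes serially.
func crawl(urls []string, workers int, fetch func(string) string, s *store) {
	jobs := make(chan string)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				s.put(fetch(u)) // fetch runs in parallel; put does not
			}
		}()
	}
	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
}

func main() {
	s := &store{}
	urls := []string{"/dp/A", "/dp/B", "/dp/C", "/dp/D"}
	crawl(urls, 4, func(u string) string { return "page " + u }, s)
	fmt.Println(len(s.rows), s.maxSeen)
}
```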

func OpenStore

func OpenStore(path string) (*Store, error)

OpenStore locates the duckdb binary and ensures the schema exists.

func (*Store) Enqueue

func (s *Store) Enqueue(ctx context.Context, url, entity string, priority int) error

Enqueue inserts a queue item if its URL is not already present.

func (*Store) MarkStatus

func (s *Store) MarkStatus(ctx context.Context, id int64, status string) error

MarkStatus updates the status of one queue item.

func (*Store) NextBatch

func (s *Store) NextBatch(ctx context.Context, n int) ([]QueueItem, error)

NextBatch claims up to n pending queue items, marking them in-progress.

func (*Store) Path

func (s *Store) Path() string

Path returns the database file path.

func (*Store) PendingCount

func (s *Store) PendingCount(ctx context.Context) (int, error)

PendingCount returns the number of pending queue items.

func (*Store) PutBestseller

func (s *Store) PutBestseller(ctx context.Context, e BestsellerEntry) error

PutBestseller appends a chart entry.

func (*Store) PutProduct

func (s *Store) PutProduct(ctx context.Context, p Product) error

PutProduct upserts a product record.

func (*Store) PutQA

func (s *Store) PutQA(ctx context.Context, q QA) error

PutQA upserts a Q&A record.

func (*Store) PutReview

func (s *Store) PutReview(ctx context.Context, r Review) error

PutReview upserts a review record.

func (*Store) Query

func (s *Store) Query(ctx context.Context, sql string) ([]map[string]any, error)

Query runs SQL and returns rows as JSON objects.
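Query's return type fits running duckdb in JSON output mode and decoding an array of objects. The `-json` flag and the `duckdb <db> <sql>` argument order are assumptions about the installed CLI, and the helper names are hypothetical. Only the decoding step is exercised without a duckdb binary.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"os/exec"
)

var errNoDuckDB = errors.New("duckdb binary not found on PATH")

// decodeRows parses a JSON array of row objects; empty output means no rows.
func decodeRows(out []byte) ([]map[string]any, error) {
	out = bytes.TrimSpace(out)
	if len(out) == 0 {
		return []map[string]any{}, nil
	}
	var rows []map[string]any
	if err := json.Unmarshal(out, &rows); err != nil {
		return nil, err
	}
	return rows, nil
}

// query shells out as `duckdb -json <db> <sql>` (assumed invocation).
func query(ctx context.Context, dbPath, sql string) ([]map[string]any, error) {
	bin, err := exec.LookPath("duckdb")
	if err != nil {
		return nil, errNoDuckDB
	}
	out, err := exec.CommandContext(ctx, bin, "-json", dbPath, sql).Output()
	if err != nil {
		return nil, err
	}
	return decodeRows(out)
}

func main() {
	rows, err := query(context.Background(), "amz.duckdb", "SELECT 42 AS answer")
	fmt.Println(rows, err)
}
```

`encoding/json` decodes numbers into `map[string]any` as float64, so very large integer columns lose precision unless the decoder is switched to `UseNumber`.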

func (*Store) Stats

func (s *Store) Stats(ctx context.Context) ([]map[string]any, error)

Stats returns row counts for every table.

func (*Store) Vacuum

func (s *Store) Vacuum(ctx context.Context) error

Vacuum compacts the database.
