Documentation
Overview ¶
Package amz is a read-only client library for amazon.com: it fetches public pages, detects the bot wall, and normalizes each surface into a rich record.
Index ¶
- Constants
- Variables
- func ConfigDir() string
- func DetectBlocked(body []byte) bool
- func ExtractASIN(s string) string
- func IsURL(s string) bool
- func ParsePrice(s string) (float64, string)
- type Author
- type BestsellerEntry
- type Brand
- type Cache
- type Card
- type Category
- type ChartKind
- type Client
- func (c *Client) AuthorURL(slug string) string
- func (c *Client) BaseURL() string
- func (c *Client) BrandURL(slug string) string
- func (c *Client) CategoryURL(node string) string
- func (c *Client) ChartURL(kind ChartKind, category, node string, page int) string
- func (c *Client) DealsURL() string
- func (c *Client) FetchAuthor(ctx context.Context, slugOrURL string) (Author, error)
- func (c *Client) FetchBrand(ctx context.Context, slugOrURL string) (Brand, error)
- func (c *Client) FetchCategory(ctx context.Context, nodeOrURL string) (Category, error)
- func (c *Client) FetchChart(ctx context.Context, kind ChartKind, category, node string, limit int, ...) error
- func (c *Client) FetchDeals(ctx context.Context, limit int, emit func(Deal) error) error
- func (c *Client) FetchOffers(ctx context.Context, asin string, q OfferQuery, emit func(Offer) error) error
- func (c *Client) FetchProduct(ctx context.Context, asinOrURL string) (Product, error)
- func (c *Client) FetchQA(ctx context.Context, asin string, emit func(QA) error) error
- func (c *Client) FetchRelated(ctx context.Context, asin string, limit int, emit func(Card) error) error
- func (c *Client) FetchReviews(ctx context.Context, asin string, q ReviewQuery, emit func(Review) error) error
- func (c *Client) FetchSeller(ctx context.Context, idOrURL string) (Seller, error)
- func (c *Client) Get(ctx context.Context, rawURL string, ttl time.Duration) ([]byte, error)
- func (c *Client) Marketplace() Marketplace
- func (c *Client) OffersURL(asin string) string
- func (c *Client) ProductURL(asin string) string
- func (c *Client) QAURL(asin string) string
- func (c *Client) ResolveProductURL(asinOrURL string) (asin, url string)
- func (c *Client) ReviewURL(asin string, q ReviewQuery, page int) string
- func (c *Client) Search(ctx context.Context, query string, q SearchQuery, emit func(Card) error) error
- func (c *Client) SearchURL(query string, q SearchQuery, page int) string
- func (c *Client) SellerURL(id string) string
- func (c *Client) SetBaseURL(base string)
- type Config
- type Deal
- type Marketplace
- type Offer
- type OfferQuery
- type PAClient
- type Product
- type ProductRank
- type QA
- type QueueItem
- type Review
- type ReviewQuery
- type SearchQuery
- type Seller
- type Store
- func (s *Store) Enqueue(ctx context.Context, url, entity string, priority int) error
- func (s *Store) MarkStatus(ctx context.Context, id int64, status string) error
- func (s *Store) NextBatch(ctx context.Context, n int) ([]QueueItem, error)
- func (s *Store) Path() string
- func (s *Store) PendingCount(ctx context.Context) (int, error)
- func (s *Store) PutBestseller(ctx context.Context, e BestsellerEntry) error
- func (s *Store) PutProduct(ctx context.Context, p Product) error
- func (s *Store) PutQA(ctx context.Context, q QA) error
- func (s *Store) PutReview(ctx context.Context, r Review) error
- func (s *Store) Query(ctx context.Context, sql string) ([]map[string]any, error)
- func (s *Store) Stats(ctx context.Context) ([]map[string]any, error)
- func (s *Store) Vacuum(ctx context.Context) error
Constants ¶
const (
DefaultDelay = 3 * time.Second
DefaultTimeout = 30 * time.Second
DefaultWorkers = 2
DefaultRetries = 3
UserAgent = "amz/0.1 (+https://github.com/tamnd/amz-cli)"
)
Defaults for the polite read path.
const (
EntityProduct = "product"
EntityReviews = "reviews"
EntityQA = "qa"
EntityOffers = "offers"
EntityBrand = "brand"
EntityAuthor = "author"
EntityCategory = "category"
EntitySeller = "seller"
EntitySearch = "search"
EntityBestseller = "bestseller"
)
Entity kinds used by the crawl queue and the seed command.
Variables ¶
var ErrBlocked = errors.New("blocked by amazon (CAPTCHA / robot check); slow down with --rate, try --cookies, switch --marketplace, or use --api")
ErrBlocked is returned when amazon serves a CAPTCHA or robot-check wall instead of the requested page. It maps to CLI exit code 5.
var ErrNoDuckDB = errors.New("duckdb binary not found on PATH (install it to use --db)")
ErrNoDuckDB is returned when the duckdb binary is not on PATH.
var ErrNoPACreds = errors.New("PA-API requires AMZ_PAAPI_ACCESS_KEY, AMZ_PAAPI_SECRET_KEY and AMZ_PAAPI_PARTNER_TAG")
ErrNoPACreds is returned when PA-API credentials are missing.
var ErrNoQA = errNoQA{}
ErrNoQA is returned by FetchQA when the product has no classic Q&A section (amazon is deprecating it across many categories).
var ErrNotFound = errors.New("not found")
ErrNotFound is returned when a page is a hard 404 (e.g. an unknown ASIN).
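A caller would typically tell these sentinel values apart with errors.Is, which still matches after an error has been wrapped. The sketch below uses local stand-ins (errBlocked, errNotFound) instead of the package's variables so it compiles on its own. Only exit code 5 for a blocked response comes from the text above; the 0 and 1 values are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the package's sentinel errors, so the sketch is
// self-contained. They are not the package's own values.
var (
	errBlocked  = errors.New("blocked")
	errNotFound = errors.New("not found")
)

// exitCode maps an error to a process exit code. Only 5 (blocked) is taken
// from the documentation; 0 and 1 are placeholder values for this sketch.
func exitCode(err error) int {
	switch {
	case err == nil:
		return 0
	case errors.Is(err, errBlocked):
		return 5
	default:
		return 1
	}
}

func main() {
	// errors.Is sees through wrapping added with %w.
	wrapped := fmt.Errorf("fetch product: %w", errBlocked)
	fmt.Println(exitCode(wrapped))
	fmt.Println(errors.Is(wrapped, errNotFound))
}
```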
Functions ¶
func DetectBlocked ¶
DetectBlocked reports whether the response body is a CAPTCHA / robot wall.
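One plausible way to build such a check is a case-insensitive scan for marker substrings. The sketch below is an illustration only: looksBlocked and the two markers are assumptions, and the real detector may rely on entirely different signals.

```go
package main

import (
	"bytes"
	"fmt"
)

// blockMarkers are illustrative substrings only (an assumption of this
// sketch); they are not taken from the package.
var blockMarkers = [][]byte{
	[]byte("captcha"),
	[]byte("robot check"),
}

// looksBlocked reports whether body contains any marker, ignoring case.
func looksBlocked(body []byte) bool {
	lower := bytes.ToLower(body)
	for _, m := range blockMarkers {
		if bytes.Contains(lower, m) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(looksBlocked([]byte("<title>Robot Check</title>")))
	fmt.Println(looksBlocked([]byte("<title>Example product</title>")))
}
```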
func ExtractASIN ¶
ExtractASIN pulls the 10-character ASIN out of any amazon product URL or a bare ASIN argument. It returns "" when no ASIN is present.
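The behavior described above can be approximated with two regular expressions. This is a sketch under stated assumptions: it recognizes a bare 10-character ASIN and the /dp/ and /gp/product/ path forms only, and extractASIN is a hypothetical stand-in, not the package function.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed shapes: a bare 10-character ASIN, or one that follows /dp/ or
// /gp/product/ in a URL path. Real product URLs may use other forms.
var (
	bareASIN = regexp.MustCompile(`^[A-Z0-9]{10}$`)
	pathASIN = regexp.MustCompile(`/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?#]|$)`)
)

// extractASIN returns the ASIN found in s, or "" when none is present.
func extractASIN(s string) string {
	if bareASIN.MatchString(s) {
		return s
	}
	if m := pathASIN.FindStringSubmatch(s); m != nil {
		return m[1]
	}
	return ""
}

func main() {
	fmt.Println(extractASIN("https://www.amazon.com/Some-Title/dp/B08N5WRWNW/ref=sr_1_1"))
	fmt.Println(extractASIN("B08N5WRWNW"))
	fmt.Println(extractASIN("not an asin") == "")
}
```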
func ParsePrice ¶
ParsePrice extracts a numeric price and a best-effort currency code from a display string like "$1,299.00" or "1.299,00 €".
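The two display strings above differ in which character is the decimal separator. A minimal sketch of one way to handle both treats the last '.' or ',' as the decimal point when one or two digits follow it, and treats every other separator as grouping. The parsePrice helper and its symbol table are assumptions; in particular, mapping "$" to USD is a guess that is wrong on non-US marketplaces.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// symbols is a small, illustrative symbol-to-code table (an assumption).
var symbols = []struct{ sym, code string }{
	{"$", "USD"}, {"€", "EUR"}, {"£", "GBP"},
}

// parsePrice returns the numeric value and a best-effort currency code.
func parsePrice(s string) (float64, string) {
	currency := ""
	for _, p := range symbols {
		if strings.Contains(s, p.sym) {
			currency = p.code
			break
		}
	}
	// Keep only digits and the two separator characters.
	var b strings.Builder
	for _, r := range s {
		if (r >= '0' && r <= '9') || r == '.' || r == ',' {
			b.WriteRune(r)
		}
	}
	num := b.String()
	// The last separator is the decimal point when 1 or 2 digits follow it.
	intPart, frac := num, ""
	if i := strings.LastIndexAny(num, ".,"); i >= 0 {
		if n := len(num) - i - 1; n >= 1 && n <= 2 {
			intPart, frac = num[:i], num[i+1:]
		}
	}
	// Any separators left in the integer part are grouping marks.
	text := strings.NewReplacer(".", "", ",", "").Replace(intPart)
	if frac != "" {
		text += "." + frac
	}
	v, err := strconv.ParseFloat(text, 64)
	if err != nil {
		return 0, currency
	}
	return v, currency
}

func main() {
	fmt.Println(parsePrice("$1,299.00"))
	fmt.Println(parsePrice("1.299,00 €"))
}
```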
Types ¶
type Author ¶
type Author struct {
Slug string `json:"slug"`
Name string `json:"name"`
Bio string `json:"bio,omitempty"`
PhotoURL string `json:"photo_url,omitempty"`
Website string `json:"website,omitempty"`
BookASINs []string `json:"book_asins,omitempty"`
FollowerCount int `json:"follower_count,omitempty"`
URL string `json:"url"`
FetchedAt time.Time `json:"fetched_at"`
}
Author is an Author Central page.
type BestsellerEntry ¶
type BestsellerEntry struct {
ListType string `json:"list_type"`
Category string `json:"category,omitempty"`
NodeID string `json:"node_id,omitempty"`
Rank int `json:"rank"`
ASIN string `json:"asin"`
Title string `json:"title"`
Price float64 `json:"price"`
Currency string `json:"currency,omitempty"`
Rating float64 `json:"rating,omitempty"`
RatingsCount int64 `json:"ratings_count,omitempty"`
URL string `json:"url"`
FetchedAt time.Time `json:"fetched_at"`
}
BestsellerEntry is one ranked item in a chart.
type Brand ¶
type Brand struct {
Slug string `json:"slug"`
Name string `json:"name"`
Description string `json:"description,omitempty"`
LogoURL string `json:"logo_url,omitempty"`
BannerURL string `json:"banner_url,omitempty"`
FollowerCount int `json:"follower_count,omitempty"`
FeaturedASINs []string `json:"featured_asins,omitempty"`
URL string `json:"url"`
FetchedAt time.Time `json:"fetched_at"`
}
Brand is a brand storefront.
type Cache ¶
type Cache struct {
// contains filtered or unexported fields
}
Cache is a tiny on-disk page cache keyed by a hash of the URL.
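Keying by a hash of the URL gives every page a fixed-length, filesystem-safe file name. The sketch below shows the idea with SHA-256 and an .html suffix; both choices, and the cachePath name, are assumptions, since the cache's fields and hash function are unexported.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// cachePath maps a URL to a file under dir. SHA-256 and the .html suffix
// are assumptions of this sketch; the real cache may differ on both.
func cachePath(dir, rawURL string) string {
	sum := sha256.Sum256([]byte(rawURL))
	return filepath.Join(dir, hex.EncodeToString(sum[:])+".html")
}

func main() {
	// The same URL always maps to the same file; different URLs do not collide
	// in practice.
	fmt.Println(cachePath("cache", "https://www.amazon.com/dp/B08N5WRWNW"))
}
```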
type Card ¶
type Card struct {
Position int `json:"position,omitempty"`
Rank int `json:"rank,omitempty"`
ASIN string `json:"asin"`
Title string `json:"title"`
Price float64 `json:"price"`
ListPrice float64 `json:"list_price,omitempty"`
Currency string `json:"currency,omitempty"`
Rating float64 `json:"rating,omitempty"`
RatingsCount int64 `json:"ratings_count,omitempty"`
Image string `json:"image,omitempty"`
Badge string `json:"badge,omitempty"`
Prime bool `json:"prime,omitempty"`
BoughtPastMonth string `json:"bought_past_month,omitempty"`
Sponsored bool `json:"sponsored,omitempty"`
Kind string `json:"kind,omitempty"`
URL string `json:"url"`
}
Card is a lightweight hit from a search page, chart, or recommendation rail.
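The struct tags decide which fields appear in the JSON output. Fields tagged omitempty vanish at their zero value, while asin, title, price, and url are always written, so a missing price encodes as 0 and not as an absent key. The sketch below copies a subset of the fields into a local card type to show this.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// card mirrors a subset of Card's fields and tags for illustration.
type card struct {
	ASIN      string  `json:"asin"`
	Title     string  `json:"title"`
	Price     float64 `json:"price"`
	ListPrice float64 `json:"list_price,omitempty"`
	URL       string  `json:"url"`
}

// encode marshals a card, returning "" if encoding fails.
func encode(c card) string {
	b, err := json.Marshal(c)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// list_price is omitted at zero; price and url are kept.
	fmt.Println(encode(card{ASIN: "B000000000", Title: "Example"}))
}
```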
type Category ¶
type Category struct {
NodeID string `json:"node_id"`
Name string `json:"name"`
ParentNodeID string `json:"parent_node_id,omitempty"`
Breadcrumb []string `json:"breadcrumb,omitempty"`
ChildNodeIDs []string `json:"child_node_ids,omitempty"`
TopASINs []string `json:"top_asins,omitempty"`
URL string `json:"url"`
FetchedAt time.Time `json:"fetched_at"`
}
Category is a browse node.
type Client ¶
type Client struct {
// contains filtered or unexported fields
}
Client is a polite, block-aware HTTP client for one marketplace.
func (*Client) CategoryURL ¶
CategoryURL builds the browse-node URL.
func (*Client) FetchAuthor ¶
FetchAuthor fetches and normalizes an Author Central page.
func (*Client) FetchBrand ¶
FetchBrand fetches and normalizes a brand storefront.
func (*Client) FetchCategory ¶
FetchCategory fetches and normalizes a browse-node page.
func (*Client) FetchChart ¶
func (c *Client) FetchChart(ctx context.Context, kind ChartKind, category, node string, limit int, emit func(BestsellerEntry) error) error
FetchChart streams ranked entries from a chart, paging until a page is empty (or the limit is reached). Ranks are offset by page so page two continues the numbering even when amazon drops the rank badges.
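The rank offset can be pictured as a running count of entries seen on earlier pages. In the sketch below, an entry with no parsed rank (0) receives offset plus its position on the page. The entry type, assignRanks, and the convention that a limit of 0 means "no limit" are assumptions of the sketch.

```go
package main

import "fmt"

// entry is a trimmed stand-in for a chart row; Rank 0 means the page
// carried no rank badge for it.
type entry struct {
	Rank int
	ASIN string
}

// assignRanks walks pages in order, stops at the first empty page or once
// limit entries were produced (limit <= 0 means no limit, an assumption),
// and fills in missing ranks from the running offset.
func assignRanks(pages [][]entry, limit int) []entry {
	var out []entry
	offset := 0
	for _, page := range pages {
		if len(page) == 0 {
			break // an empty page ends the chart
		}
		for i, e := range page {
			if limit > 0 && len(out) >= limit {
				return out
			}
			if e.Rank == 0 {
				e.Rank = offset + i + 1
			}
			out = append(out, e)
		}
		offset += len(page)
	}
	return out
}

func main() {
	pages := [][]entry{
		{{1, "A"}, {2, "B"}},
		{{0, "C"}, {0, "D"}}, // badges dropped on page two
	}
	for _, e := range assignRanks(pages, 0) {
		fmt.Println(e.Rank, e.ASIN)
	}
}
```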
func (*Client) FetchDeals ¶
FetchDeals streams entries from the deals grid.
func (*Client) FetchOffers ¶
func (c *Client) FetchOffers(ctx context.Context, asin string, q OfferQuery, emit func(Offer) error) error
FetchOffers streams buying options for an ASIN.
func (*Client) FetchProduct ¶
FetchProduct fetches and normalizes one product detail page.
func (*Client) FetchRelated ¶
func (c *Client) FetchRelated(ctx context.Context, asin string, limit int, emit func(Card) error) error
FetchRelated streams recommendation cards (similar items, "frequently bought together", sponsored rails) found on a product detail page.
func (*Client) FetchReviews ¶
func (c *Client) FetchReviews(ctx context.Context, asin string, q ReviewQuery, emit func(Review) error) error
FetchReviews streams reviews for an ASIN, paging until q.Limit is reached.
func (*Client) FetchSeller ¶
FetchSeller fetches and normalizes a seller profile.
func (*Client) Get ¶
Get fetches a URL and returns its body, using the cache when allowed and detecting the bot wall. It retries transient failures (HTTP 429 and 5xx responses, including 503) with backoff.
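A retry loop of this kind might look like the sketch below. It assumes exponential backoff (the delay doubles after each retryable status) and injects the fetch step as a function so that no network is needed. getWithRetry, retryable, and simulate are hypothetical names, and the real backoff schedule is not documented here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryable reports whether an HTTP status is treated as transient.
func retryable(status int) bool {
	return status == 429 || (status >= 500 && status <= 599)
}

// getWithRetry calls fetch up to 1+retries times. After each retryable
// status it waits, then doubles the delay (doubling is an assumption).
func getWithRetry(ctx context.Context, retries int, base time.Duration, fetch func() (int, error)) (int, error) {
	delay := base
	for attempt := 0; ; attempt++ {
		status, err := fetch()
		if err != nil {
			return status, err
		}
		if !retryable(status) {
			return status, nil
		}
		if attempt >= retries {
			return status, errors.New("giving up after retries")
		}
		select {
		case <-ctx.Done():
			return status, ctx.Err()
		case <-time.After(delay):
		}
		delay *= 2
	}
}

// simulate replays a scripted list of statuses (repeating the last one)
// and reports the final status and how many calls were made.
func simulate(statuses []int, retries int) (final, calls int, err error) {
	fetch := func() (int, error) {
		i := calls
		if i >= len(statuses) {
			i = len(statuses) - 1
		}
		calls++
		return statuses[i], nil
	}
	final, err = getWithRetry(context.Background(), retries, time.Millisecond, fetch)
	return final, calls, err
}

func main() {
	fmt.Println(simulate([]int{503, 429, 200}, 3))
}
```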
func (*Client) Marketplace ¶
func (c *Client) Marketplace() Marketplace
Marketplace returns the client's marketplace.
func (*Client) ProductURL ¶
ProductURL builds the canonical detail URL for an ASIN in this marketplace.
func (*Client) ResolveProductURL ¶
ResolveProductURL turns an ASIN or any amazon URL into a canonical detail URL and returns it together with the extracted ASIN.
func (*Client) ReviewURL ¶
func (c *Client) ReviewURL(asin string, q ReviewQuery, page int) string
ReviewURL builds the product-reviews URL.
func (*Client) Search ¶
func (c *Client) Search(ctx context.Context, query string, q SearchQuery, emit func(Card) error) error
Search streams result cards for a query, paging until Limit is reached.
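The streaming fetchers share one shape: results are handed to an emit callback one at a time, so nothing is buffered. The sketch below models that shape over in-memory pages. It assumes that an error returned by emit stops the stream and is passed back to the caller, which the documentation does not state. The names stream, collect, and errStop are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

type card struct{ ASIN string }

// errStop is a hypothetical sentinel a caller can return from emit to end
// the stream early.
var errStop = errors.New("stop")

// stream walks pages in order and hands each card to emit. It ends at the
// first empty page, after limit cards (limit <= 0 means no limit), or when
// emit returns an error, which is returned unchanged (an assumption).
func stream(pages [][]card, limit int, emit func(card) error) error {
	sent := 0
	for _, page := range pages {
		if len(page) == 0 {
			return nil
		}
		for _, c := range page {
			if limit > 0 && sent >= limit {
				return nil
			}
			if err := emit(c); err != nil {
				return err
			}
			sent++
		}
	}
	return nil
}

// collect gathers up to limit ASINs into a slice.
func collect(pages [][]card, limit int) []string {
	var out []string
	_ = stream(pages, limit, func(c card) error {
		out = append(out, c.ASIN)
		return nil
	})
	return out
}

func main() {
	pages := [][]card{{{"A"}, {"B"}}, {{"C"}, {"D"}}}
	fmt.Println(collect(pages, 3))
}
```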
func (*Client) SearchURL ¶
func (c *Client) SearchURL(query string, q SearchQuery, page int) string
SearchURL builds the /s URL for a query and page.
func (*Client) SetBaseURL ¶
SetBaseURL overrides the marketplace origin. It exists so the fetchers can be pointed at a local fixture server or an outbound proxy; production code leaves it unset and uses the marketplace host.
type Config ¶
type Config struct {
Marketplace string
Cookies string
UseAPI bool
Workers int
Delay time.Duration
Retries int
Timeout time.Duration
DataDir string
CacheDir string
DBPath string
NoCache bool
Refresh bool
// PA-API credentials (opt-in path).
PAAPIAccessKey string
PAAPISecretKey string
PAAPIPartnerTag string
PAAPIHost string
PAAPIRegion string
}
Config carries the resolved settings for a run.
func DefaultConfig ¶
func DefaultConfig() Config
DefaultConfig returns the built-in defaults with XDG-resolved paths.
type Deal ¶
type Deal struct {
ASIN string `json:"asin"`
Title string `json:"title"`
DealPrice float64 `json:"deal_price"`
ListPrice float64 `json:"list_price,omitempty"`
DiscountPct int `json:"discount_pct,omitempty"`
Badge string `json:"badge,omitempty"`
Currency string `json:"currency,omitempty"`
URL string `json:"url"`
FetchedAt time.Time `json:"fetched_at"`
}
Deal is one entry from the deals grid.
type Marketplace ¶
Marketplace is one regional amazon storefront.
func LookupMarketplace ¶
func LookupMarketplace(slug string) (Marketplace, bool)
LookupMarketplace returns the marketplace for a slug, defaulting to US for an unknown or empty slug. The second return reports whether the slug was known.
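The default-plus-flag contract lets a caller proceed with the US storefront while still warning about an unrecognized slug. A minimal sketch follows. The marketplace fields, the registry contents, and the slug spellings are all hypothetical, since the Marketplace type's fields are not shown on this page.

```go
package main

import "fmt"

// marketplace is a hypothetical stand-in; the real type's fields are not
// documented here.
type marketplace struct {
	Slug string
	Host string
}

// registry holds a few illustrative entries (slug spellings are assumed).
var registry = map[string]marketplace{
	"us": {"us", "www.amazon.com"},
	"uk": {"uk", "www.amazon.co.uk"},
	"de": {"de", "www.amazon.de"},
}

// lookup returns the entry for slug, or the US entry and false when the
// slug is unknown or empty.
func lookup(slug string) (marketplace, bool) {
	if m, ok := registry[slug]; ok {
		return m, true
	}
	return registry["us"], false
}

// baseURL is the https origin for the marketplace.
func (m marketplace) baseURL() string { return "https://" + m.Host }

func main() {
	m, known := lookup("xx")
	if !known {
		fmt.Println("unknown marketplace, falling back to", m.Slug)
	}
	fmt.Println(m.baseURL())
}
```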
func Marketplaces ¶
func Marketplaces() []Marketplace
Marketplaces returns every registered marketplace in a stable-ish order.
func (Marketplace) BaseURL ¶
func (m Marketplace) BaseURL() string
BaseURL is the https origin for the marketplace.
type Offer ¶
type Offer struct {
ASIN string `json:"asin"`
Price float64 `json:"price"`
Currency string `json:"currency"`
Shipping string `json:"shipping,omitempty"`
Condition string `json:"condition"`
SellerName string `json:"seller_name"`
SellerID string `json:"seller_id,omitempty"`
SellerRating string `json:"seller_rating,omitempty"`
FulfilledBy string `json:"fulfilled_by,omitempty"`
Delivery string `json:"delivery,omitempty"`
IsBuyBox bool `json:"is_buybox,omitempty"`
URL string `json:"url"`
FetchedAt time.Time `json:"fetched_at"`
}
Offer is one buying option from the offer-listing page.
type OfferQuery ¶
OfferQuery filters the offer-listing.
type PAClient ¶
type PAClient struct {
// contains filtered or unexported fields
}
PAClient talks to the official Product Advertising API 5.0. It signs requests with SigV4 using only the standard library (no AWS SDK dependency).
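SigV4 needs nothing beyond HMAC-SHA256, which is why the standard library suffices. The sketch below shows the key-derivation chain and the final signature step only; building the canonical request and the string to sign is left out. The helper names are hypothetical, and the service name "ProductAdvertisingAPI" and the sample inputs are assumptions.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hmacSHA256 returns HMAC-SHA256(key, data).
func hmacSHA256(key []byte, data string) []byte {
	h := hmac.New(sha256.New, key)
	h.Write([]byte(data))
	return h.Sum(nil)
}

// signingKey derives the SigV4 signing key: each step keys the next HMAC
// with the previous result, scoping the key to a day, region, and service.
func signingKey(secret, date, region, service string) []byte {
	kDate := hmacSHA256([]byte("AWS4"+secret), date)
	kRegion := hmacSHA256(kDate, region)
	kService := hmacSHA256(kRegion, service)
	return hmacSHA256(kService, "aws4_request")
}

// sign returns the hex signature for an already-built string to sign.
func sign(key []byte, stringToSign string) string {
	return hex.EncodeToString(hmacSHA256(key, stringToSign))
}

func main() {
	// All inputs are placeholders, not real credentials.
	key := signingKey("example-secret", "20240101", "us-east-1", "ProductAdvertisingAPI")
	fmt.Println(len(key), len(sign(key, "example string to sign")))
}
```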
func NewPAClient ¶
NewPAClient builds a PA-API client from config, or returns ErrNoPACreds.
type Product ¶
type Product struct {
ASIN string `json:"asin"`
Title string `json:"title"`
Brand string `json:"brand"`
BrandID string `json:"brand_id,omitempty"`
Price float64 `json:"price"`
Currency string `json:"currency"`
ListPrice float64 `json:"list_price,omitempty"`
Savings float64 `json:"savings,omitempty"`
SavingsPct int `json:"savings_pct,omitempty"`
Coupon string `json:"coupon,omitempty"`
Rating float64 `json:"rating"`
RatingsCount int64 `json:"ratings_count"`
ReviewsCount int64 `json:"reviews_count,omitempty"`
AnsweredQs int `json:"answered_qs,omitempty"`
BoughtPastMonth string `json:"bought_past_month,omitempty"`
Availability string `json:"availability"`
InStock bool `json:"in_stock"`
Description string `json:"description,omitempty"`
BulletPoints []string `json:"bullet_points,omitempty"`
Specs map[string]string `json:"specs,omitempty"`
Images []string `json:"images,omitempty"`
Videos []string `json:"videos,omitempty"`
CategoryPath []string `json:"category_path,omitempty"`
BrowseNodeIDs []string `json:"browse_node_ids,omitempty"`
SellerID string `json:"seller_id,omitempty"`
SellerName string `json:"seller_name,omitempty"`
SoldBy string `json:"sold_by,omitempty"`
ShipsFrom string `json:"ships_from,omitempty"`
FulfilledBy string `json:"fulfilled_by,omitempty"`
VariantASINs []string `json:"variant_asins,omitempty"`
ParentASIN string `json:"parent_asin,omitempty"`
SimilarASINs []string `json:"similar_asins,omitempty"`
Rank int `json:"rank,omitempty"`
RankCategory string `json:"rank_category,omitempty"`
Ranks []ProductRank `json:"ranks,omitempty"`
Marketplace string `json:"marketplace"`
URL string `json:"url"`
FetchedAt time.Time `json:"fetched_at"`
}
Product is a normalized amazon.com product detail page.
type ProductRank ¶
ProductRank is one Best Sellers Rank line: a position within a named category. A product is usually ranked once overall and again in one or more subcategories.
type QA ¶
type QA struct {
QAID string `json:"qa_id"`
ASIN string `json:"asin"`
Question string `json:"question"`
QuestionBy string `json:"question_by,omitempty"`
Answer string `json:"answer"`
AnswerBy string `json:"answer_by,omitempty"`
HelpfulVotes int `json:"helpful_votes,omitempty"`
URL string `json:"url"`
FetchedAt time.Time `json:"fetched_at"`
}
QA is a question-and-answer pair.
type QueueItem ¶
type QueueItem struct {
ID int64 `json:"id"`
URL string `json:"url"`
Entity string `json:"entity"`
Priority int `json:"priority"`
Status string `json:"status"`
}
QueueItem is a row from the crawl queue.
type Review ¶
type Review struct {
ReviewID string `json:"review_id"`
ASIN string `json:"asin"`
ReviewerID string `json:"reviewer_id,omitempty"`
ReviewerName string `json:"reviewer_name"`
Rating int `json:"rating"`
Title string `json:"title"`
Text string `json:"text"`
Date string `json:"date,omitempty"`
Country string `json:"country,omitempty"`
VerifiedPurchase bool `json:"verified_purchase"`
HelpfulVotes int `json:"helpful_votes"`
Images []string `json:"images,omitempty"`
VariantAttrs map[string]string `json:"variant_attrs,omitempty"`
URL string `json:"url"`
FetchedAt time.Time `json:"fetched_at"`
}
Review is a single product review.
type ReviewQuery ¶
type ReviewQuery struct {
Sort string // recent|helpful
Stars int // 1..5, 0 = all
Verified bool
WithImages bool
StartPage int
Limit int
}
ReviewQuery holds review-page refinements.
type SearchQuery ¶
type SearchQuery struct {
Sort string // relevance|price-asc|price-desc|review|newest
MinPrice int
MaxPrice int
MinRating int
Prime bool
Brand string
Department string
StartPage int
Limit int
}
SearchQuery holds the refinements for a catalog search.
type Seller ¶
type Seller struct {
SellerID string `json:"seller_id"`
Name string `json:"name"`
Rating string `json:"rating,omitempty"`
RatingCount int `json:"rating_count,omitempty"`
PositivePct float64 `json:"positive_pct,omitempty"`
NeutralPct float64 `json:"neutral_pct,omitempty"`
NegativePct float64 `json:"negative_pct,omitempty"`
URL string `json:"url"`
FetchedAt time.Time `json:"fetched_at"`
}
Seller is a third-party seller profile.
type Store ¶
type Store struct {
// contains filtered or unexported fields
}
Store is an optional DuckDB-backed sink. It shells out to the `duckdb` binary so the build never depends on cgo. A missing binary yields ErrNoDuckDB.
DuckDB takes an exclusive lock on the database file, so two `duckdb` processes cannot write it at once. A crawl fetches with many workers but funnels every write through mu, so concurrency stays on the network where it pays off and the single-writer database never sees a lock conflict.
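The single-writer funnel described above can be sketched with a worker pool whose results all pass through one mutex-guarded method. The sink type stands in for the store and appends to a slice instead of invoking `duckdb`; sink, crawl, and run are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// sink stands in for the single-writer database: mu serializes every write.
type sink struct {
	mu   sync.Mutex
	rows []string
}

// put records one row while holding the lock, so writes never overlap.
func (s *sink) put(row string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rows = append(s.rows, row)
}

// crawl fans urls out to workers goroutines (workers must be at least 1).
// The slow network step would run in parallel; only put is serialized.
func crawl(urls []string, workers int, s *sink) {
	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				s.put("fetched " + u)
			}
		}()
	}
	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
}

// run crawls n placeholder URLs and returns how many rows were written.
func run(n, workers int) int {
	urls := make([]string, n)
	for i := range urls {
		urls[i] = fmt.Sprintf("u%d", i)
	}
	s := &sink{}
	crawl(urls, workers, s)
	return len(s.rows)
}

func main() {
	fmt.Println(run(50, 4))
}
```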
func (*Store) MarkStatus ¶
MarkStatus updates the status of one queue item.
func (*Store) PendingCount ¶
PendingCount returns the number of pending queue items.
func (*Store) PutBestseller ¶
func (s *Store) PutBestseller(ctx context.Context, e BestsellerEntry) error
PutBestseller appends a chart entry.
func (*Store) PutProduct ¶
PutProduct upserts a product record.