Documentation ¶
Index ¶
- func WithCachePolicy(ctx context.Context, p CachePolicy) context.Context
- func WithCacheTTL(ctx context.Context, ttl time.Duration) context.Context
- type CachePolicy
- type Client
- func (c *Client) Close()
- func (c *Client) Do(ctx context.Context, req *http.Request, cfgs ...DoConfig) (page *Page, err error)
- func (c *Client) Get(ctx context.Context, url string, cfgs ...DoConfig) (*Page, error)
- func (c *Client) GetMany(ctx context.Context, urls []string, concurrency int, cfg DoConfig, ...) error
- func (c *Client) Version(ctx context.Context, key string) (*Page, error)
- func (c *Client) Versions(ctx context.Context, req *http.Request) ([]PageVersion, error)
- type DoConfig
- type Limiter
- type Option
- func WithBrowser() Option
- func WithCacheStatuses(codes ...int) Option
- func WithChromiumSandbox(enabled bool) Option
- func WithIgnoreHeaders(names ...string) Option
- func WithIgnoreParams(names ...string) Option
- func WithRateLimit(rps int, opts ...ratelimit.Option) Option
- func WithRequestBodyLimit(n int64) Option
- func WithResponseBodyLimit(n int64) Option
- func WithRetry(cfg RetryConfig) Option
- func WithUserAgent(ua string) Option
- type Page
- type PageDiff
- type PageMeta
- type PageRequest
- type PageResponse
- type PageVersion
- type RetryConfig
- type StatusError
- type ThrottledError
- type Transport
- type TransportOption
- func TransportWithCacheStatuses(codes ...int) TransportOption
- func TransportWithIgnoreHeaders(names ...string) TransportOption
- func TransportWithIgnoreParams(names ...string) TransportOption
- func TransportWithRateLimit(rps int, opts ...ratelimit.Option) TransportOption
- func TransportWithRequestBodyLimit(n int64) TransportOption
- func TransportWithResponseBodyLimit(n int64) TransportOption
- func TransportWithUserAgent(ua string) TransportOption
- type TransportStats
- type TransportStatsSnapshot
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func WithCachePolicy ¶
func WithCachePolicy(ctx context.Context, p CachePolicy) context.Context
WithCachePolicy returns a context that carries the given cache policy.
func WithCacheTTL ¶ added in v0.2.0
func WithCacheTTL(ctx context.Context, ttl time.Duration) context.Context
WithCacheTTL returns a context that overrides the default cache TTL for writes made with this context. Works with both Client and Transport. Use 0 for no expiry, or a positive duration for a custom TTL.
Types ¶
type CachePolicy ¶
type CachePolicy int
CachePolicy controls per-request caching behavior.
const (
	// CachePolicyDefault reads from cache on hit, writes on miss (status 200).
	CachePolicyDefault CachePolicy = iota
	// CachePolicyReplace skips cache read but still writes on status 200.
	CachePolicyReplace
	// CachePolicySkip bypasses cache entirely (no read, no write).
	CachePolicySkip
)
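The three policies amount to a small read/write decision table. The sketch below restates the constant comments as code. `cacheActions` is a hypothetical helper, and it assumes the default rule that only status 200 is written.

```go
package main

import "fmt"

// CachePolicy is redeclared here only so the sketch is self-contained.
type CachePolicy int

const (
	CachePolicyDefault CachePolicy = iota
	CachePolicyReplace
	CachePolicySkip
)

// cacheActions is a hypothetical helper. read reports whether the cache is
// consulted before fetching; write reports whether a freshly fetched response
// with the given status is stored (assuming only 200 is cacheable).
func cacheActions(p CachePolicy, status int) (read, write bool) {
	switch p {
	case CachePolicySkip:
		return false, false
	case CachePolicyReplace:
		return false, status == 200
	default:
		return true, status == 200
	}
}

func main() {
	for _, p := range []CachePolicy{CachePolicyDefault, CachePolicyReplace, CachePolicySkip} {
		read, write := cacheActions(p, 200)
		fmt.Printf("policy=%d read=%t write=%t\n", p, read, write)
	}
}
```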
type Client ¶
type Client struct {
// contains filtered or unexported fields
}
Client fetches web pages with automatic caching.
func (*Client) Close ¶
func (c *Client) Close()
Close shuts down the browser (if started) and releases resources.
func (*Client) Do ¶
func (c *Client) Do(
	ctx context.Context,
	req *http.Request,
	cfgs ...DoConfig,
) (page *Page, err error)
Do fetches the given request, returning a cached result if available. Pass a DoConfig to control caching, browser mode, and rate limiting.
func (*Client) GetMany ¶
func (c *Client) GetMany(
	ctx context.Context,
	urls []string,
	concurrency int,
	cfg DoConfig,
	fn func(url string, page *Page, err error) error,
) error
GetMany fetches multiple URLs concurrently with the given concurrency limit. The callback fn is called for each completed fetch (in arbitrary order). Stops early if ctx is cancelled. Returns the first non-nil error from fn, or nil if all callbacks return nil.
type DoConfig ¶
type DoConfig struct {
// Replace skips cache read, forcing a fresh fetch (still caches the result).
Replace bool
// Browser uses headless Chromium instead of plain HTTP.
Browser bool
// Archive stores a timestamped snapshot alongside the latest cache entry.
// Use Client.Versions to list snapshots and detect changes over time.
Archive bool
// SilentThrottle detects and retries when a site silently serves
// captcha/block pages matching this regexp.
SilentThrottle *regexp.Regexp
// Limiter applies a per-request rate limiter instead of the client default.
Limiter Limiter
}
DoConfig controls per-request behavior for Client.Do and Client.Get.
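As an illustration of the SilentThrottle field, the sketch below shows how a regexp can flag a block page that arrives with an otherwise normal response. The pattern and the `looksThrottled` helper are hypothetical, and matching against the response body is an assumption about how the field is applied.

```go
package main

import (
	"fmt"
	"regexp"
)

// blockPage is a hypothetical pattern for captcha or block pages.
var blockPage = regexp.MustCompile(`(?i)captcha|unusual traffic|access denied`)

// looksThrottled reports whether body matches re. A nil pattern disables
// the check, mirroring a zero-value DoConfig.
func looksThrottled(re *regexp.Regexp, body []byte) bool {
	return re != nil && re.Match(body)
}

func main() {
	fmt.Println(looksThrottled(blockPage, []byte("<h1>Please solve this CAPTCHA</h1>")))
	fmt.Println(looksThrottled(blockPage, []byte("<h1>Product listing</h1>")))
	fmt.Println(looksThrottled(nil, []byte("captcha")))
}
```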
type Option ¶
type Option func(*Client)
Option configures a Client at construction time.
func WithBrowser ¶
func WithBrowser() Option
WithBrowser configures the client to always use a headless browser.
func WithCacheStatuses ¶ added in v0.2.0
func WithCacheStatuses(codes ...int) Option
WithCacheStatuses sets which HTTP status codes are eligible for caching. By default only 200 is cached. Use this to also cache redirects, 404s, etc.
func WithChromiumSandbox ¶ added in v0.2.0
func WithChromiumSandbox(enabled bool) Option
WithChromiumSandbox controls whether the headless Chromium browser runs with OS-level sandboxing. Defaults to true. Set to false in environments where sandboxing is unsupported (e.g. CI containers without suid sandbox).
func WithIgnoreHeaders ¶ added in v0.2.0
func WithIgnoreHeaders(names ...string) Option
WithIgnoreHeaders excludes the named headers from cache key computation. Useful for scraping where User-Agent or Accept-Encoding vary between requests but should map to the same cache entry.
func WithIgnoreParams ¶ added in v0.2.0
func WithIgnoreParams(names ...string) Option
WithIgnoreParams excludes the named query parameters from cache key computation. Useful for stripping auth tokens, timestamps, or tracking params (utm_source, etc.) that vary between requests to the same resource.
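To make the effect on cache keys concrete, here is a minimal sketch of a key function that drops ignored query parameters before hashing. The `cacheKey` helper and the choice of SHA-256 over the normalized URL are assumptions for illustration; the package's real key format is not documented here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
)

// cacheKey is a hypothetical key function. It removes the ignored query
// parameters, re-encodes the query, and hashes the resulting URL.
func cacheKey(rawURL string, ignore ...string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for _, name := range ignore {
		q.Del(name)
	}
	// Values.Encode sorts by key, so parameter order does not affect the key.
	u.RawQuery = q.Encode()
	sum := sha256.Sum256([]byte(u.String()))
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a, _ := cacheKey("https://example.com/item?id=1&utm_source=mail", "utm_source")
	b, _ := cacheKey("https://example.com/item?utm_source=ads&id=1", "utm_source")
	fmt.Println(a == b)
}
```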
func WithRateLimit ¶
func WithRateLimit(rps int, opts ...ratelimit.Option) Option
WithRateLimit sets a programmatic rate limit, overriding the env var.
func WithRequestBodyLimit ¶ added in v0.2.0
func WithRequestBodyLimit(n int64) Option
WithRequestBodyLimit sets the maximum request body size used for cache key computation. 0 means no limit. Default: 10 MB.
func WithResponseBodyLimit ¶ added in v0.2.0
func WithResponseBodyLimit(n int64) Option
WithResponseBodyLimit sets the maximum response body size to read and cache. 0 means no limit. Default: 100 MB.
func WithRetry ¶ added in v0.2.0
func WithRetry(cfg RetryConfig) Option
WithRetry configures retry behavior. Zero-value fields keep defaults.
func WithUserAgent ¶ added in v0.2.0
func WithUserAgent(ua string) Option
WithUserAgent sets a default User-Agent header on all HTTP requests. The header is added before each request if not already set by the caller.
type Page ¶
type Page struct {
Meta PageMeta `json:"meta"`
Request PageRequest `json:"request"`
Response PageResponse `json:"response"`
}
Page is a cached HTTP request/response pair with metadata.
func (*Page) HTTPResponse ¶
HTTPResponse reconstructs a standard *http.Response from the cached page. The returned response has its own copy of the header map, so callers can modify it without affecting the cached page (safe for concurrent use).
type PageDiff ¶
type PageDiff struct {
Changed bool
OldSize int
NewSize int
OldFetched time.Time
NewFetched time.Time
}
PageDiff describes the difference between two page snapshots.
type PageMeta ¶
type PageMeta struct {
Version uint16 `json:"version"`
Source string `json:"-"`
FetchedAt time.Time `json:"fetched_at"`
FetchDur time.Duration `json:"fetch_dur"`
}
PageMeta contains cache metadata for a fetched page.
type PageRequest ¶
type PageRequest struct {
URL string `json:"url"`
RedirectedURL string `json:"redirected_url,omitempty"`
Method string `json:"method"`
Header http.Header `json:"header,omitempty"`
Body []byte `json:"body,omitempty"`
}
PageRequest stores the original HTTP request details.
type PageResponse ¶
type PageResponse struct {
StatusCode int `json:"status_code"`
ProtoMajor int `json:"proto_major"`
ProtoMinor int `json:"proto_minor"`
TransferEncoding []string `json:"transfer_encoding,omitempty"`
ContentLength int64 `json:"content_length"`
Header http.Header `json:"header"`
Body []byte `json:"body"`
Trailer http.Header `json:"trailer,omitempty"`
}
PageResponse stores the HTTP response details including the body.
type PageVersion ¶
type PageVersion struct {
Key string // Cache key for this snapshot.
FetchedAt time.Time // When this snapshot was fetched.
BodyHash string // SHA-256 hex digest of the response body.
}
PageVersion describes a single archived snapshot of a cached page.
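Since BodyHash is a SHA-256 hex digest of the response body, change detection across snapshots reduces to comparing strings. The sketch below uses a trimmed local `pageVersion` type; `bodyHash` and `countChanges` are hypothetical helpers.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// pageVersion is a trimmed, local stand-in for PageVersion.
type pageVersion struct {
	Key      string
	BodyHash string
}

// bodyHash returns the SHA-256 hex digest of body.
func bodyHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

// countChanges counts how many adjacent snapshots differ in body hash.
// It assumes vs is ordered by fetch time.
func countChanges(vs []pageVersion) int {
	n := 0
	for i := 1; i < len(vs); i++ {
		if vs[i].BodyHash != vs[i-1].BodyHash {
			n++
		}
	}
	return n
}

func main() {
	vs := []pageVersion{
		{Key: "v1", BodyHash: bodyHash([]byte("price: 10"))},
		{Key: "v2", BodyHash: bodyHash([]byte("price: 10"))},
		{Key: "v3", BodyHash: bodyHash([]byte("price: 12"))},
	}
	fmt.Println(countChanges(vs))
}
```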
type RetryConfig ¶ added in v0.2.0
type RetryConfig struct {
// Attempts is the maximum number of tries (including the first). Default: 5.
Attempts int
// MinWait is the base wait duration for exponential backoff. Default: 1s.
MinWait time.Duration
// MaxWait caps the backoff duration. Default: 1m.
MaxWait time.Duration
// Jitter adds random jitter up to this duration per attempt. Default: 1s.
Jitter time.Duration
}
RetryConfig controls retry behavior for failed HTTP requests.
type StatusError ¶
type StatusError struct {
Page *Page
}
StatusError is returned when the HTTP status is not 200 OK. The Page contains the response and status.
func (*StatusError) Error ¶
func (e *StatusError) Error() string
type ThrottledError ¶
type ThrottledError struct{}
ThrottledError is returned when the fetch is throttled.
func (*ThrottledError) Error ¶
func (e *ThrottledError) Error() string
type Transport ¶
type Transport struct {
// Base is the underlying RoundTripper. Nil means http.DefaultTransport.
Base http.RoundTripper
// contains filtered or unexported fields
}
Transport is an http.RoundTripper that caches responses in a blob.Bucket. Responses are fully buffered (no streaming). By default, only HTTP 200 responses are cached; use TransportWithCacheStatuses to cache other status codes.
Use WithCachePolicy on the request context to control per-request caching.
func NewTransport ¶
func NewTransport(bucket *blob.Bucket, opts ...TransportOption) *Transport
NewTransport creates a caching Transport backed by the given bucket.
func (*Transport) Stats ¶ added in v0.2.0
func (t *Transport) Stats() TransportStatsSnapshot
Stats returns a snapshot of the transport's cache performance counters.
type TransportOption ¶
type TransportOption func(*Transport)
TransportOption configures a Transport.
func TransportWithCacheStatuses ¶ added in v0.2.0
func TransportWithCacheStatuses(codes ...int) TransportOption
TransportWithCacheStatuses sets which HTTP status codes are eligible for caching. By default only 200 is cached.
func TransportWithIgnoreHeaders ¶ added in v0.2.0
func TransportWithIgnoreHeaders(names ...string) TransportOption
TransportWithIgnoreHeaders excludes the named headers from cache key computation. Useful when User-Agent or Accept-Encoding vary between requests but should map to the same cache entry.
func TransportWithIgnoreParams ¶ added in v0.2.0
func TransportWithIgnoreParams(names ...string) TransportOption
TransportWithIgnoreParams excludes the named query parameters from cache key computation. Useful for stripping auth tokens or tracking params.
func TransportWithRateLimit ¶
func TransportWithRateLimit(rps int, opts ...ratelimit.Option) TransportOption
TransportWithRateLimit sets a rate limit on outgoing requests.
func TransportWithRequestBodyLimit ¶
func TransportWithRequestBodyLimit(n int64) TransportOption
TransportWithRequestBodyLimit sets the maximum request body size used for cache key computation. 0 means no limit.
func TransportWithResponseBodyLimit ¶
func TransportWithResponseBodyLimit(n int64) TransportOption
TransportWithResponseBodyLimit sets the maximum response body size to cache. 0 means no limit.
func TransportWithUserAgent ¶ added in v0.2.0
func TransportWithUserAgent(ua string) TransportOption
TransportWithUserAgent sets a default User-Agent header on all requests.
type TransportStats ¶ added in v0.2.0
type TransportStats struct {
// Hits counts cache hits (served from cache without fetch).
Hits atomic.Int64
// Misses counts cache misses (required a fetch).
Misses atomic.Int64
// Revalidated counts conditional requests that returned 304.
Revalidated atomic.Int64
// Coalesced counts requests served by singleflight coalescing.
Coalesced atomic.Int64
}
TransportStats tracks cache performance counters. All fields are safe for concurrent access. Read via Transport.Stats().