Documentation
Overview ¶
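Package cache provides the data store for cached HTTP responses, together with an http.RoundTripper (Transport) that serves responses from it.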
Index ¶
- Constants
- Variables
- func CachedResponse(c Cache, req *http.Request) (resp *http.Response, err error)
- func Date(respHeaders http.Header) (date time.Time, err error)
- func IsFromCache(res *http.Response) bool
- type Cache
- type Cookie
- type Options
- type Policy
- type Transport
- func (t *Transport) Client() *http.Client
- func (t *Transport) RoundTrip(req *http.Request) (resp *http.Response, err error)
- func (t *Transport) RoundTripDummy(req *http.Request) (resp *http.Response, err error)
- func (t *Transport) RoundTripRFC2616(req *http.Request) (resp *http.Response, err error)
- func (t *Transport) SetProxy(proxy func(*http.Request) (*url.URL, error))
Constants ¶
const (
	// XFromCache is the header added to responses that are returned from the cache.
	XFromCache = "X-From-Cache"
	// DefaultPath is the default cache path.
	DefaultPath = "cache"
)
Variables ¶
var ErrNoDateHeader = errors.New("no Date header")
ErrNoDateHeader indicates that the HTTP headers contained no Date header.
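As a rough illustration of how a missing Date header might surface as this error, the sketch below reads the header and parses it with http.ParseTime. The names parseDate and errNoDateHeader are stand-ins invented for this example; the package's own Date function may parse the header differently.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// errNoDateHeader mirrors ErrNoDateHeader for this standalone sketch.
var errNoDateHeader = errors.New("no Date header")

// parseDate is a hypothetical stand-in for Date: it returns the parsed
// Date header, or errNoDateHeader when the header is absent.
func parseDate(h http.Header) (time.Time, error) {
	v := h.Get("Date")
	if v == "" {
		return time.Time{}, errNoDateHeader
	}
	// http.ParseTime accepts the date formats allowed in HTTP headers.
	return http.ParseTime(v)
}

func main() {
	h := http.Header{}
	if _, err := parseDate(h); errors.Is(err, errNoDateHeader) {
		fmt.Println("no Date header; the response age cannot be computed")
	}

	h.Set("Date", "Mon, 02 Jan 2006 15:04:05 GMT")
	d, err := parseDate(h)
	fmt.Println(d.UTC(), err)
}
```

Callers can compare the returned error against the sentinel with errors.Is to tell a missing header apart from a malformed one.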
Functions ¶
func CachedResponse ¶
CachedResponse returns the cached http.Response for req if present, and nil otherwise.
func IsFromCache ¶
IsFromCache returns true if the response was served from the cache.
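One plausible way to implement such a check is to look for the X-From-Cache header described under Constants. The sketch below assumes that any non-empty header value marks a cached response; isFromCache and newResponse are hypothetical helpers, and the value "1" is an assumption rather than something the package documents.

```go
package main

import (
	"fmt"
	"net/http"
)

// xFromCache mirrors the XFromCache constant for this standalone sketch.
const xFromCache = "X-From-Cache"

// isFromCache is a hypothetical stand-in for IsFromCache. It assumes that
// any non-empty X-From-Cache value marks a response served from the cache.
func isFromCache(res *http.Response) bool {
	return res != nil && res.Header.Get(xFromCache) != ""
}

// newResponse builds a bare response for the demo, optionally marked as cached.
func newResponse(cached bool) *http.Response {
	res := &http.Response{Header: http.Header{}}
	if cached {
		res.Header.Set(xFromCache, "1") // the value "1" is an assumption
	}
	return res
}

func main() {
	fmt.Println(isFromCache(newResponse(false)), isFromCache(newResponse(true)))
}
```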
Types ¶
type Cache ¶
type Cache interface {
	Get(key string) ([]byte, bool)
	Set(key string, value []byte)
	SetWithTimeout(key string, value []byte, timeout time.Duration)
	Del(key string)
}
Cache is the interface used to store and retrieve bytes by key.
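To make the contract concrete, here is a minimal in-memory implementation sketch. The type memoryCache and its constructor are hypothetical, and the sketch assumes that the timeout passed to SetWithTimeout is a time-to-live after which Get reports a miss; the package's own stores may behave differently.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Cache is copied from the interface above so the sketch compiles on its own.
type Cache interface {
	Get(key string) ([]byte, bool)
	Set(key string, value []byte)
	SetWithTimeout(key string, value []byte, timeout time.Duration)
	Del(key string)
}

// entry holds a value and an optional expiry; a zero expiry means "never".
type entry struct {
	value   []byte
	expires time.Time
}

// memoryCache is a hypothetical in-memory Cache guarded by a mutex.
type memoryCache struct {
	mu    sync.Mutex
	items map[string]entry
}

func newMemoryCache() *memoryCache {
	return &memoryCache{items: make(map[string]entry)}
}

func (c *memoryCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.items[key]
	if !ok {
		return nil, false
	}
	// Expired entries are dropped lazily on lookup.
	if !e.expires.IsZero() && time.Now().After(e.expires) {
		delete(c.items, key)
		return nil, false
	}
	return e.value, true
}

func (c *memoryCache) Set(key string, value []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = entry{value: value}
}

// SetWithTimeout treats timeout as a time-to-live (an assumption).
func (c *memoryCache) SetWithTimeout(key string, value []byte, timeout time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = entry{value: value, expires: time.Now().Add(timeout)}
}

func (c *memoryCache) Del(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.items, key)
}

func main() {
	var c Cache = newMemoryCache()
	c.Set("a", []byte("1"))
	c.SetWithTimeout("b", []byte("2"), 10*time.Millisecond)
	time.Sleep(20 * time.Millisecond)

	_, okA := c.Get("a")
	_, okB := c.Get("b")
	fmt.Println(okA, okB)
}
```

Expiring lazily in Get keeps the sketch short; a store configured with something like ExpireCleanInterval would presumably also sweep expired entries in the background.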
type Cookie ¶
type Cookie interface {
	http.CookieJar
	// SetCookieString handles the receipt of the cookies string in a reply for the given URL.
	SetCookieString(u *url.URL, cookies string)
	// CookieString returns the cookies string for the given URL.
	CookieString(u *url.URL) string
	// DeleteCookie deletes the cookies for the given URL.
	DeleteCookie(u *url.URL)
}
Cookie manages storage and use of cookies in HTTP requests. Implementations of Cookie must be safe for concurrent use by multiple goroutines.
type Options ¶
type Options struct {
	Path                string        `yaml:"path"`
	ExpireCleanInterval time.Duration `yaml:"expire-clean-interval"`
}
Options holds the cache configuration.
type Policy ¶
type Policy string
Policy identifies the caching policy. Of the two policies defined here, only Dummy has no awareness of HTTP Cache-Control directives; RFC2616 honors them.
const (
	// Dummy policy is useful for testing spiders faster (without having to wait for downloads every time)
	// and for trying your spider offline, when an Internet connection is not available.
	// The goal is to be able to “replay” a spider run exactly as it ran before.
	Dummy Policy = "dummy"
	// RFC2616 policy provides an RFC 2616 compliant HTTP cache, i.e. with HTTP Cache-Control awareness,
	// aimed at production and used in continuous runs to avoid downloading unmodified data
	// (to save bandwidth and speed up crawls).
	RFC2616 Policy = "rfc2616"
)
type Transport ¶
type Transport struct {
	Policy Policy
	// The RoundTripper interface actually used to make requests
	// If nil, http.DefaultTransport is used
	Transport http.RoundTripper
	Cache     Cache
	// If true, responses returned from the cache will be given an extra header, X-From-Cache
	MarkCachedResponses bool
}
Transport is an implementation of http.RoundTripper that returns values from a cache where possible (avoiding a network request). It also adds validators (ETag / If-Modified-Since) to repeated requests, allowing servers to respond with 304 Not Modified.
func NewTransport ¶
NewTransport returns a new Transport with the provided Cache implementation and MarkCachedResponses set to true.
func (*Transport) RoundTrip ¶
RoundTrip is a wrapper for caching requests. If there is a fresh Response already in cache, then it will be returned without connecting to the server.
func (*Transport) RoundTripDummy ¶
RoundTripDummy has no awareness of any HTTP Cache-Control directives. Every request and its corresponding response are cached. When the same request is seen again, the response is returned without transferring anything from the Internet.
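The replay behaviour described above can be sketched with the standard library alone. The type dummyTransport, the cache key (method plus URL), and the helper countHits are assumptions made for this illustration and are not taken from the package; the sketch serializes responses with httputil.DumpResponse and replays them with http.ReadResponse.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"sync"
	"sync/atomic"
)

// dummyTransport is a hypothetical RoundTripper that caches every response
// and replays it for later identical requests, ignoring Cache-Control.
type dummyTransport struct {
	next  http.RoundTripper
	mu    sync.Mutex
	store map[string][]byte // serialized responses keyed by method and URL
}

func (t *dummyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	key := req.Method + " " + req.URL.String() // assumed cache key

	t.mu.Lock()
	raw, ok := t.store[key]
	t.mu.Unlock()
	if ok {
		// Replay the stored bytes without touching the network.
		return http.ReadResponse(bufio.NewReader(bytes.NewReader(raw)), req)
	}

	resp, err := t.next.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	// DumpResponse serializes headers and body and leaves resp.Body readable.
	raw, err = httputil.DumpResponse(resp, true)
	if err != nil {
		resp.Body.Close()
		return nil, err
	}
	t.mu.Lock()
	t.store[key] = raw
	t.mu.Unlock()
	return resp, nil
}

// countHits sends n GET requests through a dummyTransport to a local test
// server and reports how many reached the server, plus the last body read.
func countHits(n int) (int, string, error) {
	var hits int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt32(&hits, 1)
		fmt.Fprint(w, "hello")
	}))
	defer srv.Close()

	client := &http.Client{Transport: &dummyTransport{
		next:  http.DefaultTransport,
		store: make(map[string][]byte),
	}}
	var body string
	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			return 0, "", err
		}
		b, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return 0, "", err
		}
		body = string(b)
	}
	return int(atomic.LoadInt32(&hits)), body, nil
}

func main() {
	hits, body, err := countHits(3)
	fmt.Println(hits, body, err)
}
```

Only the first request should reach the test server; later ones are answered from the stored bytes, which is what makes this policy suitable for offline replays.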
func (*Transport) RoundTripRFC2616 ¶
RoundTripRFC2616 provides an RFC 2616 compliant HTTP cache, i.e. with HTTP Cache-Control awareness, aimed at production and used in continuous runs to avoid downloading unmodified data (to save bandwidth and speed up crawls).
If there is a stale Response, then any validators it contains will be set on the new request to give the server a chance to respond with 304 Not Modified. If this happens, then the cached Response will be returned.
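The revalidation step can be sketched as follows. The helper validatorHeaders is hypothetical; it maps the validators of a cached response to the standard conditional request headers (ETag to If-None-Match, Last-Modified to If-Modified-Since), which is the usual HTTP mapping and not necessarily the exact code path the package uses.

```go
package main

import (
	"fmt"
	"net/http"
)

// validatorHeaders is a hypothetical helper: given the headers of a stale
// cached response, it returns the conditional headers to add to the new request.
func validatorHeaders(cached http.Header) http.Header {
	h := make(http.Header)
	if etag := cached.Get("ETag"); etag != "" {
		h.Set("If-None-Match", etag)
	}
	if lm := cached.Get("Last-Modified"); lm != "" {
		h.Set("If-Modified-Since", lm)
	}
	return h
}

func main() {
	// Headers of a stale cached response (example values).
	cached := http.Header{}
	cached.Set("ETag", `"v1"`)
	cached.Set("Last-Modified", "Mon, 02 Jan 2006 15:04:05 GMT")

	req, err := http.NewRequest(http.MethodGet, "http://example.com/", nil)
	if err != nil {
		panic(err)
	}
	// Copy the validators onto the outgoing request.
	for k, vs := range validatorHeaders(cached) {
		req.Header[k] = vs
	}
	fmt.Println(req.Header)
}
```

If the server answers such a request with status 304, a caching transport can return the stored response instead of downloading the body again.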