gohttpdisk

v0.0.0-...-07ed8d9
Warning

This package is not in the latest version of its module.

Published: Oct 6, 2021 License: MIT Imports: 19 Imported by: 0

README

Overview

gohttpdisk caches HTTP responses on disk. Several libraries like this already exist (see below), but gohttpdisk takes a different approach: its priority is to always cache on disk. It is not RFC compliant, and it caches GET, POST, and every other method. This makes gohttpdisk useful for crawling projects that need to aggressively avoid repeating HTTP requests.

Usage

Just plug gohttpdisk into an http.Client:

hd := gohttpdisk.NewHTTPDisk(gohttpdisk.Options{})
client := http.Client{Transport: hd}
resp, err := client.Get("http://google.com")
if err != nil {
	// handle error
}
defer resp.Body.Close()
...

By default, responses are cached in the gohttpdisk directory. The cache key is the md5 sum of the HTTP method, the normalized URL, and the request body, and the path has the form gohttpdisk/google.com/98/fa/1f08556382802ef7e26852c527c2. Unless Options.MaxAge is set, responses never expire, and gohttpdisk never deletes them: the cache grows without bound until you delete it manually.
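To make the digest step concrete, here is a minimal sketch that hashes a request's method, URL, and body with md5. It assumes the three pieces are joined with single spaces; the real canonical format (and its URL normalization) is defined by CacheKey.Key and may differ, so these digests will not match the ones gohttpdisk writes. The digestFor helper is hypothetical.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// digestFor returns the hex-encoded md5 sum of a request's method, URL and body.
// ASSUMPTION (hypothetical format): the pieces are joined with single spaces;
// the real CacheKey.Key format and URL normalization may differ.
func digestFor(method, url string, body []byte) string {
	key := method + " " + url
	if len(body) > 0 {
		key += " " + string(body)
	}
	sum := md5.Sum([]byte(key))
	return hex.EncodeToString(sum[:])
}

func main() {
	get := digestFor("GET", "http://google.com", nil)
	post := digestFor("POST", "http://google.com", []byte("q=gohttpdisk"))
	fmt.Println(get)
	fmt.Println(post)
	// The first two pairs of hex digits become the two directory levels on disk.
	fmt.Println(get[:2], get[2:4])
}
```

Because the method and body are part of the hashed string, a GET and a POST to the same URL land in different files, which is what lets gohttpdisk cache POST requests at all.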

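Note that HTTP headers are NOT used to calculate the cache key, so requests that differ only in their headers (for example, Cookie or Authorization) share a single cache entry. This can be unintuitive for crawling projects that involve cookies or session state.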

Also See

Here are some other excellent caching libraries that you might want to check out. These generally act like traditional HTTP caches:

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func MustRequest

func MustRequest(method string, url string) *http.Request

MustRequest creates a test request with the given method and URL.

func MustURL

func MustURL(s string) *url.URL

MustURL parses s into a *url.URL.

func MustWriteGzip

func MustWriteGzip(path string, data string)

MustWriteGzip writes data to path, gzip-compressed.

func TmpDir

func TmpDir() string

TmpDir returns a temporary directory in which a cache can be placed.

Types

type Cache

type Cache struct {
	// Directory where the cache is stored. Defaults to gohttpdisk.
	Dir string

	// If true, don't include the request hostname in the path for each element.
	NoHosts bool
}

Cache stores http.Responses on disk, using the http.Request to calculate a key. It deals only with keys and files, not with the network.

func (*Cache) Get

func (cache *Cache) Get(cacheKey *CacheKey) (data []byte, age time.Duration, err error)

Get returns the cached data for a request, along with its age. An empty byte slice is returned if the entry doesn't exist or can't be read for any reason.

func (*Cache) RemoveAll

func (cache *Cache) RemoveAll() error

RemoveAll deletes the entire cache.

func (*Cache) Set

func (cache *Cache) Set(cacheKey *CacheKey, data []byte) error

Set stores data in the cache for a request.

func (*Cache) Touch

func (cache *Cache) Touch(cacheKey *CacheKey) error

Touch updates the modification time of the cached file, if it exists.

type CacheEntry

type CacheEntry struct {
	Response *http.Response
	Age      time.Duration
}

type CacheKey

type CacheKey struct {
	Request *http.Request
}

CacheKey is a key in the cache, built from an http.Request.

func MustCacheKey

func MustCacheKey(req *http.Request) *CacheKey

MustCacheKey creates a cache key for req.

func NewCacheKey

func NewCacheKey(req *http.Request) (*CacheKey, error)

func (*CacheKey) Digest

func (cacheKey *CacheKey) Digest() string

Digest returns the md5 sum for this request.

func (*CacheKey) Diskpath

func (cacheKey *CacheKey) Diskpath(noHosts bool) string

Diskpath returns the path on disk for this request. If noHosts is true, the request hostname is not included in the path.

func (*CacheKey) Key

func (cacheKey *CacheKey) Key() string

Key calculates a canonical cache key for the request based on the http method, the normalized URL, and the request body if present. The key can be quite long since it contains the request body.

type HTTPDisk

type HTTPDisk struct {
	// Underlying Cache.
	Cache Cache
	// if nil, http.DefaultTransport is used.
	Transport http.RoundTripper
	Options   Options
}

HTTPDisk is a caching http transport.

func NewHTTPDisk

func NewHTTPDisk(options Options) *HTTPDisk

NewHTTPDisk constructs a new HTTPDisk.

func (*HTTPDisk) RoundTrip

func (hd *HTTPDisk) RoundTrip(req *http.Request) (resp *http.Response, err error)

func (*HTTPDisk) Status

func (hd *HTTPDisk) Status(req *http.Request) (*Status, error)

type Options

type Options struct {
	// Directory where the cache is stored. Defaults to gohttpdisk.
	Dir string

	// Maximum amount of time a cached response is considered fresh. If less
	// than or equal to zero, then all content is considered fresh. If positive,
	// then cached content will be re-fetched if it is older than this.
	MaxAge time.Duration

	// Don't read anything from cache (but still write)
	Force bool

	// Don't read errors from cache (but still write)
	ForceErrors bool

	// Optional logger
	Logger *log.Logger

	// Don't cache errors during background revalidation. Leave stale data in cache instead.
	// Only relevant if StaleWhileRevalidate is set.
	NoCacheRevalidationErrors bool

	// If true, don't include the request hostname in the path for each element.
	NoHosts bool

	// If StaleWhileRevalidate is enabled, you may optionally set this wait group
	// to be notified when background fetches complete.
	RevalidationWaitGroup *sync.WaitGroup

	// Return stale cached responses while refreshing the cache in the background.
	// Only relevant if MaxAge is set.
	StaleWhileRevalidate bool

	// Update cache file modification time before kicking off a background revalidation.
	// Helps guard against thundering herd problem, but risks leaving stale data in the
	// cache longer than expected. Only relevant if StaleWhileRevalidate is set.
	TouchBeforeRevalidate bool
}

Options for creating a new HTTPDisk.

type Status

type Status struct {
	Age    time.Duration
	Digest string
	Key    string
	Path   string
	Status string
	URL    string
}

Directories

Path Synopsis
cmd
