gitcache

package
v0.0.0-...-66da7ee
Published: May 12, 2026 License: MIT Imports: 33 Imported by: 0

Documentation

Overview

Package gitcache implements a Git protocol v2 caching proxy for read-only (clone/fetch) operations. It maintains local bare repository mirrors and serves git objects from cache when possible, falling back to upstream GitHub when the cache is cold.

NOTE: This package uses the global slog logger rather than accepting an injected *slog.Logger. The server configures the default slog handler at startup, so package-level slog calls inherit that configuration. This is idiomatic Go for packages that don't need per-instance log routing.

The caching technique is inspired by Google's goblet project (Apache 2.0). Key invariant: ls-refs is always forwarded to upstream to verify access and freshness. Fetch commands are only served from cache after a successful ls-refs in the same request.

Index

Constants

This section is empty.

Variables

var ValidNameRe = regexp.MustCompile(`^[a-zA-Z0-9._-]+$`)

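ValidNameRe matches valid GitHub owner and repository names: alphanumeric characters, hyphens, underscores, and dots. It rejects path separators and other characters that could escape the cache root directory. The names "." and ".." consist only of permitted characters and therefore match the pattern, so they need a separate check.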
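The behaviour of the pattern on dot-only names can be checked directly. The sketch below copies the documented expression into a local variable; isSafeName is a hypothetical helper for illustration and is not part of gitcache.

```go
package main

import (
	"fmt"
	"regexp"
)

// validNameRe mirrors the pattern documented for ValidNameRe.
var validNameRe = regexp.MustCompile(`^[a-zA-Z0-9._-]+$`)

// isSafeName is a hypothetical helper (not part of gitcache). It rejects the
// two dot-only names that could traverse directories, then applies the pattern.
func isSafeName(name string) bool {
	if name == "." || name == ".." {
		return false
	}
	return validNameRe.MatchString(name)
}

func main() {
	for _, name := range []string{"octocat", "my.repo-1", "a/b", "..", ""} {
		fmt.Printf("%q pattern=%v safe=%v\n",
			name, validNameRe.MatchString(name), isSafeName(name))
	}
}
```

Whether gitcache performs an equivalent dot-name check elsewhere is not stated in this documentation.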

Functions

func EncodeCommandsToReader

func EncodeCommandsToReader(cmd Command) io.Reader

EncodeCommandsToReader serialises protocol v2 request chunks into an io.Reader suitable for forwarding to an upstream server.

func MaybeGunzip

func MaybeGunzip(r *http.Request) (io.ReadCloser, error)

MaybeGunzip returns a reader that decompresses gzip content if the Content-Encoding header indicates gzip, or the original body otherwise. The returned io.ReadCloser must be closed by the caller to release resources. Closing the returned reader closes both the gzip layer and the underlying request body.

func NewCacheLookup

func NewCacheLookup(inner http.Handler, cache *Handler, store database.Store, logger *slog.Logger) http.Handler

NewCacheLookup creates a middleware that intercepts git smart HTTP requests for cache-enabled repositories.

func ParseFetchHaves

func ParseFetchHaves(cmd Command) []plumbing.Hash

ParseFetchHaves extracts the "have <hash>" arguments from a fetch command.

func ParseFetchWants

func ParseFetchWants(cmd Command) (hashes []plumbing.Hash, refs []string, err error)

ParseFetchWants extracts the object hashes and ref names from a fetch command's "want <hash>" and "want-ref <refname>" arguments.

func ParseLsRefsResponse

func ParseLsRefsResponse(chunks []*gitprotocolio.ProtocolV2ResponseChunk) (map[string]plumbing.Hash, error)

ParseLsRefsResponse parses an ls-refs response from upstream into a map of ref name → object hash.

func ResolveWantHashes

func ResolveWantHashes(store storer.EncodedObjectStorer, refStore storer.ReferenceStorer, wantHashes []plumbing.Hash, wantRefs []string) ([]plumbing.Hash, error)

ResolveWantHashes resolves want-ref references to their object hashes using the local store, then combines with direct want hashes.
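The resolution step amounts to a lookup followed by a merge. A minimal sketch, with a map standing in for the go-git reference store and string hashes standing in for plumbing.Hash; resolveWants is a hypothetical name, and the output order (direct hashes first) is an assumption.

```go
package main

import "fmt"

// resolveWants is a hypothetical stand-in for ResolveWantHashes. localRefs
// plays the role of the local reference store (ref name to object hash).
func resolveWants(localRefs map[string]string, wantHashes, wantRefs []string) ([]string, error) {
	// Copy so the caller's slice is never modified.
	out := append([]string(nil), wantHashes...)
	for _, ref := range wantRefs {
		h, ok := localRefs[ref]
		if !ok {
			return nil, fmt.Errorf("ref %q not found in local store", ref)
		}
		out = append(out, h)
	}
	return out, nil
}

func main() {
	refs := map[string]string{"refs/heads/main": "1111"}
	fmt.Println(resolveWants(refs, []string{"2222"}, []string{"refs/heads/main"}))
	fmt.Println(resolveWants(refs, nil, []string{"refs/heads/missing"}))
}
```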

func ServeFetchLocal

func ServeFetchLocal(w io.Writer, store storer.EncodedObjectStorer, wants []plumbing.Hash, haves []plumbing.Hash) error

ServeFetchLocal generates a packfile containing the objects the client needs (wants minus haves) and writes the Git protocol v2 response.

The response format is:

  • "packfile" section header (no leading delimiter — delimiters only separate sections)
  • Sideband-encoded packfile data (band 1)
  • Flush packet

func StartCleanup

func StartCleanup(ctx context.Context, cacheStorageRoot string, store database.Store, interval time.Duration)

StartCleanup launches a background goroutine that periodically enforces per-repository cache size limits by evicting the oldest protocol response files. cacheStorageRoot is the top-level cache storage directory (protocol response subdirectories are resolved internally). The goroutine stops when ctx is cancelled.

func SyncCacheReposMetric

func SyncCacheReposMetric(ctx context.Context, store database.Store)

SyncCacheReposMetric queries the store and updates the CacheReposActive gauge.

func WriteError

func WriteError(w io.Writer, msg string) error

WriteError writes a Git protocol error packet.

func WriteInfoRefsResponse

func WriteInfoRefsResponse(w http.ResponseWriter)

WriteInfoRefsResponse writes the synthetic capability advertisement for a cached repository's /info/refs?service=git-upload-pack endpoint.

func WriteResponseChunks

func WriteResponseChunks(w io.Writer, chunks []*gitprotocolio.ProtocolV2ResponseChunk) error

WriteResponseChunks writes protocol v2 response chunks to w.

Types

type CacheLookup

type CacheLookup struct {
	// contains filtered or unexported fields
}

CacheLookup checks whether a git smart HTTP request targets a cache-enabled repository and routes it to the cache handler if so. Non-git paths and uncached repos pass through to the inner handler.

func (*CacheLookup) ServeHTTP

func (cl *CacheLookup) ServeHTTP(w http.ResponseWriter, r *http.Request)

ServeHTTP checks if the request targets a cached repository's git endpoints and routes accordingly.
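The routing decision can be sketched as a small middleware. Here the enabled callback stands in for the database.Store lookup, and parseGitPath, cacheLookup, respond, and route are hypothetical names; the real path matching rules are not documented here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// parseGitPath accepts /owner/repo.git/info/refs and
// /owner/repo.git/git-upload-pack and rejects everything else.
func parseGitPath(p string) (owner, repo string, ok bool) {
	parts := strings.SplitN(strings.TrimPrefix(p, "/"), "/", 3)
	if len(parts) != 3 || parts[0] == "" {
		return "", "", false
	}
	name := strings.TrimSuffix(parts[1], ".git")
	if name == parts[1] || name == "" {
		return "", "", false // repository segment must end in ".git"
	}
	if parts[2] != "info/refs" && parts[2] != "git-upload-pack" {
		return "", "", false
	}
	return parts[0], name, true
}

// cacheLookup routes cache-enabled git requests to cache, the rest to inner.
func cacheLookup(inner, cache http.Handler, enabled func(owner, repo string) bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if owner, repo, ok := parseGitPath(r.URL.Path); ok && enabled(owner, repo) {
			cache.ServeHTTP(w, r)
			return
		}
		inner.ServeHTTP(w, r)
	})
}

// respond returns a handler that writes a fixed body, for demonstration.
func respond(body string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { fmt.Fprint(w, body) })
}

// route sends a GET for target through h and returns the response body.
func route(h http.Handler, target string) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Body.String()
}

func main() {
	h := cacheLookup(respond("inner"), respond("cache"), func(owner, repo string) bool {
		return owner == "octo" && repo == "cached"
	})
	fmt.Println(route(h, "/octo/cached.git/info/refs?service=git-upload-pack"))
	fmt.Println(route(h, "/octo/other.git/git-upload-pack"))
	fmt.Println(route(h, "/api/v3/user"))
}
```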

type CacheResult

type CacheResult string

CacheResult describes the outcome of a cache operation for logging and metrics.

const (
	CacheHit         CacheResult = "hit"
	CacheMiss        CacheResult = "miss"
	CacheRejected    CacheResult = "rejected"
	CacheError       CacheResult = "error"
	CachePassthrough CacheResult = "passthrough" // delegate to inner handler
)

type CacheStorageFactory

type CacheStorageFactory interface {
	// Open returns a go-git Storer for the given repository.
	// Creates the backing storage if it doesn't exist.
	Open(owner, repo string) (storage.Storer, error)

	// Delete removes all cached data for a repository.
	Delete(owner, repo string) error
}

CacheStorageFactory abstracts where cached git objects are stored. go-git's storage.Storer interface is the foundation — filesystem and S3 backends implement this interface, making them transparent to the cache handler and repository management layers.

type Command

type Command struct {
	Name   string
	Chunks []*gitprotocolio.ProtocolV2RequestChunk
}

Command represents a parsed Git protocol v2 command (ls-refs or fetch) along with its raw request chunks for forwarding to upstream.

func ParseCommands

func ParseCommands(r io.Reader) ([]Command, error)

ParseCommands reads a Git protocol v2 request body and returns the individual commands. Each command consists of a command chunk followed by capability/argument chunks. Commands are separated by flush packets (EndArgument); the final flush (EndRequest) terminates the stream.

type FilesystemStorageFactory

type FilesystemStorageFactory struct {
	// contains filtered or unexported fields
}

FilesystemStorageFactory stores cached bare repos on local disk under a root directory. Each repository gets its own subdirectory at <root>/<owner>/<repo>.

func NewFilesystemStorageFactory

func NewFilesystemStorageFactory(root string) (*FilesystemStorageFactory, error)

NewFilesystemStorageFactory creates a factory that stores bare repos under root. The root directory is created if it does not exist.

func (*FilesystemStorageFactory) Delete

func (f *FilesystemStorageFactory) Delete(owner, repo string) error

Delete removes all cached data for a repository from disk.

func (*FilesystemStorageFactory) Open

func (f *FilesystemStorageFactory) Open(owner, repo string) (storage.Storer, error)

Open returns a filesystem-backed go-git Storer for the given repository. The directory is created if it does not exist.

type Handler

type Handler struct {
	// contains filtered or unexported fields
}

Handler implements the Git protocol v2 caching proxy.

Request flow: Handler sits at the end of a middleware chain, so token resolution has already happened by the time a request arrives:

ScopedPassthroughHandler (resolves ghx_/gha_ proxy tokens → real GitHub tokens)
  → CacheLookup middleware (checks if repo is cache-enabled)
    → Handler (this code)

By the time a request reaches Handler, the Authorization header contains a resolved GitHub credential (e.g. ghs_*, gho_*), not the original proxy token. Methods like handleLsRefs and handleFetch forward this header verbatim to upstream GitHub.

Access verification: for bundled requests (ls-refs + fetch), handleLsRefs runs first and verifies access via upstream GitHub before any cached data is served — the fetch command only executes after ls-refs succeeds (enforced by the lsRefsSucceeded gate in ServeUploadPack). For standalone fetch requests, the ScopedPassthroughHandler has already validated the token and enforced scope before the request reaches this handler.

func NewHandler

func NewHandler(registry *Registry, serviceTokenFn ServiceTokenFunc, upstreamBaseURL, responseCacheDir string) *Handler

func (*Handler) ServeInfoRefs

func (h *Handler) ServeInfoRefs(w http.ResponseWriter, r *http.Request)

ServeInfoRefs handles GET /owner/repo.git/info/refs?service=git-upload-pack by returning synthetic protocol v2 capabilities.

func (*Handler) ServeUploadPack

func (h *Handler) ServeUploadPack(w http.ResponseWriter, r *http.Request, owner, repo string)

ServeUploadPack handles POST /owner/repo.git/git-upload-pack by parsing protocol v2 commands and serving from cache where possible.

Security invariant: fetch commands are only served after a successful ls-refs (which verifies the user's GitHub access via upstream).

type ManagedRepository

type ManagedRepository struct {
	// contains filtered or unexported fields
}

ManagedRepository represents a cached bare repository mirror. It wraps a go-git repository and provides thread-safe fetch, object existence checks, and ref comparison against upstream.

func (*ManagedRepository) FetchUpstream

func (m *ManagedRepository) FetchUpstream(ctx context.Context, token string) error

FetchUpstream fetches all refs from the upstream remote into the local cache. Only one fetch runs at a time per repository; concurrent calls block until the in-progress fetch completes.

The token parameter is the GitHub credential to use for authentication. For async cache warming this should be a GitHub App installation token.

func (*ManagedRepository) HasAllWants

func (m *ManagedRepository) HasAllWants(hashes []plumbing.Hash, refs []string) (bool, error)

HasAllWants checks whether all requested objects and refs exist in the local cache. Returns true only if every want is satisfied locally.

func (*ManagedRepository) HasAnyUpdate

func (m *ManagedRepository) HasAnyUpdate(upstreamRefs map[string]plumbing.Hash) (bool, error)

HasAnyUpdate compares upstream refs (from an ls-refs response) against the local cache. Returns true if any ref is new or has a different hash.
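The comparison rule stated above is a one-directional map check. A minimal sketch with string hashes standing in for plumbing.Hash; hasAnyUpdate is a hypothetical name. Refs that exist only locally are ignored, which follows from the rule as documented.

```go
package main

import "fmt"

// hasAnyUpdate reports whether upstream has a ref that is missing locally or
// points at a different hash.
func hasAnyUpdate(local, upstream map[string]string) bool {
	for ref, hash := range upstream {
		if localHash, ok := local[ref]; !ok || localHash != hash {
			return true
		}
	}
	return false
}

func main() {
	local := map[string]string{"refs/heads/main": "aaaa", "refs/heads/old": "cccc"}
	fmt.Println(hasAnyUpdate(local, map[string]string{"refs/heads/main": "aaaa"}))
	fmt.Println(hasAnyUpdate(local, map[string]string{"refs/heads/main": "bbbb"}))
	fmt.Println(hasAnyUpdate(local, map[string]string{"refs/heads/new": "dddd"}))
}
```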

func (*ManagedRepository) LastUpdateTime

func (m *ManagedRepository) LastUpdateTime() time.Time

LastUpdateTime returns the time of the most recent successful upstream fetch.

func (*ManagedRepository) RLock

func (m *ManagedRepository) RLock()

RLock acquires the read lock, preventing concurrent writes (FetchUpstream) from modifying the storer while reads are in progress.

func (*ManagedRepository) RUnlock

func (m *ManagedRepository) RUnlock()

RUnlock releases the read lock.

func (*ManagedRepository) Storer

func (m *ManagedRepository) Storer() storage.Storer

Storer returns the underlying go-git Storer for direct object access (used by pack generation).

type Registry

type Registry struct {
	// contains filtered or unexported fields
}

Registry manages the set of cached ManagedRepository instances, keyed by owner/repo. It lazily initialises repositories on first access.

func NewRegistry

func NewRegistry(factory CacheStorageFactory, baseURL *url.URL) *Registry

NewRegistry creates a Registry that uses the given storage factory and upstream base URL.

func (*Registry) Get

func (r *Registry) Get(owner, repo string) (*ManagedRepository, error)

Get returns the ManagedRepository for the given owner/repo, creating and initialising it if it doesn't exist yet (lazy population).
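Lazy population of this kind is commonly written as a mutex-guarded map. A minimal sketch under that assumption; registry, managedRepo, and the open callback (standing in for CacheStorageFactory.Open plus initialisation) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// managedRepo is a placeholder for ManagedRepository.
type managedRepo struct{ key string }

// registry caches one managedRepo per owner/repo key.
type registry struct {
	mu    sync.Mutex
	repos map[string]*managedRepo
	open  func(owner, repo string) (*managedRepo, error)
}

// get returns the cached entry or creates it on first access.
func (r *registry) get(owner, repo string) (*managedRepo, error) {
	key := owner + "/" + repo
	r.mu.Lock()
	defer r.mu.Unlock()
	if m, ok := r.repos[key]; ok {
		return m, nil
	}
	m, err := r.open(owner, repo)
	if err != nil {
		return nil, err // failures are not cached, so a later call retries
	}
	r.repos[key] = m
	return m, nil
}

func main() {
	opens := 0
	reg := &registry{
		repos: map[string]*managedRepo{},
		open: func(owner, repo string) (*managedRepo, error) {
			opens++
			return &managedRepo{key: owner + "/" + repo}, nil
		},
	}
	a, _ := reg.get("octo", "demo")
	b, _ := reg.get("octo", "demo")
	fmt.Println(a == b, opens)
}
```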

func (*Registry) Remove

func (r *Registry) Remove(owner, repo string) error

Remove removes a cached repository from the registry and deletes its backing storage.

type S3StorageFactory

type S3StorageFactory struct {
	// contains filtered or unexported fields
}

S3StorageFactory implements CacheStorageFactory using an S3-compatible object store as the backing storage for cached bare repositories.

Each repository is stored under the key prefix <basePath>/<owner>/<repo>/ within the configured bucket. In the intended design, the go-git Storer interface is backed by filesystem storage in a local staging area, and objects are synced to and from S3 transparently. Open and Delete are not yet implemented (see their TODO notes below).

This backend enables horizontal scaling: multiple ghp instances can share the same S3 bucket and prefix, benefiting from a shared cache without filesystem coordination.

Note: S3 provides no atomic operations spanning multiple objects, so concurrent writes to the same repository from different instances may result in redundant uploads. Redundant object uploads are safe because git objects are content-addressed and immutable.

func NewS3StorageFactory

func NewS3StorageFactory(bucket, region, endpoint, basePath string) *S3StorageFactory

NewS3StorageFactory creates an S3-backed storage factory.

Parameters:

  • bucket: S3 bucket name
  • region: AWS region (e.g. "us-east-1")
  • endpoint: custom S3-compatible endpoint (empty for AWS S3)
  • basePath: key prefix within the bucket (e.g. "cache")

func (*S3StorageFactory) Delete

func (s *S3StorageFactory) Delete(owner, repo string) error

Delete removes all cached data for the given repository from S3.

TODO: Implement by listing and deleting all objects under the repository's key prefix in the S3 bucket.

func (*S3StorageFactory) Open

func (s *S3StorageFactory) Open(owner, repo string) (storage.Storer, error)

Open returns a go-git Storer for the given repository.

TODO: Implement S3-backed Storer. This requires adding the AWS SDK dependency (github.com/aws/aws-sdk-go-v2). The implementation will:

  1. Use a local filesystem staging area for in-progress operations
  2. Sync objects to/from S3 on open/close
  3. Support go-git's EncodedObjectStorer and ReferenceStorer interfaces

For now, this returns an error indicating the feature is not yet available.

type ServiceTokenFunc

type ServiceTokenFunc func(ctx context.Context) (string, error)

ServiceTokenFunc returns a GitHub App installation token for async cache warming. Unlike per-request tokens, this is not tied to a specific user's credential. May be nil if async warming is not configured.
