vcsfetch

Version: v0.0.0-...-13edf49
Published: Dec 14, 2025 License: Apache-2.0 Imports: 12 Imported by: 0

README

go-vcsfetch

vcs fetcher and cloner for Go.

A Go library for fetching files from version control systems (vcs).


Easily retrieve individual files or whole repositories from a vcs location.

  • Support git repositories
  • Support SPDX Locators (spdx downloadLocation attribute)
  • Support common git-url schemes

All fetched resources are exposed for read-only operations only.

If you're looking for general purpose vcs support in Go for read/write or other git-heavy operations, consider using github.com/go-git/go-git instead.

Status

Work in progress. Unreleased.

Use-cases

  • retrieve a single file over a remote repo (e.g. config file)
  • retrieve an entire folder at a specific version
  • ...

Not intended to work with local resources (e.g. file://...).

Features

VCS (Version Control System)

  • Works without git installed
  • Supported schemes: http, https, ssh, git TCP
  • Authentication (basic, ssh)
  • Fetch (single file) or Clone (folder or entire repo)
  • Fetch optimized for common SCMs (github.com, gitlab), with https raw content download to bypass pure-git operations
  • In memory or filesystem-backed
  • Supports sparse-cloning
  • Auto-detects the presence of the git binary for faster fetching using the git command line

Resolving versions

  • Ref as commit sha, branch or tag, with exact match
  • Semver tag resolution with incomplete semver: e.g. resolve v2 as the latest tag <v3, and 2.1 as the latest tag <v2.2

SCM-specific URLs

  • git-url parses resource locators for well-known schemes
    • azure
    • bitbucket
    • gitea
    • github
    • gitlab
  • know how to transform a resource locator into a raw-content URL
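To illustrate the raw-content transformation mentioned above, here is a minimal sketch. The function `rawURL` is hypothetical and not part of this library; it only assumes the public raw-content URL layouts used by github.com and gitlab.com, and the library's own rules may differ.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// rawURL is a hypothetical helper: it maps the components of a resource
// locator to a raw-content URL for two well-known hosts.
// It returns false when no raw-content layout is known for the host.
func rawURL(host, repoPath, ref, file string) (string, bool) {
	repoPath = strings.Trim(repoPath, "/")
	switch host {
	case "github.com":
		// Layout: raw.githubusercontent.com/<owner>/<repo>/<ref>/<file>
		return "https://raw.githubusercontent.com/" + path.Join(repoPath, ref, file), true
	case "gitlab.com":
		// Layout: gitlab.com/<group>/<project>/-/raw/<ref>/<file>
		return "https://gitlab.com/" + path.Join(repoPath, "-", "raw", ref, file), true
	}
	return "", false
}

func main() {
	u, ok := rawURL("github.com", "/fredbi/go-vcsfetch", "HEAD", ".golangci.yml")
	fmt.Println(u, ok)
}
```

When no raw-content URL can be derived, a caller would fall back to regular git operations.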

Quick start

go get github.com/fredbi/go-vcsfetch

Usage

Basic usage

import (
    "bytes"
    "context"
    "log"

    "github.com/fredbi/go-vcsfetch"
)

...

vf := vcsfetch.NewFetcher()
w := new(bytes.Buffer)
ctx := context.Background()

const spdxDownloadLocation = "https://github.com/fredbi/go-vcsfetch@HEAD#.golangci.yml"

if err := vf.Fetch(ctx, w, spdxDownloadLocation); err != nil {
    ...
}

log.Println(w.String())

Advanced usage with options

Example use cases:

  1. authentication
  2. git over TCP
  3. repo cloning
  4. git-urls
  5. folder retrieval and repeated fetches
  6. exact vs semver tag resolution
  7. using shorthand slugs
  8. git-archive (with benchmark)
  9. TLS settings

Take a tour of the Live examples.

Dependencies

This library is built on top of github.com/go-git/go-git, a pure Go git implementation. It has no runtime dependencies (unlike, for instance, the libgit2 bindings provided by git2go).

It does not require the git binary to be installed.

However, when the git binary is present and auto-detection is not disabled, the library may choose to perform some operations using the native git implementation, which is usually faster than the pure Go port.

Documentation

Go Reference.

Resource usage and performances

TODO

Roadmap

  • Support for git-archive download, once well-known SCMs start supporting this protocol
  • Support for mercurial, with a runtime dependency on hg.
  • native go git-archive support (or from go-git/v6?)
  • support semver version constraint such as ^v1.2.3 or ~v1.2.3
  • mock git server

License

This library is distributed under the Apache 2.0 license.

SPDX-FileCopyrightText: Copyright 2025 Frédéric BIDON
SPDX-License-Identifier: Apache-2.0

Credits and acknowledgments

Initially, my intent was to enable a shared golangci-lint config file on a repository common to all repos within an organization.

Doing a little research on how to work with vcs resources in Go, I stumbled over this little package github.com/carabiner-dev/vcslocator.

I started to use it right away, but was quickly hindered by a number of limitations. My requirements departed quite a bit from that implementation, to the point that forking wasn't an option. And so started this implementation.

Thank you to the guys at carabiner-dev, who provided me the inspiration to use a SPDX locator on top of go-git.

Note that this implementation is 100% original code, not a copy of the above.

Documentation

Overview

Package vcsfetch provides a vcs fetcher and cloner for Go.

URL formats for vcs locations

We recommend the SPDX format, which is standardized and unambiguous. SPDX URLs must contain a URL fragment.

Fetcher and Cloner also support well-known git-url schemes exposed by git platforms such as github, gitlab and gitea.

URL shorthands using repo slugs: TODO

Supported vcs protocols

Both the Fetcher and the Cloner come with native support for git, with no runtime dependencies. Supported transports for git are: file, https, ssh and git over TCP.

NOTES:

  • http is also supported (e.g. for testing).
  • git over TCP is not supported as a SPDX locator (TODO: check this)

Limitations

At this moment, this package does not support mercurial ("hg"). We may add this feature later on, as mercurial is supported by Go.

Fetcher and Cloner do not support bazaar ("bzr") or subversion ("svn"), and we currently have no plan to add support for those.

Versions

Versions may specify a given commit sha (full or short sha), a branch (resolves as the HEAD of that branch) or a tag.

The symbolic ref "HEAD" resolves as the HEAD of the default branch.

Semver tags may be incomplete to refer to the latest semver with a given major or minor version:

  • v2 resolves as the latest v2.x.y tag (i.e. <v3)
  • v2.1 resolves as the latest v2.1.y tag (i.e. <v2.2)

Partial version behavior may be disabled with FetchWithExactTag (resp. CloneWithExactTag when cloning).
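The partial-version rule can be sketched as follows. This is an illustration under simplifying assumptions, not the package's implementation: the names `parseTag`, `newer` and `resolve` are hypothetical, build metadata is ignored, and pre-release identifiers are compared as plain strings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version holds the numeric parts of a tag such as "v2.1.3" or "v2.1.3-rc1".
type version struct {
	nums [3]int
	pre  string
	tag  string
}

// parseTag parses a complete vMAJOR.MINOR.PATCH[-pre] tag.
func parseTag(tag string) (version, bool) {
	v := version{tag: tag}
	core := strings.TrimPrefix(tag, "v")
	core, v.pre, _ = strings.Cut(core, "-")
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, false
		}
		v.nums[i] = n
	}
	return v, true
}

// newer reports whether a sorts after b.
// A release sorts after its own pre-releases.
func newer(a, b version) bool {
	for i := range a.nums {
		if a.nums[i] != b.nums[i] {
			return a.nums[i] > b.nums[i]
		}
	}
	if (a.pre == "") != (b.pre == "") {
		return a.pre == ""
	}
	return a.pre > b.pre
}

// resolve returns the latest tag whose leading components match a
// partial version such as "v2" or "v2.1".
func resolve(partial string, tags []string, allowPre bool) (string, bool) {
	want := strings.Split(strings.TrimPrefix(partial, "v"), ".")
	var best version
	found := false
	for _, tag := range tags {
		v, ok := parseTag(tag)
		if !ok || (v.pre != "" && !allowPre) {
			continue
		}
		match := true
		for i, w := range want {
			if i >= 3 || strconv.Itoa(v.nums[i]) != w {
				match = false
				break
			}
		}
		if match && (!found || newer(v, best)) {
			best, found = v, true
		}
	}
	return best.tag, found
}

func main() {
	tags := []string{"v1.9.0", "v2.0.0", "v2.1.7", "v2.1.0", "v2.2.0", "v3.0.0-rc1"}
	fmt.Println(resolve("v2", tags, false))
	fmt.Println(resolve("v2.1", tags, false))
	fmt.Println(resolve("v3", tags, true))
}
```

Components are compared numerically, so that a tag such as v2.10.1 sorts after v2.9.0.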

If no version information is provided, the default reference is the HEAD commit of the default branch (e.g. master or main).

Default version behavior may be disabled with FetchWithRequireVersion (resp. CloneWithRequireVersion).

Authentication

  • TLS
  • proxy

TODO

Index

Constants

const ErrVCS vcsFetchError = "vcsfetch error"

ErrVCS is a sentinel error for all errors that originate from this package.

Variables

This section is empty.

Functions

This section is empty.

Types

type CloneOption

type CloneOption func(*cloneOptions)

CloneOption configures a Cloner with optional behavior.
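CloneOption follows the functional options pattern. As a rough sketch of how such options compose, the example below uses a hypothetical `settings` struct in place of the unexported cloneOptions, whose fields are not documented.

```go
package main

import "fmt"

// settings is a hypothetical stand-in for the unexported cloneOptions type.
type settings struct {
	exactTag    bool
	sparsePaths []string
}

// Option mutates settings, mirroring the shape of CloneOption.
type Option func(*settings)

// WithExactTag toggles exact tag matching.
func WithExactTag(exact bool) Option {
	return func(s *settings) { s.exactTag = exact }
}

// WithSparseFilter appends paths to the sparse filter.
func WithSparseFilter(paths ...string) Option {
	return func(s *settings) { s.sparsePaths = append(s.sparsePaths, paths...) }
}

// apply starts from zero-value defaults and applies options in order.
func apply(opts ...Option) settings {
	var s settings
	for _, opt := range opts {
		opt(&s)
	}
	return s
}

func main() {
	s := apply(WithExactTag(true), WithSparseFilter("docs", "README.md"))
	fmt.Println(s.exactTag, s.sparsePaths)
}
```

Options are applied in order, so a later option overrides an earlier one that sets the same field.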

func CloneWithAllowPrereleases

func CloneWithAllowPrereleases(allowed bool) CloneOption

CloneWithAllowPrereleases includes pre-releases in semver tag resolution.

By default pre-releases are ignored.

This option is disabled when using CloneWithExactTag.

Example: for tag "v2", with pre-releases allowed, "v2.3.0-rc1" is a valid candidate.

func CloneWithBackingDir

func CloneWithBackingDir(enabled bool, dir string) CloneOption

CloneWithBackingDir tells the Cloner to back the cloned resources on disk. By default, cloned resources are mapped in memory.

If dir is empty, the default is given by os.MkdirTemp using "vcsclone" as the pattern. In this case, CloneWithBackingDir panics if it can't create a temporary directory.

When using CloneWithBackingDir with a non-empty directory, the cloned content is not removed after use: it is up to the caller to keep it or clean it up.

func CloneWithExactTag

func CloneWithExactTag(exact bool) CloneOption

CloneWithExactTag indicates that tag references are matched exactly.

By default tags are resolved to match the latest semver tag, when a version tag is not fully specified, e.g. "v2" would look for the latest "v2.x.y" tag, and "v2.1" for the latest "v2.1.y" tag. "v2.3.4" would always resolve to "v2.3.4".

When specifying an exact tag, there is no semver implied or filtering of prereleases.

func CloneWithGitDebug

func CloneWithGitDebug(enabled bool) CloneOption

CloneWithGitDebug enables debug logging of the underlying git operations.

func CloneWithGitLocatorOptions

func CloneWithGitLocatorOptions(opts ...GitLocatorOption) CloneOption

CloneWithGitLocatorOptions appends giturl-specific options to apply to any git-url locator to be cloned.

func CloneWithGitSkipAutoDetect

func CloneWithGitSkipAutoDetect(skipped bool) CloneOption

CloneWithGitSkipAutoDetect skips the auto-detection of a local git binary.

When enabled, git binary auto-detection allows some operations to be performed faster, using the native git implementation rather than the pure Go implementation.

func CloneWithRecurseSubmodules

func CloneWithRecurseSubmodules(enabled bool) CloneOption

CloneWithRecurseSubmodules resolves submodules when cloning.

By default, git submodules are not updated.

func CloneWithRequireVersion

func CloneWithRequireVersion(required bool) CloneOption

CloneWithRequireVersion tells the Cloner to check that the cloned location comes with an explicit version. No default to HEAD is applied.

func CloneWithSPDXOptions

func CloneWithSPDXOptions(opts ...SPDXOption) CloneOption

CloneWithSPDXOptions appends SPDX-specific options to apply to any SPDX locator to be cloned.

func CloneWithSparseFilter

func CloneWithSparseFilter(filter ...string) CloneOption

CloneWithSparseFilter instructs the cloning to be performed only on the specified directories or files.

type Cloner

type Cloner struct {
	// contains filtered or unexported fields
}

Cloner allows for working with vcs repositories to perform cloning or sparse cloning.

The Cloner is intended for read-only capture of remote resources. If you need to mutate the cloned resources, please consider using another tool.

See Fetcher for available options.

Fetching multiple resources

The Cloner may be used to fetch against the cloned resources using a similar syntax as with a Fetcher, using the Cloner.FetchFromClone methods. All fetched locators must then match with the cloned base URL or will return an error.

Concurrency

The Cloner is not intended for concurrent usage: it is a stateful object. Once a repository has been cloned, it becomes accessible via Cloner.FS.

You may use Cloner.Close to relinquish memory or temporary disk resources and reuse the Cloner.

Exception: when using CloneWithBackingDir with a non-empty directory, the cloned content is not removed after use: it is up to the caller to keep it or clean it up.

func NewCloner

func NewCloner(opts ...CloneOption) *Cloner

NewCloner builds a Cloner to retrieve an entire vcs repository.

func (*Cloner) CloneLocator

func (f *Cloner) CloneLocator(ctx context.Context, locator Locator, opts ...CloneOption) error

CloneLocator clones a vcs repository from a Locator.

The clone is accessible as a read-only fs.FS using Cloner.FS.

func (*Cloner) CloneRepo

func (f *Cloner) CloneRepo(ctx context.Context, repoURL string) error

CloneRepo clones a vcs repository.

The repoURL string must be a valid URL.

The URL is detected to be either a valid SPDX locator or a well-known giturl.

The clone is accessible as a read-only fs.FS using Cloner.FS.

func (*Cloner) CloneURL

func (f *Cloner) CloneURL(ctx context.Context, u *url.URL) error

CloneURL clones a vcs repository from a url.URL.

The clone is accessible as a read-only fs.FS using Cloner.FS.

func (*Cloner) Close

func (f *Cloner) Close() error

Close resets the state of the cloner.

func (*Cloner) FS

func (f *Cloner) FS() fs.FS

FS returns the cloned repository as a file system.

func (*Cloner) FetchFromClone

func (f *Cloner) FetchFromClone(ctx context.Context, w io.Writer, location string) error

FetchFromClone fetches a single file from the cloned repository.

func (*Cloner) FetchLocatorFromClone

func (f *Cloner) FetchLocatorFromClone(ctx context.Context, w io.Writer, locator Locator) error

FetchLocatorFromClone fetches a single file from the cloned repository, using a Locator.

func (*Cloner) FetchURLFromClone

func (f *Cloner) FetchURLFromClone(ctx context.Context, w io.Writer, u *url.URL) error

FetchURLFromClone fetches a single file from the cloned repository, using a url.URL.

type FetchOption

type FetchOption func(*fetchOptions)

FetchOption configures a Fetcher with optional behavior.

func FetchWithAllowPrereleases

func FetchWithAllowPrereleases(allowed bool) FetchOption

FetchWithAllowPrereleases includes pre-releases in semver tag resolution.

By default pre-releases are ignored.

This option is disabled when using FetchWithExactTag.

Example: for tag "v2", with pre-releases allowed, "v2.3.0-rc1" is a valid candidate.

func FetchWithBackingDir

func FetchWithBackingDir(enabled bool, dir string) FetchOption

FetchWithBackingDir tells the Fetcher to back the fetched resources on disk. By default, fetched resources are mapped in memory.

If dir is empty, the default is given by os.MkdirTemp using "vcsclone" as the pattern. In this case, FetchWithBackingDir panics if it can't create a temporary directory.

When using FetchWithBackingDir with a non-empty directory, the fetched content is not removed after use: it is up to the caller to keep it or clean it up.

func FetchWithExactTag

func FetchWithExactTag(exact bool) FetchOption

FetchWithExactTag indicates that tag references are matched exactly.

By default tags are resolved to match the latest semver tag, when a version tag is not fully specified, e.g. "v2" would look for the latest "v2.x.y" tag, and "v2.1" for the latest "v2.1.y" tag. "v2.3.4" would always resolve to "v2.3.4".

When specifying an exact tag, there is no semver implied or filtering of prereleases.

func FetchWithGitDebug

func FetchWithGitDebug(enabled bool) FetchOption

FetchWithGitDebug enables debug logging of the underlying git operations.

func FetchWithGitLocatorOptions

func FetchWithGitLocatorOptions(opts ...GitLocatorOption) FetchOption

FetchWithGitLocatorOptions appends giturl-specific options to apply to any git-url locator to be fetched.

func FetchWithGitSkipAutoDetect

func FetchWithGitSkipAutoDetect(skipped bool) FetchOption

FetchWithGitSkipAutoDetect skips the auto-detection of a local git binary.

When enabled, git binary auto-detection allows some operations to be performed faster, using the native git implementation rather than the pure Go implementation.

func FetchWithRecurseSubmodules

func FetchWithRecurseSubmodules(enabled bool) FetchOption

FetchWithRecurseSubmodules resolves submodules when fetching.

By default, git submodules are not updated.

func FetchWithRequireVersion

func FetchWithRequireVersion(required bool) FetchOption

FetchWithRequireVersion tells the Fetcher to check that the fetched location comes with an explicit version. No default to HEAD is applied.

func FetchWithSPDXOptions

func FetchWithSPDXOptions(opts ...SPDXOption) FetchOption

FetchWithSPDXOptions appends SPDX-specific options to apply to any SPDX locator to be fetched.

func FetchWithSkipRawURL

func FetchWithSkipRawURL(skipped bool) FetchOption

FetchWithSkipRawURL disables the attempt to short-circuit git if a SCM raw-content URL is available for the remote resource.

type Fetcher

type Fetcher struct {
	// contains filtered or unexported fields
}

Fetcher allows for working with vcs repositories to perform cloning, sparse cloning and single file fetching.

The Fetcher is intended for read-only capture of remote resources. If you need to mutate the cloned resources, please consider using another tool.

Concurrency

The Fetcher is stateless and may be called concurrently.

All fetches are carried out independently. If you plan to fetch multiple resources from a single repository, consider using a Cloner for better performance.

func NewFetcher

func NewFetcher(opts ...FetchOption) *Fetcher

NewFetcher builds a Fetcher to retrieve single files from a vcs repository.

func (*Fetcher) Fetch

func (f *Fetcher) Fetch(ctx context.Context, w io.Writer, location string) error

Fetch a single file from a vcs location string.

The content of the fetched file is copied to the passed io.Writer.

The string argument must be a valid URL.

func (*Fetcher) FetchLocator

func (f *Fetcher) FetchLocator(ctx context.Context, w io.Writer, locator Locator) error

FetchLocator fetches a single file specified by a Locator from a vcs location.

The content of the fetched file is copied to the passed io.Writer.

If you want to retrieve a locator representing a folder, use Cloner.CloneLocator with a sparse option.

NOTE: this package provides 2 implementations of the Locator. You may pass your own implementation of this interface to this method.

func (*Fetcher) FetchURL

func (f *Fetcher) FetchURL(ctx context.Context, w io.Writer, u *url.URL) error

FetchURL fetches a single file from a vcs location given as a URL.

The content of the fetched file is copied to the passed io.Writer.

If the URL is detected to be a valid SPDX locator, it is equivalent to Fetcher.FetchLocator with a SPDXLocator. Otherwise, it falls back to git-url parsing and is equivalent to Fetcher.FetchLocator with a GitLocator.

If you want to retrieve a URL representing a folder, use Cloner.CloneURL with a sparse option instead.

type GitLocator

type GitLocator struct {
	url.Userinfo

	Provider  string
	Transport string
	Host      string
	RepoPath  string
	Ref       string
	SubPath   string
	// contains filtered or unexported fields
}

GitLocator describes a URL used to access a vcs resource over git using common URL formats (github, gitlab, ...).

The URL may use schemes git, http, https or ssh.

See https://git-scm.com/docs/git-fetch#_git_urls for reference.

func GitLocatorFromURL

func GitLocatorFromURL(u *url.URL, opts ...GitLocatorOption) (*GitLocator, error)

GitLocatorFromURL builds a GitLocator from a url.URL.

func ParseGitLocator

func ParseGitLocator(location string, opts ...GitLocatorOption) (*GitLocator, error)

ParseGitLocator builds a GitLocator from a URL string.

func (*GitLocator) HasAuth

func (l *GitLocator) HasAuth() bool

func (*GitLocator) IsLocal

func (l *GitLocator) IsLocal() bool

func (*GitLocator) Path

func (l *GitLocator) Path() string

func (*GitLocator) RepoURL

func (l *GitLocator) RepoURL() *url.URL

func (*GitLocator) String

func (l *GitLocator) String() string

func (*GitLocator) Version

func (l *GitLocator) Version() string

type GitLocatorOption

type GitLocatorOption func(*gitLocatorOptions)

GitLocatorOption is an option to parse a git locator (aka git-url).

func GitWithRequiredVersion

func GitWithRequiredVersion(required bool) GitLocatorOption

GitWithRequiredVersion tells the GitLocator parser to check that the location comes with an explicit version.

func GitWithRootURL

func GitWithRootURL[T string | *url.URL | url.URL](root T) GitLocatorOption

GitWithRootURL declares a URL (as a url.URL or as a string) to prepend to "slug-like" abbreviated locators.

Example to resolve github repo slugs: rootURL = https://github.com

NOTE: GitWithRootURL panics if the argument passed is a string representing an invalid URL.

type Locator

type Locator interface {
	// RepoURL yields the base URL of the vcs repository,
	// e.g. https://github.com/fredbi/go-vcsfetch
	RepoURL() *url.URL

	// Version yields the ref identifying the desired version of a file, e.g. v0.0.1
	Version() string

	// Path yields the file path relative to the repository,
	// e.g. internal/git/api.go
	Path() string

	// IsLocal indicates if the repository is local,
	// e.g. the URL looks like file://src/fred/github.com/fredbi/go-vcsfetch
	IsLocal() bool

	// HasAuth indicates if the [Locator] embeds some credentials,
	// e.g. the URL looks like https://fredbi:token@github.com/fredbi/go-vcsfetch
	HasAuth() bool

	String() string
}

Locator is the interface for types that know how to resolve a vcs URL.

This package currently exposes two implementations: SPDXLocator and GitLocator.

Users of the Fetcher and the Cloner may implement a custom Locator to meet special requirements.
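A custom implementation might look like the sketch below. The Locator interface is redeclared locally so that the example compiles on its own, and `pinnedFile` and `newPinnedFile` are hypothetical names. In real code, the value would be passed to Fetcher.FetchLocator or Cloner.CloneLocator.

```go
package main

import (
	"fmt"
	"net/url"
)

// Locator mirrors the method set documented above.
type Locator interface {
	RepoURL() *url.URL
	Version() string
	Path() string
	IsLocal() bool
	HasAuth() bool
	String() string
}

// pinnedFile is a hypothetical Locator for one file at a fixed ref.
type pinnedFile struct {
	repo *url.URL
	ref  string
	path string
}

// newPinnedFile parses the repository URL and builds the locator.
func newPinnedFile(repoURL, ref, path string) (Locator, error) {
	u, err := url.Parse(repoURL)
	if err != nil {
		return nil, err
	}
	return pinnedFile{repo: u, ref: ref, path: path}, nil
}

func (p pinnedFile) RepoURL() *url.URL { return p.repo }
func (p pinnedFile) Version() string   { return p.ref }
func (p pinnedFile) Path() string      { return p.path }

// IsLocal treats only the file scheme as local.
func (p pinnedFile) IsLocal() bool { return p.repo.Scheme == "file" }

// HasAuth reports whether the URL embeds user information.
func (p pinnedFile) HasAuth() bool { return p.repo.User != nil }

// String renders the locator using the "@" and "#" conventions.
func (p pinnedFile) String() string {
	return fmt.Sprintf("%s@%s#%s", p.repo, p.ref, p.path)
}

func main() {
	loc, err := newPinnedFile("https://github.com/fredbi/go-vcsfetch", "v0.0.1", "internal/git/api.go")
	if err != nil {
		panic(err)
	}
	fmt.Println(loc)
}
```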

type SPDXLocator

type SPDXLocator struct {
	url.Userinfo

	Tool      string
	Transport string
	Host      string
	RepoPath  string
	Ref       string
	SubPath   string
}

SPDXLocator describes a SPDX VCS locator, with all its components detailed.

It implements the Locator interface.

The SPDX (Software Package Data Exchange) specification provides a detailed framework for referencing software components, including through Version Control System (VCS) locators.

Normative references

TL;DR: the SPDX locator comes with a "@" in the URL path for the version and a "#" URL fragment for the target file (or directory).

SPDX VCS Locator format

The VCS location syntax, as described in the latest SPDX version, resembles a URL with specific structure to accommodate different version control systems.

VCS Location structure

Format:

<vcs_tool>+<transport>://<host_name>[/<path_to_repository>][@<revision_tag_or_branch>][#<sub_path>]

Where:

<vcs_tool>: Specifies the type of version control system (e.g., git, hg, svn, bzr).
<transport>: Indicates the transport mechanism (e.g., ssh, https).
<host_name>: The server or host where the repository resides.
<path_to_repository>: The path to the repository if applicable.
<revision_tag_or_branch>: Identifies a specific commit, branch, or tag in the repository.
<sub_path>: Optional, specifies a sub-directory or file path within the repository.

Examples
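The following sketch decomposes a locator according to the format above, using only net/url. The `parts` type and `split` function are hypothetical and much simpler than ParseSPDXLocator: they perform no validation and simply default the tool to "git" when the scheme carries no "+".

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parts is a hypothetical holder for the components of the format.
type parts struct {
	Tool, Transport, Host, RepoPath, Ref, SubPath string
}

// split decomposes <vcs_tool>+<transport>://<host>[/<path>][@<ref>][#<sub_path>].
func split(location string) (parts, error) {
	u, err := url.Parse(location)
	if err != nil {
		return parts{}, err
	}
	p := parts{
		Tool:      "git", // assumed default when the scheme has no "<tool>+" prefix
		Transport: u.Scheme,
		Host:      u.Host,
		RepoPath:  u.Path,
		SubPath:   u.Fragment,
	}
	if tool, transport, ok := strings.Cut(u.Scheme, "+"); ok {
		p.Tool, p.Transport = tool, transport
	}
	// The version follows the last "@" found in the URL path.
	if i := strings.LastIndex(u.Path, "@"); i >= 0 {
		p.RepoPath, p.Ref = u.Path[:i], u.Path[i+1:]
	}
	return p, nil
}

func main() {
	p, err := split("git+https://github.com/fredbi/go-vcsfetch@v0.0.1#internal/git/api.go")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```

Credentials placed before the host do not interfere with the version split, since only the URL path is searched for "@".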

Implementation tolerances and limitations

Our use-case for SPDX locators is limited to single file retrieval:

  • a URL fragment is required

Our implementation supports a full URL with the following:

  • an empty "vcs-tool" part is tolerated in the scheme and defaults to "git". Therefore schemes such as "git+https" and "https" are equivalent.
  • "username:password" credentials
  • hostname port
  • query parameters in URL are ignored but tolerated
  • the absence of an explicit reference provided with "@" will be resolved as the head of the default branch

Optionally, the SPDXLocator may support SCM-specific shorthands using "git repo slugs":

The implied vcs base URL is customizable with SPDXWithRootURL.

func ParseSPDXLocator

func ParseSPDXLocator(location string, opts ...SPDXOption) (*SPDXLocator, error)

ParseSPDXLocator parses a VCS locator string and returns its components as a SPDXLocator.

func SPDXLocatorFromURL

func SPDXLocatorFromURL(u *url.URL, opts ...SPDXOption) (*SPDXLocator, error)

SPDXLocatorFromURL parses a URL into a SPDXLocator.

func (*SPDXLocator) HasAuth

func (l *SPDXLocator) HasAuth() bool

func (*SPDXLocator) IsLocal

func (l *SPDXLocator) IsLocal() bool

func (*SPDXLocator) Path

func (l *SPDXLocator) Path() string

func (*SPDXLocator) RepoURL

func (l *SPDXLocator) RepoURL() *url.URL

func (*SPDXLocator) String

func (l *SPDXLocator) String() string

func (*SPDXLocator) Version

func (l *SPDXLocator) Version() string

type SPDXOption

type SPDXOption func(*spdxOptions)

SPDXOption is an option to parse a SPDX locator URL.

func SPDXWithRequiredVersion

func SPDXWithRequiredVersion(required bool) SPDXOption

SPDXWithRequiredVersion tells the SPDXLocator parser to check that the location comes with an explicit version.

func SPDXWithRootURL

func SPDXWithRootURL[T string | *url.URL | url.URL](root T) SPDXOption

SPDXWithRootURL declares a URL (as a url.URL or as a string) to prepend to "slug-like" abbreviated locators.

Example to resolve github repo slugs: rootURL = https://github.com

NOTE: SPDXWithRootURL panics if the argument passed is a string representing an invalid URL.

Directories

Path Synopsis
internal
git
giturl
Package giturl detects and parses vcs URLs for well-known SCM platforms.
giturl/azure
Package azure provides URL parsing and raw content URL generation for Azure DevOps.
