huggingfacehub

v0.0.0-...-3cc43a7
Published: Nov 4, 2023 License: Apache-2.0 Imports: 17 Imported by: 0

README

huggingfacehub

Download models and tokenizers from HuggingFace Hub: a Go port of the huggingface_hub Python library.

Introduction

A simple, straightforward Go port of the github.com/huggingface/huggingface_hub library.

It is not yet very ergonomic: all parameters are mandatory, which makes it somewhat annoying to use. Hopefully end users will never have to use this library directly, and will instead use it through [github.com/gomlx/tokenizers] and [github.com/gomlx/transformers].

Features supported:

  • Cache system that matches the HuggingFace Hub layout (so the same cache can be shared with Python).
  • Locked files (to guarantee only one download when multiple workers try to download the same model simultaneously).
  • An arbitrary progress function can be called during downloads (for progress bars).
  • Arbitrary revision.

TODOs:

  • Add support for optional parameters.
  • Authentication tokens: should be relatively easy.
  • Resume downloads from interrupted connections.
  • Check disk-space before starting to download.

Documentation

Constants

const (
	HeaderXRepoCommit = "X-Repo-Commit"
	HeaderXLinkedETag = "X-Linked-Etag"
	HeaderXLinkedSize = "X-Linked-Size"
)
const RepoIdSeparator = "--"

RepoIdSeparator is used to separate the parts of a repository/model name when mapping it to a file name. Likely only for internal use.

Variables

var (
	// DefaultDirCreationPerm is used when creating new cache subdirectories.
	DefaultDirCreationPerm = os.FileMode(0755)

	// DefaultFileCreationPerm is used when creating files inside the cache subdirectories.
	DefaultFileCreationPerm = os.FileMode(0644)
)
var (
	RepoTypesUrlPrefixes = map[string]string{
		"dataset": "datasets/",
		"space":   "spaces/",
	}

	DefaultRevision = "main"

	HuggingFaceUrlTemplate = template.Must(template.New("hf_url").Parse(
		"https://huggingface.co/{{.RepoId}}/resolve/{{.Revision}}/{{.Filename}}"))
)
var SessionId string

Functions

func DefaultCacheDir

func DefaultCacheDir() string

DefaultCacheDir returns the default cache directory for HuggingFace Hub, the same one used by the Python library.

Its prefix is `${XDG_CACHE_HOME}` if set, or `~/.cache` otherwise, followed by `/huggingface/hub/`. So typically: `~/.cache/huggingface/hub/`.

func Download

func Download(ctx context.Context, client *http.Client,
	repoId, repoType, revision, fileName, cacheDir, token string,
	forceDownload, forceLocal bool, progressFn ProgressFn) (filePath, commitHash string, err error)

Download returns a file, either from the cache or by downloading it from HuggingFace Hub.

TODO: a version with optional parameters.

Args:

  • `ctx`: context for the requests. There may be more than one request, the first being an HTTP `HEAD`.
  • `client`: used to make HTTP requests. It can be created with `&http.Client{}`.
  • `repoId` and `fileName`: define the file and repository (model) name to download.
  • `repoType`: usually "model".
  • `revision`: default is "main", but a commitHash can be given.
  • `cacheDir`: directory where to store the downloaded files, or reuse if previously downloaded. Consider using the output from `DefaultCacheDir()` if in doubt.
  • `token`: used for authentication. TODO: not implemented yet.
  • `forceDownload`: if set to true, it will download the contents of the file even if there is a local copy.
  • `forceLocal`: does not use the network, not even for reading the metadata.
  • `progressFn`: called during the download of a file. It is called synchronously and is expected to return quickly. If the UI can block, arrange for it to be handled in a separate goroutine.

On success it returns the `filePath` to the downloaded file, and its `commitHash`. Otherwise it returns an error.

func FileExists

func FileExists(path string) bool

FileExists returns true if the file or directory exists.

func GetHeaders

func GetHeaders(userAgent, token string) map[string]string

GetHeaders is based on the `build_hf_headers` function defined in the [huggingface_hub](https://github.com/huggingface/huggingface_hub) library. TODO: add support for authentication token.

func GetUrl

func GetUrl(repoId, fileName, repoType, revision string) string

GetUrl is based on the `hf_hub_url` function defined in the [huggingface_hub](https://github.com/huggingface/huggingface_hub) library.
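As a rough illustration of how such a URL could be assembled from the `HuggingFaceUrlTemplate`, `RepoTypesUrlPrefixes` and `DefaultRevision` variables listed above, consider the sketch below. It is not the package's implementation: `buildURL` is a hypothetical name, the treatment of an empty revision is an assumption, and the sketch performs no URL escaping.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Local copies of the values shown in the Variables section.
var repoTypePrefixes = map[string]string{
	"dataset": "datasets/",
	"space":   "spaces/",
}

var urlTemplate = template.Must(template.New("hf_url").Parse(
	"https://huggingface.co/{{.RepoId}}/resolve/{{.Revision}}/{{.Filename}}"))

// buildURL is a hypothetical helper: it prefixes dataset and space
// repositories, defaults an empty revision to "main" (an assumption), and
// fills in the template. No URL escaping is applied.
func buildURL(repoId, fileName, repoType, revision string) string {
	if prefix, ok := repoTypePrefixes[repoType]; ok {
		repoId = prefix + repoId
	}
	if revision == "" {
		revision = "main"
	}
	var buf bytes.Buffer
	data := struct{ RepoId, Revision, Filename string }{repoId, revision, fileName}
	if err := urlTemplate.Execute(&buf, data); err != nil {
		panic(err) // cannot happen with this fixed template and plain string fields
	}
	return buf.String()
}

func main() {
	fmt.Println(buildURL("bert-base-uncased", "config.json", "model", ""))
	fmt.Println(buildURL("squad", "README.md", "dataset", "main"))
}
```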

func HttpUserAgent

func HttpUserAgent() string

HttpUserAgent returns a user agent to use with HuggingFace Hub API. Loosely based on https://github.com/huggingface/transformers/blob/main/src/transformers/utils/hub.py#L198.

func RepoFolderName

func RepoFolderName(repoId, repoType string) string

RepoFolderName returns a serialized version of a hf.co repo name and type, safe for disk storage as a single non-nested folder.

Based on github.com/huggingface/huggingface_hub repo_folder_name.
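The Python `repo_folder_name` function pluralizes the repository type and joins it with the `/`-separated parts of the repository id using `--` (the `RepoIdSeparator` constant above). A minimal Go sketch of that scheme, with `folderName` as a hypothetical name and no claim to match this package's code line for line:

```go
package main

import (
	"fmt"
	"strings"
)

// folderName flattens a repository id such as "org/model" into a single
// directory name: "<repoType>s--org--model".
// Hypothetical helper modeled on the Python repo_folder_name function.
func folderName(repoId, repoType string) string {
	parts := append([]string{repoType + "s"}, strings.Split(repoId, "/")...)
	return strings.Join(parts, "--")
}

func main() {
	fmt.Println(folderName("google/bert_uncased", "model"))
	fmt.Println(folderName("squad", "dataset"))
}
```

Flattening the name this way keeps each repository in one non-nested folder, which is what allows the Go and Python libraries to share a cache directory.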

Types

type HFFileMetadata

type HFFileMetadata struct {
	CommitHash, ETag, Location string
	Size                       int
}

HFFileMetadata holds the file metadata used by HuggingFace Hub.

type ProgressFn

type ProgressFn func(progress, downloaded, total int, eof bool)

ProgressFn is a function called while downloading a file. It will be called with `progress=0` and `downloaded=0` at the first call, when download starts.
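Since the callback runs synchronously inside the download, a slow UI should be decoupled from it. The sketch below shows one way to do that with a buffered channel and a goroutine. `asyncProgress` and `update` are hypothetical names, the `ProgressFn` type is copied locally so the example compiles on its own, and the sketch assumes `eof=true` is reported exactly once, on the final call.

```go
package main

import (
	"fmt"
	"sync"
)

// ProgressFn copies the signature documented above.
type ProgressFn func(progress, downloaded, total int, eof bool)

// update is a hypothetical snapshot handed to the rendering goroutine.
type update struct {
	downloaded, total int
	eof               bool
}

// asyncProgress returns a ProgressFn that forwards updates to render on a
// separate goroutine, plus a wait function that blocks until rendering ends.
// Assumption: eof=true arrives exactly once, as the last call.
func asyncProgress(render func(update)) (ProgressFn, func()) {
	ch := make(chan update, 16)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for u := range ch {
			render(u)
		}
	}()
	fn := func(progress, downloaded, total int, eof bool) {
		u := update{downloaded: downloaded, total: total, eof: eof}
		if eof {
			ch <- u // always deliver the final update, even if that means waiting
			close(ch)
			return
		}
		select {
		case ch <- u:
		default: // renderer is behind: drop this intermediate update
		}
	}
	return fn, wg.Wait
}

func main() {
	fn, wait := asyncProgress(func(u update) {
		fmt.Printf("%d/%d bytes (done=%v)\n", u.downloaded, u.total, u.eof)
	})
	// Simulated calls; a real download would invoke fn itself.
	fn(0, 0, 100, false)
	fn(50, 50, 100, false)
	fn(50, 100, 100, true)
	wait()
}
```

Dropping intermediate updates when the buffer is full keeps the callback fast; only the final update is allowed to block, so the UI always sees completion.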
