file

package
v0.0.11
Warning

This package is not in the latest version of its module.

Published: Mar 7, 2024 License: Apache-2.0 Imports: 14 Imported by: 42

Documentation

Overview

Package file provides basic file operations across multiple file-system types. It is designed for use in applications that operate uniformly on multiple storage types, such as local files, S3 and HTTP.

This package is designed with the following goals:

- Support popular file systems, especially S3 and the local file system.

- Define operation semantics that are implementable on all the supported file systems, yet practical and usable.

- Be extensible, leaving room for things like registering new file system types or ticket-based authorization.

This package defines two key interfaces, Implementation and File.

- Implementation provides filesystem operations, such as Open, Remove, and List (directory walking).

- File implements operations on a file. It is created by Implementation.{Open,Create} calls. File is similar to Go's os.File object but provides limited functionality.

Reading and writing files

The following snippet shows registering an S3 implementation, then writing and reading an S3 file.

import (
	"context"
	"io/ioutil"

	"github.com/grailbio/base/file"
	"github.com/grailbio/base/file/s3file" // file.Implementation implementation for S3
)

func init() {
	// RegisterImplementation takes a factory; it is invoked once, on the first "s3://" request.
	file.RegisterImplementation("s3", func() file.Implementation {
		return s3file.NewImplementation(s3file.NewDefaultProvider())
	})
}

func WriteTest() error {
	ctx := context.Background()
	f, err := file.Create(ctx, "s3://grail-saito/tmp/test.txt")
	if err != nil {
		return err
	}
	if _, err := f.Writer(ctx).Write([]byte("Hello")); err != nil {
		f.Discard(ctx) // abandon the pending write instead of committing it
		return err
	}
	return f.Close(ctx) // Close commits the contents
}

func ReadTest() ([]byte, error) {
	ctx := context.Background()
	f, err := file.Open(ctx, "s3://grail-saito/tmp/test.txt")
	if err != nil {
		return nil, err
	}
	data, err := ioutil.ReadAll(f.Reader(ctx))
	if cerr := f.Close(ctx); err == nil {
		err = cerr // report a Close failure only if the read succeeded
	}
	return data, err
}

To open a file for reading or writing, call file.Open(ctx, "s3://bucket/key") or file.Create(ctx, "s3://bucket/key"). A File object does not implement io.Reader or io.Writer directly. Instead, you must call File.Reader or File.Writer to start reading or writing. These methods are split from the File itself so that an application can pass different contexts to different I/O operations.

File-system operations

The file package provides functions similar to those in the standard os package. For example, file.Remove(ctx, "s3://bucket/key") removes a file, and file.Stat(ctx, "s3://bucket/key") returns metadata about the file.

Pathname utility functions

The file package also provides functions that are similar to those in the standard filepath package. The functions file.Base, file.Dir, and file.Join work just like filepath.{Base,Dir,Join}, except that they handle URL pathnames correctly. For example, file.Join("s3://foo", "bar") returns "s3://foo/bar", whereas filepath.Join("s3://foo", "bar") would return "s3:/foo/bar".

Registering a filesystem implementation

Function RegisterImplementation associates an implementation with a scheme ("s3", "http", "git", etc.). A local file system implementation is automatically available without any explicit registration. RegisterImplementation is usually invoked when a process starts up, for all the supported file system types. For example:

import (
	"context"
	"io/ioutil"

	"github.com/grailbio/base/file"
	"github.com/grailbio/base/file/s3file" // file.Implementation implementation for S3
)

func init() {
	// The scheme is "s3", without a trailing colon.
	file.RegisterImplementation("s3", func() file.Implementation {
		return s3file.NewImplementation(...)
	})
}

func main() {
	ctx := context.Background()
	f, err := file.Open(ctx, "s3://somebucket/foo.txt")
	if err != nil { ... }
	data, err := ioutil.ReadAll(f.Reader(ctx))
	err = f.Close(ctx)
	...
}

Once an implementation is registered, files for that scheme can be opened or created using a "scheme://name" pathname.

Differences from the os package

The file package is similar to Go's standard os package. The differences are the following.

- The file package focuses on providing a file-like API for object storage systems, such as S3 or GCS.

- Mutations to a File are restricted to whole-file writes. There is no option to overwrite a part of an existing file.

- All the operations take a context parameter.

- file.File implements neither io.Reader nor io.Writer directly. One must call the File.Reader or File.Writer method to obtain a reader or writer object.

- Directories are simulated in a best-effort manner on implementations that do not support directories as first-class entities, such as S3. Lister.IsDir reports whether the current path is a directory, and Stat returns a nil Info for directories.

Concurrency

The Implementation and File provide open-close consistency. More specifically, this package linearizes fileops, where a fileop is a set of operations that starts with Implementation.{Open,Create}, continues with read/write/stat operations on the file, and ends with File.Close. Operations such as Implementation.{Stat,Remove,List} and Lister.Scan each form a singleton fileop.

Caution: a local file system on NFS (without cache leasing) does not provide this guarantee. Use NFS at your own risk.

Example (Localfile)

Example_localfile is an example of basic read/write operations on the local file system.

package main

import (
	"context"
	"fmt"
	"io/ioutil"

	"github.com/grailbio/base/file"
)

func main() {
	doWrite := func(ctx context.Context, data []byte, path string) {
		out, err := file.Create(ctx, path)
		if err != nil {
			panic(err)
		}
		if _, err = out.Writer(ctx).Write(data); err != nil {
			panic(err)
		}
		if err := out.Close(ctx); err != nil {
			panic(err)
		}
	}

	doRead := func(ctx context.Context, path string) []byte {
		in, err := file.Open(ctx, path)
		if err != nil {
			panic(err)
		}
		data, err := ioutil.ReadAll(in.Reader(ctx))
		if err != nil {
			panic(err)
		}
		if err := in.Close(ctx); err != nil {
			panic(err)
		}
		return data
	}

	ctx := context.Background()
	doWrite(ctx, []byte("Blue box jumped over red bat"), "/tmp/foohah.txt")
	fmt.Printf("Got: %s\n", string(doRead(ctx, "/tmp/foohah.txt")))
}
Output:

Got: Blue box jumped over red bat

Constants

This section is empty.

Variables

This section is empty.

Functions

func Base

func Base(path string) string

Base returns the last element of the path. It is the same as filepath.Base for a local filesystem path. Otherwise, it acts like filepath.Base, with the following differences: (1) the path separator is always '/'. (2) if the URL suffix is empty, it returns the path itself.

Example:

file.Base("s3://") returns "s3://".
file.Base("s3://foo/hah/") returns "hah".
Example
package main

import (
	"fmt"

	"github.com/grailbio/base/file"
)

func main() {
	fmt.Println(file.Base(""))
	fmt.Println(file.Base("foo1"))
	fmt.Println(file.Base("foo2/"))
	fmt.Println(file.Base("/"))
	fmt.Println(file.Base("s3://"))
	fmt.Println(file.Base("s3://blah1"))
	fmt.Println(file.Base("s3://blah2/"))
	fmt.Println(file.Base("s3://foo/blah3//"))
}
Output:

.
foo1
foo2
/
s3://
blah1
blah2
blah3
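The rules above can be restated as a short function. The hypothetical base below is a sketch that reproduces the example output on a Unix-like system (it defers to filepath.Base for local paths, so results differ on Windows), and it assumes a path is a URL when "://" appears before any "/".

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// base is a hypothetical re-derivation of the documented Base rules, not the
// package's source. A path counts as a URL when "://" appears before any "/".
func base(p string) string {
	i := strings.Index(p, "://")
	if i < 0 || strings.Contains(p[:i], "/") {
		return filepath.Base(p) // local path: same as filepath.Base
	}
	suffix := strings.TrimRight(p[i+3:], "/") // separator is always '/'
	if suffix == "" {
		return p // empty URL suffix: return the path itself
	}
	return suffix[strings.LastIndex(suffix, "/")+1:]
}

func main() {
	for _, p := range []string{"", "foo2/", "/", "s3://", "s3://foo/blah3//"} {
		fmt.Printf("%q -> %q\n", p, base(p))
	}
}
```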

func CloseAndReport deprecated

func CloseAndReport(ctx context.Context, f Closer, err *error)

CloseAndReport is a defer-able helper that calls f.Close and reports errors, if any, to *err. Pass your function's named return error. Example usage:

func processFile(filename string) (_ int, err error) {
  ctx := context.Background()
  f, err := file.Open(ctx, filename)
  if err != nil { ... }
  defer file.CloseAndReport(ctx, f, &err)
  ...
}

If your function returns with an error, any f.Close error will be chained appropriately.

Deprecated: Use errors.CleanUpCtx directly.

func Dir

func Dir(path string) string

Dir returns all but the last element of the path. It is the same as filepath.Dir for a local filesystem path. Otherwise, it acts like filepath.Dir, with the following differences: (1) the path separator is always '/'. (2) if the URL suffix is empty, it returns the path itself. (3) The path is not cleaned; for example, repeated "/"s in the path are preserved.

Example
package main

import (
	"fmt"

	"github.com/grailbio/base/file"
)

func main() {
	fmt.Println(file.Dir("foo"))
	fmt.Println(file.Dir("."))
	fmt.Println(file.Dir("/a/b"))
	fmt.Println(file.Dir("a/b"))
	fmt.Println(file.Dir("s3://ab/cd"))
	fmt.Println(file.Dir("s3://ab//cd"))
	fmt.Println(file.Dir("s3://a/b/"))
	fmt.Println(file.Dir("s3://a/b//"))
	fmt.Println(file.Dir("s3://a//b//"))
	fmt.Println(file.Dir("s3://a"))
}
Output:

.
.
/a
a
s3://ab
s3://ab
s3://a/b
s3://a/b
s3://a//b
s3://

func IsAbs

func IsAbs(path string) bool

IsAbs returns true if pathname is an absolute local path. For a non-local file, it always returns true.

Example
package main

import (
	"fmt"

	"github.com/grailbio/base/file"
)

func main() {
	fmt.Println(file.IsAbs("foo"))
	fmt.Println(file.IsAbs("/foo"))
	fmt.Println(file.IsAbs("s3://foo"))
}
Output:

false
true
true

func Join

func Join(elems ...string) string

Join joins any number of path elements into a single path, adding a separator if necessary. It works like filepath.Join, with the following differences:

  1. The path separator is always '/' (so this doesn't work on Windows).
  2. The interior of each element is not cleaned; for example if an element contains repeated "/"s in the middle, they are preserved.
  3. If elems[0] has a prefix of the form "<scheme>://" or "//", that prefix is retained. (A prefix of "/" is also retained; that matches filepath.Join's behavior.)
Example
package main

import (
	"fmt"

	"github.com/grailbio/base/file"
)

func main() {
	fmt.Println(file.Join())
	fmt.Println(file.Join(""))
	fmt.Println(file.Join("foo", "bar"))
	fmt.Println(file.Join("foo", ""))
	fmt.Println(file.Join("foo", "/bar/"))
	fmt.Println(file.Join(".", "foo:bar"))
	fmt.Println(file.Join("s3://foo"))
	fmt.Println(file.Join("s3://foo", "/bar/"))
	fmt.Println(file.Join("s3://foo", "", "bar"))
	fmt.Println(file.Join("s3://foo", "0"))
	fmt.Println(file.Join("s3://foo", "abc"))
	fmt.Println(file.Join("s3://foo//bar", "/", "/baz"))
}
Output:

foo/bar
foo
foo/bar
./foo:bar
s3://foo
s3://foo/bar
s3://foo/bar
s3://foo/0
s3://foo/abc
s3://foo//bar/baz
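The three rules can be sketched as follows. The hypothetical join below reproduces the example output above, but it is a re-derivation from the documented rules rather than the package's source, so edge cases may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// join is a hypothetical sketch of the three documented Join rules; the
// package's real implementation may differ in edge cases.
func join(elems ...string) string {
	if len(elems) == 0 {
		return ""
	}
	// Rule 3: retain a "<scheme>://", "//" or "/" prefix of the first element.
	first, prefix := elems[0], ""
	if i := strings.Index(first, "://"); i > 0 && !strings.Contains(first[:i], "/") {
		prefix, first = first[:i+3], first[i+3:]
	} else if strings.HasPrefix(first, "//") {
		prefix, first = "//", first[2:]
	} else if strings.HasPrefix(first, "/") {
		prefix, first = "/", first[1:]
	}
	// Rules 1 and 2: join with '/', trimming only the slashes at element
	// boundaries, so interior "//" survives and "." is not cleaned away.
	var parts []string
	for _, e := range append([]string{first}, elems[1:]...) {
		if e = strings.Trim(e, "/"); e != "" {
			parts = append(parts, e)
		}
	}
	return prefix + strings.Join(parts, "/")
}

func main() {
	fmt.Println(join(".", "foo:bar"))               // ./foo:bar
	fmt.Println(join("s3://foo//bar", "/", "/baz")) // s3://foo//bar/baz
}
```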

func MustClose added in v0.0.2

func MustClose(ctx context.Context, f Closer)

MustClose is a defer-able function that calls f.Close and panics on error.

Example:

ctx := context.Background()
f, err := file.Open(ctx, filename)
if err != nil { panic(err) }
defer file.MustClose(ctx, f)
...

func MustParsePath

func MustParsePath(path string) (scheme, suffix string)

MustParsePath is similar to ParsePath, but crashes the process on error.

func ParsePath

func ParsePath(path string) (scheme, suffix string, err error)

ParsePath parses "path" into a scheme and a suffix. The path can be either of the form "scheme://path" or just "path0/.../pathN". The latter indicates a local file.

On success, "scheme" will be the scheme part of the path, and "suffix" will be the part after "scheme://". For example, ParsePath("s3://bucket/key") will return ("s3", "bucket/key", nil).

For a local-filesystem path, this function returns ("", path, nil).

Example
package main

import (
	"fmt"

	"github.com/grailbio/base/file"
)

func main() {
	parse := func(path string) {
		scheme, suffix, err := file.ParsePath(path)
		if err != nil {
			fmt.Printf("%s 🢥 error %v\n", path, err)
			return
		}
		fmt.Printf("%s 🢥 scheme \"%s\", suffix \"%s\"\n", path, scheme, suffix)
	}
	parse("/tmp/test")
	parse("foo://bar")
	parse("foo:///bar")
	parse("foo:bar")
	parse("/foo:bar")
}
Output:

/tmp/test 🢥 scheme "", suffix "/tmp/test"
foo://bar 🢥 scheme "foo", suffix "bar"
foo:///bar 🢥 scheme "foo", suffix "/bar"
foo:bar 🢥 error parsepath foo:bar: a URL must start with 'scheme://'
/foo:bar 🢥 scheme "", suffix "/foo:bar"
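One rule consistent with these outputs: the scheme is whatever precedes the first ':' as long as no '/' comes before it, and it must be followed by "://". The hypothetical parsePath below sketches that rule and reuses the error text shown above; the package's actual parser may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePath is a hypothetical sketch that reproduces the documented outputs;
// it is not the package's source.
func parsePath(p string) (scheme, suffix string, err error) {
	colon := strings.Index(p, ":")
	if colon < 0 || strings.Contains(p[:colon], "/") {
		return "", p, nil // local path such as "/tmp/test" or "/foo:bar"
	}
	if !strings.HasPrefix(p[colon:], "://") {
		return "", "", fmt.Errorf("parsepath %s: a URL must start with 'scheme://'", p)
	}
	return p[:colon], p[colon+len("://"):], nil
}

func main() {
	for _, p := range []string{"/tmp/test", "foo://bar", "foo:///bar", "foo:bar", "/foo:bar"} {
		scheme, suffix, err := parsePath(p)
		fmt.Printf("%s: scheme %q, suffix %q, err %v\n", p, scheme, suffix, err)
	}
}
```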

func Presign added in v0.0.2

func Presign(ctx context.Context, path, method string, expiry time.Duration) (string, error)

Presign is a shortcut for calling ParsePath(), then calling the Implementation.Presign method.

func ReadFile

func ReadFile(ctx context.Context, path string, opts ...Opts) ([]byte, error)

ReadFile reads the given file and returns the contents. A successful call returns err == nil, not err == EOF. The opts argument is passed to file.Open.

func RegisterImplementation

func RegisterImplementation(scheme string, implFactory func() Implementation)

RegisterImplementation arranges for paths of the form scheme + "://anystring" to be handled by the Implementation that implFactory returns; ParsePath on such a path returns (scheme, "anystring", nil). scheme is a string such as "s3" or "http".

RegisterImplementation() should generally be called when the process starts. implFactory will be invoked exactly once, upon the first request to this scheme; this allows you to register with a factory that has not yet been fully configured (e.g., it requires parsing command line flags) as long as it will be configured before the first request.

REQUIRES: This function has not been called with the same scheme before.
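The lazy, exactly-once factory behavior can be sketched with sync.Once. The registry type, Impl interface, and s3Stub below are hypothetical. Panicking on a duplicate scheme is an assumption about how the REQUIRES clause might be enforced; returning nil for an unknown scheme matches what FindImplementation documents.

```go
package main

import (
	"fmt"
	"sync"
)

// Impl is a hypothetical placeholder for file.Implementation.
type Impl interface{ String() string }

type entry struct {
	once    sync.Once
	factory func() Impl
	impl    Impl
}

// registry is a hypothetical sketch of scheme registration with lazy,
// exactly-once construction; it is not the package's implementation.
type registry struct {
	mu      sync.Mutex
	entries map[string]*entry
}

func newRegistry() *registry { return &registry{entries: map[string]*entry{}} }

// Register stores the factory without calling it; duplicates panic.
func (r *registry) Register(scheme string, factory func() Impl) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, dup := r.entries[scheme]; dup {
		panic("scheme already registered: " + scheme)
	}
	r.entries[scheme] = &entry{factory: factory}
}

// Find builds the implementation on first use; unknown schemes yield nil.
func (r *registry) Find(scheme string) Impl {
	r.mu.Lock()
	e, ok := r.entries[scheme]
	r.mu.Unlock()
	if !ok {
		return nil
	}
	e.once.Do(func() { e.impl = e.factory() })
	return e.impl
}

type s3Stub struct{}

func (s3Stub) String() string { return "s3 stub" }

func main() {
	r := newRegistry()
	calls := 0
	r.Register("s3", func() Impl { calls++; return s3Stub{} })
	fmt.Println(calls)               // 0: nothing built yet
	fmt.Println(r.Find("s3"))        // s3 stub
	r.Find("s3")
	fmt.Println(calls)               // 1
	fmt.Println(r.Find("gs") == nil) // true
}
```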

func Remove

func Remove(ctx context.Context, path string) error

Remove is a shortcut for calling ParsePath(), then calling the Implementation.Remove method.

func RemoveAll

func RemoveAll(ctx context.Context, path string) error

RemoveAll removes path and any children it contains. It is unspecified whether empty directories are removed by this function. It removes everything it can but returns the first error it encounters. If the path does not exist, RemoveAll returns nil.

func WriteFile

func WriteFile(ctx context.Context, path string, data []byte) error

WriteFile writes data to the given file. If the file does not exist, WriteFile creates it; otherwise WriteFile truncates it before writing.

Types

type Closer

type Closer = ioctx.Closer

TODO: Migrate callers to use new location.

type ETagged added in v0.0.7

type ETagged interface {
	// ETag is an identifier assigned to a specific version of the file.
	ETag() string
}

ETagged defines a getter for a file with an ETag.

type Error

type Error struct {
	// contains filtered or unexported fields
}

Error implements io.{Reader,Writer,Seeker,Closer}. It returns the given error to any call.

func NewError

func NewError(err error) *Error

NewError returns a new Error object that returns the given error to any Read/Write/Seek/Close call.

func (*Error) Close

func (r *Error) Close() error

Close implements io.Closer.

func (*Error) Read

func (r *Error) Read([]byte) (int, error)

Read implements io.Reader.

func (*Error) Seek

func (r *Error) Seek(int64, int) (int64, error)

Seek implements io.Seeker.

func (*Error) Write

func (r *Error) Write([]byte) (int, error)

Write implements io.Writer.

type File

type File interface {
	// String returns a diagnostic string.
	String() string

	// Name returns the path name given to file.Open or file.Create when this
	// object was created.
	Name() string

	// Stat returns file metadata.
	//
	// REQUIRES: Close has not been called
	Stat(ctx context.Context) (Info, error)

	// Reader creates an io.ReadSeeker object that operates on the file.  If
	// Reader() is called multiple times, they share the seek pointer.
	//
	// For emphasis: these share state, which is different from OffsetReader!
	//
	// REQUIRES: Close has not been called
	Reader(ctx context.Context) io.ReadSeeker

	// OffsetReader creates a new, independent ioctx.ReadCloser, starting at
	// offset. Unlike Reader, its position in the file is only modified by Read
	// on this object. The returned object is not thread-safe, and callers are
	// responsible for serializing all of their calls, including calling Close
	// after all Reads are done. Of course, callers can use separate
	// OffsetReaders in parallel.
	//
	// Background: This API reflects S3's performance characteristics, where
	// initiating a new read position is relatively expensive, but then
	// streaming data is fast (including in parallel with multiple readers).
	//
	// REQUIRES: Close has not been called
	OffsetReader(offset int64) ioctx.ReadCloser

	// Writer creates a writer that writes to the file. If Writer() is called
	// multiple times, they share the seek pointer.
	//
	// REQUIRES: Close has not been called
	Writer(ctx context.Context) io.Writer

	// Discard discards a file before it is closed, relinquishing any
	// temporary resources implied by pending writes. This should be
	// used if the caller decides not to complete writing the file.
	// Discard is a best-effort operation. Discard is not defined for
	// files opened for reading. Exactly one of Discard or Close should
	// be called. No other File, io.ReadSeeker, or io.Writer methods
	// shall be called after Discard.
	Discard(ctx context.Context)

	// Close commits the contents of a written file, invalidating the
	// File and all Readers and Writers created from the file. Exactly
	// one of Discard or Close should be called. No other File or
	// io.ReadSeeker, io.Writer methods shall be called after Close.
	Closer
}

File defines operations on a file. Implementations must be thread safe.

func Create

func Create(ctx context.Context, path string, opts ...Opts) (File, error)

Create opens the given file write-only. It is a shortcut for calling ParsePath(), then FindImplementation, then Implementation.Create.

func Open

func Open(ctx context.Context, path string, opts ...Opts) (File, error)

Open opens the given file read-only. It is a shortcut for calling ParsePath(), then FindImplementation, then Implementation.Open.

Open returns an error of kind errors.NotExist if the file at the provided path does not exist.

type Implementation

type Implementation interface {
	// String returns a diagnostic string.
	String() string

	// Open opens a file for reading. The pathname given to file.Open() is passed
	// here unchanged. Thus, it contains the URL prefix such as "s3://".
	//
	// Open returns an error of kind errors.NotExist if there is
	// no file at the provided path.
	Open(ctx context.Context, path string, opts ...Opts) (File, error)

	// Create opens a file for writing. If "path" already exists, the old contents
	// will be destroyed. If "path" does not exist already, the file will be newly
	// created. The pathname given to file.Create() is passed here unchanged.
	// Thus, it contains the URL prefix such as "s3://".
	//
	// Creating a file with the same name as an existing directory is unspecified
	// behavior and varies by implementation. Users are thus advised to avoid
	// this if possible.
	//
	// For filesystem based storage engines (e.g. localfile), if the directory
	// part of the path does not exist already, it will be created. If the path
	// is a directory, an error will be returned.
	//
	// For key based storage engines (e.g. S3), it is OK to create a file that
	// already exists as a common prefix for other objects, assuming a pseudo
	// path separator. So both "foo" and "foo/bar" can be used as paths for
	// creating regular files in the same storage. See List() for more context.
	Create(ctx context.Context, path string, opts ...Opts) (File, error)

	// List finds files and directories. If "path" points to a regular file, the
	// lister will return information about the file itself and finish.
	//
	// If "path" is a directory, the lister will list files and directories under
	// the given path.  When "recursive" is set to false, List finds files "one level"
	// below dir.  Dir may end in /, but need not.  All the files and directories
	// returned by the lister will have pathnames of the form dir/something.
	//
	// For key based storage engines (e.g. S3), a dir prefix not ending in "/" must
	// be followed immediately by "/" in some object keys, and only such keys
	// will be returned.
	// With "recursive=true" List finds all files whose pathnames are under "dir" or
	// its subdirectories.  All the files returned by the lister will have pathnames of
	// the form dir/something.  Directories will not be returned as separate entities.
	// For example List(ctx, "foo",true) will yield "foo/bar/bat.txt", but not "foo.txt"
	// or "foo/bar/", while List(ctx, "foo", false) will yield "foo/bar", and
	// "foo/bat.txt", but not "foo.txt" or "foo/bar/bat.txt".  There is no difference
	// in the return value of List(ctx, "foo", ...) and List(ctx, "foo/", ...)
	List(ctx context.Context, path string, recursive bool) Lister

	// Stat returns the file metadata.  It returns nil if path is
	// a directory. (There is no direct test for existence of a
	// directory.)
	//
	// Stat returns an error of kind errors.NotExist if there is
	// no file at the provided path.
	Stat(ctx context.Context, path string, opts ...Opts) (Info, error)

	// Remove removes the file. The path passed to file.Remove() is passed here
	// unchanged.
	Remove(ctx context.Context, path string) error

	// Presign returns a URL that can be used to perform the given HTTP method,
	// usually one of "GET", "PUT" or "DELETE", on the path for the duration
	// specified in expiry.
	//
	// It returns an error of kind errors.NotSupported for implementations that
	// do not support signed URLs, or that do not support the given HTTP method.
	//
	// Unlike Open and Stat, this method does not return an error of kind
	// errors.NotExist if there is no file at the provided path.
	Presign(ctx context.Context, path, method string, expiry time.Duration) (url string, err error)
}

Implementation implements operations for a file-system type. Thread safe.

func FindImplementation

func FindImplementation(scheme string) Implementation

FindImplementation returns an Implementation object registered for the given scheme. It returns nil if the scheme is not registered.

func NewLocalImplementation

func NewLocalImplementation() Implementation

NewLocalImplementation returns a new file.Implementation for the local file system that uses Go's native "os" module. This function is only for unittests. Applications should use functions such as file.Open, file.Create to access the local file system.

type Info

type Info interface {
	// Size returns the length of the file in bytes for regular files; system-dependent for others
	Size() int64
	// ModTime returns modification time for regular files; system-dependent for others
	ModTime() time.Time
}

Info represents file metadata.

func Stat

func Stat(ctx context.Context, path string, opts ...Opts) (Info, error)

Stat returns the given file's metadata. It is a shortcut for calling ParsePath(), then FindImplementation, then Implementation.Stat.

Stat returns an error of kind errors.NotExist if the file at the provided path does not exist.

type Lister

type Lister interface {
	// Scan advances the lister to the next entry.  It returns
	// false either when the scan stops because we have reached the end of the input
	// or else because there was error.  After Scan returns, the Err method returns
	// any error that occurred during scanning.
	Scan() bool

	// Err returns the first error that occurred while scanning.
	Err() error

	// Path returns the last path that was scanned. The path always starts with
	// the directory path given to the List method.
	//
	// REQUIRES: Last call to Scan returned true.
	Path() string

	// IsDir() returns true if Path() refers to a directory in a file system
	// or a common prefix ending in "/" in S3.
	//
	// REQUIRES: Last call to Scan returned true.
	IsDir() bool

	// Info returns metadata of the file that was scanned.
	//
	// REQUIRES: Last call to Scan returned true.
	Info() Info
}

Lister lists files in a directory tree. Not thread safe.

func List

func List(ctx context.Context, prefix string, recursive bool) Lister

List finds files whose pathnames are under "prefix"; when recursive is true, it also descends into subdirectories. All the files returned by the lister will have pathnames of the form prefix/something. For example, List(ctx, "foo", true) will yield "foo/bar.txt", but not "foo.txt". See Implementation.List for the exact semantics.

Example: file.List(ctx, "s3://grail-data/foo", true)

type Opts

type Opts struct {
	// When set, this flag causes the file package to keep retrying when the file
	// is reported as not found. This flag should be set when:
	//
	// 1. you are accessing a file on S3, and
	//
	// 2. an application may have attempted to GET the same file in recent past
	// (~5 minutes). The said application may be on a different machine.
	//
	// This flag is honored only by S3 to work around the problem where s3 may
	// report spurious KeyNotFound error after a GET request to the same file.
	// For more details, see
	// https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#CoreConcepts,
	// section "S3 Data Consistency Model". In particular:
	//
	//   The caveat is that if you make a HEAD or GET request to the key
	//   name (to find if the object exists) before creating the object, Amazon S3
	//   provides eventual consistency for read-after-write.
	RetryWhenNotFound bool

	// When set, Close will ignore NoSuchUpload error from S3
	// CompleteMultiPartUpload and silently return OK.
	//
	// This is to work around a bug where concurrent uploads to one file sometimes
	// cause an upload request to be lost on the server side.
	// https://console.aws.amazon.com/support/cases?region=us-west-2#/6299905521/en
	// https://github.com/yasushi-saito/s3uploaderror
	//
	// Set this flag only if:
	//
	//  1. you are writing to a file on S3, and
	//
	//  2. possible concurrent writes to the same file produce the same
	//  contents, so you are ok with taking any of them.
	//
	// If you don't set this flag, then concurrent writes to the same file may
	// fail with a NoSuchUpload error, and it is up to you to retry.
	//
	// On non-S3 file systems, this flag is ignored.
	IgnoreNoSuchUpload bool
}

Opts controls the file access requests, such as Open and Stat.

Directories

Path Synopsis
fsnode represents a filesystem as a directed graph (probably a tree for many implementations).
fsnodefuse implements github.com/hanwen/go-fuse/v2/fs for fsnode.T. It's a work-in-progress.
internal
Package s3file implements grail file interface for S3.
internal/cmd/resolvetest
resolvetest simply resolves a hostname at an increasing time interval to observe the diversity in DNS lookup addresses for the host.
