Documentation

Overview

Package file provides basic file operations across multiple file-system types. It is designed for use in applications that operate uniformly on multiple storage types, such as local files, S3 and HTTP.

This package is designed with the following goals:

- Support popular file systems, especially S3 and the local file system.

- Define operation semantics that are implementable on all the supported file systems, yet practical and usable.

- Extensible. Provide leeway to do things like registering new file system types or ticket-based authorizations.

This package defines two key interfaces, Implementation and File.

- Implementation provides filesystem operations, such as Open, Remove, and List (directory walking).

- File implements operations on a file. It is created by Implementation.{Open,Create} calls. File is similar to Go's os.File object but provides limited functionality.

Reading and writing files

The following snippet shows registering an S3 implementation, then writing and reading an S3 file.

import (
 "context"
 "io/ioutil"

 "github.com/aws/aws-sdk-go/aws/session"
 "github.com/grailbio/base/file"
 "github.com/grailbio/base/file/s3file"    // file.Implementation implementation for S3
)

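func init() {
  // RegisterImplementation takes a factory, which is invoked on first use of the scheme.
  file.RegisterImplementation("s3", func() file.Implementation {
    return s3file.NewImplementation(
      s3file.NewDefaultProvider(session.Options{}))
  })
}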

// Caution: this code ignores all errors.
func WriteTest() {
  ctx := context.Background()
  f, _ := file.Create(ctx, "s3://grail-saito/tmp/test.txt")
  _, _ = f.Writer(ctx).Write([]byte("Hello"))
  _ = f.Close(ctx)
}

// ReadTest returns the file contents. It also ignores all errors.
func ReadTest() []byte {
  ctx := context.Background()
  f, _ := file.Open(ctx, "s3://grail-saito/tmp/test.txt")
  data, _ := ioutil.ReadAll(f.Reader(ctx))
  _ = f.Close(ctx)
  return data
}

To open a file for reading or writing, call file.Open(ctx, "s3://bucket/key") or file.Create(ctx, "s3://bucket/key"). A File object does not implement io.Reader or io.Writer directly. Instead, you must call File.Reader or File.Writer to start reading or writing. These methods are split from the File itself so that an application can pass different contexts to different I/O operations.
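The effect of binding a context to each reader, rather than to the file, can be illustrated with a self-contained sketch. The memFile and ctxReader types below are hypothetical stand-ins written for this illustration; they are not part of the file package, and unlike a real File each Reader call here returns an independent reader.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"io"
)

// memFile is a hypothetical in-memory stand-in for file.File. It only mimics
// the Reader(ctx) accessor pattern.
type memFile struct {
	data []byte
}

// ctxReader checks its context before every Read call.
type ctxReader struct {
	ctx context.Context
	r   io.Reader
}

func (c *ctxReader) Read(p []byte) (int, error) {
	if err := c.ctx.Err(); err != nil {
		return 0, err
	}
	return c.r.Read(p)
}

// Reader binds ctx to a new reader over the file contents.
func (f *memFile) Reader(ctx context.Context) io.Reader {
	return &ctxReader{ctx: ctx, r: bytes.NewReader(f.data)}
}

// readWithContexts reads the file once with a live context and once with a
// cancelled one. It returns the data from the first read and the error from
// the second.
func readWithContexts(f *memFile) (string, error) {
	data, err := io.ReadAll(f.Reader(context.Background()))
	if err != nil {
		return "", err
	}
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err = io.ReadAll(f.Reader(ctx))
	return string(data), err
}

func main() {
	data, err := readWithContexts(&memFile{data: []byte("Hello")})
	fmt.Println(data, errors.Is(err, context.Canceled)) // Hello true
}
```

The cancelled context only affects the reader it was passed to; the first read on the same file still succeeds.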

File-system operations

The file package provides functions similar to those in the standard os package. For example, file.Remove(ctx, "s3://bucket/key") removes a file, and file.Stat(ctx, "s3://bucket/key") returns metadata about the file.

Pathname utility functions

The file package also provides functions that are similar to those in the standard filepath package. Functions file.Base, file.Dir, and file.Join work just like filepath.{Base,Dir,Join}, except that they handle URL pathnames properly. For example, file.Join("s3://foo", "bar") will return "s3://foo/bar", whereas filepath.Join("s3://foo", "bar") would return "s3:/foo/bar".
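The difference can be reproduced with the standard library alone. In the sketch below, naiveJoin uses path.Join (chosen instead of filepath.Join so the result does not depend on the operating system), and joinURL is a hypothetical helper written for this illustration that keeps the "scheme://" prefix intact. It is not the package's implementation of file.Join.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// naiveJoin cleans the joined path, which collapses the "//" after the scheme.
func naiveJoin(elems ...string) string {
	return path.Join(elems...)
}

// joinURL is a hypothetical URL-aware join. It splits off the "scheme://"
// prefix of the first element, trims leading and trailing slashes of each
// element, and joins the non-empty parts with "/". Slashes in the middle of
// an element are preserved.
func joinURL(elems ...string) string {
	if len(elems) == 0 {
		return ""
	}
	i := strings.Index(elems[0], "://")
	if i < 0 {
		return path.Join(elems...) // no scheme: treat as a local path
	}
	prefix := elems[0][:i+3]
	var parts []string
	for n, e := range elems {
		if n == 0 {
			e = e[i+3:]
		}
		e = strings.Trim(e, "/")
		if e != "" {
			parts = append(parts, e)
		}
	}
	return prefix + strings.Join(parts, "/")
}

func main() {
	fmt.Println(naiveJoin("s3://foo", "bar"))    // s3:/foo/bar
	fmt.Println(joinURL("s3://foo", "bar"))      // s3://foo/bar
	fmt.Println(joinURL("s3://foo//bar", "baz")) // s3://foo//bar/baz
}
```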

Registering a filesystem implementation

Function RegisterImplementation associates an implementation with a scheme ("s3", "http", "git", etc.). A local file system implementation is automatically available without any explicit registration. RegisterImplementation is usually invoked when a process starts up, for all the supported file system types. For example:

import (
 "context"
 "io/ioutil"

 "github.com/grailbio/base/file"
 "github.com/grailbio/base/file/s3file"    // file.Implementation implementation for S3
)
func init() {
  file.RegisterImplementation("s3", func() file.Implementation {
    return s3file.NewImplementation(...)
  })
}
func main() {
  ctx := context.Background()
  f, err := file.Open(ctx, "s3://somebucket/foo.txt")
  data, err := ioutil.ReadAll(f.Reader(ctx))
  err = f.Close(ctx)
  ...
}

Once an implementation is registered, the files for that scheme can be opened or created using a "scheme://name" pathname.

Differences from the os package

The file package is similar to Go's standard os package. The differences are the following.

- The file package focuses on providing a file-like API for object storage systems, such as S3 or GCS.

- Mutations to a File are restricted to whole-file writes. There is no option to overwrite a part of an existing file.

- All the operations take a context parameter.

- file.File implements neither io.Reader nor io.Writer directly. One must call the File.Reader or File.Writer method to obtain a reader or writer object.

- Directories are simulated in a best-effort manner on implementations that do not support directories as first-class entities, such as S3. Lister provides IsDir() for the current path. Info(path) returns nil for directories.

Concurrency

Implementation and File provide open-close consistency. More specifically, this package linearizes fileops, where a fileop is defined as follows: a fileop is a set of operations that starts with Implementation.{Open,Create}, continues with read/write/stat operations on the file, and ends with File.Close. Operations such as Implementation.{Stat,Remove,List} and Lister.Scan each form a singleton fileop.

Caution: a local file system on NFS (w/o cache leasing) doesn't provide this guarantee. Use NFS at your own risk.

Example (Localfile)

    Example_localfile is an example of basic read/write operations on the local file system.

    Output:
    
    Got: Blue box jumped over red bat
    


    Functions

    func Base

    func Base(path string) string

      Base returns the last element of the path. It is the same as filepath.Base for a local filesystem path. Else, it acts like filepath.Base, with the following differences: (1) the path separator is always '/'. (2) if the URL suffix is empty, it returns the path itself.

      Example:

      file.Base("s3://") returns "s3://".
      file.Base("s3://foo/hah/") returns "hah".
      
      Example
      Output:
      
      .
      foo1
      foo2
      /
      s3://
      blah1
      blah2
      blah3
      

      func CloseAndReport

      func CloseAndReport(ctx context.Context, f Closer, err *error)

        CloseAndReport is a defer-able helper that calls f.Close and reports errors, if any, to *err. Pass your function's named return error. Example usage:

        func processFile(filename string) (_ int, err error) {
          ctx := context.Background()
          f, err := file.Open(ctx, filename)
          if err != nil { ... }
          defer file.CloseAndReport(ctx, f, &err)
          ...
        }
        

        If your function returns with an error, any f.Close error will be chained appropriately.
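How a helper of this shape interacts with a named return value can be sketched without the package. The closeAndReport function below is a hypothetical re-creation for illustration only. As a simplification it keeps the first error and drops a later Close error, whereas the text above says the package chains them.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// closer mirrors the shape of file.Closer for this self-contained sketch.
type closer interface {
	Close(ctx context.Context) error
}

// closeAndReport is a hypothetical helper, not the package's code. It stores
// the Close error in *err only when no earlier error is set.
func closeAndReport(ctx context.Context, c closer, err *error) {
	if cerr := c.Close(ctx); cerr != nil && *err == nil {
		*err = cerr
	}
}

// failingCloser always fails on Close.
type failingCloser struct{}

func (failingCloser) Close(context.Context) error {
	return errors.New("close failed")
}

// process succeeds in its body, so the deferred helper is what surfaces the
// Close error through the named return value err.
func process() (n int, err error) {
	ctx := context.Background()
	defer closeAndReport(ctx, failingCloser{}, &err)
	return 42, nil
}

func main() {
	n, err := process()
	fmt.Println(n, err) // 42 close failed
}
```

The pointer to the named return is what lets the deferred call change the value the caller sees after the return statement has run.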

        func Dir

        func Dir(path string) string

          Dir returns all but the last element of the path. It is the same as filepath.Dir for a local filesystem path. Else, it acts like filepath.Dir, with the following differences: (1) the path separator is always '/'. (2) if the URL suffix is empty, it returns the path itself. (3) The path is not cleaned; for example, repeated "/"s in the path are preserved.

          Example
          Output:
          
          .
          .
          /a
          a
          s3://ab
          s3://ab
          s3://a/b
          s3://a/b
          s3://a//b
          s3://
          

          func IsAbs

          func IsAbs(path string) bool

            IsAbs returns true if the pathname is an absolute local path. For a non-local file, it always returns true.

            Example
            Output:
            
            false
            true
            true
            

            func Join

            func Join(elems ...string) string

              Join joins any number of path elements into a single path, adding a separator if necessary. It is the same as filepath.Join if elems[0] is a local filesystem path. Else, it works like filepath.Join, with the following differences: (1) the path separator is always '/'. (2) Each element is not cleaned; for example if an element contains repeated "/"s in the middle, they are preserved.

              Example
              Output:
              
              foo/bar
              foo
              foo/bar
              ./foo:bar
              s3://foo
              s3://foo/bar
              s3://foo/bar
              s3://foo/0
              s3://foo/abc
              s3://foo//bar/baz
              

              func MustClose

              func MustClose(ctx context.Context, f Closer)

                MustClose is a defer-able function that calls f.Close and panics on error.

                Example:

                ctx := context.Background()
                f, err := file.Open(ctx, filename)
                if err != nil { panic(err) }
                defer file.MustClose(ctx, f)
                ...
                

                func MustParsePath

                func MustParsePath(path string) (scheme, suffix string)

                  MustParsePath is similar to ParsePath, but crashes the process on error.

                  func ParsePath

                  func ParsePath(path string) (scheme, suffix string, err error)

                    ParsePath parses "path" into the scheme that identifies the implementation able to handle it and the remaining suffix. The path can be of the form either "scheme://path" or just "path0/.../pathN". The latter indicates a local file.

                    On success, "scheme" will be the scheme part of the path. "suffix" will be the path part after the scheme://. For example, ParsePath("s3://key/bucket") will return ("s3", "key/bucket", nil).

                    For a local-filesystem path, this function returns ("", path, nil).

                    Example
                    Output:
                    
                    /tmp/test 🢥 scheme "", suffix "/tmp/test"
                    foo://bar 🢥 scheme "foo", suffix "bar"
                    foo:///bar 🢥 scheme "foo", suffix "/bar"
                    foo:bar 🢥 error parsepath foo:bar: a URL must start with 'scheme://'
                    /foo:bar 🢥 scheme "", suffix "/foo:bar"
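The cases in the output above can be reproduced with a small parser. The parsePath function below is a hypothetical sketch written to match those five cases. It assumes a scheme consists of ASCII letters and digits, which the source does not state, and it is not the package's implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// schemePrefix matches a leading "scheme:". The allowed character set is an
// assumption made for this sketch.
var schemePrefix = regexp.MustCompile(`^[a-zA-Z0-9]+:`)

// parsePath splits p into a scheme and a suffix. A path without a leading
// "scheme:" is treated as a local path and returned unchanged as the suffix.
func parsePath(p string) (scheme, suffix string, err error) {
	m := schemePrefix.FindString(p)
	if m == "" {
		return "", p, nil
	}
	rest := p[len(m):]
	if !strings.HasPrefix(rest, "//") {
		return "", "", fmt.Errorf("parsepath %s: a URL must start with 'scheme://'", p)
	}
	return m[:len(m)-1], rest[2:], nil
}

func main() {
	for _, p := range []string{"/tmp/test", "foo://bar", "foo:///bar", "foo:bar", "/foo:bar"} {
		scheme, suffix, err := parsePath(p)
		if err != nil {
			fmt.Printf("%s: error %v\n", p, err)
			continue
		}
		fmt.Printf("%s: scheme %q, suffix %q\n", p, scheme, suffix)
	}
}
```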
                    

                    func Presign

                    func Presign(ctx context.Context, path, method string, expiry time.Duration) (string, error)

                      Presign is a shortcut for calling ParsePath(), then calling Implementation.Presign method.

                      func ReadFile

                      func ReadFile(ctx context.Context, path string, opts ...Opts) ([]byte, error)

                        ReadFile reads the given file and returns the contents. A successful call returns err == nil, not err == EOF. Arg opts is passed to file.Open.

                        func RegisterImplementation

                        func RegisterImplementation(scheme string, implFactory func() Implementation)

                          RegisterImplementation arranges for paths of the form scheme + "://anystring" to be handled by the implementation that implFactory produces. Scheme is a string such as "s3" or "http".

                          RegisterImplementation() should generally be called when the process starts. implFactory will be invoked exactly once, upon the first request to this scheme; this allows you to register with a factory that has not yet been fully configured (e.g., it requires parsing command line flags) as long as it will be configured before the first request.

                          REQUIRES: This function has not been called with the same scheme before.
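The "invoked exactly once, upon the first request" behaviour can be sketched with sync.Once. The registry type below is a hypothetical illustration, not the package's internal data structure. Panicking on a duplicate scheme is a choice made for this sketch, since the text above states only the precondition.

```go
package main

import (
	"fmt"
	"sync"
)

// impl is a stand-in for file.Implementation in this sketch.
type impl interface{ String() string }

type s3Stub struct{}

func (s3Stub) String() string { return "s3-stub" }

// entry pairs a factory with the sync.Once that guards its single invocation.
type entry struct {
	once    sync.Once
	factory func() impl
	impl    impl
}

// registry maps schemes to lazily constructed implementations.
type registry struct {
	mu      sync.Mutex
	entries map[string]*entry
}

func newRegistry() *registry {
	return &registry{entries: map[string]*entry{}}
}

// register records the factory without calling it.
func (r *registry) register(scheme string, factory func() impl) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.entries[scheme]; ok {
		panic("scheme already registered: " + scheme)
	}
	r.entries[scheme] = &entry{factory: factory}
}

// find returns nil for an unknown scheme. The factory runs on the first
// lookup of a scheme and never again.
func (r *registry) find(scheme string) impl {
	r.mu.Lock()
	e := r.entries[scheme]
	r.mu.Unlock()
	if e == nil {
		return nil
	}
	e.once.Do(func() { e.impl = e.factory() })
	return e.impl
}

// countFactoryCalls registers a counting factory and looks the scheme up twice.
func countFactoryCalls() int {
	calls := 0
	r := newRegistry()
	r.register("s3", func() impl { calls++; return s3Stub{} })
	r.find("s3")
	r.find("s3")
	return calls
}

func main() {
	fmt.Println(countFactoryCalls()) // 1
}
```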

                          func Remove

                          func Remove(ctx context.Context, path string) error

                            Remove is a shortcut for calling ParsePath(), then calling Implementation.Remove method.

                            func RemoveAll

                            func RemoveAll(ctx context.Context, path string) error

                              RemoveAll removes path and any children it contains. It is unspecified whether empty directories are removed by this function. It removes everything it can but returns the first error it encounters. If the path does not exist, RemoveAll returns nil.

                              func WriteFile

                              func WriteFile(ctx context.Context, path string, data []byte) error

                                WriteFile writes data to the given file. If the file does not exist, WriteFile creates it; otherwise WriteFile truncates it before writing.

                                Types

                                type Closer

                                type Closer interface {
                                	// Close tries to clean up the resource. Implementations can define whether
                                	// Close can be called more than once and whether callers should retry on error.
                                	Close(context.Context) error
                                }

                                  Closer cleans up a resource. Generally, resource provider implementations will return a Closer when opening a resource (like File above).

                                  type ETagged

                                  type ETagged interface {
                                  	// ETag is an identifier assigned to a specific version of the file.
                                  	ETag() string
                                  }

                                    ETagged defines a getter for a file with an ETag.

                                    type Error

                                    type Error struct {
                                    	// contains filtered or unexported fields
                                    }

                                      Error implements io.{Reader,Writer,Seeker,Closer}. It returns the given error to any call.

                                      func NewError

                                      func NewError(err error) *Error

                                        NewError returns a new Error object that returns the given error to any Read/Write/Seek/Close call.
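One use for a type like this is to let a function return a reader unconditionally and defer error reporting to the first I/O call. The sketch below shows that idea with a hypothetical errReader type covering only the io.Reader part; openReader and readAllString are likewise invented for the illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errReader is a hypothetical stand-in for the idea behind file.Error,
// reduced to io.Reader. Every Read returns the stored error.
type errReader struct{ err error }

func (e *errReader) Read([]byte) (int, error) { return 0, e.err }

// openReader returns a working reader on success and an errReader on failure,
// so callers observe the failure at the first Read.
func openReader(name string) io.Reader {
	if name == "" {
		return &errReader{err: errors.New("empty name")}
	}
	return strings.NewReader("contents of " + name)
}

// readAllString reads everything from the reader returned by openReader.
func readAllString(name string) (string, error) {
	b, err := io.ReadAll(openReader(name))
	return string(b), err
}

func main() {
	s, err := readAllString("a.txt")
	fmt.Println(s, err) // contents of a.txt <nil>
	_, err = readAllString("")
	fmt.Println(err) // empty name
}
```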

                                        func (*Error) Close

                                        func (r *Error) Close() error

                                          Close implements io.Closer.

                                          func (*Error) Read

                                          func (r *Error) Read([]byte) (int, error)

                                            Read implements io.Reader.

                                            func (*Error) Seek

                                            func (r *Error) Seek(int64, int) (int64, error)

                                              Seek implements io.Seeker.

                                              func (*Error) Write

                                              func (r *Error) Write([]byte) (int, error)

                                                Write implements io.Writer.

                                                type File

                                                type File interface {
                                                	// String returns a diagnostic string.
                                                	String() string
                                                
                                                	// Name returns the path name given to file.Open or file.Create when this
                                                	// object was created.
                                                	Name() string
                                                
                                                	// Stat returns file metadata.
                                                	//
                                                	// REQUIRES: Close has not been called
                                                	Stat(ctx context.Context) (Info, error)
                                                
                                                	// Reader creates an io.ReadSeeker object that operates on the file.  If
                                                	// Reader() is called multiple times, they share the seek pointer.
                                                	//
                                                	// REQUIRES: Close has not been called
                                                	Reader(ctx context.Context) io.ReadSeeker
                                                
                                                	// Writer creates a writer that writes to the file. If Writer() is called
                                                	// multiple times, they share the seek pointer.
                                                	//
                                                	// REQUIRES: Close has not been called
                                                	Writer(ctx context.Context) io.Writer
                                                
                                                	// Discard discards a file before it is closed, relinquishing any
                                                	// temporary resources implied by pending writes. This should be
                                                	// used if the caller decides not to complete writing the file.
                                                	// Discard is a best-effort operation. Discard is not defined for
                                                	// files opened for reading. Exactly one of Discard or Close should
                                                	// be called. No other File, io.ReadSeeker, or io.Writer methods
                                                	// shall be called after Discard.
                                                	Discard(ctx context.Context)
                                                
                                                	// Closer commits the contents of a written file, invalidating the
                                                	// File and all Readers and Writers created from the file. Exactly
                                                	// one of Discard or Close should be called. No other File or
                                                	// io.ReadSeeker, io.Writer methods shall be called after Close.
                                                	Closer
                                                }

                                                  File defines operations on a file. Implementations must be thread safe.

                                                  func Create

                                                  func Create(ctx context.Context, path string, opts ...Opts) (File, error)

                                                    Create opens the given file write-only. It is a shortcut for calling ParsePath(), then FindImplementation, then Implementation.Create.

                                                    func Open

                                                    func Open(ctx context.Context, path string, opts ...Opts) (File, error)

                                                      Open opens the given file read-only. It is a shortcut for calling ParsePath(), then FindImplementation, then Implementation.Open.

                                                      Open returns an error of kind errors.NotExist if the file at the provided path does not exist.

                                                      type Implementation

                                                      type Implementation interface {
                                                      	// String returns a diagnostic string.
                                                      	String() string
                                                      
                                                      	// Open opens a file for reading. The pathname given to file.Open() is passed
                                                      	// here unchanged. Thus, it contains the URL prefix such as "s3://".
                                                      	//
                                                      	// Open returns an error of kind errors.NotExist if there is
                                                      	// no file at the provided path.
                                                      	Open(ctx context.Context, path string, opts ...Opts) (File, error)
                                                      
                                                      	// Create opens a file for writing. If "path" already exists, the old contents
                                                      	// will be destroyed. If "path" does not exist already, the file will be newly
                                                      	// created.  If the directory part of the path does not exist already, it will
                                                      	// be created. The pathname given to file.Create() is passed here unchanged.
                                                      	// Thus, it contains the URL prefix such as "s3://".
                                                      	Create(ctx context.Context, path string, opts ...Opts) (File, error)
                                                      
                                                      	// List finds files and directories. If "path" points to a regular file, the
                                                      	// lister will return information about the file itself and finish.
                                                      	//
                                                      	// If "path" is a directory, the lister will list files and directories under the
                                                      	// given path.  When "recursive" is set to false, List finds files "one level"
                                                      	// below dir.  Dir may end in /, but need not.  All the files and directories
                                                      	// returned by the lister will have pathnames of the form dir/something.
                                                      	//
                                                      	// For key based storage engines (e.g. S3), a dir prefix not ending in "/" must
                                                      	// be followed immediately by "/" in some object keys, and only such keys
                                                      	// will be returned.
                                                      	// With "recursive=true" List finds all files whose pathnames are under "dir" or its
                                                      	// subdirectories.  All the files returned by the lister will have pathnames of
                                                      	// the form dir/something.  Directories will not be returned as separate entities.
                                                      	// For example List(ctx, "foo",true) will yield "foo/bar/bat.txt", but not "foo.txt"
                                                      	// or "foo/bar/", while List(ctx, "foo", false) will yield "foo/bar", and
                                                      	// "foo/bat.txt", but not "foo.txt" or "foo/bar/bat.txt".  There is no difference
                                                      	// in the return value of List(ctx, "foo", ...) and List(ctx, "foo/", ...)
                                                      	List(ctx context.Context, path string, recursive bool) Lister
                                                      
                                                      	// Stat returns the file metadata.  It returns nil if path is
                                                      	// a directory. (There is no direct test for existence of a
                                                      	// directory.)
                                                      	//
                                                      	// Stat returns an error of kind errors.NotExist if there is
                                                      	// no file at the provided path.
                                                      	Stat(ctx context.Context, path string, opts ...Opts) (Info, error)
                                                      
                                                      	// Remove removes the file. The path passed to file.Remove() is passed here
                                                      	// unchanged.
                                                      	Remove(ctx context.Context, path string) error
                                                      
                                                      	// Presign returns a URL that can be used to perform the given HTTP method,
                                                      	// usually one of "GET", "PUT" or "DELETE", on the path for the duration
                                                      	// specified in expiry.
                                                      	//
                                                      	// It returns an error of kind errors.NotSupported for implementations that
                                                      	// do not support signed URLs, or that do not support the given HTTP method.
                                                      	//
                                                      	// Unlike Open and Stat, this method does not return an error of kind
                                                      	// errors.NotExist if there is no file at the provided path.
                                                      	Presign(ctx context.Context, path, method string, expiry time.Duration) (url string, err error)
                                                      }

                                                        Implementation implements operations for a file-system type. Thread safe.

                                                        func FindImplementation

                                                        func FindImplementation(scheme string) Implementation

                                                          FindImplementation returns an Implementation object registered for the given scheme. It returns nil if the scheme is not registered.

                                                          func NewLocalImplementation

                                                          func NewLocalImplementation() Implementation

                                                            NewLocalImplementation returns a new file.Implementation for the local file system that uses Go's native "os" package. This function is only for unit tests. Applications should use functions such as file.Open and file.Create to access the local file system.

                                                            type Info

                                                            type Info interface {
                                                            	// Size returns the length of the file in bytes for regular files; system-dependent for others
                                                            	Size() int64
                                                            	// ModTime returns modification time for regular files; system-dependent for others
                                                            	ModTime() time.Time
                                                            }

                                                              Info represents file metadata.

                                                              func Stat

                                                              func Stat(ctx context.Context, path string, opts ...Opts) (Info, error)

                                                                Stat returns the given file's metadata. It is a shortcut for calling ParsePath(), then FindImplementation, then Implementation.Stat.

                                                                Stat returns an error of kind errors.NotExist if the file at the provided path does not exist.

                                                                type Lister

                                                                type Lister interface {
                                                                	// Scan advances the lister to the next entry.  It returns
                                                                	// false either when the scan stops because we have reached the end of the input
                                                                	// or else because there was an error.  After Scan returns, the Err method returns
                                                                	// any error that occurred during scanning.
                                                                	Scan() bool
                                                                
                                                                	// Err returns the first error that occurred while scanning.
                                                                	Err() error
                                                                
                                                                	// Path returns the last path that was scanned. The path always starts with
                                                                	// the directory path given to the List method.
                                                                	//
                                                                	// REQUIRES: Last call to Scan returned true.
                                                                	Path() string
                                                                
                                                                	// IsDir() returns true if Path() refers to a directory in a file system
                                                                	// or a common prefix ending in "/" in S3.
                                                                	//
                                                                	// REQUIRES: Last call to Scan returned true.
                                                                	IsDir() bool
                                                                
                                                                	// Info returns metadata of the file that was scanned.
                                                                	//
                                                                	// REQUIRES: Last call to Scan returned true.
                                                                	Info() Info
                                                                }

                                                                  Lister lists files in a directory tree. Not thread safe.
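The Scan/Err contract implies a fixed calling pattern: loop while Scan returns true, then check Err once the loop ends. The sketch below shows that pattern against a hypothetical in-memory sliceLister. The type and its failAt parameter are invented for the illustration, and only the Scan, Err and Path methods are modelled.

```go
package main

import (
	"errors"
	"fmt"
)

// sliceLister is a hypothetical lister over a fixed list of paths.
type sliceLister struct {
	paths  []string
	failAt int // index at which Scan reports an error; -1 for never
	cur    int
	err    error
}

func newSliceLister(failAt int, paths ...string) *sliceLister {
	return &sliceLister{paths: paths, failAt: failAt, cur: -1}
}

// Scan advances to the next entry. It returns false at the end of the input
// or on error; Err distinguishes the two.
func (l *sliceLister) Scan() bool {
	if l.err != nil {
		return false
	}
	next := l.cur + 1
	if next == l.failAt {
		l.err = errors.New("listing failed")
		return false
	}
	if next >= len(l.paths) {
		return false
	}
	l.cur = next
	return true
}

// Err returns the first error seen by Scan, if any.
func (l *sliceLister) Err() error { return l.err }

// Path must only be called after Scan has returned true.
func (l *sliceLister) Path() string { return l.paths[l.cur] }

// collect drains the lister, then checks Err to tell a clean end of input
// from a failure.
func collect(l *sliceLister) ([]string, error) {
	var out []string
	for l.Scan() {
		out = append(out, l.Path())
	}
	return out, l.Err()
}

func main() {
	fmt.Println(collect(newSliceLister(-1, "foo/a", "foo/b"))) // [foo/a foo/b] <nil>
	fmt.Println(collect(newSliceLister(1, "foo/a", "foo/b")))  // [foo/a] listing failed
}
```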

                                                                  func List

                                                                  func List(ctx context.Context, prefix string, recursive bool) Lister

                                                                    List finds all files whose pathnames are under "dir" or its subdirectories. All the files returned by the lister will have pathnames of the form dir/something. For example List(ctx, "foo") will yield "foo/bar.txt", but not "foo.txt".

                                                                    Example: impl.List(ctx, "s3://grail-data/foo")

                                                                    type Opts

                                                                    type Opts struct {
                                                                    	// When set, this flag causes the file package to keep retrying when the file
                                                                    	// is reported as not found. This flag should be set when:
                                                                    	//
                                                                    	// 1. you are accessing a file on S3, and
                                                                    	//
                                                                    	// 2. an application may have attempted to GET the same file in the recent
                                                                    	// past (~5 minutes). That application may be on a different machine.
                                                                    	//
                                                                    	// This flag is honored only by S3, to work around the problem where S3 may
                                                                    	// report a spurious KeyNotFound error after a GET request to the same file.
                                                                    	// For more details, see
                                                                    	// https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#CoreConcepts,
                                                                    	// section "S3 Data Consistency Model". In particular:
                                                                    	//
                                                                    	//   The caveat is that if you make a HEAD or GET request to the key
                                                                    	//   name (to find if the object exists) before creating the object, Amazon S3
                                                                    	//   provides eventual consistency for read-after-write.
                                                                    	RetryWhenNotFound bool
                                                                    
                                                                    	// When set, Close ignores the NoSuchUpload error from S3
                                                                    	// CompleteMultipartUpload and silently returns OK.
                                                                    	//
                                                                    	// This is to work around a bug where concurrent uploads to one file sometimes
                                                                    	// cause an upload request to be lost on the server side.
                                                                    	// https://console.aws.amazon.com/support/cases?region=us-west-2#/6299905521/en
                                                                    	// https://github.com/yasushi-saito/s3uploaderror
                                                                    	//
                                                                    	// Set this flag only if:
                                                                    	//
                                                                    	//  1. you are writing to a file on S3, and
                                                                    	//
                                                                    	//  2. possible concurrent writes to the same file produce the same
                                                                    	//  contents, so you are ok with taking any of them.
                                                                    	//
                                                                    	// If you don't set this flag, then concurrent writes to the same file may
                                                                    	// fail with a NoSuchUpload error, and it is up to you to retry.
                                                                    	//
                                                                    	// On non-S3 file systems, this flag is ignored.
                                                                    	IgnoreNoSuchUpload bool
                                                                    }

                                                                      Opts controls the file access requests, such as Open and Stat.

                                                                      Directories

                                                                      Path Synopsis
                                                                      internal
                                                                      Package s3file implements grail file interface for S3.
                                                                      internal/cmd/resolvetest
                                                                      resolvetest simply resolves a hostname at an increasing time interval to observe the diversity in DNS lookup addresses for the host.