Documentation ¶
Overview ¶
Package file provides basic file operations across multiple file-system types. It is designed for use in applications that operate uniformly on multiple storage types, such as local files, S3 and HTTP.
This package is designed with the following goals:
- Support popular file systems, especially S3 and the local file system.
- Define operation semantics that are implementable on all the supported file systems, yet practical and usable.
- Extensible. Provide leeway to do things like registering new file system types or ticket-based authorizations.
This package defines two key interfaces, Implementation and File.
- Implementation provides filesystem operations, such as Open, Remove, and List (directory walking).
- File implements operations on a file. It is created by Implementation.{Open,Create} calls. File is similar to Go's os.File object but provides limited functionality.
Reading and writing files ¶
The following snippet shows how to register an S3 implementation, then write and read an S3 file.
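import (
	"context"
	"io/ioutil"

	"github.com/grailbio/base/file"
	"github.com/grailbio/base/file/s3file" // file.Implementation implementation for S3
)

func init() {
	// RegisterImplementation takes a factory, which is invoked on first use.
	file.RegisterImplementation("s3", func() file.Implementation {
		// Assumes the two-argument NewImplementation(provider, opts) form.
		return s3file.NewImplementation(s3file.NewDefaultProvider(), s3file.Options{})
	})
}

func WriteTest() error {
	ctx := context.Background()
	f, err := file.Create(ctx, "s3://grail-saito/tmp/test.txt")
	if err != nil {
		return err
	}
	if _, err := f.Writer(ctx).Write([]byte("Hello")); err != nil {
		f.Discard(ctx) // Abandon the write instead of committing a partial file.
		return err
	}
	return f.Close(ctx) // Close commits the written contents.
}

func ReadTest() ([]byte, error) {
	ctx := context.Background()
	f, err := file.Open(ctx, "s3://grail-saito/tmp/test.txt")
	if err != nil {
		return nil, err
	}
	data, err := ioutil.ReadAll(f.Reader(ctx))
	// Always close; report the Close error only if the read succeeded.
	if cerr := f.Close(ctx); err == nil {
		err = cerr
	}
	return data, err
}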
To open a file for reading or writing, call file.Open(ctx, "s3://bucket/key") or file.Create(ctx, "s3://bucket/key"), respectively. A File object does not implement io.Reader or io.Writer directly. Instead, you must call File.Reader or File.Writer to start reading or writing. These methods are split from the File itself so that an application can pass different contexts to different I/O operations.
File-system operations ¶
The file package provides functions similar to those in the standard os package. For example, file.Remove(ctx, "s3://bucket/key") removes a file, and file.Stat(ctx, "s3://bucket/key") returns metadata about the file.
Pathname utility functions ¶
The file package also provides functions that are similar to those in the standard filepath package. Functions file.Base, file.Dir, file.Join work just like filepath.{Base,Dir,Join}, except that they handle the URL pathnames properly. For example, file.Join("s3://foo", "bar") will return "s3://foo/bar", whereas filepath.Join("s3://foo", "bar") would return "s3:/foo/bar".
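The difference arises because filepath.Join cleans its result, which collapses the "//" after the scheme. The sketch below uses a hypothetical, simplified joinURL helper that keeps a leading "<scheme>://" prefix and cleans only the remainder. It is not file.Join. Unlike file.Join, it also collapses repeated "/"s inside the elements.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"strings"
)

// joinURL is a hypothetical, simplified stand-in for file.Join. It keeps a
// leading "<scheme>://" prefix intact and joins the rest with path.Join, which
// always uses '/' and cleans the result (so interior "//" runs collapse).
func joinURL(elems ...string) string {
	if len(elems) == 0 {
		return ""
	}
	prefix, first := "", elems[0]
	if i := strings.Index(first, "://"); i >= 0 {
		prefix, first = first[:i+3], first[i+3:]
	}
	rest := append([]string{first}, elems[1:]...)
	return prefix + path.Join(rest...)
}

func main() {
	// On '/'-separator systems, filepath.Join yields "s3:/foo/bar".
	fmt.Println(filepath.Join("s3://foo", "bar"))
	// The scheme-aware version keeps "s3://foo/bar".
	fmt.Println(joinURL("s3://foo", "bar"))
}
```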
Registering a filesystem implementation ¶
Function RegisterImplementation associates an implementation with a scheme ("s3", "http", "git", etc.). A local file system implementation is automatically available without any explicit registration. RegisterImplementation is usually invoked when a process starts up, once for each supported file system type. For example:
import (
	"context"
	"io/ioutil"

	"github.com/grailbio/base/file"
	"github.com/grailbio/base/file/s3file" // file.Implementation implementation for S3
)

func init() {
	// The scheme is "s3", without a trailing ':'.
	file.RegisterImplementation("s3", func() file.Implementation {
		return s3file.NewImplementation(...)
	})
}

func main() {
	ctx := context.Background()
	f, err := file.Open(ctx, "s3://somebucket/foo.txt")
	data, err := ioutil.ReadAll(f.Reader(ctx))
	err = f.Close(ctx)
	...
}
Once an implementation is registered, files for that scheme can be opened or created using "scheme://name" pathnames.
Differences from the os package ¶
The file package is similar to Go's standard os package. The differences are as follows:
- The file package focuses on providing a file-like API for object storage systems, such as S3 or GCS.
- Mutations to a File are restricted to whole-file writes. There is no option to overwrite part of an existing file.
- All the operations take a context parameter.
- file.File does not implement io.Reader or io.Writer directly. One must call the File.Reader or File.Writer method to obtain a reader or writer object.
- Directories are simulated in a best-effort manner on implementations that do not support directories as first-class entities, such as S3. Lister.IsDir() reports whether the current entry is a directory, and Stat(path) returns a nil Info for directories.
Concurrency ¶
Implementation and File provide open-close consistency. More specifically, this package linearizes fileops. A fileop is a set of operations that starts with Implementation.{Open,Create}, continues with read/write/stat operations on the file, and ends with File.Close. Operations such as Implementation.{Stat,Remove,List} and Lister.Scan each form a singleton fileop.
Caution: a local file system on NFS (w/o cache leasing) doesn't provide this guarantee. Use NFS at your own risk.
Example (Localfile) ¶
Example_localfile is an example of basic read/write operations on the local file system.
package main

import (
	"context"
	"fmt"
	"io/ioutil"

	"github.com/grailbio/base/file"
)

func main() {
	doWrite := func(ctx context.Context, data []byte, path string) {
		out, err := file.Create(ctx, path)
		if err != nil {
			panic(err)
		}
		if _, err = out.Writer(ctx).Write(data); err != nil {
			panic(err)
		}
		if err := out.Close(ctx); err != nil {
			panic(err)
		}
	}
	doRead := func(ctx context.Context, path string) []byte {
		in, err := file.Open(ctx, path)
		if err != nil {
			panic(err)
		}
		data, err := ioutil.ReadAll(in.Reader(ctx))
		if err != nil {
			panic(err)
		}
		if err := in.Close(ctx); err != nil {
			panic(err)
		}
		return data
	}
	ctx := context.Background()
	doWrite(ctx, []byte("Blue box jumped over red bat"), "/tmp/foohah.txt")
	fmt.Printf("Got: %s\n", string(doRead(ctx, "/tmp/foohah.txt")))
}
Output: Got: Blue box jumped over red bat
Index ¶
- func Base(path string) string
- func CloseAndReport(ctx context.Context, f Closer, err *error) deprecated
- func Dir(path string) string
- func IsAbs(path string) bool
- func Join(elems ...string) string
- func MustClose(ctx context.Context, f Closer)
- func MustParsePath(path string) (scheme, suffix string)
- func ParsePath(path string) (scheme, suffix string, err error)
- func Presign(ctx context.Context, path, method string, expiry time.Duration) (string, error)
- func ReadFile(ctx context.Context, path string, opts ...Opts) ([]byte, error)
- func RegisterImplementation(scheme string, implFactory func() Implementation)
- func Remove(ctx context.Context, path string) error
- func RemoveAll(ctx context.Context, path string) error
- func WriteFile(ctx context.Context, path string, data []byte) error
- type Closer
- type ETagged
- type Error
- type File
- type Implementation
- type Info
- type Lister
- type Opts
Examples ¶
- Package (Localfile)
- Base
- Dir
- IsAbs
- Join
- ParsePath
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Base ¶
Base returns the last element of the path. It is the same as filepath.Base for a local filesystem path. Otherwise, it acts like filepath.Base, with the following differences: (1) the path separator is always '/'. (2) if the URL suffix is empty, it returns the path itself.
Example:
file.Base("s3://") returns "s3://". file.Base("s3://foo/hah/") returns "hah".
Example ¶
package main

import (
	"fmt"

	"github.com/grailbio/base/file"
)

func main() {
	fmt.Println(file.Base(""))
	fmt.Println(file.Base("foo1"))
	fmt.Println(file.Base("foo2/"))
	fmt.Println(file.Base("/"))
	fmt.Println(file.Base("s3://"))
	fmt.Println(file.Base("s3://blah1"))
	fmt.Println(file.Base("s3://blah2/"))
	fmt.Println(file.Base("s3://foo/blah3//"))
}
Output:
.
foo1
foo2
/
s3://
blah1
blah2
blah3
func CloseAndReport ¶ deprecated
CloseAndReport is a defer-able helper that calls f.Close and reports errors, if any, to *err. Pass your function's named return error. Example usage:
func processFile(filename string) (_ int, err error) {
	ctx := context.Background()
	f, err := file.Open(ctx, filename)
	if err != nil {
		...
	}
	defer file.CloseAndReport(ctx, f, &err)
	...
}
If your function returns with an error, any f.Close error will be chained appropriately.
Deprecated: Use errors.CleanUpCtx directly.
func Dir ¶
Dir returns all but the last element of the path. It is the same as filepath.Dir for a local filesystem path. Otherwise, it acts like filepath.Dir, with the following differences: (1) the path separator is always '/'. (2) if the URL suffix is empty, it returns the path itself. (3) The path is not cleaned; for example, repeated "/"s in the path are preserved.
Example ¶
package main

import (
	"fmt"

	"github.com/grailbio/base/file"
)

func main() {
	fmt.Println(file.Dir("foo"))
	fmt.Println(file.Dir("."))
	fmt.Println(file.Dir("/a/b"))
	fmt.Println(file.Dir("a/b"))
	fmt.Println(file.Dir("s3://ab/cd"))
	fmt.Println(file.Dir("s3://ab//cd"))
	fmt.Println(file.Dir("s3://a/b/"))
	fmt.Println(file.Dir("s3://a/b//"))
	fmt.Println(file.Dir("s3://a//b//"))
	fmt.Println(file.Dir("s3://a"))
}
Output:
.
.
/a
a
s3://ab
s3://ab
s3://a/b
s3://a/b
s3://a//b
s3://
func IsAbs ¶
IsAbs returns true if pathname is an absolute local path. For a non-local path, it always returns true.
Example ¶
package main

import (
	"fmt"

	"github.com/grailbio/base/file"
)

func main() {
	fmt.Println(file.IsAbs("foo"))
	fmt.Println(file.IsAbs("/foo"))
	fmt.Println(file.IsAbs("s3://foo"))
}
Output:
false
true
true
func Join ¶
Join joins any number of path elements into a single path, adding a separator if necessary. It works like filepath.Join, with the following differences:
- The path separator is always '/' (so this doesn't work on Windows).
- The interior of each element is not cleaned; for example if an element contains repeated "/"s in the middle, they are preserved.
- If elems[0] has a prefix of the form "<scheme>://" or "//", that prefix is retained. (A prefix of "/" is also retained; that matches filepath.Join's behavior.)
Example ¶
package main

import (
	"fmt"

	"github.com/grailbio/base/file"
)

func main() {
	fmt.Println(file.Join())
	fmt.Println(file.Join(""))
	fmt.Println(file.Join("foo", "bar"))
	fmt.Println(file.Join("foo", ""))
	fmt.Println(file.Join("foo", "/bar/"))
	fmt.Println(file.Join(".", "foo:bar"))
	fmt.Println(file.Join("s3://foo"))
	fmt.Println(file.Join("s3://foo", "/bar/"))
	fmt.Println(file.Join("s3://foo", "", "bar"))
	fmt.Println(file.Join("s3://foo", "0"))
	fmt.Println(file.Join("s3://foo", "abc"))
	fmt.Println(file.Join("s3://foo//bar", "/", "/baz"))
}
Output:
foo/bar
foo
foo/bar
./foo:bar
s3://foo
s3://foo/bar
s3://foo/bar
s3://foo/0
s3://foo/abc
s3://foo//bar/baz
func MustClose ¶ added in v0.0.2
MustClose is a defer-able function that calls f.Close and panics on error.
Example:
ctx := context.Background()
f, err := file.Open(ctx, filename)
if err != nil {
	panic(err)
}
defer file.MustClose(ctx, f)
...
func MustParsePath ¶
MustParsePath is similar to ParsePath, but crashes the process on error.
func ParsePath ¶
ParsePath parses "path" into a scheme and a suffix. The path can be either of the form "scheme://path" or just "path0/.../pathN"; the latter indicates a local file.
On success, "scheme" will be the scheme part of the path, and "suffix" will be the part after "scheme://". For example, ParsePath("s3://bucket/key") will return ("s3", "bucket/key", nil).
For a local-filesystem path, this function returns ("", path, nil).
Example ¶
package main

import (
	"fmt"

	"github.com/grailbio/base/file"
)

func main() {
	parse := func(path string) {
		scheme, suffix, err := file.ParsePath(path)
		if err != nil {
			fmt.Printf("%s 🢥 error %v\n", path, err)
			return
		}
		fmt.Printf("%s 🢥 scheme \"%s\", suffix \"%s\"\n", path, scheme, suffix)
	}
	parse("/tmp/test")
	parse("foo://bar")
	parse("foo:///bar")
	parse("foo:bar")
	parse("/foo:bar")
}
Output:
/tmp/test 🢥 scheme "", suffix "/tmp/test"
foo://bar 🢥 scheme "foo", suffix "bar"
foo:///bar 🢥 scheme "foo", suffix "/bar"
foo:bar 🢥 error parsepath foo:bar: a URL must start with 'scheme://'
/foo:bar 🢥 scheme "", suffix "/foo:bar"
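As a rough model of the rules this example exercises, here is a hypothetical parsePath that reproduces the five cases in the output above. If the first path component contains ':', the path must continue with "://"; anything else is treated as local. This sketch is written against this example only. The real ParsePath may differ in edge cases, and its error text is only imitated here.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePath is a hypothetical model of file.ParsePath's behavior on the
// documented example inputs; it is not the package's implementation.
func parsePath(p string) (scheme, suffix string, err error) {
	// Only the first path component (up to the first '/') can hold a scheme.
	first := p
	if i := strings.IndexByte(p, '/'); i >= 0 {
		first = p[:i]
	}
	colon := strings.IndexByte(first, ':')
	if colon < 0 {
		return "", p, nil // No scheme: a local path.
	}
	if !strings.HasPrefix(p[colon:], "://") {
		return "", "", fmt.Errorf("parsepath %s: a URL must start with 'scheme://'", p)
	}
	return p[:colon], p[colon+3:], nil
}

func main() {
	for _, p := range []string{"/tmp/test", "foo://bar", "foo:///bar", "foo:bar", "/foo:bar"} {
		scheme, suffix, err := parsePath(p)
		fmt.Printf("%q: scheme %q, suffix %q, err %v\n", p, scheme, suffix, err)
	}
}
```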
func Presign ¶ added in v0.0.2
Presign is a shortcut for calling ParsePath(), then calling Implementation.Presign method.
func ReadFile ¶
ReadFile reads the given file and returns the contents. A successful call returns err == nil, not err == EOF. Arg opts is passed to file.Open.
func RegisterImplementation ¶
func RegisterImplementation(scheme string, implFactory func() Implementation)
RegisterImplementation arranges for paths of the form scheme + "://anystring" to be served by the Implementation that implFactory returns; after registration, FindImplementation(scheme) returns that Implementation. The scheme is a string such as "s3" or "http".
RegisterImplementation() should generally be called when the process starts. implFactory will be invoked exactly once, upon the first request to this scheme. This allows you to register a factory that has not yet been fully configured (e.g., it requires parsing command-line flags), as long as it will be configured before the first request.
REQUIRES: This function has not been called with the same scheme before.
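The "invoked exactly once, upon the first request" contract can be met with sync.Once. The registry below is a hypothetical, simplified sketch, not the package's code. impl, register, and find stand in for Implementation, RegisterImplementation, and FindImplementation.

```go
package main

import (
	"fmt"
	"sync"
)

// impl is a hypothetical stand-in for file.Implementation.
type impl interface{ String() string }

// entry pairs a factory with the implementation it lazily produces.
type entry struct {
	once    sync.Once
	factory func() impl
	impl    impl
}

var (
	mu       sync.Mutex
	registry = map[string]*entry{}
)

// register records a factory without calling it; duplicates panic, mirroring
// the REQUIRES clause.
func register(scheme string, factory func() impl) {
	mu.Lock()
	defer mu.Unlock()
	if _, ok := registry[scheme]; ok {
		panic("scheme already registered: " + scheme)
	}
	registry[scheme] = &entry{factory: factory}
}

// find returns nil for unknown schemes; otherwise it runs the factory on the
// first lookup only and caches the result.
func find(scheme string) impl {
	mu.Lock()
	e := registry[scheme]
	mu.Unlock()
	if e == nil {
		return nil
	}
	e.once.Do(func() { e.impl = e.factory() })
	return e.impl
}

type fakeS3 struct{}

func (fakeS3) String() string { return "fakeS3" }

func main() {
	calls := 0
	register("s3", func() impl { calls++; return fakeS3{} })
	fmt.Println(calls) // The factory has not run yet.
	find("s3")
	find("s3")
	fmt.Println(calls, find("s3"), find("gs") == nil)
}
```

Deferring the factory call is what lets init() register a factory that depends on flags parsed later in main.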
func Remove ¶
Remove is a shortcut for calling ParsePath(), then calling Implementation.Remove method.
Types ¶
type ETagged ¶ added in v0.0.7
type ETagged interface {
	// ETag is an identifier assigned to a specific version of the file.
	ETag() string
}
ETagged defines a getter for a file with an ETag.
type Error ¶
type Error struct {
// contains filtered or unexported fields
}
Error implements io.{Reader,Writer,Seeker,Closer}. It returns the given error to any call.
type File ¶
type File interface {
	// String returns a diagnostic string.
	String() string

	// Name returns the path name given to file.Open or file.Create when this
	// object was created.
	Name() string

	// Stat returns file metadata.
	//
	// REQUIRES: Close has not been called
	Stat(ctx context.Context) (Info, error)

	// Reader creates an io.ReadSeeker object that operates on the file. If
	// Reader() is called multiple times, they share the seek pointer.
	//
	// For emphasis: these share state, which is different from OffsetReader!
	//
	// REQUIRES: Close has not been called
	Reader(ctx context.Context) io.ReadSeeker

	// OffsetReader creates a new, independent ioctx.ReadCloser, starting at
	// offset. Unlike Reader, its position in the file is only modified by Read
	// on this object. The returned object is not thread-safe, and callers are
	// responsible for serializing all of their calls, including calling Close
	// after all Reads are done. Of course, callers can use separate
	// OffsetReaders in parallel.
	//
	// Background: This API reflects S3's performance characteristics, where
	// initiating a new read position is relatively expensive, but then
	// streaming data is fast (including in parallel with multiple readers).
	//
	// REQUIRES: Close has not been called
	OffsetReader(offset int64) ioctx.ReadCloser

	// Writer creates a writer that writes to the file. If Writer() is called
	// multiple times, they share the seek pointer.
	//
	// REQUIRES: Close has not been called
	Writer(ctx context.Context) io.Writer

	// Discard discards a file before it is closed, relinquishing any
	// temporary resources implied by pending writes. This should be
	// used if the caller decides not to complete writing the file.
	// Discard is a best-effort operation. Discard is not defined for
	// files opened for reading. Exactly one of Discard or Close should
	// be called. No other File, io.ReadSeeker, or io.Writer methods
	// shall be called after Discard.
	Discard(ctx context.Context)

	// Closer commits the contents of a written file, invalidating the
	// File and all Readers and Writers created from the file. Exactly
	// one of Discard or Close should be called. No other File,
	// io.ReadSeeker, or io.Writer methods shall be called after Close.
	Closer
}
File defines operations on a file. Implementations must be thread safe.
func Create ¶
Create opens the given file write-only. It is a shortcut for calling ParsePath(), then FindImplementation, then Implementation.Create.
type Implementation ¶
type Implementation interface {
	// String returns a diagnostic string.
	String() string

	// Open opens a file for reading. The pathname given to file.Open() is passed
	// here unchanged. Thus, it contains the URL prefix such as "s3://".
	//
	// Open returns an error of kind errors.NotExist if there is
	// no file at the provided path.
	Open(ctx context.Context, path string, opts ...Opts) (File, error)

	// Create opens a file for writing. If "path" already exists, the old contents
	// will be destroyed. If "path" does not exist already, the file will be newly
	// created. The pathname given to file.Create() is passed here unchanged.
	// Thus, it contains the URL prefix such as "s3://".
	//
	// Creating a file with the same name as an existing directory is unspecified
	// behavior and varies by implementation. Users are thus advised to avoid
	// this if possible.
	//
	// For filesystem based storage engines (e.g. localfile), if the directory
	// part of the path does not exist already, it will be created. If the path
	// is a directory, an error will be returned.
	//
	// For key based storage engines (e.g. S3), it is OK to create a file that
	// already exists as a common prefix for other objects, assuming a pseudo
	// path separator. So both "foo" and "foo/bar" can be used as paths for
	// creating regular files in the same storage. See List() for more context.
	Create(ctx context.Context, path string, opts ...Opts) (File, error)

	// List finds files and directories. If "path" points to a regular file, the
	// lister will return information about the file itself and finish.
	//
	// If "path" is a directory, the lister will list files and directories under
	// the given path. When "recursive" is set to false, List finds files "one
	// level" below dir. Dir may end in /, but need not. All the files and
	// directories returned by the lister will have pathnames of the form
	// dir/something.
	//
	// For key based storage engines (e.g. S3), a dir prefix not ending in "/"
	// must be followed immediately by "/" in some object keys, and only such
	// keys will be returned.
	//
	// With "recursive=true" List finds all files whose pathnames are under "dir"
	// or its subdirectories. All the files returned by the lister will have
	// pathnames of the form dir/something. Directories will not be returned as
	// separate entities. For example, List(ctx, "foo", true) will yield
	// "foo/bar/bat.txt", but not "foo.txt" or "foo/bar/", while
	// List(ctx, "foo", false) will yield "foo/bar" and "foo/bat.txt", but not
	// "foo.txt" or "foo/bar/bat.txt". There is no difference in the return value
	// of List(ctx, "foo", ...) and List(ctx, "foo/", ...).
	List(ctx context.Context, path string, recursive bool) Lister

	// Stat returns the file metadata. It returns nil if path is
	// a directory. (There is no direct test for existence of a
	// directory.)
	//
	// Stat returns an error of kind errors.NotExist if there is
	// no file at the provided path.
	Stat(ctx context.Context, path string, opts ...Opts) (Info, error)

	// Remove removes the file. The path passed to file.Remove() is passed here
	// unchanged.
	Remove(ctx context.Context, path string) error

	// Presign returns a URL that can be used to perform the given HTTP method,
	// usually one of "GET", "PUT" or "DELETE", on the path for the duration
	// specified in expiry.
	//
	// It returns an error of kind errors.NotSupported for implementations that
	// do not support signed URLs, or that do not support the given HTTP method.
	//
	// Unlike Open and Stat, this method does not return an error of kind
	// errors.NotExist if there is no file at the provided path.
	Presign(ctx context.Context, path, method string, expiry time.Duration) (url string, err error)
}
Implementation implements operations for a file-system type. Thread safe.
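For key-based stores, List's directory semantics can be simulated over flat object keys. listKeys and listEntry are hypothetical names, and the sketch assumes an in-memory slice of keys. It omits the case where path names a regular file. A real S3 implementation would let the service handle prefixes and delimiters. With the keys from the List comment, it reproduces the documented results for both recursive settings.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// listEntry mirrors what a Lister reports: a path and whether it is a directory.
type listEntry struct {
	Path  string
	IsDir bool
}

// listKeys simulates List(ctx, dir, recursive) over flat object keys. Only keys
// under dir + "/" match, so "foo" and "foo/" behave the same and "foo.txt" is
// never returned for dir "foo".
func listKeys(keys []string, dir string, recursive bool) []listEntry {
	prefix := strings.TrimSuffix(dir, "/") + "/"
	seen := map[string]bool{}
	var out []listEntry
	for _, k := range keys {
		if !strings.HasPrefix(k, prefix) {
			continue
		}
		if recursive {
			// Recursive listings return files only; no directory entries.
			out = append(out, listEntry{Path: k})
			continue
		}
		rest := k[len(prefix):]
		// Non-recursive: collapse deeper keys into one "directory" entry.
		if i := strings.IndexByte(rest, '/'); i >= 0 {
			d := prefix + rest[:i]
			if !seen[d] {
				seen[d] = true
				out = append(out, listEntry{Path: d, IsDir: true})
			}
			continue
		}
		out = append(out, listEntry{Path: k})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

func main() {
	keys := []string{"foo.txt", "foo/bar/bat.txt", "foo/bat.txt"}
	fmt.Println(listKeys(keys, "foo", true))
	fmt.Println(listKeys(keys, "foo/", false))
}
```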
func FindImplementation ¶
func FindImplementation(scheme string) Implementation
FindImplementation returns an Implementation object registered for the given scheme. It returns nil if the scheme is not registered.
func NewLocalImplementation ¶
func NewLocalImplementation() Implementation
NewLocalImplementation returns a new file.Implementation for the local file system that uses Go's native os package. This function is only for unit tests. Applications should use functions such as file.Open and file.Create to access the local file system.
type Info ¶
type Info interface {
	// Size returns the length of the file in bytes for regular files;
	// system-dependent for others.
	Size() int64

	// ModTime returns modification time for regular files; system-dependent
	// for others.
	ModTime() time.Time
}
Info represents file metadata.
type Lister ¶
type Lister interface {
	// Scan advances the lister to the next entry. It returns false either
	// when the scan stops because we have reached the end of the input or
	// because there was an error. After Scan returns, the Err method returns
	// any error that occurred during scanning.
	Scan() bool

	// Err returns the first error that occurred while scanning.
	Err() error

	// Path returns the last path that was scanned. The path always starts with
	// the directory path given to the List method.
	//
	// REQUIRES: Last call to Scan returned true.
	Path() string

	// IsDir() returns true if Path() refers to a directory in a file system
	// or a common prefix ending in "/" in S3.
	//
	// REQUIRES: Last call to Scan returned true.
	IsDir() bool

	// Info returns metadata of the file that was scanned.
	//
	// REQUIRES: Last call to Scan returned true.
	Info() Info
}
Lister lists files in a directory tree. Not thread safe.
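Lister follows the same Scan/Err idiom as bufio.Scanner: loop while Scan returns true, then check Err to distinguish normal completion from failure. In this sketch, lister is a trimmed, hypothetical copy of the documented methods, and sliceLister is an in-memory stand-in for a real listing.

```go
package main

import "fmt"

// lister is a trimmed, hypothetical copy of file.Lister's iteration methods.
type lister interface {
	Scan() bool
	Err() error
	Path() string
	IsDir() bool
}

// sliceLister replays a fixed list of paths; dirs marks directory entries.
type sliceLister struct {
	paths []string
	dirs  map[string]bool
	i     int
}

func (l *sliceLister) Scan() bool {
	if l.i >= len(l.paths) {
		return false
	}
	l.i++
	return true
}

func (l *sliceLister) Err() error { return nil }

func (l *sliceLister) Path() string { return l.paths[l.i-1] }

func (l *sliceLister) IsDir() bool { return l.dirs[l.Path()] }

// collectFiles drains a lister, skipping directories, and reports any error.
func collectFiles(l lister) ([]string, error) {
	var files []string
	for l.Scan() {
		if l.IsDir() {
			continue
		}
		files = append(files, l.Path())
	}
	// Scan returning false may mean "done" or "failed"; Err tells them apart.
	if err := l.Err(); err != nil {
		return nil, err
	}
	return files, nil
}

func main() {
	l := &sliceLister{
		paths: []string{"foo/bar", "foo/bat.txt"},
		dirs:  map[string]bool{"foo/bar": true},
	}
	files, err := collectFiles(l)
	fmt.Println(files, err)
}
```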
type Opts ¶
type Opts struct {
	// When set, this flag causes the file package to keep retrying when the file
	// is reported as not found. This flag should be set when:
	//
	// 1. you are accessing a file on S3, and
	//
	// 2. an application may have attempted to GET the same file in the recent
	// past (~5 minutes). The said application may be on a different machine.
	//
	// This flag is honored only by S3 to work around the problem where s3 may
	// report spurious KeyNotFound error after a GET request to the same file.
	// For more details, see
	// https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#CoreConcepts,
	// section "S3 Data Consistency Model". In particular:
	//
	//   The caveat is that if you make a HEAD or GET request to the key
	//   name (to find if the object exists) before creating the object, Amazon S3
	//   provides eventual consistency for read-after-write.
	RetryWhenNotFound bool

	// When set, Close will ignore NoSuchUpload error from S3
	// CompleteMultiPartUpload and silently return OK.
	//
	// This is to work around a bug where concurrent uploads to one file sometimes
	// cause an upload request to be lost on the server side.
	// https://console.aws.amazon.com/support/cases?region=us-west-2#/6299905521/en
	// https://github.com/yasushi-saito/s3uploaderror
	//
	// Set this flag only if:
	//
	// 1. you are writing to a file on S3, and
	//
	// 2. possible concurrent writes to the same file produce the same
	// contents, so you are ok with taking any of them.
	//
	// If you don't set this flag, then concurrent writes to the same file may
	// fail with a NoSuchUpload error, and it is up to you to retry.
	//
	// On non-S3 file systems, this flag is ignored.
	IgnoreNoSuchUpload bool
}
Opts controls the file access requests, such as Open and Stat.
Directories ¶
Path | Synopsis |
---|---|
fsnode | fsnode represents a filesystem as a directed graph (probably a tree for many implementations). |
fsnodefuse | fsnodefuse implements github.com/hanwen/go-fuse/v2/fs for fsnode.T. It's a work-in-progress. |
internal | |
s3file | Package s3file implements grail file interface for S3. |
internal/cmd/resolvetest | resolvetest simply resolves a hostname at an increasing time interval to observe the diversity in DNS lookup addresses for the host. |