cloudpath

package
v0.0.9
Warning

This package is not in the latest version of its module.

Published: Feb 4, 2024 License: Apache-2.0 Imports: 4 Imported by: 5

README

Package cloudeng.io/path/cloudpath

import cloudeng.io/path/cloudpath

Package cloudpath provides utility routines for working with paths across both local and distributed storage systems. The set of schemes supported can be extended by providing additional implementations of the Matcher function. A cloudpath encodes two types of information:

  1. the path name itself which can be used to access the data it names.
  2. metadata about where that filename is hosted.

For example, s3://my-bucket/a/b contains the path '/my-bucket/a/b' as well as the indication that this path is hosted on S3. Most cloud storage systems either use URI formats natively or support their use. Both AWS S3 and Google Cloud Storage support URLs: eg. storage.cloud.google.com/bucket/obj.

cloudpath provides operations for extracting both metadata and the path component, and operations for working with the extracted path directly. A common usage is to determine the 'scheme' (eg. s3, windows, unix etc) of a filename and to then operate on it appropriately. cloudpath represents a 'path' as a slice of strings to simplify frequently performed operations, such as finding common prefixes and suffixes, in a way that is aware of the structure of the path. For example it should be possible to easily determine that s3://bucket/a/b is a prefix of https://s3.us-west-2.amazonaws.com/bucket/a/b/c.
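
As a rough illustration of why a slice of components helps, the sketch below contrasts a plain string prefix test with a component-wise one. The hasPrefix helper is a hypothetical stand-in written for this example (it is not the package's HasPrefix), and it splits on '/' with strings.Split rather than using cloudpath.T.

```go
package main

import (
	"fmt"
	"strings"
)

// hasPrefix is a hypothetical helper for illustration only: it reports
// whether every component of prefix matches the leading components of path.
func hasPrefix(path, prefix []string) bool {
	if len(prefix) > len(path) {
		return false
	}
	for i, c := range prefix {
		if path[i] != c {
			return false
		}
	}
	return true
}

func main() {
	a, b := "bucket/a/b", "bucket/a/bc/d"
	// A string comparison is fooled by the partial component "bc".
	fmt.Println(strings.HasPrefix(b, a)) // true
	// A component-wise comparison is not.
	fmt.Println(hasPrefix(strings.Split(b, "/"), strings.Split(a, "/"))) // false
}
```

The component-wise test only succeeds when whole path elements agree, which is the structure-aware behaviour the paragraph above describes.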

All of the metadata for a path is represented using the Match type.

For manipulation, the path is converted to a cloudpath.T.

Constants

AWSS3, GoogleCloudStorage, UnixFileSystem, WindowsFileSystem, HTTP, HTTPS
// AWSS3 is the scheme for Amazon Web Service's S3 object store.
AWSS3 = "s3"
// GoogleCloudStorage is the scheme for Google's Cloud Storage object store.
GoogleCloudStorage = "gs"
// UnixFileSystem is the scheme for unix like systems such as linux, macos etc.
UnixFileSystem = "unix"
// WindowsFileSystem is the scheme for msdos and windows filesystems.
WindowsFileSystem = "windows"
// HTTP is the scheme for http.
HTTP = "http"
// HTTPS is the scheme for https.
HTTPS = "https"

Functions

Func HasPrefix
func HasPrefix(path, prefix []string) bool

HasPrefix returns true if path has the specified prefix.

Func Host
func Host(path string) string

Host calls DefaultMatchers.Host(path).

Func IsLocal
func IsLocal(path string) bool

IsLocal calls DefaultMatchers.IsLocal(path).

Func Key
func Key(path string) (string, rune)

Key calls DefaultMatchers.Key(path).

Func Parameters
func Parameters(path string) map[string][]string

Parameters calls DefaultMatchers.Parameters(path).

Func Path
func Path(path string) (string, rune)

Path calls DefaultMatchers.Path(path).

Func Region
func Region(url string) string

Region calls DefaultMatchers.Region(url).

Func Scheme
func Scheme(path string) string

Scheme calls DefaultMatchers.Scheme(path).

Func Volume
func Volume(path string) string

Volume calls DefaultMatchers.Volume(path).

Types

Type Match
type Match struct {
	// Matched is the original string that was matched.
	Matched string
	// Scheme uniquely identifies the filesystem being used, eg. s3 or windows.
	Scheme string
	// Local is true for local filesystems.
	Local bool
	// Host will be 'localhost' for local filesystems, the host encoded
	// in a URL or otherwise empty if there is no notion of a host.
	Host string
	// Volume will be the bucket or file system share for systems that support
	// that concept, or an empty string otherwise.
	Volume string
	// Path is the filesystem path or filename to the data. It may be a prefix
	// on a cloud based system or a directory on a local one.
	Path string
	// Key is like Path except without the volume for systems where the volume
	// can appear in the path name.
	Key string
	// Region is the region for cloud based systems.
	Region string
	// Separator is the filesystem separator (e.g / or \ for windows).
	Separator rune
	// Parameters are any parameters encoded in a URL/URI based name.
	Parameters map[string][]string
}

Match is the result of a successful match.

Functions
func AWSS3Matcher(p string) Match

AWSS3Matcher implements Matcher for AWS S3 object names. It returns AWSS3 for its scheme result.

func GoogleCloudStorageMatcher(p string) Match

GoogleCloudStorageMatcher implements Matcher for Google Cloud Storage object names. It returns GoogleCloudStorage for its scheme result.

func URLMatcher(p string) Match

URLMatcher implements Matcher for http and https paths.

func UnixMatcher(p string) Match

UnixMatcher implements Matcher for unix filenames. It returns UnixFileSystem for its scheme result. It will match on file://[HOST]/[PATH].

func WindowsMatcher(p string) Match

WindowsMatcher implements Matcher for Windows filenames. It returns WindowsFileSystem for its scheme result.

Type Matcher
type Matcher func(p string) Match

Matcher is the prototype for functions that parse the supplied path to determine if it matches a specific scheme and then break out the metadata encoded in the path. If Match.Matched is empty then no match has been found. Matchers for local filesystems should return "" for the host.

Type MatcherSpec
type MatcherSpec []Matcher

MatcherSpec represents a set of Matchers that will be applied in order. The ordering is important: the most specific matchers need to be applied first. For example a matcher for Windows should precede that for a Unix filesystem since the latter can accept filenames in Windows format.

Variables
DefaultMatchers
DefaultMatchers MatcherSpec = []Matcher{
	AWSS3Matcher,
	GoogleCloudStorageMatcher,
	URLMatcher,
	WindowsMatcher,
	UnixMatcher,
}

DefaultMatchers represents the built in set of Matchers.

Methods
func (ms MatcherSpec) Host(path string) string

Host returns the host component of the path if there is one.

func (ms *MatcherSpec) IsLocal(path string) bool

IsLocal returns true if the path is for a local filesystem.

func (ms MatcherSpec) Key(path string) (string, rune)

Key returns the key component of path and the separator to use for it.

func (ms MatcherSpec) Match(p string) Match

Match applies all of the matchers in turn to match the supplied path.

func (ms *MatcherSpec) Parameters(path string) map[string][]string

Parameters returns the parameters in path, if any. If no parameters are present an empty (rather than nil) map is returned.

func (ms MatcherSpec) Path(path string) (string, rune)

Path returns the path component of path and the separator to use for it.

func (ms MatcherSpec) Region(url string) string

Region returns the region component for cloud based systems.

func (ms MatcherSpec) Scheme(path string) string

Scheme returns the portion of the path that precedes a leading '//' or "" otherwise.

func (ms MatcherSpec) Volume(path string) string

Volume returns the filesystem specific volume, if any, encoded in the path.

Type T
type T []string

T represents a cloudpath. Instances of T are created from native storage system paths and/or URLs and are designed to retain the following information.

  1. the path was absolute vs relative.
  2. the path was a prefix or a filepath.
  3. a path of zero length is represented as a nil slice and not an empty slice.

Redundant information is discarded:

  1. multiple consecutive instances of separator are treated as a single separator.

The resulting format is as follows:

  1. an absolute path, ie. one that starts with a separator, has an empty string as the first item in the slice
  2. a path that ends with a separator has an empty string as the final component of the path

For example:

""         => []                 // empty
"/"        => ["", ""]           // absolute, prefix, IsRoot is true
"/abc"     => ["", "abc"]        // absolute, filepath
"abc"      => ["abc"]            // relative, filepath
"/abc/"    => ["", "abc", ""]    // absolute, prefix, IsRoot is false
"abc/"     => ["abc", ""]        // relative, prefix

T is defined as a type rather than using []string directly to avoid clients of this package misinterpreting the above rules and incorrectly manipulating the string slice.

Functions
func LongestCommonPrefix(paths []T) T

LongestCommonPrefix returns the longest prefix common to the specified cloudpaths.

func LongestCommonSuffix(paths []T) T

LongestCommonSuffix returns the longest suffix common to the specified cloudpaths.

func Split(path string, separator rune) T

Split slices path into an instance of T.

func SplitPath(path string) T

SplitPath calls Split with the results of cloudpath.Path(path).

func TrimPrefix(path, prefix []string) T

TrimPrefix removes the specified prefix from path. It returns nil if path and prefix are identical.

Methods
func (path T) AsFilepath() T

AsFilepath returns path as a filepath if it is not already one, provided that it is not a root or empty.

func (path T) AsPrefix() T

AsPrefix returns path as a path prefix if it is not already one.

func (path T) Base() string

Base returns the 'base', or 'filename' component of path, ie. the last one. If the path is a prefix then an empty string is returned.

func (path T) HasSuffix(suffix T) bool

HasSuffix returns true if path has the specified suffix.

func (path T) IsAbsolute() bool

IsAbsolute returns true if the components were derived from an absolute path.

func (path T) IsFilepath() bool

IsFilepath returns true if the path was derived from a filepath.

func (path T) IsRoot() bool

IsRoot returns true if the path was derived from the 'root', ie. a single separator such as /.

func (path T) Join(separator rune) string

Join creates a string path from the supplied components. It follows the rules specified for the package-level Join function. It is the inverse of Split for paths that contain no repeated separators, that is, newPath == origPath for:

newPath = Split(origPath, sep).Join(sep)

func (path T) Pop() (T, string)

Pop returns a new cloudpath.T with the trailing component removed and returned. Pop on a path for which IsRoot is true will return the root again. IsFilepath will always be false for the returned cloudpath.T.

func (path T) Prefix() T

Prefix returns the prefix component of a path.

func (path T) Push(p string) T

Push returns a new cloudpath.T with the supplied component appended. IsFilepath will always be true for the returned value unless p is an empty string in which case Push is equivalent to path.AsFilepath().

func (path T) String() string

String implements fmt.Stringer. It calls path.Join with / as the separator.

func (path T) TrimSuffix(suffix T) T

TrimSuffix removes the specified suffix from path. It returns nil if path and suffix are identical.

Examples

ExampleScheme

Documentation

Constants

const (
	// AWSS3 is the scheme for Amazon Web Service's S3 object store.
	AWSS3 = "s3"
	// GoogleCloudStorage is the scheme for Google's Cloud Storage object store.
	GoogleCloudStorage = "gs"
	// UnixFileSystem is the scheme for unix like systems such as linux, macos etc.
	UnixFileSystem = "unix"
	// WindowsFileSystem is the scheme for msdos and windows filesystems.
	WindowsFileSystem = "windows"
	// HTTP is the scheme for http.
	HTTP = "http"
	// HTTPS is the scheme for https.
	HTTPS = "https"
)

Variables

This section is empty.

Functions

func Base

func Base(scheme string, separator byte, path string) string

Base is like path.Base but for cloud storage paths which may include a scheme (eg. s3://). It does not support URI host names, parameters etc. In particular:

  • the scheme parameter should include the trailing :// or be the empty string.
  • a trailing separator means that the path is a prefix with an empty base and hence Base returns "".
  • the returned basename never includes the supplied scheme.
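
The rules above can be sketched with the standard library as follows. The base function here is a hypothetical re-implementation written for illustration only; the package's own Base may treat edge cases differently.

```go
package main

import (
	"fmt"
	"strings"
)

// base is a hypothetical sketch of the documented rules; it is not the
// package's own code.
func base(scheme string, sep byte, path string) string {
	// The scheme, if supplied, is expected to include the trailing "://".
	path = strings.TrimPrefix(path, scheme)
	// A trailing separator marks a prefix, which has an empty base.
	if path == "" || path[len(path)-1] == sep {
		return ""
	}
	if i := strings.LastIndexByte(path, sep); i >= 0 {
		return path[i+1:]
	}
	return path
}

func main() {
	fmt.Printf("%q\n", base("s3://", '/', "s3://bucket/a/b")) // "b"
	fmt.Printf("%q\n", base("s3://", '/', "s3://bucket/a/"))  // ""
	fmt.Printf("%q\n", base("", '/', "a/b/c"))                // "c"
}
```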

func HasPrefix

func HasPrefix(path, prefix []string) bool

HasPrefix returns true if path has the specified prefix.

func Host

func Host(path string) string

Host calls DefaultMatchers.Host(path).

func IsLocal

func IsLocal(path string) bool

IsLocal calls DefaultMatchers.IsLocal(path).

func Join

func Join(sep byte, components []string) string

Join joins the supplied components using the supplied separator, with behaviour appropriate for cloud storage paths, which do not elide multiple contiguous separators. It behaves as follows:

  • empty components are ignored.
  • trailing instances of sep are preserved.
  • separators are added only when not already present as a trailing character in the previous component and leading character in the next component.
  • a leading separator is ignored/removed if the previous component ended with a separator and the next component starts with a separator.
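
One possible reading of these rules is sketched below. The join function is a hypothetical re-implementation for illustration; in particular, how the real Join treats a component consisting only of separators is an assumption here.

```go
package main

import "fmt"

// join is a hypothetical sketch of the rules listed above; the package's
// own Join may differ in edge cases.
func join(sep byte, components []string) string {
	var out []byte
	for _, c := range components {
		if c == "" {
			continue // empty components are ignored
		}
		if len(out) > 0 {
			prevEnds := out[len(out)-1] == sep
			nextStarts := c[0] == sep
			switch {
			case prevEnds && nextStarts:
				c = c[1:] // avoid doubling the separator at the boundary
			case !prevEnds && !nextStarts:
				out = append(out, sep) // neither side supplies a separator
			}
		}
		out = append(out, c...)
	}
	return string(out)
}

func main() {
	fmt.Println(join('/', []string{"bucket", "a", "b"}))    // bucket/a/b
	fmt.Println(join('/', []string{"bucket/", "/a", "b/"})) // bucket/a/b/
	fmt.Println(join('/', []string{"a", "", "b//c"}))       // a/b//c
}
```

Note that separators inside a component, such as the "//" in the last call, are left untouched; only the boundary between two components is adjusted.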

func Key added in v0.0.5

func Key(path string) (string, rune)

Key calls DefaultMatchers.Key(path).

func Parameters

func Parameters(path string) map[string][]string

Parameters calls DefaultMatchers.Parameters(path).

func Path

func Path(path string) (string, rune)

Path calls DefaultMatchers.Path(path).

func Prefix

func Prefix(scheme string, separator byte, path string) string

Prefix is like path.Dir but for cloud storage paths which may include a scheme (eg. s3://). It does not support URI host names, parameters etc. In particular:

  • the scheme parameter should include the trailing :// or be the empty string.
  • the returned prefix never includes the supplied scheme.
  • the returned prefix never includes a trailing separator.
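
A minimal sketch of these rules follows. The prefix function is hypothetical and written for illustration; returning "" for a path that contains no separator is an assumption of this sketch, not documented behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// prefix is a hypothetical sketch of the documented rules; it is not the
// package's own code.
func prefix(scheme string, sep byte, path string) string {
	// The scheme, if supplied, is expected to include the trailing "://".
	path = strings.TrimPrefix(path, scheme)
	i := strings.LastIndexByte(path, sep)
	if i < 0 {
		return "" // assumption: no separator means an empty prefix
	}
	// Drop the final component and any trailing separators.
	return strings.TrimRight(path[:i], string(sep))
}

func main() {
	fmt.Printf("%q\n", prefix("s3://", '/', "s3://bucket/a/b")) // "bucket/a"
	fmt.Printf("%q\n", prefix("s3://", '/', "s3://bucket/a/"))  // "bucket/a"
	fmt.Printf("%q\n", prefix("", '/', "file"))                 // ""
}
```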

func Region added in v0.0.5

func Region(url string) string

Region calls DefaultMatchers.Region(url).

func Scheme

func Scheme(path string) string

Scheme calls DefaultMatchers.Scheme(path).

Example
package main

import (
	"fmt"

	"cloudeng.io/path/cloudpath"
)

func main() {
	for _, example := range []string{
		"s3://my-bucket/object",
		"https://storage.cloud.google.com/bucket/obj",
		"gs://my-bucket",
		`c:\root\file`,
	} {
		scheme := cloudpath.Scheme(example)
		local := cloudpath.IsLocal(example)
		host := cloudpath.Host(example)
		volume := cloudpath.Volume(example)
		path, sep := cloudpath.Path(example)
		key, _ := cloudpath.Key(example)
		region := cloudpath.Region(example)
		parameters := cloudpath.Parameters(example)
		fmt.Printf("%v %q %q %q %q %q %q %c %v\n", local, scheme, host, region, volume, path, key, sep, parameters)
	}
}

Output:

false "s3" "" "" "my-bucket" "my-bucket/object" "object" / map[]
false "gs" "storage.cloud.google.com" "" "bucket" "/bucket/obj" "obj" / map[]
false "gs" "" "" "my-bucket" "my-bucket" "" / map[]
true "windows" "" "" "c" "c:\\root\\file" "\\root\\file" \ map[]

func Volume

func Volume(path string) string

Volume calls DefaultMatchers.Volume(path).

Types

type Match

type Match struct {
	// Matched is the original string that was matched.
	Matched string
	// Scheme uniquely identifies the filesystem being used, eg. s3 or windows.
	Scheme string
	// Local is true for local filesystems.
	Local bool
	// Host will be 'localhost' for local filesystems, the host encoded
	// in a URL or otherwise empty if there is no notion of a host.
	Host string
	// Volume will be the bucket or file system share for systems that support
	// that concept, or an empty string otherwise.
	Volume string
	// Path is the filesystem path or filename to the data. It may be a prefix
	// on a cloud based system or a directory on a local one.
	Path string
	// Key is like Path except without the volume for systems where the volume
	// can appear in the path name.
	Key string
	// Region is the region for cloud based systems.
	Region string
	// Separator is the filesystem separator (e.g / or \ for windows).
	Separator rune
	// Parameters are any parameters encoded in a URL/URI based name.
	Parameters map[string][]string
}

Match is the result of a successful match.

func AWSS3Matcher

func AWSS3Matcher(p string) Match

AWSS3Matcher implements Matcher for AWS S3 object names assuming '/' as the separator. It returns AWSS3 for its scheme result.

func AWSS3MatcherSep added in v0.0.9

func AWSS3MatcherSep(p string, sep byte) Match

func GoogleCloudStorageMatcher

func GoogleCloudStorageMatcher(p string) Match

GoogleCloudStorageMatcher implements Matcher for Google Cloud Storage object names. It returns GoogleCloudStorage for its scheme result.

func URLMatcher added in v0.0.8

func URLMatcher(p string) Match

URLMatcher implements Matcher for http and https paths.

func UnixMatcher

func UnixMatcher(p string) Match

UnixMatcher implements Matcher for unix filenames. It returns UnixFileSystem for its scheme result. It will match on file://[HOST]/[PATH].

func WindowsMatcher

func WindowsMatcher(p string) Match

WindowsMatcher implements Matcher for Windows filenames. It returns WindowsFileSystem for its scheme result.

type Matcher

type Matcher func(p string) Match

Matcher is the prototype for functions that parse the supplied path to determine if it matches a specific scheme and then break out the metadata encoded in the path. If Match.Matched is empty then no match has been found. Matchers for local filesystems should return "" for the host.

type MatcherSpec

type MatcherSpec []Matcher

MatcherSpec represents a set of Matchers that will be applied in order. The ordering is important: the most specific matchers need to be applied first. For example a matcher for Windows should precede that for a Unix filesystem since the latter can accept filenames in Windows format.
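
The effect of ordering can be seen in a small, self-contained sketch. The match and matcher types and the windowsLike, unixLike and first functions below are simplified, hypothetical stand-ins invented for this example; they are not the package's Match, Matcher or MatcherSpec.Match.

```go
package main

import "fmt"

// match and matcher are simplified stand-ins for illustration only.
type match struct {
	Matched string
	Scheme  string
}

type matcher func(p string) match

func isLetter(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// windowsLike matches names that start with a drive letter, eg. c:\dir.
func windowsLike(p string) match {
	if len(p) >= 2 && isLetter(p[0]) && p[1] == ':' {
		return match{Matched: p, Scheme: "windows"}
	}
	return match{}
}

// unixLike accepts any non-empty name, so it is the least specific.
func unixLike(p string) match {
	if p == "" {
		return match{}
	}
	return match{Matched: p, Scheme: "unix"}
}

// first applies the matchers in order and returns the first result whose
// Matched field is non-empty.
func first(ms []matcher, p string) match {
	for _, m := range ms {
		if r := m(p); r.Matched != "" {
			return r
		}
	}
	return match{}
}

func main() {
	name := `c:\root\file`
	fmt.Println(first([]matcher{windowsLike, unixLike}, name).Scheme) // windows
	fmt.Println(first([]matcher{unixLike, windowsLike}, name).Scheme) // unix
}
```

With the less specific matcher first, the Windows name is claimed by the Unix matcher, which is why the more specific matchers need to come earlier in the list.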

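var DefaultMatchers MatcherSpec = []Matcher{
	AWSS3Matcher,
	GoogleCloudStorageMatcher,
	URLMatcher,
	WindowsMatcher,
	UnixMatcher,
}

DefaultMatchers represents the built-in set of Matchers.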

func (MatcherSpec) Host

func (ms MatcherSpec) Host(path string) string

Host returns the host component of the path if there is one.

func (*MatcherSpec) IsLocal

func (ms *MatcherSpec) IsLocal(path string) bool

IsLocal returns true if the path is for a local filesystem.

func (MatcherSpec) Key added in v0.0.5

func (ms MatcherSpec) Key(path string) (string, rune)

Key returns the key component of path and the separator to use for it.

func (MatcherSpec) Match

func (ms MatcherSpec) Match(p string) Match

Match applies all of the matchers in turn to match the supplied path.

func (*MatcherSpec) Parameters

func (ms *MatcherSpec) Parameters(path string) map[string][]string

Parameters returns the parameters in path, if any. If no parameters are present an empty (rather than nil) map is returned.
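
For readers curious how an empty but non-nil map can arise, the sketch below uses the standard library's net/url package. The parameters function is a hypothetical helper for illustration; whether the package itself uses net/url in this way is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// parameters is a hypothetical helper, not the package's own code.
func parameters(path string) map[string][]string {
	u, err := url.Parse(path)
	if err != nil {
		// Keep the "never nil" contract even for unparseable input.
		return map[string][]string{}
	}
	// Query always returns a non-nil url.Values, whose underlying type
	// is map[string][]string.
	return u.Query()
}

func main() {
	fmt.Println(parameters("https://host/bucket/obj?a=1&a=2&b=3"))
	p := parameters("s3://bucket/obj")
	fmt.Println(p != nil, len(p))
}
```

Callers can therefore range over or index the result without a nil check.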

func (MatcherSpec) Path

func (ms MatcherSpec) Path(path string) (string, rune)

Path returns the path component of path and the separator to use for it.

func (MatcherSpec) Region added in v0.0.5

func (ms MatcherSpec) Region(url string) string

Region returns the region component for cloud based systems.

func (MatcherSpec) Scheme

func (ms MatcherSpec) Scheme(path string) string

Scheme returns the portion of the path that precedes a leading '//' or "" otherwise.

func (MatcherSpec) Volume

func (ms MatcherSpec) Volume(path string) string

Volume returns the filesystem specific volume, if any, encoded in the path.

type T added in v0.0.3

type T []string

T represents a cloudpath. Instances of T are created from native storage system paths and/or URLs and are designed to retain the following information.

  1. the path was absolute vs relative.
  2. the path was a prefix or a filepath.
  3. a path of zero length is represented as a nil slice and not an empty slice.

Redundant information is discarded:

  1. multiple consecutive instances of separator are treated as a single separator.

The resulting format is as follows:

  1. an absolute path, ie. one that starts with a separator, has an empty string as the first item in the slice
  2. a path that ends with a separator has an empty string as the final component of the path

For example:

""         => []                 // empty
"/"        => ["", ""]           // absolute, prefix, IsRoot is true
"/abc"     => ["", "abc"]        // absolute, filepath
"abc"      => ["abc"]            // relative, filepath
"/abc/"    => ["", "abc", ""]    // absolute, prefix, IsRoot is false
"abc/"     => ["abc", ""]        // relative, prefix

T is defined as a type rather than using []string directly to avoid clients of this package misinterpreting the above rules and incorrectly manipulating the string slice.
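
The representation rules can be sketched in a few lines of standard-library Go. The split, isAbsolute, isFilepath and isRoot functions below are hypothetical re-implementations derived from the table above, operating on a plain []string; they are not the package's Split or the methods on T.

```go
package main

import (
	"fmt"
	"strings"
)

// split is a hypothetical sketch of the documented rules.
func split(path string, sep rune) []string {
	if path == "" {
		return nil // zero-length paths are represented as a nil slice
	}
	s := string(sep)
	var out []string
	if strings.HasPrefix(path, s) {
		out = append(out, "") // a leading "" marks an absolute path
	}
	// FieldsFunc drops empty fields, which collapses repeated separators.
	out = append(out, strings.FieldsFunc(path, func(r rune) bool { return r == sep })...)
	if strings.HasSuffix(path, s) {
		out = append(out, "") // a trailing "" marks a prefix
	}
	return out
}

func isAbsolute(p []string) bool { return len(p) > 0 && p[0] == "" }

func isFilepath(p []string) bool { return len(p) > 0 && p[len(p)-1] != "" }

func isRoot(p []string) bool { return len(p) == 2 && p[0] == "" && p[1] == "" }

func main() {
	for _, in := range []string{"", "/", "/abc", "abc", "/abc/", "abc/", "a//b"} {
		p := split(in, '/')
		fmt.Printf("%q => %q absolute=%v filepath=%v root=%v\n",
			in, p, isAbsolute(p), isFilepath(p), isRoot(p))
	}
}
```

Because the leading and trailing empty strings carry meaning, appending to or slicing such a value by hand can silently change whether it is absolute or a prefix, which is the motivation given above for a distinct type.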

func LongestCommonPrefix

func LongestCommonPrefix(paths []T) T

LongestCommonPrefix returns the longest prefix common to the specified cloudpaths.
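
A component-wise longest common prefix can be computed as in the sketch below. The longestCommonPrefix function is a hypothetical illustration over plain [][]string values; the package's LongestCommonPrefix may additionally normalise its result, for example by marking it as a prefix.

```go
package main

import "fmt"

// longestCommonPrefix is a hypothetical sketch, not the package's own code.
// It returns the leading components shared by every path.
func longestCommonPrefix(paths [][]string) []string {
	if len(paths) == 0 {
		return nil
	}
	prefix := paths[0]
	for _, p := range paths[1:] {
		n := 0
		for n < len(prefix) && n < len(p) && prefix[n] == p[n] {
			n++
		}
		prefix = prefix[:n] // shrink to the part shared so far
	}
	return prefix
}

func main() {
	paths := [][]string{
		{"", "a", "b", "c"},
		{"", "a", "b", "d"},
		{"", "a", "bc"},
	}
	fmt.Printf("%q\n", longestCommonPrefix(paths)) // ["" "a"]
}
```

Comparing whole components means that "b" and "bc" do not count as sharing a prefix, unlike a character-by-character comparison of the original strings.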

func LongestCommonSuffix

func LongestCommonSuffix(paths []T) T

LongestCommonSuffix returns the longest suffix common to the specified cloudpaths.

func Split

func Split(path string, separator rune) T

Split slices path into an instance of T.

func SplitPath

func SplitPath(path string) T

SplitPath calls Split with the results of cloudpath.Path(path).

func TrimPrefix

func TrimPrefix(path, prefix []string) T

TrimPrefix removes the specified prefix from path. It returns nil if path and prefix are identical.

func (T) AsFilepath added in v0.0.3

func (path T) AsFilepath() T

AsFilepath returns path as a filepath if it is not already one, provided that it is not a root or empty.

func (T) AsPrefix added in v0.0.3

func (path T) AsPrefix() T

AsPrefix returns path as a path prefix if it is not already one.

func (T) Base added in v0.0.3

func (path T) Base() string

Base returns the 'base', or 'filename' component of path, ie. the last one. If the path is a prefix then an empty string is returned.

func (T) HasSuffix added in v0.0.3

func (path T) HasSuffix(suffix T) bool

HasSuffix returns true if path has the specified suffix.

func (T) IsAbsolute added in v0.0.3

func (path T) IsAbsolute() bool

IsAbsolute returns true if the components were derived from an absolute path.

func (T) IsFilepath added in v0.0.3

func (path T) IsFilepath() bool

IsFilepath returns true if the path was derived from a filepath.

func (T) IsRoot added in v0.0.3

func (path T) IsRoot() bool

IsRoot returns true if the path was derived from the 'root', ie. a single separator such as /.

func (T) Join added in v0.0.3

func (path T) Join(separator rune) string

Join creates a string path from the supplied components. It follows the rules specified for the package-level Join function. It is the inverse of Split for paths that contain no repeated separators, that is, newPath == origPath for:

newPath = Split(origPath, sep).Join(sep)

func (T) Pop added in v0.0.3

func (path T) Pop() (T, string)

Pop returns a new cloudpath.T with the trailing component removed and returned. Pop on a path for which IsRoot is true will return the root again. IsFilepath will always be false for the returned cloudpath.T.

func (T) Prefix added in v0.0.3

func (path T) Prefix() T

Prefix returns the prefix component of a path.

Example
package main

import (
	"fmt"

	"cloudeng.io/path/cloudpath"
)

func main() {
	date := cloudpath.Split("2012-11-27", '/').AsPrefix()
	for _, fullname := range []string{
		"s3://my-bucket/2012-11-27/shard-0000-of-0001.json",
		"/my-local-copy/2012-11-27/shard-0000-of-0001.json",
		"https://storage.cloud.google.com/google-copy/2012-11-27/shard-0001-of-0001.json",
	} {
		components := cloudpath.SplitPath(fullname)
		fmt.Printf("%v\n", components.Prefix().HasSuffix(date))
	}
}

Output:

true
true
true

func (T) Push added in v0.0.3

func (path T) Push(p string) T

Push returns a new cloudpath.T with the supplied component appended. IsFilepath will always be true for the returned value unless p is an empty string in which case Push is equivalent to path.AsFilepath().

func (T) String added in v0.0.3

func (path T) String() string

String implements fmt.Stringer. It calls path.Join with / as the separator.

func (T) TrimSuffix added in v0.0.3

func (path T) TrimSuffix(suffix T) T

TrimSuffix removes the specified suffix from path. It returns nil if path and suffix are identical.
