godirwalk

package module
v1.2.1
Warning

This package is not in the latest version of its module.

Published: Aug 29, 2017 License: BSD-2-Clause Imports: 8 Imported by: 394

README

godirwalk

godirwalk is a library for traversing a directory tree on a file system.

In short, why do I use this library?

  1. It's faster than filepath.Walk.
  2. It's more correct on Windows than filepath.Walk.
  3. It's easier to use than filepath.Walk.
  4. It's more flexible than filepath.Walk.

Usage Example

This library normalizes the provided top level directory name to the os-specific path separator by calling filepath.Clean on its first argument. The pathname it passes to the provided callback function is likewise always built with the correct os-specific path separator.

    dirname := "some/directory/root"
    err := godirwalk.Walk(dirname, &godirwalk.Options{
        Callback: func(osPathname string, de *godirwalk.Dirent) error {
            fmt.Printf("%s %s\n", de.ModeType(), osPathname)
            return nil
        },
    })

This library not only provides functions for traversing a file system directory tree, but also for obtaining a list of immediate descendants of a particular directory, typically much more quickly than using os.ReadDir or (*os.File).Readdirnames.

Documentation is available via GoDoc.

Description

Here's why I use godirwalk in preference to filepath.Walk, os.ReadDir, and (*os.File).Readdirnames.

It's faster than filepath.Walk

When compared against filepath.Walk in benchmarks, it has been observed to run up to ten times the speed on unix, comparable to the speed of the unix find utility, and about four times the speed on Windows.

How does it obtain this performance boost? Primarily by not invoking os.Stat on every file system node it encounters.

While traversing a file system directory tree, filepath.Walk obtains the list of immediate descendants of a directory, and throws away the file system node type information provided by the operating system that comes with the node's name. Then, immediately prior to invoking the callback function, filepath.Walk invokes os.Stat for each node, and passes the returned os.FileInfo information to the callback.

While the os.FileInfo information provided by os.Stat is extremely helpful--and even includes the os.FileMode data--providing it requires an additional system call for each node.

Because most callbacks only care about what the node type is, this library does not throw that information away, but rather provides that information to the callback function in the form of its os.FileMode value. If the callback does care about a particular node's entire os.FileInfo data structure, the callback can easily invoke os.Stat when needed, and only when needed.
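The idea of keeping the node type that arrives with a directory read can be sketched with the standard library alone. This is not godirwalk's implementation: it assumes Go 1.16 or later, where os.ReadDir returns entries that carry their type bits, and the helper names listTypes and demoListTypes are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// listTypes returns one "<kind> <name>" line per immediate child of dir.
// It uses the type bits that os.ReadDir reports with each entry, so this
// code issues no per-child os.Stat call of its own.
func listTypes(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir) // entries come back sorted by name
	if err != nil {
		return nil, err
	}
	lines := make([]string, 0, len(entries))
	for _, e := range entries {
		kind := "other"
		switch {
		case e.Type()&os.ModeSymlink != 0:
			kind = "symlink"
		case e.IsDir():
			kind = "dir"
		case e.Type().IsRegular():
			kind = "file"
		}
		lines = append(lines, kind+" "+e.Name())
	}
	return lines, nil
}

// demoListTypes builds a throwaway directory holding one file and one
// subdirectory, lists it, and removes it again.
func demoListTypes() ([]string, error) {
	tmp, err := os.MkdirTemp("", "listtypes")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(tmp)
	if err := os.WriteFile(filepath.Join(tmp, "a.txt"), []byte("x"), 0o644); err != nil {
		return nil, err
	}
	if err := os.Mkdir(filepath.Join(tmp, "sub"), 0o755); err != nil {
		return nil, err
	}
	return listTypes(tmp)
}

func main() {
	lines, err := demoListTypes()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

A caller that needs sizes or timestamps for a particular entry could still call os.Stat on just that one pathname.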

It's more correct on Windows than filepath.Walk

I did not previously care about this either, but humor me. We all love how we can write once and run everywhere. It is essential for the language's adoption, growth, and success that the software we create can run unmodified on both unix-like operating systems and Windows.

When the traversed file system has a loop caused by symbolic links to directories, on Windows filepath.Walk will continue following directory symbolic links, even though it is not supposed to, eventually causing filepath.Walk to return an error when the pathname gets too long from concatenating the directories in the loop onto the pathname of the file system node. While this is clearly not intentional, until it is fixed in the standard library, it presents a compatibility problem.

This library correctly identifies symbolic links that point to directories and will only follow them when FollowSymbolicLinks is set to true. Behavior on Windows and unix-like operating systems is identical.
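A rough sketch of that rule, written against the standard library rather than taken from godirwalk: every symbolic link is reported, but a link to a directory is descended into only when the caller asks for it. The walk and countNodes helpers are hypothetical, loop detection is deliberately left out, and the demo assumes the process is allowed to create symbolic links.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// walk visits root and everything below it. Symbolic links are always
// passed to visit, but a link that points to a directory is descended into
// only when follow is true. root itself must be a directory.
func walk(root string, follow bool, visit func(path string)) error {
	visit(root)
	entries, err := os.ReadDir(root)
	if err != nil {
		return err
	}
	for _, e := range entries {
		path := filepath.Join(root, e.Name())
		descend := e.IsDir()
		if e.Type()&os.ModeSymlink != 0 {
			// Check the symlink bit first: some platforms set both the
			// directory and the symlink bits on the same node.
			descend = false
			if follow {
				// os.Stat resolves the link; pay for it only when following.
				if fi, err := os.Stat(path); err == nil {
					descend = fi.IsDir()
				}
			}
		}
		if descend {
			if err := walk(path, follow, visit); err != nil {
				return err
			}
			continue
		}
		visit(path)
	}
	return nil
}

// countNodes builds tmp/real/file.txt plus tmp/link -> tmp/real and
// returns how many nodes walk visits.
func countNodes(follow bool) (int, error) {
	tmp, err := os.MkdirTemp("", "symwalk")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(tmp)
	real := filepath.Join(tmp, "real")
	if err := os.Mkdir(real, 0o755); err != nil {
		return 0, err
	}
	if err := os.WriteFile(filepath.Join(real, "file.txt"), nil, 0o644); err != nil {
		return 0, err
	}
	if err := os.Symlink(real, filepath.Join(tmp, "link")); err != nil {
		return 0, err
	}
	n := 0
	err = walk(tmp, follow, func(string) { n++ })
	return n, err
}

func main() {
	for _, follow := range []bool{false, true} {
		n, err := countNodes(follow)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("follow=%v visited %d nodes\n", follow, n)
	}
}
```

With follow set to true and a link that points back up the tree, this sketch would recurse until the pathname becomes too long, which is why following links is best left opt-in.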

It's easier to use than filepath.Walk

Since this library does not invoke os.Stat on every file system node it encounters, there is no possible error event for the callback function to filter on. The third argument in the filepath.WalkFunc function signature, which passes the error from os.Stat to the callback function, is no longer necessary, and is thus eliminated from the signature of this library's callback function.

Also, filepath.Walk invokes the callback function with a slashed version of the pathname regardless of the os-specific path separator. This library invokes the callback function with the os-specific pathname separator, obviating a call to filepath.Clean for each node in the callback function prior to actually using the provided pathname.

In other words, even on Windows, filepath.Walk will invoke the callback with some/path/to/foo.txt, requiring well-written clients to normalize the pathname of every file before working with it. In truth, many clients developed on unix and not tested on Windows neglect this difference, which results in software bugs when running on Windows. This library, however, would invoke the callback function with some\path\to\foo.txt for the same file, eliminating the need for the client to normalize the pathname and lessening the likelihood that a client will work on unix but not on Windows.
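For reference, the per-file normalization step that a client would otherwise have to remember looks roughly like this. It is a minimal sketch using only path/filepath; the normalize helper and the sep constant are hypothetical names.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// sep is the running platform's path separator as a string.
const sep = string(filepath.Separator)

// normalize converts a slash-separated pathname into the platform's form.
// FromSlash swaps '/' for the platform separator, and Clean then removes
// redundant elements such as "." and doubled separators.
func normalize(p string) string {
	return filepath.Clean(filepath.FromSlash(p))
}

func main() {
	// On Windows the separators printed here are backslashes; on unix the
	// cleaned pathname keeps forward slashes.
	fmt.Println(normalize("some/path/./to//foo.txt"))
}
```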

It's more flexible than filepath.Walk

The filepath.Walk function attempts to ignore the problem posed by file system directory loops created by symbolic links. I say "attempts to" because it does follow symbolic links to directories on Windows, causing infinite loops, or error messages, and causing behavior to be different based on which platform is running. Even so, there are times when following symbolic links while traversing a file system directory tree is desired, and this library allows that by providing the FollowSymbolicLinks option parameter when the upstream client requires the functionality.

The filepath.Walk function also always sorts the immediate descendants of a directory prior to traversing them. While this is usually desired for consistent file system traversal, it is not always needed, and may impact performance. This library provides the Unsorted option to skip sorting directory descendants when the order of file system traversal is not important for some applications.
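The trade-off behind an unsorted mode can be illustrated with the standard library. The sketch below is not godirwalk's code: childNames and demoChildNames are hypothetical helpers that read a directory's names in whatever order the operating system returns them and sort only on request.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// childNames returns the names of dir's immediate children. With unsorted
// false the names are sorted lexically; with unsorted true they are left in
// the order the operating system enumerated them.
func childNames(dir string, unsorted bool) ([]string, error) {
	f, err := os.Open(dir)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	names, err := f.Readdirnames(-1) // directory order, not sorted
	if err != nil {
		return nil, err
	}
	if !unsorted {
		sort.Strings(names)
	}
	return names, nil
}

// demoChildNames creates files "b", "a" and "c" in a throwaway directory
// and lists them.
func demoChildNames(unsorted bool) ([]string, error) {
	tmp, err := os.MkdirTemp("", "childnames")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(tmp)
	for _, name := range []string{"b", "a", "c"} {
		if err := os.WriteFile(filepath.Join(tmp, name), nil, 0o644); err != nil {
			return nil, err
		}
	}
	return childNames(tmp, unsorted)
}

func main() {
	for _, unsorted := range []bool{false, true} {
		names, err := demoChildNames(unsorted)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("unsorted=%v: %v\n", unsorted, names)
	}
}
```

The sorted call always yields the same sequence, while the unsorted one may differ between file systems or even between runs.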

Documentation

Functions

func ReadDirnames added in v0.0.2

func ReadDirnames(osDirname string, scratchBuffer []byte) ([]string, error)

ReadDirnames returns a slice of strings, representing the immediate descendants of the specified directory. If the specified directory is a symbolic link, it will be resolved.

If an optional scratch buffer is provided that is at least one page of memory, it will be used when reading directory entries from the file system.

Note that this function, depending on operating system, may or may not invoke the ReadDirents function, in order to prepare the list of immediate descendants. Therefore, if your program needs both the names and the file system mode types of descendants, it will always be faster to invoke ReadDirents directly, rather than calling this function, then looping over the results and calling os.Stat for each child.

children, err := godirwalk.ReadDirnames(osDirname, nil)
if err != nil {
    return nil, errors.Wrap(err, "cannot get list of directory children")
}
sort.Strings(children)
for _, child := range children {
    fmt.Printf("%s\n", child)
}

func Walk added in v0.1.0

func Walk(pathname string, options *Options) error

Walk walks the file tree rooted at the specified directory, calling the specified callback function for each file system node in the tree, including root, symbolic links, and other node types. By default the nodes are walked in lexical order, which makes the output deterministic but can be inefficient for very large directories; set the Unsorted option to skip the sort.

This function is often much faster than filepath.Walk because it does not invoke os.Stat for every node it encounters, but rather obtains the file system node type when it reads the parent directory.

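package main

import (
    "fmt"
    "os"

    "github.com/karrick/godirwalk"
)

func main() {
    // Walk the directory named by the first argument, or "." by default.
    dirname := "."
    if len(os.Args) > 1 {
        dirname = os.Args[1]
    }
    err := godirwalk.Walk(dirname, &godirwalk.Options{
        // Print the node type and os-specific pathname of every node.
        Callback: func(osPathname string, de *godirwalk.Dirent) error {
            fmt.Printf("%s %s\n", de.ModeType(), osPathname)
            return nil
        },
    })
    if err != nil {
        fmt.Fprintf(os.Stderr, "%s\n", err)
        os.Exit(1)
    }
}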

Types

type Dirent

type Dirent struct {
	// contains filtered or unexported fields
}

Dirent stores the name and file system mode type of discovered file system entries.

func (Dirent) IsDir added in v1.0.0

func (de Dirent) IsDir() bool

IsDir returns true if and only if the Dirent represents a file system directory. Note that on some operating systems, more than one file mode bit may be set for a node. For instance, on Windows, a symbolic link that points to a directory will have both the directory and the symbolic link bits set.

func (Dirent) IsSymlink

func (de Dirent) IsSymlink() bool

IsSymlink returns true if and only if the Dirent represents a file system symbolic link. Note that on some operating systems, more than one file mode bit may be set for a node. For instance, on Windows, a symbolic link that points to a directory will have both the directory and the symbolic link bits set.
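The point about more than one mode bit being set can be shown directly on os.FileMode values. This is a small sketch of the bit tests, not the library's source; the isDir and isSymlink helpers and the two sample values are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// isDir reports whether the directory bit is set in mode.
func isDir(mode os.FileMode) bool { return mode&os.ModeDir != 0 }

// isSymlink reports whether the symbolic link bit is set in mode.
func isSymlink(mode os.FileMode) bool { return mode&os.ModeSymlink != 0 }

// Sample values: a link whose mode carries both bits, as described for
// Windows links to directories, and a link that carries the symlink bit only.
var (
	linkToDir = os.ModeDir | os.ModeSymlink
	plainLink = os.ModeSymlink
)

func main() {
	fmt.Println(isDir(linkToDir), isSymlink(linkToDir)) // true true
	fmt.Println(isDir(plainLink), isSymlink(plainLink)) // false true
}
```

Because both tests can be true for the same node, code that must not descend into links should check the symlink bit before the directory bit.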

func (Dirent) ModeType

func (de Dirent) ModeType() os.FileMode

ModeType returns the mode bits that specify the file system node type. We could make our own enum-like data type for encoding the file type, but Go's runtime already gives us architecture independent file modes, as discussed in `os/types.go`:

Go's runtime FileMode type has same definition on all systems, so that
information about files can be moved from one system to another portably.

func (Dirent) Name

func (de Dirent) Name() string

Name returns the basename of the file system entry.

type Dirents

type Dirents []*Dirent

Dirents represents a slice of Dirent pointers, which are sortable by name. This type satisfies the `sort.Interface` interface.

func ReadDirents added in v0.0.2

func ReadDirents(osDirname string, scratchBuffer []byte) (Dirents, error)

ReadDirents returns a sortable slice of pointers to Dirent structures, each representing the file system name and mode type for one of the immediate descendants of the specified directory. If the specified directory is a symbolic link, it will be resolved.

If an optional scratch buffer is provided that is at least one page of memory, it will be used when reading directory entries from the file system.

children, err := godirwalk.ReadDirents(osDirname, nil)
if err != nil {
    return nil, errors.Wrap(err, "cannot get list of directory children")
}
sort.Sort(children)
for _, child := range children {
    // ModeType and Name are methods, so they must be called.
    fmt.Printf("%s %s\n", child.ModeType(), child.Name())
}

func (Dirents) Len

func (l Dirents) Len() int

Len returns the count of Dirent structures in the slice.

func (Dirents) Less

func (l Dirents) Less(i, j int) bool

Less returns true if and only if the Name of the element specified by the first index is lexicographically less than that of the second index.

func (Dirents) Swap

func (l Dirents) Swap(i, j int)

Swap exchanges the two Dirent entries specified by the two provided indexes.
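The Len, Less and Swap trio is the standard sort.Interface pattern. The sketch below applies it to a hypothetical entry type standing in for Dirent, whose fields are unexported; entries and sortedNames are likewise illustrative names, not part of the library.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical stand-in for a directory entry.
type entry struct{ name string }

// entries is a slice of entry pointers that sorts by name.
type entries []*entry

func (l entries) Len() int           { return len(l) }
func (l entries) Less(i, j int) bool { return l[i].name < l[j].name }
func (l entries) Swap(i, j int)      { l[i], l[j] = l[j], l[i] }

// sortedNames wraps the given names in entries, sorts them with sort.Sort,
// and returns the names in their sorted order.
func sortedNames(names ...string) []string {
	l := make(entries, 0, len(names))
	for _, n := range names {
		l = append(l, &entry{name: n})
	}
	sort.Sort(l)
	out := make([]string, len(l))
	for i, e := range l {
		out[i] = e.name
	}
	return out
}

func main() {
	fmt.Println(sortedNames("b", "c", "a")) // [a b c]
}
```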

type Options added in v1.0.0

type Options struct {
	// FollowSymbolicLinks specifies whether Walk will follow symbolic links
	// that refer to directories. When set to false or left as its zero-value,
	// Walk will still invoke the callback function with symbolic link nodes,
	// but if the symbolic link refers to a directory, it will not recurse on
	// that directory. When set to true, Walk will recurse on symbolic links
	// that refer to a directory.
	FollowSymbolicLinks bool

	// Unsorted controls whether or not Walk will sort the immediate descendants
	// of a directory by their relative names prior to visiting each of those
	// entries.
	//
	// When set to false or left at its zero-value, Walk will get the list of
	// immediate descendants of a particular directory, sort that list by
	// lexical order of their names, and then visit each node in the list in
	// sorted order. This will cause Walk to always traverse the same directory
	// tree in the same order, but may be inefficient for directories with
	// many immediate descendants.
	//
	// When set to true, Walk skips sorting the list of immediate descendants
	// for a directory, and simply visits each node in the order the operating
	// system enumerated them. This will be faster, but with the side effect
	// that the traversal order may be different from one invocation to the
	// next.
	Unsorted bool

	// Callback is the function that Walk will invoke for every file system node
	// it encounters.
	Callback WalkFunc

	// ScratchBuffer is an optional scratch buffer for Walk to use when reading
	// directory entries, to reduce amount of garbage generation. Not all
	// architectures take advantage of the scratch buffer.
	ScratchBuffer []byte
}

Options provide parameters for how the Walk function operates.

type WalkFunc added in v0.1.0

type WalkFunc func(osPathname string, directoryEntry *Dirent) error

WalkFunc is the type of the function called for each file system node visited by Walk. The pathname argument will contain the argument to Walk as a prefix; that is, if Walk is called with "dir", which is a directory containing the file "a", the provided WalkFunc will be invoked with the argument "dir/a", joined with the os.PathSeparator of the target operating system (GOOS). The directory entry argument is a pointer to a Dirent for the node, providing access to both the basename and the mode type of the file system node.

If an error is returned by the walk function, processing stops. The sole exception is when the function returns the special value filepath.SkipDir. If the function returns filepath.SkipDir when invoked on a directory, Walk skips the directory's contents entirely. If the function returns filepath.SkipDir when invoked on a non-directory file system node, Walk skips the remaining files in the containing directory.
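The same filepath.SkipDir sentinel is honored by the standard library's filepath.Walk, so its effect on a directory can be sketched without this package. The example below is a stand-in using filepath.Walk and its three-argument callback, not godirwalk's WalkFunc; visited and demoVisited are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// visited walks root and returns the slash-separated pathnames, relative to
// root, of every node it records. A directory whose base name equals skip is
// neither recorded nor descended into.
func visited(root, skip string) ([]string, error) {
	var paths []string
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err // any other error stops the walk
		}
		if info.IsDir() && info.Name() == skip {
			return filepath.SkipDir // skip this directory's contents entirely
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		paths = append(paths, filepath.ToSlash(rel))
		return nil
	})
	return paths, err
}

// demoVisited builds keep/a.txt and skip/b.txt in a throwaway directory
// and walks it while skipping the directory named "skip".
func demoVisited() ([]string, error) {
	tmp, err := os.MkdirTemp("", "skipdir")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(tmp)
	for _, f := range []string{"keep/a.txt", "skip/b.txt"} {
		full := filepath.Join(tmp, filepath.FromSlash(f))
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(full, nil, 0o644); err != nil {
			return nil, err
		}
	}
	return visited(tmp, "skip")
}

func main() {
	paths, err := demoVisited()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(paths)
}
```

Returning filepath.SkipDir from a non-directory node instead would abandon the rest of that node's containing directory, as described above.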

Directories

Path Synopsis
examples
find-fast Module
