fastwalk

package module
v1.0.3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 4, 2024 License: MIT Imports: 10 Imported by: 11

README

GoDoc Test fastwalk on macOS Test fastwalk on Linux Test fastwalk on Windows

fastwalk

Fast parallel directory traversal for Golang.

Package fastwalk provides a fast parallel version of filepath.WalkDir that is ~2x faster on macOS, ~4x faster on Linux, ~6x faster on Windows, allocates 50% less memory, and requires 25% fewer memory allocations. Additionally, it is ~4-5x faster than godirwalk across OSes.

Inspired by and based off of golang.org/x/tools/internal/fastwalk.

Features

Usage

Usage is the same as filepath.WalkDir, but the walkFn argument to fastwalk.Walk must be safe for concurrent use.

Examples can be found in the examples directory.

The below example is a very simple version of the POSIX find utility:

// fwfind is a an example program that is similar to POSIX find,
// but faster and worse (it's an example).
package main

import (
	"flag"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"

	"github.com/charlievieth/fastwalk"
)

const usageMsg = `Usage: %[1]s [-L] [-name] [PATH...]:

%[1]s is a poor replacement for the POSIX find utility

`

func main() {
	flag.Usage = func() {
		fmt.Fprintf(os.Stdout, usageMsg, filepath.Base(os.Args[0]))
		flag.PrintDefaults()
	}
	pattern := flag.String("name", "", "Pattern to match file names against.")
	followLinks := flag.Bool("L", false, "Follow symbolic links")
	flag.Parse()

	// If no paths are provided default to the current directory: "."
	args := flag.Args()
	if len(args) == 0 {
		args = append(args, ".")
	}

	// Follow links if the "-L" flag is provided
	conf := fastwalk.Config{
		Follow: *followLinks,
	}

	walkFn := func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", path, err)
			return nil // returning the error stops iteration
		}
		if *pattern != "" {
			if ok, err := filepath.Match(*pattern, d.Name()); !ok {
				// invalid pattern (err != nil) or name does not match
				return err
			}
		}
		_, err = fmt.Println(path)
		return err
	}
	for _, root := range args {
		if err := fastwalk.Walk(&conf, root, walkFn); err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", root, err)
			os.Exit(1)
		}
	}
}

Benchmarks

Benchmarks were created using go1.17.6 and can be generated with the bench_comp make target:

$ make bench_comp
Darwin

Hardware:

goos: darwin
goarch: arm64
cpu: Apple M1 Max
filepath.WalkDir vs. fastwalk.Walk():
              filepath       fastwalk       delta
time/op       27.9ms ± 1%    13.0ms ± 1%    -53.33%
alloc/op      4.33MB ± 0%    2.14MB ± 0%    -50.55%
allocs/op     50.9k ± 0%     37.7k ± 0%     -26.01%
godirwalk.Walk() vs. fastwalk.Walk():
              godirwalk      fastwalk       delta
time/op       58.5ms ± 3%    18.0ms ± 2%    -69.30%
alloc/op      25.3MB ± 0%    2.1MB ± 0%     -91.55%
allocs/op     57.6k ± 0%     37.7k ± 0%     -34.59%
Linux

Hardware:

goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
drive: Samsung SSD 970 PRO 1TB
filepath.WalkDir vs. fastwalk.Walk():
              filepath       fastwalk       delta
time/op       10.1ms ± 2%    2.8ms ± 2%     -72.83%
alloc/op      2.44MB ± 0%    1.70MB ± 0%    -30.46%
allocs/op     47.2k ± 0%     36.9k ± 0%     -21.80%
godirwalk.Walk() vs. fastwalk.Walk():
              filepath       fastwalk       delta
time/op       13.7ms ±16%    2.8ms ± 2%     -79.88%
alloc/op      7.48MB ± 0%    1.70MB ± 0%    -77.34%
allocs/op     53.8k ± 0%     36.9k ± 0%     -31.38%
Windows

Hardware:

goos: windows
goarch: amd64
pkg: github.com/charlievieth/fastwalk
cpu: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
filepath.WalkDir vs. fastwalk.Walk():
              filepath       fastwalk       delta
time/op       88.0ms ± 1%    14.6ms ± 1%    -83.47%
alloc/op      5.68MB ± 0%    6.76MB ± 0%    +19.01%
allocs/op     69.6k ± 0%     90.4k ± 0%     +29.87%
godirwalk.Walk() vs. fastwalk.Walk():
              filepath       fastwalk       delta
time/op       87.4ms ± 1%    14.6ms ± 1%    -83.34%
alloc/op      6.14MB ± 0%    6.76MB ± 0%    +10.24%
allocs/op     100k ± 0%      90k ± 0%       -9.59%

Darwin: getdirentries64

The nogetdirentries build tag can be used to prevent fastwalk from using and linking to the non-public __getdirentries64 syscall. This is required if an app using fastwalk is to be distributed via Apple's App Store (see https://github.com/golang/go/issues/30933 for more details). When using __getdirentries64 is disabled, fastwalk will use readdir_r instead, which is what the Go standard library uses for os.ReadDir and is about ~10% slower than __getdirentries64 (benchmarks).

Example of how to build and test that your program is not linked to __getdirentries64:

# NOTE: the following only applies to darwin (aka macOS)

# Build binary that imports fastwalk without linking to __getdirentries64.
$ go build -tags nogetdirentries -o YOUR_BINARY
# Test that __getdirentries64 is not linked (this should print no output).
$ ! otool -dyld_info YOUR_BINARY | grep -F getdirentries64

There is a also a script scripts/links2getdirentries.bash that can be used to check if a program binary links to getdirentries.

Documentation

Overview

Package fastwalk provides a faster version of filepath.Walk for file system scanning tools.

Index

Constants

This section is empty.

Variables

View Source
var DefaultConfig = Config{
	Follow:     false,
	NumWorkers: DefaultNumWorkers(),
}

DefaultConfig is the default Config used when none is supplied.

View Source
var ErrSkipFiles = errors.New("fastwalk: skip remaining files in directory")

ErrSkipFiles is a used as a return value from WalkFuncs to indicate that the callback should not be called for any other files in the current directory. Child directories will still be traversed.

View Source
var ErrTraverseLink = errors.New("fastwalk: traverse symlink, assuming target is a directory")

ErrTraverseLink is used as a return value from WalkFuncs to indicate that the symlink named in the call may be traversed.

View Source
var SkipDir = fs.SkipDir

SkipDir is used as a return value from WalkDirFuncs to indicate that the directory named in the call is to be skipped. It is not returned as an error by any function.

Functions

func DefaultNumWorkers

func DefaultNumWorkers() int

DefaultNumWorkers returns the default number of worker goroutines to use in fastwalk.Walk and is the value of runtime.GOMAXPROCS(-1) clamped to a range of 4 to 32 except on Darwin where it is either 4 (8 cores or less) or 6 (more than 8 cores). This is because Walk / IO performance on Darwin degrades with more concurrency.

The optimal number for your workload may be lower or higher. The results of BenchmarkFastWalkNumWorkers benchmark may be informative.

func IgnoreDuplicateDirs

func IgnoreDuplicateDirs(walkFn fs.WalkDirFunc) fs.WalkDirFunc

IgnoreDuplicateDirs wraps fs.WalkDirFunc walkFn to make it follow symbolic links and ignore duplicate directories (if a symlink points to a directory that has already been traversed it is skipped). The walkFn is called for for skipped directories, but the directory is not traversed (this is required for error handling).

The Config.Follow setting has no effect on the behavior of Walk when this wrapper is used.

In most use cases, the returned fs.WalkDirFunc should not be reused between in another call to Walk. If it is reused, any previously visited file will be skipped.

NOTE: The order of traversal is undefined. Given an "example" directory like the one below where "dir" is a directory and "smydir1" and "smydir2" are links to it, only one of "dir", "smydir1", or "smydir2" will be traversed, but which one is undefined.

example
├── dir
├── smydir1 -> dir
└── smydir2 -> dir

func IgnoreDuplicateFiles

func IgnoreDuplicateFiles(walkFn fs.WalkDirFunc) fs.WalkDirFunc

IgnoreDuplicateFiles wraps walkFn so that symlinks are followed and duplicate files are ignored. If a symlink resolves to a file that has already been visited it will be skipped.

In most use cases, the returned fs.WalkDirFunc should not be reused between in another call to Walk. If it is reused, any previously visited file will be skipped.

This can significantly slow Walk as os.Stat() is called for each path (on Windows, os.Stat() is only needed for symlinks).

func IgnorePermissionErrors

func IgnorePermissionErrors(walkFn fs.WalkDirFunc) fs.WalkDirFunc

IgnorePermissionErrors wraps walkFn so that permission errors are ignored. The returned fs.WalkDirFunc may be reused.

func StatDirEntry

func StatDirEntry(path string, de fs.DirEntry) (fs.FileInfo, error)

StatDirEntry returns the fs.FileInfo for the file or subdirectory described by the entry. If the entry is a symbolic link, StatDirEntry returns the fs.FileInfo for the file the line references (os.Stat). If fs.DirEntry de is a fastwalk.DirEntry it's Stat() method is used and the returned fs.FileInfo may be a previously cached result.

func Walk

func Walk(conf *Config, root string, walkFn fs.WalkDirFunc) error

Walk is a faster implementation of filepath.Walk.

filepath.Walk's design necessarily calls os.Lstat on each file, even if the caller needs less info. Many tools need only the type of each file. On some platforms, this information is provided directly by the readdir system call, avoiding the need to stat each file individually. fastwalk_unix.go contains a fork of the syscall routines.

See golang.org/issue/16399

Walk walks the file tree rooted at root, calling walkFn for each file or directory in the tree, including root.

If walkFn returns filepath.SkipDir, the directory is skipped.

Unlike filepath.WalkDir:

  • File stat calls must be done by the user and should be done via the DirEntry argument to walkFn since it caches the results of Stat and Lstat.
  • The fs.DirEntry argument is always a fastwalk.DirEntry, which has a Stat() method that returns the result of calling os.Stat() on the file. The result of Stat() may be cached.
  • Multiple goroutines stat the filesystem concurrently. The provided walkFn must be safe for concurrent use.
  • Walk can follow symlinks if walkFn returns the ErrTraverseLink sentinel error. It is the walkFn's responsibility to prevent Walk from going into symlink cycles.

Types

type Config

type Config struct {

	// Follow symbolic links ignoring directories that would lead
	// to infinite loops; that is, entering a previously visited
	// directory that is an ancestor of the last file encountered.
	//
	// The sentinel error ErrTraverseLink is ignored when Follow
	// is true (this to prevent users from defeating the loop
	// detection logic), but SkipDir and ErrSkipFiles are still
	// respected.
	Follow bool

	// Number of parallel workers to use. If NumWorkers if ≤ 0 then
	// the greater of runtime.NumCPU() or 4 is used.
	NumWorkers int
}

type DirEntry

type DirEntry interface {
	fs.DirEntry

	// Stat returns the FileInfo for the file or subdirectory described
	// by the entry. The returned FileInfo may be from the time of the
	// original directory read or from the time of the call to Stat.
	// If the entry denotes a symbolic link, Stat reports the information
	// about the target itself, not the link.
	Stat() (fs.FileInfo, error)
}

A DirEntry extends the fs.DirEntry interface to add a Stat() method that returns the result of calling os.Stat() on the underlying file. The results of Info() and Stat() are cached.

The fs.DirEntry argument passed to the fs.WalkDirFunc by Walk is always a DirEntry. The only exception is the root directory with with Walk is called.

type EntryFilter

type EntryFilter struct {
	// contains filtered or unexported fields
}

An EntryFilter keeps track of visited directory entries and can be used to detect and avoid symlink loops or processing the same file twice.

func NewEntryFilter

func NewEntryFilter() *EntryFilter

NewEntryFilter returns a new EntryFilter

func (*EntryFilter) Entry

func (e *EntryFilter) Entry(path string, de fs.DirEntry) (seen bool)

Entry returns if path and fs.DirEntry have been seen before.

Directories

Path Synopsis
examples
fwfind
fwfind is a an example program that is similar to POSIX find, but faster and worse (it's an example).
fwfind is a an example program that is similar to POSIX find, but faster and worse (it's an example).
fwwc
fwwc is a an example program that recursively walks directories and prints the number of lines in each file it encounters.
fwwc is a an example program that recursively walks directories and prints the number of lines in each file it encounters.
internal
dirent
Package dirent parses raw syscall dirents
Package dirent parses raw syscall dirents

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL