aistore

Published: Sep 14, 2019 License: MIT

AIStore: scalable storage for AI

AIStore (AIS for short) is a built-from-scratch, lightweight storage stack tailored for AI apps. As of version 2.x, AIS consistently shows balanced I/O distribution across arbitrary numbers of clustered servers and hard drives, producing performance charts that look as follows:

[Figure: I/O distribution]

The picture above comprises 120 HDDs.

The capability to scale out linearly for millions of stored objects (often also referred to as shards) was, and remains, one of the main incentives to build AIStore, though not the only one.

Further, AIS:

  • scales out with no downtime and no limitation;
  • supports n-way mirroring (RAID-1), m/k erasure coding, end-to-end data protection;
  • includes a Map/Reduce extension to speed up shuffle/resize for large datasets;
  • runs on commodity hardware with no limitations and no special requirements;
  • provides a proper subset of S3-like REST API;
  • leverages Linux 4.15+ storage stack;
  • natively supports Amazon S3 and Google Cloud backends;
  • focuses on AI and, specifically, on the performance of large-scale deep learning.

Last but not least, AIS features an open format and, therefore, the freedom to copy or move your data off of AIS at any time using familiar Linux tools such as scp. For a detailed introduction that includes design philosophy, key concepts, and system components, please see AIS overview.

Prerequisites

  • Linux (with gcc, sysstat and attr packages, and kernel 4.15+)
  • Go 1.12.5 or later
  • Extended attributes (xattrs - see below)
  • Optionally, Amazon (AWS) or Google Cloud Platform (GCP) account(s)

Depending on your Linux distribution you may or may not have gcc, sysstat, and/or attr packages - to install, use apt-get (Debian), yum (RPM), or other applicable package management tool, e.g.:

$ apt-get install sysstat

The capability called extended attributes, or xattrs, is a long-time POSIX legacy and is supported by all mainstream filesystems. Unfortunately, xattrs may not always be enabled in the kernel configuration shipped by the Linux distribution you are using, which can be easily checked by running the setfattr command.

If disabled, please make sure to enable xattrs in your Linux kernel configuration.
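The same check can also be done programmatically. The following is a minimal sketch, not part of AIS: it assumes Linux and Go 1.16 or later, and the function name xattrSupported and the attribute name user.probe are made up for illustration. It creates a temporary file in the given directory and tries to set a user-namespace attribute on it:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// xattrSupported reports whether the filesystem that holds dir accepts
// user-namespace extended attributes. It creates a temporary probe file,
// tries to set an attribute on it, and removes the file afterwards.
func xattrSupported(dir string) (bool, error) {
	f, err := os.CreateTemp(dir, "xattr-probe-")
	if err != nil {
		return false, err
	}
	name := f.Name()
	f.Close()
	defer os.Remove(name)

	// "user.probe" is an arbitrary attribute name chosen for this sketch.
	err = syscall.Setxattr(name, "user.probe", []byte("1"), 0)
	if errors.Is(err, syscall.ENOTSUP) {
		return false, nil // the filesystem (or kernel config) rejects xattrs
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	ok, err := xattrSupported(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, "probe failed:", err)
		os.Exit(1)
	}
	fmt.Println("xattrs supported:", ok)
}
```

Point the probe at a directory on the filesystem you intend to use for AIS storage, since support can differ from one mounted filesystem to another.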

Getting Started

AIStore runs on commodity Linux machines with no special hardware requirements.

It is expected, though, that all AIS target machines are identical, hardware-wise.

The implication is that the number of possible deployment options is practically unlimited. This section covers 3 (three) ways to deploy AIS on a single Linux machine and is intended for developers and development, and/or for a quick trial.

Local non-Containerized

Assuming that Go is already installed, the remaining getting-started steps are:

$ cd $GOPATH/src
$ go get -v github.com/NVIDIA/aistore/ais
$ cd github.com/NVIDIA/aistore/ais
$ make deploy
$ go test ./tests -v -run=Mirror

The go get command installs AIS sources and all the versioned dependencies under your configured $GOPATH.

The make deploy command deploys AIStore daemons locally based on a few prompted Q&A. The example shown below deploys 10 targets (each with 2 local simulated filesystems) and 3 gateways, and will not require (or expect) to access Cloud storage (notice the "Cloud Provider" prompt below):

# make deploy
Enter number of storage targets:
10
Enter number of proxies (gateways):
3
Number of local cache directories (enter 0 to use preconfigured filesystems):
2
Select Cloud Provider:
1: Amazon Cloud
2: Google Cloud
3: None
Enter your choice:
3

Or, you can run all of the above with a single command:

# make kill; ./setup/deploy.sh <<< $'10\n3\n2\n3'

make kill will terminate local AIStore if it's already running.
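The here-string in the single-command variant is simply the four prompted answers separated by newlines. As an illustration only (deployAnswers is a hypothetical helper, not something shipped with AIS), the same input can be assembled in Go:

```go
package main

import (
	"fmt"
	"strings"
)

// deployAnswers joins the four prompted values into the newline-separated
// text that the deployment script reads from standard input.
func deployAnswers(targets, proxies, cacheDirs, cloudChoice int) string {
	vals := []int{targets, proxies, cacheDirs, cloudChoice}
	parts := make([]string, len(vals))
	for i, v := range vals {
		parts[i] = fmt.Sprint(v)
	}
	return strings.Join(parts, "\n")
}

func main() {
	// 10 targets, 3 proxies, 2 local cache directories, cloud choice 3 (None).
	fmt.Printf("%q\n", deployAnswers(10, 3, 2, 3))
}
```

Note that a Bash here-string appends a trailing newline, so a program feeding this text to the script (for example via the Stdin field of exec.Cmd) would likely want to append "\n" as well.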

To enable the optional AIStore authentication server, execute instead:

$ CREDDIR=/tmp/creddir AUTHENABLED=true make deploy

For information on the AuthN server, please see AuthN documentation.

Finally, the go test (above) will create an ais bucket, configure it as a two-way mirror, generate thousands of random objects, read them all several times, and then destroy the replicas and eventually the bucket as well.

Alternatively, if you happen to have an Amazon and/or Google Cloud account, make sure to specify the corresponding bucket name when running go test. For example, the following will download objects from your (presumably) S3 bucket and distribute them across AIStore:

$ BUCKET=myS3bucket go test ./tests -v -run=download

Here's a minor variation of the above:

$ BUCKET=myS3bucket go test ./tests -v -run=download -args -numfiles=100 -match='a\d+'

This command runs the tests whose names match the specified string ("download"). The test then downloads up to 100 objects from the bucket called myS3bucket, selecting only those objects whose names match the a\d+ regex.
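To make the effect of -numfiles and -match concrete, here is a small sketch of that kind of selection. It is not the code used by the AIS tests: selectObjects is a hypothetical helper, and the sketch assumes an unanchored regex search, so a name such as xa7 also matches a\d+.

```go
package main

import (
	"fmt"
	"regexp"
)

// selectObjects returns at most limit names that match pattern,
// preserving the input order.
func selectObjects(names []string, pattern string, limit int) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, n := range names {
		if len(out) >= limit {
			break
		}
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	names := []string{"a1", "b2", "a22", "abc", "xa7"}
	got, err := selectObjects(names, `a\d+`, 100)
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // prints [a1 a22 xa7]
}
```

If only names that consist entirely of the pattern should qualify, anchor it explicitly, for example ^a\d+$.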

In addition to the AIS cluster itself you can deploy AIS CLI - an easy-to-use AIS-integrated command-line management tool. The tool supports multiple commands and options; the first one that you may want to try is ais status to show state and status of the AIS cluster and its nodes. AIS CLI deployment is documented in the CLI readme and includes two easy steps: building the binary (via cli/install.sh) and sourcing Bash auto-completions.

For more testing commands and command line options, please refer to the corresponding README and/or the test sources. For other useful commands, see the Makefile.

For tips and help pertaining to local non-containerized deployment, please see the tips.

For info on how to run AIS executables, see command-line arguments.

For helpful links and background on Go, AWS, GCP, and Deep Learning, please see helpful links.

Local Docker-Compose

The 2nd option to run AIS on your local machine requires Docker and Docker-Compose. It also allows for multi-cluster deployments with multiple separate networks. You can deploy a simple AIS cluster within seconds or deploy a multi-container cluster for development.

AIS v2.1 supports up to 3 (three) logical networks: user (or public), intra-cluster control and intra-cluster data networks.

To get started with AIStore and Docker, see: Getting started with Docker.

Local Kubernetes

The 3rd and final local-deployment option makes use of Kubeadm and is documented here.

Containerized Deployment and Host Resource Sharing

The following applies to all containerized deployments: local and non-local - the latter including those that are "kubernetized".

  1. AIS nodes always automatically detect containerization.
  2. If deployed as a container, each AIS node independently discovers whether its own container's memory and/or CPU resources are restricted.
  3. Finally, the node then abides by those restrictions.

To that end, each AIS node at startup loads and parses cgroup settings for the container and, if the number of CPUs is restricted, adjusts the number of allocated system threads for its goroutines.

This adjustment is accomplished via the Go runtime GOMAXPROCS variable. For in-depth information on CPU bandwidth control and scheduling in a multi-container environment, please refer to the CFS Bandwidth Control document.
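The following sketch shows one way such an adjustment could be derived. It is not the AIS implementation: it assumes cgroup v1 file locations, and cpusFromQuota and readInt are hypothetical helpers. The CFS quota divided by the CFS period gives the number of CPUs' worth of run time the container may consume, which is rounded up and capped at the number of CPUs on the host:

```go
package main

import (
	"fmt"
	"math"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// cpusFromQuota converts a CFS quota/period pair into a whole number of CPUs,
// capped at ncpu. A non-positive quota or period means "unrestricted".
func cpusFromQuota(quota, period int64, ncpu int) int {
	if quota <= 0 || period <= 0 {
		return ncpu
	}
	n := int(math.Ceil(float64(quota) / float64(period)))
	if n > ncpu {
		n = ncpu
	}
	return n
}

// readInt reads a single integer from a cgroup control file.
func readInt(path string) (int64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
	// cgroup v1 locations; assumed for this sketch.
	quota, err1 := readInt("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
	period, err2 := readInt("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
	if err1 != nil || err2 != nil {
		quota, period = -1, 0 // treat unreadable files as "no restriction"
	}
	n := cpusFromQuota(quota, period, runtime.NumCPU())
	runtime.GOMAXPROCS(n)
	fmt.Println("GOMAXPROCS set to", n)
}
```

For example, a quota of 250000 with a period of 100000 on an 8-CPU host yields 3, while a quota of -1 (no limit) leaves all 8 CPUs available.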

Further, given the container's cgroup memory limitation, each AIS node adjusts the amount of memory available to itself. Note, however, that limited memory may affect dSort and erasure coding performance, forcing those two to effectively spill their temporary content onto local drives.
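As a rough illustration of the memory side, a node can plan around the smaller of the host's total memory and the container's limit. The helper below is hypothetical and not taken from AIS; it treats a limit of 0 as "not set", and a limit larger than the host total (cgroup v1 reports a very large number when no limit is configured) as unrestricted:

```go
package main

import "fmt"

// effectiveMemory returns the amount of memory a node should plan around:
// the cgroup limit when one is set and is smaller than the host total,
// otherwise the host total. A limit of 0 is treated as "not set".
func effectiveMemory(hostTotal, cgroupLimit uint64) uint64 {
	if cgroupLimit == 0 || cgroupLimit > hostTotal {
		return hostTotal
	}
	return cgroupLimit
}

func main() {
	const GiB = 1 << 30
	fmt.Println(effectiveMemory(64*GiB, 8*GiB) / GiB) // container capped at 8 GiB: prints 8
	fmt.Println(effectiveMemory(64*GiB, 1<<62) / GiB) // effectively unlimited: prints 64
}
```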

For technical details on AIS memory management, please see this readme.

Performance Monitoring

As is usually the case with storage clusters, there are multiple ways to monitor their performance.

AIStore includes aisloader - the tool to stress-test and benchmark storage performance. For background, command-line options and usage, please see AIS Load Generator.

For starters, AIS collects and logs a fairly large and constantly growing number of counters that describe all aspects of its operation including (but not limited to) those that reflect cluster recovery/rebalancing, all extended long-running operations, and, of course, object storage transactions.

In particular:

  • For dSort monitoring, please see dSort
  • For Downloader monitoring, please see Internet Downloader

The logging interval is called stats_time (default 10s) and is configurable on the level of both each specific node and the entire cluster.

Speaking of ways to monitor AIS remotely, the two most obvious ones are AIS CLI and Graphite/Grafana.

As for Graphite/Grafana, AIS integrates with these popular backends via StatsD - the daemon for easy but powerful stats aggregation. StatsD can be connected to Graphite, which can then be used as a data source for Grafana to get a visual overview of the statistics and metrics.
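For readers unfamiliar with StatsD, its wire format is a short text line sent over UDP, for counters "<name>:<value>|c". The sketch below shows that format in isolation. It is not how AIS emits its statistics: counterLine and sendCounter are hypothetical helpers, the metric name is made up, and port 8125 is merely the conventional StatsD default.

```go
package main

import (
	"fmt"
	"net"
)

// counterLine formats one StatsD counter sample: "<name>:<value>|c".
func counterLine(name string, value int64) string {
	return fmt.Sprintf("%s:%d|c", name, value)
}

// sendCounter writes a single counter sample to a StatsD daemon over UDP.
func sendCounter(addr, name string, value int64) error {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte(counterLine(name, value)))
	return err
}

func main() {
	// 8125 is the conventional StatsD port; the metric name is made up.
	if err := sendCounter("localhost:8125", "demo.get.n", 1); err != nil {
		fmt.Println("send failed:", err)
	}
}
```

Because the transport is UDP, a missing or stopped StatsD daemon does not block the sender, which suits best-effort statistics.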

The scripts for easy deployment of both Graphite and Grafana are included (see below).

For local non-containerized deployments, use ./ais/setup/deploy_grafana.sh to start Graphite and Grafana containers. Local deployment will automatically notice the presence of the containers and will send statistics to Graphite.

For local docker-compose based deployments, make sure to use -grafana command-line option. The deploy_docker.sh script will then spin-up Graphite and Grafana containers.

In both of these cases, Grafana will be accessible at localhost:3000.

For information on AIS statistics, please see Statistics, Collected Metrics, Visualization.

Directories

Path Synopsis
3rdparty
atomic: Package atomic provides simple wrappers around numerics to enforce atomic access.
glog: Package glog implements logging analogous to the Google-internal C++ INFO/ERROR/V setup.
webdav: Package webdav provides a WebDAV server implementation.
webdav/internal/xml: Package xml implements a simple XML 1.0 parser that understands XML name spaces.
ais: Package ais provides core functionality for the AIStore object storage.
setup (command): This file is used to start the AIS daemon. Copyright (c) 2018, NVIDIA CORPORATION.
Package api provides RESTful API to AIS object storage.
bench
aisloader (command)
aisloader/namegetter: Copyright (c) 2019, NVIDIA CORPORATION.
aisloader/stats: Package stats provides various structs for collecting stats.
disk/compare (command)
frandread (command): Package frandread is a file-reading benchmark that makes a special effort to visit the files randomly and equally.
hrw: Package hrw_bench provides a way to benchmark different HRW variants.
http2 (command): This go script puts a given number of files with a given size into AIStore.
map (command)
soaktest (command)
soaktest/recipes: Package recipes contains all the recipes for soak test.
soaktest/report: Package report provides the framework for collecting results of the soaktest.
soaktest/scheduler: Package scheduler provides scheduling of recipes and regression within soaktest.
soaktest/soakcmn: Package soakcmn provides constants and variables shared across soaktest.
soaktest/soakprim: Package soakprim provides the framework for running soak tests.
soaktest/stats: Package stats keeps track of all the different statistics collected by the report.
cli: Package main is used as command-line interpreter for AIS.
commands: Package commands provides the set of CLI commands used to communicate with the AIS cluster.
templates: Package templates provides the set of templates used to format output for the CLI.
Package cluster provides common interfaces and local access to cluster-level metadata.
Package cmn provides common low-level types and utilities for all aistore projects.
Package containers provides common utilities for managing containerized deployments of AIS.
Package downloader implements functionality to download resources into AIS cluster from external source.
Package dsort provides APIs for distributed archive file shuffling.
extract: Package extract provides functions for working with compressed files.
filetype: Package filetype provides the implementation of custom content file type for dsort.
Package ec provides erasure coding (EC) based data protection for AIStore.
Package filter implements a fully featured dynamic probabilistic filter.
Package fs provides mountpath and FQN abstractions and methods to resolve/map stored content.
Package health provides a basic mountpath health monitor.
housekeep
hk: Package hk provides a mechanism for registering cleanup functions which are invoked at specified intervals.
lru: Package lru provides least recently used cache replacement policy for stored objects and serves as a generic garbage-collection mechanism for orphaned workfiles.
Package ios is a collection of interfaces to the local storage subsystem; the package includes OS-dependent implementations for those interfaces.
Package memsys provides memory management and Slab allocation with io.Reader and io.Writer interfaces on top of scatter-gather lists (of reusable buffers).
Package mirror provides local mirroring and replica management.
Package objwalk provides core functionality for reading the list of a bucket's objects.
Package stats provides methods and functionality to register, track, log, and StatsD-notify statistics that, for the most part, include "counter" and "latency" kinds.
Package sys provides methods to read system information.
Package transport provides streaming object-based transport over http for continuous intra-cluster communications (see README for details and usage example).
Package tutils provides common low-level utilities for all aistore unit and integration tests.
tassert: Package tassert provides common asserts for tests.
Package xoshiro256 implements the xoshiro256** RNG. Translated from http://xoshiro.di.unimi.it/xoshiro256starstar.c ("Scrambled Linear Pseudorandom Number Generators", David Blackman, Sebastiano Vigna, https://arxiv.org/abs/1805.01407); see also http://www.pcg-random.org/posts/a-quick-look-at-xoshiro256.html
