data_versioning/

Version: v0.0.0-...-b3f521c
Published: Apr 19, 2017 License: Apache-2.0

Data Versioning

Even if your code is versioned (e.g., via Git), you can’t reproduce an analysis unless you run that code on the same data. You therefore need a plan and tooling to retrieve the state of both your analysis and your data at specific points in history. Data science before data versioning is a little bit like software engineering before Git.

Notes

Pachyderm - the system we will use for data versioning
github.com/pachyderm/pachyderm/src/client docs

Code Review

Connecting to a running instance of Pachyderm
Creating a data repository
Committing data into a repository
Retrieving data from a repository
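
To make the four steps above concrete, here is a minimal sketch of the same workflow against a toy in-memory store. It is not the Pachyderm client API: the `Store` type, its `CreateRepo`, `Commit`, and `GetFile` methods, and the "example" repository and "data.csv" file are all hypothetical names chosen for illustration, and the "connection" is just constructing the store. The point it tries to show is that every commit is an immutable snapshot, so older data stays retrievable by commit ID after newer data is committed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// commit is one immutable snapshot of every file in a repository.
type commit struct {
	id    string
	files map[string][]byte
}

// Store is a toy stand-in for a data versioning server (hypothetical,
// NOT the Pachyderm API). It maps repo names to their commit history.
type Store struct {
	repos map[string][]commit
}

// NewStore plays the role of "connecting": it returns an empty store.
func NewStore() *Store {
	return &Store{repos: make(map[string][]commit)}
}

// CreateRepo registers a new, empty data repository.
func (s *Store) CreateRepo(name string) error {
	if _, exists := s.repos[name]; exists {
		return errors.New("repo already exists: " + name)
	}
	s.repos[name] = nil
	return nil
}

// Commit snapshots the previous commit's files plus the given file
// and returns the new commit's ID.
func (s *Store) Commit(repo, path string, data []byte) (string, error) {
	history, exists := s.repos[repo]
	if !exists {
		return "", errors.New("no such repo: " + repo)
	}
	files := make(map[string][]byte)
	parent := ""
	if n := len(history); n > 0 {
		parent = history[n-1].id
		for p, d := range history[n-1].files {
			files[p] = d
		}
	}
	files[path] = append([]byte(nil), data...) // copy so callers can't mutate history
	// Derive the ID from the parent ID and the content (an illustrative choice).
	sum := sha256.Sum256([]byte(parent + "\x00" + path + "\x00" + string(data)))
	id := hex.EncodeToString(sum[:8])
	s.repos[repo] = append(history, commit{id: id, files: files})
	return id, nil
}

// GetFile returns a copy of a file exactly as it was at the given commit.
func (s *Store) GetFile(repo, commitID, path string) ([]byte, error) {
	history, exists := s.repos[repo]
	if !exists {
		return nil, errors.New("no such repo: " + repo)
	}
	for _, c := range history {
		if c.id != commitID {
			continue
		}
		data, ok := c.files[path]
		if !ok {
			return nil, errors.New("no such file in commit: " + path)
		}
		return append([]byte(nil), data...), nil
	}
	return nil, errors.New("no such commit: " + commitID)
}

func main() {
	s := NewStore()
	if err := s.CreateRepo("example"); err != nil {
		fmt.Println("create repo:", err)
		return
	}
	first, _ := s.Commit("example", "data.csv", []byte("x,y\n1,2\n"))
	second, _ := s.Commit("example", "data.csv", []byte("x,y\n1,2\n3,4\n"))

	old, _ := s.GetFile("example", first, "data.csv")
	cur, _ := s.GetFile("example", second, "data.csv")
	fmt.Printf("commit %s:\n%s", first, old)
	fmt.Printf("commit %s:\n%s", second, cur)
}
```

In the course's sample programs the same steps are carried out against a running Pachyderm instance through the Go client package listed in the Notes; consult those docs for the real calls and their signatures.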

Exercises

Exercise 1

Create another data repository called "diabetes." We will use this repository to version other data that we will use throughout the course.

Template | Answer

Exercise 2

Make a commit of the data in diabetes.csv to the newly created "diabetes" data repository.

Template | Answer


All material is licensed under the Apache License Version 2.0, January 2004.

Directories

Path Synopsis
Sample program that connects to a running instance of Pachyderm.
Sample program that creates a Pachyderm data repository.
Sample program that commits data into Pachyderm data versioning.
Sample program that gets a versioned dataset/file from Pachyderm.
exercises
exercise1
Sample program that creates a Pachyderm data repository.
exercise2
Sample program that commits data into Pachyderm's data versioning.
template1
Sample program that creates a Pachyderm data repository.
template2
Sample program that commits data into Pachyderm's data versioning.
