core

module
v0.3.10
Published: Aug 8, 2021 License: Apache-2.0

README

WRGL

Git-like data versioning. It can handle files up to tens of gigabytes. Unlike alternatives such as Dolt, it doesn't require a schema up front: any arbitrary CSV file can be committed, and it can display a much more detailed diff.

Installation

sudo bash -c 'curl -L https://github.com/wrgl/core/releases/latest/download/install.sh | bash'

Usage

# initialize repository. This will create a .wrgl folder
wrgl init

# create a new branch by committing a CSV file
wrgl commit my-branch my_data.csv --primary-key id

# create another commit under the same branch
wrgl commit my-branch my_other_data.csv --primary-key id

# show diff between the last 2 commits
wrgl diff my-branch my-branch^

# output diff to JSON stream
wrgl diff my-branch my-branch^ --format json

# display list of commits within a branch
wrgl log my-branch

# preview data within a commit
wrgl preview my-branch~2

# export data back to CSV
wrgl export my-branch > data.csv

# list branches
wrgl branch

# delete branch
wrgl branch -d my-branch

Version compatibility

This software isn't ready for production, so every new minor version introduces breaking changes. You will need to throw away the entire .wrgl folder whenever you upgrade to a new minor version, e.g. from 0.1.x to 0.2.x.
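
The rule above comes down to comparing the major.minor part of two version strings. The sketch below is one way to check this before upgrading. The helper name needs_fresh_repo is hypothetical and not part of wrgl, and it assumes versions of the form X.Y.Z with an optional leading "v".

```shell
# Hypothetical helper (not part of wrgl): succeeds when two version
# strings differ in their major or minor component.
# Assumes versions look like X.Y.Z, optionally prefixed with "v".
needs_fresh_repo() {
  old="${1#v}"
  new="${2#v}"
  # Strip the trailing ".patch" component and compare what is left
  [ "${old%.*}" != "${new%.*}" ]
}

if needs_fresh_repo 0.1.4 0.2.0; then
  echo "minor version changed: the existing .wrgl folder must be recreated"
fi
```

A patch-level upgrade such as 0.2.1 to 0.2.9 would not trigger the message, which matches the policy described above.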

Dealing with big files

For tables with more than 1 << 24 (~16M) rows, the system automatically saves the table to the big store. The big store does not keep the table in memory, so while it is slightly slower, it should be able to handle arbitrarily large files. If you are committing huge files, remember to raise the open-file limit with ulimit:

ulimit -n 10000

wrgl commit my-branch big_file.csv --primary-key my_key
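
To get a rough idea of whether a file will cross the 1 << 24 row threshold, you can count its lines before committing. This is only a sketch: the helper name exceeds_big_store_threshold is hypothetical and not part of wrgl, and the line count is an approximation that assumes one header line and no quoted fields containing newlines.

```shell
# Hypothetical helper (not part of wrgl): succeeds when a row count
# is greater than the 1 << 24 (16777216) threshold.
exceeds_big_store_threshold() {
  [ "$1" -gt $((1 << 24)) ]
}

csv="big_file.csv"
if [ -f "$csv" ]; then
  # Approximate the row count as the line count minus the header line;
  # quoted fields containing newlines would inflate this number.
  rows=$(( $(wc -l < "$csv") - 1 ))
  if exceeds_big_store_threshold "$rows"; then
    echo "$csv: about $rows rows; current open-file limit is $(ulimit -n)"
  fi
fi
```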

Roadmap

  • Add the ability to set up remotes and sync files between local and remote repositories, just like Git
