git-backup

command module
v0.0.0-...-c9db60e
Published: Jul 7, 2021 License: GPL-3.0 Imports: 30 Imported by: 0

README

=======================================================================
 Git-backup - Backup set of Git repositories & just files; efficiently
=======================================================================

:author: Kirill Smelkov <kirr@nexedi.com>
:date:   2015 Aug 31


This program backs up files and a set of bare Git repositories into one Git
repository. Files are copied to blobs and added to the backup tree under a
given prefix. For Git repositories, all reachable objects are pulled in, and an
index is maintained that records reference -> sha1 for every pulled repository.

This makes it possible to leverage Git's good data deduplication, especially
when many of the hosted repositories are forks of each other. It also gives the
backup a history and lets it be managed like any other Git repository. In
particular, backups kept in several places can be synchronized with standard
git pull/push.

The backup workflow is:

1. create backup repository::

     $ mkdir backup
     $ cd backup
     $ git init         # both bare and non-bare possible

2. pull files and Git repositories into backup repository::

     $ git-backup pull dir1:prefix1 dir2:prefix2 ...

   This pulls bare Git repositories and plain files from `dir1` into the backup
   under `prefix1`, from `dir2` under `prefix2`, and so on.

3. restore files and Git repositories from backup::

     $ git-backup restore <backup-state-sha1> prefix1:dir1 prefix2:dir2 ...

   This restores Git repositories and plain files from backup `prefix1` into
   `dir1`, from backup `prefix2` into `dir2`, and so on.

   The backup state to restore is taken from <backup-state-sha1>, which is a
   sha1 or a ref pointing to a backup repository state.

4. manage the backup repository itself with Git as usual; in particular, it
   can be synchronized between several places with standard git pull/push, be
   repacked, etc::

     $ git push ...
     $ git pull ...


Please see the `git-backup.go`__ source for a technical overview of how it works.

We also provide a convenience program to pull/restore backup data for a GitLab
instance into/from a git-backup managed repository. See `contrib/gitlab-backup`__
for details.


__ git-backup.go
__ contrib/gitlab-backup

Documentation

Overview

Git-backup - Backup set of Git repositories & just files; efficiently.

This program backs up files and a set of bare Git repositories into one Git repository. Files are copied to blobs and added to the backup tree under a given prefix. For Git repositories, all reachable objects are pulled in, and an index is maintained that records reference -> sha1 for every pulled repository.
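
The reference -> sha1 index described above can be pictured with a minimal sketch. The type names, field names and the text layout below are assumptions made for illustration only; they are not taken from git-backup's source and its real index format may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// RefKey identifies one reference of one backed-up repository.
// The names are hypothetical, chosen for illustration only.
type RefKey struct {
	Repo string // repository path under the backup prefix
	Ref  string // reference name inside that repository
}

// Index maps every pulled reference to the sha1 it pointed to.
type Index map[RefKey]string

// Dump renders the index as sorted "sha1 repo ref" lines, so that the
// same set of references always produces the same text.
func (ix Index) Dump() string {
	lines := make([]string, 0, len(ix))
	for k, sha1 := range ix {
		lines = append(lines, fmt.Sprintf("%s %s %s", sha1, k.Repo, k.Ref))
	}
	sort.Strings(lines)
	return strings.Join(lines, "\n")
}

func main() {
	// Placeholder ids, not real objects.
	ix := Index{
		{Repo: "prefix1/a.git", Ref: "refs/heads/master"}: strings.Repeat("1", 40),
		{Repo: "prefix1/b.git", Ref: "refs/heads/master"}: strings.Repeat("2", 40),
	}
	fmt.Println(ix.Dump())
}
```

Keeping the dump sorted means an unchanged set of references yields an identical blob, which Git then stores only once.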

After objects from the backed-up Git repositories are pulled in, we create a new commit that references a tree with the updated backup index and files, and that also has all head objects from the pulled-in repositories among its parents(*). This way the backup has history, and all pulled objects become reachable from a single head commit in the backup repository. In particular, this means that the whole state of the backup can be described with a single sha1, and that the backup repository itself can be synchronized via standard git pull/push, be repacked, etc.

Restoration is the opposite process: from a particular backup state, files are extracted to their proper place, and for each Git repository a pack with all objects reachable from that repository's heads is prepared and extracted from the backup repository's object database.
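
Selecting "all objects reachable from that repository's heads" amounts to a graph walk. The following is a toy sketch over an in-memory graph with made-up object names; it only illustrates the idea and does not reflect how git-backup or Git enumerate objects in practice.

```go
package main

import (
	"fmt"
	"sort"
)

// reachable returns the ids of all objects reachable from heads in a toy
// object graph, where links[id] lists the objects that id refers to.
func reachable(links map[string][]string, heads []string) []string {
	seen := map[string]bool{}
	stack := append([]string(nil), heads...)
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[id] {
			continue
		}
		seen[id] = true
		stack = append(stack, links[id]...)
	}
	out := make([]string, 0, len(seen))
	for id := range seen {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Two repositories share commit c1 and its tree t1.
	// Repository A has head c2, repository B has head c3.
	links := map[string][]string{
		"c1": {"t1"},
		"c2": {"c1", "t2"},
		"c3": {"c1", "t3"},
	}
	fmt.Println("pack for A:", reachable(links, []string{"c2"}))
	fmt.Println("pack for B:", reachable(links, []string{"c3"}))
}
```

Both packs contain the shared objects c1 and t1, although the backup repository stores them only once.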

This approach leverages Git's good ability to deduplicate and pack object contents, especially when many of the hosted repositories are forks of each other that share a mostly common base and differ by relatively minor changes, both between each other and over time. In the author's experience the backup is dramatically smaller than with the straightforward "let's tar it all" approach.
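
The deduplication effect can be illustrated with a toy content-addressed store. This is a simplified sketch: the Store type is hypothetical, it hashes bare content without Git's type/size header, and it does not model the delta compression Git additionally applies when packing.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// Store is a toy content-addressed object store: the key of every object
// is the sha1 of its content, so identical content is kept only once.
type Store map[string][]byte

// Put adds content and returns its key; re-adding existing content is a no-op.
func (s Store) Put(content []byte) string {
	key := fmt.Sprintf("%x", sha1.Sum(content))
	if _, ok := s[key]; !ok {
		s[key] = content
	}
	return key
}

func main() {
	s := Store{}
	base := []string{"file A v1", "file B v1", "file C v1"}
	fork := []string{"file A v1", "file B v1", "file C v2"} // differs in one file

	for _, c := range base {
		s.Put([]byte(c))
	}
	for _, c := range fork {
		s.Put([]byte(c))
	}
	fmt.Printf("objects added: %d, objects stored: %d\n", len(base)+len(fork), len(s))
}
```

Here six objects are added but only four are stored, because the fork shares two of its three files with the base. A tar archive of both repositories would hold all six.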

Data for all backed-up files and repositories can be accessed by anyone who has access to the backup repository, so either they should all be in the same security domain, or extra care has to be taken to protect access to the backup repository.

File permissions are not tracked in full detail, due to the inherent nature of Git. This can be improved with e.g. an etckeeper-like (http://etckeeper.branchable.com/) approach if needed.

Please see README.rst for a user-level overview of how to use git-backup.

NOTE the idea of pulling all refs together is similar to git-namespaces (http://git-scm.com/docs/gitnamespaces).

(*) Tag objects are handled specially, because in a lot of places Git insists on and assumes that commit parents can only be commit objects. We encode tag objects in a specially-crafted commit object on pull, and decode them back on restore.

We do likewise if a ref points to a tree or blob, which is valid in Git.
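
One way such an encoding could work is sketched below. The layout is purely hypothetical (the marker line, the use of the empty tree, and the placeholder identity are all assumptions of this sketch); git-backup's actual encoding is defined in its source and may differ. The sketch only covers a tag that points to a commit.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	emptyTree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904" // id of Git's empty tree
	marker    = "encoded-tag\n"                            // hypothetical marker line
	ident     = "Example <backup@example.com> 0 +0000"     // placeholder identity
)

// encodeTag wraps the raw text of a tag object into the raw text of a
// commit: the tagged object becomes the parent, and the tag text is kept
// verbatim in the commit message after the marker line.
func encodeTag(rawTag string) (string, error) {
	first, _, _ := strings.Cut(rawTag, "\n")
	if !strings.HasPrefix(first, "object ") {
		return "", fmt.Errorf("tag does not start with an object line")
	}
	target := strings.TrimPrefix(first, "object ")
	return "tree " + emptyTree + "\n" +
		"parent " + target + "\n" +
		"author " + ident + "\n" +
		"committer " + ident + "\n" +
		"\n" + marker + rawTag, nil
}

// decodeTag recovers the raw tag text from a commit produced by encodeTag.
// It reports false if the commit does not carry an encoded tag.
func decodeTag(rawCommit string) (string, bool) {
	_, msg, ok := strings.Cut(rawCommit, "\n\n")
	if !ok || !strings.HasPrefix(msg, marker) {
		return "", false
	}
	return strings.TrimPrefix(msg, marker), true
}

func main() {
	tag := "object " + strings.Repeat("1", 40) + "\n" +
		"type commit\n" +
		"tag v1.0\n" +
		"tagger " + ident + "\n" +
		"\nrelease 1.0\n"

	commit, err := encodeTag(tag)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	back, ok := decodeTag(commit)
	fmt.Println("round trip ok:", ok && back == tag)
}
```

Because the tag text is stored verbatim, decoding restores it byte for byte, so the restored tag object keeps its original sha1.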
