Incremental tar backups
GNU tar has a reputation for being slightly awkward to use. Despite this, tar is one of the most frequently used archive commands, and it is installed by default on most Linux distributions.
A very common use for tar is creating regular backups. However, if you back up a directory that changes infrequently, this can lead to a large number of duplicate files being stored. This post looks at using incremental tar archives to get around this problem.
A simple tar backup
A very simple backup script might look something like the following:
#!/bin/sh
#
# Create a daily backup of /home/someuser
#
set -e
PATH='/sbin:/bin:/usr/sbin:/usr/bin'
[ -d /var/home_backups ] || mkdir --mode=0700 /var/home_backups
tar -cjf "$(date +'/var/home_backups/%Y-%m-%d_someuser_home.tar.bz2')" /home/someuser
Each time the script above is run, a full, date-stamped tar archive is created. If the script is run daily (e.g. via cron), /var/home_backups will look something like the following after a few days:
+-- /var/home_backups
+-- 2018-01-21_someuser_home.tar.bz2
+-- 2018-01-22_someuser_home.tar.bz2
+-- 2018-01-23_someuser_home.tar.bz2
+-- 2018-01-24_someuser_home.tar.bz2
+-- 2018-01-25_someuser_home.tar.bz2
+-- 2018-01-26_someuser_home.tar.bz2
Unless the contents of /home/someuser change rapidly, each archive will mostly contain copies of files that are already stored in the previous archives.
Using incremental tar files
Instead of always using full backups, tar can create incremental backups. This involves two steps:
- First, create a full backup and a snapshot file:
tar --listed-incremental full_backup.snar -cjf full_backup.tar.bz2 /home/someuser
This creates two files: a full backup of the directory (full_backup.tar.bz2) and a snapshot file (full_backup.snar). The snapshot file contains timestamps and other file metadata; its format is described in the GNU tar documentation.
- Subsequent backups can then use the snapshot file to create an incremental tar archive, which skips unmodified files:
cp full_backup.snar increment.snar
tar --listed-incremental increment.snar -cjf incremental_backup.tar.bz2 /home/someuser
rm increment.snar
Note: tar overwrites the snapshot file when it creates an incremental backup. If you want to create multiple incremental backups from the same base archive, run each one against a copy of the snapshot file, as above.
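The two steps can be tried end to end in a scratch directory. This is a minimal sketch, assuming GNU tar (--listed-incremental is a GNU extension); the directory and file names are hypothetical, and -j is dropped so it runs without bzip2 installed.

```shell
#!/bin/sh
# Scratch demo: a full backup followed by one incremental backup.
# Assumes GNU tar; compression is omitted so bzip2 is not required.
set -e

demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Hypothetical source directory with two files
mkdir src
echo 'original' > src/unchanged.txt
echo 'original' > src/changed.txt

# Step 1: full backup plus snapshot file
tar --listed-incremental full_backup.snar -cf full_backup.tar src

# Wait so the modification gets a timestamp strictly later than
# the moment the full backup started
sleep 1
echo 'modified' > src/changed.txt

# Step 2: incremental backup from a copy of the snapshot, so that
# full_backup.snar stays untouched for later increments
cp full_backup.snar increment.snar
tar --listed-incremental increment.snar -cf incremental_backup.tar src
rm increment.snar

# List what the incremental archive actually contains
tar -tf incremental_backup.tar
```

The final listing should show only src/ and src/changed.txt: unchanged.txt is recorded in the snapshot but is not stored again.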
Using the steps above, the example script can be updated to something similar to the following:
#!/bin/sh
#
# Create a daily backup of /home/someuser
#
set -e
PATH='/sbin:/bin:/usr/sbin:/usr/bin'
[ -d /var/home_backups ] || mkdir --mode=0700 /var/home_backups
if [ -f /var/home_backups/full_backup.snar ]; then
    # A full backup already exists: create an incremental backup from a
    # temporary copy of its snapshot, so full_backup.snar is never updated
    snapshot_copy="$(mktemp)"
    cp /var/home_backups/full_backup.snar "$snapshot_copy"
    tar --listed-incremental "$snapshot_copy" \
        -cjf "$(date +'/var/home_backups/%Y-%m-%d_someuser_home.tar.bz2')" /home/someuser
    rm "$snapshot_copy"
else
    # First run: create the full backup and its snapshot file
    tar --listed-incremental /var/home_backups/full_backup.snar \
        -cjf /var/home_backups/full_backup.tar.bz2 /home/someuser
fi
The first time this script runs, it creates a full backup. Subsequent runs create incremental backups, each based on that initial full backup:
+-- /var/home_backups
+-- 2018-01-21_someuser_home.tar.bz2
+-- 2018-01-22_someuser_home.tar.bz2
+-- 2018-01-23_someuser_home.tar.bz2
+-- 2018-01-24_someuser_home.tar.bz2
+-- 2018-01-25_someuser_home.tar.bz2
+-- 2018-01-26_someuser_home.tar.bz2
+-- full_backup.snar
+-- full_backup.tar.bz2
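To try the same logic without touching /var/home_backups, the paths can be passed in as arguments. The sketch below is a hypothetical variant of the script above (backup_home and the scratch paths are invented names); it assumes GNU tar and drops -j so it runs without bzip2 installed.

```shell
#!/bin/sh
# Hypothetical variant of the backup script with its paths as arguments,
# so it can be tried in a scratch directory. Assumes GNU tar.
set -e

backup_home() {
    source_dir="$1"   # directory to back up
    backup_dir="$2"   # where the archives and snapshot file are kept
    [ -d "$backup_dir" ] || mkdir -m 0700 "$backup_dir"
    if [ -f "$backup_dir/full_backup.snar" ]; then
        # Incremental run: work on a copy so the base snapshot is preserved
        snapshot_copy="$(mktemp)"
        cp "$backup_dir/full_backup.snar" "$snapshot_copy"
        tar --listed-incremental "$snapshot_copy" \
            -cf "$backup_dir/$(date +%Y-%m-%d)_backup.tar" "$source_dir"
        rm "$snapshot_copy"
    else
        # First run: full backup plus the snapshot file
        tar --listed-incremental "$backup_dir/full_backup.snar" \
            -cf "$backup_dir/full_backup.tar" "$source_dir"
    fi
}

# Usage: the first call creates the full backup, the second an increment
scratch="$(mktemp -d)"
cd "$scratch"
mkdir data
echo 'hello' > data/file.txt
backup_home data backups
backup_home data backups
ls backups
```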
Restoring incremental backups
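To restore an incremental backup, extract the base archive first, followed by the incremental backup. Because the script above always starts from a copy of the full backup's snapshot, each incremental archive contains every change made since the full backup, so only the full backup and the chosen dated archive are needed: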
tar --listed-incremental /dev/null -xf /var/home_backups/full_backup.tar.bz2
tar --listed-incremental /dev/null -xf /var/home_backups/2018-01-25_someuser_home.tar.bz2
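The effect of --listed-incremental on extraction can be checked in a scratch directory: a file deleted between the full and the incremental backup is removed again during the restore. This is a minimal sketch, assuming GNU tar; the file names are hypothetical, compression is omitted, and -C restore extracts into a separate directory (with the real archives, whose members are stored as home/someuser/..., -C / would restore in place). The snapshot files are deleted before restoring, since extraction does not need them.

```shell
#!/bin/sh
# Scratch demo: restoring full + incremental archives with
# --listed-incremental removes files deleted between the two backups.
# Assumes GNU tar; compression is omitted.
set -e

work="$(mktemp -d)"
cd "$work"
mkdir src
echo 'keep' > src/kept.txt
echo 'soon deleted' > src/deleted.txt

# Full backup, then (at least a second later) delete one file and add another
tar --listed-incremental base.snar -cf full_backup.tar src
sleep 1
rm src/deleted.txt
echo 'new' > src/added.txt

# Incremental backup from a copy of the base snapshot
cp base.snar increment.snar
tar --listed-incremental increment.snar -cf incremental.tar src

# The snapshot files are not needed for the restore
rm base.snar increment.snar

# Restore into a separate directory: base archive first, then the increment
mkdir restore
tar --listed-incremental /dev/null -C restore -xf full_backup.tar
tar --listed-incremental /dev/null -C restore -xf incremental.tar
ls restore/src
```

The final ls should list only added.txt and kept.txt.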
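The --listed-incremental option is required to ensure that files deleted before the final incremental archive was created are not present in the restored directory. To achieve this, tar stores additional metadata for directories in each archive. The metadata can be viewed by using two verbose (-v) options. Each directory entry is followed by the names it contained when the archive was created: Y marks a file stored in this archive, and N marks a file that was present but unchanged, and therefore not stored: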
$ tar --incremental -tvvf /var/home_backups/2018-01-26_someuser_home.tar.bz2
drwx------ someuser/someuser 71 2018-01-26 22:36 home/someuser/
N .bash_logout
N .bash_profile
N .bashrc
Y file.0
Y file.2
N file.5
Y wibble
-rw-r--r-- someuser/someuser 1048576 2018-01-26 22:36 home/someuser/file.0
-rw-r--r-- someuser/someuser 1048576 2018-01-26 22:36 home/someuser/file.2
-rw-r--r-- someuser/someuser 94 2018-01-26 22:36 home/someuser/wibble
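The Y and N flags can also be filtered out of such a listing. This is a minimal sketch, assuming GNU tar prints each directory-metadata entry as a flag, a space and a name, as in the listing above; list_dumpdir_entries and the scratch file names are hypothetical.

```shell
#!/bin/sh
# Sketch: print the names recorded in a directory's metadata with a given
# flag. Assumes GNU tar prints each entry as "<flag> <name>" when an
# incremental archive is listed with --incremental -tvv.
set -e

list_dumpdir_entries() {
    # $1: archive; $2: flag (Y = stored in this archive,
    # N = present but unchanged, so not stored)
    tar --incremental -tvvf "$1" | sed -n "s/^$2 //p"
}

# Usage: a scratch full backup, then an increment after changing one file
work="$(mktemp -d)"
cd "$work"
mkdir src
echo 'a' > src/file.0
echo 'b' > src/file.5
tar --listed-incremental full.snar -cf full.tar src
sleep 1   # ensure the change is timestamped after the full backup started
echo 'changed' > src/file.0
cp full.snar incr.snar
tar --listed-incremental incr.snar -cf incr.tar src

echo 'stored:'
list_dumpdir_entries incr.tar Y
echo 'unchanged:'
list_dumpdir_entries incr.tar N
```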
Note: the snapshot file is not required to restore incremental backup files because the metadata embedded in the archive is sufficient.