ETBS backup scripts
ETBS is a service to pull backups of important filesystems from workstation computers, mostly the local /home directories on your computers.
A cron script on the central server called etbs will pull the backups.
The backups are pulled via rsync(1) into a daily snapshot, so ETBS also provides a history of your filesystems. Files that did not change are hardlinked, so each daily copy does not use extra disk space on the backup partition. The service is mostly for Linux workstations, but Windows can be supported as well, see below.
HOWTO get access to your backups
The backup partition is available via NFS as etbs:/data/backups. You need to configure your automounter /etc/auto.data or your /etc/fstab to mount this directory on your computer. Generally your files can be found in /data/backups/HOST/DIR/YYYY-MM-DD/, where HOST is the name of your computer, DIR is "home", and YYYY-MM-DD is the date of the snapshot.
The directory /data/backups/windows/ is available through SAMBA as //ETBS/BACKUPS. Windows machine backups will go there.
Both NFS and SAMBA access is read-only.
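A minimal sketch of a matching /etc/fstab line; the mount point and mount options are assumptions, adjust them to your setup:

# hypothetical /etc/fstab entry for the read-only backup share
etbs:/data/backups  /data/backups  nfs  ro,soft  0 0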
HOWTO setup your machine for backups
Two or three steps are required for your files to get backed up. First, grant root access to your computer from the etbs server. Second, tell the ETBS administrator what filesystem you need backup service for. Third, optionally, configure exclusion filters to limit the backup to the important stuff, so your backup request will not get rejected due to size.
Step 1: client setup
Append the ssh key in /data/backups/etc/id_rsa.etbs to /root/.ssh/authorized_keys of your computer's root account.
That key looks like this:
from="10.1.1.1" ssh-rsa AA...WQ== root@etbs
Such a key will only allow login from the backup server; no other host will be able to use the key to get access to your computer. This is not entirely true, since an attacker could spoof the IP address of the backup server. The secret key is password protected and is better not stored on the virtual server itself. To start the etbs service after a reboot, an admin must start an ssh-agent and provide it with the secret key.
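A minimal sketch of the client side of this step, assuming the backup partition is already mounted at /data/backups on your machine:

# on your workstation, as root: append the public key to the root account
mkdir -p /root/.ssh && chmod 700 /root/.ssh
cat /data/backups/etc/id_rsa.etbs >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys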
Step 2: server setup
Contact a backup administrator, who will check the size of your /home/HOST/ directory to see whether it complies with the guidelines for how big a home directory should be, and then set up the backup service. If necessary, proceed with step three to get the backup volume down.
At this point the administrator will log in to your machine manually once, so that ssh knows your computer's identity and future logins happen unattended.
source directory
The source of the backup is derived from the backup target directory name. If the target is /data/backups/HOST/home, then the source will be HOST:/home/HOST/. Otherwise, if the target is /data/backups/HOST/PATH, the source will be HOST:/PATH, with all occurrences of a dash '-' in PATH replaced by a slash '/'.
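A minimal sketch of this mapping, using a hypothetical host alice and made-up paths:

# target /data/backups/alice/home     ->  source alice:/home/alice/
# target /data/backups/alice/var-www  ->  source alice:/var/www/
HOST=alice
DDIR=var-www
SDIR=`echo /$DDIR | sed 's,-,/,g'`   # dashes become slashes: /var/www
echo "$HOST:$SDIR/"                  # prints alice:/var/www/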
source hostname
If the unqualified HOST does not resolve on the backup server, the administrator can install a file named hosts in the backup target directory. The hostname of the backup source is then the first word on the first line of that file that contains the string HOST. In most cases, just put the fully qualified hostname into that file.
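For example, the file can contain nothing but the fully qualified name; the name below is a hypothetical placeholder, and since the first line containing the string HOST is used, keep the file minimal:

alice.example.org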
overrides
In case all that is not sufficient to specify the backup source or target, the administrator can put a file named env into the backup target directory, which can override the source or target of the backup, with lines like this:
SOURCE=rsync-source-specification
TDIR=real-target-directory
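For example, a hypothetical env file that pulls from a non-standard source path (host and path are placeholders); the file is sourced by the backup script, so comments are allowed:

# /data/backups/alice/data/env
SOURCE=alice.example.org:/srv/data/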
Step 3: exclusion filters
There are two reasons why you may want to exclude certain parts of your filesystem from the backup: first, to get the volume down; second, for privacy reasons. There are two ways to exclude parts of your filesystem from the backup.
exclude file
Into your backup target directory /data/backups/HOST/PATH/ the administrator can put a file named exclude. This file shall include rsync exclusion rules for your filesystem. In the simple case, this is just file names or directory names to exclude.
(Note: the rsync option used is --exclude-from=/data/backups/HOST/DIR/exclude)
When that exclude file does not exist, a default set of directories is excluded, basically your browser caches and trash folders.
If you find that an administrator placed an exclude file into your backup directory, please verify that no important stuff is excluded.
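A hypothetical exclude file; the entries are just examples of what people typically leave out, not a recommendation:

# /data/backups/alice/home/exclude
Downloads/
.cache/
*.iso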
.etbs-filter
You may place files named .etbs-filter anywhere in your home directory tree. This file shall provide rsync filter rules that apply only to the subdirectory tree where the .etbs-filter file is found. Please refer to the rsync(1) man page for the syntax.
For example, these two lines (without leading spaces) will exclude the whole directory tree from this point down. But the filter file itself is included.
+ .etbs-filter
- *
There is a file /data/backups/etc/all.etbs-filter with these exact lines, that you can copy and rename into your directories that shall not be backed up.
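For example, to keep a scratch directory out of the backup (the directory name is a hypothetical placeholder):

cp /data/backups/etc/all.etbs-filter ~/scratch/.etbs-filter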
(Note: the rsync option used is --filter=dir-merge_/.etbs-filter)
Step 4: options
rsync is started with the options
-aqHx --ignore-existing --numeric-ids
which says to do an archive copy without traversing into submounted filesystems and without translating owner ids. In addition, the options
--max-size=100M --bwlimit=10000 --filter=dir-merge_/.etbs-filter
are used, except when there is a file named options in your backup directory: /data/backups/HOST/PATH/options. When that file exists, the above options are replaced by the contents of that file. This allows you to fine-tune the backup in case you have special needs.
Without customization, any file larger than 100 MByte will not be included in the backup, and the backup will proceed at no more than 10 MByte per second, to avoid loading the network.
When there is an options file, the dir-merge filter option needs to be included in the file if required.
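A hypothetical options file that raises both limits while keeping the per-directory filter rule (the values are examples only); comment lines are ignored when the file is read:

# /data/backups/alice/home/options
--max-size=500M
--bwlimit=20000
--filter=dir-merge_/.etbs-filter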
(Note: rsync will hardlink to unchanged files in the last four snapshots it finds in the backup directory, using --link-dest=SNAPSHOT. In case the backup runs multiple times a day, rsync shall not change existing files from earlier that day; that is why --ignore-existing is a mandatory option. rsync will also not delete files that disappeared during the day. That means the snapshots are not consistent, atomic snapshots. In case you need this kind of atomicity, please proceed to Step 6.)
Step 5: very special backup needs
If all of the above is not sufficient to fulfill your backup needs, the administrator can replace the file /data/backups/HOST/PATH/dobackup with any other script or program to perform your backup, preferably one that generates the same kind of hard-linked daily snapshots.
Step 6: backup time
The script /data/backups/HOST/DIR/dobackup is run two or three times a day, e.g., at 04:00, 13:00, and 16:00. If you have special needs how to schedule your backups, the administrator can setup a customised crontab entry for you.
(Note: The script /data/backups/bin/dobackups searches for all files called /data/backups/HOST/DIR/dobackup$1 and executes them as root. The standard crontab entry calls it without the argument $1, but for customisation it can be called with an argument, to search for properly renamed dobackup scripts at other times.)
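For example, to have a machine backed up only by the 05:00 run shown in the crontab section below, the administrator can rename its script (the host name is a hypothetical placeholder):

mv /data/backups/alice/home/dobackup /data/backups/alice/home/dobackup-at5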
Windows
There are two ways to do backups from Windows machines in a similar fashion: either you run an rsync daemon on your machine, where etbs can connect and pull the files, or etbs mounts your share via CIFS and runs rsync against the mounted filesystem. dobackup scripts to perform this kind of backup still need to be written, but that is not a big deal.
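A minimal sketch of the CIFS variant, with hypothetical host, share, and credentials-file names; this is not an existing ETBS script and it skips the --link-dest snapshot handling of the generic dobackup:

#! /bin/bash
# hypothetical dobackup sketch for one Windows machine, pulled via a CIFS mount
MNT=/mnt/winbackup
DEST=/data/backups/windows/winhost/`/bin/date +%Y-%m-%d`
mkdir -p $MNT /data/backups/windows/winhost
/sbin/mount.cifs //winhost.example.org/Users $MNT -o ro,credentials=/root/winhost.cred || exit 1
/usr/bin/rsync -aqx --ignore-existing --numeric-ids $MNT/ $DEST
/bin/umount $MNT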
How it works
dobackup
The backup script /data/backups/bin/dobackup.gen is quite short. To set up a backup target directory, just hardlink this file into that directory as /data/backups/HOST/PATH/dobackup.
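For example, for a hypothetical host alice that wants /home/alice backed up:

mkdir -p /data/backups/alice/home
ln /data/backups/bin/dobackup.gen /data/backups/alice/home/dobackup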
#! /bin/bash
# Generic dobackup, edit with care!
Since this script does all backups, do not change it carelessly. If in doubt, make a copy into a specific backup target directory and edit it there. If the changes work generically, after some testing, apply the changes to the master copy.
TDIR=`/usr/bin/dirname $0`
case "$TDIR" in
/data/backups/*/*)
    HOST=`echo $TDIR | sed 's,^/data/backups/\([^/]*\)/.*$,\1,'`
    DDIR=`echo $TDIR | sed 's,^/data/backups/\([^/]*\)/\(.*\)$,\2,'`
    ;;
*)
    echo please call with absolute pathname from /data/backups/HOST/DIR
    exit 1
    ;;
esac
Extract the hostname HOST, the target directory TDIR, and the destination subdirectory DDIR from the full script path. The script must be called with a fully qualified path, else it won't run.
[ -f $TDIR/hosts ] && NHOST=`/usr/bin/awk "/$HOST/{print \\\$1;exit}" $TDIR/hosts`
[ -z "$NHOST" ] && NHOST=$HOST
Resolve the real hostname NHOST if specified.
SDIR=/$DDIR
case "$DDIR" in
home)
    SDIR=/home/$HOST
    ;;
*-*)
    SDIR=`echo /$DDIR | sed 's,-,/,g'`
    ;;
esac
if [ "$NHOST" = "localhost" ]
then
    SOURCE=$SDIR/
else
    SOURCE=$NHOST:$SDIR/
fi
Derive the backup source SOURCE from the DDIR, HOST and NHOST.
[ -f $TDIR/env ] && . $TDIR/env
Source the env file if it exists. Use that to override SOURCE or TDIR.
TAG=`/bin/date +%Y-%m-%d`
DEST=$TDIR/$TAG
LAST=`/bin/ls -1d $TDIR/20[0-9][0-9]-[01][0-9]-[0123][0-9] | /usr/bin/tail -4`
LINKDEST=
for L in $LAST
do
    [ "$L" != "$DEST" ] && [ -d $L ] && LINKDEST="--link-dest=$L $LINKDEST"
done
Create the date-TAG, and find the last four snapshots. Set the backup destination DEST, and a list of --link-dest options LINKDEST, to pull hardlinks from during the backup.
EXCLUDE='--exclude=cache/ --exclude=Cache/ --exclude=.gvfs --exclude=trash/ --exclude=Trash/'
[ -f $TDIR/exclude ] && EXCLUDE=--exclude-from=$TDIR/exclude
Read the exclude filters from exclude or keep the default list.
[ -f $TDIR/options ] && OPTIONS=`grep -v '^#' $TDIR/options`
[ -z "$OPTIONS" ] && OPTIONS='--max-size=100M --bwlimit=10000 --filter=dir-merge_/.etbs-filter'
rsync(1) options. If there are none in options, use the defaults.
/usr/bin/rsync -aqHx --ignore-existing --numeric-ids \
    $EXCLUDE $OPTIONS \
    $LINKDEST $SOURCE $DEST
Run it. Done.
dobackups
The cron script /data/backups/bin/dobackups runs a set of dobackup scripts. It takes one optional argument, which tells which set to run. That argument usually starts with a dash '-', so it looks like an option.
#! /bin/bash
export SSH_AUTH_SOCK=/data/backups/bin/ssh_auth_sock
if /usr/bin/ssh-add -l | /bin/grep -q 4d:ca:d4:9e:53:c9:eb:a3:e7:1c:22:37:40:70:75:eb
then
    :
else
    echo No SSH Key available
    exit 1
fi
Test whether the ssh-agent is running and knows our backup key. Replace the key fingerprint as needed.
if LOCKFILE=$(tempfile -n /var/run/dobackups$1.lock)
then
    trap "rm -f $LOCKFILE" EXIT
    echo `date` > $LOCKFILE
else
    echo "another dobackups in progress:"
    cat /var/run/dobackups$1.lock
    exit 1
fi
Make sure we are not already running
ls -1 /data/backups/[a-z]*[a-z0-9]/[a-z]*[a-z0-9]/dobackup$1 | xargs -r -i bash -c '{}; /bin/true'
Find all files called /data/backups/*/*/dobackup[-SET] and run them.
crontab
The crontab /etc/cron.d/dobackup looks like this:
0 2,13 * * *  root  /data/backups/bin/dobackups
0 3 * * *     root  /data/backups/bin/dobackups -slow
0 5 * * *     root  /data/backups/bin/dobackups -at5
30 12 * * *   root  /data/backups/bin/dobackups -infreq >/var/log/backups-infreq 2>&1
Most jobs run at two in the night, and once again during the lunch break, to cover those who turned off their workstation at night.
Some slow remote jobs run at three in the night.
Some people asked for a backup at five in the morning, to cooperate with some local nightly cron work.
And finally some computers that are mostly turned off get their backup error messages suppressed, so the administrators won't get too many emails.
License
These scripts are rather trivial and obvious. In case they are considered to be protected by copyright, I make them available here under the conditions of the GNU General Public License, Version 2.
© Copyright 2009-2011 Stephan I. Böttcher <boettcher@physik.uni-kiel.de>