A Generic Backup Scheme
This page serves as a planning document for a network-based file backup method. The backup scheme will be implemented between two Linux machines connected over the internet. Bash scripts (or similar) will be used to perform the backups, in coordination with cron jobs. This should be viewed as a living document: it serves first as a plan before the system is built, and later as documentation of the backup method once the backup scripts have been created.
Backups are performed by copying files over a secure network connection from server A to server B. This direction will be notated here simply as A->B. A server in this case is a *nix based server.
Pull instead of Push
Since the backups are one-way (A->B), we need to define who is responsible for performing the network file transfer: does server A force the data onto server B, or does server B grab the data from server A? There are two basic possibilities, 'push' and 'pull'.
Push means that server A forces the data onto server B.
Pull means that server B grabs the data off of server A.
The design decision made here is to have server B pull the data from server A. In other words, server B initiates the file transfers. This means that server A is the 'provider' and server B is the 'consumer'.
Hosting Files Securely
Since server A is the provider, its primary responsibility is to host the files to be backed up. These files should not be openly readable to the casual internet browser.
SFTP is the method chosen here to host the files. SFTP (the SSH File Transfer Protocol) is used for copying files securely over an insecure connection (i.e. the internet). This requires that an account be created on server A, such as 'backupop'. This backup operator account should be a very limited account on server A and should not allow shell access. Here are the basic steps (these should only be performed on server A):
# useradd backupop
# passwd backupop
# groupadd sftp
# usermod -G sftp backupop
# usermod -s /bin/false backupop
# mkdir -p /home/backupop
# chown root:root /home/backupop
# chmod 755 /home/backupop
This creates a user called backupop (short for backup operator). Then a group called sftp is added and the user backupop is added to that group. The sftp group will be used to provide restricted access to the server, as will be seen shortly. Setting the shell to /bin/false prevents the user backupop from logging into a shell on the machine; the account will be able to view files, but won't be able to run programs. A home directory is created, but because it is owned by root, the user will only be able to view files there, not add or delete them.
To verify these settings, check the backupop entry in /etc/passwd and the sftp entry in /etc/group.
Security of the backupop User on Server A
The user created above is a special user. The backupop user should not be allowed to view any files that are not placed in the /home/backupop directory. In fact, any kind of access that provides a shell (such as ssh) should be denied for this user. The only kind of access we want to provide is sftp. The user can log into the server as follows:

$ sftp backupop@serverA.com

The user will then be at an sftp prompt, similar to the following:

sftp>
The typical commands are:
ls   - display remote directory listing
pwd  - display remote working directory
cd   - change remote directory
get  - download file
help - display available commands
quit - quit sftp (logout)
When users in the sftp group log in remotely to server A, they will have extremely restricted access. This restricted access is provided via ssh. Edit the following file:

/etc/ssh/sshd_config

And make the changes as indicated.
Subsystem sftp internal-sftp

Match Group sftp
    ChrootDirectory %h
    ForceCommand internal-sftp
    AllowTcpForwarding no
These options are described here Chroot Users with OpenSSH
However, the above commands will restrict 'all' users that use sftp to the restricted internal-only subsystem (internal-sftp). To allow trusted users to have read/write permissions and untrusted users to have read-only permissions, define a new subsystem:
# keep this for trusted users
Subsystem sftp /usr/libexec/openssh/sftp-server
# define new subsystem sftpi for chrooting users and read-only access
Subsystem sftpi internal-sftp

Match Group sftponly
    ChrootDirectory %h
    ForceCommand internal-sftp
    AllowTcpForwarding no
Trusted users can use ssh/sftp as usual, however untrusted users (or any user that matches the group sftponly) will need to explicitly state the subsystem when using sftp:
$ sftp -s sftpi user@website
Basically, this facility means that when the backupop user logs into server A via sftp, they are 'chroot'ed to the /home/backupop directory. In other words, when they type
sftp> cd /
sftp> ls
the backupop user will 'not' see the usual /etc, /bin, etc. folders. The vulnerability that remains is that if the backupop account password is compromised, the hosted files could be read by the attacker, but not modified or deleted. The attacker could not use that account to log into a shell (i.e. bash or sh) on server A, either over the network or locally.
Using sftp Without A Password
There is currently no option available (e.g. -p) to pass a user's password to the sftp command. As such, scripted use of sftp requires password-less logins. This can be done securely by creating a public-private keypair:
ssh-keygen -t rsa
Warning: The default location for saving the key files is the current user's .ssh directory. Because files by this name might already exist, care must be taken not to overwrite any existing files. It is best to specify an absolute path and filename when creating new keys and then manually move them to the proper directories.
Do not enter a passphrase when prompted; the key pair will replace the need for passwords.
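As a concrete sketch (the key path and name here are examples only, not part of the design), the keypair can be generated non-interactively with an explicit location, avoiding the overwrite hazard mentioned above:

```shell
# Generate a passphrase-less keypair for the backup job. Using -f with an
# explicit path avoids clobbering any existing keys in ~/.ssh; a temporary
# directory is used here purely for illustration.
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -b 4096 -f "$keydir/backup_rsa" -N ""
ls "$keydir"
```

This produces backup_rsa (private) and backup_rsa.pub (public) in the chosen directory.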
Generate the keypair on server B, since server B is the machine that initiates the transfers. Then copy the public key (e.g. id_rsa.pub) to server A, under the backupop user's directory. It may be necessary to create /home/backupop/.ssh if it does not already exist. The key should be appended to authorized_keys; if /home/backupop/.ssh/authorized_keys does not exist already, simply move id_rsa.pub to /home/backupop/.ssh/authorized_keys.
Warning: Because the private key (e.g. id_rsa) has no passphrase, treat it as highly sensitive information. Do not transmit it in an unsecure fashion or publish it freely! If it is compromised, a malicious user could gain access to server A via sftp, masquerading as the backupop user.
This will allow the sftp command to be used in scripts without the need for a password:
sftp -s sftpi -b /dev/stdin backupop@serverA.com <<EOF
get file1
get file2
quit
EOF
Backup Directory Structure
The question now is how to determine what files need to be transferred from server A to server B, and in what fashion they will be copied.
Because server A is the provider, server A places any files it wants server B to back up into the /home/backupop directory. Let's assume that server A has the following files to back up:
/home/backupop/backups/forum/jan/file1.txt
/home/backupop/backups/forum/feb/file2.txt
/home/backupop/backups/forum/mar/file3.txt
/home/backupop/backups/wiki/file4.txt
The directory structure from the perspective of server A looks exactly as displayed above. However, to server B, which uses the backupop user to log into server A, the file structure appears as:
/backups/forum/jan/file1.txt
/backups/forum/feb/file2.txt
/backups/forum/mar/file3.txt
/backups/wiki/file4.txt
Because the copy is in the A->B direction, the following sequence of events must happen for a successful backup operation:
- Server A must provide the files to be backed up by copying them to the /home/backupop directory.
- Server A will create an index of files to be copied
- Server B logs on to server A with the backupop account and copies the index file from server A
- Server B then compares the index file copied from server A with its own index, copying only new files
- Server B maintains its own index of files that it has copied
In addition to the files to be copied A->B, index files are maintained by both servers. The index files are used slightly differently by each server.
Server A Index is created each time files are added to the server A backup directory. If a single file is added or deleted from server A, the index is rebuilt. The index file reflects exactly what is available on server A at any given time.
Server B Index is initially created when the first copy from A->B is performed. For each copy performed after this, the index file is updated to reflect each file that has been transferred. If files on server B itself are added or deleted, the index is not rebuilt as it is on server A; only the files that were copied from A->B are added to this index. However, if the server B index lists a file that no longer exists on server A, then that entry is removed from the index; the file itself is not removed.
The index files are text files and have the same format:
[md5sum] [fullpath]

d41d8cd98f00b204e9800998ecf8427e /backups/alpha
d41d8cd98f00b204e9800998ecf8427e /backups/bravo
6812edcbb9edc05b936b8cbe4b515ec1 /backups/forumbackup.sql
The md5 checksums are created by the md5sum command, which is available wherever coreutils is installed (usually the case on a Linux system). The md5 sums provide a checksum used when comparing files. If two files have the same checksum, they are taken to be identical. (There is a chance that two different files produce the same checksum, but the probability of this is extremely small.) In the example above, alpha and bravo both have the same contents (they are empty in this case). Because the md5 checksum can be used to identify whether two files have the same contents, it will be used when determining which files have changed. Two files are considered identical if and only if:
- The absolute pathnames of both files match
- The md5sum is identical for both files
Given these criteria, alpha and bravo above are not considered identical.
In addition to detecting file changes, the md5 checksums can be used to verify that the files were copied correctly from server A to server B.
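As a sketch of how server A might build such an index using find and md5sum (the index filename and the temporary scratch directory are illustrative assumptions; on server A the root would be /home/backupop):

```shell
# Sketch of server A's index builder. A temporary directory with one sample
# file stands in for /home/backupop here.
BACKUP_ROOT=$(mktemp -d)
mkdir -p "$BACKUP_ROOT/backups/forum"
printf 'hello' > "$BACKUP_ROOT/backups/forum/file1.txt"

cd "$BACKUP_ROOT"
# One line per file: "md5sum path", rewriting "backups/..." to "/backups/..."
# so the paths match what the chrooted backupop user sees.
find backups -type f -exec md5sum {} + \
    | sed 's|  backups/| /backups/|' > index.txt
cat index.txt
```

Rebuilding the whole index each time files are added or removed keeps it an exact reflection of what is currently available on server A.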
When it comes time to perform the copy, the indexes from server A and server B are compared. The index files provide the only means of determining which files are to be copied. The following policy is followed:
- A not B -> copy
- A and B -> no copy
- not A and B -> no copy
- not A and not B -> no copy
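This policy amounts to a set difference on index lines. Since each line pairs a checksum with a path, a file whose contents changed on A shows up as a line that is 'A not B' and is therefore recopied. A minimal sketch using comm, with made-up index data standing in for the real index files:

```shell
# Determine which files to copy: index lines present in server A's index
# but absent from server B's index. comm requires sorted input; these
# sample indexes are already in sort order.
index_a=$(mktemp); index_b=$(mktemp)
cat > "$index_a" <<'EOF'
6812edcbb9edc05b936b8cbe4b515ec1 /backups/forumbackup.sql
d41d8cd98f00b204e9800998ecf8427e /backups/alpha
EOF
cat > "$index_b" <<'EOF'
d41d8cd98f00b204e9800998ecf8427e /backups/alpha
EOF

# comm -23 prints lines found only in the first input: "A not B".
to_copy=$(comm -23 "$index_a" "$index_b" | awk '{print $2}')
echo "$to_copy"
```

Here only /backups/forumbackup.sql would be fetched, since the alpha line appears in both indexes.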
In other words, the only time a file is copied is if it exists on A but does not exist on B (according to the index files). This has the following consequences:
- If a group of files is copied from A to B, but then deleted from B, they will not be re-copied.
- If a group of files is copied from A to B, but later deleted from A, they will not be deleted from B. In this case, however, the index file on server B will be updated to reflect that these files were removed from server A, so stale index entries do not accumulate.
- The index file on server B serves to keep track of which files have already been copied to B. If server B wishes to recopy all files in the backup directory on A, manually deleting the index on server B and re-running server B's backup script will cause a full copy to be performed.
- If a file is copied from A to B, but the contents of that file later change on server A, the file will be recopied to B. The old file will be placed in a .old directory in the respective path. This determination is made using the md5sum checksums described above.
Full Backups and Version History
Generally, a full backup is performed. This means that entire files are copied instead of only the portions that changed. If a file changes on A, it will be recopied A->B and the old version on B will be archived in a directory called .old. If the file changes on server A yet again, it will be recopied A->B; the version on B will again be archived to the .old directory, without overwriting any previously archived versions. This combines the simplicity of a full copy with a history of the changes made to individual files, which is especially useful for configuration files that keep the same name but are regularly edited.
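The archiving step can be sketched as follows; the timestamp suffix is an assumption chosen so that successive old versions never overwrite one another:

```shell
# Before a changed file is recopied from A, server B moves its current copy
# into a .old directory alongside it, suffixed with a timestamp.
# A temporary tree stands in for server B's backup area here.
root=$(mktemp -d)
mkdir -p "$root/backups/wiki"
printf 'old contents' > "$root/backups/wiki/file4.txt"

file="$root/backups/wiki/file4.txt"
olddir=$(dirname "$file")/.old
mkdir -p "$olddir"
mv "$file" "$olddir/$(basename "$file").$(date +%Y%m%d%H%M%S)"
# the fresh copy from server A would now be written to $file
```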
Files are never deleted automatically by the backup scripts proposed here.
The backup scheme is implemented by two backup scripts, one run on server A, the other on server B.
These backup scripts should provide for fairly robust backups. Version control systems like SVN and CVS could provide more flexibility, but would also be more complicated to administer. Industrial-strength backup systems like Amanda offer far more features than are needed here, at the cost of administration overhead. The utility rsync could be used, but it would not work over the restricted, sftp-only channel described here.
The advantages of this system are security, simplicity, and maintainability. The ssh/sftp method here provides a secure, read-only conduit for a restricted (but trusted) user to perform file transfers. The backup scripts developed here provide an open-source method that not only safely performs network file backups but also serves to document the backup policy itself. Because simple scripts are used, they can be easily modified and tailored to individual systems.