rsync command

Introduction

rsync is a commonly used Linux application for file synchronization.

It can synchronize files between a local computer and a remote computer, or between two local directories (but does not support synchronization between two remote computers). It can also be used as a file copy tool, replacing the cp and mv commands.

The r in its name stands for remote: rsync means "remote sync". Unlike other file transfer tools (such as FTP or scp), rsync checks the files that already exist on the sender and the receiver and transfers only the parts that have changed. By default, a file counts as changed when its size or modification time differs.

Although rsync is not part of the SSH tool set, it also involves remote operations, so it is introduced here.

Installation

If rsync is not installed on the local or remote computer, you can install it with the following command.

# Debian
$ sudo apt-get install rsync

# Red Hat
$ sudo yum install rsync

# Arch Linux
$ sudo pacman -S rsync

Note that rsync must be installed on both sides of the transmission.

Basic usage

rsync can synchronize two directories on the same computer. The examples below use local synchronization to explain several of rsync's main parameters.

-r parameter

When used on the local machine, rsync can replace cp and mv to copy a source directory into a target directory.

$ rsync -r source destination

In the above command, -r means recursive, that is, subdirectories are included. -r is required when copying directories: without it, rsync skips each directory instead of copying it. source is the source directory and destination is the target directory. After the command runs, the subdirectory destination/source appears in the target directory.

If there are multiple files or directories that need to be synchronized, they can be written as follows.

$ rsync -r source1 source2 destination

In the above command, source1 and source2 will be synchronized to the destination directory.

-a parameter

The -a parameter can replace -r. It recurses into subdirectories and also synchronizes metadata such as modification times and permissions. Preserving modification times matters because rsync decides by default whether a file needs updating by comparing size and modification time. With -r alone, the copies get new timestamps, so the next run treats every file as changed. -a is therefore more useful than -r, and the following is the usual way to write the command.

$ rsync -a source destination

If the target directory destination does not exist, rsync creates it automatically. After the above command runs, the whole source directory source is copied into destination, producing the directory structure destination/source.

If you only want to synchronize the contents of the source directory source to the destination directory destination, you need to add a slash after the source directory.

$ rsync -a source/ destination

After the above command is executed, the contents of the source directory are copied to the destination directory, and a source subdirectory will not be created under the destination.

-n parameter

If you are not sure what result will be produced after rsync is executed, you can use the -n or --dry-run parameter to simulate the execution result.

$ rsync -anv source/ destination

In the above command, -n simulates the run without actually performing it. -v (verbose) prints the details to the terminal, so you can see what would be synchronized.

--delete parameter

By default, rsync only ensures that all the contents of the source directory (except for explicitly excluded files) are copied to the target directory. It does not keep the two directories the same, and it does not delete files. If you want to make the target directory a mirror copy of the source directory, you must use the --delete parameter, which will delete files that only exist in the target directory but not in the source directory.

$ rsync -av --delete source/ destination

In the above command, the --delete parameter will make destination a mirror image of source.

Exclude files

--exclude parameter

Sometimes we want to exclude certain files or directories during synchronization. The --exclude parameter specifies an exclusion pattern.

$ rsync -av --exclude='*.txt' source/ destination
# Or
$ rsync -av --exclude '*.txt' source/ destination

The above command excludes all TXT files.

Note that rsync synchronizes hidden files (names starting with a dot) by default. To exclude them, add --exclude=".*".

If you want to exclude all files in a directory, but do not want to exclude the directory itself, you can write it as follows.

$ rsync -av --exclude 'dir1/*' source/ destination

For multiple exclusion patterns, use multiple --exclude parameters.

$ rsync -av --exclude 'file1.txt' --exclude 'dir1/*' source/ destination

Multiple exclusion patterns can also be written with a single --exclude parameter by using Bash brace expansion.

$ rsync -av --exclude={'file1.txt','dir1/*'} source/ destination
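To see what rsync actually receives, the brace expression can be expanded into a Bash array and printed. This sketch needs only bash, not rsync. It shows that the shell, not rsync, turns the single word into two separate --exclude options.

```shell
#!/usr/bin/env bash
# Sketch: show the words bash produces from the brace expression above.
args=(--exclude={'file1.txt','dir1/*'})

# Prints one argument per line:
#   --exclude=file1.txt
#   --exclude=dir1/*
printf '%s\n' "${args[@]}"
```

Brace expansion is a Bash (and zsh) feature. In a plain POSIX sh such as dash the braces are passed through literally, so write separate --exclude options there.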

If there are many excluded patterns, you can write them into a file with one line per pattern, and then use the --exclude-from parameter to specify this file.

$ rsync -av --exclude-from='exclude-file.txt' source/ destination

--include parameter

The --include parameter specifies a pattern for files that must be synchronized. It is often combined with --exclude.

$ rsync -av --include="*.txt" --exclude='*' source/ destination

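The above command excludes everything except TXT files. Note that --exclude='*' also matches subdirectories. rsync therefore does not descend into them, and only the TXT files at the top level of source are copied. To match TXT files at any depth, put --include='*/' before the other patterns.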

Remote synchronization

SSH protocol

In addition to supporting synchronization between two local directories, rsync also supports remote synchronization. It can synchronize local content to a remote server.

$ rsync -av source/ username@remote_host:destination

You can also synchronize remote content to the local.

$ rsync -av username@remote_host:source/ destination

By default, rsync uses SSH for remote login and data transmission.

Early versions of rsync did not use SSH by default, so the protocol had to be specified with the -e parameter. The default was changed later, so the -e ssh in the following command can be omitted.

$ rsync -av -e ssh source/ user@remote_host:/destination

However, if the ssh command has additional parameters, you must use the -e parameter to specify the SSH command to be executed.

$ rsync -av -e 'ssh -p 2234' source/ user@remote_host:/destination

In the above command, the -e parameter specifies SSH to use port 2234.

rsync protocol

Besides SSH, you can use the rsync protocol (default port 873) if another server has the rsync daemon installed and running. In this form, a double colon (::) separates the server address from the module and target directory.

$ rsync -av source/ 192.168.122.32::module/destination

Note that the module in the above address is not the actual path name, but a resource name specified by the rsync daemon, which is assigned by the administrator.

If you want to know the list of all modules allocated by the rsync daemon, you can execute the following command.

$ rsync rsync://192.168.122.32

In addition to using double colons in the rsync protocol, you can also directly use the rsync:// protocol to specify an address.

$ rsync -av source/ rsync://192.168.122.32/module/destination

Incremental backup

The most important feature of rsync is incremental backup: by default, only files that have changed are copied.

In addition to the direct comparison between the source directory and the target directory, rsync also supports the use of a reference directory, that is, the changed parts between the source directory and the reference directory are synchronized to the target directory.

The approach works as follows. The first synchronization is a full backup, in which every file is copied into the base directory. Each later synchronization is an incremental backup. Only the files that changed between the source directory and the base directory are copied, and they are saved in a new target directory. This new directory appears to contain every file, but only the changed files are new copies. The unchanged files are hard links to the files in the base directory.

The --link-dest parameter is used to specify the base directory for synchronization.

$ rsync -a --delete --link-dest /compare/path /source/path /target/path

In the above command, the --link-dest parameter specifies the reference directory /compare/path. rsync compares the source directory /source/path with the reference directory and copies the changed files to the target directory /target/path. Unchanged files are created as hard links to their counterparts in the reference directory. The first backup made this way is a full backup, and every later one is incremental. A relative --link-dest path is interpreted relative to the target directory, so absolute paths are the safer choice.

The following is an example of a script that backs up the user's home directory.

#!/bin/bash

# A script to perform incremental backups using rsync

set -o errexit
set -o nounset
set -o pipefail

readonly SOURCE_DIR="${HOME}"
readonly BACKUP_DIR="/mnt/data/backups"
readonly DATETIME="$(date '+%Y-%m-%d_%H:%M:%S')"
readonly BACKUP_PATH="${BACKUP_DIR}/${DATETIME}"
readonly LATEST_LINK="${BACKUP_DIR}/latest"

mkdir -p "${BACKUP_DIR}"

# On the first run "latest" does not exist yet; rsync only warns and makes a full copy
rsync -av --delete \
  --link-dest "${LATEST_LINK}" \
  --exclude=".cache" \
  "${SOURCE_DIR}/" \
  "${BACKUP_PATH}"

rm -rf "${LATEST_LINK}"
ln -s "${BACKUP_PATH}" "${LATEST_LINK}"

In the above script, each run creates a new directory ${BACKUP_DIR}/${DATETIME}. It uses the symbolic link ${BACKUP_DIR}/latest, which points to the previous backup, as the base directory. When the backup finishes, the link is replaced so that ${BACKUP_DIR}/latest points to the new backup directory, which becomes the base for the next run.

Configuration options

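The -a, --archive parameter enables archive mode and is equivalent to -rlptgoD. It recurses into directories and preserves symbolic links, permissions, modification times, group, owner (only when run as root), and device and special files. It does not preserve hard links, ACLs or extended attributes; use -H, -A and -X for those.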

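The --append parameter resumes an interrupted transfer by appending data to files that are shorter on the receiving side. It assumes that the existing part of each such file is identical to the start of the source file, and does not check this.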

The --append-verify parameter is similar to the --append parameter, but a verification is performed on the file after the transfer is completed. If the verification fails, the entire file will be resent.

The -b, --backup parameter changes what happens when rsync would delete or overwrite a file that already exists in the target directory. Instead of simply removing it, rsync renames it as a backup. By default the backup name adds the suffix set by --suffix, which is ~.

The --backup-dir parameter specifies the directory where files are stored during backup, such as --backup-dir=/path/to/backups.

The --bwlimit parameter limits the transfer bandwidth. The default unit is KiB/s (units of 1024 bytes), for example --bwlimit=100.

The -c, --checksum parameter changes how rsync detects changes. By default, rsync checks only whether a file's size and last modification time have changed, and retransmits it if so. With this parameter, rsync compares checksums of the file contents instead. This is more thorough but slower, because every file has to be read in full.

The --delete parameter deletes files that exist only in the target directory and not in the source directory, making the target directory a mirror image of the source directory.

The -e parameter specifies the remote shell command used for the transfer, for example -e 'ssh -p 2234'.

The --exclude parameter specifies to exclude files that are not to be synchronized, such as --exclude="*.iso".

The --exclude-from parameter specifies a local file, which contains file patterns to be excluded, one line per pattern.

The --existing, --ignore-non-existing parameter skips files and directories that do not already exist in the target directory, so only existing files are updated.

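The -h, --human-readable parameter outputs numbers in a human-readable format.

The --help parameter returns help information. The short form -h means help only when it is the sole argument. Combined with other options, -h means human-readable output.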

The -i, --itemize-changes parameter outputs a change summary for each file, showing how it differs between the source and target directories.

The --ignore-existing parameter means that as long as the file already exists in the target directory, it will skip over and no longer synchronize these files.

The --include parameter specifies the files to be included during synchronization, and is generally used in conjunction with --exclude.

The --link-dest parameter specifies the base directory for incremental backup.

The -m, --prune-empty-dirs parameter leaves empty directories out of the transfer, so they are not created in the target directory.

The --max-size parameter sets the size limit of the largest file to be transferred, such as not exceeding 200KB (--max-size='200k').

The --min-size parameter sets the size limit of the smallest file to be transferred, such as not less than 10KB (--min-size=10k).

The -n parameter or the --dry-run parameter simulates the operation to be performed, but does not actually execute it. With the use of the -v parameter, you can see what content will be synchronized.

The -P parameter is a combination of the two parameters --progress and --partial.

The --partial parameter keeps partially transferred files. Without it, rsync deletes a file whose transfer was interrupted. With it, the partial file is kept in the target directory, so the next synchronization can reuse it instead of starting from scratch. It can be combined with --append or --append-verify to resume by appending to the partial file.

The --partial-dir parameter works like --partial but keeps partially transferred files in a separate directory, such as --partial-dir=.rsync-partial, instead of in place in the target directory.

The --progress parameter indicates the display progress.

The -r parameter means recursion, that is, including subdirectories.

The --remove-source-files parameter deletes files on the sending side after they have been transferred successfully. Only files are removed; directories are left in place.

The --size-only parameter means that only files whose size has changed will be synchronized, regardless of the difference in file modification time.

The --suffix parameter specifies the suffix appended to the name of a backed-up file. The default is ~.

The -u, --update parameter skips files that are newer in the target directory than in the source directory. A target file whose modification time is later than the source file's is not overwritten.

The -v parameter outputs verbose information. -vv outputs more detail, and -vvv outputs the most detail.

The --version parameter returns the version of rsync.

The -z parameter specifies to compress data during synchronization.