When copying or moving large numbers of files, the generic UNIX utilities
cp and mv can be dangerous. Since the operations can
take a long time, there is a fair chance something will interrupt
or stop the copy or move. When that happens during a move operation, your
data will be in an inconsistent state, with part of it still in the original
location and part of it in the target destination. Even for a plain copy,
restarting is less than ideal, as it will recopy everything that
was already copied. The same issue applies to the SSH copy utility scp.
The rsync utility is an advanced file transfer utility that
solves these issues. It is installed on both OSX and Linux. If you look at the
man page for rsync you will see it has a ton of options. Don't let that apparent
complexity scare you. Using it for most copy or move jobs is simple.
Simple Copy/Move Example
Take a look at this example of copying /source/dir/to/copy into /target/dir:
rsync -avP /source/dir/to/copy /target/dir/
The end result will be a copy of /source/dir/to/copy located at
/target/dir/copy. At this point you can run exactly the same
command again. In fact, you should do this to verify the copy. rsync
looks at each file and only copies over what is not present or is different at
the target destination. The -v option in the command lists each file as it is
copied, and -P shows a progress bar for each file, which is helpful when you
have very large files.
If your intention was to move the data instead of just copying it,
you would then run:
rm -r /source/dir/to/copy
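The copy, verify, then remove sequence can be tried safely on throwaway data first. The following is a minimal sketch, assuming rsync is on your PATH. The scratch directory and file names are made up for illustration, and the -n (dry run) and -i (itemize changes) options on the second pass are an addition here, used so the verification prints only what would still need copying.

```shell
#!/bin/sh
# Scratch area standing in for /source/dir/to and /target/dir (hypothetical paths).
work=$(mktemp -d)
mkdir -p "$work/source/copy/sub" "$work/target"
echo "one" > "$work/source/copy/a.txt"
echo "two" > "$work/source/copy/sub/b.txt"

# First pass: the actual copy (no trailing slash on the source).
rsync -a "$work/source/copy" "$work/target/"

# Second pass: a dry run that itemizes anything still missing or different.
pending=$(rsync -ani "$work/source/copy" "$work/target/")

if [ -z "$pending" ]; then
    echo "copy verified, removing source"
    rm -r "$work/source/copy"
else
    echo "differences remain, keeping source:"
    echo "$pending"
fi
```

Removing the source only when the second pass reports nothing is one cautious way to turn a copy into a move; it keeps the original data in place if anything about the copy looks off.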
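GOTCHA WARNING: one thing to be careful of
is trailing slashes. Normally you NEVER want a
trailing slash on the source directory but you DO want a trailing slash
on the target directory. A trailing slash on the source tells rsync to copy
the contents of the directory rather than the directory itself. See the man
page for more info.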
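The effect of a trailing slash on the source is easiest to see side by side. This is a small sketch on scratch data, assuming rsync is on your PATH; the directory names with_dir and contents_only are hypothetical and exist only for this demonstration.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/source/copy" "$work/with_dir" "$work/contents_only"
echo "data" > "$work/source/copy/file.txt"

# No trailing slash on the source: the directory itself is recreated at the target.
rsync -a "$work/source/copy" "$work/with_dir/"

# Trailing slash on the source: only the directory's contents are copied.
rsync -a "$work/source/copy/" "$work/contents_only/"

# List the files to compare the two layouts.
find "$work/with_dir" "$work/contents_only" -type f
```

The first command produces with_dir/copy/file.txt, while the second produces contents_only/file.txt with no copy directory in between.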
With the -a option, rsync will try to preserve both the
exact permissions and group of the source. When you are copying data
to one of your shared group areas, this can be problematic as it will
ignore the sticky group bit as discussed in
Understanding Group Permissions in UNIX. So instead you should
run rsync with the following options:
rsync -rltP --chmod=ugo=rwX ...
On all Martinos CentOS7 machines, we have defined a global option alias
-Z that does the above. So on these machines you can just run:
rsync -aZP ...
The key rule to remember here is to use the -Z option when you
are rsyncing files INTO your group storage areas.
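To see what the longer option set does, it can be run against scratch data. This is only a sketch, assuming rsync is on your PATH and a Linux machine: the group_area directory is a hypothetical stand-in for a real group storage area, and the site-specific -Z alias is deliberately not used so the example also works on machines without it.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/source/copy" "$work/group_area"
echo "private" > "$work/source/copy/notes.txt"
chmod 600 "$work/source/copy/notes.txt"   # owner-only at the source

# Stand-in for a group storage area: a directory with the setgid bit set.
chmod g+s "$work/group_area"

# -rlt instead of -a: recurse, keep symlinks and times, but do not force the
# source's permissions, owner, or group onto the target.
# --chmod=ugo=rwX asks for read/write for everyone (plus execute/search where
# appropriate); the receiving side's umask still applies to new files.
rsync -rlt --chmod=ugo=rwX "$work/source/copy" "$work/group_area/"

# Compare modes and groups at the source and the target.
ls -lR "$work/source/copy" "$work/group_area/copy"
```

The exact modes shown at the target depend on your umask, so compare the two listings rather than expecting specific permission bits.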
Other common options you may wish to use:
|-H|Try to maintain hard links within transferred files|
|-A|Also transfer POSIX ACLs|
|-X|Also transfer extended attributes|
File Syncing and Mirror Backup
In the simple example above, if there are files in the target destination
that are not present at the source, they will be left alone and not touched.
Sometimes you want the target destination to become an exact copy of
the source, aka "a mirror". To do that, files on the target destination
side must be deleted if they do not exist at the source. To do this you simply
add the --delete option to rsync:
rsync -aZP --delete /source/dir/to/copy /target/dir/
Now any files under /target/dir/copy that are not also present
under /source/dir/to/copy will be deleted.
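Because --delete removes data, it can be worth previewing a mirror run before doing it for real. The sketch below uses scratch data and assumes rsync is on your PATH; the file names are made up, and the -n (dry run) preview step is a suggestion added here rather than part of the procedure above.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/source/copy" "$work/target/copy"
echo "keep" > "$work/source/copy/keep.txt"
echo "stale" > "$work/target/copy/stale.txt"    # exists only at the target
echo "other" > "$work/target/unrelated.txt"     # outside the mirrored directory

# Preview: -n (dry run) lists what would be copied or deleted without doing it.
rsync -avn --delete "$work/source/copy" "$work/target/"

# Real run: target/copy becomes a mirror of source/copy.
rsync -a --delete "$work/source/copy" "$work/target/"

ls -R "$work/target"
```

After the real run, stale.txt is gone from target/copy, while unrelated.txt survives because it sits outside the directory being mirrored.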
File Transfer over the Network
If you want to copy or move files to a directory on another computer
over the network, simply prefix the destination directory
with the hostname of the remote computer followed by a colon.
rsync -aZP /source/dir/to/copy remotehost:/target/dir/
You will be prompted for your password on the remote host before
the copy starts. If your user name on the remote host is different from
the one on the computer where you are running rsync, you need to specify
username@remotehost rather than just remotehost.
If you are using rsync from a remote site outside MGH, please use
the gateway server door.nmr.mgh.harvard.edu for your data transfer.
Please check out the
for other examples of rsync usage and how to use more advanced options.