Introduction

You will find here a collection of usage examples for several features of the dar suite command-line tools.
Contents

- Dar and remote backup server
- dar and ssh
- Bytes, bits, kilo, mega etc.
- Running DAR in background
- Files' extension used
- Running command or scripts from DAR
- Convention for DUC files
- Convention for DBP files
- User target in DCF
- Using data protection with DAR & Parchive
- Examples of file filtering
- Decremental Backup
- Door inodes (Solaris)
- How to use "delta compression", "binary diff" or "rsync like increment" with dar
- Comparing the different way to perform remote backup
- Multi recipient signed archive weakness
Dar and remote backup server

In the following you will find the three available ways to have dar work with remote repositories (or remote servers, if you prefer).

The situation is the following: you have a host (called the local host in the following) running an operational system that you want to back up regularly without disturbing users. To guard against hard disk failure or a local disaster, you want to store the backup on another host (called the remote host in the following). Of course, you do not have much space on the local host to store the archive, otherwise you could do the backup first and copy the resulting slices afterward. For restoration you would first need to download the archive and then restore from it; that is possible but still requires enough local storage space.

Between these two hosts you could also use NFS, and nothing more would be necessary to use dar as usual. But if for security reasons you do not want to use NFS (insecure network, backup over the Internet, ...) and prefer to communicate through an encrypted session (using ssh for example), then you need to use the dar features brought by version 1.1.0:

1 - Single pipe

dar can write its archive to its standard output instead of a given file. To activate this, use "-" as basename. Here is an example:

    dar -c - -R / -z | some_program

or

    dar -c - -R / -z > named_pipe_or_file

Note that file splitting is not available, as it has not much meaning when writing to a pipe (a pipe has no name and there is no way to skip (or seek) in a pipe, while dar needs to set back a flag in a slice header when it is not the last slice of the set). At the other end of the pipe (on the remote host), the data can be redirected to a file with a proper filename (something that matches "*.1.dar").

    some_other_program > backup_name.1.dar

It is also possible to redirect the output to dar_xform, which can in turn, on the remote host, split the data flow into several files, pausing between them if necessary, exactly as dar is able to do:

    some_other_program | dar_xform -s 100M - backup_name

This will create backup_name.1.dar and so on. The resulting archive is totally compatible with those directly generated by dar.

OK, you are happy: you can back up the local filesystem to a remote server through a secure socket session, in a full-featured dar archive, without using NFS. But now you want to make a differential backup taking this archive as reference. How to do that? The simplest way is to use the feature called "isolation", which extracts the catalogue from the archive and stores it in a little file. On the remote backup server you would type:

    dar -A backup_name -C CAT_backup_name -z

Note that, as for any dar operation, no compression is used by default; use the -z option: it is worth compressing an isolated catalogue. If the isolated catalogue is too big to fit on a floppy [yes, this was written long ago :-)], you can split it as usual using dar:

    dar -A backup_name -C CAT_backup_name -z -s 1440k

The generated archive (CAT_backup_name.1.dar, and so on) only contains the catalogue, but can still be used as reference for a new backup (or as a backup of the internal catalogue of the archive, using -x and -A at the same time). You just need to transfer it back to the local host, either using floppies or a USB key, through a secured socket session, or even by directly isolating the catalogue to a pipe that goes from the remote host to the local host:

on the remote host:

    dar -A backup_name -C - -z | some_program

on the local host:

    some_other_program > CAT_backup_name.1.dar

or use dar_xform as previously if you need splitting:

    some_other_program | dar_xform -s 1440k - CAT_backup_name

Then you can make your differential backup as usual:

    dar -A CAT_backup_name -c - -z -R / | some_program

or, if this time you prefer to save the archive locally:

    dar -A CAT_backup_name -c backup_diff -z -R /

To read an archive from a remote host using a single pipe, you will need to run dar with the --sequential-read option. In that so-called sequential read mode, dar reads the archive contents sequentially from the beginning to the end. This has a huge drawback when it comes to extracting a few files from a large archive, especially when such a large archive has to be sent over the network. For that reason let's see another method:

2 - Dual pipes

We have previously seen how to use a single pipe to generate an archive over the network, how to fetch an archive with a single pipe either to restore some files or to make a differential backup (but that way is very inefficient), and how to make a differential backup by use of an isolated catalogue.

For differential backups, instead of isolating the catalogue, it is also possible to read an archive or its extracted catalogue through pipes. Yes, two pipes are required for dar to be able to read an archive efficiently. The first goes from dar to the external program "dar_slave" and carries orders (asking for some portions of the archive), while the other pipe goes from "dar_slave" back to "dar" and carries the requested data.

By default, if you specify "-" as basename for -l, -t, -d, -x, or -A (used with -C or -c), dar and dar_slave will use their standard input and output to communicate. Thus you need an additional program to connect the output of each one to the input of the other. Warning: you cannot use named pipes that way, because dar and dar_slave would get blocked upon opening the first named pipe, waiting for the peer to open it too, even before they have started (deadlock at shell level). For named pipes, the -i and -o options help: they receive a filename as argument, which may be a named pipe. The argument provided to -i is used instead of stdin and the one provided to -o is used instead of stdout. Note that the -i and -o options are only available if "-" is used as basename. Let's take an example:

You now want to restore an archive from your remote backup server. Thus on the remote server you have to run dar_slave this way:

    some_prog | dar_slave backup_name | some_other_prog

or

    dar_slave -o /tmp/pipe_todar -i /tmp/pipe_toslave backup_name

and on the local host you have to run dar this way:

    some_prog | dar -x - -v ... | some_other_prog

or

    dar -x - -i /tmp/pipe_todar -o /tmp/pipe_toslave -v ...

The order is not important: you can run dar or dar_slave first. What is important instead is to connect dar and dar_slave so that the output of the first goes to the input of the second and vice versa. Another important point is that the communication channel must be perfect: no data loss, no duplication, no reordering; thus communication over TCP should be fine.

Of course, you can not only extract files (-x command as above) but also isolate a catalogue through pipes, test an archive, make a comparison (difference), use an archive as catalogue of reference, and even then output the resulting archive to a pipe! If using -C or -c with "-" while also using -A with "-", it is then mandatory to use -o: the output catalogue will be generated on standard output, thus to send orders to dar_slave you must use another channel, thanks to the -o option:

        LOCAL HOST                                    REMOTE HOST
    +-----------------+                     +-----------------------------+
    |   filesystem    |                     |     backup of reference     |
    |       |         |                     |            |                |
    |       V         |                     |            V                |
    |    +-----+      | backup of reference |      +-----------+          |
    |    | DAR |--<-]=========================[-<--| DAR_SLAVE |          |
    |    |     |-->-]=========================[->--|           |          |
    |    +-----+      | orders to dar_slave |      +-----------+          |
    |       |         |                     |      +-----------+          |
    |       +--->---]=========================[->--| DAR_XFORM |--> backup|
    |                 |     saved data      |      +-----------+ to slices|
    +-----------------+                     +-----------------------------+

on the local host:

    dar -c - -A - -i /tmp/pipe_todar -o /tmp/pipe_toslave | some_prog

on the remote host:

    dar_slave -i /tmp/pipe_toslave -o /tmp/pipe_todar full_backup

dar_slave provides the full_backup for dar's -A option...

    some_other_prog | dar_xform - diff -s 140M -p ...

...while dar_xform makes slices of the output archive provided by dar.

3 - Native SFTP and FTP support

Since release 2.6.0, you can use a URL-like archive basename. Assuming you have slices test.1.dar, test.2.dar ... available in the directory Archive of an FTP server, you could read, extract, list, test, ... that archive using the following syntax:

    dar -t ftp://login@ftp.server.some.where/Archive/test ... <other options>

The same applies to the -l, -x, -A and -@ options. Note that you still need to provide the archive basename, not a slice name. This is also compatible with slicing and slice hashing, which will be generated on the remote server:

    dar -c sftp://login:password@secured.server.some.where/Archive/day2/incremental \
        -A ftp://login@ftp.server.some.where/Archive/CAT_test \
        --hash sha512 \
        -@ sftp://login2:password2@secured.server.some.where/Archive/day2/CAT_incremental \
        <other options>

By default, if no password is given, dar asks the user interactively. If no login is used, dar assumes the login to be "anonymous". When you add the -afile-auth option, in the absence of a password on the command line, dar checks for a password in the file ~/.netrc for both FTP and SFTP protocols, to avoid exposing the password on the command line while still allowing non-interactive backups. See man netrc for this common file's syntax.

In the next paragraph you will find examples of use with netcat and ssh relying on the first two methods for remote operation, but just before, here is a summary table of the pros and cons of the different remote server access methods:

Mode               | Network protocol     | Slicing                     | Slice hashing       | Efficient archive | Note
                   | supported            |                             |                     | reading           |
-------------------+----------------------+-----------------------------+---------------------+-------------------+-------------------------------
single pipe        | any reliable one (*) | using dar_xform for both    | NO                  | NO                | only efficient for archive
                   |                      | reading and writing         |                     |                   | creation
dual pipes         | any reliable one (*) | using dar_xform for writing | NO                  | YES               | efficient for archive reading
                   |                      | and dar_slave for reading   |                     |                   | and using archive as reference
dar native support | FTP and SFTP only    | -s and -S options           | YES (--hash option) | YES               | FTP is insecure but less CPU
                   |                      |                             |                     |                   | resources

Running DAR in background

DAR can be run in the background:

    dar [command-line arguments] < /dev/null &

Files' extension used

dar suite programs use several types of files.

While for slices the extension and even the filename format (basename.slicenumber.dar) cannot be customized, there is no mandatory rule for the other types of files. In case you have no idea how to name these, here are the extensions I use:

- "*.dcf": Dar Configuration File, aka DCF files (used with dar's -B option)
- "*.dmd": Dar Manager Database, aka DMD files (used with dar_manager's -B and -C options)
- "*.duc": Dar User Command, aka DUC files (used with dar's -E, -F, -~ options)
- "*.dbp": Dar Backup Preparation, aka DBP files (used with dar's -= option)
- "*.dfl": Dar Filter List, aka DFL files (used with dar's -[ or -] options)

But you are totally free to use the filenames you want! ;-)

Running command or scripts from DAR

You can run commands from dar at two different places: between slices (A), and before and after saving a file (B).

A - Between slices:

This concerns the -E, -F and -~ options. They all receive a string as argument. Thus, if the argument must be a command with its own arguments, you have to put these between quotes so that they appear as a single string to the shell that interprets the dar command-line. For example, if you want to call

    df .

[these are two words: "df" (the command) and "." (its argument)] then you have to use the following on the DAR command-line:

    -E "df ."

or

    -E 'df .'

DAR provides several substitution strings in that context, among which %p (slice path), %b (slice basename), %n (slice number), %e (slice extension) and %c (context).

The slice number (%n) is either that of the slice just written or that of the next slice to be read. For example, if you create a new archive (using -c, -C or -+), in the -E option the %n macro is the number of the last slice completed. Otherwise (using -t, -d, -A (with -c or -C), -l or -x), it is the number of the slice that will be required very soon. %c (the context) is substituted by "init", "operation" or "last_slice".

What is the use of this feature? For example, you want to burn the brand-new slices on CD as soon as they are available. Let's build a little script for that:

    % cat burner
    #!/bin/bash

    # Expect the slice filename and the slice number as arguments.
    if [ -z "$1" ] || [ -z "$2" ] ; then
        echo "usage: $0 <filename> <number>"
        exit 1
    fi

    slice=$(basename "$1")

    # Build an ISO image holding the slice and burn it.
    mkdir T
    mv "$1" T
    mkisofs -o /tmp/image.iso -r -J -V "archive_$2" T
    cdrecord dev=0,0 speed=8 -data /tmp/image.iso
    rm /tmp/image.iso

    # Compare the burnt copy with the original slice.
    if diff "/mnt/cdrom/$slice" "T/$slice" ; then
        rm -rf T
    else
        exit 2
    fi
    %

This little script receives the slice filename and its number as arguments; it burns a CD with the slice and compares the resulting CD with the original slice. Upon failure, the script returns 2 (or 1 if the command-line syntax is not correct). Note that this script is only here for illustration; there are many more interesting user scripts made by several dar users. These are available in the examples part of the documentation.

One could then use it this way:

    -E "./burner %p/%b.%n.dar %n"

which can lead to the following DAR command-line:

    dar -c ~/tmp/example -z -R / usr/local -s 650M -E "./burner %p/%b.%n.dar %n" -p

First note that, as our script does not change the CD in the device, we need to pause between slices (-p option). The pause takes place after the execution of the command (-E option). Thus we could add to the script a command to send a mail or play music to inform us that the slice is burned. The advantage here is that we don't have to come twice per slice: once when the slice is ready, and once when the slice is burnt.

Another example: you want to send a huge file by email. (OK, it is better to use FTP, but sometimes people think that the less you can do, the more they control you, and thus they disable many services, either by fear of the unknown or by stupidity.) So let's suppose that you only have mail available to transfer your data:

    dar -c toto -s 2M my_huge_file -E "uuencode %b.%n.dar %b.%n.dar | mail -s 'slice %n' your@email.address ; rm %b.%n.dar ; sleep 300"

Here we make an archive with slices of 2 megabytes, because our mail system does not allow larger emails. We save only one file, "my_huge_file" (but we could even save the whole filesystem, it would also work). The command we execute each time a slice is ready uuencodes the slice and mails it to our address, removes the slice, then waits 300 seconds before dar continues.

Note that we did not use the %p substitution string, as the slices are saved in the current directory.

A last example concerns extraction: in case the slices cannot all be present in the filesystem, you need a script or a command to fetch the next slice to be requested. It could use ftp, lynx, ssh, etc. I let you write the script as an exercise. :-)

Note: if you plan to share your DUC files, please follow the convention for DUC files.
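
As a starting point for that exercise, here is a minimal sketch of a script that makes the requested slice available before dar reads it. Everything specific in it is an assumption to adapt: the script name fetch.duc, the function name fetch_slice, the SLICE_SOURCE location and the FETCH_CMD copy command are all hypothetical, and special cases are not handled.

```shell
#!/bin/sh
# fetch.duc (hypothetical name): make the requested slice available locally.
# Assumed call from dar:  -E "fetch.duc %p %b %n %e %c"

# Where slices are kept and how to copy them. Both values are assumptions;
# adapt them, for example SLICE_SOURCE=user@host:/backups and FETCH_CMD=scp.
: "${SLICE_SOURCE:=/mnt/slices}"
: "${FETCH_CMD:=cp}"

fetch_slice() {
    # Too few arguments: print a brief usage message.
    if [ "$#" -lt 4 ]; then
        echo "usage: fetch.duc <path> <basename> <number> <extension> [context]"
        return 1
    fi

    slice="$2.$3.$4"

    # Nothing to do if the slice is already in place.
    if [ -f "$1/$slice" ]; then
        return 0
    fi

    # Copy the slice; a non-zero status reports the failure to dar.
    "$FETCH_CMD" "$SLICE_SOURCE/$slice" "$1/$slice"
}

# When saved as a script file, finish with:  fetch_slice "$@"; exit $?
```

The copy command is kept in a variable so that the same script can be tried locally with cp before switching to a network transfer.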

B - Before and after saving a file:

This concerns the -=, -< and -> options. The -< (include) and -> (exclude) options let you define which files need a command to be run before and after their backup, while the -= option lets you define which command to run for those files.

Let's suppose you have a very large file that changes often, located at /home/my/big/file, and several databases that each consist of several files under /home/*/database/data, which need to be in a coherent state and are also changing very often.

Saving them without precaution will most probably get your big file flagged as "dirty" in dar's archive, which means that the saved state of the file may be a state that never existed for that file: when dar saves a file it reads the first byte, then the second, etc. up to the end of the file. While dar is reading the middle of the file, an application may change the very beginning and then the very end of that file, but only the modified ending will be saved, leading the archive to contain a copy of the file in a state it never had.

For a database this is even worse: two or more files may need to be in a coherent state. If dar saves one file while another file is modified at the same time, the saved files will not be flagged as "dirty", but the database files may be saved in states that are incoherent with each other, so that you end up having saved the database in a corrupted state. To prevent that situation, we will use the following options:

    -R / "-<" home/my/big/file "-<" "home/*/database/data"

First, you must take care to quote the -< and -> options so that the shell does not interpret them as redirections from stdin or to stdout. Back to the example: it says that for the file /home/my/big/file and for any "database/data" directory (or file) in the home directory of a user, a command will be run before and after saving that directory or file. We thus need to define the command to run, using the following option:

    -= "/root/scripts/before_after_backup.sh %f %p %c"

Well, as you see, here too we may (and should) use substitution macros: here %f is the name of the file, %p its full path and %c the context ("start" when the command is run before saving the file).

And our script here could look like this:

    cat /root/scripts/before_after_backup.sh
    #!/bin/sh

    # Arguments follow the -= option above: %f %p %c
    filename="$1"
    path_file="$2"
    context="$3"

    if [ "$filename" = "data" ]; then
        if [ "$context" = "start" ]; then
            : # action to stop the database located in "$2"
        else
            : # action to restart the database located in "$2"
        fi
    elif [ "$path_file" = "/home/my/big/file" ]; then
        if [ "$context" = "start" ]; then
            : # suspend the application that writes to that file
        else
            : # resume the application that writes to that file
        fi
    else
        : # do nothing, or warn that no action is defined for that file
    fi

So now, if we run dar with all these options, dar will execute our script once before entering any database/data directory located in the home directory of some user, and once after all files of that directory have been saved. It will also run our script before and after saving our /home/my/big/file file.

If you plan to share your DBP files, please follow the DBP convention.

Convention for DUC files

Since version 1.2.0, dar users can have dar call commands or scripts between slices, thanks to the -E, -F and -~ options; such commands or scripts are called DUC files. To be able to easily share your DUC commands or scripts, I propose the following convention:

- use the ".duc" extension to show anyone that the script/command respects the following
- it must be called from dar with the following arguments:

      example.duc %p %b %n %e %c [other optional arguments]

- when called without argument, it must provide brief help on what it does and what the expected arguments are. This is the standard "usage:" convention.

Then any user could share their DUC files without bothering much about how to use them. Moreover it would be easy to chain them: if for example two persons created their own scripts, one "burn.duc" which burns a slice on DVD-R(W) and one "par.duc" which makes a Parchive redundancy file from a slice, anybody could use both at a time by giving the following argument to dar:

    -E "par.duc %p %b %n %e %c 1 ; burn.duc %p %b %n %e %c"

or, since version 2.1.0, with the following arguments:

    -E "par.duc %p %b %n %e %c 1" -E "burn.duc %p %b %n %e %c"

Of course a script does not have to use all its arguments: in the case of burn.duc for example, the %c (context) is probably useless and not used inside the script, yet it is still possible to give it all the "normal" arguments of a DUC file; extra unused arguments are simply ignored.

If you have interesting DUC scripts, you are welcome to contact me by email, so that I can add them to the web site and to the following releases. For now, check the doc/samples directory for a few examples of DUC files.

Note that all DUC scripts are expected to return an exit status of zero, meaning that the operation has succeeded. If another exit status is returned, dar asks the user for a decision (or aborts if no user has been identified, for example when dar is not run under a controlling terminal).

Convention for DBP files

Same as above, the following convention is proposed to ease the sharing of Dar Backup Preparation files:

- use the ".dbp" extension to show anyone that the script/command respects the following
- it must be called from dar with the following arguments:

      example.dbp %p %f %u %g %c [other optional arguments]

- when called without argument, it must provide brief help on what it does and what the expected arguments are. This is the standard "usage:" convention.

Identically to DUC files, DBP files are expected to return an exit status of zero; otherwise the backup process is suspended for the user to decide whether to retry, ignore the failure or abort the whole backup process.

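
The convention can be illustrated with a minimal skeleton. This is only a sketch under assumptions: the names example.dbp and dbp_main are hypothetical, the actions are placeholders, and any context other than "start" is treated as the call made after the file has been saved, as in the before_after_backup.sh example.

```shell
#!/bin/sh
# example.dbp (hypothetical name): skeleton following the DBP convention.
# Expected call from dar:  -= "example.dbp %p %f %u %g %c"

dbp_main() {
    # Too few arguments: print a brief help, as the convention requires.
    if [ "$#" -lt 5 ]; then
        echo "usage: example.dbp <path> <filename> <uid> <gid> <context>"
        return 1
    fi

    path="$1"
    context="$5"

    if [ "$context" = "start" ]; then
        # Placeholder: suspend whatever writes to "$path" before it is saved.
        echo "before saving $path"
    else
        # Placeholder: resume it once the file has been saved.
        echo "after saving $path"
    fi

    # Exit status zero tells dar that the preparation succeeded.
    return 0
}

# When saved as a script file, finish with:  dbp_main "$@"; exit $?
```

A DUC file can follow the same layout, with the arguments %p %b %n %e %c instead.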
Using data protection with DAR & Parchive

Parchive (PAR in the following) is a very nice program that makes it possible to recover a file which has been corrupted. It creates redundancy data stored in a separate file (or set of files), which can be used to repair the original file. This additional data may also be damaged; PAR will be able to repair the original file as well as the redundancy files, up to a certain point, of course. This point is defined by the percentage of redundancy you defined for a given file. But... check the official PAR site here:

    http://parchive.sourceforge.net (original site, no longer maintained today)
    https://github.com/BlackIkeEagle/par2cmdline (fork from the official site, maintained since December 2013)

Since version 2.4.0, dar is provided with a default /etc/darrc file. It contains a set of user targets, among which is "par2". This user target invokes the dar_par.dcf file provided beside dar, which automatically creates a parity file for each slice during backup, and verifies and if necessary repairs slices when testing an archive. So now you only need to use dar this way to activate Parchive with dar:

    dar [options] par2

Simple, no?

Examples of file filtering

File filtering is what defines which files are saved, listed, restored, compared, tested, and so on. In brief, in the following we will say which files are elected for the operation, meaning by "operation" either a backup, a restoration, an archive contents listing, an archive comparison, etc.

File filtering is done using the following options: -X, -I, -P, -R, -[, -] or -g.

OK, let's start with some concrete examples:

    dar -c toto

This will back up the current directory and all that is located in it to build the toto archive, also located in the current directory. Usually you should get a warning telling you that you are about to back up the archive itself.

Now let's see something less obvious:

    dar -c toto -R / -g home/ftp

The -R option tells dar to consider all files under the / root directory, while the -g "home/ftp" argument tells dar to restrict the operation to the home/ftp subdirectory of the given root directory, thus here /home/ftp.

But this is a little bit different from the following:

    dar -c toto -R /home/ftp

Here dar will save any file under /home/ftp without any restriction. So what is the difference? Exactly the same files will be saved as just above, but the file /home/ftp/welcome.msg for example will be stored as <ROOT>/welcome.msg, where <ROOT> will be replaced by the argument given to the -R option (which defaults to ".") at restoration or comparison time, while in the previous example the same file would have been stored with the path <ROOT>/home/ftp/welcome.msg.

    dar -c toto -R / -P home/ftp/pub -g home/ftp -g etc

As previously, but the -P option excludes all files under /home/ftp/pub from the operation. Additionally the /etc directory and its subdirectories are saved.

    dar -c toto -R / -P etc/password -g etc

Here we save all of /etc except the /etc/password file. Arguments given to -P can also be plain files. But when they are directories, the exclusion applies to the directory itself and its contents. Note that using -X to exclude "password" does not have exactly the same effect:

    dar -c toto -R / -X "password" -g etc

This will save all the /etc directory except any file with a name equal to "password". Thus of course /etc/password will not be saved, but if it exists, /etc/rc.d/password will not be saved either, unless it is a directory. Yes, if a directory /etc/rc.d/password exists, it will not be affected by the -X option. Like the -I option, the -X option does not apply to directories. The reason is to be able to filter some kinds of files without excluding a particular directory. For example, if you want to save all mp3 files and only mp3 files,

    dar -c toto -R / -I "*.mp3" -I "*.MP3" home/ftp

will save any file ending in mp3 or MP3 under the /home/ftp directory and its subdirectories. If instead -I (or -X) applied to directories, we would only be able to recurse into subdirectories ending with ".mp3" or ".MP3". If you had a directory named "/home/ftp/Music" for example, full of mp3 files, you would not have been able to save it.

Note that glob expressions (the shell-like wildcards '*', '?' and so on) can do much more complicated things, like "*.[mM][pP]3". You could thus replace the previous example by:

    dar -c toto -R / -I "*.[mM][pP]3" home/ftp

This would cover all .mp3, .mP3, .Mp3 and .MP3 files. One step further, the -acase option makes the following filtering arguments case sensitive (which is the default), while -ano-case (alias -an in short) sets the filtering arguments that follow it to case-insensitive mode. In shorter we could have:

    dar -c toto -R / -an -I "*.mp3" home/ftp

And instead of glob expressions you can use regular expressions (regex) thanks to the -aregex option. You can also use both alternately, with -aglob to return to glob expressions. Each -aregex / -aglob option defines the expected type of expression in the -I/-X/-P/-g/-u/-U/-Z/-Y options that follow, up to the end of the line or the next -aregex / -aglob option.

Last, a more complete example:

    dar -c toto -R / -P "*/.mozilla/*/[Cc]ache" -X ".*~" -X ".*~" -I "*.[Mm][pP][123]" -g home/ftp -g "fake"

So what? OK, here we save everything under /home/ftp and /fake, but we do not save the contents of "*/.mozilla/*/[Cc]ache", like for example the "/home/ftp/.mozilla/ftp/abcd.slt/Cache" directory and its contents. In these directories we save any file matching "*.[Mm][pP][123]" except those ending with a tilde (~ character), thus for example files named "toto.mp3" or ".bloup.Mp2".

Now the inside algorithm: a file is elected for the operation if

1 - its name does not match any -X option, or it is a directory
*and*
2 - if some -I options are given, the file is either a directory or matches at least one of the -I options given
*and*
3 - its path and filename do not match any -P option
*and*
4 - if some -g options are given, the path to the file matches at least one of the -g options.

The algorithm detailed above is the default one, which is historical and called the unordered method. Since version 2.2.x there is also an ordered method (activated by adding the -am option) which gives even more power to filters; the dar man page will give you all the details.

In parallel to file filtering, you will find Extended Attributes filtering thanks to the -u and -U options (they work the same way as the -X and -I options but apply to EA). You will also find file compression filtering (-Z and -Y options), which defines which files to compress or not to compress; here too they work the same way as the -X and -I options. The -ano-case / -acase options also apply here, as well as the -am option. Last, all these filterings (file, EA, compression) can also use regular expressions in place of glob expressions (thanks to the -ag / -ar options).

Note as a very last point that the --backup-hook-include and --backup-hook-exclude options act the same as the -P and -g options but apply to the files about to be saved, and give the user the possibility to perform an action (--backup-hook-execute) before and after saving files matching the mask options. The dar man page will give you all the necessary details to use this feature.

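
The four rules of the unordered method can be sketched in shell to experiment with masks. This is not dar's implementation, only an approximation under assumptions: the function names matches_any and elected are hypothetical, masks are passed as newline-separated lists, paths are relative to the -R root, -g paths are treated like masks for brevity, and the parent directories leading to a -g path are not handled.

```shell
#!/bin/sh
# Approximation of the unordered election rule (not dar's actual code).

# matches_any STRING MASKS [subtree]
# Succeeds if STRING matches one of the glob MASKS; with "subtree",
# anything located under a matching path matches too.
matches_any() {
    _str="$1"
    _mode="${3:-}"
    _old_ifs="$IFS"
    IFS='
'
    set -f          # no filename expansion while splitting the mask list
    _found=1
    for _mask in $2; do
        case "$_str" in
            $_mask) _found=0 ;;
            $_mask/*) if [ "$_mode" = subtree ]; then _found=0; fi ;;
        esac
    done
    set +f
    IFS="$_old_ifs"
    return "$_found"
}

# elected PATH IS_DIR X_MASKS I_MASKS P_MASKS G_PATHS
# IS_DIR is "yes" or "no"; an empty list means the option was not given.
elected() {
    _path="$1"; _isdir="$2"
    _x="${3:-}"; _i="${4:-}"; _p="${5:-}"; _g="${6:-}"
    _name="${_path##*/}"

    # 1 - the name matches no -X mask, or it is a directory
    if [ "$_isdir" != yes ] && matches_any "$_name" "$_x"; then return 1; fi
    # 2 - if -I is given: a directory, or a name matching at least one -I mask
    if [ -n "$_i" ] && [ "$_isdir" != yes ] && ! matches_any "$_name" "$_i"; then
        return 1
    fi
    # 3 - the path matches no -P mask (a matching directory prunes its contents)
    if matches_any "$_path" "$_p" subtree; then return 1; fi
    # 4 - if -g is given: the path is one of the -g paths or lies under one
    if [ -n "$_g" ] && ! matches_any "$_path" "$_g" subtree; then return 1; fi
    return 0
}
```

For instance, `elected etc/rc.d/password yes "password" "" "" "etc"` succeeds because -X does not apply to directories, while the same call with `no` as second argument fails.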
Door inodes (Solaris)

A door inode is a dynamic object that is created on top of an empty file; it exists only when a process has a reference to it, so it is not possible to restore it. But the empty file it is mounted on can be restored instead. As such, dar restores a door inode as an empty file having the same parameters as the door inode.

If a door inode is hard linked several times in the filesystem, dar will restore a plain file having as many hard links at the corresponding locations.

Dar is also able to handle Extended Attributes associated with a door file, if any. Last, if you list an archive containing door inodes, you will see the 'D' letter as their type (as opposed to 'd' for directories), which conforms to what the 'ls' command displays for such entries.

How to use "delta compression", "binary diff" or "rsync like increment" with dar

Terminology

"delta compression", "binary diff" and "rsync increment" all refer to the same feature: a way to avoid saving a whole file again during a differential/incremental backup, and to save only the modified part of it instead. This solution is of course interesting for large files that change often but only in small parts (Microsoft Exchange mailboxes, for example). Dar implements this feature relying on the librsync library; we will call this feature binary delta in the following.

Librsync specific concepts

Before looking at the way to use dar, several concepts from librsync have to be known:

In order to make a binary delta of a file "foo" which at time t1 contained data F1 and at time t2 contained data F2, librsync first requires that a "delta signature" be made against F1. Then, using that signature and data F2, librsync is able to build a delta patch P1 that, if applied to F1, will provide content F2:

     backing up file "foo"
              |
              V
     time t1 content = F1 ---------> delta signature of F1
              |                              |
              |                              |
              |                              +-------------> )  building delta patch "P1"
              V                                              )----> containing the difference
     time t2 content = F2 ----------------------------------> )  from F1 to F2

At restoration time dar then first has to restore F1, from a full backup or from a previous differential backup, and then, using librsync, applies the patch "P1" to modify F1 into F2.

     restoring file "foo"
              |
              V
     time t3 content = F1  <--- from a previous backup
              |
              +------>--------------->----------------+
              .                                       |
              .                                       V
              .                                       + <----- applying patch "P1"
              .                                       |
              +-----<---------------<-------------<---+
              |
              V
     time t4 content = F2

Usage with dar

First, delta signature is not activated by default: you have to tell dar you want to generate delta signatures, using the --delta sig option at archive creation/isolation/merging time. Then, as soon as a file has a delta signature in the archive of reference, dar will perform a binary delta if that file has changed since the archive of reference was made. But better an example than a long explanation:

Case of a differential backup scheme:

First, doing a full backup, we add the --delta sig option so that the resulting archive contains the signatures that will be provided to librsync later on in order to set up delta patches. This has the drawback of requiring additional space, but the advantage of saving space at incremental/differential backups:

    dar -c full -R / -z --delta sig

Then there is nothing more specific to delta signatures; this works the same way as you were used to with previous releases of dar: you just need to rely on an archive of reference containing delta signatures for dar to activate binary delta. Below, the diff1 archive will eventually contain delta patches of the files modified since the full archive was created, but will not contain any delta signature.

    dar -c diff1 -A full -R / -z

The next differential backups will be done the same way, based on the full backup:

    dar -c diff2 -A full -R / -z

Looking at the archive content, you will see the "[Delta]" flag in place of the "[Saved]" flag for files that have been saved as a delta patch rather than having their whole data saved in the backup:

    [Data ][D][ EA  ][FSA][Compr][S]| Permission | User  | Group | Size  |          Date                 |    filename
    --------------------------------+------------+-------+-------+-------+-------------------------------+------------
    [Delta][ ]       [-L-][ 99%][X]  -rwxr-xr-x   1000    1000    919 kio Tue Mar 22 20:22:34 2016        bash

Case of incremental backup scheme:

Doing incremental backups, the first one is always a full backup and is done the same way as above for differential backups:

    dar -c full -R / -z --delta sig

Unlike differential
backups, incremental backups are also used as reference for the next
backup. Thus if you want to continue doing binary delta, delta
signatures must be present beside the delta patches in the resulting
archives:
dar -c incr1 -A full -R / -z --delta sig
Here the --delta sig
switch leads dar to copy from the full backup into the new backup all
the delta signatures of unchanged files and to compute new delta
signatures for files that have changed.
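As a sketch of how such a chain grows, the loop below prints the commands for a full backup followed by a number of incremental backups, each one taking the previous archive as reference and carrying --delta sig. The helper names (run, incr_chain) are hypothetical, and the script only prints the commands unless RUN=1 is set (assuming dar is in the PATH).

```shell
#!/bin/sh
# Dry-run sketch of an incremental chain with binary delta.
# Helper names are hypothetical; only the dar options come from the text above.

# run: print the command, or execute it when RUN=1
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# incr_chain ROOT COUNT: one full backup, then COUNT incremental backups
incr_chain() {
    root=$1
    count=$2
    ref=full
    run dar -c full -R "$root" -z --delta sig
    i=1
    while [ "$i" -le "$count" ]; do
        # each incremental backup is the reference of the next one,
        # so it must carry delta signatures too
        run dar -c "incr$i" -A "$ref" -R "$root" -z --delta sig
        ref="incr$i"
        i=$((i + 1))
    done
}

incr_chain / 3
```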
Case of catalogue isolation:
If you do not want to keep the
previous differential, incremental or full backup around in order to
make a new backup, you can still use isolated catalogues to do so. The
point to take care of here is the way this isolated
catalogue is built: if you want to perform a binary difference, the signatures of
the reference files must be present in the isolated catalogue:
dar -C CAT_full -A full -z --delta sig
Note that if the archive of
reference does not hold any delta
signature, the previous command will lead dar to compute the delta
signatures of saved files on the fly while performing catalogue isolation. You can
thus choose not to include delta signatures inside the full backup while
still being able to let dar use binary delta. However, as dar cannot
compute a delta signature without data, files that have been recorded as
unchanged since the archive of reference was made cannot have their
delta signature computed at isolation time. The same applies to a file
stored as a delta patch without an associated delta signature: dar
will not be able to add a delta signature at isolation time for that
file.
Yes, this is as simple as
adding --delta sig
to what you were used to doing before. The resulting isolated catalogue
will be much larger than without delta signatures, but still much
smaller than the full backup itself. The incremental or differential
backup can then be done the same as before, but using CAT_full in place
of full:
dar -c diff1 -A CAT_full -R / -z
dar -c incr1 -A CAT_full -R / -z --delta sig
Case of archive merging:
You may need to merge two
archives, make a subset of a single archive, or even a mix of these two
operations, which has long been available through the --merge operation.
Here too, if you want to keep the delta signatures that could
be present in the source archives, you will have to use the --delta sig option:
dar --merge merged_backup -A archive1 -@archive2 -z --delta sig
Case of restoration:
No special
option has to be provided at restoration time. Dar will figure out by
itself whether the data for a file is plain data, which can replace the whole
current data when overwriting is allowed, or a delta patch
that has to be applied to the existing file lying on the filesystem. Before
patching the file, dar will calculate and check its CRC. If the CRC is
the expected one, the file will be patched; otherwise a warning will be
issued and the file will not be modified at all.
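As a sketch of a restoration run, the script below prints one plain dar -x command per archive, oldest first, with no delta-specific option. The helper names (run, restore_chain) and the /mnt/restore target are hypothetical, options governing overwriting are deliberately left out, and nothing is executed unless RUN=1 is set (assuming dar is in the PATH).

```shell
#!/bin/sh
# Dry-run sketch: restore a chain of archives in order.
# Helper names and paths are hypothetical.

# run: print the command, or execute it when RUN=1
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# restore_chain TARGET ARCHIVE...: restore each archive under TARGET,
# in the order given (full backup first), stopping at the first failure
restore_chain() {
    target=$1
    shift
    for archive in "$@"; do
        run dar -x "$archive" -R "$target" || return 1
    done
}

restore_chain /mnt/restore full incr1 incr2
```

The order of the arguments matters: a delta patch stored in incr2 can only be applied to the file content restored from incr1.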
The point with restoration is to *always* restore all previous backups in order, from the full backup to the latest incremental one (or the full backup and the latest differential one), for dar to be able to apply the stored patches. Otherwise restoration can fail for some or all files. Dar_manager can be of great help here, as it knows which archives to skip and which not to skip in order to restore a particular set of files.

Performing binary difference only for some files and normal backup for others:
You can exclude files from
the delta difference operation by
not creating a delta signature for them in the archive of
reference, using the option --exclude-delta-sig.
You can also restrict delta signatures to some files only, using the --include-delta-sig
option. Of course, as with other mask-related options like -I, -X, -U,
-u, -Z, -Y, ... it is possible to combine them to define more
precisely the set of files for which you want delta
signatures to be built.
dar -c full -R / -z --delta sig --include-delta-sig "*.opt" --include-delta-sig "*.pst" --exclude-delta-sig "home/joe/*"
Independently of this filtering
mechanism based on path+filename, a delta signature is never calculated
for files smaller than 10 kio, because it is not
worth performing delta difference for them. You can change that
behavior using the option --delta-sig-min-size <size in byte>
dar -c full -R / -z --delta sig --delta-sig-min-size 20k
Archive listing:
Archive
listing has been extended to show which files have a delta signature
and which ones have been saved as a delta patch. The [Data ] column shows [Delta] in
place of [Saved]
when a delta patch is used, and a new column entitled [D] shows [D] when a
delta signature is present for that file.
See man page about --delta
related options for even more details.
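As an illustration of reading this listing, the filter below prints the names of entries saved as a delta patch. It is only a sketch under assumptions: delta_files is a hypothetical helper, the filename is taken to be the last field (so names containing spaces are not handled), and the second sample line is made up for the demonstration.

```shell
#!/bin/sh
# Sketch: extract from a dar listing the files saved as delta patches.
# delta_files is a hypothetical helper, not part of dar.

# delta_files: read a listing on stdin, print the last field of each
# line whose [Data ] column shows [Delta]
delta_files() {
    awk '/^ *\[Delta\]/ { print $NF }'
}

# Sample listing: the first line mirrors the example shown earlier,
# the second one is hypothetical
sample='[Delta][ ]       [-L-][ 99%][X]  -rwxr-xr-x   1000    1000    919 kio Tue Mar 22 20:22:34 2016        bash
[Saved][D]       [-L-][ 54%][X]  -rw-r--r--   1000    1000    12 kio  Tue Mar 22 20:22:34 2016        notes.txt'

printf '%s\n' "$sample" | delta_files
```

With a real archive, the listing would come from dar itself, for example: dar -l diff1 | delta_files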
Comparing the different way to perform remote backup
Since release 2.6.0, dar can directly use ftp or sftp to operate remotely. This feature sometimes has advantages over the methods described above with ssh or netcat, and sometimes not; the objective here is to clarify the situation.
Multi recipient signed archive weakness
As described in the usage notes, it is possible to encrypt an archive so that it can be read by several recipients, each using their own gnupg private key. So far, so good!
It is also possible to embed your gnupg signature within such an archive,
so that your recipients have a proof that the archive comes from you. But
there is a known weakness in this signing approach as implemented in
libdar, a weakness that could be exploited by an expert to fake your
signature on a different archive.
Well, if this type of attack is within reach of an expert, under some constraints, it can only take place between a set of friends! Exchanging secret data within a group implicitly means trusting the members of that group, to some degree, not to make this secret data public; this is the sense in which I mean "friends". So if you do not fully trust one person in a group and want to share data by means of a signed, gnupg-encrypted dar archive, you have several options:
dar -c my_secret_group_stuff -z -K gnupg:recipents1@group.group,recipient2@group.group -R /home/secret --hash sha512 my_secret_group_stuff.1.dar |